diff --git "a/data/dataarxivfinal.csv" "b/data/dataarxivfinal.csv" --- "a/data/dataarxivfinal.csv" +++ "b/data/dataarxivfinal.csv" @@ -1,122378 +1,3 @@ -,id,authors,title,categories,abstract -0,0704.2083,"H. Satori, M. Harti and N. Chenfour",Introduction to Arabic Speech Recognition Using CMUSphinx System,cs.CL cs.AI," In this paper Arabic was investigated from the speech recognition problem -point of view. We propose a novel approach to build an Arabic Automated Speech -Recognition System (ASR). This system is based on the open source CMU Sphinx-4, -from the Carnegie Mellon University. CMU Sphinx is a large-vocabulary; -speaker-independent, continuous speech recognition system based on discrete -Hidden Markov Models (HMMs). We build a model using utilities from the -OpenSource CMU Sphinx. We will demonstrate the possible adaptability of this -system to Arabic voice recognition. -" -1,0704.2201,"H. Satori, M. Harti and N. Chenfour",Arabic Speech Recognition System using CMU-Sphinx4,cs.CL cs.AI," In this paper we present the creation of an Arabic version of Automated -Speech Recognition System (ASR). This system is based on the open source -Sphinx-4, from the Carnegie Mellon University. Which is a speech recognition -system based on discrete hidden Markov models (HMMs). We investigate the -changes that must be made to the model to adapt Arabic voice recognition. - Keywords: Speech recognition, Acoustic model, Arabic language, HMMs, -CMUSphinx-4, Artificial intelligence. -" -2,0704.3662,"Mike Tian-Jian Jiang, James Zhan, Jaimie Lin, Jerry Lin, Wen-Lien Hsu",An Automated Evaluation Metric for Chinese Text Entry,cs.HC cs.CL," In this paper, we propose an automated evaluation metric for text entry. We -also consider possible improvements to existing text entry evaluation metrics, -such as the minimum string distance error rate, keystrokes per character, cost -per correction, and a unified approach proposed by MacKenzie, so they can -accommodate the special characteristics of Chinese text. Current methods lack -an integrated concern about both typing speed and accuracy for Chinese text -entry evaluation. Our goal is to remove the bias that arises due to human -factors. First, we propose a new metric, called the correction penalty (P), -based on Fitts' law and Hick's law. Next, we transform it into the approximate -amortized cost (AAC) of information theory. An analysis of the AAC of Chinese -text input methods with different context lengths is also presented. -" -3,0704.3665,"Mike Tian-Jian Jiang, Deng Liu, Meng-Juei Hsieh, Wen-Lien Hsu",On the Development of Text Input Method - Lessons Learned,cs.CL cs.HC," Intelligent Input Methods (IM) are essential for making text entries in many -East Asian scripts, but their application to other languages has not been fully -explored. This paper discusses how such tools can contribute to the development -of computer processing of other oriental languages. We propose a design -philosophy that regards IM as a text service platform, and treats the study of -IM as a cross disciplinary subject from the perspectives of software -engineering, human-computer interaction (HCI), and natural language processing -(NLP). We discuss these three perspectives and indicate a number of possible -future research directions. 
-" -4,0704.3708,Bernat Corominas-Murtra,Network statistics on early English Syntax: Structural criteria,cs.CL," This paper includes a reflection on the role of networks in the study of -English language acquisition, as well as a collection of practical criteria to -annotate free-speech corpora from children utterances. At the theoretical -level, the main claim of this paper is that syntactic networks should be -interpreted as the outcome of the use of the syntactic machinery. Thus, the -intrinsic features of such machinery are not accessible directly from (known) -network properties. Rather, what one can see are the global patterns of its use -and, thus, a global view of the power and organization of the underlying -grammar. Taking a look into more practical issues, the paper examines how to -build a net from the projection of syntactic relations. Recall that, as opposed -to adult grammars, early-child language has not a well-defined concept of -structure. To overcome such difficulty, we develop a set of systematic criteria -assuming constituency hierarchy and a grammar based on lexico-thematic -relations. At the end, what we obtain is a well defined corpora annotation that -enables us i) to perform statistics on the size of structures and ii) to build -a network from syntactic relations over which we can perform the standard -measures of complexity. We also provide a detailed example. -" -5,0704.3886,Walid S. Saba,A Note on Ontology and Ordinary Language,cs.AI cs.CL," We argue for a compositional semantics grounded in a strongly typed ontology -that reflects our commonsense view of the world and the way we talk about it. -Assuming such a structure we show that the semantics of various natural -language phenomena may become nearly trivial. -" -6,0705.0462,"Paul-Andr\'e Melli\`es (PPS), Nicolas Tabareau (PPS)",Resource modalities in game semantics,math.CT cs.CL," The description of resources in game semantics has never achieved the -simplicity and precision of linear logic, because of a misleading conception: -the belief that linear logic is more primitive than game semantics. We advocate -instead the contrary: that game semantics is conceptually more primitive than -linear logic. Starting from this revised point of view, we design a categorical -model of resources in game semantics, and construct an arena game model where -the usual notion of bracketing is extended to multi- bracketing in order to -capture various resource policies: linear, affine and exponential. -" -7,0705.1161,Lillian Lee,"IDF revisited: A simple new derivation within the Robertson-Sp\""arck - Jones probabilistic model",cs.IR cs.CL," There have been a number of prior attempts to theoretically justify the -effectiveness of the inverse document frequency (IDF). Those that take as their -starting point Robertson and Sparck Jones's probabilistic model are based on -strong or complex assumptions. We show that a more intuitively plausible -assumption suffices. Moreover, the new assumption, while conceptually very -simple, provides a solution to an estimation problem that had been deemed -intractable by Robertson and Walker (1997). -" -8,0705.4676,Daniel Lemire and Owen Kaser,"Recursive n-gram hashing is pairwise independent, at best",cs.DB cs.CL," Many applications use sequences of n consecutive symbols (n-grams). Hashing -these n-grams can be a performance bottleneck. For more speed, recursive hash -families compute hash values by updating previous values. 
We prove that -recursive hash families cannot be more than pairwise independent. While hashing -by irreducible polynomials is pairwise independent, our implementations either -run in time O(n) or use an exponential amount of memory. As a more scalable -alternative, we make hashing by cyclic polynomials pairwise independent by -ignoring n-1 bits. Experimentally, we show that hashing by cyclic polynomials -is is twice as fast as hashing by irreducible polynomials. We also show that -randomized Karp-Rabin hash families are not pairwise independent. -" -9,0707.0895,Damian H. Zanette,Segmentation and Context of Literary and Musical Sequences,cs.CL physics.data-an," We test a segmentation algorithm, based on the calculation of the -Jensen-Shannon divergence between probability distributions, to two symbolic -sequences of literary and musical origin. The first sequence represents the -successive appearance of characters in a theatrical play, and the second -represents the succession of tones from the twelve-tone scale in a keyboard -sonata. The algorithm divides the sequences into segments of maximal -compositional divergence between them. For the play, these segments are related -to changes in the frequency of appearance of different characters and in the -geographical setting of the action. For the sonata, the segments correspond to -tonal domains and reveal in detail the characteristic tonal progression of such -kind of musical composition. -" -10,0707.1913,"Owen Kaser, Daniel Lemire","Removing Manually-Generated Boilerplate from Electronic Texts: - Experiments with Project Gutenberg e-Books",cs.DL cs.CL," Collaborative work on unstructured or semi-structured documents, such as in -literature corpora or source code, often involves agreed upon templates -containing metadata. These templates are not consistent across users and over -time. Rule-based parsing of these templates is expensive to maintain and tends -to fail as new documents are added. Statistical techniques based on frequent -occurrences have the potential to identify automatically a large fraction of -the templates, thus reducing the burden on the programmers. We investigate the -case of the Project Gutenberg corpus, where most documents are in ASCII format -with preambles and epilogues that are often copied and pasted or manually -typed. We show that a statistical approach can solve most cases though some -documents require knowledge of English. We also survey various technical -solutions that make our approach applicable to large data sets. -" -11,0707.3269,"Laurent Romary (INRIA Lorraine - LORIA), Nancy Ide (INRIA Lorraine - - LORIA)",International Standard for a Linguistic Annotation Framework,cs.CL," This paper describes the Linguistic Annotation Framework under development -within ISO TC37 SC4 WG1. The Linguistic Annotation Framework is intended to -serve as a basis for harmonizing existing language resources as well as -developing new ones. -" -12,0707.3270,"Laurent Romary (INRIA Lorraine - LORIA), Nancy Ide, Adam Kilgarriff",A Formal Model of Dictionary Structure and Content,cs.CL," We show that a general model of lexical information conforms to an abstract -model that reflects the hierarchy of information found in a typical dictionary -entry. We show that this model can be mapped into a well-formed XML document, -and how the XSL transformation language can be used to implement a semantics -defined over the abstract model to enable extraction and manipulation of the -information in any format. 
-" -13,0707.3559,Wilson Wong,"Practical Approach to Knowledge-based Question Answering with Natural - Language Understanding and Advanced Reasoning",cs.CL cs.AI cs.HC cs.IR," This research hypothesized that a practical approach in the form of a -solution framework known as Natural Language Understanding and Reasoning for -Intelligence (NaLURI), which combines full-discourse natural language -understanding, powerful representation formalism capable of exploiting -ontological information and reasoning approach with advanced features, will -solve the following problems without compromising practicality factors: 1) -restriction on the nature of question and response, and 2) limitation to scale -across domains and to real-life natural language text. -" -14,0707.3972,Ted Pedersen,Learning Probabilistic Models of Word Sense Disambiguation,cs.CL cs.AI," This dissertation presents several new methods of supervised and unsupervised -learning of word sense disambiguation models. The supervised methods focus on -performing model searches through a space of probabilistic models, and the -unsupervised methods rely on the use of Gibbs Sampling and the Expectation -Maximization (EM) algorithm. In both the supervised and unsupervised case, the -Naive Bayesian model is found to perform well. An explanation for this success -is presented in terms of learning rates and bias-variance decompositions. -" -15,0708.0694,"Maurice HT Ling, Christophe Lefevre, Kevin R. Nicholas, and Feng Lin","Reconstruction of Protein-Protein Interaction Pathways by Mining - Subject-Verb-Objects Intermediates",cs.IR cs.CL cs.DL," The exponential increase in publication rate of new articles is limiting -access of researchers to relevant literature. This has prompted the use of text -mining tools to extract key biological information. Previous studies have -reported extensive modification of existing generic text processors to process -biological text. However, this requirement for modification had not been -examined. In this study, we have constructed Muscorian, using MontyLingua, a -generic text processor. It uses a two-layered generalization-specialization -paradigm previously proposed where text was generically processed to a suitable -intermediate format before domain-specific data extraction techniques are -applied at the specialization layer. Evaluation using a corpus and experts -indicated 86-90% precision and approximately 30% recall in extracting -protein-protein interactions, which was comparable to previous studies using -either specialized biological text processing tools or modified existing tools. -Our study had also demonstrated the flexibility of the two-layered -generalization-specialization paradigm by using the same generalization layer -for two specialized information extraction tasks. -" -16,0708.1564,Stasinos Konstantopoulos,Learning Phonotactics Using ILP,cs.CL," This paper describes experiments on learning Dutch phonotactic rules using -Inductive Logic Programming, a machine learning discipline based on inductive -logical operators. Two different ways of approaching the problem are -experimented with, and compared against each other as well as with related work -on the task. The results show a direct correspondence between the quality and -informedness of the background knowledge and the constructed theory, -demonstrating the ability of ILP to take good advantage of the prior domain -knowledge available. Further research is outlined. -" -17,0708.2303,Walid S. 
Saba,Compositional Semantics Grounded in Commonsense Metaphysics,cs.AI cs.CL," We argue for a compositional semantics grounded in a strongly typed ontology -that reflects our commonsense view of the world and the way we talk about it in -ordinary language. Assuming the existence of such a structure, we show that the -semantics of various natural language phenomena may become nearly trivial. -" -18,0709.0116,Fionn Murtagh,On Ultrametric Algorithmic Information,cs.AI cs.CL," How best to quantify the information of an object, whether natural or -artifact, is a problem of wide interest. A related problem is the computability -of an object. We present practical examples of a new way to address this -problem. By giving an appropriate representation to our objects, based on a -hierarchical coding of information, we exemplify how it is remarkably easy to -compute complex objects. Our algorithmic complexity is related to the length of -the class of objects, rather than to the length of the object. -" -19,0709.2401,Timothy Baldwin,Bootstrapping Deep Lexical Resources: Resources for Courses,cs.CL," We propose a range of deep lexical acquisition methods which make use of -morphological, syntactic and ontological language resources to model word -similarity and bootstrap from a seed lexicon. The different methods are -deployed in learning lexical items for a precision grammar, and shown to each -have strengths and weaknesses over different word classes. A particular focus -of this paper is the relative accessibility of different language resource -types, and predicted ``bang for the buck'' associated with each in deep lexical -acquisition applications. -" -20,0710.0009,Adam Lipowski and Dorota Lipowska,"Bio-linguistic transition and Baldwin effect in an evolutionary - naming-game model",cs.CL cond-mat.stat-mech cs.AI physics.soc-ph q-bio.PE," We examine an evolutionary naming-game model where communicating agents are -equipped with an evolutionarily selected learning ability. Such a coupling of -biological and linguistic ingredients results in an abrupt transition: upon a -small change of a model control parameter a poorly communicating group of -linguistically unskilled agents transforms into almost perfectly communicating -group with large learning abilities. When learning ability is kept fixed, the -transition appears to be continuous. Genetic imprinting of the learning -abilities proceeds via Baldwin effect: initially unskilled communicating agents -learn a language and that creates a niche in which there is an evolutionary -pressure for the increase of learning ability.Our model suggests that when -linguistic (or cultural) processes became intensive enough, a transition took -place where both linguistic performance and biological endowment of our species -experienced an abrupt change that perhaps triggered the rapid expansion of -human civilization. -" -21,0710.0105,Dmitrii Manin,Zipf's Law and Avoidance of Excessive Synonymy,cs.CL physics.soc-ph," Zipf's law states that if words of language are ranked in the order of -decreasing frequency in texts, the frequency of a word is inversely -proportional to its rank. It is very robust as an experimental observation, but -to date it escaped satisfactory theoretical explanation. We suggest that Zipf's -law may arise from the evolution of word semantics dominated by expansion of -meanings and competition of synonyms. -" -22,0710.0169,A. A. 
Krizhanovsky,"Evaluation experiments on related terms search in Wikipedia: Information - Content and Adapted HITS (In Russian)",cs.IR cs.CL," The classification of metrics and algorithms search for related terms via -WordNet, Roget's Thesaurus, and Wikipedia was extended to include adapted HITS -algorithm. Evaluation experiments on Information Content and adapted HITS -algorithm are described. The test collection of Russian word pairs with -human-assigned similarity judgments is proposed. - ----- - Klassifikacija metrik i algoritmov poiska semanticheski blizkih slov v -tezaurusah WordNet, Rozhe i jenciklopedii Vikipedija rasshirena adaptirovannym -HITS algoritmom. S pomow'ju jeksperimentov v Vikipedii oceneny metrika -Information Content i adaptirovannyj algoritm HITS. Predlozhen resurs dlja -ocenki semanticheskoj blizosti russkih slov. -" -23,0710.0225,"D.V. Lande, A.A. Snarskii",On the role of autocorrelations in texts,cs.CL," The task of finding a criterion allowing to distinguish a text from an -arbitrary set of words is rather relevant in itself, for instance, in the -aspect of development of means for internet-content indexing or separating -signals and noise in communication channels. The Zipf law is currently -considered to be the most reliable criterion of this kind [3]. At any rate, -conventional stochastic word sets do not meet this law. The present paper deals -with one of possible criteria based on the determination of the degree of data -compression. -" -24,0710.0228,"S. Braichevsky, D. Lande, A. Snarskii","On the fractal nature of mutual relevance sequences in the Internet news - message flows",cs.CL," In the task of information retrieval the term relevance is taken to mean -formal conformity of a document given by the retrieval system to user's -information query. As a rule, the documents found by the retrieval system -should be submitted to the user in a certain order. Therefore, a retrieval -perceived as a selection of documents formally solving the user's query, should -be supplemented with a certain procedure of processing a relevant set. It would -be natural to introduce a quantitative measure of document conformity to query, -i.e. the relevance measure. Since no single rule exists for the determination -of the relevance measure, we shall consider two of them which are the simplest -in our opinion. The proposed approach does not suppose any restrictions and can -be applied to other relevance measures. -" -25,0710.1481,Stasinos Konstantopoulos,What's in a Name?,cs.CL cs.AI," This paper describes experiments on identifying the language of a single name -in isolation or in a document written in a different language. A new corpus has -been compiled and made available, matching names against languages. This corpus -is used in a series of experiments measuring the performance of general -language models and names-only language models on the language identification -task. Conclusions are drawn from the comparison between using general language -models and names-only language models and between identifying the language of -isolated names and the language of very short document fragments. Future -research directions are outlined. -" -26,0710.1511,Damian H. Zanette,Demographic growth and the distribution of language sizes,physics.data-an cs.CL physics.soc-ph," It is argued that the present log-normal distribution of language sizes is, -to a large extent, a consequence of demographic dynamics within the population -of speakers of each language. 
A two-parameter stochastic multiplicative process -is proposed as a model for the population dynamics of individual languages, and -applied over a period spanning the last ten centuries. The model disregards -language birth and death. A straightforward fitting of the two parameters, -which statistically characterize the population growth rate, predicts a -distribution of language sizes in excellent agreement with empirical data. -Numerical simulations, and the study of the size distribution within language -families, validate the assumptions at the basis of the model. -" -27,0710.2446,"Catherine Recanati (LIPN), Nicoleta Rogovschi (LIPN), Youn\`es Bennani - (LIPN)","The structure of verbal sequences analyzed with unsupervised learning - techniques",cs.CL cs.AI cs.LG," Data mining allows the exploration of sequences of phenomena, whereas one -usually tends to focus on isolated phenomena or on the relation between two -phenomena. It offers invaluable tools for theoretical analyses and exploration -of the structure of sentences, texts, dialogues, and speech. We report here the -results of an attempt at using it for inspecting sequences of verbs from French -accounts of road accidents. This analysis comes from an original approach of -unsupervised training allowing the discovery of the structure of sequential -data. The entries of the analyzer were only made of the verbs appearing in the -sentences. It provided a classification of the links between two successive -verbs into four distinct clusters, allowing thus text segmentation. We give -here an interpretation of these clusters by applying a statistical analysis to -independent semantic annotations. -" -28,0710.2674,James Ford,Linguistic Information Energy,cs.CL cs.IT math.IT," In this treatment a text is considered to be a series of word impulses which -are read at a constant rate. The brain then assembles these units of -information into higher units of meaning. A classical systems approach is used -to model an initial part of this assembly process. The concepts of linguistic -system response, information energy, and ordering energy are defined and -analyzed. Finally, as a demonstration, information energy is used to estimate -the publication dates of a series of texts and the similarity of a set of -texts. -" -29,0710.2852,"Patrick Blackburn (INRIA Lorraine - LORIA), S\'ebastien Hinderer - (INRIA Lorraine - LORIA)",Generating models for temporal representations,cs.CL," We discuss the use of model building for temporal representations. We chose -Polish to illustrate our discussion because it has an interesting aspectual -system, but the points we wish to make are not language specific. Rather, our -goal is to develop theoretical and computational tools for temporal model -building tasks in computational semantics. To this end, we present a -first-order theory of time and events which is rich enough to capture -interesting semantic distinctions, and an algorithm which takes minimal models -for first-order theories and systematically attempts to ``perturb'' their -temporal component to provide non-minimal, but semantically significant, -models. -" -30,0710.2988,Paul Bedaride (INRIA Lorraine - Loria),Using Description Logics for Recognising Textual Entailment,cs.CL," The aim of this paper is to show how we can handle the Recognising Textual -Entailment (RTE) task by using Description Logics (DLs). To do this, we propose -a representation of natural language semantics in DLs inspired by existing -representations in first-order logic. 
But our most significant contribution is -the definition of two novel inference tasks: A-Box saturation and subgraph -detection which are crucial for our approach to RTE. -" -31,0710.3285,Tretjakova Tamara,Nontraditional Scoring of C-tests,cs.CY cs.CL," In C-tests the hypothesis of items local independence is violated, which -doesn't permit to consider them as real tests. It is suggested to determine the -distances between separate C-test items (blanks) and to combine items into -clusters. Weights, inversely proportional to the number of items in -corresponding clusters, are assigned to items. As a result, the C-test -structure becomes similar to the structure of classical tests, without -violation of local independence hypothesis. -" -32,0710.3502,"Stergos D. Afantenos, V. Karkaletsis, P. Stamatopoulos and C. Halatsis","Using Synchronic and Diachronic Relations for Summarizing Multiple - Documents Describing Evolving Events",cs.CL cs.IR," In this paper we present a fresh look at the problem of summarizing evolving -events from multiple sources. After a discussion concerning the nature of -evolving events we introduce a distinction between linearly and non-linearly -evolving events. We present then a general methodology for the automatic -creation of summaries from evolving events. At its heart lie the notions of -Synchronic and Diachronic cross-document Relations (SDRs), whose aim is the -identification of similarities and differences between sources, from a -synchronical and diachronical perspective. SDRs do not connect documents or -textual elements found therein, but structures one might call messages. -Applying this methodology will yield a set of messages and relations, SDRs, -connecting them, that is a graph which we call grid. We will show how such a -grid can be considered as the starting point of a Natural Language Generation -System. The methodology is evaluated in two case-studies, one for linearly -evolving events (descriptions of football matches) and another one for -non-linearly evolving events (terrorist incidents involving hostages). In both -cases we evaluate the results produced by our computational systems. -" -33,0710.4516,"Thomas Sch\""urmann and Peter Grassberger",The predictability of letters in written english,physics.soc-ph cs.CL stat.ML," We show that the predictability of letters in written English texts depends -strongly on their position in the word. The first letters are usually the least -easy to predict. This agrees with the intuitive notion that words are well -defined subunits in written languages, with much weaker correlations across -these units than within them. It implies that the average entropy of a letter -deep inside a word is roughly 4 times smaller than the entropy of the first -letter. -" -34,0710.5382,Stergos D. Afantenos,"Some Reflections on the Task of Content Determination in the Context of - Multi-Document Summarization of Evolving Events",cs.CL," Despite its importance, the task of summarizing evolving events has received -small attention by researchers in the field of multi-document summariztion. In -a previous paper (Afantenos et al. 2007) we have presented a methodology for -the automatic summarization of documents, emitted by multiple sources, which -describe the evolution of an event. At the heart of this methodology lies the -identification of similarities and differences between the various documents, -in two axes: the synchronic and the diachronic. 
This is achieved by the -introduction of the notion of Synchronic and Diachronic Relations. Those -relations connect the messages that are found in the documents, resulting thus -in a graph which we call grid. Although the creation of the grid completes the -Document Planning phase of a typical NLG architecture, it can be the case that -the number of messages contained in a grid is very large, exceeding thus the -required compression rate. In this paper we provide some initial thoughts on a -probabilistic model which can be applied at the Content Determination stage, -and which tries to alleviate this problem. -" -35,0711.0666,"Ghazi Bouselmi (INRIA Lorraine - LORIA), Dominique Fohr (INRIA - Lorraine - LORIA), Irina Illina (INRIA Lorraine - LORIA), Jean-Paul Haton - (INRIA Lorraine - LORIA)","Discriminative Phoneme Sequences Extraction for Non-Native Speaker's - Origin Classification",cs.CL," In this paper we present an automated method for the classification of the -origin of non-native speakers. The origin of non-native speakers could be -identified by a human listener based on the detection of typical pronunciations -for each nationality. Thus we suppose the existence of several phoneme -sequences that might allow the classification of the origin of non-native -speakers. Our new method is based on the extraction of discriminative sequences -of phonemes from a non-native English speech database. These sequences are used -to construct a probabilistic classifier for the speakers' origin. The existence -of discriminative phone sequences in non-native speech is a significant result -of this work. The system that we have developed achieved a significant correct -classification rate of 96.3% and a significant error reduction compared to some -other tested techniques. -" -36,0711.0811,"Ghazi Bouselmi (INRIA Lorraine - LORIA), Dominique Fohr (INRIA - Lorraine - LORIA), Irina Illina (INRIA Lorraine - LORIA)","Combined Acoustic and Pronunciation Modelling for Non-Native Speech - Recognition",cs.CL," In this paper, we present several adaptation methods for non-native speech -recognition. We have tested pronunciation modelling, MLLR and MAP non-native -pronunciation adaptation and HMM models retraining on the HIWIRE foreign -accented English speech database. The ``phonetic confusion'' scheme we have -developed consists in associating to each spoken phone several sequences of -confused phones. In our experiments, we have used different combinations of -acoustic models representing the canonical and the foreign pronunciations: -spoken and native models, models adapted to the non-native accent with MAP and -MLLR. The joint use of pronunciation modelling and acoustic adaptation led to -further improvements in recognition accuracy. The best combination of the above -mentioned techniques resulted in a relative word error reduction ranging from -46% to 71%. -" -37,0711.1038,"Ghazi Bouselmi (INRIA Lorraine - LORIA), Dominique Fohr (INRIA - Lorraine - LORIA), Irina Illina (INRIA Lorraine - LORIA), Jean-Paul Haton - (INRIA Lorraine - LORIA)","Am\'elioration des Performances des Syst\`emes Automatiques de - Reconnaissance de la Parole pour la Parole Non Native",cs.CL," In this article, we present an approach for non native automatic speech -recognition (ASR). We propose two methods to adapt existing ASR systems to the -non-native accents. The first method is based on the modification of acoustic -models through integration of acoustic models from the mother tong. 
The -phonemes of the target language are pronounced in a similar manner to the -native language of speakers. We propose to combine the models of confused -phonemes so that the ASR system could recognize both concurrent -pronounciations. The second method we propose is a refinment of the -pronounciation error detection through the introduction of graphemic -constraints. Indeed, non native speakers may rely on the writing of words in -their uttering. Thus, the pronounctiation errors might depend on the characters -composing the words. The average error rate reduction that we observed is -(22.5%) relative for the sentence error rate, and 34.5% (relative) in word -error rate. -" -38,0711.1360,Damian H. Zanette,Analytical approach to bit-string models of language evolution,physics.soc-ph cs.CL," A formulation of bit-string models of language evolution, based on -differential equations for the population speaking each language, is introduced -and preliminarily studied. Connections with replicator dynamics and diffusion -processes are pointed out. The stability of the dominance state, where most of -the population speaks a single language, is analyzed within a mean-field-like -approximation, while the homogeneous state, where the population is evenly -distributed among languages, can be exactly studied. This analysis discloses -the existence of a bistability region, where dominance coexists with -homogeneity as possible asymptotic states. Numerical resolution of the -differential system validates these findings. -" -39,0711.2023,Peter D. Turney (National Research Council of Canada),Empirical Evaluation of Four Tensor Decomposition Algorithms,cs.LG cs.CL cs.IR," Higher-order tensor decompositions are analogous to the familiar Singular -Value Decomposition (SVD), but they transcend the limitations of matrices -(second-order tensors). SVD is a powerful tool that has achieved impressive -results in information retrieval, collaborative filtering, computational -linguistics, computational vision, and other fields. However, SVD is limited to -two-dimensional arrays of data (two modes), and many potential applications -have three or more modes, which require higher-order tensor decompositions. -This paper evaluates four algorithms for higher-order tensor decomposition: -Higher-Order Singular Value Decomposition (HO-SVD), Higher-Order Orthogonal -Iteration (HOOI), Slice Projection (SP), and Multislice Projection (MP). We -measure the time (elapsed run time), space (RAM and disk space requirements), -and fit (tensor reconstruction accuracy) of the four algorithms, under a -variety of conditions. We find that standard implementations of HO-SVD and HOOI -do not scale up to larger tensors, due to increasing RAM requirements. We -recommend HOOI for tensors that are small enough for the available RAM and MP -for larger tensors. -" -40,0711.2270,"I. M. Suslov (P.L.Kapitza Institute for Physical Problems, Moscow, - Russia)",Can a Computer Laugh ?,cs.CL cs.AI q-bio.NC," A computer model of ""a sense of humour"" suggested previously -[arXiv:0711.2058,0711.2061], relating the humorous effect with a specific -malfunction in information processing, is given in somewhat different -exposition. Psychological aspects of humour are elaborated more thoroughly. The -mechanism of laughter is formulated on the more general level. Detailed -discussion is presented for the higher levels of information processing, which -are responsible for a perception of complex samples of humour. 
Development of a -sense of humour in the process of evolution is discussed. -" -41,0711.2444,"Richard Moot (INRIA Futurs, Labri)",Proof nets for display logic,cs.CL," This paper explores several extensions of proof nets for the Lambek calculus -in order to handle the different connectives of display logic in a natural way. -The new proof net calculus handles some recent additions to the Lambek -vocabulary such as Galois connections and Grishin interactions. It concludes -with an exploration of the generative capacity of the Lambek-Grishin calculus, -presenting an embedding of lexicalized tree adjoining grammars into the -Lambek-Grishin calculus. -" -42,0711.3197,"I. M. Suslov (P.L.Kapitza Institute for Physical Problems, Moscow, - Russia)","How to realize ""a sense of humour"" in computers ?",cs.CL cs.AI q-bio.NC," Computer model of a ""sense of humour"" suggested previously [arXiv:0711.2058, -0711.2061, 0711.2270] is raised to the level of a realistic algorithm. -" -43,0711.3412,"Ivan Berlocher, Hyun-Gue Huh (IGM-LabInfo), Eric Laporte - (IGM-LabInfo), Jee-Sun Nam",Morphological annotation of Korean with Directly Maintainable Resources,cs.CL," This article describes an exclusively resource-based method of morphological -annotation of written Korean text. Korean is an agglutinative language. Our -annotator is designed to process text before the operation of a syntactic -parser. In its present state, it annotates one-stem words only. The output is a -graph of morphemes annotated with accurate linguistic information. The -granularity of the tagset is 3 to 5 times higher than usual tagsets. A -comparison with a reference annotated corpus showed that it achieves 89% recall -without any corpus training. The language resources used by the system are -lexicons of stems, transducers of suffixes and transducers of generation of -allomorphs. All can be easily updated, which allows users to control the -evolution of the performances of the system. It has been claimed that -morphological annotation of Korean text could only be performed by a -morphological analysis module accessing a lexicon of morphemes. We show that it -can also be performed directly with a lexicon of words and without applying -morphological rules at annotation time, which speeds up annotation to 1,210 -word/s. The lexicon of words is obtained from the maintainable language -resources through a fully automated compilation process. -" -44,0711.3449,Eric Laporte (IGM-LabInfo),Lexicon management and standard formats,cs.CL," International standards for lexicon formats are in preparation. To a certain -extent, the proposed formats converge with prior results of standardization -projects. However, their adequacy for (i) lexicon management and (ii) -lexicon-driven applications have been little debated in the past, nor are they -as a part of the present standardization effort. We examine these issues. IGM -has developed XML formats compatible with the emerging international standards, -and we report experimental results on large-coverage lexica. -" -45,0711.3452,Eric Laporte (IGM-LabInfo),In memoriam Maurice Gross,cs.CL," Maurice Gross (1934-2001) was both a great linguist and a pioneer in natural -language processing. This article is written in homage to his memory -" -46,0711.3453,"Hyun-Gue Huh (IGM-LabInfo), Eric Laporte (IGM-LabInfo)",A resource-based Korean morphological annotation system,cs.CL," We describe a resource-based method of morphological annotation of written -Korean text. Korean is an agglutinative language. 
The output of our system is a -graph of morphemes annotated with accurate linguistic information. The language -resources used by the system can be easily updated, which allows us-ers to -control the evolution of the per-formances of the system. We show that -morphological annotation of Korean text can be performed directly with a -lexicon of words and without morpho-logical rules. -" -47,0711.3454,"Eric Laporte (IGM-LabInfo), S\'ebastien Paumier (IGM-LabInfo)",Graphes param\'etr\'es et outils de lexicalisation,cs.CL," Shifting to a lexicalized grammar reduces the number of parsing errors and -improves application results. However, such an operation affects a syntactic -parser in all its aspects. One of our research objectives is to design a -realistic model for grammar lexicalization. We carried out experiments for -which we used a grammar with a very simple content and formalism, and a very -informative syntactic lexicon, the lexicon-grammar of French elaborated by the -LADL. Lexicalization was performed by applying the parameterized-graph -approach. Our results tend to show that most information in the lexicon-grammar -can be transferred into a grammar and exploited successfully for the syntactic -parsing of sentences. -" -48,0711.3457,Eric Laporte (IGM-LabInfo),Evaluation of a Grammar of French Determiners,cs.CL," Existing syntactic grammars of natural languages, even with a far from -complete coverage, are complex objects. Assessments of the quality of parts of -such grammars are useful for the validation of their construction. We evaluated -the quality of a grammar of French determiners that takes the form of a -recursive transition network. The result of the application of this local -grammar gives deeper syntactic information than chunking or information -available in treebanks. We performed the evaluation by comparison with a corpus -independently annotated with information on determiners. We obtained 86% -precision and 92% recall on text not tagged for parts of speech. -" -49,0711.3605,"Eric Laporte (IGM-LabInfo), Christian Lecl\`ere (IGM-LabInfo), Maria - Carmelita P. Dias",Very strict selectional restrictions,cs.CL," We discuss the characteristics and behaviour of two parallel classes of verbs -in two Romance languages, French and Portuguese. Examples of these verbs are -Port. abater [gado] and Fr. abattre [b\'etail], both meaning ""slaughter -[cattle]"". In both languages, the definition of the class of verbs includes -several features: - They have only one essential complement, which is a direct -object. - The nominal distribution of the complement is very limited, i.e., few -nouns can be selected as head nouns of the complement. However, this selection -is not restricted to a single noun, as would be the case for verbal idioms such -as Fr. monter la garde ""mount guard"". - We excluded from the class -constructions which are reductions of more complex constructions, e.g. Port. -afinar [instrumento] com ""tune [instrument] with"". -" -50,0711.3691,"Olivier Blanc (IGM-LabInfo), Matthieu Constant (IGM-LabInfo), Eric - Laporte (IGM-LabInfo)","Outilex, plate-forme logicielle de traitement de textes \'ecrits",cs.CL," The Outilex software platform, which will be made available to research, -development and industry, comprises software components implementing all the -fundamental operations of written text processing: processing without lexicons, -exploitation of lexicons and grammars, language resource management. 
All data -are structured in XML formats, and also in more compact formats, either -readable or binary, whenever necessary; the required format converters are -included in the platform; the grammar formats allow for combining statistical -approaches with resource-based approaches. Manually constructed lexicons for -French and English, originating from the LADL, and of substantial coverage, -will be distributed with the platform under LGPL-LR license. -" -51,0711.3726,Michael Zock and Stergos D. Afantenos,Let's get the student into the driver's seat,cs.CL," Speaking a language and achieving proficiency in another one is a highly -complex process which requires the acquisition of various kinds of knowledge -and skills, like the learning of words, rules and patterns and their connection -to communicative goals (intentions), the usual starting point. To help the -learner to acquire these skills we propose an enhanced, electronic version of -an age old method: pattern drills (henceforth PDs). While being highly regarded -in the fifties, PDs have become unpopular since then, partially because of -their lack of grounding (natural context) and rigidity. Despite these -shortcomings we do believe in the virtues of this approach, at least with -regard to the acquisition of basic linguistic reflexes or skills (automatisms), -necessary to survive in the new language. Of course, the method needs -improvement, and we will show here how this can be achieved. Unlike tapes or -books, computers are open media, allowing for dynamic changes, taking users' -performances and preferences into account. Building an electronic version of -PDs amounts to building an open resource, accomodatable to the users' ever -changing needs. -" -52,0711.4475,{\L}ukasz D\k{e}bowski,Valence extraction using EM selection and co-occurrence matrices,cs.CL," This paper discusses two new procedures for extracting verb valences from raw -texts, with an application to the Polish language. The first novel technique, -the EM selection algorithm, performs unsupervised disambiguation of valence -frame forests, obtained by applying a non-probabilistic deep grammar parser and -some post-processing to the text. The second new idea concerns filtering of -incorrect frames detected in the parsed text and is motivated by an observation -that verbs which take similar arguments tend to have similar frames. This -phenomenon is described in terms of newly introduced co-occurrence matrices. -Using co-occurrence matrices, we split filtering into two steps. The list of -valid arguments is first determined for each verb, whereas the pattern -according to which the arguments are combined into frames is computed in the -following stage. Our best extracted dictionary reaches an $F$-score of 45%, -compared to an $F$-score of 39% for the standard frame-based BHT filtering. -" -53,0712.1529,Walid S. Saba,Ontology and Formal Semantics - Integration Overdue,cs.AI cs.CL," In this note we suggest that difficulties encountered in natural language -semantics are, for the most part, due to the use of mere symbol manipulation -systems that are devoid of any content. In such systems, where there is hardly -any link with our common-sense view of the world, and it is quite difficult to -envision how one can formally account for the considerable amount of content -that is often implicit, but almost never explicitly stated in our everyday -discourse. 
The solution, in our opinion, is a compositional semantics grounded -in an ontology that reflects our commonsense view of the world and the way we -talk about it in ordinary language. In the compositional logic we envision -there are ontological (or first-intension) concepts, and logical (or -second-intension) concepts, and where the ontological concepts include not only -Davidsonian events, but other abstract objects as well (e.g., states, -processes, properties, activities, attributes, etc.) It will be demonstrated -here that in such a framework, a number of challenges in the semantics of -natural language (e.g., metonymy, intensionality, metaphor, etc.) can be -properly and uniformly addressed. -" -54,0712.3298,"Dragomir Radev, Mark Hodges, Anthony Fader, Mark Joseph, Joshua - Gerrish, Mark Schaller, Jonathan dePeri, Bryan Gibson",CLAIRLIB Documentation v1.03,cs.IR cs.CL," The Clair library is intended to simplify a number of generic tasks in -Natural Language Processing (NLP), Information Retrieval (IR), and Network -Analysis. Its architecture also allows for external software to be plugged in -with very little effort. Functionality native to Clairlib includes -Tokenization, Summarization, LexRank, Biased LexRank, Document Clustering, -Document Indexing, PageRank, Biased PageRank, Web Graph Analysis, Network -Generation, Power Law Distribution Analysis, Network Analysis (clustering -coefficient, degree distribution plotting, average shortest path, diameter, -triangles, shortest path matrices, connected components), Cosine Similarity, -Random Walks on Graphs, Statistics (distributions, tests), Tf, Idf, Community -Finding. -" -55,0712.3705,Tuomo Kakkonen,Framework and Resources for Natural Language Parser Evaluation,cs.CL," Because of the wide variety of contemporary practices used in the automatic -syntactic parsing of natural languages, it has become necessary to analyze and -evaluate the strengths and weaknesses of different approaches. This research is -all the more necessary because there are currently no genre- and -domain-independent parsers that are able to analyze unrestricted text with 100% -preciseness (I use this term to refer to the correctness of analyses assigned -by a parser). All these factors create a need for methods and resources that -can be used to evaluate and compare parsing systems. This research describes: -(1) A theoretical analysis of current achievements in parsing and parser -evaluation. (2) A framework (called FEPa) that can be used to carry out -practical parser evaluations and comparisons. (3) A set of new evaluation -resources: FiEval is a Finnish treebank under construction, and MGTS and RobSet -are parser evaluation resources in English. (4) The results of experiments in -which the developed evaluation framework and the two resources for English were -used for evaluating a set of selected parsers. -" -56,0801.0253,Greg J. Stephens and William Bialek,Toward a statistical mechanics of four letter words,q-bio.NC cs.CL physics.data-an physics.soc-ph," We consider words as a network of interacting letters, and approximate the -probability distribution of states taken on by this network. 
Despite the -intuition that the rules of English spelling are highly combinatorial (and -arbitrary), we find that maximum entropy models consistent with pairwise -correlations among letters provide a surprisingly good approximation to the -full statistics of four letter words, capturing ~92% of the multi-information -among letters and even ""discovering"" real words that were not represented in -the data from which the pairwise correlations were estimated. The maximum -entropy model defines an energy landscape on the space of possible words, and -local minima in this landscape account for nearly two-thirds of words used in -written English. -" -57,0801.1179,"Bernard Jacquemin (ISC, UMR 7044, GERIICO), Sabine Ploux (ISC)",Corpus sp{\'e}cialis{\'e} et ressource de sp{\'e}cialit{\'e},cs.IR cs.CL," ""Semantic Atlas"" is a mathematic and statistic model to visualise word senses -according to relations between words. The model, that has been applied to -proximity relations from a corpus, has shown its ability to distinguish word -senses as the corpus' contributors comprehend them. We propose to use the model -and a specialised corpus in order to create automatically a specialised -dictionary relative to the corpus' domain. A morpho-syntactic analysis -performed on the corpus makes it possible to create the dictionary from -syntactic relations between lexical units. The semantic resource can be used to -navigate semantically - and not only lexically - through the corpus, to create -classical dictionaries or for diachronic studies of the language. -" -58,0801.1415,S. Wichmann,The emerging field of language dynamics,cs.CL physics.soc-ph," A simple review by a linguist, citing many articles by physicists: -Quantitative methods, agent-based computer simulations, language dynamics, -language typology, historical linguistics -" -59,0801.1658,"Adam Lipowski, Dorota Lipowska","Computational approach to the emergence and evolution of language - - evolutionary naming game model",physics.soc-ph cs.CL cs.MA," Computational modelling with multi-agent systems is becoming an important -technique of studying language evolution. We present a brief introduction into -this rapidly developing field, as well as our own contributions that include an -analysis of the evolutionary naming-game model. In this model communicating -agents, that try to establish a common vocabulary, are equipped with an -evolutionarily selected learning ability. Such a coupling of biological and -linguistic ingredients results in an abrupt transition: upon a small change of -the model control parameter a poorly communicating group of linguistically -unskilled agents transforms into almost perfectly communicating group with -large learning abilities. Genetic imprinting of the learning abilities proceeds -via Baldwin effect: initially unskilled communicating agents learn a language -and that creates a niche in which there is an evolutionary pressure for the -increase of learning ability. Under the assumption that communication intensity -increases continuously with finite speed, the transition is split into several -transition-like changes. It shows that the speed of cultural changes, that sets -an additional characteristic timescale, might be yet another factor affecting -the evolution of language. In our opinion, this model shows that linguistic and -biological processes have a strong influence on each other and this effect -certainly has contributed to an explosive development of our species. -" -60,0801.2510,J. Gillet and M. 
Ausloos,"A Comparison of natural (english) and artificial (esperanto) languages. - A Multifractal method based analysis",cs.CL physics.data-an," We present a comparison of two english texts, written by Lewis Carroll, one -(Alice in wonderland) and the other (Through a looking glass), the former -translated into esperanto, in order to observe whether natural and artificial -languages significantly differ from each other. We construct one dimensional -time series like signals using either word lengths or word frequencies. We use -the multifractal ideas for sorting out correlations in the writings. In order -to check the robustness of the methods we also write the corresponding shuffled -texts. We compare characteristic functions and e.g. observe marked differences -in the (far from parabolic) f(alpha) curves, differences which we attribute to -Tsallis non extensive statistical features in the ''frequency time series'' and -''length time series''. The esperanto text has more extreme vallues. A very -rough approximation consists in modeling the texts as a random Cantor set if -resulting from a binomial cascade of long and short words (or words and -blanks). This leads to parameters characterizing the text style, and most -likely in fine the author writings. -" -61,0801.3239,"Solomiya Buk, Andrij Rovenchak","Online-concordance ""Perekhresni stezhky"" (""The Cross-Paths""), a novel by - Ivan Franko",cs.CL cs.DL," In the article, theoretical principles and practical realization for the -compilation of the concordance to ""Perekhresni stezhky"" (""The Cross-Paths""), a -novel by Ivan Franko, are described. Two forms for the context presentation are -proposed. The electronic version of this lexicographic work is available -online. -" -62,0801.3817,Tuomo Kakkonen,"Robustness Evaluation of Two CCG, a PCFG and a Link Grammar Parsers",cs.CL," Robustness in a parser refers to an ability to deal with exceptional -phenomena. A parser is robust if it deals with phenomena outside its normal -range of inputs. This paper reports on a series of robustness evaluations of -state-of-the-art parsers in which we concentrated on one aspect of robustness: -its ability to parse sentences containing misspelled words. We propose two -measures for robustness evaluation based on a comparison of a parser's output -for grammatical input sentences and their noisy counterparts. In this paper, we -use these measures to compare the overall robustness of the four evaluated -parsers, and we present an analysis of the decline in parser performance with -increasing error levels. Our results indicate that performance typically -declines tens of percentage units when parsers are presented with texts -containing misspellings. When it was tested on our purpose-built test set of -443 sentences, the best parser in the experiment (C&C parser) was able to -return exactly the same parse tree for the grammatical and ungrammatical -sentences for 60.8%, 34.0% and 14.9% of the sentences with one, two or three -misspelled words respectively. -" -63,0801.3864,Alberto Pepe and Johan Bollen,"Between conjecture and memento: shaping a collective emotional - perception of the future",cs.CL cs.GL," Large scale surveys of public mood are costly and often impractical to -perform. However, the web is awash with material indicative of public mood such -as blogs, emails, and web queries. Inexpensive content analysis on such -extensive corpora can be used to assess public mood fluctuations. 
The work -presented here is concerned with the analysis of the public mood towards the -future. Using an extension of the Profile of Mood States questionnaire, we have -extracted mood indicators from 10,741 emails submitted in 2006 to futureme.org, -a web service that allows its users to send themselves emails to be delivered -at a later date. Our results indicate long-term optimism toward the future, but -medium-term apprehension and confusion. -" -64,0801.4716,"Tonio Wandmacher, Jean-Yves Antoine","Methods to integrate a language model with semantic information for a - word prediction component",cs.CL," Most current word prediction systems make use of n-gram language models (LM) -to estimate the probability of the following word in a phrase. In the past -years there have been many attempts to enrich such language models with further -syntactic or semantic information. We want to explore the predictive powers of -Latent Semantic Analysis (LSA), a method that has been shown to provide -reliable information on long-distance semantic dependencies between words in a -context. We present and evaluate here several methods that integrate LSA-based -information with a standard language model: a semantic cache, partial -reranking, and different forms of interpolation. We found that all methods show -significant improvements, compared to the 4-gram baseline, and most of them to -a simple cache model as well. -" -65,0801.4746,Walid S. Saba,"Concerning Olga, the Beautiful Little Street Dancer (Adjectives as - Higher-Order Polymorphic Functions)",cs.CL cs.LO," In this paper we suggest a typed compositional seman-tics for nominal -compounds of the form [Adj Noun] that models adjectives as higher-order -polymorphic functions, and where types are assumed to represent concepts in an -ontology that reflects our commonsense view of the world and the way we talk -about it in or-dinary language. In addition to [Adj Noun] compounds our -proposal seems also to suggest a plausible explana-tion for well known -adjective ordering restrictions. -" -66,0802.2234,"Christoph Schommer, Conny Uhde","Textual Fingerprinting with Texts from Parkin, Bassewitz, and Leander",cs.CL cs.CR," Current research in author profiling to discover a legal author's fingerprint -does not only follow examinations based on statistical parameters only but -include more and more dynamic methods that can learn and that react adaptable -to the specific behavior of an author. But the question on how to appropriately -represent a text is still one of the fundamental tasks, and the problem of -which attribute should be used to fingerprint the author's style is still not -exactly defined. In this work, we focus on linguistic selection of attributes -to fingerprint the style of the authors Parkin, Bassewitz and Leander. We use -texts of the genre Fairy Tale as it has a clear style and texts of a shorter -size with a straightforward story-line and a simple language. -" -67,0802.4112,"Hanna E. Makaruk, Robert Owczarek",Hubs in Languages: Scale Free Networks of Synonyms,physics.soc-ph cs.CL physics.data-an," Natural languages are described in this paper in terms of networks of -synonyms: a word is identified with a node, and synonyms are connected by -undirected links. Our statistical analysis of the network of synonyms in Polish -language showed it is scale-free; similar to what is known for English. The -statistical properties of the networks are also similar. 
Thus, the statistical
-aspects of the networks are good candidates for culture-independent elements
-of human language. We hypothesize that optimization for robustness and
-efficiency is responsible for this universality. Despite the statistical
-similarity, there is no one-to-one mapping between the networks of these two
-languages. Although many hubs in Polish are translated into similarly highly
-connected hubs in English, there are also hubs specific to only one of these
-languages: a single word in one language is equivalent to many different and
-disconnected words in the other, in accordance with the Whorf hypothesis about
-language relativity. Identifying language-specific hubs is vitally important
-for automatic translation, and for understanding contextual, culturally
-related messages that are frequently missed or twisted in a naive, literal
-translation.
-"
-68,0802.4198,"Solomija Buk, J\'an Ma\v{c}utek, Andrij Rovenchak",Some
-properties of the Ukrainian writing system,cs.CL," We investigate the
-grapheme-phoneme relation in Ukrainian and some properties of the Ukrainian
-version of the Cyrillic alphabet.
-"
-69,0802.4215,M. Ausloos,"Equilibrium (Zipf) and Dynamic (Grassberger-Procaccia)
-method based analyses of human texts. A comparison of natural (english) and
-artificial (esperanto) languages",physics.soc-ph cs.CL physics.data-an," A
-comparison of two English texts by Lewis Carroll, one (Alice in Wonderland),
-also translated into Esperanto, and the other (Through a Looking Glass), is
-discussed in order to observe whether natural and artificial languages
-significantly differ from each other. One-dimensional time-series-like signals
-are constructed using only word frequencies (FTS) or word lengths (LTS). The
-data is studied through (i) a Zipf method for sorting out correlations in the
-FTS and (ii) a Grassberger-Procaccia (GP) technique based method for finding
-correlations in LTS. Features are compared: different power laws are observed
-with characteristic exponents for the ranking properties, and the {\it phase
-space attractor dimensionality}. The Zipf exponent can take values much less
-than unity ($ca.$ 0.50 or 0.30) depending on how a sentence is defined. This
-non-universality is conjectured to be a measure of the author $style$.
-Moreover the attractor dimension $r$ is a simple function of the so-called
-phase space dimension $n$, i.e., $r = n^{\lambda}$, with $\lambda = 0.79$.
-Such an exponent is also conjectured to be a measure of the author
-$creativity$. However, even though there are quantitative differences between
-the original English text and its Esperanto translation, the qualitative
-differences are very minute, indicating in this case a translation that, along
-our analysis lines, respects the content of the author's writing relatively
-well.
-"
-70,0802.4326,Jiyou Jia,"The Generation of Textual Entailment with NLML in an
-Intelligent Dialogue system for Language Learning CSIEC",cs.CL cs.AI cs.CY,"
-This research report introduces the generation of textual entailment within
-the project CSIEC (Computer Simulation in Educational Communication), an
-interactive web-based human-computer dialogue system with natural language for
-English instruction. The generation of textual entailment (GTE) is critical to
-the further improvement of the CSIEC project. Up to now we have found little
-literature related to GTE.
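
The Zipf-type rank-frequency analysis used in the two Ausloos entries above
(including entry 69, 0802.4215) reduces, in outline, to sorting word
frequencies and fitting a slope in log-log coordinates. A small sketch, not
the papers' actual pipeline; carroll.txt is a hypothetical input file and the
least-squares slope is only a rough estimate of the exponent:

    import collections, math

    def zipf_exponent(text):
        # Word frequencies sorted into a rank-frequency curve.
        freqs = sorted(collections.Counter(text.lower().split()).values(),
                       reverse=True)
        xs = [math.log(r) for r in range(1, len(freqs) + 1)]
        ys = [math.log(f) for f in freqs]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        # Ordinary least-squares slope of log f against log r.
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        return -slope

    print(zipf_exponent(open("carroll.txt", encoding="utf8").read()))
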
Simulating the process by which a human being learns
-English as a foreign language, we explore a naive approach to tackling the GTE
-problem and its algorithm within the framework of CSIEC, i.e. rule annotation
-in NLML, pattern recognition (matching), and entailment transformation. The
-time and space complexity of our algorithm is tested with some entailment
-examples. Future work includes rule annotation based on English textbooks and
-a GUI interface for normal users to edit the entailment rules.
-"
-71,0803.2856,"T. Rothenberger, S. Oez, E. Tahirovic, C. Schommer","Figuring out
-Actors in Text Streams: Using Collocations to establish Incremental
-Mind-maps",cs.CL cs.LG," The recognition, involvement, and description of main
-actors influence the story line of the whole text. This is all the more
-important as the text per se represents a flow of words and expressions that,
-once read, is lost. In this respect, the understanding of a text, and moreover
-of how an actor exactly behaves, is a major concern: as human beings try to
-store a given input in short-term memory while associating diverse aspects and
-actors with incidents, the following approach represents a virtual
-architecture in which collocations are taken as the associative completion of
-the actors' acting. Once collocations are discovered, they are managed in
-separate memory blocks broken down by actor. As for human beings, the memory
-blocks refer to associative mind-maps. We then present several priority
-functions to represent the actual temporal situation inside a mind-map, to
-enable the user to reconstruct recent events from the discovered temporal
-results.
-"
-72,0804.0143,"Beno\^it Lemaire (TIMC), Guy Denhi\`ere (LPC)",Effects of
-High-Order Co-occurrences on Word Semantic Similarities,cs.CL," A
-computational model of the construction of word meaning through exposure to
-texts is built in order to simulate the effects of co-occurrence values on
-word semantic similarities, paragraph by paragraph. Semantic similarity is
-here viewed as association. It turns out that the similarity between two words
-W1 and W2 strongly increases with a co-occurrence, decreases with the
-occurrence of W1 without W2 or W2 without W1, and slightly increases with
-high-order co-occurrences. Therefore, operationalizing similarity as a
-frequency of co-occurrence probably introduces a bias: first, there are cases
-in which there is similarity without co-occurrence and, second, the frequency
-of co-occurrence overestimates similarity.
-"
-73,0804.0317,"Maurice HT Ling, Christophe Lefevre, Kevin R. Nicholas",
-"Parts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in
-Extracting Information from Biomedical Text",cs.CL cs.IR," A recent study
-reported the development of Muscorian, a generic text processing tool for
-extracting protein-protein interactions from text, that achieved comparable
-performance to biomedical-specific text processing tools. This result was
-unexpected since potential errors from a series of text analysis processes are
-likely to adversely affect the outcome of the entire process. Most biomedical
-entity relationship extraction tools have used a biomedical-specific
-parts-of-speech (POS) tagger, as errors in POS tagging are likely to affect
-subsequent semantic analysis of the text, such as shallow parsing.
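
The co-occurrence bookkeeping behind entry 72 (0804.0143) above is easy to
outline in code. The score below is a toy weighting (joint occurrences minus a
penalty for solo occurrences), chosen only to mirror the qualitative behaviour
the abstract reports, not the authors' model:

    import collections, itertools

    def association(paragraphs):
        joint = collections.Counter()   # co-occurrences within a paragraph
        occur = collections.Counter()   # occurrences of each word
        for para in paragraphs:
            words = set(para.lower().split())
            occur.update(words)
            joint.update(itertools.combinations(sorted(words), 2))
        def score(w1, w2):
            pair = tuple(sorted((w1, w2)))
            solo = occur[w1] + occur[w2] - 2 * joint[pair]
            return joint[pair] - 0.1 * solo
        return score

    sim = association(["the cat sat", "the cat ate", "a dog ate"])
    print(sim("cat", "the"), sim("cat", "dog"))
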
This study
-aims to evaluate the parts-of-speech (POS) tagging accuracy and to explore
-whether comparable performance is obtained when a generic POS tagger,
-MontyTagger, is used in place of MedPost, a tagger trained on biomedical text.
-Our results demonstrated that MontyTagger, Muscorian's POS tagger, has a POS
-tagging accuracy of 83.1% when tested on biomedical text. Replacing
-MontyTagger with MedPost did not result in a significant improvement in entity
-relationship extraction from text: precision of 55.6% from MontyTagger versus
-56.8% from MedPost on directional relationships, and 86.1% from MontyTagger
-compared to 81.8% from MedPost on nondirectional relationships. This is
-unexpected, as the potential for poor POS tagging by MontyTagger would be
-expected to affect the outcome of the information extraction. An analysis of
-POS tagging errors demonstrated that 78.5% of tagging errors are compensated
-for by shallow parsing. Thus, despite 83.1% tagging accuracy, MontyTagger has
-a functional tagging accuracy of 94.6%.
-"
-74,0804.1033,"Sviatlana Danilava, Christoph Schommer","A Semi-Automatic
-Framework to Discover Epistemic Modalities in Scientific Articles",cs.CL cs.LO,"
-Documents in scientific journals are often marked by attitudes and opinions of
-the author and/or other persons, who contribute objective and subjective
-statements and arguments as well. In this respect, the attitude is often
-conveyed by a linguistic modality. Since in languages like English, French and
-German modality is expressed by special verbs like can, must, may, etc. and by
-the subjunctive mood, an occurrence of modality is often taken to mean that
-these verbs carry the modality. This is not correct, as modality is an
-instrument of the whole sentence, to which adverbs, modal particles,
-punctuation marks, and the intonation of a sentence all contribute. Often, a
-combination of all these instruments is necessary to express a modality. In
-this work, we are concerned with finding modal verbs in scientific texts as a
-pre-step towards discovering the attitude of an author. Whereas the input is
-an arbitrary text, the output consists of zones representing modalities.
-"
-75,0804.2354,"A. V. Smirnov, A. A. Krizhanovsky",Information filtering based on
-wiki index database,cs.IR cs.CL," In this paper we present a profile-based
-approach to information filtering by an analysis of the content of text
-documents. The Wikipedia index database is created and used to automatically
-generate the user profile from the user document collection. The
-problem-oriented Wikipedia subcorpora are created (using knowledge extracted
-from the user profile) for each topic of user interest. The index databases of
-these subcorpora are applied to filtering the information flow (e.g., mails,
-news). Thus, the analyzed texts are classified into several topics explicitly
-presented in the user profile. The paper concentrates on the indexing part of
-the approach. The architecture of an application implementing the Wikipedia
-indexing is described. The indexing method is evaluated using the Russian and
-Simple English Wikipedia.
-"
-76,0804.3269,"Santiago Fern\'andez, Alex Graves, Juergen Schmidhuber",Phoneme
-recognition in TIMIT with BLSTM-CTC,cs.CL cs.NE," We compare the performance of
-a recurrent neural network with the best results published so far on phoneme
-recognition in the TIMIT database. These published results have been obtained
-with a combination of classifiers.
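
For entry 74 (0804.1033) above, the announced pre-step (finding modal verbs
and emitting the sentences that contain them as modality zones) admits a very
crude baseline. A minimal sketch, assuming English input and deliberately
ignoring the subjunctive, adverbs and particles that the abstract rightly says
also matter:

    import re

    MODALS = {"can", "could", "may", "might", "must",
              "shall", "should", "will", "would"}

    def modal_zones(text):
        # Split into sentences, keep those containing a modal verb.
        sentences = re.split(r"(?<=[.!?])\s+", text)
        return [s for s in sentences
                if MODALS & {w.lower() for w in re.findall(r"[A-Za-z]+", s)}]

    print(modal_zones("The results may vary. We measured them twice."))
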
-However, in this paper we apply a single recurrent neural network to the same -task. Our recurrent neural network attains an error rate of 24.6%. This result -is not significantly different from that obtained by the other best methods, -but they rely on a combination of classifiers for achieving comparable -performance. -" -77,0804.3599,"Oren Kurland, Lillian Lee","Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based - Language Models",cs.IR cs.CL," We present an approach to improving the precision of an initial document -ranking wherein we utilize cluster information within a graph-based framework. -The main idea is to perform re-ranking based on centrality within bipartite -graphs of documents (on one side) and clusters (on the other side), on the -premise that these are mutually reinforcing entities. Links between entities -are created via consideration of language models induced from them. - We find that our cluster-document graphs give rise to much better retrieval -performance than previously proposed document-only graphs do. For example, -authority-based re-ranking of documents via a HITS-style cluster-based approach -outperforms a previously-proposed PageRank-inspired algorithm applied to -solely-document graphs. Moreover, we also show that computing authority scores -for clusters constitutes an effective method for identifying clusters -containing a large percentage of relevant documents. -" -78,0804.4584,Sylvain Schmitz and Joseph Le Roux,Feature Unification in TAG Derivation Trees,cs.CL," The derivation trees of a tree adjoining grammar provide a first insight into -the sentence semantics, and are thus prime targets for generation systems. We -define a formalism, feature-based regular tree grammars, and a translation from -feature based tree adjoining grammars into this new formalism. The translation -preserves the derivation structures of the original grammar, and accounts for -feature unification. -" -79,0805.1030,"Stephane Zampelli, Martin Mann, Yves Deville and Rolf Backofen",Decomposition Techniques for Subgraph Matching,cs.CC cs.CL," In the constraint programming framework, state-of-the-art static and dynamic -decomposition techniques are hard to apply to problems with complete initial -constraint graphs. For such problems, we propose a hybrid approach of these -techniques in the presence of global constraints. In particular, we solve the -subgraph isomorphism problem. Further we design specific heuristics for this -hard problem, exploiting its special structure to achieve decomposition. The -underlying idea is to precompute a static heuristic on a subset of its -constraint network, to follow this static ordering until a first problem -decomposition is available, and to switch afterwards to a fully propagated, -dynamically decomposing search. Experimental results show that, for sparse -graphs, our decomposition method solves more instances than dedicated, -state-of-the-art matching algorithms or standard constraint programming -approaches. -" -80,0805.2303,"Richard Moot (LaBRI, Inria Futurs)",Graph Algorithms for Improving Type-Logical Proof Search,cs.CL," Proof nets are a graph theoretical representation of proofs in various -fragments of type-logical grammar. In spite of this basis in graph theory, -there has been relatively little attention to the use of graph theoretic -algorithms for type-logical proof search. In this paper we will look at several -ways in which standard graph theoretic algorithms can be used to restrict the -search space. 
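
The centrality computation behind entry 77 (0804.3599) above is, at its core,
a HITS-style power iteration on a document-cluster bipartite graph. A compact
illustration of that iteration only; the language-model-induced edge weights W
and the surrounding re-ranking machinery are assumed given and are not
reproduced here:

    import numpy as np

    def hits_bipartite(W, iters=50):
        # W[i, j]: edge weight between document i and cluster j.
        docs = np.ones(W.shape[0])      # authority scores (documents)
        clusters = np.ones(W.shape[1])  # hub scores (clusters)
        for _ in range(iters):
            clusters = W.T @ docs
            clusters /= np.linalg.norm(clusters) or 1.0
            docs = W @ clusters
            docs /= np.linalg.norm(docs) or 1.0
        return docs, clusters

    W = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 1.0]])
    print(hits_bipartite(W))
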
In particular, we will provide an O(n^4) algorithm for selecting
-an optimal axiom link at any stage in the proof search, as well as an O(kn^3)
-algorithm for selecting the k best proof candidates.
-"
-81,0805.2537,"Patrick Henry, Christian Bassac (LaBRI)",A toolkit for a
-generative lexicon,cs.CL," In this paper we describe the conception of a
-software toolkit designed for the construction, maintenance and collaborative
-use of a Generative Lexicon. In order to ease its portability and spreading
-use, this tool was built with free and open-source products. We tested the
-toolkit and showed that it filters the adequate form of anaphoric reference to
-the modifier in endocentric compounds.
-"
-82,0805.3366,"Fabian Steeg, Christoph Benden, Paul O. Samuelsdorff",
-"Computational Representation of Linguistic Structures using Domain-Specific
-Languages",cs.CL," We describe a modular system for generating sentences from
-formal definitions of underlying linguistic structures using domain-specific
-languages. The system uses Java in general, Prolog for lexical entries and
-custom domain-specific languages based on Functional Grammar and Functional
-Discourse Grammar notation, implemented using the ANTLR parser generator. We
-show how linguistic and technological parts can be brought together in a
-natural language processing system and how domain-specific languages can be
-used as a tool for consistent formal notation in linguistic description.
-"
-83,0805.3410,Sylvain Pogodalla (INRIA Lorraine - LORIA),"Exploring a
-type-theoretic approach to accessibility constraint modelling",cs.CL," The
-type-theoretic modelling of DRT that [degroote06] proposed features
-continuations for the management of the context in which a clause has to be
-interpreted. This approach, while keeping the standard definitions of
-quantifier scope, translates the rules of the accessibility constraints of
-discourse referents inside the semantic recipes. In this paper, we deal with
-additional rules for these accessibility constraints: in particular, discourse
-referents introduced by proper nouns, which negation does not block, and
-rhetorical relations that structure discourses. We show how this
-continuation-based approach applies to those accessibility constraints and how
-we can consider the parallel management of various principles.
-"
-84,0805.4101,"Sylvie Saget (IRISA), Marc Guyomard (IRISA)","Goal-oriented
-Dialog as a Collaborative Subordinated Activity involving Collective
-Acceptance",cs.AI cs.CL," Modeling dialog as a collaborative activity consists
-notably in specifying the content of the Conversational Common Ground and the
-kind of social mental state involved. In previous work (Saget, 2006), we claim
-that Collective Acceptance is the proper social attitude for modeling
-Conversational Common Ground in the particular case of goal-oriented dialog.
-In this paper, a formalization of Collective Acceptance is shown, elements for
-integrating this attitude into a rational model of dialog are provided, and
-finally a model of referential acts as part of a collaborative activity is
-presented. The particular case of reference has been chosen in order to
-exemplify our claims.
-"
-85,0805.4369,"Guy Denhi\`ere (LPC), Beno\^it Lemaire (TIMC), C\'edrick
-Bellissens, Sandra Jhean",A semantic space for modeling children's semantic
-memory,cs.CL," The goal of this paper is to present a model of children's
-semantic memory, which is based on a corpus reproducing the kinds of texts
-children are exposed to. After presenting the literature on the development of
-semantic memory, a preliminary French corpus of 3.2 million words is
-described. Similarities in the resulting semantic space are compared to human
-data on four tests: association norms, vocabulary test, semantic judgments and
-memory tasks. A second corpus is described, which is composed of subcorpora
-corresponding to various ages. This stratified corpus is intended as a basis
-for developmental studies. Finally, two applications of these models of
-semantic memory are presented: the first one aims at tracing the development
-of semantic similarities paragraph by paragraph; the second one describes an
-implementation of a model of text comprehension derived from the
-Construction-Integration model (Kintsch, 1988, 1998) and based on such models
-of semantic memory.
-"
-86,0805.4521,Doina Tatar and Militon Frentiu,Textual Entailment Recognizing by
-Theorem Proving Approach,cs.CL," In this paper we present two original methods
-for recognizing textual inference. The first is a modified resolution method
-in which some linguistic considerations are introduced into the unification of
-two atoms. This approach is made possible by recent methods for transforming
-texts into logic formulas. The second is based on semantic relations in text,
-as presented in WordNet. Some similarities between these two methods are
-noted.
-"
-87,0805.4722,"Bernard Jacquemin (LIMSI), Aur\'elien Lauf (LIMSI), C\'eline
-Poudat (LTCI), Martine Hurault-Plantet (LIMSI), Nicolas Auray (LTCI)",La
-fiabilit\'e des informations sur le web,cs.IR cs.CL cs.CY," Online IR tools
-have to take into account new phenomena linked to the appearance of blogs,
-wikis and other collaborative publications. Among these collaborative sites,
-Wikipedia represents a crucial source of information. However, the quality of
-this information has recently been questioned. A better knowledge of the
-contributors' behaviors should help users navigate through information whose
-quality may vary from one source to another. In order to explore this idea, we
-present an analysis of the role of different types of contributors in the
-control of the publication of conflictual articles.
-"
-88,0805.4754,"Bernard Jacquemin (LIMSI), Aur\'elien Lauf (LIMSI), C\'eline
-Poudat (LTCI), Martine Hurault-Plantet (LIMSI), Nicolas Auray (LTCI)",Managing
-conflicts between users in Wikipedia,cs.IR cs.CL cs.CY cs.HC," Wikipedia is
-nowadays a widely used encyclopedia, and one of the most visible sites on the
-Internet. Its strong principle of collaborative work and free editing
-sometimes generates disputes due to disagreements between users. In this
-article we study how the Wikipedian community resolves conflicts and which
-roles Wikipedians choose in this process. We observed users' behavior both in
-the article talk pages and in the Arbitration Committee pages specifically
-dedicated to serious disputes. We first set up a user typology according to
-their involvement in conflicts and their publishing and management activity in
-the encyclopedia.
We then used those user types to describe users'
-behavior in contributing to articles that are tagged by the Wikipedian
-community as being in conflict with the official guidelines of Wikipedia, or
-conversely as being featured articles.
-"
-89,0806.2581,"Doina Tatar, Gabriela Serban, Andreea Mihis, Mihaiela Lupea,
-Dana Lupsa and Militon Frentiu",A chain dictionary method for Word Sense
-Disambiguation and applications,cs.CL," A large class of unsupervised
-algorithms for Word Sense Disambiguation (WSD) is that of dictionary-based
-methods. Various algorithms have Lesk's algorithm, which exploits the sense
-definitions in the dictionary directly, as their root. Our approach uses the
-lexical base WordNet for a new algorithm originating in Lesk's, namely the
-""chain algorithm for disambiguation of all words"" (CHAD). We show how
-translation from one language into another, as well as text entailment
-verification, can be accomplished with this disambiguation.
-"
-90,0806.3710,"A. Blondin Masse, G. Chicoisne, Y. Gargouri, S. Harnad, O.
-Picard, O. Marcotte",How Is Meaning Grounded in Dictionary
-Definitions?,cs.CL cs.DB," Meaning cannot be based on dictionary definitions
-all the way down: at some point the circularity of definitions must be broken
-in some way, by grounding the meanings of certain words in sensorimotor
-categories learned from experience or shaped by evolution. This is the
-""symbol grounding problem."" We introduce the concept of a reachable set -- a
-larger vocabulary whose meanings can be learned from a smaller vocabulary
-through definition alone, as long as the meanings of the smaller vocabulary
-are themselves already grounded. We provide simple algorithms to compute
-reachable sets for any given dictionary.
-"
-91,0806.3787,"Ted Pedersen (University of Minnesota, Duluth)","Computational
-Approaches to Measuring the Similarity of Short Contexts: A Review of
-Applications and Methods",cs.CL," Measuring the similarity of short written
-contexts is a fundamental problem in Natural Language Processing. This article
-provides a unifying framework by which short context problems can be
-categorized both by their intended application and proposed solution. The goal
-is to show that various problems and methodologies that appear quite different
-on the surface are in fact very closely related. The axes by which these
-categorizations are made include the format of the contexts (headed versus
-headless), the way in which the contexts are to be measured (first-order
-versus second-order similarity), and the information used to represent the
-features in the contexts (micro versus macro views). The unifying thread that
-binds together many short context applications and methods is the fact that
-similarity decisions must be made between contexts that share few (if any)
-words in common.
-"
-92,0807.0311,D.V. Lande and V.V. Zhygalo,About the creation of a parallel
-bilingual corpora of web-publications,cs.CL," An algorithm for the creation of
-parallel text corpora is presented. The algorithm is based on the use of ""key
-words"" in text documents and on means for their automated translation. Key
-words were singled out using Russian and Ukrainian morphological dictionaries,
-as well as dictionaries of noun translations for the Russian and Ukrainian
-languages. In addition, empirical statistical rules were used to calculate
-the weights of the terms in the documents.
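
The reachable-set idea of entry 90 (0806.3710) above has a natural fixed-point
formulation: keep adding any word whose definition uses only words already
grounded or reached. A minimal sketch under one possible reading of the
abstract; the toy dictionary is illustrative, not the authors' data:

    def reachable_set(definitions, grounded):
        # definitions: word -> list of words used in its definition.
        reached = set(grounded)
        changed = True
        while changed:
            changed = False
            for word, defn in definitions.items():
                if word not in reached and set(defn) <= reached:
                    reached.add(word)
                    changed = True
        return reached

    defs = {"cat": ["animal"], "kitten": ["young", "cat"],
            "young": ["animal"]}
    print(reachable_set(defs, {"animal"}))  # all four words become reachable
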
The algorithm was realized in the form of a
-program complex integrated into the content-monitoring InfoStream system. As
-a result, a parallel bilingual corpus of web publications containing about 30
-thousand documents was created.
-"
-93,0807.0565,Damian H. Zanette,"Music, Complexity, Information",physics.soc-ph
-cs.CL," These are the preparatory notes for a Science & Music essay, ""Playing
-by numbers"", which appeared in Nature 453 (2008) 988-989.
-"
-94,0807.1560,Vahed Qazvinian and Dragomir R. Radev,Scientific Paper
-Summarization Using Citation Summary Networks,cs.IR cs.CL," Quickly moving to
-a new area of research is painful for researchers due to the vast amount of
-scientific literature in each field of study. One possible way to overcome
-this problem is to summarize a scientific topic. In this paper, we propose a
-model for summarizing a single article, which can be further used to summarize
-an entire topic. Our model is based on analyzing others' viewpoints of the
-target article's contributions and on the study of its citation summary
-network using a clustering approach.
-"
-95,0807.3622,"Laura Kallmeyer (SFB 441), Timm Lichte (SFB 441), Wolfgang Maier
-(SFB 441), Yannick Parmentier (INRIA Lorraine - LORIA), Johannes Dellert (SFB
-441), Kilian Evang (SFB 441)","TuLiPA: Towards a Multi-Formalism Parsing
-Environment for Grammar Engineering",cs.CL," In this paper, we present an
-open-source parsing environment (Tuebingen Linguistic Parsing Architecture,
-TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism,
-thus opening the way to the parsing of several mildly context-sensitive
-formalisms. This environment currently supports tree-based grammars (namely
-Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with
-Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic
-structures, but also of the corresponding semantic representations. It is used
-for the development of a tree-based grammar for German.
-"
-96,0807.3845,Stefano Crespi Reghizzi,Formal semantics of language and the
-Richard-Berry paradox,cs.CL cs.CC cs.LO," The classical logical antinomy known
-as the Richard-Berry paradox is combined with plausible assumptions about the
-size, i.e. the descriptional complexity, of Turing machines formalizing
-certain sentences, to show that the formalization of language leads to
-contradiction.
-"
-97,0808.0521,Ian Pratt-Hartmann and Lawrence S. Moss,Logics for the Relational
-Syllogistic,cs.LO cs.CC cs.CL," The Aristotelian syllogistic cannot account
-for the validity of many inferences involving relational facts. In this paper,
-we investigate the prospects for providing a relational syllogistic. We
-identify several fragments based on (a) whether negation is permitted on all
-nouns, including those in the subject of a sentence; and (b) whether the
-subject noun phrase may contain a relative clause. The logics we present are
-extensions of the classical syllogistic, and we pay special attention to the
-question of whether reductio ad absurdum is needed. Thus our main goal is to
-derive results on the existence (or non-existence) of syllogistic proof
-systems for relational fragments. We also determine the computational
-complexity of all our fragments.
-"
-98,0808.1211,Walid S.
Saba,"Commonsense Knowledge, Ontology and Ordinary Language",cs.AI cs.CL," Over two decades ago a ""quite revolution"" overwhelmingly replaced -knowledgebased approaches in natural language processing (NLP) by quantitative -(e.g., statistical, corpus-based, machine learning) methods. Although it is our -firm belief that purely quantitative approaches cannot be the only paradigm for -NLP, dissatisfaction with purely engineering approaches to the construction of -large knowledge bases for NLP are somewhat justified. In this paper we hope to -demonstrate that both trends are partly misguided and that the time has come to -enrich logical semantics with an ontological structure that reflects our -commonsense view of the world and the way we talk about in ordinary language. -In this paper it will be demonstrated that assuming such an ontological -structure a number of challenges in the semantics of natural language (e.g., -metonymy, intensionality, copredication, nominal compounds, etc.) can be -properly and uniformly addressed. -" -99,0808.1753,A. A. Krizhanovsky,Index wiki database: design and experiments,cs.IR cs.CL," With the fantastic growth of Internet usage, information search in documents -of a special type called a ""wiki page"" that is written using a simple markup -language, has become an important problem. This paper describes the software -architectural model for indexing wiki texts in three languages (Russian, -English, and German) and the interaction between the software components (GATE, -Lemmatizer, and Synarcher). The inverted file index database was designed using -visual tool DBDesigner. The rules for parsing Wikipedia texts are illustrated -by examples. Two index databases of Russian Wikipedia (RW) and Simple English -Wikipedia (SEW) are built and compared. The size of RW is by order of magnitude -higher than SEW (number of words, lexemes), though the growth rate of number of -pages in SEW was found to be 14% higher than in Russian, and the rate of -acquisition of new words in SEW lexicon was 7% higher during a period of five -months (from September 2007 to February 2008). The Zipf's law was tested with -both Russian and Simple Wikipedias. The entire source code of the indexing -software and the generated index databases are freely available under GPL (GNU -General Public License). -" -100,0808.2904,Reginald D. Smith,Investigation of the Zipf-plot of the extinct Meroitic language,cs.CL," The ancient and extinct language Meroitic is investigated using Zipf's Law. -In particular, since Meroitic is still undeciphered, the Zipf law analysis -allows us to assess the quality of current texts and possible avenues for -future investigation using statistical techniques. -" -101,0808.3563,Stevan Harnad,What It Feels Like To Hear Voices: Fond Memories of Julian Jaynes,cs.CL," Julian Jaynes's profound humanitarian convictions not only prevented him from -going to war, but would have prevented him from ever kicking a dog. Yet -according to his theory, not only are language-less dogs unconscious, but so -too were the speaking/hearing Greeks in the Bicameral Era, when they heard -gods' voices telling them what to do rather than thinking for themselves. I -argue that to be conscious is to be able to feel, and that all mammals (and -probably lower vertebrates and invertebrates too) feel, hence are conscious. 
-Julian Jaynes's brilliant analysis of our concepts of consciousness
-nevertheless keeps inspiring ever more inquiry and insights into the age-old
-mind/body problem and its relation to cognition and language.
-"
-102,0808.3569,"Itiel Dror, Stevan Harnad",Offloading Cognition onto Cognitive
-Technology,cs.MA cs.CL," ""Cognizing"" (e.g., thinking, understanding, and
-knowing) is a mental state. Systems without mental states, such as cognitive
-technology, can sometimes contribute to human cognition, but that does not
-make them cognizers. Cognizers can offload some of their cognitive functions
-onto cognitive technology, thereby extending their performance capacity beyond
-the limits of their own brain power. Language itself is a form of cognitive
-technology that allows cognizers to offload some of their cognitive functions
-onto the brains of other cognizers. Language also extends cognizers'
-individual and joint performance powers, distributing the load through
-interactive and collaborative cognition. Reading, writing, print,
-telecommunications and computing further extend cognizers' capacities. And now
-the web, with its network of cognizers, digital databases and software agents,
-all accessible anytime, anywhere, has become our 'Cognitive Commons,' in which
-distributed cognizers and cognitive technology can interoperate globally with
-a speed, scope and degree of interactivity inconceivable through local
-individual cognition alone. And as with language, the cognitive tool par
-excellence, such technological changes are not merely instrumental and
-quantitative: they can have profound effects on how we think and encode
-information, on how we communicate with one another, on our mental states, and
-on our very nature.
-"
-103,0808.3616,Reginald D. Smith,Constructing word similarities in Meroitic as
-an aid to decipherment,cs.CL," Meroitic is the still undeciphered language of
-the ancient civilization of Kush. Over the years, various techniques for
-decipherment, such as finding a bilingual text or cognates from modern or
-other ancient languages in the Sudan and surrounding areas, have not been
-successful. Using techniques borrowed from information theory and natural
-language statistics, similar words are paired and attempts are made to use
-currently defined words to extract at least partial meaning from unknown
-words.
-"
-104,0808.3889,M.T. Carrasco Benitez,Open architecture for multilingual
-parallel texts,cs.CL," Multilingual parallel texts (abbreviated to parallel
-texts) are linguistic versions of the same content (""translations""); e.g.,
-the Maastricht Treaty in English and Spanish are parallel texts. This document
-is about creating an open architecture for the whole Authoring, Translation
-and Publishing Chain (ATP-chain) for the processing of parallel texts.
-"
-105,0808.4122,Tomoyuki Yamakami,Swapping Lemmas for Regular and Context-Free
-Languages,cs.CC cs.CL cs.FL," In formal language theory, one of the most
-fundamental tools, known as pumping lemmas, is extremely useful for regular
-and context-free languages. However, there are natural properties for which
-the pumping lemmas are of little use. One such example concerns the notion of
-advice, which depends only on the size of an underlying input. A standard
-pumping lemma encounters difficulty in proving that a given language is not
-regular in the presence of advice. We develop a substitute, called the
-swapping lemma for regular languages, to demonstrate the non-regularity of a
-target language with advice.
-For context-free languages, we also present a similar form of swapping lemma, -which serves as a technical tool to show that certain languages are not -context-free with advice. -" -106,0809.0103,Dmitrii Y. Manin,On the nature of long-range letter correlations in texts,cs.CL cs.IT math.IT," The origin of long-range letter correlations in natural texts is studied -using random walk analysis and Jensen-Shannon divergence. It is concluded that -they result from slow variations in letter frequency distribution, which are a -consequence of slow variations in lexical composition within the text. These -correlations are preserved by random letter shuffling within a moving window. -As such, they do reflect structural properties of the text, but in a very -indirect manner. -" -107,0809.0124,Peter D. Turney (National Research Council of Canada),"A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations",cs.CL cs.IR cs.LG," Recognizing analogies, synonyms, antonyms, and associations appear to be four -distinct tasks, requiring distinct NLP algorithms. In the past, the four tasks -have been treated independently, using a wide variety of algorithms. These four -semantic classes, however, are a tiny sample of the full range of semantic -phenomena, and we cannot afford to create ad hoc algorithms for each semantic -phenomenon; we need to seek a unified approach. We propose to subsume a broad -range of phenomena under analogies. To limit the scope of this paper, we -restrict our attention to the subsumption of synonyms, antonyms, and -associations. We introduce a supervised corpus-based machine learning algorithm -for classifying analogous word pairs, and we show that it can solve -multiple-choice SAT analogy questions, TOEFL synonym questions, ESL -synonym-antonym questions, and similar-associated-both questions from cognitive -psychology. -" -108,0809.0360,"Piero A. Bonatti, Carsten Lutz, Aniello Murano, Moshe Y. Vardi",The Complexity of Enriched Mu-Calculi,cs.LO cs.CL," The fully enriched μ-calculus is the extension of the propositional -μ-calculus with inverse programs, graded modalities, and nominals. While -satisfiability in several expressive fragments of the fully enriched -μ-calculus is known to be decidable and ExpTime-complete, it has recently -been proved that the full calculus is undecidable. In this paper, we study the -fragments of the fully enriched μ-calculus that are obtained by dropping at -least one of the additional constructs. We show that, in all fragments obtained -in this way, satisfiability is decidable and ExpTime-complete. Thus, we -identify a family of decidable logics that are maximal (and incomparable) in -expressive power. Our results are obtained by introducing two new automata -models, showing that their emptiness problems are ExpTime-complete, and then -reducing satisfiability in the relevant logics to these problems. The automata -models we introduce are two-way graded alternating parity automata over -infinite trees (2GAPTs) and fully enriched automata (FEAs) over infinite -forests. The former are a common generalization of two incomparable automata -models from the literature. The latter extend alternating automata in a similar -way as the fully enriched μ-calculus extends the standard μ-calculus. -" -109,0809.3250,Andrey Kutuzov,Using descriptive mark-up to formalize translation quality assessment,cs.CL," The paper deals with using descriptive mark-up to emphasize translation -mistakes. 
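
Entry 106 (0809.0103) above relies on the Jensen-Shannon divergence between
letter distributions; that measure itself is compact enough to state in code.
A small sketch (in bits), not the paper's full random-walk analysis:

    import collections, math

    def js_divergence(text1, text2):
        def dist(t):
            c = collections.Counter(ch for ch in t.lower() if ch.isalpha())
            n = sum(c.values())
            return {k: v / n for k, v in c.items()}
        p, q = dist(text1), dist(text2)
        m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0))
             for k in set(p) | set(q)}
        def kl(a, b):  # Kullback-Leibler divergence on a's support
            return sum(v * math.log2(v / b[k]) for k, v in a.items() if v > 0)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    print(js_divergence("abracadabra", "abcdefgh"))
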
The author postulates the necessity of developing a standard and formal
-XML-based way of describing translation mistakes. This is considered
-important for achieving impersonal translation quality assessment. Marked-up
-translations can be used in corpus translation studies; moreover, automatic
-translation assessment based on marked-up mistakes is possible. The paper
-concludes by setting out guidelines for further activity within the described
-field.
-"
-110,0809.4530,"Olena Medelyan, David Milne, Catherine Legg and Ian H. Witten",
-Mining Meaning from Wikipedia,cs.AI cs.CL cs.IR," Wikipedia is a goldmine of
-information; not just for its many readers, but also for the growing community
-of researchers who recognize it as a resource of exceptional scale and
-utility. It represents a vast investment of manual effort and judgment: a
-huge, constantly evolving tapestry of concepts and relations that is being
-applied to a host of tasks.
- This article provides a comprehensive description of this work. It focuses on
-research that extracts and makes use of the concepts, relations, facts and
-descriptions found in Wikipedia, and organizes the work into four broad
-categories: applying Wikipedia to natural language processing; using it to
-facilitate information retrieval and information extraction; and as a resource
-for ontology building. The article addresses how Wikipedia is being used as
-is, how it is being improved and adapted, and how it is being combined with
-other structures to create entirely new resources. We identify the research
-groups and individuals involved, and how their work has developed in the last
-few years. We provide a comprehensive list of the open-source software they
-have produced.
-"
-111,0810.0200,"Andrij Rovenchak, J\'an Ma\v{c}utek, Charles Riley",
-Distribution of complexities in the Vai script,cs.CL," In the paper, we
-analyze the distribution of complexities in the Vai script, an indigenous
-syllabic writing system from Liberia. It is found that the uniformity
-hypothesis for complexities fails for this script. The models using a Poisson
-distribution for the number of components and a hyper-Poisson distribution for
-connections provide good fits in the case of the Vai script.
-"
-112,0810.1199,Pascal Vaillant,"Une grammaire formelle du cr\'eole martiniquais
-pour la g\'en\'eration automatique",cs.CL," In this article, some first
-elements of a computational modelling of the grammar of the Martiniquese
-French Creole dialect are presented. The sources of inspiration for the
-modelling are the functional description given by Damoiseau (1984), and
-Pinalie's & Bernabe's (1999) grammar manual. Based on earlier works in text
-generation (Vaillant, 1997), a unification grammar formalism, namely Tree
-Adjoining Grammars (TAG), and a modelling of lexical functional categories
-based on syntactic and semantic properties, are used to implement a grammar of
-Martiniquese Creole which is used in a prototype text generation system. One
-of the main applications of the system could be its use as a software tool
-supporting the task of learning Creole as a second language.
-"
-113,0810.1207,Pascal Vaillant,"A Layered Grammar Model: Using Tree-Adjoining
-Grammars to Build a Common Syntactic Kernel for Related Dialects",cs.CL," This
-article describes the design of a common syntactic description for the core
-grammar of a group of related dialects. The common description does not rely
-on an abstract sub-linguistic structure like a metagrammar: it consists in a
-single FS-LTAG where the actual specific language is included as one of the
-attributes in the set of attribute types defined for the features. When the
-lang attribute is instantiated, the selected subset of the grammar is
-equivalent to the grammar of one dialect. When it is not, we have a model of a
-hybrid multidialectal linguistic system. This principle is used for a group of
-creole languages of the West-Atlantic area, namely the French-based Creoles of
-Haiti, Guadeloupe, Martinique and French Guiana.
-"
-114,0810.1212,"Pascal Vaillant, Richard Nock and Claudia Henry","Analyse
-spectrale des textes: d\'etection automatique des fronti\`eres de langue et de
-discours",cs.CL cs.IR," We propose a theoretical framework within which
-information on the vocabulary of a given corpus can be inferred on the basis
-of statistical information gathered on that corpus. Inferences can be made on
-the categories of the words in the vocabulary, and on their syntactical
-properties within particular languages. Based on the same statistical data, it
-is possible to build matrices of syntagmatic similarity (bigram transition
-matrices) or paradigmatic similarity (probability for any pair of words to
-share common contexts). When clustered with respect to their syntagmatic
-similarity, words tend to group into sublanguage vocabularies, and when
-clustered with respect to their paradigmatic similarity, into syntactic or
-semantic classes. Experiments have explored the first of these two
-possibilities. Their results are interpreted in the framework of a Markov
-chain modelling of the corpus' generative process(es): we show that the
-results of a spectral analysis of the transition matrix can be interpreted as
-probability distributions of words within clusters. This method yields a soft
-clustering of the vocabulary into sublanguages which contribute to the
-generation of heterogeneous corpora. As an application, we show how
-multilingual texts can be visually segmented into linguistically homogeneous
-segments. Our method is specifically useful in the case of related languages
-which happen to be mixed in corpora.
-"
-115,0810.1261,"Richard Nock, Pascal Vaillant, Frank Nielsen and Claudia Henry",
-"Soft Uncoupling of Markov Chains for Permeable Language Distinction: A New
-Algorithm",cs.CL cs.IR," Without prior knowledge, distinguishing different
-languages may be a hard task, especially when their borders are permeable.
We develop an extension of
-spectral clustering -- a powerful unsupervised classification toolbox -- that
-is shown to resolve accurately the task of soft language distinction. At the
-heart of our approach, we replace the usual hard membership assignment of
-spectral clustering by a soft, probabilistic assignment, which also has the
-advantage of bypassing a well-known complexity bottleneck of the method.
-Furthermore, our approach relies on a novel, convenient construction of a
-Markov chain out of a corpus. Extensive experiments with a readily available
-system clearly display the potential of the method, which brings a visually
-appealing soft distinction of languages that may altogether define a whole
-corpus.
-"
-116,0810.3125,{\L}ukasz D\k{e}bowski,"On the Vocabulary of Grammar-Based Codes
-and the Logical Consistency of Texts",cs.IT cs.CL math.IT," The article
-presents a new interpretation for Zipf-Mandelbrot's law in natural language
-which rests on two areas of information theory. Firstly, we construct a new
-class of grammar-based codes and, secondly, we investigate properties of
-strongly nonergodic stationary processes. The motivation for the joint
-discussion is to prove a proposition with a simple informal statement: If a
-text of length $n$ describes $n^\beta$ independent facts in a repetitive way
-then the text contains at least $n^\beta/\log n$ different words, under
-suitable conditions on $n$. In the formal statement, two modeling postulates
-are adopted. Firstly, the words are understood as nonterminal symbols of the
-shortest grammar-based encoding of the text. Secondly, the text is assumed to
-be emitted by a finite-energy strongly nonergodic source whereas the facts are
-binary IID variables predictable in a shift-invariant way.
-"
-117,0810.3416,K.Koroutchev and E.Korutcheva,Text as Statistical Mechanics
-Object,cs.CL physics.soc-ph," In this article we present a model of human
-written text based on a statistical mechanics approach, deriving the potential
-energy for different parts of the text using a large text corpus. We have
-checked the results numerically and found that the specific heat parameter
-effectively separates the closed-class words from the specific terms used in
-the text.
-"
-118,0810.3442,Adam Lipowski and Dorota Lipowska,Language structure in the
-n-object naming game,cs.CL cs.MA physics.soc-ph," We examine a naming game
-with two agents trying to establish a common vocabulary for n objects. Such
-efforts lead to the emergence of language that allows for efficient
-communication and exhibits some degree of homonymy and synonymy. Although
-homonymy reduces the communication efficiency, it seems to be a dynamical trap
-that persists for a long, and perhaps indefinite, time. On the other hand,
-synonymy does not reduce the efficiency of communication, but appears to be
-only a transient feature of the language. Thus, in our model the role of
-synonymy decreases and in the long-time limit it becomes negligible. A similar
-rareness of synonymy is observed in present natural languages. The role of
-noise, which distorts the communicated words, is also examined. Although, in
-general, the noise reduces the communication efficiency, it also regroups the
-words so that they are more evenly distributed within the available ""verbal""
-space.
-"
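
The n-object naming game of entry 118 (0810.3442) above is straightforward to
simulate. A toy rendition with two agents; the small name space is an
assumption made here to provoke homonymy, and none of the parameters are the
authors':

    import random

    def naming_game(n_objects, steps=20000, name_space=50):
        # vocab[object][agent]: the names that agent knows for that object.
        vocab = [[set(), set()] for _ in range(n_objects)]
        for _ in range(steps):
            obj = random.randrange(n_objects)
            speaker, hearer = random.sample((0, 1), 2)
            if not vocab[obj][speaker]:
                # Invent a name; a small name space makes homonyms likely.
                vocab[obj][speaker].add(random.randrange(name_space))
            name = random.choice(tuple(vocab[obj][speaker]))
            if name in vocab[obj][hearer]:
                # Success: both agents drop competing names (synonymy decays).
                vocab[obj][speaker] = {name}
                vocab[obj][hearer] = {name}
            else:
                vocab[obj][hearer].add(name)
        return vocab

    print(naming_game(5))
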
-119,0810.4616,"Claudine Brucks, Christoph Schommer",Assembling Actor-based
-Mind-Maps from Text Stream,cs.CL cs.DL," For human beings, the processing of
-text streams of unknown size generally leads to problems, because noise must
-be filtered out, information must be tested for relevance or redundancy, and
-linguistic phenomena like ambiguity or the resolution of pronouns must be
-handled. Putting this into simulation by using an artificial mind-map is a
-challenge that opens the door to a wide field of applications like automatic
-text summarization or targeted retrieval. In this work we present a framework
-that is a first step towards an automatic intellect. It aims at assembling a
-mind-map based on incoming text streams and on a subject-verb-object strategy,
-having the verb as an interconnection between the adjacent nouns. The
-mind-map's performance is enriched by a pronoun resolution engine based on the
-work of D. Klein and C. D. Manning.
-"
-120,0810.4952,Adam Lipowski and Dorota Lipowska,Computational modelling of
-evolution: ecosystems and language,q-bio.PE cs.CL physics.soc-ph," Recently,
-computational modelling has become a very important research tool that enables
-us to study problems that for decades evaded scientific analysis. Evolutionary
-systems are certainly examples of such problems: they are composed of many
-units that might reproduce, diffuse, mutate, die, or in some cases
-communicate. These processes might be of some adaptive value, they influence
-each other and occur on various time scales. That is why such systems are so
-difficult to study. In this paper we briefly review some computational
-approaches, as well as our contributions, to the evolution of ecosystems and
-language. We start from Lotka-Volterra equations and the modelling of simple
-two-species prey-predator systems. Such systems are a canonical example for
-studying oscillatory behaviour in competitive populations. Then we describe
-various approaches to study the long-term evolution of multi-species
-ecosystems. We emphasize the need to use models that take into account both
-ecological and evolutionary processes. Finally, we address the problem of the
-emergence and development of language. It is becoming more and more evident
-that any theory of language origin and development must be consistent with
-Darwinian principles of evolution. Consequently, a number of techniques
-developed for modelling the evolution of complex ecosystems are being applied
-to the problem of language. We briefly review some of these approaches.
-"
-121,0811.0453,"Cynthia Wagner, and Christoph Schommer",CoZo+ - A Content
-Zoning Engine for textual documents,cs.CL cs.IR," Content zoning can be
-understood as a segmentation of textual documents into zones. It is inspired
-by [6], who initially proposed an approach for the argumentative zoning of
-textual documents. With the prototypical CoZo+ engine, we focus on content
-zoning towards an automatic processing of textual streams, considering only
-the actors as the zones. We gain information that can be used to realize an
-automatic recognition of content for pre-defined actors. We understand CoZo+
-as a necessary pre-step towards an automatic generation of summaries and
-towards making intellectual ownership of documents detectable.
-"
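
The assembly step of entry 119 (0810.4616) above, with verbs as
interconnections between adjacent nouns, can be sketched as a labelled graph
keyed by actor. The (subject, verb, object) triples are assumed to come from
an upstream parser and pronoun resolver, which this sketch does not include:

    import collections

    def mindmap(triples):
        # actor -> Counter of (verb, object) edges leaving that actor
        graph = collections.defaultdict(collections.Counter)
        for subj, verb, obj in triples:
            graph[subj][(verb, obj)] += 1
        return graph

    stream = [("alice", "meets", "rabbit"), ("rabbit", "checks", "watch"),
              ("alice", "meets", "rabbit")]
    for actor, edges in mindmap(stream).items():
        print(actor, dict(edges))
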
-122,0811.0579,"Gilles S\'erasset (IMAG, Clips - Imag, Lig), Christian Boitet
-(IMAG, Clips - Imag, Lig)","UNL-French deconversion as transfer & generation
-from an interlingua with possible quality enhancement through offline human
-interaction",cs.CL," We present the architecture of the UNL-French
-deconverter, which ""generates"" from the UNL interlingua by first
-""localizing"" the UNL form for French, within UNL, and then applying slightly
-adapted but classical transfer and generation techniques, implemented in
-GETA's Ariane-G5 environment, supplemented by some UNL-specific tools. Online
-interaction can be used during deconversion to enhance output quality and is
-now used for development purposes. We show how interaction could be delayed
-and embedded in the postedition phase, which would then interact not directly
-with the output text, but indirectly with several components of the
-deconverter. Interacting online or offline can improve the quality not only of
-the utterance at hand, but also of the utterances processed later, as various
-preferences may be automatically changed to let the deconverter ""learn"".
-"
-123,0811.1260,"Raj Kishor Bisht, H.S.Dhami",The Application of Fuzzy Logic to
-Collocation Extraction,cs.CL," Collocations are important for many tasks of
-natural language processing such as information retrieval, machine
-translation, computational lexicography, etc. So far many statistical methods
-have been used for collocation extraction. Almost all of these methods form a
-classical crisp set of collocations. We propose a fuzzy logic approach to
-collocation extraction that forms a fuzzy set of collocations in which each
-word combination has a certain grade of membership for being a collocation.
-Fuzzy logic provides an easy way to express natural language notions in fuzzy
-logic rules. Two existing methods, mutual information and the t-test, have
-been utilized as input to the fuzzy inference system. The resulting membership
-function can easily be seen and demonstrated. To show the utility of fuzzy
-logic, some word pairs have been examined as examples. The working data are
-based on a corpus of about one million words contained in different novels
-constituting Project Gutenberg, available at www.gutenberg.org. The proposed
-method has all the advantages of the two methods, while overcoming their
-drawbacks. Hence it provides a better result than the two methods.
-"
-124,0811.4717,"Roxana Teodorescu (UPT, LAB), Daniel Racoceanu (LAB, IPAAL),
-Wee-Kheng Leow (IPAAL, NUS), Vladimir Cretu (UPT)","Prospective Study for
-Semantic Inter-Media Fusion in Content-Based Medical Image Retrieval",cs.IR
-cs.CL," One important challenge in modern Content-Based Medical Image
-Retrieval (CBMIR) approaches is represented by the semantic gap, related to
-the complexity of the medical knowledge. Among the methods that are able to
-close this gap in CBMIR, the use of medical thesauri/ontologies has
-interesting perspectives due to the possibility of accessing relevant,
-continuously updated online web services and of extracting structured medical
-semantic information in real time. The CBMIR approach proposed in this paper
-uses the Unified Medical Language System's (UMLS) Metathesaurus to perform a
-semantic indexing and fusion of medical media. This fusion operates before the
-query processing (retrieval) and works at an UMLS-compliant conceptual
-indexing level.
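
The two inputs to the fuzzy inference system of entry 123 (0811.1260) above,
mutual information and the t-score of adjacent word pairs, are standard
statistics. A minimal sketch of just those two scores, not of the fuzzy rules
built on top of them:

    import math
    from collections import Counter

    def collocation_scores(tokens):
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n = len(tokens)
        scores = {}
        for (w1, w2), f in bigrams.items():
            expected = unigrams[w1] * unigrams[w2] / n  # under independence
            pmi = math.log2(f / expected)
            t = (f - expected) / math.sqrt(f)
            scores[(w1, w2)] = (pmi, t)
        return scores

    tokens = "new york is a new city in a new state".split()
    print(collocation_scores(tokens)[("new", "york")])
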
Our purpose is to study -various techniques related to semantic data alignment, preprocessing, fusion, -clustering and retrieval, by evaluating the various techniques and highlighting -future research directions. The alignment and the preprocessing are based on -partial text/image retrieval feedback and on the data structure. We analyze -various probabilistic, fuzzy and evidence-based approaches for the fusion -process and different similarity functions for the retrieval process. All the -proposed methods are evaluated on the Cross Language Evaluation Forum's (CLEF) -medical image retrieval benchmark, by focusing also on a more homogeneous -component medical image database: the Pathology Education Instructional -Resource (PEIR). -" -125,0812.3070,"J. Borge, A. Arenas","A Computational Model to Disentangle Semantic Information Embedded in - Word Association Norms",cs.CL cs.AI physics.data-an physics.soc-ph," Two well-known databases of semantic relationships between pairs of words -used in psycholinguistics, feature-based and association-based, are studied as -complex networks. We propose an algorithm to disentangle feature based -relationships from free association semantic networks. The algorithm uses the -rich topology of the free association semantic network to produce a new set of -relationships between words similar to those observed in feature production -norms. -" -126,0812.4446,Peter D. Turney (National Research Council of Canada),The Latent Relation Mapping Engine: Algorithm and Experiments,cs.CL cs.AI cs.LG," Many AI researchers and cognitive scientists have argued that analogy is the -core of cognition. The most influential work on computational modeling of -analogy-making is Structure Mapping Theory (SMT) and its implementation in the -Structure Mapping Engine (SME). A limitation of SME is the requirement for -complex hand-coded representations. We introduce the Latent Relation Mapping -Engine (LRME), which combines ideas from SME and Latent Relational Analysis -(LRA) in order to remove the requirement for hand-coded representations. LRME -builds analogical mappings between lists of words, using a large corpus of raw -text to automatically discover the semantic relations among the words. We -evaluate LRME on a set of twenty analogical mapping problems, ten based on -scientific analogies and ten based on common metaphors. LRME achieves -human-level performance on the twenty problems. We compare LRME with a variety -of alternative approaches and find that they are not able to reach the same -level of performance. -" -127,0901.2216,"Animesh Mukherjee, Monojit Choudhury and Ravi Kannan","Discovering Global Patterns in Linguistic Networks through Spectral - Analysis: A Case Study of the Consonant Inventories",cs.CL physics.data-an," Recent research has shown that language and the socio-cognitive phenomena -associated with it can be aptly modeled and visualized through networks of -linguistic entities. However, most of the existing works on linguistic networks -focus only on the local properties of the networks. This study is an attempt to -analyze the structure of languages via a purely structural technique, namely -spectral analysis, which is ideally suited for discovering the global -correlations in a network. Application of this technique to PhoNet, the -co-occurrence network of consonants, not only reveals several natural -linguistic principles governing the structure of the consonant inventories, but -is also able to quantify their relative importance. 
We believe that this -powerful technique can be successfully applied, in general, to study the -structure of natural languages. -" -128,0901.2349,"Eduardo G. Altmann, Janet B. Pierrehumbert, and Adilson E. Motter","Beyond word frequency: Bursts, lulls, and scaling in the temporal - distributions of words",cs.CL cond-mat.dis-nn physics.data-an physics.soc-ph," Background: Zipf's discovery that word frequency distributions obey a power -law established parallels between biological and physical processes, and -language, laying the groundwork for a complex systems perspective on human -communication. More recent research has also identified scaling regularities in -the dynamics underlying the successive occurrences of events, suggesting the -possibility of similar findings for language as well. - Methodology/Principal Findings: By considering frequent words in USENET -discussion groups and in disparate databases where the language has different -levels of formality, here we show that the distributions of distances between -successive occurrences of the same word display bursty deviations from a -Poisson process and are well characterized by a stretched exponential (Weibull) -scaling. The extent of this deviation depends strongly on semantic type -- a -measure of the logicality of each word -- and less strongly on frequency. We -develop a generative model of this behavior that fully determines the dynamics -of word usage. - Conclusions/Significance: Recurrence patterns of words are well described by -a stretched exponential distribution of recurrence times, an empirical scaling -that cannot be anticipated from Zipf's law. Because the use of words provides a -uniquely precise and powerful lens on human thought and activity, our findings -also have implications for other overt manifestations of collective human -dynamics. -" -129,0901.2924,"Alvaro Corral (1), Ramon Ferrer-i-Cancho (2), Gemma Boleda (2), Albert - Diaz-Guilera (3). ((1) Centre de Recerca Matematica, (2) U Politecnica - Catalunya, (3) U Barcelona)",Universal Complex Structures in Written Language,physics.soc-ph cs.CL," Quantitative linguistics has provided us with a number of empirical laws that -characterise the evolution of languages and competition amongst them. In terms -of language usage, one of the most influential results is Zipf's law of word -frequencies. Zipf's law appears to be universal, and may not even be unique to -human language. However, there is ongoing controversy over whether Zipf's law -is a good indicator of complexity. Here we present an alternative approach that -puts Zipf's law in the context of critical phenomena (the cornerstone of -complexity in physics) and establishes the presence of a large scale -""attraction"" between successive repetitions of words. Moreover, this phenomenon -is scale-invariant and universal -- the pattern is independent of word -frequency and is observed in texts by different authors and written in -different languages. There is evidence, however, that the shape of the scaling -relation changes for words that play a key role in the text, implying the -existence of different ""universality classes"" in the repetition of words. These -behaviours exhibit striking parallels with complex catastrophic phenomena. -" -130,0901.3017,"Nisha Yadav, Hrishikesh Joglekar, Rajesh P. N. Rao, M. N. Vahia, - Iravatham Mahadevan and R. Adhikari",Statistical analysis of the Indus script using $n$-grams,cs.CL," The Indus script is one of the major undeciphered scripts of the ancient -world. 
The small size of the corpus, the absence of bilingual texts, and the -lack of definite knowledge of the underlying language has frustrated efforts at -decipherment since the discovery of the remains of the Indus civilisation. -Recently, some researchers have questioned the premise that the Indus script -encodes spoken language. Building on previous statistical approaches, we apply -the tools of statistical language processing, specifically $n$-gram Markov -chains, to analyse the Indus script for syntax. Our main results are that the -script has well-defined signs which begin and end texts, that there is -directionality and strong correlations in the sign order, and that there are -groups of signs which appear to have identical syntactic function. All these -require no {\it a priori} suppositions regarding the syntactic or semantic -content of the signs, but follow directly from the statistical analysis. Using -information theoretic measures, we find the information in the script to be -intermediate between that of a completely random and a completely fixed -ordering of signs. Our study reveals that the Indus script is a structured sign -system showing features of a formal language, but, at present, cannot -conclusively establish that it encodes {\it natural} language. Our $n$-gram -Markov model is useful for predicting signs which are missing or illegible in a -corpus of Indus texts. This work forms the basis for the development of a -stochastic grammar which can be used to explore the syntax of the Indus script -in greater detail. -" -131,0901.3291,"Stanislaw Drozdz, Jaroslaw Kwapien, Adam Orczyk",Approaching the linguistic complexity,cs.CL physics.data-an," We analyze the rank-frequency distributions of words in selected English and -Polish texts. We compare scaling properties of these distributions in both -languages. We also study a few small corpora of Polish literary texts and find -that for a corpus consisting of texts written by different authors the basic -scaling regime is broken more strongly than in the case of comparable corpus -consisting of texts written by the same author. Similarly, for a corpus -consisting of texts translated into Polish from other languages the scaling -regime is broken more strongly than for a comparable corpus of native Polish -texts. Moreover, based on the British National Corpus, we consider the -rank-frequency distributions of the grammatically basic forms of words (lemmas) -tagged with their proper part of speech. We find that these distributions do -not scale if each part of speech is analyzed separately. The only part of -speech that independently develops a trace of scaling is verbs. -" -132,0901.3990,"Bernard Jacquemin (LIMSI), Sabine Ploux (L2C2)",Du corpus au dictionnaire,cs.CL cs.IR," In this article, we propose an automatic process to build multi-lingual -lexico-semantic resources. The goal of these resources is to browse -semantically textual information contained in texts of different languages. -This method uses a mathematical model called Atlas s\'emantiques in order to -represent the different senses of each word. It uses the linguistic relations -between words to create graphs that are projected into a semantic space. These -projections constitute semantic maps that denote the sense trends of each given -word. This model is fed with syntactic relations between words extracted from a -corpus. Therefore, the lexico-semantic resource produced describes all the -words and all their meanings observed in the corpus. 
The sense trends are -expressed by syntactic contexts, typical for a given meaning. The link between -each sense trend and the utterances used to build the sense trend are also -stored in an index. Thus all the instances of a word in a particular sense are -linked and can be browsed easily. And by using several corpora of different -languages, several resources are built that correspond with each other through -languages. It makes it possible to browse information through languages thanks -to syntactic contexts translations (even if some of them are partial). -" -133,0901.4180,Bj{\o}rn Kjos-Hanssen and Alberto J. Evangelista,Google distance between words,cs.CL," Cilibrasi and Vitanyi have demonstrated that it is possible to extract the -meaning of words from the world-wide web. To achieve this, they rely on the -number of webpages that are found through a Google search containing a given -word and they associate the page count to the probability that the word appears -on a webpage. Thus, conditional probabilities allow them to correlate one word -with another word's meaning. Furthermore, they have developed a similarity -distance function that gauges how closely related a pair of words is. We -present a specific counterexample to the triangle inequality for this -similarity distance function. -" -134,0901.4375,"P.D. Bruza, K. Kitto, D. Nelson, C. McEvoy","Extracting Spooky-activation-at-a-distance from Considerations of - Entanglement",physics.data-an cs.CL quant-ph," Following an early claim by Nelson & McEvoy \cite{Nelson:McEvoy:2007} -suggesting that word associations can display `spooky action at a distance -behaviour', a serious investigation of the potentially quantum nature of such -associations is currently underway. This paper presents a simple quantum model -of a word association system. It is shown that a quantum model of word -entanglement can recover aspects of both the Spreading Activation equation and -the Spooky-activation-at-a-distance equation, both of which are used to model -the activation level of words in human memory. -" -135,0901.4784,Fabio G. Guerrero,On the Entropy of Written Spanish,cs.CL cs.IT math.IT," This paper reports on results on the entropy of the Spanish language. They -are based on an analysis of natural language for n-word symbols (n = 1 to 18), -trigrams, digrams, and characters. The results obtained in this work are based -on the analysis of twelve different literary works in Spanish, as well as a -279917 word news file provided by the Spanish press agency EFE. Entropy values -are calculated by a direct method using computer processing and the probability -law of large numbers. Three samples of artificial Spanish language produced by -a first-order model software source are also analyzed and compared with natural -Spanish language. -" -136,0902.0606,"M. Angeles Serrano, Alessandro Flammini, and Filippo Menczer",Beyond Zipf's law: Modeling the structure of human language,cs.CL physics.soc-ph," Human language, the most powerful communication system in history, is closely -associated with cognition. Written text is one of the fundamental -manifestations of language, and the study of its universal regularities can -give clues about how our brains process information and how we, as a society, -organize and share it. Still, only classical patterns such as Zipf's law have -been explored in depth. 
In contrast, other basic properties like the existence -of bursts of rare words in specific documents, the topical organization of -collections, or the sublinear growth of vocabulary size with the length of a -document, have only been studied one by one and mainly applying heuristic -methodologies rather than basic principles and general mechanisms. As a -consequence, there is a lack of understanding of linguistic processes as -complex emergent phenomena. Beyond Zipf's law for word frequencies, here we -focus on Heaps' law, burstiness, and the topicality of document collections, -which encode correlations within and across documents absent in random null -models. We introduce and validate a generative model that explains the -simultaneous emergence of all these patterns from simple rules. As a result, we -find a connection between the bursty nature of rare words and the topical -organization of texts and identify dynamic word ranking and memory across -documents as key mechanisms explaining the non trivial organization of written -text. Our research can have broad implications and practical applications in -computer science, cognitive science, and linguistics. -" -137,0902.1033,"Sylvain Raybaud (INRIA Lorraine - LORIA), Caroline Lavecchia (INRIA - Lorraine - LORIA), David Langlois (INRIA Lorraine - LORIA), Kamel Sma\""ili - (INRIA Lorraine - LORIA)",New Confidence Measures for Statistical Machine Translation,cs.CL," A confidence measure is able to estimate the reliability of an hypothesis -provided by a machine translation system. The problem of confidence measure can -be seen as a process of testing : we want to decide whether the most probable -sequence of words provided by the machine translation system is correct or not. -In the following we describe several original word-level confidence measures -for machine translation, based on mutual information, n-gram language model and -lexical features language model. We evaluate how well they perform individually -or together, and show that using a combination of confidence measures based on -mutual information yields a classification error rate as low as 25.1% with an -F-measure of 0.708. -" -138,0902.2230,Ama\c{c} Herda\u{g}delen and Marco Baroni,BagPack: A general framework to represent semantic relations,cs.CL cs.IR," We introduce a way to represent word pairs instantiating arbitrary semantic -relations that keeps track of the contexts in which the words in the pair occur -both together and independently. The resulting features are of sufficient -generality to allow us, with the help of a standard supervised machine learning -algorithm, to tackle a variety of unrelated semantic tasks with good results -and almost no task-specific tailoring. -" -139,0902.2345,Stergos D. Afantenos and Nicolas Hernandez,What's in a Message?,cs.CL," In this paper we present the first step in a larger series of experiments for -the induction of predicate/argument structures. The structures that we are -inducing are very similar to the conceptual structures that are used in Frame -Semantics (such as FrameNet). Those structures are called messages and they -were previously used in the context of a multi-document summarization system of -evolving events. The series of experiments that we are proposing are -essentially composed from two stages. In the first stage we are trying to -extract a representative vocabulary of words. 
This vocabulary is later used in -the second stage, during which we apply to it various clustering approaches in -order to identify the clusters of predicates and arguments--or frames and -semantic roles, to use the jargon of Frame Semantics. This paper presents in -detail and evaluates the first stage. -" -140,0902.3072,"Eric Laporte (IGM-LabInfo), Elisabete Ranchhod (ONSET-CEL), Anastasia - Yannacopoulou (IGM-LabInfo)",Syntactic variation of support verb constructions,cs.CL," We report experiments about the syntactic variations of support verb -constructions, a special type of multiword expressions (MWEs) containing -predicative nouns. In these expressions, the noun can occur with or without the -verb, with no clear-cut semantic difference. We extracted from a large French -corpus a set of examples of the two situations and derived statistical results -from these data. The extraction involved large-coverage language resources and -finite-state techniques. The results show that, most frequently, predicative -nouns occur without a support verb. This fact has consequences on methods of -extracting or recognising MWEs. -" -141,0902.4060,"Ken Yamamoto, Yoshihiro Yamazaki",Network of two-Chinese-character compound words in Japanese language,cs.CL physics.soc-ph," Some statistical properties of a network of two-Chinese-character compound -words in Japanese language are reported. In this network, a node represents a -Chinese character and an edge represents a two-Chinese-character compound word. -It is found that this network has properties of ""small-world"" and ""scale-free."" -A network formed by only Chinese characters for common use ({\it joyo-kanji} in -Japanese), which is regarded as a subclass of the original network, also has -small-world property. However, a degree distribution of the network exhibits no -clear power law. In order to reproduce disappearance of the power-law property, -a model for a selecting process of the Chinese characters for common use is -proposed. -" -142,0903.2792,"Kostadin Koroutchev, Jian Shen, Elka Koroutcheva and Manuel Cebrian",Thermodynamics of Information Retrieval,cs.IT cs.CL cs.SI math.IT," In this work, we suggest a parameterized statistical model (the gamma -distribution) for the frequency of word occurrences in long strings of English -text and use this model to build a corresponding thermodynamic picture by -constructing the partition function. We then use our partition function to -compute thermodynamic quantities such as the free energy and the specific heat. -In this approach, the parameters of the word frequency model vary from word to -word so that each word has a different corresponding thermodynamics and we -suggest that differences in the specific heat reflect differences in how the -words are used in language, differentiating keywords from common and function -words. Finally, we apply our thermodynamic picture to the problem of retrieval -of texts based on keywords and suggest some advantages over traditional -information retrieval methods. -" -143,0903.5168,"Rakesh Pandey, H.S. Dhami","Mathematical Model for Transformation of Sentences from Active Voice to - Passive Voice",cs.CL," Formal work in linguistics has both produced and used important mathematical -tools. 
Motivated by a survey of models for context and word meaning, syntactic
-categories, phrase structure rules and trees, an attempt is being made in the
-present paper to present a mathematical model for structuring of sentences from
-active voice to passive voice, which is the form of a transitive verb whose
-grammatical subject serves as the patient, receiving the action of the verb.
- For this purpose we have parsed all sentences of a corpus and have generated
-Boolean groups for each of them. It has been observed that when we take
-constituents of the sentences as subgroups, the sequences of phrases form
-permutation groups. Application of the isomorphism property yields a
-permutation mapping between the important subgroups. It has resulted in a model
-for transformation of sentences from active voice to passive voice. A computer
-program has been written to enable software developers to evolve grammar
-software for sentence transformations.
-"
-144,0904.1289,"Monojit Choudhury, Animesh Mukherjee, Anupam Basu, Niloy Ganguly,
- Ashish Garg, Vaibhav Jalan","Language Diversity across the Consonant Inventories: A Study in the
- Framework of Complex Networks",cs.CL physics.comp-ph physics.soc-ph," In this paper, we attempt to explain the emergence of the linguistic diversity
-that exists across the consonant inventories of some of the major language
-families of the world through a complex network based growth model. There is
-only a single parameter for this model that is meant to introduce a small
-amount of randomness in the otherwise preferential attachment based growth
-process. The experiments with this model parameter indicate that the choice of
-consonants among the languages within a family is far more preferential than
-it is across the families. The implications of this result are twofold -- (a)
-there is an innate preference of the speakers towards acquiring certain
-linguistic structures over others and (b) shared ancestry propels the stronger
-preferential connection between the languages within a family than across them.
-Furthermore, our observations indicate that this parameter might bear a
-correlation with the period of existence of the language families under
-investigation.
-"
-145,0905.0740,Gerardo Cisneros,"A FORTRAN coded regular expression Compiler for IBM 1130 Computing
- System",cs.CL cs.PL," REC (Regular Expression Compiler) is a concise programming language which
-allows students to write programs without knowledge of the complicated syntax
-of languages like FORTRAN and ALGOL. The language is recursive and contains
-only four elements for control. This paper describes an interpreter of REC
-written in FORTRAN.
-"
-146,0905.1130,"Florian Boudin, Patricia Velazquez-Morales and Juan-Manuel
- Torres-Moreno",Statistical Automatic Summarization in Organic Chemistry,cs.IR cs.CL," We present an oriented numerical summarizer algorithm, applied to producing
-automatic summaries of scientific documents in Organic Chemistry. We present
-its implementation named Yachs (Yet Another Chemistry Summarizer) that combines
-a specific document pre-processing with a sentence scoring method relying on
-the statistical properties of documents. We show that Yachs achieves the best
-results among several other summarizers on a corpus of Organic Chemistry
-articles.
-"
-147,0905.1235,"Serguei A. 
Mokhov, Stephen Sinclair, Ian Cl\'ement, Dimitrios
- Nicolacopoulos (for the MARF R&D Group)","The Modular Audio Recognition Framework (MARF) and its Applications:
- Scientific and Software Engineering Notes",cs.SD cs.CL cs.CV cs.MM cs.NE," MARF is an open-source research platform and a collection of
-voice/sound/speech/text and natural language processing (NLP) algorithms
-written in Java and arranged into a modular and extensible framework
-facilitating the addition of new algorithms. MARF can run distributively over
-the network and may act as a library in applications or be used as a source for
-learning and extension. A few example applications are provided to show how to
-use the framework. There is an API reference in the Javadoc format as well as
-this set of accompanying notes with a detailed description of the
-architectural design, algorithms, and applications. MARF and its applications
-are released under a BSD-style license and are hosted at SourceForge.net. This
-document provides details of and insight into the internals of MARF and some
-of the mentioned applications.
-"
-148,0905.1609,Nabil Hathout (CLLE),"Acquisition of morphological families and derivational series from a
- machine readable dictionary",cs.CL," The paper presents a linguistic and computational model aiming at making the
-morphological structure of the lexicon emerge from the formal and semantic
-regularities of the words it contains. The model is word-based. The proposed
-morphological structure consists of (1) binary relations that connect each
-headword with words that are morphologically related, and especially with the
-members of its morphological family and its derivational series, and of (2) the
-analogies that hold between the words. The model has been tested on the lexicon
-of French using the TLFi machine readable dictionary.
-"
-149,0905.2990,"Juan-Manuel Torres-Moreno and Pier-Luc St-Onge and Michel Gagnon and
- Marc El-B\`eze and Patrice Bellot","Automatic Summarization System coupled with a Question-Answering System
- (QAAS)",cs.IR cs.CL," To select the most relevant sentences of a document, our system uses an
-optimal decision algorithm that combines several metrics. These metrics
-weight and extract pertinent sentences by means of statistical and
-informational algorithms. This technique might improve a Question-Answering
-system, whose function is to provide an exact answer to a question in natural
-language. In this paper, we present the results obtained by coupling the Cortex
-summarizer with a Question-Answering system (QAAS). Two configurations have
-been evaluated. In the first one, a low compression level is selected and the
-summarization system is only used as a noise filter. In the second
-configuration, the system actually functions as a summarizer, with a very high
-level of compression. Our results on a French corpus demonstrate that the
-coupling of an Automatic Summarization system with a Question-Answering system
-is promising. The system has then been adapted to generate a customized summary
-depending on the specific question. Tests on a French multi-document corpus
-have been carried out, and the personalized QAAS system obtains the best
-performance.
-"
-150,0905.3318,Maarten Hijzelendoorn and Crit Cremers,An Object-Oriented and Fast Lexicon for Semantic Generation,cs.CL cs.DB cs.DS cs.IR cs.PL," This paper is about the technical design of a large computational lexicon,
-its storage, and its access from a Prolog environment. 
Traditionally, efficient
-access and storage of data structures are implemented by a relational database
-management system. In Delilah, a lexicon-based NLP system, efficient access to
-the lexicon by the semantic generator is vital. We show that our highly
-detailed HPSG-style lexical specifications do not fit well in the Relational
-Model, and that they cannot be efficiently retrieved. We argue that they fit
-more naturally in the Object-Oriented Model. Although storage of objects is
-redundant, we claim that efficient access is still possible by applying
-indexing and compression techniques from the Relational Model to the
-Object-Oriented Model. We demonstrate that it is possible to implement
-object-oriented storage and fast access in ISO Prolog.
-"
-151,0905.4039,"Rudi L. Cilibrasi (software consultant Oakland, CA) and Paul M.B.
- Vitanyi (CWI, Amsterdam)",Normalized Web Distance and Word Similarity,cs.CL cs.IR," There is a great deal of work in cognitive psychology, linguistics, and
-computer science, about using word (or phrase) frequencies in context in text
-corpora to develop measures for word similarity or word association, going back
-to at least the 1960s. The goal of this chapter is to introduce the normalized
-web distance (NWD) method to determine similarity between words and phrases. It
-is a general way to tap the amorphous low-grade knowledge available for free on
-the Internet, typed in by local users aiming at personal gratification of
-diverse objectives, and yet globally achieving what is effectively the largest
-semantic electronic database in the world. Moreover, this database is available
-for all by using any search engine that can return aggregate page-count
-estimates for a large range of search-queries. In the paper introducing the NWD
-it was called `normalized Google distance (NGD),' but since Google doesn't
-allow computer searches anymore, we opt for the more neutral and descriptive
-NWD.
-"
-152,0906.0675,"Martin Holmes (HCMC), Laurent Romary (INRIA Saclay - Ile de France,
- IDSL)",Encoding models for scholarly literature,cs.CL," We examine the issue of digital formats for document encoding, archiving and
-publishing, through the specific example of ""born-digital"" scholarly journal
-articles. We will begin by looking at the traditional workflow of journal
-editing and publication, and how these practices have made the transition into
-the online domain. We will examine the range of different file formats in which
-electronic articles are currently stored and published. We will argue strongly
-that, despite the prevalence of binary and proprietary formats such as PDF and
-MS Word, XML is a far superior encoding choice for journal articles. Next, we
-look at the range of XML document structures (DTDs, Schemas) which are in
-common use for encoding journal articles, and consider some of their strengths
-and weaknesses. We will suggest that, despite the existence of specialized
-schemas intended specifically for journal articles (such as NLM), and more
-broadly-used publication-oriented schemas such as DocBook, there are strong
-arguments in favour of developing a subset or customization of the Text
-Encoding Initiative (TEI) schema for the purpose of journal-article encoding;
-TEI is already in use in a number of journal publication projects, and the
-scale and precision of the TEI tagset make it particularly appropriate for
-encoding scholarly articles. 
We will outline the document structure of a -TEI-encoded journal article, and look in detail at suggested markup patterns -for specific features of journal articles. -" -153,0906.0716,"Sebastian Bernhardsson, Luis Enrique Correa da Rocha and Petter - Minnhagen",Size dependent word frequencies and translational invariance of books,cs.CL physics.soc-ph," It is shown that a real novel shares many characteristic features with a null -model in which the words are randomly distributed throughout the text. Such a -common feature is a certain translational invariance of the text. Another is -that the functional form of the word-frequency distribution of a novel depends -on the length of the text in the same way as the null model. This means that an -approximate power-law tail ascribed to the data will have an exponent which -changes with the size of the text-section which is analyzed. A further -consequence is that a novel cannot be described by text-evolution models like -the Simon model. The size-transformation of a novel is found to be well -described by a specific Random Book Transformation. This size transformation in -addition enables a more precise determination of the functional form of the -word-frequency distribution. The implications of the results are discussed. -" -154,0906.1467,"Chris Biemann, Monojit Choudhury and Animesh Mukherjee","Syntax is from Mars while Semantics from Venus! Insights from Spectral - Analysis of Distributional Similarity Networks",physics.data-an cs.CL," We study the global topology of the syntactic and semantic distributional -similarity networks for English through the technique of spectral analysis. We -observe that while the syntactic network has a hierarchical structure with -strong communities and their mixtures, the semantic network has several tightly -knit communities along with a large core without any such well-defined -community structure. -" -155,0906.2369,Andreas Maletti and Catalin Ionut Tirnauca,Properties of quasi-alphabetic tree bimorphisms,cs.CL cs.FL," We study the class of quasi-alphabetic relations, i.e., tree transformations -defined by tree bimorphisms with two quasi-alphabetic tree homomorphisms and a -regular tree language. We present a canonical representation of these -relations; as an immediate consequence, we get the closure under union. Also, -we show that they are not closed under intersection and complement, and do not -preserve most common operations on trees (branches, subtrees, v-product, -v-quotient, f-top-catenation). Moreover, we prove that the translations defined -by quasi-alphabetic tree bimorphism are exactly products of context-free string -languages. We conclude by presenting the connections between quasi-alphabetic -relations, alphabetic relations and classes of tree transformations defined by -several types of top-down tree transducers. Furthermore, we get that -quasi-alphabetic relations preserve the recognizable and algebraic tree -languages. -" -156,0906.2415,"Cristian Danescu-Niculescu-Mizil, Lillian Lee and Richard Ducott","Without a 'doubt'? Unsupervised discovery of downward-entailing - operators",cs.CL," An important part of textual inference is making deductions involving -monotonicity, that is, determining whether a given assertion entails -restrictions or relaxations of that assertion. 
For instance, the statement 'We
-know the epidemic spread quickly' does not entail 'We know the epidemic spread
-quickly via fleas', but 'We doubt the epidemic spread quickly' entails 'We
-doubt the epidemic spread quickly via fleas'. Here, we present the first
-algorithm for the challenging lexical-semantics problem of learning linguistic
-constructions that, like 'doubt', are downward entailing (DE). Our algorithm is
-unsupervised, resource-lean, and effective, accurately recovering many DE
-operators that are missing from the hand-constructed lists that
-textual-inference systems currently use.
-"
-157,0906.2835,Mikhail Basilyan,"Employing Wikipedia's Natural Intelligence For Cross Language
- Information Retrieval",cs.IR cs.CL," In this paper we present a novel method for retrieving information in
-languages other than that of the query. We use this technique in combination
-with existing traditional Cross Language Information Retrieval (CLIR)
-techniques to improve their results. This method has a number of advantages
-over traditional techniques that rely on machine translation to translate the
-query and then search the target document space using that translation.
-This method is not limited by the availability of a machine translation
-algorithm for the desired language and uses already existing sources of readily
-available translated information on the internet as a ""middle-man"" approach. In
-this paper we use Wikipedia; however, any similar multilingual, cross
-referenced body of documents can be used. For evaluation and comparison
-purposes we also implemented a traditional machine translation approach and
-the Wikipedia approach separately.
-"
-158,0906.3741,"Cristian Danescu-Niculescu-Mizil, Gueorgi Kossinets, Jon Kleinberg,
- Lillian Lee","How opinions are received by online communities: A case study on
- Amazon.com helpfulness votes",cs.CL cs.IR physics.data-an physics.soc-ph," There are many on-line settings in which users publicly express opinions. A
-number of these offer mechanisms for other users to evaluate these opinions; a
-canonical example is Amazon.com, where reviews come with annotations like ""26
-of 32 people found the following review helpful."" Opinion evaluation appears in
-many off-line settings as well, including market research and political
-campaigns. Reasoning about the evaluation of an opinion is fundamentally
-different from reasoning about the opinion itself: rather than asking, ""What
-did Y think of X?"", we are asking, ""What did Z think of Y's opinion of X?"" Here
-we develop a framework for analyzing and modeling opinion evaluation, using a
-large-scale collection of Amazon book reviews as a dataset. We find that the
-perceived helpfulness of a review depends not just on its content but also in
-subtle ways on how the expressed evaluation relates to other evaluations of the
-same product. As part of our approach, we develop novel methods that take
-advantage of the phenomenon of review ""plagiarism"" to control for the effects
-of text in opinion evaluation, and we provide a simple and natural mathematical
-model consistent with our findings. Our analysis also allows us to distinguish
-among the predictions of competing theories from sociology and social
-psychology, and to discover unexpected differences in the collective
-opinion-evaluation behavior of user populations from different countries. 
-" -159,0906.5114,Hal Daum\'e III,Non-Parametric Bayesian Areal Linguistics,cs.CL," We describe a statistical model over linguistic areas and phylogeny. - Our model recovers known areas and identifies a plausible hierarchy of areal -features. The use of areas improves genetic reconstruction of languages both -qualitatively and quantitatively according to a variety of metrics. We model -linguistic areas by a Pitman-Yor process and linguistic phylogeny by Kingman's -coalescent. -" -160,0907.0784,Hal Daum\'e III,Cross-Task Knowledge-Constrained Self Training,cs.LG cs.CL," We present an algorithmic framework for learning multiple related tasks. Our -framework exploits a form of prior knowledge that relates the output spaces of -these tasks. We present PAC learning results that analyze the conditions under -which such learning is possible. We present results on learning a shallow -parser and named-entity recognition system that exploits our framework, showing -consistent improvements over baseline methods. -" -161,0907.0785,Hal Daum\'e III and Lyle Campbell,A Bayesian Model for Discovering Typological Implications,cs.CL," A standard form of analysis for linguistic typology is the universal -implication. These implications state facts about the range of extant -languages, such as ``if objects come after verbs, then adjectives come after -nouns.'' Such implications are typically discovered by painstaking hand -analysis over a small sample of languages. We propose a computational model for -assisting at this process. Our model is able to discover both well-known -implications as well as some novel implications that deserve further study. -Moreover, through a careful application of hierarchical analysis, we are able -to cope with the well-known sampling problem: languages are not independent. -" -162,0907.0786,Hal Daum\'e III and John Langford and Daniel Marcu,Search-based Structured Prediction,cs.LG cs.CL," We present Searn, an algorithm for integrating search and learning to solve -complex structured prediction problems such as those that occur in natural -language, speech, computational biology, and vision. Searn is a meta-algorithm -that transforms these complex problems into simple classification problems to -which any binary classifier may be applied. Unlike current algorithms for -structured learning that require decomposition of both the loss function and -the feature functions over the predicted structure, Searn is able to learn -prediction functions for any loss function and any class of features. Moreover, -Searn comes with a strong, natural theoretical guarantee: good performance on -the derived classification problems implies good performance on the structured -prediction problem. -" -163,0907.0804,Hal Daum\'e III and Daniel Marcu,"Induction of Word and Phrase Alignments for Automatic Document - Summarization",cs.CL," Current research in automatic single document summarization is dominated by -two effective, yet naive approaches: summarization by sentence extraction, and -headline generation via bag-of-words models. While successful in some tasks, -neither of these models is able to adequately capture the large set of -linguistic devices utilized by humans when they produce summaries. One possible -explanation for the widespread use of these models is that good techniques have -been developed to extract appropriate training data for them from existing -document/abstract and document/headline corpora. 
We believe that future
-progress in automatic summarization will be driven both by the development of
-more sophisticated, linguistically informed models and by a more effective
-leveraging of document/abstract corpora. In order to open the doors to
-simultaneously achieving both of these goals, we have developed techniques for
-automatically producing word-to-word and phrase-to-phrase alignments between
-documents and their human-written abstracts. These alignments make explicit the
-correspondences that exist in such document/abstract pairs, and create a
-potentially rich data source from which complex summarization algorithms may
-learn. This paper describes experiments we have carried out to analyze the
-ability of humans to perform such alignments, and based on these analyses, we
-describe experiments for creating them automatically. Our model for the
-alignment task is based on an extension of the standard hidden Markov model,
-and learns to create alignments in a completely unsupervised fashion. We
-describe our model in detail and present experimental results that show that
-our model is able to learn to reliably identify word- and phrase-level
-alignments in a corpus of document/abstract pairs.
-"
-164,0907.0806,Hal Daum\'e III and Daniel Marcu,A Noisy-Channel Model for Document Compression,cs.CL," We present a document compression system that uses a hierarchical
-noisy-channel model of text production. Our compression system first
-automatically derives the syntactic structure of each sentence and the overall
-discourse structure of the text given as input. The system then uses a
-statistical hierarchical model of text production in order to drop
-non-important syntactic and discourse constituents so as to generate coherent,
-grammatical document compressions of arbitrary length. The system outperforms
-both a baseline and a sentence-based compression system that operates by
-simplifying sequentially all sentences in a text. Our results support the claim
-that discourse knowledge plays an important role in document summarization.
-"
-165,0907.0807,Hal Daum\'e III and Daniel Marcu,"A Large-Scale Exploration of Effective Global Features for a Joint
- Entity Detection and Tracking Model",cs.CL," Entity detection and tracking (EDT) is the task of identifying textual
-mentions of real-world entities in documents, extending the named entity
-detection and coreference resolution task by considering mentions other than
-names (pronouns, definite descriptions, etc.). Like NE tagging and coreference
-resolution, most solutions to the EDT task separate out the mention detection
-aspect from the coreference aspect. By doing so, these solutions are limited to
-using only local features for learning. In contrast, by modeling both aspects
-of the EDT task simultaneously, we are able to learn using highly complex,
-non-local features. We develop a new joint EDT model and explore the utility of
-many features, demonstrating their effectiveness on this task.
-"
-166,0907.0809,Hal Daum\'e III and Daniel Marcu,"Learning as Search Optimization: Approximate Large Margin Methods for
- Structured Prediction",cs.LG cs.CL," Mappings to structured output spaces (strings, trees, partitions, etc.) are
-typically learned using extensions of classification algorithms to simple
-graphical structures (e.g., linear chains) in which search and parameter
-estimation can be performed exactly. Unfortunately, in many complex problems,
-it is rare that exact search or parameter estimation is tractable. 
Instead of -learning exact models and searching via heuristic means, we embrace this -difficulty and treat the structured output problem in terms of approximate -search. We present a framework for learning as search optimization, and two -parameter updates with convergence theorems and bounds. Empirical evidence -shows that our integrated approach to learning and decoding can outperform -exact models at smaller computational cost. -" -167,0907.1558,Marcelo A. Montemurro and Damian Zanette,"Towards the quantification of the semantic information encoded in - written language",physics.soc-ph cs.CL physics.data-an," Written language is a complex communication signal capable of conveying -information encoded in the form of ordered sequences of words. Beyond the local -order ruled by grammar, semantic and thematic structures affect long-range -patterns in word usage. Here, we show that a direct application of information -theory quantifies the relationship between the statistical distribution of -words and the semantic content of the text. We show that there is a -characteristic scale, roughly around a few thousand words, which establishes -the typical size of the most informative segments in written language. -Moreover, we find that the words whose contributions to the overall information -is larger, are the ones more closely associated with the main subjects and -topics of the text. This scenario can be explained by a model of word usage -that assumes that words are distributed along the text in domains of a -characteristic size where their frequency is higher than elsewhere. Our -conclusions are based on the analysis of a large database of written language, -diverse in subjects and styles, and thus are likely to be applicable to general -language sequences encoding complex information. -" -168,0907.1814,Hal Daum\'e III,Bayesian Query-Focused Summarization,cs.CL cs.IR cs.LG," We present BayeSum (for ``Bayesian summarization''), a model for sentence -extraction in query-focused summarization. BayeSum leverages the common case in -which multiple documents are relevant to a single query. Using these documents -as reinforcement for query terms, BayeSum is not afflicted by the paucity of -information in short queries. We show that approximate inference in BayeSum is -possible on large data sets and results in a state-of-the-art summarization -system. Furthermore, we show how BayeSum can be understood as a justified query -expansion technique in the language modeling for IR framework. -" -169,0907.1815,Hal Daum\'e III,Frustratingly Easy Domain Adaptation,cs.LG cs.CL," We describe an approach to domain adaptation that is appropriate exactly in -the case when one has enough ``target'' data to do slightly better than just -using only ``source'' data. Our approach is incredibly simple, easy to -implement as a preprocessing step (10 lines of Perl!) and outperforms -state-of-the-art approaches on a range of datasets. Moreover, it is trivially -extended to a multi-domain adaptation problem, where one has data from a -variety of different domains. -" -170,0907.2452,"Koichi Takeuchi (NII), Kyo Kageura (NII), Teruo Koyama (NII), - B\'eatrice Daille (LINA), Laurent Romary (INRIA Lorraine - LORIA)",Pattern Based Term Extraction Using ACABIT System,cs.CL," In this paper, we propose a pattern-based term extraction approach for -Japanese, applying ACABIT system originally developed for French. The proposed -approach evaluates termhood using morphological patterns of basic terms and -term variants. 
After extracting term candidates, the
-ACABIT system filters out non-terms from the candidates based on
-log-likelihood. This approach is suitable for Japanese term extraction because
-most Japanese terms are compound nouns or simple phrasal patterns.
-"
-171,0907.3781,St\'ephanie L\'eon (LIRMM),"Un syst\`eme modulaire d'acquisition automatique de traductions \`a
- partir du Web",cs.CL," We present a method of automatic translation (French/English) of Complex
-Lexical Units (CLU), aiming at extracting a bilingual lexicon. Our modular
-system is based on linguistic properties (compositionality, polysemy, etc.).
-Different aspects of the multilingual Web are used to validate candidate
-translations and collect new terms. We first build a French corpus of Web pages
-to collect CLU. Three adapted processing stages are applied for each linguistic
-property: compositional and non-polysemous translations, compositional
-polysemous translations and non-compositional translations. Our evaluation on a
-sample of CLU shows that our technique based on the Web can reach a very high
-precision.
-"
-172,0907.4960,Muthiah Annamalai,Ezhil: A Tamil Programming Language,cs.PL cs.CL," Ezhil is a Tamil language based interpreted procedural programming language.
-Tamil keywords and grammar are chosen to make the native Tamil speaker write
-programs in the Ezhil system. Ezhil allows easy representation of computer
-programs closer to the Tamil language's logical constructs, equivalent to the
-conditional, branch and loop statements in modern English based programming
-languages. Ezhil is a compact programming language aimed at Tamil-speaking
-novice computer users. Grammar for Ezhil and a few example programs are
-reported here, from the initial proof-of-concept implementation using the
-Python programming language. To the best of our knowledge, the Ezhil language
-is the first freely available Tamil programming language.
-"
-173,0907.5083,M. Sakthi Balan (Infosys),"Serializing the Parallelism in Parallel Communicating Pushdown Automata
- Systems",cs.FL cs.CL cs.DC," We consider parallel communicating pushdown automata systems (PCPA) and
-define a property called known communication for them. We use this property to
-prove that the power of a variant of PCPA, called returning centralized
-parallel communicating pushdown automata (RCPCPA), is equivalent to that of
-multi-head pushdown automata. The above result presents a new sub-class of
-returning parallel communicating pushdown automata systems (RPCPA) called
-simple-RPCPA and we show that it can be written as a finite intersection of
-multi-head pushdown automata systems.
-"
-174,0908.4413,"Patrice Lopez (IDSL), Laurent Romary (IDSL, INRIA Saclay - Ile de
- France)",Multiple Retrieval Models and Regression Models for Prior Art Search,cs.CL," This paper presents the system called PATATRAS (PATent and Article Tracking,
-Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach
-presents three main characteristics: 1. The usage of multiple retrieval models
-(KL, Okapi) and term index definitions (lemma, phrase, concept) for the three
-languages considered in the present track (English, French, German) producing
-ten different sets of ranked results. 2. The merging of the different results
-based on multiple regression models using an additional validation set created
-from the patent collection. 3. 
The exploitation of patent metadata and of the
-citation structures for creating restricted initial working sets of patents and
-for producing a final re-ranking regression model. As we exploit specific
-metadata of the patent documents and the citation relations only at the
-creation of initial working sets and during the final post-ranking step, our
-architecture remains generic and easy to extend.
-"
-175,0908.4431,B Prabhulla Chandran Pillai,An OLAC Extension for Dravidian Languages,cs.CL," OLAC was founded in 2000 for creating online databases of language resources.
-This paper intends to review the bottom-up distributed character of the project
-and proposes an extension of the architecture for Dravidian languages. An
-ontological structure is considered for effective natural language processing
-(NLP) and its advantages over statistical methods are reviewed.
-"
-176,0909.1147,B Prabhulla Chandran Pillai,"Empowering OLAC Extension using Anusaaraka and Effective text processing
- using Double Byte coding",cs.CL," The paper reviews the hurdles encountered while trying to implement the OLAC
-extension for Dravidian / Indian languages. The paper further explores the
-possibilities which could minimise or solve these problems. In this context,
-the Chinese system of text processing and the anusaaraka system are
-scrutinised.
-"
-177,0909.1308,"Nataliya Sokolovska (LTCI), Thomas Lavergne (LIMSI), Olivier Capp\'e
- (LTCI), Fran\c{c}ois Yvon (LIMSI)","Efficient Learning of Sparse Conditional Random Fields for Supervised
- Sequence Labelling",cs.LG cs.CL," Conditional Random Fields (CRFs) constitute a popular and efficient approach
-for supervised sequence labelling. CRFs can cope with large description spaces
-and can integrate some form of structural dependency between labels. In this
-contribution, we address the issue of efficient feature selection for CRFs
-based on imposing sparsity through an L1 penalty. We first show how sparsity of
-the parameter set can be exploited to significantly speed up training and
-labelling. We then introduce coordinate descent parameter update schemes for
-CRFs with L1 regularization. We finally provide some empirical comparisons of
-the proposed approach with state-of-the-art CRF training strategies. In
-particular, it is shown that the proposed approach is able to take advantage of
-the sparsity to speed up processing and hence potentially handle larger
-dimensional models.
-"
-178,0909.2379,Priyanka Gupta and Vishal Goyal,"Implementation of Rule Based Algorithm for Sandhi-Vicheda Of Compound
- Hindi Words",cs.CL," Sandhi means to join two or more words to coin a new word. Sandhi literally
-means `putting together' or combining (of sounds); it denotes all combinatory
-sound-changes effected (spontaneously) for ease of pronunciation.
-Sandhi-vicheda [5] describes the process by which one letter (whether single or
-conjoined) is broken to form two words. Part of the broken letter remains as
-the last letter of the first word and part of the letter forms the first letter
-of the next word. Sandhi-vicheda is an easy and interesting device that can add
-an entirely new dimension to the traditional approach to Hindi teaching. In
-this paper, using a rule-based algorithm, we report an accuracy of 60-80%
-depending upon the number of rules implemented. 
-" -179,0909.2626,"Susanne Salmon-Alt (INRIA Lorraine - LORIA), Laurent Romary (INRIA - Lorraine - LORIA)",Reference Resolution within the Framework of Cognitive Grammar,cs.CL," Following the principles of Cognitive Grammar, we concentrate on a model for -reference resolution that attempts to overcome the difficulties previous -approaches, based on the fundamental assumption that all reference (independent -on the type of the referring expression) is accomplished via access to and -restructuring of domains of reference rather than by direct linkage to the -entities themselves. The model accounts for entities not explicitly mentioned -but understood in a discourse, and enables exploitation of discursive and -perceptual context to limit the set of potential referents for a given -referring expression. As the most important feature, we note that a single -mechanism is required to handle what are typically treated as diverse -phenomena. Our approach, then, provides a fresh perspective on the relations -between Cognitive Grammar and the problem of reference. -" -180,0909.2715,"Dan Cristea, Nancy Ide, Laurent Romary (INRIA Lorraine - LORIA)",Marking-up multiple views of a Text: Discourse and Reference,cs.CL," We describe an encoding scheme for discourse structure and reference, based -on the TEI Guidelines and the recommendations of the Corpus Encoding -Specification (CES). A central feature of the scheme is a CES-based data -architecture enabling the encoding of and access to multiple views of a -marked-up document. We describe a tool architecture that supports the encoding -scheme, and then show how we have used the encoding scheme and the tools to -perform a discourse analytic task in support of a model of global discourse -cohesion called Veins Theory (Cristea & Ide, 1998). -" -181,0909.2718,"Nancy Ide (INRIA Lorraine - LORIA), Laurent Romary (INRIA Lorraine - - LORIA), Tomaz Erjavec",A Common XML-based Framework for Syntactic Annotations,cs.CL," It is widely recognized that the proliferation of annotation schemes runs -counter to the need to re-use language resources, and that standards for -linguistic annotation are becoming increasingly mandatory. To answer this need, -we have developed a framework comprised of an abstract model for a variety of -different annotation types (e.g., morpho-syntactic tagging, syntactic -annotation, co-reference annotation, etc.), which can be instantiated in -different ways depending on the annotator's approach and goals. In this paper -we provide an overview of the framework, demonstrate its applicability to -syntactic annotation, and show how it can contribute to comparative evaluation -of parser output and diverse syntactic annotation schemes. -" -182,0909.2719,"Nancy Ide (COMPUTER Science Department), Laurent Romary (INRIA - Lorraine - Loria)",Standards for Language Resources,cs.CL," This paper presents an abstract data model for linguistic annotations and its -implementation using XML, RDF and related standards; and to outline the work of -a newly formed committee of the International Standards Organization (ISO), -ISO/TC 37/SC 4 Language Resource Management, which will use this work as its -starting point. The primary motive for presenting the latter is to solicit the -participation of members of the research community to contribute to the work of -the committee. 
-" -183,0909.3027,"Emmanuel Ep Prochasson (LINA), Christian Viard-Gaudin (IRCCyN), - Emmanuel Morin (LINA)",Language Models for Handwritten Short Message Services,cs.CL," Handwriting is an alternative method for entering texts composing Short -Message Services. However, a whole new language features the texts which are -produced. They include for instance abbreviations and other consonantal writing -which sprung up for time saving and fashion. We have collected and processed a -significant number of such handwriting SMS, and used various strategies to -tackle this challenging area of handwriting recognition. We proposed to study -more specifically three different phenomena: consonant skeleton, rebus, and -phonetic writing. For each of them, we compare the rough results produced by a -standard recognition system with those obtained when using a specific language -model. -" -184,0909.3028,"Emmanuel Prochasson (LINA), Emmanuel Morin (LINA), Christian - Viard-Gaudin (IRCCyN)",Vers la reconnaissance de mini-messages manuscrits,cs.CL," Handwriting is an alternative method for entering texts which composed Short -Message Services. However, a whole new language features the texts which are -produced. They include for instance abbreviations and other consonantal writing -which sprung up for time saving and fashion. We have collected and processed a -significant number of such handwritten SMS, and used various strategies to -tackle this challenging area of handwriting recognition. We proposed to study -more specifically three different phenomena: consonant skeleton, rebus, and -phonetic writing. For each of them, we compare the rough results produced by a -standard recognition system with those obtained when using a specific language -model to take care of them. -" -185,0909.3444,"Jonathan Marchand (INRIA Lorraine - LORIA), Bruno Guillaume (INRIA - Lorraine - LORIA), Guy Perrier (INRIA Lorraine - LORIA)",Analyse en d\'ependances \`a l'aide des grammaires d'interaction,cs.CL," This article proposes a method to extract dependency structures from -phrase-structure level parsing with Interaction Grammars. Interaction Grammars -are a formalism which expresses interactions among words using a polarity -system. Syntactical composition is led by the saturation of polarities. -Interactions take place between constituents, but as grammars are lexicalized, -these interactions can be translated at the level of words. Dependency -relations are extracted from the parsing process: every dependency is the -consequence of a polarity saturation. The dependency relations we obtain can be -seen as a refinement of the usual dependency tree. Generally speaking, this -work sheds new light on links between phrase structure and dependency parsing. -" -186,0909.3445,"Ingrid Falk (INRIA Lorraine - LORIA), Claire Gardent (INRIA Lorraine - - LORIA), Evelyne Jacquey (ATILF), Fabienne Venant (INRIA Lorraine - LORIA)",Grouping Synonyms by Definitions,cs.CL," We present a method for grouping the synonyms of a lemma according to its -dictionary senses. The senses are defined by a large machine readable -dictionary for French, the TLFi (Tr\'esor de la langue fran\c{c}aise -informatis\'e) and the synonyms are given by 5 synonym dictionaries (also for -French). To evaluate the proposed method, we manually constructed a gold -standard where for each (word, definition) pair and given the set of synonyms -defined for that word by the 5 synonym dictionaries, 4 lexicographers specified -the set of synonyms they judge adequate. 
While inter-annotator agreement ranges
-on that task from 67% to at best 88%, depending on the annotator pair and on
-the synonym dictionary being considered, the automatic procedure we propose
-scores a precision of 67% and a recall of 71%. The proposed method is compared
-with related work, namely word sense disambiguation, synonym lexicon
-acquisition and WordNet construction.
-"
-187,0909.3591,"P. Gilkey, S. Lopez Ornat, and A. Karousou","Mathematics, Recursion, and Universals in Human Languages",cs.CL," There are many scientific problems generated by the multiple and conflicting
-alternative definitions of linguistic recursion and human recursive processing
-that exist in the literature. The purpose of this article is to make available
-to the linguistic community the standard mathematical definition of recursion
-and to apply it to discuss linguistic recursion. As a byproduct, we obtain an
-insight into certain ""soft universals"" of human languages, which are related to
-cognitive constructs necessary to implement mathematical reasoning, i.e.
-mathematical model theory.
-"
-188,0909.4280,"Harry Bunt, Laurent Romary (INRIA Lorraine - LORIA)",Towards Multimodal Content Representation,cs.CL," Multimodal interfaces, combining the use of speech, graphics, gestures, and
-facial expressions in input and output, promise to provide new possibilities to
-deal with information in more effective and efficient ways, supporting for
-instance: - the understanding of possibly imprecise, partial or ambiguous
-multimodal input; - the generation of coordinated, cohesive, and coherent
-multimodal presentations; - the management of multimodal interaction (e.g.,
-task completion, adapting the interface, error prevention) by representing and
-exploiting models of the user, the domain, the task, the interactive context,
-and the media (e.g. text, audio, video). The present document is intended to
-support the discussion on multimodal content representation, its possible
-objectives and basic constraints, and how the definition of a generic
-representation framework for multimodal content representation may be
-approached. It takes into account the results of the Dagstuhl workshop, in
-particular those of the informal working group on multimodal meaning
-representation that was active during the workshop (see
-http://www.dfki.de/~wahlster/Dagstuhl_Multi_Modality, Working Group 4).
-"
-189,0909.4385,"Sebastian Bernhardsson, Luis Enrique Correa da Rocha and Petter
- Minnhagen",The meta book and size-dependent properties of written language,physics.soc-ph cs.CL physics.data-an," Evidence is given for a systematic text-length dependence of the power-law
-index gamma of a single book. The estimated gamma values are consistent with a
-monotonic decrease from 2 to 1 with increasing length of a text. A direct
-connection to an extended Heaps' law is explored. The infinite book limit is,
-as a consequence, proposed to be given by gamma = 1 instead of the value
-gamma = 2 expected if Zipf's law were ubiquitously applicable. In addition we
-explore the idea that the systematic text-length dependence can be described by
-a meta book concept, which is an abstract representation reflecting the
-word-frequency structure of a text. According to this concept the
-word-frequency distribution of a text, with a certain length written by a
-single author, has the same characteristics as a text of the same length pulled
-out from an imaginary complete infinite corpus written by the same author.
-" -190,0910.0537,Victor Gluzberg,A Note On Higher Order Grammar,cs.CL," Both syntax-phonology and syntax-semantics interfaces in Higher Order Grammar -(HOG) are expressed as axiomatic theories in higher-order logic (HOL), i.e. a -language is defined entirely in terms of provability in the single logical -system. An important implication of this elegant architecture is that the -meaning of a valid expression turns out to be represented not by a single, nor -even by a few ""discrete"" terms (in case of ambiguity), but by a ""continuous"" -set of logically equivalent terms. The note is devoted to precise formulation -and proof of this observation. -" -191,0910.1484,"Alain Lecomte (INRIA Futurs, SFLTAMP), Myriam Quatrini (IML)",Ludics and its Applications to natural Language Semantics,cs.CL," Proofs, in Ludics, have an interpretation provided by their counter-proofs, -that is the objects they interact with. We follow the same idea by proposing -that sentence meanings are given by the counter-meanings they are opposed to in -a dialectical interaction. The conception is at the intersection of a -proof-theoretic and a game-theoretic accounts of semantics, but it enlarges -them by allowing to deal with possibly infinite processes. -" -192,0910.1868,Vishal Goyal and Gurpreet Singh Lehal,Evaluation of Hindi to Punjabi Machine Translation System,cs.CL," Machine Translation in India is relatively young. The earliest efforts date -from the late 80s and early 90s. The success of every system is judged from its -evaluation experimental results. Number of machine translation systems has been -started for development but to the best of author knowledge, no high quality -system has been completed which can be used in real applications. Recently, -Punjabi University, Patiala, India has developed Punjabi to Hindi Machine -translation system with high accuracy of about 92%. Both the systems i.e. -system under question and developed system are between same closely related -languages. Thus, this paper presents the evaluation results of Hindi to Punjabi -machine translation system. It makes sense to use same evaluation criteria as -that of Punjabi to Hindi Punjabi Machine Translation System. After evaluation, -the accuracy of the system is found to be about 95%. -" -193,0910.5410,"David Fernandez-Amoros, Julio Gonzalo, Felisa Verdejo",The Uned systems at Senseval-2,cs.CL cs.AI," We have participated in the SENSEVAL-2 English tasks (all words and lexical -sample) with an unsupervised system based on mutual information measured over a -large corpus (277 million words) and some additional heuristics. A supervised -extension of the system was also presented to the lexical sample task. - Our system scored first among unsupervised systems in both tasks: 56.9% -recall in all words, 40.2% in lexical sample. This is slightly worse than the -first sense heuristic for all words and 3.6% better for the lexical sample, a -strong indication that unsupervised Word Sense Disambiguation remains being a -strong challenge. -" -194,0910.5419,David Fernandez-Amoros,"Word Sense Disambiguation Based on Mutual Information and Syntactic - Patterns",cs.CL cs.AI," This paper describes a hybrid system for WSD, presented to the English -all-words and lexical-sample tasks, that relies on two different unsupervised -approaches. The first one selects the senses according to mutual information -proximity between a context word a variant of the sense. 
The second heuristic
-analyzes the examples of use in the glosses of the senses so that simple
-syntactic patterns are inferred. These patterns are matched against the
-disambiguation contexts. We show that the first heuristic obtains a precision
-and recall of .58 and .35 respectively in the all words task, while the second
-obtains .80 and .25. The high precision obtained warrants deeper research into
-these techniques. Results for the lexical sample task are also provided.
-"
-195,0910.5682,David Fernandez-Amoros,"Word Sense Disambiguation Using English-Spanish Aligned Phrases over
- Comparable Corpora",cs.CL cs.AI," In this paper we describe a WSD experiment based on bilingual English-Spanish
-comparable corpora in which individual noun phrases have been identified and
-aligned with their respective counterparts in the other language. The
-evaluation of the experiment has been carried out against SemCor.
- We show that, with the alignment algorithm employed, potential precision is
-high (74.3%); however, the coverage of the method is low (2.7%), due to
-alignments being far less frequent than we expected.
- Contrary to our intuition, precision does not rise consistently with the
-number of alignments. The coverage is low due to several factors: there are
-important domain differences, and English and Spanish are too closely related
-for this approach to be able to discriminate efficiently between senses,
-rendering it unsuitable for WSD, although the method may prove more productive
-in machine translation.
-"
-196,0911.0894,N. Rama and Meenakshi Lakshmanan,"A New Computational Schema for Euphonic Conjunctions in Sanskrit
- Processing",cs.CL," Automated language processing is central to the drive to enable facilitated
-referencing of increasingly available Sanskrit E-texts. The first step towards
-processing Sanskrit text involves the handling of Sanskrit compound words that
-are an integral part of Sanskrit texts. This firstly necessitates the
-processing of euphonic conjunctions or sandhis, which are points in words or
-between words, at which adjacent letters coalesce and transform. The ancient
-Sanskrit grammarian Panini's codification of the Sanskrit grammar is the
-accepted authority in the subject. His famed sutras or aphorisms, numbering
-approximately four thousand, tersely, precisely and comprehensively codify the
-rules of the grammar, including all the rules pertaining to sandhis. This work
-presents a fresh approach to processing sandhis in terms of a computational
-schema. This new computational model is based on Panini's complex codification
-of the rules of grammar. The model has simple beginnings and is yet powerful,
-comprehensive and computationally lean.
-"
-197,0911.0907,Kaustubh Bhattacharyya and Kandarpa Kumar Sarma,"ANN-based Innovative Segmentation Method for Handwritten text in
- Assamese",cs.CL," Artificial Neural Networks (ANNs) have been widely used for the recognition
-of optically scanned characters, a task which partially emulates human thinking
-in the domain of Artificial Intelligence. But prior to recognition, it is
-necessary to segment the text into sentences, words and individual characters.
-Segmentation of words into individual letters has been one of the major
-problems in handwriting recognition. Despite several successful works all over
-the world, development of such tools in specific languages is still an ongoing
-process especially in the Indian context.
This work explores the application of
-ANNs as an aid to segmentation of handwritten characters in Assamese, an
-important language in the North Eastern part of India. The work explores the
-performance difference obtained in applying an ANN-based dynamic segmentation
-algorithm compared to projection-based static segmentation. The algorithm
-involves first training an ANN with individual handwritten characters
-recorded from different individuals. Handwritten sentences are separated out
-from text using a static segmentation method. From the segmented line,
-individual characters are separated out by first over-segmenting the entire
-line. Each of the segments thus obtained is next fed to the trained ANN. At
-the point of segmentation where the ANN recognizes a segment or a combination
-of several segments as similar to a handwritten character, a segmentation
-boundary for the character is assumed to exist and segmentation is performed.
-The segmented character is next compared to the best available match and the
-segmentation boundary confirmed.
-"
-198,0911.1451,"Loet Leydesdorff, Ping Zhou",Co-word Analysis using the Chinese Character Set,cs.CL cs.DL," Until recently, Chinese texts could not be studied using co-word analysis
-because the words are not separated by spaces in Chinese (and Japanese). A word
-can be composed of one or more characters. The online availability of programs
-that separate Chinese texts makes it possible to analyze them using semantic
-maps. Chinese characters contain not only information, but also meaning. This
-may enhance the readability of semantic maps. In this study, we analyze 58
-words which occur ten or more times in the 1652 journal titles of the China
-Scientific and Technical Papers and Citations Database. The word occurrence
-matrix is visualized and factor-analyzed.
-"
-199,0911.1516,"Sana Ullah, M.A. Khan, Kyung Sup Kwak",A Discourse-based Approach in Text-based Machine Translation,cs.CL," This paper presents an approach, based on theoretical research, to ellipsis
-resolution in machine translation. The formula of discourse is applied in order
-to resolve ellipses. The validity of the discourse formula is analyzed by
-applying it to real-world text, i.e., newspaper fragments. The source text
-is converted into mono-sentential discourses, where complex discourses require
-further dissection either directly into primitive discourses or first into
-compound discourses and later into primitive ones. The procedure of dissection
-needs further improvement, i.e., discovering as many primitive discourse forms
-as possible. An attempt has been made to investigate new primitive discourses
-or patterns from the given text.
-"
-200,0911.1517,"Sana Ullah, M.Asdaque Hussain, and Kyung Sup Kwak",Resolution of Unidentified Words in Machine Translation,cs.CL," This paper presents a mechanism for resolving unidentified lexical units in
-Text-based Machine Translation (TBMT). A Machine Translation (MT) system is
-unlikely to have a complete lexicon, and hence there is an intense need for a
-new mechanism to handle the problem of unidentified words. These unknown words
-could be abbreviations, names, acronyms and newly introduced terms. We have
-proposed an algorithm for the resolution of the unidentified words. This
-algorithm takes the discourse unit (primitive discourse) as its unit of
-analysis and provides real-time updates to the lexicon. We have manually
-applied the algorithm to newspaper fragments.
Along with anaphora and cataphora
-resolution, many unknown words, especially names and abbreviations, were added
-to the lexicon.
-"
-201,0911.1842,"Nancy Ide (INRIA Lorraine - LORIA), Laurent Romary (INRIA Lorraine
- - LORIA)",Standards for Language Resources,cs.CL," The goal of this paper is two-fold: to present an abstract data model for
-linguistic annotations and its implementation using XML, RDF and related
-standards; and to outline the work of a newly formed committee of the
-International Standards Organization (ISO), ISO/TC 37/SC 4 Language Resource
-Management, which will use this work as its starting point.
-"
-202,0911.1965,"Nitin Madnani, Hongyan Jing, Nanda Kambhatla and Salim Roukos","Active Learning for Mention Detection: A Comparison of Sentence
- Selection Strategies",cs.CL cs.AI," We propose and compare various sentence selection strategies for active
-learning for the task of detecting mentions of entities. The best strategy
-employs the sum of confidences of two statistical classifiers trained on
-different views of the data. Our experimental results show that, compared to
-the random selection strategy, this strategy reduces the amount of required
-labeled training data by over 50% while achieving the same performance. The
-effect is even more significant when only named mentions are considered: the
-system achieves the same performance by using only 42% of the training data
-required by the random selection strategy.
-"
-203,0911.2284,Fabio G. Guerrero,A New Look at the Classical Entropy of Written English,cs.CL," A simple method for finding the entropy and redundancy of a reasonably long
-sample of English text by direct computer processing and from first principles
-according to Shannon theory is presented. As an example, results on the entropy
-of the English language have been obtained based on a total of 20.3 million
-characters of written English, considering symbols from one to five hundred
-characters in length. Besides a more realistic value of the entropy of English,
-a new perspective on some classic entropy-related concepts is presented. This
-method can also be extended to other Latin languages. Some implications for
-practical applications such as plagiarism-detection software, and the minimum
-number of words that should be used in messaging on Internet social networks,
-are discussed.
-"
-204,0911.3280,Maurizio Serva,Automated languages phylogeny from Levenshtein distance,cs.CL q-bio.PE q-bio.QM," Languages evolve over time in a process in which reproduction, mutation and
-extinction are all possible, similar to what happens to living organisms. Using
-this similarity it is possible, in principle, to build family trees which show
-the degree of relatedness between languages.
- The method used by modern glottochronology, developed by Swadesh in the
-1950s, measures distances from the percentage of words with a common historical
-origin. The weak point of this method is that subjective judgment plays a
-relevant role.
- Recently we proposed an automated method that avoids this subjectivity, whose
-results can be replicated by studies that use the same database and that does
-not require specific linguistic knowledge. Moreover, the method allows a
-quick comparison of a large number of languages.
- We applied our method to the Indo-European and Austronesian families,
-considering, in both cases, fifty different languages.
The resulting trees are
-similar to those of previous studies, but with some important differences in
-the position of a few languages and subgroups. We believe that these
-differences carry new information on the structure of the tree and on the
-phylogenetic relationships within families.
-"
-205,0911.3292,Filippo Petroni and Maurizio Serva,Automated words stability and languages phylogeny,cs.CL physics.soc-ph q-bio.PE," The idea of measuring distance between languages seems to have its roots in
-the work of the French explorer Dumont D'Urville (D'Urville 1832). He collected
-comparative word lists of various languages during his voyages aboard the
-Astrolabe from 1826 to 1829 and, in his work about the geographical division of
-the Pacific, he proposed a method to measure the degree of relation among
-languages. The method used by modern glottochronology, developed by Morris
-Swadesh in the 1950s (Swadesh 1952), measures distances from the percentage of
-shared cognates, which are words with a common historical origin. Recently, we
-proposed a new automated method which uses the normalized Levenshtein distance
-among words with the same meaning and averages over the words contained in a
-list. Another classical problem in glottochronology is the study of the
-stability of words corresponding to different meanings. Words, in fact, evolve
-because of lexical changes, borrowings and replacement at a rate which is not
-the same for all of them. The speed of lexical evolution is different for
-different meanings and it is probably related to the frequency of use of the
-associated words (Pagel et al. 2007). This problem is tackled here by an
-automated methodology based only on the normalized Levenshtein distance.
-"
-206,0911.3411,Loet Leydesdorff and Iina Hellsten,"Measuring the Meaning of Words in Contexts: An automated analysis of
- controversies about Monarch butterflies, Frankenfoods, and stem cells",cs.CL cs.IR physics.soc-ph," Co-words have been considered as carriers of meaning across different domains
-in studies of science, technology, and society. Words and co-words, however,
-obtain meaning in sentences, and sentences obtain meaning in their contexts of
-use. At the science/society interface, words can be expected to have different
-meanings: the codes of communication that provide meaning to words differ on
-the varying sides of the interface. Furthermore, meanings and interfaces may
-change over time. Given this structuring of meaning across interfaces and over
-time, we distinguish between metaphors and diaphors as reflexive mechanisms
-that facilitate the translation between contexts. Our empirical focus is on
-three recent scientific controversies: Monarch butterflies, Frankenfoods, and
-stem-cell therapies. This study explores new avenues that relate the study of
-co-word analysis in context with the sociological quest for the analysis and
-processing of meaning.
-"
-207,0911.3944,"Christopher M. White, Sanjeev P. Khudanpur, and Patrick J. Wolfe","Likelihood-based semi-supervised model selection with applications to
- speech processing",stat.ML cs.CL cs.LG stat.AP," In conventional supervised pattern recognition tasks, model selection is
-typically accomplished by minimizing the classification error rate on a set of
-so-called development data, subject to ground-truth labeling by human experts
-or some other means.
In the context of speech processing systems and other
-large-scale practical applications, however, such labeled development data are
-typically costly and difficult to obtain. This article proposes an alternative
-semi-supervised framework for likelihood-based model selection that leverages
-unlabeled data by using trained classifiers representing each model to
-automatically generate putative labels. The errors that result from this
-automatic labeling are shown to be amenable to results from robust statistics,
-which in turn provide for minimax-optimal censored likelihood ratio tests that
-recover the nonparametric sign test as a limiting case. This approach is then
-validated experimentally using a state-of-the-art automatic speech recognition
-system to select between candidate word pronunciations using unlabeled speech
-data that only potentially contain instances of the words under test. Results
-provide supporting evidence for the utility of this approach, and suggest that
-it may also find use in other applications of machine learning.
-"
-208,0911.5116,"Laurent Romary (INRIA Saclay - Ile de France, IDSL)","Standardization of the formal representation of lexical information for
- NLP",cs.CL," A survey of dictionary models and formats is presented, together with an
-overview of corresponding recent standardisation activities.
-"
-209,0911.5568,"C\'edric Messiant (LIPN), Thierry Poibeau (LIPN)","Acquisition d'informations lexicales \`a partir de corpus C\'edric
- Messiant et Thierry Poibeau",cs.CL cs.AI," This paper is about the automatic acquisition of lexical information from
-corpora, especially subcategorization acquisition.
-"
-210,0911.5703,"Olivier Picard, Alexandre Blondin-Masse, Stevan Harnad, Odile
- Marcotte, Guillaume Chicoisne and Yassine Gargouri",Hierarchies in Dictionary Definition Space,cs.CL cs.LG," A dictionary defines words in terms of other words. Definitions can tell you
-the meanings of words you don't know, but only if you know the meanings of the
-defining words. How many words do you need to know (and which ones) in order to
-be able to learn all the rest from definitions? We reduced dictionaries to
-their ""grounding kernels"" (GKs), about 10% of the dictionary, from which all
-the other words could be defined. The GK words turned out to have
-psycholinguistic correlates: they were learned at an earlier age and were more
-concrete than the rest of the dictionary. But one can compress still more: the
-GK turns out to have internal structure, with a strongly connected ""kernel
-core"" (KC) and a surrounding layer, from which a hierarchy of definitional
-distances can be derived, all the way out to the periphery of the full
-dictionary. These definitional distances, too, are correlated with
-psycholinguistic variables (age of acquisition, concreteness, imageability,
-oral and written frequency) and hence perhaps with the ""mental lexicon"" in each
-of our heads.
-"
-211,0912.0821,Filippo Petroni and Maurizio Serva,Lexical evolution rates by automated stability measure,cs.CL physics.soc-ph," Phylogenetic trees can be reconstructed from the matrix which contains the
-distances between all pairs of languages in a family. Recently, we proposed a
-new method which uses normalized Levenshtein distances among words with the
-same meaning and averages over all the items of a given list. Decisions about
-the number of items in the input lists for language comparison have been
-debated since the beginning of glottochronology.
The point is that words associated with
-some of the meanings have a rapid lexical evolution. Therefore, a large
-vocabulary comparison is only apparently more accurate than a smaller one,
-since many of the words do not carry any useful information. In principle, one
-should find the optimal length of the input lists by studying the stability of
-the different items. In this paper we tackle the problem with an automated
-methodology based only on our normalized Levenshtein distance. With this
-approach, the program of an automated reconstruction of language relationships
-is completed.
-"
-212,0912.0884,Filippo Petroni and Maurizio Serva,Measures of lexical distance between languages,cs.CL physics.soc-ph," The idea of measuring distance between languages seems to have its roots in
-the work of the French explorer Dumont D'Urville \cite{Urv}. He collected
-comparative word lists of various languages during his voyages aboard the
-Astrolabe from 1826 to 1829 and, in his work about the geographical division of
-the Pacific, he proposed a method to measure the degree of relation among
-languages. The method used by modern glottochronology, developed by Morris
-Swadesh in the 1950s, measures distances from the percentage of shared
-cognates, which are words with a common historical origin. Recently, we
-proposed a new automated method which uses normalized Levenshtein distance
-among words with the same meaning and averages over the words contained in a
-list. Recently another group of scholars \cite{Bak, Hol} proposed a refined
-version of our definition, including a second normalization. In this paper we
-compare the information content of our definition with the refined version in
-order to decide which of the two can be applied with greater success to resolve
-relationships among languages.
-"
-213,0912.1820,"Mirzanur Rahman, Sufal Das and Utpal Sharma",Parsing of part-of-speech tagged Assamese Texts,cs.CL," A natural language (or ordinary language) is a language that is spoken,
-written, or signed by humans for general-purpose communication, as
-distinguished from formal languages (such as computer-programming languages or
-the ""languages"" used in the study of formal logic). The computational
-activities required for enabling a computer to carry out information processing
-using natural language are called natural language processing. We have taken
-the Assamese language to check the grammar of input sentences. Our aim is to
-produce a technique to check the grammatical structures of the sentences in
-Assamese text. We have made grammar rules by analyzing the structures of
-Assamese sentences. Our parsing program finds the grammatical errors, if any,
-in the Assamese sentence. If there is no error, the program will generate the
-parse tree for the Assamese sentence.
-"
-214,0912.1829,Dang Tuan Nguyen and Ha Quy-Tinh Luong,"Document Searching System based on Natural Language Query Processing for
- Vietnam Open Courseware Library",cs.IR cs.CL," The need to build a searching system able to support users in expressing
-their searches as natural language queries is very important and opens a
-research direction with much potential. It combines the traditional methods of
-information retrieval and research on Question Answering (QA). In this paper,
-we introduce a searching system that we built for searching courses on the
-Vietnam OpenCourseWare Program (VOCW). It can be considered the first tool able
-to process users' Vietnamese questions.
The experimental results are rather good when we evaluate this system
-on precision.
-"
-215,0912.2881,"Lothar Lemnitzer, Laurent Romary (INRIA Saclay - Ile de France, IDSL),
- Andreas Witt",Representing human and machine dictionaries in Markup languages,cs.CL," In this chapter we present the main issues in representing machine-readable
-dictionaries in XML, and in particular according to the Text Encoding
-Initiative (TEI) guidelines.
-"
-216,0912.3747,Ion Androutsopoulos and Prodromos Malakasiotis,A Survey of Paraphrasing and Textual Entailment Methods,cs.CL cs.AI," Paraphrasing methods recognize, generate, or extract phrases, sentences, or
-longer natural language expressions that convey almost the same information.
-Textual entailment methods, on the other hand, recognize, generate, or extract
-pairs of natural language expressions, such that a human who reads (and trusts)
-the first element of a pair would most likely infer that the other element is
-also true. Paraphrasing can be seen as bidirectional textual entailment and
-methods from the two areas are often similar. Both kinds of methods are useful,
-at least in principle, in a wide range of natural language processing
-applications, including question answering, summarization, text generation, and
-machine translation. We summarize key ideas from the two areas by considering
-in turn recognition, generation, and extraction methods, also pointing to
-prominent articles and resources.
-"
-217,0912.3917,"Mustapha Guezouri, Larbi Mesbahi, Abdelkader Benyettou","Speech Recognition Oriented Vowel Classification Using Temporal Radial
- Basis Functions",cs.CL cs.MM," The recent resurgence of interest in spatio-temporal neural networks as
-speech recognition tools motivates the present investigation. In this paper an
-approach was developed based on temporal radial basis functions (TRBF),
-offering many advantages: few parameters, fast convergence and time invariance.
-This application aims to identify vowels taken from natural speech samples from
-the Timit corpus of American speech. We report a recognition accuracy of 98.06
-percent in training and 90.13 percent in test on a subset of 6 vowel phonemes,
-with the possibility of expanding the vowel set in future.
-"
-218,1001.2263,"N. Kalyani, Dr K. V. N. Sunitha",Syllable Analysis to Build a Dictation System in Telugu language,cs.CL cs.HC," In recent decades, speech interactive systems have gained increasing
-importance. To develop a dictation system like Dragon for Indian languages, it
-is most important to adapt the system to a speaker with minimum training. In
-this paper we focus on the importance of creating a speech database at the
-syllable level and identifying the minimum text to be considered while training
-any speech recognition system. There are systems developed for continuous
-speech recognition in English and in a few Indian languages like Hindi and
-Tamil. This paper gives the statistical details of syllables in Telugu and
-their use in minimizing the search space during recognition of speech. The
-minimum words that cover the maximum syllables are identified. This word list
-can be used for preparing a small text which can be used for collecting speech
-samples while training the dictation system. The results are plotted for the
-frequency of syllables and the number of syllables in each word. This approach
-is applied to the CIIL Mysore text corpus, which consists of 3 million words.
-"
-219,1001.2267,"M. A. Anusuya, S. K.
Katti","Speech Recognition by Machine, A Review",cs.CL," This paper presents a brief survey on Automatic Speech Recognition and -discusses the major themes and advances made in the past 60 years of research, -so as to provide a technological perspective and an appreciation of the -fundamental progress that has been accomplished in this important area of -speech communication. After years of research and development the accuracy of -automatic speech recognition remains one of the important research challenges -(e.g., variations of the context, speakers, and environment).The design of -Speech Recognition system requires careful attentions to the following issues: -Definition of various types of speech classes, speech representation, feature -extraction techniques, speech classifiers, database and performance evaluation. -The problems that are existing in ASR and the various techniques to solve these -problems constructed by various research workers have been presented in a -chronological order. Hence authors hope that this work shall be a contribution -in the area of speech recognition. The objective of this review paper is to -summarize and compare some of the well known methods used in various stages of -speech recognition system and identify research topic and applications which -are at the forefront of this exciting and challenging field. -" -220,1001.4273,"Siddhartha Jonnalagadda, Graciela Gonzalez",Sentence Simplification Aids Protein-Protein Interaction Extraction,cs.CL," Accurate systems for extracting Protein-Protein Interactions (PPIs) -automatically from biomedical articles can help accelerate biomedical research. -Biomedical Informatics researchers are collaborating to provide metaservices -and advance the state-of-art in PPI extraction. One problem often neglected by -current Natural Language Processing systems is the characteristic complexity of -the sentences in biomedical literature. In this paper, we report on the impact -that automatic simplification of sentences has on the performance of a -state-of-art PPI extraction system, showing a substantial improvement in recall -(8%) when the sentence simplification method is applied, without significant -impact to precision. -" -221,1001.4277,"Siddhartha Jonnalagadda, Luis Tari, Jorg Hakenberg, Chitta Baral and - Graciela Gonzalez","Towards Effective Sentence Simplification for Automatic Processing of - Biomedical Text",cs.CL," The complexity of sentences characteristic to biomedical articles poses a -challenge to natural language parsers, which are typically trained on -large-scale corpora of non-technical text. We propose a text simplification -process, bioSimplify, that seeks to reduce the complexity of sentences in -biomedical abstracts in order to improve the performance of syntactic parsers -on the processed sentences. Syntactic parsing is typically one of the first -steps in a text mining pipeline. Thus, any improvement in performance would -have a ripple effect over all processing steps. We evaluated our method using a -corpus of biomedical sentences annotated with syntactic links. Our empirical -results show an improvement of 2.90% for the Charniak-McClosky parser and of -4.23% for the Link Grammar parser when processing simplified sentences rather -than the original sentences in the corpus. 
-" -222,1001.4368,"Iina Hellsten, James Dawson, Loet Leydesdorff","Implicit media frames: Automated analysis of public debate on artificial - sweeteners",cs.IR cs.CL," The framing of issues in the mass media plays a crucial role in the public -understanding of science and technology. This article contributes to research -concerned with diachronic analysis of media frames by making an analytical -distinction between implicit and explicit media frames, and by introducing an -automated method for analysing diachronic changes of implicit frames. In -particular, we apply a semantic maps method to a case study on the newspaper -debate about artificial sweeteners, published in The New York Times (NYT) -between 1980 and 2006. Our results show that the analysis of semantic changes -enables us to filter out the dynamics of implicit frames, and to detect -emerging metaphors in public debates. Theoretically, we discuss the relation -between implicit frames in public debates and codification of information in -scientific discourses, and suggest further avenues for research interested in -the automated analysis of frame changes and trends in public debates. -" -223,1002.0478,"Odile Piton (SAMM), H\'el\`ene Pignot","\'Etude et traitement automatique de l'anglais du XVIIe si\`ecle : - outils morphosyntaxiques et dictionnaires",cs.CL," In this article, we record the main linguistic differences or singularities -of 17th century English, analyse them morphologically and syntactically and -propose equivalent forms in contemporary English. We show how 17th century -texts may be transcribed into modern English, combining the use of electronic -dictionaries with rules of transcription implemented as transducers. Apr\`es -avoir expos\'e la constitution du corpus, nous recensons les principales -diff\'erences ou particularit\'es linguistiques de la langue anglaise du XVIIe -si\`ecle, les analysons du point de vue morphologique et syntaxique et -proposons des \'equivalents en anglais contemporain (AC). Nous montrons comment -nous pouvons effectuer une transcription automatique de textes anglais du XVIIe -si\`ecle en anglais moderne, en combinant l'utilisation de dictionnaires -\'electroniques avec des r\`egles de transcriptions impl\'ement\'ees sous forme -de transducteurs. -" -224,1002.0479,"Odile Piton (SAMM), H\'el\`ene Pignot (SAMM)","""Mind your p's and q's"": or the peregrinations of an apostrophe in 17th - Century English",cs.CL," If the use of the apostrophe in contemporary English often marks the Saxon -genitive, it may also indicate the omission of one or more let-ters. Some -writers (wrongly?) use it to mark the plural in symbols or abbreviations, -visual-ised thanks to the isolation of the morpheme ""s"". This punctuation mark -was imported from the Continent in the 16th century. During the 19th century -its use was standardised. However the rules of its usage still seem problematic -to many, including literate speakers of English. ""All too often, the apostrophe -is misplaced"", or ""errant apostrophes are springing up every-where"" is a -complaint that Internet users fre-quently come across when visiting grammar -websites. Many of them detail its various uses and misuses, and attempt to -correct the most common mistakes about it, especially its mis-use in the -plural, called greengrocers' apostro-phes and humorously misspelled -""greengro-cers apostrophe's"". 
While studying English travel accounts published
-in the seventeenth century, we noticed that the different uses of this symbol
-may accompany various models of metaplasms. We were able to highlight the
-linguistic variations of some lexemes, and trace the origin of modern grammar
-rules governing its usage.
-"
-225,1002.0481,"Abdelmajid Ben Hamadou (MIRACL), Odile Piton, H\'ela Fehri (MIRACL)","Recognition and translation Arabic-French of Named Entities: case of the
- Sport places",cs.CL," The recognition of Arabic Named Entities (NE) is a problem in different
-domains of Natural Language Processing (NLP) like automatic translation.
-Indeed, NE translation allows access to multilingual information. This
-translation does not always lead to the expected result, especially when the
-NE contains a person name. For this reason, and in order to improve
-translation, we can transliterate some parts of the NE. In this context, we
-propose a method that integrates translation and transliteration together. We
-used the linguistic NooJ platform that is based on local grammars and
-transducers. In this paper, we focus on the sport domain. We will firstly
-suggest a refinement of the typological model presented at the MUC
-Conferences; then we will describe the integration of an Arabic
-transliteration module into the translation system. Finally, we will detail
-our method and give the results of the evaluation.
-"
-226,1002.0485,"Odile Piton, Klara Lagji","Morphological study of Albanian words, and processing with NooJ",cs.CL," We are developing electronic dictionaries and transducers for the automatic
-processing of the Albanian Language. We will analyze the words inside a linear
-segment of text. We will also study the relationship between units of sense and
-units of form. The composition of words takes different forms in Albanian. We
-have found that morphemes are frequently concatenated or simply juxtaposed or
-contracted. The inflectional grammar of NooJ allows constructing dictionaries
-of inflected forms (declensions or conjugations). The diversity of word
-structures requires tools to identify words created by simple concatenation, or
-to treat contractions. The morphological tools of NooJ allow us to create
-grammatical tools to represent and treat these phenomena. But certain problems
-exceed morphological analysis and must be represented by syntactic grammars.
-"
-227,1002.0773,Steven Wegmann,"Approximations to the MMI criterion and their effect on lattice-based
- MMI",cs.CL," Maximum mutual information (MMI) is a model selection criterion used for
-hidden Markov model (HMM) parameter estimation that was developed more than
-twenty years ago as a discriminative alternative to the maximum likelihood
-criterion for HMM-based speech recognition. It has been shown in the speech
-recognition literature that parameter estimation using the current MMI
-paradigm, lattice-based MMI, consistently outperforms maximum likelihood
-estimation, but this is at the expense of undesirable convergence properties.
-In particular, recognition performance is sensitive to the number of times that
-the iterative MMI estimation algorithm, extended Baum-Welch, is performed. In
-fact, too many iterations of extended Baum-Welch will lead to degraded
-performance, despite the fact that the MMI criterion improves at each
-iteration.
This phenomenon is at variance with the analogous behavior of
-maximum likelihood estimation -- at least for the HMMs used in speech
-recognition -- and it has previously been attributed to `over fitting'. In this
-paper, we present an analysis of lattice-based MMI that demonstrates, first of
-all, that the asymptotic behavior of lattice-based MMI is much worse than was
-previously understood, i.e. it does not appear to converge at all, and, second
-of all, that this is not due to `over fitting'. Instead, we demonstrate that
-the `over fitting' phenomenon is the result of standard methodology that
-exacerbates the poor behavior of two key approximations in the lattice-based
-MMI machinery. We also demonstrate that if we modify the standard methodology
-to improve the validity of these approximations, then the convergence
-properties of lattice-based MMI become benign without sacrificing improvements
-to recognition accuracy.
-"
-228,1002.0904,Serguei A. Mokhov,On Event Structure in the Torn Dress,cs.CL," Using Pustejovsky's ""The Syntax of Event Structure"" and Fong's ""On Mending a
-Torn Dress"" we give a glimpse of a Pustejovsky-like analysis of some example
-sentences in Fong. We attempt to give a framework for the semantics of the noun
-phrases and adverbs as appropriate, as well as the lexical entries for all
-words in the examples, and critique both papers in light of our findings and
-difficulties.
-"
-229,1002.1095,Frank Rudzicz and Serguei A. Mokhov,"Towards a Heuristic Categorization of Prepositional Phrases in English
- with WordNet",cs.CL," This document discusses an approach, and its rudimentary realization, towards
-the automatic classification of PPs, a topic that has not received as much
-attention in NLP as NPs and VPs. The approach is a rule-based heuristic
-outlined in several levels of our research. There are 7 semantic categories of
-PPs considered in this document that we are able to classify from an annotated
-corpus.
-"
-230,1002.1919,"Somnuk Sinthupoun, Ohm Sornil",Thai Rhetorical Structure Analysis,cs.CL," Rhetorical structure analysis (RSA) explores discourse relations among
-elementary discourse units (EDUs) in a text. It is very useful in many text
-processing tasks employing relationships among EDUs such as text understanding,
-summarization, and question-answering. The Thai language, with its distinctive
-linguistic characteristics, requires a unique technique. This article proposes
-an approach for Thai rhetorical structure analysis. First, EDUs are segmented
-by two hidden Markov models derived from syntactic rules. A rhetorical
-structure tree is constructed by a clustering technique whose similarity
-measure is derived from Thai semantic rules. Then, a decision tree whose
-features are derived from the semantic rules is used to determine discourse
-relations.
-"
-231,1002.2034,Christophe Roche (LISTIC),Dire n'est pas concevoir,cs.AI cs.CL," The conceptual modelling built from text is rarely an ontology. As a matter
-of fact, such a conceptualization is corpus-dependent and does not offer the
-main properties we expect from an ontology. Furthermore, an ontology extracted
-from text in general does not match an ontology defined by an expert using a
-formal language. This is not surprising since an ontology is an
-extra-linguistic conceptualization whereas knowledge extracted from text is the
-concern of textual linguistics. The incompleteness of text and the use of
-rhetorical figures, like ellipsis, modify the perception of the
-conceptualization we may have.
-Ontological knowledge, which is necessary for text understanding, is in
-general not embedded in documents.
-"
-232,1002.3320,Raungrong Suleesathira,"Co-channel Interference Cancellation for Space-Time Coded OFDM Systems
- Using Adaptive Beamforming and Null Deepening",cs.CL," Combined with space-time coding, the orthogonal frequency division
-multiplexing (OFDM) system explores space diversity. It is a potential scheme
-to offer spectral efficiency and robust high data rate transmissions over
-frequency-selective fading channels. However, space-time coding impairs the
-system's ability to suppress interference, as the signals transmitted from the
-two transmit antennas are superposed and interfere at the receiver antennas. In
-this paper, we developed an adaptive beamformer based on the least mean squared
-error algorithm and null deepening to combat co-channel interference (CCI) for
-the space-time coded OFDM (STC-OFDM) system. To illustrate the performance of
-the presented approach, it is compared to the null steering beamformer, which
-requires prior knowledge of the directions of arrival (DOAs). The structure of
-the space-time decoders is preserved although beamformers are used before
-decoding. By incorporating the proposed beamformer as a CCI canceller in the
-STC-OFDM systems, a performance improvement is achieved, as shown in the
-simulation results.
-"
-233,1002.4665,"Jordan Boyd-Graber, David M. Blei",Syntactic Topic Models,cs.CL cs.AI math.ST stat.TH," The syntactic topic model (STM) is a Bayesian nonparametric model of language
-that discovers latent distributions of words (topics) that are both
-semantically and syntactically coherent. The STM models dependency parsed
-corpora where sentences are grouped into documents. It assumes that each word
-is drawn from a latent topic chosen by combining document-level features and
-the local syntactic context. Each document has a distribution over latent
-topics, as in topic models, which provides the semantic consistency. Each
-element in the dependency parse tree also has a distribution over the topics of
-its children, as in latent-state syntax models, which provides the syntactic
-consistency. These distributions are convolved so that the topic of each word
-is likely under both its document and syntactic context. We derive a fast
-posterior inference algorithm based on variational methods. We report
-qualitative and quantitative studies on both synthetic data and hand-parsed
-documents. We show that the STM is a more predictive model of language than
-current models based only on syntax or only on topics.
-"
-234,1002.4820,"Yann Desalle (CLLE, Lordat), Bruno Gaume (CLLE), Karine Duvignau
- (CLLE, Erss)",SLAM : Solutions lexicales automatique pour m\'etaphores,cs.CL," This article presents SLAM, an Automatic Solver for Lexical Metaphors like
-""d\'eshabiller* une pomme"" (to undress* an apple). SLAM calculates a
-conventional solution for these productions. To do so, SLAM has to intersect
-the paradigmatic axis of the metaphorical verb ""d\'eshabiller*"", where
-""peler"" (""to peel"") comes closest, with a syntagmatic axis that comes from
-a corpus where ""peler une pomme"" (to peel an apple) is semantically and
-syntactically regular. We test this model on DicoSyn, which is a ""small world""
-network of synonyms, to compute the paradigmatic axis, and on Frantext.20, a
-French corpus, to compute the syntagmatic axis.
Further, we evaluate the model
-with a sample of an experimental corpus from the Flexsem database.
-"
-235,1003.0206,Steven Wegmann and Larry Gillick,"Why has (reasonably accurate) Automatic Speech Recognition been so hard
- to achieve?",cs.CL," Hidden Markov models (HMMs) have been successfully applied to automatic
-speech recognition for more than 35 years in spite of the fact that a key HMM
-assumption -- the statistical independence of frames -- is obviously violated
-by speech data. In fact, this data/model mismatch has inspired many attempts to
-modify or replace HMMs with alternative models that are better able to take
-into account the statistical dependence of frames. However it is fair to say
-that in 2010 the HMM is the consensus model of choice for speech recognition
-and that HMMs are at the heart of both commercially available products and
-contemporary research systems. In this paper we present a preliminary
-exploration aimed at understanding how speech data depart from HMMs and what
-effect this departure has on the accuracy of HMM-based speech recognition. Our
-analysis uses standard diagnostic tools from the field of statistics --
-hypothesis testing, simulation and resampling -- which are rarely used in the
-field of speech recognition. Our main result, obtained by novel manipulations
-of real and resampled data, demonstrates that real data have statistical
-dependency and that this dependency is responsible for significant numbers of
-recognition errors. We also demonstrate, using simulation and resampling, that
-if we `remove' the statistical dependency from data, then the resulting
-recognition error rates become negligible. Taken together, these results
-suggest that a better understanding of the structure of the statistical
-dependency in speech data is a crucial first step towards improving HMM-based
-speech recognition.
-"
-236,1003.0337,Andrey Kutuzov,"Change of word types to word tokens ratio in the course of translation
- (based on Russian translations of K. Vonnegut novels)",cs.CL," The article provides a lexical statistical analysis of two of K. Vonnegut's
-novels and their Russian translations. It is found that the rate at which the
-word types to word tokens ratio changes differs between the source and target
-texts. The author hypothesizes that these changes are typical for
-English-Russian translations and, moreover, that they represent an example of
-Baker's translation feature of levelling out.
-"
-237,1003.0628,"Yi Mao, Krishnakumar Balasubramanian, Guy Lebanon",Linguistic Geometries for Unsupervised Dimensionality Reduction,cs.CL," Text documents are complex high dimensional objects. To effectively visualize
-such data it is important to reduce its dimensionality and visualize the low
-dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore
-dimensionality reduction methods that draw upon domain knowledge in order to
-achieve a better low dimensional embedding and visualization of documents. We
-consider the use of geometries specified manually by an expert, geometries
-derived automatically from corpus statistics, and geometries computed from
-linguistic resources.
-"
-238,1003.1141,Peter D. Turney and Patrick Pantel,From Frequency to Meaning: Vector Space Models of Semantics,cs.CL cs.IR cs.LG," Computers understand very little of the meaning of human language.
This
-profoundly limits our ability to give instructions to computers, the ability of
-computers to explain their actions to us, and the ability of computers to
-analyse and process text. Vector space models (VSMs) of semantics are beginning
-to address these limits. This paper surveys the use of VSMs for semantic
-processing of text. We organize the literature on VSMs according to the
-structure of the matrix in a VSM. There are currently three broad classes of
-VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
-three classes of applications. We survey a broad range of applications in these
-three categories and we take a detailed look at a specific open source project
-in each category. Our goal in this survey is to show the breadth of
-applications of VSMs for semantics, to provide a new perspective on VSMs for
-those who are already familiar with the area, and to provide pointers into the
-literature for those who are less familiar with the field.
-"
-239,1003.1399,"Peter Vaclavik, Jaroslav Poruban, Marek Mezei","Automatic derivation of domain terms and concept location based on the
- analysis of the identifiers",cs.CL," Developers express the meaning of domain ideas in specifically selected
-identifiers and comments that form the implemented code. Software
-maintenance requires knowledge and understanding of the encoded ideas. This
-paper presents a way to automatically create a domain vocabulary. Knowledge of
-the domain vocabulary supports the comprehension of a specific domain for later
-code maintenance or evolution. We present experiments conducted in two selected
-domains: application servers and web frameworks. Knowledge of domain terms
-enables easy localization of chunks of code that belong to a certain term. We
-consider these chunks of code as ""concepts"" and their placement in the code as
-""concept location"". Application developers may also benefit from the obtained
-domain terms. These terms are parts of speech that characterize a certain
-concept. Concepts are encoded in ""classes"" (OO paradigm) and the obtained
-vocabulary of terms supports the selection and the comprehension of a class's
-appropriate identifiers. We measured the following software products with our
-tool: JBoss, JOnAS, GlassFish, Tapestry, Google Web Toolkit and Echo2.
-"
-240,1003.1410,"Seungyeon Kim, Guy Lebanon",Local Space-Time Smoothing for Version Controlled Documents,cs.GR cs.CL cs.LG," Unlike static documents, version controlled documents are continuously edited
-by one or more authors. Such a collaborative revision process makes traditional
-modeling and visualization techniques inappropriate. In this paper we propose a
-new representation based on local space-time smoothing that captures important
-revision patterns. We demonstrate the applicability of our framework using
-experiments on synthetic and real-world data.
-"
-241,1003.1455,"Rama N., Meenakshi Lakshmanan","A Computational Algorithm based on Empirical Analysis, that Composes
- Sanskrit Poetry",cs.CL," Poetry-writing in Sanskrit is riddled with problems for even those who know
-the language well. This is so because the rules that govern Sanskrit prosody
-are numerous and stringent. We propose a computational algorithm that converts
-prose given as E-text into poetry in accordance with the metrical rules of
-Sanskrit prosody, simultaneously taking care to ensure that sandhi or euphonic
-conjunction, which is compulsory in verse, is handled.
The algorithm is
-considerably sped up by a novel method of reducing the target search
-database. The algorithm further gives suggestions to the poet in case what
-he/she has given as the input prose is impossible to fit into any allowed
-metrical format. There is also an interactive component of the algorithm by
-which the algorithm interacts with the poet to resolve ambiguities. In
-addition, this unique work, which provides a solution to a problem that has
-never been addressed before, also provides a simple yet effective speech
-recognition interface that would help the visually impaired dictate words in
-E-text, which is in turn versified by our Poetry Composer Engine.
-"
-242,1003.4065,"Chien-Ying Chen, Jen-Yuan Yeh, Hao-Ren Ke",Plagiarism Detection using ROUGE and WordNet,cs.OH cs.CL," With the arrival of the digital era and the Internet, the lack of information
-control provides an incentive for people to freely use any content available to
-them. Plagiarism occurs when users fail to credit the original owner for the
-content referred to, and such behavior leads to the violation of intellectual
-property. Two main approaches to plagiarism detection are fingerprinting and
-term occurrence; however, one common weakness shared by both approaches,
-especially fingerprinting, is the inability to detect modified text plagiarism.
-This study proposes the adoption of ROUGE and WordNet for plagiarism detection.
-The former includes n-gram co-occurrence statistics, skip-bigram, and longest
-common subsequence (LCS), while the latter acts as a thesaurus and provides
-semantic information. N-gram co-occurrence statistics can detect verbatim
-copying and certain sentence modifications, skip-bigram and LCS are immune to
-text modifications such as the simple addition or deletion of words, and
-WordNet may handle the problem of word substitution.
-"
-243,1003.4149,"Claude Martineau (IGM-LabInfo), Elsa Tolone (IGM-LabInfo), Stavroula
- Voyatzi (IGM-LabInfo)","Les Entit\'es Nomm\'ees : usage et degr\'es de pr\'ecision et de
- d\'esambigu\""isation",cs.CL," The recognition and classification of Named Entities (NER) are regarded as an
-important component for many Natural Language Processing (NLP) applications.
-The classification is usually made by taking into account the immediate context
-in which the NE appears. In some cases, this immediate context does not allow
-getting the right classification. We show in this paper that the use of an
-extended syntactic context and large-scale resources could be very useful in
-the NER task.
-"
-244,1003.4394,"Bob Coecke, Mehrnoosh Sadrzadeh, Stephen Clark","Mathematical Foundations for a Compositional Distributional Model of
- Meaning",cs.CL cs.LO math.CT," We propose a mathematical framework for a unification of the distributional
-theory of meaning in terms of vector space models, and a compositional theory
-for grammatical types, for which we rely on the algebra of Pregroups,
-introduced by Lambek. This mathematical framework enables us to compute the
-meaning of a well-typed sentence from the meanings of its constituents.
-Concretely, the type reductions of Pregroups are `lifted' to morphisms in a
-category, a procedure that transforms meanings of constituents into a meaning
-of the (well-typed) whole. Importantly, meanings of whole sentences live in a
-single space, independent of the grammatical structure of the sentence. Hence
-the inner-product can be used to compare meanings of arbitrary sentences, as it
-is for comparing the meanings of words in the distributional model.
The
-mathematical structure we employ admits a purely diagrammatic calculus which
-exposes how the information flows between the words in a sentence in order to
-make up the meaning of the whole sentence. A variation of our `categorical
-model' which involves constraining the scalars of the vector spaces to the
-semiring of Booleans results in a Montague-style Boolean-valued semantics.
-"
-245,1003.4894,"Michel Aurnague (CLLE), Laure Vieu (IRIT), Andr\'ee Borillo (CLLE)",La repr\'esentation formelle des concepts spatiaux dans la langue,cs.CL," In this chapter, we assume that systematically studying the semantics of
-spatial markers in language provides a means to reveal fundamental properties
-and concepts characterizing conceptual representations of space. We propose a
-formal system accounting for the properties highlighted by the linguistic
-analysis, and we use these tools for representing the semantic content of
-several spatial relations of French. The first part presents a semantic
-analysis of the expression of space in French aiming at describing the
-constraints that formal representations have to take into account. In the
-second part, after presenting the structure of our formal system, we set out
-its components. A commonsense geometry is sketched out and several functional
-and pragmatic spatial concepts are formalized. We pay special attention to
-showing that these concepts are well suited to representing the semantic
-content of several prepositions of French ('sur' (on), 'dans' (in), 'devant'
-(in front of), 'au-dessus' (above)), and to illustrating the inferential
-adequacy of these representations.
-"
-246,1003.4898,"Michel Aurnague (CLLE), Maya Hickmann (SFLTAMP), Laure Vieu (IRIT)","Les entit\'es spatiales dans la langue : \'etude descriptive, formelle
- et exp\'erimentale de la cat\'egorisation",cs.CL," While previous linguistic and psycholinguistic research on space has mainly
-analyzed spatial relations, the studies reported in this paper focus on how
-language distinguishes among spatial entities. Descriptive and experimental
-studies first propose a classification of entities, which accounts for both
-static and dynamic space, has some cross-linguistic validity, and underlies
-adults' cognitive processing. Formal and computational analyses then
-introduce theoretical elements aiming at modelling these categories, while
-fulfilling various properties of formal ontologies (generality, parsimony,
-coherence...). This formal framework accounts, in particular, for functional
-dependences among entities underlying some part-whole descriptions. Finally,
-developmental research shows that language-specific properties have a clear
-impact on how children talk about space. The results suggest some
-cross-linguistic variability in children's spatial representations from an
-early age onwards, bringing into question models in which general cognitive
-capacities are the only determinants of spatial cognition during the course
-of development.
-"
-247,1003.5372,"Stergos Afantenos, Pascal Denis, Philippe Muller, Laurence Danlos",Learning Recursive Segments for Discourse Parsing,cs.CL," Automatically detecting discourse segments is an important preliminary step
-towards full discourse parsing. Previous research on discourse segmentation
-has relied on the assumption that elementary discourse units (EDUs) in a
-document always form a linear sequence (i.e., they can never be nested).
-Unfortunately, this assumption turns out to be too strong, as some theories
-of discourse, such as SDRT, allow for nested discourse units. In this paper,
-we present a simple approach to discourse segmentation that is able to
-produce nested EDUs. Our approach builds on standard multi-class
-classification techniques combined with a simple repairing heuristic that
-enforces global coherence. Our system was developed and evaluated on the
-first round of annotations provided by the French Annodis project (an ongoing
-effort to create a discourse bank for French). Cross-validated on only 47
-documents (1,445 EDUs), our system achieves encouraging performance results
-with an F-score of 73% for finding EDUs.
-"
-248,1003.5749,"Iris Eshkol (CORAL), Isabelle Tellier (LIFO), Taalab Samer (LIFO),
- Sylvie Billot (LIFO)","Etiqueter un corpus oral par apprentissage automatique \`a l'aide de
- connaissances linguistiques",cs.LG cs.CL," Thanks to the Eslo1 (""Enqu\^ete sociolinguistique d'Orl\'eans"", i.e.
-""Sociolinguistic Inquiry of Orl\'eans"") campaign, a large oral corpus has
-been gathered and transcribed in a textual format. The purpose of the work
-presented here is to associate a morpho-syntactic label with each unit of
-this corpus. To this aim, we have first studied the specificities of the
-necessary labels, and their various possible levels of description. This
-study has led to a new, original hierarchical structuring of labels. Then,
-considering that our new set of labels differs from those used by available
-software tools, which are usually not suited to oral data, we have built a
-new labeling tool using a Machine Learning approach, from data labeled by
-Cordial and corrected by hand. We have applied linear CRFs (Conditional
-Random Fields), trying to take the best possible advantage of the linguistic
-knowledge that was used to define the set of labels. We obtain an accuracy
-between 85 and 90%, depending on the parameters used.
-"
-249,1004.3183,"Juan-Manuel Torres Moreno, Silvia Fernandez and Eric SanJuan",Statistical Physics for Natural Language Processing,cs.CL cond-mat.stat-mech cs.IR," This paper has been withdrawn by the author.
-"
-250,1004.4181,Glyn Morrill and Oriol Valent\'in,Displacement Calculus,cs.CL," The Lambek calculus provides a foundation for categorial grammar in the
-form of a logic of concatenation. But natural language is characterized by
-dependencies which may also be discontinuous. In this paper we introduce the
-displacement calculus, a generalization of Lambek calculus, which preserves
-its good proof-theoretic properties while embracing discontinuity and
-subsuming it. We illustrate linguistic applications and prove
-Cut-elimination, the subformula property, and decidability.
-"
-251,1004.4848,M. Ausloos,Punctuation effects in English and Esperanto texts,cs.CL physics.data-an," A statistical physics study of punctuation effects on sentence lengths is
-presented for written texts: {\it Alice in Wonderland} and {\it Through a
-Looking Glass}. The translation of the first text into Esperanto is also
-considered as a test for the role of punctuation in defining a style, and for
-contrasting natural and artificial, but written, languages. Several log-log
-plots of the sentence length-rank relationship are presented for the major
-punctuation marks. Different power laws are observed with characteristic
-exponents. The exponent can take a value much less than unity ($ca.$ 0.50 or
-0.30) depending on how a sentence is defined.
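-As a minimal illustration of how such a characteristic exponent can be
-estimated from a length-rank relationship (our sketch, not the author's code;
-it assumes only a plain list of sentence lengths):
-
-    # Minimal sketch: estimate the exponent gamma of a power law
-    # length ~ rank^(-gamma) by least squares in log-log space.
-    import numpy as np
-
-    def rank_exponent(lengths):
-        ranked = np.sort(np.asarray(lengths, dtype=float))[::-1]  # rank 1 = longest
-        ranks = np.arange(1, len(ranked) + 1)
-        slope, _ = np.polyfit(np.log(ranks), np.log(ranked), 1)
-        return -slope  # the characteristic exponent
-
-    print(rank_exponent([94, 60, 41, 33, 27, 23, 20, 18, 16, 15]))
-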
The texts are also mapped into
-time series based on the word frequencies. The quantitative differences
-between the original and translated texts are very minute, at the exponent
-level. It is argued that sentences seem to be more reliable than word
-distributions in discussing an author's style.
-"
-252,1005.3902,Nabil Hathout (CLLE),Morphonette: a morphological network of French,cs.CL," This paper describes in detail the first version of Morphonette, a new
-French morphological resource and a new radically lexeme-based method of
-morphological analysis. This research is grounded in a paradigmatic
-conception of derivational morphology where the morphological structure is a
-structure of the entire lexicon and not one of the individual words it
-contains. The discovery of this structure relies on a measure of
-morphological similarity between words, on formal analogy and on the
-properties of two morphological paradigms:
-"
-253,1005.4697,Jeroen Bransen,The Lambek-Grishin calculus is NP-complete,cs.CL," The Lambek-Grishin calculus LG is the symmetric extension of the
-non-associative Lambek calculus NL. In this paper we prove that the
-derivability problem for LG is NP-complete.
-"
-254,1005.4997,"Sitabhra Sinha, Md Izhar Ashraf, Raj Kumar Pan and Bryan Kenneth Wells","Network analysis of a corpus of undeciphered Indus civilization
- inscriptions indicates syntactic organization",cs.CL physics.data-an physics.soc-ph," Archaeological excavations in the sites of the Indus Valley civilization
-(2500-1900 BCE) in Pakistan and northwestern India have unearthed a large
-number of artifacts with inscriptions made up of hundreds of distinct signs.
-To date there is no generally accepted decipherment of these sign sequences
-and there have been suggestions that the signs could be non-linguistic. Here
-we apply complex network analysis techniques to a database of available Indus
-inscriptions, with the aim of detecting patterns indicative of syntactic
-organization. Our results show the presence of patterns, e.g., recursive
-structures in the segmentation trees of the sequences, that suggest the
-existence of a grammar underlying these inscriptions.
-"
-255,1005.5253,"Sergio Guadarrama (1) and David P. Pancho (1) ((1) European Centre for
- Soft Computing)","Using Soft Constraints To Learn Semantic Models Of Descriptions Of
- Shapes",cs.CL cs.AI cs.HC cs.LG," The contribution of this paper is to provide a semantic model (using soft
-constraints) of the words used by web-users to describe objects in a language
-game; a game in which one user describes a selected object of those composing
-the scene, and another user has to guess which object has been described. The
-given description needs to be unambiguous and accurate enough to allow other
-users to guess the described shape correctly.
- To build these semantic models, the descriptions need to be analyzed to
-extract the syntax and word classes used. We have modeled the meaning of
-these descriptions using soft constraints as a way of grounding the meaning.
- The descriptions generated by the system took into account the context of
-the object to avoid ambiguous descriptions, and allowed users to guess the
-described object correctly 72% of the time.
-"
-256,1005.5466,Solomiya Buk,"Quantitative parametrization of texts written by Ivan Franko: An attempt
- of the project",cs.CL," In the article, the project of quantitative parametrization of all texts by
-Ivan Franko is presented.
This can be done only by using modern
-computer techniques, after the frequency dictionaries for all of Franko's
-works have been compiled. The paper describes the application spheres,
-methodology, stages, principles and peculiarities in the compilation of the
-frequency dictionary of the second half of the 19th century - the beginning
-of the 20th century. The relation between the Ivan Franko frequency
-dictionary, the explanatory dictionary of the writer's language and the text
-corpus is discussed.
-"
-257,1005.5596,"Matthieu Constant (IGM-LabInfo), Elsa Tolone (IGM-LabInfo)",A generic tool to generate a lexicon for NLP from Lexicon-Grammar tables,cs.CL," Lexicon-Grammar tables constitute a large-coverage syntactic lexicon but
-they cannot be directly used in Natural Language Processing (NLP)
-applications because they sometimes rely on implicit information. In this
-paper, we introduce LGExtract, a generic tool for generating a syntactic
-lexicon for NLP from the Lexicon-Grammar tables. It is based on a global
-table that contains undefined information and on a unique extraction script
-including all operations to be performed for all tables. We also present an
-experiment that has been conducted to generate a new lexicon of French verbs
-and predicative nouns.
-"
-258,1006.0153,Solomiya Buk,"Ivan Franko's novel Dlja domashnjoho ohnyshcha (For the Hearth) in the
- light of the frequency dictionary",cs.CL," In the article, the methodology and the principles of the compilation of
-the Frequency dictionary for Ivan Franko's novel Dlja domashnjoho ohnyshcha
-(For the Hearth) are described. The following statistical parameters of the
-novel vocabulary are obtained: variety, exclusiveness, concentration indexes,
-correlation between word rank and text coverage, etc. The main quantitative
-characteristics of Franko's novels Perekhresni stezhky (The Cross-Paths) and
-Dlja domashnjoho ohnyshcha are compared on the basis of their frequency
-dictionaries.
-"
-259,1006.1343,Fionn Murtagh and Adam Ganz,"Segmentation and Nodal Points in Narrative: Study of Multiple Variations
- of a Ballad",cs.CL stat.ML," The Lady Maisry ballads afford us a framework within which to segment a
-storyline into its major components. Segments and as a consequence nodal
-points are discussed for nine different variants of the Lady Maisry story of
-a (young) woman being burnt to death by her family, on account of her
-becoming pregnant by a foreign personage. We motivate the importance of nodal
-points in textual and literary analysis. We show too how the openings of the
-nine variants can be analyzed comparatively, and also the conclusions of the
-ballads.
-"
-260,1006.1786,Diederik Aerts,Measuring Meaning on the World-Wide Web,cs.AI cs.CL," We introduce the notion of the 'meaning bound' of a word with respect to
-another word by making use of the World-Wide Web as a conceptual environment
-for meaning. The meaning bound of a word with respect to another word is
-established by multiplying the number of webpages containing both words by
-the total number of webpages of the World-Wide Web, and dividing the result
-by the product of the numbers of webpages for each of the single words. We
-calculate the meaning bounds for several words and analyze different aspects
-of these by looking at specific examples.
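-Written as a formula (our notation, not the paper's: $n_{AB}$ is the number
-of webpages containing both words, $n_A$ and $n_B$ are the single-word page
-counts, and $N$ is the total number of webpages):
-
-    M(A,B) \;=\; \frac{n_{AB} \cdot N}{n_A \cdot n_B}
-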
-"
-261,1006.1930,"Diederik Aerts, Marek Czachor, Bart D'Hooghe and Sandro Sozzo",The Pet-Fish problem on the World-Wide Web,cs.AI cs.CL," We identify the presence of Pet-Fish problem situations and the
-corresponding Guppy effect of concept theory on the World-Wide Web. For this
-purpose, we introduce absolute weights for words expressing concepts and
-relative weights between words expressing concepts, and the notion of
-'meaning bound' between two words expressing concepts, making explicit use of
-the conceptual structure of the World-Wide Web. The Pet-Fish problem occurs
-whenever there are exemplars - in the case of Pet and Fish these can be Guppy
-or Goldfish - for which the meaning bound with respect to the conjunction is
-stronger than the meaning bounds with respect to the individual concepts.
-"
-262,1006.2809,"A.A Zaidan, B.B Zaidan, Hamid.A.Jalab, Hamdan.O.Alanazi and Rami
- Alnaqeib",Offline Arabic Handwriting Recognition Using Artificial Neural Network,cs.CL," The ambition of a character recognition system is to transform a text
-document typed on paper into a digital format that can be manipulated by word
-processor software. Arabic has unique features that other languages do not
-have; its script is shared, with variations, by seven or eight other
-languages, such as Urdu, Jawi and Persian. Arabic has twenty-eight letters,
-each of which can be linked in three different ways or written separately,
-depending on the case. The difficulty of Arabic handwriting recognition is
-that the accuracy of character recognition affects the accuracy of word
-recognition; in addition, each character has two or three forms. The
-suggested solution, based on an artificial neural network, can solve these
-problems and overcome the difficulty of Arabic handwriting recognition.
-"
-263,1006.2835,P. Venkata Subba Reddy,"Fuzzy Modeling and Natural Language Processing for Panini's Sanskrit
- Grammar",cs.CL," Indian languages have a long history among the world's natural languages.
-Panini was the first to define a grammar for the Sanskrit language, with
-about 4000 rules, in the fifth century. These rules contain uncertain
-information, and computer processing of the Sanskrit language is not possible
-with such uncertainty. In this paper, fuzzy logic and fuzzy reasoning are
-proposed to eliminate the uncertain information when reasoning with Sanskrit
-grammar. Sanskrit language processing is also discussed in this paper.
-"
-264,1006.3271,"Anne S. Hsu (Univ. College, University London), Nick Chater (Univ.
- College, University London), Paul M.B. Vitanyi (CWI, Amsterdam)","The probabilistic analysis of language acquisition: Theoretical,
- computational, and experimental analysis",cs.CL physics.data-an q-bio.NC," There is much debate over the degree to which language learning is governed
-by innate language-specific biases, or acquired through cognition-general
-principles. Here we examine the probabilistic language acquisition hypothesis
-on three levels: We outline a novel theoretical result showing that it is
-possible to learn the exact generative model underlying a wide class of
-languages, purely from observing samples of the language. We then describe a
-recently proposed practical framework, which quantifies natural language
-learnability, allowing specific learnability predictions to be made for the
-first time.
In previous work, this framework was used to make learnability
-predictions for a wide variety of linguistic constructions, for which
-learnability has been much debated. Here, we present a new experiment which
-tests these learnability predictions. We find that our experimental results
-support the possibility that these linguistic constructions are acquired
-probabilistically from cognition-general principles.
-"
-265,1006.3787,Serguei A. Mokhov,"Complete Complementary Results Report of the MARF's NLP Approach to the
- DEFT 2010 Competition",cs.CL," This companion paper complements the main DEFT'10 article describing the
-MARF approach (arXiv:0905.1235) to the DEFT'10 NLP challenge (described at
-http://www.groupes.polymtl.ca/taln2010/deft.php in French). This paper aims
-to present the complete result sets of all the conducted experiments and
-their settings in the resulting tables, highlighting the approach and the
-best results, but also showing the weaker and worst results and their
-subsequent analysis. This particular work focuses on the application of
-MARF's classical and NLP pipelines to identification tasks within various
-francophone corpora: identifying the decade in which certain articles were
-published, for the first track (Piste 1), and the place of origin of a
-publication (Piste 2), such as the journal and location (France vs. Quebec).
-This is the sixth iteration of the release of the results.
-"
-266,1006.5827,Sergio Guadarrama and Antonio Ruiz-Mayor,"Approximate Robotic Mapping from sonar data by modeling Perceptions with
- Antonyms",cs.RO cs.CL," This work, inspired by the idea of ""Computing with Words and Perceptions""
-proposed by Zadeh in 2001, focuses on how to transform measurements into
-perceptions for the problem of map building by Autonomous Mobile Robots. We
-propose to model the perceptions obtained from sonar-sensors as two grid
-maps: one for obstacles and another for empty spaces. The rules used to build
-and integrate these maps are expressed by linguistic descriptions and modeled
-by fuzzy rules. The main difference of this approach from other studies
-reported in the literature is that the method presented here is based on the
-hypothesis that the concepts ""occupied"" and ""empty"" are antonyms rather
-than complementary (as in probabilistic approaches) or independent (as in the
-previous fuzzy models).
- Controlled experimentation with a real robot in three representative indoor
-environments has been performed and the results presented. We offer a
-qualitative and quantitative comparison of the estimated maps obtained by the
-probabilistic approach, the previous fuzzy method and the new antonyms-based
-fuzzy approach. It is shown that the maps obtained with the antonyms-based
-approach are better defined, capture the shape of the walls and of the empty
-spaces better, and contain fewer errors due to rebounds and short echoes.
-Furthermore, in spite of the noise and low resolution inherent in the
-sonar-sensors used, the maps obtained are accurate and tolerant to
-imprecision.
-"
-267,1006.5880,Stergos Afantenos and Nicholas Asher,Testing SDRT's Right Frontier,cs.CL," The Right Frontier Constraint (RFC), as a constraint on the attachment of
-new constituents to an existing discourse structure, has important
-implications for the interpretation of anaphoric elements in discourse and
-for Machine Learning (ML) approaches to learning discourse structures. In
-this paper we provide strong empirical support for SDRT's version of RFC.
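-For intuition about the constraint being tested: in a plain attachment-tree
-view, the right frontier is the set of nodes on the path from the root to the
-most recently attached unit. The sketch below is ours and deliberately
-simplified (SDRT's actual definition also depends on the distinction between
-subordinating and coordinating relations); the unit names are hypothetical:
-
-    # Minimal sketch: nodes on which attachment of a new unit is licensed.
-    def right_frontier(parent, last):
-        """parent maps each unit to its parent (the root maps to None);
-        last is the most recently attached unit."""
-        frontier = []
-        node = last
-        while node is not None:
-            frontier.append(node)
-            node = parent[node]
-        return frontier
-
-    parent = {"pi0": None, "pi1": "pi0", "pi2": "pi1", "pi3": "pi1", "pi4": "pi3"}
-    print(right_frontier(parent, "pi4"))  # ['pi4', 'pi3', 'pi1', 'pi0']
-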
The analysis of about 100
-doubly annotated documents by five different naive annotators shows that
-SDRT's RFC is respected about 95% of the time. The qualitative analysis of
-presumed violations that we have performed shows that they are either
-click-errors or structural misconceptions.
-"
-268,1007.0936,"Jaroslaw Kwapien, Stanislaw Drozdz, Adam Orczyk","Linguistic complexity: English vs. Polish, text vs. corpus",cs.CL physics.soc-ph," We analyze the rank-frequency distributions of words in selected English
-and Polish texts. We show that for the lemmatized (basic) word forms the
-scale-invariant regime breaks after about two decades, while it might be
-consistent for the whole range of ranks for the inflected word forms. We also
-find that for a corpus consisting of texts written by different authors the
-basic scale-invariant regime is broken more strongly than in the case of a
-comparable corpus consisting of texts written by the same author. Similarly,
-for a corpus consisting of texts translated into Polish from other languages
-the scale-invariant regime is broken more strongly than for a comparable
-corpus of native Polish texts. Moreover, we find that if the words are tagged
-with their proper part of speech, only verbs show a rank-frequency
-distribution that is almost scale-invariant.
-"
-269,1007.1025,Henryk Fuk\'s,Inflection system of a language as a complex network,cs.CL nlin.AO," We investigate the inflection structure of a synthetic language using Latin
-as an example. We construct a bipartite graph in which one group of vertices
-corresponds to dictionary headwords and the other group to inflected forms
-encountered in a given text. Each inflected form is connected to its
-corresponding headword, which in some cases is non-unique. The resulting
-sparse graph decomposes into a large number of connected components, to be
-called word groups. We then show how the concept of the word group can be
-used to construct coverage curves of selected Latin texts. We also
-investigate a version of the inflection graph in which all theoretically
-possible inflected forms are included. The distribution of sizes of connected
-components of this graph resembles the cluster distribution in lattice
-percolation near the critical point.
-"
-270,1007.3254,"J. T. Stevanak, David M. Larue, and Lincoln D. Carr","Distinguishing Fact from Fiction: Pattern Recognition in Texts Using
- Complex Networks",cs.CL cond-mat.stat-mech physics.soc-ph," We establish concrete mathematical criteria to distinguish between
-different kinds of written storytelling, fictional and non-fictional.
-Specifically, we constructed a semantic network from both novels and news
-stories, with $N$ independent words as vertices or nodes, and edges or links
-allotted to words occurring within $m$ places of a given vertex; we call $m$
-the word distance. We then used measures from complex network theory to
-distinguish between news and fiction, studying the minimal text length needed
-as well as the optimized word distance $m$. The literature samples were found
-to be most effectively represented by their corresponding power laws over
-degree distribution $P(k)$ and clustering coefficient $C(k)$; we also studied
-the mean geodesic distance, and found all our texts were small-world
-networks. We observed a natural break-point at $k=\sqrt{N}$ where the power
-law in the degree distribution changed, leading to separate power law fits
-for the bulk and the tail of $P(k)$.
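-A word-adjacency network of the kind just described takes only a few lines
-to build. The sketch below (ours, using the networkx library; not the
-authors' code) links each word to every word occurring within distance m of
-it; degree distributions can then be read off with nx.degree_histogram(g):
-
-    # Minimal sketch: build a semantic network with word distance m.
-    import networkx as nx
-
-    def word_network(words, m=4):
-        g = nx.Graph()
-        for i, w in enumerate(words):
-            for v in words[i + 1 : i + 1 + m]:  # the m words following w
-                if v != w:
-                    g.add_edge(w, v)
-        return g
-
-    g = word_network("the cat sat on the mat and the dog sat too".split(), m=4)
-    print(g.number_of_nodes(), g.number_of_edges())
-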
-Our linear discriminant analysis yielded a $73.8 \pm 5.15\%$ accuracy for the
-correct classification of novels and $69.1 \pm 1.22\%$ for news stories. We
-found an optimal word distance of $m=4$ and a minimum text length $N$ of 100
-to 200 words.
-"
-271,1007.4748,Aron Culotta,Detecting influenza outbreaks by analyzing Twitter messages,cs.IR cs.CL," We analyze over 500 million Twitter messages from an eight-month period and
-find that tracking a small number of flu-related keywords allows us to
-forecast future influenza rates with high accuracy, obtaining a 95%
-correlation with national health statistics. We then analyze the robustness
-of this approach to spurious keyword matches, and we propose a document
-classification component to filter these misleading messages. We find that
-this document classifier can reduce error rates by over half in simulated
-false alarm experiments, though more research is needed to develop methods
-that are robust in cases of extremely high noise.
-"
-272,1008.0170,Michael Moortgat,Symmetric categorial grammar: residuation and Galois connections,cs.CL," The Lambek-Grishin calculus is a symmetric extension of the Lambek
-calculus: in addition to the residuated family of product, left and right
-division operations of Lambek's original calculus, one also considers a
-family of coproduct, right and left difference operations, related to the
-former by an arrow-reversing duality. Communication between the two families
-is implemented in terms of linear distributivity principles. The aim of this
-paper is to complement the symmetry between (dual) residuated type-forming
-operations with an orthogonal opposition that contrasts residuated and Galois
-connected operations. Whereas the (dual) residuated operations are monotone,
-the Galois connected operations (and their duals) are antitone. We discuss
-the algebraic properties of the (dual) Galois connected operations, and
-generalize the (co)product distributivity principles to include the negative
-operations. We give a continuation-passing-style translation for the new
-type-forming operations, and discuss some linguistic applications.
-"
-273,1008.0706,Allen Lavoie and Mukkai Krishnamoorthy,Algorithmic Detection of Computer Generated Text,stat.ML cs.CL," Computer generated academic papers have been used to expose a lack of
-thorough human review at several computer science conferences. We assess the
-problem of classifying such documents. After identifying and evaluating
-several quantifiable features of academic papers, we apply methods from
-machine learning to build a binary classifier. In tests with two hundred
-papers, the resulting classifier correctly labeled papers either as human
-written or as computer generated, with no false classifications of computer
-generated papers as human and a 2% false classification rate for human papers
-as computer generated. We believe generalizations of these features are
-applicable to similar classification problems. While most current text-based
-spam detection techniques focus on the keyword-based classification of email
-messages, a new generation of unsolicited computer-generated advertisements
-masquerades as legitimate postings in online groups, message boards and
-social news sites. Our results show that taking the formatting and contextual
-clues offered by these environments into account may be of central importance
-when selecting features with which to identify such unwanted postings.
-"
-274,1008.1394,"Zeeshan Ahmed, Saman Majeed, Thomas Dandekar","Towards Design and Implementation of a Language Technology based
- Information Processor for PDM Systems",cs.IR cs.CL cs.SE," Product Data Management (PDM) aims to provide systems that contribute to
-industry by electronically maintaining organizational data, improving the
-data repository system, facilitating easy access to CAD, and providing
-additional information engineering and management modules to access, store,
-integrate, secure, recover and manage information. Targeting one of the
-unresolved issues, i.e., the provision of a natural language based processor
-for the implementation of an intelligent record search mechanism, an approach
-is proposed and discussed in detail in this manuscript. Designing an
-intelligent application capable of reading and analyzing a user's structured
-and unstructured natural language based text requests and then extracting the
-desired concrete and optimized results from a knowledge base is still a
-challenging task for designers, because it is still very difficult to
-completely extract metadata out of raw data. Residing within the limited
-scope of current research and development, we present an approach capable of
-reading a user's natural language based input text, understanding its
-semantics and extracting results from repositories. To evaluate the
-effectiveness of the implemented prototype of the proposed approach, it is
-compared with some existing PDM Systems; the discussion is concluded with an
-abstract presentation of the resulting comparison between the implemented
-prototype and some existing PDM Systems.
-"
-275,1008.1673,Alex V Berka,Space and the Synchronic A-Ram,cs.CL cs.PL," Space is a circuit oriented, spatial programming language designed to
-exploit the massive parallelism available in a novel formal model of
-computation called the Synchronic A-Ram, and physically related FPGA and
-reconfigurable architectures. Space expresses variable grained MIMD
-parallelism, is modular, strictly typed, and deterministic. Barring
-operations associated with memory allocation and compilation, modules cannot
-access global variables, and are referentially transparent. At a high level
-of abstraction, modules exhibit a small, sequential state transition system,
-aiding verification. Space deals with communication, scheduling, and resource
-contention issues in parallel computing, by resolving them explicitly in an
-incremental manner, module by module, whilst ascending the ladder of
-abstraction. Whilst the Synchronic A-Ram model was inspired by linguistic
-considerations, it is also put forward as a formal model for reconfigurable
-digital circuits. A programming environment has been developed that
-incorporates a simulator and a compiler, which transform Space programs into
-Synchronic A-Ram machine code, consisting of only three bit-level
-instructions and a marking instruction. Space and the Synchronic A-Ram point
-to novel routes out of the parallel computing crisis.
-"
-276,1008.1986,"Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil and Lillian
- Lee","For the sake of simplicity: Unsupervised extraction of lexical
- simplifications from Wikipedia",cs.CL," We report on work in progress on extracting lexical simplifications (e.g.,
-""collaborate"" -> ""work together""), focusing on utilizing edit histories in
-Simple English Wikipedia for this task.
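-One simple way to surface candidate substitutions from such an edit history
-is to diff the old and new versions of a sentence. A sketch of the general
-idea (ours, using only the standard library; the paper's edit model is
-probabilistic and considerably more elaborate):
-
-    # Minimal sketch: word-level replacement candidates from one edit.
-    import difflib
-
-    def replacements(old, new):
-        a, b = old.split(), new.split()
-        for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
-            if op == "replace":
-                yield " ".join(a[i1:i2]), " ".join(b[j1:j2])
-
-    print(list(replacements("we collaborate on the project",
-                            "we work together on the project")))
-    # [('collaborate', 'work together')]
-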
We consider two main approaches: (1)
-deriving simplification probabilities via an edit model that accounts for a
-mixture of different operations, and (2) using metadata to focus on edits
-that are more likely to be simplification operations. We find our methods to
-outperform a reasonable baseline and yield many high-quality lexical
-simplifications not included in an independently-created manually prepared
-list.
-"
-277,1008.3169,Cristian Danescu-Niculescu-Mizil and Lillian Lee,"Don't 'have a clue'? Unsupervised co-learning of downward-entailing
- operators",cs.CL," Researchers in textual entailment have begun to consider inferences
-involving 'downward-entailing operators', an interesting and important class
-of lexical items that change the way inferences are made. Recent work
-proposed a method for learning English downward-entailing operators that
-requires access to a high-quality collection of 'negative polarity items'
-(NPIs). However, English is one of the very few languages for which such a
-list exists. We propose the first approach that can be applied to the many
-languages for which there is no pre-existing high-precision database of NPIs.
-As a case study, we apply our method to Romanian and show that it yields good
-results. Also, we perform a cross-linguistic analysis that suggests
-interesting connections to some findings in linguistic typology.
-"
-278,1008.3667,"Ishanu Chattopadhyay, Yicheng Wen and Asok Ray","Pattern Classification In Symbolic Streams via Semantic Annihilation of
- Information",cs.SC cs.CL cs.IT math.IT," We propose a technique for pattern classification in symbolic streams via
-selective erasure of observed symbols, in cases where the patterns of
-interest are represented as Probabilistic Finite State Automata (PFSA). We
-define an additive abelian group for a slightly restricted subset of PFSA,
-and the group sum is used to formulate pattern-specific semantic
-annihilators. The annihilators attempt to identify pre-specified patterns via
-removal of essentially all inter-symbol correlations from observed sequences,
-thereby turning them into symbolic white noise. Thus a perfect annihilation
-corresponds to a perfect pattern match. This approach of classification via
-information annihilation is shown to be strictly advantageous, with
-theoretical guarantees, for a large class of PFSA models. The results are
-supported by simulation experiments.
-"
-279,1008.5287,"Dipak Chaudhari, Om P. Damani, and Srivatsan Laxman","Lexical Co-occurrence, Statistical Significance, and Word Association",cs.CL cs.IR," Lexical co-occurrence is an important cue for detecting word associations.
-We present a theoretical framework for discovering statistically significant
-lexical co-occurrences from a given corpus. In contrast with the prevalent
-practice of giving weight to unigram frequencies, we focus only on the
-documents containing both the terms (of a candidate bigram). We detect biases
-in span distributions of associated words, while being agnostic to variations
-in global unigram frequencies. Our framework has the fidelity to distinguish
-different classes of lexical co-occurrences, based on the strengths of the
-document- and corpus-level cues of co-occurrence in the data. We perform
-extensive experiments on benchmark data sets to study the performance of
-various co-occurrence measures that are currently known in the literature.
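-For concreteness, two of the standard document-level measures evaluated in
-such comparisons, PMI and the Dice coefficient, can be computed directly from
-document frequencies (a textbook-style sketch, ours; f_x and f_y are
-single-term document counts, f_xy the joint count, n the corpus size):
-
-    # Minimal sketch of two common co-occurrence measures.
-    import math
-
-    def pmi(f_x, f_y, f_xy, n):
-        return math.log((f_xy * n) / (f_x * f_y))
-
-    def dice(f_x, f_y, f_xy):
-        return 2.0 * f_xy / (f_x + f_y)
-
-    print(pmi(1000, 800, 300, 100000))  # ~3.62
-    print(dice(1000, 800, 300))         # ~0.33
-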
-We find that a relatively obscure measure called Ochiai and a newly
-introduced measure, CSA, capture the notion of lexical co-occurrence best,
-followed by LLR, Dice, and TTest, while another popular measure, PMI,
-surprisingly performs poorly in the context of lexical co-occurrence.
-"
-280,1009.0108,Arslan Shaukat and Ke Chen,Emotional State Categorization from Speech: Machine vs. Human,cs.CL cs.AI cs.HC," This paper presents our investigations on emotional state categorization
-from speech signals with a psychologically inspired computational model
-against human performance under the same experimental setup. Based on
-psychological studies, we propose a multistage categorization strategy which
-allows establishing an automatic categorization model flexibly for a given
-emotional speech categorization task. We apply the strategy to the Serbian
-Emotional Speech Corpus (GEES) and the Danish Emotional Speech Corpus (DES),
-where human performance was reported in previous psychological studies. Our
-work is the first attempt to apply machine learning to the GEES corpus, for
-which only human recognition rates were available prior to our study. Unlike
-the previous work on the DES corpus, our work focuses on a comparison to
-human performance under the same experimental settings. Our studies suggest
-that psychology-inspired systems yield behaviours that, to a great extent,
-resemble what humans perceive, and that their performance is close to that of
-humans under the same experimental setup. Furthermore, our work also uncovers
-some differences between machines and humans in terms of emotional state
-recognition from speech.
-"
-281,1009.1117,"Elsa Tolone (LIGM), Stavroula Voyatzi (LIGM), Christian Lecl\`ere
- (LIGM)",Constructions d\'efinitoires des tables du Lexique-Grammaire,cs.CL," Lexicon-Grammar tables are a very rich syntactic lexicon for the French
-language. This linguistic database is nevertheless not directly suitable for
-use by computer programs, as it is incomplete and lacks consistency. Tables
-are defined on the basis of features which are not explicitly recorded in the
-lexicon. These features are only described in the literature. Our aim is to
-define these essential properties for each table, to make the tables usable
-in various Natural Language Processing (NLP) applications, such as parsing.
-"
-282,1009.2706,"Artiom Alhazov, Sergey Verlan","Minimization Strategies for Maximally Parallel Multiset Rewriting
- Systems",cs.FL cs.CC cs.CL cs.DM," Maximally parallel multiset rewriting systems (MPMRS) give a convenient way
-to express relations between unstructured objects. The functioning of various
-computational devices may be expressed in terms of MPMRS (e.g., register
-machines and many variants of P systems). In particular, this means that
-MPMRS are computationally complete; however, a direct translation leads to a
-quite large number of rules. As for other classes of computationally complete
-devices, there is a challenge to find a universal system having the smallest
-number of rules. In this article we present different rule minimization
-strategies for MPMRS based on encodings and structural transformations. We
-apply these strategies to the translation of a small universal register
-machine (Korec, 1996) and we show that there exists a universal MPMRS with 23
-rules. Since MPMRS are identical to a restricted variant of P systems with
-antiport rules, the results we obtained improve previously known results on
-the number of rules for those systems.
-"
-283,1009.3238,Arno Bastenhof,Tableaux for the Lambek-Grishin calculus,cs.CL," Categorial type logics, pioneered by Lambek, seek a proof-theoretic
-understanding of natural language syntax by identifying categories with
-formulas and derivations with proofs. We typically observe an intuitionistic
-bias: a structural configuration of hypotheses (a constituent) derives a
-single conclusion (the category assigned to it). Acting upon suggestions of
-Grishin to dualize the logical vocabulary, Moortgat proposed the
-Lambek-Grishin calculus (LG) with the aim of restoring symmetry between
-hypotheses and conclusions. We develop a theory of labeled modal tableaux for
-LG, inspired by the interpretation of its connectives as binary modal
-operators in the relational semantics of Kurtonina and Moortgat. As a
-linguistic application of our method, we show that grammars based on LG are
-context-free through use of an interpolation lemma. This result complements
-that of Melissen, who proved that LG augmented with mixed associativity and
-commutativity exceeds LTAG in expressive power.
-"
-284,1009.3321,"Eduardo G. Altmann, Janet B. Pierrehumbert, Adilson E. Motter",Niche as a determinant of word fate in online groups,cs.CL cond-mat.dis-nn nlin.AO physics.soc-ph q-bio.PE," Patterns of word use both reflect and influence a myriad of human
-activities and interactions. Like other entities that are reproduced and
-evolve, words rise or decline depending upon a complex interplay between
-their intrinsic properties and the environments in which they function. Using
-Internet discussion communities as model systems, we define the concept of a
-word niche as the relationship between the word and the characteristic
-features of the environments in which it is used. We develop a method to
-quantify two important aspects of the size of the word niche: the range of
-individuals using the word and the range of topics it is used to discuss.
-Controlling for word frequency, we show that these aspects of the word niche
-are strong determinants of changes in word frequency. Previous studies have
-already indicated that word frequency itself is a correlate of word success
-at historical time scales. Our analysis of changes in word frequencies over
-time reveals that the relative sizes of word niches are far more important
-than word frequencies in the dynamics of the entire vocabulary at shorter
-time scales, as the language adapts to new concepts and social groupings. We
-also distinguish endogenous versus exogenous factors as additional
-contributors to the fates of words, and demonstrate the force of this
-distinction in the rise of novel words. Our results indicate that short-term
-nonstationarity in word statistics is strongly driven by individual
-proclivities, including inclinations to provide novel information and to
-project a distinctive social identity.
-"
-285,1010.1826,Thomas Mainguy,A probabilistic top-down parser for minimalist grammars,cs.CL," This paper describes a probabilistic top-down parser for minimalist
-grammars. Top-down parsers have the great advantage of having a certain
-predictive power during the parsing, which takes place in a left-to-right
-reading of the sentence. Such parsers have already been well implemented and
-studied in the case of Context-Free Grammars, which are already top-down, but
-they are difficult to adapt to Minimalist Grammars, which generate sentences
-bottom-up.
-I propose here a way of rewriting Minimalist Grammars as Linear Context-Free
-Rewriting Systems, allowing one to easily create a top-down parser. This
-rewriting also makes it possible to place a probabilistic field on these
-grammars, which can be used to accelerate the parser. Finally, I propose a
-method for refining the probabilistic field using algorithms from data
-compression.
-"
-286,1010.2384,"Mihaiela Lupea, Doina Tatar and Zsuzsana Marian",Learning Taxonomy for Text Segmentation by Formal Concept Analysis,cs.CL," In this paper the problems of deriving a taxonomy from a text and of
-concept-oriented text segmentation are addressed. The Formal Concept Analysis
-(FCA) method is applied to solve both of these linguistic problems. The
-proposed segmentation method offers a conceptual view of text segmentation,
-using a context-driven clustering of sentences. The Concept-oriented
-Clustering Segmentation algorithm (COCS) is based on k-means linear
-clustering of the sentences. Experimental results obtained using the COCS
-algorithm are presented.
-"
-287,1010.3003,Johan Bollen and Huina Mao and Xiao-Jun Zeng,Twitter mood predicts the stock market,cs.CE cs.CL cs.SI physics.soc-ph," Behavioral economics tells us that emotions can profoundly affect
-individual behavior and decision-making. Does this also apply to societies at
-large, i.e., can societies experience mood states that affect their
-collective decision making? By extension, is the public mood correlated with,
-or even predictive of, economic indicators? Here we investigate whether
-measurements of collective mood states derived from large-scale Twitter feeds
-are correlated with the value of the Dow Jones Industrial Average (DJIA) over
-time. We analyze the text content of daily Twitter feeds with two mood
-tracking tools, namely OpinionFinder, which measures positive vs. negative
-mood, and Google-Profile of Mood States (GPOMS), which measures mood in terms
-of 6 dimensions (Calm, Alert, Sure, Vital, Kind, and Happy). We
-cross-validate the resulting mood time series by comparing their ability to
-detect the public's response to the presidential election and Thanksgiving
-day in 2008. A Granger causality analysis and a Self-Organizing Fuzzy Neural
-Network are then used to investigate the hypothesis that public mood states,
-as measured by the OpinionFinder and GPOMS mood time series, are predictive
-of changes in DJIA closing values. Our results indicate that the accuracy of
-DJIA predictions can be significantly improved by the inclusion of specific
-public mood dimensions but not others. We find an accuracy of 87.6% in
-predicting the daily up and down changes in the closing values of the DJIA
-and a reduction of the Mean Average Percentage Error by more than 6%.
-"
-288,1010.6091,Damian H. Zanette,Network motifs in music sequences,physics.soc-ph cs.CL physics.data-an," This paper has been withdrawn by the author because it needs a deep
-methodological revision.
-"
-289,1011.0519,"Laurent Romary (IDSL, INRIA Saclay - Ile de France)","Stabilizing knowledge through standards - A perspective for the
- humanities",cs.CL," It is usual to consider that standards generate mixed feelings among
-scientists. They are often seen as not really reflecting the state of the art
-in a given domain and as a hindrance to scientific creativity. Still,
-scientists should theoretically be in the best position to bring their
-expertise into standards development, being even more neutral on issues that
-may typically be related to competing industrial interests.
Even if developing
-standards in the humanities could be thought of as even more complex, we will
-show how this can be made feasible through the experience gained both within
-the Text Encoding Initiative consortium and the International Organisation
-for Standardisation. By taking the specific case of lexical resources, we
-will try to show how this brings about new ideas for designing future
-research infrastructures in the human and social sciences.
-"
-290,1011.0835,"Ziheng Lin, Hwee Tou Ng, and Min-Yen Kan",A PDTB-Styled End-to-End Discourse Parser,cs.CL," We have developed a full discourse parser in the Penn Discourse Treebank
-(PDTB) style. Our trained parser first identifies all discourse and
-non-discourse relations, locates and labels their arguments, and then
-classifies their relation types. When appropriate, the attribution spans of
-these relations are also determined. We present a comprehensive evaluation
-from both component-wise and error-cascading perspectives.
-"
-291,1011.2575,"Kentaro Katahira, Kenta Suzuki, Kazuo Okanoya and Masato Okada","Complex sequencing rules of birdsong can be explained by simple hidden
- Markov processes",q-bio.NC cs.CL," Complex sequencing rules observed in birdsongs provide an opportunity to
-investigate the neural mechanism for generating complex sequential behaviors.
-To relate the findings from studying birdsongs to other sequential behaviors,
-it is crucial to characterize the statistical properties of the sequencing
-rules in birdsongs. However, the properties of the sequencing rules in
-birdsongs have not yet been fully addressed. In this study, we investigate
-the statistical properties of the complex birdsong of the Bengalese finch
-(Lonchura striata var. domestica). Based on manually annotated syllable
-sequences, we first show that there are significant higher-order context
-dependencies in Bengalese finch songs, that is, which syllable appears next
-depends on more than one previous syllable. This property is shared with
-other complex sequential behaviors. We then analyze acoustic features of the
-song and show that higher-order context dependencies can be explained using
-first-order hidden state transition dynamics with redundant hidden states.
-This model corresponds to hidden Markov models (HMMs), well known statistical
-models with a large range of applications in time series modeling. Song
-annotation with these first-order hidden state models agreed well with manual
-annotation; the score was comparable to that of a second-order HMM, and
-surpassed that of the zeroth-order model (the Gaussian mixture model, GMM),
-which does not use context information. Our results imply that a hierarchical
-representation with hidden state dynamics may underlie the neural
-implementation for generating complex sequences with higher-order
-dependencies.
-"
-292,1011.2922,Carl Vogel and Jerom Janssen,Emoticonsciousness,cs.CL," A temporal analysis of emoticon use in Swedish, Italian, German and English
-asynchronous electronic communication is reported. Emoticons are classified
-as positive, negative and neutral. Postings to newsgroups over a 66-week
-period are considered. The aggregate analysis of emoticon use in newsgroups
-for science and politics tends on the whole to be consistent over the entire
-time period. Where possible, events that coincide with divergences from
-trends in language-subject pairs are noted.
Political discourse in Italian over the
-period shows marked use of negative emoticons, and in Swedish, positive
-emoticons.
-"
-293,1011.3258,Zeeshan Ahmed and Ina Tacheva,Integration of Agile Ontology Mapping towards NLP Search in I-SOAS,cs.CL cs.IR," In this research paper we address the importance of Product Data Management
-(PDM) with respect to its contributions in industry. We also present some of
-the major challenges currently facing PDM communities and, targeting some of
-these challenges, we present an approach, I-SOAS, and briefly discuss how
-this approach can help solve the problems faced by the PDM community.
-Furthermore, limiting the scope of this research to one challenge, we focus
-on the implementation of a semantic based search mechanism in PDM Systems.
-Going into the details, we first describe the relevant field, i.e., Language
-Technology (LT), which contributes to natural language processing and can be
-exploited to implement a search engine capable of understanding the semantics
-of natural language based search queries. We then discuss how we can
-practically take advantage of LT by implementing its concepts in the form of
-a software application using semantic web technology, i.e., Ontology.
-Finally, we briefly present a prototype application developed with the use
-of LT concepts towards semantic based search.
-"
-294,1011.4155,"Jonathan Marchand (INRIA Lorraine - LORIA), Bruno Guillaume (INRIA
- Lorraine - LORIA), Guy Perrier (INRIA Lorraine - LORIA)",Motifs de graphe pour le calcul de d\'ependances syntaxiques compl\`etes,cs.CL," This article describes a method to build syntactic dependencies starting
-from the phrase structure parsing process. The goal is to obtain all the
-information needed for a detailed semantic analysis. Interaction Grammars
-are used for parsing; the saturation of polarities, which is the core of
-this formalism, can be mapped to dependency relations. Formally, graph
-patterns are used to express the set of constraints which control dependency
-creation.
-"
-295,1011.4623,"Samaneh Moghaddam, Fred Popowich",Opinion Polarity Identification through Adjectives,cs.CL," ""What other people think"" has always been an important piece of
-information during various decision-making processes. Today people frequently
-make their opinions available via the Internet, and as a result, the Web has
-become an excellent source for gathering consumer opinions. There are now
-numerous Web resources containing such opinions, e.g., product review forums,
-discussion groups, and blogs. But, due to the large amount of information and
-the wide range of sources, it is essentially impossible for a customer to
-read all of the reviews and make an informed decision on whether to purchase
-the product. It is also difficult for the manufacturer or seller of a product
-to accurately monitor customer opinions. For this reason, mining customer
-reviews, or opinion mining, has become an important issue for research in Web
-information extraction. One of the important topics in this research area is
-the identification of opinion polarity. The opinion polarity of a review is
-usually expressed with the values 'positive', 'negative' or 'neutral'. We
-propose a technique for identifying the polarity of reviews by identifying
-the polarity of the adjectives that appear in them.
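-The adjective-based idea just stated can be illustrated in a few lines. This
-sketch (ours, with a toy seed lexicon; the paper's actual technique for
-determining adjective polarity is more involved) counts positive and
-negative adjectives and compares:
-
-    # Minimal sketch: review polarity from adjective polarity.
-    POSITIVE = {"great", "excellent", "reliable", "sharp"}
-    NEGATIVE = {"poor", "noisy", "flimsy", "disappointing"}
-
-    def review_polarity(text):
-        words = text.lower().split()
-        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
-        return "positive" if score > 0 else "negative" if score < 0 else "neutral"
-
-    print(review_polarity("Great camera with a sharp lens but noisy at night"))
-    # positive
-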
Our evaluation shows that the technique can
-provide accuracy in the area of 73%, which is well above the 58%-64% provided
-by naive Bayesian classifiers.
-"
-296,1011.5076,Andrij Rovenchak and Solomija Buk,Application of a Quantum Ensemble Model to Linguistic Analysis,physics.data-an cs.CL," A new set of parameters to describe the word frequency behavior of texts is
-proposed. An analogy between the word frequency distribution and the Bose
-distribution is suggested, and the notion of ""temperature"" is introduced
-for this case. The calculations are made for English, Ukrainian, and the
-Guinean Maninka languages. A correlation between the deep language structure
-(the level of analyticity) and the defined parameters is shown to exist.
-"
-297,1011.5188,Yannis Haralambous and Elisa Lavagnino,La r\'eduction de termes complexes dans les langues de sp\'ecialit\'e,cs.CL," Our study applies statistical methods to French and Italian corpora to
-examine the phenomenon of multi-word term reduction in specialty languages.
-There are two kinds of reduction: anaphoric and lexical. We show that
-anaphoric reduction depends on the discourse type (vulgarization,
-pedagogical, specialized) but is independent of both domain and language;
-that lexical reduction depends on domain and is more frequent in technical,
-rapidly evolving domains; and that anaphoric reductions tend to follow full
-terms rather than precede them. We define the notion of the anaphoric tree of
-the term and study its properties. Concerning lexical reduction, we attempt
-to prove statistically that there is a notion of term lifecycle, where the
-full form is progressively replaced by a lexical reduction.
-"
-298,1011.5209,"Loet Leydesdorff, Kasper Welbers",The semantic mapping of words and co-words in contexts,cs.CL stat.AP," Meaning can be generated when information is related at a systemic level.
-Such a system can be an observer, but also a discourse, for example,
-operationalized as a set of documents. The measurement of semantics as
-similarity in patterns (correlations) and latent variables (factor analysis)
-has been enhanced by computer techniques and the use of statistics, for
-example in ""Latent Semantic Analysis"". This communication provides an
-introduction, an example, and pointers to relevant software, and summarizes
-the choices that can be made by the analyst. Visualization (""semantic
-mapping"") is thus made more accessible.
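-The latent-variable step mentioned above, as in Latent Semantic Analysis,
-amounts to a truncated SVD of the term-document matrix. A minimal numpy
-sketch (ours, with toy counts; not tied to the software the communication
-points to):
-
-    # Minimal sketch of the core of LSA: truncated SVD of a
-    # term-document count matrix X (terms x documents).
-    import numpy as np
-
-    X = np.array([[2, 0, 1, 0],   # toy counts: 4 terms x 4 documents
-                  [1, 0, 1, 0],
-                  [0, 3, 0, 1],
-                  [0, 1, 0, 2]], dtype=float)
-    U, s, Vt = np.linalg.svd(X, full_matrices=False)
-    k = 2                                      # keep two latent dimensions
-    doc_coords = (np.diag(s[:k]) @ Vt[:k]).T   # documents in latent space
-    print(np.round(doc_coords, 2))             # similar documents end up close
-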
-"
-299,1012.2042,"George Giannakopoulos (1) and George Vouros (2) and Vangelis
- Karkaletsis (1) ((1) NCSR Demokritos, Greece, (2) University of the Aegean,
- Greece)",MUDOS-NG: Multi-document Summaries Using N-gram Graphs (Tech Report),cs.CL cs.AI," This report describes the MUDOS-NG summarization system, which applies a
-set of language-independent and generic methods for generating extractive
-summaries. The proposed methods are mostly combinations of simple operators
-on a generic character n-gram graph representation of texts. This work
-defines the set of operators used on n-gram graphs and proposes using these
-operators within the multi-document summarization process in such subtasks
-as document analysis, salient sentence selection, query expansion and
-redundancy control. Furthermore, a novel chunking methodology is used,
-together with a novel way to assign concepts to sentences for query
-expansion. The experimental results of the summarization system, obtained on
-widely used corpora from the Document Understanding and the Text Analysis
-Conferences, are promising and provide evidence for the potential of the
-generic methods introduced. This work aims to designate core methods
-exploiting the n-gram graph representation, providing the basis for more
-advanced summarization systems.
-"
-300,1012.2661,"Maxime Amblard (INRIA Lorraine - LORIA), Alain Lecomte (INRIA Bordeaux
- - Sud-Ouest, SFLTAMP), Christian Retor\'e (INRIA Bordeaux - Sud-Ouest, LaBRI)",Categorial Minimalist Grammar,cs.CL math.LO," We first recall some basic notions on minimalist grammars and on categorial
-grammars. Next we briefly introduce partially commutative linear logic, and
-our representation of minimalist grammars within this categorial system, the
-so-called categorial minimalist grammars. Thereafter we briefly present
-\lambda\mu-DRT (Discourse Representation Theory), an extension of \lambda-DRT
-(compositional DRT) in the framework of the \lambda\mu calculus: it avoids
-type raising and derives different readings from a single semantic
-representation, in a setting which follows discourse structure. We run a
-complete example which illustrates the various structures and rules that are
-needed to derive a semantic representation from the categorial view of a
-transformational syntactic analysis.
-"
-301,1012.5248,"Ion Petre, Sergey Verlan",Matrix Insertion-Deletion Systems,cs.FL cs.CC cs.CL cs.DM," In this article, we consider for the first time the operations of insertion
-and deletion working in a matrix controlled manner. We show that, similarly
-to the case of context-free productions, the computational power is strictly
-increased when using a matrix control: computational completeness can be
-obtained by systems with insertion or deletion rules involving at most two
-symbols in a contextual or in a context-free manner and using only binary
-matrices.
-"
-302,1012.5962,Jose Hernandez-Orallo,Annotated English,cs.CL," This document presents Annotated English, a system of diacritical symbols
-which turns English pronunciation into a precise and unambiguous process. The
-annotations are defined and located in such a way that the original English
-text is not altered (not even a letter), thus allowing for a consistent
-reading and learning of the English language with and without annotations.
-The annotations are based on a set of general rules that keep the frequency
-of annotations from becoming dramatically high.
This makes the reader easily associate
-annotations with exceptions, and makes it possible to shape, internalise and
-consolidate some rules for the English language which otherwise are weakened by
-the enormous number of exceptions in English pronunciation. The advantages of
-this annotation system are manifold. Any existing text can be annotated without
-a significant increase in size. This means that we can get an annotated version
-of any document or book with the same number of pages and font size. Since no
-letter is affected, the text can be perfectly read by a person who does not
-know the annotation rules, since annotations can be simply ignored. The
-annotations are based on a set of rules which can be progressively learned and
-recognised, even in cases where the reader has no access or time to read the
-rules. This means that a reader can understand most of the annotations after
-reading a few pages of Annotated English, and can take advantage of that
-knowledge for any other annotated document she may read in the future.
-"
-303,1101.0309,"Edward Grefenstette, Mehrnoosh Sadrzadeh, Stephen Clark, Bob Coecke
- and Stephen Pulman","Concrete Sentence Spaces for Compositional Distributional Models of
- Meaning",cs.CL cs.AI cs.IR," Coecke, Sadrzadeh, and Clark (arXiv:1003.4394v1 [cs.CL]) developed a
-compositional model of meaning for distributional semantics, in which each word
-in a sentence has a meaning vector and the distributional meaning of the
-sentence is a function of the tensor products of the word vectors. Abstractly
-speaking, this function is the morphism corresponding to the grammatical
-structure of the sentence in the category of finite dimensional vector spaces.
-In this paper, we provide a concrete method for implementing this linear
-meaning map, by constructing a corpus-based vector space for the type of
-sentence. Our construction method is based on structured vector spaces whereby
-meaning vectors of all sentences, regardless of their grammatical structure,
-live in the same vector space. Our proposed sentence space is the tensor
-product of two noun spaces, in which the basis vectors are pairs of words each
-augmented with a grammatical role. This enables us to compare meanings of
-sentences by simply taking the inner product of their vectors.
-"
-304,1101.0510,"Lars Kai Hansen, Adam Arvidsson, Finn {\AA}rup Nielsen, Elanor
- Colleoni, Michael Etter","Good Friends, Bad News - Affect and Virality in Twitter",cs.SI cs.CL physics.soc-ph," The link between affect, defined as the capacity for sentimental arousal on
-the part of a message, and virality, defined as the probability that it be sent
-along, is of significant theoretical and practical importance, e.g. for viral
-marketing. A quantitative study of emailing of articles from the NY Times finds
-a strong link between positive affect and virality, and, based on psychological
-theories it is concluded that this relation is universally valid. The
-conclusion appears to be in contrast with classic theory of diffusion in news
-media emphasizing negative affect as promoting propagation. In this paper we
-explore the apparent paradox in a quantitative analysis of information
-diffusion on Twitter. Twitter is interesting in this context as it has been
-shown to present the characteristics of both social and news media. The basic
-measure of virality in Twitter is the probability of retweet.
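The measurement just defined is easy to state in code. A minimal sketch, with a hypothetical list of (sentiment, was_retweeted) pairs standing in for a labeled tweet sample, estimates the retweet probability separately for each sentiment class:

    from collections import defaultdict

    # Hypothetical labeled sample: (sentiment, was_retweeted).
    tweets = [("neg", True), ("neg", True), ("neg", False),
              ("pos", True), ("pos", False), ("pos", False)]

    counts = defaultdict(lambda: [0, 0])   # sentiment -> [retweets, total]
    for sentiment, retweeted in tweets:
        counts[sentiment][0] += int(retweeted)
        counts[sentiment][1] += 1

    for sentiment, (rt, n) in counts.items():
        print(f"P(retweet | {sentiment}) = {rt / n:.2f}  (n={n})")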
Twitter is
-different from email in that retweeting does not depend on pre-existing social
-relations, but often occurs among strangers; thus, in this respect, Twitter may
-be more similar to traditional news media. We therefore hypothesize that negative
-news content is more likely to be retweeted, while for non-news tweets positive
-sentiments support virality. To test the hypothesis we analyze three corpora: a
-complete sample of tweets about the COP15 climate summit, a random sample of
-tweets, and a general text corpus including news. The latter allows us to train
-a classifier that can distinguish tweets that carry news and non-news
-information. We present evidence that negative sentiment enhances virality in
-the news segment, but not in the non-news segment. We conclude that the
-relation between affect and virality is more complex than expected based on the
-findings of Berger and Milkman (2010); in short, 'if you want to be cited: sweet-talk
-your friends or serve bad news to the public'.
-"
-305,1101.2804,"Animesh Mukherjee, Francesca Tria, Andrea Baronchelli, Andrea Puglisi
- and Vittorio Loreto",Aging in language dynamics,physics.soc-ph cond-mat.stat-mech cs.CL cs.MA," Human languages evolve continuously, and a puzzling problem is how to
-reconcile the apparent robustness of most of the deep linguistic structures we
-use with the evidence that they undergo possibly slow, yet ceaseless, changes.
-Is the state in which we observe languages today closer to what would be a
-dynamical attractor with statistically stationary properties or rather closer
-to a non-steady state slowly evolving in time? Here we address this question in
-the framework of the emergence of shared linguistic categories in a population
-of individuals interacting through language games. The observed emerging
-asymptotic categorization, which has been previously tested - with success -
-against experimental data from human languages, corresponds to a metastable
-state where global shifts are always possible but progressively more unlikely
-and the response properties depend on the age of the system. This aging
-mechanism exhibits striking quantitative analogies to what is observed in the
-statistical mechanics of glassy systems. We argue that this can be a general
-scenario in language dynamics where shared linguistic conventions would not
-emerge as attractors, but rather as metastable states.
-"
-306,1103.3578,Maarten McKubre-Jordens and Phillip L. Wilson,Infinity in computable probability,math.LO cs.CL cs.LO," Does combining a finite collection of objects infinitely many times guarantee
-the construction of a particular object? Here we use recursive function theory
-to examine the popular scenario of an infinite collection of typing monkeys
-reproducing the works of Shakespeare. Our main result is to show that it is
-possible to assign typing probabilities in such a way that while it is
-impossible that no monkey reproduces Shakespeare's works, the probability of
-any finite collection of monkeys doing so is arbitrarily small. We extend our
-results to target-free writing, and end with a broad discussion and pointers to
-future work.
-"
-307,1101.4479,Daoud Clarke,"A Context-theoretic Framework for Compositionality in Distributional
- Semantics",cs.CL cs.AI," Techniques in which words are represented as vectors have proved useful in
-many applications in computational linguistics; however, there is currently no
-general semantic formalism for representing meaning in terms of vectors.
We
-present a framework for natural language semantics in which words, phrases and
-sentences are all represented as vectors, based on a theoretical analysis which
-assumes that meaning is determined by context.
- In the theoretical analysis, we define a corpus model as a mathematical
-abstraction of a text corpus. The meaning of a string of words is assumed to be
-a vector representing the contexts in which it occurs in the corpus model.
-Based on this assumption, we can show that the vector representations of words
-can be considered as elements of an algebra over a field. We note that in
-applications of vector spaces to representing meanings of words there is an
-underlying lattice structure; we interpret the partial ordering of the lattice
-as describing entailment between meanings. We also define the context-theoretic
-probability of a string, and, based on this and the lattice structure, a degree
-of entailment between strings.
- We relate the framework to existing methods of composing vector-based
-representations of meaning, and show that our approach generalises many of
-these, including vector addition, component-wise multiplication, and the tensor
-product.
-"
-308,1101.5076,"Peter beim Graben, Sabrina Gerth",Geometric representations for minimalist grammars,cs.CL," We reformulate minimalist grammars as partial functions on term algebras for
-strings and trees. Using filler/role bindings and tensor product
-representations, we construct homomorphisms for these data structures into
-geometric vector spaces. We prove that the structure-building functions as well
-as simple processors for minimalist languages can be realized by piecewise
-linear operators in representation space. We also propose harmony, i.e. the
-distance of an intermediate processing step from the final well-formed state in
-representation space, as a measure of processing complexity. Finally, we
-illustrate our findings by means of two particular arithmetic and fractal
-representations.
-"
-309,1101.5494,"Mourad Gridach, Noureddine Chenfour","Developing a New Approach for Arabic Morphological Analysis and
- Generation",cs.CL," Arabic morphological analysis is one of the essential stages in Arabic
-Natural Language Processing. In this paper we present an approach for Arabic
-morphological analysis. This approach is based on the Arabic morphological
-automaton (AMAUT). The proposed technique uses a morphological database
-realized using the XMODEL language. Arabic morphology represents a special type
-of morphological system because it is based on the concept of the scheme to
-represent Arabic words. We use this concept to develop the Arabic morphological
-automata. The proposed approach also has a development standardization aspect.
-It can be exploited by NLP applications such as syntactic and semantic analysis,
-information retrieval, machine translation and orthographical correction. The
-proposed approach is compared with the Xerox Arabic Analyzer and the Smrz Arabic
-Analyzer.
-"
-310,1101.5757,Arno Bastenhof,Polarized Montagovian Semantics for the Lambek-Grishin calculus,cs.CL," Grishin proposed enriching the Lambek calculus with multiplicative
-disjunction (par) and coresiduals. Applications to linguistics were discussed
-by Moortgat, who spoke of the Lambek-Grishin calculus (LG). In this paper, we
-adapt Girard's polarity-sensitive double negation embedding for classical logic
-to extract a compositional Montagovian semantics from a display calculus for
-focused proof search in LG.
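The composition operations that the context-theoretic framework above generalises (vector addition, component-wise multiplication, and the tensor product) can be stated in a few lines; the 3-dimensional word vectors below are hypothetical stand-ins for distributionally learned ones.

    import numpy as np

    # Hypothetical word vectors (in practice built from co-occurrence counts).
    man = np.array([0.9, 0.1, 0.0])
    bites = np.array([0.2, 0.7, 0.1])
    dog = np.array([0.8, 0.0, 0.2])

    additive = man + bites + dog            # vector addition
    multiplicative = man * bites * dog      # component-wise multiplication
    tensor = np.einsum("i,j,k->ijk", man, bites, dog)  # order-3 tensor product

    print(additive, multiplicative, tensor.shape)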
We seize the opportunity to illustrate our approach
-alongside an analysis of extraction, providing linguistic motivation for linear
-distributivity of tensor over par, thus answering a question of
-Kurtonina & Moortgat. We conclude by comparing our proposal to the continuation
-semantics of Bernardi & Moortgat, corresponding to call-by-name and
-call-by-value evaluation strategies.
-"
-311,1102.2180,"M. Serva, F. Petroni, D. Volchenkov and S. Wichmann",Malagasy Dialects and the Peopling of Madagascar,cs.CL physics.soc-ph," The origin of Malagasy DNA is half African and half Indonesian; nevertheless
-the Malagasy language, spoken by the entire population, belongs to the
-Austronesian family. The language most closely related to Malagasy is Maanyan
-(Greater Barito East group of the Austronesian family), but related languages
-are also spoken in Sulawesi, Malaysia and Sumatra. For this reason, and because
-Maanyan is spoken by a population which lives along the Barito river in
-Kalimantan and which does not possess the necessary skill for long maritime
-navigation, the ethnic composition of the Indonesian colonizers is still
-unclear.
- There is a general consensus that Indonesian sailors reached Madagascar by a
-maritime trek, but the time, the path and the landing area of the first
-colonization are all disputed. In this research we try to answer these
-questions, together with other ones, such as the historical configuration of
-Malagasy dialects, by types of analysis related to lexicostatistics and
-glottochronology which draw upon the automated method recently proposed by the
-authors \cite{Serva:2008, Holman:2008, Petroni:2008, Bakker:2009}. The data were
-collected by the first author at the beginning of 2010 with the invaluable help
-of Joselin\`a Soafara N\'er\'e and consist of Swadesh lists of 200 items for 23
-dialects covering all areas of the Island.
-"
-312,1102.2831,Madhav Krishna and Ahmed Hassan and Yang Liu and Dragomir Radev,"The effect of linguistic constraints on the large scale organization of
- language",cs.CL cs.SI," This paper studies the effect of linguistic constraints on the large scale
-organization of language. It describes the properties of linguistic networks
-built using texts of written language with the words randomized. These
-properties are compared to those obtained for a network built over the text in
-natural order. It is observed that the ""random"" networks too exhibit
-small-world and scale-free characteristics. They also show a high degree of
-clustering. This is indeed a surprising result - one that has not been
-addressed adequately in the literature. We hypothesize that many of the network
-statistics reported here are in fact functions of the distribution of
-the underlying data from which the network is built and may not be indicative
-of the nature of the concerned network.
-"
-313,1102.5185,Victor Gluzberg,Universal Higher Order Grammar,cs.CL cs.AI," We examine the class of languages that can be defined entirely in terms of
-provability in an extension of the sorted type theory (Ty_n) by embedding the
-logic of phonologies, without introduction of special types for syntactic
-entities. This class is proven to precisely coincide with the class of
-logically closed languages that may be thought of as functions from expressions
-to sets of logically equivalent Ty_n terms.
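The automated method referenced in the Malagasy abstract above rests on a length-normalized edit distance between words with the same meaning, averaged over a word list. A minimal sketch, with hypothetical three-item lists standing in for 200-item Swadesh lists:

    def levenshtein(a: str, b: str) -> int:
        # Classic dynamic-programming edit distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def language_distance(list_a, list_b):
        # Average edit distance, normalized by the longer word's length.
        d = [levenshtein(a, b) / max(len(a), len(b))
             for a, b in zip(list_a, list_b)]
        return sum(d) / len(d)

    # Hypothetical fragments of Swadesh lists for two dialects.
    dialect1 = ["rano", "vato", "lanitra"]
    dialect2 = ["ranu", "watu", "langit"]
    print(language_distance(dialect1, dialect2))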
For a specific sub-class of
-logically closed languages that are described by finite sets of rules or rule
-schemata, we find effective procedures for building a compact Ty_n
-representation, involving a finite number of axioms or axiom schemata. The
-proposed formalism is characterized by some useful features unavailable in a
-two-component architecture of a language model. A further specialization and
-extension of the formalism with a context type enable an effective account of
-intensional and dynamic semantics.
-"
-314,1103.0398,"Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray
- Kavukcuoglu, Pavel Kuksa",Natural Language Processing (almost) from Scratch,cs.LG cs.CL," We propose a unified neural network architecture and learning algorithm that
-can be applied to various natural language processing tasks including:
-part-of-speech tagging, chunking, named entity recognition, and semantic role
-labeling. This versatility is achieved by trying to avoid task-specific
-engineering and therefore disregarding a lot of prior knowledge. Instead of
-exploiting man-made input features carefully optimized for each task, our
-system learns internal representations on the basis of vast amounts of mostly
-unlabeled training data. This work is then used as a basis for building a
-freely available tagging system with good performance and minimal computational
-requirements.
-"
-315,1103.0784,"Johan Bollen, Bruno Goncalves, Guangchen Ruan, and Huina Mao",Happiness is assortative in online social networks,cs.SI cs.CL physics.soc-ph," Social networks tend to disproportionally favor connections between
-individuals with either similar or dissimilar characteristics. This propensity,
-referred to as assortative mixing or homophily, is expressed as the correlation
-between attribute values of nearest neighbour vertices in a graph. Recent
-results indicate that beyond demographic features such as age, sex and race,
-even psychological states such as ""loneliness"" can be assortative in a social
-network. In spite of the increasing societal importance of online social
-networks it is unknown whether assortative mixing of psychological states takes
-place in situations where social ties are mediated solely by online networking
-services in the absence of physical contact. Here, we show that general
-happiness or Subjective Well-Being (SWB) of Twitter users, as measured from a 6
-month record of their individual tweets, is indeed assortative across the
-Twitter social network. To our knowledge this is the first result that shows
-assortative mixing in online networks at the level of SWB. Our results imply
-that online social networks may be equally subject to the social mechanisms
-that cause assortative mixing in real social networks and that such assortative
-mixing takes place at the level of SWB. Given the increasing prevalence of
-online social networks, their propensity to connect users with similar levels
-of SWB may be an important instrument in better understanding how both positive
-and negative sentiments spread through online social ties. Future research may
-focus on how event-specific mood states can propagate and influence user
-behavior in ""real life"".
-"
-316,1103.0890,"Qi Mao, Ivor W. Tsang",Efficient Multi-Template Learning for Structured Prediction,cs.LG cs.CL," Conditional random field (CRF) and Structural Support Vector Machine
-(Structural SVM) are two state-of-the-art methods for structured prediction
-that capture the interdependencies among output variables.
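Assortative mixing of a scalar attribute such as SWB, as measured in the Twitter happiness study above, amounts to the Pearson correlation of the attribute across the endpoints of the network's edges. A self-contained sketch on a hypothetical toy network:

    import numpy as np

    # Hypothetical graph: node -> SWB score, plus an undirected edge list.
    swb = {0: 0.8, 1: 0.7, 2: 0.2, 3: 0.1, 4: 0.9}
    edges = [(0, 1), (0, 4), (2, 3), (1, 4)]

    # For an undirected graph, include each edge in both directions.
    x = np.array([swb[u] for u, v in edges] + [swb[v] for u, v in edges])
    y = np.array([swb[v] for u, v in edges] + [swb[u] for u, v in edges])

    assortativity = np.corrcoef(x, y)[0, 1]
    print(f"SWB assortativity: {assortativity:.3f}")  # > 0 means assortative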
The success of
-these methods is attributed to the fact that their discriminative models are
-able to account for overlapping features over whole input observations. These
-features are usually generated by applying a given set of templates on labeled
-data, but improper templates may lead to degraded performance. To alleviate
-this issue, in this paper, we propose a novel multiple template learning
-paradigm to learn structured prediction and the importance of each template
-simultaneously, so that hundreds of arbitrary templates can be added to the
-learning model without special caution. This paradigm can be formulated as a
-special multiple kernel learning problem with an exponential number of
-constraints. Then we introduce an efficient cutting plane algorithm to solve
-this problem in the primal, and its convergence is presented. We also evaluate
-the proposed learning paradigm on two widely-studied structured prediction
-tasks, \emph{i.e.} sequence labeling and dependency parsing. Extensive
-experimental results show that the proposed method outperforms CRFs and
-Structural SVMs because it exploits the importance of each template. Our
-complexity analysis and empirical results also show that our proposed method is
-more efficient than OnlineMKL on very sparse and high-dimensional data. We
-further extend this paradigm for structured prediction using generalized
-$p$-block norm regularization with $p>1$, and experiments show competitive
-performance when $p \in [1,2)$.
-"
-317,1103.1898,Heather Pon-Barry and Stuart M. Shieber,Recognizing Uncertainty in Speech,cs.CL," We address the problem of inferring a speaker's level of certainty based on
-prosodic information in the speech signal, which has application in
-speech-based dialogue systems. We show that using phrase-level prosodic
-features centered around the phrases causing uncertainty, in addition to
-utterance-level prosodic features, improves our model's level of certainty
-classification. In addition, our models can be used to predict which phrase a
-person is uncertain about. These results rely on a novel method for eliciting
-utterances of varying levels of certainty that allows us to compare the utility
-of contextually-based feature sets. We elicit level of certainty ratings from
-both the speakers themselves and a panel of listeners, finding that there is
-often a mismatch between speakers' internal states and their perceived states,
-and highlighting the importance of this distinction.
-"
-318,1103.2325,"David Levary, Jean-Pierre Eckmann, Elisha Moses and Tsvi Tlusty",Self reference in word definitions,cs.CL cs.AI physics.soc-ph," Dictionaries are inherently circular in nature. A given word is linked to a
-set of alternative words (the definition) which in turn point to further
-descendants. Iterating through definitions in this way, one typically finds
-that definitions loop back upon themselves. The graph formed by such
-definitional relations is our object of study. By eliminating those links which
-are not in loops, we arrive at a core subgraph of highly connected nodes.
- We observe that definitional loops are conveniently classified by length,
-with longer loops usually emerging from semantic misinterpretation. By breaking
-the long loops in the graph of the dictionary, we arrive at a set of
-disconnected clusters.
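A feature template of the kind discussed in the multi-template learning abstract above is simply a rule that, applied at each position of an input sequence, emits a concrete feature. A minimal sketch with three hypothetical templates for sequence labeling:

    # Hypothetical templates: each maps (tokens, position) to a feature string.
    templates = [
        ("w0",   lambda toks, i: f"w0={toks[i]}"),                     # current word
        ("w-1",  lambda toks, i: f"w-1={toks[i-1] if i else '<s>'}"),  # previous word
        ("suf3", lambda toks, i: f"suf3={toks[i][-3:]}"),              # 3-char suffix
    ]

    def extract_features(tokens):
        # Apply every template at every position.
        return [[f(tokens, i) for _, f in templates] for i in range(len(tokens))]

    for feats in extract_features(["Markets", "fell", "sharply"]):
        print(feats)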
We find that the words in these clusters constitute
-semantic units, and moreover tend to have been introduced into the English
-language at similar times, suggesting a possible mechanism for language
-evolution.
-"
-319,1103.2681,"Sebastian Bernhardsson, Seung Ki Baek and Petter Minnhagen",A Paradoxical Property of the Monkey Book,physics.data-an cond-mat.stat-mech cs.CL cs.IR physics.soc-ph," A ""monkey book"" is a book consisting of a random distribution of letters and
-blanks, where a group of letters surrounded by two blanks is defined as a word.
-We compare the statistics of the word distribution for a monkey book with the
-corresponding distribution for the general class of random books, where the
-latter are books for which the words are randomly distributed. It is shown that
-the word distribution statistics for the monkey book are different and quite
-distinct from those of a typical sampled book or real book. In particular the
-monkey book obeys Heaps' power law to an extraordinarily good approximation, in
-contrast to the word distributions for sampled and real books, which deviate
-from Heaps' law in a characteristic way. The somewhat counter-intuitive
-conclusion is that a ""monkey book"" obeys Heaps' power law precisely because its
-word-frequency distribution is not a smooth power law, contrary to the
-expectation based on simple mathematical arguments that if one is a power law,
-so is the other.
-"
-320,1103.2903,Finn {\AA}rup Nielsen,"A new ANEW: Evaluation of a word list for sentiment analysis in
- microblogs",cs.IR cs.CL," Sentiment analysis of microblogs such as Twitter has recently gained a fair
-amount of attention. One of the simplest sentiment analysis approaches compares
-the words of a posting against a labeled word list, where each word has been
-scored for valence -- a 'sentiment lexicon' or 'affective word list'. There
-exist several affective word lists, e.g., ANEW (Affective Norms for English
-Words) developed before the advent of microblogging and sentiment analysis. I
-wanted to examine how well ANEW and other word lists perform for the detection
-of sentiment strength in microblog posts in comparison with a new word list
-specifically constructed for microblogs. I used manually labeled postings from
-Twitter scored for sentiment. Using a simple word matching I show that the new
-word list may perform better than ANEW, though not as well as the more
-elaborate approach found in SentiStrength.
-"
-321,1103.2950,Wentian Li and Pedro Miramontes,"Fitting Ranked English and Spanish Letter Frequency Distribution in U.S.
- and Mexican Presidential Speeches",cs.CL," The limited abscissa range of ranked letter frequency distributions
-causes multiple functions to fit the observed distribution reasonably well. In
-order to critically compare various functions, we apply statistical model
-selection to ten functions, using the texts of U.S. and Mexican presidential
-speeches in the last 1-2 centuries. Despite minor switching of the ranking
-order of certain letters during the temporal evolution of both datasets, the
-letter usage is generally stable. The best fitting function, judged either by
-least-squares error or by AIC/BIC model selection, is the Cocho/Beta function.
-We also use a novel method to discover clusters of letters by their
-observed-over-expected frequency ratios.
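The model comparison used in the letter-frequency study above can be sketched as follows, assuming scipy is available: fit a candidate function to ranked letter frequencies by least squares and score the fit with AIC. The frequencies below are hypothetical, and the Cocho/Beta function is written in its usual two-exponent form f(r) = A(N+1-r)^b / r^a.

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical ranked letter frequencies (descending) over N ranks.
    freq = np.array([0.127, 0.091, 0.082, 0.075, 0.070, 0.063,
                     0.061, 0.060, 0.057, 0.043, 0.040, 0.028])
    r = np.arange(1, len(freq) + 1)
    N = len(freq)

    def cocho_beta(r, A, a, b):
        # Two-exponent beta-like function: A * (N + 1 - r)^b / r^a
        return A * (N + 1 - r) ** b / r ** a

    params, _ = curve_fit(cocho_beta, r, freq, p0=[0.1, 0.5, 0.5])
    resid = freq - cocho_beta(r, *params)
    rss = float(resid @ resid)
    k = len(params)
    aic = N * np.log(rss / N) + 2 * k   # Gaussian-error AIC up to a constant
    print(params, aic)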
-" -322,1103.3585,"Fredrik Sandin, Blerim Emruli, Magnus Sahlgren",Incremental dimension reduction of tensors with random index,cs.DS cs.CL cs.IR," We present an incremental, scalable and efficient dimension reduction -technique for tensors that is based on sparse random linear coding. Data is -stored in a compactified representation with fixed size, which makes memory -requirements low and predictable. Component encoding and decoding are performed -on-line without computationally expensive re-analysis of the data set. The -range of tensor indices can be extended dynamically without modifying the -component representation. This idea originates from a mathematical model of -semantic memory and a method known as random indexing in natural language -processing. We generalize the random-indexing algorithm to tensors and present -signal-to-noise-ratio simulations for representations of vectors and matrices. -We present also a mathematical analysis of the approximate orthogonality of -high-dimensional ternary vectors, which is a property that underpins this and -other similar random-coding approaches to dimension reduction. To further -demonstrate the properties of random indexing we present results of a synonym -identification task. The method presented here has some similarities with -random projection and Tucker decomposition, but it performs well at high -dimensionality only (n>10^3). Random indexing is useful for a range of complex -practical problems, e.g., in natural language processing, data mining, pattern -recognition, event detection, graph searching and search engines. Prototype -software is provided. It supports encoding and decoding of tensors of order >= -1 in a unified framework, i.e., vectors, matrices and higher order tensors. -" -323,1103.3952,{\L}ukasz D\k{e}bowski,"Mixing, Ergodic, and Nonergodic Processes with Rapidly Growing - Information between Blocks",cs.IT cs.CL math.IT," We construct mixing processes over an infinite alphabet and ergodic processes -over a finite alphabet for which Shannon mutual information between adjacent -blocks of length $n$ grows as $n^\beta$, where $\beta\in(0,1)$. The processes -are a modification of nonergodic Santa Fe processes, which were introduced in -the context of natural language modeling. The rates of mutual information for -the latter processes are alike and also established in this paper. As an -auxiliary result, it is shown that infinite direct products of mixing processes -are also mixing. -" -324,1103.4012,"Simone Pompei, Vittorio Loreto and Francesca Tria",On the accuracy of language trees,physics.soc-ph cs.CL q-bio.QM," Historical linguistics aims at inferring the most likely language -phylogenetic tree starting from information concerning the evolutionary -relatedness of languages. The available information are typically lists of -homologous (lexical, phonological, syntactic) features or characters for many -different languages. - From this perspective the reconstruction of language trees is an example of -inverse problems: starting from present, incomplete and often noisy, -information, one aims at inferring the most likely past evolutionary history. A -fundamental issue in inverse problems is the evaluation of the inference made. -A standard way of dealing with this question is to generate data with -artificial models in order to have full access to the evolutionary process one -is going to infer. 
This procedure presents an intrinsic limitation: when
-dealing with real data sets, one typically does not know which model of
-evolution is the most suitable for them. A possible way out is to compare
-algorithmic inference with expert classifications. This is the point of view we
-take here by conducting a thorough survey of the accuracy of reconstruction
-methods as compared with the Ethnologue expert classifications. We focus in
-particular on state-of-the-art distance-based methods for phylogeny
-reconstruction using worldwide linguistic databases.
- In order to assess the accuracy of the inferred trees we introduce and
-characterize two generalizations of standard definitions of distances between
-trees. Based on these scores we quantify the relative performances of the
-distance-based algorithms considered. Further we quantify how the completeness
-and the coverage of the available databases affect the accuracy of the
-reconstruction. Finally we draw some conclusions about where the accuracy of
-the reconstructions in historical linguistics stands and about the leading
-directions to improve it.
-"
-325,1103.4090,"An\'alia Louren\c{c}o, Michael Conover, Andrew Wong, Azadeh
- Nematzadeh, Fengxia Pan, Hagit Shatkay, Luis M. Rocha","A Linear Classifier Based on Entity Recognition Tools and a Statistical
- Approach to Method Extraction in the Protein-Protein Interaction Literature",q-bio.QM cs.CL cs.IR cs.LG," We participated in the Article Classification and the Interaction Method
-subtasks (ACT and IMT, respectively) of the Protein-Protein Interaction task of
-the BioCreative III Challenge. For the ACT, we pursued extensive testing of
-available Named Entity Recognition and dictionary tools, and used the most
-promising ones to extend our Variable Trigonometric Threshold linear
-classifier. For the IMT, we experimented with a primarily statistical approach,
-as opposed to employing a deeper natural language processing strategy. Finally,
-we also studied the benefits of integrating the method extraction approach that
-we have used for the IMT into the ACT pipeline. For the ACT, our linear article
-classifier leads to a ranking and classification performance significantly
-higher than all the reported submissions. For the IMT, our results are
-comparable to those of other systems, which took very different approaches. For
-the ACT, we show that the use of named entity recognition tools leads to a
-substantial improvement in the ranking and classification of articles relevant
-to protein-protein interaction. Thus, we show that our substantially expanded
-linear classifier is a very competitive classifier in this domain. Moreover,
-this classifier produces interpretable surfaces that can be understood as
-""rules"" for human understanding of the classification. In terms of the IMT
-task, in contrast to other participants, our approach focused on identifying
-sentences that are likely to bear evidence for the application of a PPI
-detection method, rather than on classifying a document as relevant to a
-method. As BioCreative III did not perform an evaluation of the evidence
-provided by the system, we have conducted a separate assessment; the evaluators
-agree that our tool is indeed effective in detecting relevant evidence for PPI
-detection methods.
-" -326,1103.5676,Tobias Kuhn,"Codeco: A Grammar Notation for Controlled Natural Language in Predictive - Editors",cs.CL," Existing grammar frameworks do not work out particularly well for controlled -natural languages (CNL), especially if they are to be used in predictive -editors. I introduce in this paper a new grammar notation, called Codeco, which -is designed specifically for CNLs and predictive editors. Two different parsers -have been implemented and a large subset of Attempto Controlled English (ACE) -has been represented in Codeco. The results show that Codeco is practical, -adequate and efficient. -" -327,1104.2034,Yavor Angelov Parvanov,"Materials to the Russian-Bulgarian Comparative Dictionary ""EAD""",cs.CL," This article presents a fragment of a new comparative dictionary ""A -comparative dictionary of names of expansive action in Russian and Bulgarian -languages"". Main features of the new web-based comparative dictionary are -placed, the principles of its formation are shown, primary links between the -word-matches are classified. The principal difference between translation -dictionaries and the model of double comparison is also shown. The -classification scheme of the pages is proposed. New concepts and keywords have -been introduced. The real prototype of the dictionary with a few key pages is -published. The broad debate about the possibility of this prototype to become a -version of Russian-Bulgarian comparative dictionary of a new generation is -available. -" -328,1104.2086,"Slav Petrov, Dipanjan Das and Ryan McDonald",A Universal Part-of-Speech Tagset,cs.CL," To facilitate future research in unsupervised induction of syntactic -structure and to standardize best-practices, we propose a tagset that consists -of twelve universal part-of-speech categories. In addition to the tagset, we -develop a mapping from 25 different treebank tagsets to this universal set. As -a result, when combined with the original treebank data, this universal tagset -and mapping produce a dataset consisting of common parts-of-speech for 22 -different languages. We highlight the use of this resource via two experiments, -including one that reports competitive accuracies for unsupervised grammar -induction without gold standard part-of-speech tags. -" -329,1104.4321,Yannis Haralambous,"Seeking Meaning in a Space Made out of Strokes, Radicals, Characters and - Compounds",cs.CL," Chinese characters can be compared to a molecular structure: a character is -analogous to a molecule, radicals are like atoms, calligraphic strokes -correspond to elementary particles, and when characters form compounds, they -are like molecular structures. In chemistry the conjunction of all of these -structural levels produces what we perceive as matter. In language, the -conjunction of strokes, radicals, characters, and compounds produces meaning. -But when does meaning arise? We all know that radicals are, in some sense, the -basic semantic components of Chinese script, but what about strokes? -Considering the fact that many characters are made by adding individual strokes -to (combinations of) radicals, we can legitimately ask the question whether -strokes carry meaning, or not. In this talk I will present my project of -extending traditional NLP techniques to radicals and strokes, aiming to obtain -a deeper understanding of the way ideographic languages model the world. 
-" -330,1104.4426,Maurizio Serva,Phylogeny and geometry of languages from normalized Levenshtein distance,cs.CL q-bio.PE," The idea that the distance among pairs of languages can be evaluated from -lexical differences seems to have its roots in the work of the French explorer -Dumont D'Urville. He collected comparative words lists of various languages -during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work -about the geographical division of the Pacific, he proposed a method to measure -the degree of relation between languages. - The method used by the modern lexicostatistics, developed by Morris Swadesh -in the 1950s, measures distances from the percentage of shared cognates, which -are words with a common historical origin. The weak point of this method is -that subjective judgment plays a relevant role. - Recently, we have proposed a new automated method which is motivated by the -analogy with genetics. The new approach avoids any subjectivity and results can -be easily replicated by other scholars. The distance between two languages is -defined by considering a renormalized Levenshtein distance between pair of -words with the same meaning and averaging on the words contained in a list. The -renormalization, which takes into account the length of the words, plays a -crucial role, and no sensible results can be found without it. - In this paper we give a short review of our automated method and we -illustrate it by considering the cluster of Malagasy dialects. We show that it -sheds new light on their kinship relation and also that it furnishes a lot of -new information concerning the modalities of the settlement of Madagascar. -" -331,1104.4681,"R. Rajeswara Rao, V. Kamakshi Prasad, A. Nagesh","Performance Evaluation of Statistical Approaches for Text Independent - Speaker Recognition Using Source Feature",cs.CL," This paper introduces the performance evaluation of statistical approaches -for TextIndependent speaker recognition system using source feature. Linear -prediction LP residual is used as a representation of excitation information in -speech. The speaker-specific information in the excitation of voiced speech is -captured using statistical approaches such as Gaussian Mixture Models GMMs and -Hidden Markov Models HMMs. The decrease in the error during training and -recognizing speakers during testing phase close to 100 percent accuracy -demonstrates that the excitation component of speech contains speaker-specific -information and is indeed being effectively captured by continuous Ergodic HMM -than GMM. The performance of the speaker recognition system is evaluated on GMM -and 2 state ergodic HMM with different mixture components and test speech -duration. We demonstrate the speaker recognition studies on TIMIT database for -both GMM and Ergodic HMM. -" -332,1104.4950,Hamed Hassanzadeh and MohammadReza Keyvanpour,"A Machine Learning Based Analytical Framework for Semantic Annotation - Requirements",cs.AI cs.CL," The Semantic Web is an extension of the current web in which information is -given well-defined meaning. The perspective of Semantic Web is to promote the -quality and intelligence of the current web by changing its contents into -machine understandable form. Therefore, semantic level information is one of -the cornerstones of the Semantic Web. The process of adding semantic metadata -to web resources is called Semantic Annotation. 
There are many obstacles
-to Semantic Annotation, such as multilinguality, scalability, and issues
-related to the diversity and inconsistency of content across different web
-pages. Due to the wide range of domains and the dynamic environments that
-Semantic Annotation systems must operate in, the problem of automating the
-annotation process is one of the significant challenges in this domain. To
-overcome this problem, different machine learning approaches such as supervised
-learning, unsupervised learning and more recent ones like semi-supervised
-learning and active learning have been utilized. In this paper we present an
-inclusive layered classification of Semantic Annotation challenges and discuss
-the most important issues in this field. Also, we review and analyze machine
-learning applications for solving semantic annotation problems. To this end,
-the article closely studies and categorizes related research for better
-understanding and to reach a framework that can map machine learning techniques
-onto the Semantic Annotation challenges and requirements.
-"
-333,1104.5362,Andr\'e Kempe,"Selected Operations, Algorithms, and Applications of n-Tape Weighted
- Finite-State Machines",cs.FL cs.CL," A weighted finite-state machine with n tapes (n-WFSM) defines a rational
-relation on n strings. It is a generalization of weighted acceptors (one tape)
-and transducers (two tapes).
- After recalling some basic definitions about n-ary weighted rational
-relations and n-WFSMs, we summarize some central operations on these relations
-and machines, such as join and auto-intersection. Unfortunately, due to Post's
-Correspondence Problem, a fully general join or auto-intersection algorithm
-cannot exist. We recall a restricted algorithm for a class of n-WFSMs.
- Through a series of practical applications, we finally investigate the
-augmented descriptive power of n-WFSMs and their join, compared to classical
-transducers and their composition. Some applications are not feasible with the
-latter. The series includes: the morphological analysis of Semitic languages,
-the preservation of intermediate results in transducer cascades, the induction
-of morphological rules from corpora, the alignment of lexicon entries, the
-automatic extraction of acronyms and their meaning from corpora, and the search
-for cognates in a bilingual lexicon. All described operations and applications
-have been implemented with Xerox's WFSC tool.
-"
-334,1105.0673,"Cristian Danescu-Niculescu-Mizil, Michael Gamon, Susan Dumais",Mark My Words! Linguistic Style Accommodation in Social Media,cs.CL cs.SI," The psycholinguistic theory of communication accommodation accounts for the
-general observation that participants in conversations tend to converge to one
-another's communicative behavior: they coordinate in a variety of dimensions
-including choice of words, syntax, utterance length, pitch and gestures. In its
-almost forty years of existence, this theory has been empirically supported
-exclusively through small-scale or controlled laboratory studies. Here we
-address this phenomenon in the context of Twitter conversations. Undoubtedly,
-this setting is unlike any other in which accommodation has been observed and
-is, thus, challenging for the theory.
Its novelty comes not only from its size, but also
-from the non-real-time nature of conversations, from the 140-character length
-restriction, from the wide variety of social relation types, and from a design
-that was initially not geared towards conversation at all. It is therefore not
-clear a priori whether accommodation is robust enough to occur within the
-constraints of this new environment. To investigate this, we develop a
-probabilistic framework that can model accommodation and measure its effects.
-We apply it to a large Twitter conversational dataset specifically developed
-for this task. This is the first time the hypothesis of linguistic style
-accommodation has been examined (and verified) in a large-scale, real-world
-setting. Furthermore, when investigating concepts such as stylistic influence
-and symmetry of accommodation, we discover a complexity of the phenomenon which
-was never observed before. We also explore the potential relation between
-stylistic influence and network features commonly associated with social
-status.
-"
-335,1105.1072,"G. Barisevi\v{c}ius, B. Tamulynas","English-Lithuanian-English Machine Translation lexicon and engine:
- current state and future work",cs.CL," This article overviews the current state of the English-Lithuanian-English
-machine translation system. The first part of the article describes the
-problems that the system faces today and the actions that will be taken to
-solve them in the future. The second part of the article tackles the main issue
-of the translation process. The article briefly overviews a word sense
-disambiguation technique for MT that uses Google.
-"
-336,1105.1226,"G. Barisevi\v{c}ius, B. Tamulynas",Multilingual lexicon design tool and database management system for MT,cs.CL," The paper presents the design and development of an English-Lithuanian-English
-dictionary/lexicon tool and a lexicon database management system for MT. The
-system is oriented to support two main requirements: to be open to the user,
-and to describe many more attributes of the parts of speech than a regular
-dictionary, as required for MT. The Java programming language and the MySQL
-database management system are used to implement the design tool and the
-lexicon database, respectively. This solution allows the system to be easily
-deployed on the Internet. The system is able to run on various operating
-systems, such as Windows, Linux and Mac OS, wherever the Java Virtual Machine
-is supported. Since a modern lexicon database management system is used,
-several users can access the same database without difficulty.
-"
-337,1105.1306,{\L}ukasz D\k{e}bowski,Excess entropy in natural language: present state and perspectives,cs.IT cs.CL math.IT," We review recent progress in understanding the meaning of mutual information
-in natural language. Let us define words in a text as strings that occur
-sufficiently often. In a few previous papers, we have shown that a power-law
-distribution for so-defined words (a.k.a. Herdan's law) is obeyed if there is a
-similar power-law growth of (algorithmic) mutual information between adjacent
-portions of texts of increasing length. Moreover, the power-law growth of
-information holds if texts describe a complicated infinite (algorithmically)
-random object in a highly repetitive way, according to an analogous power-law
-distribution. The described object may be immutable (like a mathematical or
-physical constant) or may evolve slowly in time (like cultural heritage).
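One simple way to quantify accommodation in the spirit of the probabilistic framework mentioned in the Twitter study above is to compare the probability that a reply exhibits a stylistic marker, given that the preceding utterance does, with the marker's base rate among replies. A minimal sketch over hypothetical exchange pairs:

    # Hypothetical exchanges: does each side exhibit a given stylistic marker?
    # Each pair is (original_has_marker, reply_has_marker).
    pairs = [(True, True), (True, True), (True, False),
             (False, False), (False, True), (False, False)]

    given_yes = [r for o, r in pairs if o]
    all_replies = [r for _, r in pairs]

    p_cond = sum(given_yes) / len(given_yes)      # P(reply marker | original marker)
    p_base = sum(all_replies) / len(all_replies)  # P(reply marker)

    # Positive values indicate accommodation toward the interlocutor's style.
    print(f"accommodation = {p_cond - p_base:+.3f}")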
Here
-we reflect on the respective mathematical results in a less technical way. We
-also discuss the feasibility of deciding to what extent these results apply to
-actual human communication.
-"
-338,1105.1702,Mehrnoosh Sadrzadeh and Edward Grefenstette,"A Compositional Distributional Semantics, Two Concrete Constructions,
- and some Experimental Evaluations",cs.CL math.CT," We provide an overview of the hybrid compositional distributional model of
-meaning, developed in Coecke et al. (arXiv:1003.4394v1 [cs.CL]), which is based
-on the categorical methods also applied to the analysis of information flow in
-quantum protocols. The mathematical setting stipulates that the meaning of a
-sentence is a linear function of the tensor products of the meanings of its
-words. We provide concrete constructions for this definition and present
-techniques to build vector spaces for meaning vectors of words, as well as
-those of sentences. The applicability of these methods is demonstrated via a
-toy vector space as well as real data from the British National Corpus and two
-disambiguation experiments.
-"
-339,1105.4582,Maxim Makatchev and Reid Simmons,"Perception of Personality and Naturalness through Dialogues by Native
- Speakers of American English and Arabic",cs.CL cs.RO," Linguistic markers of personality traits have been studied extensively, but
-few cross-cultural studies exist. In this paper, we evaluate how native
-speakers of American English and Arabic perceive personality traits and
-naturalness of English utterances that vary along the dimensions of verbosity,
-hedging, lexical and syntactic alignment, and formality. The utterances are the
-turns within dialogue fragments that are presented as text transcripts to the
-workers of Amazon's Mechanical Turk. The results of the study suggest that all
-four dimensions can be used as linguistic markers of all personality traits by
-both language communities. A further comparative analysis shows cross-cultural
-differences for some combinations of measures of personality traits and
-naturalness, the dimensions of linguistic variability and dialogue acts.
-"
-340,1105.6162,Jerry R. Van Aken,A statistical learning algorithm for word segmentation,cs.CL," In natural speech, the speaker does not pause between words, yet a human
-listener somehow perceives this continuous stream of phonemes as a series of
-distinct words. The detection of boundaries between spoken words is an instance
-of a general capability of the human neocortex to remember and to recognize
-recurring sequences. This paper describes a computer algorithm that is designed
-to solve the problem of locating word boundaries in blocks of English text from
-which the spaces have been removed. This problem avoids the complexities of
-speech processing but requires similar capabilities for detecting recurring
-sequences. The algorithm relies entirely on statistical relationships between
-letters in the input stream to infer the locations of word boundaries. A
-Viterbi trellis is used to simultaneously evaluate a set of hypothetical
-segmentations of a block of adjacent words. This technique improves accuracy
-but incurs a small latency between the arrival of letters in the input stream
-and the sending of words to the output stream. The source code for a C++
-version of this algorithm is presented in an appendix.
-"
-341,1106.0107,"John Jomy, K. V.
Pramod, Balakrishnan Kannan",Handwritten Character Recognition of South Indian Scripts: A Review,cs.CV cs.CL cs.CY," Handwritten character recognition is always a frontier area of research in
-the field of pattern recognition and image processing, and there is a large
-demand for OCR on handwritten documents. Even though sufficient studies have
-been performed on foreign scripts like Chinese, Japanese and Arabic characters,
-only very little work can be traced for handwritten character recognition of
-Indian scripts, especially the South Indian scripts. This paper provides an
-overview of offline handwritten character recognition in South Indian scripts,
-namely Malayalam, Tamil, Kannada and Telugu.
-"
-342,1106.0411,Alvaro Francisco Huertas-Rosero and C. J. van Rijsbergen,Quantum-Like Uncertain Conditionals for Text Analysis,cs.CL quant-ph," Simple representations of documents based on the occurrences of terms are
-ubiquitous in areas like Information Retrieval, and also frequent in Natural
-Language Processing. In this work we propose a logical-probabilistic approach
-to the analysis of natural language text based on the concept of the Uncertain
-Conditional, on top of a formulation of lexical measurements inspired by the
-theoretical concept of ideal quantum measurements. The proposed concept can be
-used for generating topic-specific representations of text, aiming to match in
-a simple way the perception of a user with a pre-established idea of what the
-usage of terms in the text should be. A simple example is developed with two
-versions of a text in two languages, showing how regularities in the use of
-terms are detected and easily represented.
-"
-343,1106.0673,"P. Martinez-Barco, M. Palomar",Computational Approach to Anaphora Resolution in Spanish Dialogues,cs.CL," This paper presents an algorithm for identifying noun-phrase antecedents of
-pronouns and adjectival anaphors in Spanish dialogues. We believe that anaphora
-resolution requires numerous sources of information in order to find the
-correct antecedent of the anaphor. These sources can be of different kinds,
-e.g., linguistic information, discourse/dialogue structure information, or
-topic information. For this reason, our algorithm uses various different kinds
-of information (hybrid information). The algorithm is based on linguistic
-constraints and preferences and uses an anaphoric accessibility space within
-which the algorithm finds the noun phrase. We present some experiments related
-to this algorithm and this space using a corpus of 204 dialogues. The algorithm
-is implemented in Prolog. According to this study, 95.9% of antecedents were
-located in the proposed space, a precision of 81.3% was obtained for pronominal
-anaphora resolution, and 81.5% for adjectival anaphora.
-"
-344,1106.3077,Cristian Danescu-Niculescu-Mizil and Lillian Lee,"Chameleons in imagined conversations: A new approach to understanding
- coordination of linguistic style in dialogs",cs.CL physics.soc-ph," Conversational participants tend to immediately and unconsciously adapt to
-each other's language styles: a speaker will even adjust the number of articles
-and other function words in their next utterance in response to the number in
-their partner's immediately preceding utterance. This striking level of
-coordination is thought to have arisen as a way to achieve social goals, such
-as gaining approval or emphasizing difference in status.
But has the adaptation
-mechanism become so deeply embedded in the language-generation process as to
-become a reflex? We argue that fictional dialogs offer a way to study this
-question, since authors create the conversations but don't receive the social
-benefits (rather, the imagined characters do). Indeed, we find significant
-coordination across many families of function words in our large movie-script
-corpus. We also report suggestive preliminary findings on the effects of gender
-and other features; e.g., surprisingly, for articles, on average, characters
-adapt more to females than to males.
-"
-345,1106.4058,Edward Grefenstette and Mehrnoosh Sadrzadeh,"Experimental Support for a Categorical Compositional Distributional
- Model of Meaning",cs.CL math.CT," Modelling compositional meaning for sentences using empirical distributional
-methods has been a challenge for computational linguists. We implement the
-abstract categorical model of Coecke et al. (arXiv:1003.4394v1 [cs.CL]) using
-data from the BNC and evaluate it. The implementation is based on unsupervised
-learning of matrices for relational words and applying them to the vectors of
-their arguments. The evaluation is based on the word disambiguation task
-developed by Mitchell and Lapata (2008) for intransitive sentences, and on a
-similar new experiment designed for transitive sentences. Our model matches the
-results of its competitors in the first experiment, and betters them in the
-second. The general improvement in results with increase in syntactic
-complexity showcases the compositional power of our model.
-"
-346,1106.4571,C. Thompson,Acquiring Word-Meaning Mappings for Natural Language Interfaces,cs.CL cs.AI," This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted
-Examples), that acquires a semantic lexicon from a corpus of sentences paired
-with semantic representations. The lexicon learned consists of phrases paired
-with meaning representations. WOLFIE is part of an integrated system that
-learns to transform sentences into representations such as logical database
-queries. Experimental results are presented demonstrating WOLFIE's ability to
-learn useful lexicons for a database interface in four different natural
-languages. The usefulness of the lexicons learned by WOLFIE is compared to
-those acquired by a similar system, with results favorable to WOLFIE. A second
-set of experiments demonstrates WOLFIE's ability to scale to larger and more
-difficult, albeit artificially generated, corpora. In natural language
-acquisition, it is difficult to gather the annotated data needed for supervised
-learning; however, unannotated data is fairly plentiful. Active learning
-methods attempt to select for annotation and training only the most informative
-examples, and therefore are potentially very useful in natural language
-applications. However, most results to date for active learning have only
-considered standard classification tasks. To reduce annotation effort while
-maintaining accuracy, we apply active learning to semantic lexicons. We show
-that active learning can significantly reduce the number of annotated examples
-required to achieve a given level of performance.
-"
-347,1106.4862,"A. Ferrandez, J. Peral","Translation of Pronominal Anaphora between English and Spanish:
- Discrepancies and Evaluation",cs.CL cs.AI," This paper evaluates the different tasks carried out in the translation of
-pronominal anaphora in a machine translation (MT) system.
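In the categorical compositional experiments above, a transitive verb is represented as a matrix learned from the subject/object pairs it occurs with, and then applied to the vectors of its arguments. A minimal sketch under those assumptions, with tiny hypothetical noun vectors; one common concrete choice, used here, composes by taking the component-wise product of the verb matrix with the subject-object outer product.

    import numpy as np

    # Hypothetical 3-d noun vectors (distributionally learned in practice).
    nouns = {"man": np.array([0.9, 0.1, 0.0]),
             "dog": np.array([0.7, 0.2, 0.1]),
             "postman": np.array([0.8, 0.1, 0.1])}

    # Build a verb matrix as the sum of outer products of the
    # subject/object pairs observed with that verb in a corpus.
    observed_pairs = [("dog", "man"), ("dog", "postman")]
    bites = sum(np.outer(nouns[s], nouns[o]) for s, o in observed_pairs)

    def sentence_vector(subj, verb_matrix, obj):
        # Compose: component-wise product of the verb matrix with subj (x) obj.
        return verb_matrix * np.outer(nouns[subj], nouns[obj])

    v1 = sentence_vector("dog", bites, "man")
    v2 = sentence_vector("dog", bites, "postman")
    cos = (v1 * v2).sum() / (np.linalg.norm(v1) * np.linalg.norm(v2))
    print(f"similarity('dog bites man', 'dog bites postman') = {cos:.3f}")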
The MT interlingua
-approach named AGIR (Anaphora Generation with an Interlingua Representation)
-improves upon other proposals presented to date because it is able to translate
-intersentential anaphors, detect co-reference chains, and translate Spanish
-zero pronouns into English---issues hardly considered by other systems. The
-paper presents the resolution and evaluation of these anaphora problems in AGIR
-with the use of different kinds of knowledge (lexical, morphological,
-syntactic, and semantic). The translation of English and Spanish anaphoric
-third-person personal pronouns (including Spanish zero pronouns) into the
-target language has been evaluated on unrestricted corpora. We have obtained a
-precision of 80.4% and 84.8% in the translation of Spanish and English
-pronouns, respectively. Although we have only studied the Spanish and English
-languages, our approach can be easily extended to other languages such as
-Portuguese, Italian, or Japanese.
-"
-348,1106.5264,"E. Reiter, R. Robertson, S. G. Sripada",Acquiring Correct Knowledge for Natural Language Generation,cs.CL," Natural language generation (NLG) systems are computer software systems that
-produce texts in English and other human languages, often from non-linguistic
-input data. NLG systems, like most AI systems, need substantial amounts of
-knowledge. However, our experience in two NLG projects suggests that it is
-difficult to acquire correct knowledge for NLG systems; indeed, every knowledge
-acquisition (KA) technique we tried had significant problems. In general terms,
-these problems were due to the complexity, novelty, and poorly understood
-nature of the tasks our systems attempted, and were worsened by the fact that
-people write so differently. This meant in particular that corpus-based KA
-approaches suffered because it was impossible to assemble a sizable corpus of
-high-quality consistent manually written texts in our domains; and structured
-expert-oriented KA techniques suffered because experts disagreed and because we
-could not get enough information about special and unusual cases to build
-robust systems. We believe that such problems are likely to affect many other
-NLG systems as well. In the long term, we hope that new KA techniques may
-emerge to help NLG system builders. In the shorter term, we believe that
-understanding how individual KA techniques can fail, and using a mixture of
-different KA techniques with different strengths and weaknesses, can help
-developers acquire NLG knowledge that is mostly correct.
-"
-349,1106.5308,"Florin Pop, Diana Petrescu, \c{S}tefan Trau\c{s}an-Matu",Clasificarea distribuita a mesajelor de e-mail,cs.HC cs.CL," A basic component in Internet applications is the electronic mail and its
-various implications. The paper proposes a mechanism for automatically
-classifying emails and creating the dynamic groups to which these messages
-belong. The proposed mechanisms are based on natural language processing
-techniques and are designed to facilitate human-machine interaction in this
-direction.
-"
-350,1106.5973,Venkata Ravinder Paruchuri,Entropy of Telugu,cs.CL," This paper presents an investigation of the entropy of the Telugu script.
-Since this script is syllabic, and not alphabetic, the computation of entropy
-is somewhat complicated.
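The entropy computation for a syllabic script reduces, in its simplest unigram form, to Shannon entropy over syllable frequencies rather than letter frequencies. A minimal sketch; the syllable sequence is a hypothetical stand-in for syllabified Telugu text:

    import math
    from collections import Counter

    # Hypothetical sequence of syllables from a syllabified text.
    syllables = ["ka", "ma", "la", "ka", "ta", "ma", "ka", "ra", "la", "ma"]

    counts = Counter(syllables)
    total = sum(counts.values())

    # Shannon entropy H = -sum p_i log2 p_i, in bits per syllable.
    H = -sum((c / total) * math.log2(c / total) for c in counts.values())
    print(f"{H:.3f} bits per syllable")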
-" -351,1107.0193,Jordi Fortuny and Bernat Corominas-Murtra,On the origin of ambiguity in efficient communication,cs.CL," This article studies the emergence of ambiguity in communication through the -concept of logical irreversibility and within the framework of Shannon's -information theory. This leads us to a precise and general expression of the -intuition behind Zipf's vocabulary balance in terms of a symmetry equation -between the complexities of the coding and the decoding processes that imposes -an unavoidable amount of logical uncertainty in natural communication. -Accordingly, the emergence of irreversible computations is required if the -complexities of the coding and the decoding processes are balanced in a -symmetric scenario, which means that the emergence of ambiguous codes is a -necessary condition for natural communication to succeed. -" -352,1107.1753,Yavor Parvanov,Notes on Electronic Lexicography,cs.CL," These notes are a continuation of topics covered by V. Selegej in his article -""Electronic Dictionaries and Computational lexicography"". How can an electronic -dictionary have as its object the description of closely related languages? -Obviously, such a question allows multiple answers. -" -353,1107.3119,Edward Grefenstette and Mehrnoosh Sadrzadeh,Experimenting with Transitive Verbs in a DisCoCat,cs.CL math.CT," Formal and distributional semantic models offer complementary benefits in -modeling meaning. The categorical compositional distributional (DisCoCat) model -of meaning of Coecke et al. (arXiv:1003.4394v1 [cs.CL]) combines aspected of -both to provide a general framework in which meanings of words, obtained -distributionally, are composed using methods from the logical setting to form -sentence meaning. Concrete consequences of this general abstract setting and -applications to empirical data are under active study (Grefenstette et al., -arxiv:1101.0309; Grefenstette and Sadrzadeh, arXiv:1106.4058v1 [cs.CL]). . In -this paper, we extend this study by examining transitive verbs, represented as -matrices in a DisCoCat. We discuss three ways of constructing such matrices, -and evaluate each method in a disambiguation task developed by Grefenstette and -Sadrzadeh (arXiv:1106.4058v1 [cs.CL]). -" -354,1107.3263,Dorota Lipowska and Adam Lipowski,Naming Game on Adaptive Weighted Networks,cond-mat.stat-mech cs.CL physics.soc-ph," We examine a naming game on an adaptive weighted network. A weight of -connection for a given pair of agents depends on their communication success -rate and determines the probability with which the agents communicate. In some -cases, depending on the parameters of the model, the preference toward -successfully communicating agents is basically negligible and the model behaves -similarly to the naming game on a complete graph. In particular, it quickly -reaches a single-language state, albeit some details of the dynamics are -different from the complete-graph version. In some other cases, the preference -toward successfully communicating agents becomes much more relevant and the -model gets trapped in a multi-language regime. In this case gradual coarsening -and extinction of languages lead to the emergence of a dominant language, -albeit with some other languages still being present. A comparison of -distribution of languages in our model and in the human population is -discussed. -" -355,1107.3707,"Alexander M. Petersen, Joel Tenenbaum, Shlomo Havlin, H. 
Eugene
 - Stanley","Statistical Laws Governing Fluctuations in Word Use from Word Birth to
 - Word Death",physics.soc-ph cs.CL cs.IR nlin.AO physics.pop-ph," We analyze the dynamic properties of 10^7 words recorded in English, Spanish
-and Hebrew over the period 1800--2008 in order to gain insight into the
-coevolution of language and culture. We report language independent patterns
-useful as benchmarks for theoretical models of language evolution. A
-significantly decreasing (increasing) trend in the birth (death) rate of words
-indicates a recent shift in the selection laws governing word use. For new
-words, we observe a peak in the growth-rate fluctuations around 40 years after
-introduction, consistent with the typical entry time into standard dictionaries
-and the human generational timescale. Pronounced changes in the dynamics of
-language during periods of war show that word correlations, occurring across
-time and between words, are largely influenced by coevolutionary social,
-technological, and political factors. We quantify cultural memory by analyzing
-the long-term correlations in the use of individual words using detrended
-fluctuation analysis.
-"
-356,1107.4218,Maurizio Serva,The settlement of Madagascar: what dialects and languages can tell,cs.CL q-bio.PE," The dialects of Madagascar belong to the Greater Barito East group of the
-Austronesian family and it is widely accepted that the Island was colonized by
-Indonesian sailors after a maritime trek which probably took place around 650
-CE. The language most closely related to the Malagasy dialects is Maanyan, but
-Malay is also strongly related, especially with regard to navigation terms.
-Since the Maanyan Dayaks live along the Barito river in Kalimantan (Borneo)
-and do not possess the necessary skills for long maritime navigation, they
-were probably brought as subordinates by Malay sailors.
- In a recent paper we compared 23 different Malagasy dialects in order to
-determine the time and the landing area of the first colonization. In this
-research we use new data and new methods to confirm that the landing took place
-on the south-east coast of the Island. Furthermore, we are able to state here
-that it is unlikely that there were multiple settlements and, therefore,
-colonization consisted in a single founding event.
- To reach our goal we determine the internal kinship relations among all 23
-Malagasy dialects, as well as the kinship degrees of the 23 dialects versus
-Malay and Maanyan. The method used is an automated version of the
-lexicostatistic approach. The data concerning Madagascar were collected by
-the author at the beginning of 2010 and consist of Swadesh lists of 200 items
-for 23 dialects covering all areas of the Island. The lists for Maanyan and
-Malay were obtained from published datasets supplemented by the author's
-interviews.
-"
-357,1107.4557,"Myle Ott, Yejin Choi, Claire Cardie, Jeffrey T. Hancock",Finding Deceptive Opinion Spam by Any Stretch of the Imagination,cs.CL cs.CY," Consumers increasingly rate, review and research products online.
-Consequently, websites containing consumer reviews are becoming targets of
-opinion spam. While recent work has focused primarily on manually identifiable
-instances of opinion spam, in this work we study deceptive opinion
-spam---fictitious opinions that have been deliberately written to sound
-authentic.
Integrating work from psychology and computational linguistics, we -develop and compare three approaches to detecting deceptive opinion spam, and -ultimately develop a classifier that is nearly 90% accurate on our -gold-standard opinion spam dataset. Based on feature analysis of our learned -models, we additionally make several theoretical contributions, including -revealing a relationship between deceptive opinions and imaginative writing. -" -358,1107.4573,Peter D. Turney (National Research Council of Canada),Analogy perception applied to seven tests of word comprehension,cs.AI cs.CL cs.LG," It has been argued that analogy is the core of cognition. In AI research, -algorithms for analogy are often limited by the need for hand-coded high-level -representations as input. An alternative approach is to use high-level -perception, in which high-level representations are automatically generated -from raw data. Analogy perception is the process of recognizing analogies using -high-level perception. We present PairClass, an algorithm for analogy -perception that recognizes lexical proportional analogies using representations -that are automatically generated from a large corpus of raw textual data. A -proportional analogy is an analogy of the form A:B::C:D, meaning ""A is to B as -C is to D"". A lexical proportional analogy is a proportional analogy with -words, such as carpenter:wood::mason:stone. PairClass represents the semantic -relations between two words using a high-dimensional feature vector, in which -the elements are based on frequencies of patterns in the corpus. PairClass -recognizes analogies by applying standard supervised machine learning -techniques to the feature vectors. We show how seven different tests of word -comprehension can be framed as problems of analogy perception and we then apply -PairClass to the seven resulting sets of analogy perception problems. We -achieve competitive results on all seven tests. This is the first time a -uniform approach has handled such a range of tests of word comprehension. -" -359,1107.4687,"Luis Quesada, Fernando Berzal, and Francisco J. Cortijo","Fence - An Efficient Parser with Ambiguity Support for Model-Driven - Language Specification",cs.CL," Model-based language specification has applications in the implementation of -language processors, the design of domain-specific languages, model-driven -software development, data integration, text mining, natural language -processing, and corpus-based induction of models. Model-based language -specification decouples language design from language processing and, unlike -traditional grammar-driven approaches, which constrain language designers to -specific kinds of grammars, it needs general parser generators able to deal -with ambiguities. In this paper, we propose Fence, an efficient bottom-up -parsing algorithm with lexical and syntactic ambiguity support that enables the -use of model-based language specification in practice. -" -360,1107.4723,"Yannis Haralambous, Vitaly Klyuev","A Semantic Relatedness Measure Based on Combined Encyclopedic, - Ontological and Collocational Knowledge",cs.CL," We describe a new semantic relatedness measure combining the Wikipedia-based -Explicit Semantic Analysis measure, the WordNet path measure and the mixed -collocation index. Our measure achieves the currently highest results on the -WS-353 test: a Spearman rho coefficient of 0.79 (vs. 0.75 in (Gabrilovich and -Markovitch, 2007)) when applying the measure directly, and a value of 0.87 (vs. 
-0.78 in (Agirre et al., 2009)) when using the prediction of a polynomial SVM
-classifier trained on our measure.
- In the appendix we discuss the adaptation of ESA to 2011 Wikipedia data, as
-well as various unsuccessful attempts to enhance ESA by filtering at word,
-sentence, and section level.
-"
-361,1107.4734,Mohamed Hssini and Azzeddine Lazrek,Design of Arabic Diacritical Marks,cs.CL," Diacritical marks play a crucial role in meeting the criteria of usability of
-typographic text, such as homogeneity, clarity and legibility. Changing the
-diacritic of a letter in a word can completely change its meaning. The
-situation is very complicated with multilingual text. Indeed, the design
-problem becomes more difficult in the presence of diacritics that come from
-various scripts; they are used for different purposes, and are controlled by
-various typographic rules. It is quite challenging to adapt rules from one
-script to another. This paper aims to study the placement and sizing of
-diacritical marks in Arabic script, with a comparison to the Latin case.
-The Arabic script is cursive and runs from right-to-left; its criteria and
-rules are quite distinct from those of the Latin script. We first compare the
-difficulty of processing diacritics in both scripts. We then study the limits
-of Latin resolution strategies when applied to Arabic. Finally, we propose an
-approach to resolving the positioning and resizing of diacritics. This
-strategy includes creating an Arabic font, designed in OpenType format, along
-with suitable justification in TEX.
-"
-362,1107.4796,"Ali Jowharpour, Masha allah abbasi dezfuli, Mohammad hosein Yektaee","Use Pronunciation by Analogy for text to speech system in Persian
 - language",cs.CL," Interest in text-to-speech synthesis has increased worldwide. Text-to-speech
-systems have been developed for many popular languages such as English,
-Spanish and French, and much research and development has been devoted to
-those languages. Persian, on the other hand, has been given little attention
-compared to other languages of similar importance, and research on Persian is
-still in its infancy. The Persian language possesses many difficulties and
-exceptions that increase the complexity of text-to-speech systems; for
-example, short vowels are absent in written text, and homograph words exist.
-In this paper we propose a new method for Persian text-to-phonetic conversion
-based on pronunciation by analogy in words, semantic relations and grammatical
-rules for finding the proper phonetics. Keywords: PbA, text to speech, Persian
-language, FPbA
-"
-363,1107.5743,"Siddhartha Jonnalagadda, Philip Topham","NEMO: Extraction and normalization of organization names from PubMed
 - affiliation strings",cs.CL," We propose NEMO, a system for extracting organization names from affiliation
-strings and normalizing them to a canonical organization name. Our parsing
-process involves multi-layered rule matching with multiple dictionaries. The
-system achieves more than 98% f-score in extracting organization names. Our
-normalization process involves clustering based on local sequence alignment
-metrics and local learning based on finding connected components. High
-precision was also observed in normalization. NEMO is the missing link in
-associating each biomedical paper and its authors to an organization name in
-its canonical form and the geopolitical location of the organization.
This
-research could potentially help in analyzing large social networks of
-organizations for landscaping a particular topic, improving the performance of
-author disambiguation, adding weak links in the co-author network of authors,
-augmenting NLM's MARS system for correcting errors in the OCR output of the
-affiliation field, and automatically indexing PubMed citations with the
-normalized organization name and country. Our system is available as a
-graphical user interface that can be downloaded along with this paper.
-"
-364,1107.5744,"Siddhartha Jonnalagadda, Graciela Gonzalez","BioSimplify: an open source sentence simplification engine to improve
 - recall in automatic biomedical information extraction",cs.CL," BioSimplify is an open source tool written in Java that introduces and
-facilitates the use of a novel model for sentence simplification tuned for
-automatic discourse analysis and information extraction (as opposed to sentence
-simplification for improving human readability). The model is based on a
-""shot-gun"" approach that produces many different (simpler) versions of the
-original sentence by combining variants of its constituent elements. This tool
-is optimized for processing biomedical scientific literature such as the
-abstracts indexed in PubMed. We tested our tool's impact on the task of PPI
-extraction: it improved the f-score of the PPI tool by around 7%, with an
-improvement in recall of around 20%. The BioSimplify tool and test corpus
-can be downloaded from https://biosimplify.sourceforge.net.
-"
-365,1107.5752,Siddhartha Jonnalagadda,"An Effective Approach to Biomedical Information Extraction with Limited
 - Training Data",cs.CL," Overall, the two main contributions of this work include the application of
-sentence simplification to association extraction as described above, and the
-use of distributional semantics for concept extraction. The proposed work on
-concept extraction amalgamates for the first time two diverse research areas
-- distributional semantics and information extraction. This approach provides
-all the advantages offered by other semi-supervised machine learning systems,
-and, unlike other proposed semi-supervised approaches, it can be used on top of
-different basic frameworks and algorithms.
-http://gradworks.umi.com/34/49/3449837.html
-"
-366,1108.0353,"Velimir M. Ilic, Miroslav D. Ciric, Miomir S. Stankovic",Cross-moments computation for stochastic context-free grammars,cs.CL," In this paper we consider the problem of efficient computation of
-cross-moments of a vector random variable represented by a stochastic
-context-free grammar. Two types of cross-moments are discussed. The sample
-space for the first one is the set of all derivations of the context-free
-grammar, and the sample space for the second one is the set of all derivations
-which generate a string belonging to the language of the grammar. In the past,
-this problem was widely studied, but mainly for the cross-moments of scalar
-variables and up to the second order. This paper presents new algorithms for
-computing the cross-moments of an arbitrary order, and the previously developed
-ones are derived as special cases.
-"
-367,1108.0631,"Laurent Romary (IDSL, INRIA Saclay - Ile de France), Amir Zeldes,
 - Florian Zipser (IDSL, INRIA Saclay - Ile de France)",Serialising the ISO SynAF Syntactic Object Model,cs.CL," This paper introduces an XML format developed to serialise the object model
-defined by the ISO Syntactic Annotation Framework SynAF.
Based on widespread
-best practices we adapt a popular XML format for syntactic annotation,
-TigerXML, with additional features to support a variety of syntactic phenomena
-including constituent and dependency structures, binding, and different node
-types such as compounds or empty elements. We also define interfaces to other
-formats and standards including the Morpho-syntactic Annotation Framework MAF
-and the ISOCat Data Category Registry. Finally a case study of the German
-Treebank TueBa-D/Z is presented, showcasing the handling of constituent
-structures, topological fields and coreference annotation in tandem.
-"
-368,1108.1966,Anil Kumar Singh,"A Concise Query Language with Search and Transform Operations for
 - Corpora with Multiple Levels of Annotation",cs.CL," The usefulness of annotated corpora is greatly increased if there is an
-associated tool that can allow various kinds of operations to be performed in a
-simple way. Different kinds of annotation frameworks and many query languages
-for them have been proposed, including some to deal with multiple layers of
-annotation. We present here an easy-to-learn query language for a particular
-kind of annotation framework based on 'threaded trees', which are somewhere
-between the complete order of a tree and the anarchy of a graph. Through
-'typed' threads, they can allow multiple levels of annotation in the same
-document. Our language has a simple, intuitive and concise syntax and high
-expressive power. It not only allows searching for complicated patterns with
-short queries but also supports data manipulation and the specification of
-arbitrary return values. Many of the commonly used tasks that otherwise
-require writing programs can be performed with one or more queries. We compare
-the language with some others and try to evaluate it.
-"
-369,1108.3843,"Chitta Baral, Juraj Dzifcak, Marcos Alvarez Gonzalez and Jiayu Zhou","Using Inverse lambda and Generalization to Translate English to Formal
 - Languages",cs.CL," We present a system to translate natural language sentences to formulas in a
-formal or a knowledge representation language. Our system uses two inverse
-lambda-calculus operators; using them, it can take as input the semantic
-representation of some words, phrases and sentences and from that derive the
-semantic representation of other words and phrases. Our inverse lambda operator
-works on many formal languages including first order logic, database query
-languages and answer set programming. Our system uses a syntactic combinatorial
-categorial parser to parse natural language sentences and also to construct the
-semantic meaning of the sentences as directed by their parsing. The same parser
-is used for both. In addition to the inverse lambda-calculus operators, our
-system uses a notion of generalization to learn semantic representation of
-words from the semantic representation of other words that are of the same
-category. Together with this, we use an existing statistical learning approach
-to assign weights to deal with multiple meanings of words. Our system produces
-improved results on standard corpora on natural language interfaces for robot
-command and control and database queries.
-" -370,1108.3848,Chitta Baral and Juraj Dzifcak,"Language understanding as a step towards human level intelligence - - automatizing the construction of the initial dictionary from example - sentences",cs.CL," For a system to understand natural language, it needs to be able to take -natural language text and answer questions given in natural language with -respect to that text; it also needs to be able to follow instructions given in -natural language. To achieve this, a system must be able to process natural -language and be able to capture the knowledge within that text. Thus it needs -to be able to translate natural language text into a formal language. We -discuss our approach to do this, where the translation is achieved by composing -the meaning of words in a sentence. Our initial approach uses an inverse lambda -method that we developed (and other methods) to learn meaning of words from -meaning of sentences and an initial lexicon. We then present an improved method -where the initial lexicon is also learned by analyzing the training sentence -and meaning pairs. We evaluate our methods and compare them with other existing -methods on a corpora of database querying and robot command and control. -" -371,1108.3850,Chitta Baral and Juraj Dzifcak,"Solving puzzles described in English by automated translation to answer - set programming and learning how to do that translation",cs.CL cs.AI cs.LO," We present a system capable of automatically solving combinatorial logic -puzzles given in (simplified) English. It involves translating the English -descriptions of the puzzles into answer set programming(ASP) and using ASP -solvers to provide solutions of the puzzles. To translate the descriptions, we -use a lambda-calculus based approach using Probabilistic Combinatorial -Categorial Grammars (PCCG) where the meanings of words are associated with -parameters to be able to distinguish between multiple meanings of the same -word. Meaning of many words and the parameters are learned. The puzzles are -represented in ASP using an ontology which is applicable to a large set of -logic puzzles. -" -372,1108.4052,Vitaly Klyuev and Yannis Haralambous,"Query Expansion: Term Selection using the EWC Semantic Relatedness - Measure",cs.CL," This paper investigates the efficiency of the EWC semantic relatedness -measure in an ad-hoc retrieval task. This measure combines the Wikipedia-based -Explicit Semantic Analysis measure, the WordNet path measure and the mixed -collocation index. In the experiments, the open source search engine Terrier -was utilised as a tool to index and retrieve data. The proposed technique was -tested on the NTCIR data collection. The experiments demonstrated promising -results. -" -373,1108.4297,"Jean-Louis Dessalles (INFRES, LTCI)","Why is language well-designed for communication? (Commentary on - Christiansen and Chater: 'Language as shaped by the brain')",cs.CL q-bio.NC," Selection through iterated learning explains no more than other -non-functional accounts, such as universal grammar, why language is so -well-designed for communicative efficiency. It does not predict several -distinctive features of language like central embedding, large lexicons or the -lack of iconicity, that seem to serve communication purposes at the expense of -learnability. 
-" -374,1108.5016,"Maxime Amblard (LORIA), Musiol Michel (LABPSYLOR), Rebuschi Manuel - (LHSP)","Une analyse bas\'ee sur la S-DRT pour la mod\'elisation de dialogues - pathologiques",cs.CL cs.AI," In this article, we present a corpus of dialogues between a schizophrenic -speaker and an interlocutor who drives the dialogue. We had identified specific -discontinuities for paranoid schizophrenics. We propose a modeling of these -discontinuities with S-DRT (its pragmatic part) -" -375,1108.5017,"Sai Qian (LORIA), Maxime Amblard (LORIA)",Event in Compositional Dynamic Semantics,cs.CL cs.AI cs.LO," We present a framework which constructs an event-style dis- course semantics. -The discourse dynamics are encoded in continuation semantics and various -rhetorical relations are embedded in the resulting interpretation of the -framework. We assume discourse and sentence are distinct semantic objects, that -play different roles in meaning evalua- tion. Moreover, two sets of composition -functions, for handling different discourse relations, are introduced. The -paper first gives the necessary background and motivation for event and dynamic -semantics, then the framework with detailed examples will be introduced. -" -376,1108.5027,Maxime Amblard (LORIA),"Encoding Phases using Commutativity and Non-commutativity in a Logical - Framework",cs.CL cs.AI cs.LO," This article presents an extension of Minimalist Categorial Gram- mars (MCG) -to encode Chomsky's phases. These grammars are based on Par- tially Commutative -Logic (PCL) and encode properties of Minimalist Grammars (MG) of Stabler. The -first implementation of MCG were using both non- commutative properties (to -respect the linear word order in an utterance) and commutative ones (to model -features of different constituents). Here, we pro- pose to adding Chomsky's -phases with the non-commutative tensor product of the logic. Then we could give -account of the PIC just by using logical prop- erties of the framework. -" -377,1108.5096,Maxime Amblard (LORIA),"Minimalist Grammars and Minimalist Categorial Grammars, definitions - toward inclusion of generated languages",cs.CL," Stabler proposes an implementation of the Chomskyan Minimalist Program, -Chomsky 95 with Minimalist Grammars - MG, Stabler 97. This framework inherits a -long linguistic tradition. But the semantic calculus is more easily added if -one uses the Curry-Howard isomorphism. Minimalist Categorial Grammars - MCG, -based on an extension of the Lambek calculus, the mixed logic, were introduced -to provide a theoretically-motivated syntax-semantics interface, Amblard 07. In -this article, we give full definitions of MG with algebraic tree descriptions -and of MCG, and take the first steps towards giving a proof of inclusion of -their generated languages. -" -378,1108.5192,"Isabel M. Kloumann, Christopher M. Danforth, Kameron Decker Harris, - Catherine A. Bliss, and Peter Sheridan Dodds",Positivity of the English language,physics.soc-ph cs.CL," Over the last million years, human language has emerged and evolved as a -fundamental instrument of social communication and semiotic representation. -People use language in part to convey emotional information, leading to the -central and contingent questions: (1) What is the emotional spectrum of natural -language? and (2) Are natural languages neutrally, positively, or negatively -biased? Here, we report that the human-perceived positivity of over 10,000 of -the most frequently used English words exhibits a clear positive bias. 
More
-deeply, we characterize and quantify distributions of word positivity for four
-large and distinct corpora, demonstrating that their form is broadly invariant
-with respect to frequency of word use.
-"
-379,1108.5520,"Murphy Choy, Michelle L.F. Cheong, Ma Nang Laik, Koo Ping Shung","A sentiment analysis of Singapore Presidential Election 2011 using
 - Twitter data with census correction",stat.AP cs.CL cs.SI," Sentiment analysis is a new area of text analytics that focuses on the
-analysis and understanding of emotions in text patterns. This new form of
-analysis has been widely adopted in customer relation management, especially
-in the context of complaint management. With the increasing level of interest
-in this technology, more and more companies are adopting it and using it to
-champion their marketing efforts. However, sentiment analysis using Twitter
-has remained extremely difficult to manage due to sampling bias. In this
-paper, we discuss the application of reweighting techniques in conjunction
-with online sentiment divisions to predict the vote percentage that each
-candidate will receive. We discuss in depth various aspects of using sentiment
-analysis to predict outcomes, as well as potential pitfalls in the estimation
-due to the anonymous nature of the internet.
-"
-380,1108.5567,"Yuliya Lierler (University of Kentucky) and Peter Sch\""uller
 - (Technische Universit\""at Wien)","Parsing Combinatory Categorial Grammar with Answer Set Programming:
 - Preliminary Report",cs.AI cs.CL," Combinatory categorial grammar (CCG) is a grammar formalism used for natural
-language parsing. CCG assigns structured lexical categories to words and uses a
-small set of combinatory rules to combine these categories to parse a sentence.
-In this work we propose and implement a new approach to CCG parsing that relies
-on a prominent knowledge representation formalism, answer set programming (ASP)
-- a declarative programming paradigm. We formulate the task of CCG parsing as a
-planning problem and use an ASP computational tool to compute solutions that
-correspond to valid parses. Compared to other approaches, there is no need to
-implement a specific parsing algorithm using such a declarative method. Our
-approach aims at producing all semantically distinct parse trees for a given
-sentence. From this goal, normalization and efficiency issues arise, and we
-deal with them by combining and extending existing strategies. We have
-implemented a CCG parsing tool kit - AspCcgTk - that uses ASP as its main
-computational means. The C&C supertagger can be used as a preprocessor within
-AspCcgTk, which allows us to achieve wide-coverage natural language parsing.
-"
-381,1108.5974,"Pawe{\l} Wero\'nski, Julian Sienkiewicz, Georgios Paltoglou, Kevan
 - Buckley, Mike Thelwall and Janusz A. Ho{\l}yst",Emotional Analysis of Blogs and Forums Data,cs.CL physics.data-an physics.soc-ph," We perform a statistical analysis of emotionally annotated comments in two
-large online datasets, examining chains of consecutive posts in the
-discussions. Using comparisons with randomised data we show that there is a
-high level of correlation for the emotional content of messages.
-"
-382,1109.0069,"Shibamouli Lahiri, Xiaofei Lu",Inter-rater Agreement on Sentence Formality,cs.CL," Formality is one of the most important dimensions of writing style variation.
-In this study we conducted an inter-rater reliability experiment for assessing
-sentence formality on a five-point Likert scale, and obtained good agreement
-results as well as different rating distributions for different sentence
-categories. We also performed a difficulty analysis to identify the bottlenecks
-of our rating procedure. Our main objective is to design an automatic scoring
-mechanism for sentence-level formality, and this study is important for that
-purpose.
-"
-383,1109.0624,Marwa Graja and Maher Jaoua and Lamia Hadrich Belguith,Building Ontologies to Understand Spoken Tunisian Dialect,cs.CL," This paper presents a method to understand spoken Tunisian dialect based on
-lexical semantics. This method takes into account the specificity of the
-Tunisian dialect, which has no linguistic processing tools. The method is
-ontology-based, which allows ontological concepts to be exploited for semantic
-annotation and ontological relations for speech interpretation. This
-combination increases the rate of comprehension and limits the dependence on
-linguistic resources. This paper also details the process of building the
-ontology used for annotation and interpretation of Tunisian dialect in the
-context of speech understanding in dialogue systems for restricted domains.
-"
-384,1109.1618,"Son Doan, Bao-Khanh Ho Vo, and Nigel Collier",An analysis of Twitter messages in the 2011 Tohoku Earthquake,cs.SI cs.CL physics.soc-ph," Social media such as Facebook and Twitter have proven to be a useful resource
-to understand public opinion towards real world events. In this paper, we
-investigate over 1.5 million Twitter messages (tweets) for the period 9th March
-2011 to 31st May 2011 in order to track awareness and anxiety levels in the
-Tokyo metropolitan district in response to the 2011 Tohoku Earthquake and
-subsequent tsunami and nuclear emergencies. These three events were tracked
-using both English and Japanese tweets. Preliminary results indicated: 1) close
-correspondence between Twitter data and earthquake events, 2) strong
-correlation between English and Japanese tweets on the same events, 3) tweets
-in the native language play an important role in early warning, 4) tweets
-showed how quickly Japanese people's anxiety returned to normal levels after
-the earthquake event. Several distinctions between English and Japanese tweets
-on earthquake events are also discussed. The results suggest that Twitter data
-can be used as a useful resource for tracking the public mood of populations
-affected by natural disasters as well as an early warning system.
-"
-385,1109.2128,"Gunes Erkan, Dragomir R. Radev","LexRank: Graph-based Lexical Centrality as Salience in Text
 - Summarization",cs.CL," We introduce a stochastic graph-based method for computing relative
-importance of textual units for Natural Language Processing. We test the
-technique on the problem of Text Summarization (TS). Extractive TS relies on
-the concept of sentence salience to identify the most important sentences in a
-document or set of documents. Salience is typically defined in terms of the
-presence of particular important words or in terms of similarity to a centroid
-pseudo-sentence. We consider a new approach, LexRank, for computing sentence
-importance based on the concept of eigenvector centrality in a graph
-representation of sentences. In this model, a connectivity matrix based on
-intra-sentence cosine similarity is used as the adjacency matrix of the graph
-representation of sentences.
Our system, based on LexRank, ranked first
-in more than one task in the recent DUC 2004 evaluation. In this paper we
-present a detailed analysis of our approach and apply it to a larger data set
-including data from earlier DUC evaluations. We discuss several methods to
-compute centrality using the similarity graph. The results show that
-degree-based methods (including LexRank) outperform both centroid-based methods
-and other systems participating in DUC in most of the cases. Furthermore, the
-LexRank with threshold method outperforms the other degree-based techniques
-including continuous LexRank. We also show that our approach is quite
-insensitive to the noise in the data that may result from an imperfect topical
-clustering of documents.
-"
-386,1109.2130,"A. Montoyo, M. Palomar, G. Rigau, A. Suarez",Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods,cs.CL," In this paper we concentrate on the resolution of the lexical ambiguity that
-arises when a given word has several different meanings. This specific task is
-commonly referred to as word sense disambiguation (WSD). The task of WSD
-consists of assigning the correct sense to words using an electronic dictionary
-as the source of word definitions. We present two WSD methods based on two main
-methodological approaches in this research area: a knowledge-based method and a
-corpus-based method. Our hypothesis is that word-sense disambiguation requires
-several knowledge sources in order to solve the semantic ambiguity of the
-words. These sources can be of different kinds---for example, syntagmatic,
-paradigmatic or statistical information. Our approach combines various sources
-of knowledge, through combinations of the two WSD methods mentioned above.
-Mainly, the paper concentrates on how to combine these methods and sources of
-information in order to achieve good results in the disambiguation. Finally,
-this paper presents a comprehensive study and experimental work on evaluation
-of the methods and their combinations.
-"
-387,1109.2136,"P. W. Jordan, M. A. Walker","Learning Content Selection Rules for Generating Object Descriptions in
 - Dialogue",cs.CL," A fundamental requirement of any task-oriented dialogue system is the ability
-to generate object descriptions that refer to objects in the task domain. The
-subproblem of content selection for object descriptions in task-oriented
-dialogue has been the focus of much previous work and a large number of models
-have been proposed. In this paper, we use the annotated COCONUT corpus of
-task-oriented design dialogues to develop feature sets based on Dale and
-Reiter's (1995) incremental model, Brennan and Clark's (1996) conceptual pact
-model, and Jordan's (2000b) intentional influences model, and use these feature
-sets in a machine learning experiment to automatically learn a model of content
-selection for object descriptions. Since Dale and Reiter's model requires a
-representation of discourse structure, the corpus annotations are used to
-derive a representation based on Grosz and Sidner's (1986) theory of the
-intentional structure of discourse, as well as two very simple representations
-of discourse structure based purely on recency. We then apply the
-rule-induction program RIPPER to train and test the content selection component
-of an object description generator on a set of 393 object descriptions from the
-corpus.
To our knowledge, this is the first reported experiment of a trainable
-content selection component for object description generation in dialogue.
-Three separate content selection models, based on the three theoretical
-models, all independently achieve accuracies significantly above the majority
-class baseline (17%) on unseen test data, with the intentional influences model
-(42.4%) performing significantly better than either the incremental model
-(30.4%) or the conceptual pact model (28.9%). But the best performing models
-combine all the feature sets, achieving accuracies near 60%. Surprisingly, a
-simple recency-based representation of discourse structure does as well as one
-based on intentional structure. To our knowledge, this is also the first
-empirical comparison of a representation of Grosz and Sidner's model of
-discourse structure with a simpler model for any generation task.
-"
-388,1109.2657,"Seyed M. Montazeri (University of Gothenburg, Sweden), Nivir K.S. Roy
 - (University of Gothenburg, Sweden), Gerardo Schneider (Chalmers | University
 - of Gothenburg, Sweden)",From Contracts in Structured English to CL Specifications,cs.CL cs.FL cs.LO," In this paper we present a framework to analyze conflicts of contracts
-written in structured English. A contract that has been manually rewritten in
-structured English is automatically translated into a formal language using the
-Grammatical Framework (GF). In particular we use the contract language CL as a
-target formal language for this translation. In our framework, CL
-specifications can then be input into the tool CLAN to detect the presence of
-conflicts (i.e., whether there are contradictory obligations, permissions, and
-prohibitions). We also use GF to get a version in (restricted) English of CL
-formulae. We discuss the implementation of such a framework.
-"
-389,1109.4531,"Janne V. Kujala, Aleksi Keurulainen",A Probabilistic Approach to Pronunciation by Analogy,cs.CL," The relationship between written and spoken words is convoluted in languages
-with a deep orthography such as English and therefore it is difficult to devise
-explicit rules for generating the pronunciations for unseen words.
-Pronunciation by analogy (PbA) is a data-driven method of constructing
-pronunciations for novel words from concatenated segments of known words and
-their pronunciations. PbA performs relatively well with English and outperforms
-several other proposed methods. However, the best published word accuracy of
-65.5% (for the 20,000 word NETtalk corpus) suggests there is much room for
-improvement in it.
- Previous PbA algorithms have used several different scoring strategies such
-as the product of the frequencies of the component pronunciations of the
-segments, or the number of different segmentations that yield the same
-pronunciation, and different combinations of these methods, to evaluate the
-candidate pronunciations. In this article, we instead propose to use a
-probabilistically justified scoring rule. We show that this principled approach
-alone yields better accuracy (66.21% for the NETtalk corpus) than any
-previously published PbA algorithm. Furthermore, combined with certain ad hoc
-modifications motivated by earlier algorithms, the performance climbs up to
-66.6%, and further improvements are possible by combining this method with
-other methods.
-" -390,1109.4906,"Odile Piton (SAMM), Slim Mesfar (RIADI), H\'el\`ene Pignot (SAMM)","Automatic transcription of 17th century English text in Contemporary - English with NooJ: Method and Evaluation",cs.CL," Since 2006 we have undertaken to describe the differences between 17th -century English and contemporary English thanks to NLP software. Studying a -corpus spanning the whole century (tales of English travellers in the Ottoman -Empire in the 17th century, Mary Astell's essay A Serious Proposal to the -Ladies and other literary texts) has enabled us to highlight various lexical, -morphological or grammatical singularities. Thanks to the NooJ linguistic -platform, we created dictionaries indexing the lexical variants and their -transcription in CE. The latter is often the result of the validation of forms -recognized dynamically by morphological graphs. We also built syntactical -graphs aimed at transcribing certain archaic forms in contemporary English. Our -previous research implied a succession of elementary steps alternating textual -analysis and result validation. We managed to provide examples of -transcriptions, but we have not created a global tool for automatic -transcription. Therefore we need to focus on the results we have obtained so -far, study the conditions for creating such a tool, and analyze possible -difficulties. In this paper, we will be discussing the technical and linguistic -aspects we have not yet covered in our previous work. We are using the results -of previous research and proposing a transcription method for words or -sequences identified as archaic. -" -391,1109.5798,Yuriy Ostapov,"Object-oriented semantics of English in natural language understanding - system",cs.CL," A new approach to the problem of natural language understanding is proposed. -The knowledge domain under consideration is the social behavior of people. -English sentences are translated into set of predicates of a semantic database, -which describe persons, occupations, organizations, projects, actions, events, -messages, machines, things, animals, location and time of actions, relations -between objects, thoughts, cause-and-effect relations, abstract objects. There -is a knowledge base containing the description of semantics of objects -(functions and structure), actions (motives and causes), and operations. -" -392,1109.6018,"Chenhao Tan, Lillian Lee, Jie Tang, Long Jiang, Ming Zhou, and Ping Li",User-level sentiment analysis incorporating social networks,cs.CL cs.IR physics.data-an physics.soc-ph," We show that information about social relationships can be used to improve -user-level sentiment analysis. The main motivation behind our approach is that -users that are somehow ""connected"" may be more likely to hold similar opinions; -therefore, relationship information can complement what we can extract about a -user's viewpoints from their utterances. Employing Twitter as a source for our -experimental data, and working within a semi-supervised framework, we propose -models that are induced either from the Twitter follower/followee network or -from the network in Twitter formed by users referring to each other using ""@"" -mentions. Our transductive learning results reveal that incorporating -social-network information can indeed lead to statistically significant -sentiment-classification improvements over the performance of an approach based -on Support Vector Machines having access only to textual features. -" -393,1109.6341,"H. Daume III, D. 
Marcu",Domain Adaptation for Statistical Classifiers,cs.LG cs.CL," The most basic assumption used in statistical learning theory is that -training data and test data are drawn from the same underlying distribution. -Unfortunately, in many applications, the ""in-domain"" test data is drawn from a -distribution that is related, but not identical, to the ""out-of-domain"" -distribution of the training data. We consider the common case in which labeled -out-of-domain data is plentiful, but labeled in-domain data is scarce. We -introduce a statistical formulation of this problem in terms of a simple -mixture model and present an instantiation of this framework to maximum entropy -classifiers and their linear chain counterparts. We present efficient inference -algorithms for this special case based on the technique of conditional -expectation maximization. Our experimental results show that our approach leads -to improved performance on three real world tasks on four different data sets -from the natural language processing domain. -" -394,1110.1391,"K. Choi, H. Isahara, J. Oh",A Comparison of Different Machine Transliteration Models,cs.CL cs.AI," Machine transliteration is a method for automatically converting words in one -language into phonetically equivalent ones in another language. Machine -transliteration plays an important role in natural language applications such -as information retrieval and machine translation, especially for handling -proper nouns and technical terms. Four machine transliteration models -- -grapheme-based transliteration model, phoneme-based transliteration model, -hybrid transliteration model, and correspondence-based transliteration model -- -have been proposed by several researchers. To date, however, there has been -little research on a framework in which multiple transliteration models can -operate simultaneously. Furthermore, there has been no comparison of the four -models within the same framework and using the same data. We addressed these -problems by 1) modeling the four models within the same framework, 2) comparing -them under the same conditions, and 3) developing a way to improve machine -transliteration through this comparison. Our comparison showed that the hybrid -and correspondence-based models were the most effective and that the four -models can be used in a complementary manner to improve machine transliteration -performance. -" -395,1110.1394,"M. Lapata, A. Lascarides",Learning Sentence-internal Temporal Relations,cs.CL cs.AI," In this paper we propose a data intensive approach for inferring -sentence-internal temporal relations. Temporal inference is relevant for -practical NLP applications which either extract or synthesize temporal -information (e.g., summarisation, question answering). Our method bypasses the -need for manual coding by exploiting the presence of markers like after"", which -overtly signal a temporal relation. We first show that models trained on main -and subordinate clauses connected with a temporal marker achieve good -performance on a pseudo-disambiguation task simulating temporal inference -(during testing the temporal marker is treated as unseen and the models must -select the right marker from a set of possible candidates). Secondly, we assess -whether the proposed approach holds promise for the semi-automatic creation of -temporal annotations. 
Specifically, we use a model trained on noisy and
-approximate data (i.e., main and subordinate clauses) to predict
-intra-sentential relations present in TimeBank, a corpus annotated with rich
-temporal information. Our experiments compare and contrast several
-probabilistic models differing in their feature space, linguistic assumptions
-and data requirements. We evaluate performance against gold standard corpora
-and also against human subjects.
-"
-396,1110.1428,Duy Khang Ly and Kazunari Sugiyama and Ziheng Lin and Min-Yen Kan,"Product Review Summarization based on Facet Identification and Sentence
 - Clustering",cs.CL cs.DL," Product reviews have nowadays become an important source of information, not
-only for customers to find opinions about products easily and share their
-reviews with peers, but also for product manufacturers to get feedback on their
-products. As the number of product reviews grows, it becomes difficult for
-users to search and utilize these resources in an efficient way. In this work,
-we build a product review summarization system that can automatically process a
-large collection of reviews and aggregate them to generate a concise summary.
-More importantly, the drawback of existing product summarization systems is
-that they cannot provide the underlying reasons to justify users' opinions. In
-our method, we solve this problem by applying clustering, prior to selecting
-representative candidates for summarization.
-"
-397,1110.1470,Luis Quesada and Fernando Berzal and Francisco J. Cortijo,A Constraint-Satisfaction Parser for Context-Free Grammars,cs.CL," Traditional language processing tools constrain language designers to
-specific kinds of grammars. In contrast, model-based language specification
-decouples language design from language processing. As a consequence,
-model-based language specification tools need general parsers able to parse
-unrestricted context-free grammars. As languages specified following this
-approach may be ambiguous, parsers must deal with ambiguities. Model-based
-language specification also allows the definition of associativity, precedence,
-and custom constraints. Therefore parsers generated by model-driven language
-specification tools need to enforce constraints. In this paper, we propose
-Fence, an efficient bottom-up chart parser with lexical and syntactic ambiguity
-support that allows the specification of constraints and, therefore, enables
-the use of model-based language specification in practice.
-"
-398,1110.1758,"Laurent Romary (IDSL, INRIA Saclay - Ile de France), Andreas Witt
 - (IDS)",Data formats for phonological corpora,cs.CL," The goal of the present chapter is to explore the possibility of providing
-the research (but also the industrial) community that commonly uses spoken
-corpora with a stable portfolio of well-documented standardised formats that
-allow a high re-use rate of annotated spoken resources and, as a consequence,
-better interoperability across tools used to produce or exploit such resources.
-"
-399,1110.2162,"Ruben Sipos, Pannaga Shivaswamy, Thorsten Joachims",Large-Margin Learning of Submodular Summarization Methods,cs.AI cs.CL cs.LG," In this paper, we present a supervised learning approach to training
-submodular scoring functions for extractive multi-document summarization. By
-taking a structured prediction approach, we provide a large-margin method that
-directly optimizes a convex relaxation of the desired performance measure.
The
-learning method applies to all submodular summarization methods, and we
-demonstrate its effectiveness for both pairwise as well as coverage-based
-scoring functions on multiple datasets. Compared to state-of-the-art functions
-that were tuned manually, our method significantly improves performance and
-enables high-fidelity models with numbers of parameters well beyond what could
-reasonably be tuned by hand.
-"
-400,1110.2215,"R. J. Evans, C. Orasan",NP Animacy Identification for Anaphora Resolution,cs.CL," In anaphora resolution for English, animacy identification can play an
-integral role in the application of agreement restrictions between pronouns and
-candidates, and as a result, can improve the accuracy of anaphora resolution
-systems. In this paper, two methods for animacy identification are proposed and
-evaluated using intrinsic and extrinsic measures. The first method is a
-rule-based one which uses information about the unique beginners in WordNet to
-classify NPs on the basis of their animacy. The second method relies on a
-machine learning algorithm which exploits a WordNet enriched with animacy
-information for each sense. The effect of word sense disambiguation on the two
-methods is also assessed. The intrinsic evaluation reveals that the machine
-learning method reaches human levels of performance. The extrinsic evaluation
-demonstrates that animacy identification can be beneficial in anaphora
-resolution, especially in the cases where animate entities are identified with
-high precision.
-"
-401,1110.3088,Nigel Collier,Towards cross-lingual alerting for bursty epidemic events,cs.CL cs.IR cs.SI," Background: Online news reports are increasingly becoming a source for event
-based early warning systems that detect natural disasters. Harnessing the
-massive volume of information available from multilingual newswire presents as
-many challenges as opportunities due to the patterns of reporting complex
-spatiotemporal events. Results: In this article we study the problem of
-utilising correlated event reports across languages. We track the evolution of
-16 disease outbreaks using 5 temporal aberration detection algorithms on
-text-mined events classified according to disease and outbreak country. Using
-ProMED reports as a silver standard, comparative analysis of news data for 13
-languages over a 129 day trial period showed improved sensitivity, F1 and
-timeliness across most models using cross-lingual events. We report a detailed
-case study analysis for Cholera in Angola 2010 which highlights the challenges
-faced in correlating news events with the silver standard. Conclusions: The
-results show that automated health surveillance using multilingual text mining
-has the potential to turn low value news into high value alerts if informed
-choices are used to govern the selection of models and data sources. An
-implementation of the C2 alerting algorithm using multilingual news is
-available at the BioCaster portal http://born.nii.ac.jp/?page=globalroundup.
-"
-402,1110.3089,"Nigel Collier, Nguyen Truong Son, Ngoc Mai Nguyen",OMG U got flu? Analysis of shared health messages for bio-surveillance,cs.CL cs.IR cs.SI," Background: Micro-blogging services such as Twitter offer the potential to
-crowdsource epidemics in real-time. However, Twitter posts ('tweets') are often
-ambiguous and reactive to media trends.
In order to ground user messages in
-epidemic response we focused on tracking reports of self-protective behaviour
-such as avoiding public gatherings or increased sanitation as the basis for
-further risk analysis. Results: We created guidelines for tagging
-self-protective behaviour based on Jones and Salath\'e (2009)'s behaviour
-response survey. Applying the guidelines to a corpus of 5283 Twitter messages
-related to influenza-like illness showed a high level of inter-annotator
-agreement (kappa 0.86). We employed supervised learning using unigrams, bigrams
-and regular expressions as features with two supervised classifiers (SVM and
-Naive Bayes) to classify tweets into 4 self-reported protective behaviour
-categories plus a self-reported diagnosis. In addition to classification
-performance we report moderately strong Spearman's rho correlation by comparing
-classifier output against WHO/NREVSS laboratory data for A(H1N1) in the USA
-during the 2009-2010 influenza season. Conclusions: The study adds to evidence
-supporting a high degree of correlation between pre-diagnostic social media
-signals and diagnostic influenza case data, pointing the way towards low cost
-sensor networks. We believe that the signals we have modelled may be applicable
-to a wide range of diseases.
-"
-403,1110.3091,Nigel Collier,What's unusual in online disease outbreak news?,cs.CL cs.IR cs.SI," Background: Accurate and timely detection of public health events of
-international concern is necessary to help support risk assessment and response
-and save lives. Novel event-based methods that use the World Wide Web as a
-signal source offer potential to extend health surveillance into areas where
-traditional indicator networks are lacking. In this paper we address the issue
-of systematically evaluating online health news to support automatic alerting
-using daily disease-country counts text mined from real world data using
-BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration
-detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance
-against expert moderated ProMED-mail postings. Results: We report sensitivity,
-specificity, positive predictive value (PPV), negative predictive value (NPV),
-mean alerts/100 days and F1, at 95% confidence interval (CI) for 287
-ProMED-mail postings on 18 outbreaks across 14 countries over a 366 day period.
-Results indicate that W2 had the best F1 with a slight benefit for day of week
-effect over C2. In drill down analysis we indicate issues arising from the
-granular choice of country-level modeling, sudden drops in reporting due to day
-of week effects and reporting bias. Automatic alerting has been implemented in
-BioCaster available from http://born.nii.ac.jp. Conclusions: Online health news
-alerts have the potential to enhance manual analytical methods by increasing
-throughput, timeliness and detection rates. Systematic evaluation of health
-news aberrations is necessary to push forward our understanding of the complex
-relationship between news report volumes and case numbers and to select the
-best performing features and algorithms.
-"
-404,1110.3094,"Nigel Collier, Son Doan",Syndromic classification of Twitter messages,cs.CL cs.IR cs.SI," Recent studies have shown a strong correlation between social networking data
-and national influenza rates.
We expanded upon this success to develop an -automated text mining system that classifies Twitter messages in real time into -six syndromic categories based on key terms from a public health ontology. -10-fold cross validation tests were used to compare Naive Bayes (NB) and -Support Vector Machine (SVM) models on a corpus of 7431 Twitter messages. SVM -performed better than NB on 4 out of 6 syndromes. The best performing -classifiers showed moderately strong F1 scores: respiratory = 86.2 (NB); -gastrointestinal = 85.4 (SVM polynomial kernel degree 2); neurological = 88.6 -(SVM polynomial kernel degree 1); rash = 86.0 (SVM polynomial kernel degree 1); -constitutional = 89.3 (SVM polynomial kernel degree 1); hemorrhagic = 89.9 -(NB). The resulting classifiers were deployed together with an EARS C2 -aberration detection algorithm in an experimental online system. -" -405,1110.4123,"David Garcia, Antonios Garas, and Frank Schweitzer",Positive words carry less information than negative words,cs.CL cs.IR physics.soc-ph," We show that the frequency of word use is not only determined by the word -length \cite{Zipf1935} and the average information content -\cite{Piantadosi2011}, but also by its emotional content. We have analyzed -three established lexica of affective word usage in English, German, and -Spanish, to verify that these lexica have a neutral, unbiased, emotional -content. Taking into account the frequency of word usage, we find that words -with a positive emotional content are more frequently used. This lends support -to the Pollyanna hypothesis \cite{Boucher1969} that there should be a positive bias -in human expression. We also find that negative words contain more information -than positive words, as the informativeness of a word increases uniformly with -its valence decrease. Our findings support earlier conjectures about (i) the -relation between word frequency and information content, and (ii) the impact of -positive emotions on communication and social links. -" -406,1110.4248,Luojie Xiang,Ideogram Based Chinese Sentiment Word Orientation Computation,cs.CL," This paper presents a novel algorithm to compute the sentiment orientation of -a Chinese sentiment word. The algorithm uses ideograms, which are a distinguishing -feature of the Chinese language. The proposed algorithm can be applied to any -sentiment classification scheme. To compute a word's sentiment orientation -using the proposed algorithm, only the word itself and a precomputed character -ontology are required, rather than a corpus. The influence of three parameters -over the algorithm performance is analyzed and verified by experiment. -Experiments also show that the proposed algorithm achieves an F-measure of 85.02%, -outperforming an existing ideogram-based algorithm. -" -407,1110.6200,"Jacob Eisenstein, Duen Horng ""Polo"" Chau, Aniket Kittur, Eric P. Xing",TopicViz: Semantic Navigation of Document Collections,cs.HC cs.AI cs.CL," When people explore and manage information, they think in terms of topics and -themes. However, the software that supports information exploration sees text -at only the surface level. In this paper we show how topic modeling -- a -technique for identifying latent themes across large collections of documents --- can support semantic exploration. We present TopicViz, an interactive -environment for information exploration.
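The document-topic links that TopicViz visualizes can be approximated with any off-the-shelf topic model. A minimal sketch, assuming scikit-learn is available and using a hypothetical toy corpus (not the paper's data or code):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["flu fever cough outbreak", "fever cough clinic outbreak",
            "parser grammar syntax tree", "grammar syntax corpus tree"]
    vec = CountVectorizer()
    X = vec.fit_transform(docs)

    # Fit a two-topic model; theta holds the document-to-theme weights
    # that a force-directed layout could use as link strengths.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    theta = lda.fit_transform(X)

    terms = vec.get_feature_names_out()
    for k, comp in enumerate(lda.components_):
        top = [terms[i] for i in comp.argsort()[::-1][:3]]
        print(f"theme {k}: {top}")
    print(theta.round(2))  # rows: documents; columns: latent themes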
TopicViz combines traditional search -and citation-graph functionality with a range of novel interactive -visualizations, centered around a force-directed layout that links documents to -the latent themes discovered by the topic model. We describe several use -scenarios in which TopicViz supports rapid sensemaking on large document -collections. -" -408,1111.0048,"F. Mairesse, R. Prasad, A. Stent, M. A. Walker",Individual and Domain Adaptation in Sentence Planning for Dialogue,cs.CL," One of the biggest challenges in the development and deployment of spoken -dialogue systems is the design of the spoken language generation module. This -challenge arises from the need for the generator to adapt to many features of -the dialogue domain, user population, and dialogue context. A promising -approach is trainable generation, which uses general-purpose linguistic -knowledge that is automatically adapted to the features of interest, such as -the application domain, individual user, or user group. In this paper we -present and evaluate a trainable sentence planner for providing restaurant -information in the MATCH dialogue system. We show that trainable sentence -planning can produce complex information presentations whose quality is -comparable to the output of a template-based generator tuned to this domain. We -also show that our method easily supports adapting the sentence planner to -individuals, and that the individualized sentence planners generally perform -better than models trained and tested on a population of individuals. Previous -work has documented and utilized individual preferences for content selection, -but to our knowledge, these results provide the first demonstration of -individual preferences for sentence planning operations, affecting the content -order, discourse structure and sentence structure of system responses. Finally, -we evaluate the contribution of different feature sets, and show that, in our -application, n-gram features often do as well as features based on higher-level -linguistic representations. -" -409,1111.1648,Archana Shukla,Sentiment Analysis of Document Based on Annotation,cs.IR cs.CL," I present a tool which assesses the quality or usefulness of a document based -on its annotations. Annotations may include comments, notes, observations, -highlights, underlining, explanations, questions, help requests, etc. Comments are used for -evaluative purposes, while the others are used for summarization or for expansion. -Further, these comments may themselves be made on another annotation; such annotations are -referred to as meta-annotations. Not all annotations get equal weightage. My -tool considers highlights and underlining as well as comments to infer the -collective sentiment of annotators. Collective sentiments of annotators are -classified as positive, negative, or objective. My tool computes the collective -sentiment of annotations in two ways: it counts all the annotations present -on the document, and it also computes sentiment scores of all annotations, -including comments, to obtain the collective sentiment about the document -and to judge its quality. I demonstrate the use of the tool on a research -paper. -" -410,1111.1673,Daoud Clarke,Algebras over a field and semantics for context based reasoning,cs.CL cs.LO," This paper introduces context algebras and demonstrates their application to -combining logical and vector-based representations of meaning. Other approaches -to this problem attempt to reproduce aspects of logical semantics within new -frameworks.
The approach we present here is different: We show how logical -semantics can be embedded within a vector space framework, and use this to -combine distributional semantics, in which the meanings of words are -represented as vectors, with logical semantics, in which the meaning of a -sentence is represented as a logical form. -" -411,1111.2399,Kishorjit Nongmeikapam and Sivaji Bandyopadhyay,"Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri - Multiword Expression (MWE) Identification",cs.CL cs.NE," This paper deals with the identification of Multiword Expressions (MWEs) in -Manipuri, a highly agglutinative Indian Language. Manipuri is listed in the -Eight Schedule of Indian Constitution. MWE plays an important role in the -applications of Natural Language Processing(NLP) like Machine Translation, Part -of Speech tagging, Information Retrieval, Question Answering etc. Feature -selection is an important factor in the recognition of Manipuri MWEs using -Conditional Random Field (CRF). The disadvantage of manual selection and -choosing of the appropriate features for running CRF motivates us to think of -Genetic Algorithm (GA). Using GA we are able to find the optimal features to -run the CRF. We have tried with fifty generations in feature selection along -with three fold cross validation as fitness function. This model demonstrated -the Recall (R) of 64.08%, Precision (P) of 86.84% and F-measure (F) of 73.74%, -showing an improvement over the CRF based Manipuri MWE identification without -GA application. -" -412,1111.3122,"Iris Eshkol (LLL), D. Maurel (LI), Nathalie Friburger (LI)",ESLO: from transcription to speakers' personal information annotation,cs.CL," This paper presents the preliminary works to put online a French oral corpus -and its transcription. This corpus is the Socio-Linguistic Survey in Orleans, -realized in 1968. First, we numerized the corpus, then we handwritten -transcribed it with the Transcriber software adding different tags about -speakers, time, noise, etc. Each document (audio file and XML file of the -transcription) was described by a set of metadata stored in an XML format to -allow an easy consultation. Second, we added different levels of annotations, -recognition of named entities and annotation of personal information about -speakers. This two annotation tasks used the CasSys system of transducer -cascades. We used and modified a first cascade to recognize named entities. -Then we built a second cascade to annote the designating entities, i.e. -information about the speaker. These second cascade parsed the named entity -annotated corpus. The objective is to locate information about the speaker and, -also, what kind of information can designate him/her. These two cascades was -evaluated with precision and recall measures. -" -413,1111.3152,"Elsa Tolone (LIGM, FaMAF), \'Eric De La Clergerie (INRIA - Rocquencourt), Sagot Benoit (INRIA Rocquencourt)","\'Evaluation de lexiques syntaxiques par leur int\'egartion dans - l'analyseur syntaxiques FRMG",cs.CL," In this paper, we evaluate various French lexica with the parser FRMG: the -Lefff, LGLex, the lexicon built from the tables of the French Lexicon-Grammar, -the lexicon DICOVALENCE and a new version of the verbal entries of the Lefff, -obtained by merging with DICOVALENCE and partial manual validation. For this, -all these lexica have been converted to the format of the Lefff, Alexina -format. The evaluation was made on the part of the EASy corpus used in the -first evaluation campaign Passage. 
-" -414,1111.3153,"Kyriaki Ioannidou (LTTL), Elsa Tolone (LIGM, FaMAF)","Construction du lexique LGLex \`a partir des tables du Lexique-Grammaire - des verbes du grec moderne",cs.CL," In this paper, we summerize the work done on the resources of Modern Greek on -the Lexicon-Grammar of verbs. We detail the definitional features of each -table, and all changes made to the names of features to make them consistent. -Through the development of the table of classes, including all the features, we -have considered the conversion of tables in a syntactic lexicon: LGLex. The -lexicon, in plain text format or XML, is generated by the LGExtract tool -(Constant & Tolone, 2010). This format is directly usable in applications of -Natural Language Processing (NLP). -" -415,1111.3462,"Elsa Tolone (LIGM, FaMAF), Voyatzi Stavroula (LIGM)",Extending the adverbial coverage of a NLP oriented resource for French,cs.CL," This paper presents a work on extending the adverbial entries of LGLex: a NLP -oriented syntactic resource for French. Adverbs were extracted from the -Lexicon-Grammar tables of both simple adverbs ending in -ment '-ly' (Molinier -and Levrier, 2000) and compound adverbs (Gross, 1986; 1990). This work relies -on the exploitation of fine-grained linguistic information provided in existing -resources. Various features are encoded in both LG tables and they haven't been -exploited yet. They describe the relations of deleting, permuting, intensifying -and paraphrasing that associate, on the one hand, the simple and compound -adverbs and, on the other hand, different types of compound adverbs. The -resulting syntactic resource is manually evaluated and freely available under -the LGPL-LR license. -" -416,1111.4316,"Valeria Fionda, Claudio Gutierrez, Giuseppe Pirr\'o","Semantic Navigation on the Web of Data: Specification of Routes, Web - Fragments and Actions",cs.NI cs.CL," The massive semantic data sources linked in the Web of Data give new meaning -to old features like navigation; introduce new challenges like semantic -specification of Web fragments; and make it possible to specify actions relying -on semantic data. In this paper we introduce a declarative language to face -these challenges. Based on navigational features, it is designed to specify -fragments of the Web of Data and actions to be performed based on these data. -We implement it in a centralized fashion, and show its power and performance. -Finally, we explore the same ideas in a distributed setting, showing their -feasibility, potentialities and challenges. -" -417,1111.4343,Yuriy Ostapov,"Question Answering in a Natural Language Understanding System Based on - Object-Oriented Semantics",cs.CL," Algorithms of question answering in a computer system oriented on input and -logical processing of text information are presented. A knowledge domain under -consideration is social behavior of a person. A database of the system includes -an internal representation of natural language sentences and supplemental -information. The answer {\it Yes} or {\it No} is formed for a general question. -A special question containing an interrogative word or group of interrogative -words permits to find a subject, object, place, time, cause, purpose and way of -action or event. Answer generation is based on identification algorithms of -persons, organizations, machines, things, places, and times. 
The proposed -algorithms of question answering can be realized in information systems closely -connected with text processing (criminology, operation of business, medicine, -document systems). -" -418,1111.5293,Sanjay K. Dwivedi and Pramod P. Sukhadeve,Rule based Part of speech Tagger for Homoeopathy Clinical realm,cs.CL," A tagger is a mandatory segment of most text scrutiny systems, as it -assigns a syntax class (e.g., noun, verb, adjective, or adverb) to every -word in a sentence. In this paper, we present a simple part-of-speech tagger -for homoeopathy clinical language. It exploits standard -patterns for evaluating sentences; an untagged clinical corpus of 20085 words is -used, from which we selected 125 sentences (2322 tokens). The problem of -tagging in natural language processing is to find a way to tag every word in a -text as a particular part of speech. The basic idea is to apply a set of rules -to clinical sentences and to each word. Accuracy is the leading factor in -evaluating any POS tagger, so the accuracy of the proposed tagger is also discussed. -" -419,1111.6553,"Jan P\""oschko",Exploring Twitter Hashtags,cs.CL," Twitter messages often contain so-called hashtags to denote keywords related -to them. Using a dataset of 29 million messages, I explore relations among -these hashtags with respect to co-occurrences. Furthermore, I present an -attempt to classify hashtags into five intuitive classes, using a -machine-learning approach. The overall outcome is an interactive Web -application to explore Twitter hashtags. -" -420,1111.7190,Micha{\l} B. Paradowski,Developing Embodied Multisensory Dialogue Agents,cs.AI cs.CL," A few decades of work in the AI field have focused efforts on developing a -new generation of systems which can acquire knowledge via interaction with the -world. Yet, until very recently, most such attempts were underpinned by -research which predominantly regarded linguistic phenomena as separated from -the brain and body. This could lead one into believing that to emulate -linguistic behaviour, it suffices to develop 'software' operating on abstract -representations that will work on any computational machine. This picture is -inaccurate for several reasons, which are elucidated in this paper and extend -beyond sensorimotor and semantic resonance. Beginning with a review of -research, I list several heterogeneous arguments against disembodied language, -in an attempt to draw conclusions for developing embodied multisensory agents -which communicate verbally and non-verbally with their environment. Without -taking into account both the architecture of the human brain, and embodiment, -it is unrealistic to replicate accurately the processes which take place during -language acquisition, comprehension, production, or during non-linguistic -actions. While robots are far from isomorphic with humans, they could benefit -from strengthened associative connections in the optimization of their -processes and their reactivity and sensitivity to environmental stimuli, and in -situated human-machine interaction. The concept of multisensory integration -should be extended to cover linguistic input and the complementary information -combined from temporally coincident sensory impressions.
-" -421,1112.0168,"Achraf Othman, and Mohamed Jemni","Statistical Sign Language Machine Translation: from English written text - to American Sign Language Gloss",cs.CL," This works aims to design a statistical machine translation from English text -to American Sign Language (ASL). The system is based on Moses tool with some -modifications and the results are synthesized through a 3D avatar for -interpretation. First, we translate the input text to gloss, a written form of -ASL. Second, we pass the output to the WebSign Plug-in to play the sign. -Contributions of this work are the use of a new couple of language English/ASL -and an improvement of statistical machine translation based on string matching -thanks to Jaro-distance. -" -422,1112.0396,"Win Win Thant, Tin Myat Htwe and Ni Lar Thein","Grammatical Relations of Myanmar Sentences Augmented by - Transformation-Based Learning of Function Tagging",cs.CL," In this paper we describe function tagging using Transformation Based -Learning (TBL) for Myanmar that is a method of extensions to the previous -statistics-based function tagger. Contextual and lexical rules (developed using -TBL) were critical in achieving good results. First, we describe a method for -expressing lexical relations in function tagging that statistical function -tagging are currently unable to express. Function tagging is the preprocessing -step to show grammatical relations of the sentences. Then we use the context -free grammar technique to clarify the grammatical relations in Myanmar -sentences or to output the parse trees. The grammatical relations are the -functional structure of a language. They rely very much on the function tag of -the tokens. We augment the grammatical relations of Myanmar sentences with -transformation-based learning of function tagging. -" -423,1112.2468,Tao Chen and Min-Yen Kan,"Creating a Live, Public Short Message Service Corpus: The NUS SMS Corpus",cs.CL," Short Message Service (SMS) messages are largely sent directly from one -person to another from their mobile phones. They represent a means of personal -communication that is an important communicative artifact in our current -digital era. As most existing studies have used private access to SMS corpora, -comparative studies using the same raw SMS data has not been possible up to -now. We describe our efforts to collect a public SMS corpus to address this -problem. We use a battery of methodologies to collect the corpus, paying -particular attention to privacy issues to address contributors' concerns. Our -live project collects new SMS message submissions, checks their quality and -adds the valid messages, releasing the resultant corpus as XML and as SQL -dumps, along with corpus statistics, every month. We opportunistically collect -as much metadata about the messages and their sender as possible, so as to -enable different types of analyses. To date, we have collected about 60,000 -messages, focusing on English and Mandarin Chinese. -" -424,1112.3670,"Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang, Jon Kleinberg","Echoes of power: Language effects and power differences in social - interaction",cs.SI cs.CL physics.soc-ph," Understanding social interaction within groups is key to analyzing online -communities. Most current work focuses on structural properties: who talks to -whom, and how such interactions form larger network structures. 
The -interactions themselves, however, generally take place in the form of natural -language --- either spoken or written --- and one could reasonably suppose that -signals manifested in language might also provide information about roles, -status, and other aspects of the group's dynamics. To date, however, finding -such domain-independent language-based signals has been a challenge. - Here, we show that in group discussions power differentials between -participants are subtly revealed by how much one individual immediately echoes -the linguistic style of the person they are responding to. Starting from this -observation, we propose an analysis framework based on linguistic coordination -that can be used to shed light on power relationships and that works -consistently across multiple types of power --- including a more ""static"" form -of power based on status differences, and a more ""situational"" form of power in -which one individual experiences a type of dependence on another. Using this -framework, we study how conversational behavior can reveal power relationships -in two very different settings: discussions among Wikipedians and arguments -before the U.S. Supreme Court. -" -425,1112.5947,"Sergiu Ivanov, Sergey Verlan",Random Context and Semi-Conditional Insertion-Deletion Systems,cs.FL cs.CC cs.CL cs.DM," In this article we introduce the operations of insertion and deletion working -in a random-context and semi-conditional manner. We show that the conditional -use of rules strictly increases the computational power. In the case of -semi-conditional insertion-deletion systems, context-free insertion and deletion -rules of one symbol are sufficient to achieve computational completeness. In -the random context case our results expose an asymmetry between the -computational power of insertion and deletion rules: systems of size $(2,0,0; -1,1,0)$ are computationally complete, while systems of size $(1,1,0;2,0,0)$ -(and more generally of size $(1,1,0;p,1,1)$) are not. This is particularly -interesting because other control mechanisms like graph-control or matrix -control used together with insertion-deletion systems do not present such -asymmetry. -" -426,1112.6045,"Diego R. Amancio, Eduardo G. Altmann, Osvaldo N. Oliveira Jr., Luciano - da F. Costa","Comparing intermittency and network measurements of words and their - dependency on authorship",physics.soc-ph cs.CL cs.SI physics.data-an," Many features from texts and languages can now be inferred from statistical -analyses using concepts from complex networks and dynamical systems. In this -paper we quantify how topological properties of word co-occurrence networks and -intermittency (or burstiness) in word distribution depend on the style of -authors. Our database contains 40 books from 8 authors who lived in the 19th -and 20th centuries, for which the following network measurements were obtained: -clustering coefficient, average shortest path lengths, and betweenness. We -found that the two factors with stronger dependency on the authors were the -skewness in the distribution of word intermittency and the average shortest -paths. Other factors such as the betweenness and the Zipf's law exponent show -only weak dependency on authorship. Also assessed was the contribution from -each measurement to authorship recognition using three machine learning -methods. The best performance was a ca. 65% accuracy upon combining complex -network and intermittency features with the nearest neighbor algorithm.
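Record 426's authorship experiment reduces each book to a feature vector (network measurements plus an intermittency statistic) and classifies it with a nearest neighbor method. A minimal sketch assuming scikit-learn, with made-up feature values rather than the paper's measurements:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # One row per book: [clustering coeff., avg. shortest path,
    # betweenness, skewness of word intermittency] -- toy values.
    X = np.array([[0.31, 2.9, 0.012, 1.8],
                  [0.29, 3.1, 0.015, 1.7],
                  [0.44, 2.4, 0.020, 0.9],
                  [0.46, 2.3, 0.022, 1.0]])
    y = np.array(["author_A", "author_A", "author_B", "author_B"])

    clf = KNeighborsClassifier(n_neighbors=1)
    print(cross_val_score(clf, X, y, cv=2).mean())  # accuracy estimate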
From a -detailed analysis of the interdependence of the various metrics it is concluded -that the methods used here are complementary for providing short- and -long-scale perspectives of texts, which are useful for applications such as -identification of topical words and information retrieval. -" -427,1112.6286,Esther Vlieger and Loet Leydesdorff,"Visualization and Analysis of Frames in Collections of Messages: Content - Analysis and the Measurement of Meaning",cs.CL," A step-by-step introduction is provided on how to generate a semantic map -from a collection of messages (full texts, paragraphs or statements) using -freely available software and/or SPSS for the relevant statistics and the -visualization. The techniques are discussed in the various theoretical contexts -of (i) linguistics (e.g., Latent Semantic Analysis), (ii) sociocybernetics and -social systems theory (e.g., the communication of meaning), and (iii) -communication studies (e.g., framing and agenda-setting). We distinguish -between the communication of information in the network space (social network -analysis) and the communication of meaning in the vector space. The vector -space can be considered an architecture generated by the network of -relations in the network space; words are then not only related, but also -positioned. These positions are expected rather than observed and therefore one -can communicate meaning. Knowledge can be generated when these meanings can -recursively be communicated and therefore also further codified. -" -428,1112.6384,Michael Moortgat and Richard Moot,Proof nets for the Lambek-Grishin calculus,cs.CL," Grishin's generalization of Lambek's Syntactic Calculus combines a -non-commutative multiplicative conjunction and its residuals (product, left and -right division) with a dual family: multiplicative disjunction, right and left -difference. Interaction between these two families takes the form of linear -distributivity principles. We study proof nets for the Lambek-Grishin calculus -and the correspondence between these nets and unfocused and focused versions of -its sequent calculus. -" -429,1201.1192,Oleg Bisikalo and Irina Kravchuk,"Formalization of semantic network of image constructions in electronic - content",cs.CL," A formal theory based on a binary operator of directional associative -relation is constructed in the article and the notion of an associative -normal form of image constructions is introduced. A model of a commutative -semigroup, which provides a presentation of a sentence as three components of -an interrogative linguistic image construction, is considered. -" -430,1201.1652,"Sylvie Gibet (VALORIA, IRISA), Pierre-Fran\c{c}ois Marteau (VALORIA, - IRISA), Kyle Duarte (VALORIA)",Toward a Motor Theory of Sign Language Perception,cs.CL cs.HC," Research on signed languages still strongly dissociates linguistic issues -related to phonological and phonetic aspects from gesture studies for -recognition and synthesis purposes. This paper focuses on the imbrication of -motion and meaning for the analysis, synthesis and evaluation of sign language -gestures. We discuss the relevance and interest of a motor theory of perception -in sign language communication. According to this theory, we consider that -linguistic knowledge is mapped on sensory-motor processes, and propose a -methodology based on the principle of a synthesis-by-analysis approach, guided -by an evaluation process that aims to validate some hypotheses and concepts of -this theory.
Examples from existing studies illustrate the different concepts -and provide avenues for future work. -" -431,1201.2010,"K. M. Azharul Hasan, Al-Mahmud, Amit Mondal, Amit Saha",Recognizing Bangla Grammar using Predictive Parser,cs.CL," We describe a Context Free Grammar (CFG) for the Bangla language and hence we -propose a Bangla parser based on the grammar. Our approach is general enough -to apply to Bangla sentences, and the method is well accepted for parsing a -language given a grammar. The proposed parser is a predictive parser and we -construct the parse table for recognizing Bangla grammar. Using the parse table -we recognize syntactical mistakes of Bangla sentences when there is no entry -for a terminal in the parse table. If a natural language can be successfully -parsed then grammar checking for this language becomes possible. The proposed -scheme is based on the top-down parsing method, and we have avoided left -recursion in the CFG using the idea of left factoring. -" -432,1201.2073,"Mehwish Aziz, Muhammad Rafi",Pbm: A new dataset for blog mining,cs.AI cs.CL cs.IR," Text mining is becoming vital as Web 2.0 offers collaborative content -creation and sharing. Researchers now have a growing interest in text mining -methods for discovering knowledge. Text mining researchers come from a variety of -areas, like Natural Language Processing, Computational Linguistics, Machine -Learning, and Statistics. A typical text mining application involves -preprocessing of text, stemming and lemmatization, tagging and annotation, -deriving knowledge patterns, evaluating and interpreting the results. There are -numerous approaches for performing text mining tasks, like clustering, -categorization, sentiment analysis, and summarization. There is a growing -need to standardize the evaluation of these tasks. One major component of -establishing standardization is to provide standard datasets for these tasks. -Although various standard datasets are available for traditional text -mining tasks, datasets for the blog-mining task are few and expensive. -Blogs, a new genre in Web 2.0, are the digital diaries of web users, with -chronological entries containing a lot of useful knowledge, and thus offer many -challenges and opportunities for text mining. In this paper, we report a new -indigenous dataset for the Pakistani political blogosphere. The paper describes the -process of data collection, organization, and standardization. We have used -this dataset for carrying out various text mining tasks for the blogosphere, like -blog search, political sentiment analysis and tracking, identification of -influential bloggers, and clustering of blog posts. We wish to offer this -dataset free to others who aspire to pursue research in this domain. -" -433,1201.2240,Kamal Sarkar,Bengali text summarization by sentence extraction,cs.IR cs.CL," Text summarization is a process to produce an abstract or a summary by -selecting a significant portion of the information from one or more texts. In an -automatic text summarization process, a text is given to the computer and the -computer returns a shorter, less redundant extract or abstract of the original -text(s). Many techniques have been developed for summarizing English text(s). -But very few attempts have been made for Bengali text summarization. This -paper presents a method for Bengali text summarization which extracts important -sentences from a Bengali document to produce a summary.
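Record 433 describes sentence extraction for summarization. A minimal, language-agnostic sketch of the idea, scoring sentences by the term frequencies of their words (the paper's Bengali-specific features are not reproduced here):

    import re
    from collections import Counter

    def extract_summary(text, n=2):
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        tf = Counter(re.findall(r"\w+", text.lower()))
        def score(s):
            tokens = re.findall(r"\w+", s.lower())
            return sum(tf[t] for t in tokens) / (len(tokens) or 1)
        top = sorted(sentences, key=score, reverse=True)[:n]
        # Keep the selected sentences in their original order.
        return " ".join(s for s in sentences if s in top)

    doc = ("Dhaka is the capital. The city hosts many festivals. "
           "Festivals in the city draw large crowds every year.")
    print(extract_summary(doc, n=1))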
-" -434,1201.2719,Fionn Murtagh,"Ultrametric Model of Mind, II: Application to Text Content Analysis",cs.AI cs.CL," In a companion paper, Murtagh (2012), we discussed how Matte Blanco's work -linked the unrepressed unconscious (in the human) to symmetric logic and -thought processes. We showed how ultrametric topology provides a most useful -representational and computational framework for this. Now we look at the -extent to which we can find ultrametricity in text. We use coherent and -meaningful collections of nearly 1000 texts to show how we can measure inherent -ultrametricity. On the basis of our findings we hypothesize that inherent -ultrametricty is a basis for further exploring unconscious thought processes. -" -435,1201.4733,"Michael Zock (LIF), Guy Lapalme (DIRO)",Du TAL au TIL,cs.CL cs.HC," Historically two types of NLP have been investigated: fully automated -processing of language by machines (NLP) and autonomous processing of natural -language by people, i.e. the human brain (psycholinguistics). We believe that -there is room and need for another kind, INLP: interactive natural language -processing. This intermediate approach starts from peoples' needs, trying to -bridge the gap between their actual knowledge and a given goal. Given the fact -that peoples' knowledge is variable and often incomplete, the aim is to build -bridges linking a given knowledge state to a given goal. We present some -examples, trying to show that this goal is worth pursuing, achievable and at a -reasonable cost. -" -436,1201.5477,"Julian Sienkiewicz, Marcin Skowron, Georgios Paltoglou, and Janusz A. - Holyst",Entropy-growth-based model of emotionally charged online dialogues,physics.soc-ph cs.CL cs.SI physics.data-an," We analyze emotionally annotated massive data from IRC (Internet Relay Chat) -and model the dialogues between its participants by assuming that the driving -force for the discussion is the entropy growth of emotional probability -distribution. This process is claimed to be correlated to the emergence of the -power-law distribution of the discussion lengths observed in the dialogues. We -perform numerical simulations based on the noticed phenomenon obtaining a good -agreement with the real data. Finally, we propose a method to artificially -prolong the duration of the discussion that relies on the entropy of emotional -probability distribution. -" -437,1201.5484,"Piotr Pohorecki, Julian Sienkiewicz, Marija Mitrovic, Georgios - Paltoglou, and Janusz A. Holyst",Statistical analysis of emotions and opinions at Digg website,physics.soc-ph cs.CL cs.SI physics.data-an," We performed statistical analysis on data from the Digg.com website, which -enables its users to express their opinion on news stories by taking part in -forum-like discussions as well as directly evaluate previous posts and stories -by assigning so called ""diggs"". Owing to fact that the content of each post has -been annotated with its emotional value, apart from the strictly structural -properties, the study also includes an analysis of the average emotional -response of the posts commenting the main story. While analysing correlations -at the story level, an interesting relationship between the number of diggs and -the number of comments received by a story was found. The correlation between -the two quantities is high for data where small threads dominate and -consistently decreases for longer threads. 
However, while the correlation of -the number of diggs and the average emotional response tends to grow for longer -threads, correlations between numbers of comments and the average emotional -response are almost zero. We also show that the initial set of comments given -to a story has a substantial impact on the further ""life"" of the discussion: -high negative average emotions in the first 10 comments lead to longer threads -while the opposite situation results in shorter discussions. We also suggest -the presence of two different mechanisms governing the evolution of the discussion -and, consequently, its length. -" -438,1201.6224,Yannis Haralambous and Vitaly Klyuev,Wikipedia Arborification and Stratified Explicit Semantic Analysis,cs.CL," [This is the translation of paper ""Arborification de Wikip\'edia et analyse -s\'emantique explicite stratifi\'ee"" submitted to TALN 2012.] - We present an extension of the Explicit Semantic Analysis method by -Gabrilovich and Markovitch. Using their semantic relatedness measure, we weight -the Wikipedia categories graph. Then, we extract a minimal spanning tree, using -Chu-Liu & Edmonds' algorithm. We define a notion of stratified tfidf where the -strata, for a given Wikipedia page and a given term, are the classical tfidf -and categorical tfidfs of the term in the ancestor categories of the page -(ancestors in the sense of the minimal spanning tree). Our method is based on -this stratified tfidf, which adds extra weight to terms that ""survive"" when -climbing up the category tree. We evaluate our method by a text classification -on the WikiNews corpus: it increases precision by 18%. Finally, we provide -hints for future research. -" -439,1202.0116,Yuriy Ostapov,"Inference and Plausible Reasoning in a Natural Language Understanding - System Based on Object-Oriented Semantics",cs.CL," Algorithms of inference in a computer system oriented to input and semantic -processing of text information are presented. Such inference is necessary for -logical questions when the direct comparison of objects from a question and -database cannot give a result. The following classes of problems are -considered: a check of hypotheses for persons and non-typical actions, the -determination of persons and circumstances for non-typical actions, planning -actions, the determination of event cause and state of persons. To form an -answer both deduction and plausible reasoning are used. As the knowledge domain -under consideration is the social behavior of persons, plausible reasoning is based -on laws of social psychology. The proposed algorithms of inference and plausible -reasoning can be realized in computer systems closely connected with text -processing (criminology, operation of business, medicine, document systems). -" -440,1202.0617,"Nitin, Ankush Bansal, Siddhartha Mahadev Sharma, Kapil Kumar, Anuj - Aggarwal, Sheenu Goyal, Kanika Choudhary, Kunal Chawla, Kunal Jain and Manav - Bhasin",Classification of Flames in Computer Mediated Communications,cs.SI cs.CL," Computer Mediated Communication (CMC) has brought about a revolution in the -way people communicate with each other. The increasing number of -people interacting through the internet, and the rise of new platforms and -technologies, have brought together people from different social, cultural -and geographical backgrounds to present their thoughts, ideas and opinions on -topics of their interest. CMC has, in some cases, given users more freedom to -express themselves compared to face-to-face communication.
This has also led -to a rise in the uninhibited use of hostile and aggressive language and -terminology. Since such use of language is detrimental to the discussion -process and affects the audience and individuals negatively, efforts are being -taken to control it. This research sees the need to understand the concept of -flaming and hence attempts to classify flames in order to give a better -understanding of the phenomenon. The classification is done on the basis of the type of flame -content being presented and the style in which it is presented. -" -441,1202.1054,Alex Rudnick,Considering a resource-light approach to learning verb valencies,cs.CL," Here we describe work on learning the subcategories of verbs in a -morphologically rich language using only minimal linguistic resources. Our goal -is to learn verb subcategorizations for Quechua, an under-resourced -morphologically rich language, from an unannotated corpus. We compare results -from applying this approach to an unannotated Arabic corpus with those achieved -by processing the same text in treebank form. The original plan was to use only -a morphological analyzer and an unannotated corpus, but experiments suggest -that this approach by itself will not be effective for learning the -combinatorial potential of Arabic verbs in general. The lower bound on -resources for acquiring this information is somewhat higher, apparently -requiring a part-of-speech tagger and chunker for most languages, and a -morphological disambiguator for Arabic. -" -442,1202.1568,"Seungyeon Kim, Fuxin Li, Guy Lebanon, and Irfan Essa",Beyond Sentiment: The Manifold of Human Emotions,cs.CL," Sentiment analysis predicts the presence of positive or negative emotions in -a text document. In this paper we consider higher dimensional extensions of the -sentiment concept, which represent a richer set of human emotions. Our approach -goes beyond previous work in that our model contains a continuous manifold -rather than a finite set of human emotions. We investigate the resulting model, -compare it to psychological observations, and explore its predictive -capabilities. Besides obtaining significant improvements over a baseline -without manifold, we are also able to visualize different notions of positive -sentiment in different domains. -" -443,1202.2518,Wang Liang,Segmenting DNA sequence into `words',q-bio.GN cs.CL," This paper presents a novel method to segment/decode DNA sequences based on -an n-gram statistical language model. Firstly, we find the length of most DNA -'words' is 12 to 15 bps by analyzing the genomes of 12 model species. Then we -design an unsupervised probability-based approach to segment the DNA sequences. -A benchmark for the segmentation method is also proposed. -" -444,1202.3752,"Nebojsa Jojic, Alessandro Perina","Multidimensional counting grids: Inferring word order from disordered - bags of words",cs.IR cs.CL cs.LG stat.ML," Models of bags of words typically assume topic mixing so that the words in a -single bag come from a limited number of topics. We show here that many sets of -bags of words exhibit a very different pattern of variation than the patterns -that are efficiently captured by topic mixing. In many cases, from one bag of -words to the next, the words disappear and new ones appear as if the theme -slowly and smoothly shifted across documents (provided that the documents are -somehow ordered). Examples of latent structure that describe such ordering are -easily imagined.
For example, the advancement of the date of the news stories -is reflected in a smooth change over the theme of the day as certain evolving -news stories fall out of favor and new events create new stories. Overlaps -among the stories of consecutive days can be modeled by using windows over -linearly arranged tight distributions over words. We show here that such a -strategy can be extended to multiple dimensions and cases where the ordering of -data is not readily obvious. We demonstrate that this way of modeling -covariation in word occurrences outperforms standard topic models in -classification and prediction tasks in applications in biology, text modeling -and computer vision. -" -445,1202.4837,"Jordi Saludes (UPC), Sebastian Xamb\'o (UPC)",The GF Mathematics Library,cs.MS cs.CL," This paper is devoted to presenting the Mathematics Grammar Library, a system -for multilingual mathematical text processing. We explain the context in which -it originated, its current design and functionality and the current development -goals. We also present two prototype services and comment on possible future -applications in the area of artificial mathematics assistants. -" -446,1202.5913,"Ruedi Stoop, Patrick N\""uesch, Ralph Lukas Stoop, and Leonid - Bunimovich",Fly out-smarts man,q-bio.PE cs.CL physics.bio-ph," Precopulatory courtship is a high-cost and poorly understood mystery of the -animal world. Drosophila's (=D.'s) precopulatory courtship not only shows marked -structural similarities with mammalian courtship, but also with human spoken -language. This suggests studying the purpose, modalities and, in particular, -the power of this language, and comparing it to human language. Following a -mathematical symbolic dynamics approach, we translate courtship videos of D.'s -body language into a formal language. This approach made it possible to show -that D. may use its body language to express individual information - -information that may be important for evolutionary optimization, on top of the -sexual group membership. Here, we use Chomsky's hierarchical language -classification to characterize the power of D.'s body language, and then -compare it with the power of languages spoken by humans. We find that from a -formal language point of view, D.'s body language is at least as powerful as -the languages spoken by humans. From this we conclude that human intellect -cannot be the direct consequence of the formal grammar complexity of human -language. -" -447,1202.6266,"Ali Sadiqui, Noureddine Chenfour","Realisation d'un systeme de reconnaissance automatique de la parole - arabe base sur CMU Sphinx",cs.CL," This paper presents the continuation of the work completed by Satori et al. -[SCH07] through the realization of an automatic speech recognition system (ASR) for -the Arabic language based on the SPHINX 4 system. The previous work was limited to the -recognition of the first ten digits, whereas the present work extends it to -continuous Arabic speech recognition, with a recognition rate of around 96%. -" -448,1202.6583,"Luis Quesada, Fernando Berzal, Francisco J. Cortijo",A Lexical Analysis Tool with Ambiguity Support,cs.CL cs.FL," Lexical ambiguities naturally arise in languages. We present Lamb, a lexical -analyzer that produces a lexical analysis graph describing all the possible -sequences of tokens that can be found within the input string.
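A lexical analysis graph of the kind Lamb produces (record 448) compactly encodes every tokenization of an ambiguous input. A minimal sketch that simply enumerates those token sequences over a hypothetical lexicon (not Lamb's actual data structure or API):

    def tokenizations(s, lexicon):
        """Yield every way to cover s with tokens from the lexicon."""
        if not s:
            yield []
            return
        for tok in lexicon:
            if s.startswith(tok):
                for rest in tokenizations(s[len(tok):], lexicon):
                    yield [tok] + rest

    # ">" / ">=" and "=" / "==" overlap, so ">>=" is lexically ambiguous.
    lexicon = {">", ">=", "=", "=="}
    for seq in tokenizations(">>=", lexicon):
        print(seq)  # ['>', '>='] and ['>', '>', '=']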
Parsers can -process such lexical analysis graphs and discard any sequence of tokens that -does not produce a valid syntactic sentence, therefore performing, together -with Lamb, a context-sensitive lexical analysis in lexically-ambiguous language -specifications. -" -449,1203.0145,Peter beim Graben,The Horse Raced Past: Gardenpath Processing in Dynamical Systems,cs.CL," I pinpoint an interesting similarity between a recent account of rational -parsing and the treatment of sequential decision problems in a dynamical -systems approach. I argue that expectation-driven search heuristics aiming at -fast computation resemble a high-risk decision strategy in favor of large -transition velocities. Hale's rational parser, combining generalized -left-corner parsing with informed $\mathrm{A}^*$ search to resolve processing -conflicts, explains gardenpath effects in natural sentence processing by -misleading estimates of future processing costs that are to be minimized. On -the other hand, minimizing the duration of cognitive computations in -time-continuous dynamical systems can be described by combining vector space -representations of cognitive states by means of filler/role decompositions and -subsequent tensor product representations with the paradigm of stable -heteroclinic sequences. Maximizing transition velocities according to a -high-risk decision strategy could account for a fast race even between states -that are apparently remote in representation space. -" -450,1203.0504,Martin Bachwerk and Carl Vogel,Modelling Social Structures and Hierarchies in Language Evolution,cs.CL cs.AI cs.MA," Language evolution might have preferred certain prior social configurations -over others. Experiments conducted with models of different social structures -(varying subgroup interactions and the role of a dominant interlocutor) suggest -that having isolated agent groups rather than an interconnected agent is more -advantageous for the emergence of a social communication system. Distinctive -groups that are closely connected by communication yield systems less like -natural language than fully isolated groups inhabiting the same world. -Furthermore, the addition of a dominant male who is asymmetrically favoured as -a hearer, and equally likely to be a speaker has no positive influence on the -disjoint groups. -" -451,1203.0512,Martin Bachwerk and Carl Vogel,Establishing linguistic conventions in task-oriented primeval dialogue,cs.CL cs.AI cs.MA," In this paper, we claim that language is likely to have emerged as a -mechanism for coordinating the solution of complex tasks. To confirm this -thesis, computer simulations are performed based on the coordination task -presented by Garrod & Anderson (1987). The role of success in task-oriented -dialogue is analytically evaluated with the help of performance measurements -and a thorough lexical analysis of the emergent communication system. -Simulation results confirm a strong effect of success mattering on both -reliability and dispersion of linguistic conventions. -" -452,1203.1685,"Win Win Thant, Tin Myat Htwe and Ni Lar Thein","Statistical Function Tagging and Grammatical Relations of Myanmar - Sentences",cs.CL," This paper describes context free grammar (CFG) based grammatical relations -for Myanmar sentences, combined with a corpus-based function tagging system. Part -of the challenge of statistical function tagging for Myanmar sentences comes -from the fact that Myanmar has free-phrase-order and a complex morphological -system.
Function tagging is a pre-processing step to show grammatical relations -of Myanmar sentences. In the task of function tagging, which tags the function -of Myanmar sentences with correct segmentation, POS (part-of-speech) tagging -and chunking information, we use Naive Bayesian theory to disambiguate the -possible function tags of a word. We apply context free grammar (CFG) to find -out the grammatical relations of the function tags. We also create a functional -annotated tagged corpus for Myanmar and propose the grammar rules for Myanmar -sentences. Experiments show that our analysis achieves a good result with -simple sentences and complex sentences. -" -453,1203.1743,Christian Retor\'e (LaBRI),"Variable types for meaning assembly: a logical syntax for generic noun - phrases introduced by most",math.LO cs.CL cs.LO," This paper proposes a way to compute the meanings associated with sentences -with generic noun phrases corresponding to the generalized quantifier most. We -call these generics specimens and they resemble stereotypes or prototypes in -lexical semantics. The meanings are viewed as logical formulae that can -thereafter be interpreted in your favourite models. To do so, we depart -significantly from the dominant Fregean view with a single untyped universe. -Indeed, our proposal adopts type theory with some hints from Hilbert -\epsilon-calculus (Hilbert, 1922; Avigad and Zach, 2008) and from medieval -philosophy, see e.g. de Libera (1993, 1996). Our type theoretic analysis bears -some resemblance with ongoing work in lexical semantics (Asher 2011; Bassac et -al. 2010; Moot, Pr\'evot and Retor\'e 2011). Our model also applies to -classical examples involving a class, or a generic element of this class, which -is not uttered but provided by the context. An outcome of this study is that, -in the minimalism-contextualism debate, see Conrad (2011), if one adopts a type -theoretical view, terms encode the purely semantic meaning component while -their typing is pragmatically determined. -" -454,1203.1858,Saif M. Mohammad and Graeme Hirst,Distributional Measures of Semantic Distance: A Survey,cs.CL," The ability to mimic human notions of semantic distance has widespread -applications. Some measures rely only on raw text (distributional measures) and -some rely on knowledge sources such as WordNet. Although extensive studies have -been performed to compare WordNet-based measures with human judgment, the use -of distributional measures as proxies to estimate semantic distance has -received little attention. Even though they have traditionally performed poorly -when compared to WordNet-based measures, they lay claim to certain uniquely -attractive features, such as their applicability in resource-poor languages and -their ability to mimic both semantic similarity and semantic relatedness. -Therefore, this paper presents a detailed study of distributional measures. -Particular attention is paid to flesh out the strengths and limitations of both -WordNet-based and distributional measures, and how distributional measures of -distance can be brought more in line with human notions of semantic distance. -We conclude with a brief discussion of recent work on hybrid measures. -" -455,1203.1889,Saif M Mohammad and Graeme Hirst,Distributional Measures as Proxies for Semantic Relatedness,cs.CL," The automatic ranking of word pairs as per their semantic relatedness and -ability to mimic human notions of semantic relatedness has widespread -applications. 
Measures that rely on raw data (distributional measures) and -those that use knowledge-rich ontologies both exist. Although extensive studies -have been performed to compare ontological measures with human judgment, the -distributional measures have primarily been evaluated by indirect means. This -paper is a detailed study of some of the major distributional measures; it -lists their respective merits and limitations. New measures that overcome these -drawbacks, that are more in line with the human notions of semantic -relatedness, are suggested. The paper concludes with an exhaustive comparison -of the distributional and ontology-based measures. Along the way, significant -research problems are identified. Work on these problems may lead to a better -understanding of how semantic relatedness is to be measured. -" -456,1203.2293,"Sergey Petrov, Jose F. Fontanari and Leonid I. Perlovsky",Categories of Emotion names in Web retrieved texts,cs.CL cs.IR," The categorization of emotion names, i.e., the grouping of emotion words that -have similar emotional connotations together, is a key tool of Social -Psychology used to explore people's knowledge about emotions. Without -exception, the studies following that research line were based on the gauging -of the perceived similarity between emotion names by the participants of the -experiments. Here we propose and examine a new approach to study the categories -of emotion names - the similarities between target emotion names are obtained -by comparing the contexts in which they appear in texts retrieved from the -World Wide Web. This comparison does not account for any explicit semantic -information; it simply counts the number of common words or lexical items used -in the contexts. This procedure allows us to write the entries of the -similarity matrix as dot products in a linear vector space of contexts. The -properties of this matrix were then explored using Multidimensional Scaling -Analysis and Hierarchical Clustering. Our main findings, namely, the underlying -dimension of the emotion space and the categories of emotion names, were -consistent with those based on people's judgments of emotion names -similarities. -" -457,1203.2299,"Maxim Makatchev, Reid Simmons, Majd Sakr","A Cross-cultural Corpus of Annotated Verbal and Nonverbal Behaviors in - Receptionist Encounters",cs.CL cs.RO," We present the first annotated corpus of nonverbal behaviors in receptionist -interactions, and the first nonverbal corpus (excluding the original video and -audio data) of service encounters freely available online. Native speakers of -American English and Arabic participated in a naturalistic role play at -reception desks of university buildings in Doha, Qatar and Pittsburgh, USA. -Their manually annotated nonverbal behaviors include gaze direction, hand and -head gestures, torso positions, and facial expressions. We discuss possible -uses of the corpus and envision it to become a useful tool for the human-robot -interaction community. -" -458,1203.2498,"Riadh Bouslimi, Houda Amraoui",Fault detection system for Arabic language,cs.CL," The study of natural language, especially Arabic, and mechanisms for the -implementation of automatic processing is a fascinating field of study, with -various potential applications. The importance of tools for natural language -processing is materialized by the need to have applications that can -effectively treat the vast mass of information available nowadays on electronic -forms. 
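Record 456 above writes the similarity between two emotion names as a dot product over the web contexts in which they occur, counting shared lexical items. A minimal sketch of that context-vector comparison, with toy snippets standing in for retrieved web text:

    import math
    from collections import Counter

    def context_vector(snippets):
        """Bag of lexical items from text snippets mentioning a word."""
        c = Counter()
        for s in snippets:
            c.update(s.lower().split())
        return c

    def cosine(u, v):
        dot = sum(u[w] * v[w] for w in u if w in v)
        norm = lambda x: math.sqrt(sum(n * n for n in x.values()))
        return dot / (norm(u) * norm(v)) if u and v else 0.0

    joy = context_vector(["she felt great joy and delight",
                          "joy filled the bright room"])
    fear = context_vector(["he felt great fear and dread",
                           "fear filled the dark room"])
    print(round(cosine(joy, fear), 2))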
Among these tools, our interest is in writing checkers, driven mainly by -the need for fast writing that keeps pace with the speed of daily life. -The morphological and syntactic properties of Arabic make it a -difficult language to master, and explain the lack of processing tools for -that language. Among these properties, we can mention: the complex structure of -the Arabic word, the agglutinative nature, lack of vocalization, the -segmentation of the text, the linguistic richness, etc. -" -459,1203.3023,Mehrez Boulares and Mohamed Jemni,"Toward an example-based machine translation from written text to ASL - using virtual agent animation",cs.CL," Modern computational linguistic software cannot produce important aspects of -sign language translation. From a review of existing research, we deduce that the majority of -automatic sign language translation systems ignore many aspects when they -generate animation; the interpretation therefore loses the true meaning of the -information. Our goals are: to translate written text from any language to ASL -animation; to model maximum raw information using machine learning and -computational techniques; and to produce a form better adapted to -natural-looking and understandable ASL animations. Our methods include -linguistic annotation of the initial text and semantic orientation to generate the -facial expression. We use genetic algorithms coupled with learning/recognition -systems to produce the most natural form. To detect emotion we rely on -fuzzy logic to produce the degree of interpolation between facial expressions. -In short, we present a new expressive language, the Text Adapted Sign Modeling -Language (TASML), that describes as many aspects as possible of a natural sign -language interpretation. This paper is organized as follows: the next section is -devoted to presenting the comprehension effect of using Space/Time/SVO form in ASL -animation based on experimentation. In section 3, we describe our technical -considerations. We present the general approach we adopted to develop our tool -in section 4. Finally, we give some perspectives and future work. -" -460,1203.3227,Anton Loss,Generalisation of language and knowledge models for corpus analysis,cs.AI cs.CL," This paper takes a new look at language and knowledge modelling for corpus -linguistics. Using ideas of Chaitin, a line of argument is made against -language/knowledge separation in Natural Language Processing. A simplistic -model, that generalises approaches to language and knowledge, is proposed. One -of the hypothetical consequences of this model is Strong AI. -" -461,1203.3511,"Sebastian Riedel, David A. Smith, Andrew McCallum","Inference by Minimizing Size, Divergence, or their Sum",cs.LG cs.CL stat.ML," We speed up marginal inference by ignoring factors that do not significantly -contribute to overall accuracy. In order to pick a suitable subset of factors -to ignore, we propose three schemes: minimizing the number of model factors -under a bound on the KL divergence between pruned and full models; minimizing -the KL divergence under a bound on factor count; and minimizing the weighted -sum of KL divergence and factor count. All three problems are solved using an -approximation of the KL divergence that can be calculated in terms of marginals -computed on a simple seed graph.
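For record 461's third scheme, minimizing the weighted sum of KL divergence and factor count, a greedy sketch: if each factor's contribution to the KL divergence were independent and additive (a simplifying assumption, not the paper's exact procedure), a factor should be kept exactly when its KL cost of removal exceeds the per-factor size penalty:

    def prune_factors(kl_contrib, weight):
        """Keep factor f iff dropping it costs more KL than the size
        penalty saved; this minimizes sum(KL of dropped) + weight * kept."""
        return [f for f, kl in kl_contrib.items() if kl > weight]

    # Hypothetical per-factor KL contributions, e.g. estimated from
    # marginals computed on the seed graph.
    kl_contrib = {"f1": 0.002, "f2": 1.7, "f3": 0.4, "f4": 2.3}
    print(prune_factors(kl_contrib, weight=0.5))  # ['f2', 'f4']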
Applied to synthetic image denoising and to -three different types of NLP parsing models, this technique performs marginal -inference up to 11 times faster than loopy BP, with graph sizes reduced by up to -98%, at comparable error in marginals and parsing accuracy. We also show that -minimizing the weighted sum of divergence and size is substantially faster than -minimizing either of the other objectives based on the approximation to -divergence presented here. -" -462,1203.3584,Tarek El-Shishtawy and Fatma El-Ghannam,"An Accurate Arabic Root-Based Lemmatizer for Information Retrieval - Purposes",cs.CL," In spite of its robust syntax, semantic cohesion, and low ambiguity, lemma-level -analysis and generation has not yet been a focus of Arabic NLP literature. -In the current research, we propose the first non-statistical accurate Arabic -lemmatizer algorithm that is suitable for information retrieval (IR) systems. -The proposed lemmatizer makes use of different Arabic language knowledge -resources to generate the accurate lemma form and its relevant features that -support IR purposes. As a POS tagger, the experimental results show that the -proposed algorithm achieves a maximum accuracy of 94.8%. For first-seen -documents, an accuracy of 89.15% is achieved, compared to 76.7% for the up-to-date -accurate Stanford Arabic model on the same dataset. -" -463,1203.3586,Mohsen Pourvali and Mohammad Saniee Abadeh,"Automated Text Summarization Base on Lexicales Chain and graph Using of - WordNet and Wikipedia Knowledge Base",cs.IR cs.CL," The technology of automatic document summarization is maturing and may -provide a solution to the information overload problem. Nowadays, document -summarization plays an important role in information retrieval. With a large -volume of documents, presenting the user with a summary of each document -greatly facilitates the task of finding the desired documents. Document -summarization is a process of automatically creating a compressed version of a -given document that provides useful information to users, and multi-document -summarization is to produce a summary delivering the majority of information -content from a set of documents about an explicit or implicit main topic. The -lexical cohesion structure of the text can be exploited to determine the -importance of a sentence/phrase. Lexical chains are useful tools to analyze the -lexical cohesion structure in a text. In this paper we consider the effect of -using lexical cohesion features in summarization, and present an -algorithm based on a knowledge base. Our algorithm first finds the correct -sense of each word, then constructs the lexical chains, removes lexical chains -that score lower than the others, detects topics roughly from the lexical chains, -segments the text with respect to the topics and selects the most important -sentences. The experimental results on open benchmark datasets from DUC01 -and DUC02 show that our proposed approach can improve the performance compared -to state-of-the-art summarization approaches. -" -464,1203.4176,"A.M. Riad, Hamdy K.Elmonier, Samaa. M. Shohieb, and A.S. Asem","SignsWorld; Deeping Into the Silence World and Hearing Its Signs (State - of the Art)",cs.CL cs.CV," Automatic speech processing systems are employed more and more often in real -environments. Although the underlying speech technology is mostly language -independent, differences between languages with respect to their structure and -grammar have a substantial effect on recognition system performance. 
In this -paper, we present a review of the latest developments in sign language -recognition research in general, and in Arabic sign language (ArSL) recognition in -particular. This paper also presents a general framework, called SignsWorld, for improving the deaf -community's communication with hearing people. The -overall goal of the SignsWorld project is to develop a vision-based technology -for recognizing and translating continuous Arabic sign language (ArSL). -" -465,1203.4238,"Marco Guerini, Alberto Pepe, Bruno Lepri","Do Linguistic Style and Readability of Scientific Abstracts affect their - Virality?",cs.SI cs.CL cs.DL," Reactions to textual content posted in an online social network show -different dynamics depending on the linguistic style and readability of the -submitted content. Do similar dynamics exist for responses to scientific -articles? Our intuition, supported by previous research, suggests that the -success of a scientific article depends on its content, rather than on its -linguistic style. In this article, we examine a corpus of scientific abstracts -and three forms of associated reactions: article downloads, citations, and -bookmarks. Through a class-based psycholinguistic analysis and readability -indices tests, we show that certain stylistic and readability features of -abstracts clearly concur in determining the success and viral capability of a -scientific article. -" -466,1203.4605,Tarek El-shishtawy and Abdulwahab Al-sammak,"Arabic Keyphrase Extraction using Linguistic knowledge and Machine - Learning Techniques",cs.CL," In this paper, a supervised learning technique for extracting keyphrases of -Arabic documents is presented. The extractor is supplied with linguistic -knowledge to enhance its efficiency instead of relying only on statistical -information such as term frequency and distance. During analysis, an annotated -Arabic corpus is used to extract the required lexical features of the document -words. The knowledge also includes syntactic rules based on part of speech tags -and allowed word sequences to extract the candidate keyphrases. In this work, -the abstract form of Arabic words is used instead of the stem form to represent -the candidate terms. The abstract form hides most of the inflections found in -Arabic words. The paper introduces new features of keyphrases based on -linguistic knowledge, to capture titles and subtitles of a document. A simple -ANOVA test is used to evaluate the validity of selected features. Then, the -learning model is built using LDA - Linear Discriminant Analysis - and -training documents. Although the presented system is trained using documents -in the IT domain, experiments carried out show that it has a significantly -better performance than existing Arabic extractor systems, where precision -and recall values reach double their corresponding values in the other systems, -especially for lengthy and non-scientific articles. -" -467,1203.4933,"Kishorjit Nongmeikapam, Lairenlakpam Nonglenjaoba, Yumnam Nirmal and - Sivaji Bandyopadhyay","Reduplicated MWE (RMWE) helps in improving the CRF based Manipuri POS - Tagger",cs.CL," This paper gives a detailed overview of the modified feature selection used in -CRF (Conditional Random Field) based Manipuri POS (Part of Speech) tagging. -Feature selection is so important in CRF that the better the features, -the better the outputs. This work is an attempt to make the previous work more efficient. 
Multiple new features are tried in running -the CRF, which is then run again with Reduplicated Multiword Expressions (RMWEs) as -an additional feature. RMWEs are used because Manipuri is rich in them, and -their identification is one of the necessities for improving the result of -POS tagging. The new CRF system shows a Recall of 78.22%, Precision of 73.15% -and F-measure of 75.60%. Identifying RMWEs and considering them as -a feature improves this to a Recall of 80.20%, Precision of 74.31% and -F-measure of 77.14%. -" -468,1203.5051,Leon Derczynski and Robert Gaizauskas,Analysing Temporally Annotated Corpora with CAVaT,cs.CL," We present CAVaT, a tool that performs Corpus Analysis and Validation for -TimeML. CAVaT is an open source, modular checking utility for statistical -analysis of features specific to temporally-annotated natural language corpora. -It provides reporting, highlights salient links between a variety of general -and time-specific linguistic features, and also validates a temporal annotation -to ensure that it is logically consistent and sufficiently annotated. Uniquely, -CAVaT provides analysis specific to TimeML-annotated temporal information. -TimeML is a standard for annotating temporal information in natural language -text. In this paper, we present the reporting part of CAVaT, and then its -error-checking ability, including the workings of several novel TimeML document -verification methods. This is followed by the execution of some example tasks -using the tool to show relations between times, events, signals and links. We -also demonstrate inconsistencies in a TimeML corpus (TimeBank) that have been -detected with CAVaT. -" -469,1203.5055,Leon Derczynski and Robert Gaizauskas,Using Signals to Improve Automatic Classification of Temporal Relations,cs.CL," Temporal information conveyed by language describes how the world around us -changes through time. Events, durations and times are all temporal elements -that can be viewed as intervals. These intervals are sometimes temporally -related in text. Automatically determining the nature of such relations is a -complex and unsolved problem. Some words can act as ""signals"" which suggest a -temporal ordering between intervals. In this paper, we use these signal words -to improve the accuracy of a recent approach to classification of temporal -links. -" -470,1203.5060,Leon Derczynski and Robert Gaizauskas,USFD2: Annotating Temporal Expresions and TLINKs for TempEval-2,cs.CL," We describe the University of Sheffield system used in the TempEval-2 -challenge, USFD2. The challenge requires the automatic identification of -temporal entities and relations in text. USFD2 identifies and anchors temporal -expressions, and also attempts two of the four temporal relation assignment -tasks. A rule-based system picks out and anchors temporal expressions, and a -maximum entropy classifier assigns temporal link labels, based on features that -include descriptions of associated temporal signal words. USFD2 identified -temporal expressions successfully, and correctly classified their type in 90% -of cases. Determining the relation between an event and time expression in the -same sentence was performed at 63% accuracy, the second highest score in this -part of the challenge. -" -471,1203.5062,Leon Derczynski and Robert Gaizauskas,An Annotation Scheme for Reichenbach's Verbal Tense Structure,cs.CL," In this paper we present RTMML, a markup language for the tenses of verbs and -temporal relations between verbs. 
There is a richness to tense in language that -is not fully captured by existing temporal annotation schemata. Following -Reichenbach we present an analysis of tense in terms of abstract time points, -with the aim of supporting automated processing of tense and temporal relations -in language. This allows for precise reasoning about tense in documents, and -the deduction of temporal relations between the times and verbal events in a -discourse. We define the syntax of RTMML, and demonstrate the markup in a range -of situations. -" -472,1203.5066,Leon Derczynski and Robert Gaizauskas,A Corpus-based Study of Temporal Signals,cs.CL," Automatic temporal ordering of events described in discourse has been of -great interest in recent years. Event orderings are conveyed in text via various -linguistic mechanisms including the use of expressions such as ""before"", -""after"" or ""during"" that explicitly assert a temporal relation -- temporal -signals. In this paper, we investigate the role of temporal signals in temporal -relation extraction and provide a quantitative analysis of these expressions -in the TimeBank annotated corpus. -" -473,1203.5073,"Amev Burman, Arun Jayapal, Sathish Kannan, Madhu Kavilikatta, Ayman - Alhelbawy, Leon Derczynski, Robert Gaizauskas","USFD at KBP 2011: Entity Linking, Slot Filling and Temporal Bounding",cs.CL," This paper describes the University of Sheffield's entry in the 2011 TAC KBP -entity linking and slot filling tasks. We chose to participate in the -monolingual entity linking task, the monolingual slot filling task and the -temporal slot filling tasks. We set out to build a framework for -experimentation with knowledge base population. This framework was created, and -applied to multiple KBP tasks. We demonstrated that our proposed framework is -effective and suitable for collaborative development efforts, as well as useful -in a teaching environment. Finally we present results that, while very modest, -provide improvements an order of magnitude greater than our 2010 attempt. -" -474,1203.5076,Leon Derczynski and H\'ector Llorens and Estela Saquete,Massively Increasing TIMEX3 Resources: A Transduction Approach,cs.CL," Automatic annotation of temporal expressions is a research challenge of great -interest in the field of information extraction. Gold standard -temporally-annotated resources are limited in size, which makes research using -them difficult. Standards have also evolved over the past decade, so not all -temporally annotated data is in the same format. We vastly increase available -human-annotated temporal expression resources by converting older format -resources to TimeML/TIMEX3. This task is difficult due to differing annotation -methods. We present a robust conversion tool and a new, large temporal -expression resource. Using this, we evaluate our conversion process by using it -as training data for an existing TimeML annotation tool, achieving a 0.87 F1 -measure -- better than any system in the TempEval-2 timex recognition exercise. -" -475,1203.5084,"Leon Derczynski, Jun Wang, Robert Gaizauskas and Mark A. Greenwood",A Data Driven Approach to Query Expansion in Question Answering,cs.CL cs.IR," Automated answering of natural language questions is an interesting and -useful problem to solve. Question answering (QA) systems often perform -information retrieval at an initial stage. Information retrieval (IR) -performance, provided by engines such as Lucene, places a bound on overall -system performance. 
For example, no answer-bearing documents are retrieved at -low ranks for almost 40% of questions. - In this paper, answer texts from previous QA evaluations held as part of the -Text REtrieval Conferences (TREC) are paired with queries and analysed in an -attempt to identify performance-enhancing words. These words are then used to -evaluate the performance of a query expansion method. - Data-driven extension words were found to help in over 70% of difficult -questions. These words can be used to improve and evaluate query expansion -methods. Simple blind relevance feedback (RF) was correctly predicted as -unlikely to help overall performance, and a possible explanation is provided -for its low value in IR for QA. -" -476,1203.5188,"Stefan Hen{\ss}, Martin Monperrus (INRIA Lille - Nord Europe), Mira - Mezini","Semi-Automatically Extracting FAQs to Improve Accessibility of Software - Development Knowledge",cs.SE cs.CL cs.IR," Frequently asked questions (FAQs) are a popular way to document software -development knowledge. As creating such documents is expensive, this paper -presents an approach for automatically extracting FAQs from sources of software -development discussion, such as mailing lists and Internet forums, by combining -techniques of text mining and natural language processing. We apply the -approach to popular mailing lists and carry out a survey among software -developers to show that it is able to extract high-quality FAQs that may be -further improved by experts. -" -477,1203.5255,"Youssef Bassil, Mohammad Alwani","Post-Editing Error Correction Algorithm for Speech Recognition using - Bing Spelling Suggestion",cs.CL," ASR, short for Automatic Speech Recognition, is the process of converting -spoken speech into text that can be manipulated by a computer. Although ASR has -several applications, it is still erroneous and imprecise, especially if used in -a harsh environment wherein the input speech is of low quality. This paper -proposes a post-editing ASR error correction method and algorithm based on -Bing's online spelling suggestion. In this approach, the ASR recognized output -text is spell-checked using Bing's spelling suggestion technology to detect and -correct misrecognized words. More specifically, the proposed algorithm breaks -down the ASR output text into several word-tokens that are submitted as search -queries to the Bing search engine. A returned spelling suggestion implies that a -query is misspelled, and thus it is replaced by the suggested correction; -otherwise, no correction is performed and the algorithm continues with the next -token until all tokens get validated. Experiments carried out on various -speeches in different languages indicated a successful decrease in the number -of ASR errors and an improvement in the overall error correction rate. Future -research can improve upon the proposed algorithm so that it can be -parallelized to take advantage of multiprocessor computers. -" -478,1203.5262,"Youssef Bassil, Paul Semaan",ASR Context-Sensitive Error Correction Based on Microsoft N-Gram Dataset,cs.CL," At the present time, computers are employed to solve complex tasks and -problems, ranging from simple calculations to intensive digital image processing -and intricate algorithmic optimization problems to computationally-demanding -weather forecasting problems. 
ASR, short for Automatic Speech Recognition, is yet -another type of computational problem whose purpose is to recognize human -spoken speech and convert it into text that can be processed by a computer. -Although ASR has many versatile and pervasive real-world applications, it is -still relatively erroneous and not perfectly solved, as it is prone to produce -spelling errors in the recognized text, especially if the ASR system is -operating in a noisy environment, its vocabulary size is limited, and its input -speech is of low quality. This paper proposes a post-editing ASR error -correction method based on the Microsoft N-Gram dataset for detecting and correcting -spelling errors generated by ASR systems. The proposed method comprises an -error detection algorithm for detecting word errors; a candidate corrections -generation algorithm for generating correction suggestions for the detected -word errors; and a context-sensitive error correction algorithm for selecting -the best candidate for correction. The virtue of using the Microsoft N-Gram -dataset is that it contains real-world data and word sequences extracted from -the web, which can mimic a comprehensive dictionary of words having a large and -all-inclusive vocabulary. Experiments conducted on numerous speeches, performed -by different speakers, showed a remarkable reduction in ASR errors. Future -research can improve upon the proposed algorithm so that it can be -parallelized to take advantage of multiprocessor and distributed systems. -" -479,1203.5502,"Marco Guerini, Carlo Strapparava and Gozde Ozbal",Exploring Text Virality in Social Networks,cs.CL cs.SI physics.soc-ph," This paper aims to shed some light on the concept of virality - especially in -social networks - and to provide new insights on its structure. We argue that: -(a) virality is a phenomenon strictly connected to the nature of the content -being spread, rather than to the influencers who spread it, (b) virality is a -phenomenon with many facets, i.e. under this generic term several different -effects of persuasive communication are comprised and they only partially -overlap. To give ground to our claims, we provide initial experiments in a -machine learning framework to show how various aspects of virality can be -independently predicted according to content features. -" -480,1203.6136,Alex Rudnick,"Tree Transducers, Machine Translation, and Cross-Language Divergences",cs.CL," Tree transducers are formal automata that transform trees into other trees. -Many varieties of tree transducers have been explored in the automata theory -literature, and more recently, in the machine translation literature. In this -paper I review T and xT transducers, situate them among related formalisms, and -show how they can be used to implement rules for machine translation systems -that cover all of the cross-language structural divergences described in Bonnie -Dorr's influential article on the topic. I also present an implementation of xT -transduction, suitable and convenient for experimenting with translation rules. -" -481,1203.6339,Massimiliano Dal Mas,"Intelligent Interface Architectures for Folksonomy Driven Structure - Network",cs.HC cs.CL cs.CY cs.IR," A folksonomy results from the free personal assignment of information, i.e. -tags, to an object (identified by its URI) in order to retrieve it. The practice -of tagging is done in a collective environment. 
Folksonomies are self -constructed, based on the co-occurrence of definitions, rather than on a hierarchical -structure of the data. The downside of this is that only a few sites and -applications are able to successfully exploit the sharing of bookmarks. The -need for tools that are able to resolve the ambiguity of the definitions is -becoming urgent, as the lack of simple instruments for their visualization, -editing and exploitation in web applications still hinders their diffusion and -wide adoption. An intelligent interactive interface design for folksonomies -should consider contextual design and inquiry, based on concurrent -interaction, for perceptual user interfaces. To represent folksonomies, a new -concept structure called ""Folksodriven"" is used in this paper, and the -Folksodriven Structure Network (FSN) is presented to resolve the ambiguity of -folksonomy tag definitions in suggestions for the user. On this basis a -Human-Computer Interaction (HCI) system is developed for the visualization, -navigation, updating and maintenance of folksonomy Knowledge Bases - the FSN -- through the web. The system's functionalities as well as its internal architecture -are introduced. -" -482,1203.6360,"Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg, Lillian - Lee",You had me at hello: How phrasing affects memorability,cs.CL cs.SI physics.soc-ph," Understanding the ways in which information achieves widespread public -awareness is a research question of significant interest. We consider whether, -and how, the way in which the information is phrased --- the choice of words -and sentence structure --- can affect this process. To this end, we develop an -analysis framework and build a corpus of movie quotes, annotated with -memorability information, in which we are able to control for both the speaker -and the setting of the quotes. We find that there are significant differences -between memorable and non-memorable quotes in several key dimensions, even -after controlling for situational and contextual factors. One is lexical -distinctiveness: in aggregate, memorable quotes use less common word choices, -but at the same time are built upon a scaffolding of common syntactic patterns. -Another is that memorable quotes tend to be more general in ways that make them -easy to apply in new contexts --- that is, more portable. We also show how the -concept of ""memorable language"" can be extended across domains. -" -483,1203.6845,"M\'onica Marrero, Sonia S\'anchez-Cuadrado, Juli\'an Urbano, Jorge - Morato, Jos\'e-Antonio Moreiro",Information Retrieval Systems Adapted to the Biomedical Domain,cs.CL cs.IR," The terminology used in Biomedicine shows lexical peculiarities that have -required the elaboration of terminological resources and information retrieval -systems with specific functionalities. The main characteristics are the high -rates of synonymy and homonymy, due to phenomena such as the proliferation of -polysemic acronyms and their interaction with common language. Information -retrieval systems in the biomedical domain use techniques oriented to the -treatment of these lexical peculiarities. In this paper we review some of the -techniques used in this domain, such as the application of Natural Language -Processing (BioNLP), the incorporation of lexical-semantic resources, and the -application of Named Entity Recognition (BioNER). Finally, we present the -evaluation methods adopted to assess the suitability of these techniques for -retrieving biomedical resources. 
-" -484,1204.0140,Mario Jarmasz,Roget's Thesaurus as a Lexical Resource for Natural Language Processing,cs.CL," WordNet proved that it is possible to construct a large-scale electronic -lexical database on the principles of lexical semantics. It has been accepted -and used extensively by computational linguists ever since it was released. -Inspired by WordNet's success, we propose as an alternative a similar resource, -based on the 1987 Penguin edition of Roget's Thesaurus of English Words and -Phrases. - Peter Mark Roget published his first Thesaurus over 150 years ago. Countless -writers, orators and students of the English language have used it. -Computational linguists have employed Roget's for almost 50 years in Natural -Language Processing, however hesitated in accepting Roget's Thesaurus because a -proper machine tractable version was not available. - This dissertation presents an implementation of a machine-tractable version -of the 1987 Penguin edition of Roget's Thesaurus - the first implementation of -its kind to use an entire current edition. It explains the steps necessary for -taking a machine-readable file and transforming it into a tractable system. -This involves converting the lexical material into a format that can be more -easily exploited, identifying data structures and designing classes to -computerize the Thesaurus. Roget's organization is studied in detail and -contrasted with WordNet's. - We show two applications of the computerized Thesaurus: computing semantic -similarity between words and phrases, and building lexical chains in a text. -The experiments are performed using well-known benchmarks and the results are -compared to those of other systems that use Roget's, WordNet and statistical -techniques. Roget's has turned out to be an excellent resource for measuring -semantic similarity; lexical chains are easily built but more difficult to -evaluate. We also explain ways in which Roget's Thesaurus and WordNet can be -combined. -" -485,1204.0184,Youssef Bassil,Parallel Spell-Checking Algorithm Based on Yahoo! N-Grams Dataset,cs.CL," Spell-checking is the process of detecting and sometimes providing -suggestions for incorrectly spelled words in a text. Basically, the larger the -dictionary of a spell-checker is, the higher is the error detection rate; -otherwise, misspellings would pass undetected. Unfortunately, traditional -dictionaries suffer from out-of-vocabulary and data sparseness problems as they -do not encompass large vocabulary of words indispensable to cover proper names, -domain-specific terms, technical jargons, special acronyms, and terminologies. -As a result, spell-checkers will incur low error detection and correction rate -and will fail to flag all errors in the text. This paper proposes a new -parallel shared-memory spell-checking algorithm that uses rich real-world word -statistics from Yahoo! N-Grams Dataset to correct non-word and real-word errors -in computer text. Essentially, the proposed algorithm can be divided into three -sub-algorithms that run in a parallel fashion: The error detection algorithm -that detects misspellings, the candidates generation algorithm that generates -correction suggestions, and the error correction algorithm that performs -contextual error correction. Experiments conducted on a set of text articles -containing misspellings, showed a remarkable spelling error correction rate -that resulted in a radical reduction of both non-word and real-word errors in -electronic text. 
In a further study, the proposed algorithm is to be optimized -for message-passing systems so as to become more flexible and less costly to -scale over distributed machines. -" -486,1204.0188,"Youssef Bassil, Mohammad Alwani","OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram - Data Set",cs.CL cs.IR," Since the dawn of the computing era, information has been represented -digitally so that it can be processed by electronic computers. Paper books and -documents were abundant and being widely published at that time; and hence, -there was a need to convert them into digital format. OCR, short for Optical -Character Recognition, was conceived to translate paper-based books into digital -e-books. Regrettably, OCR systems are still erroneous and inaccurate as they -produce misspellings in the recognized text, especially when the source -document is of low printing quality. This paper proposes a post-processing OCR -context-sensitive error correction method for detecting and correcting non-word -and real-word OCR errors. The cornerstone of this proposed approach is the use -of the Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR -text. The Google data set incorporates a very large vocabulary and word -statistics entirely reaped from the Internet, making it a reliable source to -perform dictionary-based error correction. The core of the proposed solution is -a combination of three algorithms: the error detection, candidate spellings -generator, and error correction algorithms, which all exploit information -extracted from the Google Web 1T 5-gram data set. Experiments conducted on scanned -images written in different languages showed a substantial improvement in the -OCR error correction rate. As future developments, the proposed algorithm is to -be parallelised so as to support parallel and distributed computing -architectures. -" -487,1204.0191,"Youssef Bassil, Mohammad Alwani","OCR Post-Processing Error Correction Algorithm using Google Online - Spelling Suggestion",cs.CL," With the advent of digital optical scanners, a lot of paper-based books, -textbooks, magazines, articles, and documents are being transformed into an -electronic version that can be manipulated by a computer. For this purpose, -OCR, short for Optical Character Recognition, was developed to translate scanned -graphical text into editable computer text. Unfortunately, OCR is still -imperfect as it occasionally mis-recognizes letters and falsely identifies -scanned text, leading to misspellings and linguistic errors in the OCR output -text. This paper proposes a post-processing context-based error correction -algorithm for detecting and correcting OCR non-word and real-word errors. The -proposed algorithm is based on Google's online spelling suggestion, which -harnesses an internal database containing a huge collection of terms and word -sequences gathered from all over the web, convenient for suggesting possible -replacements for words that have been misspelled during the OCR process. -Experiments carried out revealed a significant improvement in the OCR error -correction rate. Future research can improve upon the proposed algorithm so -that it can be parallelized and executed over multiprocessing -platforms. -" -488,1204.0245,"Mario Jarmasz, and Stan Szpakowicz",Roget's Thesaurus and Semantic Similarity,cs.CL," We have implemented a system that measures semantic similarity using a -computerized 1987 Roget's Thesaurus, and evaluated it by performing a few -typical tests. 
We compare the results of these tests with those produced by -WordNet-based similarity measures. One of the benchmarks is Miller and Charles' -list of 30 noun pairs to which human judges had assigned similarity measures. -We correlate these measures with those computed by several NLP systems. The 30 -pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have -also studied. Our Roget's-based system gets correlations of .878 for the -smaller and .818 for the larger list of noun pairs; this is quite close to the -.885 that Resnik obtained when he employed humans to replicate the Miller and -Charles experiment. We further evaluate our measure by using Roget's and -WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest questions: the -correct synonym must be selected amongst a group of four words. Our system gets -78.75%, 82.00% and 74.33% of the questions respectively. -" -489,1204.0255,Mario Jarmasz and Caroline Barri\`ere,Keyphrase Extraction : Enhancing Lists,cs.CL cs.IR," This paper proposes some modest improvements to Extractor, a state-of-the-art -keyphrase extraction system, by using a terabyte-sized corpus to estimate the -informativeness and semantic similarity of keyphrases. We present two -techniques to improve the organization and remove outliers of lists of -keyphrases. The first is a simple ordering according to their occurrences in -the corpus; the second is clustering according to semantic similarity. -Evaluation issues are discussed. We present a novel technique of comparing -extracted keyphrases to a gold standard which relies on semantic similarity -rather than string matching or an evaluation involving human judges. -" -490,1204.0257,Mario Jarmasz and Stan Szpakowicz,"Not As Easy As It Seems: Automating the Construction of Lexical Chains - Using Roget's Thesaurus",cs.CL," Morris and Hirst present a method of linking significant words that are about -the same topic. The resulting lexical chains are a means of identifying -cohesive regions in a text, with applications in many natural language -processing tasks, including text summarization. The first lexical chains were -constructed manually using Roget's International Thesaurus. Morris and Hirst -wrote that automation would be straightforward given an electronic thesaurus. -All applications so far have used WordNet to produce lexical chains, perhaps -because adequate electronic versions of Roget's were not available until -recently. We discuss the building of lexical chains using an electronic version -of Roget's Thesaurus. We implement a variant of the original algorithm, and -explain the necessary design decisions. We include a comparison with other -implementations. -" -491,1204.0258,Mario Jarmasz and Stan Szpakowicz,Roget's Thesaurus: a Lexical Resource to Treasure,cs.CL," This paper presents the steps involved in creating an electronic lexical -knowledge base from the 1987 Penguin edition of Roget's Thesaurus. Semantic -relations are labelled with the help of WordNet. The two resources are compared -in a qualitative and quantitative manner. Differences in the organization of -the lexical material are discussed, as well as the possibility of merging both -resources. -" -492,1204.1615,Sofiene Haboubi and Samia Maddouri and Hamid Amiri,Discrimination between Arabic and Latin from bilingual documents,cs.CV cs.CL cs.IR," 2011 International Conference on Communications, Computing and Control -Applications (CCCA) -" -493,1204.2523,"Khalid El-Arini, Emily B. 
Fox, Carlos Guestrin",Concept Modeling with Superwords,stat.ML cs.CL cs.IR cs.LG," In information retrieval, a fundamental goal is to transform a document into -concepts that are representative of its content. The term ""representative"" is -in itself challenging to define, and various tasks require different -granularities of concepts. In this paper, we aim to model concepts that are -sparse over the vocabulary, and that flexibly adapt their content based on -other relevant semantic information such as textual structure or associated -image features. We explore a Bayesian nonparametric model based on nested beta -processes that allows for inferring an unknown number of strictly sparse -concepts. The resulting model provides an inherently different representation -of concepts than a standard LDA (or HDP) based topic model, and allows for -direct incorporation of semantic features. We demonstrate the utility of this -representation on multilingual blog data and the Congressional Record. -" -494,1204.2765,"Taha Yasseri, Andr\'as Kornai, and J\'anos Kert\'esz",A practical approach to language complexity: a Wikipedia case study,cs.CL physics.data-an physics.soc-ph," In this paper we present statistical analysis of English texts from -Wikipedia. We try to address the issue of language complexity empirically by -comparing the simple English Wikipedia (Simple) to comparable samples of the -main English Wikipedia (Main). Simple is supposed to use a more simplified -language with a limited vocabulary, and editors are explicitly requested to -follow this guideline, yet in practice the vocabulary richness of both samples -is at the same level. Detailed analysis of longer units (n-grams of words and -part of speech tags) shows that the language of Simple is less complex than -that of Main primarily due to the use of shorter sentences, as opposed to -drastically simplified syntax or vocabulary. Comparing the two language -varieties by the Gunning readability index supports this conclusion. We also -report on the topical dependence of language complexity, e.g. that the language -is more advanced in conceptual articles compared to person-based (biographical) -and object-based articles. Finally, we investigate the relation between -conflict and language complexity by analyzing the content of the talk pages -associated with controversial and peacefully developing articles, concluding that -controversy has the effect of reducing language complexity. -" -495,1204.2804,"Myle Ott, Claire Cardie, Jeff Hancock",Estimating the Prevalence of Deception in Online Review Communities,cs.SI cs.CL cs.CY," Consumers' purchase decisions are increasingly influenced by user-generated -online reviews. Accordingly, there has been growing concern about the potential -for posting ""deceptive opinion spam"" -- fictitious reviews that have been -deliberately written to sound authentic, to deceive the reader. But while this -practice has received considerable public attention and concern, relatively -little is known about the actual prevalence, or rate, of deception in online -review communities, and less still about the factors that influence it. - We propose a generative model of deception which, in conjunction with a -deception classifier, we use to explore the prevalence of deception in six -popular online review communities: Expedia, Hotels.com, Orbitz, Priceline, -TripAdvisor, and Yelp. 
We additionally propose a theoretical model of online -reviews based on economic signaling theory, in which consumer reviews diminish -the inherent information asymmetry between consumers and producers, by acting -as a signal to a product's true, unknown quality. We find that deceptive -opinion spam is a growing problem overall, but with different growth rates -across communities. These rates, we argue, are driven by the different -signaling costs associated with deception for each review community, e.g., -posting requirements. When measures are taken to increase signaling cost, e.g., -filtering reviews written by first-time reviewers, deception prevalence is -effectively reduced. -" -496,1204.2847,Chris Fournier and Diana Inkpen,Segmentation Similarity and Agreement,cs.CL," We propose a new segmentation evaluation metric, called segmentation -similarity (S), that quantifies the similarity between two segmentations as the -proportion of boundaries that are not transformed when comparing them using -edit distance, essentially using edit distance as a penalty function and -scaling penalties by segmentation size. We propose several adapted -inter-annotator agreement coefficients which use S that are suitable for -segmentation. We show that S is configurable enough to suit a wide variety of -segmentation evaluations, and is an improvement upon the state of the art. We -also propose using inter-annotator agreement coefficients to evaluate automatic -segmenters in terms of human performance. -" -497,1204.3198,Ramon Ferrer-i-Cancho and Antoni Hern\'andez-Fern\'andez,"The failure of the law of brevity in two New World primates. Statistical - caveats",q-bio.NC cs.CL," Parallels of Zipf's law of brevity, the tendency of more frequent words to be -shorter, have been found in bottlenose dolphins and Formosan macaques. Although -these findings suggest that behavioral repertoires are shaped by a general -principle of compression, common marmosets and golden-backed uakaris do not -exhibit the law. However, we argue that the law may be impossible or difficult -to detect statistically in a given species if the repertoire is too small, a -problem that could be affecting golden backed uakaris, and show that the law is -present in a subset of the repertoire of common marmosets. We suggest that the -visibility of the law will depend on the subset of the repertoire under -consideration or the repertoire size. -" -498,1204.3458,Bob Coecke,The logic of quantum mechanics - Take II,quant-ph cs.CL cs.LO math.CT math.LO," We put forward a new take on the logic of quantum mechanics, following -Schroedinger's point of view that it is composition which makes quantum theory -what it is, rather than its particular propositional structure due to the -existence of superpositions, as proposed by Birkhoff and von Neumann. This -gives rise to an intrinsically quantitative kind of logic, which truly deserves -the name `logic' in that it also models meaning in natural language, the latter -being the origin of logic, that it supports automation, the most prominent -practical use of logic, and that it supports probabilistic inference. -" -499,1204.3498,Vahed Qazvinian and Dragomir R. Radev,A Computational Analysis of Collective Discourse,cs.SI cs.CL physics.soc-ph," This paper is focused on the computational analysis of collective discourse, -a collective behavior seen in non-expert content contributions in online social -media. 
We collect and analyze a wide range of real-world collective discourse -datasets, from movie user reviews to microblogs and news headlines to scientific -citations. We show that all these datasets exhibit diversity of perspective, a -property seen in other collective systems and a criterion in wise crowds. Our -experiments also confirm that the network of different perspective -co-occurrences exhibits the small-world property with high clustering of -different perspectives. Finally, we show that non-expert contributions in -collective discourse can be used to answer simple questions that are otherwise -hard to answer. -" -500,1204.3731,"Arkaitz Zubiaga, Damiano Spina, Enrique Amig\'o and Julio Gonzalo",Towards Real-Time Summarization of Scheduled Events from Twitter Streams,cs.IR cs.CL cs.SI," This paper explores the real-time summarization of scheduled events such as -soccer games from torrential flows of Twitter streams. We propose and evaluate -an approach that substantially shrinks the stream of tweets in real-time, and -consists of two steps: (i) sub-event detection, which determines if something -new has occurred, and (ii) tweet selection, which picks a representative tweet -to describe each sub-event. We compare the summaries generated in three -languages for all the soccer games in ""Copa America 2011"" to reference live -reports offered by Yahoo! Sports journalists. We show that simple text analysis -methods which do not involve external knowledge lead to summaries that cover -84% of the sub-events on average, and 100% of key types of sub-events (such as -goals in soccer). Our approach should be straightforwardly applicable to other -kinds of scheduled events such as other sports, award ceremonies, keynote -talks, TV shows, etc. -" -501,1204.3800,Srinivasan Kalyanaraman,"Indus script corpora, archaeo-metallurgy and Meluhha (Mleccha)",cs.CL," Jules Bloch's work on the formation of the Marathi language has to be expanded -further to provide for a study of the evolution and formation of Indian languages -in the Indian language union (sprachbund). The paper analyses the stages in the -evolution of early writing systems, which began with the evolution of counting -in the ancient Near East. A stage anterior to the stage of syllabic -representation of the sounds of a language is identified. The unique geometric shapes -required for tokens to categorize objects became too numerous to handle as -hundreds of categories of goods and metallurgical processes had to be abstracted during the -production of bronze-age goods. About 3500 BCE, the Indus script was developed as a writing -system that uses hieroglyphs to represent the 'spoken words' -identifying each of the goods and processes. A rebus method of representing -similar-sounding words of the lingua franca of the artisans was used in the Indus -script. This method is recognized and consistently applied for the lingua -franca of the Indian sprachbund. That the ancient languages of India -constituted a sprachbund (or language union) is now recognized by many -linguists. The sprachbund area is proximate to the area where most of the Indus -script inscriptions were discovered, as documented in the corpora. That -hundreds of Indian hieroglyphs continued to be used in metallurgy is evidenced -by their use on early punch-marked coins. 
This explains the combined use of -syllabic scripts such as Brahmi and Kharoshti together with the hieroglyphs on -the Rampurva copper bolt and the Sohgaura copper plate from about the 6th century -BCE. Indian hieroglyphs constitute a writing system for the Meluhha language and are -rebus representations of archaeo-metallurgy lexemes. The rebus principle was -employed by the early scripts and can legitimately be used to decipher the -Indus script, after secure pictorial identification. -" -502,1204.4346,"James Cook, Atish Das Sarma, Alex Fabrikant, Andrew Tomkins",Your Two Weeks of Fame and Your Grandmother's,cs.DL cs.CL cs.SI physics.soc-ph," Did celebrity last longer in 1929, 1992 or 2009? We investigate the -phenomenon of fame by mining a collection of news articles that spans the -twentieth century, and also perform a side study on a collection of blog posts -from the last 10 years. By analyzing mentions of personal names, we measure -each person's time in the spotlight, using two simple metrics that evaluate, -roughly, the duration of a single news story about a person, and the overall -duration of public interest in a person. We watched the distribution evolve -from 1895 to 2010, expecting to find significantly shortening fame durations, -per the much popularly bemoaned shortening of society's attention spans and -quickening of media's news cycles. Instead, we conclusively demonstrate that, -through many decades of rapid technological and societal change, through the -appearance of Twitter, communication satellites, and the Internet, fame -durations did not decrease, neither for the typical case nor for the extremely -famous, with the last statistically significant fame duration decreases coming -in the early 20th century, perhaps from the spread of telegraphy and telephony. -Furthermore, while median fame durations stayed persistently constant, for the -most famous of the famous, as measured by either volume or duration of media -attention, fame durations have actually trended gently upward since the 1940s, -with statistically significant increases on 40-year timescales. Similar studies -have been done with much shorter timescales specifically in the context of -information spreading on Twitter and similar social networking sites. To the -best of our knowledge, this is the first massive-scale study of this nature -that spans over a century of archived data, thereby allowing us to track -changes across decades. -" -503,1204.4914,Diederik Aerts and Sandro Sozzo,Quantum Interference in Cognition: Structural Aspects of the Brain,cs.AI cs.CL quant-ph," We identify the presence of typically quantum effects, namely 'superposition' -and 'interference', in what happens when human concepts are combined, and -provide a quantum model in complex Hilbert space that represents faithfully -experimental data measuring the situation of combining concepts. Our model -shows how 'interference of concepts' explains the effects of underextension and -overextension when two concepts combine to the disjunction of these two -concepts. This result supports our earlier hypothesis that human thought has a -superposed two-layered structure, one layer consisting of 'classical logical -thought' and a superposed layer consisting of 'quantum conceptual thought'. -Possible connections with recent findings of a 'grid-structure' for the brain -are analyzed, and influences on the mind/brain relation, and consequences on -applied disciplines, such as artificial intelligence and quantum computation, -are considered. 
-" -504,1204.5316,"Maxime Lefran\c{c}ois (INRIA Sophia Antipolis), Fabien Gandon (INRIA - Sophia Antipolis)","ILexicOn: toward an ECD-compliant interlingual lexical ontology - described with semantic web formalisms",cs.CL cs.AI," We are interested in bridging the world of natural language and the world of -the semantic web in particular to support natural multilingual access to the -web of data. In this paper we introduce a new type of lexical ontology called -interlingual lexical ontology (ILexicOn), which uses semantic web formalisms to -make each interlingual lexical unit class (ILUc) support the projection of its -semantic decomposition on itself. After a short overview of existing lexical -ontologies, we briefly introduce the semantic web formalisms we use. We then -present the three layered architecture of our approach: i) the interlingual -lexical meta-ontology (ILexiMOn); ii) the ILexicOn where ILUcs are formally -defined; iii) the data layer. We illustrate our approach with a standalone -ILexicOn, and introduce and explain a concise human-readable notation to -represent ILexicOns. Finally, we show how semantic web formalisms enable the -projection of a semantic decomposition on the decomposed ILUc. -" -505,1204.5345,"William M. Stevens, Andrew Adamatzky, Ishrat Jahan, Ben de Lacy - Costello","Time-dependent wave selection for information processing in excitable - media",nlin.PS cs.CL," We demonstrate an improved technique for implementing logic circuits in -light-sensitive chemical excitable media. The technique makes use of the -constant-speed propagation of waves along defined channels in an excitable -medium based on the Belousov-Zhabotinsky reaction, along with the mutual -annihilation of colliding waves. What distinguishes this work from previous -work in this area is that regions where channels meet at a junction can -periodically alternate between permitting the propagation of waves and blocking -them. These valve-like areas are used to select waves based on the length of -time that it takes waves to propagate from one valve to another. In an -experimental implementation, the channels which make up the circuit layout are -projected by a digital projector connected to a computer. Excitable channels -are projected as dark areas, unexcitable regions as light areas. Valves -alternate between dark and light: every valve has the same period and phase, -with a 50% duty cycle. This scheme can be used to make logic gates based on -combinations of OR and AND-NOT operations, with few geometrical constraints. -Because there are few geometrical constraints, compact circuits can be -implemented. Experimental results from an implementation of a 4-bit input, -2-bit output integer square root circuit are given. This is the most complex -logic circuit that has been implemented in BZ excitable media to date. -" -506,1204.5369,"Marco Guerini, Carlo Strapparava, Oliviero Stock",Ecological Evaluation of Persuasive Messages Using Google AdWords,cs.CL cs.SI," In recent years there has been a growing interest in crowdsourcing -methodologies to be used in experimental research for NLP tasks. In particular, -evaluation of systems and theories about persuasion is difficult to accommodate -within existing frameworks. In this paper we present a new cheap and fast -methodology that allows fast experiment building and evaluation with -fully-automated analysis at a low cost. 
The central idea is exploiting existing -commercial tools for advertising on the web, such as Google AdWords, to measure -message impact in an ecological setting. The paper includes a description of -the approach, tips for how to use AdWords for scientific research, and results -of pilot experiments on the impact of affective text variations which confirm -the effectiveness of the approach. -" -507,1204.5852,"Youssef Bassil, Mohammad Alwani","Context-sensitive Spelling Correction Using Google Web 1T 5-Gram - Information",cs.CL," In computing, spell checking is the process of detecting and sometimes -providing spelling suggestions for incorrectly spelled words in a text. -Basically, a spell checker is a computer program that uses a dictionary of -words to perform spell checking. The bigger the dictionary is, the higher the -error detection rate. Because spell checkers are based on regular -dictionaries, they suffer from the data sparseness problem, as they cannot capture -a large vocabulary of words including proper names, domain-specific terms, -technical jargon, special acronyms, and terminologies. As a result, they -exhibit a low error detection rate and often fail to catch major errors in the -text. This paper proposes a new context-sensitive spelling correction method -for detecting and correcting non-word and real-word errors in digital text -documents. The approach hinges on data statistics from the Google Web 1T 5-gram -data set, which consists of a big volume of n-gram word sequences extracted -from the World Wide Web. Fundamentally, the proposed method comprises an error -detector that detects misspellings, a candidate spellings generator based on a -character 2-gram model that generates correction suggestions, and an error -corrector that performs contextual error correction. Experiments conducted on a -set of text documents from different domains and containing misspellings -showed an outstanding spelling error correction rate and a drastic reduction of -both non-word and real-word errors. In a further study, the proposed algorithm -is to be parallelized so as to lower the computational cost of the error -detection and correction processes. -" -508,1204.6362,Rushdi Shams and Adel Elsayed,"A Corpus-based Evaluation of Lexical Components of a Domainspecific Text - to Knowledge Mapping Prototype",cs.IR cs.CL," The aim of this paper is to evaluate the lexical components of a Text to -Knowledge Mapping (TKM) prototype. The prototype is domain-specific, the -purpose of which is to map instructional text onto a knowledge domain. The -context of the knowledge domain of the prototype is physics, specifically DC -electrical circuits. During development, the prototype has been tested with a -limited data set from the domain. The prototype has now reached a stage where it -needs to be evaluated with a representative linguistic data set called a corpus. -A corpus is a collection of text drawn from typical sources which can be used -as a test data set to evaluate NLP systems. As there was no available corpus for -the domain, we developed a representative corpus and annotated it with -linguistic information. The evaluation of the prototype considers one of its -two main components - the lexical knowledge base. With the corpus, the evaluation -enriches lexical knowledge resources such as the vocabulary and grammar structures. -This enables the prototype to parse a reasonable number of sentences in the -corpus. 
-" -509,1204.6364,"Rushdi Shams, Adel Elsayed, Quazi Mah-Zereen Akter","A Corpus-based Evaluation of a Domain-specific Text to Knowledge Mapping - Prototype",cs.CL," The aim of this paper is to evaluate a Text to Knowledge Mapping (TKM) -Prototype. The prototype is domain-specific, the purpose of which is to map -instructional text onto a knowledge domain. The context of the knowledge domain -is DC electrical circuit. During development, the prototype has been tested -with a limited data set from the domain. The prototype reached a stage where it -needs to be evaluated with a representative linguistic data set called corpus. -A corpus is a collection of text drawn from typical sources which can be used -as a test data set to evaluate NLP systems. As there is no available corpus for -the domain, we developed and annotated a representative corpus. The evaluation -of the prototype considers two of its major components- lexical components and -knowledge model. Evaluation on lexical components enriches the lexical -resources of the prototype like vocabulary and grammar structures. This leads -the prototype to parse a reasonable amount of sentences in the corpus. While -dealing with the lexicon was straight forward, the identification and -extraction of appropriate semantic relations was much more involved. It was -necessary, therefore, to manually develop a conceptual structure for the domain -to formulate a domain-specific framework of semantic relations. The framework -of semantic relationsthat has resulted from this study consisted of 55 -relations, out of which 42 have inverse relations. We also conducted rhetorical -analysis on the corpus to prove its representativeness in conveying semantic. -Finally, we conducted a topical and discourse analysis on the corpus to analyze -the coverage of discourse by the prototype. -" -510,1204.6441,Daniel Gayo-Avello,"""I Wanted to Predict Elections with Twitter and all I got was this Lousy - Paper"" -- A Balanced Survey on Election Prediction using Twitter Data",cs.CY cs.CL cs.SI physics.soc-ph," Predicting X from Twitter is a popular fad within the Twitter research -subculture. It seems both appealing and relatively easy. Among such kind of -studies, electoral prediction is maybe the most attractive, and at this moment -there is a growing body of literature on such a topic. This is not only an -interesting research problem but, above all, it is extremely difficult. -However, most of the authors seem to be more interested in claiming positive -results than in providing sound and reproducible methods. It is also especially -worrisome that many recent papers seem to only acknowledge those studies -supporting the idea of Twitter predicting elections, instead of conducting a -balanced literature review showing both sides of the matter. After reading many -of such papers I have decided to write such a survey myself. Hence, in this -paper, every study relevant to the matter of electoral prediction using social -media is commented. From this review it can be concluded that the predictive -power of Twitter regarding elections has been greatly exaggerated, and that -hard research problems still lie ahead. -" -511,1205.0627,"Yann Ponty (LIX, INRIA Saclay - Ile de France)","Rule-weighted and terminal-weighted context-free grammars have identical - expressivity",cs.CL," Two formalisms, both based on context-free grammars, have recently been -proposed as a basis for a non-uniform random generation of combinatorial -objects. 
The former, introduced by Denise et al, associates weights with
-letters, while the latter, recently explored by Weinberg et al in the context
-of random generation, associates weights with transitions. In this short note,
-we use a simple modification of the Greibach Normal Form transformation
-algorithm, due to Blum and Koch, to show the equivalent expressivity, in terms
-of their induced distributions, of these two formalisms.
-"
-512,1205.1564,Wentian Li,"Characterizing Ranked Chinese Syllable-to-Character Mapping Spectrum: A
- Bridge Between the Spoken and Written Chinese Language",cs.CL stat.AP," One important aspect of the relationship between spoken and written Chinese
-is the ranked syllable-to-character mapping spectrum, which is the ranked list
-of syllables by the number of characters that map to the syllable. Previously,
-this spectrum was analyzed for more than 400 syllables without distinguishing
-the four intonations. In the current study, the spectrum with 1280 toned
-syllables is analyzed by a logarithmic function, a Beta rank function, and a
-piecewise logarithmic function. Of the three fitting functions, the two-piece
-logarithmic function fits the data the best, both by the smallest sum of
-squared errors (SSE) and by the lowest Akaike information criterion (AIC)
-value. The Beta rank function is a close second. By sampling from a Poisson
-distribution whose parameter value is chosen from the observed data, we
-empirically estimate the $p$-value for testing the hypothesis that the
-two-piece logarithmic function is better than the Beta rank function to be
-0.16. For practical purposes, the piecewise logarithmic function and the Beta
-rank function can be considered a tie.
-"
-513,1205.1603,"Win Win Thant, Tin Myat Htwe and Ni Lar Thein",Parsing of Myanmar sentences with function tagging,cs.CL," This paper describes the use of Naive Bayes to address the task of assigning
-function tags and context free grammar (CFG) to parse Myanmar sentences. Part
-of the challenge of statistical function tagging for Myanmar sentences comes
-from the fact that Myanmar has free phrase order and a complex morphological
-system. Function tagging is a pre-processing step for parsing. In the task of
-function tagging, we use a functionally annotated corpus and tag Myanmar
-sentences with correct segmentation, POS (part-of-speech) tagging and chunking
-information. We propose Myanmar grammar rules and apply context free grammar
-(CFG) to find the parse tree of function-tagged Myanmar sentences. Experiments
-show that our analysis achieves good results in parsing simple sentences and
-three types of complex sentences.
-"
-514,1205.1639,Sajilal Divakaran,"Spectral Analysis of Projection Histogram for Enhancing Close matching
- character Recognition in Malayalam",cs.CL cs.CV cs.IR," The success rates of Optical Character Recognition (OCR) systems for printed
-Malayalam documents are quite impressive, with state-of-the-art accuracy
-levels in the range of 85-95%. However, for real applications, further
-enhancement of these accuracy levels is required. One of the bottlenecks in
-further enhancement of the accuracy is identified as close-matching
-characters. In this paper, we delineate the close-matching characters in
-Malayalam and report the development of a specialised classifier for these
-close-matching characters.
The output of a state-of-the-art OCR system is taken, and
-characters falling into the close-matching character set are fed into this
-specialised classifier to enhance the accuracy. The classifier is based on the
-support vector machine algorithm and uses feature vectors derived from the
-spectral coefficients of projection histogram signals of close-matching
-characters.
-"
-515,1205.1794,Behrouz Abdolali and Hossein Sameti,"A Novel Method For Speech Segmentation Based On Speakers'
- Characteristics",cs.AI cs.CL," Speech segmentation is the process of change-point detection for
-partitioning an input audio stream into regions, each of which corresponds to
-only one audio source or one speaker. One application of this system is in
-Speaker Diarization systems. There are several methods for speaker
-segmentation; however, most Speaker Diarization Systems use BIC-based
-segmentation methods. The main goal of this paper is to propose a new method
-for speaker segmentation with higher speed than the current methods - e.g.
-BIC - and acceptable accuracy. Our proposed method is based on the pitch
-frequency of the speech. The accuracy of this method is similar to the
-accuracy of common speaker segmentation methods. However, its computation cost
-is much less than theirs. We show that our method is about 2.4 times faster
-than the BIC-based method, while the average accuracy of the pitch-based
-method is slightly higher than that of the BIC-based method.
-"
-516,1205.1975,"Arnaud Casteigts, Paola Flocchini, Emmanuel Godard, Nicola Santoro,
- Masafumi Yamashita","Expressivity of Time-Varying Graphs and the Power of Waiting in Dynamic
- Networks",cs.DC cs.CL," In infrastructure-less highly dynamic networks, computing and performing
-even basic tasks (such as routing and broadcasting) is a very challenging
-activity due to the fact that connectivity does not necessarily hold, and the
-network may actually be disconnected at every time instant. Clearly the task
-of designing protocols for these networks is less difficult if the environment
-allows waiting (i.e., it provides the nodes with store-carry-forward-like
-mechanisms such as local buffering) than if waiting is not feasible. No
-quantitative corroborations of this fact exist (e.g., no answer to the
-question: how much easier?). In this paper, we consider these qualitative
-questions about dynamic networks, modeled as time-varying (or evolving)
-graphs, where edges exist only at some times.
- We examine the difficulty of the environment in terms of the expressivity of
-the corresponding time-varying graph; that is, in terms of the language
-generated by the feasible journeys in the graph. We prove that the set of
-languages $L_{nowait}$ when no waiting is allowed contains all computable
-languages. On the other hand, using algebraic properties of quasi-orders, we
-prove that $L_{wait}$ is just the family of regular languages. In other words,
-we prove that, when waiting is no longer forbidden, the power of the accepting
-automaton (difficulty of the environment) drops drastically from being as
-powerful as a Turing machine to becoming that of a finite-state machine. This
-(perhaps surprisingly large) gap is a measure of the computational power of
-waiting.
- We also study bounded waiting; that is, when waiting is allowed at a node
-only for at most $d$ time units.
We prove the negative result that $L_{wait[d]} =
-L_{nowait}$; that is, the expressivity decreases only if the waiting is finite
-but unpredictable (i.e., under the control of the protocol designer and not of
-the environment).
-"
-517,1205.2657,"Jordan Boyd-Graber, David Blei",Multilingual Topic Models for Unaligned Text,cs.CL cs.IR cs.LG stat.ML," We develop the multilingual topic model for unaligned text (MuTo), a
-probabilistic model of text that is designed to analyze corpora composed of
-documents in two languages. From these documents, MuTo uses stochastic EM to
-simultaneously discover both a matching between the languages and multilingual
-latent topics. We demonstrate that MuTo is able to find shared topics on
-real-world multilingual corpora, successfully pairing related documents across
-languages. MuTo provides a new framework for creating multilingual topic
-models without needing carefully curated parallel corpora and allows
-applications built using the topic model formalism to be applied to a much
-wider class of corpora.
-"
-518,1205.3183,"Luis Quesada, Fernando Berzal, Francisco J. Cortijo",A Model-Driven Probabilistic Parser Generator,cs.CL," Existing probabilistic scanners and parsers impose hard constraints on the
-way lexical and syntactic ambiguities can be resolved. Furthermore,
-traditional grammar-based parsing tools are limited in the mechanisms they
-allow for taking context into account. In this paper, we propose a
-model-driven tool that allows for statistical language models with arbitrary
-probability estimators. Our work on model-driven probabilistic parsing is
-built on top of ModelCC, a model-based parser generator, and enables the
-probabilistic interpretation and resolution of anaphoric, cataphoric, and
-recursive references in the disambiguation of abstract syntax graphs. In order
-to prove the expressive power of ModelCC, we describe the design of a
-general-purpose natural language parser.
-"
-519,1205.3316,Naim Terbeh and Mounir Zrigui,"Arabic Language Learning Assisted by Computer, based on Automatic Speech
- Recognition",cs.CL," This work consists of creating a Computer Assisted Language Learning (CALL)
-system based on an Automatic Speech Recognition (ASR) system for the Arabic
-language, using the CMU Sphinx3 tool [1] and the HMM approach. For this work,
-we have constructed a corpus of six hours of speech recordings from nine
-speakers. We find in the robustness to noise grounds for the choice of the HMM
-approach [2]. The results achieved are encouraging given that our corpus was
-made with only nine speakers, and they open the door for further improvement
-work.
-"
-520,1205.4298,Yoav Goldberg,Task-specific Word-Clustering for Part-of-Speech Tagging,cs.CL," While the use of cluster features has become ubiquitous in core NLP tasks,
-most cluster features in NLP are based on distributional similarity. We
-propose a new type of clustering criterion, specific to the task of
-part-of-speech tagging. Instead of distributional similarity, these clusters
-are based on the behavior of a baseline tagger when applied to a large corpus.
-These cluster features provide similar gains in accuracy to those achieved by
-distributional-similarity derived clusters. Using both types of cluster
-features together further improves tagging accuracy. We show that the method
-is effective for both the in-domain and out-of-domain scenarios for English,
-and for French, German and Italian.
The effect is larger for out-of-domain -text. -" -521,1205.4324,P\'adraig Mac Carron and Ralph Kenna,Universal Properties of Mythological Networks,physics.soc-ph cs.CL cs.SI," As in statistical physics, the concept of universality plays an important, -albeit qualitative, role in the field of comparative mythology. Here we apply -statistical mechanical tools to analyse the networks underlying three iconic -mythological narratives with a view to identifying common and distinguishing -quantitative features. Of the three narratives, an Anglo-Saxon and a Greek text -are mostly believed by antiquarians to be partly historically based while the -third, an Irish epic, is often considered to be fictional. Here we show that -network analysis is able to discriminate real from imaginary social networks -and place mythological narratives on the spectrum between them. Moreover, the -perceived artificiality of the Irish narrative can be traced back to anomalous -features associated with six characters. Considering these as amalgams of -several entities or proxies, renders the plausibility of the Irish text -comparable to the others from a network-theoretic point of view. -" -522,1205.4387,"Yoav Goldberg, Michael Elhadad",Precision-biased Parsing and High-Quality Parse Selection,cs.CL," We introduce precision-biased parsing: a parsing task which favors precision -over recall by allowing the parser to abstain from decisions deemed uncertain. -We focus on dependency-parsing and present an ensemble method which is capable -of assigning parents to 84% of the text tokens while being over 96% accurate on -these tokens. We use the precision-biased parsing task to solve the related -high-quality parse-selection task: finding a subset of high-quality (accurate) -trees in a large collection of parsed text. We present a method for choosing -over a third of the input trees while keeping unlabeled dependency parsing -accuracy of 97% on these trees. We also present a method which is not based on -an ensemble but rather on directly predicting the risk associated with -individual parser decisions. In addition to its efficiency, this method -demonstrates that a parsing system can provide reasonable estimates of -confidence in its predictions without relying on ensembles or aggregate corpus -counts. -" -523,1205.5407,Deniz Yuret,"FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely - Lexical Substitutes Based on an N-gram Language Model",cs.CL," Lexical substitutes have found use in areas such as paraphrasing, text -simplification, machine translation, word sense disambiguation, and part of -speech induction. However the computational complexity of accurately -identifying the most likely substitutes for a word has made large scale -experiments difficult. In this paper I introduce a new search algorithm, -FASTSUBS, that is guaranteed to find the K most likely lexical substitutes for -a given word in a sentence based on an n-gram language model. The computation -is sub-linear in both K and the vocabulary size V. An implementation of the -algorithm and a dataset with the top 100 substitutes of each token in the WSJ -section of the Penn Treebank are available at http://goo.gl/jzKH0. -" -524,1205.6396,Murphy Choy,Effective Listings of Function Stop words for Twitter,cs.IR cs.CL," Many words in documents recur very frequently but are essentially meaningless -as they are used to join words together in a sentence. 
It is commonly
-understood that stop words do not contribute to the context or content of
-textual documents. Due to their high frequency of occurrence, their presence
-in text mining presents an obstacle to the understanding of the content in the
-documents. To eliminate the bias effects, most text mining software or
-approaches make use of stop word lists to identify and remove those words.
-However, the development of such stop word lists is difficult and inconsistent
-between textual sources. This problem is further aggravated by sources such as
-Twitter which are highly repetitive or similar in nature. In this paper, we
-will be examining the original work using term frequency, inverse document
-frequency and term adjacency for developing a stop word list for the Twitter
-data source. We propose a new technique using combinatorial values as an
-alternative measure to effectively list out stop words.
-"
-525,1205.6832,"Ga\""elle Lortal (TRT), Brigitte Grau (LIMSI), Michael Zock (LIF)","Syst\`eme d'aide \`a l'acc\`es lexical : trouver le mot qu'on a sur le
- bout de la langue",cs.CL," The study of the Tip of the Tongue phenomenon (TOT) provides valuable clues
-and insights concerning the organisation of the mental lexicon (meaning,
-number of syllables, relation with other words, etc.). This paper describes a
-tool based on psycho-linguistic observations concerning the TOT phenomenon. We
-have built it to enable a speaker/writer to find the word he is looking for, a
-word he may know, but which he is unable to access in time. We try to simulate
-the TOT phenomenon by creating a situation where the system knows the target
-word, yet is unable to access it. In order to find the target word we make use
-of the paradigmatic and syntagmatic associations stored in the linguistic
-databases. Our experiment allows the following conclusion: a tool like
-SVETLAN, capable of structuring a dictionary by domains (automatically), can
-be used successfully to help the speaker/writer find the word he is looking
-for, if it is combined with a database rich in paradigmatic links like
-EuroWordNet.
-"
-526,1206.0042,"Megan Belzner, Sean Colin-Ellerin and Jorge H. Roman",Language Acquisition in Computers,cs.CL," This project explores the nature of language acquisition in computers,
-guided by techniques similar to those used with children. While existing
-natural language processing methods are limited in scope and understanding,
-our system aims to gain an understanding of language from first principles and
-hence minimal initial input. The first portion of our system was implemented
-in Java and is focused on understanding the morphology of language using
-bigrams. We use frequency distributions and differences between them to define
-and distinguish languages. English and French texts were analyzed to determine
-a difference threshold of 55 before the texts are considered to be in
-different languages, and this threshold was verified using Spanish texts. The
-second portion of our system focuses on gaining an understanding of the syntax
-of a language using a recursive method. The program uses one of two possible
-methods to analyze given sentences, based on either sentence patterns or
-surrounding words. Both methods have been implemented in C++. The program is
-able to understand the structure of simple sentences and learn new words. In
-addition, we have provided some suggestions regarding future work and
-potential extensions of the existing program.
-" -527,1206.0377,"Balazs Pinter, Gyula Voros, Zoltan Szabo, Andras Lorincz",Automated Word Puzzle Generation via Topic Dictionaries,cs.CL math.CO," We propose a general method for automated word puzzle generation. Contrary to -previous approaches in this novel field, the presented method does not rely on -highly structured datasets obtained with serious human annotation effort: it -only needs an unstructured and unannotated corpus (i.e., document collection) -as input. The method builds upon two additional pillars: (i) a topic model, -which induces a topic dictionary from the input corpus (examples include e.g., -latent semantic analysis, group-structured dictionaries or latent Dirichlet -allocation), and (ii) a semantic similarity measure of word pairs. Our method -can (i) generate automatically a large number of proper word puzzles of -different types, including the odd one out, choose the related word and -separate the topics puzzle. (ii) It can easily create domain-specific puzzles -by replacing the corpus component. (iii) It is also capable of automatically -generating puzzles with parameterizable levels of difficulty suitable for, -e.g., beginners or intermediate learners. -" -528,1206.0381,"Md. Nawab Yousuf Ali, Shamim Ripon and Shaikh Muhammad Allayear","UNL Based Bangla Natural Text Conversion - Predicate Preserving Parser - Approach",cs.CL," Universal Networking Language (UNL) is a declarative formal language that is -used to represent semantic data extracted from natural language texts. This -paper presents a novel approach to converting Bangla natural language text into -UNL using a method known as Predicate Preserving Parser (PPP) technique. PPP -performs morphological, syntactic and semantic, and lexical analysis of text -synchronously. This analysis produces a semantic-net like structure represented -using UNL. We demonstrate how Bangla texts are analyzed following the PPP -technique to produce UNL documents which can then be translated into any other -suitable natural language facilitating the opportunity to develop a universal -language translation method via UNL. -" -529,1206.1066,"Eunsol Choi, Chenhao Tan, Lillian Lee, Cristian - Danescu-Niculescu-Mizil, and Jennifer Spindel","Hedge detection as a lens on framing in the GMO debates: A position - paper",cs.CL," Understanding the ways in which participants in public discussions frame -their arguments is important in understanding how public opinion is formed. In -this paper, we adopt the position that it is time for more -computationally-oriented research on problems involving framing. In the -interests of furthering that goal, we propose the following specific, -interesting and, we believe, relatively accessible question: In the controversy -regarding the use of genetically-modified organisms (GMOs) in agriculture, do -pro- and anti-GMO articles differ in whether they choose to adopt a -""scientific"" tone? - Prior work on the rhetoric and sociology of science suggests that hedging may -distinguish popular-science text from text written by professional scientists -for their colleagues. We propose a detailed approach to studying whether hedge -detection can be used to understanding scientific framing in the GMO debates, -and provide corpora to facilitate this study. Some of our preliminary analyses -suggest that hedges occur less frequently in scientific discourse than in -popular text, a finding that contradicts prior assertions in the literature. 
We
-hope that our initial work and data will encourage others to pursue this
-promising line of inquiry.
-"
-530,1206.1069,"Diederik Aerts, Liane Gabora and Sandro Sozzo","Concepts and Their Dynamics: A Quantum-Theoretic Modeling of Human
- Thought",cs.AI cs.CL quant-ph," We analyze different aspects of our quantum modeling approach to human
-concepts, and more specifically focus on the quantum effects of contextuality,
-interference, entanglement and emergence, illustrating how each of them makes
-its appearance in specific situations of the dynamics of human concepts and
-their combinations. We point out the relation of our approach, which is based
-on an ontology of a concept as an entity in a state changing under the
-influence of a context, to the main traditional concept theories, i.e.
-prototype theory, exemplar theory and theory theory. We ponder the question of
-why quantum theory performs so well in its modeling of human concepts, and
-shed light on this question by analyzing the role of complex amplitudes,
-showing how they allow us to describe interference in the statistics of
-measurement outcomes, while in the traditional theories the statistics of
-outcomes originates in classical probability weights, without the possibility
-of interference. The relevance of complex numbers, the appearance of
-entanglement, and the role of Fock space in explaining contextual emergence,
-all as unique features of the quantum modeling, are explicitly revealed in
-this paper by analyzing human concepts and their dynamics.
-"
-531,1206.2009,"Asma Boudhief, Mohsen Maraoui, Mounir Zrigui","Developing a model for a text database indexed pedagogically for
- teaching the Arabic language",cs.CL," In this work we design an indexing model for the Arabic language, adapting
-the standards used for describing learning resources (the LOM and their
-application profiles) to learning conditions such as the educational levels of
-students, their levels of understanding... and the pedagogical context, taking
-into account the representative elements of the text, the text's length... In
-particular, we highlight the specificity of the Arabic language, which is a
-complex language characterized by its flexion, its vocalization and its
-agglutination.
-"
-532,1206.2010,Michele Filannino,Temporal expression normalisation in natural language texts,cs.CL cs.IR," Automatic annotation of temporal expressions is a research challenge of
-great interest in the field of information extraction. In this report, I
-describe a novel rule-based architecture, built on top of a pre-existing
-system, which is able to normalise temporal expressions detected in English
-texts. Gold standard temporally-annotated resources are limited in size and
-this makes research difficult. The proposed system outperforms the
-state-of-the-art systems with respect to the TempEval-2 Shared Task (value
-attribute) and achieves substantially better results with respect to the
-pre-existing system on top of which it has been developed. I will also
-introduce a new free corpus consisting of 2822 unique annotated temporal
-expressions. Both the corpus and the system are freely available on-line.
-"
-533,1206.3254,"Amit Gruber, Michal Rosen-Zvi, Yair Weiss",Latent Topic Models for Hypertext,cs.IR cs.CL cs.LG stat.ML," Latent topic models have been successfully applied as an unsupervised topic
-discovery technique in large document collections.
With the proliferation of
-hypertext document collections such as the Internet, there has also been great
-interest in extending these approaches to hypertext [6, 9]. These approaches
-typically model links in an analogous fashion to how they model words - the
-document-link co-occurrence matrix is modeled in the same way that the
-document-word co-occurrence matrix is modeled in standard topic models. In
-this paper we present a probabilistic generative model for hypertext document
-collections that explicitly models the generation of links. Specifically,
-links from a word w to a document d depend directly on how frequent the topic
-of w is in d, in addition to the in-degree of d. We show how to perform EM
-learning on this model efficiently. By not modeling links as analogous to
-words, we end up using far fewer free parameters and obtain better link
-prediction results.
-"
-534,1206.3293,"Peter Thwaites, Jim Q. Smith, Robert G. Cowell",Propagation using Chain Event Graphs,cs.AI cs.CL," A Chain Event Graph (CEG) is a graphical model designed to embody
-conditional independencies in problems whose state spaces are highly
-asymmetric and do not admit a natural product structure. In this paper we
-present a probability propagation algorithm which uses the topology of the CEG
-to build a transporter CEG. Intriguingly, the transporter CEG is directly
-analogous to the triangulated Bayesian Network (BN) in the more conventional
-junction tree propagation algorithms used with BNs. The propagation method
-uses factorization formulae also analogous to (but different from) the ones
-using potentials on cliques and separators of the BN. It appears that the
-methods will typically be more efficient than the BN algorithms when applied
-to contexts where there is significant asymmetry present.
-"
-535,1206.4522,Phil Gooch,"BADREX: In situ expansion and coreference of biomedical abbreviations
- using dynamic regular expressions",cs.CL," BADREX uses dynamically generated regular expressions to annotate term
-definition-term abbreviation pairs, and corefers unpaired acronyms and
-abbreviations back to their initial definition in the text. Against the
-Medstract corpus BADREX achieves precision and recall of 98% and 97%, and
-against a much larger corpus, 90% and 85%, respectively. BADREX yields
-improved performance over previous approaches, requires no training data and
-allows runtime customisation of its input parameters. BADREX is freely
-available from
-https://github.com/philgooch/BADREX-Biomedical-Abbreviation-Expander as a
-plugin for the General Architecture for Text Engineering (GATE) framework and
-is licensed under the GPLv3.
-"
-536,1206.4631,"Edoardo M Airoldi, Jonathan M Bischof","A Poisson convolution model for characterizing topical content with word
- frequency and exclusivity",cs.LG cs.CL cs.IR stat.ME stat.ML," An ongoing challenge in the analysis of document collections is how to
-summarize content in terms of a set of inferred themes that can be interpreted
-substantively in terms of topics. The current practice of parametrizing the
-themes in terms of the most frequent words limits interpretability by ignoring
-the differential use of words across topics. We argue that words that are both
-common and exclusive to a theme are more effective at characterizing topical
-content. We consider a setting where professional editors have annotated
-documents to a collection of topic categories, organized into a tree, in which
-leaf-nodes correspond to the most specific topics.
Each document is annotated to multiple categories, at
-different levels of the tree. We introduce a hierarchical Poisson convolution
-model to analyze annotated documents in this setting. The model leverages the
-structure among categories defined by professional editors to infer a clear
-semantic description for each topic in terms of words that are both frequent
-and exclusive. We carry out a large randomized experiment on Amazon Turk to
-demonstrate that topic summaries based on the FREX score are more
-interpretable than currently established frequency-based summaries, and that
-the proposed model produces more efficient estimates of exclusivity than
-current models. We also develop a parallelized Hamiltonian Monte Carlo sampler
-that allows the inference to scale to millions of documents.
-"
-537,1206.4637,"Paul Prasse (University of Potsdam), Christoph Sawade (University of
- Potsdam), Niels Landwehr (University of Potsdam), Tobias Scheffer (University
- of Potsdam)",Learning to Identify Regular Expressions that Describe Email Campaigns,cs.LG cs.CL stat.ML," This paper addresses the problem of inferring a regular expression from a
-given set of strings that resembles, as closely as possible, the regular
-expression that a human expert would have written to identify the language.
-This is motivated by our goal of automating the task of postmasters of an
-email service who use regular expressions to describe and blacklist email spam
-campaigns. Training data contains batches of messages and corresponding
-regular expressions that an expert postmaster feels confident to blacklist. We
-model this task as a learning problem with structured output spaces and an
-appropriate loss function, derive a decoder and the resulting optimization
-problem, and report on a case study conducted with an email service.
-"
-538,1206.4958,"Peiyou Song, Anhei Shu, Anyu Zhou, Dan Wallach, Jedidiah R. Crandall",A Pointillism Approach for Natural Language Processing of Social Media,cs.IR cs.CL cs.SI," The Chinese language poses challenges for natural language processing based
-on the unit of a word, even for formal uses of the Chinese language; social
-media only makes word segmentation in Chinese even more difficult. In this
-document we propose a pointillism approach to natural language processing.
-Rather than words that have individual meanings, the basic unit of a
-pointillism approach is trigrams of characters. These grams take on meaning in
-aggregate when they appear together in a way that is correlated over time.
- Our results from three kinds of experiments show that when words and topics
-do have a meme-like trend, they can be reconstructed from only trigrams. For
-example, for 4-character idioms that appear at least 99 times in one day in
-our data, the unconstrained precision (that is, precision that allows for
-deviation from a lexicon when the result is just as correct as the lexicon
-version of the word or phrase) is 0.93. For longer words and phrases collected
-from Wiktionary, including neologisms, the unconstrained precision is 0.87. We
-consider these results to be very promising, because they suggest that it is
-feasible for a machine to reconstruct complex idioms, phrases, and neologisms
-with good precision without any notion of words. Thus the colorful and baroque
-uses of language that typify social media in challenging languages such as
-Chinese may in fact be accessible to machines.
-" -539,1206.5333,"Naushad UzZaman, Hector Llorens, James Allen, Leon Derczynski, Marc - Verhagen and James Pustejovsky","TempEval-3: Evaluating Events, Time Expressions, and Temporal Relations",cs.CL," We describe the TempEval-3 task which is currently in preparation for the -SemEval-2013 evaluation exercise. The aim of TempEval is to advance research on -temporal information processing. TempEval-3 follows on from previous TempEval -events, incorporating: a three-part task structure covering event, temporal -expression and temporal relation extraction; a larger dataset; and single -overall task quality scores. -" -540,1206.5384,Tarek El-Shishtawy and Fatma El-Ghannam,Keyphrase Based Arabic Summarizer (KPAS),cs.CL cs.AI," This paper describes a computationally inexpensive and efficient generic -summarization algorithm for Arabic texts. The algorithm belongs to extractive -summarization family, which reduces the problem into representative sentences -identification and extraction sub-problems. Important keyphrases of the -document to be summarized are identified employing combinations of statistical -and linguistic features. The sentence extraction algorithm exploits keyphrases -as the primary attributes to rank a sentence. The present experimental work, -demonstrates different techniques for achieving various summarization goals -including: informative richness, coverage of both main and auxiliary topics, -and keeping redundancy to a minimum. A scoring scheme is then adopted that -balances between these summarization goals. To evaluate the resulted Arabic -summaries with well-established systems, aligned English/Arabic texts are used -through the experiments. -" -541,1206.5851,Daniel Gayo-Avello,"A meta-analysis of state-of-the-art electoral prediction from Twitter - data",cs.SI cs.CL cs.CY physics.soc-ph," Electoral prediction from Twitter data is an appealing research topic. It -seems relatively straightforward and the prevailing view is overly optimistic. -This is problematic because while simple approaches are assumed to be good -enough, core problems are not addressed. Thus, this paper aims to (1) provide a -balanced and critical review of the state of the art; (2) cast light on the -presume predictive power of Twitter data; and (3) depict a roadmap to push -forward the field. Hence, a scheme to characterize Twitter prediction methods -is proposed. It covers every aspect from data collection to performance -evaluation, through data processing and vote inference. Using that scheme, -prior research is analyzed and organized to explain the main approaches taken -up to date but also their weaknesses. This is the first meta-analysis of the -whole body of research regarding electoral prediction from Twitter data. It -reveals that its presumed predictive power regarding electoral prediction has -been rather exaggerated: although social media may provide a glimpse on -electoral outcomes current research does not provide strong evidence to support -it can replace traditional polls. Finally, future lines of research along with -a set of requirements they must fulfill are provided. -" -542,1206.6403,"Paramveer Dhillon (University of Pennsylvania), Jordan Rodu - (University of Pennsylvania), Dean Foster (University of Pennsylvania), Lyle - Ungar (University of Pennsylvania)","Two Step CCA: A new spectral method for estimating vector models of - words",cs.CL cs.LG," Unlabeled data is often used to learn representations which can be used to -supplement baseline features in a supervised learner. 
For example, for text -applications where the words lie in a very high dimensional space (the size of -the vocabulary), one can learn a low rank ""dictionary"" by an -eigen-decomposition of the word co-occurrence matrix (e.g. using PCA or CCA). -In this paper, we present a new spectral method based on CCA to learn an -eigenword dictionary. Our improved procedure computes two set of CCAs, the -first one between the left and right contexts of the given word and the second -one between the projections resulting from this CCA and the word itself. We -prove theoretically that this two-step procedure has lower sample complexity -than the simple single step procedure and also illustrate the empirical -efficacy of our approach and the richness of representations learned by our Two -Step CCA (TSCCA) procedure on the tasks of POS tagging and sentiment -classification. -" -543,1206.6423,"Cynthia Matuszek (University of Washington), Nicholas FitzGerald - (University of Washington), Luke Zettlemoyer (University of Washington), - Liefeng Bo (University of Washington), Dieter Fox (University of Washington)",A Joint Model of Language and Perception for Grounded Attribute Learning,cs.CL cs.LG cs.RO," As robots become more ubiquitous and capable, it becomes ever more important -to enable untrained users to easily interact with them. Recently, this has led -to study of the language grounding problem, where the goal is to extract -representations of the meanings of natural language tied to perception and -actuation in the physical world. In this paper, we present an approach for -joint learning of language and perception models for grounded attribute -induction. Our perception model includes attribute classifiers, for example to -detect object color and shape, and the language model is based on a -probabilistic categorial grammar that enables the construction of rich, -compositional meaning representations. The approach is evaluated on the task of -interpreting sentences that describe sets of objects in a physical workspace. -We demonstrate accurate task performance and effective latent-variable concept -induction in physical grounded scenes. -" -544,1206.6426,"Andriy Mnih (University College London), Yee Whye Teh (University - College London)","A Fast and Simple Algorithm for Training Neural Probabilistic Language - Models",cs.CL cs.LG," In spite of their superior performance, neural probabilistic language models -(NPLMs) remain far less widely used than n-gram models due to their notoriously -long training times, which are measured in weeks even for moderately-sized -datasets. Training NPLMs is computationally expensive because they are -explicitly normalized, which leads to having to consider all words in the -vocabulary when computing the log-likelihood gradients. - We propose a fast and simple algorithm for training NPLMs based on -noise-contrastive estimation, a newly introduced procedure for estimating -unnormalized continuous distributions. We investigate the behaviour of the -algorithm on the Penn Treebank corpus and show that it reduces the training -times by more than an order of magnitude without affecting the quality of the -resulting models. The algorithm is also more efficient and much more stable -than importance sampling because it requires far fewer noise samples to perform -well. 
- We demonstrate the scalability of the proposed approach by training several -neural language models on a 47M-word corpus with a 80K-word vocabulary, -obtaining state-of-the-art results on the Microsoft Research Sentence -Completion Challenge dataset. -" -545,1206.6481,"Yuhong Guo (Temple University), Min Xiao (Temple University)","Cross Language Text Classification via Subspace Co-Regularized - Multi-View Learning",cs.CL cs.IR cs.LG," In many multilingual text classification problems, the documents in different -languages often share the same set of categories. To reduce the labeling cost -of training a classification model for each individual language, it is -important to transfer the label knowledge gained from one language to another -language by conducting cross language classification. In this paper we develop -a novel subspace co-regularized multi-view learning method for cross language -text classification. This method is built on parallel corpora produced by -machine translation. It jointly minimizes the training error of each classifier -in each language while penalizing the distance between the subspace -representations of parallel documents. Our empirical study on a large set of -cross language text classification tasks shows the proposed method consistently -outperforms a number of inductive methods, domain adaptation methods, and -multi-view learning methods. -" -546,1206.6735,"Shay B. Cohen, Carlos G\'omez-Rodr\'iguez, Giorgio Satta",Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing,cs.CL cs.AI," We present a novel technique to remove spurious ambiguity from transition -systems for dependency parsing. Our technique chooses a canonical sequence of -transition operations (computation) for a given dependency tree. Our technique -can be applied to a large class of bottom-up transition systems, including for -instance Nivre (2004) and Attardi (2006). -" -547,1207.0052,Jacob Andreas,The Complexity of Learning Principles and Parameters Grammars,cs.FL cs.CL," We investigate models for learning the class of context-free and -context-sensitive languages (CFLs and CSLs). We begin with a brief discussion -of some early hardness results which show that unrestricted language learning -is impossible, and unrestricted CFL learning is computationally infeasible; we -then briefly survey the literature on algorithms for learning restricted -subclasses of the CFLs. Finally, we introduce a new family of subclasses, the -principled parametric context-free grammars (and a corresponding family of -principled parametric context-sensitive grammars), which roughly model the -""Principles and Parameters"" framework in psycholinguistics. We present three -hardness results: first, that the PPCFGs are not efficiently learnable given -equivalence and membership oracles, second, that the PPCFGs are not efficiently -learnable from positive presentations unless P = NP, and third, that the PPCSGs -are not efficiently learnable from positive presentations unless integer -factorization is in P. -" -548,1207.0245,Noah A. Smith,Adversarial Evaluation for Models of Natural Language,cs.CL," We now have a rich and growing set of modeling tools and algorithms for -inducing linguistic structure from text that is less than fully annotated. In -this paper, we discuss some of the weaknesses of our current methodology. We -present a new abstract framework for evaluating natural language processing -(NLP) models in general and unsupervised NLP models in particular. 
The central -idea is to make explicit certain adversarial roles among researchers, so that -the different roles in an evaluation are more clearly defined and performers of -all roles are offered ways to make measurable contributions to the larger goal. -Adopting this approach may help to characterize model successes and failures by -encouraging earlier consideration of error analysis. The framework can be -instantiated in a variety of ways, simulating some familiar intrinsic and -extrinsic evaluations as well as some new evaluations. -" -549,1207.0396,"Peratham Wiriyathammabhum, Boonserm Kijsirikul, Hiroya Takamura, - Manabu Okumura",Applying Deep Belief Networks to Word Sense Disambiguation,cs.CL cs.LG," In this paper, we applied a novel learning algorithm, namely, Deep Belief -Networks (DBN) to word sense disambiguation (WSD). DBN is a probabilistic -generative model composed of multiple layers of hidden units. DBN uses -Restricted Boltzmann Machine (RBM) to greedily train layer by layer as a -pretraining. Then, a separate fine tuning step is employed to improve the -discriminative power. We compared DBN with various state-of-the-art supervised -learning algorithms in WSD such as Support Vector Machine (SVM), Maximum -Entropy model (MaxEnt), Naive Bayes classifier (NB) and Kernel Principal -Component Analysis (KPCA). We used all words in the given paragraph, -surrounding context words and part-of-speech of surrounding words as our -knowledge sources. We conducted our experiment on the SENSEVAL-2 data set. We -observed that DBN outperformed all other learning algorithms. -" -550,1207.0658,"Eduardo G. Altmann, Giampaolo Cristadoro, and Mirko Degli Esposti",On the origin of long-range correlations in texts,physics.data-an cs.CL physics.soc-ph," The complexity of human interactions with social and natural phenomena is -mirrored in the way we describe our experiences through natural language. In -order to retain and convey such a high dimensional information, the statistical -properties of our linguistic output has to be highly correlated in time. An -example are the robust observations, still largely not understood, of -correlations on arbitrary long scales in literary texts. In this paper we -explain how long-range correlations flow from highly structured linguistic -levels down to the building blocks of a text (words, letters, etc..). By -combining calculations and data analysis we show that correlations take form of -a bursty sequence of events once we approach the semantically relevant topics -of the text. The mechanisms we identify are fairly general and can be equally -applied to other hierarchical settings. -" -551,1207.0742,Marc Dymetman and Guillaume Bouchard and Simon Carter,The OS* Algorithm: a Joint Approach to Exact Optimization and Sampling,cs.AI cs.CL cs.LG," Most current sampling algorithms for high-dimensional distributions are based -on MCMC techniques and are approximate in the sense that they are valid only -asymptotically. Rejection sampling, on the other hand, produces valid samples, -but is unrealistically slow in high-dimension spaces. The OS* algorithm that we -propose is a unified approach to exact optimization and sampling, based on -incremental refinements of a functional upper bound, which combines ideas of -adaptive rejection sampling and of A* optimization search. 
We show that the
-choice of the refinement can be done in a way that ensures tractability in
-high-dimension spaces, and we present first experiments in two different
-settings: inference in high-order HMMs and in large discrete graphical models.
-"
-552,1207.1420,"Luke S. Zettlemoyer, Michael Collins","Learning to Map Sentences to Logical Form: Structured Classification
- with Probabilistic Categorial Grammars",cs.CL," This paper addresses the problem of mapping natural language sentences to
-lambda-calculus encodings of their meaning. We describe a learning algorithm
-that takes as input a training set of sentences labeled with expressions in the
-lambda calculus. The algorithm induces a grammar for the problem, along with a
-log-linear model that represents a distribution over syntactic and semantic
-analyses conditioned on the input sentence. We apply the method to the task of
-learning natural language interfaces to databases and show that the learned
-parsers outperform previous methods in two benchmark database domains.
-"
-553,1207.1847,Ted Dunning,"Finding Structure in Text, Genome and Other Symbolic Sequences",cs.CL cs.IR," The statistical methods derived and described in this thesis provide new ways
-to elucidate the structural properties of text and other symbolic sequences.
-Generically, these methods allow detection of a difference in the frequency of
-a single feature, the detection of a difference between the frequencies of an
-ensemble of features and the attribution of the source of a text. These three
-abstract tasks suffice to solve problems in a wide variety of settings.
-Furthermore, the techniques described in this thesis can be extended to provide
-a wide range of additional tests beyond the ones described here.
- A variety of applications for these methods are examined in detail. These
-applications are drawn from the areas of text analysis and genetic sequence
-analysis. The textually oriented tasks include finding interesting collocations
-and co-occurrent phrases, language identification, and information retrieval.
-The biologically oriented tasks include species identification and the
-discovery of previously unreported long range structure in genes. In the
-applications reported here where direct comparison is possible, the performance
-of these new methods substantially exceeds the state of the art.
- Overall, the methods described here provide new and effective ways to analyse
-text and other symbolic sequences. Their particular strength is that they deal
-well with situations where relatively little data are available. Since these
-methods are abstract in nature, they can be applied in novel situations with
-relative ease.
-"
-554,1207.2265,Daoud Clarke,Challenges for Distributional Compositional Semantics,cs.CL cs.AI," This paper summarises the current state of the art in the study of
-compositionality in distributional semantics, and major challenges for this
-area. We single out generalised quantifiers and intensional semantics as areas
-on which to focus attention for the development of the theory. Once suitable
-theories have been developed, algorithms will be needed to apply the theory to
-tasks. Evaluation is a major problem; we single out application to recognising
-textual entailment and machine translation for this purpose.
-"
-555,1207.2334,Reginald D. Smith,Distinct word length frequencies: distributions and symbol entropies,cs.CL physics.data-an," The distribution of frequency counts of distinct words by length in a
-language's vocabulary will be analyzed using two methods. The first will look
-at the empirical distributions of several languages and derive a distribution
-that reasonably explains the number of distinct words as a function of length.
-We will be able to derive the frequency count, mean word length, and variance
-of word length based on the marginal probability of letters and spaces. The
-second, based on information theory, will demonstrate that the conditional
-entropies can also be used to estimate the frequency of distinct words of a
-given length in a language. In addition, it will be shown how these techniques
-can also be applied to estimate higher order entropies using vocabulary word
-length.
-"
-556,1207.2714,"Mohamed Achraf Ben Mohamed, Mounir Zrigui and Mohsen Maraoui",Clustering based approach extracting collocations,cs.CL," The following study presents a collocation extraction approach based on a
-clustering technique. This study uses a combination of several classical
-measures which cover all aspects of a given corpus, then suggests separating
-the bigrams found in the corpus into several disjoint groups according to the
-probability of the presence of collocations. This will allow excluding groups
-where the presence of collocations is very unlikely, thus reducing the search
-space in a meaningful way.
-"
-557,1207.3169,"Stuart Semple, Minna J. Hsu, Govindasamy Agoramoorthy, Ramon
- Ferrer-i-Cancho","The law of brevity in macaque vocal communication is not an artifact of
- analyzing mean call durations",q-bio.NC cs.CL physics.data-an," Words follow the law of brevity, i.e. more frequent words tend to be shorter.
-From a statistical point of view, this qualitative definition of the law states
-that word length and word frequency are negatively correlated. Here the recent
-finding of patterning consistent with the law of brevity in Formosan macaque
-vocal communication (Semple et al., 2010) is revisited. It is shown that the
-negative correlation between mean duration and frequency of use in the
-vocalizations of Formosan macaques is not an artifact of the use of a mean
-duration for each call type instead of the customary 'word' length of studies
-of the law in human language. The key point demonstrated is that the total
-duration of calls of a particular type increases with the number of calls of
-that type. The finding of the law of brevity in the vocalizations of these
-macaques therefore defies a trivial explanation.
-"
-558,1207.3932,"Kishorjit Nongmeikapam, Vidya Raj RK, Oinam Imocha Singh and Sivaji
- Bandyopadhyay",Automatic Segmentation of Manipuri (Meiteilon) Word into Syllabic Units,cs.CL," The work of automatic segmentation of a Manipuri language (or Meiteilon) word
-into syllabic units is demonstrated in this paper. This language is a scheduled
-Indian language of Tibeto-Burman origin, which is also a very highly
-agglutinative language. This language uses two scripts: the Bengali script and
-Meitei Mayek. The present work is based on the second script. An algorithm is
-designed to identify mainly the syllables of words of Manipuri origin. The
-result of the algorithm shows a Recall of 74.77, a Precision of 91.21 and an
-F-Score of 82.18, which is a reasonable score for a first attempt of this kind
-for this language.
-" -559,1207.4307,"Artur Ventura, Nuno Diegues, David Martins de Matos",Frame Interpretation and Validation in a Open Domain Dialogue System,cs.CL cs.RO," Our goal in this paper is to establish a means for a dialogue platform to be -able to cope with open domains considering the possible interaction between the -embodied agent and humans. To this end we present an algorithm capable of -processing natural language utterances and validate them against knowledge -structures of an intelligent agent's mind. Our algorithm leverages dialogue -techniques in order to solve ambiguities and acquire knowledge about unknown -entities. -" -560,1207.4625,E. Laporte (LIGM),Appropriate Nouns with Obligatory Modifiers,cs.CL," The notion of appropriate sequence as introduced by Z. Harris provides a -powerful syntactic way of analysing the detailed meaning of various sentences, -including ambiguous ones. In an adjectival sentence like 'The leather was -yellow', the introduction of an appropriate noun, here 'colour', specifies -which quality the adjective describes. In some other adjectival sentences with -an appropriate noun, that noun plays the same part as 'colour' and seems to be -relevant to the description of the adjective. These appropriate nouns can -usually be used in elementary sentences like 'The leather had some colour', but -in many cases they have a more or less obligatory modifier. For example, you -can hardly mention that an object has a colour without qualifying that colour -at all. About 300 French nouns are appropriate in at least one adjectival -sentence and have an obligatory modifier. They enter in a number of sentence -structures related by several syntactic transformations. The appropriateness of -the noun and the fact that the modifier is obligatory are reflected in these -transformations. The description of these syntactic phenomena provides a basis -for a classification of these nouns. It also concerns the lexical properties of -thousands of predicative adjectives, and in particular the relations between -the sentence without the noun : 'The leather was yellow' and the adjectival -sentence with the noun : 'The colour of the leather was yellow'. -" -561,1207.5328,"Kais Haddar (MIRACL), H\'ela Fehri (MIRACL), Laurent Romary (IDSL, - INRIA Saclay - Ile de France)",A prototype for projecting HPSG syntactic lexica towards LMF,cs.CL," The comparative evaluation of Arabic HPSG grammar lexica requires a deep -study of their linguistic coverage. The complexity of this task results mainly -from the heterogeneity of the descriptive components within those lexica -(underlying linguistic resources and different data categories, for example). -It is therefore essential to define more homogeneous representations, which in -turn will enable us to compare them and eventually merge them. In this context, -we present a method for comparing HPSG lexica based on a rule system. This -method is implemented within a prototype for the projection from Arabic HPSG to -a normalised pivot language compliant with LMF (ISO 24613 - Lexical Markup -Framework) and serialised using a TEI (Text Encoding Initiative) based -representation. The design of this system is based on an initial study of the -HPSG formalism looking at its adequacy for the representation of Arabic, and -from this, we identify the appropriate feature structures corresponding to each -Arabic lexical category and their possible LMF counterparts. 
-" -562,1207.5409,"Deepak Kumar, Manjeet Singh, Seema Shukla",FST Based Morphological Analyzer for Hindi Language,cs.CL cs.IR," Hindi being a highly inflectional language, FST (Finite State Transducer) -based approach is most efficient for developing a morphological analyzer for -this language. The work presented in this paper uses the SFST (Stuttgart Finite -State Transducer) tool for generating the FST. A lexicon of root words is -created. Rules are then added for generating inflectional and derivational -words from these root words. The Morph Analyzer developed was used in a Part Of -Speech (POS) Tagger based on Stanford POS Tagger. The system was first trained -using a manually tagged corpus and MAXENT (Maximum Entropy) approach of -Stanford POS tagger was then used for tagging input sentences. The -morphological analyzer gives approximately 97% correct results. POS tagger -gives an accuracy of approximately 87% for the sentences that have the words -known to the trained model file, and 80% accuracy for the sentences that have -the words unknown to the trained model file. -" -563,1208.0200,"Asma Boudhief, Mohsen Maraoui and Mounir Zrigui","Adaptation of pedagogical resources description standard (LOM) with the - specificity of Arabic language",cs.CL," In this article we focus firstly on the principle of pedagogical indexing and -characteristics of Arabic language and secondly on the possibility of adapting -the standard for describing learning resources used (the LOM and its -Application Profiles) with learning conditions such as the educational levels -of students and their levels of understanding,... the educational context with -taking into account the representative elements of text, text length, ... in -particular, we put in relief the specificity of the Arabic language which is a -complex language, characterized by its flexion, its voyellation and -agglutination. -" -564,1208.2777,Hyonil Kim and Changil Choe,"A Method for Selecting Noun Sense using Co-occurrence Relation in - English-Korean Translation",cs.CL," The sense analysis is still critical problem in machine translation system, -especially such as English-Korean translation which the syntactical different -between source and target languages is very great. We suggest a method for -selecting the noun sense using contextual feature in English-Korean -Translation. -" -565,1208.2873,Vasileios Lampos,"Detecting Events and Patterns in Large-Scale User Generated Textual - Streams with Statistical Learning Methods",cs.LG cs.CL cs.IR cs.SI stat.AP stat.ML," A vast amount of textual web streams is influenced by events or phenomena -emerging in the real world. The social web forms an excellent modern paradigm, -where unstructured user generated content is published on a regular basis and -in most occasions is freely distributed. The present Ph.D. Thesis deals with -the problem of inferring information - or patterns in general - about events -emerging in real life based on the contents of this textual stream. We show -that it is possible to extract valuable information about social phenomena, -such as an epidemic or even rainfall rates, by automatic analysis of the -content published in Social Media, and in particular Twitter, using Statistical -Machine Learning methods. 
An important intermediate task regards the formation
-and identification of features which characterise a target event; we select and
-use those textual features in several linear, non-linear and hybrid inference
-approaches, achieving significantly good performance in terms of the applied
-loss function. By examining this rich data set further, we also propose methods
-for extracting various types of mood signals revealing how affective norms - at
-least within the social web's population - evolve during the day and how
-significant events emerging in the real world are influencing them. Lastly, we
-present some preliminary findings showing several spatiotemporal
-characteristics of this textual information as well as the potential of using
-it to tackle tasks such as the prediction of voting intentions.
-"
-566,1208.3001,"Zhili Chen, Liusheng Huang, Wei Yang, Peng Meng, and Haibo Miao","More than Word Frequencies: Authorship Attribution via Natural Frequency
- Zoned Word Distribution Analysis",cs.CL," With the increasing popularity and availability of digital text data, the
-authorship of digital texts cannot be taken for granted due to the ease of
-copying and parsing. This paper presents a new text style analysis called
-natural frequency zoned word distribution analysis (NFZ-WDA), and then a basic
-authorship attribution scheme and an open authorship attribution scheme for
-digital texts based on the analysis. NFZ-WDA is based on the observation that
-all authors leave distinct intrinsic word usage traces on texts written by them
-and that these intrinsic styles can be identified and employed to analyze the
-authorship. The intrinsic word usage styles can be estimated through the
-analysis of word distribution within a text, which goes beyond normal word
-frequency analysis and can be expressed as: which groups of words are used in
-the text; how frequently each group of words occurs; and how the occurrences of
-each group of words are distributed in the text. Next, the basic authorship
-attribution scheme and the open authorship attribution scheme provide solutions
-for both closed and open authorship attribution problems. Through analysis and
-extensive experimental studies, this paper demonstrates the efficiency of the
-proposed method for authorship attribution.
-"
-567,1208.3047,Arif Nurwidyantoro and Edi Winarko,"Parallelization of Maximum Entropy POS Tagging for Bahasa Indonesia with
- MapReduce",cs.DC cs.CL," In this paper, the MapReduce programming model is used to parallelize the
-training and tagging processes in Maximum Entropy part-of-speech tagging for
-Bahasa Indonesia. In the training process, the MapReduce model is implemented
-for dictionary, tagtoken, and feature creation. In the tagging process,
-MapReduce is implemented to tag the lines of a document in parallel. The
-training experiments showed that the total training time using MapReduce is
-faster, but the time spent reading its results inside the process slows down
-the total training time. The tagging experiments using different numbers of map
-and reduce processes showed that the MapReduce implementation could speed up
-the tagging process. The fastest tagging result was achieved by the tagging
-process using a 1,000,000-word corpus and 30 map processes.
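A minimal sketch of the tagging-side parallelism described in record 567 above: lines of a document are tagged independently, so they can be mapped across worker processes. Python's multiprocessing is used here as a stand-in for Hadoop MapReduce, and the toy dictionary tagger `tag_line` is a hypothetical placeholder for the authors' Maximum Entropy tagger.

```python
# Map step of MapReduce-style tagging: each line is tagged independently,
# so lines can be distributed across workers. Illustrative sketch only.
from multiprocessing import Pool

LEXICON = {"saya": "PRP", "makan": "VB", "nasi": "NN"}  # toy Indonesian lexicon

def tag_line(line):
    # Tag one line; unknown tokens fall back to a default tag.
    return [(tok, LEXICON.get(tok, "NN")) for tok in line.split()]

if __name__ == "__main__":
    lines = ["saya makan nasi", "saya makan"]
    with Pool(processes=4) as pool:  # cf. the paper's 30 map processes
        tagged = pool.map(tag_line, lines)
    print(tagged)
```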
-" -568,1208.3530,"Haimonti Dutta, William Chan, Deepak Shankargouda, Manoj Pooleery, - Axinia Radeva, Kyle Rego, Boyi Xie, Rebecca Passonneau, Austin Lee and - Barbara Taranto","Leveraging Subjective Human Annotation for Clustering Historic Newspaper - Articles",cs.IR cs.CL cs.DL," The New York Public Library is participating in the Chronicling America -initiative to develop an online searchable database of historically significant -newspaper articles. Microfilm copies of the newspapers are scanned and high -resolution Optical Character Recognition (OCR) software is run on them. The -text from the OCR provides a wealth of data and opinion for researchers and -historians. However, categorization of articles provided by the OCR engine is -rudimentary and a large number of the articles are labeled editorial without -further grouping. Manually sorting articles into fine-grained categories is -time consuming if not impossible given the size of the corpus. This paper -studies techniques for automatic categorization of newspaper articles so as to -enhance search and retrieval on the archive. We explore unsupervised (e.g. -KMeans) and semi-supervised (e.g. constrained clustering) learning algorithms -to develop article categorization schemes geared towards the needs of -end-users. A pilot study was designed to understand whether there was unanimous -agreement amongst patrons regarding how articles can be categorized. It was -found that the task was very subjective and consequently automated algorithms -that could deal with subjective labels were used. While the small scale pilot -study was extremely helpful in designing machine learning algorithms, a much -larger system needs to be developed to collect annotations from users of the -archive. The ""BODHI"" system currently being developed is a step in that -direction, allowing users to correct wrongly scanned OCR and providing keywords -and tags for newspaper articles used frequently. On successful implementation -of the beta version of this system, we hope that it can be integrated with -existing software being developed for the Chronicling America project. -" -569,1208.4079,Nishal Pradeepkumar Shah,"Recent Technological Advances in Natural Language Processing and - Artificial Intelligence",cs.CL," A recent advance in computer technology has permitted scientists to implement -and test algorithms that were known from quite some time (or not) but which -were computationally expensive. Two such projects are IBM's Jeopardy as a part -of its DeepQA project [1] and Wolfram's Wolframalpha[2]. Both these methods -implement natural language processing (another goal of AI scientists) and try -to answer questions as asked by the user. Though the goal of the two projects -is similar, both of them have a different procedure at it's core. In the -following sections, the mechanism and history of IBM's Jeopardy and Wolfram -alpha has been explained followed by the implications of these projects in -realizing Ray Kurzweil's [3] dream of passing the Turing test by 2029. A recipe -of taking the above projects to a new level is also explained. -" -570,1208.4503,Gueddah Hicham,Introduction of the weight edition errors in the Levenshtein distance,cs.CL," In this paper, we present a new approach dedicated to correcting the spelling -errors of the Arabic language. This approach corrects typographical errors like -inserting, deleting, and permutation. Our method is inspired from the -Levenshtein algorithm, and allows a finer and better scheduling than -Levenshtein. 
The results obtained are very satisfactory and encouraging, which
-demonstrates the merit of our new approach.
-"
-571,1208.6109,"Vladimir V. Bochkarev, Anna V. Shevlyakova, Valery D. Solovyev",Average word length dynamics as indicator of cultural changes in society,cs.CL," The dynamics of average word length in Russian and English are analysed in
-this article. Words belonging to the diachronic text corpus Google Books Ngram
-and dating back to the last two centuries are studied. It was found that
-average word length slightly increased in the 19th century, grew rapidly during
-most of the 20th century, and started decreasing over the period from the end
-of the 20th to the beginning of the 21st century. Words which contributed most
-to the increase or decrease of average word length were identified; content
-words and functional words are analysed separately. Long content words
-contribute most to average word length. As shown, these words reflect the main
-tendencies of social development and are thus used frequently. Changes in the
-frequency of personal pronouns also contribute significantly to changes in
-average word length. Other parameters connected with average word length were
-also analysed.
-"
-572,1208.6268,Tanmoy Chakraborty,Authorship Identification in Bengali Literature: a Comparative Analysis,cs.CL cs.IR," Stylometry is the study of the unique linguistic styles and writing behaviors
-of individuals. It belongs to the core tasks of text categorization, such as
-authorship identification, plagiarism detection, etc. Though a reasonable
-number of studies have been conducted in English, no major work has been done
-so far in Bengali. In this work, we present a demonstration of authorship
-identification for documents written in Bengali. We adopt a set of fine-grained
-stylistic features for the analysis of the text and use them to develop two
-different models: a statistical similarity model consisting of three measures
-and their combination, and a machine learning model with Decision Tree, Neural
-Network and SVM. Experimental results show that SVM outperforms the other
-state-of-the-art methods under 10-fold cross-validation. We also validate the
-relative importance of each stylistic feature to show that some of them remain
-consistently significant in every model used in this experiment.
-"
-573,1209.0249,"M. A. El-Dosuky, M. Z. Rashad, T. T. Hamza, A. H. EL-Bassiouny","Robopinion: Opinion Mining Framework Inspired by Autonomous Robot
- Navigation",cs.CL cs.IR," Data association methods are used by autonomous robots to find matches
-between the current landmarks and the new set of observed features. We seek a
-framework for opinion mining that benefits from advancements in autonomous
-robot navigation, in both research and development.
-"
-574,1209.1300,"Nisheeth Joshi, Iti Mathur",Input Scheme for Hindi Using Phonetic Mapping,cs.CL," Written communication on computers requires knowledge of how to write text
-for the desired language using a computer. Most people do not use any language
-besides English on computers. This creates a barrier. To resolve this issue we
-have developed a scheme to input text in Hindi using a phonetic mapping scheme.
-Using this scheme we generate intermediate code strings and match them with
-pronunciations of the input text. Our system shows significant success over
-other input systems available.
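The weighted edit distance of record 570 above can be sketched directly: the standard Levenshtein operations, plus a cheaper cost for the permutation (transposition) of adjacent letters. The particular weights and the Damerau-style transposition rule are illustrative assumptions, not the paper's exact scheme.

```python
# Levenshtein distance with per-operation weights; a permutation of two
# adjacent letters is charged w_swap instead of two full edits.
def weighted_levenshtein(s, t, w_ins=1.0, w_del=1.0, w_sub=1.0, w_swap=0.5):
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * w_del
    for j in range(1, m + 1):
        d[0][j] = j * w_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,      # deletion
                          d[i][j - 1] + w_ins,      # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
            # adjacent transposition (permutation of two letters)
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + w_swap)
    return d[n][m]

print(weighted_levenshtein("chat", "caht"))  # 0.5: one cheap transposition
```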
-" -575,1209.1301,"Nisheeth Joshi, Iti Mathur",Evaluation of Computational Grammar Formalisms for Indian Languages,cs.CL," Natural Language Parsing has been the most prominent research area since the -genesis of Natural Language Processing. Probabilistic Parsers are being -developed to make the process of parser development much easier, accurate and -fast. In Indian context, identification of which Computational Grammar -Formalism is to be used is still a question which needs to be answered. In this -paper we focus on this problem and try to analyze different formalisms for -Indian languages. -" -576,1209.1751,Ramon Ferrer-i-Cancho and Ferm\'in Moscoso del Prado Mart\'in,Information content versus word length in random typing,physics.data-an cond-mat.stat-mech cs.CL," Recently, it has been claimed that a linear relationship between a measure of -information content and word length is expected from word length optimization -and it has been shown that this linearity is supported by a strong correlation -between information content and word length in many languages (Piantadosi et -al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections -between this measure and standard information theory. The relationship between -the measure and word length is studied for the popular random typing process -where a text is constructed by pressing keys at random from a keyboard -containing letters and a space behaving as a word delimiter. Although this -random process does not optimize word lengths according to information content, -it exhibits a linear relationship between information content and word length. -The exact slope and intercept are presented for three major variants of the -random typing process. A strong correlation between information content and -word length can simply arise from the units making a word (e.g., letters) and -not necessarily from the interplay between a word and its context as proposed -by Piantadosi et al. In itself, the linear relation does not entail the results -of any optimization process. -" -577,1209.2163,"Alexandre Delano\""e and Serge Galam","Modeling controversies in the press: the case of the abnormal bees' - death",physics.soc-ph cs.CL," The controversy about the cause(s) of abnormal death of bee colonies in -France is investigated through an extensive analysis of the french speaking -press. A statistical analysis of textual data is first performed on the lexicon -used by journalists to describe the facts and to present associated -informations during the period 1998-2010. Three states are identified to -explain the phenomenon. The first state asserts a unique cause, the second one -focuses on multifactor causes and the third one states the absence of current -proof. Assigning each article to one of the three states, we are able to follow -the associated opinion dynamics among the journalists over 13 years. Then, we -apply the Galam sequential probabilistic model of opinion dynamic to those -data. Assuming journalists are either open mind or inflexible about their -respective opinions, the results are reproduced precisely provided we account -for a series of annual changes in the proportions of respective inflexibles. -The results shed a new counter intuitive light on the various pressure supposed -to apply on the journalists by either chemical industries or beekeepers and -experts or politicians. 
The obtained dynamics of respective inflexibles shows
-the possible effect of lobbying, the inertia of the debate and the net
-advantage gained by the first whistleblowers.
-"
-578,1209.2341,"A.R. Balamurali, Subhabrata Mukherjee, Akshat Malu, Pushpak
- Bhattacharyya",Leveraging Sentiment to Compute Word Similarity,cs.IR cs.CL," In this paper, we introduce a new WordNet-based similarity metric, SenSim,
-which incorporates the sentiment content (i.e., degree of positive or negative
-sentiment) of the words being compared to measure the similarity between them.
-The proposed metric is based on the hypothesis that knowing the sentiment is
-beneficial in measuring the similarity. To verify this hypothesis, we measure
-and compare the annotator agreement for 2 annotation strategies: 1) sentiment
-information of a pair of words is considered while annotating and 2) sentiment
-information of a pair of words is not considered while annotating.
-Inter-annotator correlation scores show that the agreement is better when the
-two annotators consider sentiment information while assigning a similarity
-score to a pair of words. We use this hypothesis to measure the similarity
-between a pair of words. Specifically, we represent each word as a vector
-containing sentiment scores of all the content words in the WordNet gloss of
-the sense of that word. These sentiment scores are derived from a sentiment
-lexicon. We then measure the cosine similarity between the two vectors. We
-perform both intrinsic and extrinsic evaluation of SenSim and compare the
-performance with other widely used WordNet similarity metrics.
-"
-579,1209.2352,"Subhabrata Mukherjee, Pushpak Bhattacharyya",Feature Specific Sentiment Analysis for Product Reviews,cs.IR cs.CL," In this paper, we present a novel approach to identify feature-specific
-expressions of opinion in product reviews with different features and mixed
-emotions. The objective is realized by identifying a set of potential features
-in the review and extracting opinion expressions about those features by
-exploiting their associations. Capitalizing on the view that more closely
-associated words come together to express an opinion about a certain feature,
-dependency parsing is used to identify relations between the opinion
-expressions. The system learns the set of significant relations to be used by
-dependency parsing and a threshold parameter which allows us to merge closely
-associated opinion expressions. The data requirement is minimal as this is a
-one-time learning of the domain-independent parameters. The associations are
-represented in the form of a graph which is partitioned to finally retrieve the
-opinion expression describing the user-specified feature. We show that the
-system achieves a high accuracy across all domains and performs on par with
-state-of-the-art systems despite its data limitations.
-"
-580,1209.2400,"Estelle Delpech (LINA), B\'eatrice Daille (LINA), Emmanuel Morin
- (LINA), Claire Lemaire","Identification of Fertile Translations in Medical Comparable Corpora: a
- Morpho-Compositional Approach",cs.CL," This paper defines a method for lexicon extraction in the biomedical domain
-from comparable corpora. The method is based on compositional translation and
-exploits morpheme-level translation equivalences. It can generate translations
-for a large variety of morphologically constructed words and can also generate
-'fertile' translations.
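The SenSim construction in record 578 above can be sketched as follows: each word is represented by a vector of sentiment scores of the content words in its gloss, and words are compared by cosine similarity. The tiny gloss table and sentiment lexicon below are toy stand-ins for WordNet and a real sentiment lexicon.

```python
# Toy SenSim-style similarity: gloss words carry sentiment scores, each
# word becomes a vector over the lexicon, similarity is the cosine.
import math

SENTIMENT = {"pleasing": 0.8, "good": 0.7, "harm": -0.6, "injury": -0.5}
GLOSS = {
    "nice": ["pleasing", "good"],
    "fine": ["good", "pleasing"],
    "hurt": ["harm", "injury"],
}
VOCAB = sorted(SENTIMENT)

def sense_vector(word):
    # One dimension per lexicon entry; the value is the sentiment score
    # if that content word appears in the gloss, else 0.
    gloss = set(GLOSS.get(word, []))
    return [SENTIMENT[w] if w in gloss else 0.0 for w in VOCAB]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(sense_vector("nice"), sense_vector("fine")))  # high (1.0)
print(cosine(sense_vector("nice"), sense_vector("hurt")))  # 0.0
```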
We show that fertile translations increase the overall
-quality of the extracted lexicon for English to French translation.
-"
-581,1209.2493,"Subhabrata Mukherjee, Pushpak Bhattacharyya","WikiSent : Weakly Supervised Sentiment Analysis Through Extractive
- Summarization With Wikipedia",cs.IR cs.CL," This paper describes a weakly supervised system for sentiment analysis in the
-movie review domain. The objective is to classify a movie review into a
-polarity class, positive or negative, based on those sentences bearing opinion
-on the movie alone. The irrelevant text, not directly related to the reviewer's
-opinion on the movie, is left out of the analysis. Wikipedia incorporates the
-world knowledge of movie-specific features into the system, which is used to
-obtain an extractive summary of the review, consisting of the reviewer's
-opinions about the specific aspects of the movie. This filters out the concepts
-which are irrelevant or objective with respect to the given movie. The proposed
-system, WikiSent, does not require any labeled data for training. The only weak
-supervision arises out of the usage of resources like WordNet, a Part-of-Speech
-Tagger and Sentiment Lexicons by virtue of their construction. WikiSent
-achieves a considerable accuracy improvement over the baseline and has a better
-or comparable accuracy to the existing semi-supervised and unsupervised systems
-in the domain, on the same dataset. We also perform a general movie review
-trend analysis using WikiSent to find the trend in movie-making and the public
-acceptance in terms of movie genre, year of release and polarity.
-"
-582,1209.2495,"Subhabrata Mukherjee, Akshat Malu, A.R. Balamurali, Pushpak
- Bhattacharyya",TwiSent: A Multistage System for Analyzing Sentiment in Twitter,cs.IR cs.CL," In this paper, we present TwiSent, a sentiment analysis system for Twitter.
-Based on the topic searched, TwiSent collects tweets pertaining to it and
-categorizes them into the different polarity classes positive, negative and
-objective. However, analyzing micro-blog posts poses many inherent challenges
-compared to other text genres. Through TwiSent, we address the problems of
-1) spam pertaining to sentiment analysis in Twitter, 2) structural anomalies
-in the text in the form of incorrect spellings, nonstandard abbreviations,
-slang etc., 3) entity specificity in the context of the topic searched and 4)
-pragmatics embedded in text. The system performance is evaluated on manually
-annotated gold standard data and on an automatically annotated tweet set based
-on hashtags. It is a common practice to show the efficacy of a supervised
-system on an automatically annotated dataset. However, we show that such a
-system achieves lower classification accuracy when tested on a generic Twitter
-dataset. We also show that our system performs much better than an existing
-system.
-"
-583,1209.3126,Juan-Manuel Torres-Moreno,"Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic
- Text Summarization",cs.IR cs.CL," In Automatic Text Summarization, preprocessing is an important phase to
-reduce the space of textual representation. Classically, stemming and
-lemmatization have been widely used for normalizing words. However, even using
-normalization on large texts, the curse of dimensionality can disturb the
-performance of summarizers. This paper describes a new method for normalization
-of words to further reduce the space of representation.
We propose to reduce
-each word to its initial letters, as a form of Ultra-stemming. The results show
-that Ultra-stemming not only preserves the content of summaries produced with
-this representation, but often dramatically improves the performance of the
-systems. Summaries on trilingual corpora were evaluated automatically with
-Fresa. The results confirm an increase in performance, regardless of the
-summarizer system used.
-"
-584,1209.4277,"Elisa Omodei, Thierry Poibeau, Jean-Philippe Cointet",Multi-Level Modeling of Quotation Families Morphogenesis,cs.CY cs.CL cs.SI physics.soc-ph," This paper investigates cultural dynamics in social media by examining the
-proliferation and diversification of clearly-cut pieces of content: quoted
-texts. In line with the pioneering work of Leskovec et al. and Simmons et al.
-on meme dynamics, we investigate in depth the transformations that quotations
-published online undergo during their diffusion. We deliberately put aside the
-structure of the social network as well as the dynamical patterns pertaining to
-the diffusion process to focus on the way quotations are changed, how often
-they are modified and how these changes shape more or less diverse families and
-sub-families of quotations. Following a biological metaphor, we try to
-understand in what way mutations can transform quotations at different scales
-and how mutation rates depend on various properties of the quotations.
-"
-585,1209.4471,Nikola Milo\v{s}evi\'c,Stemmer for Serbian language,cs.CL cs.IR," In linguistic morphology and information retrieval, stemming is the process
-of reducing inflected (or sometimes derived) words to their stem, base or root
-form; generally a written word form. In this work, a suffix-stripping stemmer
-for Serbian, a highly inflectional language, is presented.
-"
-586,1209.6238,Kevin Mote,Natural Language Processing - A Survey,cs.CL," The utility and power of Natural Language Processing (NLP) seem destined to
-change our technological society in profound and fundamental ways. However
-there are, to date, few accessible descriptions of the science of NLP that have
-been written for a popular audience, or even for an audience of intelligent,
-but uninitiated scientists. This paper aims to provide just such an overview.
-In short, the objective of this article is to describe the purpose, procedures
-and practical applications of NLP in a clear, balanced, and readable way. We
-will examine the most recent literature describing the methods and processes of
-NLP, analyze some of the challenges that researchers are faced with, and
-briefly survey some of the current and future applications of this science to
-IT research in general.
-"
-587,1210.0252,"Fethi Fkih, Mohamed Nazih Omri and Imen Toumia","A Linguistic Model for Terminology Extraction based on Conditional Random
- Fields",cs.CL cs.AI," In this paper, we show the possibility of using a linear Conditional Random
-Field (CRF) for terminology extraction from a specialized text corpus.
-"
-588,1210.0794,"Nahla Jlaiel, Khouloud Madhbouh, Mohamed Ben Ahmed","A Semantic Approach for Automatic Structuring and Analysis of Software
- Process Patterns",cs.AI cs.CL," The main contribution of this paper is to propose a novel semantic approach
-based on a Natural Language Processing technique in order to ensure a semantic
-unification of unstructured process patterns which are expressed not only in
-different formats but also in different forms.
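The Ultra-stemming normalisation of record 583 above is simple enough to state in code: every word is truncated to its first n letters before the vector-space representation is built. The function below is a minimal sketch; n=1 is the extreme case the abstract describes.

```python
# Ultra-stemming: normalise each word to its first n letters before
# building the textual representation used by a summarizer.
def ultra_stem(text, n=1):
    return [w[:n] for w in text.lower().split()]

print(ultra_stem("Stemming and lemmatization reduce words"))
# ['s', 'a', 'l', 'r', 'w']
```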
This approach is implemented
-using the GATE text engineering framework and then evaluated, yielding
-high-quality results that motivate us to continue in this direction.
-"
-589,1210.0848,"Son Doan, Lucila Ohno-Machado, Nigel Collier","Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example
- in Tracking Influenza-Like Illnesses",cs.SI cs.CL physics.soc-ph," Systems that exploit publicly available user generated content such as
-Twitter messages have been successful in tracking seasonal influenza. We
-developed a novel filtering method for Influenza-Like-Illnesses (ILI)-related
-messages using 587 million messages from Twitter micro-blogs. We first filtered
-messages based on syndrome keywords from the BioCaster Ontology, an extant
-knowledge model of laymen's terms. We then filtered the messages according to
-semantic features such as negation, hashtags, emoticons, humor and geography.
-The data covered 36 weeks for the US 2009 influenza season from 30th August
-2009 to 8th May 2010. Results showed that our system achieved the highest
-Pearson correlation coefficient of 98.46% (p-value<2.2e-16), an improvement of
-3.98% over the previous state-of-the-art method. The results indicate that
-simple NLP-based enhancements to existing approaches to mine Twitter data can
-increase the value of this inexpensive resource.
-"
-590,1210.0852,"Winfried G\""odert",Detecting multiword phrases in mathematical text corpora,cs.CL cs.IR," We present an approach for detecting multiword phrases in mathematical text
-corpora. The method used is based on characteristic features of mathematical
-terminology. It makes use of a software tool named Lingo which allows words to
-be identified by means of previously defined dictionaries for specific word
-classes such as adjectives, personal names or nouns. The detection of multiword
-groups is done algorithmically. Possible advantages of the method for indexing
-and information retrieval, and conclusions for applying dictionary-based
-methods of automatic indexing instead of stemming procedures, are discussed.
-"
-591,1210.3312,Juan-Manuel Torres-Moreno,Artex is AnotheR TEXt summarizer,cs.IR cs.AI cs.CL," This paper describes Artex, another algorithm for Automatic Text
-Summarization. In order to rank sentences, a simple inner product is calculated
-between each sentence, a document vector (text topic) and a lexical vector
-(vocabulary used by a sentence). Summaries are then generated by assembling the
-highest ranked sentences. No rule-based linguistic post-processing is necessary
-in order to obtain summaries. Tests over several datasets (coming from Document
-Understanding Conferences (DUC), Text Analysis Conferences (TAC), evaluation
-campaigns, etc.) in French, English and Spanish have shown that the summarizer
-achieves interesting results.
-"
-592,1210.3634,Robert Wahlstedt,Quick Summary,cs.CL cs.AI," Quick Summary is an innovative implementation of an automatic document
-summarizer that inputs a document in the English language and evaluates each
-sentence. The scanner or evaluator determines criteria based on its grammatical
-structure and place in the paragraph. The program then asks the user to specify
-the number of sentences the person wishes to highlight. For example, should the
-user ask to have three of the most important sentences, it would highlight the
-first and most important sentence in green. Commonly this is the sentence
-containing the conclusion.
Then Quick Summary finds the second most important
-sentence, usually called a satellite, and highlights it in yellow. This is
-usually the topic sentence. Then the program finds the third most important
-sentence and highlights it in red. The implementations of this technology are
-useful in a society of information overload, where a person typically receives
-42 emails a day (Microsoft). The paper is also a candid look at the difficulty
-that machine learning has with textual translation. However, it speaks to how
-to overcome the obstacles that historically prevented progress. This paper
-proposes mathematical meta-data criteria that justify the place of importance
-of a sentence. Just as tools for the study of relational symmetry do in
-bio-informatics, this tool seeks to classify words with greater clarity.
-""Survey Finds Workers Average Only Three Productive Days per Week."" Microsoft
-News Center. Microsoft. Web. 31 Mar. 2012.
-"
-593,1210.3729,Tanmoy Chakraborty and Sivaji Bandyopadhyay,"Inference of Fine-grained Attributes of Bengali Corpus for Stylometry
- Detection",cs.CL cs.CV," Stylometry, the science of inferring characteristics of the author from the
-characteristics of documents written by that author, is a problem with a long
-history and belongs to the core tasks of text categorization, which involve
-authorship identification, plagiarism detection, forensic investigation,
-computer security, copyright and estate disputes, etc. In this work, we present
-a strategy for stylometry detection of documents written in Bengali. We adopt a
-set of fine-grained attribute features with a set of lexical markers for the
-analysis of the text and use three semi-supervised measures for making
-decisions. Finally, a majority voting approach has been taken for final
-classification. The system is fully automatic and language-independent.
-Evaluation results of our attempt at Bengali authors' stylometry detection
-show reasonably promising accuracy in comparison to the baseline model.
-"
-594,1210.3865,"Chien-Liang Chen, Chao-Lin Liu, Yuan-Chen Chang, and Hsiang-Ping Tsai","Opinion Mining for Relating Subjective Expressions and Annual Earnings
- in US Financial Statements",cs.CL cs.AI cs.IR q-fin.GN," Financial statements contain quantitative information and managers'
-subjective evaluations of a firm's financial status. Both qualitative and
-quantitative appraisals are crucial for quality financial decisions. Using
-information released in U.S. 10-K filings, we built tagging models based on
-conditional random field (CRF) techniques to extract such opinioned statements
-from the reports, considering a variety of combinations of linguistic factors
-including morphology, orthography, predicate-argument structure, syntax, and
-simple semantics. Our results show that the CRF models are reasonably effective
-at finding opinion holders in experiments when we adopted the popular MPQA
-corpus for training and testing. The contribution of our paper is to identify
-opinion patterns in multiword expression (MWE) form rather than in single-word
-form.
- We find that the managers of corporations attempt to use more optimistic
-words to obfuscate negative financial performance and to accentuate positive
-financial performance. Our results also show that decreasing earnings were
-often accompanied by ambiguous and mild statements in the reporting year and
-that increasing earnings were stated in an assertive and positive way.
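Record 591 above (Artex) ranks sentences by a simple inner product between each sentence vector, a document (topic) vector and a lexical vector. The sketch below is one plausible reading of that recipe, using the mean sentence vector as the topic vector and vocabulary coverage as the lexical weight; the published formula may weight these differently.

```python
# Artex-like sentence scoring: dot product with the document (topic)
# vector, scaled by a lexical weight (share of vocabulary used).
from collections import Counter

def artex_like_scores(sentences):
    vocab = sorted({w for s in sentences for w in s.split()})
    counts = [Counter(s.split()) for s in sentences]
    vecs = [[c[w] for w in vocab] for c in counts]
    n = len(sentences)
    doc = [sum(col) / n for col in zip(*vecs)]                 # topic vector
    lex = [sum(1 for x in v if x) / len(vocab) for v in vecs]  # lexical weight
    return [sum(a * b for a, b in zip(v, doc)) * l for v, l in zip(vecs, lex)]

sents = ["the cat sat on the mat", "dogs bark", "the cat and the dog sat"]
for s, sc in sorted(zip(sents, artex_like_scores(sents)), key=lambda x: -x[1]):
    print(round(sc, 2), s)  # highest-ranked sentences form the summary
```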
-" -595,1210.3926,"Julian McAuley, Jure Leskovec, Dan Jurafsky",Learning Attitudes and Attributes from Multi-Aspect Reviews,cs.CL cs.IR cs.LG," The majority of online reviews consist of plain-text feedback together with a -single numeric score. However, there are multiple dimensions to products and -opinions, and understanding the `aspects' that contribute to users' ratings may -help us to better understand their individual preferences. For example, a -user's impression of an audiobook presumably depends on aspects such as the -story and the narrator, and knowing their opinions on these aspects may help us -to recommend better products. In this paper, we build models for rating systems -in which such dimensions are explicit, in the sense that users leave separate -ratings for each aspect of a product. By introducing new corpora consisting of -five million reviews, rated with between three and six aspects, we evaluate our -models on three prediction tasks: First, we use our model to uncover which -parts of a review discuss which of the rated aspects. Second, we use our model -to summarize reviews, which for us means finding the sentences that best -explain a user's rating. Finally, since aspect ratings are optional in many of -the datasets we consider, we use our model to recover those ratings that are -missing from a user's evaluation. Our model matches state-of-the-art approaches -on existing small-scale datasets, while scaling to the real-world datasets we -introduce. Moreover, our model is able to `disentangle' content and sentiment -words: we automatically learn content words that are indicative of a particular -aspect as well as the aspect-specific sentiment words that are indicative of a -particular rating. -" -596,1210.4567,"David Bamman, Jacob Eisenstein, and Tyler Schnoebelen",Gender identity and lexical variation in social media,cs.CL," We present a study of the relationship between gender, linguistic style, and -social networks, using a novel corpus of 14,000 Twitter users. Prior -quantitative work on gender often treats this social variable as a female/male -binary; we argue for a more nuanced approach. By clustering Twitter users, we -find a natural decomposition of the dataset into various styles and topical -interests. Many clusters have strong gender orientations, but their use of -linguistic resources sometimes directly conflicts with the population-level -language statistics. We view these clusters as a more accurate reflection of -the multifaceted nature of gendered language styles. Previous corpus-based work -has also had little to say about individuals whose linguistic styles defy -population-level gender patterns. To identify such individuals, we train a -statistical classifier, and measure the classifier confidence for each -individual in the dataset. Examining individuals whose language does not match -the classifier's model for their gender, we find that they have social networks -that include significantly fewer same-gender social connections and that, in -general, social network homophily is correlated with the use of same-gender -language markers. Pairing computational methods and social theory thus offers a -new perspective on how gender emerges as individuals position themselves -relative to audiences, topics, and mainstream gender norms. -" -597,1210.4854,"Hannaneh Hajishirzi, Mohammad Rastegari, Ali Farhadi, Jessica K. 
- Hodgins",Semantic Understanding of Professional Soccer Commentaries,cs.CL cs.AI," This paper presents a novel approach to the problem of semantic parsing via
-learning the correspondences between complex sentences and rich sets of events.
-Our main intuition is that correct correspondences tend to occur more
-frequently. Our model benefits from a discriminative notion of similarity to
-learn the correspondence between a sentence and an event, and a ranking
-machinery that scores the popularity of each correspondence. Our method can
-discover a group of events (called macro-events) that best describes a
-sentence. We evaluate our method on our novel dataset of professional soccer
-commentaries. The empirical results show that our method significantly
-outperforms the state-of-the-art.
-"
-598,1210.4871,"Hui Lin, Jeff A. Bilmes","Learning Mixtures of Submodular Shells with Application to Document
- Summarization",cs.LG cs.CL cs.IR stat.ML," We introduce a method to learn a mixture of submodular ""shells"" in a
-large-margin setting. A submodular shell is an abstract submodular function
-that can be instantiated with a ground set and a set of parameters to produce a
-submodular function. A mixture of such shells can then also be so instantiated
-to produce a more complex submodular function. What our algorithm learns are
-the mixture weights over such shells. We provide a risk bound guarantee when
-learning in a large-margin structured-prediction setting using a projected
-subgradient method when only approximate submodular optimization is possible
-(such as with submodular function maximization). We apply this method to the
-problem of multi-document summarization and produce the best results reported
-so far on the widely used NIST DUC-05 through DUC-07 document summarization
-corpora.
-"
-599,1210.5268,"Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, Eric P. Xing",Diffusion of Lexical Change in Social Media,cs.CL cs.SI physics.soc-ph," Computer-mediated communication is driving fundamental changes in the nature
-of written language. We investigate these changes by statistical analysis of a
-dataset comprising 107 million Twitter messages (authored by 2.7 million unique
-user accounts). Using a latent vector autoregressive model to aggregate across
-thousands of words, we identify high-level patterns in diffusion of linguistic
-change over the United States. Our model is robust to unpredictable changes in
-Twitter's sampling rate, and provides a probabilistic characterization of the
-relationship of macro-scale linguistic influence to a set of demographic and
-geographic predictors. The results of this analysis offer support for prior
-arguments that focus on geographical proximity and population size. However,
-demographic similarity -- especially with regard to race -- plays an even more
-central role, as cities with similar racial demographics are far more likely to
-share linguistic influence. Rather than moving towards a single unified
-""netspeak"" dialect, language evolution in computer-mediated communication
-reproduces existing fault lines in spoken American English.
-"
-600,1210.5321,Koji Ohnishi,"The origin of Mayan languages from Formosan language group of
- Austronesian",cs.CL q-bio.PE," Basic body-part names (BBPNs) were defined as body-part names in the Swadesh
-basic 200-word list. Non-Mayan cognates of Mayan (MY) BBPNs were extensively
-searched for, by comparing with non-MY vocabulary, including ca. 1300 basic
-words of 82 AN languages listed by Tryon (1985), etc.
The cognates (CGs)
-thus found in non-MY are listed in Table 1, as classified by the language
-groups to which the most similar cognates (MSCs) of MY BBPNs belong. CGs of MY
-are classified into 23 mutually unrelated CG-items, of which 17.5 CG-items have
-their MSCs in Austronesian (AN), giving its closest similarity score (CSS),
-CSS(AN) = 17.5, which consists of 10.33 MSCs in Formosan, 1.83 MSCs in Western
-Malayo-Polynesian (W.MP), 0.33 in Central MP, 0.0 in SHWNG, and 5.0 in Oceanic
-[i.e., CSS(FORM) = 10.33, CSS(W.MP) = 1.88, ..., CSS(OC) = 5.0]. These CSSs for
-language (sub)groups are also listed in the underlined portion of every section
-(Sections 1-6) in Table 1. A chi-square test (degrees of freedom = 1) using
-[Eq. 1] and [Eq. 2] revealed that MSCs of MY BBPNs are distributed in Formosan
-with significantly higher frequency (P < 0.001) than in other subgroups of AN,
-as well as in non-AN languages. MY is thus concluded to have been derived from
-Formosan of AN. Eskimo shows some BBPN similarities to FORM and MY.
-"
-601,1210.5486,"Juhi Ameta, Nisheeth Joshi, Iti Mathur",A Lightweight Stemmer for Gujarati,cs.CL," Gujarati is a resource-poor language with almost no language processing tools
-being available. In this paper we present an implementation of a rule-based
-stemmer for Gujarati. We show the creation of rules for stemming and the
-richness in morphology that Gujarati possesses. We have also evaluated our
-results by verifying them with a human expert.
-"
-602,1210.5517,"Nisheeth Joshi, Iti Mathur",Design of English-Hindi Translation Memory for Efficient Translation,cs.CL," Developing parallel corpora is an important and difficult activity for
-Machine Translation. It requires manual annotation by human translators, and
-translating the same text again is a useless activity. There are tools
-available to implement this for European languages, but no such tool is
-available for Indian languages. In this paper we present a tool for Indian
-languages which not only automatically provides previously available
-translations but also, in cases where a sentence has multiple translations,
-provides a ranked list of suggested translations for the sentence. Moreover,
-this tool gives translators global and local options for saving their work, so
-that they may share it with others, which further lightens the task.
-"
-603,1210.5581,"Chia-Chi Tsai, Chao-Lin Liu, Wei-Jie Huang, Man-Kwan Shan",Hidden Trends in 90 Years of Harvard Business Review,cs.CL cs.DL cs.IR," In this paper, we demonstrate and discuss results of our mining the abstracts
-of the publications in Harvard Business Review between 1922 and 2012.
-Techniques for computing n-grams, collocations, basic sentiment analysis, and
-named-entity recognition were employed to uncover trends hidden in the
-abstracts. We present findings about international relationships, sentiment in
-HBR's abstracts, important international companies, influential technological
-inventions, renowned researchers in management theories, and US presidents via
-chronological analyses.
-"
-604,1210.5751,"Estelle Delpech (LINA), B\'eatrice Daille (LINA), Emmanuel Morin
- (LINA), Claire Lemaire","Extraction of domain-specific bilingual lexicon from comparable corpora:
- compositional translation and ranking",cs.CL," This paper proposes a method for extracting translations of morphologically
-constructed terms from comparable corpora.
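The translation-memory behaviour described in record 602 above, returning a ranked list of suggested translations for a new sentence, can be sketched with a simple fuzzy matcher. difflib and the toy English-Hindi entries are illustrative stand-ins for the tool's actual matcher and data.

```python
# Translation-memory lookup: rank stored translations by fuzzy similarity
# between the new source sentence and the stored source sentences.
from difflib import SequenceMatcher

MEMORY = {  # hypothetical English -> Hindi (romanised) entries
    "how are you": ["aap kaise hain", "tum kaise ho"],
    "thank you very much": ["bahut dhanyavaad"],
}

def suggest(sentence, k=3):
    scored = []
    for src, translations in MEMORY.items():
        sim = SequenceMatcher(None, sentence, src).ratio()
        for t in translations:
            scored.append((sim, src, t))
    return sorted(scored, reverse=True)[:k]

for sim, src, t in suggest("how are you doing"):
    print(f"{sim:.2f}  {src!r} -> {t!r}")
```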
The method is based on compositional
-translation and exploits translation equivalences at the morpheme level, which
-allows for the generation of ""fertile"" translations (translation pairs in which
-the target term has more words than the source term). Ranking methods relying
-on corpus-based and translation-based features are used to select the best
-candidate translation. We obtain an average precision of 91% on the Top1
-candidate translation. The method was tested on two language pairs
-(English-French and English-German) and with small specialized comparable
-corpora (400k words per language).
-"
-605,1210.5898,"Chao-Lin Liu, Guantao Jin, Qingfeng Liu, Wei-Yun Chiu, Yih-Soong Yu","Some Chances and Challenges in Applying Language Technologies to
- Historical Studies in Chinese",cs.CL cs.DL cs.IR," We report applications of language technology to analyzing historical
-documents in the Database for the Study of Modern Chinese Thoughts and
-Literature (DSMCTL). We studied two historical issues with the reported
-techniques: the conceptualization of ""huaren"" (Chinese people) and the attempt
-to institute constitutional monarchy in the late Qing dynasty. We also discuss
-research challenges in supporting the study of sophisticated issues, drawing on
-our experience with DSMCTL, the Database of Government Officials of the
-Republic of China, and the Dream of the Red Chamber. Advanced techniques and
-tools for lexical, syntactic, semantic, and pragmatic processing of language
-information, along with more thorough data collection, are needed to strengthen
-the collaboration between historians and computer scientists.
-"
-606,1210.5965,Bohdan Pavlyshenko,"Classification Analysis Of Authorship Fiction Texts in The Space Of
- Semantic Fields",cs.CL," The use of the naive Bayes classifier (NB) and the k-nearest-neighbors
-classifier (kNN) in the classification-based semantic analysis of authorial
-texts of English fiction is analysed. The authors' works are considered in a
-vector space whose basis is formed by the frequency characteristics of the
-semantic fields of nouns and verbs. The highly precise classification of
-authors' texts in the vector space of semantic fields indicates the presence of
-particular spheres of the author's idiolect in this space, which characterize
-the individual author's style.
-"
-607,1210.7137,Bernard Ycart (LJK),Alberti's letter counts,math.HO cs.CL," Four centuries before modern statistical linguistics was born, Leon Battista
-Alberti (1404--1472) compared the frequency of vowels in Latin poems and
-orations, making the first quantified observation of a stylistic difference
-ever. Using a corpus of 20 Latin texts (over 5 million letters), Alberti's
-observations are statistically assessed. Letter counts prove that poets used
-significantly more a's, e's, and y's, whereas orators used more of the other
-vowels. The sample sizes needed to justify the assertions are studied, and
-proved to be within reach for Alberti's scholarship.
-"
-608,1210.7282,Robert Bishop and Ruggero Micheletto,The Hangulphabet: A Descriptive Alphabet,cs.CL," This paper describes the Hangulphabet, a new writing system that should prove
-useful in a number of contexts. Using the Hangulphabet, a user can instantly
-see the voicing, manner and place of articulation of any phoneme found in human
-language. The Hangulphabet places consonant graphemes on a grid, with the
-x-axis representing the place of articulation and the y-axis representing the
-manner of articulation.
Each individual grapheme contains radicals from both axes where
-the points intersect. The top radical represents the manner of articulation,
-while the bottom represents the place of articulation. A horizontal line
-running through the middle of the bottom radical represents voicing. For
-vowels, place of articulation is located on a grid that represents the position
-of the tongue in the mouth. This grid is similar to that of the IPA vowel chart
-(International Phonetic Association, 1999). The difference is that in the
-Hangulphabet, the trapezoid representing the vocal apparatus is on a slight
-tilt. Place of articulation for a vowel is represented by a breakout figure
-from the grid. This system can be used as an alternative to the International
-Phonetic Alphabet (IPA) or as a complement to it. Beginning students of
-linguistics may find it particularly useful. A Hangulphabet font has been
-created to facilitate switching between the Hangulphabet and the IPA.
-"
-609,1210.7599,"Krunoslav Zubrinic, Damir Kalpic, Mario Milicevic","The automatic creation of concept maps from documents written using
- morphologically rich languages",cs.IR cs.AI cs.CL," Concept maps are graphical tools for representing knowledge. They have been
-used in many different areas, including education, knowledge management,
-business and intelligence. Constructing concept maps manually can be a complex
-task; an unskilled person may encounter difficulties in determining and
-positioning concepts relevant to the problem area. An application that
-recommends concept candidates and their position in a concept map can
-significantly help the user in that situation. This paper gives an overview of
-different approaches to automatic and semi-automatic creation of concept maps
-from textual and non-textual sources. The concept map mining process is
-defined, and one method suitable for the creation of concept maps from
-unstructured textual sources in highly inflected languages such as the Croatian
-language is described in detail. The proposed method uses statistical and data
-mining techniques enriched with linguistic tools. With minor adjustments, the
-method can also be used for concept map mining from textual sources in other
-morphologically rich languages.
-"
-610,1210.7917,Bohdan Pavlyshenko,The Model of Semantic Concepts Lattice For Data Mining Of Microblogs,cs.CL cs.IR," A model of a semantic concept lattice for data mining of microblogs is
-proposed in this work. It is shown that the use of this model is effective for
-the analysis of semantic relations and for the detection of association rules
-among keywords.
-"
-611,1210.8436,Maryam Kamvar and Ciprian Chelba,"Optimal size, freshness and time-frame for voice search vocabulary",cs.CL cs.IR," In this paper, we investigate how to optimize the vocabulary for a voice
-search language model. The metric we optimize over is the out-of-vocabulary
-(OoV) rate since it is a strong indicator of user experience. In a departure
-from the usual way of measuring OoV rates, web search logs allow us to compute
-the per-session OoV rate and thus estimate the percentage of users that
-experience a given OoV rate. Under very conservative text normalization, we
-find that a voice search vocabulary consisting of 2 to 2.5 million words
-extracted from 1 week of search query data will result in an aggregate OoV rate
-of 1%; at that size, the same OoV rate will also be experienced by 90% of
-users. The number of words included in the vocabulary is a stable indicator of
-the OoV rate.
Altering the freshness of the vocabulary or the duration of the
-time window over which the training data is gathered does not significantly
-change the OoV rate. Surprisingly, a significantly larger vocabulary
-(approximately 10 million words) is required to guarantee OoV rates below 1%
-for 95% of the users.
-"
-612,1210.8440,"Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar
- Kumar",Large Scale Language Modeling in Automatic Speech Recognition,cs.CL," Large language models have been proven quite beneficial for a variety of
-automatic speech recognition tasks in Google. We summarize results on Voice
-Search and a few YouTube speech transcription tasks to highlight the impact
-that one can expect from increasing both the amount of training data, and the
-size of the language model estimated from such data. Depending on the task,
-availability and amount of training data used, language model size and amount
-of work and care put into integrating them in the lattice rescoring step, we
-observe reductions in word error rate between 6% and 10% relative, for systems
-on a wide range of operating points between 17% and 52% word error rate.
-"
-613,1211.0074,Alex Rudnick,Transition-Based Dependency Parsing With Pluggable Classifiers,cs.CL," In principle, the design of transition-based dependency parsers makes it
-possible to experiment with any general-purpose classifier without other
-changes to the parsing algorithm. In practice, however, it often takes
-substantial software engineering to bridge between the different
-representations used by two software packages. Here we present extensions to
-MaltParser that allow the drop-in use of any classifier conforming to the
-interface of the Weka machine learning package, a wrapper for the TiMBL
-memory-based learner to this interface, and experiments on multilingual
-dependency parsing with a variety of classifiers. While earlier work had
-suggested that memory-based learners might be a good choice for low-resource
-parsing scenarios, we cannot support that hypothesis in this work. We observed
-that support-vector machines give better parsing performance than the
-memory-based learner, regardless of the size of the training set.
-"
-614,1211.0418,"Normunds Gr\=uz\=itis, Gunta Ne\v{s}pore, Baiba Saul\=ite",Verbalizing Ontologies in Controlled Baltic Languages,cs.CL cs.AI," Controlled natural languages (mostly English-based) have recently emerged as
-seemingly informal supplementary means for OWL ontology authoring, compared to
-the formal notations used by professional knowledge engineers. In this paper we
-present, by examples, a controlled Latvian language that has been designed to
-be compliant with the state-of-the-art Attempto Controlled English. We also
-discuss its relation to a controlled Lithuanian language that is being designed
-in parallel.
-"
-615,1211.0498,Rami Al-Rfou',Detecting English Writing Styles For Non-native Speakers,cs.CL," Analyzing writing styles of non-native speakers is a challenging task. In
-this paper, we analyze the comments written in the discussion pages of the
-English Wikipedia. Using learning algorithms, we are able to detect native
-speakers' writing style with an accuracy of 74%. Given the diversity of the
-English Wikipedia users and the large number of languages they speak, we
-measure the similarities among their native languages by comparing the
-influence they have on their English writing style.
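The per-session out-of-vocabulary measurement of record 611 above can be sketched directly: compute the OoV rate of each session, then both the aggregate rate and the share of sessions that stay under a target rate. The vocabulary and sessions below are toy stand-ins for a real vocabulary and a week of query logs.

```python
# Per-session vs aggregate OoV rate, as in record 611's evaluation setup.
VOCAB = {"weather", "pizza", "near", "me", "movie", "times"}
SESSIONS = [
    ["weather", "near", "me"],
    ["pizza", "near", "me", "zxqv"],  # one OoV token in this session
    ["movie", "times"],
]

def session_oov(words):
    return sum(w not in VOCAB for w in words) / len(words)

rates = [session_oov(s) for s in SESSIONS]
total_words = sum(len(s) for s in SESSIONS)
aggregate = sum(sum(w not in VOCAB for w in s) for s in SESSIONS) / total_words
share_ok = sum(r <= 0.01 for r in rates) / len(rates)
print(f"aggregate OoV: {aggregate:.2%}, sessions at <=1% OoV: {share_ok:.0%}")
```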
Our results show that
-languages known to have the same origin and development path have a similar
-footprint on their speakers' English writing style. To enable further studies,
-the dataset we extracted from Wikipedia will be made available publicly.
-"
-616,1211.2290,"Abhimanu Kumar, Jason Baldridge, Matthew Lease, Joydeep Ghosh",Dating Texts without Explicit Temporal Cues,cs.CL cs.AI," This paper tackles temporal resolution of documents, such as determining what
-time period a document is about or when it was written, based only on its text.
-We apply techniques from information retrieval that predict dates via language
-models over a discretized timeline. Unlike most previous works, we rely {\it
-solely} on temporal cues implicit in the text. We consider both
-document-likelihood and divergence based techniques and several smoothing
-methods for both of them. Our best model predicts the mid-point of individuals'
-lives with a median error of 22 and a mean error of 36 years for Wikipedia
-biographies from 3800 B.C. to the present day. We also show that this approach
-works well when training on such biographies and predicting dates both for
-non-biographical Wikipedia pages about specific years (500 B.C. to 2010 A.D.)
-and for publication dates of short stories (1798 to 2008). Together, our work
-shows that, even in the absence of temporal extraction resources, it is
-possible to achieve remarkable temporal locality across a diverse set of texts.
-"
-617,1211.2741,"Kamlesh Sharma, S. V. A. V. Prasad and T. V. Prasad",A Hindi Speech Actuated Computer Interface for Web Search,cs.CL cs.HC cs.IR," Aiming at increased system simplicity and flexibility, an audio-evoked system
-was developed by integrating a simplified headphone and user-friendly software
-design. This paper describes a Hindi Speech Actuated Computer Interface for Web
-search (HSACIWS), which accepts spoken queries in the Hindi language and
-provides the search result on the screen. This system recognizes spoken queries
-by large vocabulary continuous speech recognition (LVCSR), retrieves relevant
-documents by text retrieval, and provides the search result on the Web by the
-integration of the Web and the voice systems. The LVCSR in this system showed
-adequate performance levels for speech, with acoustic and language models
-derived from a query corpus with target contents.
-"
-618,1211.3402,Bohdan Pavlyshenko,"Genetic Optimization of Keywords Subset in the Classification Analysis
- of Texts Authorship",cs.IR cs.CL," The genetic selection of a keyword set, whose text frequencies are considered
-as attributes in text classification analysis, is examined. The genetic
-optimization was performed on a set of words comprising the fraction of the
-frequency dictionary within given frequency limits. The frequency dictionary
-was formed on the basis of the analyzed array of English fiction texts. As the
-fitness function minimized by the genetic algorithm, the error of a
-k-nearest-neighbors classifier was used. The obtained results show high
-precision and recall of text classification by authorship categories on the
-basis of the keyword-set attributes selected by the genetic algorithm from the
-frequency dictionary.
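Record 618 above describes selecting a keyword subset with a genetic algorithm whose fitness is the error of a k-nearest-neighbours classifier. The sketch below implements that loop on random data; scikit-learn, the GA parameters and the data shapes are illustrative choices, not the paper's setup.

```python
# GA over bit-masks of candidate keywords; fitness = kNN classification
# error on the selected keyword-frequency columns. Illustrative sketch.
import random
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(60, 40)).astype(float)  # texts x keyword freqs
y = rng.integers(0, 3, size=60)                    # 3 hypothetical "authors"

def fitness(mask):
    cols = [i for i, b in enumerate(mask) if b]
    if not cols:
        return 1.0  # empty subsets are maximally bad
    acc = cross_val_score(KNeighborsClassifier(5), X[:, cols], y, cv=3).mean()
    return 1.0 - acc  # classification error to be minimised

def evolve(n_words=40, pop_size=20, gens=15, p_mut=0.05):
    pop = [[random.random() < 0.5 for _ in range(n_words)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)               # best (lowest error) first
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_words)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [not g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

best = evolve()
print("selected", sum(best), "keywords, error", round(fitness(best), 3))
```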
-" -619,1211.3643,Tobias Kuhn,"A Principled Approach to Grammars for Controlled Natural Languages and - Predictive Editors",cs.CL," Controlled natural languages (CNL) with a direct mapping to formal logic have -been proposed to improve the usability of knowledge representation systems, -query interfaces, and formal specifications. Predictive editors are a popular -approach to solve the problem that CNLs are easy to read but hard to write. -Such predictive editors need to be able to ""look ahead"" in order to show all -possible continuations of a given unfinished sentence. Such lookahead features, -however, are difficult to implement in a satisfying way with existing grammar -frameworks, especially if the CNL supports complex nonlocal structures such as -anaphoric references. Here, methods and algorithms are presented for a new -grammar notation called Codeco, which is specifically designed for controlled -natural languages and predictive editors. A parsing approach for Codeco based -on an extended chart parsing algorithm is presented. A large subset of Attempto -Controlled English (ACE) has been represented in Codeco. Evaluation of this -grammar and the parser implementation shows that the approach is practical, -adequate and efficient. -" -620,1211.4161,"Ae-Lim Ahn (DICORA), \'Eric Laporte (LIGM), Jee-Sun Nam (DICORA, LIGM)",Semantic Polarity of Adjectival Predicates in Online Reviews,cs.CL," Web users produce more and more documents expressing opinions. Because these -have become important resources for customers and manufacturers, many have -focused on them. Opinions are often expressed through adjectives with positive -or negative semantic values. In extracting information from users' opinion in -online reviews, exact recognition of the semantic polarity of adjectives is one -of the most important requirements. Since adjectives have different semantic -orientations according to contexts, it is not satisfying to extract opinion -information without considering the semantic and lexical relations between the -adjectives and the feature nouns appropriate to a given domain. In this paper, -we present a classification of adjectives by polarity, and we analyze -adjectives that are undetermined in the absence of contexts. Our research -should be useful for accurately predicting semantic orientations of opinion -sentences, and should be taken into account before relying on an automatic -methods. -" -621,1211.4488,Jessica C. Ram\'irez and Yuji Matsumoto,"A Rule-Based Approach For Aligning Japanese-Spanish Sentences From A - Comparable Corpora",cs.CL cs.AI," The performance of a Statistical Machine Translation System (SMT) system is -proportionally directed to the quality and length of the parallel corpus it -uses. However for some pair of languages there is a considerable lack of them. -The long term goal is to construct a Japanese-Spanish parallel corpus to be -used for SMT, whereas, there are a lack of useful Japanese-Spanish parallel -Corpus. To address this problem, In this study we proposed a method for -extracting Japanese-Spanish Parallel Sentences from Wikipedia using POS tagging -and Rule-Based approach. The main focus of this approach is the syntactic -features of both languages. Human evaluation was performed over a sample and -shows promising results, in comparison with the baseline. -" -622,1211.4929,Trung V. Nguyen and Alice H. 
Oh,"Summarizing Reviews with Variable-length Syntactic Patterns and Topic - Models",cs.IR cs.CL," We present a novel summarization framework for reviews of products and -services by selecting informative and concise text segments from the reviews. -Our method consists of two major steps. First, we identify five frequently -occurring variable-length syntactic patterns and use them to extract candidate -segments. Then we use the output of a joint generative sentiment topic model to -filter out the non-informative segments. We verify the proposed method with -quantitative and qualitative experiments. In a quantitative study, our approach -outperforms previous methods in producing informative segments and summaries -that capture aspects of products and services as expressed in the -user-generated pros and cons lists. Our user study with ninety users resonates -with this result: individual segments extracted and filtered by our method are -rated as more useful by users compared to previous approaches by users. -" -623,1211.6847,Bernard Ycart (LJK),"Letter counting: a stem cell for Cryptology, Quantitative Linguistics, - and Statistics",math.HO cs.CL cs.CR," Counting letters in written texts is a very ancient practice. It has -accompanied the development of Cryptology, Quantitative Linguistics, and -Statistics. In Cryptology, counting frequencies of the different characters in -an encrypted message is the basis of the so called frequency analysis method. -In Quantitative Linguistics, the proportion of vowels to consonants in -different languages was studied long before authorship attribution. In -Statistics, the alternation vowel-consonants was the only example that Markov -ever gave of his theory of chained events. A short history of letter counting -is presented. The three domains, Cryptology, Quantitative Linguistics, and -Statistics, are then examined, focusing on the interactions with the other two -fields through letter counting. As a conclusion, the eclectism of past -centuries scholars, their background in humanities, and their familiarity with -cryptograms, are identified as contributing factors to the mutual enrichment -process which is described here. -" -624,1211.6887,Marcin Mi{\l}kowski,Automating rule generation for grammar checkers,cs.CL cs.LG," In this paper, I describe several approaches to automatic or semi-automatic -development of symbolic rules for grammar checkers from the information -contained in corpora. The rules obtained this way are an important addition to -manually-created rules that seem to dominate in rule-based checkers. However, -the manual process of creation of rules is costly, time-consuming and -error-prone. It seems therefore advisable to use machine-learning algorithms to -create the rules automatically or semi-automatically. The results obtained seem -to corroborate my initial hypothesis that symbolic machine learning algorithms -can be useful for acquiring new rules for grammar checking. It turns out, -however, that for practical uses, error corpora cannot be the sole source of -information used in grammar checking. I suggest therefore that only by using -different approaches, grammar-checkers, or more generally, computer-aided -proofreading tools, will be able to cover most frequent and severe mistakes and -avoid false alarms that seem to distract users. -" -625,1212.0074,Kyumars Sheykh Esmaili,Challenges in Kurdish Text Processing,cs.IR cs.CL," Despite having a large number of speakers, the Kurdish language is among the -less-resourced languages. 
In this work we highlight the challenges and problems -in providing the required tools and techniques for processing texts written in -Kurdish. From a high-level perspective, the main challenges are: the inherent -diversity of the language, standardization and segmentation issues, and the -lack of language resources. -" -626,1212.0229,James Gerard Wolff,"Simplification and integration in computing and cognition: the SP theory - and the multiple alignment concept",cs.AI cs.CL," The main purpose of this article is to describe potential benefits and -applications of the SP theory, a unique attempt to simplify and integrate ideas -across artificial intelligence, mainstream computing and human cognition, with -information compression as a unifying theme. The theory, including a concept of -multiple alignment, combines conceptual simplicity with descriptive and -explanatory power in several areas including representation of knowledge, -natural language processing, pattern recognition, several kinds of reasoning, -the storage and retrieval of information, planning and problem solving, -unsupervised learning, information compression, and human perception and -cognition. In the SP machine -- an expression of the SP theory which is -currently realised in the form of computer models -- there is potential for an -overall simplification of computing systems, including software. As a theory -with a broad base of support, the SP theory promises useful insights in many -areas and the integration of structures and functions, both within a given area -and amongst different areas. There are potential benefits in natural language -processing (with potential for the understanding and translation of natural -languages), the need for a versatile intelligence in autonomous robots, -computer vision, intelligent databases, maintaining multiple versions of -documents or web pages, software engineering, criminal investigations, the -management of big data and gaining benefits from it, the semantic web, medical -diagnosis, the detection of computer viruses, the economical transmission of -data, and data fusion. Further development of these ideas would be facilitated -by the creation of a high-parallel, web-based, open-source version of the SP -machine, with a good user interface. This would provide a means for researchers -to explore what can be done with the system and to refine it. -" -627,1212.0927,"Ke Wu, Philip Resnik","Two Algorithms for Finding $k$ Shortest Paths of a Weighted Pushdown - Automaton",cs.CL cs.DS cs.FL," We introduce efficient algorithms for finding the $k$ shortest paths of a -weighted pushdown automaton (WPDA), a compact representation of a weighted set -of strings with potential applications in parsing and machine translation. Both -of our algorithms are derived from the same weighted deductive logic -description of the execution of a WPDA using different search strategies. -Experimental results show our Algorithm 2 adds very little overhead vs. the -single shortest path algorithm, even with a large $k$. -" -628,1212.1192,"Miquel Espl\`a-Gomis, Felipe S\'anchez-Mart\'inez, Mikel L. Forcada","Using external sources of bilingual information for on-the-fly word - alignment",cs.CL," In this paper we present a new and simple language-independent method for -word-alignment based on the use of external sources of bilingual information -such as machine translation systems. 
We show that the few parameters of the
-aligner can be trained on a very small corpus, which leads to results
-comparable to those obtained by the state-of-the-art tool GIZA++ in terms of
-precision. Regarding other metrics, such as alignment error rate or F-measure,
-the parametric aligner, when trained on a very small gold-standard (450 pairs
-of sentences), provides results comparable to those produced by GIZA++ when
-trained on an in-domain corpus of around 10,000 pairs of sentences.
-Furthermore, the results obtained indicate that the training is
-domain-independent, which enables the use of the trained aligner 'on the fly'
-on any new pair of sentences.
-"
-629,1212.1362,Martin Gerlach and Eduardo G. Altmann,Stochastic model for the vocabulary growth in natural languages,physics.soc-ph cs.CL physics.data-an," We propose a stochastic model for the number of different words in a given
-database which incorporates the dependence on the database size and historical
-changes. The main feature of our model is the existence of two different
-classes of words: (i) a finite number of core-words which have higher frequency
-and do not affect the probability of a new word to be used; and (ii) the
-remaining virtually infinite number of noncore-words which have lower frequency
-and once used reduce the probability of a new word to be used in the future.
-Our model relies on a careful analysis of the google-ngram database of books
-published in the last centuries and its main consequence is the generalization
-of Zipf's and Heaps' laws to two scaling regimes. We confirm that these
-generalizations yield the best simple description of the data among generic
-descriptive models and that the two free parameters depend only on the language
-but not on the database. From the point of view of our model the main change on
-historical time scales is the composition of the specific words included in the
-finite list of core-words, which we observe to decay exponentially in time with
-a rate of approximately 30 words per year for English.
-"
-630,1212.1478,Bohdan Pavlyshenko,"The Clustering of Author's Texts of English Fiction in the Vector Space
- of Semantic Fields",cs.CL cs.DL cs.IR," The clustering of text documents in the vector space of semantic fields and
-in the semantic space with an orthogonal basis is analysed. It is shown that
-the vector space model with the basis of semantic fields is effective in
-cluster analysis algorithms for authors' texts in English fiction. The analysis
-of the distribution of authors' texts in the cluster structure showed the
-presence of areas of the semantic space that represent the idiolects of
-individual authors. SVD factorization of the semantic fields matrix makes it
-possible to reduce significantly the dimension of the semantic space in the
-cluster analysis of authors' texts.
-"
-631,1212.1709,Matjaz Perc,"Evolution of the most common English words and phrases over the
- centuries",physics.soc-ph cs.CL cs.DL," By determining which were the most common English words and phrases since the
-beginning of the 16th century, we obtain a unique large-scale view of the
-evolution of written text. We find that the most common words and phrases in
-any given year had a much shorter popularity lifespan in the 16th than they had
-in the 20th century. By measuring how their usage propagated across the years,
-we show that for the past two centuries the process has been governed by linear
-preferential attachment.
Along with the steady growth of the English lexicon,
-this provides an empirical explanation for the ubiquity of Zipf's law in
-language statistics and confirms that writing, although undoubtedly an
-expression of art and skill, is not immune to the same influences of
-self-organization that are known to regulate processes as diverse as the making
-of new friends and World Wide Web growth.
-"
-632,1212.1918,"Juan-Manuel Torres-Moreno, Patricia Vel\'azquez-Morales, Jean-Guy
- Meunier",Condens\'es de textes par des m\'ethodes num\'eriques,cs.IR cs.CL," Since information in electronic form is already the standard, and the variety
-and quantity of information are becoming increasingly large, summarizing or
-automatically condensing texts is a critical phase of text analysis. This
-article describes CORTEX, a system based on numerical methods that produces a
-condensation of a text independently of the text's topic and length. The
-structure of the system enables it to produce abstracts in French or Spanish in
-very short times.
-"
-633,1212.2006,Jiwei Li and Sujian Li,"A Novel Feature-based Bayesian Model for Query Focused Multi-document
- Summarization",cs.CL cs.IR," Both supervised learning methods and LDA based topic model have been
-successfully applied in the field of query focused multi-document
-summarization. In this paper, we propose a novel supervised approach that can
-incorporate rich sentence features into Bayesian topic models in a principled
-way, thus taking advantage of both topic models and feature-based supervised
-learning methods. Experiments on TAC2008 and TAC2009 demonstrate the
-effectiveness of our approach.
-"
-634,1212.2036,Jiwei Li and Sujian Li,"Query-focused Multi-document Summarization: Combining a Novel Topic
- Model with Graph-based Semi-supervised Learning",cs.CL cs.IR," Graph-based semi-supervised learning has proven to be an effective approach
-for query-focused multi-document summarization. The problem with previous
-semi-supervised learning approaches is that sentences are ranked without
-considering higher-level information beyond the sentence level. Research on
-general summarization has illustrated that the addition of a topic level can
-effectively improve summary quality. Inspired by this, we propose a two-layer
-(i.e. sentence layer and topic layer) graph-based semi-supervised learning
-approach. At the same time, we propose a novel topic model which makes full use
-of the dependence between sentences and words. Experimental results on DUC and
-TAC data sets demonstrate the effectiveness of our proposed approach.
-"
-635,1212.2145,Shuang-Hong Yang,A Scale-Space Theory for Text,cs.IR cs.CL," Scale-space theory has been established primarily by the computer vision and
-signal processing communities as a well-founded and promising framework for
-multi-scale processing of signals (e.g., images). By embedding an original
-signal into a family of gradually coarsened signals parameterized with a
-continuous scale parameter, it provides a formal framework to capture the
-structure of a signal at different scales in a consistent way. In this paper,
-we present a scale-space theory for text by integrating semantic and spatial
-filters, and demonstrate how natural language documents can be understood,
-processed and analyzed at multiple resolutions, and how this scale-space
-representation can be used to facilitate a variety of NLP and text analysis
-tasks.
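To make the scale-space construction concrete, here is a minimal sketch, assuming a stand-in feature function in place of the paper's semantic filters: a document becomes a per-token signal that is convolved with Gaussian kernels of increasing scale, giving one progressively coarsened view per scale.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def text_scale_space(tokens, feature, scales=(1.0, 2.0, 4.0, 8.0)):
        # Map a token sequence to one Gaussian-smoothed signal per scale.
        # 'feature' is any token -> float map standing in for a semantic filter.
        signal = np.array([feature(t) for t in tokens], dtype=float)
        return {s: gaussian_filter1d(signal, sigma=s) for s in scales}

    # Toy usage: token length as a trivial stand-in feature.
    layers = text_scale_space("the quick brown fox jumps over the lazy dog".split(), len)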
-" -636,1212.2390,Eric Werner,"On the complexity of learning a language: An improvement of Block's - algorithm",cs.CL cs.LG," Language learning is thought to be a highly complex process. One of the -hurdles in learning a language is to learn the rules of syntax of the language. -Rules of syntax are often ordered in that before one rule can applied one must -apply another. It has been thought that to learn the order of n rules one must -go through all n! permutations. Thus to learn the order of 27 rules would -require 27! steps or 1.08889x10^{28} steps. This number is much greater than -the number of seconds since the beginning of the universe! In an insightful -analysis the linguist Block ([Block 86], pp. 62-63, p.238) showed that with the -assumption of transitivity this vast number of learning steps reduces to a mere -377 steps. We present a mathematical analysis of the complexity of Block's -algorithm. The algorithm has a complexity of order n^2 given n rules. In -addition, we improve Block's results exponentially, by introducing an algorithm -that has complexity of order less than n log n. -" -637,1212.2453,"David Azari, Eric J. Horvitz, Susan Dumais, Eric Brill",Web-Based Question Answering: A Decision-Making Perspective,cs.IR cs.CL," We describe an investigation of the use of probabilistic models and -cost-benefit analyses to guide resource-intensive procedures used by a -Web-based question answering system. We first provide an overview of research -on question-answering systems. Then, we present details on AskMSR, a prototype -web-based question answering system. We discuss Bayesian analyses of the -quality of answers generated by the system and show how we can endow the system -with the ability to make decisions about the number of queries issued to a -search engine, given the cost of queries and the expected value of query -results in refining an ultimate answer. Finally, we review the results of a set -of experiments. -" -638,1212.2477,"Shyong (Tony) K. Lam, David M Pennock, Dan Cosley, Steve Lawrence","1 Billion Pages = 1 Million Dollars? Mining the Web to Play ""Who Wants - to be a Millionaire?""",cs.IR cs.CL," We exploit the redundancy and volume of information on the web to build a -computerized player for the ABC TV game show 'Who Wants To Be A Millionaire?' -The player consists of a question-answering module and a decision-making -module. The question-answering module utilizes question transformation -techniques, natural language parsing, multiple information retrieval -algorithms, and multiple search engines; results are combined in the spirit of -ensemble learning using an adaptive weighting scheme. Empirically, the system -correctly answers about 75% of questions from the Millionaire CD-ROM, 3rd -edition - general-interest trivia questions often about popular culture and -common knowledge. The decision-making module chooses from allowable actions in -the game in order to maximize expected risk-adjusted winnings, where the -estimated probability of answering correctly is a function of past performance -and confidence in in correctly answering the current question. When given a six -question head start (i.e., when starting from the $2,000 level), we find that -the system performs about as well on average as humans starting at the -beginning. Our system demonstrates the potential of simple but well-chosen -techniques for mining answers from unstructured information such as the web. -" -639,1212.2616,"Alexander M. Petersen, Joel N. Tenenbaum, Shlomo Havlin, H. 
Eugene
- Stanley, Matjaz Perc","Languages cool as they expand: Allometric scaling and the decreasing
- need for new words",physics.soc-ph cond-mat.stat-mech cs.CL stat.AP," We analyze the occurrence frequencies of over 15 million words recorded in
-millions of books published during the past two centuries in seven different
-languages. For all languages and chronological subsets of the data we confirm
-that two scaling regimes characterize the word frequency distributions, with
-only the more common words obeying the classic Zipf law. Using corpora of
-unprecedented size, we test the allometric scaling relation between the corpus
-size and the vocabulary size of growing languages to demonstrate a decreasing
-marginal need for new words, a feature that is likely related to the underlying
-correlations between words. We calculate the annual growth fluctuations of word
-use, which show a decreasing trend as the corpus size increases, indicating a
-slowdown in linguistic evolution following language expansion. This ""cooling
-pattern"" forms the basis of a third statistical regularity, which, unlike the
-Zipf and Heaps laws, is dynamical in nature.
-"
-640,1212.2676,Aaron Gerow and Mark Keane,Mining the Web for the Voice of the Herd to Track Stock Market Bubbles,cs.CL cs.IR physics.soc-ph q-fin.GN," We show that power-law analyses of financial commentaries from newspaper
-websites can be used to identify stock market bubbles, supplementing
-traditional volatility analyses. Using a four-year corpus of 17,713 online,
-finance-related articles (10M+ words) from the Financial Times, the New York
-Times, and the BBC, we show that week-to-week changes in power-law
-distributions reflect market movements of the Dow Jones Industrial Average
-(DJI), the FTSE-100, and the NIKKEI-225. Notably, the statistical regularities
-in language track the 2007 stock market bubble, showing emerging structure in
-the language of commentators, as progressively greater agreement arose in their
-positive perceptions of the market. Furthermore, during the bubble period, a
-marked divergence in positive language occurs as revealed by a Kullback-Leibler
-analysis.
-"
-641,1212.3023,"Mahyuddin K. M. Nasution, Shahrul Azman Mohd Noah",Keyword Extraction for Identifying Social Actors,cs.IR cs.CL," Identifying social actors has become one of the tasks in Artificial
-Intelligence, and extracting keywords from Web snippets to support it is
-steadily gaining ground in this research. We therefore develop an approach
-based on the overlap principle for utilizing a collection of features in web
-snippets, where the use of keywords eliminates irrelevant web pages.
-"
-642,1212.3138,Aaron Gerow and Mark Keane,"Identifying Metaphor Hierarchies in a Corpus Analysis of Finance
- Articles",cs.CL," Using a corpus of over 17,000 financial news reports (involving over 10M
-words), we perform an analysis of the argument-distributions of the UP- and
-DOWN-verbs used to describe movements of indices, stocks, and shares. Using
-measures of the overlap in the argument distributions of these verbs and
-k-means clustering of their distributions, we advance evidence for the proposal
-that the metaphors referred to by these verbs are organised into hierarchical
-structures of superordinate and subordinate groups.
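The overlap-and-cluster step can be sketched as follows: each verb is represented by its distribution over grammatical arguments, pairwise overlap is measured with cosine similarity, and k-means groups the distributions. The verbs, argument counts, and cluster count below are toy stand-ins, not the paper's data.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import normalize

    # Toy argument-count vectors over a shared argument vocabulary,
    # e.g. ('shares', 'index', 'profits', 'prices').
    verbs = ["rise", "soar", "fall", "plummet"]
    counts = np.array([[30, 20, 10, 40],
                       [25, 22, 12, 41],
                       [28, 18, 30,  5],
                       [26, 17, 33,  4]], dtype=float)

    dists = normalize(counts, norm="l1")             # counts -> probability distributions
    overlap = normalize(dists) @ normalize(dists).T  # pairwise cosine overlap matrix

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(dists)
    for verb, label in zip(verbs, labels):
        print(verb, "-> cluster", label)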
-" -643,1212.3139,Aaron Gerow and Mark Keane,Identifying Metaphoric Antonyms in a Corpus Analysis of Finance Articles,cs.CL," Using a corpus of 17,000+ financial news reports (involving over 10M words), -we perform an analysis of the argument-distributions of the UP and DOWN verbs -used to describe movements of indices, stocks and shares. In Study 1 -participants identified antonyms of these verbs in a free-response task and a -matching task from which the most commonly identified antonyms were compiled. -In Study 2, we determined whether the argument-distributions for the verbs in -these antonym-pairs were sufficiently similar to predict the most -frequently-identified antonym. Cosine similarity correlates moderately with the -proportions of antonym-pairs identified by people (r = 0.31). More -impressively, 87% of the time the most frequently-identified antonym is either -the first- or second-most similar pair in the set of alternatives. The -implications of these results for distributional approaches to determining -metaphoric knowledge are discussed. -" -644,1212.3162,Aaron Gerow and Khurshid Ahmad,Diachronic Variation in Grammatical Relations,cs.CL," We present a method of finding and analyzing shifts in grammatical relations -found in diachronic corpora. Inspired by the econometric technique of measuring -return and volatility instead of relative frequencies, we propose them as a way -to better characterize changes in grammatical patterns like nominalization, -modification and comparison. To exemplify the use of these techniques, we -examine a corpus of NIPS papers and report trends which manifest at the token, -part-of-speech and grammatical levels. Building up from frequency observations -to a second-order analysis, we show that shifts in frequencies overlook deeper -trends in language, even when part-of-speech information is included. Examining -token, POS and grammatical levels of variation enables a summary view of -diachronic text as a whole. We conclude with a discussion about how these -methods can inform intuitions about specialist domains as well as changes in -language use as a whole. -" -645,1212.3171,"Iwona Grabska-Gradzi\'nska, Andrzej Kulig, Jaros{\l}aw Kwapie\'n, - Pawe{\l} O\'swi\k{e}cimka, Stanis{\l}aw Dro\.zd\.z",Multifractal analysis of sentence lengths in English literary texts,physics.data-an cs.CL physics.soc-ph," This paper presents analysis of 30 literary texts written in English by -different authors. For each text, there were created time series representing -length of sentences in words and analyzed its fractal properties using two -methods of multifractal analysis: MFDFA and WTMM. Both methods showed that -there are texts which can be considered multifractal in this representation but -a majority of texts are not multifractal or even not fractal at all. Out of 30 -books, only a few have so-correlated lengths of consecutive sentences that the -analyzed signals can be interpreted as real multifractals. An interesting -direction for future investigations would be identifying what are the specific -features which cause certain texts to be multifractal and other to be -monofractal or even not fractal at all. -" -646,1212.3228,"Peiyou Song, Anhei Shu, David Phipps, Dan Wallach, Mohit Tiwari, - Jedidiah Crandall, George Luger","Language Without Words: A Pointillist Model for Natural Language - Processing",cs.CL cs.IR cs.SI," This paper explores two separate questions: Can we perform natural language -processing tasks without a lexicon?; and, Should we? 
Existing natural language
-processing techniques are either based on words as units or use units such as
-grams only for basic classification tasks. How close can a machine come to
-reasoning about the meanings of words and phrases in a corpus without using any
-lexicon, based only on grams?
- Our own motivation for posing this question is based on our efforts to find
-popular trends in words and phrases from online Chinese social media. This form
-of written Chinese uses so many neologisms, creative character placements, and
-combinations of writing systems that it has been dubbed the ""Martian Language.""
-Readers must often use visual cues, audible cues from reading out loud, and
-their knowledge and understanding of current events to understand a post. For
-analysis of popular trends, the specific problem is that it is difficult to
-build a lexicon when the invention of new ways to refer to a word or concept is
-easy and common. For natural language processing in general, we argue in this
-paper that new uses of language in social media will challenge machines'
-abilities to operate with words as the basic unit of understanding, not only in
-Chinese but potentially in other languages.
-"
-647,1212.3493,"Alejandro Molina, Juan-Manuel Torres-Moreno, Iria da Cunha, Eric
- SanJuan, Gerardo Sierra","Sentence Compression in Spanish driven by Discourse Segmentation and
- Language Models",cs.CL cs.IR," Previous works demonstrated that Automatic Text Summarization (ATS) by
-sentence extraction may be improved using sentence compression. In this work we
-present a sentence compression approach guided by sentence-level discourse
-segmentation and probabilistic language models (LM). The results presented here
-show that the proposed solution is able to generate coherent summaries with
-grammatical, compressed sentences. The approach is simple enough to be
-transposed into other languages.
-"
-648,1212.3634,"Hanane Froud, Abdelmonaim Lachkar, and Said Alaoui Ouatik","A comparative study of root-based and stem-based approaches for
- measuring the similarity between arabic words for arabic text mining
- applications",cs.CL cs.IR," Representation of the semantic information contained in words is needed for
-any Arabic text mining application. More precisely, the purpose is to better
-take into account the semantic dependencies between words expressed by the
-co-occurrence frequencies of these words. There have been many proposals to
-compute similarities between words based on their distributions in contexts. In
-this paper, we compare and contrast the effect of two preprocessing techniques
-applied to an Arabic corpus, the Root-based (stemming) and Stem-based (light
-stemming) approaches, for measuring the similarity between Arabic words with
-the well-known abstractive model, Latent Semantic Analysis (LSA), using a wide
-variety of distance functions and similarity measures, such as the Euclidean
-Distance, Cosine Similarity, Jaccard Coefficient, and the Pearson Correlation
-Coefficient. The obtained results show that, on the one hand, the variety of
-the corpus produces more accurate results; on the other hand, the Stem-based
-approach outperformed the Root-based one because the latter affects word
-meanings.
-"
-649,1212.4315,Lorenzo Gatti and Marco Guerini,Assessing Sentiment Strength in Words Prior Polarities,cs.CL," Many approaches to sentiment analysis rely on lexica where words are tagged
-with their prior polarity - i.e.
if a word out of context evokes something
-positive or something negative. In particular, broad-coverage resources like
-SentiWordNet provide polarities for (almost) every word. Since words can have
-multiple senses, we address the problem of how to compute the prior polarity of
-a word starting from the polarity of each sense and returning its polarity
-strength as an index between -1 and 1. We compare 14 such formulae that appear
-in the literature, and assess which one best approximates the human judgement
-of prior polarities, with both regression and classification models.
-"
-650,1212.4674,Hyeok Kong,"Natural Language Understanding Based on Semantic Relations between
- Sentences",cs.CL," In this paper, we define event expressions over sentences of natural language
-and semantic relations between events. Based on this definition, we formally
-consider the text understanding process with events as the basic unit.
-"
-651,1212.5238,"Delia Mocanu, Andrea Baronchelli, Bruno Gon\c{c}alves, Nicola Perra,
- Alessandro Vespignani","The Twitter of Babel: Mapping World Languages through Microblogging
- Platforms",physics.soc-ph cs.CL cs.SI," Large scale analysis and statistics of socio-technical systems that just a
-few short years ago would have required considerable economic and human
-resources can nowadays be conveniently performed by mining the enormous amount
-of digital data produced by human activities. Although a characterization of
-several aspects of our societies is emerging from the data revolution, a number
-of questions concerning the reliability and the biases inherent to the big data
-""proxies"" of social life are still open. Here, we survey worldwide linguistic
-indicators and trends through the analysis of a large-scale dataset of
-microblogging posts. We show that available data allow for the study of
-language geography at scales ranging from country-level aggregation to specific
-city neighborhoods. The high resolution and coverage of the data allows us to
-investigate different indicators such as the linguistic homogeneity of
-different countries, the touristic seasonal patterns within countries and the
-geographical distribution of different languages in multilingual regions. This
-work highlights the potential of geolocalized studies of open data sources to
-improve current analyses and develop indicators for major social phenomena in
-specific communities.
-"
-652,1212.6527,Eugene Yuta Bann,"Discovering Basic Emotion Sets via Semantic Clustering on a Twitter
- Corpus",cs.AI cs.CL," A plethora of words are used to describe the spectrum of human emotions, but
-how many emotions are there really, and how do they interact? Over the past few
-decades, several theories of emotion have been proposed, each based around the
-existence of a set of 'basic emotions', and each supported by an extensive
-variety of research including studies in facial expression, ethology, neurology
-and physiology. Here we present research based on a theory that people transmit
-their understanding of emotions through the language they use surrounding
-emotion keywords. Using a labelled corpus of over 21,000 tweets, six of the
-basic emotion sets proposed in existing literature were analysed using Latent
-Semantic Clustering (LSC), evaluating the distinctiveness of the semantic
-meaning attached to the emotional label.
We hypothesise that the more distinct
-the language used to express a certain emotion, the more distinct the
-perception (including proprioception) of that emotion, and thus the more
-'basic' it is. This allows us to select the dimensions best representing the
-entire spectrum of emotion. We find that Ekman's set, arguably the most
-frequently used for classifying emotions, is in fact the most semantically
-distinct overall. Next, taking all analysed (that is, previously proposed)
-emotion terms into account, we determine the optimal semantically irreducible
-basic emotion set using an iterative LSC algorithm. Our newly-derived set
-(Accepting, Ashamed, Contempt, Interested, Joyful, Pleased, Sleepy, Stressed)
-generates a 6.1% increase in distinctiveness over Ekman's set (Angry,
-Disgusted, Joyful, Sad, Scared). We also demonstrate how using LSC data can
-help visualise emotions. We introduce the concept of an Emotion Profile and
-briefly analyse compound emotions both visually and mathematically.
-"
-653,1301.0570,Joshua Goodman,Reduction of Maximum Entropy Models to Hidden Markov Models,cs.AI cs.CL," We show that maximum entropy (maxent) models can be modeled with certain
-kinds of HMMs, allowing us to construct maxent models with hidden variables,
-hidden state sequences, or other characteristics. The models can be trained
-using the forward-backward algorithm. While the results are primarily of
-theoretical interest, unifying apparently unrelated concepts, we also give
-experimental results for a maxent model with a hidden variable on a word
-disambiguation task; the model outperforms standard techniques.
-"
-654,1301.0722,"Stefan Gerdjikov, Stoyan Mihov, Petar Mitankin, Klaus U. Schulz","Good parts first - a new algorithm for approximate search in lexica and
- string databases",cs.CL cs.DS," We present a new efficient method for approximate search in electronic
-lexica. Given an input string (the pattern) and a similarity threshold, the
-algorithm retrieves all entries of the lexicon that are sufficiently similar to
-the pattern. Search is organized in subsearches that always start with an exact
-partial match where a substring of the input pattern is aligned with a
-substring of a lexicon word. Afterwards this partial match is extended stepwise
-to larger substrings. For aligning further parts of the pattern with
-corresponding parts of lexicon entries, more errors are tolerated at each
-subsequent step. For supporting this alignment order, which may start at any
-part of the pattern, the lexicon is represented as a structure that enables
-immediate access to any substring of a lexicon word and permits the extension
-of such substrings in both directions. Experimental evaluations of the
-approximate search procedure are given that show significant efficiency
-improvements compared to existing techniques. Since the technique can be used
-for large error bounds it offers interesting possibilities for approximate
-search in special collections of ""long"" strings, such as phrases, sentences, or
-book titles.
-"
-655,1301.1429,"Christian M. Alis, May T. Lim",Adaptation of fictional and online conversations to communication media,physics.soc-ph cs.CL physics.data-an," Conversations allow the quick transfer of short bits of information and it is
-reasonable to expect that changes in communication medium affect how we
-converse.
Using conversations in works of fiction and in an online social
-networking platform, we show that the utterance length of conversations is
-slowly shortening with time but adapts more strongly to the constraints of the
-communication medium. This indicates that the introduction of any new medium of
-communication can affect the way natural language evolves.
-"
-656,1301.1950,Bogdan Patrut,"Syntactic Analysis Based on Morphological Characteristic Features of the
- Romanian Language",cs.CL cs.AI," This paper refers to the syntactic analysis of phrases in Romanian, as an
-important process of natural language processing. We suggest a real-time
-solution based on the idea of using words or groups of words that indicate a
-grammatical category, together with specific endings of certain parts of a
-sentence. Our idea is based on some characteristics of the Romanian language,
-where some prepositions, adverbs or specific endings can provide a lot of
-information about the structure of a complex sentence. Such characteristics can
-be found in other languages, too, such as French. Using a special grammar, we
-developed a system (DIASEXP) that can perform a dialogue in natural language
-with assertive and interrogative sentences about a ""story"" (a set of sentences
-describing some events from real life).
-"
-657,1301.2405,"Gelila Tilahun, Andrey Feuerverger, Michael Gervers",Dating medieval English charters,stat.AP cs.CL," Deeds, or charters, dealing with property rights, provide a continuous
-documentation which can be used by historians to study the evolution of social,
-economic and political changes. This study is concerned with charters (written
-in Latin) dating from the tenth through early fourteenth centuries in England.
-Of these, at least one million were left undated, largely due to administrative
-changes introduced by William the Conqueror in 1066. Correctly dating such
-charters is of vital importance in the study of English medieval history. This
-paper is concerned with computer-automated statistical methods for dating such
-document collections, with the goal of reducing the considerable efforts
-required to date them manually and of improving the accuracy of assigned dates.
-Proposed methods are based on such data as the variation over time of word and
-phrase usage, and on measures of distance between documents. The extensive (and
-dated) Documents of Early England Data Set (DEEDS) maintained at the University
-of Toronto was used for this purpose.
-"
-658,1301.2444,"Laurent Romary (ALPAGE, CMB)",TEI and LMF crosswalks,cs.CL," The present paper explores various arguments in favour of making the Text
-Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO
-standard 24613:2008 (LMF, Lexical Markup Framework). It also identifies the
-issues that would have to be resolved in order to reach an appropriate
-implementation of these ideas, in particular in terms of informational
-coverage. We show how the customisation facilities offered by the TEI
-guidelines can provide an adequate background, not only to cover missing
-components within the current Dictionary chapter of the TEI guidelines, but
-also to allow specific lexical projects to deal with local constraints. We
-expect this proposal to be a basis for a future ISO project in the context of
-the ongoing revision of LMF.
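To make the serialisation idea tangible, here is a small sketch that emits one lexical entry as TEI-flavoured XML using Python's standard library. The element names (entry, form, orth, gramGrp, pos, sense, def) are in the spirit of the TEI Dictionary chapter, but the exact mapping from LMF classes is precisely the open question the paper discusses, so the structure is illustrative only.

    import xml.etree.ElementTree as ET

    def tei_entry(lemma, pos, definition):
        # Build one <entry> element in a TEI-like dictionary layout.
        entry = ET.Element("entry")
        form = ET.SubElement(entry, "form", type="lemma")
        ET.SubElement(form, "orth").text = lemma
        gram = ET.SubElement(entry, "gramGrp")
        ET.SubElement(gram, "pos").text = pos
        sense = ET.SubElement(entry, "sense")
        ET.SubElement(sense, "def").text = definition
        return entry

    xml = ET.tostring(tei_entry("charter", "noun",
                                "a legal document granting rights")).decode()
    print(xml)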
-" -659,1301.2466,"Oleg Sychev, Dmitry Mamontov","Determining token sequence mistakes in responses to questions with open - text answer",cs.CL cs.CY," When learning grammar of the new language, a teacher should routinely check -student's exercises for grammatical correctness. The paper describes a method -of automatically detecting and reporting grammar mistakes, regarding an order -of tokens in the response. It could report extra tokens, missing tokens and -misplaced tokens. The method is useful when teaching language, where order of -tokens is important, which includes most formal languages and some natural ones -(like English). The method was implemented in a question type plug-in -CorrectWriting for the widely used learning manage system Moodle. -" -660,1301.2811,Christian Scheible and Hinrich Schuetze,Cutting Recursive Autoencoder Trees,cs.CL cs.AI," Deep Learning models enjoy considerable success in Natural Language -Processing. While deep architectures produce useful representations that lead -to improvements in various tasks, they are often difficult to interpret. This -makes the analysis of learned structures particularly difficult. In this paper, -we rely on empirical tests to see whether a particular structure makes sense. -We present an analysis of the Semi-Supervised Recursive Autoencoder, a -well-known model that produces structural representations of text. We show that -for certain tasks, the structure of the autoencoder can be significantly -reduced without loss of classification accuracy and we evaluate the produced -structures using human judgment. -" -661,1301.2857,Rami Al-Rfou' and Steven Skiena,SpeedRead: A Fast Named Entity Recognition Pipeline,cs.CL," Online content analysis employs algorithmic methods to identify entities in -unstructured text. Both machine learning and knowledge-base approaches lie at -the foundation of contemporary named entities extraction systems. However, the -progress in deploying these approaches on web-scale has been been hampered by -the computational cost of NLP over massive text corpora. We present SpeedRead -(SR), a named entity recognition pipeline that runs at least 10 times faster -than Stanford NLP pipeline. This pipeline consists of a high performance Penn -Treebank- compliant tokenizer, close to state-of-art part-of-speech (POS) -tagger and knowledge-based named entity recognizer. -" -662,1301.3214,"Seungyeon Kim, Fuxin Li, Guy Lebanon, Irfan Essa",The Manifold of Human Emotions,cs.CL," Sentiment analysis predicts the presence of positive or negative emotions in -a text document. In this paper, we consider higher dimensional extensions of -the sentiment concept, which represent a richer set of human emotions. Our -approach goes beyond previous work in that our model contains a continuous -manifold rather than a finite set of human emotions. We investigate the -resulting model, compare it to psychological observations, and explore its -predictive capabilities. -" -663,1301.3226,"Yanqing Chen, Bryan Perozzi, Rami Al-Rfou, Steven Skiena",The Expressive Power of Word Embeddings,cs.LG cs.CL stat.ML," We seek to better understand the difference in quality of the several -publicly released embeddings. We propose several tasks that help to distinguish -the characteristics of different embeddings. Our evaluation of sentiment -polarity and synonym/antonym relations shows that embeddings are able to -capture surprisingly nuanced semantics even in the absence of sentence -structure. 
Moreover, benchmarking the embeddings shows great variance in -quality and characteristics of the semantics captured by the tested embeddings. -Finally, we show the impact of varying the number of dimensions and the -resolution of each dimension on the effective useful features captured by the -embedding space. Our contributions highlight the importance of embeddings for -NLP tasks and the effect of their quality on the final results. -" -664,1301.3547,Benjamin Englard,A Rhetorical Analysis Approach to Natural Language Processing,cs.CL stat.ML," The goal of this research was to find a way to extend the capabilities of -computers through the processing of language in a more human way, and present -applications which demonstrate the power of this method. This research presents -a novel approach, Rhetorical Analysis, to solving problems in Natural Language -Processing (NLP). The main benefit of Rhetorical Analysis, as opposed to -previous approaches, is that it does not require the accumulation of large sets -of training data, but can be used to solve a multitude of problems within the -field of NLP. The NLP problems investigated with Rhetorical Analysis were the -Author Identification problem - predicting the author of a piece of text based -on its rhetorical strategies, Election Prediction - predicting the winner of a -presidential candidate's re-election campaign based on rhetorical strategies -within that president's inaugural address, Natural Language Generation - having -a computer produce text containing rhetorical strategies, and Document -Summarization. The results of this research indicate that an Author -Identification system based on Rhetorical Analysis could predict the correct -author 100% of the time, that a re-election predictor based on Rhetorical -Analysis could predict the correct winner of a re-election campaign 55% of the -time, that a Natural Language Generation system based on Rhetorical Analysis -could output text with up to 87.3% similarity to Shakespeare in style, and that -a Document Summarization system based on Rhetorical Analysis could extract -highly relevant sentences. Overall, this study demonstrated that Rhetorical -Analysis could be a useful approach to solving problems in NLP. -" -665,1301.3605,"Dong Yu, Michael L. Seltzer, Jinyu Li, Jui-Ting Huang, Frank Seide","Feature Learning in Deep Neural Networks - Studies on Speech Recognition - Tasks",cs.LG cs.CL cs.NE eess.AS," Recent studies have shown that deep neural networks (DNNs) perform -significantly better than shallow networks and Gaussian mixture models (GMMs) -on large vocabulary speech recognition tasks. In this paper, we argue that the -improved accuracy achieved by the DNNs is the result of their ability to -extract discriminative internal representations that are robust to the many -sources of variability in speech signals. We show that these representations -become increasingly insensitive to small perturbations in the input with -increasing network depth, which leads to better speech recognition performance -with deeper networks. We also show that DNNs cannot extrapolate to test samples -that are substantially different from the training examples. If the training -data are sufficiently representative, however, internal features learned by the -DNN are relatively stable with respect to speaker differences, bandwidth -differences, and environment distortion. 
This enables DNN-based recognizers to
-perform as well as or better than state-of-the-art systems based on GMMs or
-shallow networks without the need for explicit model adaptation or feature
-normalization.
-"
-666,1301.3614,Tsuyoshi Okita,"Joint Space Neural Probabilistic Language Model for Statistical Machine
- Translation",cs.CL," A neural probabilistic language model (NPLM) offers a way to achieve better
-perplexity than n-gram language models and their smoothed variants. This paper
-investigates its application in bilingual NLP, specifically Statistical Machine
-Translation (SMT). We focus on the prospect that an NPLM opens the possibility
-of complementing `resource-constrained' bilingual resources with potentially
-`huge' monolingual resources. We introduce an ngram-HMM language model as an
-NPLM using a non-parametric Bayesian construction. In order to facilitate the
-application to various tasks, we propose the joint space model of the ngram-HMM
-language model. We show an experiment of system combination in the area of SMT.
-One discovery was that our treatment of noise improved the results by 0.20 BLEU
-points when the NPLM is trained on a relatively small corpus, in our case
-500,000 sentence pairs, which is often the situation due to the long training
-time of NPLMs.
-"
-667,1301.3618,"Danqi Chen, Richard Socher, Christopher D. Manning, Andrew Y. Ng","Learning New Facts From Knowledge Bases With Neural Tensor Networks and
- Semantic Word Vectors",cs.CL cs.LG," Knowledge bases provide applications with the benefit of easily accessible,
-systematic relational knowledge but often suffer in practice from their
-incompleteness and lack of knowledge of new entities and relations. Much work
-has focused on building or extending them by finding patterns in large
-unannotated text corpora. In contrast, here we mainly aim to complete a
-knowledge base by predicting additional true relationships between entities,
-based on generalizations that can be discerned in the given knowledge base. We
-introduce a neural tensor network (NTN) model which predicts new relationship
-entries that can be added to the database. This model can be improved by
-initializing entity representations with word vectors learned in an
-unsupervised fashion from text, and when doing this, existing relations can
-even be queried for entities that were not present in the database. Our model
-generalizes and outperforms existing models for this problem, and can classify
-unseen relationships in WordNet with an accuracy of 75.8%.
-"
-668,1301.3627,"Hinrich Schuetze, Christian Scheible",Two SVDs produce more focal deep learning representations,cs.CL cs.LG," A key characteristic of work on deep learning and neural networks in general
-is that it relies on representations of the input that support generalization,
-robust inference, domain adaptation and other desirable functionalities. Much
-recent progress in the field has focused on efficient and effective methods for
-computing representations. In this paper, we propose an alternative method that
-is more efficient than prior work and produces representations that have a
-property we call focality -- a property we hypothesize to be important for
-neural network representations. The method consists of a simple application of
-two consecutive SVDs and is inspired by Anandkumar (2012).
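A minimal sketch of the two-consecutive-SVD recipe, assuming a word-by-context co-occurrence matrix C; the truncation dimensions and the singular-value rescaling are guesses for illustration, since the abstract only outlines the construction.

    import numpy as np

    def two_svd_embeddings(C, d1=100, d2=50):
        # First truncated SVD: reduce the raw co-occurrence matrix.
        U1, s1, _ = np.linalg.svd(C, full_matrices=False)
        X1 = U1[:, :d1] * s1[:d1]       # first-pass word representations
        # Second truncated SVD: re-factorise the first-pass representations.
        U2, s2, _ = np.linalg.svd(X1, full_matrices=False)
        return U2[:, :d2] * s2[:d2]     # final, hopefully more 'focal', vectors

    # Toy usage on a random stand-in for a 500-word x 300-context matrix.
    emb = two_svd_embeddings(np.random.rand(500, 300))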
-" -669,1301.3781,"Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean",Efficient Estimation of Word Representations in Vector Space,cs.CL," We propose two novel model architectures for computing continuous vector -representations of words from very large data sets. The quality of these -representations is measured in a word similarity task, and the results are -compared to the previously best performing techniques based on different types -of neural networks. We observe large improvements in accuracy at much lower -computational cost, i.e. it takes less than a day to learn high quality word -vectors from a 1.6 billion words data set. Furthermore, we show that these -vectors provide state-of-the-art performance on our test set for measuring -syntactic and semantic word similarities. -" -670,1301.4432,"Anne S. Hsu (Department of Cognitive, Perceptual and Brain Sciences, - University College London), Nick Chater (Behavioural Science Group, Warwick - Business School, University of Warwick), Paul M.B. Vit\'anyi (CWI, Amsterdam)","Language learning from positive evidence, reconsidered: A - simplicity-based approach",cs.CL," Children learn their native language by exposure to their linguistic and -communicative environment, but apparently without requiring that their mistakes -are corrected. Such learning from positive evidence has been viewed as raising -logical problems for language acquisition. In particular, without correction, -how is the child to recover from conjecturing an over-general grammar, which -will be consistent with any sentence that the child hears? There have been many -proposals concerning how this logical problem can be dissolved. Here, we review -recent formal results showing that the learner has sufficient data to learn -successfully from positive evidence, if it favours the simplest encoding of the -linguistic input. Results include the ability to learn a linguistic prediction, -grammaticality judgements, language production, and form-meaning mappings. The -simplicity approach can also be scaled-down to analyse the ability to learn a -specific linguistic constructions, and is amenable to empirical test as a -framework for describing human language acquisition. -" -671,1301.4938,"Christian Retor\'e (LaBRI, IRIT)","A type theoretical framework for natural language semantics: the - Montagovian generative lexicon",cs.LO cs.CL," We present a framework, named the Montagovian generative lexicon, for -computing the semantics of natural language sentences, expressed in many sorted -higher order logic. Word meaning is depicted by lambda terms of second order -lambda calculus (Girard's system F) with base types including a type for -propositions and many types for sorts of a many sorted logic. This framework is -able to integrate a proper treatment of lexical phenomena into a Montagovian -compositional semantics, including the restriction of selection which imposes -the nature of the arguments of a predicate, and the possible adaptation of a -word meaning to some contexts. Among these adaptations of a word's sense to the -context, ontological inclusions are handled by an extension of system F with -coercive subtyping that is introduced in the present paper. The benefits of -this framework for lexical pragmatics are illustrated on meaning transfers and -coercions, on possible and impossible copredication over different senses, on -deverbal ambiguities, and on ""fictive motion"". Next we show that the -compositional treatment of determiners, quantifiers, plurals,... 
is finer
-grained in our framework. We then conclude with the linguistic, logical and
-computational perspectives opened by the Montagovian generative lexicon.
-"
-672,1301.5686,"Jeon-Hyung Kang, Jun Ma, Yan Liu",Transfer Topic Modeling with Ease and Scalability,cs.CL cs.LG stat.ML," The increasing volume of short texts generated on social media sites, such as
-Twitter or Facebook, creates a great demand for effective and efficient topic
-modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it
-is not optimal due to its weakness in handling short texts with fast-changing
-topics and scalability concerns. In this paper, we propose a transfer learning
-approach that utilizes abundant labeled documents from other domains (such as
-Yahoo! News or Wikipedia) to improve topic modeling, with better model fitting
-and result interpretation. Specifically, we develop the Transfer Hierarchical
-LDA (thLDA) model, which incorporates the label information from other domains
-via informative priors. In addition, we develop a parallel implementation of
-our model for large-scale applications. We demonstrate the effectiveness of our
-thLDA model on both a microblogging dataset and standard text collections
-including the AP and RCV1 datasets.
-"
-673,1301.6939,"Edward Grefenstette, Georgiana Dinu, Yao-Zhong Zhang, Mehrnoosh
- Sadrzadeh and Marco Baroni","Multi-Step Regression Learning for Compositional Distributional
- Semantics",cs.CL cs.LG," We present a model for compositional distributional semantics related to the
-framework of Coecke et al. (2010), and emulating formal semantics by
-representing functions as tensors and arguments as vectors. We introduce a new
-learning method for tensors, generalising the approach of Baroni and Zamparelli
-(2010). We evaluate it on two benchmark data sets, and find it to outperform
-existing leading methods. We argue in our analysis that the nature of this
-learning method also renders it suitable for solving more subtle problems
-compositional distributional models might face.
-"
-674,1301.7382,"David Heckerman, Eric J. Horvitz","Inferring Informational Goals from Free-Text Queries: A Bayesian
- Approach",cs.IR cs.AI cs.CL," People using consumer software applications typically do not use technical
-jargon when querying an online database of help topics. Rather, they attempt to
-communicate their goals with common words and phrases that describe software
-functionality in terms of structure and objects they understand. We describe a
-Bayesian approach to modeling the relationship between words in a user's query
-for assistance and the informational goals of the user. After reviewing the
-general method, we describe several extensions that center on integrating
-additional distinctions and structure about language usage and user goals into
-the Bayesian models.
-"
-675,1301.7738,"Fl\'avio Code\c{c}o Coelho and Renato Rocha Souza and \'Alvaro Justen
- and Fl\'avio Amieiro and Heliana Mello",PyPLN: a Distributed Platform for Natural Language Processing,cs.CL cs.IR," This paper presents a distributed platform for Natural Language Processing
-called PyPLN. PyPLN leverages a vast array of NLP and text processing open
-source tools, managing the distribution of the workload on a variety of
-configurations: from a single server to a cluster of Linux servers. PyPLN is
-developed using Python 2.7.3 but makes it very easy to incorporate other
-software for specific tasks as long as a Linux version is available.
PyPLN
-facilitates analyses both at the document and corpus levels, simplifying
-management and publication of corpora and analytical results through an
-easy-to-use web interface. In the current (beta) release, it supports English
-and Portuguese, with support for other languages planned for future releases.
-To support the Portuguese language PyPLN uses the PALAVRAS parser (Bick, 2000).
-Currently PyPLN offers the following features: text extraction with encoding
-normalization (to UTF-8), part-of-speech tagging, token frequency, semantic
-annotation, n-gram extraction, word and sentence repertoire, and full-text
-search across corpora. The platform is licensed as GPL-v3.
-"
-676,1302.0393,"Bob Coecke, Edward Grefenstette, and Mehrnoosh Sadrzadeh","Lambek vs. Lambek: Functorial Vector Space Semantics and String Diagrams
- for Lambek Calculus",math.LO cs.CL math.CT," The Distributional Compositional Categorical (DisCoCat) model is a
-mathematical framework that provides compositional semantics for meanings of
-natural language sentences. It consists of a computational procedure for
-constructing meanings of sentences, given their grammatical structure in terms
-of compositional type-logic, and given the empirically derived meanings of
-their words. For the particular case that the meaning of words is modelled
-within a distributional vector space model, its experimental predictions,
-derived from real large scale data, have outperformed other empirically
-validated methods that could build vectors for a full sentence. This success
-can be attributed to a conceptually motivated mathematical underpinning, by
-integrating qualitative compositional type-logic and quantitative modelling of
-meaning within a category-theoretic mathematical framework.
- The type-logic used in the DisCoCat model is Lambek's pregroup grammar.
-Pregroup types form a posetal compact closed category, which can be passed, in
-a functorial manner, on to the compact closed structure of vector spaces,
-linear maps and tensor product. The diagrammatic versions of the equational
-reasoning in compact closed categories can be interpreted as the flow of word
-meanings within sentences. Pregroups simplify Lambek's previous type-logic, the
-Lambek calculus, which has been extensively used to formalise and reason about
-various linguistic phenomena. The apparent reliance of the DisCoCat on
-pregroups has been seen as a shortcoming. This paper addresses this concern, by
-pointing out that one may as well realise a functorial passage from the
-original type-logic of Lambek, a monoidal bi-closed category, to vector spaces,
-or to any other model of meaning organised within a monoidal bi-closed
-category. The corresponding string diagram calculus, due to Baez and Stay, now
-depicts the flow of word meanings.
-"
-677,1302.1123,"Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson",Large Scale Distributed Acoustic Modeling With Back-off N-grams,cs.CL," The paper revives an older approach to acoustic modeling that borrows from
-n-gram language modeling in an attempt to scale up both the amount of training
-data and model size (as measured by the number of parameters in the model), to
-approximately 100 times larger than current sizes used in automatic speech
-recognition. In such a data-rich setting, we can expand the phonetic context
-significantly beyond triphones, as well as increase the number of Gaussian
-mixture components for the context-dependent states that allow it.
We have
-experimented with contexts that span seven or more context-independent phones,
-and up to 620 mixture components per state. Dealing with unseen phonetic
-contexts is accomplished using the familiar back-off technique from language
-modeling, chosen for its implementation simplicity. The back-off acoustic model is
-estimated, stored and served using the MapReduce distributed computing
-infrastructure.
- Speech recognition experiments are carried out in an N-best list rescoring
-framework for Google Voice Search. Training big models on large amounts of data
-proves to be an effective way to increase the accuracy of a state-of-the-art
-automatic speech recognition system. We use 87,000 hours of training data
-(speech along with transcription) obtained by filtering utterances in Voice
-Search logs on automatic speech recognition confidence. Models ranging in size
-between 20--40 million Gaussians are estimated using maximum likelihood
-training. They achieve relative reductions in word-error-rate of 11% and 6%
-when combined with first-pass models trained using maximum likelihood, and
-boosted maximum mutual information, respectively. Increasing the context size
-beyond five phones (quinphones) does not help.
-"
-678,1302.1380,"Catarina Moreira and Ana Cristina Mendes and Lu\'isa Coheur and Bruno
- Martins",Towards the Rapid Development of a Natural Language Understanding Module,cs.CL," When developing a conversational agent, there is often an urgent need to have
-a prototype available in order to test the application with real users. A
-Wizard of Oz is a possibility, but sometimes the agent should be simply
-deployed in the environment where it will be used. Here, the agent should be
-able to capture as many interactions as possible and to understand how people
-react to failure. In this paper, we focus on the rapid development of a natural
-language understanding module by non-experts. Our approach follows the learning
-paradigm and sees the process of understanding natural language as a
-classification problem. We test our module with a conversational agent that
-answers questions in the art domain. Moreover, we show how our approach can be
-used by a natural language interface to a cinema database.
-"
-679,1302.1422,"Christian Retor\'e (LaBRI, IRIT)",S\'emantique des d\'eterminants dans un cadre richement typ\'e,cs.CL," The variation of word meaning according to the context leads us to enrich the
-type system of our syntactic and semantic analyser of French based on
-categorial grammars and Montague semantics (or lambda-DRT). The main advantage
-of a deep semantic analysis is to represent meaning by logical formulae that
-can be easily used, e.g., for inferences. Determiners and quantifiers play a
-fundamental role in the construction of those formulae. But in our rich type
-system the usual semantic terms do not work. We propose a solution inspired
-by the tau and epsilon operators of Hilbert, which are kinds of generic
-elements and choice functions. This approach unifies the treatment of the
-different determiners and quantifiers as well as the dynamic binding of pronouns. Above all,
-this fully computational view fits in well within the wide-coverage parser
-Grail, both from a theoretical and a practical viewpoint. 
-" -680,1302.1572,"Ian Thomas, Ingrid Zukerman, Jonathan Oliver, David Albrecht, Bhavani - Raskutti","Lexical Access for Speech Understanding using Minimum Message Length - Encoding",cs.CL," The Lexical Access Problem consists of determining the intended sequence of -words corresponding to an input sequence of phonemes (basic speech sounds) that -come from a low-level phoneme recognizer. In this paper we present an -information-theoretic approach based on the Minimum Message Length Criterion -for solving the Lexical Access Problem. We model sentences using phoneme -realizations seen in training, and word and part-of-speech information obtained -from text corpora. We show results on multiple-speaker, continuous, read speech -and discuss a heuristic using equivalence classes of similar sounding words -which speeds up the recognition process without significant deterioration in -recognition accuracy. -" -681,1302.1612,"Hanane Froud, Abdelmonaime Lachkar, Said Alaoui Ouatik","Arabic text summarization based on latent semantic analysis to enhance - arabic documents clustering",cs.IR cs.CL," Arabic Documents Clustering is an important task for obtaining good results -with the traditional Information Retrieval (IR) systems especially with the -rapid growth of the number of online documents present in Arabic language. -Documents clustering aim to automatically group similar documents in one -cluster using different similarity/distance measures. This task is often -affected by the documents length, useful information on the documents is often -accompanied by a large amount of noise, and therefore it is necessary to -eliminate this noise while keeping useful information to boost the performance -of Documents clustering. In this paper, we propose to evaluate the impact of -text summarization using the Latent Semantic Analysis Model on Arabic Documents -Clustering in order to solve problems cited above, using five -similarity/distance measures: Euclidean Distance, Cosine Similarity, Jaccard -Coefficient, Pearson Correlation Coefficient and Averaged Kullback-Leibler -Divergence, for two times: without and with stemming. Our experimental results -indicate that our proposed approach effectively solves the problems of noisy -information and documents length, and thus significantly improve the clustering -performance. -" -682,1302.2131,Bohdan Pavlyshenko,"Data Mining of the Concept ""End of the World"" in Twitter Microblogs",cs.SI cs.CL cs.IR physics.soc-ph," This paper describes the analysis of quantitative characteristics of frequent -sets and association rules in the posts of Twitter microblogs, related to the -discussion of ""end of the world"", which was allegedly predicted on December 21, -2012 due to the Mayan calendar. Discovered frequent sets and association rules -characterize semantic relations between the concepts of analyzed subjects.The -support for some fequent sets reaches the global maximum before the expected -event with some time delay. Such frequent sets may be considered as predictive -markers that characterize the significance of expected events for blogosphere -users. It was shown that time dynamics of confidence of some revealed -association rules can also have predictive characteristics. Exceeding a certain -threshold, it may be a signal for the corresponding reaction in the society -during the time interval between the maximum and probable coming of an event. 
-" -683,1302.2569,Olivier Catoni and Thomas Mainguy,Toric grammars: a new statistical approach to natural language modeling,stat.ML cs.CL math.PR," We propose a new statistical model for computational linguistics. Rather than -trying to estimate directly the probability distribution of a random sentence -of the language, we define a Markov chain on finite sets of sentences with many -finite recurrent communicating classes and define our language model as the -invariant probability measures of the chain on each recurrent communicating -class. This Markov chain, that we call a communication model, recombines at -each step randomly the set of sentences forming its current state, using some -grammar rules. When the grammar rules are fixed and known in advance instead of -being estimated on the fly, we can prove supplementary mathematical properties. -In particular, we can prove in this case that all states are recurrent states, -so that the chain defines a partition of its state space into finite recurrent -communicating classes. We show that our approach is a decisive departure from -Markov models at the sentence level and discuss its relationships with Context -Free Grammars. Although the toric grammars we use are closely related to -Context Free Grammars, the way we generate the language from the grammar is -qualitatively different. Our communication model has two purposes. On the one -hand, it is used to define indirectly the probability distribution of a random -sentence of the language. On the other hand it can serve as a (crude) model of -language transmission from one speaker to another speaker through the -communication of a (large) set of sentences. -" -684,1302.3057,"Jacob Dlougach, Irina Galinskaya",Building a reordering system using tree-to-string hierarchical model,cs.CL," This paper describes our submission to the First Workshop on Reordering for -Statistical Machine Translation. We have decided to build a reordering system -based on tree-to-string model, using only publicly available tools to -accomplish this task. With the provided training data we have built a -translation model using Moses toolkit, and then we applied a chart decoder, -implemented in Moses, to reorder the sentences. Even though our submission only -covered English-Farsi language pair, we believe that the approach itself should -work regardless of the choice of the languages, so we have also carried out the -experiments for English-Italian and English-Urdu. For these language pairs we -have noticed a significant improvement over the baseline in BLEU, Kendall-Tau -and Hamming metrics. A detailed description is given, so that everyone can -reproduce our results. Also, some possible directions for further improvements -are discussed. -" -685,1302.3831,Diederik Aerts and Sandro Sozzo,Quantum Entanglement in Concept Combinations,cs.AI cs.CL quant-ph," Research in the application of quantum structures to cognitive science -confirms that these structures quite systematically appear in the dynamics of -concepts and their combinations and quantum-based models faithfully represent -experimental data of situations where classical approaches are problematical. -In this paper, we analyze the data we collected in an experiment on a specific -conceptual combination, showing that Bell's inequalities are violated in the -experiment. 
We
-present a new refined entanglement scheme to model these data
-within standard quantum theory rules, where 'entangled measurements and
-entangled evolutions' occur, in addition to the expected 'entangled states',
-and present a full quantum representation in complex Hilbert space of the data.
-This stronger form of entanglement in measurements and evolutions might have
-relevant applications in the foundations of quantum theory, as well as in the
-interpretation of nonlocality tests. It could indeed explain some
-non-negligible 'anomalies' identified in EPR-Bell experiments.
-"
-686,1302.3892,"Eduardo G. Altmann, Zakary L. Whichard, Adilson E. Motter",Identifying trends in word frequency dynamics,physics.soc-ph cond-mat.dis-nn cs.CL q-bio.PE," The word-stock of a language is a complex dynamical system in which words can
-be created, evolve, and become extinct. Even more dynamic are the short-term
-fluctuations in word usage by individuals in a population. Building on the
-recent demonstration that word niche is a strong determinant of future rise or
-fall in word frequency, here we introduce a model that allows us to distinguish
-persistent from temporary increases in frequency. Our model is illustrated
-using a 10^8-word database from an online discussion group and a 10^11-word
-collection of digitized books. The model reveals a strong relation between
-changes in word dissemination and changes in frequency. Aside from their
-implications for short-term word frequency dynamics, these observations are
-potentially important for language evolution as new words must survive in the
-short term in order to survive in the long term.
-"
-687,1302.4383,"Armen E. Allahverdyan, Weibing Deng, and Q. A. Wang",Explaining Zipf's Law via Mental Lexicon,physics.data-an cond-mat.stat-mech cs.CL," Zipf's law is the major regularity of statistical linguistics that served
-as a prototype for rank-frequency relations and scaling laws in the natural
-sciences. Here we show that Zipf's law -- together with its applicability
-to a single text and its generalizations to high and low frequencies including
-hapax legomena -- can be derived from assuming that the words are drawn into
-the text with random probabilities. Their a priori density relates, via
-Bayesian statistics, to general features of the mental lexicon of the author
-who produced the text.
-"
-688,1302.4465,"Diego R. Amancio, Osvaldo N. Oliveira Jr. and Luciano da F. Costa","Unveiling the relationship between complex networks metrics and word
- senses",physics.soc-ph cs.CL cs.SI physics.data-an," The automatic disambiguation of word senses (i.e., the identification of
-which of the meanings is used in a given context for a word that has multiple
-meanings) is essential for such applications as machine translation and
-information retrieval, and represents a key step for developing the so-called
-Semantic Web. Humans disambiguate words in a straightforward fashion, but this
-does not apply to computers. In this paper we address the problem of Word Sense
-Disambiguation (WSD) by treating texts as complex networks, and show that word
-senses can be distinguished upon characterizing the local structure around
-ambiguous words. Our goal was not to obtain the best possible disambiguation
-system, but we nevertheless found that in half of the cases our approach
-outperforms traditional shallow methods. We show that the hierarchical
-connectivity and clustering of words are usually the most relevant features for
-WSD. 
The results reported here shed light on the relationship between semantic
-and structural parameters of complex networks. They also indicate that, when
-combined with traditional techniques, the complex network approach may be useful
-to enhance the discrimination of senses in large texts.
-"
-689,1302.4471,Thiago C. Silva and Diego R. Amancio,Word sense disambiguation via high order of learning in complex networks,physics.soc-ph cs.CL cs.SI physics.data-an," Complex networks have been employed to model many real systems and as a
-modeling tool in a myriad of applications. In this paper, we apply the framework
-of complex networks to the problem of supervised classification in the word
-disambiguation task, which consists in deriving a function from the supervised
-(or labeled) training data of ambiguous words. Traditional supervised data
-classification takes into account only topological or physical features of the
-input data. On the other hand, the human (animal) brain performs both low- and
-high-level orders of learning and can readily identify patterns
-according to the semantic meaning of the input data. In this paper, we apply a
-hybrid technique which encompasses both types of learning in the field of word
-sense disambiguation and show that the high-level order of learning can really
-improve the accuracy rate of the model. This evidence serves to demonstrate
-that the internal structures formed by the words do present patterns that,
-generally, cannot be correctly unveiled by only traditional techniques.
-Finally, we exhibit the behavior of the model for different weights of the low-
-and high-level classifiers by plotting decision boundaries. This study helps
-one to better understand the effectiveness of the model.
-"
-690,1302.4489,Sa Liu and Chengzhi Zhang,"Termhood-based Comparability Metrics of Comparable Corpus in Special
- Domain",cs.CL," Cross-Language Information Retrieval (CLIR) and machine translation (MT)
-resources, such as dictionaries and parallel corpora, are scarce and hard to
-come by for special domains. Moreover, these resources are limited to a few
-languages, such as English, French, and Spanish. Obtaining comparable corpora
-automatically for such domains could therefore be an effective answer to this
-problem. Comparable corpora, in which the subcorpora are not
-translations of each other, can be easily obtained from the web. Therefore,
-building and using comparable corpora is often a more feasible option in
-multilingual information processing. Comparability metrics are one of the key issues
-in the field of building and using comparable corpora. Currently, there is no
-widely accepted definition or metric of corpus comparability. In fact,
-different definitions or metrics of comparability might be given to
-suit various natural language processing tasks. A new comparability metric,
-namely a termhood-based metric, oriented to the task of bilingual terminology
-extraction, is proposed in this paper. In this method, words are ranked by
-termhood rather than frequency, and the cosine similarity, calculated based on
-the ranked lists of word termhood, is used as the comparability measure. Experimental
-results show that the termhood-based metric performs better than the traditional
-frequency-based metric.
-"
-691,1302.4490,"Diego R. Amancio, Sandra M. Aluisio, Osvaldo N. Oliveira Jr. and
- Luciano da F. 
Costa",Complex networks analysis of language complexity,physics.soc-ph cs.CL cs.SI physics.data-an," Methods from statistical physics, such as those involving complex networks, -have been increasingly used in quantitative analysis of linguistic phenomena. -In this paper, we represented pieces of text with different levels of -simplification in co-occurrence networks and found that topological regularity -correlated negatively with textual complexity. Furthermore, in less complex -texts the distance between concepts, represented as nodes, tended to decrease. -The complex networks metrics were treated with multivariate pattern recognition -techniques, which allowed us to distinguish between original texts and their -simplified versions. For each original text, two simplified versions were -generated manually with increasing number of simplification operations. As -expected, distinction was easier for the strongly simplified versions, where -the most relevant metrics were node strength, shortest paths and diversity. -Also, the discrimination of complex texts was improved with higher hierarchical -network metrics, thus pointing to the usefulness of considering wider contexts -around the concepts. Though the accuracy rate in the distinction was not as -high as in methods using deep linguistic knowledge, the complex network -approach is still useful for a rapid screening of texts whenever assessing -complexity is essential to guarantee accessibility to readers with limited -reading ability -" -692,1302.4492,Chengzhi Zhang and Dan Wu,Bilingual Terminology Extraction Using Multi-level Termhood,cs.CL," Purpose: Terminology is the set of technical words or expressions used in -specific contexts, which denotes the core concept in a formal discipline and is -usually applied in the fields of machine translation, information retrieval, -information extraction and text categorization, etc. Bilingual terminology -extraction plays an important role in the application of bilingual dictionary -compilation, bilingual Ontology construction, machine translation and -cross-language information retrieval etc. This paper addresses the issues of -monolingual terminology extraction and bilingual term alignment based on -multi-level termhood. - Design/methodology/approach: A method based on multi-level termhood is -proposed. The new method computes the termhood of the terminology candidate as -well as the sentence that includes the terminology by the comparison of the -corpus. Since terminologies and general words usually have differently -distribution in the corpus, termhood can also be used to constrain and enhance -the performance of term alignment when aligning bilingual terms on the parallel -corpus. In this paper, bilingual term alignment based on termhood constraints -is presented. - Findings: Experiment results show multi-level termhood can get better -performance than existing method for terminology extraction. If termhood is -used as constrain factor, the performance of bilingual term alignment can be -improved. -" -693,1302.4619,"D.V. Lande, A.A.Snarskii",Compactified Horizontal Visibility Graph for the Language Network,cs.CL cs.DS," A compactified horizontal visibility graph for the language network is -proposed. It was found that the networks constructed in such way are scale -free, and have a property that among the nodes with largest degrees there are -words that determine not only a text structure communication, but also its -informational structure. 
-" -694,1302.4726,"Khalil Riad Bouzidi (INRIA Sophia Antipolis / Laboratoire I3S), Bruno - Fies (CSTB Sophia Antipolis), Marc Bourdeau (CSTB Sophia Antipolis), - Catherine Faron-Zucker (INRIA Sophia Antipolis / Laboratoire I3S), Nhan - Le-Thanh (I3S)","An Ontology for Modelling and Supporting the Process of Authoring - Technical Assessments",cs.IR cs.CL cs.DL," In this paper, we present a semantic web approach for modelling the process -of creating new technical and regulatory documents related to the Building -sector. This industry, among other industries, is currently experiencing a -phenomenal growth in its technical and regulatory texts. Therefore, it is -urgent and crucial to improve the process of creating regulations by automating -it as much as possible. We focus on the creation of particular technical -documents issued by the French Scientific and Technical Centre for Building -(CSTB), called Technical Assessments, and we propose services based on Semantic -Web models and techniques for modelling the process of their creation. -" -695,1302.4811,"Khalil Riad Bouzidi (INRIA Sophia Antipolis / Laboratoire I3S), - Catherine Faron-Zucker (INRIA Sophia Antipolis / Laboratoire I3S), Bruno Fies - (CSTB Sophia Antipolis), Olivier Corby (INRIA Sophia Antipolis / Laboratoire - I3S), Le-Thanh Nhan (I3S)","Towards a Semantic-based Approach for Modeling Regulatory Documents in - Building Industry",cs.CL," Regulations in the Building Industry are becoming increasingly complex and -involve more than one technical area. They cover products, components and -project implementation. They also play an important role to ensure the quality -of a building, and to minimize its environmental impact. In this paper, we are -particularly interested in the modeling of the regulatory constraints derived -from the Technical Guides issued by CSTB and used to validate Technical -Assessments. We first describe our approach for modeling regulatory constraints -in the SBVR language, and formalizing them in the SPARQL language. Second, we -describe how we model the processes of compliance checking described in the -CSTB Technical Guides. Third, we show how we implement these processes to -assist industrials in drafting Technical Documents in order to acquire a -Technical Assessment; a compliance report is automatically generated to explain -the compliance or noncompliance of this Technical Documents. -" -696,1302.4813,"Jackie Chi Kit Cheung, Hoifung Poon, Lucy Vanderwende",Probabilistic Frame Induction,cs.CL," In natural-language discourse, related events tend to appear near each other -to describe a larger scenario. Such structures can be formalized by the notion -of a frame (a.k.a. template), which comprises a set of related events and -prototypical participants and event transitions. Identifying frames is a -prerequisite for information extraction and natural language generation, and is -usually done manually. Methods for inducing frames have been proposed recently, -but they typically use ad hoc procedures and are difficult to diagnose or -extend. In this paper, we propose the first probabilistic approach to frame -induction, which incorporates frames, events, participants as latent topics and -learns those frame and event transitions that best explain the text. The number -of frames is inferred by a novel application of a split-merge method from -syntactic parsing. 
In end-to-end evaluations from text to induced frames and
-extracted facts, our method produced state-of-the-art results while
-substantially reducing engineering effort.
-"
-697,1302.4814,"Georges Antoniadis (LIDILEM), Sylviane Granger, Olivier Kraif
- (LIDILEM), Claude Ponton (LIDILEM), Virginie Zampa (LIDILEM)",NLP and CALL: integration is working,cs.CL," In the first part of this article, we explore the background of
-computer-assisted learning from its beginnings in the early XIXth century and
-the first teaching machines, founded on theories of learning, at the start of
-the XXth century. With the arrival of the computer, it became possible to offer
-language learners different types of language activities such as comprehension
-tasks, simulations, etc. However, these have limits that cannot be overcome
-without some contribution from the field of natural language processing (NLP).
-In what follows, we examine the challenges faced and the issues raised by
-integrating NLP into CALL. We hope to demonstrate that the key to success in
-integrating NLP into CALL is to be found in multidisciplinary work between
-computer experts, linguists, language teachers, didacticians and NLP
-specialists.
-"
-698,1302.4874,"Gon\c{c}alo Sim\~oes, Helena Galhardas, David Matos",A Labeled Graph Kernel for Relationship Extraction,cs.CL cs.LG," In this paper, we propose an approach for Relationship Extraction (RE) based
-on labeled graph kernels. The kernel we propose is a particularization of a
-random walk kernel that exploits two properties previously studied in the RE
-literature: (i) the words between the candidate entities or connecting them in
-a syntactic representation are particularly likely to carry information
-regarding the relationship; and (ii) combining information from distinct
-sources in a kernel may help the RE system make better decisions. We performed
-experiments on a dataset of protein-protein interactions and the results show
-that our approach obtains effectiveness values that are comparable with
-state-of-the-art kernel methods. Moreover, our approach is able to outperform
-the state-of-the-art kernels when combined with other kernel methods.
-"
-699,1302.5181,Mark Burgin,Basic Classes of Grammars with Prohibition,cs.FL cs.CL," A practical tool for natural language modeling and development of
-human-machine interaction is developed in the context of formal grammars and
-languages. A new type of formal grammars, called grammars with prohibition, is
-introduced. Grammars with prohibition provide more powerful tools for natural
-language generation and better describe processes of language learning than
-conventional formal grammars. Here we study relations between languages
-generated by different grammars with prohibition based on conventional types of
-formal grammars such as context-free or context-sensitive grammars. Besides, we
-compare languages generated by different grammars with prohibition and
-languages generated by conventional formal grammars. In particular, it is
-demonstrated that they have essentially higher computational power and
-expressive possibilities in comparison with conventional formal grammars.
-Thus, while conventional formal grammars are recursive and subrecursive
-algorithms, many classes of grammars with prohibition are superrecursive
-algorithms. 
Results presented in this work are aimed at the development of
-human-machine interaction, modeling natural languages, empowerment of
-programming languages, computer simulation, better software systems, and the
-theory of recursion.
-"
-700,1302.5526,"Rainer Reisenauer, Kenny Smith and Richard A. Blythe","Stochastic dynamics of lexicon learning in an uncertain and nonuniform
- world",physics.soc-ph cond-mat.stat-mech cs.CL q-bio.NC," We study the time taken by a language learner to correctly identify the
-meaning of all words in a lexicon under conditions where many plausible
-meanings can be inferred whenever a word is uttered. We show that the most
-basic form of cross-situational learning - whereby information from multiple
-episodes is combined to eliminate incorrect meanings - can perform badly when
-words are learned independently and meanings are drawn from a nonuniform
-distribution. If learners further assume that no two words share a common
-meaning, we find a phase transition between a maximally-efficient learning
-regime, where the learning time is reduced to the shortest it can possibly be,
-and a partially-efficient regime where incorrect candidate meanings for words
-persist at late times. We obtain exact results for the word-learning process
-through an equivalence to a statistical mechanical problem of enumerating loops
-in the space of word-meaning mappings.
-"
-701,1302.5645,Djallel Bouneffouf,Role of temporal inference in the recognition of textual inference,cs.CL," This project is part of natural language processing and aims to develop a
-textual inference recognition system named TIMINF. This type of system can
-detect, given two portions of text, whether one text is semantically deduced
-from the other. We focus on handling temporal inference in this type of system.
-For that purpose we have built and analyzed a corpus of questions collected
-through the web. This study has enabled us to classify different types of
-temporal inference and to design the architecture of TIMINF, which seeks to
-integrate a temporal inference module into a textual inference detection
-system. We also assess the performance of the output of the TIMINF system on a
-test corpus with the same strategy adopted in the RTE challenge.
-"
-702,1302.5675,"Wafa N. Bdour, Natheer K. Gharaibeh",Development of Yes/No Arabic Question Answering System,cs.CL cs.IR," Developing Question Answering systems has been one of the important research
-issues because it requires insights from a variety of disciplines, including
-Artificial Intelligence, Information Retrieval, Information Extraction,
-Natural Language Processing, and Psychology. In this paper we realize a formal
-model for a lightweight semantic-based open-domain yes/no Arabic question
-answering system based on paragraph retrieval with variable length. We propose
-a constrained semantic representation, using an explicit unification framework
-based on semantic similarities and query expansion (synonyms and antonyms).
-This frequently improves the precision of the system. Employing the passage
-retrieval system achieves better precision by retrieving more paragraphs that
-contain relevant answers to the question; it significantly reduces the amount
-of text to be processed by the system. 
-" -703,1302.6334,"Guillaume Bonfante (LORIA Universit\'e de Lorraine), Bruno Guillaume - (LORIA Inria Nancy Grand-Est)",Non-simplifying Graph Rewriting Termination,cs.CL cs.CC cs.LO," So far, a very large amount of work in Natural Language Processing (NLP) rely -on trees as the core mathematical structure to represent linguistic -informations (e.g. in Chomsky's work). However, some linguistic phenomena do -not cope properly with trees. In a former paper, we showed the benefit of -encoding linguistic structures by graphs and of using graph rewriting rules to -compute on those structures. Justified by some linguistic considerations, graph -rewriting is characterized by two features: first, there is no node creation -along computations and second, there are non-local edge modifications. Under -these hypotheses, we show that uniform termination is undecidable and that -non-uniform termination is decidable. We describe two termination techniques -based on weights and we give complexity bound on the derivation length for -these rewriting system. -" -704,1302.6777,"Greg Adams, Beth Millar, Eric Neufeld, Tim Philip",Ending-based Strategies for Part-of-speech Tagging,cs.CL," Probabilistic approaches to part-of-speech tagging rely primarily on -whole-word statistics about word/tag combinations as well as contextual -information. But experience shows about 4 per cent of tokens encountered in -test sets are unknown even when the training set is as large as a million -words. Unseen words are tagged using secondary strategies that exploit word -features such as endings, capitalizations and punctuation marks. In this work, -word-ending statistics are primary and whole-word statistics are secondary. -First, a tagger was trained and tested on word endings only. Subsequent -experiments added back whole-word statistics for the words occurring most -frequently in the training set. As grew larger, performance was expected to -improve, in the limit performing the same as word-based taggers. Surprisingly, -the ending-based tagger initially performed nearly as well as the word-based -tagger; in the best case, its performance significantly exceeded that of the -word-based tagger. Lastly, and unexpectedly, an effect of negative returns was -observed - as grew larger, performance generally improved and then declined. By -varying factors such as ending length and tag-list strategy, we achieved a -success rate of 97.5 percent. -" -705,1302.7056,"Wesam Elshamy, Doina Caragea, William Hsu",KSU KDD: Word Sense Induction by Clustering in Topic Space,cs.CL cs.AI stat.AP stat.ML," We describe our language-independent unsupervised word sense induction -system. This system only uses topic features to cluster different word senses -in their global context topic space. Using unlabeled data, this system trains a -latent Dirichlet allocation (LDA) topic model then uses it to infer the topics -distribution of the test instances. By clustering these topics distributions in -their topic space we cluster them into different senses. Our hypothesis is that -closeness in topic space reflects similarity between different word senses. -This system participated in SemEval-2 word sense induction and disambiguation -task and achieved the second highest V-measure score among all other systems. -" -706,1303.0347,"Diego R. Amancio, Eduardo G. Altmann, Diego Rybski, Osvaldo N. - Oliveira Jr. and Luciano da F. 
Costa","Probing the statistical properties of unknown texts: application to the - Voynich Manuscript",physics.soc-ph cs.CL physics.data-an," While the use of statistical physics methods to analyze large corpora has -been useful to unveil many patterns in texts, no comprehensive investigation -has been performed investigating the properties of statistical measurements -across different languages and texts. In this study we propose a framework that -aims at determining if a text is compatible with a natural language and which -languages are closest to it, without any knowledge of the meaning of the words. -The approach is based on three types of statistical measurements, i.e. obtained -from first-order statistics of word properties in a text, from the topology of -complex networks representing text, and from intermittency concepts where text -is treated as a time series. Comparative experiments were performed with the -New Testament in 15 different languages and with distinct books in English and -Portuguese in order to quantify the dependency of the different measurements on -the language and on the story being told in the book. The metrics found to be -informative in distinguishing real texts from their shuffled versions include -assortativity, degree and selectivity of words. As an illustration, we analyze -an undeciphered medieval manuscript known as the Voynich Manuscript. We show -that it is mostly compatible with natural languages and incompatible with -random texts. We also obtain candidates for key-words of the Voynich Manuscript -which could be helpful in the effort of deciphering it. Because we were able to -identify statistical measurements that are more dependent on the syntax than on -the semantics, the framework may also serve for text analysis in -language-dependent applications. -" -707,1303.0350,Diego R. Amancio and Osvaldo N. Oliveira Jr. and Luciano da F. Costa,"Structure-semantics interplay in complex networks and its effects on the - predictability of similarity in texts",cs.CL physics.soc-ph," There are different ways to define similarity for grouping similar texts into -clusters, as the concept of similarity may depend on the purpose of the task. -For instance, in topic extraction similar texts mean those within the same -semantic field, whereas in author recognition stylistic features should be -considered. In this study, we introduce ways to classify texts employing -concepts of complex networks, which may be able to capture syntactic, semantic -and even pragmatic features. The interplay between the various metrics of the -complex networks is analyzed with three applications, namely identification of -machine translation (MT) systems, evaluation of quality of machine translated -texts and authorship recognition. We shall show that topological features of -the networks representing texts can enhance the ability to identify MT systems -in particular cases. For evaluating the quality of MT texts, on the other hand, -high correlation was obtained with methods capable of capturing the semantics. -This was expected because the golden standards used are themselves based on -word co-occurrence. Notwithstanding, the Katz similarity, which involves -semantic and structure in the comparison of texts, achieved the highest -correlation with the NIST measurement, indicating that in some cases the -combination of both approaches can improve the ability to quantify quality in -MT. 
In authorship recognition, again the topological features were relevant in
-some contexts, though for the books and authors analyzed good results were
-obtained with semantic features as well. Because hybrid approaches encompassing
-semantic and topological features have not been extensively used, we believe
-that the methodology proposed here may be useful to enhance text classification
-considerably, as it combines well-established strategies.
-"
-708,1303.0445,Kanagavalli V R and Raja. K,"Detecting and resolving spatial ambiguity in text using named entity
- extraction and self learning fuzzy logic techniques",cs.IR cs.CL," Information extraction identifies useful and relevant text in a document and
-converts unstructured text into a form that can be loaded into a database
-table. Named entity extraction is a main task in the process of information
-extraction and is a classification problem in which words are assigned to one
-or more semantic classes or to a default non-entity class. A word which can
-belong to one or more classes and which has a level of uncertainty in it can
-best be handled by a self-learning fuzzy logic technique. This paper proposes a
-method for detecting the presence of spatial uncertainty in text and
-dealing with spatial ambiguity using named entity extraction techniques coupled
-with self-learning fuzzy logic techniques.
-"
-709,1303.0446,"Boyan Bonev, Gema Ram\'irez-S\'anchez, Sergio Ortiz Rojas",Statistical sentiment analysis performance in Opinum,cs.CL," The classification of opinion texts into positive and negative is becoming a
-subject of great interest in sentiment analysis. The existence of many labeled
-opinions motivates the use of statistical and machine-learning methods.
-First-order statistics have proven to be very limited in this field. The Opinum
-approach is based on the order of the words without using any syntactic and
-semantic information. It consists of building one probabilistic model for the
-positive and another one for the negative opinions. Then the test opinions are
-compared to both models and a decision and confidence measure are calculated.
-In order to reduce the complexity of the training corpus we first lemmatize the
-texts and we replace most named-entities with wildcards. Opinum presents an
-accuracy above 81% for Spanish opinions in the financial products domain. In
-this work we discuss the most important factors that have an impact on the
-classification performance.
-"
-710,1303.0489,"Leena H. Patil, Mohammed Atique",A Semantic approach for effective document clustering using WordNet,cs.CL cs.IR," Nowadays, the number of text documents is increasing rapidly over the Internet,
-e-mail and web pages, and they are stored in electronic database format.
-Arranging and browsing these documents becomes difficult. To overcome this
-problem, document preprocessing, term selection, attribute reduction and
-maintaining the relationships between important terms using background
-knowledge, WordNet, become important parameters in data mining. 
In this paper,
-different stages are performed. Firstly, document preprocessing is done by
-removing stop words, stemming is performed using the Porter stemmer algorithm,
-the WordNet thesaurus is applied to maintain relationships between the important
-terms, and globally unique words and frequent word sets are generated. Secondly,
-a data matrix is formed, and thirdly, terms are extracted from the documents by
-using the term selection approaches tf-idf, tf-df, and tf2, based on their minimum
-threshold values. Further, each document's terms are preprocessed, where
-the frequency of each term within the document is counted for representation.
-The purpose of this approach is to reduce the attributes and find the effective
-term selection method using WordNet for better clustering accuracy. Experiments
-are evaluated on Reuters transcription subsets (wheat, trade, money grain, and
-ship), Reuters 21578, Classic 30, 20 Newsgroups (atheism), 20 Newsgroups
-(Hardware), 20 Newsgroups (Computer Graphics), etc.
-"
-711,1303.1232,"Jessica Ram\'irez, Masayuki Asahara, Yuji Matsumoto",Japanese-Spanish Thesaurus Construction Using English as a Pivot,cs.CL cs.AI," We present the results of research with the goal of automatically creating a
-multilingual thesaurus based on the freely available resources of Wikipedia and
-WordNet. Our goal is to increase resources for natural language processing
-tasks such as machine translation targeting the Japanese-Spanish language pair.
-Given the scarcity of resources, we use existing English resources as a pivot
-for creating a trilingual Japanese-Spanish-English thesaurus. Our approach
-consists of extracting the translation tuples from Wikipedia and disambiguating
-them by mapping them to WordNet word senses. We present results comparing two
-methods of disambiguation, the first using VSM on Wikipedia article texts and
-WordNet definitions, and the second using categorical information extracted
-from Wikipedia. We find that mixing the two methods produces favorable results.
-Using the proposed method, we have constructed a multilingual
-Spanish-Japanese-English thesaurus consisting of 25,375 entries. The same
-method can be applied to any pair of languages that are linked to English in
-Wikipedia.
-"
-712,1303.1441,Kamal Sarkar,A Hybrid Approach to Extract Keyphrases from Medical Documents,cs.IR cs.CL," Keyphrases are phrases, consisting of one or more words, representing the
-important concepts in articles. Keyphrases are useful for a variety of
-tasks such as text summarization, automatic indexing,
-clustering/classification, text mining etc. This paper presents a hybrid
-approach to keyphrase extraction from medical documents. The keyphrase
-extraction approach presented in this paper is an amalgamation of two methods:
-the first one assigns weights to candidate keyphrases based on an effective
-combination of features such as position, term frequency and inverse document
-frequency, and the second one assigns weights to candidate keyphrases using some
-knowledge about their similarities to the structure and characteristics of
-keyphrases available in the memory (stored list of keyphrases). An efficient
-candidate keyphrase identification method as the first component of the
-proposed keyphrase extraction system has also been introduced in this paper.
-The experimental results show that the proposed hybrid approach performs better
-" -713,1303.1599,"Xiao-Yong Yan, Ying Fan, Zengru Di, Shlomo Havlin, Jinshan Wu","Efficient learning strategy of Chinese characters based on network - approach",physics.soc-ph cs.CL cs.SI," Based on network analysis of hierarchical structural relations among Chinese -characters, we develop an efficient learning strategy of Chinese characters. We -regard a more efficient learning method if one learns the same number of useful -Chinese characters in less effort or time. We construct a node-weighted network -of Chinese characters, where character usage frequencies are used as node -weights. Using this hierarchical node-weighted network, we propose a new -learning method, the distributed node weight (DNW) strategy, which is based on -a new measure of nodes' importance that takes into account both the weight of -the nodes and the hierarchical structure of the network. Chinese character -learning strategies, particularly their learning order, are analyzed as -dynamical processes over the network. We compare the efficiency of three -theoretical learning methods and two commonly used methods from mainstream -Chinese textbooks, one for Chinese elementary school students and the other for -students learning Chinese as a second language. We find that the DNW method -significantly outperforms the others, implying that the efficiency of current -learning methods of major textbooks can be greatly improved. -" -714,1303.1703,Fatiha Boubekeur and Wassila Azzoug,Concept-based indexing in text information retrieval,cs.IR cs.CL," Traditional information retrieval systems rely on keywords to index documents -and queries. In such systems, documents are retrieved based on the number of -shared keywords with the query. This lexical-focused retrieval leads to -inaccurate and incomplete results when different keywords are used to describe -the documents and queries. Semantic-focused retrieval approaches attempt to -overcome this problem by relying on concepts rather than on keywords to -indexing and retrieval. The goal is to retrieve documents that are semantically -relevant to a given user query. This paper addresses this issue by proposing a -solution at the indexing level. More precisely, we propose a novel approach for -semantic indexing based on concepts identified from a linguistic resource. In -particular, our approach relies on the joint use of WordNet and WordNetDomains -lexical databases for concept identification. Furthermore, we propose a -semantic-based concept weighting scheme that relies on a novel definition of -concept centrality. The resulting system is evaluated on the TIME test -collection. Experimental results show the effectiveness of our proposition over -traditional IR approaches. -" -715,1303.1929,"Muntsa Padr\'o, N\'uria Bel, Silvia Necsulescu",Towards the Fully Automatic Merging of Lexical Resources: A Step Forward,cs.CL," This article reports on the results of the research done towards the fully -automatically merging of lexical resources. Our main goal is to show the -generality of the proposed approach, which have been previously applied to -merge Spanish Subcategorization Frames lexica. In this work we extend and apply -the same technique to perform the merging of morphosyntactic lexica encoded in -LMF. The experiments showed that the technique is general enough to obtain good -results in these two different tasks which is an important step towards -performing the merging of lexical resources fully automatically. 
-" -716,1303.1930,"N\'uria Bel, Lauren Romeo, Muntsa Padr\'o",Automatic lexical semantic classification of nouns,cs.CL," The work we present here addresses cue-based noun classification in English -and Spanish. Its main objective is to automatically acquire lexical semantic -information by classifying nouns into previously known noun lexical classes. -This is achieved by using particular aspects of linguistic contexts as cues -that identify a specific lexical class. Here we concentrate on the task of -identifying such cues and the theoretical background that allows for an -assessment of the complexity of the task. The results show that, despite of the -a-priori complexity of the task, cue-based classification is a useful tool in -the automatic acquisition of lexical semantic classes. -" -717,1303.1931,"Silvia V\'azquez, N\'uria Bel",A Classification of Adjectives for Polarity Lexicons Enhancement,cs.CL," Subjective language detection is one of the most important challenges in -Sentiment Analysis. Because of the weight and frequency in opinionated texts, -adjectives are considered a key piece in the opinion extraction process. These -subjective units are more and more frequently collected in polarity lexicons in -which they appear annotated with their prior polarity. However, at the moment, -any polarity lexicon takes into account prior polarity variations across -domains. This paper proves that a majority of adjectives change their prior -polarity value depending on the domain. We propose a distinction between domain -dependent and domain independent adjectives. Moreover, our analysis led us to -propose a further classification related to subjectivity degree: constant, -mixed and highly subjective adjectives. Following this classification, polarity -values will be a better support for Sentiment Analysis. -" -718,1303.1932,"N\'uria Bel, Vassilis Papavasiliou, Prokopis Prokopidis, Antonio - Toral, Victoria Arranz",Mining and Exploiting Domain-Specific Corpora in the PANACEA Platform,cs.CL," The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform -that automates the stages involved in the acquisition, production, updating and -maintenance of the large language resources required by, among others, MT -systems. The development of a Corpus Acquisition Component (CAC) for extracting -monolingual and bilingual data from the web is one of the most innovative -building blocks of PANACEA. The CAC, which is the first stage in the PANACEA -pipeline for building Language Resources, adopts an efficient and distributed -methodology to crawl for web documents with rich textual content in specific -languages and predefined domains. The CAC includes modules that can acquire -parallel data from sites with in-domain content available in more than one -language. In order to extrinsically evaluate the CAC methodology, we have -conducted several experiments that used crawled parallel corpora for the -identification and extraction of parallel sentences using sentence alignment. -The corpora were then successfully used for domain adaptation of Machine -Translation Systems. -" -719,1303.2430,Diederik Aerts,"Quantum and Concept Combination, Entangled Measurements and Prototype - Theory",cs.AI cs.CL quant-ph," We analyze the meaning of the violation of the marginal probability law for -situations of correlation measurements where entanglement is identified. 
We
-show that for quantum theory applied to the cognitive realm such a violation
-does not lead to the type of problems commonly believed to occur in situations
-of quantum theory applied to the physical realm. We briefly situate our quantum
-approach for modeling concepts and their combinations with respect to the
-notions of 'extension' and 'intension' in theories of meaning, and in existing
-concept theories.
-"
-720,1303.2448,"N\'uria Bel, Maria Coll, Gabriela Resnik","Automatic Detection of Non-deverbal Event Nouns for Quick Lexicon
- Production",cs.CL," In this work we present the results of our experimental work on the
-development of lexical class-based lexica by automatic means. The objective is
-to assess the use of linguistic lexical-class based information as a feature
-selection methodology for the use of classifiers in quick lexical development.
-The results show that the approach can help in reducing the human effort
-required in the development of language resources significantly.
-"
-721,1303.2449,"Lauren Romeo, Sara Mendes, N\'uria Bel","Using qualia information to identify lexical semantic classes in an
- unsupervised clustering task",cs.CL," Acquiring lexical information is a complex problem, typically approached by
-relying on a number of contexts to contribute information for classification.
-One of the first issues to address in this domain is the determination of such
-contexts. The work presented here proposes the use of automatically obtained
-FORMAL role descriptors as features used to draw nouns from the same lexical
-semantic class together in an unsupervised clustering task. We have dealt with
-three lexical semantic classes (HUMAN, LOCATION and EVENT) in English. The
-results obtained show that it is possible to discriminate between elements from
-different lexical semantic classes using only FORMAL role information, hence
-validating our initial hypothesis. Also, iterating our method accurately
-accounts for fine-grained distinctions within lexical classes, namely
-distinctions involving ambiguous expressions. Moreover, a filtering and
-bootstrapping strategy employed in extracting FORMAL role descriptors proved to
-minimize effects of sparse data and noise in our task.
-"
-722,1303.2826,William M. Darling and Fei Song,Probabilistic Topic and Syntax Modeling with Part-of-Speech LDA,cs.CL," This article presents a probabilistic generative model for text based on
-semantic topics and syntactic classes called Part-of-Speech LDA (POSLDA).
-POSLDA simultaneously uncovers short-range syntactic patterns (syntax) and
-long-range semantic patterns (topics) that exist in document collections. This
-results in word distributions that are specific to both topics (sports,
-education, ...) and parts-of-speech (nouns, verbs, ...). For example,
-multinomial distributions over words are uncovered that can be understood as
-""nouns about weather"" or ""verbs about law"". We describe the model and an
-approximate inference algorithm and then demonstrate the quality of the learned
-topics both qualitatively and quantitatively. Then, we discuss an NLP
-application where the output of POSLDA can lead to strong improvements in
-quality: unsupervised part-of-speech tagging. We describe algorithms for this
-task that make use of POSLDA-learned distributions that result in improved
-performance beyond the state of the art. 
-" -723,1303.3036,"Christian Retor\'e (LaBRI, IRIT)","Type-theoretical natural language semantics: on the system F for meaning - assembly",cs.LO cs.CL math.LO," This paper presents and extends our type theoretical framework for a -compositional treatment of natural language semantics with some lexical -features like coercions (e.g. of a town into a football club) and copredication -(e.g. on a town as a set of people and as a location). The second order typed -lambda calculus was shown to be a good framework, and here we discuss how to -introduced predefined types and coercive subtyping which are much more natural -than internally coded similar constructs. Linguistic applications of these new -features are also exemplified. -" -724,1303.3170,Peter Hines,Types and forgetfulness in categorical linguistics and quantum mechanics,cs.CL math.CT quant-ph," The role of types in categorical models of meaning is investigated. A general -scheme for how typed models of meaning may be used to compare sentences, -regardless of their grammatical structure is described, and a toy example is -used as an illustration. Taking as a starting point the question of whether the -evaluation of such a type system 'loses information', we consider the -parametrized typing associated with connectives from this viewpoint. - The answer to this question implies that, within full categorical models of -meaning, the objects associated with types must exhibit a simple but subtle -categorical property known as self-similarity. We investigate the category -theory behind this, with explicit reference to typed systems, and their -monoidal closed structure. We then demonstrate close connections between such -self-similar structures and dagger Frobenius algebras. In particular, we -demonstrate that the categorical structures implied by the polymorphically -typed connectives give rise to a (lax unitless) form of the special forms of -Frobenius algebras known as classical structures, used heavily in abstract -categorical approaches to quantum mechanics. -" -725,1303.3592,"Maxim Makatchev, Reid Simmons, Majd Sakr and Micheline Ziadee",Expressing Ethnicity through Behaviors of a Robot Character,cs.CL cs.CY cs.RO," Achieving homophily, or association based on similarity, between a human user -and a robot holds a promise of improved perception and task performance. -However, no previous studies that address homophily via ethnic similarity with -robots exist. In this paper, we discuss the difficulties of evoking ethnic cues -in a robot, as opposed to a virtual agent, and an approach to overcome those -difficulties based on using ethnically salient behaviors. We outline our -methodology for selecting and evaluating such behaviors, and culminate with a -study that evaluates our hypotheses of the possibility of ethnic attribution of -a robot character through verbal and nonverbal behaviors and of achieving the -homophily effect. -" -726,1303.3948,"Urmila Shrawankar, Vilas Thakare",An Adaptive Methodology for Ubiquitous ASR System,cs.CL cs.HC cs.SD," Achieving and maintaining the performance of ubiquitous (Automatic Speech -Recognition) ASR system is a real challenge. The main objective of this work is -to develop a method that will improve and show the consistency in performance -of ubiquitous ASR system for real world noisy environment. 
An adaptive
-methodology has been developed to achieve this objective by implementing the
-following: (i) cleaning the speech signal as much as possible, while preserving
-its originality and intelligibility, using various modified filters and
-enhancement techniques; (ii) extracting features from speech signals using
-various parameter sizes; (iii) training the system for ubiquitous environments
-using multi-environmental adaptation training methods; and (iv) optimizing the
-word recognition rate with an appropriate variable parameter size using a fuzzy
-technique. The consistency in performance is tested using standard noise
-databases as well as in real-world environments. A good improvement is
-observed. This work will be helpful in giving discriminative training to a
-ubiquitous ASR system for better Human-Computer Interaction (HCI) using a
-Speech User Interface (SUI).
-"
-727,1303.4293,"Kaarel Kaljurand and Tobias Kuhn","A Multilingual Semantic Wiki Based on Attempto Controlled English and
- Grammatical Framework",cs.CL cs.HC," We describe a semantic wiki system with an underlying controlled natural
-language grammar implemented in Grammatical Framework (GF). The grammar
-restricts the wiki content to a well-defined subset of Attempto Controlled
-English (ACE), and facilitates a precise bidirectional automatic translation
-between ACE and language fragments of a number of other natural languages,
-making the wiki content accessible multilingually. Additionally, our approach
-allows for automatic translation into the Web Ontology Language (OWL), which
-enables automatic reasoning over the wiki content. The developed wiki
-environment thus allows users to build, query and view OWL knowledge bases via
-a user-friendly multilingual natural language interface. As a further feature,
-the underlying multilingual grammar is integrated into the wiki and can be
-collaboratively edited to extend the vocabulary of the wiki or even customize
-its sentence structures. This work demonstrates the combination of the existing
-technologies of Attempto Controlled English and Grammatical Framework, and is
-implemented as an extension of the existing semantic wiki engine AceWiki.
-"
-728,1303.4959,"Victoria Otero-Espinar, Lu\'is F. Seoane, Juan J. Nieto, Jorge Mira","Analytic solution of a model of language competition with bilingualism
- and interlinguistic similarity",physics.soc-ph cs.CL," An in-depth analytic study of a model of language dynamics is presented: a
-model which tackles the problem of the coexistence of two languages within a
-closed community of speakers, taking into account bilingualism and
-incorporating a parameter to measure the distance between languages. In
-previous numerical simulations, the model showed that coexistence might lead to
-survival of both languages within monolingual speakers along with a bilingual
-community, or to extinction of the weaker tongue, depending on different
-parameters. In this paper, that study is completed with thorough analytical
-calculations that settle the results in a robust way, and previous results are
-refined with some modifications. From the present analysis it is possible to
-almost completely assay the number and nature of the equilibrium points of the
-model, which depend on its parameters, as well as to build a phase space based
-on them. Also, we obtain conclusions on the way the languages evolve with time.
Our
-rigorous considerations also suggest ways to further improve the model and
-facilitate the comparison of its consequences with those from other approaches
-or with real data.
-"
-729,1303.5148,"Damianos Karakos and Mark Dredze and Sanjeev Khudanpur","Estimating Confusions in the ASR Channel for Improved Topic-based
- Language Model Adaptation",cs.CL cs.LG," Human language is a combination of elemental languages/domains/styles that
-change across and sometimes within discourses. Language models, which play a
-crucial role in speech recognizers and machine translation systems, are
-particularly sensitive to such changes, unless some form of adaptation takes
-place. One approach to speech language model adaptation is self-training, in
-which a language model's parameters are tuned based on automatically
-transcribed audio. However, transcription errors can misguide self-training,
-particularly in challenging settings such as conversational speech. In this
-work, we propose a model that considers the confusions (errors) of the ASR
-channel. By modeling the likely confusions in the ASR output instead of using
-just the 1-best, we improve self-training efficacy by obtaining a more reliable
-reference transcription estimate. We demonstrate improved topic-based language
-modeling adaptation results over both 1-best and lattice self-training using
-our ASR channel confusion estimates on telephone conversations.
-"
-730,1303.5513,"Urmila Shrawankar, Vilas Thakare","Parameters Optimization for Improving ASR Performance in Adverse Real
- World Noisy Environmental Conditions",cs.CL cs.SD," From the existing research it has been observed that many techniques and
-methodologies are available for performing every step of an Automatic Speech
-Recognition (ASR) system, but the performance (minimization of the Word Error
-Rate, WER, and maximization of the Word Accuracy Rate, WAR) of a methodology
-does not depend only on the technique applied in that method. The research
-indicates that performance mainly depends on the category of the noise, the
-level of the noise, and the variable sizes of the window, frame and frame
-overlap considered in the existing methods. The main aim of the work presented
-in this paper is to use variable parameter sizes, such as window size, frame
-size and frame overlap percentage, to observe the performance of algorithms for
-various categories and levels of noise, and also to train the system for all
-parameter sizes and categories of real-world noisy environments, in order to
-improve the performance of the speech recognition system. This paper presents
-the results of Signal-to-Noise Ratio (SNR) and accuracy tests obtained by
-applying variable parameter sizes. It is observed that it is very hard to
-evaluate test results and decide on a parameter size for ASR performance
-improvement and its resultant optimization. Hence, this study further suggests
-feasible and optimal parameter sizes using a Fuzzy Inference System (FIS) for
-enhancing resultant accuracy in adverse real-world noisy environmental
-conditions. This work will be helpful in giving discriminative training to a
-ubiquitous ASR system for better Human-Computer Interaction (HCI).
-"
-731,1303.5515,"Urmila Shrawankar, VM Thakare",Adverse Conditions and ASR Techniques for Robust Speech User Interface,cs.CL cs.SD," The main motivation for Automatic Speech Recognition (ASR) is efficient
-interfaces to computers, and for the interfaces to be natural and truly useful,
-they should provide coverage for a large group of users.
The purpose of these
-tasks is to further improve man-machine communication. ASR systems exhibit
-unacceptable degradations in performance when the acoustical environments used
-for training and testing the system are not the same. The goal of this research
-is to increase the robustness of speech recognition systems with respect to
-changes in the environment. A system can be labeled as environment-independent
-if the recognition accuracy for a new environment is the same or higher than
-that obtained when the system is retrained for that environment. Attaining such
-performance is the dream of the researchers. This paper elaborates on some of
-the difficulties with Automatic Speech Recognition (ASR). These difficulties
-are classified into speaker characteristics and environmental conditions, and
-some techniques are suggested to compensate for variations in the speech
-signal. This paper focuses on robustness with respect to speaker variations and
-changes in the acoustical environment. We discuss several different external
-factors that change the environment, and physiological differences that affect
-the performance of a speech recognition system, followed by techniques that are
-helpful in designing a robust ASR system.
-"
-732,1303.5778,"Alex Graves, Abdel-rahman Mohamed and Geoffrey Hinton",Speech Recognition with Deep Recurrent Neural Networks,cs.NE cs.CL," Recurrent neural networks (RNNs) are a powerful model for sequential data.
-End-to-end training methods such as Connectionist Temporal Classification make
-it possible to train RNNs for sequence labelling problems where the
-input-output alignment is unknown. The combination of these methods with the
-Long Short-term Memory RNN architecture has proved particularly fruitful,
-delivering state-of-the-art results in cursive handwriting recognition.
-However, RNN performance in speech recognition has so far been disappointing,
-with better results returned by deep feedforward networks. This paper
-investigates \emph{deep recurrent neural networks}, which combine the multiple
-levels of representation that have proved so effective in deep networks with
-the flexible use of long range context that empowers RNNs. When trained
-end-to-end with suitable regularisation, we find that deep Long Short-term
-Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition
-benchmark, which to our knowledge is the best recorded score.
-"
-733,1303.5960,Daniel Christen,SYNTAGMA. A Linguistic Approach to Parsing,cs.CL," SYNTAGMA is a rule-based parsing system, structured on two levels: a general
-parsing engine and a language-specific grammar. The parsing engine is a
-language-independent program, while grammar and language-specific rules and
-resources are given as text files, consisting of a list of constituent
-structures and a lexical database with word-sense-related features and
-constraints. Since its theoretical background is principally Tesni\`ere's
-\'El\'ements de syntaxe, SYNTAGMA's grammar emphasizes the role of argument
-structure (valency) in constraint satisfaction, and also allows horizontal
-bounds, for instance in treating coordination. Notions such as Pro, traces and
-empty categories are derived from Generative Grammar, and some solutions are
-close to Government & Binding Theory, although they are the result of
-autonomous research. These properties allow SYNTAGMA to manage complex
-syntactic configurations and well-known weak points in parsing engineering.
An important
-resource is the semantic network, which is used in disambiguation tasks. The
-parsing process follows a bottom-up, rule-driven strategy. Its behavior can be
-controlled and fine-tuned.
-"
-734,1303.6175,"R. Ferrer-i-Cancho, A. Hern\'andez-Fern\'andez, D. Lusseau, G.
- Agoramoorthy, M. J. Hsu and S. Semple",Compression as a universal principle of animal behavior,q-bio.NC cs.CL cs.IT math.IT physics.data-an q-bio.QM," A key aim in biology and psychology is to identify fundamental principles
-underpinning the behavior of animals, including humans. Analyses of human
-language and the behavior of a range of non-human animal species have provided
-evidence for a common pattern underlying diverse behavioral phenomena: words
-follow Zipf's law of brevity (the tendency of more frequently used words to be
-shorter), and conformity to this general pattern has been seen in the behavior
-of a number of other animals. It has been argued that the presence of this law
-is a sign of efficient coding in the information theoretic sense. However, no
-strong direct connection has been demonstrated between the law and compression,
-the information theoretic principle of minimizing the expected length of a
-code. Here we show that minimizing the expected code length implies that the
-length of a word cannot increase as its frequency increases. Furthermore, we
-show that the mean code length or duration is significantly small in human
-language, and also in the behavior of other species in all cases where
-agreement with the law of brevity has been found. We argue that compression is
-a general principle of animal behavior that reflects selection for efficiency
-of coding.
-"
-735,1303.7310,"Niraj Kumar, Rashmi Gangadharaiah, Kannan Srinathan and Vasudeva Varma","Exploring the Role of Logically Related Non-Question Phrases for
- Answering Why-Questions",cs.CL cs.IR," In this paper, we show that certain phrases, although not present in a given
-question/query, play a very important role in answering the question. Exploring
-the role of such phrases in answering questions not only reduces the dependency
-on matching question phrases for extracting answers, but also improves the
-quality of the extracted answers. Here, matching question phrases means phrases
-which co-occur in the given question and candidate answers. To achieve the
-above discussed goal, we introduce a bigram-based word graph model populated
-with the semantic and topical relatedness of terms in the given document. Next,
-we apply an improved version of ranking with a prior-based approach, which
-ranks all words in the candidate document with respect to a set of root words
-(i.e. non-stopwords present in the question and in the candidate document). As
-a result, terms logically related to the root words are scored higher than
-terms that are not related to the root words. Experimental results show that
-our devised system performs better than the state of the art for the task of
-answering Why-questions.
-"
-736,1304.0104,"Diederik Aerts, Jan Broekaert, Sandro Sozzo and Tomas Veloz",Meaning-focused and Quantum-inspired Information Retrieval,cs.IR cs.CL quant-ph," In recent years, quantum-based methods have promisingly integrated the
-traditional procedures in information retrieval (IR) and natural language
-processing (NLP).
Inspired by our research on the identification and
-application of quantum structures in cognition, more specifically our work on
-the representation of concepts and their combinations, we put forward a
-'quantum meaning based' framework for structured query retrieval in text
-corpora and standardized testing corpora. This scheme for IR rests on
-considering as basic notions (i) 'entities of meaning', e.g., concepts and
-their combinations, and (ii) traces of such entities of meaning, which is how
-documents are considered in this approach. The meaning content of these
-'entities of meaning' is reconstructed by solving an 'inverse problem' in the
-quantum formalism, consisting of reconstructing the full states of the entities
-of meaning from their collapsed states identified as traces in relevant
-documents. The advantages with respect to traditional approaches, such as
-Latent Semantic Analysis (LSA), are discussed by means of concrete examples.
-"
-737,1304.0715,"Ladislau B\""ol\""oni",A cookbook of translating English to Xapi,cs.AI cs.CL," The Xapagy cognitive architecture has been designed to perform narrative
-reasoning: to model and mimic the activities performed by humans when
-witnessing, reading, recalling, narrating and talking about stories. Xapagy
-communicates with the outside world using Xapi, a simplified, ""pidgin"" language
-which is strongly tied to the internal representation model (instances, scenes
-and verb instances) and reasoning techniques (shadows and headless shadows).
-While not fully a semantic equivalent of natural language, Xapi can represent a
-wide range of complex stories. We illustrate the representation technique used
-in Xapagy through examples taken from folk physics, folk psychology as well as
-some more unusual literary examples. We argue that while the Xapi model
-represents a conceptual shift from the English representation, the mapping is
-logical and consistent, and a trained knowledge engineer can translate between
-English and Xapi at near-native speed.
-"
-738,1304.1018,"Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss","Estimating Phoneme Class Conditional Probabilities from Raw Speech
- Signal using Convolutional Neural Networks",cs.LG cs.CL cs.NE," In hybrid hidden Markov model/artificial neural network (HMM/ANN) automatic
-speech recognition (ASR) systems, the phoneme class conditional probabilities
-are estimated by first extracting acoustic features from the speech signal
-based on prior knowledge such as speech perception and/or speech production
-knowledge, and then modeling the acoustic features with an ANN. Recent
-advances in machine learning techniques, more specifically in the fields of
-image processing and text processing, have shown that such a divide-and-conquer
-strategy (i.e., separating feature extraction and modeling steps) may not be
-necessary. Motivated by these studies, in the framework of convolutional
-neural networks (CNNs), this paper investigates a novel approach, where the
-input to the ANN is the raw speech signal and the output is the phoneme class
-conditional probability estimates. On the TIMIT phoneme recognition task, we
-study different ANN architectures to show the benefit of CNNs and compare the
-proposed approach against the conventional approach, where spectral-based MFCC
-features are extracted and modeled by a multilayer perceptron. Our studies show
-that the proposed approach can yield comparable or better phoneme recognition
-performance when compared to the conventional approach.
It indicates that CNNs
-can learn features relevant for phoneme classification automatically from the
-raw speech signal.
-"
-739,1304.2476,"Rushdi Shams, M.M.A. Hashem, Afrina Hossain, Suraiya Rumana Akter, and
- Monika Gope","Corpus-based Web Document Summarization using Statistical and Linguistic
- Approach",cs.IR cs.CL," Single document summarization generates a summary by extracting the
-representative sentences from the document. In this paper, we present a novel
-technique for summarization of domain-specific text from a single web document
-that uses statistical and linguistic analysis on the text in a reference corpus
-and the web document. The proposed summarizer uses the combinational function
-of Sentence Weight (SW) and Subject Weight (SuW) to determine the rank of a
-sentence, where SW is a function of the number of terms (t_n) and number of
-words (w_n) in a sentence and the term frequency (t_f) in the corpus, and SuW
-is a function of t_n and w_n in a subject and t_f in the corpus. 30 percent of
-the ranked sentences are considered to be the summary of the web document. We
-generated three web document summaries using our technique and compared each of
-them with the summaries developed manually by 16 different human subjects.
-Results showed that 68 percent of the summaries produced by our approach match
-the manual summaries.
-"
-740,1304.3092,Steven J. Henkind,"Imprecise Meanings as a Cause of Uncertainty in Medical Knowledge-Based
- Systems",cs.AI cs.CL," There has been a considerable amount of work on uncertainty in
-knowledge-based systems. This work has generally been concerned with
-uncertainty arising from the strength of inferences and the weight of evidence.
-In this paper we discuss another type of uncertainty: that which is due to
-imprecision in the underlying primitives used to represent the knowledge of the
-system. In particular, a given word may denote many similar but not identical
-entities. Such words are said to be lexically imprecise. Lexical imprecision
-has caused widespread problems in many areas. Unless this phenomenon is
-recognized and appropriately handled, it can degrade the performance of
-knowledge-based systems. In particular, it can lead to difficulties with the
-user interface, and with the inferencing processes of these systems. Some
-techniques are suggested for coping with this phenomenon.
-"
-741,1304.3265,"Maher Jebali, Patrice Dalle, Mohamed Jemni","Extension of hidden markov model for recognizing large vocabulary of
- sign language",cs.CL," Computers still have a long way to go before they can interact with users in
-a truly natural fashion. From a user's perspective, the most natural way to
-interact with a computer would be through a speech and gesture interface.
-Although speech recognition has made significant advances in the past ten
-years, gesture recognition has been lagging behind. Sign Languages (SL) are the
-most accomplished forms of gestural communication. Therefore, their automatic
-analysis is a real challenge, one that is closely tied to their lexical and
-syntactic levels of organization. Statements dealing with sign language attract
-significant interest in the Automatic Natural Language Processing (ANLP)
-domain. In this work, we deal with sign language recognition, in particular
-French Sign Language (FSL).
FSL has its own specificities, such
-as the simultaneity of several parameters, the important role of facial
-expression and movement, and the use of space for proper utterance
-organization. Unlike speech, French Sign Language (FSL) events occur both
-sequentially and simultaneously. Thus, the computational processing of FSL is
-more complex than that of spoken languages. We present a novel approach based
-on HMMs to reduce the recognition complexity.
-"
-742,1304.3432,"Stephen Jose Hanson, Malcolm Bauer","Machine Learning, Clustering, and Polymorphy",cs.AI cs.CL cs.LG," This paper describes a machine induction program (WITT) that attempts to
-model human categorization. Properties of categories to which human subjects
-are sensitive include best or prototypical members, relative contrasts between
-putative categories, and polymorphy (neither necessary nor sufficient
-features). This approach represents an alternative to the usual Artificial
-Intelligence approaches to generalization and conceptual clustering, which tend
-to focus on necessary and sufficient feature rules, equivalence classes, and
-simple search and match schemes. WITT is shown to be more consistent with human
-categorization while potentially including results produced by more traditional
-clustering schemes. Applications of this approach in the domains of expert
-systems and information retrieval are also discussed.
-"
-743,1304.3841,Ramon Ferrer-i-Cancho and Haitao Liu,"The risks of mixing dependency lengths from sequences of different
- length",cs.CL physics.data-an," Mixing dependency lengths from sequences of different length is a common
-practice in language research. However, the empirical distribution of
-dependency lengths of sentences of the same length differs from that of
-sentences of varying length, and the distribution of dependency lengths depends
-on sentence length for real sentences and also under the null hypothesis that
-dependencies connect vertices located in random positions of the sequence. This
-suggests that certain results, such as the distribution of syntactic dependency
-lengths mixing dependencies from sentences of varying length, could be a mere
-consequence of that mixing. Furthermore, differences in the global averages of
-dependency length (mixing lengths from sentences of varying length) for two
-different languages do not simply imply a priori that one language optimizes
-dependency lengths better than the other, because those differences could be
-due to differences in the distribution of sentence lengths and other factors.
-"
-744,1304.3879,"Valmi Dufour-Lussier (INRIA Nancy - Grand Est / LORIA), Florence Le
- Ber (ICube), Jean Lieber (INRIA Nancy - Grand Est / LORIA), Emmanuel Nauer
- (INRIA Nancy - Grand Est / LORIA)","Automatic case acquisition from texts for process-oriented case-based
- reasoning",cs.AI cs.CL," This paper introduces a method for the automatic acquisition of a rich case
-representation from free text for process-oriented case-based reasoning. Case
-engineering is among the most complicated and costly tasks in implementing a
-case-based reasoning system. This is especially so for process-oriented
-case-based reasoning, where more expressive case representations are generally
-used and, in our opinion, actually required for satisfactory case adaptation.
-In this context, the ability to acquire cases automatically from procedural
-texts is a major step forward in order to reason on processes.
We therefore
-detail a methodology that makes case acquisition from processes described as
-free text possible, with special attention given to assembly instruction texts.
-This methodology extends the techniques we used to extract actions from cooking
-recipes. We argue that techniques taken from natural language processing are
-required for this task, and that they give satisfactory results. An evaluation
-based on our implemented prototype extracting workflows from recipe texts is
-provided.
-"
-745,1304.4086,Ramon Ferrer-i-Cancho,"Hubiness, length, crossings and their relationships in dependency trees",cs.CL cs.DM cs.SI physics.soc-ph," Here tree dependency structures are studied from three different
-perspectives: their degree variance (hubiness), the mean dependency length and
-the number of dependency crossings. Bounds that reveal pairwise dependencies
-among these three metrics are derived. Hubiness (the variance of degrees) plays
-a central role: the mean dependency length is bounded below by hubiness while
-the number of crossings is bounded above by hubiness. Our findings suggest that
-the online memory cost of a sentence might be determined not just by the
-ordering of words but also by the hubiness of the underlying structure. The 2nd
-moment of degree plays a crucial role that is reminiscent of its role in large
-complex networks.
-"
-746,1304.4520,Subhabrata Mukherjee and Pushpak Bhattacharyya,Sentiment Analysis : A Literature Survey,cs.CL," Our day-to-day life has always been influenced by what people think. Ideas
-and opinions of others have always affected our own opinions. The explosion of
-Web 2.0 has led to increased activity in Podcasting, Blogging, Tagging,
-Contributing to RSS, Social Bookmarking, and Social Networking. As a result,
-there has been an eruption of interest in mining these vast resources of data
-for opinions. Sentiment Analysis or Opinion Mining is the computational
-treatment of opinions, sentiments and subjectivity of text. In this report, we
-take a look at the various challenges and applications of Sentiment Analysis.
-We discuss in detail various approaches to perform a computational treatment of
-sentiments and opinions. Various supervised, data-driven techniques for SA,
-like Na\""ive Bayes, Maximum Entropy, SVM, and Voted Perceptrons, will be
-discussed and their strengths and drawbacks will be touched upon. We will also
-see a new dimension of analyzing sentiments, by Cognitive Psychology, mainly
-through the work of Janyce Wiebe, where we will see ways to detect subjectivity
-and perspective in narrative and to understand discourse structure. We will
-also study some specific topics in Sentiment Analysis and the contemporary
-works in those areas.
-"
-747,1304.5823,Edward Grefenstette,"Towards a Formal Distributional Semantics: Simulating Logical Calculi
- with Tensors",math.LO cs.CL cs.LO," The development of compositional distributional models of semantics
-reconciling the empirical aspects of distributional semantics with the
-compositional aspects of formal semantics is a popular topic in the
-contemporary literature. This paper seeks to bring this reconciliation one step
-further by showing how the mathematical constructs commonly used in
-compositional distributional models, such as tensors and matrices, can be used
-to simulate different aspects of predicate logic.
- This paper discusses how the canonical isomorphism between tensors and
-multilinear maps can be exploited to simulate a full-blown quantifier-free
-predicate calculus using tensors. It provides tensor interpretations of the set
-of logical connectives required to model propositional calculi. It suggests a
-variant of these tensor calculi capable of modelling quantifiers, using few
-non-linear operations. It finally discusses the relation between these
-variants, and how this relation should constitute the subject of future work.
-"
-748,1304.5880,"M.-A. Abchir (CHART), Isis Truck (CHART), Anna Pappa (LIASD)",Dealing with natural language interfaces in a geolocation context,cs.CL," In the geolocation field where high-level programs and low-level devices
-coexist, it is often difficult to find a friendly user interface to configure
-all the parameters. The challenge addressed in this paper is to propose
-intuitive and simple, thus natural language interfaces to interact with
-low-level devices. Such interfaces contain natural language processing and
-fuzzy representations of words that facilitate the elicitation of
-business-level objectives in our context.
-"
-749,1304.7157,"Leon Derczynski, Richard Shaw, Ben Solway, Jun Wang",Question Answering Against Very-Large Text Collections,cs.CL cs.IR," Question answering involves developing methods to extract useful information
-from large collections of documents. This is done with specialised search
-engines such as Answer Finder. The aim of Answer Finder is to provide an answer
-to a question rather than a page listing related documents that may contain the
-correct answer. So, a question such as ""How tall is the Eiffel Tower"" would
-simply return ""325m"" or ""1,063ft"". Our task was to build on the current version
-of Answer Finder by improving information retrieval, and also improving the
-pre-processing involved in question series analysis.
-"
-750,1304.7282,"Priti Saktel, Urmila Shrawankar",An Improved Approach for Word Ambiguity Removal,cs.CL," Word ambiguity removal is the task of removing ambiguity from a word, i.e.,
-identifying the correct sense of a word in ambiguous sentences. This paper
-describes a model that uses a Part-of-Speech tagger and three categories for
-word sense disambiguation (WSD). Effective word sense disambiguation is very
-much needed to improve interactions between users and computers. For this,
-supervised and unsupervised methods are combined. The WSD algorithm is used to
-find the efficient and accurate sense of a word based on domain information.
-The accuracy of this work is evaluated with the aim of finding the best
-suitable domain for a word.
-"
-751,1304.7289,"Leon Derczynski, Hector Llorens, Naushad UzZaman",TimeML-strict: clarifying temporal annotation,cs.CL," TimeML is an XML-based schema for annotating temporal information over
-discourse. The standard has been used to annotate a variety of resources and is
-followed by a number of tools, the creation of which constitutes hundreds of
-thousands of man-hours of research work. However, the current state of
-resources is such that many are not valid, or do not produce valid output, or
-contain ambiguous or custom additions and removals. Difficulties arising from
-these variances were highlighted in the TempEval-3 exercise, which included its
-own extra stipulations over conventional TimeML as a response.
- To unify the state of current resources, and to make progress toward easy
-adoption of its current incarnation ISO-TimeML, this paper introduces
-TimeML-strict: a valid, unambiguous, and easy-to-process subset of TimeML. We
-also introduce three resources -- a schema for TimeML-strict; a validator tool
-for TimeML-strict, so that one may ensure documents are in the correct form;
-and a repair tool that corrects common invalidating errors and adds
-disambiguating markup in order to convert documents from the laxer TimeML
-standard to TimeML-strict.
-"
-752,1304.7359,"Ramon Ferrer-i-Cancho, {\L}ukasz D\k{e}bowski and Ferm\'in Moscoso del
- Prado Mart\'in",Constant conditional entropy and related hypotheses,cond-mat.stat-mech cs.CL cs.IT math.IT physics.data-an," Constant entropy rate (conditional entropies must remain constant as the
-sequence length increases) and uniform information density (conditional
-probabilities must remain constant as the sequence length increases) are two
-information theoretic principles that are argued to underlie a wide range of
-linguistic phenomena. Here we revise the predictions of these principles in the
-light of Hilberg's law on the scaling of conditional entropy in language and
-related laws. We show that constant entropy rate (CER) and two interpretations
-for uniform information density (UID), full UID and strong UID, are
-inconsistent with these laws. Strong UID implies CER but the reverse is not
-true. Full UID, a particular case of UID, leads to costly uncorrelated
-sequences that are totally unrealistic. We conclude that CER and its particular
-cases are incomplete hypotheses about the scaling of conditional entropies.
-"
-753,1304.7507,"Eugene Yuta Bann, Joanna J. Bryson","Measuring Cultural Relativity of Emotional Valence and Arousal using
- Semantic Clustering and Twitter",cs.CL cs.AI," Researchers since at least Darwin have debated whether and to what extent
-emotions are universal or culture-dependent. However, previous studies have
-primarily focused on facial expressions and on a limited set of emotions. Given
-that emotions have a substantial impact on human lives, evidence for cultural
-emotional relativity might be derived by applying distributional semantics
-techniques to a text corpus of self-reported behaviour. Here, we explore this
-idea by measuring the valence and arousal of the twelve most popular emotion
-keywords expressed on the micro-blogging site Twitter. We do this in three
-geographical regions: Europe, Asia and North America. We demonstrate that in
-our sample, the valence and arousal levels of the same emotion keywords differ
-significantly with respect to these geographical regions --- Europeans are, or
-at least present themselves as, more positive and aroused; North Americans are
-more negative; and Asians appear to be more positive but less aroused when
-compared to global valence and arousal levels of the same emotion keywords. Our
-work is the first of its kind to programmatically map large text corpora to a
-dimensional model of affect.
-"
-754,1304.7728,"Sugata Sanyal, Rajdeep Borgohain",Machine Translation Systems in India,cs.CL cs.CY," Machine Translation is the translation of one natural language into another
-using automated and computerized means. For a multilingual country like India,
-with the huge amount of information exchanged between various regions and in
-different languages in digitized format, it has become necessary to find an
-automated translation process from one language to another.
In this paper, we take a look
-at the various Machine Translation Systems in India which are specifically
-built for translation between the Indian languages. We discuss the various
-approaches taken for building machine translation systems and then describe
-some of the Machine Translation Systems in India along with their features.
-"
-755,1304.7942,"Michele Filannino, Gavin Brown, and Goran Nenadic","ManTIME: Temporal expression identification and normalization in the
- TempEval-3 challenge",cs.CL," This paper describes a temporal expression identification and normalization
-system, ManTIME, developed for the TempEval-3 challenge. The identification
-phase combines the use of conditional random fields along with a
-post-processing identification pipeline, whereas the normalization phase is
-carried out using NorMA, an open-source rule-based temporal normalizer. We
-investigate the performance variation with respect to different feature types.
-Specifically, we show that the use of WordNet-based features in the
-identification task negatively affects the overall performance, and that there
-is no statistically significant difference in using gazetteers, shallow parsing
-and propositional noun phrase labels on top of the morphological features. On
-the test data, the best run achieved 0.95 (P), 0.85 (R) and 0.90 (F1) in the
-identification phase. Normalization accuracies are 0.84 (type attribute) and
-0.77 (value attribute). Surprisingly, the use of the silver data (alone or in
-addition to the gold annotated data) does not improve the performance.
-"
-756,1304.8016,"Lukas Barth, Stephen Kobourov, Sergey Pupyrev, Torsten Ueckerdt",On Semantic Word Cloud Representation,cs.DS cs.CL," We study the problem of computing semantic-preserving word clouds in which
-semantically related words are close to each other. While several heuristic
-approaches have been described in the literature, we formalize the underlying
-geometric algorithmic problem: Word Rectangle Adjacency Contact (WRAC). In this
-model each word is associated with a rectangle of fixed dimensions, and the
-goal is to represent semantically related words by ensuring that the two
-corresponding rectangles touch. We design and analyze efficient polynomial-time
-algorithms for some variants of the WRAC problem, show that several general
-variants are NP-hard, and describe a number of approximation algorithms.
-Finally, we experimentally demonstrate that our theoretically-sound algorithms
-outperform the early heuristics.
-"
-757,1305.0194,"Cihan Aksoy, Vincent Labatut, Chantal Cherifi, Jean-Fran\c{c}ois
- Santucci",MATAWS: A Multimodal Approach for Automatic WS Semantic Annotation,cs.SE cs.CL cs.IR," Many recent works aim at developing methods and tools for the processing of
-semantic Web services. In order to be properly tested, these tools must be
-applied to an appropriate benchmark, taking the form of a collection of
-semantic WS descriptions. However, all of the existing publicly available
-collections are limited by their size or their realism (use of randomly
-generated or resampled descriptions). Larger and realistic syntactic (WSDL)
-collections exist, but their semantic annotation requires a certain level of
-automation, due to the number of operations to be processed. In this article,
-we propose a fully automatic method to semantically annotate such large WS
-collections.
Our approach is multimodal, in the sense that it takes advantage of the
-latent semantics present not only in the parameter names, but also in the type
-names and structures. Concept-to-word association is performed by using Sigma,
-a mapping of WordNet to the SUMO ontology. After having described our
-annotation method in detail, we apply it to the largest collection of
-real-world syntactic WS descriptions we could find, and assess its efficiency.
-"
-758,1305.0556,"Stephen Clark, Bob Coecke, Edward Grefenstette, Stephen Pulman and
- Mehrnoosh Sadrzadeh","A quantum teleportation inspired algorithm produces sentence meaning
- from word meaning and grammatical structure",cs.CL quant-ph," We discuss an algorithm which produces the meaning of a sentence given
-meanings of its words, and its resemblance to quantum teleportation. In fact,
-this protocol was the main source of inspiration for this algorithm, which has
-many applications in the area of Natural Language Processing.
-"
-759,1305.0625,"Kamlesh Sharma, Dr. T. V. Prasad",CONATION: English Command Input/Output System for Computers,cs.HC cs.CL," In this information technology age, a convenient and user-friendly interface
-is required to operate a computer system at a very fast rate. For human beings,
-speech, being a natural mode of communication, has the potential to be a fast
-and convenient mode of interaction with a computer. Speech recognition will
-play an important role in taking technology to people. Accessing information
-within seconds is the need of this era. This paper describes the design and
-development of a speaker-independent English command interpretation system for
-computers. An HMM is used to represent the phoneme-like speech commands.
-Experiments have been done on real-world data, and the system has been trained
-in normal conditions with real-world subjects.
-"
-760,1305.1145,"Urmila Shrawankar, V M Thakare","Techniques for Feature Extraction In Speech Recognition System : A
- Comparative Study",cs.SD cs.CL," The time domain waveform of a speech signal carries all of the auditory
-information. From the phonological point of view, little can be said on the
-basis of the waveform itself. However, past research in mathematics, acoustics,
-and speech technology has provided many methods for converting data that can
-be considered as information if interpreted correctly. In order to find some
-statistically relevant information from incoming data, it is important to have
-mechanisms for reducing the information of each segment in the audio signal
-into a relatively small number of parameters, or features. These features
-should describe each segment in such a characteristic way that other similar
-segments can be grouped together by comparing their features. There are
-numerous interesting ways to describe the speech signal in terms of parameters.
-Though they all have their strengths and weaknesses, we present some of the
-most used methods along with their importance.
-"
-761,1305.1319,David Bamman and Noah A. Smith,New Alignment Methods for Discriminative Book Summarization,cs.CL," We consider the unsupervised alignment of the full text of a book with a
-human-written summary. This presents challenges not seen in other text
-alignment problems, including a disparity in length and, consequent to this, a
-violation of the expectation that individual words and phrases should align,
-since large passages and chapters can be distilled into a single summary
-phrase.
We
-present two new methods, based on hidden Markov models, specifically targeted
-to this problem, and demonstrate gains on an extractive book summarization
-task. While there is still much room for improvement, unsupervised alignment
-holds intrinsic value in offering insight into what features of a book are
-deemed worthy of summarization.
-"
-762,1305.1343,Arnim Bleier and Andreas Strotmann,"Towards an Author-Topic-Term-Model Visualization of 100 Years of German
- Sociological Society Proceedings",cs.DL cs.CL cs.IR," Author co-citation studies employ factor analysis to reduce high-dimensional
-co-citation matrices to low-dimensional and possibly interpretable factors, but
-these studies do not use any information from the text bodies of publications.
-We hypothesise that term frequencies may yield useful information for
-scientometric analysis. In our work we ask if word features in combination with
-Bayesian analysis allow well-founded science mapping studies. This work goes
-back to the roots of Mosteller and Wallace's (1964) statistical text analysis
-using word frequency features and a Bayesian inference approach, though with
-different goals. To answer our research question we (i) introduce a new data
-set on which the experiments are carried out, (ii) describe the Bayesian model
-employed for inference and (iii) present first results of the analysis.
-"
-763,1305.1426,"Urmila Shrawankar, V. M. Thakare",Speech Enhancement Modeling Towards Robust Speech Recognition System,cs.SD cs.CL," For about four decades, human beings have been dreaming of an intelligent
-machine which can master natural speech. In its simplest form, this machine
-should consist of two subsystems, namely automatic speech recognition (ASR)
-and speech understanding (SU). The goal of ASR is to transcribe natural speech
-while SU is to understand the meaning of the transcription. Recognizing and
-understanding a spoken sentence is obviously a knowledge-intensive process,
-which must take into account all variable information about the speech
-communication process, from acoustics to semantics and pragmatics. While
-developing an Automatic Speech Recognition System, it is observed that some
-adverse conditions degrade the performance of the system. In this contribution,
-a speech enhancement system is introduced for enhancing speech signals
-corrupted by additive noise and improving the performance of Automatic Speech
-Recognizers in noisy conditions. Automatic speech recognition experiments show
-that replacing noisy speech signals by the corresponding enhanced speech
-signals leads to an improvement in the recognition accuracies. The amount of
-improvement varies with the type of the corrupting noise.
-"
-764,1305.1925,"Urmila Shrawankar, Anjali Mahajan","Speech: A Challenge to Digital Signal Processing Technology for
- Human-to-Computer Interaction",cs.HC cs.CL," This software-project-based paper presents a vision of the near future in
-which computer interaction is characterized by natural face-to-face
-conversations with lifelike characters that speak, emote, and gesture. The
-first step is speech. The dream of a true virtual reality, a complete
-human-computer interaction system, will not come true unless we give some
-perception to the machine and make it perceive the outside world as humans do
-when communicating with each other. This software project is under development
-for a machine (computer) that listens and replies through speech.
The Speech interface is developed to convert
-speech input into a parametric form (Speech-to-Text) for further processing,
-and to convert the resulting text output back into speech (Text-to-Speech).
-"
-765,1305.2352,"Rashmi Makhijani, Urmila Shrawankar, V M Thakare",Speech Enhancement Using Pitch Detection Approach For Noisy Environment,cs.SD cs.CL," Acoustical mismatch between training and testing phases considerably degrades
-speech recognition results. This problem has limited the development of
-real-world nonspecific applications, as testing conditions are highly variant
-or even unpredictable during the training process. Therefore, the background
-noise has to be removed from the noisy speech signal to increase signal
-intelligibility and to reduce listener fatigue. Enhancement techniques, applied
-as pre-processing stages to such systems, remarkably improve recognition
-results. In this paper, a novel approach is used to enhance the perceived
-quality of the speech signal when the additive noise cannot be directly
-controlled. Instead of controlling the background noise, we propose to
-reinforce the speech signal so that it can be heard more clearly in noisy
-environments. The subjective evaluation shows that the proposed method improves
-perceptual quality of speech in various noisy environments. In some cases
-speaking may be more convenient than typing, even for rapid typists: many
-mathematical symbols are missing from the keyboard but can easily be spoken and
-recognized. Therefore, the proposed system can be used in an application
-designed for mathematical symbol recognition (especially symbols not available
-on the keyboard) in schools.
-"
-766,1305.2680,Sulaiman S. AlDahri,"A study for the effect of the Emphaticness and language and dialect for
- Voice Onset Time (VOT) in Modern Standard Arabic (MSA)",cs.CL cs.SD," The sound signal contains many different features, including Voice Onset Time
-(VOT), which is a very important feature of stop sounds in many languages. VOT
-values apply only to the subset of stop phonemes. This subset of consonant
-sounds, the stop phonemes, exists in the Arabic language and, in fact, in all
-languages. The pronunciation of these sounds is hard and unique, especially for
-less-educated Arabs and non-native Arabic speakers. VOT can be utilized by the
-human auditory system to distinguish between voiced and unvoiced stops such as
-/p/ and /b/ in English. This research focuses on computing and analyzing the
-VOT of Modern Standard Arabic (MSA), within the Arabic language, for all pairs
-of non-emphatic (namely, /d/ and /t/) and emphatic sounds (namely, /d?/ and
-/t?/) depending on carrier words. This research uses a database we built
-ourselves, with the carrier-word syllable structure CV-CV-CV. One of the main
-outcomes consistently found is that the VOT of the emphatic sounds (/d?/, /t?/)
-is less than 50% of that of their non-emphatic counterparts (/d/, /t/). Also,
-VOT can be used to classify or detect a dialect in a language.
-"
-767,1305.2846,"Rashmi Makhijani, Urmila Shrawankar, V M Thakare",Opportunities & Challenges In Automatic Speech Recognition,cs.CL cs.SD," Automatic speech recognition enables a wide range of current and emerging
-applications such as automatic transcription, multimedia content analysis, and
-natural human-computer interfaces. This paper provides a glimpse of the
-opportunities and challenges that parallelism provides for automatic speech
-recognition and related application research from the point of view of speech
-researchers.
The increasing parallelism in computing platforms opens three
-major possibilities for speech recognition systems: improving recognition
-accuracy in non-ideal, everyday noisy environments; increasing recognition
-throughput in batch processing of speech data; and reducing recognition latency
-in real-time usage scenarios. This paper describes technical challenges,
-approaches taken, and possible directions for future research to guide the
-design of efficient parallel software and hardware infrastructures.
-"
-768,1305.2847,"Neema Mishra, Urmila Shrawankar, V M Thakare",An Overview of Hindi Speech Recognition,cs.CL cs.SD," In this age of information technology, information access in a convenient
-manner has gained importance. Since speech is a primary mode of communication
-among human beings, it is natural for people to expect to be able to carry out
-spoken dialogue with a computer. A speech recognition system permits ordinary
-people to speak to the computer to retrieve information. It is desirable to
-have a human-computer dialogue in a local language. Hindi, being the most
-widely spoken language in India, is the natural primary candidate for
-human-machine interaction. There are five pairs of vowels in Hindi; one member
-of each pair is longer than the other. This paper gives an overview of a speech
-recognition system, including how speech is produced and the properties and
-characteristics of Hindi phonemes.
-"
-769,1305.2959,"Neema Mishra, Urmila Shrawankar, V M Thakare","Automatic Speech Recognition Using Template Model for Man-Machine
- Interface",cs.SD cs.CL," Speech is a natural form of communication for human beings, and computers
-with the ability to understand speech and speak with a human voice are expected
-to contribute to the development of more natural man-machine interfaces.
-Computers with this kind of ability are gradually becoming a reality, through
-the evolution of speech recognition technologies. Speech is an important mode
-of interaction with computers. In this paper, feature extraction is implemented
-using the well-known Mel-Frequency Cepstral Coefficients (MFCC). Pattern
-matching is done using the Dynamic Time Warping (DTW) algorithm.
-"
-770,1305.3107,"Sasa Petrovic, Miles Osborne and Victor Lavrenko","I Wish I Didn't Say That! Analyzing and Predicting Deleted Messages in
- Twitter",cs.SI cs.CL," Twitter has become a major source of data for social media researchers. One
-important aspect of Twitter not previously considered is {\em deletions} --
-removal of tweets from the stream. Deletions can be due to a multitude of
-reasons such as privacy concerns, rashness or attempts to undo public
-statements. We show how deletions can be automatically predicted ahead of time
-and analyse which tweets are likely to be deleted and how.
-"
-771,1305.3882,Daniel Christen,"Rule-Based Semantic Tagging. An Application Undergoing Dictionary
- Glosses",cs.CL," The project presented in this article aims to formalize criteria and
-procedures in order to extract semantic information from parsed dictionary
-glosses. The actual purpose of the project is the generation of a semantic
-network (nearly an ontology) derived from a monolingual Italian dictionary,
-through unsupervised procedures. Since the project involves rule-based Parsing,
-Semantic Tagging and Word Sense Disambiguation techniques, its outcomes may be
-of interest also beyond this immediate intent.
The cooperation of both
-syntactic and semantic features in meaning construction is investigated, and
-procedures which allow a translation of syntactic dependencies into semantic
-relations are discussed. The procedures that arise from this project can also
-be applied to text types other than dictionary glosses, as they convert the
-output of a parsing process into a semantic representation. In addition, some
-mechanisms are sketched that may lead to a kind of procedural semantics,
-through which multiple paraphrases of a given expression can be generated. This
-means that these techniques may also find application in 'query expansion'
-strategies, which are of interest for Information Retrieval, Search Engines and
-Question Answering Systems.
-"
-772,1305.3981,"Kaixu Zhang, Can Wang, Maosong Sun",Binary Tree based Chinese Word Segmentation,cs.CL," Chinese word segmentation is a fundamental task for Chinese language
-processing. The granularity mismatch problem is the main cause of the errors.
-This paper shows that the binary tree representation can store outputs with
-different granularity. A binary tree based framework is also designed to
-overcome the granularity mismatch problem. There are two steps in this
-framework, namely tree building and tree pruning. The tree pruning step is
-specially designed to focus on the granularity problem. Previous work for
-Chinese word segmentation, such as sequence tagging, can be easily employed in
-this framework. This framework can also provide quantitative error analysis
-methods. The experiments show that after using a more sophisticated tree
-pruning function for a state-of-the-art conditional random field based
-baseline, the error reduction can be up to 20%.
-"
-773,1305.4561,Ramon Ferrer-i-Cancho,Random crossings in dependency trees,cs.CL cs.DM cs.SI physics.soc-ph," It has been hypothesized that the rather small number of crossings in real
-syntactic dependency trees is a side-effect of pressure for dependency length
-minimization. Here we answer a related important research question: what would
-be the expected number of crossings if the natural order of a sentence was lost
-and replaced by a random ordering? We show that this number depends only on the
-number of vertices of the dependency tree (the sentence length) and the second
-moment about zero of vertex degrees. The expected number of crossings is
-minimum for a star tree (crossings are impossible) and maximum for a linear
-tree (the number of crossings is of the order of the square of the sequence
-length).
-"
-774,1305.5566,"Taha Yasseri, Anselm Spoerri, Mark Graham, and J\'anos Kert\'esz","The most controversial topics in Wikipedia: A multilingual and
- geographical analysis",physics.soc-ph cs.CL cs.DL cs.SI physics.data-an," We present, visualize and analyse the similarities and differences between
-the controversial topics related to ""edit wars"" identified in 10 different
-language versions of Wikipedia. After a brief review of the related work we
-describe the methods developed to locate, measure, and categorize the
-controversial topics in the different languages. Visualizations of the degree
-of overlap between the top 100 lists of most controversial articles in
-different languages and the content related to geographical locations will be
-presented. We discuss what the presented analysis and visualizations can tell
-us about the multicultural aspects of Wikipedia and practices of
-peer-production.
Our
-results indicate that Wikipedia is more than just an encyclopaedia; it is also
-a window into convergent and divergent social-spatial priorities, interests and
-preferences.
-"
-775,1305.5753,"Peter D. Bruza and Kirsty Kitto and Brentyn J. Ramm and Laurianne
- Sitbon","A probabilistic framework for analysing the compositionality of
- conceptual combinations",cs.CL," Conceptual combination performs a fundamental role in creating the broad
-range of compound phrases utilized in everyday language. This article provides
-a novel probabilistic framework for assessing whether the semantics of
-conceptual combinations are compositional, and so can be considered as a
-function of the semantics of the constituent concepts, or not. While the
-systematicity and productivity of language provide a strong argument in favor
-of assuming compositionality, this very assumption is still regularly
-questioned in both cognitive science and philosophy. Additionally, the
-principle of semantic compositionality is underspecified, which means that
-notions of both ""strong"" and ""weak"" compositionality appear in the literature.
-Rather than adjudicating between different grades of compositionality, the
-framework presented here contributes formal methods for determining a clear
-dividing line between compositional and non-compositional semantics. In
-addition, we suggest that the distinction between these is contextually
-sensitive. Utilizing formal frameworks developed for analyzing composite
-systems in quantum theory, we present two methods that allow the semantics of
-conceptual combinations to be classified as ""compositional"" or
-""non-compositional"". Compositionality is first formalised by factorising the
-joint probability distribution modeling the combination, where the terms in the
-factorisation correspond to individual concepts. This leads to the necessary
-and sufficient condition for the joint probability distribution to exist. A
-failure to meet this condition implies that the underlying concepts cannot be
-modeled in a single probability space when considering their combination, and
-the combination is thus deemed ""non-compositional"". The formal analysis methods
-are demonstrated by applying them to an empirical study of twenty-four
-non-lexicalised conceptual combinations.
-"
-776,1305.5785,Vivek Srikumar and Dan Roth,An Inventory of Preposition Relations,cs.CL," We describe an inventory of semantic relations that are expressed by
-prepositions. We define these relations by building on the word sense
-disambiguation task for prepositions and propose a mapping from preposition
-senses to the relation labels by collapsing semantically related senses across
-prepositions.
-"
-777,1305.5918,"Kaixu Zhang, Maosong Sun","Reduce Meaningless Words for Joint Chinese Word Segmentation and
- Part-of-speech Tagging",cs.CL," Conventional statistics-based methods for joint Chinese word segmentation and
-part-of-speech tagging (S&T) have the generalization ability to recognize new
-words that do not appear in the training data. An undesirable side effect is
-that a number of meaningless words will be incorrectly created. We propose an
-effective and efficient framework for S&T that introduces features to
-significantly reduce meaningless word generation. A general lexicon, Wikipedia
-and a large-scale raw corpus of 200 billion characters are used to generate
-word-based features for the wordhood.
The word-lattice based framework consists
-of a character-based model and a word-based model in order to employ our
-word-based features. Experiments on the Penn Chinese Treebank 5 show that this
-method achieves a 62.9% reduction in meaningless word generation in comparison
-with the baseline. As a result, the F1 measure for segmentation is increased
-to 0.984.
-"
-778,1305.6143,"Vivek Narayanan, Ishan Arora, Arjun Bhatia","Fast and accurate sentiment classification using an enhanced Naive Bayes
-  model",cs.CL cs.IR cs.LG," We have explored different methods of improving the accuracy of a Naive
-Bayes classifier for sentiment analysis. We observed that a combination of
-methods like negation handling, word n-grams and feature selection by mutual
-information results in a significant improvement in accuracy. This implies
-that a highly accurate and fast sentiment classifier can be built using a
-simple Naive Bayes model that has linear training and testing time
-complexities. We achieved an accuracy of 88.80% on the popular IMDB movie
-reviews dataset.
-"
-779,1305.6211,"Snigdha Paul, Nisheeth Joshi, Iti Mathur",Development of a Hindi Lemmatizer,cs.CL," We live in a translingual society; in order to communicate with people from
-different parts of the world we need to have expertise in their respective
-languages. Learning all these languages is not at all possible; therefore we
-need a mechanism which can do this task for us. Machine translators have
-emerged as tools which can perform this task. In order to develop a machine
-translator we need to develop several different rules. The very first module
-that comes in the machine translation pipeline is morphological analysis.
-Stemming and lemmatization come under morphological analysis. In this paper we
-have created a lemmatizer which generates rules for removing the affixes,
-along with rules for creating a proper root word.
-"
-780,1305.6238,Richard Moot (LaBRI),Extended Lambek calculi and first-order linear logic,cs.CL cs.LO," First-order multiplicative intuitionistic linear logic (MILL1) can be seen
-as an extension of the Lambek calculus. In addition to the fragment of MILL1
-which corresponds to the Lambek calculus (of Moot & Piazza 2001), I will show
-fragments of MILL1 which generate the multiple context-free languages and
-which correspond to the Displacement calculus of Morrill et al.
-"
-781,1305.7014,Bohdan Pavlyshenko,Tweets Miner for Stock Market Analysis,cs.IR cs.CL cs.SI," In this paper, we present a software package for the data mining of Twitter
-microblogs for the purpose of using them for stock market analysis. The
-package is written in the R language using appropriate R packages. A model of
-tweets has been considered. We have also compared stock market charts with
-frequent sets of keywords in Twitter microblog messages.
-"
-782,1306.0963,"Been Kim, Caleb M. Chacha, Julie Shah","Inferring Robot Task Plans from Human Team Meetings: A Generative
-  Modeling Approach with Logic-Based Prior",cs.AI cs.CL cs.RO stat.ML," We aim to reduce the burden of programming and deploying autonomous systems
-to work in concert with people in time-critical domains, such as military
-field operations and disaster response. Deployment plans for these operations
-are frequently negotiated on-the-fly by teams of human planners. A human
-operator then translates the agreed-upon plan into machine instructions for
-the robots. 
-We present an algorithm that reduces this translation burden by inferring the
-final plan from a processed form of the human team's planning conversation.
-Our approach combines probabilistic generative modeling with logical plan
-validation used to compute a highly structured prior over possible plans. This
-hybrid approach enables us to overcome the challenge of performing inference
-over the large solution space with only a small amount of noisy data from the
-team planning session. We validate the algorithm through human subject
-experimentation and show we are able to infer a human team's final plan with
-83% accuracy on average. We also describe a robot demonstration in which two
-people plan and execute a first-response collaborative task with a PR2 robot.
-To the best of our knowledge, this is the first work that integrates a logical
-planning technique within a generative model to perform plan inference.
-"
-783,1306.1343,Andrea Esuli,The User Feedback on SentiWordNet,cs.CL cs.IR," With the release of SentiWordNet 3.0 the related Web interface has been
-restyled and improved in order to allow users to submit feedback on the
-SentiWordNet entries, in the form of suggestions of alternative triplets of
-values for an entry. This paper reports on the release of the user feedback
-collected so far and on the plans for the future.
-"
-784,1306.1927,Been Kim and Cynthia Rudin,Learning About Meetings,stat.AP cs.CL," Most people participate in meetings almost every day, multiple times a day.
-The study of meetings is important, but also challenging, as it requires an
-understanding of social signals and complex interpersonal dynamics. Our aim in
-this work is to use a data-driven approach to the science of meetings. We
-provide tentative evidence that: i) it is possible to automatically detect
-when during the meeting a key decision is taking place, from analyzing only
-the local dialogue acts, ii) there are common patterns in the way social
-dialogue acts are interspersed throughout a meeting, iii) at the time key
-decisions are made, the amount of time left in the meeting can be predicted
-from the amount of time that has passed, iv) it is often possible to predict
-whether a proposal during a meeting will be accepted or rejected based
-entirely on the language (the set of persuasive words) used by the speaker.
-"
-785,1306.2091,"Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal
-  Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge","A framework for (under)specifying dependency syntax without overloading
-  annotators",cs.CL," We introduce a framework for lightweight dependency syntax annotation. Our
-formalism builds upon the typical representation for unlabeled dependencies,
-permitting a simple notation and annotation workflow. Moreover, the formalism
-encourages annotators to underspecify parts of the syntax if doing so would
-streamline the annotation process. We demonstrate the efficacy of this
-annotation on three languages and develop algorithms to evaluate and compare
-underspecified annotations.
-"
-786,1306.2158,"Karl Moritz Hermann, Edward Grefenstette and Phil Blunsom","""Not not bad"" is not ""bad"": A distributional account of negation",cs.CL," With the increasing empirical success of distributional models of
-compositional semantics, it is timely to consider the types of textual logic
-that such models are capable of capturing. In this paper, we address
-shortcomings in the ability of current models to capture logical operations
-such as negation. 
As a solution we propose a tripartite formulation for a
-continuous vector space representation of semantics and subsequently use this
-representation to develop a formal compositional notion of negation within
-such models.
-"
-787,1306.2268,Keehang Kwon and Mi-Young Park,Accomplishable Tasks in Knowledge Representation,cs.AI cs.CL," Knowledge Representation (KR) is traditionally based on the logic of facts,
-expressed in boolean logic. However, facts about an agent can also be seen as
-a set of accomplished tasks by the agent. This paper proposes a new approach
-to KR: the notion of task logical KR based on Computability Logic. This notion
-allows the user to represent both accomplished tasks and accomplishable tasks
-by the agent. It also allows us to build sophisticated KRs about many
-interesting agents, which have not been supported by previous logical
-languages.
-"
-788,1306.2499,"Mohammed Alaeddine Abderrahim, Mohammed El Amine Abderrahim, Mohammed
-  Amine Chikh","Using Arabic Wordnet for semantic indexation in information retrieval
-  system",cs.IR cs.CL," In the context of Arabic Information Retrieval Systems (IRS) guided by an
-Arabic ontology, and to enable those systems to better respond to user
-requirements, this paper aims at representing documents and queries by the
-best concepts extracted from Arabic WordNet. Identified concepts belonging to
-Arabic WordNet synsets are extracted from documents and queries, and those
-having a single sense are expanded. The expanded query is then used by the IRS
-to retrieve the relevant documents. Our experiments are based primarily on a
-medium-sized corpus of Arabic text. The results obtained show a global
-improvement in the performance of the Arabic IRS.
-"
-789,1306.2593,Elaine Y L Tsiang,A Perceptual Alphabet for the 10-dimensional Phonetic-prosodic Space,cs.SD cs.CL," We define an alphabet, the IHA, of the 10-D phonetic-prosodic space. The
-dimensions of this space are perceptual observables, rather than articulatory
-specifications. Speech is defined as a random chain in time of the 4-D
-phonetic subspace, that is, a symbolic sequence, augmented with diacritics of
-the remaining 6-D prosodic subspace. The definitions here are based on the
-oral billiards model of speech, and supersede an earlier version. This paper
-only enumerates the IHA in detail as a supplement to the exposition of oral
-billiards in a separate paper. The IHA has been implemented as the target
-random variable in a speech recognizer.
-"
-790,1306.2838,"Diederik Aerts, Jan Broekaert, Sandro Sozzo and Tomas Veloz",The Quantum Challenge in Concept Theory and Natural Language Processing,cs.CL cs.IR quant-ph," The mathematical formalism of quantum theory has been successfully used in
-human cognition to model decision processes and to deliver representations of
-human knowledge. As such, quantum cognition inspired tools have improved
-technologies for Natural Language Processing and Information Retrieval. In
-this paper, we overview the quantum cognition approach developed in our
-Brussels team during the last two decades, specifically our identification of
-quantum structures in human concepts and language, and the modeling of data
-from psychological and corpus-text-based experiments. We discuss our
-quantum-theoretic framework for concepts and their conjunctions/disjunctions
-in a Fock-Hilbert space structure, adequately modeling a large amount of data
-collected on concept combinations. 
Inspired by this modeling, we put forward
-elements for a quantum contextual and meaning-based approach to information
-technologies in which 'entities of meaning' are inversely reconstructed from
-texts, which are considered as traces of these entities' states.
-"
-791,1306.3584,"Nal Kalchbrenner, Phil Blunsom",Recurrent Convolutional Neural Networks for Discourse Compositionality,cs.CL," The compositionality of meaning extends beyond the single sentence. Just as
-words combine to form the meaning of sentences, so do sentences combine to
-form the meaning of paragraphs, dialogues and general discourse. We introduce
-both a sentence model and a discourse model corresponding to the two levels
-of compositionality. The sentence model adopts convolution as the central
-operation for composing semantic vectors and is based on a novel hierarchical
-convolutional neural network. The discourse model extends the sentence model
-and is based on a recurrent neural network that is conditioned in a novel way
-both on the current sentence and on the current speaker. The discourse model
-is able to capture both the sequentiality of sentences and the interaction
-between different speakers. Without feature engineering or pretraining and
-with simple greedy decoding, the discourse model coupled to the sentence model
-obtains state of the art performance on a dialogue act classification
-experiment.
-"
-792,1306.3692,"Felipe S\'anchez-Mart\'inez, Isabel Mart\'inez-Sempere, Xavier
-  Ivars-Ribes, Rafael C. Carrasco","An open diachronic corpus of historical Spanish: annotation criteria and
-  automatic modernisation of spelling",cs.CL cs.DL," The IMPACT-es diachronic corpus of historical Spanish compiles over one
-hundred books --containing approximately 8 million words-- in addition to a
-complementary lexicon which links more than 10 thousand lemmas with
-attestations of the different variants found in the documents. This textual
-corpus and the accompanying lexicon have been released under an open license
-(Creative Commons by-nc-sa) in order to permit their intensive exploitation
-in linguistic research. Approximately 7% of the words in the corpus (a
-selection aimed at enhancing the coverage of the most frequent word forms)
-have been annotated with their lemma, part of speech, and modern equivalent.
-This paper describes the annotation criteria followed and the standards,
-based on the Text Encoding Initiative recommendations, used to represent the
-texts in digital form. As an illustration of the possible synergies between
-diachronic textual resources and linguistic research, we describe the
-application of statistical machine translation techniques to infer
-probabilistic context-sensitive rules for the automatic modernisation of
-spelling. The automatic modernisation with this type of statistical methods
-leads to very low character error rates when the output is compared with the
-supervised modern version of the text.
-"
-793,1306.3920,Thiago C. Silva and Diego R. Amancio,Discriminating word senses with tourist walks in complex networks,cs.CL cs.SI physics.soc-ph," Patterns of topological arrangement are widely used by both animal and human
-brains in the learning process. Nevertheless, automatic learning techniques
-frequently overlook these patterns. In this paper, we apply a learning
-technique based on the structural organization of the data in the attribute
-space to the problem of discriminating the senses of 10 polysemous words. 
Using
-two types of characterization of meanings, namely semantic and topological
-approaches, we have observed significant accuracy rates in identifying the
-suitable meanings with both techniques. Most importantly, we have found that
-the characterization based on the deterministic tourist walk improves the
-disambiguation process when compared with the discrimination achieved with
-traditional complex network measurements such as assortativity and clustering
-coefficient. To our knowledge, this is the first time that such a
-deterministic walk has been applied to this kind of problem. Therefore, our
-finding suggests that the tourist walk characterization may be useful in other
-related applications.
-"
-794,1306.4134,"Suket Arora, Kamaljeet Batra, Sarabjit Singh",Dialogue System: A Brief Review,cs.CL," A Dialogue System is a system which interacts with humans in natural
-language. At present many universities are developing dialogue systems in
-their regional languages. This paper discusses dialogue systems, their
-components, challenges and their evaluation. It helps researchers to get
-information regarding dialogue systems.
-"
-795,1306.4139,"Preeti Verma, Suket Arora, Kamaljit Batra",Punjabi Language Interface to Database: a brief review,cs.CL cs.HC," Unlike most user-computer interfaces, a natural language interface allows
-users to communicate fluently with a computer system with very little
-preparation. Databases are often hard to use in cooperating with the users
-because of their rigid interface. A good NLIDB allows a user to enter commands
-and ask questions in their native language and then, after interpreting them,
-respond to the user in that language. For a large number of applications
-requiring interaction between humans and computer systems, it would be
-convenient to provide an end-user friendly interface. A Punjabi language
-interface to database would prove fruitful to the native people of Punjab, as
-it provides them with an easy way to use various e-governance applications
-like Punjab Sewa, Suwidha, Online Public Utility Forms, Online Grievance Cell,
-Land Records Management System, legacy matters, e-District, agriculture, etc.
-Punjabi is the mother tongue of more than 110 million people all around the
-world. According to available information, Punjabi ranks 10th from the top out
-of a total of 6,900 languages recognized internationally by the United
-Nations. This paper covers a brief overview of the natural language interface
-to database, its different components, its advantages, disadvantages,
-approaches and techniques used. The paper ends with the work done on the
-Punjabi language interface to database and future enhancements that can be
-made.
-"
-796,1306.4886,"Luis Marujo, Anatole Gershman, Jaime Carbonell, Robert Frederking,
-  Jo\~ao P. Neto","Supervised Topical Key Phrase Extraction of News Stories using
-  Crowdsourcing, Light Filtering and Co-reference Normalization",cs.CL cs.IR," Fast and effective automated indexing is critical for search and
-personalized services. Key phrases that consist of one or more words and
-represent the main concepts of the document are often used for the purpose of
-indexing. In this paper, we investigate the use of additional semantic
-features and pre-processing steps to improve automatic key phrase extraction.
-These features include the use of signal words and Freebase categories. Some
-of these features lead to significant improvements in the accuracy of the
-results. 
We also
-experimented with 2 forms of document pre-processing that we call light
-filtering and co-reference normalization. Light filtering removes sentences
-from the document which are judged peripheral to its main content.
-Co-reference normalization unifies several written forms of the same named
-entity into a unique form. We also needed a ""Gold Standard"" - a set of
-labeled documents for training and evaluation. While the subjective nature of
-key phrase selection precludes a true ""Gold Standard"", we used Amazon's
-Mechanical Turk service to obtain a useful approximation. Our data indicate
-that the biggest improvements in performance were due to shallow semantic
-features, news categories, and rhetorical signals (nDCG 78.47% vs. 68.93%).
-The inclusion of deeper semantic features such as Freebase sub-categories was
-not beneficial by itself, but in combination with pre-processing, did cause
-slight improvements in the nDCG scores.
-"
-797,1306.4890,"Luis Marujo, Ricardo Ribeiro, David Martins de Matos, Jo\~ao P. Neto,
-  Anatole Gershman, and Jaime Carbonell",Key Phrase Extraction of Lightly Filtered Broadcast News,cs.CL cs.IR," This paper explores the impact of light filtering on automatic key phrase
-extraction (AKE) applied to Broadcast News (BN). Key phrases are words and
-expressions that best characterize the content of a document. Key phrases are
-often used to index the document or as features in further processing. This
-makes improvements in AKE accuracy particularly important. We hypothesized
-that filtering out marginally relevant sentences from a document would improve
-AKE accuracy. Our experiments confirmed this hypothesis. Elimination of as
-little as 10% of the document sentences leads to a 2% improvement in AKE
-precision and recall. AKE is built over the MAUI toolkit, which follows a
-supervised learning approach. We trained and tested our AKE method on a gold
-standard made of 8 BN programs containing 110 manually annotated news stories.
-The experiments were conducted within a Multimedia Monitoring Solution (MMS)
-system for TV and radio news/programs, running daily, and monitoring 12 TV and
-4 radio channels.
-"
-798,1306.4908,"Luis Marujo, Wang Ling, Anatole Gershman, Jaime Carbonell, Jo\~ao P.
-  Neto, David Matos",Recognition of Named-Event Passages in News Articles,cs.CL cs.IR," We extend the concept of Named Entities to Named Events - commonly occurring
-events such as battles and earthquakes. We propose a method for finding
-specific passages in news articles that contain information about such events
-and report our preliminary evaluation results. Collecting ""Gold Standard""
-data presents many problems, both practical and conceptual. We present a
-method for obtaining such data using the Amazon Mechanical Turk service.
-"
-799,1306.5170,"Wafaa Tawfik Abdel-moneim, Mohamed Hashem Abdel-Aziz, and Mohamed
-  Monier Hassan",Clinical Relationships Extraction Techniques from Patient Narratives,cs.IR cs.CL," The Clinical E-Science Framework (CLEF) project was used to extract
-important information from medical texts by building a system for the purpose
-of clinical research, evidence-based healthcare and genotype-meets-phenotype
-informatics. The system is divided into two parts; one part concerns the
-identification of relationships between clinically important entities in the
-text. Full parses and domain-specific grammars were used to apply many
-approaches to extract the relationships. 
In the second part of the system, statistical machine
-learning (ML) approaches are applied to extract relationships. A corpus of
-oncology narratives that was hand-annotated with clinical relationships can be
-used to train and test a system that has been designed and implemented with
-supervised machine learning (ML) approaches. Many features can be extracted
-from these texts and used by the classifier to build a model. Multiple
-supervised machine learning algorithms can be applied for relationship
-extraction. The effects of adding features, changing the size of the corpus,
-and changing the type of algorithm on relationship extraction are examined.
-Keywords: Text mining; information extraction; NLP; entities; and relations.
-"
-800,1306.5263,"Haonan Yu, Jeffrey Mark Siskind","Discriminative Training: Learning to Describe Video with Sentences, from
-  Video Described with Sentences",cs.CV cs.CL," We present a method for learning word meanings from complex and realistic
-video clips by discriminatively training (DT) positive sentential labels
-against negative ones, and then use the trained word models to generate
-sentential descriptions for new video. This new work is inspired by recent
-work which adopts a maximum likelihood (ML) framework to address the same
-problem using only positive sentential labels. The new method, like the
-ML-based one, is able to automatically determine which words in the sentence
-correspond to which concepts in the video (i.e., ground words to meanings) in
-a weakly supervised fashion. While both DT and ML yield comparable results
-with sufficient training data, DT outperforms ML significantly with smaller
-training sets because it can exploit negative training labels to better
-constrain the learning problem.
-"
-801,1306.6078,"Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure
-  Leskovec, Christopher Potts","A Computational Approach to Politeness with Application to Social
-  Factors",cs.CL cs.SI physics.soc-ph," We propose a computational framework for identifying linguistic aspects of
-politeness. Our starting point is a new corpus of requests annotated for
-politeness, which we use to evaluate aspects of politeness theory and to
-uncover new interactions between politeness markers and context. These
-findings guide our construction of a classifier with domain-independent
-lexical and syntactic features operationalizing key components of politeness
-theory, such as indirection, deference, impersonalization and modality. Our
-classifier achieves close to human performance and is effective across
-domains. We use our framework to study the relationship between politeness and
-social power, showing that polite Wikipedia editors are more likely to achieve
-high status through elections, but, once elevated, they become less polite. We
-see a similar negative correlation between politeness and power on Stack
-Exchange, where users at the top of the reputation scale are less polite than
-those at the bottom. Finally, we apply our classifier to a preliminary
-analysis of politeness variation by gender and community.
-"
-802,1306.6130,Robert Bishop Jr,Competency Tracking for English as a Second or Foreign Language Learners,cs.CL," My system utilizes the outcomes feature found in Moodle and other learning
-content management systems (LCMSs) to keep track of where students are in
-terms of what language competencies they have mastered and the competencies
-they need to get where they want to go. 
These competencies are based on the Common
-European Framework for (English) Language Learning. This data can be made
-available to everyone involved with a given student's progress (e.g.
-educators, parents, supervisors and the students themselves). A given
-student's record of past accomplishments can also be meshed with those of his
-classmates. Not only are a student's competencies easily seen and tracked;
-educators can also view competencies of a group of students that were achieved
-prior to enrollment in the class. This should make curriculum decision making
-easier and more efficient for educators.
-"
-803,1306.6755,Kareem Darwish,Arabizi Detection and Conversion to Arabic,cs.CL cs.IR," Arabizi is Arabic text that is written using Latin characters. Arabizi is
-used to present both Modern Standard Arabic (MSA) and Arabic dialects. It is
-commonly used in informal settings such as social networking sites and is
-often mixed with English. In this paper we address the problems of:
-identifying Arabizi in text and converting it to Arabic characters. We used
-word and sequence-level features to identify Arabizi that is mixed with
-English. We achieved an identification accuracy of 98.5%. As for conversion,
-we used transliteration mining with language modeling to generate equivalent
-Arabic text. We achieved 88.7% conversion accuracy, with roughly a third of
-errors being spelling and morphological variants of the forms in the ground
-truth.
-"
-804,1306.6944,"Ulf Sch\""oneberg and Wolfram Sperber",The DeLiVerMATH project - Text analysis in mathematics,cs.CL cs.DL cs.IR," A high-quality content analysis is essential for retrieval functionalities,
-but the manual extraction of key phrases and classification is expensive.
-Natural language processing provides a framework to automatize the process.
-Here, a machine-based approach for the content analysis of mathematical texts
-is described. A prototype for key phrase extraction and classification of
-mathematical texts is presented.
-"
-805,1307.0087,Fabrizio M.A. Lolli,"Semantics and pragmatics in actual software applications and in web
-  search engines: exploring innovations",cs.IR cs.CL cs.HC," While new ways to use the Semantic Web are developed every week, which allow
-the user to find information on the web more accurately - for example in
-search engines - some sophisticated pragmatic tools are becoming more
-important - for example in web interfaces known as Social Intelligence, or in
-the famous Siri by Apple. The work aims to analyze whether and where we can
-identify the boundary between semantics and pragmatics in the software used by
-the analyzed systems, examining how the linguistic disciplines are fundamental
-to their progress. Is it possible to assume that the tools of social
-intelligence take a pragmatic approach to the questions of the user, or is it
-just the use of a very rich vocabulary, with the use of semantic tools?
-"
-806,1307.0261,"Bhavana Dalvi, William W. Cohen, and Jamie Callan","WebSets: Extracting Sets of Entities from the Web Using Unsupervised
-  Information Extraction",cs.LG cs.CL cs.IR," We describe an open-domain information extraction method for extracting
-concept-instance pairs from an HTML corpus. Most earlier approaches to this
-problem rely on combining clusters of distributionally similar terms and
-concept-instance pairs obtained with Hearst patterns. 
In contrast, our method
-relies on a novel approach for clustering terms found in HTML tables, and
-then assigning concept names to these clusters using Hearst patterns. The
-method can be efficiently applied to a large corpus, and experimental results
-on several datasets show that our method can accurately extract large numbers
-of concept-instance pairs.
-"
-807,1307.0596,Om P. Damani,"Improving Pointwise Mutual Information (PMI) by Incorporating
-  Significant Co-occurrence",cs.CL," We design a new co-occurrence based word association measure by
-incorporating the concept of significant co-occurrence into the popular word
-association measure Pointwise Mutual Information (PMI). By extensive
-experiments with a large number of publicly available datasets we show that
-the newly introduced measure performs better than other co-occurrence based
-measures and, despite being resource-light, compares well with the best known
-resource-heavy distributional similarity and knowledge based word association
-measures. We investigate the source of this performance improvement and find
-that of the two types of significant co-occurrence - corpus-level and
-document-level - the concept of corpus-level significance combined with the
-use of document counts in place of word counts is responsible for all the
-performance gains observed. The concept of document-level significance is not
-helpful for PMI adaptation.
-"
-808,1307.1662,"Rami Al-Rfou, Bryan Perozzi, Steven Skiena",Polyglot: Distributed Word Representations for Multilingual NLP,cs.CL cs.LG," Distributed word representations (word embeddings) have recently contributed
-to competitive performance in language modeling and several NLP tasks. In this
-work, we train word embeddings for more than 100 languages using their
-corresponding Wikipedias. We quantitatively demonstrate the utility of our
-word embeddings by using them as the sole features for training a part of
-speech tagger for a subset of these languages. We find their performance to be
-competitive with near state-of-the-art methods in English, Danish and Swedish.
-Moreover, we investigate the semantic features captured by these embeddings
-through the proximity of word groupings. We will release these embeddings
-publicly to help researchers in the development and enhancement of
-multilingual applications.
-"
-809,1307.1872,"Ibrahim Sabek, Noha A. Yousri, Nagwa Elmakky and Mona Habib",Intelligent Hybrid Man-Machine Translation Quality Estimation,cs.CL," Inferring evaluation scores based on human judgments is invaluable compared
-to using current evaluation metrics, which are not suitable for real-time
-applications, e.g. post-editing. However, these judgments are much more
-expensive to collect, especially from expert translators, compared to
-evaluation based on indicators contrasting source and translation texts. This
-work introduces a novel approach for quality estimation by combining learnt
-confidence scores from a probabilistic inference model based on human
-judgments with selective linguistic features-based scores, where the proposed
-inference model infers the credibility of given human ranks to solve the
-scarcity and inconsistency issues of human judgments. Experimental results,
-using challenging language pairs, demonstrate improvement in correlation with
-human judgments over traditional evaluation metrics. 
-"
-810,1307.3040,Mehul Bhatt,"Between Sense and Sensibility: Declarative narrativisation of mental
-  models as a basis and benchmark for visuo-spatial cognition and computation
-  focussed collaborative cognitive systems",cs.AI cs.CL cs.CV cs.HC cs.RO," What lies between `\emph{sensing}' and `\emph{sensibility}'? In other words,
-what kind of cognitive processes mediate sensing capability, and the formation
-of sensible impressions ---e.g., abstractions, analogies, hypotheses and
-theory formation, beliefs and their revision, argument formation--- in
-domain-specific problem solving, or in regular activities of everyday living,
-working and simply going around in the environment? How can knowledge and
-reasoning about such capabilities, as exhibited by humans in particular
-problem contexts, be used as a model and benchmark for the development of
-collaborative cognitive (interaction) systems concerned with human assistance,
-assurance, and empowerment?
-  We pose these questions in the context of a range of assistive technologies
-concerned with \emph{visuo-spatial perception and cognition} tasks
-encompassing aspects such as commonsense, creativity, and the application of
-specialist domain knowledge and problem-solving thought processes. Assistive
-technologies being considered include: (a) human activity interpretation; (b)
-high-level cognitive robotics; (c) people-centred creative design in domains
-such as architecture & digital media creation, and (d) qualitative analyses in
-geographic information systems. Computational narratives not only provide a
-rich cognitive basis, but they also serve as a benchmark of functional
-performance in our development of computational cognitive assistance systems.
-We posit that computational narrativisation pertaining to space, actions, and
-change provides a useful model of \emph{visual} and \emph{spatio-temporal
-thinking} within a wide range of problem-solving tasks and application areas
-where collaborative cognitive systems could serve an assistive and empowering
-function.
-"
-811,1307.3310,"Juhi Ameta, Nisheeth Joshi and Iti Mathur","Improving the quality of Gujarati-Hindi Machine Translation through
-  part-of-speech tagging and stemmer-assisted transliteration",cs.CL," Machine Translation for Indian languages is an emerging research area.
-Transliteration is one such module that we design while designing a
-translation system. Transliteration means mapping of source language text into
-the target language. Simple mapping decreases the efficiency of the overall
-translation system. We propose the use of stemming and part-of-speech tagging
-for transliteration. The effectiveness of translation can be improved if we
-use part-of-speech tagging and stemming-assisted transliteration. We have
-shown that much of the content in Gujarati gets transliterated while being
-processed for translation into Hindi.
-"
-812,1307.3336,"Arti Buche, Dr. M. B. Chandak and Akshay Zadgaonkar",Opinion Mining and Analysis: A survey,cs.CL cs.IR," Current research is focusing on the area of Opinion Mining, also called
-sentiment analysis, due to the sheer volume of opinion-rich web resources such
-as discussion forums, review sites and blogs that are available in digital
-form. One important problem in sentiment analysis of product reviews is to
-produce a summary of opinions based on product features. In this paper, we
-have surveyed and analyzed various techniques that have been developed for the
-key tasks of opinion mining. 
We have provided an overall picture of what is involved in
-developing a software system for opinion mining on the basis of our survey
-and analysis.
-"
-813,1307.3489,Bilel Ben Ali and Fethi Jarray,Genetic approach for arabic part of speech tagging,cs.CL cs.NE," With the growing number of textual resources available, the ability to
-understand them becomes critical. An essential first step in understanding
-these sources is the ability to identify the part of speech of each word in a
-sentence. Arabic is a morphologically rich language, which presents a
-challenge for part of speech tagging. In this paper, our goal is to propose,
-improve and implement a part of speech tagger based on a genetic algorithm.
-The accuracy obtained with this method is comparable to that of other
-probabilistic approaches.
-"
-814,1307.4038,Bob Coecke,"An alternative Gospel of structure: order, composition, processes",math.CT cs.CL quant-ph," We survey some basic mathematical structures, which arguably are more
-primitive than the structures taught at school. These structures are orders,
-with or without composition, and (symmetric) monoidal categories. We list
-several `real life' incarnations of each of these. This paper also serves as
-an introduction to these structures and their current and potentially future
-uses in linguistics, physics and knowledge representation.
-"
-815,1307.4299,"Jyoti Singh, Nisheeth Joshi, Iti Mathur",Part of Speech Tagging of Marathi Text Using Trigram Method,cs.CL," In this paper we present a Marathi part of speech tagger. Marathi is a
-morphologically rich language spoken by the native people of Maharashtra. The
-general approach used for the development of the tagger is statistical, using
-the trigram method. The main concept of the trigram method is to explore the
-most likely POS for a token, based on the given information about the previous
-two tags, by calculating probabilities to determine the best tag sequence. In
-this paper we show the development of the tagger as well as the evaluation
-performed.
-"
-816,1307.4300,"Deepti Bhalla, Nisheeth Joshi and Iti Mathur",Rule Based Transliteration Scheme for English to Punjabi,cs.CL," Machine transliteration has emerged as a very important research area in the
-field of machine translation. Transliteration basically aims to preserve the
-phonological structure of words. Proper transliteration of named entities
-plays a very significant role in improving the quality of machine translation.
-In this paper we present machine transliteration for the English-Punjabi
-language pair using a rule based approach. We have constructed some rules for
-syllabification. Syllabification is the process of extracting or separating
-syllables from words. We calculate the probabilities for named entities
-(proper names and locations). For those words which do not come under the
-category of named entities, separate probabilities are calculated using
-relative frequency through a statistical machine translation toolkit known as
-MOSES. Using these probabilities we transliterate our input text from English
-to Punjabi.
-"
-817,1307.4879,"Carlos Castillo, Gianmarco De Francisci Morales, Marcelo Mendoza,
-  Nasir Khan",Says who? Automatic Text-Based Content Analysis of Television News,cs.CL cs.IR," We perform an automatic analysis of television news programs, based on the
-closed captions that accompany them. 
Specifically, we collect all the news
-broadcast on over 140 television channels in the US during a period of six
-months. We start by segmenting, processing, and annotating the closed
-captions automatically. Next, we focus on the analysis of their linguistic
-style and on mentions of people using NLP methods. We present a series of key
-insights about news providers and people in the news, and we discuss the
-biases that can be uncovered by automatic means. These insights are contrasted
-by looking at the data from multiple points of view, including qualitative
-assessment.
-"
-818,1307.4986,Diego Gabriel Krivochen,On the Necessity of Mixed Models: Dynamical Frustrations in the Mind,nlin.CD cs.CL math.DS," In the present work we will present and analyze some basic processes at the
-local and global level in linguistic derivations that seem to go beyond the
-limits of Markovian or Turing-like computation, and require, in our opinion, a
-quantum processor. We will first present briefly the working hypothesis and
-then focus on the empirical domain. At the same time, we will argue that a
-model appealing to only one kind of computation (be it quantum or not) is
-necessarily insufficient, and thus both linear and non-linear formal models
-are to be invoked in order to pursue a fuller understanding of mental
-computations within a unified framework.
-"
-819,1307.5336,"Pekka Malo, Ankur Sinha, Pyry Takala, Pekka Korhonen, Jyrki Wallenius",Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts,cs.CL cs.IR q-fin.CP," The use of robo-readers to analyze news texts is an emerging technology
-trend in computational finance. In recent research, a substantial effort has
-been invested to develop sophisticated financial polarity-lexicons that can be
-used to investigate how financial sentiments relate to future company
-performance. However, based on experience from other fields, where sentiment
-analysis is commonly applied, it is well-known that the overall semantic
-orientation of a sentence may differ from the prior polarity of individual
-words. The objective of this article is to investigate how semantic
-orientations can be better detected in financial and economic news by
-accommodating the overall phrase-structure information and domain-specific use
-of language. Our three main contributions are: (1) establishment of a
-human-annotated finance phrase-bank, which can be used as a benchmark for
-training and evaluating alternative models; (2) presentation of a technique to
-enhance financial lexicons with attributes that help to identify the expected
-direction of events that affect overall sentiment; (3) development of a
-linearized phrase-structure model for detecting contextual semantic
-orientations in financial and economic news texts. The relevance of the newly
-added lexicon features and the benefit of using the proposed
-learning-algorithm are demonstrated in a comparative study against previously
-used general sentiment models as well as the popular word frequency models
-used in recent financial studies. The proposed framework is parsimonious and
-avoids the explosion in feature-space caused by the use of conventional n-gram
-features.
-"
-820,1307.5393,Miral Patel and Prem Balani,Clustering Algorithm for Gujarati Language,cs.CL," The area of natural language processing is still under research, but
-nowadays it is a platform for researchers worldwide. 
Natural language processing includes
-analyzing the language based on its structure and then tagging each word
-appropriately with its grammatical base. Here we have a set of 50,000 tagged
-words, and we try to cluster these Gujarati words with our own proposed
-algorithm. Many clustering techniques are available, e.g. single linkage,
-complete linkage and average linkage. Here the number of clusters to be formed
-is not known in advance, so it all depends on the type of data set provided.
-Clustering is a preprocessing step for stemming. Stemming is the process where
-the root is extracted from a word, e.g. cats = cat + s, where cat is a noun
-and the form is plural.
-"
-821,1307.5736,"R. Sandanalakshmi, P. Abinaya Viji, M. Kiruthiga, M. Manjari, M.
-  Sharina","Speaker Independent Continuous Speech to Text Converter for Mobile
-  Application",cs.CL cs.NE cs.SD," An efficient speech to text converter for mobile applications is presented
-in this work. The prime motive is to formulate a system which would give
-optimum performance in terms of complexity, accuracy, delay and memory
-requirements for a mobile environment. The speech to text converter consists
-of two stages, namely front-end analysis and pattern recognition. The
-front-end analysis involves preprocessing and feature extraction. Traditional
-voice activity detection algorithms, which track only energy, cannot
-successfully identify potential speech from the input because the unwanted
-part of the speech also has some energy and appears to be speech. In the
-proposed system, a VAD that calculates the energy of the high-frequency part
-separately, together with the zero crossing rate, is used to differentiate
-noise from speech. Mel Frequency Cepstral Coefficients (MFCC) are used as the
-feature extraction method and a Generalized Regression Neural Network is used
-as the recognizer. MFCC provides a low word error rate and better feature
-extraction. The neural network improves the accuracy. Thus a small database
-containing all possible syllable pronunciations of the user is sufficient to
-give recognition accuracy close to 100%. The proposed technique therefore
-enables the realization of real-time speaker-independent applications such as
-mobile phones, PDAs, etc.
-"
-822,1307.6163,"Nisheeth Joshi, Hemant Darbari, Iti Mathur",Human and Automatic Evaluation of English-Hindi Machine Translation,cs.CL," Research in machine translation has been going on for the past 60 years, and
-many new techniques are being developed for it each day. As a result, we have
-witnessed the development of many automatic machine translators. A manager of
-a machine translation development project needs to know the performance
-increase or decrease after changes have been made to the system. For this
-reason, a need for the evaluation of machine translation systems was felt. In
-this article, we present the evaluation of some machine translators. This
-evaluation is done by a human evaluator and by some automatic evaluation
-metrics, at the sentence, document and system level. In the end we also
-discuss a comparison between the evaluations.
-"
-823,1307.6235,Anindya Kumar Biswas,Graphical law beneath each written natural language,physics.gen-ph cs.CL," We study twenty-four written natural languages. We plot, on a log scale, the
-number of words starting with a letter versus the rank of the letter, both
-normalised. We find that all the graphs are of a similar type. 
The graphs are
-tantalisingly close to the curves of reduced magnetisation versus reduced
-temperature for magnetic materials. We make a weak conjecture that a curve of
-magnetisation underlies a written natural language.
-"
-824,1307.6726,Steven T. Piantadosi and Harry Tily and Edward Gibson,"Information content versus word length in natural language: A reply to
-  Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751]",cs.CL math.PR physics.data-an," Recently, Ferrer i Cancho and Moscoso del Prado Martin [arXiv:1209.1751]
-argued that an observed linear relationship between word length and average
-surprisal (Piantadosi, Tily, & Gibson, 2011) is not evidence for communicative
-efficiency in human language. We discuss several shortcomings of their
-approach and critique: their model critically rests on inaccurate assumptions,
-is incapable of explaining key surprisal patterns in language, and is
-incompatible with recent behavioral results. More generally, we argue that
-statistical models must not critically rely on assumptions that are
-incompatible with the real system under study.
-"
-825,1307.6937,"Renu Mudgal, Rosy Madaan, A.K.Sharma, Ashutosh Dixit","A Novel Architecture For Question Classification Based Indexing Scheme
-  For Efficient Question Answering",cs.IR cs.CL," Question answering systems can be seen as the next step in information
-retrieval, allowing users to pose questions in natural language and receive
-compact answers. For a question answering system to be successful, research
-has shown that the correct classification of a question with respect to the
-expected answer type is a prerequisite. We propose a novel architecture for
-question classification and searching in an index maintained on the basis of
-expected answer types, for efficient question answering. The system uses an
-Answer Relevance Score for finding the relevance of each answer returned by
-the system. On analysis of the proposed system, it has been found that the
-system shows more promising results than existing systems based on question
-classification.
-"
-826,1307.7382,Brendan O'Connor,Learning Frames from Text with an Unsupervised Latent Variable Model,cs.CL," We develop a probabilistic latent-variable model to discover semantic
-frames---types of events and their participants---from corpora. We present a
-Dirichlet-multinomial model in which frames are latent categories that explain
-the linking of verb-subject-object triples, given document-level sparsity. We
-analyze what the model learns, and compare it to FrameNet, noting it learns
-some novel and interesting frames. This document also contains a discussion of
-inference issues, including concentration parameter learning; and a
-small-scale error analysis of syntactic parsing accuracy.
-"
-827,1307.7973,"Jason Weston, Antoine Bordes, Oksana Yakhnenko, Nicolas Usunier","Connecting Language and Knowledge Bases with Embedding Models for
-  Relation Extraction",cs.CL cs.IR cs.LG," This paper proposes a novel approach for relation extraction from free text
-which is trained to jointly use information from the text and from existing
-knowledge. Our model is based on two scoring functions that operate by
-learning low-dimensional embeddings of words and of entities and relationships
-from a knowledge base. 
We empirically show on New York Times articles aligned with
-Freebase relations that our approach is able to efficiently use the extra
-information provided by a large subset of Freebase data (4M entities, 23k
-relationships) to improve over existing methods that rely on text features
-alone.
-"
-828,1307.8057,Rushdi Shams and Robert E. Mercer,Extracting Connected Concepts from Biomedical Texts using Fog Index,cs.CL cs.IR," In this paper, we establish the Fog Index (FI) as a text filter to locate
-the sentences in texts that contain connected biomedical concepts of interest.
-To do so, we have used 24 random papers, each containing four pairs of
-connected concepts. For each pair, we categorize sentences based on whether
-they contain both, either or none of the concepts. We then use FI to measure
-the difficulty of the sentences of each category and find that sentences
-containing both of the concepts have low readability. We rank the sentences of
-a text according to their FI and select the 30 percent most difficult
-sentences. We use an association matrix to track the most frequent pairs of
-concepts in them. This matrix reports that the first filter produces some
-pairs that hold almost no connections. To remove these unwanted pairs, we use
-the Equally Weighted Harmonic Mean of their Positive Predictive Value (PPV)
-and Sensitivity as a second filter. Experimental results demonstrate the
-effectiveness of our method.
-"
-829,1307.8060,Rushdi Shams,Extracting Information-rich Part of Texts using Text Denoising,cs.IR cs.CL," The aim of this paper is to report on a novel text reduction technique,
-called Text Denoising, that highlights information-rich content when
-processing a large volume of text data, especially from the biomedical domain.
-The core feature of the technique, the text readability index, embodies the
-hypothesis that complex text is more information-rich than the rest. When
-applied to tasks like biomedical relation bearing text extraction, keyphrase
-indexing and extracting sentences describing protein interactions, it is
-evident that the reduced set of text produced by text denoising is more
-information-rich than the rest.
-"
-830,1307.8225,"Deepti Kapri, Rosy Madaan, A. K Sharma, Ashutosh Dixit",A Novel Architecture for Relevant Blog Page Identifcation,cs.IR cs.CL," Blogs are undoubtedly the richest source of information available in
-cyberspace. Blogs can be of various natures, i.e. personal blogs, which
-contain posts on mixed issues, or domain-specific blogs, which contain posts
-on particular topics; this is the reason they offer a wide variety of relevant
-information which is often focused. A general search engine gives back a huge
-collection of web pages which may or may not give correct answers, as the web
-is the repository of information of all kinds, and a user has to go through
-various documents before he gets what he was originally looking for, which is
-a very time consuming process. The search can therefore be made more focused
-and accurate if it is limited to the blogosphere instead of web pages, the
-reason being that blogs are more focused in terms of information. The user
-will then only get related blogs in response to his query. These results are
-then ranked according to our proposed method and are finally presented to the
-user in descending order.
-"
-831,1308.0658,"Jimmy SJ. 
Ren, Wei Wang, Jiawei Wang, Stephen Shaoyi Liao","Exploring The Contribution of Unlabeled Data in Financial Sentiment
-  Analysis",cs.CL cs.LG," With the proliferation of its applications in various industries, sentiment
-analysis by using publicly available web data has become an active research
-area in text classification in recent years. It is argued by researchers that
-semi-supervised learning is an effective approach to this problem since it is
-capable of mitigating the manual labeling effort which is usually expensive
-and time-consuming. However, there has been a long-term debate on the
-effectiveness of unlabeled data in text classification. This was partially
-caused by the fact that many assumptions in theoretical analysis often do not
-hold in practice. We argue that this problem may be further understood by
-adding an additional dimension in the experiment. This allows us to address
-this problem in the perspective of bias and variance in a broader view. We
-show that the well-known performance degradation issue caused by unlabeled
-data can be reproduced as a subset of the whole scenario. We argue that if the
-bias-variance trade-off is to be better balanced by a more effective feature
-selection method, unlabeled data is very likely to boost the classification
-performance. We then propose a feature selection framework in which labeled
-and unlabeled training samples are both considered. We discuss its potential
-in achieving such a balance. Besides, the application in financial sentiment
-analysis is chosen because it not only exemplifies an important application,
-but the data also possesses better illustrative power. The implications of
-this study in text classification and financial sentiment analysis are both
-discussed.
-"
-832,1308.0661,Samet Atda\u{g} and Vincent Labatut,"A Comparison of Named Entity Recognition Tools Applied to Biographical
-  Texts",cs.IR cs.CL," Named entity recognition (NER) is a popular domain of natural language
-processing. For this reason, many tools exist to perform this task. Amongst
-other points, they differ in the processing method they rely upon, the entity
-types they can detect, the nature of the text they can handle, and their
-input/output formats. This makes it difficult for a user to select an
-appropriate NER tool for a specific situation. In this article, we try to
-answer this question in the context of biographic texts. For this matter, we
-first constitute a new corpus by annotating Wikipedia articles. We then select
-publicly available, well known and free for research NER tools for comparison:
-Stanford NER, Illinois NET, OpenCalais NER WS and Alias-i LingPipe. We apply
-them to our corpus, assess their performances and compare them. When
-considering overall performances, a clear hierarchy emerges: Stanford has the
-best results, followed by LingPipe, Illinois and OpenCalais. However, a more
-detailed evaluation performed relative to entity types and article categories
-highlights the fact that their performances are diversely influenced by those
-factors. This complementarity opens an interesting perspective regarding the
-combination of these individual tools in order to improve performance. 
-"
-833,1308.0701,"Meisam Booshehri, Abbas Malekpour, Peter Luksch, Kamran Zamanifar,
-  Shahdad Shariatmadari",Ontology Enrichment by Extracting Hidden Assertional Knowledge from Text,cs.IR cs.CL," In this position paper we present a new approach for discovering some
-special classes of assertional knowledge in text by using large RDF
-repositories, resulting in the extraction of new non-taxonomic ontological
-relations. We also use inductive reasoning alongside our approach to improve
-its performance. We then prepare a case study by applying our approach to
-sample data and illustrate the soundness of the proposed approach. Moreover,
-in our view the current LOD cloud is not a suitable base for our proposal in
-all informational domains. We therefore outline some directions, based on
-prior work, for enriching Linked Data datasets by using web mining. The
-result of such enrichment can be reused for further relation extraction and
-ontology enrichment from unstructured free text documents.
-"
-834,1308.0850,Alex Graves,Generating Sequences With Recurrent Neural Networks,cs.NE cs.CL," This paper shows how Long Short-term Memory recurrent neural networks can be
-used to generate complex sequences with long-range structure, simply by
-predicting one data point at a time. The approach is demonstrated for text
-(where the data are discrete) and online handwriting (where the data are
-real-valued). It is then extended to handwriting synthesis by allowing the
-network to condition its predictions on a text sequence. The resulting system
-is able to generate highly realistic cursive handwriting in a wide variety of
-styles.
-"
-835,1308.0897,"Kowcika A, Uma Maheswari, Geetha T V",Context Specific Event Model For News Articles,cs.CL cs.IR," We present a new context-based event indexing and event ranking model for
-news articles. The context event clusters formed from the UNL graphs use a
-modified scoring scheme for segmenting events, which is followed by clustering
-of events. From the context clusters obtained, three models are developed:
-identification of main and sub-events, event indexing and event ranking. Based
-on the properties considered from the UNL graphs for the modified scoring,
-main events and the sub-events associated with them are identified. The
-temporal details obtained from the context clusters are stored using a hashmap
-data structure: the place where the event took place, the person involved in
-the event, and the time when the event took place. Based on the information
-collected from the context clusters, three indices are generated: a time
-index, a person index and a place index. These indices give complete details
-about every event obtained from the context clusters. A new scoring scheme is
-introduced for ranking the events. The scoring scheme for event ranking gives
-weightage based on the priority level of the events. The priority level
-includes the occurrence of the event in the title of the document, the event
-frequency, and the inverse document frequency of the events.
-"
-836,1308.1004,Azad Dehghan,Boundary identification of events in clinical named entity recognition,cs.CL," The problem of named entity recognition in the medical/clinical domain has
-gained increasing attention due to its vital role in a wide range of clinical
-decision support applications. The identification of complete and correct term
-spans is vital for further knowledge synthesis (e.g., coding/mapping concepts
-to thesauruses and classification standards). 
This paper investigates boundary -adjustment through sequence labeling representation models and post-processing -techniques in the problem of clinical named entity recognition (recognition of -clinical events). Using a current state-of-the-art sequence labeling algorithm -(conditional random fields), we show experimentally that sequence labeling -representation and post-processing can be significantly helpful for strict -boundary identification of clinical events. -" -837,1308.1292,Elysia Wells,"Science Fiction as a Worldwide Phenomenon: A Study of International - Creation, Consumption and Dissemination",cs.DL cs.CL cs.SI physics.soc-ph," This paper examines the international nature of science fiction. The focus of -this research is to determine whether science fiction is primarily English-speaking -and Western, or global, that is, created and consumed by people in -non-Western, non-English-speaking countries. Science fiction's international -presence was examined in three ways: by network analysis, by examining an online -retailer, and with a survey. Condor, a program developed by GalaxyAdvisors, was -used to determine whether science fiction is being talked about by non-English -speakers. An analysis of the international Amazon.com websites was done to -discover whether it is being consumed worldwide. A survey was also conducted to see -whether people had experience with science fiction. All three research methods -revealed similar results. Science fiction was found to be international, with -science fiction creators originating in different countries and writing in a -host of different languages. English and non-English science fiction was being -created and consumed all over the world, not just in the English-speaking West. -" -838,1308.1507,Yuriy Ostapov,"Logical analysis of natural language semantics to solve the problem of - computer understanding",cs.CL," An object-oriented approach to creating a natural language understanding -system is considered. The understanding program is a formal system built on -predicate calculus. Horn clauses are used as well-formed formulas, and -inference is based on the resolution principle. Sentences of natural -language are represented as sets of typical predicates. These predicates -describe physical objects and processes, abstract objects, categories and -semantic relations between objects. Predicates for concrete assertions are -saved in a database. To describe the semantics of classes of physical objects, -abstract concepts and processes, a knowledge base is applied. The proposed -representation of natural language sentences is a semantic net whose nodes -are typical predicates. This approach is promising because, firstly, such -typification of nodes substantially simplifies the design of processing algorithms -and object descriptions; secondly, the effectiveness of the algorithms is increased -(particularly for large numbers of nodes); and thirdly, encyclopedic knowledge is used -to describe the semantics of words, which permits the class of solved problems -to be substantially extended. -" -839,1308.1847,"Vu Dung Nguyen, Blesson Varghese, Adam Barker","The Royal Birth of 2013: Analysing and Visualising Public Sentiment in - the UK Using Twitter",cs.CL cs.IR cs.SI physics.soc-ph," Analysis of information retrieved from microblogging services such as Twitter -can provide valuable insight into public sentiment in a geographic region. This -insight can be enriched by visualising information in its geographic context. 
-Two underlying approaches to sentiment analysis are dictionary-based methods and -machine learning. The former is popular for public sentiment analysis, and the -latter has found limited use for aggregating public sentiment from Twitter -data. The research presented in this paper aims to extend the machine learning -approach for aggregating public sentiment. To this end, a framework for -analysing and visualising public sentiment from a Twitter corpus is developed. -A dictionary-based approach and a machine learning approach are implemented -within the framework and compared using one UK case study, namely the royal -birth of 2013. The case study validates the feasibility of the framework for -analysis and rapid visualisation. One observation is that there is good -correlation between the results produced by the popular dictionary-based -approach and the machine learning approach when large volumes of tweets are -analysed. However, for rapid analysis to be possible, faster methods need to be -developed using big data techniques and parallel methods. -" -840,1308.2359,"Arun S. Maiya, John P. Thompson, Francisco Loaiza-Lemos, Robert M. - Rolfe",Exploratory Analysis of Highly Heterogeneous Document Collections,cs.CL cs.HC cs.IR," We present an effective multifaceted system for exploratory analysis of -highly heterogeneous document collections. Our system is based on intelligently -tagging individual documents in a purely automated fashion and exploiting these -tags in a powerful faceted browsing framework. Tagging strategies employed -include both unsupervised and supervised approaches based on machine learning -and natural language processing. As one of our key tagging strategies, we -introduce the KERA algorithm (Keyword Extraction for Reports and Articles). -KERA extracts topic-representative terms from individual documents in a purely -unsupervised fashion and is revealed to be significantly more effective than -state-of-the-art methods. Finally, we evaluate our system in its ability to -help users locate documents pertaining to military critical technologies buried -deep in a large heterogeneous sea of information. -" -841,1308.2428,"Olivier Picard, M\'elanie Lord, Alexandre Blondin-Mass\'e, Odile - Marcotte, Marcos Lopes and Stevan Harnad",Hidden Structure and Function in the Lexicon,cs.CL," How many words are needed to define all the words in a dictionary? -Graph-theoretic analysis reveals that about 10% of a dictionary is a unique -Kernel of words that define one another and all the rest, but this is not the -smallest such subset. The Kernel consists of one huge strongly connected -component (SCC), about half its size, the Core, surrounded by many small SCCs, -the Satellites. Core words can define one another but not the rest of the -dictionary. The Kernel also contains many overlapping Minimal Grounding Sets -(MGSs), each about the same size as the Core, each part-Core, part-Satellite. -MGS words can define all the rest of the dictionary. They are learned earlier, -more concrete and more frequent than the rest of the dictionary. Satellite -words, not correlated with age or frequency, are less concrete (more abstract) -words that are also needed for full lexical power. -" -842,1308.2696,A. Paxton and R. Dale,B(eo)W(u)LF: Facilitating recurrence analysis on multi-level language,cs.CL," Discourse analysis may seek to characterize not only the overall composition -of a given text but also the dynamic patterns within the data. 
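For the dictionary-based approach compared in the royal-birth study above, scoring reduces to summing word polarities from a lexicon and aggregating over the corpus. A toy Python sketch, with a hypothetical mini-lexicon and fabricated tweets rather than any published dictionary or the paper's data:

POLARITY = {"happy": 1, "joy": 1, "love": 1, "sad": -1, "angry": -1, "awful": -2}

def tweet_score(text):
    # sum the polarity of every known word; unknown words contribute 0
    return sum(POLARITY.get(w, 0) for w in text.lower().split())

tweets = [
    "so happy about the royal baby",
    "traffic near the hospital is awful",
    "love the celebrations, pure joy",
]
scores = [tweet_score(t) for t in tweets]
print("per-tweet:", scores)
print("aggregate sentiment:", sum(scores) / len(scores))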
This technical -report introduces a data format intended to facilitate multi-level -investigations, which we call the by-word long-form or B(eo)W(u)LF. Inspired by -the long-form data format required for mixed-effects modeling, B(eo)W(u)LF -structures linguistic data into an expanded matrix encoding any number of -researcher-specified markers, making it ideal for recurrence-based analyses. -While we do not necessarily claim to be the first to use methods along these -lines, we have created a series of tools utilizing Python and MATLAB to enable -such discourse analyses and demonstrate them using 319 lines of the Old English -epic poem, Beowulf, translated into modern English. -" -843,1308.3106,"Sachin Kumar, Ashish Kumar, Pinaki Mitra, Girish Sundaram",System and Methods for Converting Speech to SQL,cs.CL cs.DB," This paper concerns the conversion of a spoken English language query -into SQL for retrieving data from an RDBMS. A user submits a query as a speech -signal through the user interface and gets the result of the query in text -format. We have developed acoustic and language models with which a speech -utterance can be converted into an English text query, so that natural language -processing techniques can be applied to this English text query to generate an -equivalent SQL query. For conversion of speech into English text, the HTK and Julius -tools have been used, and for conversion of the English text query into an SQL query we -have implemented a system which uses rule-based translation to translate the -English language query into an SQL query. The translation uses a lexical analyzer, a -parser and syntax-directed translation techniques, as in compilers. The JFLex and -BYACC tools have been used to build the lexical analyzer and parser, respectively. The -system is domain-independent, i.e., it can run on different databases, as it -generates lex files from the underlying database. -" -844,1308.3243,"M. Ben Halima, H. Karray and A. M. Alimi",Arabic Text Recognition in Video Sequences,cs.MM cs.CL cs.CV," In this paper, we propose a robust approach for text extraction and -recognition from Arabic news video sequences. The text included in video -sequences is an important resource for indexing and searching systems. However, -this text is difficult to detect and recognize because of the variability of -its size, its low-resolution characters and the complexity of the -backgrounds. To solve these problems, we propose a system performing two -main tasks: extraction and recognition of text. Our system is tested on a -varied database composed of different Arabic news programs and the obtained -results are encouraging and show the merits of our approach. -" -845,1308.3294,Nicholas Kersting,A Secure and Comparable Text Encryption Algorithm,cs.CR cs.CL cs.CY cs.SI," This paper discloses a simple algorithm for encrypting text messages, based -on the NP-completeness of the subset sum problem, such that the similarity -between encryptions is roughly proportional to the semantic similarity between -their generating messages. This allows parties to compare encrypted messages -for semantic overlap without trusting an intermediary and might be applied, for -example, as a means of finding scientific collaborators over the Internet. -" -846,1308.3785,"Md. Ali Hossain, Md. Mijanur Rahman, Uzzal Kumar Prodhan and Md. 
- Farukuzzaman Khan","Implementation Of Back-Propagation Neural Network For Isolated Bangla - Speech Recognition",cs.CL cs.NE," This paper is concerned with the development of a back-propagation neural -network for Bangla speech recognition. In this work, ten Bangla digits were -recorded from ten speakers and have been recognized. The features of these -spoken digits were extracted by the method of Mel Frequency Cepstral -Coefficient (MFCC) analysis. The MFCC features of five speakers were used to -train the network with the back-propagation algorithm. The MFCC features of the ten -Bangla digits, from 0 to 9, spoken by another five speakers were used to test -the system. All the methods and algorithms used in this research were -implemented using the Turbo C and C++ languages. From our -investigation it is seen that the developed system can successfully encode and -analyze the MFCC features of the speech signal for recognition. The developed -system achieved a recognition rate of about 96.332% for known speakers (i.e., -speaker-dependent) and 92% for unknown speakers (i.e., speaker-independent). -" -847,1308.3830,"Rukshan Alexander, Prashanthi Rukshan, and Sinnathamby Mahesan",Natural Language Web Interface for Database (NLWIDB),cs.CL cs.DB cs.HC," It is a long-term desire of computer users to minimize the communication -gap between computers and humans. At the same time, almost all ICT -applications store information in databases and retrieve it from them. -Retrieving information from a database requires knowledge of technical -languages such as the Structured Query Language. However, the majority of computer -users who interact with databases do not have a technical background and -are intimidated by the idea of using languages such as SQL. For the above reasons, -a Natural Language Web Interface for Database (NLWIDB) has been developed. The -NLWIDB allows the user to query the database in a language more like English, -through a convenient interface over the Internet. -" -848,1308.3839,"Tamal Chowdhury, Rabindra Rakshit and Arko Banerjee",Consensus Sequence Segmentation,cs.CL," In this paper we introduce a method to detect words or phrases in a given -sequence of letters without knowing the lexicon. Our linear-time unsupervised -algorithm relies entirely on statistical relationships among letters in the -input sequence to detect the locations of word boundaries. We compare our algorithm -to previous approaches from the unsupervised sequence segmentation literature and -provide superior segmentation on a number of benchmarks. -" -849,1308.4189,"N. Siddharth, Andrei Barbu, Jeffrey Mark Siskind",Seeing What You're Told: Sentence-Guided Activity Recognition In Video,cs.CV cs.AI cs.CL," We present a system that demonstrates how the compositional structure of -events, in concert with the compositional structure of language, can interplay -with the underlying focusing mechanisms in video action recognition, thereby -providing a medium, not only for top-down and bottom-up integration, but also -for multi-modal integration between vision and language. We show how the roles -played by participants (nouns), their characteristics (adjectives), the actions -performed (verbs), the manner of such actions (adverbs), and changing spatial -relations between participants (prepositions), in the form of whole sentential -descriptions mediated by a grammar, guide the activity-recognition process. 
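The back-propagation training described in the Bangla digit entry above can be sketched compactly: one hidden layer, a softmax output over the ten digits, and gradient updates. In the numpy sketch below, random vectors stand in for real MFCC features (so the reported accuracy is meaningless), and the layer sizes and learning rate are arbitrary assumptions:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 13))        # 200 utterances x 13 MFCC-like dims
y = rng.integers(0, 10, size=200)     # digit labels 0..9
Y = np.eye(10)[y]                     # one-hot targets

W1 = rng.normal(0, 0.1, (13, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 10)); b2 = np.zeros(10)

for epoch in range(100):
    H = np.tanh(X @ W1 + b1)          # hidden layer
    logits = H @ W2 + b2
    P = np.exp(logits - logits.max(1, keepdims=True))
    P /= P.sum(1, keepdims=True)      # softmax
    dlogits = (P - Y) / len(X)        # cross-entropy gradient
    dW2 = H.T @ dlogits; db2 = dlogits.sum(0)
    dH = dlogits @ W2.T * (1 - H**2)  # back-propagate through tanh
    dW1 = X.T @ dH; db1 = dH.sum(0)
    for w, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        w -= 0.5 * g                  # gradient step
print("train accuracy:", (P.argmax(1) == y).mean())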
-Further, the utility and expressiveness of our framework are demonstrated by -performing three separate tasks in the domain of multi-activity videos: -sentence-guided focus of attention, generation of sentential descriptions of -video, and query-based video search, simply by leveraging the framework in -different manners. -" -850,1308.4479,Juan Luo and Yves Lepage,"An Investigation of the Sampling-Based Alignment Method and Its - Contributions",cs.CL," By investigating the distribution of phrase pairs in phrase translation -tables, the work in this paper describes an approach to increase the number of -n-gram alignments in phrase translation tables output by a sampling-based -alignment method. This approach consists in enforcing the alignment of n-grams -in distinct translation subtables so as to increase the number of n-grams. -A standard normal distribution is used to allot alignment time among translation -subtables, which results in an adjustment of the distribution of n-grams. This -leads to better evaluation results on statistical machine translation tasks -than the original sampling-based alignment approach. Furthermore, the -translation quality obtained by merging phrase translation tables computed from -the sampling-based alignment method and from MGIZA++ is examined. -" -851,1308.4618,"Michael J. Bell, Matthew Collison, Phillip Lord","Can inferred provenance and its visualisation be used to detect - erroneous annotation? A case study using UniProtKB",cs.CL cs.CE cs.DL q-bio.QM," A constant influx of new data poses a challenge in keeping the annotation in -biological databases current. Most biological databases contain significant -quantities of textual annotation, which often contains the richest source of -knowledge. Many databases reuse existing knowledge: during the curation process, -annotations are often propagated between entries. However, this is often not -made explicit. Therefore, it can be hard, potentially impossible, for a reader -to identify where an annotation originated from. Within this work we attempt to -identify annotation provenance and track its subsequent propagation. -Specifically, we exploit annotation reuse within the UniProt Knowledgebase -(UniProtKB), at the level of individual sentences. We describe a visualisation -approach for the provenance and propagation of sentences in UniProtKB which -enables a large-scale statistical analysis. Initially, levels of sentence reuse -within UniProtKB were analysed, showing that reuse is heavily prevalent, which -enables the tracking of provenance and propagation. By analysing sentences -throughout UniProtKB, a number of interesting propagation patterns were -identified, covering over 100,000 sentences. Over 8000 sentences remain in the -database after they have been removed from the entries where they originally -occurred. Analysing a subset of these sentences suggests that approximately 30% -are erroneous, whilst 35% appear to be inconsistent. These results suggest that -being able to visualise sentence propagation and provenance can aid in the -determination of the accuracy and quality of textual annotation. Source code -and supplementary data are available from the authors' website. -" -852,1308.4648,"Nikki McNeil, Robert A. Bridges, Michael D. Iannacone, Bogdan Czejdo, - Nicolas Perez, John R. 
Goodall","PACE: Pattern Accurate Computationally Efficient Bootstrapping for - Timely Discovery of Cyber-Security Concepts",cs.IR cs.CL," Public disclosure of important security information, such as knowledge of -vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and -other online sources months before proper classification into structured -databases. In order to facilitate timely discovery of such knowledge, we -propose a novel semi-supervised learning algorithm, PACE, for identifying and -classifying relevant entities in text sources. The main contribution of this -paper is an enhancement of the traditional bootstrapping method for entity -extraction by employing a time-memory trade-off that simultaneously circumvents -a costly corpus search while strengthening pattern nomination, which should -increase accuracy. An implementation in the cyber-security domain is discussed -as well as challenges to Natural Language Processing imposed by the security -domain. -" -853,1308.4941,"Robert A. Bridges, Corinne L. Jones, Michael D. Iannacone, Kelly M. - Testa, John R. Goodall",Automatic Labeling for Entity Extraction in Cyber Security,cs.IR cs.CL," Timely analysis of cyber-security information necessitates automated -information extraction from unstructured text. While state-of-the-art -extraction methods produce extremely accurate results, they require ample -training data, which is generally unavailable for specialized applications, -such as detecting security related entities; moreover, manual annotation of -corpora is very costly and often not a viable solution. In response, we develop -a very precise method to automatically label text from several data sources by -leveraging related, domain-specific, structured data and provide public access -to a corpus annotated with cyber-security entities. Next, we implement a -Maximum Entropy Model trained with the average perceptron on a portion of our -corpus ($\sim$750,000 words) and achieve near perfect precision, recall, and -accuracy, with training times under 17 seconds. -" -854,1308.4965,"Maurice Margenstern, Lan Wu","A proposal for a Chinese keyboard for cellphones, smartphones, ipads and - tablets",cs.HC cs.CL," In this paper, we investigate the possibility to use two tilings of the -hyperbolic plane as basic frame for devising a way to input texts in Chinese -characters into messages of cellphones, smartphones, ipads and tablets. -" -855,1308.5010,"Karla Z. Bertrand, Maya Bialik, Kawandeep Virdee, Andreas Gros and - Yaneer Bar-Yam",Sentiment in New York City: A High Resolution Spatial and Temporal View,physics.soc-ph cs.CL cs.CY," Measuring public sentiment is a key task for researchers and policymakers -alike. The explosion of available social media data allows for a more -time-sensitive and geographically specific analysis than ever before. In this -paper we analyze data from the micro-blogging site Twitter and generate a -sentiment map of New York City. We develop a classifier specifically tuned for -140-character Twitter messages, or tweets, using key words, phrases and -emoticons to determine the mood of each tweet. This method, combined with -geotagging provided by users, enables us to gauge public sentiment on extremely -fine-grained spatial and temporal scales. We find that public mood is generally -highest in public parks and lowest at transportation hubs, and locate other -areas of strong sentiment such as cemeteries, medical centers, a jail, and a -sewage facility. 
Sentiment progressively improves with proximity to Times -Square. Periodic patterns of sentiment fluctuate on both a daily and a weekly -scale: more positive tweets are posted on weekends than on weekdays, with a -daily peak in sentiment around midnight and a nadir between 9:00 a.m. and noon. -" -856,1308.5423,"M.Thangarasu, R.Manavalan",A Literature Review: Stemming Algorithms for Indian Languages,cs.CL," Stemming is the process of extracting the root word from a given inflected -word. It plays a significant role in numerous applications of Natural -Language Processing (NLP). The stemming problem has been addressed in many contexts -and by researchers in many disciplines. This expository paper presents a survey -of some of the latest developments in stemming algorithms in data mining and -also presents solutions for various Indian language stemming -algorithms along with their results. -" -857,1308.5499,Bodo Winter,"Linear models and linear mixed effects models in R with linguistic - applications",cs.CL," This text is a conceptual introduction to mixed effects modeling with -linguistic applications, using the R programming environment. The reader is -introduced to linear modeling and assumptions, as well as to mixed -effects/multilevel modeling, including a discussion of random intercepts, -random slopes and likelihood ratio tests. The example used throughout the text -focuses on the phonetic analysis of voice pitch data. -" -858,1308.6242,"Saif M. Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu","NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of - Tweets",cs.CL," In this paper, we describe how we created two state-of-the-art SVM -classifiers, one to detect the sentiment of messages such as tweets and SMS -(message-level task) and one to detect the sentiment of a term within a -message (term-level task). Our submissions stood first in both tasks on tweets, -obtaining an F-score of 69.02 -in the message-level task and 88.93 in the term-level task. We implemented a -variety of surface-form, semantic, and sentiment features. We also generated two -large word-sentiment association lexicons, one from tweets with sentiment-word -hashtags, and one from tweets with emoticons. In the message-level task, the -lexicon-based features provided a gain of 5 F-score points over all others. -Both of our systems can be replicated using freely available resources. -" -859,1308.6297,Saif M. Mohammad and Peter D. Turney,Crowdsourcing a Word-Emotion Association Lexicon,cs.CL," Even though considerable attention has been given to the polarity of words -(positive and negative) and the creation of large polarity lexicons, research -in emotion analysis has had to rely on limited and small emotion lexicons. In -this paper we show how the combined strength and wisdom of the crowds can be -used to generate a large, high-quality, word-emotion and word-polarity -association lexicon quickly and inexpensively. We enumerate the challenges in -emotion annotation in a crowdsourcing scenario and propose solutions to address -them. Most notably, in addition to questions about emotions associated with -terms, we show how the inclusion of a word choice question can discourage -malicious data entry, help identify instances where the annotator may not be -familiar with the target term (allowing us to reject such annotations), and -help obtain annotations at sense level (rather than at word level). 
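Many of the Indian-language stemmers surveyed in entry 856 above are longest-match suffix strippers. A toy sketch with an illustrative (English) suffix table; real systems use curated, language-specific suffix tables:

SUFFIXES = sorted(["ing", "tion", "ness", "ed", "es", "s"], key=len, reverse=True)

def stem(word, min_stem=3):
    # try longer suffixes first; never strip below a minimal stem length
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word

for w in ["connections", "connected", "connecting", "connect"]:
    print(w, "->", stem(w))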
We -conducted experiments on how to formulate the emotion-annotation questions, and -show that asking if a term is associated with an emotion leads to markedly -higher inter-annotator agreement than that obtained by asking if a term evokes -an emotion. -" -860,1308.6300,"Saif M. Mohammad, Bonnie J. Dorr, Graeme Hirst, and Peter D. Turney",Computing Lexical Contrast,cs.CL," Knowing the degree of semantic contrast between words has widespread -application in natural language processing, including machine translation, -information retrieval, and dialogue systems. Manually-created lexicons focus on -opposites, such as {\rm hot} and {\rm cold}. Opposites are of many kinds such -as antipodals, complementaries, and gradable. However, existing lexicons often -do not classify opposites into the different kinds. They also do not explicitly -list word pairs that are not opposites but yet have some degree of contrast in -meaning, such as {\rm warm} and {\rm cold} or {\rm tropical} and {\rm -freezing}. We propose an automatic method to identify contrasting word pairs -that is based on the hypothesis that if a pair of words, $A$ and $B$, are -contrasting, then there is a pair of opposites, $C$ and $D$, such that $A$ and -$C$ are strongly related and $B$ and $D$ are strongly related. (For example, -there exists the pair of opposites {\rm hot} and {\rm cold} such that {\rm -tropical} is related to {\rm hot,} and {\rm freezing} is related to {\rm -cold}.) We will call this the contrast hypothesis. We begin with a large -crowdsourcing experiment to determine the amount of human agreement on the -concept of oppositeness and its different kinds. In the process, we flesh out -key features of different kinds of opposites. We then present an automatic and -empirical measure of lexical contrast that relies on the contrast hypothesis, -corpus statistics, and the structure of a {\it Roget}-like thesaurus. We show -that the proposed measure of lexical contrast obtains high precision and large -coverage, outperforming existing methods. -" -861,1308.6628,"Kewei Tu, Meng Meng, Mun Wai Lee, Tae Eun Choe, Song-Chun Zhu","Joint Video and Text Parsing for Understanding Events and Answering - Queries",cs.CV cs.CL cs.MM," We propose a framework for parsing video and text jointly for understanding -events and answering user queries. Our framework produces a parse graph that -represents the compositional structures of spatial information (objects and -scenes), temporal information (actions and events) and causal information -(causalities between events and fluents) in the video and text. The knowledge -representation of our framework is based on a spatial-temporal-causal And-Or -graph (S/T/C-AOG), which jointly models possible hierarchical compositions of -objects, scenes and events as well as their interactions and mutual contexts, -and specifies the prior probabilistic distribution of the parse graphs. We -present a probabilistic generative model for joint parsing that captures the -relations between the input video/text, their corresponding parse graphs and -the joint parse graph. Based on the probabilistic model, we propose a joint -parsing system consisting of three modules: video parsing, text parsing and -joint inference. Video parsing and text parsing produce two parse graphs from -the input video and text respectively. The joint inference module produces a -joint parse graph by performing matching, deduction and revision on the video -and text parse graphs. 
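The contrast hypothesis of entry 860 above can be stated in a few lines of code: words A and B are contrasting if some opposite pair (C, D) has A strongly related to C and B strongly related to D. The relatedness scores below are invented toy numbers, not the corpus statistics or thesaurus structure the paper actually uses:

OPPOSITES = [("hot", "cold")]
RELATED = {("tropical", "hot"): 0.8, ("freezing", "cold"): 0.9,
           ("tropical", "cold"): 0.1, ("freezing", "hot"): 0.1}

def relatedness(a, b):
    return 1.0 if a == b else RELATED.get((a, b), 0.0)

def contrast_score(a, b):
    # best support from any opposite pair, in either orientation
    return max(min(relatedness(a, c), relatedness(b, d))
               for c, d in OPPOSITES for c, d in [(c, d), (d, c)])

print(contrast_score("tropical", "freezing"))   # high -> contrasting
print(contrast_score("tropical", "hot"))        # low  -> not contrasting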
The proposed framework has the following objectives: -firstly, we aim at deep semantic parsing of video and text that goes beyond the -traditional bag-of-words approaches; secondly, we perform parsing and reasoning -across the spatial, temporal and causal dimensions based on the joint S/T/C-AOG -representation; thirdly, we show that deep joint parsing facilitates subsequent -applications such as generating narrative text descriptions and answering -queries in the forms of who, what, when, where and why. We empirically -evaluated our system based on comparison against ground truth as well as -accuracy of query answering and obtained satisfactory results. -" -862,1309.0326,"Micha{\l} {\L}opuszy\'nski, {\L}ukasz Bolikowski","Tagging Scientific Publications using Wikipedia and Natural Language - Processing Tools. Comparison on the ArXiv Dataset",cs.CL cs.DL," In this work, we compare two simple methods of tagging scientific -publications with labels reflecting their content. As a first source of labels, -Wikipedia is employed; a second label set is constructed from the noun phrases -occurring in the analyzed corpus. We examine the statistical properties and the -effectiveness of both approaches on a dataset consisting of abstracts from -0.7 million scientific documents deposited in the ArXiv preprint collection. -We believe that the obtained tags can later be applied as useful document -features in various machine learning tasks (document similarity, clustering, -topic modelling, etc.). -" -863,1309.1014,"Bruno Mery (LaBRI), Christian Retor\'e (LaBRI)",Advances in the Logical Representation of Lexical Semantics,cs.CL," The integration of lexical semantics and pragmatics in the analysis of the -meaning of natural language has prompted changes to the global framework -derived from Montague. In those works, the original lexicon, in which words -were assigned an atomic type of a single-sorted logic, has been replaced by a -set of many-faceted lexical items that can compose their meaning with salient -contextual properties using a rich typing system as a guide. Having related our -proposal for such an expanded framework, \LambdaTYn, we present some recent -advances in the associated logical formalisms, including constraints on lexical -transformations and polymorphic quantifiers, and ongoing discussions and -research on the granularity of the type system and the limits of transitivity. -" -864,1309.1125,"Ana Cristina Mendes, Lu\'isa Coheur, S\'ergio Curto",Learning to answer questions,cs.CL," We present an open-domain Question-Answering system that learns to answer -questions based on successful past interactions. We follow a pattern-based -approach to Answer-Extraction, where (lexico-syntactic) patterns that relate a -question to its answer are automatically learned and used to answer future -questions. Results show that our approach contributes to the system's best -performance when it is combined with typical Answer-Extraction strategies. -Moreover, it allows the system to learn from the answered questions and to -rectify wrong or unsolved past questions. 
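The lexico-syntactic answer-extraction patterns of the question-answering entry above can be approximated with regular expressions; in the real system the patterns are learned from past interactions, whereas here they are hand-written toy examples over a two-sentence fabricated corpus:

import re

PATTERNS = {
    "who wrote": [r"{title} (?:was written by|is a novel by) (?P<answer>[A-Z][a-zA-Z ]+)"],
    "when was born": [r"(?P<answer>\d{{4}}).{{0,40}}{name} was born",
                      r"{name} was born in (?P<answer>\d{{4}})"],
}

def answer(question_type, slot, corpus):
    # instantiate each learned pattern with the question's slot fillers,
    # then scan the corpus for the first match
    for template in PATTERNS[question_type]:
        pattern = template.format(**slot)
        for sentence in corpus:
            m = re.search(pattern, sentence)
            if m:
                return m.group("answer")
    return None

corpus = ["Dracula was written by Bram Stoker.",
          "Bram Stoker was born in 1847 in Dublin."]
print(answer("who wrote", {"title": "Dracula"}, corpus))
print(answer("when was born", {"name": "Bram Stoker"}, corpus))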
-" -865,1309.1129,"Rashmi Gupta, Nisheeth Joshi, Iti Mathur","Analysing Quality of English-Hindi Machine Translation Engine Outputs - Using Bayesian Classification",cs.CL," This paper considers the problem for estimating the quality of machine -translation outputs which are independent of human intervention and are -generally addressed using machine learning techniques.There are various -measures through which a machine learns translations quality. Automatic -Evaluation metrics produce good co-relation at corpus level but cannot produce -the same results at the same segment or sentence level. In this paper 16 -features are extracted from the input sentences and their translations and a -quality score is obtained based on Bayesian inference produced from training -data. -" -866,1309.1501,"Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. - Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y. Aravkin, Bhuvana - Ramabhadran",Improvements to deep convolutional neural networks for LVCSR,cs.LG cs.CL cs.NE math.OC stat.ML," Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural -Networks (DNN), as they are able to better reduce spectral variation in the -input signal. This has also been confirmed experimentally, with CNNs showing -improvements in word error rate (WER) between 4-12% relative compared to DNNs -across a variety of LVCSR tasks. In this paper, we describe different methods -to further improve CNN performance. First, we conduct a deep analysis comparing -limited weight sharing and full weight sharing with state-of-the-art features. -Second, we apply various pooling strategies that have shown improvements in -computer vision to an LVCSR speech task. Third, we introduce a method to -effectively incorporate speaker adaptation, namely fMLLR, into log-mel -features. Fourth, we introduce an effective strategy to use dropout during -Hessian-free sequence training. We find that with these improvements, -particularly with fMLLR and dropout, we are able to achieve an additional 2-3% -relative improvement in WER on a 50-hour Broadcast News task over our previous -best CNN baseline. On a larger 400-hour BN task, we find an additional 4-5% -relative improvement over our previous best CNN baseline. -" -867,1309.1508,"Tara N. Sainath, Lior Horesh, Brian Kingsbury, Aleksandr Y. Aravkin, - Bhuvana Ramabhadran","Accelerating Hessian-free optimization for deep neural networks by - implicit preconditioning and sampling",cs.LG cs.CL cs.NE math.OC stat.ML," Hessian-free training has become a popular parallel second or- der -optimization technique for Deep Neural Network training. This study aims at -speeding up Hessian-free training, both by means of decreasing the amount of -data used for training, as well as through reduction of the number of Krylov -subspace solver iterations used for implicit estimation of the Hessian. In this -paper, we develop an L-BFGS based preconditioning scheme that avoids the need -to access the Hessian explicitly. Since L-BFGS cannot be regarded as a -fixed-point iteration, we further propose the employment of flexible Krylov -subspace solvers that retain the desired theoretical convergence guarantees of -their conventional counterparts. Second, we propose a new sampling algorithm, -which geometrically increases the amount of data utilized for gradient and -Krylov subspace iteration calculations. 
On a 50-hr English Broadcast News task, -we find that these methodologies provide roughly a 1.5x speed-up, whereas, on a -300-hr Switchboard task, these techniques provide over a 2.3x speedup, with no -loss in WER. These results suggest that even further speed-ups can be expected as -problems scale and complexity grows. -" -868,1309.1536,"W.B. Deng, A.E. Allahverdyan, B. Li, Q.A. Wang",Rank-frequency relation for Chinese characters,cs.CL physics.data-an," We show that Zipf's law for Chinese characters holds perfectly for -sufficiently short texts (a few thousand different characters). The scenario of -its validity is similar to that of Zipf's law for words in short English texts. For -long Chinese texts (or for mixtures of short Chinese texts), rank-frequency -relations for Chinese characters display a two-layer, hierarchic structure that -combines a Zipfian power-law regime for frequent characters (first layer) with -an exponential-like regime for less frequent characters (second layer). For -these two layers we provide different (though related) theoretical descriptions -that include the range of low-frequency characters (hapax legomena). The -comparative analysis of rank-frequency relations for Chinese characters versus -English words illustrates the extent to which characters play the same role for Chinese -writers as words do for those writing within alphabetical -systems. -" -869,1309.1649,Jinho D. Choi,"Preparing Korean Data for the Shared Task on Parsing Morphologically - Rich Languages",cs.CL," This document gives a brief description of the Korean data prepared for the SPMRL -2013 shared task. A total of 27,363 sentences with 350,090 tokens are used for -the shared task. All constituent trees are collected from the KAIST Treebank -and transformed to the Penn Treebank style. All dependency trees are converted -from the transformed constituent trees using heuristics and labeling rules -designed specifically for the KAIST Treebank. In addition to the gold-standard -morphological analysis provided by the KAIST Treebank, two sets of automatic -morphological analyses are provided for the shared task: one is generated by -the HanNanum morphological analyzer, and the other by the Sejong -morphological analyzer. -" -870,1309.1939,Ramon Ferrer-i-Cancho,"The placement of the head that minimizes online memory: a complex - systems approach",cs.CL nlin.AO physics.data-an physics.soc-ph," It is well known that the length of a syntactic dependency determines its -online memory cost. Thus, the problem of the placement of a head and its -dependents (complements or modifiers) that minimizes online memory is -equivalent to the problem of the minimum linear arrangement of a star tree. -However, how that length is translated into cognitive cost is not known. This -study shows that the online memory cost is minimized when the head is placed at -the center, regardless of the function that transforms length into cost, -provided only that this function is strictly monotonically increasing. Online -memory defines a quasi-convex adaptive landscape with a single central minimum -if the number of elements is odd and two central minima if that number is even. -We discuss various aspects of the dynamics of word order of subject (S), verb -(V) and object (O) from a complex systems perspective and suggest that word -orders tend to evolve by swapping adjacent constituents from an initial or -early SOV configuration that is attracted towards a central word order by -online memory minimization. 
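The rank-frequency analysis of entry 868 starts from counting symbol frequencies, ranking them, and fitting a power-law exponent on log-log data. A sketch with an English toy string standing in for a real Chinese text:

from collections import Counter
import math

text = "the quick brown fox jumps over the lazy dog " * 50
freqs = sorted(Counter(text.replace(" ", "")).values(), reverse=True)
xs = [math.log(r + 1) for r in range(len(freqs))]   # log rank
ys = [math.log(f) for f in freqs]                   # log frequency

# least-squares slope of log frequency against log rank
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(f"fitted Zipf exponent: {-slope:.2f}")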
We also suggest that the stability of SVO is due to -at least two factors: the quasi-convex shape of the adaptive landscape in the -online memory dimension, and online memory adaptations that avoid regression to -SOV. Although OVS is also optimal for placing the verb at the center, its low -frequency is explained by its long distance from the seminal SOV in the -permutation space. -" -871,1309.2471,Harinder Singh and Parteek Kumar,"Implementation of nlization framework for verbs, pronouns and - determiners with eugene",cs.CL," The UNL system was designed and implemented in 1999 by the UNDL -Foundation, a nonprofit organization based in Geneva. UNL applications are application software that -allows end users to accomplish natural language tasks, such as translating, -summarizing, and retrieving or extracting information. Two major web-based -applications exist. The first is the Interactive ANalyzer (IAN), a natural -language analysis system that represents natural language sentences as semantic -networks in the UNL format. The other is the dEep-to-sUrface -GENErator (EUGENE), an open-source interactive NLizer that generates -natural language sentences out of semantic networks represented in the UNL -format. This paper focuses on the NLization framework with EUGENE, using the -UNL system to accomplish the task of machine translation. In the whole -NLization process, EUGENE takes a UNL input and delivers an output in natural -language without any human intervention. It is language-independent and has to -be parametrized to the natural language input through a dictionary and a -grammar, provided as separate interpretable files. We explain how UNL input is -syntactically and semantically analyzed with -the UNL-NL T-Grammar for the NLization of UNL sentences involving verbs, pronouns -and determiners for the Punjabi natural language. -" -872,1309.2853,"Alexandre Denis (LORIA), Samuel Cruz-Lara (LORIA), Nadia Bellalem - (LORIA)",General Purpose Textual Sentiment Analysis and Emotion Detection Tools,cs.CL," Textual sentiment analysis and emotion detection consists in retrieving the -sentiment or emotion carried by a text or document. This task can be useful in -many domains: opinion mining, prediction, feedback, etc. However, building a -general-purpose tool for doing sentiment analysis and emotion detection raises -a number of issues: theoretical issues like the dependence on the domain or -the language, but also practical issues like the emotion representation for -interoperability. In this paper we present our sentiment/emotion analysis -tools, the way we propose to circumvent the difficulties, and the applications -they are used for. -" -873,1309.3323,"Ted Underwood, Michael L. Black, Loretta Auvil, Boris Capitanu",Mapping Mutable Genres in Structurally Complex Volumes,cs.CL cs.DL," To mine large digital libraries in humanistically meaningful ways, scholars -need to divide them by genre. This is a task that classification algorithms are -well suited to assist, but they need adjustment to address the specific -challenges of this domain. Digital libraries pose two problems of scale not -usually found in the article datasets used to test these algorithms. 1) Because -libraries span several centuries, the genres being identified may change -gradually across the time axis. 2) Because volumes are much longer than -articles, they tend to be internally heterogeneous, and the classification task -needs to begin with segmentation. 
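Entry 870's claim is easy to verify numerically for the identity cost function: for a head with dependents arranged on a line (a star tree), total dependency cost is minimized at the central position. A worked example; the cost function g is a placeholder for any strictly increasing function:

def total_cost(n, head_pos, g=lambda d: d):
    # positions 0..n-1; dependents occupy every slot except head_pos
    return sum(g(abs(i - head_pos)) for i in range(n) if i != head_pos)

n = 7  # odd number of elements: a single central minimum is expected
costs = [total_cost(n, p) for p in range(n)]
print(costs)                                         # [21, 16, 13, 12, 13, 16, 21]
print("best head position:", costs.index(min(costs)))  # the center, position 3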
We describe a multi-layered solution that -trains hidden Markov models to segment volumes, and uses ensembles of -overlapping classifiers to address historical change. We test this approach on -a collection of 469,200 volumes drawn from the HathiTrust Digital Library. To -demonstrate the humanistic value of these methods, we extract 32,209 volumes of -fiction from the digital library, and trace the changing proportions of first- -and third-person narration in the corpus. We note that narrative points of view -seem to have strong associations with particular themes and genres. -" -874,1309.3946,"Anuj Sharma, Shubhamoy Dey",Using Self-Organizing Maps for Sentiment Analysis,cs.IR cs.CL cs.NE," Web 2.0 services have enabled people to express their opinions, experiences -and feelings in the form of user-generated content. Sentiment analysis or -opinion mining involves identifying, classifying and aggregating opinions as -per their positive or negative polarity. This paper investigates the efficacy -of different implementations of Self-Organizing Maps (SOM) for sentiment-based -visualization and classification of online reviews. Specifically, this paper -implements the SOM algorithm for both supervised and unsupervised learning from -text documents. The unsupervised SOM algorithm is implemented for sentiment-based -visualization and classification tasks. For supervised sentiment -analysis, a competitive learning algorithm known as Learning Vector -Quantization is used. Both algorithms are also compared with their respective -multi-pass implementations, where a quick rough ordering pass is followed by a -fine-tuning pass. The experimental results on the online movie review data set -show that SOMs are well suited for sentiment-based classification and sentiment -polarity visualization. -" -875,1309.3949,"Anuj sharma, Shubhamoy Dey",Performance Investigation of Feature Selection Methods,cs.IR cs.CL cs.LG," Sentiment analysis or opinion mining has become an open research domain after -the proliferation of the Internet and Web 2.0 social media. People express their -attitudes and opinions on social media, including blogs, discussion forums and -tweets, and sentiment analysis concerns detecting and extracting -sentiment or opinion from online text. Sentiment-based text classification is -different from topical text classification, since it involves discrimination -based on the expressed opinion on a topic. Feature selection is significant for -sentiment analysis as opinionated text may have high dimensionality, which can -adversely affect the performance of a sentiment analysis classifier. This paper -explores the applicability of feature selection methods for sentiment analysis and -investigates their performance for classification in terms of recall, precision -and accuracy. Five feature selection methods (Document Frequency, Information -Gain, Gain Ratio, Chi Squared, and Relief-F) and three popular sentiment -feature lexicons (HM, GI and Opinion Lexicon) are investigated on a movie review -corpus of 2000 documents. The experimental results show that -Information Gain gave consistent results and Gain Ratio performed overall best -for sentiment feature selection, while the sentiment lexicons gave poor -performance. Furthermore, we found that the performance of the classifier depends -on an appropriate number of representative features selected from the text. -" -876,1309.4035,Peter D. 
Turney,"Domain and Function: A Dual-Space Model of Semantic Relations and - Compositions",cs.CL cs.AI cs.LG," Given appropriate representations of the semantic relations between carpenter -and wood and between mason and stone (for example, vectors in a vector space -model), a suitable algorithm should be able to recognize that these relations -are highly similar (carpenter is to wood as mason is to stone; the relations -are analogous). Likewise, with representations of dog, house, and kennel, an -algorithm should be able to recognize that the semantic composition of dog and -house, dog house, is highly similar to kennel (dog house and kennel are -synonymous). It seems that these two tasks, recognizing relations and -compositions, are closely connected. However, up to now, the best models for -relations are significantly different from the best models for compositions. In -this paper, we introduce a dual-space model that unifies these two tasks. This -model matches the performance of the best previous models for relations and -compositions. The dual-space model consists of a space for measuring domain -similarity and a space for measuring function similarity. Carpenter and wood -share the same domain, the domain of carpentry. Mason and stone share the same -domain, the domain of masonry. Carpenter and mason share the same function, the -function of artisans. Wood and stone share the same function, the function of -materials. In the composition dog house, kennel has some domain overlap with -both dog and house (the domains of pets and buildings). The function of kennel -is similar to the function of house (the function of shelters). By combining -domain and function similarities in various ways, we can model relations, -compositions, and other aspects of semantics. -" -877,1309.4058,Ramon Ferrer-i-Cancho,"Why SOV might be initially preferred and then lost or recovered? A - theoretical framework",cs.CL nlin.AO physics.soc-ph q-bio.NC," Little is known about why SOV order is initially preferred and then discarded -or recovered. Here we present a framework for understanding these and many -related word order phenomena: the diversity of dominant orders, the existence -of free words orders, the need of alternative word orders and word order -reversions and cycles in evolution. Under that framework, word order is -regarded as a multiconstraint satisfaction problem in which at least two -constraints are in conflict: online memory minimization and maximum -predictability. -" -878,1309.4168,"Tomas Mikolov, Quoc V. Le, Ilya Sutskever",Exploiting Similarities among Languages for Machine Translation,cs.CL," Dictionaries and phrase tables are the basis of modern statistical machine -translation systems. This paper develops a method that can automate the process -of generating and extending dictionaries and phrase tables. Our method can -translate missing word and phrase entries by learning language structures based -on large monolingual data and mapping between languages from small bilingual -data. It uses distributed representation of words and learns a linear mapping -between vector spaces of languages. Despite its simplicity, our method is -surprisingly effective: we can achieve almost 90% precision@5 for translation -of words between English and Spanish. This method makes little assumption about -the languages, so it can be used to extend and refine dictionaries and -translation tables for any language pairs. 
-" -879,1309.4628,Grzegorz Chrupa{\l}a,Text segmentation with character-level text embeddings,cs.CL," Learning word representations has recently seen much success in computational -linguistics. However, assuming sequences of word tokens as input to linguistic -analysis is often unjustified. For many languages word segmentation is a -non-trivial task and naturally occurring text is sometimes a mixture of natural -language strings and other character data. We propose to learn text -representations directly from raw character sequences by training a Simple -recurrent Network to predict the next character in text. The network uses its -hidden layer to evolve abstract representations of the character sequences it -sees. To demonstrate the usefulness of the learned text embeddings, we use them -as features in a supervised character level text segmentation and labeling -task: recognizing spans of text containing programming language code. By using -the embeddings as features we are able to substantially improve over a baseline -which uses only surface character n-grams. -" -880,1309.5174,"Andrei Barbu, N. Siddharth, Jeffrey Mark Siskind",Saying What You're Looking For: Linguistics Meets Video Search,cs.CV cs.CL cs.IR," We present an approach to searching large video corpora for video clips which -depict a natural-language query in the form of a sentence. This approach uses -compositional semantics to encode subtle meaning that is lost in other systems, -such as the difference between two sentences which have identical words but -entirely different meaning: ""The person rode the horse} vs. \emph{The horse -rode the person"". Given a video-sentence pair and a natural-language parser, -along with a grammar that describes the space of sentential queries, we produce -a score which indicates how well the video depicts the sentence. We produce -such a score for each video clip in a corpus and return a ranked list of clips. -Furthermore, this approach addresses two fundamental problems simultaneously: -detecting and tracking objects, and recognizing whether those tracks depict the -query. Because both tracking and object detection are unreliable, this uses -knowledge about the intended sentential query to focus the tracker on the -relevant participants and ensures that the resulting tracks are described by -the sentential query. While earlier work was limited to single-word queries -which correspond to either verbs or nouns, we show how one can search for -complex queries which contain multiple phrases, such as prepositional phrases, -and modifiers, such as adverbs. We demonstrate this approach by searching for -141 queries involving people and horses interacting with each other in 10 -full-length Hollywood movies. -" -881,1309.5223,"Ralf Steinberger, Mohamed Ebrahim, Marco Turchi","JRC EuroVoc Indexer JEX - A freely available multi-label categorisation - tool",cs.CL," EuroVoc (2012) is a highly multilingual thesaurus consisting of over 6,700 -hierarchically organised subject domains used by European Institutions and many -authorities in Member States of the European Union (EU) for the classification -and retrieval of official documents. JEX is JRC-developed multi-label -classification software that learns from manually labelled data to -automatically assign EuroVoc descriptors to new documents in a profile-based -category-ranking task. 
The JEX release consists of trained classifiers for 22 -official EU languages, of parallel training data in the same languages, of an -interface that allows viewing and amending the assignment results, and of a -module that allows users to re-train the tool on their own document -collections. JEX allows advanced users to change the document representation so -as to possibly improve the categorisation result through linguistic -pre-processing. JEX can be used as a tool for interactive EuroVoc descriptor -assignment to increase speed and consistency of the human categorisation -process, or it can be used fully automatically. The output of JEX is a -language-independent EuroVoc feature vector lending itself also as input to -various other Language Technology tasks, including cross-lingual clustering and -classification, cross-lingual plagiarism detection, sentence selection and -ranking, and more. -" -882,1309.5226,"Ralf Steinberger, Andreas Eisele, Szymon Klocek, Spyridon Pilos, - Patrick Schl\""uter",DGT-TM: A freely Available Translation Memory in 22 Languages,cs.CL," The European Commission's (EC) Directorate General for Translation, together -with the EC's Joint Research Centre, is making available a large translation -memory (TM; i.e. sentences and their professionally produced translations) -covering twenty-two official European Union (EU) languages and their 231 -language pairs. Such a resource is typically used by translation professionals -in combination with TM software to improve speed and consistency of their -translations. However, this resource has also many uses for translation studies -and for language technology applications, including Statistical Machine -Translation (SMT), terminology extraction, Named Entity Recognition (NER), -multilingual classification and clustering, and many more. In this reference -paper for DGT-TM, we introduce this new resource, provide statistics regarding -its size, and explain how it was produced and how to use it. -" -883,1309.5290,"Ralf Steinberger, Bruno Pouliquen, Erik van der Goot",An introduction to the Europe Media Monitor family of applications,cs.CL," Most large organizations have dedicated departments that monitor the media to -keep up-to-date with relevant developments and to keep an eye on how they are -represented in the news. Part of this media monitoring work can be automated. -In the European Union with its 23 official languages, it is particularly -important to cover media reports in many languages in order to capture the -complementary news content published in the different countries. It is also -important to be able to access the news content across languages and to merge -the extracted information. We present here the four publicly accessible systems -of the Europe Media Monitor (EMM) family of applications, which cover between -19 and 50 languages (see http://press.jrc.it/overview.html). We give an -overview of their functionality and discuss some of the implications of the -fact that they cover quite so many languages. We discuss design issues -necessary to be able to achieve this high multilinguality, as well as the -benefits of this multilinguality. -" -884,1309.5319,"Cl\'ement Moulin-Frier (INRIA Bordeaux - Sud-Ouest, GIPSA-lab), M. A. - Arbib (USC)","Recognizing Speech in a Novel Accent: The Motor Theory of Speech - Perception Reframed",cs.CL cs.LG q-bio.NC," The motor theory of speech perception holds that we perceive the speech of -another in terms of a motor representation of that speech. 
However, when we -have learned to recognize a foreign accent, it seems plausible that recognition -of a word rarely involves reconstruction of the speech gestures of the speaker -rather than those of the listener. To better assess the motor theory and this -observation, we proceed in three stages. Part 1 places the motor theory of -speech perception in a larger framework based on our earlier models of the -adaptive formation of mirror neurons for grasping, and for viewing extensions -of that mirror system as part of a larger system for neuro-linguistic -processing, augmented by the present consideration of recognizing speech in a -novel accent. Part 2 then offers a novel computational model of how a listener -comes to understand the speech of someone speaking the listener's native -language with a foreign accent. The core tenet of the model is that the -listener uses hypotheses about the word the speaker is currently uttering to -update probabilities linking the sound produced by the speaker to phonemes in -the native language repertoire of the listener. This, on average, improves the -recognition of later words. This model is neutral regarding the nature of the -representations it uses (motor vs. auditory). It serves as a reference point for -the discussion in Part 3, which proposes a dual-stream neuro-linguistic -architecture to revisit claims for and against the motor theory of speech -perception and the relevance of mirror neurons, and extracts some implications -for the reframing of the motor theory. -" -885,1309.5391,Saif M. Mohammad,Even the Abstract have Colour: Consensus in Word-Colour Associations,cs.CL," Colour is a key component in the successful dissemination of information. -Since many real-world concepts are associated with colour, for example danger -with red, linguistic information is often complemented with the use of -appropriate colours in information visualization and product marketing. Yet, -there is no comprehensive resource that captures concept-colour associations. -We present a method to create a large word-colour association lexicon by -crowdsourcing. A word-choice question was used to obtain sense-level -annotations and to ensure data quality. We focus especially on abstract -concepts and emotions to show that even they tend to have strong colour -associations. Thus, using the right colours can not only improve semantic -coherence, but also inspire the desired emotional response. -" -886,1309.5652,"Mona Diab, Nizar Habash, Owen Rambow and Ryan Roth",LDC Arabic Treebanks and Associated Corpora: Data Divisions Manual,cs.CL," The Linguistic Data Consortium (LDC) has developed hundreds of data corpora -for natural language processing (NLP) research. Among these are a number of -annotated treebank corpora for Arabic. Typically, these corpora consist of a -single collection of annotated documents. NLP research, however, usually -requires multiple data sets for the purposes of training models, developing -techniques, and final evaluation. Therefore it becomes necessary to divide the -corpora used into the required data sets (divisions). This document details a -set of rules that have been defined to enable consistent divisions for old and -new Arabic treebanks (ATB) and related corpora. -" -887,1309.5657,T.El-Shishtawy,A Hybrid Algorithm for Matching Arabic Names,cs.CL," In this paper, a new hybrid algorithm which combines both token-based and -character-based approaches is presented. The basic Levenshtein approach has -been extended to a token-based distance metric. 
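The hybrid idea of entry 887, comparing names token by token with a character-level edit distance so that token-internal typos and missing middle-name tokens are both tolerated, can be sketched as follows; the similarity definition, threshold, and example names are illustrative simplifications, not the paper's exact metric:

def lev(a, b):
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def name_similarity(n1, n2, max_typos=1):
    t1, t2 = n1.lower().split(), n2.lower().split()
    short, long_ = sorted((t1, t2), key=len)
    # each token of the shorter name must approximately match some token of
    # the longer one; unmatched middle tokens of the longer name are ignored
    matched = sum(1 for t in short if min(lev(t, u) for u in long_) <= max_typos)
    return matched / len(short)

print(name_similarity("Ahmed M. El-Shishtawy", "Ahmad El-Shishtawi"))  # 1.0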
The distance metric is enhanced
-to set the proper granularity level behavior of the algorithm. It smoothly maps
-a threshold of misspelling differences at the character level, and the
-importance of token-level errors in terms of a token's position and frequency.
-Using a large Arabic dataset, the experimental results show that the proposed
-algorithm successfully overcomes many types of errors, such as typographical
-errors, omission or insertion of middle name components, omission of
-non-significant popular name components, and character variations due to
-different writing styles. When the results were compared with those of other
-classical algorithms on the same dataset, the proposed algorithm was found to
-increase the minimum success level of the best tested algorithms, while
-achieving higher upper limits.
-"
-888,1309.5843,"Marco Guerini, Lorenzo Gatti, Marco Turchi",Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet,cs.CL," Assigning a positive or negative score to a word out of context (i.e. a
-word's prior polarity) is a challenging task for sentiment analysis. In the
-literature, various approaches based on SentiWordNet have been proposed. In
-this paper, we compare the most often used techniques together with newly
-proposed ones and incorporate all of them in a learning framework to see
-whether blending them can further improve the estimation of prior polarity
-scores. Using two different versions of SentiWordNet and testing regression and
-classification models across tasks and datasets, our learning approach
-consistently outperforms the single metrics, providing a new state-of-the-art
-approach in computing words' prior polarity for sentiment analysis. We conclude
-our investigation showing interesting biases in calculated prior polarity
-scores when word Part of Speech and annotator gender are considered.
-"
-889,1309.5909,Saif Mohammad,"From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels
- and Fairy Tales",cs.CL," Today we have access to unprecedented amounts of literary texts. However,
-search still relies heavily on key words. In this paper, we show how sentiment
-analysis can be used in tandem with effective visualizations to quantify and
-track emotions in both individual books and across very large collections. We
-introduce the concept of emotion word density, and using the Brothers Grimm
-fairy tales as an example, we show how collections of text can be organized for
-better search. Using the Google Books Corpus we show how to determine an
-entity's emotion associations from co-occurring words. Finally, we compare
-emotion words in fairy tales and novels, to show that fairy tales have a much
-wider range of emotion word densities than novels.
-"
-890,1309.5942,Saif Mohammad,Colourful Language: Measuring Word-Colour Associations,cs.CL," Since many real-world concepts are associated with colour, for example danger
-with red, linguistic information is often complemented with the use of
-appropriate colours in information visualization and product marketing. Yet,
-there is no comprehensive resource that captures concept-colour associations.
-We present a method to create a large word-colour association lexicon by
-crowdsourcing. We focus especially on abstract concepts and emotions to show
-that even though they cannot be physically visualized, they too tend to have
-strong colour associations.
Finally, we show how word-colour associations
-manifest themselves in language, and quantify the usefulness of co-occurrence
-and polarity cues in automatically detecting colour associations.
-"
-891,1309.6047,Nikolay Lyubimov and Mikhail Kotov,"Non-negative Matrix Factorization with Linear Constraints for
- Single-Channel Speech Enhancement",cs.SD cs.CL," This paper investigates a non-negative matrix factorization (NMF)-based
-approach to the semi-supervised single-channel speech enhancement problem where
-only non-stationary additive noise signals are given. The proposed method
-relies on a sinusoidal model of speech production which is integrated into the
-NMF framework using linear constraints on dictionary atoms. This method is
-further developed to regularize harmonic amplitudes. Simple multiplicative
-algorithms are presented. The experimental evaluation was made on the TIMIT
-corpus mixed with various types of noise. It has been shown that the proposed
-method outperforms some of the state-of-the-art noise suppression techniques in
-terms of signal-to-noise ratio.
-"
-892,1309.6162,"Ralf Steinberger, Bruno Pouliquen, Mijail Kabadjov, Erik van der Goot","JRC-Names: A freely available, highly multilingual named entity resource",cs.CL," This paper describes a new, freely available, highly multilingual named
-entity resource for person and organisation names that has been compiled over
-seven years of large-scale multilingual news analysis combined with Wikipedia
-mining, resulting in 205,000 person and organisation names plus about the same
-number of spelling variants written in over 20 different scripts and in many
-more languages. This resource, produced as part of the Europe Media Monitor
-activity (EMM, http://emm.newsbrief.eu/overview.html), can be used for a number
-of purposes. These include improving name search in databases or on the
-internet, seeding machine learning systems to learn named entity recognition
-rules, improving machine translation results, and more. We describe here how
-this resource was created; we give statistics on its current size; we address
-the issue of morphological inflection; and we give details regarding its
-functionality. Updates to this resource will be made available daily.
-"
-893,1309.6176,"Xin Zheng, Zhiyong Wu, Helen Meng, Weifeng Li, Lianhong Cai","Feature Learning with Gaussian Restricted Boltzmann Machine for Robust
- Speech Recognition",cs.CL cs.LG cs.SD," In this paper, we first present a new variant of the Gaussian restricted
-Boltzmann machine (GRBM) called the multivariate Gaussian restricted Boltzmann
-machine (MGRBM), with its definition and learning algorithm. Then we propose
-using a learned GRBM or MGRBM to extract better features for robust speech
-recognition. Our experiments on Aurora2 show that both GRBM-extracted and
-MGRBM-extracted features perform much better than Mel-frequency cepstral
-coefficients (MFCC) with either an HMM-GMM or a hybrid HMM-deep neural network
-(DNN) acoustic model, and the MGRBM-extracted feature is slightly better.
-"
-894,1309.6185,"Maud Ehrmann, Leonida della Rocca, Ralf Steinberger, Hristo Tanev",Acronym recognition and processing in 22 languages,cs.CL," We are presenting work on recognising acronyms of the form Long-Form
-(Short-Form) such as ""International Monetary Fund (IMF)"" in millions of news
-articles in twenty-two languages, as part of our more general effort to
-recognise entities and their variants in news text and to use them for the
-automatic analysis of the news, including the linking of related news across
-languages. We show how the acronym recognition patterns, initially developed
-for medical terms, needed to be adapted to the more general news domain and we
-present evaluation results. We describe our effort to automatically merge the
-numerous long-form variants referring to the same short-form, while keeping
-non-related long-forms separate. Finally, we provide extensive statistics on
-the frequency and the distribution of short-form/long-form pairs across
-languages.
-"
-895,1309.6202,"Alexandra Balahur, Ralf Steinberger, Mijail Kabadjov, Vanni Zavarella,
- Erik van der Goot, Matina Halkia, Bruno Pouliquen, Jenya Belyaeva",Sentiment Analysis in the News,cs.CL," Recent years have brought a significant growth in the volume of research in
-sentiment analysis, mostly on highly subjective text types (movie or product
-reviews). The main difference these texts have with news articles is that their
-target is clearly defined and unique across the text. Following different
-annotation efforts and the analysis of the issues encountered, we realised that
-news opinion mining is different from opinion mining of other text types. We
-identified three subtasks that need to be addressed: definition of the target;
-separation of the good and bad news content from the good and bad sentiment
-expressed on the target; and analysis of clearly marked opinion that is
-expressed explicitly, not needing interpretation or the use of world knowledge.
-Furthermore, we distinguish three different possible views on newspaper
-articles - author, reader and text, which have to be addressed differently at
-the time of analysing sentiment. Given these definitions, we present work on
-mining opinions about entities in English language news, in which (a) we test
-the relative suitability of various sentiment dictionaries and (b) we attempt
-to separate positive or negative opinion from good or bad news. In the
-experiments described here, we tested whether or not subject domain-defining
-vocabulary should be ignored. Results showed that this idea is more appropriate
-in the context of news opinion mining and that the approaches taking this into
-consideration produce better performance.
-"
-896,1309.6347,Saif M. Mohammad and Tony (Wenda) Yang,Tracking Sentiment in Mail: How Genders Differ on Emotional Axes,cs.CL," With the widespread use of email, we now have access to unprecedented amounts
-of text that we ourselves have written. In this paper, we show how sentiment
-analysis can be used in tandem with effective visualizations to quantify and
-track emotions in many types of mail. We create a large word--emotion
-association lexicon by crowdsourcing, and use it to compare emotions in love
-letters, hate mail, and suicide notes. We show that there are marked
-differences across genders in how they use emotion words in work-place email.
-For example, women use many words from the joy--sadness axis, whereas men
-prefer terms from the fear--trust axis.
Finally, we show visualizations that
-can help people track emotions in their emails.
-"
-897,1309.6352,Saif M. Mohammad and Svetlana Kiritchenko,Using Nuances of Emotion to Identify Personality,cs.CL," Past work on personality detection has shown that the frequencies of lexical
-categories such as first-person pronouns, past-tense verbs, and sentiment words
-have significant correlations with personality traits. In this paper, for the
-first time, we show that fine affect (emotion) categories such as that of
-excitement, guilt, yearning, and admiration are significant indicators of
-personality. Additionally, we perform experiments to show that the gains
-provided by the fine affect categories are not obtained by using coarse affect
-categories alone or with specificity features alone. We employ these features
-in five SVM classifiers for detecting five personality traits through essays.
-We find that the use of fine emotion features leads to statistically
-significant improvement over a competitive baseline, whereas the use of coarse
-affect and specificity features does not.
-"
-898,1309.6650,"Haytham Al-Feel, Ralph Schafermeier, Adrian Paschke",An Inter-lingual Reference Approach For Multi-Lingual Ontology Matching,cs.CL cs.DL," Ontologies are considered as the backbone of the Semantic Web. With the
-rising success of the Semantic Web, the number of participating communities
-from different countries is constantly increasing. The growing number of
-ontologies available in different natural languages leads to an
-interoperability problem. In this paper, we discuss several approaches for
-ontology matching, examine similarities and differences, identify weaknesses,
-and compare the existing automated approaches with the manual approaches for
-integrating multilingual ontologies. In addition, we propose a new
-architecture for a multilingual ontology matching service. As a case study we
-used an example of two multilingual enterprise ontologies - the university
-ontology of Freie Universitaet Berlin and the ontology for Fayoum University in
-Egypt.
-"
-899,1309.6722,"Tang Duyu, Qin Bing, Zhou LanJun, Wong KamFai, Zhao Yanyan, Liu Ting","Domain-Specific Sentiment Word Extraction by Seed Expansion and Pattern
- Generation",cs.CL," This paper focuses on the automatic extraction of domain-specific sentiment
-words (DSSW), which is a fundamental subtask of sentiment analysis. Most
-previous work utilizes manual patterns for this task. However, the performance
-of those methods highly relies on the labelled patterns or selected seeds. In
-order to overcome the above problem, this paper presents an automatic framework
-to detect large-scale domain-specific patterns for DSSW extraction. To this
-end, sentiment seeds are extracted from a massive dataset of user comments.
-Subsequently, these sentiment seeds are expanded by synonyms using a
-bootstrapping mechanism. Simultaneously, a synonymy graph is built and the
-graph propagation algorithm is applied on the built synonymy graph. Afterwards,
-syntactic and sequential relations between target words and high-ranked
-sentiment words are extracted automatically to construct large-scale patterns,
-which are further used to extract DSSWs. The experimental results in three
-domains reveal the effectiveness of our method.
-"
-900,1309.6874,"Pengtao Xie, Eric P. Xing",Integrating Document Clustering and Topic Modeling,cs.LG cs.CL cs.IR stat.ML," Document clustering and topic modeling are two closely related tasks which
-can mutually benefit each other.
Topic modeling can project documents into a
-topic space which facilitates effective document clustering. Cluster labels
-discovered by document clustering can be incorporated into topic models to
-extract local topics specific to each cluster and global topics shared by all
-clusters. In this paper, we propose a multi-grain clustering topic model
-(MGCTM) which integrates document clustering and topic modeling into a unified
-framework and jointly performs the two tasks to achieve the overall best
-performance. Our model tightly couples two components: a mixture component used
-for discovering latent groups in a document collection and a topic model
-component used for mining multi-grain topics including local topics specific to
-each cluster and global topics shared across clusters. We employ variational
-inference to approximate the posterior of hidden variables and learn model
-parameters. Experiments on two datasets demonstrate the effectiveness of our
-model.
-"
-901,1309.7270,"Tianjun Fu, Ahmed Abbasi, Daniel Zeng and Hsinchun Chen",Evaluating the Usefulness of Sentiment Information for Focused Crawlers,cs.IR cs.CL," Despite the prevalence of sentiment-related content on the Web, there has
-been limited work on focused crawlers capable of effectively collecting such
-content. In this study, we evaluated the efficacy of using sentiment-related
-information for enhanced focused crawling of opinion-rich web content regarding
-a particular topic. We also assessed the impact of using sentiment-labeled web
-graphs to further improve collection accuracy. Experimental results on a large
-test bed encompassing over half a million web pages revealed that focused
-crawlers utilizing sentiment information as well as sentiment-labeled web
-graphs are capable of gathering more holistic collections of opinion-related
-content regarding a particular topic. The results have important implications
-for business and marketing intelligence gathering efforts in the Web 2.0 era.
-"
-902,1309.7312,"Himangshu Sarma, Navanath Saharia, Utpal Sharma, Smriti Kumar Sinha,
- Mancha Jyoti Malakar",Development and Transcription of Assamese Speech Corpus,cs.CL," A balanced speech corpus is the basic need for any speech processing task. In
-this report we describe our effort on the development of an Assamese speech
-corpus. We mainly focused on some issues and challenges faced during the
-development of the corpus. As Assamese is a less computationally aware
-language, this is the first effort to develop a speech corpus for it. As corpus
-development is an ongoing process, in this paper we report only the initial
-task.
-"
-903,1309.7340,Jiwei Li and Claire Cardie,Early Stage Influenza Detection from Twitter,cs.SI cs.CL," Influenza is an acute respiratory illness that occurs virtually every year
-and results in substantial disease, death and expense. Detection of Influenza
-in its earliest stage would facilitate timely action that could reduce the
-spread of the illness. Existing systems such as CDC and EISS, which try to
-collect diagnosis data, are almost entirely manual, resulting in about two-week
-delays for clinical data acquisition. Twitter, a popular microblogging service,
-provides us with a perfect source for early-stage flu detection due to its
-real-time nature. For example, when a flu breaks out, people that get the flu
-may post related tweets, which enables the flu breakout to be detected
-promptly.
In this paper, we investigate the real-time flu detection problem on
-Twitter data by proposing Flu Markov Network (Flu-MN): a spatio-temporal
-unsupervised Bayesian algorithm based on a 4-phase Markov Network, trying to
-identify the flu breakout at the earliest stage. We test our model on real
-Twitter datasets from the United States along with baselines in multiple
-applications, such as real-time flu breakout detection, future epidemic phase
-prediction, or Influenza-like illness (ILI) physician visits. Experimental
-results show the robustness and effectiveness of our approach. We build up a
-real-time flu reporting system based on the proposed approach, and we are
-hopeful that it would help government or health organizations in identifying
-flu outbreaks and facilitating timely actions to decrease unnecessary
-mortality.
-"
-904,1310.0201,Moreno I. Coco and Rick Dale,"Cross-Recurrence Quantification Analysis of Categorical and Continuous
- Time Series: an R package",cs.CL stat.AP," This paper describes the R package crqa to perform cross-recurrence
-quantification analysis of two time series of either a categorical or
-continuous nature. Streams of behavioral information, from eye movements to
-linguistic elements, unfold over time. When two people interact, such as in
-conversation, they often adapt to each other, leading these behavioral levels
-to exhibit recurrent states. In dialogue, for example, interlocutors adapt to
-each other by exchanging interactive cues: smiles, nods, gestures, choice of
-words, and so on. In order for us to capture closely the goings-on of dynamic
-interaction, and uncover the extent of coupling between two individuals, we
-need to quantify how much recurrence is taking place at these levels. Methods
-available in crqa would allow researchers in cognitive science to pose such
-questions as how much two people are recurrent at some level of analysis, what
-is the characteristic lag time for one person to maximally match another, or
-whether one person is leading another. First, we set the theoretical ground to
-understand the difference between 'correlation' and 'co-visitation' when
-comparing two time series, using an aggregative or cross-recurrence approach.
-Then, we describe more formally the principles of cross-recurrence, and show
-with the current package how to carry out analyses applying them. We end the
-paper by comparing the computational efficiency and consistency of results of
-the crqa R package with the benchmark MATLAB toolbox crptoolbox. We show
-perfect comparability between the two libraries on both levels.
-"
-905,1310.0573,"Deepti Bhalla, Nisheeth Joshi, Iti Mathur","Improving the Quality of MT Output using Novel Name Entity Translation
- Scheme",cs.CL," This paper presents a novel approach to machine translation by combining a
-state-of-the-art name entity translation scheme. Improper translation of name
-entities lowers the quality of machine-translated output. In this work, name
-entities are transliterated by using a statistical rule-based approach. This
-paper describes the translation and transliteration of name entities from
-English to Punjabi. We have experimented on four types of name entities, which
-are: proper names, location names, organization names and miscellaneous.
-Various rules for the purpose of syllabification have been constructed.
-Transliteration of name entities is accomplished with the help of probability
-calculation.
N-Gram probabilities for the extracted syllables have been
-calculated using the statistical machine translation toolkit MOSES.
-"
-906,1310.0575,"Jyoti Singh, Nisheeth Joshi, Iti Mathur",Development of Marathi Part of Speech Tagger Using Statistical Approach,cs.CL," Part-of-speech (POS) tagging is the process of assigning to each word in a
-text its corresponding part of speech. A fundamental version of POS tagging is
-the identification of words as nouns, verbs, adjectives etc. Part of Speech
-tagging is a prominent tool for processing natural languages, and it is one of
-the simplest as well as most consistent statistical models for many NLP
-applications. POS tagging is an initial stage of linguistic text analysis for
-tasks like information retrieval, machine translation, text-to-speech
-synthesis, information extraction etc. In POS tagging we assign a Part of
-Speech tag to each word in a sentence or text. Various approaches have been
-proposed to implement POS taggers. In this paper we present a Marathi part of
-speech tagger. Marathi, which is spoken by the native people of Maharashtra, is
-a morphologically rich language. The general approach used for the development
-of the taggers is statistical, using Unigram, Bigram, Trigram and HMM methods.
-The paper presents a clear idea about all the algorithms with suitable
-examples. It also introduces a tag set for Marathi which can be used for
-tagging Marathi text. We describe the development of the taggers and compare
-the accuracy of their output. The four Marathi POS taggers, viz. Unigram,
-Bigram, Trigram and HMM, give accuracies of 77.38%, 90.30%, 91.46% and 93.82%
-respectively.
-"
-907,1310.0578,"Vaishali Gupta, Nisheeth Joshi, Iti Mathur","Subjective and Objective Evaluation of English to Urdu Machine
- Translation",cs.CL," Machine translation is a research-based area where evaluation is a very
-important phenomenon for checking the quality of MT output. The work is based
-on the evaluation of English to Urdu machine translation. In this research work
-we have evaluated the translation quality of Urdu text that has been translated
-by using different Machine Translation systems like Google, Babylon and
-Ijunoon. The evaluation process is done by using two approaches - human
-evaluation and automatic evaluation. We have worked with both approaches: in
-human evaluation, emphasis is given to scales and parameters, while in
-automatic evaluation, emphasis is given to automatic metrics such as BLEU, GTM,
-METEOR and ATEC.
-"
-908,1310.0581,"Vaishali Gupta, Nisheeth Joshi, Iti Mathur",Rule Based Stemmer in Urdu,cs.CL," Urdu is a combination of several languages like Arabic, Hindi, English,
-Turkish, Sanskrit etc. It has a complex and rich morphology. This is the reason
-why not much work has been done in Urdu language processing. Stemming is used
-to convert a word into its respective root form. In stemming, we separate the
-suffix and prefix from the word. It is useful in search engines, natural
-language processing and word processing, spell checkers, word parsing, and word
-frequency and count studies. This paper presents a rule based stemmer for Urdu.
-The stemmer that we have discussed here is used in information retrieval. We
-have also evaluated our results by verifying them with a human expert.
-"
-909,1310.0754,"M.Thangarasu, R.Manavalan",Stemmers for Tamil Language: Performance Analysis,cs.CL," Stemming is the process of extracting the root word from a given inflected
-word, and it plays a significant role in numerous applications of Natural
-Language Processing (NLP). The Tamil language raises several challenges to NLP,
-since it has richer morphological patterns than other languages. A rule-based
-light stemmer is proposed in this paper to find the stem of a given inflected
-Tamil word. The performance of the proposed approach is compared to a
-rule-based suffix-removal stemmer in terms of correctly and incorrectly
-predicted stems. The experimental results clearly show that the proposed light
-stemmer for the Tamil language performs better than the suffix-removal stemmer
-and is also more effective in Information Retrieval Systems (IRS).
-"
-910,1310.1249,"Andrzej Jarynowski, Amir Rostami",Reading Stockholm Riots 2013 in social media by text-mining,cs.SI cs.CL physics.soc-ph stat.AP," The riots in Stockholm in May 2013 were an event that reverberated in the
-world media for the scale of violence that had spread through the Swedish
-capital. In this study we have investigated the role of social media in
-creating media phenomena via text mining and natural language processing. We
-have focused on two channels of communication for our analysis: Twitter and
-Poloniainfo.se (Forum of Polish community in Sweden). By counting word usage,
-our preliminary results show some hot topics driving the discussion, related
-mostly to the Swedish Police and Swedish politics. Typical features of media
-intervention are presented. We have built networks of the most popular phrases,
-clustered by categories (geography, media institution, etc.). Sentiment
-analysis shows a negative connotation with the Police. The aim of this
-preliminary exploratory quantitative study was to generate questions and
-hypotheses, which we could then follow up carefully with deeper, more
-qualitative methods.
-"
-911,1310.1285,"S\'ebastien Harispe, Sylvie Ranwez, Stefan Janaqi, Jacky Montmain","Semantic Measures for the Comparison of Units of Language, Concepts or
- Instances from Text and Knowledge Base Analysis",cs.CL," Semantic measures are widely used today to estimate the strength of the
-semantic relationship between elements of various types: units of language
-(e.g., words, sentences, documents), concepts or even instances semantically
-characterized (e.g., diseases, genes, geographical locations). Semantic
-measures play an important role in comparing such elements according to
-semantic proxies: texts and knowledge representations, which support their
-meaning or describe their nature. Semantic measures are therefore essential for
-designing intelligent agents which will for example take advantage of semantic
-analysis to mimic human ability to compare abstract or concrete objects. This
-paper proposes a comprehensive survey of the broad notion of semantic measure
-for the comparison of units of language, concepts or instances based on
-semantic proxy analyses. Semantic measures generalize the well-known notions of
-semantic similarity, semantic relatedness and semantic distance, which have
-been extensively studied by various communities over the last decades (e.g.,
-Cognitive Sciences, Linguistics, and Artificial Intelligence to mention a few).
-"
-912,1310.1425,Mohammad Nasiruddin,"A State of the Art of Word Sense Induction: A Way Towards Word Sense
- Disambiguation for Under-Resourced Languages",cs.CL," Word Sense Disambiguation (WSD), the process of automatically identifying the
-meaning of a polysemous word in a sentence, is a fundamental task in Natural
-Language Processing (NLP). Progress in this approach to WSD opens up many
-promising developments in the field of NLP and its applications. Indeed,
-improvement over current performance levels could allow us to take a first step
-towards natural language understanding. Due to the lack of lexical resources it
-is sometimes difficult to perform WSD for under-resourced languages. This paper
-is an investigation on how to initiate research in WSD for under-resourced
-languages by applying Word Sense Induction (WSI) and suggests some interesting
-topics to focus on.
-"
-913,1310.1426,"Foyzul Hassan, Mohammed Rokibul Alam Kotwal, Md. Mostafizur Rahman,
- Mohammad Nasiruddin, Md. Abdul Latif and Mohammad Nurul Huda","Local Feature or Mel Frequency Cepstral Coefficients - Which One is
- Better for MLN-Based Bangla Speech Recognition?",cs.CL," This paper discusses the dominance of local features (LFs), as input to the
-multilayer neural network (MLN), extracted from Bangla input speech over mel
-frequency cepstral coefficients (MFCCs). Here, the LF-based method comprises
-three stages: (i) LF extraction from input speech, (ii) phoneme probabilities
-extraction using MLN from LF and (iii) the hidden Markov model (HMM) based
-classifier to obtain more accurate phoneme strings. In the experiments on a
-Bangla speech corpus prepared by us, it is observed that the LF-based automatic
-speech recognition (ASR) system provides a higher phoneme correct rate than the
-MFCC-based system. Moreover, the proposed system requires fewer mixture
-components in the HMMs.
-"
-914,1310.1590,Paheli Bhattacharya and Arnab Bhattacharya,Evolution of the Modern Phase of Written Bangla: A Statistical Study,cs.CL," Active languages such as Bangla (or Bengali) evolve over time due to a
-variety of social, cultural, economic, and political issues. In this paper, we
-analyze the change in the written form of the modern phase of Bangla
-quantitatively in terms of character-level, syllable-level, morpheme-level and
-word-level features. We collect three different types of corpora---classical,
-newspapers and blogs---and test whether the differences in their features are
-statistically significant. Results suggest that there are significant changes
-in the length of a word when measured in terms of characters, but there is not
-much difference in usage of different characters, syllables and morphemes in a
-word or of different words in a sentence. To the best of our knowledge, this is
-the first work on Bangla of this kind.
-"
-915,1310.1597,Mengqiu Wang and Christopher D. Manning,"Cross-lingual Pseudo-Projected Expectation Regularization for Weakly
- Supervised Learning",cs.CL cs.AI," We consider a multilingual weakly supervised learning scenario where
-knowledge from annotated corpora in a resource-rich language is transferred via
-bitext to guide the learning in other languages. Past approaches project labels
-across bitext and use them as features or gold labels for training. We propose
-a new method that projects model expectations rather than labels, which
-facilitates the transfer of model uncertainty across language boundaries.
We encode
-expectations as constraints and train a discriminative CRF model using
-Generalized Expectation Criteria (Mann and McCallum, 2010). Evaluated on
-standard Chinese-English and German-English NER datasets, our method
-demonstrates F1 scores of 64% and 60% when no labeled data is used. Attaining
-the same accuracy with supervised CRFs requires 12k and 1.5k labeled sentences.
-Furthermore, when combined with labeled examples, our method yields significant
-improvements over state-of-the-art supervised methods, achieving best reported
-numbers to date on Chinese OntoNotes and German CoNLL-03 datasets.
-"
-916,1310.1964,"Flavio Massimiliano Cecchini (Universit\`a degli Studi di Milano),
- Elisabetta Fersini (University of Milano-Bicocca)","Named entity recognition using conditional random fields with non-local
- relational constraints",cs.CL," We begin by introducing the Computer Science branch of Natural Language
-Processing, then narrow our attention to its subbranch of Information
-Extraction, and particularly to Named Entity Recognition, briefly discussing
-its main methodological approaches. There follows an introduction to
-state-of-the-art Conditional Random Fields in the form of linear chains.
-Subsequently, the idea of constrained inference as a way to model long-distance
-relationships in a text is presented, based on an Integer Linear Programming
-representation of the problem. Adding such relationships to the problem as
-automatically inferred logical formulas, translatable into linear conditions,
-we propose to solve the resulting more complex problem with the aid of
-Lagrangian relaxation, of which some technical details are explained. Lastly,
-we give some experimental results.
-"
-917,1310.1975,Brendan O'Connor and Michael Heilman,ARKref: a rule-based coreference resolution system,cs.CL," ARKref is a tool for noun phrase coreference. It is a deterministic,
-rule-based system that uses syntactic information from a constituent parser,
-and semantic information from an entity recognition component. Its architecture
-is based on the work of Haghighi and Klein (2009). ARKref was originally
-written in 2009. At the time of writing, the last released version was in March
-2011. This document describes that version, which is open-source and publicly
-available at: http://www.ark.cs.cmu.edu/ARKref
-"
-918,1310.2408,"Jun Zhu, Xun Zheng, Bo Zhang","Improved Bayesian Logistic Supervised Topic Models with Data
- Augmentation",cs.LG cs.CL stat.AP stat.ML," Supervised topic models with a logistic likelihood have two issues that
-potentially limit their practical use: 1) response variables are usually
-over-weighted by document word counts; and 2) existing variational inference
-methods make strict mean-field assumptions. We address these issues by: 1)
-introducing a regularization constant to better balance the two parts based on
-an optimization formulation of Bayesian inference; and 2) developing a simple
-Gibbs sampling algorithm by introducing auxiliary Polya-Gamma variables and
-collapsing out Dirichlet variables. Our augment-and-collapse sampling algorithm
-has analytical forms of each conditional distribution without making any
-restricting assumptions and can be easily parallelized. Empirical results
-demonstrate significant improvements on prediction performance and time
-efficiency.
-"
-919,1310.2479,"Christian M. Alis, May T.
Lim",Spatio-temporal variation of conversational utterances on Twitter,physics.soc-ph cs.CL cs.SI," Conversations reflect the existing norms of a language. Previously, we found -that utterance lengths in English fictional conversations in books and movies -have shortened over a period of 200 years. In this work, we show that this -shortening occurs even for a brief period of 3 years (September 2009-December -2012) using 229 million utterances from Twitter. Furthermore, the subset of -geographically-tagged tweets from the United States show an inverse proportion -between utterance lengths and the state-level percentage of the Black -population. We argue that shortening of utterances can be explained by the -increasing usage of jargon including coined words. -" -920,1310.2527,"Maxime Amblard (INRIA Nancy - Grand Est / LORIA, MSH Lorraine)",Treating clitics with minimalist grammars,cs.CL cs.LO," We propose an extension of Stabler's version of clitics treatment for a wider -coverage of the French language. For this, we present the lexical entries -needed in the lexicon. Then, we show the recognition of complex syntactic -phenomena as (left and right) dislo- cation, clitic climbing over modal and -extraction from determiner phrase. The aim of this presentation is the -syntax-semantic interface for clitics analyses in which we will stress on -clitic climbing over verb and raising verb. -" -921,1310.3099,"Roland Maas, Christian Huemmer, Armin Sehr, Walter Kellermann","A Bayesian Network View on Acoustic Model-Based Techniques for Robust - Speech Recognition",cs.LG cs.CL stat.ML," This article provides a unifying Bayesian network view on various approaches -for acoustic model adaptation, missing feature, and uncertainty decoding that -are well-known in the literature of robust automatic speech recognition. The -representatives of these classes can often be deduced from a Bayesian network -that extends the conventional hidden Markov models used in speech recognition. -These extensions, in turn, can in many cases be motivated from an underlying -observation model that relates clean and distorted feature vectors. By -converting the observation models into a Bayesian network representation, we -formulate the corresponding compensation rules leading to a unified view on -known derivations as well as to new formulations for certain approaches. The -generic Bayesian perspective provided in this contribution thus highlights -structural differences and similarities between the analyzed approaches. -" -922,1310.3333,Sriramkumar Balasubramanian and Raghuram Reddy Nagireddy,Visualizing Bags of Vectors,cs.IR cs.CL cs.LG," The motivation of this work is two-fold - a) to compare between two different -modes of visualizing data that exists in a bag of vectors format b) to propose -a theoretical model that supports a new mode of visualizing data. Visualizing -high dimensional data can be achieved using Minimum Volume Embedding, but the -data has to exist in a format suitable for computing similarities while -preserving local distances. This paper compares the visualization between two -methods of representing data and also proposes a new method providing sample -visualizations for that method. -" -923,1310.3499,Bohdan Pavlyshenko,Forecasting of Events by Tweet Data Mining,cs.SI cs.CL cs.CY," This paper describes the analysis of quantitative characteristics of frequent -sets and association rules in the posts of Twitter microblogs related to -different event discussions. 
For the analysis, we used the theory of frequent
-sets and association rules, and the theory of formal concept analysis. We
-revealed the frequent sets and association rules which characterize the
-semantic relations between the concepts of the analyzed subjects. The support
-of some frequent sets reaches its global maximum before the expected event but
-with some time delay. Such frequent sets may be considered as predictive
-markers that characterize the significance of expected events for blogosphere
-users. We showed that the time dynamics of confidence in some revealed
-association rules can also have predictive characteristics. Exceeding a certain
-threshold may be a signal for a corresponding reaction in society within the
-time interval between the maximum and the probable coming of an event. In this
-paper, we considered two types of events: the Olympic tennis tournament final
-in London, 2012 and the prediction of the Eurovision 2013 winner.
-"
-924,1310.3500,Bohdan Pavlyshenko,Can Twitter Predict Royal Baby's Name ?,cs.SI cs.CL cs.CY," In this paper, we analyze the existence of a possible correlation between the
-public opinion of Twitter users and the decision-making of persons who are
-influential in society. We carry out this analysis on the example of the
-discussion of the probable name of the British royal baby, born in July 2013.
-In our study, we use the methods of quantitative processing of natural
-language, the theory of frequent sets, and algorithms for visually displaying
-users' communities. We also analyzed the time dynamics of keyword frequencies.
-The analysis showed that the main predicted name was dominant in the spectrum
-of names before the official announcement. Using the theory of frequent sets,
-we showed that the full name consisting of three component names was part of
-the top 5 by the value of support. It was revealed that the structure of
-dynamically formed users' communities participating in the discussion is
-determined by only a few leaders who significantly influence the viewpoints of
-other users.
-"
-925,1310.4546,"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean","Distributed Representations of Words and Phrases and their
- Compositionality",cs.CL cs.LG stat.ML," The recently introduced continuous Skip-gram model is an efficient method for
-learning high-quality distributed vector representations that capture a large
-number of precise syntactic and semantic word relationships. In this paper we
-present several extensions that improve both the quality of the vectors and the
-training speed. By subsampling of the frequent words we obtain significant
-speedup and also learn more regular word representations. We also describe a
-simple alternative to the hierarchical softmax called negative sampling. An
-inherent limitation of word representations is their indifference to word order
-and their inability to represent idiomatic phrases. For example, the meanings
-of ""Canada"" and ""Air"" cannot be easily combined to obtain ""Air Canada"".
-Motivated by this example, we present a simple method for finding phrases in
-text, and show that learning good vector representations for millions of
-phrases is possible.
-"
-926,1310.4909,"M. Sudheep Elayidom, Chinchu Jose, Anitta Puthussery, Neenu K Sasi",Text Classification For Authorship Attribution Analysis,cs.DL cs.CL cs.LG," Authorship attribution mainly deals with undecided authorship of literary
-texts.
Authorship attribution is useful in resolving issues like uncertain
-authorship, recognizing the authorship of unknown texts, spotting plagiarism,
-and so on. Statistical methods can be used to set apart the approach of an
-author numerically. The basic methodologies made use of in computational
-stylometry are word length, sentence length, vocabulary richness, frequencies,
-etc. Each author has an inborn style of writing, which is particular to them.
-Statistical quantitative techniques can be used to differentiate the approach
-of an author in a numerical way. The problem can be broken down into three
-sub-problems: author identification, author characterization and similarity
-detection. The steps involved are pre-processing, extracting features,
-classification and author identification. For this, different classifiers can
-be used; here, a fuzzy learning classifier and an SVM are used. After author
-identification, the SVM was found to have more accuracy than the fuzzy
-classifier. We later combined the classifiers to obtain a better accuracy than
-that of the individual SVM and fuzzy classifiers.
-"
-927,1310.4938,Andreas Wotzlaw and Ravi Coote,"A Logic-based Approach for Recognizing Textual Entailment Supported by
- Ontological Background Knowledge",cs.CL cs.AI cs.LO," We present the architecture and the evaluation of a new system for
-recognizing textual entailment (RTE). In RTE we want to identify automatically
-the type of a logical relation between two input texts. In particular, we are
-interested in proving the existence of an entailment between them. We conceive
-our system as a modular environment allowing for a high-coverage syntactic and
-semantic text analysis combined with logical inference. For the syntactic and
-semantic analysis we combine a deep semantic analysis with a shallow one
-supported by statistical models in order to increase the quality and the
-accuracy of results. For RTE we use first-order logical inference employing
-model-theoretic techniques and automated reasoning tools. The inference is
-supported with problem-relevant background knowledge extracted automatically
-and on demand from external sources like, e.g., WordNet, YAGO, and OpenCyc, or
-other, more experimental sources with, e.g., manually defined presupposition
-resolutions, or with axiomatized general and common sense knowledge. The
-results show that fine-grained and consistent knowledge coming from diverse
-sources is a necessary condition determining the correctness and traceability
-of results.
-"
-928,1310.5042,Peter D. Turney,"Distributional semantics beyond words: Supervised learning of analogy
- and paraphrase",cs.LG cs.AI cs.CL cs.IR," There have been several efforts to extend distributional semantics beyond
-individual words, to measure the similarity of word pairs, phrases, and
-sentences (briefly, tuples; ordered sets of words, contiguous or
-noncontiguous). One way to extend beyond words is to compare two tuples using a
-function that combines pairwise similarities between the component words in the
-tuples. A strength of this approach is that it works with both relational
-similarity (analogy) and compositional similarity (paraphrase). However, past
-work required hand-coding the combination function for different tasks. The
-main contribution of this paper is that combination functions are generated by
-supervised learning.
We achieve state-of-the-art results in measuring
-relational similarity between word pairs (SAT analogies and SemEval 2012 Task
-2) and measuring compositional similarity between noun-modifier phrases and
-unigrams (multiple-choice paraphrase questions).
-"
-929,1310.5884,Ramon Ferrer-i-Cancho,The optimality of attaching unlinked labels to unlinked meanings,cs.CL physics.data-an physics.soc-ph," Vocabulary learning by children can be characterized by many biases. When
-encountering a new word, children, as well as adults, are biased towards
-assuming that it means something totally different from the words that they
-already know. To the best of our knowledge, the first mathematical proof of the
-optimality of this bias is presented here. First, it is shown that this bias is
-a particular case of the maximization of mutual information between words and
-meanings. Second, the optimality is proven within a more general information
-theoretic framework where mutual information maximization competes with other
-information theoretic principles. The bias is a prediction from modern
-information theory. The relationship between information theoretic principles
-and the principles of contrast and mutual exclusivity is also shown.
-"
-930,1310.5963,"Foruzan Kiamarzpour, Rouhollah Dianat, Mohammad bahrani, Mehdi
- Sadeghzadeh",Improving the methods of email classification based on words ontology,cs.IR cs.CL," The Internet has dramatically changed the relationships among people and
-their relationships with other people, and has made valuable information
-available to its users. Email is a service which the Internet provides today
-for its users; this service has attracted most of the users' attention due to
-its low cost. Along with the numerous benefits of email, one of the weaknesses
-of this service is that the number of received emails is continually
-increasing, so ways are needed to automatically filter these disturbing
-letters. Most of these filters utilize a combination of several techniques,
-such as black or white lists, the use of keywords and so on, in order to
-identify spam more accurately. In this paper, we introduce a new method to
-classify spam. We are seeking to increase the accuracy of email classification
-by combining the output of several decision trees and the concept of ontology.
-"
-931,1310.6772,Thamar Solorio and Ragib Hasan and Mainul Mizan,"Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive
- Writing for Linking Identities",cs.CL cs.CR cs.CY," This paper describes the corpus of sockpuppet cases we gathered from
-Wikipedia. A sockpuppet is an online user account created with a fake identity
-for the purpose of covering abusive behavior and/or subverting the editing
-regulation process. We used a semi-automated method for crawling and curating a
-dataset of real sockpuppet investigation cases. To the best of our knowledge,
-this is the first corpus available on real-world deceptive writing. We describe
-the process for crawling the data and some preliminary results that can be used
-as baseline for benchmarking research. The dataset will be released under a
-Creative Commons license from our project website: http://docsig.cis.uab.edu.
-"
-932,1310.6775,Linas Vepstas,Durkheim Project Data Analysis Report,cs.AI cs.CL cs.LG," This report describes the suicidality prediction models created under the
-DARPA DCAPS program in association with the Durkheim Project
-[http://durkheimproject.org/].
The models were built primarily from
-unstructured text (free-format clinician notes) for several hundred patient
-records obtained from the Veterans Health Administration (VHA). The models were
-constructed using a genetic programming algorithm applied to bag-of-words and
-bag-of-phrases datasets. The influence of additional structured data was
-explored but was found to be minor. Given the small dataset size,
-classification between cohorts was high fidelity (98%). Cross-validation
-suggests these models are reasonably predictive, with an accuracy of 50% to 69%
-on five rotating folds, with ensemble averages of 58% to 67%. One particularly
-noteworthy result is that word-pairs can dramatically improve classification
-accuracy; but this is the case only when one of the words in the pair is
-already known to have a high predictive value. By contrast, the set of all
-possible word-pairs does not improve on a simple bag-of-words model.
-"
-933,1310.7782,"Andrea Baronchelli, Vittorio Loreto, Andrea Puglisi","Individual Biases, Cultural Evolution, and the Statistical Nature of
- Language Universals: The Case of Colour Naming Systems",physics.soc-ph cs.CL cs.MA q-bio.PE," Language universals have long been attributed to an innate Universal Grammar.
-An alternative explanation states that linguistic universals emerged
-independently in every language in response to shared cognitive or perceptual
-biases. A computational model has recently shown how this could be the case,
-focusing on the paradigmatic example of the universal properties of colour
-naming patterns, and producing results in quantitative agreement with the
-experimental data. Here we investigate the role of an individual perceptual
-bias in the framework of the model. We study how, and to what extent, the
-structure of the bias influences the corresponding linguistic universal
-patterns. We show that the cultural history of a group of speakers introduces
-population-specific constraints that act against the pressure for uniformity
-arising from the individual bias, and we clarify the interplay between these
-two forces.
-"
-934,1310.8059,Thabet Slimani,Description and Evaluation of Semantic Similarity Measures Approaches,cs.CL," In recent years, semantic similarity measures have attracted great interest
-in the Semantic Web and Natural Language Processing (NLP) communities. Several
-similarity measures have been developed, given the existence of structured
-knowledge representations offered by ontologies and corpora which enable the
-semantic interpretation of terms. Semantic similarity measures compute the
-similarity between concepts/terms included in knowledge sources in order to
-perform estimations. This paper discusses the existing semantic similarity
-methods based on structure, information content and feature approaches.
-Additionally, we present a critical evaluation of several categories of
-semantic similarity approaches based on two standard benchmarks. The aim of
-this paper is to give an efficient evaluation of all these measures, helping
-researchers and practitioners to select the measure that best fits their
-requirements.
-"
-935,1310.8511,{\L}ukasz D\k{e}bowski,"A Preadapted Universal Switch Distribution for Testing Hilberg's
- Conjecture",cs.IT cs.CL math.IT," Hilberg's conjecture about natural language states that the mutual
-information between two adjacent long blocks of text grows like a power of the
-block length.
The exponent in this statement can be upper bounded using the
-pointwise mutual information estimate computed for a carefully chosen code. The
-lower the compression rate, the better the bound, but there is a requirement
-that the code be universal. So as to improve a previously reported upper bound
-for Hilberg's exponent, in this paper we introduce two novel universal codes,
-called the plain switch distribution and the preadapted switch distribution.
-Generally speaking, switch distributions are certain mixtures of adaptive
-Markov chains of varying orders with some additional communication to avoid the
-so-called catch-up phenomenon. The advantage of these distributions is that
-they both achieve a low compression rate and are guaranteed to be universal.
-Using the switch distributions we obtain that a sample of a text in English is
-non-Markovian with Hilberg's exponent being $\le 0.83$, which improves over the
-previous bound $\le 0.94$ obtained using the Lempel-Ziv code.
-"
-936,1311.0833,Zitao Liu,"A Comparative Study on Linguistic Feature Selection in Sentiment
- Polarity Classification",cs.CL," Sentiment polarity classification is perhaps the most widely studied topic in
-sentiment analysis. It classifies an opinionated document as expressing a
-positive or negative opinion. In this paper, using a movie review dataset, we
-perform a comparative study with different single kinds of linguistic features
-and the combinations of these features. We find that the classic topic-based
-classifiers (Naive Bayes and Support Vector Machine) do not perform as well on
-sentiment polarity classification. We also find that with some combinations of
-different linguistic features, the classification accuracy can be boosted
-considerably, and we give some reasonable explanations for these improvements.
-"
-937,1311.1169,"D\'aniel Kondor, Istv\'an Csabai, L\'aszl\'o Dobos, J\'anos Sz\""ule,
- Norbert Barankai, Tam\'as Hanyecz, Tam\'as Seb\H{o}k, Zs\'ofia Kallus,
- G\'abor Vattay","Using Robust PCA to estimate regional characteristics of language use
- from geo-tagged Twitter messages",cs.CL," Principal component analysis (PCA) and related techniques have been
-successfully employed in natural language processing. Text mining applications
-in the age of the online social media (OSM) face new challenges due to
-properties specific to these use cases (e.g. spelling issues specific to texts
-posted by users, the presence of spammers and bots, service announcements,
-etc.). In this paper, we employ a Robust PCA technique to separate typical
-outliers and highly localized topics from the low-dimensional structure present
-in language use in online social networks. Our focus is on identifying
-geospatial features among the messages posted by the users of the Twitter
-microblogging service. Using a dataset which consists of over 200 million
-geolocated tweets collected over the course of a year, we investigate whether
-the information present in word usage frequencies can be used to identify
-regional features of language use and topics of interest. Using the PCA pursuit
-method, we are able to identify important low-dimensional features, which
-constitute smoothly varying functions of the geographic location.
-"
-938,1311.1194,"Saif M. Mohammad, Svetlana Kiritchenko, and Joel Martin",Identifying Purpose Behind Electoral Tweets,cs.CL," Tweets pertaining to a single event, such as a national election, can number
-in the hundreds of millions.
Automatically analyzing them is beneficial in many
-downstream natural language applications such as question answering and
-summarization. In this paper, we propose a new task: identifying the purpose
-behind electoral tweets--why do people post election-oriented tweets? We show
-that identifying purpose is correlated with the related phenomenon of sentiment
-and emotion detection, but is nonetheless significantly different. Detecting
-purpose has a number of applications including detecting the mood of the
-electorate, estimating the popularity of policies, identifying key issues of
-contention, and predicting the course of events. We create a large dataset of
-electoral tweets and annotate a few thousand tweets for purpose. We develop a
-system that automatically classifies electoral tweets as per their purpose,
-obtaining an accuracy of 43.56% on an 11-class task and an accuracy of 73.91%
-on a 3-class task (both accuracies well above the most-frequent-class
-baseline). Finally, we show that resources developed for emotion detection are
-also helpful for detecting purpose.
-"
-939,1311.1539,Edward Grefenstette,"Category-Theoretic Quantitative Compositional Distributional Models of
- Natural Language Semantics",cs.CL cs.LG math.CT math.LO," This thesis is about the problem of compositionality in distributional
-semantics. Distributional semantics presupposes that the meanings of words are
-a function of their occurrences in textual contexts. It models words as
-distributions over these contexts and represents them as vectors in high
-dimensional spaces. The problem of compositionality for such models concerns
-itself with how to produce representations for larger units of text by
-composing the representations of smaller units of text.
- This thesis focuses on a particular approach to this compositionality
-problem, namely using the categorical framework developed by Coecke, Sadrzadeh,
-and Clark, which combines syntactic analysis formalisms with distributional
-semantic representations of meaning to produce syntactically motivated
-composition operations. This thesis shows how this approach can be
-theoretically extended and practically implemented to produce concrete
-compositional distributional models of natural language semantics. It
-furthermore demonstrates that such models can perform on par with, or better
-than, other competing approaches in the field of natural language processing.
- There are three principal contributions to computational linguistics in this
-thesis. The first is to extend the DisCoCat framework on the syntactic front
-and semantic front, incorporating a number of syntactic analysis formalisms and
-providing learning procedures allowing for the generation of concrete
-compositional distributional models. The second contribution is to evaluate the
-models developed from the procedures presented here, showing that they
-outperform other compositional distributional models present in the literature.
-The third contribution is to show how using category theory to solve linguistic
-problems forms a sound basis for research, illustrated by examples of work on
-this topic, that also suggest directions for future research.
-"
-940,1311.1897,"Christian Retor\'e (LaBRI, INRIA Bordeaux - Sud-Ouest)",Logique math\'ematique et linguistique formelle,math.LO cs.CL," As the etymology of the word shows, logic is intimately related to language,
-as exemplified by the work of philosophers from Antiquity and from the
-Middle Ages.
At the beginning of the 20th century, the foundational crisis of mathematics gave rise to mathematical logic and established logic as a language-based foundation for mathematics. How did the relations between logic and language evolve in this newly defined mathematical framework? After a survey of the history of the relation between logic and linguistics, traditionally focused on semantics, we focus on some present issues: 1) grammar as a deductive system; 2) the transformation of the syntactic structure of a sentence into a logical formula representing its meaning; 3) taking the context into account when interpreting words. This lecture shows that type theory provides a convenient framework both for natural language syntax and for the interpretation of any of its levels (words, sentences, discourse). -" -941,1311.2252,Ran El-Yaniv and David Yanay,"Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness",cs.CL cs.LG," We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated with textual units of a large background knowledge corpus. We present an efficient algorithm for learning such semantic models from a training sample of relatedness preferences. Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts. Moreover, the approach facilitates the fitting of semantic models for specific users or groups of users. We present the results of an extensive range of experiments, from small to large scale, indicating that the proposed method is effective and competitive with the state of the art. -" -942,1311.2702,"Tobias Kuhn, Alexandre Bergel",Verifiable Source Code Documentation in Controlled Natural Language,cs.SE cs.AI cs.CL cs.HC cs.LO," Writing documentation about software internals is rarely considered a rewarding activity. It is highly time-consuming, and the resulting documentation is fragile when the software is continuously evolving in a multi-developer setting. Unfortunately, traditional programming environments poorly support the writing and maintenance of documentation. The consequences are severe, as the lack of documentation on software structure negatively impacts the overall quality of the software product. We show that using a controlled natural language with a reasoner and a query engine is a viable technique for verifying the consistency and accuracy of documentation and source code. Using ACE, a state-of-the-art controlled natural language, we present positive results on the comprehensibility and the general feasibility of creating and verifying documentation. As a case study, we used automatic documentation verification to identify and fix severe flaws in the architecture of a non-trivial piece of software. Moreover, a user experiment shows that our language is faster and easier to learn and understand than other formal languages for software documentation. -" -943,1311.2978,"Shibamouli Lahiri, Rada Mihalcea",Authorship Attribution Using Word Network Features,cs.CL," In this paper, we explore a set of novel features for authorship attribution of documents. These features are derived from a word network representation of natural language text.
As has been noted in previous studies, natural language tends to show complex network structure at the word level, with low degrees of separation and a scale-free (power law) degree distribution. There has also been work on authorship attribution that incorporates ideas from complex networks. The goal of our paper is to explore properties of these complex networks that are suitable as features for machine-learning-based authorship attribution of documents. We performed experiments on three different datasets and obtained promising results. -" -944,1311.3011,Yoav Artzi,Cornell SPF: Cornell Semantic Parsing Framework,cs.CL," The Cornell Semantic Parsing Framework (SPF) is a learning and inference framework for mapping natural language to a formal representation of its meaning. -" -945,1311.3175,"Athira P. M., Sreeja M. and P. C. Reghu Raj","Architecture of an Ontology-Based Domain-Specific Natural Language Question Answering System",cs.CL cs.IR," A question answering (QA) system aims to retrieve precise information from a large collection of documents in response to a query. This paper describes the architecture of a Natural Language Question Answering (NLQA) system for a specific domain based on ontological information, a step towards semantic web question answering. The proposed architecture defines four basic modules suitable for enhancing current QA capabilities with the ability to process complex questions. The first module is question processing, which analyses and classifies the question and also reformulates the user query. The second module performs the retrieval of relevant documents. The next module processes the retrieved documents, and the last module performs the extraction and generation of a response. Natural language processing techniques are used for processing the question and documents and also for answer extraction. Ontology and domain knowledge are used for reformulating queries and identifying the relations. The aim of the system is to generate a short and specific answer to a question asked in natural language in a specific domain. Our implementation achieves 94% accuracy for natural language question answering. -" -946,1311.3961,"Nisheeth Joshi, Iti Mathur, Hemant Darbari and Ajai Kumar",HEVAL: Yet Another Human Evaluation Metric,cs.CL," Machine translation evaluation is a very important activity in machine translation development. Automatic evaluation metrics proposed in the literature are inadequate, as they require one or more human reference translations to compare with the output produced by machine translation. This does not always give accurate results, as a text can have several different translations. Human evaluation metrics, on the other hand, lack inter-annotator agreement and repeatability. In this paper we propose a new human evaluation metric which addresses these issues. Moreover, this metric also provides solid grounds for making sound assumptions about the quality of the text produced by a machine translation system. -" -947,1311.3987,"Seyed-Mehdi-Reza Beheshti and Srikumar Venugopal and Seung Hwan Ryu and Boualem Benatallah and Wei Wang","Big Data and Cross-Document Coreference Resolution: Current State and Future Opportunities",cs.CL cs.DC cs.IR," Information Extraction (IE) is the task of automatically extracting structured information from unstructured/semi-structured machine-readable documents.
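The four-module architecture of 1311.3175 above can be pictured as a simple function pipeline. The sketch below is an illustrative reduction, not the paper's system: each stub (keyword filtering, substring retrieval, first-sentence extraction) stands in for the ontology-driven processing the paper describes.

```python
# Sketch of the four-module NLQA pipeline of arXiv:1311.3175 as plain
# function composition; each stub stands in for the paper's modules.
STOPWORDS = {"what", "is", "the", "a", "an", "of"}

def process_question(question):
    # Module 1: analyse/classify the question and reformulate it (here:
    # naive keyword extraction instead of ontology-based reformulation).
    return [w for w in question.lower().rstrip("?").split()
            if w not in STOPWORDS]

def retrieve_documents(keywords, documents):
    # Module 2: fetch documents relevant to the reformulated query.
    return [d for d in documents if all(k in d.lower() for k in keywords)]

def extract_answer(documents):
    # Modules 3-4: process retrieved documents, then generate a response
    # (here: return the first sentence of the first match).
    return documents[0].split(". ")[0] if documents else "no answer found"

docs = ["An ontology is a formal specification of a shared "
        "conceptualization. It underpins semantic web reasoning."]
keywords = process_question("What is an ontology?")
print(extract_answer(retrieve_documents(keywords, docs)))
```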
Among various IE tasks, extracting actionable intelligence from an ever-increasing amount of data depends critically upon Cross-Document Coreference Resolution (CDCR) - the task of identifying entity mentions across multiple documents that refer to the same underlying entity. Recently, document datasets on the order of peta-/tera-bytes have raised many challenges for performing effective CDCR, such as scaling to large numbers of mentions and limited representational power. The problem of analysing such datasets is called ""big data"". The aim of this paper is to provide readers with an understanding of the central concepts, subtasks, and the current state-of-the-art in the CDCR process. We provide an assessment of existing tools/techniques for CDCR subtasks and highlight big data challenges in each of them, to help readers identify important and outstanding issues for further investigation. Finally, we provide concluding remarks and discuss possible directions for future work. -" -948,1311.5401,Nicolas Turenne,Clustering and Relational Ambiguity: from Text Data to Natural Data,cs.CL cs.IR," Text data is often seen as ""take-away"" material with little noise and easy-to-process information. The main questions are how to obtain the data and transform it into a good document format. But data can be sensitive to noise, often called ambiguity. Ambiguities have been recognized for a long time, mainly because polysemy is obvious in language and context is required to remove uncertainty. I claim in this paper that syntactic context is not sufficient to improve interpretation. I try to show, firstly, that noise can come from the natural data themselves, even when high technology is involved, and secondly, that texts which appear verified but are meaningless can spoil the content of a corpus; this may lead to contradictions and background noise. -" -949,1311.5427,"Gerardo Febres, Klaus Jaffe, Carlos Gershenson",Complexity measurement of natural and artificial languages,cs.CL cs.IT math.IT nlin.AO physics.soc-ph," We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software) based on a simple expression for the entropy as a function of message length and specific word diversity. Code text written in artificial languages showed higher entropy than text of similar length expressed in natural languages. Spanish texts exhibit more symbolic diversity than English ones. Results showed that algorithms based on complexity measures differentiate artificial from natural languages, and that text analysis based on complexity measures allows the unveiling of important aspects of their nature. We propose specific expressions to examine entropy-related aspects of texts and to estimate the values of entropy, emergence, self-organization and complexity based on specific diversity and message length. -" -950,1311.5836,"Pooja Gupta, Nisheeth Joshi, Iti Mathur",Automatic Ranking of MT Outputs using Approximations,cs.CL," Research on machine translation has been ongoing for a long time, yet we still do not get good translations from the MT engines developed so far. Manual ranking of their outputs tends to be very time consuming and expensive, and identifying which one is better or worse than the others is a very taxing task. In this paper, we show an approach which can automatically rank MT outputs (translations) taken from different MT engines, based on N-gram approximations. We provide a solution where no human intervention is required for ranking systems.
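The entropy expression in 1311.5427 above depends on two measurable quantities, message length and specific word diversity. A minimal sketch of how these raw ingredients (plus the standard Shannon entropy) can be computed for a text; the paper's specific expression is not reproduced here.

```python
# Sketch: Shannon word entropy and specific word diversity of a text,
# the quantities related by the expression in arXiv:1311.5427.
import math
from collections import Counter

def entropy_and_diversity(text):
    words = text.lower().split()
    counts = Counter(words)
    n = len(words)
    # empirical Shannon entropy over word frequencies, in bits
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    diversity = len(counts) / n  # distinct words per token
    return h, diversity

h, d = entropy_and_diversity("the cat sat on the mat and the dog sat too")
print(f"entropy = {h:.2f} bits, specific diversity = {d:.2f}")
```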
Further, we also evaluate our results, which prove to be equivalent to those of human ranking. -" -951,1311.6045,"Nidhal El-Abbadi, Ahmed Nidhal Khdhair, Adel Al-Nasrawi",Build Electronic Arabic Lexicon,cs.CL," There are many known Arabic lexicons organized in different ways; each has a different number of Arabic words depending on how it is organized. This paper uses mathematical relations to count the number of Arabic words, which confirms the number of Arabic words presented by Al Farahidy. The paper also presents a new way to build an electronic Arabic lexicon by using a hash function that converts each word (as input) to a unique integer number (as output); these integer numbers are used as indices into the lexicon entries. -" -952,1311.6063,"Sheng Yu, Tianrun Cai and Tianxi Cai",NILE: Fast Natural Language Processing for Electronic Health Records,cs.CL," Objective: Narrative text in electronic health records (EHR) contains rich information for medical and data science studies. This paper introduces the design and performance of Narrative Information Linear Extraction (NILE), a natural language processing (NLP) package for EHR analysis that we share with the medical informatics community. Methods: NILE uses a modified prefix-tree search algorithm for named entity recognition, which can detect prefix and suffix sharing. The semantic analyses are implemented as rule-based finite state machines. Analyses include negation, location, modification, family history, and ignoring. Result: The processing speed of NILE is hundreds to thousands of times faster than existing NLP software for medical text. The accuracy of the presence analysis of NILE is on par with the best performing models on the 2010 i2b2/VA NLP challenge data. Conclusion: The speed, accuracy, and ability to operate via an API make NILE a valuable addition to the NLP software for medical informatics and data science. -" -953,1311.6421,"Pierluigi Crescenzi, Daniel Gildea, Andrea Marino, Gianluca Rossi, Giorgio Satta",Synchronous Context-Free Grammars and Optimal Linear Parsing Strategies,cs.FL cs.CL," Synchronous Context-Free Grammars (SCFGs), also known as syntax-directed translation schemata, are unlike context-free grammars in that they do not have a binary normal form. In general, parsing with SCFGs takes space and time polynomial in the length of the input strings, but with the degree of the polynomial depending on the permutations of the SCFG rules. We consider linear parsing strategies, which add one nonterminal at a time. We show that, for a given input permutation, the problems of finding the linear parsing strategy with the minimum space and time complexity are both NP-hard. -" -954,1312.0482,"Jianfeng Gao, Xiaodong He, Wen-tau Yih, and Li Deng",Learning Semantic Representations for the Phrase Translation Model,cs.CL," This paper presents a novel semantic-based phrase translation model. A pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent semantic space, where their translation score is computed by the distance between the pair in this new space. The projection is performed by a multi-layer neural network whose weights are learned on parallel training data. The learning aims to directly optimize the quality of end-to-end machine translation results. Experimental evaluation has been performed on two Europarl translation tasks, English-French and German-English.
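The hash-based indexing idea in 1311.6045 above is easy to picture. A minimal sketch follows; the sequential id assignment is an assumption standing in for the paper's (unspecified) hash function, and the sample entry is invented.

```python
# Sketch: map each word to a unique integer that indexes its lexicon entry,
# in the spirit of arXiv:1311.6045. Sequential ids stand in for the paper's
# hash function; a real implementation must guarantee collision-free ids.
class LexiconIndex:
    def __init__(self):
        self._ids = {}      # word -> unique integer
        self._entries = []  # integer -> lexicon entry

    def add(self, word, entry):
        if word not in self._ids:        # assign the next free integer
            self._ids[word] = len(self._entries)
            self._entries.append(entry)
        return self._ids[word]

    def lookup(self, word):
        idx = self._ids[word]
        return idx, self._entries[idx]

lexicon = LexiconIndex()
lexicon.add("kitab", "book (noun)")  # transliterated toy entry
print(lexicon.lookup("kitab"))       # -> (0, 'book (noun)')
```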
The results show that the new semantic-based phrase translation model significantly improves the performance of a state-of-the-art phrase-based statistical machine translation system, leading to a gain of 0.7-1.0 BLEU points. -" -955,1312.0493,"Ozan \.Irsoy, Claire Cardie","Bidirectional Recursive Neural Networks for Token-Level Labeling with Structure",cs.LG cs.CL stat.ML," Recently, deep architectures, such as recurrent and recursive neural networks, have been successfully applied to various natural language processing tasks. Inspired by bidirectional recurrent neural networks, which use representations that summarize the past and future around an instance, we propose a novel architecture that aims to capture the structural information around an input and use it to label instances. We apply our method to the task of opinion expression extraction, where we employ the binary parse tree of a sentence as the structure, and word vector representations as the initial representation of a single token. We conduct preliminary experiments to investigate its performance and compare it to the sequential approach. -" -956,1312.0976,Scott A. Hale,Multilinguals and Wikipedia Editing,cs.CY cs.CL cs.DL cs.SI physics.soc-ph," This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller editions with fewer users have a higher percentage of multilingual users than larger editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present. -" -957,1312.2087,Nicholas H. Kirk,"Towards Structural Natural Language Formalization: Mapping Discourse to Controlled Natural Language",cs.CL," The author describes a conceptual study towards mapping grounded natural language discourse representation structures to instances of controlled language statements. This can be achieved via a pipeline of preexisting state-of-the-art technologies, namely natural language syntax to semantic discourse mapping, and a reduction of the latter to controlled language discourse, given a set of previously learnt reduction rules. In conclusion, a description of the evaluation, potential and limitations for ontology-based reasoning is presented. -" -958,1312.2137,"Dimitri Palaz, Ronan Collobert, Mathew Magimai.-Doss","End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks",cs.LG cs.CL cs.NE," Most state-of-the-art phoneme recognition systems rely on classical neural network classifiers fed with highly tuned features, such as MFCC or PLP features. Recent advances in ``deep learning'' approaches have questioned such systems, but while some attempts were made with simpler features such as spectrograms, state-of-the-art systems still rely on MFCCs.
This might be viewed as a kind of failure of deep learning approaches, which are often claimed to have the ability to train on raw signals, alleviating the need for hand-crafted features. In this paper, we investigate a convolutional neural network approach for raw speech signals. While convolutional architectures have achieved tremendous success in computer vision and text processing, they seem to have been neglected in recent years in the speech processing field. We show that it is possible to learn an end-to-end phoneme sequence classifier system directly from the raw signal, with performance on the TIMIT and WSJ datasets similar to existing systems based on MFCCs, questioning the need for complex hand-crafted features on large datasets. -" -959,1312.2244,"Rumeng Li, Tao Wang, Xun Wang",Time-dependent Hierarchical Dirichlet Model for Timeline Generation,cs.CL cs.IR," Timeline generation aims at summarizing news from different epochs and telling readers how an event evolves. It is a new challenge that combines salience ranking with novelty detection. For long-term public events, the main topic usually includes various aspects across different epochs, and each aspect has its own evolving pattern. Existing approaches neglect the hierarchical topic structure involved in the news corpus in timeline generation. In this paper, we develop a novel time-dependent Hierarchical Dirichlet Model (HDM) for timeline generation. Our model can aptly detect different levels of topic information across a corpus, and this structure is further used for sentence selection. Based on the topics mined from HDM, sentences are selected by considering different aspects such as relevance, coherence and coverage. We develop experimental systems to evaluate 8 long-term events of public concern. Performance comparison between different systems demonstrates the effectiveness of our model in terms of ROUGE metrics. -" -960,1312.2844,"Norbert Rimoux, Patrice Descourt",mARC: Memory by Association and Reinforcement of Contexts,cs.IR cs.CL nlin.AO nlin.CD," This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information classification and retrieval problems like e-Discovery or contextual navigation. It can also be formulated in the artificial life framework, a.k.a. Conway's ""Game of Life"" theory. In contrast to Conway's approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC, we have built a mARC-based Internet search engine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search, both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries.
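Returning to the raw-waveform approach of 1312.2137 two entries above: the sketch below shows the general shape of a 1-D convolutional classifier over raw speech samples. It is an illustration under assumptions, not the authors' architecture; the layer sizes, strides and 39-phoneme output are arbitrary choices.

```python
# Sketch: a 1-D CNN mapping raw speech samples to phoneme-class scores,
# illustrating the end-to-end front end investigated in arXiv:1312.2137.
import torch
import torch.nn as nn

class RawSpeechCNN(nn.Module):
    def __init__(self, n_phonemes=39):  # 39 is a common TIMIT folding
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=160, stride=80),  # ~10 ms hops @ 16 kHz
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time
        )
        self.out = nn.Linear(64, n_phonemes)

    def forward(self, wav):            # wav: (batch, 1, samples)
        h = self.net(wav).squeeze(-1)  # (batch, 64)
        return self.out(h)             # per-phoneme scores

scores = RawSpeechCNN()(torch.randn(2, 1, 16000))  # two 1-second signals
print(scores.shape)                                # torch.Size([2, 39])
```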
-" -961,1312.3005,"Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson","One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling",cs.CL," We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques and to compare their contribution when combined with other advanced techniques. We show the performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves a perplexity of 67.6; a combination of techniques leads to a 35% reduction in perplexity, or a 10% reduction in cross-entropy (bits), over that baseline. The benchmark is available as a code.google.com project; besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the baseline n-gram models. -" -962,1312.3168,"Bruno Mery (LaBRI), Christian Retor\'e (LaBRI)","Semantic Types, Lexical Sorts and Classifiers",cs.CL," We propose a cognitively and linguistically motivated set of sorts for lexical semantics in a compositional setting: the classifiers of languages that have such pronouns. These sorts are needed to include lexical considerations in a semantical analyser such as Boxer or Grail. Indeed, all proposed lexical extensions of usual Montague semantics for modelling restriction of selection and felicitous and infelicitous copredication require a rich and refined type system whose base types are the lexical sorts, the basis of the many-sorted logic in which semantical representations of sentences are stated. However, none of those approaches defines precisely the actual base types or sorts to be used in the lexicon. In this article, we discuss some of the options commonly adopted by researchers in formal lexical semantics, and defend the view that classifiers, in the languages which have such pronouns, are an appealing solution, both linguistically and cognitively motivated. -" -963,1312.3251,"Nayan Jyoti Kalita, Navanath Saharia and Smriti Kumar Sinha",Towards The Development of a Bishnupriya Manipuri Corpus,cs.CL," For any deep computational processing of language we need evidence, and one such source of evidence is a corpus. This paper describes the development of a text-based corpus for the Bishnupriya Manipuri language. A corpus is considered a building block for any language processing task. Like other less-studied Indian languages, Bishnupriya Manipuri suffers from a lack of attention; as a result, the language still lacks a good corpus and basic language processing tools. To our knowledge, this is the first effort to develop a corpus for the Bishnupriya Manipuri language. -" -964,1312.3258,Henda Chorfi Ouertani,"Implicit Sensitive Text Summarization based on Data Conveyed by Connectives",cs.CL," So far, in trying to reach human capabilities, research in automatic summarization has been based on hypotheses that are both enabling and limiting. Some of these limitations are: how to take into account and reflect (in the generated summary) the implicit information conveyed in the text, the author's intention, the reader's intention, the influence of context, and general world knowledge.
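A quick consistency check of the figures quoted for the One Billion Word benchmark (1312.3005 above), using the standard relation between perplexity and cross-entropy, $\mathrm{PPL} = 2^{H}$: the baseline gives $H = \log_2 67.6 \approx 6.08$ bits; a 35% perplexity reduction yields $0.65 \times 67.6 \approx 43.9$, i.e. $H \approx \log_2 43.9 \approx 5.46$ bits, and $(6.08 - 5.46)/6.08 \approx 10\%$, matching the quoted 10% reduction in cross-entropy.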
Thus, if we want machines to mimic human abilities, they will need access to this same large variety of knowledge. Implicit information affects the orientation and argumentation of the text, and consequently its summary. Most text summarizers (TS) operate by compressing the initial data, and they necessarily suffer from information loss. They focus on features of the text only, not on what the author intended or why the reader is reading the text. In this paper, we address this problem and present a system focused on acquiring knowledge that is implicit. We principally spotlight the implicit information conveyed by argumentative connectives such as but, even and yet, and their effect on the summary. -" -965,1312.4092,"Edouard Grave (LIENS, INRIA Paris - Rocquencourt), Guillaume Obozinski (LIGM), Francis Bach (LIENS, INRIA Paris - Rocquencourt)",Domain adaptation for sequence labeling using hidden Markov models,cs.CL cs.LG," Most natural language processing systems based on machine learning are not robust to domain shift. For example, a state-of-the-art syntactic dependency parser trained on Wall Street Journal sentences has an absolute drop in performance of more than ten points when tested on textual data from the Web. An efficient solution to make these methods more robust to domain shift is to first learn a word representation using large amounts of unlabeled data from both domains, and then use this representation as features in a supervised learning algorithm. In this paper, we propose to use hidden Markov models to learn word representations for part-of-speech tagging. In particular, we study the influence of using data from the source, the target or both domains to learn the representation, and the different ways to represent words using an HMM. -" -966,1312.4617,"Mariam Adedoyin-Olowe, Mohamed Medhat Gaber and Frederic Stahl",A Survey of Data Mining Techniques for Social Media Analysis,cs.SI cs.CL," Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social networks for information, news, and the opinions of other users on diverse subject matters. This heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means for analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets, like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of social networks over the decades, moving from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.
-" -967,1312.4706,"Donna Vakharia, Rachel Gibbs",Designing Spontaneous Speech Search Interface for Historical Archives,cs.HC cs.CL," Spontaneous speech in the form of conversations, meetings, voice-mail, interviews, oral history, etc. is one of the most ubiquitous forms of human communication. Search engines providing access to such speech collections have the potential to better inform intelligence and make relevant data in vast audio/video archives available to users. This project presents a search user interface design supporting search tasks over a speech collection consisting of a historical archive with nearly 52,000 audiovisual testimonies of survivors and witnesses of the Holocaust and other genocides. The design incorporates faceted search, along with other UI elements like highlighted search items, tags, snippets, etc., to promote discovery and exploratory search. Two different designs have been created to support both manual and automated transcripts. Evaluation was performed with human subjects to measure accuracy in retrieving results, understand user perspectives on the design elements, and assess the ease of parsing information. -" -968,1312.4824,"B. P. Pande, Pawan Tamta, H. S. Dhami","Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm",cs.IR cs.CL," A language-independent stemmer has always been sought. The single n-gram tokenization technique works well; however, it often generates stems that start with intermediate characters rather than initial ones. We present a novel technique that takes the concept of n-gram stemming one step further, and compare our method with an established algorithm in the field, Porter's stemmer. Results indicate that our n-gram stemmer is not inferior to Porter's linguistic stemmer. -" -969,1312.5129,"Wenpeng Yin and Hinrich Sch\""utze",Deep Learning Embeddings for Discontinuous Linguistic Units,cs.CL," Deep learning embeddings have been successfully used for many natural language processing problems. Embeddings are mostly computed for word forms, although a number of recent papers have extended this to other linguistic units like morphemes and phrases. In this paper, we argue that learning embeddings for discontinuous linguistic units should also be considered. In an experimental evaluation on coreference resolution, we show that such embeddings perform better than word form embeddings. -" -970,1312.5198,Ashutosh Modi and Ivan Titov,Learning Semantic Script Knowledge with Event Embeddings,cs.LG cs.AI cs.CL stat.ML," Induction of common sense knowledge about prototypical sequences of events has recently received much attention. Instead of inducing this knowledge in the form of graphs, as in much of the previous work, in our method distributed representations of event realizations are computed based on distributed representations of predicates and their arguments, and then these representations are used to predict prototypical event orderings. The parameters of the compositional process for computing the event representations and the ranking component of the model are jointly estimated from texts. We show that this approach results in a substantial boost in ordering performance with respect to previous methods. -" -971,1312.5542,R\'emi Lebret and Ronan Collobert,Word Embeddings through Hellinger PCA,cs.CL cs.LG," Word embeddings resulting from neural language models have been shown to be successful for a large variety of NLP tasks.
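The refinement in 1312.4824 above, avoiding stems that begin with intermediate characters, can be illustrated in a few lines. A minimal sketch; the paper's actual algorithm and parameters are not given in the abstract, and counting only word-initial n-grams is one simple way to realize the idea.

```python
# Sketch: character-n-gram stemming that only considers word-initial
# n-grams, so stems never start mid-word (cf. arXiv:1312.4824).
from collections import Counter

def ngram_stem(word_variants, n=4):
    # count n-grams anchored at the first character of each variant
    prefixes = Counter(w[:n] for w in word_variants if len(w) >= n)
    stem, _ = prefixes.most_common(1)[0]
    return stem

variants = ["connect", "connected", "connecting", "connection"]
print(ngram_stem(variants))  # -> "conn"
```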
However, such architectures might be difficult to train and time-consuming. Instead, we propose to drastically simplify the word embedding computation through a Hellinger PCA of the word co-occurrence matrix. We compare these new word embeddings with some well-known embeddings on NER and movie review tasks, and show that we can reach similar or even better performance. Although deep learning is not really necessary for generating good word embeddings, we show that it can provide an easy way to adapt embeddings to specific tasks. -" -972,1312.5559,"Irina Sergienya and Hinrich Sch\""utze","Distributional Models and Deep Learning Embeddings: Combining the Best of Both Worlds",cs.CL," There are two main approaches to the distributed representation of words: low-dimensional deep learning embeddings and high-dimensional distributional models, in which each dimension corresponds to a context word. In this paper, we combine these two approaches by learning embeddings based on distributional-model vectors - as opposed to one-hot vectors, as is standardly done in deep learning. We show that the combined approach has better performance on a word relatedness judgment task. -" -973,1312.5985,Tamara Polajnar and Luana Fagarasan and Stephen Clark,Learning Type-Driven Tensor-Based Meaning Representations,cs.CL cs.LG," This paper investigates the learning of 3rd-order tensors representing the semantics of transitive verbs. The meaning representations are part of a type-driven tensor-based semantic framework, from the newly emerging field of compositional distributional semantics. Standard techniques from the neural networks literature are used to learn the tensors, which are tested on a selectional preference-style task with a simple 2-dimensional sentence space. Promising results are obtained against a competitive corpus-based baseline. We argue that extending this work beyond transitive verbs, and to higher-dimensional sentence spaces, is an interesting and challenging problem for the machine learning community to consider. -" -974,1312.6168,Anjan Nepal and Alexander Yates,"Factorial Hidden Markov Models for Learning Representations of Natural Language",cs.LG cs.CL," Most representation learning algorithms for language and image processing are local, in that they identify features for a data point based on surrounding points. Yet in language processing, the correct meaning of a word often depends on its global context. As a step toward incorporating global context into representation learning, we develop a representation learning algorithm that incorporates joint prediction into its technique for producing features for a word. We develop efficient variational methods for learning factorial hidden Markov models from large texts, and use variational distributions to produce features for each word that are sensitive to the entire input sequence, not just to a local context window. Experiments on part-of-speech tagging and chunking indicate that the features are competitive with or better than existing state-of-the-art representation learning methods. -" -975,1312.6173,Karl Moritz Hermann and Phil Blunsom,Multilingual Distributed Representations without Word Alignment,cs.CL," Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP.
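The construction in 1312.5542 above reduces word embedding computation to linear algebra. A minimal sketch with random toy counts; the real input is a word/context co-occurrence matrix, and centering before the SVD is a conventional PCA choice, not something prescribed by the abstract.

```python
# Sketch: Hellinger PCA word embeddings (arXiv:1312.5542): square-root the
# row-normalized co-occurrence matrix, then keep the top principal
# directions. Euclidean distance between sqrt-probability rows equals the
# Hellinger distance up to a constant, hence the name.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.integers(0, 50, size=(1000, 2000)).astype(float)  # toy counts

probs = counts / counts.sum(axis=1, keepdims=True)  # per-word context dist.
hell = np.sqrt(probs)                               # Hellinger transform
hell -= hell.mean(axis=0, keepdims=True)            # center for PCA
U, S, _ = np.linalg.svd(hell, full_matrices=False)
dim = 50
embeddings = U[:, :dim] * S[:dim]                   # one 50-d vector per word
print(embeddings.shape)                             # (1000, 50)
```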
By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks. Recent work has shown how compositional semantic representations can successfully be applied to a number of monolingual applications such as sentiment analysis. At the same time, there has been some initial success in work on learning shared word-level representations across languages. We combine these two approaches by proposing a method for learning distributed representations in a multilingual setup. Our model learns to assign similar embeddings to aligned sentences and dissimilar ones to sentences which are not aligned, while not requiring word alignments. We show that our representations are semantically informative and apply them to a cross-lingual document classification task where we outperform the previous state of the art. Further, by employing parallel corpora of multiple language pairs, we find that our model learns representations that capture semantic relationships across languages for which no parallel data was used. -" -976,1312.6192,Samuel R. Bowman,Can recursive neural tensor networks learn logical reasoning?,cs.CL cs.LG," Recursive neural network models and their accompanying vector representations for words have seen success in an array of increasingly semantically sophisticated tasks, but almost nothing is known about their ability to accurately capture the aspects of linguistic meaning that are necessary for interpretation or reasoning. To evaluate this, I train a recursive model on a new corpus of constructed examples of logical reasoning in short sentences, like the inference of ""some animal walks"" from ""some dog walks"" or ""some cat walks,"" given that dogs and cats are animals. This model learns representations that generalize well to new types of reasoning pattern in all but a few cases, a result which is promising for the ability of learned representation models to capture logical reasoning. -" -977,1312.6802,"B. P. Pande, Pawan Tamta and H. S. Dhami",Suffix Stripping Problem as an Optimization Problem,cs.IR cs.CL," Stemming, or suffix stripping, an important part of modern Information Retrieval systems, is the task of finding the root word (stem) from a given cluster of words. Existing algorithms targeting this problem have been developed in a haphazard manner. In this work, we model the problem as an optimization problem. An integer program is developed to overcome the shortcomings of existing approaches. Sample results of the proposed method are also compared with an established technique in the field for the English language. AMPL code for the same IP is also given. -" -978,1312.6849,Matthew Ager and Zoran Cvetkovic and Peter Sollich,Speech Recognition Front End Without Information Loss,cs.CL cs.CV cs.LG," Speech representation and modelling in high-dimensional spaces of acoustic waveforms, or a linear transformation thereof, is investigated with the aim of improving the robustness of automatic speech recognition to additive noise.
The motivation behind this approach is twofold: (i) the information in acoustic waveforms that is usually removed in the process of extracting low-dimensional features might aid robust recognition by virtue of structured redundancy analogous to channel coding; (ii) linear feature domains allow for exact noise adaptation, as opposed to representations that involve non-linear processing, which makes noise adaptation challenging. Thus, we develop a generative framework for phoneme modelling in high-dimensional linear feature domains, and use it in phoneme classification and recognition tasks. Results show that classification and recognition in this framework perform better than analogous PLP and MFCC classifiers below 18 dB SNR. A combination of the high-dimensional and MFCC features at the likelihood level performs uniformly better than either of the individual representations across all noise levels. -" -979,1312.6947,"Sourish Dasgupta, Ankur Padia, Kushal Shah, Prasenjit Majumder","Formal Ontology Learning on Factual IS-A Corpus in English using Description Logics",cs.CL cs.AI," Ontology Learning (OL) is the computational task of generating a knowledge base in the form of an ontology, given an unstructured corpus whose content is in natural language (NL). Several works can be found in this area, most of which are limited to statistical and lexico-syntactic pattern-matching techniques, known as light-weight OL. These techniques do not lead to very accurate learning, mostly because of several linguistic nuances in NL. Formal OL is an alternative (less explored) methodology in which deep linguistic analysis is performed, using theory and tools from computational linguistics, to generate formal axioms and definitions instead of simply inducing a taxonomy. In this paper we propose a Description Logic (DL) based formal OL framework for learning factual IS-A type sentences in English. We claim that the semantic construction of IS-A sentences is non-trivial, and hence that such sentences require special study in the context of OL before any truly formal OL can be proposed. We introduce a learner tool, called DLOL_IS-A, that generates such ontologies in the OWL format. We adopted ""Gold Standard"" based OL evaluation on the IS-A-rich WCL v.1.1 dataset and our own community-representative IS-A dataset. We observed significant improvement of DLOL_IS-A when compared to the light-weight OL tool Text2Onto and the formal OL tool FRED. -" -980,1312.6948,"Sourish Dasgupta, Rupali KaPatel, Ankur Padia, Kushal Shah",Description Logics based Formalization of Wh-Queries,cs.CL cs.AI," The problem of Natural Language Query Formalization (NLQF) is to translate a given user query in natural language (NL) into a formal language so that the semantic interpretation is equivalent to the NL interpretation. Formalization of NL queries enables logic-based reasoning during information retrieval, database query, question-answering, etc. Formalization also helps in Web query normalization and indexing, query intent analysis, etc. In this paper we propose a Description Logics based formal methodology for wh-query intent (also called desire) identification and the corresponding formal translation. We evaluated the scalability of our proposed formalism using the Microsoft Encarta 98 query dataset and the OWL-S TC v.4.0 dataset.
-" -981,1312.6962,Ahmad Kamal,"Subjectivity Classification using Machine Learning Techniques for Mining Feature-Opinion Pairs from Web Opinion Sources",cs.IR cs.CL cs.LG," With the flourishing of Web 2.0, web opinion sources are rapidly emerging, containing precious information useful for both customers and manufacturers. Recently, feature-based opinion mining techniques, in which customer reviews are processed automatically to mine product features and the user opinions expressed over them, have been gaining momentum. However, customer reviews may contain both opinionated and factual sentences. Distilling away factual content improves mining performance by preventing noisy and irrelevant extraction. In this paper, a combination of supervised machine learning and rule-based approaches is proposed for mining feasible feature-opinion pairs from subjective review sentences. In the first phase of the proposed approach, a supervised machine learning technique is applied to classify subjective and objective sentences from customer reviews. In the next phase, a rule-based method is implemented which applies linguistic and semantic analysis of texts to mine feasible feature-opinion pairs from the subjective sentences retained after the first phase. The effectiveness of the proposed methods is established through experimentation over customer reviews of different electronic products. -" -982,1312.7077,"Ankur P. Parikh, Avneesh Saluja, Chris Dyer, Eric P. Xing",Language Modeling with Power Low Rank Ensembles,cs.CL cs.LG stat.ML," We present power low rank ensembles (PLRE), a flexible framework for n-gram language modeling where ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method can be understood as a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. PLRE training is efficient, and our approach outperforms state-of-the-art modified Kneser-Ney baselines in terms of perplexity on large corpora, as well as in BLEU score on a downstream machine translation task. -" -983,1312.7223,"Rashmi Gupta, Nisheeth Joshi, Iti Mathur",Quality Estimation of English-Hindi Outputs using Naive Bayes Classifier,cs.CL," In this paper we present an approach for estimating the quality of a machine translation system. There are various methods for estimating the quality of output sentences; in this paper we focus on a Naive Bayes classifier to build a model using features extracted from the input sentences. These features are used for finding the likelihood of each sentence of the training data, which is then further used for determining the scores of the test data. On the basis of these scores we determine the class labels of the test data. -" -984,1401.0509,"Yann N. Dauphin, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck",Zero-Shot Learning for Semantic Utterance Classification,cs.CL cs.LG," We propose a novel zero-shot learning method for semantic utterance classification (SUC). It learns a classifier $f: X \to Y$ for problems where none of the semantic categories $Y$ are present in the training set. The framework uncovers the link between categories and utterances using a semantic space. We show that this semantic space can be learned by deep neural networks trained on large amounts of search engine query log data.
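The setup in 1312.7223 above, sentence-level features scored by a Naive Bayes classifier, fits in a few lines. A minimal sketch under assumptions: the three features and the toy training pairs are invented stand-ins, since the abstract does not specify the feature set.

```python
# Sketch: Naive Bayes quality estimation for MT output (cf. arXiv:1312.7223).
# Features and toy data are illustrative assumptions.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def features(source, translation):
    s, t = source.split(), translation.split()
    return [len(s),                        # source length
            len(t) / max(1, len(s)),       # target/source length ratio
            sum(w.isalpha() for w in t)]   # count of alphabetic tokens

train = [("yah ek vakya hai", "this is a sentence", 1),        # good
         ("mujhe pustak do", "me book give the of", 0),        # bad
         ("vah ghar gaya", "he went home", 1),
         ("ham kal aayenge", "we the come tomorrow will of", 0)]
X = np.array([features(s, t) for s, t, _ in train])
y = np.array([label for _, _, label in train])

clf = GaussianNB().fit(X, y)
print(clf.predict([features("vah school gaya", "he went to school")]))
```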
More precisely, we propose a novel method that can learn discriminative semantic features without supervision. It uses the zero-shot learning framework to guide the learning of the semantic features. We demonstrate the effectiveness of the zero-shot semantic learning algorithm on the SUC dataset collected by Tur (2012). Furthermore, we achieve state-of-the-art results by combining the semantic features with a supervised method. -" -985,1401.0569,"Son Doan, Mike Conway, Tu Minh Phuong, Lucila Ohno-Machado","Natural Language Processing in Biomedicine: A Unified System Architecture Overview",cs.CL," In modern electronic medical records (EMR), much of the clinically important data - signs and symptoms, symptom severity, disease status, etc. - are not provided in structured data fields, but rather are encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of ""unlocking"" this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. A general architecture of an NLP system consists of two main components: background knowledge, which includes biomedical knowledge resources, and a framework that integrates NLP tools to process text. Systems differ in both components, which we review briefly. Additionally, challenges facing current research efforts in biomedical NLP include the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaborative work between researchers in clinical NLP are starting to emerge. -" -986,1401.0640,Fatma El-Ghannam and Tarek El-Shishtawy,Multi-Topic Multi-Document Summarizer,cs.CL," Current multi-document summarization systems can successfully extract summary sentences, however with many limitations, including low coverage, inaccurate extraction of important sentences, redundancy, and poor coherence among the selected sentences. The present study introduces a new concept of centroid-based summarization and reports new techniques for extracting summary sentences from multiple documents. In both techniques, keyphrases are used to weigh sentences and documents. The first summarization technique (Sen-Rich) prefers maximum-richness sentences, while the second (Doc-Rich) prefers sentences from the centroid document. To demonstrate the application of the new summarization system to extracting summaries of Arabic documents, we performed two experiments. First, we applied the ROUGE measure to compare the new techniques with the systems presented at TAC2011. The results show that Sen-Rich outperformed all systems in ROUGE-S. Second, the system was applied to summarize multi-topic documents. Using human evaluators, the results show that Doc-Rich is superior, with summary sentences characterized by broader coverage and greater cohesion. -" -987,1401.0660,"Bruno Mery (LaBRI), Richard Moot (LaBRI), Christian Retor\'e (LaBRI)",Plurals: individuals and sets in a richly typed semantics,cs.CL," We developed a type-theoretical framework for natural language semantics that, in addition to the usual Montagovian treatment of compositional semantics, includes a treatment of some phenomena of lexical semantics: coercions, meaning transfers, and (in)felicitous co-predication. In this setting, we show how the various readings of plurals (collective, distributive, coverings, ...) can be modelled.
-" -988,1401.0708,"Taraka Rama, Sudheer Kolachina, Lakshmi Bai B","Quantitative methods for Phylogenetic Inference in Historical Linguistics: An experimental case study of South Central Dravidian",cs.CL cs.AI," In this paper we examine the usefulness of two classes of algorithms, distance methods and discrete character methods (Felsenstein and Felsenstein 2003), widely used in genetics, for predicting the family relationships among a set of related languages and, therefore, diachronic language change. Applying these algorithms to data on the numbers of shared cognates-with-change, as well as changed and unchanged cognates, for a group of six languages belonging to a Dravidian language sub-family given in Krishnamurti et al. (1983), we observed that the resulting phylogenetic trees are largely in agreement with the linguistic family tree constructed using the comparative method of reconstruction, with only a few minor differences. Furthermore, we studied these minor differences and found that they were cases of genuine ambiguity even for a well-trained historical linguist. We evaluated the trees obtained through our experiments using a well-defined criterion and report the results here. We conclude that quantitative methods like the ones we examined are quite useful for predicting family relationships among languages. In addition, we conclude that a modest degree of confidence attached to the intuition that there could indeed exist a parallelism between the processes of linguistic and genetic change is not totally misplaced. -" -989,1401.0794,"Taraka Rama, Lars Borin",Properties of phoneme N-grams across the world's language families,cs.CL stat.CO," In this article, we investigate the properties of phoneme N-grams across half of the world's languages. We examine whether the sizes of three different N-gram distributions of the world's language families obey a power law. Further, the N-gram distributions of language families parallel the sizes of the families, which themselves seem to obey a power law distribution. The correlation between N-gram distributions and language family sizes improves with increasing values of N. We applied statistical tests, originally developed by physicists, to test the hypothesis of a power law fit to twelve different datasets. The study also raises some new questions about the use of N-gram distributions in linguistic research, which we answer by running a statistical test. -" -990,1401.1158,"Benjamin Roth, Tassilo Barth, Michael Wiegand, Mittul Singh, Dietrich Klakow",Effective Slot Filling Based on Shallow Distant Supervision Methods,cs.CL," Spoken Language Systems at Saarland University (LSV) participated this year with 5 runs in the TAC KBP English slot filling track. Effective algorithms for all parts of the pipeline, from document retrieval to relation prediction and response post-processing, are bundled in a modular end-to-end relation extraction system called RelationFactory. The main run focuses solely on shallow techniques and achieved significant improvements over LSV's last year's system, while using the same training data and patterns. Improvements have mainly been obtained by a feature representation focusing on surface skip n-grams and improved scoring for extracted distant supervision patterns. Important factors for effective extraction are the training and tuning scheme for distant supervision classifiers, and query expansion by a translation model based on Wikipedia links.
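The power-law question in 1401.0794 above has a standard quick diagnostic: a linear fit on log-log axes. The paper applies proper statistical tests; least squares on logs is only a rough check, and the sizes below are invented.

```python
# Sketch: rough power-law check via a log-log linear fit, of the kind
# arXiv:1401.0794 applies (with proper tests) to phoneme n-gram data.
import numpy as np

sizes = np.array([820, 410, 260, 190, 150, 120, 100, 88, 78, 70])  # toy data
ranks = np.arange(1, len(sizes) + 1)

slope, intercept = np.polyfit(np.log(ranks), np.log(sizes), 1)
print(f"estimated power-law exponent: {-slope:.2f}")  # size ~ rank**(-a)
```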
In the TAC KBP 2013 English slot filling evaluation, the submitted main run of the LSV RelationFactory system achieved the top-ranked F1-score of 37.3%. -" -991,1401.1486,"Imdad Ali Ismaili, Zeeshan Bhatti, Azhar Ali Shah",Design & Development of the Graphical User Interface for Sindhi Language,cs.HC cs.CL," This paper describes the design and implementation of a Unicode-based GUISL (Graphical User Interface for Sindhi Language). The idea is to provide a software platform to the people of Sindh, as well as the Sindhi diaspora living across the globe, to make use of computing for basic tasks such as editing, composition, formatting, and printing of documents in Sindhi by using GUISL. The implementation of the GUISL has been done in Java technology to make the system platform independent. The paper describes several design issues of the Sindhi GUI in the context of existing software tools and technologies, and explains how mapping and concatenation techniques have been employed to achieve the cursive shape of the Sindhi script. -" -992,1401.1803,"Stanislas Lauly, Alex Boulanger, Hugo Larochelle","Learning Multilingual Word Representations using a Bag-of-Words Autoencoder",cs.CL cs.LG stat.ML," Recent work on learning multilingual word representations usually relies on the use of word-level alignments (e.g. inferred with the help of GIZA++) between translated sentences, in order to align the word embeddings in different languages. In this workshop paper, we investigate an autoencoder model for learning multilingual word representations that does without such word-level alignments. The autoencoder is trained to reconstruct the bag-of-words representation of a given sentence from an encoded representation extracted from its translation. We evaluate our approach on a multilingual document classification task, where labeled data is available only for one language (e.g. English) while classification must be performed in a different language (e.g. French). In our experiments, we observe that our method compares favorably with a previously proposed method that exploits word-level alignments to learn word representations. -" -993,1401.2258,Benjamin Roth,Assessing Wikipedia-Based Cross-Language Retrieval Models,cs.IR cs.CL," This work compares concept models for cross-language retrieval. First, we adapt probabilistic Latent Semantic Analysis (pLSA) to multilingual documents. Experiments with different weighting schemes show that a weighting method favoring documents of similar length on both language sides gives the best results. Considering that both monolingual and multilingual Latent Dirichlet Allocation (LDA) behave alike when applied to such documents, we use a training corpus built on Wikipedia where all documents are length-normalized, and obtain improvements over previously reported scores for LDA. Another focus of our work is model combination. To this end we include Explicit Semantic Analysis (ESA) in the experiments. We observe that ESA is not competitive with LDA in a query-based retrieval task on CLEF 2000 data. The combination of machine translation with concept models increased performance by 21.1% MAP in comparison to machine translation alone. Machine translation relies on parallel corpora, which may not be available for many language pairs. We further explore how much cross-lingual information can be carried over by a specific information source in Wikipedia, namely linked text.
The best results are obtained using a language modeling approach, entirely without information from parallel corpora. The need for smoothing raises interesting questions on soundness and efficiency. Link models capture only a certain kind of information and suggest weighting schemes to emphasize particular words. For a combined model, another interesting question is therefore how to integrate different weighting schemes. Using a very simple combination scheme, we obtain results that compare favorably to previously reported results on the CLEF 2000 dataset. -" -994,1401.2517,"Andrea Ballatore, Michela Bertolotto, David C. Wilson",The semantic similarity ensemble,cs.CL," Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and the results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble's members. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach. -" -995,1401.2618,"Deepali Virmani, Vikrant Malhotra, Ridhi Tyagi",Sentiment Analysis Using Collaborated Opinion Mining,cs.IR cs.CL," Opinion mining and sentiment analysis have emerged as a field of study since the spread of the World Wide Web and the internet. Opinion mining refers to the extraction of those lines or phrases in raw, large-scale data which express an opinion. Sentiment analysis, on the other hand, identifies the polarity of the opinion being extracted. In this paper we propose sentiment analysis in collaboration with opinion extraction, summarization, and the tracking of student records. The paper modifies an existing algorithm in order to obtain the collaborated opinion about the students. The resultant opinion is represented as very high, high, moderate, low or very low. The paper is based on a case study where teachers give their remarks about students, and by applying the proposed sentiment analysis algorithm the opinion is extracted and represented. -" -996,1401.2641,"Imdad Ali Ismaili, Zeeshan Bhatti, Azhar Ali Shah","Towards a Generic Framework for the Development of Unicode Based Digital Sindhi Dictionaries",cs.CL," Dictionaries are the essence of any language, providing a vital linguistic resource for language learners, researchers and scholars. This paper focuses on the methodology and techniques used in developing a software architecture for a UBSESD (Unicode Based Sindhi to English and English to Sindhi Dictionary).
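The panel-of-experts idea in 1401.2517 above amounts to pooling several similarity measures. A minimal sketch; the two member measures are simplistic stand-ins for the semantic measures a real SSE would combine.

```python
# Sketch: a semantic similarity ensemble (SSE, arXiv:1401.2517) that
# averages member measures like a panel of experts pooling judgments.
def jaccard(a, b):
    A, B = set(a.split()), set(b.split())
    return len(A & B) / len(A | B)

def length_similarity(a, b):
    return min(len(a), len(b)) / max(len(a), len(b))

MEASURES = [jaccard, length_similarity]  # stand-ins for semantic measures

def ensemble_similarity(term_a, term_b):
    scores = [m(term_a, term_b) for m in MEASURES]
    return sum(scores) / len(scores)     # the panel's pooled judgment

print(ensemble_similarity("mountain ridge", "mountain range"))
```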
The
-proposed system provides an accurate solution for the construction and
-representation of Unicode-based Sindhi characters in a dictionary, implementing
-a Hash Structure algorithm and a custom Java object as its internal data
-structure saved in a file. The system provides facilities for insertion,
-deletion and editing of new Sindhi records. Through this framework any type
-of Sindhi to English and English to Sindhi dictionary (belonging to different
-domains of knowledge, e.g. engineering, medicine, computer, biology etc.) could
-be developed easily, with accurate representation of Unicode characters in a
-font-independent manner.
-"
-997,1401.2663,"Cem R{\i}fk{\i} Ayd{\i}n, Ali Erkan, Tunga G\""ung\""or, and Hidayet
- Tak\c{c}{\i}",Dictionary-Based Concept Mining: An Application for Turkish,cs.CL," In this study, a dictionary-based method is used to extract expressive
-concepts from documents. So far, there have been many studies concerning
-concept mining in English, but this area of study for Turkish, an agglutinative
-language, is still immature. We used a dictionary instead of WordNet, a lexical
-database grouping words into synsets that is widely used for concept
-extraction. Dictionaries are rarely used in the domain of concept mining,
-but taking into account that dictionary entries have synonyms, hypernyms,
-hyponyms and other relationships in their meaning texts, the success rate has
-been high for determining concepts. This concept extraction method is
-implemented on documents that are collected from different corpora.
-"
-998,1401.2851,"Md. Naseef-Ur-Rahman Chowdhury, Suvankar Paul, and Kazi Zakia Sultana","Statistical Analysis based Hypothesis Testing Method in Biological
- Knowledge Discovery",cs.IR cs.CL," The correlations and interactions among different biological entities
-comprise the biological system. Although already revealed interactions
-contribute to the understanding of different existing systems, researchers face
-many questions every day regarding inter-relationships among entities. Their
-queries have a potential role in exploring new relations which may open up a
-new area of investigation. In this paper, we introduce a text mining based
-method for answering biological queries in terms of statistical computation
-such that researchers can arrive at new knowledge discoveries. It allows users
-to submit their query in natural linguistic form, which can be treated as a
-hypothesis. Our proposed approach analyzes the hypothesis and measures the
-p-value of the hypothesis with respect to the existing literature. Based on the
-measured value, the system either accepts or rejects the hypothesis from a
-statistical point of view. Moreover, even if it does not find any direct
-relationship among the entities of the hypothesis, it presents a network to
-give an integral overview of all the entities through which the entities might
-be related. This is also convenient for researchers wanting to widen their view
-and thus think of new hypotheses for further investigation. It assists
-researchers in getting a quantitative evaluation of their assumptions such that
-they can reach a logical conclusion and thus aids relevant research in
-biological knowledge discovery. The system also provides researchers with a
-graphical interactive interface to submit their hypotheses for assessment in a
-more convenient way.
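The statistical computation behind the hypothesis-testing method in 1401.2851 above is not spelled out in the abstract. Below is a minimal sketch of one plausible reading, in which the p-value of a hypothesized association between two entities is computed from their document co-occurrence counts with Fisher's exact test; the reduction to a 2x2 contingency table and all counts are assumptions for illustration, not details from the paper.

```python
# A minimal sketch (an assumption, not the paper's method): test whether two
# entities co-occur in the literature more often than chance, using document
# counts in a 2x2 contingency table and Fisher's exact test.
from scipy.stats import fisher_exact

def cooccurrence_pvalue(n_both, n_a_only, n_b_only, n_neither):
    """One-sided p-value for over-representation of A-and-B co-occurrence."""
    table = [[n_both, n_a_only],
             [n_b_only, n_neither]]
    _, p = fisher_exact(table, alternative="greater")
    return p

# Hypothetical document counts for entities A and B over a corpus:
p = cooccurrence_pvalue(n_both=42, n_a_only=108, n_b_only=95, n_neither=9755)
print(f"p = {p:.3g}")  # accept or reject the hypothesis at a chosen threshold
```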
-" -999,1401.2937,Ralf Steinberger,"A survey of methods to ease the development of highly multilingual text - mining applications",cs.CL," Multilingual text processing is useful because the information content found -in different languages is complementary, both regarding facts and opinions. -While Information Extraction and other text mining software can, in principle, -be developed for many languages, most text analysis tools have only been -applied to small sets of languages because the development effort per language -is large. Self-training tools obviously alleviate the problem, but even the -effort of providing training data and of manually tuning the results is usually -considerable. In this paper, we gather insights by various multilingual system -developers on how to minimise the effort of developing natural language -processing applications for many languages. We also explain the main guidelines -underlying our own effort to develop complex text mining software for tens of -languages. While these guidelines - most of all: extreme simplicity - can be -very restrictive and limiting, we believe to have shown the feasibility of the -approach through the development of the Europe Media Monitor (EMM) family of -applications (http://emm.newsbrief.eu/overview.html). EMM is a set of complex -media monitoring tools that process and analyse up to 100,000 online news -articles per day in between twenty and fifty languages. We will also touch upon -the kind of language resources that would make it easier for all to develop -highly multilingual text mining applications. We will argue that - to achieve -this - the most needed resources would be freely available, simple, parallel -and uniform multilingual dictionaries, corpora and software tools. -" -1000,1401.2943,"Marco Turchi, Martin Atkinson, Alastair Wilcox, Brett Crawley, Stefano - Bucci, Ralf Steinberger and Erik Van der Goot","ONTS: ""Optima"" News Translation System",cs.CL," We propose a real-time machine translation system that allows users to select -a news category and to translate the related live news articles from Arabic, -Czech, Danish, Farsi, French, German, Italian, Polish, Portuguese, Spanish and -Turkish into English. The Moses-based system was optimised for the news domain -and differs from other available systems in four ways: (1) News items are -automatically categorised on the source side, before translation; (2) Named -entity translation is optimised by recognising and extracting them on the -source side and by re-inserting their translation in the target language, -making use of a separate entity repository; (3) News titles are translated with -a separate translation system which is optimised for the specific style of news -titles; (4) The system was optimised for speed in order to cope with the large -volume of daily news articles. -" -1001,1401.3230,K Paramesha and K C Ravishankar,Optimization Of Cross Domain Sentiment Analysis Using Sentiwordnet,cs.CL cs.IR," The task of sentiment analysis of reviews is carried out using manually built -/ automatically generated lexicon resources of their own with which terms are -matched with lexicon to compute the term count for positive and negative -polarity. On the other hand the Sentiwordnet, which is quite different from -other lexicon resources that gives scores (weights) of the positive and -negative polarity for each word. 
Each polarity of a word, namely positive,
-negative and neutral, has a score ranging between 0 and 1 that indicates the
-strength/weight of the word with that sentiment orientation. In this paper, we
-show how, using SentiWordNet, we can enhance the performance of classification
-at both the sentence and document level.
-"
-1002,1401.3322,"Jibran Yousafzai and Zoran Cvetkovic and Peter Sollich and Matthew
- Ager",A Subband-Based SVM Front-End for Robust ASR,cs.CL cs.LG cs.SD," This work proposes a novel support vector machine (SVM) based robust
-automatic speech recognition (ASR) front-end that operates on an ensemble of
-the subband components of high-dimensional acoustic waveforms. The key issues
-of selecting the appropriate SVM kernels for classification in frequency
-subbands and the combination of individual subband classifiers using ensemble
-methods are addressed. The proposed front-end is compared with state-of-the-art
-ASR front-ends in terms of robustness to additive noise and linear filtering.
-Experiments performed on the TIMIT phoneme classification task demonstrate the
-benefits of the proposed subband-based SVM front-end: it outperforms the
-standard cepstral front-end in the presence of noise and linear filtering for
-signal-to-noise ratios (SNR) below 12 dB. A combination of the proposed
-front-end with a conventional front-end such as MFCC yields further
-improvements over the individual front-ends across the full range of noise
-levels.
-"
-1003,1401.3372,Linas Vepstas and Ben Goertzel,Learning Language from a Large (Unannotated) Corpus,cs.CL cs.LG," A novel approach to the fully automated, unsupervised extraction of
-dependency grammars and associated syntax-to-semantic-relationship mappings
-from large text corpora is described. The suggested approach builds on the
-authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
-as on a number of prior papers and approaches from the statistical language
-learning literature. If successful, this approach would enable the mining of
-all the information needed to power a natural language comprehension and
-generation system, directly from a large, unannotated corpus.
-"
-1004,1401.3457,"S.R.K. Branavan, Harr Chen, Jacob Eisenstein, Regina Barzilay",Learning Document-Level Semantic Properties from Free-Text Annotations,cs.CL cs.IR," This paper presents a new method for inferring the semantic properties of
-documents by leveraging free-text keyphrase annotations. Such annotations are
-becoming increasingly abundant due to the recent dramatic growth in
-semi-structured, user-generated online content. One especially relevant domain
-is product reviews, which are often annotated by their authors with pros/cons
-keyphrases such as a real bargain or good value. These annotations are
-representative of the underlying semantic properties; however, unlike expert
-annotations, they are noisy: lay authors may use different labels to denote the
-same property, and some labels may be missing. To learn using such noisy
-annotations, we find a hidden paraphrase structure which clusters the
-keyphrases. The paraphrase structure is linked with a latent topic model of the
-review texts, enabling the system to predict the properties of unannotated
-documents and to effectively aggregate the semantic properties of multiple
-reviews. Our approach is implemented as a hierarchical Bayesian model with
-joint inference.
We find that joint inference increases the robustness of the -keyphrase clustering and encourages the latent topics to correlate with -semantically meaningful properties. Multiple evaluations demonstrate that our -model substantially outperforms alternative approaches for summarizing single -and multiple documents into a set of semantically salient keyphrases. -" -1005,1401.3479,"Yllias Chali, Shafiq Rayhan Joty, Sadid A. Hasan","Complex Question Answering: Unsupervised Learning Approaches and - Experiments",cs.CL cs.IR cs.LG," Complex questions that require inferencing and synthesizing information from -multiple documents can be seen as a kind of topic-oriented, informative -multi-document summarization where the goal is to produce a single text as a -compressed version of a set of documents with a minimum loss of relevant -information. In this paper, we experiment with one empirical method and two -unsupervised statistical machine learning techniques: K-means and Expectation -Maximization (EM), for computing relative importance of the sentences. We -compare the results of these approaches. Our experiments show that the -empirical approach outperforms the other two techniques and EM performs better -than K-means. However, the performance of these approaches depends entirely on -the feature set used and the weighting of these features. In order to measure -the importance and relevance to the user query we extract different kinds of -features (i.e. lexical, lexical semantic, cosine similarity, basic element, -tree kernel based syntactic and shallow-semantic) for each of the document -sentences. We use a local search technique to learn the weights of the -features. To the best of our knowledge, no study has used tree kernel functions -to encode syntactic/semantic information for more complex tasks such as -computing the relatedness between the query sentences and the document -sentences in order to generate query-focused summaries (or answers to complex -questions). For each of our methods of generating summaries (i.e. empirical, -K-means and EM) we show the effects of syntactic and shallow-semantic features -over the bag-of-words (BOW) features. -" -1006,1401.3482,"Estela Saquete, Jose Luis Vicedo, Patricio Mart\'inez-Barco, Rafael - Mu\~noz, Hector Llorens","Enhancing QA Systems with Complex Temporal Question Processing - Capabilities",cs.CL cs.AI cs.IR," This paper presents a multilayered architecture that enhances the -capabilities of current QA systems and allows different types of complex -questions or queries to be processed. The answers to these questions need to be -gathered from factual information scattered throughout different documents. -Specifically, we designed a specialized layer to process the different types of -temporal questions. Complex temporal questions are first decomposed into simple -questions, according to the temporal relations expressed in the original -question. In the same way, the answers to the resulting simple questions are -recomposed, fulfilling the temporal restrictions of the original complex -question. A novel aspect of this approach resides in the decomposition which -uses a minimal quantity of resources, with the final aim of obtaining a -portable platform that is easily extensible to other languages. In this paper -we also present a methodology for evaluation of the decomposition of the -questions as well as the ability of the implemented temporal layer to perform -at a multilingual level. 
The temporal layer was first implemented for English,
-then evaluated and compared with: a) a general purpose QA system (F-measure
-65.47% for QA plus English temporal layer vs. 38.01% for the general QA
-system), and b) a well-known QA system. Much better results were obtained for
-temporal questions with the multilayered system. This system was therefore
-extended to Spanish and very good results were again obtained in the evaluation
-(F-measure 40.36% for QA plus Spanish temporal layer vs. 22.94% for the general
-QA system).
-"
-1007,1401.3488,"Harr Chen, S.R.K. Branavan, Regina Barzilay, David R. Karger",Content Modeling Using Latent Permutations,cs.IR cs.CL cs.LG," We present a novel Bayesian topic model for learning discourse-level document
-structure. Our model leverages insights from discourse theory to constrain
-latent topic assignments in a way that reflects the underlying organization of
-document topics. We propose a global model in which both topic selection and
-ordering are biased to be similar across a collection of related documents. We
-show that this space of orderings can be effectively represented using a
-distribution over permutations called the Generalized Mallows Model. We apply
-our method to three complementary discourse-level tasks: cross-document
-alignment, document segmentation, and information ordering. Our experiments
-show that incorporating our permutation-based model in these applications
-yields substantial improvements in performance over previously proposed
-methods.
-"
-1008,1401.3510,Saurabh Varshney and Jyoti Bajpai,"Improving Performance Of English-Hindi Cross Language Information
- Retrieval Using Transliteration Of Query Terms",cs.IR cs.CL," The main issue in Cross Language Information Retrieval (CLIR) is the poor
-performance of retrieval in terms of average precision when compared to
-monolingual retrieval performance. The main reasons behind the poor performance
-of CLIR are mismatching of query terms, lexical ambiguity and untranslated
-query terms. The existing problems of CLIR need to be addressed in order to
-increase the performance of the CLIR system. In this paper, we attempt to solve
-the given problem by proposing an algorithm for improving the performance of
-the English-Hindi CLIR system. We use all possible combinations of the Hindi
-translated query, using transliteration of English query terms, and choose the
-best query among them for retrieval of documents. The experiment is performed
-on the FIRE 2010 (Forum of Information Retrieval Evaluation) datasets. The
-experimental results show that the proposed approach gives better performance
-of the English-Hindi CLIR system, helps in overcoming the existing problems,
-and outperforms the existing English-Hindi CLIR system in terms of average
-precision.
-"
-1009,1401.3669,"D. Tatar, M.Lupea, E. Kapetanios","Hrebs and Cohesion Chains as similar tools for semantic text properties
- research",cs.CL," In this study it is proven that the Hrebs used in Denotation analysis of
-texts and Cohesion Chains (defined as a fusion between Lexical Chains and
-Coreference Chains) represent similar linguistic tools. This result gives us
-the possibility to extend to Cohesion Chains (CCs) some important indicators,
-such as the Kernel of CCs, the topicality of a CC, text concentration,
-CC-diffuseness and mean diffuseness of the text. Let us mention that these
-kinds of indicators have not been introduced and used in the Lexical Chains or
-Coreference Chains literature until now.
Similarly, some applications of CCs in the
-study of a text (as for example segmentation or summarization of a text) could
-be realized starting from hrebs. As an illustration of the similarity between
-Hrebs and CCs, a detailed analysis of the poem ""Lacul"" by Mihai Eminescu is
-given.
-"
-1010,1401.3832,"Matthew Michelson, Craig A. Knoblock","Constructing Reference Sets from Unstructured, Ungrammatical Text",cs.CL cs.IR," Vast amounts of text on the Web are unstructured and ungrammatical, such as
-classified ads, auction listings, forum postings, etc. We call such text
-""posts."" Despite their inconsistent structure and lack of grammar, posts are
-full of useful information. This paper presents work on semi-automatically
-building tables of relational information, called ""reference sets,"" by
-analyzing such posts directly. Reference sets can be applied to a number of
-tasks such as ontology maintenance and information extraction. Our
-reference-set construction method starts with just a small amount of background
-knowledge, and constructs tuples representing the entities in the posts to form
-a reference set. We also describe an extension to this approach for the special
-case where even this small amount of background knowledge is impossible to
-discover and use. To evaluate the utility of the machine-constructed reference
-sets, we compare them to manually constructed reference sets in the context of
-reference-set-based information extraction. Our results show the reference sets
-constructed by our method outperform manually constructed reference sets. We
-also compare the reference-set-based extraction approach using the
-machine-constructed reference set to supervised extraction approaches using
-generic features. These results demonstrate that using machine-constructed
-reference sets outperforms the supervised methods, even though the supervised
-methods require training data.
-"
-1011,1401.3865,"Xavier Tannier, Philippe Muller",Evaluating Temporal Graphs Built from Texts via Transitive Reduction,cs.CL cs.IR," Temporal information has been the focus of recent attention in information
-extraction, leading to some standardization effort, in particular for the task
-of relating events in a text. This task raises the problem of comparing two
-annotations of a given text, because relations between events in a story are
-intrinsically interdependent and cannot be evaluated separately. A proper
-evaluation measure is also crucial in the context of a machine learning
-approach to the problem. Finding a common comparison referent at the text level
-is not obvious, and we argue here in favor of a shift from event-based measures
-to measures on a unique textual object, a minimal underlying temporal graph, or
-more formally the transitive reduction of the graph of relations between event
-boundaries. We support it by an investigation of its properties on synthetic
-data and on a well-known temporal corpus.
-"
-1012,1401.3908,"Ricardo Ribeiro, David Martins de Matos","Centrality-as-Relevance: Support Sets and Similarity as Geometric
- Proximity",cs.IR cs.CL," In automatic summarization, centrality-as-relevance means that the most
-important content of an information source, or a collection of information
-sources, corresponds to the most central passages, considering a representation
-where such notion makes sense (graph, spatial, etc.).
We assess the main
-paradigms, and introduce a new centrality-based relevance model for automatic
-summarization that relies on the use of support sets to better estimate the
-relevant content. Geometric proximity is used to compute semantic relatedness.
-Centrality (relevance) is determined by considering the whole input source (and
-not only local information), and by taking into account the existence of minor
-topics or lateral subjects in the information sources to be summarized. The
-method consists in creating, for each passage of the input source, a support
-set consisting only of the most semantically related passages. Then, the
-determination of the most relevant content is achieved by selecting the
-passages that occur in the largest number of support sets. This model produces
-extractive summaries that are generic, and language- and domain-independent.
-Thorough automatic evaluation shows that the method achieves state-of-the-art
-performance, both in written text, and automatically transcribed speech
-summarization, including when compared to considerably more complex approaches.
-"
-1013,1401.4205,"Maria Kalimeri, Vassilios Constantoudis, Constantinos Papadimitriou,
- Kostantinos Karamanos, Fotis K. Diakonos and Haris Papageorgiou","Entropy analysis of word-length series of natural language texts:
- Effects of text language and genre",cs.CL physics.data-an," We estimate the $n$-gram entropies of natural language texts in word-length
-representation and find that these are sensitive to text language and genre. We
-attribute this sensitivity to changes in the probability distribution of the
-lengths of single words and emphasize the crucial role of the uniformity of
-probabilities of having words with length between five and ten. Furthermore,
-comparison with the entropies of shuffled data reveals the impact of word
-length correlations on the estimated $n$-gram entropies.
-"
-1014,1401.4436,"Muhammad Arshad Ul Abedin, Vincent Ng, Latifur Khan","Cause Identification from Aviation Safety Incident Reports via Weakly
- Supervised Semantic Lexicon Construction",cs.CL cs.LG," The Aviation Safety Reporting System collects voluntarily submitted reports
-on aviation safety incidents to facilitate research work aiming to reduce such
-incidents. To effectively reduce these incidents, it is vital to accurately
-identify why these incidents occurred. More precisely, given a set of possible
-causes, or shaping factors, this task of cause identification involves
-identifying all and only those shaping factors that are responsible for the
-incidents described in a report. We investigate two approaches to cause
-identification. Both approaches exploit information provided by a semantic
-lexicon, which is automatically constructed via Thelen and Riloff's Basilisk
-framework augmented with our linguistic and algorithmic modifications. The
-first approach labels a report using a simple heuristic, which looks for the
-words and phrases acquired during the semantic lexicon learning process in the
-report. The second approach recasts cause identification as a text
-classification problem, employing supervised and transductive text
-classification algorithms to learn models from incident reports labeled with
-shaping factors and using the models to label unseen reports. Our experiments
-show that both the heuristic-based approach and the learning-based approach
-(when given sufficient training data) outperform the baseline system
-significantly.
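The first, heuristic approach in the cause-identification abstract above (1401.4436) labels a report by looking up learned lexicon entries in its text. Below is a minimal sketch under that reading; the shaping factors and phrases are hypothetical placeholders, not the lexicon actually learned by the Basilisk-based system.

```python
# A minimal sketch of the heuristic labeling step, assuming it reduces to
# matching lexicon phrases in the report text. The factors and phrases below
# are hypothetical placeholders, not the learned Basilisk lexicon.
LEXICON = {
    "fatigue": {"tired", "long duty day", "lack of sleep"},
    "communication": {"readback error", "misheard", "frequency congestion"},
}

def label_report(text: str) -> set:
    """Return every shaping factor whose lexicon has a phrase in the text."""
    lowered = text.lower()
    return {factor for factor, phrases in LEXICON.items()
            if any(phrase in lowered for phrase in phrases)}

report = "Crew was tired after a long duty day and misheard the clearance."
print(label_report(report))  # -> {'fatigue', 'communication'}
```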
-" -1015,1401.4603,"Esperanza Albacete, Javier Calle, Elena Castro, Dolores Cuadra","Semantic Similarity Measures Applied to an Ontology for Human-Like - Interaction",cs.AI cs.CL," The focus of this paper is the calculation of similarity between two concepts -from an ontology for a Human-Like Interaction system. In order to facilitate -this calculation, a similarity function is proposed based on five dimensions -(sort, compositional, essential, restrictive and descriptive) constituting the -structure of ontological knowledge. The paper includes a proposal for computing -a similarity function for each dimension of knowledge. Later on, the similarity -values obtained are weighted and aggregated to obtain a global similarity -measure. In order to calculate those weights associated to each dimension, four -training methods have been proposed. The training methods differ in the element -to fit: the user, concepts or pairs of concepts, and a hybrid approach. For -evaluating the proposal, the knowledge base was fed from WordNet and extended -by using a knowledge editing toolkit (Cognos). The evaluation of the proposal -is carried out through the comparison of system responses with those given by -human test subjects, both providing a measure of the soundness of the procedure -and revealing ways in which the proposal may be improved. -" -1016,1401.4634,"Farzad Farnoud (Hassanzadeh), Moshe Schwartz, Jehoshua Bruck",The Capacity of String-Replication Systems,cs.IT cs.CL math.IT," It is known that the majority of the human genome consists of repeated -sequences. Furthermore, it is believed that a significant part of the rest of -the genome also originated from repeated sequences and has mutated to its -current form. In this paper, we investigate the possibility of constructing an -exponentially large number of sequences from a short initial sequence and -simple replication rules, including those resembling genomic replication -processes. In other words, our goal is to find out the capacity, or the -expressive power, of these string-replication systems. Our results include -exact capacities, and bounds on the capacities, of four fundamental -string-replication systems. -" -1017,1401.4869,"Taraka Rama, Karthik Gali, Avinesh PVS",Does Syntactic Knowledge help English-Hindi SMT?,cs.CL cs.AI," In this paper we explore various parameter settings of the state-of-art -Statistical Machine Translation system to improve the quality of the -translation for a `distant' language pair like English-Hindi. We proposed new -techniques for efficient reordering. A slight improvement over the baseline is -reported using these techniques. We also show that a simple pre-processing step -can improve the quality of the translation significantly. -" -1018,1401.4994,Nikolaos Mavridis,A Review of Verbal and Non-Verbal Human-Robot Interactive Communication,cs.RO cs.CL," In this paper, an overview of human-robot interactive communication is -presented, covering verbal as well as non-verbal aspects of human-robot -interaction. Following a historical introduction, and motivation towards fluid -human-robot communication, ten desiderata are proposed, which provide an -organizational axis both of recent as well as of future research on human-robot -communication. Then, the ten desiderata are examined in detail, culminating to -a unifying discussion, and a forward-looking conclusion. 
-" -1019,1401.5327,Dimitri Kartsaklis,Compositional Operators in Distributional Semantics,cs.CL cs.AI math.CT," This survey presents in some detail the main advances that have been recently -taking place in Computational Linguistics towards the unification of the two -prominent semantic paradigms: the compositional formal semantics view and the -distributional models of meaning based on vector spaces. After an introduction -to these two approaches, I review the most important models that aim to provide -compositionality in distributional semantics. Then I proceed and present in -more detail a particular framework by Coecke, Sadrzadeh and Clark (2010) based -on the abstract mathematical setting of category theory, as a more complete -example capable to demonstrate the diversity of techniques and scientific -disciplines that this kind of research can draw from. This paper concludes with -a discussion about important open issues that need to be addressed by the -researchers in the future. -" -1020,1401.5389,"Sajib Dasgupta, Vincent Ng","Which Clustering Do You Want? Inducing Your Ideal Clustering with - Minimal Feedback",cs.IR cs.CL cs.LG," While traditional research on text clustering has largely focused on grouping -documents by topic, it is conceivable that a user may want to cluster documents -along other dimensions, such as the authors mood, gender, age, or sentiment. -Without knowing the users intention, a clustering algorithm will only group -documents along the most prominent dimension, which may not be the one the user -desires. To address the problem of clustering documents along the user-desired -dimension, previous work has focused on learning a similarity metric from data -manually annotated with the users intention or having a human construct a -feature space in an interactive manner during the clustering process. With the -goal of reducing reliance on human knowledge for fine-tuning the similarity -function or selecting the relevant features required by these approaches, we -propose a novel active clustering algorithm, which allows a user to easily -select the dimension along which she wants to cluster the documents by -inspecting only a small number of words. We demonstrate the viability of our -algorithm on a variety of commonly-used sentiment datasets. -" -1021,1401.5390,"S.R.K. Branavan, David Silver, Regina Barzilay",Learning to Win by Reading Manuals in a Monte-Carlo Framework,cs.CL cs.AI cs.LG," Domain knowledge is crucial for effective performance in autonomous control -systems. Typically, human effort is required to encode this knowledge into a -control algorithm. In this paper, we present an approach to language grounding -which automatically interprets text in the context of a complex control -application, such as a game, and uses domain knowledge extracted from the text -to improve control performance. Both text analysis and control strategies are -learned jointly using only a feedback signal inherent to the application. To -effectively leverage textual information, our method automatically extracts the -text segment most relevant to the current game state, and labels it with a -task-centric predicate structure. This labeled text is then used to bias an -action selection policy for the game, guiding it towards promising regions of -the action space. We encode our model for text analysis and game playing in a -multi-layer neural network, representing linguistic decisions via latent -variables in the hidden layers, and game action quality via the output layer. 
-Operating within the Monte-Carlo Search framework, we estimate model parameters
-using feedback from simulated games. We apply our approach to the complex
-strategy game Civilization II using the official game manual as the text guide.
-Our results show that a linguistically-informed game-playing agent
-significantly outperforms its language-unaware counterpart, yielding a 34%
-absolute improvement and winning over 65% of games when playing against the
-built-in AI of Civilization.
-"
-1022,1401.5644,Issam Sahmoudi and Hanane Froud and Abdelmonaime Lachkar,"A new keyphrases extraction method based on suffix tree data structure
- for arabic documents clustering",cs.CL cs.IR," Document Clustering is a branch of a larger area of scientific study known as
-data mining, which is an unsupervised classification used to find structure in
-a collection of unlabeled data. The useful information in the documents can be
-accompanied by a large amount of noise words when using Full Text
-Representation, which therefore negatively affects the results of the
-clustering process. There is thus a great need to eliminate the noise words and
-keep just the useful information in order to enhance the quality of the
-clustering results. This problem occurs to different degrees for any language,
-such as English, other European languages, Hindi, Chinese, and Arabic. To
-overcome this problem, in this paper, we propose a new and efficient keyphrases
-extraction method based on the Suffix Tree data structure (KpST); the extracted
-keyphrases are then used in the clustering process instead of Full Text
-Representation. The proposed method for keyphrases extraction is language
-independent and therefore it may be applied to any language. In this
-investigation, we are interested in dealing with the Arabic language, which is
-one of the most complex languages. To evaluate our method, we conduct an
-experimental study on Arabic documents using the most popular clustering
-approach of hierarchical algorithms: the Agglomerative Hierarchical algorithm
-with seven linkage techniques and a variety of distance functions and
-similarity measures to perform the Arabic document clustering task. The
-obtained results show that our method for extracting keyphrases increases the
-quality of the clustering results. We also propose to study the effect of using
-stemming on the test dataset when clustering it with the same document
-clustering techniques and similarity/distance measures.
-"
-1023,1401.5674,"Felipe S\'anchez-Mart\'inez, Rafael C. Carrasco, Miguel A.
- Mart\'inez-Prieto, Joaquin Adiego",Generalized Biwords for Bitext Compression and Translation Spotting,cs.CL," Large bilingual parallel texts (also known as bitexts) are usually stored in
-a compressed form, and previous work has shown that they can be more
-efficiently compressed if the fact that the two texts are mutual translations
-is exploited. For example, a bitext can be seen as a sequence of biwords
----pairs of parallel words with a high probability of co-occurrence--- that can
-be used as an intermediate representation in the compression process. However,
-the simple biword approach described in the literature can only exploit
-one-to-one word alignments and cannot tackle the reordering of words. We
-therefore introduce a generalization of biwords which can describe multi-word
-expressions and reorderings.
We also describe some methods for the binary
-compression of generalized biword sequences, and compare their performance when
-different schemes are applied to the extraction of the biword sequence. In
-addition, we show that this generalization of biwords allows for the
-implementation of an efficient algorithm to search the compressed bitext for
-words or text segments in one of the texts and retrieve their counterpart
-translations in the other text ---an application usually referred to as
-translation spotting--- with only some minor modifications in the compression
-algorithm.
-"
-1024,1401.5693,"Trevor Anthony Cohn, Mirella Lapata",Sentence Compression as Tree Transduction,cs.CL," This paper presents a tree-to-tree transduction method for sentence
-compression. Our model is based on synchronous tree substitution grammar, a
-formalism that allows local distortion of the tree topology and can thus
-naturally capture structural mismatches. We describe an algorithm for decoding
-in this framework and show how the model can be trained discriminatively within
-a large margin framework. Experimental results on sentence compression bring
-significant improvements over a state-of-the-art model.
-"
-1025,1401.5694,"Sebastian Pado, Mirella Lapata",Cross-lingual Annotation Projection for Semantic Roles,cs.CL," This article considers the task of automatically inducing role-semantic
-annotations in the FrameNet paradigm for new languages. We propose a general
-framework that is based on annotation projection, phrased as a graph
-optimization problem. It is relatively inexpensive and has the potential to
-reduce the human effort involved in creating role-semantic resources. Within
-this framework, we present projection models that exploit lexical and syntactic
-information. We provide an experimental evaluation on an English-German
-parallel corpus which demonstrates the feasibility of inducing high-precision
-German semantic role annotation both for manually and automatically annotated
-English data.
-"
-1026,1401.5695,"Tahira Naseem, Benjamin Snyder, Jacob Eisenstein, Regina Barzilay",Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches,cs.CL," We demonstrate the effectiveness of multilingual learning for unsupervised
-part-of-speech tagging. The central assumption of our work is that by combining
-cues from multiple languages, the structure of each becomes more apparent. We
-consider two ways of applying this intuition to the problem of unsupervised
-part-of-speech tagging: a model that directly merges tag structures for a pair
-of languages into a single sequence and a second model which instead
-incorporates multilingual context using latent variables. Both approaches are
-formulated as hierarchical Bayesian models, using Markov Chain Monte Carlo
-sampling techniques for inference. Our results demonstrate that by
-incorporating multilingual evidence we can achieve impressive performance gains
-across a range of scenarios. We also found that performance improves steadily
-as the number of available languages increases.
-"
-1027,1401.5696,"Alexander Pieter Yates, Oren Etzioni","Unsupervised Methods for Determining Object and Relation Synonyms on the
- Web",cs.CL," The task of identifying synonymous relations and objects, or synonym
-resolution, is critical for high-quality information extraction. This paper
-investigates synonym resolution in the context of unsupervised information
-extraction, where neither hand-tagged training examples nor domain knowledge is
-available.
The paper presents a scalable, fully-implemented system that runs in
-O(KN log N) time in the number of extractions, N, and the maximum number of
-synonyms per word, K. The system, called Resolver, introduces a probabilistic
-relational model for predicting whether two strings are co-referential based on
-the similarity of the assertions containing them. On a set of two million
-assertions extracted from the Web, Resolver resolves objects with 78% precision
-and 68% recall, and resolves relations with 90% precision and 35% recall.
-Several variations of Resolver's probabilistic model are explored, and
-experiments demonstrate that under appropriate conditions these variations can
-improve F1 by 5%. An extension to the basic Resolver system allows it to handle
-polysemous names with 97% precision and 95% recall on a data set from the TREC
-corpus.
-"
-1028,1401.5697,"Evgeniy Gabrilovich, Shaul Markovitch",Wikipedia-based Semantic Interpretation for Natural Language Processing,cs.CL," Adequate representation of natural language semantics requires access to vast
-amounts of common sense and domain-specific world knowledge. Prior work in the
-field was based on purely statistical techniques that did not make use of
-background knowledge, on limited lexicographic knowledge bases such as WordNet,
-or on huge manual efforts such as the CYC project. Here we propose a novel
-method, called Explicit Semantic Analysis (ESA), for fine-grained semantic
-interpretation of unrestricted natural language texts. Our method represents
-meaning in a high-dimensional space of concepts derived from Wikipedia, the
-largest encyclopedia in existence. We explicitly represent the meaning of any
-text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our
-method on text categorization and on computing the degree of semantic
-relatedness between fragments of natural language text. Using ESA results in
-significant improvements over the previous state of the art in both tasks.
-Importantly, due to the use of natural concepts, the ESA model is easy to
-explain to human users.
-"
-1029,1401.5698,"Yifan Li, Petr Musilek, Marek Reformat, Loren Wyard-Scott",Identification of Pleonastic It Using the Web,cs.CL," In a significant minority of cases, certain pronouns, especially the pronoun
-it, can be used without referring to any specific entity. This phenomenon of
-pleonastic pronoun usage poses serious problems for systems aiming at even a
-shallow understanding of natural language texts. In this paper, a novel
-approach is proposed to identify such uses of it: the extrapositional cases are
-identified using a series of queries against the web, and the cleft cases are
-identified using a simple set of syntactic rules. The system is evaluated with
-four sets of news articles containing 679 extrapositional cases as well as 78
-cleft constructs. The identification results are comparable to those obtained
-by human efforts.
-"
-1030,1401.5699,"George Tsatsaronis, Iraklis Varlamis, Michalis Vazirgiannis",Text Relatedness Based on a Word Thesaurus,cs.CL," The computation of relatedness between two fragments of text in an automated
-manner requires taking into account a wide range of factors pertaining to the
-meaning the two fragments convey, and the pairwise relations between their
-words. Without doubt, a measure of relatedness between text segments must take
-into account both the lexical and the semantic relatedness between words.
Such
-a measure that captures well both aspects of text relatedness may help in many
-tasks, such as text retrieval, classification and clustering. In this paper we
-present a new approach for measuring the semantic relatedness between words
-based on their implicit semantic links. The approach exploits only a word
-thesaurus in order to devise implicit semantic links between words. Based on
-this approach, we introduce Omiotis, a new measure of semantic relatedness
-between texts which capitalizes on the word-to-word semantic relatedness
-measure (SR) and extends it to measure the relatedness between texts. We
-gradually validate our method: we first evaluate the performance of the
-semantic relatedness measure between individual words, covering word-to-word
-similarity and relatedness, synonym identification and word analogy; then, we
-proceed with evaluating the performance of our method in measuring text-to-text
-semantic relatedness in two tasks, namely sentence-to-sentence similarity and
-paraphrase recognition. Experimental evaluation shows that the proposed method
-outperforms every lexicon-based method of semantic relatedness in the selected
-tasks and the data sets used, and competes well against corpus-based and hybrid
-approaches.
-"
-1031,1401.5700,"Felipe S\'anchez-Mart\'inez, Mikel L. Forcada","Inferring Shallow-Transfer Machine Translation Rules from Small Parallel
- Corpora",cs.CL," This paper describes a method for the automatic inference of structural
-transfer rules to be used in a shallow-transfer machine translation (MT) system
-from small parallel corpora. The structural transfer rules are based on
-alignment templates, like those used in statistical MT. Alignment templates are
-extracted from sentence-aligned parallel corpora and extended with a set of
-restrictions which are derived from the bilingual dictionary of the MT system
-and control their application as transfer rules. The experiments conducted
-using three different language pairs in the free/open-source MT platform
-Apertium show that translation quality is improved as compared to word-for-word
-translation (when no transfer rules are used), and that the resulting
-translation quality is close to that obtained using hand-coded transfer rules.
-The method we present is entirely unsupervised and benefits from information in
-the rest of the modules of the MT system in which the inferred rules are
-applied.
-"
-1032,1401.5980,"Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Stephen Pulman, Bob Coecke","Reasoning about Meaning in Natural Language with Compact Closed
- Categories and Frobenius Algebras",cs.CL cs.AI math.CT," Compact closed categories have found applications in modeling quantum
-information protocols by Abramsky-Coecke. They also provide semantics for
-Lambek's pregroup algebras, applied to formalizing the grammatical structure of
-natural language, and are implicit in a distributional model of word meaning
-based on vector spaces. Specifically, in previous work Coecke-Clark-Sadrzadeh
-used the product category of pregroups with vector spaces and provided a
-distributional model of meaning for sentences. We recast this theory in terms
-of strongly monoidal functors and advance it via Frobenius algebras over vector
-spaces. The former are used to formalize topological quantum field theories by
-Atiyah and Baez-Dolan, and the latter are used to model classical data in
-quantum protocols by Coecke-Pavlovic-Vicary.
The Frobenius algebras enable us
-to work in a single space in which meanings of words, phrases, and sentences of
-any structure live. Hence we can compare meanings of different language
-constructs and enhance the applicability of the theory. We report on
-experimental results on a number of language tasks and verify the theoretical
-predictions.
-"
-1033,1401.6050,"Hai Zhao, Xiaotian Zhang, Chunyu Kit","Integrative Semantic Dependency Parsing via Efficient Large-scale
- Feature Selection",cs.CL," Semantic parsing, i.e., the automatic derivation of meaning representation
-such as an instantiated predicate-argument structure for a sentence, plays a
-critical role in deep processing of natural language. Unlike all other top
-systems of semantic dependency parsing that have to rely on a pipeline
-framework to chain up a series of submodels each specialized for a specific
-subtask, the one presented in this article integrates everything into one
-model, in hopes of achieving desirable integrity and practicality for real
-applications while maintaining a competitive performance. This integrative
-approach tackles semantic parsing as a word pair classification problem using a
-maximum entropy classifier. We leverage adaptive pruning of argument candidates
-and large-scale feature selection engineering to allow the largest feature
-space ever used so far in this field. The approach achieves a state-of-the-art
-performance on the evaluation data set of the CoNLL-2008 shared task, on top of
-all but one top pipeline system, confirming its feasibility and effectiveness.
-"
-1034,1401.6122,"Tanmoy Chakraborty, Dipankar Das, Sivaji Bandyopadhyay",Identifying Bengali Multiword Expressions using Semantic Clustering,cs.CL," One of the key issues in both natural language understanding and generation
-is the appropriate processing of Multiword Expressions (MWEs). MWEs pose a huge
-problem to precise language processing due to their idiosyncratic nature and
-diversity in lexical, syntactical and semantic properties. The semantics of a
-MWE cannot be expressed by combining the semantics of its constituents.
-Therefore, the formalism of semantic clustering is often viewed as an
-instrument for extracting MWEs, especially for resource-constrained languages
-like Bengali. The present semantic clustering approach helps locate clusters of
-the synonymous noun tokens present in the document. These clusters in turn help
-measure the similarity between the constituent words of a potential candidate
-phrase using a vector space model and judge the suitability of this phrase to
-be a MWE. In this experiment, we apply the semantic clustering approach for
-noun-noun bigram MWEs, though it can be extended to any types of MWEs. In
-parallel, the well known statistical models, namely Point-wise Mutual
-Information (PMI), Log Likelihood Ratio (LLR), Significance function are also
-employed to extract MWEs from the Bengali corpus. The comparative evaluation
-shows that the semantic clustering approach outperforms all other competing
-statistical models. As a by-product of this experiment, we have started
-developing a standard lexicon in Bengali that serves as a productive Bengali
-linguistic thesaurus.
-"
-1035,1401.6131,"Jo\~ao V. Gra\c{c}a, Kuzman Ganchev, Luisa Coheur, Fernando Pereira,
- Ben Taskar",Controlling Complexity in Part-of-Speech Induction,cs.CL cs.LG," We consider the problem of fully unsupervised learning of grammatical
-(part-of-speech) categories from unlabeled text.
The standard
-maximum-likelihood hidden Markov model for this task performs poorly, because
-of its weak inductive bias and large model capacity. We address this problem by
-refining the model and modifying the learning objective to control its capacity
-via parametric and non-parametric constraints. Our approach enforces
-word-category association sparsity, adds morphological and orthographic
-features, and eliminates hard-to-estimate parameters for rare words. We develop
-an efficient learning algorithm that is not much more computationally intensive
-than standard training. We also provide an open-source implementation of the
-algorithm. Our experiments on five diverse languages (Bulgarian, Danish,
-English, Portuguese, Spanish) achieve significant improvements compared with
-previous methods for the same task.
-"
-1036,1401.6169,"Hossein Soleimani, David J. Miller",Parsimonious Topic Models with Salient Word Discovery,cs.LG cs.CL cs.IR stat.ML," We propose a parsimonious topic model for text corpora. In related models
-such as Latent Dirichlet Allocation (LDA), all words are modeled
-topic-specifically, even though many words occur with similar frequencies
-across different topics. Our modeling determines salient words for each topic,
-which have topic-specific probabilities, with the rest explained by a universal
-shared model. Further, in LDA all topics are in principle present in every
-document. By contrast our model gives sparse topic representation, determining
-the (small) subset of relevant topics for each document. We derive a Bayesian
-Information Criterion (BIC), balancing model complexity and goodness of fit.
-Here, interestingly, we identify an effective sample size and corresponding
-penalty specific to each parameter type in our model. We minimize BIC to
-jointly determine our entire model -- the topic-specific words,
-document-specific topics, all model parameter values, {\it and} the total
-number of topics -- in a wholly unsupervised fashion. Results on three text
-corpora and an image dataset show that our model achieves higher test set
-likelihood and better agreement with ground-truth class labels, compared to LDA
-and to a model designed to incorporate sparsity.
-"
-1037,1401.6224,"Maria Kalimeri, Vassilios Constantoudis, Constantinos Papadimitriou,
- Konstantinos Karamanos, Fotis K. Diakonos and Harris Papageorgiou",Word-length entropies and correlations of natural language written texts,cs.CL physics.data-an," We study the frequency distributions and correlations of the word lengths of
-ten European languages. Our findings indicate that a) the word-length
-distribution of short words quantified by the mean value and the entropy
-distinguishes the Uralic (Finnish) corpus from the others, b) the tails at long
-words, manifested in the high-order moments of the distributions, differentiate
-the Germanic languages (except for English) from the Romanic languages and
-Greek and c) the correlations between nearby word lengths measured by the
-comparison of the real entropies with those of the shuffled texts are found to
-be smaller in the case of Germanic and Finnish languages.
-"
-1038,1401.6330,"Li Dong, Furu Wei, Shujie Liu, Ming Zhou, Ke Xu",A Statistical Parsing Framework for Sentiment Classification,cs.CL," We present a statistical parsing framework for sentence-level sentiment
-classification in this article.
Unlike previous works that employ syntactic
-parsing results for sentiment analysis, we develop a statistical parser to
-directly analyze the sentiment structure of a sentence. We show that
-complicated phenomena in sentiment analysis (e.g., negation, intensification,
-and contrast) can be handled the same as simple and straightforward sentiment
-expressions in a unified and probabilistic way. We formulate the sentiment
-grammar upon Context-Free Grammars (CFGs), and provide a formal description of
-the sentiment parsing framework. We develop the parsing model to obtain
-possible sentiment parse trees for a sentence, from which the polarity model is
-proposed to derive the sentiment strength and polarity, and the ranking model
-is dedicated to selecting the best sentiment tree. We train the parser directly
-from examples of sentences annotated only with sentiment polarity labels but
-without any syntactic annotations or polarity annotations of constituents
-within sentences. Therefore we can obtain training data easily. In particular,
-we train a sentiment parser, s.parser, from a large number of review sentences
-with users' ratings as rough sentiment polarity labels. Extensive experiments
-on existing benchmark datasets show significant improvements over baseline
-sentiment classification approaches.
-"
-1039,1401.6422,"Christina Sauper, Regina Barzilay",Automatic Aggregation by Joint Modeling of Aspects and Values,cs.CL," We present a model for aggregation of product review snippets by joint aspect
-identification and sentiment analysis. Our model simultaneously identifies an
-underlying set of ratable aspects presented in the reviews of a product (e.g.,
-sushi and miso for a Japanese restaurant) and determines the corresponding
-sentiment of each aspect. This approach directly enables discovery of
-highly-rated or inconsistent aspects of a product. Our generative model admits
-an efficient variational mean-field inference algorithm. It is also easily
-extensible, and we describe several modifications and their effects on model
-structure and inference. We test our model on two tasks, joint aspect
-identification and sentiment analysis on a set of Yelp reviews and aspect
-identification alone on a set of medical summaries. We evaluate the performance
-of the model on aspect identification, sentiment analysis, and per-word
-labeling accuracy. We demonstrate that our model outperforms applicable
-baselines by a considerable margin, yielding up to 32% relative error reduction
-on aspect identification and up to 20% relative error reduction on sentiment
-analysis.
-"
-1040,1401.6427,"Seyed Abolghasem Mirroshandel, Gholamreza Ghassem-Sani",Towards Unsupervised Learning of Temporal Relations between Events,cs.LG cs.CL," Automatic extraction of temporal relations between event pairs is an
-important task for several natural language processing applications such as
-Question Answering, Information Extraction, and Summarization. Since most
-existing methods are supervised and require large corpora, which for many
-languages do not exist, we have concentrated our efforts on reducing the need
-for annotated data as much as possible. This paper presents two different
-algorithms towards this goal. The first algorithm is a weakly supervised
-machine learning approach for classification of temporal relations between
-events. In the first stage, the algorithm learns a general classifier from an
-annotated corpus.
Then, inspired by the hypothesis of ""one type of temporal
-relation per discourse"", it extracts useful information from a cluster of
-topically related documents. We show that by combining the global information
-of such a cluster with local decisions of a general classifier, a bootstrapping
-cross-document classifier can be built to extract temporal relations between
-events. Our experiments show that without any additional annotated data, the
-accuracy of the proposed algorithm is higher than that of several previous
-successful systems. The second proposed method for temporal relation extraction
-is based on the expectation maximization (EM) algorithm. Within EM, we used
-different techniques such as a greedy best-first search and integer linear
-programming for temporal inconsistency removal. We think that the experimental
-results of our EM-based algorithm, as a first step toward a fully unsupervised
-temporal relation extraction method, are encouraging.
-"
-1041,1401.6567,"Vivekananda Gayen, Kamal Sarkar","A Machine Learning Approach for the Identification of Bengali Noun-Noun
- Compound Multiword Expressions",cs.CL cs.LG," This paper presents a machine learning approach for identification of Bengali
-multiword expressions (MWE) which are bigram nominal compounds. Our proposed
-approach has two steps: (1) candidate extraction using chunk information and
-various heuristic rules and (2) training the machine learning algorithm called
-Random Forest to classify the candidates into two groups: bigram nominal
-compound MWE or not bigram nominal compound MWE. A variety of association
-measures, syntactic and linguistic clues and a set of WordNet-based similarity
-features have been used for our MWE identification task. The approach presented
-in this paper can be used to identify bigram nominal compound MWE in Bengali
-running text.
-"
-1042,1401.6571,"Shibamouli Lahiri, Sagnik Ray Choudhury, Cornelia Caragea","Keyword and Keyphrase Extraction Using Centrality Measures on
- Collocation Networks",cs.CL cs.IR," Keyword and keyphrase extraction is an important problem in natural language
-processing, with applications ranging from summarization to semantic search to
-document clustering. Graph-based approaches to keyword and keyphrase extraction
-avoid the problem of acquiring a large in-domain training corpus by applying
-variants of PageRank algorithm on a network of words. Although graph-based
-approaches are knowledge-lean and easily adoptable in online systems, it
-remains largely open whether they can benefit from centrality measures other
-than PageRank. In this paper, we experiment with an array of centrality
-measures on word and noun phrase collocation networks, and analyze their
-performance on four benchmark datasets. Not only are there centrality measures
-that perform as well as or better than PageRank, but they are much simpler
-(e.g., degree, strength, and neighborhood size). Furthermore, centrality-based
-methods give results that are competitive with and, in some cases, better than
-two strong unsupervised baselines.
-"
-1043,1401.6573,"Livy-Maria Real-Coelho (LaBRI, UFPR), Christian Retor\'e (LaBRI, IRIT)",Deverbal semantics and the Montagovian generative lexicon,cs.CL cs.LO," We propose a lexical account of action nominals, in particular of deverbal
-nominalisations, whose meaning is related to the event expressed by their base
-verb. The literature about nominalisations often assumes that the semantics of
-the base verb completely defines the structure of action nominals.
We argue
-that the information in the base verb is not sufficient to completely determine
-the semantics of action nominals. We exhibit some data from different
-languages, especially from Romance languages, which show that nominalisations
-focus on some aspects of the verb semantics. The selected aspects, however,
-seem to be idiosyncratic and do not automatically result from the internal
-structure of the verb nor from its interaction with the morphological suffix.
-We therefore propose a partially lexicalist view of deverbal nouns. It
-is made precise and computable by using the Montagovian Generative Lexicon, a
-type theoretical framework introduced by Bassac, Mery and Retor\'e in this
-journal in 2010. This extension of Montague semantics with a richer type system
-easily incorporates lexical phenomena like the semantics of action nominals,
-in particular deverbals, including their polysemy and (in)felicitous
-copredications.
-"
-1044,1401.6574,"Jean Gillibert (IMB), Christian Retor\'e (LaBRI)","Category theory, logic and formal linguistics: some connections, old and
- new",math.CT cs.CL cs.LO math.LO," We seize the opportunity of the publication of selected papers from the
-\emph{Logic, categories, semantics} workshop in the \emph{Journal of Applied
-Logic} to survey some current trends in logic, namely intuitionistic and linear
-type theories, that interweave categorical, geometrical and computational
-considerations. We thereafter present how these rich logical frameworks can
-model the way language conveys meaning.
-"
-1045,1401.6875,"Shaolin Qu, Joyce Y. Chai",Context-based Word Acquisition for Situated Dialogue in a Virtual World,cs.CL," To tackle the vocabulary problem in conversational systems, previous work has
-applied unsupervised learning approaches on co-occurring speech and eye gaze
-during interaction to automatically acquire new words. Although these
-approaches have shown promise, several issues related to human language
-behavior and human-machine conversation have not been addressed. First,
-psycholinguistic studies have shown certain temporal regularities between human
-eye movement and language production. While these regularities can potentially
-guide the acquisition process, they have not been incorporated in the previous
-unsupervised approaches. Second, conversational systems generally have an
-existing knowledge base about the domain and vocabulary. While the existing
-knowledge can potentially help bootstrap and constrain the acquired new words,
-it has not been incorporated in the previous models. Third, eye gaze could
-serve different functions in human-machine conversation. Some gaze streams may
-not be closely coupled with the speech stream, and thus are potentially
-detrimental to word acquisition. Automated recognition of closely-coupled
-speech-gaze streams based on conversation context is important. To address
-these issues, we developed new approaches that incorporate user language
-behavior, domain knowledge, and conversation context in word acquisition. We
-evaluated these approaches in the context of situated dialogue in a virtual
-world. Our experimental results have shown that incorporating the above three
-types of contextual information significantly improves word acquisition
-performance.
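-  As an illustration only, the sketch below shows the kind of speech-gaze
-co-occurrence bookkeeping such approaches rest on: spoken words are grounded
-in gazed-at entities, weighting fixations that shortly precede a word (a crude
-stand-in for the temporal regularities mentioned above). The function, the
-one-second window and the toy entity labels are illustrative assumptions, not
-the authors' model.
-
-from collections import defaultdict
-
-def acquire_word_meanings(utterances, gaze_stream, window=1.0):
-    # utterances: list of (timestamp, word); gaze_stream: list of
-    # (timestamp, fixated_entity). A fixation up to `window` seconds
-    # before a word counts as evidence that the word names the entity.
-    counts = defaultdict(lambda: defaultdict(float))
-    totals = defaultdict(float)
-    for t_w, word in utterances:
-        for t_g, entity in gaze_stream:
-            if 0.0 <= t_w - t_g <= window:
-                counts[word][entity] += 1.0
-                totals[word] += 1.0
-    # Normalise raw counts into p(entity | word).
-    return {w: {e: c / totals[w] for e, c in es.items()}
-            for w, es in counts.items()}
-
-utts = [(2.0, 'open'), (2.5, 'the'), (3.0, 'barrel')]
-gaze = [(1.0, 'door_3'), (2.4, 'barrel_1')]
-print(acquire_word_meanings(utts, gaze))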
-"
-1046,1401.6876,"Preslav Ivanov Nakov, Hwee Tou Ng","Improving Statistical Machine Translation for a Resource-Poor Language
- Using Related Resource-Rich Languages",cs.CL," We propose a novel language-independent approach for improving machine
-translation for resource-poor languages by exploiting their similarity to
-resource-rich ones. More precisely, we improve the translation from a
-resource-poor source language X_1 into a resource-rich language Y given a
-bi-text containing a limited number of parallel sentences for X_1-Y and a
-larger bi-text for X_2-Y for some resource-rich language X_2 that is closely
-related to X_1. This is achieved by taking advantage of the opportunities that
-vocabulary overlap and similarities between the languages X_1 and X_2 in
-spelling, word order, and syntax offer: (1) we improve the word alignments for
-the resource-poor language, (2) we further augment it with additional
-translation options, and (3) we take care of potential spelling differences
-through appropriate transliteration. The evaluation for Indonesian->English
-using Malay and for Spanish->English using Portuguese (pretending Spanish is
-resource-poor) shows an absolute gain of up to 1.35 and 3.37 BLEU points,
-respectively, which is an improvement over the best rivaling approaches, while
-using much less additional data. Overall, our method cuts the amount of
-necessary ""real"" training data by a factor of 2--5.
-"
-1047,1401.6984,Yajie Miao,Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN,cs.LG cs.CL," The Kaldi toolkit is becoming popular for constructing automated speech
-recognition (ASR) systems. Meanwhile, in recent years, deep neural networks
-(DNNs) have shown state-of-the-art performance on various ASR tasks. This
-document describes our open-source recipes to implement fully-fledged DNN
-acoustic modeling using Kaldi and PDNN. PDNN is a lightweight deep learning
-toolkit developed under the Theano environment. Using these recipes, we can
-build up multiple systems including DNN hybrid systems, convolutional neural
-network (CNN) systems and bottleneck feature systems. These recipes are
-directly based on the Kaldi Switchboard 110-hour setup. However, adapting them
-to new datasets is easy to achieve.
-"
-1048,1401.7077,"Gerardo Febres, Klaus Jaffe",Quantifying literature quality using complexity criteria,cs.CL," We measured entropy and symbolic diversity for English and Spanish texts,
-including texts by Nobel laureates in literature and by other famous authors.
-Entropy, symbol diversity and symbol frequency profiles were compared for these
-four groups. We also built a scale sensitive to the quality of writing and
-evaluated its relationship with Flesch's readability index for English and
-Szigriszt's perspicuity index for Spanish. Results suggest a correlation of
-entropy and word diversity with quality of writing. Text genre also influences
-the resulting entropy and diversity of the text. Results suggest the
-plausibility of automated quality assessment of texts.
-"
-1049,1401.8269,Peter D. Turney and Saif M. Mohammad,Experiments with Three Approaches to Recognizing Lexical Entailment,cs.CL cs.AI cs.LG," Inference in natural language often involves recognizing lexical entailment
-(RLE); that is, identifying whether one word entails another. For example,
-""buy"" entails ""own"".
Two general strategies for RLE have been proposed: One -strategy is to manually construct an asymmetric similarity measure for context -vectors (directional similarity) and another is to treat RLE as a problem of -learning to recognize semantic relations using supervised machine learning -techniques (relation classification). In this paper, we experiment with two -recent state-of-the-art representatives of the two general strategies. The -first approach is an asymmetric similarity measure (an instance of the -directional similarity strategy), designed to capture the degree to which the -contexts of a word, a, form a subset of the contexts of another word, b. The -second approach (an instance of the relation classification strategy) -represents a word pair, a:b, with a feature vector that is the concatenation of -the context vectors of a and b, and then applies supervised learning to a -training set of labeled feature vectors. Additionally, we introduce a third -approach that is a new instance of the relation classification strategy. The -third approach represents a word pair, a:b, with a feature vector in which the -features are the differences in the similarities of a and b to a set of -reference words. All three approaches use vector space models (VSMs) of -semantics, based on word-context matrices. We perform an extensive evaluation -of the three approaches using three different datasets. The proposed new -approach (similarity differences) performs significantly better than the other -two approaches on some datasets and there is no dataset for which it is -significantly worse. Our results suggest it is beneficial to make connections -between the research in lexical entailment and the research in semantic -relation classification. -" -1050,1402.0543,Jan Koeman and William Rea,How Does Latent Semantic Analysis Work? A Visualisation Approach,cs.CL cs.IR," By using a small example, an analogy to photographic compression, and a -simple visualization using heatmaps, we show that latent semantic analysis -(LSA) is able to extract what appears to be semantic meaning of words from a -set of documents by blurring the distinctions between the words. -" -1051,1402.0556,"Vahed Qazvinian, Dragomir R. Radev, Saif M. Mohammad, Bonnie Dorr, - David Zajic, Michael Whidby, Taesun Moon",Generating Extractive Summaries of Scientific Paradigms,cs.IR cs.CL," Researchers and scientists increasingly find themselves in the position of -having to quickly understand large amounts of technical material. Our goal is -to effectively serve this need by using bibliometric text mining and -summarization techniques to generate summaries of scientific literature. We -show how we can use citations to produce automatically generated, readily -consumable, technical extractive summaries. We first propose C-LexRank, a model -for summarizing single scientific articles based on citations, which employs -community detection and extracts salient information-rich sentences. Next, we -further extend our experiments to summarize a set of papers, which cover the -same scientific topic. We generate extractive summaries of a set of Question -Answering (QA) and Dependency Parsing (DP) papers, their abstracts, and their -citation sentences and show that citations have unique information amenable to -creating a summary. -" -1052,1402.0563,"Marta R. Costa-juss\`a, Carlos A. Henr\'iquez, Rafael E. 
Banchs","Evaluating Indirect Strategies for Chinese-Spanish Statistical Machine
- Translation",cs.CL," Although Chinese and Spanish are two of the most spoken languages in the
-world, not much research has been done in machine translation for this language
-pair. This paper focuses on investigating the state-of-the-art of
-Chinese-to-Spanish statistical machine translation (SMT), which nowadays is one
-of the most popular approaches to machine translation. For this purpose, we
-report details of the available parallel corpora, which are the Basic
-Traveller Expressions Corpus (BTEC), the Holy Bible and the United Nations
-(UN) corpus. Additionally, we conduct experimental work with the largest of
-these three corpora to explore alternative SMT strategies by means of a pivot
-language. Three alternatives are considered for pivoting: cascading,
-pseudo-corpus and triangulation. As pivot language, we use either English,
-Arabic or French. Results show that, for a phrase-based SMT system, English is
-the best pivot language between Chinese and Spanish. We propose a system
-output combination using the pivot strategies which is capable of
-outperforming the direct translation strategy. The main objective of this work
-is to motivate and involve the research community in working on this important
-language pair, given its demographic impact.
-"
-1053,1402.0574,"Kira Radinsky, Sagie Davidovich, Shaul Markovitch",Learning to Predict from Textual Data,cs.CL cs.AI cs.IR," Given a current news event, we tackle the problem of generating plausible
-predictions of future events it might cause. We present a new methodology for
-modeling and predicting such future news events using machine learning and data
-mining techniques. Our Pundit algorithm generalizes examples of causality pairs
-to infer a causality predictor. To obtain precisely labeled causality examples,
-we mine 150 years of news articles and apply semantic natural language modeling
-techniques to headlines containing certain predefined causality patterns. For
-generalization, the model uses a vast number of world knowledge ontologies.
-Empirical evaluation on real news articles shows that our Pundit algorithm
-performs as well as non-expert humans.
-"
-1054,1402.0578,"Maytham Alabbas, Allan Ramsay","Natural Language Inference for Arabic Using Extended Tree Edit Distance
- with Subtrees",cs.CL," Many natural language processing (NLP) applications require the computation
-of similarities between pairs of syntactic or semantic trees. Many researchers
-have used tree edit distance for this task, but this technique suffers from the
-drawback that it deals with single-node operations only. We have extended the
-standard tree edit distance algorithm to deal with subtree transformation
-operations as well as single nodes. The extended algorithm with subtree
-operations, TED+ST, is more effective and flexible than the standard algorithm,
-especially for applications that pay attention to relations among nodes (e.g.
-in linguistic trees, deleting a modifier subtree should be cheaper than the sum
-of deleting its components individually). We describe the use of TED+ST for
-checking entailment between two Arabic text snippets. The preliminary results
-of using TED+ST were encouraging when compared with two string-based approaches
-and with the standard algorithm.
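-  To make the cost intuition concrete, the toy sketch below contrasts the
-standard per-node deletion charge with a single discounted subtree operation.
-The cost constants, the (label, children) tree encoding and the example
-subtree are illustrative assumptions; this is the pricing idea TED+ST builds
-on, not the TED+ST algorithm itself.
-
-NODE_DELETE_COST = 1.0
-SUBTREE_DISCOUNT = 0.5  # assumed discount for a whole-subtree operation
-
-def size(tree):
-    # A tree is a (label, children) pair; count all of its nodes.
-    label, children = tree
-    return 1 + sum(size(c) for c in children)
-
-def node_by_node_delete_cost(tree):
-    # The standard algorithm charges one deletion per node.
-    return NODE_DELETE_COST * size(tree)
-
-def subtree_delete_cost(tree):
-    # A single subtree operation: cheaper than the sum of its parts,
-    # yet still growing with size so large deletions are not free.
-    return NODE_DELETE_COST + SUBTREE_DISCOUNT * (size(tree) - 1)
-
-modifier = ('ADJP', [('very', []), ('old', [])])  # a modifier subtree
-print(node_by_node_delete_cost(modifier))  # 3.0: delete nodes one by one
-print(subtree_delete_cost(modifier))       # 2.0: one subtree deletion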
-"
-1055,1402.0586,"Shafiq Rayhan Joty, Giuseppe Carenini, Raymond T Ng",Topic Segmentation and Labeling in Asynchronous Conversations,cs.CL," Topic segmentation and labeling is often considered a prerequisite for
-higher-level conversation analysis and has been shown to be useful in many
-Natural Language Processing (NLP) applications. We present two new corpora of
-email and blog conversations annotated with topics, and evaluate annotator
-reliability for the segmentation and labeling tasks in these asynchronous
-conversations. We propose a complete computational framework for topic
-segmentation and labeling in asynchronous conversations. Our approach extends
-state-of-the-art methods by considering a fine-grained structure of an
-asynchronous conversation, along with other conversational features by applying
-recent graph-based methods for NLP. For topic segmentation, we propose two
-novel unsupervised models that exploit the fine-grained conversational
-structure, and a novel graph-theoretic supervised model that combines lexical,
-conversational and topic features. For topic labeling, we propose two novel
-(unsupervised) random walk models that respectively capture conversation
-specific clues from two different sources: the leading sentences and the
-fine-grained conversational structure. Empirical evaluation shows that the
-segmentation and the labeling performed by our best models beat the
-state-of-the-art, and are highly correlated with human annotations.
-"
-1056,1402.1128,"Ha\c{s}im Sak, Andrew Senior, Fran\c{c}oise Beaufays","Long Short-Term Memory Based Recurrent Neural Network Architectures for
- Large Vocabulary Speech Recognition",cs.NE cs.CL cs.LG stat.ML," Long Short-Term Memory (LSTM) is a recurrent neural network (RNN)
-architecture that has been designed to address the vanishing and exploding
-gradient problems of conventional RNNs. Unlike feedforward neural networks,
-RNNs have cyclic connections making them powerful for modeling sequences. They
-have been successfully used for sequence labeling and sequence prediction
-tasks, such as handwriting recognition, language modeling, and phonetic
-labeling of acoustic frames. However, in contrast to deep neural networks, the
-use of RNNs in speech recognition has been limited to phone recognition in
-small-scale tasks. In this paper, we present novel LSTM based RNN architectures
-which make more effective use of model parameters to train acoustic models for
-large vocabulary speech recognition. We train and compare LSTM, RNN and DNN
-models at various numbers of parameters and configurations. We show that LSTM
-models converge quickly and give state-of-the-art speech recognition
-performance for relatively small-sized models.
-"
-1057,1402.1454,"Sarath Chandar A P, Stanislas Lauly, Hugo Larochelle, Mitesh M.
- Khapra, Balaraman Ravindran, Vikas Raykar, Amrita Saha",An Autoencoder Approach to Learning Bilingual Word Representations,cs.CL cs.LG stat.ML," Cross-language learning allows us to use training data from one language to
-build models for a different language. Many approaches to bilingual learning
-require that we have word-level alignment of sentences from parallel corpora.
-In this work we explore the use of autoencoder-based methods for cross-language
-learning of vectorial word representations that are aligned between two
-languages, while not relying on word-level alignments.
We show that by simply
-learning to reconstruct the bag-of-words representations of aligned sentences,
-within and between languages, we can in fact learn high-quality representations
-and do without word alignments. Since training autoencoders on word
-observations presents certain computational issues, we propose and compare
-different variations adapted to this setting. We also propose an explicit
-correlation maximizing regularizer that leads to significant improvement in the
-performance. We empirically investigate the success of our approach on the
-problem of cross-language text classification, where a classifier trained on a
-given language (e.g., English) must learn to generalize to a different language
-(e.g., German). These experiments demonstrate that our approaches are
-competitive with the state-of-the-art, achieving up to 10-14 percentage point
-improvements over the best reported results on this task.
-"
-1058,1402.1668,"John David Osborne, Binod Gyawali, Thamar Solorio",Evaluation of YTEX and MetaMap for clinical concept recognition,cs.IR cs.CL," We used MetaMap and YTEX as a basis for the construction of two separate
-systems to participate in the 2013 ShARe/CLEF eHealth Task 1 [9], the
-recognition of clinical concepts. No modifications were directly made to these
-systems, but output concepts were filtered using stop concepts, stop concept
-text and UMLS semantic type. Concept boundaries were also adjusted using a
-small collection of rules to increase precision on the strict task. Overall
-MetaMap had better performance than YTEX on the strict task, primarily due to
-a 20% performance improvement in precision. In the relaxed task YTEX had
-better performance in both precision and recall giving it an overall F-Score
-4.6% higher than MetaMap on the test data. Our results also indicated a 1.3%
-higher accuracy for YTEX in UMLS CUI mapping.
-"
-1059,1402.1939,Xiao-Yong Yan and Petter Minnhagen,"Maximum Entropy, Word-Frequency, Chinese Characters, and Multiple
- Meanings",physics.soc-ph cs.CL," The word-frequency distribution of a text written by an author is well
-accounted for by a maximum entropy distribution, the RGF (random group
-formation)-prediction. The RGF-distribution is completely determined by the a
-priori values of the total number of words in the text (M), the number of
-distinct words (N) and the number of repetitions of the most common word
-(k_max). It is here shown that this maximum entropy prediction also describes a
-text written in Chinese characters. In particular it is shown that although the
-same Chinese text written in words and Chinese characters have quite
-differently shaped distributions, they are nevertheless both well predicted by
-their respective three a priori characteristic values. It is pointed out that
-this is analogous to the change in the shape of the distribution when
-translating a given text to another language. Another consequence of the
-RGF-prediction is that taking a part of a long text will change the input
-parameters (M, N, k_max) and consequently also the shape of the frequency
-distribution. This is explicitly confirmed for texts written in Chinese
-characters. Since the RGF-prediction has no system-specific information beyond
-the three a priori values (M, N, k_max), any specific language characteristic
-has to be sought in systematic deviations from the RGF-prediction and the
-measured frequencies.
One such systematic deviation is identified and, through
-a statistical information theoretical argument and an extended RGF-model, it is
-proposed that this deviation is caused by multiple meanings of Chinese
-characters. The effect is stronger for Chinese characters than for Chinese
-words. The relation between Zipf's law, the Simon-model for texts and the
-present results is discussed.
-"
-1060,1402.2427,"Jan Hauffa, Tobias Lichtenberg, Georg Groh","An evaluation of keyword extraction from online communication for the
- characterisation of social relations",cs.SI cs.CL cs.IR," The set of interpersonal relationships on a social network service or a
-similar online community is usually highly heterogeneous. The concept of tie
-strength captures only one aspect of this heterogeneity. Since the unstructured
-text content of online communication artefacts is a salient source of
-information about a social relationship, we investigate the utility of keywords
-extracted from the message body as a representation of the relationship's
-characteristics as reflected by the conversation topics. Keyword extraction is
-performed using standard natural language processing methods. Communication
-data and human assessments of the extracted keywords are obtained from Facebook
-users via a custom application. The overall positive quality assessment
-provides evidence that the keywords indeed convey relevant information about
-the relationship.
-"
-1061,1402.2561,"Tiziano Flati, Roberto Navigli","The CQC Algorithm: Cycling in Graphs to Semantically Enrich and Enhance
- a Bilingual Dictionary",cs.CL," Bilingual machine-readable dictionaries are knowledge resources useful in
-many automatic tasks. However, compared to monolingual computational lexicons
-like WordNet, bilingual dictionaries typically provide a lower amount of
-structured information, such as lexical and semantic relations, and often do
-not cover the entire range of possible translations for a word of interest. In
-this paper we present Cycles and Quasi-Cycles (CQC), a novel algorithm for the
-automated disambiguation of ambiguous translations in the lexical entries of a
-bilingual machine-readable dictionary. The dictionary is represented as a
-graph, and cyclic patterns are sought in the graph to assign an appropriate
-sense tag to each translation in a lexical entry. Further, we use the
-algorithm's output to improve the quality of the dictionary itself, by
-suggesting accurate solutions to structural problems such as misalignments,
-partial alignments and missing entries. Finally, we successfully apply CQC to
-the task of synonym extraction.
-"
-1062,1402.2562,"Nathalie Chaignaud (LITIS), Val\'erie Delavigne (LiDiFra), Maryvonne
- Holzem (LiDiFra), Jean-Philippe Kotowicz (LITIS), Alain Loisel (LITIS)","\'Etude cognitive des processus de construction d'une requ\^ete dans un
- syst\`eme de gestion de connaissances m\'edicales",cs.IR cs.CL," This article presents the Cogni-CISMeF project, which aims at improving
-medical information search in the CISMeF system (Catalog and Index of
-French-language health resources) by including a conversational agent to
-interact with the user in natural language. To study the cognitive processes
-involved during the information search, a bottom-up methodology was adopted.
-Experimentation has been set up to obtain human dialogs between a user (playing
-the role of a patient) dealing with medical information search and a CISMeF
-expert refining the request.
The analysis of these dialogs underlined the use
-of discursive evidence: vocabulary, reformulation, implicit or explicit
-expression of user intentions, conversational sequences, etc. A model of an
-artificial agent is proposed. It guides the user in the information search by
-proposing examples, assistance and choices. This model was implemented and
-integrated in the CISMeF system.
-"
-1063,1402.2796,Fabio Celli and Massimo Poesio,"PR2: A Language Independent Unsupervised Tool for Personality
- Recognition from Text",cs.CL," We present PR2, a personality recognition system available online, that
-performs instance-based classification of Big5 personality types from
-unstructured text, using language-independent features. It has been tested on
-English and Italian, achieving performances up to f=.68.
-"
-1064,1402.3040,"Jia-Fei Hong, Kathleen Ahrens, Chu-Ren Huang",Event Structure of Transitive Verb: A MARVS perspective,cs.CL," Module-Attribute Representation of Verbal Semantics (MARVS) is a theory of
-the representation of verbal semantics that is based on Mandarin Chinese data
-(Huang et al. 2000). In the MARVS theory, there are two different types of
-modules: Event Structure Modules and Role Modules. There are also two sets of
-attributes: Event-Internal Attributes and Role-Internal Attributes, which are
-linked to the Event Structure Module and the Role Module, respectively. In this
-study, we focus on four transitive verbs, chi1 (eat), wan2 (play), huan4
-(change) and shao1 (burn), and explore their event structures using the MARVS
-theory.
-"
-1065,1402.3080,Santhy Viswam and Sajeer Karattil,Software Requirement Specification Using Reverse Speech Technology,cs.CL cs.SD," Speech analysis has been taken to a new level with the discovery of Reverse
-Speech (RS). RS is the discovery of hidden messages, referred to as reversals,
-in normal speech. Work is in progress on exploiting the relevance of RS in
-different real-world applications such as investigation and the medical field.
-In this paper we present an innovative method for preparing a reliable
-Software Requirement Specification (SRS) document with the help of reverse
-speech. As the SRS acts as the backbone for the successful completion of any
-project, a reliable method is needed to overcome inconsistencies. Using RS,
-such a reliable method for SRS documentation was developed.
-"
-1066,1402.3371,"Andrea Ballatore, Michela Bertolotto, David C.
Wilson",An evaluative baseline for geo-semantic relatedness and similarity,cs.CL," In geographic information science and semantics, the computation of semantic
-similarity is widely recognised as key to supporting a vast number of tasks in
-information integration and retrieval. By contrast, the role of geo-semantic
-relatedness has been largely ignored. In natural language processing, semantic
-relatedness is often confused with the more specific semantic similarity. In
-this article, we discuss a notion of geo-semantic relatedness based on Lehrer's
-semantic fields, and we compare it with geo-semantic similarity. We then
-describe and validate the Geo Relatedness and Similarity Dataset (GeReSiD), a
-new open dataset designed to evaluate computational measures of geo-semantic
-relatedness and similarity. This dataset is larger than existing datasets of
-this kind, and includes 97 geographic terms combined into 50 term pairs rated
-by 203 human subjects. GeReSiD is available online and can be used as an
-evaluation baseline to determine empirically to what degree a given
-computational model approximates geo-semantic relatedness and similarity.
-"
-1067,1402.3382,"K.Rajan, Dr.V.Ramalingam, Dr.M.Ganesan","Machine Learning of Phonologically Conditioned Noun Declensions For
- Tamil Morphological Generators",cs.CL," This paper presents machine learning solutions to a practical problem of
-Natural Language Generation (NLG), particularly word formation in
-agglutinative languages like Tamil, in a supervised manner. The morphological
-generator is an important component of Natural Language Processing in
-Artificial Intelligence. It generates word forms given a root and affixes.
-Morphophonemic changes like addition, deletion and alternation occur when two
-or more morphemes or words are joined together. The Sandhi rules should be
-explicitly specified in rule-based morphological analyzers and generators. In
-a machine learning framework, these rules can be learned automatically by the
-system from training samples and subsequently be applied to new inputs. In
-this paper we propose machine learning models which learn the morphophonemic
-rules for noun declensions from the given training data. These models are
-trained to learn sandhi rules using various learning algorithms, and the
-performance of those algorithms is presented. From this we conclude that
-morphological processing such as word-form generation can be successfully
-learned in a supervised manner, without an explicit description of rules. The
-performance of decision tree and Bayesian machine learning algorithms on noun
-declensions is discussed.
-"
-1068,1402.3405,"Daniele Cerra, Mihai Datcu, and Peter Reinartz",Authorship Analysis based on Data Compression,cs.CL cs.DL cs.IR stat.ML," This paper proposes to perform authorship analysis using the Fast Compression
-Distance (FCD), a similarity measure based on compression with dictionaries
-directly extracted from the written texts. The FCD computes a similarity
-between two documents through an effective binary search on the intersection
-set between the two related dictionaries. In the reported experiments the
-proposed method is applied to documents which are heterogeneous in style,
-written in five different languages and coming from different historical
-periods. Results are comparable to the state of the art and outperform
-traditional compression-based methods.
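-  As an illustration of the dictionary-intersection idea, the sketch below
-extracts an LZ78-style phrase dictionary from each text and scores documents
-by how much of one dictionary the other covers. The parsing scheme and the
-normalisation are assumptions made for brevity, not the paper's exact FCD
-implementation (which relies on an efficient binary search over the
-intersection set).
-
-def lz78_dictionary(text):
-    # Greedy LZ78 parse: grow a phrase until it is new, then store it.
-    dictionary, phrase = set(), ''
-    for ch in text:
-        phrase += ch
-        if phrase not in dictionary:
-            dictionary.add(phrase)
-            phrase = ''
-    return dictionary
-
-def fcd(x, y):
-    # Fraction of x's phrases absent from y's dictionary; 0 means y's
-    # dictionary fully covers x, larger values mean greater distance.
-    dx, dy = lz78_dictionary(x), lz78_dictionary(y)
-    return (len(dx) - len(dx & dy)) / len(dx)
-
-a = 'the cat sat on the mat ' * 4
-b = 'the cat sat on the hat ' * 4
-c = 'colourless green ideas sleep furiously ' * 4
-print(fcd(a, b) < fcd(a, c))  # True: a is closer in style to b than to c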
-"
-1069,1402.3648,"Shikha Kabra, Ritika Agarwal",Auto Spell Suggestion for High Quality Speech Synthesis in Hindi,cs.CL cs.SD," The goal of Text-to-Speech (TTS) synthesis in a particular language is to
-convert arbitrary input text to intelligible and natural-sounding speech.
-However, for a language like Hindi, where very close spellings make words easy
-to confuse, it is not an easy task to identify errors/mistakes in the input
-text, and incorrect text degrades the quality of the output speech. This paper
-therefore contributes to the development of high-quality speech synthesis by
-involving a spellchecker which generates spelling suggestions for misspelled
-words automatically. Involving a spellchecker increases the efficiency of
-speech synthesis by providing spelling suggestions for incorrect input text.
-Furthermore, we provide a comparative study evaluating the resulting effect on
-the phonetic text of adding the spellchecker to the input text.
-"
-1070,1402.3722,Yoav Goldberg and Omer Levy,"word2vec Explained: deriving Mikolov et al.'s negative-sampling
- word-embedding method",cs.CL cs.LG stat.ML," The word2vec software of Tomas Mikolov and colleagues
-(https://code.google.com/p/word2vec/) has gained a lot of traction lately, and
-provides state-of-the-art word embeddings. The learning models behind the
-software are described in two research papers. We found the description of the
-models in these papers to be somewhat cryptic and hard to follow. While the
-motivations and presentation may be obvious to the neural-networks
-language-modeling crowd, we had to struggle quite a bit to figure out the
-rationale behind the equations.
- This note is an attempt to explain equation (4) (negative sampling) in
-""Distributed Representations of Words and Phrases and their Compositionality""
-by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean.
-"
-1071,1402.3891,Vinodhini G Chandrasekaran RM,"Performance Evaluation of Machine Learning Classifiers in Sentiment
- Mining",cs.LG cs.CL cs.IR," In recent years, the use of machine learning classifiers is of great value in
-solving a variety of problems in text classification. Sentiment mining is a
-kind of text classification in which messages are classified according to
-sentiment orientation such as positive or negative. This paper extends the idea
-of evaluating the performance of various classifiers to show their
-effectiveness in sentiment mining of online product reviews. The product
-reviews are collected from Amazon. To evaluate the performance of the
-classifiers, various evaluation methods like random sampling, linear sampling
-and bootstrap sampling are used. Our results show that a support vector machine
-with the bootstrap sampling method outperforms the other classifiers and
-sampling methods in terms of misclassification rate.
-"
-1072,1402.4259,Roberto Marazzato and Amelia Carolina Sparavigna,"Extracting Networks of Characters and Places from Written Works with
- CHAPLIN",cs.CY cs.CL," We are proposing a tool able to gather information on social networks from
-narrative texts. Its name is CHAPLIN, CHAracters and PLaces Interaction
-Network, implemented in VB.NET. Characters and places of the narrative works
-are extracted into a list of raw words. Aided by the interface, the user
-selects names out of them.
After this choice, the tool allows the user to enter some
-parameters, and, according to them, creates a network where the nodes are the
-characters and places, and the edges their interactions. Edges are labelled by
-performances. The output is a GV file, written in the DOT graph scripting
-language, which is rendered by means of the free open source software Graphviz.
-"
-1073,1402.4380,"Samuel Danso, Eric Atwell and Owen Johnson","A Comparative Study of Machine Learning Methods for Verbal Autopsy Text
- Classification",cs.CL," A Verbal Autopsy is the record of an interview about the circumstances of an
-uncertified death. In developing countries, if a death occurs away from health
-facilities, a field-worker interviews a relative of the deceased about the
-circumstances of the death; this Verbal Autopsy can be reviewed off-site. We
-report on a comparative study of the processes involved in Text Classification
-applied to classifying Cause of Death: feature value representation; machine
-learning classification algorithms; and feature reduction strategies in order
-to identify the suitable approaches applicable to the classification of Verbal
-Autopsy text. We demonstrate that normalised term frequency and the standard
-TF-IDF achieve comparable performance across a number of classifiers. The
-results also show that the Support Vector Machine is superior to the other
-classification algorithms employed in this research. Finally, we demonstrate
-the effectiveness of employing a ""locally-semi-supervised"" feature reduction
-strategy in order to increase accuracy.
-"
-1074,1402.4678,Yelena Mandelshtam and Natalia Komarova,"When Learners Surpass their Sources: Mathematical Modeling of Learning
- from an Inconsistent Source",cs.CL," We present a new algorithm to model and investigate the learning process of a
-learner mastering a set of grammatical rules from an inconsistent source. The
-compelling interest of human language acquisition is that the learning succeeds
-in virtually every case, despite the fact that the input data are formally
-inadequate to explain the success of learning. Our model explains how a learner
-can successfully learn from or even surpass its imperfect source without
-possessing any additional biases or constraints about the types of patterns
-that exist in the language. We use the data collected by Singleton and Newport
-(2004) on the performance of a 7-year-old boy, Simon, who mastered American
-Sign Language (ASL) by learning it from his parents, both of whom were
-imperfect speakers of ASL. We show that the algorithm possesses a
-frequency-boosting property, whereby the frequency of the most common form of
-the source is increased by the learner. We also explain several key features
-of Simon's ASL.
-"
-1075,1402.4802,Ricard V. Sol\'e and Lu\'is F. Seoane,Ambiguity in language networks,physics.soc-ph cs.CL q-bio.NC," Human language defines the most complex outcomes of evolution. The emergence
-of such an elaborated form of communication allowed humans to create extremely
-structured societies and manage symbols at different levels including, among
-others, semantics. All linguistic levels have to deal with an astronomic
-combinatorial potential that stems from the recursive nature of languages. This
-recursiveness is indeed a key defining trait. However, not all words are
-equally combined nor frequent. In breaking the symmetry between less and more
-often used and between less and more meaning-bearing units, universal scaling
-laws arise.
Such laws, common to all human languages, appear at different
-stages, from word inventories to networks of interacting words. Among these
-seemingly universal traits exhibited by language networks, ambiguity appears to
-be an especially relevant component. Ambiguity is avoided in most computational
-approaches to language processing, and yet it seems to be a crucial element of
-language architecture. Here we review the evidence both from language network
-architecture and from theoretical reasoning based on a least-effort argument.
-Ambiguity is shown to play an essential role in providing a source of language
-efficiency, and is likely to be an inevitable byproduct of network growth.
-"
-1076,1402.5123,"Abdelmalek Amine, Reda Mohamed Hamou, Michel Simonet",Detecting Opinions in Tweets,cs.CL cs.SI," Given the incessant growth of documents describing the opinions of different
-people circulating on the web, Web 2.0 has made it possible to give an opinion
-on any product on the net. In this paper, we examine the various opinions
-expressed in tweets and classify them as positive, negative or neutral, using
-emoticons for the Bayesian method and adjectives and adverbs for Turney's
-method.
-"
-1077,1402.6010,Linhong Zhu and Aram Galstyan and James Cheng and Kristina Lerman,"Tripartite Graph Clustering for Dynamic Sentiment Analysis on Social
- Media",cs.SI cs.CL cs.IR," The growing popularity of social media (e.g., Twitter) allows users to easily
-share information with each other and influence others by expressing their own
-sentiments on various subjects. In this work, we propose an unsupervised
-\emph{tri-clustering} framework, which analyzes both user-level and tweet-level
-sentiments through co-clustering of a tripartite graph. A compelling feature of
-the proposed framework is that the quality of sentiment clustering of tweets,
-users, and features can be mutually improved by joint clustering. We further
-investigate the evolution of user-level sentiments and latent feature vectors
-in an online framework and devise an efficient online algorithm to sequentially
-update the clustering of tweets, users and features with newly arrived data.
-The online framework not only provides better quality of both dynamic
-user-level and tweet-level sentiment analysis, but also improves the
-computational and storage efficiency. We verified the effectiveness and
-efficiency of the proposed approaches on the November 2012 California ballot
-Twitter data.
-"
-1078,1402.6238,"Jobin Wilson, Santanu Chaudhury, Brejesh Lall, Prateek Kapadia","Improving Collaborative Filtering based Recommenders using Topic
- Modelling",cs.IR cs.CL cs.LG," Standard Collaborative Filtering (CF) algorithms make use of interactions
-between users and items in the form of implicit or explicit ratings alone for
-generating recommendations. Similarity among users or items is calculated
-purely based on rating overlap in this case, without considering explicit
-properties of users or items involved, limiting their applicability in domains
-with very sparse rating spaces.
In many domains such as movies, news or
-electronic commerce recommenders, considerable contextual data in text form
-describing item properties is available along with the rating data, which could
-be utilized to improve recommendation quality. In this paper, we propose a
-novel approach to improve standard CF based recommenders by utilizing latent
-Dirichlet allocation (LDA) to learn latent properties of items, expressed in
-terms of topic proportions, derived from their textual description. We infer
-a user's topic preferences, or persona, in the same latent space, based on her
-historical ratings. While computing similarity between users, we make use of a
-combined similarity measure involving rating overlap as well as similarity in
-the latent topic space. This approach alleviates the sparsity problem, as it
-allows calculation of similarity between users even if they have not rated any
-items in common. Our experiments on multiple public datasets indicate that the
-proposed hybrid approach significantly outperforms standard user-based and
-item-based CF recommenders in terms of classification accuracy metrics such as
-precision, recall and f-measure.
-"
-1079,1402.6516,Greg Dubbin and Phil Blunsom,Modelling the Lexicon in Unsupervised Part of Speech Induction,cs.CL," Automatically inducing the syntactic part-of-speech categories for words in
-text is a fundamental task in Computational Linguistics. While the performance
-of unsupervised tagging models has been slowly improving, current
-state-of-the-art systems make the obviously incorrect assumption that all
-tokens of a given word type must share a single part-of-speech tag. This
-one-tag-per-type heuristic counters the tendency of Hidden Markov Model based
-taggers to overgenerate tags for a given word type. However, it is clearly
-incompatible with basic syntactic theory. In this paper we extend a
-state-of-the-art Pitman-Yor Hidden Markov Model tagger with an explicit model
-of the lexicon. In doing so we are able to incorporate a soft bias towards
-inducing few tags per type. We develop a particle filter for drawing samples
-from the posterior of our model and present empirical results that show that
-our model is competitive with and faster than the state-of-the-art without
-making any unrealistic restrictions.
-"
-1080,1402.6690,"Jalal Mahmud, Jilin Chen, Jeffrey Nichols",Why Are You More Engaged? Predicting Social Engagement from Word Use,cs.SI cs.CL cs.CY," We present a study to analyze how word use can predict social engagement
-behaviors such as replies and retweets in Twitter. We compute psycholinguistic
-category scores from word usage, and investigate how people with different
-scores exhibited different reply and retweet behaviors on Twitter. We also
-found psycholinguistic categories that show significant correlations with such
-social engagement behaviors. In addition, we have built predictive models of
-replies and retweets from such psycholinguistic category based features. Our
-experiments using a real-world dataset collected from Twitter validate that
-such predictions can be done with reasonable accuracy.
-"
-1081,1402.6764,"Hazlina Haron, Abdul Azim Abd. Ghani","A method to identify potential ambiguous Malay words through Ambiguity
- Attributes mapping: An exploratory Study",cs.SE cs.CL," We describe here a methodology to identify a list of ambiguous Malay words
-that are commonly used in Malay documentation such as requirement
-specifications.
We compiled several relevant and appropriate requirement quality
-attributes and sentence rules from the previous literature and adapted them to
-come out with a set of ambiguity attributes that best suit Malay words. The
-extracted, potentially ambiguous Malay words are then mapped onto the
-constructed ambiguity attributes to confirm their vagueness. The list is then
-verified by Malay linguist experts. This paper aims to identify a list of
-potentially ambiguous words in Malay as an attempt to assist writers in
-avoiding vague words while documenting Malay Requirement Specifications as
-well as any other related Malay documentation. The result of this study is a
-list of 120 potentially ambiguous Malay words that could act as guidelines in
-writing Malay sentences.
-"
-1082,1402.6792,"Lada A. Adamic, Thomas M. Lento, Eytan Adar, Pauline C. Ng",Information Evolution in Social Networks,cs.SI cs.CL physics.soc-ph," Social networks readily transmit information, albeit with less than perfect
-fidelity. We present a large-scale measurement of this imperfect information
-copying mechanism by examining the dissemination and evolution of thousands of
-memes, collectively replicated hundreds of millions of times in the online
-social network Facebook. The information undergoes an evolutionary process that
-exhibits several regularities. A meme's mutation rate characterizes the
-population distribution of its variants, in accordance with the Yule process.
-Variants further apart in the diffusion cascade have greater edit distance, as
-would be expected in an iterative, imperfect replication process. Some text
-sequences can confer a replicative advantage; these sequences are abundant and
-transfer ""laterally"" between different memes. Subpopulations of the social
-network can preferentially transmit a specific variant of a meme if the variant
-matches their beliefs or culture. Understanding the mechanism driving change in
-diffusing information has important implications for how we interpret and
-harness the information that reaches us through our social networks.
-"
-1083,1402.6880,M.T. Keane and A. Gerow,"It's distributions all the way down!: Second order changes in
- statistical distributions also occur",cs.CL," The textual, big-data literature misses Bentley, O'Brien, & Brock's (Bentley
-et al.'s) message on distributions; it largely examines the first-order effects
-of how a single, signature distribution can predict population behaviour,
-neglecting second-order effects involving distributional shifts, either between
-signature distributions or within a given signature distribution. Indeed,
-Bentley et al. themselves under-emphasise the potential richness of the latter,
-within-distribution effects.
-"
-1084,1402.7265,Yarin Gal,"Semantics, Modelling, and the Problem of Representation of Meaning -- a
- Brief Survey of Recent Literature",cs.CL," Over the past 50 years many have debated what representation should be used
-to capture the meaning of natural language utterances. Recently, new needs for
-such representations have been raised in research. Here I survey some of the
-interesting representations suggested to answer these new needs.
-"
-1085,1403.0052,"Laurent Romary (IDSL, INRIA Saclay - Ile de France, CMB)","TBX goes TEI -- Implementing a TBX basic extension for the Text Encoding
- Initiative guidelines",cs.CL," This paper presents an attempt to customise the TEI (Text Encoding
-Initiative) guidelines in order to offer the possibility of incorporating TBX
-(TermBase eXchange) based terminological entries within any kind of TEI
-document. After presenting the general historical, conceptual and technical
-contexts, we describe the various design choices we had to make while creating
-this customisation, which in turn led us to make various changes in the actual
-TBX serialisation. Keeping in mind the objective of providing the TEI
-guidelines with, again, an onomasiological model, we try to identify the best
-compromise between maintaining isomorphism with the existing TBX Basic
-standard and respecting the characteristics of the TEI framework.
-"
-1086,1403.0531,Josiah P. Zayner,"We Tweet Like We Talk and Other Interesting Observations: An Analysis of
- English Communication Modalities",cs.CL," Modalities of communication for human beings are gradually increasing in
-number with the advent of new forms of technology. Many human beings can
-readily transition between these different forms of communication with little
-or no effort, which brings about the question: How similar are these different
-communication modalities? To understand technology's influence on English
-communication, four different corpora were analyzed and compared: Writing from
-Books using the 1-grams database from the Google Books project, Twitter, IRC
-Chat, and transcribed Talking. Multi-word confusion matrices revealed that
-Talking has the most similarity when compared to the other modes of
-communication, while 1-grams were the least similar form of communication
-analyzed. Based on the analysis of word usage, word usage frequency
-distributions, and word class usage, among other things, Talking is also the
-most similar to Twitter and IRC Chat. This suggests that communicating using
-Twitter and IRC Chat evolved from Talking rather than Writing. When we
-communicate online, even though we are writing, we do not Tweet or Chat how we
-write books; we Tweet and Chat how we Speak. Nonfiction and Fiction writing
-were clearly differentiable from our analysis, with Twitter and Chat being much
-more similar to Fiction than Nonfiction writing. These hypotheses were then
-tested using author and journalist Cory Doctorow. Mr. Doctorow's Writing,
-Twitter usage, and Talking were all found to have vocabulary usage patterns
-very similar to those of the amalgamated populations, as long as the writing
-was Fiction. However, Mr. Doctorow's Nonfiction writing is different from
-1-grams and other collected Nonfiction writings. This data could perhaps be
-used to create more entertaining works of Nonfiction.
-"
-1087,1403.0541,Saadat Anwar,"Representing, reasoning and answering questions about biological
- pathways - various applications",cs.AI cs.CE cs.CL," Biological organisms are composed of numerous interconnected biochemical
-processes. Diseases occur when normal functionality of these processes is
-disrupted. Thus, understanding these biochemical processes and their
-interrelationships is a primary task in biomedical research and a prerequisite
-for diagnosing diseases and drug development. Scientists studying these
-processes have identified various pathways responsible for drug metabolism,
-signal transduction, etc.
- Newer techniques and speed improvements have resulted in deeper knowledge
-about these pathways, resulting in refined models that tend to be large and
-complex, making it difficult for a person to remember all aspects of them.
-Thus, computer models are needed to analyze them. We want to build such a
-system that allows modeling of biological systems and pathways in such a way
-that we can answer questions about them.
- Many existing models focus on structural and/or factoid questions, using
-surface-level knowledge that does not require understanding the underlying
-model. We believe these are not the kind of questions that a biologist may ask
-someone to test their understanding of the biological processes. We want our
-system to answer the kind of questions a biologist may ask. Such questions
-appear in early college level text books.
- Thus the main goal of our thesis is to develop a system that allows us to
-encode knowledge about biological pathways and answer such questions about
-them, demonstrating an understanding of the pathway. To that end, we develop a
-language that will allow posing such questions and illustrate the utility of
-our framework with various applications in the biological domain. We use some
-existing tools with modifications to accomplish our goal.
- Finally, we apply our system to real-world applications by extracting pathway
-knowledge from text and answering questions related to drug development.
-"
-1088,1403.0801,"Derrick Higgins, Chris Brew, Michael Heilman, Ramon Ziai, Lei Chen,
- Aoife Cahill, Michael Flor, Nitin Madnani, Joel Tetreault, Daniel Blanchard,
- Diane Napolitano, Chong Min Lee and John Blackmore","Is getting the right answer just about choosing the right words? The
- role of syntactically-informed features in short answer scoring",cs.CL," Developments in the educational landscape have spurred greater interest in
-the problem of automatically scoring short answer questions. A recent shared
-task on this topic revealed a fundamental divide in the modeling approaches
-that have been applied to this problem, with the best-performing systems split
-between those that employ a knowledge engineering approach and those that
-almost solely leverage lexical information (as opposed to higher-level
-syntactic information) in assigning a score to a given response. This paper
-aims to introduce the NLP community to the largest corpus currently available
-for short-answer scoring, provide an overview of methods used in the shared
-task using this data, and explore the extent to which more
-syntactically-informed features can contribute to the short answer scoring task
-in a way that avoids the question-specific manual effort of the knowledge
-engineering approach.
-"
-1089,1403.1194,Minoru Sasaki,"Latent Semantic Word Sense Disambiguation Using Global Co-occurrence
- Information",cs.CL cs.IR," In this paper, I propose a novel word sense disambiguation method based on
-global co-occurrence information using non-negative matrix factorization
-(NMF). When I calculate the dependency relation matrix, the existing method
-tends to produce a very sparse co-occurrence matrix from a small training set.
-Therefore, the NMF algorithm sometimes does not converge to desired solutions.
-To obtain a large number of co-occurrence relations, I propose to use
-co-occurrence frequencies of dependency relations between word features in the
-whole training set. This enables us to solve the data sparseness problem and
-induce more effective latent features.
To evaluate the
-efficiency of the word sense disambiguation method, I conduct experiments
-comparing it with two baseline methods. The results of the experiments show
-that this method is effective for word sense disambiguation in comparison with
-both baseline methods. Moreover, the proposed method obtains a stable effect
-by analyzing the global co-occurrence information.
-"
-1090,1403.1252,"Bryan Perozzi, Rami Al-Rfou, Vivek Kulkarni, Steven Skiena",Inducing Language Networks from Continuous Space Word Representations,cs.LG cs.CL cs.SI," Recent advancements in unsupervised feature learning have developed powerful
-latent representations of words. However, it is still not clear what makes one
-representation better than another and how we can learn the ideal
-representation. Understanding the structure of latent spaces attained is key to
-any future advancement in unsupervised learning. In this work, we introduce a
-new view of continuous space word representations as language networks. We
-explore two techniques to create language networks from learned features by
-inducing them for two popular word representation methods and examining the
-properties of their resulting networks. We find that the induced networks
-differ from other methods of creating language networks, and that they contain
-meaningful community structure.
-"
-1091,1403.1310,"M.A.C. Jiffriya, M.A.C. Akmal Jahan, R.G. Ragel and S. Deegalla","AntiPlag: Plagiarism Detection on Electronic Submissions of Text Based
- Assignments",cs.IR cs.CL cs.DL," Plagiarism is one of the growing issues in academia and is always a concern
-in Universities and other academic institutions. The situation is becoming even
-worse with the availability of ample resources on the web. This paper focuses
-on creating an effective and fast tool for plagiarism detection for text-based
-electronic assignments. Our plagiarism detection tool named AntiPlag is
-developed using the tri-gram sequence matching technique. Three sets of
-text-based assignments were tested by AntiPlag and the results were compared
-against an existing commercial plagiarism detection tool. AntiPlag showed
-better results in terms of false positives compared to the commercial tool due
-to the pre-processing steps performed in AntiPlag. In addition, to improve the
-detection latency, AntiPlag applies a data clustering technique making it four
-times faster than the commercial tool considered. AntiPlag could be used to
-isolate plagiarised text-based assignments from non-plagiarised assignments
-easily. Therefore, we present AntiPlag, a fast and effective tool for
-plagiarism detection on text-based electronic assignments.
-"
-1092,1403.1314,"R. G. Ragel, P. Herath and U. Senanayake",Authorship detection of SMS messages using unigrams,cs.CL cs.IR," SMS messaging is a popular medium of communication. Because of its
-popularity and privacy, it could be used for many illegal purposes.
-Additionally, since they are part of day-to-day life, SMSes can be used as
-evidence in many legal disputes. Since a cellular phone might be accessible to
-people close to the owner, it is important to establish the fact that the
-sender of the message is indeed the owner of the phone. For this purpose, the
-straightforward solutions seem to be the use of popular stylometric methods.
However, in
-comparison with the data used for stylometry in the literature, SMSes have
-unusual characteristics that make it hard or impossible to apply these
-methods in a conventional way. Our target is to come up with a method of
-authorship detection for SMS messages that still gives usable accuracy. We
-argue that, considering the methods of author attribution, the best method
-that can be applied to SMS messages is an n-gram method. To prove our point,
-we checked two different methods of distribution comparison with varying
-amounts of training and testing data. We specifically compare how well our
-algorithms work with small amounts of testing data and a large number of
-candidate authors (which we believe to be the real-world scenario) against
-controlled tests with fewer authors and selected SMSes with a large number of
-words. To counter the lack of information in an SMS message, we propose the
-method of stacking together a few SMSes.
-"
-1093,1403.1349,"Sam Anzaroot, Alexandre Passos, David Belanger, Andrew McCallum","Learning Soft Linear Constraints with Application to Citation Field
- Extraction",cs.CL cs.DL cs.IR," Accurately segmenting a citation string into fields for authors, titles,
-etc. is a challenging task because the output typically obeys various global
-constraints. Previous work has shown that modeling soft constraints, where
-the model is encouraged, but not required, to obey the constraints, can
-substantially improve segmentation performance. On the other hand, for
-imposing hard constraints, dual decomposition is a popular technique for
-efficient prediction given existing algorithms for unconstrained inference.
-We extend the technique to perform prediction subject to soft constraints.
-Moreover, with a technique for performing inference given soft constraints,
-it is easy to automatically generate large families of constraints and learn
-their costs with a simple convex optimization problem during training. This
-allows us to obtain substantial gains in accuracy on a new, challenging
-citation extraction dataset.
-"
-1094,1403.1451,"Arkaitz Zubiaga, Damiano Spina, Raquel Mart\'inez, V\'ictor Fresno",Real-Time Classification of Twitter Trends,cs.IR cs.CL cs.SI," Social media users give rise to social trends as they share about common
-interests, which can be triggered by different reasons. In this work, we
-explore the types of triggers that spark trends on Twitter, introducing a
-typology with the following four types: 'news', 'ongoing events', 'memes',
-and 'commemoratives'. While previous research has analyzed trending topics
-over the long term, we look at the earliest tweets that produce a trend, with
-the aim of categorizing trends early on. This would make it possible to
-provide a filtered subset of trends to end users. We analyze and experiment
-with a set of straightforward language-independent features based on the
-social spread of trends to categorize them into the introduced typology. Our
-method provides an efficient way to accurately categorize trending topics
-without the need for external data, enabling news organizations to discover
-breaking news in real-time, or to quickly identify viral memes that might
-enrich marketing decisions, among others. The analysis of social features
-also reveals patterns associated with each type of trend, such as tweets
-about ongoing events being shorter as many were likely sent from mobile
-devices, or memes having more retweets originating from a few trend-setters.
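A toy sketch of categorizing trends from such social-spread features (the feature values and labels are fabricated; the real system's features and classifier may differ):

```python
# Sketch: trend-type classification from language-independent social features.
from sklearn.linear_model import LogisticRegression

# Per trend: [avg tweet length, retweet ratio, distinct-author ratio]
X = [[60, 0.1, 0.9],    # ongoing event: short tweets, many distinct authors
     [95, 0.7, 0.2],    # meme: heavy retweeting from few trend-setters
     [120, 0.4, 0.6],   # news
     [55, 0.2, 0.8],
     [100, 0.8, 0.3],
     [130, 0.5, 0.5]]
y = ["event", "meme", "news", "event", "meme", "news"]

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[58, 0.15, 0.85]]))   # -> likely "event"
```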
-" -1095,1403.1618,"Maryam Mahmoodi, Mohammad Mahmoodi Varnamkhasti",Design a Persian Automated Plagiarism Detector (AMZPPD),cs.AI cs.CL," Currently there are lots of plagiarism detection approaches. But few of them -implemented and adapted for Persian languages. In this paper, our work on -designing and implementation of a plagiarism detection system based on -pre-processing and NLP technics will be described. And the results of testing -on a corpus will be presented. -" -1096,1403.1773,"Fred Morstatter, Nichola Lubold, Heather Pon-Barry, J\""urgen Pfeffer, - and Huan Liu",Finding Eyewitness Tweets During Crises,cs.CL cs.CY," Disaster response agencies have started to incorporate social media as a -source of fast-breaking information to understand the needs of people affected -by the many crises that occur around the world. These agencies look for tweets -from within the region affected by the crisis to get the latest updates of the -status of the affected region. However only 1% of all tweets are geotagged with -explicit location information. First responders lose valuable information -because they cannot assess the origin of many of the tweets they collect. In -this work we seek to identify non-geotagged tweets that originate from within -the crisis region. Towards this, we address three questions: (1) is there a -difference between the language of tweets originating within a crisis region -and tweets originating outside the region, (2) what are the linguistic patterns -that can be used to differentiate within-region and outside-region tweets, and -(3) for non-geotagged tweets, can we automatically identify those originating -within the crisis region in real-time? -" -1097,1403.2004,Michael Stewart,Natural Language Feature Selection via Cooccurrence,cs.CL," Specificity is important for extracting collocations, keyphrases, multi-word -and index terms [Newman et al. 2012]. It is also useful for tagging, ontology -construction [Ryu and Choi 2006], and automatic summarization of documents -[Louis and Nenkova 2011, Chali and Hassan 2012]. Term frequency and -inverse-document frequency (TF-IDF) are typically used to do this, but fail to -take advantage of the semantic relationships between terms [Church and Gale -1995]. The result is that general idiomatic terms are mistaken for specific -terms. We demonstrate use of relational data for estimation of term -specificity. The specificity of a term can be learned from its distribution of -relations with other terms. This technique is useful for identifying relevant -words or terms for other natural language processing tasks. -" -1098,1403.2124,Hannah Davis and Saif M. Mohammad,Generating Music from Literature,cs.CL," We present a system, TransProse, that automatically generates musical pieces -from text. TransProse uses known relations between elements of music such as -tempo and scale, and the emotions they evoke. Further, it uses a novel -mechanism to determine sequences of notes that capture the emotional activity -in the text. The work has applications in information visualization, in -creating audio-visual e-books, and in developing music apps. -" -1099,1403.2152,Robert John Freeman,Parsing using a grammar of word association vectors,cs.CL cs.NE," This paper was was first drafted in 2001 as a formalization of the system -described in U.S. patent U.S. 7,392,174. It describes a system for implementing -a parser based on a kind of cross-product over vectors of contextually similar -words. 
It is being published now in response to nascent interest
-in vector combination models of syntax and semantics. The method used
-aggressive substitution of contextually similar words and word groups to
-enable product vectors to stay in the same space as their operands and to
-make entire sentences comparable syntactically, and potentially semantically.
-The vectors generated had sufficient representational strength to generate
-parse trees at least comparable with those of contemporary symbolic parsers.
-"
-1100,1403.2345,"Jalal Mahmud, Jeffrey Nichols, Clemens Drews",Home Location Identification of Twitter Users,cs.SI cs.CL cs.CY," We present a new algorithm for inferring the home location of Twitter users
-at different granularities, including city, state, time zone or geographic
-region, using the content of users' tweets and their tweeting behavior. Unlike
-existing approaches, our algorithm uses an ensemble of statistical and
-heuristic classifiers to predict locations and makes use of a geographic
-gazetteer dictionary to identify place-name entities. We find that a
-hierarchical classification approach, where time zone, state or geographic
-region is predicted first and city is predicted next, can improve prediction
-accuracy. We have also analyzed movement variations of Twitter users, built a
-classifier to predict whether a user was travelling in a certain period of
-time and used that to further improve the location detection accuracy.
-Experimental evidence suggests that our algorithm works well in practice and
-outperforms the best existing algorithms for predicting the home location of
-Twitter users.
-"
-1101,1403.2837,Ayshe Rashidi and Mina Zolfy Lighvan,HPS: a hierarchical Persian stemming method,cs.CL," In this paper, a novel hierarchical Persian stemming approach based on the
-part-of-speech of the word in a sentence is presented. The implemented stemmer
-includes hash tables and several deterministic finite automata (DFAs) in the
-different levels of its hierarchy for removing the prefixes and suffixes of
-the words. We had two reasons for using hash tables in our method. The first
-is that the DFAs do not support some special words, so hash tables can partly
-solve this problem. The second is to speed up the stemmer by omitting the time
-that the deterministic finite automata need. Because of its hierarchical
-organization, the method is fast and flexible. Our experiments on test sets
-from the Hamshahri collection and security news (istna.ir) show that our
-method has an average accuracy of 95.37%, which improves further when the
-method is used on a test set with common topics.
-"
-1102,1403.3142,"Shalini Ghosh, Daniel Elenius, Wenchao Li, Patrick Lincoln, Natarajan
- Shankar, Wilfried Steiner","ARSENAL: Automatic Requirements Specification Extraction from Natural
- Language",cs.CL cs.SE," Requirements are informal and semi-formal descriptions of the expected
-behavior of a complex system from the viewpoints of its stakeholders
-(customers, users, operators, designers, and engineers). However, for the
-purpose of design, testing, and verification for critical systems, we can
-transform requirements into formal models that can be analyzed automatically.
-ARSENAL is a framework and methodology for systematically transforming natural
-language (NL) requirements into analyzable formal models and logic
-specifications. These models can be analyzed for consistency and
-implementability.
The ARSENAL methodology is specialized to individual domains,
-but the approach is general enough to be adapted to new domains.
-"
-1103,1403.3185,Md. Ansarul Haque,Sentiment Analysis by Using Fuzzy Logic,cs.IR cs.CL," How can a product or service be reasonably evaluated by anyone in the
-shortest time? A million dollar question, but it has a simple answer:
-sentiment analysis. Sentiment analysis draws on consumers' reviews of
-products and services, helping both producers and consumers (stakeholders) to
-take effective and efficient decisions within the shortest period of time.
-Through sentiment analysis (e.g. positive and negative comments, or
-consumers' likes and dislikes), producers can gain better knowledge of their
-products and services, which helps them to know their products' status (e.g.
-product limitations or market status). Likewise, consumers can gain better
-knowledge of the products and services they are interested in, which helps
-them to assess the status of the products they are considering. For a more
-precise specification of the sentiment values, fuzzy logic can be introduced.
-Sentiment analysis with the help of fuzzy logic, which deals with reasoning
-and gives views closer to the exact sentiment values, will therefore help
-producers, consumers, or any interested person to take effective decisions
-according to their product or service interest.
-"
-1104,1403.3351,Samson Abramsky and Mehrnoosh Sadrzadeh,Semantic Unification A sheaf theoretic approach to natural language,cs.CL," Language is contextual and sheaf theory provides a high level mathematical
-framework to model contextuality. We show how sheaf theory can model the
-contextual nature of natural language and how gluing can be used to provide a
-global semantics for a discourse by putting together the local logical
-semantics of each sentence within the discourse. We introduce a presheaf
-structure corresponding to a basic form of Discourse Representation
-Structures. Within this setting, we formulate a notion of semantic
-unification --- gluing meanings of parts of a discourse into a coherent
-whole --- as a form of sheaf-theoretic gluing. We illustrate this idea with a
-number of examples where it can be used to represent resolutions of anaphoric
-references. We also discuss multivalued gluing, described using a
-distributions functor, which can be used to represent situations where
-multiple gluings are possible, and where we may need to rank them using
-quantitative measures.
- Dedicated to Jim Lambek on the occasion of his 90th birthday.
-"
-1105,1403.3460,"Chi Wang, Xueqing Liu, Yanglei Song, Jiawei Han",Scalable and Robust Construction of Topical Hierarchies,cs.LG cs.CL cs.DB cs.IR," Automated generation of high-quality topical hierarchies for a text
-collection is a dream problem in knowledge engineering with many valuable
-applications. In this paper a scalable and robust algorithm is proposed for
-constructing a hierarchy of topics from a text collection. We divide and
-conquer the problem using a top-down recursive framework, based on a tensor
-orthogonal decomposition technique. We solve a critical challenge to perform
-scalable inference for our newly designed hierarchical topic model.
-Experiments with various real-world datasets illustrate its ability to
-generate robust, high-quality hierarchies efficiently.
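The paper's tensor-based inference does not fit in a few lines, but the top-down recursive skeleton it describes can be sketched with an off-the-shelf factorization standing in for the tensor orthogonal decomposition (documents and parameters are invented):

```python
# Sketch: top-down recursive construction of a topic hierarchy.
# NMF stands in here for the paper's tensor orthogonal decomposition.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

def build_hierarchy(docs, depth=2, k=2, min_docs=4):
    if depth == 0 or len(docs) < min_docs:
        return {"docs": docs}                      # leaf node
    X = TfidfVectorizer().fit_transform(docs)
    W = NMF(n_components=k, random_state=0).fit_transform(X)
    assign = W.argmax(axis=1)                      # hard-assign docs to subtopics
    return {"children": [
        build_hierarchy([d for d, a in zip(docs, assign) if a == t],
                        depth - 1, k, min_docs)
        for t in range(k)]}

docs = ["stock market trading", "bank interest rates", "football match score",
        "tennis open final", "bond yields rise", "league cup goals"] * 2
print(build_hierarchy(docs))
```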
Our method reduces the time of
-construction by several orders of magnitude, and its robustness makes it
-possible for users to interactively revise the hierarchy.
-"
-1106,1403.3668,Arthur Merin,"Language Heedless of Logic - Philosophy Mindful of What? Failures of
- Distributive and Absorption Laws",cs.CL," Much of philosophical logic and all of philosophy of language make empirical
-claims about the vernacular natural language. They presume semantics under
-which `and' and `or' are related by the dually paired distributive and
-absorption laws. However, at least one of each pair of laws fails in the
-vernacular. `Implicature'-based auxiliary theories associated with the
-programme of H.P. Grice do not prove remedial. Conceivable alternatives that
-might replace the familiar logics as descriptive instruments are briefly
-noted: (i) substructural logics and (ii) meaning composition in linear
-algebras over the reals, occasionally constrained by norms of classical
-logic. Alternative (ii) locates the problem in violations of one of the
-idempotent laws. Reasons for a lack of curiosity about elementary and easily
-testable implications of the received theory are considered. The concept of
-`reflective equilibrium' is critically examined for its role in reconciling
-normative desiderata and descriptive commitments.
-"
-1107,1403.4024,"Uli Fahrenberg, Fabrizio Biondi, Kevin Corre, Cyrille Jegourel, Simon
- Kongsh{\o}j, Axel Legay",Measuring Global Similarity between Texts,cs.CL," We propose a new similarity measure between texts which, contrary to the
-current state-of-the-art approaches, takes a global view of the texts to be
-compared. We have implemented a tool to compute our textual distance and
-conducted experiments on several corpora of texts. The experiments show that
-our methods can reliably identify different global types of texts.
-"
-1108,1403.4362,Mohammed El Amine Abderrahim,"Concept Based vs. Pseudo Relevance Feedback Performance Evaluation for
- Information Retrieval System",cs.IR cs.CL," This article evaluates the performance of two techniques for query
-reformulation in a system for information retrieval, namely, the
-concept-based and the pseudo relevance feedback reformulation. The
-experiments performed on a corpus of Arabic text have allowed us to compare
-the contribution of these two reformulation techniques in improving the
-performance of an information retrieval system for Arabic texts.
-"
-1109,1403.4467,R\'emi Dubot and Christophe Collet,A hybrid formalism to parse Sign Languages,cs.CL," Sign Language (SL) linguistics depends on the expensive task of annotation.
-Some automation is already available for low-level information (e.g. body
-part tracking), and the lexical level has shown significant progress. The
-syntactic level lacks annotated corpora as well as complete and consistent
-models. This article presents a solution for the automatic annotation of SL
-syntactic elements. It exposes a formalism able to represent both
-constituency-based and dependency-based models. The first enables the
-representation of the structures one may want to annotate; the second aims at
-filling the holes of the first. A parser is presented and used to conduct two
-experiments on the solution. One experiment is on a real corpus, the other is
-on a synthetic corpus.
-"
-1110,1403.4473,R\'emi Dubot and Christophe Collet,Sign Language Gibberish for syntactic parsing evaluation,cs.CL," Sign Language (SL) automatic processing slowly progresses bottom-up.
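As a preview of the grammar-based generation this abstract goes on to describe, here is a toy sketch (the grammar and vocabulary are invented; real SL models are far richer):

```python
# Sketch: generating synthetic phrases from a toy dependency grammar,
# the kind of data-set one could use to stress-test a parser.
import random

# head POS -> list of (dependent POS, attachment probability)
GRAMMAR = {"VERB": [("NOUN", 0.9), ("NOUN", 0.6), ("ADV", 0.3)],
           "NOUN": [("ADJ", 0.4)]}
LEXICON = {"VERB": ["sign", "point"], "NOUN": ["person", "house"],
           "ADJ": ["big"], "ADV": ["quickly"]}

def generate(head="VERB", rng=random.Random(0)):
    word = rng.choice(LEXICON[head])
    deps = [generate(pos, rng) for pos, p in GRAMMAR.get(head, [])
            if rng.random() < p]
    return (word, deps)

print(generate())   # a random dependency tree, e.g. ('sign', [('person', ...)])
```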
The
-field has seen propositions to handle the video signal, and to recognize and
-synthesize sublexical and lexical units. It is starting to see the
-development of supra-lexical processing. But recognition, at this level,
-lacks data. The syntax of SL appears very specific, as it makes massive use
-of the multiplicity of articulators and of its access to the spatial
-dimensions. Therefore new parsing techniques are being developed. However,
-these need to be evaluated. The shortage of real data restricts the
-corpus-based models to small sizes. We propose here a solution to produce
-data-sets for the evaluation of parsers on the specific properties of SL. The
-article first describes the general model used to generate dependency
-grammars and the phrase generation from the latter. It then discusses the
-limits of the approach. The solution proves to be of particular interest for
-evaluating the scalability of the techniques on big models.
-"
-1111,1403.4759,"Zeeshan Bhatti, Imdad Ali Ismaili, Asad Ali Shaikh, Waseem Javaid",Spelling Error Trends and Patterns in Sindhi,cs.CL," The statistical error correction technique is the most accurate and widely
-used approach today, but for a language like Sindhi, which is a low-resourced
-language, the trained corpora are not available, so statistical techniques
-are not possible at all. Instead, a useful alternative would be to exploit
-the various spelling error trends in Sindhi by using a rule-based approach.
-For designing such a technique, an essential prerequisite would be to study
-the various error patterns in the language. This paper presents various
-studies of spelling error trends and their types in the Sindhi language. The
-research shows that the error trends common to all languages are also
-encountered in Sindhi, but there do exist some error patterns that are
-specific to the Sindhi language.
-"
-1112,1403.4887,Andrew Warren and Joao Setubal,Using Entropy Estimates for DAG-Based Ontologies,cs.CL," Motivation: Entropy measurements on hierarchical structures have been used
-in methods for information retrieval and natural language modeling. Here we
-explore their application to semantic similarity. By finding shared ontology
-terms, semantic similarity can be established between annotated genes. A
-common procedure for establishing semantic similarity is to calculate the
-descriptiveness (information content) of ontology terms and use these values
-to determine the similarity of annotations. Most often information content is
-calculated for an ontology term by analyzing its frequency in an annotation
-corpus. The inherent problems in using these values to model functional
-similarity motivate our work. Summary: We present a novel calculation for
-establishing the entropy of a DAG-based ontology, which can be used in an
-alternative method for establishing the information content of its terms. We
-also compare our IC metric to two others using semantic and sequence
-similarity.
-"
-1113,1403.4928,"Steven Bethard, Leon Derczynski, James Pustejovsky, Marc Verhagen",Clinical TempEval,cs.CL," We describe the Clinical TempEval task which is currently in preparation for
-the SemEval-2015 evaluation exercise. This task involves identifying and
-describing events, times and the relations between them in clinical text. Six
-discrete subtasks are included, focusing on recognising mentions of times and
-events, describing those mentions for both entity types, identifying the
-relation between an event and the document creation time, and identifying
-narrative container relations.
-" -1114,1403.5596,Tarek El-Shishtawy and Fatma El-Ghannam,A Lemma Based Evaluator for Semitic Language Text Summarization Systems,cs.CL cs.IR," Matching texts in highly inflected languages such as Arabic by simple -stemming strategy is unlikely to perform well. In this paper, we present a -strategy for automatic text matching technique for for inflectional languages, -using Arabic as the test case. The system is an extension of ROUGE test in -which texts are matched on token's lemma level. The experimental results show -an enhancement of detecting similarities between different sentences having -same semantics but written in different lexical forms.. -" -1115,1403.6023,"Lu\'is Marujo, Anatole Gershman, Jaime Carbonell, Jo\~ao P. Neto, - David Martins de Matos",Ensemble Detection of Single & Multiple Events at Sentence-Level,cs.CL cs.LG," Event classification at sentence level is an important Information Extraction -task with applications in several NLP, IR, and personalization systems. -Multi-label binary relevance (BR) are the state-of-art methods. In this work, -we explored new multi-label methods known for capturing relations between event -types. These new methods, such as the ensemble Chain of Classifiers, improve -the F1 on average across the 6 labels by 2.8% over the Binary Relevance. The -low occurrence of multi-label sentences motivated the reduction of the hard -imbalanced multi-label classification problem with low number of occurrences of -multiple labels per instance to an more tractable imbalanced multiclass problem -with better results (+ 4.6%). We report the results of adding new features, -such as sentiment strength, rhetorical signals, domain-id (source-id and date), -and key-phrases in both single-label and multi-label event classification -scenarios. -" -1116,1403.6173,"Anna Senina and Marcus Rohrbach and Wei Qiu and Annemarie Friedrich - and Sikandar Amin and Mykhaylo Andriluka and Manfred Pinkal and Bernt Schiele",Coherent Multi-Sentence Video Description with Variable Level of Detail,cs.CV cs.CL," Humans can easily describe what they see in a coherent way and at varying -level of detail. However, existing approaches for automatic video description -are mainly focused on single sentence generation and produce descriptions at a -fixed level of detail. In this paper, we address both of these limitations: for -a variable level of detail we produce coherent multi-sentence descriptions of -complex videos. We follow a two-step approach where we first learn to predict a -semantic representation (SR) from video and then generate natural language -descriptions from the SR. To produce consistent multi-sentence descriptions, we -model across-sentence consistency at the level of the SR by enforcing a -consistent topic. We also contribute both to the visual recognition of objects -proposing a hand-centric approach as well as to the robust generation of -sentences using a word lattice. Human judges rate our multi-sentence -descriptions as more readable, correct, and relevant than related work. To -understand the difference between more detailed and shorter descriptions, we -collect and analyze a video description corpus of three levels of detail. -" -1117,1403.6381,"K. Sureka, K.G. Srinivasagan, S. Suganthi",An efficiency dependency parser using hybrid approach for tamil language,cs.CL," Natural language processing is a prompt research area across the country. 
-Parsing is one of the most crucial tools in a language analysis system; it
-aims to predict the structural relationships among the words in a given
-sentence. Many researchers have already developed language tools, but their
-accuracy does not meet human expectation levels, so research still continues.
-Machine translation is one of the major application areas under Natural
-Language Processing. When translating from one language to another, the
-identification of sentence structure plays a key role. This paper introduces
-a hybrid way to solve the identification of relationships among the given
-words in a sentence. Existing systems are implemented using a rule-based
-approach, which is not suited to huge amounts of data. Machine learning
-approaches are suitable for handling larger amounts of data and also achieve
-better accuracy by learning from and training on the data. The proposed
-approach takes a Tamil sentence as input and produces the dependency
-relations as a tree-like structure using the hybrid approach. The proposed
-tool is very helpful for researchers and acts as an add-on to improve the
-quality of existing approaches.
-"
-1118,1403.6392,"Arturo Curiel, Christophe Collet","Implementation of an Automatic Sign Language Lexical Annotation
- Framework based on Propositional Dynamic Logic",cs.CL," In this paper, we present the implementation of an automatic Sign Language
-(SL) sign annotation framework based on a formal logic, the Propositional
-Dynamic Logic (PDL). Our system relies heavily on the use of a specific
-variant of PDL, the Propositional Dynamic Logic for Sign Language (PDLSL),
-which lets us describe SL signs as formulae and corpora videos as labeled
-transition systems (LTSs). Here, we intend to show how a generic annotation
-system can be constructed upon these underlying theoretical principles,
-regardless of the tracking technologies available or the input format of
-corpora. With this in mind, we generated a development framework that adapts
-the system to specific use cases. Furthermore, we present some results
-obtained by our application when adapted to one distinct case, 2D corpora
-analysis with pre-processed tracking information. We also present some
-insights on how such a technology can be used to analyze 3D real-time data,
-captured with a depth device.
-"
-1119,1403.6397,"Frank Rosner, Alexander Hinneburg, Michael R\""oder, Martin Nettling,
- Andreas Both",Evaluating topic coherence measures,cs.LG cs.CL cs.IR," Topic models extract representative word sets - called topics - from word
-counts in documents without requiring any semantic annotations. Topics are
-not guaranteed to be well interpretable, therefore, coherence measures have
-been proposed to distinguish between good and bad topics. Studies of topic
-coherence so far are limited to measures that score pairs of individual
-words. For the first time, we include coherence measures from scientific
-philosophy that score pairs of more complex word subsets and apply them to
-topic scoring.
-"
-1120,1403.6636,"Arturo Curiel, Christophe Collet",Sign Language Lexical Recognition With Propositional Dynamic Logic,cs.CL," This paper explores the use of Propositional Dynamic Logic (PDL) as a
-suitable formal framework for describing Sign Language (SL), the language of
-deaf people, in the context of natural language processing. SLs are visual,
-complete, standalone languages which are just as expressive as oral languages.
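To give a flavor of the machinery involved (the states, labels and formula below are invented; PDLSL itself is far richer), a diamond-formula check over a labeled transition system can be sketched as:

```python
# Sketch: evaluating <a>p ("some a-transition leads to a state satisfying p")
# over a labeled transition system, the kind of check PDL-style sign
# descriptions reduce to.
transitions = {            # state -> {action label -> set of successor states}
    0: {"move_right_hand": {1}},
    1: {"hold_posture": {2}},
    2: {},
}
valuation = {"hands_touching": {2}}   # atomic proposition -> states where true

def holds_diamond(state, action, prop):
    succs = transitions.get(state, {}).get(action, set())
    return any(s in valuation[prop] for s in succs)

print(holds_diamond(1, "hold_posture", "hands_touching"))     # True
print(holds_diamond(0, "move_right_hand", "hands_touching"))  # False
```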
-Signs in SL usually correspond to sequences of highly specific body postures
-interleaved with movements, which make reference to real world objects,
-characters or situations. Here we propose a formal representation of SL signs
-that will help us with the analysis of automatically-collected hand tracking
-data from French Sign Language (FSL) video corpora. We further show how such
-a representation could help us with the design of computer-aided SL
-verification tools, which in turn would bring us closer to the development of
-an automatic recognition system for these languages.
-"
-1121,1403.7335,"Duyu Tang, Bing Qin, Ting Liu, Qiuhui Shi",Emotion Analysis Platform on Chinese Microblog,cs.CL cs.CY cs.IR," Weibo, as the largest social media service in China, has billions of
-messages generated every day. The huge number of messages contain rich
-sentimental information. In order to analyze the emotional changes in
-accordance with time and space, this paper presents an Emotion Analysis
-Platform (EAP), which explores the emotional distribution of each province,
-so that it can monitor the global pulse of each province in China. The
-massive data of Weibo and the real-time requirements make the building of EAP
-challenging. In order to solve the above problems, emoticons, an emotion
-lexicon and emotion-shifting rules are adopted in EAP to analyze the emotion
-of each tweet. In order to verify the effectiveness of the platform, a case
-study on the Sichuan earthquake is carried out, and the analysis result of
-the platform accords with the facts. In order to provide a quantitative
-analysis, we manually annotate a test set and conduct experiments on it. The
-experimental results show that the macro-precision of EAP reaches 80% and
-that EAP works effectively.
-"
-1122,1403.7455,"Shruti Mathur, Varun Prakash Saxena",Hybrid Approach to English-Hindi Name Entity Transliteration,cs.CL," Machine translation (MT) research in Indian languages is still in its
-infancy. Not much work has been done on the proper transliteration of name
-entities in this domain. In this paper we address this issue. We have used
-the English-Hindi language pair for our experiments and have used a hybrid
-approach. At first we have processed English words using a rule-based
-approach which extracts individual phonemes from the words; then we have
-applied a statistical approach which converts the English phonemes into their
-equivalent Hindi phonemes and in turn the corresponding Hindi words. Through
-this approach we have attained 83.40% accuracy.
-"
-1123,1404.0850,"Rui Couto (University of Minho), Ant\'onio Nestor Ribeiro (University
- of Minho), Jos\'e Creissac Campos (University of Minho)","Application of Ontologies in Identifying Requirements Patterns in Use
- Cases",cs.SE cs.CL cs.IR," Use case specifications have successfully been used for requirements
-description. They allow joining, in the same modeling space, the expectations
-of the stakeholders as well as the needs of the software engineer and analyst
-involved in the process. While use cases are not meant to describe a system's
-implementation, by formalizing their description we are able to extract
-implementation-relevant information from them. More specifically, we are
-interested in identifying requirements patterns (common requirements with
-typical implementation solutions) in support of a requirements-based software
-development approach.
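As a toy rendering of the querying idea described next (plain Python triples stand in for OWL, and the requirement names are invented):

```python
# Sketch: representing use-case facts as triples and finding a requirements
# pattern with a simple query, in the spirit of querying an OWL ontology.
triples = {
    ("Login", "hasActor", "User"),
    ("Login", "hasAction", "authenticate"),
    ("Search", "hasActor", "User"),
    ("Login", "hasPrecondition", "registered"),
}

def query(pattern):
    """Match a (s, p, o) pattern; None acts as a wildcard."""
    return [t for t in triples
            if all(q is None or q == v for q, v in zip(pattern, t))]

# Pattern: every use case whose action is "authenticate".
print([s for s, _, _ in query((None, "hasAction", "authenticate"))])
```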
In the paper we propose the transformation of Use Case
-descriptions expressed in a Controlled Natural Language into an ontology
-expressed in the Web Ontology Language (OWL). OWL's query engines can then be
-used to identify requirements patterns expressed as queries over the ontology.
-We describe a tool that we have developed to support the approach and provide
-an example of usage.
-"
-1124,1404.1521,"Vivek Kulkarni, Rami Al-Rfou', Bryan Perozzi, Steven Skiena",Exploring the power of GPU's for training Polyglot language models,cs.LG cs.CL," One of the major research trends currently is the evolution of heterogeneous
-parallel computing. GP-GPU computing is being widely used and several
-applications have been designed to exploit the massive parallelism that
-GP-GPUs have to offer. While GPUs have always been widely used in areas of
-computer vision for image processing, little has been done to investigate
-whether the massive parallelism provided by GP-GPUs can be utilized
-effectively for Natural Language Processing (NLP) tasks. In this work, we
-investigate and explore the power of GP-GPUs in the task of learning language
-models. More specifically, we investigate the performance of training
-Polyglot language models using deep belief neural networks. We evaluate the
-performance of training the model on the GPU and present optimizations that
-boost the performance on the GPU. One of the key optimizations we propose
-increases the performance of a function involved in calculating and updating
-the gradient by approximately 50 times on the GPU for sufficiently large
-batch sizes. We show that with the above optimizations, the GP-GPU's
-performance on the task increases by a factor of approximately 3-4. The
-optimizations we made are generic Theano optimizations and hence potentially
-boost the performance of other models which rely on these operations. We also
-show that these optimizations result in the GPU's performance at this task
-now being comparable to that on the CPU. We conclude by presenting a thorough
-evaluation of the applicability of GP-GPUs for this task and highlight the
-factors limiting the performance of training a Polyglot model on the GPU.
-"
-1125,1404.1847,"Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar, Hemant
- Darbari","Evaluation and Ranking of Machine Translated Output in Hindi Language
- using Precision and Recall Oriented Metrics",cs.CL," Evaluation plays a crucial role in the development of machine translation
-systems. In order to judge the quality of an existing MT system, i.e. whether
-the translated output is of human translation quality or not, various
-automatic metrics exist. We here present the implementation results of
-different metrics when used on the Hindi language, along with their
-comparisons, illustrating how effective these metrics are on languages like
-Hindi (a free word order language).
-"
-1126,1404.1872,"Anthony Sigogne (LIGM), Matthieu Constant (LIGM), Eric Laporte (LIGM)","Int\'egration des donn\'ees d'un lexique syntaxique dans un analyseur
- syntaxique probabiliste",cs.CL," This article reports the evaluation of the integration of data from a
-syntactic-semantic lexicon, the Lexicon-Grammar of French, into a syntactic
-parser. We show that by changing the set of labels for verbs and predicational
-nouns, we can improve the performance on French of a non-lexicalized
-probabilistic parser.
-" -1127,1404.1890,"Maksymilian Bujok, Piotr Fronczak, Agata Fronczak","Polish and English wordnets -- statistical analysis of interconnected - networks",cs.CL physics.soc-ph," Wordnets are semantic networks containing nouns, verbs, adjectives, and -adverbs organized according to linguistic principles, by means of semantic -relations. In this work, we adopt a complex network perspective to perform a -comparative analysis of the English and Polish wordnets. We determine their -similarities and show that the networks exhibit some of the typical -characteristics observed in other real-world networks. We analyse interlingual -relations between both wordnets and deliberate over the problem of mapping the -Polish lexicon onto the English one. -" -1128,1404.1982,"Amani K Samha, Yuefeng Li and Jinglan Zhang",Aspect-Based Opinion Extraction from Customer reviews,cs.CL cs.IR," Text is the main method of communicating information in the digital age. -Messages, blogs, news articles, reviews, and opinionated information abound on -the Internet. People commonly purchase products online and post their opinions -about purchased items. This feedback is displayed publicly to assist others -with their purchasing decisions, creating the need for a mechanism with which -to extract and summarize useful information for enhancing the decision-making -process. Our contribution is to improve the accuracy of extraction by combining -different techniques from three major areas, named Data Mining, Natural -Language Processing techniques and Ontologies. The proposed framework -sequentially mines products aspects and users opinions, groups representative -aspects by similarity, and generates an output summary. This paper focuses on -the task of extracting product aspects and users opinions by extracting all -possible aspects and opinions from reviews using natural language, ontology, -and frequent (tag) sets. The proposed framework, when compared with an existing -baseline model, yielded promising results. -" -1129,1404.2071,Dana Dann\'ells and Normunds Gr\=uz\=itis,Extracting a bilingual semantic grammar from FrameNet-annotated corpora,cs.CL," We present the creation of an English-Swedish FrameNet-based grammar in -Grammatical Framework. The aim of this research is to make existing framenets -computationally accessible for multilingual natural language applications via a -common semantic grammar API, and to facilitate the porting of such grammar to -other languages. In this paper, we describe the abstract syntax of the semantic -grammar while focusing on its automatic extraction possibilities. We have -extracted a shared abstract syntax from ~58,500 annotated sentences in Berkeley -FrameNet (BFN) and ~3,500 annotated sentences in Swedish FrameNet (SweFN). The -abstract syntax defines 769 frame-specific valence patterns that cover 77.8% -examples in BFN and 74.9% in SweFN belonging to the shared set of 471 frames. -As a side result, we provide a unified method for comparing semantic and -syntactic valence patterns across framenets. -" -1130,1404.2188,"Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom",A Convolutional Neural Network for Modelling Sentences,cs.CL," The ability to accurately represent sentences is central to language -understanding. We describe a convolutional architecture dubbed the Dynamic -Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of -sentences. The network uses Dynamic k-Max Pooling, a global pooling operation -over linear sequences. 
The network handles input sentences of varying length
-and induces a feature graph over the sentence that is capable of explicitly
-capturing short and long-range relations. The network does not rely on a
-parse tree and is easily applicable to any language. We test the DCNN in four
-experiments: small scale binary and multi-class sentiment prediction, six-way
-question classification and Twitter sentiment prediction by distant
-supervision. The network achieves excellent performance in the first three
-tasks and a greater than 25% error reduction in the last task with respect to
-the strongest baseline.
-"
-1131,1404.2878,"Dalwadi Bijal, Suthar Sanket",Overview of Stemming Algorithms for Indian and Non-Indian Languages,cs.CL," Stemming is a pre-processing step in Text Mining applications as well as a
-very common requirement of Natural Language Processing functions. Stemming is
-the process of reducing inflected words to their stem. The main purpose of
-stemming is to reduce the different grammatical forms / word forms of a word,
-such as its noun, adjective, verb, and adverb forms, to its root form.
-Stemming is widely used in Information Retrieval systems and reduces the size
-of index files. We can say that the goal of stemming is to reduce
-inflectional forms and sometimes derivationally related forms of a word to a
-common base form. In this paper we discuss different stemming algorithms for
-non-Indian and Indian languages, methods of stemming, accuracy and errors.
-"
-1132,1404.2997,"Jean-Gabriel Ganascia (LIP6), Pierre Glaudes (CELFF XVI-XXI), Andrea
- Del Lungo (ALITHILA)",Automatic Detection of Reuses and Citations in Literary Texts,cs.CL cs.DL," For more than forty years now, modern theories of literature (Compagnon,
-1979) have insisted on the role of paraphrases, rewritings, citations,
-reciprocal borrowings and mutual contributions of any kind. The notions of
-intertextuality, transtextuality, and hypertextuality/hypotextuality were
-introduced in the seventies and eighties to approach these phenomena. The
-careful analysis of these references is of particular interest in evaluating
-the distance that the creator voluntarily introduces with respect to his/her
-masters. Phoebus is a collaborative project in which computer scientists from
-the University Pierre and Marie Curie (LIP6-UPMC) collaborate with the
-literary teams of Paris-Sorbonne University with the aim of developing
-efficient tools for literary studies that take advantage of modern computer
-science techniques. In this context, we have developed a piece of software
-that automatically detects and explores networks of textual reuses in
-classical literature. This paper describes the principles on which this
-program is based, the significant results that have already been obtained
-and the perspectives for the near future.
-"
-1133,1404.3026,"Todd Bodnar, Victoria C Barclay, Nilam Ram, Conrad S Tucker, Marcel
- Salath\'e","On the Ground Validation of Online Diagnosis with Twitter and Medical
- Records",cs.SI cs.CL cs.LG," Social media has been considered as a data source for tracking disease.
-However, most analyses are based on models that prioritize strong correlation
-with population-level disease rates over determining whether or not specific
-individual users are actually sick. Taking a different approach, we develop a
-novel system for social-media based disease detection at the individual level
-using a sample of professionally diagnosed individuals.
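A toy sketch of the kind of stacked meta classifier described below (the component scores and labels are fabricated):

```python
# Sketch: a meta classifier stacked on top of three component signals
# (text-analysis score, anomaly score, social-network score). Toy data.
from sklearn.linear_model import LogisticRegression

X = [[0.9, 0.2, 0.7],   # each row: one user; columns: component scores
     [0.1, 0.1, 0.2],
     [0.8, 0.9, 0.6],
     [0.2, 0.3, 0.1],
     [0.7, 0.8, 0.9],
     [0.3, 0.2, 0.3]]
y = [1, 0, 1, 0, 1, 0]  # 1 = diagnosed with influenza

meta = LogisticRegression().fit(X, y)
print(meta.predict_proba([[0.85, 0.7, 0.8]])[0, 1])  # P(sick) for a new user
```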
Specifically, we
-develop a system for making an accurate influenza diagnosis based on an
-individual's publicly available Twitter data. We find that about half (17/35
-= 48.57%) of the users in our sample that were sick explicitly discuss their
-disease on Twitter. By developing a meta classifier that combines text
-analysis, anomaly detection, and social network analysis, we are able to
-diagnose an individual with greater than 99% accuracy even if she does not
-discuss her health.
-"
-1134,1404.3233,"Joshua Hailpern, Niranjan Damera Venkata, Marina Danilevsky","Pagination: It's what you say, not how long it takes to say it",cs.CL cs.IR," Pagination - the process of determining where to break an article across
-pages in a multi-article layout - is a common layout challenge for most
-commercially printed newspapers and magazines. To date, no one has created an
-algorithm that determines a minimal pagination break point based on the
-content of the article. Existing approaches for automatic multi-article
-layout focus exclusively on maximizing content (number of articles) and
-optimizing aesthetic presentation (e.g., spacing between articles). However,
-disregarding the semantic information within the article can lead to overly
-aggressive cutting, thereby eliminating key content and potentially confusing
-the reader, or setting too generous a break point, thereby leaving in
-superfluous content and making automatic layout more difficult. This is one
-of the remaining challenges on the path from manual layouts to fully
-automated processes that still ensure article content quality. In this work,
-we present a new approach to calculating a document's minimal break point for
-the task of pagination. Our approach uses a statistical language model to
-predict minimal break points based on the semantic content of an article. We
-then compare 4 novel candidate approaches, and 4 baselines (currently in use
-by layout algorithms). Results from this experiment show that one of our
-approaches strongly outperforms the baselines and alternatives. Results from
-a second study suggest that humans are not able to agree on a single ""best""
-break point. Therefore, this work shows that a semantic-based lower bound
-break point prediction is necessary for ideal automated document synthesis
-within a real-world context.
-"
-1135,1404.3377,"Rene Pickhardt, Thomas Gottron, Martin K\""orner, Paul Georg Wagner,
- Till Speicher, Steffen Staab","A Generalized Language Model as the Combination of Skipped n-grams and
- Modified Kneser-Ney Smoothing",cs.CL," We introduce a novel approach for building language models based on a
-systematic, recursive exploration of skip n-gram models which are
-interpolated using modified Kneser-Ney smoothing. Our approach generalizes
-language models, as it contains the classical interpolation with lower order
-models as a special case. In this paper we motivate, formalize and present
-our approach. In an extensive empirical experiment over English text corpora
-we demonstrate that our generalized language models lead to a substantial
-reduction of perplexity between 3.1% and 12.7% in comparison to traditional
-language models using modified Kneser-Ney smoothing. Furthermore, we
-investigate the behaviour over three other languages and a domain specific
-corpus where we observed consistent improvements. Finally, we also show that
-the strength of our approach lies in its ability to cope in particular with
-sparse training data.
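A minimal sketch of the skip n-gram extraction underlying the approach (the tokenization and skip bound are illustrative):

```python
# Sketch: enumerating skip bigrams -- word pairs with up to `max_skip`
# tokens skipped in between -- the raw counts a skip n-gram model builds on.
from itertools import combinations

def skip_bigrams(tokens, max_skip=2):
    for i, j in combinations(range(len(tokens)), 2):
        if j - i - 1 <= max_skip:
            yield (tokens[i], tokens[j], j - i - 1)  # (w1, w2, words skipped)

sent = "the quick brown fox".split()
for w1, w2, skipped in skip_bigrams(sent):
    print(w1, "_" * skipped, w2)
```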
Using a very small -training data set of only 736 KB text we yield improvements of even 25.7% -reduction of perplexity. -" -1136,1404.3610,"Cosme Adrover, Todd Bodnar, Marcel Salathe","Targeting HIV-related Medication Side Effects and Sentiment Using - Twitter Data",cs.SI cs.CL cs.IR," We present a descriptive analysis of Twitter data. Our study focuses on -extracting the main side effects associated with HIV treatments. The crux of -our work was the identification of personal tweets referring to HIV. We -summarize our results in an infographic aimed at the general public. In -addition, we present a measure of user sentiment based on hand-rated tweets. -" -1137,1404.3759,Bogdan Babych and Anthony Hartley,Meta-evaluation of comparability metrics using parallel corpora,cs.CL," Metrics for measuring the comparability of corpora or texts need to be -developed and evaluated systematically. Applications based on a corpus, such as -training Statistical MT systems in specialised narrow domains, require finding -a reasonable balance between the size of the corpus and its consistency, with -controlled and benchmarked levels of comparability for any newly added -sections. In this article we propose a method that can meta-evaluate -comparability metrics by calculating monolingual comparability scores -separately on the 'source' and 'target' sides of parallel corpora. The range of -scores on the source side is then correlated (using Pearson's r coefficient) -with the range of 'target' scores; the higher the correlation - the more -reliable is the metric. The intuition is that a good metric should yield the -same distance between different domains in different languages. Our method -gives consistent results for the same metrics on different data sets, which -indicates that it is reliable and can be used for metric comparison or for -optimising settings of parametrised metrics. -" -1138,1404.3925,"Antonin Delpeuch (\'Ecole Normale Sup\'erieure, Paris)",Complexity of Grammar Induction for Quantum Types,cs.CL math.CT," Most categorical models of meaning use a functor from the syntactic category -to the semantic category. When semantic information is available, the problem -of grammar induction can therefore be defined as finding preimages of the -semantic types under this forgetful functor, lifting the information flow from -the semantic level to a valid reduction at the syntactic level. We study the -complexity of grammar induction, and show that for a variety of type systems, -including pivotal and compact closed categories, the grammar induction problem -is NP-complete. Our approach could be extended to linguistic type systems such -as autonomous or bi-closed categories. -" -1139,1404.3959,"Marco Guerini, Fabio Pianesi, Oliviero Stock",Is it morally acceptable for a system to lie to persuade me?,cs.CY cs.CL," Given the fast rise of increasingly autonomous artificial agents and robots, -a key acceptability criterion will be the possible moral implications of their -actions. In particular, intelligent persuasive systems (systems designed to -influence humans via communication) constitute a highly sensitive topic because -of their intrinsically social nature. Still, ethical studies in this area are -rare and tend to focus on the output of the required action. Instead, this work -focuses on the persuasive acts themselves (e.g. ""is it morally acceptable that -a machine lies or appeals to the emotions of a person to persuade her, even if -for a good end?""). 
Exploiting a behavioral approach, based on human assessment
-of moral dilemmas -- i.e. without any prior assumption of underlying ethical
-theories -- this paper reports on a set of experiments. These experiments
-address the type of persuader (human or machine), the strategies adopted
-(purely argumentative, appeal to positive emotions, appeal to negative
-emotions, lie) and the circumstances. Findings display no differences due to
-the agent, mild acceptability for persuasion and reveal that truth-conditional
-reasoning (i.e. argument validity) is a significant dimension affecting
-subjects' judgment. Some implications for the design of intelligent persuasive
-systems are discussed.
-"
-1140,1404.3992,"Aditi Kalyani, Hemant Kumud, Shashi Pal Singh, Ajai Kumar",Assessing the Quality of MT Systems for Hindi to English Translation,cs.CL," Evaluation plays a vital role in checking the quality of MT output. It is
-done either manually or automatically. Manual evaluation is very time
-consuming and subjective, hence automatic metrics are used most of the time.
-This paper evaluates the translation quality of different MT engines for
-Hindi-English (Hindi data is provided as input and English is obtained as
-output) using various automatic metrics like BLEU, METEOR etc. Furthermore, a
-comparison of the automatic evaluation results with human rankings is also
-given.
-"
-1141,1404.4314,"Lingpeng Kong, Noah A. Smith",An Empirical Comparison of Parsing Methods for Stanford Dependencies,cs.CL," Stanford typed dependencies are a widely desired representation of natural
-language sentences, but parsing is one of the major computational bottlenecks
-in text analysis systems. In light of the evolving definition of the Stanford
-dependencies and developments in statistical dependency parsing algorithms,
-this paper revisits the question of Cer et al. (2010): what is the tradeoff
-between accuracy and speed in obtaining Stanford dependencies in particular?
-We also explore the effects of input representations on this tradeoff:
-part-of-speech tags, the novel use of an alternative dependency
-representation as input, and distributional representations of words. We find
-that direct dependency parsing is a more viable solution than it was found to
-be in the past. An accompanying software release can be found at:
-http://www.ark.cs.cmu.edu/TBSD
-"
-1142,1404.4326,"Antoine Bordes, Jason Weston and Nicolas Usunier",Open Question Answering with Weakly Supervised Embedding Models,cs.CL cs.LG," Building computers able to answer questions on any subject is a long
-standing goal of artificial intelligence. Promising progress has recently
-been achieved by methods that learn to map questions to logical forms or
-database queries. Such approaches can be effective but at the cost of either
-large amounts of human-labeled data or by defining lexicons and grammars
-tailored by practitioners. In this paper, we instead take the radical
-approach of learning to map questions to vectorial feature representations.
-By mapping answers into the same space one can query any knowledge base
-independent of its schema, without requiring any grammar or lexicon. Our
-method is trained with a new optimization procedure combining stochastic
-gradient descent followed by a fine-tuning step using the weak supervision
-provided by blending automatically
-and collaboratively generated resources.
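A toy sketch of the scoring idea (random vectors stand in for the learned embeddings):

```python
# Sketch: scoring candidate answers by embedding questions and answers into
# one vector space and taking dot products. Random stand-in embeddings.
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=16) for w in
         ["who", "wrote", "hamlet", "shakespeare", "einstein"]}

def embed(words):
    return np.mean([vocab[w] for w in words], axis=0)

q = embed(["who", "wrote", "hamlet"])
for cand in (["shakespeare"], ["einstein"]):
    print(cand[0], float(embed(cand) @ q))  # higher score = better answer
```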
We empirically demonstrate that our -model can capture meaningful signals from its noisy supervision leading to -major improvements over paralex, the only existing method able to be trained on -similar weakly labeled data. -" -1143,1404.4572,"Behrang Qasemizadeh, Saeed Rahimi, Behrooz Mahmoodi Bakhtiari","The First Parallel Multilingual Corpus of Persian: Toward a Persian - BLARK",cs.CL," In this article, we have introduced the first parallel corpus of Persian with -more than 10 other European languages. This article describes primary steps -toward preparing a Basic Language Resources Kit (BLARK) for Persian. Up to now, -we have proposed morphosyntactic specification of Persian based on -EAGLE/MULTEXT guidelines and specific resources of MULTEXT-East. The article -introduces Persian Language, with emphasis on its orthography and -morphosyntactic features, then a new Part-of-Speech categorization and -orthography for Persian in digital environments is proposed. Finally, the -corpus and related statistic will be analyzed. -" -1144,1404.4606,"Derek Greene, Derek O'Callaghan, P\'adraig Cunningham",How Many Topics? Stability Analysis for Topic Models,cs.LG cs.CL cs.IR," Topic modeling refers to the task of discovering the underlying thematic -structure in a text corpus, where the output is commonly presented as a report -of the top terms appearing in each topic. Despite the diversity of topic -modeling algorithms that have been proposed, a common challenge in successfully -applying these techniques is the selection of an appropriate number of topics -for a given corpus. Choosing too few topics will produce results that are -overly broad, while choosing too many will result in the ""over-clustering"" of a -corpus into many small, highly-similar topics. In this paper, we propose a -term-centric stability analysis strategy to address this issue, the idea being -that a model with an appropriate number of topics will be more robust to -perturbations in the data. Using a topic modeling approach based on matrix -factorization, evaluations performed on a range of corpora show that this -strategy can successfully guide the model selection process. -" -1145,1404.4641,Karl Moritz Hermann and Phil Blunsom,Multilingual Models for Compositional Distributed Semantics,cs.CL," We present a novel technique for learning semantic representations, which -extends the distributional hypothesis to multilingual data and joint-space -embeddings. Our models leverage parallel data and learn to strongly align the -embeddings of semantically equivalent sentences, while maintaining sufficient -distance between those of dissimilar sentences. The models do not rely on word -alignments or any syntactic information and are successfully applied to a -number of diverse languages. We extend our approach to learn semantic -representations at the document level, too. We evaluate these models on two -cross-lingual document classification tasks, outperforming the prior state of -the art. Through qualitative analysis and the study of pivoting effects we -demonstrate that our representations are semantically plausible and can capture -semantic relationships across languages without parallel data. -" -1146,1404.4714,"Yaming Sun, Lei Lin, Duyu Tang, Nan Yang, Zhenzhou Ji, Xiaolong Wang",Radical-Enhanced Chinese Character Embedding,cs.CL," We present a method to leverage radical for learning Chinese character -embedding. Radical is a semantic and phonetic component of Chinese character. 
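A toy numpy sketch of the combination idea (the radical table and all vectors are stand-ins; the paper learns these jointly in a neural model):

```python
# Sketch: enhancing a character embedding with its radical's embedding.
# The radical table and vectors below are toy stand-ins.
import numpy as np

RADICAL = {"河": "氵", "海": "氵"}    # both characters share the water radical
rng = np.random.default_rng(0)
char_emb = {c: rng.normal(size=8) for c in RADICAL}
rad_emb = {r: rng.normal(size=8) for r in set(RADICAL.values())}

def radical_enhanced(char, alpha=0.5):
    return (1 - alpha) * char_emb[char] + alpha * rad_emb[RADICAL[char]]

v1, v2 = radical_enhanced("河"), radical_enhanced("海")
cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(round(float(cos), 3))   # sharing a radical pulls the vectors together
```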
-It plays an important role, as characters with the same radical usually have
-similar semantic meanings and grammatical usage. However, existing Chinese
-processing algorithms typically regard the word or character as the basic
-unit and ignore the crucial radical information. In this paper, we fill this
-gap by leveraging radicals for learning continuous representations of Chinese
-characters. We develop a dedicated neural architecture to effectively learn
-character embeddings and apply it to Chinese character similarity judgement
-and Chinese word segmentation. Experiment results show that our
-radical-enhanced method outperforms existing embedding learning algorithms on
-both tasks.
-"
-1147,1404.4740,"Behrang QasemiZadeh, Saeed Rahimi and Mehdi Safaee Ghalati",Challenges in Persian Electronic Text Analysis,cs.CL," Farsi, also known as Persian, is the official language of Iran and
-Tajikistan and one of the two main languages spoken in Afghanistan. Farsi
-enjoys a unified Arabic script as its writing system. In this paper we
-briefly introduce the writing standards of Farsi and highlight the problems
-one would face when analyzing Farsi electronic texts, especially during the
-development of Farsi corpora, with regard to the transcription and encoding
-of Farsi e-texts. The points mentioned may sound easy, but they are crucial
-when developing and processing written corpora of Farsi.
-"
-1148,1404.4935,"Richa Sharma, Shweta Nigam, Rekha Jain",Opinion Mining In Hindi Language: A Survey,cs.IR cs.CL," Opinions are very important in the life of human beings, as they help
-humans to make decisions. As the impact of the Web increases day by day, Web
-documents can be seen as a new source of opinion for human beings. The Web
-contains a huge amount of information generated by users through blogs, forum
-entries, social networking websites and so on. To analyze this large amount
-of information, it is required to develop a method that automatically
-classifies the information available on the Web. This domain is called
-Sentiment Analysis and Opinion Mining. Opinion Mining or Sentiment Analysis
-is a natural language processing task that mines information from various
-text forms such as reviews, news, and blogs and classifies them on the basis
-of their polarity as positive, negative or neutral. Over the last few years,
-an enormous increase has been seen in Hindi-language content on the Web.
-Research in opinion mining has mostly been carried out for the English
-language, but it is very important to perform opinion mining in Hindi as
-well, since a large amount of information in Hindi is also available on the
-Web. This paper gives an overview of the work that has been done for the
-Hindi language.
-"
-1149,1404.5278,"Mehrnoosh Sadrzadeh, Stephen Clark, Bob Coecke","The Frobenius anatomy of word meanings I: subject and object relative
- pronouns",cs.CL," This paper develops a compositional vector-based semantics of subject and
-object relative pronouns within a categorical framework. Frobenius algebras
-are used to formalise the operations required to model the semantics of
-relative pronouns, including passing information between the relative clause
-and the modified noun phrase, as well as copying, combining, and discarding
-parts of the relative clause. We develop two instantiations of the abstract
-semantics, one based on a truth-theoretic approach and one based on corpus
-statistics.
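A tiny numpy rendering in the spirit of the corpus-based instantiation (all vectors and the verb matrix are invented; the Frobenius copying surfaces as element-wise multiplication):

```python
# Sketch: composing a subject relative clause, "dogs who chase cats",
# as dogs * (chase_matrix @ cats) -- copy the head noun, merge with the clause.
import numpy as np

dogs = np.array([0.9, 0.1, 0.4])
cats = np.array([0.2, 0.8, 0.3])
chase = np.array([[0.5, 0.1, 0.0],     # a toy verb matrix on the same space
                  [0.0, 0.6, 0.2],
                  [0.3, 0.0, 0.7]])

clause = dogs * (chase @ cats)   # element-wise "copying" of the modified noun
print(clause)
```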
-" -1150,1404.5357,"Nayan Jyoti Kalita, Navanath Saharia, and Smriti Kumar Sinha","Morphological Analysis of the Bishnupriya Manipuri Language using Finite - State Transducers",cs.CL," In this work we present a morphological analysis of Bishnupriya Manipuri -language, an Indo-Aryan language spoken in the north eastern India. As of now, -there is no computational work available for the language. Finite state -morphology is one of the successful approaches applied in a wide variety of -languages over the year. Therefore we adapted the finite state approach to -analyse morphology of the Bishnupriya Manipuri language. -" -1151,1404.5367,"Alexandre Passos, Vineet Kumar, Andrew McCallum",Lexicon Infused Phrase Embeddings for Named Entity Resolution,cs.CL," Most state-of-the-art approaches for named-entity recognition (NER) use semi -supervised information in the form of word clusters and lexicons. Recently -neural network-based language models have been explored, as they as a byproduct -generate highly informative vector representations for words, known as word -embeddings. In this paper we present two contributions: a new form of learning -word embeddings that can leverage information from relevant lexicons to improve -the representations, and the first system to use neural word embeddings to -achieve state-of-the-art results on named-entity recognition in both CoNLL and -Ontonotes NER. Our system achieves an F1 score of 90.90 on the test set for -CoNLL 2003---significantly better than any previous system trained on public -data, and matching a system employing massive private industrial query-log -data. -" -1152,1404.5372,Andrea Ballatore and Michela Bertolotto and David C. Wilson,Linking Geographic Vocabularies through WordNet,cs.IR cs.CL," The linked open data (LOD) paradigm has emerged as a promising approach to -structuring and sharing geospatial information. One of the major obstacles to -this vision lies in the difficulties found in the automatic integration between -heterogeneous vocabularies and ontologies that provides the semantic backbone -of the growing constellation of open geo-knowledge bases. In this article, we -show how to utilize WordNet as a semantic hub to increase the integration of -LOD. With this purpose in mind, we devise Voc2WordNet, an unsupervised mapping -technique between a given vocabulary and WordNet, combining intensional and -extensional aspects of the geographic terms. Voc2WordNet is evaluated against a -sample of human-generated alignments with the OpenStreetMap (OSM) Semantic -Network, a crowdsourced geospatial resource, and the GeoNames ontology, the -vocabulary of a large digital gazetteer. These empirical results indicate that -the approach can obtain high precision and recall. -" -1153,1404.5585,Matthew Skala,A Structural Query System for Han Characters,cs.CL cs.DB," The IDSgrep structural query system for Han character dictionaries is -presented. This system includes a data model and syntax for describing the -spatial structure of Han characters using Extended Ideographic Description -Sequences (EIDSes) based on the Unicode IDS syntax; a language for querying -EIDS databases, designed to suit the needs of font developers and foreign -language learners; a bit vector index inspired by Bloom filters for faster -query operations; a freely available implementation; and format translation -from popular third-party IDS and XML character databases. Experimental results -are included, with a comparison to other software used for similar -applications. 
-" -1154,1404.6312,"Yevgeni Berzak, Roi Reichart and Boris Katz",Reconstructing Native Language Typology from Foreign Language Usage,cs.CL," Linguists and psychologists have long been studying cross-linguistic -transfer, the influence of native language properties on linguistic performance -in a foreign language. In this work we provide empirical evidence for this -process in the form of a strong correlation between language similarities -derived from structural features in English as Second Language (ESL) texts and -equivalent similarities obtained from the typological features of the native -languages. We leverage this finding to recover native language typological -similarity structure directly from ESL text, and perform prediction of -typological features in an unsupervised fashion with respect to the target -languages. Our method achieves 72.2% accuracy on the typology prediction task, -a result that is highly competitive with equivalent methods that rely on -typological resources. -" -1155,1404.6491,"Janyce Wiebe, Lingjia Deng",An Account of Opinion Implicatures,cs.CL cs.IR," While previous sentiment analysis research has concentrated on the -interpretation of explicitly stated opinions and attitudes, this work initiates -the computational study of a type of opinion implicature (i.e., -opinion-oriented inference) in text. This paper described a rule-based -framework for representing and analyzing opinion implicatures which we hope -will contribute to deeper automatic interpretation of subjective language. In -the course of understanding implicatures, the system recognizes implicit -sentiments (and beliefs) toward various events and entities in the sentence, -often attributed to different sources (holders) and of mixed polarities; thus, -it produces a richer interpretation than is typical in opinion analysis. -" -1156,1404.7296,"Edward Grefenstette, Phil Blunsom, Nando de Freitas and Karl Moritz - Hermann",A Deep Architecture for Semantic Parsing,cs.CL," Many successful approaches to semantic parsing build on top of the syntactic -analysis of text, and make use of distributional representations or statistical -models to match parses to ontology-specific queries. This paper presents a -novel deep learning architecture which provides a semantic parsing system -through the union of two neural models of language semantics. It allows for the -generation of ontology-specific queries from natural language statements and -questions without the need for parsing, which makes it especially suitable to -grammatically malformed or syntactically atypical text, such as tweets, as well -as permitting the development of semantic parsers for resource-poor languages. -" -1157,1404.7362,"Jinzhu Jia, Luke Miratrix, Bin Yu, Brian Gawalt, Laurent El Ghaoui, - Luke Barnesmoore, Sophie Clavier","Concise comparative summaries (CCS) of large text corpora with a human - experiment",cs.CL stat.AP," In this paper we propose a general framework for topic-specific summarization -of large text corpora and illustrate how it can be used for the analysis of -news databases. Our framework, concise comparative summarization (CCS), is -built on sparse classification methods. CCS is a lightweight and flexible tool -that offers a compromise between simple word frequency based methods currently -in wide use and more heavyweight, model-intensive methods such as latent -Dirichlet allocation (LDA). 
We argue that sparse methods have much to offer for
-text analysis and hope CCS opens the door for a new branch of research in this
-important field. For a particular topic of interest (e.g., China or energy),
-CCS automatically labels documents as being either on- or off-topic (usually
-via keyword search), and then uses sparse classification methods to predict
-these labels with the high-dimensional counts of all the other words and
-phrases in the documents. The resulting small set of phrases found to be
-predictive is then harvested as the summary. To validate our tool, we designed
-and conducted a human survey, using news articles from the New York Times
-international section, to compare the different summarizers with human
-understanding. We demonstrate our approach with two case studies, a media
-analysis of the framing of ""Egypt"" in the New York Times throughout the Arab
-Spring and an informal comparison of the New York Times' and Wall Street
-Journal's coverage of ""energy."" Overall, we find that the Lasso with $L^2$
-normalization can be effectively and usefully used to summarize large corpora,
-regardless of document size.
-"
-1158,1405.0049,P. F. Tupper,Exemplar Dynamics Models of the Stability of Phonological Categories,cs.CL cs.SD," We develop a model for the stability and maintenance of phonological
-categories. Examples of phonological categories are vowel sounds such as ""i""
-and ""e"". We model such categories as consisting of collections of labeled
-exemplars that language users store in their memory. Each exemplar is a
-detailed memory of an instance of the linguistic entity in question. Starting
-from an exemplar-level model we derive integro-differential equations for the
-long-term evolution of the density of exemplars in different portions of
-phonetic space. Using these latter equations we investigate under what
-conditions two phonological categories merge or not. Our main conclusion is
-that for the preservation of distinct phonological categories, it is necessary
-that anomalous speech tokens of a given category are discarded, and not merely
-stored in memory as an exemplar of another category.
-"
-1159,1405.0145,Kais Dukes,Contextual Semantic Parsing using Crowdsourced Spatial Descriptions,cs.CL," We describe a contextual parser for the Robot Commands Treebank, a new
-crowdsourced resource. In contrast to previous semantic parsers that select the
-most-probable parse, we consider the different problem of parsing using
-additional situational context to disambiguate between different readings of a
-sentence. We show that multiple semantic analyses can be searched using dynamic
-programming via interaction with a spatial planner, to guide the parsing
-process. We are able to parse sentences in near linear-time by ruling out
-analyses early on that are incompatible with spatial context. We report a 34%
-upper bound on accuracy, as our planner correctly processes spatial context for
-3,394 out of 10,000 sentences. However, our parser achieves a 96.53%
-exact-match score for parsing within the subset of sentences recognized by the
-planner, compared to 82.14% for a non-contextual parser.
-"
-1160,1405.0546,"Antti Puurula, Jesse Read, Albert Bifet",Kaggle LSHTC4 Winning Solution,cs.AI cs.CL cs.IR," Our winning submission to the 2014 Kaggle competition for Large Scale
-Hierarchical Text Classification (LSHTC) consists mostly of an ensemble of
-sparse generative models extending Multinomial Naive Bayes. 
The -base-classifiers consist of hierarchically smoothed models combining document, -label, and hierarchy level Multinomials, with feature pre-processing using -variants of TF-IDF and BM25. Additional diversification is introduced by -different types of folds and random search optimization for different measures. -The ensemble algorithm optimizes macroFscore by predicting the documents for -each label, instead of the usual prediction of labels per document. Scores for -documents are predicted by weighted voting of base-classifier outputs with a -variant of Feature-Weighted Linear Stacking. The number of documents per label -is chosen using label priors and thresholding of vote scores. This document -describes the models and software used to build our solution. Reproducing the -results for our solution can be done by running the scripts included in the -Kaggle package. A package omitting precomputed result files is also -distributed. All code is open source, released under GNU GPL 2.0, and GPL 3.0 -for Weka and Meka dependencies. -" -1161,1405.0603,Aibek Makazhanov and Denilson Barbosa and Grzegorz Kondrak,Extracting Family Relationship Networks from Novels,cs.CL," We present an approach to the extraction of family relations from literary -narrative, which incorporates a technique for utterance attribution proposed -recently by Elson and McKeown (2010). In our work this technique is used in -combination with the detection of vocatives - the explicit forms of address -used by the characters in a novel. We take advantage of the fact that certain -vocatives indicate family relations between speakers. The extracted relations -are then propagated using a set of rules. We report the results of the -application of our method to Jane Austen's Pride and Prejudice. -" -1162,1405.0616,"James Brofos, Ajay Kannan, Rui Shu",Automated Attribution and Intertextual Analysis,cs.CL cs.DL stat.ML," In this work, we employ quantitative methods from the realm of statistics and -machine learning to develop novel methodologies for author attribution and -textual analysis. In particular, we develop techniques and software suitable -for applications to Classical study, and we illustrate the efficacy of our -approach in several interesting open questions in the field. We apply our -numerical analysis techniques to questions of authorship attribution in the -case of the Greek tragedian Euripides, to instances of intertextuality and -influence in the poetry of the Roman statesman Seneca the Younger, and to cases -of ""interpolated"" text with respect to the histories of Livy. -" -1163,1405.0701,Manaal Faruqui,"""Translation can't change a name"": Using Multilingual Data for Named - Entity Recognition",cs.CL," Named Entities (NEs) are often written with no orthographic changes across -different languages that share a common alphabet. We show that this can be -leveraged so as to improve named entity recognition (NER) by using unsupervised -word clusters from secondary languages as features in state-of-the-art -discriminative NER systems. We observe significant increases in performance, -finding that person and location identification is particularly improved, and -that phylogenetically close languages provide more valuable features than more -distant languages. 
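A minimal sketch of how word clusters induced from secondary languages can be exposed as features to a discriminative NER system, in the spirit of the multilingual NER entry above; the cluster assignments and feature names are hypothetical stand-ins for clusters learned from monolingual corpora.

```python
# Toy feature extractor: each secondary language contributes the cluster ID
# of the current token, alongside an ordinary lexical feature.
def cluster_features(tokens, i, clusters_by_lang):
    feats = {"word": tokens[i].lower()}
    for lang, clusters in clusters_by_lang.items():
        feats[f"{lang}_cluster"] = clusters.get(tokens[i], "<unk>")
    return feats

clusters_by_lang = {
    "de": {"Paris": "c011", "Obama": "c100"},   # made-up cluster IDs
    "nl": {"Paris": "k37", "Obama": "k82"},
}
tokens = ["Obama", "visited", "Paris"]
print([cluster_features(tokens, i, clusters_by_lang) for i in range(3)])
```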
-" -1164,1405.0941,Elena Cabrio and Serena Villata,Towards a Benchmark of Natural Language Arguments,cs.AI cs.CL," The connections among natural language processing and argumentation theory -are becoming stronger in the latest years, with a growing amount of works going -in this direction, in different scenarios and applying heterogeneous -techniques. In this paper, we present two datasets we built to cope with the -combination of the Textual Entailment framework and bipolar abstract -argumentation. In our approach, such datasets are used to automatically -identify through a Textual Entailment system the relations among the arguments -(i.e., attack, support), and then the resulting bipolar argumentation graphs -are analyzed to compute the accepted arguments. -" -1165,1405.0947,"Tom\'a\v{s} Ko\v{c}isk\'y, Karl Moritz Hermann, Phil Blunsom",Learning Bilingual Word Representations by Marginalizing Alignments,cs.CL," We present a probabilistic model that simultaneously learns alignments and -distributed representations for bilingual data. By marginalizing over word -alignments the model captures a larger semantic context than prior work relying -on hard alignments. The advantage of this approach is demonstrated in a -cross-lingual classification task, where we outperform the prior published -state of the art. -" -1166,1405.1346,Olena Orobinska (ERIC),"Automatic Method Of Domain Ontology Construction based on - Characteristics of Corpora POS-Analysis",cs.CL," It is now widely recognized that ontologies, are one of the fundamental -cornerstones of knowledge-based systems. What is lacking, however, is a -currently accepted strategy of how to build ontology; what kinds of the -resources and techniques are indispensables to optimize the expenses and the -time on the one hand and the amplitude, the completeness, the robustness of en -ontology on the other hand. The paper offers a semi-automatic ontology -construction method from text corpora in the domain of radiological protection. -This method is composed from next steps: 1) text annotation with part-of-speech -tags; 2) revelation of the significant linguistic structures and forming the -templates; 3) search of text fragments corresponding to these templates; 4) -basic ontology instantiation process -" -1167,1405.1359,Michael Kai Petersen,"Latent semantics of action verbs reflect phonetic parameters of - intensity and emotional content",cs.CL," Conjuring up our thoughts, language reflects statistical patterns of word -co-occurrences which in turn come to describe how we perceive the world. -Whether counting how frequently nouns and verbs combine in Google search -queries, or extracting eigenvectors from term document matrices made up of -Wikipedia lines and Shakespeare plots, the resulting latent semantics capture -not only the associative links which form concepts, but also spatial dimensions -embedded within the surface structure of language. As both the shape and -movements of objects have been found to be associated with phonetic contrasts -already in toddlers, this study explores whether articulatory and acoustic -parameters may likewise differentiate the latent semantics of action verbs. -Selecting 3 x 20 emotion, face, and hand related verbs known to activate -premotor areas in the brain, their mutual cosine similarities were computed -using latent semantic analysis LSA, and the resulting adjacency matrices were -compared based on two different large scale text corpora; HAWIK and TASA. 
-Applying hierarchical clustering to identify common structures across the two
-text corpora, the verbs largely divide into combined mouth and hand movements
-versus emotional expressions. Transforming the verbs into their constituent
-phonemes, the clustered small and large size movements appear differentiated by
-front versus back vowels corresponding to increasing levels of arousal, whereas
-the clustered emotional verbs seem characterized by sequences of close versus
-open jaw produced phonemes, generating upward or downward shifts in formant
-frequencies that may influence their perceived valence. This suggests that the
-latent semantics of action verbs reflect parameters of intensity and emotional
-polarity that appear correlated with the articulatory contrasts and acoustic
-characteristics of phonemes.
-"
-1168,1405.1379,"Ramin Pichevar, Jason Wung, Daniele Giacobello, Joshua Atkins","Design and Optimization of a Speech Recognition Front-End for
- Distant-Talking Control of a Music Playback Device",cs.SD cs.CL," This paper addresses the challenging scenario of distant-talking control
-of a music playback device, a common portable speaker with four small
-loudspeakers in close proximity to one microphone. The user controls the device
-through voice, where the speech-to-music ratio can be as low as -30 dB during
-music playback. We propose a speech enhancement front-end that relies on known
-robust methods for echo cancellation, double-talk detection, and noise
-suppression, as well as a novel adaptive quasi-binary mask that is well suited
-for speech recognition. The optimization of the system is then formulated as a
-large scale nonlinear programming problem where the recognition rate is
-maximized and the optimal values for the system parameters are found through a
-genetic algorithm. We validate our methodology by testing over the TIMIT
-database for different music playback levels and noise types. Finally, we show
-that the proposed front-end allows a natural interaction with the device for
-limited-vocabulary voice commands.
-"
-1169,1405.1406,"Sallam Abualhaija, Karl-Heinz Zimmermann","D-Bees: A Novel Method Inspired by Bee Colony Optimization for Solving
- Word Sense Disambiguation",cs.CL," Word sense disambiguation (WSD) is a problem in the field of computational
-linguistics given as finding the intended sense of a word (or a set of words)
-when it is activated within a certain context. WSD was recently addressed as a
-combinatorial optimization problem in which the goal is to find a sequence of
-senses that maximize the semantic relatedness among the target words. In this
-article, a novel algorithm for solving the WSD problem called D-Bees is
-proposed which is inspired by bee colony optimization (BCO), where artificial
-bee agents collaborate to solve the problem. The D-Bees algorithm is evaluated
-on a standard dataset (SemEval 2007 coarse-grained English all-words task
-corpus) and is compared to simulated annealing, genetic algorithms, and two ant
-colony optimization techniques (ACO). It will be observed that the BCO and ACO
-approaches are on par.
-"
-1170,1405.1438,"Chenhao Tan, Lillian Lee, Bo Pang","The effect of wording on message propagation: Topic- and
- author-controlled natural experiments on Twitter",cs.SI cs.CL physics.soc-ph," Consider a person trying to spread an important message on a social network.
-He/she can spend hours trying to craft the message. Does it actually matter? 
-While there has been extensive prior work looking into predicting popularity of -social-media content, the effect of wording per se has rarely been studied -since it is often confounded with the popularity of the author and the topic. -To control for these confounding factors, we take advantage of the surprising -fact that there are many pairs of tweets containing the same url and written by -the same user but employing different wording. Given such pairs, we ask: which -version attracts more retweets? This turns out to be a more difficult task than -predicting popular topics. Still, humans can answer this question better than -chance (but far from perfectly), and the computational methods we develop can -do better than both an average human and a strong competing method trained on -non-controlled data. -" -1171,1405.1439,"Chenhao Tan, Lillian Lee","A Corpus of Sentence-level Revisions in Academic Writing: A Step towards - Understanding Statement Strength in Communication",cs.CL," The strength with which a statement is made can have a significant impact on -the audience. For example, international relations can be strained by how the -media in one country describes an event in another; and papers can be rejected -because they overstate or understate their findings. It is thus important to -understand the effects of statement strength. A first step is to be able to -distinguish between strong and weak statements. However, even this problem is -understudied, partly due to a lack of data. Since strength is inherently -relative, revisions of texts that make claims are a natural source of data on -strength differences. In this paper, we introduce a corpus of sentence-level -revisions from academic writing. We also describe insights gained from our -annotation efforts for this task. -" -1172,1405.1605,Jacopo Staiano and Marco Guerini,DepecheMood: a Lexicon for Emotion Analysis from Crowd-Annotated News,cs.CL cs.CY," While many lexica annotated with words polarity are available for sentiment -analysis, very few tackle the harder task of emotion analysis and are usually -quite limited in coverage. In this paper, we present a novel approach for -extracting - in a totally automated way - a high-coverage and high-precision -lexicon of roughly 37 thousand terms annotated with emotion scores, called -DepecheMood. Our approach exploits in an original way 'crowd-sourced' affective -annotation implicitly provided by readers of news articles from rappler.com. By -providing new state-of-the-art performances in unsupervised settings for -regression and classification tasks, even using a na\""{\i}ve approach, our -experiments show the beneficial impact of harvesting social media data for -affective lexicon building. -" -1173,1405.1893,"Kristina Ban, Ana Me\v{s}trovi\'c, Sanda Martin\v{c}i\'c-Ip\v{s}i\'c",Initial Comparison of Linguistic Networks Measures for Parallel Texts,cs.CL cs.SI physics.soc-ph," This paper presents preliminary results of Croatian syllable networks -analysis. Syllable network is a network in which nodes are syllables and links -between them are constructed according to their connections within words. In -this paper we analyze networks of syllables generated from texts collected from -the Croatian Wikipedia and Blogs. As a main tool we use complex network -analysis methods which provide mechanisms that can reveal new patterns in a -language structure. We aim to show that syllable networks have much higher -clustering coefficient in comparison to Erd\""os-Renyi random networks. 
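The clustering comparison described in the syllable-network entry above is easy to reproduce in outline. A toy sketch, assuming networkx and a made-up syllabified vocabulary: build the syllable co-occurrence graph, then compare its average clustering coefficient with an Erdos-Renyi random graph of the same size.

```python
# Compare clustering of a (toy) syllable network against an Erdos-Renyi
# graph with the same number of nodes and edges.
import networkx as nx

words = [["na", "ma"], ["ma", "li"], ["na", "li", "ca"]]  # toy syllabified words
G = nx.Graph()
for syllables in words:
    # link syllables that co-occur within the same word
    G.add_edges_from(zip(syllables, syllables[1:]))

random_G = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=1)
print(nx.average_clustering(G), nx.average_clustering(random_G))
```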
The
-results indicate that Croatian syllable networks exhibit certain properties of
-small-world networks. Furthermore, we compared Croatian syllable networks
-with Portuguese and Chinese syllable networks and we showed that they have
-similar properties.
-"
-1174,1405.1924,Tebbi Hanane and Azzoune Hamid,"An Expert System for Automatic Reading of A Text Written in Standard
- Arabic",cs.CL," In this work we present our expert system for automatic reading, i.e. speech
-synthesis, of text written in Standard Arabic. Our work is carried out
-in two main stages: the creation of the sound database, and the
-transformation of written text into speech (Text To Speech, TTS). This
-transformation is done firstly by a Phonetic Orthographical Transcription (POT)
-of any written Standard Arabic text, with the aim of transforming it into its
-corresponding phonetic sequence, and secondly by the generation of the voice
-signal corresponding to the transcribed chain. We describe the design of the
-system, as well as the results obtained, compared to other works aiming at TTS
-based on Standard Arabic.
-"
-1175,1405.2048,"Jeffrey Sukharev, Leonid Zhukov, Alexandrin Popescul",Learning Alternative Name Spellings,cs.IR cs.CL," Name matching is a key component of systems for entity resolution or record
-linkage. Alternative spellings of the same names are a common occurrence in
-many applications. We use the largest collection of genealogy person records in
-the world together with user search query logs to build name matching models.
-The procedure for building a crowd-sourced training set is outlined together
-with the presentation of our method. We cast the problem of learning
-alternative spellings as a machine translation problem at the character level.
-We use information retrieval evaluation methodology to show that this method
-substantially outperforms on our data a number of standard well known phonetic
-and string similarity methods in terms of precision and recall. Additionally,
-we rigorously compare the performance of standard methods when compared with
-each other. Our result can lead to a significant practical impact in entity
-resolution applications.
-"
-1176,1405.2386,Srayan Datta,Predicting Central Topics in a Blog Corpus from a Networks Perspective,cs.IR cs.CL cs.SI physics.soc-ph," In today's content-centric Internet, blogs are becoming increasingly popular
-and important from a data analysis perspective. According to Wikipedia, there
-were over 156 million public blogs on the Internet as of February 2011. Blogs
-are a reflection of our contemporary society. The contents of different blog
-posts are important from social, psychological, economical and political
-perspectives. Discovery of important topics in the blogosphere is an area which
-still needs much exploring. We try to come up with a procedure using
-probabilistic topic modeling and network centrality measures which identifies
-the central topics in a blog corpus.
-"
-1177,1405.2434,Chen Lijiang,"Coordinate System Selection for Minimum Error Rate Training in
- Statistical Machine Translation",cs.CL," Minimum error rate training (MERT) is a widely used training procedure for
-statistical machine translation. A general problem of this approach is that the
-search easily converges to a local optimum and the acquired weight set
-is not in accord with the real distribution of feature functions. 
This paper
-introduces coordinate system selection (RSS) into the search algorithm for
-MERT. Contrary to previous approaches in which every dimension only corresponds
-to one independent feature function, we create several coordinate systems by
-moving one of the dimensions to a new direction. The basic idea is simple
-but critical: the training procedure of MERT should be based on a
-coordinate system formed by search directions, not directly on feature
-functions. Experiments show that by selecting coordinate systems with tuning
-set results, better results can be obtained without any other language
-knowledge.
-"
-1178,1405.2584,Rahul Tejwani (University at Buffalo),Sentiment Analysis: A Survey,cs.IR cs.CL," Sentiment analysis (also known as opinion mining) refers to the use of
-natural language processing, text analysis and computational linguistics to
-identify and extract subjective information in source materials. Mining
-opinions expressed in user generated content is a challenging yet practically
-very useful problem. This survey covers various approaches and methodologies
-used in sentiment analysis and opinion mining in general. The focus is on
-Internet text such as product reviews, tweets and other social media.
-"
-1179,1405.2702,"Sabina \v{S}i\v{s}ovi\'c, Sanda Martin\v{c}i\'c-Ip\v{s}i\'c and Ana
- Me\v{s}trovi\'c",Comparison of the language networks from literature and blogs,cs.CL cs.SI physics.soc-ph," In this paper we present the comparison of the linguistic networks from
-literature and blog texts. The linguistic networks are constructed from texts
-as directed and weighted co-occurrence networks of words. Words are nodes and
-links are established between two nodes if they are directly co-occurring
-within the sentence. The comparison of the network structure is performed at
-the global level (network) in terms of: average node degree, average shortest
-path length, diameter, clustering coefficient, density and number of
-components. Furthermore, we perform analysis at the local level (node) by
-comparing the rank plots of in and out degree, strength and selectivity. The
-selectivity-based results point out that there are differences between the
-structure of the networks constructed from literature and blogs.
-"
-1180,1405.2874,"Dimitri Kartsaklis (University of Oxford), Mehrnoosh Sadrzadeh (Queen
- Mary University of London)",A Study of Entanglement in a Categorical Framework of Natural Language,cs.CL cs.AI math.CT quant-ph," In both quantum mechanics and corpus linguistics based on vector spaces, the
-notion of entanglement provides a means for the various subsystems to
-communicate with each other. In this paper we examine a number of
-implementations of the categorical framework of Coecke, Sadrzadeh and Clark
-(2010) for natural language, from an entanglement perspective. Specifically,
-our goal is to better understand in what way the level of entanglement of the
-relational tensors (or the lack of it) affects the compositional structures in
-practical situations. Our findings reveal that a number of proposals for verb
-construction lead to almost separable tensors, a fact that considerably
-simplifies the interactions between the words. We examine the ramifications of
-this fact, and we show that the use of Frobenius algebras mitigates the
-potential problems to a great extent. Finally, we briefly examine a machine
-learning method that creates verb tensors exhibiting a sufficient level of
-entanglement. 
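The (near-)separability finding in the entry above admits a compact numerical check. A sketch, assuming verbs are represented as matrices over a noun space: the normalised singular spectrum of the matrix indicates how far it is from a rank-one, separable tensor; the matrices here are toy values.

```python
# A rank-one verb matrix behaves like a separable ("unentangled") tensor,
# collapsing subject/object interactions; a flat spectrum does not.
import numpy as np

def entanglement_profile(verb_matrix):
    s = np.linalg.svd(verb_matrix, compute_uv=False)
    return s / s.sum()          # normalised singular spectrum

separable = np.outer([1.0, 2.0], [3.0, 1.0])       # rank 1: no entanglement
entangled = np.array([[1.0, 0.0], [0.0, 1.0]])     # maximally mixed spectrum
print(entanglement_profile(separable))   # approx [1, 0]
print(entanglement_profile(entangled))   # approx [0.5, 0.5]
```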
-" -1181,1405.3033,"Zeeshan Bhatti, Ahmad Waqas, Imdad Ali Ismaili, Dil Nawaz Hakro, - Waseem Javaid Soomro","Phonetic based SoundEx & ShapeEx algorithm for Sindhi Spell Checker - System",cs.CL," This paper presents a novel combinational phonetic algorithm for Sindhi -Language, to be used in developing Sindhi Spell Checker which has yet not been -developed prior to this work. The compound textual forms and glyphs of Sindhi -language presents a substantial challenge for developing Sindhi spell checker -system and generating similar suggestion list for misspelled words. In order to -implement such a system, phonetic based Sindhi language rules and patterns must -be considered into account for increasing the accuracy and efficiency. The -proposed system is developed with a blend between Phonetic based SoundEx -algorithm and ShapeEx algorithm for pattern or glyph matching, generating -accurate and efficient suggestion list for incorrect or misspelled Sindhi -words. A table of phonetically similar sounding Sindhi characters for SoundEx -algorithm is also generated along with another table containing similar glyph -or shape based character groups for ShapeEx algorithm. Both these are first -ever attempt of any such type of categorization and representation for Sindhi -Language. -" -1182,1405.3272,Nicholas Kersting,Fast and Fuzzy Private Set Intersection,cs.CR cs.CL," Private Set Intersection (PSI) is usually implemented as a sequence of -encryption rounds between pairs of users, whereas the present work implements -PSI in a simpler fashion: each set only needs to be encrypted once, after which -each pair of users need only one ordinary set comparison. This is typically -orders of magnitude faster than ordinary PSI at the cost of some ``fuzziness"" -in the matching, which may nonetheless be tolerable or even desirable. This is -demonstrated in the case where the sets consist of English words processed with -WordNet. -" -1183,1405.3282,"Tim Althoff, Cristian Danescu-Niculescu-Mizil, Dan Jurafsky","How to Ask for a Favor: A Case Study on the Success of Altruistic - Requests",cs.CL cs.SI physics.soc-ph," Requests are at the core of many social media systems such as question & -answer sites and online philanthropy communities. While the success of such -requests is critical to the success of the community, the factors that lead -community members to satisfy a request are largely unknown. Success of a -request depends on factors like who is asking, how they are asking, when are -they asking, and most critically what is being requested, ranging from small -favors to substantial monetary donations. We present a case study of altruistic -requests in an online community where all requests ask for the very same -contribution and do not offer anything tangible in return, allowing us to -disentangle what is requested from textual and social factors. Drawing from -social psychology literature, we extract high-level social features from text -that operationalize social relations between recipient and donor and -demonstrate that these extracted relations are predictive of success. More -specifically, we find that clearly communicating need through the narrative is -essential and that that linguistic indications of gratitude, evidentiality, and -generalized reciprocity, as well as high status of the asker further increase -the likelihood of success. Building on this understanding, we develop a model -that can predict the success of unseen requests, significantly improving over -several baselines. 
We link these findings to research in psychology on helping -behavior, providing a basis for further analysis of success in social media -systems. -" -1184,1405.3515,"Yoon Kim, Yi-I Chiu, Kentaro Hanaki, Darshan Hegde, Slav Petrov",Temporal Analysis of Language through Neural Language Models,cs.CL," We provide a method for automatically detecting change in language across -time through a chronologically trained neural language model. We train the -model on the Google Books Ngram corpus to obtain word vector representations -specific to each year, and identify words that have changed significantly from -1900 to 2009. The model identifies words such as ""cell"" and ""gay"" as having -changed during that time period. The model simultaneously identifies the -specific years during which such words underwent change. -" -1185,1405.3518,"Yoon Kim, Owen Zhang","Credibility Adjusted Term Frequency: A Supervised Term Weighting Scheme - for Sentiment Analysis and Text Classification",cs.CL cs.IR," We provide a simple but novel supervised weighting scheme for adjusting term -frequency in tf-idf for sentiment analysis and text classification. We compare -our method to baseline weighting schemes and find that it outperforms them on -multiple benchmarks. The method is robust and works well on both snippets and -longer documents. -" -1186,1405.3539,"Fionn Murtagh, Adam Ganz","Pattern Recognition in Narrative: Tracking Emotional Expression in - Context",cs.AI cs.CL," Using geometric data analysis, our objective is the analysis of narrative, -with narrative of emotion being the focus in this work. The following two -principles for analysis of emotion inform our work. Firstly, emotion is -revealed not as a quality in its own right but rather through interaction. We -study the 2-way relationship of Ilsa and Rick in the movie Casablanca, and the -3-way relationship of Emma, Charles and Rodolphe in the novel {\em Madame -Bovary}. Secondly, emotion, that is expression of states of mind of subjects, -is formed and evolves within the narrative that expresses external events and -(personal, social, physical) context. In addition to the analysis methodology -with key aspects that are innovative, the input data used is crucial. We use, -firstly, dialogue, and secondly, broad and general description that -incorporates dialogue. In a follow-on study, we apply our unsupervised -narrative mapping to data streams with very low emotional expression. We map -the narrative of Twitter streams. Thus we demonstrate map analysis of general -narratives. -" -1187,1405.3772,Yannis Haralambous and Julie Sauvage-Vincent and John Puentes,"INAUT, a Controlled Language for the French Coast Pilot Books - Instructions nautiques",cs.CL," We describe INAUT, a controlled natural language dedicated to collaborative -update of a knowledge base on maritime navigation and to automatic generation -of coast pilot books (Instructions nautiques) of the French National -Hydrographic and Oceanographic Service SHOM. INAUT is based on French language -and abundantly uses georeferenced entities. After describing the structure of -the overall system, giving details on the language and on its generation, and -discussing the three major applications of INAUT (document production, -interaction with ENCs and collaborative updates of the knowledge base), we -conclude with future extensions and open problems. 
-" -1188,1405.3786,"Domagoj Margan, Ana Me\v{s}trovi\'c, Sanda Martin\v{c}i\'c-Ip\v{s}i\'c","Complex Networks Measures for Differentiation between Normal and - Shuffled Croatian Texts",cs.CL physics.soc-ph," This paper studies the properties of the Croatian texts via complex networks. -We present network properties of normal and shuffled Croatian texts for -different shuffling principles: on the sentence level and on the text level. In -both experiments we preserved the vocabulary size, word and sentence frequency -distributions. Additionally, in the first shuffling approach we preserved the -sentence structure of the text and the number of words per sentence. Obtained -results showed that degree rank distributions exhibit no substantial deviation -in shuffled networks, and strength rank distributions are preserved due to the -same word frequencies. Therefore, standard approach to study the structure of -linguistic co-occurrence networks showed no clear difference among the -topologies of normal and shuffled texts. Finally, we showed that the in- and -out- selectivity values from shuffled texts are constantly below selectivity -values calculated from normal texts. Our results corroborate that the node -selectivity measure can capture structural differences between original and -shuffled Croatian texts. -" -1189,1405.3925,"Laurent Romary (IDSL, INRIA Saclay - Ile de France, CMB), Andreas Witt - (IDS)","M\'ethodes pour la repr\'esentation informatis\'ee de donn\'ees - lexicales / Methoden der Speicherung lexikalischer Daten",cs.CL," In recent years, new developments in the area of lexicography have altered -not only the management, processing and publishing of lexicographical data, but -also created new types of products such as electronic dictionaries and -thesauri. These expand the range of possible uses of lexical data and support -users with more flexibility, for instance in assisting human translation. In -this article, we give a short and easy-to-understand introduction to the -problematic nature of the storage, display and interpretation of lexical data. -We then describe the main methods and specifications used to build and -represent lexical data. This paper is targeted for the following groups of -people: linguists, lexicographers, IT specialists, computer linguists and all -others who wish to learn more about the modelling, representation and -visualization of lexical knowledge. This paper is written in two languages: -French and German. -" -1190,1405.4053,Quoc V. Le and Tomas Mikolov,Distributed Representations of Sentences and Documents,cs.CL cs.AI cs.LG," Many machine learning algorithms require the input to be represented as a -fixed-length feature vector. When it comes to texts, one of the most common -fixed-length features is bag-of-words. Despite their popularity, bag-of-words -features have two major weaknesses: they lose the ordering of the words and -they also ignore semantics of the words. For example, ""powerful,"" ""strong"" and -""Paris"" are equally distant. In this paper, we propose Paragraph Vector, an -unsupervised algorithm that learns fixed-length feature representations from -variable-length pieces of texts, such as sentences, paragraphs, and documents. -Our algorithm represents each document by a dense vector which is trained to -predict words in the document. Its construction gives our algorithm the -potential to overcome the weaknesses of bag-of-words models. 
Empirical results -show that Paragraph Vectors outperform bag-of-words models as well as other -techniques for text representations. Finally, we achieve new state-of-the-art -results on several text classification and sentiment analysis tasks. -" -1191,1405.4097,"Kristina Ban, Ivan Ivaki\'c and Ana Me\v{s}trovi\'c",A preliminary study of Croatian Language Syllable Networks,cs.CL," This paper presents preliminary results of Croatian syllable networks -analysis. Syllable network is a network in which nodes are syllables and links -between them are constructed according to their connections within words. In -this paper we analyze networks of syllables generated from texts collected from -the Croatian Wikipedia and Blogs. As a main tool we use complex network -analysis methods which provide mechanisms that can reveal new patterns in a -language structure. We aim to show that syllable networks have much higher -clustering coefficient in comparison to Erd\""os-Renyi random networks. The -results indicate that Croatian syllable networks exhibit certain properties of -a small world networks. Furthermore, we compared Croatian syllable networks -with Portuguese and Chinese syllable networks and we showed that they have -similar properties. -" -1192,1405.4248,Yannis Haralambous,Les math\'ematiques de la langue : l'approche formelle de Montague,cs.CL," We present a natural language modelization method which is strongely relying -on mathematics. This method, called ""Formal Semantics,"" has been initiated by -the American linguist Richard M. Montague in the 1970's. It uses mathematical -tools such as formal languages and grammars, first-order logic, type theory and -$\lambda$-calculus. Our goal is to have the reader discover both Montagovian -formal semantics and the mathematical tools that he used in his method. - ----- - Nous pr\'esentons une m\'ethode de mod\'elisation de la langue naturelle qui -est fortement bas\'ee sur les math\'ematiques. Cette m\'ethode, appel\'ee -{\guillemotleft}s\'emantique formelle{\guillemotright}, a \'et\'e initi\'ee par -le linguiste am\'ericain Richard M. Montague dans les ann\'ees 1970. Elle -utilise des outils math\'ematiques tels que les langages et grammaires formels, -la logique du 1er ordre, la th\'eorie de types et le $\lambda$-calcul. Nous -nous proposons de faire d\'ecouvrir au lecteur tant la s\'emantique formelle de -Montague que les outils math\'ematiques dont il s'est servi. -" -1193,1405.4273,Jan A. Botha and Phil Blunsom,Compositional Morphology for Word Representations and Language Modelling,cs.CL," This paper presents a scalable method for integrating compositional -morphological representations into a vector-based probabilistic language model. -Our approach is evaluated in the context of log-bilinear language models, -rendered suitably efficient for implementation inside a machine translation -decoder by factoring the vocabulary. We perform both intrinsic and extrinsic -evaluations, presenting results on a range of languages which demonstrate that -our model learns morphological representations that both perform well on word -similarity tasks and lead to substantial reductions in perplexity. When used -for translation into morphologically rich languages with large vocabularies, -our models obtain improvements of up to 1.2 BLEU points relative to a baseline -system using back-off n-gram models. 
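The compositional-morphology idea in the entry above (word representations built from morpheme vectors) can be sketched additively. This is a simplification of the log-bilinear formulation, with toy embeddings and hand-specified segmentations:

```python
# Additive morphology: a word vector is the sum of its morpheme vectors,
# so rare or unseen words are representable if their morphemes are known.
import numpy as np

dim = 8
rng = np.random.default_rng(42)
morpheme_vecs = {m: rng.random(dim)
                 for m in ["un", "break", "able", "do"]}

def word_vector(morphemes):
    return sum(morpheme_vecs[m] for m in morphemes)

v = word_vector(["un", "break", "able"])   # "unbreakable"
print(v.shape, v[:3])
```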
-" -1194,1405.4364,Yannis Haralambous and Vitaly Klyuev,Thematically Reinforced Explicit Semantic Analysis,cs.CL," We present an extended, thematically reinforced version of Gabrilovich and -Markovitch's Explicit Semantic Analysis (ESA), where we obtain thematic -information through the category structure of Wikipedia. For this we first -define a notion of categorical tfidf which measures the relevance of terms in -categories. Using this measure as a weight we calculate a maximal spanning tree -of the Wikipedia corpus considered as a directed graph of pages and categories. -This tree provides us with a unique path of ""most related categories"" between -each page and the top of the hierarchy. We reinforce tfidf of words in a page -by aggregating it with categorical tfidfs of the nodes of these paths, and -define a thematically reinforced ESA semantic relatedness measure which is more -robust than standard ESA and less sensitive to noise caused by out-of-context -words. We apply our method to the French Wikipedia corpus, evaluate it through -a text classification on a 37.5 MB corpus of 20 French newsgroups and obtain a -precision increase of 9-10% compared with standard ESA. -" -1195,1405.4392,"Sunny Mitra, Ritwik Mitra, Martin Riedl, Chris Biemann, Animesh - Mukherjee, Pawan Goyal","That's sick dude!: Automatic identification of word sense change across - different timescales",cs.CL cs.AI," In this paper, we propose an unsupervised method to identify noun sense -changes based on rigorous analysis of time-varying text data available in the -form of millions of digitized books. We construct distributional thesauri based -networks from data at different time points and cluster each of them separately -to obtain word-centric sense clusters corresponding to the different time -points. Subsequently, we compare these sense clusters of two different time -points to find if (i) there is birth of a new sense or (ii) if an older sense -has got split into more than one sense or (iii) if a newer sense has been -formed from the joining of older senses or (iv) if a particular sense has died. -We conduct a thorough evaluation of the proposed methodology both manually as -well as through comparison with WordNet. Manual evaluation indicates that the -algorithm could correctly identify 60.4% birth cases from a set of 48 randomly -picked samples and 57% split/join cases from a set of 21 randomly picked -samples. Remarkably, in 44% cases the birth of a novel sense is attested by -WordNet, while in 46% cases and 43% cases split and join are respectively -confirmed by WordNet. Our approach can be applied for lexicography, as well as -for applications like word sense disambiguation or semantic search. -" -1196,1405.4433,"Domagoj Margan, Sanda Martin\v{c}i\'c-Ip\v{s}i\'c, Ana Me\v{s}trovi\'c","Preliminary Report on the Structure of Croatian Linguistic Co-occurrence - Networks",cs.CL cs.SI physics.soc-ph," In this article, we investigate the structure of Croatian linguistic -co-occurrence networks. We examine the change of network structure properties -by systematically varying the co-occurrence window sizes, the corpus sizes and -removing stopwords. In a co-occurrence window of size $n$ we establish a link -between the current word and $n-1$ subsequent words. The results point out that -the increase of the co-occurrence window size is followed by a decrease in -diameter, average path shortening and expectedly condensing the average -clustering coefficient. The same can be noticed for the removal of the -stopwords. 
Finally, since the size of texts is reflected in the network
-properties, our results suggest that the corpus influence can be reduced by
-increasing the co-occurrence window size.
-"
-1197,1405.4599,Dalei Wu and Haiqing Wu,"Modelling Data Dispersion Degree in Automatic Robust Estimation for
- Multivariate Gaussian Mixture Models with an Application to Noisy Speech
- Processing",cs.CL cs.LG stat.ML," The trimming scheme with a prefixed cutoff portion is known as a method of
-improving the robustness of statistical models such as multivariate Gaussian
-mixture models (MGMMs) in small scale tests by alleviating the impacts of
-outliers. However, when this method is applied to real-world data, such as
-noisy speech processing, it is hard to know the optimal cut-off portion to
-remove the outliers, and the method sometimes removes useful data samples as
-well. In this paper, we propose a new method based on measuring the dispersion
-degree (DD) of the training data to avoid this problem, so as to realise
-automatic robust estimation for MGMMs. The DD model is studied by using two
-different measures. For each one, we theoretically prove that the DD of the
-data samples in a context of MGMMs approximately obeys a specific (chi or
-chi-square) distribution. The proposed method is evaluated on a real-world
-application with a moderately-sized speaker recognition task. Experiments show
-that the proposed method can significantly improve the robustness of the
-conventional training method of GMMs for speaker recognition.
-"
-1198,1405.4918,"Mishari Almishari, Ekin Oguz, Gene Tsudik",Fighting Authorship Linkability with Crowdsourcing,cs.DL cs.CL," Massive amounts of contributed content -- including traditional literature,
-blogs, music, videos, reviews and tweets -- are available on the Internet
-today, with authors numbering in many millions. Textual information, such as
-product or service reviews, is an important and increasingly popular type of
-content that is being used as a foundation of many trendy community-based
-reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown
-that, due partly to their specialized/topical nature, sets of reviews authored
-by the same person are readily linkable based on simple stylometric features.
-In practice, this means that individuals who author more than a few reviews
-under different accounts (whether within one site or across multiple sites) can
-be linked, which represents a significant loss of privacy.
- In this paper, we start by showing that the problem is actually worse than
-previously believed. We then explore ways to mitigate authorship linkability in
-community-based reviewing. We first attempt to harness the global power of
-crowdsourcing by engaging random strangers in the process of re-writing
-reviews. As our empirical results (obtained from Amazon Mechanical Turk)
-clearly demonstrate, crowdsourcing yields impressively sensible reviews that
-reflect sufficiently different stylometric characteristics such that prior
-stylometric linkability techniques become largely ineffective. We also consider
-using machine translation to automatically re-write reviews. Contrary to what
-was previously believed, our results show that translation decreases authorship
-linkability as the number of intermediate languages grows. Finally, we explore
-the combination of crowdsourcing and machine translation and report on the
-results. 
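The dispersion-degree entry above rests on a distributional fact that is easy to illustrate: under a Gaussian, squared Mahalanobis distances are approximately chi-square distributed with d degrees of freedom, so a trimming threshold can be chosen as a quantile rather than a prefixed cut-off portion. A sketch on synthetic data:

```python
# Trim outliers by a chi-square quantile of squared Mahalanobis distance
# instead of a fixed trim fraction; the data here are synthetic.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
d = 3
X = rng.normal(size=(500, d))
X[:5] += 8.0                                     # inject a few outliers

mu = X.mean(axis=0)
prec = np.linalg.inv(np.cov(X, rowvar=False))    # precision matrix
diff = X - mu
dd = np.einsum("ij,jk,ik->i", diff, prec, diff)  # squared Mahalanobis

keep = dd <= chi2.ppf(0.999, df=d)               # automatic, not prefixed
print(X.shape[0] - keep.sum(), "samples trimmed")
```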
-" -1199,1405.5202,"Altaf Rahman, Vincent Ng","Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference - Resolution",cs.CL," Traditional learning-based coreference resolvers operate by training the -mention-pair model for determining whether two mentions are coreferent or not. -Though conceptually simple and easy to understand, the mention-pair model is -linguistically rather unappealing and lags far behind the heuristic-based -coreference models proposed in the pre-statistical NLP era in terms of -sophistication. Two independent lines of recent research have attempted to -improve the mention-pair model, one by acquiring the mention-ranking model to -rank preceding mentions for a given anaphor, and the other by training the -entity-mention model to determine whether a preceding cluster is coreferent -with a given mention. We propose a cluster-ranking approach to coreference -resolution, which combines the strengths of the mention-ranking model and the -entity-mention model, and is therefore theoretically more appealing than both -of these models. In addition, we seek to improve cluster rankers via two -extensions: (1) lexicalization and (2) incorporating knowledge of anaphoricity -by jointly modeling anaphoricity determination and coreference resolution. -Experimental results on the ACE data sets demonstrate the superior performance -of cluster rankers to competing approaches as well as the effectiveness of our -two extensions. -" -1200,1405.5208,"Alexander M. Rush, Michael Collins","A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference - in Natural Language Processing",cs.CL cs.AI," Dual decomposition, and more generally Lagrangian relaxation, is a classical -method for combinatorial optimization; it has recently been applied to several -inference problems in natural language processing (NLP). This tutorial gives an -overview of the technique. We describe example algorithms, describe formal -guarantees for the method, and describe practical issues in implementing the -algorithms. While our examples are predominantly drawn from the NLP literature, -the material should be of general relevance to inference problems in machine -learning. A central theme of this tutorial is that Lagrangian relaxation is -naturally applied in conjunction with a broad class of combinatorial -algorithms, allowing inference in models that go significantly beyond previous -work on Lagrangian relaxation for inference in graphical models. -" -1201,1405.5447,"Hosein Azarbonyad, Azadeh Shakery, Heshaam Faili","Learning to Exploit Different Translation Resources for Cross Language - Information Retrieval",cs.IR cs.CL," One of the important factors that affects the performance of Cross Language -Information Retrieval(CLIR)is the quality of translations being employed in -CLIR. In order to improve the quality of translations, it is important to -exploit available resources efficiently. Employing different translation -resources with different characteristics has many challenges. In this paper, we -propose a method for exploiting available translation resources simultaneously. -This method employs Learning to Rank(LTR) for exploiting different translation -resources. To apply LTR methods for query translation, we define different -translation relation based features in addition to context based features. 
We
-use the contextual information contained in translation resources for
-extracting context-based features. The proposed method uses LTR to construct a
-translation ranking model based on the defined features. The constructed model
-is used for ranking translation candidates of query words. To evaluate the
-proposed method we perform English-Persian CLIR, in which we employ the
-translation ranking model to find translations of English queries and employ
-the translations to retrieve Persian documents. Experimental results show that
-our approach significantly outperforms single-resource-based CLIR methods.
-"
-1202,1405.5474,Yannis Haralambous,"New Perspectives in Sinographic Language Processing Through the Use of
- Character Structure",cs.CL," Chinese characters have a complex and hierarchical graphical structure
-carrying both semantic and phonetic information. We use this structure to
-enhance the text model and obtain better results in standard NLP operations.
-First of all, to tackle the problem of graphical variation we define
-allographic classes of characters. Next, the relation of inclusion of a
-subcharacter in a character provides us with a directed graph of allographic
-classes. We provide this graph with two weights: semanticity (semantic relation
-between subcharacter and character) and phoneticity (phonetic relation) and
-calculate ""most semantic subcharacter paths"" for each character. Finally, by
-adding the information contained in these paths to unigrams, we claim to
-increase the efficiency of text mining methods. We evaluate our method on a
-text classification task on two corpora (Chinese and Japanese) of a total of 18
-million characters and get an improvement of 3% on an already high baseline of
-89.6% precision, obtained by a linear SVM classifier. Other possible
-applications and perspectives of the system are discussed.
-"
-1203,1405.5654,Lijiang Chen,"Machine Translation Model based on Non-parallel Corpus and
- Semi-supervised Transductive Learning",cs.CL," Although the parallel corpus has an irreplaceable role in machine
-translation, its scale and coverage still fall short of actual needs.
-Non-parallel corpus resources on the web have an inestimable potential value in
-machine translation and other natural language processing tasks. This article
-proposes a semi-supervised transductive learning method for expanding the
-training corpus in a statistical machine translation system by extracting
-parallel sentences from the non-parallel corpus. This method only requires a
-small amount of labeled corpus and a large unlabeled corpus to build a
-high-performance classifier, which is especially useful when labeled data are
-scarce. The experimental results show that by combining non-parallel corpus
-alignment and the semi-supervised transductive learning method, we can more
-effectively use their respective strengths to improve the performance of the
-machine translation system.
-"
-1204,1405.5674,Mathieu Mangeot (LIG),"Mot\`aMot project: conversion of a French-Khmer published dictionary for
- building a multilingual lexical system",cs.CL," Economic issues related to information processing techniques are very
-important. The development of such technologies is a major asset for developing
-countries like Cambodia and Laos, and emerging ones like Vietnam, Malaysia and
-Thailand. The MotAMot project aims to computerize an under-resourced language:
-Khmer, spoken mainly in Cambodia. 
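The semi-supervised extraction recipe in the machine-translation entry above can be sketched with scikit-learn's self-training wrapper; the two features used here (a length ratio and a shared-token count) are hypothetical stand-ins for real parallel-sentence features:

```python
# Self-training for parallel-sentence classification: a small labeled set
# plus many unlabeled pairs (marked -1) train a high-confidence classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X_labeled = np.array([[0.9, 5], [1.0, 7], [0.3, 0], [2.5, 1]], dtype=float)
y_labeled = np.array([1, 1, 0, 0])        # 1 = parallel, 0 = not parallel
X_unlabeled = rng.random((50, 2)) * [2.5, 8]
y_unlabeled = np.full(50, -1)             # -1 marks unlabeled pairs

X = np.vstack([X_labeled, X_unlabeled])
y = np.concatenate([y_labeled, y_unlabeled])

clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.9)
clf.fit(X, y)
print(clf.predict([[1.1, 6.0]]))          # likely classified as parallel
```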
-1204,1405.5674,Mathieu Mangeot (LIG),"Mot\`aMot project: conversion of a French-Khmer published dictionary for
- building a multilingual lexical system",cs.CL," Economic issues related to information processing techniques are very
-important. The development of such technologies is a major asset for developing
-countries like Cambodia and Laos, and emerging ones like Vietnam, Malaysia and
-Thailand. The MotAMot project aims to computerize an under-resourced language:
-Khmer, spoken mainly in Cambodia. The main goal of the project is the
-development of a multilingual lexical system targeted for Khmer. The
-macrostructure is a pivot one with each word sense of each language linked to a
-pivot axie. The microstructure comes from a simplification of the explanatory
-and combinatory dictionary. The lexical system has been initialized with data
-coming mainly from the conversion of the French-Khmer bilingual dictionary of
-Denis Richer from Word to XML format. The French part was completed with
-pronunciation and parts-of-speech coming from the FeM French-English-Malay
-dictionary. The Khmer headwords noted in IPA in the Richer dictionary were
-converted to Khmer writing with OpenFST, a finite state transducer tool. The
-resulting resource is available online for lookup, editing, download and remote
-programming via a REST API on a Jibiki platform.
-"
-1205,1405.5893,"Chantal Enguehard (LINA), Mathieu Mangeot (LIG)",Computerization of African languages-French dictionaries,cs.CL," This paper relates work done during the DiLAF project. It consists in
-converting 5 bilingual African language-French dictionaries originally in Word
-format into XML following the LMF model. The languages processed are Bambara,
-Hausa, Kanuri, Tamajaq and Songhai-zarma, still considered as under-resourced
-languages concerning Natural Language Processing tools. Once converted, the
-dictionaries are available online on the Jibiki platform for lookup and
-modification. The DiLAF project is first presented. A description of each
-dictionary follows. Then, the conversion methodology from .doc format to XML
-files is presented. A specific point on the usage of Unicode follows. Then,
-each step of the conversion into XML and LMF is detailed. The last part
-presents the Jibiki lexical resources management platform used for the project.
-"
-1206,1405.6068,Dmitry Lande,"Building of Networks of Natural Hierarchies of Terms Based on Analysis
- of Texts Corpora",cs.CL," We present a technique for building networks of natural hierarchies of
-terms based on the analysis of chosen text corpora. The technique is based on
-the methodology of horizontal visibility graphs. We construct and investigate a
-language network formed on the basis of arXiv electronic preprints on
-information retrieval topics.
-"
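The horizontal visibility graph methodology named in record 1206 has a compact standard definition: two points of a numeric series are linked if every intermediate value lies strictly below both. The sketch below implements exactly that; applying it to term statistics, as the paper does, would only change what the input series holds. The example series is made up.

# Horizontal visibility graph over a numeric series (e.g., term frequencies
# in their order of appearance).
def horizontal_visibility_edges(series):
    edges = []
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):
            if all(series[k] < min(series[i], series[j]) for k in range(i + 1, j)):
                edges.append((i, j))
    return edges

freqs = [3, 1, 2, 5, 2, 4]
print(horizontal_visibility_edges(freqs))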
-1207,1405.6103,"Kurt Winkler, Tobias Kuhn, Martin Volk","Evaluating the fully automatic multi-language translation of the Swiss
- avalanche bulletin",cs.CL," The Swiss avalanche bulletin is produced twice a day in four languages. Due
-to the lack of time available for manual translation, a fully automated
-translation system is employed, based on a catalogue of predefined phrases and
-predetermined rules of how these phrases can be combined to produce sentences.
-The system is able to automatically translate such sentences from German into
-the target languages French, Italian and English without subsequent
-proofreading or correction. Our catalogue of phrases is limited to a small
-sublanguage. The reduction of daily translation costs is expected to offset the
-initial development costs within a few years. After being operational for two
-winter seasons, we assess here the quality of the produced texts based on an
-evaluation where participants rate real danger descriptions from both origins,
-the catalogue of phrases versus the manually written and translated texts. With
-a mean recognition rate of 55%, users can hardly distinguish between the two
-types of texts, and give similar ratings with respect to their language
-quality. Overall, the output from the catalogue system can be considered
-virtually equivalent to a text written by avalanche forecasters and then
-manually translated by professional translators. Furthermore, forecasters
-declared that all relevant situations were captured by the system with
-sufficient accuracy and within the limited time available.
-"
-1208,1405.6164,"Ion Androutsopoulos, Gerasimos Lampouras, Dimitrios Galanis","Generating Natural Language Descriptions from OWL Ontologies: the
- NaturalOWL System",cs.CL cs.AI," We present NaturalOWL, a natural language generation system that produces
-texts describing individuals or classes of OWL ontologies. Unlike simpler OWL
-verbalizers, which typically express a single axiom at a time in controlled,
-often not entirely fluent natural language primarily for the benefit of domain
-experts, we aim to generate fluent and coherent multi-sentence texts for
-end-users. With a system like NaturalOWL, one can publish information in OWL on
-the Web, along with automatically produced corresponding texts in multiple
-languages, making the information accessible not only to computer programs and
-domain experts, but also to end-users. We discuss the processing stages of
-NaturalOWL, the optional domain-dependent linguistic resources that the system
-can use at each stage, and why they are useful. We also present trials showing
-that when the domain-dependent linguistic resources are available, NaturalOWL
-produces significantly better texts compared to a simpler verbalizer, and that
-the resources can be created with relatively light effort.
-"
-1209,1405.6293,Ahmed H. Yousef,Cross-Language Personal Name Mapping,cs.CL," Name matching between multiple natural languages is an important step in
-cross-enterprise integration applications and data mining. It is difficult to
-decide whether or not two syntactic values (names) from two heterogeneous data
-sources are alternative designations of the same semantic entity (person); this
-process becomes even more difficult with the Arabic language due to several
-factors including spelling and pronunciation variation, dialects and special
-vowel and consonant distinction and other linguistic characteristics. This
-paper proposes a new framework for name matching between the Arabic language
-and other languages. The framework uses a dictionary based on a new proposed
-version of the Soundex algorithm to encapsulate the recognition of special
-features of Arabic names. The framework proposes a new proximity matching
-algorithm to suit the high importance of order sensitivity in Arabic name
-matching. New performance evaluation metrics are proposed as well. The
-framework is implemented and verified empirically in several case studies
-demonstrating substantial improvements compared to other well-known techniques
-found in literature.
-"
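For orientation on record 1209: the paper proposes its own Arabic-specific Soundex variant, which is not reproduced here; the sketch below is a simplified classic (English) Soundex, shown only so the phonetic-encoding idea behind the dictionary is concrete. H/W handling is simplified relative to the full textbook rules.

# Simplified classic Soundex, for illustration only; the paper's Arabic
# variant uses different letter groups and rules.
def soundex(name):
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    out, prev = [], codes.get(name[0], "")
    for ch in name[1:]:
        d = codes.get(ch, "")  # vowels map to "" and reset the previous code
        if d and d != prev:
            out.append(d)
        prev = d
    return (name[0] + "".join(out) + "000")[:4]

print(soundex("Mohammed"), soundex("Muhamad"))  # both encode to M530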
-1210,1405.6667,Puneet Singh Ludu,Inferring gender of a Twitter user using celebrities it follows,cs.IR cs.CL," This paper addresses the task of user gender classification in social media,
-with an application to Twitter. The approach automatically predicts gender by
-leveraging observable information such as the tweet behavior, linguistic
-content of the user's Twitter feed and the celebrities followed by the user.
-This paper first evaluates linguistic content based features using the LIWC
-dictionary and popular neighborhood features using Wikipedia and Freebase. It
-then augments both feature sets, which yields a significant increase in the
-accuracy of gender prediction. Results show that rich linguistic features
-combined with popular neighborhood features prove valuable and promising for
-additional user classification needs.
-"
-1211,1405.6678,Richard Moot (LaBRI),"Hybrid Type-Logical Grammars, First-Order Linear Logic and the
- Descriptive Inadequacy of Lambda Grammars",cs.LO cs.CL," In this article we show that hybrid type-logical grammars are a fragment of
-first-order linear logic. This embedding result has several important
-consequences: it not only provides a simple new proof theory for the calculus,
-thereby clarifying the proof-theoretic foundations of hybrid type-logical
-grammars, but, since the translation is simple and direct, it also provides
-several new parsing strategies for hybrid type-logical grammars. Second,
-NP-completeness of hybrid type-logical grammars follows immediately. The main
-embedding result also sheds new light on problems with lambda grammars/abstract
-categorial grammars and shows lambda grammars/abstract categorial grammars
-suffer from problems of over-generation and from problems at the
-syntax-semantics interface unlike any other categorial grammar.
-"
-1212,1405.6682,Thierry Poibeau (LaTTICe),Optimality Theory as a Framework for Lexical Acquisition,cs.CL," This paper re-investigates a lexical acquisition system initially developed
-for French. We show that, interestingly, the architecture of the system
-reproduces and implements the main components of Optimality Theory. However, we
-formulate the hypothesis that some of its limitations are mainly due to a poor
-representation of the constraints used. Finally, we show how a better
-representation of the constraints used would yield better results.
-"
-1213,1405.7397,"Vivekananda Gayen, Kamal Sarkar","An HMM Based Named Entity Recognition System for Indian Languages: The
- JU System at ICON 2013",cs.CL," This paper reports on our work in the ICON 2013 NLP TOOLS CONTEST on Named
-Entity Recognition. We submitted runs for Bengali, English, Hindi, Marathi,
-Punjabi, Tamil and Telugu. A statistical HMM (Hidden Markov Model) based model
-has been used to implement our system. The system has been trained and tested
-on the NLP TOOLS CONTEST: ICON 2013 datasets. Our system obtains F-measures of
-0.8599, 0.7704, 0.7520, 0.4289, 0.5455, 0.4466, and 0.4003 for Bengali,
-English, Hindi, Marathi, Punjabi, Tamil and Telugu respectively.
-"
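Decoding with an HMM tagger of the kind used in record 1213 is usually done with the Viterbi algorithm; the sketch below is a minimal version over toy states and probabilities, which are invented and are of course not the contest system's trained parameters.

# Minimal Viterbi decoding for an HMM tagger.
def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-6), None) for s in states}]
    for t in range(1, len(obs)):
        col = {}
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            col[s] = (V[t - 1][prev][0] * trans_p[prev][s]
                      * emit_p[s].get(obs[t], 1e-6), prev)
        V.append(col)
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):   # backtrack through best predecessors
        state = V[t][state][1]
        path.append(state)
    return list(reversed(path))

states = ["PER", "O"]
start_p = {"PER": 0.3, "O": 0.7}
trans_p = {"PER": {"PER": 0.4, "O": 0.6}, "O": {"PER": 0.2, "O": 0.8}}
emit_p = {"PER": {"kamal": 0.8}, "O": {"reports": 0.5, "from": 0.4}}
print(viterbi(["kamal", "reports"], states, start_p, trans_p, emit_p))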
Mooney","Training a Multilingual Sportscaster: Using Perceptual Context to Learn - Language",cs.CL," We present a novel framework for learning to interpret and generate language -using only perceptual context as supervision. We demonstrate its capabilities -by developing a system that learns to sportscast simulated robot soccer games -in both English and Korean without any language-specific prior knowledge. -Training employs only ambiguous supervision consisting of a stream of -descriptive textual comments and a sequence of events extracted from the -simulation trace. The system simultaneously establishes correspondences between -individual comments and the events that they describe while building a -translation model that supports both parsing and generation. We also present a -novel algorithm for learning which events are worth describing. Human -evaluations of the generated commentaries indicate they are of reasonable -quality and in some cases even on par with those produced by humans for our -limited domain. -" -1216,1405.7713,"Sophia Katrenko, Pieter Adriaans, Maarten van Someren",Using Local Alignments for Relation Recognition,cs.CL cs.IR cs.LG," This paper discusses the problem of marrying structural similarity with -semantic relatedness for Information Extraction from text. Aiming at accurate -recognition of relations, we introduce local alignment kernels and explore -various possibilities of using them for this task. We give a definition of a -local alignment (LA) kernel based on the Smith-Waterman score as a sequence -similarity measure and proceed with a range of possibilities for computing -similarity between elements of sequences. We show how distributional similarity -measures obtained from unlabeled data can be incorporated into the learning -task as semantic knowledge. Our experiments suggest that the LA kernel yields -promising results on various biomedical corpora outperforming two baselines by -a large margin. Additional series of experiments have been conducted on the -data sets of seven general relation types, where the performance of the LA -kernel is comparable to the current state-of-the-art results. -" -1217,1405.7908,Peter D. Turney,Semantic Composition and Decomposition: From Recognition to Generation,cs.CL cs.AI cs.LG," Semantic composition is the task of understanding the meaning of text by -composing the meanings of the individual words in the text. Semantic -decomposition is the task of understanding the meaning of an individual word by -decomposing it into various aspects (factors, constituents, components) that -are latent in the meaning of the word. We take a distributional approach to -semantics, in which a word is represented by a context vector. Much recent work -has considered the problem of recognizing compositions and decompositions, but -we tackle the more difficult generation problem. For simplicity, we focus on -noun-modifier bigrams and noun unigrams. A test for semantic composition is, -given context vectors for the noun and modifier in a noun-modifier bigram (""red -salmon""), generate a noun unigram that is synonymous with the given bigram -(""sockeye""). A test for semantic decomposition is, given a context vector for a -noun unigram (""snifter""), generate a noun-modifier bigram that is synonymous -with the given unigram (""brandy glass""). With a vocabulary of about 73,000 -unigrams from WordNet, there are 73,000 candidate unigram compositions for a -bigram and 5,300,000,000 (73,000 squared) candidate bigram decompositions for a -unigram. 
-1217,1405.7908,Peter D. Turney,Semantic Composition and Decomposition: From Recognition to Generation,cs.CL cs.AI cs.LG," Semantic composition is the task of understanding the meaning of text by
-composing the meanings of the individual words in the text. Semantic
-decomposition is the task of understanding the meaning of an individual word by
-decomposing it into various aspects (factors, constituents, components) that
-are latent in the meaning of the word. We take a distributional approach to
-semantics, in which a word is represented by a context vector. Much recent work
-has considered the problem of recognizing compositions and decompositions, but
-we tackle the more difficult generation problem. For simplicity, we focus on
-noun-modifier bigrams and noun unigrams. A test for semantic composition is,
-given context vectors for the noun and modifier in a noun-modifier bigram (""red
-salmon""), generate a noun unigram that is synonymous with the given bigram
-(""sockeye""). A test for semantic decomposition is, given a context vector for a
-noun unigram (""snifter""), generate a noun-modifier bigram that is synonymous
-with the given unigram (""brandy glass""). With a vocabulary of about 73,000
-unigrams from WordNet, there are 73,000 candidate unigram compositions for a
-bigram and 5,300,000,000 (73,000 squared) candidate bigram decompositions for a
-unigram. We generate ranked lists of potential solutions in two passes. A fast
-unsupervised learning algorithm generates an initial list of candidates and
-then a slower supervised learning algorithm refines the list. We evaluate the
-candidate solutions by comparing them to WordNet synonym sets. For
-decomposition (unigram to bigram), the top 100 most highly ranked bigrams
-include a WordNet synonym of the given unigram 50.7% of the time. For
-composition (bigram to unigram), the top 100 most highly ranked unigrams
-include a WordNet synonym of the given bigram 77.8% of the time.
-"
-1218,1405.7975,Ercan Canhasi,Multi-layered graph-based multi-document summarization model,cs.IR cs.CL," Multi-document summarization is a process of automatic generation of a
-compressed version of the given collection of documents. Recently, the
-graph-based models and ranking algorithms have been actively investigated by
-the extractive document summarization community. While most work to date
-focuses on homogeneous connectedness of sentences and heterogeneous
-connectedness of documents and sentences (e.g. sentence similarity weighted by
-document importance), in this paper we present a novel 3-layered graph model
-that emphasizes not only sentence and document level relations but also the
-influence of sub-sentence level relations (e.g. similarity between parts of
-sentences).
-"
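The graph-based ranking that record 1218 builds on is typically a PageRank-style power iteration over a sentence-similarity graph; the sketch below shows that one-layer baseline (the paper's contribution is the 3-layered extension, which is not reproduced here). The similarity matrix is invented.

# Power-iteration ranking over a sentence-similarity graph (LexRank-style);
# a stand-in for one layer of the paper's 3-layered model.
import numpy as np

def rank_sentences(sim, d=0.85, iters=50):
    n = sim.shape[0]
    M = sim / sim.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * M.T @ r
    return r

sim = np.array([[1.0, 0.2, 0.1],
                [0.2, 1.0, 0.6],
                [0.1, 0.6, 1.0]])
print(rank_sentences(sim))  # higher score = more central sentence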
-1219,1406.0032,"Pollyanna Gon\c{c}alves and Matheus Ara\'ujo and Fabr\'icio Benevenuto
- and Meeyoung Cha",Comparing and Combining Sentiment Analysis Methods,cs.CL," Several messages express opinions about events, products, and services,
-political views or even their author's emotional state and mood. Sentiment
-analysis has been used in several applications including analysis of the
-repercussions of events in social networks, analysis of opinions about products
-and services, and simply to better understand aspects of social communication
-in Online Social Networks (OSNs). There are multiple methods for measuring
-sentiments, including lexical-based approaches and supervised machine learning
-methods. Despite the wide use and popularity of some methods, it is unclear
-which method is better for identifying the polarity (i.e., positive or
-negative) of a message as the current literature does not provide a method of
-comparison among existing methods. Such a comparison is crucial for
-understanding the potential limitations, advantages, and disadvantages of
-popular methods in analyzing the content of OSNs messages. Our study aims at
-filling this gap by presenting comparisons of eight popular sentiment analysis
-methods in terms of coverage (i.e., the fraction of messages whose sentiment is
-identified) and agreement (i.e., the fraction of identified sentiments that are
-in tune with ground truth). We develop a new method that combines existing
-approaches, providing the best coverage results and competitive agreement. We
-also present a free Web service called iFeel, which provides an open API for
-accessing and comparing results across different sentiment methods for a given
-text.
-"
-1220,1406.0079,Shashishekar Ramakrishna and Adrian Paschke,"Bridging the gap between Legal Practitioners and Knowledge Engineers
- using semi-formal KR",cs.CL cs.AI," The use of Structured English as a computation independent knowledge
-representation format for non-technical users in business rules representation
-has been proposed in OMGs Semantics and Business Vocabulary Representation
-(SBVR). In the legal domain we face a similar problem. Formal representation
-languages, such as OASIS LegalRuleML and legal ontologies (LKIF, legal OWL2
-ontologies etc.) support the technical knowledge engineer and the automated
-reasoning. However, they can hardly be used directly by legal domain experts
-who do not have a computer science background. In this paper we adapt the SBVR
-Structured English approach for the legal domain and implement a
-proof-of-concept, called KR4IPLaw, which enables legal domain experts to
-represent their knowledge in Structured English in a computation independent
-and hence, for them, more usable way. The benefit of this approach is that the
-underlying pre-defined semantics of the Structured English approach makes
-transformations into formal languages such as OASIS LegalRuleML and OWL2
-ontologies possible. We exemplify our approach in the domain of patent law.
-"
-1221,1406.1078,"Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry
- Bahdanau, Fethi Bougares, Holger Schwenk and Yoshua Bengio","Learning Phrase Representations using RNN Encoder-Decoder for
- Statistical Machine Translation",cs.CL cs.LG cs.NE stat.ML," In this paper, we propose a novel neural network model called RNN
-Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN
-encodes a sequence of symbols into a fixed-length vector representation, and
-the other decodes the representation into another sequence of symbols. The
-encoder and decoder of the proposed model are jointly trained to maximize the
-conditional probability of a target sequence given a source sequence. The
-performance of a statistical machine translation system is empirically found to
-improve by using the conditional probabilities of phrase pairs computed by the
-RNN Encoder-Decoder as an additional feature in the existing log-linear model.
-Qualitatively, we show that the proposed model learns a semantically and
-syntactically meaningful representation of linguistic phrases.
-"
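To make the encoder half of record 1221 concrete, here is a shape-level numpy sketch of folding a source sequence into one fixed-length vector with a gated unit in the style of that paper. The weights are random and untrained; the decoder and the training objective are omitted.

# Encoder of an RNN Encoder-Decoder with a GRU-style gated unit (sketch).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One gated update; W, U, b each hold [reset, update, candidate] params."""
    r = sigmoid(W[0] @ x + U[0] @ h + b[0])          # reset gate
    z = sigmoid(W[1] @ x + U[1] @ h + b[1])          # update gate
    h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])
    return z * h + (1 - z) * h_tilde

rng = np.random.default_rng(0)
dim_x, dim_h = 4, 8
W = rng.normal(size=(3, dim_h, dim_x))
U = rng.normal(size=(3, dim_h, dim_h))
b = np.zeros((3, dim_h))
h = np.zeros(dim_h)
for x in rng.normal(size=(5, dim_x)):   # a source "sentence" of 5 embeddings
    h = gru_step(x, h, W, U, b)
print(h.shape)  # the fixed-length summary vector handed to the decoder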
-1222,1406.1143,"Sarah Weissman, Samet Ayhan, Joshua Bradley, and Jimmy Lin",Identifying Duplicate and Contradictory Information in Wikipedia,cs.IR cs.CL cs.DL cs.SI," Our study identifies sentences in Wikipedia articles that are either
-identical or highly similar by applying techniques for near-duplicate detection
-of web pages. This is accomplished with a MapReduce implementation of minhash
-to identify clusters of sentences with high Jaccard similarity. We show that
-these clusters can be categorized into six different types, two of which are
-particularly interesting: identical sentences quantify the extent to which
-content in Wikipedia is copied and pasted, and near-duplicate sentences that
-state contradictory facts point to quality issues in Wikipedia.
-"
-1223,1406.1203,"Divyanshu Bhartiya, Ashudeep Singh",A Semantic Approach to Summarization,cs.CL," Sentence extraction based summarization methods have some limitations, as
-they do not go into the semantics of the document. They also lack the
-capability of sentence generation, which is intuitive to humans. Here we
-present a novel method to summarize text documents taking the process to
-semantic levels with the use of WordNet and other resources, and using a
-technique for sentence generation. We involve semantic role labeling to get the
-semantic representation of text and use segmentation to form clusters of
-related pieces of text. Picking out the centroids and sentence generation
-completes the task. We evaluate our system against human composed summaries and
-also present an evaluation done by humans to measure the quality attributes of
-our summaries.
-"
-1224,1406.1234,Chen Lijiang,A Geometric Method to Obtain the Generation Probability of a Sentence,cs.CL cs.AI math.ST stat.CO stat.ME stat.TH," ""How to generate a sentence"" is the most critical and difficult problem in
-all the natural language processing technologies. In this paper, we present a
-new approach to explain the generation process of a sentence from the
-perspective of mathematics. Our method is based on the premise that in our
-brain a sentence is a part of a word network which is formed by many word
-nodes. Experiments show that the probability of the entire sentence can be
-obtained from the probabilities of single words and the probabilities of
-co-occurring word pairs, which indicates that humans use a synthesis method to
-generate a sentence.
-"
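Record 1224 does not give its exact formula, so the following Python sketch shows one plausible reading of composing a sentence score from unigram probabilities plus pairwise co-occurrence corrections (PMI-style terms); the construction and all numbers are illustrative assumptions, not the paper's method.

# A hypothetical sentence score from word and word-pair probabilities.
import math
from itertools import combinations

def sentence_score(words, p_word, p_pair):
    # independent-word baseline
    logp = sum(math.log(p_word[w]) for w in words)
    # correct the baseline with pairwise association (PMI) terms
    for u, v in combinations(words, 2):
        joint = p_pair.get(frozenset((u, v)))
        if joint:
            logp += math.log(joint / (p_word[u] * p_word[v]))
    return logp

p_word = {"the": 0.05, "cat": 0.001, "sleeps": 0.0005}
p_pair = {frozenset(("cat", "sleeps")): 1e-5}
print(sentence_score(["the", "cat", "sleeps"], p_word, p_pair))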
-" -1227,1406.1765,Anne Condamines and Maxime Warnier,"Linguistic Analysis of Requirements of a Space Project and their - Conformity with the Recommendations Proposed by a Controlled Natural Language",cs.SE cs.CL," The long term aim of the project carried out by the French National Space -Agency (CNES) is to design a writing guide based on the real and regular -writing of requirements. As a first step in the project, this paper proposes a -lin-guistic analysis of requirements written in French by CNES engineers. The -aim is to determine to what extent they conform to two rules laid down in -INCOSE, a recent guide for writing requirements. Although CNES engineers are -not obliged to follow any Controlled Natural Language in their writing of -requirements, we believe that language regularities are likely to emerge from -this task, mainly due to the writers' experience. The issue is approached using -natural language processing tools to identify sentences that do not comply with -INCOSE rules. We further review these sentences to understand why the -recommendations cannot (or should not) always be applied when specifying -large-scale projects. -" -1228,1406.1827,"Samuel R. Bowman, Christopher Potts, Christopher D. Manning",Recursive Neural Networks Can Learn Logical Semantics,cs.CL cs.LG cs.NE," Tree-structured recursive neural networks (TreeRNNs) for sentence meaning -have been successful for many applications, but it remains an open question -whether the fixed-length representations that they learn can support tasks as -demanding as logical deduction. We pursue this question by evaluating whether -two such models---plain TreeRNNs and tree-structured neural tensor networks -(TreeRNTNs)---can correctly learn to identify logical relationships such as -entailment and contradiction using these representations. In our first set of -experiments, we generate artificial data from a logical grammar and use it to -evaluate the models' ability to learn to handle basic relational reasoning, -recursive structures, and quantification. We then evaluate the models on the -more natural SICK challenge data. Both models perform competitively on the SICK -data and generalize well in all three experiments on simulated data, suggesting -that they can learn suitable representations for logical inference in natural -language. -" -1229,1406.1870,C. Maria Keet and Langa Khumalo,Toward verbalizing ontologies in isiZulu,cs.CL," IsiZulu is one of the eleven official languages of South Africa and roughly -half the population can speak it. It is the first (home) language for over 10 -million people in South Africa. Only a few computational resources exist for -isiZulu and its related Nguni languages, yet the imperative for tool -development exists. We focus on natural language generation, and the grammar -options and preferences in particular, which will inform verbalization of -knowledge representation languages and could contribute to machine translation. -The verbalization pattern specification shows that the grammar rules are -elaborate and there are several options of which one may have preference. We -devised verbalization patterns for subsumption, basic disjointness, existential -and universal quantification, and conjunction. This was evaluated in a survey -among linguists and non-linguists. 
Some differences between linguists and
-non-linguists can be observed, with the former much more in agreement, and
-preferences depend on the overall structure of the sentence, such as singular
-for subsumption and plural in other cases.
-"
-1230,1406.1953,"Peilei Liu, Ting Wang",Automatic Extraction of Protein Interaction in Literature,cs.CL cs.CE," Protein-protein interaction extraction is the key precondition of the
-construction of protein knowledge networks, and it is very important for
-research in biomedicine. This paper extracts directional protein-protein
-interactions from biological text using an SVM-based method. Experiments were
-evaluated on the LLL05 corpus with good results. The results show that
-dependency features are important for protein-protein interaction extraction
-and that features related to the interaction word are effective for judging the
-interaction direction. Finally, we analyze the effects of different features
-and outline plans for the next step.
-"
-1231,1406.2022,Rahul Tejwani (University at Buffalo),Two-dimensional Sentiment Analysis of text,cs.IR cs.CL," Sentiment Analysis aims to get the underlying viewpoint of the text, which
-could be anything that holds a subjective opinion, such as an online review,
-movie rating, comments on blog posts, etc. This paper presents a novel approach
-that classifies text in a two-dimensional emotional space, based on the
-sentiments of the author. The approach uses existing lexical resources to
-extract a feature set, which is trained using supervised learning techniques.
-"
-1232,1406.2035,Dani Yogatama and Manaal Faruqui and Chris Dyer and Noah A. Smith,Learning Word Representations with Hierarchical Sparse Coding,cs.CL cs.LG stat.ML," We propose a new method for learning word representations using hierarchical
-regularization in sparse coding inspired by the linguistic study of word
-meanings. We show an efficient learning algorithm based on stochastic proximal
-methods that is significantly faster than previous approaches, making it
-possible to perform hierarchical sparse coding on a corpus of billions of word
-tokens. Experiments on various benchmark tasks---word similarity ranking,
-analogies, sentence completion, and sentiment analysis---demonstrate that the
-method outperforms or is competitive with state-of-the-art methods. Our word
-representations are available at
-\url{http://www.ark.cs.cmu.edu/dyogatam/wordvecs/}.
-"
-1233,1406.2096,"Paul Brillant Feuto Njonko, Sylviane Cardey, Peter Greenfield, and
- Walid El Abed",RuleCNL: A Controlled Natural Language for Business Rule Specifications,cs.SE cs.CL," Business rules represent the primary means by which companies define their
-business and perform their actions in order to reach their objectives. Thus,
-they need to be expressed unambiguously, to avoid inconsistencies between
-business stakeholders, and formally, in order to be machine-processed. A
-promising solution is the use of a controlled natural language (CNL), which is
-a good mediator between natural and formal languages. This paper presents
-RuleCNL, which is a CNL for defining business rules. Its core feature is the
-alignment of the business rule definition with the business vocabulary, which
-ensures traceability and consistency with the business domain. The RuleCNL tool
-provides editors that assist end-users in the writing process and automatic
-mappings into the Semantics of Business Vocabulary and Business Rules (SBVR)
-standard.
SBVR is grounded in first order logic and includes constructs called
-semantic formulations that structure the meaning of rules.
-"
-1234,1406.2204,"Sandra Williams, Richard Power and Allan Third","How Easy is it to Learn a Controlled Natural Language for Building a
- Knowledge Base?",cs.CL," Recent developments in controlled natural language editors for knowledge
-engineering (KE) have given rise to expectations that they will make KE tasks
-more accessible and perhaps even enable non-engineers to build knowledge bases.
-This exploratory research focussed on novices and experts in knowledge
-engineering during their attempts to learn a controlled natural language (CNL)
-known as OWL Simplified English and use it to build a small knowledge base.
-Participants' behaviours during the task were observed through eye-tracking and
-screen recordings. This was an attempt at a more ambitious user study than in
-previous research because we used a naturally occurring text as the source of
-domain knowledge, and left participants without guidance on which information
-to select, or how to encode it. We have identified a number of skills
-(competencies) required for this difficult task and key problems that authors
-face.
-"
-1235,1406.2298,Gordon J. Pace and Michael Rosner,"Explaining Violation Traces with Finite State Natural Language
- Generation Models",cs.SE cs.CL," An essential element of any verification technique is that of identifying
-and communicating to the user system behaviour which leads to a deviation from
-the expected behaviour. Such behaviours are typically made available as long
-traces of system actions which would benefit from a natural language
-explanation, especially in the context of business logic level specifications.
-In this paper we present a natural language generation model which can be used
-to explain such traces. A key idea is that the explanation language is a CNL
-that is, formally speaking, a regular language susceptible to transformations
-that can be expressed with finite state machinery. At the same time it admits
-various forms of abstraction and simplification which contribute to the
-naturalness of explanations that are communicated to the user.
-"
-1236,1406.2400,Dana Dann\'ells and Normunds Gr\=uz\=itis,"Controlled Natural Language Generation from a Multilingual
- FrameNet-based Grammar",cs.CL," This paper presents a currently bilingual but potentially multilingual
-FrameNet-based grammar library implemented in Grammatical Framework. The
-contribution of this paper is two-fold. First, it offers a methodological
-approach to automatically generate the grammar based on semantico-syntactic
-valence patterns extracted from FrameNet-annotated corpora. Second, it provides
-a proof of concept for two use cases illustrating how the acquired multilingual
-grammar can be exploited in different CNL applications in the domains of arts
-and tourism.
-"
-1237,1406.2538,Guntis Barzdins,"FrameNet CNL: a Knowledge Representation and Information Extraction
- Language",cs.CL cs.AI cs.IR cs.LG," The paper presents a FrameNet-based information extraction and knowledge
-representation framework, called FrameNet-CNL. The framework is used on natural
-language documents and represents the extracted knowledge in a tailor-made
-Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can be
-generated automatically in multiple languages.
This approach brings together
-the fields of information extraction and CNL, because a source text can be
-considered as belonging to FrameNet-CNL if the information extraction parser
-produces the correct knowledge representation as a result. We describe a
-state-of-the-art information extraction parser used by a national news agency
-and speculate that FrameNet-CNL could eventually shape the natural language
-subset used for writing newswire articles.
-"
-1238,1406.2710,"Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov","A Multiplicative Model for Learning Distributed Text-Based Attribute
- Representations",cs.LG cs.CL," In this paper we propose a general framework for learning distributed
-representations of attributes: characteristics of text whose representations
-can be jointly learned with word embeddings. Attributes can correspond to
-document indicators (to learn sentence vectors), language indicators (to learn
-distributed language representations), meta-data and side information (such as
-the age, gender and industry of a blogger) or representations of authors. We
-describe a third-order model where word context and attribute vectors interact
-multiplicatively to predict the next word in a sequence. This leads to the
-notion of conditional word similarity: how meanings of words change when
-conditioned on different attributes. We perform several experimental tasks
-including sentiment classification, cross-lingual document classification, and
-blog authorship attribution. We also qualitatively evaluate conditional word
-neighbours and attribute-conditioned text generation.
-"
-1239,1406.2880,"Ulf Sch\""oneberg and Wolfram Sperber",POS Tagging and its Applications for Mathematics,cs.DL cs.CL cs.IR," Content analysis of scientific publications is a nontrivial task, but a
-useful and important one for scientific information services. In the Gutenberg
-era it was a domain of human experts; in the digital age many machine-based
-methods, e.g., graph analysis tools and machine-learning techniques, have been
-developed for it. Natural Language Processing (NLP) is a powerful
-machine-learning approach to semiautomatic speech and language processing,
-which is also applicable to mathematics. The well established methods of NLP
-have to be adjusted for the special needs of mathematics, in particular for
-handling mathematical formulae. We demonstrate a mathematics-aware part of
-speech tagger and give a short overview of our adaptation of NLP methods for
-mathematical publications. We show the use of the tools developed for key
-phrase extraction and classification in the database zbMATH.
-"
-1240,1406.2903,Hazem Safwat and Brian Davis,A Brief State of the Art for Ontology Authoring,cs.CL," One of the main challenges for building the Semantic Web is ontology
-authoring. Controlled Natural Languages (CNLs) offer a user-friendly means for
-non-experts to author ontologies. This paper provides a snapshot of the
-state-of-the-art for the core CNLs for ontology authoring and reviews their
-respective evaluations.
-"
-1241,1406.2963,"Shanan E. Peters, Ce Zhang, Miron Livny, Christopher R\'e",A machine-compiled macroevolutionary history of Phanerozoic life,cs.DB cs.CL cs.LG q-bio.PE," Many aspects of macroevolutionary theory and our understanding of biotic
-responses to global environmental change derive from literature-based
-compilations of palaeontological data. Existing manually assembled databases
-are, however, incomplete and difficult to assess and enhance.
Here, we develop
-and validate the quality of a machine reading system, PaleoDeepDive, that
-automatically locates and extracts data from heterogeneous text, tables, and
-figures in publications. PaleoDeepDive performs comparably to humans in complex
-data extraction and inference tasks and generates congruent synthetic
-macroevolutionary results. Unlike traditional databases, PaleoDeepDive produces
-a probabilistic database that systematically improves as information is added.
-We also show that the system can readily accommodate sophisticated data types,
-such as morphological data in biological illustrations and associated textual
-descriptions. Our machine reading approach to scientific data integration and
-synthesis brings within reach many questions that are currently underdetermined
-and does so in ways that may stimulate entirely new modes of inquiry.
-"
-1242,1406.3287,Matthew Mayo,A Clustering Analysis of Tweet Length and its Relation to Sentiment,cs.CL cs.IR cs.SI," Sentiment analysis of Twitter data is performed. The researcher has made the
-following contributions via this paper: (1) an innovative method for deriving
-sentiment score dictionaries using an existing sentiment dictionary as seed
-words is explored, and (2) an analysis of clustered tweet sentiment scores
-based on tweet length is performed.
-"
-1243,1406.3460,Karolina Suchowolec,Are Style Guides Controlled Languages? The Case of Koenig & Bauer AG,cs.CL," Controlled natural languages for industrial application are often regarded as
-a response to the challenges of translation and multilingual communication.
-This paper presents a quite different approach taken by Koenig & Bauer AG,
-where the main goal was the improvement of the authoring process for technical
-documentation. Most importantly, this paper explores the notion of a controlled
-language and demonstrates how style guides can emerge from non-linguistic
-considerations. Moreover, it shows the transition from loose language
-recommendations into precise and prescriptive rules and investigates whether
-such rules can be regarded as a full-fledged controlled language.
-"
-1244,1406.3676,"Antoine Bordes, Sumit Chopra, Jason Weston",Question Answering with Subgraph Embeddings,cs.CL," This paper presents a system which learns to answer questions on a broad
-range of topics from a knowledge base using few hand-crafted features. Our
-model learns low-dimensional embeddings of words and knowledge base
-constituents; these representations are used to score natural language
-questions against candidate answers. Training our system using pairs of
-questions and structured representations of their answers, and pairs of
-question paraphrases, yields competitive results on a recent benchmark from the
-literature.
-"
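The scoring idea in record 1244 reduces to a dot product between a question embedding and a candidate-answer embedding in one shared space; the sketch below shows that mechanic with random vectors standing in for the learned embeddings, and with hypothetical entity names.

# Embedding-based scoring of candidate answers (shapes only; vectors are
# random stand-ins for the learned question/KB embeddings).
import numpy as np

rng = np.random.default_rng(1)
dim = 16
word_emb = {w: rng.normal(size=dim) for w in ["who", "directed", "alien"]}
kb_emb = {e: rng.normal(size=dim) for e in ["ridley_scott", "james_cameron"]}

def embed_question(words):
    # sum of word embeddings as the question representation
    return sum(word_emb[w] for w in words)

def best_answer(words, candidates):
    q = embed_question(words)
    return max(candidates, key=lambda a: q @ kb_emb[a])

print(best_answer(["who", "directed", "alien"], list(kb_emb)))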
-1245,1406.3714,"Richa Sharma, Shweta Nigam and Rekha Jain",Mining of product reviews at aspect level,cs.CL cs.IR," Today's world is a world of the Internet: almost all work, from a simple
-mobile phone recharge to the biggest business deals, can be done with the help
-of this technology. People spend most of their time surfing the Web; it has
-become a new source of entertainment, education, communication, shopping, etc.
-Users not only use these websites but also give their feedback and suggestions
-that will be useful for other users. In this way a large number of user reviews
-is collected on the Web, which needs to be explored, analysed and organized for
-better decision making. Opinion Mining or Sentiment Analysis is a Natural
-Language Processing and Information Extraction task that identifies users'
-views or opinions expressed in the form of positive, negative or neutral
-comments and quotes underlying the text. Aspect based opinion mining is one of
-the levels of opinion mining; it determines the aspects of the given reviews
-and classifies the review for each feature. In this paper an aspect based
-opinion mining system is proposed to classify the reviews as positive, negative
-and neutral for each feature. Negation is also handled in the proposed system.
-Experimental results using reviews of products show the effectiveness of the
-system.
-"
-1246,1406.3830,"Misha Denil and Alban Demiraj and Nal Kalchbrenner and Phil Blunsom
- and Nando de Freitas","Modelling, Visualising and Summarising Documents with a Single
- Convolutional Neural Network",cs.CL cs.LG stat.ML," Capturing the compositional process which maps the meaning of words to that
-of documents is a central challenge for researchers in Natural Language
-Processing and Information Retrieval. We introduce a model that is able to
-represent the meaning of documents by embedding them in a low dimensional
-vector space, while preserving distinctions of word and sentence order crucial
-for capturing nuanced semantics. Our model is based on an extended Dynamic
-Convolution Neural Network, which learns convolution filters at both the
-sentence and document level, hierarchically learning to capture and compose low
-level lexical features into high level semantic concepts. We demonstrate the
-effectiveness of this model on a range of document modelling tasks, achieving
-strong results with no feature engineering and with a more compact model.
-Inspired by recent advances in visualising deep convolution networks for
-computer vision, we present a novel visualisation technique for our document
-networks which not only provides insight into their learning process, but also
-can be interpreted to produce a compelling automatic summarisation system for
-texts.
-"
-1247,1406.3855,"Peter Sheridan Dodds, Eric M. Clark, Suma Desu, Morgan R. Frank,
- Andrew J. Reagan, Jake Ryland Williams, Lewis Mitchell, Kameron Decker
- Harris, Isabel M. Kloumann, James P. Bagrow, Karine Megerdoomian, Matthew T.
- McMahon, Brian F. Tivnan, and Christopher M. Danforth",Human language reveals a universal positivity bias,physics.soc-ph cs.CL cs.SI," Using human evaluation of 100,000 words spread across 24 corpora in 10
-languages diverse in origin and culture, we present evidence of a deep imprint
-of human sociality in language, observing that (1) the words of natural human
-language possess a universal positivity bias; (2) the estimated emotional
-content of words is consistent between languages under translation; and (3)
-this positivity bias is strongly independent of frequency of word usage.
-Alongside these general regularities, we describe inter-language variations in
-the emotional spectrum of languages which allow us to rank corpora. We also
-show how our word evaluations can be used to construct physical-like
-instruments for both real-time and offline measurement of the emotional content
-of large-scale texts.
-"
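The "physical-like instruments" of record 1247 amount to a frequency-weighted average of per-word happiness scores over a text; the sketch below shows that computation with invented scores (the actual studies use large human-rated word lists, not these four toy values).

# Frequency-weighted average happiness of a text; scores are toy values.
from collections import Counter

happiness = {"laughter": 8.5, "food": 7.4, "the": 4.9, "crash": 2.6}

def text_happiness(tokens, scores):
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    return sum(scores[w] * c for w, c in counts.items()) / total

print(text_happiness("the laughter after the crash".split(), happiness))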
-1248,1406.3915,"Sankar Mukherjee, Shyamal Kumar Das Mandal",A Bengali HMM Based Speech Synthesis System,cs.SD cs.CL cs.MM," The paper presents the capability of an HMM-based TTS system to produce
-Bengali speech. In this synthesis method, trajectories of speech parameters are
-generated from the trained Hidden Markov Models. A final speech waveform is
-synthesized from those speech parameters. In our experiments, spectral
-properties were represented by Mel Cepstrum Coefficients. Both the training and
-synthesis issues are investigated in this paper using an annotated Bengali
-speech database. Experimental evaluation shows that the developed
-text-to-speech system is capable of producing adequately natural speech in
-terms of intelligibility and intonation for Bengali.
-"
-1249,1406.3969,"Siddhartha Ghosh, Sujata Thamke and Kalyani U.R.S","Translation Of Telugu-Marathi and Vice-Versa using Rule Based Machine
- Translation",cs.CL," In today's digital world, automated machine translation from one language to
-another has come a long way and achieved different kinds of success stories.
-Whereas Babel Fish supports a good number of foreign languages and only Hindi
-among Indian languages, Google Translator takes care of about 10 Indian
-languages. Though most automated machine translation systems are doing well,
-handling Indian languages needs major care, especially for local
-proverbs/idioms. Most machine translation systems follow the direct translation
-approach when translating one Indian language to another. Our research at the
-KMIT R&D Lab found that handling local proverbs/idioms has not been given
-enough attention in earlier research work. This paper focuses on two of the
-most widely spoken Indian languages, Marathi and Telugu, and translation
-between them. Handling proverbs and idioms of both languages has been given
-special care, and the research outcome shows a significant achievement in this
-direction.
-"
-1250,1406.3976,"Ramona Enache, Inari Listenmaa, Prasanth Kolachina",Handling non-compositionality in multilingual CNLs,cs.CL," In this paper, we describe methods for handling multilingual
-non-compositional constructions in the framework of GF. We specifically look at
-methods to detect and extract non-compositional phrases from parallel texts and
-propose methods to handle such constructions in GF grammars. We expect that the
-methods to handle non-compositional constructions will enrich CNLs by providing
-more flexibility in the design of controlled languages. We look at two specific
-use cases of non-compositional constructions: a general-purpose method to
-detect and extract multilingual multiword expressions and a procedure to
-identify nominal compounds in German. We evaluate our procedure for multiword
-expressions by performing a qualitative analysis of the results. For the
-experiments on nominal compounds, we incorporate the detected compounds in a
-full SMT pipeline and evaluate the impact of our method in the machine
-translation process.
-"
-1251,1406.3987,"Juyeon Kang, Patrick Saint Dizier","Towards an Error Correction Memory to Enhance Technical Texts Authoring
- in LELIE",cs.CL," In this paper, we investigate and experiment with the notion of error
-correction memory applied to error correction in technical texts. The main
-purpose is to induce relatively generic correction patterns associated with
-more contextual correction recommendations, based on previously memorized and
-analyzed corrections. The notion of error correction memory is developed within
-the framework of the LELIE project and illustrated on the case of fuzzy lexical
-items, which is a major problem in technical texts.
-" -1252,1406.4057,Aarne Ranta,Embedded Controlled Languages,cs.CL," Inspired by embedded programming languages, an embedded CNL (controlled -natural language) is a proper fragment of an entire natural language (its host -language), but it has a parser that recognizes the entire host language. This -makes it possible to process out-of-CNL input and give useful feedback to -users, instead of just reporting syntax errors. This extended abstract explains -the main concepts of embedded CNL implementation in GF (Grammatical Framework), -with examples from machine translation and some other ongoing work. -" -1253,1406.4211,Pierre Bourreau and Thierry Poibeau,Mapping the Economic Crisis: Some Preliminary Investigations,cs.CL," In this paper we describe our contribution to the PoliInformatics 2014 -Challenge on the 2007-2008 financial crisis. We propose a state of the art -technique to extract information from texts and provide different -representations, giving first a static overview of the domain and then a -dynamic representation of its main evolutions. We show that this strategy -provides a practical solution to some recent theories in social sciences that -are facing a lack of methods and tools to automatically extract information -from natural language texts. -" -1254,1406.4441,Martin Gerlach and Eduardo G. Altmann,Scaling laws and fluctuations in the statistics of word frequencies,physics.soc-ph cs.CL physics.data-an," In this paper we combine statistical analysis of large text databases and -simple stochastic models to explain the appearance of scaling laws in the -statistics of word frequencies. Besides the sublinear scaling of the vocabulary -size with database size (Heaps' law), here we report a new scaling of the -fluctuations around this average (fluctuation scaling analysis). We explain -both scaling laws by modeling the usage of words by simple stochastic processes -in which the overall distribution of word-frequencies is fat tailed (Zipf's -law) and the frequency of a single word is subject to fluctuations across -documents (as in topic models). In this framework, the mean and the variance of -the vocabulary size can be expressed as quenched averages, implying that: i) -the inhomogeneous dissemination of words cause a reduction of the average -vocabulary size in comparison to the homogeneous case, and ii) correlations in -the co-occurrence of words lead to an increase in the variance and the -vocabulary size becomes a non-self-averaging quantity. We address the -implications of these observations to the measurement of lexical richness. We -test our results in three large text databases (Google-ngram, Enlgish -Wikipedia, and a collection of scientific articles). -" -1255,1406.4469,"Santiago Segarra, Mark Eisen, Alejandro Ribeiro",Authorship Attribution through Function Word Adjacency Networks,cs.CL cs.LG stat.ML," A method for authorship attribution based on function word adjacency networks -(WANs) is introduced. Function words are parts of speech that express -grammatical relationships between other words but do not carry lexical meaning -on their own. In the WANs in this paper, nodes are function words and directed -edges stand in for the likelihood of finding the sink word in the ordered -vicinity of the source word. WANs of different authors can be interpreted as -transition probabilities of a Markov chain and are therefore compared in terms -of their relative entropies. 
-1255,1406.4469,"Santiago Segarra, Mark Eisen, Alejandro Ribeiro",Authorship Attribution through Function Word Adjacency Networks,cs.CL cs.LG stat.ML," A method for authorship attribution based on function word adjacency networks
-(WANs) is introduced. Function words are parts of speech that express
-grammatical relationships between other words but do not carry lexical meaning
-on their own. In the WANs in this paper, nodes are function words and directed
-edges stand in for the likelihood of finding the sink word in the ordered
-vicinity of the source word. WANs of different authors can be interpreted as
-transition probabilities of a Markov chain and are therefore compared in terms
-of their relative entropies. Optimal selection of WAN parameters is studied and
-attribution accuracy is benchmarked across a diverse pool of authors and
-varying text lengths. This analysis shows that, since function words are
-independent of content, their use tends to be specific to an author and that
-the relational data captured by function WANs is a good summary of stylometric
-fingerprints. Attribution accuracy is observed to exceed that achieved by
-methods that rely on word frequencies alone. Further combining WANs with
-methods that rely on word frequencies alone results in higher attribution
-accuracy, indicating that both sources of information encode different aspects
-of authorial styles.
-"
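A bare-bones version of record 1255's pipeline follows: build a small function-word adjacency network as Markov transition probabilities and compare two texts by relative entropy. The function-word list, windowing and smoothing are simplifications assumed for illustration; the paper additionally weights rows, which is omitted here.

# Function-word adjacency network and relative-entropy comparison (sketch).
import math
from collections import defaultdict

FUNCTION_WORDS = ["the", "of", "and", "to", "in"]

def wan(tokens, window=3, eps=1e-6):
    counts = defaultdict(lambda: defaultdict(float))
    idx = [i for i, t in enumerate(tokens) if t in FUNCTION_WORDS]
    for i in idx:
        for j in idx:
            if 0 < j - i <= window:                 # sink follows source closely
                counts[tokens[i]][tokens[j]] += 1.0
    P = {}
    for u in FUNCTION_WORDS:                        # smoothed row normalization
        row = [counts[u][v] + eps for v in FUNCTION_WORDS]
        s = sum(row)
        P[u] = [x / s for x in row]
    return P

def relative_entropy(P, Q):
    return sum(p * math.log(p / q) for u in P for p, q in zip(P[u], Q[u]))

a = wan("the cat of the house and the dog in the yard".split())
b = wan("of mice and men in the end and to the point".split())
print(relative_entropy(a, b))  # smaller = more similar usage profiles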
-" -1258,1406.4710,Christian Retor\'e,"Typed Hilbert Epsilon Operators and the Semantics of Determiner Phrases - (Invited Lecture)",cs.CL cs.AI cs.LO math.LO," The semantics of determiner phrases, be they definite de- scriptions, -indefinite descriptions or quantified noun phrases, is often as- sumed to be a -fully solved question: common nouns are properties, and determiners are -generalised quantifiers that apply to two predicates: the property -corresponding to the common noun and the one corresponding to the verb phrase. -We first present a criticism of this standard view. Firstly, the semantics of -determiners does not follow the syntactical structure of the sentence. Secondly -the standard interpretation of the indefinite article cannot ac- count for -nominal sentences. Thirdly, the standard view misses the linguis- tic asymmetry -between the two properties of a generalised quantifier. In the sequel, we -propose a treatment of determiners and quantifiers as Hilbert terms in a richly -typed system that we initially developed for lexical semantics, using a many -sorted logic for semantical representations. We present this semantical -framework called the Montagovian generative lexicon and show how these terms -better match the syntactical structure and avoid the aforementioned problems of -the standard approach. Hilbert terms rather differ from choice functions in -that there is one polymorphic operator and not one operator per formula. They -also open an intriguing connection between the logic for meaning assembly, the -typed lambda calculus handling compositionality and the many-sorted logic for -semantical representations. Furthermore epsilon terms naturally introduce -type-judgements and confirm the claim that type judgment are a form of -presupposition. -" -1259,1406.4824,"Rana D. Parshad, Vineeta Chand, Neha Sinha, Nitu Kumari","What is India speaking: The ""Hinglish"" invasion",cs.CL math.DS," While language competition models of diachronic language shift are -increasingly sophisticated, drawing on sociolinguistic components like variable -language prestige, distance from language centers and intermediate bilingual -transitionary populations, in one significant way they fall short. They fail to -consider contact-based outcomes resulting in mixed language practices, e.g. -outcome scenarios such as creoles or unmarked code switching as an emergent -communicative norm. On these lines something very interesting is uncovered in -India, where traditionally there have been monolingual Hindi speakers and -Hindi/English bilinguals, but virtually no monolingual English speakers. While -the Indian census data reports a sharp increase in the proportion of -Hindi/English bilinguals, we argue that the number of Hindi/English bilinguals -in India is inaccurate, given a new class of urban individuals speaking a mixed -lect of Hindi and English, popularly known as ""Hinglish"". Based on -predator-prey, sociolinguistic theories, salient local ecological factors and -the rural-urban divide in India, we propose a new mathematical model of -interacting monolingual Hindi speakers, Hindi/English bilinguals and Hinglish -speakers. The model yields globally asymptotic stable states of coexistence, as -well as bilingual extinction. 
To validate our model, sociolinguistic data from
-different Indian classes are contrasted with census reports: We see that
-purported urban Hindi/English bilinguals are unable to maintain fluent Hindi
-speech and instead produce Hinglish, whereas rural speakers evidence
-monolingual Hindi. Thus we present evidence for the first time that an
-unrecognized mixed lect involving English but not ""English"" has possibly taken
-over a sizeable fraction of a large global population.
-"
-1260,1406.5181,"Jake Ryland Williams, Paul R. Lessard, Suma Desu, Eric Clark, James P.
-  Bagrow, Christopher M. Danforth, and Peter Sheridan Dodds","Zipf's law holds for phrases, not words",cs.CL physics.soc-ph," With Zipf's law being originally and most famously observed for word
-frequency, it is surprisingly limited in its applicability to human language,
-holding over no more than three to four orders of magnitude before hitting a
-clear break in scaling. Here, building on the simple observation that phrases
-of one or more words comprise the most coherent units of meaning in language,
-we show empirically that Zipf's law for phrases extends over as many as nine
-orders of rank magnitude. In doing so, we develop a principled and scalable
-statistical mechanical method of random text partitioning, which opens up a
-rich frontier of rigorous text analysis via a rank ordering of mixed length
-phrases.
-"
-1261,1406.5598,Reshma Prasad and Mary Priya Sebastian,A survey on phrase structure learning methods for text classification,cs.CL," Text classification is the task of automatically classifying text into one of
-the predefined categories. The problem of text classification has been widely
-studied in different communities like natural language processing, data mining
-and information retrieval. Text classification is an important constituent in
-many information management tasks like topic identification, spam filtering,
-email routing, language identification, genre classification, readability
-assessment, etc. The performance of text classification improves notably when
-phrase patterns are used. The use of phrase patterns helps in capturing
-non-local behaviours and thus helps in the improvement of the text classification
-task. Phrase structure extraction is the first step to continue with the phrase
-pattern identification. In this survey, a detailed study of phrase structure
-learning methods has been carried out. This will enable future work in several
-NLP tasks that use syntactic information from phrase structure, like grammar
-checkers, question answering, information extraction, machine translation and text
-classification. The paper also provides different levels of classification and
-a detailed comparison of the phrase structure learning methods.
-"
-1262,1406.5679,"Andrej Karpathy, Armand Joulin and Li Fei-Fei",Deep Fragment Embeddings for Bidirectional Image Sentence Mapping,cs.CV cs.CL cs.LG," We introduce a model for bidirectional retrieval of images and sentences
-through a multi-modal embedding of visual and natural language data. Unlike
-previous models that directly map images or sentences into a common embedding
-space, our model works on a finer level and embeds fragments of images
-(objects) and fragments of sentences (typed dependency tree relations) into a
-common space. In addition to a ranking objective seen in previous work, this
-allows us to add a new fragment alignment objective that learns to directly
-associate these fragments across modalities.
Extensive experimental evaluation
-shows that reasoning on both the global level of images and sentences and the
-finer level of their respective fragments significantly improves performance on
-image-sentence retrieval tasks. Additionally, our model provides interpretable
-predictions since the inferred inter-modal fragment alignment is explicit.
-"
-1263,1406.5691,"John J. Camilleri, Gabriele Paganelli, Gerardo Schneider",A CNL for Contract-Oriented Diagrams,cs.CL cs.FL," We present a first step towards a framework for defining and manipulating
-normative documents or contracts described as Contract-Oriented (C-O) Diagrams.
-These diagrams provide a visual representation for such texts, giving the
-possibility to express a signatory's obligations, permissions and prohibitions,
-with or without timing constraints, as well as the penalties resulting from the
-non-fulfilment of a contract. This work presents a CNL for verbalising C-O
-Diagrams, a web-based tool allowing editing in this CNL, and another for
-visualising and manipulating the diagrams interactively. We then show how these
-proof-of-concept tools can be used by applying them to a small example.
-"
-1264,1406.5824,"Serena Yeung, Alireza Fathi, and Li Fei-Fei",VideoSET: Video Summary Evaluation through Text,cs.CV cs.CL cs.IR," In this paper we present VideoSET, a method for Video Summary Evaluation
-through Text that can evaluate how well a video summary is able to retain the
-semantic information contained in its original video. We observe that semantics
-is most easily expressed in words, and develop a text-based approach for the
-evaluation. Given a video summary, a text representation of the video summary
-is first generated, and an NLP-based metric is then used to measure its
-semantic distance to ground-truth text summaries written by humans. We show
-that our technique has higher agreement with human judgment than pixel-based
-distance metrics. We also release text annotations and ground-truth text
-summaries for a number of publicly available video datasets, for use by the
-computer vision community.
-"
-1265,1406.6101,"Imen Trabelsi, Dorra Ben Ayed, Noureddine Ellouze","Improved Frame Level Features and SVM Supervectors Approach for the
-  Recognition of Emotional States from Speech: Application to categorical and
-  dimensional states",cs.CL cs.LG," The purpose of a speech emotion recognition system is to classify speakers'
-utterances into different emotional states such as disgust, boredom, sadness,
-neutral and happiness. Speech features that are commonly used in speech emotion
-recognition rely on global utterance-level prosodic features. In our work, we
-evaluate the impact of frame-level feature extraction. The speech samples are
-from the Berlin emotional database and the features extracted from these utterances
-are energy, different variants of mel frequency cepstrum coefficients, velocity
-and acceleration features.
-"
-1266,1406.6312,"Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han",Scalable Topical Phrase Mining from Text Corpora,cs.CL cs.IR cs.LG," While most topic modeling algorithms model text corpora with unigrams, human
-interpretation often relies on inherent grouping of terms into phrases. As
-such, we consider the problem of discovering topical phrases of mixed lengths.
-Existing work either performs post-processing on the inference results of
-unigram-based topic models, or utilizes complex n-gram-discovery topic models.
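The frame-level features listed in 1406.6101 above (energy, MFCC variants, velocity and acceleration) can be computed in a few lines, assuming the librosa library is available; the audio path is a placeholder:

# Sketch of the frame-level features named in 1406.6101: energy, MFCCs,
# and their velocity/acceleration (delta) variants. Assumes the librosa
# library; 'utterance.wav' is a placeholder path, not a Berlin-database file.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, n_frames)
velocity = librosa.feature.delta(mfcc, order=1)       # first derivative
acceleration = librosa.feature.delta(mfcc, order=2)   # second derivative
energy = librosa.feature.rms(y=y)                     # frame energy proxy

features = np.vstack([mfcc, velocity, acceleration, energy])
print(features.shape)  # (40, n_frames): per-frame feature vectors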
-These methods generally produce low-quality topical phrases or suffer from poor
-scalability on even moderately-sized datasets. We propose a different approach
-that is both computationally efficient and effective. Our solution combines a
-novel phrase mining framework to segment a document into single and multi-word
-phrases, and a new topic model that operates on the induced document partition.
-Our approach discovers high quality topical phrases with negligible extra cost
-to the bag-of-words topic model in a variety of datasets including research
-publication titles, abstracts, reviews, and news articles.
-"
-1267,1406.6844,"Normunds Gruzitis, Peteris Paikens, Guntis Barzdins",FrameNet Resource Grammar Library for GF,cs.CL," In this paper we present ongoing research investigating the possibility
-and potential of integrating frame semantics, particularly FrameNet, in the
-Grammatical Framework (GF) application grammar development. An important
-component of GF is its Resource Grammar Library (RGL) that encapsulates the
-low-level linguistic knowledge about morphology and syntax of currently more
-than 20 languages facilitating rapid development of multilingual applications.
-In the ideal case, porting a GF application grammar to a new language would
-only require introducing the domain lexicon - translation equivalents that are
-interlinked via common abstract terms. While it is possible for a highly
-restricted CNL, developing and porting a less restricted CNL requires above
-average linguistic knowledge about the particular language, and above average
-GF experience. Specifying a lexicon is mostly straightforward in the case of
-nouns (incl. multi-word units), however, verbs are the most complex category
-(in terms of both inflectional paradigms and argument structure), and adding
-them to a GF application grammar is not a straightforward task. In this paper
-we are focusing on verbs, investigating the possibility of creating a
-multilingual FrameNet-based GF library. We propose an extension to the current
-RGL, allowing GF application developers to define clauses on the semantic
-level, thus leaving the language-specific syntactic mapping to this extension.
-We demonstrate our approach by reengineering the MOLTO Phrasebook application
-grammar.
-"
-1268,1406.7314,Imen Trabelsi and Dorra Ben Ayed,"On the Use of Different Feature Extraction Methods for Linear and Non
-  Linear kernels",cs.CL cs.LG," Speech feature extraction has been a key focus in robust speech
-recognition research; it significantly affects the recognition performance. In
-this paper, we first study a set of different feature extraction methods such
-as linear predictive coding (LPC), mel frequency cepstral coefficient (MFCC)
-and perceptual linear prediction (PLP) with several feature normalization
-techniques like RASTA filtering and cepstral mean subtraction (CMS). Based on
-this, a comparative evaluation of these features is performed on the task of
-text-independent speaker identification using a combination of Gaussian
-mixture models (GMMs) and linear and non-linear kernels based on support vector
-machines (SVMs).
-"
-1269,1406.7483,"Alicia Gonzalez Martinez, Susana Lopez Hervas, Doaa Samy, Carlos G.
-  Arques, Antonio Moreno Sandoval","Jabalin: a Comprehensive Computational Model of Modern Standard Arabic
-  Verbal Morphology Based on Traditional Arabic Prosody",cs.CL," The computational handling of Modern Standard Arabic is a challenge in the
-field of natural language processing due to its highly rich morphology.
-However, several authors have pointed out that the Arabic morphological system
-is in fact extremely regular. The existing Arabic morphological analyzers have
-exploited this regularity to a variable extent, yet we believe there is still
-some scope for improvement. Taking inspiration from traditional Arabic prosody,
-we have designed and implemented a compact and simple morphological system
-which in our opinion takes further advantage of the regularities encountered in
-the Arabic morphological system. The output of the system is a large-scale
-lexicon of inflected forms that has subsequently been used to create an Online
-Interface for a morphological analyzer of Arabic verbs. The Jabalin Online
-Interface is available at http://elvira.lllf.uam.es/jabalin/, hosted at the
-LLI-UAM lab. The generation system is also available under a GNU GPL 3 license.
-"
-1270,1406.7558,"Nicolas Fay, Monica Tamariz, T Mark Ellison, Dale Barr",Human Communication Systems Evolve by Cultural Selection,cs.SI cs.CL physics.soc-ph," Human communication systems, such as language, evolve culturally; their
-components undergo reproduction and variation. However, a role for selection in
-cultural evolutionary dynamics is less clear. Often, neutral evolution (also
-known as 'drift') models are used to explain the evolution of human
-communication systems, and cultural evolution more generally. Under this
-account, cultural change is unbiased: for instance, vocabulary, baby names and
-pottery designs have been found to spread through random copying.
- While drift is the null hypothesis for models of cultural evolution, it does
-not always adequately explain empirical results. Alternative models include
-cultural selection, which assumes variant adoption is biased. Theoretical
-models of human communication argue that during conversation interlocutors are
-biased to adopt the same labels and other aspects of linguistic representation
-(including prosody and syntax). This basic alignment mechanism has been
-extended by computer simulation to account for the emergence of linguistic
-conventions. When agents are biased to match the linguistic behavior of their
-interlocutor, a single variant can propagate across an entire population of
-interacting computer agents. This behavior-matching account operates at the
-level of the individual. We call it the Conformity-biased model. Under a
-different selection account, called content-biased selection, functional
-selection or replicator selection, variant adoption depends upon the intrinsic
-value of the particular variant (e.g., ease of learning or use). This second
-alternative account operates at the level of the cultural variant. Following
-Boyd and Richerson we call it the Content-biased model. The present paper tests
-the drift model and the two biased selection models' ability to explain the
-spread of communicative signal variants in an experimental micro-society.
-"
-1271,1406.7806,"Andrew L. Maas, Peng Qi, Ziang Xie, Awni Y. Hannun, Christopher T.
-  Lengerich, Daniel Jurafsky and Andrew Y.
Ng",Building DNN Acoustic Models for Large Vocabulary Speech Recognition,cs.CL cs.LG cs.NE stat.ML," Deep neural networks (DNNs) are now a central component of nearly all -state-of-the-art speech recognition systems. Building neural network acoustic -models requires several design decisions including network architecture, size, -and training loss function. This paper offers an empirical investigation on -which aspects of DNN acoustic model design are most important for speech -recognition system performance. We report DNN classifier performance and final -speech recognizer word error rates, and compare DNNs using several metrics to -quantify factors influencing differences in task performance. Our first set of -experiments use the standard Switchboard benchmark corpus, which contains -approximately 300 hours of conversational telephone speech. We compare standard -DNNs to convolutional networks, and present the first experiments using -locally-connected, untied neural networks for acoustic modeling. We -additionally build systems on a corpus of 2,100 hours of training data by -combining the Switchboard and Fisher corpora. This larger corpus allows us to -more thoroughly examine performance of large DNN models -- with up to ten times -more parameters than those typically used in speech recognition systems. Our -results suggest that a relatively simple DNN architecture and optimization -technique produces strong results. These findings, along with previous work, -help establish a set of best practices for building DNN hybrid speech -recognition systems with maximum likelihood training. Our experiments in DNN -optimization additionally serve as a case study for training DNNs with -discriminative loss functions for speech tasks, as well as DNN classifiers more -generally. -" -1272,1407.0167,Robert Pagael and Moritz Schubotz,Mathematical Language Processing Project,cs.DL cs.CL cs.IR," In natural language, words and phrases themselves imply the semantics. In -contrast, the meaning of identifiers in mathematical formulae is undefined. -Thus scientists must study the context to decode the meaning. The Mathematical -Language Processing (MLP) project aims to support that process. In this paper, -we compare two approaches to discover identifier-definition tuples. At first we -use a simple pattern matching approach. Second, we present the MLP approach -that uses part-of-speech tag based distances as well as sentence positions to -calculate identifier-definition probabilities. The evaluation of our -prototypical system, applied on the Wikipedia text corpus, shows that our -approach augments the user experience substantially. While hovering the -identifiers in the formula, tool-tips with the most probable definitions occur. -Tests with random samples show that the displayed definitions provide a good -match with the actual meaning of the identifiers. -" -1273,1407.1165,"Prashant Bordea, Amarsinh Varpeb, Ramesh Manzac, Pravin Yannawara","Recognition of Isolated Words using Zernike and MFCC features for Audio - Visual Speech Recognition",cs.CV cs.CL," Automatic Speech Recognition (ASR) by machine is an attractive research topic -in signal processing domain and has attracted many researchers to contribute in -this area. In recent year, there have been many advances in automatic speech -reading system with the inclusion of audio and visual speech features to -recognize words under noisy conditions. The objective of audio-visual speech -recognition system is to improve recognition accuracy. 
In this paper we
-computed visual features using Zernike moments and audio features using Mel
-Frequency Cepstral Coefficients (MFCC) on the vVISWa (Visual Vocabulary of
-Independent Standard Words) dataset, which contains a collection of isolated
-city names from 10 speakers. The visual features were normalized and the dimension
-of the feature set was reduced by Principal Component Analysis (PCA) in order to
-recognize the isolated word utterance in PCA space. The performance of
-recognition of isolated words based on visual-only and audio-only features
-results in 63.88% and 100%, respectively.
-"
-1274,1407.1605,"\'Emeline Lecuit (LLL), Denis Maurel (LI), Dusko Vitas",Les noms propres se traduisent-ils ? \'Etude d'un corpus multilingue,cs.CL," In this paper, we tackle the problem of the translation of proper names. We
-introduce our hypothesis according to which proper names can be translated more
-often than most people seem to think. Then, we describe the construction of a
-parallel multilingual corpus used to illustrate our point. Finally, we
-evaluate both the advantages and limits of this corpus in our study.
-"
-1275,1407.1640,"Bin Gao, Jiang Bian, and Tie-Yan Liu",WordRep: A Benchmark for Research on Learning Word Representations,cs.CL cs.LG," WordRep is a benchmark collection for the research on learning distributed
-word representations (or word embeddings), released by Microsoft Research. In
-this paper, we describe the details of the WordRep collection and show how to
-use it in different types of machine learning research related to word
-embedding. Specifically, we describe how the evaluation tasks in WordRep are
-selected, how the data are sampled, and how the evaluation tool is built. We
-then compare several state-of-the-art word representations on WordRep, report
-their evaluation performance, and make discussions on the results. After that,
-we discuss new potential research topics that can be supported by WordRep, in
-addition to algorithm comparison. We hope that this paper can help people gain
-a deeper understanding of WordRep, and enable more interesting research on
-learning distributed word representations and related topics.
-"
-1276,1407.1687,"Qing Cui, Bin Gao, Jiang Bian, Siyu Qiu, and Tie-Yan Liu","KNET: A General Framework for Learning Word Embedding using
-  Morphological Knowledge",cs.CL cs.LG," Neural network techniques are widely applied to obtain high-quality
-distributed representations of words, i.e., word embeddings, to address text
-mining, information retrieval, and natural language processing tasks. Recently,
-efficient methods have been proposed to learn word embeddings from context that
-captures both semantic and syntactic relationships between words. However, it
-is challenging to handle unseen words or rare words with insufficient context.
-In this paper, inspired by the study of the word recognition process in cognitive
-psychology, we propose to take advantage of seemingly less obvious but
-essentially important morphological knowledge to address these challenges. In
-particular, we introduce a novel neural network architecture called KNET that
-leverages both contextual information and morphological word similarity built
-based on morphological knowledge to learn word embeddings. Meanwhile, the
-learning architecture is also able to refine the pre-defined morphological
-knowledge and obtain more accurate word similarity.
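Benchmarks like WordRep (1407.1640 above) rest on simple vector arithmetic over embeddings. A self-contained sketch of the analogy test, with tiny hand-made vectors standing in for learned embeddings:

# Minimal sketch of the analogy-style evaluation that benchmarks like
# WordRep (1407.1640) support: a - b + c should land near d under cosine
# similarity. The tiny hand-made vectors are placeholders for real embeddings.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.7, 0.2, 0.1]),
    "woman": np.array([0.7, 0.2, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

query = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(query, emb[w]))
print(best)  # -> queen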
Experiments on an
-analogical reasoning task and a word similarity task both demonstrate that the
-proposed KNET framework can greatly enhance the effectiveness of word
-embeddings.
-"
-1277,1407.1933,Adam Saulwick,Lexpresso: a Controlled Natural Language,cs.CL cs.AI," This paper presents an overview of `Lexpresso', a Controlled Natural Language
-developed at the Defence Science & Technology Organisation as a bidirectional
-natural language interface to a high-level information fusion system. The paper
-describes Lexpresso's main features including lexical coverage, expressiveness
-and range of linguistic syntactic and semantic structures. It also touches on
-its tight integration with a formal semantic formalism and tentatively
-classifies it against the PENS system.
-"
-1278,1407.1976,"Shanta Phani, Shibamouli Lahiri and Arindam Biswas",Inter-Rater Agreement Study on Readability Assessment in Bengali,cs.CL," An inter-rater agreement study is performed for readability assessment in
-Bengali. A 1-7 rating scale was used to indicate different levels of
-readability. We obtained moderate to fair agreement among seven independent
-annotators on 30 text passages written by four eminent Bengali authors. As a
-by-product of our study, we obtained a readability-annotated ground truth dataset
-in Bengali.
-"
-1279,1407.2019,"Kalyanee Kanchan Baruah, Pranjal Das, Abdul Hannan, Shikhar Kr. Sarma",Assamese-English Bilingual Machine Translation,cs.CL," Machine translation is the process of translating text from one language to
-another. In this paper, Statistical Machine Translation is done on the Assamese
-and English languages using their respective parallel corpora. The statistical
-phrase-based translation toolkit Moses is used here. To develop the language
-model and to align the words, we used two other tools, IRSTLM and GIZA,
-respectively. The BLEU score is used to check how good our translation system's
-performance is. A difference in BLEU scores is obtained while translating
-sentences from Assamese to English and vice-versa. Since Indian languages are
-morphologically very rich, translation is relatively harder from English
-to Assamese, resulting in a low BLEU score. A statistical transliteration system
-is also introduced with our translation system to deal basically with proper
-nouns and OOV (out-of-vocabulary) words which are not present in our corpus.
-"
-1280,1407.2694,"Pooja Gupta, Nisheeth Joshi, Iti Mathur",Quality Estimation Of Machine Translation Outputs Through Stemming,cs.CL," Machine Translation is a challenging problem for Indian languages. Every
-day we can see some machine translators being developed, but getting a
-high-quality automatic translation is still a very distant dream. A correctly
-translated sentence for the Hindi language is rarely found. In this paper, we are
-emphasizing the English-Hindi language pair, so in order to preserve the correct
-MT output we present a ranking system, which employs some machine learning
-techniques and morphological features. In ranking, no human intervention is
-required. We have also validated our results by comparing them with human
-ranking.
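Both 1407.2019 and 1407.2694 above score translations with BLEU. An unsmoothed, sentence-level toy version showing the underlying n-gram precisions and brevity penalty (real evaluations are corpus-level and smoothed):

# Simplified sentence-level BLEU, the metric used in 1407.2019 to score
# Assamese-English translations. Real evaluations are corpus-level; this
# unsmoothed version is only meant to show the n-gram precision idea.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(c, n), ngrams(r, n)
        overlap = sum((cand & ref).values())   # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / max(len(c), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat is on the mat"))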
-"
-1281,1407.2918,"Gitimoni Talukdar, Pranjal Protim Borah, Arup Baruah","A Survey of Named Entity Recognition in Assamese and other Indian
-  Languages",cs.CL," Named Entity Recognition is always important when dealing with major Natural
-Language Processing tasks such as information extraction, question-answering,
-machine translation, document summarization, etc., so in this paper we put forward
-a survey of Named Entities in Indian Languages with particular reference to
-Assamese. There are various rule-based and machine learning approaches
-available for Named Entity Recognition. At the very beginning of the paper we give
-an idea of the available approaches for Named Entity Recognition, and then we
-discuss the related research in this field. Assamese, like other Indian
-languages, is agglutinative and suffers from a lack of appropriate resources, as
-Named Entity Recognition requires large data sets, gazetteer lists, dictionaries,
-etc., and some useful features like capitalization, as found in English, cannot be
-found in Assamese. Apart from this, we also describe some of the issues faced in
-Assamese while doing Named Entity Recognition.
-"
-1282,1407.2989,A.J.P.M.P. Jayaweera and N.G.J. Dias,Hidden Markov Model Based Part of Speech Tagger for Sinhala Language,cs.CL," In this paper we present a fundamental lexical semantics of the Sinhala language
-and a Hidden Markov Model (HMM) based Part of Speech (POS) tagger for Sinhala.
-In any Natural Language processing task, Part of Speech is a very
-vital topic, which involves analysing the construction, behaviour and the
-dynamics of the language; this knowledge can be utilized in computational
-linguistics analysis and automation applications. Though Sinhala is a
-morphologically rich and agglutinative language, in which words are inflected
-with various grammatical features, tagging is very essential for further
-analysis of the language. Our research is based on a statistical approach,
-in which the tagging process is done by computing the tag sequence probability
-and the word-likelihood probability from the given corpus, where the linguistic
-knowledge is automatically extracted from the annotated corpus. The current
-tagger could reach more than 90% accuracy for known words.
-"
-1283,1407.3636,"Sabina \v{S}i\v{s}ovi\'c, Sanda Martin\v{c}i\'c-Ip\v{s}i\'c and Ana
-  Me\v{s}trovi\'c",Toward Network-based Keyword Extraction from Multitopic Web Documents,cs.CL cs.IR," In this paper we analyse the selectivity measure calculated from the complex
-network in the task of automatic keyword extraction. Texts, collected from
-different web sources (portals, forums), are represented as directed and
-weighted co-occurrence complex networks of words. Words are nodes and links are
-established between two nodes if they are directly co-occurring within the
-sentence. We test different centrality measures for ranking nodes - keyword
-candidates. Promising results are achieved using the selectivity measure.
-Then we propose an approach which enables extracting word pairs according to
-the values of the in/out selectivity and weight measures combined with
-filtering.
-"
-1284,1407.3751,"Sutanay Choudhury, Chase Dowling",Benchmarking Named Entity Disambiguation approaches for Streaming Graphs,cs.CL cs.IR," Named Entity Disambiguation (NED) is a central task for applications dealing
-with natural language text.
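The HMM tagger of 1407.2989 above combines tag-sequence and word-likelihood probabilities; decoding such a model is standard Viterbi. A toy two-tag sketch with made-up probabilities:

# Toy Viterbi decoder for an HMM POS tagger of the kind described in
# 1407.2989: tag-sequence probabilities times word-likelihoods, both of
# which that paper estimates from an annotated corpus. Numbers are made up.
import numpy as np

tags = ["NOUN", "VERB"]
start = np.log([0.7, 0.3])                     # P(tag | sentence start)
trans = np.log([[0.4, 0.6], [0.8, 0.2]])       # P(tag_t | tag_{t-1})
emit = {"dogs": np.log([0.9, 0.1]),            # P(word | tag)
        "run":  np.log([0.2, 0.8])}

def viterbi(words):
    v = start + emit[words[0]]
    back = []
    for w in words[1:]:
        scores = v[:, None] + trans + emit[w]  # (prev, cur) log scores
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0)
    path = [int(v.argmax())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return [tags[i] for i in reversed(path)]

print(viterbi(["dogs", "run"]))  # -> ['NOUN', 'VERB']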
Assume that we have a graph based knowledge base
-(subsequently referred to as a Knowledge Graph) where nodes represent various real
-world entities such as people, locations, organizations and concepts. Given data
-sources such as social media streams and web pages, Entity Linking is the task
-of mapping named entities that are extracted from the data to those present in
-the Knowledge Graph. This is an inherently difficult task for several
-reasons. Almost all these data sources are generated without any formal
-ontology; the unstructured nature of the input, limited context and the
-ambiguity involved when multiple entities are mapped to the same name make this
-a hard task. This report looks at two state-of-the-art systems employing two
-distinctive approaches: the graph-based Accurate Online Disambiguation of Entities
-(AIDA) and Mined Evidence Named Entity Disambiguation (MENED), which employs a
-statistical inference approach. We compare both approaches using the data set
-and queries provided by the Knowledge Base Population (KBP) track at the 2011 NIST
-Text Analytics Conference (TAC). This report begins with an overview of the
-respective approaches, followed by a detailed description of the experimental
-setup. It concludes with our findings from the benchmarking exercise.
-"
-1285,1407.4610,"Stefan Thurner, Rudolf Hanel, Bo Liu and Bernat Corominas-Murtra","Understanding Zipf's law of word frequencies through sample-space
-  collapse in sentence formation",physics.soc-ph cs.CL," The formation of sentences is a highly structured and history-dependent
-process. The probability of using a specific word in a sentence strongly
-depends on the 'history' of word-usage earlier in that sentence. We study a
-simple history-dependent model of text generation assuming that the
-sample-space of word usage reduces along sentence formation, on average. We
-first show that the model explains the approximate Zipf law found in word
-frequencies as a direct consequence of sample-space reduction. We then
-empirically quantify the amount of sample-space reduction in the sentences of
-ten famous English books, by analysis of corresponding word-transition tables
-that capture which words can follow any given word in a text. We find a highly
-nested structure in these transition tables and show that this `nestedness' is
-tightly related to the power law exponents of the observed word frequency
-distributions. With the proposed model it is possible to understand that the
-nestedness of a text can be the origin of the actual scaling exponent, and that
-deviations from the exact Zipf law can be understood by variations of the
-degree of nestedness on a book-by-book basis. On a theoretical level we are
-able to show that in the case of weak nesting, Zipf's law breaks down in a fast
-transition. Unlike previous attempts to understand Zipf's law in language, the
-sample-space reducing model is not based on assumptions of multiplicative,
-preferential, or self-organised critical mechanisms behind language formation,
-but simply uses the empirically quantifiable parameter 'nestedness' to
-understand the statistics of word frequencies.
-"
-1286,1407.4723,"Slobodan Beliga, Ana Me\v{s}trovi\'c, Sanda
-  Martin\v{c}i\'c-Ip\v{s}i\'c",Toward Selectivity Based Keyword Extraction for Croatian News,cs.CL cs.IR cs.SI," This preliminary report on network-based keyword extraction for Croatian presents an
-unsupervised method for keyword extraction from a complex network.
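The sample-space reducing mechanism proposed in 1407.4610 above is easy to verify by simulation: visit frequencies of a strictly decreasing uniform jump process approach 1/i, i.e. Zipf's law. A short Monte Carlo check:

# Monte Carlo sketch of the sample-space reducing (SSR) process behind the
# Zipf explanation in 1407.4610: start anywhere in {1..N}, then repeatedly
# jump uniformly to a strictly smaller state until reaching 1. Visit
# frequencies of state i approach 1/i, i.e. a Zipf distribution.
import random
from collections import Counter

N, runs = 100, 200_000
visits = Counter()
for _ in range(runs):
    state = random.randint(1, N)
    while True:
        visits[state] += 1
        if state == 1:
            break
        state = random.randint(1, state - 1)

# Compare empirical per-run visit rates with the 1/i prediction.
for i in (1, 2, 5, 10, 50):
    print(i, round(visits[i] / runs, 3), round(1 / i, 3))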
We build
-our approach on a new network measure, the node selectivity, motivated by
-research on graph-based centrality approaches. The node selectivity is
-defined as the average weight distribution on the links of a single node. We
-extract nodes (keyword candidates) based on the selectivity value. Furthermore,
-we expand extracted nodes to word-tuples ranked with the highest in/out
-selectivity values. Selectivity-based extraction does not require linguistic
-knowledge, as it is purely derived from the statistical and structural
-information encompassed in the source text, which is reflected in the
-structure of the network. The obtained sets are evaluated against manually annotated
-keywords: for the set of extracted keyword candidates the average F1 score is
-24.63% and the average F2 score is 21.19%; for the extracted word-tuple candidates
-the average F1 score is 25.9% and the average F2 score is 24.47%.
-"
-1287,1407.6027,"Alberto Besana, Cristina Mart\'inez",Modeling languages from graph networks,cs.CL math.CO," We model and compute the probability distribution of the letters in randomly
-generated words in a language by using the theory of set partitions, Young
-tableaux and graph theoretical representation methods. This has been of
-interest for several application areas such as network systems, bioinformatics,
-internet search, data mining and computational linguistics.
-"
-1288,1407.6099,"S.G. Macdonell, K. Min, A.M. Connor","Autonomous requirements specification processing using natural language
-  processing",cs.CL cs.SE," We describe our ongoing research that centres on the application of natural
-language processing (NLP) to software engineering and systems development
-activities. In particular, this paper addresses the use of NLP in the
-requirements analysis and systems design processes. We have developed a
-prototype toolset that can assist the systems analyst or software engineer to
-select and verify terms relevant to a project. In this paper we describe the
-processes employed by the system to extract and classify objects of interest
-from requirements documents. These processes are illustrated using a small
-example.
-"
-1289,1407.6439,"Christopher R\'e, Amir Abbas Sadeghian, Zifei Shan, Jaeho Shin, Feiran
-  Wang, Sen Wu, Ce Zhang",Feature Engineering for Knowledge Base Construction,cs.DB cs.CL cs.LG," Knowledge base construction (KBC) is the process of populating a knowledge
-base, i.e., a relational database together with inference rules, with
-information extracted from documents and structured sources. KBC blurs the
-distinction between two traditional database problems, information extraction
-and information integration. For the last several years, our group has been
-building knowledge bases with scientific collaborators. Using our approach, we
-have built knowledge bases that have comparable and sometimes better quality
-than those constructed by human volunteers. In contrast to these knowledge
-bases, which took experts a decade or more human years to construct, many of
-our projects are constructed by a single graduate student.
- Our approach to KBC is based on joint probabilistic inference and learning,
-but we do not see inference as either a panacea or a magic bullet: inference is
-a tool that allows us to be systematic in how we construct, debug, and improve
-the quality of such systems. In addition, inference allows us to construct
-these systems in a more loosely coupled way than traditional approaches.
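The node selectivity used in 1407.4723 above (the average link weight of a node) can be computed directly over a co-occurrence network; a sketch assuming networkx and a toy two-sentence text:

# Sketch of the node selectivity from 1407.4723: the average link weight of
# a node (strength/degree) in a directed, weighted co-occurrence network,
# here computed with networkx over a toy two-sentence text.
import networkx as nx

sentences = [["network", "based", "keyword", "extraction"],
             ["keyword", "extraction", "from", "network"]]

g = nx.DiGraph()
for sent in sentences:
    for a, b in zip(sent, sent[1:]):  # link adjacent words within a sentence
        w = g.get_edge_data(a, b, {"weight": 0})["weight"]
        g.add_edge(a, b, weight=w + 1)

def selectivity(node, direction="out"):
    edges = g.out_edges(node, data=True) if direction == "out" \
            else g.in_edges(node, data=True)
    weights = [d["weight"] for _, _, d in edges]
    return sum(weights) / len(weights) if weights else 0.0

ranked = sorted(g.nodes, key=lambda n: selectivity(n, "out"), reverse=True)
print([(n, selectivity(n, "out")) for n in ranked])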
To
-support this idea, we have built the DeepDive system, which has the design goal
-of letting the user ""think about features---not algorithms."" We think of
-DeepDive as declarative in that one specifies what they want but not how to get
-it. We describe our approach with a focus on feature engineering, which we
-argue is an understudied problem relative to its importance to end-to-end
-quality.
-"
-1290,1407.6639,Torsten Timm,How the Voynich Manuscript was created,cs.CR cs.CL," The Voynich manuscript is a medieval book written in an unknown script. This
-paper studies the relation between similarly spelled words in the Voynich
-manuscript. By means of a detailed analysis of similarly spelled words it was
-possible to reveal the text generation method used for the Voynich manuscript.
-"
-1291,1407.6853,Volkan Cirik and Deniz Yuret,Substitute Based SCODE Word Embeddings in Supervised NLP Tasks,cs.CL," We analyze a word embedding method in supervised tasks. It maps words on a
-sphere such that words co-occurring in similar contexts lie close together. The
-similarity of contexts is measured by the distribution of substitutes that can
-fill them. We compared word embeddings, including more recent representations,
-in Named Entity Recognition (NER), Chunking, and Dependency Parsing. We examine
-our framework in multilingual dependency parsing as well. The results show that
-the proposed method achieves results as good as or better than the other
-word embeddings in the tasks we investigate. It achieves state-of-the-art
-results in multilingual dependency parsing. Word embeddings in 7 languages are
-available for public use.
-"
-1292,1407.6872,Ivan Ivek,"Interpretable Low-Rank Document Representations with Label-Dependent
-  Sparsity Patterns",cs.CL cs.IR cs.LG," In the context of document classification, where in a corpus of documents their
-label tags are readily known, an opportunity lies in utilizing label
-information to learn document representation spaces with better discriminative
-properties. To this end, in this paper application of a Variational Bayesian
-Supervised Nonnegative Matrix Factorization (supervised vbNMF) with
-label-driven sparsity structure of coefficients is proposed for learning of
-discriminative nonsubtractive latent semantic components occurring in TF-IDF
-document representations. Constraints are such that the components pursued are
-made to be frequently occurring in a small set of labels only, making it
-possible to yield document representations with distinctive label-specific
-sparse activation patterns. A simple measure of quality of this kind of
-sparsity structure, dubbed inter-label sparsity, is introduced and
-experimentally brought into tight connection with classification performance.
-Representing a great practical convenience, inter-label sparsity is shown to be
-easily controlled in supervised vbNMF by a single parameter.
-"
-1293,1407.7094,Bruno Gon\c{c}alves and David S\'anchez,Crowdsourcing Dialect Characterization through Twitter,physics.soc-ph cs.CL cs.SI stat.ML," We perform a large-scale analysis of language diatopic variation using
-geotagged microblogging datasets. By collecting all Twitter messages written in
-Spanish over more than two years, we build a corpus from which a carefully
-selected list of concepts allows us to characterize Spanish varieties on a
-global scale. A cluster analysis proves the existence of well-defined
-macroregions sharing common lexical properties.
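The cluster analysis in 1407.7094 above groups regions by their relative usage of competing lexical variants. A minimal sketch with a made-up region-by-variant matrix and k-means:

# Sketch of the cluster analysis behind the dialect study in 1407.7094:
# regions are represented by frequency vectors over lexical variants of the
# same concepts, then clustered. The tiny matrix is made up, not Twitter data.
import numpy as np
from sklearn.cluster import KMeans

regions = ["Madrid", "Mexico City", "Buenos Aires", "rural Andes"]
# Columns: relative usage of two competing variants for one concept
# (e.g. 'coche' vs 'carro' for CAR) plus a second concept pair.
X = np.array([[0.9, 0.1, 0.8, 0.2],
              [0.2, 0.8, 0.3, 0.7],
              [0.3, 0.7, 0.2, 0.8],
              [0.1, 0.9, 0.1, 0.9]])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for region, lab in zip(regions, labels):
    print(region, "-> cluster", lab)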
Remarkably enough, we find that
-the Spanish language is split into two superdialects, namely, an urban speech used
-across major American and Spanish cities and a diverse form that encompasses
-rural areas and small towns. The latter can be further clustered into smaller
-varieties with a stronger regional character.
-"
-1294,1407.7169,Matilde Marcolli,Principles and Parameters: a coding theory perspective,cs.CL cs.IT math.IT," We propose an approach to Longobardi's parametric comparison method (PCM) via
-the theory of error-correcting codes. One associates to a collection of
-languages to be analyzed with the PCM a binary (or ternary) code with one code
-word for each language in the family and each word consisting of the binary
-values of the syntactic parameters of the language, with the ternary case
-allowing for an additional parameter state that takes into account phenomena of
-entailment of parameters. The code parameters of the resulting code can be
-compared with some classical bounds in coding theory: the asymptotic bound, the
-Gilbert-Varshamov bound, etc. The position of the code parameters with respect
-to some of these bounds provides quantitative information on the variability of
-syntactic parameters within and across historical-linguistic families. While
-computations carried out for languages belonging to the same family yield codes
-below the GV curve, comparisons across different historical families can give
-examples of isolated codes lying above the asymptotic bound.
-"
-1295,1407.7357,Yannis Haralambous and Philippe Lenca,"Text Classification Using Association Rules, Dependency Pruning and
-  Hyperonymization",cs.IR cs.CL," We present new methods for pruning and enhancing itemsets for text
-classification via association rule mining. Pruning methods are based on
-dependency syntax and enhancing methods are based on replacing words by their
-hyperonyms of various orders. We discuss the impact of these methods, compared
-to pruning based on the tf-idf rank of words.
-"
-1296,1407.7736,"Xiangju Qin, Derek Greene, and P\'adraig Cunningham",A Latent Space Analysis of Editor Lifecycles in Wikipedia,cs.SI cs.CL cs.CY physics.soc-ph," Collaborations such as Wikipedia are a key part of the value of the modern
-Internet. At the same time there is concern that these collaborations are
-threatened by high levels of member turnover. In this paper we borrow ideas
-from topic analysis to map editor activity on Wikipedia over time into a latent
-space that offers an insight into the evolving patterns of editor behavior.
-This latent space representation reveals a number of different categories of
-editor (e.g. content experts, social networkers) and we show that it does
-provide a signal that predicts an editor's departure from the community. We
-also show that long-term editors gradually diversify their participation by
-shifting edit preference from one or two namespaces to multiple namespaces and
-experience relatively soft evolution in their editor profiles, while short-term
-editors generally distribute their contribution randomly among the namespaces
-and experience considerably fluctuating evolution in their editor profiles.
-"
-1297,1407.8215,Vanessa Wei Feng and Graeme Hirst,Two-pass Discourse Segmentation with Pairing and Global Features,cs.CL," Previous attempts at RST-style discourse segmentation typically adopt
-features centered on a single token to predict whether to insert a boundary
-before that token.
In contrast, we develop a discourse segmenter utilizing a
-set of pairing features, which are centered on a pair of adjacent tokens in the
-sentence, by equally taking into account the information from both tokens.
-Moreover, we propose a novel set of global features, which encode
-characteristics of the segmentation as a whole, once we have an initial
-segmentation. We show that both the pairing and global features are useful on
-their own, and their combination achieved an $F_1$ of 92.6% in identifying
-in-sentence discourse boundaries, which is a 17.8% error-rate reduction over
-the state-of-the-art performance, approaching 95% of human performance. In
-addition, similar improvement is observed across different classification
-frameworks.
-"
-1298,1407.8322,"Alvaro Corral, Gemma Boleda and Ramon Ferrer-i-Cancho",Zipf's law for word frequencies: word forms versus lemmas in long texts,physics.soc-ph cs.CL physics.data-an," Zipf's law is a fundamental paradigm in the statistics of written and spoken
-natural language as well as in other communication systems. We raise the
-question of the elementary units for which Zipf's law should hold in the most
-natural way, studying its validity for plain word forms and for the
-corresponding lemma forms. In order to have as homogeneous sources as possible,
-we analyze some of the longest literary texts ever written, comprising four
-different languages, with different levels of morphological complexity. In all
-cases Zipf's law is fulfilled, in the sense that a power-law distribution of
-word or lemma frequencies is valid for several orders of magnitude. We
-investigate the extent to which the word-lemma transformation preserves two
-parameters of Zipf's law: the exponent and the low-frequency cut-off. We are
-not able to demonstrate a strict invariance of the tail, as for a few texts
-both exponents deviate significantly, but we conclude that the exponents are
-very similar, despite the remarkable transformation that going from words to
-lemmas represents, considerably affecting all ranges of frequencies. In
-contrast, the low-frequency cut-offs are less stable.
-"
-1299,1408.0016,Stephen Guy and Rolf Schwitter,"Architecture of a Web-based Predictive Editor for Controlled Natural
-  Language Processing",cs.CL cs.AI," In this paper, we describe the architecture of a web-based predictive text
-editor being developed for the controlled natural language PENG$^{ASP}$. This
-controlled language can be used to write non-monotonic specifications that have
-the same expressive power as Answer Set Programs. In order to support the
-writing process of these specifications, the predictive text editor
-communicates asynchronously with the controlled natural language processor that
-generates lookahead categories and additional auxiliary information for the
-author of a specification text. The text editor can display multiple sets of
-lookahead categories simultaneously for different possible sentence
-completions and anaphoric expressions, and supports the addition of new content
-words to the lexicon.
-"
-1300,1408.0782,Sandeep Ashwini and Jinho D. Choi,Targetable Named Entity Recognition in Social Media,cs.CL," We present a novel approach for recognizing what we call targetable named
-entities; that is, named entities in a targeted set (e.g., movies, books, TV
-shows).
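Comparing Zipf exponents for word forms and lemmas, as in 1407.8322 above, needs an exponent estimator; the standard discrete maximum-likelihood approximation is compact, shown here on synthetic frequencies:

# Sketch of estimating the Zipf exponent compared in 1407.8322 for word
# versus lemma frequencies, via the standard discrete maximum-likelihood
# approximation alpha = 1 + n / sum(ln(x_i / (xmin - 0.5))). The frequencies
# below are synthetic, not taken from the texts analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic frequencies, roughly power-law distributed with exponent ~2.
freqs = np.round(rng.pareto(1.0, 5000) + 1).astype(int)

def zipf_alpha(x, xmin=1):
    x = x[x >= xmin]
    return 1 + len(x) / np.sum(np.log(x / (xmin - 0.5)))

print(f"estimated exponent: {zipf_alpha(freqs):.2f}")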
Unlike many other NER systems that need to retrain their statistical
-models as new entities arrive, our approach does not require such retraining,
-which makes it more adaptable for types of entities that are frequently
-updated. For this preliminary study, we focus on one entity type, movie title,
-using data collected from Twitter. Our system is tested on two evaluation sets,
-one including only entities corresponding to movies in our training set, and
-the other excluding any of those entities. Our final model shows F1-scores of
-76.19% and 78.70% on these evaluation sets, which gives strong evidence that
-our approach is completely unbiased to any particular set of entities found
-during training.
-"
-1301,1408.0985,"Jordi Luque, Bartolo Luque and Lucas Lacasa",Speech earthquakes: scaling and universality in human voice,physics.soc-ph cs.CL q-bio.NC," Speech is a distinctive complex feature of human capabilities. In order to
-understand the physics underlying speech production, in this work we
-empirically analyse the statistics of large human speech datasets spanning
-several languages. We first show that during speech the energy is unevenly
-released and power-law distributed, reporting a universal robust
-Gutenberg-Richter-like law in speech. We further show that such earthquakes in
-speech show temporal correlations, as the interevent statistics are again
-power-law distributed. Since this feature takes place in the intra-phoneme
-range, we conjecture that what is responsible for this complex phenomenon is not
-cognitive, but resides in the physiological speech production mechanism.
-Moreover, we show that these waiting time distributions are scale invariant
-under a renormalisation group transformation, suggesting that the process of
-speech generation is indeed operating close to a critical point. These results
-are put in contrast with current paradigms in speech processing, which point
-towards low dimensional deterministic chaos as the origin of nonlinear traits
-in speech fluctuations. As these latter fluctuations are indeed the aspects
-that humanize synthetic speech, these findings may have an impact on future
-speech synthesis technologies. Results are robust and independent of the
-communication language or the number of speakers, pointing towards a universal
-pattern and yet another hint of complexity in human speech.
-"
-1302,1408.1031,"Mohamed Elhoseiny, Ahmed Elgammal","Text to Multi-level MindMaps: A Novel Method for Hierarchical Visual
-  Abstraction of Natural Language Text",cs.CL cs.HC," MindMapping is a well-known technique used in note taking, which encourages
-learning and studying. MindMapping has been manually adopted to help present
-knowledge and concepts in a visual form. Unfortunately, there is no reliable
-automated approach to generate MindMaps from Natural Language text. This work
-firstly introduces the MindMap Multilevel Visualization concept, which is to jointly
-visualize and summarize textual information. The visualization is achieved
-pictorially across multiple levels using semantic information (i.e. ontology),
-while the summarization is achieved by the information in the highest levels as
-they represent abstract information in the text. This work also presents the
-first automated approach that takes a text input and generates a MindMap
-visualization out of it. The approach could visualize text documents in
-multilevel MindMaps, in which a high-level MindMap node could be expanded into
-child MindMaps.
The proposed method involves understanding the input text and
-converting it into an intermediate Detailed Meaning Representation (DMR). The DMR
-is then visualized in two modes, single level or multiple levels, the latter being
-convenient for larger texts. The generated MindMaps from both approaches were
-evaluated based on Human Subject experiments performed on Amazon Mechanical
-Turk with various parameter settings.
-"
-1303,1408.1774,Ramon Ferrer-i-Cancho,"Beyond description. Comment on ""Approaching human language with complex
-  networks"" by Cong & Liu",cs.CL cs.SI physics.soc-ph," Comment on ""Approaching human language with complex networks"" by Cong & Liu
-"
-1304,1408.1928,"Benjamin M Good, Max Nanis, Andrew I. Su","Microtask crowdsourcing for disease mention annotation in PubMed
-  abstracts",cs.CL," Identifying concepts and relationships in biomedical text enables knowledge
-to be applied in computational analyses. Many biological natural language
-processing (BioNLP) projects attempt to address this challenge, but the state of
-the art in BioNLP still leaves much room for improvement. Progress in BioNLP
-research depends on large, annotated corpora for evaluating information
-extraction systems and training machine learning models. Traditionally, such
-corpora are created by small numbers of expert annotators often working over
-extended periods of time. Recent studies have shown that workers on microtask
-crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in
-aggregate, generate high-quality annotations of biomedical text. Here, we
-investigated the use of the AMT in capturing disease mentions in PubMed
-abstracts. We used the NCBI Disease corpus as a gold standard for refining and
-benchmarking our crowdsourcing protocol. After several iterations, we arrived
-at a protocol that reproduced the annotations of the 593 documents in the
-training set of this gold standard with an overall F measure of 0.872
-(precision 0.862, recall 0.883). The output can also be tuned to optimize for
-precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when
-precision = 0.436). Each document was examined by 15 workers, and their
-annotations were merged based on a simple voting method. In total 145 workers
-combined to complete all 593 documents in the span of 1 week at a cost of $.06
-per abstract per worker. The quality of the annotations, as judged with the F
-measure, increases with the number of workers assigned to each task such that
-the system can be tuned to balance cost against quality. These results
-demonstrate that microtask crowdsourcing can be a valuable tool for generating
-well-annotated corpora in BioNLP.
-"
-1305,1408.1985,"Janet B. Pierrehumbert, Forrest Stonedahl, and Robert Daland",A model of grassroots changes in linguistic systems,cs.CL nlin.AO physics.soc-ph," Linguistic norms emerge in human communities because people imitate each
-other. A shared linguistic system provides people with the benefits of shared
-knowledge and coordinated planning. Once norms are in place, why would they
-ever change? This question, echoing broad questions in the theory of social
-dynamics, has particular force in relation to language. By definition, an
-innovator is in the minority when the innovation first occurs.
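The 'simple voting method' used in 1408.1928 above to merge 15 workers' annotations can be sketched as a span-level majority vote; the spans and threshold below are illustrative placeholders:

# Sketch of the simple voting merge described in 1408.1928 for aggregating
# workers' disease-mention annotations: keep a span if enough workers marked
# it. Annotations and threshold here are illustrative, not the paper's data.
from collections import Counter

# Each worker's annotation: a set of (start, end) character spans.
worker_spans = [
    {(10, 18), (40, 52)},
    {(10, 18)},
    {(10, 18), (40, 52), (60, 66)},
    {(40, 52)},
    {(10, 18), (40, 52)},
]

votes = Counter(span for spans in worker_spans for span in spans)
threshold = 3  # tune to trade precision against recall, as the paper does
merged = sorted(span for span, n in votes.items() if n >= threshold)
print(merged)  # -> [(10, 18), (40, 52)]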
In some areas of
-social dynamics, important minorities can strongly influence the majority
-through their power, fame, or use of broadcast media. But most linguistic
-changes are grassroots developments that originate with ordinary people. Here,
-we develop a novel model of communicative behavior in communities, and identify
-a mechanism for arbitrary innovations by ordinary people to have a good chance
-of being widely adopted.
- To imitate each other, people must form a mental representation of what other
-people do. Each time they speak, they must also decide which form to produce
-themselves. We introduce a new decision function that enables us to smoothly
-explore the space between two types of behavior: probability matching (matching
-the probabilities of incoming experience) and regularization (producing some
-forms disproportionately often). Using Monte Carlo methods, we explore the
-interactions amongst the degree of regularization, the distribution of biases
-in a network, and the network position of the innovator. We identify two
-regimes for the widespread adoption of arbitrary innovations, viewed as
-informational cascades in the network. With moderate regularization of
-experienced input, average people (not well-connected people) are the most
-likely source of successful innovations. Our results shed light on a major
-outstanding puzzle in the theory of language change. The framework also holds
-promise for understanding the dynamics of other social norms.
-"
-1306,1408.2359,Taraka Rama,"Gap-weighted subsequences for automatic cognate identification and
-  phylogenetic inference",cs.CL," In this paper, we describe the problem of cognate identification and its
-relation to phylogenetic inference. We introduce subsequence-based features for
-discriminating cognates from non-cognates. We show that subsequence-based
-features perform better than the state-of-the-art string similarity measures
-for the purpose of cognate identification. We use the cognate judgments for the
-purpose of phylogenetic inference and observe that these classifiers infer a
-tree which is close to the gold standard tree. The contribution of this paper
-is the use of subsequence features for cognate identification and to employ the
-cognate judgments for phylogenetic inference.
-"
-1307,1408.2430,"Boris Iolis, Gianluca Bontempi","Optimizing Component Combination in a Multi-Indexing Paragraph Retrieval
-  System",cs.IR cs.CL," We demonstrate a method to optimize the combination of distinct components in
-a paragraph retrieval system. Our system makes use of several indices, query
-generators and filters, each of them potentially contributing to the quality of
-the returned list of results. The components are combined with a weighted sum,
-and we optimize the weights using a heuristic optimization algorithm. This
-allows us to maximize the quality of our results, but also to determine which
-components are most valuable in our system. We evaluate our approach on the
-paragraph selection task of a Question Answering dataset.
-"
-1308,1408.2466,Rolf Schwitter,"Controlled Natural Language Processing as Answer Set Programming: an
-  Experiment",cs.CL cs.AI," Most controlled natural languages (CNLs) are processed with the help of a
-pipeline architecture that relies on different software components.
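The subsequence-based features of 1408.2359 above can be approximated by gap-weighted subsequence counts; a brute-force sketch (practical kernels use dynamic programming instead) with a hypothetical decay parameter:

# Sketch of gap-weighted subsequence features of the kind used in 1408.2359
# for cognate identification: every length-n subsequence of a word is a
# feature, down-weighted by lam**span so that gappy matches count less.
from collections import defaultdict
from itertools import combinations

def gap_weighted(word, n=2, lam=0.8):
    feats = defaultdict(float)
    for idx in combinations(range(len(word)), n):
        span = idx[-1] - idx[0] + 1          # indices covered, gaps included
        feats["".join(word[i] for i in idx)] += lam ** span
    return feats

def similarity(u, v, n=2, lam=0.8):
    fu, fv = gap_weighted(u, n, lam), gap_weighted(v, n, lam)
    return sum(fu[k] * fv[k] for k in fu.keys() & fv.keys())

# Swedish 'natt' and English 'night' (cognates) versus English 'day'.
print(similarity("natt", "night"), similarity("natt", "day"))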
In this paper we
-investigate experimentally how well answer set
-programming (ASP) is suited as a unifying framework for parsing a CNL, deriving
-a formal representation for the resulting syntax trees, and for reasoning with
-that representation. We start from a list of input tokens in ASP notation and
-show how this input can be transformed into a syntax tree using an ASP grammar
-and then into reified ASP rules in the form of a set of facts. These facts are then
-processed by an ASP meta-interpreter that allows us to infer new knowledge.
-"
-1309,1408.2699,"Christine F. Cuskley, Martina Pugliese, Claudio Castellano, Francesca
-  Colaiori, Vittorio Loreto, Francesca Tria","Internal and external dynamics in language: Evidence from verb
-  regularity in a historical corpus of English",physics.soc-ph cs.CL," Human languages are rule-governed, but almost invariably these rules have
-exceptions in the form of irregularities. Since rules in language are efficient
-and productive, the persistence of irregularity is an anomaly. How does
-irregularity linger in the face of internal (endogenous) and external
-(exogenous) pressures to conform to a rule? Here we address this problem by
-taking a detailed look at simple past tense verbs in the Corpus of Historical
-American English. The data show that the language is open, with many new verbs
-entering. At the same time, existing verbs might tend to regularize or
-irregularize as a consequence of internal dynamics, but overall, the amount of
-irregularity sustained by the language stays roughly constant over time.
-Despite continuous vocabulary growth, and presumably, an attendant increase in
-expressive power, there is no corresponding growth in irregularity. We analyze
-the set of irregulars, showing they may adhere to a set of minority rules,
-allowing for increased stability of irregularity over time. These findings
-contribute to the debate on how language systems become rule governed, and how
-and why they sustain exceptions to rules, providing insight into the interplay
-between the emergence and maintenance of rules and exceptions in language.
-"
-1310,1408.2873,"Awni Y. Hannun, Andrew L. Maas, Daniel Jurafsky, Andrew Y. Ng","First-Pass Large Vocabulary Continuous Speech Recognition using
-  Bi-Directional Recurrent DNNs",cs.CL cs.LG cs.NE," We present a method to perform first-pass large vocabulary continuous speech
-recognition using only a neural network and language model. Deep neural network
-acoustic models are now commonplace in HMM-based speech recognition systems,
-but building such systems is a complex, domain-specific task. Recent work
-demonstrated the feasibility of discarding the HMM sequence modeling framework
-by directly predicting transcript text from audio. This paper extends this
-approach in two ways. First, we demonstrate that a straightforward recurrent
-neural network architecture can achieve a high level of accuracy. Second, we
-propose and evaluate a modified prefix-search decoding algorithm. This approach
-to decoding enables first-pass speech recognition with a language model,
-completely unaided by the cumbersome infrastructure of HMM-based systems.
-Experiments on the Wall Street Journal corpus demonstrate fairly competitive
-word error rates, and the importance of bi-directional network recurrence.
-"
-1311,1408.3153,L.
Amber Wilcox-O'Hearn,Detection is the central problem in real-word spelling correction,cs.CL," Real-word spelling correction differs from non-word spelling correction in
-its aims and its challenges. Here we show that the central problem in real-word
-spelling correction is detection. Methods from non-word spelling correction,
-which focus instead on selection among candidate corrections, do not address
-detection adequately, because detection is either assumed in advance or heavily
-constrained. As we demonstrate in this paper, merely discriminating between the
-intended word and a random close variation of it within the context of a
-sentence is a task that can be performed with high accuracy using
-straightforward models. Trigram models are sufficient in almost all cases. The
-difficulty comes when every word in the sentence is a potential error, with a
-large set of possible candidate corrections. Despite their strengths, trigram
-models cannot reliably find true errors without introducing many more, at least
-not when used in the obvious sequential way without added structure. The
-detection task exposes weaknesses not visible in the selection task.
-"
-1312,1408.3456,"Felix Hill, Roi Reichart and Anna Korhonen","SimLex-999: Evaluating Semantic Models with (Genuine) Similarity
- Estimation",cs.CL," We present SimLex-999, a gold standard resource for evaluating distributional
-semantic models that improves on existing resources in several important ways.
-First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly
-quantifies similarity rather than association or relatedness, so that pairs of
-entities that are associated but not actually similar [Freud, psychology] have
-a low rating. We show that, via this focus on similarity, SimLex-999
-incentivizes the development of models with a different, and arguably wider
-range of applications than those which reflect conceptual association. Second,
-SimLex-999 contains a range of concrete and abstract adjective, noun and verb
-pairs, together with an independent rating of concreteness and (free)
-association strength for each pair. This diversity enables fine-grained
-analyses of the performance of models on concepts of different types, and
-consequently greater insight into how architectures can be improved. Further,
-unlike existing gold standard evaluations, for which automatic approaches have
-reached or surpassed the inter-annotator agreement ceiling, state-of-the-art
-models perform well below this ceiling on SimLex-999. There is therefore plenty
-of scope for SimLex-999 to quantify future improvements to distributional
-semantic models, guiding the development of the next generation of
-representation-learning architectures.
-"
-1313,1408.3731,"Micha{\l} Jungiewicz, Micha{\l} {\L}opuszy\'nski",Unsupervised Keyword Extraction from Polish Legal Texts,cs.CL," In this work, we present an application of the recently proposed unsupervised
-keyword extraction algorithm RAKE to a corpus of Polish legal texts from the
-field of public procurement. RAKE is essentially a language- and
-domain-independent method. Its only language-specific input is a stoplist
-containing a set of non-content words. The performance of the method heavily
-depends on the choice of such a stoplist, which should be domain adapted.
-Therefore, we complement the RAKE algorithm with an automatic approach to
-selecting non-content words, which is based on the statistical properties of
-term distribution.
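To make the RAKE procedure described above concrete, here is a minimal sketch in
Python, assuming a toy stoplist and the usual degree-over-frequency word scoring
(both are simplifying assumptions for illustration, not the authors'
implementation):

    import re
    from collections import defaultdict

    # Hypothetical minimal stoplist; as the abstract notes, RAKE's quality
    # depends heavily on a domain-adapted stoplist.
    STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on", "with"}

    def rake_keywords(text):
        # Split the text into candidate phrases on stopwords.
        words = re.findall(r"[a-z]+", text.lower())
        phrases, current = [], []
        for w in words:
            if w in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(w)
        if current:
            phrases.append(current)
        # Score each word by degree (phrase co-membership) over frequency,
        # then score a phrase as the sum of its word scores.
        freq, degree = defaultdict(int), defaultdict(int)
        for phrase in phrases:
            for w in phrase:
                freq[w] += 1
                degree[w] += len(phrase)
        scored = [(" ".join(p), sum(degree[w] / freq[w] for w in p))
                  for p in phrases]
        return sorted(scored, key=lambda kv: -kv[1])

    print(rake_keywords("public procurement law in the scope of public procurement notices"))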
-"
-1314,1408.3829,"Richa Sharma, Shweta Nigam, Rekha Jain",Opinion mining of movie reviews at document level,cs.IR cs.CL," The world is changing rapidly, and with current technologies the Internet has
-become an essential need for everyone. The Web is used in every field. Most
-people use the Web for common purposes such as online shopping and chatting.
-During online shopping, users post a large number of reviews/opinions that
-reflect whether a product is good or bad. These reviews need to be explored,
-analysed and organized for better decision making. Opinion Mining is a natural
-language processing task that deals with finding the orientation of opinion in
-a piece of text with respect to a topic. In this paper a document-based opinion
-mining system is proposed that classifies documents as positive, negative or
-neutral. Negation is also handled in the proposed system. Experimental results
-using reviews of movies show the effectiveness of the system.
-"
-1315,1408.3934,"Alejandro Mosquera, Lamine Aouad, Slawomir Grzonkowski, Dylan Morss","On Detecting Messaging Abuse in Short Text Messages using Linguistic and
- Behavioral patterns",cs.CL cs.AI cs.SI," The use of short text messages in social media and instant messaging has
-become a popular communication channel during the last years. This rising
-popularity has caused an increase in messaging threats such as spam, phishing
-and malware. The processing of these short text message threats could pose
-additional challenges such as the presence of lexical variants, SMS-like
-contractions or advanced obfuscations, which can degrade the performance of
-traditional filtering solutions. By using a real-world SMS data set from a
-large telecommunications operator from the US and a social media corpus, in
-this paper we analyze the effectiveness of machine learning filters based on
-linguistic and behavioral patterns in order to detect short text spam and
-abusive users in the network. We have also explored different ways to deal with
-short text message challenges such as tokenization and entity detection by
-using text normalization and substring clustering techniques. The obtained
-results show the validity of the proposed solution, which improves on the
-baseline approaches.
-"
-1316,1408.4245,Dmitry Ustalov,Towards crowdsourcing and cooperation in linguistic resources,cs.SI cs.CL," Linguistic resources can be populated with data through the use of such
-approaches as crowdsourcing and gamification when motivated people are
-involved. However, current crowdsourcing genre taxonomies lack the concept of
-cooperation, which is the principal element of modern video games and may
-potentially drive the annotators' interest. This survey on crowdsourcing
-taxonomies and cooperation in linguistic resources provides recommendations on
-using cooperation in existing genres of crowdsourcing and evidence of the
-efficiency of cooperation, using a popular Russian linguistic resource created
-through crowdsourcing as an example.
-"
-1317,1408.4753,Phillip M. Alday,"Be Careful When Assuming the Obvious: Commentary on ""The placement of
- the head that minimizes online memory: a complex systems approach""",cs.CL," Ferrer-i-Cancho (2015) presents a mathematical model of both the synchronic
-and diachronic nature of word order based on the assumption that memory costs
-are a never-decreasing function of distance and a few very general linguistic
-assumptions.
However, even these minimal and seemingly obvious assumptions are
-not as safe as they appear in light of recent typological and psycholinguistic
-evidence. The interaction of word order and memory has further depths to be
-explored.
-"
-1318,1408.5403,"Peilei Liu, Ting Wang",Neural Mechanism of Language,cs.NE cs.CL q-bio.NC," This paper is based on our previous work on neural coding. It is a
-self-organized model supported by existing evidence. Firstly, we briefly
-introduce this model in this paper, and then we explain the neural mechanism of
-language and reasoning with it. Moreover, we find that the position of an area
-determines its importance. Specifically, language-relevant areas are in the
-capital position of the cortical kingdom. Therefore they are closely related
-to autonomous consciousness and working memory. In essence, language is a
-miniature of the real world. In short, this paper aims to bridge the gap
-between the molecular mechanisms of neurons and advanced functions such as
-language and reasoning.
-"
-1319,1408.5427,"Daniel Godfrey, Caley Johns, Carl Meyer, Shaina Race, Carol Sadek","A Case Study in Text Mining: Interpreting Twitter Data From World Cup
- Tweets",stat.ML cs.CL cs.IR cs.LG," Cluster analysis is a field of data analysis that extracts underlying
-patterns in data. One application of cluster analysis is in text-mining, the
-analysis of large collections of text to find similarities between documents.
-We used a collection of about 30,000 tweets extracted from Twitter just before
-the World Cup started. A common problem with real-world text data is the
-presence of linguistic noise; in our case, extraneous tweets that are unrelated
-to the dominant themes. To combat this problem, we created an algorithm that
-combined the DBSCAN algorithm and a consensus matrix. This way we are left with
-the tweets that are related to those dominant themes. We then used cluster
-analysis to find those topics that the tweets describe. We clustered the tweets
-using k-means, a commonly used clustering algorithm, and Non-Negative Matrix
-Factorization (NMF) and compared the results. The two algorithms gave similar
-results, but NMF proved to be faster and provided more easily interpreted
-results. We explored our results using two visualization tools, Gephi and
-Wordle.
-"
-1320,1408.5882,Yoon Kim,Convolutional Neural Networks for Sentence Classification,cs.CL cs.NE," We report on a series of experiments with convolutional neural networks (CNN)
-trained on top of pre-trained word vectors for sentence-level classification
-tasks. We show that a simple CNN with little hyperparameter tuning and static
-vectors achieves excellent results on multiple benchmarks. Learning
-task-specific vectors through fine-tuning offers further gains in performance.
-We additionally propose a simple modification to the architecture to allow for
-the use of both task-specific and static vectors. The CNN models discussed
-herein improve upon the state of the art on 4 out of 7 tasks, which include
-sentiment analysis and question classification.
-"
-1321,1408.6179,"Dmitrijs Milajevs, Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Matthew
- Purver","Evaluating Neural Word Representations in Tensor-Based Compositional
- Settings",cs.CL," We provide a comparative study between neural word representations and
-traditional vector spaces based on co-occurrence counts, in a number of
-compositional tasks.
We use three different semantic spaces and implement seven
-tensor-based compositional models, which we then test (together with simpler
-additive and multiplicative approaches) in tasks involving verb disambiguation
-and sentence similarity. To check their scalability, we additionally evaluate
-the spaces using simple compositional methods on larger-scale tasks with less
-constrained language: paraphrase detection and dialogue act tagging. In the
-more constrained tasks, co-occurrence vectors are competitive, although choice
-of compositional method is important; on the larger-scale tasks, they are
-outperformed by neural word embeddings, which show robust, stable performance
-across the tasks.
-"
-1322,1408.6181,"Dimitri Kartsaklis, Nal Kalchbrenner, Mehrnoosh Sadrzadeh",Resolving Lexical Ambiguity in Tensor Regression Models of Meaning,cs.CL," This paper provides a method for improving tensor-based compositional
-distributional models of meaning by the addition of an explicit disambiguation
-step prior to composition. In contrast with previous research where this
-hypothesis has been successfully tested against relatively simple compositional
-models, in our work we use a robust model trained with linear regression. The
-results we get in two experiments show the superiority of the prior
-disambiguation method and suggest that the effectiveness of this approach is
-model-independent.
-"
-1323,1408.6418,"Andrei Barbu, Alexander Bridge, Zachary Burchill, Dan Coroian, Sven
- Dickinson, Sanja Fidler, Aaron Michaux, Sam Mussman, Siddharth Narayanaswamy,
- Dhaval Salvi, Lara Schmidt, Jiangnan Shangguan, Jeffrey Mark Siskind, Jarrell
- Waggoner, Song Wang, Jinlian Wei, Yifan Yin, Zhiqi Zhang",Video In Sentences Out,cs.CV cs.CL cs.IR," We present a system that produces sentential descriptions of video: who did
-what to whom, and where and how they did it. Action class is rendered as a
-verb, participant objects as noun phrases, properties of those objects as
-adjectival modifiers in those noun phrases, spatial relations between those
-participants as prepositional phrases, and characteristics of the event as
-prepositional-phrase adjuncts and adverbial modifiers. Extracting the
-information needed to render these linguistic entities requires an approach to
-event recognition that recovers object tracks, the track-to-role assignments,
-and changing body posture.
-"
-1324,1408.6746,"Slobodan Beliga, Sanda Martin\v{c}i\'c-Ip\v{s}i\'c",Non-Standard Words as Features for Text Categorization,cs.CL cs.LG," This paper presents the categorization of Croatian texts using Non-Standard
-Words (NSW) as features. Non-Standard Words are: numbers, dates, acronyms,
-abbreviations, currency, etc. NSWs in the Croatian language are determined
-according to the Croatian NSW taxonomy. For the purpose of this research, 390
-text documents were collected and formed the SKIPEZ collection with 6 classes:
-official, literary, informative, popular, educational and scientific. The text
-categorization experiment was conducted on three different representations of
-the SKIPEZ collection: in the first representation, the frequencies of NSWs are
-used as features; in the second representation, the statistical measures of
-NSWs (variance, coefficient of variation, standard deviation, etc.) are used as
-features; while the third representation combines the first two feature sets.
-Naive Bayes, CN2, C4.5, kNN, Classification Trees and Random Forest algorithms
-were used in the text categorization experiments.
The best categorization results
-are achieved using the first feature set (NSW frequencies), with a
-categorization accuracy of 87%. This suggests that NSWs should be considered as
-features in highly inflectional languages, such as Croatian. NSW-based features
-reduce the dimensionality of the feature space without standard lemmatization
-procedures, and therefore the bag-of-NSWs should be considered for further
-Croatian text categorization experiments.
-"
-1325,1408.6762,Nikolaos Polatidis,Chatbot for admissions,cs.CY cs.CL," The communication of potential students with a university department is
-performed manually and it is a very time-consuming procedure. The opportunity
-to communicate on a one-to-one basis is highly valued. However, with many
-hundreds of applications each year, one-to-one conversations are not feasible
-in most cases. The communication will require a member of academic staff to
-expend several hours to find suitable answers and contact each student. It
-would be useful to reduce this cost and time.
- The project aims to reduce the burden on the head of admissions, and
-potentially other users, by developing a convincing chatbot. A suitable
-algorithm must be devised to search through the set of data and find a
-potential answer. The program then replies to the user and provides a relevant
-web link if the user is not satisfied by the answer. Furthermore, a web
-interface is provided for both users and an administrator.
- The achievements of the project can be summarised as follows. To prepare the
-background of the project a literature review was undertaken, together with an
-investigation of existing tools, and consultation with the head of admissions.
-The requirements of the system were established and a range of algorithms and
-tools were investigated, including keyword and template matching. An algorithm
-that combines keyword matching with string similarity has been developed. A
-usable system using the proposed algorithm has been implemented. The system was
-evaluated by keeping logs of questions and answers and by feedback received
-from potential students who used it.
-"
-1326,1408.6788,Julian Hough and Matthew Purver,Strongly Incremental Repair Detection,cs.CL," We present STIR (STrongly Incremental Repair detection), a system that
-detects speech repairs and edit terms on transcripts incrementally with minimal
-latency. STIR uses information-theoretic measures from n-gram models as its
-principal decision features in a pipeline of classifiers detecting the
-different stages of repairs. Results on the Switchboard disfluency tagged
-corpus show utterance-final accuracy on a par with state-of-the-art incremental
-repair detection methods, but with better incremental accuracy, faster
-time-to-detection and less computational overhead. We evaluate its performance
-using incremental metrics and propose new repair processing evaluation
-standards.
-"
-1327,1408.6988,"Zongcheng Ji, Zhengdong Lu, Hang Li",An Information Retrieval Approach to Short Text Conversation,cs.IR cs.CL," Human-computer conversation is regarded as one of the most difficult problems
-in artificial intelligence. In this paper, we address one of its key
-sub-problems, referred to as short text conversation, in which given a message
-from a human, the computer returns a reasonable response to the message. We
-leverage the vast amount of short conversation data available on social media
-to study the issue.
We propose formalizing short text conversation as a search
-problem as a first step, and employing state-of-the-art information retrieval
-(IR) techniques to carry out the task. We investigate the significance as well
-as the limitation of the IR approach. Our experiments demonstrate that the
-retrieval-based model can make the system behave rather ""intelligently"", when
-combined with a huge repository of conversation data from social media.
-"
-1328,1409.0314,Taraka Rama,Empirical Evaluation of Tree distances for Parser Evaluation,cs.CL," In this empirical study, I compare various tree distance measures --
-originally developed in computational biology for the purpose of tree
-comparison -- for the purpose of parser evaluation. I control for the parser
-setting by comparing the automatically generated parse trees from the
-state-of-the-art parser (Charniak, 2000) with the gold-standard parse trees.
-The article describes two different tree distance measures (RF and QD) along
-with their variants (GRF and GQD) for the purpose of parser evaluation. The
-article argues that the RF measure captures similar information as the standard
-EvalB metric (Sekine and Collins, 1997) and the tree edit distance (Zhang and
-Shasha, 1989) applied by Tsarfaty et al. (2011). Finally, the article also
-provides empirical evidence by reporting high correlations between the
-different tree distances and the EvalB metric's scores.
-"
-1329,1409.0473,Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio,Neural Machine Translation by Jointly Learning to Align and Translate,cs.CL cs.LG cs.NE stat.ML," Neural machine translation is a recently proposed approach to machine
-translation. Unlike the traditional statistical machine translation, the neural
-machine translation aims at building a single neural network that can be
-jointly tuned to maximize the translation performance. The models proposed
-recently for neural machine translation often belong to a family of
-encoder-decoders and consist of an encoder that encodes a source sentence into
-a fixed-length vector from which a decoder generates a translation. In this
-paper, we conjecture that the use of a fixed-length vector is a bottleneck in
-improving the performance of this basic encoder-decoder architecture, and
-propose to extend this by allowing a model to automatically (soft-)search for
-parts of a source sentence that are relevant to predicting a target word,
-without having to form these parts as a hard segment explicitly. With this new
-approach, we achieve a translation performance comparable to the existing
-state-of-the-art phrase-based system on the task of English-to-French
-translation. Furthermore, qualitative analysis reveals that the
-(soft-)alignments found by the model agree well with our intuition.
-"
-1330,1409.0915,H. Hernan Moraldo,An Approach for Text Steganography Based on Markov Chains,cs.MM cs.CL," A text steganography method based on Markov chains is introduced, together
-with a reference implementation. This method allows for information hiding in
-texts that are automatically generated following a given Markov model. Other
-Markov-based systems of this kind rely on large simplifications of the language
-model to work, which produces less natural-looking and more easily detectable
-texts. The method described here is designed to generate texts within a good
-approximation of the original language model provided.
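The hiding scheme described above can be illustrated with a minimal sketch:
walk a Markov chain and let each hidden bit pick which successor to emit, so
that a receiver holding the same model can read the bits back. The toy
two-successor bigram model and the one-bit-per-transition encoding below are
illustrative assumptions, not the paper's reference implementation:

    # Toy bigram model with exactly two successors per word; a real system
    # would estimate a much richer Markov model from a corpus.
    MODEL = {
        "the": ["cat", "dog"],
        "cat": ["sat", "ran"],
        "dog": ["sat", "ran"],
        "sat": ["quietly", "there"],
        "ran": ["quietly", "there"],
    }

    def embed(bits, start="the"):
        """Hide one bit per transition by choosing successor 0 or 1."""
        word, out = start, [start]
        for bit in bits:
            word = MODEL[word][bit]
            out.append(word)
        return " ".join(out)

    def extract(text):
        """Recover the bits by replaying the choices against the model."""
        words = text.split()
        return [MODEL[prev].index(cur) for prev, cur in zip(words, words[1:])]

    stego = embed([1, 0, 1])
    print(stego)           # "the dog sat there"
    print(extract(stego))  # [1, 0, 1]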
-"
-1331,1409.1257,"Jean Pouget-Abadie and Dzmitry Bahdanau and Bart van Merrienboer and
- Kyunghyun Cho and Yoshua Bengio","Overcoming the Curse of Sentence Length for Neural Machine Translation
- using Automatic Segmentation",cs.CL cs.LG cs.NE stat.ML," The authors of (Cho et al., 2014a) have shown that the recently introduced
-neural network translation systems suffer from a significant drop in
-translation quality when translating long sentences, unlike existing
-phrase-based translation systems. In this paper, we propose a way to address
-this issue by automatically segmenting an input sentence into phrases that can
-be easily translated by the neural network translation model. Once each segment
-has been independently translated by the neural machine translation model, the
-translated clauses are concatenated to form a final translation. Empirical
-results show a significant improvement in translation quality for long
-sentences.
-"
-1332,1409.1259,"Kyunghyun Cho and Bart van Merrienboer and Dzmitry Bahdanau and Yoshua
- Bengio","On the Properties of Neural Machine Translation: Encoder-Decoder
- Approaches",cs.CL stat.ML," Neural machine translation is a relatively new approach to statistical
-machine translation based purely on neural networks. The neural machine
-translation models often consist of an encoder and a decoder. The encoder
-extracts a fixed-length representation from a variable-length input sentence,
-and the decoder generates a correct translation from this representation. In
-this paper, we focus on analyzing the properties of the neural machine
-translation using two models: the RNN Encoder--Decoder and a newly proposed
-gated recursive convolutional neural network. We show that the neural machine
-translation performs relatively well on short sentences without unknown words,
-but its performance degrades rapidly as the length of the sentence and the
-number of unknown words increase. Furthermore, we find that the proposed gated
-recursive convolutional network learns a grammatical structure of a sentence
-automatically.
-"
-1333,1409.1612,Andrey Kutuzov,"Semantic clustering of Russian web search results: possibilities and
- problems",cs.CL cs.IR," The paper deals with word sense induction from lexical co-occurrence graphs.
-We construct such graphs on large Russian corpora and then apply these data to
-cluster Mail.ru Search results according to the meanings of the query. We
-compare different methods of performing such clustering and different source
-corpora. Models of applying distributional semantics to big linguistic data are
-described.
-"
-1334,1409.1744,"V.A. Traag, R. Reinanda and G. van Klinken",Structure of a media co-occurrence network,physics.soc-ph cs.CL cs.SI," Social networks have been of much interest in recent years. We here focus on
-a network structure derived from co-occurrences of people in traditional
-newspaper media. We find three clear deviations from what can be expected in a
-random graph. First, the average degree in the empirical network is much lower
-than expected, and the average weight of a link much higher than expected.
-Secondly, high-degree nodes attract a disproportionate amount of weight.
-Thirdly, a relatively large share of the weight seems to concentrate between
-high-degree nodes. We believe this can be explained by the fact that most
-people tend to co-occur repeatedly with the same people.
We create a model that replicates these
-observations qualitatively based on two self-reinforcing processes: (1) more
-frequently occurring persons are more likely to occur again; and (2) if two
-people co-occur frequently, they are more likely to co-occur again. This
-suggests that the media tend to focus on people who are already in the news,
-and that they reinforce existing co-occurrences.
-"
-1335,1409.2073,Tobias Kortkamp,An NLP Assistant for Clide,cs.CL," This report describes an NLP assistant for the collaborative development
-environment Clide, which supports the development of NLP applications by
-providing easy access to some common NLP data structures. The assistant
-visualizes text fragments and their dependencies by displaying the semantic
-graph of a sentence, the coreference chain of a paragraph and mined triples
-that are extracted from a paragraph's semantic graphs and linked using its
-coreference chain. Using this information and a logic programming library, we
-create an NLP database which is used by a series of queries to mine the
-triples. The algorithm is tested by translating a natural language text
-describing a graph to an actual graph that is shown as an annotation in the
-text editor.
-"
-1336,1409.2195,"Daniel Fried, Mihai Surdeanu, Stephen Kobourov, Melanie Hingle, Dane
- Bell",Analyzing the Language of Food on Social Media,cs.CL cs.CY cs.SI," We investigate the predictive power behind the language of food on social
-media. We collect a corpus of over three million food-related posts from
-Twitter and demonstrate that many latent population characteristics can be
-directly predicted from this data: overweight rate, diabetes rate, political
-leaning, and home geographical location of authors. For all tasks, our
-language-based models significantly outperform the majority-class baselines.
-Performance is further improved with more complex natural language processing,
-such as topic modeling. We analyze which textual features have most predictive
-power for these datasets, providing insight into the connections between the
-language of food, geographic locale, and community characteristics. Lastly, we
-design and implement an online system for real-time query and visualization of
-the dataset. Visualization tools, such as geo-referenced heatmaps,
-semantics-preserving wordclouds and temporal histograms, allow us to discover
-more complex, global patterns mirrored in the language of food.
-"
-1337,1409.2433,"Antonina Kolokolova, Renesa Nizamee","Approximating solution structure of the Weighted Sentence Alignment
- problem",cs.CL cs.CC cs.DS," We study the complexity of approximating the solution structure of the
-bijective weighted sentence alignment problem of DeNero and Klein (2008). In
-particular, we consider the complexity of finding an alignment that has a
-significant overlap with an optimal alignment. We discuss ways of representing
-the solution for the general weighted sentence alignment problem as well as the
-phrases-to-words alignment problem, and show that computing a string which
-agrees with the optimal sentence partition on more than half (plus an
-arbitrarily small polynomial fraction) of the positions for the
-phrases-to-words alignment is NP-hard. For the general weighted sentence
-alignment we obtain such a bound from the agreement on a little over 2/3 of the
-bits. Additionally, we generalize the Hamming distance approximation of a
-solution structure to approximating it with respect to the edit distance
-metric, obtaining similar lower bounds.
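To make the positional-agreement quantity in the hardness results above
concrete, here is a minimal helper, assuming solutions are encoded as
equal-length strings (an illustrative representation only); the results above
state that computing a string agreeing on more than the stated fractions of
positions is NP-hard:

    def agreement(candidate, optimal):
        """Fraction of positions on which two equal-length solution strings
        agree; this is the quantity the hardness results bound (just over 1/2
        for phrases-to-words, a little over 2/3 for the general problem)."""
        assert len(candidate) == len(optimal)
        hits = sum(c == o for c, o in zip(candidate, optimal))
        return hits / len(optimal)

    print(agreement("0110100", "0110111"))  # 5/7, about 0.714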
-"
-1338,1409.2450,"Robert West, Hristo S. Paskov, Jure Leskovec, Christopher Potts","Exploiting Social Network Structure for Person-to-Person Sentiment
- Analysis",cs.SI cs.CL physics.soc-ph," Person-to-person evaluations are prevalent in all kinds of discourse and
-important for establishing reputations, building social bonds, and shaping
-public opinion. Such evaluations can be analyzed separately using signed social
-networks and textual sentiment analysis, but this misses the rich interactions
-between language and social context. To capture such interactions, we develop a
-model that predicts individual A's opinion of individual B by synthesizing
-information from the signed social network in which A and B are embedded with
-sentiment analysis of the evaluative texts relating A to B. We prove that this
-problem is NP-hard but can be relaxed to an efficiently solvable hinge-loss
-Markov random field, and we show that this implementation outperforms text-only
-and network-only versions in two very different datasets involving
-community-level decision-making: the Wikipedia Requests for Adminship corpus
-and the Convote U.S. Congressional speech corpus.
-"
-1339,1409.2944,Hao Wang and Naiyan Wang and Dit-Yan Yeung,Collaborative Deep Learning for Recommender Systems,cs.LG cs.CL cs.IR cs.NE stat.ML," Collaborative filtering (CF) is a successful approach commonly used by many
-recommender systems. Conventional CF-based methods use the ratings given to
-items by users as the sole source of information for learning to make
-recommendations. However, the ratings are often very sparse in many
-applications, causing CF-based methods to degrade significantly in their
-recommendation performance. To address this sparsity problem, auxiliary
-information such as item content information may be utilized. Collaborative
-topic regression (CTR) is an appealing recent method taking this approach which
-tightly couples the two components that learn from two different sources of
-information. Nevertheless, the latent representation learned by CTR may not be
-very effective when the auxiliary information is very sparse. To address this
-problem, we generalize recent advances in deep learning from i.i.d. input to
-non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian
-model called collaborative deep learning (CDL), which jointly performs deep
-representation learning for the content information and collaborative filtering
-for the ratings (feedback) matrix. Extensive experiments on three real-world
-datasets from different domains show that CDL can significantly advance the
-state of the art.
-"
-1340,1409.2993,"Jian Tang, Ming Zhang, Qiaozhu Mei","""Look Ma, No Hands!"" A Parameter-Free Topic Model",cs.LG cs.CL cs.IR," It has always been a burden to the users of statistical topic models to
-predetermine the right number of topics, which is a key parameter of most topic
-models. Conventionally, automatic selection of this parameter is done through
-either statistical model selection (e.g., cross-validation, AIC, or BIC) or
-Bayesian nonparametric models (e.g., hierarchical Dirichlet process). These
-methods either rely on repeated runs of the inference algorithm to search
-through a large range of parameter values, which does not suit the mining of
-big data, or replace this parameter with alternative parameters that are less
-intuitive and still hard to determine. In this paper, we explore
-""eliminating"" this parameter from a new perspective.
We first present a
-nonparametric treatment of the PLSA model named nonparametric probabilistic
-latent semantic analysis (nPLSA). The inference procedure of nPLSA allows for
-the exploration and comparison of different numbers of topics within a single
-execution, yet remains as simple as that of PLSA. This is achieved by
-substituting the parameter of the number of topics with an alternative
-parameter that is the minimal goodness of fit of a document. We show that the
-new parameter can be further eliminated by two parameter-free treatments:
-either by monitoring the diversity among the discovered topics or by a weak
-supervision from users in the form of an exemplar topic. The parameter-free
-topic model finds the appropriate number of topics when the diversity among the
-discovered topics is maximized, or when the granularity of the discovered
-topics matches the exemplar topic. Experiments on both synthetic and real data
-prove that the parameter-free topic model extracts topics of comparable quality
-to classical topic models with ""manual transmission"". The quality of the
-topics also surpasses that of topics extracted through classical Bayesian
-nonparametric models.
-"
-1341,1409.3005,"Abdelkader El Mahdaouy, Sa\""id EL Alaoui Ouatik and Eric Gaussier","A Study of Association Measures and their Combination for Arabic MWT
- Extraction",cs.CL," Automatic Multi-Word Term (MWT) extraction is a very important issue for many
-applications, such as information retrieval, question answering, and text
-categorization. Although many methods have been used for MWT extraction in
-English and other European languages, few studies have been applied to Arabic.
-In this paper, we propose a novel, hybrid method which combines linguistic and
-statistical approaches for Arabic Multi-Word Term extraction. The main
-contribution of our method is to consider contextual information and both
-termhood and unithood for association measures at the statistical filtering
-step. In addition, our technique takes into account the problem of MWT
-variation in the linguistic filtering step. The performance of the proposed
-statistical measure (NLC-value) is evaluated using an Arabic environment corpus
-by comparing it with some existing competitors. Experimental results show that
-our NLC-value measure outperforms the other ones in terms of precision for both
-bi-grams and tri-grams.
-"
-1342,1409.3215,Ilya Sutskever and Oriol Vinyals and Quoc V. Le,Sequence to Sequence Learning with Neural Networks,cs.CL cs.LG," Deep Neural Networks (DNNs) are powerful models that have achieved excellent
-performance on difficult learning tasks. Although DNNs work well whenever large
-labeled training sets are available, they cannot be used to map sequences to
-sequences. In this paper, we present a general end-to-end approach to sequence
-learning that makes minimal assumptions on the sequence structure. Our method
-uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to
-a vector of a fixed dimensionality, and then another deep LSTM to decode the
-target sequence from the vector. Our main result is that on an English to
-French translation task from the WMT'14 dataset, the translations produced by
-the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's
-BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did
-not have difficulty on long sentences. For comparison, a phrase-based SMT
-system achieves a BLEU score of 33.3 on the same dataset.
When we used the LSTM
-to rerank the 1000 hypotheses produced by the aforementioned SMT system, its
-BLEU score increased to 36.5, which is close to the previous best result on
-this task. The LSTM also learned sensible phrase and sentence representations
-that are sensitive to word order and are relatively invariant to the active and
-the passive voice. Finally, we found that reversing the order of the words in
-all source sentences (but not target sentences) improved the LSTM's performance
-markedly, because doing so introduced many short-term dependencies between the
-source and the target sentence which made the optimization problem easier.
-"
-1343,1409.3512,"Udaya Raj Dhungana, Subarna Shakya, Kabita Baral and Bharat Sharma",Word Sense Disambiguation using WSD specific Wordnet of Polysemy Words,cs.CL," This paper presents a new model of WordNet that is used to disambiguate the
-correct sense of a polysemy word based on clue words. The related words for
-each sense of a polysemy word, as well as of a single-sense word, are referred
-to as the clue words. The conventional WordNet organizes nouns, verbs,
-adjectives and adverbs together into sets of synonyms called synsets, each
-expressing a different concept. In contrast to the structure of WordNet, we
-developed a new model of WordNet that organizes the different senses of
-polysemy words as well as the single-sense words based on the clue words. These
-clue words for each sense of a polysemy word, as well as for a single-sense
-word, are used to disambiguate the correct meaning of the polysemy word in the
-given context using knowledge-based Word Sense Disambiguation (WSD) algorithms.
-The clue word can be a noun, verb, adjective or adverb.
-"
-1344,1409.3813,Yannick Versley,"Incorporating Semi-supervised Features into Discontinuous Easy-First
- Constituent Parsing",cs.CL," This paper describes adaptations of EaFi, a parser for easy-first parsing of
-discontinuous constituents, to multiple languages, as well as its use of the
-unlabeled data that was provided as part of the SPMRL shared task 2014.
-"
-1345,1409.3870,"Jake Ryland Williams, James P. Bagrow, Christopher M. Danforth, and
- Peter Sheridan Dodds","Text mixing shapes the anatomy of rank-frequency distributions: A modern
- Zipfian mechanics for natural language",cs.CL physics.soc-ph," Natural languages are full of rules and exceptions. One of the most famous
-quantitative rules is Zipf's law, which states that the frequency of occurrence
-of a word is approximately inversely proportional to its rank. Though this
-`law' of ranks has been found to hold across disparate texts and forms of data,
-analyses of increasingly large corpora over the last 15 years have revealed the
-existence of two scaling regimes. These regimes have thus far been explained by
-a hypothesis suggesting a separability of languages into core and non-core
-lexica. Here, we present and defend an alternative hypothesis, that the two
-scaling regimes result from the act of aggregating texts. We observe that text
-mixing leads to an effective decay of word introduction, which we show provides
-accurate predictions of the location and severity of breaks in scaling. Upon
-examining large corpora from 10 languages in the Project Gutenberg eBooks
-collection (eBooks), we find emphatic empirical support for the universality of
-our claim.
-"
-1346,1409.3881,Michael Bloodgood and K.
Vijay-Shanker,An Approach to Reducing Annotation Costs for BioNLP,cs.CL cs.LG stat.ML," There is a broad range of BioNLP tasks for which active learning (AL) can
-significantly reduce annotation costs and a specific AL algorithm we have
-developed is particularly effective in reducing annotation costs for these
-tasks. We have previously developed an AL algorithm called ClosestInitPA that
-works best with tasks that have the following characteristics: redundancy in
-training material, burdensome annotation costs, Support Vector Machines (SVMs)
-work well for the task, and imbalanced datasets (i.e. when set up as a binary
-classification problem, one class is substantially rarer than the other). Many
-BioNLP tasks have these characteristics and thus our AL algorithm is a natural
-approach to apply to BioNLP tasks.
-"
-1347,1409.3942,"Richa Sharma, Shweta Nigam, Rekha Jain",Polarity detection of movie reviews in Hindi language,cs.CL cs.IR," Nowadays people are actively involved in giving comments and reviews on
-social networking websites and other websites such as shopping and news
-websites. A large number of people share their opinions on the Web every day,
-and as a result a large amount of user data is collected. Users find it a
-tedious task to read all the reviews before reaching a decision. It would be
-better if these reviews were classified into categories so that they are easier
-for the user to read. Opinion Mining or Sentiment Analysis is a natural
-language processing task that mines information from various text forms such as
-reviews, news, and blogs and classifies them on the basis of their polarity as
-positive, negative or neutral. Over the last few years, user content in the
-Hindi language has also been increasing at a rapid rate on the Web, so it is
-very important to perform opinion mining in Hindi as well. In this paper a
-Hindi-language opinion mining system is proposed. The system classifies Hindi
-reviews as positive, negative or neutral. Negation is also handled in the
-proposed system. Experimental results using reviews of movies show the
-effectiveness of the system.
-"
-1348,1409.4169,Rama N. and Meenakshi Lakshmanan,"An Algorithm Based on Empirical Methods, for Text-to-Tuneful-Speech
- Synthesis of Sanskrit Verse",cs.CL," The rendering of Sanskrit poetry from text to speech is a problem that has
-not been solved before. One reason may be the complications in the language
-itself. We present unique algorithms based on extensive empirical analysis to
-synthesize speech from a given text input of Sanskrit verses. Using a
-pre-recorded audio units database, which is itself tremendously reduced in size
-compared to the colossal size that would otherwise be required, the algorithms
-work on producing the best possible, tunefully rendered chanting of the given
-verse. This would enable the visually impaired and those with reading
-disabilities to easily access the contents of Sanskrit verses otherwise
-available only in writing.
-"
-1349,1409.4354,"S. V. Kasmir Raja, V. Rajitha, Meenakshi Lakshmanan","A Binary Schema and Computational Algorithms to Process Vowel-based
- Euphonic Conjunctions for Word Searches",cs.CL," Comprehensively searching for words in Sanskrit E-text is a non-trivial
-problem because words could change their forms in different contexts. One such
-context is sandhi or euphonic conjunctions, which cause a word to change owing
-to the presence of adjacent letters or words.
The change wrought by these
-possible conjunctions can be so significant in Sanskrit that a simple search
-for the word in its given form alone can significantly reduce the success level
-of the search. This work presents a representational schema that represents
-letters in a binary format and reduces Paninian rules of euphonic conjunctions
-to simple bit set-unset operations. The work presents an efficient algorithm to
-process vowel-based sandhis using this schema. It further presents another
-algorithm that uses the sandhi processor to generate the possible transformed
-word forms of a given word to use in a comprehensive word search.
-"
-1350,1409.4364,"S. V. Kasmir Raja, V. Rajitha, Meenakshi Lakshmanan","Computational Algorithms Based on the Paninian System to Process
- Euphonic Conjunctions for Word Searches",cs.CL," Searching for words in Sanskrit E-text is a problem that is accompanied by
-complexities introduced by features of Sanskrit such as euphonic conjunctions
-or sandhis. A word could occur in an E-text in a transformed form owing to the
-operation of rules of sandhi. Simple word search would not yield these
-transformed forms of the word. Further, there is no search engine in the
-literature that can comprehensively search for words in Sanskrit E-texts taking
-euphonic conjunctions into account. This work presents an optimal binary
-representational schema for letters of the Sanskrit alphabet along with
-algorithms to efficiently process the sandhi rules of Sanskrit grammar. The
-work further presents an algorithm that uses the sandhi processing algorithm to
-perform a comprehensive word search on E-text.
-"
-1351,1409.4504,Tao Wang and Hua Zhu,Voting for Deceptive Opinion Spam Detection,cs.CL cs.SI," Consumers' purchase decisions are increasingly influenced by user-generated
-online reviews. Accordingly, there has been growing concern about the potential
-for posting deceptive opinion spam: fictitious reviews that have been
-deliberately written to sound authentic in order to deceive readers. Existing
-approaches mainly focus on developing automatic supervised-learning-based
-methods to help users identify deceptive opinion spam.
- In this work, we use the LSI and Sprinkled LSI techniques to reduce the
-dimensionality for deception detection. Our contribution is to demonstrate what
-LSI captures in the latent semantic space and to reveal how deceptive opinions
-can be distinguished automatically from truthful ones. Finally, we propose a
-voting scheme which integrates different approaches to further improve the
-classification performance.
-"
-1352,1409.4614,Bilal Ahmed,Lexical Normalisation of Twitter Data,cs.CL," Twitter, with over 500 million users globally, generates over 100,000 tweets
-per minute. The 140-character limit per tweet, perhaps unintentionally,
-encourages users to use shorthand notations and to strip spellings to their
-bare minimum ""syllables"" or elisions, e.g. ""srsly"". The analysis of Twitter
-messages, which typically contain misspellings, elisions, and grammatical
-errors, poses a challenge to established Natural Language Processing (NLP)
-tools, which are generally designed with the assumption that the data conforms
-to the basic grammatical structure commonly used in English language. In order
-to make sense of Twitter messages it is necessary to first transform them into
-a canonical form, consistent with the dictionary or grammar. This process,
-performed at the level of individual tokens (""words""), is called lexical
-normalisation.
This paper investigates various techniques for lexical
-normalisation of Twitter data and presents the findings as the techniques are
-applied to process raw data from Twitter.
-"
-1353,1409.4617,Ronald Hochreiter and Christoph Waldhauser,The Role of Emotions in Propagating Brands in Social Networks,cs.SI cs.CL stat.ML," A key aspect of word-of-mouth marketing is emotions. Emotions in texts help
-propagate messages in conventional advertising. In word-of-mouth scenarios,
-emotions help to engage consumers and encourage them to propagate the message
-further. While the function of emotions in offline marketing in general and
-word-of-mouth marketing in particular is rather well understood, online
-marketing can only offer a limited view on the function of emotions. In this
-contribution we seek to close this gap. We therefore investigate how emotions
-function in social media. To do so, we collected more than 30,000 brand
-marketing messages from the Google+ social networking site. Using
-state-of-the-art computational linguistics classifiers, we compute the
-sentiment of these messages. Starting out with Poisson regression-based
-baseline models, we seek to replicate earlier findings using this large data
-set. We extend upon earlier research by computing multi-level mixed effects
-models that compare the function of emotions across different industries. We
-find that the well-known notion of activating emotions propagating messages
-holds in general for our data as well, but there are significant differences
-between the observed industries.
-"
-1354,1409.4714,"Andrzej Kulig, Stanislaw Drozdz, Jaroslaw Kwapien, Pawel Oswiecimka","Modeling the average shortest path length in growth of word-adjacency
- networks",cs.CL physics.soc-ph," We investigate properties of evolving linguistic networks defined by the
-word-adjacency relation. Such networks belong to the category of networks with
-accelerated growth, but their shortest path length appears to reveal a
-network-size dependence of a different functional form than the ones known so
-far. We thus compare the networks created from literary texts with their
-artificial substitutes based on different variants of the Dorogovtsev-Mendes
-model and observe that none of them is able to properly simulate the novel
-asymptotics of the shortest path length. Then, we identify the local chain-like
-linear growth induced by grammar and style as a missing element in this model
-and extend it by incorporating such effects. It is in this way that a
-satisfactory agreement with the empirical result is obtained.
-"
-1355,1409.4835,Michael Bloodgood and K. Vijay-Shanker,"Taking into Account the Differences between Actively and Passively
- Acquired Data: The Case of Active Learning with Support Vector Machines for
- Imbalanced Datasets",cs.LG cs.CL stat.ML," Actively sampled data can have very different characteristics than passively
-sampled data. Therefore, it is promising to investigate using different
-inference procedures during AL than are used during passive learning (PL). This
-general idea is explored in detail for the focused case of AL with
-cost-weighted SVMs for imbalanced data, a situation that arises for many HLT
-tasks. The key idea behind the proposed InitPA method for addressing imbalance
-is to base cost models during AL on an estimate of overall corpus imbalance
-computed via a small unbiased sample rather than the imbalance in the labeled
-training data, which is the leading method used during PL.
-"
-1356,1409.5165,Michael Bloodgood and K.
Vijay-Shanker,"A Method for Stopping Active Learning Based on Stabilizing Predictions
- and the Need for User-Adjustable Stopping",cs.LG cs.CL stat.ML," A survey of existing methods for stopping active learning (AL) reveals the
-needs for methods that are: more widely applicable; more aggressive in saving
-annotations; and more stable across changing datasets. A new method for
-stopping AL based on stabilizing predictions is presented that addresses these
-needs. Furthermore, stopping methods are required to handle a broad range of
-different annotation/performance tradeoff valuations. Despite this, the
-existing body of work is dominated by conservative methods with little (if any)
-attention paid to providing users with control over the behavior of stopping
-methods. The proposed method is shown to fill a gap in the level of
-aggressiveness available for stopping AL and supports providing users with
-control over stopping behavior.
-"
-1357,1409.5502,"Alexander Kalinin, George Savchenko","Using crowdsourcing system for creating site-specific statistical
- machine translation engine",cs.CL," A crowdsourcing translation approach is an effective tool for globalization
-of site content, but it is also an important source of parallel linguistic
-data. For a given site, processed with a crowdsourcing system, a
-sentence-aligned corpus can be fetched, which covers a very narrow domain of
-terminology and language patterns - a site-specific domain. These data can be
-used for the training and estimation of a site-specific statistical machine
-translation engine.
-"
-1358,1409.5623,"Samuel R\""onnqvist, Xiaolu Wang, Peter Sarlin",Interactive Visual Exploration of Topic Models using Graphs,cs.IR cs.CL," Probabilistic topic modeling is a popular and powerful family of tools for
-uncovering thematic structure in large sets of unstructured text documents.
-While much attention has been directed towards the modeling algorithms and
-their various extensions, comparatively few studies have concerned how to
-present or visualize topic models in meaningful ways. In this paper, we present
-a novel design that uses graphs to visually communicate topic structure and
-meaning. By connecting topic nodes via descriptive keyterms, the graph
-representation reveals topic similarities, topic meaning and shared, ambiguous
-keyterms. At the same time, the graph can be used for information retrieval
-purposes, to find documents by topic or topic subsets. To exemplify the utility
-of the design, we illustrate its use for organizing and exploring corpora of
-financial patents.
-"
-1359,1409.7085,"Kathryn Baker, Michael Bloodgood, Chris Callison-Burch, Bonnie J.
- Dorr, Nathaniel W. Filardo, Lori Levin, Scott Miller and Christine Piatko","Semantically-Informed Syntactic Machine Translation: A Tree-Grafting
- Approach",cs.CL cs.LG stat.ML," We describe a unified and coherent syntactic framework for supporting a
-semantically-informed syntactic approach to statistical machine translation.
-Semantically enriched syntactic tags assigned to the target-language training
-texts improved translation quality. The resulting system significantly
-outperformed a linguistically naive baseline model (Hiero), and reached the
-highest scores yet reported on the NIST 2009 Urdu-English translation task.
-This finding supports the hypothesis (posed by many researchers in the MT
-community, e.g., in DARPA GALE) that both syntactic and semantic information
-are critical for improving translation quality---and further demonstrates that
-large gains can be achieved for low-resource languages with different word
-order than English.
-"
-1360,1409.7275,Ramon Ferrer-i-Cancho,"The meaning-frequency law in Zipfian optimization models of
- communication",cs.CL physics.data-an physics.soc-ph," According to Zipf's meaning-frequency law, words that are more frequent tend
-to have more meanings. Here it is shown that a linear dependency between the
-frequency of a form and its number of meanings is found in a family of models
-of Zipf's law for word frequencies. This is evidence for a weak version of the
-meaning-frequency law. Interestingly, that weak law (a) is not an inevitable
-property of the assumptions of the family and (b) is found at least in the
-narrow regime where those models exhibit Zipf's law for word frequencies.
-"
-1361,1409.7336,"Juan Pablo C\'ardenas, Iv\'an Gonz\'alez, Gerardo Vidal, Miguel
- Fuentes",Does network complexity help organize Babel's library?,physics.soc-ph cs.CL nlin.AO physics.data-an," In this work, we study properties of texts from the perspective of complex
-network theory. Words in given texts are linked by co-occurrence and
-transformed into networks, and we observe that these display topological
-properties common to other complex systems. However, there are some properties
-that seem to be exclusive to texts; many of these properties depend on the
-frequency of words in the text, while others seem to be strictly determined by
-the grammar. Precisely, these properties allow for a categorization of texts as
-either meaningful, encoded, or senseless.
-"
-1362,1409.7386,Rushdi Shams,Performance of Stanford and Minipar Parser on Biomedical Texts,cs.CL," In this paper, the performance of two dependency parsers, namely Stanford and
-Minipar, on biomedical texts is reported. The performance of the parsers in
-assigning dependencies between two biomedical concepts that are already proved
-to be connected is not satisfactory. Both Stanford and Minipar, being
-statistical parsers, fail to assign a dependency relation between two connected
-concepts if they are separated by at least one clause. Minipar's performance in
-parsing biomedical text, in terms of precision, recall and the F-score of the
-attachment score (e.g., a correctly identified head in a dependency), is also
-measured, taking Stanford's output as a gold standard. The results suggest that
-Minipar is not yet suitable for parsing biomedical texts. In addition, a
-qualitative investigation reveals that the difference between the working
-principles of the parsers also plays a vital role in Minipar's degraded
-performance.
-"
-1363,1409.7591,Arun S. Maiya and Robert M. Rolfe,Topic Similarity Networks: Visual Analytics for Large Document Sets,cs.CL cs.HC cs.IR cs.SI stat.ML," We investigate ways in which to improve the interpretability of LDA topic
-models by better analyzing and visualizing their outputs. We focus on examining
-what we refer to as topic similarity networks: graphs in which nodes represent
-latent topics in text collections and links represent similarity among topics.
-We describe efficient and effective approaches to both building and labeling
-such networks.
Visualizations of topic models based on these networks are shown
-to be a powerful means of exploring, characterizing, and summarizing large
-collections of unstructured text documents. They help to ""tease out""
-non-obvious connections among different sets of documents and provide insights
-into how topics form larger themes. We demonstrate the efficacy and
-practicality of these approaches through two case studies: 1) NSF grants for
-basic research spanning a 14-year period and 2) the entire English portion of
-Wikipedia.
-"
-1364,1409.7612,Rushdi Shams,Semi-supervised Classification for Natural Language Processing,cs.CL cs.LG," Semi-supervised classification is an interesting idea where classification
-models are learned from both labeled and unlabeled data. It has several
-advantages over supervised classification in the natural language processing
-domain. For instance, supervised classification exploits only labeled data that
-are expensive, often difficult to get, inadequate in quantity, and require
-human experts for annotation. On the other hand, unlabeled data are inexpensive
-and abundant. Despite the fact that many factors limit the widespread use of
-semi-supervised classification, it has become popular since its level of
-performance is empirically as good as that of supervised classification. This
-study explores the possibilities and achievements as well as the complexity and
-limitations of semi-supervised classification for several natural language
-processing tasks such as parsing, biomedical information processing, text
-classification, and summarization.
-"
-1365,1409.7619,"Ekaterina Ovchinnikova, Vladimir Zaytsev, Suzanne Wertheim, Ross
- Israel",Generating Conceptual Metaphors from Proposition Stores,cs.CL," Contemporary research on computational processing of linguistic metaphors is
-divided into two main branches: metaphor recognition and metaphor
-interpretation. We take a different line of research and present an automated
-method for generating conceptual metaphors from linguistic data. Given the
-generated conceptual metaphors, we find corresponding linguistic metaphors in
-corpora. In this paper, we describe our approach and its evaluation using
-English and Russian data.
-"
-1366,1409.7985,Yanchuan Sim and Bryan Routledge and Noah A. Smith,The Utility of Text: The Case of Amicus Briefs and the Supreme Court,cs.CL cs.AI cs.GT cs.LG," We explore the idea that authoring a piece of text is an act of maximizing
-one's expected utility. To make this idea concrete, we consider the societally
-important decisions of the Supreme Court of the United States. Extensive past
-work in quantitative political science provides a framework for empirically
-modeling the decisions of justices and how they relate to text. We incorporate
-into such a model texts authored by amici curiae (""friends of the court""
-separate from the litigants) who seek to weigh in on the decision, then
-explicitly model their goals in a random utility model. We demonstrate the
-benefits of this approach in improved vote prediction and the ability to
-perform counterfactual analysis.
-"
-1367,1409.8008,"Arjun Das, Utpal Garain",CRF-based Named Entity Recognition @ICON 2013,cs.CL," This paper describes the performance of CRF-based systems for Named Entity
-Recognition (NER) in Indian languages as part of the ICON 2013 shared task. In
-this task we have considered a set of language-independent features for all the
-languages. Only for English a language-specific feature, i.e. capitalization,
-has been added.
Next, the use of gazetteers is explored for Bengali, Hindi and
-English. The gazetteers are built from Wikipedia and other sources. Test
-results show that the system achieves the highest F measure of 88% for English
-and the lowest F measure of 69% for both Tamil and Telugu. Note that for the
-two least-performing languages no gazetteer was used. NER in Bengali and Hindi
-achieves an accuracy (F measure) of 87% and 79%, respectively.
-"
-1368,1409.8152,"Yelena Mejova, Amy X. Zhang, Nicholas Diakopoulos, Carlos Castillo",Controversy and Sentiment in Online News,cs.CY cs.CL," How do news sources tackle controversial issues? In this work, we take a
-data-driven approach to understand how controversy interplays with emotional
-expression and biased language in the news. We begin by introducing a new
-dataset of controversial and non-controversial terms collected using
-crowdsourcing. Then, focusing on 15 major U.S. news outlets, we compare
-millions of articles discussing controversial and non-controversial issues over
-a span of 7 months. We find that in general, when it comes to controversial
-issues, the use of negative affect and biased language is prevalent, while the
-use of strong emotion is tempered. We also observe many differences across news
-sources. Using these findings, we show that we can indicate to what extent an
-issue is controversial, by comparing it with other issues in terms of how they
-are portrayed across different media.
-"
-1369,1409.8309,"Youssef Hassan, Mohamed Aly and Amir Atiya",Arabic Spelling Correction using Supervised Learning,cs.LG cs.CL," In this work, we address the problem of spelling correction in the Arabic
-language utilizing the new corpus provided by the QALB (Qatar Arabic Language
-Bank) project, which is an annotated corpus of sentences with errors and their
-corrections. The corpus contains edit, add before, split, merge, add after,
-move and other error types. We are concerned with the first four error types as
-they contribute more than 90% of the spelling errors in the corpus. The
-proposed system has a separate model to address each error type on its own; all
-the models are then integrated to provide an efficient and robust system that
-achieves an overall recall of 0.59, a precision of 0.58 and an F1 score of 0.58
-over all the error types on the development set. Our system participated in the
-QALB 2014 shared task ""Automatic Arabic Error Correction"" and achieved an F1
-score of 0.6, earning sixth place out of nine participants.
-"
-1370,1409.8484,"Christian Napoli, Giuseppe Pappalardo, Emiliano Tramontana","An agent-driven semantical identifier using radial basis neural networks
- and reinforcement learning",cs.NE cs.AI cs.CL cs.LG cs.MA," Due to the huge availability of documents in digital form, and the
-possibility of deception bound up with the essence of digital documents and the
-way they are spread, the authorship attribution problem has constantly
-increased in relevance. Nowadays, authorship attribution, for both information
-retrieval and analysis, has gained great importance in the context of security,
-trust and copyright preservation. This work proposes an innovative multi-agent
-driven machine learning technique that has been developed for authorship
-attribution.
-By means of a preprocessing step for word grouping and time-period-related
-analysis of the common lexicon, we determine a bias reference level for the
-recurrence frequency of the words within the analysed texts, and then train a
-Radial Basis Neural Network (RBPNN)-based classifier to identify the correct
-author. The main advantage of the proposed approach lies in the generality of
-the semantic analysis, which can be applied to different contexts and lexical
-domains, without requiring any modification. Moreover, the proposed system is
-able to incorporate an external input, meant to tune the classifier, and then
-self-adjust by means of continuous learning reinforcement.
-"
-1371,1409.8558,Prasanna Kumar Muthukumar and Alan W. Black,"A Deep Learning Approach to Data-driven Parameterizations for
- Statistical Parametric Speech Synthesis",cs.CL cs.LG cs.NE," Nearly all Statistical Parametric Speech Synthesizers today use Mel Cepstral
-coefficients as the vocal tract parameterization of the speech signal. Mel
-Cepstral coefficients were never intended to work in a parametric speech
-synthesis framework, but as yet, there has been little success in creating a
-better parameterization that is more suited to synthesis. In this paper, we use
-deep learning algorithms to investigate a data-driven parameterization
-technique that is designed for the specific requirements of synthesis. We
-create an invertible, low-dimensional, noise-robust encoding of the Mel Log
-Spectrum by training a tapered Stacked Denoising Autoencoder (SDA). This SDA is
-then unwrapped and used as the initialization for a Multi-Layer Perceptron
-(MLP). The MLP is fine-tuned by training it to reconstruct the input at the
-output layer. This MLP is then split down the middle to form encoding and
-decoding networks. These networks produce a parameterization of the Mel Log
-Spectrum that is intended to better fulfill the requirements of synthesis.
-Results are reported for experiments conducted using this resulting
-parameterization with the ClusterGen speech synthesizer.
-"
-1372,1409.8581,"M. Anand Kumar, V. Dhanalakshmi, K. P. Soman and V. Sharmiladevi","Improving the Performance of English-Tamil Statistical Machine
- Translation System using Source-Side Pre-Processing",cs.CL," Machine Translation is one of the oldest and most active research areas in
-Natural Language Processing. Currently, Statistical Machine Translation (SMT)
-dominates Machine Translation research. Statistical Machine Translation is an
-approach to Machine Translation which uses models to learn translation patterns
-directly from data, and generalize them to translate a new unseen text. The SMT
-approach is largely language independent, i.e. the models can be applied to any
-language pair. Statistical Machine Translation (SMT) attempts to generate
-translations using statistical methods based on bilingual text corpora. Where
-such corpora are available, excellent results can be attained translating
-similar texts, but such corpora are still not available for many language
-pairs. Statistical Machine Translation systems, in general, have difficulty in
-handling the morphology on the source or the target side, especially for
-morphologically rich languages. Errors in morphology or syntax in the target
-language can have severe consequences on the meaning of the sentence. They
-change the grammatical function of words or the understanding of the sentence
-through incorrect tense information in the verb.
Baseline SMT, also
-known as the Phrase Based Statistical Machine Translation (PBSMT) system, does
-not use any linguistic information and operates only on surface word forms.
-Recent research has shown that adding linguistic information helps to improve
-the accuracy of the translation with a smaller amount of bilingual corpora.
-Adding linguistic information can be done using the Factored Statistical
-Machine Translation system through pre-processing steps. This paper
-investigates how English-side pre-processing is used to improve the accuracy
-of the English-Tamil SMT system.
-"
-1373,1410.0210,Mateusz Malinowski and Mario Fritz,"A Multi-World Approach to Question Answering about Real-World Scenes
- based on Uncertain Input",cs.AI cs.CL cs.CV cs.LG," We propose a method for automatically answering questions about images by
-bringing together recent advances from natural language processing and computer
-vision. We combine discrete reasoning with uncertain predictions by a
-multi-world approach that represents uncertainty about the perceived world in a
-Bayesian framework. Our approach can handle human questions of high complexity
-about realistic scenes and replies with a range of answers such as counts,
-object classes, instances and lists of them. The system is directly trained
-from question-answer pairs. We establish a first benchmark for this task that
-can be seen as a modern attempt at a visual Turing test.
-"
-1374,1410.0286,"Dirk Roorda, Gino Kalkman, Martijn Naaijer, Andreas van Cranenburgh","LAF-Fabric: a data analysis tool for Linguistic Annotation Framework
- with an application to the Hebrew Bible",cs.CL," The Linguistic Annotation Framework (LAF) provides a general, extensible
-stand-off markup system for corpora. This paper discusses LAF-Fabric, a new
-tool to analyse LAF resources in general with an extension to process the
-Hebrew Bible in particular. We first walk through the history of the Hebrew
-Bible as a text database in decennium-wide steps. Then we describe how
-LAF-Fabric may serve as an analysis tool for this corpus. Finally, we describe
-three analytic projects/workflows that benefit from the new LAF representation:
- 1) the study of linguistic variation: extract cooccurrence data of common
-nouns between the books of the Bible (Martijn Naaijer); 2) the study of the
-grammar of Hebrew poetry in the Psalms: extract clause typology (Gino Kalkman);
-3) construction of a parser of classical Hebrew by Data Oriented Parsing:
-generate tree structures from the database (Andreas van Cranenburgh).
-"
-1375,1410.0291,Yanchuan Sim,"A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives",cs.CL," We present an open source morphological analyzer for Japanese nouns, verbs
-and adjectives. The system builds upon the morphological analyzing capabilities
-of MeCab to incorporate finer details of classification such as politeness,
-tense, mood and voice attributes. We implemented our analyzer in the form of a
-finite state transducer using the open source finite state compiler FOMA
-toolkit. The source code and tool is available at
-https://bitbucket.org/skylander/yc-nlplab/.
-"
-1376,1410.0316,Brian Lee Yung Rowe,Using social network graph analysis for interest detection,cs.SI cs.CL physics.soc-ph," A person's interests exist as an internal state and are difficult to define.
-Since only external actions are observable, a proxy must be used that
-represents someone's interests.
Techniques like collaborative filtering,
-behavioral targeting, and hashtag analysis implicitly model an individual's
-interests. I argue that these models are limited to shallow, temporary
-interests, which do not reflect people's deeper interests or passions. I
-propose an alternative model of interests that takes advantage of a user's
-social graph. The basic principle is that people only follow those that
-interest them, so the social graph is an effective and robust proxy for
-people's interests.
-"
-1377,1410.0718,"Felix Hill, KyungHyun Cho, Sebastien Jean, Coline Devin and Yoshua
- Bengio",Not All Neural Embeddings are Born Equal,cs.CL," Neural language models learn word representations that capture rich
-linguistic and conceptual information. Here we investigate the embeddings
-learned by neural machine translation models. We show that translation-based
-embeddings outperform those learned by cutting-edge monolingual models at
-single-language tasks requiring knowledge of conceptual similarity and/or
-syntactic role. The findings suggest that, while monolingual models learn
-information about how concepts are related, neural-translation models better
-capture their true ontological status.
-"
-1378,1410.1080,"Valery D. Solovyev, Vladimir V. Bochkarev",Generating abbreviations using Google Books library,cs.CL stat.AP," The article describes an original method of creating a dictionary of
-abbreviations based on the Google Books Ngram Corpus. The dictionary of
-abbreviations is designed for Russian, yet as its methodology is universal it
-can be applied to any language. The dictionary can be used to define the
-function of the period during text segmentation in various applied systems of
-text processing. The article describes difficulties encountered in the process
-of its construction as well as the ways to overcome them. A model for
-evaluating the probability of errors of the first and second type (extraction
-accuracy and completeness) is constructed. Certain statistical data for the use
-of abbreviations are provided.
-"
-1379,1410.1090,"Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille",Explain Images with Multimodal Recurrent Neural Networks,cs.CV cs.CL cs.LG," In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model
-for generating novel sentence descriptions to explain the content of images. It
-directly models the probability distribution of generating a word given
-previous words and the image. Image descriptions are generated by sampling from
-this distribution. The model consists of two sub-networks: a deep recurrent
-neural network for sentences and a deep convolutional network for images. These
-two sub-networks interact with each other in a multimodal layer to form the
-whole m-RNN model. The effectiveness of our model is validated on three
-benchmark datasets (IAPR TC-12, Flickr 8K, and Flickr 30K). Our model
-outperforms the state-of-the-art generative method. In addition, the m-RNN
-model can be applied to retrieval tasks for retrieving images or sentences, and
-achieves significant performance improvement over the state-of-the-art methods
-which directly optimize the ranking objective function for retrieval.
-"
-1380,1410.1135,"Walaa Medhat, Ahmed H. Yousef, Hoda Korashy","Corpora Preparation and Stopword List Generation for Arabic data in
- Social Network",cs.CL," This paper proposes a methodology to prepare corpora in the Arabic language
-from online social networks (OSN) and review sites for the Sentiment Analysis
-(SA) task.
-The paper also proposes a methodology for generating a stopword list from the
-prepared corpora. The aim of the paper is to investigate the effect of removing
-stopwords on the SA task. The problem is that the stopword lists generated
-before were based on Modern Standard Arabic (MSA), which is not the common
-language used in OSN. We have generated a stopword list of the Egyptian dialect
-and a corpus-based list to be used with the OSN corpora. We compare the
-efficiency of text classification when using the generated lists along with
-previously generated lists of MSA and combining the Egyptian dialect list with
-the MSA list. The text classification was performed using Na\""ive Bayes and
-Decision Tree classifiers and two feature selection approaches, unigrams and
-bigrams. The experiments show that the general lists containing the Egyptian
-dialect words give better performance than using lists of MSA stopwords only.
-"
-1381,1410.2045,Ashis Kumar Mandal and Rikta Sen,Supervised learning Methods for Bangla Web Document Categorization,cs.CL cs.LG," This paper explores the use of machine learning approaches, or more
-specifically, four supervised learning methods, namely Decision Tree(C 4.5),
-K-Nearest Neighbour (KNN), Na\""ive Bayes (NB), and Support Vector Machine (SVM)
-for categorization of Bangla web documents. This is a task of automatically
-sorting a set of documents into categories from a predefined set. Whereas a
-wide range of methods have been applied to English text categorization,
-relatively few studies have been conducted on Bangla language text
-categorization. Hence, we attempt to analyze the efficiency of those four
-methods for categorization of Bangla documents. In order to validate the
-analysis, a Bangla corpus from various websites has been developed and used as
-examples for the experiment. For Bangla, empirical results support that all
-four methods produce satisfactory performance, with SVM attaining good results
-on high-dimensional and relatively noisy document feature vectors.
-"
-1382,1410.2082,"Yang Liu, Maosong Sun",Contrastive Unsupervised Word Alignment with Non-Local Features,cs.CL," Word alignment is an important natural language processing task that
-indicates the correspondence between natural languages. Recently, unsupervised
-learning of log-linear models for word alignment has received considerable
-attention as it combines the merits of generative and discriminative
-approaches. However, a major challenge still remains: it is intractable to
-calculate the expectations of non-local features that are critical for
-capturing the divergence between natural languages. We propose a contrastive
-approach that aims to differentiate observed training examples from noises. It
-not only introduces prior knowledge to guide unsupervised learning but also
-cancels out partition functions. Based on the observation that the probability
-mass of log-linear models for word alignment is usually highly concentrated, we
-propose to use top-n alignments to approximate the expectations with respect to
-posterior distributions. This allows for efficient and accurate calculation of
-expectations of non-local features. Experiments show that our approach achieves
-significant improvements over state-of-the-art unsupervised word alignment
-methods.
-"
-1383,1410.2149,Roger Bilisoly,Language-based Examples in the Statistics Classroom,cs.CL," Statistics pedagogy values using a variety of examples.
Thanks to text
-resources on the Web, and since statistical packages have the ability to
-analyze string data, it is now easy to use language-based examples in a
-statistics class. Three such examples are discussed here. First, many types of
-wordplay (e.g., crosswords and hangman) involve finding words with letters that
-satisfy a certain pattern. Second, linguistics has shown that idiomatic pairs
-of words often appear together more frequently than chance would predict. For
-example, in the Brown Corpus, this is true of the phrasal verb to throw up
-(p-value = 7.92E-10). Third, a pangram contains all the letters of the alphabet
-at least once. These are searched for in Charles Dickens' A Christmas Carol,
-and their lengths are compared to the expected value given by the unequal
-probability coupon collector's problem as well as simulations.
-"
-1384,1410.2265,"Chetan Kaushik, Atul Mishra","A Scalable, Lexicon Based Technique for Sentiment Analysis",cs.IR cs.CL," The rapid increase in the volume of sentiment-rich social media on the web
-has resulted in increased interest among researchers in Sentiment Analysis and
-opinion mining. However, with so much social media available on the web,
-sentiment analysis is now considered a big data task. Hence, conventional
-sentiment analysis approaches fail to efficiently handle the vast amount of
-sentiment data available nowadays. The main focus of the research was to find
-a technique that can efficiently perform sentiment analysis on big data sets:
-one that can categorize text as positive, negative or neutral in a fast and
-accurate manner. In the research, sentiment analysis was performed on a large
-data set of tweets using Hadoop and the performance of the technique was
-measured in terms of speed and accuracy. The experimental results show that the
-technique exhibits very good efficiency in handling big sentiment data sets.
-"
-1385,1410.2455,"Stephan Gouws, Yoshua Bengio, Greg Corrado","BilBOWA: Fast Bilingual Distributed Representations without Word
- Alignments",stat.ML cs.CL cs.LG," We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple
-and computationally-efficient model for learning bilingual distributed
-representations of words which can scale to large monolingual datasets and does
-not require word-aligned parallel training data. Instead it trains directly on
-monolingual data and extracts a bilingual signal from a smaller set of raw-text
-sentence-aligned data. This is achieved using a novel sampled bag-of-words
-cross-lingual objective, which is used to regularize two noise-contrastive
-language models for efficient cross-lingual feature learning. We show that
-bilingual embeddings learned using the proposed model outperform
-state-of-the-art methods on a cross-lingual document classification task as
-well as a lexical translation task on WMT11 data.
-"
-1386,1410.2479,"Andreas Schwarz, Christian Huemmer, Roland Maas, Walter Kellermann","Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy
- and Reverberant Environments",cs.CL cs.NE cs.SD stat.ML," We propose a spatial diffuseness feature for deep neural network (DNN)-based
-automatic speech recognition to improve recognition accuracy in reverberant and
-noisy environments. The feature is computed in real-time from multiple
-microphone signals without requiring knowledge or estimation of the direction
-of arrival, and represents the relative amount of diffuse noise in each time
-and frequency bin.
It is shown that using the diffuseness feature as an
-additional input to a DNN-based acoustic model leads to a reduced word error
-rate for the REVERB challenge corpus, compared both to logmelspec features
-extracted from noisy signals and to features enhanced by spectral subtraction.
-"
-1387,1410.2646,"Mohamed Bebah, Chennoufi Amine, Mazroui Azzeddine, Lakhouaja Abdelhak",Hybrid approaches for automatic vowelization of Arabic texts,cs.CL," Hybrid approaches for automatic vowelization of Arabic texts are presented in
-this article. The process is made up of two modules. In the first one, a
-morphological analysis of the text words is performed using the open-source
-morphological analyzer AlKhalil Morpho Sys. The outputs for each word, analyzed
-out of context, are its different possible vowelizations. The integration of
-this analyzer into our vowelization system required the addition of a lexical
-database containing the most frequent words in the Arabic language. Using a
-statistical approach based on two hidden Markov models (HMM), the second module
-aims to eliminate the ambiguities. Indeed, for the first HMM, the unvowelized
-Arabic words are the observed states and the vowelized words are the hidden
-states. The observed states of the second HMM are identical to those of the
-first, but the hidden states are the lists of possible diacritics of the word
-without its Arabic letters. Our system uses the Viterbi algorithm to select the
-optimal path among the solutions proposed by AlKhalil Morpho Sys. Our approach
-opens an important way to improve the performance of automatic vowelization of
-Arabic texts for other uses in automatic natural language processing.
-"
-1388,1410.2686,"Ferhat \""Ozg\""ur \c{C}atak","Polarization Measurement of High Dimensional Social Media Messages With
- Support Vector Machine Algorithm Using Mapreduce",cs.LG cs.CL," In this article, we propose a new Support Vector Machine (SVM) training
-algorithm based on a distributed MapReduce technique. In the literature, a lot
-of research shows that SVM has the highest generalization ability among the
-classification algorithms used in machine learning. Also, the SVM classifier
-model is not affected by correlations among the features. But SVM uses
-quadratic optimization techniques in its training phase. The SVM algorithm is
-formulated as a quadratic optimization problem, which has $O(m^3)$ time and
-$O(m^2)$ space complexity, where m is the training set size. The computation
-time of SVM training is quadratic in the number of training instances. For this
-reason, SVM is not a suitable classification algorithm for large-scale dataset
-classification. To solve this training problem, we developed a new distributed
-MapReduce method. Accordingly, (i) the SVM algorithm is trained on each
-distributed dataset individually; (ii) the support vectors of the classifier
-models from every trained node are then merged; and (iii) these two steps are
-iterated until the classifier model converges to the optimal classifier
-function. In the implementation phase, a large-scale social media dataset is
-represented as a TFxIDF matrix. The matrix is used for sentiment analysis to
-get polarization values. Two-class and three-class models are created for the
-classification method. Confusion matrices of each classification model are
-presented in tables. The social media message corpus consists of messages about
-108 public and 66 private universities in Turkey. Twitter is used as the source
-of the corpus. Twitter user messages are collected using the Twitter Streaming
-API.
Results are shown in graphs and
-tables.
-"
-1389,1410.2871,"S. V. Kasmir Raja, V. Rajitha, Meenakshi Lakshmanan","An Ontology for Comprehensive Tutoring of Euphonic Conjunctions of
- Sanskrit Grammar",cs.CL," Euphonic conjunctions (sandhis) form a very important aspect of Sanskrit
-morphology and phonology. The traditional and modern methods of studying
-euphonic conjunctions in Sanskrit follow different methodologies. The former
-involves a rigorous study of the Paninian system embodied in Panini's
-Ashtadhyayi, while the latter usually involves the study of a few important
-sandhi rules with the use of examples. The former is not suitable for
-beginners, and the latter is not sufficient to gain a comprehensive
-understanding of the operation of sandhi rules. This is so since there are not
-only numerous sandhi rules and exceptions, but also complex precedence rules
-involved. The need for a new ontology for sandhi-tutoring was hence felt. This
-work presents a comprehensive ontology designed to enable a student-user to
-learn in stages all about euphonic conjunctions and the relevant aphorisms of
-Sanskrit grammar and to test and evaluate the progress of the student-user. The
-ontology forms the basis of a multimedia sandhi tutor that was given to
-different categories of users including Sanskrit scholars for extensive and
-rigorous testing.
-"
-1390,1410.2910,Daoud Clarke,Riesz Logic,cs.LO cs.CL," We introduce Riesz Logic, whose models are abelian lattice ordered groups,
-which generalise Riesz spaces (vector lattices), and show soundness and
-completeness. Our motivation is to provide a logic for distributional semantics
-of natural language, where words are typically represented as elements of a
-vector space whose dimensions correspond to contexts in which words may occur.
-This basis provides a lattice ordering on the space, and this ordering may be
-interpreted as ""distributional entailment"". Several axioms of Riesz Logic are
-familiar from Basic Fuzzy Logic, and we show how the models of these two logics
-may be related; Riesz Logic may thus be considered a new fuzzy logic. In
-addition to applications in natural language processing, there is potential for
-applying the theory to neuro-fuzzy systems.
-"
-1391,1410.3460,"Junhui Shen, Peiyan Zhu, Rui Fan, Wei Tan","Sentiment Analysis based on User Tag for Traditional Chinese Medicine in
- Weibo",cs.CL cs.SI," With the acceptance of Western culture and science, Traditional Chinese
-Medicine (TCM) has become a controversial issue in China. So, it is important
-to study the public's sentiment and opinion on TCM. The rapid development of
-online social networks, such as Twitter, makes it convenient and efficient to
-sample hundreds of millions of people for the aforementioned sentiment study.
-To the best of our knowledge, the present work is the first attempt that
-applies sentiment analysis to the domain of TCM on Sina Weibo (a Twitter-like
-microblogging service in China). In our work, we first collect tweets on the
-topic of TCM from Sina Weibo, and label the tweets as supporting or opposing
-TCM automatically based on user tags. Then, a support vector machine classifier
-is built to predict the sentiment of TCM tweets without labels. Finally, we
-present a method to adjust the classifier results. The F-measure attained with
-our method is 97%.
-" -1392,1410.3791,"Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, Steven Skiena",POLYGLOT-NER: Massive Multilingual Named Entity Recognition,cs.CL cs.LG," The increasing diversity of languages used on the web introduces a new level -of complexity to Information Retrieval (IR) systems. We can no longer assume -that textual content is written in one language or even the same language -family. In this paper, we demonstrate how to build massive multilingual -annotators with minimal human expertise and intervention. We describe a system -that builds Named Entity Recognition (NER) annotators for 40 major languages -using Wikipedia and Freebase. Our approach does not require NER human annotated -datasets or language specific resources like treebanks, parallel corpora, and -orthographic rules. The novelty of approach lies therein - using only language -agnostic techniques, while achieving competitive performance. - Our method learns distributed word representations (word embeddings) which -encode semantic and syntactic features of words in each language. Then, we -automatically generate datasets from Wikipedia link structure and Freebase -attributes. Finally, we apply two preprocessing stages (oversampling and exact -surface form matching) which do not require any linguistic expertise. - Our evaluation is two fold: First, we demonstrate the system performance on -human annotated datasets. Second, for languages where no gold-standard -benchmarks are available, we propose a new method, distant evaluation, based on -statistical machine translation. -" -1393,1410.3916,"Jason Weston, Sumit Chopra, Antoine Bordes",Memory Networks,cs.AI cs.CL stat.ML," We describe a new class of learning models called memory networks. Memory -networks reason with inference components combined with a long-term memory -component; they learn how to use these jointly. The long-term memory can be -read and written to, with the goal of using it for prediction. We investigate -these models in the context of question answering (QA) where the long-term -memory effectively acts as a (dynamic) knowledge base, and the output is a -textual response. We evaluate them on a large-scale QA task, and a smaller, but -more complex, toy task generated from a simulated world. In the latter, we show -the reasoning power of such models by chaining multiple supporting sentences to -answer questions that require understanding the intension of verbs. -" -1394,1410.4176,"Samuel R. Bowman, Christopher Potts, and Christopher D. Manning",Learning Distributed Word Representations for Natural Logic Reasoning,cs.CL," Natural logic offers a powerful relational conception of meaning that is a -natural counterpart to distributed semantic representations, which have proven -valuable in a wide range of sophisticated language tasks. However, it remains -an open question whether it is possible to train distributed representations to -support the rich, diverse logical reasoning captured by natural logic. We -address this question using two neural network-based models for learning -embeddings: plain neural networks and neural tensor networks. Our experiments -evaluate the models' ability to learn the basic algebra of natural logic -relations from simulated data and from the WordNet noun graph. The overall -positive results are promising for the future of learned distributed -representations in the applied modeling of logical semantics. 
-" -1395,1410.4281,"Xiangang Li, Xihong Wu","Constructing Long Short-Term Memory based Deep Recurrent Neural Networks - for Large Vocabulary Speech Recognition",cs.CL cs.NE," Long short-term memory (LSTM) based acoustic modeling methods have recently -been shown to give state-of-the-art performance on some speech recognition -tasks. To achieve a further performance improvement, in this research, deep -extensions on LSTM are investigated considering that deep hierarchical model -has turned out to be more efficient than a shallow one. Motivated by previous -research on constructing deep recurrent neural networks (RNNs), alternative -deep LSTM architectures are proposed and empirically evaluated on a large -vocabulary conversational telephone speech recognition task. Meanwhile, -regarding to multi-GPU devices, the training process for LSTM networks is -introduced and discussed. Experimental results demonstrate that the deep LSTM -networks benefit from the depth and yield the state-of-the-art performance on -this task. -" -1396,1410.4445,Massimo Stella and Markus Brede,"Patterns in the English Language: Phonological Networks, Percolation and - Assembly Models",cs.CL cond-mat.stat-mech," In this paper we provide a quantitative framework for the study of -phonological networks (PNs) for the English language by carrying out principled -comparisons to null models, either based on site percolation, randomization -techniques, or network growth models. In contrast to previous work, we mainly -focus on null models that reproduce lower order characteristics of the -empirical data. We find that artificial networks matching connectivity -properties of the English PN are exceedingly rare: this leads to the hypothesis -that the word repertoire might have been assembled over time by preferentially -introducing new words which are small modifications of old words. Our null -models are able to explain the ""power-law-like"" part of the degree -distributions and generally retrieve qualitative features of the PN such as -high clustering, high assortativity coefficient, and small-world -characteristics. However, the detailed comparison to expectations from null -models also points out significant differences, suggesting the presence of -additional constraints in word assembly. Key constraints we identify are the -avoidance of large degrees, the avoidance of triadic closure, and the avoidance -of large non-percolating clusters. -" -1397,1410.4510,Finale Doshi-Velez and Byron Wallace and Ryan Adams,Graph-Sparse LDA: A Topic Model with Structured Sparsity,stat.ML cs.CL cs.LG," Originally designed to model text, topic modeling has become a powerful tool -for uncovering latent structure in domains including medicine, finance, and -vision. The goals for the model vary depending on the application: in some -cases, the discovered topics may be used for prediction or some other -downstream task. In other cases, the content of the topic itself may be of -intrinsic scientific interest. - Unfortunately, even using modern sparse techniques, the discovered topics are -often difficult to interpret due to the high dimensionality of the underlying -space. To improve topic interpretability, we introduce Graph-Sparse LDA, a -hierarchical topic model that leverages knowledge of relationships between -words (e.g., as encoded by an ontology). In our model, topics are summarized by -a few latent concept-words from the underlying graph that explain the observed -words. 
Graph-Sparse LDA recovers sparse, interpretable summaries on two -real-world biomedical datasets while matching state-of-the-art prediction -performance. -" -1398,1410.4639,"Darryl McAdams, Jonathan Sterling",Dependent Types for Pragmatics,cs.CL," This paper proposes the use of dependent types for pragmatic phenomena such -as pronoun binding and presupposition resolution as a type-theoretic -alternative to formalisms such as Discourse Representation Theory and Dynamic -Semantics. -" -1399,1410.4863,Yannis Haralambous and Yassir Elidrissi and Philippe Lenca,"Arabic Language Text Classification Using Dependency Syntax-Based - Feature Selection",cs.CL," We study the performance of Arabic text classification combining various -techniques: (a) tfidf vs. dependency syntax, for feature selection and -weighting; (b) class association rules vs. support vector machines, for -classification. The Arabic text is used in two forms: rootified and lightly -stemmed. The results we obtain show that lightly stemmed text leads to better -performance than rootified text; that class association rules are better suited -for small feature sets obtained by dependency syntax constraints; and, finally, -that support vector machines are better suited for large feature sets based on -morphological feature selection criteria. -" -1400,1410.4868,"Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Nathaniel W. - Filardo, Lori Levin and Christine Piatko",A Modality Lexicon and its use in Automatic Tagging,cs.CL," This paper describes our resource-building results for an eight-week JHU -Human Language Technology Center of Excellence Summer Camp for Applied Language -Exploration (SCALE-2009) on Semantically-Informed Machine Translation. -Specifically, we describe the construction of a modality annotation scheme, a -modality lexicon, and two automated modality taggers that were built using the -lexicon and annotation scheme. Our annotation scheme is based on identifying -three components of modality: a trigger, a target and a holder. We describe how -our modality lexicon was produced semi-automatically, expanding from an initial -hand-selected list of modality trigger words and phrases. The resulting -expanded modality lexicon is being made publicly available. We demonstrate that -one tagger---a structure-based tagger---results in precision around 86% -(depending on genre) for tagging of a standard LDC data set. In a machine -translation application, using the structure-based tagger to annotate English -modalities on an English-Urdu training corpus improved the translation quality -score for Urdu by 0.3 Bleu points in the face of sparse training data. -" -1401,1410.4966,Chiraag Lala and Shay B. Cohen,"The Visualization of Change in Word Meaning over Time using Temporal - Word Embeddings",cs.CL," We describe a visualization tool that can be used to view the change in -meaning of words over time. The tool makes use of existing (static) word -embedding datasets together with a timestamped $n$-gram corpus to create {\em -temporal} word embeddings. -" -1402,1410.5078,Paolo Pareti and Ewan Klein,Learning Vague Concepts for the Semantic Web,cs.AI cs.CL," Ontologies can be a powerful tool for structuring knowledge, and they are -currently the subject of extensive research. Updating the contents of an -ontology or improving its interoperability with other ontologies is an -important but difficult process. 
In this paper, we focus on the presence of
-vague concepts, which are pervasive in natural language, within the framework
-of formal ontologies. We will adopt a framework in which vagueness is captured
-via numerical restrictions that can be automatically adjusted. Since updating
-vague concepts, either through ontology alignment or ontology evolution, can
-lead to inconsistent sets of axioms, we define and implement a method for
-detecting and repairing such inconsistencies in a local fashion.
-"
-1403,1410.5485,Ramon Ferrer-i-Cancho,A stronger null hypothesis for crossing dependencies,cs.CL cs.SI physics.soc-ph," The syntactic structure of a sentence can be modeled as a tree where vertices
-are words and edges indicate syntactic dependencies between words. It is
-well-known that those edges normally do not cross when drawn over the sentence.
-Here a new null hypothesis for the number of edge crossings of a sentence is
-presented. That null hypothesis takes into account the length of the pair of
-edges that may cross and predicts the relative number of crossings in random
-trees with a small error, suggesting that a ban of crossings or a principle of
-minimization of crossings is not needed in general to explain the origins of
-non-crossing dependencies. Our work paves the way for more powerful null
-hypotheses to investigate the origins of non-crossing dependencies in nature.
-"
-1404,1410.5491,Michael Bloodgood and Chris Callison-Burch,Using Mechanical Turk to Build Machine Translation Evaluation Sets,cs.CL cs.LG stat.ML," Building machine translation (MT) test sets is a relatively expensive task.
-As MT becomes increasingly desired for more and more language pairs and more
-and more domains, it becomes necessary to build test sets for each case. In
-this paper, we investigate using Amazon's Mechanical Turk (MTurk) to make MT
-test sets cheaply. We find that MTurk can be used to make test sets much
-cheaper than professionally-produced test sets. More importantly, in
-experiments with multiple MT systems, we find that the MTurk-produced test sets
-yield essentially the same conclusions regarding system performance as the
-professionally-produced test sets yield.
-"
-1405,1410.5877,Michael Bloodgood and Chris Callison-Burch,"Bucking the Trend: Large-Scale Cost-Focused Active Learning for
- Statistical Machine Translation",cs.CL cs.LG stat.ML," We explore how to improve machine translation systems by adding more
-translation data in situations where we already have substantial resources. The
-main challenge is how to buck the trend of diminishing returns that is commonly
-encountered. We present an active learning-style data solicitation algorithm to
-meet this challenge. We test it, gathering annotations via Amazon Mechanical
-Turk, and find that we get an order of magnitude increase in performance rates
-of improvement.
-"
-1406,1410.6830,"I\c{s}{\i}k Bar{\i}\c{s} Fidaner, Ali Taylan Cemgil",Clustering Words by Projection Entropy,cs.CL cs.LG," We apply entropy agglomeration (EA), a recently introduced algorithm, to
-cluster the words of a literary text. EA is a greedy agglomerative procedure
-that minimizes projection entropy (PE), a function that can quantify the
-segmentedness of an element set. To apply it, the text is reduced to a feature
-allocation, a combinatorial object to represent the word occurrences in the
-text's paragraphs.
The experimental results demonstrate that EA, despite its
-reduction and simplicity, is useful in capturing significant relationships
-among the words in the text. This procedure was implemented in Python and
-published as free software: REBUS.
-"
-1407,1410.6903,Laxmi Narayana M. and Sunil Kumar Kopparapu,Choice of Mel Filter Bank in Computing MFCC of a Resampled Speech,cs.SD cs.CL," Mel Frequency Cepstral Coefficients (MFCCs) are the most popularly used
-speech features in most speech and speaker recognition applications. In this
-paper, we study the effect of resampling a speech signal on these speech
-features. We first derive a relationship between the MFCC parameters of the
-resampled speech and the MFCC parameters of the original speech. We propose six
-methods of calculating the MFCC parameters of downsampled speech by
-transforming the Mel filter bank used to compute MFCC of the original speech.
-We then experimentally compute the MFCC parameters of the downsampled speech
-using the proposed methods and compute the Pearson coefficient between the MFCC
-parameters of the downsampled speech and those of the original speech to
-identify the most effective choice of Mel filter bank that enables the computed
-MFCC of the resampled speech to be as close as possible to the original speech
-sample MFCC.
-"
-1408,1410.7182,"Leon Derczynski, Diana Maynard, Giuseppe Rizzo, Marieke van Erp,
- Genevieve Gorrell, Rapha\""el Troncy, Johann Petrak, Kalina Bontcheva",Analysis of Named Entity Recognition and Linking for Tweets,cs.CL," Applying natural language processing for mining and intelligent information
-access to tweets (a form of microblog) is a challenging, emerging research
-area. Unlike carefully authored news text and other longer content, tweets pose
-a number of new challenges, due to their short, noisy, context-dependent, and
-dynamic nature. Information extraction from tweets is typically performed in a
-pipeline, comprising consecutive stages of language identification,
-tokenisation, part-of-speech tagging, named entity recognition and entity
-disambiguation (e.g. with respect to DBpedia). In this work, we describe a new
-Twitter entity disambiguation dataset, and conduct an empirical analysis of
-named entity recognition and disambiguation, investigating how robust a number
-of state-of-the-art systems are on such noisy texts, what the main sources of
-error are, and which problems should be further investigated to improve the
-state of the art.
-"
-1409,1410.7382,Kiran Kumar Bhuvanagiri and Sunil Kumar Kopparapu,Modified Mel Filter Bank to Compute MFCC of Subsampled Speech,cs.CL cs.SD," Mel Frequency Cepstral Coefficients (MFCCs) are the most popularly used
-speech features in most speech and speaker recognition applications. In this
-work, we propose a modified Mel filter bank to extract MFCCs from subsampled
-speech. We also propose a stronger metric which effectively captures the
-correlation between MFCCs of original speech and MFCCs of resampled speech. It
-is found that the proposed method of filter bank construction performs
-distinguishably well and gives recognition performance on resampled speech
-close to recognition accuracies on original speech.
-" -1410,1410.7787,"David Zajic, Michael Maxwell, David Doermann, Paul Rodrigues and - Michael Bloodgood","Correcting Errors in Digital Lexicographic Resources Using a Dictionary - Manipulation Language",cs.CL," We describe a paradigm for combining manual and automatic error correction of -noisy structured lexicographic data. Modifications to the structure and -underlying text of the lexicographic data are expressed in a simple, -interpreted programming language. Dictionary Manipulation Language (DML) -commands identify nodes by unique identifiers, and manipulations are performed -using simple commands such as create, move, set text, etc. Corrected lexicons -are produced by applying sequences of DML commands to the source version of the -lexicon. DML commands can be written manually to repair one-off errors or -generated automatically to correct recurring problems. We discuss advantages of -the paradigm for the task of editing digital bilingual dictionaries. -" -1411,1410.8027,Mateusz Malinowski and Mario Fritz,Towards a Visual Turing Challenge,cs.AI cs.CL cs.CV cs.LG," As language and visual understanding by machines progresses rapidly, we are -observing an increasing interest in holistic architectures that tightly -interlink both modalities in a joint learning and inference process. This trend -has allowed the community to progress towards more challenging and open tasks -and refueled the hope at achieving the old AI dream of building machines that -could pass a turing test in open domains. In order to steadily make progress -towards this goal, we realize that quantifying performance becomes increasingly -difficult. Therefore we ask how we can precisely define such challenges and how -we can evaluate different algorithms on this open tasks? In this paper, we -summarize and discuss such challenges as well as try to give answers where -appropriate options are available in the literature. We exemplify some of the -solutions on a recently presented dataset of question-answering task based on -real-world indoor images that establishes a visual turing challenge. Finally, -we argue despite the success of unique ground-truth annotation, we likely have -to step away from carefully curated dataset and rather rely on 'social -consensus' as the main driving force to create suitable benchmarks. Providing -coverage in this inherently ambiguous output space is an emerging challenge -that we face in order to make quantifiable progress in this area. -" -1412,1410.8149,"Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood and - Peng Ye","Detecting Structural Irregularity in Electronic Dictionaries Using - Language Modeling",cs.CL cs.LG," Dictionaries are often developed using tools that save to Extensible Markup -Language (XML)-based standards. These standards often allow high-level -repeating elements to represent lexical entries, and utilize descendants of -these repeating elements to represent the structure within each lexical entry, -in the form of an XML tree. In many cases, dictionaries are published that have -errors and inconsistencies that are expensive to find manually. This paper -discusses a method for dictionary writers to quickly audit structural -regularity across entries in a dictionary by using statistical language -modeling. The approach learns the patterns of XML nodes that could occur within -an XML tree, and then calculates the probability of each XML tree in the -dictionary against these patterns to look for entries that diverge from the -norm. 
-" -1413,1410.8206,"Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech - Zaremba",Addressing the Rare Word Problem in Neural Machine Translation,cs.CL cs.LG cs.NE," Neural Machine Translation (NMT) is a new approach to machine translation -that has shown promising results that are comparable to traditional approaches. -A significant weakness in conventional NMT systems is their inability to -correctly translate very rare words: end-to-end NMTs tend to have relatively -small vocabularies with a single unk symbol that represents every possible -out-of-vocabulary (OOV) word. In this paper, we propose and implement an -effective technique to address this problem. We train an NMT system on data -that is augmented by the output of a word alignment algorithm, allowing the NMT -system to emit, for each OOV word in the target sentence, the position of its -corresponding word in the source sentence. This information is later utilized -in a post-processing step that translates every OOV word using a dictionary. -Our experiments on the WMT14 English to French translation task show that this -method provides a substantial improvement of up to 2.8 BLEU points over an -equivalent NMT system that does not use this technique. With 37.5 BLEU points, -our NMT system is the first to surpass the best result achieved on a WMT14 -contest task. -" -1414,1410.8326,Nicholas H. Kirk,Towards Learning Object Affordance Priors from Technical Texts,cs.LG cs.AI cs.CL cs.RO," Everyday activities performed by artificial assistants can potentially be -executed naively and dangerously given their lack of common sense knowledge. -This paper presents conceptual work towards obtaining prior knowledge on the -usual modality (passive or active) of any given entity, and their affordance -estimates, by extracting high-confidence ability modality semantic relations (X -can Y relationship) from non-figurative texts, by analyzing co-occurrence of -grammatical instances of subjects and verbs, and verbs and objects. The -discussion includes an outline of the concept, potential and limitations, and -possible feature and learning framework adoption. -" -1415,1410.8498,"Emma Strubell, Luke Vilnis, Andrew McCallum",Training for Fast Sequential Prediction Using Dynamic Feature Selection,cs.CL cs.AI," We present paired learning and inference algorithms for significantly -reducing computation and increasing speed of the vector dot products in the -classifiers that are at the heart of many NLP components. This is accomplished -by partitioning the features into a sequence of templates which are ordered -such that high confidence can often be reached using only a small fraction of -all features. Parameter estimation is arranged to maximize accuracy and early -confidence in this sequence. We present experiments in left-to-right -part-of-speech tagging on WSJ, demonstrating that we can preserve accuracy -above 97% with over a five-fold reduction in run-time. -" -1416,1410.8553,"Michael Bloodgood, Peng Ye, Paul Rodrigues, David Zajic and David - Doermann","A random forest system combination approach for error detection in - digital dictionaries",cs.CL cs.LG stat.ML," When digitizing a print bilingual dictionary, whether via optical character -recognition or manual entry, it is inevitable that errors are introduced into -the electronic version that is created. 
We investigate automating the process -of detecting errors in an XML representation of a digitized print dictionary -using a hybrid approach that combines rule-based, feature-based, and language -model-based methods. We investigate combining methods and show that using -random forests is a promising approach. We find that in isolation, unsupervised -methods rival the performance of supervised methods. Random forests typically -require training data so we investigate how we can apply random forests to -combine individual base methods that are themselves unsupervised without -requiring large amounts of training data. Experiments reveal empirically that a -relatively small amount of data is sufficient and can potentially be further -reduced through specific selection criteria. -" -1417,1410.8581,"Dilek K\""u\c{c}\""uk and Yusuf Arslan","Semi-Automatic Construction of a Domain Ontology for Wind Energy Using - Wikipedia Articles",cs.CL cs.CE," Domain ontologies are important information sources for knowledge-based -systems. Yet, building domain ontologies from scratch is known to be a very -labor-intensive process. In this study, we present our semi-automatic approach -to building an ontology for the domain of wind energy which is an important -type of renewable energy with a growing share in electricity generation all -over the world. Related Wikipedia articles are first processed in an automated -manner to determine the basic concepts of the domain together with their -properties and next the concepts, properties, and relationships are organized -to arrive at the ultimate ontology. We also provide pointers to other -engineering ontologies which could be utilized together with the proposed wind -energy ontology in addition to its prospective application areas. The current -study is significant as, to the best of our knowledge, it proposes the first -considerably wide-coverage ontology for the wind energy domain and the ontology -is built through a semi-automatic process which makes use of the related Web -resources, thereby reducing the overall cost of the ontology building process. -" -1418,1410.8668,"Dilek K\""u\c{c}\""uk and Ralf Steinberger",Experiments to Improve Named Entity Recognition on Turkish Tweets,cs.CL," Social media texts are significant information sources for several -application areas including trend analysis, event monitoring, and opinion -mining. Unfortunately, existing solutions for tasks such as named entity -recognition that perform well on formal texts usually perform poorly when -applied to social media texts. In this paper, we report on experiments that -have the purpose of improving named entity recognition on Turkish tweets, using -two different annotated data sets. In these experiments, starting with a -baseline named entity recognition system, we adapt its recognition rules and -resources to better fit Twitter language by relaxing its capitalization -constraint and by diacritics-based expansion of its lexical resources, and we -employ a simplistic normalization scheme on tweets to observe the effects of -these on the overall named entity recognition performance on Turkish tweets. -The evaluation results of the system with these different settings are provided -with discussions of these results. 
-" -1419,1410.8749,"Jiwei Li, Xun Wang and Eduard Hovy",What a Nasty day: Exploring Mood-Weather Relationship from Twitter,cs.SI cs.CL," While it has long been believed in psychology that weather somehow influences -human's mood, the debates have been going on for decades about how they are -correlated. In this paper, we try to study this long-lasting topic by -harnessing a new source of data compared from traditional psychological -researches: Twitter. We analyze 2 years' twitter data collected by twitter API -which amounts to $10\%$ of all postings and try to reveal the correlations -between multiple dimensional structure of human mood with meteorological -effects. Some of our findings confirm existing hypotheses, while others -contradict them. We are hopeful that our approach, along with the new data -source, can shed on the long-going debates on weather-mood correlation. -" -1420,1410.8783,"Nabil Khoufi, Chafik Aloulou, Lamia Hadrich Belguith",Supervised learning model for parsing Arabic language,cs.CL cs.LG," Parsing the Arabic language is a difficult task given the specificities of -this language and given the scarcity of digital resources (grammars and -annotated corpora). In this paper, we suggest a method for Arabic parsing based -on supervised machine learning. We used the SVMs algorithm to select the -syntactic labels of the sentence. Furthermore, we evaluated our parser -following the cross validation method by using the Penn Arabic Treebank. The -obtained results are very encouraging. -" -1421,1410.8808,"Paolo Pareti, Ewan Klein and Adam Barker",A Semantic Web of Know-How: Linked Data for Community-Centric Tasks,cs.AI cs.CL," This paper proposes a novel framework for representing community know-how on -the Semantic Web. Procedural knowledge generated by web communities typically -takes the form of natural language instructions or videos and is largely -unstructured. The absence of semantic structure impedes the deployment of many -useful applications, in particular the ability to discover and integrate -know-how automatically. We discuss the characteristics of community know-how -and argue that existing knowledge representation frameworks fail to represent -it adequately. We present a novel framework for representing the semantic -structure of community know-how and demonstrate the feasibility of our approach -by providing a concrete implementation which includes a method for -automatically acquiring procedural knowledge for real-world tasks. -" -1422,1411.0007,"John E. Miller, Michael Bloodgood, Manabu Torii and K. Vijay-Shanker",Rapid Adaptation of POS Tagging for Domain Specific Uses,cs.CL cs.LG stat.ML," Part-of-speech (POS) tagging is a fundamental component for performing -natural language tasks such as parsing, information extraction, and question -answering. When POS taggers are trained in one domain and applied in -significantly different domains, their performance can degrade dramatically. We -present a methodology for rapid adaptation of POS taggers to new domains. Our -technique is unsupervised in that a manually annotated corpus for the new -domain is not necessary. We use suffix information gathered from large amounts -of raw text as well as orthographic information to increase the lexical -coverage. We present an experiment in the Biological domain where our POS -tagger achieves results comparable to POS taggers specifically trained to this -domain. 
-"
-1423,1411.0129,"Philippe Vincent-Lamarre, Alexandre Blondin Mass\'e, Marcos Lopes,
- M\'elanie Lord, Odile Marcotte, Stevan Harnad",The Latent Structure of Dictionaries,cs.CL cs.IR," How many words (and which ones) are sufficient to define all other words?
-When dictionaries are analyzed as directed graphs with links from defining
-words to defined words, they reveal a latent structure. Recursively removing
-all words that are reachable by definition but that do not define any further
-words reduces the dictionary to a Kernel of about 10%. This is still not the
-smallest number of words that can define all the rest. About 75% of the Kernel
-turns out to be its Core, a Strongly Connected Subset of words with a
-definitional path to and from any pair of its words and no word's definition
-depending on a word outside the set. But the Core cannot define all the rest of
-the dictionary. The 25% of the Kernel surrounding the Core consists of small
-strongly connected subsets of words: the Satellites. The size of the smallest
-set of words that can define all the rest (the graph's Minimum Feedback Vertex
-Set or MinSet) is about 1% of the dictionary, 15% of the Kernel, and half-Core,
-half-Satellite. But every dictionary has a huge number of MinSets. The Core
-words are learned earlier, more frequent, and less concrete than the
-Satellites, which in turn are learned earlier and more frequent but more
-concrete than the rest of the Dictionary. In principle, only one MinSet's words
-would need to be grounded through the sensorimotor capacity to recognize and
-categorize their referents. In a dual-code sensorimotor-symbolic model of the
-mental lexicon, the symbolic code could do all the rest via re-combinatory
-definition.
-"
-1424,1411.0588,"Nadezhda Borisova, Grigor Iliev, Elena Karashtranova","On Detecting Noun-Adjective Agreement Errors in Bulgarian Language Using
- GATE",cs.CL," In this article, we describe an approach for automatic detection of
-noun-adjective agreement errors in Bulgarian texts by explaining the necessary
-steps required to develop a simple Java-based language processing application.
-For this purpose, we use the GATE language processing framework, which is
-capable of analyzing texts in the Bulgarian language and can be embedded in
-software applications, accessed through a set of Java APIs. In our example
-application we also demonstrate how to use the functionality of GATE to perform
-regular expressions over annotations for detecting agreement errors in simple
-noun phrases formed by two words - an attributive adjective and a noun, where
-the attributive adjective precedes the noun. The provided code samples can also
-be used as a starting point for implementing natural language processing
-functionalities in software applications related to language processing tasks
-like detection, annotation and retrieval of word groups meeting a specific set
-of criteria.
-"
-1425,1411.0778,"Xiaolei Huang, Lei Zhang, Tianli Liu, David Chiu, Tingshao Zhu, Xin Li","Detecting Suicidal Ideation in Chinese Microblogs with Psychological
- Lexicons",cs.CL," Suicide is among the leading causes of death in China. However, technical
-approaches toward preventing suicide are challenging and remain under
-development. Recently, several actual suicidal cases were preceded by users who
-posted microblogs with suicidal ideation to Sina Weibo, a Chinese social media
-network akin to Twitter.
It would therefore be desirable to detect suicidal
-ideations from microblogs in real-time, and immediately alert appropriate
-support groups, which may lead to successful prevention. In this paper, we
-propose a real-time suicidal ideation detection system deployed over Weibo,
-using machine learning and known psychological techniques. Currently, we have
-identified 53 known suicidal cases who posted suicide notes on Weibo prior to
-their deaths. We explore linguistic features of these known cases using a
-psychological lexicon dictionary, and train an effective suicidal Weibo post
-detection model. A set of 6714 tagged posts and several classifiers are used
-to verify the model. By combining both machine learning and psychological
-knowledge, the SVM classifier achieves the best performance among the
-different classifiers, yielding an F-measure of 68.3%, a Precision of 78.9%,
-and a Recall of 60.3%.
-"
-1426,1411.0861,"Lei Zhang, Xiaolei Huang, Tianli Liu, Zhenxiang Chen, Tingshao Zhu","Using Linguistic Features to Estimate Suicide Probability of Chinese
- Microblog Users",cs.SI cs.CL," If people with high risk of suicide can be identified through social media
-like microblogs, it is possible to implement an active intervention system to
-save their lives. Based on this motivation, the current study administered the
-Suicide Probability Scale (SPS) to 1041 users of Sina Weibo, which is a
-leading microblog service provider in China. Two NLP (Natural Language
-Processing) methods, the Chinese edition of the Linguistic Inquiry and Word
-Count (LIWC) lexicon and Latent Dirichlet Allocation (LDA), are used to extract
-linguistic features from the Sina Weibo data. We trained prediction models
-with machine learning algorithms based on these two types of features to
-estimate suicide probability from linguistic features. The experiment results
-indicate that LDA can find topics that relate to suicide probability, and
-improve the performance of prediction. Our study adds value by predicting the
-suicide probability of social network users from their behavior.
-"
-1427,1411.0895,Liang Lu and Steve Renals,Tied Probabilistic Linear Discriminant Analysis for Speech Recognition,cs.CL cs.AI," Acoustic models using probabilistic linear discriminant analysis (PLDA)
-capture the correlations within feature vectors using subspaces which do not
-vastly expand the model. This allows high dimensional and correlated feature
-spaces to be used, without requiring the estimation of multiple
-high-dimensional covariance matrices. In this letter we extend the recently
-presented PLDA mixture model for speech recognition through a tied PLDA
-approach, which is better able to control the model size to avoid overfitting.
-We carried out experiments using the Switchboard corpus, with both mel
-frequency cepstral coefficient features and bottleneck features derived from a
-deep neural network. Reductions in word error rate were obtained by using tied
-PLDA, compared with the PLDA mixture model, subspace Gaussian mixture models,
-and deep neural networks.
-"
-1428,1411.1006,Javid Dadashkarimi and Azadeh Shakery and Heshaam Faili,"A Probabilistic Translation Method for Dictionary-based Cross-lingual
- Information Retrieval in Agglutinative Languages",cs.IR cs.CL," Translation ambiguity, out-of-vocabulary words, and missing translations
-in bilingual dictionaries make dictionary-based Cross-Language Information
-Retrieval (CLIR) a challenging task.
Moreover, in agglutinative languages which
-do not have reliable stemmers, missing various lexical formations in bilingual
-dictionaries degrades CLIR performance. This paper aims to introduce a
-probabilistic translation model to solve the ambiguity problem, and also to
-provide the most likely formations of a dictionary candidate. We propose the
-Minimum Edit Support Candidates (MESC) method, which exploits a monolingual
-corpus and a bilingual dictionary to translate users' native-language queries
-into the documents' language. Our experiments show that the proposed method
-outperforms state-of-the-art dictionary-based English-Persian CLIR.
-"
-1429,1411.1147,"Waleed Ammar, Chris Dyer, Noah A. Smith","Conditional Random Field Autoencoders for Unsupervised Structured
- Prediction",cs.LG cs.CL," We introduce a framework for unsupervised learning of structured predictors
-with overlapping, global features. Each input's latent representation is
-predicted conditional on the observable data using a feature-rich conditional
-random field. Then a reconstruction of the input is (re)generated, conditional
-on the latent structure, using models for which maximum likelihood estimation
-has a closed form. Our autoencoder formulation enables efficient learning
-without making unrealistic independence assumptions or restricting the kinds of
-features that can be used. We illustrate insightful connections to traditional
-autoencoders, posterior regularization and multi-view learning. We show
-competitive results with instantiations of the model for two canonical NLP
-tasks: part-of-speech induction and bitext word alignment, and show that
-training our model can be substantially more efficient than comparable
-feature-rich baselines.
-"
-1430,1411.1243,"Stylianos Kampakis, Andreas Adamides",Using Twitter to predict football outcomes,stat.ML cs.CL cs.SI," Twitter has been proven to be a notable source for predictive modelling in
-various domains such as the stock market, the dissemination of diseases or
-sports outcomes. However, such a study has not been conducted in football
-(soccer) so far. The purpose of this research was to study whether data mined
-from Twitter can be used for this purpose. We built a set of predictive models
-for the outcome of football games of the English Premier League for a 3-month
-period based on tweets, and we studied whether these models can outperform
-predictive models which use only historical data and simple football
-statistics. Moreover, combined models are constructed using both Twitter and
-historical data. The final results indicate that data mined from Twitter can
-indeed be a useful source for predicting games in the Premier League. The final
-Twitter-based model performs significantly better than chance when measured by
-Cohen's kappa and is comparable to the model that uses simple statistics and
-historical data. Combining both models raises the performance above that
-achieved by each individual model. Thereby, this study provides evidence
-that Twitter-derived features can indeed provide useful information for the
-prediction of football (soccer) outcomes.
-"
-1431,1411.1999,"Hossam Ishkewy, Hany Harb and Hassan Farahat",Azhary: An Arabic Lexical Ontology,cs.AI cs.CL," Arabic is the most widely spoken language in the Semitic language group,
-and one of the most common languages in the world, spoken by more than 422
-million people.
It is also of paramount importance to Muslims: it is the sacred language
-of the Islamic Holy Book (the Quran), and prayer (and other acts of worship)
-in Islam is performed only through mastery of some Arabic words. Arabic is
-also a major ritual language of a number of Christian churches in the Arab
-world, and it was also used in writing several intellectual and religious
-Jewish books in the Middle Ages. Despite this, there is no semantic Arabic
-lexicon which researchers can depend on. In this paper we introduce Azhary, a
-lexical ontology for the Arabic language. It groups Arabic words into sets of
-synonyms called synsets, and records a number of relationships between words
-such as synonym, antonym, hypernym, hyponym, meronym, holonym and association
-relations. The ontology contains 26,195 words organized in 13,328 synsets. It
-has been developed and contrasted against AWN, which is the most common
-available Arabic lexical ontology.
-"
-1432,1411.2328,Xun Wang,Modeling Word Relatedness in Latent Dirichlet Allocation,cs.CL cs.AI," The standard LDA model suffers from the problem that the topic assignment
-of each word is independent, and word correlation is hence neglected. To
-address this problem, in this paper, we propose a model called Word Related
-Latent Dirichlet Allocation (WR-LDA) by incorporating word correlation into
-LDA topic models. This leads to new capabilities that the standard LDA model
-does not have, such as estimating infrequently occurring words or
-multi-language topic modeling. Experimental results demonstrate the
-effectiveness of our model compared with standard LDA.
-"
-1433,1411.2539,"Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel","Unifying Visual-Semantic Embeddings with Multimodal Neural Language
- Models",cs.LG cs.CL cs.CV," Inspired by recent advances in multimodal learning and machine translation,
-we introduce an encoder-decoder pipeline that learns (a): a multimodal joint
-embedding space with images and text and (b): a novel language model for
-decoding distributed representations from our space. Our pipeline effectively
-unifies joint image-text embedding models with multimodal neural language
-models. We introduce the structure-content neural language model that
-disentangles the structure of a sentence from its content, conditioned on
-representations produced by the encoder. The encoder allows one to rank images
-and sentences while the decoder can generate novel descriptions from scratch.
-Using LSTM to encode sentences, we match the state-of-the-art performance on
-Flickr8K and Flickr30K without using object detections. We also set new best
-results when using the 19-layer Oxford convolutional network. Furthermore we
-show that with linear encoders, the learned embedding space captures multimodal
-regularities in terms of vector space arithmetic e.g. *image of a blue car* -
-""blue"" + ""red"" is near images of red cars. Sample captions generated for 800
-images are made available for comparison.
-"
-1434,1411.2645,Ramon Ferrer-i-Cancho,"Non-crossing dependencies: least effort, not grammar",cs.CL cs.SI physics.soc-ph," The use of null hypotheses (in a statistical sense) is common in hard
-sciences but not in theoretical linguistics. Here the null hypothesis that the
-low frequency of syntactic dependency crossings is expected by an arbitrary
-ordering of words is rejected. It is shown that this would require star
-dependency structures, which are both unrealistic and too restrictive. The
-hypothesis of the limited resources of the human brain is revisited.
Stronger
-null hypotheses taking into account actual dependency lengths for the
-likelihood of crossings are presented. Those hypotheses suggest that crossings
-are likely to decrease when dependencies are shortened. A hypothesis based on
-pressure to reduce dependency lengths is more parsimonious than a principle of
-minimization of crossings or a grammatical ban that is totally dissociated from
-the general and non-linguistic principle of economy.
-"
-1435,1411.2674,"Fangjian Guo, Charles Blundell, Hanna Wallach and Katherine Heller","The Bayesian Echo Chamber: Modeling Social Influence via Linguistic
- Accommodation",stat.ML cs.CL cs.LG cs.SI," We present the Bayesian Echo Chamber, a new Bayesian generative model for
-social interaction data. By modeling the evolution of people's language usage
-over time, this model discovers latent influence relationships between them.
-Unlike previous work on inferring influence, which has primarily focused on
-simple temporal dynamics evidenced via turn-taking behavior, our model captures
-more nuanced influence relationships, evidenced via linguistic accommodation
-patterns in interaction content. The model, which is based on a discrete analog
-of the multivariate Hawkes process, permits a fully Bayesian inference
-algorithm. We validate our model's ability to discover latent influence
-patterns using transcripts of arguments heard by the US Supreme Court and the
-movie ""12 Angry Men."" We showcase our model's capabilities by using it to infer
-latent influence patterns from Federal Open Market Committee meeting
-transcripts, demonstrating state-of-the-art performance at uncovering social
-dynamics in group discussions.
-"
-1436,1411.2679,"Jiwei Li, Alan Ritter and Dan Jurafsky","Inferring User Preferences by Probabilistic Logical Reasoning over
- Social Networks",cs.SI cs.AI cs.CL cs.LG," We propose a framework for inferring the latent attitudes or preferences of
-users by performing probabilistic first-order logical reasoning over the social
-network graph. Our method answers questions about Twitter users like {\em Does
-this user like sushi?} or {\em Is this user a New York Knicks fan?} by building
-a probabilistic model that reasons over user attributes (the user's location or
-gender) and the social network (the user's friends and spouse), via inferences
-like homophily (I am more likely to like sushi if my spouse or friends like
-sushi; I am more likely to like the Knicks if I live in New York). The
-algorithm uses distant supervision, semi-supervised data harvesting and vector
-space models to extract user attributes (e.g. spouse, education, location) and
-preferences (likes and dislikes) from text. The extracted propositions are then
-fed into a probabilistic reasoner (we investigate both Markov Logic and
-Probabilistic Soft Logic). Our experiments show that probabilistic logical
-reasoning significantly improves the performance on attribute and relation
-extraction, and also achieves an F-score of 0.791 at predicting a user's likes
-or dislikes, significantly better than two strong baselines.
-"
-1437,1411.2738,Xin Rong,word2vec Parameter Learning Explained,cs.CL," The word2vec model and application by Mikolov et al. have attracted a great
-amount of attention in the past two years. The vector representations of words
-learned by word2vec models have been shown to carry semantic meanings and are
-useful in various NLP tasks.
As an increasing number of researchers would like
-to experiment with word2vec or similar techniques, I notice that there is a
-lack of material that comprehensively explains the parameter learning process
-of word embedding models in detail, which prevents researchers who are
-non-experts in neural networks from understanding the working mechanism of
-such models.
- This note provides detailed derivations and explanations of the parameter
-update equations of the word2vec models, including the original continuous
-bag-of-words (CBOW) and skip-gram (SG) models, as well as advanced
-optimization techniques, including hierarchical softmax and negative sampling.
-Intuitive interpretations of the gradient equations are also provided
-alongside mathematical derivations.
- In the appendix, a review of the basics of neural networks and
-backpropagation is provided. I also created an interactive demo, wevi, to
-facilitate the intuitive understanding of the model.
-"
-1438,1411.3146,Karl Moritz Hermann,Distributed Representations for Compositional Semantics,cs.CL," The mathematical representation of semantics is a key issue for Natural
-Language Processing (NLP). A lot of research has been devoted to finding ways
-of representing the semantics of individual words in vector spaces.
-Distributional approaches --- meaning distributed representations that exploit
-co-occurrence statistics of large corpora --- have proved popular and
-successful across a number of tasks. However, natural language usually comes in
-structures beyond the word level, with meaning arising not only from the
-individual words but also the structure they are contained in at the phrasal or
-sentential level. Modelling the compositional process by which the meaning of
-an utterance arises from the meaning of its parts is an equally fundamental
-task of NLP.
- This dissertation explores methods for learning distributed semantic
-representations and models for composing these into representations for larger
-linguistic units. Our underlying hypothesis is that neural models are a
-suitable vehicle for learning semantically rich representations and that such
-representations in turn are suitable vehicles for solving important tasks in
-natural language processing. The contribution of this thesis is a thorough
-evaluation of our hypothesis, as part of which we introduce several new
-approaches to representation learning and compositional semantics, as well as
-multiple state-of-the-art models which apply distributed semantic
-representations to various tasks in NLP.
-"
-1439,1411.3315,"Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena",Statistically Significant Detection of Linguistic Change,cs.CL cs.IR cs.LG," We propose a new computational approach for tracking and detecting
-statistically significant linguistic shifts in the meaning and usage of words.
-Such linguistic shifts are especially prevalent on the Internet, where the
-rapid exchange of ideas can quickly change a word's meaning. Our meta-analysis
-approach constructs property time series of word usage, and then uses
-statistically sound change point detection algorithms to identify significant
-linguistic shifts.
- We consider and analyze three approaches of increasing complexity to generate
-such linguistic property time series, the culmination of which uses
-distributional characteristics inferred from word co-occurrences. Using
-recently proposed deep neural language models, we first train vector
-representations of words for each time period.
Second, we warp the vector
-spaces into one unified coordinate system. Finally, we construct a
-distance-based distributional time series for each word to track its
-linguistic displacement over time.
- We demonstrate that our approach is scalable by tracking linguistic change
-across years of micro-blogging using Twitter, a decade of product reviews using
-a corpus of movie reviews from Amazon, and a century of written books using the
-Google Book-ngrams. Our analysis reveals interesting patterns of language usage
-change commensurate with each medium.
-"
-1440,1411.3561,Prabhsimran Singh and Amritpal Singh,A Text to Speech (TTS) System with English to Punjabi Conversion,cs.CL," The paper aims to show how an application can be developed that translates
-English text into Punjabi, and how the same application can convert text to
-speech (TTS), i.e., pronounce the text. This application can be very
-beneficial for those with special needs.
-"
-1441,1411.3827,Antonin Delpeuch (University of Oxford),Autonomization of Monoidal Categories,math.CT cs.CL," We show that contrary to common belief in the DisCoCat community, a monoidal
-category is all that is needed to define a categorical compositional model of
-natural language. This relies on a construction which freely adds adjoints to a
-monoidal category. In the case of distributional semantics, this broadens the
-range of available models, to include non-linear maps and cartesian products
-for instance. We illustrate the applications of this principle to various
-distributional models of meaning.
-"
-1442,1411.4072,"Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, Li Deng",Learning Multi-Relational Semantics Using Neural-Embedding Models,cs.CL cs.LG stat.ML," In this paper we present a unified framework for modeling multi-relational
-representations, scoring, and learning, and conduct an empirical study of
-several recent multi-relational embedding models under the framework. We
-investigate the different choices of relation operators based on linear and
-bilinear transformations, and also the effects of entity representations by
-incorporating unsupervised vectors pre-trained on extra textual resources. Our
-results show several interesting findings, enabling the design of a simple
-embedding model that achieves the new state-of-the-art performance on a popular
-knowledge base completion task evaluated on Freebase.
-"
-1443,1411.4109,Glenn R. Hofford,Resolution of Difficult Pronouns Using the ROSS Method,cs.CL cs.AI," A new natural language understanding method for disambiguation of difficult
-pronouns is described. Difficult pronouns are those pronouns for which a level
-of world or domain knowledge is needed in order to perform anaphoric or other
-types of resolution. Resolution of difficult pronouns may in some cases require
-a prior step involving the application of inference to a situation that is
-represented by the natural language text. A general method is described: it
-performs entity resolution and pronoun resolution. An extension to the general
-pronoun resolution method performs inference as an embedded commonsense
-reasoning method. The general method and the embedded method utilize features
-of the ROSS representational scheme; in particular the methods use ROSS
-ontology classes and the ROSS situation model. The overall method is a working
-solution that solves the following Winograd schemas: a) trophy and suitcase, b)
-person lifts person, c) person pays detective, and d) councilmen and
-demonstrators.
-"
-1444,1411.4114,"Ha Jong Won, Li Gwang Chol, Kim Hyok Chol, Li Kum Song (College of
- Computer Science, Kim Il Sung University)","Definition of Visual Speech Element and Research on a Method of
- Extracting Feature Vector for Korean Lip-Reading",cs.CL cs.CV cs.LG," In this paper, we defined the viseme (visual speech element) and described
-the method of extracting visual feature vectors. We defined 10 vowel-based
-visemes by analyzing Korean utterances and proposed a method of extracting a
-20-dimensional visual feature vector that combines static and dynamic
-features. Lastly, we conducted a word recognition experiment based on a
-3-viseme HMM and evaluated its efficiency.
-"
-1445,1411.4116,"Jianpeng Cheng, Dimitri Kartsaklis, Edward Grefenstette","Investigating the Role of Prior Disambiguation in Deep-learning
- Compositional Models of Meaning",cs.CL cs.LG cs.NE," This paper aims to explore the effect of prior disambiguation on neural
-network-based compositional models, with the hope that better semantic
-representations for text compounds can be produced. We disambiguate the input
-word vectors before they are fed into a compositional deep net. A series of
-evaluations shows the positive effect of prior disambiguation for such deep
-models.
-"
-1446,1411.4166,"Manaal Faruqui and Jesse Dodge and Sujay K. Jauhar and Chris Dyer and
- Eduard Hovy and Noah A. Smith",Retrofitting Word Vectors to Semantic Lexicons,cs.CL," Vector space word representations are learned from distributional information
-of words in large corpora. Although such statistics are semantically
-informative, they disregard the valuable information that is contained in
-semantic lexicons such as WordNet, FrameNet, and the Paraphrase Database. This
-paper proposes a method for refining vector space representations using
-relational information from semantic lexicons by encouraging linked words to
-have similar vector representations, and it makes no assumptions about how the
-input vectors were constructed. Evaluated on a battery of standard lexical
-semantic evaluation tasks in several languages, we obtain substantial
-improvements starting with a variety of word vector models. Our refinement
-method outperforms prior techniques for incorporating semantic lexicons into
-the word vector training algorithms.
-"
-1447,1411.4194,Glenn R. Hofford,ROSS User's Guide and Reference Manual (Version 1.0),cs.AI cs.CL," The ROSS method is a new approach in the area of knowledge representation
-that is useful for many artificial intelligence and natural language
-understanding representation and reasoning tasks. (ROSS stands for
-""Representation"", ""Ontology"", ""Structure"", ""Star"" language). ROSS is a physical
-symbol-based representational scheme. ROSS provides a complex model for the
-declarative representation of physical structure and for the representation of
-processes and causality. From the metaphysical perspective, the ROSS view of
-external reality involves a 4D model, wherein discrete single-time-point
-unit-sized locations with states are the basis for all objects, processes and
-aspects that can be modeled. ROSS includes a language called ""Star"" for the
-specification of ontology classes. The ROSS method also includes a formal
-scheme called the ""instance model"". Instance models are used in the area of
-natural language meaning representation to represent situations. This document
-is an in-depth specification of the ROSS method.
-"
-1448,1411.4455,"Miao Fan, Deli Zhao, Qiang Zhou, Zhiyuan Liu, Thomas Fang Zheng,
- Edward Y. Chang","Errata: Distant Supervision for Relation Extraction with Matrix
- Completion",cs.CL cs.LG," The essence of distantly supervised relation extraction is that it is an
-incomplete multi-label classification problem with sparse and noisy features.
-To tackle the sparsity and noise challenges, we propose solving the
-classification problem using matrix completion on a factorized matrix of
-minimized rank. We formulate relation classification as completing the unknown
-labels of testing items (entity pairs) in a sparse matrix that concatenates
-training and testing textual features with training labels. Our algorithmic
-framework is based on the assumption that the rank of the item-by-feature and
-item-by-label joint matrix is low. We apply two optimization models to recover
-the underlying low-rank matrix, leveraging the sparsity of the feature-label
-matrix. The matrix completion problem is then solved by the fixed point
-continuation (FPC) algorithm, which can find the global optimum. Experiments on
-two widely used datasets with different dimensions of textual features
-demonstrate that our low-rank matrix completion approach significantly
-outperforms the baseline and the state-of-the-art methods.
-"
-1449,1411.4472,Andrej Gajduk and Ljupco Kocarev,Opinion mining of text documents written in Macedonian language,cs.CL," The ability to extract public opinion from web portals such as review sites,
-social networks and blogs will enable companies and individuals to form a view,
-an attitude and make decisions without having to do lengthy and costly
-research and surveys. In this paper machine learning techniques are used for
-determining the polarity of forum posts on Kajgana, which are written in the
-Macedonian language. The posts are classified as being positive, negative or
-neutral. We test different feature metrics and classifiers and provide a
-detailed evaluation of their contribution to improving the overall performance
-on a manually generated dataset. By achieving 92% accuracy, we show that the
-performance of systems for automated opinion mining is comparable to that of a
-human evaluator, thus making it a viable option for text data analysis.
-Finally, we present a few statistics derived from the forum posts using the
-developed system.
-"
-1450,1411.4614,"Pascal Vaillant, Jean-Baptiste Lamy","Using graph transformation algorithms to generate natural language
- equivalents of icons expressing medical concepts",cs.CL," A graphical language addresses the need to communicate medical information in
-a synthetic way. Medical concepts are expressed by icons conveying fast visual
-information about patients' current state or about the known effects of drugs.
-In order to increase the visual language's acceptance and usability, a natural
-language generation interface is currently being developed. In this context,
-this paper describes the use of an informatics method ---graph
-transformation--- to prepare data consisting of concepts in an OWL-DL ontology
-for use in a natural language generation component. The OWL concept may be
-considered as a star-shaped graph with a central node. The method transforms it
-into a graph representing the deep semantic structure of a natural language
-phrase. This work may be of future use in other contexts where ontology
-concepts have to be mapped to half-formalized natural language expressions.
-"
-1451,1411.4618,"Christopher J.C.
Burges, Erin Renshaw, and Andrzej Pastusiak",Relations World: A Possibilistic Graphical Model,cs.CL cs.AI," We explore the idea of using a ""possibilistic graphical model"" as the basis
-for a world model that drives a dialog system. As a first step we have
-developed a system that uses text-based dialog to derive a model of the user's
-family relations. The system leverages its world model to infer relational
-triples, to learn to recover from upstream coreference resolution errors and
-ambiguities, and to learn context-dependent paraphrase models. We also explore
-some theoretical aspects of the underlying graphical model.
-"
-1452,1411.4825,Ulrich Furbach and Claudia Schon and Frieder Stolzenburg,Cognitive Systems and Question Answering,cs.AI cs.CL," This paper briefly characterizes the field of cognitive computing. As an
-exemplification, the field of natural language question answering is introduced
-together with its specific challenges. One possible way to master these
-challenges is illustrated by a detailed presentation of the LogAnswer system,
-which is a successful representative of the field of natural language question
-answering.
-"
-1453,1411.4925,A. Ramos-Soto and A. Bugar\'in and S. Barro and J. Taboada,"Linguistic Descriptions for Automatic Generation of Textual Short-Term
- Weather Forecasts on Real Prediction Data",cs.AI cs.CL," We present in this paper an application which automatically generates textual
-short-term weather forecasts for every municipality in Galicia (NW Spain),
-using the real data provided by the Galician Meteorology Agency (MeteoGalicia).
-This solution combines, in an innovative way, computing-with-perceptions
-techniques and strategies for the linguistic description of data together with
-a natural language generation (NLG) system. The application, named GALiWeather,
-extracts relevant information from weather forecast input data and encodes it
-into intermediate descriptions using linguistic variables and temporal
-references. These descriptions are later translated into natural language texts
-by the natural language generation system. The obtained forecast results have
-been thoroughly validated by an expert meteorologist from MeteoGalicia using a
-quality assessment methodology which covers two key dimensions of a text: the
-accuracy of its content and the correctness of its form. Following this
-validation, GALiWeather will be released as a real service offering custom
-forecasts to the general public.
-"
-1454,1411.4952,"Hao Fang and Saurabh Gupta and Forrest Iandola and Rupesh Srivastava
- and Li Deng and Piotr Doll\'ar and Jianfeng Gao and Xiaodong He and Margaret
- Mitchell and John C. Platt and C. Lawrence Zitnick and Geoffrey Zweig",From Captions to Visual Concepts and Back,cs.CV cs.CL," This paper presents a novel approach for automatically generating image
-descriptions: visual detectors, language models, and multimodal similarity
-models learnt directly from a dataset of image captions. We use multiple
-instance learning to train visual detectors for words that commonly occur in
-captions, including many different parts of speech such as nouns, verbs, and
-adjectives. The word detector outputs serve as conditional inputs to a
-maximum-entropy language model. The language model learns from a set of over
-400,000 image descriptions to capture the statistics of word usage. We capture
-global semantics by re-ranking caption candidates using sentence-level features
-and a deep multimodal similarity model.
Our system is state-of-the-art on the
-official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. When
-human judges compare the system captions to ones written by other people on our
-held-out test set, the system captions have equal or better quality 34% of the
-time.
-"
-1455,1411.4960,"Hana Rizvi\'c, Sanda Martin\v{c}i\'c-Ip\v{s}i\'c, Ana Me\v{s}trovi\'c",Network Motifs Analysis of Croatian Literature,cs.CL," In this paper we analyse network motifs in the co-occurrence directed
-networks constructed from five different texts (four books and one portal) in
-the Croatian language. After preparing the data and constructing the networks,
-we perform the network motif analysis. We analyse the motif frequencies and
-Z-scores in the five networks. We present the triad significance profile for
-the five datasets. Furthermore, we compare our results with the existing
-results for linguistic networks. Firstly, we show that the triad significance
-profile for the Croatian language is very similar to those of the other
-languages, and that all the networks belong to the same family of networks.
-However, there are certain differences between the Croatian language and the
-other analysed languages. We conclude that this is due to the free word order
-of the Croatian language.
-"
-1456,1411.5379,"Kai Zhao, Liang Huang",Type-Driven Incremental Semantic Parsing with Polymorphism,cs.CL," Semantic parsing has made significant progress, but most current semantic
-parsers are extremely slow (CKY-based) and rather primitive in representation.
-We introduce three new techniques to tackle these problems. First, we design
-the first linear-time incremental shift-reduce-style semantic parsing algorithm
-which is more efficient than conventional cubic-time bottom-up semantic
-parsers. Second, our parser, being type-driven instead of syntax-driven, uses
-type-checking to decide the direction of reduction, which eliminates the need
-for a syntactic grammar such as CCG. Third, to fully exploit the power of
-type-driven semantic parsing beyond simple types (such as entities and truth
-values), we borrow from programming language theory the concepts of subtype
-polymorphism and parametric polymorphism to enrich the type system in order to
-better guide the parsing. Our system learns very accurate parses in GeoQuery,
-Jobs and Atis domains.
-"
-1457,1411.5595,"Tianze Shi, Zhiyuan Liu",Linking GloVe with word2vec,cs.CL cs.LG stat.ML," The Global Vectors for word representation (GloVe) model, introduced by
-Jeffrey Pennington et al., is reported to be an efficient and effective method
-for learning vector representations of words. State-of-the-art performance is
-also provided by skip-gram with negative-sampling (SGNS) implemented in the
-word2vec tool. In this note, we explain the similarities between the training
-objectives of the two models, and show that the objective of SGNS is similar to
-the objective of a specialized form of GloVe, though their cost functions are
-defined differently.
-"
-1458,1411.5654,Xinlei Chen and C. Lawrence Zitnick,Learning a Recurrent Visual Representation for Image Caption Generation,cs.CV cs.AI cs.CL," In this paper we explore the bi-directional mapping between images and their
-sentence-based descriptions. We propose learning this mapping using a recurrent
-neural network. Unlike previous approaches that map both sentences and images
-to a common embedding, we enable the generation of novel sentences given an
-image.
Using the same model, we can also reconstruct the visual features -associated with an image given its visual description. We use a novel recurrent -visual memory that automatically learns to remember long-term visual concepts -to aid in both sentence generation and visual feature reconstruction. We -evaluate our approach on several tasks. These include sentence generation, -sentence retrieval and image retrieval. State-of-the-art results are shown for -the task of generating novel image descriptions. When compared to human -generated captions, our automatically generated captions are preferred by -humans over $19.8\%$ of the time. Results are better than or comparable to -state-of-the-art results on the image and sentence retrieval tasks for methods -using similar visual features. -" -1459,1411.5726,"Ramakrishna Vedantam, C. Lawrence Zitnick and Devi Parikh",CIDEr: Consensus-based Image Description Evaluation,cs.CV cs.CL cs.IR," Automatically describing an image with a sentence is a long-standing -challenge in computer vision and natural language processing. Due to recent -progress in object detection, attribute classification, action recognition, -etc., there is renewed interest in this area. However, evaluating the quality -of descriptions has proven to be challenging. We propose a novel paradigm for -evaluating image descriptions that uses human consensus. This paradigm consists -of three main parts: a new triplet-based method of collecting human annotations -to measure consensus, a new automated metric (CIDEr) that captures consensus, -and two new datasets: PASCAL-50S and ABSTRACT-50S that contain 50 sentences -describing each image. Our simple metric captures human judgment of consensus -better than existing metrics across sentences generated by various sources. We -also evaluate five state-of-the-art image description approaches using this new -protocol and provide a benchmark for future comparisons. A version of CIDEr -named CIDEr-D is available as a part of MS COCO evaluation server to enable -systematic evaluation and benchmarking. -" -1460,1411.5732,"Suleyman Cetintas, Luo Si, Yan Ping Xin, Dake Zhang, Joo Young Park, - Ron Tzur","A Joint Probabilistic Classification Model of Relevant and Irrelevant - Sentences in Mathematical Word Problems",cs.CL cs.IR cs.LG stat.ML," Estimating the difficulty level of math word problems is an important task -for many educational applications. Identification of relevant and irrelevant -sentences in math word problems is an important step for calculating the -difficulty levels of such problems. This paper addresses a novel application of -text categorization to identify two types of sentences in mathematical word -problems, namely relevant and irrelevant sentences. A novel joint probabilistic -classification model is proposed to estimate the joint probability of -classification decisions for all sentences of a math word problem by utilizing -the correlation among all sentences along with the correlation between the -question sentence and other sentences, and sentence text. The proposed model is -compared with i) a SVM classifier which makes independent classification -decisions for individual sentences by only using the sentence text and ii) a -novel SVM classifier that considers the correlation between the question -sentence and other sentences along with the sentence text. 
An extensive set of
-experiments demonstrates the effectiveness of the joint probabilistic
-classification model for identifying relevant and irrelevant sentences, as
-well as of the novel SVM classifier that utilizes the correlation between the
-question sentence and other sentences. Furthermore, empirical results and
-analysis show that i) it is highly beneficial not to remove stopwords and ii)
-utilizing part-of-speech tagging does not make a significant improvement,
-although it has been shown to be effective for the related task of math word
-problem type classification.
-"
-1461,1411.5796,"Rajveer Kaur, Saurabh Sharma",Pre-processing of Domain Ontology Graph Generation System in Punjabi,cs.CL," This paper describes the pre-processing phase of an ontology graph generation
-system for Punjabi text documents from different domains. This research paper
-focuses on the pre-processing of Punjabi text documents. Pre-processing
-produces a structured representation of the input text. Pre-processing for
-ontology graph generation includes applying input restrictions to the text,
-removal of special symbols and punctuation marks, removal of duplicate terms,
-removal of stop words, and extraction of terms by matching input terms against
-dictionary and gazetteer list terms.
-"
-1462,1411.6699,Yangfeng Ji and Jacob Eisenstein,"One Vector is Not Enough: Entity-Augmented Distributional Semantics for
- Discourse Relations",cs.CL cs.LG," Discourse relations bind smaller linguistic units into coherent texts.
-However, automatically identifying discourse relations is difficult, because it
-requires understanding the semantics of the linked arguments. A more subtle
-challenge is that it is not enough to represent the meaning of each argument of
-a discourse relation, because the relation may depend on links between
-lower-level components, such as entity mentions. Our solution computes
-distributional meaning representations by composition up the syntactic parse
-tree. A key difference from previous work on compositional distributional
-semantics is that we also compute representations for entity mentions, using a
-novel downward compositional pass. Discourse relations are predicted from the
-distributional representations of the arguments, and also of their coreferent
-entity mentions. The resulting system obtains substantial improvements over the
-previous state-of-the-art in predicting implicit discourse relations in the
-Penn Discourse Treebank.
-"
-1463,1411.6718,"Mahmoud Nabil, Mohamed Aly, Amir Atiya",LABR: A Large Scale Arabic Sentiment Analysis Benchmark,cs.CL cs.LG," We introduce LABR, the largest sentiment analysis dataset to-date for the
-Arabic language. It consists of over 63,000 book reviews, each rated on a scale
-of 1 to 5 stars. We investigate the properties of the dataset, and present its
-statistics. We explore using the dataset for two tasks: (1) sentiment polarity
-classification; and (2) ratings classification. Moreover, we provide standard
-splits of the dataset into training, validation and testing, for both polarity
-and ratings classification, in both balanced and unbalanced settings. We extend
-our previous work by performing a comprehensive analysis on the dataset. In
-particular, we perform an extended survey of the different classifiers
-typically used for the sentiment polarity classification problem. We also
-construct a sentiment lexicon from the dataset that contains both single and
-compound sentiment words and we explore its effectiveness.
We make the dataset
-and experimental details publicly available.
-"
-1464,1411.7820,Vivi Nastase and Angela Fahrni,"Coarse-grained Cross-lingual Alignment of Comparable Texts with Topic
- Models and Encyclopedic Knowledge",cs.CL," We present a method for coarse-grained cross-lingual alignment of comparable
-texts: segments consisting of contiguous paragraphs that discuss the same theme
-(e.g. history, economy) are aligned based on induced multilingual topics. The
-method combines three ideas: a two-level LDA model that filters out words that
-do not convey themes, an HMM that models the ordering of themes in the
-collection of documents, and language-independent concept annotations to serve
-as a cross-language bridge and to strengthen the connection between paragraphs
-in the same segment through concept relations. The method is evaluated on
-English and French data previously used for monolingual alignment. The results
-show state-of-the-art performance in both monolingual and cross-lingual
-settings.
-"
-1465,1411.7942,"Tamara Polajnar, Laura Rimell, Stephen Clark",Using Sentence Plausibility to Learn the Semantics of Transitive Verbs,cs.CL," The functional approach to compositional distributional semantics considers
-transitive verbs to be linear maps that transform the distributional vectors
-representing nouns into a vector representing a sentence. We conduct an initial
-investigation that uses a matrix consisting of the parameters of a logistic
-regression classifier trained on a plausibility task as a transitive verb
-function. We compare our method to a commonly used corpus-based method for
-constructing a verb matrix and find that the plausibility training may be more
-effective for disambiguation tasks.
-"
-1466,1412.0696,"Shuyang Gao, Greg Ver Steeg and Aram Galstyan","Understanding confounding effects in linguistic coordination: an
- information-theoretic approach",cs.CL cs.IT cs.SI math.IT physics.data-an," We suggest an information-theoretic approach for measuring stylistic
-coordination in dialogues. The proposed measure has a simple predictive
-interpretation and can account for various confounding factors through proper
-conditioning. We revisit some of the previous studies that reported strong
-signatures of stylistic accommodation, and find that a significant part of the
-observed coordination can be attributed to a simple confounding effect - length
-coordination. Specifically, longer utterances tend to be followed by longer
-responses, which gives rise to spurious correlations in the other stylistic
-features. We propose a test to distinguish correlations in length due to
-contextual factors (topic of conversation, user verbosity, etc.) and
-turn-by-turn coordination. We also suggest a test to identify whether stylistic
-coordination persists even after accounting for length coordination and
-contextual factors.
-"
-1467,1412.0751,John Wieting,Tiered Clustering to Improve Lexical Entailment,cs.CL," Many tasks in Natural Language Processing involve recognizing lexical
-entailment. Two quite different approaches to this problem have been proposed
-recently. The first is an asymmetric similarity measure designed to give high
-scores when the contexts of the narrower term in the entailment are a subset of
-those of the broader term. The second is a supervised approach where a
-classifier is learned to predict entailment given a concatenated latent vector
-representation of the word.
Both
-of these approaches are vector space models that use a single context vector as
-a representation of the word. In this work, I study the effects of clustering
-words into senses and of using these multiple context vectors to infer
-entailment with extensions of these two algorithms. I find that this approach
-offers some improvement to these entailment algorithms.
-"
-1468,1412.0879,"Sean Gallagher, Wlodek Zadrozny, Walid Shalaby, Adarsh Avadhani",Watsonsim: Overview of a Question Answering Engine,cs.CL cs.IR," The objective of the project is to design and run a system similar to Watson,
-designed to answer Jeopardy questions. In the course of a semester, we
-developed an open source question answering system using the Indri, Lucene,
-Bing and Google search engines, Apache UIMA, Open- and CoreNLP, and Weka, among
-other modules. By the end of the semester, we achieved 18% accuracy on
-Jeopardy questions, and work has not stopped since then.
-"
-1469,1412.1058,Rie Johnson and Tong Zhang,"Effective Use of Word Order for Text Categorization with Convolutional
- Neural Networks",cs.CL cs.LG stat.ML," Convolutional neural network (CNN) is a neural network that can make use of
-the internal structure of data such as the 2D structure of image data. This
-paper studies CNN on text categorization to exploit the 1D structure (namely,
-word order) of text data for accurate prediction. Instead of using
-low-dimensional word vectors as input as is often done, we directly apply CNN
-to high-dimensional text data, which leads to directly learning embedding of
-small text regions for use in classification. In addition to a straightforward
-adaptation of CNN from image to text, a simple but new variation which employs
-bag-of-word conversion in the convolution layer is proposed. An extension to
-combine multiple convolution layers is also explored for higher accuracy. The
-experiments demonstrate the effectiveness of our approach in comparison with
-state-of-the-art methods.
-"
-1470,1412.1215,"H\'el\`ene Pignot (SAMM), Odile Piton (SAMM)","Mary Astell's words in A Serious Proposal to the Ladies (part I), a
- lexicographic inquiry with NooJ",cs.CL," In the following article we elected to study with NooJ the lexis of a 17th
-century text, Mary Astell's seminal essay, A Serious Proposal to the Ladies,
-part I, published in 1694. We first focused on the semantics to see how Astell
-builds her vindication of the female sex, which words she uses to sensitise
-women to their alienated condition and promote their education. Then we studied
-the morphology of the lexemes (which is different from contemporary English)
-used by the author, thanks to the NooJ tools we have devised for this purpose.
-NooJ has great functionalities for lexicographic work. Its commands and graphs
-prove to be most efficient in the spotting of archaic words or variants in
-spelling.
- Introduction: In our previous articles, we have studied the singularities of
-17th century English within the framework of a diachronic analysis thanks to
-syntactical and morphological graphs and thanks to the dictionaries we have
-compiled from a corpus that may be expanded over time. Our early work was based
-on a limited corpus of English travel literature to Greece in the 17th century.
-This article deals with a late seventeenth century text written by a woman
-philosopher and essayist, Mary Astell (1666--1731), considered one of the
-first English feminists.
Astell wrote her essay at a -time in English history when women were ""the weaker vessel"" and their main -business in life was to charm and please men by their looks and submissiveness. -In this essay we will see how NooJ can help us analyse Astell's rhetoric (what -point of view does she adopt, does she speak in her own name, in the name of -all women, what is her representation of men and women and their relationships -in the text, what are the goals of education?). Then we will turn our attention -to the morphology of words in the text and use NooJ commands and graphs to -carry out a lexicographic inquiry into Astell's lexemes. -" -1471,1412.1342,Diego R. Amancio,"A perspective on the advancement of natural language processing tasks - via topological analysis of complex networks",cs.CL," Comment on ""Approaching human language with complex networks"" by Cong and Liu -(Physics of Life Reviews, Volume 11, Issue 4, December 2014, Pages 598-618). -" -1472,1412.1454,"Noam Shazeer, Joris Pelemans, Ciprian Chelba","Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability - Estimation",cs.LG cs.CL," We present a novel family of language model (LM) estimation techniques named -Sparse Non-negative Matrix (SNM) estimation. A first set of experiments -empirically evaluating it on the One Billion Word Benchmark shows that SNM -$n$-gram LMs perform almost as well as the well-established Kneser-Ney (KN) -models. When using skip-gram features the models are able to match the -state-of-the-art recurrent neural network (RNN) LMs; combining the two modeling -techniques yields the best known result on the benchmark. The computational -advantages of SNM over both maximum entropy and RNN LM estimation are probably -its main strength, promising an approach that has the same flexibility in -combining arbitrary features effectively and yet should scale to very large -amounts of data as gracefully as $n$-gram LMs do. -" -1473,1412.1632,"Lei Yu, Karl Moritz Hermann, Phil Blunsom and Stephen Pulman",Deep Learning for Answer Sentence Selection,cs.CL," Answer sentence selection is the task of identifying sentences that contain -the answer to a given question. This is an important problem in its own right -as well as in the larger context of open domain question answering. We propose -a novel approach to solving this task via means of distributed representations, -and learn to match questions with answers by considering their semantic -encoding. This contrasts prior work on this task, which typically relies on -classifiers with large numbers of hand-crafted syntactic and semantic features -and various external resources. Our approach does not require any feature -engineering nor does it involve specialist linguistic data, making this model -easily applicable to a wide range of domains and languages. Experimental -results on a standard benchmark dataset from TREC demonstrate that---despite -its simplicity---our model matches state of the art performance on the answer -sentence selection task. -" -1474,1412.1820,"Dan Gillick, Nevena Lazic, Kuzman Ganchev, Jesse Kirchner, David Huynh",Context-Dependent Fine-Grained Entity Type Tagging,cs.CL," Entity type tagging is the task of assigning category labels to each mention -of an entity in a document. While standard systems focus on a small set of -types, recent work (Ling and Weld, 2012) suggests that using a large -fine-grained label set can lead to dramatic improvements in downstream tasks. 
-In the absence of labeled training data, existing fine-grained tagging systems -obtain examples automatically, using resolved entities and their types -extracted from a knowledge base. However, since the appropriate type often -depends on context (e.g. Washington could be tagged either as city or -government), this procedure can result in spurious labels, leading to poorer -generalization. We propose the task of context-dependent fine type tagging, -where the set of acceptable labels for a mention is restricted to only those -deducible from the local context (e.g. sentence or document). We introduce new -resources for this task: 12,017 mentions annotated with their context-dependent -fine types, and we provide baseline experimental results on this data. -" -1475,1412.1841,P. F. Tupper,Exemplar Dynamics and Sound Merger in Language,cs.CL math.DS nlin.AO," We develop a model of phonological contrast in natural language. -Specifically, the model describes the maintenance of contrast between different -words in a language, and the elimination of such contrast when sounds in the -words merge. An example of such a contrast is that provided by the two vowel -sounds 'i' and 'e', which distinguish pairs of words such as 'pin' and 'pen' in -most dialects of English. We model language users' knowledge of the -pronunciation of a word as consisting of collections of labeled exemplars -stored in memory. Each exemplar is a detailed memory of a particular utterance -of the word in question. In our model an exemplar is represented by one or two -phonetic variables along with a weight indicating how strong the memory of the -utterance is. Starting from an exemplar-level model we derive -integro-differential equations for the evolution of exemplar density fields in -phonetic space. Using these latter equations we investigate under what -conditions two sounds merge, thus eliminating the contrast. Our main conclusion -is that for the preservation of phonological contrast, it is necessary that -anomalous utterances of a given word are discarded, and not merely stored in -memory as an exemplar of another word. -" -1476,1412.1866,"Catherine Kerr, Terri Hoare, Paula Carroll, Jakub Marecek",Integer-Programming Ensemble of Temporal-Relations Classifiers,cs.CL cs.LG math.OC," The extraction and understanding of temporal events and their relations are -major challenges in natural language processing. Processing text on a -sentence-by-sentence or expression-by-expression basis often fails, in part due -to the challenge of capturing the global consistency of the text. We present an -ensemble method, which reconciles the outputs of multiple classifiers of -temporal expressions across the text using integer programming. Computational -experiments show that the ensemble improves upon the best individual results -from two recent challenges, SemEval-2013 TempEval-3 (Temporal Annotation) and -SemEval-2016 Task 12 (Clinical TempEval). -" -1477,1412.2007,"S\'ebastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio",On Using Very Large Target Vocabulary for Neural Machine Translation,cs.CL," Neural machine translation, a recently proposed approach to machine -translation based purely on neural networks, has shown promising results -compared to the existing approaches such as phrase-based statistical machine -translation. 
Despite its recent success, neural machine translation is
-limited in handling a large vocabulary, as training complexity as well as
-decoding complexity increase proportionally to the number of target words. In
-this paper, we propose a method that allows us to use a very large target
-vocabulary without increasing training complexity, based on importance
-sampling. We show that decoding can be efficiently done even with the model
-having a very large target vocabulary by selecting only a small subset of the
-whole target vocabulary. The models trained by the proposed approach are
-empirically found to outperform the baseline models with a small vocabulary as
-well as the LSTM-based neural machine translation models. Furthermore, when we
-use the ensemble of a few models with very large target vocabularies, we
-achieve the state-of-the-art translation performance (measured by BLEU) on the
-English->German translation and almost as high performance as the
-state-of-the-art English->French translation system.
-"
-1478,1412.2197,Liangliang Cao and Chang Wang,Practice in Synonym Extraction at Large Scale,cs.CL," Synonym extraction is an important task in natural language processing and
-often used as a submodule in query expansion, question answering and other
-applications. An automatic synonym extractor is highly preferable for large
-scale applications. Previous studies in synonym extraction are mostly limited
-to small scale datasets. In this paper, we build a large dataset with 3.4
-million synonym/non-synonym pairs to capture the challenges in real world
-scenarios. We propose (1) a new cost function to accommodate the unbalanced
-learning problem, and (2) a feature learning based deep neural network to model
-the complicated relationships in synonym pairs. We compare several different
-approaches based on SVMs and neural networks, and find that a novel feature
-learning based neural network outperforms the methods with hand-assigned
-features. Specifically, the best performance of our model surpasses the SVM
-baseline with a significant 97\% relative improvement.
-"
-1479,1412.2378,"Danushka Bollegala and Takanori Maehara and Yuichi Yoshida and
- Ken-ichi Kawarabayashi",Learning Word Representations from Relational Graphs,cs.CL," Attributes of words and relations between two words are central to numerous
-tasks in Artificial Intelligence such as knowledge representation, similarity
-measurement, and analogy detection. Often when two words share one or more
-attributes in common, they are connected by some semantic relations. On the
-other hand, if there are numerous semantic relations between two words, we can
-expect some of the attributes of one of the words to be inherited by the other.
-Motivated by this close connection between attributes and relations, given a
-relational graph in which words are interconnected via numerous semantic
-relations, we propose a method to learn a latent representation for the
-individual words. The proposed method considers not only the co-occurrences of
-words as done by existing approaches for word representation learning, but also
-the semantic relations in which two words co-occur. To evaluate the accuracy of
-the word representations learnt using the proposed method, we use the learnt
-word representations to solve semantic word analogy problems. Our experimental
-results show that it is possible to learn better word representations by using
-the semantic relations between words.
-"
-1480,1412.2442,M.
Yahia Kaadan and Asaad Kaadan,Rediscovering the Alphabet - On the Innate Universal Grammar,cs.CL," Universal Grammar (UG) theory has been one of the most important research
-topics in linguistics since it was introduced five decades ago. UG specifies
-the restricted set of languages learnable by the human brain, and thus, many
-researchers believe in its biological roots. Numerous empirical studies of
-neurobiological and cognitive functions of the human brain, and of many natural
-languages, have been conducted to unveil some aspects of UG. This, however,
-resulted in different and sometimes contradictory theories that do not indicate
-a universally unique grammar. In this research, we tackle the UG problem from
-an entirely different perspective. We search for the Unique Universal Grammar
-(UUG) that facilitates communication and knowledge transfer, the sole purpose
-of a language. We formulate this UG and show that it is unique, intrinsic, and
-cosmic, rather than humanistic. Initial analysis on a widespread natural
-language already showed some positive results.
-"
-1481,1412.2486,Ramon Ferrer-i-Cancho,Optimization models of natural communication,physics.soc-ph cs.CL physics.data-an," A family of information theoretic models of communication was introduced more
-than a decade ago to explain the origins of Zipf's law for word frequencies.
-The family is based on a combination of two information theoretic principles:
-maximization of mutual information between forms and meanings and minimization
-of form entropy. The family also sheds light on the origins of three other
-patterns: the principle of contrast, a related vocabulary learning bias and the
-meaning-frequency law. Here two important components of the family, namely the
-information theoretic principles and the energy function that combines them
-linearly, are reviewed from the perspective of psycholinguistics, language
-learning, information theory and synergetic linguistics. The minimization of
-this linear function is linked to the problem of compression of standard
-information theory and might be tuned by self-organization.
-"
-1482,1412.2487,"Richard A. Blythe, Andrew D. M. Smith and Kenny Smith",Word learning under infinite uncertainty,physics.soc-ph cs.CL," Language learners must learn the meanings of many thousands of words, despite
-those words occurring in complex environments in which infinitely many meanings
-might be inferred by the learner as a word's true meaning. This problem of
-infinite referential uncertainty is often attributed to Willard Van Orman
-Quine. We provide a mathematical formalisation of an ideal cross-situational
-learner attempting to learn under infinite referential uncertainty, and
-identify conditions under which word learning is possible. As Quine's
-intuitions suggest, learning under infinite uncertainty is in fact possible,
-provided that learners have some means of ranking candidate word meanings in
-terms of their plausibility; furthermore, our analysis shows that this ranking
-could in fact be exceedingly weak, implying that constraints which allow
-learners to infer the plausibility of candidate word meanings could themselves
-be weak. This approach lifts the burden of explanation from `smart' word
-learning constraints in learners, and suggests a programme of research into
-weak, unreliable, probabilistic constraints on the inference of word meaning in
-real word learners.
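To make the cross-situational setting of Blythe, Smith and Smith (1412.2487) above concrete, here is a minimal toy sketch; the plausibility weight `bias` and all parameter values are our own illustrative assumptions, not the authors' mathematical formalisation.

import random

def cross_situational_learner(true_meaning=0, n_exposures=50, n_candidates=1000,
                              confusors_per_scene=5, bias=1.1, seed=0):
    # Each exposure to the word presents its true meaning plus some spurious
    # candidate meanings; every candidate present in the scene has its score
    # incremented by a plausibility weight. Learning succeeds when the true
    # meaning ends up with the highest accumulated score.
    rng = random.Random(seed)
    scores = [0.0] * n_candidates  # candidate 0 is the true meaning
    for _ in range(n_exposures):
        scene = {true_meaning} | {rng.randrange(1, n_candidates)
                                  for _ in range(confusors_per_scene)}
        for meaning in scene:
            # An exceedingly weak ranking suffices: the true meaning gets
            # only a slight plausibility edge over spurious candidates.
            scores[meaning] += bias if meaning == true_meaning else 1.0
    return max(range(n_candidates), key=scores.__getitem__) == true_meaning

print(cross_situational_learner())  # True: the true meaning wins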
-" -1483,1412.2812,Ivan Titov and Ehsan Khoddam,"Unsupervised Induction of Semantic Roles within a Reconstruction-Error - Minimization Framework",cs.CL cs.AI cs.LG stat.ML," We introduce a new approach to unsupervised estimation of feature-rich -semantic role labeling models. Our model consists of two components: (1) an -encoding component: a semantic role labeling model which predicts roles given a -rich set of syntactic and lexical features; (2) a reconstruction component: a -tensor factorization model which relies on roles to predict argument fillers. -When the components are estimated jointly to minimize errors in argument -reconstruction, the induced roles largely correspond to roles defined in -annotated resources. Our method performs on par with most accurate role -induction methods on English and German, even though, unlike these previous -approaches, we do not incorporate any prior linguistic knowledge about the -languages. -" -1484,1412.2821,Xiuli Wang,Zipf's Law and the Frequency of Characters or Words of Oracles,cs.CL math.ST stat.TH," The article discusses the frequency of characters of Oracle,concluding that -the frequency and the rank of a word or character is fit to Zipf-Mandelboit Law -or Zipf's law with three parameters,and figuring out the parameters based on -the frequency,and pointing out that what some researchers of Oracle call the -assembling on the two ends is just a description by their impression about the -Oracle data. -" -1485,1412.3336,Dami\'an H. Zanette,Statistical Patterns in Written Language,cs.CL," Quantitative linguistics has been allowed, in the last few decades, within -the admittedly blurry boundaries of the field of complex systems. A growing -host of applied mathematicians and statistical physicists devote their efforts -to disclose regularities, correlations, patterns, and structural properties of -language streams, using techniques borrowed from statistics and information -theory. Overall, results can still be categorized as modest, but the prospects -are promising: medium- and long-range features in the organization of human -language -which are beyond the scope of traditional linguistics- have already -emerged from this kind of analysis and continue to be reported, contributing a -new perspective to our understanding of this most complex communication system. -This short book is intended to review some of these recent contributions. -" -1486,1412.3714,Jiwei Li,Feature Weight Tuning for Recursive Neural Networks,cs.NE cs.AI cs.CL cs.LG," This paper addresses how a recursive neural network model can automatically -leave out useless information and emphasize important evidence, in other words, -to perform ""weight tuning"" for higher-level representation acquisition. We -propose two models, Weighted Neural Network (WNN) and Binary-Expectation Neural -Network (BENN), which automatically control how much one specific unit -contributes to the higher-level representation. The proposed model can be -viewed as incorporating a more powerful compositional function for embedding -acquisition in recursive neural networks. Experimental results demonstrate the -significant improvement over standard neural models. -" -1487,1412.4021,"Dat Quoc Nguyen, Dai Quoc Nguyen, Dang Duc Pham, Son Bao Pham","A Robust Transformation-Based Learning Approach Using Ripple Down Rules - for Part-of-Speech Tagging",cs.CL," In this paper, we propose a new approach to construct a system of -transformation rules for the Part-of-Speech (POS) tagging task. 
Our approach is
-based on an incremental knowledge acquisition method where rules are stored in
-an exception structure and new rules are only added to correct the errors of
-existing rules; thus allowing systematic control of the interaction between the
-rules. Experimental results on 13 languages show that our approach is fast in
-terms of training time and tagging speed. Furthermore, our approach obtains
-very competitive accuracy in comparison to state-of-the-art POS and
-morphological taggers.
-"
-1488,1412.4160,"Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham",Ripple Down Rules for Question Answering,cs.CL cs.IR," Recent years have witnessed a new trend of building ontology-based question
-answering systems. These systems use semantic web information to produce more
-precise answers to users' queries. However, these systems are mostly designed
-for English. In this paper, we introduce an ontology-based question answering
-system named KbQAS which, to the best of our knowledge, is the first one made
-for Vietnamese. KbQAS employs our question analysis approach that
-systematically constructs a knowledge base of grammar rules to convert each
-input question into an intermediate representation element. KbQAS then takes
-the intermediate representation element with respect to a target ontology and
-applies concept-matching techniques to return an answer. On a wide range of
-Vietnamese questions, experimental results show that the performance of KbQAS
-is promising with accuracies of 84.1% and 82.4% for analyzing input questions
-and retrieving output answers, respectively. Furthermore, our question analysis
-approach can easily be applied to new domains and new languages, thus saving
-time and human effort.
-"
-1489,1412.4314,Joseph Chee Chang and Chu-Cheng Lin,"Recurrent-Neural-Network for Language Detection on Twitter
- Code-Switching Corpus",cs.NE cs.CL," Mixed language data is one of the difficult yet less explored domains of
-natural language processing. Most research in fields like machine translation
-or sentiment analysis assumes monolingual input. However, people who are
-capable of using more than one language often communicate using multiple
-languages at the same time. Sociolinguists believe this ""code-switching""
-phenomenon to be socially motivated, for example, to express solidarity or to
-establish authority. Most past work depends on external tools or resources,
-such as part-of-speech tagging, dictionary look-up, or named-entity recognizers
-to extract rich features for training machine learning models. In this paper,
-we train recurrent neural networks with only raw features, and use word
-embedding to automatically learn meaningful representations. Using the same
-mixed-language Twitter corpus, our system is able to outperform the best
-SVM-based systems reported in the EMNLP'14 Code-Switching Workshop by 1% in
-accuracy, or by 17% in error rate reduction.
-"
-1490,1412.4369,Daniel Fried and Kevin Duh,"Incorporating Both Distributional and Relational Semantics in Word
- Representations",cs.CL," We investigate the hypothesis that word representations ought to incorporate
-both distributional and relational semantics. To this end, we employ the
-Alternating Direction Method of Multipliers (ADMM), which flexibly optimizes a
-distributional objective on raw text and a relational objective on WordNet.
-Preliminary results on knowledge base completion, analogy tests, and parsing
-show that word representations trained on both objectives can give improvements
-in some cases.
-" -1491,1412.4385,Yi Yang and Jacob Eisenstein,Unsupervised Domain Adaptation with Feature Embeddings,cs.CL cs.LG," Representation learning is the dominant technique for unsupervised domain -adaptation, but existing approaches often require the specification of ""pivot -features"" that generalize across domains, which are selected by task-specific -heuristics. We show that a novel but simple feature embedding approach provides -better performance, by exploiting the feature template structure common in NLP -problems. -" -1492,1412.4401,"C. Enguehard (LINA), B. Daille, E. Morin",Tools for Terminology Processing,cs.CY cs.CL," Automatic terminology processing appeared 10 years ago when electronic -corpora became widely available. Such processing may be statistically or -linguistically based and produces terminology resources that can be used in a -number of applications : indexing, information retrieval, technology watch, -etc. We present the tools that have been developed in the IRIN Institute. They -all take as input texts (or collection of texts) and reflect different states -of terminology processing: term acquisition, term recognition and term -structuring. -" -1493,1412.4616,"Felix Weninger, Bj\""orn Schuller, Florian Eyben, Martin W\""ollmer, - Gerhard Rigoll","A Broadcast News Corpus for Evaluation and Tuning of German LVCSR - Systems",cs.CL cs.SD," Transcription of broadcast news is an interesting and challenging application -for large-vocabulary continuous speech recognition (LVCSR). We present in -detail the structure of a manually segmented and annotated corpus including -over 160 hours of German broadcast news, and propose it as an evaluation -framework of LVCSR systems. We show our own experimental results on the corpus, -achieved with a state-of-the-art LVCSR decoder, measuring the effect of -different feature sets and decoding parameters, and thereby demonstrate that -real-time decoding of our test set is feasible on a desktop PC at 9.2% word -error rate. -" -1494,1412.4682,Erik Tromp and Mykola Pechenizkiy,"Rule-based Emotion Detection on Social Media: Putting Tweets on - Plutchik's Wheel",cs.CL," We study sentiment analysis beyond the typical granularity of polarity and -instead use Plutchik's wheel of emotions model. We introduce RBEM-Emo as an -extension to the Rule-Based Emission Model algorithm to deduce such emotions -from human-written messages. We evaluate our approach on two different datasets -and compare its performance with the current state-of-the-art techniques for -emotion detection, including a recursive auto-encoder. The results of the -experimental study suggest that RBEM-Emo is a promising approach advancing the -current state-of-the-art in emotion detection. -" -1495,1412.4729,"Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, - Raymond Mooney, Kate Saenko","Translating Videos to Natural Language Using Deep Recurrent Neural - Networks",cs.CV cs.CL," Solving the visual symbol grounding problem has long been a goal of -artificial intelligence. The field appears to be advancing closer to this goal -with recent breakthroughs in deep learning for natural language grounding in -static images. In this paper, we propose to translate videos directly to -sentences using a unified deep neural network with both convolutional and -recurrent structure. Described video datasets are scarce, and most existing -methods have been applied to toy domains with a small vocabulary of possible -words. 
By transferring knowledge from 1.2M+ images with category labels and
-100,000+ images with captions, our method is able to create sentence
-descriptions of open-domain videos with large vocabularies. We compare our
-approach with recent work using language generation metrics, subject, verb, and
-object prediction accuracy, and a human evaluation.
-"
-1496,1412.4846,"Ruokuang Lin, Qianli D.Y. Ma and Chunhua Bian","Scaling laws in human speech, decreasing emergence of new words and a
- generalized model",cs.CL physics.data-an," The organization and evolution of human language, as a typical complex
-system, is an attractive topic for both physical and cultural researchers. In
-this paper, we present the first exhaustive analysis of the text organization
-of human speech. Two important results are that: (i) the construction and
-organization of spoken language can be characterized as Zipf's law and Heaps'
-law, as observed in written texts; (ii) the word frequency vs. rank
-distribution and the growth of distinct words with the increase of text length
-show significant differences between books and speech. In speech, the word
-frequency distribution is more concentrated on higher-frequency words, and the
-emergence of new words decreases much more rapidly as the content length grows.
-Based on these observations, a new generalized model is proposed to explain
-these complex dynamical behaviors and the differences between speech and books.
-"
-1497,1412.4930,R\'emi Lebret and Ronan Collobert,Rehabilitation of Count-based Models for Word Vector Representations,cs.CL," Recent works on word representations mostly rely on predictive models.
-Distributed word representations (aka word embeddings) are trained to optimally
-predict the contexts in which the corresponding words tend to appear. Such
-models have succeeded in capturing word similarities as well as semantic and
-syntactic regularities. Instead, we aim at reviving interest in a model based
-on counts. We present a systematic study of the use of the Hellinger distance
-to extract semantic representations from the word co-occurrence statistics of
-large text corpora. We show that this distance gives good performance on word
-similarity and analogy tasks, with a proper type and size of context, and a
-dimensionality reduction based on a stochastic low-rank approximation. Besides
-being both simple and intuitive, this method also provides an encoding function
-which can be used to infer unseen words or phrases. This becomes a clear
-advantage compared to predictive models which must train these new words.
-"
-1498,1412.5212,Micha{\l} {\L}opuszy\'nski,Application of Topic Models to Judgments from Public Procurement Domain,cs.CL," In this work, automatic analysis of themes contained in a large corpus of
-judgments from the public procurement domain is performed. The employed
-technique is unsupervised latent Dirichlet allocation (LDA). In addition, it is
-proposed to use LDA in conjunction with a recently developed method of
-unsupervised keyword extraction. Such an approach improves the interpretability
-of the automatically obtained topics and allows for better computational
-performance. The described analysis illustrates the potential of the method in
-detecting recurring themes and discovering temporal trends in lodged contract
-appeals. These results may in future be applied to improve information
-retrieval from repositories of legal texts or as auxiliary material for legal
-analyses carried out by human experts.
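A minimal sketch of the count-based pipeline that Lebret and Collobert (1412.4930) above describe, under our own simplifying assumptions: co-occurrence counts from a toy context window, an exact SVD in place of their stochastic low-rank approximation, and arbitrary parameter values.

import numpy as np

def hellinger_embeddings(tokens, vocab, window=2, dim=50):
    # Build word/context co-occurrence probabilities, take element-wise
    # square roots (so Euclidean distance between rows equals Hellinger
    # distance up to a constant), then reduce dimension with a truncated SVD.
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        if w not in index:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] in index:
                counts[index[w], index[tokens[j]]] += 1
    probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    u, s, _ = np.linalg.svd(np.sqrt(probs), full_matrices=False)
    return u[:, :dim] * s[:dim]  # one low-dimensional row per vocabulary word

tokens = "the cat sat on the mat while the dog sat on the rug".split()
vectors = hellinger_embeddings(tokens, vocab=sorted(set(tokens)), dim=3)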
-" -1499,1412.5335,"Gr\'egoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio","Ensemble of Generative and Discriminative Techniques for Sentiment - Analysis of Movie Reviews",cs.CL cs.IR cs.LG cs.NE," Sentiment analysis is a common task in natural language processing that aims -to detect polarity of a text document (typically a consumer review). In the -simplest settings, we discriminate only between positive and negative -sentiment, turning the task into a standard binary classification problem. We -compare several ma- chine learning approaches to this problem, and combine them -to achieve the best possible results. We show how to use for this task the -standard generative lan- guage models, which are slightly complementary to the -state of the art techniques. We achieve strong results on a well-known dataset -of IMDB movie reviews. Our results are easily reproducible, as we publish also -the code needed to repeat the experiments. This should simplify further advance -of the state of the art, as other researchers can combine their techniques with -ours with little effort. -" -1500,1412.5404,"Yuan Zuo, Jichang Zhao, Ke Xu","Word Network Topic Model: A Simple but General Solution for Short and - Imbalanced Texts",cs.CL cs.IR," The short text has been the prevalent format for information of Internet in -recent decades, especially with the development of online social media, whose -millions of users generate a vast number of short messages everyday. Although -sophisticated signals delivered by the short text make it a promising source -for topic modeling, its extreme sparsity and imbalance brings unprecedented -challenges to conventional topic models like LDA and its variants. Aiming at -presenting a simple but general solution for topic modeling in short texts, we -present a word co-occurrence network based model named WNTM to tackle the -sparsity and imbalance simultaneously. Different from previous approaches, WNTM -models the distribution over topics for each word instead of learning topics -for each document, which successfully enhance the semantic density of data -space without importing too much time or space complexity. Meanwhile, the rich -contextual information preserved in the word-word space also guarantees its -sensitivity in identifying rare topics with convincing quality. Furthermore, -employing the same Gibbs sampling with LDA makes WNTM easily to be extended to -various application scenarios. Extensive validations on both short and normal -texts testify the outperformance of WNTM as compared to baseline methods. And -finally we also demonstrate its potential in precisely discovering newly -emerging topics or unexpected events in Weibo at pretty early stages. -" -1501,1412.5448,"Micka\""el Poussevin and Vincent Guigue and Patrick Gallinari","Extended Recommendation Framework: Generating the Text of a User Review - as a Personalized Summary",cs.IR cs.CL," We propose to augment rating based recommender systems by providing the user -with additional information which might help him in his choice or in the -understanding of the recommendation. We consider here as a new task, the -generation of personalized reviews associated to items. We use an extractive -summary formulation for generating these reviews. We also show that the two -information sources, ratings and items could be used both for estimating -ratings and for generating summaries, leading to improved performance for each -system compared to the use of a single source. 
Besides these two contributions,
-we show how a personalized polarity classifier can integrate the rating and
-textual aspects. Overall, the proposed system offers the user three
-personalized hints for a recommendation: rating, text and polarity. We evaluate
-these three components on two datasets using appropriate measures for each
-task.
-"
-1502,1412.5477,"S V Kasmir Raja, V Rajitha and Lakshmanan Meenakshi","Computational Model to Generate Case-Inflected Forms of Masculine Nouns
- for Word Search in Sanskrit E-Text",cs.CL," The problem of word search in Sanskrit is inseparable from complexities that
-include those caused by euphonic conjunctions and case-inflections. The
-case-inflectional forms of a noun normally number 24 owing to the fact that in
-Sanskrit there are eight cases and three numbers: singular, dual and plural.
-The traditional method of generating these inflectional forms is rather
-elaborate owing to the fact that there are differences in the forms generated
-between even very similar words and there are subtle nuances involved. Further,
-it would be a cumbersome exercise to generate and search for 24 forms of a word
-during a word search in a large text, using the currently available
-case-inflectional form generators. This study presents a new approach to
-generating case-inflectional forms that is simpler to compute. Further, an
-optimized model that is sufficient for generating only those word forms that
-are required in a word search and is more than 80% efficient compared to the
-complete case-inflectional forms generator, is presented in this study for the
-first time.
-"
-1503,1412.5567,"Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos,
- Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates and
- Andrew Y. Ng",Deep Speech: Scaling up end-to-end speech recognition,cs.CL cs.LG cs.NE," We present a state-of-the-art speech recognition system developed using
-end-to-end deep learning. Our architecture is significantly simpler than
-traditional speech systems, which rely on laboriously engineered processing
-pipelines; these traditional systems also tend to perform poorly when used in
-noisy environments. In contrast, our system does not need hand-designed
-components to model background noise, reverberation, or speaker variation, but
-instead directly learns a function that is robust to such effects. We do not
-need a phoneme dictionary, nor even the concept of a ""phoneme."" Key to our
-approach is a well-optimized RNN training system that uses multiple GPUs, as
-well as a set of novel data synthesis techniques that allow us to efficiently
-obtain a large amount of varied data for training. Our system, called Deep
-Speech, outperforms previously published results on the widely studied
-Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech
-also handles challenging noisy environments better than widely used,
-state-of-the-art commercial speech systems.
-"
-1504,1412.5659,"Nicholas Dronen, Peter W. Foltz, Kyle Habermehl",Effective sampling for large-scale automated writing evaluation systems,cs.CL cs.LG," Automated writing evaluation (AWE) has been shown to be an effective
-mechanism for quickly providing feedback to students. It has already seen wide
-adoption in enterprise-scale applications and is starting to be adopted in
-large-scale contexts. Training an AWE model has historically required a single
-batch of several hundred writing examples and human scores for each of them.
-This requirement limits large-scale adoption of AWE since the human scoring of
-essays is costly. Here we evaluate algorithms for ensuring that AWE models are
-consistently trained using the most informative essays. Our results show how to
-minimize training set sizes while maximizing predictive performance, thereby
-reducing cost without unduly sacrificing accuracy. We conclude with a
-discussion of how to integrate this approach into large-scale AWE systems.
-"
-1505,1412.5673,Yangfeng Ji and Jacob Eisenstein,Entity-Augmented Distributional Semantics for Discourse Relations,cs.CL cs.LG," Discourse relations bind smaller linguistic elements into coherent texts.
-However, automatically identifying discourse relations is difficult, because it
-requires understanding the semantics of the linked sentences. A more subtle
-challenge is that it is not enough to represent the meaning of each sentence of
-a discourse relation, because the relation may depend on links between
-lower-level elements, such as entity mentions. Our solution computes
-distributional meaning representations by composition up the syntactic parse
-tree. A key difference from previous work on compositional distributional
-semantics is that we also compute representations for entity mentions, using a
-novel downward compositional pass. Discourse relations are predicted not only
-from the distributional representations of the sentences, but also of their
-coreferent entity mentions. The resulting system obtains substantial
-improvements over the previous state-of-the-art in predicting implicit
-discourse relations in the Penn Discourse Treebank.
-"
-1506,1412.5836,Daniel Fried and Kevin Duh,"Incorporating Both Distributional and Relational Semantics in Word
- Representations",cs.CL," We investigate the hypothesis that word representations ought to incorporate
-both distributional and relational semantics. To this end, we employ the
-Alternating Direction Method of Multipliers (ADMM), which flexibly optimizes a
-distributional objective on raw text and a relational objective on WordNet.
-Preliminary results on knowledge base completion, analogy tests, and parsing
-show that word representations trained on both objectives can give improvements
-in some cases.
-"
-1507,1412.6045,Luis Nieto Pi\~na and Richard Johansson,A Simple and Efficient Method To Generate Word Sense Representations,cs.CL," Distributed representations of words have boosted the performance of many
-Natural Language Processing tasks. However, usually only one representation per
-word is obtained, not acknowledging the fact that some words have multiple
-meanings. This has a negative effect on the individual word representations and
-the language model as a whole. In this paper we present a simple model that
-enables recent techniques for building word vectors to represent distinct
-senses of polysemic words. In our assessment of this model we show that it is
-able to effectively discriminate between words' senses and to do so in a
-computationally efficient manner.
-"
-1508,1412.6069,"Dirk Roorda, Charles van den Heuvel",Annotation as a New Paradigm in Research Archiving,cs.DL cs.CL," We outline a paradigm to preserve results of digital scholarship, whether
-they are query results, feature values, or topic assignments. This paradigm is
-characterized by using annotations as multifunctional carriers and making them
-portable. The testing grounds we have chosen are two significant enterprises,
-one in the history of science, and one in Hebrew scholarship.
The first one
-(CKCC) focuses on the results of a project where a Dutch consortium of
-universities, research institutes, and cultural heritage institutions
-experimented for 4 years with language techniques and topic modeling methods
-with the aim of analyzing the emergence of scholarly debates. The data: a
-complex set of about 20,000 letters. The second one (DTHB) is a multi-year
-effort to express the linguistic features of the Hebrew bible in a text
-database, which is still growing in detail and sophistication. Versions of this
-database are packaged in commercial bible study software. We state that the
-results of these forms of scholarship require new knowledge management and
-archive practices. Only when researchers can build efficiently on each other's
-(intermediate) results can they achieve the aggregations of quality data by
-which new questions can be answered, and hidden patterns visualized. Archives
-are required to find a balance between preserving authoritative versions of
-sources and supporting collaborative efforts in digital scholarship.
-Annotations are promising vehicles for preserving and reusing research results.
-Keywords: annotation, portability, archiving, queries, features, topics,
-keywords, Republic of Letters, Hebrew text databases.
-"
-1509,1412.6211,"Xianfeng Hu, Yang Wang and Qiang Wu","Multiple Authors Detection: A Quantitative Analysis of Dream of the Red
- Chamber",cs.LG cs.CL," Inspired by the authorship controversy of Dream of the Red Chamber and the
-application of machine learning in the study of literary stylometry, we develop
-a rigorous new method for the mathematical analysis of authorship by testing
-for a so-called chrono-divide in writing styles. Our method incorporates some
-of the latest advances in the study of authorship attribution, particularly
-techniques from support vector machines. By introducing the notion of relative
-frequency as a feature ranking metric our method proves to be highly effective
-and robust.
- Applying our method to the Cheng-Gao version of Dream of the Red Chamber has
-led to convincing if not irrefutable evidence that the first $80$ chapters and
-the last $40$ chapters of the book were written by two different authors.
-Furthermore, our analysis has unexpectedly provided strong support to the
-hypothesis that Chapter 67 was not the work of Cao Xueqin either.
- We have also tested our method on the other three Great Classical Novels in
-Chinese. As expected no chrono-divides have been found. This provides further
-evidence of the robustness of our method.
-"
-1510,1412.6264,Taraka Rama K,"Supertagging: Introduction, learning, and application",cs.CL," Supertagging is an approach originally developed by Bangalore and Joshi
-(1999) to improve parsing efficiency. In the beginning, the scholars used
-small training datasets and somewhat na\""ive smoothing techniques to learn the
-probability distributions of supertags. Since its inception, the applicability
-of Supertags has been explored for the TAG (tree-adjoining grammar) formalism
-as well as other related yet different formalisms such as CCG. This article
-will try to summarize the various chapters, relevant to statistical parsing,
-from the most recent edited book volume (Bangalore and Joshi, 2010). The
-chapters were selected so as to blend the learning of supertags, its
-integration into full-scale parsing, and its use in semantic parsing.
-" -1511,1412.6277,R\'emi Lebret and Ronan Collobert,N-gram-Based Low-Dimensional Representation for Document Classification,cs.CL," The bag-of-words (BOW) model is the common approach for classifying -documents, where words are used as feature for training a classifier. This -generally involves a huge number of features. Some techniques, such as Latent -Semantic Analysis (LSA) or Latent Dirichlet Allocation (LDA), have been -designed to summarize documents in a lower dimension with the least semantic -information loss. Some semantic information is nevertheless always lost, since -only words are considered. Instead, we aim at using information coming from -n-grams to overcome this limitation, while remaining in a low-dimension space. -Many approaches, such as the Skip-gram model, provide good word vector -representations very quickly. We propose to average these representations to -obtain representations of n-grams. All n-grams are thus embedded in a same -semantic space. A K-means clustering can then group them into semantic -concepts. The number of features is therefore dramatically reduced and -documents can be represented as bag of semantic concepts. We show that this -model outperforms LSA and LDA on a sentiment classification task, and yields -similar results than a traditional BOW-model with far less features. -" -1512,1412.6334,Hubert Soyer and Pontus Stenetorp and Akiko Aizawa,"Leveraging Monolingual Data for Crosslingual Compositional Word - Representations",cs.CL," In this work, we present a novel neural network based architecture for -inducing compositional crosslingual word representations. Unlike previously -proposed methods, our method fulfills the following three criteria; it -constrains the word-level representations to be compositional, it is capable of -leveraging both bilingual and monolingual data, and it is scalable to large -vocabularies and large quantities of data. The key component of our approach is -what we refer to as a monolingual inclusion criterion, that exploits the -observation that phrases are more closely semantically related to their -sub-phrases than to other randomly sampled phrases. We evaluate our method on a -well-established crosslingual document classification task and achieve results -that are either comparable, or greatly improve upon previous state-of-the-art -methods. Concretely, our method reaches a level of 92.7% and 84.4% accuracy for -the English to German and German to English sub-tasks respectively. The former -advances the state of the art by 0.9% points of accuracy, the latter is an -absolute improvement upon the previous state of the art by 7.7% points of -accuracy and an improvement of 33.0% in error reduction. -" -1513,1412.6418,Ivan Titov and Ehsan Khoddam,"Inducing Semantic Representation from Text by Jointly Predicting and - Factorizing Relations",cs.CL cs.LG stat.ML," In this work, we propose a new method to integrate two recent lines of work: -unsupervised induction of shallow semantics (e.g., semantic roles) and -factorization of relations in text and knowledge bases. Our model consists of -two components: (1) an encoding component: a semantic role labeling model which -predicts roles given a rich set of syntactic and lexical features; (2) a -reconstruction component: a tensor factorization model which relies on roles to -predict argument fillers. When the components are estimated jointly to minimize -errors in argument reconstruction, the induced roles largely correspond to -roles defined in annotated resources. 
Our method performs on par with the most
-accurate role induction methods on English, even though, unlike these previous
-approaches, we do not incorporate any prior linguistic knowledge about the
-language.
-"
-1514,1412.6448,"Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin and Yoshua
- Bengio",Embedding Word Similarity with Neural Machine Translation,cs.CL," Neural language models learn word representations, or embeddings, that
-capture rich linguistic and conceptual information. Here we investigate the
-embeddings learned by neural machine translation models, a recently-developed
-class of neural language model. We show that embeddings from translation models
-outperform those learned by monolingual models at tasks that require knowledge
-of both conceptual similarity and lexical-syntactic role. We further show that
-these effects hold when translating from both English to French and English to
-German, and argue that the desirable properties of translation embeddings
-should emerge largely independently of the source and target languages.
-Finally, we apply a new method for training neural translation models with very
-large vocabularies, and show that this vocabulary expansion algorithm results
-in minimal degradation of embedding quality. Our embedding spaces can be
-queried in an online demo and downloaded from our web page. Overall, our
-analyses indicate that translation-based embeddings should be used in
-applications that require concepts to be organised according to similarity
-and/or lexical function, while monolingual embeddings are better suited to
-modelling (nonspecific) inter-word relatedness.
-"
-1515,1412.6568,"Georgiana Dinu, Angeliki Lazaridou, Marco Baroni",Improving zero-shot learning by mitigating the hubness problem,cs.CL cs.LG," The zero-shot paradigm exploits vector-based word representations extracted
-from text corpora with unsupervised methods to learn general mapping functions
-from other feature spaces onto word space, where the words associated to the
-nearest neighbours of the mapped vectors are used as their linguistic labels.
-We show that the neighbourhoods of the mapped elements are strongly polluted by
-hubs, vectors that tend to be near a high proportion of items, pushing their
-correct labels down the neighbour list. After illustrating the problem
-empirically, we propose a simple method to correct it by taking the proximity
-distribution of potential neighbours across many mapped vectors into account.
-We show that this correction leads to consistent improvements in realistic
-zero-shot experiments in the cross-lingual, image labeling and image retrieval
-domains.
-"
-1516,1412.6575,"Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, Li Deng","Embedding Entities and Relations for Learning and Inference in Knowledge
- Bases",cs.CL," We consider learning representations of entities and relations in KBs using
-the neural-embedding approach. We show that most existing models, including NTN
-(Socher et al., 2013) and TransE (Bordes et al., 2013b), can be generalized
-under a unified learning framework, where entities are low-dimensional vectors
-learned from a neural network and relations are bilinear and/or linear mapping
-functions. Under this framework, we compare a variety of embedding models on
-the link prediction task. We show that a simple bilinear formulation achieves
-new state-of-the-art results for the task (achieving a top-10 accuracy of 73.2%
-vs. 54.7% by TransE on Freebase).
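To illustrate the simple bilinear formulation just mentioned, here is a toy scoring function with random stand-in parameters; the diagonal choice of relation matrices and all sizes are our assumptions, not the paper's exact configuration.

import numpy as np

rng = np.random.default_rng(0)
d, n_entities, n_relations = 8, 5, 3

# Toy random parameters standing in for what would be learned from a KB:
E = rng.normal(size=(n_entities, d))                    # entity vectors
W = [np.diag(rng.normal(size=d)) for _ in range(n_relations)]

def score(head, rel, tail):
    # Bilinear triple score e_head^T W_rel e_tail: higher = more plausible.
    return float(E[head] @ W[rel] @ E[tail])

# Composition of relations corresponds to matrix multiplication, which is
# what makes patterns like r1(a,b) & r2(b,c) => r3(a,c) minable by comparing
# W[r1] @ W[r2] against W[r3].
composed = W[0] @ W[1]
print(score(0, 2, 1))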
Furthermore, we introduce a novel approach -that utilizes the learned relation embeddings to mine logical rules such as -""BornInCity(a,b) and CityInCountry(b,c) => Nationality(a,c)"". We find that -embeddings learned from the bilinear objective are particularly good at -capturing relational semantics and that the composition of relations is -characterized by matrix multiplication. More interestingly, we demonstrate that -our embedding-based rule extraction approach successfully outperforms a -state-of-the-art confidence-based rule mining approach in mining Horn rules -that involve compositional reasoning. -" -1517,1412.6577,"Ozan \.Irsoy, Claire Cardie",Modeling Compositionality with Multiplicative Recurrent Neural Networks,cs.LG cs.CL stat.ML," We present the multiplicative recurrent neural network as a general model for -compositional meaning in language, and evaluate it on the task of fine-grained -sentiment analysis. We establish a connection to the previously investigated -matrix-space models for compositionality, and show they are special cases of -the multiplicative recurrent net. Our experiments show that these models -perform comparably or better than Elman-type additive recurrent neural networks -and outperform matrix-space models on a standard fine-grained sentiment -analysis corpus. Furthermore, they yield comparable results to structural deep -models on the recently published Stanford Sentiment Treebank without the need -for generating parse trees. -" -1518,1412.6616,"Abram Demski, Volkan Ustun, Paul Rosenbloom, Cody Kommers",Outperforming Word2Vec on Analogy Tasks with Random Projections,cs.CL cs.LG," We present a distributed vector representation based on a simplification of -the BEAGLE system, designed in the context of the Sigma cognitive architecture. -Our method does not require gradient-based training of neural networks, matrix -decompositions as with LSA, or convolutions as with BEAGLE. All that is -involved is a sum of random vectors and their pointwise products. Despite the -simplicity of this technique, it gives state-of-the-art results on analogy -problems, in most cases better than Word2Vec. To explain this success, we -interpret it as a dimension reduction via random projection. -" -1519,1412.6623,"Luke Vilnis, Andrew McCallum",Word Representations via Gaussian Embedding,cs.CL cs.LG," Current work in lexical distributed representations maps each word to a point -vector in low-dimensional space. Mapping instead to a density provides many -interesting advantages, including better capturing uncertainty about a -representation and its relationships, expressing asymmetries more naturally -than dot product or cosine similarity, and enabling more expressive -parameterization of decision boundaries. This paper advocates for density-based -distributed embeddings and presents a method for learning representations in -the space of Gaussian distributions. We compare performance on various word -embedding benchmarks, investigate the ability of these embeddings to model -entailment and other asymmetric relationships, and explore novel properties of -the representation. -" -1520,1412.6632,"Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille",Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN),cs.CV cs.CL cs.LG," In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model -for generating novel image captions. It directly models the probability -distribution of generating a word given previous words and an image. 
Image
-captions are generated by sampling from this distribution. The model consists
-of two sub-networks: a deep recurrent neural network for sentences and a deep
-convolutional network for images. These two sub-networks interact with each
-other in a multimodal layer to form the whole m-RNN model. The effectiveness of
-our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K,
-Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In
-addition, we apply the m-RNN model to retrieval tasks for retrieving images or
-sentences, and achieve significant performance improvements over the
-state-of-the-art methods which directly optimize the ranking objective function
-for retrieval. The project page of this work is:
-www.stat.ucla.edu/~junhua.mao/m-RNN.html .
-"
-1521,1412.6645,"Gabriel Synnaeve, Emmanuel Dupoux",Weakly Supervised Multi-Embeddings Learning of Acoustic Models,cs.SD cs.CL cs.LG," We trained a Siamese network with multi-task same/different information on a
-speech dataset, and found that it was possible to share a network for both
-tasks without a loss in performance. The first task was to discriminate between
-two same or different words, and the second was to discriminate between two
-same or different talkers.
-"
-1522,1412.6650,"Aram Ter-Sarkisov, Holger Schwenk, Loic Barrault and Fethi Bougares",Incremental Adaptation Strategies for Neural Network Language Models,cs.NE cs.CL cs.LG," It is today acknowledged that neural network language models outperform
-backoff language models in applications like speech recognition or statistical
-machine translation. However, training these models on large amounts of data
-can take several days. We present efficient techniques to adapt a neural
-network language model to new data. Instead of training a completely new model
-or relying on mixture approaches, we propose two new methods: continued
-training on resampled data or insertion of adaptation layers. We present
-experimental results in a CAT environment where the post-edits of professional
-translators are used to improve an SMT system. Both methods are very fast and
-achieve significant improvements without overfitting the small adaptation data.
-"
-1523,1412.6815,Misha Denil and Alban Demiraj and Nando de Freitas,Extraction of Salient Sentences from Labelled Documents,cs.CL cs.IR cs.LG," We present a hierarchical convolutional document model with an architecture
-designed to support introspection of the document structure. Using this model,
-we show how to use visualisation techniques from the computer vision literature
-to identify and extract topic-relevant sentences.
- We also introduce a new scalable evaluation technique for automatic sentence
-extraction systems that avoids the need for time-consuming human annotation of
-validation data.
-"
-1524,1412.6881,"Jinseok Nam and Johannes F\""urnkranz",On Learning Vector Representations in Hierarchical Label Spaces,cs.LG cs.CL stat.ML," An important problem in multi-label classification is to capture label
-patterns or underlying structures that have an impact on such patterns. This
-paper addresses one such problem, namely how to exploit hierarchical structures
-over labels. We present a novel method to learn vector representations of a
-label space given a hierarchy of labels and label co-occurrence patterns. Our
-experimental results demonstrate qualitatively that the proposed method is able
-to learn regularities among labels by exploiting a label hierarchy as well as
-label co-occurrences.
It highlights the importance of the hierarchical
-information in order to obtain regularities which facilitate analogical
-reasoning over a label space. We also experimentally illustrate the dependency
-of the learned representations on the label hierarchy.
-"
-1525,1412.7004,"Pranava Swaroop Madhyastha, Xavier Carreras, Ariadna Quattoni","Tailoring Word Embeddings for Bilexical Predictions: An Experimental
- Comparison",cs.CL cs.LG," We investigate the problem of inducing word embeddings that are tailored for
-a particular bilexical relation. Our learning algorithm takes an existing
-lexical vector space and compresses it such that the resulting word embeddings
-are good predictors for a target bilexical relation. In experiments we show
-that task-specific embeddings can benefit both the quality and efficiency in
-lexical prediction tasks.
-"
-1526,1412.7026,"Aditya Joshi, Johan Halseth, Pentti Kanerva",Language Recognition using Random Indexing,cs.CL cs.LG," Random Indexing is a simple implementation of Random Projections with a wide
-range of applications. It can solve a variety of problems with good accuracy
-without introducing much complexity. Here we use it for identifying the
-language of text samples. We present a novel method of generating language
-representation vectors using letter blocks. Further, we show that the method is
-easily implemented and requires little computational power and space.
-Experiments on a number of model parameters illustrate certain properties of
-high-dimensional sparse vector representations of data. The statistical
-relevance of the language vectors is demonstrated through the extremely high
-success rates on various language recognition tasks. On a difficult data set of
-21,000 short sentences from 21 different languages, our model performs a
-language recognition task and achieves 97.8% accuracy, comparable to
-state-of-the-art methods.
-"
-1527,1412.7028,"Jo\""el Legrand and Ronan Collobert",Joint RNN-Based Greedy Parsing and Word Composition,cs.LG cs.CL cs.NE," This paper introduces a greedy parser based on neural networks, which
-leverages a new compositional sub-tree representation. The greedy parser and
-the compositional procedure are jointly trained, and tightly depend on each
-other. The composition procedure outputs a vector representation which
-summarizes sub-trees syntactically (parsing tags) and semantically (words).
-Composition and tagging are achieved over continuous (word or tag)
-representations, and recurrent neural networks. We reach F1 performance on par
-with well-known existing parsers, while having the advantage of speed, thanks
-to the greedy nature of the parser. We provide a fully functional
-implementation of the method described in this paper.
-"
-1528,1412.7063,"Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran",Diverse Embedding Neural Network Language Models,cs.CL cs.LG cs.NE," We propose Diverse Embedding Neural Network (DENN), a novel architecture for
-language models (LMs). A DENNLM projects the input word history vector onto
-multiple diverse low-dimensional sub-spaces instead of a single
-higher-dimensional sub-space as in conventional feed-forward neural network
-LMs. We encourage these sub-spaces to be diverse during network training
-through an augmented loss function. Our language modeling experiments on the
-Penn Treebank data set show the performance benefit of using a DENNLM.
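A hedged sketch of the letter-block Random Indexing scheme of Joshi, Halseth and Kanerva (1412.7026, above); the dimensionality, sparsity, hashing and the tiny "training" samples are all illustrative assumptions.

import zlib
import numpy as np

def block_vector(block, dim=10_000, nnz=20):
    # Deterministic sparse ternary index vector for one letter block.
    rng = np.random.default_rng(zlib.crc32(block.encode("utf-8")))
    v = np.zeros(dim)
    positions = rng.choice(dim, size=nnz, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=nnz)
    return v

def text_vector(text, n=3, dim=10_000):
    # A language profile: the sum of index vectors of all letter n-grams.
    v = np.zeros(dim)
    for i in range(len(text) - n + 1):
        v += block_vector(text[i:i + n], dim)
    return v

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

profiles = {"en": text_vector("the quick brown fox jumps over the lazy dog"),
            "de": text_vector("der schnelle braune fuchs springt ueber den hund")}
query = text_vector("the dog jumps")
print(max(profiles, key=lambda lang: cosine(profiles[lang], query)))  # likely 'en'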
-" -1529,1412.7091,"Pascal Vincent, Alexandre de Br\'ebisson, Xavier Bouthillier","Efficient Exact Gradient Update for training Deep Networks with Very - Large Sparse Targets",cs.NE cs.CL cs.LG," An important class of problems involves training deep neural networks with -sparse prediction targets of very high dimension D. These occur naturally in -e.g. neural language models or the learning of word-embeddings, often posed as -predicting the probability of next words among a vocabulary of size D (e.g. 200 -000). Computing the equally large, but typically non-sparse D-dimensional -output vector from a last hidden layer of reasonable dimension d (e.g. 500) -incurs a prohibitive O(Dd) computational cost for each example, as does -updating the D x d output weight matrix and computing the gradient needed for -backpropagation to previous layers. While efficient handling of large sparse -network inputs is trivial, the case of large sparse targets is not, and has -thus so far been sidestepped with approximate alternatives such as hierarchical -softmax or sampling-based approximations during training. In this work we -develop an original algorithmic approach which, for a family of loss functions -that includes squared error and spherical softmax, can compute the exact loss, -gradient update for the output weights, and gradient for backpropagation, all -in O(d^2) per example instead of O(Dd), remarkably without ever computing the -D-dimensional output. The proposed algorithm yields a speedup of D/4d , i.e. -two orders of magnitude for typical sizes, for that critical part of the -computations that often dominates the training time in this kind of network -architecture. -" -1530,1412.7110,"Dimitri Palaz, Mathew Magimai Doss and Ronan Collobert","Learning linearly separable features for speech recognition using - convolutional neural networks",cs.LG cs.CL cs.NE," Automatic speech recognition systems usually rely on spectral-based features, -such as MFCC of PLP. These features are extracted based on prior knowledge such -as, speech perception or/and speech production. Recently, convolutional neural -networks have been shown to be able to estimate phoneme conditional -probabilities in a completely data-driven manner, i.e. using directly temporal -raw speech signal as input. This system was shown to yield similar or better -performance than HMM/ANN based system on phoneme recognition task and on large -scale continuous speech recognition task, using less parameters. Motivated by -these studies, we investigate the use of simple linear classifier in the -CNN-based framework. Thus, the network learns linearly separable features from -raw speech. We show that such system yields similar or better performance than -MLP based system using cepstral-based features as input. -" -1531,1412.7119,Paul Baltescu and Phil Blunsom,Pragmatic Neural Language Modelling in Machine Translation,cs.CL," This paper presents an in-depth investigation on integrating neural language -models in translation systems. Scaling neural language models is a difficult -task, but crucial for real-world applications. This paper evaluates the impact -on end-to-end MT quality of both new and existing scaling techniques. We show -when explicitly normalising neural models is necessary and what optimisation -tricks one should use in such scenarios. We also focus on scalable training -algorithms and investigate noise contrastive estimation and diagonal contexts -as sources for further speed improvements. 
We explore the trade-offs between
-neural models and back-off n-gram models and find that neural models make
-strong candidates for natural language applications in memory constrained
-environments, yet still lag behind traditional models in raw translation
-quality. We conclude with a set of recommendations one should follow to build a
-scalable neural language model for MT.
-"
-1532,1412.7180,"Yishu Miao, Ziyu Wang, Phil Blunsom",Bayesian Optimisation for Machine Translation,cs.CL cs.LG," This paper presents novel Bayesian optimisation algorithms for minimum error
-rate training of statistical machine translation systems. We explore two
-classes of algorithms for efficiently exploring the translation space, with the
-first based on N-best lists and the second based on a hypergraph representation
-that compactly represents an exponential number of translation options. Our
-algorithms exhibit faster convergence and are capable of obtaining lower error
-rates than the existing translation model specific approaches, all within a
-generic Bayesian optimisation framework. Furthermore, we also introduce a
-random embedding algorithm to scale our approach to sparse high dimensional
-feature sets.
-"
-1533,1412.7186,Ramon Ferrer-i-Cancho,"Reply to the commentary ""Be careful when assuming the obvious"", by P.
- Alday",cs.CL physics.data-an physics.soc-ph," Here we respond to some comments by Alday concerning headedness in linguistic
-theory and the validity of the assumptions of a mathematical model for word
-order. For brevity, we focus only on two assumptions: the unit of measurement
-of dependency length and the monotonicity of the cost of a dependency as a
-function of its length. We also review the implicit psychological bias in
-Alday's comments. Nevertheless, Alday is indicating the path for linguistic
-research with his unusual concern for parsimony along multiple dimensions.
-"
-1534,1412.7415,"Jestin Joy, Kannan Balakrishnan",A prototype Malayalam to Sign Language Automatic Translator,cs.CL," Sign language, which is a medium of communication for deaf people, uses
-manual communication and body language to convey meaning, as opposed to using
-sound. This paper presents a prototype Malayalam text to sign language
-translation system. The proposed system takes Malayalam text as input and
-generates the corresponding Sign Language. Output animation is rendered using a
-computer generated model. This system will help to disseminate information to
-deaf people in public utility places like railways, banks, hospitals, etc.
-It will also act as an educational tool for learning Sign Language.
-"
-1535,1412.7449,"Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever,
- Geoffrey Hinton",Grammar as a Foreign Language,cs.CL cs.LG stat.ML," Syntactic constituency parsing is a fundamental problem in natural language
-processing and has been the subject of intensive research and engineering for
-decades. As a result, the most accurate parsers are domain specific, complex,
-and inefficient. In this paper we show that the domain agnostic
-attention-enhanced sequence-to-sequence model achieves state-of-the-art results
-on the most widely used syntactic constituency parsing dataset, when trained on
-a large synthetic corpus that was annotated using existing parsers.
It also
-matches the performance of standard parsers when trained only on a small
-human-annotated dataset, which shows that this model is highly data-efficient,
-in contrast to sequence-to-sequence models without the attention mechanism. Our
-parser is also fast, processing over a hundred sentences per second with an
-unoptimized CPU implementation.
-"
-1536,1412.7782,"MAC Jiffriya, MAC Akmal Jahan, and Roshan G. Ragel","Plagiarism Detection on Electronic Text based Assignments using Vector
- Space Model (ICIAfS14)",cs.IR cs.CL," Plagiarism is the illegitimate use of another's work, in part or in whole, as
-one's own in any field such as art, poetry, literature, cinema, research and
-other creative forms of study. Plagiarism is one of the most important issues
-in academic and research fields, and it is a growing concern in academic
-systems. The situation is even worse with the availability of ample resources
-on the web. This paper focuses on an effective intra-corpal plagiarism
-detection tool for text based assignments, comparing unigram, bigram and
-trigram vector space models with the cosine similarity measure. A manually
-evaluated, labelled dataset was tested using unigram, bigram and trigram
-vectors. Even though the trigram vector consumes comparatively more time, it
-shows better results on the labelled data. In addition, the selected trigram
-vector space model with the cosine similarity measure is compared with a
-trigram sequence matching technique using the Jaccard measure. In the results,
-the cosine similarity score shows slightly higher values than the other,
-because it gives more weight to terms that do not occur frequently in the
-dataset; hence the cosine similarity measure with the trigram technique is
-preferable. Therefore, we present our new tool, which can be used as an
-effective tool to evaluate text based electronic assignments and minimize
-plagiarism among students.
-"
-1537,1412.8010,Xuan-Son Vu and Seong-Bae Park,Construction of Vietnamese SentiWordNet by using Vietnamese Dictionary,cs.CL," SentiWordNet is an important lexical resource supporting sentiment analysis
-in opinion mining applications. In this paper, we propose a novel approach to
-construct a Vietnamese SentiWordNet (VSWN). SentiWordNet is typically generated
-from WordNet in which each synset has numerical scores to indicate its opinion
-polarities. Many previous studies obtained these scores by applying a machine
-learning method to WordNet. However, a Vietnamese WordNet was unfortunately not
-available at the time of writing. Therefore, we propose a method to construct
-VSWN from a Vietnamese dictionary, not from WordNet. We show the effectiveness
-of the proposed method by generating a VSWN with 39,561 synsets automatically.
-The method is experimentally tested on 266 synsets with respect to positivity
-and negativity. It attains results competitive with the English SentiWordNet,
-with differences of only 0.066 and 0.052 for the positivity and negativity
-sets respectively.
-"
-1538,1412.8079,"Ayoub Bagheri, Mohamad Saraee","Persian Sentiment Analyzer: A Framework based on a Novel Feature
- Selection Method",cs.CL cs.IR," In the recent decade, with the enormous growth of digital content on the
-internet and in databases, sentiment analysis has received more and more
-attention among information retrieval and natural language processing
-researchers. Sentiment analysis aims to use automated tools to detect
-subjective information in reviews. 
One of the main challenges in sentiment analysis is feature selection.
-Feature selection is widely used as the first stage of analysis and
-classification tasks to reduce the dimension of the problem and improve speed
-by eliminating irrelevant and redundant features. To date little research has
-been conducted on feature selection in sentiment analysis, and work on Persian
-sentiment analysis is especially rare. This paper considers the problem of
-sentiment classification using different feature selection methods for online
-customer reviews in the Persian language. Three of the challenges of Persian
-text are its wide variety of declensional suffixes, inconsistent word spacing
-and many informal or colloquial words. In this paper we study these challenges
-by proposing a model for sentiment classification of Persian review documents.
-The proposed model is based on lemmatization and feature selection and employs
-the Naive Bayes algorithm for classification. We evaluate the performance of
-the model on a manually gathered collection of cellphone reviews, where the
-results show the effectiveness of the proposed approaches.
-"
-1539,1412.8102,"Bob Coecke (University of Oxford), Ichiro Hasuo (The University of
- Tokyo), Prakash Panangaden (McGill University)",Proceedings of the 11th workshop on Quantum Physics and Logic,cs.LO cs.CL cs.PL quant-ph," This volume contains the proceedings of the 11th International Workshop on
-Quantum Physics and Logic (QPL 2014), which was held from the 4th to the 6th of
-June, 2014, at Kyoto University, Japan.
- The goal of the QPL workshop series is to bring together researchers working
-on mathematical foundations of quantum physics, quantum computing and
-spatio-temporal causal structures, and in particular those that use logical
-tools, ordered algebraic and category-theoretic structures, formal languages,
-semantic methods and other computer science methods for the study of physical
-behavior in general. Over the past few years, there has been growing activity
-in these foundational approaches, together with a renewed interest in the
-foundations of quantum theory, which complement the more mainstream research in
-quantum computation. Earlier workshops in this series, with the same acronym
-under the name ""Quantum Programming Languages"", were held in Ottawa (2003),
-Turku (2004), Chicago (2005), and Oxford (2006). The first QPL under the new
-name Quantum Physics and Logic was held in Reykjavik (2008), followed by Oxford
-(2009 and 2010), Nijmegen (2011), Brussels (2012) and Barcelona (2013).
-"
-1540,1412.8319,"Stanis{\l}aw Dro\.zd\.z, Pawe{\l} O\'swi\k{e}cimka, Andrzej Kulig,
- Jaros{\l}aw Kwapie\'n, Katarzyna Bazarnik, Iwona Grabska-Gradzi\'nska, Jan
- Rybicki, Marek Stanuszek","Quantifying origin and character of long-range correlations in narrative
- texts",cs.CL physics.soc-ph," In natural language, using short sentences is considered efficient for
-communication. However, a text composed exclusively of such sentences looks
-technical and reads boring. A text composed of long ones, on the other hand,
-demands significantly more effort for comprehension. Studying characteristics
-of the sentence length variability (SLV) in a large corpus of world-famous
-literary texts shows that an appealing and aesthetic optimum appears somewhere
-in between and involves self-similar, cascade-like alternation of sentences of
-various lengths. 
A related quantitative observation is that the power spectra
-S(f) of the thus characterized SLV universally develop a convincing `1/f^beta'
-scaling with the average exponent beta =~ 1/2, close to what has been
-identified before in musical compositions or in brain waves. An overwhelming
-majority of the studied texts simply obey such fractal attributes, but
-especially spectacular in this respect are hypertext-like, ""stream of
-consciousness"" novels. In addition, they appear to develop structures
-characteristic of irreducibly interwoven sets of fractals called multifractals.
-Scaling of S(f) in the present context implies the existence of long-range
-correlations in texts, and the appearance of multifractality indicates that
-they even carry a nonlinear component. The distinct role of full stops in
-inducing long-range correlations in texts is evidenced by the fact that the
-above quantitative characteristics manifest themselves in the variation of the
-full stops' recurrence times along texts, thus in SLV, but to a much lesser
-degree in the recurrence times of the most frequent words. In this latter case
-the nonlinear correlations, and thus multifractality, disappear completely for
-all the texts considered. Treated as one extra word, the full stops
-nevertheless appear to obey the Zipfian rank-frequency distribution.
-"
-1541,1412.8419,Remi Lebret and Pedro O. Pinheiro and Ronan Collobert,Simple Image Description Generator via a Linear Phrase-Based Approach,cs.CL cs.CV cs.NE," Generating a novel textual description of an image is an interesting problem
-that connects computer vision and natural language processing. In this paper,
-we present a simple model that is able to generate descriptive sentences given
-a sample image. This model has a strong focus on the syntax of the
-descriptions. We train a purely bilinear model that learns a metric between an
-image representation (generated from a previously trained Convolutional Neural
-Network) and the phrases that are used to describe them. The system is then
-able to infer phrases from a given image sample. Based on caption syntax
-statistics, we propose a simple language model that can produce relevant
-descriptions for a given test image using the phrases inferred. Our approach,
-which is considerably simpler than state-of-the-art models, achieves comparable
-results on the recently released Microsoft COCO dataset.
-"
-1542,1412.8504,Diego R. Amancio,"Probing the topological properties of complex networks modeling short
- written texts",cs.CL physics.soc-ph," In recent years, graph theory has been widely employed to probe several
-language properties. More specifically, the so-called word adjacency model has
-proven useful for tackling several practical problems, especially those relying
-on textual stylistic analysis. The most common approach to treating texts as
-networks has simply considered either large pieces of texts or entire books.
-This approach has certainly worked well -- many informative discoveries have
-been made this way -- but it raises an uncomfortable question: could there be
-important topological patterns in small pieces of texts? To address this
-problem, the topological properties of subtexts sampled from entire books were
-probed. Statistical analyses performed on a dataset comprising 50 novels
-revealed that most of the traditional topological measurements are stable for
-short subtexts. 
When the performance of the authorship recognition task was
-analyzed, it was found that a proper sampling yields a discriminability similar
-to the one found with full texts. Surprisingly, the support vector machine
-classification based on the characterization of short texts outperformed the
-one performed with entire books. These findings suggest that a local
-topological analysis of large documents might improve their global
-characterization. Most importantly, it was verified, as a proof of principle,
-that short texts can be analyzed with the methods and concepts of complex
-networks. As a consequence, the techniques described here can be extended in a
-straightforward fashion to analyze texts as time-varying complex networks.
-"
-1543,1412.8527,"Anne Preller (LIRMM, France)",From Logical to Distributional Models,cs.LO cs.CL quant-ph," The paper relates two variants of semantic models for natural language,
-logical functional models and compositional distributional vector space models,
-by transferring the logic and reasoning from the logical to the distributional
-models.
- The geometrical operations of quantum logic are reformulated as algebraic
-operations on vectors. A map from functional models to vector space models
-makes it possible to compare the meaning of sentences word by word.
-"
-1544,1501.00311,Jun-Ping Ng and Min-Yen Kan,QANUS: An Open-source Question-Answering Platform,cs.IR cs.CL," In this paper, we motivate the need for a publicly available, generic
-software framework for question-answering (QA) systems. We present an
-open-source QA framework QANUS which researchers can leverage to build new
-QA systems easily and rapidly. The framework implements much of the code that
-would otherwise have been repeated across different QA systems. To demonstrate
-the utility and practicality of the framework, we further present a fully
-functioning factoid QA system QA-SYS built on top of QANUS.
-"
-1545,1501.00657,Scott A. Hale,"Cross-language Wikipedia Editing of Okinawa, Japan",cs.CY cs.CL cs.SI," This article analyzes users who edit Wikipedia articles about Okinawa, Japan,
-in English and Japanese. It finds these users are among the most active and
-dedicated users in their primary languages, where they make many large,
-high-quality edits. However, when these users edit in their non-primary
-languages, they tend to make edits of a different type that are overall smaller
-in size and more often restricted to the narrow set of articles that exist in
-both languages. Design changes to motivate wider contributions from users in
-their non-primary languages and to encourage multilingual users to transfer
-more information across language divides are presented.
-"
-1546,1501.00841,Gerard Lynch and Carl Vogel,"Chasing the Ghosts of Ibsen: A computational stylistic analysis of drama
- in translation",cs.CL," Research into the stylistic properties of translations is an issue which has
-received some attention in computational stylistics. Previous work by Rybicki
-(2006) on distinguishing character idiolects in the work of Polish author
-Henryk Sienkiewicz and two corresponding English translations using Burrows's
-Delta method concluded that idiolectal differences could be observed in the
-source texts and that this variation was preserved to a large degree in both
-translations. This study also found that the two translations were highly
-distinguishable from one another. 
Burrows (2002) also examined English
-translations of Juvenal using the Delta method; the results of this work
-suggest that some translators are more adept at concealing their own style when
-translating the works of another author, whereas others tend to imprint their
-own style to a greater extent on the work they translate. Our work examines the
-writing of a single author, Norwegian playwright Henrik Ibsen, and these
-writings translated from Norwegian into both German and English, in an attempt
-to investigate the preservation of characterization, defined here as the
-distinctiveness of the textual contributions of characters.
-"
-1547,1501.00960,"Eitan Adam Pechenick, Christopher M. Danforth, Peter Sheridan Dodds","Characterizing the Google Books corpus: Strong limits to inferences of
- socio-cultural and linguistic evolution",physics.soc-ph cond-mat.stat-mech cs.CL stat.AP," It is tempting to treat frequency trends from the Google Books data sets as
-indicators of the ""true"" popularity of various words and phrases. Doing so
-allows us to draw quantitatively strong conclusions about the evolution of
-cultural perception of a given topic, such as time or gender. However, the
-Google Books corpus suffers from a number of limitations which make it an
-obscure mask of cultural popularity. A primary issue is that the corpus is in
-effect a library, containing one of each book. A single, prolific author is
-thereby able to noticeably insert new phrases into the Google Books lexicon,
-whether the author is widely read or not. With this understood, the Google
-Books corpus remains an important data set to be considered more lexicon-like
-than text-like. Here, we show that a distinct problematic feature arises from
-the inclusion of scientific texts, which have become an increasingly
-substantive portion of the corpus throughout the 1900s. The result is a surge
-of phrases typical of academic articles but less common in general, such as
-references to time in the form of citations. We highlight these dynamics by
-examining and comparing major contributions to the statistical divergence of
-English data sets between decades in the period 1800--2000. We find that only
-the English Fiction data set from the second version of the corpus is not
-heavily affected by professional texts, in clear contrast to the first version
-of the fiction data set and both unfiltered English data sets. Our findings
-emphasize the need to fully characterize the dynamics of the Google Books
-corpus before using these data sets to draw broad conclusions about cultural
-and linguistic evolution.
-"
-1548,1501.01243,"Juan-Manuel Torres-Moreno, Javier Ramirez, Iria da Cunha","Un r\'esumeur \`a base de graphes, ind\'ep\'endant de la langue",cs.CL," In this paper we present REG, a graph-based approach for studying a
-fundamental problem of Natural Language Processing (NLP): automatic text
-summarization. The algorithm maps a document to a graph, then computes the
-weight of its sentences. We have applied this approach to summarize documents
-in three languages.
-"
-1549,1501.01252,"Mayeul Mathias, Assema Moussa, Fen Zhou, Juan-Manuel Torres-Moreno,
- Marie-Sylvie Poli, Didier Josselin, Marc El-B\`eze, Andr\'ea Carneiro
- Linhares, Francoise Rigat","Optimisation using Natural Language Processing: Personalized Tour
- Recommendation for Museums",cs.AI cs.CL," This paper proposes a new method to provide personalized tour recommendations
-for museum visits. 
It combines an optimization of the visitors' preference
-criteria with an automatic extraction of artwork importance from museum
-information, based on Natural Language Processing using textual energy. This
-project includes researchers from computer and social sciences. Some results
-have been obtained through numerical experiments. They show that our model
-clearly improves the satisfaction of a visitor who follows the proposed tour.
-This work foreshadows interesting outcomes and applications for on-demand
-personalized museum visits in the very near future.
-"
-1550,1501.01254,A.J.P.M.P. Jayaweera and N.G.J. Dias,Unknown Words Analysis in POS tagging of Sinhala Language,cs.CL," Part of Speech (POS) tagging is a vital task in Natural Language Processing
-(NLP) for any language. It involves analysing the construction, behaviour and
-dynamics of the language, knowledge that can be utilized in computational
-linguistics analysis and automation applications. In this context, dealing with
-unknown words (words that do not appear in the lexicon are referred to as
-unknown words) is also an important task, since growing NLP systems are used in
-more and more new applications. One aid to predicting the lexical categories of
-unknown words is the use of syntactic knowledge of the language. The
-distinction between open class words and closed class words, together with
-syntactic features of the language, is used in this research to predict the
-lexical categories of unknown words in the tagging process. An experiment was
-performed to investigate the ability of the approach to parse unknown words
-using syntactic knowledge without human intervention. This experiment shows
-that the performance of the tagging process is enhanced when the word class
-distinction is used together with syntactic rules to parse sentences containing
-unknown words in the Sinhala language.
-"
-1551,1501.01318,"Ashraf Odeh, Aymen Abu-Errub, Qusai Shambour and Nidal Turab",Arabic Text Categorization Algorithm using Vector Evaluation Method,cs.IR cs.CL," Text categorization is the process of grouping documents into categories
-based on their contents. This process is important for making information
-retrieval easier, and it has become more important due to the huge amount of
-textual information available online. The main problem in text categorization
-is how to improve the classification accuracy. Although Arabic text
-categorization is a promising new field, there has been little research in it.
-This paper proposes a new method for Arabic text categorization using vector
-evaluation. The proposed method uses a categorized Arabic document corpus, and
-then the weights of the tested document's words are calculated to determine the
-document keywords, which are compared with the keywords of the corpus
-categories to determine the tested document's best category.
-"
-1552,1501.01386,"Misbah Daud, Rafiullah Khan, Mohibullah and Aitazaz Daud",Roman Urdu Opinion Mining System (RUOMiS),cs.CL cs.IR," Convincing a customer is always considered a challenging task in every
-business. But when it comes to online business, this task becomes even more
-difficult. Online retailers try everything possible to gain the trust of the
-customer. One of the solutions is to provide an area for existing users to
-leave their comments. This service can effectively develop the trust of the
-customer; however, customers normally comment about the product in their
-native language using Roman script. 
If there are hundreds of comments, it becomes
-difficult even for native customers to make a buying decision. This research
-proposes a system which extracts the comments posted in Roman Urdu, translates
-them, finds their polarity and then gives us the rating of the product. This
-rating will help native and non-native customers to make buying decisions
-efficiently from the comments posted in Roman Urdu.
-"
-1553,1501.01866,Dirk Roorda,The Hebrew Bible as Data: Laboratory - Sharing - Experiences,cs.CL cs.DL," The systematic study of ancient texts including their production,
-transmission and interpretation is greatly aided by the digital methods that
-started taking off in the 1970s. But how is that research in turn transmitted
-to new generations of researchers? We tell a story of Bible and computer across
-the decades and then point out the current challenges: (1) finding a stable
-data representation for changing methods of computation; (2) sharing results in
-inter- and intra-disciplinary ways, for reproducibility and
-cross-fertilization. We report recent developments in meeting these challenges.
-The scene is the text database of the Hebrew Bible, constructed by the Eep
-Talstra Centre for Bible and Computer (ETCBC), which is still growing in detail
-and sophistication. We show how a subtle mix of computational ingredients
-enables scholars to research the transmission and interpretation of the Hebrew
-Bible in new ways: (1) a standard data format, Linguistic Annotation Framework
-(LAF); (2) the methods of scientific computing, made accessible by
-(interactive) Python and its associated ecosystem. Additionally, we show how
-these efforts have culminated in the construction of a new, publicly accessible
-search engine SHEBANQ, where the text of the Hebrew Bible and its underlying
-data can be queried in a simple, yet powerful query language MQL, and where
-those queries can be saved and shared.
-"
-1554,1501.01894,Vinodh Rajan,"Quantifying Scripts: Defining metrics of characters for quantitative and
- descriptive analysis",cs.CL," Analysis of scripts plays an important role in paleography and in
-quantitative linguistics. Especially in the field of digital paleography,
-quantitative features are much needed to differentiate glyphs. We describe an
-elaborate set of metrics that quantify the qualitative information contained in
-characters and hence indirectly also quantify scribal features. We broadly
-divide the metrics into several categories and describe each individual metric
-with its underlying qualitative significance. The metrics are largely derived
-from the related area of gesture design and recognition. We also propose
-several novel metrics. The proposed metrics are soundly grounded on the
-principles of handwriting production and handwriting analysis. These computed
-metrics could serve as descriptors for scripts and also be used for comparing
-and analyzing scripts. We illustrate some quantitative analysis based on the
-proposed metrics by applying it to the paleographic evolution of the medieval
-Tamil script from Brahmi. We also outline future work.
-"
-1555,1501.02527,"Harini Suresh, Nicholas Locascio","Autodetection and Classification of Hidden Cultural City Districts from
- Yelp Reviews",cs.CL cs.AI cs.IR," Topic models are a way to discover underlying themes in an otherwise
-unstructured collection of documents. 
In this study, we specifically used the -Latent Dirichlet Allocation (LDA) topic model on a dataset of Yelp reviews to -classify restaurants based off of their reviews. Furthermore, we hypothesize -that within a city, restaurants can be grouped into similar ""clusters"" based on -both location and similarity. We used several different clustering methods, -including K-means Clustering and a Probabilistic Mixture Model, in order to -uncover and classify districts, both well-known and hidden (i.e. cultural areas -like Chinatown or hearsay like ""the best street for Italian restaurants"") -within a city. We use these models to display and label different clusters on a -map. We also introduce a topic similarity heatmap that displays the similarity -distribution in a city to a new restaurant. -" -1556,1501.02530,"Anna Rohrbach, Marcus Rohrbach, Niket Tandon, Bernt Schiele",A Dataset for Movie Description,cs.CV cs.CL cs.IR," Descriptive video service (DVS) provides linguistic descriptions of movies -and allows visually impaired people to follow a movie along with their peers. -Such descriptions are by design mainly visual and thus naturally form an -interesting data source for computer vision and computational linguistics. In -this work we propose a novel dataset which contains transcribed DVS, which is -temporally aligned to full length HD movies. In addition we also collected the -aligned movie scripts which have been used in prior work and compare the two -different sources of descriptions. In total the Movie Description dataset -contains a parallel corpus of over 54,000 sentences and video snippets from 72 -HD movies. We characterize the dataset by benchmarking different approaches for -generating video descriptions. Comparing DVS to scripts, we find that DVS is -far more visual and describes precisely what is shown rather than what should -happen according to the scripts created prior to movie production. -" -1557,1501.02598,"Angeliki Lazaridou, Nghia The Pham, Marco Baroni",Combining Language and Vision with a Multimodal Skip-gram Model,cs.CL cs.CV cs.LG," We extend the SKIP-GRAM model of Mikolov et al. (2013a) by taking visual -information into account. Like SKIP-GRAM, our multimodal models (MMSKIP-GRAM) -build vector-based word representations by learning to predict linguistic -contexts in text corpora. However, for a restricted set of words, the models -are also exposed to visual representations of the objects they denote -(extracted from natural images), and must predict linguistic and visual -features jointly. The MMSKIP-GRAM models achieve good performance on a variety -of semantic benchmarks. Moreover, since they propagate visual information to -all words, we use them to improve image labeling and retrieval in the zero-shot -setup, where the test concepts are never seen during model training. Finally, -the MMSKIP-GRAM models discover intriguing visual properties of abstract words, -paving the way to realistic implementations of embodied theories of meaning. -" -1558,1501.02670,Amaru Cuba Gyllensten and Magnus Sahlgren,Navigating the Semantic Horizon using Relative Neighborhood Graphs,cs.CL," This paper is concerned with nearest neighbor search in distributional -semantic models. A normal nearest neighbor search only returns a ranked list of -neighbors, with no information about the structure or topology of the local -neighborhood. 
This is a potentially serious shortcoming of this mode of querying
-a distributional semantic model, since a ranked list of neighbors may conflate
-several different senses. We argue that the topology of neighborhoods in
-semantic space provides important information about the different senses of
-terms, and that such topological structures can be used for word-sense
-induction. We also argue that the topology of the neighborhoods in semantic
-space can be used to determine the semantic horizon of a point, which we define
-as the set of neighbors that have a direct connection to the point. We
-introduce relative neighborhood graphs as a method to uncover the topological
-properties of neighborhoods in semantic models. We also provide examples of
-relative neighborhood graphs for three well-known semantic models: the PMI
-model, the GloVe model, and the skipgram model.
-"
-1559,1501.02714,"Angeliki Lazaridou, Georgiana Dinu, Adam Liska, Marco Baroni","From Visual Attributes to Adjectives through Decompositional
- Distributional Semantics",cs.CL cs.CV," As automated image analysis progresses, there is increasing interest in
-richer linguistic annotation of pictures, with attributes of objects (e.g.,
-furry, brown...) attracting most attention. By building on the recent
-""zero-shot learning"" approach, and paying attention to the linguistic nature of
-attributes as noun modifiers, and specifically adjectives, we show that it is
-possible to tag images with attribute-denoting adjectives even when no training
-data containing the relevant annotation are available. Our approach relies on
-two key observations. First, objects can be seen as bundles of attributes,
-typically expressed as adjectival modifiers (a dog is something furry, brown,
-etc.), and thus a function trained to map visual representations of objects to
-nominal labels can implicitly learn to map attributes to adjectives. Second,
-objects and attributes come together in pictures (the same thing is a dog and
-it is brown). We can thus achieve better attribute (and object) label retrieval
-by treating images as ""visual phrases"", and decomposing their linguistic
-representation into an attribute-denoting adjective and an object-denoting
-noun. Our approach performs comparably to a method exploiting manual attribute
-annotation, it outperforms various competitive alternatives in both attribute
-and object annotation, and it automatically constructs attribute-centric
-representations that significantly improve performance in supervised object
-recognition.
-"
-1560,1501.03191,Benjamin S. Mericli and Michael Bloodgood,Annotating Cognates and Etymological Origin in Turkic Languages,cs.CL," Turkic languages exhibit extensive and diverse etymological relationships
-among lexical items. These relationships make the Turkic languages promising
-for exploring automated translation lexicon induction by leveraging cognate and
-other etymological information. However, due to the extent and diversity of the
-types of relationships between words, it is not clear how to annotate such
-information. In this paper, we present a methodology for annotating cognates
-and etymological origin in Turkic languages. Our method strives to balance the
-amount of research effort the annotator expends with the utility of the
-annotations for supporting research on improving automated translation lexicon
-induction. 
-" -1561,1501.03210,"Piyush Bansal, Romil Bansal and Vasudeva Varma",Towards Deep Semantic Analysis Of Hashtags,cs.IR cs.CL," Hashtags are semantico-syntactic constructs used across various social -networking and microblogging platforms to enable users to start a topic -specific discussion or classify a post into a desired category. Segmenting and -linking the entities present within the hashtags could therefore help in better -understanding and extraction of information shared across the social media. -However, due to lack of space delimiters in the hashtags (e.g #nsavssnowden), -the segmentation of hashtags into constituent entities (""NSA"" and ""Edward -Snowden"" in this case) is not a trivial task. Most of the current -state-of-the-art social media analytics systems like Sentiment Analysis and -Entity Linking tend to either ignore hashtags, or treat them as a single word. -In this paper, we present a context aware approach to segment and link entities -in the hashtags to a knowledge base (KB) entry, based on the context within the -tweet. Our approach segments and links the entities in hashtags such that the -coherence between hashtag semantics and the tweet is maximized. To the best of -our knowledge, no existing study addresses the issue of linking entities in -hashtags for extracting semantic information. We evaluate our method on two -different datasets, and demonstrate the effectiveness of our technique in -improving the overall entity linking in tweets via additional semantic -information provided by segmenting and linking entities in a hashtag. -" -1562,1501.03214,Roger Bilisoly,Quantifying Prosodic Variability in Middle English Alliterative Poetry,stat.AP cs.CL," Interest in the mathematical structure of poetry dates back to at least the -19th century: after retiring from his mathematics position, J. J. Sylvester -wrote a book on prosody called $\textit{The Laws of Verse}$. Today there is -interest in the computer analysis of poems, and this paper discusses how a -statistical approach can be applied to this task. Starting with the definition -of what Middle English alliteration is, $\textit{Sir Gawain and the Green -Knight}$ and William Langland's $\textit{Piers Plowman}$ are used to illustrate -the methodology. Theory first developed for analyzing data from a Riemannian -manifold turns out to be applicable to strings allowing one to compute a -generalized mean and variance for textual data, which is applied to the poems -above. The ratio of these two variances produces the analogue of the F test, -and resampling allows p-values to be estimated. Consequently, this methodology -provides a way to compare prosodic variability between two texts. -" -1563,1501.03302,Mateusz Malinowski and Mario Fritz,Hard to Cheat: A Turing Test based on Answering Questions about Images,cs.AI cs.CL cs.CV cs.LG," Progress in language and image understanding by machines has sparkled the -interest of the research community in more open-ended, holistic tasks, and -refueled an old AI dream of building intelligent machines. We discuss a few -prominent challenges that characterize such holistic tasks and argue for -""question answering about images"" as a particular appealing instance of such a -holistic task. In particular, we point out that it is a version of a Turing -Test that is likely to be more robust to over-interpretations and contrast it -with tasks like grounding and generation of descriptions. Finally, we discuss -tools to measure progress in this field. 
-" -1564,1501.04324,Jia Xu and Geliang Chen,Phrase Based Language Model For Statistical Machine Translation,cs.CL," We consider phrase based Language Models (LM), which generalize the commonly -used word level models. Similar concept on phrase based LMs appears in speech -recognition, which is rather specialized and thus less suitable for machine -translation (MT). In contrast to the dependency LM, we first introduce the -exhaustive phrase-based LMs tailored for MT use. Preliminary experimental -results show that our approach outperform word based LMs with the respect to -perplexity and translation quality. -" -1565,1501.04325,Lars Maaloe and Morten Arngren and Ole Winther,Deep Belief Nets for Topic Modeling,cs.CL cs.LG stat.ML," Applying traditional collaborative filtering to digital publishing is -challenging because user data is very sparse due to the high volume of -documents relative to the number of users. Content based approaches, on the -other hand, is attractive because textual content is often very informative. In -this paper we describe large-scale content based collaborative filtering for -digital publishing. To solve the digital publishing recommender problem we -compare two approaches: latent Dirichlet allocation (LDA) and deep belief nets -(DBN) that both find low-dimensional latent representations for documents. -Efficient retrieval can be carried out in the latent representation. We work -both on public benchmarks and digital media content provided by Issuu, an -online publishing platform. This article also comes with a newly developed deep -belief nets toolbox for topic modeling tailored towards performance evaluation -of the DBN model and comparisons to the LDA model. -" -1566,1501.04346,"Andrew S. Lan and Divyanshu Vats and Andrew E. Waters and Richard G. - Baraniuk","Mathematical Language Processing: Automatic Grading and Feedback for - Open Response Mathematical Questions",stat.ML cs.AI cs.CL cs.LG," While computer and communication technologies have provided effective means -to scale up many aspects of education, the submission and grading of -assessments such as homework assignments and tests remains a weak link. In this -paper, we study the problem of automatically grading the kinds of open response -mathematical questions that figure prominently in STEM (science, technology, -engineering, and mathematics) courses. Our data-driven framework for -mathematical language processing (MLP) leverages solution data from a large -number of learners to evaluate the correctness of their solutions, assign -partial-credit scores, and provide feedback to each learner on the likely -locations of any errors. MLP takes inspiration from the success of natural -language processing for text data and comprises three main steps. First, we -convert each solution to an open response mathematical question into a series -of numerical features. Second, we cluster the features from several solutions -to uncover the structures of correct, partially correct, and incorrect -solutions. We develop two different clustering approaches, one that leverages -generic clustering algorithms and one based on Bayesian nonparametrics. Third, -we automatically grade the remaining (potentially large number of) solutions -based on their assigned cluster and one instructor-provided grade per cluster. 
-As a bonus, we can track the cluster assignment of each step of a multistep
-solution and determine when it departs from a cluster of correct solutions,
-which enables us to indicate the likely locations of errors to learners. We
-test and validate MLP on real-world MOOC data to demonstrate how it can
-substantially reduce the human effort required in large-scale educational
-platforms.
-"
-1567,1501.04920,"Gerardo Sierra, Juan-Manuel Torres-Moreno, Alejandro Molina",Regroupement s\'emantique de d\'efinitions en espagnol,cs.IR cs.CL," This article focuses on the description and evaluation of a new unsupervised
-learning method for clustering definitions in Spanish according to their
-semantics. Textual Energy was used as a clustering measure, and we study an
-adaptation of Precision and Recall to evaluate our method.
-"
-1568,1501.05203,Geliang Chen,"Phrase Based Language Model for Statistical Machine Translation:
- Empirical Study",cs.CL," Reordering is a challenge for machine translation (MT) systems. In MT, the
-widely used approach is to apply a word based language model (LM), which
-considers the constituent units of a sentence to be words. In speech
-recognition (SR), some phrase based LMs have been proposed. However, those LMs
-are not necessarily suitable or optimal for reordering. We propose two phrase
-based LMs which consider the constituent units of a sentence to be phrases.
-Experiments show that our phrase based LMs outperform the word based LM with
-respect to perplexity and n-best list re-ranking.
-"
-1569,1501.05396,"Youssef Mroueh, Etienne Marcheret, Vaibhava Goel",Deep Multimodal Learning for Audio-Visual Speech Recognition,cs.CL cs.LG," In this paper, we present methods in deep multimodal learning for fusing
-speech and visual modalities for Audio-Visual Automatic Speech Recognition
-(AV-ASR). First, we study an approach where uni-modal deep networks are trained
-separately and their final hidden layers fused to obtain a joint feature space
-in which another deep network is built. While the audio network alone achieves
-a phone error rate (PER) of $41\%$ under clean conditions on the IBM large
-vocabulary audio-visual studio dataset, this fusion model achieves a PER of
-$35.83\%$, demonstrating the tremendous value of the visual channel in phone
-classification even in audio with a high signal to noise ratio. Second, we
-present a new deep network architecture that uses a bilinear softmax layer to
-account for class specific correlations between modalities. We show that
-combining the posteriors from the bilinear networks with those from the fused
-model mentioned above results in a further significant phone error rate
-reduction, yielding a final PER of $34.03\%$.
-"
-1570,1501.05940,"T. Rachad, J. Boutahar and S. El ghazi",A New Efficient Method for Calculating Similarity Between Web Services,cs.AI cs.CL cs.IR cs.SE," Web services allow communication between heterogeneous systems in a
-distributed environment. Their enormous success and increasing use mean that
-thousands of Web services are now present on the Internet. This significant and
-ever-growing number of Web services has made them difficult to locate and
-classify, problems that are encountered mainly during the operations of web
-service discovery and substitution. 
Traditional keyword-based search is not
-successful in this context: its results do not take into account the structure
-of Web services, and it considers only the identifiers of the web service
-description language (WSDL) interface elements. Methods based on semantics
-(WSDLS, OWLS, SAWSDL...), which enrich the WSDL description of a Web service
-with a semantic description, partially address this problem, but their
-complexity and difficulty delay their adoption in real cases. Measuring the
-similarity between web service interfaces is the most suitable solution for
-this kind of problem: it classifies the available web services so as to
-identify those that best match the searched profile and those that do not.
-Thus, the main goal of this work is to study the degree of similarity between
-any two web services by offering a new method that is more effective than
-existing works.
-"
-1571,1501.06587,"Xiaodan Zhu, Peter Turney, Daniel Lemire, Andr\'e Vellino",Measuring academic influence: Not all citations are equal,cs.DL cs.CL cs.LG," The importance of a research article is routinely measured by counting how
-many times it has been cited. However, treating all citations with equal weight
-ignores the wide variety of functions that citations perform. We want to
-automatically identify the subset of references in a bibliography that have a
-central academic influence on the citing paper. For this purpose, we examine
-the effectiveness of a variety of features for determining the academic
-influence of a citation. By asking authors to identify the key references in
-their own work, we created a data set in which citations were labeled according
-to their academic influence. Using automatic feature selection with supervised
-machine learning, we found a model for predicting academic influence that
-achieves good performance on this data set using only four features. The best
-features, among those we evaluated, were those based on the number of times a
-reference is mentioned in the body of a citing paper. The performance of these
-features inspired us to design an influence-primed h-index (the hip-index).
-Unlike the conventional h-index, it weights citations by how many times a
-reference is mentioned. According to our experiments, the hip-index is a better
-indicator of researcher performance than the conventional h-index.
-"
-1572,1501.07005,"Monika T. Makwana, Deepak C. Vegda",Survey:Natural Language Parsing For Indian Languages,cs.CL," Syntactic parsing is a necessary task for NLP applications including machine
-translation. It is a challenging task to develop a qualitative parser for
-morphologically rich and agglutinative languages. Syntactic analysis is used to
-understand the grammatical structure of a natural language sentence. It outputs
-all the grammatical information of each word and its constituents. Issues
-related to parsing also help us to understand the language in a more detailed
-way. This literature survey is groundwork for understanding the different
-parser developments for Indian languages and the various approaches used to
-develop such tools and techniques. This paper provides a survey of research
-papers from well known journals and conferences.
-"
-1573,1501.07496,E.L.F. Da Silva and H.M. 
de Oliveira,"Implementation of an Automatic Syllabic Division Algorithm from Speech
- Files in Portuguese Language",cs.SD cs.CL cs.DS eess.AS," A new algorithm for automatic syllabic splitting of speech in the Portuguese
-language is proposed, which is based on the envelope of the speech signal of
-the input audio file. A computational implementation in MatlabTM is presented
-and made available at the URL
-http://www2.ee.ufpe.br/codec/divisao_silabica.html. Due to its
-straightforwardness, the proposed method is very attractive for embedded
-systems (e.g. iPhones). It can also be used as a screen to assist more
-sophisticated methods. Voice excerpts containing more than one syllable and
-identified by the same envelope are named super-syllables, and they are
-subsequently separated. The results indicate which samples correspond to the
-beginning and end of each detected syllable. Preliminary tests were performed
-on fifty words, with an identification rate of circa 70% (further improvements
-may be incorporated to treat particular phonemes). This algorithm is also
-useful in voice command systems, as a tool in the teaching of the Portuguese
-language, or even for patients with speech pathologies.
-"
-1574,1501.07676,"Issa Atoum, Chih How Bong, Narayanan Kulathuramaiyer",Towards Resolving Software Quality-in-Use Measurement Challenges,cs.SE cs.CL," Software quality-in-use captures quality from the user's perspective. It has
-gained importance in e-learning applications, mobile service based applications
-and project management tools. Users' decisions on software acquisition are
-often ad hoc or based on preference, due to the difficulty of quantitatively
-measuring software quality-in-use. But why is quality-in-use measurement
-difficult? Although there are many software quality models, to our knowledge no
-work surveys the challenges related to software quality-in-use measurement.
-This paper has three main contributions: 1) it presents major issues and
-challenges in measuring software quality-in-use in the context of the ISO
-SQuaRE series and related software quality models, 2) it presents a novel
-framework that can be used to predict software quality-in-use, and 3) it
-presents preliminary results of quality-in-use topic prediction. Concisely, the
-issues are related to the complexity of the current standard models and the
-limitations and incompleteness of the customized software quality models. The
-proposed framework employs sentiment analysis techniques to predict software
-quality-in-use.
-"
-1575,1502.00512,"Will Williams, Niranjani Prasad, David Mrva, Tom Ash, Tony Robinson",Scaling Recurrent Neural Network Language Models,cs.CL cs.LG," This paper investigates the scaling properties of Recurrent Neural Network
-Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and
-address the questions of how RNNLMs scale with respect to model size,
-training-set size, computational costs and memory. Our analysis shows that
-despite being more costly to train, RNNLMs obtain much lower perplexities on
-standard benchmarks than n-gram models. We train the largest known RNNs and
-present relative word error rate gains of 18% on an ASR task. We also present
-the new lowest perplexities on the recently released billion word language
-modelling benchmark, a 1 BLEU point gain on machine translation and a 17%
-relative hit rate gain in word prediction. 
-" -1576,1502.00731,"Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, - Christopher R\'e",Incremental Knowledge Base Construction Using DeepDive,cs.DB cs.CL cs.LG," Populating a database with unstructured information is a long-standing -problem in industry and research that encompasses problems of extraction, -cleaning, and integration. Recent names used for this problem include dealing -with dark data and knowledge base construction (KBC). In this work, we describe -DeepDive, a system that combines database and machine learning ideas to help -develop KBC systems, and we present techniques to make the KBC process more -efficient. We observe that the KBC process is iterative, and we develop -techniques to incrementally produce inference results for KBC systems. We -propose two methods for incremental inference, based respectively on sampling -and variational techniques. We also study the tradeoff space of these methods -and develop a simple rule-based optimizer. DeepDive includes all of these -contributions, and we evaluate DeepDive on five KBC systems, showing that it -can speed up KBC inference tasks by up to two orders of magnitude with -negligible impact on quality. -" -1577,1502.00831,"Robin Piedeleu, Dimitri Kartsaklis, Bob Coecke and Mehrnoosh Sadrzadeh",Open System Categorical Quantum Semantics in Natural Language Processing,cs.CL cs.LO math.CT math.QA," Originally inspired by categorical quantum mechanics (Abramsky and Coecke, -LiCS'04), the categorical compositional distributional model of natural -language meaning of Coecke, Sadrzadeh and Clark provides a conceptually -motivated procedure to compute the meaning of a sentence, given its grammatical -structure within a Lambek pregroup and a vectorial representation of the -meaning of its parts. The predictions of this first model have outperformed -that of other models in mainstream empirical language processing tasks on large -scale data. Moreover, just like CQM allows for varying the model in which we -interpret quantum axioms, one can also vary the model in which we interpret -word meaning. - In this paper we show that further developments in categorical quantum -mechanics are relevant to natural language processing too. Firstly, Selinger's -CPM-construction allows for explicitly taking into account lexical ambiguity -and distinguishing between the two inherently different notions of homonymy and -polysemy. In terms of the model in which we interpret word meaning, this means -a passage from the vector space model to density matrices. Despite this change -of model, standard empirical methods for comparing meanings can be easily -adopted, which we demonstrate by a small-scale experiment on real-world data. -This experiment moreover provides preliminary evidence of the validity of our -proposed new model for word meaning. - Secondly, commutative classical structures as well as their non-commutative -counterparts that arise in the image of the CPM-construction allow for encoding -relative pronouns, verbs and adjectives, and finally, iteration of the -CPM-construction, something that has no counterpart in the quantum realm, -enables one to accommodate both entailment and ambiguity. -" -1578,1502.01245,Diego R. Amancio,"Authorship recognition via fluctuation analysis of network topology and - word intermittency",cs.CL," Statistical methods have been widely employed in many practical natural -language processing applications. 
More specifically, concepts from complex
-networks and methods from dynamical systems theory have been successfully
-applied to recognize stylistic patterns in written texts. Despite the large
-number of studies devoted to representing texts with physical models, only a
-few studies have assessed the relevance of attributes derived from the analysis
-of stylistic fluctuations. Because fluctuations represent a pivotal factor for
-characterizing a myriad of real systems, this study focused on the analysis of
-the properties of stylistic fluctuations in texts via topological analysis of
-complex networks and intermittency measurements. The results showed that
-different authors display distinct fluctuation patterns. In particular, it was
-found that it is possible to identify the authorship of books using the
-intermittency of specific words. Taken together, the results described here
-suggest that the patterns found in stylistic fluctuations could be used to
-analyze other related complex systems. Furthermore, the discovery of novel
-patterns related to textual stylistic fluctuations indicates that these
-patterns could be useful for improving the state of the art of many
-stylistic-based natural language processing tasks.
-"
-1579,1502.01271,Gregory Grefenstette (TAO),INRIASAC: Simple Hypernym Extraction Methods,cs.CL," Given a set of terms from a given domain, how can we structure them into a
-taxonomy without manual intervention? This is task 17 of SemEval 2015. Here
-we present our simple taxonomy structuring techniques which, despite their
-simplicity, ranked first in this 2015 benchmark. We use large quantities of
-text (English Wikipedia) and simple heuristics such as term overlap and
-document and sentence co-occurrence to produce hypernym lists. We describe
-these techniques and present an initial evaluation of results.
-"
-1580,1502.01446,"Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, Chengqing Zong",Beyond Word-based Language Model in Statistical Machine Translation,cs.CL," The language model is one of the most important modules in statistical
-machine translation, and currently the word-based language model dominates this
-community. However, many translation models (e.g. phrase-based models) generate
-the target language sentences by rendering and compositing phrases rather than
-words. Thus, it is much more reasonable to model dependencies between phrases,
-but little research work has succeeded in solving this problem. In this paper,
-we tackle this problem by designing a novel phrase-based language model which
-attempts to solve three key sub-problems: 1) how to define a phrase in a
-language model; 2) how to determine phrase boundaries in large-scale
-monolingual data in order to enlarge the training set; 3) how to alleviate the
-data sparsity problem due to the huge vocabulary size of phrases. By carefully
-handling these issues, extensive experiments on Chinese-to-English translation
-show that our phrase-based language model can significantly improve translation
-quality by up to +1.47 absolute BLEU score.
-"
-1581,1502.01682,"Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Chris
- Callison-Burch, Nathaniel W. 
Filardo, Christine Piatko, Lori Levin and Scott
- Miller",Use of Modality and Negation in Semantically-Informed Syntactic MT,cs.CL cs.LG stat.ML," This paper describes the resource- and system-building efforts of an
-eight-week Johns Hopkins University Human Language Technology Center of
-Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on
-Semantically-Informed Machine Translation (SIMT). We describe a new
-modality/negation (MN) annotation scheme, the creation of a (publicly
-available) MN lexicon, and two automated MN taggers that we built using the
-annotation scheme and lexicon. Our annotation scheme isolates three components
-of modality and negation: a trigger (a word that conveys modality or negation),
-a target (an action associated with modality or negation) and a holder (an
-experiencer of modality). We describe how our MN lexicon was semi-automatically
-produced and we demonstrate that a structure-based MN tagger results in
-precision around 86% (depending on genre) for tagging of a standard LDC data
-set.
- We apply our MN annotation scheme to statistical machine translation using a
-syntactic framework that supports the inclusion of semantic annotations.
-Syntactic tags enriched with semantic annotations are assigned to parse trees
-in the target-language training texts through a process of tree grafting. While
-the focus of our work is modality and negation, the tree grafting procedure is
-general and supports other types of semantic information. We exploit this
-capability by including named entities, produced by a pre-existing tagger, in
-addition to the MN elements produced by the taggers described in this paper.
-The resulting system significantly outperformed a linguistically naive baseline
-model (Hiero), and reached the highest scores yet reported on the NIST 2009
-Urdu-English test set. This finding supports the hypothesis that both syntactic
-and semantic information can improve translation quality.
-"
-1582,1502.01710,"Xiang Zhang, Yann LeCun",Text Understanding from Scratch,cs.LG cs.CL," This article demonstrates that we can apply deep learning to text
-understanding from character-level inputs all the way up to abstract text
-concepts, using temporal convolutional networks (ConvNets). We apply ConvNets
-to various large-scale datasets, including ontology classification, sentiment
-analysis, and text categorization. We show that temporal ConvNets can achieve
-astonishing performance without the knowledge of words, phrases, sentences and
-any other syntactic or semantic structures with regard to a human language.
-Evidence shows that our models can work for both English and Chinese.
-"
-1583,1502.01753,"Peter Wittek, S\'andor Dar\'anyi, Efstratios Kontopoulos, Theodoros
- Moysiadis, Ioannis Kompatsiaris","Monitoring Term Drift Based on Semantic Consistency in an Evolving
- Vector Field",cs.CL cs.LG cs.NE stat.ML," Based on the Aristotelian concept of potentiality vs. actuality allowing for
-the study of energy and dynamics in language, we propose a field approach to
-lexical analysis. Falling back on the distributional hypothesis to
-statistically model word meaning, we used evolving fields as a metaphor to
-express time-dependent changes in a vector space model by a combination of
-random indexing and evolving self-organizing maps (ESOM). To monitor semantic
-drifts within the observation period, an experiment was carried out on the term
-space of a collection of 12.8 million Amazon book reviews. 
For evaluation, the
-semantic consistency of ESOM term clusters was compared with their respective
-neighbourhoods in WordNet, and contrasted with distances among term vectors by
-random indexing. We found that at the 0.05 level of significance, the terms in
-the clusters showed a high level of semantic consistency. Tracking the drift of
-distributional patterns in the term space across time periods, we found that
-consistency decreased, but not at a statistically significant level. Our method
-is highly scalable, with interpretations in philosophy.
-"
-1584,1502.02233,"Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung, Svetha Venkatesh","Hierarchical Dirichlet process for tracking complex topical structure
- evolution and its application to autism research literature",cs.IR cs.CL," In this paper we describe a novel framework for the discovery of the topical
-content of a data corpus, and the tracking of its complex structural changes
-across the temporal dimension. In contrast to previous work our model does not
-impose a prior on the rate at which documents are added to the corpus nor does
-it adopt the Markovian assumption which overly restricts the type of changes
-that the model can capture. Our key technical contribution is a framework based
-on (i) discretization of time into epochs, (ii) epoch-wise topic discovery
-using a hierarchical Dirichlet process-based model, and (iii) a temporal
-similarity graph which allows for the modelling of complex topic changes:
-emergence and disappearance, evolution, and splitting and merging. The power of
-the proposed framework is demonstrated on the medical literature corpus
-concerned with the autism spectrum disorder (ASD) - a research subject of
-growing social and healthcare importance. In addition
-to the collected ASD literature corpus which we will make freely available, our
-contributions also include two free online tools we built as aids to ASD
-researchers. These can be used for semantically meaningful navigation and
-searching, as well as knowledge discovery from this large and rapidly growing
-corpus of literature.
-"
-1585,1502.02277,Seung-Hoon Na and In-Su Kang and Jong-Hyeok Lee,"Improving Term Frequency Normalization for Multi-topical Documents, and
- Application to Language Modeling Approaches",cs.IR cs.CL," Term frequency normalization is a serious issue since document lengths
-vary. Generally, documents become long for two different reasons -
-verbosity and multi-topicality. First, verbosity means that the same topic is
-repeatedly mentioned by terms related to the topic, so that term frequencies
-are higher than in a well-summarized document. Second, multi-topicality
-indicates that a document broadly discusses multiple topics, rather than a
-single topic. Although these document characteristics should be handled
-differently, all previous methods of term frequency normalization have ignored
-these differences and have used a simplified length-driven approach which
-decreases the term frequency based only on the length of a document, causing an
-unreasonable penalization. To attack this problem, we propose a novel TF
-normalization method which is a type of partially-axiomatic approach. We first
-formulate two formal constraints that the retrieval model should satisfy for
-documents with verbose and multi-topical characteristics, respectively. Then,
-we modify language modeling approaches to better satisfy these two constraints,
-and derive novel smoothing methods.
Experimental results show that the proposed
-method significantly increases the precision for keyword queries, and
-substantially improves MAP (Mean Average Precision) for verbose queries.
-"
-1586,1502.02655,Simon \v{S}uster,"An investigation into language complexity of World-of-Warcraft
- game-external texts",cs.CL," We present a language complexity analysis of World of Warcraft (WoW)
-community texts, which we compare to texts from a general corpus of web
-English. Results from several complexity types are presented, including lexical
-diversity, density, readability and syntactic complexity. The language of WoW
-texts is found to be comparable to the general corpus on some complexity
-measures, yet more specialized on other measures. Our findings can be used by
-educators wishing to include game-related activities in school curricula.
-"
-1587,1502.03322,"Yongfeng Zhang, Min Zhang, Yiqun Liu, and Shaoping Ma","Boost Phrase-level Polarity Labelling with Review-level Sentiment
- Classification",cs.CL cs.AI," Sentiment analysis on user reviews helps to keep track of user reactions
-towards products, and to give advice to users about what to buy.
-State-of-the-art review-level sentiment classification techniques can achieve
-precisions above 90%. However, current phrase-level sentiment analysis
-approaches might only give sentiment polarity labelling precisions of around
-70%-80%, which is far from satisfactory and restricts their application in many
-practical tasks. In this paper, we focus on the problem of phrase-level
-sentiment polarity labelling and attempt to bridge the gap between phrase-level
-and review-level sentiment analysis. We investigate the inconsistency between
-the numerical star ratings and the sentiment orientation of textual user
-reviews. Although they have long been treated as identical, which serves as a
-basic assumption in previous work, we find that this assumption is not
-necessarily true. We further propose to leverage the results of review-level
-sentiment classification to boost the performance of phrase-level polarity
-labelling using a novel constrained convex optimization framework. Besides, the
-framework is capable of integrating various kinds of information sources and
-heuristics, while giving the global optimal solution due to its convexity.
-Experimental results on both English and Chinese reviews show that our
-framework achieves high labelling precisions of up to 89%, which is a
-significant improvement over current approaches.
-"
-1588,1502.03520,"Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski",A Latent Variable Model Approach to PMI-based Word Embeddings,cs.LG cs.CL stat.ML," Semantic word embeddings represent the meaning of a word via a vector, and
-are created by diverse methods. Many use nonlinear operations on co-occurrence
-statistics, and have hand-tuned hyperparameters and reweighting methods.
- This paper proposes a new generative model, a dynamic version of the
-log-linear topic model of~\citet{mnih2007three}. The methodological novelty is
-to use the prior to compute closed form expressions for word statistics. This
-provides a theoretical justification for nonlinear models like PMI, word2vec,
-and GloVe, as well as some hyperparameter choices. It also helps explain why
-low-dimensional semantic embeddings contain linear algebraic structure that
-allows solution of word analogies, as shown by~\citet{mikolov2013efficient} and
-many subsequent papers.
- Experimental support is provided for the generative model assumptions, the
-most important of which is that latent word vectors are fairly uniformly
-dispersed in space.
-"
-1589,1502.03630,"Min Yang, Tianyi Cui, Wenting Tu",Ordering-sensitive and Semantic-aware Topic Modeling,cs.LG cs.CL cs.IR," Topic modeling of textual corpora is an important and challenging problem. In
-most previous work, the ""bag-of-words"" assumption is made, which ignores
-the ordering of words. This assumption simplifies the computation, but it
-unrealistically loses the ordering information and the semantics of words in
-context. In this paper, we present a Gaussian Mixture Neural Topic Model
-(GMNTM) which incorporates both the ordering of words and the semantic meaning
-of sentences into topic modeling. Specifically, we represent each topic as a
-cluster of multi-dimensional vectors and embed the corpus into a collection of
-vectors generated by the Gaussian mixture model. Each word is affected not only
-by its topic, but also by the embedding vectors of its surrounding words and
-the context. The Gaussian mixture components and the topics of documents,
-sentences and words can be learnt jointly. Extensive experiments show that our
-model can learn better topics and more accurate word distributions for each
-topic. Quantitatively, compared to state-of-the-art topic modeling approaches,
-GMNTM obtains significantly better performance in terms of perplexity,
-retrieval accuracy and classification accuracy.
-"
-1590,1502.03671,"R\'emi Lebret, Pedro O. Pinheiro, Ronan Collobert",Phrase-based Image Captioning,cs.CL," Generating a novel textual description of an image is an interesting problem
-that connects computer vision and natural language processing. In this paper,
-we present a simple model that is able to generate descriptive sentences given
-a sample image. This model has a strong focus on the syntax of the
-descriptions. We train a purely bilinear model that learns a metric between an
-image representation (generated from a previously trained Convolutional Neural
-Network) and phrases that are used to describe them. The system is then able
-to infer phrases from a given image sample. Based on caption syntax statistics,
-we propose a simple language model that can produce relevant descriptions for a
-given test image using the phrases inferred. Our approach, which is
-considerably simpler than state-of-the-art models, achieves comparable results
-in two popular datasets for the task: Flickr30k and the recently proposed
-Microsoft COCO.
-"
-1591,1502.03682,"Jose Antonio Mi\~narro-Gim\'enez, Oscar Mar\'in-Alonso, Matthias
- Samwald","Applying deep learning techniques on medical corpora from the World Wide
- Web: a prototypical system and evaluation",cs.CL cs.IR cs.LG cs.NE," BACKGROUND: The amount of biomedical literature is rapidly growing and it is
-becoming increasingly difficult to keep manually curated knowledge bases and
-ontologies up-to-date. In this study we applied the word2vec deep learning
-toolkit to medical corpora to test its potential for identifying relationships
-from unstructured text. We evaluated the efficiency of word2vec in identifying
-properties of pharmaceuticals based on mid-sized, unstructured medical text
-corpora available on the web. Properties included relationships to diseases
-('may treat') or physiological processes ('has physiological effect').
We
-compared the relationships identified by word2vec with manually curated
-information from the National Drug File - Reference Terminology (NDF-RT)
-ontology as a gold standard. RESULTS: Our results revealed a maximum accuracy
-of 49.28%, which suggests a limited ability of word2vec to capture linguistic
-regularities on the collected medical corpora compared with other published
-results. We were able to document the influence of different parameter settings
-on result accuracy and found an unexpected trade-off between ranking quality
-and accuracy. Pre-processing corpora to reduce syntactic variability proved to
-be a good strategy for increasing the utility of the trained vector models.
-CONCLUSIONS: Word2vec is a very efficient implementation for computing vector
-representations and is able to identify relationships in textual data
-without any prior domain knowledge. We found that the ranking and retrieved
-results generated by word2vec were not of sufficient quality for automatic
-population of knowledge bases and ontologies, but could serve as a starting
-point for further manual curation.
-"
-1592,1502.03752,"Saad Alkahtani, Wei Liu, and William J. Teahan",A new hybrid metric for verifying parallel corpora of Arabic-English,cs.CL," This paper discusses a new metric that has been applied to verify the quality
-of translation between sentence pairs in parallel corpora of Arabic-English.
-This metric combines two techniques, one based on sentence length and the other
-based on compression code length. Experiments on sample test parallel
-Arabic-English corpora indicate that the combination of these two techniques
-improves accuracy of the identification of satisfactory and unsatisfactory
-sentence pairs compared to sentence length and compression code length alone.
-The new method proposed in this research is effective at filtering noise and
-reducing mis-translations, resulting in greatly improved quality.
-"
-1593,1502.04049,"Preethi Raghavan, James L. Chen, Eric Fosler-Lussier, Albert M. Lai","How essential are unstructured clinical narratives and information
- fusion to clinical trial recruitment?",cs.CY cs.AI cs.CL," Electronic health records capture patient information using structured
-controlled vocabularies and unstructured narrative text. While structured data
-typically encodes lab values, encounters and medication lists, unstructured
-data captures the physician's interpretation of the patient's condition,
-prognosis, and response to therapeutic intervention. In this paper, we
-demonstrate that information extraction from unstructured clinical narratives
-is essential to most clinical applications. We perform an empirical study to
-validate the argument and show that structured data alone is insufficient in
-resolving eligibility criteria for recruiting patients onto clinical trials for
-chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is
-essential to solving 59% of the CLL trial criteria and 77% of the prostate
-cancer trial criteria. More specifically, for resolving eligibility criteria
-with temporal constraints, we show the need for temporal reasoning and
-information integration with medical events within and across unstructured
-clinical narratives and structured data.
-"
-1594,1502.04081,David Belanger and Sham Kakade,A Linear Dynamical System Model for Text,stat.ML cs.CL cs.LG," Low dimensional representations of words allow accurate NLP models to be
-trained on limited annotated data.
While most representations
-ignore words' local context, a natural way to induce context-dependent
-representations is to perform inference in a probabilistic latent-variable
-sequence model. Given the recent success of continuous vector space word
-representations, we provide such an inference procedure for continuous states,
-where words' representations are given by the posterior mean of a linear
-dynamical system. Here, efficient inference can be performed using Kalman
-filtering. Our learning algorithm is extremely scalable, operating on simple
-co-occurrence counts for both parameter initialization using the method of
-moments and subsequent iterations of EM. In our experiments, we employ our
-inferred word embeddings as features in standard tagging tasks, obtaining
-significant accuracy improvements. Finally, the Kalman filter updates can be
-seen as a linear recurrent neural network. We demonstrate that using the
-parameters of our model to initialize a non-linear recurrent neural network
-language model reduces its training time by a day and yields lower perplexity.
-"
-1595,1502.04174,"Xuezhe Ma, Hai Zhao",Probabilistic Models for High-Order Projective Dependency Parsing,cs.CL," This paper presents generalized probabilistic models for high-order
-projective dependency parsing and an algorithmic framework for learning these
-statistical models involving dependency trees. Partition functions and
-marginals for high-order dependency trees can be computed efficiently, by
-adapting our algorithms, which extend the inside-outside algorithm to
-higher-order cases. To show the effectiveness of our algorithms, we perform
-experiments on three languages---English, Chinese and Czech, using maximum
-conditional likelihood estimation for model training and L-BFGS for parameter
-estimation. Our methods achieve competitive performance for English, and
-outperform all previously reported dependency parsers for Chinese and Czech.
-"
-1596,1502.04938,Arianna Bisazza and Marcello Federico,"A Survey of Word Reordering in Statistical Machine Translation:
- Computational Models and Language Phenomena",cs.CL," Word reordering is one of the most difficult aspects of statistical machine
-translation (SMT), and an important factor in its quality and efficiency.
-Despite the vast amount of research published to date, the interest of the
-community in this problem has not decreased, and no single method appears to be
-strongly dominant across language pairs. Instead, the choice of the optimal
-approach for a new translation task still seems to be mostly driven by
-empirical trials. To orientate the reader in this vast and complex research
-area, we present a comprehensive survey of word reordering viewed as a
-statistical modeling challenge and as a natural language phenomenon. The survey
-describes in detail how word reordering is modeled within different
-string-based and tree-based SMT frameworks and as a stand-alone task, including
-systematic overviews of the literature in advanced reordering modeling. We then
-question why some approaches are more successful than others in different
-language pairs. We argue that, besides measuring the amount of reordering, it
-is important to understand which kinds of reordering occur in a given language
-pair. To this end, we conduct a qualitative analysis of word reordering
-phenomena in a diverse sample of language pairs, based on a large collection of
-linguistic knowledge.
Empirical results in the SMT literature are shown to
-support the hypothesis that a few linguistic facts can be very useful to
-anticipate the reordering characteristics of a language pair and to select the
-SMT framework that best suits them.
-"
-1597,1502.05441,Ahmad B.A. Hassanat and Ghada Awad Altarawneh,"Rule- and Dictionary-based Solution for Variations in Written Arabic
- Names in Social Networks, Big Data, Accounting Systems and Large Databases",cs.DB cs.CL cs.IR," This paper investigates the problem that some Arabic names can be written in
-multiple ways. When someone searches for only one form of a name, neither exact
-nor approximate matching is appropriate for returning the multiple variants of
-the name. Exact matching requires the user to enter all forms of the name for
-the search, and approximate matching yields names not among the variations of
-the one being sought. In this paper, we attempt to solve the problem with a
-dictionary of all Arabic names mapped to their different (alternative) writing
-forms. We generated alternatives based on rules we derived from reviewing the
-first names of 9.9 million citizens and former citizens of Jordan. This
-dictionary can be used both for standardizing the written form when inserting a
-new name into a database and for searching for the name and all its alternative
-written forms. Creating the dictionary automatically based on rules resulted in
-false-acceptance errors of at least 7% and false-rejection errors of 7.9%. We
-addressed the errors by manually editing the dictionary. The dictionary can be
-of help to real-world databases, with the qualification that manual editing
-does not guarantee 100% correctness.
-"
-1598,1502.05472,Diego Marcheggiani and Fabrizio Sebastiani,"On the Effects of Low-Quality Training Data on Information Extraction
- from Clinical Reports",cs.LG cs.CL cs.IR," In the last five years there has been a flurry of work on information
-extraction from clinical documents, i.e., on algorithms capable of extracting,
-from the informal and unstructured texts that are generated during everyday
-clinical practice, mentions of concepts relevant to such practice. Most of this
-literature is about methods based on supervised learning, i.e., methods for
-training an information extraction system from manually annotated examples.
-While a lot of work has been devoted to devising learning methods that generate
-more and more accurate information extractors, no work has been devoted to
-investigating the effect of the quality of training data on the learning
-process. Low quality in training data often derives from the fact that the
-person who has annotated the data is different from the one against whose
-judgment the automatically annotated data must be evaluated. In this paper we
-test the impact of such data quality issues on the accuracy of information
-extraction systems as applied to the clinical domain. We do this by comparing
-the accuracy deriving from training data annotated by the authoritative coder
-(i.e., the one who has also annotated the test data, and by whose judgment we
-must abide), with the accuracy deriving from training data annotated by a
-different coder. The results indicate that, although the disagreement between
-the two coders (as measured on the training set) is substantial, the difference
-is (surprisingly enough) not always statistically significant.
-"
-1599,1502.05698,"Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M.
Rush, Bart
- van Merri\""enboer, Armand Joulin, Tomas Mikolov",Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks,cs.AI cs.CL stat.ML," One long-term goal of machine learning research is to produce methods that
-are applicable to reasoning and natural language, in particular building an
-intelligent dialogue agent. To measure progress towards that goal, we argue for
-the usefulness of a set of proxy tasks that evaluate reading comprehension via
-question answering. Our tasks measure understanding in several ways: whether a
-system is able to answer questions via chaining facts, simple induction,
-deduction and many more. The tasks are designed to be prerequisites for any
-system that aims to be capable of conversing with a human. We believe many
-existing learning systems currently cannot solve them, and hence our aim is to
-classify these tasks into skill sets, so that researchers can identify (and
-then rectify) the failings of their systems. We also extend and improve the
-recently introduced Memory Networks model, and show it is able to solve some,
-but not all, of the tasks.
-"
-1600,1502.05957,"Andrew R. Cohen (Dept Electri. Comput. Eng., Drexel Univ.), Paul M.B.
- Vitanyi (CWI and University of Amsterdam)",Web Similarity in Sets of Search Terms using Database Queries,cs.IR cs.CL cs.CV," Normalized web distance (NWD) is a similarity or normalized semantic distance
-based on the World Wide Web or another large electronic database, for instance
-Wikipedia, and a search engine that returns reliable aggregate page counts. For
-sets of search terms the NWD gives a common similarity (common semantics) on a
-scale from 0 (identical) to 1 (completely different). The NWD approximates the
-similarity of members of a set according to all (upper semi)computable
-properties. We develop the theory and give applications of classifying using
-Amazon, Wikipedia, and the NCBI website from the National Institutes of Health.
-The last gives new correlations between health hazards. A restriction of the
-NWD to a set of two yields the earlier normalized Google distance (NGD) but no
-combination of the NGDs of pairs in a set can extract the information the NWD
-extracts from the set. The NWD enables a new contextual (different databases)
-learning approach based on Kolmogorov complexity theory that incorporates
-knowledge from these databases.
-"
-1601,1502.06161,Thiago Marzag\~ao,Using NLP to measure democracy,cs.CL cs.IR cs.LG stat.ML," This paper uses natural language processing to create the first machine-coded
-democracy index, which I call Automated Democracy Scores (ADS). The ADS are
-based on 42 million news articles from 6,043 different sources and cover all
-independent countries in the 1993-2012 period. Unlike the democracy indices we
-have today, the ADS are replicable and have standard errors small enough to
-actually distinguish between cases.
- The ADS are produced with supervised learning. Three approaches are tried: a)
-a combination of Latent Semantic Analysis and tree-based regression methods; b)
-a combination of Latent Dirichlet Allocation and tree-based regression methods;
-and c) the Wordscores algorithm. The Wordscores algorithm outperforms the
-alternatives, so it is the one on which the ADS are based.
- There is a web application where anyone can change the training set and see
-how the results change: democracy-scores.org
-"
-1602,1502.06922,"Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He,
- Jianshu Chen, Xinying Song, Rabab Ward","Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis
- and Application to Information Retrieval",cs.CL cs.IR cs.LG cs.NE," This paper develops a model that addresses sentence embedding, a hot topic in
-current natural language processing research, using recurrent neural networks
-with Long Short-Term Memory (LSTM) cells. Due to its ability to capture long
-term memory, the LSTM-RNN accumulates increasingly richer information as it
-goes through the sentence, and when it reaches the last word, the hidden layer
-of the network provides a semantic representation of the whole sentence. In
-this paper, the LSTM-RNN is trained in a weakly supervised manner on user
-click-through data logged by a commercial web search engine. Visualization and
-analysis are performed to understand how the embedding process works. The model
-is found to automatically attenuate the unimportant words and detect the
-salient keywords in the sentence. Furthermore, these detected keywords are
-found to automatically activate different cells of the LSTM-RNN, where words
-belonging to a similar topic activate the same cell. As a semantic
-representation of the sentence, the embedding vector can be used in many
-different applications. These automatic keyword detection and topic allocation
-abilities enabled by the LSTM-RNN allow the network to perform document
-retrieval, a difficult language processing task, where the similarity between
-the query and documents can be measured by the distance between their
-corresponding sentence embedding vectors computed by the LSTM-RNN. On a web
-search task, the LSTM-RNN embedding is shown to significantly outperform
-several existing state-of-the-art methods. We emphasize that the proposed model
-generates sentence embedding vectors that are especially useful for web
-document retrieval tasks. A comparison with a well-known general sentence
-embedding method, the Paragraph Vector, is performed. The results show that the
-proposed method in this paper significantly outperforms it for the web document
-retrieval task.
-"
-1603,1502.07038,Dominick Ng and Mohit Bansal and James R. Curran,Web-scale Surface and Syntactic n-gram Features for Dependency Parsing,cs.CL," We develop novel first- and second-order features for dependency parsing
-based on the Google Syntactic Ngrams corpus, a collection of subtree counts of
-parsed sentences from scanned books. We also extend previous work on surface
-$n$-gram features from Web1T to the Google Books corpus and from first-order to
-second-order, comparing and analysing performance over newswire and web
-treebanks.
- Surface and syntactic $n$-grams both produce substantial and complementary
-gains in parsing accuracy across domains. Our best system combines the two
-feature sets, achieving up to 0.8% absolute UAS improvements on newswire and
-1.4% on web text.
-"
-1604,1502.07157,"Pierre-Fran\c{c}ois Marteau (IRISA), Guiyao Ke (IRISA)","Exploiting a comparability mapping to improve bi-lingual data
- categorization: a three-mode data analysis perspective",cs.IR cs.CL," We address in this paper the co-clustering and co-classification of bilingual
-data lying in two linguistic similarity spaces when a comparability measure
-defining a mapping between these two spaces is available.
A new approach,
-which can be characterized as a three-mode analysis scheme, is proposed to mix
-the comparability measure with the two similarity measures. Our aim is to
-jointly improve the accuracy of the classification and clustering tasks
-performed in each of the two linguistic spaces, as well as the quality of the
-final alignment of comparable clusters that can be obtained. We first used
-purely synthetic random data sets to assess our formal similarity-comparability
-mixing model. We then propose two variants of the comparability measure defined
-by Li and Gaussier (2010) in the context of bilingual lexicon extraction to
-adapt it to clustering or categorizing tasks. These two variant measures are
-subsequently used to evaluate our similarity-comparability mixing model in the
-context of the co-classification and co-clustering of comparable textual data
-sets collected from Wikipedia categories for the English and French languages.
-Our experiments show clear improvements in clustering and classification
-accuracies when mixing comparability with similarity measures, with, as
-expected, higher robustness obtained when the two comparability variant
-measures that we propose are used. We believe that this approach is
-particularly well suited for the construction of thematic comparable corpora of
-controllable quality.
-"
-1605,1502.07257,"Sergey Bartunov, Dmitry Kondrashkin, Anton Osokin, Dmitry Vetrov",Breaking Sticks and Ambiguities with Adaptive Skip-gram,cs.CL," The recently proposed Skip-gram model is a powerful method for learning
-high-dimensional word representations that capture rich semantic relationships
-between words. However, Skip-gram, as well as most prior work on learning word
-representations, does not take word ambiguity into account and maintains only a
-single representation per word. Although a number of Skip-gram modifications
-were proposed to overcome this limitation and learn multi-prototype word
-representations, they either require a known number of word meanings or learn
-them using greedy heuristic approaches. In this paper we propose the Adaptive
-Skip-gram model, a nonparametric Bayesian extension of Skip-gram capable of
-automatically learning the required number of representations for all words at
-the desired semantic resolution. We derive an efficient online variational
-learning algorithm for the model and empirically demonstrate its efficiency on
-a word-sense induction task.
-"
-1606,1502.07504,Attia Nehar and Djelloul Ziadi and Hadda Cherroun,Rational Kernels for Arabic Stemming and Text Classification,cs.CL," In this paper, we address the problems of Arabic Text Classification and
-stemming using Transducers and Rational Kernels. We introduce a new stemming
-technique based on the use of Arabic patterns (Pattern Based Stemmer). Patterns
-are modelled using transducers and stemming is done without depending on any
-dictionary. Using transducers for stemming, documents are transformed into
-finite state transducers. This document representation allows us to use and
-explore rational kernels as a framework for Arabic Text Classification.
-Stemming experiments are conducted on three word collections and classification
-experiments are done on the Saudi Press Agency dataset. Results show that our
-approach, when compared with other approaches, is promising, especially in
-terms of accuracy, recall and F1.
-" -1607,1502.07920,Jiajun Zhang,Local Translation Prediction with Global Sentence Representation,cs.CL," Statistical machine translation models have made great progress in improving -the translation quality. However, the existing models predict the target -translation with only the source- and target-side local context information. In -practice, distinguishing good translations from bad ones does not only depend -on the local features, but also rely on the global sentence-level information. -In this paper, we explore the source-side global sentence-level features for -target-side local translation prediction. We propose a novel -bilingually-constrained chunk-based convolutional neural network to learn -sentence semantic representations. With the sentence-level feature -representation, we further design a feed-forward neural network to better -predict translations using both local and global information. The large-scale -experiments show that our method can obtain substantial improvements in -translation quality over the strong baseline: the hierarchical phrase-based -translation model augmented with the neural network joint model. -" -1608,1502.08029,"Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, - Hugo Larochelle, Aaron Courville",Describing Videos by Exploiting Temporal Structure,stat.ML cs.AI cs.CL cs.CV cs.LG," Recent progress in using recurrent neural networks (RNNs) for image -description has motivated the exploration of their application for video -description. However, while images are static, working with videos requires -modeling their dynamic temporal structure and then properly integrating that -information into a natural language description. In this context, we propose an -approach that successfully takes into account both the local and global -temporal structure of videos to produce descriptions. First, our approach -incorporates a spatial temporal 3-D convolutional neural network (3-D CNN) -representation of the short temporal dynamics. The 3-D CNN representation is -trained on video action recognition tasks, so as to produce a representation -that is tuned to human motion and behavior. Second we propose a temporal -attention mechanism that allows to go beyond local temporal modeling and learns -to automatically select the most relevant temporal segments given the -text-generating RNN. Our approach exceeds the current state-of-art for both -BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on -a new, larger and more challenging dataset of paired video and natural language -descriptions. -" -1609,1502.08030,"Hung Nghiep Tran, Tin Huynh, Tien Do",Author Name Disambiguation by Using Deep Neural Network,cs.DL cs.CL cs.LG," Author name ambiguity decreases the quality and reliability of information -retrieved from digital libraries. Existing methods have tried to solve this -problem by predefining a feature set based on expert's knowledge for a specific -dataset. In this paper, we propose a new approach which uses deep neural -network to learn features automatically from data. Additionally, we propose the -general system architecture for author name disambiguation on any dataset. In -this research, we evaluate the proposed method on a dataset containing -Vietnamese author names. The results show that this method significantly -outperforms other methods that use predefined feature set. The proposed method -achieves 99.31% in terms of accuracy. 
The prediction error rate decreases from
-1.83% to 0.69%, i.e., by 1.14 percentage points, a 62.3% relative reduction
-compared with other methods that use a predefined feature set (Table 3).
-"
-1610,1502.08033,"Vu Le Anh, Vo Hoang Hai, Hung Nghiep Tran, Jason J. Jung","SciRecSys: A Recommendation System for Scientific Publication by
- Discovering Keyword Relationships",cs.DL cs.CL cs.IR," In this work, we propose a new approach for discovering various relationships
-among keywords over scientific publications based on a Markov Chain model. It
-is an important problem since keywords are the basic elements for representing
-abstract objects such as documents, user profiles, topics and many other
-things. Our model is very effective since it combines four important factors
-in scientific publications: content, publicity, impact and randomness.
-Particularly, a recommendation system (called SciRecSys) has been presented to
-help users efficiently find relevant articles.
-"
-1611,1503.00030,Daniel Fern\'andez-Gonz\'alez and Andr\'e F. T. Martins,Parsing as Reduction,cs.CL," We reduce phrase-representation parsing to dependency parsing. Our reduction
-is grounded on a new intermediate representation, ""head-ordered dependency
-trees"", shown to be isomorphic to constituent trees. By encoding order
-information in the dependency labels, we show that any off-the-shelf, trainable
-dependency parser can be used to produce constituents. When this parser is
-non-projective, we can perform discontinuous parsing in a very natural manner.
-Despite the simplicity of our approach, experiments show that the resulting
-parsers are on par with strong baselines, such as the Berkeley parser for
-English and the best single system in the SPMRL-2014 shared task. Results are
-particularly striking for discontinuous parsing of German, where we surpass the
-current state of the art by a wide margin.
-"
-1612,1503.00064,"Dahua Lin, Chen Kong, Sanja Fidler, Raquel Urtasun",Generating Multi-Sentence Lingual Descriptions of Indoor Scenes,cs.CV cs.CL," This paper proposes a novel framework for generating lingual descriptions of
-indoor scenes. Although substantial efforts have been made to tackle this
-problem, previous approaches have focused primarily on generating a single
-sentence for each image, which is not sufficient for describing complex scenes.
-We attempt to go beyond this, by generating coherent descriptions with multiple
-sentences. Our approach is distinguished from conventional ones in several
-aspects: (1) a 3D visual parsing system that jointly infers objects,
-attributes, and relations; (2) a generative grammar learned automatically from
-training text; and (3) a text generation algorithm that takes into account the
-coherence among sentences. Experiments on the augmented NYU-v2 dataset show
-that our framework can generate natural descriptions with substantially higher
-ROUGE scores compared to those produced by the baseline.
-"
-1613,1503.00075,"Kai Sheng Tai, Richard Socher, Christopher D. Manning","Improved Semantic Representations From Tree-Structured Long Short-Term
- Memory Networks",cs.CL cs.AI cs.LG," Because of their superior ability to preserve sequence information over time,
-Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with
-a more complex computational unit, have obtained strong results on a variety of
-sequence modeling tasks. The only underlying LSTM structure that has been
-explored so far is a linear chain.
However, natural language exhibits syntactic
-properties that would naturally combine words to phrases. We introduce the
-Tree-LSTM, a generalization of LSTMs to tree-structured network topologies.
-Tree-LSTMs outperform all existing systems and strong LSTM baselines on two
-tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task
-1) and sentiment classification (Stanford Sentiment Treebank).
-"
-1614,1503.00095,"Kazuma Hashimoto, Pontus Stenetorp, Makoto Miwa, Yoshimasa Tsuruoka","Task-Oriented Learning of Word Embeddings for Semantic Relation
- Classification",cs.CL," We present a novel learning method for word embeddings designed for relation
-classification. Our word embeddings are trained by predicting words between
-noun pairs using lexical relation-specific features on a large unlabeled
-corpus. This allows us to explicitly incorporate relation-specific information
-into the word embeddings. The learned word embeddings are then used to
-construct feature vectors for a relation classification model. On a
-well-established semantic relation classification task, our method
-significantly outperforms a baseline based on a previously introduced word
-embedding method, and compares favorably to previous state-of-the-art models
-that use syntactic information or manually constructed external resources.
-"
-1615,1503.00107,Shujian Huang and Huadong Chen and Xinyu Dai and Jiajun Chen,Non-linear Learning for Statistical Machine Translation,cs.CL cs.NE," Modern statistical machine translation (SMT) systems usually use a linear
-combination of features to model the quality of each translation hypothesis.
-The linear combination assumes that all the features are in a linear
-relationship and constrains each feature to interact with the rest of the
-features in a linear manner, which might limit the expressive power of the
-model and lead to an under-fit model on the current data. In this paper, we
-propose a non-linear modeling for the quality of translation hypotheses based
-on neural networks, which allows more complex interaction between features. A
-learning framework is presented for training the non-linear models. We also
-discuss possible heuristics in designing the network structure which may
-improve the non-linear learning performance. Experimental results show that
-with the basic features of a hierarchical phrase-based machine translation
-system, our method produces translations that are better than those of a linear
-model.
-"
-1616,1503.00168,Jiwei Li and Eduard Hovy,The NLP Engine: A Universal Turing Machine for NLP,cs.CL," It is commonly accepted that machine translation is a more complex task than
-part of speech tagging. But how much more complex? In this paper we make an
-attempt to develop a general framework and methodology for computing the
-informational and/or processing complexity of NLP applications and tasks. We
-define a universal framework akin to a Turing Machine that attempts to fit
-(most) NLP tasks into one paradigm. We calculate the complexities of various
-NLP tasks using measures of Shannon Entropy, and compare `simple' ones such as
-part of speech tagging to `complex' ones such as machine translation. This
-paper provides a first, though far from perfect, attempt to quantify NLP tasks
-under a uniform paradigm. We point out current deficiencies and suggest some
-avenues for fruitful research.
-" -1617,1503.00185,"Jiwei Li, Minh-Thang Luong, Dan Jurafsky and Eudard Hovy",When Are Tree Structures Necessary for Deep Learning of Representations?,cs.AI cs.CL," Recursive neural models, which use syntactic parse trees to recursively -generate representations bottom-up, are a popular architecture. But there have -not been rigorous evaluations showing for exactly which tasks this syntax-based -method is appropriate. In this paper we benchmark {\bf recursive} neural models -against sequential {\bf recurrent} neural models (simple recurrent and LSTM -models), enforcing apples-to-apples comparison as much as possible. We -investigate 4 tasks: (1) sentiment classification at the sentence level and -phrase level; (2) matching questions to answer-phrases; (3) discourse parsing; -(4) semantic relation extraction (e.g., {\em component-whole} between nouns). - Our goal is to understand better when, and why, recursive models can -outperform simpler models. We find that recursive models help mainly on tasks -(like semantic relation extraction) that require associating headwords across a -long distance, particularly on very long sequences. We then introduce a method -for allowing recurrent models to achieve similar performance: breaking long -sentences into clause-like units at punctuation and processing them separately -before combining. Our results thus help understand the limitations of both -classes of models, and suggest directions for improving recurrent models. -" -1618,1503.00339,Vladislav Kargin,Variation of word frequencies in Russian literary texts,cs.CL physics.soc-ph stat.AP," We study the variation of word frequencies in Russian literary texts. Our -findings indicate that the standard deviation of a word's frequency across -texts depends on its average frequency according to a power law with exponent -$0.62,$ showing that the rarer words have a relatively larger degree of -frequency volatility (i.e., ""burstiness""). - Several latent factors models have been estimated to investigate the -structure of the word frequency distribution. The dependence of a word's -frequency volatility on its average frequency can be explained by the asymmetry -in the distribution of latent factors. -" -1619,1503.00693,Dani Yogatama and Noah A. Smith,Bayesian Optimization of Text Representations,cs.CL cs.LG stat.ML," When applying machine learning to problems in NLP, there are many choices to -make about how to represent input texts. These choices can have a big effect on -performance, but they are often uninteresting to researchers or practitioners -who simply need a module that performs well. We propose an approach to -optimizing over this space of choices, formulating the problem as global -optimization. We apply a sequential model-based optimization technique and show -that our method makes standard linear models competitive with more -sophisticated, expensive state-of-the-art methods based on latent variable -models or neural networks on various topic classification and sentiment -analysis problems. Our approach is a first step towards black-box NLP systems -that work with raw text and do not require manual tuning. -" -1620,1503.00841,"Biao Liu, Minlie Huang",Robustly Leveraging Prior Knowledge in Text Classification,cs.CL cs.AI cs.IR cs.LG," Prior knowledge has been shown very useful to address many natural language -processing tasks. 
Many approaches have been proposed to formalise a variety of
-knowledge; however, whether the proposed approach is robust or sensitive to the
-knowledge supplied to the model has rarely been discussed. In this paper, we
-propose three regularization terms on top of generalized expectation criteria,
-and conduct extensive experiments to justify the robustness of the proposed
-methods. Experimental results demonstrate that our proposed methods obtain
-remarkable improvements and are much more robust than the baselines.
-"
-1621,1503.01129,Marcelo A Montemurro and Dami\'an H Zanette,Complexity and universality in the long-range order of words,cs.CL physics.data-an physics.soc-ph," As is the case with many signals produced by complex systems, language
-presents a statistical structure that is balanced between order and disorder.
-Here we review and extend recent results from quantitative characterisations of
-the degree of order in linguistic sequences that give insights into two
-relevant aspects of language: the presence of statistical universals in word
-ordering, and the link between semantic information and the statistical
-linguistic structure. We first analyse a measure of relative entropy that
-assesses how much the ordering of words contributes to the overall statistical
-structure of language. This measure presents an almost constant value close to
-3.5 bits/word across several linguistic families. Then, we show that a direct
-application of information theory leads to an entropy measure that can quantify
-and extract semantic structures from linguistic samples, even without prior
-knowledge of the underlying language.
-"
-1622,1503.01180,Chenhao Tan and Lillian Lee,"All Who Wander: On the Prevalence and Characteristics of Multi-community
- Engagement",cs.SI cs.CL physics.soc-ph," Although analyzing user behavior within individual communities is an active
-and rich research domain, people usually interact with multiple communities
-both on- and off-line. How do users act in such multi-community environments?
-Although there are a host of intriguing aspects to this question, it has
-received much less attention in the research community in comparison to the
-intra-community case. In this paper, we examine three aspects of
-multi-community engagement: the sequence of communities that users post to, the
-language that users employ in those communities, and the feedback that users
-receive, using longitudinal posting behavior on Reddit as our main data source,
-and DBLP for auxiliary experiments. We also demonstrate the effectiveness of
-features drawn from these aspects in predicting users' future level of
-activity.
- One might expect that a user's trajectory mimics the ""settling-down"" process
-in real life: an initial exploration of sub-communities before settling down
-into a few niches. However, we find that the users in our data continually post
-in new communities; moreover, as time goes on, they post increasingly evenly
-among a more diverse set of smaller communities. Interestingly, it seems that
-users that eventually leave the community are ""destined"" to do so from the very
-beginning, in the sense of showing significantly different ""wandering"" patterns
-very early on in their trajectories; this finding has potentially important
-design implications for community maintainers. Our multi-community perspective
-also allows us to investigate the ""situation vs. personality"" debate from
-language usage across different communities.
-" -1623,1503.01190,"Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, - Lori Levin, Christine D. Piatko, Owen Rambow and Benjamin Van Durme","Statistical modality tagging from rule-based annotations and - crowdsourcing",cs.CL cs.LG stat.ML," We explore training an automatic modality tagger. Modality is the attitude -that a speaker might have toward an event or state. One of the main hurdles for -training a linguistic tagger is gathering training data. This is particularly -problematic for training a tagger for modality because modality triggers are -sparse for the overwhelming majority of sentences. We investigate an approach -to automatically training a modality tagger where we first gathered sentences -based on a high-recall simple rule-based modality tagger and then provided -these sentences to Mechanical Turk annotators for further annotation. We used -the resulting set of training data to train a precise modality tagger using a -multi-class SVM that delivers good performance. -" -1624,1503.01258,Oleg V. Pavenkov and Vladimir G. Pavenkov and Mariia V. Rubtcova,"The concept ""altruism"" for sociological research: from conceptualization - to operationalization",cs.CY cs.CL," This article addresses the question of the relevant conceptualization of -{\guillemotleft}altruism{\guillemotright} in Russian from the perspective -sociological research operationalization. It investigates the spheres of social -application of the word {\guillemotleft}altruism{\guillemotright}, include -Russian equivalent {\guillemotleft}vzaimopomoshh`{\guillemotright} (mutual -help). The data for the study comes from Russian National Corpus (Russian). The -theoretical framework consists of Paul F. Lazarsfeld`s Theory of Sociological -Research Methodology and the Natural Semantic Metalanguage (NSM). Quantitative -analysis shows features in the representation of altruism in Russian that -sociologists need to know in the preparation of questionnaires, interview -guides and analysis of transcripts. -" -1625,1503.01397,Luke Vilnis and David Belanger and Daniel Sheldon and Andrew McCallum,Bethe Projections for Non-Local Inference,stat.ML cs.CL cs.LG," Many inference problems in structured prediction are naturally solved by -augmenting a tractable dependency structure with complex, non-local auxiliary -objectives. This includes the mean field family of variational inference -algorithms, soft- or hard-constrained inference using Lagrangian relaxation or -linear programming, collective graphical models, and forms of semi-supervised -learning such as posterior regularization. We present a method to -discriminatively learn broad families of inference objectives, capturing -powerful non-local statistics of the latent variables, while maintaining -tractable and provably fast inference using non-Euclidean projected gradient -descent with a distance-generating function given by the Bethe entropy. We -demonstrate the performance and flexibility of our method by (1) extracting -structured citations from research papers by learning soft global constraints, -(2) achieving state-of-the-art results on a widely-used handwriting recognition -task using a novel learned non-convex inference procedure, and (3) providing a -fast and highly scalable algorithm for the challenging problem of inference in -a collective graphical model applied to bird migration. 
-" -1626,1503.01549,"William Hsu, Mohammed Abduljabbar, Ryuichi Osuga, Max Lu, Wesam - Elshamy","Visualization of Clandestine Labs from Seizure Reports: Thematic Mapping - and Data Mining Research Directions",cs.IR cs.CL," The problem of spatiotemporal event visualization based on reports entails -subtasks ranging from named entity recognition to relationship extraction and -mapping of events. We present an approach to event extraction that is driven by -data mining and visualization goals, particularly thematic mapping and trend -analysis. This paper focuses on bridging the information extraction and -visualization tasks and investigates topic modeling approaches. We develop a -static, finite topic model and examine the potential benefits and feasibility -of extending this to dynamic topic modeling with a large number of topics and -continuous time. We describe an experimental test bed for event mapping that -uses this end-to-end information retrieval system, and report preliminary -results on a geoinformatics problem: tracking of methamphetamine lab seizure -events across time and space. -" -1627,1503.01558,"Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nick Johnston, Andrew - Rabinovich, and Kevin Murphy","What's Cookin'? Interpreting Cooking Videos using Text, Speech and - Vision",cs.CL cs.CV cs.IR," We present a novel method for aligning a sequence of instructions to a video -of someone carrying out a task. In particular, we focus on the cooking domain, -where the instructions correspond to the recipe. Our technique relies on an HMM -to align the recipe steps to the (automatically generated) speech transcript. -We then refine this alignment using a state-of-the-art visual food detector, -based on a deep convolutional neural network. We show that our technique -outperforms simpler techniques based on keyword spotting. It also enables -interesting applications, such as automatically illustrating recipes with -keyframes, and searching within a video for events of interest. -" -1628,1503.01655,"Eneko Agirre, Ander Barrena and Aitor Soroa","Studying the Wikipedia Hyperlink Graph for Relatedness and - Disambiguation",cs.CL," Hyperlinks and other relations in Wikipedia are a extraordinary resource -which is still not fully understood. In this paper we study the different types -of links in Wikipedia, and contrast the use of the full graph with respect to -just direct links. We apply a well-known random walk algorithm on two tasks, -word relatedness and named-entity disambiguation. We show that using the full -graph is more effective than just direct links by a large margin, that -non-reciprocal links harm performance, and that there is no benefit from -categories and infoboxes, with coherent results on both tasks. We set new -state-of-the-art figures for systems based on Wikipedia links, comparable to -systems exploiting several information sources and/or supervised machine -learning. Our approach is open source, with instruction to reproduce results, -and amenable to be integrated with complementary text-based methods. -" -1629,1503.01838,"Fandong Meng and Zhengdong Lu and Mingxuan Wang and Hang Li and Wenbin - Jiang and Qun Liu","Encoding Source Language with Convolutional Neural Network for Machine - Translation",cs.CL cs.LG cs.NE," The recently proposed neural network joint model (NNJM) (Devlin et al., 2014) -augments the n-gram target language model with a heuristically chosen source -context window, achieving state-of-the-art performance in SMT. 
In this paper,
-we give a more systematic treatment by summarizing the relevant source
-information through a convolutional architecture guided by the target
-information. With different guiding signals during decoding, our specifically
-designed convolution+gating architectures can pinpoint the parts of a source
-sentence that are relevant to predicting a target word, and fuse them with the
-context of the entire source sentence to form a unified representation. This
-representation, together with target language words, is fed to a deep neural
-network (DNN) to form a stronger NNJM. Experiments on two NIST Chinese-English
-translation tasks show that the proposed model can achieve significant
-improvements over the previous NNJM by up to +1.08 BLEU points on average.
-"
-1630,1503.02108,"Zhen Huang, Sabato Marco Siniscalchi, I-Fan Chen, Jiadong Wu, and
- Chin-Hui Lee",Maximum a Posteriori Adaptation of Network Parameters in Deep Models,cs.LG cs.CL cs.NE," We present a Bayesian approach to adapting parameters of a well-trained
-context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to
-improve automatic speech recognition performance. Given an abundance of DNN
-parameters but with only a limited amount of data, the effectiveness of the
-adapted DNN model can often be compromised. We formulate maximum a posteriori
-(MAP) adaptation of parameters of a specially designed CD-DNN-HMM with an
-augmented linear hidden network connected to the output tied states, or
-senones, and compare it to feature space MAP linear regression previously
-proposed. Experimental evidence on the 20,000-word open vocabulary Wall Street
-Journal task demonstrates the feasibility of the proposed framework. In
-supervised adaptation, the proposed MAP adaptation approach provides more than
-10% relative error reduction and consistently outperforms the conventional
-transformation based methods. Furthermore, we present an initial attempt to
-generate hierarchical priors to improve adaptation efficiency and effectiveness
-with limited adaptation data by exploiting similarities among senones.
-"
-1631,1503.02120,"Jake Ryland Williams, Eric M. Clark, James P. Bagrow, Christopher M.
- Danforth, and Peter Sheridan Dodds","Identifying missing dictionary entries with frequency-conserving context
- models",cs.CL cs.IT math.IT stat.ML," In an effort to better understand meaning from natural language texts, we
-explore methods aimed at organizing lexical objects into contexts. A number of
-these methods for organization fall into a family defined by word ordering.
-Unlike demographic or spatial partitions of data, these collocation models are
-of special importance for their universal applicability. While we are
-interested here in text and have framed our treatment appropriately, our work
-is potentially applicable to other areas of research (e.g., speech, genomics,
-and mobility patterns) where one has ordered categorical data (e.g., sounds,
-genes, and locations). Our approach focuses on the phrase (whether word or
-larger) as the primary meaning-bearing lexical unit and object of study. To do
-so, we employ our previously developed framework for generating word-conserving
-phrase-frequency data. Upon training our model with the Wiktionary---an
-extensive, online, collaborative, and open-source dictionary that contains over
-100,000 phrasal-definitions---we develop highly effective filters for the
-identification of meaningful, missing phrase-entries.
With our predictions we -then engage the editorial community of the Wiktionary and propose short lists -of potential missing entries for definition, developing a breakthrough lexical -extraction technique, and expanding our knowledge of the defined English -lexicon of phrases. -" -1632,1503.02335,"Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola",An Unsupervised Method for Uncovering Morphological Chains,cs.CL," Most state-of-the-art systems today produce morphological analysis based only -on orthographic patterns. In contrast, we propose a model for unsupervised -morphological analysis that integrates orthographic and semantic views of -words. We model word formation in terms of morphological chains, from base -words to the observed words, breaking the chains into parent-child relations. -We use log-linear models with morpheme and word-level features to predict -possible parents, including their modifications, for each word. The limited set -of candidate parents for each word renders contrastive estimation feasible. Our -model consistently matches or outperforms five state-of-the-art systems on -Arabic, English and Turkish. -" -1633,1503.02357,"Zhaopeng Tu, Baotian Hu, Zhengdong Lu, and Hang Li","Context-Dependent Translation Selection Using Convolutional Neural - Network",cs.CL cs.LG cs.NE," We propose a novel method for translation selection in statistical machine -translation, in which a convolutional neural network is employed to judge the -similarity between a phrase pair in two languages. The specifically designed -convolutional architecture encodes not only the semantic similarity of the -translation pair, but also the context containing the phrase in the source -language. Therefore, our approach is able to capture context-dependent semantic -similarities of translation pairs. We adopt a curriculum learning strategy to -train the model: we classify the training examples into easy, medium, and -difficult categories, and gradually build the ability of representing phrase -and sentence level context by using training examples from easy to difficult. -Experimental results show that our approach significantly outperforms the -baseline system by up to 1.4 BLEU points. -" -1634,1503.02364,"Lifeng Shang, Zhengdong Lu, Hang Li",Neural Responding Machine for Short-Text Conversation,cs.CL cs.AI cs.NE," We propose Neural Responding Machine (NRM), a neural network-based response -generator for Short-Text Conversation. NRM takes the general encoder-decoder -framework: it formalizes the generation of a response as a decoding process based -on the latent representation of the input text, while both encoding and -decoding are realized with recurrent neural networks (RNN). The NRM is trained -with a large amount of one-round conversation data collected from a -microblogging service. An empirical study shows that NRM can generate -grammatically correct and content-wise appropriate responses to over 75% of the -input text, outperforming the state of the art in the same setting, including -retrieval-based and SMT-based models. -" -1635,1503.02417,"Ehsan Shareghi, Gholamreza Haffari, Trevor Cohn, Ann Nicholson",Structured Prediction of Sequences and Trees using Infinite Contexts,cs.LG cs.CL," Linguistic structures exhibit a rich array of global phenomena; however, -commonly used Markov models are unable to adequately describe these phenomena -due to their strong locality assumptions.
We propose a novel hierarchical model -for structured prediction over sequences and trees which exploits global -context by conditioning each generation decision on an unbounded context of -prior decisions. This builds on the success of Markov models but without -imposing a fixed bound in order to better represent global phenomena. To -facilitate learning of this large and unbounded model, we use a hierarchical -Pitman-Yor process prior which provides a recursive form of smoothing. We -propose prediction algorithms based on A* and Markov Chain Monte Carlo -sampling. Empirical results demonstrate the potential of our model compared to -baseline finite-context Markov models on part-of-speech tagging and syntactic -parsing. -" -1636,1503.02427,Mingxuan Wang and Zhengdong Lu and Hang Li and Qun Liu,Syntax-based Deep Matching of Short Texts,cs.CL cs.LG cs.NE," Many tasks in natural language processing, ranging from machine translation -to question answering, can be reduced to the problem of matching two sentences -or more generally two short texts. We propose a new approach to the problem, -called Deep Match Tree (DeepMatch$_{tree}$), under a general setting. The -approach consists of two components, 1) a mining algorithm to discover patterns -for matching two short-texts, defined in the product space of dependency trees, -and 2) a deep neural network for matching short texts using the mined patterns, -as well as a learning algorithm to build the network having a sparse structure. -We test our algorithm on the problem of matching a tweet and a response in -social media, a hard matching problem proposed in [Wang et al., 2013], and show -that DeepMatch$_{tree}$ can outperform a number of competitor models, including -one without using dependency trees and one based on word-embedding, all by -large margins. -" -1637,1503.02510,Phong Le and Willem Zuidema,Compositional Distributional Semantics with Long Short Term Memory,cs.CL cs.AI cs.LG," We propose an extension of the recursive neural network that makes use -of a variant of the long short-term memory architecture. The extension allows -information low in parse trees to be stored in a memory register (the `memory -cell') and used much later higher up in the parse tree. This provides a -solution to the vanishing gradient problem and allows the network to capture -long range dependencies. Experimental results show that our composition -outperformed the traditional neural-network composition on the Stanford -Sentiment Treebank. -" -1638,1503.02801,"Jiaming Xu, Bo Xu, Guanhua Tian, Jun Zhao, Fangyuan Wang, Hongwei Hao","Short Text Hashing Improved by Integrating Multi-Granularity Topics and - Tags",cs.IR cs.CL," Due to computational and storage efficiencies of compact binary codes, -hashing has been widely used for large-scale similarity search. Unfortunately, -many existing hashing methods based on observed keyword features are not -effective for short texts due to their sparseness and shortness. Recently, some -researchers have tried to utilize latent topics of certain granularity to preserve -semantic similarity in hash codes beyond keyword matching. However, topics of -certain granularity are not adequate to represent the intrinsic semantic -information. In this paper, we present a novel unified approach for short text -Hashing using Multi-granularity Topics and Tags, dubbed HMTT.
In particular, we -propose a selection method to choose the optimal multi-granularity topics -depending on the type of dataset, and design two distinct hashing strategies to -incorporate multi-granularity topics. We also propose a simple and effective -method to exploit tags to enhance the similarity of related texts. We carry out -extensive experiments on one short text dataset as well as on one normal text -dataset. The results demonstrate that our approach is effective and -significantly outperforms baselines on several evaluation metrics. -" -1639,1503.03244,"Baotian Hu, Zhengdong Lu, Hang Li, Qingcai Chen","Convolutional Neural Network Architectures for Matching Natural Language - Sentences",cs.CL cs.LG cs.NE," Semantic matching is of central importance to many natural language tasks -\cite{bordes2014semantic,RetrievalQA}. A successful matching algorithm needs to -adequately model the internal structures of language objects and the -interaction between them. As a step toward this goal, we propose convolutional -neural network models for matching two sentences, by adapting the convolutional -strategy in vision and speech. The proposed models not only nicely represent -the hierarchical structures of sentences with their layer-by-layer composition -and pooling, but also capture the rich matching patterns at different levels. -Our models are rather generic, requiring no prior knowledge of language, and -can hence be applied to matching tasks of different nature and in different -languages. The empirical study demonstrates the efficacy of the proposed model -on a variety of matching tasks and its superiority to competitor models. -" -1640,1503.03512,"Eitan Adam Pechenick, Christopher M. Danforth, Peter Sheridan Dodds","Is language evolution grinding to a halt? The scaling of lexical - turbulence in English fiction suggests it is not",cs.CL cs.IT math.IT physics.soc-ph stat.AP," Of basic interest is the quantification of the long term growth of a -language's lexicon as it develops to more completely cover both a culture's -communication requirements and knowledge space. Here, we explore the usage -dynamics of words in the English language as reflected by the Google Books 2012 -English Fiction corpus. We critique an earlier method that found decreasing -birth and increasing death rates of words over the second half of the 20th -Century, showing death rates to be strongly affected by the imposed time cutoff -of the arbitrary present and not increasing dramatically. We provide a robust, -principled approach to examining lexical evolution by tracking the volume of -word flux across various relative frequency thresholds. We show that while the -overall statistical structure of the English language remains stable over time -in terms of its raw Zipf distribution, we find evidence of an enduring `lexical -turbulence': The flux of words across frequency thresholds from decade to -decade scales superlinearly with word rank and exhibits a scaling break we -connect to that of Zipf's law. To better understand the changing lexicon, we -examine the contributions to the Jensen-Shannon divergence of individual words -crossing frequency thresholds. We also find indications that scholarly works -about fiction are strongly represented in the 2012 English Fiction corpus, and -suggest that a future revision of the corpus should attempt to separate -critical works from fiction itself.
-" -1641,1503.03535,"Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, - Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio",On Using Monolingual Corpora in Neural Machine Translation,cs.CL," Recent work on end-to-end neural network-based architectures for machine -translation has shown promising results for En-Fr and En-De translation. -Arguably, one of the major factors behind this success has been the -availability of high quality parallel corpora. In this work, we investigate how -to leverage abundant monolingual corpora for neural machine translation. -Compared to a phrase-based and hierarchical baseline, we obtain up to $1.96$ -BLEU improvement on the low-resource language pair Turkish-English, and $1.59$ -BLEU on the focused domain task of Chinese-English chat messages. While our -method was initially targeted toward such tasks with less parallel data, we -show that it also extends to high resource languages such as Cs-En and De-En -where we obtain an improvement of $0.39$ and $0.47$ BLEU scores over the neural -machine translation baselines, respectively. -" -1642,1503.03989,Mirzanur Rahman and Shikhar Kumar Sarma,An implementation of Apertium based Assamese morphological analyzer,cs.CL," Morphological Analysis is an important branch of linguistics for any Natural -Language Processing Technology. Morphology studies the word structure and -formation of word of a language. In current scenario of NLP research, -morphological analysis techniques have become more popular day by day. For -processing any language, morphology of the word should be first analyzed. -Assamese language contains very complex morphological structure. In our work we -have used Apertium based Finite-State-Transducers for developing morphological -analyzer for Assamese Language with some limited domain and we get 72.7% -accuracy -" -1643,1503.04250,"Julia Bernd, Damian Borth, Benjamin Elizalde, Gerald Friedland, - Heather Gallagher, Luke Gottlieb, Adam Janin, Sara Karabashlieva, Jocelyn - Takahashi, Jennifer Won","The YLI-MED Corpus: Characteristics, Procedures, and Plans",cs.MM cs.CL," The YLI Multimedia Event Detection corpus is a public-domain index of videos -with annotations and computed features, specialized for research in multimedia -event detection (MED), i.e., automatically identifying what's happening in a -video by analyzing the audio and visual content. The videos indexed in the -YLI-MED corpus are a subset of the larger YLI feature corpus, which is being -developed by the International Computer Science Institute and Lawrence -Livermore National Laboratory based on the Yahoo Flickr Creative Commons 100 -Million (YFCC100M) dataset. The videos in YLI-MED are categorized as depicting -one of ten target events, or no target event, and are annotated for additional -attributes like language spoken and whether the video has a musical score. The -annotations also include degree of annotator agreement and average annotator -confidence scores for the event categorization of each video. Version 1.0 of -YLI-MED includes 1823 ""positive"" videos that depict the target events and -48,138 ""negative"" videos, as well as 177 supplementary videos that are similar -to event videos but are not positive examples. Our goal in producing YLI-MED is -to be as open about our data and procedures as possible. 
This report describes -the procedures used to collect the corpus; gives detailed descriptive -statistics about the corpus makeup (and how video attributes affected -annotators' judgments); discusses possible biases in the corpus introduced by -our procedural choices and compares it with the most similar existing dataset, -TRECVID MED's HAVIC corpus; and gives an overview of our future plans for -expanding the annotation effort. -" -1644,1503.04723,Marco Guerini and Jacopo Staiano,"Deep Feelings: A Massive Cross-Lingual Study on the Relation between - Emotions and Virality",cs.SI cs.CL cs.CY," This article provides a comprehensive investigation of the relations between -virality of news articles and the emotions they are found to evoke. Virality, -in our view, is a phenomenon with many facets, i.e., under this generic term -several different effects of persuasive communication are subsumed. By -exploiting a high-coverage and bilingual corpus of documents containing metrics -of their spread on social networks as well as a massive affective annotation -provided by readers, we present a thorough analysis of the interplay between -evoked emotions and viral facets. We highlight and discuss our findings in -light of a cross-lingual approach: while we discover differences in evoked -emotions and corresponding viral effects, we provide preliminary evidence of a -generalized explanatory model rooted in the deep structure of emotions: the -Valence-Arousal-Dominance (VAD) circumplex. We find that viral facets appear to -be consistently affected by particular VAD configurations, and these -configurations indicate a clear connection with distinct phenomena underlying -persuasive communication. -" -1645,1503.04881,"Xiaodan Zhu, Parinaz Sobhani, Hongyu Guo",Long Short-Term Memory Over Tree Structures,cs.CL cs.LG cs.NE," The chain-structured long short-term memory (LSTM) has been shown to be effective -in a wide range of problems such as speech recognition and machine translation. -In this paper, we propose to extend it to tree structures, in which a memory -cell can reflect the history memories of multiple child cells or multiple -descendant cells in a recursive process. We call the model S-LSTM, which -provides a principled way of considering long-distance interaction over -hierarchies, e.g., language or image parse structures. We leverage the models -for semantic composition to understand the meaning of text, a fundamental -problem in natural language understanding, and show that it outperforms a -state-of-the-art recursive model by replacing its composition layers with the -S-LSTM memory blocks. We also show that utilizing the given structures is -helpful in achieving a performance better than that without considering the -structures. -" -1646,1503.05034,"Mingxuan Wang, Zhengdong Lu, Hang Li, Wenbin Jiang, Qun Liu",$gen$CNN: A Convolutional Architecture for Word Sequence Prediction,cs.CL," We propose a novel convolutional architecture, named $gen$CNN, for word -sequence prediction. Different from previous work on neural network-based -language modeling and generation (e.g., RNN or LSTM), we choose not to greedily -summarize the history of words as a fixed length vector. Instead, we use a -convolutional neural network to predict the next word with the history of words -of variable length.
Also different from the existing feedforward networks for -language modeling, our model can effectively fuse the local correlation and -global correlation in the word sequence, with a convolution-gating strategy -specifically designed for the task. We argue that our model can give an adequate -representation of the history, and therefore can naturally exploit both -short- and long-range dependencies. Our model is fast, easy to train, and -readily parallelized. Our extensive experiments on text generation and $n$-best -re-ranking in machine translation show that $gen$CNN outperforms the -state of the art by large margins. -" -1647,1503.05123,"Manuel Amunategui, Tristan Markwell, Yelena Rozenfeld",Prediction Using Note Text: Synthetic Feature Creation with word2vec,cs.CL," word2vec affords a simple yet powerful approach to extracting quantitative -variables from unstructured textual data. Over half of healthcare data is -unstructured and therefore hard to model without involved expertise in data -engineering and natural language processing. word2vec can serve as a bridge to -quickly gather intelligence from such data sources. - In this study, we ran 650 megabytes of unstructured, medical chart notes from -the Providence Health & Services electronic medical record through word2vec. We -used two different approaches in creating predictive variables and tested them -on the risk of readmission for patients with COPD (Chronic Obstructive Lung -Disease). As a comparative benchmark, we ran the same test using the LACE risk -model (a single score based on length of stay, acuity, comorbid conditions, and -emergency department visits). - Using only free text and mathematical might, we found word2vec comparable to -LACE in predicting the risk of readmission of COPD patients. -" -1648,1503.05543,"Alexander A Alemi, Paul Ginsparg",Text Segmentation based on Semantic Word Embeddings,cs.CL cs.IR," We explore the use of semantic word embeddings in text segmentation -algorithms, including the C99 segmentation algorithm and new algorithms -inspired by the distributed word vector representation. By developing a general -framework for discussing a class of segmentation objectives, we study the -effectiveness of greedy versus exact optimization approaches and suggest a new -iterative refinement technique for improving the performance of greedy -strategies. We compare our results to known benchmarks, using known metrics. We -demonstrate state-of-the-art performance for an untrained method with our -Content Vector Segmentation (CVS) on the Choi test set. Finally, we apply the -segmentation procedure to an in-the-wild dataset consisting of text extracted -from scholarly articles in the arXiv.org database. -" -1649,1503.05615,"Kai-Wei Chang, He He, Hal Daum\'e III, John Langford",Learning to Search for Dependencies,cs.CL cs.LG," We demonstrate that a dependency parser can be built using a credit -assignment compiler which removes the burden of worrying about low-level -machine learning details from the parser implementation. The result is a simple -parser which robustly applies to many languages and provides similar -statistical and computational performance to the best-to-date transition-based -parsing approaches, while avoiding various downsides including randomization, -extra feature requirements, and custom learning algorithms.
-" -1650,1503.05626,Myong-Chol Pak,"Phrase database Approach to structural and semantic disambiguation in - English-Korean Machine Translation",cs.CL," In machine translation it is common phenomenon that machine-readable -dictionaries and standard parsing rules are not enough to ensure accuracy in -parsing and translating English phrases into Korean language, which is revealed -in misleading translation results due to consequent structural and semantic -ambiguities. This paper aims to suggest a solution to structural and semantic -ambiguities due to the idiomaticity and non-grammaticalness of phrases commonly -used in English language by applying bilingual phrase database in -English-Korean Machine Translation (EKMT). This paper firstly clarifies what -the phrase unit in EKMT is based on the definition of the English phrase, -secondly clarifies what kind of language unit can be the target of the phrase -database for EKMT, thirdly suggests a way to build the phrase database by -presenting the format of the phrase database with examples, and finally -discusses briefly the method to apply this bilingual phrase database to the -EKMT for structural and semantic disambiguation. -" -1651,1503.05907,Daniel Christen,Syntagma Lexical Database,cs.CL," This paper discusses the structure of Syntagma's Lexical Database (focused on -Italian). The basic database consists in four tables. Table Forms contains word -inflections, used by the POS-tagger for the identification of input-words. -Forms is related to Lemma. Table Lemma stores all kinds of grammatical features -of words, word-level semantic data and restrictions. In the table Meanings -meaning-related data are stored: definition, examples, domain, and semantic -information. Table Valency contains the argument structure of each meaning, -with syntactic and semantic features for each argument. The extended version of -SLD contains the links to Syntagma's Semantic Net and to the WordNet synsets of -other languages. -" -1652,1503.06151,Maxim Litvak,On measuring linguistic intelligence,cs.CL," This work addresses the problem of measuring how many languages a person -""effectively"" speaks given that some of the languages are close to each other. -In other words, to assign a meaningful number to her language portfolio. -Intuition says that someone who speaks fluently Spanish and Portuguese is -linguistically less proficient compared to someone who speaks fluently Spanish -and Chinese since it takes more effort for a native Spanish speaker to learn -Chinese than Portuguese. As the number of languages grows and their proficiency -levels vary, it gets even more complicated to assign a score to a language -portfolio. In this article we propose such a measure (""linguistic quotient"" - -LQ) that can account for these effects. - We define the properties that such a measure should have. They are based on -the idea of coherent risk measures from the mathematical finance. Having laid -down the foundation, we propose one such a measure together with the algorithm -that works on languages classification tree as input. - The algorithm together with the input is available online at lingvometer.com -" -1653,1503.06410,David M. W. 
Powers,"What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes",cs.IR cs.CL cs.LG cs.NE stat.CO stat.ML," The F-measure or F-score is one of the most commonly used single number -measures in Information Retrieval, Natural Language Processing and Machine -Learning, but it is based on a mistake, and the flawed assumptions render it -unsuitable for use in most contexts! Fortunately, there are better -alternatives. -" -1654,1503.06450,Manaal Faruqui and Shankar Kumar,Multilingual Open Relation Extraction Using Cross-lingual Projection,cs.CL," Open domain relation extraction systems identify relation and argument -phrases in a sentence without relying on any underlying schema. However, -current state-of-the-art relation extraction systems are available only for -English because of their heavy reliance on linguistic tools such as -part-of-speech taggers and dependency parsers. We present a cross-lingual -annotation projection method for language independent relation extraction. We -evaluate our method on a manually annotated test set and present results on -three typologically different languages. We release these manual annotations -and extracted relations in 61 languages from Wikipedia. -" -1655,1503.06733,"Mohammad Sadegh Rasooli, Joel Tetreault",Yara Parser: A Fast and Accurate Dependency Parser,cs.CL," Dependency parsers are among the most crucial tools in natural language -processing as they have many important applications in downstream tasks such as -information retrieval, machine translation and knowledge acquisition. We -introduce the Yara Parser, a fast and accurate open-source dependency parser -based on the arc-eager algorithm and beam search. It achieves an unlabeled -accuracy of 93.32 on the standard WSJ test set which ranks it among the top -dependency parsers. At its fastest, Yara can parse about 4000 sentences per -second when in greedy mode (1 beam). When optimizing for accuracy (using 64 -beams and Brown cluster features), Yara can parse 45 sentences per second. The -parser can be trained on any syntactic dependency treebank and different -options are provided in order to make it more flexible and tunable for specific -tasks. It is released with the Apache version 2.0 license and can be used for -both commercial and academic purposes. The parser can be found at -https://github.com/yahoo/YaraParser. -" -1656,1503.06760,Chu-Cheng Lin and Waleed Ammar and Chris Dyer and Lori Levin,Unsupervised POS Induction with Word Embeddings,cs.CL," Unsupervised word embeddings have been shown to be valuable as features in -supervised learning problems; however, their role in unsupervised problems has -been less thoroughly explored. In this paper, we show that embeddings can -likewise add value to the problem of unsupervised POS induction. In two -representative models of POS induction, we replace multinomial distributions -over the vocabulary with multivariate Gaussian distributions over word -embeddings and observe consistent improvements in eight languages. We also -analyze the effect of various choices while inducing word embeddings on -""downstream"" POS induction results. -" -1657,1503.06934,Issa Atoum and Chih How Bong,"Measuring Software Quality in Use: State-of-the-Art and Research - Challenges",cs.SE cs.CL," Software quality in use comprises quality from the user's perspective. It has -gained its importance in e-government applications, mobile-based applications, -embedded systems, and even business process development. 
Users' decisions on -software acquisitions are often ad hoc or based on preference due to difficulty -in quantitatively measuring software quality in use. But why is quality-in-use -measurement difficult? Although there are many software quality models, to the -authors' knowledge no works survey the challenges related to software -quality-in-use measurement. This article has two main contributions: 1) it -identifies and explains major issues and challenges in measuring software -quality in use in the context of the ISO SQuaRE series and related software -quality models and highlights open research areas; and 2) it sheds light on a -research direction that can be used to predict software quality in use. In -short, the quality-in-use measurement issues are related to the complexity of -the current standard models and the limitations and incompleteness of the -customized software quality models. A sentiment analysis of software reviews is -proposed to deal with these issues. -" -1658,1503.07283,Mikhail Korobov,Morphological Analyzer and Generator for Russian and Ukrainian Languages,cs.CL," pymorphy2 is a morphological analyzer and generator for the Russian and Ukrainian -languages. It uses large, efficiently encoded lexicons built from OpenCorpora -and LanguageTool data. A set of linguistically motivated rules is developed to -enable morphological analysis and generation of out-of-vocabulary words -observed in real-world documents. For Russian, pymorphy2 provides -state-of-the-art morphological analysis quality. The analyzer is implemented -in the Python programming language with optional C++ extensions. Emphasis is put on -ease of use, documentation and extensibility. The package is distributed under -a permissive open-source license, encouraging its use in both academic and -commercial settings. -" -1659,1503.07294,"Wendy Tan Wei Syn, Bong Chih How, Issa Atoum","Using Latent Semantic Analysis to Identify Quality in Use (QU) - Indicators from User Reviews",cs.CL cs.AI cs.IR," The paper describes a novel approach to categorizing users' reviews according -to the three Quality in Use (QU) indicators defined in ISO: effectiveness, -efficiency and freedom from risk. With the tremendous number of reviews -published each day, there is a need to automatically summarize user reviews to -inform us whether the software is able to meet the quality requirements of a -company. We implemented the method of Latent -Semantic Analysis (LSA) and its subspace to predict QU indicators. We built a -reduced-dimensionality universal semantic space from Information Systems -journals and Amazon reviews. Next, we projected a set of indicators' measurement -scales into the universal semantic space and represented them as a subspace. In the -subspace, we can map similar measurement scales to the unseen reviews and -predict the QU indicators. Our preliminary study was able to obtain an average -F-measure of 0.3627. -" -1660,1503.07613,"David Fifield, Torbj{\o}rn Follan, Emil Lunde",Unsupervised authorship attribution,cs.CL," We describe a technique for attributing parts of a written text to a set of -unknown authors. Nothing is assumed to be known a priori about the writing -styles of potential authors. We use multiple independent clusterings of an -input text to identify parts that are similar and dissimilar to one another. We -describe algorithms necessary to combine the multiple clusterings into a -meaningful output. We show results of the application of the technique on texts -having multiple writing styles.
-" -1661,1503.07921,"Julio Reis, Fabr{\i}cio Benevenuto, Pedro O.S. Vaz de Melo, Raquel - Prates, Haewoon Kwak, Jisun An",Breaking the News: First Impressions Matter on Online News,cs.CY cs.CL," A growing number of people are changing the way they consume news, replacing -the traditional physical newspapers and magazines by their virtual online -versions or/and weblogs. The interactivity and immediacy present in online news -are changing the way news are being produced and exposed by media corporations. -News websites have to create effective strategies to catch people's attention -and attract their clicks. In this paper we investigate possible strategies used -by online news corporations in the design of their news headlines. We analyze -the content of 69,907 headlines produced by four major global media -corporations during a minimum of eight consecutive months in 2014. In order to -discover strategies that could be used to attract clicks, we extracted features -from the text of the news headlines related to the sentiment polarity of the -headline. We discovered that the sentiment of the headline is strongly related -to the popularity of the news and also with the dynamics of the posted comments -on that particular news. -" -1662,1503.08155,"Miao Fan, Qiang Zhou and Thomas Fang Zheng","Learning Embedding Representations for Knowledge Inference on Imperfect - and Incomplete Repositories",cs.AI cs.CL," This paper considers the problem of knowledge inference on large-scale -imperfect repositories with incomplete coverage by means of embedding entities -and relations at the first attempt. We propose IIKE (Imperfect and Incomplete -Knowledge Embedding), a probabilistic model which measures the probability of -each belief, i.e. $\langle h,r,t\rangle$, in large-scale knowledge bases such -as NELL and Freebase, and our objective is to learn a better low-dimensional -vector representation for each entity ($h$ and $t$) and relation ($r$) in the -process of minimizing the loss of fitting the corresponding confidence given by -machine learning (NELL) or crowdsouring (Freebase), so that we can use $||{\bf -h} + {\bf r} - {\bf t}||$ to assess the plausibility of a belief when -conducting inference. We use subsets of those inexact knowledge bases to train -our model and test the performances of link prediction and triplet -classification on ground truth beliefs, respectively. The results of extensive -experiments show that IIKE achieves significant improvement compared with the -baseline and state-of-the-art approaches. -" -1663,1503.08167,"Slobodan Beliga, Miran Pobar and Sanda Martin\v{c}i\'c-Ip\v{s}i\'c",Normalization of Non-Standard Words in Croatian Texts,cs.CL," This paper presents text normalization which is an integral part of any -text-to-speech synthesis system. Text normalization is a set of methods with a -task to write non-standard words, like numbers, dates, times, abbreviations, -acronyms and the most common symbols, in their full expanded form are -presented. The whole taxonomy for classification of non-standard words in -Croatian language together with rule-based normalization methods combined with -a lookup dictionary are proposed. Achieved token rate for normalization of -Croatian texts is 95%, where 80% of expanded words are in correct morphological -form. 
-" -1664,1503.08542,"Junyu Xuan, Jie Lu, Guangquan Zhang, Richard Yi Da Xu, Xiangfeng Luo",Nonparametric Relational Topic Models through Dependent Gamma Processes,stat.ML cs.CL cs.IR cs.LG," Traditional Relational Topic Models provide a way to discover the hidden -topics from a document network. Many theoretical and practical tasks, such as -dimensional reduction, document clustering, link prediction, benefit from this -revealed knowledge. However, existing relational topic models are based on an -assumption that the number of hidden topics is known in advance, and this is -impractical in many real-world applications. Therefore, in order to relax this -assumption, we propose a nonparametric relational topic model in this paper. -Instead of using fixed-dimensional probability distributions in its generative -model, we use stochastic processes. Specifically, a gamma process is assigned -to each document, which represents the topic interest of this document. -Although this method provides an elegant solution, it brings additional -challenges when mathematically modeling the inherent network structure of -typical document network, i.e., two spatially closer documents tend to have -more similar topics. Furthermore, we require that the topics are shared by all -the documents. In order to resolve these challenges, we use a subsampling -strategy to assign each document a different gamma process from the global -gamma process, and the subsampling probabilities of documents are assigned with -a Markov Random Field constraint that inherits the document network structure. -Through the designed posterior inference algorithm, we can discover the hidden -topics and its number simultaneously. Experimental results on both synthetic -and real-world network datasets demonstrate the capabilities of learning the -hidden topics and, more importantly, the number of topics. -" -1665,1503.08581,"Ioannis Partalas, Aris Kosmopoulos, Nicolas Baskiotis, Thierry - Artieres, George Paliouras, Eric Gaussier, Ion Androutsopoulos, Massih-Reza - Amini, Patrick Galinari",LSHTC: A Benchmark for Large-Scale Text Classification,cs.IR cs.CL cs.LG," LSHTC is a series of challenges which aims to assess the performance of -classification systems in large-scale classification in a a large number of -classes (up to hundreds of thousands). This paper describes the dataset that -have been released along the LSHTC series. The paper details the construction -of the datsets and the design of the tracks as well as the evaluation measures -that we implemented and a quick overview of the results. All of these datasets -are available online and runs may still be submitted on the online server of -the challenges. -" -1666,1503.08895,"Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston and Rob Fergus",End-To-End Memory Networks,cs.NE cs.CL," We introduce a neural network with a recurrent attention model over a -possibly large external memory. The architecture is a form of Memory Network -(Weston et al., 2015) but unlike the model in that work, it is trained -end-to-end, and hence requires significantly less supervision during training, -making it more generally applicable in realistic settings. It can also be seen -as an extension of RNNsearch to the case where multiple computational steps -(hops) are performed per output symbol. The flexibility of the model allows us -to apply it to tasks as diverse as (synthetic) question answering and to -language modeling. 
For the former, our approach is competitive with Memory -Networks, but with less supervision. For the latter, on the Penn TreeBank and -Text8 datasets, our approach demonstrates performance comparable to RNNs and -LSTMs. In both cases we show that the key concept of multiple computational -hops yields improved results. -" -1667,1503.09144,"Ant\'onio Lopes and David Martins de Matos and Vera Cabarr\~ao and - Ricardo Ribeiro and Helena Moniz and Isabel Trancoso and Ana Isabel Mata","Towards Using Machine Translation Techniques to Induce Multilingual - Lexica of Discourse Markers",cs.CL," Discourse markers are universal linguistic events subject to language -variation. Although an extensive literature has already reported language-specific -traits of these events, little has been said about their cross-language -behavior and on building an inventory of multilingual lexica of discourse -markers. This work describes new methods and approaches for the description, -classification, and annotation of discourse markers in the specific domain of -the Europarl corpus. The study of discourse markers in the context of -translation is crucial due to the idiomatic nature of these structures. -Multilingual lexica together with the functional analysis of such structures -are useful tools for the hard task of translating discourse markers into -possible equivalents from one language to another. Using Daniel Marcu's -validated discourse markers for English, extracted from the Brown Corpus, our -purpose is to build multilingual lexica of discourse markers for other -languages, based on machine translation techniques. The major assumption in -this study is that the usage of a discourse marker is independent of the -language, i.e., the rhetorical function of a discourse marker in a sentence in -one language is equivalent to the rhetorical function of the same discourse -marker in another language. -" -1668,1504.00325,"Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh - Gupta, Piotr Dollar, C. Lawrence Zitnick",Microsoft COCO Captions: Data Collection and Evaluation Server,cs.CV cs.CL," In this paper we describe the Microsoft COCO Caption dataset and evaluation -server. When completed, the dataset will contain over one and a half million -captions describing over 330,000 images. For the training and validation -images, five independent human generated captions will be provided. To ensure -consistency in evaluation of automatic caption generation algorithms, an -evaluation server is used. The evaluation server receives candidate captions -and scores them using several popular metrics, including BLEU, METEOR, ROUGE -and CIDEr. Instructions for using the evaluation server are provided. -" -1669,1504.00548,"Felix Hill, Kyunghyun Cho, Anna Korhonen and Yoshua Bengio",Learning to Understand Phrases by Embedding the Dictionary,cs.CL," Distributional models that learn rich semantic word representations are a -success story of recent NLP research. However, developing models that learn -useful representations of phrases and sentences has proved far harder. We -propose using the definitions found in everyday dictionaries as a means of -bridging this gap between lexical and phrasal semantics. Neural language -embedding models can be effectively trained to map dictionary definitions -(phrases) to (lexical) representations of the words defined by those -definitions.
We present two applications of these architectures: ""reverse -dictionaries"" that return the name of a concept given a definition or -description, and general-knowledge crossword question answerers. On both tasks, -neural language embedding models trained on definitions from a handful of -freely-available lexical resources perform as well as or better than existing -commercial systems that rely on significant task-specific engineering. The -results highlight the effectiveness of both neural embedding architectures and -definition-based training for developing models that understand phrases and -sentences. -" -1670,1504.00657,"Geoffrey Fairchild (1 and 3), Lalindra De Silva (2), Sara Y. Del Valle - (1), Alberto M. Segre (3) ((1) Los Alamos National Laboratory, Los Alamos, - NM, USA, (2) The University of Utah, Salt Lake City, UT, USA, (3) The - University of Iowa, Iowa City, IA, USA)",Eliciting Disease Data from Wikipedia Articles,cs.IR cs.CL cs.SI q-bio.PE," Traditional disease surveillance systems suffer from several disadvantages, -including reporting lags and antiquated technology, that have caused a movement -towards internet-based disease surveillance systems. Internet systems are -particularly attractive for disease outbreaks because they can provide data in -near real-time and can be verified by individuals around the globe. However, -most existing systems have focused on disease monitoring and do not provide a -data repository for policy makers or researchers. In order to fill this gap, we -analyzed Wikipedia article content. - We demonstrate how a named-entity recognizer can be trained to tag case -counts, death counts, and hospitalization counts in the article narrative that -achieves an F1 score of 0.753. We also show, using the 2014 West African Ebola -virus disease epidemic article as a case study, that there are detailed time -series data that are consistently updated and that closely align with ground truth -data. - We argue that Wikipedia can be used to create the first community-driven -open-source emerging disease detection, monitoring, and repository system. -" -1671,1504.00854,David M. W. Powers,Evaluation Evaluation a Monte Carlo study,cs.AI cs.CL stat.ML," Over the last decade there has been increasing concern about the biases -embodied in traditional evaluation methods for Natural Language -Processing/Learning, particularly methods borrowed from Information Retrieval. -Without knowledge of the Bias and Prevalence of the contingency being tested, -or equivalently the expectation due to chance, the simple conditional -probabilities Recall, Precision and Accuracy are not meaningful as evaluation -measures, either individually or in combinations such as F-factor. The -existence of bias in NLP measures leads to the 'improvement' of systems by -increasing their bias, such as the practice of improving tagging and parsing -scores by using the most common value (e.g. water is always a Noun) rather than -attempting to discover the correct one. The measures Cohen Kappa and Powers -Informedness are discussed as unbiased alternatives to Recall and related to the -psychologically significant measure DeltaP. In this paper we will analyze both -biased and unbiased measures theoretically, characterizing the precise -relationship between all these measures, as well as evaluating the evaluation -measures themselves empirically using a Monte Carlo simulation.
-" -1672,1504.00923,"Fred Richardson, Douglas Reynolds, Najim Dehak",A Unified Deep Neural Network for Speaker and Language Recognition,cs.CL cs.CV cs.LG cs.NE stat.ML," Learned feature representations and sub-phoneme posteriors from Deep Neural -Networks (DNNs) have been used separately to produce significant performance -gains for speaker and language recognition tasks. In this work we show how -these gains are possible using a single DNN for both speaker and language -recognition. The unified DNN approach is shown to yield substantial performance -improvements on the the 2013 Domain Adaptation Challenge speaker recognition -task (55% reduction in EER for the out-of-domain condition) and on the NIST -2011 Language Recognition Evaluation (48% reduction in EER for the 30s test -condition). -" -1673,1504.01106,"Lili Mou, Hao Peng, Ge Li, Yan Xu, Lu Zhang, Zhi Jin",Discriminative Neural Sentence Modeling by Tree-Based Convolution,cs.CL cs.LG cs.NE," This paper proposes a tree-based convolutional neural network (TBCNN) for -discriminative sentence modeling. Our models leverage either constituency trees -or dependency trees of sentences. The tree-based convolution process extracts -sentences' structural features, and these features are aggregated by max -pooling. Such architecture allows short propagation paths between the output -layer and underlying feature detectors, which enables effective structural -feature learning and extraction. We evaluate our models on two tasks: sentiment -analysis and question classification. In both experiments, TBCNN outperforms -previous state-of-the-art results, including existing neural networks and -dedicated feature/rule engineering. We also make efforts to visualize the -tree-based convolution process, shedding light on how our models work. -" -1674,1504.01182,"Nayan Jyoti Kalita, Baharul Islam","Bengali to Assamese Statistical Machine Translation using Moses (Corpus - Based)",cs.CL," Machine dialect interpretation assumes a real part in encouraging man-machine -correspondence and in addition men-men correspondence in Natural Language -Processing (NLP). Machine Translation (MT) alludes to utilizing machine to -change one dialect to an alternate. Statistical Machine Translation is a type -of MT consisting of Language Model (LM), Translation Model (TM) and decoder. In -this paper, Bengali to Assamese Statistical Machine Translation Model has been -created by utilizing Moses. Other translation tools like IRSTLM for Language -Model and GIZA-PP-V1.0.7 for Translation model are utilized within this -framework which is accessible in Linux situations. The purpose of the LM is to -encourage fluent output and the purpose of TM is to encourage similarity -between input and output, the decoder increases the probability of translated -text in target language. A parallel corpus of 17100 sentences in Bengali and -Assamese has been utilized for preparing within this framework. Measurable MT -procedures have not so far been generally investigated for Indian dialects. It -might be intriguing to discover to what degree these models can help the -immense continuous MT deliberations in the nation. -" -1675,1504.01255,Rie Johnson and Tong Zhang,"Semi-supervised Convolutional Neural Networks for Text Categorization - via Region Embedding",stat.ML cs.CL cs.LG," This paper presents a new semi-supervised framework with convolutional neural -networks (CNNs) for text categorization. 
Unlike the previous approaches that -rely on word embeddings, our method learns embeddings of small text regions -from unlabeled data for integration into a supervised CNN. The proposed scheme -for embedding learning is based on the idea of two-view semi-supervised -learning, which is intended to be useful for the task of interest even though -the training is done on unlabeled data. Our models achieve better results than -previous approaches on sentiment classification and topic classification tasks. -" -1676,1504.01383,"Vlad Niculae, Caroline Suen, Justine Zhang, Cristian - Danescu-Niculescu-Mizil, Jure Leskovec","QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting - Patterns",cs.CL cs.SI physics.soc-ph," Given the extremely large pool of events and stories available, media outlets -need to focus on a subset of issues and aspects to convey to their audience. -Outlets are often accused of exhibiting a systematic bias in this selection -process, with different outlets portraying different versions of reality. -However, in the absence of objective measures and empirical evidence, the -direction and extent of systematicity remains widely disputed. - In this paper we propose a framework based on quoting patterns for -quantifying and characterizing the degree to which media outlets exhibit -systematic bias. We apply this framework to a massive dataset of news articles -spanning the six years of Obama's presidency and all of his speeches, and -reveal that a systematic pattern does indeed emerge from the outlet's quoting -behavior. Moreover, we show that this pattern can be successfully exploited in -an unsupervised prediction setting, to determine which new quotes an outlet -will select to broadcast. By encoding bias patterns in a low-rank space we -provide an analysis of the structure of political media coverage. This reveals -a latent media bias space that aligns surprisingly well with political ideology -and outlet type. A linguistic analysis exposes striking differences across -these latent dimensions, showing how the different types of media outlets -portray different realities even when reporting on the same events. For -example, outlets mapped to the mainstream conservative side of the latent space -focus on quotes that portray a presidential persona disproportionately -characterized by negativity. -" -1677,1504.01427,"Sunil Kopparapu, Saurabh Bhatnagar, K. Sahana, Sathyanarayana, - Akhilesh Srivastava, P.V.S. Rao",A Metric to Classify Style of Spoken Speech,cs.CL," The ability to classify spoken speech based on the style of speaking is an -important problem. With the advent of BPOs in recent times, specifically those -that cater to a population other than the local population, it has become -necessary for BPOs to identify people with a certain style of speaking -(American, British, etc.). Today BPOs employ accent analysts to identify people -having the required style of speaking. This process, besides involving human bias, -is becoming increasingly infeasible because of the high attrition rate in -the BPO industry. In this paper, we propose a new metric which robustly and -accurately helps classify spoken speech based on the style of speaking. The -role of the proposed metric is substantiated by using it to classify real -speech data collected from over seventy different people working in a BPO. We -compare the performance of the metric against human experts who independently -carried out the classification process.
Experimental results show that the -system using the novel metric performs better than two -different human experts. -" -1678,1504.01482,"William Chan, Ian Lane",Deep Recurrent Neural Networks for Acoustic Modelling,cs.LG cs.CL cs.NE stat.ML," We present a novel deep Recurrent Neural Network (RNN) model for acoustic -modelling in Automatic Speech Recognition (ASR). We term our contribution a -TC-DNN-BLSTM-DNN model; the model combines a Deep Neural Network (DNN) with -Time Convolution (TC), followed by a Bidirectional Long Short-Term Memory -(BLSTM), and a final DNN. The first DNN acts as a feature processor to our -model, the BLSTM then generates a context from the sequence acoustic signal, -and the final DNN takes the context and models the posterior probabilities of -the acoustic states. We achieve a 3.47 WER on the Wall Street Journal (WSJ) -eval92 task, or more than 8% relative improvement over the baseline DNN models. -" -1679,1504.01483,William Chan and Nan Rosemary Ke and Ian Lane,Transferring Knowledge from a RNN to a DNN,cs.LG cs.CL cs.NE stat.ML," Deep Neural Network (DNN) acoustic models have yielded many state-of-the-art -results in Automatic Speech Recognition (ASR) tasks. More recently, Recurrent -Neural Network (RNN) models have been shown to outperform their DNN counterparts. -However, state-of-the-art DNN and RNN models tend to be impractical to deploy -on embedded systems with limited computational capacity. Traditionally, the -approach for embedded platforms is to either train a small DNN directly, or to -train a small DNN that learns the output distribution of a large DNN. In this -paper, we utilize a state-of-the-art RNN to transfer knowledge to a small DNN. We -use the RNN model to generate soft alignments and minimize the Kullback-Leibler -divergence against the small DNN. The small DNN trained on the soft RNN -alignments achieved a 3.93 WER on the Wall Street Journal (WSJ) eval92 task, -compared to a baseline 4.54 WER, or more than 13% relative improvement. -" -1680,1504.01496,Sunil Kumar Kopparapu,Voice based self help System: User Experience Vs Accuracy,cs.CL," In general, self help systems are being increasingly deployed by service -based industries because they are capable of delivering better customer service, -and increasingly the switch is to voice based self help systems because they -provide a natural interface for a human to interact with a machine. A speech -based self help system ideally needs a speech recognition engine to convert -spoken speech to text and, in addition, a language processing engine to take care -of any misrecognitions by the speech recognition engine. Any off-the-shelf -speech recognition engine is generally a combination of acoustic processing and -speech grammar. While this is the norm, we believe that ideally a speech -recognition application should have, in addition to a speech recognition engine, -a separate language processing engine to give the system better performance. In -this paper, we discuss ways in which the speech recognition engine and the -language processing engine can be combined to give a better user experience. -" -1681,1504.01683,"Miao Fan, Kai Cao, Yifan He and Ralph Grishman",Jointly Embedding Relations and Mentions for Knowledge Population,cs.CL," This paper contributes a joint embedding model for predicting relations -between a pair of entities in the scenario of relation inference. It differs -from most stand-alone approaches which separately operate on either knowledge -bases or free texts.
The proposed model simultaneously learns low-dimensional -vector representations for both triplets in knowledge repositories and the -mentions of relations in free texts, so that we can leverage the evidence from both -resources to make more accurate predictions. We use NELL to evaluate the -performance of our approach, compared with cutting-edge methods. Results of -extensive experiments show that our model achieves significant improvement on -relation extraction. -" -1682,1504.01684,"Miao Fan, Qiang Zhou, Thomas Fang Zheng and Ralph Grishman",Large Margin Nearest Neighbor Embedding for Knowledge Representation,cs.AI cs.CL," The traditional way of storing facts in triplets ({\it head\_entity, relation, -tail\_entity}), abbreviated as ({\it h, r, t}), makes the knowledge intuitively -displayed and easily acquired by mankind, but hardly computed or even reasoned -about by AI machines. Inspired by the success in applying {\it Distributed -Representations} to AI-related fields, recent studies expect to represent each -entity and relation with a unique low-dimensional embedding, which is different -from the symbolic and atomic framework of displaying knowledge in triplets. In -this way, the knowledge computing and reasoning can be essentially facilitated -by means of a simple {\it vector calculation}, i.e. ${\bf h} + {\bf r} \approx -{\bf t}$. We thus contribute an effective model to learn better embeddings -satisfying the formula by pulling the positive tail entities ${\bf t^{+}}$ together -and close to {\bf h} + {\bf r} ({\it Nearest Neighbor}), and -simultaneously pushing the negatives ${\bf t^{-}}$ away from the positives -${\bf t^{+}}$ via keeping a {\it Large Margin}. We also design a corresponding -learning algorithm to efficiently find the optimal solution based on {\it -Stochastic Gradient Descent} in an iterative fashion. Quantitative experiments -illustrate that our approach can achieve state-of-the-art performance, -compared with several latest methods on some benchmark datasets for two -classical applications, i.e. {\it Link prediction} and {\it Triplet -classification}. Moreover, we analyze the parameter complexities among all the -evaluated models, and analytical results indicate that our model needs fewer -computational resources while outperforming the other methods. -" -1683,1504.02059,Hayat Alrefaie and Allan Ramsay,Supporting Language Learners with the Meanings Of Closed Class Items,cs.AI cs.CL," The process of language learning involves the mastery of countless tasks: -making the constituent sounds of the language being learned, learning the -grammatical patterns, and acquiring the requisite vocabulary for reception and -production. While a plethora of computational tools exist to facilitate the -first and second of these tasks, a number of challenges arise with respect to -enabling the third. This paper describes a tool that has been designed to -support language learners with the challenge of understanding the use of -closed-class lexical items. The process of learning that the Arabic for office is -(mktb) is relatively simple and should be possible by means of simple -repetition of the word.
The current paper describes -a mechanism for the delivery of diagnostic information regarding specific -lexical examples, with the aim of clearly demonstrating why a particular -translation of a given closed-class item may be appropriate in certain -situations but not others, thereby helping learners to understand and use the -term correctly. -" -1684,1504.02148,Peter K. Bol and Chao-Lin Liu and Hongsu Wang,"Mining and discovering biographical information in Difangzhi with a - language-model-based approach",cs.CL cs.CY cs.DL," We present results of expanding the contents of the China Biographical -Database by text mining historical local gazetteers, difangzhi. The goal of the -database is to see how people are connected together, through kinship, social -connections, and the places and offices in which they served. The gazetteers -are the single most important collection of names and offices covering the Song -through Qing periods. Although we begin with local officials we shall -eventually include lists of local examination candidates, people from the -locality who served in government, and notable local figures with biographies. -The more data we collect the more connections emerge. The value of doing -systematic text mining work is that we can identify relevant connections that -are either directly informative or can become useful without deep historical -research. Academia Sinica is developing a name database for officials in the -central governments of the Ming and Qing dynasties. -" -1685,1504.02150,Wei-Jie Huang and Chao-Lin Liu,"Exploring Lexical, Syntactic, and Semantic Features for Chinese Textual - Entailment in NTCIR RITE Evaluation Tasks",cs.CL cs.AI cs.DL," We computed linguistic information at the lexical, syntactic, and semantic -levels for Recognizing Inference in Text (RITE) tasks for both traditional and -simplified Chinese in NTCIR-9 and NTCIR-10. Techniques for syntactic parsing, -named-entity recognition, and near synonym recognition were employed, and -features like counts of common words, statement lengths, negation words, and -antonyms were considered to judge the entailment relationships of two -statements, while we explored both heuristics-based functions and -machine-learning approaches. The reported systems showed robustness by -simultaneously achieving second positions in the binary-classification subtasks -for both simplified and traditional Chinese in NTCIR-10 RITE-2. We conducted -more experiments with the test data of NTCIR-9 RITE, with good results. We also -extended our work to search for better configurations of our classifiers and -investigated contributions of individual features. This extended work showed -interesting results and should encourage further discussion. -" -1686,1504.02162,"Diego R. Amancio, Filipi N. Silva and Luciano da F. Costa","Concentric network symmetry grasps authors' styles in word adjacency - networks",cs.CL," Several characteristics of written texts have been inferred from statistical -analysis derived from networked models. Even though many network measurements -have been adapted to study textual properties at several levels of complexity, -some textual aspects have been disregarded. In this paper, we study the -symmetry of word adjacency networks, a well-known representation of text as a -graph. A statistical analysis of the symmetry distribution performed in several -novels showed that most of the words do not display symmetric patterns of -connectivity. 
More specifically, the merged symmetry displayed a distribution -similar to the ubiquitous power-law distribution. Our experiments also revealed -that the studied metrics do not correlate with other traditional network -measurements, such as the degree or betweenness centrality. The effectiveness -of the symmetry measurements was verified in the authorship attribution task. -Interestingly, we found that specific authors prefer particular types of -symmetric motifs. As a consequence, the authorship of books could be accurately -identified in 82.5% of the cases, in a dataset comprising books written by 8 -authors. Because the proposed measurements for text analysis are complementary -to the traditional approach, they can be used to improve the characterization -of text networks, which might be useful for related applications, such as those -relying on the identification of topical words and information retrieval. -" -1687,1504.02490,Aaron Jaech and Mari Ostendorf,"Leveraging Twitter for Low-Resource Conversational Speech Language - Modeling",cs.CL," In applications involving conversational speech, data sparsity is a limiting -factor in building a better language model. We propose a simple, -language-independent method to quickly harvest large amounts of data from -Twitter to supplement a smaller training set that is more closely matched to -the domain. The techniques lead to a significant reduction in perplexity on -four low-resource languages even though the presence on Twitter of these -languages is relatively small. We also find that the Twitter text is more -useful for learning word classes than the in-domain text and that use of these -word classes leads to further reductions in perplexity. Additionally, we -introduce a method of using social and textual information to prioritize the -download queue during the Twitter crawling. This maximizes the amount of useful -data that can be collected, impacting both perplexity and vocabulary coverage. -" -1688,1504.03068,Ahmad Kamal,Review Mining for Feature Based Opinion Summarization and Visualization,cs.IR cs.CL," The application and usage of opinion mining, especially for business -intelligence, product recommendation, targeted marketing, etc., have attracted -much research attention around the globe. Various research efforts attempted -to mine opinions from customer reviews at different levels of granularity, -including word-, sentence-, and document-level. However, development of a fully -automatic opinion mining and sentiment analysis system is still elusive. Though -the development of opinion mining and sentiment analysis systems is gaining -momentum, most of them attempt to perform document-level sentiment analysis, -classifying a review document as positive, negative, or neutral. Such -document-level opinion mining approaches fail to provide insight about users' -sentiment on individual features of a product or service. Therefore, it would -be of great help to both customers and manufacturers if the reviews could be -processed at a finer-grained level and presented in a summarized form through -some visual means, highlighting individual features of a product and users' -sentiment expressed over them. In this paper, the design of a unified opinion -mining and sentiment analysis framework is presented at the intersection of -both machine learning and natural language processing approaches. 
In addition, the design of a novel feature-level review summarization -scheme is proposed to visualize mined features, opinions, and their polarity -values in a comprehensible way. -" -1689,1504.03425,"Iftekhar Naim, M. Iftekhar Tanveer, Daniel Gildea, Mohammed (Ehsan) - Hoque",Automated Analysis and Prediction of Job Interview Performance,cs.HC cs.AI cs.CL," We present a computational framework for automatically quantifying verbal and -nonverbal behaviors in the context of job interviews. The proposed framework is -trained by analyzing the videos of 138 interview sessions with 69 -internship-seeking undergraduates at the Massachusetts Institute of Technology -(MIT). Our automated analysis includes facial expressions (e.g., smiles, head -gestures, facial tracking points), language (e.g., word counts, topic -modeling), and prosodic information (e.g., pitch, intonation, and pauses) of -the interviewees. The ground truth labels are derived by taking a weighted -average over the ratings of 9 independent judges. Our framework can -automatically predict the ratings for interview traits such as excitement, -friendliness, and engagement with correlation coefficients of 0.75 or higher, -and can quantify the relative importance of prosody, language, and facial -expressions. By analyzing the relative feature weights learned by the -regression models, our framework recommends speaking more fluently, using fewer -filler words, speaking as ""we"" (vs. ""I""), using more unique words, and smiling -more. We also find that the students who were rated highly while answering the -first interview question were also rated highly overall (i.e., first impression -matters). Finally, our MIT Interview dataset will be made available to other -researchers to further validate and expand our findings. -" -1690,1504.03608,"Michaela Koscov\'a, J\'an Macutek, Emmerich Kelih","A data-based classification of Slavic languages: Indices of qualitative - variation applied to grapheme frequencies",stat.AP cs.CL," Ord's graph is a simple graphical method for displaying frequency -distributions of data or theoretical distributions in the two-dimensional -plane. Its coordinates are proportions of the first three moments, either -empirical or theoretical ones. A modification of Ord's graph based on -proportions of indices of qualitative variation is presented. Such a -modification makes the graph applicable also to data of categorical character. -In addition, the indices are normalized with values between 0 and 1, which -enables comparing data files divided into different numbers of categories. Both -the original and the new graph are used to display grapheme frequencies in -eleven Slavic languages. As the original graph requires an assignment of -numbers to the categories, graphemes were ordered decreasingly according to -their frequencies. Data were taken from parallel corpora, i.e., we work with -grapheme frequencies from a Russian novel and its translations into ten other -Slavic languages. Then, cluster analysis is applied to the graph coordinates. -While the original graph yields results which are not linguistically -interpretable, the modification reveals meaningful relations among the -languages. -" -1691,1504.03659,Azad Dehghan,Temporal ordering of clinical events,cs.CL cs.AI," This report describes a minimalistic set of methods engineered to anchor -clinical events onto a temporal space. 
Specifically, we describe methods to -extract clinical events (e.g., Problems, Treatments and Tests), temporal -expressions (i.e., time, date, duration, and frequency), and temporal links -(e.g., Before, After, Overlap) between events and temporal entities. These -methods are developed and validated using high quality datasets. -" -1692,1504.04317,"Corinne L. Jones, Robert A. Bridges, Kelly Huffer, John Goodall",Towards a relation extraction framework for cyber-security concepts,cs.IR cs.CL cs.CR," In order to assist security analysts in obtaining information pertaining to -their network, such as novel vulnerabilities, exploits, or patches, information -retrieval methods tailored to the security domain are needed. As labeled text -data is scarce and expensive, we follow developments in semi-supervised Natural -Language Processing and implement a bootstrapping algorithm for extracting -security entities and their relationships from text. The algorithm requires -little input data, specifically, a few relations or patterns (heuristics for -identifying relations), and incorporates an active learning component which -queries the user on the most important decisions to prevent drifting from the -desired relations. Preliminary testing on a small corpus shows promising -results, obtaining a precision of 0.82. -" -1693,1504.04666,Phong Le and Willem Zuidema,Unsupervised Dependency Parsing: Let's Use Supervised Parsers,cs.CL cs.LG," We present a self-training approach to unsupervised dependency parsing that -reuses existing supervised and unsupervised parsing algorithms. Our approach, -called `iterated reranking' (IR), starts with dependency trees generated by an -unsupervised parser, and iteratively improves these trees using the richer -probability models used in supervised parsing that are in turn trained on these -trees. Our system achieves accuracy 1.8% higher than the state-of-the-art -parser of Spitkovsky et al. (2013) on the WSJ corpus. -" -1694,1504.04716,Vishal Shukla,"Gap Analysis of Natural Language Processing Systems with respect to - Linguistic Modality",cs.CL cs.AI," Modality is one of the important components of grammar in linguistics. It -lets the speaker express an attitude towards, or give an assessment of the -potentiality of, a state of affairs. It carries different senses and is thus -perceived differently depending on the context. This paper presents an account -showing the gap in the functionality of current state-of-the-art Natural -Language Processing (NLP) systems. The contextual nature of linguistic modality -is studied. In this paper, the works and logical approaches employed by Natural -Language Processing systems dealing with modality are reviewed. It sees human -cognition and intelligence as a multi-layered approach that can be implemented -by intelligent systems for learning. Lastly, the current flow of research -within this field is discussed, along with an outlook on future directions. -" -1695,1504.04751,"Dilek K\""u\c{c}\""uk and Meltem Turhan Y\""ondem",A Knowledge-poor Pronoun Resolution System for Turkish,cs.CL," A pronoun resolution system which requires limited syntactic knowledge to -identify the antecedents of personal and reflexive pronouns in Turkish is -presented. As in its counterparts for languages like English, Spanish and -French, the core of the system is the constraints and preferences determined -empirically. In the evaluation phase, it performed considerably better than the -baseline algorithm used for comparison. 
The system is significant as the first fully specified knowledge-poor -computational framework for pronoun resolution in Turkish, a language whose -structural properties differ from those of the languages for which -knowledge-poor systems had previously been developed. -" -1696,1504.04770,"Maxim Rabinovich, C\'edric Archambeau",Online Inference for Relation Extraction with a Reduced Feature Set,cs.CL cs.LG," Access to web-scale corpora is gradually bringing robust automatic knowledge -base creation and extension within reach. To exploit these large -unannotated---and extremely difficult to annotate---corpora, unsupervised -machine learning methods are required. Probabilistic models of text have -recently found some success as such a tool, but scalability remains an obstacle -in their application, with standard approaches relying on sampling schemes that -are known to be difficult to scale. In this report, we therefore present an -empirical assessment of the sublinear time sparse stochastic variational -inference (SSVI) scheme applied to RelLDA. We demonstrate that online inference -leads to relatively strong qualitative results but also identify some of its -pathologies---and those of the model---which will need to be overcome if SSVI -is to be used for large-scale relation extraction. -" -1697,1504.04802,Ryuta Arisaka,"Gradual Classical Logic for Attributed Objects - Extended in - Re-Presentation",cs.AI cs.CL cs.LO," Our understanding of things is conceptual. When we state that we reason about -objects, it is in fact not the objects but the concepts referring to them that -we manipulate. Now, so long as we acknowledge infinitely extending notions such -as space, time, size, colour, etc. - in short, any reasonable quality - to -which an object is subjected, it becomes infeasible to affirm atomicity in the -concept referring to the object. However, formal/symbolic logics typically -presume atomic entities upon which other expressions are built. Can we reflect -our intuition about the concept onto formal/symbolic logics at all? I maintain -that we can, but the usual perspective on atomicity needs to be inspected. In -this work, I present gradual logic, which materialises the observation that we -cannot tell apart whether a so-regarded atomic entity is atomic or is just -atomic enough not to be considered non-atomic. The motivation is to capture -certain phenomena that naturally occur around concepts with attributes, -including presupposition and contraries. I present the logical particulars of -the logic, which is then mapped onto a formal semantics. Two linguistically -interesting semantics will be considered. Decidability is shown. -" -1698,1504.04884,"R. Ferrer-i-Cancho, C. Bentz and C. Seguin",Compression and the origins of Zipf's law of abbreviation,cs.IT cs.CL cs.SI math.IT physics.data-an," Languages across the world exhibit Zipf's law of abbreviation, namely that -more frequent words tend to be shorter. The generalized version of the law - an -inverse relationship between the frequency of a unit and its magnitude - holds -also for the behaviours of other species and the genetic code. The apparent -universality of this pattern in human language and its ubiquity in other -domains calls for a theoretical understanding of its origins. To this end, we -generalize the information-theoretic concept of mean code length as a mean -energetic cost function over the probability and the magnitude of the types of -the repertoire. 
We show that the minimization of that cost function and a -negative correlation between probability and the magnitude of types are -intimately related. -" -1699,1504.05070,"Han Zhao, Zhengdong Lu, Pascal Poupart",Self-Adaptive Hierarchical Sentence Model,cs.CL cs.LG cs.NE," The ability to accurately model a sentence at varying stages (e.g., -word-phrase-sentence) plays a central role in natural language processing. As -an effort towards this goal we propose a self-adaptive hierarchical sentence -model (AdaSent). AdaSent effectively forms a hierarchy of representations from -words to phrases and then to sentences through recursive gated local -composition of adjacent segments. We design a competitive mechanism (through -gating networks) to allow the representations of the same sentence to be -engaged in a particular learning task (e.g., classification), thereby -effectively mitigating the gradient vanishing problem persistent in other -recursive models. Both qualitative and quantitative analysis shows that AdaSent -can automatically form and select the representations suitable for the task at -hand during training, yielding superior classification performance over -competitor models on 5 benchmark data sets. -" -1700,1504.05319,"Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider - and Timothy Baldwin","Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: - The Impact of Word Representation on Sequence Labelling Tasks",cs.CL," Word embeddings -- distributed word representations that can be learned from -unlabelled data -- have been shown to have high utility in many natural -language processing applications. In this paper, we perform an extrinsic -evaluation of five popular word embedding methods in the context of four -sequence labelling tasks: POS-tagging, syntactic chunking, NER and MWE -identification. A particular focus of the paper is analysing the effects of -task-based updating of word representations. We show that when using word -embeddings as features, as few as several hundred training instances are -sufficient to achieve competitive results, and that word embeddings lead to -improvements for OOV words and out-of-domain data. Perhaps more surprisingly, -our results indicate there is little difference between the different word -embedding methods, and that simple Brown clusters are often competitive with -word embeddings across all tasks we consider. -" -1701,1504.05929,Bishan Yang and Claire Cardie and Peter Frazier,"A Hierarchical Distance-dependent Bayesian Model for Event Coreference - Resolution",cs.CL stat.ML," We present a novel hierarchical distance-dependent Bayesian model for event -coreference resolution. While existing generative models for event coreference -resolution are completely unsupervised, our model allows for the incorporation -of pairwise distances between event mentions -- information that is widely used -in supervised coreference models to guide the generative clustering process -for better event clustering both within and across documents. We model the -distances between event mentions using a feature-rich learnable distance -function and encode them as Bayesian priors for nonparametric clustering. -Experiments on the ECB+ corpus show that our model outperforms state-of-the-art -methods for both within- and cross-document event coreference resolution. 
-" -1702,1504.06063,"Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li",Multimodal Convolutional Neural Networks for Matching Image and Sentence,cs.CV cs.CL cs.NE," In this paper, we propose multimodal convolutional neural networks (m-CNNs) -for matching image and sentence. Our m-CNN provides an end-to-end framework -with convolutional architectures to exploit image representation, word -composition, and the matching relations between the two modalities. More -specifically, it consists of one image CNN encoding the image content, and one -matching CNN learning the joint representation of image and sentence. The -matching CNN composes words into different semantic fragments and learns the -inter-modal relations between image and the composed fragments at different -levels, thus fully exploiting the matching relations between image and -sentence. Experimental results on benchmark databases of bidirectional image -and sentence retrieval demonstrate that the proposed m-CNNs can effectively -capture the information necessary for image and sentence matching. -Specifically, our proposed m-CNNs for bidirectional image and sentence -retrieval on Flickr30K and Microsoft COCO databases achieve state-of-the-art -performance. -" -1703,1504.06077,"Nicolas Turenne, Mathieu Andro, Roselyne Corbi\`ere, Tien T. Phan","Open Data Platform for Knowledge Access in Plant Health Domain : VESPA - Mining",cs.IR cs.CL," Important data are locked in ancient literature. It would be uneconomic to -produce these data again today, or to extract them without the help of text -mining technologies. Vespa is a text mining project whose aim is to extract -data on pest and crop interactions, to model and predict attacks on crops, and -to reduce the use of pesticides. A few previous attempts have proposed -agricultural information access. Another original aspect of our work is to -parse documents in a way that depends on the document architecture. -" -1704,1504.06078,"Nicolas Turenne, Tien Phan","x.ent: R Package for Entities and Relations Extraction based on - Unsupervised Learning and Document Structure",cs.CL cs.AI," Relation extraction with high precision is still a challenge when processing -full text databases. We propose an approach based on cooccurrence analysis in -each document, in which we use document organization to improve the accuracy of -relation extraction. This approach is implemented in an R package called -\emph{x.ent}. Another facet of the work relies on the use of extracted -relations in a querying system for expert end-users. Two datasets have been -used. One of them is of interest to specialists in plant health epidemiology. -For this dataset, usage is dedicated to plant-disease exploration through -agricultural news. An open-data platform exploits exports from \emph{x.ent} and -is publicly available. -" -1705,1504.06080,Nicolas Turenne,"svcR: An R Package for Support Vector Clustering improved with Geometric - Hashing applied to Lexical Pattern Discovery",cs.LG cs.CL," We present a new R package which takes a numerical matrix as data input, and -computes clusters using a support vector clustering method (SVC). We have -implemented an original 2D-grid labeling approach to speed up cluster -extraction. In this sense, SVC can be seen as an efficient cluster extraction -method if clusters are separable in a 2-D map. 
Secondly, we show that this SVC -approach, using a Jaccard-Radial basis kernel, can classify a set of terms into -ontological classes well enough, and can help to define regular expression -rules for information extraction in documents; our case study concerns a set of -terms and documents about developmental and molecular biology. -" -1706,1504.06329,Michael Bloodgood and John Grothendieck,Analysis of Stopping Active Learning based on Stabilizing Predictions,cs.LG cs.CL stat.ML," Within the natural language processing (NLP) community, active learning has -been widely investigated and applied in order to alleviate the annotation -bottleneck faced by developers of new NLP systems and technologies. This paper -presents the first theoretical analysis of stopping active learning based on -stabilizing predictions (SP). The analysis has revealed three elements that are -central to the success of the SP method: (1) bounds on Cohen's Kappa agreement -between successively trained models impose bounds on differences in F-measure -performance of the models; (2) since the stop set does not have to be labeled, -it can be made large in practice, helping to guarantee that the results -transfer to previously unseen streams of examples at test/application time; and -(3) good (low variance) sample estimates of Kappa between successive models can -be obtained. Proofs of relationships between the level of Kappa agreement and -the difference in performance between consecutive models are presented. -Specifically, if the Kappa agreement between two models exceeds a threshold T -(where $T>0$), then the difference in F-measure performance between those -models is bounded above by $\frac{4(1-T)}{T}$ in all cases. If the precision of -the positive conjunction of the models is assumed to be $p$, then the bound can -be tightened to $\frac{4(1-T)}{(p+1)T}$. -" -1707,1504.06391,Eben M. Haber,"On the Stability of Online Language Features: How Much Text do you Need - to know a Person?",cs.CL," In recent years, numerous studies have inferred personality and other traits -from people's online writing. While these studies are encouraging, more -information is needed in order to use these techniques with confidence. How do -linguistic features vary across different online media, and how much text is -required to have a representative sample for a person? In this paper, we -examine several large sets of online, user-generated text, drawn from Twitter, -email, blogs, and online discussion forums. We examine and compare -population-wide results for the linguistic measure LIWC, and the inferred -traits of Big5 Personality and Basic Human Values. We also empirically measure -the stability of these traits across different-sized samples for each -individual. Our results highlight the importance of tuning models to each -online medium, and include guidelines for the minimum amount of text required -for a representative result. -" -1708,1504.06580,"Cicero Nogueira dos Santos, Bing Xiang, Bowen Zhou",Classifying Relations by Ranking with Convolutional Neural Networks,cs.CL cs.LG cs.NE," Relation classification is an important semantic processing task for which -state-of-the-art systems still rely on costly handcrafted features. In this -work we tackle the relation classification task using a convolutional neural -network that performs classification by ranking (CR-CNN). We propose a new -pairwise ranking loss function that makes it easy to reduce the impact of -artificial classes. 
We perform experiments using the SemEval-2010 Task 8 dataset, -which is designed for the task of classifying the relationship between two -nominals marked in a sentence. Using CR-CNN, we outperform the state-of-the-art -for this dataset and achieve an F1 of 84.1 without using any costly handcrafted -features. Additionally, our experimental results show that: (1) our approach is -more effective than a CNN followed by a softmax classifier; (2) omitting the -representation of the artificial class Other improves both precision and -recall; and (3) using only word embeddings as input features is enough to -achieve state-of-the-art results if we consider only the text between the two -target nominals. -" -1709,1504.06650,Arvind Neelakantan and Michael Collins,"Learning Dictionaries for Named Entity Recognition using Minimal - Supervision",cs.CL stat.ML," This paper describes an approach for automatic construction of dictionaries -for Named Entity Recognition (NER) using large amounts of unlabeled data and a -few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower -dimensional embeddings (representations) for candidate phrases and classify -these phrases using a small number of labeled examples. Our method achieves -16.5% and 11.3% F-1 score improvements over co-training on disease and virus -NER, respectively. We also show that adding candidate phrase embeddings as -features in a sequence tagger gives better performance compared to using word -embeddings. -" -1710,1504.06654,"Arvind Neelakantan, Jeevan Shankar, Alexandre Passos and Andrew - McCallum","Efficient Non-parametric Estimation of Multiple Embeddings per Word in - Vector Space",cs.CL stat.ML," There is rising interest in vector-space word embeddings and their use in -NLP, especially given recent methods for their fast estimation at very large -scale. Nearly all this work, however, assumes a single vector per word type, -ignoring polysemy and thus jeopardizing their usefulness for downstream tasks. -We present an extension to the Skip-gram model that efficiently learns multiple -embeddings per word type. It differs from recent related work by jointly -performing word sense discrimination and embedding learning, by -non-parametrically estimating the number of senses per word type, and by its -efficiency and scalability. We present new state-of-the-art results in the word -similarity in context task and demonstrate its scalability by training with one -machine on a corpus of nearly 1 billion tokens in less than 6 hours. -" -1711,1504.06658,Arvind Neelakantan and Ming-Wei Chang,"Inferring Missing Entity Type Instances for Knowledge Base Completion: - New Dataset and Methods",cs.CL stat.ML," Most previous work in knowledge base (KB) completion has focused on the -problem of relation extraction. In this work, we focus on the task of inferring -missing entity type instances in a KB, a fundamental task for KB completion -that has received little attention. Due to the novelty of this task, we -construct a large-scale dataset and design an automatic evaluation methodology. -Our knowledge base completion method uses information within the existing KB -and external information from Wikipedia. We show that individual methods -trained with a global objective that considers unobserved cells from both the -entity and the type side give consistently higher quality predictions compared -to baseline methods. 
We also perform manual evaluation on a small subset of the -data to verify the effectiveness of our knowledge base completion methods and -the correctness of our proposed automatic evaluation method. -" -1712,1504.06662,"Arvind Neelakantan, Benjamin Roth and Andrew McCallum",Compositional Vector Space Models for Knowledge Base Completion,cs.CL stat.ML," Knowledge base (KB) completion adds new facts to a KB by making inferences -from existing facts, for example by inferring with high likelihood -nationality(X,Y) from bornIn(X,Y). Most previous methods infer simple one-hop -relational synonyms like this, or use as evidence a multi-hop relational path -treated as an atomic feature, like bornIn(X,Z) -> containedIn(Z,Y). This paper -presents an approach that reasons about conjunctions of multi-hop relations -non-atomically, composing the implications of a path using a recursive neural -network (RNN) that takes as inputs vector embeddings of the binary relation in -the path. Not only does this allow us to generalize to paths unseen at training -time, but also, with a single high-capacity RNN, to predict new relation types -not seen when the compositional model was trained (zero-shot learning). We -assemble a new dataset of over 52M relational triples, and show that our method -improves over a traditional classifier by 11%, and a method leveraging -pre-trained embeddings by 7%. -" -1713,1504.06665,"Michael Pust, Ulf Hermjakob, Kevin Knight, Daniel Marcu, Jonathan May","Using Syntax-Based Machine Translation to Parse English into Abstract - Meaning Representation",cs.CL cs.AI," We present a parser for Abstract Meaning Representation (AMR). We treat -English-to-AMR conversion within the framework of string-to-tree, syntax-based -machine translation (SBMT). To make this work, we transform the AMR structure -into a form suitable for the mechanics of SBMT and useful for modeling. We -introduce an AMR-specific language model and add data and features drawn from -semantic resources. Our resulting AMR parser improves upon state-of-the-art -results by 7 Smatch points. -" -1714,1504.06692,"Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille","Learning like a Child: Fast Novel Visual Concept Learning from Sentence - Descriptions of Images",cs.CV cs.CL cs.LG," In this paper, we address the task of learning novel visual concepts, and -their interactions with other concepts, from a few images with sentence -descriptions. Using linguistic context and visual features, our method is able -to efficiently hypothesize the semantic meaning of new words and add them to -its word dictionary so that they can be used to describe images which contain -these novel concepts. Our method has an image captioning module based on m-RNN -with several improvements. In particular, we propose a transposed weight -sharing scheme, which not only improves performance on image captioning, but -also makes the model more suitable for the novel concept learning task. We -propose methods to prevent overfitting the new concepts. In addition, three -novel concept datasets are constructed for this new task. In the experiments, -we show that our method effectively learns novel visual concepts from a few -examples without disturbing the previously learned concepts. 
The project page -is http://www.stat.ucla.edu/~junhua.mao/projects/child_learning.html -" -1715,1504.06936,"Alejandro Metke-Jimenez, Sarvnaz Karimi","Concept Extraction to Identify Adverse Drug Reactions in Medical Forums: - A Comparison of Algorithms",cs.AI cs.CL cs.IR," Social media is becoming an increasingly important source of information to -complement traditional pharmacovigilance methods. In order to identify signals -of potential adverse drug reactions, it is necessary to first identify medical -concepts in the social media text. Most of the existing studies use -dictionary-based methods which are not evaluated independently from the overall -signal detection task. - We compare different approaches to automatically identify and normalise -medical concepts in consumer reviews in medical forums. Specifically, we -implement several dictionary-based methods popular in the relevant literature, -as well as a method we suggest based on a state-of-the-art machine learning -method for entity recognition. MetaMap, a popular biomedical concept extraction -tool, is used as a baseline. Our evaluations were performed in a controlled -setting on a common corpus which is a collection of medical forum posts -annotated with concepts and linked to controlled vocabularies such as MedDRA -and SNOMED CT. - To our knowledge, our study is the first to systematically examine the effect -of popular concept extraction methods in the area of signal detection for -adverse reactions. We show that the choice of algorithm or controlled -vocabulary has a significant impact on concept extraction, which will impact -the overall signal detection process. We also show that our proposed machine -learning approach significantly outperforms all the other methods in -identification of both adverse reactions and drugs, even when trained with a -relatively small set of annotated text. -" -1716,1504.07071,"Daniel Hienert, Dennis Wegener, Siegfried Schomisch",Exploring semantically-related concepts from Wikipedia: the case of SeRE,cs.CL cs.IR," In this paper we present our web application SeRE designed to explore -semantically related concepts. Wikipedia and DBpedia are rich data sources to -extract related entities for a given topic, like in- and out-links, broader and -narrower terms, categorisation information etc. We use the Wikipedia full text -body to compute the semantic relatedness for extracted terms, which results in -a list of entities that are most relevant for a topic. For any given query, the -user interface of SeRE visualizes these related concepts, ordered by semantic -relatedness; with snippets from Wikipedia articles that explain the connection -between those two entities. In a user study we examine how SeRE can be used to -find important entities and their relationships for a given topic and to answer -the question of how the classification system can be used for filtering. -" -1717,1504.07225,"Sarath Chandar, Mitesh M. Khapra, Hugo Larochelle, Balaraman Ravindran",Correlational Neural Networks,cs.CL cs.LG cs.NE stat.ML," Common Representation Learning (CRL), wherein different descriptions (or -views) of the data are embedded in a common subspace, is receiving a lot of -attention recently. Two popular paradigms here are Canonical Correlation -Analysis (CCA) based approaches and Autoencoder (AE) based approaches. CCA -based approaches learn a joint representation by maximizing correlation of the -views when projected to the common subspace. 
AE based methods learn a common -representation by minimizing the error of reconstructing the two views. Each of -these approaches has its own advantages and disadvantages. For example, while -CCA based approaches outperform AE based approaches for the task of transfer -learning, they are not as scalable as the latter. In this work we propose an AE -based approach called Correlational Neural Network (CorrNet), that explicitly -maximizes correlation among the views when projected to the common subspace. -Through a series of experiments, we demonstrate that the proposed CorrNet is -better than the above mentioned approaches with respect to its ability to learn -correlated common representations. Further, we employ CorrNet for several cross -language tasks and show that the representations learned using CorrNet perform -better than the ones learned using other state of the art approaches. -" -1718,1504.07295,Matt Taddy,"Document Classification by Inversion of Distributed Language - Representations",cs.CL cs.IR stat.AP," There have been many recent advances in the structure and measurement of -distributed language models: those that map from words to a vector-space that -is rich in information about word choice and composition. This vector-space is -the distributed language representation. The goal of this note is to point out -that any distributed representation can be turned into a classifier through -inversion via Bayes rule. The approach is simple and modular, in that it will -work with any language representation whose training can be formulated as -optimizing a probability model. In our application to 2 million sentences from -Yelp reviews, we also find that it performs as well as or better than complex -purpose-built algorithms. -" -1719,1504.07324,"Piji Li, Lidong Bing, Wai Lam, Hang Li and Yi Liao",Reader-Aware Multi-Document Summarization via Sparse Coding,cs.CL cs.AI," We propose a new MDS paradigm called reader-aware multi-document -summarization (RA-MDS). Specifically, a set of reader comments associated with -the news reports are also collected. The generated summaries from the reports -for the event should be salient according to not only the reports but also the -reader comments. To tackle this RA-MDS problem, we propose a -sparse-coding-based method that is able to calculate the salience of the text -units by jointly considering news reports and reader comments. Another -reader-aware characteristic of our framework is to improve linguistic quality -via entity rewriting. The rewriting consideration is jointly assessed together -with other summarization requirements under a unified optimization model. To -support the generation of compressive summaries via optimization, we explore a -finer syntactic unit, namely, noun/verb phrase. In this work, we also generate -a data set for conducting RA-MDS. Extensive experiments on this data set and -some classical data sets demonstrate the effectiveness of our proposed -approach. -" -1720,1504.07395,"Thanh-Le Ha, Jan Niehues, Alex Waibel",Lexical Translation Model Using a Deep Neural Network Architecture,cs.CL cs.LG cs.NE," In this paper we combine the advantages of a model using global source -sentence contexts, the Discriminative Word Lexicon, and neural networks. By -using deep neural networks instead of the linear maximum entropy model in the -Discriminative Word Lexicon models, we are able to leverage dependencies -between different source words due to the non-linearity. 
Furthermore, the -models for different target words can share parameters and therefore data -sparsity problems are effectively reduced. - By using this approach in a state-of-the-art translation system, we can -improve the performance by up to 0.5 BLEU points for three different language -pairs on the TED translation task. -" -1721,1504.07459,"Marian-Andrei Rizoiu, Adrien Guille and Julien Velcin","CommentWatcher: An Open Source Web-based platform for analyzing - discussions on web forums",cs.CL cs.SI," We present CommentWatcher, an open source tool aimed at analyzing discussions -on web forums. Constructed as a web platform, CommentWatcher features automatic -mass fetching of user posts from forums on multiple sites, extracting topics, -visualizing the topics as an expression cloud and exploring their temporal -evolution. The underlying social network of users is simultaneously constructed -using the citation relations between users and visualized as a graph structure. -Our platform addresses the issues of the diversity and dynamics of structures -of webpages hosting the forums by implementing a parser architecture that is -independent of the HTML structure of webpages. This allows easy on-the-fly -adding of new websites. Two types of users are targeted: end users who seek to -study the discussed topics and their temporal evolution, and researchers in -need of establishing a forum benchmark dataset and comparing the performances -of analysis tools. -" -1722,1504.07678,Hongzhao Huang and Larry Heck and Heng Ji,"Leveraging Deep Neural Networks and Knowledge Graphs for Entity - Disambiguation",cs.CL," Entity Disambiguation aims to link mentions of ambiguous entities to a -knowledge base (e.g., Wikipedia). Modeling topical coherence is crucial for -this task based on the assumption that information from the same semantic -context tends to belong to the same topic. This paper presents a novel deep -semantic relatedness model (DSRM) based on deep neural networks (DNN) and -semantic knowledge graphs (KGs) to measure entity semantic relatedness for -topical coherence modeling. The DSRM is directly trained on large-scale KGs and -it maps heterogeneous types of knowledge of an entity from KGs to numerical -feature vectors in a latent space such that the distance between two -semantically-related entities is minimized. Compared with the state-of-the-art -relatedness approach proposed by (Milne and Witten, 2008a), the DSRM obtains -19.4% and 24.5% reductions in entity disambiguation errors on two publicly -available datasets, respectively. -" -1723,1504.07843,"Hyejin Youn, Logan Sutton, Eric Smith, Cristopher Moore, Jon F. - Wilkins, Ian Maddieson, William Croft, Tanmoy Bhattacharya",On the universal structure of human lexical semantics,physics.soc-ph cs.CL," How universal is human conceptual structure? The way concepts are organized -in the human brain may reflect distinct features of cultural, historical, and -environmental background in addition to properties universal to human -cognition. Semantics, or meaning expressed through language, provides direct -access to the underlying conceptual structure, but meaning is notoriously -difficult to measure, let alone parameterize. Here we provide an empirical -measure of semantic proximity between concepts using cross-linguistic -dictionaries. 
Across languages carefully selected from a phylogenetically and -geographically stratified sample of genera, translations of words reveal cases -where a particular language uses a single polysemous word to express concepts -represented by distinct words in another. We use the frequency of polysemies -linking two concepts as a measure of their semantic proximity, and represent -the pattern of such linkages by a weighted network. This network is highly -uneven and fragmented: certain concepts are far more prone to polysemy than -others, and there emerge naturally interpretable clusters loosely connected to -each other. Statistical analysis shows such structural properties are -consistent across different language groups, largely independent of geography, -environment, and literacy. It is therefore possible to conclude that the -conceptual structure connecting the basic vocabulary studied is primarily due -to universal features of human cognition and language use. -" -1724,1504.08050,Shuangyong Song and Yao Meng,Detecting Concept-level Emotion Cause in Microblogging,cs.CL cs.AI," In this paper, we propose a Concept-level Emotion Cause Model (CECM), instead -of the mere word-level models, to discover causes of microblogging users' -diversified emotions on specific hot events. A modified topic-supervised biterm -topic model is utilized in CECM to detect emotion topics in event-related -tweets, and then context-sensitive topical PageRank is utilized to detect -meaningful multiword expressions as emotion causes. Experimental results on a -dataset from Sina Weibo, one of the largest microblogging websites in China, -show CECM can better detect emotion causes than baseline methods. -" -1725,1504.08102,Emiel van Miltenburg,Detecting and ordering adjectival scalemates,cs.CL," This paper presents a pattern-based method that can be used to infer -adjectival scales, such as , from a corpus. Specifically, -the proposed method uses lexical patterns to automatically identify and order -pairs of scalemates, followed by a filtering phase in which unrelated pairs are -discarded. For the filtering phase, several different similarity measures are -implemented and compared. The model presented in this paper is evaluated using -the current standard, along with a novel evaluation set, and shown to be at -least as good as the current state-of-the-art. -" -1726,1504.08183,Andrey Kutuzov and Igor Andreev,"Texts in, meaning out: neural language models in semantic similarity - task for Russian",cs.CL," Distributed vector representations for natural language vocabulary get a lot -of attention in contemporary computational linguistics. This paper summarizes -the experience of applying neural network language models to the task of -calculating semantic similarity for Russian. The experiments were performed in -the course of the Russian Semantic Similarity Evaluation track, where our -models took from the 2nd to the 5th position, depending on the task. - We introduce the tools and corpora used, comment on the nature of the shared -task and describe the achieved results. It was found that Continuous Skip-gram -and Continuous Bag-of-words models, previously successfully applied to English -material, can be used for semantic modeling of Russian as well. Moreover, we -show that texts in the Russian National Corpus (RNC) provide excellent training -material for such models, outperforming other, much larger corpora. 
It is especially true for semantic relatedness tasks (although -stacking models trained on larger corpora on top of RNC models improves -performance even more). - High-quality semantic vectors learned in such a way can be used in a variety -of linguistic tasks and promise an exciting field for further study. -" -1727,1504.08342,Shay B. Cohen and Daniel Gildea,"Parsing Linear Context-Free Rewriting Systems with Fast Matrix - Multiplication",cs.CL cs.FL," We describe a matrix multiplication recognition algorithm for a subset of -binary linear context-free rewriting systems (LCFRS) with running time -$O(n^{\omega d})$ where $M(m) = O(m^{\omega})$ is the running time for $m -\times m$ matrix multiplication and $d$ is the ""contact rank"" of the LCFRS -- -the maximal number of combination and non-combination points that appear in the -grammar rules. We also show that this algorithm can be used as a subroutine to -get a recognition algorithm for general binary LCFRS with running time -$O(n^{\omega d + 1})$. The currently best known $\omega$ is smaller than -$2.38$. Our result provides another proof for the best known result for parsing -mildly context sensitive formalisms such as combinatory categorial grammars, -head grammars, linear indexed grammars, and tree adjoining grammars, which can -be parsed in time $O(n^{4.76})$. It also shows that inversion transduction -grammars can be parsed in time $O(n^{5.76})$. In addition, binary LCFRS -subsumes many other formalisms and types of grammars, for some of which we also -improve the asymptotic complexity of parsing. -" -1728,1505.00122,Richard A. Blythe,Hierarchy of Scales in Language Dynamics,physics.soc-ph cs.CL," Methods and insights from statistical physics are finding an increasing -variety of applications where one seeks to understand the emergent properties -of a complex interacting system. One such area concerns the dynamics of -language at a variety of levels of description, from the behaviour of -individual agents learning simple artificial languages from each other, up to -changes in the structure of languages shared by large groups of speakers over -historical timescales. In this Colloquium, we survey a hierarchy of scales at -which language and linguistic behaviour can be described, along with the main -progress in understanding that has been made at each of them---much of which -has come from the statistical physics community. We argue that future -developments may arise by linking the different levels of the hierarchy -together in a more coherent fashion, in particular where this allows more -effective use of rich empirical data sets. -" -1729,1505.00138,Dimitri Kartsaklis,"Compositional Distributional Semantics with Compact Closed Categories - and Frobenius Algebras",cs.CL cs.AI math.CT math.QA quant-ph," This thesis contributes to ongoing research related to the categorical -compositional model for natural language of Coecke, Sadrzadeh and Clark in -three ways: Firstly, I propose a concrete instantiation of the abstract -framework based on Frobenius algebras (joint work with Sadrzadeh). The theory -improves shortcomings of previous proposals, extends the coverage of the -language, and is supported by experimental work that improves existing results. -The proposed framework describes a new class of compositional models that find -intuitive interpretations for a number of linguistic phenomena. 
Secondly, I -propose and evaluate in practice a new compositional methodology which -explicitly deals with the different levels of lexical ambiguity (joint work -with Pulman). A concrete algorithm is presented, based on the separation of -vector disambiguation from composition in an explicit prior step. Extensive -experimental work shows that the proposed methodology indeed results in more -accurate composite representations for the framework of Coecke et al. in -particular and every other class of compositional models in general. As a last -contribution, I formalize the explicit treatment of lexical ambiguity in the -context of the categorical framework by resorting to categorical quantum -mechanics (joint work with Coecke). In the proposed extension, the concept of a -distributional vector is replaced with that of a density matrix, which -compactly represents a probability distribution over the potential different -meanings of the specific word. Composition takes the form of quantum -measurements, leading to interesting analogies between quantum physics and -linguistics. -" -1730,1505.00161,"Danushka Bollegala, Takanori Maehara, Ken-ichi Kawarabayashi",Embedding Semantic Relations into Word Representations,cs.CL," Learning representations for semantic relations is important for various -tasks such as analogy detection, relational search, and relation -classification. Although there have been several proposals for learning -representations for individual words, learning word representations that -explicitly capture the semantic relations between words remains under -developed. We propose an unsupervised method for learning vector -representations for words such that the learnt representations are sensitive to -the semantic relations that exist between two words. First, we extract lexical -patterns from the co-occurrence contexts of two words in a corpus to represent -the semantic relations that exist between those two words. Second, we represent -a lexical pattern as the weighted sum of the representations of the words that -co-occur with that lexical pattern. Third, we train a binary classifier to -detect relationally similar vs. non-similar lexical pattern pairs. The proposed -method is unsupervised in the sense that the lexical pattern pairs we use as -train data are automatically sampled from a corpus, without requiring any -manual intervention. Our proposed method statistically significantly -outperforms the current state-of-the-art word representations on three -benchmark datasets for proportional analogy detection, demonstrating its -ability to accurately capture the semantic relations among words. -" -1731,1505.00277,"Dana Movshovitz-Attias, William W. Cohen","Grounded Discovery of Coordinate Term Relationships between Software - Entities",cs.CL cs.AI cs.LG cs.SE," We present an approach for the detection of coordinate-term relationships -between entities from the software domain, that refer to Java classes. Usually, -relations are found by examining corpus statistics associated with text -entities. In some technical domains, however, we have access to additional -information about the real-world objects named by the entities, suggesting that -coupling information about the ""grounded"" entities with corpus statistics might -lead to improved methods for relation discovery. 
To this end, we develop a -similarity measure for Java classes using distributional information about how -they are used in software, which we combine with corpus statistics on the -distribution of contexts in which the classes appear in text. Using our -approach, cross-validation accuracy on this dataset can be improved -dramatically, from around 60% to 88%. Human labeling results show that our -classifier has an F1 score of 86% over the top 1000 predicted pairs. -" -1732,1505.00468,"Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. - Lawrence Zitnick, Dhruv Batra, Devi Parikh",VQA: Visual Question Answering,cs.CL cs.CV," We propose the task of free-form and open-ended Visual Question Answering -(VQA). Given an image and a natural language question about the image, the task -is to provide an accurate natural language answer. Mirroring real-world -scenarios, such as helping the visually impaired, both the questions and -answers are open-ended. Visual questions selectively target different areas of -an image, including background details and underlying context. As a result, a -system that succeeds at VQA typically needs a more detailed understanding of -the image and complex reasoning than a system producing generic image captions. -Moreover, VQA is amenable to automatic evaluation, since many open-ended -answers contain only a few words or a closed set of answers that can be -provided in a multiple-choice format. We provide a dataset containing ~0.25M -images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the -information it provides. Numerous baselines and methods for VQA are provided -and compared with human performance. Our VQA demo is available on CloudCV -(http://cloudcv.org/vqa). -" -1733,1505.00863,"Shuangyong Song, Yao Meng, Zhongguang Zheng, Jun Sun","A Feature-based Classification Technique for Answering Multi-choice - World History Questions",cs.IR cs.AI cs.CL," Our FRDC_QA team participated in the QA-Lab English subtask of the NTCIR-11. -In this paper, we describe our system for solving real-world university -entrance exam questions, which are related to world history. Wikipedia is used -as the main external resource for our system. Since problems with choosing -right/wrong sentence from multiple sentence choices account for about -two-thirds of the total, we individually design a classification based model -for solving this type of questions. For other types of questions, we also -design some simple methods. -" -1734,1505.01072,"Arun S. Maiya, Dale Visser, Andrew Wan",Mining Measured Information from Text,cs.CL cs.IR," We present an approach to extract measured information from text (e.g., a -1370 degrees C melting point, a BMI greater than 29.9 kg/m^2 ). Such -extractions are critically important across a wide range of domains - -especially those involving search and exploration of scientific and technical -documents. We first propose a rule-based entity extractor to mine measured -quantities (i.e., a numeric value paired with a measurement unit), which -supports a vast and comprehensive set of both common and obscure measurement -units. Our method is highly robust and can correctly recover valid measured -quantities even when significant errors are introduced through the process of -converting document formats like PDF to plain text. Next, we describe an -approach to extracting the properties being measured (e.g., the property ""pixel -pitch"" in the phrase ""a pixel pitch as high as 352 {\mu}m""). 
Finally, we -present MQSearch: the realization of a search engine with full support for -measured information. -" -1735,1505.01121,Mateusz Malinowski and Marcus Rohrbach and Mario Fritz,"Ask Your Neurons: A Neural-based Approach to Answering Questions about - Images",cs.CV cs.AI cs.CL," We address a question answering task on real-world images that is set up as a -Visual Turing Test. By combining the latest advances in image representation and -natural language processing, we propose Neural-Image-QA, an end-to-end -formulation of this problem for which all parts are trained jointly. In -contrast to previous efforts, we are facing a multi-modal problem where the -language output (answer) is conditioned on visual and natural language input -(image and question). Our approach, Neural-Image-QA, doubles the performance of -the previous best approach on this problem. We provide additional insights into -the problem by analyzing how much information is contained only in the language -part, for which we provide a new human baseline. To study human consensus, which -is related to the ambiguities inherent in this challenging task, we propose two -novel metrics and collect additional answers which extend the original DAQUAR -dataset to DAQUAR-Consensus. -" -1736,1505.01393,"Iana Atanassova, Marc Bertin, Philipp Mayr","Mining Scientific Papers for Bibliometrics: a (very) Brief Survey of - Methods and Tools",cs.DL cs.CL," The Open Access movement in scientific publishing and search engines like -Google Scholar have made scientific articles more broadly accessible. During -the last decade, the availability of scientific papers in full text has become -more and more widespread thanks to the growing number of publications on online -platforms such as ArXiv and CiteSeer. The efforts to provide articles in -machine-readable formats and the rise of Open Access publishing have resulted -in a number of standardized formats for scientific papers (such as NLM-JATS, -TEI, DocBook). Our aim is to stimulate research at the intersection of -Bibliometrics and Computational Linguistics in order to study the ways -Bibliometrics can benefit from large-scale text analytics and sense mining of -scientific papers, thus exploring the interdisciplinarity of Bibliometrics and -Natural Language Processing. -" -1737,1505.01504,"Shiliang Zhang, Hui Jiang, Mingbin Xu, Junfeng Hou, Lirong Dai","A Fixed-Size Encoding Method for Variable-Length Sequences with its - Application to Neural Network Language Models",cs.NE cs.CL cs.LG," In this paper, we propose the new fixed-size ordinally-forgetting encoding -(FOFE) method, which can almost uniquely encode any variable-length sequence of -words into a fixed-size representation. FOFE can model the word order in a -sequence using a simple ordinally-forgetting mechanism according to the -positions of words. In this work, we have applied FOFE to feedforward neural -network language models (FNN-LMs). Experimental results have shown that without -using any recurrent feedback, FOFE-based FNN-LMs can significantly outperform -not only the standard fixed-input FNN-LMs but also the popular RNN-LMs. -" -1738,1505.01757,Kazem Taghva,"Contextual Analysis for Middle Eastern Languages with Hidden Markov - Models",cs.CL cs.AI," Displaying a document in Middle Eastern languages requires contextual -analysis due to different presentational forms for each character of the -alphabet.
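The joining rules mentioned in the abstract above (1505.01757) are exactly what that paper's HMM is meant to learn instead of hand-coding. For reference, a hand-coded rule system looks roughly like the toy sketch below; the glyph codes and joining behaviour are invented placeholders, not a real Farsi shaping table.

# Toy positional glyph selection (isolated/initial/medial/final forms).
FORMS = {  # char -> (isolated, initial, medial, final); invented glyph codes
    "b": ("b0", "b1", "b2", "b3"),
    "a": ("a0", None, None, "a3"),  # 'a' joins only to the preceding letter
}

def shape(word):
    glyphs = []
    joined_prev = False  # does the previous glyph connect forward?
    for i, ch in enumerate(word):
        iso, ini, med, fin = FORMS[ch]
        joins_forward = ini is not None and i < len(word) - 1
        if joined_prev and joins_forward:
            glyphs.append(med)
        elif joined_prev:
            glyphs.append(fin)
        elif joins_forward:
            glyphs.append(ini)
        else:
            glyphs.append(iso)
        joined_prev = joins_forward
    return glyphs

print(shape("bba"))  # ['b1', 'b2', 'a3']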
The words of the document will be formed by the joining of the -correct positional glyphs representing corresponding presentational forms of -the characters. A set of rules defines the joining of the glyphs. As usual, -these rules vary from language to language and are subject to interpretation by -the software developers. - In this paper, we propose a machine learning approach for contextual analysis -based on the first-order Hidden Markov Model. We will design and build a model -for the Farsi language to exhibit this technology. The Farsi model achieves 94% -accuracy when trained on a short list of 89 Farsi words consisting of 2780 -characters. - The experiment can be easily extended to many languages including Arabic, -Urdu, and Sindhi. Furthermore, the advantage of this approach is that the same -software can be used to perform contextual analysis without coding complex -rules for each specific language. Of particular interest is that the languages -with fewer speakers can have greater representation on the web, since they are -typically ignored by software developers due to lack of financial incentives. -" -1739,1505.01809,"Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong - He, Geoffrey Zweig, Margaret Mitchell",Language Models for Image Captioning: The Quirks and What Works,cs.CL cs.AI cs.CV cs.LG," Two recent approaches have achieved state-of-the-art results in image -captioning. The first uses a pipelined process where a set of candidate words -is generated by a convolutional neural network (CNN) trained on images, and -then a maximum entropy (ME) language model is used to arrange these words into -a coherent sentence. The second uses the penultimate activation layer of the -CNN as input to a recurrent neural network (RNN) that then generates the -caption sequence. In this paper, we compare the merits of these different -language modeling approaches for the first time by using the same -state-of-the-art CNN as input. We examine issues in the different approaches, -including linguistic irregularities, caption repetition, and data set overlap. -By combining key aspects of the ME and RNN methods, we achieve a new record -performance over previously published results on the benchmark COCO dataset. -However, the gains we see in BLEU do not translate to human judgments. -" -1740,1505.02074,"Mengye Ren, Ryan Kiros, Richard Zemel",Exploring Models and Data for Image Question Answering,cs.LG cs.AI cs.CL cs.CV," This work aims to address the problem of image-based question-answering (QA) -with new models and datasets. In our work, we propose to use neural networks -and visual semantic embeddings, without intermediate stages such as object -detection and image segmentation, to predict answers to simple questions about -images. Our model performs 1.8 times better than the only published results on -an existing image QA dataset. We also present a question generation algorithm -that converts image descriptions, which are widely available, into QA form. We -used this algorithm to produce an order-of-magnitude larger dataset, with more -evenly distributed answers. A suite of baseline results on this new dataset is -also presented. -" -1741,1505.02251,Aris Kosmopoulos and Georgios Paliouras and Ion Androutsopoulos,Probabilistic Cascading for Large Scale Hierarchical Classification,cs.LG cs.CL cs.IR," Hierarchies are frequently used for the organization of objects.
Given a -hierarchy of classes, two main approaches are used to automatically classify -new instances: flat classification and cascade classification. Flat -classification ignores the hierarchy, while cascade classification greedily -traverses the hierarchy from the root to the predicted leaf. In this paper we -propose a new approach, which extends cascade classification to predict the -right leaf by estimating the probability of each root-to-leaf path. We provide -experimental results which indicate that, using the same classification -algorithm, one can achieve better results with our approach, compared to the -traditional flat and cascade classifications. -" -1742,1505.02419,Matthew R. Gormley and Mo Yu and Mark Dredze,"Improved Relation Extraction with Feature-Rich Compositional Embedding - Models",cs.CL cs.AI cs.LG," Compositional embedding models build a representation (or embedding) for a -linguistic structure based on its component word embeddings. We propose a -Feature-rich Compositional Embedding Model (FCM) for relation extraction that -is expressive, generalizes to new domains, and is easy to implement. The key -idea is to combine (unlexicalized) hand-crafted features with learned word -embeddings. The model is able to directly tackle the difficulties met by -traditional compositional embedding models, such as handling arbitrary types -of sentence annotations and utilizing global information for composition. We -test the proposed model on two relation extraction tasks, and demonstrate that -our model outperforms both previous compositional models and traditional -feature-rich models on the ACE 2005 relation extraction task, and the SemEval -2010 relation classification task. The combination of our model and a -log-linear classifier with hand-crafted features gives state-of-the-art -results. -" -1743,1505.02425,"Michael Heilman, Kenji Sagae",Fast Rhetorical Structure Theory Discourse Parsing,cs.CL," In recent years, there has been a variety of research on discourse parsing, -particularly RST discourse parsing. Most of the recent work on RST parsing has -focused on implementing new types of features or learning algorithms in order -to improve accuracy, with relatively little focus on efficiency, robustness, or -practical use. Also, most implementations are not widely available. Here, we -describe an RST segmentation and parsing system that adapts models and feature -sets from various previous work, as described below. Its accuracy is near -state-of-the-art, and it was developed to be fast, robust, and practical. For -example, it can process short documents such as news articles or essays in less -than a second. -" -1744,1505.02973,"Evangelos Psomakelis, Konstantinos Tserpes, Dimosthenis - Anagnostopoulos, Theodora Varvarigou",Comparing methods for Twitter Sentiment Analysis,cs.CL cs.IR cs.SI," This work extends the set of works which deal with the popular problem of -sentiment analysis on Twitter. It investigates the most popular document -(""tweet"") representation methods which feed sentiment evaluation mechanisms. In -particular, we study the bag-of-words, n-grams and n-gram graphs approaches and -for each of them we evaluate the performance of a lexicon-based and 7 -learning-based classification algorithms (namely SVM, Na\""ive Bayesian -Networks, Logistic Regression, Multilayer Perceptrons, Best-First Trees, -Functional Trees and C4.5) as well as their combinations, using a set of 4451 -manually annotated tweets.
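Before the results discussion continues, here is a minimal sketch of the first two tweet representations named in 1505.02973 (word n-grams, plus the character n-gram co-occurrence edges underlying an n-gram graph); the example tweet and window size are arbitrary choices.

from collections import Counter

def word_ngrams(text, n):
    toks = text.lower().split()
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

def char_ngram_graph_edges(text, n=3, window=3):
    # Edges connect character n-grams occurring within `window` positions.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, g in enumerate(grams):
        for h in grams[i + 1:i + 1 + window]:
            edges[(g, h)] += 1
    return edges

tweet = "This phone is not bad at all"
print(word_ngrams(tweet, 2))           # bigram features
print(char_ngram_graph_edges(tweet))   # weighted n-gram graph edges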
The results demonstrate the superiority of -learning-based methods, and in particular of n-gram graph approaches, for -predicting the sentiment of tweets. They also show that the combinatory -approach has impressive effects on n-grams, raising the confidence up to 83.15% -on the 5-grams, using majority vote and a balanced dataset (equal number of -positive, negative and neutral tweets for training). In the n-gram graph cases -the improvement was small to none, reaching 94.52% on the 4-gram graphs, using -Orthodromic distance and a threshold of 0.001. -" -1745,1505.03081,"AbdelRahim A. Elmadany, Sherif M. Abdou, Mervat Gheith","Turn Segmentation into Utterances for Arabic Spontaneous Dialogues and - Instant Messages",cs.CL," Text segmentation is an essential processing task for many Natural Language -Processing (NLP) applications such as text summarization, text translation, and -dialogue language understanding, among others. Turn segmentation is considered -a key component of the dialogue understanding task for building automatic -Human-Computer systems. In this paper, we introduce a novel approach to turn -segmentation into utterances for Egyptian spontaneous dialogues and Instant -Messages (IM) using a Machine Learning (ML) approach, as part of the task of -automatically understanding Egyptian spontaneous dialogues and IM. Due to the lack of -an Egyptian dialect dialogue corpus, the system is evaluated on our own corpus of -3001 turns, which were collected, segmented, and annotated manually from -Egyptian call-centers. The system achieves an F1 score of 90.74% and an accuracy of -95.98%. -" -1746,1505.03084,"AbdelRahim A. Elmadany, Sherif M. Abdou, Mervat Gheith","A Survey of Arabic Dialogues Understanding for Spontaneous Dialogues and - Instant Message",cs.CL," Building interactive dialogue systems has recently gained considerable -attention, but most of the resources and systems built so far are tailored to -English and other Indo-European languages. The need to design systems for -other languages, such as Arabic, is increasing. For these reasons, there -is growing interest in the Arabic dialogue act classification task, because it is a key -component of the Arabic language understanding needed to build such systems. This paper -surveys different techniques for Arabic dialogue act classification. We -describe the main existing techniques for utterance segmentation and -classification, annotation schemas, and test corpora for Arabic dialogue -understanding that have been introduced in the literature. -" -1747,1505.03085,Edwin Lunando and Ayu Purwarianti,Indonesian Social Media Sentiment Analysis With Sarcasm Detection,cs.CL," Sarcasm is considered one of the most difficult problems in sentiment -analysis. In our observation of Indonesian social media, for certain topics, -people tend to criticize something using sarcasm. Here, we propose two -additional features to detect sarcasm after a common sentiment analysis is -conducted. The features are the negativity information and the number of -interjection words. We also employed translated SentiWordNet in the sentiment -classification. All the classifications were conducted with machine learning -algorithms. The experimental results showed that the additional features are -quite effective in sarcasm detection. -" -1748,1505.03105,"Hossam S. Ibrahim, Sherif M. Abdou, Mervat Gheith",Sentiment Analysis For Modern Standard Arabic And Colloquial,cs.CL," The rise of social media such as blogs and social networks has fueled -interest in sentiment analysis.
With the proliferation of reviews, ratings, -recommendations and other forms of online expression, online opinion has turned -into a kind of virtual currency for businesses looking to market their -products, identify new opportunities and manage their reputations; therefore, -many are now looking to the field of sentiment analysis. In this paper, we -present a feature-based sentence-level approach for Arabic sentiment analysis. -Our approach uses a lexicon of Arabic idioms/saying phrases as a key resource -for improving the detection of sentiment polarity in Arabic sentences, as -well as a novel and rich set of linguistically motivated features -(contextual intensifiers, contextual shifters and negation handling) and syntactic -features for conflicting phrases, which enhance the sentiment classification -accuracy. Furthermore, we introduce an automatically expandable, wide-coverage -polarity lexicon of Arabic sentiment words. The lexicon is built from a seed of -manually collected and annotated gold-standard sentiment words, and it -automatically expands and detects the sentiment orientation of new sentiment -words using a synset aggregation technique and free online Arabic -lexicons and thesauruses. Our data focus on modern standard Arabic (MSA) and -Egyptian dialectal Arabic tweets and microblogs (hotel reservations, product -reviews, etc.). The experimental results using our resources and techniques -with an SVM classifier indicate high performance levels, with accuracies of over -95%. -" -1749,1505.03239,"Sarika Hegde, K. K. Achary, Surendra Shetty","Feature selection using Fisher's ratio technique for automatic speech - recognition",cs.CL," Automatic Speech Recognition mainly involves two steps: feature extraction -and classification. Mel Frequency Cepstral Coefficients (MFCC) are among the most -prominent feature extraction techniques in ASR. Usually, the set of all 12 MFCC -coefficients is used as the feature vector in the classification step. But the -question is whether the same or improved classification accuracy can be -achieved by using a subset of the 12 MFCC coefficients as the feature vector. In this -paper, Fisher's ratio technique is used for selecting a subset of the 12 MFCC -coefficients that contribute most to discriminating a pattern. The selected -coefficients are used in classification with the Hidden Markov Model algorithm. The -classification accuracies obtained using all 12 coefficients and using the selected -coefficients are compared. -" -1750,1505.03783,"Germinal Cocho, Jorge Flores, Carlos Gershenson, Carlos Pineda, Sergio - S\'anchez","Rank diversity of languages: Generic behavior in computational - linguistics",cs.CL," Statistical studies of languages have focused on the rank-frequency -distribution of words. Instead, we introduce here a measure of how word ranks -change in time and call this distribution \emph{rank diversity}. We calculate -this diversity for books published in six European languages since 1800, and -find that it follows a universal lognormal distribution. Based on the mean and -standard deviation associated with the lognormal distribution, we define three -different word regimes of languages: ""heads"" consist of words which almost do -not change their rank in time, ""bodies"" are words of general use, while ""tails"" -are comprised of context-specific words and vary their rank considerably in -time. The heads and bodies reflect the size of language cores identified by -linguists for basic communication.
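A small sketch of the rank diversity statistic that the surrounding abstract (1505.03783) builds on, under the common reading that the diversity at rank k is the number of distinct words observed at that rank across time slices, normalised by the number of slices; the yearly rankings below are toy data.

# Word lists already sorted by frequency, one list per year (toy data).
yearly_rankings = [
    ["the", "of", "time", "war"],
    ["the", "of", "war", "peace"],
    ["the", "and", "time", "love"],
]

def rank_diversity(rankings, k):
    words_at_k = {ranking[k] for ranking in rankings if len(ranking) > k}
    return len(words_at_k) / len(rankings)

for k in range(4):
    print(k, rank_diversity(yearly_rankings, k))
# Rank 0 is always "the" (low diversity, a "head" word); deeper ranks vary more.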
We propose a Gaussian random walk model -which reproduces the rank variation of words in time and thus the diversity. -Rank diversity of words can be understood as the result of random variations in -rank, where the size of the variation depends on the rank itself. We find that -the core size is similar for all languages studied. -" -1751,1505.03823,"Miao Fan, Qiang Zhou and Thomas Fang Zheng",Distant Supervision for Entity Linking,cs.CL cs.IR," Entity linking is an indispensable operation in populating knowledge -repositories for information extraction. It studies the alignment of a textual -entity mention to its corresponding disambiguated entry in a knowledge -repository. In this paper, we propose a new paradigm named distantly supervised -entity linking (DSEL), in the sense that the disambiguated entities that belong -to a huge knowledge repository (Freebase) are automatically aligned to the -corresponding descriptive webpages (Wiki pages). In this way, a large amount of -weakly labeled data can be generated without manual annotation and fed to a -classifier for linking more newly discovered entities. Compared with -traditional paradigms based on a single knowledge base, DSEL benefits more via -jointly leveraging the respective advantages of Freebase and Wikipedia. -Specifically, the proposed paradigm facilitates bridging the disambiguated -labels (Freebase) of entities and their textual descriptions (Wikipedia) for -Web-scale entities. Experiments conducted on a dataset of 140,000 items and -60,000 features achieve a baseline F1-measure of 0.517. Furthermore, we analyze -the feature performance and improve the F1-measure to 0.545. -" -1752,1505.04197,"AbdelRahim A. Elmadany, Sherif M. Abdou, Mervat Gheith",Arabic Inquiry-Answer Dialogue Acts Annotation Schema,cs.CL," We present an annotation schema as part of an effort to create a manually -annotated corpus for Arabic dialogue language understanding, including spoken -dialogue and written ""chat"" dialogue, for the inquiry-answer domain. The proposed -schema mainly handles the request and response acts that occur frequently in -inquiry-answer conversations, expressing requests for services, suggestions, and -offers. We applied the proposed schema to 83 Arabic inquiry-answer dialogues. -" -1753,1505.04313,Erkki Luuk,A type-theoretical approach to Universal Grammar,cs.CL math.LO," The idea of Universal Grammar (UG) as the hypothetical linguistic structure -shared by all human languages harkens back at least to the 13th century. The -best-known modern elaborations of the idea are due to Chomsky. Following a -devastating critique from theoretical, typological and field linguistics, these -elaborations, the idea of UG itself and the more general idea of language -universals stand untenable and are largely abandoned. The proposal tackles the -hypothetical contents of UG using dependent and polymorphic type theory in a -framework very different from the Chomskyan ones. We introduce a type logic for -a precise, universal and parsimonious representation of natural language -morphosyntax and compositional semantics. The logic handles grammatical -ambiguity (with polymorphic types), selectional restrictions and diverse kinds -of anaphora (with dependent types), and features a partly universal set of -morphosyntactic types (by the Curry-Howard isomorphism). -" -1754,1505.04342,"Eric M. Clark, Jake Ryland Williams, Chris A. Jones, Richard A. - Galbraith, Christopher M.
Danforth, Peter Sheridan Dodds","Sifting Robotic from Organic Text: A Natural Language Approach for - Detecting Automation on Twitter",cs.CL," Twitter, a popular social media outlet, has evolved into a vast source of -linguistic data, rich with opinion, sentiment, and discussion. Due to the -increasing popularity of Twitter, its perceived potential for exerting social -influence has led to the rise of a diverse community of automatons, commonly -referred to as bots. These inorganic and semi-organic Twitter entities can -range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) -to the malevolent (e.g., spamming messages, advertisements, or radical -opinions). Existing detection algorithms typically leverage meta-data (time -between tweets, number of followers, etc.) to identify robotic accounts. Here, -we present a powerful classification scheme that exclusively uses the natural -language text from organic users to provide a criterion for identifying -accounts posting automated messages. Since the classifier operates on text -alone, it is flexible and may be applied to any textual data beyond the -Twitter-sphere. -" -1755,1505.04420,Miryam de Lhoneux,CCG Parsing and Multiword Expressions,cs.CL," This thesis presents a study about the integration of information about -Multiword Expressions (MWEs) into parsing with Combinatory Categorial Grammar -(CCG). We build on previous work which has shown the benefit of adding -information about MWEs to syntactic parsing by implementing a similar pipeline -with CCG parsing. More specifically, we collapse MWEs to one token in training -and test data in CCGbank, a corpus which contains sentences annotated with CCG -derivations. Our collapsing algorithm, however, can only deal with MWEs when they -form a constituent in the data, which is one of the limitations of our approach. - We study the effect of collapsing training and test data. A parsing effect -can be obtained if collapsed data help the parser in its decisions, and a -training effect can be obtained if training on the collapsed data improves -results. We also collapse the gold standard and show that our model -significantly outperforms the baseline model on our gold standard, which -indicates that there is a training effect. We show that the baseline model -performs significantly better on our gold standard when the data are collapsed -before parsing than when the data are collapsed after parsing, which indicates -that there is a parsing effect. We show that these results can lead to improved -performance on the non-collapsed standard benchmark, although we fail to show -that they do so significantly. We conclude that despite the limited settings, -there are noticeable improvements from using MWEs in parsing. We discuss ways -in which the incorporation of MWEs into parsing can be improved and hypothesize -that this will lead to more substantial results. - We finally show that turning the MWE recognition part of the pipeline into an -experimental variable is useful, as we obtain different results with -different recognizers. -" -1756,1505.04630,"Zhiyuan Tang, Dong Wang and Zhiyong Zhang",Recurrent Neural Network Training with Dark Knowledge Transfer,stat.ML cs.CL cs.LG cs.NE," Recurrent neural networks (RNNs), particularly long short-term memory (LSTM), -have gained much attention in automatic speech recognition (ASR). Although some -success stories have been reported, training RNNs remains highly -challenging, especially with limited training data.
Recent research found that -a well-trained model can be used as a teacher to train other child models, by -using the predictions generated by the teacher model as supervision. This -knowledge transfer learning has been employed to train simple neural nets with -a complex one, so that the final performance can reach a level that is -infeasible to obtain by regular training. In this paper, we employ the -knowledge transfer learning approach to train RNNs (specifically LSTMs) using a -deep neural network (DNN) model as the teacher. This is different from most of -the existing research on knowledge transfer learning, since the teacher (DNN) -is assumed to be weaker than the child (RNN); however, our experiments on an -ASR task showed that it works fairly well: without applying any tricks to the -learning scheme, this approach can train RNNs successfully even with limited -training data. -" -1757,1505.04657,"Phong Minh Vu, Tam The Nguyen, Hung Viet Pham, Tung Thanh Nguyen",Mining User Opinions in Mobile App Reviews: A Keyword-based Approach,cs.IR cs.CL," User reviews of mobile apps often contain complaints or suggestions which are -valuable for app developers to improve user experience and satisfaction. -However, due to the large volume and noisy nature of those reviews, manually -analyzing them for useful opinions is inherently challenging. To address this -problem, we propose MARK, a keyword-based framework for semi-automated review -analysis. MARK allows an analyst to describe his interests in one or more mobile -apps with a set of keywords. It then finds and lists the reviews most relevant to -those keywords for further analysis. It can also draw the trends over time of -those keywords and detect their sudden changes, which might indicate the -occurrence of serious issues. To help analysts describe their interests more -effectively, MARK can automatically extract keywords from raw reviews and rank -them by their associations with negative reviews. In addition, based on a -vector-based semantic representation of keywords, MARK can divide a large set -of keywords into more cohesive subsets, or suggest keywords similar to the -selected ones. -" -1758,1505.04771,"Eric Malmi, Pyry Takala, Hannu Toivonen, Tapani Raiko, Aristides - Gionis",DopeLearning: A Computational Approach to Rap Lyrics Generation,cs.LG cs.AI cs.CL cs.NE," Writing rap lyrics requires both creativity to construct a meaningful, -interesting story and lyrical skills to produce complex rhyme patterns, which -form the cornerstone of good flow. We present a rap lyrics generation method -that captures both of these aspects. First, we develop a prediction model to -identify the next line of existing lyrics from a set of candidate next lines. -This model is based on two machine-learning techniques: the RankSVM algorithm -and a deep neural network model with a novel structure. Results show that the -prediction model can identify the true next line among 299 randomly selected -lines with an accuracy of 17%, i.e., over 50 times more likely than by random. -Second, we employ the prediction model to combine lines from existing songs, -producing lyrics with rhyme and a meaning. An evaluation of the produced lyrics -shows that in terms of quantitative rhyme density, the method outperforms the -best human rappers by 21%. The rap lyrics generator has been deployed as an -online tool called DeepBeat, and the performance of the tool has been assessed -by analyzing its usage logs.
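The teacher-student supervision described in 1505.04630 above reduces, in its simplest form, to training the student against the teacher's output distribution. The numpy sketch below shows that core loss; the softmax temperature is a common distillation knob assumed here for illustration, not necessarily the paper's exact recipe.

import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

teacher_logits = np.array([3.0, 1.0, 0.2])   # stand-in DNN teacher outputs
student_logits = np.array([2.0, 2.0, 0.1])   # stand-in RNN/LSTM student outputs

T = 2.0                                      # temperature (assumption)
p_teacher = softmax(teacher_logits, T)       # soft targets from the teacher
p_student = softmax(student_logits, T)
kd_loss = -np.sum(p_teacher * np.log(p_student))  # cross-entropy to soft targets
print(kd_loss)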
This analysis shows that machine-learned rankings -correlate with user preferences. -" -1759,1505.04870,"Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, - Julia Hockenmaier, and Svetlana Lazebnik","Flickr30k Entities: Collecting Region-to-Phrase Correspondences for - Richer Image-to-Sentence Models",cs.CV cs.CL," The Flickr30k dataset has become a standard benchmark for sentence-based -image description. This paper presents Flickr30k Entities, which augments the -158k captions from Flickr30k with 244k coreference chains, linking mentions of -the same entities across different captions for the same image, and associating -them with 276k manually annotated bounding boxes. Such annotations are -essential for continued progress in automatic image description and grounded -language understanding. They enable us to define a new benchmark for -localization of textual entity mentions in an image. We present a strong -baseline for this task that combines an image-text embedding, detectors for -common objects, a color classifier, and a bias towards selecting larger -objects. While our baseline rivals in accuracy more complex state-of-the-art -models, we show that its gains cannot be easily parlayed into improvements on -such tasks as image-sentence retrieval, thus underlining the limitations of -current methods and the need for further research. -" -1760,1505.04891,"Fei Tian, Bin Gao, Enhong Chen, Tie-Yan Liu","Learning Better Word Embedding by Asymmetric Low-Rank Projection of - Knowledge Graph",cs.CL," Word embedding, which refers to low-dimensional dense vector representations -of natural words, has demonstrated its power in many natural language -processing tasks. However, it may suffer from the inaccurate and incomplete -information contained in the free text corpus as training data. To tackle this -challenge, there have been quite a few works that leverage knowledge graphs as -an additional information source to improve the quality of word embedding. -Although these works have achieved certain success, they have neglected some -important facts about knowledge graphs: (i) many relationships in knowledge -graphs are \emph{many-to-one}, \emph{one-to-many} or even \emph{many-to-many}, -rather than simply \emph{one-to-one}; (ii) most head entities and tail entities -in knowledge graphs come from very different semantic spaces. To address these -issues, in this paper, we propose a new algorithm named ProjectNet. ProjectNet -models the relationships between head and tail entities after transforming them -with different low-rank projection matrices. The low-rank projection can allow -non \emph{one-to-one} relationships between entities, while different -projection matrices for head and tail entities allow them to originate in -different semantic spaces. The experimental results demonstrate that ProjectNet -yields more accurate word embedding than previous works, thus leading to clear -improvements in various natural language processing tasks. -" -1761,1505.05008,"Cicero Nogueira dos Santos, Victor Guimar\~aes",Boosting Named Entity Recognition with Neural Character Embeddings,cs.CL," Most state-of-the-art named entity recognition (NER) systems rely on -handcrafted features and on the output of other NLP tasks such as -part-of-speech (POS) tagging and text chunking. In this work we propose a -language-independent NER system that uses automatically learned features only.
-Our approach is based on the CharWNN deep neural network, which uses word-level -and character-level representations (embeddings) to perform sequential -classification. We perform an extensive number of experiments using two -annotated corpora in two different languages: the HAREM I corpus, which contains -texts in Portuguese; and the SPA CoNLL-2002 corpus, which contains texts in -Spanish. Our experimental results shed light on the contribution of neural -character embeddings for NER. Moreover, we demonstrate that the same neural -network which has been successfully applied to POS tagging can also achieve -state-of-the-art results for language-independent NER, using the same -hyperparameters, and without any handcrafted features. For the HAREM I corpus, -CharWNN outperforms the state-of-the-art system by 7.9 points in the F1-score -for the total scenario (ten NE classes), and by 7.2 points in the F1 for the -selective scenario (five NE classes). -" -1762,1505.05253,"Jun Feng, Mantong Zhou, Yu Hao, Minlie Huang and Xiaoyan Zhu",Knowledge Graph Embedding by Flexible Translation,cs.CL," Knowledge graph embedding refers to projecting entities and relations in a -knowledge graph into continuous vector spaces. State-of-the-art methods, such -as TransE, TransH, and TransR, build embeddings by treating a relation as a -translation from head entity to tail entity. However, previous models cannot -deal with reflexive/one-to-many/many-to-one/many-to-many relations properly, or -lack scalability and efficiency. Thus, we propose a novel method, flexible -translation, named TransF, to address the above issues. TransF regards a relation -as a translation between the head entity vector and the tail entity vector with flexible -magnitude. To evaluate the proposed model, we conduct link prediction and -triple classification on benchmark datasets. Experimental results show that our -method remarkably improves performance compared with several -state-of-the-art baselines. -" -1763,1505.05612,"Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu","Are You Talking to a Machine? Dataset and Methods for Multilingual Image - Question Answering",cs.CV cs.CL cs.LG," In this paper, we present the mQA model, which is able to answer questions -about the content of an image. The answer can be a sentence, a phrase or a -single word. Our model contains four components: a Long Short-Term Memory -(LSTM) to extract the question representation, a Convolutional Neural Network -(CNN) to extract the visual representation, an LSTM for storing the linguistic -context in an answer, and a fusing component to combine the information from -the first three components and generate the answer. We construct a Freestyle -Multilingual Image Question Answering (FM-IQA) dataset to train and evaluate -our mQA model. It contains over 150,000 images and 310,000 freestyle Chinese -question-answer pairs and their English translations. The quality of the -generated answers of our mQA model on this dataset is evaluated by human judges -through a Turing Test. Specifically, we mix the answers provided by humans and -our model. The human judges need to distinguish our model from the human. They -will also provide a score (i.e. 0, 1, 2, the larger the better) indicating the -quality of the answer. We propose strategies to monitor the quality of this -evaluation process. The experiments show that in 64.7% of cases, the human -judges cannot distinguish our model from humans. The average score is 1.454 -(1.918 for human).
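For readers new to translation-based knowledge graph embedding (the family that 1505.05253 above extends), the numpy sketch below scores a triple the TransE way and with one dot-product relaxation of the "flexible magnitude" idea; the relaxed form is an assumption for illustration, not a verbatim reproduction of TransF's scoring function, and all vectors are made up.

import numpy as np

h = np.array([0.2, 0.5, 0.1])   # head entity embedding
r = np.array([0.3, -0.1, 0.4])  # relation embedding
t = np.array([0.5, 0.4, 0.5])   # tail entity embedding

# TransE: h + r should land on t exactly.
score_transe = -np.linalg.norm(h + r - t)

# Flexible translation: h + r only needs to point along t (h + r ~ alpha * t),
# which a symmetric dot-product score rewards without fixing the magnitude.
score_flexible = (h + r) @ t + (t - r) @ h
print(score_transe, score_flexible)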
The details of this work, including the FM-IQA dataset, can -be found on the project page: http://idl.baidu.com/FM-IQA.html -" -1764,1505.05667,"Chenxi Zhu, Xipeng Qiu, Xinchi Chen, Xuanjing Huang","A Re-ranking Model for Dependency Parser with Recursive Convolutional - Neural Network",cs.CL cs.LG cs.NE," In this work, we address the problem of modeling all the nodes (words or -phrases) in a dependency tree with dense representations. We propose a -recursive convolutional neural network (RCNN) architecture to capture syntactic -and compositional-semantic representations of phrases and words in a dependency -tree. Different from the original recursive neural network, we introduce the -convolution and pooling layers, which can model a variety of compositions by -the feature maps and choose the most informative compositions by the pooling -layers. Based on RCNN, we use a discriminative model to re-rank a $k$-best list -of candidate dependency parsing trees. The experiments show that RCNN is very -effective in improving state-of-the-art dependency parsing on both English -and Chinese datasets. -" -1765,1505.05841,Michael Bloodgood and Benjamin Strauss,Translation Memory Retrieval Methods,cs.CL," Translation Memory (TM) systems are one of the most widely used translation -technologies. An important part of TM systems is the matching algorithm that -determines what translations get retrieved from the bank of available -translations to assist the human translator. Although detailed accounts of the -matching algorithms used in commercial systems cannot be found in the -literature, it is widely believed that edit distance algorithms are used. This -paper investigates and evaluates the use of several matching algorithms, -including the edit distance algorithm that is believed to be at the heart of -most modern commercial TM systems. This paper presents results showing how well -various matching algorithms correlate with human judgments of helpfulness -(collected via crowdsourcing with Amazon's Mechanical Turk). A new algorithm -based on weighted n-gram precision that can be adjusted for translator length -preferences consistently returns translations judged to be most helpful by -translators for multiple domains and language pairs. -" -1766,1505.05899,"George Saon, Hong-Kwang J. Kuo, Steven Rennie and Michael Picheny",The IBM 2015 English Conversational Telephone Speech Recognition System,cs.CL," We describe the latest improvements to the IBM English conversational -telephone speech recognition system. Some of the techniques that were found -beneficial are: maxout networks with annealed dropout rates; networks with a -very large number of outputs trained on 2000 hours of data; joint modeling of -partially unfolded recurrent neural networks and convolutional nets by -combining the bottleneck and output layers and retraining the resulting model; -and lastly, sophisticated language model rescoring with exponential and neural -network LMs. These techniques result in an 8.0% word error rate on the -Switchboard part of the Hub5-2000 evaluation test set which is 23% relative -better than our previous best published result.
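Since 1505.05841 above evaluates the edit-distance matching widely believed to drive commercial TM systems, here is a word-level sketch of that matcher; normalising by the longer segment's length to get a fuzzy-match score is a common convention assumed here, not necessarily the paper's exact setup.

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance over token lists.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def fuzzy_match(segment, tm_segment):
    a, b = segment.split(), tm_segment.split()
    return 1 - edit_distance(a, b) / max(len(a), len(b))

print(fuzzy_match("press the red button", "press the green button"))  # 0.75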
-" -1767,1505.06027,"Piotr Bojanowski (WILLOW, LIENS), R\'emi Lajugie (LIENS, SIERRA), - Edouard Grave (APAM), Francis Bach (LIENS, SIERRA), Ivan Laptev (WILLOW, - LIENS), Jean Ponce (WILLOW, LIENS), Cordelia Schmid (LEAR)",Weakly-Supervised Alignment of Video With Text,cs.CV cs.CL," Suppose that we are given a set of videos, along with natural language -descriptions in the form of multiple sentences (e.g., manual annotations, movie -scripts, sport summaries etc.), and that these sentences appear in the same -temporal order as their visual counterparts. We propose in this paper a method -for aligning the two modalities, i.e., automatically providing a time stamp for -every sentence. Given vectorial features for both video and text, we propose to -cast this task as a temporal assignment problem, with an implicit linear -mapping between the two feature modalities. We formulate this problem as an -integer quadratic program, and solve its continuous convex relaxation using an -efficient conditional gradient algorithm. Several rounding procedures are -proposed to construct the final integer solution. After demonstrating -significant improvements over the state of the art on the related task of -aligning video with symbolic labels [7], we evaluate our method on a -challenging dataset of videos with associated textual descriptions [36], using -both bag-of-words and continuous representations for text. -" -1768,1505.06169,"Emma Strubell, Luke Vilnis, Kate Silverstein, Andrew McCallum",Learning Dynamic Feature Selection for Fast Sequential Prediction,cs.CL cs.LG," We present paired learning and inference algorithms for significantly -reducing computation and increasing speed of the vector dot products in the -classifiers that are at the heart of many NLP components. This is accomplished -by partitioning the features into a sequence of templates which are ordered -such that high confidence can often be reached using only a small fraction of -all features. Parameter estimation is arranged to maximize accuracy and early -confidence in this sequence. Our approach is simpler and better suited to NLP -than other related cascade methods. We present experiments in left-to-right -part-of-speech tagging, named entity recognition, and transition-based -dependency parsing. On the typical benchmarking datasets we can preserve POS -tagging accuracy above 97% and parsing LAS above 88.5% both with over a -five-fold reduction in run-time, and NER F1 above 88 with more than 2x increase -in speed. -" -1769,1505.06228,"Fatma Elghannam, Tarek El-Shishtawy",Keyphrase Based Evaluation of Automatic Text Summarization,cs.CL," The development of methods to deal with the informative contents of the text -units in the matching process is a major challenge in automatic summary -evaluation systems that use fixed n-gram matching. The limitation causes -inaccurate matching between units in a peer and reference summaries. The -present study introduces a new Keyphrase based Summary Evaluator KpEval for -evaluating automatic summaries. The KpEval relies on the keyphrases since they -convey the most important concepts of a text. In the evaluation process, the -keyphrases are used in their lemma form as the matching text unit. The system -was applied to evaluate different summaries of Arabic multi-document data set -presented at TAC2011. The results showed that the new evaluation technique -correlates well with the known evaluation systems: Rouge1, Rouge2, RougeSU4, -and AutoSummENG MeMoG. 
KpEval has the strongest correlation with AutoSummENG -MeMoG, with Pearson and Spearman correlation coefficients of 0.8840 and 0.9667, -respectively. -" -1770,1505.06256,"Tong Shu Li, Benjamin M. Good, Andrew I. Su","Exposing ambiguities in a relation-extraction gold standard with - crowdsourcing",cs.CL q-bio.QM," Semantic relation extraction is one of the frontiers of biomedical natural -language processing research. Gold standards are key tools for advancing this -research. It is challenging to generate these standards because of the high -cost of expert time and the difficulty in establishing agreement between -annotators. We implemented and evaluated a microtask crowdsourcing approach -that can produce a gold standard for extracting drug-disease relations. The -aggregated crowd judgment agreed with expert annotations from a pre-existing -corpus on 43 of 60 sentences tested. The levels of crowd agreement varied in a -similar manner to the levels of agreement among the original expert annotators. -This work reinforces the power of crowdsourcing in the process of assembling -gold standards for relation extraction. Further, it highlights the importance -of exposing the levels of agreement between human annotators, expert or crowd, -in gold standard corpora as these are reproducible signals indicating -ambiguities in the data or in the annotation guidelines. -" -1771,1505.06289,"Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, - Christopher D. Manning",Text to 3D Scene Generation with Rich Lexical Grounding,cs.CL cs.GR," The ability to map descriptions of scenes to 3D geometric representations has -many applications in areas such as art, education, and robotics. However, prior -work on the text to 3D scene generation task has used manually specified object -categories and language that identifies them. We introduce a dataset of 3D -scenes annotated with natural language descriptions and learn from this data -how to ground textual descriptions to physical objects. Our method successfully -grounds a variety of lexical terms to concrete referents, and we show -quantitatively that our method improves 3D scene generation over previous work -using purely rule-based methods. We evaluate the fidelity and plausibility of -3D scenes generated with our grounding approach through human judgments. To -ease evaluation on this task, we also introduce an automated metric that -strongly correlates with human judgments. -" -1772,1505.06294,Dimitri Kartsaklis and Mehrnoosh Sadrzadeh,"A Frobenius Model of Information Structure in Categorical Compositional - Distributional Semantics",cs.CL cs.AI math.CT math.RA," The categorical compositional distributional model of Coecke, Sadrzadeh and -Clark provides a linguistically motivated procedure for computing the meaning -of a sentence as a function of the distributional meaning of the words therein. -The theoretical framework allows for reasoning about compositional aspects of -language and offers structural ways of studying the underlying relationships. -While the model so far has been applied on the level of syntactic structures, a -sentence can bring extra information conveyed in utterances via intonational -means. In the current paper we extend the framework in order to accommodate -this additional information, using Frobenius algebraic structures canonically -induced over the basis of finite-dimensional vector spaces.
We detail the -theory, provide truth-theoretic and distributional semantics for meanings of -intonationally-marked utterances, and present justifications and extensive -examples. -" -1773,1505.06427,Lantian Li and Dong Wang and Zhiyong Zhang and Thomas Fang Zheng,Deep Speaker Vectors for Semi Text-independent Speaker Verification,cs.CL cs.LG cs.NE," Recent research shows that deep neural networks (DNNs) can be used to extract -deep speaker vectors (d-vectors) that preserve speaker characteristics and can -be used in speaker verification. This new method has been tested on -text-dependent speaker verification tasks, and improvement was reported when -combined with the conventional i-vector method. - This paper extends the d-vector approach to semi text-independent speaker -verification tasks, i.e., the text of the speech is in a limited set of short -phrases. We explore various settings of the DNN structure used for d-vector -extraction, and present a phone-dependent training which employs the posterior -features obtained from an ASR system. The experimental results show that it is -possible to apply d-vectors on semi text-independent speaker recognition, and -the phone-dependent training improves system performance. -" -1774,1505.06750,"P. S. Dodds, E. M. Clark, S. Desu, M. R. Frank, A. J. Reagan, J. R. - Williams, L. Mitchell, K. D. Harris, I. M. Kloumann, J. P. Bagrow, K. - Megerdoomian, M. T. McMahon, B. F. Tivnan, and C. M. Danforth","Reply to Garcia et al.: Common mistakes in measuring frequency dependent - word characteristics",physics.soc-ph cs.CL," We demonstrate that the concerns expressed by Garcia et al. are misplaced, -due to (1) a misreading of our findings in [1]; (2) a widespread failure to -examine and present words in support of asserted summary quantities based on -word usage frequencies; and (3) a range of misconceptions about word usage -frequency, word rank, and expert-constructed word lists. In particular, we show -that the English component of our study compares well statistically with two -related surveys, that no survey design influence is apparent, and that -estimates of measurement error do not explain the positivity biases reported in -our work and that of others. We further demonstrate that for the frequency -dependence of positivity---of which we explored the nuances in great detail in -[1]---Garcia et al. did not perform a reanalysis of our data---they instead -carried out an analysis of a different, statistically improper data set and -introduced a nonlinearity before performing linear regression. -" -1775,1505.06816,"I. Beltagy, Stephen Roller, Pengxiang Cheng, Katrin Erk, Raymond J. - Mooney","Representing Meaning with a Combination of Logical and Distributional - Models",cs.CL," NLP tasks differ in the semantic information they require, and at this time -no single semantic representation fulfills all requirements. Logic-based -representations characterize sentence structure, but do not capture the graded -aspect of meaning. Distributional models give graded similarity ratings for -words and phrases, but do not capture sentence structure in the same detail as -logic-based approaches. So it has been argued that the two are complementary. -We adopt a hybrid approach that combines logic-based and distributional -semantics through probabilistic logic inference in Markov Logic Networks -(MLNs).
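Returning briefly to the d-vectors of 1505.06427 above: the standard recipe is to push each acoustic frame through a speaker-discriminative network, take a hidden-layer activation, and average over frames. The sketch below uses a random single-layer "network" as a stand-in for a trained model; all shapes are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(40, 64))           # one hidden layer; 40-dim input features

def frame_embedding(frame):
    return np.maximum(frame @ W, 0.0)   # ReLU activations for one frame

frames = rng.normal(size=(200, 40))     # a couple of seconds of frames (toy)
d_vector = np.mean([frame_embedding(f) for f in frames], axis=0)
d_vector /= np.linalg.norm(d_vector)    # length-normalise before cosine scoring
print(d_vector.shape)                   # (64,)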
In this paper, we focus on the three components of a practical system -integrating logical and distributional models: 1) Parsing and task -representation is the logic-based part where input problems are represented in -probabilistic logic. This is quite different from representing them in standard -first-order logic. 2) For knowledge base construction we form weighted -inference rules. We integrate and compare distributional information with other -sources, notably WordNet and an existing paraphrase collection. In particular, -we use our system to evaluate distributional lexical entailment approaches. We -use a variant of Robinson resolution to determine the necessary inference -rules. More sources can easily be added by mapping them to logical rules; our -system learns a resource-specific weight that corrects for scaling differences -between resources. 3) In discussing probabilistic inference, we show how to -solve the inference problems efficiently. To evaluate our approach, we use the -task of textual entailment (RTE), which can utilize the strengths of both -logic-based and distributional representations. In particular we focus on the -SICK dataset, where we achieve state-of-the-art results. -" -1776,1505.07184,Danushka Bollegala and Takanori Maehara and Ken-ichi Kawarabayashi,Unsupervised Cross-Domain Word Representation Learning,cs.CL," The meaning of a word varies from one domain to another. Despite this important -domain dependence in word semantics, existing word representation learning -methods are bound to a single domain. Given a pair of -\emph{source}-\emph{target} domains, we propose an unsupervised method for -learning domain-specific word representations that accurately capture the -domain-specific aspects of word semantics. First, we select a subset of -frequent words that occur in both domains as \emph{pivots}. Next, we optimize -an objective function that enforces two constraints: (a) for both source and -target domain documents, pivots that appear in a document must accurately -predict the co-occurring non-pivots, and (b) word representations learnt for -pivots must be similar in the two domains. Moreover, we propose a method to -perform domain adaptation using the learnt word representations. Our proposed -method significantly outperforms competitive baselines including the -state-of-the-art domain-insensitive word representations, and reports best -sentiment classification accuracies for all domain-pairs in a benchmark -dataset. -" -1777,1505.07302,Derek Greene and James P. Cross,"Unveiling the Political Agenda of the European Parliament Plenary: A - Topical Analysis",cs.CL cs.CY," This study analyzes political interactions in the European Parliament (EP) by -considering how the political agenda of the plenary sessions has evolved over -time and the manner in which Members of the European Parliament (MEPs) have -reacted to external and internal stimuli when making Parliamentary speeches. It -does so by considering the context in which speeches are made, and the content -of those speeches. To detect latent themes in legislative speeches over time, -speech content is analyzed using a new dynamic topic modeling method, based on -two layers of matrix factorization. This method is applied to a new corpus of -all English language legislative speeches in the EP plenary from the period -1999-2014.
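The "two layers of matrix factorization" just mentioned (1505.07302) can be sketched with off-the-shelf NMF: factorise each time window's document-term matrix into topics, then factorise the stacked window-topic term vectors to link topics across windows. The sketch below uses random stand-in data and scikit-learn; the component counts are arbitrary assumptions, and this is only a schematic of the approach, not the authors' code.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
windows = [np.abs(rng.normal(size=(50, 300))) for _ in range(3)]  # docs x terms

window_topics = []
for X in windows:                       # layer 1: topics within each window
    H = NMF(n_components=5, init="nndsvda", max_iter=300).fit(X).components_
    window_topics.append(H)

B = np.vstack(window_topics)            # stack all window-topic term vectors
dynamic = NMF(n_components=4, init="nndsvda", max_iter=300).fit(B)  # layer 2
assignment = dynamic.transform(B).argmax(axis=1)  # window topic -> dynamic topic
print(assignment)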
Our findings suggest that the political agenda of the EP has evolved -significantly over time, is shaped by the committee structure of the -Parliament, and reacts to exogenous events: EU Treaty referenda and the -emergence of the Euro-crisis have had a significant impact on what is -discussed in Parliament. -" -1778,1505.07599,"Xipeng Qiu, Peng Qian, Liusong Yin, Shiyu Wu, Xuanjing Huang","Overview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and - POS Tagging for Micro-blog Texts",cs.CL," In this paper, we give an overview of the shared task at the 4th CCF -Conference on Natural Language Processing \& Chinese Computing (NLPCC 2015): -Chinese word segmentation and part-of-speech (POS) tagging for micro-blog -texts. Unlike the widely used newswire datasets, the dataset of this -shared task consists of relatively informal micro-texts. The shared task -has two sub-tasks: (1) individual Chinese word segmentation and (2) joint -Chinese word segmentation and POS tagging. Each subtask has three tracks to -distinguish the systems with different resources. We first introduce the -dataset and task, then we characterize the different approaches of the -participating systems, report the test results, and provide an overview analysis -of these results. An online system is available for open registration and -evaluation at http://nlp.fudan.edu.cn/nlpcc2015. -" -1779,1505.07712,Eric Werner,A Category Theory of Communication Theory,cs.IT cs.CL cs.LO math.IT," A theory of how agents can come to understand a language is presented. If -understanding a sentence $\alpha$ is to associate an operator with $\alpha$ -that transforms the representational state of the agent as intended by the -sender, then coming to know a language involves coming to know the operators -that correspond to the meaning of any sentence. This involves a higher order -operator that operates on the possible transformations that operate on the -representational capacity of the agent. We formalize these constructs using -concepts and diagrams analogous to category theory. -" -1780,1505.07909,"Huazheng Wang, Fei Tian, Bin Gao, Jiang Bian, Tie-Yan Liu","Solving Verbal Comprehension Questions in IQ Test by Knowledge-Powered - Word Embedding",cs.CL cs.IR cs.LG," Intelligence Quotient (IQ) Test is a set of standardized questions designed -to evaluate human intelligence. Verbal comprehension questions appear very -frequently in IQ tests, which measure a human's verbal ability, including the -understanding of words with multiple senses, the synonyms and antonyms, and -the analogies among words. In this work, we explore whether such tests can be -solved automatically by artificial intelligence technologies, especially the -deep learning technologies that are recently developed and successfully applied -in a number of fields. However, we found that the task was quite challenging, -and simply applying existing technologies (e.g., word embedding) could not -achieve good performance, mainly due to the multiple senses of words and the -complex relations among words. To tackle these challenges, we propose a novel -framework consisting of three components. First, we build a classifier to -recognize the specific type of a verbal question (e.g., analogy, -classification, synonym, or antonym).
Second, we obtain distributed -representations of words and relations by leveraging a novel word embedding -method that considers the multi-sense nature of words and the relational -knowledge among words (or their senses) contained in dictionaries. Third, for -each type of question, we propose a specific solver based on the obtained -distributed word representations and relation representations. Experimental -results have shown that the proposed framework can not only outperform existing -methods for solving verbal comprehension questions but also exceed the average -performance of the Amazon Mechanical Turk workers involved in the study. The -results indicate that with appropriate use of deep learning technologies, -we might be a step closer to human intelligence. -" -1781,1505.07931,"Xuefeng Yang, Kezhi Mao",Supervised Fine Tuning for Word Embedding with Integrated Knowledge,cs.CL," Learning vector representations for words is an important research field which -may benefit many natural language processing tasks. Two limitations exist in -nearly all available models: the bias caused by the context definition and the -lack of knowledge utilization. They are difficult to tackle -because these algorithms are essentially unsupervised learning approaches. -Inspired by deep learning, the authors propose a supervised framework for -learning vector representations of words to provide additional supervised fine -tuning after unsupervised learning. The framework is a knowledge-rich approach -compatible with any numerical word vector representation. The authors -perform both intrinsic evaluations, like attributional and relational similarity -prediction, and extrinsic evaluations, like sentence completion and sentiment -analysis. Experimental results on 6 embeddings and 4 tasks with 10 datasets show -that the proposed fine-tuning framework may significantly improve the quality -of word vector representations. -" -1782,1505.08075,"Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. - Smith",Transition-Based Dependency Parsing with Stack Long Short-Term Memory,cs.CL cs.LG cs.NE," We propose a technique for learning representations of parser states in -transition-based dependency parsers. Our primary innovation is a new control -structure for sequence-to-sequence neural networks---the stack LSTM. Like the -conventional stack data structures used in transition-based parsing, elements -can be pushed to or popped from the top of the stack in constant time, but, in -addition, an LSTM maintains a continuous space embedding of the stack contents. -This lets us formulate an efficient parsing model that captures three facets of -a parser's state: (i) unbounded look-ahead into the buffer of incoming words, -(ii) the complete history of actions taken by the parser, and (iii) the -complete contents of the stack of partially built tree fragments, including -their internal structures. Standard backpropagation techniques are used for -training and yield state-of-the-art parsing performance. -" -1783,1505.08149,"Michael Kapustin, Pavlo Kapustin","Modeling meaning: computational interpreting and understanding of - natural language fragments",cs.CL," In this introductory article we present the basics of an approach to -implementing computational interpreting of natural language, aiming to model the -meanings of words and phrases. Unlike other approaches, we attempt to define -the meanings of text fragments in a composable and computer-interpretable way.
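The stack LSTM of 1505.08075 above admits a very small sketch: keep an explicit stack of LSTM states so that push is one LSTM step from the current top and pop simply reverts to the previous stored state, both in constant time. Dimensions below are arbitrary, and this is a schematic illustration, not the authors' implementation.

import torch

cell = torch.nn.LSTMCell(input_size=8, hidden_size=16)
stack = [(torch.zeros(1, 16), torch.zeros(1, 16))]  # (h, c) per stack position

def push(x):
    stack.append(cell(x, stack[-1]))    # one LSTM step from the current top

def pop():
    stack.pop()                         # revert to the previous stored state

def summary():
    return stack[-1][0]                 # h at the top embeds the stack contents

push(torch.randn(1, 8))
push(torch.randn(1, 8))
pop()
print(summary().shape)                  # torch.Size([1, 16])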
-We discuss models and ideas for detecting different types of semantic
-incomprehension and for choosing the interpretation that makes the most
-sense in a given context. The knowledge representation is designed for
-handling context-sensitive and uncertain/imprecise knowledge, and for easy
-accommodation of new information. It stores quantitative information
-capturing the essence of the concepts, because such information is crucial
-for natural language understanding and reasoning. Still, the representation
-is general enough to allow new knowledge to be learned, and even generated
-by the system. The article concludes by discussing some reasoning-related
-topics: possible approaches to the generation of new abstract concepts, and
-describing situations and concepts in words (e.g. for specifying
-interpretation difficulties).
-"
-1784,1506.00037,"Gilchan Park, Julia M. Taylor",Using Syntactic Features for Phishing Detection,cs.CL," This paper reports on a comparison of the subjects and objects of verbs in
-their usage between phishing emails and legitimate emails. The purpose of
-this research is to explore whether syntactic structures and the subjects
-and objects of verbs can be distinguishing features for phishing detection.
-To achieve this objective, we conducted two series of experiments: one on
-the syntactic similarity of sentences, and one comparing the subjects and
-objects of verbs. The results of the experiments indicated that both
-features can be used for some verbs, but more work has to be done for
-others.
-"
-1785,1506.00195,Baolin Peng and Kaisheng Yao,"Recurrent Neural Networks with External Memory for Language
- Understanding",cs.CL cs.AI cs.LG cs.NE," Recurrent Neural Networks (RNNs) have become increasingly popular for the
-task of language understanding. In this task, a semantic tagger is deployed
-to associate a semantic label with each word in an input sequence. The
-success of RNNs may be attributed to their ability to memorize long-term
-dependencies that relate the current-time semantic label prediction to
-observations many time instances away. However, the memory capacity of
-simple RNNs is limited because of the gradient vanishing and exploding
-problem. We propose to use an external memory to improve the memorization
-capability of RNNs. We conducted experiments on the ATIS dataset, and
-observed that the proposed model was able to achieve state-of-the-art
-results. We compare our proposed model with alternative models and report
-analysis results that may provide insights for future research.
-"
-1786,1506.00196,Kaisheng Yao and Geoffrey Zweig,"Sequence-to-Sequence Neural Net Models for Grapheme-to-Phoneme
- Conversion",cs.CL," Sequence-to-sequence translation methods based on generation with a
-side-conditioned language model have recently shown promising results in
-several tasks. In machine translation, models conditioned on source-side
-words have been used to produce target-language text, and in image
-captioning, models conditioned on images have been used to generate caption
-text. Past work with this approach has focused on large-vocabulary tasks,
-and measured quality in terms of BLEU. In this paper, we explore the
-applicability of such models to the qualitatively different
-grapheme-to-phoneme task. Here, the input and output side vocabularies are
-small, plain n-gram models do well, and credit is only given when the
-output is exactly correct.
We find that the simple
-side-conditioned generation approach is able to rival the state of the art,
-and we are able to significantly advance the state of the art with
-bi-directional long short-term memory (LSTM) neural networks that use the
-same alignment information that is used in conventional approaches.
-"
-1787,1506.00275,Shashi Narayan and Shay B. Cohen,Diversity in Spectral Learning for Natural Language Parsing,cs.CL," We describe an approach to create a diverse set of predictions with
-spectral learning of latent-variable PCFGs (L-PCFGs). Our approach works by
-creating multiple spectral models where noise is added to the underlying
-features in the training set before the estimation of each model. We
-describe three ways to decode with multiple models. In addition, we describe
-a simple variant of the spectral algorithm for L-PCFGs that is fast and
-leads to compact models. Our experiments for natural language parsing, for
-English and German, show that we get a significant improvement over
-baselines that are comparable to the state of the art. For English, we
-achieve an $F_1$ score of 90.18, and for German we achieve an $F_1$ score
-of 83.38.
-"
-1788,1506.00278,"Licheng Yu, Eunbyung Park, Alexander C. Berg, and Tamara L. Berg","Visual Madlibs: Fill in the blank Image Generation and Question
- Answering",cs.CV cs.CL," In this paper, we introduce a new dataset consisting of 360,001 focused
-natural language descriptions for 10,738 images. This dataset, the Visual
-Madlibs dataset, is collected using automatically produced fill-in-the-blank
-templates designed to gather targeted descriptions about: people and
-objects, their appearances, activities, and interactions, as well as
-inferences about the general scene or its broader context. We provide
-several analyses of the Visual Madlibs dataset and demonstrate its
-applicability to two new description generation tasks: focused description
-generation, and multiple-choice question-answering for images. Experiments
-using joint-embedding and deep learning methods show promising results on
-these tasks.
-"
-1789,1506.00301,"Travis Wolfe, Mark Dredze, James Mayfield, Paul McNamee, Craig Harman,
- Tim Finin, Benjamin Van Durme",Interactive Knowledge Base Population,cs.AI cs.CL," Most work on building knowledge bases has focused on collecting entities
-and facts from as large a collection of documents as possible. We argue for
-and describe a new paradigm where the focus is on high-recall extraction
-over a small collection of documents under the supervision of a human
-expert, which we call Interactive Knowledge Base Population (IKBP).
-"
-1790,1506.00333,"Lin Ma, Zhengdong Lu, Hang Li","Learning to Answer Questions From Image Using Convolutional Neural
- Network",cs.CL cs.CV cs.LG cs.NE," In this paper, we propose to employ the convolutional neural network (CNN)
-for the image question answering (QA) task. Our proposed CNN provides an
-end-to-end framework with convolutional architectures for learning not only
-the image and question representations, but also their inter-modal
-interactions, to produce the answer. More specifically, our model consists
-of three CNNs: one image CNN to encode the image content, one sentence CNN
-to compose the words of the question, and one multimodal convolution layer
-to learn their joint representation for the classification in the space of
-candidate answer words.
-We demonstrate the efficacy of our proposed model on the DAQUAR and COCO-QA
-datasets, which are two benchmark datasets for image QA, with performance
-significantly outperforming the state of the art.
-"
-1791,1506.00379,"Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, Song Liu",Modeling Relation Paths for Representation Learning of Knowledge Bases,cs.CL," Representation learning of knowledge bases (KBs) aims to embed both
-entities and relations into a low-dimensional space. Most existing methods
-only consider direct relations in representation learning. We argue that
-multiple-step relation paths also contain rich inference patterns between
-entities, and propose a path-based representation learning model. This
-model considers relation paths as translations between entities for
-representation learning, and addresses two key challenges: (1) Since not
-all relation paths are reliable, we design a path-constraint resource
-allocation algorithm to measure the reliability of relation paths. (2) We
-represent relation paths via semantic composition of relation embeddings.
-Experimental results on real-world datasets show that, as compared with
-baselines, our model achieves significant and consistent improvements on
-knowledge base completion and relation extraction from text.
-"
-1792,1506.00406,Amir Pouya Aghasadeghi and Mohadeseh Bastan,"Monolingually Derived Phrase Scores for Phrase Based SMT Using Neural
- Networks Vector Representations",cs.CL," In this paper, we propose two new features for estimating phrase-based
-machine translation parameters from mainly monolingual data. Our method is
-based on two recently introduced neural network vector representation
-models for words and sentences. It is the first time that these models have
-been used in an end-to-end phrase-based machine translation system. Scores
-obtained from our method can recover more than 80% of the BLEU loss caused
-by removing phrase table probabilities. We also show that our features
-combined with the phrase table probabilities improve the BLEU score by an
-absolute 0.74 points.
-"
-1793,1506.00468,Michal Lukasik and Trevor Cohn and Kalina Bontcheva,Classifying Tweet Level Judgements of Rumours in Social Media,cs.SI cs.CL cs.LG," Social media is a rich source of rumours and corresponding community
-reactions. Rumours reflect different characteristics, some shared and some
-individual. We formulate the problem of classifying tweet level judgements
-of rumours as a supervised learning task. Both supervised and unsupervised
-domain adaptation are considered, in which tweets from a rumour are
-classified on the basis of other annotated rumours. We demonstrate how
-multi-task learning helps achieve good results on rumours from the 2011
-England riots.
-"
-1794,1506.00528,"Chang Wang, Liangliang Cao, Bowen Zhou",Medical Synonym Extraction with Concept Space Models,cs.CL," In this paper, we present a novel approach for medical synonym extraction.
-We aim to integrate term embedding with medical domain knowledge for
-healthcare applications. One advantage of our method is that it is very
-scalable. Experiments on a dataset with more than 1M term pairs show that
-the proposed approach outperforms the baseline approaches by a large margin.
-"
-1795,1506.00572,"Han-Teng Liao, King-wa Fu, Scott A. Hale","How much is said in a microblog?
A multilingual inquiry based on Weibo - and Twitter",cs.SI cs.CL cs.CY," This paper presents a multilingual study on, per single post of microblog -text, (a) how much can be said, (b) how much is written in terms of characters -and bytes, and (c) how much is said in terms of information content in posts by -different organizations in different languages. Focusing on three different -languages (English, Chinese, and Japanese), this research analyses Weibo and -Twitter accounts of major embassies and news agencies. We first establish our -criterion for quantifying ""how much can be said"" in a digital text based on the -openly available Universal Declaration of Human Rights and the translated -subtitles from TED talks. These parallel corpora allow us to determine the -number of characters and bits needed to represent the same content in different -languages and character encodings. We then derive the amount of information -that is actually contained in microblog posts authored by selected accounts on -Weibo and Twitter. Our results confirm that languages with larger character -sets such as Chinese and Japanese contain more information per character than -English, but the actual information content contained within a microblog text -varies depending on both the type of organization and the language of the post. -We conclude with a discussion on the design implications of microblog text -limits for different languages. -" -1796,1506.00578,William Blacoe,"On Quantum Generalizations of Information-Theoretic Measures and their - Contribution to Distributional Semantics",cs.IT cs.CL math.IT," Information-theoretic measures such as relative entropy and correlation are -extremely useful when modeling or analyzing the interaction of probabilistic -systems. We survey the quantum generalization of 5 such measures and point out -some of their commonalities and interpretations. In particular we find the -application of information theory to distributional semantics useful. By -modeling the distributional meaning of words as density operators rather than -vectors, more of their semantic structure may be exploited. Furthermore, -properties of and interactions between words such as ambiguity, similarity and -entailment can be simulated more richly and intuitively when using methods from -quantum information theory. -" -1797,1506.00698,"Hendra Setiawan, Zhongqiang Huang, Jacob Devlin, Thomas Lamar, Rabih - Zbib, Richard Schwartz and John Makhoul",Statistical Machine Translation Features with Multitask Tensor Networks,cs.CL," We present a three-pronged approach to improving Statistical Machine -Translation (SMT), building on recent success in the application of neural -networks to SMT. First, we propose new features based on neural networks to -model various non-local translation phenomena. Second, we augment the -architecture of the neural network with tensor layers that capture important -higher-order interaction among the network units. Third, we apply multitask -learning to estimate the neural network parameters jointly. Each of our -proposed methods results in significant improvements that are complementary. -The overall improvement is +2.7 and +1.8 BLEU points for Arabic-English and -Chinese-English translation over a state-of-the-art system that already -includes neural network features. 
-" -1798,1506.00765,"Zheng Cai, Donglin Cao, Rongrong Ji",Video (GIF) Sentiment Analysis using Large-Scale Mid-Level Ontology,cs.MM cs.CL cs.IR," With faster connection speed, Internet users are now making social network a -huge reservoir of texts, images and video clips (GIF). Sentiment analysis for -such online platform can be used to predict political elections, evaluates -economic indicators and so on. However, GIF sentiment analysis is quite -challenging, not only because it hinges on spatio-temporal visual -contentabstraction, but also for the relationship between such abstraction and -final sentiment remains unknown.In this paper, we dedicated to find out such -relationship.We proposed a SentiPairSequence basedspatiotemporal visual -sentiment ontology, which forms the midlevel representations for GIFsentiment. -The establishment process of SentiPair contains two steps. First, we construct -the Synset Forest to define the semantic tree structure of visual sentiment -label elements. Then, through theSynset Forest, we organically select and -combine sentiment label elements to form a mid-level visual sentiment -representation. Our experiments indicate that SentiPair outperforms other -competing mid-level attributes. Using SentiPair, our analysis frameworkcan -achieve satisfying prediction accuracy (72.6%). We also opened ourdataset -(GSO-2015) to the research community. GSO-2015 contains more than 6,000 -manually annotated GIFs out of more than 40,000 candidates. Each is labeled -with both sentiment and SentiPair Sequence. -" -1799,1506.00799,Xiangyu Zeng and Shi Yin and Dong Wang,Learning Speech Rate in Speech Recognition,cs.CL cs.LG," A significant performance reduction is often observed in speech recognition -when the rate of speech (ROS) is too low or too high. Most of present -approaches to addressing the ROS variation focus on the change of speech -signals in dynamic properties caused by ROS, and accordingly modify the dynamic -model, e.g., the transition probabilities of the hidden Markov model (HMM). -However, an abnormal ROS changes not only the dynamic but also the static -property of speech signals, and thus can not be compensated for purely by -modifying the dynamic model. This paper proposes an ROS learning approach based -on deep neural networks (DNN), which involves an ROS feature as the input of -the DNN model and so the spectrum distortion caused by ROS can be learned and -compensated for. The experimental results show that this approach can deliver -better performance for too slow and too fast utterances, demonstrating our -conjecture that ROS impacts both the dynamic and the static property of speech. -In addition, the proposed approach can be combined with the conventional HMM -transition adaptation method, offering additional performance gains. -" -1800,1506.00839,"Eug\'enio Ribeiro, Ricardo Ribeiro, David Martins de Matos",The Influence of Context on Dialogue Act Recognition,cs.CL," This article presents an analysis of the influence of context information on -dialog act recognition. We performed experiments on the widely explored -Switchboard corpus, as well as on data annotated according to the recent ISO -24617-2 standard. The latter was obtained from the Tilburg DialogBank and -through the mapping of the annotations of a subset of the Let's Go corpus. We -used a classification approach based on SVMs, which had proved successful in -previous work and allowed us to limit the amount of context information -provided. 
This way, we were able to observe the influence patterns as the
-amount of context information increased. Our base features consisted of
-n-grams, punctuation, and wh-words. Context information was obtained from
-one to five preceding segments and provided either as n-grams or dialog act
-classifications, with the latter typically leading to better results and
-more stable influence patterns. In addition to the conclusions about the
-importance and influence of context information, our experiments on the
-Switchboard corpus also led to results that advanced the state of the art
-on the dialog act recognition task on that corpus. Furthermore, the results
-obtained on data annotated according to the ISO 24617-2 standard define a
-baseline for future work and contribute to the standardization of
-experiments in the area.
-"
-1801,1506.00999,"Alberto Garcia-Duran, Antoine Bordes, Nicolas Usunier, Yves Grandvalet","Combining Two And Three-Way Embeddings Models for Link Prediction in
- Knowledge Bases",cs.AI cs.CL cs.LG," This paper tackles the problem of endogenous link prediction for Knowledge
-Base completion. Knowledge Bases can be represented as directed graphs
-whose nodes correspond to entities and edges to relationships. Previous
-attempts either consist of powerful systems with high capacity to model
-complex connectivity patterns, which unfortunately usually end up
-overfitting on rare relationships, or of approaches that trade capacity for
-simplicity in order to fairly model all relationships, frequent or not. In
-this paper, we propose Tatec, a happy medium obtained by complementing a
-high-capacity model with a simpler one, both pre-trained separately and
-then combined. We present several variants of this model with different
-kinds of regularization and combination strategies and show that this
-approach outperforms existing methods on different types of relationships
-by achieving state-of-the-art results on four benchmarks of the literature.
-"
-1802,1506.01057,"Jiwei Li, Minh-Thang Luong and Dan Jurafsky",A Hierarchical Neural Autoencoder for Paragraphs and Documents,cs.CL," Natural language generation of coherent long texts like paragraphs or
-longer documents is a challenging problem for recurrent network models. In
-this paper, we explore an important step toward this generation task:
-training an LSTM (Long-short term memory) auto-encoder to preserve and
-reconstruct multi-sentence paragraphs. We introduce an LSTM model that
-hierarchically builds an embedding for a paragraph from embeddings for
-sentences and words, then decodes this embedding to reconstruct the
-original paragraph. We evaluate the reconstructed paragraph using standard
-metrics like ROUGE and Entity Grid, showing that neural models are able to
-encode texts in a way that preserves syntactic, semantic, and discourse
-coherence. While only a first step toward generating coherent text units
-from neural models, our work has the potential to significantly impact
-natural language generation and summarization\footnote{Code for the three
-models described in this paper can be found at www.stanford.edu/~jiweil/}.
-"
-1803,1506.01066,"Jiwei Li, Xinlei Chen, Eduard Hovy and Dan Jurafsky",Visualizing and Understanding Neural Models in NLP,cs.CL," While neural networks have been successfully applied to many NLP tasks,
-the resulting vector-based models are very difficult to interpret.
For example, it is
-not clear how they achieve {\em compositionality}, building sentence
-meaning from the meanings of words and phrases. In this paper we describe
-four strategies for visualizing compositionality in neural models for NLP,
-inspired by similar work in computer vision. We first plot unit values to
-visualize the compositionality of negation, intensification, and concessive
-clauses, allowing us to see well-known markedness asymmetries in negation.
-We then introduce three simple and straightforward methods for visualizing
-a unit's {\em salience}, the amount it contributes to the final composed
-meaning: (1) gradient back-propagation, (2) the variance of a token from
-the average word node, and (3) LSTM-style gates that measure information
-flow. We test our methods on sentiment using simple recurrent nets and
-LSTMs. Our general-purpose methods may have wide applications for
-understanding compositionality and other semantic properties of deep
-networks, and also shed light on why LSTMs outperform simple recurrent
-nets.
-"
-1804,1506.01070,Jiwei Li and Dan Jurafsky,Do Multi-Sense Embeddings Improve Natural Language Understanding?,cs.CL," Learning a distinct representation for each sense of an ambiguous word
-could lead to more powerful and fine-grained models of vector-space
-representations. Yet while `multi-sense' methods have been proposed and
-tested on artificial word-similarity tasks, we don't know if they improve
-real natural language understanding tasks. In this paper we introduce a
-multi-sense embedding model based on Chinese Restaurant Processes that
-achieves state-of-the-art performance on matching human word similarity
-judgments, and propose a pipelined architecture for incorporating
-multi-sense embeddings into language understanding.
- We then test the performance of our model on part-of-speech tagging, named
-entity recognition, sentiment analysis, semantic relation identification
-and semantic relatedness, controlling for embedding dimensionality. We find
-that multi-sense embeddings do improve performance on some tasks
-(part-of-speech tagging, semantic relation identification, semantic
-relatedness) but not on others (named entity recognition, various forms of
-sentiment analysis). We discuss how these differences may be caused by the
-different roles of word sense information in each of the tasks. The results
-highlight the importance of testing embedding models in real applications.
-"
-1805,1506.01094,"Kelvin Guu, John Miller, Percy Liang",Traversing Knowledge Graphs in Vector Space,cs.CL cs.AI cs.DB stat.ML," Path queries on a knowledge graph can be used to answer compositional
-questions such as ""What languages are spoken by people living in Lisbon?"".
-However, knowledge graphs often have missing facts (edges), which disrupts
-path queries. Recent models for knowledge base completion impute missing
-facts by embedding knowledge graphs in vector spaces. We show that these
-models can be recursively applied to answer path queries, but that they
-suffer from cascading errors. This motivates a new ""compositional"" training
-objective, which dramatically improves all models' ability to answer path
-queries, in some cases more than doubling accuracy. On a standard knowledge
-base completion task, we also demonstrate that compositional training acts
-as a novel form of structural regularization, reliably improving
-performance across all base models (reducing errors by up to 43%) and
-achieving new state-of-the-art results.
-"
-1806,1506.01171,"Ahmed G. M.
ElSayed, Ahmed S. Salama and Alaa El-Din M. El-Ghazali","A Hybrid Model for Enhancing Lexical Statistical Machine Translation
- (SMT)",cs.CL," Interest in statistical machine translation systems is currently
-increasing due to political and social events in the world. We propose a
-Statistical Machine Translation (SMT)-based model that can be used to
-translate a sentence from the source language (English) to the target
-language (Arabic) automatically by efficiently incorporating different
-statistical and Natural Language Processing (NLP) models, such as the
-language model, alignment model, phrase-based model, reordering model, and
-translation model. These models are combined to enhance the performance of
-statistical machine translation (SMT). Many implementation tools have been
-used in this work, such as Moses, Giza++, IRSTLM, KenLM, and BLEU. Based on
-the implementation and evaluation of this model, and on a comparison of the
-generated translations with those of other machine translation systems such
-as Google Translate, it is shown that the proposed model enhances the
-results of statistical machine translation and forms a reliable and
-efficient model in this field of research.
-"
-1807,1506.01192,"Bo-Hsiang Tseng, Hung-Yi Lee, and Lin-Shan Lee","Personalizing Universal Recurrent Neural Network Language Model with
- User Characteristic Features by Social Network Crowdsourcing",cs.CL cs.LG," With the popularity of mobile devices, personalized speech recognizers
-have become more realizable today and highly attractive. Each mobile device
-is primarily used by a single user, so it is possible to have a
-personalized recognizer that is well matched to the characteristics of the
-individual user. Although acoustic model personalization has been
-investigated for decades, much less work has been reported on personalizing
-the language model, probably because of the difficulties in collecting
-enough personalized corpora. Previous work used corpora collected from
-social networks to solve the problem, but constructing a personalized model
-for each user is troublesome. In this paper, we propose a universal
-recurrent neural network language model with user characteristic features,
-so that all users share the same model, each with different user
-characteristic features. These user characteristic features can be obtained
-by crowdsourcing over social networks, which include huge quantities of
-texts posted by users with known friend relationships, who may share
-subject topics and wording patterns. Preliminary experiments on a Facebook
-corpus showed that the proposed approach not only drastically reduced the
-model perplexity, but also offered very good improvements in recognition
-accuracy in n-best rescoring tests. This approach also mitigated the data
-sparseness problem for personalized language models.
-"
-1808,1506.01273,"Marta Apar\'icio, Paulo Figueiredo, Francisco Raposo, David Martins de
- Matos, Ricardo Ribeiro, Lu\'is Marujo",Summarization of Films and Documentaries Based on Subtitles and Scripts,cs.CL cs.AI cs.IR," We assess the performance of generic text summarization algorithms applied
-to films and documentaries, using the well-known behavior of summarization
-of news articles as reference. We use three datasets: (i) news articles,
-(ii) film scripts and subtitles, and (iii) documentary subtitles. Standard
-ROUGE metrics are used for comparing generated summaries against news
-abstracts, plot summaries, and synopses.
We show that the best performing algorithms are LSA,
-for news articles and documentaries, and LexRank and Support Sets, for
-films. Despite the different nature of films and documentaries, their
-relative behavior is in accordance with that obtained for news articles.
-"
-1809,1506.01597,"Lidong Bing, Piji Li, Yi Liao, Wai Lam, Weiwei Guo, Rebecca J.
- Passonneau","Abstractive Multi-Document Summarization via Phrase Selection and
- Merging",cs.CL cs.AI," We propose an abstraction-based multi-document summarization framework
-that can construct new sentences by exploring more fine-grained syntactic
-units than sentences, namely, noun/verb phrases. Different from existing
-abstraction-based approaches, our method first constructs a pool of
-concepts and facts represented by phrases from the input documents. Then
-new sentences are generated by selecting and merging informative phrases to
-maximize the salience of phrases and meanwhile satisfy the sentence
-construction constraints. We employ integer linear optimization for
-conducting phrase selection and merging simultaneously in order to achieve
-the global optimal solution for a summary. Experimental results on the
-benchmark data set TAC 2011 show that our framework outperforms the
-state-of-the-art models under the automated pyramid evaluation metric, and
-achieves reasonably good results on manual linguistic quality evaluation.
-"
-1810,1506.01698,Anna Rohrbach and Marcus Rohrbach and Bernt Schiele,The Long-Short Story of Movie Description,cs.CV cs.CL," Generating descriptions for videos has many applications, including
-assisting blind people and human-robot interaction. The recent advances in
-image captioning as well as the release of large-scale movie description
-datasets such as MPII Movie Description allow us to study this task in more
-depth. Many of the proposed methods for image captioning rely on
-pre-trained object classifier CNNs and Long-Short Term Memory recurrent
-networks (LSTMs) for generating descriptions. While image description
-focuses on objects, we argue that it is important to distinguish verbs,
-objects, and places in the challenging setting of movie description. In
-this work we show how to learn robust visual classifiers from the weak
-annotations of the sentence descriptions. Based on these visual classifiers
-we learn how to generate a description using an LSTM. We explore different
-design choices to build and train the LSTM and achieve the best performance
-to date on the challenging MPII-MD dataset. We compare and analyze our
-approach and prior work along various dimensions to better understand the
-key challenges of the movie description task.
-"
-1811,1506.01906,"Hossam S. Ibrahim, Sherif M. Abdou and Mervat Gheith","Idioms-Proverbs Lexicon for Modern Standard Arabic and Colloquial
- Sentiment Analysis",cs.CL," Although a fair amount of work has been done on sentiment analysis (SA)
-and opinion mining (OM) systems in the last decade, the performance of
-these systems is still not at the desired level, especially for
-morphologically-rich languages (MRLs) such as Arabic, due to the
-complexities and challenges that exist in the nature of the language
-itself. One of these challenges is the detection of idiom or proverb
-phrases within the writer's text or comment. An idiom or proverb is a form
-of speech or an expression that is peculiar to itself. Grammatically, it
-cannot be understood from the individual meanings of its elements and can
-yield a different sentiment when treated as separate words.
Consequently, in order to facilitate the task of detection and
-classification of lexical phrases for automated SA systems, this paper
-presents AIPSeLEX, a novel idioms/proverbs sentiment lexicon for modern
-standard Arabic (MSA) and colloquial Arabic. AIPSeLEX is manually collected
-and annotated at sentence level with semantic orientation (positive or
-negative). The efforts of manually building and annotating the lexicon are
-reported. Moreover, we build a classifier that extracts idiom and proverb
-phrases from text using n-gram and similarity measure methods. Finally,
-several experiments were carried out on various data, including Arabic
-tweets and Arabic microblogs (hotel reservation, product reviews, and TV
-program comments) from publicly available Arabic online review websites
-(social media, blogs, forums, e-commerce web sites) to evaluate the
-coverage and accuracy of AIPSeLEX.
-"
-1812,1506.01914,"Niklas Laxstr\""om, Pau Giner, Santhosh Thottingal","Content Translation: Computer-assisted translation tool for Wikipedia
- articles",cs.CL," The quality and quantity of articles in each Wikipedia language vary
-greatly. Translating from another Wikipedia is a natural way to add more
-content, but the translation process is not properly supported in the
-software used by Wikipedia. Past computer-assisted translation tools built
-for Wikipedia are not commonly used. We created a tool that adapts to the
-specific needs of an open community and to the kind of content in
-Wikipedia. Qualitative and quantitative data indicate that the new tool
-helps users translate articles more easily and faster.
-"
-1813,1506.02004,"Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, Noah Smith",Sparse Overcomplete Word Vector Representations,cs.CL," Current distributed representations of words show little resemblance to
-theories of lexical semantics. The former are dense and uninterpretable,
-the latter largely based on familiar, discrete classes (e.g., supersenses)
-and relations (e.g., synonymy and hypernymy). We propose methods that
-transform word vectors into sparse (and optionally binary) vectors. The
-resulting representations are more similar to the interpretable features
-typically used in NLP, though they are discovered automatically from raw
-corpora. Because the vectors are highly sparse, they are computationally
-easy to work with. Most importantly, we find that they outperform the
-original vectors on benchmark tasks.
-"
-1814,1506.02075,"Antoine Bordes, Nicolas Usunier, Sumit Chopra, Jason Weston",Large-scale Simple Question Answering with Memory Networks,cs.LG cs.CL," Training large-scale question answering systems is complicated because
-training sources usually cover a small portion of the range of possible
-questions. This paper studies the impact of multitask and transfer learning
-for simple question answering; a setting for which the reasoning required
-to answer is quite easy, as long as one can retrieve the correct evidence
-given a question, which can be difficult in large-scale conditions. To this
-end, we introduce a new dataset of 100k questions that we use in
-conjunction with existing benchmarks. We conduct our study within the
-framework of Memory Networks (Weston et al., 2015) because this perspective
-allows us to eventually scale up to more complex reasoning, and show that
-Memory Networks can be successfully trained to achieve excellent
-performance.
-" -1815,1506.02078,"Andrej Karpathy, Justin Johnson, Li Fei-Fei",Visualizing and Understanding Recurrent Networks,cs.LG cs.CL cs.NE," Recurrent Neural Networks (RNNs), and specifically a variant with Long -Short-Term Memory (LSTM), are enjoying renewed interest as a result of -successful applications in a wide range of machine learning problems that -involve sequential data. However, while LSTMs provide exceptional results in -practice, the source of their performance and their limitations remain rather -poorly understood. Using character-level language models as an interpretable -testbed, we aim to bridge this gap by providing an analysis of their -representations, predictions and error types. In particular, our experiments -reveal the existence of interpretable cells that keep track of long-range -dependencies such as line lengths, quotes and brackets. Moreover, our -comparative analysis with finite horizon n-gram models traces the source of the -LSTM improvements to long-range structural dependencies. Finally, we provide -analysis of the remaining errors and suggests areas for further study. -" -1816,1506.02170,"Megha Rughani, D.Shivakrishna","Hybridized Feature Extraction and Acoustic Modelling Approach for - Dysarthric Speech Recognition",cs.SD cs.CL," Dysarthria is malfunctioning of motor speech caused by faintness in the human -nervous system. It is characterized by the slurred speech along with physical -impairment which restricts their communication and creates the lack of -confidence and affects the lifestyle. This paper attempt to increase the -efficiency of Automatic Speech Recognition (ASR) system for unimpaired speech -signal. It describes state of art of research into improving ASR for speakers -with dysarthria by means of incorporated knowledge of their speech production. -Hybridized approach for feature extraction and acoustic modelling technique -along with evolutionary algorithm is proposed for increasing the efficiency of -the overall system. Here number of feature vectors are varied and tested the -system performance. It is observed that system performance is boosted by -genetic algorithm. System with 16 acoustic features optimized with genetic -algorithm has obtained highest recognition rate of 98.28% with training time of -5:30:17. -" -1817,1506.02275,Umashanthi Pavalanathan and Jacob Eisenstein,Confounds and Consequences in Geotagged Twitter Data,cs.CL," Twitter is often used in quantitative studies that identify -geographically-preferred topics, writing styles, and entities. These studies -rely on either GPS coordinates attached to individual messages, or on the -user-supplied location field in each profile. In this paper, we compare these -data acquisition techniques and quantify the biases that they introduce; we -also measure their effects on linguistic analysis and text-based geolocation. -GPS-tagging and self-reported locations yield measurably different corpora, and -these linguistic differences are partially attributable to differences in -dataset composition by age and gender. Using a latent variable model to induce -age and gender, we show how these demographic variables interact with geography -to affect language use. We also show that the accuracy of text-based -geolocation varies with population demographics, giving the best results for -men above the age of 40. -" -1818,1506.02306,Shibamouli Lahiri,"SQUINKY! 
A Corpus of Sentence-level Formality, Informativeness, and
- Implicature",cs.CL," We introduce a corpus of 7,032 sentences rated by human annotators for
-formality, informativeness, and implicature on a 1-7 scale. The corpus was
-annotated using Amazon Mechanical Turk. Reliability of the obtained
-judgments was examined by comparing mean ratings across two MTurk
-experiments, and correlation with pilot annotations (on sentence formality)
-conducted in a more controlled setting. Despite the subjectivity and
-inherent difficulty of the annotation task, correlations between mean
-ratings were quite encouraging, especially on formality and
-informativeness. We further explored correlations between the three
-linguistic variables, genre-wise variation of ratings and correlations
-within genres, compatibility with automatic stylistic scoring, and the
-sentential make-up of a document in terms of style. To date, our corpus is
-the largest sentence-level annotated corpus released for formality,
-informativeness, and implicature.
-"
-1819,1506.02327,"Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Yuan-ming Liou,
- Yen-Chen Wu, Yen-Ju Lu, Hung-yi Lee and Lin-shan Lee","A Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN) for
- Unsupervised Discovery of Linguistic Units and Generation of High Quality
- Features",cs.CL cs.LG cs.NE," This paper summarizes the work done by the authors for the Zero Resource
-Speech Challenge organized in the technical program of Interspeech 2015.
-The goal of the challenge is to discover linguistic units directly from
-unlabeled speech data. The Multi-layered Acoustic Tokenizer (MAT) proposed
-in this work automatically discovers multiple sets of acoustic tokens from
-the given corpus. Each acoustic token set is specified by a set of
-hyperparameters that describe the model configuration. These sets of
-acoustic tokens carry different characteristics of the given corpus and the
-language behind it, and thus can be mutually reinforced. The multiple sets
-of token labels are then used as the targets of a Multi-target DNN (MDNN)
-trained on low-level acoustic features. Bottleneck features extracted from
-the MDNN are used as feedback for the MAT and the MDNN itself. We call this
-iterative system the Multi-layered Acoustic Tokenizing Deep Neural Network
-(MAT-DNN), which generates high quality features for track 1 of the
-challenge and acoustic tokens for track 2 of the challenge.
-"
-1820,1506.02338,"Andrew Trask, David Gilmore, Matthew Russell",Modeling Order in Neural Word Embeddings at Scale,cs.CL," Natural Language Processing (NLP) systems commonly leverage bag-of-words
-co-occurrence techniques to capture semantic and syntactic word
-relationships. The resulting word-level distributed representations often
-ignore morphological information, though character-level embeddings have
-proven valuable to NLP tasks. We propose a new neural language model
-incorporating both word order and character order in its embedding. The
-model produces several vector spaces with meaningful substructure, as
-evidenced by its performance of 85.8% on a recent word-analogy task,
-exceeding the best published syntactic word-analogy scores by a 58% error
-margin. Furthermore, the model includes several parallel training methods,
-most notably allowing a skip-gram network with 160 billion parameters to be
-trained overnight on 3 multi-core CPUs, 14x larger than the previous
-largest neural network.
-" -1821,1506.02516,"Edward Grefenstette, Karl Moritz Hermann, Mustafa Suleyman, Phil - Blunsom",Learning to Transduce with Unbounded Memory,cs.NE cs.CL cs.LG," Recently, strong results have been demonstrated by Deep Recurrent Neural -Networks on natural language transduction problems. In this paper we explore -the representational power of these models using synthetic grammars designed to -exhibit phenomena similar to those found in real transduction problems such as -machine translation. These experiments lead us to propose new memory-based -recurrent networks that implement continuously differentiable analogues of -traditional data structures such as Stacks, Queues, and DeQues. We show that -these architectures exhibit superior generalisation performance to Deep RNNs -and are often able to learn the underlying generating algorithms in our -transduction experiments. -" -1822,1506.02739,"Hannah Rashkin, Sameer Singh, and Yejin Choi",Connotation Frames: A Data-Driven Investigation,cs.CL," Through a particular choice of a predicate (e.g., ""x violated y""), a writer -can subtly connote a range of implied sentiments and presupposed facts about -the entities x and y: (1) writer's perspective: projecting x as an -""antagonist""and y as a ""victim"", (2) entities' perspective: y probably dislikes -x, (3) effect: something bad happened to y, (4) value: y is something valuable, -and (5) mental state: y is distressed by the event. We introduce connotation -frames as a representation formalism to organize these rich dimensions of -connotation using typed relations. First, we investigate the feasibility of -obtaining connotative labels through crowdsourcing experiments. We then present -models for predicting the connotation frames of verb predicates based on their -distributional word representations and the interplay between different types -of connotative relations. Empirical results confirm that connotation frames can -be induced from various data sources that reflect how people use language and -give rise to the connotative meanings. We conclude with analytical results that -show the potential use of connotation frames for analyzing subtle biases in -online news media. -" -1823,1506.02761,"Shihao Ji, Hyokun Yun, Pinar Yanardag, Shin Matsushima, and S. V. N. - Vishwanathan",WordRank: Learning Word Embeddings via Robust Ranking,cs.CL cs.LG stat.ML," Embedding words in a vector space has gained a lot of attention in recent -years. While state-of-the-art methods provide efficient computation of word -similarities via a low-dimensional matrix embedding, their motivation is often -left unclear. In this paper, we argue that word embedding can be naturally -viewed as a ranking problem due to the ranking nature of the evaluation -metrics. Then, based on this insight, we propose a novel framework WordRank -that efficiently estimates word representations via robust ranking, in which -the attention mechanism and robustness to noise are readily achieved via the -DCG-like ranking losses. The performance of WordRank is measured in word -similarity and word analogy benchmarks, and the results are compared to the -state-of-the-art word embedding techniques. Our algorithm is very competitive -to the state-of-the- arts on large corpora, while outperforms them by a -significant margin when the training set is limited (i.e., sparse and noisy). -With 17 million tokens, WordRank performs almost as well as existing methods -using 7.2 billion tokens on a popular word similarity benchmark. 
Our multi-node
-distributed implementation of WordRank is publicly available for general
-usage.
-"
-1824,1506.02816,"George Gkotsis, Maria Liakata, Carlos Pedrinaci, John Domingue","Leveraging Textual Features for Best Answer Prediction in
- Community-based Question Answering",cs.CL cs.IR," This paper addresses the problem of determining the best answer in
-Community-based Question Answering (CQA) websites by focussing on the
-content. In particular, we present a system, ACQUA
-[http://acqua.kmi.open.ac.uk], that can be installed onto the majority of
-browsers as a plugin. The service offers a seamless and accurate prediction
-of the answer to be accepted. Previous research on this topic relies on the
-exploitation of community feedback on the answers, which involves rating of
-either users (e.g., reputation) or answers (e.g. scores manually assigned
-to answers). We propose a new technique that leverages the content/textual
-features of answers in a novel way. Our approach delivers better results
-than related linguistics-based solutions and manages to match rating-based
-approaches. More specifically, the gain in performance is achieved by
-rendering the values of these features into a discretised form. We also
-show how our technique manages to deliver equally good results in real-time
-settings, as opposed to having to rely on information not always readily
-available, such as user ratings and answer scores. We ran an evaluation on
-21 StackExchange websites covering around 4 million questions and more than
-8 million answers. We obtain 84% average precision and 70% recall, which
-shows that our technique is robust, effective, and widely applicable.
-"
-1825,1506.02922,Dimitra Gkatzia and Helen Hastie,An Ensemble method for Content Selection for Data-to-text Systems,cs.CL cs.AI," We present a novel approach for automatic report generation from
-time-series data, in the context of student feedback generation. Our
-proposed methodology treats content selection as a multi-label
-classification (MLC) problem, which takes as input time-series data
-(students' learning data) and outputs a summary of these data (feedback).
-Unlike previous work, this method considers all data simultaneously using
-ensembles of classifiers, and therefore achieves higher accuracy and
-F-score compared to meaningful baselines.
-"
-1826,1506.03099,"Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer","Scheduled Sampling for Sequence Prediction with Recurrent Neural
- Networks",cs.LG cs.CL cs.CV," Recurrent Neural Networks can be trained to produce sequences of tokens
-given some input, as exemplified by recent results in machine translation
-and image captioning. The current approach to training them consists of
-maximizing the likelihood of each token in the sequence given the current
-(recurrent) state and the previous token. At inference, the unknown
-previous token is then replaced by a token generated by the model itself.
-This discrepancy between training and inference can yield errors that
-accumulate quickly along the generated sequence. We propose a curriculum
-learning strategy to gently change the training process from a fully guided
-scheme using the true previous token, towards a less guided scheme which
-mostly uses the generated token instead. Experiments on several sequence
-prediction tasks show that this approach yields significant improvements.
-Moreover, it was used successfully in our winning entry to the MSCOCO image
-captioning challenge, 2015.
-" -1827,1506.03139,"Keenon Werling, Gabor Angeli, Christopher Manning","Robust Subgraph Generation Improves Abstract Meaning Representation - Parsing",cs.CL," The Abstract Meaning Representation (AMR) is a representation for open-domain -rich semantics, with potential use in fields like event extraction and machine -translation. Node generation, typically done using a simple dictionary lookup, -is currently an important limiting factor in AMR parsing. We propose a small -set of actions that derive AMR subgraphs by transformations on spans of text, -which allows for more robust learning of this stage. Our set of construction -actions generalize better than the previous approach, and can be learned with a -simple classifier. We improve on the previous state-of-the-art result for AMR -parsing, boosting end-to-end performance by 3 F$_1$ on both the LDC2013E117 and -LDC2014T12 datasets. -" -1828,1506.03229,"Bruno Golosio, Angelo Cangelosi, Olesya Gamotina, Giovanni Luca Masala","A cognitive neural architecture able to learn and communicate through - natural language",cs.CL," Communicative interactions involve a kind of procedural knowledge that is -used by the human brain for processing verbal and nonverbal inputs and for -language production. Although considerable work has been done on modeling human -language abilities, it has been difficult to bring them together to a -comprehensive tabula rasa system compatible with current knowledge of how -verbal information is processed in the brain. This work presents a cognitive -system, entirely based on a large-scale neural architecture, which was -developed to shed light on the procedural knowledge involved in language -elaboration. The main component of this system is the central executive, which -is a supervising system that coordinates the other components of the working -memory. In our model, the central executive is a neural network that takes as -input the neural activation states of the short-term memory and yields as -output mental actions, which control the flow of information among the working -memory components through neural gating mechanisms. The proposed system is -capable of learning to communicate through natural language starting from -tabula rasa, without any a priori knowledge of the structure of phrases, -meaning of words, role of the different classes of words, only by interacting -with a human through a text-based interface, using an open-ended incremental -learning process. It is able to learn nouns, verbs, adjectives, pronouns and -other word classes, and to use them in expressive language. The model was -validated on a corpus of 1587 input sentences, based on literature on early -language assessment, at the level of about 4-years old child, and produced 521 -output sentences, expressing a broad range of language processing -functionalities. -" -1829,1506.03257,Borja Navarro-Colorado and Estela Saquete,"Combining Temporal Information and Topic Modeling for Cross-Document - Event Ordering",cs.CL," Building unified timelines from a collection of written news articles -requires cross-document event coreference resolution and temporal relation -extraction. In this paper we present an approach event coreference resolution -according to: a) similar temporal information, and b) similar semantic -arguments. Temporal information is detected using an automatic temporal -information system (TIPSem), while semantic information is represented by means -of LDA Topic Modeling. 
The evaluation of our approach shows that it obtains the
-highest Micro-average F-score results in the SemEval2015 Task 4: TimeLine:
-Cross-Document Event Ordering (25.36\% for TrackB, 23.15\% for SubtrackB),
-with an improvement of up to 6\% in comparison to the other systems.
-However, our experiments also revealed some drawbacks in the Topic Modeling
-approach that degrade the performance of the system.
-"
-1830,1506.03340,"Karl Moritz Hermann, Tom\'a\v{s} Ko\v{c}isk\'y, Edward Grefenstette,
- Lasse Espeholt, Will Kay, Mustafa Suleyman and Phil Blunsom",Teaching Machines to Read and Comprehend,cs.CL cs.AI cs.NE," Teaching machines to read natural language documents remains an elusive
-challenge. Machine reading systems can be tested on their ability to answer
-questions posed on the contents of documents that they have seen, but until
-now large scale training and test datasets have been missing for this type
-of evaluation. In this work we define a new methodology that resolves this
-bottleneck and provides large scale supervised reading comprehension data.
-This allows us to develop a class of attention based deep neural networks
-that learn to read real documents and answer complex questions with minimal
-prior knowledge of language structure.
-"
-1831,1506.03487,"John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu, and Dan Roth",From Paraphrase Database to Compositional Paraphrase Model and Back,cs.CL," The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive
-semantic resource, consisting of a list of phrase pairs with (heuristic)
-confidence estimates. However, it is still unclear how it can best be used,
-due to the heuristic nature of the confidences and its necessarily
-incomplete coverage. We propose models to leverage the phrase pairs from
-the PPDB to build parametric paraphrase models that score paraphrase pairs
-more accurately than the PPDB's internal scores while simultaneously
-improving its coverage. They allow for learning phrase embeddings as well
-as improved word embeddings. Moreover, we introduce two new, manually
-annotated datasets to evaluate short-phrase paraphrasing models. Using our
-paraphrase model trained using PPDB, we achieve state-of-the-art results on
-standard word and bigram similarity tasks and beat strong baselines on our
-new short phrase paraphrase tasks.
-"
-1832,1506.03500,"Angeliki Lazaridou, Dat Tien Nguyen, Raffaella Bernardi, Marco Baroni","Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image
- Generation",cs.CV cs.CL," We introduce language-driven image generation, the task of generating an
-image visualizing the semantic contents of a word embedding, e.g., given
-the word embedding of grasshopper, we generate a natural image of a
-grasshopper. We implement a simple method based on two mapping functions.
-The first takes as input a word embedding (as produced, e.g., by the
-word2vec toolkit) and maps it onto a high-level visual space (e.g., the
-space defined by one of the top layers of a Convolutional Neural Network).
-The second function maps this abstract visual representation to pixel
-space, in order to generate the target image. Several user studies suggest
-that the current system produces images that capture general visual
-properties of the concepts encoded in the word embedding, such as color or
-typical environment, and are sufficient to discriminate between general
-categories of objects.
-" -1833,1506.03694,Grzegorz Chrupa{\l}a and \'Akos K\'ad\'ar and Afra Alishahi,Learning language through pictures,cs.CL," We propose Imaginet, a model of learning visually grounded representations of -language from coupled textual and visual input. The model consists of two Gated -Recurrent Unit networks with shared word embeddings, and uses a multi-task -objective by receiving a textual description of a scene and trying to -concurrently predict its visual representation and the next word in the -sentence. Mimicking an important aspect of human language learning, it acquires -meaning representations for individual words from descriptions of visual -scenes. Moreover, it learns to effectively use sequential structure in semantic -interpretation of multi-word phrases. -" -1834,1506.03775,"Prakhar Biyani, Cornelia Caragea, Narayan Bhamidipati",Entity-Specific Sentiment Classification of Yahoo News Comments,cs.CL cs.IR cs.SI," Sentiment classification is widely used for product reviews and in online -social media such as forums, Twitter, and blogs. However, the problem of -classifying the sentiment of user comments on news sites has not been addressed -yet. News sites cover a wide range of domains including politics, sports, -technology, and entertainment, in contrast to other online social sites such as -forums and review sites, which are specific to a particular domain. A user -associated with a news site is likely to post comments on diverse topics (e.g., -politics, smartphones, and sports) or diverse entities (e.g., Obama, iPhone, or -Google). Classifying the sentiment of users tied to various entities may help -obtain a holistic view of their personality, which could be useful in -applications such as online advertising, content personalization, and political -campaign planning. In this paper, we formulate the problem of entity-specific -sentiment classification of comments posted on news articles in Yahoo News and -propose novel features that are specific to news comments. Experimental results -show that our models outperform state-of-the-art baselines. -" -1835,1506.04089,"Hongyuan Mei, Mohit Bansal, Matthew R. Walter","Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to - Action Sequences",cs.CL cs.AI cs.LG cs.NE cs.RO," We propose a neural sequence-to-sequence model for direction following, a -task that is essential to realizing effective autonomous agents. Our -alignment-based encoder-decoder model with long short-term memory recurrent -neural networks (LSTM-RNN) translates natural language instructions to action -sequences based upon a representation of the observable world state. We -introduce a multi-level aligner that empowers our model to focus on sentence -""regions"" salient to the current world state by using multiple abstractions of -the input sentence. In contrast to existing methods, our model uses no -specialized linguistic resources (e.g., parsers) or task-specific annotations -(e.g., seed lexicons). It is therefore generalizable, yet still achieves the -best results reported to-date on a benchmark single-sentence dataset and -competitive results for the limited-training multi-sentence setting. We analyze -our model through a series of ablations that elucidate the contributions of the -primary components of our model. -" -1836,1506.04147,"Jacob Andreas, Maxim Rabinovich, Dan Klein, Michael I. 
Jordan",On the accuracy of self-normalized log-linear models,stat.ML cs.CL cs.LG stat.ME," Calculation of the log-normalizer is a major computational obstacle in -applications of log-linear models with large output spaces. The problem of fast -normalizer computation has therefore attracted significant attention in the -theoretical and applied machine learning literature. In this paper, we analyze -a recently proposed technique known as ""self-normalization"", which introduces a -regularization term in training to penalize log normalizers for deviating from -zero. This makes it possible to use unnormalized model scores as approximate -probabilities. Empirical evidence suggests that self-normalization is extremely -effective, but a theoretical understanding of why it should work, and how -generally it can be applied, is largely lacking. We prove generalization bounds -on the estimated variance of normalizers and upper bounds on the loss in -accuracy due to self-normalization, describe classes of input distributions -that self-normalize easily, and construct explicit examples of high-variance -input distributions. Our theoretical results make predictions about the -difficulty of fitting self-normalized models to several classes of -distributions, and we conclude with empirical validation of these predictions. -" -1837,1506.04228,"Grigor Iliev, Nadezhda Borisova, Elena Karashtranova, Dafina - Kostadinova",A Publicly Available Cross-Platform Lemmatizer for Bulgarian,cs.CL," Our dictionary-based lemmatizer for the Bulgarian language presented here is -distributed as free software, publicly available to download and use under the -GPL v3 license. The presented software is written entirely in Java and is -distributed as a GATE plugin. To our best knowledge, at the time of writing -this article, there are not any other free lemmatization tools specifically -targeting the Bulgarian language. The presented lemmatizer is a work in -progress and currently yields an accuracy of about 95% in comparison to the -manually annotated corpus BulTreeBank-Morph, which contains 273933 tokens. -" -1838,1506.04229,"Elena Karashtranova, Grigor Iliev, Nadezhda Borisova, Yana Chankova, - Irena Atanasova",Evaluation of the Accuracy of the BGLemmatizer,cs.CL," This paper reveals the results of an analysis of the accuracy of developed -software for automatic lemmatization for the Bulgarian language. This -lemmatization software is written entirely in Java and is distributed as a GATE -plugin. Certain statistical methods are used to define the accuracy of this -software. The results of the analysis show 95% lemmatization accuracy. -" -1839,1506.04334,Jan Buys and Phil Blunsom,A Bayesian Model for Generative Transition-based Dependency Parsing,cs.CL," We propose a simple, scalable, fully generative model for transition-based -dependency parsing with high accuracy. The model, parameterized by Hierarchical -Pitman-Yor Processes, overcomes the limitations of previous generative models -by allowing fast and accurate inference. We propose an efficient decoding -algorithm based on particle filtering that can adapt the beam size to the -uncertainty in the model while jointly predicting POS tags and parse trees. The -UAS of the parser is on par with that of a greedy discriminative baseline. As a -language model, it obtains better perplexity than a n-gram model by performing -semi-supervised learning over a large unlabelled corpus. 
We show that the model -is able to generate locally and syntactically coherent sentences, opening the -door to further applications in language generation. -" -1840,1506.04365,"Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen",Leveraging Word Embeddings for Spoken Document Summarization,cs.CL cs.AI," Owing to the rapidly growing multimedia content available on the Internet, -extractive spoken document summarization, with the purpose of automatically -selecting a set of representative sentences from a spoken document to concisely -express the most important theme of the document, has been an active area of -research and experimentation. On the other hand, word embedding has emerged as -a new favorite research subject because of its excellent performance in many -natural language processing (NLP)-related tasks. However, as far as we are -aware, there are relatively few studies investigating its use in extractive -text or speech summarization. A common thread of leveraging word embeddings in -the summarization process is to represent the document (or sentence) by -averaging the word embeddings of the words occurring in the document (or -sentence). Then, intuitively, the cosine similarity measure can be employed to -determine the relevance degree between a pair of representations. Beyond the -continued efforts made to improve the representation of words, this paper -focuses on building novel and efficient ranking models based on the general -word embedding methods for extractive speech summarization. Experimental -results demonstrate the effectiveness of our proposed methods, compared to -existing state-of-the-art methods. -" -1841,1506.04488,"Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, Zhi Jin",Distilling Word Embeddings: An Encoding Approach,cs.CL cs.LG," Distilling knowledge from a well-trained cumbersome network to a small one -has recently become a new research topic, as lightweight neural networks with -high performance are particularly needed in various resource-restricted -systems. This paper addresses the problem of distilling word embeddings for NLP -tasks. We propose an encoding approach to distill task-specific knowledge from -a set of high-dimensional embeddings, which can reduce model complexity by a -large margin as well as retain high accuracy, showing a good compromise between -efficiency and performance. Experiments in two tasks reveal the phenomenon that -distilling knowledge from cumbersome embeddings is better than directly -training neural networks with small embeddings. -" -1842,1506.04744,"Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, Cristian - Danescu-Niculescu-Mizil","Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy - Game",cs.CL cs.AI cs.SI physics.soc-ph stat.ML," Interpersonal relations are fickle, with close friendships often dissolving -into enmity. In this work, we explore linguistic cues that presage such -transitions by studying dyadic interactions in an online strategy game where -players form alliances and break those alliances through betrayal. We -characterize friendships that are unlikely to last and examine temporal -patterns that foretell betrayal. - We reveal that subtle signs of imminent betrayal are encoded in the -conversational patterns of the dyad, even if the victim is not aware of the -relationship's fate. In particular, we find that lasting friendships exhibit a -form of balance that manifests itself through language.
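A quick illustrative sketch of the representation scheme described in the spoken document summarization abstract above (record 1840, 1506.04365): a document or sentence vector is the average of its word embeddings, and cosine similarity gives the relevance degree. This is a reconstruction under assumptions, not the authors' code; the tiny embedding table and tokens are hypothetical.

import numpy as np

# Hypothetical pre-trained word embeddings (in practice, e.g., word2vec vectors).
emb = {
    "spoken": np.array([0.2, 0.7, 0.1]),
    "document": np.array([0.5, 0.1, 0.4]),
    "summary": np.array([0.4, 0.3, 0.3]),
}

def avg_embedding(tokens):
    # Represent a document (or sentence) by averaging the embeddings of the
    # words occurring in it; out-of-vocabulary words are simply skipped.
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    # Cosine similarity as the relevance degree between two representations.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

doc = avg_embedding("spoken document summary".split())
sent = avg_embedding("document summary".split())
print(cosine(doc, sent))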
In contrast, sudden -changes in the balance of certain conversational attributes---such as positive -sentiment, politeness, or focus on future planning---signal impending betrayal. -" -1843,1506.04803,"Afshin Rahimi, Duy Vu, Trevor Cohn, and Timothy Baldwin","Exploiting Text and Network Context for Geolocation of Social Media - Users",cs.CL cs.SI," Research on automatically geolocating social media users has conventionally -been based on the text content of posts from a given user or the social network -of the user, with very little crossover between the two, and no benchmarking -of the two approaches over comparable datasets. We bring the two threads of -research together in first proposing a text-based method based on adaptive -grids, followed by a hybrid network- and text-based method. Evaluating over -three Twitter datasets, we show that the empirical difference between text- and -network-based methods is not great, and that hybridisation of the two is -superior to the component methods, especially in contexts where the user graph -is not well connected. We achieve state-of-the-art results on all three -datasets. -" -1844,1506.04828,"T. V. Ananthapadmanabha, A. G. Ramakrishnan, and Shubham Sharma","Significance of the levels of spectral valleys with application to - front/back distinction of vowel sounds",cs.CL cs.SD," An objective critical distance (OCD) has been defined as that spacing between -adjacent formants, when the level of the valley between them reaches the mean -spectral level. The measured OCD lies in the same range (viz., 3-3.5 bark) as -the critical distance determined by subjective experiments for similar -experimental conditions. The level of spectral valley serves a purpose similar -to that of the spacing between the formants with an added advantage that it can -be measured from the spectral envelope without an explicit knowledge of formant -frequencies. Based on the relative spacing of formant frequencies, the level of -the spectral valley, VI (between F1 and F2) is much higher than the level of -VII (spectral valley between F2 and F3) for back vowels and vice-versa for -front vowels. Classification of vowels into front/back distinction with the -difference (VI-VII) as an acoustic feature, tested using TIMIT, NTIMIT, Tamil -and Kannada language databases gives, on the average, an accuracy of about 95%, -which is comparable to the accuracy (90.6%) obtained using a neural network -classifier trained and tested using MFCC as the feature vector for TIMIT -database. The acoustic feature (VI-VII) has also been tested for its robustness -on the TIMIT database for additive white and babble noise and an accuracy of -about 95% has been obtained for SNRs down to 25 dB for both types of noise. -" -1845,1506.04834,"Samuel R. Bowman, Christopher D. Manning, and Christopher Potts","Tree-structured composition in neural networks without tree-structured - architectures",cs.CL cs.LG," Tree-structured neural networks encode a particular tree geometry for a -sentence in the network design. However, these models have at best only -slightly outperformed simpler sequence-based models. We hypothesize that neural -sequence models like LSTMs are in fact able to discover and implicitly use -recursive compositional structure, at least for tasks with clear cues to that -structure in the data.
We demonstrate this possibility using an artificial data -task for which recursive compositional structure is crucial, and find an -LSTM-based sequence model can indeed learn to exploit the underlying tree -structure. However, its performance consistently lags behind that of tree -models, even on large training sets, suggesting that tree-structured models are -more effective at exploiting recursive structure. -" -1846,1506.04891,Douglas Bagnall,Author Identification using Multi-headed Recurrent Neural Networks,cs.CL cs.LG cs.NE," Recurrent neural networks (RNNs) are very good at modelling the flow of text, -but typically need to be trained on a far larger corpus than is available for -the PAN 2015 Author Identification task. This paper describes a novel approach -where the output layer of a character-level RNN language model is split into -several independent predictive sub-models, each representing an author, while -the recurrent layer is shared by all. This allows the recurrent layer to model -the language as a whole without over-fitting, while the outputs select aspects -of the underlying model that reflect their author's style. The method proves -competitive, ranking first in two of the four languages. -" -1847,1506.04897,Rudolf Rosa,Parsing Natural Language Sentences by Semi-supervised Methods,cs.CL," We present our work on semi-supervised parsing of natural language sentences, -focusing on multi-source crosslingual transfer of delexicalized dependency -parsers. We first evaluate the influence of treebank annotation styles on -parsing performance, focusing on adposition attachment style. Then, we present -KLcpos3, an empirical language similarity measure, designed and tuned for -source parser weighting in multi-source delexicalized parser transfer. And -finally, we introduce a novel resource combination method, based on -interpolation of trained parser models. -" -1848,1506.04940,"Xi Ma, Xiaoxi Wang, Dong Wang, Zhiyong Zhang",Recognize Foreign Low-Frequency Words with Similar Pairs,cs.CL," Low-frequency words pose a major challenge for automatic speech recognition -(ASR). The probabilities of these words, which are often important named -entities, are generally under-estimated by the language model (LM) due to their -limited occurrences in the training data. Recently, we proposed a word-pair -approach to deal with the problem, which borrows information from frequent words -to enhance the probabilities of low-frequency words. This paper presents an -extension to the word-pair method by involving multiple `predicting words' to -produce better estimation for low-frequency words. We also employ this approach -to deal with out-of-language words in the task of multi-lingual speech -recognition. -" -1849,1506.05012,"Adit Jamdar, Jessica Abraham, Karishma Khanna and Rahul Dubey",Emotion Analysis of Songs Based on Lyrical and Audio Features,cs.CL cs.AI cs.SD," In this paper, a method is proposed to detect the emotion of a song based on -its lyrical and audio features. Lyrical features are generated by segmentation -of lyrics during the process of data extraction. ANEW and WordNet knowledge is -then incorporated to compute Valence and Arousal values. In addition to this, -linguistic association rules are applied to ensure that the issue of ambiguity -is properly addressed. Audio features are used to supplement the lyrical ones -and include attributes like energy, tempo, and danceability. These features are -extracted from The Echo Nest, a widely used music intelligence platform.
-Construction of training and test sets is done on the basis of social tags -extracted from the last.fm website. The classification is done by applying -feature weighting and stepwise threshold reduction on the k-Nearest Neighbors -algorithm to provide fuzziness in the classification. -" -1850,1506.05230,"Manaal Faruqui, Chris Dyer",Non-distributional Word Vector Representations,cs.CL," Data-driven representation learning for words is a technique of central -importance in NLP. While indisputably useful as a source of features in -downstream tasks, such vectors tend to consist of uninterpretable components -whose relationship to the categories of traditional lexical semantic theories -is tenuous at best. We present a method for constructing interpretable word -vectors from hand-crafted linguistic resources like WordNet, FrameNet etc. -These vectors are binary (i.e., contain only 0 and 1) and are 99.9% sparse. We -analyze their performance on state-of-the-art evaluation methods for -distributional models of word vectors and find they are competitive with standard -distributional approaches. -" -1851,1506.05402,"Iana Atanassova, Marc Bertin, Philipp Mayr","Editorial for the First Workshop on Mining Scientific Papers: - Computational Linguistics and Bibliometrics",cs.CL cs.DL cs.IR," The workshop ""Mining Scientific Papers: Computational Linguistics and -Bibliometrics"" (CLBib 2015), co-located with the 15th International Society of -Scientometrics and Informetrics Conference (ISSI 2015), brought together -researchers in Bibliometrics and Computational Linguistics in order to study -the ways Bibliometrics can benefit from large-scale text analytics and sense -mining of scientific papers, thus exploring the interdisciplinarity of -Bibliometrics and Natural Language Processing (NLP). The goals of the workshop -were to answer questions like: How can we enhance author network analysis and -Bibliometrics using data obtained by text analytics? What insights can NLP -provide on the structure of scientific writing, on citation networks, and on -in-text citation analysis? This workshop is the first step to foster the -reflection on the interdisciplinarity and the benefits that the two disciplines -Bibliometrics and Natural Language Processing can derive from it. -" -1852,1506.05514,"Ubai Sandouk, Ke Chen","Learning Contextualized Semantics from Co-occurring Terms via a Siamese - Architecture",cs.IR cs.CL cs.LG," One of the biggest challenges in Multimedia information retrieval and -understanding is to bridge the semantic gap by properly modeling concept -semantics in context. The presence of out of vocabulary (OOV) concepts -exacerbates this difficulty. To address the semantic gap issues, we formulate a -problem on learning contextualized semantics from descriptive terms and propose -a novel Siamese architecture to model the contextualized semantics from -descriptive terms. By means of pattern aggregation and probabilistic topic -models, our Siamese architecture captures contextualized semantics from the -co-occurring descriptive terms via unsupervised learning, which leads to a -concept embedding space of the terms in context. Furthermore, the co-occurring -OOV concepts can be easily represented in the learnt concept embedding space. -The main properties of the concept embedding space are demonstrated via -visualization.
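A minimal sketch of the feature-weighted k-Nearest Neighbors step named in the song emotion abstract above (record 1849, 1506.05012). The feature values, weights and labels are invented for illustration, and the paper's stepwise threshold reduction is omitted.

import numpy as np
from collections import Counter

def weighted_knn(train_X, train_y, x, k, feature_w):
    # Feature-weighted k-NN: scale each feature before computing distances,
    # then take a majority vote among the k nearest training examples.
    d = np.linalg.norm((train_X - x) * feature_w, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical per-song features: [valence, arousal, tempo, energy].
X = np.array([[0.8, 0.7, 0.9, 0.8],   # labeled joy
              [0.2, 0.3, 0.3, 0.2],   # labeled sadness
              [0.3, 0.9, 0.8, 0.9]])  # labeled anger
y = np.array(["joy", "sadness", "anger"])
w = np.array([1.0, 1.0, 0.5, 0.5])    # lyrical features weighted higher
print(weighted_knn(X, y, np.array([0.7, 0.6, 0.8, 0.7]), k=1, feature_w=w))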
Using various settings in semantic priming, we have carried out -a thorough evaluation by comparing our approach to a number of state-of-the-art -methods on six annotation corpora in different domains, i.e., MagTag5K, CAL500 -and Million Song Dataset in the music domain as well as Corel5K, LabelMe and -SUNDatabase in the image domain. Experimental results on semantic priming -suggest that our approach outperforms those state-of-the-art methods -considerably in various aspects. -" -1853,1506.05561,Richard Moot (LaBRI),Comparing and evaluating extended Lambek calculi,cs.CL cs.LO," Lambek's Syntactic Calculus, commonly referred to as the Lambek calculus, was -innovative in many ways, notably as a precursor of linear logic. But it also -showed that we could treat our grammatical framework as a logic (as opposed to -a logical theory). However, though it was successful in giving at least a basic -treatment of many linguistic phenomena, it was also clear that a slightly more -expressive logical calculus was needed for many other cases. Therefore, many -extensions and variants of the Lambek calculus have been proposed, since the -eighties and up until the present day. As a result, there is now a large class -of calculi, each with its own empirical successes and theoretical results, but -also each with its own logical primitives. This raises the question: how do we -compare and evaluate these different logical formalisms? To answer this -question, I present two unifying frameworks for these extended Lambek calculi. -Both are proof net calculi with graph contraction criteria. The first calculus -is a very general system: you specify the structure of your sequents and it -gives you the connectives and contractions which correspond to it. The calculus -can be extended with structural rules, which translate directly into graph -rewrite rules. The second calculus is first-order (multiplicative -intuitionistic) linear logic, which turns out to have several other, -independently proposed extensions of the Lambek calculus as fragments. I will -illustrate the use of each calculus in building bridges between analyses -proposed in different frameworks, in highlighting differences and in helping to -identify problems. -" -1854,1506.05676,"Jiri Marsik (SEMAGRAMME), Maxime Amblard (SEMAGRAMME)",Pragmatic Side Effects,cs.CL," In the quest to give a formal compositional semantics to natural languages, -semanticists have started turning their attention to phenomena that have been -also considered as parts of pragmatics (e.g., discourse anaphora and -presupposition projection). To account for these phenomena, the very kinds of -meanings assigned to words and phrases are often revisited. To be more -specific, in the prevalent paradigm of modeling natural language denotations -using the simply-typed lambda calculus (higher-order logic) this means -revisiting the types of denotations assigned to individual parts of speech. -However, the lambda calculus also serves as a fundamental theory of -computation, and in the study of computation, similar type shifts have been -employed to give a meaning to side effects. Side effects in programming -languages correspond to actions that go beyond the lexical scope of an -expression (a thrown exception might propagate throughout a program, a variable -modified at one point might later be read at another) or even beyond the -scope of the program itself (a program might interact with the outside world by -e.g., printing documents, making sounds, operating robotic limbs...).
-" -1855,1506.05702,Diego R. Amancio,Comparing the writing style of real and artificial papers,cs.CL," Recent years have witnessed the increase of competition in science. While -promoting the quality of research in many cases, an intense competition among -scientists can also trigger unethical scientific behaviors. To increase the -total number of published papers, some authors even resort to software tools -that are able to produce grammatical, but meaningless scientific manuscripts. -Because automatically generated papers can be misunderstood as real papers, it -becomes of paramount importance to develop means to identify these scientific -frauds. In this paper, I devise a methodology to distinguish real manuscripts -from those generated with SCIGen, an automatic paper generator. Upon modeling -texts as complex networks (CN), it was possible to discriminate real from fake -papers with at least 89\% of accuracy. A systematic analysis of features -relevance revealed that the accessibility and betweenness were useful in -particular cases, even though the relevance depended upon the dataset. The -successful application of the methods described here show, as a proof of -principle, that network features can be used to identify scientific gibberish -papers. In addition, the CN-based approach can be combined in a straightforward -fashion with traditional statistical language processing methods to improve the -performance in identifying artificially generated papers. -" -1856,1506.05703,R\'emi Lebret and Ronan Collobert,"""The Sum of Its Parts"": Joint Learning of Word and Phrase - Representations with Autoencoders",cs.CL," Recently, there has been a lot of effort to represent words in continuous -vector spaces. Those representations have been shown to capture both semantic -and syntactic information about words. However, distributed representations of -phrases remain a challenge. We introduce a novel model that jointly learns word -vector representations and their summation. Word representations are learnt -using the word co-occurrence statistical information. To embed sequences of -words (i.e. phrases) with different sizes into a common semantic space, we -propose to average word vector representations. In contrast with previous -methods which reported a posteriori some compositionality aspects by simple -summation, we simultaneously train words to sum, while keeping the maximum -information from the original vectors. We evaluate the quality of the word -representations on several classical word evaluation tasks, and we introduce a -novel task to evaluate the quality of the phrase representations. While our -distributed representations compete with other methods of learning word -representations on word evaluations, we show that they give better performance -on the phrase evaluation. Such representations of phrases could be interesting -for many tasks in natural language processing. -" -1857,1506.05865,"Baotian Hu, Qingcai Chen, Fangze Zhu",LCSTS: A Large Scale Chinese Short Text Summarization Dataset,cs.CL cs.IR cs.LG," Automatic text summarization is widely regarded as the highly difficult -problem, partially because of the lack of large text summarization data set. -Due to the great challenge of constructing the large scale summaries for full -text, in this paper, we introduce a large corpus of Chinese short text -summarization dataset constructed from the Chinese microblogging website Sina -Weibo, which is released to the public -{http://icrc.hitsz.edu.cn/Article/show/139.html}. 
This corpus consists of over -2 million real Chinese short texts with short summaries given by the author of -each text. We also manually tagged the relevance of 10,666 short summaries with -their corresponding short texts. Based on the corpus, we introduce a recurrent -neural network for summary generation and achieve promising results, which -not only shows the usefulness of the proposed corpus for short text -summarization research, but also provides a baseline for further research on -this topic. -" -1858,1506.05869,"Oriol Vinyals, Quoc Le",A Neural Conversational Model,cs.CL," Conversational modeling is an important task in natural language -understanding and machine intelligence. Although previous approaches exist, -they are often restricted to specific domains (e.g., booking an airline ticket) -and require hand-crafted rules. In this paper, we present a simple approach for -this task which uses the recently proposed sequence to sequence framework. Our -model converses by predicting the next sentence given the previous sentence or -sentences in a conversation. The strength of our model is that it can be -trained end-to-end and thus requires much fewer hand-crafted rules. We find -that this straightforward model can generate simple conversations given a large -conversational training dataset. Our preliminary results suggest that, despite -optimizing the wrong objective function, the model is able to converse well. It -is able to extract knowledge from both a domain specific dataset, and from a -large, noisy, and general domain dataset of movie subtitles. On a -domain-specific IT helpdesk dataset, the model can find a solution to a -technical problem via conversations. On a noisy open-domain movie transcript -dataset, the model can perform simple forms of common sense reasoning. As -expected, we also find that the lack of consistency is a common failure mode of -our model. -" -1859,1506.06158,"David Weiss, Chris Alberti, Michael Collins, Slav Petrov",Structured Training for Neural Network Transition-Based Parsing,cs.CL," We present structured perceptron training for neural network transition-based -dependency parsing. We learn the neural network representation using a gold -corpus augmented by a large number of automatically parsed sentences. Given -this fixed network representation, we learn a final layer using the structured -perceptron with beam-search decoding. On the Penn Treebank, our parser reaches -94.26% unlabeled and 92.41% labeled attachment accuracy, which to our knowledge -is the best accuracy on Stanford Dependencies to date. We also provide in-depth -ablative analysis to determine which aspects of our model provide the largest -gains in accuracy. -" -1860,1506.06418,"Raphael Hoffmann, Luke Zettlemoyer, Daniel S. Weld",Extreme Extraction: Only One Hour per Relation,cs.CL cs.AI cs.IR," Information Extraction (IE) aims to automatically generate a large knowledge -base from natural language text, but progress remains slow. Supervised learning -requires copious human annotation, while unsupervised and weakly supervised -approaches do not deliver competitive accuracy. As a result, most fielded -applications of IE, as well as the leading TAC-KBP systems, rely on significant -amounts of manual engineering. Even ""Extreme"" methods, such as those reported -in Freedman et al. 2011, require about 10 hours of expert labor per relation. - This paper shows how to reduce that effort by an order of magnitude.
We -present a novel system, InstaRead, that streamlines authoring with an ensemble -of methods: 1) encoding extraction rules in an expressive and compositional -representation, 2) guiding the user to promising rules based on corpus -statistics and mined resources, and 3) introducing a new interactive -development cycle that provides immediate feedback --- even on large datasets. -Experiments show that experts can create quality extractors in under an hour -and even NLP novices can author good extractors. These extractors equal or -outperform ones obtained by comparably supervised and state-of-the-art -distantly supervised approaches. -" -1861,1506.06442,"Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, and Qun Liu",A Deep Memory-based Architecture for Sequence-to-Sequence Learning,cs.CL cs.LG cs.NE," We propose DEEPMEMORY, a novel deep architecture for sequence-to-sequence -learning, which performs the task through a series of nonlinear transformations -from the representation of the input sequence (e.g., a Chinese sentence) to the -final output sequence (e.g., translation to English). Inspired by the recently -proposed Neural Turing Machine (Graves et al., 2014), we store the intermediate -representations in stacked layers of memories, and use read-write operations on -the memories to realize the nonlinear transformations between the -representations. The types of transformations are designed in advance but the -parameters are learned from data. Through layer-by-layer transformations, -DEEPMEMORY can model complicated relations between sequences necessary for -applications such as machine translation between distant languages. The -architecture can be trained with normal back-propagation on sequence-to-sequence -data, and the learning can be easily scaled up to a large corpus. DEEPMEMORY is -broad enough to subsume the state-of-the-art neural translation model in -(Bahdanau et al., 2015) as its special case, while significantly improving upon -the model with its deeper architecture. Remarkably, DEEPMEMORY, being purely -neural network-based, can achieve performance comparable to the traditional -phrase-based machine translation system Moses with a small vocabulary and a -modest parameter size. -" -1862,1506.06490,"Xiaoqiang Zhou, Baotian Hu, Qingcai Chen, Buzhou Tang, Xiaolong Wang","Answer Sequence Learning with Neural Networks for Answer Selection in - Community Question Answering",cs.CL cs.IR cs.LG," In this paper, the answer selection problem in community question answering -(CQA) is regarded as an answer sequence labeling task, and a novel approach is -proposed based on the recurrent architecture for this problem. Our approach -applies convolutional neural networks (CNNs) to learn the joint representation -of each question-answer pair first, and then uses the joint representation as -input of the long short-term memory (LSTM) to learn the answer sequence of a -question for labeling the matching quality of each answer. Experiments -conducted on the SemEval 2015 CQA dataset show the effectiveness of our -approach. -" -1863,1506.06534,"Esma Balkir, Mehrnoosh Sadrzadeh and Bob Coecke",Distributional Sentence Entailment Using Density Matrices,cs.CL cs.IT cs.LO math.CT math.IT," The categorical compositional distributional model of Coecke et al. (2010) -suggests a way to combine grammatical composition of the formal, type logical -models with the corpus based, empirical word representations of distributional -semantics.
This paper contributes to the project by expanding the model to also -capture entailment relations. This is achieved by extending the representations -of words from points in meaning space to density operators, which are -probability distributions on the subspaces of the space. A symmetric measure of -similarity and an asymmetric measure of entailment are defined, where lexical -entailment is measured using von Neumann entropy, the quantum variant of -Kullback-Leibler divergence. Lexical entailment, combined with the composition -map on word representations, provides a method to obtain entailment relations -on the level of sentences. Truth-theoretic and corpus-based examples are -provided. -" -1864,1506.06646,"Tadahiro Taniguchi, Ryo Nakashima, and Shogo Nagasaka","Nonparametric Bayesian Double Articulation Analyzer for Direct Language - Acquisition from Continuous Speech Signals",cs.AI cs.CL cs.LG stat.ML," Human infants can discover words directly from unsegmented speech signals -without any explicitly labeled data. In this paper, we develop a novel machine -learning method called nonparametric Bayesian double articulation analyzer -(NPB-DAA) that can directly acquire language and acoustic models from observed -continuous speech signals. For this purpose, we propose an integrative -generative model that combines a language model and an acoustic model into a -single generative model called the ""hierarchical Dirichlet process hidden -language model"" (HDP-HLM). The HDP-HLM is obtained by extending the -hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by -Johnson et al. An inference procedure for the HDP-HLM is derived using the -blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure -enables the simultaneous and direct inference of language and acoustic models -from continuous speech signals. Based on the HDP-HLM and its inference -procedure, we developed a novel double articulation analyzer. By assuming -HDP-HLM as a generative model of observed time series data, and by inferring -latent variables of the model, the method can analyze latent double -articulation structure, i.e., hierarchically organized latent words and -phonemes, of the data in an unsupervised manner. The novel unsupervised double -articulation analyzer is called NPB-DAA. - The NPB-DAA can automatically estimate double articulation structure embedded -in speech signals. We also carried out two evaluation experiments using -synthetic data and actual human continuous speech signals representing Japanese -vowel sequences. In the word acquisition and phoneme categorization tasks, the -NPB-DAA outperformed a conventional double articulation analyzer (DAA) and -a baseline automatic speech recognition system whose acoustic model was trained -in a supervised manner. -" -1865,1506.06714,"Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, - Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, Bill Dolan","A Neural Network Approach to Context-Sensitive Generation of - Conversational Responses",cs.CL cs.AI cs.LG cs.NE," We present a novel response generation system that can be trained end to end -on large quantities of unstructured Twitter conversations. A neural network -architecture is used to address sparsity issues that arise when integrating -contextual information into classic statistical models, allowing the system to -take into account previous dialog utterances.
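For the density matrix abstract above (record 1863, 1506.06534), a small sketch of the core quantities under toy assumptions: a word's density operator is built as a mixture of projectors onto (hypothetical) context vectors, and the von Neumann entropy the abstract mentions is computed from its eigenvalues. This illustrates the definitions only, not the paper's entailment measure or implementation.

import numpy as np

def density_matrix(vectors):
    # Mix equal-weight rank-1 projectors of normalized vectors into a
    # density operator (symmetric, positive semi-definite, trace one).
    rho = sum(np.outer(v, v) / (v @ v) for v in vectors)
    return rho / np.trace(rho)

def von_neumann_entropy(rho):
    # S(rho) = -sum_i lambda_i log lambda_i over the nonzero eigenvalues.
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

# Hypothetical context vectors: a specific term vs. a broader term.
dog = density_matrix([np.array([1.0, 0.1, 0.0])])
animal = density_matrix([np.array([1.0, 0.0, 0.0]),
                         np.array([0.0, 1.0, 0.0])])
print(von_neumann_entropy(dog), von_neumann_entropy(animal))
# The broader term mixes more contexts, so its entropy is higher (about ln 2).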
Our dynamic-context generative -models show consistent gains over both context-sensitive and -non-context-sensitive Machine Translation and Information Retrieval baselines. -" -1866,1506.06724,"Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel - Urtasun, Antonio Torralba, Sanja Fidler","Aligning Books and Movies: Towards Story-like Visual Explanations by - Watching Movies and Reading Books",cs.CV cs.CL," Books are a rich source of both fine-grained information, what a character, an -object or a scene looks like, as well as high-level semantics, what someone is -thinking, feeling and how these states evolve through a story. This paper aims -to align books to their movie releases in order to provide rich descriptive -explanations for visual content that go semantically far beyond the captions -available in current datasets. To align movies and books we exploit a neural -sentence embedding that is trained in an unsupervised way from a large corpus -of books, as well as a video-text neural embedding for computing similarities -between movie clips and sentences in the book. We propose a context-aware CNN -to combine information from multiple sources. We demonstrate good quantitative -performance for movie/book alignment and show several qualitative examples that -showcase the diversity of tasks our model can be used for. -" -1867,1506.06726,"Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio - Torralba, Raquel Urtasun, Sanja Fidler",Skip-Thought Vectors,cs.CL cs.LG," We describe an approach for unsupervised learning of a generic, distributed -sentence encoder. Using the continuity of text from books, we train an -encoder-decoder model that tries to reconstruct the surrounding sentences of an -encoded passage. Sentences that share semantic and syntactic properties are -thus mapped to similar vector representations. We next introduce a simple -vocabulary expansion method to encode words that were not seen as part of -training, allowing us to expand our vocabulary to a million words. After -training our model, we extract and evaluate our vectors with linear models on 8 -tasks: semantic relatedness, paraphrase detection, image-sentence ranking, -question-type classification and 4 benchmark sentiment and subjectivity -datasets. The end result is an off-the-shelf encoder that can produce highly -generic sentence representations that are robust and perform well in practice. -We will make our encoder publicly available. -" -1868,1506.06832,"Assel Davletcharova, Sherin Sugathan, Bibia Abraham, Alex Pappachen - James",Detection and Analysis of Emotion From Speech Signals,cs.SD cs.CL cs.HC," Recognizing emotion from speech has become one of the active research themes in -speech processing and in applications based on human-computer interaction. This -paper conducts an experimental study on recognizing emotions from human speech. -The emotions considered for the experiments include neutral, anger, joy and -sadness. The distinguishability of emotional features in speech was studied -first, followed by emotion classification performed on a custom dataset. The -classification was performed with different classifiers. One of the main feature -attributes considered in the prepared dataset was the peak-to-peak distance -obtained from the graphical representation of the speech signals.
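The peak-to-peak feature just mentioned (record 1868, 1506.06832) is simple to compute from framed audio; a sketch follows, with a hypothetical frame size, hop size and synthetic signal standing in for the paper's recordings.

import numpy as np

def peak_to_peak_features(signal, frame_len=400, hop=160):
    # Frame the signal and take each frame's peak-to-peak amplitude
    # (max minus min), a crude descriptor of local dynamics.
    return np.array([np.ptp(signal[s:s + frame_len])
                     for s in range(0, len(signal) - frame_len + 1, hop)])

# Synthetic 1-second signal at 16 kHz: a 200 Hz tone with a louder burst.
sr = 16000
t = np.arange(sr) / sr
sig = 0.3 * np.sin(2 * np.pi * 200 * t)
sig[8000:10000] *= 3.0  # segment with larger amplitude swings
feats = peak_to_peak_features(sig)
print(feats.min(), feats.max())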
After -performing the classification tests on a dataset formed from 30 different -subjects, it was found that, to get better accuracy, one should consider -data collected from one person rather than data from a -group of people. -" -1869,1506.06833,"Francis Ferraro, Nasrin Mostafazadeh, Ting-Hao (Kenneth) Huang, Lucy - Vanderwende, Jacob Devlin, Michel Galley, Margaret Mitchell",A Survey of Current Datasets for Vision and Language Research,cs.CL cs.AI cs.CV," Integrating vision and language has long been a dream in work on artificial -intelligence (AI). In the past two years, we have witnessed an explosion of -work that brings together vision and language from images to videos and beyond. -The available corpora have played a crucial role in advancing this area of -research. In this paper, we propose a set of quality metrics for evaluating and -analyzing the vision & language datasets and categorize them accordingly. Our -analyses show that the most recent datasets have been using more complex -language and more abstract concepts; however, each has different strengths and -weaknesses. -" -1870,1506.06863,"Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, - Michael Auli, Chris Quirk, Margaret Mitchell, Jianfeng Gao, Bill Dolan","deltaBLEU: A Discriminative Metric for Generation Tasks with - Intrinsically Diverse Targets",cs.CL," We introduce Discriminative BLEU (deltaBLEU), a novel metric for intrinsic -evaluation of generated text in tasks that admit a diverse range of possible -outputs. Reference strings are scored for quality by human raters on a scale of -[-1, +1] to weight multi-reference BLEU. In tasks involving generation of -conversational responses, deltaBLEU correlates reasonably with human judgments -and outperforms sentence-level and IBM BLEU in terms of both Spearman's rho and -Kendall's tau. -" -1871,1506.06904,"Kim Song Jon, An Hae Gum","New Approach to translation of Isolated Units in English-Korean Machine - Translation",cs.CL," Developing a practicable machine translation system and introducing it into -translation practice is the most effective way to quickly translate the -tremendous amount of explosively increasing scientific and technical -information. This essay treats problems arising from the translation of isolated units -on the basis of the practical materials and experiments obtained in the -development and introduction of an English-Korean machine translation system. In -other words, this essay considers the establishment of information for isolated -units, their Korean equivalents, and word order. -" -1872,1506.07190,"Nikola Mrk\v{s}i\'c, Diarmuid \'O S\'eaghdha, Blaise Thomson, Milica - Ga\v{s}i\'c, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen and Steve Young",Multi-domain Dialog State Tracking using Recurrent Neural Networks,cs.CL cs.LG," Dialog state tracking is a key component of many modern dialog systems, most -of which are designed with a single, well-defined domain in mind. This paper -shows that dialog data drawn from different dialog domains can be used to train -a general belief tracking model which can operate across all of these domains, -exhibiting superior performance to each of the domain-specific models. We -propose a training procedure which uses out-of-domain data to initialise belief -tracking models for entirely new domains. This procedure leads to improvements -in belief tracking performance regardless of the amount of in-domain data -available for training the model.
-" -1873,1506.07220,Yangtuo Peng and Hui Jiang,"Leverage Financial News to Predict Stock Price Movements Using Word - Embeddings and Deep Neural Networks",cs.CE cs.AI cs.CL," Financial news contains useful information on public companies and the -market. In this paper we apply the popular word embedding methods and deep -neural networks to leverage financial news to predict stock price movements in -the market. Experimental results have shown that our proposed methods are -simple but very effective, which can significantly improve the stock prediction -accuracy on a standard financial database over the baseline system using only -the historical price information. -" -1874,1506.07285,"Ankit Kumar and Ozan Irsoy and Peter Ondruska and Mohit Iyyer and - James Bradbury and Ishaan Gulrajani and Victor Zhong and Romain Paulus and - Richard Socher",Ask Me Anything: Dynamic Memory Networks for Natural Language Processing,cs.CL cs.LG cs.NE," Most tasks in natural language processing can be cast into question answering -(QA) problems over language input. We introduce the dynamic memory network -(DMN), a neural network architecture which processes input sequences and -questions, forms episodic memories, and generates relevant answers. Questions -trigger an iterative attention process which allows the model to condition its -attention on the inputs and the result of previous iterations. These results -are then reasoned over in a hierarchical recurrent sequence model to generate -answers. The DMN can be trained end-to-end and obtains state-of-the-art results -on several types of tasks and datasets: question answering (Facebook's bAbI -dataset), text classification for sentiment analysis (Stanford Sentiment -Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The -training for these different tasks relies exclusively on trained word vector -representations and input-question-answer triplets. -" -1875,1506.07477,Jiatao Gu and Victor O.K. Li,Efficient Learning for Undirected Topic Models,cs.LG cs.CL cs.IR stat.ML," Replicated Softmax model, a well-known undirected topic model, is powerful in -extracting semantic representations of documents. Traditional learning -strategies such as Contrastive Divergence are very inefficient. This paper -provides a novel estimator to speed up the learning based on Noise Contrastive -Estimate, extended for documents of variant lengths and weighted inputs. -Experiments on two benchmarks show that the new estimator achieves great -learning efficiency and high accuracy on document retrieval and classification. -" -1876,1506.07503,"Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, - Yoshua Bengio",Attention-Based Models for Speech Recognition,cs.CL cs.LG cs.NE stat.ML," Recurrent sequence generators conditioned on input data through an attention -mechanism have recently shown very good performance on a range of tasks in- -cluding machine translation, handwriting synthesis and image caption gen- -eration. We extend the attention-mechanism with features needed for speech -recognition. We show that while an adaptation of the model used for machine -translation in reaches a competitive 18.7% phoneme error rate (PER) on the -TIMIT phoneme recognition task, it can only be applied to utterances which are -roughly as long as the ones it was trained on. We offer a qualitative -explanation of this failure and propose a novel and generic method of adding -location-awareness to the attention mechanism to alleviate this issue. 
The new -method yields a model that is robust to long inputs and achieves 18% PER in -single utterances and 20% in 10-times longer (repeated) utterances. Finally, we -propose a change to the attention mechanism that prevents it from -concentrating too much on single frames, which further reduces PER to 17.6%. -" -1877,1506.07650,"Kun Xu, Yansong Feng, Songfang Huang, Dongyan Zhao","Semantic Relation Classification via Convolutional Neural Networks with - Simple Negative Sampling",cs.CL cs.LG," Syntactic features play an essential role in identifying relationships in a -sentence. Previous neural network models often suffer from irrelevant -information introduced when subjects and objects are in a long distance. In -this paper, we propose to learn more robust relation representations from the -shortest dependency path through a convolutional neural network. We further -propose a straightforward negative sampling strategy to improve the assignment -of subjects and objects. Experimental results show that our method outperforms -the state-of-the-art methods on the SemEval-2010 Task 8 dataset. -" -1878,1506.07732,"Nicolas Bourgeois (SAMM), Marie Cottrell (SAMM), Benjamin D\'eruelle - (LAMOP), St\'ephane Lamass\'e (LAMOP), Patrick Letr\'emy (SAMM)","How to improve robustness in Kohonen maps and display additional - information in Factorial Analysis: application to text mining",math.ST cs.CL stat.TH," This article is an extended version of a paper presented in the WSOM'2012 -conference [1]. We display a combination of factorial projections, SOM -algorithm and graph techniques applied to a text mining problem. The corpus -contains 8 medieval manuscripts which were used to teach arithmetic techniques -to merchants. Among the techniques for Data Analysis, those used for -Lexicometry (such as Factorial Analysis) highlight the discrepancies between -manuscripts. The reason for this is that they focus on the deviation from the -independence between words and manuscripts. Still, we also want to discover and -characterize the common vocabulary among the whole corpus. Using the properties -of stochastic Kohonen maps, which define neighborhood between inputs in a -non-deterministic way, we highlight the words which seem to play a special role -in the vocabulary. We call them fickle and use them to improve both Kohonen map -robustness and significance of FCA visualization. Finally, we use graph -algorithms to exploit this fickleness for the classification of words. -" -1879,1506.08052,"Carlo Combi, Riccardo Lora, Ugo Moretti, Marco Pagliarini, Margherita - Zorzi",Automagically encoding Adverse Drug Reactions in MedDRA,cs.CL," Pharmacovigilance is the field of science devoted to the collection, analysis -and prevention of Adverse Drug Reactions (ADRs). Efficient strategies for the -extraction of information about ADRs from free text resources are essential to -support the work of experts, employed in the crucial task of detecting and -classifying unexpected pathologies possibly related to drug assumptions. -Narrative ADR descriptions may be collected in several ways, e.g. by monitoring -social networks or through the so-called spontaneous reporting, the main method -pharmacovigilance adopts in order to identify ADRs. The encoding of free-text -ADR descriptions according to MedDRA standard terminology is central for report -analysis. It is a complex task, which has to be manually implemented by the -pharmacovigilance experts. The manual encoding is expensive (in terms of time).
-Moreover, a problem with the accuracy of the encoding may occur, since the -number of reports is growing day by day. In this paper, we propose -MagiCoder, an efficient Natural Language Processing algorithm able to -automatically derive MedDRA terminologies from free-text ADR descriptions. -MagiCoder is part of VigiWork, a web application for online ADR reporting and -analysis. From a practical viewpoint, MagiCoder radically reduces the revision -time of ADR reports: the pharmacologist simply has to revise and validate the -automatic solution, versus the hard task of choosing solutions among the 70k terms -of MedDRA. This improvement in the experts' work efficiency has a meaningful -impact on the quality of data analysis. Moreover, our procedure is general -purpose. We developed MagiCoder for the Italian pharmacovigilance language, but -preliminary analyses show that it is robust to language and dictionary -changes. -" -1880,1506.08126,"Dragomir Radev and Amanda Stent and Joel Tetreault and Aasish Pappu - and Aikaterini Iliakopoulou and Agustin Chanfreau and Paloma de Juan and - Jordi Vallmitjana and Alejandro Jaimes and Rahul Jha and Bob Mankoff","Humor in Collective Discourse: Unsupervised Funniness Detection in the - New Yorker Cartoon Caption Contest",cs.CL cs.AI cs.MM stat.ML," The New Yorker publishes a weekly captionless cartoon. More than 5,000 -readers submit captions for it. The editors select three of them and ask the -readers to pick the funniest one. We describe an experiment that compares a -dozen automatic methods for selecting the funniest caption. We show that -negative sentiment, human-centeredness, and lexical centrality most strongly -match the funniest captions, followed by positive sentiment. These results are -useful for understanding humor and also in the design of more engaging -conversational agents in text and multimodal (vision+text) systems. As part of -this work, a large set of cartoons and captions is being made available to the -community. -" -1881,1506.08259,"Afshin Rahimi, Trevor Cohn, and Timothy Baldwin","Twitter User Geolocation Using a Unified Text and Network Prediction - Model",cs.CL cs.SI," We propose a label propagation approach to geolocation prediction based on -Modified Adsorption, with two enhancements: (1) the removal of ""celebrity"" nodes -to increase location homophily and boost tractability, and (2) the incorporation -of text-based geolocation priors for test users. Experiments over three Twitter -benchmark datasets achieve state-of-the-art results, and demonstrate the -effectiveness of the enhancements. -" -1882,1506.08349,Lantian Li and Yiye Lin and Zhiyong Zhang and Dong Wang,"Improved Deep Speaker Feature Learning for Text-Dependent Speaker - Recognition",cs.CL cs.LG cs.NE," A deep learning approach has been proposed recently to derive speaker -identities (d-vectors) by a deep neural network (DNN). This approach has been -applied to text-dependent speaker recognition tasks and shows reasonable -performance gains when combined with the conventional i-vector approach. -Although promising, the existing d-vector implementation still cannot compete -with the i-vector baseline. This paper presents two improvements for the deep -learning approach: a phone-dependent DNN structure to normalize phone variation, -and a new scoring approach based on dynamic time warping (DTW).
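The dynamic time warping scoring just named (record 1882, 1506.08349) rests on the classic DTW recurrence; a self-contained sketch, with made-up one-dimensional frame features standing in for per-frame d-vectors, and none of the paper's enrollment or scoring details.

import numpy as np

def dtw_distance(a, b):
    # Classic DTW between two feature sequences (rows = frames) with
    # Euclidean local cost and the standard unit step pattern.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two hypothetical utterances of the same pass-phrase at different speeds.
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([[0.0], [1.0], [1.0], [2.0], [3.0]])
print(dtw_distance(x, y))  # 0.0: the slower utterance aligns exactly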
Experiments on -a text-dependent speaker recognition task demonstrated that the proposed -methods can provide considerable performance improvement over the existing -d-vector implementation. -" -1883,1506.08422,Li-Qiang Niu and Xin-Yu Dai,Topic2Vec: Learning Distributed Representations of Topics,cs.CL cs.LG," Latent Dirichlet Allocation (LDA), which mines the thematic structure of -documents, plays an important role in natural language processing and machine -learning. However, the probability distribution from LDA only describes the -statistical relationship of occurrences in the corpus, and in practice, -probability is usually not the best choice for feature representations. Recently, -embedding methods have been proposed to represent words and documents by -learning essential concepts and representations, such as Word2Vec and Doc2Vec. -The embedded representations have shown more effectiveness than LDA-style -representations in many tasks. In this paper, we propose the Topic2Vec approach -which can learn topic representations in the same semantic vector space with -words, as an alternative to probability. The experimental results show that -Topic2Vec achieves interesting and meaningful results. -" -1884,1506.08454,"Vijil Chenthamarakshan, Prasad M Desphande, Raghu Krishnapuram, - Ramakrishna Varadarajan, Knut Stolze","WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Visual - Information Extraction",cs.CL cs.DB cs.IR," The visual layout of a webpage can provide valuable clues for certain types -of Information Extraction (IE) tasks. In traditional rule based IE frameworks, -these layout cues are mapped to rules that operate on the HTML source of the -webpages. In contrast, we have developed a framework in which the rules can be -specified directly at the layout level. This has many advantages, since the -higher level of abstraction leads to simpler extraction rules that are largely -independent of the source code of the page, and, therefore, more robust. It can -also enable specification of new types of rules that are not otherwise -possible. To the best of our knowledge, there is no general framework that -allows declarative specification of information extraction rules based on -spatial layout. Our framework is complementary to the traditional text based rules -framework and allows a seamless combination of spatial layout based rules with -traditional text based rules. We describe the algebra that enables such a -system and its efficient implementation using standard relational and text -indexing features of a relational database. We demonstrate the simplicity and -efficiency of this system for a task involving the extraction of software -system requirements from software product pages. -" -1885,1506.08663,Massimo Piattelli-Palmarini and Giuseppe Vitiello,Linguistics and some aspects of its underlying dynamics,cs.CL quant-ph," In recent years, central components of a new approach to linguistics, the -Minimalist Program (MP), have come closer to physics. Features of the Minimalist -Program, such as the unconstrained nature of recursive Merge, the operation of -the Labeling Algorithm that only operates at the interface of Narrow Syntax -with the Conceptual-Intentional and the Sensory-Motor interfaces, the -difference between pronounced and un-pronounced copies of elements in a -sentence and the build-up of the Fibonacci sequence in the syntactic derivation -of sentence structures, are directly accessible to representation in terms of -algebraic formalism.
Although in our scheme linguistic structures are classical -ones, we find that an interesting and productive isomorphism can be established -between the MP structure, algebraic structures and many-body field theory, -opening new avenues of inquiry on the dynamics underlying some central aspects -of linguistics. -" -1886,1506.08789,Najla Al-Saati and Raghda Abdul-Jaleel,Requirement Tracing using Term Extraction,cs.SE cs.CL cs.IR," Requirements traceability is an essential step in ensuring the quality of -software during the early stages of its development life cycle. Requirements -tracing usually consists of document parsing, candidate link generation and -evaluation, and traceability analysis. This paper demonstrates the applicability -of Statistical Term Extraction metrics to generate candidate links. It is -applied and validated using two data sets and four types of filters, two for -each data set: 0.2 and 0.25 for MODIS, and 0 and 0.05 for CM1. This method -generates requirements traceability matrices between textual requirements -artifacts (such as high-level requirements traced to low-level requirements). -The proposed method includes ten word frequency metrics divided into three main -groups for calculating the frequency of terms. The results show that the -proposed method gives better results when compared with the traditional TF-IDF -method. -" -1887,1506.08909,"Ryan Lowe, Nissan Pow, Iulian Serban, Joelle Pineau","The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured - Multi-Turn Dialogue Systems",cs.CL cs.AI cs.LG cs.NE," This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost -1 million multi-turn dialogues, with a total of over 7 million utterances and -100 million words. This provides a unique resource for research into building -dialogue managers based on neural language models that can make use of large -amounts of unlabeled data. The dataset has both the multi-turn property of -conversations in the Dialog State Tracking Challenge datasets, and the -unstructured nature of interactions from microblog services such as Twitter. We -also describe two neural learning architectures suitable for analyzing this -dataset, and provide benchmark performance on the task of selecting the best -next response. -" -1888,1506.08941,"Karthik Narasimhan, Tejas Kulkarni and Regina Barzilay","Language Understanding for Text-based Games Using Deep Reinforcement - Learning",cs.CL cs.AI," In this paper, we consider the task of learning control policies for -text-based games. In these games, all interactions in the virtual world are -through text and the underlying state is not observed. The resulting language -barrier makes such environments challenging for automatic game players. We -employ a deep reinforcement learning framework to jointly learn state -representations and action policies using game rewards as feedback. This -framework enables us to map text descriptions into vector representations that -capture the semantics of the game states. We evaluate our approach on two game -worlds, comparing against baselines using bag-of-words and bag-of-bigrams for -state representations. Our algorithm outperforms the baselines on both worlds -demonstrating the importance of learning expressive representations. -" -1889,1506.09107,Diego R. Amancio,A complex network approach to stylometry,cs.CL," Statistical methods have been widely employed to study the fundamental -properties of language.
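Candidate link generation of the kind described in the requirements tracing abstract above (record 1886, 1506.08789) is often bootstrapped from a TF-IDF baseline like the one the paper compares against; a sketch follows, with hypothetical requirement texts and an illustrative filter threshold, not the paper's ten proposed metrics.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical high-level and low-level requirement texts.
high = ["The system shall log all user authentication attempts.",
        "The system shall export reports in PDF format."]
low = ["Every authentication attempt is written to the log.",
       "Reports are exported as PDF documents."]

vec = TfidfVectorizer()
M = vec.fit_transform(high + low)
# Similarity of every high-level requirement to every low-level one.
sims = cosine_similarity(M[:len(high)], M[len(high):])

threshold = 0.05  # illustrative filter value, in the spirit of the CM1 setting
for i, row in enumerate(sims):
    for j, score in enumerate(row):
        if score >= threshold:
            print(f"high[{i}] -> low[{j}] (score {score:.2f})")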
In recent years, methods from complex and dynamical
-systems have proved useful for creating several language models. Despite the
-large number of studies devoted to representing texts with physical models,
-only a limited number of studies have shown how the properties of the
-underlying physical systems can be employed to improve the performance of
-natural language processing tasks. In this paper, I address this problem by
-devising complex networks methods that are able to improve the performance of
-current statistical methods. Using a fuzzy classification strategy, I show that
-the topological properties extracted from texts complement the traditional
-textual description. In several cases, the performance obtained with hybrid
-approaches outperformed the results obtained when only traditional or networked
-methods were used. Because the proposed model is generic, the framework devised
-here could be straightforwardly used to study similar textual applications
-where the topology plays a pivotal role in the description of the interacting
-agents.
-"
-1890,1507.00133,"Valeria Borz\`i, Simone Faro, Arianna Pavone and Sabrina Sansone",Prior Polarity Lexical Resources for the Italian Language,cs.CL," In this paper we present SABRINA (Sentiment Analysis: a Broad Resource for
-Italian Natural language Applications), a manually annotated prior polarity
-lexical resource for Italian natural language applications in the field of
-opinion mining and sentiment induction. The resource consists of two different
-sets, an Italian dictionary of more than 277,000 words tagged with their prior
-polarity value, and a set of polarity modifiers, containing more than 200
-words, which can be used in combination with non-neutral terms of the
-dictionary in order to induce the sentiment of Italian compound terms. To the
-best of our knowledge this is the first manually annotated prior polarity
-resource developed for the Italian language.
-"
-1891,1507.00209,Hai Zhuge,Dimensionality on Summarization,cs.CL cs.IR," Summarization is one of the key features of human intelligence. It plays an
-important role in understanding and representation. With the rapid and
-continual expansion of texts, pictures and videos in cyberspace, automatic
-summarization becomes more and more desirable. Text summarization has been
-studied for over half a century, but it is still hard to automatically generate
-a satisfactory summary. Traditional methods process texts empirically and
-neglect the fundamental characteristics and principles of language use and
-understanding. This paper summarizes previous text summarization approaches in
-a multi-dimensional classification space, introduces a multi-dimensional
-methodology for research and development, unveils the basic characteristics and
-principles of language use and understanding, investigates some fundamental
-mechanisms of summarization, studies the dimensions and forms of
-representations, and proposes multi-dimensional evaluation mechanisms.
-Investigation extends to the incorporation of pictures into summaries and to
-the summarization of videos, graphs and pictures, and then reaches a general
-summarization framework.
-"
-1892,1507.00639,Daoud Clarke,"Simple, Fast Semantic Parsing with a Tensor Kernel",cs.CL," We describe a simple approach to semantic parsing based on a tensor product
-kernel. We extract two feature vectors: one for the query and one for each
-candidate logical form. We then train a classifier using the tensor product of
-the two vectors.
Using very simple features for both, our system achieves an -average F1 score of 40.1% on the WebQuestions dataset. This is comparable to -more complex systems but is simpler to implement and runs faster. -" -1893,1507.00955,"Olga Kolchyna, Tharsis T. P. Souza, Philip Treleaven, Tomaso Aste","Twitter Sentiment Analysis: Lexicon Method, Machine Learning Method and - Their Combination",cs.CL cs.IR cs.LG stat.ME stat.ML," This paper covers the two approaches for sentiment analysis: i) lexicon based -method; ii) machine learning method. We describe several techniques to -implement these approaches and discuss how they can be adopted for sentiment -classification of Twitter messages. We present a comparative study of different -lexicon combinations and show that enhancing sentiment lexicons with emoticons, -abbreviations and social-media slang expressions increases the accuracy of -lexicon-based classification for Twitter. We discuss the importance of feature -generation and feature selection processes for machine learning sentiment -classification. To quantify the performance of the main sentiment analysis -methods over Twitter we run these algorithms on a benchmark Twitter dataset -from the SemEval-2013 competition, task 2-B. The results show that machine -learning method based on SVM and Naive Bayes classifiers outperforms the -lexicon method. We present a new ensemble method that uses a lexicon based -sentiment score as input feature for the machine learning approach. The -combined method proved to produce more precise classifications. We also show -that employing a cost-sensitive classifier for highly unbalanced datasets -yields an improvement of sentiment classification performance up to 7%. -" -1894,1507.01053,"Kyunghyun Cho, Aaron Courville, Yoshua Bengio","Describing Multimedia Content using Attention-based Encoder--Decoder - Networks",cs.NE cs.CL cs.CV cs.LG," Whereas deep neural networks were first mostly used for classification tasks, -they are rapidly expanding in the realm of structured output problems, where -the observed target is composed of multiple random variables that have a rich -joint distribution, given the input. We focus in this paper on the case where -the input also has a rich structure and the input and output structures are -somehow related. We describe systems that learn to attend to different places -in the input, for each element of the output, for a variety of tasks: machine -translation, image caption generation, video clip description and speech -recognition. All these systems are based on a shared set of building blocks: -gated recurrent neural networks and convolutional neural networks, along with -trained attention mechanisms. We report on experimental results with these -systems, showing impressively good performance and the advantage of the -attention mechanism. -" -1895,1507.01127,"Sascha Rothe and Hinrich Sch\""utze","AutoExtend: Extending Word Embeddings to Embeddings for Synsets and - Lexemes",cs.CL," We present \textit{AutoExtend}, a system to learn embeddings for synsets and -lexemes. It is flexible in that it can take any word embeddings as input and -does not need an additional training corpus. The synset/lexeme embeddings -obtained live in the same vector space as the word embeddings. A sparse tensor -formalization guarantees efficiency and parallelizability. We use WordNet as a -lexical resource, but AutoExtend can be easily applied to other resources like -Freebase. 
AutoExtend achieves state-of-the-art performance on word similarity -and word sense disambiguation tasks. -" -1896,1507.01193,"Piotr Mirowski, Andreas Vlachos",Dependency Recurrent Neural Language Models for Sentence Completion,cs.CL cs.AI cs.LG," Recent work on language modelling has shifted focus from count-based models -to neural models. In these works, the words in each sentence are always -considered in a left-to-right order. In this paper we show how we can improve -the performance of the recurrent neural network (RNN) language model by -incorporating the syntactic dependencies of a sentence, which have the effect -of bringing relevant contexts closer to the word being predicted. We evaluate -our approach on the Microsoft Research Sentence Completion Challenge and show -that the dependency RNN proposed improves over the RNN by about 10 points in -accuracy. Furthermore, we achieve results comparable with the state-of-the-art -models on this task. -" -1897,1507.01526,"Nal Kalchbrenner, Ivo Danihelka, Alex Graves",Grid Long Short-Term Memory,cs.NE cs.CL cs.LG," This paper introduces Grid Long Short-Term Memory, a network of LSTM cells -arranged in a multidimensional grid that can be applied to vectors, sequences -or higher dimensional data such as images. The network differs from existing -deep LSTM architectures in that the cells are connected between network layers -as well as along the spatiotemporal dimensions of the data. The network -provides a unified way of using LSTM for both deep and sequential computation. -We apply the model to algorithmic tasks such as 15-digit integer addition and -sequence memorization, where it is able to significantly outperform the -standard LSTM. We then give results for two empirical tasks. We find that 2D -Grid LSTM achieves 1.47 bits per character on the Wikipedia character -prediction benchmark, which is state-of-the-art among neural approaches. In -addition, we use the Grid LSTM to define a novel two-dimensional translation -model, the Reencoder, and show that it outperforms a phrase-based reference -system on a Chinese-to-English translation task. -" -1898,1507.01529,Fionn Murtagh,"Correspondence Factor Analysis of Big Data Sets: A Case Study of 30 - Million Words; and Contrasting Analytics using Apache Solr and Correspondence - Analysis in R",cs.CL," We consider a large number of text data sets. These are cooking recipes. Term -distribution and other distributional properties of the data are investigated. -Our aim is to look at various analytical approaches which allow for mining of -information on both high and low detail scales. Metric space embedding is -fundamental to our interest in the semantic properties of this data. We -consider the projection of all data into analyses of aggregated versions of the -data. We contrast that with projection of aggregated versions of the data into -analyses of all the data. Analogously for the term set, we look at analysis of -selected terms. We also look at inherent term associations such as between -singular and plural. In addition to our use of Correspondence Analysis in R, -for latent semantic space mapping, we also use Apache Solr. Setting up the Solr -server and carrying out querying is described. A further novelty is that -querying is supported in Solr based on the principal factor plane mapping of -all the data. This uses a bounding box query, based on factor projections. 
-" -1899,1507.01636,Jiwei Li and Eduard Hovy,Reflections on Sentiment/Opinion Analysis,cs.CL," In this paper, we described possible directions for deeper understanding, -helping bridge the gap between psychology / cognitive science and computational -approaches in sentiment/opinion analysis literature. We focus on the opinion -holder's underlying needs and their resultant goals, which, in a utilitarian -model of sentiment, provides the basis for explaining the reason a sentiment -valence is held. While these thoughts are still immature, scattered, -unstructured, and even imaginary, we believe that these perspectives might -suggest fruitful avenues for various kinds of future work. -" -1900,1507.01701,Tobias Kuhn,A Survey and Classification of Controlled Natural Languages,cs.CL," What is here called controlled natural language (CNL) has traditionally been -given many different names. Especially during the last four decades, a wide -variety of such languages have been designed. They are applied to improve -communication among humans, to improve translation, or to provide natural and -intuitive representations for formal notations. Despite the apparent -differences, it seems sensible to put all these languages under the same -umbrella. To bring order to the variety of languages, a general classification -scheme is presented here. A comprehensive survey of existing English-based CNLs -is given, listing and describing 100 languages from 1930 until today. -Classification of these languages reveals that they form a single scattered -cloud filling the conceptual space between natural languages such as English on -the one end and formal languages such as propositional logic on the other. The -goal of this article is to provide a common terminology and a common model for -CNL, to contribute to the understanding of their general nature, to provide a -starting point for researchers interested in the area, and to help developers -to make design decisions. -" -1901,1507.01839,Mingbo Ma and Liang Huang and Bing Xiang and Bowen Zhou,Dependency-based Convolutional Neural Networks for Sentence Embedding,cs.CL cs.AI cs.LG," In sentence modeling and classification, convolutional neural network -approaches have recently achieved state-of-the-art results, but all such -efforts process word vectors sequentially and neglect long-distance -dependencies. To exploit both deep learning and linguistic structures, we -propose a tree-based convolutional neural network model which exploit various -long-distance relationships between words. Our model improves the sequential -baselines on all three sentiment and question classification tasks, and -achieves the highest published accuracy on TREC. -" -1902,1507.02012,"Akanksha Gehlot, Vaishali Sharma, Shashi Pal Singh, Ajai Kumar",Hindi to English Transfer Based Machine Translation System,cs.CL," In large societies like India there is a huge demand to convert one human -language into another. Lots of work has been done in this area. Many transfer -based MTS have developed for English to other languages, as MANTRA CDAC Pune, -MATRA CDAC Pune, SHAKTI IISc Bangalore and IIIT Hyderabad. Still there is a -little work done for Hindi to other languages. Currently we are working on it. -In this paper we focus on designing a system, that translate the document from -Hindi to English by using transfer based approach. This system takes an input -text check its structure through parsing. Reordering rules are used to generate -the text in target language. 
It is better than Corpus Based MTS because Corpus
-Based MTS requires a large amount of word aligned data for translation, which
-is not available for many languages, while Transfer Based MTS requires only
-knowledge of both languages (source language and target language) to make
-transfer rules. We get correct translations for simple assertive sentences and
-almost correct translations for complex and compound sentences.
-"
-1903,1507.02020,"Thierry Poibeau (LaTTICe), Pablo Ruiz (LaTTICe)",Generating Navigable Semantic Maps from Social Sciences Corpora,cs.CL cs.AI cs.IR," It is now commonplace to observe that we are facing a deluge of online
-information. Researchers have of course long acknowledged the potential value
-of this information since digital traces make it possible to directly observe,
-describe and analyze social facts, and above all the co-evolution of ideas and
-communities over time. However, most online information is expressed through
-text, which means it is not directly usable by machines, since computers
-require structured, organized and typed information in order to be able to
-manipulate it. Our goal is thus twofold: 1. Provide new natural language
-processing techniques aiming at automatically extracting relevant information
-from texts, especially in the context of social sciences, and connect these
-pieces of information so as to obtain relevant socio-semantic networks; 2.
-Provide new ways of exploring these socio-semantic networks, thanks to tools
-allowing one to dynamically navigate these networks, de-construct and
-re-construct them interactively, from different points of view following the
-needs expressed by domain experts.
-"
-1904,1507.02045,Aaron Jaech and Mari Ostendorf,What Your Username Says About You,cs.CL," Usernames are ubiquitous on the Internet, and they are often suggestive of
-user demographics. This work looks at the degree to which gender and language
-can be inferred from a username alone by making use of unsupervised morphology
-induction to decompose usernames into sub-units. Experimental results on the
-two tasks demonstrate the effectiveness of the proposed morphological features
-compared to a character n-gram baseline.
-"
-1905,1507.02062,"Xiaojun Wan, Ziqiang Cao, Furu Wei, Sujian Li and Ming Zhou",Multi-Document Summarization via Discriminative Summary Reranking,cs.CL," Existing multi-document summarization systems usually rely on a specific
-summarization model (i.e., a summarization method with a specific parameter
-setting) to extract summaries for different document sets with different
-topics. However, according to our quantitative analysis, none of the existing
-summarization models can always produce high-quality summaries for different
-document sets, and even a summarization model with good overall performance may
-produce low-quality summaries for some document sets. On the contrary, a
-baseline summarization model may produce high-quality summaries for some
-document sets. Based on the above observations, we treat the summaries produced
-by different summarization models as candidate summaries, and then explore
-discriminative reranking techniques to identify high-quality summaries from the
-candidates for different document sets. We propose to extract a set of
-candidate summaries for each document set based on an ILP framework, and then
-leverage Ranking SVM for summary reranking. Various useful features have been
-developed for the reranking process, including word-level features,
-sentence-level features and summary-level features.
Evaluation results on the -benchmark DUC datasets validate the efficacy and robustness of our proposed -approach. -" -1906,1507.02086,Shashishekar Ramakrishna and Lukasz Gorski and Adrian Paschke,The Role of Pragmatics in Legal Norm Representation,cs.CL cs.AI," Despite the 'apparent clarity' of a given legal provision, its application -may result in an outcome that does not exactly conform to the semantic level of -a statute. The vagueness within a legal text is induced intentionally to -accommodate all possible scenarios under which such norms should be applied, -thus making the role of pragmatics an important aspect also in the -representation of a legal norm and reasoning on top of it. The notion of -pragmatics considered in this paper does not focus on the aspects associated -with judicial decision making. The paper aims to shed light on the aspects of -pragmatics in legal linguistics, mainly focusing on the domain of patent law, -only from a knowledge representation perspective. The philosophical discussions -presented in this paper are grounded based on the legal theories from Grice and -Marmor. -" -1907,1507.02140,Yue Hu and Xiaojun Wan,Mining and Analyzing the Future Works in Scientific Articles,cs.DL cs.CL cs.IR," Future works in scientific articles are valuable for researchers and they can -guide researchers to new research directions or ideas. In this paper, we mine -the future works in scientific articles in order to 1) provide an insight for -future work analysis and 2) facilitate researchers to search and browse future -works in a research area. First, we study the problem of future work extraction -and propose a regular expression based method to address the problem. Second, -we define four different categories for the future works by observing the data -and investigate the multi-class future work classification problem. Third, we -apply the extraction method and the classification model to a paper dataset in -the computer science field and conduct a further analysis of the future works. -Finally, we design a prototype system to search and demonstrate the future -works mined from the scientific papers. Our evaluation results show that our -extraction method can get high precision and recall values and our -classification model can also get good results and it outperforms several -baseline models. Further analysis of the future work sentences also indicates -interesting results. -" -1908,1507.02145,"Xiaojiang Huang, Xiaojun Wan and Jianguo Xiao",Learning to Mine Chinese Coordinate Terms Using the Web,cs.CL," Coordinate relation refers to the relation between instances of a concept and -the relation between the directly hyponyms of a concept. In this paper, we -focus on the task of extracting terms which are coordinate with a user given -seed term in Chinese, and grouping the terms which belong to different concepts -if the seed term has several meanings. We propose a semi-supervised method that -integrates manually defined linguistic patterns and automatically learned -semi-structural patterns to extract coordinate terms in Chinese from web search -results. In addition, terms are grouped into different concepts based on their -co-occurring terms and contexts. We further calculate the saliency scores of -extracted terms and rank them accordingly. Experimental results demonstrate -that our proposed method generates results with high quality and wide coverage. 
-" -1909,1507.02205,"Aaron Jaech, Victoria Zayats, Hao Fang, Mari Ostendorf and Hannaneh - Hajishirzi",Talking to the crowd: What do people react to in online discussions?,cs.CL cs.SI," This paper addresses the question of how language use affects community -reaction to comments in online discussion forums, and the relative importance -of the message vs. the messenger. A new comment ranking task is proposed based -on community annotated karma in Reddit discussions, which controls for topic -and timing of comments. Experimental work with discussion threads from six -subreddits shows that the importance of different types of language features -varies with the community of interest. -" -1910,1507.02447,Santosh Tirunagari,"Data Mining of Causal Relations from Text: Analysing Maritime Accident - Investigation Reports",cs.IR cs.CL," Text mining is a process of extracting information of interest from text. -Such a method includes techniques from various areas such as Information -Retrieval (IR), Natural Language Processing (NLP), and Information Extraction -(IE). In this study, text mining methods are applied to extract causal -relations from maritime accident investigation reports collected from the -Marine Accident Investigation Branch (MAIB). These causal relations provide -information on various mechanisms behind accidents, including human and -organizational factors relating to the accident. The objective of this study is -to facilitate the analysis of the maritime accident investigation reports, by -means of extracting contributory causes with more feasibility. A careful -investigation of contributory causes from the reports provide opportunity to -improve safety in future. - Two methods have been employed in this study to extract the causal relations. -They are 1) Pattern classification method and 2) Connectives method. The -earlier one uses naive Bayes and Support Vector Machines (SVM) as classifiers. -The latter simply searches for the words connecting cause and effect in -sentences. - The causal patterns extracted using these two methods are compared to the -manual (human expert) extraction. The pattern classification method showed a -fair and sensible performance with F-measure(average) = 65% when compared to -connectives method with F-measure(average) = 58%. This study is an evidence, -that text mining methods could be employed in extracting causal relations from -marine accident investigation reports. -" -1911,1507.02628,Zhiguo Wang and Abraham Ittycheriah,FAQ-based Question Answering via Word Alignment,cs.CL," In this paper, we propose a novel word-alignment-based method to solve the -FAQ-based question answering task. First, we employ a neural network model to -calculate question similarity, where the word alignment between two questions -is used for extracting features. Second, we design a bootstrap-based feature -extraction method to extract a small set of effective lexical features. Third, -we propose a learning-to-rank algorithm to train parameters more suitable for -the ranking tasks. Experimental results, conducted on three languages (English, -Spanish and Japanese), demonstrate that the question similarity model is more -effective than baseline systems, the sparse features bring 5% improvements on -top-1 accuracy, and the learning-to-rank algorithm works significantly better -than the traditional method. We further evaluate our method on the answer -sentence selection task. Our method outperforms all the previous systems on the -standard TREC data set. 
-" -1912,1507.02907,"Lu\'is Marujo, Ricardo Ribeiro, David Martins de Matos, Jo\~ao P. - Neto, Anatole Gershman, Jaime Carbonell","Extending a Single-Document Summarizer to Multi-Document: a Hierarchical - Approach",cs.IR cs.CL," The increasing amount of online content motivated the development of -multi-document summarization methods. In this work, we explore straightforward -approaches to extend single-document summarization methods to multi-document -summarization. The proposed methods are based on the hierarchical combination -of single-document summaries, and achieves state of the art results. -" -1913,1507.03045,"Tushar Khot, Niranjan Balasubramanian, Eric Gribkoff, Ashish - Sabharwal, Peter Clark, Oren Etzioni",Markov Logic Networks for Natural Language Question Answering,cs.AI cs.CL," Our goal is to answer elementary-level science questions using knowledge -extracted automatically from science textbooks, expressed in a subset of -first-order logic. Given the incomplete and noisy nature of these automatically -extracted rules, Markov Logic Networks (MLNs) seem a natural model to use, but -the exact way of leveraging MLNs is by no means obvious. We investigate three -ways of applying MLNs to our task. In the first, we simply use the extracted -science rules directly as MLN clauses. Unlike typical MLN applications, our -domain has long and complex rules, leading to an unmanageable number of -groundings. We exploit the structure present in hard constraints to improve -tractability, but the formulation remains ineffective. In the second approach, -we instead interpret science rules as describing prototypical entities, thus -mapping rules directly to grounded MLN assertions, whose constants are then -clustered using existing entity resolution methods. This drastically simplifies -the network, but still suffers from brittleness. Finally, our third approach, -called Praline, uses MLNs to align the lexical elements as well as define and -control how inference should be performed in this task. Our experiments, -demonstrating a 15\% accuracy boost and a 10x reduction in runtime, suggest -that the flexibility and different inference semantics of Praline are a better -fit for the natural language question answering task. -" -1914,1507.03077,Adel Rahimi,A new hybrid stemming algorithm for Persian,cs.CL cs.IR," Stemming has been an influential part in Information retrieval and search -engines. There have been tremendous endeavours in making stemmer that are both -efficient and accurate. Stemmers can have three method in stemming, Dictionary -based stemmer, statistical-based stemmers, and rule-based stemmers. This paper -aims at building a hybrid stemmer that uses both Dictionary based method and -rule-based method for stemming. This ultimately helps the efficacy and -accurateness of the stemmer. -" -1915,1507.03223,"Shruti Tyagi, Deepti Chopra, Iti Mathur, Nisheeth Joshi",Classifier-Based Text Simplification for Improved Machine Translation,cs.CL," Machine Translation is one of the research fields of Computational -Linguistics. The objective of many MT Researchers is to develop an MT System -that produce good quality and high accuracy output translations and which also -covers maximum language pairs. As internet and Globalization is increasing day -by day, we need a way that improves the quality of translation. For this -reason, we have developed a Classifier based Text Simplification Model for -English-Hindi Machine Translation Systems. 
We have used support vector machines -and Na\""ive Bayes Classifier to develop this model. We have also evaluated the -performance of these classifiers. -" -1916,1507.03462,"Itziar Aldabe, Oier Lopez de Lacalle, I\~nigo Lopez-Gazpio and Montse - Maritxalar",Supervised Hierarchical Classification for Student Answer Scoring,cs.CL," This paper describes a hierarchical system that predicts one label at a time -for automated student response analysis. For the task, we build a -classification binary tree that delays more easily confused labels to later -stages using hierarchical processes. In particular, the paper describes how the -hierarchical classifier has been built and how the classification task has been -broken down into binary subtasks. It finally discusses the motivations and -fundamentals of such an approach. -" -1917,1507.03471,"Lukas Zilka, Filip Jurcicek",Incremental LSTM-based Dialog State Tracker,cs.CL," A dialog state tracker is an important component in modern spoken dialog -systems. We present an incremental dialog state tracker, based on LSTM -networks. It directly uses automatic speech recognition hypotheses to track the -state. We also present the key non-standard aspects of the model that bring its -performance close to the state-of-the-art and experimentally analyze their -contribution: including the ASR confidence scores, abstracting scarcely -represented values, including transcriptions in the training data, and model -averaging. -" -1918,1507.03641,Greg Durrett and Dan Klein,Neural CRF Parsing,cs.CL cs.NE," This paper describes a parsing model that combines the exact dynamic -programming of CRF parsing with the rich nonlinear featurization of neural net -approaches. Our model is structurally a CRF that factors over anchored rule -productions, but instead of linear potential functions based on sparse -features, we use nonlinear potentials computed via a feedforward neural -network. Because potentials are still local to anchored rules, structured -inference (CKY) is unchanged from the sparse case. Computing gradients during -learning involves backpropagating an error signal formed from standard CRF -sufficient statistics (expected rule counts). Using only dense features, our -neural CRF already exceeds a strong baseline CRF model (Hall et al., 2014). In -combination with sparse features, our system achieves 91.1 F1 on section 23 of -the Penn Treebank, and more generally outperforms the best prior single parser -results on a range of languages. -" -1919,1507.03934,"Kai Sun, Qizhe Xie, Kai Yu",Recurrent Polynomial Network for Dialogue State Tracking,cs.CL," Dialogue state tracking (DST) is a process to estimate the distribution of -the dialogue states as a dialogue progresses. Recent studies on constrained -Markov Bayesian polynomial (CMBP) framework take the first step towards -bridging the gap between rule-based and statistical approaches for DST. In this -paper, the gap is further bridged by a novel framework -- recurrent polynomial -network (RPN). RPN's unique structure enables the framework to have all the -advantages of CMBP including efficiency, portability and interpretability. -Additionally, RPN achieves more properties of statistical approaches than CMBP. -RPN was evaluated on the data corpora of the second and the third Dialog State -Tracking Challenge (DSTC-2/3). Experiments showed that RPN can significantly -outperform both traditional rule-based approaches and statistical approaches -with similar feature set. 
Compared with the state-of-the-art statistical DST -approaches with a lot richer features, RPN is also competitive. -" -1920,1507.04019,D. S. Pavan Kumar,Feature Normalisation for Robust Speech Recognition,cs.CL cs.SD," Speech recognition system performance degrades in noisy environments. If the -acoustic models are built using features of clean utterances, the features of a -noisy test utterance would be acoustically mismatched with the trained model. -This gives poor likelihoods and poor recognition accuracy. Model adaptation and -feature normalisation are two broad areas that address this problem. While the -former often gives better performance, the latter involves estimation of lesser -number of parameters, making the system feasible for practical implementations. - This research focuses on the efficacies of various subspace, statistical and -stereo based feature normalisation techniques. A subspace projection based -method has been investigated as a standalone and adjunct technique involving -reconstruction of noisy speech features from a precomputed set of clean speech -building-blocks. The building blocks are learned using non-negative matrix -factorisation (NMF) on log-Mel filter bank coefficients, which form a basis for -the clean speech subspace. The work provides a detailed study on how the method -can be incorporated into the extraction process of Mel-frequency cepstral -coefficients. Experimental results show that the new features are robust to -noise, and achieve better results when combined with the existing techniques. - The work also proposes a modification to the training process of SPLICE -algorithm for noise robust speech recognition. It is based on feature -correlations, and enables this stereo-based algorithm to improve the -performance in all noise conditions, especially in unseen cases. Further, the -modified framework is extended to work for non-stereo datasets where clean and -noisy training utterances, but not stereo counterparts, are required. An -MLLR-based computationally efficient run-time noise adaptation method in SPLICE -framework has been proposed. -" -1921,1507.04116,"Angelo Mariano, Giorgio Parisi, Saverio Pascazio",Language discrimination and clustering via a neural network approach,cond-mat.dis-nn cs.CL cs.NE physics.soc-ph," We classify twenty-one Indo-European languages starting from written text. We -use neural networks in order to define a distance among different languages, -construct a dendrogram and analyze the ultrametric structure that emerges. Four -or five subgroups of languages are identified, according to the ""cut"" of the -dendrogram, drawn with an entropic criterion. The results and the method are -discussed. -" -1922,1507.04214,Umit Mersinli,Associative Measures and Multi-word Unit Extraction in Turkish,cs.CL," Associative measures are ""mathematical formulas determining the strength of -association between two or more words based on their occurrences and -cooccurrences in a text corpus"" (Pecina, 2010, p. 138). The purpose of this -paper is to test the 12 associative measures that Text-NSP (Banerjee & -Pedersen, 2003) contains on a 10-million-word subcorpus of Turkish National -Corpus (TNC) (Aksan et.al., 2012). A statistical comparison of those measures -is out of the scope of the study, and the measures will be evaluated according -to the linguistic relevance of the rankings they provide. 
The focus of the
-study is on optimizing the corpus data before applying the measures, and then
-on evaluating the rankings produced by these measures as a whole, not on the
-linguistic relevance of individual n-grams. The findings include
-intra-linguistically relevant associative measures for a comma-delimited,
-sentence-split, lower-cased, well-balanced, representative, 10-million-word
-corpus of Turkish.
-"
-1923,1507.04420,"James Kirby, Morgan Sonderegger",Bias and population structure in the actuation of sound change,cs.CL physics.soc-ph," Why do human languages change at some times, and not others? We address this
-longstanding question from a computational perspective, focusing on the case of
-sound change. Sound change arises from the pronunciation variability ubiquitous
-in every speech community, but most such variability does not lead to change.
-Hence, an adequate model must allow for stability as well as change. Existing
-theories of sound change tend to emphasize factors at the level of individual
-learners promoting one outcome or the other, such as channel bias (which favors
-change) or inductive bias (which favors stability). Here, we consider how the
-interaction of these biases can lead to both stability and change in a
-population setting. We find that population structure itself can act as a
-source of stability, but that both stability and change are possible only when
-both types of bias are active, suggesting that it is possible to understand why
-sound change occurs at some times and not others as the population-level result
-of the interplay between forces promoting each outcome in individual speakers.
-In addition, if it is assumed that learners learn from two or more teachers,
-the transition from stability to change is marked by a phase transition,
-consistent with the abrupt transitions seen in many empirical cases of sound
-change. The predictions of multiple-teacher models thus match empirical cases
-of sound change better than the predictions of single-teacher models,
-underscoring the importance of modeling language change in a population
-setting.
-"
-1924,1507.04646,"Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, Houfeng Wang",A Dependency-Based Neural Network for Relation Classification,cs.CL cs.LG cs.NE," Previous research on relation classification has verified the effectiveness
-of using dependency shortest paths or subtrees. In this paper, we further
-explore how to make full use of the combination of this dependency
-information. We first propose a new structure, termed augmented dependency path
-(ADP), which is composed of the shortest dependency path between two entities
-and the subtrees attached to the shortest path. To exploit the semantic
-representation behind the ADP structure, we develop dependency-based neural
-networks (DepNN): a recursive neural network designed to model the subtrees,
-and a convolutional neural network to capture the most important features on
-the shortest path. Experiments on the SemEval-2010 dataset show that our
-proposed method achieves state-of-the-art results.
-"
-1925,1507.04798,"Samuel R\""onnqvist",Exploratory topic modeling with distributional semantics,cs.IR cs.CL cs.LG," As we continue to collect and store textual data in a multitude of domains,
-we are regularly confronted with material whose largely unknown thematic
-structure we want to uncover. With unsupervised, exploratory analysis, no prior
-knowledge about the content is required and highly open-ended tasks can be
-supported.
In the past few years, probabilistic topic modeling has emerged as a
-popular approach to this problem. Nevertheless, the representation of the
-latent topics as aggregations of semi-coherent terms limits their
-interpretability and level of detail.
- This paper presents an alternative approach to topic modeling that maps
-topics as a network for exploration, based on distributional semantics using
-learned word vectors. From the granular level of terms and their semantic
-similarity relations, global topic structures emerge as clustered regions and
-gradients of concepts. Moreover, the paper discusses the visual interactive
-representation of the topic map, which plays an important role in supporting
-its exploration.
-"
-1926,1507.04808,"Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville
- and Joelle Pineau","Building End-To-End Dialogue Systems Using Generative Hierarchical
- Neural Network Models",cs.CL cs.AI cs.LG cs.NE," We investigate the task of building open domain, conversational dialogue
-systems based on large dialogue corpora using generative models. Generative
-models produce system responses that are autonomously generated word-by-word,
-opening up the possibility for realistic, flexible interactions. In support of
-this goal, we extend the recently proposed hierarchical recurrent
-encoder-decoder neural network to the dialogue domain, and demonstrate that
-this model is competitive with state-of-the-art neural language models and
-back-off n-gram models. We investigate the limitations of this and similar
-approaches, and show how its performance can be improved by bootstrapping the
-learning from a larger question-answer pair corpus and from pretrained word
-embeddings.
-"
-1927,1507.04908,Darko Brodic and Zoran N. Milivojevic and Alessia Amelio,"Analysis of the South Slavic Scripts by Run-Length Features of the Image
- Texture",cs.CV cs.CL," The paper proposes an algorithm for script recognition based on texture
-characteristics. The image texture is achieved by coding each letter with the
-equivalent script type (number code) according to its position in the text
-line. Each code is transformed into an equivalent gray level pixel, creating
-a 1-D image. Then, the image texture is subjected to the run-length analysis.
-This analysis extracts the run-length features, which are classified to make a
-distinction between the scripts under consideration. In the experiment, a
-custom-oriented database is subjected to the proposed algorithm. The database
-consists of some text documents written in Cyrillic, Latin and Glagolitic
-scripts. Furthermore, it is divided into training and test parts. The results
-of the experiment show that 3 out of 5 run-length features can be used for
-effective differentiation between the analyzed South Slavic scripts.
-"
-1928,1507.05134,"Alexander Port, Iulia Gheorghita, Daniel Guth, John M.Clark, Crystal
- Liang, Shival Dasu, Matilde Marcolli",Persistent Topology of Syntax,cs.CL math.AT," We study the persistent homology of the data set of syntactic parameters of
-the world languages. We show that, while homology generators behave erratically
-over the whole data set, non-trivial persistent homology appears when one
-restricts to specific language families. Different families exhibit different
-persistent homology. We focus on the cases of the Indo-European and the
-Niger-Congo families, for which we compare persistent homology over different
-cluster filtering values.
We investigate the possible significance, in -historical linguistic terms, of the presence of persistent generators of the -first homology. In particular, we show that the persistent first homology -generator we find in the Indo-European family is not due (as one might guess) -to the Anglo-Norman bridge in the Indo-European phylogenetic network, but is -related to the position of Ancient Greek and the Hellenic branch within the -network. -" -1929,1507.05523,"Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao",How to Generate a Good Word Embedding?,cs.CL," We analyze three critical components of word embedding training: the model, -the corpus, and the training parameters. We systematize existing -neural-network-based word embedding algorithms and compare them using the same -corpus. We evaluate each word embedding in three ways: analyzing its semantic -properties, using it as a feature for supervised tasks and using it to -initialize neural networks. We also provide several simple guidelines for -training word embeddings. First, we discover that corpus domain is more -important than corpus size. We recommend choosing a corpus in a suitable domain -for the desired task, after that, using a larger corpus yields better results. -Second, we find that faster models provide sufficient performance in most -cases, and more complex models can be used if the training corpus is -sufficiently large. Third, the early stopping metric for iterating should rely -on the development set of the desired task rather than the validation loss of -training embedding. -" -1930,1507.05630,Matteo Grella,Notes About a More Aware Dependency Parser,cs.CL," In this paper I explain the reasons that led me to research and conceive a -novel technology for dependency parsing, mixing together the strengths of -data-driven transition-based and constraint-based approaches. In particular I -highlight the problem to infer the reliability of the results of a data-driven -transition-based parser, which is extremely important for high-level processes -that expect to use correct parsing results. I then briefly introduce a number -of notes about a new parser model I'm working on, capable to proceed with the -analysis in a ""more aware"" way, with a more ""robust"" concept of robustness. -" -1931,1507.05910,"Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, Yoshua - Bengio",Clustering is Efficient for Approximate Maximum Inner Product Search,cs.LG cs.CL stat.ML," Efficient Maximum Inner Product Search (MIPS) is an important task that has a -wide applicability in recommendation systems and classification with a large -number of classes. Solutions based on locality-sensitive hashing (LSH) as well -as tree-based solutions have been investigated in the recent literature, to -perform approximate MIPS in sublinear time. In this paper, we compare these to -another extremely simple approach for solving approximate MIPS, based on -variants of the k-means clustering algorithm. Specifically, we propose to train -a spherical k-means, after having reduced the MIPS problem to a Maximum Cosine -Similarity Search (MCSS). Experiments on two standard recommendation system -benchmarks as well as on large vocabulary word embeddings, show that this -simple approach yields much higher speedups, for the same retrieval precision, -than current state-of-the-art hashing-based and tree-based methods. This simple -method also yields more robust retrievals when the query is corrupted by noise. 
-" -1932,1507.06020,"Rimah Amami, Dorra Ben Ayed, Noureddine Ellouze","Practical Selection of SVM Supervised Parameters with Different Feature - Representations for Vowel Recognition",cs.CL cs.LG," It is known that the classification performance of Support Vector Machine -(SVM) can be conveniently affected by the different parameters of the kernel -tricks and the regularization parameter, C. Thus, in this article, we propose a -study in order to find the suitable kernel with which SVM may achieve good -generalization performance as well as the parameters to use. We need to analyze -the behavior of the SVM classifier when these parameters take very small or -very large values. The study is conducted for a multi-class vowel recognition -using the TIMIT corpus. Furthermore, for the experiments, we used different -feature representations such as MFCC and PLP. Finally, a comparative study was -done to point out the impact of the choice of the parameters, kernel trick and -feature representations on the performance of the SVM classifier -" -1933,1507.06021,"Rimah Amami, Dorra Ben Ayed, Noureddine Ellouze","An Empirical Comparison of SVM and Some Supervised Learning Algorithms - for Vowel recognition",cs.CL cs.LG," In this article, we conduct a study on the performance of some supervised -learning algorithms for vowel recognition. This study aims to compare the -accuracy of each algorithm. Thus, we present an empirical comparison between -five supervised learning classifiers and two combined classifiers: SVM, KNN, -Naive Bayes, Quadratic Bayes Normal (QDC) and Nearst Mean. Those algorithms -were tested for vowel recognition using TIMIT Corpus and Mel-frequency cepstral -coefficients (MFCCs). -" -1934,1507.06023,"Rimah Amami, Ghaith Manita, Abir Smiti","Robust speech recognition using consensus function based on multi-layer - networks",cs.CL cs.LG," The clustering ensembles mingle numerous partitions of a specified data into -a single clustering solution. Clustering ensemble has emerged as a potent -approach for ameliorating both the forcefulness and the stability of -unsupervised classification results. One of the major problems in clustering -ensembles is to find the best consensus function. Finding final partition from -different clustering results requires skillfulness and robustness of the -classification algorithm. In addition, the major problem with the consensus -function is its sensitivity to the used data sets quality. This limitation is -due to the existence of noisy, silence or redundant data. This paper proposes a -novel consensus function of cluster ensembles based on Multilayer networks -technique and a maintenance database method. This maintenance database approach -is used in order to handle any given noisy speech and, thus, to guarantee the -quality of databases. This can generates good results and efficient data -partitions. To show its effectiveness, we support our strategy with empirical -evaluation using distorted speech from Aurora speech databases. -" -1935,1507.06025,"Rimah Amami, Dorra Ben Ayed, Nouerddine Ellouze",Incorporating Belief Function in SVM for Phoneme Recognition,cs.CL cs.LG," The Support Vector Machine (SVM) method has been widely used in numerous -classification tasks. The main idea of this algorithm is based on the principle -of the margin maximization to find an hyperplane which separates the data into -two different classes.In this paper, SVM is applied to phoneme recognition -task. 
However, in many real-world problems, each phoneme in the data set for
-recognition may differ in its degree of significance due to noise,
-inaccuracies, or abnormal characteristics; all those problems can lead to
-inaccuracies in the prediction phase. Unfortunately, the standard formulation
-of SVM does not take into account all those problems and, in particular, the
-variation in the speech input. This paper presents a new formulation of SVM
-(B-SVM) that attributes to each phoneme a confidence degree computed based on
-its geometric position in the space. Then, this degree is used in order to
-strengthen the class membership of the tested phoneme. Hence, we introduce a
-reformulation of the standard SVM that incorporates the degree of belief.
-Experimental performance on the TIMIT database shows the effectiveness of the
-proposed method B-SVM on a phoneme recognition problem.
-"
-1936,1507.06028,"Rimah Amami, Dorra Ben Ayed, Noureddine Ellouze","The challenges of SVM optimization using Adaboost on a phoneme
- recognition problem",cs.CL cs.LG," The use of digital technology is growing at a very fast pace, which has led
-to the emergence of systems based on cognitive infocommunications. The
-expansion of this sector imposes the use of combining methods in order to
-ensure robustness in cognitive systems.
-"
-1937,1507.06073,"Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu",Discriminative Segmental Cascades for Feature-Rich Phone Recognition,cs.CL," Discriminative segmental models, such as segmental conditional random fields
-(SCRFs) and segmental structured support vector machines (SSVMs), have had
-success in speech recognition via both lattice rescoring and first-pass
-decoding. However, such models suffer from slow decoding, hampering the use of
-computationally expensive features, such as segment neural networks or other
-high-order features. A typical solution is to use approximate decoding, either
-by beam pruning in a single pass or by beam pruning to generate a lattice
-followed by a second pass. In this work, we study discriminative segmental
-models trained with a hinge loss (i.e., segmental structured SVMs). We show
-that beam search is not suitable for learning rescoring models in this
-approach, though it gives good approximate decoding performance when the model
-is already well-trained. Instead, we consider an approach inspired by
-structured prediction cascades, which use max-marginal pruning to generate
-lattices. We obtain a high-accuracy phonetic recognition system with several
-expensive feature types: a segment neural network, a second-order language
-model, and second-order phone boundary features.
-"
-1938,1507.06711,"Shitao Weng, Shushan Chen, Lei Yu, Xuewei Wu, Weicheng Cai, Zhi Liu,
- Ming Li","The SYSU System for the Interspeech 2015 Automatic Speaker Verification
- Spoofing and Countermeasures Challenge",cs.SD cs.CL," Many existing speaker verification systems are reported to be vulnerable
-to different spoofing attacks, for example speaker-adapted speech
-synthesis, voice conversion, play back, etc. In order to detect these spoofed
-speech signals as a countermeasure, we propose a score level fusion approach
-with several different i-vector subsystems. We show that the acoustic level
-Mel-frequency cepstral coefficients (MFCC) features, the phase level modified
-group delay cepstral coefficients (MGDCC) and the phonetic level phoneme
-posterior probability (PPP) tandem features are effective for the
-countermeasure.
Furthermore, feature level fusion of these features before
-i-vector modeling also enhances the performance. A polynomial kernel support
-vector machine is adopted as the supervised classifier. In order to enhance the
-generalizability of the countermeasure, we also adopted the cosine similarity
-and PLDA scoring as one-class classification methods. Combining the
-proposed i-vector subsystems with the OpenSMILE baseline, which covers the
-acoustic and prosodic information, further improves the final performance. The
-proposed fusion system achieves 0.29% and 3.26% EER on the development and test
-set of the database provided by the INTERSPEECH 2015 automatic speaker
-verification spoofing and countermeasures challenge.
-"
-1939,1507.06829,"Lisa Posch, Arnim Bleier, Philipp Schaer, Markus Strohmaier",The Polylingual Labeled Topic Model,cs.CL cs.IR cs.LG," In this paper, we present the Polylingual Labeled Topic Model, a model which
-combines the characteristics of the existing Polylingual Topic Model and
-Labeled LDA. The model accounts for multiple languages with separate topic
-distributions for each language while restricting the permitted topics of a
-document to a set of predefined labels. We explore the properties of the model
-in a two-language setting on a dataset from the social science domain. Our
-experiments show that our model outperforms LDA and Labeled LDA in terms of
-their held-out perplexity and that it produces semantically coherent topics
-which are well interpretable by human subjects.
-"
-1940,1507.06837,Jeremy Fix and Herve Frezza-buet,YARBUS : Yet Another Rule Based belief Update System,cs.CL cs.AI," We introduce a new rule based system for belief tracking in dialog systems.
-Despite the simplicity of the rules being considered, the proposed belief
-tracker ranks favourably compared to the previous submissions on the second and
-third Dialog State Tracking challenges. The results of this simple tracker
-allow us to reconsider the performance of previous submissions that use more
-elaborate techniques.
-"
-1941,1507.06947,"Ha\c{s}im Sak, Andrew Senior, Kanishka Rao, Fran\c{c}oise Beaufays","Fast and Accurate Recurrent Neural Network Acoustic Models for Speech
- Recognition",cs.CL cs.LG cs.NE stat.ML," We have recently shown that deep Long Short-Term Memory (LSTM) recurrent
-neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as
-acoustic models for speech recognition. More recently, we have shown that the
-performance of sequence trained context dependent (CD) hidden Markov model
-(HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained
-phone models initialized with connectionist temporal classification (CTC). In
-this paper, we present techniques that further improve performance of LSTM RNN
-acoustic models for large vocabulary speech recognition. We show that frame
-stacking and reduced frame rate lead to more accurate models and faster
-decoding. CD phone modeling leads to further improvements. We also present
-initial results for LSTM RNN models outputting words directly.
-"
-1942,1507.07636,"Sridhar Mahadevan, Sarath Chandar","Reasoning about Linguistic Regularities in Word Embeddings using Matrix
- Manifolds",cs.CL," Recent work has explored methods for learning continuous vector space word
-representations reflecting the underlying semantics of words.
Simple vector
-space arithmetic using cosine distances has been shown to capture certain types
-of analogies, such as reasoning about plurals from singulars, past tense from
-present tense, etc. In this paper, we introduce a new approach to capture
-analogies in continuous word representations, based on modeling not just
-individual word vectors, but rather the subspaces spanned by groups of words.
-We exploit the property that the set of subspaces in n-dimensional Euclidean
-space forms a curved manifold called the Grassmannian, a quotient of the Lie
-group of rotations in n dimensions. Based on this mathematical model, we
-develop a modified cosine distance model based on geodesic kernels that
-captures relation-specific distances across word categories. Our experiments
-on analogy tasks show that our approach performs significantly better than the
-previous approaches for the given task.
-"
-1943,1507.07826,"Henrique F. de Arruda, Luciano da F. Costa and Diego R. Amancio",Classifying informative and imaginative prose using complex networks,cs.CL," Statistical methods have been widely employed in recent years to grasp many
-language properties. The application of such techniques has allowed an
-improvement of several linguistic applications, encompassing machine
-translation, automatic summarization and document classification. In the
-latter, many approaches have emphasized the semantic content of texts, as is
-the case with bag-of-words language models. This approach has certainly yielded
-reasonable performance. However, some potential features such as the structural
-organization of texts have been used in only a few studies. In this context, we
-probe how features derived from textual structure analysis can be effectively
-employed in a classification task. More specifically, we performed a supervised
-classification aiming at discriminating informative from imaginative documents.
-Using a networked model that describes the local topological/dynamical
-properties of function words, we achieved an accuracy rate of up to 95%, which
-is much higher than similar networked approaches. A systematic analysis of
-feature relevance revealed that symmetry and accessibility measurements are
-among the most prominent network measurements. Our results suggest that these
-measurements could be used in related language applications, as they play a
-complementary role in characterizing texts.
-"
-1944,1507.07998,Andrew M. Dai and Christopher Olah and Quoc V. Le,Document Embedding with Paragraph Vectors,cs.CL cs.AI cs.LG," Paragraph Vectors has been recently proposed as an unsupervised method for
-learning distributed representations for pieces of texts. In their work, the
-authors showed that the method can learn an embedding of movie review texts
-which can be leveraged for sentiment analysis. That proof of concept, while
-encouraging, was rather narrow. Here we consider tasks other than sentiment
-analysis, provide a more thorough comparison of Paragraph Vectors to other
-document modelling algorithms such as Latent Dirichlet Allocation, and evaluate
-performance of the method as we vary the dimensionality of the learned
-representation. We benchmarked the models on two document similarity data sets,
-one from Wikipedia, one from arXiv. We observe that the Paragraph Vector method
-performs significantly better than other methods, and propose a simple
-improvement to enhance embedding quality.
Somewhat surprisingly, we also show
-that much like word embeddings, vector operations on Paragraph Vectors can
-yield semantically meaningful results.
-"
-1945,1507.08240,"Yajie Miao, Mohammad Gowayyed, Florian Metze","EESEN: End-to-End Speech Recognition using Deep RNN Models and - WFST-based Decoding",cs.CL cs.LG," The performance of automatic speech recognition (ASR) has improved
-tremendously due to the application of deep neural networks (DNNs). Despite
-this progress, building a new ASR system remains a challenging task, requiring
-various resources, multiple training stages and significant expertise. This
-paper presents our Eesen framework which drastically simplifies the existing
-pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen
-involves learning a single recurrent neural network (RNN) predicting
-context-independent targets (phonemes or characters). To remove the need for
-pre-generated frame labels, we adopt the connectionist temporal classification
-(CTC) objective function to infer the alignments between speech and label
-sequences. A distinctive feature of Eesen is a generalized decoding approach
-based on weighted finite-state transducers (WFSTs), which enables the efficient
-incorporation of lexicons and language models into CTC decoding. Experiments
-show that compared with the standard hybrid DNN systems, Eesen achieves
-comparable word error rates (WERs), while at the same time speeding up decoding
-significantly.
-"
-1946,1507.08396,"Shuangyin Li, Jiefei Li, Guan Huang, Ruiyang Tan, and Rong Pan",Tag-Weighted Topic Model For Large-scale Semi-Structured Documents,cs.CL cs.IR cs.LG stat.ML," The evolution of the Internet has produced massive numbers of Semi-Structured
-Documents (SSDs). These SSDs contain both unstructured features (e.g.,
-plain text) and metadata (e.g., tags). Most previous works focused on modeling
-the unstructured text, and recently, some other methods have been proposed to
-model the unstructured text with specific tags. Building a general model for
-SSDs remains an important problem in terms of both model fitness and
-efficiency. We propose a novel method to model SSDs, the so-called
-Tag-Weighted Topic Model (TWTM). TWTM is a framework that leverages both
-tag and word information, not only to learn the document-topic and topic-word
-distributions, but also to infer the tag-topic distributions for text mining
-tasks. We present an efficient variational inference method with an EM
-algorithm for estimating the model parameters. Meanwhile, we propose three
-large-scale solutions for our model under the MapReduce distributed computing
-platform for modeling large-scale SSDs. The experimental results show the
-effectiveness, efficiency and robustness of our model in comparison with
-state-of-the-art methods in document modeling, tag prediction and text
-classification. We also show the performance of the three distributed solutions
-in terms of time and accuracy on document modeling.
-"
-1947,1508.01447,David Vilares and Carlos G\'omez-Rodr\'iguez and Miguel A. Alonso,"One model, two languages: training bilingual parsers with harmonized - treebanks",cs.CL," We introduce an approach to train lexicalized parsers using bilingual corpora
-obtained by merging harmonized treebanks of different languages, producing
-parsers that can analyze sentences in either of the learned languages, or even
-sentences that mix both.
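The Paragraph Vector method evaluated in row 1944 is available off the shelf; a minimal sketch using gensim's implementation (assuming gensim >= 4; corpus and hyperparameters are toy placeholders):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "the film was a moving portrait of loss",
    "a thrilling ride with a clever script",
    "the recipe calls for two cups of flour",
]
docs = [TaggedDocument(words=text.split(), tags=[i])
        for i, text in enumerate(corpus)]

model = Doc2Vec(docs, vector_size=32, min_count=1, epochs=50)

# embed an unseen document and find its nearest training document
vec = model.infer_vector("a gripping story with sharp writing".split())
print(model.dv.most_similar([vec], topn=1))
```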
We test the approach on the Universal Dependency
-Treebanks, training with MaltParser and MaltOptimizer. The results show that
-these bilingual parsers are more than competitive: most combinations
-preserve accuracy, and some even achieve significant improvements over the
-corresponding monolingual parsers. Preliminary experiments also show the
-approach to be promising on texts with code-switching and when more languages
-are added.
-"
-1948,1508.01452,Shashi Narayan and Claire Gardent,Unsupervised Sentence Simplification Using Deep Semantics,cs.CL," We present a novel approach to sentence simplification which departs from
-previous work in two main ways. First, it requires neither hand written rules
-nor a training corpus of aligned standard and simplified sentences. Second,
-sentence splitting operates on deep semantic structure. We show (i) that the
-unsupervised framework we propose is competitive with four state-of-the-art
-supervised systems and (ii) that our semantic based approach allows for a
-principled and effective handling of sentence splitting.
-"
-1949,1508.01539,"Domagoj Margan, Ana Me\v{s}trovi\'c, Sanda Martin\v{c}i\'c-Ip\v{s}i\'c","Multilayer Network of Language: a Unified Framework for Structural - Analysis of Linguistic Subsystems",cs.CL," Recently, the focus of complex networks research has shifted from the
-analysis of isolated properties of a system toward a more realistic modeling of
-multiple phenomena - multilayer networks. Motivated by the prosperity of the
-multilayer approach in social, transport or trade systems, we propose the
-introduction of multilayer networks for language. The multilayer network of
-language is a unified framework for modeling linguistic subsystems and their
-structural properties enabling the exploration of their mutual interactions.
-Various aspects of natural language systems can be represented as complex
-networks, whose vertices depict linguistic units, while links model their
-relations. The multilayer network of language is defined by three aspects: the
-network construction principle, the linguistic subsystem and the language of
-interest. More precisely, we construct word-level (syntax, co-occurrence and
-its shuffled counterpart) and subword-level (syllables and graphemes) network
-layers from five variations of the original text (in the modeled language).
-The obtained results suggest that there are substantial differences between the
-network structures of different language subsystems, which are hidden during
-the exploration of an isolated layer. The word-level layers share structural
-properties regardless of the language (e.g. Croatian or English), while the
-syllabic subword level expresses more language-dependent structural properties.
-The preserved weighted overlap quantifies the similarity of word-level layers
-in weighted and directed networks. Moreover, the analysis of motifs reveals a
-close topological structure of the syntactic and syllabic layers for both
-languages. The findings corroborate that the multilayer network framework is a
-powerful, consistent and systematic approach to model several linguistic
-subsystems simultaneously and hence to provide a more unified view on language.
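One of the network layers described in row 1949 is a word co-occurrence graph; a minimal sketch of how such a layer could be built with networkx (the window size and the choice of an undirected weighted graph are assumptions for illustration):

```python
import networkx as nx

def cooccurrence_layer(tokens, window=2):
    """Build one word-level layer: nodes are word types, weighted edges
    link words co-occurring within `window` positions of each other."""
    g = nx.Graph()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if g.has_edge(w, v):
                g[w][v]["weight"] += 1
            else:
                g.add_edge(w, v, weight=1)
    return g

tokens = "the cat sat on the mat and the dog sat too".split()
layer = cooccurrence_layer(tokens)
print(layer.number_of_nodes(), layer.number_of_edges())
# standard complexity measures can then be run on the layer
print(sorted(dict(layer.degree()).items(), key=lambda kv: -kv[1])[:3])
```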
-"
-1950,1508.00106,"Ira Leviant, Roi Reichart","Separated by an Un-common Language: Towards Judgment Language Informed - Vector Space Modeling",cs.CL," A common evaluation practice in the vector space models (VSMs) literature is
-to measure the models' ability to predict human judgments about lexical
-semantic relations between word pairs. Most existing evaluation sets, however,
-consist of scores collected for English word pairs only, ignoring the potential
-impact that the judgment language in which word pairs are presented has on the
-human scores. In this paper we translate two prominent evaluation sets,
-wordsim353 (association) and SimLex999 (similarity), from English to Italian,
-German and Russian and collect scores for each dataset from crowdworkers fluent
-in its language. Our analysis reveals that human judgments are strongly
-impacted by the judgment language. Moreover, we show that the predictions of
-monolingual VSMs do not necessarily best correlate with human judgments made in
-the language used for model training, suggesting that models and humans are
-affected differently by the language they use when making semantic judgments.
-Finally, we show that in a large number of setups, multilingual VSM combination
-results in improved correlations with human judgments, suggesting that
-multilingualism may partially compensate for the judgment language effect on
-human judgments.
-"
-1951,1508.00189,"Devendra Singh Sachan, Shailesh Kumar",Class Vectors: Embedding representation of Document Classes,cs.CL cs.IR," Distributed representations of words and paragraphs as semantic embeddings in
-high-dimensional spaces are used across a number of Natural Language
-Understanding tasks such as retrieval, translation, and classification. In this
-work, we propose ""Class Vectors"" - a framework for learning a vector per class
-in the same embedding space as the word and paragraph embeddings. Similarity
-between these class vectors and word vectors are used as features to classify a
-document to a class. In experiments on several sentiment analysis tasks such as
-Yelp reviews and Amazon electronic product reviews, class vectors have shown
-better or comparable results in classification while learning very meaningful
-class embeddings.
-"
-1952,1508.00200,"Jian Tang, Meng Qu, Qiaozhu Mei","PTE: Predictive Text Embedding through Large-scale Heterogeneous Text - Networks",cs.CL cs.LG cs.NE," Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector,
-have been attracting increasing attention due to their simplicity, scalability,
-and effectiveness. However, compared to sophisticated deep learning
-architectures such as convolutional neural networks, these methods usually
-yield inferior results when applied to particular machine learning tasks. One
-possible reason is that these text embedding methods learn the representation
-of text in a fully unsupervised way, without leveraging the labeled information
-available for the task. Although the low dimensional representations learned
-are applicable to many different tasks, they are not particularly tuned for any
-task. In this paper, we fill this gap by proposing a semi-supervised
-representation learning method for text data, which we call the
-\textit{predictive text embedding} (PTE). Predictive text embedding utilizes
-both labeled and unlabeled data to learn the embedding of text.
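The evaluation practice described at the start of row 1950, correlating model similarities with human judgments, is typically done with Spearman's rank correlation; a minimal sketch (the gold triples and the toy similarity function are placeholders for wordsim353/SimLex999 scores and real vector cosines):

```python
from scipy.stats import spearmanr

# (word1, word2, human_score) triples, e.g. from wordsim353 or SimLex-999
gold = [("cup", "mug", 8.5), ("car", "train", 6.3), ("cup", "car", 1.2)]

def evaluate(model_similarity, gold_pairs):
    """Spearman correlation between model and human similarity scores."""
    human = [s for _, _, s in gold_pairs]
    model = [model_similarity(w1, w2) for w1, w2, _ in gold_pairs]
    return spearmanr(human, model).correlation

# stand-in model: real use would plug in cosine similarity of word vectors
toy = lambda w1, w2: len(set(w1) & set(w2)) / len(set(w1) | set(w2))
print(evaluate(toy, gold))
```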
The labeled
-information and different levels of word co-occurrence information are first
-represented as a large-scale heterogeneous text network, which is then embedded
-into a low dimensional space through a principled and efficient algorithm. This
-low dimensional embedding not only preserves the semantic closeness of words
-and documents, but also has a strong predictive power for the particular task.
-Compared to recent supervised approaches based on convolutional neural
-networks, predictive text embedding is comparable or more effective, much more
-efficient, and has fewer parameters to tune.
-"
-1953,1508.00305,"Panupong Pasupat, Percy Liang",Compositional Semantic Parsing on Semi-Structured Tables,cs.CL," Two important aspects of semantic parsing for question answering are the
-breadth of the knowledge source and the depth of logical compositionality.
-While existing work trades off one aspect for another, this paper
-simultaneously makes progress on both fronts through a new task: answering
-complex questions on semi-structured tables using question-answer pairs as
-supervision. The central challenge arises from two compounding factors: the
-broader domain results in an open-ended set of relations, and the deeper
-compositionality results in a combinatorial explosion in the space of logical
-forms. We propose a logical-form driven parsing algorithm guided by strong
-typing constraints and show that it obtains significant improvements over
-natural baselines. For evaluation, we created a new dataset of 22,033 complex
-questions on Wikipedia tables, which is made publicly available.
-"
-1954,1508.00354,"Sivanand Achanta, Anandaswarup Vadapalli, Sai Krishna R., Suryakanth - V. Gangashetty","Significance of Maximum Spectral Amplitude in Sub-bands for Spectral - Envelope Estimation and Its Application to Statistical Parametric Speech - Synthesis",cs.SD cs.CL," In this paper we propose a technique for spectral envelope estimation using
-maximum values in the sub-bands of the Fourier magnitude spectrum (MSASB).
-Most other methods in the literature parametrize the spectral envelope in the
-cepstral domain, e.g., as the Mel-generalized cepstrum. Such cepstral domain
-representations, although compact, are not readily interpretable. This
-difficulty is overcome by our method, which parametrizes the envelope in the
-spectral domain itself. In our experiments, the spectral envelope estimated
-using the MSASB method was incorporated into the STRAIGHT vocoder. Both
-objective and subjective results of analysis-by-synthesis indicate that the
-proposed method is comparable to STRAIGHT. We also evaluate the effectiveness
-of the proposed parametrization in a statistical parametric speech synthesis
-framework using deep neural networks.
-"
-1955,1508.00504,"Karthik Siva, Jim Tao, Matilde Marcolli",Spin Glass Models of Syntax and Language Evolution,cs.CL cond-mat.dis-nn physics.soc-ph," Using the SSWL database of syntactic parameters of world languages, and the
-MIT Media Lab data on language interactions, we construct a spin glass model of
-language evolution. We treat binary syntactic parameters as spin states, with
-languages as vertices of a graph, and assign interaction energies along the
-edges. We study a rough model of syntax evolution, under the assumption that a
-strong interaction energy tends to cause parameters to align, as in the case of
-ferromagnetic materials. We also study how the spin glass model needs to be
-modified to account for entailment relations between syntactic parameters.
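The core operation named in row 1954's title, taking the maximum Fourier magnitude per sub-band as the envelope parameters, is compact enough to sketch in numpy (equal-width linear bands and the Hann window are assumptions; the paper may lay out the bands differently):

```python
import numpy as np

def msasb_envelope(frame, n_bands=40):
    """Rough sketch: take the maximum Fourier magnitude in each of
    `n_bands` equal-width sub-bands as the spectral envelope parameters."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bands = np.array_split(mag, n_bands)
    return np.array([b.max() for b in bands])

frame = np.random.randn(1024)        # one windowed speech frame
print(msasb_envelope(frame).shape)   # (40,) interpretable spectral parameters
```

Unlike cepstral coefficients, each parameter here is directly readable as the peak energy in a known frequency band.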
This -modification leads naturally to a generalization of Potts models with external -magnetic field, which consists of a coupling at the vertices of an Ising model -and a Potts model with q=3, that have the same edge interactions. We describe -the results of simulations of the dynamics of these models, in different -temperature and energy regimes. We discuss the linguistic interpretation of the -parameters of the physical model. -" -1956,1508.00657,"Miguel Ballesteros, Chris Dyer, Noah A. Smith","Improved Transition-Based Parsing by Modeling Characters instead of - Words with LSTMs",cs.CL," We present extensions to a continuous-state dependency parsing method that -makes it applicable to morphologically rich languages. Starting with a -high-performance transition-based parser that uses long short-term memory -(LSTM) recurrent neural networks to learn representations of the parser state, -we replace lookup-based word representations with representations constructed -from the orthographic representations of the words, also using LSTMs. This -allows statistical sharing across word forms that are similar on the surface. -Experiments for morphologically rich languages show that the parsing model -benefits from incorporating the character-based encodings of words. -" -1957,1508.00715,"Zhilin Yang, Jie Tang, William Cohen",Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs,cs.CL cs.SI," We study the extent to which online social networks can be connected to open -knowledge bases. The problem is referred to as learning social knowledge -graphs. We propose a multi-modal Bayesian embedding model, GenVector, to learn -latent topics that generate word and network embeddings. GenVector leverages -large-scale unlabeled data with embeddings and represents data of two -modalities---i.e., social network users and knowledge concepts---in a shared -latent topic space. Experiments on three datasets show that the proposed method -clearly outperforms state-of-the-art methods. We then deploy the method on -AMiner, a large-scale online academic search system with a network of -38,049,189 researchers with a knowledge base with 35,415,011 concepts. Our -method significantly decreases the error rate in an online A/B test with live -users. -" -1958,1508.00973,"Peixian Chen, Nevin L. Zhang, Leonard K.M. Poon, Zhourong Chen",Progressive EM for Latent Tree Models and Hierarchical Topic Detection,cs.LG cs.CL cs.IR stat.ML," Hierarchical latent tree analysis (HLTA) is recently proposed as a new method -for topic detection. It differs fundamentally from the LDA-based methods in -terms of topic definition, topic-document relationship, and learning method. It -has been shown to discover significantly more coherent topics and better topic -hierarchies. However, HLTA relies on the Expectation-Maximization (EM) -algorithm for parameter estimation and hence is not efficient enough to deal -with large datasets. In this paper, we propose a method to drastically speed up -HLTA using a technique inspired by recent advances in the moments method. -Empirical experiments show that our method greatly improves the efficiency of -HLTA. It is as efficient as the state-of-the-art LDA-based method for -hierarchical topic detection and finds substantially better topics and topic -hierarchies. -" -1959,1508.01006,Dongxu Zhang and Dong Wang,Relation Classification via Recurrent Neural Network,cs.CL cs.LG cs.NE," Deep learning has gained much success in sentence-level relation -classification. 
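The ferromagnetic-alignment assumption in row 1955 can be illustrated with a toy Metropolis simulation of a single binary syntactic parameter treated as a spin; the chain graph, temperature, and interaction strengths below are invented for illustration and are not the paper's SSWL-derived setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setting: 5 languages on a chain, one binary parameter as a spin +/-1
J = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)  # interaction strengths
spins = rng.choice([-1, 1], size=5)
T = 0.5  # temperature

for _ in range(10_000):  # Metropolis dynamics on E = -1/2 s^T J s
    i = rng.integers(5)
    dE = 2 * spins[i] * (J[i] @ spins)  # energy change if spin i flips
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        spins[i] *= -1

print(spins)  # at low T, neighbouring parameter values tend to align
```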
For example, convolutional neural networks (CNNs) have delivered
-competitive performance without the extensive feature engineering required by
-conventional pattern-based methods, and much subsequent work has been based
-on CNN structures. However, a key issue that has not been well addressed
-by CNN-based methods is their limited capability to learn temporal features,
-especially long-distance dependencies between nominal pairs. In this paper, we
-propose a simple framework based on recurrent neural networks (RNNs) and
-compare it with a CNN-based model. To show the limitations of the widely used
-SemEval-2010 Task 8 dataset, we introduce another dataset refined from MIMLRE
-(Angeli et al., 2014). Experiments on two different datasets strongly indicate
-that the RNN-based model can deliver better performance on relation
-classification, and it is particularly capable of learning long-distance
-relation patterns. This makes it suitable for real-world applications where
-complicated expressions are often involved.
-"
-1960,1508.01011,"Dongxu Zhang, Tianyi Luo, Dong Wang and Rong Liu",Learning from LDA using Deep Neural Networks,cs.LG cs.CL cs.IR cs.NE," Latent Dirichlet Allocation (LDA) is a three-level hierarchical Bayesian
-model for topic inference. In spite of its great success, inferring the latent
-topic distribution with LDA is time-consuming. Motivated by the transfer
-learning approach proposed by~\newcite{hinton2015distilling}, we present a
-novel method that uses LDA to supervise the training of a deep neural network
-(DNN), so that the DNN can approximate the costly LDA inference with less
-computation. Our experiments on a document classification task show that a
-simple DNN can learn the LDA behavior pretty well, while the inference is
-sped up by tens or hundreds of times.
-"
-1961,1508.01067,Jing Su and Ois\'in Boydell and Derek Greene and Gerard Lynch,Topic Stability over Noisy Sources,cs.CL cs.IR," Topic modelling techniques such as LDA have recently been applied to speech
-transcripts and OCR output. These corpora may contain noisy or erroneous texts
-which may undermine topic stability. Therefore, it is important to know how
-well a topic modelling algorithm will perform when applied to noisy data. In
-this paper we show that different types of textual noise will have diverse
-effects on the stability of different topic models. From these observations, we
-propose guidelines for text corpus generation, with a focus on automatic speech
-transcription. We also suggest topic model selection methods for noisy corpora.
-"
-1962,1508.01211,William Chan and Navdeep Jaitly and Quoc V. Le and Oriol Vinyals,"Listen, Attend and Spell",cs.CL cs.LG cs.NE stat.ML," We present Listen, Attend and Spell (LAS), a neural network that learns to
-transcribe speech utterances to characters. Unlike traditional DNN-HMM models,
-this model learns all the components of a speech recognizer jointly. Our system
-has two components: a listener and a speller. The listener is a pyramidal
-recurrent network encoder that accepts filter bank spectra as inputs. The
-speller is an attention-based recurrent network decoder that emits characters
-as outputs. The network produces character sequences without making any
-independence assumptions between the characters. This is the key improvement of
-LAS over previous end-to-end CTC models.
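The teacher-student setup in row 1960 amounts to regressing a network onto LDA's per-document topic distributions; a minimal sklearn sketch of that idea (the tiny corpus and the plain squared-error MLP are stand-ins for the paper's DNN and distillation objective):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPRegressor

docs = ["the cat chased the mouse", "stocks fell on weak earnings",
        "the dog chased the cat", "markets rallied after the report"] * 25

X = CountVectorizer().fit_transform(docs)
teacher = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
targets = teacher.transform(X)  # costly inference: per-doc topic proportions

# student network learns to map bags-of-words straight to topic proportions,
# replacing iterative LDA inference with a single forward pass
student = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                       random_state=0).fit(X.toarray(), targets)
print(np.round(student.predict(X.toarray()[:2]), 2))
```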
On a subset of the Google voice search
-task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a
-language model, and 10.3% with language model rescoring over the top 32 beams.
-By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.
-"
-1963,1508.01306,Michael Minock and Nils Everling,Replication and Generalization of PRECISE,cs.CL cs.AI cs.DB," This report describes an initial replication study of the PRECISE system and
-develops a clearer, more formal description of the approach. Based on our
-evaluation, we conclude that the PRECISE results do not fully replicate.
-However, the formalization developed here suggests a road map to further
-enhance and extend the approach pioneered by PRECISE.
- After a long, productive discussion with Ana-Maria Popescu (one of the
-authors of PRECISE), we gained more clarity on the PRECISE approach and how the
-lexicon was authored for the GEO evaluation. Based on this we built a more
-direct implementation over a repaired formalism. Although our new evaluation is
-not yet complete, it is clear that the system is performing much better now. We
-will continue developing our ideas and implementation and generate a future
-report/publication that more accurately evaluates PRECISE-like approaches.
-"
-1964,1508.01321,Fatima M. Moncada and Jaderick P. Pabico,"On Gobbledygook and Mood of the Philippine Senate: An Exploratory Study - on the Readability and Sentiment of Selected Philippine Senators' Microposts",cs.CL cs.CY," This paper presents the findings of a readability assessment and sentiment
-analysis of six selected Philippine senators' microposts over the popular
-Twitter microblog. Using the Simple Measure of Gobbledygook (SMOG), tweets of
-Senators Cayetano, Defensor-Santiago, Pangilinan, Marcos, Guingona, and
-Escudero were assessed. A sentiment analysis was also done to determine the
-polarity of the senators' respective microposts. Results showed that on
-average, the six senators are tweeting at an eight to ten SMOG level. This
-means that at least a sixth grader will be able to understand the senators'
-tweets. Moreover, their tweets are mostly neutral and their sentiments vary in
-unison during certain periods of time. This could mean that a senator's tweet
-sentiment is affected by specific Philippine-based events.
-"
-1965,1508.01346,Alok Ranjan Pal and Diganta Saha,Word sense disambiguation: a survey,cs.CL," In this paper, we present a survey on Word Sense Disambiguation (WSD).
-Research in WSD has been conducted to varying extents in nearly all major
-languages around the world. We survey the different approaches adopted in
-different research works, the state of the art in performance in this domain,
-recent works in different Indian languages, and finally work on the Bengali
-language. We also survey the different competitions in this field and the
-benchmark results obtained from those competitions.
-"
-1966,1508.01349,"Alok Ranjan Pal, Diganta Saha and Niladri Sekhar Dash","Automatic classification of bengali sentences based on sense definitions - present in bengali wordnet",cs.CL," Based on the sense definition of words available in the Bengali WordNet, an
-attempt is made to classify the Bengali sentences automatically into different
-groups in accordance with their underlying senses. The input sentences are
-collected from 50 different categories of the Bengali text corpus developed in
-the TDIL project of the Govt.
of India, while information about the different
-senses of a particular ambiguous lexical item is collected from Bengali
-WordNet. On an experimental basis, we have used the Naive Bayes probabilistic
-model as a sentence classifier. We have applied the algorithm over 1747
-sentences that contain a particular Bengali lexical item which, because of its
-ambiguous nature, is able to trigger different senses that give the sentences
-different meanings. In our experiment we achieved around 84% accuracy in sense
-classification over the total set of input sentences. An analysis of the
-residual sentences that were misclassified and affected the results shows
-that, in many cases, incorrect syntactic structures and scant semantic
-information are the main hurdles in the semantic classification of sentences.
-This study is relevant to automatic text classification, machine learning,
-information extraction, and word sense disambiguation.
-"
-1967,1508.01420,"Lu\'is Marujo, Jos\'e Port\^elo, Wang Ling, David Martins de Matos, - Jo\~ao P. Neto, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha - Raj",Privacy-Preserving Multi-Document Summarization,cs.IR cs.CL cs.CR," State-of-the-art extractive multi-document summarization systems are usually
-designed without any concern about privacy issues, meaning that all documents
-are open to third parties. In this paper we propose a privacy-preserving
-approach to multi-document summarization. Our approach enables other parties to
-obtain summaries without learning anything else about the original documents'
-content. We use a hashing scheme known as Secure Binary Embeddings to convert
-document representations containing key phrases and bags-of-words into bit
-strings, allowing the computation of approximate distances, instead of exact
-ones. Our experiments indicate that our system yields similar results to its
-non-private counterpart on standard multi-document evaluation datasets.
-"
-1968,1508.01447,Iyad AlAgha,"Using Linguistic Analysis to Translate Arabic Natural Language Queries - to SPARQL",cs.CL cs.AI cs.DB," The logic-based machine-understandable framework of the Semantic Web often
-challenges naive users when they try to query ontology-based knowledge bases.
-Existing research efforts have approached this problem by introducing Natural
-Language (NL) interfaces to ontologies. These NL interfaces have the ability to
-construct SPARQL queries based on NL user queries. However, most efforts were
-restricted to queries expressed in English, and they often benefited from the
-advancement of English NLP tools. However, little research has been done to
-support querying the Arabic content on the Semantic Web by using NL queries.
-This paper presents a domain-independent approach to translate Arabic NL
-queries to SPARQL by leveraging linguistic analysis. Based on a special
-consideration on Noun Phrases (NPs), our approach uses a language parser to
-extract NPs and the relations from Arabic parse trees and match them to the
-underlying ontology. It then utilizes knowledge in the ontology to group NPs
-into triple-based representations. A SPARQL query is finally generated by
-extracting targets and modifiers, and interpreting them into SPARQL. The
-interpretation of advanced semantic features including negation, conjunctive
-and disjunctive modifiers is also supported.
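The Naive Bayes sense classification used in row 1966 follows the standard supervised text-classification recipe; a minimal sklearn sketch (English sentences for the ambiguous word "bank" stand in for the Bengali data, which is not reproduced here):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# sentences containing one ambiguous word, labelled with its sense
train = [("deposit the money in the bank", "finance"),
         ("the bank approved the loan", "finance"),
         ("we sat on the river bank", "river"),
         ("the bank of the stream flooded", "river")]
texts, senses = zip(*train)

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, senses)
print(clf.predict(["fishing from the grassy bank",
                   "the bank raised interest rates"]))
```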
The approach was evaluated by
-using two datasets consisting of OWL test data and queries, and the obtained
-results confirm the feasibility of translating Arabic NL queries to
-SPARQL.
-"
-1969,1508.01476,"Qiang Zhan, Chunhong Wang","Hyponymy extraction of domain ontology concept based on ccrfs and - hierarchy clustering",cs.CL," The concept hierarchy is the backbone of an ontology, and concept hierarchy
-acquisition has been a hot topic in the field of ontology learning. This paper
-proposes a hyponymy extraction method for domain ontology concepts based on
-cascaded conditional random fields (CCRFs) and hierarchical clustering. It
-takes free text as its extraction object and adopts CCRFs to identify the
-domain concepts. First, the lower layer of the CCRFs is used to identify
-simple domain concepts; the results are then sent to the higher layer, in
-which nested concepts are recognized. Next, we adopt hierarchical clustering
-to identify the hyponymy relations between domain ontology concepts. The
-experimental results demonstrate that the proposed method is effective.
-"
-1970,1508.01571,"Humberto Corona, Michael P. O'Mahony",A Mood-based Genre Classification of Television Content,cs.IR cs.CL," The classification of television content helps users organise and navigate
-through the large list of channels and programs now available. In this paper,
-we address the problem of television content classification by exploiting text
-information extracted from program transcriptions. We present an analysis which
-adapts a model for sentiment that has been widely and successfully applied in
-other fields such as music or blog posts. We use a real-world dataset obtained
-from the Boxfish API to compare the performance of classifiers trained on a
-number of different feature sets. Our experiments show that, over a large
-collection of television content, program genres can be represented in a
-three-dimensional space of valence, arousal and dominance, and that promising
-classification results can be achieved using features based on this
-representation. This finding supports the use of the proposed representation of
-television content as a feature space for similarity computation and
-recommendation generation.
-"
-1971,1508.01577,"Javier Vera, Felipe Urbina, Eric Goles","Automata networks model for alignment and least effort on vocabulary - formation",cs.CL physics.soc-ph," Can artificial communities of agents develop language with scaling relations
-close to the Zipf law? As a preliminary answer to this question, we propose an
-Automata Networks model of the formation of a vocabulary in a population of
-individuals, under two in-principle opposite strategies: alignment and the
-least-effort principle. Within previous accounts of the emergence of
-linguistic conventions (especially the Naming Game), we focus on modeling
-speaker and hearer efforts as actions over their vocabularies and we study the
-impact of these actions on the formation of a shared language. The numerical
-simulations are essentially based on an energy function that measures the
-amount of local agreement between the vocabularies. The results suggest that
-on one-dimensional lattices the best strategy for the formation of shared
-languages is the one that minimizes the efforts of speakers on communicative
-tasks.
-"
-1972,1508.01580,"Javier Vera, Eric Goles","Automata networks for memory loss effects in the formation of linguistic - conventions",cs.CL physics.soc-ph," This work attempts to give new theoretical insights into the absence of
-intermediate stages in the evolution of language. In particular, we develop an
-automata networks approach to a crucial question: how can a population of
-language users reach agreement on a linguistic convention? To describe the
-appearance of sharp transitions in the self-organization of language, we adopt
-an extremely simple model of (working) memory. At each time step, language
-users simply lose part of their word memories. Computer simulations on
-low-dimensional lattices reveal sharp transitions at critical values that
-depend on the size of the individuals' neighbourhoods.
-"
-1973,1508.01585,"Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, Bowen Zhou",Applying Deep Learning to Answer Selection: A Study and An Open Task,cs.CL cs.LG," We apply a general deep learning framework to address the non-factoid
-question answering task. Our approach does not rely on any linguistic tools and
-can be applied to different languages or domains. Various architectures are
-presented and compared. We create and release a QA corpus and set up a new QA
-task in the insurance domain. Experimental results demonstrate superior
-performance compared to the baseline methods and various technologies give
-further improvements. For this highly challenging task, the top-1 accuracy can
-reach up to 65.3% on a test set, which indicates a great potential for
-practical use.
-"
-1974,1508.01718,"Rimah Amami, Noureddine Ellouze","Study of Phonemes Confusions in Hierarchical Automatic Phoneme - Recognition System",cs.CL," In this paper, we have analyzed the impact of confusions on the robustness of
-phoneme recognition systems. The confusions are detected from the
-pronunciations and the confusion matrices of the phoneme recognizer. The
-confusions show that similarities between phonemes in pronunciation
-significantly affect the recognition rates. This paper proposes to understand
-those confusions in order to improve the performance of the phoneme
-recognition system by isolating the problematic phonemes. Confusion analysis
-leads us to build a new hierarchical recognizer using a new phoneme
-distribution and information from the confusion matrices. This new
-hierarchical phoneme recognition system shows significant improvements in
-recognition rates on the TIMIT database.
-"
-1975,1508.01745,"Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David - Vandyke, Steve Young","Semantically Conditioned LSTM-based Natural Language Generation for - Spoken Dialogue Systems",cs.CL," Natural language generation (NLG) is a critical component of spoken dialogue
-and it has a significant impact both on usability and perceived quality. Most
-NLG systems in common use employ rules and heuristics and tend to generate
-rigid and stylised responses without the natural variation of human language.
-They are also not easily scaled to systems covering multiple domains and
-languages. This paper presents a statistical language generator based on a
-semantically controlled Long Short-term Memory (LSTM) structure. The LSTM
-generator can learn from unaligned data by jointly optimising sentence planning
-and surface realisation using a simple cross entropy training criterion, and
-language variation can be easily achieved by sampling from output candidates.
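The vocabulary-formation dynamics studied in rows 1971 and 1972 descend from the Naming Game; a toy simulation with per-step memory loss, sketched under several simplifying assumptions (random speaker-hearer pairs rather than a lattice, and an invented forgetting rate), shows the basic mechanics:

```python
import random
random.seed(1)

N, STEPS = 20, 4000
vocab = [set() for _ in range(N)]   # each agent's word inventory
LOSS = 0.01                         # per-step chance an agent forgets a word

for t in range(STEPS):
    s, h = random.sample(range(N), 2)          # speaker, hearer
    if not vocab[s]:
        vocab[s].add(f"w{t}")                  # invent a new word
    word = random.choice(sorted(vocab[s]))
    if word in vocab[h]:                       # success: both align on the word
        vocab[s], vocab[h] = {word}, {word}
    else:
        vocab[h].add(word)                     # failure: hearer learns it
    for a in range(N):                         # working-memory loss
        if vocab[a] and random.random() < LOSS:
            vocab[a].discard(random.choice(sorted(vocab[a])))

print(len(set().union(*vocab)))  # distinct words surviving in the population
```

Sweeping LOSS (or the neighbourhood size, on a lattice) is where the sharp transitions described in row 1972 would be looked for.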
-With fewer heuristics, an objective evaluation in two differing test domains -showed the proposed method improved performance compared to previous methods. -Human judges scored the LSTM system higher on informativeness and naturalness -and overall preferred it to the other systems. -" -1976,1508.01746,"Alan Godoy, Fl\'avio Sim\~oes, Jos\'e Augusto Stuchi, Marcus de Assis - Angeloni, M\'ario Uliani, Ricardo Violato",Using Deep Learning for Detecting Spoofing Attacks on Speech Signals,cs.SD cs.CL cs.CR cs.LG stat.ML," It is well known that speaker verification systems are subject to spoofing -attacks. The Automatic Speaker Verification Spoofing and Countermeasures -Challenge -- ASVSpoof2015 -- provides a standard spoofing database, containing -attacks based on synthetic speech, along with a protocol for experiments. This -paper describes CPqD's systems submitted to the ASVSpoof2015 Challenge, based -on deep neural networks, working both as a classifier and as a feature -extraction module for a GMM and a SVM classifier. Results show the validity of -this approach, achieving less than 0.5\% EER for known attacks. -" -1977,1508.01755,"Tsung-Hsien Wen, Milica Gasic, Dongho Kim, Nikola Mrksic, Pei-Hao Su, - David Vandyke, Steve Young","Stochastic Language Generation in Dialogue using Recurrent Neural - Networks with Convolutional Sentence Reranking",cs.CL," The natural language generation (NLG) component of a spoken dialogue system -(SDS) usually needs a substantial amount of handcrafting or a well-labeled -dataset to be trained on. These limitations add significantly to development -costs and make cross-domain, multi-lingual dialogue systems intractable. -Moreover, human languages are context-aware. The most natural response should -be directly learned from data rather than depending on predefined syntaxes or -rules. This paper presents a statistical language generator based on a joint -recurrent and convolutional neural network structure which can be trained on -dialogue act-utterance pairs without any semantic alignments or predefined -grammar trees. Objective metrics suggest that this new model outperforms -previous methods under the same experimental conditions. Results of an -evaluation by human judges indicate that it produces not only high quality but -linguistically varied utterances which are preferred compared to n-gram and -rule-based systems. -" -1978,1508.01786,"Daniel M. Romero, Roderick I. Swaab, Brian Uzzi, and Adam D. Galinsky","Mimicry Is Presidential: Linguistic Style Matching in Presidential - Debates and Improved Polling Numbers",cs.CL cs.SI," The current research used the contexts of U.S. presidential debates and -negotiations to examine whether matching the linguistic style of an opponent in -a two-party exchange affects the reactions of third-party observers. Building -off communication accommodation theory (CAT), interaction alignment theory -(IAT), and processing fluency, we propose that language style matching (LSM) -will improve subsequent third-party evaluations because matching an opponent's -linguistic style reflects greater perspective taking and will make one's -arguments easier to process. In contrast, research on status inferences -predicts that LSM will negatively impact third-party evaluations because LSM -implies followership. We conduct two studies to test these competing -hypotheses. Study 1 analyzed transcripts of U.S. 
presidential debates between
-1976 and 2012 and found that candidates who matched their opponent's linguistic
-style increased their standing in the polls. Study 2 demonstrated a causal
-relationship between LSM and third-party observer evaluations using negotiation
-transcripts.
-"
-1979,1508.01991,"Zhiheng Huang, Wei Xu, and Kai Yu",Bidirectional LSTM-CRF Models for Sequence Tagging,cs.CL," In this paper, we propose a variety of Long Short-Term Memory (LSTM) based
-models for sequence tagging. These models include LSTM networks, bidirectional
-LSTM (BI-LSTM) networks, LSTM with a Conditional Random Field (CRF) layer
-(LSTM-CRF) and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). Our work is
-the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to
-NLP benchmark sequence tagging data sets. We show that the BI-LSTM-CRF model
-can efficiently use both past and future input features thanks to a
-bidirectional LSTM component. It can also use sentence level tag information
-thanks to a CRF layer. The BI-LSTM-CRF model can produce state-of-the-art (or
-close to it) accuracy on POS, chunking and NER data sets. In addition, it is
-robust and has less dependence on word embeddings compared to previous
-observations.
-"
-1980,1508.01993,Stefan Feuerriegel and Ralph Fehrer,"Improving Decision Analytics with Deep Learning: The Case of Financial - Disclosures",stat.ML cs.CL cs.LG," Decision analytics commonly focuses on the text mining of financial news
-sources in order to provide managerial decision support and to predict stock
-market movements. Existing predictive frameworks almost exclusively apply
-traditional machine learning methods, whereas recent research indicates that
-traditional machine learning methods are not sufficiently capable of extracting
-suitable features and capturing the non-linear nature of complex tasks. As a
-remedy, novel deep learning models aim to overcome this issue by extending
-traditional neural network models with additional hidden layers. Indeed, deep
-learning has been shown to outperform traditional methods in terms of
-predictive performance. In this paper, we adapt the novel deep learning
-technique to financial decision support. In this instance, we aim to predict
-the direction of stock movements following financial disclosures. As a result,
-we show how deep learning can outperform the accuracy of random forests, a
-standard machine learning benchmark, by 5.66%.
-"
-1981,1508.01996,"Hui Yu, Xiaofeng Wu, Wenbin Jiang, Qun Liu, ShouXun Lin","An Automatic Machine Translation Evaluation Metric Based on Dependency - Parsing Model",cs.CL," Most of the syntax-based metrics obtain the similarity by comparing the
-sub-structures extracted from the trees of hypothesis and reference. These
-sub-structures are defined by humans and cannot express all the information in
-the trees because of their limited length. In addition, the
-overlapping parts between these sub-structures are computed repeatedly. To avoid
-these problems, we propose a novel automatic evaluation metric based on a
-dependency parsing model, with no need for humans to define sub-structures.
-First, we train a dependency parsing model on the reference dependency tree.
-Then we generate the hypothesis dependency tree and the corresponding
-probability with the dependency parsing model. The quality of the hypothesis can
-be judged by this probability. In order to obtain the lexicon similarity, we
-also introduce the unigram F-score to the new metric.
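The language style matching score used in row 1978 is commonly computed per function-word category as one minus the normalized absolute difference in usage rates, then averaged (the Ireland & Pennebaker formulation; the paper's exact operationalization may differ slightly):

```python
def lsm(counts_a, counts_b, total_a, total_b, eps=1e-9):
    """Language style matching over function-word categories:
    per-category score 1 - |p_a - p_b| / (p_a + p_b), averaged."""
    scores = []
    for cat in counts_a:
        pa = counts_a[cat] / total_a
        pb = counts_b[cat] / total_b
        scores.append(1 - abs(pa - pb) / (pa + pb + eps))
    return sum(scores) / len(scores)

# function-word counts per category for two debaters, plus their word totals
a = {"pronouns": 120, "articles": 80, "prepositions": 150}
b = {"pronouns": 100, "articles": 90, "prepositions": 140}
print(round(lsm(a, b, total_a=1000, total_b=950), 3))  # close to 1 = matched
```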
Experimental results show
-that the new metric achieves state-of-the-art performance at the system level
-and is comparable with METEOR at the sentence level.
-"
-1982,1508.02060,"Walaa Medhat, Ahmed H. Yousef, Hoda Korashy",Egyptian Dialect Stopword List Generation from Social Network Data,cs.CL," This paper proposes a methodology for generating a stopword list from online
-social network (OSN) corpora in Egyptian Dialect (ED). The aim of the paper is
-to investigate the effect of removing ED stopwords on the Sentiment Analysis
-(SA) task. Previously generated stopword lists were based on Modern Standard
-Arabic (MSA), which is not the common language used in OSNs. We have generated
-a stopword list of Egyptian dialect to be used with the OSN corpora. We compare
-the efficiency of text classification when using the generated list along with
-previously generated lists of MSA and combining the Egyptian dialect list with
-the MSA list. The text classification was performed using Na\""ive Bayes and
-Decision Tree classifiers and two feature selection approaches, unigram and
-bigram. The experiments show that removing ED stopwords gives better
-performance than using lists of MSA stopwords only.
-"
-1983,1508.02091,"Jack Hessel, Nicolas Savva, Michael J. Wilber",Image Representations and New Domains in Neural Image Captioning,cs.CL cs.CV," We examine the possibility that recent promising results in automatic caption
-generation are due primarily to language models. By varying image
-representation quality produced by a convolutional neural network, we find that
-a state-of-the-art neural captioning algorithm is able to produce quality
-captions even when provided with surprisingly poor image representations. We
-replicate this result in a new, fine-grained, transfer learned captioning
-domain, consisting of 66K recipe image/title pairs. We also provide some
-experiments regarding the appropriateness of datasets for automatic captioning,
-and find that having multiple captions per image is beneficial, but not an
-absolute requirement.
-"
-1984,1508.02096,"Wang Ling and Tiago Lu\'is and Lu\'is Marujo and Ram\'on Fernandez - Astudillo and Silvio Amir and Chris Dyer and Alan W. Black and Isabel - Trancoso","Finding Function in Form: Compositional Character Models for Open - Vocabulary Word Representation",cs.CL," We introduce a model for constructing vector representations of words by
-composing characters using bidirectional LSTMs. Relative to traditional word
-representation models that have independent vectors for each word type, our
-model requires only a single vector per character type and a fixed set of
-parameters for the compositional model. Despite the compactness of this model
-and, more importantly, the arbitrary nature of the form-function relationship
-in language, our ""composed"" word representations yield state-of-the-art results
-in language modeling and part-of-speech tagging. Benefits over traditional
-baselines are particularly pronounced in morphologically rich languages (e.g.,
-Turkish).
-"
-1985,1508.02131,"Daniel Beck, Trevor Cohn, Christian Hardmeier, Lucia Specia",Learning Structural Kernels for Natural Language Processing,cs.CL cs.LG," Structural kernels are a flexible learning paradigm that has been widely used
-in Natural Language Processing. However, the problem of model selection in
-kernel-based methods is usually overlooked. Previous approaches mostly rely on
-setting default values for kernel hyperparameters or using grid search, which
-is slow and coarse-grained.
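The unigram F-score that row 1981 folds into its metric is the harmonic mean of clipped unigram precision and recall; a self-contained sketch:

```python
from collections import Counter

def unigram_f1(hypothesis, reference):
    """Unigram F-score: harmonic mean of clipped unigram precision/recall."""
    h, r = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((h & r).values())          # clipped match count
    if overlap == 0:
        return 0.0
    p = overlap / sum(h.values())            # precision over hypothesis tokens
    rec = overlap / sum(r.values())          # recall over reference tokens
    return 2 * p * rec / (p + rec)

print(unigram_f1("the cat sat on the mat", "the cat is on the mat"))  # 0.833
```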
In contrast, Bayesian methods allow efficient model -selection by maximizing the evidence on the training data through -gradient-based methods. In this paper we show how to perform this in the -context of structural kernels by using Gaussian Processes. Experimental results -on tree kernels show that this procedure results in better prediction -performance compared to hyperparameter optimization via grid search. The -framework proposed in this paper can be adapted to other structures besides -trees, e.g., strings and graphs, thereby extending the utility of kernel-based -methods. -" -1986,1508.02142,Iftekhar Naim and Daniel Gildea,Feature-based Decipherment for Large Vocabulary Machine Translation,cs.CL," Orthographic similarities across languages provide a strong signal for -probabilistic decipherment, especially for closely related language pairs. The -existing decipherment models, however, are not well-suited for exploiting these -orthographic similarities. We propose a log-linear model with latent variables -that incorporates orthographic similarity features. Maximum likelihood training -is computationally expensive for the proposed log-linear model. To address this -challenge, we perform approximate inference via MCMC sampling and contrastive -divergence. Our results show that the proposed log-linear model with -contrastive divergence scales to large vocabularies and outperforms the -existing generative decipherment models by exploiting the orthographic -features. -" -1987,1508.02225,"Hui Yu, Xiaofeng Wu, Wenbin Jiang, Qun Liu, Shouxun Lin","Improve the Evaluation of Fluency Using Entropy for Machine Translation - Evaluation Metrics",cs.CL," The widely-used automatic evaluation metrics cannot adequately reflect the -fluency of the translations. The n-gram-based metrics, like BLEU, limit the -maximum length of matched fragments to n and cannot catch the matched fragments -longer than n, so they can only reflect the fluency indirectly. METEOR, which -is not limited by n-gram, uses the number of matched chunks but it does not -consider the length of each chunk. In this paper, we propose an entropy-based -method, which can sufficiently reflect the fluency of translations through the -distribution of matched words. This method can easily combine with the -widely-used automatic evaluation metrics to improve the evaluation of fluency. -Experiments show that the correlations of BLEU and METEOR are improved on -sentence level after combining with the entropy-based method on WMT 2010 and -WMT 2012. -" -1988,1508.02285,Nut Limsopatham and Nigel Collier,"Adapting Phrase-based Machine Translation to Normalise Medical Terms in - Social Media Messages",cs.CL," Previous studies have shown that health reports in social media, such as -DailyStrength and Twitter, have potential for monitoring health conditions -(e.g. adverse drug reactions, infectious diseases) in particular communities. -However, in order for a machine to understand and make inferences on these -health conditions, the ability to recognise when laymen's terms refer to a -particular medical concept (i.e.\ text normalisation) is required. To achieve -this, we propose to adapt an existing phrase-based machine translation (MT) -technique and a vector representation of words to map between a social media -phrase and a medical concept. We evaluate our proposed approach using a -collection of phrases from tweets related to adverse drug reactions. 
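The evidence-maximizing model selection that row 1985 advocates is exactly what a Gaussian Process regressor does when it fits its kernel hyperparameters; a minimal sklearn sketch (an RBF kernel on synthetic data stands in for the paper's tree kernels, which sklearn does not provide):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

# hyperparameters (length scale, noise level) are fit by maximizing the
# log marginal likelihood on the training data -- no grid search needed
gp = GaussianProcessRegressor(RBF() + WhiteKernel()).fit(X, y)
print(gp.kernel_)                                    # optimized kernel
print(gp.log_marginal_likelihood(gp.kernel_.theta))  # the evidence
```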
Our -experimental results show that the combination of a phrase-based MT technique -and the similarity between word vector representations outperforms the -baselines that apply only either of them by up to 55%. -" -1989,1508.02297,Adriaan M. J. Schakel and Benjamin J. Wilson,Measuring Word Significance using Distributed Representations of Words,cs.CL," Distributed representations of words as real-valued vectors in a relatively -low-dimensional space aim at extracting syntactic and semantic features from -large text corpora. A recently introduced neural network, named word2vec -(Mikolov et al., 2013a; Mikolov et al., 2013b), was shown to encode semantic -information in the direction of the word vectors. In this brief report, it is -proposed to use the length of the vectors, together with the term frequency, as -measure of word significance in a corpus. Experimental evidence using a -domain-specific corpus of abstracts is presented to support this proposal. A -useful visualization technique for text corpora emerges, where words are mapped -onto a two-dimensional plane and automatically ranked by significance. -" -1990,1508.02354,Jianpeng Cheng and Dimitri Kartsaklis,"Syntax-Aware Multi-Sense Word Embeddings for Deep Compositional Models - of Meaning",cs.CL cs.AI cs.NE," Deep compositional models of meaning acting on distributional representations -of words in order to produce vectors of larger text constituents are evolving -to a popular area of NLP research. We detail a compositional distributional -framework based on a rich form of word embeddings that aims at facilitating the -interactions between words in the context of a sentence. Embeddings and -composition layers are jointly learned against a generic objective that -enhances the vectors with syntactic information from the surrounding context. -Furthermore, each word is associated with a number of senses, the most -plausible of which is selected dynamically during the composition process. We -evaluate the produced vectors qualitatively and quantitatively with positive -results. At the sentence level, the effectiveness of the framework is -demonstrated on the MSRPar task, for which we report results within the -state-of-the-art range. -" -1991,1508.02375,"Matthew R. Gormley, Mark Dredze, Jason Eisner",Approximation-Aware Dependency Parsing by Belief Propagation,cs.CL cs.LG," We show how to train the fast dependency parser of Smith and Eisner (2008) -for improved accuracy. This parser can consider higher-order interactions among -edges while retaining O(n^3) runtime. It outputs the parse with maximum -expected recall -- but for speed, this expectation is taken under a posterior -distribution that is constructed only approximately, using loopy belief -propagation through structured factors. We show how to adjust the model -parameters to compensate for the errors introduced by this approximation, by -following the gradient of the actual loss on training data. We find this -gradient by back-propagation. That is, we treat the entire parser -(approximations and all) as a differentiable circuit, as Stoyanov et al. (2011) -and Domke (2010) did for loopy CRFs. The resulting trained parser obtains -higher accuracy with fewer iterations of belief propagation than one trained by -conditional log-likelihood. -" -1992,1508.02445,Milo\v{s} Stanojevi\'c,Removing Biases from Trainable MT Metrics by Using Self-Training,cs.CL," Most trainable machine translation (MT) metrics train their weights on human -judgments of state-of-the-art MT systems outputs. 
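Row 1989 proposes combining the length of a word's vector with its term frequency as a significance measure; the specific combination below (product with a log-damped frequency) is my own illustrative assumption, since the abstract does not fix the functional form:

```python
import numpy as np

def significance(word, vectors, term_freq):
    """Sketch: combine vector length with term frequency as a
    word-significance score (exact combination is an assumption)."""
    return np.linalg.norm(vectors[word]) * np.log1p(term_freq[word])

vectors = {"neutrino": np.full(50, 0.3), "the": np.full(50, 0.02)}
term_freq = {"neutrino": 40, "the": 5000}
for w in vectors:
    print(w, round(significance(w, vectors, term_freq), 3))
```

The intuition from the paper is that content words accumulate long vectors during training while function words stay short, so length separates significant from insignificant words even at very different frequencies.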
This biases trainable metrics
-in many ways. One of them is preferring longer translations. When used for
-tuning, these biased metrics evaluate a different kind of translations:
-n-best lists of translations with very diverse quality. Systems
-tuned with these metrics tend to produce overly long translations that are
-preferred by the metric but not by humans. This is usually solved by manually
-tweaking the metric's weights to value recall and precision equally. Our
-solution is more general: (1) it addresses not only the recall bias but also
-all other biases that might be present in the data, and (2) it does not require
-any knowledge of the types of features used, which is useful in cases when
-manual tuning of the metric's weights is not possible. This is accomplished by
-self-training on unlabeled n-best lists using a metric that was initially
-trained on standard human judgments. One way of looking at this is as domain
-adaptation from the domain of state-of-the-art MT translations to diverse
-n-best list translations.
-"
-1993,1508.03040,Ram\'on Casares,Syntax Evolution: Problems and Recursion,cs.CL," To investigate the evolution of syntax, we need to ascertain the evolutionary
-r\^ole of syntax and, before that, the very nature of syntax. Here, we will
-assume that syntax is computing. And then, since we are computationally Turing
-complete, we meet an evolutionary anomaly, the anomaly of syntax: we are
-syntactically too competent for syntax. Assuming that problem solving is
-computing, and realizing that the evolutionary advantage of Turing completeness
-is full problem solving and not syntactic proficiency, we explain the anomaly
-of syntax by postulating that syntax and problem solving co-evolved in humans
-towards Turing completeness. Examining the requirements that full problem
-solving imposes on language, we find firstly that semantics is not sufficient
-and that syntax is necessary to represent problems. Our final conclusion is
-that full problem solving requires a functional semantics on an infinite
-tree-structured syntax. Besides these results, the introduction of Turing
-completeness and problem solving to explain the evolution of syntax should help
-us to fit the evolution of language within the evolution of cognition, giving
-us some new clues to understand the elusive relation between language and
-thinking.
-"
-1994,1508.03170,"Paulo Figueiredo and Marta Apar\'icio and David Martins de Matos and - Ricardo Ribeiro","Generation of Multimedia Artifacts: An Extractive Summarization-based - Approach",cs.AI cs.CL cs.MM," We explore methods for content selection and address the issue of coherence
-in the context of the generation of multimedia artifacts. We use audio and
-video to present two case studies: generation of film tributes, and
-lecture-driven science talks. For content selection, we use centrality-based
-and diversity-based summarization, along with topic analysis. To establish
-coherence, we use the emotional content of music, for film tributes, and ensure
-topic similarity between lectures and documentaries, for science talks.
-Composition techniques for the production of multimedia artifacts are addressed
-as a means of organizing content, in order to improve coherence. We discuss our
-results considering the above aspects.
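The centrality-based content selection named in row 1994 is usually implemented LexRank-style: rank sentences by graph centrality over their pairwise similarities. A minimal sketch, offered as a generic stand-in rather than the authors' exact pipeline:

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centrality_summary(sentences, k=2):
    """Rank sentences by PageRank centrality over their cosine
    similarities and keep the top k in original document order."""
    sims = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    scores = nx.pagerank(nx.from_numpy_array(sims))
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:k])
    return [sentences[i] for i in top]

sents = ["The film follows a retired detective.",
         "He is pulled back for one last case.",
         "The soundtrack features three new songs.",
         "The case forces the detective to face his past."]
print(centrality_summary(sents))
```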
-"
-1995,1508.03276,Jakob Suchan and Mehul Bhatt and Harshita Jhavar,"Talking about the Moving Image: A Declarative Model for Image Schema - Based Embodied Perception Grounding and Language Generation",cs.AI cs.CL cs.CV cs.HC," We present a general theory and corresponding declarative model for the
-embodied grounding and natural language based analytical summarisation of
-dynamic visuo-spatial imagery. The declarative model ---encompassing
-spatio-linguistic abstractions, image schemas, and a spatio-temporal feature
-based language generator--- is modularly implemented within Constraint Logic
-Programming (CLP). The implemented model is such that primitives of the theory,
-e.g., pertaining to space and motion, image schemata, are available as
-first-class objects with `deep semantics' suited for inference and query. We
-demonstrate the model with select examples broadly motivated by areas such as
-film, design, geography, and smart environments, where analytical natural
-language based externalisations of the moving image are central from the
-viewpoint of human interaction, evidence-based qualitative analysis, and
-sensemaking.
- Keywords: moving image, visual semantics and embodiment, visuo-spatial
-cognition and computation, cognitive vision, computational models of narrative,
-declarative spatial reasoning
-"
-1996,1508.03386,"Pei-Hao Su, David Vandyke, Milica Gasic, Dongho Kim, Nikola Mrksic, - Tsung-Hsien Wen, Steve Young","Learning from Real Users: Rating Dialogue Success with Neural Networks - for Reinforcement Learning in Spoken Dialogue Systems",cs.LG cs.CL," To train a statistical spoken dialogue system (SDS) it is essential that an
-accurate method for measuring task success is available. To date, training has
-relied on presenting a task to either simulated or paid users and inferring the
-dialogue's success by observing whether this presented task was achieved or
-not. Our aim, however, is to be able to learn from real users acting under
-their own volition, in which case it is non-trivial to rate the success as any
-prior knowledge of the task is simply unavailable. User feedback may be
-utilised but has been found to be inconsistent. Hence, here we present two
-neural network models that evaluate a sequence of turn-level features to rate
-the success of a dialogue. Importantly these models make no use of any prior
-knowledge of the user's task. The models are trained on dialogues generated by
-a simulated user and the best model is then used to train a policy on-line
-which is shown to perform at least as well as a baseline system using prior
-knowledge of the user's task. We note that the models should also be of
-interest for evaluating SDS and for monitoring a dialogue in rule-based SDS.
-"
-1997,1508.03391,"Pei-Hao Su, David Vandyke, Milica Gasic, Nikola Mrksic, Tsung-Hsien - Wen, Steve Young","Reward Shaping with Recurrent Neural Networks for Speeding up On-Line - Policy Learning in Spoken Dialogue Systems",cs.LG cs.CL," Statistical spoken dialogue systems have the attractive property of being
-able to be optimised from data via interactions with real users. However in the
-reinforcement learning paradigm the dialogue manager (agent) often requires
-significant time to explore the state-action space to learn to behave in a
-desirable manner. This is a critical issue when the system is trained on-line
-with real users where learning costs are expensive. Reward shaping is one
-promising technique for addressing these concerns.
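For context on the reward shaping that row 1997 builds on: the classic potential-based form of Ng et al. (1999) adds a turn-level signal without changing the optimal policy. This sketch shows that standard form, not the paper's RNN-predicted shaping; the slot-counting potential is an invented toy:

```python
def shaped_reward(r_env, phi_s, phi_next, gamma=0.99):
    """Potential-based shaping (Ng et al., 1999): add F = gamma*phi(s') - phi(s)
    to the environment reward; this provably preserves optimal policies."""
    return r_env + gamma * phi_next - phi_s

# toy potential: progress through the slots of a task-oriented dialogue
phi = lambda filled_slots: float(filled_slots)
r = shaped_reward(r_env=0.0, phi_s=phi(1), phi_next=phi(2))
print(r)  # 0.98: extra turn-level signal nudging the agent forward
```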
Here we examine three
-recurrent neural network (RNN) approaches for providing reward shaping
-information in addition to the primary (task-orientated) environmental
-feedback. These RNNs are trained on returns from dialogues generated by a
-simulated user and attempt to diffuse the overall evaluation of the dialogue
-back down to the turn level to guide the agent towards good behaviour faster.
-In both simulated and real user scenarios these RNNs are shown to increase
-policy learning speed. Importantly, they do not require prior knowledge of the
-user's goal.
-"
-1998,1508.03530,"Dami\'an G. Hern\'andez, Dami\'an H. Zanette, In\'es Samengo","Information-theoretical analysis of the statistical dependencies among
- three variables: Applications to written language",cs.CL physics.data-an physics.soc-ph," We develop the information-theoretical concepts required to study the
-statistical dependencies among three variables. Some of these dependencies are
-pure triple interactions, in the sense that they cannot be explained in terms
-of a combination of pairwise correlations. We derive bounds for triple
-dependencies, and characterize the shape of the joint probability distribution
-of three binary variables with high triple interaction. The analysis also
-allows us to quantify the amount of redundancy in the mutual information
-between pairs of variables, and to assess whether the information between two
-variables is or is not mediated by a third variable. These concepts are applied
-to the analysis of written texts. We find that the probability that a given
-word is found in a particular location within the text is modulated not only by
-the presence or absence of other nearby words, but also by the presence or
-absence of nearby pairs of words. We identify the words enclosing the key
-semantic concepts of the text, the triplets of words with high pairwise and
-triple interactions, and the words that mediate the pairwise interactions
-between other words.
-"
-1999,1508.03601,Ranjitha R. K. and Sanjay Singh,Is Stack Overflow Overflowing With Questions and Tags,cs.SI cs.CL," Programming question and answer (Q & A) websites, such as Quora, Stack
-Overflow, and Yahoo! Answers, help us to understand programming concepts easily
-and quickly in a way that has been tested and applied by many software
-developers. Stack Overflow is one of the most frequently used programming Q\&A
-websites, where the questions and answers posted are presently analyzed
-manually, which requires a huge amount of time and resources. To save this
-effort, we present a topic modeling based technique to analyze the words of
-the original texts to discover the themes that run through them. We also
-propose a method to automate the process of reviewing the quality of questions
-on the Stack Overflow dataset in order to avoid ballooning Stack Overflow with
-insignificant questions. The proposed method also recommends the appropriate
-tags for the new post, which averts the creation of unnecessary tags on Stack
-Overflow.
-"
-2000,1508.03720,"Xu Yan, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, Zhi Jin","Classifying Relations via Long Short Term Memory Networks along Shortest
- Dependency Path",cs.CL cs.LG," Relation classification is an important research arena in the field of
-natural language processing (NLP). In this paper, we present SDP-LSTM, a novel
-neural network to classify the relation of two entities in a sentence.
Our
-neural architecture leverages the shortest dependency path (SDP) between two
-entities; multichannel recurrent neural networks, with long short term memory
-(LSTM) units, pick up heterogeneous information along the SDP. Our proposed
-model has several distinct features: (1) The shortest dependency paths retain
-the most relevant information (to relation classification), while eliminating
-irrelevant words in the sentence. (2) The multichannel LSTM networks allow
-effective information integration from heterogeneous sources over the
-dependency paths. (3) A customized dropout strategy regularizes the neural
-network to alleviate overfitting. We test our model on the SemEval 2010
-relation classification task, and achieve an $F_1$-score of 83.7\%, higher than
-competing methods in the literature.
-"
-2001,1508.03721,"Hao Peng, Lili Mou, Ge Li, Yunchuan Chen, Yangyang Lu, Zhi Jin","A Comparative Study on Regularization Strategies for Embedding-based
- Neural Networks",cs.CL cs.LG," This paper aims to compare different regularization strategies to address a
-common phenomenon, severe overfitting, in embedding-based neural networks for
-NLP. We chose two widely studied neural models and tasks as our testbed. We
-tried several frequently applied or newly proposed regularization strategies,
-including penalizing weights (embeddings excluded), penalizing embeddings,
-re-embedding words, and dropout. We also emphasized incremental hyperparameter
-tuning and the combination of different regularizations. The results provide a
-picture of how to tune hyperparameters for neural NLP models.
-"
-2002,1508.03790,"Kaisheng Yao, Trevor Cohn, Katerina Vylomova, Kevin Duh, and Chris
- Dyer",Depth-Gated LSTM,cs.NE cs.CL," In this short note, we present an extension of long short-term memory (LSTM)
-neural networks that uses a depth gate to connect memory cells of adjacent
-layers. Doing so introduces a linear dependence between lower and upper layer
-recurrent units. Importantly, the linear dependence is gated through a gating
-function, which we call the depth gate. This gate is a function of the
-lower-layer memory cell, the input to this layer, and this layer's past memory
-cell. We conducted experiments and verified that this new architecture of
-LSTMs was able to improve machine translation and language modeling
-performance.
-"
-2003,1508.03826,"Shaohua Li, Jun Zhu, Chunyan Miao","A Generative Word Embedding Model and its Low Rank Positive Semidefinite
- Solution",cs.CL cs.LG stat.ML," Most existing word embedding methods can be categorized into Neural Embedding
-Models and Matrix Factorization (MF)-based methods. However, some models are
-opaque to probabilistic interpretation, and MF-based methods, typically solved
-using Singular Value Decomposition (SVD), may incur loss of corpus information.
-In addition, it is desirable to incorporate global latent factors, such as
-topics, sentiments or writing styles, into the word embedding model. Since
-generative models provide a principled way to incorporate latent factors, we
-propose a generative word embedding model, which is easy to interpret, and can
-serve as a basis of more sophisticated latent factor models. The model
-inference reduces to a low rank weighted positive semidefinite approximation
-problem. Its optimization is approached by eigendecomposition on a submatrix,
-followed by online blockwise regression, which is scalable and avoids the
-information loss in SVD.
In experiments on 7 common benchmark datasets, our
-vectors are competitive with word2vec, and better than other MF-based methods.
-"
-2004,1508.03854,Marek Rei,Online Representation Learning in Recurrent Neural Language Models,cs.CL cs.LG cs.NE," We investigate an extension of continuous online learning in recurrent neural
-network language models. The model keeps a separate vector representation of
-the current unit of text being processed and adaptively adjusts it after each
-prediction. The initial experiments give promising results, indicating that the
-method is able to increase language modelling accuracy, while also decreasing
-the number of parameters needed to store the model and the computation required
-at each step.
-"
-2005,1508.03868,"Brendan Jou, Tao Chen, Nikolaos Pappas, Miriam Redi, Mercan Topkara,
- Shih-Fu Chang","Visual Affect Around the World: A Large-scale Multilingual Visual
- Sentiment Ontology",cs.MM cs.CL cs.CV cs.IR," Every culture and language is unique. Our work expressly focuses on the
-uniqueness of culture and language in relation to human affect, specifically
-sentiment and emotion semantics, and how they manifest in social multimedia. We
-develop sets of sentiment- and emotion-polarized visual concepts by adapting
-semantic structures called adjective-noun pairs, originally introduced by Borth
-et al. (2013), but in a multilingual context. We propose a new
-language-dependent method for automatic discovery of these adjective-noun
-constructs. We show how this pipeline can be applied on a social multimedia
-platform for the creation of a large-scale multilingual visual sentiment
-concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our
-unified ontology is organized hierarchically by multilingual clusters of
-visually detectable nouns and subclusters of emotionally biased versions of
-these nouns. In addition, we present an image-based prediction task to show how
-generalizable language-specific models are in a multilingual context. A new,
-publicly available dataset of >15.6K sentiment-biased visual concepts across 12
-languages with language-specific detector banks, >7.36M images and their
-metadata is also released.
-"
-2006,1508.04025,"Minh-Thang Luong, Hieu Pham, Christopher D. Manning",Effective Approaches to Attention-based Neural Machine Translation,cs.CL," An attentional mechanism has lately been used to improve neural machine
-translation (NMT) by selectively focusing on parts of the source sentence
-during translation. However, there has been little work exploring useful
-architectures for attention-based NMT. This paper examines two simple and
-effective classes of attentional mechanism: a global approach which always
-attends to all source words and a local one that only looks at a subset of
-source words at a time. We demonstrate the effectiveness of both approaches
-over the WMT translation tasks between English and German in both directions.
-With local attention, we achieve a significant gain of 5.0 BLEU points over
-non-attentional systems which already incorporate known techniques such as
-dropout. Our ensemble model using different attention architectures has
-established a new state-of-the-art result in the WMT'15 English to German
-translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over
-the existing best system backed by NMT and an n-gram reranker.
-" -2007,1508.04112,"Tao Lei, Regina Barzilay and Tommi Jaakkola","Molding CNNs for text: non-linear, non-consecutive convolutions",cs.CL cs.AI," The success of deep learning often derives from well-chosen operational -building blocks. In this work, we revise the temporal convolution operation in -CNNs to better adapt it to text processing. Instead of concatenating word -representations, we appeal to tensor algebra and use low-rank n-gram tensors to -directly exploit interactions between words already at the convolution stage. -Moreover, we extend the n-gram convolution to non-consecutive words to -recognize patterns with intervening words. Through a combination of low-rank -tensors, and pattern weighting, we can efficiently evaluate the resulting -convolution operation via dynamic programming. We test the resulting -architecture on standard sentiment classification and news categorization -tasks. Our model achieves state-of-the-art performance both in terms of -accuracy and training speed. For instance, we obtain 51.2% accuracy on the -fine-grained sentiment classification task. -" -2008,1508.04257,"Wenpeng Yin, Hinrich Sch\""utze",Learning Meta-Embeddings by Using Ensembles of Embedding Sets,cs.CL," Word embeddings -- distributed representations of words -- in deep learning -are beneficial for many tasks in natural language processing (NLP). However, -different embedding sets vary greatly in quality and characteristics of the -captured semantics. Instead of relying on a more advanced algorithm for -embedding learning, this paper proposes an ensemble approach of combining -different public embedding sets with the aim of learning meta-embeddings. -Experiments on word similarity and analogy tasks and on part-of-speech tagging -show better performance of meta-embeddings compared to individual embedding -sets. One advantage of meta-embeddings is the increased vocabulary coverage. We -will release our meta-embeddings publicly. -" -2009,1508.04271,Jan A. Botha,Probabilistic Modelling of Morphologically Rich Languages,cs.CL," This thesis investigates how the sub-structure of words can be accounted for -in probabilistic models of language. Such models play an important role in -natural language processing tasks such as translation or speech recognition, -but often rely on the simplistic assumption that words are opaque symbols. This -assumption does not fit morphologically complex language well, where words can -have rich internal structure and sub-word elements are shared across distinct -word forms. - Our approach is to encode basic notions of morphology into the assumptions of -three different types of language models, with the intention that leveraging -shared sub-word structure can improve model performance and help overcome data -sparsity that arises from morphological processes. - In the context of n-gram language modelling, we formulate a new Bayesian -model that relies on the decomposition of compound words to attain better -smoothing, and we develop a new distributed language model that learns vector -representations of morphemes and leverages them to link together -morphologically related words. In both cases, we show that accounting for word -sub-structure improves the models' intrinsic performance and provides benefits -when applied to other tasks, including machine translation. - We then shift the focus beyond the modelling of word sequences and consider -models that automatically learn what the sub-word elements of a given language -are, given an unannotated list of words. 
We formulate a novel model that can
-learn discontiguous morphemes in addition to the more conventional contiguous
-morphemes that most previous models are limited to. This approach is
-demonstrated on Semitic languages, and we find that modelling discontiguous
-sub-word structures leads to improvements in the task of segmenting words into
-their contiguous morphemes.
-"
-2010,1508.04395,"Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel,
- Yoshua Bengio",End-to-End Attention-based Large Vocabulary Speech Recognition,cs.CL cs.AI cs.LG cs.NE," Many of the current state-of-the-art Large Vocabulary Continuous Speech
-Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov
-Models (HMMs). Most of these systems contain separate components that deal with
-the acoustic modelling, language modelling and sequence decoding. We
-investigate a more direct approach in which the HMM is replaced with a
-Recurrent Neural Network (RNN) that performs sequence prediction directly at
-the character level. Alignment between the input features and the desired
-character sequence is learned automatically by an attention mechanism built
-into the RNN. For each predicted character, the attention mechanism scans the
-input sequence and chooses relevant frames. We propose two methods to speed up
-this operation: limiting the scan to a subset of the most promising frames and
-pooling over time the information contained in neighboring frames, thereby
-reducing source sequence length. Integrating an n-gram language model into the
-decoding process yields recognition accuracies similar to other HMM-free
-RNN-based approaches.
-"
-2011,1508.04515,"Wei Zhang, Judith Gelernter","Exploring Metaphorical Senses and Word Representations for Identifying
- Metonyms",cs.CL," A metonym is a word with a figurative meaning, similar to a metaphor. Because
-metonyms are closely related to metaphors, we apply features that are used
-successfully for metaphor recognition to the task of detecting metonyms. On the
-ACL SemEval 2007 Task 8 data with gold standard metonym annotations, our system
-achieved 86.45% accuracy on the location metonyms. Our code can be found on
-GitHub.
-"
-2012,1508.04525,"Wei Zhang, Yang Yu, Osho Gupta, Judith Gelernter","Recognizing Extended Spatiotemporal Expressions by Actively Trained
- Average Perceptron Ensembles",cs.CL cs.LG," Precise geocoding and time normalization for text requires that location and
-time phrases be identified. Many state-of-the-art geoparsers and temporal
-parsers suffer from low recall. Categories commonly missed by parsers are:
-nouns used in a non-spatiotemporal sense, adjectival and adverbial phrases,
-prepositional phrases, and numerical phrases. We collected and annotated a data
-set by querying a commercial web search API with spatiotemporal expressions
-that were missed by state-of-the-art parsers. Due to the high cost of sentence
-annotation, active learning was used to label training data, and a new strategy
-was designed to better select training examples to reduce labeling cost. For
-the learning algorithm, we applied an average-perceptron-trained Featurized
-Hidden Markov Model (FHMM). Five FHMM instances were used to create an
-ensemble, with the output phrase selected by voting. Our ensemble model was
-tested on a range of sequential labeling tasks, and has shown competitive
-performance.
Our contributions include (1) a new dataset annotated with named
-entities and expanded spatiotemporal expressions; (2) a comparison of inference
-algorithms for ensemble models showing the superior accuracy of Belief
-Propagation over Viterbi Decoding; (3) a new example re-weighting method for
-active ensemble learning that 'memorizes' the latest examples trained; (4) a
-spatiotemporal parser that jointly recognizes expanded spatiotemporal
-expressions as well as named entities.
-"
-2013,1508.04562,"Jingwei Zhang, Aaron Gerow, Jaan Altosaar, James Evans, Richard Jean
- So","Fast, Flexible Models for Discovering Topic Correlation across
- Weakly-Related Collections",cs.CL cs.IR," Weak topic correlation across document collections with different numbers of
-topics in individual collections presents challenges for existing
-cross-collection topic models. This paper introduces two probabilistic topic
-models, Correlated LDA (C-LDA) and Correlated HDP (C-HDP). These address
-problems that can arise when analyzing large, asymmetric, and potentially
-weakly-related collections. Topic correlations in weakly-related collections
-typically lie in the tail of the topic distribution, where they would be
-overlooked by models unable to fit large numbers of topics. To efficiently
-model this long tail for large-scale analysis, our models implement a parallel
-sampling algorithm based on the Metropolis-Hastings and alias methods (Yuan et
-al., 2015). The models are first evaluated on synthetic data, generated to
-simulate various collection-level asymmetries. We then present a case study of
-modeling over 300k documents in collections of sciences and humanities research
-from JSTOR.
-"
-2014,1508.05051,Kenton Murray and David Chiang,Auto-Sizing Neural Networks: With Applications to n-gram Language Models,cs.CL," Neural networks have been shown to improve performance across a range of
-natural-language tasks. However, designing and training them can be
-complicated. Frequently, researchers resort to repeated experimentation to pick
-optimal settings. In this paper, we address the issue of choosing the correct
-number of units in hidden layers. We introduce a method for automatically
-adjusting network size by pruning out hidden units through $\ell_{\infty,1}$
-and $\ell_{2,1}$ regularization. We apply this method to language modeling and
-demonstrate its ability to correctly choose the number of hidden units while
-maintaining perplexity. We also include these models in a machine translation
-decoder and show that these smaller neural models maintain the significant
-improvements of their unpruned versions.
-"
-2015,1508.05154,Khanh Nguyen and Brendan O'Connor,"Posterior calibration and exploratory analysis for natural language
- processing models",cs.CL," Many models in natural language processing define probabilistic distributions
-over linguistic structures. We argue that (1) the quality of a model's
-posterior distribution can and should be directly evaluated, as to whether
-probabilities correspond to empirical frequencies, and (2) NLP uncertainty can
-be projected not only to pipeline components, but also to exploratory data
-analysis, telling a user when to trust and not trust the NLP analysis. We
-present a method to analyze calibration, and apply it to compare the
-miscalibration of several commonly used models. We also contribute a
-coreference sampling algorithm that can create confidence intervals for a
-political event extraction task.
-" -2016,1508.05163,"Yustinus Eko Soelistio, Martinus Raditia Sigit Surendra","Simple Text Mining for Sentiment Analysis of Political Figure Using - Naive Bayes Classifier Method",cs.CL cs.IR," Text mining can be applied to many fields. One of the application is using -text mining in digital newspaper to do politic sentiment analysis. In this -paper sentiment analysis is applied to get information from digital news -articles about its positive or negative sentiment regarding particular -politician. This paper suggests a simple model to analyze digital newspaper -sentiment polarity using naive Bayes classifier method. The model uses a set of -initial data to begin with which will be updated when new information appears. -The model showed promising result when tested and can be implemented to some -other sentiment analysis problems. -" -2017,1508.05326,"Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. - Manning",A large annotated corpus for learning natural language inference,cs.CL," Understanding entailment and contradiction is fundamental to understanding -natural language, and inference about entailment and contradiction is a -valuable testing ground for the development of semantic representations. -However, machine learning research in this area has been dramatically limited -by the lack of large-scale resources. To address this, we introduce the -Stanford Natural Language Inference corpus, a new, freely available collection -of labeled sentence pairs, written by humans doing a novel grounded task based -on image captioning. At 570K pairs, it is two orders of magnitude larger than -all other resources of its type. This increase in scale allows lexicalized -classifiers to outperform some sophisticated existing entailment models, and it -allows a neural network-based model to perform competitively on natural -language inference benchmarks for the first time. -" -2018,1508.05508,"Baolin Peng, Zhengdong Lu, Hang Li and Kam-Fai Wong",Towards Neural Network-based Reasoning,cs.AI cs.CL cs.LG cs.NE," We propose Neural Reasoner, a framework for neural network-based reasoning -over natural language sentences. Given a question, Neural Reasoner can infer -over multiple supporting facts and find an answer to the question in specific -forms. Neural Reasoner has 1) a specific interaction-pooling mechanism, -allowing it to examine multiple facts, and 2) a deep architecture, allowing it -to model the complicated logical relations in reasoning tasks. Assuming no -particular structure exists in the question and facts, Neural Reasoner is able -to accommodate different types of reasoning and different forms of language -expressions. Despite the model complexity, Neural Reasoner can still be trained -effectively in an end-to-end manner. Our empirical studies show that Neural -Reasoner can outperform existing neural reasoning systems with remarkable -margins on two difficult artificial tasks (Positional Reasoning and Path -Finding) proposed in [8]. For example, it improves the accuracy on Path -Finding(10K) from 33.4% [6] to over 98%. 
-" -2019,1508.05565,"Weicong Ding, Prakash Ishwar, Venkatesh Saligrama","Necessary and Sufficient Conditions and a Provably Efficient Algorithm - for Separable Topic Discovery",cs.LG cs.CL cs.IR stat.ML," We develop necessary and sufficient conditions and a novel provably -consistent and efficient algorithm for discovering topics (latent factors) from -observations (documents) that are realized from a probabilistic mixture of -shared latent factors that have certain properties. Our focus is on the class -of topic models in which each shared latent factor contains a novel word that -is unique to that factor, a property that has come to be known as separability. -Our algorithm is based on the key insight that the novel words correspond to -the extreme points of the convex hull formed by the row-vectors of a suitably -normalized word co-occurrence matrix. We leverage this geometric insight to -establish polynomial computation and sample complexity bounds based on a few -isotropic random projections of the rows of the normalized word co-occurrence -matrix. Our proposed random-projections-based algorithm is naturally amenable -to an efficient distributed implementation and is attractive for modern -web-scale distributed data mining applications. -" -2020,1508.05817,"Marco Guerini, G\""ozde \""Ozbal, Carlo Strapparava",Echoes of Persuasion: The Effect of Euphony in Persuasive Communication,cs.CL cs.CY cs.SI," While the effect of various lexical, syntactic, semantic and stylistic -features have been addressed in persuasive language from a computational point -of view, the persuasive effect of phonetics has received little attention. By -modeling a notion of euphony and analyzing four datasets comprising persuasive -and non-persuasive sentences in different domains (political speeches, movie -quotes, slogans and tweets), we explore the impact of sounds on different forms -of persuasiveness. We conduct a series of analyses and prediction experiments -within and across datasets. Our results highlight the positive role of phonetic -devices on persuasion. -" -2021,1508.05902,Arun S. Maiya,A Framework for Comparing Groups of Documents,cs.CL cs.SI," We present a general framework for comparing multiple groups of documents. A -bipartite graph model is proposed where document groups are represented as one -node set and the comparison criteria are represented as the other node set. -Using this model, we present basic algorithms to extract insights into -similarities and differences among the document groups. Finally, we demonstrate -the versatility of our framework through an analysis of NSF funding programs -for basic research. -" -2022,1508.06034,"Jun-Ping Ng, Viktoria Abrecht",Better Summarization Evaluation with Word Embeddings for ROUGE,cs.CL cs.IR," ROUGE is a widely adopted, automatic evaluation measure for text -summarization. While it has been shown to correlate well with human judgements, -it is biased towards surface lexical similarities. This makes it unsuitable for -the evaluation of abstractive summarization, or summaries with substantial -paraphrasing. We study the effectiveness of word embeddings to overcome this -disadvantage of ROUGE. Specifically, instead of measuring lexical overlaps, -word embeddings are used to compute the semantic similarity of the words used -in summaries instead. Our experimental results show that our proposal is able -to achieve better correlations with human judgements when measured with the -Spearman and Kendall rank coefficients. 
-" -2023,1508.06044,"Hanchuan Li, Haichen Shen, Shengliang Xu and Congle Zhang",Visualizing NLP annotations for Crowdsourcing,cs.CL," Visualizing NLP annotation is useful for the collection of training data for -the statistical NLP approaches. Existing toolkits either provide limited visual -aid, or introduce comprehensive operators to realize sophisticated linguistic -rules. Workers must be well trained to use them. Their audience thus can hardly -be scaled to large amounts of non-expert crowdsourced workers. In this paper, -we present CROWDANNO, a visualization toolkit to allow crowd-sourced workers to -annotate two general categories of NLP problems: clustering and parsing. -Workers can finish the tasks with simplified operators in an interactive -interface, and fix errors conveniently. User studies show our toolkit is very -friendly to NLP non-experts, and allow them to produce high quality labels for -several sophisticated problems. We release our source code and toolkit to spur -future research. -" -2024,1508.06161,"Daniel Paul Barrett, Scott Alan Bronikowski, Haonan Yu, and Jeffrey - Mark Siskind","Robot Language Learning, Generation, and Comprehension",cs.RO cs.AI cs.CL cs.HC cs.LG," We present a unified framework which supports grounding natural-language -semantics in robotic driving. This framework supports acquisition (learning -grounded meanings of nouns and prepositions from human annotation of robotic -driving paths), generation (using such acquired meanings to generate sentential -description of new robotic driving paths), and comprehension (using such -acquired meanings to support automated driving to accomplish navigational goals -specified in natural language). We evaluate the performance of these three -tasks by having independent human judges rate the semantic fidelity of the -sentences associated with paths, achieving overall average correctness of 94.6% -and overall average completeness of 85.6%. -" -2025,1508.06374,Alexander Koplenig,"A fully data-driven method to identify (correlated) changes in - diachronic corpora",cs.CL cs.IR stat.AP," In this paper, a method for measuring synchronic corpus (dis-)similarity put -forward by Kilgarriff (2001) is adapted and extended to identify trends and -correlated changes in diachronic text data, using the Corpus of Historical -American English (Davies 2010a) and the Google Ngram Corpora (Michel et al. -2010a). This paper shows that this fully data-driven method, which extracts -word types that have undergone the most pronounced change in frequency in a -given period of time, is computationally very cheap and that it allows -interpretations of diachronic trends that are both intuitively plausible and -motivated from the perspective of information theory. Furthermore, it -demonstrates that the method is able to identify correlated linguistic changes -and diachronic shifts that can be linked to historical events. Finally, it can -help to improve diachronic POS tagging and complement existing NLP approaches. -This indicates that the approach can facilitate an improved understanding of -diachronic processes in language change. -" -2026,1508.06451,Ramon Ferrer-i-Cancho and Carlos G\'omez-Rodr\'iguez,Crossings as a side effect of dependency lengths,cs.CL cs.SI physics.soc-ph," The syntactic structure of sentences exhibits a striking regularity: -dependencies tend to not cross when drawn above the sentence. We investigate -two competing explanations. 
The traditional hypothesis is that this trend
-arises from an independent principle of syntax that reduces crossings
-practically to zero. An alternative to this view is the hypothesis that
-crossings are a side effect of dependency lengths, i.e. sentences with shorter
-dependency lengths should tend to have fewer crossings. We are able to reject
-the traditional view in the majority of languages considered. The alternative
-hypothesis can lead to a more parsimonious theory of language.
-"
-2027,1508.06491,Jacob Andreas and Dan Klein,Alignment-based compositional semantics for instruction following,cs.CL," This paper describes an alignment-based model for interpreting natural
-language instructions in context. We approach instruction following as a search
-over plans, scoring sequences of actions conditioned on structured observations
-of text and the environment. By explicitly modeling both the low-level
-compositional structure of individual actions and the high-level structure of
-full plans, we are able to learn both grounded representations of sentence
-meaning and pragmatic constraints on interpretation. To demonstrate the model's
-flexibility, we apply it to a diverse set of benchmark tasks. On every task, we
-outperform strong task-specific baselines, and achieve several new
-state-of-the-art results.
-"
-2028,1508.06615,"Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush",Character-Aware Neural Language Models,cs.CL cs.NE stat.ML," We describe a simple neural language model that relies only on
-character-level inputs. Predictions are still made at the word-level. Our model
-employs a convolutional neural network (CNN) and a highway network over
-characters, whose output is given to a long short-term memory (LSTM) recurrent
-neural network language model (RNN-LM). On the English Penn Treebank the model
-is on par with the existing state-of-the-art despite having 60% fewer
-parameters. On languages with rich morphology (Arabic, Czech, French, German,
-Spanish, Russian), the model outperforms word-level/morpheme-level LSTM
-baselines, again with fewer parameters. The results suggest that on many
-languages, character inputs are sufficient for language modeling. Analysis of
-word representations obtained from the character composition part of the model
-reveals that the model is able to encode, from characters only, both semantic
-and orthographic information.
-"
-2029,1508.06669,"Yanran Li, Wenjie Li, Fei Sun and Sujian Li",Component-Enhanced Chinese Character Embeddings,cs.CL," Distributed word representations are very useful for capturing semantic
-information and have been successfully applied in a variety of NLP tasks,
-especially on English. In this work, we innovatively develop two
-component-enhanced Chinese character embedding models and their bigram
-extensions. Distinguished from English word embeddings, our models explore the
-compositions of Chinese characters, which often serve as semantic indicators
-inherently. The evaluations on both word similarity and text classification
-demonstrate the effectiveness of our models.
-"
-2030,1508.07266,"Suin Kim, Sungjoon Park, Scott A. Hale, Sooyoung Kim, Jeongmin Byun
- and Alice Oh",Understanding Editing Behaviors in Multilingual Wikipedia,cs.SI cs.CL cs.CY," Multilingualism is common offline, but we have a more limited understanding
-of the ways multilingualism is displayed online and the roles that
-multilinguals play in the spread of content between speakers of different
-languages.
We take a computational approach to studying multilingualism using
-one of the largest user-generated content platforms, Wikipedia. We study
-multilingualism by collecting and analyzing a large dataset of the content
-written by multilingual editors of the English, German, and Spanish editions of
-Wikipedia. This dataset contains over two million paragraphs edited by over
-15,000 multilingual users from July 8 to August 9, 2013. We analyze these
-multilingual editors in terms of their engagement, interests, and language
-proficiency in their primary and non-primary (secondary) languages and find
-that the English edition of Wikipedia displays different dynamics from the
-Spanish and German editions. Users primarily editing the Spanish and German
-editions make more complex edits than users who edit these editions as a second
-language. In contrast, users editing the English edition as a second language
-make edits that are just as complex as the edits by users who primarily edit
-the English edition. In this way, English serves a special role bringing
-together content written by multilinguals from many language editions.
-Nonetheless, language remains a formidable hurdle to the spread of content: we
-find evidence for a complexity barrier whereby editors are less likely to edit
-complex content in a second language. In addition, we find that multilinguals
-are less engaged and show lower levels of language proficiency in their second
-languages. We also examine the topical interests of multilingual editors and
-find that there is no significant difference between primary and non-primary
-editors in each language.
-"
-2031,1508.07544,"Dong Nguyen, A. Seza Do\u{g}ru\""oz, Carolyn P. Ros\'e, Franciska de
- Jong",Computational Sociolinguistics: A Survey,cs.CL," Language is a social phenomenon and variation is inherent to its social
-nature. Recently, there has been a surge of interest within the computational
-linguistics (CL) community in the social dimension of language. In this article
-we present a survey of the emerging field of ""Computational Sociolinguistics""
-that reflects this increased interest. We aim to provide a comprehensive
-overview of CL research on sociolinguistic themes, featuring topics such as the
-relation between language and social identity, language use in social
-interaction and multilingual communication. Moreover, we demonstrate the
-potential for synergy between the research communities involved, by showing how
-the large-scale data-driven methods that are widely used in CL can complement
-existing sociolinguistic studies, and how sociolinguistics can inform and
-challenge the methods and assumptions employed in CL studies. We hope to convey
-the possible benefits of a closer collaboration between the two communities and
-conclude with a discussion of open challenges.
-"
-2032,1508.07555,Yanping Chen,An Event Network for Exploring Open Information,cs.CL," In this paper, an event network is presented for exploring open information,
-where linguistic units about an event are organized for analysis. The process
-is divided into three steps: document event detection, event network
-construction and event network analysis. First, by implementing event detection
-or tracking, documents are retrospectively (or on-line) organized into document
-events. Secondly, for each document event, linguistic units are extracted and
-combined into event networks. Thirdly, various analytic methods are proposed
-for event network analysis.
In our application, methodologies are
-presented for exploring open information.
-"
-2033,1508.07709,Simon \v{S}uster and Gertjan van Noord and Ivan Titov,"Word Representations, Tree Models and Syntactic Functions",cs.CL cs.LG stat.ML," Word representations induced from models with discrete latent variables
-(e.g.\ HMMs) have been shown to be beneficial in many NLP applications. In this
-work, we exploit labeled syntactic dependency trees and formalize the induction
-problem as unsupervised learning of tree-structured hidden Markov models.
-Syntactic functions are used as additional observed variables in the model,
-influencing both transition and emission components. Such syntactic information
-can potentially lead to capturing more fine-grained and functional distinctions
-between words, which, in turn, may be desirable in many NLP applications. We
-evaluate the word representations on two tasks -- named entity recognition and
-semantic frame identification. We observe improvements from exploiting
-syntactic function information in both cases, with results rivaling those of
-state-of-the-art representation learning methods. Additionally, we revisit the
-relationship between sequential and unlabeled-tree models and find that the
-advantage of the latter is not self-evident.
-"
-2034,1508.07909,"Rico Sennrich, Barry Haddow and Alexandra Birch",Neural Machine Translation of Rare Words with Subword Units,cs.CL," Neural machine translation (NMT) models typically operate with a fixed
-vocabulary, but translation is an open-vocabulary problem. Previous work
-addresses the translation of out-of-vocabulary words by backing off to a
-dictionary. In this paper, we introduce a simpler and more effective approach,
-making the NMT model capable of open-vocabulary translation by encoding rare
-and unknown words as sequences of subword units. This is based on the intuition
-that various word classes are translatable via smaller units than words, for
-instance names (via character copying or transliteration), compounds (via
-compositional translation), and cognates and loanwords (via phonological and
-morphological transformations). We discuss the suitability of different word
-segmentation techniques, including simple character n-gram models and a
-segmentation based on the byte pair encoding compression algorithm, and
-empirically show that subword models improve over a back-off dictionary
-baseline for the WMT 15 translation tasks English-German and English-Russian by
-1.1 and 1.3 BLEU, respectively.
-"
-2035,1509.00533,"Scott Wisdom, Thomas Powers, Les Atlas, and James Pitton","Enhancement and Recognition of Reverberant and Noisy Speech by Extending
- Its Coherence",cs.SD cs.CL stat.AP," Most speech enhancement algorithms make use of the short-time Fourier
-transform (STFT), which is a simple and flexible time-frequency decomposition
-that estimates the short-time spectrum of a signal. However, the duration of
-short STFT frames is inherently limited by the nonstationarity of speech
-signals. The main contribution of this paper is a demonstration of speech
-enhancement and automatic speech recognition in the presence of reverberation
-and noise by extending the length of analysis windows. We accomplish this
-extension by performing enhancement in the short-time fan-chirp transform
-(STFChT) domain, an overcomplete time-frequency representation that is coherent
-with speech signals over longer analysis window durations than the STFT.
This
-extended coherence is gained by using a linear model of fundamental frequency
-variation of voiced speech signals. Our approach centers around using a
-single-channel minimum mean-square error log-spectral amplitude (MMSE-LSA)
-estimator proposed by Habets, which scales coefficients in a time-frequency
-domain to suppress noise and reverberation. In the case of multiple
-microphones, we preprocess the data with either a minimum variance
-distortionless response (MVDR) beamformer, or a delay-and-sum beamformer (DSB).
-We evaluate our algorithm on both speech enhancement and recognition tasks for
-the REVERB challenge dataset. Compared to the same processing done in the STFT
-domain, our approach achieves significant improvement in terms of objective
-enhancement metrics (including PESQ---the ITU-T standard measurement for speech
-quality). In terms of automatic speech recognition (ASR) performance as
-measured by word error rate (WER), our experiments indicate that the STFT with
-a long window is more effective for ASR.
-"
-2036,1509.00685,"Alexander M. Rush, Sumit Chopra and Jason Weston",A Neural Attention Model for Abstractive Sentence Summarization,cs.CL cs.AI," Summarization based on text extraction is inherently limited, but
-generation-style abstractive methods have proven challenging to build. In this
-work, we propose a fully data-driven approach to abstractive sentence
-summarization. Our method utilizes a local attention-based model that generates
-each word of the summary conditioned on the input sentence. While the model is
-structurally simple, it can easily be trained end-to-end and scales to a large
-amount of training data. The model shows significant performance gains on the
-DUC-2004 shared task compared with several strong baselines.
-"
-2037,1509.00705,Dinesh Balaji Sashikanth,Analysis of Communication Pattern with Scammers in Enron Corpus,cs.CL," This paper is an exploratory analysis of fraud detection, taking the Enron
-email corpus as a case study. The paper posits conclusions such as strict
-servitude and unquestionable faith among employees being breeding grounds for
-sham among higher executives. We also try to infer the nature of communication
-between fraudulent employees and between non-fraudulent and fraudulent
-employees.
-"
-2038,1509.00838,Hongyuan Mei and Mohit Bansal and Matthew R. Walter,"What to talk about and how? Selective Generation using LSTMs with
- Coarse-to-Fine Alignment",cs.CL cs.AI cs.LG cs.NE," We propose an end-to-end, domain-independent neural encoder-aligner-decoder
-model for selective generation, i.e., the joint task of content selection and
-surface realization. Our model first encodes a full set of over-determined
-database event records via an LSTM-based recurrent neural network, then
-utilizes a novel coarse-to-fine aligner to identify the small subset of salient
-records to talk about, and finally employs a decoder to generate free-form
-descriptions of the aligned, selected records. Our model achieves the best
-selection and generation results reported to-date (with 59% relative
-improvement in generation) on the benchmark WeatherGov dataset, despite using
-no specialized features or linguistic resources. Using an improved k-nearest
-neighbor beam filter helps further. We also perform a series of ablations and
-visualizations to elucidate the contributions of our key model components.
-Lastly, we evaluate the generalizability of our model on the RoboCup dataset,
-and get results that are competitive with or better than the state-of-the-art,
-despite being severely data-starved.
-"
-2039,1509.00963,"Dilek K\""u\c{c}\""uk and Do\u{g}an K\""u\c{c}\""uk",On TimeML-Compliant Temporal Expression Extraction in Turkish,cs.CL," It is commonly acknowledged that temporal expression extractors are important
-components of larger natural language processing systems like information
-retrieval and question answering systems. Extraction and normalization of
-temporal expressions in Turkish have not been given attention so far, except
-for the extraction of some date and time expressions in the course of named
-entity recognition. As TimeML is the current standard of temporal expression
-and event annotation in natural language texts, in this paper, we present an
-analysis of temporal expressions in Turkish based on the related TimeML
-classification (i.e., date, time, duration, and set expressions). We have
-created a lexicon for Turkish temporal expressions and devised considerably
-wide-coverage patterns using the lexical classes as the building blocks. We
-believe that the proposed patterns, together with convenient normalization
-rules, can be readily used by prospective temporal expression extraction tools
-for Turkish.
-"
-2040,1509.01007,"Dominique Osborne, Shashi Narayan and Shay B. Cohen",Encoding Prior Knowledge with Eigenword Embeddings,cs.CL," Canonical correlation analysis (CCA) is a method for reducing the dimension
-of data represented using two views. It has been previously used to derive word
-embeddings, where one view indicates a word, and the other view indicates its
-context. We describe a way to incorporate prior knowledge into CCA, give a
-theoretical justification for it, and test it by deriving word embeddings and
-evaluating them on a myriad of datasets.
-"
-2041,1509.01023,Ibrahim Adeyanju,Generating Weather Forecast Texts with Case Based Reasoning,cs.AI cs.CL," Several techniques have been used to generate weather forecast texts. In this
-paper, case based reasoning (CBR) is proposed for weather forecast text
-generation because similar weather conditions occur over time and should have
-similar forecast texts. CBR-METEO, a system for generating weather forecast
-texts, was developed using a generic framework (jCOLIBRI) which provides
-modules for the standard components of the CBR architecture. The advantage in a
-CBR approach is that systems can be built in minimal time with far less human
-effort after initial consultation with experts. The approach depends heavily on
-the goodness of the retrieval and revision components of the CBR process. We
-evaluated CBR-METEO with NIST, an automated metric which has been shown to
-correlate well with human judgements for this domain. The system shows
-comparable performance with other NLG systems that perform the same task.
-"
-2042,1509.01288,"Max Zimmermann, Eirini Ntoutsi, Myra Spiliopoulou","Incremental Active Opinion Learning Over a Stream of Opinionated
- Documents",cs.IR cs.CL cs.LG," Applications that learn from opinionated documents, like tweets or product
-reviews, face two challenges. First, the opinionated documents constitute an
-evolving stream, where both the author's attitude and the vocabulary itself may
-change. Second, labels of documents are scarce and labels of words are
-unreliable, because the sentiment of a word depends on the (unknown) context in
-the author's mind.
Most of the research on mining over opinionated streams
-focuses on the first aspect of the problem, whereas for the second a continuous
-supply of labels from the stream is assumed. Such an assumption, though, is
-utopian as the stream is infinite and the labeling cost is prohibitive. To this
-end, we investigate the potential of active stream learning algorithms that ask
-for labels on demand. Our proposed ACOSTREAM approach works with limited
-labels: it uses an initial seed of labeled documents, occasionally requests
-additional labels for documents from the human expert and incrementally adapts
-to the underlying stream while exploiting the available labeled documents. At
-its core, ACOSTREAM consists of an MNB classifier coupled with ""sampling""
-strategies for requesting class labels for new unlabeled documents. In the
-experiments, we evaluate the classifier performance over time by varying: (a)
-the class distribution of the opinionated stream, while assuming that the set
-of the words in the vocabulary is fixed but their polarities may change with
-the class distribution; and (b) the number of unknown words arriving at each
-moment, while the class polarity may also change. Our results show that active
-learning on a stream of opinionated documents delivers good performance while
-requiring only a small selection of labels.
-"
-2043,1509.01310,"Qian Lu, Chunshan Xu and Haitao Liu",The influence of Chunking on Dependency Crossing and Distance,cs.CL," This paper hypothesizes that chunking plays an important role in reducing
-dependency distance and dependency crossings. Computer simulations, when
-compared with natural languages, show that chunking reduces the mean dependency
-distance (MDD) of a linear sequence of nodes (constrained by continuity or
-projectivity) to that of natural languages. More interestingly, chunking alone
-also brings about fewer dependency crossings, though it fails to reduce them to
-the rarity found in human languages. These results suggest that chunking may
-play a vital role in the minimization of dependency distance, and a somewhat
-contributing role in the rarity of dependency crossing. In addition, the
-results point to a possibility that the rarity of dependency crossings is
-not a mere side-effect of minimization of dependency distance, but a linguistic
-phenomenon with its own motivations.
-"
-2044,1509.01599,Parminder Bhatia and Yangfeng Ji and Jacob Eisenstein,Better Document-level Sentiment Analysis from RST Discourse Parsing,cs.CL cs.AI," Discourse structure is the hidden link between surface features and
-document-level properties, such as sentiment polarity. We show that the
-discourse analyses produced by Rhetorical Structure Theory (RST) parsers can
-improve document-level sentiment analysis, via composition of local information
-up the discourse tree. First, we show that reweighting discourse units
-according to their position in a dependency representation of the rhetorical
-structure can yield substantial improvements on lexicon-based sentiment
-analysis. Next, we present a recursive neural network over the RST structure,
-which offers significant improvements over classification-based methods.
-"
-2045,1509.01626,"Xiang Zhang, Junbo Zhao, Yann LeCun",Character-level Convolutional Networks for Text Classification,cs.LG cs.CL," This article offers an empirical exploration on the use of character-level
-convolutional networks (ConvNets) for text classification.
We constructed -several large-scale datasets to show that character-level convolutional -networks could achieve state-of-the-art or competitive results. Comparisons are -offered against traditional models such as bag of words, n-grams and their -TFIDF variants, and deep learning models such as word-based ConvNets and -recurrent neural networks. -" -2046,1509.01692,"Ekaterina Vylomova, Laura Rimell, Trevor Cohn, Timothy Baldwin","Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility - of Vector Differences for Lexical Relation Learning",cs.CL," Recent work on word embeddings has shown that simple vector subtraction over -pre-trained embeddings is surprisingly effective at capturing different lexical -relations, despite lacking explicit supervision. Prior work has evaluated this -intriguing result using a word analogy prediction formulation and hand-selected -relations, but the generality of the finding over a broader range of lexical -relation types and different learning settings has not been evaluated. In this -paper, we carry out such an evaluation in two learning settings: (1) spectral -clustering to induce word relations, and (2) supervised learning to classify -vector differences into relation types. We find that word embeddings capture a -surprising amount of information, and that, under suitable supervised training, -vector subtraction generalises well to a broad range of relations, including -over unseen lexical items. -" -2047,1509.01722,Ramon Ferrer-i-Cancho,"A commentary on ""The now-or-never bottleneck: a fundamental constraint - on language"", by Christiansen and Chater (2016)",cs.CL," In a recent article, Christiansen and Chater (2016) present a fundamental -constraint on language, i.e. a now-or-never bottleneck that arises from our -fleeting memory, and explore its implications, e.g., chunk-and-pass processing, -outlining a framework that promises to unify different areas of research. Here -we explore additional support for this constraint and suggest further -connections from quantitative linguistics and information theory. -" -2048,1509.01771,Gibran Fuentes-Pineda and Ivan Vladimir Meza-Ruiz,Sampled Weighted Min-Hashing for Large-Scale Topic Mining,cs.LG cs.CL cs.IR," We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to -automatically mine topics from large-scale corpora. SWMH generates multiple -random partitions of the corpus vocabulary based on term co-occurrence and -agglomerates highly overlapping inter-partition cells to produce the mined -topics. While other approaches define a topic as a probabilistic distribution -over a vocabulary, SWMH topics are ordered subsets of such vocabulary. -Interestingly, the topics mined by SWMH underlie themes from the corpus at -different levels of granularity. We extensively evaluate the meaningfulness of -the mined topics both qualitatively and quantitatively on the NIPS (1.7 K -documents), 20 Newsgroups (20 K), Reuters (800 K) and Wikipedia (4 M) corpora. -Additionally, we compare the quality of SWMH with Online LDA topics for -document representation in classification. -" -2049,1509.01865,"Alex Olieman, Jaap Kamps, Maarten Marx, and Arjan Nusselder",A Hybrid Approach to Domain-Specific Entity Linking,cs.IR cs.CL," The current state-of-the-art Entity Linking (EL) systems are geared towards -corpora that are as heterogeneous as the Web, and therefore perform -sub-optimally on domain-specific corpora. 
A key open problem is how to -construct effective EL systems for specific domains, as knowledge of the local -context should in principle increase, rather than decrease, effectiveness. In -this paper we propose the hybrid use of simple specialist linkers in -combination with an existing generalist system to address this problem. Our -main findings are the following. First, we construct a new reusable benchmark -for EL on a corpus of domain-specific conversations. Second, we test the -performance of a range of approaches under the same conditions, and show that -specialist linkers obtain high precision in isolation, and high recall when -combined with generalist linkers. Hence, we can effectively exploit local -context and get the best of both worlds. -" -2050,1509.01899,"Quan Liu, Wu Guo, Zhen-Hua Ling","Integrate Document Ranking Information into Confidence Measure - Calculation for Spoken Term Detection",cs.CL," This paper proposes an algorithm to improve the calculation of confidence -measure for spoken term detection (STD). Given an input query term, the -algorithm first calculates a measurement named document ranking weight for each -document in the speech database to reflect its relevance with the query term by -summing all the confidence measures of the hypothesized term occurrences in -this document. The confidence measure of each term occurrence is then -re-estimated through linear interpolation with the calculated document ranking -weight to improve its reliability by integrating document-level information. -Experiments are conducted on three standard STD tasks for Tamil, Vietnamese and -English respectively. The experimental results all demonstrate that the -proposed algorithm achieves consistent improvements over the state-of-the-art -method for confidence measure calculation. Furthermore, this algorithm is still -effective even if a high accuracy speech recognizer is not available, which -makes it applicable for the languages with limited speech resources. -" -2051,1509.01938,"Katrin Kirchhoff, Bing Zhao, Wen Wang","Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical - Machine Translation",cs.CL," Statistical machine translation for dialectal Arabic is characterized by a -lack of data since data acquisition involves the transcription and translation -of spoken language. In this study we develop techniques for extracting parallel -data for one particular dialect of Arabic (Iraqi Arabic) from out-of-domain -corpora in different dialects of Arabic or in Modern Standard Arabic. We -compare two different data selection strategies (cross-entropy based and -submodular selection) and demonstrate that a very small but highly targeted -amount of found data can improve the performance of a baseline machine -translation system. We furthermore report on preliminary experiments on using -automatically translated speech data as additional training data. -" -2052,1509.01978,"Darko Brodic, Alessia Amelio, Zoran N. Milivojevic","An Approach to the Analysis of the South Slavic Medieval Labels Using - Image Texture",cs.CV cs.AI cs.CL," The paper presents a new script classification method for the discrimination -of the South Slavic medieval labels. It consists in the textural analysis of -the script types. In the first step, each letter is coded by the equivalent -script type, which is defined by its typographical features. 
The obtained coded -text is subjected to run-length statistical analysis and to adjacent -local binary pattern analysis in order to extract the features. The results -show a diversity among the extracted features of the scripts, which makes -the feature classification more effective. This forms the basis for the -script identification process, which uses an extension of a -state-of-the-art approach for document clustering. The proposed method is -evaluated on examples of labels hand-engraved in stone and hand-printed on paper -in old Cyrillic and in angular and round Glagolitic. Experiments demonstrate -very positive results, which prove the effectiveness of the proposed method. -" -2053,1509.02208,"Cheng-Tao Chung, Chun-an Chan, Lin-shan Lee","Unsupervised Discovery of Linguistic Structure Including Two-level - Acoustic Patterns Using Three Cascaded Stages of Iterative Optimization",cs.CL," Techniques for unsupervised discovery of acoustic patterns are getting -increasingly attractive, because huge quantities of speech data are becoming -available but manual annotations remain hard to acquire. In this paper, we -propose an approach for unsupervised discovery of linguistic structure for the -target spoken language given raw speech data. This linguistic structure -includes two-level (subword-like and word-like) acoustic patterns, the lexicon -of word-like patterns in terms of subword-like patterns and the N-gram language -model based on word-like patterns. All patterns, models, and parameters can be -automatically learned from the unlabelled speech corpus. This is achieved by an -initialization step followed by three cascaded stages for acoustic, linguistic, -and lexical iterative optimization. The lexicon of word-like patterns defines -the allowed consecutive sequences of HMMs for subword-like patterns. In each -iteration, model training and decoding produce updated labels from which the -lexicon and HMMs can be further updated. In this way, model parameters and -decoded labels are respectively optimized in each iteration, and the knowledge -about the linguistic structure is learned gradually layer after layer. The -proposed approach was tested in preliminary experiments on a corpus of Mandarin -broadcast news, including a task of spoken term detection with performance -compared to a parallel test using models trained in a supervised way. Results -show that the proposed system not only yields reasonable performance on its -own, but is also complementary to existing large vocabulary ASR systems. -" -2054,1509.02213,"Cheng-Tao Chung, Chun-an Chan, Lin-shan Lee","Unsupervised Spoken Term Detection with Spoken Queries by Multi-level - Acoustic Patterns with Varying Model Granularity",cs.CL," This paper presents a new approach for unsupervised Spoken Term Detection -with spoken queries using multiple sets of acoustic patterns automatically -discovered from the target corpus. The different pattern HMM -configurations (number of states per model, number of distinct models, number of -Gaussians per state) form a three-dimensional model granularity space. Different -sets of acoustic patterns automatically discovered on different points properly -distributed over this three-dimensional space are complementary to one another, -thus can jointly capture the characteristics of the spoken terms.
By -representing the spoken content and spoken query as sequences of acoustic -patterns, a series of approaches for matching the pattern index sequences while -considering the signal variations is developed. In this way, not only can the -on-line computation load be reduced, but the signal distributions caused by -different speakers and acoustic conditions can also be reasonably handled. The -results indicate that this approach significantly outperformed the unsupervised -feature-based DTW baseline by 16.16\% in mean average precision on the TIMIT -corpus. -" -2055,1509.02217,"Cheng-Tao Chung, Wei-Ning Hsu, Cheng-Yi Lee, Lin-Shan Lee","Enhancing Automatically Discovered Multi-level Acoustic Patterns - Considering Context Consistency With Applications in Spoken Term Detection",cs.CL," This paper presents a novel approach for enhancing the multiple sets of -acoustic patterns automatically discovered from a given corpus. In a previous -work it was proposed that different HMM configurations (number of states per -model, number of distinct models) for the acoustic patterns form a -two-dimensional space. Multiple sets of acoustic patterns automatically -discovered with the HMM configurations properly located on different points -over this two-dimensional space were shown to be complementary to one another, -jointly capturing the characteristics of the given corpus. By representing the -given corpus as sequences of acoustic patterns on different HMM sets, the -pattern indices in these sequences can be relabeled considering the context -consistency across the different sequences. Good improvements were observed in -preliminary experiments on spoken term detection (STD) performed on -both TIMIT and Mandarin Broadcast News with such enhanced patterns. -" -2056,1509.02301,"Octavian-Eugen Ganea, Marina Ganea, Aurelien Lucchi, Carsten Eickhoff, - Thomas Hofmann",Probabilistic Bag-Of-Hyperlinks Model for Entity Linking,cs.CL," Many fundamental problems in natural language processing rely on determining -what entities appear in a given text. Commonly referred to as entity linking, -this step is a fundamental component of many NLP tasks such as text -understanding, automatic summarization, semantic search or machine translation. -Name ambiguity, word polysemy, context dependencies and a heavy-tailed -distribution of entities contribute to the complexity of this problem. - We here propose a probabilistic approach that makes use of an effective -graphical model to perform collective entity disambiguation. Input mentions -(i.e., linkable token spans) are disambiguated jointly across an entire -document by combining a document-level prior of entity co-occurrences with -local information captured from mentions and their surrounding context. The -model is based on simple sufficient statistics extracted from data, thus -relying on few parameters to be learned. - Our method does not require extensive feature engineering, nor an expensive -training procedure. We use loopy belief propagation to perform approximate -inference. The low complexity of our model makes this step sufficiently fast -for real-time usage. We demonstrate the accuracy of our approach on a wide -range of benchmark datasets, showing that it matches, and in many cases -outperforms, existing state-of-the-art methods.
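A drastically simplified stand-in for the collective disambiguation model of 1509.02301 above: combine a per-mention prior with a pairwise entity-coherence term and maximize the joint score by brute force instead of loopy belief propagation. All probabilities below are toy numbers for illustration.

```python
# Sketch: joint assignment = argmax over products of mention priors and
# pairwise entity-coherence scores (exhaustive search, tiny example).
from itertools import product

prior = {"Paris": {"Paris_France": 0.7, "Paris_Hilton": 0.3},
         "Seine": {"Seine_River": 1.0}}
coherence = {("Paris_France", "Seine_River"): 0.9,
             ("Paris_Hilton", "Seine_River"): 0.1}

def joint_score(assignment, mentions):
    score = 1.0
    for m, e in zip(mentions, assignment):      # document-level prior terms
        score *= prior[m][e]
    for i in range(len(assignment)):            # pairwise coherence terms
        for j in range(i + 1, len(assignment)):
            pair = tuple(sorted((assignment[i], assignment[j])))
            score *= coherence.get(pair, 0.5)
    return score

mentions = ["Paris", "Seine"]
candidates = [list(prior[m]) for m in mentions]
best = max(product(*candidates), key=lambda a: joint_score(a, mentions))
print(best)  # ('Paris_France', 'Seine_River')
```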
-" -2057,1509.02409,"Mortaza Doulaty, Oscar Saz, Thomas Hain",Data-selective Transfer Learning for Multi-Domain Speech Recognition,cs.LG cs.CL cs.SD," Negative transfer in training of acoustic models for automatic speech -recognition has been reported in several contexts such as domain change or -speaker characteristics. This paper proposes a novel technique to overcome -negative transfer by efficient selection of speech data for acoustic model -training. Here data is chosen on relevance for a specific target. A submodular -function based on likelihood ratios is used to determine how acoustically -similar each training utterance is to a target test set. The approach is -evaluated on a wide-domain data set, covering speech from radio and TV -broadcasts, telephone conversations, meetings, lectures and read speech. -Experiments demonstrate that the proposed technique both finds relevant data -and limits negative transfer. Results on a 6--hour test set show a relative -improvement of 4% with data selection over using all data in PLP based models, -and 2% with DNN features. -" -2058,1509.02412,"Mortaza Doulaty, Oscar Saz, Thomas Hain","Unsupervised Domain Discovery using Latent Dirichlet Allocation for - Acoustic Modelling in Speech Recognition",cs.CL," Speech recognition systems are often highly domain dependent, a fact widely -reported in the literature. However the concept of domain is complex and not -bound to clear criteria. Hence it is often not evident if data should be -considered to be out-of-domain. While both acoustic and language models can be -domain specific, work in this paper concentrates on acoustic modelling. We -present a novel method to perform unsupervised discovery of domains using -Latent Dirichlet Allocation (LDA) modelling. Here a set of hidden domains is -assumed to exist in the data, whereby each audio segment can be considered to -be a weighted mixture of domain properties. The classification of audio -segments into domains allows the creation of domain specific acoustic models -for automatic speech recognition. Experiments are conducted on a dataset of -diverse speech data covering speech from radio and TV broadcasts, telephone -conversations, meetings, lectures and read speech, with a joint training set of -60 hours and a test set of 6 hours. Maximum A Posteriori (MAP) adaptation to -LDA based domains was shown to yield relative Word Error Rate (WER) -improvements of up to 16% relative, compared to pooled training, and up to 10%, -compared with models adapted with human-labelled prior domain knowledge. -" -2059,1509.02437,"Rishabh Soni, K. James Mathai",Improved Twitter Sentiment Prediction through Cluster-then-Predict Model,cs.IR cs.CL cs.LG cs.SI," Over the past decade humans have experienced exponential growth in the use of -online resources, in particular social media and microblogging websites such as -Facebook, Twitter, YouTube and also mobile applications such as WhatsApp, Line, -etc. Many companies have identified these resources as a rich mine of marketing -knowledge. This knowledge provides valuable feedback which allows them to -further develop the next generation of their product. In this paper, sentiment -analysis of a product is performed by extracting tweets about that product and -classifying the tweets showing it as positive and negative sentiment. 
The -authors propose a hybrid approach which combines unsupervised learning, in the -form of K-means clustering of the tweets, with supervised -learning methods such as Decision Trees and Support Vector Machines for -classification. -" -2060,1509.03208,"Abdelrahim A Elmadany, Sherif M Abdou and Mervat Gheith",Towards Understanding Egyptian Arabic Dialogues,cs.CL," Labelling users' utterances to understand their intents, a task known as -Dialogue Act (DA) classification, is considered the key component of the dialogue -language understanding layer in automatic dialogue systems. In this paper, we -propose a novel approach to labelling users' utterances in Egyptian -spontaneous dialogues and instant messages using a Machine Learning (ML) approach, -without relying on any special lexicons, cues, or rules. Due to the lack of -an Egyptian dialect dialogue corpus, the system is evaluated on a multi-genre corpus -of 4725 utterances from three domains, collected and annotated -manually from Egyptian call-centers. The system achieves an F1 score of 70.36% -over all domains. -" -2061,1509.03295,Ramon Ferrer-i-Cancho and Carlos G\'omez-Rodr\'iguez,Liberating language research from dogmas of the 20th century,cs.CL cs.SI physics.soc-ph," A commentary on the article ""Large-scale evidence of dependency length -minimization in 37 languages"" by Futrell, Mahowald & Gibson (PNAS 2015 112 (33) -10336-10341). -" -2062,1509.03488,Judith Eckle-Kohler,"Verbs Taking Clausal and Non-Finite Arguments as Signals of Modality - - Revisiting the Issue of Meaning Grounded in Syntax",cs.CL," We revisit Levin's theory about the correspondence of verb meaning and syntax -and infer semantic classes from a large syntactic classification of more than -600 German verbs taking clausal and non-finite arguments. Grasping the meaning -components of Levin-classes is known to be hard. We address this challenge by -setting up a multi-perspective semantic characterization of the inferred -classes. To this end, we link the inferred classes and their English -translation to independently constructed semantic classes in three different -lexicons - the German wordnet GermaNet, VerbNet and FrameNet - and perform a -detailed analysis and evaluation of the resulting German-English classification -(available at www.ukp.tu-darmstadt.de/modality-verbclasses/). -" -2063,1509.03611,"Ella Rabinovich, Shuly Wintner, Ofek Luis Lewinsohn",A Parallel Corpus of Translationese,cs.CL," We describe a set of bilingual English--French and English--German parallel -corpora in which the direction of translation is accurately and reliably -annotated. The corpora are diverse, consisting of parliamentary proceedings, -literary works, transcriptions of TED talks and political commentary. They will -be instrumental for research on translationese and its applications to (human -and machine) translation; specifically, they can be used for the task of -translationese identification, a research direction that has enjoyed growing -interest in recent years. To validate the quality and reliability of the -corpora, we replicated previous results of supervised and unsupervised -identification of translationese, and further extended the experiments to -additional datasets and languages.
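A minimal sketch of supervised translationese identification, as replicated in 1509.03611 above: represent each text chunk by function-word counts and train a linear classifier to separate originals from translations. The two-text "corpus" and the function-word list below are placeholders, not the paper's data.

```python
# Sketch: function-word features + linear classifier for translationese.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "as"]
texts = ["the debate that took place in the parliament ...",
         "it is the case that we must act as one ..."]
labels = ["original", "translated"]  # gold translation-direction annotation

vec = CountVectorizer(vocabulary=FUNCTION_WORDS)  # restrict to function words
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)
print(clf.predict(vec.transform(["and it was said that ..."])))
```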
-" -2064,1509.03739,"Roland Roller, Eneko Agirre, Aitor Soroa, Mark Stevenson",Improving distant supervision using inference learning,cs.CL," Distant supervision is a widely applied approach to automatic training of -relation extraction systems and has the advantage that it can generate large -amounts of labelled data with minimal effort. However, this data may contain -errors and consequently systems trained using distant supervision tend not to -perform as well as those based on manually labelled data. This work proposes a -novel method for detecting potential false negative training examples using a -knowledge inference method. Results show that our approach improves the -performance of relation extraction systems trained using distantly supervised -data. -" -2065,1509.03870,"Raymond W. M. Ng, Mortaza Doulaty, Rama Doddipatla, Wilker Aziz, - Kashif Shah, Oscar Saz, Madina Hasan, Ghada AlHarbi, Lucia Specia, Thomas - Hain",The USFD Spoken Language Translation System for IWSLT 2014,cs.CL," The University of Sheffield (USFD) participated in the International Workshop -for Spoken Language Translation (IWSLT) in 2014. In this paper, we will -introduce the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is -achieved by two multi-pass deep neural network systems with adaptation and -rescoring techniques. Machine translation (MT) is achieved by a phrase-based -system. The USFD primary system incorporates state-of-the-art ASR and MT -techniques and gives a BLEU score of 23.45 and 14.75 on the English-to-French -and English-to-German speech-to-text translation task with the IWSLT 2014 data. -The USFD contrastive systems explore the integration of ASR and MT by using a -quality estimation system to rescore the ASR outputs, optimising towards better -translation. This gives a further 0.54 and 0.26 BLEU improvement respectively -on the IWSLT 2012 and 2014 evaluation data. -" -2066,1509.04219,Afroze Ibrahim Baqapuri,Twitter Sentiment Analysis,cs.CL cs.IR cs.SI," This project addresses the problem of sentiment analysis in twitter; that is -classifying tweets according to the sentiment expressed in them: positive, -negative or neutral. Twitter is an online micro-blogging and social-networking -platform which allows users to write short status updates of maximum length 140 -characters. It is a rapidly expanding service with over 200 million registered -users - out of which 100 million are active users and half of them log on -twitter on a daily basis - generating nearly 250 million tweets per day. Due to -this large amount of usage we hope to achieve a reflection of public sentiment -by analysing the sentiments expressed in the tweets. Analysing the public -sentiment is important for many applications such as firms trying to find out -the response of their products in the market, predicting political elections -and predicting socioeconomic phenomena like stock exchange. The aim of this -project is to develop a functional classifier for accurate and automatic -sentiment classification of an unknown tweet stream. -" -2067,1509.04385,S. Amarappa and S. V. Sathyanarayana,"Kannada named entity recognition and classification (nerc) based on - multinomial na\""ive bayes (mnb) classifier",cs.CL," Named Entity Recognition and Classification (NERC) is a process of -identification of proper nouns in the text and classification of those nouns -into certain predefined categories like person name, location, organization, -date, and time etc. NERC in Kannada is an essential and challenging task. 
The -aim of this work is to develop a novel model for NERC, based on the Multinomial -Na\""ive Bayes (MNB) classifier. The methodology adopted in this paper is based -on feature extraction from the training corpus, using term frequency and inverse -document frequency and fitting them to a tf-idf vectorizer. The paper discusses -the various issues in developing the proposed model. The details of -implementation and performance evaluation are discussed. The experiments are -conducted on a training corpus of 95,170 tokens and a test corpus of 5,000 -tokens. It is observed that the model achieves Precision, Recall and -F1-measure of 83%, 79% and 81% respectively. -" -2068,1509.04393,"Haitao Liu, Chunshan Xu and Junying Liang",Dependency length minimization: Puzzles and Promises,cs.CL," In a recent issue of PNAS, Futrell et al. claim that their study of 37 -languages gives the first large scale cross-language evidence for Dependency -Length Minimization, which is an overstatement that ignores similar previous -research. In addition, this study seems to pay no attention to factors like -the uniformity of genres, which weakens the validity of the argument that DLM is -universal. Another problem is that this study sets the baseline random language -as projective, which fails to truly uncover the difference between natural -language and random language, since projectivity is an important feature of -many natural languages. Finally, the paper claims an ""apparent relationship -between head finality and dependency length"" despite the lack of an explicit -statistical comparison, which renders this conclusion rather hasty and -improper. -" -2069,1509.04473,"Joachim Daiber, Lautaro Quiroz, Roger Wechsler, Stella Frank",Splitting Compounds by Semantic Analogy,cs.CL," Compounding is a highly productive word-formation process in some languages -that is often problematic for natural language processing applications. In this -paper, we investigate whether distributional semantics in the form of word -embeddings can enable a deeper, i.e., more knowledge-rich, processing of -compounds than the standard string-based methods. We present an unsupervised -approach that exploits regularities in the semantic vector space (based on -analogies such as ""bookshop is to shop as bookshelf is to shelf"") to produce -compound analyses of high quality. A subsequent compound splitting algorithm -based on these analyses is highly effective, particularly for ambiguous -compounds. German to English machine translation experiments show that this -semantic analogy-based compound splitter leads to better translations than a -commonly used frequency-based method. -" -2070,1509.04556,Liang Liu and Lili Yu,On the evolution of word usage of classical Chinese poetry,physics.soc-ph cs.CL," The hierarchy of classical Chinese poetry has been broadly acknowledged by a -number of studies in Chinese literature. However, quantitative investigations -about the evolutionary linkages of classical Chinese poetry are limited. The -primary goal of this study is to provide quantitative evidence of the -evolutionary linkages, with emphasis on character usage, among different period -genres of classical Chinese poetry. Specifically, various statistical analyses -are performed to find and compare the patterns of character usage in the poems -of nine period genres, including shi jing, chu ci, Han shi, Jin shi, Tang shi, -Song shi, Yuan shi, Ming shi, and Qing shi.
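The tf-idf plus Multinomial Naive Bayes pipeline of 1509.04385 above maps naturally onto a few lines of scikit-learn; the romanized toy tokens and labels below stand in for the 95,170-token Kannada corpus and are not real data.

```python
# Sketch: tf-idf features over character n-grams feeding a Multinomial NB
# classifier that assigns NERC categories to tokens.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tokens = ["bengaluru", "ravi", "january", "nadiyu", "karnataka", "somavara"]
labels = ["LOCATION", "PERSON", "DATE", "OTHER", "LOCATION", "DATE"]

nerc = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
                     MultinomialNB())
nerc.fit(tokens, labels)
print(nerc.predict(["mysuru"]))
```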
The results of the analysis indicate -that each of the nine period genres has unique patterns of character usage, with -some Chinese characters preferentially used in the poems of a particular -period genre. The analysis of the general pattern of character preference -implies a decreasing trend in the use of Chinese characters that rarely occur -in modern Chinese literature along the timeline of dynastic types of classical -Chinese poetry. The phylogenetic analysis based on the distance matrix suggests -that the evolutionary linkages of different types of classical Chinese poetry -are congruent with their chronological order, suggesting that character -frequencies contain phylogenetic information that is useful for inferring -evolutionary linkages among various types of classical Chinese poetry. The -estimated phylogenetic tree identifies four groups: (shi jing, chu ci), (Han -shi, Jin shi), (Tang shi, Song shi, Yuan shi), and (Ming shi, Qing shi). The -statistical analyses conducted in this study can be generalized to analyze the -data sets of general Chinese literature. Such analyses can provide quantitative -insights about the evolutionary linkages of general Chinese literature. -" -2071,1509.04811,Tadele Tedla,amLite: Amharic Transliteration Using Key Map Dictionary,cs.CL cs.IR," amLite is a framework developed to map ASCII transliterated Amharic texts -back to the original Amharic letter texts. The aim of such a framework is to -make existing Amharic linguistic data consistent and interoperable among -researchers. To achieve this objective, a key map dictionary is constructed -using the possible ASCII combinations actively in use for transliterating -Amharic letters; and a mapping of the combinations to the corresponding Amharic -letters is done. The mapping is then used to replace the Amharic linguistic -text back to form the original Amharic letters text. The framework achieved -accuracies of 97.7%, 99.7% and 98.4% on converting three random sample -test datasets. It is, however, possible to improve the accuracy of the framework by -adding an exception to the implementation of the algorithm, or by preprocessing -the input text prior to conversion. This paper outlines the rationale behind -the need for developing the framework and the processes undertaken in the -development. -" -2072,1509.05209,Antonio Trenta and Anthony Hunter and Sebastian Riedel,"Extraction of evidence tables from abstracts of randomized clinical - trials using a maximum entropy classifier and global constraints",cs.CL cs.AI," Systematic use of the published results of randomized clinical trials is -increasingly important in evidence-based medicine. In order to collate and -analyze the results from potentially numerous trials, evidence tables are used -to represent trials concerning a set of interventions of interest. An evidence -table has columns for the patient group, for each of the interventions being -compared, for the criterion for the comparison (e.g. proportion who survived -after 5 years from treatment), and for each of the results. Currently, it is a -labour-intensive activity to read each published paper and extract the -information for each field in an evidence table. There have been some NLP -studies investigating how some of the features from papers can be extracted, or -at least the relevant sentences identified. However, there is a lack of an NLP -system for the systematic extraction of each item of information required for -an evidence table.
We address this need by a combination of a maximum entropy -classifier and integer linear programming. We use the latter to handle -constraints on what is an acceptable classification of the features to be -extracted. With experimental results, we demonstrate substantial advantages in -using global constraints (such as the requirement that the features describing the patient group -and the interventions occur before the features describing the results of -the comparison). -" -2073,1509.05281,Diego R. Amancio,Network analysis of named entity co-occurrences in written texts,cs.CL physics.data-an physics.soc-ph," The use of methods borrowed from statistics and physics to analyze written -texts has allowed the discovery of unprecedented patterns of human behavior and -cognition by establishing links between model features and language structure. -While current models have been useful to unveil patterns via analysis of -syntactical and semantical networks, only a few works have probed the relevance -of investigating the structure arising from the relationship between relevant -entities such as characters, locations and organizations. In this study, we -represent entities appearing in the same context as a co-occurrence network, -where links are established according to a null model based on random, shuffled -texts. Computational simulations performed on novels revealed that the proposed -model displays interesting topological features, such as the small-world -property, characterized by high values of the clustering coefficient. The -effectiveness of our model was verified in a practical pattern recognition task -in real networks. When compared with traditional word adjacency networks, our -model displayed improved results in identifying unknown references in texts. -Because the proposed representation plays a complementary role in -characterizing unstructured documents via topological analysis of named -entities, we believe that it could be useful to improve the characterization of -written texts (and related systems), especially if combined with traditional -approaches based on statistical and deeper paradigms. -" -2074,1509.05488,"Han Xiao, Minlie Huang, Yu Hao, Xiaoyan Zhu",TransG : A Generative Mixture Model for Knowledge Graph Embedding,cs.CL," Recently, knowledge graph embedding, which projects symbolic entities and -relations into continuous vector space, has become a new, hot topic in -artificial intelligence. This paper addresses the new issue of multiple relation -semantics, namely that a relation may have multiple meanings revealed by the entity -pairs associated with the corresponding triples, and proposes a novel Gaussian -mixture model for embedding, TransG. The new model can discover latent -semantics for a relation and leverage a mixture of relation component vectors -for embedding a fact triple. To the best of our knowledge, this is the first -generative model for knowledge graph embedding, which is able to deal with -multiple relation semantics. Extensive experiments show that the proposed model -achieves substantial improvements over the state-of-the-art baselines. -" -2075,1509.05490,"Han Xiao, Minlie Huang, Yu Hao, Xiaoyan Zhu",TransA: An Adaptive Approach for Knowledge Graph Embedding,cs.CL," Knowledge representation is a major topic in AI, and many studies attempt to -represent entities and relations of a knowledge base in a continuous vector -space.
Among these attempts, translation-based methods build entity and -relation vectors by minimizing the translation loss from a head entity to a -tail one. In spite of their success, these translation-based methods -suffer from an oversimplified loss metric and are not competitive enough -to model the diverse and complex entities/relations in knowledge bases. To address -this issue, we propose \textbf{TransA}, an adaptive metric approach for -embedding, utilizing the metric learning ideas to provide a more flexible -embedding method. Experiments are conducted on the benchmark datasets and our -proposed method makes significant and consistent improvements over the -state-of-the-art baselines. -" -2076,1509.05517,Gang Chen and Mikel L. Forcada,"A Light Sliding-Window Part-of-Speech Tagger for the Apertium - Free/Open-Source Machine Translation Platform",cs.CL," This paper describes a free/open-source implementation of the light -sliding-window (LSW) part-of-speech tagger for the Apertium free/open-source -machine translation platform. Firstly, the mechanism and training process of -the tagger are reviewed, and a new method for incorporating linguistic rules is -proposed. Secondly, experiments are conducted to compare the performance of -the tagger under different window settings, with or without Apertium-style -""forbid"" rules, with or without Constraint Grammar, and also with respect to -the traditional HMM tagger in Apertium. -" -2077,1509.05736,"Issa Atoum, Chih How Bong, Narayanan Kulathuramaiyer",Building a Pilot Software Quality-in-Use Benchmark Dataset,cs.SE cs.CL," Prepared domain-specific datasets play an important role in supervised -learning approaches. In this article a new sentence dataset for software -quality-in-use is proposed. Three experts were chosen to annotate the data -using a proposed annotation scheme. Then the data were reconciled in a (no -match eliminate) process to reduce bias. The Kappa (k) statistic revealed an -acceptable level of agreement: moderate to substantial agreement between the -experts. The resulting dataset can be used to evaluate software quality-in-use models -in sentiment analysis settings. Moreover, the annotation scheme can be used to -extend the current dataset. -" -2078,1509.05808,"Tatsunori B. Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola","Word, graph and manifold embedding from Markov processes",cs.CL cs.LG stat.ML," Continuous vector representations of words and objects appear to carry -surprisingly rich semantic content. In this paper, we advance both the -conceptual and theoretical understanding of word embeddings in three ways. -First, we ground embeddings in semantic spaces studied in -cognitive-psychometric literature and introduce new evaluation tasks. Second, -in contrast to prior work, we take metric recovery as the key object of study, -unify existing algorithms as consistent metric recovery methods based on -co-occurrence counts from simple Markov random walks, and propose a new -recovery algorithm. Third, we generalize metric recovery to graphs and -manifolds, relating co-occurrence counts on random walks in graphs and random -processes on manifolds to the underlying metric to be recovered, thereby -reconciling manifold estimation and embedding algorithms. We compare embedding -algorithms across a range of tasks, from nonlinear dimensionality reduction to -three semantic language tasks, including analogies, sequence completion, and -classification.
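The metric-recovery view of 1509.05808 above can be illustrated end to end: run simple random walks on a graph, count window co-occurrences, and embed nodes by factorizing the smoothed log co-occurrence matrix. The graph, walk counts and window size below are illustrative choices, not the paper's.

```python
# Sketch: random-walk co-occurrence counts -> log-count matrix -> SVD embedding.
import numpy as np

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
n = adj.shape[0]
counts = np.zeros((n, n))
for _ in range(200):                      # simple random walks on the graph
    v = rng.integers(n)
    walk = [v]
    for _ in range(20):
        v = rng.choice(np.flatnonzero(adj[v]))
        walk.append(v)
    for i in range(len(walk) - 1):        # window-1 co-occurrence counting
        counts[walk[i], walk[i + 1]] += 1
        counts[walk[i + 1], walk[i]] += 1

M = np.log(counts + 1.0)                  # smoothed log co-occurrence matrix
U, s, _ = np.linalg.svd(M)
embedding = U[:, :2] * np.sqrt(s[:2])     # 2-dimensional node embedding
print(embedding)
```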
-" -2079,1509.06053,"Hugo Jair Escalante and Manuel Montes-y-G\'omez, and Luis - Villase\~nor-Pineda, and Marcelo Luis Errecalde",Early text classification: a Naive solution,cs.CL," Text classification is a widely studied problem, and it can be considered -solved for some domains and under certain circumstances. There are scenarios, -however, that have received little or no attention at all, despite its -relevance and applicability. One of such scenarios is early text -classification, where one needs to know the category of a document by using -partial information only. A document is processed as a sequence of terms, and -the goal is to devise a method that can make predictions as fast as possible. -The importance of this variant of the text classification problem is evident in -domains like sexual predator detection, where one wants to identify an offender -as early as possible. This paper analyzes the suitability of the standard naive -Bayes classifier for approaching this problem. Specifically, we assess its -performance when classifying documents after seeing an increasingly number of -terms. A simple modification to the standard naive Bayes implementation allows -us to make predictions with partial information. To the best of our knowledge -naive Bayes has not been used for this purpose before. Throughout an extensive -experimental evaluation we show the effectiveness of the classifier for early -text classification. What is more, we show that this simple solution is very -competitive when compared with state of the art methodologies that are more -elaborated. We foresee our work will pave the way for the development of more -effective early text classification techniques based in the naive Bayes -formulation. -" -2080,1509.06103,"Xiaofei Wang, Chao Wu, Pengyuan Zhang, Ziteng Wang, Yong Liu, Xu Li, - Qiang Fu, Yonghong Yan","Noise Robust IOA/CAS Speech Separation and Recognition System For The - Third 'CHIME' Challenge",cs.SD cs.CL," This paper presents the contribution to the third 'CHiME' speech separation -and recognition challenge including both front-end signal processing and -back-end speech recognition. In the front-end, Multi-channel Wiener filter -(MWF) is designed to achieve background noise reduction. Different from -traditional MWF, optimized parameter for the tradeoff between noise reduction -and target signal distortion is built according to the desired noise reduction -level. In the back-end, several techniques are taken advantage to improve the -noisy Automatic Speech Recognition (ASR) performance including Deep Neural -Network (DNN), Convolutional Neural Network (CNN) and Long short-term memory -(LSTM) using medium vocabulary, Lattice rescoring with a big vocabulary -language model finite state transducer, and ROVER scheme. Experimental results -show the proposed system combining front-end and back-end is effective to -improve the ASR performance. -" -2081,1509.06585,"Jean-Val\`ere Cossu (LIA), Vincent Labatut (LIA), Nicolas Dugu\'e (UO)","A Review of Features for the Discrimination of Twitter Users: - Application to the Prediction of Offline Influence",cs.CL cs.SI," Many works related to Twitter aim at characterizing its users in some way: -role on the service (spammers, bots, organizations, etc.), nature of the user -(socio-professional category, age, etc.), topics of interest , and others. 
-However, for a given user classification problem, it is very difficult to -select a set of appropriate features, because the many features described in -the literature are very heterogeneous, with name overlaps and collisions, and -numerous very close variants. In this article, we review a wide range of such -features. In order to present a clear state-of-the-art description, we unify -their names, definitions and relationships, and we propose a new, neutral, -typology. We then illustrate the interest of our review by applying a selection -of these features to the offline influence detection problem. This task -consists in identifying users who are influential in real life, based on -their Twitter account and related data. We show that most features deemed -efficient to predict online influence, such as the numbers of retweets and -followers, are not relevant to this problem. However, we propose several -content-based approaches to label Twitter users as Influencers or not. We also -rank them according to a predicted influence level. Our proposals are evaluated -over the CLEF RepLab 2014 dataset, and outmatch state-of-the-art methods. -" -2082,1509.06594,Bob Coecke and Martha Lewis,A Compositional Explanation of the Pet Fish Phenomenon,cs.AI cs.CL math.CT," The `pet fish' phenomenon is often cited as a paradigm example of the -`non-compositionality' of human concept use. We show here how this phenomenon -is naturally accommodated within a compositional distributional model of -meaning. This model describes the meaning of a composite concept by accounting -for interaction between its constituents via their grammatical roles. We give -two illustrative examples to show how the qualitative phenomena are exhibited. -We go on to apply the model to experimental data, and finally discuss -extensions of the formalism. -" -2083,1509.06664,"Tim Rockt\""aschel, Edward Grefenstette, Karl Moritz Hermann, - Tom\'a\v{s} Ko\v{c}isk\'y, Phil Blunsom",Reasoning about Entailment with Neural Attention,cs.CL cs.AI cs.LG cs.NE," While most approaches to automatically recognizing entailment relations have -used classifiers employing hand-engineered features derived from complex -natural language processing pipelines, in practice their performance has been -only slightly better than bag-of-word pair classifiers using only lexical -similarity. The only attempt so far to build an end-to-end differentiable -neural network for entailment failed to outperform such a simple similarity -classifier. In this paper, we propose a neural model that reads two sentences -to determine entailment using long short-term memory units. We extend this -model with a word-by-word neural attention mechanism that encourages reasoning -over entailments of pairs of words and phrases. Furthermore, we present a -qualitative analysis of attention weights produced by this model, demonstrating -such reasoning capabilities. On a large entailment dataset this model -outperforms the previous best neural model and a classifier with engineered -features by a substantial margin. It is the first generic end-to-end -differentiable system that achieves state-of-the-art accuracy on a textual -entailment dataset.
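A condensed sketch of the word-by-word attention of 1509.06664 above (toy dimensions; in the full model the premise states come from an LSTM and the attention conditions a second LSTM over the hypothesis, which is omitted here).

```python
# Sketch: attention over premise states given one hypothesis-side state.
import torch

d = 8
premise = torch.randn(5, d)    # stand-in for LSTM states over 5 premise words
h_t = torch.randn(d)           # stand-in for the current hypothesis-side state

W_y, W_h, w = torch.randn(d, d), torch.randn(d, d), torch.randn(d)
M = torch.tanh(premise @ W_y + (h_t @ W_h))   # (5, d) match scores
alpha = torch.softmax(M @ w, dim=0)           # attention over premise words
r = alpha @ premise                           # attended premise summary
print(alpha, r)
```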
-" -2084,1509.06928,"Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha - Yella, James Glass, Peter Bell, Steve Renals",Automatic Dialect Detection in Arabic Broadcast Speech,cs.CL," We investigate different approaches for dialect identification in Arabic -broadcast speech, using phonetic, lexical features obtained from a speech -recognition system, and acoustic features using the i-vector framework. We -studied both generative and discriminate classifiers, and we combined these -features using a multi-class Support Vector Machine (SVM). We validated our -results on an Arabic/English language identification task, with an accuracy of -100%. We used these features in a binary classifier to discriminate between -Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%. We -further report results using the proposed method to discriminate between the -five most widely used dialects of Arabic: namely Egyptian, Gulf, Levantine, -North African, and MSA, with an accuracy of 52%. We discuss dialect -identification errors in the context of dialect code-switching between -Dialectal Arabic and MSA, and compare the error pattern between manually -labeled data, and the output from our classifier. We also release the train and -test data as standard corpus for dialect identification. -" -2085,1509.06937,Kurt Winkler and Tobias Kuhn,"Fully automatic multi-language translation with a catalogue of phrases - - successful employment for the Swiss avalanche bulletin",cs.CL," The Swiss avalanche bulletin is produced twice a day in four languages. Due -to the lack of time available for manual translation, a fully automated -translation system is employed, based on a catalogue of predefined phrases and -predetermined rules of how these phrases can be combined to produce sentences. -Because this catalogue of phrases is limited to a small sublanguage, the system -is able to automatically translate such sentences from German into the target -languages French, Italian and English without subsequent proofreading or -correction. Having been operational for two winter seasons, we assess here the -quality of the produced texts based on two different surveys where participants -rated texts from real avalanche bulletins from both origins, the catalogue of -phrases versus manually written and translated texts. With a mean recognition -rate of 55%, users can hardly distinguish between thetwo types of texts, and -give very similar ratings with respect to their language quality. Overall, the -output from the catalogue system can be considered virtually equivalent to a -text written by avalanche forecasters and then manually translated by -professional translators. Furthermore, forecasters declared that all relevant -situations were captured by the system with sufficient accuracy. Forecaster's -working load did not change with the introduction of the catalogue: the extra -time to find matching sentences is compensated by the fact that they no longer -need to double-check manually translated texts. The reduction of daily -translation costs is expected to offset the initial development costs within a -few years. -" -2086,1509.07175,Jaimie Murdock and Colin Allen and Simon DeDeo,"Exploration and Exploitation of Victorian Science in Darwin's Reading - Notebooks",cs.CL cs.AI cs.CY cs.DL physics.soc-ph," Search in an environment with an uncertain distribution of resources involves -a trade-off between exploitation of past discoveries and further exploration. 
-This extends to information foraging, where a knowledge-seeker shifts between -reading in depth and studying new domains. To study this decision-making -process, we examine the reading choices made by one of the most celebrated -scientists of the modern era: Charles Darwin. From the full text of books -listed in his chronologically-organized reading journals, we generate topic -models to quantify his local (text-to-text) and global (text-to-past) reading -decisions using Kullback-Leibler Divergence, a cognitively-validated, -information-theoretic measure of relative surprise. Rather than a pattern of -surprise-minimization, corresponding to a pure exploitation strategy, Darwin's -behavior shifts from early exploitation to later exploration, seeking unusually -high levels of cognitive surprise relative to previous eras. These shifts, -detected by an unsupervised Bayesian model, correlate with major intellectual -epochs of his career as identified both by qualitative scholarship and Darwin's -own self-commentary. Our methods allow us to compare his consumption of texts -with their publication order. We find Darwin's consumption more exploratory -than the culture's production, suggesting that underneath gradual societal -changes are the explorations of individual synthesis and discovery. Our -quantitative methods advance the study of cognitive search through a framework -for testing interactions between individual and collective behavior and between -short- and long-term consumption choices. This novel application of topic -modeling to characterize individual reading complements widespread studies of -collective scientific behavior. -" -2087,1509.07179,"Kai-Wei Chang and Shyam Upadhyay and Ming-Wei Chang and Vivek Srikumar - and Dan Roth",IllinoisSL: A JAVA Library for Structured Prediction,cs.LG cs.CL stat.ML," IllinoisSL is a Java library for learning structured prediction models. It -supports structured Support Vector Machines and structured Perceptron. The -library consists of a core learning module and several applications, which can -be executed from the command line. Documentation is provided to guide users. In -comparison to other structured learning libraries, IllinoisSL is efficient, -general, and easy to use. -" -2088,1509.07211,"Zaihu Pang, Fengyun Zhu","Noise-Robust ASR for the third 'CHiME' Challenge Exploiting - Time-Frequency Masking based Multi-Channel Speech Enhancement and Recurrent - Neural Network",cs.SD cs.CL," In this paper, the Lingban entry to the third 'CHiME' speech separation and -recognition challenge is presented. A time-frequency masking based speech -enhancement front-end is proposed to suppress the environmental noise utilizing -multi-channel coherence and spatial cues. The state-of-the-art speech -recognition techniques, namely recurrent neural network based acoustic and -language modeling, state space minimum Bayes risk based discriminative acoustic -modeling, and i-vector based acoustic condition modeling, are carefully -integrated into the speech recognition back-end. To further improve the system -performance by fully exploiting the advantages of different technologies, the -final recognition results are obtained by lattice combination and rescoring. -Evaluations carried out on the official dataset prove the effectiveness of the -proposed systems. Compared with the best baseline result, the proposed system -obtains consistent improvements with over 57% relative word error rate -reduction on the real-data test set.
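An illustrative time-frequency masking front-end in the spirit of 1509.07211 above; the mask here is a crude two-channel magnitude-agreement proxy, not the authors' coherence and spatial-cue estimator, and the signals are synthetic.

```python
# Sketch: estimate a soft mask from two channels and apply it in the STFT domain.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
ch1 = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(fs)
ch2 = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(fs)

f, tt, X1 = stft(ch1, fs=fs)
_, _, X2 = stft(ch2, fs=fs)
# Bins where both channels agree in magnitude are kept; others attenuated.
mask = np.minimum(np.abs(X1), np.abs(X2)) / (np.maximum(np.abs(X1), np.abs(X2)) + 1e-8)
_, enhanced = istft(X1 * mask, fs=fs)
print(enhanced.shape)
```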
-" -2089,1509.07308,Ivan Vuli\'c and Marie-Francine Moens,"Bilingual Distributed Word Representations from Document-Aligned - Comparable Data",cs.CL," We propose a new model for learning bilingual word representations from -non-parallel document-aligned data. Following the recent advances in word -representation learning, our model learns dense real-valued word vectors, that -is, bilingual word embeddings (BWEs). Unlike prior work on inducing BWEs which -heavily relied on parallel sentence-aligned corpora and/or readily available -translation resources such as dictionaries, the article reveals that BWEs may -be learned solely on the basis of document-aligned comparable data without any -additional lexical resources nor syntactic information. We present a comparison -of our approach with previous state-of-the-art models for learning bilingual -word representations from comparable data that rely on the framework of -multilingual probabilistic topic modeling (MuPTM), as well as with -distributional local context-counting models. We demonstrate the utility of the -induced BWEs in two semantic tasks: (1) bilingual lexicon extraction, (2) -suggesting word translations in context for polysemous words. Our simple yet -effective BWE-based models significantly outperform the MuPTM-based and -context-counting representation models from comparable data as well as prior -BWE-based models, and acquire the best reported results on both tasks for all -three tested language pairs. -" -2090,1509.07513,"Marco A. Valenzuela-Esc\'arcega, Gus Hahn-Powell, Mihai Surdeanu",Description of the Odin Event Extraction Framework and Rule Language,cs.CL," This document describes the Odin framework, which is a domain-independent -platform for developing rule-based event extraction models. Odin aims to be -powerful (the rule language allows the modeling of complex syntactic -structures) and robust (to recover from syntactic parsing errors, syntactic -patterns can be freely mixed with surface, token-based patterns), while -remaining simple (some domain grammars can be up and running in minutes), and -fast (Odin processes over 100 sentences/second in a real-world domain with over -200 rules). Here we include a thorough definition of the Odin rule language, -together with a description of the Odin API in the Scala language, which allows -one to apply these rules to arbitrary texts. -" -2091,1509.07612,Nils Haldenwang and Oliver Vornberger,"Sentiment Uncertainty and Spam in Twitter Streams and Its Implications - for General Purpose Realtime Sentiment Analysis",cs.CL," State of the art benchmarks for Twitter Sentiment Analysis do not consider -the fact that for more than half of the tweets from the public stream a -distinct sentiment cannot be chosen. This paper provides a new perspective on -Twitter Sentiment Analysis by highlighting the necessity of explicitly -incorporating uncertainty. Moreover, a dataset of high quality to evaluate -solutions for this new problem is introduced and made publicly available. -" -2092,1509.07761,"Petra Kralj Novak, Jasmina Smailovi\'c, Borut Sluban, Igor Mozeti\v{c}",Sentiment of Emojis,cs.CL," There is a new generation of emoticons, called emojis, that is increasingly -being used in mobile communications and social media. In the past two years, -over ten billion emojis were used on Twitter. Emojis are Unicode graphic -symbols, used as a shorthand to express concepts and ideas. 
In contrast to the -small number of well-known emoticons that carry clear emotional contents, there -are hundreds of emojis. But what are their emotional contents? We provide the -first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a -sentiment map of the 751 most frequently used emojis. The sentiment of the -emojis is computed from the sentiment of the tweets in which they occur. We -engaged 83 human annotators to label over 1.6 million tweets in 13 European -languages by the sentiment polarity (negative, neutral, or positive). About 4% -of the annotated tweets contain emojis. The sentiment analysis of the emojis -allows us to draw several interesting conclusions. It turns out that most of -the emojis are positive, especially the most popular ones. The sentiment -distribution of the tweets with and without emojis is significantly different. -The inter-annotator agreement on the tweets with emojis is higher. Emojis tend -to occur at the end of the tweets, and their sentiment polarity increases with -the distance. We observe no significant differences in the emoji rankings -between the 13 languages and the Emoji Sentiment Ranking. Consequently, we -propose our Emoji Sentiment Ranking as a European language-independent resource -for automated sentiment analysis. Finally, the paper provides a formalization -of sentiment and a novel visualization in the form of a sentiment bar. -" -2093,1509.07845,"Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu and Larry S. Davis",Selecting Relevant Web Trained Concepts for Automated Event Retrieval,cs.CV cs.CL cs.IR," Complex event retrieval is a challenging research problem, especially when no -training videos are available. An alternative to collecting training videos is -to train a large semantic concept bank a priori. Given a text description of an -event, event retrieval is performed by selecting concepts linguistically -related to the event description and fusing the concept responses on unseen -videos. However, defining an exhaustive concept lexicon and pre-training it -requires vast computational resources. Therefore, recent approaches automate -concept discovery and training by leveraging large amounts of weakly annotated -web data. Compact visually salient concepts are automatically obtained by the -use of concept pairs or, more generally, n-grams. However, not all visually -salient n-grams are necessarily useful for an event query--some combinations of -concepts may be visually compact but irrelevant--and this drastically affects -performance. We propose an event retrieval algorithm that constructs pairs of -automatically discovered concepts and then prunes those concepts that are -unlikely to be helpful for retrieval. Pruning depends both on the query and on -the specific video instance being evaluated. Our approach also addresses -calibration and domain adaptation issues that arise when applying concept -detectors to unseen videos. We demonstrate large improvements over other vision -based systems on the TRECVID MED 13 dataset. -" -2094,1509.08639,"Krzysztof Wo{\l}k, Krzysztof Marasek",Tuned and GPU-accelerated parallel data mining from comparable corpora,cs.CL cs.AI cs.DS," The multilingual nature of the world makes translation a crucial requirement -today. Parallel dictionaries constructed by humans are a widely-available -resource, but they are limited and do not provide enough coverage for good -quality translation purposes, due to out-of-vocabulary words and neologisms. 
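The emoji sentiment score behind the Emoji Sentiment Ranking of 1509.07761 above is essentially the mean polarity of the labelled tweets an emoji occurs in, as in this sketch with invented labelled tweets and a deliberately crude emoji test.

```python
# Sketch: derive an emoji's sentiment from the polarities of its tweets.
from collections import defaultdict

tweets = [("best day ever \U0001F602", 1), ("so tired \U0001F602", -1),
          ("love this \U0001F60D", 1), ("great news \U0001F60D", 1)]

occ = defaultdict(list)
for text, polarity in tweets:            # polarity: -1, 0 or +1
    for ch in text:
        if ord(ch) > 0x1F000:            # crude emoji test, for illustration
            occ[ch].append(polarity)

for emoji, pols in occ.items():
    print(emoji, sum(pols) / len(pols))  # mean polarity = sentiment score
```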
-This motivates the use of statistical translation systems, which are -unfortunately dependent on the quantity and quality of training data. Such data has -very limited availability, especially for some languages and very narrow text -domains. In this research we present our improvements to the Yalign mining -methodology by reimplementing the comparison algorithm, introducing tuning -scripts, and improving performance using GPU computing acceleration. The -experiments are conducted on various text domains and bilingual data is extracted from -the Wikipedia dumps. -" -2095,1509.08644,"Krzysztof Wo{\l}k, Krzysztof Marasek","Neural-based machine translation for medical text domain. Based on - European Medicines Agency leaflet texts",cs.CL cs.CY cs.NE stat.ML," The quality of machine translation is rapidly evolving. Today one can find -several machine translation systems on the web that provide reasonable -translations, although the systems are not perfect. In some specific domains, -the quality may decrease. A recently proposed approach to this domain is neural -machine translation. It aims at building a jointly-tuned single neural network -that maximizes translation performance, a very different approach from -traditional statistical machine translation. Recently proposed neural machine -translation models often belong to the encoder-decoder family in which a source -sentence is encoded into a fixed length vector that is, in turn, decoded to -generate a translation. The present research examines the effects of different -training methods on a Polish-English Machine Translation system used for -medical data. The European Medicines Agency parallel text corpus was used as -the basis for training of neural and statistical network-based translation -systems. The main machine translation evaluation metrics have also been used in -the analysis of the systems. A comparison and implementation of a real-time medical -translator is the main focus of our experiments. -" -2096,1509.08842,Ryan Shaw,Automatically Segmenting Oral History Transcripts,cs.CL," Dividing oral histories into topically coherent segments can make them more -accessible online. People regularly make judgments about where coherent -segments can be extracted from oral histories. But making these judgments can -be taxing, so automated assistance is potentially attractive to speed the task -of extracting segments from open-ended interviews. When different people are -asked to extract coherent segments from the same oral histories, they often do -not agree about precisely where such segments begin and end. This low agreement -makes the evaluation of algorithmic segmenters challenging, but there is reason -to believe that for segmenting oral history transcripts, some approaches are -more promising than others. The BayesSeg algorithm performs slightly better -than TextTiling, while TextTiling does not perform significantly better than a -uniform segmentation. BayesSeg might be used to suggest boundaries to someone -segmenting oral histories, but this segmentation task needs to be better -defined. -" -2097,1509.08874,"Krzysztof Wo{\l}k, Krzysztof Marasek","Polish - English Speech Statistical Machine Translation Systems for the - IWSLT 2014",cs.CL," This research explores the effects of various training settings for Polish -to English Statistical Machine Translation systems for spoken language.
-Various elements of the TED parallel text corpora for the IWSLT 2014 evaluation -campaign were used as the basis for training of language models, and for -development, tuning and testing of the translation system, as well as Wikipedia-based -comparable corpora prepared by us. The BLEU, NIST, METEOR and TER metrics -were used to evaluate the effects of data preparations on translation results. -Our experiments included systems that use lemma and morphological information -on Polish words. We also conducted a deep analysis of provided Polish data as -preparatory work for the automatic data correction and cleaning phase. -" -2098,1509.08881,"Krzysztof Wo{\l}k, Krzysztof Marasek","Building Subject-aligned Comparable Corpora and Mining it for Truly - Parallel Sentence Pairs",cs.CL cs.IR stat.ML," Parallel sentences are a relatively scarce but extremely useful resource for -many applications including cross-lingual retrieval and statistical machine -translation. This research explores our methodology for mining such data from -previously obtained comparable corpora. The task is highly practical since -non-parallel multilingual data exist in far greater quantities than parallel -corpora, but parallel sentences are a much more useful resource. Here we -propose a web crawling method for building subject-aligned comparable corpora -from Wikipedia articles. We also introduce a method for extracting truly -parallel sentences that are filtered out from noisy or just comparable sentence -pairs. We describe our implementation of a specialized tool for this task as -well as training and adaptation of a machine translation system that supplies our -filter with additional information about the similarity of comparable sentence -pairs. -" -2099,1509.08909,"Krzysztof Wo{\l}k, Krzysztof Marasek",Polish -English Statistical Machine Translation of Medical Texts,cs.CL cs.IR stat.ML," This new research explores the effects of various training methods on a -Polish to English Statistical Machine Translation system for medical texts. -Various elements of the EMEA parallel text corpora from the OPUS project were -used as the basis for training of phrase tables and language models and for -development, tuning and testing of the translation system. The BLEU, NIST, -METEOR, RIBES and TER metrics have been used to evaluate the effects of various -system and data preparations on translation results. Our experiments included -systems that used POS tagging, factored phrase models, hierarchical models, -syntactic taggers, and many different alignment methods. We also conducted a -deep analysis of Polish data as preparatory work for an automatic data correction -phase, including true casing and punctuation normalization. -" -2100,1509.08967,"Tom Sercu, Christian Puhrsch, Brian Kingsbury, Yann LeCun",Very Deep Multilingual Convolutional Neural Networks for LVCSR,cs.CL cs.NE," Convolutional neural networks (CNNs) are a standard component of many current -state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) -systems. However, CNNs in LVCSR have not kept pace with recent advances in -other domains where deeper neural networks provide superior performance. In -this paper we propose a number of architectural advances in CNNs for LVCSR. -First, we introduce a very deep convolutional network architecture with up to -14 weight layers. There are multiple convolutional layers before each pooling -layer, with small 3x3 kernels, inspired by the VGG Imagenet 2014 architecture.
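The "several 3x3 convolutions before each pooling" pattern just described (1509.08967) can be sketched in a few lines of PyTorch; the channel sizes below are illustrative, not the paper's exact configuration.

```python
# Sketch: VGG-style blocks with stacked 3x3 convolutions before each pooling.
import torch.nn as nn

def conv_block(c_in, c_out, n_convs):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# Several conv layers between poolings, deepening the network.
deep_cnn = nn.Sequential(conv_block(1, 64, 2),
                         conv_block(64, 128, 2),
                         conv_block(128, 256, 3))
print(deep_cnn)
```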
-Then, we introduce multilingual CNNs with multiple untied layers. Finally, we
-introduce multi-scale input features aimed at exploiting more context at
-negligible computational cost. We evaluate the improvements first on a Babel
-task for low resource speech recognition, obtaining an absolute 5.77% WER
-improvement over the baseline PLP DNN by training our CNN on the combined
-data of six different languages. We then evaluate the very deep CNNs on the
-Hub5'00 benchmark (using the 262 hours of SWB-1 training data) achieving a
-word error rate of 11.8% after cross-entropy training, a 1.4% WER improvement
-(10.6% relative) over the best published CNN result so far.
-"
-2101,1509.08973,"Tadahiro Taniguchi, Takayuki Nagai, Tomoaki Nakamura, Naoto Iwahashi,
-  Tetsuya Ogata, and Hideki Asoh",Symbol Emergence in Robotics: A Survey,cs.AI cs.CL cs.CV cs.RO," Humans can learn the use of language through physical interaction with
-their environment and semiotic communication with other people. It is very
-important to obtain a computational understanding of how humans can form a
-symbol system and obtain semiotic skills through their autonomous mental
-development. Recently, many studies have been conducted on the construction
-of robotic systems and machine-learning methods that can learn the use of
-language through embodied multimodal interaction with their environment and
-other systems. Understanding human social interactions and developing a robot
-that can smoothly communicate with human users in the long term requires an
-understanding of the dynamics of symbol systems, and is crucially important.
-The embodied cognition and social interaction of participants gradually
-change a symbol system in a constructive manner. In this paper, we introduce
-a field of research called symbol emergence in robotics (SER). SER is a
-constructive approach towards an emergent symbol system. The emergent symbol
-system is socially self-organized through both semiotic communications and
-physical interactions with autonomous cognitive developmental agents, i.e.,
-humans and developmental robots. Specifically, we describe some
-state-of-the-art research topics concerning SER, e.g., multimodal
-categorization, word discovery, and double articulation analysis, that enable
-a robot to obtain words and their embodied meanings from raw sensory-motor
-information, including visual information, haptic information, auditory
-information, and acoustic speech signals, in a totally unsupervised manner.
-Finally, we suggest future directions of research in SER.
-"
-2102,1509.09088,"Krzysztof Wo{\l}k, Krzysztof Marasek",Enhanced Bilingual Evaluation Understudy,cs.CL stat.ML," Our research extends the Bilingual Evaluation Understudy (BLEU) evaluation
-technique for statistical machine translation to make it more adjustable and
-robust. We intend to adapt it to better resemble human evaluation. We perform
-experiments to evaluate the performance of our technique against the primary
-existing evaluation methods. We describe and show the improvements it makes
-over existing methods, as well as its correlation with them. When human
-translators translate a text, they often use synonyms, different word orders
-or styles, and other similar variations. We propose an SMT evaluation
-technique that enhances the BLEU metric to consider variations such as those.
-" -2103,1509.09090,"Krzysztof Wo{\l}k, Krzysztof Marasek",Real-Time Statistical Speech Translation,cs.CL stat.ML," This research investigates the Statistical Machine Translation approaches to -translate speech in real time automatically. Such systems can be used in a -pipeline with speech recognition and synthesis software in order to produce a -real-time voice communication system between foreigners. We obtained three main -data sets from spoken proceedings that represent three different types of human -speech. TED, Europarl, and OPUS parallel text corpora were used as the basis -for training of language models, for developmental tuning and testing of the -translation system. We also conducted experiments involving part of speech -tagging, compound splitting, linear language model interpolation, TrueCasing -and morphosyntactic analysis. We evaluated the effects of variety of data -preparations on the translation results using the BLEU, NIST, METEOR and TER -metrics and tried to give answer which metric is most suitable for PL-EN -language pair. -" -2104,1509.09093,"Krzysztof Wo{\l}k, Krzysztof Marasek","A Sentence Meaning Based Alignment Method for Parallel Text Corpora - Preparation",cs.CL cs.IR," Text alignment is crucial to the accuracy of Machine Translation (MT) -systems, some NLP tools or any other text processing tasks requiring bilingual -data. This research proposes a language independent sentence alignment approach -based on Polish (not position-sensitive language) to English experiments. This -alignment approach was developed on the TED Talks corpus, but can be used for -any text domain or language pair. The proposed approach implements various -heuristics for sentence recognition. Some of them value synonyms and semantic -text structure analysis as a part of additional information. Minimization of -data loss was ensured. The solution is compared to other sentence alignment -implementations. Also an improvement in MT system score with text processed -with described tool is shown. -" -2105,1509.09097,"Krzysztof Wo{\l}k, Krzysztof Marasek","Polish - English Speech Statistical Machine Translation Systems for the - IWSLT 2013",cs.CL stat.ML," This research explores the effects of various training settings from Polish -to English Statistical Machine Translation system for spoken language. Various -elements of the TED parallel text corpora for the IWSLT 2013 evaluation -campaign were used as the basis for training of language models, and for -development, tuning and testing of the translation system. The BLEU, NIST, -METEOR and TER metrics were used to evaluate the effects of data preparations -on translation results. Our experiments included systems, which use stems and -morphological information on Polish words. We also conducted a deep analysis of -provided Polish data as preparatory work for the automatic data correction and -cleaning phase. -" -2106,1509.09121,Md Izhar Ashraf and Sitabhra Sinha,"The ""handedness"" of language: Directional symmetry breaking of sign - usage in words",cs.CL," Language, which allows complex ideas to be communicated through symbolic -sequences, is a characteristic feature of our species and manifested in a -multitude of forms. Using large written corpora for many different languages -and scripts, we show that the occurrence probability distributions of signs at -the left and right ends of words have a distinct heterogeneous nature. -Characterizing this asymmetry using quantitative inequality measures, viz. 
-information entropy and the Gini index, we show that the beginning of a word
-is less restrictive in sign usage than the end. This property is not simply
-attributable to the use of common affixes as it is seen even when only word
-roots are considered. We use the existence of this asymmetry to infer the
-direction of writing in undeciphered inscriptions, and this inference agrees
-with the archaeological evidence. Unlike traditional investigations of
-phonotactic constraints which focus on language-specific patterns, our study
-reveals a property valid across languages and writing systems. As both
-language and writing are unique aspects of our species, this universal
-signature may reflect an innate feature of the human cognitive phenomenon.
-"
-2107,1510.00001,Krzysztof Wo{\l}k,Polish to English Statistical Machine Translation,cs.CL stat.ML," This research explores the effects of various training settings on a Polish
-to English Statistical Machine Translation system for spoken language.
-Various elements of the TED, Europarl, and OPUS parallel text corpora were
-used as the basis for training of language models, for development, tuning
-and testing of the translation system. The BLEU, NIST, METEOR and TER metrics
-were used to evaluate the effects of the data preparations on the translation
-results.
-"
-2108,1510.00240,Rodmonga Potapova and Denis Gordeev,"Determination of the Internet Anonymity Influence on the Level of
-  Aggression and Usage of Obscene Lexis",cs.CL," This article analyzes the semantic content of the anonymous
-Russian-speaking forum 2ch.hk; different verbal means of expressing the
-emotional state of aggression are identified for this site, and aggression is
-classified by its direction. The lexis of different Russian- and
-English-speaking anonymous forums (2ch.hk, iichan.hk and 4chan.org) and the
-public community ""MDK"" of the Russian-speaking social network VK is
-analyzed and compared with the Open Corpus of the Russian language
-(Opencorpora.org) and the Brown corpus. The analysis shows that anonymity has
-no influence on the amount of invective item usage. The effectiveness of
-moderation was shown for anonymous forums. It was established that Russian
-obscene lexis was used to express the emotional state of aggression in only
-60.4% of cases for 2ch.hk. These preliminary results show that the use of
-Russian obscene lexis on the Internet does not depend directly on the
-emotional state of aggression.
-"
-2109,1510.00244,Fadhela Kerdjoudj and Olivier Cur\'e,RDF Knowledge Graph Visualization From a Knowledge Extraction System,cs.HC cs.CL," In this paper, we present a system to visualize RDF knowledge graphs. These
-graphs are obtained from a knowledge extraction system designed by
-GEOLSemantics. This extraction is performed using natural language processing
-and trigger detection. The user can visualize subgraphs by selecting ontology
-features such as concepts or individuals. The system is also multilingual,
-with the use of the annotated ontology in English, French, Arabic and
-Chinese.
-"
-2110,1510.00259,"Stephanie L. Hyland, Theofanis Karaletsos, Gunnar R\""atsch",A Generative Model of Words and Relationships from Multiple Sources,cs.CL cs.LG stat.ML," Neural language models are a powerful tool to embed words into semantic
-vector spaces. However, learning such models generally relies on the
-availability of abundant and diverse training examples.
In highly specialised
-domains this requirement may not be met due to difficulties in obtaining a
-large corpus, or the limited range of expression in average use. Such domains
-may encode prior knowledge about entities in a knowledge base or ontology. We
-propose a generative model which integrates evidence from diverse data
-sources, enabling the sharing of semantic information. We achieve this by
-generalising the concept of co-occurrence from distributional semantics to
-include other relationships between entities or words, which we model as
-affine transformations on the embedding space. We demonstrate the
-effectiveness of this approach by outperforming recent models on a link
-prediction task and demonstrating its ability to profit from partially or
-fully unobserved training labels. We further demonstrate the usefulness of
-learning from different data sources with overlapping vocabularies.
-"
-2111,1510.00277,"Martin Gerlach, Francesc Font-Clos, Eduardo G. Altmann",Similarity of symbol frequency distributions with heavy tails,physics.soc-ph cs.CL physics.data-an," Quantifying the similarity between symbolic sequences is a traditional
-problem in Information Theory which requires comparing the frequencies of
-symbols in different sequences. In numerous modern applications, ranging from
-DNA over music to texts, the distribution of symbol frequencies is
-characterized by heavy-tailed distributions (e.g., Zipf's law). The large
-number of low-frequency symbols in these distributions poses major
-difficulties to the estimation of the similarity between sequences, e.g.,
-they hinder an accurate finite-size estimation of entropies. Here we show
-analytically how the systematic (bias) and statistical (fluctuations) errors
-in these estimations depend on the sample size~$N$ and on the
-exponent~$\gamma$ of the heavy-tailed distribution. Our results are valid for
-the Shannon entropy $(\alpha=1)$, its corresponding similarity measures
-(e.g., the Jensen-Shannon divergence), and also for measures based on the
-generalized entropy of order $\alpha$. For small $\alpha$'s, including
-$\alpha=1$, the errors decay slower than the $1/N$-decay observed in
-short-tailed distributions. For $\alpha$ larger than a critical value
-$\alpha^* = 1+1/\gamma \leq 2$, the $1/N$-decay is recovered. We show the
-practical significance of our results by quantifying the evolution of the
-English language over the last two centuries using a complete
-$\alpha$-spectrum of measures. We find that frequent words change more slowly
-than less frequent words and that $\alpha=2$ provides the most robust measure
-to quantify language change.
-"
-2112,1510.00436,Richard Futrell and Kyle Mahowald and Edward Gibson,"Response to Liu, Xu, and Liang (2015) and Ferrer-i-Cancho and
-  G\'omez-Rodr\'iguez (2015) on Dependency Length Minimization",cs.CL," We address recent criticisms (Liu et al., 2015; Ferrer-i-Cancho and
-G\'omez-Rodr\'iguez, 2015) of our work on empirical evidence of dependency
-length minimization across languages (Futrell et al., 2015). First, we
-acknowledge an error in failing to acknowledge Liu (2008)'s previous work on
-corpora of 20 languages with similar aims. A correction will appear in PNAS.
-Nevertheless, we argue that our work provides novel, strong evidence for
-dependency length minimization as a universal quantitative property of
-languages, beyond this previous work, because it provides baselines which
-focus on word order preferences.
Second, we argue that our choices of baselines were
-appropriate because they control for alternative theories.
-"
-2113,1510.00618,Miguel Fernandez-Fernandez and Daniel Gayo-Avello,"Automatic Taxonomy Extraction from Query Logs with no Additional Sources
-  of Information",cs.CL," Search engine logs store detailed information on Web users' interactions.
-Thus, as more and more people use search engines on a daily basis, important
-trails of users' common knowledge are being recorded in those files. Previous
-research has shown that it is possible to extract concept taxonomies from
-full text documents, while other scholars have proposed methods to obtain
-similar queries from query logs. We propose a mixture of both lines of
-research, that is, mining query logs not to find related queries nor query
-hierarchies, but actual term taxonomies that could be used to improve search
-engine effectiveness and efficiency. As a result, in this study we have
-developed a method that combines lexical heuristics with a supervised
-classification model to successfully extract hyponymy relations from
-specialization search patterns revealed in log sessions, with no additional
-sources of information, and in a language independent way.
-"
-2114,1510.00726,Yoav Goldberg,A Primer on Neural Network Models for Natural Language Processing,cs.CL," Over the past few years, neural networks have re-emerged as powerful
-machine-learning models, yielding state-of-the-art results in fields such as
-image recognition and speech processing. More recently, neural network models
-started to be applied also to textual natural language signals, again with
-very promising results. This tutorial surveys neural network models from the
-perspective of natural language processing research, in an attempt to bring
-natural-language researchers up to speed with the neural techniques. The
-tutorial covers input encoding for natural language tasks, feed-forward
-networks, convolutional networks, recurrent networks and recursive networks,
-as well as the computation graph abstraction for automatic gradient
-computation.
-"
-2115,1510.00759,"Afshin Rahimi, Moharram Eslami, Bahram Vazirnezhad",It is not all downhill from here: Syllable Contact Law in Persian,cs.CL," Syllable contact pairs cross-linguistically tend to have a falling sonority
-slope, a constraint which is called the Syllable Contact Law (SCL). In this
-study, the phonotactics of syllable contacts in 4202 CVCCVC words of the
-Persian lexicon is investigated. The consonants of Persian were divided into
-five sonority categories, and the frequency of all possible sonority slopes
-is computed both in the lexicon (type frequency) and in a corpus (token
-frequency). Since an unmarked phonological structure has been shown to
-diachronically become more frequent, we expect to see the same pattern for
-syllable contact pairs with a falling sonority slope. The correlation of the
-sonority categories of the two consonants in a syllable contact pair is
-measured using Pointwise Mutual Information.
-"
-2116,1510.00760,"Afshin Rahimi, Bahram Vazirnezhad, Moharram Eslami","P-trac Procedure: The Dispersion and Neutralization of Contrasts in
-  Lexicon",cs.CL," Cognitive acoustic cues have an important role in shaping the phonological
-structure of language as a means to optimal communication. In this paper we
-introduce the P-trac procedure in order to track the dispersion of contrasts
-in different contexts in the lexicon.
The results of applying the P-trac procedure to
-the case of the dispersion of contrasts in pre-consonantal contexts and in
-consonantal positions of CVCC sequences in Persian provide evidence in favor
-of the phonetic basis of dispersion argued for by the Licensing by Cue
-hypothesis and the Dispersion Theory of Contrast. The P-trac procedure proves
-to be very effective in revealing the dispersion of contrasts in the lexicon,
-especially when comparing the dispersion of contrasts in different contexts.
-"
-2117,1510.01026,"Gerardo Febres, Klaus Jaffe","Calculating entropy at different scales among diverse communication
-  systems",cs.IT cs.CL math.IT," We evaluated the impact of changing the observation scale on the entropy
-measures for text descriptions. MIDI-coded music, computer code and two human
-natural languages were studied at the scales of characters and words, and at
-the Fundamental Scale, which results from adjusting the symbol lengths used
-to interpret each text description until minimum entropy is produced. The
-results show that the Fundamental Scale method is comparable with the use of
-words when measuring entropy levels in written texts. However, this method
-can also be used in communication systems lacking words, such as music.
-Measuring symbolic entropy at the fundamental scale allows one to
-quantitatively calculate relative levels of complexity for different
-communication systems. The results open a novel perspective on differences in
-the structure of the communication systems studied.
-"
-2118,1510.01032,"Herman Kamper, Weiran Wang, and Karen Livescu","Deep convolutional acoustic word embeddings using word-pair side
-  information",cs.CL," Recent studies have been revisiting whole words as the basic modelling unit
-in speech recognition and query applications, instead of phonetic units. Such
-whole-word segmental systems rely on a function that maps a variable-length
-speech segment to a vector in a fixed-dimensional space; the resulting
-acoustic word embeddings need to allow for accurate discrimination between
-different word types, directly in the embedding space. We compare several old
-and new approaches in a word discrimination task. Our best approach uses side
-information in the form of known word pairs to train a Siamese convolutional
-neural network (CNN): a pair of tied networks that take two speech segments
-as input and produce their embeddings, trained with a hinge loss that
-separates same-word pairs and different-word pairs by some margin. A word
-classifier CNN performs similarly, but requires much stronger supervision.
-Both types of CNNs yield large improvements over the best previously
-published results on the word discrimination task.
-"
-2119,1510.01315,Weibing Deng and Armen E. Allahverdyan,"Stochastic model for phonemes uncovers an author-dependency of their
-  usage",cs.CL nlin.AO," We study rank-frequency relations for phonemes, the minimal units that
-still relate to linguistic meaning. We show that these relations can be
-described by the Dirichlet distribution, a direct analogue of the ideal-gas
-model in statistical mechanics. This description allows us to demonstrate
-that the rank-frequency relations for phonemes of a text do depend on its
-author. The author-dependency effect is not caused by the author's vocabulary
-(common words used in different texts), and is confirmed by several
-alternative means. This suggests that it can be directly related to phonemes.
These features contrast
-with rank-frequency relations for words, which are both author- and
-text-independent and are governed by Zipf's law.
-"
-2120,1510.01431,"Alexander Mathews, Lexing Xie, Xuming He",SentiCap: Generating Image Descriptions with Sentiments,cs.CV cs.CL," The recent progress on image recognition and language modeling is making
-automatic description of image content a reality. However, stylized,
-non-factual aspects of the written description are missing from the current
-systems. One such style is descriptions with emotions, which is commonplace
-in everyday communication, and influences decision-making and interpersonal
-relationships. We design a system to describe an image with emotions, and
-present a model that automatically generates captions with positive or
-negative sentiments. We propose a novel switching recurrent neural network
-with word-level regularization, which is able to produce emotional image
-captions using only 2000+ training sentences containing sentiments. We
-evaluate the captions with different automatic and crowd-sourcing metrics.
-Our model compares favourably in common quality metrics for image captioning.
-In 84.6% of cases the generated positive captions were judged as being at
-least as descriptive as the factual captions. Of these positive captions 88%
-were confirmed by the crowd-sourced workers as having the appropriate
-sentiment.
-"
-2121,1510.01562,Benjamin Piwowarski and Sylvain Lamprier and Nicolas Despres,Parameterized Neural Network Language Models for Information Retrieval,cs.IR cs.CL," Information Retrieval (IR) models need to deal with two difficult issues,
-vocabulary mismatch and term dependencies. Vocabulary mismatch corresponds to
-the difficulty of retrieving relevant documents that do not contain exact
-query terms but semantically related terms. Term dependencies refers to the
-need of considering the relationship between the words of the query when
-estimating the relevance of a document. A multitude of solutions has been
-proposed to solve each of these two problems, but no principled model solves
-both. In parallel, in the last few years, language models based on neural
-networks have been used to cope with complex natural language processing
-tasks like emotion and paraphrase detection. Although they present good
-abilities to cope with both term dependencies and vocabulary mismatch
-problems, thanks to the distributed representation of words they are based
-upon, such models could not be used readily in IR, where the estimation of
-one language model per document (or query) is required. This is both
-computationally unfeasible and prone to over-fitting. Based on recent work
-that proposed to learn a generic language model that can be modified through
-a set of document-specific parameters, we explore the use of new neural
-network models that are adapted to ad-hoc IR tasks. Within the language model
-IR framework, we propose and study the use of a generic language model as
-well as a document-specific language model. Both can be used as a smoothing
-component, but the latter is more adapted to the document at hand and has the
-potential of being used as a full document language model. We experiment with
-such models and analyze their results on TREC-1 to 8 datasets.
-"
-2122,1510.01570,David Alfter,Analyzer and generator for Pali,cs.CL," This work describes a system that performs morphological analysis and
-generation of Pali words. The system works with regular inflectional
-paradigms and a lexical database.
The generator is used to build a collection
-of inflected and derived words, which in turn is used by the analyzer.
-Generating and storing morphological forms along with the corresponding
-morphological information allows for efficient and simple lookup by the
-analyzer. Indeed, by looking up a word and extracting the attached
-morphological information, the analyzer does not have to compute this
-information. As we must, however, assume the lexical database to be
-incomplete, the system can also work without the dictionary component, using
-a rule-based approach.
-"
-2123,1510.01717,David Alfter,Language Segmentation,cs.CL," Language segmentation consists in finding the boundaries where one language
-ends and another language begins in a text written in more than one language.
-This is important for all natural language processing tasks. The problem can
-be solved by training language models on language data. However, in the case
-of low- or no-resource languages, this is problematic. I therefore
-investigate whether unsupervised methods perform better than supervised
-methods when it is difficult or impossible to train supervised approaches. A
-special focus is given to difficult texts, i.e. texts that are rather short
-(one sentence), containing abbreviations, low-resource languages and
-non-standard language. I compare three approaches: supervised n-gram language
-models, unsupervised clustering and weakly supervised n-gram language model
-induction. I devised the weakly supervised approach in order to deal with
-difficult text specifically. In order to test the approach, I compiled a
-small corpus of different text types, ranging from one-sentence texts to
-texts of about 300 words. The weakly supervised language model induction
-approach works well on short and difficult texts, outperforming the
-clustering algorithm and reaching scores in the vicinity of the supervised
-approach. The results look promising, but there is room for improvement and a
-more thorough investigation should be undertaken.
-"
-2124,1510.01886,Diego Moussallem and Ricardo Choren,"Using Ontology-Based Context in the Portuguese-English Translation of
-  Homographs in Textual Dialogues",cs.CL," This paper introduces a novel approach to tackle the existing gap in
-message translation in dialogue systems. Currently, messages submitted to
-dialogue systems are treated as isolated sentences. Thus, missing context
-information impedes the disambiguation of homograph words in ambiguous
-sentences. Our approach solves this disambiguation problem by using concepts
-over existing ontologies.
-"
-2125,1510.01942,"Manny Rayner and Alejandro Armando and Pierrette Bouillon and Sarah
-  Ebling and Johanna Gerlach and Sonia Halimi and Irene Strasly and Nikos
-  Tsourakis",Helping Domain Experts Build Speech Translation Systems,cs.HC cs.CL," We present a new platform, ""Regulus Lite"", which supports rapid
-development and web deployment of several types of phrasal speech translation
-systems using a minimal formalism. A distinguishing feature is that most
-development work can be performed directly by domain experts. We motivate the
-need for platforms of this type and discuss three specific cases: medical
-speech translation, speech-to-sign-language translation and voice
-questionnaires. We briefly describe initial experiences in developing
-practical systems.
-" -2126,1510.01949,Antti Suni and Daniel Aalto and Martti Vainio,Hierarchical Representation of Prosody for Statistical Speech Synthesis,cs.CL cs.SD," Prominences and boundaries are the essential constituents of prosodic -structure in speech. They provide for means to chunk the speech stream into -linguistically relevant units by providing them with relative saliences and -demarcating them within coherent utterance structures. Prominences and -boundaries have both been widely used in both basic research on prosody as well -as in text-to-speech synthesis. However, there are no representation schemes -that would provide for both estimating and modelling them in a unified fashion. -Here we present an unsupervised unified account for estimating and representing -prosodic prominences and boundaries using a scale-space analysis based on -continuous wavelet transform. The methods are evaluated and compared to earlier -work using the Boston University Radio News corpus. The results show that the -proposed method is comparable with the best published supervised annotation -methods. -" -2127,1510.02049,"Spandana Gella, Marc Dymetman, Jean Michel Renders, Sriram - Venkatapathy",Assisting Composition of Email Responses: a Topic Prediction Approach,cs.CL," We propose an approach for helping agents compose email replies to customer -requests. To enable that, we use LDA to extract latent topics from a collection -of email exchanges. We then use these latent topics to label our data, -obtaining a so-called ""silver standard"" topic labelling. We exploit this -labelled set to train a classifier to: (i) predict the topic distribution of -the entire agent's email response, based on features of the customer's email; -and (ii) predict the topic distribution of the next sentence in the agent's -reply, based on the customer's email features and on features of the agent's -current sentence. The experimental results on a large email collection from a -contact center in the tele- com domain show that the proposed ap- proach is -effective in predicting the best topic of the agent's next sentence. In 80% of -the cases, the correct topic is present among the top five recommended topics -(out of fifty possible ones). This shows the potential of this method to be -applied in an interactive setting, where the agent is presented a small list of -likely topics to choose from for the next sentence. -" -2128,1510.02125,David Schlangen and Sina Zarriess and Casey Kennington,"Resolving References to Objects in Photographs using the - Words-As-Classifiers Model",cs.CL," A common use of language is to refer to visually present objects. Modelling -it in computers requires modelling the link between language and perception. -The ""words as classifiers"" model of grounded semantics views words as -classifiers of perceptual contexts, and composes the meaning of a phrase -through composition of the denotations of its component words. It was recently -shown to perform well in a game-playing scenario with a small number of object -types. We apply it to two large sets of real-world photographs that contain a -much larger variety of types and for which referring expressions are available. 
-Using a pre-trained convolutional neural network to extract image features,
-and augmenting these with in-picture positional information, we show that the
-model achieves performance competitive with the state of the art in a
-reference resolution task (given an expression, find the bounding box of its
-referent), while, as we argue, being conceptually simpler and more flexible.
-"
-2129,1510.02358,"Javier Vera, Pedro Montealegre, Eric Goles",Automata networks for multi-party communication in the Naming Game,cs.CL," The Naming Game has been studied to explore the role of self-organization
-in the development and negotiation of linguistic conventions. In this paper,
-we define an automata networks approach to the Naming Game. Two problems are
-faced: (1) the definition of an automata network for multi-party
-communicative interactions; and (2) the proof of convergence for three
-different orders in which the individuals are updated (updating schemes).
-Finally, computer simulations are explored in two-dimensional lattices with
-the purpose of recovering the main features of the Naming Game and describing
-the dynamics under different updating schemes.
-"
-2130,1510.02387,"Pranava Swaroop Madhyastha, Mohit Bansal, Kevin Gimpel and Karen
-  Livescu",Mapping Unseen Words to Task-Trained Embedding Spaces,cs.CL cs.LG," We consider the supervised training setting in which we learn task-specific
-word embeddings. We assume that we start with initial embeddings learned from
-unlabelled data and update them to learn task-specific embeddings for words
-in the supervised training data. However, for new words in the test set, we
-must use either their initial embeddings or a single unknown embedding, which
-often leads to errors. We address this by learning a neural network to map
-from initial embeddings to the task-specific embedding space, via a
-multi-loss objective function. The technique is general, but here we
-demonstrate its use for improved dependency parsing (especially for sentences
-with out-of-vocabulary words), as well as for downstream improvements on
-sentiment analysis.
-"
-2131,1510.02675,Benjamin J. Wilson and Adriaan M. J. Schakel,Controlled Experiments for Word Embeddings,cs.CL," An experimental approach to studying the properties of word embeddings is
-proposed. Controlled experiments, achieved through modifications of the
-training corpus, permit the demonstration of direct relations between word
-properties and word vector direction and length. The approach is demonstrated
-using the word2vec CBOW model with experiments that independently vary word
-frequency and word co-occurrence noise. The experiments reveal that word
-vector length depends more or less linearly on both word frequency and the
-level of noise in the co-occurrence distribution of the word. The
-coefficients of linearity depend upon the word. The special point in feature
-space, defined by the (artificial) word with pure noise in its co-occurrence
-distribution, is found to be small but non-zero.
-"
-2132,1510.02693,"ShiLiang Zhang, Hui Jiang, Si Wei, LiRong Dai",Feedforward Sequential Memory Neural Networks without Recurrent Feedback,cs.NE cs.CL cs.LG," We introduce a new structure for memory neural networks, called feedforward
-sequential memory networks (FSMN), which can learn long-term dependency
-without using recurrent feedback. The proposed FSMN is a standard feedforward
-neural network equipped with learnable sequential memory blocks in the hidden
-layers.
-In this work, we have applied FSMN to several language modeling (LM) tasks.
-Experimental results have shown that the memory blocks in FSMN can learn
-effective representations of long histories, and that FSMN-based language
-models can significantly outperform not only feedforward neural network
-(FNN) based LMs but also the popular recurrent neural network (RNN) LMs.
-"
-2133,1510.02755,"Koushiki Sarkar, Ritwika Law",A Novel Approach to Document Classification using WordNet,cs.IR cs.CL," Content-based document classification is one of the biggest challenges in
-the context of free text mining. Current document classification algorithms
-mostly rely on cluster analysis based on the bag-of-words approach. That
-method is still being applied to many modern scientific problems, and has
-established a strong enough presence in fields like economics and social
-science to merit serious attention from researchers. In this paper we would
-like to propose and explore an alternative grounded more securely in
-dictionary classification and the correlatedness of words and phrases. It is
-expected that applying our existing knowledge about the underlying
-classification structure may improve the classifier's performance.
-"
-2134,1510.02823,Daniel Gildea and T. Florian Jaeger,Human languages order information efficiently,cs.CL," Most languages use the relative order between words to encode meaning
-relations. Languages differ, however, in what orders they use and how these
-orders are mapped onto different meanings. We test the hypothesis that,
-despite these differences, human languages might constitute different
-`solutions' to common pressures of language use. Using Monte Carlo
-simulations over data from five languages, we find that their word orders are
-efficient for processing in terms of both dependency length and local lexical
-probability. This suggests that biases originating in how the brain
-understands language strongly constrain how human languages change over
-generations.
-"
-2135,1510.02983,Boyi Xie and Rebecca J. Passonneau,OmniGraph: Rich Representation and Graph Kernel Learning,cs.CL cs.LG," OmniGraph, a novel representation to support a range of NLP classification
-tasks, integrates lexical items, syntactic dependencies and frame semantic
-parses into graphs. Feature engineering is folded into the learning through
-convolution graph kernel learning to explore different extents of the graph.
-A high-dimensional space of features includes individual nodes as well as
-complex subgraphs. In experiments on a text-forecasting problem that predicts
-stock price change from news for company mentions, OmniGraph beats several
-benchmarks based on bag-of-words, syntactic dependencies, and semantic trees.
-The highly expressive features OmniGraph discovers provide insights into the
-semantics across distinct market sectors. To demonstrate the method's
-generality, we also report its high performance results on a fine-grained
-sentiment corpus.
-"
-2136,1510.03021,"Chao-Lin Liu, Guan-Tao Jin, Hongsu Wang, Qing-Feng Liu, Wen-Huei
-  Cheng, Wei-Yun Chiu, Richard Tzong-Han Tsai, Yu-Chun Wang","Textual Analysis for Studying Chinese Historical Documents and Literary
-  Novels",cs.CL cs.DL," We analyzed historical and literary documents in Chinese to gain insights
-into research issues, and in this paper we overview our studies, which
-utilized four different sources of text materials.
We investigated the history
-of concepts and transliterated words in China with the Database for the Study
-of Modern China Thought and Literature, which contains historical documents
-about China between 1830 and 1930. We also attempted to disambiguate names
-that were shared by multiple government officers who served between 618 and
-1912 and were recorded in Chinese local gazetteers. To showcase the potential
-and challenges of computer-assisted analysis of Chinese literature, we
-explored some interesting yet non-trivial questions about two of the Four
-Great Classical Novels of China: (1) Which monsters attempted to consume the
-Buddhist monk Xuanzang in the Journey to the West (JTTW), which was published
-in the 16th century, (2) Which was the most powerful monster in JTTW, and (3)
-Which major character smiled the most in the Dream of the Red Chamber, which
-was published in the 18th century. Similar approaches can be applied to the
-analysis and study of modern documents, such as the newspaper articles
-published about the 228 incident that occurred in 1947 in Taiwan.
-"
-2137,1510.03055,"Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan",A Diversity-Promoting Objective Function for Neural Conversation Models,cs.CL," Sequence-to-sequence neural network models for generation of conversational
-responses tend to generate safe, commonplace responses (e.g., ""I don't
-know"") regardless of the input. We suggest that the traditional objective
-function, i.e., the likelihood of output (response) given input (message), is
-unsuited to response generation tasks. Instead we propose using Maximum
-Mutual Information (MMI) as the objective function in neural models.
-Experimental results demonstrate that the proposed MMI models produce more
-diverse, interesting, and appropriate responses, yielding substantive gains
-in BLEU scores on two conversational datasets and in human evaluations.
-"
-2138,1510.03421,"Michal Jungiewicz, Micha{\l} {\L}opuszy\'nski",Towards Meaningful Maps of Polish Case Law,cs.CL," In this work, we analyze the utility of two-dimensional document maps for
-exploratory analysis of Polish case law. We start by comparing two methods of
-generating such visualizations. The first is based on linear principal
-component analysis (PCA). The second makes use of the modern nonlinear
-t-Distributed Stochastic Neighbor Embedding method (t-SNE). We apply both PCA
-and t-SNE to a corpus of judgments from different courts in Poland. It
-emerges that t-SNE provides better, more interpretable results than PCA. As a
-next test, we apply t-SNE to a randomly selected sample of common court
-judgments corresponding to different keywords. We show that t-SNE, in this
-case, reveals the hidden topical structure of the documents related to the
-keyword ""pension"". In conclusion, we find that the t-SNE method could be a
-promising tool to facilitate the exploratory analysis of legal texts, e.g.,
-by complementing search or browse functionality in legal databases.
-"
-2139,1510.03519,"Janarthanan Rajendran, Mitesh M. Khapra, Sarath Chandar, Balaraman
-  Ravindran","Bridge Correlational Neural Networks for Multilingual Multimodal
-  Representation Learning",cs.CL," Recently there has been a lot of interest in learning common
-representations for multiple views of data. Typically, such common
-representations are learned using a parallel corpus between the two views
-(say, 1M images and their English captions).
In this work, we address a
-real-world scenario where no direct parallel data is available between two
-views of interest (say, $V_1$ and $V_2$) but parallel data is available
-between each of these views and a pivot view ($V_3$). We propose a model for
-learning a common representation for $V_1$, $V_2$ and $V_3$ using only the
-parallel data available between $V_1V_3$ and $V_2V_3$. The proposed model is
-generic and even works when there are $n$ views of interest and only one
-pivot view which acts as a bridge between them. There are two specific
-downstream applications that we focus on (i) transfer learning between
-languages $L_1$,$L_2$,...,$L_n$ using a pivot language $L$ and (ii) cross
-modal access between images and a language $L_1$ using a pivot language
-$L_2$. Our model achieves state-of-the-art performance in multilingual
-document classification on the publicly available multilingual TED corpus and
-promising results in multilingual multimodal retrieval on a new dataset
-created and released as a part of this work.
-"
-2140,1510.03602,"Brij Mohan Lal Srivastava, Hari Krishna Vydana, Anil Kumar Vuppala,
-  Manish Shrivastava","A language model based approach towards large scale and lightweight
-  language identification systems",cs.SD cs.CL," Multilingual spoken dialogue systems have gained prominence in the recent
-past, necessitating a front-end Language Identification (LID) system. Most of
-the existing LID systems rely on modeling the language discriminative
-information from low-level acoustic features. Due to the variabilities of
-speech (speaker and emotional variabilities, etc.), large-scale LID systems
-developed using low-level acoustic features suffer from degraded performance.
-In this work, we attempt to model the higher-level, language-discriminative
-phonotactic information for developing an LID system. In this paper, the
-input speech signal is tokenized into phone sequences by using a language
-independent phone recognizer. The language discriminative phonotactic
-information in the obtained phone sequences is modeled using statistical and
-recurrent neural network based language modeling approaches. As this approach
-relies on higher-level phonotactic information, it is more robust to the
-variabilities of speech. The proposed approach is computationally
-lightweight, highly scalable, and can be used to complement existing LID
-systems.
-"
-2141,1510.03710,Miroslav Vodol\'an and Rudolf Kadlec and Jan Kleindienst,Hybrid Dialog State Tracker,cs.CL," This paper presents a hybrid dialog state tracker that combines a
-rule-based and a machine-learning-based approach to belief state tracking;
-therefore, we call it a hybrid tracker. The machine learning in our tracker
-is realized by a Long Short-Term Memory (LSTM) network. To our knowledge, our
-hybrid tracker sets a new state-of-the-art result for the Dialog State
-Tracking Challenge (DSTC) 2 dataset when the system uses only live SLU as its
-input.
-"
-2142,1510.03753,"Rudolf Kadlec, Martin Schmid, Jan Kleindienst",Improved Deep Learning Baselines for Ubuntu Corpus Dialogs,cs.CL," This paper presents the results of our experiments on next utterance
-ranking on the Ubuntu Dialog Corpus -- the largest publicly available
-multi-turn dialog corpus. First, we use an in-house implementation of
-previously reported models to do an independent evaluation using the same
-data. Second, we evaluate the performances of various LSTMs, Bi-LSTMs and
-CNNs on the dataset.
Third, we
-create an ensemble by averaging predictions of multiple models. The ensemble
-further improves the performance, and it achieves a state-of-the-art result
-for next utterance ranking on this dataset. Finally, we discuss our future
-plans using this corpus.
-"
-2143,1510.03797,"Stefano Gurciullo, Michael Smallegan, Mar\'ia Pereda, Federico
-  Battiston, Alice Patania, Sebastian Poledna, Daniel Hedblom, Bahattin Tolga
-  Oztan, Alexander Herzog, Peter John, Slava Mikhaylov","Complex Politics: A Quantitative Semantic and Topological Analysis of UK
-  House of Commons Debates",physics.soc-ph cs.CL cs.SI," This study is a first, exploratory attempt to use quantitative semantics
-techniques and topological analysis to analyze systemic patterns arising in a
-complex political system. In particular, we use a rich data set covering all
-speeches and debates in the UK House of Commons between 1975 and 2014. By the
-use of dynamic topic modeling (DTM) and topological data analysis (TDA) we
-show that both members and parties feature specific roles within the system,
-consistent over time, and extract global patterns indicating levels of
-political cohesion. Our results provide a wide array of novel hypotheses
-about the complex dynamics of political systems, with valuable policy
-applications.
-"
-2144,1510.03820,Ye Zhang and Byron Wallace,"A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional
-  Neural Networks for Sentence Classification",cs.CL cs.LG cs.NE," Convolutional Neural Networks (CNNs) have recently achieved remarkably
-strong performance on the practically important task of sentence
-classification (Kim 2014, Kalchbrenner 2014, Johnson 2014). However, these
-models require practitioners to specify an exact model architecture and set
-accompanying hyperparameters, including the filter region size,
-regularization parameters, and so on. It is currently unknown how sensitive
-model performance is to changes in these configurations for the task of
-sentence classification. We thus conduct a sensitivity analysis of one-layer
-CNNs to explore the effect of architecture components on model performance;
-our aim is to distinguish between important and comparatively
-inconsequential design decisions for sentence classification. We focus on
-one-layer CNNs (to the exclusion of more complex models) due to their
-comparative simplicity and strong empirical performance, which makes them a
-modern standard baseline method akin to Support Vector Machines (SVMs) and
-logistic regression. We derive practical advice from our extensive empirical
-results for those interested in getting the most out of CNNs for sentence
-classification in real world settings.
-"
-2145,1510.04104,"Simon Kaltenbacher, Nicholas H. Kirk, Dongheui Lee",A Preliminary Study on the Learning Informativeness of Data Subsets,cs.CL cs.RO," Estimating the internal state of a robotic system is complex: this is
-performed from multiple heterogeneous sensor inputs and knowledge sources.
-Discretization of such inputs is done to capture saliences, represented as
-symbolic information, which often presents structure and recurrence. As these
-sequences are used to reason over complex scenarios, a more compact
-representation would aid the exactness of technical cognitive reasoning
-capabilities, which are today constrained by computational complexity issues
-and fall back on representational heuristics or human intervention. Such
-problems need to be addressed to ensure timely and meaningful human-robot
-interaction.
Our work is towards understanding the variability
-of learning informativeness when training on subsets of a given input
-dataset. This is in view of reducing the training size while retaining the
-majority of the symbolic learning potential. We prove the concept on
-human-written texts, and conjecture that this work will reduce the training
-data size of sequential instructions, while preserving semantic relations,
-when gathering information from large remote sources.
-"
-2146,1510.04500,Krzysztof Wo{\l}k,"Noisy-parallel and comparable corpora filtering methodology for the
-  extraction of bi-lingual equivalent data at sentence level",cs.CL," Text alignment and text quality are critical to the accuracy of Machine
-Translation (MT) systems, some NLP tools, and any other text processing tasks
-requiring bilingual data. This research proposes a language independent
-bi-sentence filtering approach based on Polish (not a position-sensitive
-language) to English experiments. This cleaning approach was developed on the
-TED Talks corpus and also initially tested on the Wikipedia comparable
-corpus, but it can be used for any text domain or language pair. The proposed
-approach implements various heuristics for sentence comparison. Some of them
-leverage synonyms and semantic and structural analysis of text as additional
-information. Minimization of data loss was ensured. An improvement in MT
-system score with text processed using the tool is discussed.
-"
-2147,1510.04600,"Krzysztof Wo{\l}k, Krzysztof Marasek, Wojciech Glinkowski",Telemedicine as a special case of Machine Translation,cs.CL," Machine translation is evolving quite rapidly in terms of quality.
-Nowadays, we have several machine translation systems available on the web
-which provide reasonable translations. However, these systems are not
-perfect, and their quality may decrease in some specific domains. This paper
-examines the effects of different training methods on a Polish - English
-Statistical Machine Translation system used for medical data. Numerous
-elements of the EMEA parallel text corpora and the unrelated OPUS
-OpenSubtitles project were used as the basis for the creation of phrase
-tables and different language models, including the development, tuning and
-testing of these translation systems. The BLEU, NIST, METEOR, and TER metrics
-have been used in order to evaluate the results of the various systems. Our
-experiments deal with systems that include POS tagging, factored phrase
-models, hierarchical models, syntactic taggers, and other alignment methods.
-We also executed a deep analysis of Polish data as preparatory work before
-the automated data processing phase, covering truecasing and punctuation
-normalization. Normalized metrics were used to compare results. Scores lower
-than 15% mean that the Machine Translation engine is unable to provide
-satisfying quality, scores greater than 30% mean that translations should be
-understandable without problems, and scores over 50% reflect adequate
-translations. The average scores of Polish to English translations for BLEU,
-NIST, METEOR, and TER were relatively high, ranging from 70.58 to 82.72. The
-lowest score was 64.38. The average score ranges for English to Polish
-translations were a little lower (67.58 - 78.97). Real-life implementations
-of the presented high-quality Machine Translation systems are anticipated in
-general medical practice and telemedicine.
-" -2148,1510.04709,"Desmond Elliott, Stella Frank, Eva Hasler",Multilingual Image Description with Neural Sequence Models,cs.CL cs.CV cs.LG cs.NE," In this paper we present an approach to multi-language image description -bringing together insights from neural machine translation and neural image -description. To create a description of an image for a given target language, -our sequence generation models condition on feature vectors from the image, the -description from the source language, and/or a multimodal vector computed over -the image and a description in the source language. In image description -experiments on the IAPR-TC12 dataset of images aligned with English and German -sentences, we find significant and substantial improvements in BLEU4 and Meteor -scores for models trained over multiple languages, compared to a monolingual -baseline. -" -2149,1510.04734,"Michael Subotin, Anthony R. Davis","A Method for Modeling Co-Occurrence Propensity of Clinical Codes with - Application to ICD-10-PCS Auto-Coding",cs.CL," Objective. Natural language processing methods for medical auto-coding, or -automatic generation of medical billing codes from electronic health records, -generally assign each code independently of the others. They may thus assign -codes for closely related procedures or diagnoses to the same document, even -when they do not tend to occur together in practice, simply because the right -choice can be difficult to infer from the clinical narrative. - Materials and Methods. We propose a method that injects awareness of the -propensities for code co-occurrence into this process. First, a model is -trained to estimate the conditional probability that one code is assigned by a -human coder, given than another code is known to have been assigned to the same -document. Then, at runtime, an iterative algorithm is used to apply this model -to the output of an existing statistical auto-coder to modify the confidence -scores of the codes. - Results. We tested this method in combination with a primary auto-coder for -ICD-10 procedure codes, achieving a 12% relative improvement in F-score over -the primary auto-coder baseline. - Discussion. The proposed method can be used, with appropriate features, in -combination with any auto-coder that generates codes with different levels of -confidence. - Conclusion. The promising results obtained for ICD-10 procedure codes suggest -that the proposed method may have wider applications in auto-coding. -" -2150,1510.04780,"Chenhao Zhu, Kan Ren, Xuan Liu, Haofen Wang, Yiding Tian and Yong Yu","A Graph Traversal Based Approach to Answer Non-Aggregation Questions - Over DBpedia",cs.CL cs.IR," We present a question answering system over DBpedia, filling the gap between -user information needs expressed in natural language and a structured query -interface expressed in SPARQL over the underlying knowledge base (KB). Given -the KB, our goal is to comprehend a natural language query and provide -corresponding accurate answers. Focusing on solving the non-aggregation -questions, in this paper, we construct a subgraph of the knowledge base from -the detected entities and propose a graph traversal method to solve both the -semantic item mapping problem and the disambiguation problem in a joint way. -Compared with existing work, we simplify the process of query intention -understanding and pay more attention to the answer path ranking. We evaluate -our method on a non-aggregation question dataset and further on a complete -dataset. 
Experimental results show that our method achieves the best
-performance compared with several state-of-the-art systems.
-"
-2151,1510.04972,"Weiyi Sun, Anna Rumshisky, Ozlem Uzuner","Normalization of Relative and Incomplete Temporal Expressions in
-  Clinical Narratives",cs.CL cs.AI cs.IR," We analyze the RI-TIMEXes in temporally annotated corpora and propose two
-hypotheses regarding the normalization of RI-TIMEXes in the clinical
-narrative domain: the anchor point hypothesis and the anchor relation
-hypothesis. We annotate the RI-TIMEXes in three corpora to study the
-characteristics of RI-TIMEXes in different domains. This informed the design
-of our RI-TIMEX normalization system for the clinical domain, which consists
-of an anchor point classifier, an anchor relation classifier and a rule-based
-RI-TIMEX text span parser. We experiment with different feature sets and
-perform error analysis for each system component. The annotation confirmed
-the hypotheses that we can simplify the RI-TIMEX normalization task using two
-multi-label classifiers. Our system achieves anchor point classification,
-anchor relation classification and rule-based parsing accuracy of 74.68%,
-87.71% and 57.2% (82.09% under relaxed matching criteria) respectively on the
-held-out test set of the 2012 i2b2 temporal relation challenge. Experiments
-with feature sets reveal some interesting findings, such as that the verbal
-tense feature does not inform the anchor relation classification in clinical
-narratives as much as the tokens near the RI-TIMEX do. Error analysis shows
-that underrepresented anchor point and anchor relation classes are difficult
-to detect. We formulate the RI-TIMEX normalization problem as a pair of
-multi-label classification problems. Considering only the RI-TIMEX extraction
-and normalization, the system achieves a statistically significant
-improvement over the RI-TIMEX results of the best systems in the 2012 i2b2
-challenge.
-"
-2152,1510.05198,"Jiwei Li, Alan Ritter and Dan Jurafsky","Learning multi-faceted representations of individuals from heterogeneous
-  evidence using neural networks",cs.SI cs.CL," Inferring latent attributes of people online is an important social
-computing task, but requires integrating the many heterogeneous sources of
-information available on the web. We propose learning individual
-representations of people using neural nets to integrate rich linguistic and
-network evidence gathered from social media. The algorithm is able to combine
-diverse cues, such as the text a person writes, their attributes (e.g.
-gender, employer, education, location) and social relations to other people.
-We show that by integrating both textual and network evidence, these
-representations offer improved performance at four important tasks in social
-media inference on Twitter: predicting (1) gender, (2) occupation, (3)
-location, and (4) friendships for users. Our approach scales to large
-datasets and the learned representations can be used as general features in,
-and have the potential to benefit, a large number of downstream tasks
-including link prediction, community detection, or probabilistic reasoning
-over social networks.
-" -2153,1510.05203,"Graham Neubig, Makoto Morishita, Satoshi Nakamura","Neural Reranking Improves Subjective Quality of Machine Translation: - NAIST at WAT2015",cs.CL," This year, the Nara Institute of Science and Technology (NAIST)'s submission -to the 2015 Workshop on Asian Translation was based on syntax-based statistical -machine translation, with the addition of a reranking component using neural -attentional machine translation models. Experiments re-confirmed results from -previous work stating that neural MT reranking provides a large gain in -objective evaluation measures such as BLEU, and also confirmed for the first -time that these results also carry over to manual evaluation. We further -perform a detailed analysis of reasons for this increase, finding that the main -contributions of the neural models lie in improvement of the grammatical -correctness of the output, as opposed to improvements in lexical choice of -content words. -" -2154,1510.06168,"Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao","Part-of-Speech Tagging with Bidirectional Long Short-Term Memory - Recurrent Neural Network",cs.CL," Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has -been shown to be very effective for tagging sequential data, e.g. speech -utterances or handwritten documents. While word embedding has been demoed as a -powerful representation for characterizing the statistical properties of -natural language. In this study, we propose to use BLSTM-RNN with word -embedding for part-of-speech (POS) tagging task. When tested on Penn Treebank -WSJ test set, a state-of-the-art performance of 97.40 tagging accuracy is -achieved. Without using morphological features, this approach can also achieve -a good performance comparable with the Stanford POS tagger. -" -2155,1510.06342,"Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, - Vibhor Kumar, Matilde Marcolli","Prevalence and recoverability of syntactic parameters in sparse - distributed memories",cs.CL cs.IT math.IT," We propose a new method, based on Sparse Distributed Memory (Kanerva -Networks), for studying dependency relations between different syntactic -parameters in the Principles and Parameters model of Syntax. We store data of -syntactic parameters of world languages in a Kanerva Network and we check the -recoverability of corrupted parameter data from the network. We find that -different syntactic parameters have different degrees of recoverability. We -identify two different effects: an overall underlying relation between the -prevalence of parameters across languages and their degree of recoverability, -and a finer effect that makes some parameters more easily recoverable beyond -what their prevalence would indicate. We interpret a higher recoverability for -a syntactic parameter as an indication of the existence of a dependency -relation, through which the given parameter can be determined using the -remaining uncorrupted data. -" -2156,1510.06549,Aaron Q Li,Multi-GPU Distributed Parallel Bayesian Differential Topic Modelling,cs.CL cs.DC cs.LG," There is an explosion of data, documents, and other content, and people -require tools to analyze and interpret these, tools to turn the content into -information and knowledge. Topic modeling have been developed to solve these -problems. Topic models such as LDA [Blei et. al. 2003] allow salient patterns -in data to be extracted automatically. When analyzing texts, these patterns are -called topics. 
Among the numerous extensions of LDA, few can reliably
-analyze multiple groups of documents and extract topic similarities. Recently
-introduced differential topic modeling (SPDP) [Chen et al. 2012] performs
-uniformly better than many topic models in a discriminative setting.
- There is also a need to improve the sampling speed for topic models. While
-some effort has been made on distributed algorithms, there is no work
-currently done using graphics processing units (GPUs). Note that the GPU
-framework has already become the most cost-efficient platform for many
-problems.
- In this thesis, I propose and implement a scalable multi-GPU distributed
-parallel framework which approximates SPDP. Through experiments, I have shown
-that my algorithms achieve a speed gain of about 50 times while being almost as
-accurate, with only a single cheap laptop GPU. Furthermore, I have shown that
-the speed improvement is sublinearly scalable when multiple GPUs are used,
-while fairly maintaining the accuracy. Therefore, on a medium-sized GPU
-cluster, the speed improvement could potentially reach a factor of a thousand.
- Note that SPDP is just one representative of the extensions of LDA. Although
-my algorithm is implemented to work with SPDP, it is designed to be general
-enough to work with other topic models. The speed-up on smaller collections
-(i.e., 1000s of documents) means that these more complex LDA extensions could
-now be run in real time, thus opening up a new way of using these LDA models
-in industry.
-"
-2157,1510.06646,"Osama Khalifa, David Wolfe Corne, Mike Chantler","A 'Gibbs-Newton' Technique for Enhanced Inference of Multivariate Polya
- Parameters and Topic Models",cs.LG cs.CL stat.ML," Hyper-parameters play a major role in the learning and inference process of
-latent Dirichlet allocation (LDA). In order to begin the LDA latent variables
-learning process, these hyper-parameter values need to be pre-determined. We
-propose an extension for LDA that we call 'Latent Dirichlet allocation Gibbs
-Newton' (LDA-GN), which places non-informative priors over these
-hyper-parameters and uses Gibbs sampling to learn appropriate values for them.
-At the heart of LDA-GN is our proposed 'Gibbs-Newton' algorithm, which is a new
-technique for learning the parameters of multivariate Polya distributions. We
-report Gibbs-Newton performance results compared with two prominent existing
-approaches to the latter task: Minka's fixed-point iteration method and the
-Moments method. We then evaluate LDA-GN in two ways: (i) by comparing it with
-standard LDA in terms of the ability of the resulting topic models to
-generalize to unseen documents; (ii) by comparing it with standard LDA in its
-performance on a binary classification task.
-"
-2158,1510.06786,"Vivek Kulkarni, Bryan Perozzi, Steven Skiena","Freshman or Fresher? Quantifying the Geographic Variation of Internet
- Language",cs.CL cs.IR cs.LG," We present a new computational technique to detect and analyze statistically
-significant geographic variation in language. Our meta-analysis approach
-captures statistical properties of word usage across geographical regions and
-uses statistical methods to identify significant changes specific to regions.
-While previous approaches have primarily focused on lexical variation between
-regions, our method identifies words that demonstrate semantic and syntactic
-variation as well.
- We extend recently developed techniques for neural language models to learn -word representations which capture differing semantics across geographical -regions. In order to quantify this variation and ensure robust detection of -true regional differences, we formulate a null model to determine whether -observed changes are statistically significant. Our method is the first such -approach to explicitly account for random variation due to chance while -detecting regional variation in word meaning. - To validate our model, we study and analyze two different massive online data -sets: millions of tweets from Twitter spanning not only four different -countries but also fifty states, as well as millions of phrases contained in -the Google Book Ngrams. Our analysis reveals interesting facets of language -change at multiple scales of geographic resolution -- from neighboring states -to distant continents. - Finally, using our model, we propose a measure of semantic distance between -languages. Our analysis of British and American English over a period of 100 -years reveals that semantic variation between these dialects is shrinking. -" -2159,1510.06807,"Will Monroe, Christopher Potts",Learning in the Rational Speech Acts Model,cs.CL," The Rational Speech Acts (RSA) model treats language use as a recursive -process in which probabilistic speaker and listener agents reason about each -other's intentions to enrich the literal semantics of their language along -broadly Gricean lines. RSA has been shown to capture many kinds of -conversational implicature, but it has been criticized as an unrealistic model -of speakers, and it has so far required the manual specification of a semantic -lexicon, preventing its use in natural language processing applications that -learn lexical knowledge from data. We address these concerns by showing how to -define and optimize a trained statistical classifier that uses the intermediate -agents of RSA as hidden layers of representation forming a non-linear -activation function. This treatment opens up new application domains and new -possibilities for learning effectively from data. We validate the model on a -referential expression generation task, showing that the best performance is -achieved by incorporating features approximating well-established insights -about natural language generation into RSA. -" -2160,1510.07035,"Joseph W Robinson, Aaron Q Li","Fast Latent Variable Models for Inference and Visualization on Mobile - Devices",cs.LG cs.CL cs.DC cs.IR," In this project we outline Vedalia, a high performance distributed network -for performing inference on latent variable models in the context of Amazon -review visualization. We introduce a new model, RLDA, which extends Latent -Dirichlet Allocation (LDA) [Blei et al., 2003] for the review space by -incorporating auxiliary data available in online reviews to improve modeling -while simultaneously remaining compatible with pre-existing fast sampling -techniques such as [Yao et al., 2009; Li et al., 2014a] to achieve high -performance. The network is designed such that computation is efficiently -offloaded to the client devices using the Chital system [Robinson & Li, 2015], -improving response times and reducing server costs. The resulting system is -able to rapidly compute a large number of specialized latent variable models -while requiring minimal server resources. 
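The recursive speaker/listener computation at the heart of the RSA model (arXiv:1510.06807 above) fits in a few lines of NumPy. This toy scalar-implicature example assumes a hand-written lexicon and a uniform state prior; the paper's contribution is learning such lexica from data rather than specifying them manually.

    import numpy as np

    # Rows: messages, columns: world states. lexicon[m, s] = 1 if message m
    # is literally true of state s. States: [SOME-but-not-all, ALL].
    messages = ["some", "all"]
    lexicon = np.array([[1.0, 1.0],    # "some" is literally true of both
                        [0.0, 1.0]])   # "all" is true only of the ALL world

    def normalise(m):
        return m / m.sum(axis=1, keepdims=True)

    L0 = normalise(lexicon)                # literal listener
    S1 = normalise(L0.T ** 4.0).T          # speaker with rationality alpha=4
    L1 = normalise(S1)                     # pragmatic listener

    print(dict(zip(messages, L1)))  # "some" now favours the not-all world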
-" -2161,1510.07099,Yao Yushi and Huang Zheng,Combine CRF and MMSEG to Boost Chinese Word Segmentation in Social Media,cs.CL," In this paper, we propose a joint algorithm for the word segmentation on -Chinese social media. Previous work mainly focus on word segmentation for plain -Chinese text, in order to develop a Chinese social media processing tool, we -need to take the main features of social media into account, whose grammatical -structure is not rigorous, and the tendency of using colloquial and Internet -terms makes the existing Chinese-processing tools inefficient to obtain good -performance on social media. - In our approach, we combine CRF and MMSEG algorithm and extend features of -traditional CRF algorithm to train the model for word segmentation, We use -Internet lexicon in order to improve the performance of our model on Chinese -social media. Our experimental result on Sina Weibo shows that our approach -outperforms the state-of-the-art model. -" -2162,1510.07193,Kais Dukes,Statistical Parsing by Machine Learning from a Classical Arabic Treebank,cs.CL," Research into statistical parsing for English has enjoyed over a decade of -successful results. However, adapting these models to other languages has met -with difficulties. Previous comparative work has shown that Modern Arabic is -one of the most difficult languages to parse due to rich morphology and free -word order. Classical Arabic is the ancient form of Arabic, and is understudied -in computational linguistics, relative to its worldwide reach as the language -of the Quran. The thesis is based on seven publications that make significant -contributions to knowledge relating to annotating and parsing Classical Arabic. - A central argument of this thesis is that using a hybrid representation -closely aligned to traditional grammar leads to improved parsing for Arabic. To -test this hypothesis, two approaches are compared. As a reference, a pure -dependency parser is adapted using graph transformations, resulting in an -87.47% F1-score. This is compared to an integrated parsing model with an -F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is -better suited to Classical Arabic. -" -2163,1510.07385,"Jean-Val\`ere Cossu (LIA), Ludovic Bonnefoy (LIA), Xavier Bost (LIA), - Marc El B\`eze (LIA)",How to merge three different methods for information filtering ?,cs.CL cs.IR," Twitter is now a gold marketing tool for entities concerned with online -reputation. To automatically monitor online reputation of entities , systems -have to deal with ambiguous entity names, polarity detection and topic -detection. We propose three approaches to tackle the first issue: monitoring -Twitter in order to find relevant tweets about a given entity. Evaluated within -the framework of the RepLab-2013 Filtering task, each of them has been shown -competitive with state-of-the-art approaches. Mainly we investigate on how much -merging strategies may impact performances on a filtering task according to the -evaluation measure. -" -2164,1510.07439,"Abinash Tripathy, Santanu Kumar Rath","Object Oriented Analysis using Natural Language Processing concepts: A - Review",cs.SE cs.CL," The Software Development Life Cycle (SDLC) starts with eliciting requirements -of the customers in the form of Software Requirement Specification (SRS). SRS -document needed for software development is mostly written in Natural -Language(NL) convenient for the client. 
From the SRS document alone, the class
-name, its attributes and the functions incorporated in the body of the class
-are traced, based on the analyst's prior knowledge. The paper intends to
-present a review on Object Oriented (OO) analysis using Natural Language
-Processing (NLP) techniques. This analysis can be manual, where a domain expert
-helps to generate the required diagram, or automated, where the system
-generates the required diagram from the input in the form of an SRS.
-"
-2165,1510.07482,"Effi Levi, Roi Reichart and Ari Rappoport","Edge-Linear First-Order Dependency Parsing with Undirected Minimum
- Spanning Tree Inference",cs.CL," The run time complexity of state-of-the-art inference algorithms in
-graph-based dependency parsing is super-linear in the number of input words
-(n). Recently, pruning algorithms for these models have been shown to cut a
-large portion of the graph edges, with minimal damage to the resulting parse
-trees. Solving the inference problem in run time complexity determined solely
-by the number of edges (m) is hence of obvious importance.
- We propose such an inference algorithm for first-order models, which encodes
-the problem as a minimum spanning tree (MST) problem in an undirected graph.
-This allows us to utilize state-of-the-art undirected MST algorithms whose run
-time is O(m) in expectation and with a very high probability. A directed parse
-tree is then inferred from the undirected MST and is subsequently improved with
-respect to the directed parsing model through local greedy updates, both steps
-running in O(n) time. In experiments with 18 languages, a variant of the
-first-order MSTParser (McDonald et al., 2005b) that employs our algorithm
-performs very similarly to the original parser that runs an O(n^2) directed MST
-inference.
-"
-2166,1510.07526,"Yang Yu, Wei Zhang, Chung-Wei Hang, Bing Xiang and Bowen Zhou",Empirical Study on Deep Learning Models for Question Answering,cs.CL cs.AI cs.LG," In this paper we explore deep learning models with a memory component or an
-attention mechanism for the question answering task. We combine and compare
-three models, Neural Machine Translation, Neural Turing Machine, and Memory
-Networks for a simulated QA data set. This paper is the first one that uses
-Neural Machine Translation and Neural Turing Machines for solving QA tasks. Our
-results suggest that the combination of attention and memory has the potential
-to solve certain QA problems.
-"
-2167,1510.07586,"Sudha Rao, Yogarshi Vyas, Hal Daume III, Philip Resnik",Parser for Abstract Meaning Representation using Learning to Search,cs.CL," We develop a novel technique to parse English sentences into Abstract Meaning
-Representation (AMR) using SEARN, a Learning to Search approach, by modeling
-the concept and the relation learning in a unified framework. We evaluate our
-parser on multiple datasets from varied domains and show an absolute
-improvement of 2% to 6% over the state-of-the-art. Additionally we show that
-using the most frequent concept gives us a baseline that is stronger than the
-state-of-the-art for concept prediction. We plan to release our parser for
-public use.
-"
-2168,1510.07851,"Laurent Romary (ALPAGE, CMB)","Standards for language resources in ISO -- Looking back at 13 fruitful
- years",cs.CL," This paper provides an overview of the various projects carried out within
-ISO committee TC 37/SC 4 dealing with the management of language (digital)
-resources.
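A sketch of undirected MST inference as used conceptually in arXiv:1510.07482 above. Here Kruskal's O(m log m) algorithm with union-find stands in for the randomised expected-O(m) MST algorithms the paper relies on, and the toy arc scores are invented (negated scores would be used to turn arc-score maximisation into weight minimisation).

    def undirected_mst(n, edges):
        """Kruskal's algorithm with union-find; edges are (weight, u, v)."""
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x
        tree = []
        for w, u, v in sorted(edges):
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v))
        return tree

    # 0 = ROOT, 1 = "dogs", 2 = "bark"; lower weight = better arc
    edges = [(0.2, 0, 2), (0.4, 1, 2), (0.9, 0, 1)]
    print(undirected_mst(3, edges))   # [(0, 2), (1, 2)]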
On the basis of the technical experience gained in the committee and -the wider standardization landscape the paper identifies some possible trends -for the future. -" -2169,1510.08418,Katja Filippova and Enrique Alfonseca,Fast k-best Sentence Compression,cs.CL," A popular approach to sentence compression is to formulate the task as a -constrained optimization problem and solve it with integer linear programming -(ILP) tools. Unfortunately, dependence on ILP may make the compressor -prohibitively slow, and thus approximation techniques have been proposed which -are often complex and offer a moderate gain in speed. As an alternative -solution, we introduce a novel compression algorithm which generates k-best -compressions relying on local deletion decisions. Our algorithm is two orders -of magnitude faster than a recent ILP-based method while producing better -compressions. Moreover, an extensive evaluation demonstrates that the quality -of compressions does not degrade much as we move from single best to top-five -results. -" -2170,1510.08480,Umashanthi Pavalanathan and Jacob Eisenstein,Emoticons vs. Emojis on Twitter: A Causal Inference Approach,cs.CL," Online writing lacks the non-verbal cues present in face-to-face -communication, which provide additional contextual information about the -utterance, such as the speaker's intention or affective state. To fill this -void, a number of orthographic features, such as emoticons, expressive -lengthening, and non-standard punctuation, have become popular in social media -services including Twitter and Instagram. Recently, emojis have been introduced -to social media, and are increasingly popular. This raises the question of -whether these predefined pictographic characters will come to replace earlier -orthographic methods of paralinguistic communication. In this abstract, we -attempt to shed light on this question, using a matching approach from causal -inference to test whether the adoption of emojis causes individual users to -employ fewer emoticons in their text on Twitter. -" -2171,1510.08983,"Yu Zhang and Guoguo Chen and Dong Yu and Kaisheng Yao and Sanjeev - Khudanpur and James Glass",Highway Long Short-Term Memory RNNs for Distant Speech Recognition,cs.NE cs.AI cs.CL cs.LG eess.AS," In this paper, we extend the deep long short-term memory (DLSTM) recurrent -neural networks by introducing gated direct connections between memory cells in -adjacent layers. These direct links, called highway connections, enable -unimpeded information flow across different layers and thus alleviate the -gradient vanishing problem when building deeper LSTMs. We further introduce the -latency-controlled bidirectional LSTMs (BLSTMs) which can exploit the whole -history while keeping the latency under control. Efficient algorithms are -proposed to train these novel networks using both frame and sequence -discriminative criteria. Experiments on the AMI distant speech recognition -(DSR) task indicate that we can train deeper LSTMs and achieve better -improvement from sequence training with highway LSTMs (HLSTMs). Our novel model -obtains $43.9/47.7\%$ WER on AMI (SDM) dev and eval sets, outperforming all -previous works. It beats the strong DNN and DLSTM baselines with $15.7\%$ and -$5.3\%$ relative improvement respectively. 
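The highway connection described in arXiv:1510.08983 above can be sketched as a learned gate that interpolates between the lower layer's memory cell and the upper layer's candidate cell. This PyTorch fragment is a simplified illustration with invented dimensions, not the paper's exact parameterisation.

    import torch
    import torch.nn as nn

    class HighwayCell(nn.Module):
        """Gated direct connection between memory cells of adjacent layers,
        in the spirit of highway LSTMs (a sketch, not the exact cell)."""
        def __init__(self, hidden):
            super().__init__()
            self.gate = nn.Linear(2 * hidden, hidden)

        def forward(self, c_lower, c_upper):
            # The carry gate decides how much of the lower layer's cell
            # state flows straight through to the upper layer.
            t = torch.sigmoid(self.gate(torch.cat([c_lower, c_upper], dim=-1)))
            return t * c_lower + (1.0 - t) * c_upper

    cell = HighwayCell(8)
    c_low, c_up = torch.randn(4, 8), torch.randn(4, 8)
    print(cell(c_low, c_up).shape)   # torch.Size([4, 8])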
-" -2172,1510.08985,"Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu","Prediction-Adaptation-Correction Recurrent Neural Networks for - Low-Resource Language Speech Recognition",cs.CL cs.LG cs.NE eess.AS," In this paper, we investigate the use of prediction-adaptation-correction -recurrent neural networks (PAC-RNNs) for low-resource speech recognition. A -PAC-RNN is comprised of a pair of neural networks in which a {\it correction} -network uses auxiliary information given by a {\it prediction} network to help -estimate the state probability. The information from the correction network is -also used by the prediction network in a recurrent loop. Our model outperforms -other state-of-the-art neural networks (DNNs, LSTMs) on IARPA-Babel tasks. -Moreover, transfer learning from a language that is similar to the target -language can help improve performance further. -" -2173,1510.09079,"Lorenzo Gatti, Marco Guerini, Marco Turchi","SentiWords: Deriving a High Precision and High Coverage Lexicon for - Sentiment Analysis",cs.CL," Deriving prior polarity lexica for sentiment analysis - where positive or -negative scores are associated with words out of context - is a challenging -task. Usually, a trade-off between precision and coverage is hard to find, and -it depends on the methodology used to build the lexicon. Manually annotated -lexica provide a high precision but lack in coverage, whereas automatic -derivation from pre-existing knowledge guarantees high coverage at the cost of -a lower precision. Since the automatic derivation of prior polarities is less -time consuming than manual annotation, there has been a great bloom of these -approaches, in particular based on the SentiWordNet resource. In this paper, we -compare the most frequently used techniques based on SentiWordNet with newer -ones and blend them in a learning framework (a so called 'ensemble method'). By -taking advantage of manually built prior polarity lexica, our ensemble method -is better able to predict the prior value of unseen words and to outperform all -the other SentiWordNet approaches. Using this technique we have built -SentiWords, a prior polarity lexicon of approximately 155,000 words, that has -both a high precision and a high coverage. We finally show that in sentiment -analysis tasks, using our lexicon allows us to outperform both the single -metrics derived from SentiWordNet and popular manually annotated sentiment -lexica. -" -2174,1510.09202,Hongyu Guo,Generating Text with Deep Reinforcement Learning,cs.CL cs.LG cs.NE," We introduce a novel schema for sequence to sequence learning with a Deep -Q-Network (DQN), which decodes the output sequence iteratively. The aim here is -to enable the decoder to first tackle easier portions of the sequences, and -then turn to cope with difficult parts. Specifically, in each iteration, an -encoder-decoder Long Short-Term Memory (LSTM) network is employed to, from the -input sequence, automatically create features to represent the internal states -of and formulate a list of potential actions for the DQN. Take rephrasing a -natural sentence as an example. This list can contain ranked potential words. -Next, the DQN learns to make decision on which action (e.g., word) will be -selected from the list to modify the current decoded sequence. The newly -modified output sequence is subsequently used as the input to the DQN for the -next decoding iteration. 
In each iteration, we also bias the reinforcement
-learning's attention to explore sequence portions which were previously
-difficult to decode. For evaluation, the proposed strategy was trained to
-decode ten thousand natural sentences. Our experiments indicate that, when
-compared to a left-to-right greedy beam search LSTM decoder, the proposed
-method performed competitively well when decoding sentences from the training
-set, but significantly outperformed the baseline when decoding unseen
-sentences, in terms of BLEU score obtained.
-"
-2175,1511.00040,Sta\v{s}a Milojevi\'c,Quantifying the Cognitive Extent of Science,cs.DL astro-ph.IM cs.CL physics.soc-ph," While modern science is characterized by an exponential growth in
-scientific literature, the increase in publication volume clearly does not
-reflect the expansion of the cognitive boundaries of science. Nevertheless,
-most of the metrics for assessing the vitality of science or for making funding
-and policy decisions are based on productivity. Similarly, the increasing level
-of knowledge production by large science teams, whose results often enjoy
-greater visibility, does not necessarily mean that ""big science"" leads to
-cognitive expansion. Here we present a novel, big-data method to quantify the
-extents of cognitive domains of different bodies of scientific literature
-independently from publication volume, and apply it to 20 million articles
-published over 60-130 years in physics, astronomy, and biomedicine. The method
-is based on the lexical diversity of titles of fixed quotas of research
-articles. Owing to the large size of the quotas, the method overcomes the
-inherent stochasticity of article titles to achieve <1% precision. We show that
-the periods of cognitive growth do not necessarily coincide with the trends in
-publication volume. Furthermore, we show that the articles produced by larger
-teams cover significantly smaller cognitive territory than (the same quota of)
-articles from smaller teams. Our findings provide a new perspective on the role
-of small teams and individual researchers in expanding the cognitive boundaries
-of science. The proposed method of quantifying the extent of the cognitive
-territory can also be applied to study many other aspects of ""science of
-science.""
-"
-2176,1511.00060,"Xingxing Zhang, Liang Lu, Mirella Lapata",Top-down Tree Long Short-Term Memory Networks,cs.CL cs.LG," Long Short-Term Memory (LSTM) networks, a type of recurrent neural network
-with a more complex computational unit, have been successfully applied to a
-variety of sequence modeling tasks. In this paper we develop Tree Long
-Short-Term Memory (TreeLSTM), a neural network model based on LSTM, which is
-designed to predict a tree rather than a linear sequence. TreeLSTM defines the
-probability of a sentence by estimating the generation probability of its
-dependency tree. At each time step, a node is generated based on the
-representation of the generated sub-tree. We further enhance the modeling power
-of TreeLSTM by explicitly representing the correlations between left and right
-dependents. Application of our model to the MSR sentence completion challenge
-achieves results beyond the current state of the art. We also report results on
-dependency parsing reranking achieving competitive performance.
-"
-2177,1511.00215,"Peilu Wang, Yao Qian, Frank K.
Soong, Lei He, Hai Zhao","A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network
- with Word Embedding",cs.CL," Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has
-been shown to be very effective for modeling and predicting sequential data,
-e.g. speech utterances or handwritten documents. In this study, we propose to
-use BLSTM-RNN for a unified tagging solution that can be applied to various
-tagging tasks including part-of-speech tagging, chunking and named entity
-recognition. Instead of exploiting specific features carefully optimized for
-each task, our solution only uses one set of task-independent features and
-internal representations learnt from unlabeled text for all tasks. Requiring no
-task-specific knowledge or sophisticated feature engineering, our approach
-achieves nearly state-of-the-art performance in all these three tagging tasks.
-"
-2178,1511.00352,Abhinav Maurya,"Spatial Semantic Scan: Jointly Detecting Subtle Events and their Spatial
- Footprint",cs.LG cs.CL stat.ML," Many methods have been proposed for detecting emerging events in text streams
-using topic modeling. However, these methods have shortcomings that make them
-unsuitable for rapid detection of locally emerging events on massive text
-streams. We describe Spatially Compact Semantic Scan (SCSS) that has been
-developed specifically to overcome the shortcomings of current methods in
-detecting new spatially compact events in text streams. SCSS employs
-alternating optimization between using semantic scan to estimate contrastive
-foreground topics in documents, and discovering spatial neighborhoods with high
-occurrence of documents containing the foreground topics. We evaluate our
-method on the Emergency Department chief complaints dataset (ED dataset) to
-verify the effectiveness of our method in detecting real-world disease
-outbreaks from free-text ED chief complaint data.
-"
-2179,1511.00360,"Chuang Ding, Lei Xie, Jie Yan, Weini Zhang, Yang Liu","Automatic Prosody Prediction for Chinese Speech Synthesis using
- BLSTM-RNN and Embedding Features",cs.CL cs.SD," Prosody affects the naturalness and intelligibility of speech. However,
-automatic prosody prediction from text for Chinese speech synthesis is still a
-great challenge and the traditional conditional random fields (CRF) based
-method always heavily relies on feature engineering. In this paper, we propose
-to use neural networks to predict prosodic boundary labels directly from
-Chinese characters without any feature engineering. Experimental results show
-that stacking feed-forward and bidirectional long short-term memory (BLSTM)
-recurrent network layers achieves superior performance over the CRF-based
-method. The embedding features learned from raw text further enhance the
-performance.
-"
-2180,1511.00622,Steffen Eger,On the Number of Many-to-Many Alignments of Multiple Sequences,math.CO cs.CL cs.DM," We count the number of alignments of $N \ge 1$ sequences when match-up types
-are from a specified set $S\subseteq \mathbb{N}^N$. Equivalently, we count the
-number of nonnegative integer matrices whose rows sum to a given fixed vector
-and each of whose columns lie in $S$. We provide a new asymptotic formula for
-the case $S=\{(s_1,\ldots,s_N) \:|\: 1\le s_i\le 2\}$.
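The matrix-counting formulation in arXiv:1511.00622 above admits a direct dynamic program: recursively peel one column off the target row-sum vector. The memoised sketch below computes exact counts for small inputs; the paper's contribution is the asymptotic formula, not this enumeration.

    from functools import lru_cache
    from itertools import product

    def count_alignments(row_sums, S):
        """Number of ways to write row_sums as an ordered sum of columns
        drawn from S, i.e. the number of alignments in the paper's sense."""
        S = [tuple(s) for s in S]
        @lru_cache(maxsize=None)
        def f(r):
            if all(x == 0 for x in r):
                return 1
            total = 0
            for s in S:
                if all(x >= y for x, y in zip(r, s)):
                    total += f(tuple(x - y for x, y in zip(r, s)))
            return total
        return f(tuple(row_sums))

    # match-up types 1 <= s_i <= 2 for N = 2 sequences
    S = list(product([1, 2], repeat=2))
    print([count_alignments((n, n), S) for n in range(1, 6)])  # 1, 2, 5, ...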
-" -2181,1511.01042,Junyoung Chung and Jacob Devlin and Hany Hassan Awadalla,Detecting Interrogative Utterances with Recurrent Neural Networks,cs.CL cs.LG cs.NE," In this paper, we explore different neural network architectures that can -predict if a speaker of a given utterance is asking a question or making a -statement. We com- pare the outcomes of regularization methods that are -popularly used to train deep neural networks and study how different context -functions can affect the classification performance. We also compare the -efficacy of gated activation functions that are favorably used in recurrent -neural networks and study how to combine multimodal inputs. We evaluate our -models on two multimodal datasets: MSR-Skype and CALLHOME. -" -2182,1511.01158,"Minwei Feng, Bing Xiang, Bowen Zhou",Distributed Deep Learning for Question Answering,cs.LG cs.CL cs.DC," This paper is an empirical study of the distributed deep learning for -question answering subtasks: answer selection and question classification. -Comparison studies of SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, -DOWNPOUR and EASGD/EAMSGD algorithms have been presented. Experimental results -show that the distributed framework based on the message passing interface can -accelerate the convergence speed at a sublinear scale. This paper demonstrates -the importance of distributed training. For example, with 48 workers, a 24x -speedup is achievable for the answer selection task and running time is -decreased from 138.2 hours to 5.81 hours, which will increase the productivity -significantly. -" -2183,1511.01259,"Gregory Grefenstette (TAO), Karima Rafes (TAO, LRI)","Transforming Wikipedia into an Ontology-based Information Retrieval - Search Engine for Local Experts using a Third-Party Taxonomy",cs.IR cs.CL," Wikipedia is widely used for finding general information about a wide variety -of topics. Its vocation is not to provide local information. For example, it -provides plot, cast, and production information about a given movie, but not -showing times in your local movie theatre. Here we describe how we can connect -local information to Wikipedia, without altering its content. The case study we -present involves finding local scientific experts. Using a third-party -taxonomy, independent from Wikipedia's category hierarchy, we index information -connected to our local experts, present in their activity reports, and we -re-index Wikipedia content using the same taxonomy. The connections between -Wikipedia pages and local expert reports are stored in a relational database, -accessible through as public SPARQL endpoint. A Wikipedia gadget (or plugin) -activated by the interested user, accesses the endpoint as each Wikipedia page -is accessed. An additional tab on the Wikipedia page allows the user to open up -a list of teams of local experts associated with the subject matter in the -Wikipedia page. The technique, though presented here as a way to identify local -experts, is generic, in that any third party taxonomy, can be used in this to -connect Wikipedia to any non-Wikipedia data source. -" -2184,1511.01432,Andrew M. Dai and Quoc V. Le,Semi-supervised Sequence Learning,cs.LG cs.CL," We present two approaches that use unlabeled data to improve sequence -learning with recurrent networks. The first approach is to predict what comes -next in a sequence, which is a conventional language model in natural language -processing. 
The second approach is to use a sequence autoencoder, which reads -the input sequence into a vector and predicts the input sequence again. These -two algorithms can be used as a ""pretraining"" step for a later supervised -sequence learning algorithm. In other words, the parameters obtained from the -unsupervised step can be used as a starting point for other supervised training -models. In our experiments, we find that long short term memory recurrent -networks after being pretrained with the two approaches are more stable and -generalize better. With pretraining, we are able to train long short term -memory recurrent networks up to a few hundred timesteps, thereby achieving -strong performance in many text classification tasks, such as IMDB, DBpedia and -20 Newsgroups. -" -2185,1511.01480,Maurizio Naldi,Approximation of the truncated Zeta distribution and Zipf's law,stat.AP cs.CL cs.SI," Zipf's law appears in many application areas but does not have a closed form -expression, which may make its use cumbersome. Since it coincides with the -truncated version of the Zeta distribution, in this paper we propose three -approximate closed form expressions for the truncated Zeta distribution, which -may be employed for Zipf's law as well. The three approximations are based on -the replacement of the sum occurring in Zipf's law with an integral, and are -named respectively the integral approximation, the average integral -approximation, and the trapezoidal approximation. While the first one is shown -to be of little use, the trapezoidal approximation exhibits an error which is -typically lower than 1\%, but is as low as 0.1\% for the range of values of the -Zipf parameter below 1. -" -2186,1511.01556,"Chao-Lin Liu, Chih-Kai Huang, Hongsu Wang, Peter K. Bol","Mining Local Gazetteers of Literary Chinese with CRF and Pattern based - Methods for Biographical Information in Chinese History",cs.CL cs.DL cs.IR cs.LG," Person names and location names are essential building blocks for identifying -events and social networks in historical documents that were written in -literary Chinese. We take the lead to explore the research on algorithmically -recognizing named entities in literary Chinese for historical studies with -language-model based and conditional-random-field based methods, and extend our -work to mining the document structures in historical documents. Practical -evaluations were conducted with texts that were extracted from more than 220 -volumes of local gazetteers (Difangzhi). Difangzhi is a huge and the single -most important collection that contains information about officers who served -in local government in Chinese history. Our methods performed very well on -these realistic tests. Thousands of names and addresses were identified from -the texts. A good portion of the extracted names match the biographical -information currently recorded in the China Biographical Database (CBDB) of -Harvard University, and many others can be verified by historians and will -become as new additions to CBDB. -" -2187,1511.01559,"Chao-Lin Liu, Hongsu Wang, Wen-Huei Cheng, Chu-Ting Hsu, Wei-Yun Chiu","Color Aesthetics and Social Networks in Complete Tang Poems: - Explorations and Discoveries",cs.CL cs.DL cs.IR," The Complete Tang Poems (CTP) is the most important source to study Tang -poems. We look into CTP with computational tools from specific linguistic -perspectives, including distributional semantics and collocational analysis. 
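The trapezoidal approximation from arXiv:1511.01480 above replaces the truncated Zeta sum with the integral of x**-s over [1, N] plus half the boundary terms. The sketch below compares it against the exact sum; the error pattern matches the abstract's claim that accuracy is best for Zipf parameters below 1.

    import numpy as np

    def zeta_trunc_exact(N, s):
        return np.sum(np.arange(1, N + 1, dtype=float) ** -s)

    def zeta_trunc_trapezoid(N, s):
        """Trapezoidal approximation: the integral of x**-s over [1, N]
        plus half of the two boundary terms."""
        integral = np.log(N) if s == 1 else (N ** (1 - s) - 1) / (1 - s)
        return integral + 0.5 * (1 + N ** -s)

    for s in (0.5, 1.0, 1.5):
        exact = zeta_trunc_exact(10_000, s)
        approx = zeta_trunc_trapezoid(10_000, s)
        print(f"s={s}: relative error {abs(approx - exact) / exact:.2%}")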
-From such quantitative viewpoints, we compare the usage of ""wind"" and ""moon"" in -the poems of Li Bai and Du Fu. Colors in poems function like sounds in movies, -and play a crucial role in the imageries of poems. Thus, words for colors are -studied, and ""white"" is the main focus because it is the most frequent color in -CTP. We also explore some cases of using colored words in antithesis pairs that -were central for fostering the imageries of the poems. CTP also contains useful -historical information, and we extract person names in CTP to study the social -networks of the Tang poets. Such information can then be integrated with the -China Biographical Database of Harvard University. -" -2188,1511.01574,Ciprian Chelba and Fernando Pereira,"Multinomial Loss on Held-out Data for the Sparse Non-negative Matrix - Language Model",cs.CL," We describe Sparse Non-negative Matrix (SNM) language model estimation using -multinomial loss on held-out data. - Being able to train on held-out data is important in practical situations -where the training data is usually mismatched from the held-out/test data. It -is also less constrained than the previous training algorithm using -leave-one-out on training data: it allows the use of richer meta-features in -the adjustment model, e.g. the diversity counts used by Kneser-Ney smoothing -which would be difficult to deal with correctly in leave-one-out training. - In experiments on the one billion words language modeling benchmark, we are -able to slightly improve on our previous results which use a different loss -function, and employ leave-one-out training on a subset of the main training -set. Surprisingly, an adjustment model with meta-features that discard all -lexical information can perform as well as lexicalized meta-features. We find -that fairly small amounts of held-out data (on the order of 30-70 thousand -words) are sufficient for training the adjustment model. - In a real-life scenario where the training data is a mix of data sources that -are imbalanced in size, and of different degrees of relevance to the held-out -and test data, taking into account the data source for a given skip-/n-gram -feature and combining them for best performance on held-out/test data improves -over skip-/n-gram SNM models trained on pooled data by about 8% in the SMT -setup, or as much as 15% in the ASR/IME setup. - The ability to mix various data sources based on how relevant they are to a -mismatched held-out set is probably the most attractive feature of the new -estimation method for SNM LM. -" -2189,1511.01665,"Yiou Lin, Hang Lei, Jia Wu and Xiaoyu Li","An Empirical Study on Sentiment Classification of Chinese Review using - Word Embedding",cs.CL," In this article, how word embeddings can be used as features in Chinese -sentiment classification is presented. Firstly, a Chinese opinion corpus is -built with a million comments from hotel review websites. Then the word -embeddings which represent each comment are used as input in different machine -learning methods for sentiment classification, including SVM, Logistic -Regression, Convolutional Neural Network (CNN) and ensemble methods. These -methods get better performance compared with N-gram models using Naive Bayes -(NB) and Maximum Entropy (ME). Finally, a combination of machine learning -methods is proposed which presents an outstanding performance in precision, -recall and F1 score. 
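A minimal stand-in for the combination of machine learning methods described in arXiv:1511.01665 above, using scikit-learn's soft-voting ensemble over synthetic "comment embedding" features. The paper's actual components (CNN, N-gram baselines) and its hotel-review corpus are not reproduced here; all data below is invented.

    import numpy as np
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 100))        # stand-in comment embeddings
    y = (X[:, 0] + 0.1 * rng.normal(size=300) > 0).astype(int)

    ensemble = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("svm", SVC(probability=True))],
        voting="soft")                     # average predicted probabilities
    ensemble.fit(X[:200], y[:200])
    print("held-out accuracy:", ensemble.score(X[200:], y[200:]))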
After selecting the most useful methods to construct the
-combinational model and testing on the corpus, the final F1 score is 0.920.
-"
-2190,1511.01666,Abhinav Tushar and Abhinav Dahiya,Comparing Writing Styles using Word Embedding and Dynamic Time Warping,cs.CL," The development of plot or story in novels is reflected in the content and
-the words used. The flow of sentiments, which is one aspect of writing style,
-can be quantified by analyzing the flow of words. This study explores literary
-works as signals in word embedding space and tries to compare writing styles of
-popular classic novels using dynamic time warping.
-"
-2191,1511.01756,"Suzanne Mpouli (ACASA), Jean-Gabriel Ganascia (ACASA)","""Pale as death"" or ""p\^ale comme la mort"" : Frozen similes used as
- literary clich\'es",cs.CL," The present study is focused on the automatic identification and description
-of frozen similes in British and French novels written between the 19th
-century and the beginning of the 20th century. Two main patterns of frozen
-similes were considered: adjectival ground + simile marker + nominal vehicle
-(e.g. happy as a lark) and eventuality + simile marker + nominal vehicle (e.g.
-sleep like a top). All potential similes and their components were first
-extracted using a rule-based algorithm. Then, frozen similes were identified
-based on reference lists of existing similes and semantic distance between the
-tenor and the vehicle. The results obtained tend to confirm the fact that
-frozen similes are not used haphazardly in literary texts. In addition,
-contrary to how they are often presented, frozen similes often go beyond the
-ground or the eventuality and the vehicle to also include the tenor.
-"
-2192,1511.01974,"Xu Chen, Han Zhang, Judith Gelernter",Multi-lingual Geoparsing based on Machine Translation,cs.CL cs.IR," Our method for multi-lingual geoparsing uses monolingual tools and resources
-along with machine translation and alignment to return location words in many
-languages. Not only does our method save the time and cost of developing
-geoparsers for each language separately, but also it allows the possibility of
-a wide range of language capabilities within a single interface. We evaluated
-our method in our LanguageBridge prototype on location named entities using
-newswire, broadcast news and telephone conversations in English, Arabic and
-Chinese data from the Linguistic Data Consortium (LDC). Our results for
-geoparsing Chinese and Arabic text using our multi-lingual geoparsing method
-are comparable to our results for geoparsing English text with our English
-tools. Furthermore, experiments using our machine translation approach result
-in accuracy comparable to results from the same data that was translated
-manually.
-"
-2193,1511.02014,Alexander Koplenig and Carolin Mueller-Spitzer,"Population size predicts lexical diversity, but so does the mean sea
- level - why it is important to correctly account for the structure of
- temporal data",cs.CL," In order to demonstrate why it is important to correctly account for the
-(serially dependent) structure of temporal data, we document an apparently
-spectacular relationship between population size and lexical diversity: for
-five out of seven investigated languages, there is a strong relationship
-between population size and lexical diversity of the primary language in this
-country.
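Dynamic time warping, the distance used in arXiv:1511.01666 above to compare sentiment flows, is a short dynamic program. The per-chapter sentiment curves below are invented for illustration; smaller distances indicate more similar shapes after pacing differences are absorbed.

    import numpy as np

    def dtw(a, b):
        """Classic O(len(a) * len(b)) dynamic time warping distance."""
        D = np.full((len(a) + 1, len(b) + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[len(a), len(b)]

    # Toy per-chapter sentiment curves for two novels: similar shape,
    # different pacing, hence a small DTW distance despite the length gap.
    novel_a = [0.1, 0.4, 0.8, 0.3, -0.2]
    novel_b = [0.1, 0.2, 0.5, 0.8, 0.7, 0.2, -0.3]
    print(f"DTW distance: {dtw(novel_a, novel_b):.2f}")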
We show that this relationship is the result of a misspecified model -that does not consider the temporal aspect of the data by presenting a similar -but nonsensical relationship between the global annual mean sea level and -lexical diversity. Given the fact that in the recent past, several studies were -published that present surprising links between different economic, cultural, -political and (socio-)demographical variables on the one hand and cultural or -linguistic characteristics on the other hand, but seem to suffer from exactly -this problem, we explain the cause of the misspecification and show that it has -profound consequences. We demonstrate how simple transformation of the time -series can often solve problems of this type and argue that the evaluation of -the plausibility of a relationship is important in this context. We hope that -our paper will help both researchers and reviewers to understand why it is -important to use special models for the analysis of data with a natural -temporal ordering. -" -2194,1511.02024,"S. Sathiya Keerthi, Tobias Schnabel, Rajiv Khanna",Towards a Better Understanding of Predict and Count Models,cs.LG cs.CL," In a recent paper, Levy and Goldberg pointed out an interesting connection -between prediction-based word embedding models and count models based on -pointwise mutual information. Under certain conditions, they showed that both -models end up optimizing equivalent objective functions. This paper explores -this connection in more detail and lays out the factors leading to differences -between these models. We find that the most relevant differences from an -optimization perspective are (i) predict models work in a low dimensional space -where embedding vectors can interact heavily; (ii) since predict models have -fewer parameters, they are less prone to overfitting. - Motivated by the insight of our analysis, we show how count models can be -regularized in a principled manner and provide closed-form solutions for L1 and -L2 regularization. Finally, we propose a new embedding model with a convex -objective and the additional benefit of being intelligible. -" -2195,1511.02117,Kerry Fultz and Seth Filip,Introducing SKYSET - a Quintuple Approach for Improving Instructions,cs.CL," A new approach called SKYSET (Synthetic Knowledge Yield Social Entities -Translation) is proposed to validate completeness and to reduce ambiguity from -written instructional documentation. SKYSET utilizes a quintuple set of -standardized categories, which differs from traditional approaches that -typically use triples. The SKYSET System defines the categories required to -form a standard template for representing information that is portable across -different domains. It provides a standardized framework that enables sentences -from written instructions to be translated into sets of category typed entities -on a table or database. The SKYSET entities contain conceptual units or phrases -that represent information from the original source documentation. SKYSET -enables information concatenation where multiple documents from different -domains can be translated and combined into a single common filterable and -searchable table of entities. -" -2196,1511.02274,"Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola",Stacked Attention Networks for Image Question Answering,cs.LG cs.CL cs.CV cs.NE," This paper presents stacked attention networks (SANs) that learn to answer -natural language questions from images. 
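The count models discussed in arXiv:1511.02024 above are typically built from a PMI-weighted co-occurrence matrix. A minimal positive-PMI computation is shown below; the counts are toy numbers, and the paper's regularised variants are not shown.

    import numpy as np

    def ppmi(counts, eps=1e-12):
        """Positive pointwise mutual information from co-occurrence counts."""
        total = counts.sum()
        p_wc = counts / total
        p_w = p_wc.sum(axis=1, keepdims=True)
        p_c = p_wc.sum(axis=0, keepdims=True)
        pmi = np.log((p_wc + eps) / (p_w @ p_c + eps))
        return np.maximum(pmi, 0.0)   # clip negatives, as is standard

    # word-by-context co-occurrence counts (rows: words, cols: contexts)
    counts = np.array([[10.0, 0.0, 2.0],
                       [0.0, 8.0, 1.0],
                       [3.0, 1.0, 6.0]])
    print(np.round(ppmi(counts), 2))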
SANs use the semantic representation of a
-question as a query to search for the regions in an image that are related to
-the answer. We argue that image question answering (QA) often requires multiple
-steps of reasoning. Thus, we develop a multiple-layer SAN in which we query an
-image multiple times to infer the answer progressively. Experiments conducted
-on four image QA data sets demonstrate that the proposed SANs significantly
-outperform previous state-of-the-art approaches. The visualization of the
-attention layers illustrates the progression by which the SAN locates, layer by
-layer, the relevant visual clues that lead to the answer of the question.
-"
-2197,1511.02283,"Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan
- Yuille, Kevin Murphy",Generation and Comprehension of Unambiguous Object Descriptions,cs.CV cs.CL cs.LG cs.RO," We propose a method that can generate an unambiguous description (known as a
-referring expression) of a specific object or region in an image, and which can
-also comprehend or interpret such an expression to infer which object is being
-described. We show that our method outperforms previous methods that generate
-descriptions of objects without taking into account other potentially ambiguous
-objects in the scene. Our model is inspired by recent successes of deep
-learning methods for image captioning, but while image captioning is difficult
-to evaluate, our task allows for easy objective evaluation. We also present a
-new large-scale dataset for referring expressions, based on MS-COCO. We have
-released the dataset and a toolbox for visualization and evaluation, see
-https://github.com/mjhucla/Google_Refexp_toolbox
-"
-2198,1511.02301,"Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston","The Goldilocks Principle: Reading Children's Books with Explicit Memory
- Representations",cs.CL," We introduce a new test of how well language models capture meaning in
-children's books. Unlike standard language modelling benchmarks, it
-distinguishes the task of predicting syntactic function words from that of
-predicting lower-frequency words, which carry greater semantic content. We
-compare a range of state-of-the-art models, each with a different way of
-encoding what has been previously read. We show that models which store
-explicit representations of long-term contexts outperform state-of-the-art
-neural language models at predicting semantic content words, although this
-advantage is not observed for syntactic function words. Interestingly, we find
-that the amount of text encoded in a single memory representation is highly
-influential to the performance: there is a sweet-spot, not too big and not too
-small, between single words and full sentences that allows the most meaningful
-information in a text to be effectively retained and recalled. Further, the
-attention over such window-based memories can be trained effectively through
-self-supervision. We then assess the generality of this principle by applying
-it to the CNN QA benchmark, which involves identifying named entities in
-paraphrased summaries of news articles, and achieve state-of-the-art
-performance.
-"
-2199,1511.02385,"Sylvester Olubolu Orimaye, Saadat M. Alhashmi, Eu-Gene Siew and Sang
- Jung Kang","Review-Level Sentiment Classification with Sentence-Level Polarity
- Correction",cs.CL cs.AI cs.LG," We propose an effective technique for solving the review-level sentiment
-classification problem by using sentence-level polarity correction.
Our
-polarity correction technique takes into account the consistency of the
-polarities (positive and negative) of sentences within each product review
-before performing the actual machine learning task. While sentences with
-inconsistent polarities are removed, sentences with consistent polarities are
-used to learn state-of-the-art classifiers. The technique achieved better
-results on different types of product reviews and outperforms baseline models
-without the correction technique. Experimental results show an average of 82%
-F-measure on four different product review domains.
-"
-2200,1511.02435,"Son-Il Kwak, O-Chol Kown, Chang-Sin Kim, Yong-Il Pak, Gum-Chol Son,
- Chol-Jun Hwang, Hyon-Chol Kim, Hyok-Chol Sin, Gyong-Il Hyon, Sok-Min Han",A Chinese POS Decision Method Using Korean Translation Information,cs.CL," In this paper we propose a method that imitates a translation expert using
-the Korean translation information and analyse the performance. POS tagging
-works better for Korean than for Chinese, so we can use this property in
-Chinese POS tagging.
-"
-2201,1511.02436,"Sylvester Olubolu Orimaye, Kah Yee Tai, Jojo Sze-Meng Wong and Chee
- Piau Wong","Learning Linguistic Biomarkers for Predicting Mild Cognitive Impairment
- using Compound Skip-grams",cs.CL cs.AI," Predicting Mild Cognitive Impairment (MCI) is currently a challenge as
-existing diagnostic criteria rely on neuropsychological examinations. Automated
-Machine Learning (ML) models that are trained on verbal utterances of MCI
-patients can aid diagnosis. Using a combination of skip-gram features, our
-model learned several linguistic biomarkers to distinguish between 19 patients
-with MCI and 19 healthy control individuals from the DementiaBank language
-transcript clinical dataset. Results show that a model with compound skip-grams
-has better AUC and could help ML prediction on small MCI data samples.
-"
-2202,1511.02506,"Yi-Hsiu Liao, Hung-yi Lee, Lin-shan Lee",Towards Structured Deep Neural Network for Automatic Speech Recognition,cs.CL cs.LG cs.NE," In this paper we propose the Structured Deep Neural Network (structured DNN)
-as a structured and deep learning framework. This approach can learn to find
-the best structured object (such as a label sequence) given a structured input
-(such as a vector sequence) by globally considering the mapping relationships
-between the structures rather than item by item.
- When automatic speech recognition is viewed as a special case of such a
-structured learning problem, where we have the acoustic vector sequence as the
-input and the phoneme label sequence as the output, it becomes possible to
-comprehensively learn utterance by utterance as a whole, rather than frame by
-frame.
- Structured Support Vector Machine (structured SVM) was proposed to perform
-ASR with structured learning previously, but it was limited by the linear
-nature of the SVM. Here we propose the structured DNN, which uses nonlinear
-transformations in multiple layers as a structured and deep learning approach.
-This approach was shown to beat structured SVM in preliminary experiments on
-TIMIT.
-"
-2203,1511.02556,"Hao Wang, Jorge A. Castanon",Sentiment Expression via Emoticons on Social Media,cs.CL cs.SI," Emoticons (e.g., :) and :( ) have been widely used in sentiment analysis and
-other NLP tasks as features to machine learning algorithms or as entries of
-sentiment lexicons.
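Compound skip-gram features of the kind used in arXiv:1511.02436 above can be generated with a small helper. The function below follows the common k-skip-n-gram definition (at most k tokens skipped in total); the example utterance is invented.

    from itertools import combinations

    def skip_grams(tokens, n=2, k=2):
        """All n-grams allowing up to k total skipped tokens."""
        grams = set()
        for i in range(len(tokens)):
            # choose the remaining n-1 positions from a window of n-1+k
            tail = range(i + 1, min(len(tokens), i + n + k))
            for combo in combinations(tail, n - 1):
                grams.add((tokens[i],) + tuple(tokens[j] for j in combo))
        return grams

    utterance = "I cannot remember the word".split()
    for g in sorted(skip_grams(utterance, n=2, k=2)):
        print(g)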
In this paper, we argue that while emoticons are strong and
-common signals of sentiment expression on social media, the relationship
-between emoticons and sentiment polarity is not always clear. Thus, any
-algorithm that deals with sentiment polarity should take emoticons into account
-but extreme caution should be exercised in which emoticons to depend on. First,
-to demonstrate the prevalence of emoticons on social media, we analyzed the
-frequency of emoticons in a large recent Twitter data set. Then we carried
-out four analyses to examine the relationship between emoticons and sentiment
-polarity as well as the contexts in which emoticons are used. The first
-analysis surveyed a group of participants for their perceived sentiment
-polarity of the most frequent emoticons. The second analysis examined
-clustering of words and emoticons to better understand the meaning conveyed
-by the emoticons. The third analysis compared the sentiment polarity of
-microblog posts before and after emoticons were removed from the text. The last
-analysis tested the hypothesis that removing emoticons from text hurts
-sentiment classification by training two machine learning models with and
-without emoticons in the text respectively. The results confirm the arguments
-that: 1) a few emoticons are strong and reliable signals of sentiment polarity
-and one should take advantage of them in any sentiment analysis; 2) a large
-group of emoticons convey complicated sentiment, hence they should be
-treated with extreme caution.
-"
-2204,1511.02570,"Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel, Anthony Dick",Explicit Knowledge-based Reasoning for Visual Question Answering,cs.CV cs.CL," We describe a method for visual question answering which is capable of
-reasoning about contents of an image on the basis of information extracted from
-a large-scale knowledge base. The method not only answers natural language
-questions using concepts not contained in the image, but can provide an
-explanation of the reasoning by which it developed its answer. The method is
-capable of answering far more complex questions than the predominant long
-short-term memory-based approach, and outperforms it significantly in testing.
-We also provide a dataset and a protocol by which to evaluate such methods,
-thus addressing one of the key issues in general visual question answering.
-"
-2205,1511.02669,Adrian Groza and Roxana Szabo,"Enacting textual entailment and ontologies for automated essay grading
- in chemical domain",cs.AI cs.CL," We propose a system for automated essay grading using ontologies and textual
-entailment. The process of textual entailment is guided by hypotheses, which
-are extracted from a domain ontology. Textual entailment checks if the truth of
-the hypothesis follows from a given text. We enact textual entailment to
-compare a student's answer to a model answer obtained from the ontology. We
-validated the solution against various essays written by students in the
-chemistry domain.
-"
-2206,1511.02799,"Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein",Neural Module Networks,cs.CV cs.CL cs.LG cs.NE," Visual question answering is fundamentally compositional in nature---a
-question like ""where is the dog?"" shares substructure with questions like ""what
-color is the dog?"" and ""where is the cat?"" This paper seeks to simultaneously
-exploit the representational capacity of deep networks and the compositional
-linguistic structure of questions.
We describe a procedure for constructing and
-learning *neural module networks*, which compose collections of jointly-trained
-neural ""modules"" into deep networks for question answering. Our approach
-decomposes questions into their linguistic substructures, and uses these
-structures to dynamically instantiate modular networks (with reusable
-components for recognizing dogs, classifying colors, etc.). The resulting
-compound networks are jointly trained. We evaluate our approach on two
-challenging datasets for visual question answering, achieving state-of-the-art
-results on both the VQA natural image dataset and a new dataset of complex
-questions about abstract shapes.
-"
-2207,1511.03012,Adrian Groza and Lidia Corde,Information retrieval in folktales using natural language processing,cs.CL cs.AI cs.IR," Our aim is to extract information about literary characters in unstructured
-texts. We employ natural language processing and reasoning on domain
-ontologies. The first task is to identify the main characters and the parts of
-the story where these characters are described or act. We illustrate the system
-in a scenario in the folktale domain. The system relies on a folktale ontology
-that we have developed based on Propp's model of folktale morphology.
-"
-2208,1511.03053,"Suzanne Mpouli (ACASA), Jean-Gabriel Ganascia (ACASA)","Investigating the stylistic relevance of adjective and verb simile
- markers",cs.CL," Similes play an important role in literary texts not only as rhetorical
-devices and as figures of speech but also because of their evocative power,
-their aptness for description and the relative ease with which they can be
-combined with other figures of speech (Israel et al. 2004). Detecting all types
-of simile constructions in a particular text therefore seems crucial when
-analysing the style of an author. Few research studies, however, have been
-dedicated to the study of less prominent simile markers in fictional prose and
-their relevance for stylistic studies. The present paper studies the frequency
-of adjective and verb simile markers in a corpus of British and French novels
-in order to determine which ones are really informative and worth including in
-a stylistic analysis. Furthermore, are those adjective and verb simile markers
-used differently in both languages?
-"
-2209,1511.03088,Leon Derczynski and Isabelle Augenstein and Kalina Bontcheva,USFD: Twitter NER with Drift Compensation and Linked Data,cs.CL," This paper describes a pilot NER system for Twitter, comprising the USFD
-system entry to the W-NUT 2015 NER shared task. The goal is to correctly label
-entities in a tweet dataset, using an inventory of ten types. We employ
-structured learning, drawing on gazetteers taken from Linked Data, and on
-unsupervised clustering features, and attempting to compensate for stylistic
-and topic drift - a key challenge in social media text. Our result is
-competitive; we provide an analysis of the components of our methodology, and
-an examination of the target dataset in the context of this task.
-"
-2210,1511.03292,"Somak Aditya, Yezhou Yang, Chitta Baral, Cornelia Fermuller, Yiannis
- Aloimonos","From Images to Sentences through Scene Description Graphs using
- Commonsense Reasoning and Knowledge",cs.CV cs.AI cs.CL," In this paper we propose the construction of linguistic descriptions of
-images. This is achieved through the extraction of scene description graphs
-(SDGs) from visual scenes using an automatically constructed knowledge base.
-SDGs are constructed using both vision and reasoning. Specifically, commonsense
-reasoning is applied on (a) detections obtained from existing perception
-methods on given images, (b) a ""commonsense"" knowledge base constructed using
-natural language processing of image annotations and (c) lexical ontological
-knowledge from resources such as WordNet. Amazon Mechanical Turk (AMT)-based
-evaluations on Flickr8k, Flickr30k and MS-COCO datasets show that in most
-cases, sentences auto-constructed from SDGs obtained by our method give a more
-relevant and thorough description of an image than a recent state-of-the-art
-image caption based approach. Our Image-Sentence Alignment Evaluation results
-are also comparable to those of the recent state-of-the-art approaches.
-"
-2211,1511.03546,"Guorui Zhou, Guang Chen",Hierarchical Latent Semantic Mapping for Automated Topic Generation,cs.LG cs.CL cs.IR," Much information sits in an unprecedented amount of text data. Managing the
-allocation of these large-scale text data is an important problem for many
-areas. Topic modeling performs well on this problem. The traditional generative
-models (PLSA, LDA) are the state-of-the-art approaches in topic modeling and
-most recent research on topic generation has been focusing on improving or
-extending these models. However, results of traditional generative models are
-sensitive to the number of topics K, which must be specified manually. The
-problem of generating topics from a corpus resembles community detection in
-networks. Many effective algorithms can automatically detect communities from
-networks without a manually specified number of communities. Inspired by these
-algorithms, in this paper, we propose a novel method named Hierarchical
-Latent Semantic Mapping (HLSM), which automatically generates topics from a
-corpus. HLSM calculates the association between each pair of words in the
-latent topic space, then constructs a unipartite network of words with this
-association and hierarchically generates topics from this network. We apply
-HLSM to several document collections and the experimental comparisons against
-several state-of-the-art approaches demonstrate the promising performance.
-"
-2212,1511.03683,"Zachary C. Lipton, Sharad Vikram, Julian McAuley","Generative Concatenative Nets Jointly Learn to Write and Classify
- Reviews",cs.CL cs.LG," A recommender system's basic task is to estimate how users will respond to
-unseen items. This is typically modeled in terms of how a user might rate a
-product, but here we aim to extend such approaches to model how a user would
-write about the product. To do so, we design a character-level Recurrent Neural
-Network (RNN) that generates personalized product reviews. The network
-convincingly learns styles and opinions of nearly 1000 distinct authors, using
-a large corpus of reviews from BeerAdvocate.com. It also tailors reviews to
-describe specific items, categories, and star ratings. Using a simple input
-replication strategy, the Generative Concatenative Network (GCN) preserves the
-signal of static auxiliary inputs across wide sequence intervals. Without any
-additional training, the generative model can classify reviews, identifying the
-author of the review, the product category, and the sentiment (rating), with
-remarkable accuracy. Our evaluation shows the GCN captures complex dynamics in
-text, such as the effect of negation, misspellings, slang, and large
-vocabularies gracefully, absent any machinery explicitly dedicated to the
-purpose.
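-The input-replication strategy just described is easy to sketch in PyTorch;
-the class name, sizes, and random toy inputs below are invented for
-illustration and are not the paper's code:
-
-import torch
-import torch.nn as nn
-
-class GenerativeConcatNet(nn.Module):
-    def __init__(self, n_chars, char_dim=16, ctx_dim=8, hidden=64):
-        super().__init__()
-        self.embed = nn.Embedding(n_chars, char_dim)
-        self.lstm = nn.LSTM(char_dim + ctx_dim, hidden, batch_first=True)
-        self.out = nn.Linear(hidden, n_chars)
-    def forward(self, chars, ctx):          # chars: (B,T), ctx: (B,ctx_dim)
-        x = self.embed(chars)                # (B,T,char_dim)
-        # Replicate the static context (e.g., author/rating) at every step.
-        rep = ctx.unsqueeze(1).expand(-1, x.size(1), -1)
-        h, _ = self.lstm(torch.cat([x, rep], dim=-1))
-        return self.out(h)                   # next-char logits per step
-
-model = GenerativeConcatNet(n_chars=100)
-logits = model(torch.randint(0, 100, (2, 20)), torch.randn(2, 8))
-print(logits.shape)                          # torch.Size([2, 20, 100])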
-"
-2213,1511.03690,David Harwath and James Glass,Deep Multimodal Semantic Embeddings for Speech and Images,cs.CV cs.AI cs.CL," In this paper, we present a model which takes as input a corpus of images
-with relevant spoken captions and finds a correspondence between the two
-modalities. We employ a pair of convolutional neural networks to model visual
-objects and speech signals at the word level, and tie the networks together
-with an embedding and alignment model which learns a joint semantic space over
-both modalities. We evaluate our model using image search and annotation tasks
-on the Flickr8k dataset, which we augmented by collecting a corpus of 40,000
-spoken captions using Amazon Mechanical Turk.
-"
-2214,1511.03729,Tian Wang and Kyunghyun Cho,Larger-Context Language Modelling,cs.CL," In this work, we propose a novel method to incorporate corpus-level discourse
-information into language modelling. We call this the larger-context language
-model. We introduce a late fusion approach to a recurrent language model based
-on long short-term memory units (LSTM), which helps the LSTM unit keep
-intra-sentence dependencies and inter-sentence dependencies separate from each
-other. Through the evaluation on three corpora (IMDB, BBC, and PennTree Bank),
-we demonstrate that the proposed model improves perplexity significantly. In
-the experiments, we evaluate the proposed approach while varying the number of
-context sentences and observe that the proposed late fusion is superior to the
-usual way of incorporating additional inputs to the LSTM. By analyzing the
-trained larger-context language model, we discover that content words,
-including nouns, adjectives and verbs, benefit most from an increasing number
-of context sentences. This analysis suggests that the larger-context language
-model improves the unconditional language model by capturing the theme of a
-document better and more easily.
-"
-2215,1511.03745,"Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, Bernt
- Schiele",Grounding of Textual Phrases in Images by Reconstruction,cs.CV cs.CL cs.LG," Grounding (i.e. localizing) arbitrary, free-form textual phrases in visual
-content is a challenging problem with many applications for human-computer
-interaction and image-text reference resolution. Few datasets provide the
-ground truth spatial localization of phrases, thus it is desirable to learn
-from data with no or little grounding supervision. We propose a novel approach
-which learns grounding by reconstructing a given phrase using an attention
-mechanism, which can be either latent or optimized directly. During training
-our approach encodes the phrase using a recurrent network language model and
-then learns to attend to the relevant image region in order to reconstruct the
-input phrase. At test time, the correct attention, i.e., the grounding, is
-evaluated. If grounding supervision is available it can be directly applied via
-a loss over the attention mechanism. We demonstrate the effectiveness of our
-approach on the Flickr 30k Entities and ReferItGame datasets with different
-levels of supervision, ranging from no supervision over partial supervision to
-full supervision. Our supervised variant improves by a large margin over the
-state-of-the-art on both datasets.
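-A toy rendering of this attend-then-reconstruct loop; the module choices,
-names, and sizes are our simplifying assumptions, not the paper's model:
-
-import torch
-import torch.nn as nn
-
-phrase_dim, region_dim, vocab = 32, 32, 1000
-encode = nn.LSTM(16, phrase_dim, batch_first=True)   # phrase encoder
-reconstruct = nn.Linear(region_dim, vocab)           # simplified decoder head
-
-words = torch.randn(1, 5, 16)                 # embedded 5-word phrase
-regions = torch.randn(1, 10, region_dim)      # 10 candidate regions
-_, (h, _) = encode(words)                     # h: (1,1,phrase_dim)
-scores = (regions @ h.transpose(1, 2)).squeeze(-1)   # (1,10) region scores
-attn = torch.softmax(scores, dim=-1)          # latent grounding
-attended = (attn.unsqueeze(-1) * regions).sum(1)     # pooled visual feature
-logits = reconstruct(attended)                # predict phrase words back
-print(attn.shape, logits.shape)               # (1,10) and (1,1000)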
-"
-2216,1511.03924,Normunds Gruzitis and Dana Dann\'ells,"A Multilingual FrameNet-based Grammar and Lexicon for Controlled Natural
- Language",cs.CL," Berkeley FrameNet is a lexico-semantic resource for English based on the
-theory of frame semantics. It has been exploited in a range of natural language
-processing applications and has inspired the development of framenets for many
-languages. We present a methodological approach to the extraction and
-generation of a computational multilingual FrameNet-based grammar and lexicon.
-The approach leverages FrameNet-annotated corpora to automatically extract a
-set of cross-lingual semantico-syntactic valence patterns. Based on data from
-Berkeley FrameNet and Swedish FrameNet, the proposed approach has been
-implemented in Grammatical Framework (GF), a categorial grammar formalism
-specialized for multilingual grammars. The implementation of the grammar and
-lexicon is supported by the design of FrameNet, providing a frame semantic
-abstraction layer, an interlingual semantic API (application programming
-interface), over the interlingual syntactic API already provided by GF Resource
-Grammar Library. The evaluation of the acquired grammar and lexicon shows the
-feasibility of the approach. Additionally, we illustrate how the FrameNet-based
-grammar and lexicon are exploited in two distinct multilingual controlled
-natural language applications. The produced resources are available under an
-open source license.
-"
-2217,1511.03962,"Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, Jacob Eisenstein",Document Context Language Models,cs.CL cs.LG stat.ML," Text documents are structured on multiple levels of detail: individual words
-are related by syntax, but larger units of text are related by discourse
-structure. Existing language models generally fail to account for discourse
-structure, but it is crucial if we are to have language models that reward
-coherence and generate coherent texts. We present and empirically evaluate a
-set of multi-level recurrent neural network language models, called
-Document-Context Language Models (DCLM), which incorporate contextual
-information both within and beyond the sentence. In comparison with word-level
-recurrent neural network language models, the DCLM models obtain slightly
-better predictive likelihoods, and considerably better assessments of document
-coherence.
-"
-2218,1511.04024,"Zachary Seymour, Yingming Li, Zhongfei Zhang",Multimodal Skip-gram Using Convolutional Pseudowords,cs.CL cs.CV," This work studies the representational mapping across multimodal data such
-that, given a piece of the raw data in one modality, the corresponding semantic
-description in terms of the raw data in another modality is immediately
-obtained. Such a representational mapping can be found in a wide spectrum of
-real-world applications including image/video retrieval, object recognition,
-action/behavior recognition, and event understanding and prediction. To that
-end, we introduce a simplified training objective for learning multimodal
-embeddings using the skip-gram architecture by introducing convolutional
-""pseudowords"": embeddings composed of the additive combination of distributed
-word representations and image features from convolutional neural networks
-projected into the multimodal space. We present extensive results of the
-representational properties of these embeddings on various word similarity
-benchmarks to show the promise of this approach.
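-The additive pseudoword construction can be sketched in a few lines of numpy;
-the dimensions, the projection matrix, and the random features are invented
-placeholders, not learned values from the paper:
-
-import numpy as np
-
-rng = np.random.default_rng(0)
-emb_dim, img_dim = 50, 512
-word_vec = rng.normal(size=emb_dim)          # distributed word representation
-img_feat = rng.normal(size=img_dim)          # CNN image feature (assumed given)
-proj = rng.normal(size=(emb_dim, img_dim)) * 0.01   # projection into word space
-
-pseudoword = word_vec + proj @ img_feat      # additive multimodal embedding
-
-# Scored against a context vector as in skip-gram with negative sampling.
-context_vec = rng.normal(size=emb_dim)
-score = 1.0 / (1.0 + np.exp(-pseudoword @ context_vec))
-print(round(float(score), 3))                # co-occurrence probability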
-"
-2219,1511.04108,"Ming Tan, Cicero dos Santos, Bing Xiang, Bowen Zhou",LSTM-based Deep Learning Models for Non-factoid Answer Selection,cs.CL cs.LG," In this paper, we apply a general deep learning (DL) framework for the answer
-selection task, which does not depend on manually defined features or
-linguistic tools. The basic framework is to build the embeddings of questions
-and answers based on bidirectional long short-term memory (biLSTM) models, and
-measure their closeness by cosine similarity. We further extend this basic
-model in two directions. One direction is to define a more composite
-representation for questions and answers by combining convolutional neural
-network with the basic framework. The other direction is to utilize a simple
-but efficient attention mechanism in order to generate the answer
-representation according to the question context. Several variations of models
-are provided. The models are evaluated on two datasets, TREC-QA and
-InsuranceQA. Experimental results demonstrate that the proposed models
-substantially outperform several strong baselines.
-"
-2220,1511.04164,"Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko,
- Trevor Darrell",Natural Language Object Retrieval,cs.CV cs.CL," In this paper, we address the task of natural language object retrieval, to
-localize a target object within a given image based on a natural language query
-of the object. Natural language object retrieval differs from text-based image
-retrieval task as it involves spatial information about objects within the
-scene and global scene context. To address this issue, we propose a novel
-Spatial Context Recurrent ConvNet (SCRC) model as scoring function on candidate
-boxes for object retrieval, integrating spatial configurations and global
-scene-level contextual information into the network. Our model processes query
-text, local image descriptors, spatial configurations and global context
-features through a recurrent network, outputs the probability of the query text
-conditioned on each candidate box as a score for the box, and can transfer
-visual-linguistic knowledge from image captioning domain to our task.
-Experimental results demonstrate that our method effectively utilizes both
-local and global information, outperforming previous baseline methods
-significantly on different datasets and scenarios, and can exploit large scale
-vision and language datasets for knowledge transfer.
-"
-2221,1511.04401,"Federico Raue, Andreas Dengel, Thomas M. Breuel, Marcus Liwicki","Symbol Grounding Association in Multimodal Sequences with Missing
- Elements",cs.CV cs.CL cs.LG cs.NE," In this paper, we extend a symbolic association framework for being able to
-handle missing elements in multimodal sequences. The general scope of the work
-is the symbolic association of object-word mappings as it happens in language
-development in infants. In other words, two different representations of the
-same abstract concepts can associate in both directions. This scenario has long
-been of interest in Artificial Intelligence, Psychology, and Neuroscience. In
-this work, we extend a recent approach for multimodal sequences (visual and
-audio) to also cope with missing elements in one or both modalities. Our method
-uses two parallel Long Short-Term Memories (LSTMs) with a learning rule based
-on the EM-algorithm. It aligns both LSTM outputs via Dynamic Time Warping (DTW).
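-The DTW step mentioned here is the textbook dynamic-programming alignment; a
-generic sketch (not the paper's exact implementation) follows:
-
-import numpy as np
-
-def dtw(a, b):
-    # Return the DTW cost between two sequences of vectors a and b.
-    n, m = len(a), len(b)
-    D = np.full((n + 1, m + 1), np.inf)
-    D[0, 0] = 0.0
-    for i in range(1, n + 1):
-        for j in range(1, m + 1):
-            cost = np.linalg.norm(a[i - 1] - b[j - 1])
-            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
-    return D[n, m]
-
-x = np.random.rand(12, 8)   # e.g., per-step outputs of the visual LSTM
-y = np.random.rand(15, 8)   # e.g., per-step outputs of the audio LSTM
-print(dtw(x, y))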
We
-propose to include an extra combination step based on the max operation to
-exploit the common elements between the two sequences. The motivation behind
-this is that the combination acts as a condition selector for choosing the best
-representation from both LSTMs. We evaluated the proposed extension in the
-following scenarios: missing elements in one modality (visual or audio) and
-missing elements in both modalities (visual and audio). The performance of our
-extension reaches better results than the original model and similar results to
-individual LSTMs trained on each modality.
-"
-2222,1511.04586,Wang Ling and Isabel Trancoso and Chris Dyer and Alan W Black,Character-based Neural Machine Translation,cs.CL," We introduce a neural machine translation model that views the input and
-output sentences as sequences of characters rather than words. Since word-level
-information provides a crucial source of bias, our input model composes
-representations of character sequences into representations of words (as
-determined by whitespace boundaries), and then these are translated using a
-joint attention/translation model. In the target language, the translation is
-modeled as a sequence of word vectors, but each word is generated one character
-at a time, conditional on the previous character generations in each word. As
-the representation and generation of words is performed at the character level,
-our model is capable of interpreting and generating unseen word forms. A
-secondary benefit of this approach is that it alleviates many of the challenges
-associated with preprocessing/tokenization of the source and target languages.
-We show that our model can achieve translation results that are on par with
-conventional word-based models.
-"
-2223,1511.04590,"Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio",Oracle performance for visual captioning,cs.CV cs.CL stat.ML," The task of associating images and videos with a natural language description
-has attracted a great amount of attention recently. Rapid progress has been
-made in terms of both developing novel algorithms and releasing new datasets.
-Indeed, the state-of-the-art results on some of the standard datasets have been
-pushed into the regime where it has become more and more difficult to make
-significant improvements. Instead of proposing new models, this work
-investigates the possibility of empirically establishing performance upper
-bounds on various visual captioning datasets without extra data labelling
-effort or human evaluation. In particular, it is assumed that visual captioning
-is decomposed into two steps: from visual inputs to visual concepts, and from
-visual concepts to natural language descriptions. One would be able to obtain
-an upper bound when assuming the first step is perfect and only requiring
-training a conditional language model for the second step. We demonstrate the
-construction of such bounds on MS-COCO, YouTube2Text and LSMDC (a combination
-of M-VAD and MPII-MD). Surprisingly, despite the imperfect process we used
-for visual concept extraction in the first step and the simplicity of the
-language model for the second step, we show that current state-of-the-art
-models fall short when compared with the learned upper bounds.
-Furthermore, with such a bound, we quantify several important factors
-concerning image and video captioning: the number of visual concepts captured
-by different models, the trade-off between the amount of visual elements
-captured and their accuracy, and the intrinsic difficulty and blessing of
-different datasets.
-"
-2224,1511.04623,"Kazuya Kawakami, Chris Dyer",Learning to Represent Words in Context with Multilingual Supervision,cs.CL," We present a neural network architecture based on bidirectional LSTMs to
-compute representations of words in their sentential contexts. These
-context-sensitive word representations are suitable for, e.g., distinguishing
-different word senses and other context-modulated variations in meaning. To
-learn the parameters of our model, we use cross-lingual supervision,
-hypothesizing that a good representation of a word in context will be one that
-is sufficient for selecting the correct translation into a second language. We
-evaluate the quality of our representations as features in three downstream
-tasks: prediction of semantic supersenses (which assign nouns and verbs into a
-few dozen semantic classes), low resource machine translation, and a lexical
-substitution task, and obtain state-of-the-art results on all of these.
-"
-2225,1511.04636,"Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng,
- Mari Ostendorf",Deep Reinforcement Learning with a Natural Language Action Space,cs.AI cs.CL cs.LG," This paper introduces a novel architecture for reinforcement learning with
-deep neural networks designed to handle state and action spaces characterized
-by natural language, as found in text-based games. Termed a deep reinforcement
-relevance network (DRRN), the architecture represents action and state spaces
-with separate embedding vectors, which are combined with an interaction
-function to approximate the Q-function in reinforcement learning. We evaluate
-the DRRN on two popular text games, showing superior performance over other
-deep Q-learning architectures. Experiments with paraphrased action descriptions
-show that the model is extracting meaning rather than simply memorizing strings
-of text.
-"
-2226,1511.04646,"Yikang Shen, Wenge Rong, Nan Jiang, Baolin Peng, Jie Tang, Zhang Xiong",Word Embedding based Correlation Model for Question/Answer Matching,cs.CL cs.AI," With the development of community based question answering (Q&A) services,
-large-scale Q&A archives have been accumulated and are an important information
-and knowledge resource on the web. Much importance has been attached to
-question and answer matching for its ability to reuse knowledge stored in these
-systems: it can be useful in enhancing user experience with recurrent
-questions. In this paper, we try to improve the matching accuracy by overcoming
-the lexical gap between question and answer pairs. A Word Embedding based
-Correlation (WEC) model is proposed by integrating advantages of both the
-translation model and word embedding. Given a random pair of words, WEC can
-score their co-occurrence probability in Q&A pairs, and it can also leverage
-the continuity and smoothness of continuous space word representation to deal
-with new pairs of words that are rare in the training parallel text. An
-experimental study on the Yahoo! Answers dataset and the Baidu Zhidao dataset
-shows this new method's promising potential.
-"
-2227,1511.04661,"Hao Wang, Vijay R. Bommireddipalli, Ayman Hanafy, Mohamed Bahgat, Sara
- Noeman and Ossama S.
Emam",A System for Extracting Sentiment from Large-Scale Arabic Social Data,cs.CL," Social media data in the Arabic language is becoming more and more abundant.
-There is a consensus that valuable information lies in social media data.
-Mining this data and making the process easier are gaining momentum in
-industry. This paper describes an enterprise system we developed for extracting
-sentiment from large volumes of social data in Arabic dialects. First, we give
-an overview of the Big Data system for information extraction from multilingual
-social data from a variety of sources. Then, we focus on the Arabic sentiment
-analysis capability that was built on top of the system including normalizing
-written Arabic dialects, building sentiment lexicons, sentiment classification,
-and performance evaluation. Lastly, we demonstrate the value of enriching
-sentiment results with user profiles in understanding sentiments of a specific
-user group.
-"
-2228,1511.04747,"Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer",Learning Representations of Affect from Speech,cs.CL cs.LG," There has been a lot of prior work on representation learning for speech
-recognition applications, but not much emphasis has been given to an
-investigation of effective representations of affect from speech, where the
-paralinguistic elements of speech are separated out from the verbal content. In
-this paper, we explore denoising autoencoders for learning paralinguistic
-attributes i.e. categorical and dimensional affective traits from speech. We
-show that the representations learnt by the bottleneck layer of the autoencoder
-are highly discriminative of activation intensity and at separating out
-negative valence (sadness and anger) from positive valence (happiness). We
-experiment with different input speech features (such as FFT and log-mel
-spectrograms with temporal context windows), and different autoencoder
-architectures (such as stacked and deep autoencoders). We also learn utterance
-specific representations by a combination of denoising autoencoders and BLSTM
-based recurrent autoencoders. Emotion classification is performed with the
-learnt temporal/dynamic representations to evaluate the quality of the
-representations. Experiments on a well-established real-life speech dataset
-(IEMOCAP) show that the learnt representations are comparable to
-state-of-the-art feature extractors (such as voice quality features and MFCCs)
-and are competitive with state-of-the-art approaches at emotion and dimensional
-affect recognition.
-"
-2229,1511.04834,"Arvind Neelakantan, Quoc V. Le, Ilya Sutskever",Neural Programmer: Inducing Latent Programs with Gradient Descent,cs.LG cs.CL stat.ML," Deep neural networks have achieved impressive supervised classification
-performance in many tasks including image recognition, speech recognition, and
-sequence to sequence learning. However, this success has not been translated to
-applications like question answering that may involve complex arithmetic and
-logic reasoning. A major limitation of these models is in their inability to
-learn even simple arithmetic and logic operations. For example, it has been
-shown that neural networks fail to learn to add two binary numbers reliably. In
-this work, we propose Neural Programmer, an end-to-end differentiable neural
-network augmented with a small set of basic arithmetic and logic operations.
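-The key trick, as we read it, is that operation selection is soft and hence
-differentiable. A single-step toy sketch in PyTorch (a simplification under
-our own assumptions, not the paper's model):
-
-import torch
-
-column = torch.tensor([3.0, 7.0, 2.0, 9.0])         # one table column
-ops = [column.sum(), column.max(), column.min(),    # built-in operations
-       torch.tensor(float(len(column)))]            # count
-op_results = torch.stack(ops)                       # (4,)
-
-op_logits = torch.randn(4, requires_grad=True)      # produced by a controller
-weights = torch.softmax(op_logits, dim=0)           # differentiable selection
-output = (weights * op_results).sum()               # soft mixture of op outputs
-output.backward()                                   # gradients reach the selector
-print(output.item(), op_logits.grad)
-
-Because the output is a weighted mixture rather than a hard choice, the whole
-pipeline can be trained end to end from the result of the correct program.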
-Neural Programmer can call these augmented operations over several steps,
-thereby inducing compositional programs that are more complex than the built-in
-operations. The model learns from a weak supervision signal which is the result
-of execution of the correct program, hence it does not require expensive
-annotation of the correct program itself. The decisions of which operations to
-call and which data segments to apply them to are inferred by Neural
-Programmer. Such decisions, during training, are done in a differentiable
-fashion so that the entire network can be trained jointly by gradient descent.
-We find that training the model is difficult, but it can be greatly improved by
-adding random noise to the gradient. On a fairly complex synthetic
-table-comprehension dataset, traditional recurrent networks and attentional
-models perform poorly while Neural Programmer typically obtains nearly perfect
-accuracy.
-"
-2230,1511.04868,"Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya
- Sutskever and Samy Bengio",A Neural Transducer,cs.LG cs.CL cs.NE," Sequence-to-sequence models have achieved impressive results on various
-tasks. However, they are unsuitable for tasks that require incremental
-predictions to be made as more data arrives or tasks that have long input
-sequences and output sequences. This is because they generate an output
-sequence conditioned on an entire input sequence. In this paper, we present a
-Neural Transducer that can make incremental predictions as more input arrives,
-without redoing the entire computation. Unlike sequence-to-sequence models, the
-Neural Transducer computes the next-step distribution conditioned on the
-partially observed input sequence and the partially generated sequence. At each
-time step, the transducer can decide to emit zero to many output symbols. The
-data can be processed using an encoder and presented as input to the
-transducer. The discrete decision to emit a symbol at every time step makes it
-difficult to learn with conventional backpropagation. It is however possible to
-train the transducer by using a dynamic programming algorithm to generate
-target discrete decisions. Our experiments show that the Neural Transducer
-works well in settings where it is required to produce output predictions as
-data come in. We also find that the Neural Transducer performs well for long
-sequences even when attention mechanisms are not used.
-"
-2231,1511.04891,"Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed
- Elgammal",Sherlock: Scalable Fact Learning in Images,cs.CV cs.CL cs.LG," We study scalable and uniform understanding of facts in images. Existing
-visual recognition systems are typically modeled differently for each fact type
-such as objects, actions, and interactions. We propose a setting where all
-these facts can be modeled simultaneously with a capacity to understand an
-unbounded number of facts in a structured way. The training data comes as
-structured facts in images, including (1) objects (e.g., $<$boy$>$), (2)
-attributes (e.g., $<$boy, tall$>$), (3) actions (e.g., $<$boy, playing$>$), and
-(4) interactions (e.g., $<$boy, riding, a horse $>$). Each fact has a semantic
-language view (e.g., $<$ boy, playing$>$) and a visual view (an image with this
-fact). We show that learning visual facts in a structured way enables not only
-a uniform but also generalizable visual understanding.
We propose and
-investigate recent and strong approaches from the multiview learning literature
-and also introduce two learning representation models as potential baselines.
-We applied the investigated methods on several datasets that we augmented with
-structured facts and a large scale dataset of more than 202,000 facts and
-814,000 images. Our experiments show the advantage of the proposed models,
-which relate facts through structure, over the designed baselines on
-bidirectional fact retrieval.
-"
-2232,1511.04970,Bruno Gon\c{c}alves and David S\'anchez,Learning about Spanish dialects through Twitter,stat.ML cs.CL cs.CY physics.soc-ph stat.AP," This paper maps the large-scale variation of the Spanish language by
-employing a corpus based on geographically tagged Twitter messages. Lexical
-dialects are extracted from an analysis of variants of tens of concepts. The
-resulting maps show linguistic variation on an unprecedented scale across the
-globe. We discuss the properties of the main dialects within a machine learning
-approach and find that varieties spoken in urban areas have an international
-character, in contrast to rural areas, where dialects show a more regional
-uniformity.
-"
-2233,1511.05076,"Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain","Latent Dirichlet Allocation Based Organisation of Broadcast Media
- Archives for Deep Neural Network Adaptation",cs.CL," This paper presents a new method for the discovery of latent domains in
-diverse speech data, for the use of adaptation of Deep Neural Networks (DNNs)
-for Automatic Speech Recognition. Our work focuses on transcription of
-multi-genre broadcast media, which is often only categorised broadly in terms
-of high level genres such as sports, news, documentary, etc. However, in terms
-of acoustic modelling these categories are coarse. Instead, it is expected that
-a mixture of latent domains can better represent the complex and diverse
-behaviours within a TV show, and therefore lead to better and more robust
-performance. We propose a new method, whereby these latent domains are
-discovered with Latent Dirichlet Allocation, in an unsupervised manner. These
-are used to adapt DNNs using the Unique Binary Code (UBIC) representation for
-the LDA domains. Experiments conducted on a set of BBC TV broadcasts, with more
-than 2,000 shows for training and 47 shows for testing, show that the use of
-LDA-UBIC DNNs reduces the error by up to 13% relative compared to the baseline
-hybrid DNN models.
-"
-2234,1511.05099,"Peng Zhang, Yash Goyal, Douglas Summers-Stay, Dhruv Batra, Devi Parikh",Yin and Yang: Balancing and Answering Binary Visual Questions,cs.CL cs.CV cs.LG," The complex compositional structure of language makes problems at the
-intersection of vision and language challenging. But language also provides a
-strong prior that can result in good superficial performance, without the
-underlying models truly understanding the visual content. This can hinder
-progress in pushing the state of the art in the computer vision aspects of
-multi-modal AI. In this paper, we address binary Visual Question Answering
-(VQA) on abstract scenes. We formulate this problem as visual verification of
-concepts inquired in the questions. Specifically, we convert the question to a
-tuple that concisely summarizes the visual concept to be detected in the image.
-If the concept can be found in the image, the answer to the question is ""yes"",
-and otherwise ""no"".
Abstract scenes play two roles: (1) they allow us to focus on
-the high-level semantics of the VQA task as opposed to the low-level
-recognition problems, and, perhaps more importantly, (2) they provide us the
-means to balance the dataset such that language priors are controlled, and
-the role of vision is essential. In particular, we collect fine-grained pairs
-of scenes for every question, such that the answer to the question is ""yes"" for
-one scene, and ""no"" for the other for the exact same question. Indeed, language
-priors alone do not perform better than chance on our balanced dataset.
-Moreover, our proposed approach matches the performance of a state-of-the-art
-VQA approach on the unbalanced dataset, and outperforms it on the balanced
-dataset.
-"
-2235,1511.05234,Huijuan Xu and Kate Saenko,"Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for
- Visual Question Answering",cs.CV cs.AI cs.CL cs.NE," We address the problem of Visual Question Answering (VQA), which requires
-joint image and language understanding to answer a question about a given
-photograph. Recent approaches have applied deep image captioning methods based
-on convolutional-recurrent networks to this problem, but have failed to model
-spatial inference. To remedy this, we propose a model we call the Spatial
-Memory Network and apply it to the VQA task. Memory networks are recurrent
-neural networks with an explicit attention mechanism that selects certain parts
-of the information stored in memory. Our Spatial Memory Network stores neuron
-activations from different spatial regions of the image in its memory, and uses
-the question to choose relevant regions for computing the answer, a process
-which constitutes a single ""hop"" in the network. We propose a novel spatial
-attention architecture that aligns words with image patches in the first hop,
-and obtain improved results by adding a second attention hop which considers
-the whole question to choose visual evidence based on the results of the first
-hop. To better understand the inference process learned by the network, we
-design synthetic questions that specifically require spatial inference and
-visualize the attention weights. We evaluate our model on two published visual
-question answering datasets, DAQUAR [1] and VQA [2], and obtain improved
-results compared to a strong deep baseline model (iBOWIMG) which concatenates
-image and question features to predict the answer [3].
-"
-2236,1511.05284,"Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond
- Mooney, Kate Saenko, Trevor Darrell","Deep Compositional Captioning: Describing Novel Object Categories
- without Paired Training Data",cs.CV cs.CL," While recent deep neural network models have achieved promising results on
-the image captioning task, they rely largely on the availability of corpora
-with paired image and sentence captions to describe objects in context. In this
-work, we propose the Deep Compositional Captioner (DCC) to address the task of
-generating descriptions of novel objects which are not present in paired
-image-sentence datasets. Our method achieves this by leveraging large object
-recognition datasets and external text corpora and by transferring knowledge
-between semantically similar concepts. Current deep caption models can only
-describe objects contained in paired image-sentence corpora, despite the fact
-that they are pre-trained with large object recognition datasets, namely
-ImageNet.
In contrast, our model can compose sentences that describe novel -objects and their interactions with other objects. We demonstrate our model's -ability to describe novel concepts by empirically evaluating its performance on -MSCOCO and show qualitative results on ImageNet images of objects for which no -paired image-caption data exist. Further, we extend our approach to generate -descriptions of objects in video clips. Our results show that DCC has distinct -advantages over existing image and video captioning approaches for generating -descriptions of new objects in context. -" -2237,1511.05389,"Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linar\`es",Learning to retrieve out-of-vocabulary words in speech recognition,cs.CL," Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech -recognition systems used to process diachronic audio data. To help recovery of -the PNs missed by the system, relevant OOV PNs can be retrieved out of the many -OOVs by exploiting semantic context of the spoken content. In this paper, we -propose two neural network models targeted to retrieve OOV PNs relevant to an -audio document: (a) Document level Continuous Bag of Words (D-CBOW), (b) -Document level Continuous Bag of Weighted Words (D-CBOW2). Both these models -take document words as input and learn with an objective to maximise the -retrieval of co-occurring OOV PNs. With the D-CBOW2 model we propose a new -approach in which the input embedding layer is augmented with a context anchor -layer. This layer learns to assign importance to input words and has the -ability to capture (task specific) key-words in a bag-of-word neural network -model. With experiments on French broadcast news videos we show that these two -models outperform the baseline methods based on raw embeddings from LDA, -Skip-gram and Paragraph Vectors. Combining the D-CBOW and D-CBOW2 models gives -faster convergence during training. -" -2238,1511.05392,"Eric Nalisnick, Sachin Ravi",Learning the Dimensionality of Word Embeddings,stat.ML cs.CL cs.LG," We describe a method for learning word embeddings with data-dependent -dimensionality. Our Stochastic Dimensionality Skip-Gram (SD-SG) and Stochastic -Dimensionality Continuous Bag-of-Words (SD-CBOW) are nonparametric analogs of -Mikolov et al.'s (2013) well-known 'word2vec' models. Vector dimensionality is -made dynamic by employing techniques used by Cote & Larochelle (2016) to define -an RBM with an infinite number of hidden units. We show qualitatively and -quantitatively that SD-SG and SD-CBOW are competitive with their -fixed-dimension counterparts while providing a distribution over embedding -dimensionalities, which offers a window into how semantics distribute across -dimensions. -" -2239,1511.05526,Zhengyang Wu and Mohit Bansal and Matthew R. Walter,Learning Articulated Motion Models from Visual and Lingual Signals,cs.RO cs.CL cs.CV," In order for robots to operate effectively in homes and workplaces, they must -be able to manipulate the articulated objects common within environments built -for and by humans. Previous work learns kinematic models that prescribe this -manipulation from visual demonstrations. Lingual signals, such as natural -language descriptions and instructions, offer a complementary means of -conveying knowledge of such manipulation models and are suitable to a wide -range of interactions (e.g., remote manipulation). 
In this paper, we present a
-multimodal learning framework that incorporates both visual and lingual
-information to estimate the structure and parameters that define kinematic
-models of articulated objects. The visual signal takes the form of an RGB-D
-image stream that opportunistically captures object motion in an unprepared
-scene. Accompanying natural language descriptions of the motion constitute the
-lingual signal. We present a probabilistic language model that uses word
-embeddings to associate lingual verbs with their corresponding kinematic
-structures. By exploiting the complementary nature of the visual and lingual
-input, our method infers correct kinematic structures for various multiple-part
-objects on which the previous state-of-the-art, visual-only system fails. We
-evaluate our multimodal learning framework on a dataset comprised of a variety
-of household objects, and demonstrate a 36% improvement in model accuracy over
-the vision-only baseline.
-"
-2240,1511.05756,"Hyeonwoo Noh, Paul Hongsuck Seo, Bohyung Han","Image Question Answering using Convolutional Neural Network with Dynamic
- Parameter Prediction",cs.CV cs.CL cs.LG," We tackle the image question answering (ImageQA) problem by learning a
-convolutional neural network (CNN) with a dynamic parameter layer whose weights
-are determined adaptively based on questions. For the adaptive parameter
-prediction, we employ a separate parameter prediction network, which consists
-of gated recurrent unit (GRU) taking a question as its input and a
-fully-connected layer generating a set of candidate weights as its output.
-However, it is challenging to construct a parameter prediction network for a
-large number of parameters in the fully-connected dynamic parameter layer of
-the CNN. We reduce the complexity of this problem by incorporating a hashing
-technique, where the candidate weights given by the parameter prediction
-network are selected using a predefined hash function to determine individual
-weights in the dynamic parameter layer. The proposed network---joint network
-with the CNN for ImageQA and the parameter prediction network---is trained
-end-to-end through back-propagation, where its weights are initialized using a
-pre-trained CNN and GRU. The proposed algorithm illustrates the
-state-of-the-art performance on all available public ImageQA benchmarks.
-"
-2241,1511.05926,Thien Huu Nguyen and Ralph Grishman,"Combining Neural Networks and Log-linear Models to Improve Relation
- Extraction",cs.CL cs.LG," The last decade has witnessed the success of the traditional feature-based
-method on exploiting the discrete structures such as words or lexical patterns
-to extract relations from text. Recently, convolutional and recurrent neural
-networks have provided very effective mechanisms to capture the hidden
-structures within sentences via continuous representations, thereby
-significantly advancing the performance of relation extraction. The advantage
-of convolutional neural networks is their capacity to generalize the
-consecutive k-grams in the sentences while recurrent neural networks are
-effective to encode long ranges of sentence context. This paper proposes to
-combine the traditional feature-based method with the convolutional and
-recurrent neural networks to simultaneously benefit from their advantages.
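-One plausible way to realize such a combination (our illustrative sketch, not
-the paper's architecture) is to concatenate the three views of a sentence
-before a final classifier; all names and sizes below are invented:
-
-import torch
-import torch.nn as nn
-
-class HybridRelationClassifier(nn.Module):
-    def __init__(self, n_feats, emb_dim, n_relations):
-        super().__init__()
-        self.cnn = nn.Conv1d(emb_dim, 32, kernel_size=3, padding=1)
-        self.rnn = nn.GRU(emb_dim, 32, batch_first=True)
-        self.out = nn.Linear(n_feats + 32 + 32, n_relations)
-    def forward(self, feats, tokens):   # feats: (B,n_feats), tokens: (B,T,emb_dim)
-        c = self.cnn(tokens.transpose(1, 2)).max(dim=-1).values  # k-gram view
-        _, h = self.rnn(tokens)                                  # long-range view
-        joint = torch.cat([feats, c, h.squeeze(0)], dim=-1)      # plus features
-        return self.out(joint)
-
-model = HybridRelationClassifier(n_feats=10, emb_dim=50, n_relations=7)
-print(model(torch.randn(4, 10), torch.randn(4, 12, 50)).shape)   # (4, 7)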
Our systematic
-evaluation of different network architectures and combination methods
-demonstrates the effectiveness of this approach and results in
-state-of-the-art performance on the ACE 2005 and SemEval datasets.
-"
-2242,1511.06018,"Lingpeng Kong, Chris Dyer, Noah A. Smith",Segmental Recurrent Neural Networks,cs.CL cs.LG," We introduce segmental recurrent neural networks (SRNNs) which define, given
-an input sequence, a joint probability distribution over segmentations of the
-input and labelings of the segments. Representations of the input segments
-(i.e., contiguous subsequences of the input) are computed by encoding their
-constituent tokens using bidirectional recurrent neural nets, and these
-""segment embeddings"" are used to define compatibility scores with output
-labels. These local compatibility scores are integrated using a global
-semi-Markov conditional random field. Both fully supervised training -- in
-which segment boundaries and labels are observed -- as well as partially
-supervised training -- in which segment boundaries are latent -- are
-straightforward. Experiments on handwriting recognition and joint Chinese word
-segmentation/POS tagging show that, compared to models that do not explicitly
-represent segments such as BIO tagging schemes and connectionist temporal
-classification (CTC), SRNNs obtain substantially higher accuracies.
-"
-2243,1511.06038,"Yishu Miao, Lei Yu and Phil Blunsom",Neural Variational Inference for Text Processing,cs.CL cs.LG stat.ML," Recent advances in neural variational inference have spawned a renaissance in
-deep latent variable models. In this paper we introduce a generic variational
-inference framework for generative and conditional models of text. While
-traditional variational methods derive an analytic approximation for the
-intractable distributions over latent variables, here we construct an inference
-network conditioned on the discrete text input to provide the variational
-distribution. We validate this framework on two very different text modelling
-applications, generative document modelling and supervised question answering.
-Our neural variational document model combines a continuous stochastic document
-representation with a bag-of-words generative model and achieves the lowest
-reported perplexities on two standard test corpora. The neural answer selection
-model employs a stochastic representation layer within an attention mechanism
-to extract the semantics between a question and answer pair. On two question
-answering benchmarks this model exceeds all previous published benchmarks.
-"
-2244,1511.06052,Yi Yang and Jacob Eisenstein,"Overcoming Language Variation in Sentiment Analysis with Social
- Attention",cs.CL cs.AI cs.SI," Variation in language is ubiquitous, particularly in newer forms of writing
-such as social media. Fortunately, variation is not random; it is often linked
-to social properties of the author. In this paper, we show how to exploit
-social networks to make sentiment analysis more robust to social language
-variation. The key idea is linguistic homophily: the tendency of socially
-linked individuals to use language in similar ways. We formalize this idea in a
-novel attention-based neural network architecture, in which attention is
-divided among several basis models, depending on the author's position in the
-social network.
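-A compact sketch of attention over basis models keyed by an author embedding;
-all names, sizes, and the random inputs are our own assumptions, not the
-paper's implementation:
-
-import torch
-import torch.nn as nn
-
-n_basis, text_dim, social_dim, n_classes = 4, 20, 8, 2
-basis = nn.ModuleList(nn.Linear(text_dim, n_classes) for _ in range(n_basis))
-keys = nn.Parameter(torch.randn(n_basis, social_dim))
-
-def predict(text_vec, author_emb):
-    # Attention weights come from the author's social embedding.
-    attn = torch.softmax(keys @ author_emb, dim=0)          # (n_basis,)
-    logits = torch.stack([m(text_vec) for m in basis])      # (n_basis, n_classes)
-    return (attn.unsqueeze(-1) * logits).sum(0)             # personalized mixture
-
-print(predict(torch.randn(text_dim), torch.randn(social_dim)))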
This has the effect of smoothing the classification function
-across the social network, and makes it possible to induce personalized
-classifiers even for authors for whom there is no labeled data or demographic
-metadata. This model significantly improves the accuracies of sentiment
-analysis on Twitter and on review data.
-"
-2245,1511.06066,Dong Wang and Thomas Fang Zheng,Transfer Learning for Speech and Language Processing,cs.CL cs.LG," Transfer learning is a vital technique that generalizes models trained for
-one setting or task to other settings or tasks. For example in speech
-recognition, an acoustic model trained for one language can be used to
-recognize speech in another language, with little or no re-training data.
-Transfer learning is closely related to multi-task learning (cross-lingual vs.
-multilingual), and is traditionally studied under the name of `model
-adaptation'. Recent advances in deep learning show that transfer learning
-becomes much easier and more effective with high-level abstract features
-learned by deep models, and the `transfer' can be conducted not only between
-data distributions and data types, but also between model structures (e.g.,
-shallow nets and deep nets) or even model types (e.g., Bayesian models and
-neural models). This review paper summarizes some recent prominent research in
-this direction, particularly for speech and language processing. We also report
-some results from our group and highlight the potential of this very
-interesting research field.
-"
-2246,1511.06078,"Liwei Wang, Yin Li, Svetlana Lazebnik",Learning Deep Structure-Preserving Image-Text Embeddings,cs.CV cs.CL cs.LG," This paper proposes a method for learning joint embeddings of images and text
-using a two-branch neural network with multiple layers of linear projections
-followed by nonlinearities. The network is trained using a large margin
-objective that combines cross-view ranking constraints with within-view
-neighborhood structure preservation constraints inspired by metric learning
-literature. Extensive experiments show that our approach gains significant
-improvements in accuracy for image-to-text and text-to-image retrieval. Our
-method achieves new state-of-the-art results on the Flickr30K and MSCOCO
-image-sentence datasets and shows promise on the new task of phrase
-localization on the Flickr30K Entities dataset.
-"
-2247,1511.06114,"Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz
- Kaiser",Multi-task Sequence to Sequence Learning,cs.LG cs.CL stat.ML," Sequence to sequence learning has recently emerged as a new paradigm in
-supervised learning. To date, most of its applications have focused on only one
-task and not much work has explored this framework for multiple tasks. This
-paper examines three multi-task learning (MTL) settings for sequence to
-sequence models: (a) the one-to-many setting - where the encoder is shared
-between several tasks such as machine translation and syntactic parsing, (b)
-the many-to-one setting - useful when only the decoder can be shared, as in the
-case of translation and image caption generation, and (c) the many-to-many
-setting - where multiple encoders and decoders are shared, which is the case
-with unsupervised objectives and translation. Our results show that training on
-a small amount of parsing and image caption data can improve the translation
-quality between English and German by up to 1.5 BLEU points over strong
-single-task baselines on the WMT benchmarks.
Furthermore, we have established a
-new state-of-the-art result in constituent parsing with 93.0 F1. Lastly, we
-reveal interesting properties of the two unsupervised learning objectives,
-autoencoder and skip-thought, in the MTL context: the autoencoder helps less in
-terms of perplexity but more on BLEU scores compared to skip-thought.
-"
-2248,1511.06219,"Lucas Sterckx and Thomas Demeester and Johannes Deleu and Chris
- Develder",Knowledge Base Population using Semantic Label Propagation,cs.CL cs.LG," A crucial aspect of a knowledge base population system that extracts new
-facts from text corpora, is the generation of training data for its relation
-extractors. In this paper, we present a method that maximizes the effectiveness
-of newly trained relation extractors at a minimal annotation cost. Manual
-labeling can be significantly reduced by Distant Supervision, which is a method
-to construct training data automatically by aligning a large text corpus with
-an existing knowledge base of known facts. For example, all sentences
-mentioning both 'Barack Obama' and 'US' may serve as positive training
-instances for the relation born_in(subject,object). However, distant
-supervision typically results in a highly noisy training set: many training
-sentences do not really express the intended relation. We propose to combine
-distant supervision with minimal manual supervision in a technique called
-feature labeling, to eliminate noise from the large and noisy initial training
-set, resulting in a significant increase of precision. We further improve on
-this approach by introducing the Semantic Label Propagation method, which uses
-the similarity between low-dimensional representations of candidate training
-instances, to extend the training set in order to increase recall while
-maintaining high precision. Our proposed strategy for generating training data
-is studied and evaluated on an established test collection designed for
-knowledge base population tasks. The experimental results show that the
-Semantic Label Propagation strategy leads to substantial performance gains when
-compared to existing approaches, while requiring an almost negligible manual
-annotation effort.
-"
-2249,1511.06246,"Xinchi Chen, Xipeng Qiu, Jingxiang Jiang, Xuanjing Huang",Gaussian Mixture Embeddings for Multiple Word Prototypes,cs.CL," Recently, word representations have been increasingly focused on for their
-excellent properties in representing word semantics. Previous works mainly
-suffer from the polysemy phenomenon. To address this problem, most previous
-models represent words as multiple distributed vectors. However, representing
-words as points in the embedded space cannot reflect the rich relations between
-words. In this paper, we propose the Gaussian mixture skip-gram (GMSG) model to
-learn Gaussian mixture embeddings for words based on the skip-gram framework.
-Each word can be regarded as a Gaussian mixture distribution in the embedded
-space, and each Gaussian component represents a word sense. Since the number of
-senses varies from word to word, we further propose the Dynamic GMSG (D-GMSG)
-model, which adaptively increases the sense number of words during training.
-Experiments on four benchmarks show the effectiveness of our proposed model.
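-A toy sketch of words as Gaussian mixtures, scoring a pair of words by the
-best-matching pair of sense components via a standard Gaussian overlap; this
-is our simplification with invented parameters, not the GMSG training
-objective:
-
-import numpy as np
-
-rng = np.random.default_rng(1)
-K, d = 2, 10                                         # senses per word, dim
-def random_word():
-    return {"mu": rng.normal(size=(K, d)),           # one mean per sense
-            "var": np.abs(rng.normal(size=(K, d))) + 0.1}  # diagonal covariance
-
-def component_overlap(mu1, var1, mu2, var2):
-    # log N(mu1 - mu2; 0, var1 + var2): expected likelihood of two Gaussians.
-    v = var1 + var2
-    return -0.5 * np.sum(np.log(2 * np.pi * v) + (mu1 - mu2) ** 2 / v)
-
-def similarity(w1, w2):
-    return max(component_overlap(w1["mu"][i], w1["var"][i],
-                                 w2["mu"][j], w2["var"][j])
-               for i in range(K) for j in range(K))
-
-print(similarity(random_word(), random_word()))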
-"
-2250,1511.06285,"Krzysztof Wo{\l}k, Emilia Rejmund, Krzysztof Marasek","Harvesting comparable corpora and mining them for equivalent bilingual
- sentences using statistical classification and analogy- based heuristics",cs.CL stat.ML," Parallel sentences are a relatively scarce but extremely useful resource for
-many applications including cross-lingual retrieval and statistical machine
-translation. This research explores our new methodologies for mining such data
-from previously obtained comparable corpora. The task is highly practical since
-non-parallel multilingual data exist in far greater quantities than parallel
-corpora, but parallel sentences are a much more useful resource. Here we
-propose a web crawling method for building subject-aligned comparable corpora
-from e.g. Wikipedia dumps and the Euronews web page. The improvements in
-machine translation are shown on the Polish-English language pair for various
-text domains. We also tested another method of building parallel corpora based
-on comparable corpora data. It automatically broadens an existing corpus with
-sentences from the subject domain of the comparable corpora, based on analogies
-between them.
-"
-2251,1511.06303,Piotr Bojanowski and Armand Joulin and Tomas Mikolov,Alternative structures for character-level RNNs,cs.LG cs.CL," Recurrent neural networks are convenient and efficient models for language
-modeling. However, when applied on the level of characters instead of words,
-they suffer from several problems. In order to successfully model long-term
-dependencies, the hidden representation needs to be large. This in turn implies
-higher computational costs, which can become prohibitive in practice. We
-propose two alternative structural modifications to the classical RNN model.
-The first one consists of conditioning the character-level representation on
-the previous word representation. The other one uses the character history to
-condition the output probability. We evaluate the performance of the two
-proposed modifications on challenging, multi-lingual real world data.
-"
-2252,1511.06312,"James Cross, Bing Xiang, and Bowen Zhou","Good, Better, Best: Choosing Word Embedding Context",cs.CL," We propose two methods of learning vector representations of words and
-phrases that each combine sentence context with structural features extracted
-from dependency trees. Using several variations of a neural network classifier,
-we show that these combined methods lead to improved performance when used as
-input features for supervised term-matching.
-"
-2253,1511.06341,"Ramanathan V Guha, Vineet Gupta",Communicating Semantics: Reference by Description,cs.CL cs.AI," Messages often refer to entities such as people, places and events. Correct
-identification of the intended reference is an essential part of communication.
-Lack of shared unique names often complicates entity reference. Shared
-knowledge can be used to construct uniquely identifying descriptive references
-for entities with ambiguous names. We introduce a mathematical model for
-`Reference by Description', derive results on the conditions under which, with
-high probability, programs can construct unambiguous references to most
-entities in the domain of discourse and provide empirical validation of these
-results.
-"
-2254,1511.06349,"Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M.
Dai, Rafal
- Jozefowicz, Samy Bengio",Generating Sentences from a Continuous Space,cs.LG cs.CL," The standard recurrent neural network language model (RNNLM) generates
-sentences one word at a time and does not work from an explicit global sentence
-representation. In this work, we introduce and study an RNN-based variational
-autoencoder generative model that incorporates distributed latent
-representations of entire sentences. This factorization allows it to explicitly
-model holistic properties of sentences such as style, topic, and high-level
-syntactic features. Samples from the prior over these sentence representations
-remarkably produce diverse and well-formed sentences through simple
-deterministic decoding. By examining paths through this latent space, we are
-able to generate coherent novel sentences that interpolate between known
-sentences. We present techniques for solving the difficult learning problem
-presented by this model, demonstrate its effectiveness in imputing missing
-words, explore many interesting properties of the model's latent sentence
-space, and present negative results on the use of the model in language
-modeling.
-"
-2255,1511.06361,"Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun",Order-Embeddings of Images and Language,cs.LG cs.CL cs.CV," Hypernymy, textual entailment, and image captioning can be seen as special
-cases of a single visual-semantic hierarchy over words, sentences, and images.
-In this paper we advocate for explicitly modeling the partial order structure
-of this hierarchy. Towards this goal, we introduce a general method for
-learning ordered representations, and show how it can be applied to a variety
-of tasks involving images and language. We show that the resulting
-representations improve performance over current approaches for hypernym
-prediction and image-caption retrieval.
-"
-2256,1511.06379,"Richard Searle, Megan Bingham-Walker",Dynamic Adaptive Network Intelligence,cs.CL cs.LG," Accurate representational learning of both the explicit and implicit
-relationships within data is critical to the ability of machines to perform
-more complex and abstract reasoning tasks. We describe the efficient weakly
-supervised learning of such inferences by our Dynamic Adaptive Network
-Intelligence (DANI) model. We report state-of-the-art results for DANI over
-question answering tasks in the bAbI dataset that have proved difficult for
-contemporary approaches to learning representation (Weston et al., 2015).
-"
-2257,1511.06388,"Andrew Trask, Phil Michalak, John Liu","sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In
- Neural Word Embeddings",cs.CL cs.LG," Neural word representations have proven useful in Natural Language Processing
-(NLP) tasks due to their ability to efficiently model complex semantic and
-syntactic word relationships. However, most techniques model only one
-representation per word, despite the fact that a single word can have multiple
-meanings or ""senses"". Some techniques model words by using multiple vectors
-that are clustered based on context. However, recent neural approaches rarely
-focus on the application to a consuming NLP algorithm. Furthermore, the
-training process of recent word-sense models is expensive relative to
-single-sense embedding processes.
This paper presents a novel approach which
-addresses these concerns by modeling multiple embeddings for each word based on
-supervised disambiguation, which provides a fast and accurate way for a
-consuming NLP model to select a sense-disambiguated embedding. We demonstrate
-that these embeddings can disambiguate both contrastive senses such as nominal
-and verbal senses as well as nuanced senses such as sarcasm. We further
-evaluate Part-of-Speech disambiguated embeddings on neural dependency parsing,
-yielding a greater than 8% average error reduction in unlabeled attachment
-scores across 6 languages.
-"
-2258,1511.06391,"Oriol Vinyals, Samy Bengio, Manjunath Kudlur",Order Matters: Sequence to sequence for sets,stat.ML cs.CL cs.LG," Sequences have become first-class citizens in supervised learning thanks to
-the resurgence of recurrent neural networks. Many complex tasks that require
-mapping from or to a sequence of observations can now be formulated with the
-sequence-to-sequence (seq2seq) framework, which employs the chain rule to
-efficiently represent the joint probability of sequences. In many cases,
-however, variable-sized inputs and/or outputs might not be naturally expressed
-as sequences. For instance, it is not clear how to input a set of numbers into
-a model where the task is to sort them; similarly, we do not know how to
-organize outputs when they correspond to random variables and the task is to
-model their unknown joint probability. In this paper, we first show using
-various examples that the order in which we organize input and/or output data
-matters significantly when learning an underlying model. We then discuss an
-extension of the seq2seq framework that goes beyond sequences and handles input
-sets in a principled way. In addition, we propose a loss which, by searching
-over possible orders during training, deals with the lack of structure of
-output sets. We show empirical evidence of our claims regarding ordering, and
-on the modifications to the seq2seq framework on benchmark language modeling
-and parsing tasks, as well as two artificial tasks -- sorting numbers and
-estimating the joint probability of unknown graphical models.
-"
-2259,1511.06396,"Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth, Andrew
- McCallum",Multilingual Relation Extraction using Compositional Universal Schema,cs.CL cs.LG," Universal schema builds a knowledge base (KB) of entities and relations by
-jointly embedding all relation types from input KBs as well as textual patterns
-expressing relations from raw text. In most previous applications of universal
-schema, each textual pattern is represented as a single embedding, preventing
-generalization to unseen patterns. Recent work employs a neural network to
-capture patterns' compositional semantics, providing generalization to all
-possible input text. In response, this paper introduces significant further
-improvements to the coverage and flexibility of universal schema relation
-extraction: predictions for entities unseen in training and multilingual
-transfer learning to domains with no annotation. We evaluate our model through
-extensive experiments on the English and Spanish TAC KBP benchmark,
-outperforming the top system from TAC 2013 slot-filling using no handwritten
-patterns or additional annotation. We also consider a multilingual setting in
-which English training data entities overlap with the seed KB, but Spanish text
-does not.
Despite having no annotation for Spanish data, we train an accurate
-predictor, with additional improvements obtained by tying word embeddings
-across languages. Furthermore, we find that multilingual training improves
-English relation extraction accuracy. Our approach is thus suited to
-broad-coverage automated knowledge base construction in a variety of languages
-and domains.
-"
-2260,1511.06397,Martin Andrews,Compressing Word Embeddings,cs.CL cs.LG," Recent methods for learning vector space representations of words have
-succeeded in capturing fine-grained semantic and syntactic regularities using
-vector arithmetic. However, these vector space representations (created through
-large-scale text analysis) are typically stored verbatim, since their internal
-structure is opaque. Using word-analogy tests to monitor the level of detail
-stored in compressed re-representations of the same vector space, we
-investigate the trade-offs between the reduction in memory usage and
-expressiveness. A simple scheme is outlined that can reduce the memory
-footprint of a state-of-the-art embedding by a factor of 10, with only minimal
-impact on performance. Then, using the same `bit budget', a binary
-(approximate) factorisation of the same space is also explored, with the aim of
-creating an equivalent representation with better interpretability.
-"
-2261,1511.06407,"Suyoun Kim, Ian Lane","Recurrent Models for Auditory Attention in Multi-Microphone Distance
- Speech Recognition",cs.LG cs.CL," Integration of multiple microphone data is one of the key ways to achieve
-robust speech recognition in noisy environments or when the speaker is located
-at some distance from the input device. Signal processing techniques such as
-beamforming are widely used to extract a speech signal of interest from
-background noise. These techniques, however, are highly dependent on prior
-spatial information about the microphones and the environment in which the
-system is being used. In this work, we present a neural attention network that
-directly combines multi-channel audio to generate phonetic states without
-requiring any prior knowledge of the microphone layout or any explicit signal
-preprocessing for speech enhancement. We embed an attention mechanism within a
-Recurrent Neural Network (RNN) based acoustic model to automatically tune its
-attention to a more reliable input source. Unlike traditional multi-channel
-preprocessing, our system can be optimized towards the desired output in one
-step. Although attention-based models have recently achieved impressive results
-on sequence-to-sequence learning, no attention mechanisms have previously been
-applied to learn multiple potentially asynchronous and non-stationary inputs.
-We evaluate our neural attention model on the CHiME-3 challenge task, and show
-that the model achieves comparable performance to beamforming using a purely
-data-driven method.
-"
-2262,1511.06420,Ethan Caballero,Skip-Thought Memory Networks,cs.NE cs.CL cs.LG," Question Answering (QA) is fundamental to natural language processing in that
-most NLP problems can be phrased as QA (Kumar et al., 2015). The weakly
-supervised memory network models that have been proposed so far struggle at
-answering questions that involve relations among multiple entities (such as
-Facebook's bAbI qa5-three-arg-relations in (Weston et al., 2015)).
To address
-this problem of learning multi-argument multi-hop semantic relations for the
-purpose of QA, we propose a method that combines the jointly learned long-term
-read-write memory and attentive inference components of end-to-end memory
-networks (MemN2N) (Sukhbaatar et al., 2015) with distributed sentence vector
-representations encoded by a Skip-Thought model (Kiros et al., 2015). This
-choice to append Skip-Thought Vectors to the existing MemN2N framework is
-motivated by the fact that Skip-Thought Vectors have been shown to accurately
-model multi-argument semantic relations (Kiros et al., 2015).
-"
-2263,1511.06426,"Moontae Lee, Xiaodong He, Wen-tau Yih, Jianfeng Gao, Li Deng, Paul
- Smolensky",Reasoning in Vector Space: An Exploratory Study of Question Answering,cs.CL," Question answering tasks have shown remarkable progress with distributed
-vector representation. In this paper, we investigate the recently proposed
-Facebook bAbI tasks which consist of twenty different categories of questions
-that require complex reasoning. Because the previous works on bAbI are all
-end-to-end models, errors could come either from an imperfect understanding of
-semantics or from certain steps of the reasoning. For clearer analysis, we
-propose two vector space models inspired by Tensor Product Representation (TPR)
-to perform knowledge encoding and logical reasoning based on common-sense
-inference. Together they achieve near-perfect accuracy on all categories,
-including positional reasoning and path finding, which have proved difficult
-for most of the previous approaches. We hypothesize that the difficulties in
-these categories are due to their multi-relational nature, in contrast to the
-uni-relational characteristic of the other categories. Our exploration sheds
-light on designing more sophisticated datasets and moves one step toward
-integrating the transparent and interpretable formalism of TPR into existing
-learning paradigms.
-"
-2264,1511.06438,"Danushka Bollegala, Alsuhaibani Mohammed, Takanori Maehara, Ken-ichi
- Kawarabayashi",Joint Word Representation Learning using a Corpus and a Semantic Lexicon,cs.CL cs.AI," Methods for learning word representations using large text corpora have
-received much attention lately due to their impressive performance in numerous
-natural language processing (NLP) tasks such as semantic similarity
-measurement and word analogy detection. Despite their success, these
-data-driven word representation learning methods do not consider the rich
-semantic relational structure between words in a co-occurring context. On the
-other hand, much manual effort has already gone into the construction of
-semantic lexicons such as WordNet, which represent the meanings of words by
-defining the various relationships that exist among the words in a language. We
-consider the question: can we improve the word representations learnt from a
-corpus by integrating the knowledge from semantic lexicons? For this purpose,
-we propose a joint word representation learning method that simultaneously
-predicts the co-occurrences of two words in a sentence subject to the
-relational constraints given by the semantic lexicon. We use relations that
-exist between words in the lexicon to regularize the word representations
-learnt from the corpus. Our proposed method statistically significantly
-outperforms previously proposed methods for incorporating semantic lexicons
-into word representations on several benchmark datasets for semantic similarity
-and word analogy.
-" -2265,1511.06591,Normunds Gruzitis and Guntis Barzdins,Polysemy in Controlled Natural Language Texts,cs.CL," Computational semantics and logic-based controlled natural languages (CNL) do -not address systematically the word sense disambiguation problem of content -words, i.e., they tend to interpret only some functional words that are crucial -for construction of discourse representation structures. We show that -micro-ontologies and multi-word units allow integration of the rich and -polysemous multi-domain background knowledge into CNL thus providing -interpretation for the content words. The proposed approach is demonstrated by -extending the Attempto Controlled English (ACE) with polysemous and procedural -constructs resulting in a more natural CNL named PAO covering narrative -multi-domain texts. -" -2266,1511.06674,Anirudh Goyal and Marius Leordeanu,"Stories in the Eye: Contextual Visual Interactions for Efficient Video - to Language Translation",cs.CV cs.CL," Integrating higher level visual and linguistic interpretations is at the -heart of human intelligence. As automatic visual category recognition in images -is approaching human performance, the high level understanding in the dynamic -spatiotemporal domain of videos and its translation into natural language is -still far from being solved. While most works on vision-to-text translations -use pre-learned or pre-established computational linguistic models, in this -paper we present an approach that uses vision alone to efficiently learn how to -translate into language the video content. We discover, in simple form, the -story played by main actors, while using only visual cues for representing -objects and their interactions. Our method learns in a hierarchical manner -higher level representations for recognizing subjects, actions and objects -involved, their relevant contextual background and their interaction to one -another over time. We have a three stage approach: first we take in -consideration features of the individual entities at the local level of -appearance, then we consider the relationship between these objects and actions -and their video background, and third, we consider their spatiotemporal -relations as inputs to classifiers at the highest level of interpretation. -Thus, our approach finds a coherent linguistic description of videos in the -form of a subject, verb and object based on their role played in the overall -visual story learned directly from training data, without using a known -language model. We test the efficiency of our approach on a large scale dataset -containing YouTube clips taken in the wild and demonstrate state-of-the-art -performance, often superior to current approaches that use more complex, -pre-learned linguistic knowledge. -" -2267,1511.06709,Rico Sennrich and Barry Haddow and Alexandra Birch,Improving Neural Machine Translation Models with Monolingual Data,cs.CL," Neural Machine Translation (NMT) has obtained state-of-the art performance -for several language pairs, while only using parallel data for training. -Target-side monolingual data plays an important role in boosting fluency for -phrase-based statistical machine translation, and we investigate the use of -monolingual data for NMT. 
In contrast to previous work, which combines NMT
-models with separately trained language models, we note that encoder-decoder
-NMT architectures already have the capacity to learn the same information as a
-language model, and we explore strategies to train with monolingual data
-without changing the neural network architecture. By pairing monolingual
-training data with an automatic back-translation, we can treat it as additional
-parallel training data, and we obtain substantial improvements on the WMT 15
-task English<->German (+2.8-3.7 BLEU), and for the low-resourced IWSLT 14 task
-Turkish->English (+2.1-3.4 BLEU), obtaining new state-of-the-art results. We
-also show that fine-tuning on in-domain monolingual and parallel data gives
-substantial improvements for the IWSLT 15 task English->German.
-"
-2268,1511.06732,"Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba",Sequence Level Training with Recurrent Neural Networks,cs.LG cs.CL," Many natural language processing applications use language models to generate
-text. These models are typically trained to predict the next word in a
-sequence, given the previous words and some context such as an image. However,
-at test time the model is expected to generate the entire sequence from
-scratch. This discrepancy makes generation brittle, as errors may accumulate
-along the way. We address this issue by proposing a novel sequence level
-training algorithm that directly optimizes the metric used at test time, such
-as BLEU or ROUGE. On three different tasks, our approach outperforms several
-strong baselines for greedy generation. The method is also competitive when
-these baselines employ beam search, while being several times faster.
-"
-2269,1511.06798,Luke Miratrix and Robin Ackerman,"Conducting sparse feature selection on arbitrarily long phrases in text
- corpora with a focus on interpretability",cs.CL cs.IR stat.AP," We propose a general framework for topic-specific summarization of large text
-corpora, and illustrate how it can be used for analysis in two quite different
-contexts: an OSHA database of fatality and catastrophe reports (to facilitate
-surveillance for patterns in circumstances leading to injury or death) and
-legal decisions on workers' compensation claims (to explore relevant case law).
-Our summarization framework, built on sparse classification methods, is a
-compromise between simple word-frequency-based methods currently in wide use
-and more heavyweight, model-intensive methods such as Latent Dirichlet
-Allocation (LDA). For a particular topic of interest (e.g., mental health
-disability, or chemical reactions), we regress a labeling of documents onto the
-high-dimensional counts of all the other words and phrases in the documents.
-The resulting small set of phrases found to be predictive is then harvested as
-the summary. Using a branch-and-bound approach, this method can be extended to
-allow for phrases of arbitrary length, which allows for potentially rich
-summarization. We discuss how focus on the purpose of the summaries can inform
-choices of regularization parameters and model constraints. We evaluate this
-tool by comparing computational time and summary statistics of the resulting
-word lists to three other methods in the literature. We also present a new R
-package, textreg. Overall, we argue that sparse methods have much to offer text
-analysis, and that this is a branch of research that should be considered
-further in this context.
-"
-2270,1511.06833,"S. Thenmalar, J. Balaji, and T.V.
Geetha",Semi-supervised Bootstrapping approach for Named Entity Recognition,cs.CL cs.IR," The aim of Named Entity Recognition (NER) is to identify references of named -entities in unstructured documents, and to classify them into pre-defined -semantic categories. NER often aids from added background knowledge in the form -of gazetteers. However using such a collection does not deal with name variants -and cannot resolve ambiguities associated in identifying the entities in -context and associating them with predefined categories. We present a -semi-supervised NER approach that starts with identifying named entities with a -small set of training data. Using the identified named entities, the word and -the context features are used to define the pattern. This pattern of each named -entity category is used as a seed pattern to identify the named entities in the -test set. Pattern scoring and tuple value score enables the generation of the -new patterns to identify the named entity categories. We have evaluated the -proposed system for English language with the dataset of tagged (IEER) and -untagged (CoNLL 2003) named entity corpus and for Tamil language with the -documents from the FIRE corpus and yield an average f-measure of 75% for both -the languages. -" -2271,1511.06838,"Takuya Narihira, Damian Borth, Stella X. Yu, Karl Ni, Trevor Darrell","Mapping Images to Sentiment Adjective Noun Pairs with Factorized Neural - Nets",cs.CV cs.CL," We consider the visual sentiment task of mapping an image to an adjective -noun pair (ANP) such as ""cute baby"". To capture the two-factor structure of our -ANP semantics as well as to overcome annotation noise and ambiguity, we propose -a novel factorized CNN model which learns separate representations for -adjectives and nouns but optimizes the classification performance over their -product. Our experiments on the publicly available SentiBank dataset show that -our model significantly outperforms not only independent ANP classifiers on -unseen ANPs and on retrieving images of novel ANPs, but also image captioning -models which capture word semantics from co-occurrence of natural text; the -latter turn out to be surprisingly poor at capturing the sentiment evoked by -pure visual experience. That is, our factorized ANP CNN not only trains better -from noisy labels, generalizes better to new images, but can also expands the -ANP vocabulary on its own. -" -2272,1511.06909,"Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson - and Pradeep Dubey","BlackOut: Speeding up Recurrent Neural Network Language Models With Very - Large Vocabularies",cs.LG cs.CL cs.NE stat.ML," We propose BlackOut, an approximation algorithm to efficiently train massive -recurrent neural network language models (RNNLMs) with million word -vocabularies. BlackOut is motivated by using a discriminative loss, and we -describe a new sampling strategy which significantly reduces computation while -improving stability, sample efficiency, and rate of convergence. One way to -understand BlackOut is to view it as an extension of the DropOut strategy to -the output layer, wherein we use a discriminative training loss and a weighted -sampling scheme. We also establish close connections between BlackOut, -importance sampling, and noise contrastive estimation (NCE). 
Our experiments
-on the recently released one billion word language modeling benchmark
-demonstrate the scalability and accuracy of BlackOut; we outperform the
-state-of-the-art, and achieve the lowest perplexity scores on this dataset.
-Moreover, unlike other established methods, which typically require GPUs or CPU
-clusters, we show that a carefully implemented version of BlackOut requires
-only 1-10 days on a single machine to train an RNNLM with a million-word
-vocabulary and billions of parameters on one billion words. Although we
-describe BlackOut in the context of RNNLM training, it can be applied to any
-network with a large softmax output layer.
-"
-2273,1511.06931,"Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra,
- Alexander Miller, Arthur Szlam, Jason Weston",Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems,cs.CL cs.LG," A long-term goal of machine learning is to build intelligent conversational
-agents. One recent popular approach is to train end-to-end models on large
-amounts of real dialog transcripts between humans (Sordoni et al., 2015; Vinyals
-& Le, 2015; Shang et al., 2015). However, this approach leaves many questions
-unanswered, as an understanding of the precise successes and shortcomings of
-each model is hard to assess. A contrasting recent proposal is the bAbI tasks
-(Weston et al., 2015b), which are synthetic data that measure the ability of
-learning machines at various reasoning tasks over toy language. Unfortunately,
-those tests are very small and hence may encourage methods that do not scale.
-In this work, we propose a suite of new tasks of a much larger scale that
-attempt to bridge the gap between the two regimes. Choosing the domain of
-movies, we provide tasks that test the ability of models to answer factual
-questions (utilizing OMDB), provide personalization (utilizing MovieLens),
-carry short conversations about the two, and finally to perform on natural
-dialogs from Reddit. We provide a dataset covering 75k movie entities and
-3.5M training examples. We present results of various models on these tasks,
-and evaluate their performance.
-"
-2274,1511.06961,Lisa Seung-Yeon Lee,On the Linear Algebraic Structure of Distributed Word Representations,cs.CL cs.LG," In this work, we leverage the linear algebraic structure of distributed word
-representations to automatically extend knowledge bases and allow a machine to
-learn new facts about the world. Our goal is to extract structured facts from
-corpora in a simpler manner, without applying classifiers or patterns, and
-using only the co-occurrence statistics of words. We demonstrate that the
-linear algebraic structure of word embeddings can be used to reduce data
-requirements for methods of learning facts. In particular, we demonstrate that
-words belonging to a common category, or pairs of words satisfying a certain
-relation, form a low-rank subspace in the projected space. We compute a basis
-for this low-rank subspace using singular value decomposition (SVD), then use
-this basis to discover new facts and to fit vectors for less frequent words
-for which we do not yet have vectors.
-"
-2275,1511.06995,Paolo Dragone,"Non-Sentential Utterances in Dialogue: Experiments in Classification and
- Interpretation",cs.CL cs.AI," Non-sentential utterances (NSUs) are utterances that lack a complete
-sentential form but whose meaning can be inferred from the dialogue context,
-such as ""OK"", ""where?"", ""probably at his apartment"".
The interpretation of
-non-sentential utterances is an important problem in computational linguistics
-since they constitute a frequent phenomenon in dialogue and they are
-intrinsically context-dependent. The interpretation of NSUs is the task of
-retrieving their full semantic content from their form and the dialogue
-context. The first half of this thesis is devoted to the NSU classification
-task. Our work builds upon Fern\'andez et al. (2007), which presents a series of
-machine-learning experiments on the classification of NSUs. We extended their
-approach with a combination of new features and semi-supervised learning
-techniques. The empirical results presented in this thesis show a modest but
-significant improvement over the state-of-the-art classification performance.
-The subsequent, yet independent, problem is how to infer an appropriate
-semantic representation of such NSUs on the basis of the dialogue context.
-Fern\'andez (2006) formalizes this task in terms of ""resolution rules"" built on
-top of the Type Theory with Records (TTR). Our work is focused on the
-reimplementation of the resolution rules from Fern\'andez (2006) with a
-probabilistic account of the dialogue state. The probabilistic rules formalism
-of Lison (2014) is particularly suited for this task because, similarly to the
-framework developed by Ginzburg (2012) and Fern\'andez (2006), it involves the
-specification of update rules on the variables of the dialogue state to capture
-the dynamics of the conversation. However, the probabilistic rules can also
-encode probabilistic knowledge, thereby providing a principled account of
-ambiguities in the NSU resolution process.
-"
-2276,1511.07001,A.C. Sparavigna and R. Marazzato,"Analysis of a Play by Means of CHAPLIN, the Characters and Places
- Interaction Network Software",cs.CY cs.CL cs.SI," Recently, we have developed a software tool capable of gathering information
-on social networks from written texts. This software, the CHAracters and PLaces
-Interaction Network (CHAPLIN) tool, is implemented in Visual Basic. By means of
-it, characters and places of a literary work can be extracted from a list of
-raw words. The software interface helps users to select their names out of this
-list. Setting some parameters, CHAPLIN creates a network where nodes represent
-characters/places and edges give their interactions. Nodes and edges are
-labelled by performances. In this paper, we propose to use CHAPLIN for the
-analysis of a William Shakespeare play, the famous 'Tragedy of Hamlet, Prince
-of Denmark'. Performances of characters in the play as a whole and in each of
-its acts are given by graphs.
-"
-2277,1511.07067,"Satwik Kottur, Ramakrishna Vedantam, Jos\'e M. F. Moura, Devi Parikh","Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings
- Using Abstract Scenes",cs.CV cs.CL," We propose a model to learn visually grounded word embeddings (vis-w2v) to
-capture visual notions of semantic relatedness. While word embeddings trained
-using text have been extremely successful, they cannot uncover notions of
-semantic relatedness implicit in our visual world. For instance, although
-""eats"" and ""stares at"" seem unrelated in text, they share semantics visually.
-When people are eating something, they also tend to stare at the food.
-Grounding diverse relations like ""eats"" and ""stares at"" into vision remains
-challenging, despite recent progress in vision. We note that the visual
-grounding of words depends on semantics, and not the literal pixels.
We thus
-use abstract scenes created from clipart to provide the visual grounding. We
-find that the embeddings we learn capture fine-grained, visually grounded
-notions of semantic relatedness. We show improvements over text-only word
-embeddings (word2vec) on three tasks: common-sense assertion classification,
-visual paraphrasing and text-based image retrieval. Our code and datasets are
-available online.
-"
-2278,1511.07607,"Rahul Anand Sharma, Pramod Sankar K and CV Jawahar",Fine-Grain Annotation of Cricket Videos,cs.MM cs.CL cs.CV," The recognition of human activities is one of the key problems in video
-understanding. Action recognition is challenging even for specific categories
-of videos, such as sports, that contain only a small set of actions.
-Interestingly, sports videos are accompanied by detailed commentaries available
-online, which could be used to perform action annotation in a weakly-supervised
-setting. For the specific case of Cricket videos, we address the challenge of
-temporal segmentation and annotation of actions with semantic descriptions. Our
-solution consists of two stages. In the first stage, the video is segmented
-into ""scenes"" by utilizing the scene category information extracted from
-text-commentary. The second stage consists of classifying video-shots as well
-as the phrases in the textual description into various categories. The relevant
-phrases are then suitably mapped to the video-shots. The novel aspect of this
-work is the fine temporal scale at which semantic information is assigned to
-the video. As a result of our approach, we enable retrieval of specific actions
-that last only a few seconds, from several hours of video. This solution yields
-a large number of labeled exemplars, with no manual effort, that could be used
-by machine learning algorithms to learn complex actions.
-"
-2279,1511.07788,"Krzysztof Marasek, {\L}ukasz Brocki, Danijel Korzinek, Krzysztof
- Wo{\l}k, Ryszard Gubrynowicz",Spoken Language Translation for Polish,cs.CL," Spoken language translation (SLT) is becoming more important in an
-increasingly globalized world, both from a social and an economic point of
-view. It is one of the major challenges for automatic speech recognition (ASR)
-and machine translation (MT), driving intense research activities in these
-areas. While past research in SLT, due to technology limitations, dealt mostly
-with speech recorded under controlled conditions, today's major challenge is
-the translation of spoken language as it can be found in real life. Considered
-application scenarios range from portable translators for tourists, through
-lecture and presentation translation, to broadcast news and shows with live
-captioning. We would like to present PJIIT's experiences in SLT gained from
-the Eu-Bridge 7th framework project and the U-Star consortium activities for
-the Polish/English language pair. The presented research concentrates on ASR
-adaptation for Polish (state-of-the-art acoustic models: DBN-BLSTM training,
-Kaldi: LDA+MLLT+SAT+MMI), language modeling for ASR & MT (text normalization,
-RNN-based LMs, n-gram model domain interpolation) and statistical translation
-techniques (hierarchical models, factored translation models, automatic casing
-and punctuation, comparable and bilingual corpora preparation). While results
-for the well-defined domains (phrases for travelers, parliament speeches,
-medical documentation, movie subtitling) are very encouraging, less defined
-domains (presentations, lectures) still form a challenge.
Our progress in the
-IWSLT TED task (MT only) will be presented, as well as current progress in
-Polish ASR.
-"
-2280,1511.07916,Kyunghyun Cho,Natural Language Understanding with Distributed Representation,cs.CL stat.ML," This is a lecture note for the course DS-GA 3001 at the Center for Data
-Science, New York University, in Fall 2015. As the name of the course suggests,
-this lecture note introduces readers to a neural network based approach to
-natural language understanding/processing. In order to make it as
-self-contained as possible, I spend much time on describing the basics of
-machine learning and neural networks, only after which their use for natural
-languages is introduced. On the language front, I almost solely focus on
-language modelling and machine translation, both of which I personally find
-most fascinating and most fundamental to natural language understanding.
-"
-2281,1511.07972,"Volker Tresp and Crist\'obal Esteban and Yinchong Yang and Stephan
- Baier and Denis Krompa{\ss}",Learning with Memory Embeddings,cs.AI cs.CL cs.LG," Embedding learning, a.k.a. representation learning, has been shown to be able
-to model large-scale semantic knowledge graphs. A key concept is a mapping of
-the knowledge graph to a tensor representation whose entries are predicted by
-models using latent representations of generalized entities. Latent variable
-models are well suited to deal with the high dimensionality and sparsity of
-typical knowledge graphs. In recent publications the embedding models were
-extended to also consider time evolutions, time patterns and subsymbolic
-representations. In this paper we map embedding models, which were developed
-purely as solutions to technical problems for modelling temporal knowledge
-graphs, to various cognitive memory functions, in particular to semantic and
-concept memory, episodic memory, sensory memory, short-term memory, and working
-memory. We discuss learning, query answering, the path from sensory input to
-semantic decoding, and the relationship between episodic memory and semantic
-memory. We introduce a number of hypotheses on human memory that can be derived
-from the developed mathematical models.
-"
-2282,1511.08130,"Tomas Mikolov, Armand Joulin, Marco Baroni",A Roadmap towards Machine Intelligence,cs.AI cs.CL," The development of intelligent machines is one of the biggest unsolved
-challenges in computer science. In this paper, we propose some fundamental
-properties these machines should have, focusing in particular on communication
-and learning. We discuss a simple environment that could be used to
-incrementally teach a machine the basics of natural-language-based
-communication, as a prerequisite to more complex interaction with human users.
-We also present some conjectures on the sort of algorithms the machine should
-support in order to profitably learn from the environment.
-"
-2283,1511.08198,"John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu",Towards Universal Paraphrastic Sentence Embeddings,cs.CL cs.LG," We consider the problem of learning general-purpose, paraphrastic sentence
-embeddings based on supervision from the Paraphrase Database (Ganitkevitch et
-al., 2013). We compare six compositional architectures, evaluating them on
-annotated textual similarity datasets drawn both from the same distribution as
-the training data and from a wide range of other domains.
We find that the most
-complex architectures, such as long short-term memory (LSTM) recurrent neural
-networks, perform best on the in-domain data. However, in out-of-domain
-scenarios, simple architectures such as word averaging vastly outperform LSTMs.
-Our simplest averaging model is even competitive with systems tuned for the
-particular tasks while also being extremely efficient and easy to use.
- In order to better understand how these architectures compare, we conduct
-further experiments on three supervised NLP tasks: sentence similarity,
-entailment, and sentiment classification. We again find that the word averaging
-models perform well for sentence similarity and entailment, outperforming
-LSTMs. However, on sentiment classification, we find that the LSTM performs
-very strongly, even recording new state-of-the-art performance on the Stanford
-Sentiment Treebank.
- We then demonstrate how to combine our pretrained sentence embeddings with
-these supervised tasks, using them both as a prior and as a black-box feature
-extractor. This leads to performance rivaling the state of the art on the SICK
-similarity and entailment tasks. We release all of our resources to the
-research community with the hope that they can serve as the new baseline for
-further work on universal sentence embeddings.
-"
-2284,1511.08277,"Shengxian Wan, Yanyan Lan, Jiafeng Guo, Jun Xu, Liang Pang, and Xueqi
- Cheng","A Deep Architecture for Semantic Matching with Multiple Positional
- Sentence Representations",cs.AI cs.CL cs.NE," Matching natural language sentences is central for many applications such as
-information retrieval and question answering. Existing deep models rely on a
-single sentence representation or multiple granularity representations for
-matching. However, such methods cannot well capture the contextualized local
-information in the matching process. To tackle this problem, we present a new
-deep architecture to match two sentences with multiple positional sentence
-representations. Specifically, each positional sentence representation is a
-sentence representation at that position, generated by a bidirectional long
-short-term memory (Bi-LSTM). The matching score is finally produced by
-aggregating interactions between these different positional sentence
-representations, through $k$-Max pooling and a multi-layer perceptron. Our
-model has several advantages: (1) By using Bi-LSTM, rich context of the whole
-sentence is leveraged to capture the contextualized local information in each
-positional sentence representation; (2) By matching with multiple positional
-sentence representations, the model can flexibly aggregate different important
-contextualized local information in a sentence to support the matching; (3)
-Experiments on different tasks such as question answering and sentence
-completion demonstrate the superiority of our model.
-"
-2285,1511.08299,"Matthew Long, Aditya Jami, Ashutosh Saxena",Hierarchical classification of e-commerce related social media,cs.SI cs.CL cs.IR cs.LG," In this paper, we attempt to classify tweets into root categories of the
-Amazon browse node hierarchy using a set of tweets with browse node ID labels,
-a much larger set of tweets without labels, and a set of Amazon reviews.
-Examining Twitter data presents unique challenges in that the samples are short
-(under 140 characters) and often contain misspellings or abbreviations that are
-trivial for a human to decipher but difficult for a computer to parse.
A
-variety of query and document expansion techniques are implemented in an effort
-to improve information retrieval, with modest success.
-"
-2286,1511.08308,Jason P.C. Chiu and Eric Nichols,Named Entity Recognition with Bidirectional LSTM-CNNs,cs.CL cs.LG cs.NE," Named entity recognition is a challenging task that has traditionally
-required large amounts of knowledge in the form of feature engineering and
-lexicons to achieve high performance. In this paper, we present a novel neural
-network architecture that automatically detects word- and character-level
-features using a hybrid bidirectional LSTM and CNN architecture, eliminating
-the need for most feature engineering. We also propose a novel method of
-encoding partial lexicon matches in neural networks and compare it to existing
-approaches. Extensive evaluation shows that, given only tokenized text and
-publicly available word embeddings, our system is competitive on the CoNLL-2003
-dataset and surpasses the previously reported state-of-the-art performance on
-the OntoNotes 5.0 dataset by 2.13 F1 points. By using two lexicons constructed
-from publicly-available sources, we establish new state-of-the-art performance
-with an F1 score of 91.62 on CoNLL-2003 and 86.28 on OntoNotes, surpassing
-systems that employ heavy feature engineering, proprietary lexicons, and rich
-entity linking information.
-"
-2287,1511.08400,"David Krueger, Roland Memisevic",Regularizing RNNs by Stabilizing Activations,cs.NE cs.CL cs.LG stat.ML," We stabilize the activations of Recurrent Neural Networks (RNNs) by
-penalizing the squared distance between successive hidden states' norms.
- This penalty term is an effective regularizer for RNNs including LSTMs and
-IRNNs, improving performance on character-level language modeling and phoneme
-recognition, and outperforming weight noise and dropout.
- We achieve competitive performance (18.6\% PER) on the TIMIT phoneme
-recognition task for RNNs evaluated without beam search or an RNN transducer.
- With this penalty term, IRNN can achieve similar performance to LSTM on
-language modeling, although adding the penalty term to the LSTM results in
-superior performance.
- Our penalty term also prevents the exponential growth of IRNN's activations
-outside of their training horizon, allowing them to generalize to much longer
-sequences.
-"
-2288,1511.08407,"Ran Tian, Naoaki Okazaki, Kentaro Inui",The Mechanism of Additive Composition,cs.CL cs.LG," Additive composition (Foltz et al., 1998; Landauer and Dumais, 1997; Mitchell
-and Lapata, 2010) is a widely used method for computing meanings of phrases,
-which takes the average of vector representations of the constituent words. In
-this article, we prove an upper bound for the bias of additive composition,
-which is the first theoretical analysis of compositional frameworks from a
-machine learning point of view. The bound is written in terms of collocation
-strength; we prove that the more exclusively two successive words tend to occur
-together, the more accurately one can guarantee their additive composition as
-an approximation to the natural phrase vector. Our proof relies on properties
-of natural language data that are empirically verified, and can be
-theoretically derived from an assumption that the data is generated from a
-Hierarchical Pitman-Yor Process.
The theory endorses additive composition as a reasonable
-operation for calculating meanings of phrases, and suggests ways to improve
-additive compositionality, including: transforming entries of distributional
-word vectors by a function that meets a specific condition, constructing a
-novel type of vector representations to make additive composition sensitive to
-word order, and utilizing singular value decomposition to train word vectors.
-"
-2289,1511.08411,"Mostafa Bayomi, Killian Levacher, M. Rami Ghorab, S\'eamus Lawless","OntoSeg: a Novel Approach to Text Segmentation using Ontological
- Similarity",cs.CL," Text segmentation (TS) aims at dividing long text into coherent segments
-which reflect the subtopic structure of the text. It is beneficial to many
-natural language processing tasks, such as Information Retrieval (IR) and
-document summarisation. Current approaches to text segmentation are similar in
-that they all use word-frequency metrics to measure the similarity between two
-regions of text, so that a document is segmented based on the lexical cohesion
-between its words. Various NLP tasks are now moving towards the semantic web
-and ontologies, such as ontology-based IR systems, to capture the
-conceptualizations associated with user needs and contents. Text segmentation
-based on lexical cohesion between words is hence no longer sufficient for
-such tasks. This paper proposes OntoSeg, a novel approach to text segmentation
-based on the ontological similarity between text blocks. The proposed method
-uses ontological similarity to explore conceptual relations between text
-segments and a Hierarchical Agglomerative Clustering (HAC) algorithm to
-represent the text as a tree-like hierarchy that is conceptually structured.
-The rich structure of the created tree further allows the segmentation of text
-in a linear fashion at various levels of granularity. The proposed method was
-evaluated on a well-known dataset, and the results show that using ontological
-similarity in text segmentation is very promising. We also enhance the proposed
-method by combining ontological similarity with lexical similarity, and the
-results show an improvement in segmentation quality.
-"
-2290,1511.08417,"Ziqiang Cao, Chengyao Chen, Wenjie Li, Sujian Li, Furu Wei, Ming Zhou",TGSum: Build Tweet Guided Multi-Document Summarization Dataset,cs.IR cs.CL," The development of summarization research has been significantly hampered by
-the costly acquisition of reference summaries. This paper proposes an effective
-way to automatically collect large-scale sets of news-related multi-document
-summaries with reference to social media's reactions. We utilize two types of
-social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to
-cluster documents into different topic sets. Also, a tweet with a hyper-link
-often highlights certain key points of the corresponding document. We
-synthesize a linked document cluster to form a reference summary which can
-cover most key points. To this aim, we adopt the ROUGE metrics to measure the
-coverage ratio, and develop an Integer Linear Programming solution to discover
-the sentence set reaching the upper bound of ROUGE. Since we allow summary
-sentences to be selected from both documents and high-quality tweets, the
-generated reference summaries could be abstractive. Both informativeness and
-readability of the collected summaries are verified by manual judgment.
In
-addition, we train a Support Vector Regression summarizer on DUC generic
-multi-document summarization benchmarks. With the collected data as an extra
-training resource, the performance of the summarizer improves substantially on
-all the test sets. We release this dataset for further research.
-"
-2291,1511.08629,"Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C.M. Lau",Category Enhanced Word Embedding,cs.CL," Distributed word representations have been demonstrated to be effective in
-capturing semantic and syntactic regularities. Unsupervised representation
-learning from large unlabeled corpora can learn similar representations for
-those words that present similar co-occurrence statistics. Besides local
-occurrence statistics, global topical information is also important knowledge
-that may help discriminate one word from another. In this paper, we incorporate
-category information of documents into the learning of word representations and
-learn the proposed models in a document-wise manner. Our models outperform
-several state-of-the-art models in word analogy and word similarity tasks.
-Moreover, we evaluate the learned word vectors on sentiment analysis and text
-classification tasks, which confirms the superiority of our learned word
-vectors. We also learn high-quality category embeddings that reflect topical
-meanings.
-"
-2292,1511.08630,"Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C.M. Lau",A C-LSTM Neural Network for Text Classification,cs.CL," Neural network models have been demonstrated to be capable of achieving
-remarkable performance in sentence and document modeling. Convolutional neural
-network (CNN) and recurrent neural network (RNN) are two mainstream
-architectures for such modeling tasks, which adopt totally different ways of
-understanding natural languages. In this work, we combine the strengths of both
-architectures and propose a novel and unified model called C-LSTM for sentence
-representation and text classification. C-LSTM utilizes a CNN to extract a
-sequence of higher-level phrase representations, which are then fed into a long
-short-term memory recurrent neural network (LSTM) to obtain the sentence
-representation. C-LSTM is able to capture both local features of phrases and
-global and temporal sentence semantics. We evaluate the proposed architecture
-on sentiment classification and question classification tasks. The experimental
-results show that the C-LSTM outperforms both CNN and LSTM and can achieve
-excellent performance on these tasks.
-"
-2293,1511.08855,Francisco De Sousa Webber,Semantic Folding Theory And its Application in Semantic Fingerprinting,cs.AI cs.CL q-bio.NC," Human language has been recognized as a very complex domain for decades. No
-computer system has been able to reach human levels of performance so far. The
-only known computational system capable of proper language processing is the
-human brain. While we gather more and more data about the brain, its
-fundamental computational processes still remain obscure. The lack of a sound
-computational brain theory also prevents a fundamental understanding of
-Natural Language Processing. As always when science lacks a theoretical
-foundation, statistical modeling is applied to accommodate as much sampled
-real-world data as possible. An unsolved fundamental issue is the actual
-representation of language (data) within the brain, denoted as the
-Representational Problem.
Starting with Jeff Hawkins' Hierarchical Temporal
-Memory (HTM) theory, a consistent computational theory of the human cortex, we
-have developed a corresponding theory of language data representation: the
-Semantic Folding Theory. The process of encoding words into a sparse binary
-representational vector, using a topographic semantic space as a distributional
-reference frame, is called Semantic Folding and is the central topic of this
-document. Semantic Folding describes a method of converting language from
-its symbolic representation (text) into an explicit, semantically grounded
-representation that can be generically processed by Hawkins' HTM networks. As
-it turned out, this change in representation, by itself, can solve many complex
-NLP problems by applying Boolean operators and a generic similarity function
-like the Euclidean distance. Many practical problems of statistical NLP
-systems, like the high cost of computation, the fundamental incongruity of
-precision and recall, the complex tuning procedures, etc., can be elegantly
-overcome by applying Semantic Folding.
-"
-2294,1511.08952,Ndapandula Nakashole,Bootstrapping Ternary Relation Extractors,cs.CL cs.AI," Binary relation extraction methods have been widely studied in recent years.
-However, few methods have been developed for higher n-ary relation extraction.
-One limiting factor is the effort required to generate training data. For
-binary relations, one only has to provide a few dozen pairs of entities per
-relation as training data. For ternary relations (n=3), each training instance
-is a triplet of entities, placing a greater cognitive load on people. For
-example, many people know that Google acquired Youtube but not the dollar
-amount or the date of the acquisition, and many people know that Hillary
-Clinton is married to Bill Clinton but not the location or date of their
-wedding. This makes higher n-ary training data generation a time-consuming
-exercise in searching the Web. We present a resource for training ternary
-relation extractors. This was generated using a minimally supervised yet
-effective approach. We present statistics on the size and the quality of the
-dataset.
-"
-2295,1511.09107,"Panagiotis Stalidis, Maria Giatsoglou, Konstantinos Diamantaras,
- George Sarigiannidis, Konstantinos Ch. Chatzisavvas","Machine Learning Sentiment Prediction based on Hybrid Document
- Representation",cs.CL cs.AI stat.ML," Automated sentiment analysis and opinion mining is a complex process
-concerning the extraction of useful subjective information from text. The
-explosion of user generated content on the Web, especially the fact that
-millions of users, on a daily basis, express their opinions on products and
-services to blogs, wikis, social networks, message boards, etc., renders the
-reliable, automated export of sentiments and opinions from unstructured text
-crucial for several commercial applications. In this paper, we present a novel
-hybrid vectorization approach for textual resources that combines a weighted
-variant of the popular Word2Vec representation (based on Term Frequency-Inverse
-Document Frequency) with a Bag-of-Words representation and a vector of
-lexicon-based sentiment values. The proposed text representation approach is
-assessed through the application of several machine learning classification
-algorithms on a dataset that is used extensively in literature for sentiment
-detection.
The classification accuracy derived through the
-proposed hybrid vectorization approach is higher than when its individual
-components are used for text representation, and comparable to
-state-of-the-art sentiment detection methodologies.
-"
-2296,1511.09128,"Haibing Wu, Yiwei Gu, Shangdi Sun and Xiaodong Gu",Aspect-based Opinion Summarization with Convolutional Neural Networks,cs.CL cs.IR cs.LG," This paper considers Aspect-based Opinion Summarization (AOS) of reviews on
-particular products. To enable real applications, an AOS system needs to
-address two core subtasks, aspect extraction and sentiment classification. Most
-existing approaches to aspect extraction, which use linguistic analysis or
-topic modeling, are general across different products but not precise enough or
-suitable for particular products. Instead we take a less general but more
-precise scheme, directly mapping each review sentence into pre-defined aspects.
-To tackle aspect mapping and sentiment classification, we propose two
-Convolutional Neural Network (CNN) based methods, cascaded CNN and multitask
-CNN. Cascaded CNN contains two levels of convolutional networks. Multiple CNNs
-at level 1 deal with the aspect mapping task, and a single CNN at level 2 deals
-with sentiment classification. Multitask CNN also contains multiple aspect CNNs
-and a sentiment CNN, but different networks share the same word embeddings.
-Experimental results indicate that both cascaded and multitask CNNs outperform
-SVM-based methods by large margins. Multitask CNN generally performs better
-than cascaded CNN.
-"
-2297,1511.09173,"Aiqi Zhang, Ang Li and Tingshao Zhu","Recognizing Temporal Linguistic Expression Pattern of Individual with
- Suicide Risk on Social Media",cs.SI cs.CL," Suicide is a global public health problem. Early detection of individual
-suicide risk plays a key role in suicide prevention. In this paper, we propose
-to look into individual suicide risk through time series analysis of personal
-linguistic expression on social media (Weibo). We examined temporal patterns of
-the linguistic expression of individuals on Chinese social media (Weibo). Then,
-we used such temporal patterns as predictor variables to build classification
-models for estimating levels of individual suicide risk. The results show that
-characteristics of the time-sequence curves of linguistic features, including
-parentheses, auxiliary verbs, personal pronouns and body words, affect the
-prediction of suicide risk most, and that the predictive model achieves an
-accuracy higher than 0.60. This paper confirms the effectiveness of social
-media data in detecting individual suicide risk. Results of this study may be
-insightful for improving the performance of suicide prevention programs.
-"
-2298,1511.09376,"Snigdha Chaturvedi, Shashank Srivastava, Hal Daume III and Chris Dyer",Modeling Dynamic Relationships Between Characters in Literary Novels,cs.CL cs.AI," Studying characters plays a vital role in computationally representing and
-interpreting narratives. Unlike previous work, which has focused on inferring
-character roles, we focus on the problem of modeling their relationships.
-Rather than assuming a fixed relationship for a character pair, we hypothesize
-that relationships are dynamic and temporally evolve with the progress of the
-narrative, and formulate the problem of relationship modeling as a structured
-prediction problem. We propose a semi-supervised framework to learn
-relationship sequences from fully as well as partially labeled data.
We present
-a Markovian model capable of accumulating historical beliefs about the
-relationship and status changes. We use a set of rich linguistic and
-semantically motivated features that incorporate world knowledge to investigate
-the textual content of the narrative. We empirically demonstrate that such a
-framework outperforms competitive baselines.
-"
-2299,1511.09392,"Agnieszka Wo{\l}k, Krzysztof Wo{\l}k, Krzysztof Marasek","Enhancements in statistical spoken language translation by
- de-normalization of ASR results",cs.CL stat.ML," Spoken language translation (SLT) has become very important in an
-increasingly globalized world. Machine translation (MT) for automatic speech
-recognition (ASR) systems is a major challenge of great interest. This research
-investigates automatic sentence segmentation of speech, which is important for
-enriching speech recognition output and for aiding downstream language
-processing. This article focuses on the automatic sentence segmentation of
-speech and improving MT results. We explore the problem of identifying sentence
-boundaries in the transcriptions produced by automatic speech recognition
-systems in the Polish language. We also experiment with reverse normalization
-of the recognized speech samples.
-"
-2300,1511.09460,"Snigdha Chaturvedi, Dan Goldwasser, Hal Daume III","Ask, and shall you receive?: Understanding Desire Fulfillment in Natural
- Language Text",cs.AI cs.CL," The ability to comprehend wishes or desires and their fulfillment is
-important to Natural Language Understanding. This paper introduces the task of
-identifying if a desire expressed by a subject in a given short piece of text
-was fulfilled. We propose various unstructured and structured models that
-capture fulfillment cues such as the subject's emotional state and actions. Our
-experiments with two different datasets demonstrate the importance of
-understanding the narrative and discourse structure to address this task.
-"
-2301,1512.00103,"Dan Gillick, Cliff Brunk, Oriol Vinyals, Amarnag Subramanya",Multilingual Language Processing From Bytes,cs.CL," We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads
-text as bytes and outputs span annotations of the form [start, length, label]
-where start positions, lengths, and labels are separate entries in our
-vocabulary. Because we operate directly on unicode bytes rather than
-language-specific words or characters, we can analyze text in many languages
-with a single model. Due to the small vocabulary size, these multilingual
-models are very compact, but produce results similar to or better than the
-state-of-the-art in Part-of-Speech tagging and Named Entity Recognition that
-use only the provided training datasets (no external data sources). Our models
-are learning ""from scratch"" in that they do not rely on any elements of the
-standard pipeline in Natural Language Processing (including tokenization), and
-thus can run in standalone fashion on raw text.
-"
-2302,1512.00112,"Shashank Srivastava, Snigdha Chaturvedi and Tom Mitchell",Inferring Interpersonal Relations in Narrative Summaries,cs.CL cs.AI cs.SI," Characterizing relationships between people is fundamental for the
-understanding of narratives. In this work, we address the problem of inferring
-the polarity of relationships between people in narrative summaries.
We
-formulate the problem as a joint structured prediction for each narrative, and
-present a model that combines evidence from linguistic and semantic features,
-as well as features based on the structure of the social community in the text.
-We also provide a clustering-based approach that can exploit regularities in
-narrative types, e.g., learn an affinity for love triangles in romantic
-stories. On a dataset of movie summaries from Wikipedia, our structured models
-provide more than a 30% error-reduction over a competitive baseline that
-considers pairs of characters in isolation.
-"
-2303,1512.00170,"Yiming Cui, Conghui Zhu, Xiaoning Zhu, and Tiejun Zhao",Augmenting Phrase Table by Employing Lexicons for Pivot-based SMT,cs.CL," Pivot language is employed as a way to solve the data sparseness problem in
-machine translation, especially when the data for a particular language pair
-does not exist. The combination of source-to-pivot and pivot-to-target
-translation models can induce a new translation model through the pivot
-language. However, the errors in the two models may compound as noise, and
-still, the combined model may suffer from a serious phrase sparsity problem. In
-this paper, we directly employ the word lexical model in IBM models as an
-additional resource to augment the pivot phrase table. In addition, we also
-propose a phrase table pruning method which takes into account both source and
-target phrasal coverage. Experimental results show that our pruning method
-significantly outperforms the conventional one, which only considers
-source-side phrasal coverage. Furthermore, by including the entries in the
-lexicon model, the phrase coverage increased, and we achieved improved results
-in Chinese-to-Japanese translation using English as the pivot language.
-"
-2304,1512.00177,"Yiming Cui, Shijin Wang, Jianfeng Li",LSTM Neural Reordering Feature for Statistical Machine Translation,cs.CL cs.AI cs.NE," Artificial neural networks are powerful models, which have been widely
-applied to many aspects of machine translation, such as language modeling and
-translation modeling. Though notable improvements have been made in these
-areas, the reordering problem still remains a challenge in statistical machine
-translation. In this paper, we present a novel neural reordering model that
-directly models word pairs and alignment. By utilizing LSTM recurrent neural
-networks, much longer context can be learned for reordering prediction.
-Experimental results on the NIST OpenMT12 Arabic-English and Chinese-English
-1000-best rescoring tasks show that our LSTM neural reordering feature is
-robust and achieves significant improvements over various baseline systems.
-"
-2305,1512.00531,"Andrew J. Reagan and Brian Tivnan and Jake Ryland Williams and
- Christopher M. Danforth and Peter Sheridan Dodds","Benchmarking sentiment analysis methods for large-scale texts: A case
- for using continuum-scored words and word shift graphs",cs.CL," The emergence and global adoption of social media has rendered possible the
-real-time estimation of population-scale sentiment, bearing profound
-implications for our understanding of human behavior. Given the growing
-assortment of sentiment measuring instruments, comparisons between them are
-evidently required. Here, we perform detailed tests of 6 dictionary-based
-methods applied to 4 different corpora, and briefly examine a further 20
-methods.
We show that a dictionary-based method will only perform both reliably
-and meaningfully if (1) the dictionary covers a sufficiently large portion of a
-given text's lexicon when weighted by word usage frequency; and (2) words are
-scored on a continuous scale.
-"
-2306,1512.00576,Derwin Suhartono,"Probabilistic Latent Semantic Analysis (PLSA) untuk Klasifikasi Dokumen
- Teks Berbahasa Indonesia",cs.CL cs.IR," One task that is included in managing documents is how to find substantial
-information inside. Topic modeling is a technique that has been developed to
-produce document representation in the form of keywords. The keywords will be
-used in the indexing process and document retrieval as needed by users. In this
-research, we discuss Probabilistic Latent Semantic Analysis (PLSA)
-specifically. We cover the PLSA mechanism, which involves Expectation
-Maximization (EM) as the training algorithm, how to conduct testing, and how to
-obtain the accuracy results.
-"
-2307,1512.00578,Derwin Suhartono,"Klasifikasi Komponen Argumen Secara Otomatis pada Dokumen Teks berbentuk
- Esai Argumentatif",cs.CL cs.IR," By automatically recognizing argument components, essay writers can inspect
-the texts that they have written. This assists the essay scoring process
-objectively and precisely, because the essay grader is able to see how well the
-argument components are constructed. Some researchers have tried to do
-argument detection and classification along with its implementation in some
-domains. The common approach is to extract features from the text.
-Generally, the features are structural, lexical, syntactic, indicator, and
-contextual. In this research, we add a new feature to the existing features. It
-adopts the keyword list of Knott and Dale (1993). The experimental results show
-that the argument classification achieves 72.45% accuracy. Moreover, we still
-get the same accuracy without the keyword list, which indicates that the
-keyword list does not affect the features significantly. All features are still
-weak at classifying major claims and claims, so we need other features which
-are useful for differentiating those two kinds of argument components.
-"
-2308,1512.00728,"Philip Massey, Patrick Xia, David Bamman and Noah A. Smith",Annotating Character Relationships in Literary Texts,cs.CL," We present a dataset of manually annotated relationships between characters
-in literary texts, in order to support the training and evaluation of automatic
-methods for relation type prediction in this domain (Makazhanov et al., 2014;
-Kokkinakis, 2013) and the broader computational analysis of literary character
-(Elson et al., 2010; Bamman et al., 2014; Vala et al., 2015; Flekova and
-Gurevych, 2015). In this work, we solicit annotations from workers on Amazon
-Mechanical Turk for 109 texts ranging from Homer's _Iliad_ to Joyce's _Ulysses_
-on four dimensions of interest: for a given pair of characters, we collect
-judgments as to the coarse-grained category (professional, social, familial),
-fine-grained category (friend, lover, parent, rival, employer), and affinity
-(positive, negative, neutral) that describes their primary relationship in a
-text. We do not assume that this relationship is static; we also collect
-judgments as to whether it changes at any point in the course of the text.
-"
-2309,1512.00765,"Cedric De Boom, Steven Van Canneyt, Steven Bohez, Thomas Demeester,
- Bart Dhoedt",Learning Semantic Similarity for Very Short Texts,cs.IR cs.CL," Leveraging data on social media, such as Twitter and Facebook, requires
-information retrieval algorithms to be able to relate very short text fragments
-to each other. Traditional text similarity methods such as tf-idf
-cosine-similarity, based on word overlap, mostly fail to produce good results
-in this case, since word overlap is little or non-existent. Recently,
-distributed word representations, or word embeddings, have been shown to
-successfully allow words to match on the semantic level. In order to pair short
-text fragments - as a concatenation of separate words - an adequate distributed
-sentence representation is needed, in existing literature often obtained by
-naively combining the individual word representations. We therefore
-investigated several text representations as a combination of word embeddings
-in the context of semantic pair matching. This paper investigates the
-effectiveness of several such naive techniques, as well as traditional tf-idf
-similarity, for fragments of different lengths. Our main contribution is a
-first step towards a hybrid method that combines the strength of dense
-distributed representations - as opposed to sparse term matching - with the
-strength of tf-idf based methods to automatically reduce the impact of less
-informative terms. Our new approach outperforms the existing techniques in a
-toy experimental set-up, leading to the conclusion that the combination of word
-embeddings and tf-idf information might lead to a better model for semantic
-content within very short text fragments.
-"
-2310,1512.00818,"Mohamed Elhoseiny, Jingen Liu, Hui Cheng, Harpreet Sawhney, Ahmed
- Elgammal","Zero-Shot Event Detection by Multimodal Distributional Semantic
- Embedding of Videos",cs.CV cs.CL cs.LG," We propose a new zero-shot Event Detection method by Multi-modal
-Distributional Semantic embedding of videos. Our model embeds object and action
-concepts as well as other available modalities from videos into a
-distributional semantic space. To our knowledge, this is the first Zero-Shot
-event detection model that is built on top of distributional semantics and
-extends it in the following directions: (a) semantic embedding of multimodal
-information in videos (with focus on the visual modalities), (b) automatically
-determining relevance of concepts/attributes to a free text query, which could
-be useful for other applications, and (c) retrieving videos by free text event
-query (e.g., ""changing a vehicle tire"") based on their content. We embed videos
-into a distributional semantic space and then measure the similarity between
-videos and the event query in a free text form. We validated our method on the
-large TRECVID MED (Multimedia Event Detection) challenge. Using only the event
-title as a query, our method outperformed the state-of-the-art that uses big
-descriptions, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83.
-It is also an order of magnitude faster.
-"
-2311,1512.00965,"Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao",Neural Enquirer: Learning to Query Tables with Natural Language,cs.AI cs.CL cs.LG cs.NE," We propose Neural Enquirer as a neural network architecture to execute a
-natural language (NL) query on a knowledge-base (KB) for answers.
Basically,
-Neural Enquirer finds the distributed representation of a query and then
-executes it on knowledge-base tables to obtain the answer as one of the values
-in the tables. Unlike similar efforts in end-to-end training of semantic
-parsers, Neural Enquirer is fully ""neuralized"": it not only gives a
-distributional representation of the query and the knowledge-base, but also
-realizes the execution of compositional queries as a series of differentiable
-operations, with intermediate results (consisting of annotations of the tables
-at different levels) saved on multiple layers of memory. Neural Enquirer can be
-trained with gradient descent, with which not only the parameters of the
-controlling components and semantic parsing component, but also the embeddings
-of the tables and query words can be learned from scratch. The training can be
-done in an end-to-end fashion, but it can take stronger guidance, e.g., the
-step-by-step supervision for complicated queries, and benefit from it. Neural
-Enquirer is one step towards building neural network systems which seek to
-understand language by executing it in the real world. Our experiments show
-that Neural Enquirer can learn to execute fairly complicated NL queries on
-tables with rich structures.
-"
-2312,1512.01043,"Harsh Thakkar, Dhiren Patel",Approaches for Sentiment Analysis on Twitter: A State-of-Art study,cs.SI cs.CL cs.IR," Microblogging is an extremely prevalent broadcast medium amidst the Internet
-fraternity these days. People share their opinions and sentiments about a
-variety of subjects like products, news, institutions, etc., every day on
-microblogging websites. Sentiment analysis plays a key role in prediction
-systems, opinion mining systems, etc. Twitter, one of the microblogging
-platforms, allows a limit of 140 characters to its users. This restriction
-stimulates users to be very concise about their opinion and makes Twitter an
-ocean of sentiments to analyze. Twitter also provides a developer-friendly
-streaming API for data retrieval purposes, allowing the analyst to search
-real-time tweets from various users. In this paper, we discuss the state of the
-art of works which are focused on Twitter, the online social network platform,
-for sentiment analysis. We survey various lexical, machine learning and hybrid
-approaches for sentiment analysis on Twitter.
-"
-2313,1512.01100,"Duyu Tang, Bing Qin, Xiaocheng Feng, Ting Liu",Effective LSTMs for Target-Dependent Sentiment Classification,cs.CL," Target-dependent sentiment classification remains a challenge: modeling the
-semantic relatedness of a target with its context words in a sentence.
-Different context words have different influences on determining the sentiment
-polarity of a sentence towards the target. Therefore, it is desirable to
-integrate the connections between target word and context words when building a
-learning system. In this paper, we develop two target-dependent long short-term
-memory (LSTM) models, where target information is automatically taken into
-account. We evaluate our methods on a benchmark dataset from Twitter. Empirical
-results show that modeling sentence representation with standard LSTM does not
-perform well. Incorporating target information into LSTM can significantly
-boost the classification accuracy. The target-dependent LSTM models achieve
-state-of-the-art performance without using a syntactic parser or external
-sentiment lexicons.
-"
-2314,1512.01173,"Jiaxin Shi, Jun Zhu","Building Memory with Concept Learning Capabilities from Large-scale
- Knowledge Base",cs.CL cs.AI cs.LG," We present a new perspective on neural knowledge base (KB) embeddings, from
-which we build a framework that can model symbolic knowledge in the KB together
-with its learning process. We show that this framework well regularizes the
-previous neural KB embedding model for superior performance in reasoning tasks,
-while having the capability of dealing with unseen entities, that is, of
-learning their embeddings from natural language descriptions, which is very
-much like humans' behavior of learning semantic concepts.
-"
-2315,1512.01283,Vivek Datla and Abhinav Vishnu,"Predicting the top and bottom ranks of billboard songs using Machine
- Learning",cs.CL cs.LG," The music industry is a $130 billion industry. Predicting whether a song
-catches the pulse of the audience impacts the industry. In this paper we
-analyze the language inside the lyrics of the songs using several computational
-linguistic algorithms and predict whether a song would make it to the top or
-bottom of the billboard rankings based on the language features. We trained and
-tested an SVM classifier with a radial kernel function on the linguistic
-features. Results indicate that we can classify whether a song belongs to the
-top or bottom of the billboard charts with a precision of 0.76.
-"
-2316,1512.01337,"Jun Yin, Xin Jiang, Zhengdong Lu, Lifeng Shang, Hang Li, Xiaoming Li",Neural Generative Question Answering,cs.CL," This paper presents an end-to-end neural network model, named Neural
-Generative Question Answering (GENQA), that can generate answers to simple
-factoid questions, based on the facts in a knowledge-base. More specifically,
-the model is built on the encoder-decoder framework for sequence-to-sequence
-learning, while equipped with the ability to enquire the knowledge-base, and is
-trained on a corpus of question-answer pairs, with their associated triples in
-the knowledge-base. Empirical study shows the proposed model can effectively
-deal with the variations of questions and answers, and generate correct and
-natural answers by referring to the facts in the knowledge-base. The experiment
-on question answering demonstrates that the proposed model can outperform an
-embedding-based QA model as well as a neural dialogue model trained on the same
-data.
-"
-2317,1512.01370,"Yantao Jia, Yuanzhuo Wang, Hailun Lin, Xiaolong Jin, Xueqi Cheng",Locally Adaptive Translation for Knowledge Graph Embedding,cs.AI cs.CL," Knowledge graph embedding aims to represent entities and relations in a
-large-scale knowledge graph as elements in a continuous vector space. Existing
-methods, e.g., TransE and TransH, learn embedding representation by defining a
-global margin-based loss function over the data. However, the optimal loss
-function is determined during experiments whose parameters are examined among a
-closed set of candidates. Moreover, embeddings over two knowledge graphs with
-different entities and relations share the same set of candidate loss
-functions, ignoring the locality of both graphs. This leads to the limited
-performance of embedding related applications. In this paper, we propose a
-locally adaptive translation method for knowledge graph embedding, called
-TransA, to find the optimal loss function by adaptively determining its margin
-over different knowledge graphs.
Experiments on two benchmark data sets
-demonstrate the superiority of the proposed method, as compared to
-state-of-the-art ones.
-"
-2318,1512.01384,"Henrique F. de Arruda, Luciano da F. Costa and Diego R. Amancio",Topic segmentation via community detection in complex networks,cs.CL cs.SI," Many real systems have been modelled in terms of network concepts, and
-written texts are a particular example of information networks. In recent
-years, the use of network methods to analyze language has allowed the discovery
-of several interesting findings, including the proposition of novel models to
-explain the emergence of fundamental universal patterns. While syntactical
-networks, one of the most prevalent networked models of written texts, display
-both scale-free and small-world properties, such a representation fails in
-capturing other textual features, such as the organization in topics or
-subjects. In this context, we propose a novel network representation whose main
-purpose is to capture the semantic relationships of words in a simple way. To
-do so, we link all words co-occurring in the same semantic context, which is
-defined in a threefold way. We show that the proposed representation favours
-the emergence of communities of semantically related words, and this feature
-may be used to identify relevant topics. The proposed methodology to detect
-topics was applied to segment selected Wikipedia articles. We have found that,
-in general, our methods outperform traditional bag-of-words representations,
-which suggests that a high-level textual representation may be useful to study
-the semantic features of texts.
-"
-2319,1512.01409,"Mengyun Cao, Jiao Tian, Dezhi Cheng, Jin Liu, Xiaoping Sun",What Makes it Difficult to Understand a Scientific Literature?,cs.CL," In the artificial intelligence area, one of the ultimate goals is to make
-computers understand human language and offer assistance. In order to achieve
-this ideal, researchers of computer science have put forward a lot of models
-and algorithms attempting to enable the machine to analyze and process human
-natural language on different levels of semantics. Although recent progress in
-this field offers much hope, we still have to ask whether current research can
-provide assistance that people really desire in reading and comprehension. To
-this end, we conducted a reading comprehension test on two scientific papers
-which are written in different styles. We use the semantic link models to
-analyze the understanding obstacles that people will face in the process of
-reading and figure out what makes it difficult for humans to understand
-scientific literature. Through such analysis, we summarized some
-characteristics and problems, reflected by people with different levels of
-knowledge, in the comprehension of difficult science and technology literature,
-which can be modeled in a semantic link network. We believe that these
-characteristics and problems will help us re-examine the existing machine
-models and are helpful in the design of new ones.
-"
-2320,1512.01525,"Yezhou Yang and Yiannis Aloimonos and Cornelia Fermuller and Eren
- Erdal Aksoy",Learning the Semantics of Manipulation Action,cs.RO cs.CL cs.CV," In this paper we present a formal computational framework for modeling
-manipulation actions.
The introduced formalism leads to a semantics of
-manipulation action and has applications to both observing and understanding
-human manipulation actions as well as executing them with a robotic mechanism
-(e.g. a humanoid robot). It is based on a Combinatory Categorial Grammar. The
-goal of the introduced framework is to: (1) represent manipulation actions with
-both syntax and semantic parts, where the semantic part employs
-$\lambda$-calculus; (2) enable a probabilistic semantic parsing schema to learn
-the $\lambda$-calculus representation of manipulation action from an annotated
-action corpus of videos; (3) use (1) and (2) to develop a system that visually
-observes manipulation actions and understands their meaning while it can reason
-beyond observations using propositional logic and axiom schemata. The
-experiments conducted on a publicly available large manipulation action dataset
-validate the theoretical framework and our implementation.
-"
-2321,1512.01587,"Sahil Garg, Aram Galstyan, Ulf Hermjakob, and Daniel Marcu","Extracting Biomolecular Interactions Using Semantic Parsing of
- Biomedical Text",cs.CL cs.AI cs.IR cs.IT cs.LG math.IT," We advance the state of the art in biomolecular interaction extraction with
-three contributions: (i) We show that deep, Abstract Meaning Representations
-(AMR) significantly improve the accuracy of a biomolecular interaction
-extraction system when compared to a baseline that relies solely on surface-
-and syntax-based features; (ii) In contrast with previous approaches that infer
-relations on a sentence-by-sentence basis, we expand our framework to enable
-consistent predictions over sets of sentences (documents); (iii) We further
-modify and expand a graph kernel learning framework to enable concurrent
-exploitation of automatically induced AMR (semantic) and dependency structure
-(syntactic) representations. Our experiments show that our approach yields
-interaction extraction systems that are more robust in environments where there
-is a significant mismatch between training and test conditions.
-"
-2322,1512.01639,"Krzysztof Wo{\l}k, Krzysztof Marasek","PJAIT Systems for the IWSLT 2015 Evaluation Campaign Enhanced by
- Comparable Corpora",cs.CL stat.ML," In this paper, we attempt to improve Statistical Machine Translation (SMT)
-systems on a very diverse set of language pairs (in both directions): Czech -
-English, Vietnamese - English, French - English and German - English. To
-accomplish this, we performed translation model training, created adaptations
-of training settings for each language pair, and obtained comparable corpora
-for our SMT systems. Innovative tools and data adaptation techniques were
-employed. The TED parallel text corpora for the IWSLT 2015 evaluation campaign
-were used to train language models, and to develop, tune, and test the system.
-In addition, we prepared Wikipedia-based comparable corpora for use with our
-SMT system. This data was specified as permissible for the IWSLT 2015
-evaluation. We explored the use of domain adaptation techniques, symmetrized
-word alignment models, unsupervised transliteration models and the KenLM
-language modeling tool. To evaluate the effects of different preparations on
-translation results, we conducted experiments and used the BLEU, NIST and TER
-metrics. Our results indicate that our approach produced a positive impact on
-SMT quality.
-"
-2323,1512.01641,"Krzysztof Wo{\l}k, Krzysztof Marasek","Unsupervised comparable corpora preparation and exploration for
- bi-lingual translation equivalents",cs.CL stat.ML," The multilingual nature of the world makes translation a crucial requirement
-today. Parallel dictionaries constructed by humans are a widely-available
-resource, but they are limited and do not provide enough coverage for good
-quality translation purposes, due to out-of-vocabulary words and neologisms.
-This motivates the use of statistical translation systems, which are
-unfortunately dependent on the quantity and quality of training data. Such
-systems have very limited availability, especially for some languages and very
-narrow text domains. In this research we present our improvements to current
-comparable corpora mining methodologies by re-implementation of the comparison
-algorithms (using the Needleman-Wunsch algorithm), introduction of a tuning
-script and computation time improvement by GPU acceleration. Experiments are
-carried out on bilingual data extracted from the Wikipedia, on various domains.
-For the Wikipedia itself, additional cross-lingual comparison heuristics were
-introduced. The modifications made a positive impact on the quality and
-quantity of mined data and on the translation quality.
-"
-2324,1512.01712,Konstantin Lopyrev,Generating News Headlines with Recurrent Neural Networks,cs.CL cs.LG cs.NE," We describe an application of an encoder-decoder recurrent neural network
-with LSTM units and attention to generating headlines from the text of news
-articles. We find that the model is quite effective at concisely paraphrasing
-news articles. Furthermore, we study how the neural network decides which input
-words to pay attention to, and specifically we identify the function of the
-different neurons in a simplified attention mechanism. Interestingly, our
-simplified attention mechanism performs better than the more complex attention
-mechanism on a held-out set of articles.
-"
-2325,1512.01768,"Danish, Yogesh Dahiya, Partha Talukdar",Want Answers? A Reddit Inspired Study on How to Pose Questions,cs.CL," Questions form an integral part of our everyday communication, both offline
-and online. Getting responses to our questions from others is fundamental to
-satisfying our information need and in extending our knowledge boundaries. A
-question may be represented using various factors such as social, syntactic,
-semantic, etc. We hypothesize that these factors contribute with varying
-degrees towards getting responses from others for a given question. We perform
-a thorough empirical study to measure the effects of these factors using a
-novel question and answer dataset from the website Reddit.com. To the best of
-our knowledge, this is the first analysis of its kind on this important topic.
-We also use a sparse nonnegative matrix factorization technique to
-automatically induce interpretable semantic factors from the question dataset.
-We also document various patterns in response prediction that we observe during
-our analysis of the data. For instance, we found that preference-probing
-questions are scantily answered. Our method is robust in capturing such latent
-response factors. We hope to make our code and datasets publicly available upon
-publication of the paper.
-"
-2326,1512.01818,"Filipe Nunes Ribeiro, Matheus Ara\'ujo, Pollyanna Gon\c{c}alves,
- Fabr\'icio Benevenuto, Marcos Andr\'e Gon\c{c}alves","SentiBench - a benchmark comparison of state-of-the-practice sentiment
- analysis methods",cs.CL cs.SI," In the last few years thousands of scientific papers have investigated
-sentiment analysis, several startups that measure opinions on real data have
-emerged and a number of innovative products related to this theme have been
-developed. There are multiple methods for measuring sentiments, including
-lexical-based and supervised machine learning methods. Despite the vast
-interest in the theme and wide popularity of some methods, it is unclear which
-one is better for identifying the polarity (i.e., positive or negative) of a
-message. Accordingly, there is a strong need to conduct a thorough
-apples-to-apples comparison of sentiment analysis methods, \textit{as they are
-used in practice}, across multiple datasets originating from different data
-sources. Such a comparison is key for understanding the potential limitations,
-advantages, and disadvantages of popular methods. This article aims at filling
-this gap by presenting a benchmark comparison of twenty-four popular sentiment
-analysis methods (which we call the state-of-the-practice methods). Our
-evaluation is based on a benchmark of eighteen labeled datasets, covering
-messages posted on social networks, movie and product reviews, as well as
-opinions and comments in news articles. Our results highlight the extent to
-which the prediction performance of these methods varies considerably across
-datasets. Aiming at boosting the development of this research area, we open the
-methods' codes and datasets used in this article, deploying them in a benchmark
-system, which provides an open API for accessing and comparing sentence-level
-sentiment analysis methods.
-"
-2327,1512.01882,Dong Wang and Xuewei Zhang,THCHS-30 : A Free Chinese Speech Corpus,cs.CL cs.SD," Speech data is crucially important for speech recognition research. There
-are quite a few speech databases that can be purchased at prices that are
-reasonable for most research institutes. However, for young people who just
-start research activities or those who just gain initial interest in this
-direction, the cost of data is still an annoying barrier. We support the `free
-data' movement in speech recognition: research institutes (particularly those
-supported by public funds) publish their data freely so that new researchers
-can obtain sufficient data to kick off their careers. In this paper, we follow
-this trend and release a free Chinese speech database THCHS-30 that can be used
-to build a full-fledged Chinese speech recognition system. We report the
-baseline system established with this database, including the performance under
-highly noisy conditions.
-"
-2328,1512.01926,Kamil Rocki,Thinking Required,cs.LG cs.AI cs.CL," There exists a theory of a single general-purpose learning algorithm which
-could explain the principles of its operation. It assumes an initial rough
-architecture, a small library of simple innate circuits which are prewired at
-birth, and proposes that all significant mental algorithms are learned. Given
-current understanding and observations, this paper reviews and lists the
-ingredients of such an algorithm from architectural and functional
-perspectives.
-"
-2329,1512.02009,"Bei Chen, Jun Zhu, Nan Yang, Tian Tian, Ming Zhou, Bo Zhang",Jointly Modeling Topics and Intents with Global Order Structure,cs.CL cs.IR cs.LG," Modeling document structure is of great importance for discourse analysis and
-related applications. The goal of this research is to capture the document
-intent structure by modeling documents as a mixture of topic words and
-rhetorical words. While the topics are relatively unchanged through one
-document, the rhetorical functions of sentences usually change following
-certain orders in discourse. We propose GMM-LDA, a topic modeling based
-Bayesian unsupervised model, to analyze the document intent structure together
-with order information. Our model is flexible in that it has the ability to
-combine the annotations and do supervised learning. Additionally, entropic
-regularization can be introduced to model the significant divergence between
-topics and intents. We perform experiments in both unsupervised and supervised
-settings, and the results show the superiority of our model over several
-state-of-the-art baselines.
-"
-2330,1512.02167,"Bolei Zhou and Yuandong Tian and Sainbayar Sukhbaatar and Arthur Szlam
- and Rob Fergus",Simple Baseline for Visual Question Answering,cs.CV cs.CL," We describe a very simple bag-of-words baseline for visual question
-answering. This baseline concatenates the word features from the question and
-CNN features from the image to predict the answer. When evaluated on the
-challenging VQA dataset [2], it shows comparable performance to many recent
-approaches using recurrent neural networks. To explore the strength and
-weakness of the trained model, we also provide an interactive web demo and
-open-source code.
-"
-2331,1512.02433,"Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and
- Yang Liu",Minimum Risk Training for Neural Machine Translation,cs.CL," We propose minimum risk training for end-to-end neural machine translation.
-Unlike conventional maximum likelihood estimation, minimum risk training is
-capable of optimizing model parameters directly with respect to arbitrary
-evaluation metrics, which are not necessarily differentiable. Experiments show
-that our approach achieves significant improvements over maximum likelihood
-estimation on a state-of-the-art neural machine translation system across
-various language pairs. Transparent to architectures, our approach can be
-applied to more neural networks and potentially benefit more NLP tasks.
-"
-2332,1512.02567,Mojtaba Hajiabadi,"Distributed Adaptive LMF Algorithm for Sparse Parameter Estimation in
- Gaussian Mixture Noise",cs.IT cs.CL math.IT," A distributed adaptive algorithm for estimation of sparse unknown parameters
-in the presence of non-Gaussian noise is proposed in this paper, based on the
-normalized least mean fourth (NLMF) criterion. In the first step, the local
-adaptive NLMF algorithm is modified by a zero norm in order to speed up the
-convergence rate and also to reduce the steady state error power in sparse
-conditions. Then, the proposed algorithm is extended to the distributed
-scenario, in which more improvement in estimation performance is achieved due
-to the cooperation of local adaptive filters. Simulation results show the
-superiority of the proposed algorithm in comparison with conventional NLMF
-algorithms.
-"
-2333,1512.02595,"Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared
- Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg
- Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han,
- Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew
- Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David
- Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani
- Yogatama, Jun Zhan, Zhenyao Zhu",Deep Speech 2: End-to-End Speech Recognition in English and Mandarin,cs.CL," We show that an end-to-end deep learning approach can be used to recognize
-either English or Mandarin Chinese speech--two vastly different languages.
-Because it replaces entire pipelines of hand-engineered components with neural
-networks, end-to-end learning allows us to handle a diverse variety of speech
-including noisy environments, accents and different languages. Key to our
-approach is our application of HPC techniques, resulting in a 7x speedup over
-our previous system. Because of this efficiency, experiments that previously
-took weeks now run in days. This enables us to iterate more quickly to identify
-superior architectures and algorithms. As a result, in several cases, our
-system is competitive with the transcription of human workers when benchmarked
-on standard datasets. Finally, using a technique called Batch Dispatch with
-GPUs in the data center, we show that our system can be inexpensively deployed
-in an online setting, delivering low latency when serving users at scale.
-"
-2334,1512.02902,"Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba,
- Raquel Urtasun, Sanja Fidler",MovieQA: Understanding Stories in Movies through Question-Answering,cs.CV cs.CL," We introduce the MovieQA dataset which aims to evaluate automatic story
-comprehension from both video and text. The dataset consists of 14,944
-questions about 408 movies with high semantic diversity. The questions range
-from simpler ""Who"" did ""What"" to ""Whom"", to ""Why"" and ""How"" certain events
-occurred. Each question comes with a set of five possible answers; a correct
-one and four deceiving answers provided by human annotators. Our dataset is
-unique in that it contains multiple sources of information -- video clips,
-plots, subtitles, scripts, and DVS. We analyze our data through various
-statistics and methods. We further extend existing QA techniques to show that
-question-answering with such open-ended semantics is hard. We make this data
-set public along with an evaluation benchmark to encourage inspiring work in
-this challenging domain.
-"
-2335,1512.03460,Yezhou Yang and Yi Li and Cornelia Fermuller and Yiannis Aloimonos,"Neural Self Talk: Image Understanding via Continuous Questioning and
- Answering",cs.CV cs.CL cs.RO," In this paper we consider the problem of continuously discovering image
-contents by actively asking image-based questions and subsequently answering
-the questions being asked. The key components include a Visual Question
-Generation (VQG) module and a Visual Question Answering (VQA) module, in which
-Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) are
-used. Given a dataset that contains images, questions and their answers, both
-modules are trained at the same time, with the difference being VQG uses the
-images as input and the corresponding questions as output, while VQA uses
-images and questions as input and the corresponding answers as output.
We
-evaluate the self talk process subjectively using Amazon Mechanical Turk, which
-shows the effectiveness of the proposed method.
-"
-2336,1512.03465,"Walid Shalaby, Wlodek Zadrozny","Mined Semantic Analysis: A New Concept Space Model for Semantic
- Representation of Textual Data",cs.CL," Mined Semantic Analysis (MSA) is a novel concept space model which employs
-unsupervised learning to generate semantic representations of text. MSA
-represents textual structures (terms, phrases, documents) as a Bag of Concepts
-(BoC) where concepts are derived from concept-rich encyclopedic corpora.
-Traditional concept space models exploit only the target corpus content to
-construct the concept space. MSA, alternatively, uncovers implicit relations
-between concepts by mining for their associations (e.g., mining Wikipedia's
-""See also"" link graph). We evaluate MSA's performance on benchmark datasets for
-measuring semantic relatedness of words and sentences. Empirical results show
-competitive performance of MSA compared to prior state-of-the-art methods.
-Additionally, we introduce the first analytical study to examine the
-statistical significance of results reported by different semantic relatedness
-methods. Our study shows that the nuances of results across top performing
-methods could be statistically insignificant. The study positions MSA as one of
-the state-of-the-art methods for measuring semantic relatedness, besides the
-inherent interpretability and simplicity of the generated semantic
-representation.
-"
-2337,1512.03549,"Pranjal Singh, Amitabha Mukerjee","Words are not Equal: Graded Weighting Model for building Composite
- Document Vectors",cs.CL cs.LG cs.NE," Despite the success of distributional semantics, composing phrases from word
-vectors remains an important challenge. Several methods have been tried for
-benchmark tasks such as sentiment classification, including word vector
-averaging, matrix-vector approaches based on parsing, and on-the-fly learning
-of paragraph vectors. Most models usually omit stop words from the composition.
-Instead of such a yes-no decision, we consider several graded schemes where
-words are weighted according to their discriminatory relevance with respect to
-their use in the document (e.g., idf). Some of these methods (particularly
-tf-idf) are seen to result in a significant improvement in performance over the
-prior state of the art. Further, combining such approaches into an ensemble
-based on alternate classifiers such as the RNN model results in a 1.6%
-performance improvement on the standard IMDB movie review dataset, and a 7.01%
-improvement on Amazon product reviews. Since these are language-free models and
-can be obtained in an unsupervised manner, they are also of interest for
-under-resourced languages such as Hindi and many more languages. We demonstrate
-the language-free aspects by showing a gain of 12% for two review datasets over
-earlier results, and also release a new larger dataset for future testing
-(Singh, 2015).
-"
-2338,1512.03950,Kamal Sarkar,"A Hidden Markov Model Based System for Entity Extraction from Social
- Media English Text at FIRE 2015",cs.CL," This paper presents the experiments carried out by us at Jadavpur University
-as part of the participation in the FIRE 2015 task: Entity Extraction from
-Social Media Text - Indian Languages (ESM-IL).
The tool that we have developed for the
-task is based on a Trigram Hidden Markov Model that utilizes information like
-gazetteer lists, POS tags and some other word-level features to enhance the
-observation probabilities of the known tokens as well as unknown tokens. We
-submitted runs for English only. A statistical HMM (Hidden Markov Model) based
-model has been used to implement our system. The system has been trained and
-tested on the datasets released for the FIRE 2015 task: Entity Extraction from
-Social Media Text - Indian Languages (ESM-IL). Our system is the best performer
-for the English language and it obtains precision, recall and F-measure of
-61.96, 39.46 and 48.21, respectively.
-"
-2339,1512.04092,"Sanket Mehta, Shagun Sodhani",Stack Exchange Tagger,cs.CL cs.LG," The goal of our project is to develop an accurate tagger for questions posted
-on Stack Exchange. Our problem is an instance of the more general problem of
-developing accurate classifiers for large scale text datasets. We are tackling
-the multilabel classification problem where each item (in this case, a
-question) can belong to multiple classes (in this case, tags). We are
-predicting the tags (or keywords) for a particular Stack Exchange post given
-only the question text and the title of the post. In the process, we compare
-the performance of Support Vector Classification (SVC) for different kernel
-functions, loss functions, etc. We found that linear SVC with the
-Crammer-Singer technique produces the best results.
-"
-2340,1512.04280,Liang Lu and Steve Renals,"Small-footprint Deep Neural Networks with Highway Connections for Speech
- Recognition",cs.CL cs.LG cs.NE," For speech recognition, deep neural networks (DNNs) have significantly
-improved the recognition accuracy on most benchmark datasets and application
-domains. However, compared to conventional Gaussian mixture models, DNN-based
-acoustic models usually have a much larger number of model parameters, making
-it challenging to apply them on resource-constrained platforms, e.g., mobile
-devices. In this paper, we study the application of the recently proposed
-highway network to train small-footprint DNNs, which are {\it thinner} and {\it
-deeper}, and have significantly smaller numbers of model parameters compared to
-conventional DNNs. We investigated this approach on the AMI meeting speech
-transcription corpus, which has around 70 hours of audio data. The highway
-neural networks consistently outperformed their plain DNN counterparts, and the
-number of model parameters can be reduced significantly without sacrificing the
-recognition accuracy.
-"
-2341,1512.04407,"Arjun Chandrasekaran, Ashwin K. Vijayakumar, Stanislaw Antol, Mohit
- Bansal, Dhruv Batra, C. Lawrence Zitnick and Devi Parikh",We Are Humor Beings: Understanding and Predicting Visual Humor,cs.CV cs.CL cs.LG," Humor is an integral part of human lives. Despite being tremendously
-impactful, it is perhaps surprising that we do not have a detailed
-understanding of humor yet. As interactions between humans and AI systems
-increase, it is imperative that these systems are taught to understand
-subtleties of human expressions such as humor. In this work, we are interested
-in the question - what content in a scene causes it to be funny? As a first
-step towards understanding visual humor, we analyze the humor manifested in
-abstract scenes and design computational models for them.
We collect two
-datasets of abstract scenes that facilitate the study of humor at both the
-scene-level and the object-level. We analyze the funny scenes and explore the
-different types of humor depicted in them via human studies. We model two tasks
-that we believe demonstrate an understanding of some aspects of visual humor.
-The tasks involve predicting the funniness of a scene and altering the
-funniness of a scene. We show that our models perform well quantitatively, and
-qualitatively through human studies. Our datasets are publicly available.
-"
-2342,1512.04419,"Esma Balkir, Dimitri Kartsaklis, Mehrnoosh Sadrzadeh",Sentence Entailment in Compositional Distributional Semantics,cs.CL cs.AI math.CT," Distributional semantic models provide vector representations for words by
-gathering co-occurrence frequencies from corpora of text. Compositional
-distributional models extend these from words to phrases and sentences. In
-categorical compositional distributional semantics, phrase and sentence
-representations are functions of their grammatical structure and
-representations of the words therein. In this setting, grammatical structures
-are formalised by morphisms of a compact closed category and meanings of words
-are formalised by objects of the same category. These can be instantiated in
-the form of vectors or density matrices. This paper concerns the applications
-of this model to phrase and sentence level entailment. We argue that
-entropy-based distances of vectors and density matrices provide a good
-candidate to measure word-level entailment, show the advantage of density
-matrices over vectors for word level entailments, and prove that these
-distances extend compositionally from words to phrases and sentences. We
-exemplify our theoretical constructions on real data and a toy entailment
-dataset and provide preliminary experimental evidence.
-"
-2343,1512.04650,"Yong Cheng, Shiqi Shen, Zhongjun He, Wei He, Hua Wu, Maosong Sun, and
- Yang Liu","Agreement-based Joint Training for Bidirectional Attention-based Neural
- Machine Translation",cs.CL," The attentional mechanism has proven to be effective in improving end-to-end
-neural machine translation. However, due to the intricate structural divergence
-between natural languages, unidirectional attention-based models might only
-capture partial aspects of attentional regularities. We propose agreement-based
-joint training for bidirectional attention-based end-to-end neural machine
-translation. Instead of training source-to-target and target-to-source
-translation models independently, our approach encourages the two complementary
-models to agree on word alignment matrices on the same training data.
-Experiments on Chinese-English and English-French translation tasks show that
-agreement-based joint training significantly improves both alignment and
-translation quality over independent training.
-"
-2344,1512.04701,"Weixin Li, Jungseock Joo, Hang Qi, and Song-Chun Zhu","Joint Image-Text News Topic Detection and Tracking with And-Or Graph
- Representation",cs.IR cs.CL cs.SI," In this paper, we aim to develop a method for automatically detecting and
-tracking topics in broadcast news. We present a hierarchical And-Or graph (AOG)
-to jointly represent the latent structure of both texts and visuals.
The AOG
-embeds a context-sensitive grammar that can describe the hierarchical
-composition of news topics by semantic elements about the people involved,
-related places and what happened, and model contextual relationships between
-elements in the hierarchy. We detect news topics through a cluster sampling
-process which groups stories about closely related events. Swendsen-Wang Cuts
-(SWC), an effective cluster sampling algorithm, is adopted for traversing the
-solution space and obtaining optimal clustering solutions by maximizing a
-Bayesian posterior probability. Topics are tracked to deal with the
-continuously updated news streams. We generate topic trajectories to show how
-topics emerge, evolve and disappear over time. The experimental results show
-that our method can explicitly describe the textual and visual data in news
-videos and produce meaningful topic trajectories. Our method achieves superior
-performance compared to state-of-the-art methods on both a public dataset
-Reuters-21578 and a self-collected dataset named UCLA Broadcast News Dataset.
-"
-2345,1512.04906,Welin Chen and David Grangier and Michael Auli,Strategies for Training Large Vocabulary Neural Language Models,cs.CL cs.LG," Training neural network language models over large vocabularies is still
-computationally very costly compared to count-based models such as Kneser-Ney.
-At the same time, neural language models are gaining popularity for many
-applications such as speech recognition and machine translation whose success
-depends on scalability. We present a systematic comparison of strategies to
-represent and train large vocabularies, including softmax, hierarchical
-softmax, target sampling, noise contrastive estimation and self normalization.
-We further extend self normalization to be a proper estimator of likelihood and
-introduce an efficient variant of softmax. We evaluate each method on three
-popular benchmarks, examining performance on rare words, the speed/accuracy
-trade-off and complementarity to Kneser-Ney.
-"
-2346,1512.04973,Ndapandula Nakashole,An Operator for Entity Extraction in MapReduce,cs.DB cs.CL," Dictionary-based entity extraction involves finding mentions of dictionary
-entities in text. Text mentions are often noisy, containing spurious or missing
-words. Efficient algorithms for detecting approximate entity mentions follow
-one of two general techniques. The first approach is to build an index on the
-entities and perform index lookups of document substrings. The second approach
-recognizes that the number of substrings generated from documents can explode
-to large numbers; to get around this, it uses a filter to prune the many
-substrings which do not match any dictionary entity, and then verifies, by
-means of a text join, whether each remaining substring is an entity mention of
-a dictionary entity. The choice between the index-based approach and the
-filter & verification-based approach is a case-by-case decision, as the best
-approach depends on the characteristics of the input entity dictionary, for
-example the frequency of entity mentions. Choosing the right approach for the
-setting can make a substantial difference in execution time. Making this
-choice is, however, non-trivial, as there are parameters within each of the
-approaches that make the space of possible approaches very large. In this
-paper, we present a cost-based operator for making the choice among execution
-plans for entity extraction.
Since we need to deal with large dictionaries and even
-larger datasets, our operator is developed for implementations of distributed
-MapReduce algorithms.
-"
-2347,1512.05004,Jaimie Murdock and Jiaan Zeng and Colin Allen,"Towards Evaluation of Cultural-scale Claims in Light of Topic Model
- Sampling Effects",cs.DL cs.CL cs.IR," Cultural-scale models of full text documents are prone to over-interpretation
-by researchers making unintentionally strong socio-linguistic claims (Pechenick
-et al., 2015) without recognizing that even large digital libraries are merely
-samples of all the books ever produced. In this study, we test the sensitivity
-of the topic models to the sampling process by taking random samples of books
-in the Hathi Trust Digital Library from different areas of the Library of
-Congress Classification Outline. For each classification area, we train several
-topic models over the entire class with different random seeds, generating a
-set of spanning models. Then, we train topic models on random samples of books
-from the classification area, generating a set of sample models. Finally, we
-perform a topic alignment between each pair of models by computing the
-Jensen-Shannon distance (JSD) between the word probability distributions for
-each topic. We take two measures on each model alignment: alignment distance
-and topic overlap. We find that sample models with a large sample size
-typically have an alignment distance that falls in the range of the alignment
-distance between spanning models. Unsurprisingly, as sample size increases,
-alignment distance decreases. We also find that the topic overlap increases as
-sample size increases. However, the decomposition of these measures by sample
-size differs by number of topics and by classification area. We speculate that
-these measures could be used to find classes which have a common ""canon""
-discussed among all books in the area, as shown by high topic overlap and low
-alignment distance even in small sample sizes.
-"
-2348,1512.05030,Manaal Faruqui and Ryan McDonald and Radu Soricut,"Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised
- Learning",cs.CL," Morpho-syntactic lexicons provide information about the morphological and
-syntactic roles of words in a language. Such lexicons are not available for all
-languages and even when available, their coverage can be limited. We present a
-graph-based semi-supervised learning method that uses the morphological,
-syntactic and semantic relations between words to automatically construct wide
-coverage lexicons from small seed sets. Our method is language-independent, and
-we show that we can expand a 1000-word seed lexicon to more than 100 times its
-size with high quality for 11 languages. In addition, the automatically created
-lexicons provide features that improve performance in two downstream tasks:
-morphological tagging and dependency parsing.
-"
-2349,1512.05193,"Wenpeng Yin, Hinrich Sch\""utze, Bing Xiang, Bowen Zhou","ABCNN: Attention-Based Convolutional Neural Network for Modeling
- Sentence Pairs",cs.CL," How to model a pair of sentences is a critical issue in many NLP tasks such
-as answer selection (AS), paraphrase identification (PI) and textual entailment
-(TE). Most prior work (i) deals with one individual task by fine-tuning a
-specific system; (ii) models each sentence's representation separately, rarely
-considering the impact of the other sentence; or (iii) relies fully on manually
-designed, task-specific linguistic features.
This work presents a general
-Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of
-sentences. We make three contributions. (i) ABCNN can be applied to a wide
-variety of tasks that require modeling of sentence pairs. (ii) We propose three
-attention schemes that integrate mutual influence between sentences into CNN;
-thus, the representation of each sentence takes into consideration its
-counterpart. These interdependent sentence pair representations are more
-powerful than isolated sentence representations. (iii) ABCNN achieves
-state-of-the-art performance on AS, PI and TE tasks.
-"
-2350,1512.05582,Ramon Ferrer-i-Cancho,Kauffman's adjacent possible in word order evolution,cs.CL cs.IT math.IT physics.data-an physics.soc-ph," Word order evolution has been hypothesized to be constrained by a word order
-permutation ring: transitions involving orders that are closer in the
-permutation ring are more likely. The hypothesis can be seen as a particular
-case of Kauffman's adjacent possible in word order evolution. Here we consider
-the problem of the association of the six possible orders of S, V and O to
-yield a couple of primary alternating orders as a window to word order
-evolution. We evaluate the suitability of various competing hypotheses to
-predict one member of the couple from the other with the help of information
-theoretic model selection. Our ensemble of models includes a six-way model that
-is based on the word order permutation ring (Kauffman's adjacent possible) and
-another model based on the dual two-way view of standard typology, which reduces
-word order to basic order preferences (e.g., a preference for SV over VS and
-another for SO over OS). Our analysis indicates that the permutation ring
-yields the best model when favoring parsimony strongly, providing support for
-Kauffman's general view and a six-way typology.
-"
-2351,1512.05670,Amrith Krishna and Pawan Goyal,"Towards automating the generation of derivative nouns in Sanskrit by
- simulating Panini",cs.CL," About 1115 rules in Astadhyayi from A.4.1.76 to A.5.4.160 deal with the
-generation of derivative nouns, making it one of the largest topical sections
-in Astadhyayi, called the Taddhita section owing to the head rule A.4.1.76.
-This section is a systematic arrangement of rules that enumerates the various
-affixes used in the derivation under specific semantic relations. We
-propose a system that automates the process of generation of derivative nouns
-as per the rules in Astadhyayi. The proposed system follows a completely
-object-oriented approach that models each rule as a class of its own and then
-groups them into rule groups. The rule groups are decided on the basis of selective
-grouping of rules by virtue of anuvrtti. The grouping of rules results in an
-inheritance network of rules which is a directed acyclic graph. Every rule
-group has a head rule, and the head rule notifies all the direct member rules of
-the group about the environment, which contains all the details about the data
-entities participating in the derivation process. The system implements this
-mechanism using multilevel inheritance and observer design patterns. The system
-focuses not only on generation of the desired final form, but also on the
-correctness of the sequence of rules applied, to make sure that the derivation has
-taken place in strict adherence to Astadhyayi.
The proposed system's design
-allows incorporating the various conflict resolution methods mentioned in
-authentic texts, and hence the effectiveness of those rules can be validated
-against the results from the system. We also present cases where we have checked
-the applicability of the system with rules which are not specifically
-applicable to the derivation of derivative nouns, in order to see the effectiveness
-of the proposed schema as a generic system for modeling Astadhyayi.
-"
-2352,1512.05726,"Tao Lei, Hrishikesh Joshi, Regina Barzilay, Tommi Jaakkola, Katerina
- Tymoshenko, Alessandro Moschitti, Lluis Marquez",Semi-supervised Question Retrieval with Gated Convolutions,cs.CL cs.NE," Question answering forums are rapidly growing in size with no effective
-automated ability to refer to and reuse answers already available for
-previously posted questions. In this paper, we develop a methodology for finding
-semantically related questions. The task is difficult since 1) key pieces of
-information are often buried in extraneous details in the question body and 2)
-available annotations on similar questions are scarce and fragmented. We design
-a recurrent and convolutional model (gated convolution) to effectively map
-questions to their semantic representations. The models are pre-trained within
-an encoder-decoder framework (from body to title) on the basis of the entire
-raw corpus, and fine-tuned discriminatively from limited annotations. Our
-evaluation demonstrates that our model yields substantial gains over a standard
-IR baseline and various neural network architectures (including CNNs, LSTMs and
-GRUs).
-"
-2353,1512.05742,"Iulian Vlad Serban, Ryan Lowe, Peter Henderson, Laurent Charlin,
- Joelle Pineau",A Survey of Available Corpora for Building Data-Driven Dialogue Systems,cs.CL cs.AI cs.HC cs.LG stat.ML," During the past decade, several areas of speech and language understanding
-have witnessed substantial breakthroughs from the use of data-driven models. In
-the area of dialogue systems, the trend is less obvious, and most practical
-systems are still built through significant engineering and expert knowledge.
-Nevertheless, several recent results suggest that data-driven approaches are
-feasible and quite promising. To facilitate research in this area, we have
-carried out a wide survey of publicly available datasets suitable for
-data-driven learning of dialogue systems. We discuss important characteristics
-of these datasets, how they can be used to learn diverse dialogue strategies,
-and their other potential uses. We also examine methods for transfer learning
-between datasets and the use of external knowledge. Finally, we discuss the
-appropriate choice of evaluation metrics for the learning objective.
-"
-2354,1512.05919,"Bing Qin, Duyu Tang, Xinwei Geng, Dandan Ning, Jiahao Liu and Ting Liu",A Planning based Framework for Essay Generation,cs.CL," Generating an article automatically with a computer program is a challenging
-task in artificial intelligence and natural language processing. In this paper,
-we target essay generation, which takes as input a topic word and
-generates an organized article under the theme of the topic. We follow the idea
-of text planning \cite{Reiter1997} and develop an essay generation framework.
-The framework consists of three components, including topic understanding,
-sentence extraction and sentence reordering.
For each component, we studied
-several statistical algorithms and empirically compared them in terms
-of qualitative or quantitative analysis. Although we run experiments on a Chinese
-corpus, the method is language independent and can easily be adapted to other
-languages. We lay out the remaining challenges and suggest avenues for future
-research.
-"
-2355,1512.06110,Manaal Faruqui and Yulia Tsvetkov and Graham Neubig and Chris Dyer,"Morphological Inflection Generation Using Character Sequence to Sequence
- Learning",cs.CL," Morphological inflection generation is the task of generating the inflected
-form of a given lemma corresponding to a particular linguistic transformation.
-We model the problem of inflection generation as a character sequence to
-sequence learning problem and present a variant of the neural encoder-decoder
-model for solving it. Our model is language independent and can be trained in
-both supervised and semi-supervised settings. We evaluate our system on seven
-datasets of morphologically rich languages and achieve either better or
-comparable results to existing state-of-the-art models of inflection
-generation.
-"
-2356,1512.06612,"Lili Mou, Rui Yan, Ge Li, Lu Zhang, Zhi Jin","Backward and Forward Language Modeling for Constrained Sentence
- Generation",cs.CL cs.LG cs.NE," Recent language models, especially those based on recurrent neural networks
-(RNNs), make it possible to generate natural language from a learned
-probability. Language generation has wide applications including machine
-translation, summarization, question answering, conversation systems, etc.
-Existing methods typically learn a joint probability of words conditioned on
-additional information, which is (either statically or dynamically) fed to the
-RNN's hidden layer. In many applications, we are likely to impose hard
-constraints on the generated texts, i.e., a particular word must appear in the
-sentence. Unfortunately, existing approaches cannot solve this problem. In
-this paper, we propose a novel backward and forward language model. Given a
-specific word, we use RNNs to generate previous words and future words, either
-simultaneously or asynchronously, resulting in two model variants. In this way,
-the given word can appear at any position in the sentence. Experimental
-results show that the generated texts are comparable to sequential LMs in
-quality.
-"
-2357,1512.06643,"Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, Raymond W.M.
- Ng, Madina Hasan, Yulan Liu, Thomas Hain","The 2015 Sheffield System for Transcription of Multi-Genre Broadcast
- Media",cs.CL," We describe the University of Sheffield system for participation in the 2015
-Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre
-broadcast shows. Transcription was one of four tasks proposed in the MGB
-challenge, with the aim of advancing the state of the art of automatic speech
-recognition, speaker diarisation and automatic alignment of subtitles for
-broadcast media. Four topics are investigated in this work: data selection
-techniques for training with unreliable data, automatic speech segmentation of
-broadcast media shows, acoustic modelling and adaptation in highly variable
-environments, and language modelling of multi-genre shows.
The final system -operates in multiple passes, using an initial unadapted decoding stage to -refine segmentation, followed by three adapted passes: a hybrid DNN pass with -input features normalised by speaker-based cepstral normalisation, another -hybrid stage with input features normalised by speaker feature-MLLR -transformations, and finally a bottleneck-based tandem stage with noise and -speaker factorisation. The combination of these three system outputs provides a -final error rate of 27.5% on the official development set, consisting of 47 -multi-genre shows. -" -2358,1512.07046,"Jan Rupnik, Andrej Muhic, Gregor Leban, Primoz Skraba, Blaz Fortuna, - Marko Grobelnik","News Across Languages - Cross-Lingual Document Similarity and Event - Tracking",cs.IR cs.CL," In today's world, we follow news which is distributed globally. Significant -events are reported by different sources and in different languages. In this -work, we address the problem of tracking of events in a large multilingual -stream. Within a recently developed system Event Registry we examine two -aspects of this problem: how to compare articles in different languages and how -to link collections of articles in different languages which refer to the same -event. Taking a multilingual stream and clusters of articles from each -language, we compare different cross-lingual document similarity measures based -on Wikipedia. This allows us to compute the similarity of any two articles -regardless of language. Building on previous work, we show there are methods -which scale well and can compute a meaningful similarity between articles from -languages with little or no direct overlap in the training data. Using this -capability, we then propose an approach to link clusters of articles across -languages which represent the same event. We provide an extensive evaluation of -the system as a whole, as well as an evaluation of the quality and robustness -of the similarity measure and the linking algorithm. -" -2359,1512.07281,"Qian Zhang, Bruno Gon\c{c}alves",Topical differences between Chinese language Twitter and Sina Weibo,cs.SI cs.CL physics.soc-ph," Sina Weibo, China's most popular microblogging platform, is currently used by -over $500M$ users and is considered to be a proxy of Chinese social life. In -this study, we contrast the discussions occurring on Sina Weibo and on Chinese -language Twitter in order to observe two different strands of Chinese culture: -people within China who use Sina Weibo with its government imposed restrictions -and those outside that are free to speak completely anonymously. We first -propose a simple ad-hoc algorithm to identify topics of Tweets and Weibo. -Different from previous works on micro-message topic detection, our algorithm -considers topics of the same contents but with different \#tags. Our algorithm -can also detect topics for Tweets and Weibos without any \#tags. Using a large -corpus of Weibo and Chinese language tweets, covering the period from January -$1$ to December $31$, $2012$, we obtain a list of topics using clustered \#tags -that we can then use to compare the two platforms. Surprisingly, we find that -there are no common entries among the Top $100$ most popular topics. -Furthermore, only $9.2\%$ of tweets correspond to the Top $1000$ topics on Sina -Weibo platform, and conversely only $4.4\%$ of weibos were found to discuss the -most popular Twitter topics. 
Our results reveal significant differences in
-social attention on the two platforms, with the most popular topics on Sina Weibo
-relating to entertainment, while most tweets correspond to cultural or
-political content that is practically nonexistent on Sina Weibo.
-"
-2360,1512.07685,"Nurulhuda A. Manaf (Computer Science, University of Surrey), Sotiris
- Moschoyiannis (Computer Science, University of Surrey), Paul Krause (Computer
- Science, University of Surrey)","Service Choreography, SBVR, and Time",cs.SE cs.CL," We propose the use of structured natural language (English) in specifying
-service choreographies, focusing on the what rather than the how of the
-required coordination of participant services in realising a business
-application scenario. The declarative approach we propose uses the OMG standard
-Semantics of Business Vocabulary and Rules (SBVR) as a modelling language. The
-service choreography approach has been proposed for describing the global
-orderings of the invocations on interfaces of participant services. We
-therefore extend SBVR with a notion of time which can capture the coordination
-of the participant services, in terms of the observable message exchanges
-between them. The extension is done using existing modelling constructs in
-SBVR, and hence respects the standard specification. The idea is that users -
-domain specialists rather than implementation specialists - can verify the
-requested service composition by directly reading the structured English used
-by SBVR. At the same time, the SBVR model can be represented in formal logic so
-it can be parsed and executed by a machine.
-"
-2361,1512.08066,"Chung-Hyok Jang (1), Kwang-Hyok Kim (2) ((1) Foreign Language Faculty,
- Kim Il Sung University, (2) Computer Science College, Kim Il Sung University)","The Improvement of Negative Sentences Translation in English-to-Korean
- Machine Translation",cs.CL," This paper describes an algorithm for translating English negative sentences
-into Korean in English-Korean Machine Translation (EKMT). The proposed
-algorithm is based on a comparative study of English and Korean negative
-sentences. Earlier translation software cannot translate English negative
-sentences into accurate Korean equivalents. We established a new algorithm for
-negative sentence translation and evaluated it.
-"
-2362,1512.08183,"Bofang Li, Tao Liu, Xiaoyong Du, Deyuan Zhang, Zhe Zhao","Learning Document Embeddings by Predicting N-grams for Sentiment
- Classification of Long Movie Reviews",cs.CL," Despite the loss of semantic information, bag-of-ngram based methods still
-achieve state-of-the-art results for tasks such as sentiment classification of
-long movie reviews. Many document embedding methods have been proposed to
-capture semantics, but they still cannot outperform bag-of-ngram based methods
-on this task. In this paper, we modify the architecture of the recently
-proposed Paragraph Vector, allowing it to learn document vectors by predicting
-not only words, but n-gram features as well. Our model is able to capture both
-semantics and word order in documents while keeping the expressive power of
-learned vectors. Experimental results on the IMDB movie review dataset show that
-our model outperforms previous deep learning models and bag-of-ngram based
-models due to the above advantages. More robust results are also obtained when
-our model is combined with other models. The source code of our model will
-also be published together with this paper.
-" -2363,1512.08347,"Yang Lou, Guanrong Chen and Jianwei Hu",Communicating with sentences: A multi-word naming game model,cs.CL physics.soc-ph," Naming game simulates the process of naming an object by a single word, in -which a population of communicating agents can reach global consensus -asymptotically through iteratively pair-wise conversations. We propose an -extension of the single-word model to a multi-word naming game (MWNG), -simulating the case of describing a complex object by a sentence (multiple -words). Words are defined in categories, and then organized as sentences by -combining them from different categories. We refer to a formatted combination -of several words as a pattern. In such an MWNG, through a pair-wise -conversation, it requires the hearer to achieve consensus with the speaker with -respect to both every single word in the sentence as well as the sentence -pattern, so as to guarantee the correct meaning of the saying, otherwise, they -fail reaching consensus in the interaction. We validate the model in three -typical topologies as the underlying communication network, and employ both -conventional and man-designed patterns in performing the MWNG. -" -2364,1512.08422,"Lili Mou, Rui Men, Ge Li, Yan Xu, Lu Zhang, Rui Yan, Zhi Jin","Natural Language Inference by Tree-Based Convolution and Heuristic - Matching",cs.CL cs.LG," In this paper, we propose the TBCNN-pair model to recognize entailment and -contradiction between two sentences. In our model, a tree-based convolutional -neural network (TBCNN) captures sentence-level semantics; then heuristic -matching layers like concatenation, element-wise product/difference combine the -information in individual sentences. Experimental results show that our model -outperforms existing sentence encoding-based approaches by a large margin. -" -2365,1512.08569,Roger Bilisoly,"Analyzing Walter Skeat's Forty-Five Parallel Extracts of William - Langland's Piers Plowman",stat.AP cs.CL," Walter Skeat published his critical edition of William Langland's 14th -century alliterative poem, Piers Plowman, in 1886. In preparation for this he -located forty-five manuscripts, and to compare dialects, he published excerpts -from each of these. This paper does three statistical analyses using these -excerpts, each of which mimics a task he did in writing his critical edition. -First, he combined multiple versions of a poetic line to create a best line, -which is compared to the mean string that is computed by a generalization of -the arithmetic mean that uses edit distance. Second, he claims that a certain -subset of manuscripts varies little. This is quantified by computing a string -variance, which is closely related to the above generalization of the mean. -Third, he claims that the manuscripts fall into three groups, which is a -clustering problem that is addressed by using edit distance. The overall goal -is to develop methodology that would be of use to a literary critic. -" -2366,1512.08849,Shuohang Wang and Jing Jiang,Learning Natural Language Inference with LSTM,cs.CL cs.AI cs.NE," Natural language inference (NLI) is a fundamentally important task in natural -language processing that has many applications. The recently released Stanford -Natural Language Inference (SNLI) corpus has made it possible to develop and -evaluate learning-centered methods such as deep neural networks for natural -language inference (NLI). In this paper, we propose a special long short-term -memory (LSTM) architecture for NLI. 
Our model builds on top of a recently
-proposed neural attention model for NLI but is based on a significantly
-different idea. Instead of deriving sentence embeddings for the premise and the
-hypothesis to be used for classification, our solution uses a match-LSTM to
-perform word-by-word matching of the hypothesis with the premise. This LSTM is
-able to place more emphasis on important word-level matching results. In
-particular, we observe that this LSTM remembers important mismatches that are
-critical for predicting the contradiction or the neutral relationship label. On
-the SNLI corpus, our model achieves an accuracy of 86.1%, outperforming the
-state of the art.
-"
-2367,1512.08903,"Kyuyeon Hwang, Minjae Lee, Wonyong Sung",Online Keyword Spotting with a Character-Level Recurrent Neural Network,cs.CL cs.LG cs.NE," In this paper, we propose a context-aware keyword spotting model employing a
-character-level recurrent neural network (RNN) for spoken term detection in
-continuous speech. The RNN is end-to-end trained with connectionist temporal
-classification (CTC) to generate the probabilities of character and
-word-boundary labels. There is no need for phonetic transcription, senone
-modeling, or a system dictionary in training and testing. Also, keywords can
-easily be added and modified by editing the text-based keyword list without
-retraining the RNN. Moreover, the unidirectional RNN processes infinitely
-long input audio streams without pre-segmentation, and keywords are detected
-with low latency before the utterance is finished. Experimental results show
-that the proposed keyword spotter significantly outperforms the deep neural
-network (DNN) and hidden Markov model (HMM) based keyword-filler model even
-with less computation.
-"
-2368,1512.08982,Sucheta Ghosh,Technical Report: a tool for measuring Prosodic Accommodation,cs.SD cs.CL," This article has been withdrawn by arXiv administrators because the submitter
-did not have the legal authority to grant the license applied to the work.
-"
-2369,1601.00025,"Mohamed Elhoseiny, Ahmed Elgammal, Babak Saleh",Write a Classifier: Predicting Visual Classifiers from Unstructured Text,cs.CV cs.CL cs.LG," People typically learn through exposure to visual concepts associated with
-linguistic descriptions. For instance, teaching visual object categories to
-children is often accompanied by descriptions in text or speech. In a machine
-learning context, these observations motivate us to ask whether this learning
-process could be computationally modeled to learn visual classifiers. More
-specifically, the main question of this work is how to utilize purely textual
-descriptions of visual classes with no training images, to learn explicit visual
-classifiers for them. We propose and investigate two baseline formulations,
-based on regression and domain transfer, that predict a linear classifier.
-Then, we propose a new constrained optimization formulation that combines a
-regression function and a knowledge transfer function with additional
-constraints to predict the parameters of a linear classifier. We also propose
-generic kernelized models where a kernel classifier is predicted in the form
-defined by the representer theorem. The kernelized models allow defining and
-utilizing any two RKHS (Reproducing Kernel Hilbert Space) kernel functions in
-the visual space and text space, respectively.
We finally propose a kernel -function between unstructured text descriptions that builds on distributional -semantics, which shows an advantage in our setting and could be useful for -other applications. We applied all the studied models to predict visual -classifiers on two fine-grained and challenging categorization datasets (CU -Birds and Flower Datasets), and the results indicate successful predictions of -our final model over several baselines that we designed. -" -2370,1601.00087,"Mohammed Korayem, Khalifeh Aljadda, and David Crandall",Sentiment/Subjectivity Analysis Survey for Languages other than English,cs.CL," Subjective and sentiment analysis have gained considerable attention -recently. Most of the resources and systems built so far are done for English. -The need for designing systems for other languages is increasing. This paper -surveys different ways used for building systems for subjective and sentiment -analysis for languages other than English. There are three different types of -systems used for building these systems. The first (and the best) one is the -language specific systems. The second type of systems involves reusing or -transferring sentiment resources from English to the target language. The third -type of methods is based on using language independent methods. The paper -presents a separate section devoted to Arabic sentiment analysis. -" -2371,1601.00248,"Kushal Arora, Anand Rangarajan","Contrastive Entropy: A new evaluation metric for unnormalized language - models",cs.CL," Perplexity (per word) is the most widely used metric for evaluating language -models. Despite this, there has been no dearth of criticism for this metric. -Most of these criticisms center around lack of correlation with extrinsic -metrics like word error rate (WER), dependence upon shared vocabulary for model -comparison and unsuitability for unnormalized language model evaluation. In -this paper, we address the last problem and propose a new discriminative -entropy based intrinsic metric that works for both traditional word level -models and unnormalized language models like sentence level models. We also -propose a discriminatively trained sentence level interpretation of recurrent -neural network based language model (RNN) as an example of unnormalized -sentence level model. We demonstrate that for word level models, contrastive -entropy shows a strong correlation with perplexity. We also observe that when -trained at lower distortion levels, sentence level RNN considerably outperforms -traditional RNNs on this new metric. -" -2372,1601.00372,Jiwei Li and Dan Jurafsky,"Mutual Information and Diverse Decoding Improve Neural Machine - Translation",cs.CL cs.AI," Sequence-to-sequence neural translation models learn semantic and syntactic -relations between sentence pairs by optimizing the likelihood of the target -given the source, i.e., $p(y|x)$, an objective that ignores other potentially -useful sources of information. We introduce an alternative objective function -for neural MT that maximizes the mutual information between the source and -target sentences, modeling the bi-directional dependency of sources and -targets. We implement the model with a simple re-ranking method, and also -introduce a decoding algorithm that increases diversity in the N-best list -produced by the first pass. Applied to the WMT German/English and -French/English tasks, the proposed models offers a consistent performance boost -on both standard LSTM and attention-based neural MT architectures. 
-" -2373,1601.00620,"Lidong Bing, Mingyang Ling, Richard C. Wang, William W. Cohen",Distant IE by Bootstrapping Using Lists and Document Structure,cs.CL," Distant labeling for information extraction (IE) suffers from noisy training -data. We describe a way of reducing the noise associated with distant IE by -identifying coupling constraints between potential instance labels. As one -example of coupling, items in a list are likely to have the same label. A -second example of coupling comes from analysis of document structure: in some -corpora, sections can be identified such that items in the same section are -likely to have the same label. Such sections do not exist in all corpora, but -we show that augmenting a large corpus with coupling constraints from even a -small, well-structured corpus can improve performance substantially, doubling -F1 on one task. -" -2374,1601.00710,Barret Zoph and Kevin Knight,Multi-Source Neural Translation,cs.CL," We build a multi-source machine translation model and train it to maximize -the probability of a target English string given French and German sources. -Using the neural encoder-decoder framework, we explore several combination -methods and report up to +4.8 Bleu increases on top of a very strong -attention-based neural translation model. -" -2375,1601.00770,Makoto Miwa and Mohit Bansal,"End-to-End Relation Extraction using LSTMs on Sequences and Tree - Structures",cs.CL cs.LG," We present a novel end-to-end neural model to extract entities and relations -between them. Our recurrent neural network based model captures both word -sequence and dependency tree substructure information by stacking bidirectional -tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows -our model to jointly represent both entities and relations with shared -parameters in a single model. We further encourage detection of entities during -training and use of entity information in relation extraction via entity -pretraining and scheduled sampling. Our model improves over the -state-of-the-art feature-based model on end-to-end relation extraction, -achieving 12.1% and 5.7% relative error reductions in F1-score on ACE2005 and -ACE2004, respectively. We also show that our LSTM-RNN based model compares -favorably to the state-of-the-art CNN based model (in F1-score) on nominal -relation classification (SemEval-2010 Task 8). Finally, we present an extensive -ablation analysis of several model components. -" -2376,1601.00816,Pierre-Yves Oudeyer (Flowers),"Open challenges in understanding development and evolution of speech - forms: The roles of embodied self-organization, motivation and active - exploration",cs.AI cs.CL cs.CY cs.LG," This article discusses open scientific challenges for understanding -development and evolution of speech forms, as a commentary to Moulin-Frier et -al. (Moulin-Frier et al., 2015). Based on the analysis of mathematical models -of the origins of speech forms, with a focus on their assumptions , we study -the fundamental question of how speech can be formed out of non--speech, at -both developmental and evolutionary scales. In particular, we emphasize the -importance of embodied self-organization , as well as the role of mechanisms of -motivation and active curiosity-driven exploration in speech formation. Finally -, we discuss an evolutionary-developmental perspective of the origins of -speech. 
-" -2377,1601.00893,"Oren Melamud, David McClosky, Siddharth Patwardhan, Mohit Bansal",The Role of Context Types and Dimensionality in Learning Word Embeddings,cs.CL," We provide the first extensive evaluation of how using different types of -context to learn skip-gram word embeddings affects performance on a wide range -of intrinsic and extrinsic NLP tasks. Our results suggest that while intrinsic -tasks tend to exhibit a clear preference to particular types of contexts and -higher dimensionality, more careful tuning is required for finding the optimal -settings for most of the extrinsic tasks that we considered. Furthermore, for -these extrinsic tasks, we find that once the benefit from increasing the -embedding dimensionality is mostly exhausted, simple concatenation of word -embeddings, learned with different context types, can yield further performance -gains. As an additional contribution, we propose a new variant of the skip-gram -model that learns word embeddings from weighted contexts of substitute words. -" -2378,1601.00901,Janez Starc and Dunja Mladeni\'c,Joint learning of ontology and semantic parser from text,cs.AI cs.CL," Semantic parsing methods are used for capturing and representing semantic -meaning of text. Meaning representation capturing all the concepts in the text -may not always be available or may not be sufficiently complete. Ontologies -provide a structured and reasoning-capable way to model the content of a -collection of texts. In this work, we present a novel approach to joint -learning of ontology and semantic parser from text. The method is based on -semi-automatic induction of a context-free grammar from semantically annotated -text. The grammar parses the text into semantic trees. Both, the grammar and -the semantic trees are used to learn the ontology on several levels -- classes, -instances, taxonomic and non-taxonomic relations. The approach was evaluated on -the first sentences of Wikipedia pages describing people. -" -2379,1601.01073,"Orhan Firat, Kyunghyun Cho and Yoshua Bengio","Multi-Way, Multilingual Neural Machine Translation with a Shared - Attention Mechanism",cs.CL stat.ML," We propose multi-way, multilingual neural machine translation. The proposed -approach enables a single neural translation model to translate between -multiple languages, with a number of parameters that grows only linearly with -the number of languages. This is made possible by having a single attention -mechanism that is shared across all language pairs. We train the proposed -multi-way, multilingual model on ten language pairs from WMT'15 simultaneously -and observe clear performance improvements over models trained on only one -language pair. In particular, we observe that the proposed model significantly -improves the translation quality of low-resource language pairs. -" -2380,1601.01085,"Trevor Cohn and Cong Duy Vu Hoang and Ekaterina Vymolova and Kaisheng - Yao and Chris Dyer and Gholamreza Haffari","Incorporating Structural Alignment Biases into an Attentional Neural - Translation Model",cs.CL," Neural encoder-decoder models of machine translation have achieved impressive -results, rivalling traditional translation models. However their modelling -formulation is overly simplistic, and omits several key inductive biases built -into traditional models. 
In this paper we extend the attentional neural
-translation model to include structural biases from word-based alignment
-models, including positional bias, Markov conditioning, fertility and agreement
-over translation directions. We show improvements over a baseline attentional
-model and a standard phrase-based model across several language pairs, evaluating
-on difficult languages in a low-resource setting.
-"
-2381,1601.01195,Kamal Sarkar,"Part-of-Speech Tagging for Code-mixed Indian Social Media Text at ICON
- 2015",cs.CL," This paper discusses the experiments carried out by us at Jadavpur University
-as part of the participation in the ICON 2015 task: POS Tagging for Code-mixed
-Indian Social Media Text. The tool that we have developed for the task is based
-on a Trigram Hidden Markov Model that utilizes information from a dictionary as
-well as some other word-level features to enhance the observation probabilities
-of known tokens as well as unknown tokens. We submitted runs for
-Bengali-English, Hindi-English and Tamil-English language pairs. Our system has
-been trained and tested on the datasets released for the ICON 2015 shared task: POS
-Tagging For Code-mixed Indian Social Media Text. In constrained mode, our
-system obtains an average overall accuracy (averaged over all three language
-pairs) of 75.60%, which is very close to the two other participating systems (76.79%
-for IIITH and 75.79% for AMRITA_CEN) ranked higher than our system. In
-unconstrained mode, our system obtains an average overall accuracy of 70.65%, which
-is also close to the system (72.85% for AMRITA_CEN) that obtains the highest
-average overall accuracy.
-"
-2382,1601.01272,"Ke Tran, Arianna Bisazza and Christof Monz",Recurrent Memory Networks for Language Modeling,cs.CL," Recurrent Neural Networks (RNN) have obtained excellent results in many
-natural language processing (NLP) tasks. However, understanding and
-interpreting the source of this success remains a challenge. In this paper, we
-propose Recurrent Memory Network (RMN), a novel RNN architecture, that not only
-amplifies the power of RNN but also facilitates our understanding of its
-internal functioning and allows us to discover underlying patterns in data. We
-demonstrate the power of RMN on language modeling and sentence completion
-tasks. On language modeling, RMN outperforms the Long Short-Term Memory (LSTM)
-network on three large German, Italian, and English datasets. Additionally, we
-perform an in-depth analysis of various linguistic dimensions that RMN captures.
-On the Sentence Completion Challenge, for which it is essential to capture sentence
-coherence, our RMN obtains 69.2% accuracy, surpassing the previous
-state-of-the-art by a large margin.
-"
-2383,1601.01280,"Li Dong, Mirella Lapata",Language to Logical Form with Neural Attention,cs.CL," Semantic parsing aims at mapping natural language to machine interpretable
-meaning representations. Traditional approaches rely on high-quality lexicons,
-manually-built templates, and linguistic features which are either domain- or
-representation-specific. In this paper we present a general method based on an
-attention-enhanced encoder-decoder model. We encode input utterances into
-vector representations, and generate their logical forms by conditioning the
-output sequences or trees on the encoding vectors. Experimental results on four
-datasets show that our approach performs competitively without using
-hand-engineered features and is easy to adapt across domains and meaning
-representations.
-" -2384,1601.01343,"Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji","Joint Learning of the Embedding of Words and Entities for Named Entity - Disambiguation",cs.CL," Named Entity Disambiguation (NED) refers to the task of resolving multiple -named entity mentions in a document to their correct references in a knowledge -base (KB) (e.g., Wikipedia). In this paper, we propose a novel embedding method -specifically designed for NED. The proposed method jointly maps words and -entities into the same continuous vector space. We extend the skip-gram model -by using two models. The KB graph model learns the relatedness of entities -using the link structure of the KB, whereas the anchor context model aims to -align vectors such that similar words and entities occur close to one another -in the vector space by leveraging KB anchors and their context words. By -combining contexts based on the proposed embedding with standard NED features, -we achieved state-of-the-art accuracy of 93.1% on the standard CoNLL dataset -and 85.2% on the TAC 2010 dataset. -" -2385,1601.01356,Makbule Gulcin Ozsoy,From Word Embeddings to Item Recommendation,cs.LG cs.CL cs.IR cs.SI," Social network platforms can use the data produced by their users to serve -them better. One of the services these platforms provide is recommendation -service. Recommendation systems can predict the future preferences of users -using their past preferences. In the recommendation systems literature there -are various techniques, such as neighborhood based methods, machine-learning -based methods and matrix-factorization based methods. In this work, a set of -well known methods from natural language processing domain, namely Word2Vec, is -applied to recommendation systems domain. Unlike previous works that use -Word2Vec for recommendation, this work uses non-textual features, the -check-ins, and it recommends venues to visit/check-in to the target users. For -the experiments, a Foursquare check-in dataset is used. The results show that -use of continuous vector space representations of items modeled by techniques -of Word2Vec is promising for making recommendations. -" -2386,1601.01530,"Gakuto Kurata, Bing Xiang, Bowen Zhou, Mo Yu","Leveraging Sentence-level Information with Encoder LSTM for Semantic - Slot Filling",cs.CL," Recurrent Neural Network (RNN) and one of its specific architectures, Long -Short-Term Memory (LSTM), have been widely used for sequence labeling. In this -paper, we first enhance LSTM-based sequence labeling to explicitly model label -dependencies. Then we propose another enhancement to incorporate the global -information spanning over the whole input sequence. The latter proposed method, -encoder-labeler LSTM, first encodes the whole input sequence into a fixed -length vector with the encoder LSTM, and then uses this encoded vector as the -initial state of another LSTM for sequence labeling. Combining these methods, -we can predict the label sequence with considering label dependencies and -information of whole input sequence. In the experiments of a slot filling task, -which is an essential component of natural language understanding, with using -the standard ATIS corpus, we achieved the state-of-the-art F1-score of 95.66%. -" -2387,1601.01705,"Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein",Learning to Compose Neural Networks for Question Answering,cs.CL cs.CV cs.NE," We describe a question answering model that applies to both images and -structured knowledge bases. 
The model uses natural language strings to
-automatically assemble neural networks from a collection of composable modules.
-Parameters for these modules are learned jointly with network-assembly
-parameters via reinforcement learning, with only (world, question, answer)
-triples as supervision. Our approach, which we term a dynamic neural model
-network, achieves state-of-the-art results on benchmark datasets in both visual
-and structured domains.
-"
-2388,1601.01887,Rustam Tagiew,Research Project: Text Engineering Tool for Ontological Scientometry,cs.CL cs.DL," The number of scientific papers grows exponentially in many disciplines. The
-share of online available papers grows as well. At the same time, the period of
-time for a paper to lose its chance of ever being cited shortens. The decay of
-the citing rate shows similarity to ultradiffusional processes, as for other
-online content in social networks. The distribution of papers per author shows
-similarity to the distribution of posts per user in social networks. The rate
-of uncited papers among online available papers grows, while some papers 'go
-viral' in terms of being cited. In summary, the practice of scientific
-publishing moves towards the domain of social networks. The goal of this
-project is to create a text engineering tool, which can semi-automatically
-categorize a paper according to its type of contribution and extract
-relationships between papers into an ontological database. Semi-automatic
-categorization means that the mistakes made by automatic pre-categorization and
-relationship extraction will be corrected through a wikipedia-like front-end by
-volunteers from the general public. This tool should not only help researchers and
-the general public to find relevant supplementary material and peers faster,
-but also provide more information for research funding agencies.
-"
-2389,1601.02166,Anders S{\o}gaard,Empirical Gaussian priors for cross-lingual transfer learning,cs.CL," Sequence model learning algorithms typically maximize log-likelihood minus
-the norm of the model (or minimize Hamming loss + norm). In cross-lingual
-part-of-speech (POS) tagging, our target language training data consists of
-sequences of sentences with word-by-word labels projected from translations in
-$k$ languages for which we have labeled data, via word alignments. Our training
-data is therefore very noisy, and if Rademacher complexity is high, learning
-algorithms are prone to overfit. Norm-based regularization assumes a constant
-width and zero mean prior. We instead propose to use the $k$ source language
-models to estimate the parameters of a Gaussian prior for learning new POS
-taggers. This leads to significantly better performance in multi-source
-transfer set-ups. We also present a drop-out version that injects (empirical)
-Gaussian noise during online learning. Finally, we note that using empirical
-Gaussian priors leads to much lower Rademacher complexity, and is superior to
-optimally weighted model interpolation.
-"
-2390,1601.02403,Ivan Habernal and Iryna Gurevych,Argumentation Mining in User-Generated Web Discourse,cs.CL," The goal of argumentation mining, an evolving research field in computational
-linguistics, is to design methods capable of analyzing people's argumentation.
-In this article, we go beyond the state of the art in several ways. (i) We deal
-with actual Web data and take up the challenges given by the variety of
-registers, multiple domains, and unrestricted noisy user-generated Web
-discourse.
(ii) We bridge the gap between normative argumentation theories and -argumentation phenomena encountered in actual data by adapting an argumentation -model tested in an extensive annotation study. (iii) We create a new gold -standard corpus (90k tokens in 340 documents) and experiment with several -machine learning methods to identify argument components. We offer the data, -source codes, and annotation guidelines to the community under free licenses. -Our findings show that argumentation mining in user-generated Web discourse is -a feasible but challenging task. -" -2391,1601.02431,"Claudia Peersman, Walter Daelemans, Reinhild Vandekerckhove, Bram - Vandekerckhove, Leona Van Vaerenbergh","The Effects of Age, Gender and Region on Non-standard Linguistic - Variation in Online Social Networks",cs.CL," We present a corpus-based analysis of the effects of age, gender and region -of origin on the production of both ""netspeak"" or ""chatspeak"" features and -regional speech features in Flemish Dutch posts that were collected from a -Belgian online social network platform. The present study shows that combining -quantitative and qualitative approaches is essential for understanding -non-standard linguistic variation in a CMC corpus. It also presents a -methodology that enables the systematic study of this variation by including -all non-standard words in the corpus. The analyses resulted in a convincing -illustration of the Adolescent Peak Principle. In addition, our approach -revealed an intriguing correlation between the use of regional speech features -and chatspeak features. -" -2392,1601.02502,"Jocelyn Coulmance, Jean-Marc Marty, Guillaume Wenzek, Amine Benhalloum","Trans-gram, Fast Cross-lingual Word-embeddings",cs.CL," We introduce Trans-gram, a simple and computationally-efficient method to -simultaneously learn and align wordembeddings for a variety of languages, using -only monolingual data and a smaller set of sentence-aligned data. We use our -new method to compute aligned wordembeddings for twenty-one languages using -English as a pivot language. We show that some linguistic features are aligned -across languages for which we do not have aligned data, even though those -properties do not exist in the pivot language. We also achieve state of the art -results on standard cross-lingual text classification and word translation -tasks. -" -2393,1601.02539,"Zhizheng Wu, Simon King",Investigating gated recurrent neural networks for speech synthesis,cs.CL cs.NE," Recently, recurrent neural networks (RNNs) as powerful sequence models have -re-emerged as a potential acoustic model for statistical parametric speech -synthesis (SPSS). The long short-term memory (LSTM) architecture is -particularly attractive because it addresses the vanishing gradient problem in -standard RNNs, making them easier to train. Although recent studies have -demonstrated that LSTMs can achieve significantly better performance on SPSS -than deep feed-forward neural networks, little is known about why. Here we -attempt to answer two questions: a) why do LSTMs work well as a sequence model -for SPSS; b) which component (e.g., input gate, output gate, forget gate) is -most important. We present a visual analysis alongside a series of experiments, -resulting in a proposal for a simplified architecture. The simplified -architecture has significantly fewer parameters than an LSTM, thus reducing -generation complexity considerably without degrading quality. 
-" -2394,1601.02543,"Vinod Kumar Pandey, Sunil Kumar Kopparapu",Evaluating the Performance of a Speech Recognition based System,cs.CL cs.AI cs.HC," Speech based solutions have taken center stage with growth in the services -industry where there is a need to cater to a very large number of people from -all strata of the society. While natural language speech interfaces are the -talk in the research community, yet in practice, menu based speech solutions -thrive. Typically in a menu based speech solution the user is required to -respond by speaking from a closed set of words when prompted by the system. A -sequence of human speech response to the IVR prompts results in the completion -of a transaction. A transaction is deemed successful if the speech solution can -correctly recognize all the spoken utterances of the user whenever prompted by -the system. The usual mechanism to evaluate the performance of a speech -solution is to do an extensive test of the system by putting it to actual -people use and then evaluating the performance by analyzing the logs for -successful transactions. This kind of evaluation could lead to dissatisfied -test users especially if the performance of the system were to result in a poor -transaction completion rate. To negate this the Wizard of Oz approach is -adopted during evaluation of a speech system. Overall this kind of evaluations -is an expensive proposition both in terms of time and cost. In this paper, we -propose a method to evaluate the performance of a speech solution without -actually putting it to people use. We first describe the methodology and then -show experimentally that this can be used to identify the performance -bottlenecks of the speech solution even before the system is actually used thus -saving evaluation time and expenses. -" -2395,1601.02553,"Suyoun Kim, Bhiksha Raj, Ian Lane",Environmental Noise Embeddings for Robust Speech Recognition,cs.CL," We propose a novel deep neural network architecture for speech recognition -that explicitly employs knowledge of the background environmental noise within -a deep neural network acoustic model. A deep neural network is used to predict -the acoustic environment in which the system in being used. The discriminative -embedding generated at the bottleneck layer of this network is then -concatenated with traditional acoustic features as input to a deep neural -network acoustic model. Through a series of experiments on Resource Management, -CHiME-3 task, and Aurora4, we show that the proposed approach significantly -improves speech recognition accuracy in noisy and highly reverberant -environments, outperforming multi-condition training, noise-aware training, -i-vector framework, and multi-task learning on both in-domain noise and unseen -noise. -" -2396,1601.02789,"Krzysztof Wo{\l}k, Danijel Kor\v{z}inek","Comparison and Adaptation of Automatic Evaluation Metrics for Quality - Assessment of Re-Speaking",cs.CL stat.AP stat.ML," Re-speaking is a mechanism for obtaining high quality subtitles for use in -live broadcast and other public events. Because it relies on humans performing -the actual re-speaking, the task of estimating the quality of the results is -non-trivial. Most organisations rely on humans to perform the actual quality -assessment, but purely automatic methods have been developed for other similar -problems, like Machine Translation. This paper will try to compare several of -these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER and RIBES. 
These will
-then be matched to the human-derived NER metric, commonly used in re-speaking.
-"
-2397,1601.02828,Pawel Swietojanski and Jinyu Li and Steve Renals,"Learning Hidden Unit Contributions for Unsupervised Acoustic Model
- Adaptation",cs.CL cs.LG cs.SD," This work presents a broad study on the adaptation of neural network acoustic
-models by means of learning hidden unit contributions (LHUC) -- a method that
-linearly re-combines hidden units in a speaker- or environment-dependent manner
-using small amounts of unsupervised adaptation data. We also extend LHUC to a
-speaker adaptive training (SAT) framework that leads to a more adaptable DNN
-acoustic model, working both in a speaker-dependent and a speaker-independent
-manner, without the requirements to maintain auxiliary speaker-dependent
-feature extractors or to introduce significant speaker-dependent changes to the
-DNN structure. Through a series of experiments on four different speech
-recognition benchmarks (TED talks, Switchboard, AMI meetings, and Aurora4)
-comprising 270 test speakers, we show that LHUC in both its test-only and SAT
-variants results in consistent word error rate reductions ranging from 5% to
-23% relative depending on the task and the degree of mismatch between training
-and test data. In addition, we have investigated the effect of the amount of
-adaptation data per speaker, the quality of unsupervised adaptation targets,
-the complementarity to other adaptation techniques, one-shot adaptation, and an
-extension to adapting DNNs trained in a sequence discriminative manner.
-"
-2398,1601.03210,Carlos G\'omez-Rodr\'iguez and Ramon Ferrer-i-Cancho,"The scarcity of crossing dependencies: a direct outcome of a specific
- constraint?",cs.CL cs.SI physics.soc-ph," The structure of a sentence can be represented as a network where vertices
-are words and edges indicate syntactic dependencies. Interestingly, crossing
-syntactic dependencies have been observed to be infrequent in human languages.
-This leads to the question of whether the scarcity of crossings in languages
-arises from an independent and specific constraint on crossings. We provide
-statistical evidence suggesting that this is not the case, as the proportion of
-dependency crossings of sentences from a wide range of languages can be
-accurately estimated by a simple predictor based on a null hypothesis on the
-local probability that two dependencies cross given their lengths. The relative
-error of this predictor never exceeds 5% on average, whereas the error of a
-baseline predictor assuming a random ordering of the words of a sentence is at
-least 6 times greater. Our results suggest that the low frequency of crossings
-in natural languages originates neither from hidden knowledge of language nor
-from the undesirability of crossings per se, but is a mere side effect of the
-principle of dependency length minimization.
-"
-2399,1601.03288,"Vincent Van Asch, Walter Daelemans","Predicting the Effectiveness of Self-Training: Application to Sentiment
- Classification",cs.CL," The goal of this paper is to investigate the connection between the
-performance gain that can be obtained by self-training and the similarity
-between the corpora used in this approach. Self-training is a semi-supervised
-technique designed to increase the performance of machine learning algorithms
-by automatically classifying instances of a task and adding these as additional
-training material to the same classifier.
In the context of language processing -tasks, this training material is mostly an (annotated) corpus. Unfortunately -self-training does not always lead to a performance increase and whether it -will is largely unpredictable. We show that the similarity between corpora can -be used to identify those setups for which self-training can be beneficial. We -consider this research as a step in the process of developing a classifier that -is able to adapt itself to each new test corpus that it is presented with. -" -2400,1601.03313,Valentin Kassarnig,Political Speech Generation,cs.CL," In this report we present a system that can generate political speeches for a -desired political party. Furthermore, the system allows to specify whether a -speech should hold a supportive or opposing opinion. The system relies on a -combination of several state-of-the-art NLP methods which are discussed in this -report. These include n-grams, Justeson & Katz POS tag filter, recurrent neural -networks, and latent Dirichlet allocation. Sequences of words are generated -based on probabilities obtained from two underlying models: A language model -takes care of the grammatical correctness while a topic model aims for textual -consistency. Both models were trained on the Convote dataset which contains -transcripts from US congressional floor debates. Furthermore, we present a -manual and an automated approach to evaluate the quality of generated speeches. -In an experimental evaluation generated speeches have shown very high quality -in terms of grammatical correctness and sentence transitions. -" -2401,1601.03317,"Shi Feng, Shujie Liu, Mu Li, Ming Zhou","Implicit Distortion and Fertility Models for Attention-based - Encoder-Decoder NMT Model",cs.CL," Neural machine translation has shown very promising results lately. Most NMT -models follow the encoder-decoder framework. To make encoder-decoder models -more flexible, attention mechanism was introduced to machine translation and -also other tasks like speech recognition and image captioning. We observe that -the quality of translation by attention-based encoder-decoder can be -significantly damaged when the alignment is incorrect. We attribute these -problems to the lack of distortion and fertility models. Aiming to resolve -these problems, we propose new variations of attention-based encoder-decoder -and compare them with other models on machine translation. Our proposed method -achieved an improvement of 2 BLEU points over the original attention-based -encoder-decoder. -" -2402,1601.03348,"Kayhan Moharreri, Minsu Ha, Ross H Nehm","EvoGrader: an online formative assessment tool for automatically - evaluating written evolutionary explanations",cs.CL," EvoGrader is a free, online, on-demand formative assessment service designed -for use in undergraduate biology classrooms. EvoGrader's web portal is powered -by Amazon's Elastic Cloud and run with LightSIDE Lab's open-source -machine-learning tools. The EvoGrader web portal allows biology instructors to -upload a response file (.csv) containing unlimited numbers of evolutionary -explanations written in response to 86 different ACORNS (Assessing COntextual -Reasoning about Natural Selection) instrument items. The system automatically -analyzes the responses and provides detailed information about the scientific -and naive concepts contained within each student's response, as well as overall -student (and sample) reasoning model types. 
Graphs and visual models provided -by EvoGrader summarize class-level responses; downloadable files of raw scores -(in .csv format) are also provided for more detailed analyses. Although the -computational machinery that EvoGrader employs is complex, using the system is -easy. Users only need to know how to use spreadsheets to organize student -responses, upload files to the web, and use a web browser. A series of -experiments using new samples of 2,200 written evolutionary explanations -demonstrate that EvoGrader scores are comparable to those of trained human -raters, although EvoGrader scoring takes 99% less time and is free. EvoGrader -will be of interest to biology instructors teaching large classes who seek to -emphasize scientific practices such as generating scientific explanations, and -to teach crosscutting ideas such as evolution and natural selection. The -software architecture of EvoGrader is described as it may serve as a template -for developing machine-learning portals for other core concepts within biology -and across other disciplines. -" -2403,1601.03478,Afroze Ibrahim Baqapuri,Deep Learning Applied to Image and Text Matching,cs.LG cs.CL cs.CV," The ability to describe images with natural language sentences is the -hallmark for image and language understanding. Such a system has wide ranging -applications such as annotating images and using natural sentences to search -for images. In this project we focus on the task of bidirectional image -retrieval: such a system is capable of retrieving an image based on a sentence -(image search) and retrieving a sentence based on an image query (image annotation). -We present a system based on a global ranking objective function which uses a -combination of convolutional neural networks (CNN) and multi-layer perceptrons -(MLP). It takes a pair of image and sentence and processes them in different -channels, finally embedding them into a common multimodal vector space. These -embeddings encode abstract semantic information about the two inputs and can be -compared using traditional information retrieval approaches. For each such pair, -the model returns a score which is interpreted as a similarity metric. If this -score is high, the image and sentence are likely to convey similar meaning, and -if the score is low then they are likely not to. - The visual input is modeled via a deep convolutional neural network. On -the other hand we explore three models for the textual module. The first one -is bag of words with an MLP. The second one uses n-grams (bigrams, trigrams, and a -combination of trigrams & skip-grams) with an MLP. The third is a more specialized -deep network specific for modeling variable length sequences (SSE). We report -comparable performance to recent work in the field, even though our overall -model is simpler. We also show that the training time choice of how we can -generate our negative samples has a significant impact on performance, and can -be used to specialize the bi-directional system in one particular task. -" -2404,1601.03650,"Vuong Van Bui, Cuong Anh Le",Smoothing parameter estimation framework for IBM word alignment models,cs.CL," IBM models are very important word alignment models in Machine Translation. -Following the Maximum Likelihood Estimation principle to estimate their -parameters, the models will easily overfit the training data when the data are -sparse. While smoothing is a very popular solution in language modelling, studies -on smoothing for word alignment are still lacking.
In this paper, we propose -a framework which generalizes the notable work of Moore [2004] on applying -additive smoothing to word alignment models. The framework allows developers to -customize the smoothing amount for each pair of words. The added amount will be -scaled appropriately by a common factor which reflects how much the framework -trusts the adding strategy according to the performance on data. We also -carefully examine various performance criteria and propose a smoothed version -of the error count, which generally gives the best result. -" -2405,1601.03651,"Yan Xu, Ran Jia, Lili Mou, Ge Li, Yunchuan Chen, Yangyang Lu, Zhi Jin","Improved Relation Classification by Deep Recurrent Neural Networks with - Data Augmentation",cs.CL cs.LG," Nowadays, neural networks play an important role in the task of relation -classification. By designing different neural architectures, researchers have -improved the performance to a large extent in comparison with traditional -methods. However, existing neural networks for relation classification are -usually of shallow architectures (e.g., one-layer convolutional neural networks -or recurrent networks). They may fail to explore the potential representation -space at different abstraction levels. In this paper, we propose deep recurrent -neural networks (DRNNs) for relation classification to tackle this challenge. -Further, we propose a data augmentation method by leveraging the directionality -of relations. We evaluated our DRNNs on the SemEval-2010 Task 8, and achieve an -F1-score of 86.1%, outperforming previously recorded state-of-the-art results. -" -2406,1601.03764,"Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski","Linear Algebraic Structure of Word Senses, with Applications to Polysemy",cs.CL cs.LG stat.ML," Word embeddings are ubiquitous in NLP and information retrieval, but it is -unclear what they represent when the word is polysemous. Here it is shown that -multiple word senses reside in linear superposition within the word embedding -and simple sparse coding can recover vectors that approximately capture the -senses. The success of our approach, which applies to several embedding -methods, is mathematically explained using a variant of the random walk on -discourses model (Arora et al., 2016). A novel aspect of our technique is that -each extracted word sense is accompanied by one of about 2000 ""discourse atoms"" -that gives a succinct description of which other words co-occur with that word -sense. Discourse atoms can be of independent interest, and make the method -potentially more useful. Empirical tests are used to verify and support the -theory. -" -2407,1601.03783,Duygu Altinok,Towards Turkish ASR: Anatomy of a rule-based Turkish g2p,cs.CL," This paper describes the architecture and implementation of a rule-based -grapheme to phoneme converter for Turkish. The system accepts a surface form as -input and outputs the SAMPA mappings of all parallel pronunciations according to -the morphological analysis, together with stress positions.
The system has been -implemented in Python. -" -2408,1601.03896,"Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut - Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank","Automatic Description Generation from Images: A Survey of Models, - Datasets, and Evaluation Measures",cs.CL cs.CV," Automatic description generation from natural images is a challenging problem -that has recently received a large amount of interest from the computer vision -and natural language processing communities. In this survey, we classify the -existing approaches based on how they conceptualize this problem, viz., models -that cast description as either a generation problem or a retrieval problem -over a visual or multimodal representational space. We provide a detailed -review of existing models, highlighting their advantages and disadvantages. -Moreover, we give an overview of the benchmark image datasets and the -evaluation measures that have been developed to assess the quality of -machine-generated image descriptions. Finally, we extrapolate future directions -in the area of automatic image description generation. -" -2409,1601.03916,"Julian Hitschler, Shigehiko Schamoni and Stefan Riezler",Multimodal Pivots for Image Caption Translation,cs.CL," We present an approach to improve statistical machine translation of image -descriptions by multimodal pivots defined in visual space. The key idea is to -perform image retrieval over a database of images that are captioned in the -target language, and use the captions of the most similar images for -crosslingual reranking of translation outputs. Our approach does not depend on -the availability of large amounts of in-domain parallel data, but only relies -on available large datasets of monolingually captioned images, and on -state-of-the-art convolutional neural networks to compute image similarities. -Our experimental evaluation shows improvements of 1 BLEU point over strong -baselines. -" -2410,1601.04012,Jugal Kalita,Detecting and Extracting Events from Text Documents,cs.CL," Events of various kinds are mentioned and discussed in text documents, -whether they are books, news articles, blogs or microblog feeds. The paper -starts by giving an overview of how events are treated in linguistics and -philosophy. We follow this discussion by surveying how events and associated -information are handled computationally. In particular, we look at how -textual documents can be mined to extract events and ancillary information. -These days, this is done mostly through the application of various machine learning -techniques. We also discuss applications of event detection and extraction -systems, particularly in summarization, in the medical domain and in the -context of Twitter posts. We end the paper with a discussion of challenges and -future directions. -" -2411,1601.04075,Igor A. Podgorny,"Modification of Question Writing Style Influences Content Popularity in - a Social Q&A System",cs.IR cs.CL cs.SI," TurboTax AnswerXchange is a social Q&A system supporting users working on -federal and state tax returns. Using 2015 data, we demonstrate that content -popularity (or number of views per AnswerXchange question) can be predicted -with reasonable accuracy based on attributes of the question alone. We also -employ probabilistic topic analysis and uplift modeling to identify question -features with the highest impact on popularity.
We demonstrate that content -popularity is driven by behavioral attributes of AnswerXchange users and -depends on complex interactions between search ranking algorithms, -psycholinguistic factors and question writing style. Our findings provide a -rationale for employing popularity predictions to guide users into -formulating better questions and editing existing ones. For example, -starting a question title with a question word or adding details to the question -increases the number of views per question. A similar approach can be applied to -promoting AnswerXchange content indexed by Google to drive organic traffic to -TurboTax. -" -2412,1601.04468,Artem Sokolov and Stefan Riezler and Tanguy Urvoy,"Bandit Structured Prediction for Learning from Partial Feedback in - Statistical Machine Translation",cs.CL cs.LG," We present an approach to structured prediction from bandit feedback, called -Bandit Structured Prediction, where only the value of a task loss function at a -single predicted point, instead of a correct structure, is observed in -learning. We present an application to discriminative reranking in Statistical -Machine Translation (SMT) where the learning algorithm only has access to a -1-BLEU loss evaluation of a predicted translation instead of obtaining a gold -standard reference translation. In our experiments, bandit feedback is obtained -by evaluating BLEU on reference translations without revealing them to the -algorithm. This can be thought of as a simulation of interactive machine -translation where an SMT system is personalized by a user who provides single -point feedback to predicted translations. Our experiments show that our -approach improves translation quality and is comparable to approaches that -employ more informative feedback in learning. -" -2413,1601.04580,Vinodh Krishnan and Jacob Eisenstein,Nonparametric Bayesian Storyline Detection from Microtexts,cs.CL cs.LG," News events and social media are composed of evolving storylines, which -capture public attention for a limited period of time. Identifying storylines -requires integrating temporal and linguistic information, and prior work takes -a largely heuristic approach. We present a novel online non-parametric Bayesian -framework for storyline detection, using the distance-dependent Chinese -Restaurant Process (dd-CRP). To ensure efficient linear-time inference, we -employ a fixed-lag Gibbs sampling procedure, which is novel for the dd-CRP. We -evaluate on the TREC Twitter Timeline Generation (TTG), obtaining encouraging -results: despite using a weak baseline retrieval model, the dd-CRP story -clustering method is competitive with the best entries in the 2014 TTG task. -" -2414,1601.04811,"Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li",Modeling Coverage for Neural Machine Translation,cs.CL," Attention mechanism has enhanced state-of-the-art Neural Machine Translation -(NMT) by jointly learning to align and translate. It tends to ignore past -alignment information, however, which often leads to over-translation and -under-translation. To address this problem, we propose coverage-based NMT in -this paper. We maintain a coverage vector to keep track of the attention -history. The coverage vector is fed to the attention model to help adjust -future attention, which lets the NMT system pay more attention to untranslated -source words. Experiments show that the proposed approach significantly -improves both translation quality and alignment quality over standard -attention-based NMT.
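As a concrete illustration of the coverage idea in the abstract above: the sketch below is a minimal NumPy rendering of additive attention extended with a coverage term, not the authors' implementation; the parameter names (W_h, W_s, w_c, v) are hypothetical.

import numpy as np

def attend_with_coverage(enc, dec_state, coverage, W_h, W_s, w_c, v):
    """One decoding step of additive attention with a coverage term.

    enc: (T, d) encoder states; dec_state: (d,) decoder state;
    coverage: (T,) sum of past attention weights per source position.
    """
    # Coverage enters the score, so positions attended to in the past
    # are discouraged from soaking up attention again.
    scores = np.tanh(enc @ W_h + dec_state @ W_s + np.outer(coverage, w_c)) @ v
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha, coverage + alpha  # accumulate the attention history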
-" -2415,1601.04908,"Desislava Bankova, Bob Coecke, Martha Lewis, Daniel Marsden",Graded Entailment for Compositional Distributional Semantics,cs.CL cs.AI cs.LO math.CT quant-ph," The categorical compositional distributional model of natural language -provides a conceptually motivated procedure to compute the meaning of -sentences, given grammatical structure and the meanings of its words. This -approach has outperformed other models in mainstream empirical language -processing tasks. However, until recently it has lacked the crucial feature of -lexical entailment -- as do other distributional models of meaning. - In this paper we solve the problem of entailment for categorical -compositional distributional semantics. Taking advantage of the abstract -categorical framework allows us to vary our choice of model. This enables the -introduction of a notion of entailment, exploiting ideas from the categorical -semantics of partial knowledge in quantum computation. - The new model of language uses density matrices, on which we introduce a -novel robust graded order capturing the entailment strength between concepts. -This graded measure emerges from a general framework for approximate -entailment, induced by any commutative monoid. Quantum logic embeds in our -graded order. - Our main theorem shows that entailment strength lifts compositionally to the -sentence level, giving a lower bound on sentence entailment. We describe the -essential properties of graded entailment such as continuity, and provide a -procedure for calculating entailment strength. -" -2416,1601.05194,"Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang",Improved Spoken Document Summarization with Coverage Modeling Techniques,cs.CL cs.IR," Extractive summarization aims at selecting a set of indicative sentences from -a source document as a summary that can express the major theme of the -document. A general consensus on extractive summarization is that both -relevance and coverage are critical issues to address. The existing methods -designed to model coverage can be characterized by either reducing redundancy -or increasing diversity in the summary. Maximal margin relevance (MMR) is a -widely-cited method since it takes both relevance and redundancy into account -when generating a summary for a given document. Beyond MMR, there is a dearth -of research concentrating on reducing redundancy or increasing -diversity for the spoken document summarization task, as far as we are aware. -Motivated by these observations, two major contributions are presented in this -paper. First, in contrast to MMR, which considers coverage by reducing -redundancy, we propose two novel coverage-based methods, which directly -increase diversity. With the proposed methods, a set of representative -sentences, which not only are relevant to the given document but also cover -most of the important sub-themes of the document, can be selected -automatically. Second, we take a step further and plug several -document/sentence representation methods into the proposed framework to further -enhance the summarization performance. A series of empirical evaluations -demonstrate the effectiveness of our proposed methods.
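Since the abstract above builds on MMR, here is a minimal sketch of the standard greedy MMR selection loop over sentence vectors; this is the widely cited generic formulation, not the paper's coverage-based methods, and cosine similarity plus the lambda trade-off are the usual (assumed) choices.

import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mmr_select(doc_vec, sent_vecs, k, lam=0.7):
    """Greedy MMR: trade relevance to the document against redundancy
    with the sentences already selected."""
    selected, remaining = [], list(range(len(sent_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cos(sent_vecs[i], doc_vec)
            redundancy = max((cos(sent_vecs[i], sent_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected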
-" -2417,1601.05403,"Jo\~ao Sedoc, Jean Gallier, Lyle Ungar, Dean Foster",Semantic Word Clusters Using Signed Normalized Graph Cuts,cs.CL cs.AI," Vector space representations of words capture many aspects of word -similarity, but such methods tend to make vector spaces in which antonyms (as -well as synonyms) are close to each other. We present a new signed spectral -normalized graph cut algorithm, signed clustering, that overlays existing -thesauri upon distributionally derived vector representations of words, so that -antonym relationships between word pairs are represented by negative weights. -Our signed clustering algorithm produces clusters of words which simultaneously -capture distributional and synonym relations. We evaluate these clusters -against the SimLex-999 dataset (Hill et al., 2014) of human judgments of word -pair similarities, and also show the benefit of using our clusters to predict -the sentiment of a given text. -" -2418,1601.05472,"Halid Ziya Yerebakan, Fitsum Reda, Yiqiang Zhan, Yoshihisa Shinagawa",Hierarchical Latent Word Clustering,cs.CL," This paper presents a new Bayesian non-parametric model by extending the -usage of Hierarchical Dirichlet Allocation to extract tree structured word -clusters from text data. The inference algorithm of the model collects words in -a cluster if they share a similar distribution over documents. In our -experiments, we observed meaningful hierarchical structures on the NIPS corpus and -radiology reports collected from public repositories. -" -2419,1601.05647,"Milos Cernak, Afsaneh Asaei, Herv\'e Bourlard",On Structured Sparsity of Phonological Posteriors for Linguistic Parsing,cs.CL," The speech signal conveys information on different time scales, from the short -(segmental) time scale, associated with phonological and phonetic information, to -the long (supra-segmental) time scale, associated with syllabic and prosodic -information. Linguistic and neurocognitive studies recognize the phonological -classes at the segmental level as the essential and invariant representations used -in speech temporal organization. In the context of speech processing, a deep -neural network (DNN) is an effective computational method to infer the -probability of individual phonological classes from a short segment of speech -signal. A vector of all phonological class probabilities is referred to as a -phonological posterior. Only a very few classes are present in a short-term -speech signal; hence, the phonological posterior is a sparse vector. Although -the phonological posteriors are estimated at the segmental level, we claim that -they convey supra-segmental information. Specifically, we demonstrate that -phonological posteriors are indicative of syllabic and prosodic events. -Building on findings from converging linguistic evidence on the gestural model -of Articulatory Phonology as well as the neural basis of speech perception, we -hypothesize that phonological posteriors convey properties of linguistic -classes at multiple time scales, and this information is embedded in their -support (index) of active coefficients. To verify this hypothesis, we obtain a -binary representation of phonological posteriors at the segmental level which -is referred to as the first-order sparsity structure; the high-order structures are -obtained by the concatenation of first-order binary vectors.
It is then -confirmed that the classification of supra-segmental linguistic events, the -problem known as linguistic parsing, can be achieved with high accuracy using -a simple binary pattern matching of first-order or high-order structures. -" -2420,1601.05768,Daniel Christen,Syntax-Semantics Interaction Parsing Strategies. Inside SYNTAGMA,cs.CL," This paper discusses SYNTAGMA, a rule-based NLP system addressing the tricky -issues of syntactic ambiguity reduction and word sense disambiguation as well -as providing innovative and original solutions for constituent generation and -constraints management. To provide an insight into how it operates, the -system's general architecture and components, as well as its lexical, syntactic -and semantic resources, are described. After that, the paper addresses the -mechanism that performs selective parsing through an interaction between -syntactic and semantic information, leading the parser to a coherent and -accurate interpretation of the input text. -" -2421,1601.05893,"Shawn Brunsting, Hans De Sterck, Remco Dolman, Teun van Sprundel","GeoTextTagger: High-Precision Location Tagging of Textual Documents - using a Natural Language Processing Approach",cs.AI cs.CL cs.DB cs.IR," Location tagging, also known as geotagging or geolocation, is the process of -assigning geographical coordinates to input data. In this paper we present an -algorithm for location tagging of textual documents. Our approach makes use of -previous work in natural language processing by using a state-of-the-art -part-of-speech tagger and named entity recognizer to find blocks of text which -may refer to locations. A knowledge base (OpenStreetMap) is then used to find a -list of possible locations for each block. Finally, one location is chosen for -each block by assigning distance-based scores to each location and repeatedly -selecting the location and block with the best score. We tested our geolocation -algorithm with Wikipedia articles about topics with a well-defined geographical -location that are geotagged by the articles' authors, where classification -approaches have achieved median errors as low as 11 km, with attainable -accuracy limited by the class size. Our approach achieved a 10th percentile -error of 490 metres and median error of 54 kilometres on the Wikipedia dataset -we used. When considering the five location tags with the greatest scores, 50% -of articles were assigned at least one tag within 8.5 kilometres of the -article's author-assigned true location. We also tested our approach on Twitter -messages that are tagged with the location from which the message was sent. -Twitter texts are challenging because they are short and unstructured and often -do not contain words referring to the location they were sent from, but we -obtain potentially useful results. We explain how we use the Spark framework -for data analytics to collect and process our test data. In general, -classification-based approaches for location tagging may be reaching their -upper accuracy limit, but our precision-focused approach has high accuracy for -some texts and shows significant potential for improvement overall. -" -2422,1601.05936,"Pranay Dighe, Gil Luyet, Afsaneh Asaei and Herve Bourlard","Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic - Modeling in Speech Recognition",cs.CL cs.LG stat.ML," We propose to model the acoustic space of deep neural network (DNN) -class-conditional posterior probabilities as a union of low-dimensional -subspaces.
To that end, the training posteriors are used for dictionary -learning and sparse coding. Sparse representation of the test posteriors using -this dictionary enables projection to the space of training data. Relying on -the fact that the intrinsic dimensions of the posterior subspaces are indeed -very small and the matrix of all posteriors belonging to a class has a very low -rank, we demonstrate how low-dimensional structures enable further enhancement -of the posteriors and rectify the spurious errors due to mismatch conditions. -The enhanced acoustic modeling method leads to improvements in a continuous -speech recognition task using the hybrid DNN-HMM (hidden Markov model) framework in -both clean and noisy conditions, where up to 15.4% relative reduction in word -error rate (WER) is achieved. -" -2423,1601.05991,"Milos Cernak, Stefan Benus, Alexandros Lazaridis",Speech vocoding for laboratory phonology,cs.CL cs.SD," Using phonological speech vocoding, we propose a platform for exploring -relations between phonology and speech processing, and in broader terms, for -exploring relations between the abstract and physical structures of a speech -signal. Our goal is to make a step towards bridging phonology and speech -processing and to contribute to the program of Laboratory Phonology. We show -three application examples for laboratory phonology: compositional phonological -speech modelling, a comparison of phonological systems and an experimental -phonological parametric text-to-speech (TTS) system. The featural -representations of the following three phonological systems are considered in -this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English -(SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded -speech, we conclude that the latter achieves slightly better results than the -former. However, GP - the most compact phonological speech representation - -performs comparably to the systems with a higher number of phonological -features. The parametric TTS based on phonological speech representation, and -trained from an unlabelled audiobook in an unsupervised manner, achieves -intelligibility of 85% of the state-of-the-art parametric speech synthesis. We -envision that the presented approach paves the way for researchers in both -fields to form meaningful hypotheses that are explicitly testable using the -concepts developed and exemplified in this paper. On the one hand, laboratory -phonologists might test the applied concepts of their theoretical models, and -on the other hand, the speech processing community may utilize the concepts -developed for the theoretical phonological models for improvements of the -current state-of-the-art applications. -" -2424,1601.06068,"Shashi Narayan, Siva Reddy and Shay B. Cohen",Paraphrase Generation from Latent-Variable PCFGs for Semantic Parsing,cs.CL," One of the limitations of semantic parsing approaches to open-domain question -answering is the lexicosyntactic gap between natural language questions and -knowledge base entries -- there are many ways to ask a question, all with the -same answer. In this paper we propose to bridge this gap by generating -paraphrases of the input question with the goal that at least one of them will -be correctly mapped to a knowledge-base query. We introduce a novel grammar -model for paraphrase generation that does not require any sentence-aligned -paraphrase corpus.
Our key idea is to leverage the flexibility and scalability -of latent-variable probabilistic context-free grammars to sample paraphrases. -We do an extrinsic evaluation of our paraphrases by plugging them into a -semantic parser for Freebase. Our evaluation experiments on the WebQuestions -benchmark dataset show that the performance of the semantic parser -significantly improves over strong baselines. -" -2425,1601.06081,Marco Guerini and Carlo Strapparava,Why Do Urban Legends Go Viral?,cs.CL cs.CY cs.SI," Urban legends are a genre of modern folklore, consisting of stories about -rare and exceptional events, just plausible enough to be believed, which tend -to propagate inexorably across communities. In our view, while urban legends -represent a form of ""sticky"" deceptive text, they are marked by a tension -between the credible and incredible. They should be credible like a news -article and incredible like a fairy tale to go viral. In particular we will -focus on the idea that urban legends should mimic the details of news (who, -where, when) to be credible, while they should be emotional and readable like a -fairy tale to be catchy and memorable. Using NLP tools we will provide a -quantitative analysis of these prototypical characteristics. We also lay out -some machine learning experiments showing that it is possible to recognize an -urban legend using just these simple features. -" -2426,1601.06303,"Max Kanovich, Stepan Kuznetsov, Andre Scedrov",Undecidability of the Lambek calculus with a relevant modality,math.LO cs.CL," Morrill and Valentin in the paper ""Computational coverage of TLG: -Nonlinearity"" considered an extension of the Lambek calculus enriched by a -so-called ""exponential"" modality. This modality behaves in the ""relevant"" -style, that is, it allows contraction and permutation, but not weakening. -Morrill and Valentin posed the open problem of whether this system is decidable. -Here we show its undecidability. Our result remains valid if we consider the -fragment where all division operations have one direction. We also show that -the derivability problem in a restricted case, where the modality can be -applied only to variables (primitive types), is decidable and belongs to the NP -class. -" -2427,1601.06579,"Dong Nguyen, Jacob Eisenstein",A Kernel Independence Test for Geographical Language Variation,cs.CL," Quantifying the degree of spatial dependence for linguistic variables is a -key task for analyzing dialectal variation. However, existing approaches have -important drawbacks. First, they are based on parametric models of dependence, -which limits their power in cases where the underlying parametric assumptions -are violated. Second, they are not applicable to all types of linguistic data: -some approaches apply only to frequencies, others to boolean indicators of -whether a linguistic variable is present. We present a new method for measuring -geographical language variation, which solves both of these problems. Our -approach builds on Reproducing Kernel Hilbert space (RKHS) representations for -nonparametric statistics, and takes the form of a test statistic that is -computed from pairs of individual geotagged observations without aggregation -into predefined geographical bins. We compare this test with prior work using -synthetic data as well as a diverse set of real datasets: a corpus of Dutch -tweets, a Dutch syntactic atlas, and a dataset of letters to the editor in -North American newspapers.
Our proposed test is shown to support robust -inferences across a broad range of scenarios and types of data. -" -2428,1601.06581,"Kyuyeon Hwang, Wonyong Sung","Character-Level Incremental Speech Recognition with Recurrent Neural - Networks",cs.CL cs.LG cs.NE," In real-time speech recognition applications, the latency is an important -issue. We have developed a character-level incremental speech recognition (ISR) -system that responds quickly even during the speech, where the hypotheses are -gradually improved while the speaking proceeds. The algorithm employs a -speech-to-character unidirectional recurrent neural network (RNN), which is -end-to-end trained with connectionist temporal classification (CTC), and an -RNN-based character-level language model (LM). The output values of the -CTC-trained RNN are character-level probabilities, which are processed by beam -search decoding. The RNN LM augments the decoding by providing long-term -dependency information. We propose tree-based online beam search with -additional depth-pruning, which enables the system to process infinitely long -input speech with low latency. This system not only responds quickly on speech -but also can dictate out-of-vocabulary (OOV) words according to pronunciation. -The proposed model achieves the word error rate (WER) of 8.90% on the Wall -Street Journal (WSJ) Nov'92 20K evaluation set when trained on the WSJ SI-284 -training set. -" -2429,1601.06732,"Martha Lewis, Jonathan Lawry",Concept Generation in Language Evolution,cs.AI cs.CL cs.MA," This thesis investigates the generation of new concepts from combinations of -existing concepts as a language evolves. We give a method for combining -concepts, and will be investigating the utility of composite concepts in -language evolution and thence the utility of concept generation. -" -2430,1601.06733,"Jianpeng Cheng, Li Dong, Mirella Lapata",Long Short-Term Memory-Networks for Machine Reading,cs.CL cs.NE," In this paper we address the question of how to render sequence-level -networks better at handling structured input. We propose a machine reading -simulator which processes text incrementally from left to right and performs -shallow reasoning with memory and attention. The reader extends the Long -Short-Term Memory architecture with a memory network in place of a single -memory cell. This enables adaptive memory usage during recurrence with neural -attention, offering a way to weakly induce relations among tokens. The system -is initially designed to process a single sequence but we also demonstrate how -to integrate it with an encoder-decoder architecture. Experiments on language -modeling, sentiment analysis, and natural language inference show that our -model matches or outperforms the state of the art. -" -2431,1601.06738,"Martha Lewis, Jonathan Lawry",A Label Semantics Approach to Linguistic Hedges,cs.AI cs.CL," We introduce a model for the linguistic hedges `very' and `quite' within the -label semantics framework, and combined with the prototype and conceptual -spaces theories of concepts. The proposed model emerges naturally from the -representational framework we use and as such, has a clear semantic grounding. -We give generalisations of these hedge models and show that they can be -composed with themselves and with other functions, going on to examine their -behaviour in the limit of composition. 
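For orientation only: the snippet below shows the classical power-function treatment of hedges (due to Zadeh), which illustrates the kind of composable hedge operators the abstract above discusses; the paper's own model is grounded in label semantics and prototype theory, which this sketch does not attempt to reproduce.

def very(mu):
    # Contraction hedge: sharpens a membership function by squaring it.
    return lambda x: mu(x) ** 2

def quite(mu):
    # Expansion hedge: dilates a membership function via a square root.
    return lambda x: mu(x) ** 0.5

# Toy graded concept: membership in "tall" as a function of height in cm.
tall = lambda cm: max(0.0, min(1.0, (cm - 160.0) / 30.0))

print(tall(181.0))              # 0.7
print(very(tall)(181.0))        # 0.49 -- hedges compose with concepts
print(very(very(tall))(181.0))  # 0.2401 -- and with themselves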
-" -2432,1601.06755,"Martha Lewis, Jonathan Lawry","The Utility of Hedged Assertions in the Emergence of Shared Categorical - Labels",cs.AI cs.CL cs.MA," We investigate the emergence of shared concepts in a community of language -users using a multi-agent simulation. We extend results showing that negated -assertions are of use in developing shared categories, to include assertions -modified by linguistic hedges. Results show that using hedged assertions -positively affects the emergence of shared categories in two distinct ways. -Firstly, using contraction hedges like `very' gives better convergence over -time. Secondly, using expansion hedges such as `quite' reduces concept overlap. -However, both these improvements come at a cost of slower speed of development. -" -2433,1601.06763,"Martha Lewis, Jonathan Lawry","Emerging Dimension Weights in a Conceptual Spaces Model of Concept - Combination",cs.AI cs.CL cs.MA," We investigate the generation of new concepts from combinations of properties -as an artificial language develops. To do so, we have developed a new framework -for conjunctive concept combination. This framework gives a semantic grounding -to the weighted sum approach to concept combination seen in the literature. We -implement the framework in a multi-agent simulation of language evolution and -show that shared combination weights emerge. The expected value and the -variance of these weights across agents may be predicted from the distribution -of elements in the conceptual space, as determined by the underlying -environment, together with the rate at which agents adopt others' concepts. -When this rate is smaller, the agents are able to converge to weights with -lower variance. However, the time taken to converge to a steady state -distribution of weights is longer. -" -2434,1601.06971,"Vishal.A.Kharde, Prof. Sheetal.Sonawane",Sentiment Analysis of Twitter Data: A Survey of Techniques,cs.CL," With the advancement and growth of web technology, there is a huge volume -of data present on the web for internet users, and a lot of new data is generated -as well. The internet has become a platform for online learning, exchanging ideas and -sharing opinions. Social networking sites like Twitter, Facebook, Google+ are -rapidly gaining popularity as they allow people to share and express their -views about topics, have discussions with different communities, or post messages -across the world. There has been a lot of work in the field of sentiment analysis -of Twitter data. This survey focuses mainly on sentiment analysis of Twitter -data, which is helpful for analyzing the information in tweets, where opinions -are highly unstructured, heterogeneous, and either positive, negative, or -neutral in some cases. In this paper, we provide a survey and a comparative -analysis of existing techniques for opinion mining, such as machine learning and -lexicon-based approaches, together with evaluation metrics. Using various -machine learning algorithms like Naive Bayes, Max Entropy, and Support Vector -Machines, we present research on Twitter data streams. General challenges and -applications of sentiment analysis on Twitter are also discussed in this paper.
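To make the survey's machine-learning route concrete, here is a minimal, generic Twitter polarity classifier in the Naive Bayes style the abstract above mentions; the toy tweets and labels are invented placeholders, and any real study would train on a proper labelled corpus.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy data standing in for a labelled tweet corpus.
tweets = ["I love this phone", "great match today",
          "worst service ever", "so disappointed with the update"]
labels = ["positive", "positive", "negative", "negative"]

# Word and bigram TF-IDF features feed a multinomial Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(tweets, labels)
print(clf.predict(["the battery life is great"]))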
-" -2435,1601.07124,"Elvys Linhares Pontes, Juan-Manuel Torres-Moreno, Andr\'ea Carneiro - Linhares","LIA-RAG: a system based on graphs and divergence of probabilities - applied to Speech-To-Text Summarization",cs.CL cs.IR," This paper introduces a new algorithm for automatic speech-to-text -summarization based on statistical divergences of probabilities and graphs. The -input is a noisy text from speech conversations, and the output is a compact -text summary. Our results on the CCCS MultiLing 2015 pilot-task French corpus -are very encouraging. -" -2436,1601.07215,"Prasanna Kumar Muthukumar, Alan W Black","Recurrent Neural Network Postfilters for Statistical Parametric Speech - Synthesis",cs.CL," In the last two years, there have been numerous papers that have looked into -using Deep Neural Networks to replace the acoustic model in traditional -statistical parametric speech synthesis. However, far less attention has been -paid to approaches like DNN-based postfiltering where DNNs work in conjunction -with traditional acoustic models. In this paper, we investigate the use of -Recurrent Neural Networks as a potential postfilter for synthesis. We explore -the possibility of replacing existing postfilters, as well as highlight the -ease with which arbitrary new features can be added as input to the postfilter. -We also tried a novel approach of jointly training the Classification And -Regression Tree and the postfilter, rather than the traditional approach of -training them independently. -" -2437,1601.07435,Torsten Timm,Co-Occurrence Patterns in the Voynich Manuscript,cs.CL cs.CR," The Voynich Manuscript is a medieval book written in an unknown script. This -paper studies the distribution of similarly spelled words in the Voynich -Manuscript. It shows that the distribution of words within the manuscript is -not compatible with natural languages. -" -2438,1601.07969,"Jake Ryland Williams, James P. Bagrow, Andrew J. Reagan, Sharon E. - Alajajian, Christopher M. Danforth, and Peter Sheridan Dodds",Zipf's law is a consequence of coherent language production,cs.CL," The task of text segmentation may be undertaken at many levels in text -analysis---paragraphs, sentences, words, or even letters. Here, we focus on a -relatively fine scale of segmentation, hypothesizing it to be in accord with a -stochastic model of language generation, as the smallest scale where -independent units of meaning are produced. Our goals in this letter include the -development of methods for the segmentation of these minimal independent units, -which produce feature-representations of texts that align with the independence -assumption of the bag-of-terms model, commonly used for prediction and -classification in computational text analysis. We also propose the measurement -of texts' association (with respect to realized segmentations) to the model of -language generation. We find (1) that our segmentations of phrases exhibit much -better associations to the generation model than words and (2) that texts -which are well fit are generally topically homogeneous. Because our generative -model produces Zipf's law, our study further suggests that Zipf's law may be a -consequence of homogeneity in language production. -" -2439,1601.08188,"Michael Wand and Jan Koutn\'ik and J\""urgen Schmidhuber",Lipreading with Long Short-Term Memory,cs.CV cs.CL," Lipreading, i.e.
speech recognition from visual-only recordings of a -speaker's face, can be achieved with a processing pipeline based solely on -neural networks, yielding significantly better accuracy than conventional -methods. Feed-forward and recurrent neural network layers (namely Long -Short-Term Memory; LSTM) are stacked to form a single structure which is -trained by back-propagating error gradients through all the layers. The -performance of such a stacked network was experimentally evaluated and compared -to a standard Support Vector Machine classifier using conventional computer -vision features (Eigenlips and Histograms of Oriented Gradients). The -evaluation was performed on data from 19 speakers of the publicly available -GRID corpus. With 51 different words to classify, we report a best word -accuracy on held-out evaluation speakers of 79.6% using the end-to-end neural -network-based solution (11.6% improvement over the best feature-based solution -evaluated). -" -2440,1602.00104,Mahyuddin K. M. Nasution,"Extracting Keyword for Disambiguating Name Based on the Overlap - Principle",cs.IR cs.CL," Name disambiguation has become one of the main themes in the Semantic Web -agenda. The semantic web is an extension of the current Web in which -information is not only given well-defined meaning, but also serves many purposes -that are naturally ambiguous or that involve a great deal of overlap, -chiefly those dealing with person names. Therefore, we develop an approach to -extract keywords from web snippets by utilizing the overlap principle, a -concept for understanding ambiguous things, whereby features of a person are -generated to deal with the variety of the web, which is steadily gaining -ground in semantic research. -" -2441,1602.00293,"Suman Kalyan Maity, Chaitanya Sarda, Anshit Chaudhary, Abhijeet Patil, - Shraman Kumar, Akash Mondal and Animesh Mukherjee",WASSUP? LOL : Characterizing Out-of-Vocabulary Words in Twitter,cs.CL cs.SI," Language in social media is mostly driven by new words and spellings that are -constantly entering the lexicon, thereby polluting it and resulting in high -deviation from the formal written version. The primary entities of such -language are the out-of-vocabulary (OOV) words. In this paper, we study various -sociolinguistic properties of the OOV words and propose a classification model -to categorize them into at least six categories. We achieve 81.26% accuracy -with high precision and recall. We observe that the content features are the -most discriminative ones followed by lexical and context features. -" -2442,1602.00367,Yijun Xiao and Kyunghyun Cho,"Efficient Character-level Document Classification by Combining - Convolution and Recurrent Layers",cs.CL," Document classification tasks have primarily been tackled at the word level. Recent -research that works with character-level inputs shows several benefits over -word-level approaches such as natural incorporation of morphemes and better -handling of rare words. We propose a neural network architecture that utilizes -both convolution and recurrent layers to efficiently encode character inputs. -We validate the proposed model on eight large scale document classification -tasks and compare with character-level convolution-only models. It achieves -comparable performance with far fewer parameters.
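A minimal PyTorch sketch of the architecture family described above: character embeddings, a strided convolution that shortens the sequence, and a recurrent layer over the convolved features. The layer sizes here are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class ConvRecClassifier(nn.Module):
    def __init__(self, n_chars=128, emb=16, channels=64, hidden=64, n_classes=4):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb)
        # Strided convolution compresses the character sequence before the RNN.
        self.conv = nn.Conv1d(emb, channels, kernel_size=5, stride=2)
        self.rnn = nn.LSTM(channels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):                             # x: (batch, seq) char ids
        e = self.emb(x).transpose(1, 2)               # (batch, emb, seq)
        c = torch.relu(self.conv(e)).transpose(1, 2)  # (batch, seq', channels)
        _, (h, _) = self.rnn(c)                       # final hidden state
        return self.out(h[-1])                        # class logits

logits = ConvRecClassifier()(torch.randint(0, 128, (2, 100)))
print(logits.shape)  # torch.Size([2, 4])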
-" -2443,1602.00426,"Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu, Chia-Hsiang Liu, - Hung-yi Lee and Lin-shan Lee","An Iterative Deep Learning Framework for Unsupervised Discovery of - Speech Features and Linguistic Units with Applications on Spoken Term - Detection",cs.CL cs.LG," In this work we aim to discover high quality speech features and linguistic -units directly from unlabeled speech data in a zero resource scenario. The -results are evaluated using the metrics and corpora proposed in the Zero -Resource Speech Challenge organized at Interspeech 2015. A Multi-layered -Acoustic Tokenizer (MAT) was proposed for automatic discovery of multiple sets -of acoustic tokens from the given corpus. Each acoustic token set is specified -by a set of hyperparameters that describe the model configuration. These sets -of acoustic tokens carry different characteristics of the given corpus and the -language behind it, and thus can be mutually reinforced. The multiple sets of token -labels are then used as the targets of a Multi-target Deep Neural Network -(MDNN) trained on low-level acoustic features. Bottleneck features extracted -from the MDNN are then used as the feedback input to the MAT and the MDNN -itself in the next iteration. We call this iterative deep learning framework -the Multi-layered Acoustic Tokenizing Deep Neural Network (MAT-DNN), which -generates both high quality speech features for the Track 1 of the Challenge -and acoustic tokens for the Track 2 of the Challenge. In addition, we performed -extra experiments on the same corpora on the application of query-by-example -spoken term detection. The experimental results showed the iterative deep -learning framework of MAT-DNN improved the detection performance due to better -underlying speech features and acoustic tokens. -" -2444,1602.00515,Nikola Milosevic,Marvin: Semantic annotation using multiple knowledge sources,cs.AI cs.CL," People are producing more written material than at any time in history. The -increase is so great that professionals from various fields are no longer able -to cope with this amount of publications. Text mining can offer tools to -help them, and one of the techniques that can aid information retrieval and -information extraction is semantic text annotation. In this report we present -Marvin, a text annotator written in Java, which can be used as a command line -tool and as a Java library. Marvin is able to annotate text using multiple -sources, including WordNet, MetaMap, DBPedia and thesauri represented as SKOS. -" -2445,1602.00812,"Richard Moot (LaBRI, CNRS)",The Grail theorem prover: Type theory for syntax and semantics,cs.CL," As the name suggests, type-logical grammars are a grammar formalism based on -logic and type theory. From the perspective of grammar design, type-logical -grammars develop the syntactic and semantic aspects of linguistic phenomena -hand-in-hand, letting the desired semantics of an expression inform the -syntactic type and vice versa. Prototypical examples of the successful -application of type-logical grammars to the syntax-semantics interface include -coordination, quantifier scope and extraction. This chapter describes the Grail -theorem prover, a series of tools for designing and testing grammars in various -modern type-logical grammars, which functions as a tool. All tools described in -this chapter are freely available.
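The Grail prover itself handles much richer type logics, but a toy AB-style categorial reducer conveys the basic mechanism behind the syntax-semantics interface mentioned above; the encoding of types as nested tuples is purely an assumption of this sketch.

# Types: atoms are strings; ('B', '/', 'A') seeks an A on its right,
# ('A', '\\', 'B') seeks an A on its left.
def reduce_pair(left, right):
    if isinstance(left, tuple) and left[1] == '/' and left[2] == right:
        return left[0]                 # forward application: B/A, A => B
    if isinstance(right, tuple) and right[1] == '\\' and right[0] == left:
        return right[2]                # backward application: A, A\B => B
    return None

def derives(types, goal='S'):
    # Try every adjacent reduction; succeed if the sequence collapses to goal.
    if len(types) == 1:
        return types[0] == goal
    return any(
        (r := reduce_pair(types[i], types[i + 1])) is not None
        and derives(types[:i] + [r] + types[i + 2:], goal)
        for i in range(len(types) - 1)
    )

# "John sleeps": NP followed by an intransitive verb of type NP\S.
print(derives(['NP', ('NP', '\\', 'S')]))  # True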
-" -2446,1602.01103,"Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, Lillian - Lee","Winning Arguments: Interaction Dynamics and Persuasion Strategies in - Good-faith Online Discussions",cs.SI cs.CL physics.soc-ph," Changing someone's opinion is arguably one of the most important challenges -of social interaction. The underlying process proves difficult to study: it is -hard to know how someone's opinions are formed and whether and how someone's -views shift. Fortunately, ChangeMyView, an active community on Reddit, provides -a platform where users present their own opinions and reasoning, invite others -to contest them, and acknowledge when the ensuing discussions change their -original views. In this work, we study these interactions to understand the -mechanisms behind persuasion. - We find that persuasive arguments are characterized by interesting patterns -of interaction dynamics, such as participant entry-order and degree of -back-and-forth exchange. Furthermore, by comparing similar counterarguments to -the same opinion, we show that language factors play an essential role. In -particular, the interplay between the language of the opinion holder and that -of the counterargument provides highly predictive cues of persuasiveness. -Finally, since even in this favorable setting people may not be persuaded, we -investigate the problem of determining whether someone's opinion is susceptible -to being changed at all. For this more difficult task, we show that stylistic -choices in how the opinion is expressed carry predictive power. -" -2447,1602.01208,"Akira Taniguchi, Tadahiro Taniguchi and Tetsunari Inamura","Spatial Concept Acquisition for a Mobile Robot that Integrates - Self-Localization and Unsupervised Word Discovery from Spoken Sentences",cs.AI cs.CL cs.RO," In this paper, we propose a novel unsupervised learning method for the -lexical acquisition of words related to places visited by robots, from -continuous human speech signals. We address the problem of learning novel words by a -robot that has no prior knowledge of these words except for a primitive -acoustic model. Further, we propose a method that allows a robot to effectively -use the learned words and their meanings for self-localization tasks. The -proposed method is a nonparametric Bayesian spatial concept acquisition method -(SpCoA) that integrates the generative model for self-localization and the -unsupervised word segmentation of uttered sentences via latent variables -related to the spatial concept. We implemented the proposed method SpCoA on -SIGVerse, which is a simulation environment, and TurtleBot2, which is a mobile -robot in a real environment. Further, we conducted experiments for evaluating -the performance of SpCoA. The experimental results showed that SpCoA enabled -the robot to acquire the names of places from speech sentences. They also -revealed that the robot could effectively utilize the acquired spatial concepts -and reduce the uncertainty in self-localization. -" -2448,1602.01248,"Nikolaos Nodarakis, Spyros Sioutas, Athanasios Tsakalidis and Giannis - Tzimas",Using Hadoop for Large Scale Analysis on Twitter: A Technical Report,cs.DB cs.CL cs.IR," Sentiment analysis (or opinion mining) on Twitter data has attracted much -attention recently. One of the system's key features is the immediacy of -communication with other users in an easy, user-friendly and fast way.
-Consequently, people tend to express their feelings freely, which makes Twitter -an ideal source for accumulating a vast amount of opinions towards a wide -diversity of topics. This amount of information offers huge potential and can -be harnessed to receive the sentiment tendency towards these topics. However, -since no one can invest an infinite amount of time in reading through these tweets, -an automated decision-making approach is necessary. Nevertheless, most existing -solutions are limited to centralized environments only. Thus, they can only -process at most a few thousand tweets. Such a sample is not representative enough -to define the sentiment polarity towards a topic, given the massive number of -tweets published daily. In this paper, we go one step further and develop a -novel method for sentiment learning in the MapReduce framework. Our algorithm -exploits the hashtags and emoticons inside a tweet as sentiment labels, and -proceeds to classify diverse sentiment types in a parallel -and distributed manner. Moreover, we utilize Bloom filters to compact the -storage size of intermediate data and boost the performance of our algorithm. -Through an extensive experimental evaluation, we prove that our solution is -efficient, robust and scalable and confirm the quality of our sentiment -identification. -" -2449,1602.01428,"Jason Dou, Ni Sun, Xiaojun Zou","""Draw My Topics"": Find Desired Topics fast from large scale of Corpus",cs.CL cs.IR," We develop the ""Draw My Topics"" toolkit, which provides a fast way to -incorporate social scientists' interests into standard topic modelling. Instead -of using a raw corpus with primitive processing as input, an algorithm based on -the Vector Space Model and Conditional Entropy is used to connect social -scientists' interests with the output of unsupervised topic models. Space for users' -adjustments on the specific corpora of their interest is also accommodated. We -demonstrate the toolkit's use on the Diachronic People's Daily Corpus in -Chinese. -" -2450,1602.01576,Anantharaman Palacode Narayana Iyer,"A Factorized Recurrent Neural Network based architecture for medium to - large vocabulary Language Modelling",cs.CL cs.AI," Statistical language models are central to many applications that use -semantics. Recurrent Neural Networks (RNN) are known to produce state-of-the-art -results for language modelling, outperforming their traditional n-gram -counterparts in many cases. To generate a probability distribution across a -vocabulary, these models require a softmax output layer that linearly increases -in size with the size of the vocabulary. Large vocabularies need a -commensurately large softmax layer and training them on typical laptops/PCs -requires significant time and machine resources. In this paper we present a new -technique for implementing RNN based large vocabulary language models that -substantially speeds up computation while optimally using the limited memory -resources. Our technique, while building on the notion of factorizing the -output layer by having multiple output layers, improves on the earlier work by -substantially optimizing the individual output layer size and also -eliminating the need for a multistep prediction process. -" -2451,1602.01595,"Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. - Smith","Many Languages, One Parser",cs.CL," We train one multilingual model for dependency parsing and use it to parse -sentences in several languages.
The parsing model uses (i) multilingual word -clusters and embeddings; (ii) token-level language information; and (iii) -language-specific features (fine-grained POS tags). This input representation -enables the parser not only to parse effectively in multiple languages, but -also to generalize across languages based on linguistic universals and -typological similarities, making it more effective to learn from limited -annotations. Our parser's performance compares favorably to strong baselines in -a range of data scenarios, including when the target language has a large -treebank, a small treebank, or no treebank for training. -" -2452,1602.01635,"Jules Hedges, Mehrnoosh Sadrzadeh","A Generalised Quantifier Theory of Natural Language in Categorical - Compositional Distributional Semantics with Bialgebras",cs.CL cs.AI math.CT," Categorical compositional distributional semantics is a model of natural -language; it combines the statistical vector space models of words with the -compositional models of grammar. We formalise in this model the generalised -quantifier theory of natural language, due to Barwise and Cooper. The -underlying setting is a compact closed category with bialgebras. We start from -a generative grammar formalisation and develop an abstract categorical -compositional semantics for it, then instantiate the abstract setting to sets -and relations and to finite dimensional vector spaces and linear maps. We prove -the equivalence of the relational instantiation to the truth theoretic -semantics of generalised quantifiers. The vector space instantiation formalises -the statistical usages of words and enables us to, for the first time, reason -about quantified phrases and sentences compositionally in distributional -semantics. -" -2453,1602.01895,"Shijian Tang, Song Han","Generate Image Descriptions based on Deep RNN and Memory Cells for - Images Features",cs.CV cs.CL cs.LG," Generating natural language descriptions for images is a challenging task. -The traditional way is to use the convolutional neural network (CNN) to extract -image features, followed by recurrent neural network (RNN) to generate -sentences. In this paper, we present a new model that adds memory cells to -gate the feeding of image features to the deep neural network. The intuition is -to enable our model to memorize how much information from images should be fed -at each stage of the RNN. Experiments on the Flickr8K and Flickr30K datasets showed -that our model outperforms other state-of-the-art models with higher BLEU -scores. -" -2454,1602.01925,"Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris - Dyer, Noah A. Smith",Massively Multilingual Word Embeddings,cs.CL," We introduce new methods for estimating and evaluating embeddings of words in -more than fifty languages in a single shared embedding space. Our estimation -methods, multiCluster and multiCCA, use dictionaries and monolingual data; they -do not require parallel data. Our new evaluation method, multiQVEC-CCA, is -shown to correlate better than previous ones with two downstream tasks (text -categorization and parsing). We also describe a web portal for evaluation that -will facilitate further research in this area, along with open-source releases -of all our methods.
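In the spirit of the multiCCA method named above (a sketch, not the released code): given monolingual embedding matrices whose rows are aligned by a bilingual dictionary, canonical correlation analysis projects both languages into a shared space. The random matrices below stand in for real embeddings.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
# Placeholder embeddings for 500 dictionary-aligned word pairs.
X = rng.normal(size=(500, 50))   # source-language vectors
Y = rng.normal(size=(500, 50))   # target-language vectors of translations

cca = CCA(n_components=40, max_iter=1000)
cca.fit(X, Y)                    # learn maximally correlated projections
Xc, Yc = cca.transform(X, Y)     # both languages now share one space
print(Xc.shape, Yc.shape)        # (500, 40) (500, 40)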
-" -2455,1602.01929,"Kong Aik Lee, Ville Hautam\""aki, Anthony Larcher, Wei Rao, Hanwu Sun, - Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir - Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, - Haizhou Li, Sylvain Meignier",Fantastic 4 system for NIST 2015 Language Recognition Evaluation,cs.CL," This article describes the systems jointly submitted by Institute for -Infocomm (I$^2$R), the Laboratoire d'Informatique de l'Universit\'e du Maine -(LIUM), Nanyang Technology University (NTU) and the University of Eastern -Finland (UEF) for 2015 NIST Language Recognition Evaluation (LRE). The -submitted system is a fusion of nine sub-systems based on i-vectors extracted -from different types of features. Given the i-vectors, several classifiers are -adopted for the language detection task including support vector machines -(SVM), multi-class logistic regression (MCLR), Probabilistic Linear -Discriminant Analysis (PLDA) and Deep Neural Networks (DNN). -" -2456,1602.02047,Elvys Linhares Pontes,"Utiliza\c{c}\~ao de Grafos e Matriz de Similaridade na Sumariza\c{c}\~ao - Autom\'atica de Documentos Baseada em Extra\c{c}\~ao de Frases",cs.CL cs.IR," The internet increased the amount of information available. However, the -reading and understanding of this information are costly tasks. In this -scenario, the Natural Language Processing (NLP) applications enable very -important solutions, highlighting the Automatic Text Summarization (ATS), which -produce a summary from one or more source texts. Automatically summarizing one -or more texts, however, is a complex task because of the difficulties inherent -to the analysis and generation of this summary. This master's thesis describes -the main techniques and methodologies (NLP and heuristics) to generate -summaries. We have also addressed and proposed some heuristics based on graphs -and similarity matrix to measure the relevance of judgments and to generate -summaries by extracting sentences. We used the multiple languages (English, -French and Spanish), CSTNews (Brazilian Portuguese), RPM (French) and DECODA -(French) corpus to evaluate the developped systems. The results obtained were -quite interesting. -" -2457,1602.02068,Andr\'e F. T. Martins and Ram\'on Fernandez Astudillo,"From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label - Classification",cs.CL cs.LG stat.ML," We propose sparsemax, a new activation function similar to the traditional -softmax, but able to output sparse probabilities. After deriving its -properties, we show how its Jacobian can be efficiently computed, enabling its -use in a network trained with backpropagation. Then, we propose a new smooth -and convex loss function which is the sparsemax analogue of the logistic loss. -We reveal an unexpected connection between this new loss and the Huber -classification loss. We obtain promising empirical results in multi-label -classification problems and in attention-based neural networks for natural -language inference. For the latter, we achieve a similar performance as the -traditional softmax, but with a selective, more compact, attention focus. -" -2458,1602.02089,"Martha Lewis, Bob Coecke",Harmonic Grammar in a DisCo Model of Meaning,cs.AI cs.CL," The model of cognition developed in (Smolensky and Legendre, 2006) seeks to -unify two levels of description of the cognitive process: the connectionist and -the symbolic. 
The theory developed brings together these two levels into the
-Integrated Connectionist/Symbolic Cognitive architecture (ICS). Clark and
-Pulman (2007) draw a parallel with semantics, where meaning may be modelled at
-both the distributional and symbolic levels; this idea was developed by Coecke
-et al. (2010) into the Distributional Compositional (DisCo) model of meaning.
-In the current work, we revisit Smolensky and Legendre's (S&L) model. We
-describe the DisCo framework, summarise the key ideas in S&L's architecture,
-and describe how their description of harmony as a graded measure of
-grammaticality may be applied in the DisCo model.
-"
-2459,1602.02133,Issa Atoum and Ahmed Otoom,"Mining Software Quality from Software Reviews: Research Trends and Open
- Issues",cs.CL cs.IR," Software review text fragments contain considerable valuable information
-about user experience, covering a large set of properties that includes
-software quality. Opinion mining, or sentiment analysis, is concerned with
-analyzing textual user judgments. The application of sentiment analysis to
-software reviews can yield a quantitative value that represents software
-quality. Although many software quality methods have been proposed, they are
-considered difficult to customize and many of them are limited in scope. This
-article investigates the application of opinion mining as an approach to
-extract software quality properties. We found that the major issues in mining
-software reviews using sentiment analysis stem from the software lifecycle and
-the diversity of users and teams.
-"
-2460,1602.02215,"Noam Shazeer, Ryan Doherty, Colin Evans, Chris Waterson",Swivel: Improving Embeddings by Noticing What's Missing,cs.CL," We present the Submatrix-wise Vector Embedding Learner (Swivel), a method for
-generating low-dimensional feature embeddings from a feature co-occurrence
-matrix. Swivel performs approximate factorization of the point-wise mutual
-information matrix via stochastic gradient descent. It uses a piecewise loss
-with special handling for unobserved co-occurrences, and thus makes use of all
-the information in the matrix. While this requires computation proportional to
-the size of the entire matrix, we make use of vectorized multiplication to
-process thousands of rows and columns at once to compute millions of predicted
-values. Furthermore, we partition the matrix into shards in order to
-parallelize the computation across many nodes. This approach results in more
-accurate embeddings than can be achieved with methods that consider only
-observed co-occurrences, and it can scale to much larger corpora than can be
-handled with sampling methods.
-"
-2461,1602.02332,Antti Puurula,Scalable Text Mining with Sparse Generative Models,cs.IR cs.AI cs.CL," The information age has brought a deluge of data. Much of this is in text
-form, insurmountable in scope for humans and incomprehensible in structure for
-computers. Text mining is an expanding field of research that seeks to utilize
-the information contained in vast document collections. General data mining
-methods based on machine learning face challenges with the scale of text data,
-posing a need for scalable text mining methods.
- This thesis proposes a solution to scalable text mining: generative models
-combined with sparse computation. A unifying formalization for generative text
-models is defined, bringing together research traditions that have used
-formally equivalent models but ignored parallel developments.
This framework
-allows the use of methods developed for different processing tasks, such as
-retrieval and classification, yielding effective solutions across different
-text mining tasks. Sparse computation using inverted indices is proposed for
-inference on probabilistic models. This reduces the computational complexity of
-common text mining operations according to sparsity, yielding probabilistic
-models with the scalability of modern search engines.
- The proposed combination provides sparse generative models: a solution for
-text mining that is general, effective, and scalable. Extensive experimentation
-on text classification and ranked retrieval datasets is conducted, showing
-that the proposed solution matches or outperforms the leading task-specific
-methods in effectiveness, with an order-of-magnitude decrease in classification
-times for Wikipedia article categorization with a million classes. The
-developed methods were further applied in two 2014 Kaggle data mining prize
-competitions with over a hundred competing teams, earning first and second
-places.
-"
-2462,1602.02373,"Rie Johnson, Tong Zhang","Supervised and Semi-Supervised Text Categorization using LSTM for Region
- Embeddings",stat.ML cs.CL cs.LG," One-hot CNN (convolutional neural network) has been shown to be effective for
-text categorization (Johnson & Zhang, 2015). We view it as a special case of a
-general framework which jointly trains a linear model with a non-linear feature
-generator consisting of `text region embedding + pooling'. Under this
-framework, we explore a more sophisticated region embedding method using Long
-Short-Term Memory (LSTM). LSTM can embed text regions of variable (and possibly
-large) sizes, whereas the region size needs to be fixed in a CNN. We seek
-effective and efficient use of LSTM for this purpose in the supervised and
-semi-supervised settings. The best results were obtained by combining region
-embeddings in the form of LSTM and convolution layers trained on unlabeled
-data. The results indicate that on this task, embeddings of text regions, which
-can convey complex concepts, are more useful than embeddings of single words in
-isolation. We report performances exceeding the previous best results on four
-benchmark datasets.
-"
-2463,1602.02410,"Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui
- Wu",Exploring the Limits of Language Modeling,cs.CL," In this work we explore recent advances in Recurrent Neural Networks for
-large scale Language Modeling, a task central to language understanding. We
-extend current models to deal with two key challenges present in this task:
-corpora and vocabulary sizes, and complex, long term structure of language. We
-perform an exhaustive study on techniques such as character Convolutional
-Neural Networks and Long Short-Term Memory, on the One Billion Word Benchmark.
-Our best single model significantly improves state-of-the-art perplexity from
-51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20),
-while an ensemble of models sets a new record by improving perplexity from 41.0
-down to 23.7. We also release these models for the NLP and ML community to
-study and improve upon.
-"
-2464,1602.02499,David A.
van Leeuwen and Rosemary Orr,"The ""Sprekend Nederland"" project and its application to accent location",stat.ML cs.CL," This paper describes the data collection effort that is part of the project
-Sprekend Nederland (The Netherlands Talking), and discusses its potential use
-in Automatic Accent Location. We define Automatic Accent Location as the task
-of describing the accent of a speaker in terms of the speaker's location and
-history. We discuss possible ways of describing accent location, the
-consequences these have for the task of automatic accent location, and
-potential evaluation metrics.
-"
-2465,1602.02665,"Johan Bollen, Bruno Gon\c{c}alves, Ingrid van de Leemput, Guangchen
- Ruan",The happiness paradox: your friends are happier than you,cs.SI cs.CL cs.HC physics.soc-ph," Most individuals in social networks experience a so-called Friendship
-Paradox: they are less popular than their friends on average. This effect may
-explain recent findings that widespread social network media use leads to
-reduced happiness. However, the relation between popularity and happiness is
-poorly understood. A Friendship paradox does not necessarily imply a Happiness
-paradox, where most individuals are less happy than their friends. Here we
-report the first direct observation of a significant Happiness Paradox in a
-large-scale online social network of $39,110$ Twitter users. Our results reveal
-that popular individuals are indeed happier and that a majority of individuals
-experience a significant Happiness paradox. The magnitude of the latter effect
-is shaped by complex interactions between individual popularity, happiness, and
-the fact that users cluster assortatively by level of happiness. Our results
-indicate that the topology of online social networks and the distribution of
-happiness in some populations can cause widespread psycho-social effects that
-affect the well-being of billions of individuals.
-"
-2466,1602.02850,"Bo Tang, Steven Kay, and Haibo He",Toward Optimal Feature Selection in Naive Bayes for Text Categorization,stat.ML cs.CL cs.IR cs.LG," Automated feature selection is important for text categorization to reduce
-the feature size and to speed up the learning process of classifiers. In this
-paper, we present a novel and efficient feature selection framework based on
-information theory, which aims to rank the features according to their
-discriminative capacity for classification. We first revisit two information
-measures, Kullback-Leibler divergence and Jeffreys divergence, for binary
-hypothesis testing, and analyze their asymptotic properties relating to type I
-and type II errors of a Bayesian classifier. We then introduce a new divergence
-measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure
-multi-distribution divergence for multi-class classification. Based on the
-JMH-divergence, we develop two efficient feature selection methods, termed
-maximum discrimination ($MD$) and $MD-\chi^2$ methods, for text categorization.
-The promising results of extensive experiments demonstrate the effectiveness of
-the proposed approaches.
-"
-2467,1602.03001,"Miltiadis Allamanis, Hao Peng, Charles Sutton","A Convolutional Attention Network for Extreme Summarization of Source
- Code",cs.LG cs.CL cs.SE," Attention mechanisms in neural networks have proved useful for problems in
-which the input and output do not have fixed dimension.
Often there exist
-features that are locally translation invariant and would be valuable for
-directing the model's attention, but previous attentional architectures are not
-constructed to learn such features specifically. We introduce an attentional
-neural network that employs convolution on the input tokens to detect local
-time-invariant and long-range topical attention features in a context-dependent
-way. We apply this architecture to the problem of extreme summarization of
-source code snippets into short, descriptive, function-name-like summaries.
-Using those features, the model sequentially generates a summary by
-marginalizing over two attention mechanisms: one that predicts the next summary
-token based on the attention weights of the input tokens, and another that is
-able to copy a code token as-is directly into the summary. We demonstrate our
-convolutional attention neural network's performance on 10 popular Java
-projects, showing that it outperforms previous attentional mechanisms.
-"
-2468,1602.03265,Aida Nematzadeh and Filip Miscevic and Suzanne Stevenson,Simple Search Algorithms on Semantic Networks Learned from Language Use,cs.CL," Recent empirical and modeling research has focused on the semantic fluency
-task because it is informative about semantic memory. An interesting interplay
-arises between the richness of representations in semantic memory and the
-complexity of algorithms required to process it. It has remained an open
-question whether representations of words and their relations learned from
-language use can enable a simple search algorithm to mimic the observed
-behavior in the fluency task. Here we show that it is plausible to learn rich
-representations from naturalistic data for which a very simple search algorithm
-(a random walk) can replicate the human patterns. We suggest that explicitly
-structuring knowledge about words into a semantic network plays a crucial role
-in modeling human behavior in memory search and retrieval; moreover, this is
-the case across a range of semantic information sources.
-"
-2469,1602.03426,"Aditya Joshi, Pushpak Bhattacharyya, Mark James Carman",Automatic Sarcasm Detection: A Survey,cs.CL," Automatic sarcasm detection is the task of predicting sarcasm in text. This
-is a crucial step for sentiment analysis, considering the prevalence and
-challenges of sarcasm in sentiment-bearing text. Beginning with an approach
-that used speech-based features, sarcasm detection has witnessed great interest
-from the sentiment analysis community. This paper is the first known
-compilation of past work in automatic sarcasm detection. We observe three
-milestones in the research so far: semi-supervised pattern extraction to
-identify implicit sentiment, use of hashtag-based supervision, and use of
-context beyond the target text. In this paper, we describe datasets,
-approaches, trends and issues in sarcasm detection. We also discuss
-representative performance values, shared tasks and pointers to future work, as
-given in prior works. In terms of resources that could be useful for
-understanding the state of the art, the survey presents several useful
-illustrations - most prominently, a table that summarizes past papers along
-different dimensions such as features, annotation techniques, data forms, etc.
-" -2470,1602.03483,"Felix Hill, Kyunghyun Cho, Anna Korhonen",Learning Distributed Representations of Sentences from Unlabelled Data,cs.CL cs.LG," Unsupervised methods for learning distributed representations of words are -ubiquitous in today's NLP research, but far less is known about the best ways -to learn distributed phrase or sentence representations from unlabelled data. -This paper is a systematic comparison of models that learn such -representations. We find that the optimal approach depends critically on the -intended application. Deeper, more complex models are preferable for -representations to be used in supervised systems, but shallow log-linear models -work best for building representation spaces that can be decoded with simple -spatial distance metrics. We also propose two new unsupervised -representation-learning objectives designed to optimise the trade-off between -training time, domain portability and performance. -" -2471,1602.03551,"Stephanie L. Hyland, Theofanis Karaletsos, Gunnar R\""atsch",Knowledge Transfer with Medical Language Embeddings,cs.CL stat.AP," Identifying relationships between concepts is a key aspect of scientific -knowledge synthesis. Finding these links often requires a researcher to -laboriously search through scien- tific papers and databases, as the size of -these resources grows ever larger. In this paper we describe how distributional -semantics can be used to unify structured knowledge graphs with unstructured -text to predict new relationships between medical concepts, using a -probabilistic generative model. Our approach is also designed to ameliorate -data sparsity and scarcity issues in the medical domain, which make language -modelling more challenging. Specifically, we integrate the medical relational -database (SemMedDB) with text from electronic health records (EHRs) to perform -knowledge graph completion. We further demonstrate the ability of our model to -predict relationships between tokens not appearing in the relational database. -" -2472,1602.03606,"Federico Barrios, Federico L\'opez, Luis Argerich, Rosa Wachenchauzer","Variations of the Similarity Function of TextRank for Automated - Summarization",cs.CL cs.IR," This article presents new alternatives to the similarity function for the -TextRank algorithm for automatic summarization of texts. We describe the -generalities of the algorithm and the different functions we propose. Some of -these variants achieve a significative improvement using the same metrics and -dataset as the original publication. -" -2473,1602.03609,"Cicero dos Santos, Ming Tan, Bing Xiang, Bowen Zhou",Attentive Pooling Networks,cs.CL cs.LG," In this work, we propose Attentive Pooling (AP), a two-way attention -mechanism for discriminative model training. In the context of pair-wise -ranking or classification with neural networks, AP enables the pooling layer to -be aware of the current input pair, in a way that information from the two -input items can directly influence the computation of each other's -representations. Along with such representations of the paired inputs, AP -jointly learns a similarity measure over projected segments (e.g. trigrams) of -the pair, and subsequently, derives the corresponding attention vector for each -input to guide the pooling. Our two-way attention mechanism is a general -framework independent of the underlying representation learning, and it has -been applied to both convolutional neural networks (CNNs) and recurrent neural -networks (RNNs) in our studies. 
The empirical results, from three very
-different benchmark tasks of question answering/answer selection, demonstrate
-that our proposed models outperform a variety of strong baselines and achieve
-state-of-the-art performance in all the benchmarks.
-"
-2474,1602.03661,"Vittorio Loreto, Pietro Gravino, Vito D.P. Servedio, Francesca Tria","On the emergence of syntactic structures: quantifying and modelling
- duality of patterning",physics.soc-ph cs.CL," The complex organization of syntax in hierarchical structures is one of the
-core design features of human language. Duality of patterning refers, for
-instance, to the organization of the meaningful elements in a language at two
-distinct levels: a combinatorial level, where meaningless forms are combined
-into meaningful forms, and a compositional level, where meaningful forms are
-composed into larger lexical units. The question remains wide open regarding
-how such a structure could have emerged. Furthermore, a clear mathematical
-framework to quantify this phenomenon is still lacking. The aim of this paper
-is to address these two aspects in a self-consistent way. First, we introduce
-suitable measures to quantify the level of combinatoriality and
-compositionality in a language, and present a framework to estimate these
-observables in human natural languages. Second, we show that the theoretical
-predictions of a multi-agent modeling scheme, namely the Blending Game, are in
-surprisingly good agreement with empirical data. In the Blending Game a
-population of individuals plays language games aiming at success in
-communication. It is remarkable that the two sides of duality of patterning
-emerge simultaneously as a consequence of a purely cultural dynamics in a
-simulated environment that contains meaningful relations, provided a simple
-constraint on message transmission fidelity is also considered.
-"
-2475,1602.03960,"Sujay Kumar Jauhar, Peter Turney and Eduard Hovy","TabMCQ: A Dataset of General Knowledge Tables and Multiple-choice
- Questions",cs.CL," We describe two new related resources that facilitate modelling of general
-knowledge reasoning in 4th grade science exams. The first is a collection of
-curated facts in the form of tables, and the second is a large set of
-crowd-sourced multiple-choice questions covering the facts in the tables.
-Through the setup of the crowd-sourced annotation task we obtain implicit
-alignment information between questions and tables. We envisage that the
-resources will be useful not only to researchers working on question answering,
-but also to people investigating a diverse range of other applications such as
-information extraction, question parsing, answer type identification, and
-lexical semantic modelling.
-"
-2476,1602.04101,"Tai Wang, Xiangen Hu, Keith Shubeck, Zhiqiang Cai, Jie Tang","An Empirical Study on Academic Commentary and Its Implications on
- Reading and Writing",cs.CY cs.CL," The relationship between reading and writing (RRW) is one of the major themes
-in learning science. One of its obstacles is that it is difficult to define or
-measure the latent background knowledge of the individual. However, in an
-academic research setting, scholars are required to explicitly list their
-background knowledge in the citation sections of their manuscripts. This unique
-opportunity was exploited to observe RRW, especially in the published
-academic commentary scenario.
RRW was visualized under a proposed topic process
-model by using a state-of-the-art version of latent Dirichlet allocation (LDA).
-The empirical study showed that academic commentary is modulated both by
-its target paper and by the author's background knowledge. Although this
-conclusion was obtained in a unique environment, we suggest its implications
-can also shed light on other similar interesting areas, such as dialog and
-conversation, group discussion, and social media.
-"
-2477,1602.04278,"Taehwan Kim, Weiran Wang, Hao Tang, Karen Livescu","Signer-independent Fingerspelling Recognition with Deep Neural Network
- Adaptation",cs.CL cs.CV cs.NE," We study the problem of recognition of fingerspelled letter sequences in
-American Sign Language in a signer-independent setting. Fingerspelled sequences
-are both challenging and important to recognize, as they are used for many
-content words such as proper nouns and technical terms. Previous work has shown
-that it is possible to achieve almost 90% accuracy on fingerspelling
-recognition in a signer-dependent setting. However, the more realistic
-signer-independent setting presents challenges due to significant variations
-among signers, coupled with the dearth of available training data. We
-investigate this problem with approaches inspired by automatic speech
-recognition. We start with the best-performing approaches from prior work,
-based on tandem models and segmental conditional random fields (SCRFs), with
-features based on deep neural network (DNN) classifiers of letters and
-phonological features. Using DNN adaptation, we find that it is possible to
-bridge a large part of the gap between signer-dependent and signer-independent
-performance. Using only about 115 transcribed words for adaptation from the
-target signer, we obtain letter accuracies of up to 82.7% with frame-level
-adaptation labels and 69.7% with only word labels.
-"
-2478,1602.04341,"Wenpeng Yin, Sebastian Ebert, Hinrich Sch\""utze",Attention-Based Convolutional Neural Network for Machine Comprehension,cs.CL," Understanding open-domain text is one of the primary challenges in natural
-language processing (NLP). Machine comprehension benchmarks evaluate the
-system's ability to understand text based on the text content only. In this
-work, we investigate machine comprehension on MCTest, a question answering (QA)
-benchmark. Prior work is mainly based on feature engineering approaches. We
-propose a neural network framework, named hierarchical attention-based
-convolutional neural network (HABCNN), to address this task without any
-manually designed features. Specifically, we explore HABCNN for this task via
-two routes: one through traditional joint modeling of passage, question and
-answer, and the other through textual entailment. HABCNN employs an attention
-mechanism to detect key phrases, key sentences and key snippets that are
-relevant to answering the question. Experiments show that HABCNN outperforms
-prior deep learning approaches by a large margin.
-"
-2479,1602.04375,"Mrinmaya Sachan, Avinava Dubey, Eric P. Xing",Science Question Answering using Instructional Materials,cs.CL cs.AI cs.IR cs.LG," We provide a solution for elementary science tests using instructional
-materials.
We posit that there is a hidden structure that explains the
-correctness of an answer, given the question and instructional materials, and
-present a unified max-margin framework that learns to find these hidden
-structures (given a corpus of question-answer pairs and instructional
-materials) and uses what it learns to answer novel elementary science
-questions. Our evaluation shows that our framework outperforms several strong
-baselines.
-"
-2480,1602.04427,Zheng Xu and Douglas Burdick and Louiqa Raschid,"Exploiting Lists of Names for Named Entity Identification of Financial
- Institutions from Unstructured Documents",cs.CL," There is a wealth of information about financial systems that is embedded in
-document collections. In this paper, we focus on a specialized text extraction
-task for this domain. The objective is to extract mentions of names of
-financial institutions, or FI names, from financial prospectus documents, and
-to identify the corresponding real-world entities, e.g., by matching against a
-corpus of such entities. The tasks are Named Entity Recognition (NER) and
-Entity Resolution (ER); both are well studied in the literature. Our
-contribution is to develop a rule-based approach that exploits lists of FI
-names for both tasks; our solution is labeled Dict-based NER and Rank-based ER.
-Since FI names are typically represented by a root and a suffix that
-modifies the root, we use these lists of FI names to create specialized root
-and suffix dictionaries. To evaluate the effectiveness of our specialized
-solution for extracting FI names, we compare Dict-based NER with a general
-purpose rule-based NER solution, ORG NER. Our evaluation highlights the
-benefits and limitations of specialized versus general purpose approaches, and
-presents additional suggestions for tuning and customization for FI name
-extraction. To our knowledge, our proposed solutions, Dict-based NER and
-Rank-based ER, together with the root and suffix dictionaries, are the first
-attempt to exploit specialized knowledge, i.e., lists of FI names, for
-rule-based NER and ER.
-"
-2481,1602.04709,"Giancarlo Crocetti, Amir A. Delay, Fatemeh Seyedmendhi","Identifying Structures in Social Conversations in NSCLC Patients through
- the Semi-Automatic extraction of Topical Taxonomies",cs.IR cs.AI cs.CL," The exploration of social conversations for addressing patients' needs is an
-important analytical task to which many scholarly publications are contributing
-in order to fill the knowledge gap in this area. The main difficulty remains
-the inability to turn such contributions into pragmatic processes the
-pharmaceutical industry can leverage in order to generate insight from social
-media data, which can be considered one of the most challenging sources of
-information available today due to its sheer volume and noise. This study is
-based on the work of Scott Spangler and Jeffrey Kreulen and applies it to
-identify structure in social media through the extraction of a topical taxonomy
-able to capture the latent knowledge in social conversations on health-related
-sites. The mechanism for automatically identifying and generating a taxonomy
-from social conversations is developed and pressure-tested using public data
-from media sites focused on the needs of cancer patients and their families.
-Moreover, a novel method for generating category labels and determining an
-optimal number of categories is presented, which extends Spangler and Kreulen's
-research in a meaningful way.
We assume the reader is
-familiar with taxonomies, what they are, and how they are used.
-"
-2482,1602.04853,Yurij Holovatch and Vasyl Palchykov,Complex Networks of Words in Fables,physics.soc-ph cs.CL," In this chapter we give an overview of the application of complex network
-theory to quantify some properties of language. Our study is based on two
-fables in Ukrainian, Mykyta the Fox and Abu-Kasym's Slippers. It consists of
-two parts: the analysis of frequency-rank distributions of words and the
-application of complex-network theory. The first part shows that the text sizes
-are sufficiently large to observe statistical properties. This supports their
-selection for the analysis of typical properties of the language networks in
-the second part of the chapter. In describing language as a complex network,
-while words are usually associated with nodes, there is more variability in the
-choice of links, and different representations result in different networks.
-Here, we examine a number of such representations of the language network and
-perform a comparative analysis of their characteristics. Our results suggest
-that, irrespective of link representation, the Ukrainian language network used
-in the selected fables is a strongly correlated, scale-free, small world. We
-discuss how such empirical approaches may help form a useful basis for a
-theoretical description of language evolution and how they may be used in
-analyses of other textual narratives.
-"
-2483,1602.04874,"Yushi Yao, Zheng Huang","Bi-directional LSTM Recurrent Neural Network for Chinese Word
- Segmentation",cs.LG cs.CL," Recurrent neural networks (RNNs) have been broadly applied to natural
-language processing (NLP) problems. This kind of neural network is designed for
-modeling sequential data and has been shown to be quite effective in sequential
-tagging tasks. In this paper, we propose to use a bi-directional RNN with long
-short-term memory (LSTM) units for Chinese word segmentation, which is a
-crucial preprocessing task for modeling Chinese sentences and articles.
-Classical methods focus on designing and combining hand-crafted features from
-context, whereas the bi-directional LSTM network (BLSTM) does not need any
-prior knowledge or pre-design, and it excels at keeping contextual information
-in both directions. Experimental results show that our approach achieves
-state-of-the-art performance in word segmentation on both traditional Chinese
-and simplified Chinese datasets.
-"
-2484,1602.04930,Yi-Zhi Xu and Hai-Jun Zhou,"Generalized minimum dominating set and application in automatic text
- summarization",cs.IR cond-mat.stat-mech cs.CL physics.soc-ph," For a graph formed by vertices and weighted edges, a generalized minimum
-dominating set (MDS) is a vertex set of smallest cardinality such that the
-summed weight of edges from each outside vertex to vertices in this set is
-equal to or larger than a certain threshold value. This generalized MDS problem
-reduces to the conventional MDS problem in the limiting case of all the edge
-weights being equal to the threshold value. In the present paper we treat the
-generalized MDS problem with replica-symmetric spin glass theory and
-derive a set of belief-propagation equations. As a practical application we
-consider the problem of extracting a set of sentences that best summarize a
-given input text document. We carry out a preliminary test of this statistical
-physics-inspired method on the automatic text summarization problem.
-" -2485,1602.04983,"Sreyasi Nag Chowdhury, Mateusz Malinowski, Andreas Bulling, Mario - Fritz",Contextual Media Retrieval Using Natural Language Queries,cs.IR cs.AI cs.CL cs.CV cs.HC," The widespread integration of cameras in hand-held and head-worn devices as -well as the ability to share content online enables a large and diverse visual -capture of the world that millions of users build up collectively every day. We -envision these images as well as associated meta information, such as GPS -coordinates and timestamps, to form a collective visual memory that can be -queried while automatically taking the ever-changing context of mobile users -into account. As a first step towards this vision, in this work we present -Xplore-M-Ego: a novel media retrieval system that allows users to query a -dynamic database of images and videos using spatio-temporal natural language -queries. We evaluate our system using a new dataset of real user queries as -well as through a usability study. One key finding is that there is a -considerable amount of inter-user variability, for example in the resolution of -spatial relations in natural language utterances. We show that our retrieval -system can cope with this variability using personalisation through an online -learning-based retrieval formulation. -" -2486,1602.05292,"Zhenhao Ge, Yufang Sun and Mark J. T. Smith",Authorship Attribution Using a Neural Network Language Model,cs.CL cs.AI," In practice, training language models for individual authors is often -expensive because of limited data resources. In such cases, Neural Network -Language Models (NNLMs), generally outperform the traditional non-parametric -N-gram models. Here we investigate the performance of a feed-forward NNLM on an -authorship attribution problem, with moderate author set size and relatively -limited data. We also consider how the text topics impact performance. Compared -with a well-constructed N-gram baseline method with Kneser-Ney smoothing, the -proposed method achieves nearly 2:5% reduction in perplexity and increases -author classification accuracy by 3:43% on average, given as few as 5 test -sentences. The performance is very competitive with the state of the art in -terms of accuracy and demand on test data. The source code, preprocessed -datasets, a detailed description of the methodology and results are available -at https://github.com/zge/authorship-attribution. -" -2487,1602.05307,"Xiang Ren, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Jiawei Han","Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label - Embedding",cs.CL cs.LG," Current systems of fine-grained entity typing use distant supervision in -conjunction with existing knowledge bases to assign categories (type labels) to -entity mentions. However, the type labels so obtained from knowledge bases are -often noisy (i.e., incorrect for the entity mention's local context). We define -a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic -identification of correct type labels (type-paths) for training examples, given -the set of candidate type labels obtained by distant supervision with a given -type hierarchy. The unknown type labels for individual entity mentions and the -semantic similarity between entity types pose unique challenges for solving the -LNR task. 
We propose a general framework, called PLE, to jointly embed entity
-mentions, text features and entity types into the same low-dimensional space,
-in which objects whose types are semantically close have similar
-representations. Then we estimate the type-path for each training example in a
-top-down manner using the learned embeddings. We formulate a global objective
-for learning the embeddings from text corpora and knowledge bases, which adopts
-a novel margin-based loss that is robust to noisy labels and faithfully models
-type correlation derived from knowledge bases. Our experiments on three public
-typing datasets demonstrate the effectiveness and robustness of PLE, with an
-average of 25% improvement in accuracy compared to the next best method.
-"
-2488,1602.05388,"Muhammad Imran, Prasenjit Mitra, Jaideep Srivastava","Cross-Language Domain Adaptation for Classifying Crisis-Related Short
- Messages",cs.CL," Rapid crisis response requires real-time analysis of messages. After a
-disaster happens, volunteers attempt to classify tweets to determine needs,
-e.g., supplies, infrastructure damage, etc. Given labeled data, supervised
-machine learning can help classify these messages. Scarcity of labeled data
-causes poor performance in model training. Can we reuse old tweets to train
-classifiers? How can we choose labeled tweets for training? Specifically, we
-study the usefulness of labeled data from past events. Do labeled tweets in a
-different language help? We observe the performance of our classifiers trained
-using different combinations of training sets obtained from past disasters. We
-perform extensive experimentation on real crisis datasets and show that past
-labels are useful when both source and target events are of the same type
-(e.g. both earthquakes). For similar languages (e.g., Italian and Spanish),
-cross-language domain adaptation was useful; however, for different
-languages (e.g., Italian and English), the performance decreased.
-"
-2489,1602.05753,Mark A. Finlayson and Toma\v{z} Erjavec,Overview of Annotation Creation: Processes & Tools,cs.CL cs.HC," Creating linguistic annotations requires more than just a reliable annotation
-scheme. Annotation can be a complex endeavour potentially involving many
-people, stages, and tools. This chapter outlines the process of creating
-end-to-end linguistic annotations, identifying specific tasks that researchers
-often perform. Because tool support is so central to achieving high-quality,
-reusable annotations at low cost, the focus is on identifying capabilities
-that are necessary or useful for annotation tools, as well as common problems
-these tools present that reduce their utility. Although examples of specific
-tools are provided in many cases, this chapter concentrates more on abstract
-capabilities and problems because new tools appear continuously, while old
-tools disappear into disuse or disrepair. The two core capabilities tools must
-have are support for the chosen annotation scheme and the ability to work on
-the language under study. Additional capabilities are organized into three
-categories: those that are widely provided; those that are often useful but
-found in only a few tools; and those that have as yet little or no available
-tool support.
-" -2490,1602.05765,"Shoaib Jameel, Steven Schockaert","Entity Embeddings with Conceptual Subspaces as a Basis for Plausible - Reasoning",cs.AI cs.CL," Conceptual spaces are geometric representations of conceptual knowledge, in -which entities correspond to points, natural properties correspond to convex -regions, and the dimensions of the space correspond to salient features. While -conceptual spaces enable elegant models of various cognitive phenomena, the -lack of automated methods for constructing such representations have so far -limited their application in artificial intelligence. To address this issue, we -propose a method which learns a vector-space embedding of entities from -Wikipedia and constrains this embedding such that entities of the same semantic -type are located in some lower-dimensional subspace. We experimentally -demonstrate the usefulness of these subspaces as (approximate) conceptual space -representations by showing, among others, that important features can be -modelled as directions and that natural properties tend to correspond to convex -regions. -" -2491,1602.05772,Stefan Gerdjikov and Klaus U. Schulz,"Corpus analysis without prior linguistic knowledge - unsupervised mining - of phrases and subphrase structure",cs.CL," When looking at the structure of natural language, ""phrases"" and ""words"" are -central notions. We consider the problem of identifying such ""meaningful -subparts"" of language of any length and underlying composition principles in a -completely corpus-based and language-independent way without using any kind of -prior linguistic knowledge. Unsupervised methods for identifying ""phrases"", -mining subphrase structure and finding words in a fully automated way are -described. This can be considered as a step towards automatically computing a -""general dictionary and grammar of the corpus"". We hope that in the long run -variants of our approach turn out to be useful for other kind of sequence data -as well, such as, e.g., speech, genom sequences, or music annotation. Even if -we are not primarily interested in immediate applications, results obtained for -a variety of languages show that our methods are interesting for many practical -tasks in text mining, terminology extraction and lexicography, search engine -technology, and related fields. -" -2492,1602.05875,"Gil Keren and Bj\""orn Schuller","Convolutional RNN: an Enhanced Model for Extracting Features from - Sequential Data",stat.ML cs.CL," Traditional convolutional layers extract features from patches of data by -applying a non-linearity on an affine function of the input. We propose a model -that enhances this feature extraction process for the case of sequential data, -by feeding patches of the data into a recurrent neural network and using the -outputs or hidden states of the recurrent units to compute the extracted -features. By doing so, we exploit the fact that a window containing a few -frames of the sequential data is a sequence itself and this additional -structure might encapsulate valuable information. In addition, we allow for -more steps of computation in the feature extraction process, which is -potentially beneficial as an affine function followed by a non-linearity can -result in too simple features. Using our convolutional recurrent layers we -obtain an improvement in performance in two audio classification tasks, -compared to traditional convolutional layers. 
Tensorflow code for the
-convolutional recurrent layers is publicly available at
-https://github.com/cruvadom/Convolutional-RNN.
-"
-2493,1602.05944,"Erin Grant, Aida Nematzadeh, and Suzanne Stevenson","The Interaction of Memory and Attention in Novel Word Generalization: A
- Computational Investigation",cs.CL," People exhibit a tendency to generalize a novel noun to the basic level in a
-hierarchical taxonomy -- a cognitively salient category such as ""dog"" -- with
-the degree of generalization depending on the number and type of exemplars.
-Recently, a change in the presentation timing of exemplars has also been shown
-to have an effect, surprisingly reversing the previously observed pattern of
-basic-level generalization. We explore the precise mechanisms that could lead
-to such behavior by extending a computational model of word learning and word
-generalization to integrate cognitive processes of memory and attention. Our
-results show that the interaction of forgetting and attention to novelty, as
-well as sensitivity to both type and token frequencies of exemplars, enables
-the model to replicate the empirical results from different presentation
-timings. Our results reinforce the need to incorporate general cognitive
-processes within word learning models to better understand the range of
-observed behaviors in vocabulary acquisition.
-"
-2494,1602.06023,"Ramesh Nallapati, Bowen Zhou, Cicero Nogueira dos santos, Caglar
- Gulcehre, Bing Xiang","Abstractive Text Summarization Using Sequence-to-Sequence RNNs and
- Beyond",cs.CL," In this work, we model abstractive text summarization using Attentional
-Encoder-Decoder Recurrent Neural Networks, and show that they achieve
-state-of-the-art performance on two different corpora. We propose several novel
-models that address critical problems in summarization that are not adequately
-modeled by the basic architecture, such as modeling keywords, capturing the
-hierarchy of sentence-to-word structure, and emitting words that are rare or
-unseen at training time. Our work shows that many of our proposed models
-contribute to further improvement in performance. We also propose a new dataset
-consisting of multi-sentence summaries, and establish performance benchmarks
-for further research.
-"
-2495,1602.06025,"Yong Ren, Yining Wang, Jun Zhu",Spectral Learning for Supervised Topic Models,cs.LG cs.CL cs.IR stat.ML," Supervised topic models simultaneously model the latent topic structure of
-large collections of documents and a response variable associated with each
-document. Existing inference methods are based on variational approximation or
-Monte Carlo sampling, which often suffer from the local minimum defect.
-Spectral methods have been applied to learn unsupervised topic models, such as
-latent Dirichlet allocation (LDA), with provable guarantees. This paper
-investigates the possibility of applying spectral methods to recover the
-parameters of supervised LDA (sLDA). We first present a two-stage spectral
-method, which recovers the parameters of LDA, followed by a power update method
-to recover the regression model parameters. Then, we further present a
-single-phase spectral algorithm to jointly recover the topic distribution
-matrix as well as the regression weights. Our spectral algorithms are provably
-correct and computationally efficient. We prove a sample complexity bound for
-each algorithm and subsequently derive a sufficient condition for the
-identifiability of sLDA.
Thorough experiments on synthetic and real-world
-datasets verify the theory and demonstrate the practical effectiveness of the
-spectral algorithms. In fact, our results on a large-scale review rating
-dataset demonstrate that our single-phase spectral algorithm alone achieves
-comparable or even better performance than state-of-the-art methods, while
-previous work on spectral methods has rarely reported such promising
-performance.
-"
-2496,1602.06064,"Tianxing He, Yu Zhang, Jasha Droppo, Kai Yu","On Training Bi-directional Neural Network Language Model with Noise
- Contrastive Estimation",cs.CL," We propose to train a bi-directional neural network language model (NNLM)
-with noise contrastive estimation (NCE). Experiments are conducted on a
-rescoring task on the PTB dataset. It is shown that the NCE-trained
-bi-directional NNLM outperforms the one trained by conventional maximum
-likelihood training. But still (regretfully), it does not outperform the
-baseline uni-directional NNLM.
-"
-2497,1602.06289,"Stanis{\l}aw Jastrz\k{e}bski, Damian Le\'sniak, Wojciech Marian
- Czarnecki",Learning to SMILE(S),cs.CL," This paper shows how one can directly apply natural language processing (NLP)
-methods to classification problems in cheminformatics. The connection between
-these seemingly separate fields is shown by considering the standard textual
-representation of compounds, SMILES. We consider the problem of activity
-prediction against a target protein, which is a crucial part of the
-computer-aided drug design process. The experiments show that in this way one
-can not only outperform state-of-the-art results based on hand-crafted
-representations but also gain direct structural insights into the way decisions
-are made.
-"
-2498,1602.06291,"Shalini Ghosh, Oriol Vinyals, Brian Strope, Scott Roy, Tom Dean, Larry
- Heck",Contextual LSTM (CLSTM) models for Large scale NLP tasks,cs.CL," Documents exhibit sequential structure at multiple levels of abstraction
-(e.g., sentences, paragraphs, sections). These abstractions constitute a
-natural hierarchy for representing the context in which to infer the meaning of
-words and larger fragments of text. In this paper, we present CLSTM (Contextual
-LSTM), an extension of the recurrent neural network LSTM (Long Short-Term
-Memory) model, in which we incorporate contextual features (e.g., topics) into
-the model. We evaluate CLSTM on three specific NLP tasks: word prediction, next
-sentence selection, and sentence topic prediction. Results from experiments run
-on two corpora, English documents in Wikipedia and a subset of articles from a
-recent snapshot of English Google News, indicate that using both words and
-topics as features improves the performance of the CLSTM models over baseline
-LSTM models for these tasks. For example, on the next sentence selection task,
-we obtain relative accuracy improvements of 21% for the Wikipedia dataset and
-18% for the Google News dataset. This clearly demonstrates the significant
-benefit of using context appropriately in natural language (NL) tasks. This has
-implications for a wide variety of NL applications, like question answering,
-sentence completion, paraphrase generation, and next utterance prediction in
-dialog systems.
-"
-2499,1602.06359,"Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, Xueqi
- Cheng",Text Matching as Image Recognition,cs.CL cs.AI," Matching two texts is a fundamental problem in many natural language
-processing tasks.
An effective way is to extract meaningful matching patterns
-from words, phrases, and sentences to produce the matching score. Inspired by
-the success of convolutional neural networks in image recognition, where
-neurons can capture many complicated patterns based on extracted elementary
-visual patterns such as oriented edges and corners, we propose to model text
-matching as a problem of image recognition. Firstly, a matching matrix whose
-entries represent the similarities between words is constructed and viewed as
-an image. Then a convolutional neural network is utilized to capture rich
-matching patterns in a layer-by-layer way. We show that, by mirroring the
-compositional hierarchies of patterns in image recognition, our model can
-successfully identify salient signals such as n-gram and n-term matchings.
-Experimental results demonstrate its superiority over the baselines.
-"
-2500,1602.06727,"Zhizheng Wu, Simon King","Improving Trajectory Modelling for DNN-based Speech Synthesis by using
- Stacked Bottleneck Features and Minimum Generation Error Training",cs.SD cs.CL cs.NE," We propose two novel techniques --- stacking bottleneck features and a
-minimum generation error training criterion --- to improve the performance of
-deep neural network (DNN)-based speech synthesis. The techniques address the
-related issues of frame-by-frame independence and ignorance of the relationship
-between static and dynamic features, within current typical DNN-based synthesis
-frameworks. Stacking bottleneck features, which are an acoustically-informed
-linguistic representation, provides an efficient way to include more detailed
-linguistic context at the input. The minimum generation error training
-criterion minimises overall output trajectory error across an utterance, rather
-than minimising the error per frame independently, and thus takes into account
-the interaction between static and dynamic features. The two techniques can be
-easily combined to further improve performance. We present both objective and
-subjective results that demonstrate the effectiveness of the proposed
-techniques. The subjective results show that combining the two techniques leads
-to significantly more natural synthetic speech than conventional DNN or long
-short-term memory (LSTM) recurrent neural network (RNN) systems produce.
-"
-2501,1602.06797,"Zhiguo Wang, Haitao Mi and Abraham Ittycheriah","Semi-supervised Clustering for Short Text via Deep Representation
- Learning",cs.CL," In this work, we propose a semi-supervised method for short text clustering,
-in which we represent texts as distributed vectors with neural networks, and
-use a small amount of labeled data to specify our intention for clustering. We
-design a novel objective that combines the representation learning process and
-the k-means clustering process, and optimize the objective with both
-labeled data and unlabeled data iteratively until convergence through three
-steps: (1) assign each short text to its nearest centroid based on its
-representation from the current neural networks; (2) re-estimate the cluster
-centroids based on the cluster assignments from step (1); (3) update the neural
-networks according to the objective while keeping the centroids and cluster
-assignments fixed. Experimental results on four datasets show that our method
-works significantly better than several other text clustering methods.
-" -2502,1602.06967,"Danila Doroshin, Nikolay Lubimov, Marina Nastasenko and Mikhail Kotov",Blind score normalization method for PLDA based speaker recognition,cs.CL cs.LG cs.SD," Probabilistic Linear Discriminant Analysis (PLDA) has become state-of-the-art -method for modeling $i$-vector space in speaker recognition task. However the -performance degradation is observed if enrollment data size differs from one -speaker to another. This paper presents a solution to such problem by -introducing new PLDA scoring normalization technique. Normalization parameters -are derived in a blind way, so that, unlike traditional \textit{ZT-norm}, no -extra development data is required. Moreover, proposed method has shown to be -optimal in terms of detection cost function. The experiments conducted on NIST -SRE 2014 database demonstrate an improved accuracy in a mixed enrollment number -condition. -" -2503,1602.06979,"Ethan Fast, Binbin Chen, Michael Bernstein",Empath: Understanding Topic Signals in Large-Scale Text,cs.CL cs.AI," Human language is colored by a broad range of topics, but existing text -analysis tools only focus on a small number of them. We present Empath, a tool -that can generate and validate new lexical categories on demand from a small -set of seed terms (like ""bleed"" and ""punch"" to generate the category violence). -Empath draws connotations between words and phrases by deep learning a neural -embedding across more than 1.8 billion words of modern fiction. Given a small -set of seed words that characterize a category, Empath uses its neural -embedding to discover new related terms, then validates the category with a -crowd-powered filter. Empath also analyzes text across 200 built-in, -pre-validated categories we have generated from common topics in our web -dataset, like neglect, government, and social media. We show that Empath's -data-driven, human validated categories are highly correlated (r=0.906) with -similar categories in LIWC. -" -2504,1602.07019,"Zhiguo Wang, Haitao Mi and Abraham Ittycheriah",Sentence Similarity Learning by Lexical Decomposition and Composition,cs.CL," Most conventional sentence similarity methods only focus on similar parts of -two input sentences, and simply ignore the dissimilar parts, which usually give -us some clues and semantic meanings about the sentences. In this work, we -propose a model to take into account both the similarities and dissimilarities -by decomposing and composing lexical semantics over sentences. The model -represents each word as a vector, and calculates a semantic matching vector for -each word based on all words in the other sentence. Then, each word vector is -decomposed into a similar component and a dissimilar component based on the -semantic matching vector. After this, a two-channel CNN model is employed to -capture features by composing the similar and dissimilar components. Finally, a -similarity score is estimated over the composed feature vectors. Experimental -results show that our model gets the state-of-the-art performance on the answer -sentence selection task, and achieves a comparable result on the paraphrase -identification task. -" -2505,1602.07236,Clayton Norris,Petrarch 2 : Petrarcher,cs.CL," PETRARCH 2 is the fourth generation of a series of Event-Data coders stemming -from research by Phillip Schrodt. 
Each iteration has brought new functionality -and usability, and this is no exception. Petrarch 2 takes much of the power of -the original Petrarch's dictionaries and redirects it into a faster and smarter -core logic. Earlier iterations handled sentences largely as a list of words, -incorporating some syntactic information here and there. Petrarch 2 now views -the sentence entirely on the syntactic level. It receives the syntactic parse -of a sentence from the Stanford CoreNLP software, and stores this data as a -tree structure of linked nodes, where each node is a Phrase object. -Prepositional, noun, and verb phrases each have their own version of this -Phrase class, which deals with the logic particular to those kinds of phrases. -Since this is an event coder, the core of the logic focuses on the verbs: -who is acting, who is being acted on, and what is happening. The theory behind -this new structure and its logic is founded in Generative Grammar, Information -Theory, and Lambda-Calculus Semantics. -" -2506,1602.07275,"Sandra D. Prado, Silvio R. Dahmen, Ana L.C. Bazzan, Padraig Mac Carron - and Ralph Kenna",Temporal Network Analysis of Literary Texts,physics.soc-ph cs.CL," We study temporal networks of characters in literature focusing on ""Alice's -Adventures in Wonderland"" (1865) by Lewis Carroll and the anonymous ""La Chanson -de Roland"" (around 1100). The former, one of the most influential pieces of -nonsense literature ever written, describes the adventures of Alice in a -fantasy world with logic plays interspersed throughout the narrative. The latter, a -song of heroic deeds, depicts the Battle of Roncevaux in 778 A.D. during -Charlemagne's campaign on the Iberian Peninsula. We apply methods recently -developed by Taylor and coworkers \cite{Taylor+2015} to find time-averaged -eigenvector centralities, Freeman indices and vitalities of characters. We show -that temporal networks are more appropriate than static ones for studying -stories, as they capture features that the time-independent approaches fail to -yield. -" -2507,1602.07291,"Seyed Omid Sadjadi, Sriram Ganapathy, Jason W. Pelecanos",The IBM 2016 Speaker Recognition System,cs.SD cs.CL stat.ML," In this paper we describe the recent advancements made in the IBM i-vector -speaker recognition system for conversational speech. In particular, we -identify key techniques that contribute to significant improvements in -performance of our system, and quantify their contributions. The techniques -include: 1) a nearest-neighbor discriminant analysis (NDA) approach that is -formulated to alleviate some of the limitations associated with the -conventional linear discriminant analysis (LDA) that assumes Gaussian -class-conditional distributions, 2) the application of speaker- and -channel-adapted features, which are derived from an automatic speech -recognition (ASR) system, for speaker recognition, and 3) the use of a deep -neural network (DNN) acoustic model with a large number of output units (~10k -senones) to compute the frame-level soft alignments required in the i-vector -estimation process. We evaluate these techniques on the NIST 2010 speaker -recognition evaluation (SRE) extended core conditions involving telephone and -microphone trials.
Experimental results indicate that: 1) the NDA is more -effective (up to 35% relative improvement in terms of EER) than the traditional -parametric LDA for speaker recognition, 2) when compared to raw acoustic -features (e.g., MFCCs), the ASR speaker-adapted features provide gains in -speaker recognition performance, and 3) increasing the number of output units -in the DNN acoustic model (i.e., increasing the senone set size from 2k to 10k) -provides consistent improvements in performance (for example from 37% to 57% -relative EER gains over our baseline GMM i-vector system). To our knowledge, -results reported in this paper represent the best performances published to -date on the NIST SRE 2010 extended core tasks. -" -2508,1602.07393,Zhenhao Ge and Yufang Sun,"Domain Specific Author Attribution Based on Feedforward Neural Network - Language Models",cs.CL cs.LG cs.NE," Authorship attribution refers to the task of automatically determining the -author based on a given sample of text. It is a problem with a long history and -has a wide range of applications. Building author profiles using language models -is one of the most successful methods to automate this task. New language -modeling methods based on neural networks alleviate the curse of dimensionality -and usually outperform conventional N-gram methods. However, there has not -been much research applying them to authorship attribution. In this paper, we -present a novel setup of a Neural Network Language Model (NNLM) and apply it to -a database of text samples from different authors. We investigate how the NNLM -performs on a task with moderate author set size and relatively limited -training and test data, and how the topics of the text samples affect the -accuracy. The NNLM achieves nearly a 2.5% reduction in perplexity, a measure of the -fitness of a trained language model to the test data. Given 5 random test -sentences, it also increases the author classification accuracy by 3.43% on -average, compared with the N-gram methods using SRILM tools. An open source -implementation of our methodology is freely available at -https://github.com/zge/authorship-attribution/. -" -2509,1602.07394,Zhenhao Ge,"Improved Accent Classification Combining Phonetic Vowels with Acoustic - Features",cs.SD cs.CL," Research has shown that accent classification can be improved by integrating -semantic information into a purely acoustic approach. In this work, we combine -phonetic knowledge, such as vowels, with enhanced acoustic features to build an -improved accent classification system. The classifier is based on Gaussian -Mixture Model-Universal Background Model (GMM-UBM), with normalized Perceptual -Linear Predictive (PLP) features. The features are further optimized by -Principal Component Analysis (PCA) and Heteroscedastic Linear Discriminant -Analysis (HLDA). Using 7 major types of accented speech from the Foreign -Accented English (FAE) corpus, the system achieves a classification accuracy of 54% -with input test data as short as 20 seconds, which is competitive with the state -of the art in this field. -" -2510,1602.07563,"Igor Mozetic, Miha Grcar, Jasmina Smailovic","Multilingual Twitter Sentiment Classification: The Role of Human - Annotators",cs.CL cs.AI," What are the limits of automated Twitter sentiment classification? We analyze -a large set of manually labeled tweets in different languages, use them as -training data, and construct automated classification models.
It turns out that -the quality of classification models depends much more on the quality and size -of training data than on the type of the model trained. Experimental results -indicate that there is no statistically significant difference between the -performance of the top classification models. We quantify the quality of -training data by applying various annotator agreement measures, and identify -the weakest points of different datasets. We show that the model performance -approaches the inter-annotator agreement when the size of the training set is -sufficiently large. However, it is crucial to regularly monitor the self- and -inter-annotator agreements since this improves the training datasets and -consequently the model performance. Finally, we show that there is strong -evidence that humans perceive the sentiment classes (negative, neutral, and -positive) as ordered. -" -2511,1602.07572,"Sascha Rothe, Sebastian Ebert, Hinrich Sch\""utze",Ultradense Word Embeddings by Orthogonal Transformation,cs.CL," Embeddings are generic representations that are useful for many NLP tasks. In -this paper, we introduce DENSIFIER, a method that learns an orthogonal -transformation of the embedding space that focuses the information relevant for -a task in an ultradense subspace of a dimensionality that is smaller by a -factor of 100 than the original space. We show that ultradense embeddings -generated by DENSIFIER reach state of the art on a lexicon creation task in -which words are annotated with three types of lexical information - sentiment, -concreteness and frequency. On the SemEval2015 10B sentiment analysis task we -show that no information is lost when the ultradense subspace is used, but -training is an order of magnitude more efficient due to the compactness of the -ultradense space. -" -2512,1602.07618,Bob Coecke,"From quantum foundations via natural language meaning to a theory of - everything",cs.CL quant-ph," In this paper we argue for a paradigmatic shift from `reductionism' to -`togetherness'. In particular, we show how interaction between systems in -quantum theory naturally carries over to modelling how word meanings interact -in natural language. Since meaning in natural language, depending on the -subject domain, encompasses discussions within any scientific discipline, we -obtain a template for theories such as social interaction, animal behaviour, -and many others. -" -2513,1602.07749,"Thien Huu Nguyen, Avirup Sil, Georgiana Dinu and Radu Florian",Toward Mention Detection Robustness with Recurrent Neural Networks,cs.CL," One of the key challenges in natural language processing (NLP) is to yield -good performance across application domains and languages. In this work, we -investigate the robustness of the mention detection systems, one of the -fundamental tasks in information extraction, via recurrent neural networks -(RNNs). The advantage of RNNs over the traditional approaches is their capacity -to capture long ranges of context and implicitly adapt the word embeddings, -trained on a large corpus, into a task-specific word representation, but still -preserve the original semantic generalization to be helpful across domains. Our -systematic evaluation for RNN architectures demonstrates that RNNs not only -outperform the best reported systems (up to 9\% relative error reduction) in -the general setting but also achieve the state-of-the-art performance in the -cross-domain setting for English. 
Regarding other languages, RNNs are -significantly better than the traditional methods on the similar task of named -entity recognition for Dutch (up to 22\% relative error reduction). -" -2514,1602.07776,"Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith",Recurrent Neural Network Grammars,cs.CL cs.NE," We introduce recurrent neural network grammars, probabilistic models of -sentences with explicit phrase structure. We explain efficient inference -procedures that allow application to both parsing and language modeling. -Experiments show that they provide better parsing in English than any single -previously published supervised generative model and better language modeling -than state-of-the-art sequential RNNs in English and Chinese. -" -2515,1602.07803,"Md. Masudul Haque, Md. Tarek Habib and Md. Mokhlesur Rahman","Automated Word Prediction in Bangla Language Using Stochastic Language - Models",cs.CL," Word completion and word prediction are two important phenomena in typing -that benefit users who type using a keyboard or other similar devices. They can -have a profound impact on the typing of disabled people. Our work is based on word -prediction for Bangla sentences using stochastic, i.e., N-gram, language models -such as unigram, bigram, trigram, deleted interpolation and backoff models for -auto-completing a sentence by predicting the correct word, which -saves typing time and keystrokes and also reduces misspelling. We use a large -Bangla corpus of different word types to predict the correct word -with as much accuracy as possible. We have found promising results. We hope -that our work will serve as a baseline for automated Bangla typing. -" -2516,1602.07807,Michael Bloodgood and Benjamin Strauss,"Data Cleaning for XML Electronic Dictionaries via Statistical Anomaly - Detection",cs.DB cs.CL stat.ML," Many important forms of data are stored digitally in XML format. Errors can -occur in the textual content of the data in the fields of the XML. Fixing these -errors manually is time-consuming and expensive, especially for large amounts -of data. There is increasing interest in the research, development, and use of -automated techniques for assisting with data cleaning. Electronic dictionaries -are an important form of data frequently stored in XML format that frequently -have errors introduced through a mixture of manual typographical entry errors -and optical character recognition errors. In this paper we describe methods for -flagging statistical anomalies as likely errors in electronic dictionaries -stored in XML format. We describe six systems based on different sources of -information. The systems detect errors using various signals in the data -including uncommon characters, text length, character-based language models, -word-based language models, tied-field length ratios, and tied-field -transliteration models. Four of the systems detect errors based on expectations -automatically inferred from content within elements of a single field type. We -call these single-field systems. Two of the systems detect errors based on -correspondence expectations automatically inferred from content within elements -of multiple related field types. We call these tied-field systems. For each -system, we provide an intuitive analysis of the type of error that it is -successful at detecting. Finally, we describe two larger-scale evaluations -using crowdsourcing with Amazon's Mechanical Turk platform and using the -annotations of a domain expert.
The evaluations consistently show that the -systems are useful for improving the efficiency with which errors in XML -electronic dictionaries can be detected. -" -2517,1602.08128,"Zhenhao Ge, Sudhendu R. Sharma, Mark J. T. Smith",PCA Method for Automated Detection of Mispronounced Words,cs.SD cs.CL cs.LG," This paper presents a method for detecting mispronunciations with the aim of -improving Computer Assisted Language Learning (CALL) tools used by foreign -language learners. The algorithm is based on Principal Component Analysis -(PCA). It is hierarchical, with each successive step refining the estimate to -classify the test word as being either mispronounced or correct. Preprocessing -before detection, like normalization and time-scale modification, is -implemented to guarantee uniformity of the feature vectors input to the -detection system. The performance using various features including spectrograms -and Mel-Frequency Cepstral Coefficients (MFCCs) is compared and evaluated. -Best results were obtained using MFCCs, achieving up to 99% accuracy in word -verification and 93% in native/non-native classification. Compared with Hidden -Markov Models (HMMs), which are used pervasively in recognition applications, -this particular approach is computationally efficient and effective when training -data is limited. -" -2518,1602.08657,Luc Herren,"QuotationFinder - Searching for Quotations and Allusions in Greek and - Latin Texts and Establishing the Degree to Which a Quotation or Allusion - Matches Its Source",cs.CL," The software programs generally used with the TLG (Thesaurus Linguae Graecae) -and the CLCLT (CETEDOC Library of Christian Latin Texts) CD-ROMs are not well -suited for finding quotations and allusions. QuotationFinder uses more -sophisticated criteria as it ranks search results based on how closely they -match the source text, listing search results with literal quotations first and -loose verbal parallels last. -" -2519,1602.08715,"Avi Shmidman, Moshe Koppel, Ely Porat",Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus,cs.CL," We propose a method for efficiently finding all parallel passages in a large -corpus, even if the passages are not quite identical due to rephrasing and -orthographic variation. The key ideas are the representation of each word in -the corpus by its two most infrequent letters, finding matched pairs of strings -of four or five words that differ by at most one word and then identifying -clusters of such matched pairs. Using this method, over 4600 parallel pairs of -passages were identified in the Babylonian Talmud, a Hebrew-Aramaic corpus of -over 1.8 million words, in just over 30 seconds. Empirical comparisons on -sample data indicate that the coverage obtained by our method is essentially -the same as that obtained using slow exhaustive methods. -" -2520,1602.08741,Nikolay N. Vasiliev,"Gibberish Semantics: How Good is Russian Twitter in Word Semantic - Similarity Task?",cs.CL," The most studied and most successful language models were developed and -evaluated mainly for English and other closely related European languages, such as -French, German, etc. It is important to study the applicability of these models to -other languages. The use of vector space models for Russian was recently -studied for multiple corpora, such as Wikipedia, RuWac, lib.ru. These models -were evaluated against the word semantic similarity task. To our knowledge, Twitter -has not been considered as a corpus for this task; with this work we fill the gap.
-
Results for vectors trained on the Twitter corpus are comparable in accuracy with -other single-corpus trained models, although the best performance is currently -achieved by a combination of multiple corpora. -" -2521,1602.08742,James C. Loach and Jinzhao Wang,"Optimizing the Learning Order of Chinese Characters Using a Novel - Topological Sort Algorithm",cs.CL physics.soc-ph," We present a novel algorithm for optimizing the order in which Chinese -characters are learned, one that incorporates the benefits of learning them in -order of usage frequency and in order of their hierarchical structural -relationships. We show that our work outperforms previously published orders -and algorithms. Our algorithm is applicable to any scheduling task where nodes -have intrinsic differences in importance and must be visited in topological -order. -" -2522,1602.08761,"Tolga Bolukbasi, Kai-Wei Chang, Joseph Wang, Venkatesh Saligrama",Resource Constrained Structured Prediction,stat.ML cs.CL cs.CV cs.LG," We study the problem of structured prediction under test-time budget -constraints. We propose a novel approach applicable to a wide range of -structured prediction problems in computer vision and natural language -processing. Our approach seeks to adaptively generate computationally costly -features during test-time in order to reduce the computational cost of -prediction while maintaining prediction performance. We show that training the -adaptive feature generation system can be reduced to a series of structured -learning problems, resulting in efficient training using existing structured -learning algorithms. This framework provides theoretical justification for -several existing heuristic approaches found in the literature. We evaluate our -proposed adaptive system on two structured prediction tasks, optical character -recognition (OCR) and dependency parsing, and show strong performance in -reduction of the feature costs without degrading accuracy. -" -2523,1602.08844,"Pramit Chaudhuri, Joseph P. Dexter",Bioinformatics and Classical Literary Study,cs.CL," This paper describes the Quantitative Criticism Lab, a collaborative -initiative between classicists, quantitative biologists, and computer -scientists to apply ideas and methods drawn from the sciences to the study of -literature. A core goal of the project is the use of computational biology, -natural language processing, and machine learning techniques to investigate -authorial style, intertextuality, and related phenomena of literary -significance. As a case study in our approach, here we review the use of -sequence alignment, a common technique in genomics and computational -linguistics, to detect intertextuality in Latin literature. Sequence alignment -is distinguished by its ability to find inexact verbal similarities, which -makes it ideal for identifying phonetic echoes in large corpora of Latin texts. -Although especially suited to Latin, sequence alignment in principle can be -extended to many other languages. -" -2524,1602.08952,"\'Akos K\'ad\'ar, Grzegorz Chrupa{\l}a, Afra Alishahi","Representation of linguistic form and function in recurrent neural - networks",cs.CL cs.LG," We present novel methods for analyzing the activation patterns of RNNs from a -linguistic point of view and explore the types of linguistic structure they -learn.
As a case study, we use a multi-task gated recurrent network -architecture consisting of two parallel pathways with shared word embeddings, -trained on predicting the representations of the visual scene corresponding to -an input sentence, and predicting the next word in the same sentence. Based on -our proposed method for estimating the contribution of individual tokens -in the input to the final prediction of the networks, we show that the image -prediction pathway: a) is sensitive to the information structure of the -sentence, b) pays selective attention to lexical categories and grammatical -functions that carry semantic information, and c) learns to treat the same input -token differently depending on its grammatical functions in the sentence. In -contrast, the language model is comparatively more sensitive to words with a -syntactic function. Furthermore, we propose methods to explore the function -of individual hidden units in RNNs and show that the two pathways of the -architecture in our case study contain specialized units tuned to patterns -informative for the task, some of which can carry activations to later time -steps to encode long-term dependencies. -" -2525,1603.00106,"Saurav Ghosh, Prithwish Chakraborty, Emily Cohn, John S. Brownstein, - and Naren Ramakrishnan","Characterizing Diseases from Unstructured Text: A Vocabulary Driven - Word2vec Approach",cs.LG cs.CL stat.ML," Traditional disease surveillance can be augmented with a wide variety of -real-time sources such as news and social media. However, these sources are in -general unstructured, and construction of surveillance tools such as -taxonomical correlations and trace mapping involves considerable human -supervision. In this paper, we motivate a disease vocabulary driven word2vec -model (Dis2Vec) to model diseases and constituent attributes as word embeddings -from the HealthMap news corpus. We use these word embeddings to automatically -create disease taxonomies and evaluate our model against corresponding human- -annotated taxonomies. We compare our model accuracies against several -state-of-the-art word2vec methods. Our results demonstrate that Dis2Vec -outperforms traditional distributed vector representations in its ability to -faithfully capture taxonomical attributes across different classes of diseases, -such as endemic, emerging and rare. -" -2526,1603.00223,"Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith and Steve Renals",Segmental Recurrent Neural Networks for End-to-end Speech Recognition,cs.CL cs.LG cs.NE," We study the segmental recurrent neural network for end-to-end acoustic -modelling. This model connects the segmental conditional random field (CRF) -with a recurrent neural network (RNN) used for feature extraction. Compared to -most previous CRF-based acoustic models, it does not rely on an external system -to provide features or segmentation boundaries. Instead, this model -marginalises out all the possible segmentations, and features are extracted -from the RNN trained together with the segmental CRF. In essence, this model is -self-contained and can be trained end-to-end. In this paper, we discuss -practical training and decoding issues as well as the method to speed up the -training in the context of speech recognition. We performed experiments on the -TIMIT dataset. We achieved a 17.3% phone error rate (PER) from the first-pass -decoding --- the best reported result using CRFs, despite the fact that we only -used a zeroth-order CRF and no language model.
-
" -2527,1603.00260,Dhruv Gupta,"Event Search and Analytics: Detecting Events in Semantically Annotated - Corpora for Search and Analytics",cs.IR cs.CL," In this article, I present the questions that I seek to answer in my PhD -research. I propose to analyze natural language text with the help of semantic -annotations and mine important events for navigating large text corpora. -Semantic annotations such as named entities, geographic locations, and temporal -expressions can help us mine events from the given corpora. These events thus -provide us with useful means to discover the locked knowledge in them. I pose -three problems that can help unlock this knowledge vault in semantically -annotated text corpora: i. identifying important events; ii. semantic search; -and iii. event analytics. -" -2528,1603.00375,"Eliyahu Kiperwasser, Yoav Goldberg",Easy-First Dependency Parsing with Hierarchical Tree LSTMs,cs.CL," We suggest a compositional vector representation of parse trees that relies -on a recursive combination of recurrent-neural network encoders. To demonstrate -its effectiveness, we use the representation as the backbone of a greedy, -bottom-up dependency parser, achieving state-of-the-art accuracies for English -and Chinese, without relying on external word embeddings. The parser's -implementation is available for download at the first author's webpage. -" -2529,1603.00423,Phong Le and Willem Zuidema,"Quantifying the vanishing gradient and long distance dependency problem - in recursive neural networks and recursive LSTMs",cs.AI cs.CL cs.NE," Recursive neural networks (RNN) and their recently proposed extension -recursive long short term memory networks (RLSTM) are models that compute -representations for sentences, by recursively combining word embeddings -according to an externally provided parse tree. Both models thus, unlike -recurrent networks, explicitly make use of the hierarchical structure of a -sentence. In this paper, we demonstrate that RNNs nevertheless suffer from the -vanishing gradient and long distance dependency problem, and that RLSTMs -greatly improve over RNNs on these problems. We present an artificial learning -task that allows us to quantify the severity of these problems for both models. -We further show that a ratio of gradients (at the root node and a focal leaf -node) is highly indicative of the success of backpropagation at optimizing the -relevant weights low in the tree. This paper thus provides an explanation for -existing, superior results of RLSTMs on tasks such as sentiment analysis, and -suggests that the benefits of including hierarchical structure and of including -LSTM-style gating are complementary. -" -2530,1603.00786,Nanyun Peng and Mark Dredze,"Improving Named Entity Recognition for Chinese Social Media with Word - Segmentation Representation Learning",cs.CL," Named entity recognition, and other information extraction tasks, frequently -use linguistic features such as part-of-speech tags or chunking. For languages -where word boundaries are not readily identified in text, word segmentation is -a key first step to generating features for an NER system. While using word -boundary tags as features is helpful, the signals that aid in identifying -these boundaries may provide richer information for an NER system. New -state-of-the-art word segmentation systems use neural models to learn -representations for predicting word boundaries.
We show that these same -representations, jointly trained with an NER system, yield significant -improvements in NER for Chinese social media. In our experiments, jointly -training NER and word segmentation with an LSTM-CRF model yields nearly 5% -absolute improvement over previously published results. -" -2531,1603.00810,Marta R. Costa-Juss\`a and Jos\'e A. R. Fonollosa,Character-based Neural Machine Translation,cs.CL cs.LG cs.NE stat.ML," Neural Machine Translation (MT) has reached state-of-the-art results. -However, one of the main challenges that neural MT still faces is dealing with -very large vocabularies and morphologically rich languages. In this paper, we -propose a neural MT system using character-based embeddings in combination with -convolutional and highway layers to replace the standard lookup-based word -representations. The resulting unlimited-vocabulary and affix-aware source word -embeddings are tested in a state-of-the-art neural MT based on an -attention-based bidirectional recurrent neural network. The proposed MT scheme -provides improved results even when the source language is not morphologically -rich. Improvements up to 3 BLEU points are obtained in the German-English WMT -task. -" -2532,1603.00892,"Nikola Mrk\v{s}i\'c and Diarmuid \'O S\'eaghdha and Blaise Thomson and - Milica Ga\v{s}i\'c and Lina Rojas-Barahona and Pei-Hao Su and David Vandyke - and Tsung-Hsien Wen and Steve Young",Counter-fitting Word Vectors to Linguistic Constraints,cs.CL cs.LG," In this work, we present a novel counter-fitting method which injects -antonymy and synonymy constraints into vector space representations in order to -improve the vectors' capability for judging semantic similarity. Applying this -method to publicly available pre-trained word vectors leads to a new state of -the art performance on the SimLex-999 dataset. We also show how the method can -be used to tailor the word vector space for the downstream task of dialogue -state tracking, resulting in robust improvements across different dialogue -domains. -" -2533,1603.00957,"Kun Xu, Siva Reddy, Yansong Feng, Songfang Huang, Dongyan Zhao","Question Answering on Freebase via Relation Extraction and Textual - Evidence",cs.CL," Existing knowledge-based question answering systems often rely on small -annotated training data. While shallow methods like relation extraction are -robust to data scarcity, they are less expressive than the deep meaning -representation methods like semantic parsing, thereby failing at answering -questions involving multiple constraints. Here we alleviate this problem by -empowering a relation extraction method with additional evidence from -Wikipedia. We first present a neural network based relation extractor to -retrieve the candidate answers from Freebase, and then infer over Wikipedia to -validate these answers. Experiments on the WebQuestions question answering -dataset show that our method achieves an F_1 of 53.3%, a substantial -improvement over the state-of-the-art. -" -2534,1603.00968,"Ye Zhang, Stephen Roller, Byron Wallace","MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for - Sentence Classification",cs.CL," We introduce a novel, simple convolution neural network (CNN) architecture - -multi-group norm constraint CNN (MGNC-CNN) that capitalizes on multiple sets of -word embeddings for sentence classification. MGNC-CNN extracts features from -input embedding sets independently and then joins these at the penultimate -layer in the network to form a final feature vector. 
We then adopt a group -regularization strategy that differentially penalizes weights associated with -the subcomponents generated from the respective embedding sets. This model is -much simpler than comparable alternative architectures and requires -substantially less training time. Furthermore, it is flexible in that it does -not require input word embeddings to be of the same dimensionality. We show -that MGNC-CNN consistently outperforms baseline models. -" -2535,1603.01032,Javier Arias Navarro,Right Ideals of a Ring and Sublanguages of Science,cs.CL," Among Zellig Harris's numerous contributions to linguistics, his theory of the -sublanguages of science probably ranks among the most underrated. However, not -only has this theory led to some exhaustive and meaningful applications in the -study of the grammar of immunology language and its changes over time, but it -also illustrates the nature of mathematical relations between chunks or subsets -of a grammar and the language as a whole. This becomes most clear when dealing -with the connection between metalanguage and language, as well as when -reflecting on operators. - This paper tries to justify the claim that the sublanguages of science stand -in a particular algebraic relation to the rest of the language they are -embedded in, namely, that of right ideals in a ring. -" -2536,1603.01232,"Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, - Pei-Hao Su, David Vandyke, Steve Young","Multi-domain Neural Network Language Generation for Spoken Dialogue - Systems",cs.CL," Moving from limited-domain natural language generation (NLG) to open domain -is difficult because the number of semantic input combinations grows -exponentially with the number of domains. Therefore, it is important to -leverage existing resources and exploit similarities between domains to -facilitate domain adaptation. In this paper, we propose a procedure to train -multi-domain, Recurrent Neural Network-based (RNN) language generators via -multiple adaptation steps. In this procedure, a model is first trained on -counterfeited data synthesised from an out-of-domain dataset, and then -fine-tuned on a small set of in-domain utterances with a discriminative objective -function. Corpus-based evaluation results show that the proposed procedure can -achieve competitive performance in terms of BLEU score and slot error rate -while significantly reducing the data needed to train generators in new, unseen -domains. In subjective testing, human judges confirm that the procedure greatly -improves generator performance when only a small amount of data is available in -the domain. -" -2537,1603.01333,"Lei Sha, Sujian Li, Baobao Chang, Zhifang Sui",Joint Learning Templates and Slots for Event Schema Induction,cs.CL," Automatic event schema induction (AESI) aims to extract meta-events from raw -text, in other words, to find out what types (templates) of events may exist in -the raw text and what roles (slots) may exist in each event type. In this -paper, we propose a joint entity-driven model to learn templates and slots -simultaneously based on the constraints of templates and slots in the same -sentence. In addition, the entities' semantic information is also considered -for the inner connectivity of the entities. We borrow the normalized cut -criterion from image segmentation to divide the entities into more accurate -template clusters and slot clusters. The experiments show that our model achieves -better results than previous work.
-
" -2538,1603.01354,"Xuezhe Ma, Eduard Hovy",End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF,cs.LG cs.CL stat.ML," State-of-the-art sequence labeling systems traditionally require large -amounts of task-specific knowledge in the form of hand-crafted features and -data pre-processing. In this paper, we introduce a novel neural network -architecture that benefits from both word- and character-level representations -automatically, by using a combination of bidirectional LSTM, CNN and CRF. Our -system is truly end-to-end, requiring no feature engineering or data -pre-processing, thus making it applicable to a wide range of sequence labeling -tasks. We evaluate our system on two data sets for two sequence labeling tasks ---- Penn Treebank WSJ corpus for part-of-speech (POS) tagging and CoNLL 2003 -corpus for named entity recognition (NER). We obtain state-of-the-art -performance on both datasets --- 97.55\% accuracy for POS tagging and -91.21\% F1 for NER. -" -2539,1603.01360,"Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya - Kawakami, Chris Dyer",Neural Architectures for Named Entity Recognition,cs.CL," State-of-the-art named entity recognition systems rely heavily on -hand-crafted features and domain-specific knowledge in order to learn -effectively from the small, supervised training corpora that are available. In -this paper, we introduce two new neural architectures---one based on -bidirectional LSTMs and conditional random fields, and the other that -constructs and labels segments using a transition-based approach inspired by -shift-reduce parsers. Our models rely on two sources of information about -words: character-based word representations learned from the supervised corpus -and unsupervised word representations learned from unannotated corpora. Our -models obtain state-of-the-art performance in NER in four languages without -resorting to any language-specific knowledge or resources such as gazetteers. -" -2540,1603.01417,"Caiming Xiong, Stephen Merity, Richard Socher",Dynamic Memory Networks for Visual and Textual Question Answering,cs.NE cs.CL cs.CV," Neural network architectures with memory and attention mechanisms exhibit -certain reasoning capabilities required for question answering. One such -architecture, the dynamic memory network (DMN), obtained high accuracy on a -variety of language tasks. However, it was not shown whether the architecture -achieves strong results for question answering when supporting facts are not -marked during training or whether it could be applied to other modalities such -as images. Based on an analysis of the DMN, we propose several improvements to -its memory and input modules. Together with these changes we introduce a novel -input module for images in order to be able to answer visual questions. Our new -DMN+ model improves the state of the art on both the Visual Question Answering -dataset and the \babi-10k text question-answering dataset without supporting -fact supervision. -" -2541,1603.01514,"Nikhil Garg, James Henderson",A Bayesian Model of Multilingual Unsupervised Semantic Role Induction,cs.CL," We propose a Bayesian model of unsupervised semantic role induction in -multiple languages, and use it to explore the usefulness of parallel corpora -for this task. Our joint Bayesian model consists of individual models for each -language plus additional latent variables that capture alignments between roles -across languages.
Because it is a generative Bayesian model, we can do -evaluations in a variety of scenarios just by varying the inference procedure, -without changing the model, thereby comparing the scenarios directly. We -compare using only monolingual data, using a parallel corpus, using a parallel -corpus with annotations in the other language, and using small amounts of -annotation in the target language. We find that the biggest impact of adding a -parallel corpus to training is actually the increase in monolingual data, with -the alignments to another language resulting in small improvements, even with -labeled data for the other language. -" -2542,1603.01520,"Daniel Rubio Bonilla, Colin W. Glass, Jan Kuper",Optimized Polynomial Evaluation with Semantic Annotations,cs.PL cs.CL," In this paper we discuss how semantic annotations can be used to introduce -mathematical algorithmic information about the underlying imperative code, enabling -compilers to produce code transformations that yield better -performance. By using this approach, not only is good performance achieved, -but also better programmability, maintainability and portability across -different hardware architectures. To exemplify this, we use polynomial -equations of different degrees. -" -2543,1603.01541,Martijn Naaijer and Dirk Roorda,"Parallel Texts in the Hebrew Bible, New Methods and Visualizations",cs.CL," In this article we develop an algorithm to detect parallel texts in the -Masoretic Text of the Hebrew Bible. The results are presented online and -chapters in the Hebrew Bible containing parallel passages can be inspected -synoptically. Differences between parallel passages are highlighted. In a -similar way the MT of Isaiah is presented synoptically with 1QIsaa. We also -investigate how one can assess the degree of similarity between parallel -passages with the help of a case study of 2 Kings 19-25 and its parallels in -Isaiah, Jeremiah and 2 Chronicles. -" -2544,1603.01547,"Rudolf Kadlec, Martin Schmid, Ondrej Bajgar and Jan Kleindienst",Text Understanding with the Attention Sum Reader Network,cs.CL," Several large cloze-style context-question-answer datasets have been -introduced recently: the CNN and Daily Mail news data and the Children's Book -Test. Thanks to the size of these datasets, the associated text comprehension -task is well suited for deep-learning techniques that currently seem to -outperform all alternative approaches. We present a new, simple model that uses -attention to directly pick the answer from the context as opposed to computing -the answer using a blended representation of words in the document as is usual -in similar models. This makes the model particularly suitable for -question-answering problems where the answer is a single word from the -document. An ensemble of our models sets a new state of the art on all evaluated -datasets. -" -2545,1603.01595,"Hussam Hamdan, Patrice Bellot, Frederic Bechet",Sentiment Analysis in Scholarly Book Reviews,cs.CL cs.AI," So far, different studies have tackled sentiment analysis in several -domains, such as restaurant and movie reviews. However, this problem has not been -studied in scholarly book reviews, which differ in terms of review style -and size. In this paper, we propose to combine different features and present them -to supervised classifiers which extract the opinion target -expressions and detect their polarities in scholarly book reviews.
We construct -a labeled corpus for training and evaluating our methods in French book -reviews. We also evaluate them on English restaurant reviews in order to -measure their robustness across domains and languages. The evaluation shows -that our methods are robust enough for English restaurant reviews and French -book reviews. -" -2546,1603.01597,"Mike Kestemont, Jeroen De Gussem","Integrated Sequence Tagging for Medieval Latin Using Deep Representation - Learning",cs.CL cs.LG stat.ML," In this paper we consider two sequence tagging tasks for medieval Latin: -part-of-speech tagging and lemmatization. These are both basic, yet -foundational preprocessing steps in applications such as text re-use detection. -Nevertheless, they are generally complicated by the considerable orthographic -variation which is typical of medieval Latin. In Digital Classics, these tasks -are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion. -For example, a lexicon is used to generate all the potential lemma-tag pairs -for a token, and next, a context-aware PoS-tagger is used to select the most -appropriate tag-lemma pair. Apart from the problems with out-of-lexicon items, -error percolation is a major downside of such approaches. In this paper we -explore the possibility of elegantly solving these tasks using a single, -integrated approach. For this, we make use of a layered neural network -architecture from the field of deep representation learning. -" -2547,1603.01648,"Gabriel Stanovsky, Jessica Ficler, Ido Dagan, Yoav Goldberg",Getting More Out Of Syntax with PropS,cs.CL," Semantic NLP applications often rely on dependency trees to recognize major -elements of the proposition structure of sentences. Yet, while much semantic -structure is indeed expressed by syntax, many phenomena are not easily read out -of dependency trees, often leading to further ad-hoc heuristic post-processing -or to information loss. To directly address the needs of semantic applications, -we present PropS -- an output representation designed to explicitly and -uniformly express much of the proposition structure which is implied from -syntax, and an associated tool for extracting it from dependency trees. -" -2548,1603.01833,"Giuliano Lancioni, Valeria Pettinari, Laura Garofalo, Marta - Campanelli, Ivana Pepe, Simona Olivieri, Ilaria Cicola","Semi-Automatic Data Annotation, POS Tagging and Mildly Context-Sensitive - Disambiguation: the eXtended Revised AraMorph (XRAM)",cs.CL cs.IR," An extended, revised form of Tim Buckwalter's Arabic lexical and -morphological resource AraMorph, eXtended Revised AraMorph (henceforth XRAM), -is presented which addresses a number of weaknesses and inconsistencies of the -original model by allowing a wider coverage of real-world Classical and -contemporary (both formal and informal) Arabic texts. Building upon previous -research, XRAM enhancements include (i) flag-selectable usage markers, (ii) -probabilistic mildly context-sensitive POS tagging, filtering, disambiguation -and ranking of alternative morphological analyses, (iii) semi-automatic -increment of lexical coverage through extraction of lexical and morphological -information from existing lexical resources. Testing of XRAM through a -front-end Python module showed a remarkable success level.
-
" -2549,1603.01913,Yangfeng Ji and Gholamreza Haffari and Jacob Eisenstein,"A Latent Variable Recurrent Neural Network for Discourse Relation - Language Models",cs.CL cs.LG cs.NE stat.ML," This paper presents a novel latent variable recurrent neural network -architecture for jointly modeling sequences of words and (possibly latent) -discourse relations between adjacent sentences. A recurrent neural network -generates individual words, thus reaping the benefits of -discriminatively-trained vector representations. The discourse relations are -represented with a latent variable, which can be predicted or marginalized, -depending on the task. The resulting model can therefore employ a training -objective that includes not only discourse relation classification, but also -word prediction. As a result, it outperforms state-of-the-art alternatives for -two tasks: implicit discourse relation classification in the Penn Discourse -Treebank, and dialog act classification in the Switchboard corpus. Furthermore, -by marginalizing over latent discourse relations at test time, we obtain a -discourse informed language model, which improves over a strong LSTM baseline. -" -2550,1603.01987,Vittoria Cozza and Marinella Petrocchi and Angelo Spognardi,"A matter of words: NLP for quality evaluation of Wikipedia medical - articles",cs.IR cs.CL," Automatic quality evaluation of Web information is a task with many fields of -application and of great relevance, especially in critical domains like the -medical one. We move from the intuition that the quality of content of medical -Web documents is affected by features related to the specific domain. First, -the usage of a specific vocabulary (Domain Informativeness); then, the adoption -of specific codes (like those used in the infoboxes of Wikipedia articles) and -the type of document (e.g., historical and technical ones). In this paper, we -propose to leverage specific domain features to improve the results of the -evaluation of Wikipedia medical articles. In particular, we evaluate the -articles adopting an ""actionable"" model, whose features are related to the -content of the articles, so that the model can also directly suggest strategies -for improving a given article quality. We rely on Natural Language Processing -(NLP) and dictionary-based techniques in order to extract the bio-medical -concepts in a text. We prove the effectiveness of our approach by classifying -the medical articles of the Wikipedia Medicine Portal, which have been -previously manually labeled by the Wiki Project team. The results of our -experiments confirm that, by considering domain-oriented features, it is -possible to obtain sensible improvements with respect to existing solutions, -mainly for those articles that other approaches have less correctly classified. -Other than being interesting on their own, the results call for further -research in the area of domain specific features suitable for Web data quality -assessment. -" -2551,1603.02488,"Shimaa M. Abd El-salam, Enas M.F. El Houby, A.K. Al Sammak and T.A. - El-Shishtawy",Extracting Arabic Relations from the Web,cs.CL," The goal of this research is to extract a large list or table of named -entities and relations in a specific domain. A small set of -instance relations is required as input from the user. The system exploits -summaries from the Google search engine as source text. These instances are used -to extract patterns. The output is a set of new entities and their relations.
-
The results from four experiments show that precision and recall vary -according to relation type. Precision ranges from 0.61 to 0.75, while recall -ranges from 0.71 to 0.83. The best result is obtained for the (player, club) -relation: 0.72 and 0.83 for precision and recall, respectively. -" -2552,1603.02514,"Weidi Xu, Haoze Sun, Chao Deng, Ying Tan",Variational Autoencoders for Semi-supervised Text Classification,cs.CL cs.LG," Although the semi-supervised variational autoencoder (SemiVAE) works well on image -classification tasks, it fails on text classification tasks if a vanilla LSTM is used -as its decoder. From a perspective of reinforcement learning, it is verified -that the decoder's capability to distinguish between different categorical -labels is essential. Therefore, the Semi-supervised Sequential Variational -Autoencoder (SSVAE) is proposed, which increases this capability by feeding the -label into its decoder RNN at each time-step. Two specific decoder structures -are investigated and both of them are verified to be effective. In addition, in -order to reduce the computational complexity in training, a novel optimization -method is proposed, which estimates the gradient of the unlabeled objective -function by sampling, along with two variance reduction techniques. -Experimental results on the Large Movie Review Dataset (IMDB) and AG's News corpus -show that the proposed approach significantly improves the classification -accuracy compared with pure-supervised classifiers, and achieves competitive -performance against previous advanced methods. State-of-the-art results can be -obtained by integrating other pretraining-based methods. -" -2553,1603.02604,"Ralf Steinberger, Aldo Podavini, Alexandra Balahur, Guillaume Jacquet, - Hristo Tanev, Jens Linge, Martin Atkinson, Michele Chinosi, Vanni Zavarella, - Yaniv Steiner, Erik van der Goot",Observing Trends in Automated Multilingual Media Analysis,cs.CL," Any large organisation, be it public or private, monitors the media for -information to keep abreast of developments in its field of interest, and -usually also to become aware of positive or negative opinions expressed towards -it. At least for the written media, computer programs have become very -effective at helping human analysts in their monitoring task -by gathering media reports, analysing them, detecting trends and - in some -cases - even issuing early warnings or making predictions of likely future -developments. We present here trend recognition-related functionality of the -Europe Media Monitor (EMM) system, which was developed by the European -Commission's Joint Research Centre (JRC) for public administrations in the -European Union (EU) and beyond. EMM performs large-scale media analysis in up -to seventy languages and recognises various types of trends, some of them -combining information from news articles written in different languages and -from social media posts. EMM also lets users explore the huge amount of -multilingual media data through interactive maps and graphs, allowing them to -examine the data from various viewpoints and according to multiple criteria. A -lot of EMM's functionality is freely accessible over the internet or via apps -for hand-held devices.
-
" -2554,1603.02618,"Angeliki Lazaridou, Nghia The Pham, Marco Baroni","The red one!: On learning to refer to things based on their - discriminative properties",cs.CL cs.CV," As a first step towards agents learning to communicate about their visual -environment, we propose a system that, given visual representations of a -referent (cat) and a context (sofa), identifies their discriminative -attributes, i.e., properties that distinguish them (has_tail). Moreover, -despite the lack of direct supervision at the attribute level, the model learns -to assign plausible attributes to objects (sofa-has_cushion). Finally, we -present a preliminary experiment confirming the referential success of the -predicted discriminative attributes. -" -2555,1603.02776,"Yang Liu, Sujian Li, Xiaodong Zhang and Zhifang Sui","Implicit Discourse Relation Classification via Multi-Task Neural - Networks",cs.CL cs.AI cs.NE," Without discourse connectives, classifying implicit discourse relations is a -challenging task and a bottleneck for building a practical discourse parser. -Previous research usually makes use of one kind of discourse framework such as -PDTB or RST to improve the classification performance on discourse relations. -Actually, under different discourse annotation frameworks, there exist multiple -corpora which have internal connections. To exploit the combination of -different discourse corpora, we design related discourse classification tasks -specific to a corpus, and propose a novel Convolutional Neural Network embedded -multi-task learning system to synthesize these tasks by learning both unique -and shared representations for each task. The experimental results on the PDTB -implicit discourse relation classification task demonstrate that our model -achieves significant gains over baseline systems. -" -2556,1603.02845,"Herman Kamper, Aren Jansen, Sharon Goldwater","Unsupervised word segmentation and lexicon discovery using acoustic word - embeddings",cs.CL," In settings where only unlabelled speech data is available, speech technology -needs to be developed without transcriptions, pronunciation dictionaries, or -language modelling text. A similar problem is faced when modelling infant -language acquisition. In these cases, categorical linguistic structure needs to -be discovered directly from speech audio. We present a novel unsupervised -Bayesian model that segments unlabelled speech and clusters the segments into -hypothesized word groupings. The result is a complete unsupervised tokenization -of the input speech in terms of discovered word types. In our approach, a -potential word segment (of arbitrary length) is embedded in a fixed-dimensional -acoustic vector space. The model, implemented as a Gibbs sampler, then builds a -whole-word acoustic model in this space while jointly performing segmentation. -We report word error rates in a small-vocabulary connected digit recognition -task by mapping the unsupervised decoded output to ground truth transcriptions. -The model achieves around 20% error rate, outperforming a previous HMM-based -system by about 10% absolute. Moreover, in contrast to the baseline, our model -does not require a pre-specified vocabulary size. -" -2557,1603.02905,Adel Rahimi,Lexical bundles in computational linguistics academic literature,cs.CL," In this study we analyze an 8-million-word corpus of academic literature -from the field of computational linguistics. The lexical bundles from -this corpus are categorized based on their structures and functions.
-" -2558,1603.03112,"Lifu Huang, Jonathan May, Xiaoman Pan, Heng Ji","Building a Fine-Grained Entity Typing System Overnight for a New X (X = - Language, Domain, Genre)",cs.CL cs.AI," Recent research has shown great progress on fine-grained entity typing. Most -existing methods require pre-defining a set of types and training a multi-class -classifier from a large labeled data set based on multi-level linguistic -features. They are thus limited to certain domains, genres and languages. In -this paper, we propose a novel unsupervised entity typing framework by -combining symbolic and distributional semantics. We start from learning general -embeddings for each entity mention, compose the embeddings of specific contexts -using linguistic structures, link the mention to knowledge bases and learn its -related knowledge representations. Then we develop a novel joint hierarchical -clustering and linking algorithm to type all mentions using these -representations. This framework doesn't rely on any annotated data, predefined -typing schema, or hand-crafted features, therefore it can be quickly adapted to -a new domain, genre and language. Furthermore, it has great flexibility at -incorporating linguistic structures (e.g., Abstract Meaning Representation -(AMR), dependency relations) to improve specific context representation. -Experiments on genres (news and discussion forum) show comparable performance -with state-of-the-art supervised typing systems trained from a large amount of -labeled data. Results on various languages (English, Chinese, Japanese, Hausa, -and Yoruba) and domains (general and biomedical) demonstrate the portability of -our framework. -" -2559,1603.03144,Yi Yang and Jacob Eisenstein,Part-of-Speech Tagging for Historical English,cs.CL cs.DL," As more historical texts are digitized, there is interest in applying natural -language processing tools to these archives. However, the performance of these -tools is often unsatisfactory, due to language change and genre differences. -Spelling normalization heuristics are the dominant solution for dealing with -historical texts, but this approach fails to account for changes in usage and -vocabulary. In this empirical paper, we assess the capability of domain -adaptation techniques to cope with historical texts, focusing on the classic -benchmark task of part-of-speech tagging. We evaluate several domain adaptation -methods on the task of tagging Early Modern English and Modern British English -texts in the Penn Corpora of Historical English. We demonstrate that the -Feature Embedding method for unsupervised domain adaptation outperforms word -embeddings and Brown clusters, showing the importance of embedding the entire -feature space, rather than just individual words. Feature Embeddings also give -better performance than spelling normalization, but the combination of the two -methods is better still, yielding a 5% raw improvement in tagging accuracy on -Early Modern English texts. -" -2560,1603.03153,"Bohdan B. Khomtchouk, Claes Wahlestedt","Zipf's law emerges asymptotically during phase transitions in - communicative systems",physics.soc-ph cs.CL," Zipf's law predicts a power-law relationship between word rank and frequency -in language communication systems, and is widely reported in texts yet remains -enigmatic as to its origins. Computer simulations have shown that language -communication systems emerge at an abrupt phase transition in the fidelity of -mappings between symbols and objects. 
Since the phase transition approximates
-the Heaviside or step function, we show that Zipfian scaling emerges
-asymptotically at high rank based on the Laplace transform. We thereby
-demonstrate that Zipf's law gradually emerges from the moment of phase
-transition in communicative systems. We show that this power-law scaling
-behavior explains the emergence of natural languages at phase transitions. We
-find that the emergence of Zipf's law during language communication suggests
-that the use of rare words in a lexicon is critical for the construction of an
-effective communicative system at the phase transition.
-"
-2561,1603.03170,"Laurent Romary (CMB, ALPAGE), Mike Mertens, Anne Baillot (CMB, ALPAGE)",Data fluidity in DARIAH -- pushing the agenda forward,cs.CY cs.CL cs.DL," This paper provides both an update concerning the setting up of the European
-DARIAH infrastructure and a series of strong action lines related to the
-development of a data-centred strategy for the humanities in the coming years.
-In particular we tackle various aspects of data management: data hosting, the
-setting up of a DARIAH seal of approval, the establishment of a charter between
-cultural heritage institutions and scholars and finally a specific view on
-certification mechanisms for data.
-"
-2562,1603.03185,"Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez
-  Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander
-  Gruenstein, Francoise Beaufays, Carolina Parada",Personalized Speech recognition on mobile devices,cs.CL cs.LG cs.SD," We describe a large vocabulary speech recognition system that is accurate,
-has low latency, and yet has a small enough memory and computational footprint
-to run faster than real-time on a Nexus 5 Android smartphone. We employ a
-quantized Long Short-Term Memory (LSTM) acoustic model trained with
-connectionist temporal classification (CTC) to directly predict phoneme
-targets, and further reduce its memory footprint using an SVD-based compression
-scheme. Additionally, we minimize our memory footprint by using a single
-language model for both dictation and voice command domains, constructed using
-Bayesian interpolation. Finally, in order to properly handle device-specific
-information, such as proper names and other context-dependent information, we
-inject vocabulary items into the decoder graph and bias the language model
-on-the-fly. Our system achieves 13.5% word error rate on an open-ended
-dictation task, running with a median speed that is seven times faster than
-real-time.
-"
-2563,1603.03610,Mark-Jan Nederhof,A short proof that $O_2$ is an MCFL,cs.FL cs.CL," We present a new proof that $O_2$ is a multiple context-free language. It
-contrasts with a recent proof by Salvati (2015) in its avoidance of concepts
-that seem specific to two-dimensional geometry, such as the complex exponential
-function. Our simple proof creates realistic prospects of widening the results
-to higher dimensions. This finding is of central importance to the relation
-between extreme free word order and classes of grammars used to describe the
-syntax of natural language.
-"
-2564,1603.03758,"Dane Bell and Gus Hahn-Powell and Marco A. Valenzuela-Esc\'arcega and
-  Mihai Surdeanu",Sieve-based Coreference Resolution in the Biomedical Domain,cs.CL," We describe challenges and advantages unique to coreference resolution in the
-biomedical domain, and a sieve-based architecture that leverages domain
-knowledge for both entity and event coreference resolution. 
Domain-general
-coreference resolution algorithms perform poorly on biomedical documents,
-because the cues they rely on such as gender are largely absent in this domain,
-and because they do not encode domain-specific knowledge such as the number and
-type of participants required in chemical reactions. Moreover, it is difficult
-to directly encode this knowledge into most coreference resolution algorithms
-because they are not rule-based. Our rule-based architecture uses sequentially
-applied hand-designed ""sieves"", with the output of each sieve informing and
-constraining subsequent sieves. This architecture provides a 3.2% increase in
-throughput to our Reach event extraction system with precision parallel to that
-of the stricter system that relies solely on syntactic patterns for extraction.
-"
-2565,1603.03784,"Dane Bell and Daniel Fried and Luwen Huangfu and Mihai Surdeanu and
-  Stephen Kobourov","Towards using social media to identify individuals at risk for
-  preventable chronic illness",cs.CL cs.CY cs.SI," We describe a strategy for the acquisition of training data necessary to
-build a social-media-driven early detection system for individuals at risk for
-(preventable) type 2 diabetes mellitus (T2DM). The strategy uses a game-like
-quiz with data and questions acquired semi-automatically from Twitter. The
-questions are designed to inspire participant engagement and collect relevant
-data to train a public-health model applied to individuals. Prior systems
-designed to use social media such as Twitter to predict obesity (a risk factor
-for T2DM) operate on entire communities such as states, counties, or cities,
-based on statistics gathered by government agencies. Because there is
-considerable variation among individuals within these groups, training data on
-the individual level would be more effective, but this data is difficult to
-acquire. The approach proposed here aims to address this issue. Our strategy
-has two steps. First, we trained a random forest classifier on data gathered
-from (public) Twitter statuses and state-level statistics with state-of-the-art
-accuracy. We then converted this classifier into a 20-questions-style quiz and
-made it available online. In doing so, we achieved high engagement with
-individuals that took the quiz, while also building a training set of
-voluntarily supplied individual-level data for future classification.
-"
-2566,1603.03793,"Miguel Ballesteros, Yoav Goldberg, Chris Dyer, Noah A. Smith",Training with Exploration Improves a Greedy Stack-LSTM Parser,cs.CL," We adapt the greedy Stack-LSTM dependency parser of Dyer et al. (2015) to
-support a training-with-exploration procedure using dynamic oracles (Goldberg
-and Nivre, 2013) instead of cross-entropy minimization. This form of training,
-which accounts for model predictions at training time rather than assuming an
-error-free action history, improves parsing accuracies for both English and
-Chinese, obtaining very strong results for both languages. We discuss some
-modifications needed in order to get training with exploration to work well for
-a probabilistic neural network.
-"
-2567,1603.03827,"Ji Young Lee, Franck Dernoncourt","Sequential Short-Text Classification with Recurrent and Convolutional
-  Neural Networks",cs.CL cs.AI cs.LG cs.NE stat.ML," Recent approaches based on artificial neural networks (ANNs) have shown
-promising results for short-text classification. 
However, many short texts
-occur in sequences (e.g., sentences in a document or utterances in a dialog),
-and most existing ANN-based systems do not leverage the preceding short texts
-when classifying a subsequent one. In this work, we present a model based on
-recurrent neural networks and convolutional neural networks that incorporates
-the preceding short texts. Our model achieves state-of-the-art results on three
-different datasets for dialog act prediction.
-"
-2568,1603.03873,"Biao Zhang, Deyi Xiong, Jinsong Su",Neural Discourse Relation Recognition with Semantic Memory,cs.CL," Humans comprehend the meanings and relations of discourses heavily relying on
-their semantic memory that encodes general knowledge about concepts and facts.
-Inspired by this, we propose a neural recognizer for implicit discourse
-relation analysis, which builds upon a semantic memory that stores knowledge in
-a distributed fashion. We refer to this recognizer as SeMDER. Starting from
-word embeddings of discourse arguments, SeMDER employs a shallow encoder to
-generate a distributed surface representation for a discourse. A semantic
-encoder with attention to the semantic memory matrix is further established
-over surface representations. It is able to retrieve a deep semantic meaning
-representation for the discourse from the memory. Using the surface and
-semantic representations as input, SeMDER finally predicts implicit discourse
-relations via a neural recognizer. Experiments on the benchmark data set show
-that SeMDER benefits from the semantic memory and achieves substantial
-improvements of 2.56% on average over current state-of-the-art baselines in
-terms of F1-score.
-"
-2569,1603.03876,"Biao Zhang, Deyi Xiong, Jinsong Su, Qun Liu, Rongrong Ji, Hong Duan,
-  Min Zhang",Variational Neural Discourse Relation Recognizer,cs.CL," Implicit discourse relation recognition is a crucial component for automatic
-discourse-level analysis and natural language understanding. Previous studies
-exploit discriminative models that are built on either powerful manual features
-or deep discourse representations. In this paper, instead, we explore
-generative models and propose a variational neural discourse relation
-recognizer. We refer to this model as VarNDRR. VarNDRR establishes a directed
-probabilistic model with a latent continuous variable that generates both a
-discourse and the relation between the two arguments of the discourse. In order
-to perform efficient inference and learning, we introduce neural discourse
-relation models to approximate the prior and posterior distributions of the
-latent variable, and employ these approximated distributions to optimize a
-reparameterized variational lower bound. This allows VarNDRR to be trained with
-standard stochastic gradient methods. Experiments on the benchmark data set
-show that VarNDRR can achieve comparable results against state-of-the-art
-baselines without using any manual features.
-"
-2570,1603.04236,Nicolai Winther-Nielsen (FIUC-Dk),Interactive Tools and Tasks for the Hebrew Bible,cs.CL," This contribution to a special issue on ""Computer-aided processing of
-intertextuality"" in ancient texts will illustrate how using digital tools to
-interact with the Hebrew Bible offers promising new perspectives for
-visualizing the texts and for performing tasks in education and research. 
This
-contribution explores how the corpus of the Hebrew Bible created and maintained
-by the Eep Talstra Centre for Bible and Computer can support new methods for
-modern knowledge workers within the field of digital humanities and theology to
-be applied to ancient texts, and how this can be envisioned as a new field of
-digital intertextuality. The article first describes how the corpus was used to
-develop the Bible Online Learner as a persuasive technology to enhance language
-learning with, in, and around a database that acts as the engine driving
-interactive tasks for learners. Intertextuality in this case is a matter of
-active exploration and ongoing practice. Furthermore, interactive
-corpus-technology has an important bearing on the task of textual criticism as
-a specialized area of research that depends increasingly on the availability of
-digital resources. Commercial solutions developed by software companies like
-Logos and Accordance offer a market-based intertextuality defined by the
-production of advanced digital resources for scholars and students as useful
-alternatives to often inaccessible and expensive printed versions. It is
-reasonable to expect that in the future interactive corpus technology will
-allow scholars to do innovative academic tasks in textual criticism and
-interpretation. We have already seen the emergence of promising tools for text
-categorization, analysis of translation shifts, and interpretation. Broadly
-speaking, interactive tools and tasks within the three areas of language
-learning, textual criticism, and Biblical studies illustrate a new kind of
-intertextuality emerging within digital humanities.
-"
-2571,1603.04351,"Eliyahu Kiperwasser, Yoav Goldberg","Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature
-  Representations",cs.CL," We present a simple and effective scheme for dependency parsing which is
-based on bidirectional-LSTMs (BiLSTMs). Each sentence token is associated with
-a BiLSTM vector representing the token in its sentential context, and feature
-vectors are constructed by concatenating a few BiLSTM vectors. The BiLSTM is
-trained jointly with the parser objective, resulting in very effective feature
-extractors for parsing. We demonstrate the effectiveness of the approach by
-applying it to a greedy transition-based parser as well as to a globally
-optimized graph-based parser. The resulting parsers have very simple
-architectures, and match or surpass the state-of-the-art accuracies on English
-and Chinese.
-"
-2572,1603.04513,"Wenpeng Yin, Hinrich Sch\""utze",Multichannel Variable-Size Convolution for Sentence Classification,cs.CL," We propose MVCNN, a convolutional neural network (CNN) architecture for
-sentence classification. It (i) combines diverse versions of pretrained word
-embeddings and (ii) extracts features of multigranular phrases with
-variable-size convolution filters. We also show that pretraining MVCNN is
-critical for good performance. MVCNN achieves state-of-the-art performance on
-four tasks: on small-scale binary, small-scale multi-class and large-scale
-Twitter sentiment prediction and on subjectivity classification.
-"
-2573,1603.04553,Xuezhe Ma and Zhengzhong Liu and Eduard Hovy,Unsupervised Ranking Model for Entity Coreference Resolution,cs.CL cs.LG," Coreference resolution is one of the first stages in deep language
-understanding and its importance has been well recognized in the natural
-language processing community. 
In this paper, we propose a generative,
-unsupervised ranking model for entity coreference resolution by introducing
-resolution mode variables. Our unsupervised system achieves 58.44% F1 score of
-the CoNLL metric on the English data from the CoNLL-2012 shared task (Pradhan
-et al., 2012), outperforming the Stanford deterministic system (Lee et al.,
-2013) by 3.01%.
-"
-2574,1603.04747,Ramandeep S Randhawa and Parag Jain and Gagan Madan,Topic Modeling Using Distributed Word Embeddings,cs.CL," We propose a new algorithm for topic modeling, Vec2Topic, that identifies the
-main topics in a corpus using semantic information captured via
-high-dimensional distributed word embeddings. Our technique is unsupervised and
-generates a list of topics ranked with respect to importance. We find that it
-works better than existing topic modeling techniques such as Latent Dirichlet
-Allocation for identifying key topics in user-generated content, such as
-emails, chats, etc., where topics are diffused across the corpus. We also find
-that Vec2Topic works equally well for non-user generated content, such as
-papers, reports, etc., and for small corpora such as a single document.
-"
-2575,1603.04767,"Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning and
-  Eneko Agirre",Evaluating the word-expert approach for Named-Entity Disambiguation,cs.CL," Named Entity Disambiguation (NED) is the task of linking a named-entity
-mention to an instance in a knowledge-base, typically Wikipedia. This task is
-closely related to word-sense disambiguation (WSD), where the supervised
-word-expert approach has prevailed. In this work we present the results of the
-word-expert approach to NED, where one classifier is built for each target
-entity mention string. The resources necessary to build the system, a
-dictionary and a set of training instances, have been automatically derived
-from Wikipedia. We provide empirical evidence of the value of this approach, as
-well as a study of the differences between WSD and NED, including ambiguity and
-synonymy statistics.
-"
-2576,1603.05118,"Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth",Recurrent Dropout without Memory Loss,cs.CL," This paper presents a novel approach to recurrent neural network (RNN)
-regularization. Differently from the widely adopted dropout method, which is
-applied to forward connections of feed-forward architectures or RNNs,
-we propose to drop neurons directly in recurrent connections in a way
-that does not cause loss of long-term memory. Our approach is as easy to
-implement and apply as the regular feed-forward dropout, and we demonstrate its
-effectiveness for Long Short-Term Memory networks, the most popular type of RNN
-cells. Our experiments on NLP benchmarks show consistent improvements even when
-combined with conventional feed-forward dropout.
-"
-2577,1603.05157,"Heike Adel and Benjamin Roth and Hinrich Sch\""utze","Comparing Convolutional Neural Networks to Traditional Models for Slot
-  Filling",cs.CL," We address relation classification in the context of slot filling, the task
-of finding and evaluating fillers like ""Steve Jobs"" for the slot X in ""X
-founded Apple"". We propose a convolutional neural network which splits the
-input sentence into three parts according to the relation arguments and compare
-it to state-of-the-art and traditional approaches of relation classification.
-Finally, we combine different methods and show that the combination is better
-than individual approaches. 
We also analyze the effect of genre differences on
-performance.
-"
-2578,1603.05350,Javier Vera,Self-organization of vocabularies under different interaction orders,cs.CL physics.soc-ph," Traditionally, the formation of vocabularies has been studied by agent-based
-models (especially, the Naming Game) in which random pairs of agents negotiate
-word-meaning associations at each discrete time step. This paper proposes a
-first approximation to a novel question: To what extent is the negotiation of
-word-meaning associations influenced by the order in which the individuals
-interact? Automata Networks provide an adequate mathematical framework to
-explore this question. Computer simulations suggest that on two-dimensional
-lattices the typical features of the formation of word-meaning associations are
-recovered under random schemes that update small fractions of the population at
-the same time.
-"
-2579,1603.05354,Javier Vera,"Modeling self-organization of vocabularies under phonological similarity
-  effects",cs.CL physics.soc-ph," This work develops a computational model (by Automata Networks) of
-phonological similarity effects involved in the formation of word-meaning
-associations on artificial populations of speakers. Classical studies show that
-in recall experiments memory performance was impaired for phonologically
-similar words versus dissimilar ones. Here, the individuals confound
-phonologically similar words according to a predefined parameter. The main
-hypothesis is that there is a critical range of the parameter, and with this,
-of working-memory mechanisms, which implies drastic changes in the final
-consensus of the entire population. Theoretical results present proofs of
-convergence for a particular case of the model within a worst-case complexity
-framework. Computer simulations describe the evolution of an energy function
-that measures the amount of local agreement between individuals. The main
-finding is the appearance of sudden changes in the energy function at critical
-parameters.
-"
-2580,1603.05570,Ryuta Arisaka,Predicate Gradual Logic and Linguistics,cs.CL," There are several major proposals for treating donkey anaphora such as
-discourse representation theory and the like, or E-Type theories and the
-like. Every one of them works well for a set of specific examples that they
-use to demonstrate the validity of their approaches. As I show in this paper,
-however, they are not very generalisable and do not account for essentially the
-same problem that they remedy when it manifests in other examples. I propose
-another logical approach. I develop a logic that extends a recent propositional
-gradual logic, and show that it can treat donkey anaphora generally. I also
-identify and address a problem around the modern convention on existential
-import. Furthermore, I show that Aristotle's syllogisms and conversion are
-realisable in this logic.
-"
-2581,1603.05670,"Samuel R\""onnqvist, Peter Sarlin",Bank distress in the news: Describing events through deep learning,cs.CL cs.AI cs.IR cs.NE q-fin.CP," While many models are purposed for detecting the occurrence of significant
-events in financial systems, the task of providing qualitative detail on the
-developments is not usually as well automated. We present a deep learning
-approach for detecting relevant discussion in text and extracting natural
-language descriptions of events. 
Supervised by only a small set of event
-information, comprising entity names and dates, the model is leveraged by
-unsupervised learning of semantic vector representations on extensive text
-data. We demonstrate applicability to the study of financial risk based on news
-(6.6M articles), particularly bank distress and government interventions (243
-events), where indices can signal the level of bank-stress-related reporting at
-the entity level, or aggregated at national or European level, while being
-coupled with explanations. Thus, we exemplify how text, as timely, widely
-available and descriptive data, can serve as a useful complementary source of
-information for financial and systemic risk analytics.
-"
-2582,1603.05673,Samantha Wong and Hamidreza Chinaei and Frank Rudzicz,Predicting health inspection results from online restaurant reviews,cs.CL cs.LG," Informatics around public health is increasingly shifting from the
-professional to the public spheres. In this work, we apply linguistic analytics
-to restaurant reviews, from Yelp, in order to automatically predict official
-health inspection reports. We consider two types of feature sets, i.e., keyword
-detection and topic model features, and use these in several classification
-methods. Our empirical analysis shows that these extracted features can predict
-public health inspection reports with over 90% accuracy using simple support
-vector machines.
-"
-2583,1603.05739,"Elliot Schumacher, Maxine Eskenazi","A Readability Analysis of Campaign Speeches from the 2016 US
-  Presidential Campaign",cs.CL," Readability is defined as the reading level of the speech from grade 1 to
-grade 12. It results from the use of the REAP readability analysis (vocabulary
-- Collins-Thompson and Callan, 2004; syntax - Heilman et al., 2006, 2007), which
-uses the lexical content and grammatical structure of the sentences in a
-document to predict the reading level. After analysis, results were grouped
-into the average readability of each candidate, the evolution of the
-candidate's speeches' readability over time and the standard deviation, or how
-much each candidate varied their speech from one venue to another. For
-comparison, one speech from four past presidents and the Gettysburg Address
-were also analyzed.
-"
-2584,1603.05962,"Stanislas Lauly, Yin Zheng, Alexandre Allauzen, Hugo Larochelle",Document Neural Autoregressive Distribution Estimation,cs.LG cs.CL," We present an approach based on feed-forward neural networks for learning the
-distribution of textual documents. This approach is inspired by the Neural
-Autoregressive Distribution Estimator (NADE) model, which has been shown to be a
-good estimator of the distribution of discrete-valued high-dimensional vectors.
-In this paper, we present how NADE can successfully be adapted to the case of
-textual data, retaining from NADE the property that sampling or computing the
-probability of observations can be done exactly and efficiently. The approach
-can also be used to learn deep representations of documents that are
-competitive with those learned by alternative topic modeling approaches.
-Finally, we describe how the approach can be combined with a regular neural
-network N-gram model and substantially improve its performance, by making its
-learned representation sensitive to the larger, document-specific context. 
-" -2585,1603.06009,Sowmya Vajjala and Detmar Meurers,Readability-based Sentence Ranking for Evaluating Text Simplification,cs.CL," We propose a new method for evaluating the readability of simplified -sentences through pair-wise ranking. The validity of the method is established -through in-corpus and cross-corpus evaluation experiments. The approach -correctly identifies the ranking of simplified and unsimplified sentences in -terms of their reading level with an accuracy of over 80%, significantly -outperforming previous results. To gain qualitative insights into the nature of -simplification at the sentence level, we studied the impact of specific -linguistic features. We empirically confirm that both word-level and syntactic -features play a role in comparing the degree of simplification of authentic -data. To carry out this research, we created a new sentence-aligned corpus from -professionally simplified news articles. The new corpus resource enriches the -empirical basis of sentence-level simplification research, which so far relied -on a single resource. Most importantly, it facilitates cross-corpus evaluation -for simplification, a key step towards generalizable results. -" -2586,1603.06021,"Samuel R. Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, - Christopher D. Manning, Christopher Potts",A Fast Unified Model for Parsing and Sentence Understanding,cs.CL," Tree-structured neural networks exploit valuable syntactic parse information -as they interpret the meanings of sentences. However, they suffer from two key -technical problems that make them slow and unwieldy for large-scale NLP tasks: -they usually operate on parsed sentences and they do not directly support -batched computation. We address these issues by introducing the Stack-augmented -Parser-Interpreter Neural Network (SPINN), which combines parsing and -interpretation within a single tree-sequence hybrid model by integrating -tree-structured sentence interpretation into the linear sequential structure of -a shift-reduce parser. Our model supports batched computation for a speedup of -up to 25 times over other tree-structured models, and its integrated parser can -operate on unparsed data with little loss in accuracy. We evaluate it on the -Stanford NLI entailment task and show that it significantly outperforms other -sentence-encoding models. -" -2587,1603.06042,"Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro - Presta, Kuzman Ganchev, Slav Petrov and Michael Collins",Globally Normalized Transition-Based Neural Networks,cs.CL cs.LG cs.NE," We introduce a globally normalized transition-based neural network model that -achieves state-of-the-art part-of-speech tagging, dependency parsing and -sentence compression results. Our model is a simple feed-forward neural network -that operates on a task-specific transition system, yet achieves comparable or -better accuracies than recurrent models. We discuss the importance of global as -opposed to local normalization: a key insight is that the label bias problem -implies that globally normalized models can be strictly more expressive than -locally normalized models. -" -2588,1603.06059,"Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, - Xiaodong He, Lucy Vanderwende",Generating Natural Questions About an Image,cs.CL cs.AI cs.CV," There has been an explosion of work in the vision & language community during -the past few years from image captioning to video transcription, and answering -questions about images. 
These tasks have focused on literal descriptions of the
-image. To move beyond the literal, we choose to explore how questions about an
-image are often directed at commonsense inference and the abstract events
-evoked by objects in the image. In this paper, we introduce the novel task of
-Visual Question Generation (VQG), where the system is tasked with asking a
-natural and engaging question when shown an image. We provide three datasets
-which cover a variety of images from object-centric to event-centric, with
-considerably more abstract training data than provided to state-of-the-art
-captioning systems thus far. We train and test several generative and retrieval
-models to tackle the task of VQG. Evaluation results show that while such
-models ask reasonable questions for a variety of images, there is still a wide
-gap with human performance which motivates further work on connecting images
-with commonsense knowledge and pragmatics. Our proposed task offers a new
-challenge to the community which we hope furthers interest in exploring deeper
-connections between vision & language.
-"
-2589,1603.06067,Kazuma Hashimoto and Yoshimasa Tsuruoka,"Adaptive Joint Learning of Compositional and Non-Compositional Phrase
-  Embeddings",cs.CL," We present a novel method for jointly learning compositional and
-non-compositional phrase embeddings by adaptively weighting both types of
-embeddings using a compositionality scoring function. The scoring function is
-used to quantify the level of compositionality of each phrase, and the
-parameters of the function are jointly optimized with the objective for
-learning phrase embeddings. In experiments, we apply the adaptive joint
-learning method to the task of learning embeddings of transitive verb phrases,
-and show that the compositionality scores have strong correlation with human
-ratings for verb-object compositionality, substantially outperforming the
-previous state of the art. Moreover, our embeddings improve upon the previous
-best model on a transitive verb disambiguation task. We also show that a simple
-ensemble technique further improves the results for both tasks.
-"
-2590,1603.06075,"Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka",Tree-to-Sequence Attentional Neural Machine Translation,cs.CL," Most of the existing Neural Machine Translation (NMT) models focus on the
-conversion of sequential data and do not directly use syntactic information. We
-propose a novel end-to-end syntactic NMT model, extending a
-sequence-to-sequence model with the source-side phrase structure. Our model has
-an attention mechanism that enables the decoder to generate a translated word
-while softly aligning it with phrases as well as words of the source sentence.
-Experimental results on the WAT'15 English-to-Japanese dataset demonstrate that
-our proposed model considerably outperforms sequence-to-sequence attentional
-NMT models and compares favorably with the state-of-the-art tree-to-string SMT
-system.
-"
-2591,1603.06076,"Vered Shwartz, Yoav Goldberg and Ido Dagan","Improving Hypernymy Detection with an Integrated Path-based and
-  Distributional Method",cs.CL," Detecting hypernymy relations is a key task in NLP, which is addressed in the
-literature using two complementary approaches: distributional methods, whose
-supervised variants are the current best performers, and path-based methods,
-which have received less research attention. 
We suggest an improved path-based
-algorithm, in which the dependency paths are encoded using a recurrent neural
-network, that achieves results comparable to distributional methods. We then
-extend the approach to integrate both path-based and distributional signals,
-significantly improving upon the state-of-the-art on this task.
-"
-2592,1603.06111,"Lili Mou, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, Zhi Jin",How Transferable are Neural Networks in NLP Applications?,cs.CL cs.LG cs.NE," Transfer learning aims to make use of valuable knowledge in a source
-domain to help model performance in a target domain. It is particularly
-important to neural networks, which are prone to overfitting. In some
-fields like image processing, many studies have shown the effectiveness of
-neural network-based transfer learning. For neural NLP, however, existing
-studies have only casually applied transfer learning, and conclusions are
-inconsistent. In this paper, we conduct systematic case studies and provide an
-illuminating picture on the transferability of neural networks in NLP.
-"
-2593,1603.06127,"Petr Baudi\v{s}, Jan Pichl, Tom\'a\v{s} Vysko\v{c}il, Jan \v{S}ediv\'y",Sentence Pair Scoring: Towards Unified Framework for Text Comprehension,cs.CL cs.AI cs.LG cs.NE," We review the task of Sentence Pair Scoring, popular in the literature in
-various forms - viewed as Answer Sentence Selection, Semantic Text Scoring,
-Next Utterance Ranking, Recognizing Textual Entailment, Paraphrasing or e.g. a
-component of Memory Networks.
-  We argue that all such tasks are similar from the model perspective and
-propose new baselines by comparing the performance of common IR metrics and
-popular convolutional, recurrent and attention-based neural models across many
-Sentence Pair Scoring tasks and datasets. We discuss the problem of evaluating
-randomized models, propose a statistically grounded methodology, and attempt to
-improve comparisons by releasing new datasets that are much harder than some of
-the currently used, well-explored benchmarks. We introduce a unified open source
-software framework with easily pluggable models and tasks, which enables us to
-experiment with multi-task reusability of a trained sentence model. We set a new
-state of the art in performance on the Ubuntu Dialogue dataset.
-"
-2594,1603.06147,"Junyoung Chung, Kyunghyun Cho and Yoshua Bengio","A Character-Level Decoder without Explicit Segmentation for Neural
-  Machine Translation",cs.CL cs.LG," The existing machine translation systems, whether phrase-based or neural,
-have relied almost exclusively on word-level modelling with explicit
-segmentation. In this paper, we ask a fundamental question: can neural machine
-translation generate a character sequence without any explicit segmentation? To
-answer this question, we evaluate an attention-based encoder-decoder with a
-subword-level encoder and a character-level decoder on four language
-pairs--En-Cs, En-De, En-Ru and En-Fi--
-using the parallel corpora from WMT'15.
-Our experiments show that the models with a character-level decoder outperform
-the ones with a subword-level decoder on all of the four language pairs.
-Furthermore, the ensembles of neural models with a character-level decoder
-outperform the state-of-the-art non-neural machine translation systems on
-En-Cs, En-De and En-Fi and perform comparably on En-Ru.
-"
-2595,1603.06155,"Jiwei Li, Michel Galley, Chris Brockett, Georgios P. 
Spithourakis, - Jianfeng Gao and Bill Dolan",A Persona-Based Neural Conversation Model,cs.CL," We present persona-based models for handling the issue of speaker consistency -in neural response generation. A speaker model encodes personas in distributed -embeddings that capture individual characteristics such as background -information and speaking style. A dyadic speaker-addressee model captures -properties of interactions between two interlocutors. Our models yield -qualitative performance improvements in both perplexity and BLEU scores over -baseline sequence-to-sequence models, with similar gains in speaker consistency -as measured by human judges. -" -2596,1603.06270,Zhilin Yang and Ruslan Salakhutdinov and William Cohen,Multi-Task Cross-Lingual Sequence Tagging from Scratch,cs.CL cs.LG," We present a deep hierarchical recurrent neural network for sequence tagging. -Given a sequence of words, our model employs deep gated recurrent units on both -character and word levels to encode morphology and context information, and -applies a conditional random field layer to predict the tags. Our model is task -independent, language independent, and feature engineering free. We further -extend our model to multi-task and cross-lingual joint training by sharing the -architecture and parameters. Our model achieves state-of-the-art results in -multiple languages on several benchmark tasks including POS tagging, chunking, -and NER. We also demonstrate that multi-task and cross-lingual joint training -can improve the performance in various cases. -" -2597,1603.06318,"Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, Eric Xing",Harnessing Deep Neural Networks with Logic Rules,cs.LG cs.AI cs.CL stat.ML," Combining deep neural networks with structured logic rules is desirable to -harness flexibility and reduce uninterpretability of the neural models. We -propose a general framework capable of enhancing various types of neural -networks (e.g., CNNs and RNNs) with declarative first-order logic rules. -Specifically, we develop an iterative distillation method that transfers the -structured information of logic rules into the weights of neural networks. We -deploy the framework on a CNN for sentiment analysis, and an RNN for named -entity recognition. With a few highly intuitive rules, we obtain substantial -improvements and achieve state-of-the-art or comparable results to previous -best-performing systems. -" -2598,1603.06393,"Jiatao Gu, Zhengdong Lu, Hang Li and Victor O.K. Li",Incorporating Copying Mechanism in Sequence-to-Sequence Learning,cs.CL cs.AI cs.LG cs.NE," We address an important problem in sequence-to-sequence (Seq2Seq) learning -referred to as copying, in which certain segments in the input sequence are -selectively replicated in the output sequence. A similar phenomenon is -observable in human language communication. For example, humans tend to repeat -entity names or even long phrases in conversation. The challenge with regard to -copying in Seq2Seq is that new machinery is needed to decide when to perform -the operation. In this paper, we incorporate copying into neural network-based -Seq2Seq learning and propose a new model called CopyNet with encoder-decoder -structure. CopyNet can nicely integrate the regular way of word generation in -the decoder with the new copying mechanism which can choose sub-sequences in -the input sequence and put them at proper places in the output sequence. 
Our
-empirical study on both synthetic data sets and real world data sets
-demonstrates the efficacy of CopyNet. For example, CopyNet can outperform
-regular RNN-based models by remarkable margins on text summarization tasks.
-"
-2599,1603.06485,Lisa Posch and Philipp Schaer and Arnim Bleier and Markus Strohmaier,"A System for Probabilistic Linking of Thesauri and Classification
-  Systems",cs.AI cs.CL cs.DL," This paper presents a system which creates and visualizes probabilistic
-semantic links between concepts in a thesaurus and classes in a classification
-system. For creating the links, we build on the Polylingual Labeled Topic Model
-(PLL-TM). PLL-TM identifies probable thesaurus descriptors for each class in
-the classification system by using information from the natural language text
-of documents, their assigned thesaurus descriptors and their designated
-classes. The links are then presented to users of the system in an interactive
-visualization, providing them with an automatically generated overview of the
-relations between the thesaurus and the classification system.
-"
-2600,1603.06503,Bernd Bohnet and Miguel Ballesteros and Ryan McDonald and Joakim Nivre,Static and Dynamic Feature Selection in Morphosyntactic Analyzers,cs.CL," We study the use of greedy feature selection methods for morphosyntactic
-tagging under a number of different conditions. We compare a static ordering of
-features to a dynamic ordering based on mutual information statistics, and we
-apply the techniques to standalone taggers as well as joint systems for tagging
-and parsing. Experiments on five languages show that feature selection can
-result in more compact models as well as higher accuracy under all conditions,
-but also that a dynamic ordering works better than a static ordering and that
-joint systems benefit more than standalone taggers. We also show that the same
-techniques can be used to select which morphosyntactic categories to predict in
-order to maximize syntactic accuracy in a joint system. Our final results
-represent a substantial improvement of the state of the art for several
-languages, while at the same time reducing both the number of features and the
-running time by up to 80% in some cases.
-"
-2601,1603.06571,Oren Barkan,Bayesian Neural Word Embedding,cs.CL cs.LG," Recently, several works in the domain of natural language processing
-presented successful methods for word embedding. Among them, the Skip-Gram with
-negative sampling, also known as word2vec, advanced the state-of-the-art of
-various linguistic tasks. In this paper, we propose a scalable Bayesian neural
-word embedding algorithm. The algorithm relies on a Variational Bayes solution
-for the Skip-Gram objective, and a detailed step-by-step description is
-provided. We present experimental results that demonstrate the performance of
-the proposed algorithm for word analogy and similarity tasks on six different
-datasets and show it is competitive with the original Skip-Gram method.
-"
-2602,1603.06598,"Yuan Zhang, David Weiss",Stack-propagation: Improved Representation Learning for Syntax,cs.CL," Traditional syntax models typically leverage part-of-speech (POS) information
-by constructing features from hand-tuned templates. We demonstrate that a
-better approach is to utilize POS tags as a regularizer of learned
-representations. We propose a simple method for learning a stacked pipeline of
-models which we call ""stack-propagation"". 
We apply this to dependency parsing
-and tagging, where we use the hidden layer of the tagger network as a
-representation of the input tokens for the parser. At test time, our parser
-does not require predicted POS tags. On 19 languages from the Universal
-Dependencies, our method is 1.3% (absolute) more accurate than a
-state-of-the-art graph-based approach and 2.7% more accurate than the most
-comparable greedy model.
-"
-2603,1603.06677,Percy Liang,Learning Executable Semantic Parsers for Natural Language Understanding,cs.CL cs.AI," For building question answering systems and natural language interfaces,
-semantic parsing has emerged as an important and powerful paradigm. Semantic
-parsers map natural language into logical forms, the classic representation for
-many important linguistic phenomena. The modern twist is that we are interested
-in learning semantic parsers from data, which introduces a new layer of
-statistical and computational issues. This article lays out the components of a
-statistical semantic parser, highlighting the key challenges. We will see that
-semantic parsing is a rich fusion of the logical and the statistical world, and
-that this fusion will play an integral role in the future of natural language
-understanding systems.
-"
-2604,1603.06679,"Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier and Xiaokui Xiao","Recursive Neural Conditional Random Fields for Aspect-based Sentiment
-  Analysis",cs.CL cs.IR cs.LG," In aspect-based sentiment analysis, extracting aspect terms along with the
-opinions being expressed from user-generated content is one of the most
-important subtasks. Previous studies have shown that exploiting connections
-between aspect and opinion terms is promising for this task. In this paper, we
-propose a novel joint model that integrates recursive neural networks and
-conditional random fields into a unified framework for explicit aspect and
-opinion terms co-extraction. The proposed model learns high-level
-discriminative features and doubly propagates information between aspect and
-opinion terms simultaneously. Moreover, the model is flexible enough to
-incorporate hand-crafted features to further boost its information
-extraction performance. Experimental results on the SemEval Challenge 2014
-dataset show the superiority of our proposed model over several baseline
-methods as well as the winning systems of the challenge.
-"
-2605,1603.06744,"Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tom\'a\v{s}
-  Ko\v{c}isk\'y, Andrew Senior, Fumin Wang, Phil Blunsom",Latent Predictor Networks for Code Generation,cs.CL cs.NE," Many language generation tasks require the production of text conditioned on
-both structured and unstructured inputs. We present a novel neural network
-architecture which generates an output sequence conditioned on an arbitrary
-number of input functions. Crucially, our approach allows both the choice of
-conditioning context and the granularity of generation, for example characters
-or tokens, to be marginalised, thus permitting scalable and effective training.
-Using this framework, we address the problem of generating programming code
-from a mixed natural language and structured specification. We create two new
-data sets for this paradigm derived from the collectible trading card games
-Magic the Gathering and Hearthstone. On these, and a third preexisting corpus,
-we demonstrate that marginalising multiple predictors allows our model to
-outperform strong benchmarks. 
-" -2606,1603.06785,"Krzysztof Wo{\l}k, Emilia Rejmund, Krzysztof Marasek","Multi-domain machine translation enhancements by parallel data - extraction from comparable corpora",cs.CL stat.ML," Parallel texts are a relatively rare language resource, however, they -constitute a very useful research material with a wide range of applications. -This study presents and analyses new methodologies we developed for obtaining -such data from previously built comparable corpora. The methodologies are -automatic and unsupervised which makes them good for large scale research. The -task is highly practical as non-parallel multilingual data occur much more -frequently than parallel corpora and accessing them is easy, although parallel -sentences are a considerably more useful resource. In this study, we propose a -method of automatic web crawling in order to build topic-aligned comparable -corpora, e.g. based on the Wikipedia or Euronews.com. We also developed new -methods of obtaining parallel sentences from comparable data and proposed -methods of filtration of corpora capable of selecting inconsistent or only -partially equivalent translations. Our methods are easily scalable to other -languages. Evaluation of the quality of the created corpora was performed by -analysing the impact of their use on statistical machine translation systems. -Experiments were presented on the basis of the Polish-English language pair for -texts from different domains, i.e. lectures, phrasebooks, film dialogues, -European Parliament proceedings and texts contained medicines leaflets. We also -tested a second method of creating parallel corpora based on data from -comparable corpora which allows for automatically expanding the existing corpus -of sentences about a given domain on the basis of analogies found between them. -It does not require, therefore, having past parallel resources in order to -train a classifier. -" -2607,1603.06807,"Iulian Vlad Serban, Alberto Garc\'ia-Dur\'an, Caglar Gulcehre, Sungjin - Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio","Generating Factoid Questions With Recurrent Neural Networks: The 30M - Factoid Question-Answer Corpus",cs.CL cs.AI cs.LG cs.NE," Over the past decade, large-scale supervised learning corpora have enabled -machine learning researchers to make substantial advances. However, to this -date, there are no large-scale question-answer corpora available. In this paper -we present the 30M Factoid Question-Answer Corpus, an enormous question answer -pair corpus produced by applying a novel neural network architecture on the -knowledge base Freebase to transduce facts into natural language questions. The -produced question answer pairs are evaluated both by human evaluators and using -automatic evaluation metrics, including well-established machine translation -and sentence similarity metrics. Across all evaluation criteria the -question-generation model outperforms the competing template-based baseline. -Furthermore, when presented to human evaluators, the generated questions appear -comparable in quality to real human-generated questions. -" -2608,1603.07012,"Dayu Yuan and Julian Richardson and Ryan Doherty and Colin Evans and - Eric Altendorf",Semi-supervised Word Sense Disambiguation with Neural Models,cs.CL," Determining the intended sense of words in text - word sense disambiguation -(WSD) - is a long standing problem in natural language processing. 
Recently,
-researchers have shown promising results using word vectors extracted from a
-neural network language model as features in WSD algorithms. However, a simple
-average or concatenation of word vectors for each word in a text loses the
-sequential and syntactic information of the text. In this paper, we study WSD
-with a sequence learning neural net, LSTM, to better capture the sequential and
-syntactic patterns of the text. To alleviate the lack of training data in
-all-words WSD, we employ the same LSTM in a semi-supervised label propagation
-classifier. We demonstrate state-of-the-art results, especially on verbs.
-"
-2609,1603.07044,"Wei-Ning Hsu, Yu Zhang and James Glass","Recurrent Neural Network Encoder with Attention for Community Question
-  Answering",cs.CL cs.LG cs.NE," We apply a general recurrent neural network (RNN) encoder framework to
-community question answering (cQA) tasks. Our approach does not rely on any
-linguistic processing, and can be applied to different languages or domains.
-Further improvements are observed when we extend the RNN encoders with a neural
-attention mechanism that encourages reasoning over entire sequences. To deal
-with practical issues such as data sparsity and imbalanced labels, we apply
-various techniques such as transfer learning and multitask learning. Our
-experiments on the SemEval-2016 cQA task show 10% improvement on a MAP score
-compared to an information retrieval-based approach, and achieve comparable
-performance to a strong handcrafted feature-based method.
-"
-2610,1603.07150,"Martyn Harris, Mark Levene, Dell Zhang, Dan Levene",The Anatomy of a Search and Mining System for Digital Archives,cs.DL cs.CL cs.IR," Samtla (Search And Mining Tools with Linguistic Analysis) is a digital
-humanities system designed in collaboration with historians and linguists to
-assist them with their research work in quantifying the content of any textual
-corpora through approximate phrase search and document comparison. The
-retrieval engine uses a character-based n-gram language model rather than the
-conventional word-based one so as to achieve great flexibility in
-language-agnostic query processing.
-  The index is implemented as a space-optimised character-based suffix tree
-with an accompanying database of document content and metadata. A number of
-text mining tools are integrated into the system to allow researchers to
-discover textual patterns, perform comparative analysis, and find out what is
-currently popular in the research community.
-  Herein we describe the system architecture, user interface, models and
-algorithms, and data storage of the Samtla system. We also present several case
-studies of its usage in practice together with an evaluation of the system's
-ranking performance through crowdsourcing.
-"
-2611,1603.07185,Rajesh Bordawekar and Oded Shmueli,"Enabling Cognitive Intelligence Queries in Relational Databases using
-  Low-dimensional Word Embeddings",cs.CL cs.DB," We apply distributed language embedding methods from Natural Language
-Processing to assign a vector to each database entity associated token (for
-example, a token may be a word occurring in a table row, or the name of a
-column). These vectors, of typical dimension 200, capture the meaning of tokens
-based on the contexts in which the tokens appear together. To form vectors, we
-apply a learning method to a token sequence derived from the database. We
-describe various techniques for extracting token sequences from a database. 
The -techniques differ in complexity, in the token sequences they output and in the -database information used (e.g., foreign keys). The vectors can be used to -algebraically quantify semantic relationships between the tokens such as -similarities and analogies. Vectors enable a dual view of the data: relational -and (meaningful rather than purely syntactical) text. We introduce and explore -a new class of queries called cognitive intelligence (CI) queries that extract -information from the database based, in part, on the relationships encoded by -vectors. We have implemented a prototype system on top of Spark to exhibit the -power of CI queries. Here, CI queries are realized via SQL UDFs. This power -goes far beyond text extensions to relational systems due to the information -encoded in vectors. We also consider various extensions to the basic scheme, -including using a collection of views derived from the database to focus on a -domain of interest, utilizing vectors and/or text from external sources, -maintaining vectors as the database evolves and exploring a database without -utilizing its schema. For the latter, we consider minimal extensions to SQL to -vastly improve query expressiveness. -" -2612,1603.07252,"Jianpeng Cheng, Mirella Lapata",Neural Summarization by Extracting Sentences and Words,cs.CL," Traditional approaches to extractive summarization rely heavily on -human-engineered features. In this work we propose a data-driven approach based -on neural networks and continuous sentence features. We develop a general -framework for single-document summarization composed of a hierarchical document -encoder and an attention-based extractor. This architecture allows us to -develop different classes of summarization models which can extract sentences -or words. We train our models on large scale corpora containing hundreds of -thousands of document-summary pairs. Experimental results on two summarization -datasets demonstrate that our models obtain results comparable to the state of -the art without any access to linguistic annotation. -" -2613,1603.07253,"Kimberly Glasgow, Matthew Roos, Amy Haufler, Mark Chevillet, Michael - Wolmetz",Evaluating semantic models with word-sentence relatedness,cs.CL," Semantic textual similarity (STS) systems are designed to encode and evaluate -the semantic similarity between words, phrases, sentences, and documents. One -method for assessing the quality or authenticity of semantic information -encoded in these systems is by comparison with human judgments. A data set for -evaluating semantic models was developed consisting of 775 English -word-sentence pairs, each annotated for semantic relatedness by human raters -engaged in a Maximum Difference Scaling (MDS) task, as well as a faster -alternative task. As a sample application of this relatedness data, -behavior-based relatedness was compared to the relatedness computed via four -off-the-shelf STS models: n-gram, Latent Semantic Analysis (LSA), Word2Vec, and -UMBC Ebiquity. Some STS models captured much of the variance in the human -judgments collected, but they were not sensitive to the implicatures and -entailments that were processed and considered by the participants. All text -stimuli and judgment data have been made freely available. 
-" -2614,1603.07313,"Piedad Garrido, Jesus Tramullas, Manuel Coll","CONDITOR1: Topic Maps and DITA labelling tool for textual documents with - historical information",cs.DL cs.CL cs.IR," Conditor is a software tool which works with textual documents containing -historical information. The purpose of this work two-fold: firstly to show the -validity of the developed engine to correctly identify and label the entities -of the universe of discourse with a labelled-combined XTM-DITA model. Secondly -to explain the improvements achieved in the information retrieval process -thanks to the use of a object-oriented database (JPOX) as well as its -integration into the Lucene-type database search process to not only accomplish -more accurate searches, but to also help the future development of a -recommender system. We finish with a brief demo in a 3D-graph of the results of -the aforementioned search. -" -2615,1603.07603,"Fei Sun, Jiafeng Guo, Yanyan Lan, Jun Xu, and Xueqi Cheng",Semantic Regularities in Document Representations,cs.CL," Recent work exhibited that distributed word representations are good at -capturing linguistic regularities in language. This allows vector-oriented -reasoning based on simple linear algebra between words. Since many different -methods have been proposed for learning document representations, it is natural -to ask whether there is also linear structure in these learned representations -to allow similar reasoning at document level. To answer this question, we -design a new document analogy task for testing the semantic regularities in -document representations, and conduct empirical evaluations over several -state-of-the-art document representation models. The results reveal that neural -embedding based document representations work better on this analogy task than -conventional methods, and we provide some preliminary explanations over these -observations. -" -2616,1603.07609,"Yevgeni Berzak, Roi Reichart and Boris Katz","Contrastive Analysis with Predictive Power: Typology Driven Estimation - of Grammatical Error Distributions in ESL",cs.CL," This work examines the impact of cross-linguistic transfer on grammatical -errors in English as Second Language (ESL) texts. Using a computational -framework that formalizes the theory of Contrastive Analysis (CA), we -demonstrate that language specific error distributions in ESL writing can be -predicted from the typological properties of the native language and their -relation to the typology of English. Our typology driven model enables to -obtain accurate estimates of such distributions without access to any ESL data -for the target languages. Furthermore, we present a strategy for adjusting our -method to low-resource languages that lack typological documentation using a -bootstrapping approach which approximates native language typology from ESL -texts. Finally, we show that our framework is instrumental for linguistic -inquiry seeking to identify first language factors that contribute to a wide -range of difficulties in second language acquisition. -" -2617,1603.07624,"Eun Hee Ko, Diego Klabjan",Semantic Properties of Customer Sentiment in Tweets,cs.CL cs.IR cs.SI stat.ML," An increasing number of people are using online social networking services -(SNSs), and a significant amount of information related to experiences in -consumption is shared in this new media form. Text mining is an emerging -technique for mining useful information from the web. 
We aim at discovering semantic patterns in consumers'
-discussions on social media, in particular in tweets. Specifically, the
-purposes of this study are twofold: 1) finding similarity and dissimilarity
-between two sets of textual documents that include consumers' sentiment
-polarities, in the two forms of positive vs. negative opinions, and 2) deriving
-actual content from the textual data that has a semantic trend. The considered
-tweets include consumers' opinions on US retail companies (e.g., Amazon,
-Walmart). Cosine similarity and K-means clustering methods are used to achieve
-the former goal, and Latent Dirichlet Allocation (LDA), a popular topic
-modeling algorithm, is used for the latter purpose. This is the first study to
-discover semantic properties of textual data in a consumption context beyond
-sentiment analysis. In addition to the major findings, we apply LDA to the
-same data and draw latent topics that represent consumers' positive and
-negative opinions on social media.
-"
-2618,1603.07646,Saurabh Kataria,Recursive Neural Language Architecture for Tag Prediction,cs.IR cs.CL cs.LG cs.NE," We consider the problem of learning distributed representations for tags from
-their associated content for the task of tag recommendation. Considering that
-tagging information is usually very sparse, effective learning from content and
-tag associations is a crucial and challenging task. Recently, various neural
-representation learning models such as WSABIE and its variants have shown
-promising performance, mainly due to compact feature representations learned in
-a semantic space. However, their capacity is limited by a linear compositional
-approach that represents tags as sums of equal parts, which hurts their
-performance. In this work, we propose a neural feedback relevance model for
-learning tag representations with weighted feature representations. Our
-experiments on two widely used datasets show significant improvements in the
-quality of recommendations over various baselines.
-"
-2619,1603.07695,"Quan Liu, Zhen-Hua Ling, Hui Jiang, Yu Hu",Part-of-Speech Relevance Weights for Learning Word Embeddings,cs.CL," This paper proposes a model to learn word embeddings with weighted contexts
-based on part-of-speech (POS) relevance weights. POS is a fundamental element
-in natural language. However, state-of-the-art word embedding models fail to
-consider it. This paper proposes to use position-dependent POS relevance
-weighting matrices to model the inherent syntactic relationship among words
-within a context window. We utilize the POS relevance weights to model each
-word-context pair during the word embedding training process. The model
-proposed in this paper jointly optimizes word vectors and the POS relevance
-matrices. Experiments conducted on popular word analogy and word similarity
-tasks all demonstrate the effectiveness of the proposed method.
-"
-2620,1603.07771,"Remi Lebret, David Grangier, Michael Auli","Neural Text Generation from Structured Data with Application to the
-  Biography Domain",cs.CL," This paper introduces a neural model for concept-to-text generation that
-scales to large, rich domains. We experiment with a new dataset of biographies
-from Wikipedia that is an order of magnitude larger than existing resources
-with over 700k samples. The dataset is also vastly more diverse with a 400k
-vocabulary, compared to a few hundred words for Weathergov or Robocup.
Our
-model builds upon recent work on conditional neural language models for text
-generation. To deal with the large vocabulary, we extend these models to mix a
-fixed vocabulary with copy actions that transfer sample-specific words from the
-input database to the generated output sentence. Our neural model significantly
-outperforms a classical Kneser-Ney language model adapted to this task by
-nearly 15 BLEU.
-"
-2621,1603.07954,"Karthik Narasimhan, Adam Yala and Regina Barzilay","Improving Information Extraction by Acquiring External Evidence with
-  Reinforcement Learning",cs.CL," Most successful information extraction systems operate with access to a large
-collection of documents. In this work, we explore the task of acquiring and
-incorporating external evidence to improve extraction accuracy in domains where
-the amount of training data is scarce. This process entails issuing search
-queries, extraction from new sources and reconciliation of extracted values,
-which are repeated until sufficient evidence is collected. We approach the
-problem using a reinforcement learning framework where our model learns to
-select optimal actions based on contextual information. We employ a deep
-Q-network, trained to optimize a reward function that reflects extraction
-accuracy while penalizing extra effort. Our experiments on two databases -- of
-shooting incidents, and food adulteration cases -- demonstrate that our system
-significantly outperforms traditional extractors and a competitive
-meta-classifier baseline.
-"
-2622,1603.08016,"Reed Coke, Ben King, Dragomir Radev",Classifying Syntactic Regularities for Hundreds of Languages,cs.CL," This paper presents a comparison of classification methods for linguistic
-typology for the purpose of expanding an extensive, but sparse language
-resource: the World Atlas of Language Structures (WALS) (Dryer and Haspelmath,
-2013). We experimented with a variety of regression and nearest-neighbor
-methods for use in classification over a set of 325 languages and six syntactic
-rules drawn from WALS. To classify each rule, we consider the typological
-features of the other five rules; linguistic features extracted from a
-word-aligned Bible in each language; and genealogical features (genus and
-family) of each language. In general, we find that propagating the majority
-label among all languages of the same genus achieves the best accuracy in label
-prediction. Following this, a logistic regression model that combines
-typological and linguistic features offers the next best performance.
-Interestingly, this model actually outperforms the majority labels among all
-languages of the same family.
-"
-2623,1603.08023,"Chia-Wei Liu, Ryan Lowe, Iulian V. Serban, Michael Noseworthy, Laurent
-  Charlin, Joelle Pineau","How NOT To Evaluate Your Dialogue System: An Empirical Study of
-  Unsupervised Evaluation Metrics for Dialogue Response Generation",cs.CL cs.AI cs.LG cs.NE," We investigate evaluation metrics for dialogue response generation systems
-where supervised labels, such as task completion, are not available. Recent
-works in response generation have adopted metrics from machine translation to
-compare a model's generated response to a single target response. We show that
-these metrics correlate very weakly with human judgements in the non-technical
-Twitter domain, and not at all in the technical Ubuntu domain.
We provide
-quantitative and qualitative results highlighting specific weaknesses in
-existing metrics, and provide recommendations for future development of better
-automatic evaluation metrics for dialogue systems.
-"
-2624,1603.08042,"Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, Ian McGraw","On the Compression of Recurrent Neural Networks with an Application to
-  LVCSR acoustic modeling for Embedded Speech Recognition",cs.CL cs.LG cs.NE," We study the problem of compressing recurrent neural networks (RNNs). In
-particular, we focus on the compression of RNN acoustic models, which are
-motivated by the goal of building compact and accurate speech recognition
-systems which can be run efficiently on mobile devices. In this work, we
-present a technique for general recurrent model compression that jointly
-compresses both recurrent and non-recurrent inter-layer weight matrices. We
-find that the proposed technique allows us to reduce the size of our Long
-Short-Term Memory (LSTM) acoustic model to a third of its original size with
-negligible loss in accuracy.
-"
-2625,1603.08048,Michael Ruster,"""Did I Say Something Wrong?"" A Word-Level Analysis of Wikipedia Articles
-  for Deletion Discussions",cs.CL cs.SI stat.ML," This thesis focuses on gaining linguistic insights into textual discussions
-on a word level. It was of special interest to distinguish messages that
-constructively contribute to a discussion from those that are detrimental to
-it. Thereby, we wanted to determine whether ""I""- and ""You""-messages are
-indicators for either of the two discussion styles. These messages are nowadays
-often used in guidelines for successful communication. Although their effects
-have been successfully evaluated multiple times, a large-scale analysis has
-never been conducted.
- Thus, we used Wikipedia Articles for Deletion (short: AfD) discussions
-together with the records of blocked users and developed a fully automated
-creation of an annotated data set. In this data set, messages were labelled
-either constructive or disruptive. We applied binary classifiers to the data to
-determine characteristic words for both discussion styles. Thereby, we also
-investigated whether function words like pronouns and conjunctions play an
-important role in distinguishing the two.
- We found that ""You""-messages were a strong indicator of disruptive messages,
-which matches their attributed effects on communication. However, we found
-""I""-messages to be indicative of disruptive messages as well, which is contrary
-to their attributed effects. The importance of function words could neither be
-confirmed nor refuted. Other characteristic words for either communication
-style were not found. Yet, the results suggest that a different model might
-represent disruptive and constructive messages in textual discussions better.
-"
-2626,1603.08079,"Yevgeni Berzak and Andrei Barbu and Daniel Harari and Boris Katz and
-  Shimon Ullman",Do You See What I Mean? Visual Resolution of Linguistic Ambiguities,cs.CV cs.AI cs.CL," Understanding language goes hand in hand with the ability to integrate
-complex contextual information obtained via perception. In this work, we
-present a novel task for grounded language understanding: disambiguating a
-sentence given a visual scene which depicts one of the possible interpretations
-of that sentence.
To this end, we introduce a new multimodal corpus containing
-ambiguous sentences, representing a wide range of syntactic, semantic and
-discourse ambiguities, coupled with videos that visualize the different
-interpretations for each sentence. We address this task by extending a vision
-model which determines if a sentence is depicted by a video. We demonstrate how
-such a model can be adjusted to recognize different interpretations of the same
-underlying sentence, allowing us to disambiguate sentences in a unified fashion
-across the different ambiguity types.
-"
-2627,1603.08089,"Qingqing Zhou, Rui Xia, Chengzhi Zhang","Online shopping behavior study based on multi-granularity opinion
-  mining: China vs. America",cs.CY cs.CL cs.HC," With the development of e-commerce, many products are now being sold
-worldwide, and manufacturers are eager to obtain a better understanding of
-customer behavior in various regions. To achieve this goal, most previous
-efforts have focused mainly on questionnaires, which are time-consuming and
-costly. The tremendous volume of product reviews on e-commerce websites has
-seen a new trend emerge, whereby manufacturers attempt to understand user
-preferences by analyzing online reviews. Following this trend, this paper
-addresses the problem of studying customer behavior by exploiting recently
-developed opinion mining techniques. This work is novel for three reasons.
-First, questionnaire-based investigation is automatically enabled by employing
-algorithms for template-based question generation and opinion mining-based
-answer extraction. Using this system, manufacturers are able to obtain reports
-of customer behavior featuring a much larger sample size, more direct
-information, a higher degree of automation, and a lower cost. Second,
-international customer behavior study is made easier by integrating tools for
-multilingual opinion mining. Third, this is the first time an automatic
-questionnaire investigation has been conducted to compare customer behavior in
-China and America, where product reviews are written and read in Chinese and
-English, respectively. Our study on digital cameras, smartphones, and tablet
-computers yields three findings. First, Chinese customers follow the Doctrine
-of the Mean, and often use euphemistic expressions, while American customers
-express their opinions more directly. Second, Chinese customers care more about
-general feelings, while American customers pay more attention to product
-details. Third, Chinese customers focus on external features, while American
-customers care more about the internal features of products.
-"
-2628,1603.08091,"Qingqing Zhou, Chengzhi Zhang, Star X. Zhao, Bikun Chen","Measuring Book Impact Based on the Multi-granularity Online Review
-  Mining",cs.DL cs.CL," As with articles and journals, the customary methods for measuring books'
-academic impact mainly involve citations, which are easy but limited to
-interrogating traditional citation databases and scholarly book reviews.
-Researchers have attempted to use other metrics, such as Google Books,
-libcitation, and publisher prestige. However, these approaches lack
-content-level information and cannot determine the citation intentions of
-users. Meanwhile, the abundant online review resources concerning academic
-books can be used to mine deeper information and content utilizing altmetric
-perspectives.
In this study, we measure the impacts of academic books by
-multi-granularity mining of online reviews, and we identify factors that
-affect a book's impact. First, online reviews of a sample of academic books on
-Amazon.cn are crawled and processed. Then, multi-granularity review mining is
-conducted to identify review sentiment polarities and aspects' sentiment
-values. Lastly, the numbers of positive reviews and negative reviews, aspect
-sentiment values, star values, and information regarding helpfulness are
-integrated via the entropy method, and lead to the calculation of the final
-book impact scores. The results of a correlation analysis of book impact
-scores obtained via our method versus traditional book citations show that,
-although there are substantial differences between subject areas, online book
-reviews tend to reflect the academic impact. Thus, we infer that online
-reviews represent a promising source for mining book impact within the
-altmetric perspective and at the multi-granularity content level. Moreover,
-our proposed method might also be a means by which to measure other books
-besides academic publications.
-"
-2629,1603.08148,"Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou and Yoshua
-  Bengio",Pointing the Unknown Words,cs.CL cs.LG cs.NE," The problem of rare and unknown words is an important issue that can
-potentially influence the performance of many NLP systems, including both the
-traditional count-based and the deep learning models. We propose a novel way to
-deal with the rare and unseen words for the neural network models using
-attention. Our model uses two softmax layers in order to predict the next word
-in conditional language models: one predicts the location of a word in the
-source sentence, and the other predicts a word in the shortlist vocabulary. At
-each time-step, the decision of which softmax layer to use is made adaptively
-by an MLP conditioned on the context. We motivate our work with the
-psychological evidence that humans naturally have a tendency to point towards
-objects in the context or the environment when the name of an object is not
-known. We observe improvements on two tasks, neural machine translation on the
-Europarl English-to-French parallel corpora and text summarization on the
-Gigaword dataset, using our proposed model.
-"
-2630,1603.08321,"Linlin Chao, Jianhua Tao, Minghao Yang, Ya Li and Zhengqi Wen","Audio Visual Emotion Recognition with Temporal Alignment and Perception
-  Attention",cs.CV cs.CL cs.LG," This paper focuses on two key problems for audio-visual emotion recognition
-in video. One is the temporal alignment of the audio and visual streams for
-feature-level fusion. The other is locating and re-weighting the perception
-attentions in the whole audio-visual stream for better recognition. The Long
-Short Term Memory Recurrent Neural Network (LSTM-RNN) is employed as the main
-classification architecture. Firstly, a soft attention mechanism aligns the
-audio and visual streams. Secondly, seven emotion embedding vectors, each
-corresponding to one classification emotion type, are added to locate the
-perception attentions. The locating and re-weighting process is also based on
-the soft attention mechanism. The experimental results on the EmotiW2015
-dataset and the qualitative analysis show the efficiency of the two proposed
-techniques.
-" -2631,1603.08458,"Shaodian Zhang, Edouard Grave, Elizabeth Sklar, Noemie Elhadad","Longitudinal Analysis of Discussion Topics in an Online Breast Cancer - Community using Convolutional Neural Networks",cs.CL cs.CY cs.SI," Identifying topics of discussions in online health communities (OHC) is -critical to various applications, but can be difficult because topics of OHC -content are usually heterogeneous and domain-dependent. In this paper, we -provide a multi-class schema, an annotated dataset, and supervised classifiers -based on convolutional neural network (CNN) and other models for the task of -classifying discussion topics. We apply the CNN classifier to the most popular -breast cancer online community, and carry out a longitudinal analysis to show -topic distributions and topic changes throughout members' participation. Our -experimental results suggest that CNN outperforms other classifiers in the task -of topic classification, and that certain trajectories can be detected with -respect to topic changes. -" -2632,1603.08474,"Oswaldo Ludwig, Xiao Liu, Parisa Kordjamshidi, Marie-Francine Moens",Deep Embedding for Spatial Role Labeling,cs.CL cs.CV cs.LG cs.NE," This paper introduces the visually informed embedding of word (VIEW), a -continuous vector representation for a word extracted from a deep neural model -trained using the Microsoft COCO data set to forecast the spatial arrangements -between visual objects, given a textual description. The model is composed of a -deep multilayer perceptron (MLP) stacked on the top of a Long Short Term Memory -(LSTM) network, the latter being preceded by an embedding layer. The VIEW is -applied to transferring multimodal background knowledge to Spatial Role -Labeling (SpRL) algorithms, which recognize spatial relations between objects -mentioned in the text. This work also contributes with a new method to select -complementary features and a fine-tuning method for MLP that improves the $F1$ -measure in classifying the words into spatial roles. The VIEW is evaluated with -the Task 3 of SemEval-2013 benchmark data set, SpaceEval. -" -2633,1603.08507,"Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, - Bernt Schiele, Trevor Darrell",Generating Visual Explanations,cs.CV cs.AI cs.CL," Clearly explaining a rationale for a classification decision to an end-user -can be as important as the decision itself. Existing approaches for deep visual -recognition are generally opaque and do not output any justification text; -contemporary vision-language models can describe image content but fail to take -into account class-discriminative image aspects which justify visual -predictions. We propose a new model that focuses on the discriminating -properties of the visible object, jointly predicts a class label, and explains -why the predicted label is appropriate for the image. We propose a novel loss -function based on sampling and reinforcement learning that learns to generate -sentences that realize a global sentence property, such as class specificity. -Our results on a fine-grained bird species classification dataset show that our -model is able to generate explanations which are not only consistent with an -image but also more discriminative than descriptions produced by existing -captioning methods. 
-" -2634,1603.08594,"Geetanjali Rakshit, Sagar Sontakke, Pushpak Bhattacharyya, Gholamreza - Haffari","Prepositional Attachment Disambiguation Using Bilingual Parsing and - Alignments",cs.CL," In this paper, we attempt to solve the problem of Prepositional Phrase (PP) -attachments in English. The motivation for the work comes from NLP applications -like Machine Translation, for which, getting the correct attachment of -prepositions is very crucial. The idea is to correct the PP-attachments for a -sentence with the help of alignments from parallel data in another language. -The novelty of our work lies in the formulation of the problem into a dual -decomposition based algorithm that enforces agreement between the parse trees -from two languages as a constraint. Experiments were performed on the -English-Hindi language pair and the performance improved by 10% over the -baseline, where the baseline is the attachment predicted by the MSTParser model -trained for English. -" -2635,1603.08636,"Jiri Vinarek (Charles University in Prague, Faculty of Mathematics and - Physics, Department of Distributed and Dependable Systems), Petr Hnetynka - (Charles University in Prague, Faculty of Mathematics and Physics, Department - of Distributed and Dependable Systems)","Towards an Automated Requirements-driven Development of Smart - Cyber-Physical Systems",cs.SE cs.CL," The Invariant Refinement Method for Self Adaptation (IRM-SA) is a design -method targeting development of smart Cyber-Physical Systems (sCPS). It allows -for a systematic translation of the system requirements into the system -architecture expressed as an ensemble-based component system (EBCS). However, -since the requirements are captured using natural language, there exists the -danger of their misinterpretation due to natural language requirements' -ambiguity, which could eventually lead to design errors. Thus, automation and -validation of the design process is desirable. In this paper, we (i) analyze -the translation process of natural language requirements into the IRM-SA model, -(ii) identify individual steps that can be automated and/or validated using -natural language processing techniques, and (iii) propose suitable methods. -" -2636,1603.08701,"Enrico Santus, Tin-Shing Chiu, Qin Lu, Alessandro Lenci, Chu-Ren Huang","What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL - Datasets",cs.CL," In this paper, we claim that Vector Cosine, which is generally considered one -of the most efficient unsupervised measures for identifying word similarity in -Vector Space Models, can be outperformed by a completely unsupervised measure -that evaluates the extent of the intersection among the most associated -contexts of two target words, weighting such intersection according to the rank -of the shared contexts in the dependency ranked lists. This claim comes from -the hypothesis that similar words do not simply occur in similar contexts, but -they share a larger portion of their most relevant contexts compared to other -related words. To prove it, we describe and evaluate APSyn, a variant of -Average Precision that, independently of the adopted parameters, outperforms -the Vector Cosine and the co-occurrence on the ESL and TOEFL test sets. In the -best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy -in the TOEFL dataset, beating therefore the non-English US college applicants -(whose average, as reported in the literature, is 64.50%) and several -state-of-the-art approaches. 
-" -2637,1603.08702,"Enrico Santus, Alessandro Lenci, Tin-Shing Chiu, Qin Lu, Chu-Ren Huang",Nine Features in a Random Forest to Learn Taxonomical Semantic Relations,cs.CL," ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms -and random words that is derived from the already introduced ROOT13 (Santus et -al., 2016). It relies on a Random Forest algorithm and nine unsupervised -corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 -pairs, equally distributed among the three classes and involving several -Parts-Of-Speech (i.e. adjectives, nouns and verbs). When all the classes are -present, ROOT9 achieves an F1 score of 90.7%, against a baseline of 57.2% -(vector cosine). When the classification is binary, ROOT9 achieves the -following results against the baseline: hypernyms-co-hyponyms 95.7% vs. 69.8%, -hypernyms-random 91.8% vs. 64.1% and co-hyponyms-random 97.8% vs. 79.4%. In -order to compare the performance with the state-of-the-art, we have also -evaluated ROOT9 in subsets of the Weeds et al. (2014) datasets, proving that it -is in fact competitive. Finally, we investigated whether the system learns the -semantic relation or it simply learns the prototypical hypernyms, as claimed by -Levy et al. (2015). The second possibility seems to be the most likely, even -though ROOT9 can be trained on negative examples (i.e., switched hypernyms) to -drastically reduce this bias. -" -2638,1603.08705,"Enrico Santus, Tin-Shing Chiu, Qin Lu, Alessandro Lenci and Chu-Ren - Huang","ROOT13: Spotting Hypernyms, Co-Hyponyms and Randoms",cs.CL," In this paper, we describe ROOT13, a supervised system for the classification -of hypernyms, co-hyponyms and random words. The system relies on a Random -Forest algorithm and 13 unsupervised corpus-based features. We evaluate it with -a 10-fold cross validation on 9,600 pairs, equally distributed among the three -classes and involving several Parts-Of-Speech (i.e. adjectives, nouns and -verbs). When all the classes are present, ROOT13 achieves an F1 score of 88.3%, -against a baseline of 57.6% (vector cosine). When the classification is binary, -ROOT13 achieves the following results: hypernyms-co-hyponyms (93.4% vs. 60.2%), -hypernymsrandom (92.3% vs. 65.5%) and co-hyponyms-random (97.3% vs. 81.5%). Our -results are competitive with stateof-the-art models. -" -2639,1603.08832,"Ethan Fast, Tina Vachovsky, Michael S. Bernstein","Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias - in an Online Fiction Writing Community",cs.CL cs.SI," Imagine a princess asleep in a castle, waiting for her prince to slay the -dragon and rescue her. Tales like the famous Sleeping Beauty clearly divide up -gender roles. But what about more modern stories, borne of a generation -increasingly aware of social constructs like sexism and racism? Do these -stories tend to reinforce gender stereotypes, or counter them? In this paper, -we present a technique that combines natural language processing with a -crowdsourced lexicon of stereotypes to capture gender biases in fiction. We -apply this technique across 1.8 billion words of fiction from the Wattpad -online writing community, investigating gender representation in stories, how -male and female characters behave and are described, and how authors' use of -gender stereotypes is associated with the community's ratings. 
We find that
-male over-representation and traditional gender stereotypes (e.g., dominant men
-and submissive women) are common throughout nearly every genre in our corpus.
-However, only some of these stereotypes, like sexual or violent men, are
-associated with highly rated stories. Finally, despite women often being the
-target of negative stereotypes, female authors are as likely as male authors
-to write such stereotypes.
-"
-2640,1603.08865,Emil Axelsson,Compilation as a Typed EDSL-to-EDSL Transformation,cs.CL," This article is about an implementation and compilation technique that is
-used in RAW-Feldspar, which is a complete rewrite of the Feldspar embedded
-domain-specific language (EDSL) (Axelsson et al. 2010). Feldspar is a
-high-level functional language that generates efficient C code to run on
-embedded targets.
- The gist of the technique presented in this article is the following: rather
-than writing a back end that converts pure Feldspar expressions directly to C,
-we translate them to a low-level monadic EDSL. From the low-level EDSL, C code
-is then generated. This approach has several advantages:
- 1. The translation is simpler to write than a complete C back end.
- 2. The translation is between two typed EDSLs, which rules out many potential
-errors.
- 3. The low-level EDSL is reusable and can be shared between several
-high-level EDSLs.
- Although the article contains a lot of code, most of it is in fact reusable.
-As mentioned in the Discussion, we can write the same implementation in less
-than 50 lines of code using generic libraries that we have developed to support
-Feldspar.
-"
-2641,1603.08868,"Ildik\'o Pil\'an, Sowmya Vajjala and Elena Volodina","A Readable Read: Automatic Assessment of Language Learning Materials
-  based on Linguistic Complexity",cs.CL," Corpora and web texts can become a rich language learning resource if we have
-a means of assessing whether they are linguistically appropriate for learners
-at a given proficiency level. In this paper, we aim at addressing this issue by
-presenting the first approach for predicting linguistic complexity for Swedish
-second language learning material on a 5-point scale. After showing that the
-traditional Swedish readability measure, L\""asbarhetsindex (LIX), is not
-suitable for this task, we propose a supervised machine learning model, based
-on a range of linguistic features, that can reliably classify texts according
-to their difficulty level. Our model obtained an accuracy of 81.3% and an
-F-score of 0.8, which is comparable to the state of the art in English and is
-considerably higher than previously reported results for other languages. We
-further studied the utility of our features with single sentences instead of
-full texts since sentences are a common linguistic unit in language learning
-exercises. We trained a separate model on sentence-level data with five
-classes, which yielded 63.4% accuracy. Although this is lower than the document
-level performance, we achieved an adjacent accuracy of 92%. Furthermore, we
-found that using a combination of different features, compared to using lexical
-features alone, resulted in 7% improvement in classification accuracy at the
-sentence level, whereas at the document level, lexical features were more
-dominant. Our models are intended for use in a freely accessible web-based
-language learning platform for the automatic generation of exercises.
-" -2642,1603.08884,"Adam Trischler and Zheng Ye and Xingdi Yuan and Jing He and Phillip - Bachman and Kaheer Suleman",A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data,cs.CL," Understanding unstructured text is a major goal within natural language -processing. Comprehension tests pose questions based on short text passages to -evaluate such understanding. In this work, we investigate machine comprehension -on the challenging {\it MCTest} benchmark. Partly because of its limited size, -prior work on {\it MCTest} has focused mainly on engineering better features. -We tackle the dataset with a neural approach, harnessing simple neural networks -arranged in a parallel hierarchy. The parallel hierarchy enables our model to -compare the passage, question, and answer from a variety of trainable -perspectives, as opposed to using a manually designed, rigid feature set. -Perspectives range from the word level to sentence fragments to sequences of -sentences; the networks operate only on word-embedding representations of text. -When trained with a methodology designed to help cope with limited training -data, our Parallel-Hierarchical model sets a new state of the art for {\it -MCTest}, outperforming previous feature-engineered approaches slightly and -previous neural approaches by a significant margin (over 15\% absolute). -" -2643,1603.08887,"Greg Durrett, Taylor Berg-Kirkpatrick, and Dan Klein","Learning-Based Single-Document Summarization with Compression and - Anaphoricity Constraints",cs.CL," We present a discriminative model for single-document summarization that -integrally combines compression and anaphoricity constraints. Our model selects -textual units to include in the summary based on a rich set of sparse features -whose weights are learned on a large corpus. We allow for the deletion of -content within a sentence when that deletion is licensed by compression rules; -in our framework, these are implemented as dependencies between subsentential -units of text. Anaphoricity constraints then improve cross-sentence coherence -by guaranteeing that, for each pronoun included in the summary, the pronoun's -antecedent is included as well or the pronoun is rewritten as a full mention. -When trained end-to-end, our final system outperforms prior work on both ROUGE -as well as on human judgments of linguistic quality. -" -2644,1603.09054,"Enrico Santus, Tin-Shing Chiu, Qin Lu, Alessandro Lenci and Chu-Ren - Huang","Unsupervised Measure of Word Similarity: How to Outperform Co-occurrence - and Vector Cosine in VSMs",cs.CL," In this paper, we claim that vector cosine, which is generally considered -among the most efficient unsupervised measures for identifying word similarity -in Vector Space Models, can be outperformed by an unsupervised measure that -calculates the extent of the intersection among the most mutually dependent -contexts of the target words. To prove it, we describe and evaluate APSyn, a -variant of the Average Precision that, without any optimization, outperforms -the vector cosine and the co-occurrence on the standard ESL test set, with an -improvement ranging between +9.00% and +17.98%, depending on the number of -chosen top contexts. -" -2645,1603.09128,Simon \v{S}uster and Ivan Titov and Gertjan van Noord,Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders,cs.CL cs.LG stat.ML," We present an approach to learning multi-sense word embeddings relying both -on monolingual and bilingual information. 
Our model consists of an encoder,
-which uses monolingual and bilingual context (i.e. a parallel sentence) to
-choose a sense for a given word, and a decoder which predicts context words
-based on the chosen sense. The two components are estimated jointly. We observe
-that the word representations induced from bilingual data outperform the
-monolingual counterparts across a range of evaluation tasks, even though
-crosslingual information is not available at test time.
-"
-2646,1603.09170,"Bin Wang, Zhijian Ou, Yong He, Akinori Kawamura","Model Interpolation with Trans-dimensional Random Field Language Models
-  for Speech Recognition",cs.CL cs.LG stat.ML," The dominant language models (LMs) such as n-gram and neural network (NN)
-models represent sentence probabilities in terms of conditionals. In contrast,
-a new trans-dimensional random field (TRF) LM has been recently introduced to
-show superior performances, where the whole sentence is modeled as a random
-field. In this paper, we examine how the TRF models can be interpolated with
-the NN models, and obtain 12.1\% and 17.9\% relative error rate reductions over
-6-gram LMs for English and Chinese speech recognition respectively through
-log-linear combination.
-"
-2647,1603.09188,"Spandana Gella, Mirella Lapata, Frank Keller","Unsupervised Visual Sense Disambiguation for Verbs using Multimodal
-  Embeddings",cs.CL cs.CV," We introduce a new task, visual sense disambiguation for verbs: given an
-image and a verb, assign the correct sense of the verb, i.e., the one that
-describes the action depicted in the image. Just as textual word sense
-disambiguation is useful for a wide range of NLP tasks, visual sense
-disambiguation can be useful for multimodal tasks such as image retrieval,
-image description, and text illustration. We introduce VerSe, a new dataset
-that augments existing multimodal datasets (COCO and TUHOI) with sense labels.
-We propose an unsupervised algorithm based on Lesk which performs visual sense
-disambiguation using textual, visual, or multimodal embeddings. We find that
-textual embeddings perform well when gold-standard textual annotations (object
-labels and image descriptions) are available, while multimodal embeddings
-perform well on unannotated images. We also verify our findings by using the
-textual and multimodal embeddings as features in a supervised setting and
-analyse the performance of the visual sense disambiguation task. VerSe is made
-publicly available and can be downloaded at:
-https://github.com/spandanagella/verse.
-"
-2648,1603.09381,Peng Li and Heng Huang,Clinical Information Extraction via Convolutional Neural Network,cs.LG cs.CL cs.NE," We report an implementation of a clinical information extraction tool that
-leverages a deep neural network to annotate event spans and their attributes
-from raw clinical notes and pathology reports. Our approach uses context words
-and their part-of-speech tags and shape information as features. Then we employ
-a temporal (1D) convolutional neural network to learn hidden feature
-representations. Finally, we use a Multilayer Perceptron (MLP) to predict event
-spans. The empirical evaluation demonstrates that our approach significantly
-outperforms baselines.
-"
-2649,1603.09405,Peng Li and Heng Huang,"Enhancing Sentence Relation Modeling with Auxiliary Character-level
-  Embedding",cs.CL cs.AI cs.NE," Neural network based approaches for sentence relation modeling automatically
-generate hidden matching features from raw sentence pairs.
However, the quality
-of matching feature representations may not be satisfactory due to complex
-semantic relations such as entailment or contradiction. To address this
-challenge, we propose a new deep neural network architecture that jointly
-leverages pre-trained word embeddings and auxiliary character embeddings to
-learn sentence meanings. The two kinds of word sequence representations are fed
-as inputs into a multi-layer bidirectional LSTM to learn enhanced sentence
-representations. After that, we construct matching features followed by another
-temporal CNN to learn high-level hidden matching feature representations.
-Experimental results demonstrate that our approach consistently outperforms
-the existing methods on standard evaluation datasets.
-"
-2650,1603.09457,"Yi Luan, Yangfeng Ji, Mari Ostendorf",LSTM based Conversation Models,cs.CL," In this paper, we present a conversational model that incorporates both
-context and participant role for two-party conversations. Different
-architectures are explored for integrating participant role and context
-information into a Long Short-term Memory (LSTM) language model. The
-conversational model can function as a language model or a language generation
-model. Experiments on the Ubuntu Dialog Corpus show that our model can capture
-multi-turn interactions between participants. The proposed method outperforms
-a traditional LSTM model as measured by language model perplexity and response
-ranking. Generated responses show characteristic differences between the two
-participant roles.
-"
-2651,1603.09460,"Lantian Li, Dong Wang, Xiaodong Zhang, Thomas Fang Zheng, Panshi Jin",System Combination for Short Utterance Speaker Recognition,cs.CL cs.NE," For text-independent short-utterance speaker recognition (SUSR), the
-performance often degrades dramatically. This paper presents a combination
-approach to the SUSR tasks with two phonetic-aware systems: one is the
-DNN-based i-vector system and the other is our recently proposed
-subregion-based GMM-UBM system. The former employs phone posteriors to
-construct an i-vector model in which the shared statistics offer stronger
-robustness against limited test data, while the latter establishes a
-phone-dependent GMM-UBM system which represents speaker characteristics in
-more detail. A score-level fusion is implemented to integrate the respective
-advantages from the two systems. Experimental results show that for the
-text-independent SUSR task, both the DNN-based i-vector system and the
-subregion-based GMM-UBM system outperform their respective baselines, and the
-score-level system combination delivers performance improvement. 
-"
-2652,1603.09509,"Zhenyao Zhu, Jesse H. Engel, Awni Hannun",Learning Multiscale Features Directly From Waveforms,cs.CL cs.LG cs.NE cs.SD," Deep learning has dramatically improved the performance of speech recognition
-systems through learning hierarchies of features optimized for the task at
-hand. However, true end-to-end learning, where features are learned directly
-from waveforms, has only recently reached the performance of hand-tailored
-representations based on the Fourier transform. In this paper, we detail an
-approach to use convolutional filters to push past the inherent tradeoff of
-temporal and frequency resolution that exists for spectral representations. At
-increased computational cost, we show that increasing temporal resolution via
-reduced stride and increasing frequency resolution via additional filters
-delivers significant performance improvements.
Further, we find more efficient
-representations by simultaneously learning at multiple scales, leading to an
-overall decrease in word error rate on a difficult internal speech test set by
-20.7% relative to networks with the same number of parameters trained on
-spectrograms.
-"
-2653,1603.09630,Pawel Swietojanski and Steve Renals,Differentiable Pooling for Unsupervised Acoustic Model Adaptation,cs.CL cs.LG," We present a deep neural network (DNN) acoustic model that includes
-parametrised and differentiable pooling operators. Unsupervised acoustic model
-adaptation is cast as the problem of updating the decision boundaries
-implemented by each pooling operator. In particular, we experiment with two
-types of pooling parametrisations: learned $L_p$-norm pooling and weighted
-Gaussian pooling, in which the weights of both operators are treated as
-speaker-dependent. We perform investigations using three different large
-vocabulary speech recognition corpora: AMI meetings, TED talks and Switchboard
-conversational telephone speech. We demonstrate that differentiable pooling
-operators provide a robust and relatively low-dimensional way to adapt acoustic
-models, with relative word error rate reductions ranging from 5--20% with
-respect to unadapted systems, which themselves are better than the baseline
-fully-connected DNN-based acoustic models. We also investigate how the proposed
-techniques work under various adaptation conditions including the quality of
-adaptation data and complementarity to other feature- and model-space
-adaptation methods, as well as providing an analysis of the characteristics of
-each of the proposed approaches.
-"
-2654,1603.09631,"Miroslav Vodol\'an, Filip Jur\v{c}\'i\v{c}ek",Data Collection for Interactive Learning through the Dialog,cs.CL cs.LG," This paper presents a dataset collected from natural dialogs which enables
-testing the ability of dialog systems to learn new facts from user utterances
-throughout the dialog. This interactive learning will help with one of the most
-prevalent problems of open-domain dialog systems, which is the sparsity of
-facts a dialog system can reason about. The proposed dataset, consisting of
-1900 collected dialogs, allows simulation of the interactive gaining of
-denotations and question explanations from users, which can be used for
-interactive learning.
-"
-2655,1603.09643,"Zhiyuan Tang, Lantian Li and Dong Wang",Multi-task Recurrent Model for Speech and Speaker Recognition,cs.CL cs.LG cs.NE stat.ML," Although highly correlated, speech and speaker recognition have been regarded
-as two independent tasks and studied by two communities. This is certainly not
-the way that people behave: we decipher both speech content and speaker traits
-at the same time. This paper presents a unified model to perform speech and
-speaker recognition simultaneously and altogether. The model is based on a
-unified neural network where the output of one task is fed to the input of the
-other, leading to a multi-task recurrent network. Experiments show that the
-joint model outperforms the task-specific models on both tasks.
-"
-2656,1603.09727,"Ziang Xie, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, Andrew Y. Ng",Neural Language Correction with Character-Based Attention,cs.CL cs.AI," Natural language correction has the potential to help language learners
-improve their writing skills.
While approaches with separate classifiers for
-different error types have high precision, they do not flexibly handle errors
-such as redundancy or non-idiomatic phrasing. On the other hand, word and
-phrase-based machine translation methods are not designed to cope with
-orthographic errors, and have recently been outpaced by neural models.
-Motivated by these issues, we present a neural network-based approach to
-language correction. The core component of our method is an encoder-decoder
-recurrent neural network with an attention mechanism. By operating at the
-character level, the network avoids the problem of out-of-vocabulary words. We
-illustrate the flexibility of our approach on a dataset of noisy,
-user-generated text collected from an English learner forum. When combined
-with a language model, our method achieves a state-of-the-art $F_{0.5}$-score
-on the CoNLL 2014 Shared Task. We further demonstrate that training the
-network on additional data with synthesized errors can improve performance.
-"
-2657,1604.00077,"Sheng-syun Shen, Hung-yi Lee","Neural Attention Models for Sequence Classification: Analysis and
-  Application to Key Term Extraction and Dialogue Act Detection",cs.CL," Recurrent neural network architectures combined with an attention mechanism,
-or neural attention models, have recently shown promising performance for tasks
-including speech recognition, image caption generation, visual question
-answering and machine translation. In this paper, the neural attention model is
-applied to two sequence classification tasks, dialogue act detection and key
-term extraction. In the sequence labeling tasks, the model input is a sequence,
-and the output is the label of the input sequence. The major difficulty of
-sequence labeling is that when the input sequence is long, it can include many
-noisy or irrelevant parts. If the information in the whole sequence is treated
-equally, the noisy or irrelevant parts may degrade the classification
-performance. The attention mechanism is helpful for the sequence classification
-task because it is capable of highlighting the important parts of the entire
-sequence. The experimental results show that with the attention mechanism,
-discernible improvements were achieved in the sequence labeling task considered
-here. The roles of the attention mechanism in the tasks are further analyzed
-and visualized in this paper.
-"
-2658,1604.00100,Kushal Arora and Anand Rangarajan,A Compositional Approach to Language Modeling,cs.CL," Traditional language models treat language as a finite state automaton on a
-probability space over words. This is a very strong assumption when modeling
-something inherently complex such as language. In this paper, we challenge this
-by showing how the linear chain assumption inherent in previous work can be
-translated into a sequential composition tree. We then propose a new model that
-marginalizes over all possible composition trees thereby removing any
-underlying structural assumptions. As the partition function of this new model
-is intractable, we use a recently proposed sentence-level evaluation metric,
-Contrastive Entropy, to evaluate our model. Given this new evaluation metric,
-we report more than 100% improvement across distortion levels over current
-state-of-the-art recurrent neural network based language models.
-" -2659,1604.00117,Aaron Jaech and Larry Heck and Mari Ostendorf,"Domain Adaptation of Recurrent Neural Networks for Natural Language - Understanding",cs.CL," The goal of this paper is to use multi-task learning to efficiently scale -slot filling models for natural language understanding to handle multiple -target tasks or domains. The key to scalability is reducing the amount of -training data needed to learn a model for a new task. The proposed multi-task -model delivers better performance with less data by leveraging patterns that it -learns from the other tasks. The approach supports an open vocabulary, which -allows the models to generalize to unseen words, which is particularly -important when very little training data is used. A newly collected -crowd-sourced data set, covering four different domains, is used to demonstrate -the effectiveness of the domain adaptation and open vocabulary techniques. -" -2660,1604.00119,Krish Perumal,"Semi-supervised and Unsupervised Methods for Categorizing Posts in Web - Discussion Forums",cs.CL cs.IR cs.LG cs.SI," Web discussion forums are used by millions of people worldwide to share -information belonging to a variety of domains such as automotive vehicles, -pets, sports, etc. They typically contain posts that fall into different -categories such as problem, solution, feedback, spam, etc. Automatic -identification of these categories can aid information retrieval that is -tailored for specific user requirements. Previously, a number of supervised -methods have attempted to solve this problem; however, these depend on the -availability of abundant training data. A few existing unsupervised and -semi-supervised approaches are either focused on identifying a single category -or do not report category-specific performance. In contrast, this work proposes -unsupervised and semi-supervised methods that require no or minimal training -data to achieve this objective without compromising on performance. A -fine-grained analysis is also carried out to discuss their limitations. The -proposed methods are based on sequence models (specifically, Hidden Markov -Models) that can model language for each category using word and part-of-speech -probability distributions, and manually specified features. Empirical -evaluations across domains demonstrate that the proposed methods are better -suited for this task than existing ones. -" -2661,1604.00125,"Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei and Yanran Li","AttSum: Joint Learning of Focusing and Summarization with Neural - Attention",cs.IR cs.CL," Query relevance ranking and sentence saliency ranking are the two main tasks -in extractive query-focused summarization. Previous supervised summarization -systems often perform the two tasks in isolation. However, since reference -summaries are the trade-off between relevance and saliency, using them as -supervision, neither of the two rankers could be trained well. This paper -proposes a novel summarization system called AttSum, which tackles the two -tasks jointly. It automatically learns distributed representations for -sentences as well as the document cluster. Meanwhile, it applies the attention -mechanism to simulate the attentive reading of human behavior when a query is -given. Extensive experiments are conducted on DUC query-focused summarization -benchmark datasets. Without using any hand-crafted features, AttSum achieves -competitive performance. It is also observed that the sentences recognized to -focus on the query indeed meet the query need. 
-" -2662,1604.00126,"Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, Sam Gershman",Nonparametric Spherical Topic Modeling with Word Embeddings,cs.CL cs.IR cs.LG stat.ML," Traditional topic models do not account for semantic regularities in -language. Recent distributional representations of words exhibit semantic -consistency over directional metrics such as cosine similarity. However, -neither categorical nor Gaussian observational distributions used in existing -topic models are appropriate to leverage such correlations. In this paper, we -propose to use the von Mises-Fisher distribution to model the density of words -over a unit sphere. Such a representation is well-suited for directional data. -We use a Hierarchical Dirichlet Process for our base topic model and propose an -efficient inference algorithm based on Stochastic Variational Inference. This -model enables us to naturally exploit the semantic structures of word -embeddings while flexibly discovering the number of topics. Experiments -demonstrate that our method outperforms competitive approaches in terms of -topic coherence on two different text corpora while offering efficient -inference. -" -2663,1604.00317,Ehud Ben-Reuven and Jacob Goldberger,"A Semisupervised Approach for Language Identification based on Ladder - Networks",cs.CL cs.LG cs.NE," In this study we address the problem of training a neuralnetwork for language -identification using both labeled and unlabeled speech samples in the form of -i-vectors. We propose a neural network architecture that can also handle -out-of-set languages. We utilize a modified version of the recently proposed -Ladder Network semisupervised training procedure that optimizes the -reconstruction costs of a stack of denoising autoencoders. We show that this -approach can be successfully applied to the case where the training dataset is -composed of both labeled and unlabeled acoustic data. The results show enhanced -language identification on the NIST 2015 language identification dataset. -" -2664,1604.00400,Arman Cohan and Nazli Goharian,Revisiting Summarization Evaluation for Scientific Articles,cs.CL," Evaluation of text summarization approaches have been mostly based on metrics -that measure similarities of system generated summaries with a set of human -written gold-standard summaries. The most widely used metric in summarization -evaluation has been the ROUGE family. ROUGE solely relies on lexical overlaps -between the terms and phrases in the sentences; therefore, in cases of -terminology variations and paraphrasing, ROUGE is not as effective. Scientific -article summarization is one such case that is different from general domain -summarization (e.g. newswire data). We provide an extensive analysis of ROUGE's -effectiveness as an evaluation metric for scientific summarization; we show -that, contrary to the common belief, ROUGE is not much reliable in evaluating -scientific summaries. We furthermore show how different variants of ROUGE -result in very different correlations with the manual Pyramid scores. Finally, -we propose an alternative metric for summarization evaluation which is based on -the content relevance between a system generated summary and the corresponding -human written summaries. We call our metric SERA (Summarization Evaluation by -Relevance Analysis). Unlike ROUGE, SERA consistently achieves high correlations -with manual scores which shows its effectiveness in evaluation of scientific -article summarization. 
-" -2665,1604.00425,"Shyam Upadhyay, Manaal Faruqui, Chris Dyer and Dan Roth",Cross-lingual Models of Word Embeddings: An Empirical Comparison,cs.CL," Despite interest in using cross-lingual knowledge to learn word embeddings -for various tasks, a systematic comparison of the possible approaches is -lacking in the literature. We perform an extensive evaluation of four popular -approaches of inducing cross-lingual embeddings, each requiring a different -form of supervision, on four typographically different language pairs. Our -evaluation setup spans four different tasks, including intrinsic evaluation on -mono-lingual and cross-lingual similarity, and extrinsic evaluation on -downstream semantic and syntactic applications. We show that models which -require expensive cross-lingual knowledge almost always perform better, but -cheaply supervised models often prove competitive on certain tasks. -" -2666,1604.00461,"Mo Yu, Mark Dredze, Raman Arora, Matthew Gormley",Embedding Lexical Features via Low-Rank Tensors,cs.CL cs.AI cs.LG," Modern NLP models rely heavily on engineered features, which often combine -word and contextual information into complex lexical features. Such combination -results in large numbers of features, which can lead to over-fitting. We -present a new model that represents complex lexical features---comprised of -parts for words, contextual information and labels---in a tensor that captures -conjunction information among these parts. We apply low-rank tensor -approximations to the corresponding parameter tensors to reduce the parameter -space and improve prediction speed. Furthermore, we investigate two methods for -handling features that include $n$-grams of mixed lengths. Our model achieves -state-of-the-art results on tasks in relation extraction, PP-attachment, and -preposition disambiguation. -" -2667,1604.00466,"Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed - Elgammal",Automatic Annotation of Structured Facts in Images,cs.CL cs.CV," Motivated by the application of fact-level image understanding, we present an -automatic method for data collection of structured visual facts from images -with captions. Example structured facts include attributed objects (e.g., -), actions (e.g., ), interactions (e.g., ), and positional information (e.g., ). The -collected annotations are in the form of fact-image pairs (e.g., and an image region containing this fact). With a language approach, the -proposed method is able to collect hundreds of thousands of visual fact -annotations with accuracy of 83% according to human judgment. Our method -automatically collected more than 380,000 visual fact annotations and more than -110,000 unique visual facts from images with captions and localized them in -images in less than one day of processing time on standard CPU platforms. -" -2668,1604.00502,"Wenpeng Yin, Tobias Schnabel, Hinrich Sch\""utze",Online Updating of Word Representations for Part-of-Speech Tagging,cs.CL," We propose online unsupervised domain adaptation (DA), which is performed -incrementally as data comes in and is applicable when batch DA is not possible. -In a part-of-speech (POS) tagging evaluation, we find that online unsupervised -DA performs as well as batch DA. 
-" -2669,1604.00503,"Wenpeng Yin, Hinrich Sch\""utze",Discriminative Phrase Embedding for Paraphrase Identification,cs.CL," This work, concerning paraphrase identification task, on one hand contributes -to expanding deep learning embeddings to include continuous and discontinuous -linguistic phrases. On the other hand, it comes up with a new scheme TF-KLD-KNN -to learn the discriminative weights of words and phrases specific to paraphrase -task, so that a weighted sum of embeddings can represent sentences more -effectively. Based on these two innovations we get competitive state-of-the-art -performance on paraphrase identification. -" -2670,1604.00562,Jacob Andreas and Dan Klein,Reasoning About Pragmatics with Neural Listeners and Speakers,cs.CL cs.NE," We present a model for pragmatically describing scenes, in which contrastive -behavior results from a combination of inference-driven pragmatics and learned -semantics. Like previous learned approaches to language generation, our model -uses a simple feature-driven architecture (here a pair of neural ""listener"" and -""speaker"" models) to ground language in the world. Like inference-driven -approaches to pragmatics, our model actively reasons about listener behavior -when selecting utterances. For training, our approach requires only ordinary -captions, annotated _without_ demonstration of the pragmatic behavior the model -ultimately exhibits. In human evaluations on a referring expression game, our -approach succeeds 81% of the time, compared to a 69% success rate using -existing techniques. -" -2671,1604.00727,"David Golub, Xiaodong He",Character-Level Question Answering with Attention,cs.CL cs.AI cs.LG," We show that a character-level encoder-decoder framework can be successfully -applied to question answering with a structured knowledge base. We use our -model for single-relation question answering and demonstrate the effectiveness -of our approach on the SimpleQuestions dataset (Bordes et al., 2015), where we -improve state-of-the-art accuracy from 63.9% to 70.9%, without use of -ensembles. Importantly, our character-level model has 16x fewer parameters than -an equivalent word-level model, can be learned with significantly less data -compared to previous work, which relies on data augmentation, and is robust to -new entities in testing. -" -2672,1604.00734,"Matthew Francis-Landau, Greg Durrett and Dan Klein","Capturing Semantic Similarity for Entity Linking with Convolutional - Neural Networks",cs.CL," A key challenge in entity linking is making effective use of contextual -information to disambiguate mentions that might refer to different entities in -different contexts. We present a model that uses convolutional neural networks -to capture semantic correspondence between a mention's context and a proposed -target entity. These convolutional networks operate at multiple granularities -to exploit various kinds of topic information, and their rich parameterization -gives them the capacity to learn which n-grams characterize different topics. -We combine these networks with a sparse linear model to achieve -state-of-the-art performance on multiple entity linking datasets, outperforming -the prior systems of Durrett and Klein (2014) and Nguyen et al. (2014). -" -2673,1604.00788,"Minh-Thang Luong, Christopher D. 
Manning","Achieving Open Vocabulary Neural Machine Translation with Hybrid - Word-Character Models",cs.CL cs.LG," Nearly all previous work on neural machine translation (NMT) has used quite -restricted vocabularies, perhaps with a subsequent method to patch in unknown -words. This paper presents a novel word-character solution to achieving open -vocabulary NMT. We build hybrid systems that translate mostly at the word level -and consult the character components for rare words. Our character-level -recurrent neural networks compute source word representations and recover -unknown target words when needed. The twofold advantage of such a hybrid -approach is that it is much faster and easier to train than character-based -ones; at the same time, it never produces unknown words as in the case of -word-based models. On the WMT'15 English to Czech translation task, this hybrid -approach offers an addition boost of +2.1-11.4 BLEU points over models that -already handle unknown words. Our best system achieves a new state-of-the-art -result with 20.7 BLEU score. We demonstrate that our character models can -successfully learn to not only generate well-formed words for Czech, a -highly-inflected language with a very complex vocabulary, but also build -correct representations for English source words. -" -2674,1604.00790,"Cheng Wang, Haojin Yang, Christian Bartz, Christoph Meinel",Image Captioning with Deep Bidirectional LSTMs,cs.CV cs.CL cs.MM," This work presents an end-to-end trainable deep bidirectional LSTM -(Long-Short Term Memory) model for image captioning. Our model builds on a deep -convolutional neural network (CNN) and two separate LSTM networks. It is -capable of learning long term visual-language interactions by making use of -history and future context information at high level semantic space. Two novel -deep bidirectional variant models, in which we increase the depth of -nonlinearity transition in different way, are proposed to learn hierarchical -visual-language embeddings. Data augmentation techniques such as multi-crop, -multi-scale and vertical mirror are proposed to prevent overfitting in training -deep models. We visualize the evolution of bidirectional LSTM internal states -over time and qualitatively analyze how our models ""translate"" image to -sentence. Our proposed models are evaluated on caption generation and -image-sentence retrieval tasks with three benchmark datasets: Flickr8K, -Flickr30K and MSCOCO datasets. We demonstrate that bidirectional LSTM models -achieve highly competitive performance to the state-of-the-art results on -caption generation even without integrating additional mechanism (e.g. object -detection, attention model etc.) and significantly outperform recent methods on -retrieval task. -" -2675,1604.00834,"Andrzej Kulig, Jaroslaw Kwapien, Tomasz Stanisz, Stanislaw Drozdz",In narrative texts punctuation marks obey the same statistics as words,cs.CL physics.data-an," From a grammar point of view, the role of punctuation marks in a sentence is -formally defined and well understood. In semantic analysis punctuation plays -also a crucial role as a method of avoiding ambiguity of the meaning. A -different situation can be observed in the statistical analyses of language -samples, where the decision on whether the punctuation marks should be -considered or should be neglected is seen rather as arbitrary and at present it -belongs to a researcher's preference. 
An objective of this work is to shed some -light on this problem by providing an answer to the question of whether -the punctuation marks may be treated as ordinary words and whether they should -be included in any analysis of word co-occurrences. We already know from our -previous study (S.~Dro\.zd\.z {\it et al.}, Inf. Sci. 331 (2016) 32-44) that -full stops that determine the length of sentences are the main carrier of -long-range correlations. Now we extend that study and analyze statistical -properties of the most common punctuation marks in a few Indo-European -languages, investigate their frequencies, and locate them accordingly in the -Zipf rank-frequency plots as well as study their role in the word-adjacency -networks. We show that, from a statistical viewpoint, the punctuation marks -reveal properties that are qualitatively similar to the properties of the most -frequent words like articles, conjunctions, pronouns, and prepositions. This -refers to both the Zipfian analysis and the network analysis. By adding the -punctuation marks to the Zipf plots, we also show that these plots that are -normally described by the Zipf-Mandelbrot distribution largely restore the -power-law Zipfian behaviour for the most frequent items. -" -2676,1604.00933,"Walid Shalaby, Khalifeh Al Jadda, Mohammed Korayem and Trey Grainger","Entity Type Recognition using an Ensemble of Distributional Semantic - Models to Enhance Query Understanding",cs.CL cs.IR," We present an ensemble approach for categorizing search query entities in the -recruitment domain. Understanding the types of entities expressed in a search -query (Company, Skill, Job Title, etc.) enables more intelligent information -retrieval based upon those entities compared to a traditional keyword-based -search. Because search queries are typically very short, leveraging a -traditional bag-of-words model to identify entity types would be inappropriate -due to the lack of contextual information. Our approach instead combines clues -from different sources of varying complexity in order to collect real-world -knowledge about query entities. We employ distributional semantic -representations of query entities through two models: 1) contextual vectors -generated from encyclopedic corpora like Wikipedia, and 2) high-dimensional -word embedding vectors generated from millions of job postings using word2vec. -Additionally, our approach utilizes both entity linguistic properties obtained -from WordNet and ontological properties extracted from DBpedia. We evaluate our -approach on a data set created at CareerBuilder, the largest job board in the -US. The data set contains entities extracted from millions of job -seekers'/recruiters' search queries, job postings, and resume documents. After -constructing the distributional vectors of search entities, we use supervised -machine learning to infer search entity types. Empirical results show that our -approach outperforms the state-of-the-art word2vec distributional semantics -model trained on Wikipedia. Moreover, we achieve a micro-averaged F1 score of -97% using the proposed distributional representations ensemble.
Our system first generates linguistic -structures such as syntactic and semantic trees from text, decomposes them into -multiple fields, then indexes the terms in each field. For each question, it -decomposes the question into multiple fields, measures the relevance score of -each field to the indexed ones, then ranks all documents by their relevance -scores and weights associated with the fields, where the weights are learned -through statistical modeling. Our final model gives an absolute improvement of -over 40% over the baseline approach using simple search for detecting documents -containing answers. -" -2678,1604.01178,Aliaksei Severyn and Alessandro Moschitti,"Modeling Relational Information in Question-Answer Pairs with - Convolutional Neural Networks",cs.CL," In this paper, we propose convolutional neural networks for learning an -optimal representation of question and answer sentences. Their main aspect is -the use of relational information given by the matches between words from the -two members of the pair. The matches are encoded as embeddings with additional -parameters (dimensions), which are tuned by the network. This allows the network -to better capture interactions between questions and answers, resulting in a -significant boost in accuracy. We test our models on two widely used answer -sentence selection benchmarks. The results clearly show the effectiveness of -our relational information, which allows our relatively simple network to -approach the state of the art. -" -2679,1604.01219,"Yuting Qiang, Yanwei Fu, Yanwen Guo, Zhi-Hua Zhou and Leonid Sigal",Learning to Generate Posters of Scientific Papers,cs.AI cs.CL cs.HC cs.MM stat.ML," Researchers often summarize their work in the form of posters. Posters -provide a coherent and efficient way to convey core ideas from scientific -papers. Generating a good scientific poster, however, is a complex and -time-consuming cognitive task, since such posters need to be readable, informative, -and visually aesthetic. In this paper, for the first time, we study the -challenging problem of learning to generate posters from scientific papers. To -this end, a data-driven framework that utilizes graphical models is proposed. -Specifically, given content to display, the key elements of a good poster, -including panel layout and attributes of each panel, are learned and inferred -from data. Then, given inferred layout and attributes, composition of graphical -elements within each panel is synthesized. To learn and validate our model, we -collect and make public a Poster-Paper dataset, which consists of scientific -papers and corresponding posters with exhaustively labelled panels and -attributes. Qualitative and quantitative results indicate the effectiveness of -our approach. -" -2680,1604.01221,"Guntis Barzdins, Steve Renals, Didzis Gosko","Character-Level Neural Translation for Multilingual Media Monitoring in - the SUMMA Project",cs.CL," The paper steps outside the comfort zone of traditional NLP tasks like -automatic speech recognition (ASR) and machine translation (MT) to address -two novel problems arising in automated multilingual news monitoring: -segmentation of the TV and radio program ASR transcripts into individual -stories, and clustering of the individual stories coming from various sources -and languages into storylines. Storyline clustering of stories covering the -same events is an essential task for inquisitorial media monitoring.
We address -these two problems jointly by engaging the low-dimensional semantic -representation capabilities of sequence-to-sequence neural translation -models. To enable joint multi-task learning for multilingual neural translation -of morphologically rich languages we replace the attention mechanism with a -sliding-window mechanism and operate the sequence-to-sequence neural -translation model at the character level rather than at the word level. The -story segmentation and storyline clustering problem is tackled by examining the -low-dimensional vectors produced as a side-product of the neural translation -process. The results of this paper describe a novel approach to the automatic -story segmentation and storyline clustering problem. -" -2681,1604.01235,"Vijay Krishna Menon, S. Rajendran, M. Anand Kumar, K.P. Soman",A new TAG Formalism for Tamil and Parser Analytics,cs.CL," Tree adjoining grammar (TAG) is specifically suited for morphologically rich and -agglutinative languages like Tamil due to its psycholinguistic features and -parse-time dependency and morph resolution. Though TAG and LTAG formalisms have -been known for about three decades, efforts on designing a TAG syntax for Tamil have -not been entirely successful due to the complexity of its specification and the -rich morphology of the Tamil language. In this paper we present a minimalistic TAG -for Tamil without much morphological consideration and also introduce a parser -implementation with some obvious variations from the XTAG system. -" -2682,1604.01243,Massimo Stella and Markus Brede,"Mental Lexicon Growth Modelling Reveals the Multiplexity of the English - Language",physics.soc-ph cs.CL cs.SI," In this work we extend previous analyses of linguistic networks by adopting a -multi-layer network framework for modelling the human mental lexicon, i.e. an -abstract mental repository where words and concepts are stored together with -their linguistic patterns. Across a three-layer linguistic multiplex, we model -English words as nodes and connect them according to (i) phonological -similarities, (ii) synonym relationships and (iii) free word associations. Our -main aim is to exploit this multi-layered structure to explore the influence of -phonological and semantic relationships on lexicon assembly over time. We -propose a model of lexicon growth which is driven by the phonological layer: -words are suggested according to different orderings of insertion (e.g. shorter -word length, highest frequency, semantic multiplex features) and accepted or -rejected subject to constraints. We then measure times of network assembly and -compare these to empirical data about the age of acquisition of words. In -agreement with empirical studies in psycholinguistics, our results provide -quantitative evidence for the hypothesis that word acquisition is driven by -features at multiple levels of organisation within language. -" -2683,1604.01272,Despoina Christou,"Feature extraction using Latent Dirichlet Allocation and Neural - Networks: A case study on movie synopses",cs.CL cs.AI cs.IR cs.LG stat.ML," Feature extraction has gained increasing attention in the field of machine -learning, as in order to detect patterns, extract information, or predict -future observations from big data, the need for informative features is crucial. -The process of extracting features is highly linked to dimensionality reduction -as it implies the transformation of the data from a sparse high-dimensional -space to higher-level meaningful abstractions.
This dissertation employs -Neural Networks for distributed paragraph representations, and Latent Dirichlet -Allocation to capture higher level features of paragraph vectors. Although -Neural Networks for distributed paragraph representations are considered the -state of the art for extracting paragraph vectors, we show that a quick topic -analysis model such as Latent Dirichlet Allocation can provide meaningful -features too. We evaluate the two methods on the CMU Movie Summary Corpus, a -collection of 25,203 movie plot summaries extracted from Wikipedia. Finally, -for both approaches, we use K-Nearest Neighbors to discover similar movies, and -plot the projected representations using T-Distributed Stochastic Neighbor -Embedding to depict the context similarities. These similarities, expressed as -movie distances, can be used for movies recommendation. The recommended movies -of this approach are compared with the recommended movies from IMDB, which use -a collaborative filtering recommendation approach, to show that our two models -could constitute either an alternative or a supplementary recommendation -approach. -" -2684,1604.01278,"Guntis Barzdins, Didzis Gosko","RIGA at SemEval-2016 Task 8: Impact of Smatch Extensions and - Character-Level Neural Translation on AMR Parsing Accuracy",cs.CL," Two extensions to the AMR smatch scoring script are presented. The first -extension com-bines the smatch scoring script with the C6.0 rule-based -classifier to produce a human-readable report on the error patterns frequency -observed in the scored AMR graphs. This first extension results in 4% gain over -the state-of-art CAMR baseline parser by adding to it a manually crafted -wrapper fixing the identified CAMR parser errors. The second extension combines -a per-sentence smatch with an en-semble method for selecting the best AMR graph -among the set of AMR graphs for the same sentence. This second modification -au-tomatically yields further 0.4% gain when ap-plied to outputs of two -nondeterministic AMR parsers: a CAMR+wrapper parser and a novel character-level -neural translation AMR parser. For AMR parsing task the character-level neural -translation attains surprising 7% gain over the carefully optimized word-level -neural translation. Overall, we achieve smatch F1=62% on the SemEval-2016 -official scor-ing set and F1=67% on the LDC2015E86 test set. -" -2685,1604.01485,"Ilija Ilievski, Shuicheng Yan, Jiashi Feng",A Focused Dynamic Attention Model for Visual Question Answering,cs.CV cs.CL cs.NE," Visual Question and Answering (VQA) problems are attracting increasing -interest from multiple research disciplines. Solving VQA problems requires -techniques from both computer vision for understanding the visual contents of a -presented image or video, as well as the ones from natural language processing -for understanding semantics of the question and generating the answers. -Regarding visual content modeling, most of existing VQA methods adopt the -strategy of extracting global features from the image or video, which -inevitably fails in capturing fine-grained information such as spatial -configuration of multiple objects. Extracting features from auto-generated -regions -- as some region-based image recognition methods do -- cannot -essentially address this problem and may introduce some overwhelming irrelevant -features with the question. In this work, we propose a novel Focused Dynamic -Attention (FDA) model to provide better aligned image content representation -with proposed questions. 
Being aware of the key words in the question, FDA -employs an off-the-shelf object detector to identify important regions and fuses -the information from the regions and global features via an LSTM unit. Such -question-driven representations are then combined with the question representation -and fed into a reasoning unit for generating the answers. Extensive evaluation -on a large-scale benchmark dataset, VQA, clearly demonstrates the superior -performance of FDA over well-established baselines. -" -2686,1604.01537,"Xiaoyuan Yi, Ruoyu Li, Maosong Sun",Generating Chinese Classical Poems with RNN Encoder-Decoder,cs.CL cs.NE," We take the generation of Chinese classical poem lines as a -sequence-to-sequence learning problem, and build a novel system based on the -RNN Encoder-Decoder structure to generate quatrains (Jueju in Chinese), with a -topic word as input. Our system can jointly learn semantic meaning within a -single line, semantic relevance among lines in a poem, and the use of -structural, rhythmical and tonal patterns, without utilizing any constraint -templates. Experimental results show that our system outperforms other -competitive systems. We also find that the attention mechanism can capture the -word associations in Chinese classical poetry and that inverting target lines in -training can improve performance. -" -2687,1604.01692,Robyn Speer and Joshua Chin,An Ensemble Method to Produce High-Quality Word Embeddings (2016),cs.CL," A currently successful approach to computational semantics is to represent -words as embeddings in a machine-learned vector space. We present an ensemble -method that combines embeddings produced by GloVe (Pennington et al., 2014) and -word2vec (Mikolov et al., 2013) with structured knowledge from the semantic -networks ConceptNet (Speer and Havasi, 2012) and PPDB (Ganitkevitch et al., -2013), merging their information into a common representation with a large, -multilingual vocabulary. The embeddings it produces achieve state-of-the-art -performance on many word-similarity evaluations. Its score of $\rho = .596$ on -an evaluation of rare words (Luong et al., 2013) is 16% higher than that of the -previous best known system. -" -2688,1604.01696,"Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, - Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli and James Allen","A Corpus and Evaluation Framework for Deeper Understanding of - Commonsense Stories",cs.CL cs.AI," Representation and learning of commonsense knowledge is one of the -foundational problems in the quest to enable deep language understanding. This -issue is particularly challenging for understanding causal and correlational -relationships between events. While this topic has received a lot of interest -in the NLP community, research has been hindered by the lack of a proper -evaluation framework. This paper attempts to address this problem with a new -framework for evaluating story understanding and script learning: the 'Story -Cloze Test'. This test requires a system to choose the correct ending to a -four-sentence story. We created a new corpus of ~50k five-sentence commonsense -stories, ROCStories, to enable this evaluation. This corpus is unique in two -ways: (1) it captures a rich set of causal and temporal commonsense relations -between daily events, and (2) it is a high-quality collection of everyday life -stories that can also be used for story generation.
Experimental evaluation -shows that a host of baselines and state-of-the-art models based on shallow -language understanding struggle to achieve a high score on the Story Cloze -Test. We discuss the implications for script and story learning, and offer -suggestions for deeper language understanding. -" -2689,1604.01729,"Subhashini Venugopalan, Lisa Anne Hendricks, Raymond Mooney, Kate - Saenko","Improving LSTM-based Video Description with Linguistic Knowledge Mined - from Text",cs.CL cs.CV," This paper investigates how linguistic knowledge mined from large text -corpora can aid the generation of natural language descriptions of videos. -Specifically, we integrate both a neural language model and distributional -semantics trained on large text corpora into a recent LSTM-based architecture -for video description. We evaluate our approach on a collection of YouTube -videos as well as two large movie description datasets, showing significant -improvements in grammaticality while modestly improving descriptive quality. -" -2690,1604.01792,"Tom Sercu, Vaibhava Goel",Advances in Very Deep Convolutional Neural Networks for LVCSR,cs.CL cs.LG cs.NE," Very deep CNNs with small 3x3 kernels have recently been shown to achieve -very strong performance as acoustic models in hybrid NN-HMM speech recognition -systems. In this paper we investigate how to efficiently scale these models to -larger datasets. Specifically, we address the design choice of pooling and -padding along the time dimension which renders convolutional evaluation of -sequences highly inefficient. We propose a new CNN design without timepadding -and without timepooling, which is slightly suboptimal for accuracy, but has two -significant advantages: it enables sequence training and deployment by allowing -efficient convolutional evaluation of full utterances, and it allows batch -normalization to be straightforwardly adopted for CNNs on sequence data. Through -batch normalization, we recover the lost performance from removing the -time-pooling, while keeping the benefit of efficient convolutional evaluation. -We demonstrate the performance of our models both on larger-scale data than -before, and after sequence training. Our very deep CNN model sequence-trained -on the 2000h Switchboard dataset obtains a 9.4 word error rate on the Hub5 -test set, matching with a single model the performance of the 2015 IBM system -combination, which was the previous best published result. -" -2691,1604.01904,"Ayana, Shiqi Shen, Yu Zhao, Zhiyuan Liu, Maosong Sun",Neural Headline Generation with Sentence-wise Optimization,cs.CL," Recently, neural models have been proposed for headline generation by -learning to map documents to headlines with recurrent neural networks. -Nevertheless, as traditional neural networks utilize maximum likelihood -estimation for parameter optimization, they essentially constrain the expected -training objective to the word level rather than the sentence level. Moreover, the -performance of model prediction significantly relies on the training data -distribution. To overcome these drawbacks, we employ a minimum risk training -strategy in this paper, which directly optimizes model parameters at the sentence -level with respect to evaluation metrics and leads to significant improvements -for headline generation. Experimental results show that our model outperforms -state-of-the-art systems on both English and Chinese headline generation tasks.
-" -2692,1604.02027,Ke Jiang and Suvrit Sra and Brian Kulis,Combinatorial Topic Models using Small-Variance Asymptotics,cs.LG cs.CL stat.ML," Topic models have emerged as fundamental tools in unsupervised machine -learning. Most modern topic modeling algorithms take a probabilistic view and -derive inference algorithms based on Latent Dirichlet Allocation (LDA) or its -variants. In contrast, we study topic modeling as a combinatorial optimization -problem, and propose a new objective function derived from LDA by passing to -the small-variance limit. We minimize the derived objective by using ideas from -combinatorial optimization, which results in a new, fast, and high-quality -topic modeling algorithm. In particular, we show that our results are -competitive with popular LDA-based topic modeling approaches, and also discuss -the (dis)similarities between our approach and its probabilistic counterparts. -" -2693,1604.02038,"Fei Tian, Bin Gao, Di He, Tie-Yan Liu","Sentence Level Recurrent Topic Model: Letting Topics Speak for - Themselves",cs.LG cs.CL cs.IR," We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model -that assumes the generation of each word within a sentence to depend on both -the topic of the sentence and the whole history of its preceding words in the -sentence. Different from conventional topic models that largely ignore the -sequential order of words or their topic coherence, SLRTM gives full -characterization to them by using a Recurrent Neural Networks (RNN) based -framework. Experimental results have shown that SLRTM outperforms several -strong baselines on various tasks. Furthermore, SLRTM can automatically -generate sentences given a topic (i.e., topics to sentences), which is a key -technology for real world applications such as personalized short text -conversation. -" -2694,1604.02125,"Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, - Yash Goyal, Kevin Kochersberger, Dhruv Batra","Resolving Language and Vision Ambiguities Together: Joint Segmentation & - Prepositional Attachment Resolution in Captioned Scenes",cs.CV cs.CL cs.LG," We present an approach to simultaneously perform semantic segmentation and -prepositional phrase attachment resolution for captioned images. Some -ambiguities in language cannot be resolved without simultaneously reasoning -about an associated image. If we consider the sentence ""I shot an elephant in -my pajamas"", looking at language alone (and not using common sense), it is -unclear if it is the person or the elephant wearing the pajamas or both. Our -approach produces a diverse set of plausible hypotheses for both semantic -segmentation and prepositional phrase attachment resolution that are then -jointly reranked to select the most consistent pair. We show that our semantic -segmentation and prepositional phrase attachment resolution modules have -complementary strengths, and that joint reasoning produces more accurate -results than any module operating in isolation. Multiple hypotheses are also -shown to be crucial to improved multiple-module reasoning. Our vision and -language approach significantly outperforms the Stanford Parser (De Marneffe et -al., 2006) by 17.91% (28.69% relative) and 12.83% (25.28% relative) in two -different experiments. We also make small improvements over DeepLab-CRF (Chen -et al., 2015). 
-" -2695,1604.02201,"Barret Zoph, Deniz Yuret, Jonathan May, and Kevin Knight",Transfer Learning for Low-Resource Neural Machine Translation,cs.CL," The encoder-decoder framework for neural machine translation (NMT) has been -shown effective in large data scenarios, but is much less effective for -low-resource languages. We present a transfer learning method that -significantly improves Bleu scores across a range of low-resource languages. -Our key idea is to first train a high-resource language pair (the parent -model), then transfer some of the learned parameters to the low-resource pair -(the child model) to initialize and constrain training. Using our transfer -learning method we improve baseline NMT models by an average of 5.6 Bleu on -four low-resource language pairs. Ensembling and unknown word replacement add -another 2 Bleu which brings the NMT performance on low-resource machine -translation close to a strong syntax based machine translation (SBMT) system, -exceeding its performance on one language pair. Additionally, using the -transfer learning model for re-scoring, we can improve the SBMT system by an -average of 1.3 Bleu, improving the state-of-the-art on low-resource machine -translation. -" -2696,1604.02506,Antonio Jimeno Yepes,"Word embeddings and recurrent neural networks based on Long-Short Term - Memory nodes in supervised biomedical word sense disambiguation",cs.CL cs.LG," Word sense disambiguation helps identifying the proper sense of ambiguous -words in text. With large terminologies such as the UMLS Metathesaurus -ambiguities appear and highly effective disambiguation methods are required. -Supervised learning algorithm methods are used as one of the approaches to -perform disambiguation. Features extracted from the context of an ambiguous -word are used to identify the proper sense of such a word. The type of features -have an impact on machine learning methods, thus affect disambiguation -performance. In this work, we have evaluated several types of features derived -from the context of the ambiguous word and we have explored as well more global -features derived from MEDLINE using word embeddings. Results show that word -embeddings improve the performance of more traditional features and allow as -well using recurrent neural network classifiers based on Long-Short Term Memory -(LSTM) nodes. The combination of unigrams and word embeddings with an SVM sets -a new state of the art performance with a macro accuracy of 95.97 in the MSH -WSD data set. -" -2697,1604.02594,"Zhiyun Lu, Vikas Sindhwani, Tara N. Sainath",Learning Compact Recurrent Neural Networks,cs.LG cs.CL cs.NE," Recurrent neural networks (RNNs), including long short-term memory (LSTM) -RNNs, have produced state-of-the-art results on a variety of speech recognition -tasks. However, these models are often too large in size for deployment on -mobile devices with memory and latency constraints. In this work, we study -mechanisms for learning compact RNNs and LSTMs via low-rank factorizations and -parameter sharing schemes. Our goal is to investigate redundancies in recurrent -architectures where compression can be admitted without losing performance. A -hybrid strategy of using structured matrices in the bottom layers and shared -low-rank factors on the top layers is found to be particularly effective, -reducing the parameters of a standard LSTM by 75%, at a small cost of 0.3% -increase in WER, on a 2,000-hr English Voice Search task. -" -2698,1604.02612,"Mois\'es H. R. Pereira, Fl\'avio L. C. 
P\'adua, Adriano C. M. Pereira, - Fabr\'icio Benevenuto, Daniel H. Dalip","Fusing Audio, Textual and Visual Features for Sentiment Analysis of News - Videos",cs.CL," This paper presents a novel approach to perform sentiment analysis of news -videos, based on the fusion of audio, textual and visual clues extracted from -their contents. The proposed approach aims at contributing to the -semiodiscoursive study regarding the construction of the ethos (identity) of -this media universe, which has become a central part of the modern-day lives of -millions of people. To achieve this goal, we apply state-of-the-art -computational methods for (1) automatic emotion recognition from facial -expressions, (2) extraction of modulations in the participants' speeches and -(3) sentiment analysis from the closed caption associated with the videos of -interest. More specifically, we compute features such as visual intensities -of recognized emotions, field sizes of participants, voicing probability, sound -loudness, speech fundamental frequencies and the sentiment scores (polarities) -from text sentences in the closed caption. Experimental results with a dataset -containing 520 annotated news videos from three Brazilian and one American -popular TV newscasts show that our approach achieves an accuracy of up to 84% -in the sentiment (tension level) classification task, thus demonstrating its -high potential to be used by media analysts in several applications, -especially in the journalistic domain. -" -2699,1604.02843,Yuan Sun and Zhen Zhu,Method of Tibetan Person Knowledge Extraction,cs.CL," Person knowledge extraction is the foundation of Tibetan knowledge graph -construction, which provides support for Tibetan question answering systems, -information retrieval, information extraction and other research, and -promotes national unity and social stability. This paper proposes an SVM- and -template-based approach to Tibetan person knowledge extraction. Through -constructing the training corpus, we build the templates based on the shallow -parsing analysis of Tibetan syntactic, semantic features and verbs. Using the -training corpus, we design a hierarchical SVM classifier to realize the entity -knowledge extraction. Finally, experimental results show that the method brings a -clear improvement in Tibetan person knowledge extraction. -" -2700,1604.02993,Karl Pichotta and Raymond J. Mooney,Using Sentence-Level LSTM Language Models for Script Inference,cs.CL," There is a small but growing body of research on statistical scripts, models -of event sequences that allow probabilistic inference of implicit events from -documents. These systems operate on structured verb-argument events produced by -an NLP pipeline. We compare these systems with recent Recurrent Neural Net -models that directly operate on raw tokens to predict sentences, finding the -latter to be roughly comparable to the former in terms of predicting missing -events in documents. -" -2701,1604.03029,Semi Min and Juyong Park,"Mapping Out Narrative Structures and Dynamics Using Networks and Textual - Information",cs.CL cs.SI physics.soc-ph," Human communication is often executed in the form of a narrative, an account -of connected events composed of characters, actions, and settings.
A coherent -narrative structure is therefore a requisite for a well-formulated narrative -- -be it fictional or nonfictional -- for informative and effective communication, -opening up the possibility of a deeper understanding of a narrative by studying -its structural properties. In this paper we present a network-based framework -for modeling and analyzing the structure of a narrative, which is further -expanded by incorporating methods from computational linguistics to utilize the -narrative text. Modeling a narrative as a dynamically unfolding system, we -characterize its progression via the growth patterns of the character network, -and use sentiment analysis and topic modeling to represent the actual content -of the narrative in the form of interaction maps between characters with -associated sentiment values and keywords. This framework advances beyond the -simple occurrence-based networks most often used until now, allowing one -to utilize the unique characteristics of a given narrative to a high degree. -Given the ubiquity and importance of narratives, such an advanced network-based -representation and analysis framework may lead to a more systematic modeling -and understanding of narratives for social interactions, expression of human -sentiments, and communication. -" -2702,1604.03035,"Sam Wiseman, Alexander M. Rush, Stuart M. Shieber",Learning Global Features for Coreference Resolution,cs.CL," There is compelling evidence that coreference prediction would benefit from -modeling global information about entity-clusters. Yet, state-of-the-art -performance can be achieved with systems treating each mention prediction -independently, which we attribute to the inherent difficulty of crafting -informative cluster-level features. We instead propose to use recurrent neural -networks (RNNs) to learn latent, global representations of entity clusters -directly from their mentions. We show that such representations are especially -useful for the prediction of pronominal mentions, and can be incorporated into -an end-to-end coreference system that outperforms the state of the art without -requiring any additional search. -" -2703,1604.03114,"Justine Zhang, Ravi Kumar, Sujith Ravi, Cristian - Danescu-Niculescu-Mizil",Conversational flow in Oxford-style debates,cs.CL cs.AI cs.SI physics.soc-ph stat.ML," Public debates are a common platform for presenting and juxtaposing diverging -views on important issues. In this work we propose a methodology for tracking -how ideas flow between participants throughout a debate. We use this approach -in a case study of Oxford-style debates---a competitive format where the winner -is determined by audience votes---and show how the outcome of a debate depends -on aspects of conversational flow. In particular, we find that winners tend to -make better use of a debate's interactive component than losers, by actively -pursuing their opponents' points rather than promoting their own ideas over the -course of the conversation. -" -2704,1604.03136,"Arnav Sharma, Sakshi Gupta, Raveesh Motlani, Piyush Bansal, Manish - Srivastava, Radhika Mamidi, Dipti M. Sharma",Shallow Parsing Pipeline for Hindi-English Code-Mixed Social Media Text,cs.CL," In this study, the problem of shallow parsing of Hindi-English code-mixed -social media text (CSMT) has been addressed. We have annotated the data, -developed a language identifier, a normalizer, a part-of-speech tagger and a -shallow parser. To the best of our knowledge, we are the first to attempt -shallow parsing on CSMT.
The pipeline developed has been made available to the -research community with the goal of enabling better text analysis of -Hindi-English CSMT. The pipeline is accessible at http://bit.ly/csmt-parser-api . -" -2705,1604.03209,"Vicky Zayats, Mari Ostendorf and Hannaneh Hajishirzi",Disfluency Detection using a Bidirectional LSTM,cs.CL," We introduce a new approach for disfluency detection using a Bidirectional -Long-Short Term Memory neural network (BLSTM). In addition to the word -sequence, the model takes as input pattern match features that were developed -to reduce sensitivity to vocabulary size in training, which lead to improved -performance over the word sequence alone. The BLSTM takes advantage of explicit -repair states in addition to the standard reparandum states. The final output -leverages integer linear programming to incorporate constraints of disfluency -structure. In experiments on the Switchboard corpus, the model achieves -state-of-the-art performance for both the standard disfluency detection task -and the correction detection task. Analysis shows that the model has better -detection of non-repetition disfluencies, which tend to be much harder to -detect. -" -2706,1604.03249,Marcus Rohrbach,"Attributes as Semantic Units between Natural Language and Visual - Recognition",cs.CV cs.CL," Impressive progress has been made in the fields of computer vision and -natural language processing. However, it remains a challenge to find the best -point of interaction for these very different modalities. In this chapter we -discuss how attributes allow us to exchange information between the two -modalities and in this way lead to an interaction on a semantic level. -Specifically we discuss how attributes allow using knowledge mined from -language resources for recognizing novel visual categories, how we can generate -sentence descriptions about images and videos, how we can ground natural language -in visual content, and finally, how we can answer natural language questions -about images. -" -2707,1604.03318,"A.B.M. Shamsuzzaman Sadi, Towfique Anam, Mohamed Abdirazak, Abdillahi - Hasan Adnan, Sazid Zaman Khan, Mohamed Mahmudur Rahman, Ghassan Samara",Applying Ontological Modeling on Quranic Nature Domain,cs.AI cs.CL," The holy Quran is the holy book of the Muslims. It contains information about -many domains. Often people search for particular concepts of the holy Quran based -on the relations among concepts. An ontological modeling of the holy Quran can be -useful in such a scenario. In this paper, we have modeled nature-related -concepts of the holy Quran using OWL (Web Ontology Language) / RDF (Resource -Description Framework). Our methodology involves identifying nature-related -concepts mentioned in the holy Quran and identifying relations among those -concepts. These concepts and relations are represented as classes/instances and -properties of an OWL ontology. Later, in the result section it is shown that, -using the ontological model, SPARQL queries can retrieve verses and concepts of -interest. Thus, this modeling helps semantic search and query on the holy -Quran. In this work, we have used the English translation of the holy Quran by -Sahih International and the Protege OWL Editor, and for querying we have used SPARQL.
-" -2708,1604.03357,"Sigrid Klerke, Yoav Goldberg and Anders S{\o}gaard",Improving sentence compression by learning to predict gaze,cs.CL," We show how eye-tracking corpora can be used to improve sentence compression -models, presenting a novel multi-task learning algorithm based on multi-layer -LSTMs. We obtain performance competitive with or better than state-of-the-art -approaches. -" -2709,1604.03390,"\'Alvaro Peris, Marc Bola\~nos, Petia Radeva and Francisco Casacuberta",Video Description using Bidirectional Recurrent Neural Networks,cs.CV cs.CL cs.LG," Although traditionally used in the machine translation field, the -encoder-decoder framework has been recently applied for the generation of video -and image descriptions. The combination of Convolutional and Recurrent Neural -Networks in these models has proven to outperform the previous state of the -art, obtaining more accurate video descriptions. In this work we propose -pushing further this model by introducing two contributions into the encoding -stage. First, producing richer image representations by combining object and -location information from Convolutional Neural Networks and second, introducing -Bidirectional Recurrent Neural Networks for capturing both forward and backward -temporal relationships in the input frames. -" -2710,1604.03627,"Norah Abokhodair, Daisy Yoo, David W. McDonald","Dissecting a Social Botnet: Growth, Content and Influence in Twitter",cs.CY cs.CL cs.SI," Social botnets have become an important phenomenon on social media. There are -many ways in which social bots can disrupt or influence online discourse, such -as, spam hashtags, scam twitter users, and astroturfing. In this paper we -considered one specific social botnet in Twitter to understand how it grows -over time, how the content of tweets by the social botnet differ from regular -users in the same dataset, and lastly, how the social botnet may have -influenced the relevant discussions. Our analysis is based on a qualitative -coding for approximately 3000 tweets in Arabic and English from the Syrian -social bot that was active for 35 weeks on Twitter before it was shutdown. We -find that the growth, behavior and content of this particular botnet did not -specifically align with common conceptions of botnets. Further we identify -interesting aspects of the botnet that distinguish it from regular users. -" -2711,1604.03968,"Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan - Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet - Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, - Michel Galley, Margaret Mitchell",Visual Storytelling,cs.CL cs.AI cs.CV," We introduce the first dataset for sequential vision-to-language, and explore -how this data may be used for the task of visual storytelling. The first -release of this dataset, SIND v.1, includes 81,743 unique photos in 20,211 -sequences, aligned to both descriptive (caption) and story language. We -establish several strong baselines for the storytelling task, and motivate an -automatic metric to benchmark progress. Modelling concrete description as well -as figurative and social language, as provided in this dataset and the -storytelling task, has the potential to move artificial intelligence from basic -understandings of typical visual scenes towards more and more human-like -understanding of grounded event structure and subjective expression. 
-" -2712,1604.04358,"Xiang Li, Lili Mou, Rui Yan, Ming Zhang","StalemateBreaker: A Proactive Content-Introducing Approach to Automatic - Human-Computer Conversation",cs.CL cs.AI cs.IR," Existing open-domain human-computer conversation systems are typically -passive: they either synthesize or retrieve a reply provided a human-issued -utterance. It is generally presumed that humans should take the role to lead -the conversation and introduce new content when a stalemate occurs, and that -the computer only needs to ""respond."" In this paper, we propose -StalemateBreaker, a conversation system that can proactively introduce new -content when appropriate. We design a pipeline to determine when, what, and how -to introduce new content during human-computer conversation. We further propose -a novel reranking algorithm Bi-PageRank-HITS to enable rich interaction between -conversation context and candidate replies. Experiments show that both the -content-introducing approach and the reranking algorithm are effective. Our -full StalemateBreaker model outperforms a state-of-the-practice conversation -system by +14.4% p@1 when a stalemate occurs. -" -2713,1604.04378,"Shengxian Wan, Yanyan Lan, Jun Xu, Jiafeng Guo, Liang Pang, Xueqi - Cheng",Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN,cs.CL cs.AI cs.LG cs.NE," Semantic matching, which aims to determine the matching degree between two -texts, is a fundamental problem for many NLP applications. Recently, deep -learning approach has been applied to this problem and significant improvements -have been achieved. In this paper, we propose to view the generation of the -global interaction between two texts as a recursive process: i.e. the -interaction of two texts at each position is a composition of the interactions -between their prefixes as well as the word level interaction at the current -position. Based on this idea, we propose a novel deep architecture, namely -Match-SRNN, to model the recursive matching structure. Firstly, a tensor is -constructed to capture the word level interactions. Then a spatial RNN is -applied to integrate the local interactions recursively, with importance -determined by four types of gates. Finally, the matching score is calculated -based on the global interaction. We show that, after degenerated to the exact -matching scenario, Match-SRNN can approximate the dynamic programming process -of longest common subsequence. Thus, there exists a clear interpretation for -Match-SRNN. Our experiments on two semantic matching tasks showed the -effectiveness of Match-SRNN, and its ability of visualizing the learned -matching structure. -" -2714,1604.04383,"Milos Cernak, Alexandros Lazaridis, Afsaneh Asaei, Philip N. Garner","Composition of Deep and Spiking Neural Networks for Very Low Bit Rate - Speech Coding",cs.SD cs.CL," Most current very low bit rate (VLBR) speech coding systems use hidden Markov -model (HMM) based speech recognition/synthesis techniques. This allows -transmission of information (such as phonemes) segment by segment that -decreases the bit rate. However, the encoder based on a phoneme speech -recognition may create bursts of segmental errors. Segmental errors are further -propagated to optional suprasegmental (such as syllable) information coding. -Together with the errors of voicing detection in pitch parametrization, -HMM-based speech coding creates speech discontinuities and unnatural speech -sound artefacts. 
- In this paper, we propose a novel VLBR speech coding framework based on -neural networks (NNs) for end-to-end speech analysis and synthesis without -HMMs. The speech coding framework relies on a phonological (sub-phonetic) -representation of speech, and it is designed as a composition of deep and -spiking NNs: a bank of phonological analysers at the transmitter, and a -phonological synthesizer at the receiver, both realised as deep NNs, and a -spiking NN as an incremental and robust encoder of syllable boundaries for -coding of continuous fundamental frequency (F0). A combination of phonological -features defines many more sound patterns than the phonetic features defined by -HMM-based speech coders, and the finer analysis/synthesis code contributes to -smoother encoded speech. Listeners significantly prefer the NN-based approach -due to fewer discontinuities and speech artefacts in the encoded speech. A -single forward pass is required during the speech encoding and decoding. The -proposed VLBR speech coding operates at a bit rate of approximately 360 bits/s. -" -2715,1604.04562,"Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. - Rojas-Barahona, Pei-Hao Su, Stefan Ultes, Steve Young",A Network-based End-to-End Trainable Task-oriented Dialogue System,cs.CL cs.AI cs.NE stat.ML," Teaching machines to accomplish tasks by conversing naturally with humans is -challenging. Currently, developing task-oriented dialogue systems requires -creating multiple components and typically this involves either a large amount -of handcrafting, or acquiring costly labelled datasets to solve a statistical -learning problem for each component. In this work we introduce a neural -network-based text-in, text-out end-to-end trainable goal-oriented dialogue -system along with a new way of collecting dialogue data based on a novel -pipe-lined Wizard-of-Oz framework. This approach allows us to develop dialogue -systems easily and without making too many assumptions about the task at hand. -The results show that the model can converse with human subjects naturally -whilst helping them to accomplish tasks in a restaurant search domain. -" -2716,1604.04661,"Shihao Ji, Nadathur Satish, Sheng Li, and Pradeep Dubey",Parallelizing Word2Vec in Shared and Distributed Memory,cs.DC cs.CL stat.ML," Word2Vec is a widely used algorithm for extracting low-dimensional vector -representations of words. It generated considerable excitement in the machine -learning and natural language processing (NLP) communities recently due to its -exceptional performance in many NLP applications such as named entity -recognition, sentiment analysis, machine translation and question answering. -State-of-the-art algorithms including those by Mikolov et al. have been -parallelized for multi-core CPU architectures but are based on vector-vector -operations that are memory-bandwidth intensive and do not efficiently use -computational resources. In this paper, we improve reuse of various data -structures in the algorithm through the use of minibatching, hence allowing us -to express the problem using matrix multiply operations. We also explore -different techniques to distribute word2vec computation across nodes in a -compute cluster, and demonstrate good strong scalability up to 32 nodes. In -combination, these techniques allow us to scale up the computation nearly -linearly across cores and nodes, and process hundreds of millions of words per -second, which is the fastest word2vec implementation to the best of our -knowledge.
-" -2717,1604.04677,"Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber","Sentence-Level Grammatical Error Identification as Sequence-to-Sequence - Correction",cs.CL," We demonstrate that an attention-based encoder-decoder model can be used for -sentence-level grammatical error identification for the Automated Evaluation of -Scientific Writing (AESW) Shared Task 2016. The attention-based encoder-decoder -models can be used for the generation of corrections, in addition to error -identification, which is of interest for certain end-user applications. We show -that a character-based encoder-decoder model is particularly effective, -outperforming other results on the AESW Shared Task on its own, and showing -gains over a word-based counterpart. Our final model--a combination of three -character-based encoder-decoder models, one word-based encoder-decoder model, -and a sentence-level CNN--is the highest performing system on the AESW 2016 -binary prediction Shared Task. -" -2718,1604.04802,Nazneen Fatema Rajani and Raymond J. Mooney,Supervised and Unsupervised Ensembling for Knowledge Base Population,cs.CL cs.LG," We present results on combining supervised and unsupervised methods to -ensemble multiple systems for two popular Knowledge Base Population (KBP) -tasks, Cold Start Slot Filling (CSSF) and Tri-lingual Entity Discovery and -Linking (TEDL). We demonstrate that our combined system along with auxiliary -features outperforms the best performing system for both tasks in the 2015 -competition, several ensembling baselines, as well as the state-of-the-art -stacking approach to ensembling KBP systems. The success of our technique on -two different and challenging problems demonstrates the power and generality of -our combined approach to ensembling. -" -2719,1604.04835,"Han Xiao, Minlie Huang, Xiaoyan Zhu","SSP: Semantic Space Projection for Knowledge Graph Embedding with Text - Descriptions",cs.CL cs.LG," Knowledge representation is an important, long-history topic in AI, and there -have been a large amount of work for knowledge graph embedding which projects -symbolic entities and relations into low-dimensional, real-valued vector space. -However, most embedding methods merely concentrate on data fitting and ignore -the explicit semantic expression, leading to uninterpretable representations. -Thus, traditional embedding methods have limited potentials for many -applications such as question answering, and entity classification. To this -end, this paper proposes a semantic representation method for knowledge graph -\textbf{(KSR)}, which imposes a two-level hierarchical generative process that -globally extracts many aspects and then locally assigns a specific category in -each aspect for every triple. Since both aspects and categories are -semantics-relevant, the collection of categories in each aspect is treated as -the semantic representation of this triple. Extensive experiments justify our -model outperforms other state-of-the-art baselines substantially. -" -2720,1604.04873,"Andreas Scherbakov, Ekaterina Vylomova, Fei Liu, Timothy Baldwin",From Incremental Meaning to Semantic Unit (phrase by phrase),cs.CL," This paper describes an experimental approach to Detection of Minimal -Semantic Units and their Meaning (DiMSUM), explored within the framework of -SemEval 2016 Task 10. The approach is primarily based on a combination of word -embeddings and parserbased features, and employs unidirectional incremental -computation of compositional embeddings for multiword expressions. 
-" -2721,1604.05073,"Daniel Beck, Adri\`a de Gispert, Gonzalo Iglesias, Aurelien Waite, - Bill Byrne","Speed-Constrained Tuning for Statistical Machine Translation Using - Bayesian Optimization",cs.CL," We address the problem of automatically finding the parameters of a -statistical machine translation system that maximize BLEU scores while ensuring -that decoding speed exceeds a minimum value. We propose the use of Bayesian -Optimization to efficiently tune the speed-related decoding parameters by -easily incorporating speed as a noisy constraint function. The obtained -parameter values are guaranteed to satisfy the speed constraint with an -associated confidence margin. Across three language pairs and two speed -constraint values, we report overall optimization time reduction compared to -grid and random search. We also show that Bayesian Optimization can decouple -speed and BLEU measurements, resulting in a further reduction of overall -optimization time as speed is measured over a small subset of sentences. -" -2722,1604.05372,"Andrey Kutuzov, Mikhail Kopotev, Tatyana Sviridenko, Lyubov Ivanova","Clustering Comparable Corpora of Russian and Ukrainian Academic Texts: - Word Embeddings and Semantic Fingerprints",cs.CL," We present our experience in applying distributional semantics (neural word -embeddings) to the problem of representing and clustering documents in a -bilingual comparable corpus. Our data is a collection of Russian and Ukrainian -academic texts, for which topics are their academic fields. In order to build -language-independent semantic representations of these documents, we train -neural distributional models on monolingual corpora and learn the optimal -linear transformation of vectors from one language to another. The resulting -vectors are then used to produce `semantic fingerprints' of documents, serving -as input to a clustering algorithm. - The presented method is compared to several baselines including `orthographic -translation' with Levenshtein edit distance and outperforms them by a large -margin. We also show that language-independent `semantic fingerprints' are -superior to multi-lingual clustering algorithms proposed in the previous work, -at the same time requiring less linguistic resources. -" -2723,1604.05468,"Rahul Kamath, Masanao Ochi, Yutaka Matsuo","Understanding Rating Behaviour and Predicting Ratings by Identifying - Representative Users",cs.IR cs.AI cs.CL cs.LG," Online user reviews describing various products and services are now abundant -on the web. While the information conveyed through review texts and ratings is -easily comprehensible, there is a wealth of hidden information in them that is -not immediately obvious. In this study, we unlock this hidden value behind user -reviews to understand the various dimensions along which users rate products. -We learn a set of users that represent each of these dimensions and use their -ratings to predict product ratings. Specifically, we work with restaurant -reviews to identify users whose ratings are influenced by dimensions like -'Service', 'Atmosphere' etc. in order to predict restaurant ratings and -understand the variation in rating behaviour across different cuisines. While -previous approaches to obtaining product ratings require either a large number -of user ratings or a few review texts, we show that it is possible to predict -ratings with few user ratings and no review text. Our experiments show that our -approach outperforms other conventional methods by 16-27% in terms of RMSE. 
-" -2724,1604.05499,"Yijia Liu, Wanxiang Che, Jiang Guo, Bing Qin, Ting Liu",Exploring Segment Representations for Neural Segmentation Models,cs.CL," Many natural language processing (NLP) tasks can be generalized into -segmentation problem. In this paper, we combine semi-CRF with neural network to -solve NLP segmentation tasks. Our model represents a segment both by composing -the input units and embedding the entire segment. We thoroughly study different -composition functions and different segment embeddings. We conduct extensive -experiments on two typical segmentation tasks: named entity recognition (NER) -and Chinese word segmentation (CWS). Experimental results show that our neural -semi-CRF model benefits from representing the entire segment and achieves the -state-of-the-art performance on CWS benchmark dataset and competitive results -on the CoNLL03 dataset. -" -2725,1604.05519,Lingxun Meng and Yan Li,"M$^2$S-Net: Multi-Modal Similarity Metric Learning based Deep - Convolutional Network for Answer Selection",cs.CL," Recent works using artificial neural networks based on distributed word -representation greatly boost performance on various natural language processing -tasks, especially the answer selection problem. Nevertheless, most of the -previous works used deep learning methods (like LSTM-RNN, CNN, etc.) only to -capture semantic representation of each sentence separately, without -considering the interdependence between each other. In this paper, we propose a -novel end-to-end learning framework which constitutes deep convolutional neural -network based on multi-modal similarity metric learning (M$^2$S-Net) on -pairwise tokens. The proposed model demonstrates its performance by surpassing -previous state-of-the-art systems on the answer selection benchmark, i.e., -TREC-QA dataset, in both MAP and MRR metrics. -" -2726,1604.05525,"Sonse Shimaoka, Pontus Stenetorp, Kentaro Inui, Sebastian Riedel","An Attentive Neural Architecture for Fine-grained Entity Type - Classification",cs.CL," In this work we propose a novel attention-based neural network model for the -task of fine-grained entity type classification that unlike previously proposed -models recursively composes representations of entity mention contexts. Our -model achieves state-of-the-art performance with 74.94% loose micro F1-score on -the well-established FIGER dataset, a relative improvement of 2.59%. We also -investigate the behavior of the attention mechanism of our model and observe -that it can learn contextual linguistic expressions that indicate the -fine-grained category memberships of an entity. -" -2727,1604.05529,"Barbara Plank, Anders S{\o}gaard, Yoav Goldberg","Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term - Memory Models and Auxiliary Loss",cs.CL," Bidirectional long short-term memory (bi-LSTM) networks have recently proven -successful for various NLP sequence modeling tasks, but little is known about -their reliance to input representations, target languages, data set size, and -label noise. We address these issues and evaluate bi-LSTMs with word, -character, and unicode byte embeddings for POS tagging. We compare bi-LSTMs to -traditional POS taggers across languages and data sizes. We also present a -novel bi-LSTM model, which combines the POS tagging loss function with an -auxiliary loss function that accounts for rare words. The model obtains -state-of-the-art performance across 22 languages, and works especially well for -morphologically complex languages. 
Our analysis suggests that bi-LSTMs are less
-sensitive to training data size and label corruptions (at small noise levels)
-than previously assumed.
-"
-2728,1604.05559,"Melvyn Drag, Gauthaman Vasudevan",Efficient Calculation of Bigram Frequencies in a Corpus of Short Texts,cs.CL," We show that an efficient and popular method for calculating bigram
-frequencies is unsuitable for bodies of short texts and offer a simple
-alternative. Our method has the same computational complexity as the old method
-and offers an exact count instead of an approximation.
-"
-2729,1604.05747,Francesco Elia,"Syntactic and semantic classification of verb arguments using
-  dependency-based and rich semantic features",cs.CL," Corpus Pattern Analysis (CPA) has been the topic of SemEval 2015 Task 15,
-aimed at producing a system that can aid lexicographers in their efforts to
-build a dictionary of meanings for English verbs using the CPA annotation
-process. CPA parsing is one of the subtasks that make up this annotation
-process, and it is the focus of this report. A supervised machine-learning
-approach has been implemented, in which syntactic features derived from parse
-trees and semantic features derived from WordNet and word embeddings are used.
-It is shown that this approach performs well, even with the data sparsity
-issues that characterize the dataset, and can obtain better results than other
-systems by a margin of about 4% F-score.
-"
-2730,1604.05781,"Thomas C. McAndrew, Joshua C. Bongard, Christopher M. Danforth, Peter
-  S. Dodds, Paul D. H. Hines, and James P. Bagrow","What we write about when we write about causality: Features of causal
-  statements across large-scale social discourse",cs.CY cs.CL cs.SI," Identifying and communicating relationships between causes and effects is
-important for understanding our world, but is affected by language structure,
-cognitive and emotional biases, and the properties of the communication medium.
-Despite the increasing importance of social media, much remains unknown about
-causal statements made online. To study real-world causal attribution, we
-extract a large-scale corpus of causal statements made on the Twitter social
-network platform as well as a comparable random control corpus. We compare
-causal and control statements using statistical language and sentiment analysis
-tools. We find that causal statements have a number of significant lexical and
-grammatical differences compared with controls and tend to be more negative in
-sentiment than controls. Causal statements made online tend to focus on news
-and current events, medicine and health, or interpersonal relationships, as
-shown by topic models. By quantifying the features and potential biases of
-causality communication, this study improves our understanding of the accuracy
-of information and opinions found online.
-"
-2731,1604.05800,"Qingyu Yin, Weinan Zhang, Yu Zhang, Ting Liu",A Deep Neural Network for Chinese Zero Pronoun Resolution,cs.CL," Existing approaches for Chinese zero pronoun resolution overlook semantic
-information. This is because zero pronouns have no descriptive information,
-which results in difficulty in explicitly capturing their semantic similarities
-with antecedents. Moreover, when dealing with candidate antecedents,
-traditional systems simply take advantage of the local information of a single
-candidate antecedent while failing to consider the underlying information
-provided by the other candidates from a global perspective.
To address these
-weaknesses, we propose a novel zero pronoun-specific neural network, which is
-capable of representing zero pronouns by utilizing the contextual information
-at the semantic level. In addition, when dealing with candidate antecedents, a
-two-level candidate encoder is employed to explicitly capture both the local
-and global information of candidate antecedents. We conduct experiments on the
-Chinese portion of the OntoNotes 5.0 corpus. Experimental results show that our
-approach substantially outperforms the state-of-the-art method in various
-experimental settings.
-"
-2732,1604.05875,"Tiep Mai, Bichen Shi, Patrick K. Nicholson, Deepak Ajwani, Alessandra
-  Sala",Distributed Entity Disambiguation with Per-Mention Learning,cs.CL cs.IR," Entity disambiguation, or mapping a phrase to its canonical representation in
-a knowledge base, is a fundamental step in many natural language processing
-applications. Existing techniques based on global ranking models fail to
-capture the individual peculiarities of the words and hence either struggle to
-meet the accuracy requirements of many real-world applications or are too
-complex to satisfy the real-time constraints of applications.
-  In this paper, we propose a new disambiguation system that learns specialized
-features and models for disambiguating each ambiguous phrase in the English
-language. To train and validate the hundreds of thousands of learning models
-for this purpose, we use a Wikipedia hyperlink dataset with more than 170
-million labelled annotations. We provide an extensive experimental evaluation
-to show that the accuracy of our approach compares favourably with respect to
-many state-of-the-art disambiguation systems. The training required for our
-approach can be easily distributed over a cluster. Furthermore, updating our
-system for new entities or calibrating it for special ones is a computationally
-fast process that does not affect the disambiguation of the other entities.
-"
-2733,1604.05878,"Johannes Welbl, Guillaume Bouchard, Sebastian Riedel","A Factorization Machine Framework for Testing Bigram Embeddings in
-  Knowledgebase Completion",cs.CL cs.AI cs.NE stat.ML," Embedding-based Knowledge Base Completion models have so far mostly combined
-distributed representations of individual entities or relations to compute
-truth scores of missing links. Facts can, however, also be represented using
-pairwise embeddings, i.e. embeddings for pairs of entities and relations. In
-this paper we explore such bigram embeddings with a flexible Factorization
-Machine model and several ablations from it. We investigate the relevance of
-various bigram types on the fb15k237 dataset and find relative improvements
-compared to a compositional model.
-"
-2734,1604.06045,Jason Weston,Dialog-based Language Learning,cs.CL," A long-term goal of machine learning research is to build an intelligent
-dialog agent. Most research in natural language understanding has focused on
-learning from fixed training sets of labeled data, with supervision either at
-the word level (tagging, parsing tasks) or sentence level (question answering,
-machine translation). This kind of supervision is not representative of how humans
-learn, where language is both learned by, and used for, communication. In this
-work, we study dialog-based language learning, where supervision is given
-naturally and implicitly in the response of the dialog partner during the
-conversation.
We study this setup in two domains: the bAbI dataset of (Weston
-et al., 2015) and large-scale question answering from (Dodge et al., 2015). We
-evaluate a set of baseline learning strategies on these tasks, and show that a
-novel model incorporating predictive lookahead is a promising approach for
-learning from a teacher's response. In particular, a surprising result is that
-it can learn to answer questions correctly without any reward-based supervision
-at all.
-"
-2735,1604.06076,"Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Peter Clark, Oren
-  Etzioni and Dan Roth","Question Answering via Integer Programming over Semi-Structured
-  Knowledge",cs.AI cs.CL," Answering science questions posed in natural language is an important AI
-challenge. Answering such questions often requires non-trivial inference and
-knowledge that goes beyond factoid retrieval. Yet, most systems for this task
-are based on relatively shallow Information Retrieval (IR) and statistical
-correlation techniques operating on large unstructured corpora. We propose a
-structured inference system for this task, formulated as an Integer Linear
-Program (ILP), that answers natural language questions using a semi-structured
-knowledge base derived from text, including questions requiring multi-step
-inference and a combination of multiple facts. On a dataset of real, unseen
-science questions, our system significantly outperforms (+14%) the best
-previous attempt at structured reasoning for this task, which used Markov Logic
-Networks (MLNs). It also improves upon a previous ILP formulation by 17.7%.
-When combined with unstructured inference methods, the ILP system significantly
-boosts overall performance (+10%). Finally, we show our approach is
-substantially more robust to a simple answer perturbation compared to
-statistical correlation methods.
-"
-2736,1604.06113,"Wei Chu, Ruxin Chen","Speaker Cluster-Based Speaker Adaptive Training for Deep Neural Network
-  Acoustic Modeling",cs.CL," A speaker cluster-based speaker adaptive training (SAT) method under the deep
-neural network-hidden Markov model (DNN-HMM) framework is presented in this
-paper. During training, speakers that are acoustically adjacent to each other
-are hierarchically clustered using an i-vector based distance metric. DNNs with
-speaker dependent layers are then adaptively trained for each cluster of
-speakers. Before decoding starts, an unseen speaker in the test set is matched
-to the closest speaker cluster by comparing i-vector based distances. The
-previously trained DNN of the matched speaker cluster is used for decoding
-utterances of the test speaker. The performance of the proposed method on a
-large vocabulary spontaneous speech recognition task is evaluated on a training
-set with 1500 hours of speech, and a test set of 24 speakers with 1774
-utterances. Compared to a speaker-independent DNN with a baseline word error
-rate of 11.6%, a relative 6.8% reduction in word error rate is observed from
-the proposed method.
-"
-2737,1604.06225,"Ido Kissos, Nachum Dershowitz","OCR Error Correction Using Character Correction and Feature-Based Word
-  Classification",cs.IR cs.CL," This paper explores the use of a learned classifier for post-OCR text
-correction. Experiments with the Arabic language show that this approach, which
-integrates a weighted confusion matrix and a shallow language model, corrects
-the vast majority of segmentation and recognition errors, the most frequent
-types of error on our dataset.
-" -2738,1604.06274,"Qixin Wang, Tianyi Luo, Dong Wang, Chao Xing",Chinese Song Iambics Generation with Neural Attention-based Model,cs.CL," Learning and generating Chinese poems is a charming yet challenging task. -Traditional approaches involve various language modeling and machine -translation techniques, however, they perform not as well when generating poems -with complex pattern constraints, for example Song iambics, a famous type of -poems that involve variable-length sentences and strict rhythmic patterns. This -paper applies the attention-based sequence-to-sequence model to generate -Chinese Song iambics. Specifically, we encode the cue sentences by a -bi-directional Long-Short Term Memory (LSTM) model and then predict the entire -iambic with the information provided by the encoder, in the form of an -attention-based LSTM that can regularize the generation process by the fine -structure of the input cues. Several techniques are investigated to improve the -model, including global context integration, hybrid style training, character -vector initialization and adaptation. Both the automatic and subjective -evaluation results show that our model indeed can learn the complex structural -and rhythmic patterns of Song iambics, and the generation is rather successful. -" -2739,1604.06285,"Longyue Wang, Zhaopeng Tu, Xiaojun Zhang, Hang Li, Andy Way, Qun Liu",A Novel Approach to Dropped Pronoun Translation,cs.CL," Dropped Pronouns (DP) in which pronouns are frequently dropped in the source -language but should be retained in the target language are challenge in machine -translation. In response to this problem, we propose a semi-supervised approach -to recall possibly missing pronouns in the translation. Firstly, we build -training data for DP generation in which the DPs are automatically labelled -according to the alignment information from a parallel corpus. Secondly, we -build a deep learning-based DP generator for input sentences in decoding when -no corresponding references exist. More specifically, the generation is -two-phase: (1) DP position detection, which is modeled as a sequential -labelling task with recurrent neural networks; and (2) DP prediction, which -employs a multilayer perceptron with rich features. Finally, we integrate the -above outputs into our translation system to recall missing pronouns by both -extracting rules from the DP-labelled training data and translating the -DP-generated input sentences. Experimental results show that our approach -achieves a significant improvement of 1.58 BLEU points in translation -performance with 66% F-score for DP generation accuracy. -" -2740,1604.06361,Patrick Verga and Andrew McCallum,Row-less Universal Schema,cs.CL," Universal schema jointly embeds knowledge bases and textual patterns to -reason about entities and relations for automatic knowledge base construction -and information extraction. In the past, entity pairs and relations were -represented as learned vectors with compatibility determined by a scoring -function, limiting generalization to unseen text patterns and entities. -Recently, 'column-less' versions of Universal Schema have used compositional -pattern encoders to generalize to all text patterns. In this work we take the -next step and propose a 'row-less' model of universal schema, removing explicit -entity pair representations. Instead of learning vector representations for -each entity pair in our training set, we treat an entity pair as a function of -its relation types. 
In experimental results on the FB15k-237 benchmark we
-demonstrate that we can match the performance of a comparable model with
-explicit entity pair representations using a model of attention over relation
-types. We further demonstrate that the model performs with nearly the same
-accuracy on entity pairs never seen during training.
-"
-2741,1604.06529,"Adhiguna Kuncoro, Yuichiro Sawai, Kevin Duh, Yuji Matsumoto",Dependency Parsing with LSTMs: An Empirical Evaluation,cs.CL cs.LG cs.NE," We propose a transition-based dependency parser using Recurrent Neural
-Networks with Long Short-Term Memory (LSTM) units. This extends the feedforward
-neural network parser of Chen and Manning (2014) and enables modelling of
-entire sequences of shift/reduce transition decisions. On the Google Web
-Treebank, our LSTM parser is competitive with the best feedforward parser on
-overall accuracy and notably achieves more than 3% improvement for long-range
-dependencies, which have proved difficult for previous transition-based parsers
-due to error propagation and limited context information. Our findings
-additionally suggest that dropout regularisation on the embedding layer is
-crucial to improve the LSTM's generalisation.
-"
-2742,1604.06583,"Elena Volodina and Ildik\'o Pil\'an and Ingegerd Enstr\""om and Lorena
-  Llozhi and Peter Lundkvist and Gunl\""og Sundberg and Monica Sandell","SweLL on the rise: Swedish Learner Language corpus for European
-  Reference Level studies",cs.CL," We present a new resource for Swedish, SweLL, a corpus of Swedish Learner
-essays linked to learners' performance according to the Common European
-Framework of Reference (CEFR). SweLL consists of three subcorpora - SpIn,
-SW1203 and Tisus, collected from three different educational establishments.
-The common metadata for all subcorpora includes age, gender, native languages,
-time of residence in Sweden, type of written task. Depending on the subcorpus,
-learner texts may contain additional information, such as text genres, topics,
-grades. Five of the six CEFR levels are represented in the corpus: A1, A2, B1,
-B2 and C1, comprising in total 339 essays. C2 level is not included since
-courses at C2 level are not offered. The work flow consists of collection of
-essays and permits, essay digitization and registration, meta-data annotation,
-and automatic linguistic annotation. Inter-rater agreement is presented on the
-basis of the SW1203 subcorpus. The work on SweLL is still ongoing with more than
-100 essays waiting in the pipeline. This article both describes the resource
-and the ""how-to"" behind the compilation of SweLL.
-"
-2743,1604.06635,"Peng Qian, Xipeng Qiu, Xuanjing Huang",Bridging LSTM Architecture and the Neural Dynamics during Reading,cs.CL cs.AI cs.LG cs.NE," Recently, the long short-term memory neural network (LSTM) has attracted wide
-interest due to its success in many tasks. The LSTM architecture consists of a
-memory cell and three gates, which looks similar to the neuronal networks in
-the brain. However, there is still a lack of evidence for the cognitive
-plausibility of the LSTM architecture, as well as for its working mechanism. In
-this paper, we study the cognitive plausibility of LSTM by aligning its internal
-architecture with the brain activity observed via fMRI when the subjects read a
-story.
Experimental results show that the artificial memory vector in LSTM can
-accurately predict the observed sequential brain activities, indicating the
-correlation between the LSTM architecture and the cognitive process of story
-reading.
-"
-2744,1604.06648,Denis Gordeev,"Automatic verbal aggression detection for Russian and American
-  imageboards",cs.CL," The problem of aggression in Internet communities is rampant. Anonymous
-forums, usually called imageboards, are notorious for their aggressive and
-deviant behaviour even in comparison with other Internet communities. This
-study is aimed at finding ways of automatically detecting verbal expressions of
-aggression for the most popular American (4chan.org) and Russian (2ch.hk)
-imageboards. A set of 1,802,789 messages was used for this study. The machine
-learning algorithm word2vec was applied to detect the state of aggression. A
-decent result is obtained for English (88%); the results for Russian are yet to
-be improved.
-"
-2745,1604.06650,Rodmonga Potapova and Denis Gordeev,Detecting state of aggression in sentences using CNN,cs.CL," In this article we study verbal expression of aggression and its detection
-using machine learning and neural network methods. We test our results using
-our corpora of messages from anonymous imageboards. We also compare a Random
-Forest classifier with a convolutional neural network on the ""Movie reviews
-with one sentence per review"" corpus.
-"
-2746,1604.06721,"Manfred Eppe, Sean Trott, Jerome Feldman","Exploiting Deep Semantics and Compositionality of Natural Language for
-  Human-Robot-Interaction",cs.AI cs.CL cs.RO," We develop a natural language interface for human robot interaction that
-implements reasoning about deep semantics in natural language. To realize the
-required deep analysis, we employ methods from cognitive linguistics, namely
-the modular and compositional framework of Embodied Construction Grammar (ECG)
-[Feldman, 2009]. Using ECG, robots are able to solve fine-grained reference
-resolution problems and other issues related to deep semantics and
-compositionality of natural language. This also includes verbal interaction
-with humans to clarify commands and queries that are too ambiguous to be
-executed safely. We implement our NLU framework as a ROS package and present
-proof-of-concept scenarios with different robots, as well as a survey on the
-state of the art.
-"
-2747,1604.06896,"Wenpeng Yin, Hinrich Sch\""utze","Why and How to Pay Different Attention to Phrase Alignments of Different
-  Intensities",cs.CL," This work studies comparatively two typical sentence pair classification
-tasks: textual entailment (TE) and answer selection (AS), observing that phrase
-alignments of different intensities contribute differently in these tasks. We
-address the problems of identifying phrase alignments of flexible granularity
-and pooling alignments of different intensities for these tasks. Examples for
-flexible granularity are alignments between two single words, between a single
-word and a phrase, and between a short phrase and a long phrase. By intensity we
-roughly mean the degree of match; it ranges from identity over surface-form
-co-occurrence, rephrasing and other semantic relatedness to unrelated words as
-in lots of parenthesis text.
Prior work (i) has limitations in phrase
-generation and representation, or (ii) conducts alignment at word and phrase
-levels by handcrafted features, or (iii) utilizes a single attention mechanism
-over alignment intensities without considering the characteristics of specific
-tasks, which limits the system's effectiveness across tasks. We propose an
-architecture based on Gated Recurrent Units that supports (i) representation
-learning of phrases of arbitrary granularity and (ii) task-specific focusing of
-phrase alignments between two sentences by attention pooling. Experimental
-results on TE and AS match our observation and are state-of-the-art.
-"
-2748,1604.06952,Fionn Murtagh and Giuseppe Iurato,"Visualization of Jacques Lacan's Registers of the Psychoanalytic Field,
-  and Discovery of Metaphor and of Metonymy. Analytical Case Study of Edgar
-  Allan Poe's ""The Purloined Letter""",cs.CL stat.ML," We start with a description of Lacan's work that we then take into our
-analytics methodology. In a first investigation, a Lacan-motivated template of
-the Poe story is fitted to the data. A segmentation of the storyline is used in
-order to map out the diachrony. Based on this, it will be shown how synchronous
-aspects, potentially related to Lacanian registers, can be sought. This
-demonstrates the effectiveness of an approach based on a model template of the
-storyline narrative. In a second and more comprehensive investigation, we
-develop an approach for revealing, that is, uncovering, Lacanian register
-relationships. Objectives of this work include the wide and general application
-of our methodology. This methodology is strongly based on the ""letting the data
-speak"" Correspondence Analysis analytics platform of Jean-Paul Benz\'ecri, that
-is also the geometric data analysis, both qualitative and quantitative
-analytics, developed by Pierre Bourdieu.
-"
-2749,1604.07236,"Arkaitz Zubiaga, Alex Voss, Rob Procter, Maria Liakata, Bo Wang, Adam
-  Tsakalidis","Towards Real-Time, Country-Level Location Classification of Worldwide
-  Tweets",cs.IR cs.CL cs.SI," In contrast to much previous work that has focused on location classification
-of tweets restricted to a specific country, here we undertake the task in a
-broader context by classifying global tweets at the country level, which is so
-far unexplored in a real-time scenario. We analyse the extent to which a
-tweet's country of origin can be determined by making use of eight
-tweet-inherent features for classification. Furthermore, we use two datasets,
-collected a year apart from each other, to analyse the extent to which a model
-trained from historical tweets can still be leveraged for classification of new
-tweets. With classification experiments on all 217 countries in our datasets,
-as well as on the top 25 countries, we offer some insights into the best use of
-tweet-inherent features for an accurate country-level classification of tweets.
-We find that the use of a single feature, such as the use of tweet content
-alone -- the most widely used feature in previous work -- leaves much to be
-desired. Choosing an appropriate combination of both tweet content and metadata
-can actually lead to substantial improvements of between 20\% and 50\%. We
-observe that tweet content, the user's self-reported location and the user's
-real name, all of which are inherent in a tweet and available in a real-time
-scenario, are particularly useful to determine the country of origin.
We also
-experiment with the applicability of a model trained on historical tweets to
-classify new tweets, finding that the choice of a particular combination of
-features whose utility does not fade over time can actually lead to comparable
-performance, avoiding the need to retrain. However, the difficulty of achieving
-accurate classification increases slightly for countries with multiple
-commonalities, especially for English- and Spanish-speaking countries.
-"
-2750,1604.07370,Christian Stab and Iryna Gurevych,Parsing Argumentation Structures in Persuasive Essays,cs.CL," In this article, we present a novel approach for parsing argumentation
-structures. We identify argument components using sequence labeling at the
-token level and apply a new joint model for detecting argumentation structures.
-The proposed model globally optimizes argument component types and
-argumentative relations using integer linear programming. We show that our
-model considerably improves the performance of base classifiers and
-significantly outperforms challenging heuristic baselines. Moreover, we
-introduce a novel corpus of persuasive essays annotated with argumentation
-structures. We show that our annotation scheme and annotation guidelines
-successfully guide human annotators to substantial agreement. This corpus and
-the annotation guidelines are freely available to ensure reproducibility and
-to encourage future research in computational argumentation.
-"
-2751,1604.07407,Vlad Niculae and Cristian Danescu-Niculescu-Mizil,Conversational Markers of Constructive Discussions,cs.CL cs.AI cs.SI physics.soc-ph stat.ML," Group discussions are essential for organizing every aspect of modern life,
-from faculty meetings to senate debates, from grant review panels to papal
-conclaves. While costly in terms of time and organization effort, group
-discussions are commonly seen as a way of reaching better decisions compared to
-solutions that do not require coordination between the individuals (e.g.
-voting)---through discussion, the sum becomes greater than the parts. However,
-this assumption is not irrefutable: anecdotal evidence of wasteful discussions
-abounds, and in our own experiments we find that over 30% of discussions are
-unproductive.
-  We propose a framework for analyzing conversational dynamics in order to
-determine whether a given task-oriented discussion is worth having or not. We
-exploit conversational patterns reflecting the flow of ideas and the balance
-between the participants, as well as their linguistic choices. We apply this
-framework to conversations naturally occurring in an online collaborative world
-exploration game developed and deployed to support this research. Using this
-setting, we show that linguistic cues and conversational patterns extracted
-from the first 20 seconds of a team discussion are predictive of whether it
-will be a wasteful or a productive one.
-"
-2752,1604.07809,Federico Nanni and Pablo Ruiz Fabo,"Entities as topic labels: Improving topic interpretability and
-  evaluability combining Entity Linking and Labeled LDA",cs.CL," In order to create a corpus exploration method providing topics that are
-easier to interpret than standard LDA topic models, here we propose combining
-two techniques called Entity Linking and Labeled LDA. Our method identifies in
-an ontology a series of descriptive labels for each document in a corpus. Then
-it generates a specific topic for each label.
Having a direct relation between
-topics and labels makes interpretation easier; using an ontology as background
-knowledge limits label ambiguity. As our topics are described with a limited
-number of clear-cut labels, they promote interpretability, and this may help
-quantitative evaluation. We illustrate the potential of the approach by
-applying it in order to define the most relevant topics addressed by each party
-in the European Parliament's fifth mandate (1999-2004).
-"
-2753,1604.08095,"Zhenhao Ge, Yingyi Tan, Aravind Ganapathiraju",Accent Classification with Phonetic Vowel Representation,cs.SD cs.CL," Previous accent classification research focused mainly on detecting accents
-with pure acoustic information without recognizing accented speech. This work
-combines phonetic knowledge such as vowels with acoustic information to build a
-Gaussian Mixture Model (GMM) classifier with Perceptual Linear Predictive (PLP)
-features, optimized by Heteroscedastic Linear Discriminant Analysis (HLDA).
-With about 20 seconds of accented speech as input, this system achieves a
-classification rate of 51% on a 7-way classification system focusing on the
-major types of accents in English, which is competitive with the
-state-of-the-art results in this field.
-"
-2754,1604.08120,Paramita Mirza,Extracting Temporal and Causal Relations between Events,cs.CL," Structured information resulting from temporal information processing is
-crucial for a variety of natural language processing tasks, for instance to
-generate timeline summarization of events from news documents, or to answer
-temporal/causal-related questions about some events. In this thesis we present
-a framework for an integrated temporal and causal relation extraction system.
-We first develop a robust extraction component for each type of relation, i.e.
-temporal order and causality. We then combine the two extraction components
-into an integrated relation extraction system, CATENA---CAusal and Temporal
-relation Extraction from NAtural language texts---, by utilizing the
-presumption about event precedence in causality, that causing events must
-happen BEFORE resulting events. Several resources and techniques to improve
-our relation extraction systems are also discussed, including word embeddings
-and training data expansion. Finally, we report our adaptation efforts of
-temporal information processing for languages other than English, namely
-Italian and Indonesian.
-"
-2755,1604.08242,"George Saon, Tom Sercu, Steven Rennie and Hong-Kwang J. Kuo",The IBM 2016 English Conversational Telephone Speech Recognition System,cs.CL," We describe a collection of acoustic and language modeling techniques that
-lowered the word error rate of our English conversational telephone LVCSR
-system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation
-testset. On the acoustic side, we use a score fusion of three strong models:
-recurrent nets with maxout activations, very deep convolutional nets with 3x3
-kernels, and bidirectional long short-term memory nets which operate on FMLLR
-and i-vector features. On the language modeling side, we use an updated model
-""M"" and hierarchical neural network LMs.
-"
-2756,1604.08504,"Linqing Liu, Yao Lu, Ye Luo, Renxian Zhang, Laurent Itti and Jianwei
-  Lu","Detecting ""Smart"" Spammers On Social Network: A Topic Model Approach",cs.CL cs.SI," Spammer detection on social networks is a challenging problem. The rigid
-anti-spam rules have resulted in the emergence of ""smart"" spammers.
They resemble
-legitimate users, which makes them difficult to identify. In this paper, we
-present a novel spammer classification approach based on Latent Dirichlet
-Allocation (LDA), a topic model. Our approach extracts both the local and the
-global information of topic distribution patterns, which capture the essence of
-spamming. Tested on one benchmark dataset and one self-collected dataset, our
-proposed method outperforms other state-of-the-art methods in terms of averaged
-F1-score.
-"
-2757,1604.08561,Ehsaneddin Asgari and Mohammad R.K. Mofrad,"Comparing Fifty Natural Languages and Twelve Genetic Languages Using
-  Word Embedding Language Divergence (WELD) as a Quantitative Measure of
-  Language Distance",cs.CL," We introduce a new measure of distance between languages based on word
-embedding, called word embedding language divergence (WELD). WELD is defined as
-the divergence between the unified similarity distributions of words across
-languages. Using such a measure, we perform language comparison for fifty natural
-languages and twelve genetic languages. Our natural language dataset is a
-collection of sentence-aligned parallel corpora from Bible translations for
-fifty languages spanning a variety of language families. Although we use
-parallel corpora, which guarantees having the same content in all languages,
-interestingly in many cases languages within the same family cluster together.
-In addition to natural languages, we perform language comparison for the coding
-regions in the genomes of 12 different organisms (4 plants, 6 animals, and two
-human subjects). Our result confirms a significant high-level difference in the
-genetic language model of humans/animals versus plants. The proposed method is
-a step toward defining a quantitative measure of similarity between languages,
-with applications in language classification, genre identification, dialect
-identification, and evaluation of translations.
-"
-2758,1604.08633,Allen Schmaltz and Alexander M. Rush and Stuart M. Shieber,Word Ordering Without Syntax,cs.CL," Recent work on word ordering has argued that syntactic structure is
-important, or even required, for effectively recovering the order of a
-sentence. We find that, in fact, an n-gram language model with a simple
-heuristic gives strong results on this task. Furthermore, we show that a long
-short-term memory (LSTM) language model is even more effective at recovering
-order, with our basic model outperforming a state-of-the-art syntactic model by
-11.5 BLEU points. Additional data and larger beams yield further gains, at the
-expense of training and search time.
-"
-2759,1604.08672,"Shufeng Xiong, Yue Zhang, Donghong Ji, Yinxia Lou",Distance Metric Learning for Aspect Phrase Grouping,cs.CL," Aspect phrase grouping is an important task in aspect-level sentiment
-analysis. It is a challenging problem due to polysemy and context dependency.
-We propose an Attention-based Deep Distance Metric Learning (ADDML) method, by
-considering aspect phrase representation as well as context representation.
-First, leveraging the characteristics of the review text, we automatically
-generate aspect phrase sample pairs for distant supervision. Second, we feed
-word embeddings of aspect phrases and their contexts into an attention-based
-neural network to learn feature representation of contexts. Both aspect phrase
-embedding and context embedding are used to learn a deep feature subspace to
-measure the distances between aspect phrases for K-means clustering.
-
-Experiments on four review datasets show that the proposed method outperforms
-strong state-of-the-art baseline methods.
-"
-2760,1604.08781,Joseph Corneli and Miriam Corneli,Teaching natural language to computers,cs.CL cs.AI," ""Natural Language,"" whether spoken and attended to by humans, or processed
-and generated by computers, requires networked structures that reflect creative
-processes in semantic, syntactic, phonetic, linguistic, social, emotional, and
-cultural modules. Being able to produce novel and useful behavior following
-repeated practice gets to the root of both artificial intelligence and human
-language. This paper investigates the modalities involved in language-like
-applications that computers -- and programmers -- engage with, and aims to
-fine-tune the questions we ask to better account for context, self-awareness,
-and embodiment.
-"
-2761,1605.00090,"Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou",Response Selection with Topic Clues for Retrieval-based Chatbots,cs.CL," We consider incorporating topic information into message-response matching to
-boost responses with rich content in retrieval-based chatbots. To this end, we
-propose a topic-aware convolutional neural tensor network (TACNTN). In TACNTN,
-matching between a message and a response not only is conducted between a
-message vector and a response vector generated by convolutional neural
-networks, but also leverages extra topic information encoded in two topic
-vectors. The two topic vectors are linear combinations of topic words of the
-message and the response respectively, where the topic words are obtained from
-a pre-trained LDA model and their weights are determined by themselves as well
-as the message vector and the response vector. The message vector, the response
-vector, and the two topic vectors are fed to neural tensors to calculate a
-matching score. An empirical study on a public data set and a human-annotated
-data set shows that TACNTN can significantly outperform state-of-the-art
-methods for message-response matching.
-"
-2762,1605.00122,"Xinyu Fu, Eugene Ch'ng, Uwe Aickelin, Lanyun Zhang","An Improved System for Sentence-level Novelty Detection in Textual
-  Streams",cs.IR cs.AI cs.CL," Novelty detection in news events has long been a difficult problem. A number
-of models performed well on specific data streams, but certain issues are far
-from being solved, particularly in large data streams from the WWW where
-unpredictability of new terms requires adaptation in the vector space model. We
-present a novel event detection system based on the Incremental Term
-Frequency-Inverse Document Frequency (TF-IDF) weighting incorporated with
-Locality Sensitive Hashing (LSH). Our system can efficiently and effectively
-adapt to the changes within the data streams of any new terms with continual
-updates to the vector space model. Regarding miss probability, our proposed
-novelty detection framework outperforms a recognised baseline system by
-approximately 16% when evaluated on a benchmark dataset from Google News.
-"
-2763,1605.00223,"Ricardo Pio Monti, Romy Lorenz, Robert Leech, Christoforos
-  Anagnostopoulos and Giovanni Montana",Text-mining the NeuroSynth corpus using Deep Boltzmann Machines,cs.LG cs.CL q-bio.NC stat.ML," Large-scale automated meta-analysis of neuroimaging data has recently
-established itself as an important tool in advancing our understanding of human
-brain function.
This research has been pioneered by NeuroSynth, a database
-collecting both brain activation coordinates and associated text across a large
-cohort of neuroimaging research papers. One of the fundamental aspects of such
-meta-analysis is text-mining. To date, word counts and more sophisticated
-methods such as Latent Dirichlet Allocation have been proposed. In this work we
-present an unsupervised study of the NeuroSynth text corpus using Deep
-Boltzmann Machines (DBMs). The use of DBMs yields several advantages over the
-aforementioned methods, principal among which is the fact that they yield both
-word and document embeddings in a high-dimensional vector space. Such
-embeddings serve to facilitate the use of traditional machine learning
-techniques on the text corpus. The proposed DBM model is shown to learn
-embeddings with a clear semantic structure.
-"
-2764,1605.00459,"Desmond Elliott, Stella Frank, Khalil Sima'an, Lucia Specia",Multi30K: Multilingual English-German Image Descriptions,cs.CL cs.CV," We introduce the Multi30K dataset to stimulate multilingual multimodal
-research. Recent advances in image description have been demonstrated on
-English-language datasets almost exclusively, but image description should not
-be limited to English. This dataset extends the Flickr30K dataset with i)
-German translations created by professional translators over a subset of the
-English descriptions, and ii) descriptions crowdsourced independently of the
-original English descriptions. We outline how the data can be used for
-multilingual image description and multimodal machine translation, but we
-anticipate the data will be useful for a broader range of tasks.
-"
-2765,1605.00482,"Geonmin Kim, Hwaran Lee, Jisu Choi, Soo-young Lee","Compositional Sentence Representation from Character within Large
-  Context Text",cs.CL," This paper describes a Hierarchical Composition Recurrent Network (HCRN)
-consisting of a 3-level hierarchy of compositional models: character, word and
-sentence. This model is designed to overcome two problems of representing a
-sentence on the basis of a constituent word sequence. The first is a
-data-sparsity problem in word embedding, and the other is the lack of use of
-inter-sentence dependency. In the HCRN, word representations are built from
-characters, thus resolving the data-sparsity problem, and inter-sentence
-dependency is embedded into sentence representation at the level of sentence
-composition. We adopt a hierarchy-wise learning scheme in order to alleviate
-the optimization difficulties of learning a deep hierarchical recurrent network
-in an end-to-end fashion. The HCRN was quantitatively and qualitatively evaluated
-on a dialogue act classification task. In particular, sentence representations
-with an inter-sentence dependency are able to capture both the implicit and
-explicit semantics of a sentence, significantly improving performance. In the
-end, the HCRN achieved state-of-the-art performance with a test error rate of
-22.7% for dialogue act classification on the SWBD-DAMSL database.
-"
-2766,1605.00855,Xirong Li and Qin Jin,Improving Image Captioning by Concept-based Sentence Reranking,cs.CV cs.CL," This paper describes our winning entry in the ImageCLEF 2015 image sentence
-generation task. We improve Google's CNN-LSTM model by introducing
-concept-based sentence reranking, a data-driven approach which exploits the
-large amounts of concept-level annotations on Flickr.
Different from previous
-usage of concept detection that is tailored to specific image captioning
-models, the proposed approach reranks predicted sentences in terms of their
-matches with detected concepts, essentially treating the underlying model as a
-black box. This property makes the approach applicable to a number of existing
-solutions. We also experiment with fine-tuning the deep language model,
-which improves the performance further. Scoring a METEOR of 0.1875 on the
-ImageCLEF 2015 test set, our system outperforms the runner-up (METEOR of
-0.1687) by a clear margin.
-"
-2767,1605.00942,"Seppo Enarvi, Mikko Kurimo",TheanoLM - An Extensible Toolkit for Neural Network Language Modeling,cs.CL cs.NE," We present a new tool for training neural network language models (NNLMs),
-scoring sentences, and generating text. The tool has been written using the
-Python library Theano, which allows researchers to easily extend it and tune any
-aspect of the training process. Despite this flexibility, Theano is able to
-generate extremely fast native code that can utilize a GPU or multiple CPU
-cores in order to parallelize the heavy numerical computations. The tool has
-been evaluated in difficult Finnish and English conversational speech
-recognition tasks, and significant improvement was obtained over our best
-back-off n-gram models. The results that we obtained in the Finnish task were
-compared to those from existing RNNLM and RWTHLM toolkits, and found to be as
-good or better, while training times were an order of magnitude shorter.
-"
-2768,1605.01194,Lavanya Sita Tekumalla and Sharmistha,"IISCNLP at SemEval-2016 Task 2: Interpretable STS with ILP based
-  Multiple Chunk Aligner",cs.CL stat.ML," The interpretable semantic textual similarity (iSTS) task adds a crucial
-explanatory layer to pairwise sentence similarity. We address various
-components of this task: chunk-level semantic alignment along with assignment
-of similarity type and score for aligned chunks with a novel system presented
-in this paper. We propose an algorithm, iMATCH, for the alignment of multiple
-non-contiguous chunks based on Integer Linear Programming (ILP). Similarity
-type and score assignment for pairs of chunks is done using a supervised
-multiclass classification technique based on a Random Forest classifier. Results
-show that our algorithm iMATCH has low execution time and outperforms most
-other participating systems in terms of alignment score. Of the three datasets,
-we are top ranked for the answer-students dataset in terms of overall score and
-have the top alignment score for the headlines dataset in the gold chunks track.
-"
-2769,1605.01326,Ramon Ferrer-i-Cancho,Compression and the origins of Zipf's law for word frequencies,cs.CL physics.data-an physics.soc-ph q-bio.NC," Here we sketch a new derivation of Zipf's law for word frequencies based on
-optimal coding. The structure of the derivation is reminiscent of Mandelbrot's
-random typing model but it has multiple advantages over random typing: (1) it
-starts from realistic cognitive pressures, (2) it does not require fine tuning
-of parameters, and (3) it sheds light on the origins of other statistical laws
-of language and thus can lead to a compact theory of linguistic laws. Our
-findings suggest that the recurrence of Zipf's law in human languages could
-originate from pressure for easy and fast communication.
-" -2770,1605.01478,"Minlie Huang, Yujie Cao, Chao Dong",Modeling Rich Contexts for Sentiment Classification with LSTM,cs.CL cs.IR cs.SI," Sentiment analysis on social media data such as tweets and weibo has become a -very important and challenging task. Due to the intrinsic properties of such -data, tweets are short, noisy, and of divergent topics, and sentiment -classification on these data requires to modeling various contexts such as the -retweet/reply history of a tweet, and the social context about authors and -relationships. While few prior study has approached the issue of modeling -contexts in tweet, this paper proposes to use a hierarchical LSTM to model rich -contexts in tweet, particularly long-range context. Experimental results show -that contexts can help us to perform sentiment classification remarkably -better. -" -2771,1605.01635,"Seyed Omid Sadjadi, Jason Pelecanos, Sriram Ganapathy",The IBM Speaker Recognition System: Recent Advances and Error Analysis,cs.CL cs.SD stat.ML," We present the recent advances along with an error analysis of the IBM -speaker recognition system for conversational speech. Some of the key -advancements that contribute to our system include: a nearest-neighbor -discriminant analysis (NDA) approach (as opposed to LDA) for intersession -variability compensation in the i-vector space, the application of speaker and -channel-adapted features derived from an automatic speech recognition (ASR) -system for speaker recognition, and the use of a DNN acoustic model with a very -large number of output units (~10k senones) to compute the frame-level soft -alignments required in the i-vector estimation process. We evaluate these -techniques on the NIST 2010 SRE extended core conditions (C1-C9), as well as -the 10sec-10sec condition. To our knowledge, results achieved by our system -represent the best performances published to date on these conditions. For -example, on the extended tel-tel condition (C5) the system achieves an EER of -0.59%. To garner further understanding of the remaining errors (on C5), we -examine the recordings associated with the low scoring target trials, where -various issues are identified for the problematic recordings/trials. -Interestingly, it is observed that correcting the pathological recordings not -only improves the scores for the target trials but also for the nontarget -trials. -" -2772,1605.01652,"Phong Le, Marc Dymetman, Jean-Michel Renders",LSTM-based Mixture-of-Experts for Knowledge-Aware Dialogues,cs.AI cs.CL," We introduce an LSTM-based method for dynamically integrating several -word-prediction experts to obtain a conditional language model which can be -good simultaneously at several subtasks. We illustrate this general approach -with an application to dialogue where we integrate a neural chat model, good at -conversational aspects, with a neural question-answering model, good at -retrieving precise information from a knowledge-base, and show how the -integration combines the strengths of the independent components. We hope that -this focused contribution will attract attention on the benefits of using such -mixtures of experts in NLP. -" -2773,1605.01655,"Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko",Stance and Sentiment in Tweets,cs.CL," We can often detect from a person's utterances whether he/she is in favor of -or against a given target entity -- their stance towards the target. However, a -person may express the same stance towards a target by using negative or -positive language. 
Here for the first time we present a dataset of
-tweet--target pairs annotated for both stance and sentiment. The targets may or
-may not be referred to in the tweets, and they may or may not be the target of
-opinion in the tweets. Partitions of this dataset were used as training and
-test sets in a SemEval-2016 shared task competition. We propose a simple stance
-detection system that outperforms submissions from all 19 teams that
-participated in the shared task. Additionally, access to both stance and
-sentiment annotations allows us to explore several research questions. We show
-that while knowing the sentiment expressed by a tweet is beneficial for stance
-classification, it alone is not sufficient. Finally, we use additional
-unlabeled data through distant supervision techniques and word embeddings to
-further improve stance classification.
-"
-2774,1605.01661,"R. Ferrer-i-Cancho, D. Lusseau and B. McCowan",Parallels of human language in the behavior of bottlenose dolphins,q-bio.NC cs.CL," A short review of similarities between dolphins and humans with the help of
-quantitative linguistics and information theory.
-"
-2775,1605.01744,"Mengke Hu, David Cinciruk, and John MacLaren Walsh","Improving Automated Patent Claim Parsing: Dataset, System, and
-  Experiments",cs.CL," Off-the-shelf natural language processing software performs poorly when
-parsing patent claims owing to their use of irregular language relative to the
-corpora built from news articles and the web typically utilized to train this
-software. Stopping short of the extensive and expensive process of accumulating
-a large enough dataset to completely retrain parsers for patent claims, we
-propose a method of adapting existing natural language processing software to
-patent claims via forced part-of-speech tag correction. An Amazon
-Mechanical Turk collection campaign organized to generate a public corpus to
-train such an improved claim parsing system is discussed, identifying lessons
-learned during the campaign that can be of use in future NLP dataset collection
-campaigns with AMT. Experiments utilizing this corpus and other patent claim
-sets measure the parsing performance improvement garnered via the claim parsing
-system. Finally, the utility of the improved claim parsing system within other
-patent processing applications is demonstrated via experiments showing improved
-automated patent subject classification when the new claim parsing system is
-utilized to generate the features.
-"
-2776,1605.01845,Ildik\'o Pil\'an,"Detecting Context Dependence in Exercise Item Candidates Selected from
-  Corpora",cs.CL," We explore the factors influencing the dependence of single sentences on
-their larger textual context in order to automatically identify candidate
-sentences for language learning exercises from corpora which are presentable in
-isolation. An in-depth investigation of this question has not been previously
-carried out. Understanding this aspect can contribute to a more efficient
-selection of candidate sentences which, besides reducing the time required for
-item writing, can also ensure a higher degree of variability and authenticity.
-We present a set of relevant aspects collected based on the qualitative
-analysis of a smaller set of context-dependent corpus example sentences.
-Furthermore, we implemented a rule-based algorithm using these criteria which
-achieved an average precision of 0.76 for the identification of different
-issues related to context dependence.
The method has also been evaluated
-empirically: 80% of the sentences in which our system did not detect
-context-dependent elements were also considered context-independent by human
-raters.
-"
-2777,1605.01919,Scott A. Hale,User Reviews and Language: How Language Influences Ratings,cs.HC cs.CL cs.CY," The number of user reviews of tourist attractions, restaurants, mobile apps,
-etc. is increasing for all languages; yet, research is lacking on how reviews
-in multiple languages should be aggregated and displayed. Speakers of different
-languages may have consistently different experiences, e.g., different
-information available in different languages at tourist attractions or
-different user experiences with software due to
-internationalization/localization choices. This paper assesses the similarity
-in the ratings given by speakers of different languages to London tourist
-attractions on TripAdvisor. The correlations between different languages are
-generally high, but some language pairs are more correlated than others. The
-results question the common practice of computing average ratings from reviews
-in many languages.
-"
-2778,1605.02019,Christopher E Moody,Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec,cs.CL," Distributed dense word vectors have been shown to be effective at capturing
-token-level semantic and syntactic regularities in language, while topic models
-can form interpretable representations over documents. In this work, we
-describe lda2vec, a model that learns dense word vectors jointly with
-Dirichlet-distributed latent document-level mixtures of topic vectors. In
-contrast to continuous dense document representations, this formulation
-produces sparse, interpretable document mixtures through a non-negative simplex
-constraint. Our method is simple to incorporate into existing automatic
-differentiation frameworks and allows for unsupervised document representations
-geared for use by scientists while simultaneously learning word vectors and the
-linear relationships between them.
-"
-2779,1605.02129,"Franck Dernoncourt, Ji Young Lee, Trung H. Bui, and Hung H. Bui","Adobe-MIT submission to the DSTC 4 Spoken Language Understanding pilot
- task",cs.CL cs.AI cs.LG," The Dialog State Tracking Challenge 4 (DSTC 4) proposes several pilot tasks.
-In this paper, we focus on the spoken language understanding pilot task, which
-consists of tagging a given utterance with speech acts and semantic slots. We
-compare different classifiers: the best system obtains 0.52 and 0.67 F1-scores
-on the test set for speech act recognition for the tourist and the guide
-respectively, and 0.52 F1-score for semantic tagging for both the guide and the
-tourist.
-"
-2780,1605.02130,"Franck Dernoncourt, Ji Young Lee, Trung H. Bui, Hung H. Bui",Robust Dialog State Tracking for Large Ontologies,cs.CL cs.AI cs.LG," The Dialog State Tracking Challenge 4 (DSTC 4) differentiates itself from the
-previous three editions as follows: the number of slot-value pairs present in
-the ontology is much larger, no spoken language understanding output is given,
-and utterances are labeled at the subdialog level. This paper describes a novel
-dialog state tracking method designed to work robustly under these conditions,
-using elaborate string matching, coreference resolution tailored for dialogs
-and a few other improvements. The method can correctly identify many values
-that are not explicitly present in the utterance. 
On the final evaluation, our
-method came in first among 7 competing teams and 24 entries. The F1-score
-achieved by our method was 9 and 7 percentage points higher than that of the
-runner-up for the utterance-level evaluation and for the subdialog-level
-evaluation, respectively.
-"
-2781,1605.02134,"Wei-Nan Zhang, Ting Liu, Qingyu Yin, Yu Zhang",Neural Recovery Machine for Chinese Dropped Pronoun,cs.CL," Dropped pronouns (DPs) are ubiquitous in pro-drop languages such as Chinese
-and Japanese. Previous work mainly focused on painstakingly exploring the
-empirical features for DP recovery. In this paper, we propose a neural
-recovery machine (NRM) to model and recover DPs in Chinese, so as to avoid
-the non-trivial feature engineering process. The experimental results show that
-the proposed NRM significantly outperforms the state-of-the-art approaches on
-two heterogeneous datasets. Further experimental results on Chinese zero
-pronoun (ZP) resolution show that the performance of ZP resolution can also be
-improved by recovering ZPs as DPs.
-"
-2782,1605.02150,"Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong, Fang Chen","On Improving Informativity and Grammaticality for Multi-Sentence
- Compression",cs.CL," Multi Sentence Compression (MSC) is of great value to many real-world
-applications, such as guided microblog summarization, opinion summarization and
-newswire summarization. Recently, word graph-based approaches have been
-proposed and become popular in MSC. Their key assumption is that redundancy
-among a set of related sentences provides a reliable way to generate
-informative and grammatical sentences. In this paper, we propose an effective
-approach to enhance word graph-based MSC and tackle the issue that most
-state-of-the-art MSC approaches are confronted with, i.e., improving both
-informativity and grammaticality at the same time. Our approach consists of
-three main components: (1) a merging method based on Multiword Expressions
-(MWE); (2) a mapping strategy based on synonymy between words; (3) a re-ranking
-step to identify the best compression candidates generated using a POS-based
-language model (POS-LM). We demonstrate the effectiveness of this novel
-approach using a dataset made of clusters of English newswire sentences. The
-observed improvements in informativity and grammaticality of the generated
-compressions show that our approach is superior to state-of-the-art MSC
-methods.
-"
-2783,1605.02257,"Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Meredith Green,
- Kathryn Conger, Tim O'Gorman, Martha Palmer",A corpus of preposition supersenses in English web reviews,cs.CL," We present the first corpus annotated with preposition supersenses,
-unlexicalized categories for semantic functions that can be marked by English
-prepositions (Schneider et al., 2015). That scheme improves upon its
-predecessors to better facilitate comprehensive manual annotation. Moreover,
-unlike the previous schemes, the preposition supersenses are organized
-hierarchically. Our data will be publicly released on the web upon publication.
-"
-2784,1605.02276,"Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer",Problems With Evaluation of Word Embeddings Using Word Similarity Tasks,cs.CL," Lacking standardized extrinsic evaluation methods for vector representations
-of words, the NLP community has relied heavily on word similarity tasks as a
-proxy for intrinsic evaluation of word vectors. 
Word similarity evaluation,
-which correlates the distance between vectors with human judgments of semantic
-similarity, is attractive because it is computationally inexpensive and fast.
-In this paper we present several problems associated with the evaluation of
-word vectors on word similarity datasets, and summarize existing solutions. Our
-study suggests that the use of word similarity tasks for evaluation of word
-vectors is not sustainable and calls for further research on evaluation
-methods.
-"
-2785,1605.02442,M. Syamala Devi and Himani Mittal,"Machine Learning Techniques with Ontology for Subjective Answer
- Evaluation",cs.AI cs.CL cs.IR," Computerized evaluation of English essays is performed using machine learning
-techniques such as Latent Semantic Analysis (LSA), Generalized LSA, Bilingual
-Evaluation Understudy and Maximum Entropy. Ontology, a concept map of domain
-knowledge, can enhance the performance of these techniques. Use of Ontology
-makes the evaluation process holistic, as the presence of keywords, synonyms,
-the right word combinations and coverage of concepts can be checked. In this
-paper, the above-mentioned techniques are implemented both with and without
-Ontology and tested on common input data consisting of technical answers from
-Computer Science. A domain Ontology of Computer Graphics is designed and
-developed. The software used for implementation includes the Java programming
-language and tools such as MATLAB, Prot\'eg\'e, etc. Ten questions from
-Computer Graphics with sixty answers for each question are used for testing.
-The results are analyzed and it is concluded that the results are more
-accurate with the use of Ontology.
-"
-2786,1605.02457,Tobias Kuhn,The Controlled Natural Language of Randall Munroe's Thing Explainer,cs.CL," It is rare that texts or entire books written in a Controlled Natural
-Language (CNL) become very popular, but exactly this has happened with a book
-published last year. Randall Munroe's Thing Explainer uses only the 1,000 most
-frequently used words of the English language together with drawn pictures to
-explain complicated things such as nuclear reactors, jet engines, the solar
-system, and dishwashers. This restricted language is a very interesting new
-case for the CNL community. I describe here its place in the context of
-existing approaches to Controlled Natural Languages, and I provide a first
-analysis from a scientific perspective, covering the word production rules and
-word distributions.
-"
-2787,1605.02592,"Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault",GLEU Without Tuning,cs.CL," The GLEU metric was proposed for evaluating grammatical error corrections
-using n-gram overlap with a set of reference sentences, as opposed to
-precision/recall of specific annotated errors (Napoles et al., 2015). This
-paper describes improvements made to the GLEU metric that address problems that
-arise when using an increasing number of reference sets. Unlike the originally
-presented metric, the modified metric does not require tuning. We recommend
-that this version be used instead of the original version.
-"
-2788,1605.02697,Mateusz Malinowski and Marcus Rohrbach and Mario Fritz,Ask Your Neurons: A Deep Learning Approach to Visual Question Answering,cs.CV cs.AI cs.CL," We address a question answering task on real-world images that is set up as a
-Visual Turing Test. 
By combining the latest advances in image representation and
-natural language processing, we propose Ask Your Neurons, a scalable, jointly
-trained, end-to-end formulation of this problem.
- In contrast to previous efforts, we face a multi-modal problem where
-the language output (answer) is conditioned on visual and natural language
-inputs (image and question). We provide additional insights into the problem by
-analyzing how much information is contained in the language part alone, for
-which we provide a new human baseline. To study human consensus, which is
-related to the ambiguities inherent in this challenging task, we propose two
-novel metrics and collect additional answers which extend the original DAQUAR
-dataset to DAQUAR-Consensus.
- Moreover, we also extend our analysis to VQA, a large-scale dataset for
-question answering about images, where we investigate some particular design
-choices and show the importance of stronger visual models. At the same time,
-our model achieves strong performance while still using a global image
-representation. Finally, based on this analysis, we refine Ask Your Neurons
-on DAQUAR, which also leads to better performance on this challenging task.
-"
-2789,1605.02916,"Pawe{\l} {\L}ozi\'nski, Dariusz Czerski, Mieczys{\l}aw A. K{\l}opotek",Grammatical Case Based IS-A Relation Extraction with Boosting for Polish,cs.CL cs.IR," Pattern-based methods of IS-A relation extraction rely heavily on so-called
-Hearst patterns. These are ways of expressing instance enumerations of a class
-in natural language. While these lexico-syntactic patterns prove quite useful,
-they may not capture all taxonomical relations expressed in text. Therefore in
-this paper we describe a novel method of IS-A relation extraction from
-patterns, which uses morpho-syntactic annotations along with the grammatical
-case of noun phrases that constitute entities participating in the IS-A
-relation. We also describe a method for increasing the number of extracted
-relations, called pseudo-subclass boosting, which has potential application in
-any pattern-based relation extraction method. Experiments were conducted on a
-corpus of about 0.5 billion web documents in the Polish language.
-"
-2790,1605.02945,"Yuval Pinter, Roi Reichart, Idan Szpektor","The Yahoo Query Treebank, V. 1.0",cs.CL cs.IR," A description and annotation guidelines for the Yahoo Webscope release of
-Query Treebank, Version 1.0, May 2016.
-"
-2791,1605.02948,"Milad Moradi, Nasser Ghadiri","Different approaches for identifying important concepts in probabilistic
- biomedical text summarization",cs.CL cs.IR," Automatic text summarization tools help users in the biomedical domain to
-acquire their intended information from various textual resources more
-efficiently. Some biomedical text summarization systems base their sentence
-selection approach on the frequency of concepts extracted from the input text.
-However, exploring measures other than frequency for identifying the valuable
-content of the input document, and considering the correlations between
-concepts, may be more useful for this type of summarization. In this paper, we
-describe a Bayesian summarizer for biomedical text documents. The Bayesian
-summarizer initially maps the input text to Unified Medical Language System
-(UMLS) concepts and then selects
-the important ones to be used as classification features. 
We introduce -different feature selection approaches to identify the most important concepts -of the text and to select the most informative content according to the -distribution of these concepts. We show that with the use of an appropriate -feature selection approach, the Bayesian biomedical summarizer can improve the -performance of summarization. We perform extensive evaluations on a corpus of -scientific papers in biomedical domain. The results show that the Bayesian -summarizer outperforms the biomedical summarizers that rely on the frequency of -concepts, the domain-independent and baseline methods based on the -Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. Moreover, -the results suggest that using the meaningfulness measure and considering the -correlations of concepts in the feature selection step lead to a significant -increase in the performance of summarization. -" -2792,1605.03148,Haitao Mi and Baskaran Sankaran and Zhiguo Wang and Abe Ittycheriah,Coverage Embedding Models for Neural Machine Translation,cs.CL," In this paper, we enhance the attention-based neural machine translation -(NMT) by adding explicit coverage embedding models to alleviate issues of -repeating and dropping translations in NMT. For each source word, our model -starts with a full coverage embedding vector to track the coverage status, and -then keeps updating it with neural networks as the translation goes. -Experiments on the large-scale Chinese-to-English task show that our enhanced -model improves the translation quality significantly on various test sets over -the strong large vocabulary NMT system. -" -2793,1605.03209,Haitao Mi and Zhiguo Wang and Abe Ittycheriah,Vocabulary Manipulation for Neural Machine Translation,cs.CL," In order to capture rich language phenomena, neural machine translation -models have to use a large vocabulary size, which requires high computing time -and large memory usage. In this paper, we alleviate this issue by introducing a -sentence-level or batch-level vocabulary, which is only a very small sub-set of -the full output vocabulary. For each sentence or batch, we only predict the -target words in its sentence-level or batch-level vocabulary. Thus, we reduce -both the computing time and the memory usage. Our method simply takes into -account the translation options of each word or phrase in the source sentence, -and picks a very small target vocabulary for each sentence based on a -word-to-word translation model or a bilingual phrase library learned from a -traditional machine translation model. Experimental results on the large-scale -English-to-French task show that our method achieves better translation -performance by 1 BLEU point over the large vocabulary neural machine -translation system of Jean et al. (2015). -" -2794,1605.03261,"Junpei Zhong and Martin Peniak and Jun Tani and Tetsuya Ogata and - Angelo Cangelosi","Sensorimotor Input as a Language Generalisation Tool: A Neurorobotics - Model for Generation and Generalisation of Noun-Verb Combinations with - Sensorimotor Inputs",cs.RO cs.CL," The paper presents a neurorobotics cognitive model to explain the -understanding and generalisation of nouns and verbs combinations when a vocal -command consisting of a verb-noun sentence is provided to a humanoid robot. 
-This generalisation process is done via grounding: different
-objects are interacted with, and associated with, different motor behaviours,
-following a learning approach inspired by developmental language acquisition in
-infants. This cognitive model is based on Multiple Time-scale Recurrent Neural
-Networks (MTRNN). With the data obtained from object manipulation tasks with a
-humanoid robot platform, the robotic agent implemented with this model can
-ground the primitive embodied structure of verbs through training with
-verb-noun combination samples. Moreover, we show that a functional hierarchical
-architecture, based on MTRNN, is able to generalise and produce novel
-combinations of noun-verb sentences. Further analyses of the learned network
-dynamics and representations also demonstrate how the generalisation is
-possible via the exploitation of this functional hierarchical recurrent
-network.
-"
-2795,1605.03284,Tian Tian and Yuezhang Li,Machine Comprehension Based on Learning to Rank,cs.CL," Machine comprehension plays an essential role in NLP and has been widely
-explored with datasets such as MCTest. However, this dataset is too simple and
-too small for learning true reasoning abilities. \cite{hermann2015teaching}
-therefore released a large-scale news article dataset and proposed a deep LSTM
-reader system for machine comprehension. However, the training process is
-expensive. We therefore try a feature-engineered approach with semantics on the
-new dataset to see how traditional machine learning techniques and semantics
-can help with machine comprehension. Meanwhile, our proposed L2R reader system
-achieves good performance with efficiency and less training data.
-"
-2796,1605.03481,"Bhuwan Dhingra, Zhong Zhou, Dylan Fitzpatrick, Michael Muehl, William
- W. Cohen",Tweet2Vec: Character-Based Distributed Representations for Social Media,cs.LG cs.CL," Text from social media provides a set of challenges that can cause
-traditional NLP approaches to fail. Informal language, spelling errors,
-abbreviations, and special characters are all commonplace in these posts,
-leading to a prohibitively large vocabulary size for word-level approaches. We
-propose a character composition model, tweet2vec, which finds vector-space
-representations of whole tweets by learning complex, non-local dependencies in
-character sequences. The proposed model outperforms a word-level baseline at
-predicting user-annotated hashtags associated with the posts, doing
-significantly better when the input contains many out-of-vocabulary words or
-unusual character sequences. Our tweet2vec encoder is publicly available.
-"
-2797,1605.03664,"Chris Kedzie, Fernando Diaz, and Kathleen McKeown",Real-Time Web Scale Event Summarization Using Sequential Decision Making,cs.CL," We present a system based on sequential decision making for the online
-summarization of massive document streams, such as those found on the web.
-Given an event of interest (e.g. ""Boston marathon bombing""), our system is able
-to filter the stream for relevance and produce a series of short text updates
-describing the event as it unfolds over time. Unlike previous work, our
-approach is able to jointly model the relevance, comprehensiveness, novelty,
-and timeliness required by time-sensitive queries. We demonstrate a 28.3%
-improvement in summary F1 and a 43.8% improvement in time-sensitive F1 metrics.
-" -2798,1605.03705,"Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, - Christopher Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele",Movie Description,cs.CV cs.CL," Audio Description (AD) provides linguistic descriptions of movies and allows -visually impaired people to follow a movie along with their peers. Such -descriptions are by design mainly visual and thus naturally form an interesting -data source for computer vision and computational linguistics. In this work we -propose a novel dataset which contains transcribed ADs, which are temporally -aligned to full length movies. In addition we also collected and aligned movie -scripts used in prior work and compare the two sources of descriptions. In -total the Large Scale Movie Description Challenge (LSMDC) contains a parallel -corpus of 118,114 sentences and video clips from 202 movies. First we -characterize the dataset by benchmarking different approaches for generating -video descriptions. Comparing ADs to scripts, we find that ADs are indeed more -visual and describe precisely what is shown rather than what should happen -according to the scripts created prior to movie production. Furthermore, we -present and compare the results of several teams who participated in a -challenge organized in the context of the workshop ""Describing and -Understanding Video & The Large Scale Movie Description Challenge (LSMDC)"", at -ICCV 2015. -" -2799,1605.03832,"Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, - Patrick Littell, David Mortensen, Alan W Black, Lori Levin and Chris Dyer","Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic - Representation Learning",cs.CL," We introduce polyglot language models, recurrent neural network models -trained to predict symbol sequences in many different languages using shared -representations of symbols and conditioning on typological information about -the language to be predicted. We apply these to the problem of modeling phone -sequences---a domain in which universal symbol inventories and -cross-linguistically shared feature representations are a natural fit. -Intrinsic evaluation on held-out perplexity, qualitative analysis of the -learned representations, and extrinsic evaluation in two downstream -applications that make use of phonetic features show (i) that polyglot models -better generalize to held-out data than comparable monolingual models and (ii) -that polyglot phonetic feature representations are of higher quality than those -learned monolingually. -" -2800,1605.03835,Kyunghyun Cho,"Noisy Parallel Approximate Decoding for Conditional Recurrent Language - Model",cs.CL cs.LG stat.ML," Recent advances in conditional recurrent language modelling have mainly -focused on network architectures (e.g., attention mechanism), learning -algorithms (e.g., scheduled sampling and sequence-level training) and novel -applications (e.g., image/video description generation, speech recognition, -etc.) On the other hand, we notice that decoding algorithms/strategies have not -been investigated as much, and it has become standard to use greedy or beam -search. In this paper, we propose a novel decoding strategy motivated by an -earlier observation that nonlinear hidden layers of a deep neural network -stretch the data manifold. The proposed strategy is embarrassingly -parallelizable without any communication overhead, while improving an existing -decoding algorithm. 
We extensively evaluate it with attention-based neural
-machine translation on the task of En->Cz translation.
-"
-2801,1605.03852,"Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Brian MacWhinney, Chris
- Dyer","Learning the Curriculum with Bayesian Optimization for Task-Specific
- Word Representation Learning",cs.CL," We use Bayesian optimization to learn curricula for word representation
-learning, optimizing performance on downstream tasks that depend on the learned
-representations as features. The curricula are modeled by a linear ranking
-function which is the scalar product of a learned weight vector and an
-engineered feature vector that characterizes the different aspects of the
-complexity of each instance in the training corpus. We show that learning the
-curriculum improves performance on a variety of downstream tasks over random
-orders and in comparison to the natural corpus order.
-"
-2802,1605.03924,"Yuezhang Li, Ronghuo Zheng, Tian Tian, Zhiting Hu, Rahul Iyer, Katia
- Sycara",Joint Embeddings of Hierarchical Categories and Entities,cs.CL," Due to the lack of structured knowledge applied in learning distributed
-representation of categories, existing work cannot incorporate category
-hierarchies into entity information. We propose a framework that embeds
-entities and categories into a semantic space by integrating structured
-knowledge and taxonomy hierarchy from large knowledge bases. The framework
-allows us to compute meaningful semantic relatedness between entities and
-categories. Compared with the previous state of the art, our framework can
-handle both single-word concepts and multiple-word concepts with superior
-performance in concept categorization and semantic relatedness.
-"
-2803,1605.03956,"Yingtao Tian, Vivek Kulkarni, Bryan Perozzi, Steven Skiena",On the Convergent Properties of Word Embedding Methods,cs.CL," Do word embeddings converge to learn similar things over different
-initializations? How repeatable are experiments with word embeddings? Are all
-word embedding techniques equally reliable? In this paper we propose evaluating
-methods for learning word representations by their consistency across
-initializations. We propose a measure to quantify the similarity of the learned
-word representations under this setting (where they are subject to different
-random initializations). Our preliminary results illustrate that our metric not
-only measures an intrinsic property of word embedding methods but also
-correlates well with other evaluation metrics on downstream tasks. We believe
-our methods are useful in characterizing robustness -- an important property
-to consider when developing new word embedding methods.
-"
-2804,1605.04002,Paul Tupper and Bobak Shahriari,"Which Learning Algorithms Can Generalize Identity-Based Rules to Novel
- Inputs?",cs.CL," We propose a novel framework for the analysis of learning algorithms that
-allows us to say when such algorithms can and cannot generalize certain
-patterns from training data to test data. In particular we focus on situations
-where the rule that must be learned concerns two components of a stimulus being
-identical. We call such a basis for discrimination an identity-based rule.
-Identity-based rules have proven to be difficult or impossible for certain
-types of learning algorithms to acquire from limited datasets. This is in
-contrast to human behaviour on similar tasks. 
Here we provide a framework for
-rigorously establishing which learning algorithms will fail at generalizing
-identity-based rules to novel stimuli. We use this framework to show that such
-algorithms are unable to generalize identity-based rules to novel inputs unless
-trained on virtually all possible inputs. We demonstrate these results
-computationally with a multilayer feedforward neural network.
-"
-2805,1605.04013,Stefano Gogioso (University of Oxford),A Corpus-based Toy Model for DisCoCat,cs.CL cs.LO math.CT," The categorical compositional distributional (DisCoCat) model of meaning
-rigorously connects distributional semantics and pregroup grammars, and has
-found a variety of applications in computational linguistics. From a more
-abstract standpoint, the DisCoCat paradigm predicates the construction of a
-mapping from syntax to categorical semantics. In this work we present a
-concrete construction of one such mapping, from a toy model of syntax for
-corpora annotated with constituent structure trees, to categorical semantics
-taking place in a category of free R-semimodules over an involutive commutative
-semiring R.
-"
-2806,1605.04072,"Pascale Fung, Dario Bertero, Yan Wan, Anik Dey, Ricky Ho Yin Chan,
- Farhad Bin Siddique, Yang Yang, Chien-Sheng Wu, Ruixi Lin",Towards Empathetic Human-Robot Interactions,cs.CL cs.AI cs.HC cs.RO," Since the late 1990s when speech companies began providing their
-customer-service software in the market, people have gotten used to speaking to
-machines. As people interact more often with voice and gesture controlled
-machines, they expect the machines to recognize different emotions, and
-understand other high-level communication features such as humor, sarcasm and
-intention. In order to make such communication possible, the machines need an
-empathy module in them which can extract emotions from human speech and
-behavior and can decide the correct response of the robot. Although research on
-empathetic robots is still at an early stage, we describe our approach using
-signal processing techniques, sentiment analysis and machine learning
-algorithms to make robots that can ""understand"" human emotion. We propose Zara
-the Supergirl as a prototype system of empathetic robots. It is a
-software-based virtual android, with an animated cartoon character to present
-itself on the screen. She will get ""smarter"" and more empathetic through her
-deep learning algorithms, and by gathering more data and learning from it. In
-this paper, we present our work so far in the areas of deep learning of emotion
-and sentiment recognition, as well as humor recognition. We hope to explore the
-future direction of android development and how it can help improve people's
-lives.
-"
-2807,1605.04122,"Richard Moot (LaBRI), Christian Retor\'e (TEXTE)",Natural Language Semantics and Computability,cs.CL cs.AI cs.CC," This paper is a reflection on the computability of natural language semantics.
-It does not contain a new model or new results in the formal semantics of
-natural language: it is rather a computational analysis of the logical models
-and algorithms currently used in natural language semantics, defined as the
-mapping of a statement to logical formulas - formulas, because a statement can
-be ambiguous. We argue that as long as possible world semantics is left out,
-one can compute the semantic representation(s) of a given statement, including
-aspects of lexical meaning. We also discuss the algorithmic complexity of this
-process.
-" -2808,1605.04227,"Madhav Nimishakavi, Uday Singh Saini and Partha Talukdar","Relation Schema Induction using Tensor Factorization with Side - Information",cs.IR cs.CL cs.DB," Given a set of documents from a specific domain (e.g., medical research -journals), how do we automatically build a Knowledge Graph (KG) for that -domain? Automatic identification of relations and their schemas, i.e., type -signature of arguments of relations (e.g., undergo(Patient, Surgery)), is an -important first step towards this goal. We refer to this problem as Relation -Schema Induction (RSI). In this paper, we propose Schema Induction using -Coupled Tensor Factorization (SICTF), a novel tensor factorization method for -relation schema induction. SICTF factorizes Open Information Extraction -(OpenIE) triples extracted from a domain corpus along with additional side -information in a principled way to induce relation schemas. To the best of our -knowledge, this is the first application of tensor factorization for the RSI -problem. Through extensive experiments on multiple real-world datasets, we find -that SICTF is not only more accurate than state-of-the-art baselines, but also -significantly faster (about 14x faster). -" -2809,1605.04238,Yuri Manin and Matilde Marcolli,Semantic Spaces,cs.CL," Any natural language can be considered as a tool for producing large -databases (consisting of texts, written, or discursive). This tool for its -description in turn requires other large databases (dictionaries, grammars -etc.). Nowadays, the notion of database is associated with computer processing -and computer memory. However, a natural language resides also in human brains -and functions in human communication, from interpersonal to intergenerational -one. We discuss in this survey/research paper mathematical, in particular -geometric, constructions, which help to bridge these two worlds. In particular, -in this paper we consider the Vector Space Model of semantics based on -frequency matrices, as used in Natural Language Processing. We investigate -underlying geometries, formulated in terms of Grassmannians, projective spaces, -and flag varieties. We formulate the relation between vector space models and -semantic spaces based on semic axes in terms of projectability of subvarieties -in Grassmannians and projective spaces. We interpret Latent Semantics as a -geometric flow on Grassmannians. We also discuss how to formulate G\""ardenfors' -notion of ""meeting of minds"" in our geometric setting. -" -2810,1605.04278,"Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia - Lam, Keiko Sophie Mori, Sebastian Garza and Boris Katz",Universal Dependencies for Learner English,cs.CL," We introduce the Treebank of Learner English (TLE), the first publicly -available syntactic treebank for English as a Second Language (ESL). The TLE -provides manually annotated POS tags and Universal Dependency (UD) trees for -5,124 sentences from the Cambridge First Certificate in English (FCE) corpus. -The UD annotations are tied to a pre-existing error annotation of the FCE, -whereby full syntactic analyses are provided for both the original and error -corrected versions of each sentence. Further on, we delineate ESL annotation -guidelines that allow for consistent syntactic treatment of ungrammatical -English. Finally, we benchmark POS tagging and dependency parsing performance -on the TLE dataset and measure the effect of grammatical errors on parsing -accuracy. 
We envision the treebank to support a wide range of linguistic and -computational research on second language acquisition as well as automatic -processing of ungrammatical language. The treebank is available at -universaldependencies.org. The annotation manual used in this project and a -graphical query engine are available at esltreebank.org. -" -2811,1605.04359,"Aman Madaan, Sunita Sarawagi","Occurrence Statistics of Entities, Relations and Types on the Web",cs.CL," The problem of collecting reliable estimates of occurrence of entities on the -open web forms the premise for this report. The models learned for tagging -entities cannot be expected to perform well when deployed on the web. This is -owing to the severe mismatch in the distributions of such entities on the web -and in the relatively diminutive training data. In this report, we build up the -case for maximum mean discrepancy for estimation of occurrence statistics of -entities on the web, taking a review of named entity disambiguation techniques -and related concepts along the way. -" -2812,1605.04462,"Tim Althoff, Kevin Clark, Jure Leskovec","Large-scale Analysis of Counseling Conversations: An Application of - Natural Language Processing to Mental Health",cs.CL cs.CY cs.SI," Mental illness is one of the most pressing public health issues of our time. -While counseling and psychotherapy can be effective treatments, our knowledge -about how to conduct successful counseling conversations has been limited due -to lack of large-scale data with labeled outcomes of the conversations. In this -paper, we present a large-scale, quantitative study on the discourse of -text-message-based counseling conversations. We develop a set of novel -computational discourse analysis methods to measure how various linguistic -aspects of conversations are correlated with conversation outcomes. Applying -techniques such as sequence-based conversation models, language model -comparisons, message clustering, and psycholinguistics-inspired word frequency -analyses, we discover actionable conversation strategies that are associated -with better conversation outcomes. -" -2813,1605.04469,"Ye Zhang, Iain Marshall, Byron C. Wallace","Rationale-Augmented Convolutional Neural Networks for Text - Classification",cs.CL," We present a new Convolutional Neural Network (CNN) model for text -classification that jointly exploits labels on documents and their component -sentences. Specifically, we consider scenarios in which annotators explicitly -mark sentences (or snippets) that support their overall document -categorization, i.e., they provide rationales. Our model exploits such -supervision via a hierarchical approach in which each document is represented -by a linear combination of the vector representations of its component -sentences. We propose a sentence-level convolutional model that estimates the -probability that a given sentence is a rationale, and we then scale the -contribution of each sentence to the aggregate document representation in -proportion to these estimates. Experiments on five classification datasets that -have document labels and associated rationales demonstrate that our approach -consistently outperforms strong baselines. Moreover, our model naturally -provides explanations for its predictions. -" -2814,1605.04475,Ryan Georgi and Fei Xia and William D. Lewis,Capturing divergence in dependency trees to improve syntactic projection,cs.CL," Obtaining syntactic parses is a crucial part of many NLP pipelines. 
However,
-most of the world's languages do not have large amounts of syntactically
-annotated corpora available for building parsers. Syntactic projection
-techniques attempt to address this issue by using parallel corpora consisting
-of resource-poor and resource-rich language pairs, taking advantage of a parser
-for the resource-rich language and word alignment between the languages to
-project the parses onto the data for the resource-poor language. These
-projection methods can suffer, however, when the two languages are divergent.
-In this paper, we investigate the possibility of using small, parallel,
-annotated corpora to automatically detect divergent structural patterns between
-two languages. These patterns can then be used to improve structural projection
-algorithms, allowing for better-performing NLP tools for resource-poor
-languages, in particular those that may not have large amounts of annotated
-data necessary for traditional, fully-supervised methods. While this detection
-process is not exhaustive, we demonstrate that common patterns of divergence
-can be identified automatically without prior knowledge of a given language
-pair, and the patterns can be used to improve performance of projection
-algorithms.
-"
-2815,1605.04481,"Yevgeni Berzak, Yan Huang, Andrei Barbu, Anna Korhonen, Boris Katz",Anchoring and Agreement in Syntactic Annotations,cs.CL," We present a study on two key characteristics of human syntactic annotations:
-anchoring and agreement. Anchoring is a well-known cognitive bias in human
-decision making, where judgments are drawn towards pre-existing values. We
-study the influence of anchoring on a standard approach to creation of
-syntactic resources where syntactic annotations are obtained via human editing
-of tagger and parser output. Our experiments demonstrate a clear anchoring
-effect and reveal unwanted consequences, including overestimation of parsing
-performance and lower quality of annotations in comparison with human-based
-annotations. Using sentences from the Penn Treebank WSJ, we also report
-systematically obtained inter-annotator agreement estimates for English
-dependency parsing. Our agreement results control for parser bias, and are
-consequential in that they are on par with state-of-the-art parsing performance
-for English newswire. We discuss the impact of our findings on strategies for
-future annotation efforts and parser evaluations.
-"
-2816,1605.04515,Lifeng Han,Machine Translation Evaluation Resources and Methods: A Survey,cs.CL," We introduce a survey of Machine Translation (MT) evaluation that covers
-both manual and automatic evaluation methods. The traditional human evaluation
-criteria mainly include intelligibility, fidelity, fluency, adequacy,
-comprehension, and informativeness. The advanced human assessments include
-task-oriented measures, post-editing, segment ranking, and extended criteria,
-etc. We classify the automatic evaluation methods into two categories,
-namely lexical similarity and the application of linguistic features. The
-lexical similarity methods include edit distance, precision, recall, F-measure,
-and word order. The linguistic features can be divided into syntactic features
-and semantic features. The syntactic features include part-of-speech tags,
-phrase types and sentence structures, and the semantic features include named
-entities, synonyms, textual entailment, paraphrase, semantic roles,
-and language models. 
Deep learning models for evaluation have been
-proposed only recently. We also introduce methods for the meta-evaluation of
-MT evaluation, including different correlation scores, and the recent quality
-estimation (QE) tasks for MT.
- This paper differs from the existing works
-\cite{GALEprogram2009,EuroMatrixProject2007} in several aspects: it introduces
-recent developments in MT evaluation measures, the different classifications
-from manual to automatic evaluation measures, the recent QE tasks for MT, and
-a concise construction of the content.
- We hope this work will help MT researchers easily pick the metrics best
-suited to their specific MT model development, and help MT evaluation
-researchers get a general sense of how MT evaluation research has developed.
-Furthermore, we hope this work can also shed some light on evaluation tasks in
-NLP fields other than translation.
-"
-2817,1605.04553,Dmitrijs Milajevs and Sascha Griffiths,A Proposal for Linguistic Similarity Datasets Based on Commonality Lists,cs.CL," Similarity is a core notion that is used in psychology and two branches of
-linguistics: theoretical and computational. The similarity datasets that come
-from the two fields differ in design: psychological datasets are focused around
-a certain topic such as fruit names, while linguistic datasets contain words
-from various categories. The latter makes humans assign low similarity scores to
-the words that have nothing in common and to the words that have contrast in
-meaning, making similarity scores ambiguous. In this work we discuss the
-similarity collection procedure for a multi-category dataset that avoids score
-ambiguity and suggest changes to the evaluation procedure to reflect the
-insights of psychological literature for word, phrase and sentence similarity.
-We suggest asking humans to provide a list of commonalities and differences
-instead of numerical similarity scores and employ the structure of human
-judgements beyond pairwise similarity for model evaluation. We believe that the
-proposed approach will give rise to datasets that test meaning representation
-models more thoroughly with respect to the human treatment of similarity.
-"
-2818,1605.04569,"Felix Stahlberg, Eva Hasler, Aurelien Waite, and Bill Byrne",Syntactically Guided Neural Machine Translation,cs.CL," We investigate the use of hierarchical phrase-based SMT lattices in
-end-to-end neural machine translation (NMT). Weight pushing transforms the
-Hiero scores for complete translation hypotheses, with the full translation
-grammar score and full n-gram language model score, into posteriors compatible
-with NMT predictive probabilities. With a slightly modified NMT beam-search
-decoder we find gains over both Hiero and NMT decoding alone, with practical
-advantages in extending NMT to very large input and output vocabularies.
-"
-2819,1605.04655,"Petr Baudis, Silvestr Stanko and Jan Sedivy",Joint Learning of Sentence Embeddings for Relevance and Entailment,cs.CL cs.LG cs.NE," We consider the problem of Recognizing Textual Entailment within an
-Information Retrieval context, where we must simultaneously determine the
-relevance as well as the degree of entailment for individual pieces of evidence
-to determine a yes/no answer to a binary natural language question.
- We compare several variants of neural networks for sentence embeddings in a
-setting of decision-making based on evidence of varying relevance. 
We propose a -basic model to integrate evidence for entailment, show that joint training of -the sentence embeddings to model relevance and entailment is feasible even with -no explicit per-evidence supervision, and show the importance of evaluating -strong baselines. We also demonstrate the benefit of carrying over text -comprehension model trained on an unrelated task for our small datasets. - Our research is motivated primarily by a new open dataset we introduce, -consisting of binary questions and news-based evidence snippets. We also apply -the proposed relevance-entailment model on a similar task of ranking -multiple-choice test answers, evaluating it on a preliminary dataset of school -test questions as well as the standard MCTest dataset, where we improve the -neural model state-of-art. -" -2820,1605.04800,Marcin Junczys-Dowmunt and Roman Grundkiewicz,"Log-linear Combinations of Monolingual and Bilingual Neural Machine - Translation Models for Automatic Post-Editing",cs.CL," This paper describes the submission of the AMU (Adam Mickiewicz University) -team to the Automatic Post-Editing (APE) task of WMT 2016. We explore the -application of neural translation models to the APE problem and achieve good -results by treating different models as components in a log-linear model, -allowing for multiple inputs (the MT-output and the source) that are decoded to -the same target language (post-edited translations). A simple string-matching -penalty integrated within the log-linear model is used to control for higher -faithfulness with regard to the raw machine translation output. To overcome the -problem of too little training data, we generate large amounts of artificial -data. Our submission improves over the uncorrected baseline on the unseen test -set by -3.2\% TER and +5.5\% BLEU and outperforms any other system submitted to -the shared-task by a large margin. -" -2821,1605.04809,"Marcin Junczys-Dowmunt, Tomasz Dwojak, Rico Sennrich","The AMU-UEDIN Submission to the WMT16 News Translation Task: - Attention-based NMT Models as Feature Functions in Phrase-based SMT",cs.CL," This paper describes the AMU-UEDIN submissions to the WMT 2016 shared task on -news translation. We explore methods of decode-time integration of -attention-based neural translation models with phrase-based statistical machine -translation. Efficient batch-algorithms for GPU-querying are proposed and -implemented. For English-Russian, our system stays behind the state-of-the-art -pure neural models in terms of BLEU. Among restricted systems, manual -evaluation places it in the first cluster tied with the pure neural model. For -the Russian-English task, our submission achieves the top BLEU result, -outperforming the best pure neural system by 1.1 BLEU points and our own -phrase-based baseline by 1.6 BLEU. After manual evaluation, this system is the -best restricted system in its own cluster. In follow-up experiments we improve -results by additional 0.8 BLEU. -" -2822,1605.05087,Hirotaka Niitsuma and Minho Lee,"Word2Vec is a special case of Kernel Correspondence Analysis and Kernels - for Natural Language Processing",cs.LG cs.CL," We show that correspondence analysis (CA) is equivalent to defining a Gini -index with appropriately scaled one-hot encoding. Using this relation, we -introduce a nonlinear kernel extension to CA. This extended CA gives a known -analysis for natural language via specialized kernels that use an appropriate -contingency table. 
We propose a semi-supervised CA, which is a special case of
-the kernel extension to CA. Because CA requires excessive memory if applied to
-numerous categories, CA has not been used for natural language processing. We
-address this problem by introducing delayed evaluation to randomized singular
-value decomposition. The memory-efficient CA is then applied to a word-vector
-representation task. We propose a tail-cut kernel, which is an extension to the
-skip-gram within the kernel extension to CA. Our tail-cut kernel outperforms
-existing word-vector representation methods.
-"
-2823,1605.05101,"Pengfei Liu, Xipeng Qiu, Xuanjing Huang","Recurrent Neural Network for Text Classification with Multi-Task
- Learning",cs.CL," Neural network-based methods have achieved great progress on a variety of
-natural language processing tasks. However, in most previous works, the models
-are learned based on single-task supervised objectives, which often suffer from
-insufficient training data. In this paper, we use the multi-task learning
-framework to jointly learn across multiple related tasks. Based on recurrent
-neural networks, we propose three different mechanisms of sharing information
-to model text with task-specific and shared layers. The entire network is
-trained jointly on all these tasks. Experiments on four benchmark text
-classification tasks show that our proposed models can improve the performance
-of a task with the help of other related tasks.
-"
-2824,1605.05110,"Zhen Xu, Bingquan Liu, Baoxun Wang, Chengjie Sun, Xiaolong Wang","Incorporating Loose-Structured Knowledge into Conversation Modeling via
- Recall-Gate LSTM",cs.CL," Modeling human conversations is essential for building satisfying chat-bots
-with multi-turn dialog ability. Conversation modeling will notably benefit from
-domain knowledge, since the relationships between sentences can be clarified
-by semantic hints introduced by the knowledge. In this paper, a deep neural
-network is proposed to incorporate background knowledge for conversation
-modeling. Through a specially designed Recall gate, domain knowledge can be
-transformed into the extra global memory of Long Short-Term Memory (LSTM), so
-as to enhance the LSTM by cooperating with its local memory to capture the
-implicit semantic relevance between sentences within conversations. In
-addition, this paper introduces a loosely structured domain knowledge base,
-which can be built with a small amount of manual work and easily adopted by
-the Recall gate. Our model is evaluated on the context-oriented response
-selection task, and experimental results on both datasets show that our
-approach is promising for modeling human conversations and building key
-components of automatic chatting systems.
-"
-2825,1605.05134,"Soroush Vosoughi, Deb Roy","A Semi-automatic Method for Efficient Detection of Stories on Social
- Media",cs.SI cs.CL cs.IR," Twitter has become one of the main sources of news for many people. As
-real-world events and emergencies unfold, Twitter is abuzz with hundreds of
-thousands of stories about the events. Some of these stories are harmless,
-while others could potentially be life-saving or sources of malicious rumors.
-Thus, it is critically important to be able to efficiently track stories that
-spread on Twitter during these events. In this paper, we present a novel
-semi-automatic tool that enables users to efficiently identify and track
-stories about real-world events on Twitter. 
We ran a user study with 25
-participants, demonstrating that compared to more conventional methods, our
-tool can increase the speed and the accuracy with which users can track stories
-about real-world events.
-"
-2826,1605.05150,"Prashanth Vijayaraghavan, Soroush Vosoughi, Deb Roy",Automatic Detection and Categorization of Election-Related Tweets,cs.CL cs.IT cs.SI math.IT," With the rise in popularity of public social media and micro-blogging
-services, most notably Twitter, people have found a venue to hear and be
-heard by their peers without an intermediary. As a consequence, and aided by
-the public nature of Twitter, political scientists now potentially have the
-means to analyse and understand the narratives that organically form, spread
-and decline among the public in a political campaign. However, the volume and
-diversity of the conversation on Twitter, combined with its noisy and
-idiosyncratic nature, make this a hard task. Thus, advanced data mining and
-language processing techniques are required to process and analyse the data. In
-this paper, we present and evaluate a technical framework, based on recent
-advances in deep neural networks, for identifying and analysing
-election-related conversation on Twitter on a continuous, longitudinal basis.
-Our models can detect election-related tweets with an F-score of 0.92 and can
-categorize these tweets into 22 topics with an F-score of 0.90.
-"
-2827,1605.05156,"Soroush Vosoughi, Deb Roy",Tweet Acts: A Speech Act Classifier for Twitter,cs.CL cs.SI," Speech acts are a way to conceptualize speech as action. This holds true for
-communication on any platform, including social media platforms such as
-Twitter. In this paper, we explored speech act recognition on Twitter by
-treating it as a multi-class classification problem. We created a taxonomy of
-six speech acts for Twitter and proposed a set of semantic and syntactic
-features. We trained and tested a logistic regression classifier using a data
-set of manually labelled tweets. Our method achieved state-of-the-art
-performance with an average F1 score of more than $0.70$. We also explored
-classifiers with three different granularities (Twitter-wide, type-specific and
-topic-specific) in order to find the right balance between generalization and
-overfitting for our task.
-"
-2828,1605.05166,"Soroush Vosoughi, Helen Zhou, Deb Roy",Digital Stylometry: Linking Profiles Across Social Networks,cs.SI cs.AI cs.CL cs.IR," There is an ever-growing number of users with accounts on multiple social
-media and networking sites. Consequently, there is increasing interest in
-matching user accounts and profiles across different social networks in order
-to create aggregate profiles of users. In this paper, we present models for
-Digital Stylometry, which is a method for matching users through
-stylometry-inspired techniques. We experimented with linguistic, temporal, and
-combined temporal-linguistic models for matching user accounts, using standard
-and novel techniques. Using publicly available data, our best model, a combined
-temporal-linguistic one, was able to correctly match the accounts of 31% of
-5,612 distinct users across Twitter and Facebook.
-"
-2829,1605.05172,Taraka Rama,"Siamese convolutional networks based on phonetic features for cognate
- identification",cs.CL," In this paper, we explore the use of convolutional networks (ConvNets) for
-the purpose of cognate identification. 
We compare our architecture with binary -classifiers based on string similarity measures on different language families. -Our experiments show that convolutional networks achieve competitive results -across concepts and across language families at the task of cognate -identification. -" -2830,1605.05195,"Soroush Vosoughi, Helen Zhou, Deb Roy",Enhanced Twitter Sentiment Classification Using Contextual Information,cs.SI cs.AI cs.CL cs.IR," The rise in popularity and ubiquity of Twitter has made sentiment analysis of -tweets an important and well-covered area of research. However, the 140 -character limit imposed on tweets makes it hard to use standard linguistic -methods for sentiment classification. On the other hand, what tweets lack in -structure they make up with sheer volume and rich metadata. This metadata -includes geolocation, temporal and author information. We hypothesize that -sentiment is dependent on all these contextual factors. Different locations, -times and authors have different emotional valences. In this paper, we explored -this hypothesis by utilizing distant supervision to collect millions of -labelled tweets from different locations, times and authors. We used this data -to analyse the variation of tweet sentiments across different authors, times -and locations. Once we explored and understood the relationship between these -variables and sentiment, we used a Bayesian approach to combine these variables -with more standard linguistic features such as n-grams to create a Twitter -sentiment classifier. This combined classifier outperforms the purely -linguistic classifier, showing that integrating the rich contextual information -available on Twitter into sentiment classification is a promising direction of -research. -" -2831,1605.05303,"A. Ramos-Soto, A. Bugar\'in, S. Barro",Fuzzy Sets Across the Natural Language Generation Pipeline,cs.AI cs.CL," We explore the implications of using fuzzy techniques (mainly those commonly -used in the linguistic description/summarization of data discipline) from a -natural language generation perspective. For this, we provide an extensive -discussion of some general convergence points and an exploration of the -relationship between the different tasks involved in the standard NLG system -pipeline architecture and the most common fuzzy approaches used in linguistic -summarization/description of data, such as fuzzy quantified statements, -evaluation criteria or aggregation operators. Each individual discussion is -illustrated with a related use case. Recent work made in the context of -cross-fertilization of both research fields is also referenced. This paper -encompasses general ideas that emerged as part of the PhD thesis ""Application -of fuzzy sets in data-to-text systems"". It does not present a specific -application or a formal approach, but rather discusses current high-level -issues and potential usages of fuzzy sets (focused on linguistic summarization -of data) in natural language generation. -" -2832,1605.05362,Nabiha Asghar,Yelp Dataset Challenge: Review Rating Prediction,cs.CL cs.IR cs.LG," Review websites, such as TripAdvisor and Yelp, allow users to post online -reviews for various businesses, products and services, and have been recently -shown to have a significant influence on consumer shopping behaviour. An online -review typically consists of free-form text and a star rating out of 5. 
The -problem of predicting a user's star rating for a product, given the user's text -review for that product, is called Review Rating Prediction and has lately -become a popular, albeit hard, problem in machine learning. In this paper, we -treat Review Rating Prediction as a multi-class classification problem, and -build sixteen different prediction models by combining four feature extraction -methods, (i) unigrams, (ii) bigrams, (iii) trigrams and (iv) Latent Semantic -Indexing, with four machine learning algorithms, (i) logistic regression, (ii) -Naive Bayes classification, (iii) perceptrons, and (iv) linear Support Vector -Classification. We analyse the performance of each of these sixteen models to -come up with the best model for predicting the ratings from reviews. We use the -dataset provided by Yelp for training and testing the models. -" -2833,1605.05414,"Ryan Lowe, Iulian V. Serban, Mike Noseworthy, Laurent Charlin, Joelle - Pineau",On the Evaluation of Dialogue Systems with Next Utterance Classification,cs.CL cs.LG," An open challenge in constructing dialogue systems is developing methods for -automatically learning dialogue strategies from large amounts of unlabelled -data. Recent work has proposed Next-Utterance-Classification (NUC) as a -surrogate task for building dialogue systems from text data. In this paper we -investigate the performance of humans on this task to validate the relevance of -NUC as a method of evaluation. Our results show three main findings: (1) humans -are able to correctly classify responses at a rate much better than chance, -thus confirming that the task is feasible, (2) human performance levels vary -across task domains (we consider 3 datasets) and expertise levels (novice vs -experts), thus showing that a range of performance is possible on this type of -task, (3) automated dialogue systems built using state-of-the-art machine -learning methods have similar performance to the human novices, but worse than -the experts, thus confirming the utility of this class of tasks for driving -further research in automated dialogue systems. -" -2834,1605.05416,"Teng Long, Ryan Lowe, Jackie Chi Kit Cheung, Doina Precup","Leveraging Lexical Resources for Learning Entity Embeddings in - Multi-Relational Data",cs.CL," Recent work in learning vector-space embeddings for multi-relational data has -focused on combining relational information derived from knowledge bases with -distributional information derived from large text corpora. We propose a simple -approach that leverages the descriptions of entities or phrases available in -lexical resources, in conjunction with distributional semantics, in order to -derive a better initialization for training relational models. Applying this -initialization to the TransE model results in significant new state-of-the-art -performances on the WordNet dataset, decreasing the mean rank from the previous -best of 212 to 51. It also results in faster convergence of the entity -representations. We find that there is a trade-off between improving the mean -rank and the hits@10 with this approach. This illustrates that much remains to -be understood regarding performance improvements in relational models. -" -2835,1605.05433,"Stephen Roller, Katrin Erk","Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns - in Distributional Vectors for Lexical Entailment",cs.CL cs.AI," We consider the task of predicting lexical entailment using distributional -vectors. 
We perform a novel qualitative analysis of one existing model which
-was previously shown to only measure the prototypicality of word pairs. We find
-that the model strongly learns to identify hypernyms using Hearst patterns,
-which are well known to be predictive of lexical relations. We present a novel
-model which exploits this behavior as a method of feature extraction in an
-iterative procedure similar to Principal Component Analysis. Our model combines
-the extracted features with the strengths of other proposed models in the
-literature, and matches or outperforms prior work on multiple data sets.
-"
-2836,1605.05573,"Pengfei Liu, Xipeng Qiu, Xuanjing Huang",Modelling Interaction of Sentence Pair with coupled-LSTMs,cs.CL," Recently, there has been rising interest in modelling the interactions of two
-sentences with deep neural networks. However, most of the existing methods
-encode two sequences with separate encoders, in which a sentence is encoded
-with little or no information from the other sentence. In this paper, we
-propose a deep architecture to model the strong interaction of a sentence pair
-with two coupled-LSTMs. Specifically, we introduce two coupled ways to model
-the interdependences of two LSTMs, coupling the local contextualized
-interactions of two sentences. We then aggregate these interactions and use
-dynamic pooling to select the most informative features. Experiments on two
-very large datasets demonstrate the efficacy of our proposed architecture and
-its superiority to state-of-the-art methods.
-"
-2837,1605.05894,"Muhammad Imran, Prasenjit Mitra, Carlos Castillo","Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of
- Crisis-related Messages",cs.CL cs.CY cs.SI," Microblogging platforms such as Twitter provide active communication channels
-during mass convergence and emergency events such as earthquakes and typhoons.
-During the sudden onset of a crisis situation, affected people post useful
-information on Twitter that can be used for situational awareness and other
-humanitarian disaster response efforts, if processed promptly and effectively.
-Processing social media information poses multiple challenges, such as parsing
-noisy, brief and informal messages, learning information categories from the
-incoming stream of messages, and classifying them into different classes, among
-others. One of the basic necessities of many of these tasks is the availability
-of data, in particular human-annotated data. In this paper, we present
-human-annotated Twitter corpora collected during 19 different crises that took
-place between 2013 and 2015. To demonstrate the utility of the annotations, we
-train machine learning classifiers. Moreover, we publish the first and largest
-word2vec word embeddings trained on 52 million crisis-related tweets. To deal
-with the language issues of tweets, we present human-annotated normalized
-lexical resources for different lexical variations.
-"
-2838,1605.05906,"Alena Zwahlen, Olivier Carnal, Samuel L\""aubli","Automatic TM Cleaning through MT and POS Tagging: Autodesk's Submission
- to the NLP4TM 2016 Shared Task",cs.CL," We describe a machine learning based method to identify incorrect entries in
-translation memories. It extends previous work by Barbu (2015) through
-incorporating recall-based machine translation and part-of-speech-tagging
-features. Our system ranked first in the Binary Classification (II) task for
-two out of three language pairs: English-Italian and English-Spanish.
-" -2839,1605.06069,"Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, - Joelle Pineau, Aaron Courville, Yoshua Bengio","A Hierarchical Latent Variable Encoder-Decoder Model for Generating - Dialogues",cs.CL cs.AI cs.LG cs.NE," Sequential data often possesses a hierarchical structure with complex -dependencies between subsequences, such as found between the utterances in a -dialogue. In an effort to model this kind of generative process, we propose a -neural network-based generative architecture, with latent stochastic variables -that span a variable number of time steps. We apply the proposed model to the -task of dialogue response generation and compare it with recent neural network -architectures. We evaluate the model performance through automatic evaluation -metrics and by carrying out a human evaluation. The experiments demonstrate -that our model improves upon recently proposed models and that the latent -variables facilitate the generation of long outputs and maintain the context. -" -2840,1605.06083,Emiel van Miltenburg,Stereotyping and Bias in the Flickr30K Dataset,cs.CL cs.CV," An untested assumption behind the crowdsourced descriptions of the images in -the Flickr30K dataset (Young et al., 2014) is that they ""focus only on the -information that can be obtained from the image alone"" (Hodosh et al., 2013, p. -859). This paper presents some evidence against this assumption, and provides a -list of biases and unwarranted inferences that can be found in the Flickr30K -dataset. Finally, it considers methods to find examples of these, and discusses -how we should deal with stereotype-driven descriptions in future applications. -" -2841,1605.06304,"Yang Lou, Guanrong Chen, Zhengping Fan, Luna Xiang","Local communities obstruct global consensus: Naming game on - multi-local-world networks",cs.SI cs.CL physics.soc-ph," Community structure is essential for social communications, where individuals -belonging to the same community are much more actively interacting and -communicating with each other than those in different communities within the -human society. Naming game, on the other hand, is a social communication model -that simulates the process of learning a name of an object within a community -of humans, where the individuals can generally reach global consensus -asymptotically through iterative pair-wise conversations. The underlying -network indicates the relationships among the individuals. In this paper, three -typical topologies, namely random-graph, small-world and scale-free networks, -are employed, which are embedded with the multi-local-world community -structure, to study the naming game. Simulations show that 1) the convergence -process to global consensus is getting slower as the community structure -becomes more prominent, and eventually might fail; 2) if the inter-community -connections are sufficiently dense, neither the number nor the size of the -communities affects the convergence process; and 3) for different topologies -with the same average node-degree, local clustering of individuals obstruct or -prohibit global consensus to take place. The results reveal the role of local -communities in a global naming game in social network studies. -" -2842,1605.06319,Nikola Milosevic and Goran Nenadic,"As Cool as a Cucumber: Towards a Corpus of Contemporary Similes in - Serbian",cs.CL cs.AI," Similes are natural language expressions used to compare unlikely things, -where the comparison is not taken literally. 
They are often used in everyday
-communication and are an important part of cultural heritage. Having an
-up-to-date corpus of similes is challenging, as they are constantly coined
-and/or adapted to contemporary times. In this paper we present a methodology
-for the semi-automated collection of similes from the World Wide Web using
-text mining techniques. We expanded an existing corpus of traditional similes
-(containing 333 similes) by collecting 446 additional expressions. We also
-explore how crowdsourcing can be used to extract and curate new similes.
-"
-2843,1605.06353,Marcin Junczys-Dowmunt and Roman Grundkiewicz,"Phrase-based Machine Translation is State-of-the-Art for Automatic
- Grammatical Error Correction",cs.CL," In this work, we study parameter tuning towards the M^2 metric, the standard
-metric for automatic grammar error correction (GEC) tasks. After implementing
-M^2 as a scorer in the Moses tuning framework, we investigate interactions of
-dense and sparse features, different optimizers, and tuning strategies for the
-CoNLL-2014 shared task. We notice erratic behavior when optimizing sparse
-feature weights with M^2 and offer partial solutions. We find that a bare-bones
-phrase-based SMT setup with task-specific parameter-tuning outperforms all
-previously published results for the CoNLL-2014 test set by a large margin
-(46.37% M^2 over previously 41.75%, by an SMT system with neural features)
-while being trained on the same, publicly available data. Our newly introduced
-dense and sparse features widen that gap, and we improve the state-of-the-art
-to 49.49% M^2.
-"
-2844,1605.06650,"Peixian Chen, Nevin L. Zhang, Tengfei Liu, Leonard K.M. Poon, Zhourong
- Chen and Farhan Khawar",Latent Tree Models for Hierarchical Topic Detection,cs.CL cs.IR cs.LG stat.ML," We present a novel method for hierarchical topic detection where topics are
-obtained by clustering documents in multiple ways. Specifically, we model
-document collections using a class of graphical models called hierarchical
-latent tree models (HLTMs). The variables at the bottom level of an HLTM are
-observed binary variables that represent the presence/absence of words in a
-document. The variables at other levels are binary latent variables, with those
-at the lowest latent level representing word co-occurrence patterns and those
-at higher levels representing co-occurrence of patterns at the level below.
-Each latent variable gives a soft partition of the documents, and document
-clusters in the partitions are interpreted as topics. Latent variables at high
-levels of the hierarchy capture long-range word co-occurrence patterns and
-hence give thematically more general topics, while those at low levels of the
-hierarchy capture short-range word co-occurrence patterns and give thematically
-more specific topics. Unlike LDA-based topic models, HLTMs do not refer to a
-document generation process and use word variables instead of token variables.
-They use a tree structure to model the relationships between topics and words,
-which is conducive to the discovery of meaningful topics and topic hierarchies.
-"
-2845,1605.06770,"Longyue Wang, Xiaojun Zhang, Zhaopeng Tu, Andy Way, Qun Liu",Automatic Construction of Discourse Corpora for Dialogue Translation,cs.CL," In this paper, a novel approach is proposed to automatically construct a
-parallel discourse corpus for dialogue machine translation. Firstly, the
-parallel subtitle data and its corresponding monolingual movie script data are
-crawled and collected from the Internet. Then tags such as speaker and
-discourse boundary from the script data are projected onto the subtitle data
-via an information retrieval approach in order to map monolingual discourse to
-bilingual texts. We not only evaluate the mapping results, but also integrate
-speaker information into the translation. Experiments show that our proposed
-method can achieve 81.79% and 98.64% accuracy on speaker and dialogue boundary
-annotation, and speaker-based language model adaptation can obtain around 0.5
-BLEU points of improvement in translation quality. Finally, we publicly release
-around 100K parallel discourse data with manual speaker and dialogue boundary
-annotation.
-"
-2846,1605.06778,"Maximilian Schmitt and Bj\""orn W. Schuller","openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words
- Toolkit",cs.CV cs.CL cs.IR," We introduce openXBOW, an open-source toolkit for the generation of
-bag-of-words (BoW) representations from multimodal input. In the BoW principle,
-word histograms were first used as features in document classification, but the
-idea has been and can easily be adapted to, e.g., acoustic or visual low-level
-descriptors, introducing a prior step of vector quantisation. The openXBOW
-toolkit supports arbitrary numeric input features and text input and
-concatenates computed subbags to a final bag. It provides a variety of
-extensions and options. To our knowledge, openXBOW is the first publicly
-available toolkit for the generation of crossmodal bags-of-words. The
-capabilities of the tool are exemplified in two sample scenarios:
-time-continuous speech-based emotion recognition and sentiment analysis in
-tweets, where improved results over other feature representation forms were
-observed.
-"
-2847,1605.06799,"Andrea Webb Luangrath, Joann Peck, Victor A. Barger",Textual Paralanguage and its Implications for Marketing Communications,cs.CL cs.SI," Both face-to-face communication and communication in online environments
-convey information beyond the actual verbal message. In a traditional
-face-to-face conversation, paralanguage, or the ancillary meaning- and
-emotion-laden aspects of speech that are not actual verbal prose, gives
-contextual information that allows interactors to more appropriately understand
-the message being conveyed. In this paper, we conceptualize textual
-paralanguage (TPL), which we define as written manifestations of nonverbal
-audible, tactile, and visual elements that supplement or replace written
-language and that can be expressed through words, symbols, images, punctuation,
-demarcations, or any combination of these elements. We develop a typology of
-textual paralanguage using data from Twitter, Facebook, and Instagram. We
-present a conceptual framework of antecedents and consequences of brands' use
-of textual paralanguage. Implications for theory and practice are discussed.
-"
-2848,1605.07133,"Angeliki Lazaridou, Nghia The Pham and Marco Baroni",Towards Multi-Agent Communication-Based Language Learning,cs.CL cs.CV cs.LG," We propose an interactive multimodal framework for language learning. Instead
-of being passively exposed to large amounts of natural text, our learners
-(implemented as feed-forward neural networks) engage in cooperative referential
-games starting from a tabula rasa setup, and thus develop their own language
-from the need to communicate in order to succeed at the game. Preliminary
-experiments provide promising results, but also suggest that it is important
-to ensure that agents trained in this way do not develop an ad-hoc
-communication code that is only effective for the game they are playing.
-"
-2849,1605.07268,"Eliana Scheihing, Matthieu Vernier, Javiera Born, Julio Guerra, Luis
- Carcamo","Classifying discourse in a CSCL platform to evaluate correlations with
- Teacher Participation and Progress",cs.CY cs.CL," In computer-supported learning, monitoring and engaging a group of learners
-is a complex task for teachers, especially when learners are working
-collaboratively: Are my students motivated? What kind of progress are they
-making? Should I intervene? Are my communication and didactic design adapted
-to my students? Our hypothesis is that the analysis of natural language
-interactions between students, and between students and teachers, provides
-very valuable information and could be used to produce qualitative indicators
-to support teachers' decisions. We develop an automatic approach in three
-steps: (1) to explore the discursive functions of messages in a CSCL platform,
-(2) to classify the messages automatically, and (3) to evaluate correlations
-between discursive attitudes and other variables linked to the learning
-activity. Results tend to show that some types of discourse are correlated
-with a notion of progress on the learning activities, and point to the
-importance of emotive participation from the teacher.
-"
-2850,1605.07333,"Ngoc Thang Vu and Heike Adel and Pankaj Gupta and Hinrich Sch\""utze","Combining Recurrent and Convolutional Neural Networks for Relation
- Classification",cs.CL," This paper investigates two different neural architectures for the task of
-relation classification: convolutional neural networks and recurrent neural
-networks. For both models, we demonstrate the effect of different architectural
-choices. We present a new context representation for convolutional neural
-networks for relation classification (extended middle context). Furthermore, we
-propose connectionist bi-directional recurrent neural networks and introduce
-ranking loss for their optimization. Finally, we show that combining
-convolutional and recurrent neural networks using a simple voting scheme is
-accurate enough to improve results. Our neural models achieve state-of-the-art
-results on the SemEval 2010 relation classification task.
-"
-2851,1605.07346,"Abdelaziz Lakhfif, Mohammed T. Laskri, Eric Atwell","Multi-Level Analysis and Annotation of Arabic Corpora for Text-to-Sign
- Language MT",cs.CL," In this paper, we present an ongoing effort in lexical semantic analysis and
-annotation of Modern Standard Arabic (MSA) text: a semi-automatic annotation
-tool concerned with the morphological, syntactic, and semantic levels of
-description.
-"
-2852,1605.07366,"Nikhilesh Bhatnagar, Radhika Mamidi",Experiments in Linear Template Combination using Genetic Algorithms,cs.CL," Natural Language Generation systems typically have two parts - strategic
-('what to say') and tactical ('how to say'). We present our experiments in
-building an unsupervised corpus-driven template based tactical NLG system. We
-consider templates as a sequence of words containing gaps. Our idea is based on
-the observation that templates are grammatical locally (within their textual
-span). We posit the construction of a sentence as a highly restricted sequence
-of such templates.
This work is an attempt to explore the resulting search -space using Genetic Algorithms to arrive at acceptable solutions. We present a -baseline implementation of this approach which outputs gapped text. -" -2853,1605.07427,"Sarath Chandar, Sungjin Ahn, Hugo Larochelle, Pascal Vincent, Gerald - Tesauro, Yoshua Bengio",Hierarchical Memory Networks,stat.ML cs.CL cs.LG cs.NE," Memory networks are neural networks with an explicit memory component that -can be both read and written to by the network. The memory is often addressed -in a soft way using a softmax function, making end-to-end training with -backpropagation possible. However, this is not computationally scalable for -applications which require the network to read from extremely large memories. -On the other hand, it is well known that hard attention mechanisms based on -reinforcement learning are challenging to train successfully. In this paper, we -explore a form of hierarchical memory network, which can be considered as a -hybrid between hard and soft attention memory networks. The memory is organized -in a hierarchical structure such that reading from it is done with less -computation than soft attention over a flat memory, while also being easier to -train than hard attention over a flat memory. Specifically, we propose to -incorporate Maximum Inner Product Search (MIPS) in the training and inference -procedures for our hierarchical memory network. We explore the use of various -state-of-the art approximate MIPS techniques and report results on -SimpleQuestions, a challenging large scale factoid question answering task. -" -2854,1605.07515,"Michael Roth, Mirella Lapata",Neural Semantic Role Labeling with Dependency Path Embeddings,cs.CL," This paper introduces a novel model for semantic role labeling that makes use -of neural sequence modeling techniques. Our approach is motivated by the -observation that complex syntactic structures and related phenomena, such as -nested subordinations and nominal predicates, are not handled well by existing -models. Our model treats such instances as sub-sequences of lexicalized -dependency paths and learns suitable embedding representations. We -experimentally demonstrate that such embeddings can improve results over -previous state-of-the-art semantic role labelers, and showcase qualitative -improvements obtained by our method. -" -2855,1605.07669,"Pei-Hao Su and Milica Gasic and Nikola Mrksic and Lina Rojas-Barahona - and Stefan Ultes and David Vandyke and Tsung-Hsien Wen and Steve Young","On-line Active Reward Learning for Policy Optimisation in Spoken - Dialogue Systems",cs.CL cs.LG," The ability to compute an accurate reward function is essential for -optimising a dialogue policy via reinforcement learning. In real-world -applications, using explicit user feedback as the reward signal is often -unreliable and costly to collect. This problem can be mitigated if the user's -intent is known in advance or data is available to pre-train a task success -predictor off-line. In practice neither of these apply for most real world -applications. Here we propose an on-line learning framework whereby the -dialogue policy is jointly trained alongside the reward model via active -learning with a Gaussian process model. This Gaussian process operates on a -continuous space dialogue representation generated in an unsupervised fashion -using a recurrent neural network encoder-decoder. 
The experimental results
-demonstrate that the proposed framework is able to significantly reduce data
-annotation costs and mitigate noisy user feedback in dialogue policy learning.
-"
-2856,1605.07683,Antoine Bordes and Y-Lan Boureau and Jason Weston,Learning End-to-End Goal-Oriented Dialog,cs.CL," Traditional dialog systems used in goal-oriented applications require a lot
-of domain-specific handcrafting, which hinders scaling up to new domains.
-End-to-end dialog systems, in which all components are trained from the dialogs
-themselves, escape this limitation. But the encouraging success recently
-obtained in chit-chat dialog may not carry over to goal-oriented settings. This
-paper proposes a testbed to break down the strengths and shortcomings of
-end-to-end dialog systems in goal-oriented applications. Set in the context of
-restaurant reservation, our tasks require manipulating sentences and symbols,
-so as to properly conduct conversations, issue API calls and use the outputs of
-such calls. We show that an end-to-end dialog system based on Memory Networks
-can reach promising, yet imperfect, performance and learn to perform
-non-trivial operations. We confirm those results by comparing our system to a
-hand-crafted slot-filling baseline on data from the second Dialog State
-Tracking Challenge (Henderson et al., 2014a). We show similar result patterns
-on data extracted from an online concierge service.
-"
-2857,1605.07733,"Radoslava Kraleva, Velin Kralev","On model architecture for a children's speech recognition interactive
- dialog system",cs.HC cs.CL cs.SD," This report presents a general model of the architecture of information
-systems for the speech recognition of children. It also presents a model of the
-speech data stream and how it works. The results of these studies and the
-presented architectural model show that research needs to be focused on
-acoustic-phonetic modeling in order to improve the quality of children's speech
-recognition and the robustness of the systems to noise and changes in the
-transmission environment. Another important aspect is the development of more
-accurate algorithms for modeling spontaneous child speech.
-"
-2858,1605.07735,Radoslava Kraleva,Design and development a children's speech database,cs.CL cs.HC cs.SD," The report presents the process of planning, designing and developing a
-database of spoken speech from children whose native language is Bulgarian. The
-proposed model is designed for children between the ages of 4 and 6 without
-speech disorders, and reflects their specific capabilities. At this age, most
-children cannot read, lack sustained concentration, and are emotional. The aim
-is to unite all the media information accompanying the recording and processing
-of spoken speech, and thereby to facilitate the work of researchers in the
-field of speech recognition. This database will be used for the development of
-children's speech recognition systems, children's speech synthesis systems,
-games which allow voice control, etc. As a result of the proposed model, a
-prototype system for speech recognition is presented.
-"
-2859,1605.07766,"Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu","Integrating Distributional Lexical Contrast into Word Embeddings for
- Antonym-Synonym Distinction",cs.CL," We propose a novel vector representation that integrates lexical contrast
-into distributional vectors and strengthens the most salient features for
-determining degrees of word similarity.
The improved vectors significantly -outperform standard models and distinguish antonyms from synonyms with an -average precision of 0.66-0.76 across word classes (adjectives, nouns, verbs). -Moreover, we integrate the lexical contrast vectors into the objective function -of a skip-gram model. The novel embedding outperforms state-of-the-art models -on predicting word similarities in SimLex-999, and on distinguishing antonyms -from synonyms. -" -2860,1605.07843,"Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou","Unsupervised Word and Dependency Path Embeddings for Aspect Term - Extraction",cs.CL," In this paper, we develop a novel approach to aspect term extraction based on -unsupervised learning of distributed representations of words and dependency -paths. The basic idea is to connect two words (w1 and w2) with the dependency -path (r) between them in the embedding space. Specifically, our method -optimizes the objective w1 + r = w2 in the low-dimensional space, where the -multi-hop dependency paths are treated as a sequence of grammatical relations -and modeled by a recurrent neural network. Then, we design the embedding -features that consider linear context and dependency context information, for -the conditional random field (CRF) based aspect term extraction. Experimental -results on the SemEval datasets show that, (1) with only embedding features, we -can achieve state-of-the-art results; (2) our embedding method which -incorporates the syntactic information among words yields better performance -than other representative ones in aspect term extraction. -" -2861,1605.07844,"Javid Dadashkarimi, Mahsa S. Shahshahani, Amirhossein Tebbifakhr, - Heshaam Faili, and Azadeh Shakery","Dimension Projection among Languages based on Pseudo-relevant Documents - for Query Translation",cs.IR cs.AI cs.CL," Using top-ranked documents in response to a query has been shown to be an -effective approach to improve the quality of query translation in -dictionary-based cross-language information retrieval. In this paper, we -propose a new method for dictionary-based query translation based on dimension -projection of embedded vectors from the pseudo-relevant documents in the source -language to their equivalents in the target language. To this end, first we -learn low-dimensional vectors of the words in the pseudo-relevant collections -separately and then aim to find a query-dependent transformation matrix between -the vectors of translation pairs appeared in the collections. At the next step, -representation of each query term is projected to the target language and then, -after using a softmax function, a query-dependent translation model is built. -Finally, the model is used for query translation. Our experiments on four CLEF -collections in French, Spanish, German, and Italian demonstrate that the -proposed method outperforms a word embedding baseline based on bilingual -shuffling and a further number of competitive baselines. The proposed method -reaches up to 87% performance of machine translation (MT) in short queries and -considerable improvements in verbose queries. -" -2862,1605.07852,"Javid Dadashkarimi, Hossein Nasr Esfahani, Heshaam Faili, and Azadeh - Shakery",SS4MCT: A Statistical Stemmer for Morphologically Complex Texts,cs.IR cs.CL," There have been multiple attempts to resolve various inflection matching -problems in information retrieval. Stemming is a common approach to this end. 
-Among many techniques for stemming, statistical stemming has been shown to be
-effective in a number of languages, particularly highly inflected languages. In
-this paper we propose a method for finding affixes in different positions of a
-word. Common statistical techniques heavily rely on string similarity in terms
-of prefix and suffix matching. Since infixes are common in irregular/informal
-inflections in morphologically complex texts, it is necessary to find infixes
-for stemming. In this paper we propose a method whose aim is to find
-statistical inflectional rules based on the minimum edit distance table of word
-pairs and the likelihoods of the rules in a language. These rules are used to
-statistically stem words and can be used in different text mining tasks.
-Experimental results on the CLEF 2008 and CLEF 2009 English-Persian CLIR tasks
-indicate that the proposed method significantly outperforms all the baselines
-in terms of MAP.
-"
-2863,1605.07869,"Biao Zhang, Deyi Xiong, Jinsong Su, Hong Duan, Min Zhang",Variational Neural Machine Translation,cs.CL," Models of neural machine translation are often from a discriminative family
-of encoder-decoders that learn a conditional distribution of a target sentence
-given a source sentence. In this paper, we propose a variational model to learn
-this conditional distribution for neural machine translation: a variational
-encoder-decoder model that can be trained end-to-end. Different from the
-vanilla encoder-decoder model that generates target translations from hidden
-representations of source sentences alone, the variational model introduces a
-continuous latent variable to explicitly model underlying semantics of source
-sentences and to guide the generation of target translations. In order to
-perform efficient posterior inference and large-scale training, we build a
-neural posterior approximator conditioned on both the source and the target
-sides, and equip it with a reparameterization technique to estimate the
-variational lower bound. Experiments on both Chinese-English and
-English-German translation tasks show that the proposed variational neural
-machine translation achieves significant improvements over the vanilla neural
-machine translation baselines.
-"
-2864,1605.07874,"Biao Zhang, Deyi Xiong and Jinsong Su","BattRAE: Bidimensional Attention-Based Recursive Autoencoders for
- Learning Bilingual Phrase Embeddings",cs.CL," In this paper, we propose a bidimensional attention based recursive
-autoencoder (BattRAE) to integrate clues and source-target interactions at
-multiple levels of granularity into bilingual phrase representations. We employ
-recursive autoencoders to generate tree structures of phrases with embeddings
-at different levels of granularity (e.g., words, sub-phrases and phrases). Over
-these embeddings on the source and target side, we introduce a bidimensional
-attention network to learn their interactions encoded in a bidimensional
-attention matrix, from which we extract two soft attention weight distributions
-simultaneously. These weight distributions enable BattRAE to generate
-compositive phrase representations via convolution. Based on the learned phrase
-representations, we further use a bilinear neural model, trained via a
-max-margin method, to measure bilingual semantic similarity. To evaluate the
-effectiveness of BattRAE, we incorporate this semantic similarity as an
-additional feature into a state-of-the-art SMT system. Extensive experiments on
-NIST Chinese-English test sets show that our model achieves a substantial
-improvement of up to 1.63 BLEU points on average over the baseline.
-"
-2865,1605.07891,"Fernando Diaz, Bhaskar Mitra, Nick Craswell",Query Expansion with Locally-Trained Word Embeddings,cs.IR cs.CL," Continuous space word embeddings have received a great deal of attention in
-the natural language processing and machine learning communities for their
-ability to model term similarity and other relationships. We study the use of
-term relatedness in the context of query expansion for ad hoc information
-retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when
-trained globally, underperform corpus and query specific embeddings for
-retrieval tasks. These results suggest that other tasks benefiting from global
-embeddings may also benefit from local embeddings.
-"
-2866,1605.07895,Nabiha Asghar,"Automatic Extraction of Causal Relations from Natural Language Texts: A
- Comprehensive Survey",cs.AI cs.CL cs.IR," Automatic extraction of cause-effect relationships from natural language
-texts is a challenging open problem in Artificial Intelligence. Most of the
-early attempts at its solution used manually constructed linguistic and
-syntactic rules on small and domain-specific data sets. However, with the
-advent of big data, the availability of affordable computing power and the
-recent popularization of machine learning, the paradigm to tackle this problem
-has slowly shifted. Machines are now expected to learn generic causal
-extraction rules from labelled data with minimal supervision, in a
-domain-independent manner. In this paper, we provide a comprehensive survey of
-causal relation extraction techniques from both paradigms, and analyse their
-relative strengths and weaknesses, with recommendations for future work.
-"
-2867,1605.07912,"Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W.
- Cohen",Review Networks for Caption Generation,cs.LG cs.CL cs.CV," We propose a novel extension of the encoder-decoder framework, called a
-review network. The review network is generic and can enhance any existing
-encoder-decoder model: in this paper, we consider RNN decoders with both CNN
-and RNN encoders. The review network performs a number of review steps with an
-attention mechanism on the encoder hidden states, and outputs a thought vector
-after each review step; the thought vectors are used as the input of the
-attention mechanism in the decoder. We show that conventional encoder-decoders
-are a special case of our framework. Empirically, we show that our framework
-improves over state-of-the-art encoder-decoder systems on the tasks of image
-captioning and source code captioning.
-"
-2868,1605.07918,"Byungsoo Kim, Hwanjo Yu, Gary Geunbae Lee","Automatic Open Knowledge Acquisition via Long Short-Term Memory Networks
- with Feedback Negative Sampling",cs.CL cs.AI cs.NE," Previous studies in Open Information Extraction (Open IE) are mainly based on
-extraction patterns. They manually define patterns or automatically learn them
-from a large corpus. However, these approaches are limited when grasping the
-context of a sentence, and they fail to capture implicit relations. In this
-paper, we address this problem with the following methods. First, we exploit
-long short-term memory (LSTM) networks to extract higher-level features along
-the shortest dependency paths, connecting headwords of relations and arguments.
-The path-level features from LSTM networks provide useful clues regarding
-contextual information and the validity of arguments. Second, we constructed
-samples to train LSTM networks without the need for manual labeling. In
-particular, feedback negative sampling picks highly negative samples among
-non-positive samples through a model trained with positive samples. The
-experimental results show that our approach produces more precise and abundant
-extractions than state-of-the-art open IE systems. To the best of our
-knowledge, this is the first work to apply deep learning to Open IE.
-"
-2869,1605.08535,"Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim",Deep API Learning,cs.SE cs.CL cs.LG cs.NE," Developers often wonder how to implement a certain functionality (e.g., how
-to parse XML files) using APIs. Obtaining an API usage sequence based on an
-API-related natural language query is very helpful in this regard. Given a
-query, existing approaches utilize information retrieval models to search for
-matching API sequences. These approaches treat queries and APIs as
-bags-of-words (i.e., keyword matching or word-to-word alignment) and lack a
-deep understanding of the semantics of the query.
- We propose DeepAPI, a deep learning based approach to generate API usage
-sequences for a given natural language query. Instead of a bag-of-words
-assumption, it learns the sequence of words in a query and the sequence of
-associated APIs. DeepAPI adapts a neural language model named RNN
-Encoder-Decoder. It encodes a word sequence (user query) into a fixed-length
-context vector, and generates an API sequence based on the context vector. We
-also augment the RNN Encoder-Decoder by considering the importance of
-individual APIs. We empirically evaluate our approach with more than 7 million
-annotated code snippets collected from GitHub. The results show that our
-approach generates largely accurate API sequences and outperforms the related
-approaches.
-"
-2870,1605.08675,Piotr Przyby{\l}a,Boosting Question Answering by Deep Entity Recognition,cs.CL," In this paper, an open-domain factoid question answering system for Polish,
-RAFAEL, is presented. The system goes beyond finding an answering sentence; it
-also extracts a single string, corresponding to the required entity. Herein the
-focus is placed on different approaches to entity recognition, essential for
-retrieving information matching question constraints. Apart from the
-traditional approach, including named entity recognition (NER) solutions, a
-novel technique, called Deep Entity Recognition (DeepER), is introduced and
-implemented. It allows a comprehensive search of all forms of entity references
-matching a given WordNet synset (e.g. an impressionist), based on a previously
-assembled entity library. It has been created by analysing the first sentences
-of encyclopaedia entries and disambiguation and redirect pages. DeepER also
-provides automatic evaluation, which makes possible numerous experiments,
-including over a thousand questions from a quiz TV show answered on the grounds
-of Polish Wikipedia. The final results of a manual evaluation on a separate
-question set show that the strength of the DeepER approach lies in its ability
-to answer questions that demand answers beyond the traditional categories of
-named entities.
-"
-2871,1605.08764,Nazneen Fatema Rajani and Raymond J. Mooney,Stacking With Auxiliary Features,cs.CL cs.CV cs.LG," Ensembling methods are well known for improving prediction accuracy. However,
-they are limited in the sense that they cannot discriminate among component
-models effectively. In this paper, we propose stacking with auxiliary features,
-which learns to fuse relevant information from multiple systems to improve
-performance. Auxiliary features enable the stacker to rely on systems that not
-only agree on an output but also on the provenance of that output. We
-demonstrate our approach on three very different and difficult problems -- the
-Cold Start Slot Filling, the Tri-lingual Entity Discovery and Linking, and the
-ImageNet object detection tasks. We obtain new state-of-the-art results on the
-first two tasks and substantial improvements on the detection task, thus
-verifying the power and generality of our approach.
-"
-2872,1605.08889,"John P. Lalor, Hao Wu, Hong Yu",Building an Evaluation Scale using Item Response Theory,cs.CL," Evaluation of NLP methods requires testing against a previously vetted
-gold-standard test set and reporting standard metrics
-(accuracy/precision/recall/F1). The current assumption is that all items in a
-given test set are equal with regard to difficulty and discriminating power.
-We propose Item Response Theory (IRT) from psychometrics as an alternative
-means for gold-standard test-set generation and NLP system evaluation. IRT is
-able to describe characteristics of individual items - their difficulty and
-discriminating power - and can account for these characteristics in its
-estimation of human intelligence or ability for an NLP task. In this paper, we
-demonstrate IRT by generating a gold-standard test set for Recognizing Textual
-Entailment. By collecting a large number of human responses and fitting our IRT
-model, we show that our IRT model compares NLP systems with the performance in
-a human population and is able to provide more insight into system performance
-than standard evaluation metrics. We show that a high accuracy score does not
-always imply a high IRT score, which depends on the item characteristics and
-the response pattern.
-"
-2873,1605.08900,"Duyu Tang, Bing Qin, Ting Liu",Aspect Level Sentiment Classification with Deep Memory Network,cs.CL," We introduce a deep memory network for aspect level sentiment classification.
-Unlike feature-based SVM and sequential neural models such as LSTM, this
-approach explicitly captures the importance of each context word when inferring
-the sentiment polarity of an aspect. Such importance degree and text
-representation are calculated with multiple computational layers, each of which
-is a neural attention model over an external memory. Experiments on laptop and
-restaurant datasets demonstrate that our approach performs comparably to a
-state-of-the-art feature-based SVM system, and substantially better than LSTM
-and attention-based LSTM architectures. On both datasets we show that multiple
-computational layers could improve the performance. Moreover, our approach is
-also fast. The deep memory network with 9 layers is 15 times faster than LSTM
-with a CPU implementation.
-"
-2874,1605.09090,"Yang Liu, Chengjie Sun, Lei Lin and Xiaolong Wang","Learning Natural Language Inference using Bidirectional LSTM model and
- Inner-Attention",cs.CL," In this paper, we propose a sentence encoding-based model for recognizing
-text entailment. In our approach, the encoding of a sentence is a two-stage
-process. Firstly, average pooling was used over word-level bidirectional LSTM
-(biLSTM) to generate a first-stage sentence representation. Secondly, an
-attention mechanism was employed to replace average pooling on the same
-sentence for better representations. Instead of using the target sentence to
-attend to words in the source sentence, we utilized the sentence's first-stage
-representation to attend to words that appear in the sentence itself, which is
-called ""Inner-Attention"" in our paper. Experiments conducted on the Stanford
-Natural Language Inference (SNLI) Corpus have proved the effectiveness of the
-""Inner-Attention"" mechanism. With fewer parameters, our model outperformed
-the existing best sentence encoding-based approach by a large margin.
-"
-2875,1605.09096,"William L. Hamilton, Jure Leskovec, Dan Jurafsky",Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change,cs.CL," Understanding how words change their meanings over time is key to models of
-language and cultural evolution, but historical data on meaning is scarce,
-making theories hard to develop and test. Word embeddings show promise as a
-diachronic tool, but have not been carefully evaluated. We develop a robust
-methodology for quantifying semantic change by evaluating word embeddings
-(PPMI, SVD, word2vec) against known historical changes. We then use this
-methodology to reveal statistical laws of semantic evolution. Using six
-historical corpora spanning four languages and two centuries, we propose two
-quantitative laws of semantic change: (i) the law of conformity---the rate of
-semantic change scales with an inverse power-law of word frequency; (ii) the
-law of innovation---independent of frequency, words that are more polysemous
-have higher rates of semantic change.
-"
-2876,1605.09186,"Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes
- Garc\'ia-Mart\'inez, Fethi Bougares, Lo\""ic Barrault, Joost van de Weijer","Does Multimodality Help Human and Machine for Translation and Image
- Captioning?",cs.CL cs.LG cs.NE," This paper presents the systems developed by LIUM and CVC for the WMT16
-Multimodal Machine Translation challenge. We explored various comparative
-methods, namely phrase-based systems and attentional recurrent neural network
-models trained using monomodal or multimodal data. We also performed a human
-evaluation in order to estimate the usefulness of multimodal data for human
-machine translation and image description generation. Our systems obtained the
-best results for both tasks according to the automatic evaluation metrics BLEU
-and METEOR.
-"
-2877,1605.09211,Brendan Jou and Shih-Fu Chang,Going Deeper for Multilingual Visual Sentiment Detection,cs.MM cs.CL cs.CV," This technical report details several improvements to the visual concept
-detector banks built on images from the Multilingual Visual Sentiment Ontology
-(MVSO). The detector banks are trained to detect a total of 9,918
-sentiment-biased visual concepts from six major languages: English, Spanish,
-Italian, French, German and Chinese. In the original MVSO release,
-adjective-noun pair (ANP) detectors were trained for the six languages using an
-AlexNet-styled architecture by fine-tuning from DeepSentiBank. Here, through a
-more extensive set of experiments, parameter tuning, and training runs, we
-detail and release higher accuracy models for detecting ANPs across six
-languages from the same image pool and setting as in the original release using
-a more modern architecture, GoogLeNet, providing comparable or better
-performance with reduced network parameter cost.
- In addition, since the image pool in MVSO can be corrupted by user noise from -social interactions, we partitioned out a sub-corpus of MVSO images based on -tag-restricted queries for higher fidelity labels. We show that as a result of -these higher fidelity labels, higher performing AlexNet-styled ANP detectors -can be trained using the tag-restricted image subset as compared to the models -in full corpus. We release all these newly trained models for public research -use along with the list of tag-restricted images from the MVSO dataset. -" -2878,1605.09553,"Chenxi Liu, Junhua Mao, Fei Sha, Alan Yuille",Attention Correctness in Neural Image Captioning,cs.CV cs.CL cs.LG," Attention mechanisms have recently been introduced in deep learning for -various tasks in natural language processing and computer vision. But despite -their popularity, the ""correctness"" of the implicitly-learned attention maps -has only been assessed qualitatively by visualization of several examples. In -this paper we focus on evaluating and improving the correctness of attention in -neural image captioning models. Specifically, we propose a quantitative -evaluation metric for the consistency between the generated attention maps and -human annotations, using recently released datasets with alignment between -regions in images and entities in captions. We then propose novel models with -different levels of explicit supervision for learning attention maps during -training. The supervision can be strong when alignment between regions and -caption entities are available, or weak when only object segments and -categories are provided. We show on the popular Flickr30k and COCO datasets -that introducing supervision of attention maps during training solidly improves -both attention correctness and caption quality, showing the promise of making -machine perception more human-like. -" -2879,1605.09564,"Gregory Grefenstette (TAO), Lawrence Muchemi (TAO)","Determining the Characteristic Vocabulary for a Specialized Dictionary - using Word2vec and a Directed Crawler",cs.CL cs.AI cs.IR," Specialized dictionaries are used to understand concepts in specific domains, -especially where those concepts are not part of the general vocabulary, or -having meanings that differ from ordinary languages. The first step in creating -a specialized dictionary involves detecting the characteristic vocabulary of -the domain in question. Classical methods for detecting this vocabulary involve -gathering a domain corpus, calculating statistics on the terms found there, and -then comparing these statistics to a background or general language corpus. -Terms which are found significantly more often in the specialized corpus than -in the background corpus are candidates for the characteristic vocabulary of -the domain. Here we present two tools, a directed crawler, and a distributional -semantics package, that can be used together, circumventing the need of a -background corpus. Both tools are available on the web. -" -2880,1606.00025,"Sushrut Thorat, Varad Choudhari","Implementing a Reverse Dictionary, based on word definitions, using a - Node-Graph Architecture",cs.CL," In this paper, we outline an approach to build graph-based reverse -dictionaries using word definitions. A reverse dictionary takes a phrase as an -input and outputs a list of words semantically similar to that phrase. It is a -solution to the Tip-of-the-Tongue problem. 
We use a distance-based similarity -measure, computed on a graph, to assess the similarity between a word and the -input phrase. We compare the performance of our approach with the Onelook -Reverse Dictionary and a distributional semantics method based on word2vec, and -show that our approach is much better than the distributional semantics method, -and as good as Onelook, on a 3k lexicon. This simple approach sets a new -performance baseline for reverse dictionaries. -" -2881,1606.00061,"Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh",Hierarchical Question-Image Co-Attention for Visual Question Answering,cs.CV cs.CL," A number of recent works have proposed attention models for Visual Question -Answering (VQA) that generate spatial maps highlighting image regions relevant -to answering the question. In this paper, we argue that in addition to modeling -""where to look"" or visual attention, it is equally important to model ""what -words to listen to"" or question attention. We present a novel co-attention -model for VQA that jointly reasons about image and question attention. In -addition, our model reasons about the question (and consequently the image via -the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional -convolution neural networks (CNN). Our model improves the state-of-the-art on -the VQA dataset from 60.3% to 60.5%, and from 61.6% to 63.3% on the COCO-QA -dataset. By using ResNet, the performance is further improved to 62.1% for VQA -and 65.4% for COCO-QA. -" -2882,1606.00189,"Shamil Chollampatt, Kaveh Taghipour and Hwee Tou Ng",Neural Network Translation Models for Grammatical Error Correction,cs.CL," Phrase-based statistical machine translation (SMT) systems have previously -been used for the task of grammatical error correction (GEC) to achieve -state-of-the-art accuracy. The superiority of SMT systems comes from their -ability to learn text transformations from erroneous to corrected text, without -explicitly modeling error types. However, phrase-based SMT systems suffer from -limitations of discrete word representation, linear mapping, and lack of global -context. In this paper, we address these limitations by using two different yet -complementary neural network models, namely a neural network global lexicon -model and a neural network joint model. These neural networks can generalize -better by using continuous space representation of words and learn non-linear -mappings. Moreover, they can leverage contextual information from the source -sentence more effectively. By adding these two components, we achieve -statistically significant improvement in accuracy for grammatical error -correction over a state-of-the-art GEC system. -" -2883,1606.00210,Duc Tam Hoang and Shamil Chollampatt and Hwee Tou Ng,"Exploiting N-Best Hypotheses to Improve an SMT Approach to Grammatical - Error Correction",cs.CL," Grammatical error correction (GEC) is the task of detecting and correcting -grammatical errors in texts written by second language learners. The -statistical machine translation (SMT) approach to GEC, in which sentences -written by second language learners are translated to grammatically correct -sentences, has achieved state-of-the-art accuracy. However, the SMT approach is -unable to utilize global context. In this paper, we propose a novel approach to -improve the accuracy of GEC, by exploiting the n-best hypotheses generated by -an SMT approach. Specifically, we build a classifier to score the edits in the -n-best hypotheses. 
The classifier can be used to select appropriate edits or -re-rank the n-best hypotheses. We apply these methods to a state-of-the-art GEC -system that uses the SMT approach. Our experiments show that our methods -achieve statistically significant improvements in accuracy over the best -published results on a benchmark test dataset on GEC. -" -2884,1606.00253,"Georgios Balikas, Massih-Reza Amini, Marianne Clausel",On a Topic Model for Sentences,cs.CL cs.IR cs.LG," Probabilistic topic models are generative models that describe the content of -documents by discovering the latent topics underlying them. However, the -structure of the textual input, and for instance the grouping of words in -coherent text spans such as sentences, contains much information which is -generally lost with these models. In this paper, we propose sentenceLDA, an -extension of LDA whose goal is to overcome this limitation by incorporating the -structure of the text in the generative and inference processes. We illustrate -the advantages of sentenceLDA by comparing it with LDA using both intrinsic -(perplexity) and extrinsic (text classification) evaluation tasks on different -text collections. -" -2885,1606.00294,Jessica Ficler and Yoav Goldberg,Improved Parsing for Argument-Clusters Coordination,cs.CL," Syntactic parsers perform poorly in prediction of Argument-Cluster -Coordination (ACC). We change the PTB representation of ACC to be more suitable -for learning by a statistical PCFG parser, affecting 125 trees in the training -set. Training on the modified trees yields a slight improvement in EVALB scores -on sections 22 and 23. The main evaluation is on a corpus of 4th grade science -exams, in which ACC structures are prevalent. On this corpus, we obtain an -impressive x2.7 improvement in recovering ACC structures compared to a parser -trained on the original PTB trees. -" -2886,1606.00372,"Rami Al-Rfou and Marc Pickett and Javier Snaider and Yun-hsuan Sung - and Brian Strope and Ray Kurzweil","Conversational Contextual Cues: The Case of Personalization and History - for Response Ranking",cs.CL cs.LG," We investigate the task of modeling open-domain, multi-turn, unstructured, -multi-participant, conversational dialogue. We specifically study the effect of -incorporating different elements of the conversation. Unlike previous efforts, -which focused on modeling messages and responses, we extend the modeling to -long context and participant's history. Our system does not rely on handwritten -rules or engineered features; instead, we train deep neural networks on a large -conversational dataset. In particular, we exploit the structure of Reddit -comments and posts to extract 2.1 billion messages and 133 million -conversations. We evaluate our models on the task of predicting the next -response in a conversation, and we find that modeling both context and -participants improves prediction accuracy. -" -2887,1606.00411,"Saurav Ghosh, Prithwish Chakraborty, Elaine O. Nsoesie, Emily Cohn, - Sumiko R. Mekaru, John S. Brownstein and Naren Ramakrishnan","Temporal Topic Modeling to Assess Associations between News Trends and - Infectious Disease Outbreaks",cs.SI cs.CL cs.IR stat.ML," In retrospective assessments, internet news reports have been shown to -capture early reports of unknown infectious disease transmission prior to -official laboratory confirmation. In general, media interest and reporting -peaks and wanes during the course of an outbreak. 
In this study, we quantify
-the extent to which media interest during infectious disease outbreaks is
-indicative of trends of reported incidence. We introduce an approach that uses
-supervised temporal topic models to transform large corpora of news articles
-into temporal topic trends. The key advantages of this approach include
-applicability to a wide range of diseases, and the ability to capture disease
-dynamics, including seasonality and abrupt peaks and troughs. We evaluated the
-method using data from multiple infectious disease outbreaks reported in the
-United States of America (U.S.), China and India. We noted that temporal topic
-trends extracted from disease-related news reports successfully captured the
-dynamics of multiple outbreaks such as whooping cough in the U.S. (2012), dengue
-outbreaks in India (2013) and China (2014). Our observations also suggest that
-efficient modeling of temporal topic trends using time-series regression
-techniques can estimate disease case counts with increased precision before
-official reports by health organizations.
-"
-2888,1606.00414,Nicolas Turenne,On a Possible Similarity between Gene and Semantic Networks,cs.CL cs.CE cs.SI," In several domains such as linguistics, molecular biology or social sciences,
-holistic effects are hardly well-defined by modeling with single units, but
-more and more studies tend to understand macro structures with the help of
-meaningful and useful associations in fields such as social networks, systems
-biology or semantic web. A stochastic multi-agent system offers both an accurate
-theoretical framework and operational computing implementations for modeling
-large-scale associations, their dynamics, and pattern extraction. We show that
-clustering around a target object in a set of object associations reveals
-similarity in specific data; two case studies on gene-gene and term-term
-relationships suggest a common organizing principle of cognition
-with random and deterministic effects.
-"
-2889,1606.00499,Graham Neubig and Chris Dyer,Generalizing and Hybridizing Count-based and Neural Language Models,cs.CL," Language models (LMs) are statistical models that calculate probabilities
-over sequences of words or other discrete symbols. Currently two major
-paradigms for language modeling exist: count-based n-gram models, which have
-advantages of scalability and test-time speed, and neural LMs, which often
-achieve superior modeling performance. We demonstrate how both varieties of
-models can be unified in a single modeling framework that defines a set of
-probability distributions over the vocabulary of words, and then dynamically
-calculates mixture weights over these distributions. This formulation allows us
-to create novel hybrid models that combine the desirable features of
-count-based and neural LMs, and experiments demonstrate the advantages of these
-approaches.
-"
-2890,1606.00577,"Justin Wood, Patrick Tan, Wei Wang, Corey Arnold","Source-LDA: Enhancing probabilistic topic models using prior knowledge
- sources",cs.CL cs.IR cs.LG," A popular approach to topic modeling involves extracting co-occurring n-grams
-of a corpus into semantic themes. The set of n-grams in a theme represents an
-underlying topic, but most topic modeling approaches are not able to label
-these sets of words with a single n-gram. Such labels are useful for topic
-identification in summarization systems. This paper introduces a novel approach
-to labeling a group of n-grams comprising an individual topic.
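The hybrid framework of entry 2889 above mixes a count-based distribution with a neural one using context-dependent mixture weights. The toy sketch below shows the interpolation step only; the component distributions and the fixed gate value are stand-ins (in the paper's framework, the mixture weights themselves are computed dynamically from the context).

```python
import numpy as np

# Toy illustration of entry 2889: mix an n-gram and a neural distribution.
vocab = ["the", "cat", "sat", "</s>"]

def ngram_dist(context):
    counts = np.array([4.0, 2.0, 1.0, 1.0])  # pretend bigram counts for context
    return counts / counts.sum()

def neural_dist(context):
    logits = np.array([0.5, 1.5, 0.2, 0.1])  # pretend network output
    e = np.exp(logits - logits.max())
    return e / e.sum()

def hybrid_dist(context, gate=0.3):
    # `gate` would itself be predicted from the context in the full framework.
    return gate * ngram_dist(context) + (1.0 - gate) * neural_dist(context)

p = hybrid_dist(("the",))
print(dict(zip(vocab, np.round(p, 3))), p.sum())  # sums to 1.0
```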
The approach
-taken is to complement the existing topic distributions over words with a known
-distribution based on a predefined set of topics. This is done by integrating
-existing labeled knowledge sources representing known potential topics into the
-probabilistic topic model. These knowledge sources are translated into a
-distribution and used to set the hyperparameters of the Dirichlet generated
-distribution over words. During inference, these modified distributions guide
-the convergence of the latent topics to conform with the complementary
-distributions. This approach ensures that the topic inference process is
-consistent with existing knowledge. The label assignments from the complementary
-knowledge sources are then transferred to the latent topics of the corpus. The
-results show both accurate label assignment to topics and improved topic
-generation compared with various labeling approaches based on
-Latent Dirichlet allocation (LDA).
-"
-2891,1606.00589,"Katharina Kann and Hinrich Sch\""utze","Single-Model Encoder-Decoder with Explicit Morphological Representation
- for Reinflection",cs.CL," Morphological reinflection is the task of generating a target form given a
-source form, a source tag and a target tag. We propose a new way of modeling
-this task with neural encoder-decoder models. Our approach reduces the amount
-of required training data for this architecture and achieves state-of-the-art
-results, making encoder-decoder models applicable to morphological reinflection
-even for low-resource languages. We further present a new automatic correction
-method for the outputs based on edit trees.
-"
-2892,1606.00739,Artem Sokolov and Julia Kreutzer and Christopher Lo and Stefan Riezler,Stochastic Structured Prediction under Bandit Feedback,cs.CL cs.LG stat.ML," Stochastic structured prediction under bandit feedback follows a learning
-protocol where on each of a sequence of iterations, the learner receives an
-input, predicts an output structure, and receives partial feedback in the form of a
-task loss evaluation of the predicted structure. We present applications of
-this learning scenario to convex and non-convex objectives for structured
-prediction and analyze them as stochastic first-order methods. We present an
-experimental evaluation on problems of natural language processing over
-exponential output spaces, and compare convergence speed across different
-objectives under the practical criterion of optimal task performance on
-development data and the optimization-theoretic criterion of minimal squared
-gradient norm. Best results under both criteria are obtained for a non-convex
-objective for pairwise preference learning under bandit feedback.
-"
-2893,1606.00776,"Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula,
- Bowen Zhou, Yoshua Bengio, Aaron Courville","Multiresolution Recurrent Neural Networks: An Application to Dialogue
- Response Generation",cs.CL cs.AI cs.LG cs.NE stat.ML," We introduce the multiresolution recurrent neural network, which extends the
-sequence-to-sequence framework to model natural language generation as two
-parallel discrete stochastic processes: a sequence of high-level coarse tokens,
-and a sequence of natural language tokens. There are many ways to estimate or
-learn the high-level coarse tokens, but we argue that a simple extraction
-procedure is sufficient to capture a wealth of high-level discourse semantics.
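To make the prior construction of entry 2890 above concrete, the following minimal sketch turns word counts from a labeled knowledge source into per-topic Dirichlet hyperparameters that bias topic-word distributions toward known topics. The vocabulary, counts, and scaling constants are toy values, not the paper's settings.

```python
import numpy as np

# Toy illustration of entry 2890: knowledge-source counts -> Dirichlet priors.
vocab = {"gene": 0, "protein": 1, "goal": 2, "match": 3}
knowledge = {"biology": {"gene": 40, "protein": 25},
             "football": {"goal": 30, "match": 20}}

def topic_priors(knowledge, vocab, base=0.01, strength=50.0):
    priors = {}
    for label, counts in knowledge.items():
        beta = np.full(len(vocab), base)        # weak symmetric base prior
        total = sum(counts.values())
        for word, c in counts.items():
            beta[vocab[word]] += strength * c / total
        priors[label] = beta  # Dirichlet parameter vector for this topic
    return priors

for label, beta in topic_priors(knowledge, vocab).items():
    print(label, np.round(beta, 2))
```

During inference these priors pull each latent topic toward its knowledge-source counterpart, which is also what lets the source's label transfer to the learned topic.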
-Such a procedure allows training the multiresolution recurrent neural network by
-maximizing the exact joint log-likelihood over both sequences. In contrast to
-the standard log-likelihood objective w.r.t. natural language tokens (word
-perplexity), optimizing the joint log-likelihood biases the model towards
-modeling high-level abstractions. We apply the proposed model to the task of
-dialogue response generation in two challenging domains: the Ubuntu technical
-support domain, and Twitter conversations. On Ubuntu, the model outperforms
-competing approaches by a substantial margin, achieving state-of-the-art
-results according to both automatic evaluation metrics and a human evaluation
-study. On Twitter, the model appears to generate more relevant and on-topic
-responses according to automatic evaluation metrics. Finally, our experiments
-demonstrate that the proposed model is more adept at overcoming the sparsity of
-natural language and is better able to capture long-term structure.
-"
-2894,1606.00819,Alexandre Salle and Marco Idiart and Aline Villavicencio,"Matrix Factorization using Window Sampling and Negative Sampling for
- Improved Word Representations",cs.CL," In this paper, we propose LexVec, a new method for generating distributed
-word representations that uses low-rank, weighted factorization of the Positive
-Point-wise Mutual Information matrix via stochastic gradient descent, employing
-a weighting scheme that assigns heavier penalties for errors on frequent
-co-occurrences while still accounting for negative co-occurrence. Evaluation on
-word similarity and analogy tasks shows that LexVec matches and often
-outperforms state-of-the-art methods on many of these tasks.
-"
-2895,1606.00979,"Yuanzhe Zhang, Kang Liu, Shizhu He, Guoliang Ji, Zhanyi Liu, Hua Wu,
- Jun Zhao","Question Answering over Knowledge Base with Neural Attention Combining
- Global Knowledge Information",cs.IR cs.AI cs.CL cs.NE," With the rapid growth of knowledge bases (KBs) on the web, how to take full
-advantage of them becomes increasingly important. Knowledge base-based question
-answering (KB-QA) is one of the most promising approaches to access the
-substantial knowledge. Meanwhile, as the neural network-based (NN-based) methods
-develop, NN-based KB-QA has already achieved impressive results. However,
-previous work did not put emphasis on question representation, and the question
-is converted into a fixed vector regardless of its candidate answers. This
-simple representation strategy is unable to express the proper information of
-the question. Hence, we present a neural attention-based model to represent the
-questions dynamically according to the different focuses of various candidate
-answer aspects. In addition, we leverage the global knowledge inside the
-underlying KB, aiming at integrating the rich KB information into the
-representation of the answers. It also alleviates the out-of-vocabulary
-(OOV) problem, which helps the attention model to represent the question more
-precisely. The experimental results on WEBQUESTIONS demonstrate the
-effectiveness of the proposed approach.
-"
-2896,1606.01151,Alexander G. Ororbia II and Fridolin Linder and Joshua Snoke,"Using Neural Generative Models to Release Synthetic Twitter Corpora with
- Reduced Stylometric Identifiability of Users",cs.CL," We present a method for generating synthetic versions of Twitter data using
-neural generative models.
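Entry 2894 above factorizes the PPMI matrix with stochastic gradient descent over sampled pairs. The sketch below shows the core computation on a tiny random co-occurrence matrix; the uniform pair sampling, sizes, and learning rate are simplifications of LexVec's actual window and negative sampling schemes.

```python
import numpy as np

# Toy illustration of entry 2894: SGD factorization of a PPMI matrix.
rng = np.random.default_rng(0)
V, dim = 50, 8
counts = rng.poisson(0.3, size=(V, V)) + np.eye(V, dtype=int)  # toy co-occurrences

total = counts.sum()
pw = counts.sum(1) / total
pc = counts.sum(0) / total
pmi = np.log(np.maximum(counts / total, 1e-12) / np.outer(pw, pc))
ppmi = np.maximum(pmi, 0.0)                      # positive PMI

W = rng.normal(0, 0.1, (V, dim))                 # word vectors
C = rng.normal(0, 0.1, (V, dim))                 # context vectors
lr = 0.05
for _ in range(20000):
    i, j = rng.integers(V), rng.integers(V)      # crude stand-in for window +
    err = W[i] @ C[j] - ppmi[i, j]               # negative sampling
    W[i], C[j] = W[i] - lr * err * C[j], C[j] - lr * err * W[i]

print("mean reconstruction error:", np.abs(W @ C.T - ppmi).mean())
```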
The goal is protecting individuals in the source data
-from stylometric re-identification attacks while still releasing data that
-carries research value. Specifically, we generate tweet corpora that maintain
-user-level word distributions by augmenting the neural language models with
-user-specific components. We compare our approach to two standard text data
-protection methods: redaction and iterative translation. We evaluate the three
-methods on measures of risk and utility. We define risk following the
-stylometric models of re-identification, and we define utility based on two
-general word distribution measures and two common text analysis research tasks.
-We find that neural models are able to significantly lower risk over previous
-methods with little cost to utility. We also demonstrate that the neural models
-allow data providers to actively control the risk-utility trade-off through
-model tuning parameters. This work presents promising results for a new tool
-addressing the problem of privacy for free text and sharing social media data
-in a way that respects privacy and is ethically responsible.
-"
-2897,1606.01161,"Jiang Guo, Wanxiang Che, Haifeng Wang and Ting Liu","Exploiting Multi-typed Treebanks for Parsing with Deep Multi-task
- Learning",cs.CL," Various treebanks have been released for dependency parsing. Although
-treebanks may belong to different languages or have different annotation
-schemes, they contain syntactic knowledge that can potentially benefit each
-other. This paper presents a universal framework for exploiting these
-multi-typed treebanks to improve parsing with deep multi-task learning. We
-consider two kinds of treebanks as source: the multilingual universal treebanks
-and the monolingual heterogeneous treebanks. Multiple treebanks are trained
-jointly and interact through multi-level parameter sharing. Experiments on
-several benchmark datasets in various languages demonstrate that our approach
-can make effective use of arbitrary source treebanks to improve target parsing
-models.
-"
-2898,1606.01219,"Steven H. H. Ding, Benjamin C. M. Fung, Farkhund Iqbal, William K.
- Cheung",Learning Stylometric Representations for Authorship Analysis,cs.CL cs.CY cs.SI," Authorship analysis (AA) is the study of unveiling the hidden properties of
-authors from a body of exponentially exploding textual data. It extracts an
-author's identity and sociolinguistic characteristics based on the reflected
-writing styles in the text. It is an essential process for various areas, such
-as cybercrime investigation, psycholinguistics, political socialization, etc.
-However, most of the previous techniques critically depend on the manual
-feature engineering process. Consequently, the choice of feature set has been
-shown to be scenario- or dataset-dependent. In this paper, to mimic the human
-sentence composition process using a neural network approach, we propose to
-incorporate different categories of linguistic features into distributed
-representation of words in order to learn simultaneously the writing style
-representations based on unlabeled texts for authorship analysis. In
-particular, the proposed models allow topical, lexical, syntactical, and
-character-level feature vectors of each document to be extracted as
-stylometrics. We evaluate the performance of our approach on the problems of
-authorship characterization and authorship verification with the Twitter,
-novel, and essay datasets.
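The user-specific augmentation in entry 2896 above can be pictured as a shared language model whose next-word logits are shifted by a per-user component, so that generated corpora preserve user-level word distributions. The sketch below uses fixed toy vectors; in the actual method these components are learned jointly with the neural language model.

```python
import numpy as np

# Toy illustration of entry 2896: a shared LM plus per-user logit shifts.
vocab = ["lol", "research", "data", "</s>"]
shared_logits = np.array([0.2, 0.5, 0.5, 0.1])          # base model preferences
user_component = {"user_a": np.array([1.5, -0.5, 0.0, 0.0]),
                  "user_b": np.array([-1.0, 0.8, 0.6, 0.0])}

def next_word_dist(user):
    logits = shared_logits + user_component[user]        # user-specific shift
    e = np.exp(logits - logits.max())
    return e / e.sum()

for user in user_component:
    print(user, dict(zip(vocab, np.round(next_word_dist(user), 2))))
```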
The experiments suggest that our proposed text -representation outperforms the bag-of-lexical-n-grams, Latent Dirichlet -Allocation, Latent Semantic Analysis, PVDM, PVDBOW, and word2vec -representations. -" -2899,1606.01269,Jason D. Williams and Geoffrey Zweig,"End-to-end LSTM-based dialog control optimized with supervised and - reinforcement learning",cs.CL cs.AI cs.LG," This paper presents a model for end-to-end learning of task-oriented dialog -systems. The main component of the model is a recurrent neural network (an -LSTM), which maps from raw dialog history directly to a distribution over -system actions. The LSTM automatically infers a representation of dialog -history, which relieves the system developer of much of the manual feature -engineering of dialog state. In addition, the developer can provide software -that expresses business rules and provides access to programmatic APIs, -enabling the LSTM to take actions in the real world on behalf of the user. The -LSTM can be optimized using supervised learning (SL), where a domain expert -provides example dialogs which the LSTM should imitate; or using reinforcement -learning (RL), where the system improves by interacting directly with end -users. Experiments show that SL and RL are complementary: SL alone can derive a -reasonable initial policy from a small number of training dialogs; and starting -RL optimization with a policy trained with SL substantially accelerates the -learning rate of RL. -" -2900,1606.01280,"Xingxing Zhang, Jianpeng Cheng and Mirella Lapata",Dependency Parsing as Head Selection,cs.CL cs.LG," Conventional graph-based dependency parsers guarantee a tree structure both -during training and inference. Instead, we formalize dependency parsing as the -problem of independently selecting the head of each word in a sentence. Our -model which we call \textsc{DeNSe} (as shorthand for {\bf De}pendency {\bf -N}eural {\bf Se}lection) produces a distribution over possible heads for each -word using features obtained from a bidirectional recurrent neural network. -Without enforcing structural constraints during training, \textsc{DeNSe} -generates (at inference time) trees for the overwhelming majority of sentences, -while non-tree outputs can be adjusted with a maximum spanning tree algorithm. -We evaluate \textsc{DeNSe} on four languages (English, Chinese, Czech, and -German) with varying degrees of non-projectivity. Despite the simplicity of the -approach, our parsers are on par with the state of the art. -" -2901,1606.01283,Alexandre Salle and Marco Idiart and Aline Villavicencio,"Enhancing the LexVec Distributed Word Representation Model Using - Positional Contexts and External Memory",cs.CL," In this paper we take a state-of-the-art model for distributed word -representation that explicitly factorizes the positive pointwise mutual -information (PPMI) matrix using window sampling and negative sampling and -address two of its shortcomings. We improve syntactic performance by using -positional contexts, and solve the need to store the PPMI matrix in memory by -working on aggregate data in external memory. The effectiveness of both -modifications is shown using word similarity and analogy tasks. -" -2902,1606.01292,Kaisheng Yao and Baolin Peng and Geoffrey Zweig and Kam-Fai Wong,An Attentional Neural Conversation Model with Improved Specificity,cs.CL cs.HC," In this paper we propose a neural conversation model for conducting -dialogues. 
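The head-selection formulation of entry 2900 above reduces parsing to an independent argmax per word over a head-score matrix. A minimal sketch follows; the random score matrix stands in for the outputs of the paper's bidirectional recurrent network, and a real system would repair any resulting cycles with a maximum spanning tree algorithm, as the abstract notes.

```python
import numpy as np

# Toy illustration of entry 2900: dependency parsing as per-word head selection.
rng = np.random.default_rng(7)
words = ["ROOT", "She", "reads", "books"]
n = len(words)
scores = rng.normal(size=(n, n))      # scores[h, m]: head h for modifier m
scores[:, 0] = -np.inf                # ROOT takes no head
np.fill_diagonal(scores, -np.inf)     # no self-loops

heads = scores[:, 1:].argmax(axis=0)  # independent argmax per word
for m, h in enumerate(heads, start=1):
    print(f"{words[m]} <- {words[h]}")
```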
We demonstrate the use of this model to generate help desk
-responses, where users ask questions about PC applications. Our model is
-distinguished by two characteristics. First, it models intention across turns
-with a recurrent network, and incorporates an attention model that is
-conditioned on the representation of intention. Secondly, it avoids generating
-non-specific responses by incorporating an IDF term in the objective function.
-The model is evaluated both as a pure generation model in which a help-desk
-response is generated from scratch, and as a retrieval model with performance
-measured using recall rates of the correct response. Experimental results
-indicate that the model outperforms previously proposed neural conversation
-architectures, and that using specificity in the objective function
-significantly improves performance for both generation and retrieval.
-"
-2903,1606.01305,"David Krueger, Tegan Maharaj, J\'anos Kram\'ar, Mohammad Pezeshki,
- Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron
- Courville, Chris Pal",Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations,cs.NE cs.CL cs.LG," We propose zoneout, a novel method for regularizing RNNs. At each timestep,
-zoneout stochastically forces some hidden units to maintain their previous
-values. Like dropout, zoneout uses random noise to train a pseudo-ensemble,
-improving generalization. But by preserving instead of dropping hidden units,
-gradient information and state information are more readily propagated through
-time, as in feedforward stochastic depth networks. We perform an empirical
-investigation of various RNN regularizers, and find that zoneout gives
-significant performance improvements across tasks. We achieve competitive
-results with relatively simple models in character- and word-level language
-modelling on the Penn Treebank and Text8 datasets, and combining with recurrent
-batch normalization yields state-of-the-art results on permuted sequential
-MNIST.
-"
-2904,1606.01323,Kevin Clark and Christopher D. Manning,"Improving Coreference Resolution by Learning Entity-Level Distributed
- Representations",cs.CL," A long-standing challenge in coreference resolution has been the
-incorporation of entity-level information - features defined over clusters of
-mentions instead of mention pairs. We present a neural network based
-coreference system that produces high-dimensional vector representations for
-pairs of coreference clusters. Using these representations, our system learns
-when combining clusters is desirable. We train the system with a
-learning-to-search algorithm that teaches it which local decisions (cluster
-merges) will lead to a high-scoring final coreference partition. The system
-substantially outperforms the current state-of-the-art on the English and
-Chinese portions of the CoNLL 2012 Shared Task dataset despite using few
-hand-engineered features.
-"
-2905,1606.01341,"Sonse Shimaoka, Pontus Stenetorp, Kentaro Inui, Sebastian Riedel",Neural Architectures for Fine-grained Entity Type Classification,cs.CL," In this work, we investigate several neural network architectures for
-fine-grained entity type classification. Particularly, we consider extensions
-to a recently proposed attentive neural architecture and make three key
-contributions. Previous work on attentive neural architectures does not consider
-hand-crafted features; we combine learnt and hand-crafted features and observe
-that they complement each other.
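Zoneout (entry 2903 above) is simple enough to show in a few lines: at each step a random subset of hidden units keeps its previous value instead of being dropped. The sketch below applies it to a plain tanh RNN step with toy dimensions; the test-time expectation rule mirrors how dropout is usually handled and is an assumption of this sketch.

```python
import numpy as np

# Toy illustration of entry 2903: zoneout on a single RNN step.
rng = np.random.default_rng(0)
dim = 6
Wh, Wx = rng.normal(0, 0.5, (dim, dim)), rng.normal(0, 0.5, (dim, dim))

def step(h_prev, x, z_prob=0.15, training=True):
    h_new = np.tanh(Wh @ h_prev + Wx @ x)
    if training:
        keep_prev = rng.random(dim) < z_prob     # units "zoned out" this step
        return np.where(keep_prev, h_prev, h_new)
    return z_prob * h_prev + (1 - z_prob) * h_new  # expected value at test time

h = np.zeros(dim)
for _ in range(5):
    h = step(h, rng.normal(size=dim))
print(np.round(h, 3))
```

Because the preserved units carry the old state forward unchanged, gradients flow through them identically, which is the property the abstract contrasts with dropout.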
Additionally, through quantitative analysis we
-establish that the attention mechanism is capable of learning to attend over
-syntactic heads and the phrase containing the mention, both of which are known
-to be strong hand-crafted features for our task. We enable parameter sharing through
-a hierarchical label encoding method, which in low-dimensional projections shows
-clear clusters for each type hierarchy. Lastly, despite using the same
-evaluation dataset, the literature frequently compares models trained using
-different data. We establish that the choice of training data has a drastic
-impact on performance, with decreases of as much as 9.85% in loose micro F1 score
-for a previously proposed method. Despite this, our best model achieves
-state-of-the-art results with 75.36% loose micro F1 score on the
-well-established FIGER (GOLD) dataset.
-"
-2906,1606.01404,"Vladyslav Kolesnyk, Tim Rockt\""aschel, Sebastian Riedel",Generating Natural Language Inference Chains,cs.CL cs.AI cs.NE," The ability to reason with natural language is a fundamental prerequisite for
-many NLP tasks such as information extraction, machine translation and question
-answering. To quantify this ability, systems are commonly tested on whether they
-can recognize textual entailment, i.e., whether one sentence can be inferred
-from another one. However, in most NLP applications only single source
-sentences instead of sentence pairs are available. Hence, we propose a new task
-that measures how well a model can generate an entailed sentence from a source
-sentence. We take entailment-pairs of the Stanford Natural Language Inference
-corpus and train an LSTM with attention. On a manually annotated test set we
-found that 82% of generated sentences are correct, an improvement of 10.3% over
-an LSTM baseline. A qualitative analysis shows that this model is not only
-capable of shortening input sentences, but also inferring new statements via
-paraphrasing and phrase entailment. We then apply this model recursively to
-input-output pairs, thereby generating natural language inference chains that
-can be used to automatically construct an entailment graph from source
-sentences. Finally, by swapping source and target sentences we can also train a
-model that given an input sentence invents additional information to generate a
-new sentence.
-"
-2907,1606.01433,Jason Alan Fries,"Brundlefly at SemEval-2016 Task 12: Recurrent Neural Networks vs. Joint
- Inference for Clinical Temporal Information Extraction",cs.CL," We submitted two systems to the SemEval-2016 Task 12: Clinical TempEval
-challenge, participating in Phase 1, where we identified text spans of time and
-event expressions in clinical notes and Phase 2, where we predicted a relation
-between an event and its parent document creation time.
- For temporal entity extraction, we find that a joint inference-based approach
-using structured prediction outperforms a vanilla recurrent neural network that
-incorporates word embeddings trained on a variety of large clinical document
-sets. For document creation time relations, we find that a combination of date
-canonicalization and distant supervision rules for predicting relations on both
-events and time expressions improves classification, though gains are limited,
-likely due to the small scale of training data.
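The attentive architecture discussed in entry 2905 above scores context tokens against the mention and feeds the attention-weighted sum to the type classifier. The sketch below shows that single attention step with random toy vectors; the dot-product scoring is a simplification of the learned attention function.

```python
import numpy as np

# Toy illustration of entry 2905: attention over context for entity typing.
rng = np.random.default_rng(2)
dim = 5
context = rng.normal(size=(7, dim))   # vectors for the 7 context tokens
mention = rng.normal(size=dim)        # e.g. averaged vectors of the mention span

scores = context @ mention
weights = np.exp(scores - scores.max())
weights /= weights.sum()              # softmax attention over context tokens
representation = weights @ context    # input feature for the type classifier

print(np.round(weights, 3), np.round(representation, 3))
```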
-" -2908,1606.01515,Dimitri Kartsaklis (Queen Mary University of London),Coordination in Categorical Compositional Distributional Semantics,cs.CL cs.AI math.CT," An open problem with categorical compositional distributional semantics is -the representation of words that are considered semantically vacuous from a -distributional perspective, such as determiners, prepositions, relative -pronouns or coordinators. This paper deals with the topic of coordination -between identical syntactic types, which accounts for the majority of -coordination cases in language. By exploiting the compact closed structure of -the underlying category and Frobenius operators canonically induced over the -fixed basis of finite-dimensional vector spaces, we provide a morphism as -representation of a coordinator tensor, and we show how it lifts from atomic -types to compound types. Linguistic intuitions are provided, and the importance -of the Frobenius operators as an addition to the compact closed setting with -regard to language is discussed. -" -2909,1606.01541,"Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, Dan - Jurafsky",Deep Reinforcement Learning for Dialogue Generation,cs.CL," Recent neural models of dialogue generation offer great promise for -generating responses for conversational agents, but tend to be shortsighted, -predicting utterances one at a time while ignoring their influence on future -outcomes. Modeling the future direction of a dialogue is crucial to generating -coherent, interesting dialogues, a need which led traditional NLP models of -dialogue to draw on reinforcement learning. In this paper, we show how to -integrate these goals, applying deep reinforcement learning to model future -reward in chatbot dialogue. The model simulates dialogues between two virtual -agents, using policy gradient methods to reward sequences that display three -useful conversational properties: informativity (non-repetitive turns), -coherence, and ease of answering (related to forward-looking function). We -evaluate our model on diversity, length as well as with human judges, showing -that the proposed algorithm generates more interactive responses and manages to -foster a more sustained conversation in dialogue simulation. This work marks a -first step towards learning a neural conversational model based on the -long-term success of dialogues. -" -2910,1606.01545,"Jiwei Li, Dan Jurafsky",Neural Net Models for Open-Domain Discourse Coherence,cs.CL," Discourse coherence is strongly associated with text quality, making it -important to natural language generation and understanding. Yet existing models -of coherence focus on measuring individual aspects of coherence (lexical -overlap, rhetorical structure, entity centering) in narrow domains. - In this paper, we describe domain-independent neural models of discourse -coherence that are capable of measuring multiple aspects of coherence in -existing sentences and can maintain coherence while generating new sentences. -We study both discriminative models that learn to distinguish coherent from -incoherent discourse, and generative models that produce coherent text, -including a novel neural latent-variable Markovian generative model that -captures the latent discourse dependencies between sentences in a text. - Our work achieves state-of-the-art performance on multiple coherence -evaluations, and marks an initial step in generating coherent texts given -discourse contexts. -" -2911,1606.01549,"Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. 
Cohen, Ruslan
- Salakhutdinov",Gated-Attention Readers for Text Comprehension,cs.CL cs.LG," In this paper we study the problem of answering cloze-style questions over
-documents. Our model, the Gated-Attention (GA) Reader, integrates a multi-hop
-architecture with a novel attention mechanism, which is based on multiplicative
-interactions between the query embedding and the intermediate states of a
-recurrent neural network document reader. This enables the reader to build
-query-specific representations of tokens in the document for accurate answer
-selection. The GA Reader obtains state-of-the-art results on three benchmarks
-for this task--the CNN \& Daily Mail news stories and the Who Did What dataset.
-The effectiveness of multiplicative interaction is demonstrated by an ablation
-study, and by comparing to alternative compositional operators for implementing
-the gated-attention. The code is available at
-https://github.com/bdhingra/ga-reader.
-"
-2912,1606.01603,"Ting Liu, Yiming Cui, Qingyu Yin, Weinan Zhang, Shijin Wang and
- Guoping Hu","Generating and Exploiting Large-scale Pseudo Training Data for Zero
- Pronoun Resolution",cs.CL," Most existing approaches for zero pronoun resolution rely heavily on
-annotated data, which is often released by shared task organizers. Therefore,
-the lack of annotated data becomes a major obstacle to progress on the zero
-pronoun resolution task. Also, it is expensive to devote manpower to labeling
-the data for better performance. To alleviate the problem above, in this paper,
-we propose a simple but novel approach to automatically generate large-scale
-pseudo training data for zero pronoun resolution. Furthermore, we successfully
-transfer the cloze-style reading comprehension neural network model to the zero
-pronoun resolution task and propose a two-step training mechanism to overcome
-the gap between the pseudo training data and the real one. Experimental results
-show that the proposed approach significantly outperforms the state-of-the-art
-systems with an absolute improvement of 3.1% in F-score on OntoNotes 5.0 data.
-"
-2913,1606.01614,"Xilun Chen, Yu Sun, Ben Athiwaratkun, Claire Cardie and Kilian
- Weinberger","Adversarial Deep Averaging Networks for Cross-Lingual Sentiment
- Classification",cs.CL," In recent years great success has been achieved in sentiment classification
-for English, thanks in part to the availability of copious annotated resources.
-Unfortunately, most languages do not enjoy such an abundance of labeled data.
-To tackle the sentiment classification problem in low-resource languages
-without adequate annotated data, we propose an Adversarial Deep Averaging
-Network (ADAN) to transfer the knowledge learned from labeled data on a
-resource-rich source language to low-resource languages where only unlabeled
-data exists. ADAN has two discriminative branches: a sentiment classifier and
-an adversarial language discriminator. Both branches take input from a shared
-feature extractor to learn hidden representations that are simultaneously
-indicative for the classification task and invariant across languages.
-Experiments on Chinese and Arabic sentiment classification demonstrate that
-ADAN significantly outperforms state-of-the-art systems.
-"
-2914,1606.01700,Yasumasa Miyamoto and Kyunghyun Cho,Gated Word-Character Recurrent Language Model,cs.CL," We introduce a recurrent neural network language model (RNN-LM) with long
-short-term memory (LSTM) units that utilizes both character-level and
-word-level inputs.
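The adversarial setup of entry 2913 above trains a shared feature extractor to help the sentiment classifier while hurting the language discriminator. The sketch below shows one such conceptual update with toy linear components and a hand-derived gradient; it is a minimal caricature of the idea, not the paper's deep averaging architecture.

```python
import numpy as np

# Toy illustration of entry 2913: one adversarial update of a shared extractor.
rng = np.random.default_rng(8)
dim = 4
F = rng.normal(0, 0.1, (dim, dim))   # shared feature extractor
w_sent = rng.normal(0, 0.1, dim)     # sentiment classifier weights
w_lang = rng.normal(0, 0.1, dim)     # adversarial language discriminator weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y_sent, y_lang = rng.normal(size=dim), 1.0, 0.0
h = F @ x
# gradients of the two cross-entropy losses w.r.t. h (single toy example)
g_sent = (sigmoid(w_sent @ h) - y_sent) * w_sent
g_lang = (sigmoid(w_lang @ h) - y_lang) * w_lang
lam, lr = 0.1, 0.05
# descend on the sentiment loss, ascend on the language loss (reversal)
F -= lr * np.outer(g_sent - lam * g_lang, x)
print(np.round(F, 3))
```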
Our model has a gate that adaptively finds the optimal
-mixture of the character-level and word-level inputs. The gate creates the
-final vector representation of a word by combining two distinct representations
-of the word. The character-level inputs are converted into vector
-representations of words using a bidirectional LSTM. The word-level inputs are
-projected into another high-dimensional space by a word lookup table. The final
-vector representations of words are used in the LSTM language model which
-predicts the next word given all the preceding words. Our model with the gating
-mechanism effectively utilizes the character-level inputs for rare and
-out-of-vocabulary words and outperforms word-level language models on several
-English corpora.
-"
-2915,1606.01720,Richard Moot (LaBRI),Proof nets for the Displacement calculus,cs.LO cs.CL," We present a proof net calculus for the Displacement calculus and show its
-correctness. This is the first proof net calculus which models the Displacement
-calculus directly and not by some sort of translation into another formalism.
-The proof net calculus opens up new possibilities for parsing and proof search
-with the Displacement calculus.
-"
-2916,1606.01781,"Alexis Conneau, Holger Schwenk, Lo\""ic Barrault, Yann Lecun",Very Deep Convolutional Networks for Text Classification,cs.CL cs.LG cs.NE," The dominant approaches for many NLP tasks are recurrent neural networks, in
-particular LSTMs, and convolutional neural networks. However, these
-architectures are rather shallow in comparison to the deep convolutional
-networks which have pushed the state-of-the-art in computer vision. We present
-a new architecture (VDCNN) for text processing which operates directly at the
-character level and uses only small convolutions and pooling operations. We are
-able to show that the performance of this model increases with depth: using up
-to 29 convolutional layers, we report improvements over the state-of-the-art on
-several public text classification tasks. To the best of our knowledge, this is
-the first time that very deep convolutional nets have been applied to text
-processing.
-"
-2917,1606.01792,"Yaohua Tang, Fandong Meng, Zhengdong Lu, Hang Li, Philip L.H. Yu",Neural Machine Translation with External Phrase Memory,cs.CL," In this paper, we propose phraseNet, a neural machine translator with a
-phrase memory which stores phrase pairs in symbolic form, mined from a corpus or
-specified by human experts. For any given source sentence, phraseNet scans the
-phrase memory to determine the candidate phrase pairs and integrates tagging
-information in the representation of the source sentence accordingly. The decoder
-utilizes a mixture of a word-generating component and a phrase-generating
-component, with a specifically designed strategy to generate a sequence of
-multiple words all at once. phraseNet not only takes a step towards
-incorporating external knowledge into neural machine translation, but also
-extends the word-by-word generation mechanism of recurrent
-neural networks. Our empirical study on Chinese-to-English translation shows
-that, with a carefully chosen phrase table in memory, phraseNet yields 3.45 BLEU
-improvement over the generic neural machine translator.
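The gating mechanism of entry 2914 above combines a word-lookup vector with a character-composed vector via a scalar sigmoid gate. A minimal sketch follows; `char_vec` stands in for the output of the bidirectional character LSTM, and the gate parameterization is the simple form the abstract implies.

```python
import numpy as np

# Toy illustration of entry 2914: gate mixing word- and character-level vectors.
rng = np.random.default_rng(3)
dim = 8
word_vec = rng.normal(size=dim)   # row of the word lookup table
char_vec = rng.normal(size=dim)   # stand-in for the char-BiLSTM composition
v, b = rng.normal(size=dim), 0.0  # gate parameters

gate = 1.0 / (1.0 + np.exp(-(v @ word_vec + b)))   # scalar sigmoid gate
embedding = gate * char_vec + (1.0 - gate) * word_vec

print(round(float(gate), 3), np.round(embedding, 3))
```

For rare or out-of-vocabulary words the gate can lean toward the character side, which is what the abstract credits for the gains on rare words.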
-"
-2918,1606.01847,"Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor
- Darrell, and Marcus Rohrbach","Multimodal Compact Bilinear Pooling for Visual Question Answering and
- Visual Grounding",cs.CV cs.AI cs.CL," Modeling textual or visual information with vector representations trained
-from large language or visual datasets has been successfully explored in recent
-years. However, tasks such as visual question answering require combining these
-vector representations with each other. Approaches to multimodal pooling
-include element-wise product or sum, as well as concatenation of the visual and
-textual representations. We hypothesize that these methods are not as
-expressive as an outer product of the visual and textual vectors. As the outer
-product is typically infeasible due to its high dimensionality, we instead
-propose utilizing Multimodal Compact Bilinear pooling (MCB) to efficiently and
-expressively combine multimodal features. We extensively evaluate MCB on the
-visual question answering and grounding tasks. We consistently show the benefit
-of MCB over ablations without MCB. For visual question answering, we present an
-architecture which uses MCB twice, once for predicting attention over spatial
-features and again to combine the attended representation with the question
-representation. This model outperforms the state-of-the-art on the Visual7W
-dataset and the VQA challenge.
-"
-2919,1606.01933,"Ankur P. Parikh, Oscar T\""ackstr\""om, Dipanjan Das, Jakob Uszkoreit",A Decomposable Attention Model for Natural Language Inference,cs.CL," We propose a simple neural architecture for natural language inference. Our
-approach uses attention to decompose the problem into subproblems that can be
-solved separately, thus making it trivially parallelizable. On the Stanford
-Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results
-with almost an order of magnitude fewer parameters than previous work and
-without relying on any word-order information. Adding intra-sentence attention
-that takes a minimum amount of order into account yields further improvements.
-"
-2920,1606.01990,"Attapol T. Rutherford, Vera Demberg, Nianwen Xue","Neural Network Models for Implicit Discourse Relation Classification in
- English and Chinese without Surface Features",cs.CL," Inferring implicit discourse relations in natural language text is the most
-difficult subtask in discourse parsing. Surface features achieve good
-performance, but they are not readily applicable to other languages without
-semantic lexicons. Previous neural models require parses, surface features, or
-a small label set to work well. Here, we propose neural network models that are
-based on feedforward and long short-term memory architectures without any
-surface features. To our surprise, our best-configured feedforward architecture
-outperforms the LSTM-based model in most cases despite thorough tuning. Under
-various fine-grained label sets and a cross-linguistic setting, our feedforward
-models perform consistently better or at least just as well as systems that
-require hand-crafted surface features. Our models present the first neural
-Chinese discourse parser in the style of Chinese Discourse Treebank, showing
-that our results hold cross-linguistically.
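Compact bilinear pooling (entry 2918 above) approximates the outer product of two vectors by projecting each with Count Sketch and multiplying in the FFT domain. The sketch below shows that computation on random toy vectors; dimensions are illustrative, and the Count Sketch construction follows the standard recipe rather than any implementation detail from the paper.

```python
import numpy as np

# Toy illustration of entry 2918: Multimodal Compact Bilinear pooling via
# Count Sketch projections combined in the frequency domain.
rng = np.random.default_rng(4)
d, D = 16, 64                        # input and output dimensionalities

def make_sketch(d, D):
    h = rng.integers(0, D, size=d)   # random output buckets
    s = rng.choice([-1.0, 1.0], d)   # random signs
    def sketch(x):
        y = np.zeros(D)
        np.add.at(y, h, s * x)       # scatter-add signed inputs into buckets
        return y
    return sketch

sketch_v, sketch_q = make_sketch(d, D), make_sketch(d, D)
visual, question = rng.normal(size=d), rng.normal(size=d)

# Convolution of the two sketches == sketch of the outer product (flattened).
mcb = np.fft.irfft(np.fft.rfft(sketch_v(visual)) * np.fft.rfft(sketch_q(question)), n=D)
print(mcb.shape, np.round(mcb[:4], 3))
```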
-"
-2921,1606.01994,"Zihang Dai, Lei Li, Wei Xu","CFO: Conditional Focused Neural Question Answering with Large-scale
- Knowledge Bases",cs.CL," How can we enable computers to automatically answer questions like ""Who
-created the character Harry Potter""? Carefully built knowledge bases provide
-rich sources of facts. However, it remains a challenge to answer factoid
-questions raised in natural language due to numerous expressions of one
-question. In particular, we focus on the most common questions --- ones that
-can be answered with a single fact in the knowledge base. We propose CFO, a
-Conditional Focused neural-network-based approach to answering factoid
-questions with knowledge bases. Our approach first zooms in on a question to find
-more probable candidate subject mentions, and infers the final answers with a
-unified conditional probabilistic framework. Powered by deep recurrent neural
-networks and neural embeddings, our proposed CFO achieves an accuracy of 75.7%
-on a dataset of 108k questions - the largest public one to date. It outperforms
-the current state of the art by an absolute margin of 11.8%.
-"
-2922,1606.02003,"Mingxuan Wang, Zhengdong Lu, Hang Li and Qun Liu",Memory-enhanced Decoder for Neural Machine Translation,cs.CL," We propose to enhance the RNN decoder in a neural machine translator (NMT)
-with external memory, as a natural but powerful extension to the state in the
-decoding RNN. This memory-enhanced RNN decoder is called \textsc{MemDec}. At
-each time step during decoding, \textsc{MemDec} will read from this memory and write
-to this memory once, both with content-based addressing. Unlike the unbounded
-memory in previous work \cite{RNNsearch}, which stores the representation of the source
-sentence, the memory in \textsc{MemDec} is a matrix of pre-determined size
-designed to better capture the information important for the decoding process
-at each time step. Our empirical study on Chinese-English translation shows
-that it can improve by $4.8$ BLEU upon Groundhog and $5.3$ BLEU upon Moses,
-yielding the best performance achieved with the same training set.
-"
-2923,1606.02006,"Philip Arthur, Graham Neubig, Satoshi Nakamura","Incorporating Discrete Translation Lexicons into Neural Machine
- Translation",cs.CL," Neural machine translation (NMT) often makes mistakes in translating
-low-frequency content words that are essential to understanding the meaning of
-the sentence. We propose a method to alleviate this problem by augmenting NMT
-systems with discrete translation lexicons that efficiently encode translations
-of these low-frequency words. We describe a method to calculate the lexicon
-probability of the next word in the translation candidate by using the
-attention vector of the NMT model to select which source word lexical
-probabilities the model should focus on. We test two methods to combine this
-probability with the standard NMT probability: (1) using it as a bias, and (2)
-linear interpolation. Experiments on two corpora show an improvement of 2.0-2.3
-BLEU and 0.13-0.44 NIST score, and faster convergence time.
-"
-2924,1606.02012,"Kyunghyun Cho, Masha Esipova",Can neural machine translation do simultaneous translation?,cs.CL," We investigate the potential of attention-based neural machine translation in
-simultaneous translation. We introduce a novel decoding algorithm, called
-simultaneous greedy decoding, that allows an existing neural machine
-translation model to begin translating before a full source sentence is
-received.
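The lexicon integration of entry 2923 above computes an attention-weighted lexicon probability and combines it with the NMT distribution; the sketch below shows the linear-interpolation variant (method 2 in the abstract). The lexicon table, attention weights, and NMT distribution are toy values.

```python
import numpy as np

# Toy illustration of entry 2923: interpolating lexicon and NMT probabilities.
target_vocab = ["house", "home", "dog"]
lex = {"Haus": np.array([0.7, 0.3, 0.0]),    # p(target | source word)
       "Hund": np.array([0.0, 0.0, 1.0])}

source = ["Haus", "Hund"]
attention = np.array([0.9, 0.1])             # from the NMT attention model
p_nmt = np.array([0.5, 0.2, 0.3])            # NMT's next-word distribution

p_lex = sum(a * lex[w] for a, w in zip(attention, source))
lam = 0.3                                    # interpolation weight
p = (1 - lam) * p_nmt + lam * p_lex
print(dict(zip(target_vocab, np.round(p, 3))), p.sum())
```

The bias variant mentioned in the abstract would instead add (a transform of) p_lex inside the softmax rather than mixing the final probabilities.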
This approach differs from previous work on simultaneous
-translation in that segmentation and translation are done jointly to maximize
-the translation quality and in that translating each segment is strongly
-conditioned on all the previous segments. This paper presents a first step
-toward building a full simultaneous translation system based on neural machine
-translation.
-"
-2925,1606.02126,Chenhui Chu and Sadao Kurohashi,"Supervised Syntax-based Alignment between English Sentences and Abstract
- Meaning Representation Graphs",cs.CL," As alignment links are not given between English sentences and Abstract
-Meaning Representation (AMR) graphs in the AMR annotation, automatic alignment
-becomes indispensable for training an AMR parser. Previous studies formalize it
-as a string-to-string problem and solve it in an unsupervised way, which
-suffers from data sparseness due to the small size of training data for
-English-AMR alignment. In this paper, we formalize it as a syntax-based
-alignment problem and solve it in a supervised manner based on syntax trees,
-which can address the data sparseness problem by generalizing English-AMR
-tokens to syntax tags. Experiments verify the effectiveness of the proposed
-method not only for English-AMR alignment, but also for AMR parsing.
-"
-2926,1606.02245,"Alessandro Sordoni and Philip Bachman and Adam Trischler and Yoshua
- Bengio",Iterative Alternating Neural Attention for Machine Reading,cs.CL cs.NE," We propose a novel neural attention architecture to tackle machine
-comprehension tasks, such as answering Cloze-style queries with respect to a
-document. Unlike previous models, we do not collapse the query into a single
-vector; instead, we deploy an iterative alternating attention mechanism that
-allows a fine-grained exploration of both the query and the document. Our model
-outperforms state-of-the-art baselines in standard machine comprehension
-benchmarks such as CNN news articles and the Children's Book Test (CBT)
-dataset.
-"
-2927,1606.02270,"Adam Trischler, Zheng Ye, Xingdi Yuan, Kaheer Suleman",Natural Language Comprehension with the EpiReader,cs.CL," We present the EpiReader, a novel model for machine comprehension of text.
-Machine comprehension of unstructured, real-world text is a major research goal
-for natural language processing. Current tests of machine comprehension pose
-questions whose answers can be inferred from some supporting text, and evaluate
-a model's response to the questions. The EpiReader is an end-to-end neural
-model comprising two components: the first component proposes a small set of
-candidate answers after comparing a question to its supporting text, and the
-second component formulates hypotheses using the proposed candidates and the
-question, then reranks the hypotheses based on their estimated concordance with
-the supporting text. We present experiments demonstrating that the EpiReader
-sets a new state-of-the-art on the CNN and Children's Book Test machine
-comprehension benchmarks, outperforming previous neural models by a significant
-margin.
-"
-2928,1606.02276,"Nikolaos Pappas, Miriam Redi, Mercan Topkara, Brendan Jou, Hongyi Liu,
- Tao Chen, Shih-Fu Chang",Multilingual Visual Sentiment Concept Matching,cs.CL cs.CV cs.IR cs.MM," The impact of culture in visual emotion perception has recently captured the
-attention of multimedia research.
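The two-stage design of entry 2927 above (propose candidates, then re-rank filled-in hypotheses against the supporting text) can be caricatured with simple string counts. In the sketch below, both scoring functions are crude toy stand-ins for the EpiReader's neural components, included only to show the control flow.

```python
# Toy illustration of entry 2927: propose candidate answers, then re-rank
# hypotheses formed by filling the query placeholder with each candidate.

def propose(query, text, k=2):
    candidates = {w for w in text.split() if w.istitle()}
    scored = [(w, text.split().count(w)) for w in candidates]  # toy proposer
    return [w for w, _ in sorted(scored, key=lambda p: -p[1])[:k]]

def rerank(query, text, candidates):
    def concordance(hypothesis):                               # toy reasoner
        return sum(hypothesis.count(w) for w in text.split())
    hyps = [(c, concordance(query.replace("@placeholder", c))) for c in candidates]
    return max(hyps, key=lambda p: p[1])[0]

text = "Mary gave John the book . Mary smiled ."
query = "@placeholder gave John the book"
print(rerank(query, text, propose(query, text)))  # -> "Mary"
```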
In this study, we provide powerful
-computational linguistics tools to explore, retrieve and browse a dataset of
-16K multilingual affective visual concepts and 7.3M Flickr images. First, we
-design an effective crowdsourcing experiment to collect human judgements of
-sentiment connected to the visual concepts. We then use word embeddings to
-represent these concepts in a low-dimensional vector space, allowing us to
-expand the meaning around concepts, and thus enabling insight about
-commonalities and differences among different languages. We compare a variety
-of concept representations through a novel evaluation task based on the notion
-of visual semantic relatedness. Based on these representations, we design
-clustering schemes to group multilingual visual concepts, and evaluate them
-with novel metrics based on the crowdsourced sentiment annotations as well as
-visual semantic relatedness. The proposed clustering framework enables us to
-analyze the full multilingual dataset in-depth and also show an application on
-a facial data subset, exploring cultural insights of portrait-related
-affective visual concepts.
-"
-2929,1606.02342,Shashi Narayan and Shay B. Cohen,Optimizing Spectral Learning for Parsing,cs.CL," We describe a search algorithm for optimizing the number of latent states
-when estimating latent-variable PCFGs with spectral methods. Our results show
-that contrary to the common belief that the number of latent states for each
-nonterminal in an L-PCFG can be decided in isolation with spectral methods,
-parsing results significantly improve if the number of latent states for each
-nonterminal is globally optimized, while taking into account interactions
-between the different nonterminals. In addition, we contribute an empirical
-analysis of spectral algorithms on eight morphologically rich languages:
-Basque, French, German, Hebrew, Hungarian, Korean, Polish and Swedish. Our
-results show that our estimation consistently performs better than or close to
-coarse-to-fine expectation-maximization techniques for these languages.
-"
-2930,1606.02440,"Gregory Grefenstette (TAO), Lawrence Muchemi (TAO)","On the Place of Text Data in Lifelogs, and Text Analysis via Semantic
- Facets",cs.CL cs.CY cs.HC," Current research in lifelog data has not paid enough attention to analysis of
-cognitive activities in comparison to physical activities. We argue that as we
-look into the future, wearable devices are going to be cheaper and more
-prevalent and textual data will play a more significant role. Data captured by
-lifelogging devices will increasingly include speech and text, potentially
-useful in analysis of intellectual activities. Analyzing what a person hears,
-reads, and sees, we should be able to measure the extent of cognitive activity
-devoted to a certain topic or subject by a learner. Text-based lifelog records
-can benefit from semantic analysis tools developed for natural language
-processing. We show how semantic analysis of such text data can be achieved
-through the use of taxonomic subject facets and how these facets might be
-useful in quantifying cognitive activity devoted to various topics in a
-person's day. We are currently developing a method to automatically create
-taxonomic topic vocabularies that can be applied to this detection of
-intellectual activity.
-"
-2931,1606.02447,Sida I. Wang and Percy Liang and Christopher D.
Manning,Learning Language Games through Interaction,cs.CL cs.AI," We introduce a new language learning setting relevant to building adaptive -natural language interfaces. It is inspired by Wittgenstein's language games: a -human wishes to accomplish some task (e.g., achieving a certain configuration -of blocks), but can only communicate with a computer, who performs the actual -actions (e.g., removing all red blocks). The computer initially knows nothing -about language and therefore must learn it from scratch through interaction, -while the human adapts to the computer's capabilities. We created a game in a -blocks world and collected interactions from 100 people playing it. First, we -analyze the humans' strategies, showing that using compositionality and -avoiding synonyms correlates positively with task performance. Second, we -compare computer strategies, showing how to quickly learn a semantic parsing -model from scratch, and that modeling pragmatics further accelerates learning -for successful players. -" -2932,1606.02461,Ran Tian and Naoaki Okazaki and Kentaro Inui,"Learning Semantically and Additively Compositional Distributional - Representations",cs.CL," This paper connects a vector-based composition model to a formal semantics, -the Dependency-based Compositional Semantics (DCS). We show theoretical -evidence that the vector compositions in our model conform to the logic of DCS. -Experimentally, we show that vector-based composition brings a strong ability -to calculate similar phrases as similar vectors, achieving near -state-of-the-art on a wide range of phrase similarity tasks and relation -classification; meanwhile, DCS can guide building vectors for structured -queries that can be directly executed. We evaluate this utility on sentence -completion task and report a new state-of-the-art. -" -2933,1606.02514,"Luis Espinosa-Anke, Roberto Carlini, Horacio Saggion, Francesco - Ronzano",DefExt: A Semi Supervised Definition Extraction Tool,cs.CL," We present DefExt, an easy to use semi supervised Definition Extraction Tool. -DefExt is designed to extract from a target corpus those textual fragments -where a term is explicitly mentioned together with its core features, i.e. its -definition. It works on the back of a Conditional Random Fields based -sequential labeling algorithm and a bootstrapping approach. Bootstrapping -enables the model to gradually become more aware of the idiosyncrasies of the -target corpus. In this paper we describe the main components of the toolkit as -well as experimental results stemming from both automatic and manual -evaluation. We release DefExt as open source along with the necessary files to -run it in any Unix machine. We also provide access to training and test data -for immediate use. -" -2934,1606.02529,Jessica Ficler and Yoav Goldberg,Coordination Annotation Extension in the Penn Tree Bank,cs.CL," Coordination is an important and common syntactic construction which is not -handled well by state of the art parsers. Coordinations in the Penn Treebank -are missing internal structure in many cases, do not include explicit marking -of the conjuncts and contain various errors and inconsistencies. In this work, -we initiated manual annotation process for solving these issues. We identify -the different elements in a coordination phrase and label each element with its -function. We add phrase boundaries when these are missing, unify -inconsistencies, and fix errors. 
The outcome is an extension of the PTB that
-includes consistent and detailed structures for coordinations. We make the
-coordination annotation publicly available, in the hope that it will facilitate
-further research into coordination disambiguation.
-"
-2935,1606.02555,Marco Dinarelli and Isabelle Tellier,Improving Recurrent Neural Networks For Sequence Labelling,cs.CL cs.LG cs.NE," In this paper we study different types of Recurrent Neural Networks (RNN) for
-sequence labeling tasks. We propose two new variants of RNNs integrating
-improvements for sequence labeling, and we compare them to the more traditional
-Elman and Jordan RNNs. We compare all models, both traditional and new, on
-four distinct sequence labeling tasks: two on Spoken Language Understanding
-(ATIS and MEDIA), and two on POS tagging, for the French Treebank (FTB) and the
-Penn Treebank (PTB) corpora. The results show that our new variants of RNNs are
-always more effective than the others.
-"
-2936,1606.02560,Tiancheng Zhao and Maxine Eskenazi,"Towards End-to-End Learning for Dialog State Tracking and Management
- using Deep Reinforcement Learning",cs.AI cs.CL cs.LG," This paper presents an end-to-end framework for task-oriented dialog systems
-using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to
-interface with a relational database and jointly learn policies for both
-language understanding and dialog strategy. Moreover, we propose a hybrid
-algorithm that combines the strengths of reinforcement learning and supervised
-learning to achieve faster learning speed. We evaluated the proposed model on a
-20 Question Game conversational game simulator. Results show that the proposed
-method outperforms the modular-based baseline and learns a distributed
-representation of the latent dialog state.
-"
-2937,1606.02562,"Tiancheng Zhao, Kyusong Lee, Maxine Eskenazi","DialPort: Connecting the Spoken Dialog Research Community to Real User
- Data",cs.AI cs.CL," This paper describes a new spoken dialog portal that connects systems
-produced by the spoken dialog academic research community and gives them access
-to real users. We introduce a distributed, multi-modal, multi-agent prototype
-dialog framework that affords easy integration with various remote resources,
-ranging from end-to-end dialog systems to external knowledge APIs. To date, the
-DialPort portal has successfully connected to the multi-domain spoken dialog
-system at Cambridge University, the NOAA (National Oceanic and Atmospheric
-Administration) weather API and the Yelp API.
-"
-2938,1606.02601,Kris Cao and Marek Rei,A Joint Model for Word Embedding and Word Morphology,cs.CL," This paper presents a joint model for performing unsupervised morphological
-analysis on words, and learning a character-level composition function from
-morphemes to word embeddings. Our model splits individual words into segments,
-and weights each segment according to its ability to predict context words. Our
-morphological analysis is comparable to dedicated morphological analyzers at
-the task of morpheme boundary recovery, and also performs better than
-word-based embedding models at the task of syntactic analogy answering.
-Finally, we show that incorporating morphology explicitly into character-level
-models helps them produce embeddings for unseen words which correlate better
-with human judgments.
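The composition step of entry 2938 above builds a word vector as a weighted sum of segment vectors, with weights reflecting how well each segment predicts the context. The sketch below fixes a segmentation and uses dot-product scores against a context vector; both the segmentation and the vectors are toy stand-ins for quantities the joint model would learn.

```python
import numpy as np

# Toy illustration of entry 2938: context-weighted composition over morphemes.
rng = np.random.default_rng(5)
dim = 6
seg_vec = {s: rng.normal(size=dim) for s in ["un", "break", "able"]}
context_vec = rng.normal(size=dim)

segments = ["un", "break", "able"]
scores = np.array([seg_vec[s] @ context_vec for s in segments])
weights = np.exp(scores - scores.max())
weights /= weights.sum()                       # per-segment attention

word_vec = sum(w * seg_vec[s] for w, s in zip(weights, segments))
print(np.round(weights, 3), np.round(word_vec, 3))
```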
-"
-2939,1606.02638,"Chaitanya Shivade, Preethi Raghavan, Siddharth Patwardhan",Addressing Limited Data for Textual Entailment Across Domains,cs.CL," We seek to address the lack of labeled data (and high cost of annotation) for
-textual entailment in some domains. To that end, we first create (for
-experimental purposes) an entailment dataset for the clinical domain, and a
-highly competitive supervised entailment system, ENT, that is effective (out of
-the box) on two domains. We then explore self-training and active learning
-strategies to address the lack of labeled data. With self-training, we
-successfully exploit unlabeled data to improve over ENT by 15% F-score on the
-newswire domain, and 13% F-score on clinical data. On the other hand, our
-active learning experiments demonstrate that we can match (and even beat) ENT
-using only 6.6% of the training data in the clinical domain, and only 5.8% of
-the training data in the newswire domain.
-"
-2940,1606.02680,"Amjad Almahairi, Kyunghyun Cho, Nizar Habash and Aaron Courville",First Result on Arabic Neural Machine Translation,cs.CL," Neural machine translation has become a major alternative to widely used
-phrase-based statistical machine translation. We note, however, that much of the
-research on neural machine translation has focused on European languages
-despite its language-agnostic nature. In this paper, we apply neural machine
-translation to the task of Arabic translation (Ar<->En) and compare it against
-a standard phrase-based translation system. We run extensive comparisons using
-various configurations in preprocessing Arabic script and show that the
-phrase-based and neural translation systems perform comparably to each other
-and that proper preprocessing of Arabic script has a similar effect on both of
-the systems. We observe, however, that the neural machine translation system
-significantly outperforms the phrase-based system on an out-of-domain test set,
-making it attractive for real-world deployment.
-"
-2941,1606.02689,"Pei-Hao Su, Milica Gasic, Nikola Mrksic, Lina Rojas-Barahona, Stefan
- Ultes, David Vandyke, Tsung-Hsien Wen, Steve Young",Continuously Learning Neural Dialogue Management,cs.CL cs.LG," We describe a two-step approach for dialogue management in task-oriented
-spoken dialogue systems. A unified neural network framework is proposed to
-enable the system to first learn by supervision from a set of dialogue data and
-then continuously improve its behaviour via reinforcement learning, all using
-gradient-based algorithms on one single model. The experiments demonstrate the
-supervised model's effectiveness in the corpus-based evaluation, with user
-simulation, and with paid human subjects. The use of reinforcement learning
-further improves the model's performance in both interactive settings,
-especially under higher-noise conditions.
-"
-2942,1606.02785,Lu Wang and Wang Ling,Neural Network-Based Abstract Generation for Opinions and Arguments,cs.CL," We study the problem of generating abstractive summaries for opinionated
-text. We propose an attention-based neural network model that is able to absorb
-information from multiple text units to construct informative, concise, and
-fluent summaries. An importance-based sampling method is designed to allow the
-encoder to integrate information from an important subset of input. Automatic
-evaluation indicates that our system outperforms state-of-the-art abstractive
-and extractive summarization systems on two newly collected datasets of movie
-reviews and arguments.
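The self-training strategy of entry 2939 above follows a standard loop: train on labeled pairs, label the unlabeled pool, and move the most confident predictions into the training set. The function below sketches that loop; the `classifier` interface (fit on labeled pairs, predict returning a label and a confidence) is a hypothetical stand-in, not the ENT system's API.

```python
# Toy illustration of entry 2939: a generic self-training loop.
# `labeled` is a list of (pair, label); `unlabeled` is a list of pairs.

def self_train(classifier, labeled, unlabeled, rounds=5, threshold=0.9):
    for _ in range(rounds):
        classifier.fit(labeled)
        confident = []
        for pair in unlabeled:
            label, prob = classifier.predict(pair)   # hypothetical interface
            if prob >= threshold:
                confident.append((pair, label))
        if not confident:
            break                                    # nothing left to add
        labeled += confident
        added = {pair for pair, _ in confident}
        unlabeled = [p for p in unlabeled if p not in added]
    return classifier
```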
-2943,1606.02820,"William L. Hamilton, Kevin Clark, Jure Leskovec, Dan Jurafsky",Inducing Domain-Specific Sentiment Lexicons from Unlabeled Corpora,cs.CL," A word's sentiment depends on the domain in which it is used. Computational
-social science research thus requires sentiment lexicons that are specific to
-the domains being studied. We combine domain-specific word embeddings with a
-label propagation framework to induce accurate domain-specific sentiment
-lexicons using small sets of seed words, achieving state-of-the-art performance
-competitive with approaches that rely on hand-curated resources. Using our
-framework we perform two large-scale empirical studies to quantify the extent
-to which sentiment varies across time and between communities. We induce and
-release historical sentiment lexicons for 150 years of English and
-community-specific sentiment lexicons for 250 online communities from the
-social media forum Reddit. The historical lexicons show that more than 5% of
-sentiment-bearing (non-neutral) English words completely switched polarity
-during the last 150 years, and the community-specific lexicons highlight how
-sentiment varies drastically between different communities.
-"
-2944,1606.02821,"William L. Hamilton, Jure Leskovec, Dan Jurafsky","Cultural Shift or Linguistic Drift? Comparing Two Computational Measures
- of Semantic Change",cs.CL," Words shift in meaning for many reasons, including cultural factors like new
-technologies and regular linguistic processes like subjectification.
-Understanding the evolution of language and culture requires disentangling
-these underlying causes. Here we show how two different distributional measures
-can be used to detect two different types of semantic change. The first
-measure, which has been used in many previous works, analyzes global shifts in
-a word's distributional semantics; it is sensitive to changes due to regular
-processes of linguistic drift, such as the semantic generalization of promise
-(""I promise."" -> ""It promised to be exciting.""). The second measure, which we
-develop here, focuses on local changes to a word's nearest semantic neighbors;
-it is more sensitive to cultural shifts, such as the change in the meaning of
-cell (""prison cell"" -> ""cell phone""). Comparing measurements made by these two
-methods allows researchers to determine whether changes are more cultural or
-linguistic in nature, a distinction that is essential for work in the digital
-humanities and historical linguistics.
-"
-2945,1606.02858,"Danqi Chen, Jason Bolton, Christopher D. Manning",A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task,cs.CL cs.AI," Enabling a computer to understand a document so that it can answer
-comprehension questions is a central, yet unsolved goal of NLP. A key factor
-impeding its solution by machine learned systems is the limited availability of
-human-annotated data. Hermann et al. (2015) seek to solve this problem by
-creating over a million training examples by pairing CNN and Daily Mail news
-articles with their summarized bullet points, and show that a neural network
-can then be trained to give good performance on this task. In this paper, we
-conduct a thorough examination of this new reading comprehension task. Our
-primary aim is to understand what depth of language understanding is required
-to do well on this task.
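A minimal sketch of the "local neighborhood" measure of semantic change from 1606.02821 above: compare a word's k nearest neighbours in two embedding spaces and report one minus their Jaccard overlap. The random vectors are placeholders; real historical embeddings would first be aligned across time periods, a step not handled here.

    import numpy as np

    def knn(word, emb, k=3):
        # cosine nearest neighbours of `word` within one embedding space
        others = [w for w in emb if w != word]
        M = np.stack([emb[w] for w in others])
        v = emb[word]
        sims = M @ v / (np.linalg.norm(M, axis=1) * np.linalg.norm(v))
        return {others[i] for i in np.argsort(-sims)[:k]}

    def local_change(word, emb_old, emb_new, k=3):
        # 1 - Jaccard overlap of neighbour sets: high when the word's local
        # semantic neighbourhood differs between the two periods
        a, b = knn(word, emb_old, k), knn(word, emb_new, k)
        return 1.0 - len(a & b) / len(a | b)

    rng = np.random.default_rng(1)
    vocab = ["cell", "prison", "phone", "battery", "jail", "wall"]
    emb_old = {w: rng.normal(size=8) for w in vocab}
    emb_new = {w: rng.normal(size=8) for w in vocab}
    print(local_change("cell", emb_old, emb_new))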
We approach this from one side by doing a careful -hand-analysis of a small subset of the problems and from the other by showing -that simple, carefully designed systems can obtain accuracies of 73.6% and -76.6% on these two datasets, exceeding current state-of-the-art results by -7-10% and approaching what we believe is the ceiling for performance on this -task. -" -2946,1606.02891,"Rico Sennrich, Barry Haddow, Alexandra Birch",Edinburgh Neural Machine Translation Systems for WMT 16,cs.CL," We participated in the WMT 2016 shared news translation task by building -neural translation systems for four language pairs, each trained in both -directions: English<->Czech, English<->German, English<->Romanian and -English<->Russian. Our systems are based on an attentional encoder-decoder, -using BPE subword segmentation for open-vocabulary translation with a fixed -vocabulary. We experimented with using automatic back-translations of the -monolingual News corpus as additional training data, pervasive dropout, and -target-bidirectional models. All reported methods give substantial -improvements, and we see improvements of 4.3--11.2 BLEU over our baseline -systems. In the human evaluation, our systems were the (tied) best constrained -system for 7 out of 8 translation directions in which we participated. -" -2947,1606.02892,"Rico Sennrich, Barry Haddow",Linguistic Input Features Improve Neural Machine Translation,cs.CL," Neural machine translation has recently achieved impressive results, while -using little in the way of external linguistic information. In this paper we -show that the strong learning capability of neural MT models does not make -linguistic features redundant; they can be easily incorporated to provide -further improvements in performance. We generalize the embedding layer of the -encoder in the attentional encoder--decoder architecture to support the -inclusion of arbitrary features, in addition to the baseline word feature. We -add morphological features, part-of-speech tags, and syntactic dependency -labels as input features to English<->German, and English->Romanian neural -machine translation systems. In experiments on WMT16 training and test sets, we -find that linguistic input features improve model quality according to three -metrics: perplexity, BLEU and CHRF3. An open-source implementation of our -neural MT system is available, as are sample files and configurations. -" -2948,1606.02960,Sam Wiseman and Alexander M. Rush,Sequence-to-Sequence Learning as Beam-Search Optimization,cs.CL cs.LG cs.NE stat.ML," Sequence-to-Sequence (seq2seq) modeling has rapidly become an important -general-purpose NLP tool that has proven effective for many text-generation and -sequence-labeling tasks. Seq2seq builds on deep neural language modeling and -inherits its remarkable accuracy in estimating local, next-word distributions. -In this work, we introduce a model and beam-search training scheme, based on -the work of Daume III and Marcu (2005), that extends seq2seq to learn global -sequence scores. This structured approach avoids classical biases associated -with local training and unifies the training loss with the test-time usage, -while preserving the proven model architecture of seq2seq and its efficient -training approach. We show that our system outperforms a highly-optimized -attention-based seq2seq system and other baselines on three different sequence -to sequence tasks: word ordering, parsing, and machine translation. 
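The BPE subword segmentation mentioned in the Edinburgh systems abstract above (1606.02891) follows a well-known merge-learning loop, sketched here on toy word counts: repeatedly count adjacent symbol pairs and merge the most frequent one. The vocabulary below is invented for illustration.

    import re
    from collections import Counter

    def merge_vocab(pair, vocab):
        # replace the symbol pair "a b" with the merged symbol "ab"
        pat = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
        return {pat.sub("".join(pair), w): f for w, f in vocab.items()}

    def learn_bpe(vocab, n_merges):
        merges = []
        for _ in range(n_merges):
            pairs = Counter()
            for w, f in vocab.items():
                syms = w.split()
                for a, b in zip(syms, syms[1:]):
                    pairs[a, b] += f
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            merges.append(best)
            vocab = merge_vocab(best, vocab)
        return merges

    vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
             "n e w e s t </w>": 6, "w i d e s t </w>": 3}
    print(learn_bpe(vocab, 5))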
-"
-2949,1606.02976,"Khadim Dram\'e (UB), Fleur Mougin (UB), Gayo Diallo (UB)","Large scale biomedical texts classification: a kNN and an ESA-based
- approaches",cs.IR cs.CL," With the large and increasing volume of textual data, automated methods for
-identifying significant topics to classify textual documents have received a
-growing interest. While many efforts have been made in this direction, it still
-remains a real challenge. Moreover, the issue is even more complex as full
-texts are not always freely available. Then, using only partial information to
-annotate these documents is promising but remains a very ambitious issue.
-Methods: We propose two classification methods: a k-nearest neighbours
-(kNN)-based approach and an explicit semantic analysis (ESA)-based approach.
-Although the kNN-based approach is widely used in text classification, it needs
-to be improved to perform well in this specific classification problem which
-deals with partial information. Compared to existing kNN-based methods, our
-method uses classical Machine Learning (ML) algorithms for ranking the labels.
-Additional features are also investigated in order to improve the classifiers'
-performance. In addition, the combination of several learning algorithms with
-various techniques for fixing the number of relevant topics is performed. On
-the other hand, ESA seems promising for this classification task as it yielded
-interesting results in related issues, such as semantic relatedness computation
-between texts and text classification. Unlike existing works, which use ESA for
-enriching the bag-of-words approach with additional knowledge-based features,
-our ESA-based method builds a standalone classifier. Furthermore, we
-investigate if the results of this method could be useful as a complementary
-feature of our kNN-based approach. Results: Experimental evaluations performed on
-large standard annotated datasets, provided by the BioASQ organizers, show that
-the kNN-based method with the Random Forest learning algorithm achieves good
-performances compared with the current state-of-the-art methods, reaching a
-competitive f-measure of 0.55 while the ESA-based approach surprisingly
-yielded reserved results. Conclusions: We have proposed simple classification
-methods suitable to annotate textual documents using only partial information.
-They are therefore adequate for large multi-label classification and
-particularly in the biomedical domain. Thus, our work contributes to the
-extraction of relevant information from unstructured documents in order to
-facilitate their automated processing. Consequently, it could be used for
-various purposes, including document indexing, information retrieval, etc.
-"
-2950,1606.02979,"Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao","Generative Topic Embedding: a Continuous Representation of Documents
- (Extended Version with Proofs)",cs.CL cs.AI cs.IR cs.LG stat.ML," Word embedding maps words into a low-dimensional continuous embedding space
-by exploiting the local word collocation patterns in a small context window. On
-the other hand, topic modeling maps documents onto a low-dimensional topic
-space, by utilizing the global word collocation patterns in the same document.
-These two types of patterns are complementary. In this paper, we propose a
-generative topic embedding model to combine the two types of patterns. In our
-model, topics are represented by embedding vectors, and are shared across
-documents.
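To make the kNN-based label ranking of 1606.02976 above concrete, here is a bare-bones cosine-similarity variant that scores each label by the summed similarity of the neighbours carrying it. The paper ranks labels with learned ML models instead; this rule and all data below are illustrative only.

    import numpy as np
    from collections import Counter

    def knn_label_scores(x, X, Y, k=5):
        # cosine kNN over document vectors; each label is scored by the
        # summed similarity of the k neighbours that carry it
        sims = X @ x / (np.linalg.norm(X, axis=1) * np.linalg.norm(x) + 1e-9)
        scores = Counter()
        for i in np.argsort(-sims)[:k]:
            for label in Y[i]:
                scores[label] += sims[i]
        return scores.most_common()

    rng = np.random.default_rng(2)
    X = rng.normal(size=(6, 16))                        # training doc vectors
    Y = [["topicA"], ["topicA", "topicB"], ["topicB"],
         ["topicC"], ["topicA"], ["topicC"]]            # their labels
    print(knn_label_scores(rng.normal(size=16), X, Y, k=3))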
The probability of each word is influenced by both its local context -and its topic. A variational inference method yields the topic embeddings as -well as the topic mixing proportions for each document. Jointly they represent -the document in a low-dimensional continuous space. In two document -classification tasks, our method performs better than eight existing methods, -with fewer features. In addition, we illustrate with an example that our method -can generate coherent topics even based on only one document. -" -2951,1606.03002,"Dirk Weissenborn and Tim Rockt\""aschel",MuFuRU: The Multi-Function Recurrent Unit,cs.NE cs.AI cs.CL," Recurrent neural networks such as the GRU and LSTM found wide adoption in -natural language processing and achieve state-of-the-art results for many -tasks. These models are characterized by a memory state that can be written to -and read from by applying gated composition operations to the current input and -the previous state. However, they only cover a small subset of potentially -useful compositions. We propose Multi-Function Recurrent Units (MuFuRUs) that -allow for arbitrary differentiable functions as composition operations. -Furthermore, MuFuRUs allow for an input- and state-dependent choice of these -composition operations that is learned. Our experiments demonstrate that the -additional functionality helps in different sequence modeling tasks, including -the evaluation of propositional logic formulae, language modeling and sentiment -analysis. -" -2952,1606.03126,"Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, - Antoine Bordes, Jason Weston",Key-Value Memory Networks for Directly Reading Documents,cs.CL," Directly reading documents and being able to answer questions from them is an -unsolved challenge. To avoid its inherent difficulty, question answering (QA) -has been directed towards using Knowledge Bases (KBs) instead, which has proven -effective. Unfortunately KBs often suffer from being too restrictive, as the -schema cannot support certain types of answers, and too sparse, e.g. Wikipedia -contains much more information than Freebase. In this work we introduce a new -method, Key-Value Memory Networks, that makes reading documents more viable by -utilizing different encodings in the addressing and output stages of the memory -read operation. To compare using KBs, information extraction or Wikipedia -documents directly in a single framework we construct an analysis tool, -WikiMovies, a QA dataset that contains raw text alongside a preprocessed KB, in -the domain of movies. Our method reduces the gap between all three settings. It -also achieves state-of-the-art results on the existing WikiQA benchmark. -" -2953,1606.03143,"Saeid Parvandeh, Shibamouli Lahiri, Fahimeh Boroumand",PerSum: Novel Systems for Document Summarization in Persian,cs.CL," In this paper we explore the problem of document summarization in Persian -language from two distinct angles. In our first approach, we modify a popular -and widely cited Persian document summarization framework to see how it works -on a realistic corpus of news articles. Human evaluation on generated summaries -shows that graph-based methods perform better than the modified systems. We -carry this intuition forward in our second approach, and probe deeper into the -nature of graph-based systems by designing several summarizers based on -centrality measures. 
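The key-value read operation underlying the Key-Value Memory Networks abstract above (1606.03126) can be sketched in a few lines: address memory with key encodings, then return the attention-weighted sum of value encodings. Dimensions and the random "memories" are arbitrary placeholders.

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def kv_read(query, keys, values):
        # address memory with the keys, then return the attention-weighted
        # sum of the values; using different encodings for addressing and
        # output is the core idea of the key-value memory
        attn = softmax(keys @ query)
        return attn @ values, attn

    rng = np.random.default_rng(3)
    q = rng.normal(size=32)            # encoded question
    K = rng.normal(size=(10, 32))      # e.g. window-of-words encodings
    V = rng.normal(size=(10, 32))      # e.g. centre-word / entity encodings
    out, attn = kv_read(q, K, V)
    print(out.shape, attn.argmax())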
Ad hoc evaluation using ROUGE score on these summarizers -suggests that there is a small class of centrality measures that perform better -than three strong unsupervised baselines. -" -2954,1606.03144,Marek Rei and Ronan Cummins,"Sentence Similarity Measures for Fine-Grained Estimation of Topical - Relevance in Learner Essays",cs.CL cs.LG cs.NE," We investigate the task of assessing sentence-level prompt relevance in -learner essays. Various systems using word overlap, neural embeddings and -neural compositional models are evaluated on two datasets of learner writing. -We propose a new method for sentence-level similarity calculation, which learns -to adjust the weights of pre-trained word embeddings for a specific task, -achieving substantially higher accuracy compared to other relevant baselines. -" -2955,1606.03152,"Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman",Policy Networks with Two-Stage Training for Dialogue Systems,cs.CL cs.AI," In this paper, we propose to use deep policy networks which are trained with -an advantage actor-critic method for statistically optimised dialogue systems. -First, we show that, on summary state and action spaces, deep Reinforcement -Learning (RL) outperforms Gaussian Processes methods. Summary state and action -spaces lead to good performance but require pre-engineering effort, RL -knowledge, and domain expertise. In order to remove the need to define such -summary spaces, we show that deep RL can also be trained efficiently on the -original state and action spaces. Dialogue systems based on partially -observable Markov decision processes are known to require many dialogues to -train, which makes them unappealing for practical deployment. We show that a -deep RL method based on an actor-critic architecture can exploit a small amount -of data very efficiently. Indeed, with only a few hundred dialogues collected -with a handcrafted policy, the actor-critic deep learner is considerably -bootstrapped from a combination of supervised and batch RL. In addition, -convergence to an optimal policy is significantly sped up compared to other -deep RL methods initialized on the data with batch RL. All experiments are -performed on a restaurant domain derived from the Dialogue State Tracking -Challenge 2 (DSTC2) dataset. -" -2956,1606.03153,"Furong Huang, Animashree Anandkumar","Unsupervised Learning of Word-Sequence Representations from Scratch via - Convolutional Tensor Decomposition",cs.CL cs.LG," Unsupervised text embeddings extraction is crucial for text understanding in -machine learning. Word2Vec and its variants have received substantial success -in mapping words with similar syntactic or semantic meaning to vectors close to -each other. However, extracting context-aware word-sequence embedding remains a -challenging task. Training over large corpus is difficult as labels are -difficult to get. More importantly, it is challenging for pre-trained models to -obtain word-sequence embeddings that are universally good for all downstream -tasks or for any new datasets. We propose a two-phased ConvDic+DeconvDec -framework to solve the problem by combining a word-sequence dictionary learning -model with a word-sequence embedding decode model. We propose a convolutional -tensor decomposition mechanism to learn good word-sequence phrase dictionary in -the learning phase. It is proved to be more accurate and much more efficient -than the popular alternating minimization method. 
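A toy version of the sentence-similarity idea in 1606.03144 above: represent each sentence as a weighted mean of pre-trained word vectors and compare by cosine. The per-word weights here are hard-coded; in the paper they are learned for the task.

    import numpy as np

    def sentence_vec(tokens, emb, weight):
        # weighted mean of pre-trained word vectors; `weight` holds per-word
        # scalars that would be learned for the target task
        vs = np.stack([weight.get(t, 1.0) * emb[t] for t in tokens if t in emb])
        return vs.mean(axis=0)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(4)
    emb = {w: rng.normal(size=16)
           for w in "essay prompt about climate policy".split()}
    w = {"climate": 2.0, "about": 0.2}   # illustrative learned weights
    s1 = sentence_vec("essay about climate".split(), emb, w)
    s2 = sentence_vec("climate policy prompt".split(), emb, w)
    print(cosine(s1, s2))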
In the decode phase, we
-introduce a deconvolution framework that is immune to the problem of varying
-sentence lengths. The word-sequence embeddings we extracted using
-ConvDic+DeconvDec are universally good for a few downstream tasks we test on.
-The framework requires neither pre-training nor prior/outside information.
-"
-2957,1606.03192,"Shaohua Li, Jun Zhu, Chunyan Miao",PSDVec: a Toolbox for Incremental and Scalable Word Embedding,cs.CL," PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the mapping
-of words in a natural language to continuous vectors which encode the
-semantic/syntactic regularities between the words. PSDVec implements a word
-embedding learning method based on a weighted low-rank positive semidefinite
-approximation. To scale up the learning process, we implement a blockwise
-online learning algorithm to learn the embeddings incrementally. This strategy
-greatly reduces the learning time of word embeddings on a large vocabulary, and
-can learn the embeddings of new words without re-learning the whole vocabulary.
-On 9 word similarity/analogy benchmark sets and 2 Natural Language Processing
-(NLP) tasks, PSDVec produces embeddings that have the best average performance
-among popular word embedding tools. PSDVec provides a new option for NLP
-practitioners.
-"
-2958,1606.03207,"Hwaran Lee, Geonmin Kim, Ho-Gyeong Kim, Sang-Hoon Oh, and Soo-Young
- Lee","Deep CNNs along the Time Axis with Intermap Pooling for Robustness to
- Spectral Variations",cs.CL cs.LG cs.NE," Convolutional neural networks (CNNs) with convolutional and pooling
-operations along the frequency axis have been proposed to attain invariance to
-frequency shifts of features. However, this is inappropriate with regard to the
-fact that acoustic features vary in frequency. In this paper, we contend that
-convolution along the time axis is more effective. We also propose the addition
-of an intermap pooling (IMP) layer to deep CNNs. In this layer, filters in each
-group extract common but spectrally variant features, then the layer pools the
-feature maps of each group. As a result, the proposed IMP CNN can achieve
-insensitivity to spectral variations characteristic of different speakers and
-utterances. The effectiveness of the IMP CNN architecture is demonstrated on
-several LVCSR tasks. Even without speaker adaptation techniques, the
-architecture achieved a WER of 12.7% on the SWB part of the Hub5'2000
-evaluation test set, which is competitive with other state-of-the-art methods.
-"
-2959,1606.03254,Dimitra Gkatzia and Oliver Lemon and Verena Rieser,"Natural Language Generation enhances human decision-making with
- uncertain information",cs.CL cs.AI," Decision-making is often dependent on uncertain data, e.g. data associated
-with confidence scores or probabilities. We present a comparison of different
-information presentations for uncertain data and, for the first time, measure
-their effects on human decision-making. We show that the use of Natural
-Language Generation (NLG) improves decision-making under uncertainty, compared
-to state-of-the-art graphical-based representation methods. In a task-based
-study with 442 adults, we found that presentations using NLG lead to 24% better
-decision-making on average than the graphical presentations, and to 44% better
-decision-making when NLG is combined with graphics. We also show that women
-achieve significantly better results when presented with NLG output (an 87%
-increase on average compared to graphical presentations).
-" -2960,1606.03333,"Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain",Automatic Genre and Show Identification of Broadcast Media,cs.MM cs.CL cs.IR," Huge amounts of digital videos are being produced and broadcast every day, -leading to giant media archives. Effective techniques are needed to make such -data accessible further. Automatic meta-data labelling of broadcast media is an -essential task for multimedia indexing, where it is standard to use multi-modal -input for such purposes. This paper describes a novel method for automatic -detection of media genre and show identities using acoustic features, textual -features or a combination thereof. Furthermore the inclusion of available -meta-data, such as time of broadcast, is shown to lead to very high -performance. Latent Dirichlet Allocation is used to model both acoustics and -text, yielding fixed dimensional representations of media recordings that can -then be used in Support Vector Machines based classification. Experiments are -conducted on more than 1200 hours of TV broadcasts from the British -Broadcasting Corporation (BBC), where the task is to categorise the broadcasts -into 8 genres or 133 show identities. On a 200-hour test set, accuracies of -98.6% and 85.7% were achieved for genre and show identification respectively, -using a combination of acoustic and textual features with meta-data. -" -2961,1606.03335,"Roman Bartusiak, {\L}ukasz Augustyniak, Tomasz Kajdanowicz, - Przemys{\l}aw Kazienko, Maciej Piasecki",WordNet2Vec: Corpora Agnostic Word Vectorization Method,cs.CL cs.AI cs.DC," A complex nature of big data resources demands new methods for structuring -especially for textual content. WordNet is a good knowledge source for -comprehensive abstraction of natural language as its good implementations exist -for many languages. Since WordNet embeds natural language in the form of a -complex network, a transformation mechanism WordNet2Vec is proposed in the -paper. It creates vectors for each word from WordNet. These vectors encapsulate -general position - role of a given word towards all other words in the natural -language. Any list or set of such vectors contains knowledge about the context -of its component within the whole language. Such word representation can be -easily applied to many analytic tasks like classification or clustering. The -usefulness of the WordNet2Vec method was demonstrated in sentiment analysis, -i.e. classification with transfer learning for the real Amazon opinion textual -dataset. -" -2962,1606.03352,"Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, - Pei-Hao Su, Stefan Ultes, David Vandyke, Steve Young",Conditional Generation and Snapshot Learning in Neural Dialogue Systems,cs.CL cs.NE stat.ML," Recently a variety of LSTM-based conditional language models (LM) have been -applied across a range of language generation tasks. In this work we study -various model architectures and different ways to represent and aggregate the -source information in an end-to-end neural dialogue system framework. A method -called snapshot learning is also proposed to facilitate learning from -supervised sequential signals by applying a companion cross-entropy objective -function to the conditioning vector. The experimental and analytical results -demonstrate firstly that competition occurs between the conditioning vector and -the LM, and the differing architectures provide different trade-offs between -the two. 
Secondly, the discriminative power and transparency of the
-conditioning vector are key to providing both model interpretability and better
-performance. Thirdly, snapshot learning leads to consistent performance
-improvements independent of which architecture is used.
-"
-2963,1606.03391,"Wenpeng Yin, Mo Yu, Bing Xiang, Bowen Zhou, Hinrich Sch\""utze",Simple Question Answering by Attentive Convolutional Neural Network,cs.CL," This work focuses on answering single-relation factoid questions over
-Freebase. Each question can acquire the answer from a single fact of form
-(subject, predicate, object) in Freebase. This task, simple question answering
-(SimpleQA), can be addressed via a two-step pipeline: entity linking and fact
-selection. In fact selection, we match the subject entity in a fact candidate
-with the entity mention in the question by a character-level convolutional
-neural network (char-CNN), and match the predicate in that fact with the
-question by a word-level CNN (word-CNN). This work makes two main
-contributions. (i) A simple and effective entity linker over Freebase is
-proposed. Our entity linker outperforms the state-of-the-art entity linker over
-SimpleQA task. (ii) A novel attentive maxpooling is stacked over word-CNN, so
-that the predicate representation can be matched with the predicate-focused
-question representation more effectively. Experiments show that our system sets
-new state-of-the-art in this task.
-"
-2964,1606.03398,"Lidong Bing, Bhuwan Dhingra, Kathryn Mazaitis, Jong Hyuk Park, William
- W. Cohen","Bootstrapping Distantly Supervised IE using Joint Learning and Small
- Well-structured Corpora",cs.CL," We propose a framework to improve performance of distantly-supervised
-relation extraction, by jointly learning to solve two related tasks:
-concept-instance extraction and relation extraction. We combine this with a
-novel use of document structure: in some small, well-structured corpora,
-sections can be identified that correspond to relation arguments, and
-distantly-labeled examples from such sections tend to have good precision.
-Using these as seeds we extract additional relation examples by applying label
-propagation on a graph composed of noisy examples extracted from a large
-unstructured testing corpus. Combined with the soft constraint that concept
-examples should have the same type as the second argument of the relation, we
-get significant improvements over several state-of-the-art approaches to
-distantly-supervised relation extraction.
-"
-2965,1606.03402,"Pavel Sountsov, Sunita Sarawagi",Length bias in Encoder Decoder Models and a Case for Global Conditioning,cs.AI cs.CL," Encoder-decoder networks are popular for modeling sequences probabilistically
-in many applications. These models use the power of the Long Short-Term Memory
-(LSTM) architecture to capture the full dependence among variables, unlike
-earlier models like CRFs that typically assumed conditional independence among
-non-adjacent variables. However in practice encoder-decoder models exhibit a
-bias towards short sequences that surprisingly gets worse with increasing beam
-size.
- In this paper we show that such a phenomenon is due to a discrepancy between
-the full sequence margin and the per-element margin enforced by the locally
-conditioned training objective of an encoder-decoder model. The discrepancy more
-adversely impacts long sequences, explaining the bias towards predicting short
-sequences.
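The short-sequence bias discussed in 1606.03402 above is easy to reproduce numerically: with per-token log-probabilities, summed scores penalize every extra token, so the shortest hypothesis wins even when a longer one is better per token. The hypotheses and scores below are made up for illustration; the paper's remedy is global conditioning rather than the length normalization shown here.

    # Per-token log-probabilities make every extra token costly, so summed
    # scores favour short outputs; dividing by length is a common fix.
    hyps = {"yes": [-0.9],
            "yes please": [-1.1, -0.4],
            "yes please do come": [-1.2, -0.5, -0.3, -0.2]}
    for text, logps in hyps.items():
        raw = sum(logps)
        per_token = raw / len(logps)
        print(f"{text!r:22} sum={raw:6.2f} per-token={per_token:6.2f}")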
- For the case where the predicted sequences come from a closed set, we show -that a globally conditioned model alleviates the above problems of -encoder-decoder models. From a practical point of view, our proposed model also -eliminates the need for a beam-search during inference, which reduces to an -efficient dot-product based search in a vector-space. -" -2966,1606.03475,"Franck Dernoncourt, Ji Young Lee, Ozlem Uzuner, Peter Szolovits",De-identification of Patient Notes with Recurrent Neural Networks,cs.CL cs.AI cs.NE stat.ML," Objective: Patient notes in electronic health records (EHRs) may contain -critical information for medical investigations. However, the vast majority of -medical investigators can only access de-identified notes, in order to protect -the confidentiality of patients. In the United States, the Health Insurance -Portability and Accountability Act (HIPAA) defines 18 types of protected health -information (PHI) that needs to be removed to de-identify patient notes. Manual -de-identification is impractical given the size of EHR databases, the limited -number of researchers with access to the non-de-identified notes, and the -frequent mistakes of human annotators. A reliable automated de-identification -system would consequently be of high value. - Materials and Methods: We introduce the first de-identification system based -on artificial neural networks (ANNs), which requires no handcrafted features or -rules, unlike existing systems. We compare the performance of the system with -state-of-the-art systems on two datasets: the i2b2 2014 de-identification -challenge dataset, which is the largest publicly available de-identification -dataset, and the MIMIC de-identification dataset, which we assembled and is -twice as large as the i2b2 2014 dataset. - Results: Our ANN model outperforms the state-of-the-art systems. It yields an -F1-score of 97.85 on the i2b2 2014 dataset, with a recall 97.38 and a precision -of 97.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with -a recall 99.25 and a precision of 99.06. - Conclusion: Our findings support the use of ANNs for de-identification of -patient notes, as they show better performance than previously published -systems while requiring no feature engineering. -" -2967,1606.03556,"Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv - Batra","Human Attention in Visual Question Answering: Do Humans and Deep - Networks Look at the Same Regions?",cs.CV cs.CL," We conduct large-scale studies on `human attention' in Visual Question -Answering (VQA) to understand where humans choose to look to answer questions -about images. We design and test multiple game-inspired novel -attention-annotation interfaces that require the subject to sharpen regions of -a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human -ATtention) dataset. We evaluate attention maps generated by state-of-the-art -VQA models against human attention both qualitatively (via visualizations) and -quantitatively (via rank-order correlation). Overall, our experiments show that -current attention models in VQA do not seem to be looking at the same regions -as humans. -" -2968,1606.03568,"Mikael K{\aa}geb\""ack, Hans Salomonsson",Word Sense Disambiguation using a Bidirectional LSTM,cs.CL cs.AI," In this paper we present a clean, yet effective, model for word sense -disambiguation. Our approach leverage a bidirectional long short-term memory -network which is shared between all words. 
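The rank-order-correlation evaluation in the VQA-HAT abstract above (1606.03556) amounts to flattening two attention maps and computing Spearman correlation. The maps below are random stand-ins for real human and model attention.

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(5)
    human = rng.random((14, 14))                     # stand-in attention maps
    model = 0.5 * human + 0.5 * rng.random((14, 14))
    rho, pval = spearmanr(human.ravel(), model.ravel())
    print(f"rank correlation: {rho:.3f} (p={pval:.3g})")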
This enables the model to share -statistical strength and to scale well with vocabulary size. The model is -trained end-to-end, directly from the raw text to sense labels, and makes -effective use of word order. We evaluate our approach on two standard datasets, -using identical hyperparameter settings, which are in turn tuned on a third set -of held out data. We employ no external resources (e.g. knowledge graphs, -part-of-speech tagging, etc), language specific features, or hand crafted -rules, but still achieve statistically equivalent results to the best -state-of-the-art systems, that employ no such limitations. -" -2969,1606.03622,Robin Jia and Percy Liang,Data Recombination for Neural Semantic Parsing,cs.CL," Modeling crisp logical regularities is crucial in semantic parsing, making it -difficult for neural models with no task-specific prior knowledge to achieve -good results. In this paper, we introduce data recombination, a novel framework -for injecting such prior knowledge into a model. From the training data, we -induce a high-precision synchronous context-free grammar, which captures -important conditional independence properties commonly found in semantic -parsing. We then train a sequence-to-sequence recurrent network (RNN) model -with a novel attention-based copying mechanism on datapoints sampled from this -grammar, thereby teaching the model about these structural properties. Data -recombination improves the accuracy of our RNN model on three semantic parsing -datasets, leading to new state-of-the-art performance on the standard GeoQuery -dataset for models with comparable supervision. -" -2970,1606.03632,"Shikhar Sharma and Jing He and Kaheer Suleman and Hannes Schulz and - Philip Bachman","Natural Language Generation in Dialogue using Lexicalized and - Delexicalized Data",cs.CL," Natural language generation plays a critical role in spoken dialogue systems. -We present a new approach to natural language generation for task-oriented -dialogue using recurrent neural networks in an encoder-decoder framework. In -contrast to previous work, our model uses both lexicalized and delexicalized -components i.e. slot-value pairs for dialogue acts, with slots and -corresponding values aligned together. This allows our model to learn from all -available data including the slot-value pairing, rather than being restricted -to delexicalized slots. We show that this helps our model generate more natural -sentences with better grammar. We further improve our model's performance by -transferring weights learnt from a pretrained sentence auto-encoder. Human -evaluation of our best-performing model indicates that it generates sentences -which users find more appealing. -" -2971,1606.03667,"Ji He, Mari Ostendorf, Xiaodong He, Jianshu Chen, Jianfeng Gao, Lihong - Li, Li Deng","Deep Reinforcement Learning with a Combinatorial Action Space for - Predicting Popular Reddit Threads",cs.CL cs.AI cs.LG," We introduce an online popularity prediction and tracking task as a benchmark -task for reinforcement learning with a combinatorial, natural language action -space. A specified number of discussion threads predicted to be popular are -recommended, chosen from a fixed window of recent comments to track. Novel deep -reinforcement learning architectures are studied for effective modeling of the -value function associated with actions comprised of interdependent sub-actions. 
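Delexicalization, as used in the NLG abstract above (1606.03632), is essentially slot-value templating; a minimal sketch follows, with slot names invented for the example (the paper's model additionally keeps lexicalized slot-value pairs as input).

    def delexicalise(utterance, slots):
        # replace surface slot values with placeholders so the generator can
        # learn patterns that transfer across values
        for slot, value in slots.items():
            utterance = utterance.replace(value, "<" + slot + ">")
        return utterance

    def relexicalise(template, slots):
        for slot, value in slots.items():
            template = template.replace("<" + slot + ">", value)
        return template

    slots = {"name": "Golden Wok", "food": "Chinese"}
    t = delexicalise("Golden Wok serves cheap Chinese food.", slots)
    print(t)                       # <name> serves cheap <food> food.
    print(relexicalise(t, slots))  # round-trips back to the original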
-The proposed model, which represents dependence between sub-actions through a -bi-directional LSTM, gives the best performance across different experimental -configurations and domains, and it also generalizes well with varying numbers -of recommendation requests. -" -2972,1606.03676,Beno\^it Sagot (ALPAGE),External Lexical Information for Multilingual Part-of-Speech Tagging,cs.CL," Morphosyntactic lexicons and word vector representations have both proven -useful for improving the accuracy of statistical part-of-speech taggers. Here -we compare the performances of four systems on datasets covering 16 languages, -two of these systems being feature-based (MEMMs and CRFs) and two of them being -neural-based (bi-LSTMs). We show that, on average, all four approaches perform -similarly and reach state-of-the-art results. Yet better performances are -obtained with our feature-based models on lexically richer datasets (e.g. for -morphologically rich languages), whereas neural-based results are higher on -datasets with less lexical variability (e.g. for English). These conclusions -hold in particular for the MEMM models relying on our system MElt, which -benefited from newly designed features. This shows that, under certain -conditions, feature-based approaches enriched with morphosyntactic lexicons are -competitive with respect to neural methods. -" -2973,1606.03777,"Nikola Mrk\v{s}i\'c and Diarmuid \'O S\'eaghdha and Tsung-Hsien Wen - and Blaise Thomson and Steve Young",Neural Belief Tracker: Data-Driven Dialogue State Tracking,cs.CL cs.AI cs.LG," One of the core components of modern spoken dialogue systems is the belief -tracker, which estimates the user's goal at every step of the dialogue. -However, most current approaches have difficulty scaling to larger, more -complex dialogue domains. This is due to their dependency on either: a) Spoken -Language Understanding models that require large amounts of annotated training -data; or b) hand-crafted lexicons for capturing some of the linguistic -variation in users' language. We propose a novel Neural Belief Tracking (NBT) -framework which overcomes these problems by building on recent advances in -representation learning. NBT models reason over pre-trained word vectors, -learning to compose them into distributed representations of user utterances -and dialogue context. Our evaluation on two datasets shows that this approach -surpasses past limitations, matching the performance of state-of-the-art models -which rely on hand-crafted semantic lexicons and outperforming them when such -lexicons are not provided. -" -2974,1606.03783,"Pedro Chahuara, Thomas Lampert, Pierre Gancarski","Retrieving and Ranking Similar Questions from Question-Answer Archives - Using Topic Modelling and Topic Distribution Regression",cs.IR cs.CL cs.LG," Presented herein is a novel model for similar question ranking within -collaborative question answer platforms. The presented approach integrates a -regression stage to relate topics derived from questions to those derived from -question-answer pairs. This helps to avoid problems caused by the differences -in vocabulary used within questions and answers, and the tendency for questions -to be shorter than answers. The performance of the model is shown to outperform -translation methods and topic modelling (without regression) on several -real-world datasets. 
-" -2975,1606.03784,Guido Zarrella and Amy Marsh,MITRE at SemEval-2016 Task 6: Transfer Learning for Stance Detection,cs.AI cs.CL," We describe MITRE's submission to the SemEval-2016 Task 6, Detecting Stance -in Tweets. This effort achieved the top score in Task A on supervised stance -detection, producing an average F1 score of 67.8 when assessing whether a tweet -author was in favor or against a topic. We employed a recurrent neural network -initialized with features learned via distant supervision on two large -unlabeled datasets. We trained embeddings of words and phrases with the -word2vec skip-gram method, then used those features to learn sentence -representations via a hashtag prediction auxiliary task. These sentence vectors -were then fine-tuned for stance detection on several hundred labeled examples. -The result was a high performing system that used transfer learning to maximize -the value of the available training data. -" -2976,1606.03821,"Will Monroe, Noah D. Goodman, Christopher Potts",Learning to Generate Compositional Color Descriptions,cs.CL," The production of color language is essential for grounded language -generation. Color descriptions have many challenging properties: they can be -vague, compositionally complex, and denotationally rich. We present an -effective approach to generating color descriptions using recurrent neural -networks and a Fourier-transformed color representation. Our model outperforms -previous work on a conditional language modeling task over a large corpus of -naturalistic color descriptions. In addition, probing the model's output -reveals that it can accurately produce not only basic color terms but also -descriptors with non-convex denotations (""greenish""), bare modifiers (""bright"", -""dull""), and compositional phrases (""faded teal"") not seen in training. -" -2977,1606.03864,Dirk Weissenborn,Neural Associative Memory for Dual-Sequence Modeling,cs.NE cs.AI cs.CL cs.LG," Many important NLP problems can be posed as dual-sequence or -sequence-to-sequence modeling tasks. Recent advances in building end-to-end -neural architectures have been highly successful in solving such tasks. In this -work we propose a new architecture for dual-sequence modeling that is based on -associative memory. We derive AM-RNNs, a recurrent associative memory (AM) -which augments generic recurrent neural networks (RNN). This architecture is -extended to the Dual AM-RNN which operates on two AMs at once. Our models -achieve very competitive results on textual entailment. A qualitative analysis -demonstrates that long range dependencies between source and target-sequence -can be bridged effectively using Dual AM-RNNs. However, an initial experiment -on auto-encoding reveals that these benefits are not exploited by the system -when learning to solve sequence-to-sequence tasks which indicates that -additional supervision or regularization is needed. -" -2978,1606.04052,Julien Perez and Fei Liu,"Dialog state tracking, a machine reading approach using Memory Network",cs.CL cs.NE stat.ML," In an end-to-end dialog system, the aim of dialog state tracking is to -accurately estimate a compact representation of the current dialog status from -a sequence of noisy observations produced by the speech recognition and the -natural language understanding modules. 
This paper introduces a novel method of -dialog state tracking based on the general paradigm of machine reading and -proposes to solve it using an End-to-End Memory Network, MemN2N, a -memory-enhanced neural network architecture. We evaluate the proposed approach -on the second Dialog State Tracking Challenge (DSTC-2) dataset. The corpus has -been converted for the occasion in order to frame the hidden state variable -inference as a question-answering task based on a sequence of utterances -extracted from a dialog. We show that the proposed tracker gives encouraging -results. Then, we propose to extend the DSTC-2 dataset with specific reasoning -capabilities requirement like counting, list maintenance, yes-no question -answering and indefinite knowledge management. Finally, we present encouraging -results using our proposed MemN2N based tracking model. -" -2979,1606.04081,"Pedro Mota, Maxine Eskenazi, Luisa Coheur","Graph-Community Detection for Cross-Document Topic Segment Relationship - Identification",cs.CL cs.IR cs.SI," In this paper we propose a graph-community detection approach to identify -cross-document relationships at the topic segment level. Given a set of related -documents, we automatically find these relationships by clustering segments -with similar content (topics). In this context, we study how different -weighting mechanisms influence the discovery of word communities that relate to -the different topics found in the documents. Finally, we test different mapping -functions to assign topic segments to word communities, determining which topic -segments are considered equivalent. - By performing this task it is possible to enable efficient multi-document -browsing, since when a user finds relevant content in one document we can -provide access to similar topics in other documents. We deploy our approach in -two different scenarios. One is an educational scenario where equivalence -relationships between learning materials need to be found. The other consists -of a series of dialogs in a social context where students discuss commonplace -topics. Results show that our proposed approach better discovered equivalence -relationships in learning material documents and obtained close results in the -social speech domain, where the best performing approach was a clustering -technique. -" -2980,1606.04155,"Tao Lei, Regina Barzilay and Tommi Jaakkola",Rationalizing Neural Predictions,cs.CL cs.NE," Prediction without justification has limited applicability. As a remedy, we -learn to extract pieces of input text as justifications -- rationales -- that -are tailored to be short and coherent, yet sufficient for making the same -prediction. Our approach combines two modular components, generator and -encoder, which are trained to operate well together. The generator specifies a -distribution over text fragments as candidate rationales and these are passed -through the encoder for prediction. Rationales are never given during training. -Instead, the model is regularized by desiderata for rationales. We evaluate the -approach on multi-aspect sentiment analysis against manually annotated test -cases. Our approach outperforms attention-based baseline by a significant -margin. We also successfully illustrate the method on the question retrieval -task. -" -2981,1606.04164,"Orhan Firat and Baskaran Sankaran and Yaser Al-Onaizan and Fatos T. 
- Yarman Vural and Kyunghyun Cho",Zero-Resource Translation with Multi-Lingual Neural Machine Translation,cs.CL," In this paper, we propose a novel finetuning algorithm for the recently
-introduced multi-way, multilingual neural machine translation model that enables
-zero-resource machine translation. When used together with novel many-to-one
-translation strategies, we empirically show that this finetuning algorithm
-allows the multi-way, multilingual model to translate a zero-resource language
-pair (1) as well as a single-pair neural translation model trained with up to
-1M direct parallel sentences of the same language pair and (2) better than
-pivot-based translation strategy, while keeping only one additional copy of
-attention-related parameters.
-"
-2982,1606.04199,Jie Zhou and Ying Cao and Xuguang Wang and Peng Li and Wei Xu,"Deep Recurrent Models with Fast-Forward Connections for Neural Machine
- Translation",cs.CL cs.LG," Neural machine translation (NMT) aims at solving machine translation (MT)
-problems using neural networks and has exhibited promising results in recent
-years. However, most of the existing NMT models are shallow and there is still
-a performance gap between a single NMT model and the best conventional MT
-system. In this work, we introduce a new type of linear connections, named
-fast-forward connections, based on deep Long Short-Term Memory (LSTM) networks,
-and an interleaved bi-directional architecture for stacking the LSTM layers.
-Fast-forward connections play an essential role in propagating the gradients
-and building a deep topology of depth 16. On the WMT'14 English-to-French task,
-we achieve BLEU=37.7 with a single attention model, which outperforms the
-corresponding single shallow model by 6.2 BLEU points. This is the first time
-that a single NMT model achieves state-of-the-art performance and outperforms
-the best conventional model by 0.7 BLEU points. We can still achieve BLEU=36.3
-even without using an attention mechanism. After special handling of unknown
-words and model ensembling, we obtain the best score reported to date on this
-task with BLEU=40.4. Our models are also validated on the more difficult WMT'14
-English-to-German task.
-"
-2983,1606.04212,"Ye Zhang, Matthew Lease, Byron C. Wallace",Active Discriminative Text Representation Learning,cs.CL," We propose a new active learning (AL) method for text classification with
-convolutional neural networks (CNNs). In AL, one selects the instances to be
-manually labeled with the aim of maximizing model performance with minimal
-effort. Neural models capitalize on word embeddings as representations
-(features), tuning these to the task at hand. We argue that AL strategies for
-multi-layered neural models should focus on selecting instances that most
-affect the embedding space (i.e., induce discriminative word representations).
-This is in contrast to traditional AL approaches (e.g., entropy-based
-uncertainty sampling), which specify higher level objectives. We propose a
-simple approach for sentence classification that selects instances containing
-words whose embeddings are likely to be updated with the greatest magnitude,
-thereby rapidly learning discriminative, task-specific embeddings. We extend
-this approach to document classification by jointly considering: (1) the
-expected changes to the constituent word representations; and (2) the model's
-current overall uncertainty regarding the instance.
The relative emphasis -placed on these criteria is governed by a stochastic process that favors -selecting instances likely to improve representations at the outset of -learning, and then shifts toward general uncertainty sampling as AL progresses. -Empirical results show that our method outperforms baseline AL approaches on -both sentence and document classification tasks. We also show that, as -expected, the method quickly learns discriminative word embeddings. To the best -of our knowledge, this is the first work on AL addressing neural models for -text classification. -" -2984,1606.04217,"Ekaterina Vylomova, Trevor Cohn, Xuanli He and Gholamreza Haffari","Word Representation Models for Morphologically Rich Languages in Neural - Machine Translation",cs.NE cs.CL," Dealing with the complex word forms in morphologically rich languages is an -open problem in language processing, and is particularly important in -translation. In contrast to most modern neural systems of translation, which -discard the identity for rare words, in this paper we propose several -architectures for learning word representations from character and morpheme -level word decompositions. We incorporate these representations in a novel -machine translation model which jointly learns word alignments and translations -via a hard attention mechanism. Evaluating on translating from several -morphologically rich languages into English, we show consistent improvements -over strong baseline methods, of between 1 and 1.5 BLEU points. -" -2985,1606.04279,Jan Buys and Jan A. Botha,Cross-Lingual Morphological Tagging for Low-Resource Languages,cs.CL," Morphologically rich languages often lack the annotated linguistic resources -required to develop accurate natural language processing tools. We propose -models suitable for training morphological taggers with rich tagsets for -low-resource languages without using direct supervision. Our approach extends -existing approaches of projecting part-of-speech tags across languages, using -bitext to infer constraints on the possible tags for a given word type or -token. We propose a tagging model using Wsabie, a discriminative -embedding-based model with rank-based learning. In our evaluation on 11 -languages, on average this model performs on par with a baseline -weakly-supervised HMM, while being more scalable. Multilingual experiments show -that the method performs best when projecting between related language pairs. -Despite the inherently lossy projection, we show that the morphological tags -predicted by our models improve the downstream performance of a parser by +0.6 -LAS on average. -" -2986,1606.04289,Dimitrios Alikaniotis and Helen Yannakoudakis and Marek Rei,Automatic Text Scoring Using Neural Networks,cs.CL cs.LG cs.NE," Automated Text Scoring (ATS) provides a cost-effective and consistent -alternative to human marking. However, in order to achieve good performance, -the predictive features of the system need to be manually engineered by human -experts. We introduce a model that forms word representations by learning the -extent to which specific words contribute to the text's score. Using Long-Short -Term Memory networks to represent the meaning of texts, we demonstrate that a -fully automated framework is able to achieve excellent results over similar -approaches. 
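A bare-bones version of the cross-lingual tag projection that 1606.04279 above builds on: carry source-side tags over word alignments and record, per target word type, the tags ever projected onto it. The toy bitext and the simple set-valued constraint are assumptions for illustration; the paper derives richer type and token constraints from bitext.

    from collections import Counter, defaultdict

    def project_tags(bitext, src_tags):
        # carry tags across word alignments; keep, per target word type, the
        # set of tags ever projected onto it (a type-level constraint)
        allowed = defaultdict(Counter)
        for src, tgt, align in bitext:
            for i, j in align:          # i indexes src tokens, j tgt tokens
                allowed[tgt[j]][src_tags[src[i]]] += 1
        return {word: set(counts) for word, counts in allowed.items()}

    bitext = [(["the", "house"], ["das", "Haus"], [(0, 0), (1, 1)]),
              (["a", "house"], ["ein", "Haus"], [(0, 0), (1, 1)])]
    src_tags = {"the": "DET", "house": "NOUN", "a": "DET"}
    print(project_tags(bitext, src_tags))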
In an attempt to make our results more interpretable, and inspired -by recent advances in visualizing neural networks, we introduce a novel method -for identifying the regions of the text that the model has found more -discriminative. -" -2987,1606.04300,Deng Cai and Hai Zhao,Neural Word Segmentation Learning for Chinese,cs.CL," Most previous approaches to Chinese word segmentation formalize this problem -as a character-based sequence labeling task where only contextual information -within fixed sized local windows and simple interactions between adjacent tags -can be captured. In this paper, we propose a novel neural framework which -thoroughly eliminates context windows and can utilize complete segmentation -history. Our model employs a gated combination neural network over characters -to produce distributed representations of word candidates, which are then given -to a long short-term memory (LSTM) language scoring model. Experiments on the -benchmark datasets show that without the help of feature engineering as most -existing approaches, our models achieve competitive or better performances with -previous state-of-the-art methods. -" -2988,1606.04351,"Georgios Balikas, Massih-Reza Amini",TwiSE at SemEval-2016 Task 4: Twitter Sentiment Classification,cs.CL cs.IR cs.LG," This paper describes the participation of the team ""TwiSE"" in the SemEval -2016 challenge. Specifically, we participated in Task 4, namely ""Sentiment -Analysis in Twitter"" for which we implemented sentiment classification systems -for subtasks A, B, C and D. Our approach consists of two steps. In the first -step, we generate and validate diverse feature sets for twitter sentiment -evaluation, inspired by the work of participants of previous editions of such -challenges. In the second step, we focus on the optimization of the evaluation -measures of the different subtasks. To this end, we examine different learning -strategies by validating them on the data provided by the task organisers. For -our final submissions we used an ensemble learning approach (stacked -generalization) for Subtask A and single linear models for the rest of the -subtasks. In the official leaderboard we were ranked 9/35, 8/19, 1/11 and 2/14 -for subtasks A, B, C and D respectively.\footnote{We make the code available -for research purposes at -\url{https://github.com/balikasg/SemEval2016-Twitter\_Sentiment\_Evaluation}.} -" -2989,1606.04429,"Alberto P. Garc\'ia-Plaza and V\'ictor Fresno and Raquel Mart\'inez - and Arkaitz Zubiaga",Using Fuzzy Logic to Leverage HTML Markup for Web Page Representation,cs.IR cs.CL," The selection of a suitable document representation approach plays a crucial -role in the performance of a document clustering task. Being able to pick out -representative words within a document can lead to substantial improvements in -document clustering. In the case of web documents, the HTML markup that defines -the layout of the content provides additional structural information that can -be further exploited to identify representative words. In this paper we -introduce a fuzzy term weighing approach that makes the most of the HTML -structure for document clustering. We set forth and build on the hypothesis -that a good representation can take advantage of how humans skim through -documents to extract the most representative words. 
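The fuzzy combination of HTML criteria described in 1606.04429 above can be pictured as aggregating per-criterion membership degrees in [0, 1]; the fuzzy OR (max) below is a deliberately crude stand-in for the paper's AFCC combination, with invented criteria and values.

    def term_weight(memberships):
        # fuzzy OR (max) over per-criterion membership degrees in [0, 1]
        return max(memberships.values())

    # membership of each candidate term in some HTML-derived criteria
    terms = {
        "recipe":  {"title": 1.0, "emphasis": 0.4, "position": 0.7},
        "cookie":  {"title": 0.0, "emphasis": 0.8, "position": 0.3},
        "consent": {"title": 0.0, "emphasis": 0.0, "position": 0.1},
    }
    for term, m in terms.items():
        print(term, term_weight(m))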
The authors of web pages -make use of HTML tags to convey the most important message of a web page -through page elements that attract the readers' attention, such as page titles -or emphasized elements. We define a set of criteria to exploit the information -provided by these page elements, and introduce a fuzzy combination of these -criteria that we evaluate within the context of a web page clustering task. Our -proposed approach, called Abstract Fuzzy Combination of Criteria (AFCC), can -adapt to datasets whose features are distributed differently, achieving good -results compared to other similar fuzzy logic based approaches and TF-IDF -across different datasets. -" -2990,1606.04503,Akanksha and Jacob Eisenstein,"Shallow Discourse Parsing Using Distributed Argument Representations and - Bayesian Optimization",cs.CL," This paper describes the Georgia Tech team's approach to the CoNLL-2016 -supplementary evaluation on discourse relation sense classification. We use -long short-term memories (LSTM) to induce distributed representations of each -argument, and then combine these representations with surface features in a -neural network. The architecture of the neural network is determined by -Bayesian hyperparameter search. -" -2991,1606.04582,"Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi",Query-Reduction Networks for Question Answering,cs.CL cs.NE," In this paper, we study the problem of question answering when reasoning over -multiple facts is required. We propose Query-Reduction Network (QRN), a variant -of Recurrent Neural Network (RNN) that effectively handles both short-term -(local) and long-term (global) sequential dependencies to reason over multiple -facts. QRN considers the context sentences as a sequence of state-changing -triggers, and reduces the original query to a more informed query as it -observes each trigger (context sentence) through time. Our experiments show -that QRN produces the state-of-the-art results in bAbI QA and dialog tasks, and -in a real goal-oriented dialog dataset. In addition, QRN formulation allows -parallelization on RNN's time axis, saving an order of magnitude in time -complexity for training and inference. -" -2992,1606.04596,"Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun and Yang - Liu",Semi-Supervised Learning for Neural Machine Translation,cs.CL," While end-to-end neural machine translation (NMT) has made remarkable -progress recently, NMT systems only rely on parallel corpora for parameter -estimation. Since parallel corpora are usually limited in quantity, quality, -and coverage, especially for low-resource languages, it is appealing to exploit -monolingual corpora to improve NMT. We propose a semi-supervised approach for -training NMT models on the concatenation of labeled (parallel corpora) and -unlabeled (monolingual corpora) data. The central idea is to reconstruct the -monolingual corpora using an autoencoder, in which the source-to-target and -target-to-source translation models serve as the encoder and decoder, -respectively. Our approach can not only exploit the monolingual corpora of the -target language, but also of the source language. Experiments on the -Chinese-English dataset show that our approach achieves significant -improvements over state-of-the-art SMT and NMT systems. 
-" -2993,1606.04597,"Chunyang Liu, Yang Liu, Huanbo Luan, Maosong Sun and Heng Yu","Agreement-based Learning of Parallel Lexicons and Phrases from - Non-Parallel Corpora",cs.CL," We introduce an agreement-based approach to learning parallel lexicons and -phrases from non-parallel corpora. The basic idea is to encourage two -asymmetric latent-variable translation models (i.e., source-to-target and -target-to-source) to agree on identifying latent phrase and word alignments. -The agreement is defined at both word and phrase levels. We develop a Viterbi -EM algorithm for jointly training the two unidirectional models efficiently. -Experiments on the Chinese-English dataset show that agreement-based learning -significantly improves both alignment and translation performance. -" -2994,1606.04631,"Yi Bin, Yang Yang, Zi Huang, Fumin Shen, Xing Xu, Heng Tao Shen",Bidirectional Long-Short Term Memory for Video Description,cs.MM cs.CL," Video captioning has been attracting broad research attention in multimedia -community. However, most existing approaches either ignore temporal information -among video frames or just employ local contextual temporal knowledge. In this -work, we propose a novel video captioning framework, termed as -\emph{Bidirectional Long-Short Term Memory} (BiLSTM), which deeply captures -bidirectional global temporal structure in video. Specifically, we first devise -a joint visual modelling approach to encode video data by combining a forward -LSTM pass, a backward LSTM pass, together with visual features from -Convolutional Neural Networks (CNNs). Then, we inject the derived video -representation into the subsequent language model for initialization. The -benefits are in two folds: 1) comprehensively preserving sequential and visual -information; and 2) adaptively learning dense visual features and sparse -semantic representations for videos and sentences, respectively. We verify the -effectiveness of our proposed video captioning framework on a commonly-used -benchmark, i.e., Microsoft Video Description (MSVD) corpus, and the -experimental results demonstrate that the superiority of the proposed approach -as compared to several state-of-the-art methods. -" -2995,1606.04640,"Tom Kenter, Alexey Borisov, Maarten de Rijke",Siamese CBOW: Optimizing Word Embeddings for Sentence Representations,cs.CL," We present the Siamese Continuous Bag of Words (Siamese CBOW) model, a neural -network for efficient estimation of high-quality sentence embeddings. Averaging -the embeddings of words in a sentence has proven to be a surprisingly -successful and efficient way of obtaining sentence embeddings. However, word -embeddings trained with the methods currently available are not optimized for -the task of sentence representation, and, thus, likely to be suboptimal. -Siamese CBOW handles this problem by training word embeddings directly for the -purpose of being averaged. The underlying neural network learns word embeddings -by predicting, from a sentence representation, its surrounding sentences. We -show the robustness of the Siamese CBOW model by evaluating it on 20 datasets -stemming from a wide variety of sources. -" -2996,1606.04672,"Allen Huang, Lars Roemheld",Constitutional Precedent of Amicus Briefs,cs.CL cs.CY," We investigate shared language between U.S. Supreme Court majority opinions -and interest groups' corresponding amicus briefs. Specifically, we evaluate -whether language that originated in an amicus brief acquired legal precedent -status by being cited in the Court's opinion. 
Using plagiarism detection
-software, automated querying of a large legal database, and manual analysis, we
-establish seven instances where interest group amici were able to formulate
-constitutional case law, setting binding legal precedent. We discuss several
-such instances for their implications in the Supreme Court's creation of case
-law.
-"
-2997,1606.04686,Verena Rieser and Oliver Lemon,"Natural Language Generation as Planning under Uncertainty Using
- Reinforcement Learning",cs.CL cs.AI," We present and evaluate a new model for Natural Language Generation (NLG) in
-Spoken Dialogue Systems, based on statistical planning, given noisy feedback
-from the current generation context (e.g. a user and a surface realiser). We
-study its use in a standard NLG problem: how to present information (in this
-case a set of search results) to users, given the complex trade-offs between
-utterance length, amount of information conveyed, and cognitive load. We set
-these trade-offs by analysing existing MATCH data. We then train an NLG policy
-using Reinforcement Learning (RL), which adapts its behaviour to noisy
-feedback from the current generation context. This policy is compared to
-several baselines derived from previous work in this area. The learned policy
-significantly outperforms all the prior approaches.
-"
-2998,1606.04721,Alessandro Bessi,Personality Traits and Echo Chambers on Facebook,cs.SI cs.CL cs.CY cs.HC," In online social networks, users tend to select information that adheres to
-their system of beliefs and to form polarized groups of like-minded people.
-Polarization as well as its effects on online social interactions have been
-extensively investigated. Still, the relation between group formation and
-personality traits remains unclear. A better understanding of the cognitive and
-psychological determinants of online social dynamics might help to design more
-efficient communication strategies and to challenge the digital misinformation
-threat. In this work, we focus on users commenting on posts published by US
-Facebook pages supporting scientific and conspiracy-like narratives, and we
-classify the personality traits of those users according to their online
-behavior. We show that different and conflicting communities are populated by
-users showing similar psychological profiles, and that the dominant personality
-model is the same in both scientific and conspiracy echo chambers. Moreover, we
-observe that the permanence within echo chambers slightly shapes users'
-psychological profiles. Our results suggest that the presence of specific
-personality traits in individuals leads to their considerable involvement in
-supporting narratives inside virtual echo chambers.
-"
-2999,1606.04754,"Amrita Saha, Mitesh M. Khapra, Sarath Chandar, Janarthanan Rajendran,
- Kyunghyun Cho","A Correlational Encoder Decoder Architecture for Pivot Based Sequence
- Generation",cs.CL," Interlingua based Machine Translation (MT) aims to encode multiple languages
-into a common linguistic representation and then decode sentences in multiple
-target languages from this representation. In this work we explore this idea in
-the context of neural encoder decoder architectures, albeit on a smaller scale
-and without MT as the end goal. Specifically, we consider the case of three
-languages or modalities X, Z and Y wherein we are interested in generating
-sequences in Y starting from information available in X.
However, there is no
-parallel training data available between X and Y, but training data is
-available between X & Z and Z & Y (as is often the case in many real-world
-applications). Z thus acts as a pivot/bridge. An obvious solution, which is
-perhaps less elegant but works very well in practice, is to train a two-stage
-model which first converts from X to Z and then from Z to Y. Instead we explore
-an interlingua inspired solution which jointly learns to do the following: (i)
-encode X and Z to a common representation and (ii) decode Y from this common
-representation. We evaluate our model on two tasks: (i) bridge transliteration
-and (ii) bridge captioning. We report promising results in both these
-applications and believe that this is a step in the right direction towards
-truly interlingua-inspired encoder decoder architectures.
-"
-3000,1606.04835,"Qi Li, Tianshi Li, Baobao Chang",Learning Word Sense Embeddings from Word Sense Definitions,cs.CL," Word embeddings play a significant role in many modern NLP systems. Since
-learning one representation per word is problematic for polysemous words and
-homonymous words, researchers propose to use one embedding per word sense.
-Their approaches mainly train word sense embeddings on a corpus. In this paper,
-we propose to use word sense definitions to learn one embedding per word sense.
-Experimental results on word similarity tasks and a word sense disambiguation
-task show that word sense embeddings produced by our approach are of high
-quality.
-"
-3001,1606.04870,"Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias Kaufmann, Andrew
- Tomkins, Balint Miklos, Greg Corrado, Laszlo Lukacs, Marina Ganea, Peter
- Young, Vivek Ramavajjala",Smart Reply: Automated Response Suggestion for Email,cs.CL," In this paper we propose and investigate a novel end-to-end method for
-automatically generating short email responses, called Smart Reply. It
-generates semantically diverse suggestions that can be used as complete email
-responses with just one tap on mobile. The system is currently used in Inbox by
-Gmail and is responsible for assisting with 10% of all mobile responses. It is
-designed to work at very high throughput and process hundreds of millions of
-messages daily. The system exploits state-of-the-art, large-scale deep
-learning.
- We describe the architecture of the system as well as the challenges that we
-faced while building it, like response diversity and scalability. We also
-introduce a new method for semantic clustering of user-generated content that
-requires only a modest amount of explicitly labeled data.
-"
-3002,1606.04963,"Felix Stahlberg, Eva Hasler and Bill Byrne","The Edit Distance Transducer in Action: The University of Cambridge
- English-German System at WMT16",cs.CL," This paper presents the University of Cambridge submission to WMT16.
-Motivated by the complementary nature of syntactic machine translation and
-neural machine translation (NMT), we exploit the synergies of Hiero and NMT in
-different combination schemes. Starting out with a simple neural lattice
-rescoring approach, we show that the Hiero lattices are often too narrow for
-NMT ensembles. Therefore, instead of a hard restriction of the NMT search space
-to the lattice, we propose to loosely couple NMT and Hiero by composition with
-a modified version of the edit distance transducer. The loose combination
-outperforms lattice rescoring, especially when using multiple NMT systems in an
-ensemble.
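
Since the edit distance is the metric at the heart of the transducer just described, a short self-contained sketch may help; it computes the standard Levenshtein distance over token sequences with dynamic programming. The transducer composition itself (and its modified costs) is beyond a few lines, and nothing here is taken from the authors' system.

def edit_distance(a, b, sub_cost=1, ins_cost=1, del_cost=1):
    # dp[i][j] = minimal cost of editing a[:i] into b[:j].
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * del_cost
    for j in range(1, n + 1):
        dp[0][j] = j * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 0 if a[i - 1] == b[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j - 1] + match,  # substitute or match
                           dp[i - 1][j] + del_cost,   # delete a[i-1]
                           dp[i][j - 1] + ins_cost)   # insert b[j-1]
    return dp[m][n]

print(edit_distance("ein Haus".split(), "ein kleines Haus".split()))  # -> 1
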
-" -3003,1606.05007,"Naoya Takahashi, Tofigh Naghibi, Beat Pfister","Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep - Neural Networks",cs.CL cs.LG cs.SD," Phonemic or phonetic sub-word units are the most commonly used atomic -elements to represent speech signals in modern ASRs. However they are not the -optimal choice due to several reasons such as: large amount of effort required -to handcraft a pronunciation dictionary, pronunciation variations, human -mistakes and under-resourced dialects and languages. Here, we propose a -data-driven pronunciation estimation and acoustic modeling method which only -takes the orthographic transcription to jointly estimate a set of sub-word -units and a reliable dictionary. Experimental results show that the proposed -method which is based on semi-supervised training of a deep neural network -largely outperforms phoneme based continuous speech recognition on the TIMIT -dataset. -" -3004,1606.05029,Ferhan Ture and Oliver Jojic,"No Need to Pay Attention: Simple Recurrent Neural Networks Work! (for - Answering ""Simple"" Questions)",cs.CL," First-order factoid question answering assumes that the question can be -answered by a single fact in a knowledge base (KB). While this does not seem -like a challenging task, many recent attempts that apply either complex -linguistic reasoning or deep neural networks achieve 65%-76% accuracy on -benchmark sets. Our approach formulates the task as two machine learning -problems: detecting the entities in the question, and classifying the question -as one of the relation types in the KB. We train a recurrent neural network to -solve each problem. On the SimpleQuestions dataset, our approach yields -substantial improvements over previously published results --- even neural -networks based on much more complex architectures. The simplicity of our -approach also has practical advantages, such as efficiency and modularity, that -are valuable especially in an industry setting. In fact, we present a -preliminary analysis of the performance of our model on real queries from -Comcast's X1 entertainment platform with millions of users every day. -" -3005,1606.05250,"Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang","SQuAD: 100,000+ Questions for Machine Comprehension of Text",cs.CL," We present the Stanford Question Answering Dataset (SQuAD), a new reading -comprehension dataset consisting of 100,000+ questions posed by crowdworkers on -a set of Wikipedia articles, where the answer to each question is a segment of -text from the corresponding reading passage. We analyze the dataset to -understand the types of reasoning required to answer the questions, leaning -heavily on dependency and constituency trees. We build a strong logistic -regression model, which achieves an F1 score of 51.0%, a significant -improvement over a simple baseline (20%). However, human performance (86.8%) is -much higher, indicating that the dataset presents a good challenge problem for -future research. - The dataset is freely available at https://stanford-qa.com -" -3006,1606.05286,Julien Perez,"Spectral decomposition method of dialog state tracking via collective - matrix factorization",cs.CL stat.ML," The task of dialog management is commonly decomposed into two sequential -subtasks: dialog state tracking and dialog policy learning. 
In an end-to-end
-dialog system, the aim of dialog state tracking is to accurately estimate the
-true dialog state from noisy observations produced by the speech recognition
-and the natural language understanding modules. The state tracking task is
-primarily meant to support a dialog policy. From a probabilistic perspective,
-this is achieved by maintaining a posterior distribution over hidden dialog
-states composed of a set of context-dependent variables. Once a dialog policy
-is learned, it strives to select an optimal dialog act given the estimated
-dialog state and a defined reward function. This paper introduces a novel
-method of dialog state tracking based on a bilinear algebraic decomposition
-model that provides an efficient inference schema through collective matrix
-factorization. We evaluate the proposed approach on the second Dialog State
-Tracking Challenge (DSTC-2) dataset and we show that the proposed tracker gives
-encouraging results compared to the state-of-the-art trackers that participated
-in this standard benchmark. Finally, we show that the prediction schema is
-computationally efficient in comparison to the previous approaches.
-"
-3007,1606.05320,"Viktoriya Krakovna, Finale Doshi-Velez","Increasing the Interpretability of Recurrent Neural Networks Using
- Hidden Markov Models",stat.ML cs.CL cs.LG," As deep neural networks continue to revolutionize various application
-domains, there is increasing interest in making these powerful models more
-understandable and interpretable, and narrowing down the causes of good and bad
-predictions. We focus on recurrent neural networks (RNNs), state-of-the-art
-models in speech recognition and translation. Our approach to increasing
-interpretability is to combine an RNN with a hidden Markov model (HMM), a
-simpler and more transparent model. We explore various combinations of RNNs and
-HMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained
-first, then a small LSTM is given HMM state distributions and trained to fill
-in gaps in the HMM's performance; and a jointly trained hybrid model. We find
-that the LSTM and HMM learn complementary information about the features in the
-text.
-"
-3008,1606.05378,"Reginald Long, Panupong Pasupat, Percy Liang",Simpler Context-Dependent Logical Forms via Model Projections,cs.CL," We consider the task of learning a context-dependent mapping from utterances
-to denotations. With only denotations at training time, we must search over a
-combinatorially large space of logical forms, which is even larger with
-context-dependent utterances. To cope with this challenge, we perform
-successive projections of the full model onto simpler models that operate over
-equivalence classes of logical forms. Though less expressive, we find that
-these simpler models are much faster and can be surprisingly effective.
-Moreover, they can be used to bootstrap the full model. Finally, we collect
-three new context-dependent semantic parsing datasets, and develop a new
-left-to-right parser.
-"
-3009,1606.05409,"Linfeng Song, Zhiguo Wang, Haitao Mi and Daniel Gildea",Sense Embedding Learning for Word Sense Induction,cs.CL," Conventional word sense induction (WSI) methods usually represent each
-instance with discrete linguistic features or cooccurrence features, and train
-a model for each polysemous word individually. In this work, we propose to
-learn sense embeddings for the WSI task.
In the training stage, our method
-induces several sense centroids (embeddings) for each polysemous word. In the
-testing stage, our method represents each instance as a contextual vector, and
-induces its sense by finding the nearest sense centroid in the embedding space.
-The advantages of our method are (1) distributed sense vectors are taken as the
-knowledge representations which are trained discriminatively, and usually have
-better performance than traditional count-based distributional models, and (2)
-a general model for the whole vocabulary is jointly trained to induce sense
-centroids under the multitask learning framework. Evaluated on the SemEval-2010
-WSI dataset, our method outperforms all participants and most of the recent
-state-of-the-art methods. We further verify the two advantages by comparing
-with carefully designed baselines.
-"
-3010,1606.05464,"Isabelle Augenstein and Tim Rockt\""aschel and Andreas Vlachos and
- Kalina Bontcheva",Stance Detection with Bidirectional Conditional Encoding,cs.CL cs.LG cs.NE," Stance detection is the task of classifying the attitude expressed in a text
-towards a target such as Hillary Clinton to be ""positive"", ""negative"" or
-""neutral"". Previous work has assumed that either the target is mentioned in the
-text or that training data for every target is given. This paper considers the
-more challenging version of this task, where targets are not always mentioned
-and no training data is available for the test targets. We experiment with
-conditional LSTM encoding, which builds a representation of the tweet that is
-dependent on the target, and demonstrate that it outperforms encoding the tweet
-and the target independently. Performance is improved further when the
-conditional model is augmented with bidirectional encoding. We evaluate our
-approach on the SemEval 2016 Task 6 Twitter Stance Detection corpus, achieving
-performance second best only to a system trained on semi-automatically labelled
-tweets for the test target. When such weak supervision is added, our approach
-achieves state-of-the-art results.
-"
-3011,1606.05467,Juergen Mueller and Gerd Stumme,Gender Inference using Statistical Name Characteristics in Twitter,cs.CL cs.SI," Much attention has been given to the task of gender inference of Twitter
-users. Although names are strong gender indicators, the names of Twitter users
-are rarely used as a feature, probably due to the high number of ill-formed
-names, which cannot be found in any name dictionary. Instead of relying solely
-on a name database, we propose a novel name classifier. Our approach extracts
-characteristics from the user names and uses those in order to assign the names
-to a gender. This enables us to classify international first names as well as
-ill-formed names.
-"
-3012,1606.05491,Ond\v{r}ej Du\v{s}ek and Filip Jur\v{c}\'i\v{c}ek,"Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax
- Trees and Strings",cs.CL," We present a natural language generator based on the sequence-to-sequence
-approach that can be trained to produce natural language strings as well as
-deep syntax dependency trees from input dialogue acts, and we use it to
-directly compare two-step generation with separate sentence planning and
-surface realization stages to a joint, one-step approach. We were able to train
-both setups successfully using very little training data. The joint setup
-offers better performance, surpassing the state of the art with regard to
-n-gram-based scores while providing more relevant outputs.
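
For readers unfamiliar with the input side of such generators, the sketch below shows one conventional way to linearize a dialogue act into a token sequence that a sequence-to-sequence model can consume. The act format and slot names are illustrative assumptions, not the paper's exact representation.

def linearize_dialogue_act(act_type, slots):
    # inform(name=..., food=...) -> ["inform", "food", "...", "name", "..."]
    tokens = [act_type]
    for slot, value in sorted(slots.items()):
        tokens += [slot, value]
    return tokens

print(linearize_dialogue_act("inform", {"name": "Golden Dragon",
                                        "food": "Chinese"}))
# -> ['inform', 'food', 'Chinese', 'name', 'Golden Dragon']
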
-" -3013,1606.05545,David Vilares and Carlos G\'omez-Rodr\'iguez and Miguel A. Alonso,"Universal, Unsupervised (Rule-Based), Uncovered Sentiment Analysis",cs.CL," We present a novel unsupervised approach for multilingual sentiment analysis -driven by compositional syntax-based rules. On the one hand, we exploit some of -the main advantages of unsupervised algorithms: (1) the interpretability of -their output, in contrast with most supervised models, which behave as a black -box and (2) their robustness across different corpora and domains. On the other -hand, by introducing the concept of compositional operations and exploiting -syntactic information in the form of universal dependencies, we tackle one of -their main drawbacks: their rigidity on data that are structured differently -depending on the language concerned. Experiments show an improvement both over -existing unsupervised methods, and over state-of-the-art supervised models when -evaluating outside their corpus of origin. Experiments also show how the same -compositional operations can be shared across languages. The system is -available at http://www.grupolys.org/software/UUUSA/ -" -3014,1606.05554,"Noura Al Moubayed, Toby Breckon, Peter Matthews, and A. Stephen - McGough","SMS Spam Filtering using Probabilistic Topic Modelling and Stacked - Denoising Autoencoder",cs.CL cs.LG cs.NE," In This paper we present a novel approach to spam filtering and demonstrate -its applicability with respect to SMS messages. Our approach requires minimum -features engineering and a small set of la- belled data samples. Features are -extracted using topic modelling based on latent Dirichlet allocation, and then -a comprehensive data model is created using a Stacked Denoising Autoencoder -(SDA). Topic modelling summarises the data providing ease of use and high -interpretability by visualising the topics using word clouds. Given that the -SMS messages can be regarded as either spam (unwanted) or ham (wanted), the SDA -is able to model the messages and accurately discriminate between the two -classes without the need for a pre-labelled training set. The results are -compared against the state-of-the-art spam detection algorithms with our -proposed approach achieving over 97% accuracy which compares favourably to the -best reported algorithms presented in the literature. -" -3015,1606.05611,Tim Zimmermann and Leo Kotschenreuther and Karsten Schmidt,"Data-driven HR - R\'esum\'e Analysis Based on Natural Language - Processing and Machine Learning",cs.CL cs.AI," Recruiters usually spend less than a minute looking at each r\'esum\'e when -deciding whether it's worth continuing the recruitment process with the -candidate. Recruiters focus on keywords, and it's almost impossible to -guarantee a fair process of candidate selection. The main scope of this paper -is to tackle this issue by introducing a data-driven approach that shows how to -process r\'esum\'es automatically and give recruiters more time to only examine -promising candidates. Furthermore, we show how to leverage Machine Learning and -Natural Language Processing in order to extract all required information from -the r\'esum\'es. Once the information is extracted, a ranking score is -calculated. The score describes how well the candidates fit based on their -education, work experience and skills. Later this paper illustrates a prototype -application that shows how this novel approach can increase the productivity of -recruiters. 
The application enables them to filter and rank candidates based on
-predefined job descriptions. Guided by the ranking, recruiters can get deeper
-insights from candidate profiles and validate why and how the application
-ranked them. This application shows how to improve the hiring process by
-providing unbiased hiring decision support.
-"
-3016,1606.05679,"Haoruo Peng, Dan Roth",Two Discourse Driven Language Models for Semantics,cs.CL," Natural language understanding often requires deep semantic knowledge.
-Expanding on previous proposals, we suggest that some important aspects of
-semantic knowledge can be modeled as a language model if done at an appropriate
-level of abstraction. We develop two distinct models that capture semantic
-frame chains and discourse information while abstracting over the specific
-mentions of predicates and entities. For each model, we investigate four
-implementations: a ""standard"" N-gram language model and three discriminatively
-trained ""neural"" language models that generate embeddings for semantic frames.
-The quality of the semantic language models (SemLM) is evaluated both
-intrinsically, using perplexity and a narrative cloze test, and extrinsically -
-we show that our SemLM helps improve performance on semantic natural language
-processing tasks such as co-reference resolution and discourse parsing.
-"
-3017,1606.05694,"Prashanth Vijayaraghavan, Ivan Sysoev, Soroush Vosoughi and Deb Roy","DeepStance at SemEval-2016 Task 6: Detecting Stance in Tweets Using
- Character and Word-Level CNNs",cs.CL cs.SI," This paper describes our approach for the Detecting Stance in Tweets task
-(SemEval-2016 Task 6). We utilized recent advances in short text categorization
-using deep learning to create word-level and character-level models. The choice
-between word-level and character-level models in each particular case was
-informed through validation performance. Our final system is a combination of
-classifiers using word-level or character-level models. We also employed novel
-data augmentation techniques to expand and diversify our training dataset, thus
-making our system more robust. Our system achieved macro-average precision,
-recall and F1-scores of 0.67, 0.61 and 0.635, respectively.
-"
-3018,1606.05699,Lu Wang and Claire Cardie and Galen Marchetti,Socially-Informed Timeline Generation for Complex Events,cs.CL," Existing timeline generation systems for complex events consider only
-information from traditional media, ignoring the rich social context provided
-by user-generated content that reveals representative public interests or
-insightful opinions. We instead aim to generate socially-informed timelines
-that contain both news article summaries and selected user comments. We present
-an optimization framework designed to balance topical cohesion between the
-article and comment summaries along with their informativeness and coverage of
-the event. Automatic evaluations on real-world datasets that cover four complex
-events show that our system produces more informative timelines than
-state-of-the-art systems. In human evaluation, the associated comment summaries
-are furthermore rated more insightful than editor's picks and comments ranked
-highly by users.
-"
-3019,1606.05702,Lu Wang and Hema Raghavan and Claire Cardie and Vittorio Castelli,Query-Focused Opinion Summarization for User-Generated Content,cs.CL," We present a submodular function-based framework for query-focused opinion
-summarization.
Within our framework, relevance ordering produced by a
-statistical ranker, and information coverage with respect to topic distribution
-and diverse viewpoints are both encoded as submodular functions. Dispersion
-functions are utilized to minimize redundancy. We are the first to evaluate
-different metrics of text similarity for submodularity-based summarization
-methods. By experimenting on community QA and blog summarization, we show that
-our system outperforms state-of-the-art approaches in both automatic evaluation
-and human evaluation. A human evaluation task is conducted on Amazon Mechanical
-Turk at scale, and shows that our systems are able to generate summaries of
-high overall quality and information diversity.
-"
-3020,1606.05704,Lu Wang and Claire Cardie,"A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute
- Detection",cs.CL," We investigate the novel task of online dispute detection and propose a
-sentiment analysis solution to the problem: we aim to identify the sequence of
-sentence-level sentiments expressed during a discussion and to use them as
-features in a classifier that predicts the DISPUTE/NON-DISPUTE label for the
-discussion as a whole. We evaluate dispute detection approaches on a newly
-created corpus of Wikipedia Talk page disputes and find that classifiers that
-rely on our sentiment tagging features outperform those that do not. The best
-model achieves a very promising F1 score of 0.78 and an accuracy of 0.80.
-"
-3021,1606.05706,Lu Wang and Claire Cardie,"Improving Agreement and Disagreement Identification in Online
- Discussions with A Socially-Tuned Sentiment Lexicon",cs.CL," We study the problem of agreement and disagreement detection in online
-discussions. An isotonic Conditional Random Fields (isotonic CRF) based
-sequential model is proposed to make predictions at the sentence or segment
-level. We automatically construct a socially-tuned lexicon that is bootstrapped
-from existing general-purpose sentiment lexicons to further improve the
-performance. We evaluate our agreement and disagreement tagging model on two
-disparate online discussion corpora -- Wikipedia Talk pages and online debates.
-Our model is shown to outperform the state-of-the-art approaches in both
-datasets. For example, the isotonic CRF model achieves F1 scores of 0.74 and
-0.67 for agreement and disagreement detection, while a linear chain CRF obtains
-0.58 and 0.56 for the discussions on Wikipedia Talk pages.
-"
-3022,1606.05759,"Hassan Sajjad, Nadir Durrani, Francisco Guzman, Preslav Nakov, Ahmed
- Abdelali, Stephan Vogel, Wael Salloum, Ahmed El Kholy, Nizar Habash","Egyptian Arabic to English Statistical Machine Translation System for
- NIST OpenMT'2015",cs.CL," The paper describes the Egyptian Arabic-to-English statistical machine
-translation (SMT) system that the QCRI-Columbia-NYUAD (QCN) group submitted to
-the NIST OpenMT'2015 competition. The competition focused on informal dialectal
-Arabic, as used in SMS, chat, and speech. Thus, our efforts focused on
-processing and standardizing Arabic, e.g., using tools such as 3arrib and
-MADAMIRA. We further trained a phrase-based SMT system using state-of-the-art
-features and components such as operation sequence model, class-based language
-model, sparse features, neural network joint model, genre-based
-hierarchically-interpolated language model, unsupervised transliteration
-mining, phrase-table merging, and hypothesis combination. Our system ranked
-second on all three genres.
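
The genre-based interpolated language model mentioned above combines a genre-specific model with a general one; a minimal sketch of plain linear interpolation is below. The probabilities, the weight lam, and the callables are illustrative assumptions; the actual system interpolates hierarchically across several models.

def interpolated_prob(word, context, p_genre, p_general, lam=0.7):
    # p_genre and p_general are callables returning P(word | context).
    return (lam * p_genre(word, context)
            + (1.0 - lam) * p_general(word, context))

# Toy usage with constant, context-free estimates:
p_sms = lambda w, c: {"u": 0.05, "you": 0.01}.get(w, 0.001)
p_news = lambda w, c: {"u": 0.0001, "you": 0.03}.get(w, 0.001)
print(interpolated_prob("u", (), p_sms, p_news))  # SMS-style spelling scores high
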
-" -3023,1606.05804,"Patrick Verga, Arvind Neelakantan, Andrew McCallum","Generalizing to Unseen Entities and Entity Pairs with Row-less Universal - Schema",cs.CL," Universal schema predicts the types of entities and relations in a knowledge -base (KB) by jointly embedding the union of all available schema types---not -only types from multiple structured databases (such as Freebase or Wikipedia -infoboxes), but also types expressed as textual patterns from raw text. This -prediction is typically modeled as a matrix completion problem, with one type -per column, and either one or two entities per row (in the case of entity types -or binary relation types, respectively). Factorizing this sparsely observed -matrix yields a learned vector embedding for each row and each column. In this -paper we explore the problem of making predictions for entities or entity-pairs -unseen at training time (and hence without a pre-learned row embedding). We -propose an approach having no per-row parameters at all; rather we produce a -row vector on the fly using a learned aggregation function of the vectors of -the observed columns for that row. We experiment with various aggregation -functions, including neural network attention models. Our approach can be -understood as a natural language database, in that questions about KB entities -are answered by attending to textual or database evidence. In experiments -predicting both relations and entity types, we demonstrate that despite having -an order of magnitude fewer parameters than traditional universal schema, we -can match the accuracy of the traditional model, and more importantly, we can -now make predictions about unseen rows with nearly the same accuracy as rows -available at training time. -" -3024,1606.05829,"Qixin Wang, Tianyi Luo, Dong Wang",Can Machine Generate Traditional Chinese Poetry? A Feigenbaum Test,cs.CL," Recent progress in neural learning demonstrated that machines can do well in -regularized tasks, e.g., the game of Go. However, artistic activities such as -poem generation are still widely regarded as human's special capability. In -this paper, we demonstrate that a simple neural model can imitate human in some -tasks of art generation. We particularly focus on traditional Chinese poetry, -and show that machines can do as well as many contemporary poets and weakly -pass the Feigenbaum Test, a variant of Turing test in professional domains. Our -method is based on an attention-based recurrent neural network, which accepts a -set of keywords as the theme and generates poems by looking at each keyword -during the generation. A number of techniques are proposed to improve the -model, including character vector initialization, attention to input and -hybrid-style training. Compared to existing poetry generation methods, our -model can generate much more theme-consistent and semantic-rich poems. -" -3025,1606.05854,Dong Xu and Wu-Jun Li,"Full-Time Supervision based Bidirectional RNN for Factoid Question - Answering",cs.CL," Recently, bidirectional recurrent neural network (BRNN) has been widely used -for question answering (QA) tasks with promising performance. However, most -existing BRNN models extract the information of questions and answers by -directly using a pooling operation to generate the representation for loss or -similarity calculation. Hence, these existing models don't put supervision -(loss or similarity calculation) at every time step, which will lose some -useful information. 
In this paper, we propose a novel BRNN model called
-full-time supervision based BRNN (FTS-BRNN), which can put supervision at every
-time step. Experiments on the factoid QA task show that our FTS-BRNN can
-outperform other baselines and achieve state-of-the-art accuracy.
-"
-3026,1606.05925,Vikrant Singh Tomar and Richard C. Rose,"Graph based manifold regularized deep neural networks for automatic
- speech recognition",stat.ML cs.CL cs.LG," Deep neural networks (DNNs) have been successfully applied to a wide variety
-of acoustic modeling tasks in recent years. These include the applications of
-DNNs either in a discriminative feature extraction or in a hybrid acoustic
-modeling scenario. Despite the rapid progress in this area, a number of
-challenges remain in training DNNs. This paper presents an effective way of
-training DNNs using a manifold learning based regularization framework. In this
-framework, the parameters of the network are optimized to preserve underlying
-manifold based relationships between speech feature vectors while minimizing a
-measure of loss between network outputs and targets. This is achieved by
-incorporating manifold based locality constraints in the objective criterion of
-DNNs. Empirical evidence is provided to demonstrate that training a network
-with manifold constraints preserves structural compactness in the hidden layers
-of the network. Manifold regularization is applied to train bottleneck DNNs for
-feature extraction in hidden Markov model (HMM) based speech recognition. The
-experiments in this work are conducted on the Aurora-2 spoken digits and the
-Aurora-4 read news large vocabulary continuous speech recognition tasks. The
-performance is measured in terms of word error rate (WER) on these tasks. It is
-shown that the manifold regularized DNNs result in up to 37% reduction in WER
-relative to standard DNNs.
-"
-3027,1606.05967,"Amir Hossein Harati Nejad Torbati, Joseph Picone","A Nonparametric Bayesian Approach for Spoken Term detection by Example
- Query",cs.CL," State-of-the-art speech recognition systems use data-intensive
-context-dependent phonemes as acoustic units. However, these approaches do not
-translate well to low-resourced languages where large amounts of training data
-are not available. For such languages, automatic discovery of acoustic units is
-critical. In this paper, we demonstrate the application of nonparametric
-Bayesian models to acoustic unit discovery. We show that the discovered units
-are correlated with phonemes and therefore are linguistically meaningful. We
-also present a spoken term detection (STD) by example query algorithm based on
-these automatically learned units. We show that our proposed system produces a
-P@N of 61.2% and an EER of 13.95% on the TIMIT dataset. The improvement in the
-EER is 5% while P@N is only slightly lower than the best reported system in the
-literature.
-"
-3028,1606.05994,Normunds Gruzitis and Guntis Barzdins,"The Role of CNL and AMR in Scalable Abstractive Summarization for
- Multilingual Media Monitoring",cs.CL," In the era of Big Data and Deep Learning, there is a common view that machine
-learning approaches are the only way to achieve robust and scalable
-information extraction and summarization. It has been recently proposed that
-the CNL approach could be scaled up, building on the concept of embedded CNL
-and, thus, allowing for CNL-based information extraction from e.g.
normative or
-medical texts that are rather controlled by nature but still infringe the
-boundaries of CNL. Although it is arguable whether CNL can be exploited to
-approach robust, wide-coverage semantic parsing for use cases like media
-monitoring, its potential becomes much more obvious in the opposite direction:
-generation of story highlights from the summarized AMR graphs, which is the
-focus of this position paper.
-"
-3029,1606.06031,"Denis Paperno (1), Germ\'an Kruszewski (1), Angeliki Lazaridou (1),
- Quan Ngoc Pham (1), Raffaella Bernardi (1), Sandro Pezzelle (1), Marco Baroni
- (1), Gemma Boleda (1), Raquel Fern\'andez (2) ((1) CIMeC - Center for
- Mind/Brain Sciences, University of Trento, (2) Institute for Logic, Language
- & Computation, University of Amsterdam)",The LAMBADA dataset: Word prediction requiring a broad discourse context,cs.CL cs.AI cs.LG," We introduce LAMBADA, a dataset to evaluate the capabilities of computational
-models for text understanding by means of a word prediction task. LAMBADA is a
-collection of narrative passages sharing the characteristic that human subjects
-are able to guess their last word if they are exposed to the whole passage, but
-not if they only see the last sentence preceding the target word. To succeed on
-LAMBADA, computational models cannot simply rely on local context, but must be
-able to keep track of information in the broader discourse. We show that
-LAMBADA exemplifies a wide range of linguistic phenomena, and that none of
-several state-of-the-art language models reaches accuracy above 1% on this
-novel benchmark. We thus propose LAMBADA as a challenging test set, meant to
-encourage the development of new models capable of genuine understanding of
-broad context in natural language text.
-"
-3030,1606.06061,"Heiga Zen and Yannis Agiomyrgiannakis and Niels Egberts and Fergus
- Henderson and Przemys{\l}aw Szczepaniak","Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric
- Speech Synthesizers for Mobile Devices",cs.SD cs.CL," Acoustic models based on long short-term memory recurrent neural networks
-(LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and
-showed significant improvements in naturalness and latency over those based on
-hidden Markov models (HMMs). This paper describes further optimizations of
-LSTM-RNN-based SPSS for deployment on mobile devices: weight quantization,
-multi-frame inference, and robust inference using an {\epsilon}-contaminated
-Gaussian loss function. Experimental results in subjective listening tests show
-that these optimizations can make LSTM-RNN-based SPSS comparable to HMM-based
-SPSS in runtime speed while maintaining naturalness. Evaluations between
-LSTM-RNN-based SPSS and HMM-driven unit selection speech synthesis are also
-presented.
-"
-3031,1606.06083,"Vivek Gupta, Harish Karnick, Ashendra Bansal, Pradhuman Jhala",Product Classification in E-Commerce using Distributional Semantics,cs.AI cs.CL cs.IR," Product classification is the task of automatically predicting a taxonomy
-path for a product in a predefined taxonomy hierarchy given a textual product
-description or title. For efficient product classification we require a
-suitable feature-vector representation of a document (the textual description
-of a product) and fast, efficient algorithms for prediction. To address the
-above challenges, we propose a new distributional semantics representation for
-document vector formation.
We also develop a new two-level ensemble approach
-utilizing (with respect to the taxonomy tree) path-wise, node-wise and
-depth-wise classifiers for error reduction in the final product classification.
-Our experiments show the effectiveness of the distributional representation and
-the ensemble approach on data sets from a leading e-commerce platform and
-achieve better results on various evaluation metrics compared to earlier
-approaches.
-"
-3032,1606.06086,"Navid Rekabsaz, Mihai Lupu, Allan Hanbury","Uncertainty in Neural Network Word Embedding: Exploration of Threshold
- for Similarity",cs.CL cs.IR," Word embedding, especially with its recent developments, promises a
-quantification of the similarity between terms. However, it is not clear to
-what extent this similarity value can be genuinely meaningful and useful for
-subsequent tasks. We explore how the similarity score obtained from the models
-is really indicative of term relatedness. We first observe and quantify the
-uncertainty factor of the word embedding models with regard to the similarity
-value. Based on this factor, we introduce a general threshold on various
-dimensions which effectively filters the highly related terms. Our evaluation
-on four information retrieval collections supports the effectiveness of our
-approach as the results of the introduced threshold are significantly better
-than the baseline while being equal to or statistically indistinguishable from
-the optimal results.
-"
-3033,1606.06121,"Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam
- Kalai",Quantifying and Reducing Stereotypes in Word Embeddings,cs.CL cs.LG stat.ML," Machine learning algorithms are optimized to model statistical properties of
-the training data. If the input data reflects stereotypes and biases of the
-broader society, then the output of the learning algorithm also captures these
-stereotypes. In this paper, we initiate the study of gender stereotypes in {\em
-word embedding}, a popular framework to represent text data. As their use
-becomes increasingly common, applications can inadvertently amplify unwanted
-stereotypes. We show across multiple datasets that the embeddings contain
-significant gender stereotypes, especially with regard to professions. We
-created a novel gender analogy task and combined it with crowdsourcing to
-systematically quantify the gender bias in a given embedding. We developed an
-efficient algorithm that reduces gender stereotype using just a handful of
-training examples while preserving the useful geometric properties of the
-embedding. We evaluated our algorithm on several metrics. While we focus on
-male/female stereotypes, our framework may be applicable to other types of
-embedding biases.
-"
-3034,1606.06125,"Jirka Mar\v{s}\'ik (SEMAGRAMME), Maxime Amblard (MSH Lorraine,
- SEMAGRAMME)","Introducing a Calculus of Effects and Handlers for Natural Language
- Semantics",cs.CL cs.PL," In compositional model-theoretic semantics, researchers assemble
-truth-conditions or other kinds of denotations using the lambda calculus. It
-was previously observed that the lambda terms and/or the denotations studied
-tend to follow the same pattern: they are instances of a monad. In this paper,
-we present an extension of the simply-typed lambda calculus that exploits this
-uniformity using the recently discovered technique of effect handlers.
We prove
-that our calculus exhibits some of the key formal properties of the lambda
-calculus and we use it to construct a modular semantics for a small fragment
-that involves multiple distinct semantic phenomena.
-"
-3035,1606.06137,"Petri Luukkonen, Markus Koskela, and Patrik Flor\'een",LSTM-Based Predictions for Proactive Information Retrieval,cs.IR cs.CL cs.NE," We describe a method for proactive information retrieval targeted at
-retrieving relevant information during a writing task. In our method, the
-current task and the needs of the user are estimated, and the potential next
-steps are unobtrusively predicted based on the user's past actions. We focus on
-the task of writing, in which the user is coalescing previously collected
-information into a text. Our proactive system automatically recommends relevant
-background information to the user. The proposed system incorporates text input
-prediction using a long short-term memory (LSTM) network. We present
-simulations, which show that the system is able to reach higher precision
-values in an exploratory search setting compared to both a baseline and a
-comparison system.
-"
-3036,1606.06142,"Gergely Tib\'ely, David Sousa-Rodrigues, P\'eter Pollner, Gergely
- Palla",Comparing the hierarchy of keywords in on-line news portals,physics.soc-ph cs.CL cs.SI," The tagging of on-line content with informative keywords is a widespread
-phenomenon from scientific article repositories through blogs to on-line news
-portals. In most of the cases, the tags on a given item are free words chosen
-by the authors independently. Therefore, relations among keywords in a
-collection of news items are unknown. However, in most cases the topics and
-concepts described by these keywords are forming a latent hierarchy, with the
-more general topics and categories at the top, and more specialised ones at the
-bottom. Here we apply a recent, cooccurrence-based tag hierarchy extraction
-method to sets of keywords obtained from four different on-line news portals.
-The resulting hierarchies show substantial differences not just in the topics
-rendered as important (being at the top of the hierarchy) or of less interest
-(categorised low in the hierarchy), but also in the underlying network
-structure. This reveals discrepancies between the plausible keyword association
-frameworks in the studied news portals.
-"
-3037,1606.06164,"Emiel van Miltenburg, Roser Morante, Desmond Elliott",Pragmatic factors in image description: the case of negations,cs.CL cs.CV," We provide a qualitative analysis of the descriptions containing negations
-(no, not, n't, nobody, etc.) in the Flickr30K corpus, and a categorization of
-negation uses. Based on this analysis, we provide a set of requirements that an
-image description system should have in order to generate negation sentences.
-As a pilot experiment, we used our categorization to manually annotate
-sentences containing negations in the Flickr30K corpus, with an agreement score
-of K=0.67. With this paper, we hope to open up a broader discussion of
-subjective language in image descriptions.
-"
-3038,1606.06259,"Amir Zadeh, Rowan Zellers, Eli Pincus, Louis-Philippe Morency","MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis
- in Online Opinion Videos",cs.CL cs.MM," People are sharing their opinions, stories and reviews through online video
-sharing websites every day. Studying sentiment and subjectivity in these
-opinion videos is receiving growing attention from academia and industry.
-While sentiment analysis has been successful for text, it is an understudied
-research question for videos and multimedia content. The biggest setbacks for
-studies in this direction are the lack of a proper dataset, methodology,
-baselines and statistical analysis of how information from different modality
-sources relates to each other. This paper introduces to the scientific
-community the first opinion-level annotated corpus of sentiment and
-subjectivity analysis in online videos called Multimodal Opinion-level
-Sentiment Intensity dataset (MOSI). The dataset is rigorously annotated with
-labels for subjectivity, sentiment intensity, per-frame and per-opinion
-annotated visual features, and per-millisecond annotated audio features.
-Furthermore, we present baselines for future studies in this direction as well
-as a new multimodal fusion approach that jointly models spoken words and visual
-gestures.
-"
-3039,1606.06274,"Vivek Datla, David Lin, Max Louwerse and Abhinav Vishnu","A Data-Driven Approach for Semantic Role Labeling from Induced Grammar
- Structures in Language",cs.CL," Semantic roles play an important role in extracting knowledge from text.
-Current unsupervised approaches utilize features from grammar structures to
-induce semantic roles. The dependence on these grammars, however, makes it
-difficult to adapt to noisy and new languages. In this paper we develop a
-data-driven approach to identifying semantic roles; the approach is entirely
-unsupervised up to the point where rules need to be learned to identify the
-position in which the semantic role occurs. Specifically, we develop a
-modified-ADIOS algorithm based on ADIOS (Solan et al., 2005) to learn grammar
-structures, and use these grammar structures to learn the rules for identifying
-the semantic roles based on the context in which the grammar structures
-appeared. The results obtained are comparable with current state-of-the-art
-models that are inherently dependent on human-annotated data.
-"
-3040,1606.06352,"Abram Handler, Su Lin Blodgett, Brendan O'Connor",Visualizing textual models with in-text and word-as-pixel highlighting,stat.ML cs.CL cs.LG," We explore two techniques which use color to make sense of statistical text
-models. One method uses in-text annotations to illustrate a model's view of
-particular tokens in particular documents. Another uses a high-level,
-""words-as-pixels"" graphic to display an entire corpus. Together, these methods
-offer both zoomed-in and zoomed-out perspectives into a model's understanding
-of text. We show how these interconnected methods help diagnose a classifier's
-poor performance on Twitter slang, and make sense of a topic model on
-historical political texts.
-"
-3041,1606.06361,Abulhair Saparov,A Probabilistic Generative Grammar for Semantic Parsing,cs.CL cs.LG stat.ML," Domain-general semantic parsing is a long-standing goal in natural language
-processing, where the semantic parser is capable of robustly parsing sentences
-from domains outside of which it was trained. Current approaches largely rely
-on additional supervision from new domains in order to generalize to those
-domains. We present a generative model of natural language utterances and
-logical forms and demonstrate its application to semantic parsing. Our approach
-relies on domain-independent supervision to generalize to new domains. We
-derive and implement efficient algorithms for training, parsing, and sentence
-generation.
The work relies on a novel application of hierarchical Dirichlet
-processes (HDPs) for structured prediction, which we also present in this
-manuscript.
- This manuscript is an excerpt of chapter 4 from the Ph.D. thesis of Saparov
-(2022), where the model plays a central role in a larger natural language
-understanding system.
- This manuscript provides a new, simplified, and more complete presentation of
-the work first introduced in Saparov, Saraswat, and Mitchell (2017). The
-description and proofs of correctness of the training algorithm, parsing
-algorithm, and sentence generation algorithm are much simplified in this new
-presentation. We also describe the novel application of hierarchical Dirichlet
-processes for structured prediction. In addition, we extend the earlier work
-with a new model of word morphology, which utilizes the comprehensive
-morphological data from Wiktionary.
-"
-3042,1606.06368,"Fereshte Khani, Martin Rinard, Percy Liang","Unanimous Prediction for 100% Precision with Application to Learning
- Semantic Mappings",cs.LG cs.AI cs.CL," Can we train a system that, on any new input, either says ""don't know"" or
-makes a prediction that is guaranteed to be correct? We answer the question in
-the affirmative provided our model family is well-specified. Specifically, we
-introduce the unanimity principle: only predict when all models consistent with
-the training data predict the same output. We operationalize this principle for
-semantic parsing, the task of mapping utterances to logical forms. We develop a
-simple, efficient method that reasons over the infinite set of all consistent
-models by only checking two of the models. We prove that our method obtains
-100% precision even with a modest amount of training data from a possibly
-adversarial distribution. Empirically, we demonstrate the effectiveness of our
-approach on the standard GeoQuery dataset.
-"
-3043,1606.06406,James Cross and Liang Huang,Incremental Parsing with Minimal Features Using Bi-Directional LSTM,cs.CL," Recently, neural network approaches for parsing have largely automated the
-combination of individual features, but still rely on (often a larger number
-of) atomic features created from human linguistic intuition, potentially
-omitting important global context. To further reduce feature engineering to the
-bare minimum, we use bi-directional LSTM sentence representations to model a
-parser state with only three sentence positions, which automatically identifies
-important aspects of the entire sentence. This model achieves state-of-the-art
-results among greedy dependency parsers for English. We also introduce a novel
-transition system for constituency parsing which does not require binarization,
-and together with the above architecture, achieves state-of-the-art results
-among greedy parsers for both English and Chinese.
-"
-3044,1606.06424,"Tanmay Basu, Shraman Kumar, Abhishek Kalyan, Priyanka Jayaswal, Pawan
- Goyal, Stephen Pettifer and Siddhartha R. Jonnalagadda","A Novel Framework to Expedite Systematic Reviews by Automatically
- Building Information Extraction Training Corpora",cs.IR cs.CL cs.LG," A systematic review identifies and collates various clinical studies and
-compares data elements and results in order to provide an evidence-based answer
-for a particular clinical question. The process is manual and involves a lot of
-time. A tool to automate this process is lacking.
The aim of this work is to
-develop a framework using natural language processing and machine learning to
-build information extraction algorithms to identify data elements in a new
-primary publication, without having to go through the expensive task of manual
-annotation to build gold standards for each data element type. The system is
-developed in two stages. Initially, it uses information contained in existing
-systematic reviews to identify the sentences from the PDF files of the included
-references that contain specific data elements of interest using a modified
-Jaccard similarity measure. These sentences have been treated as labeled data.
-A Support Vector Machine (SVM) classifier is trained on this labeled data to
-extract data elements of interest from a new article. We conducted experiments
-on Cochrane Database systematic reviews related to congestive heart failure
-using inclusion criteria as an example data element. The empirical results show
-that the proposed system automatically identifies sentences containing the data
-element of interest with a high recall (93.75%) and reasonable precision
-(27.05% - which means the reviewers have to read only 3.7 sentences on
-average). The empirical results suggest that the tool is retrieving valuable
-information from the reference articles, even when it is time-consuming to
-identify them manually. Thus we hope that the tool will be useful for automatic
-data extraction from biomedical research publications. The future scope of this
-work is to generalize this information framework for all types of systematic
-reviews.
-"
-3045,1606.06461,"Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson",Neighborhood Mixture Model for Knowledge Base Completion,cs.CL cs.AI," Knowledge bases are useful resources for many natural language processing
-tasks, however, they are far from complete. In this paper, we define a novel
-entity representation as a mixture of its neighborhood in the knowledge base
-and apply this technique on TransE-a well-known embedding model for knowledge
-base completion. Experimental results show that the neighborhood information
-significantly helps to improve the results of the TransE model, leading to
-better performance than obtained by other state-of-the-art embedding models on
-three benchmark datasets for triple classification, entity prediction and
-relation prediction tasks.
-"
-3046,1606.06622,"Arijit Ray, Gordon Christie, Mohit Bansal, Dhruv Batra, Devi Parikh","Question Relevance in VQA: Identifying Non-Visual And False-Premise
- Questions",cs.CV cs.CL cs.LG," Visual Question Answering (VQA) is the task of answering natural-language
-questions about images. We introduce the novel problem of determining the
-relevance of questions to images in VQA. Current VQA models do not reason about
-whether a question is even related to the given image (e.g. What is the capital
-of Argentina?) or if it requires information from external resources to answer
-correctly. This can break the continuity of a dialogue in human-machine
-interaction. Our approaches for determining relevance are composed of two
-stages. Given an image and a question, (1) we first determine whether the
-question is visual or not, (2) if visual, we determine whether the question is
-relevant to the given image or not. Our approaches, based on LSTM-RNNs, VQA
-model uncertainty, and caption-question similarity, are able to outperform
-strong baselines on both relevance tasks.
We also present human studies showing
-that VQA models augmented with such question relevance reasoning are perceived
-as more intelligent, reasonable, and human-like.
-"
-3047,1606.06623,Georgios Balikas and Massih-Reza Amini,"An empirical study on large scale text classification with skip-gram - embeddings",cs.CL cs.IR," We investigate the integration of word embeddings as classification features
-in the setting of large scale text classification. Such representations have
-been used in a plethora of tasks; however, their application in classification
-scenarios with thousands of classes has not been extensively researched,
-partially due to hardware limitations. In this work, we examine efficient
-composition functions to obtain document-level representations from word-level
-embeddings and we subsequently investigate their combination with the
-traditional one-hot-encoding representations. By presenting empirical evidence
-on large, multi-class, multi-label classification problems, we demonstrate the
-efficiency and the performance benefits of this combination.
-"
-3048,1606.06640,"Georg Heigold, Guenter Neumann, Josef van Genabith","Neural Morphological Tagging from Characters for Morphologically Rich - Languages",cs.CL," This paper investigates neural character-based morphological tagging for
-languages with complex morphology and large tag sets. We systematically explore
-a variety of neural architectures (DNN, CNN, CNNHighway, LSTM, BLSTM) to obtain
-character-based word vectors combined with bidirectional LSTMs to model
-across-word context in an end-to-end setting. We explore supplementary use of
-word-based vectors trained on large amounts of unlabeled data. Our experiments
-for morphological tagging suggest that for ""simple"" model configurations, the
-choice of the network architecture (CNN vs. CNNHighway vs. LSTM vs. BLSTM) or
-the augmentation with pre-trained word embeddings can be important and clearly
-impact the accuracy. Increasing the model capacity by adding depth, for
-example, and carefully optimizing the neural networks can lead to substantial
-improvements, and the differences in accuracy (but not training time) become
-much smaller or even negligible. Overall, our best morphological taggers for
-German and Czech outperform the best results reported in the literature by a
-large margin.
-"
-3049,1606.06710,"Yulia Tsvetkov, Manaal Faruqui, Chris Dyer",Correlation-based Intrinsic Evaluation of Word Vector Representations,cs.CL," We introduce QVEC-CCA--an intrinsic evaluation metric for word vector
-representations based on correlations of learned vectors with features
-extracted from linguistic resources. We show that QVEC-CCA scores are an
-effective proxy for a range of extrinsic semantic and syntactic tasks. We also
-show that the proposed evaluation obtains higher and more consistent
-correlations with downstream tasks, compared to existing approaches to
-intrinsic evaluation of word vectors that are based on word similarity.
-"
-3050,1606.06737,"Henry W. Lin (Harvard), Max Tegmark (MIT)",Criticality in Formal Languages and Statistical Physics,cond-mat.dis-nn cs.CL," We show that the mutual information between two symbols, as a function of the
-number of symbols between the two, decays exponentially in any probabilistic
-regular grammar, but can decay like a power law for a context-free grammar.
-This result about formal languages is closely related to a well-known result in
-classical statistical mechanics that there are no phase transitions in
-dimensions fewer than two.
It is also related to the emergence of power-law
-correlations in turbulence and cosmological inflation through recursive
-generative processes. We elucidate these physics connections and comment on
-potential applications of our results to machine learning tasks like training
-artificial recurrent neural networks. Along the way, we introduce a useful
-quantity which we dub the rational mutual information and discuss
-generalizations of our claims involving more complicated Bayesian networks.
-"
-3051,1606.06820,"Ryan J. Gallagher, Andrew J. Reagan, Christopher M. Danforth, Peter - Sheridan Dodds","Divergent discourse between protests and counter-protests: - #BlackLivesMatter and #AllLivesMatter",cs.CL cs.CY cs.SI," Since the shooting of Black teenager Michael Brown by White police officer
-Darren Wilson in Ferguson, Missouri, the protest hashtag #BlackLivesMatter has
-amplified critiques of extrajudicial killings of Black Americans. In response
-to #BlackLivesMatter, other Twitter users have adopted #AllLivesMatter, a
-counter-protest hashtag whose content argues that equal attention should be
-given to all lives regardless of race. Through a multi-level analysis of over
-860,000 tweets, we study how these protests and counter-protests diverge by
-quantifying aspects of their discourse. We find that #AllLivesMatter
-facilitates opposition between #BlackLivesMatter and hashtags such as
-#PoliceLivesMatter and #BlueLivesMatter in such a way that historically echoes
-the tension between Black protesters and law enforcement. In addition, we show
-that a significant portion of #AllLivesMatter use stems from hijacking by
-#BlackLivesMatter advocates. Beyond simply injecting #AllLivesMatter with
-#BlackLivesMatter content, these hijackers use the hashtag to directly confront
-the counter-protest notion of ""All lives matter."" Our findings suggest that
-the Black Lives Matter movement was able to grow, exhibit diverse conversations,
-and avoid derailment on social media by making discussion of counter-protest
-opinions a central topic of #AllLivesMatter, rather than the movement itself.
-"
-3052,1606.06864,"Stefan Braun, Daniel Neil, Shih-Chii Liu","A Curriculum Learning Method for Improved Noise Robustness in Automatic - Speech Recognition",cs.CL cs.LG cs.SD," The performance of automatic speech recognition systems under noisy
-environments still leaves room for improvement. Speech enhancement or feature
-enhancement techniques for increasing noise robustness of these systems usually
-add components to the recognition system that need careful optimization. In
-this work, we propose the use of a relatively simple curriculum training
-strategy called accordion annealing (ACCAN). It uses a multi-stage training
-schedule where samples at signal-to-noise ratio (SNR) values as low as 0dB are
-first added and samples at increasingly higher SNR values are gradually added
-up to an SNR value of 50dB. We also use a method called per-epoch noise mixing
-(PEM) that generates noisy training samples online during training and thus
-enables dynamically changing the SNR of our training data. Both the ACCAN and
-the PEM methods are evaluated on an end-to-end speech recognition pipeline on
-the Wall Street Journal corpus. ACCAN decreases the average word error rate
-(WER) on the 20dB to -10dB SNR range by up to 31.4% when compared to a
-conventional multi-condition training method.
-" -3053,1606.06871,"Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schl\""uter, - Hermann Ney","A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic - Modeling in Speech Recognition",cs.NE cs.CL cs.LG cs.SD," We present a comprehensive study of deep bidirectional long short-term memory -(LSTM) recurrent neural network (RNN) based acoustic models for automatic -speech recognition (ASR). We study the effect of size and depth and train -models of up to 8 layers. We investigate the training aspect and study -different variants of optimization methods, batching, truncated -backpropagation, different regularization techniques such as dropout and $L_2$ -regularization, and different gradient clipping variants. - The major part of the experimental analysis was performed on the Quaero -corpus. Additional experiments also were performed on the Switchboard corpus. -Our best LSTM model has a relative improvement in word error rate of over 14\% -compared to our best feed-forward neural network (FFNN) baseline on the Quaero -task. On this task, we get our best result with an 8 layer bidirectional LSTM -and we show that a pretraining scheme with layer-wise construction helps for -deep LSTMs. - Finally we compare the training calculation time of many of the presented -experiments in relation with recognition performance. - All the experiments were done with RETURNN, the RWTH extensible training -framework for universal recurrent neural networks in combination with RASR, the -RWTH ASR toolkit. -" -3054,1606.06900,Panupong Pasupat and Percy Liang,Inferring Logical Forms From Denotations,cs.CL cs.AI," A core problem in learning semantic parsers from denotations is picking out -consistent logical forms--those that yield the correct denotation--from a -combinatorially large space. To control the search space, previous work relied -on restricted set of rules, which limits expressivity. In this paper, we -consider a much more expressive class of logical forms, and show how to use -dynamic programming to efficiently represent the complete set of consistent -logical forms. Expressivity also introduces many more spurious logical forms -which are consistent with the correct denotation but do not represent the -meaning of the utterance. To address this, we generate fictitious worlds and -use crowdsourced denotations on these worlds to filter out spurious logical -forms. On the WikiTableQuestions dataset, we increase the coverage of -answerable questions from 53.5% to 76%, and the additional crowdsourced -supervision lets us rule out 92.1% of spurious logical forms. -" -3055,1606.06905,"Ying Wen, Weinan Zhang, Rui Luo, Jun Wang","Learning text representation using recurrent convolutional neural - network with highway layers",cs.CL cs.IR," Recently, the rapid development of word embedding and neural networks has -brought new inspiration to various NLP and IR tasks. In this paper, we describe -a staged hybrid model combining Recurrent Convolutional Neural Networks (RCNN) -with highway layers. The highway network module is incorporated in the middle -takes the output of the bi-directional Recurrent Neural Network (Bi-RNN) module -in the first stage and provides the Convolutional Neural Network (CNN) module -in the last stage with the input. The experiment shows that our model -outperforms common neural network models (CNN, RNN, Bi-RNN) on a sentiment -analysis task. 
In addition, an analysis of how sequence length influences the RCNN
-with highway layers shows that our model can learn good representations for
-long texts.
-"
-3056,1606.06950,"Herman Kamper, Aren Jansen, Sharon Goldwater","A segmental framework for fully-unsupervised large-vocabulary speech - recognition",cs.CL cs.LG," Zero-resource speech technology is a growing research area that aims to
-develop methods for speech processing in the absence of transcriptions,
-lexicons, or language modelling text. Early term discovery systems focused on
-identifying isolated recurring patterns in a corpus, while more recent
-full-coverage systems attempt to completely segment and cluster the audio into
-word-like units---effectively performing unsupervised speech recognition. This
-article presents the first attempt we are aware of to apply such a system to
-large-vocabulary multi-speaker data. Our system uses a Bayesian modelling
-framework with segmental word representations: each word segment is represented
-as a fixed-dimensional acoustic embedding obtained by mapping the sequence of
-feature frames to a single embedding vector. We compare our system on English
-and Xitsonga datasets to state-of-the-art baselines, using a variety of
-measures including word error rate (obtained by mapping the unsupervised output
-to ground truth transcriptions). Very high word error rates are reported---in
-the order of 70--80% for speaker-dependent and 80--95% for speaker-independent
-systems---highlighting the difficulty of this task. Nevertheless, in terms of
-cluster quality and word segmentation metrics, we show that by imposing a
-consistent top-down segmentation while also using bottom-up knowledge from
-detected syllable boundaries, both single-speaker and multi-speaker versions of
-our system outperform a purely bottom-up single-speaker syllable-based
-approach. We also show that the discovered clusters can be made less speaker-
-and gender-specific by using an unsupervised autoencoder-like feature extractor
-to learn better frame-level features (prior to embedding). Our system's
-discovered clusters are still less pure than those of unsupervised term
-discovery systems, but provide far greater coverage.
-"
-3057,1606.06991,Nawal Ould-Amer and Philippe Mulhem and Mathias Gery,Toward Word Embedding for Personalized Information Retrieval,cs.IR cs.CL," This paper presents preliminary work on using Word Embedding (word2vec) for
-query expansion in the context of Personalized Information Retrieval.
-Traditionally, word embeddings are learned on a general corpus, like Wikipedia.
-In this work we try to personalize the word embeddings learning, by performing
-the learning on the user's profile. The word embeddings are then in the same
-context as the user's interests. Our proposal is evaluated on the CLEF Social
-Book Search 2016 collection. The results obtained show that further work is
-needed on how to apply Word Embedding in the context of Personalized
-Information Retrieval.
-"
-3058,1606.06996,Christian Bentz and Dimitrios Alikaniotis,The word entropy of natural languages,cs.CL," The average uncertainty associated with words is an information-theoretic
-concept at the heart of quantitative and computational linguistics. The entropy
-has been established as a measure of this average uncertainty - also called
-average information content. We here use parallel texts of 21 languages to
-establish the number of tokens at which word entropies converge to stable
-values.
These convergence points are then used to select texts from a massively
-parallel corpus, and to estimate word entropies across more than 1000
-languages. Our results help to establish quantitative language comparisons, to
-understand the performance of multilingual translation systems, and to
-normalize semantic similarity measures.
-"
-3059,1606.07006,"Xiao Yang, Craig Macdonald, Iadh Ounis",Using Word Embeddings in Twitter Election Classification,cs.IR cs.CL," Word embeddings and convolutional neural networks (CNN) have attracted
-extensive attention in various classification tasks for Twitter, e.g. sentiment
-classification. However, the effect of the configuration used to train and
-generate the word embeddings on the classification performance has not been
-studied in the existing literature. In this paper, using a Twitter election
-classification task that aims to detect election-related tweets, we investigate
-the impact of the background dataset used to train the embedding models, the
-context window size and the dimensionality of word embeddings on the
-classification performance. By comparing the classification results of two word
-embedding models, which are trained using different background corpora (e.g.
-Wikipedia articles and Twitter microposts), we show that the background data
-type should align with the Twitter classification dataset to achieve a better
-performance. Moreover, by evaluating the results of word embedding models
-trained using various context window sizes and dimensionalities, we found that
-large context window and dimension sizes are preferable for improving the
-performance. Our experimental results also show that using word embeddings and
-CNN leads to statistically significant improvements over various baselines such
-as random, SVM with TF-IDF and SVM with word embeddings.
-"
-3060,1606.07043,"Kyle Reing, David C. Kale, Greg Ver Steeg, Aram Galstyan","Toward Interpretable Topic Discovery via Anchored Correlation - Explanation",stat.ML cs.CL cs.LG," Many predictive tasks, such as diagnosing a patient based on their medical
-chart, are ultimately defined by the decisions of human experts. Unfortunately,
-encoding experts' knowledge is often time consuming and expensive. We propose a
-simple way to use fuzzy and informal knowledge from experts to guide discovery
-of interpretable latent topics in text. The underlying intuition of our
-approach is that latent factors should be informative about both correlations
-in the data and a set of relevance variables specified by an expert.
-Mathematically, this approach is a combination of the information bottleneck
-and Total Correlation Explanation (CorEx). We give a preliminary evaluation of
-Anchored CorEx, showing that it produces more coherent and interpretable topics
-on two distinct corpora.
-"
-3061,1606.07046,Jayant Krishnamurthy and Oyvind Tafjord and Aniruddha Kembhavi,"Semantic Parsing to Probabilistic Programs for Situated Question - Answering",cs.CL," Situated question answering is the problem of answering questions about an
-environment such as an image or diagram. This problem requires jointly
-interpreting a question and an environment using background knowledge to select
-the correct answer. We present Parsing to Probabilistic Programs (P3), a novel
-situated question answering model that can use background knowledge and global
-features of the question/environment interpretation while retaining efficient
-approximate inference.
Our key insight is to treat semantic parses as
-probabilistic programs that execute nondeterministically and whose possible
-executions represent environmental uncertainty. We evaluate our approach on a
-new, publicly-released data set of 5000 science diagram questions,
-outperforming several competitive classical and neural baselines.
-"
-3062,1606.07056,"Abhay Prakash, Chris Brockett, Puneet Agrawal","Emulating Human Conversations using Convolutional Neural Network-based - IR",cs.AI cs.CL cs.IR," Conversational agents (""bots"") are beginning to be widely used in
-conversational interfaces. To design a system that is capable of emulating
-human-like interactions, a conversational layer that can serve as a fabric for
-chat-like interaction with the agent is needed. In this paper, we introduce a
-model that employs Information Retrieval by utilizing convolutional deep
-structured semantic neural network-based features in the ranker to present
-human-like responses in ongoing conversation with a user. In conversations,
-accounting for context is critical to the retrieval model; we show that our
-context-sensitive approach using a Convolutional Deep Structured Semantic Model
-(cDSSM) with character trigrams significantly outperforms several conventional
-baselines in terms of the relevance of responses retrieved.
-"
-3063,1606.07103,"Sai Praneeth Suggu, Kushwanth N. Goutham, Manoj K. Chinnakotla and - Manish Shrivastava","Deep Feature Fusion Network for Answer Quality Prediction in Community - Question Answering",cs.IR cs.CL," Community Question Answering (cQA) forums have become a popular medium for
-soliciting direct answers to specific questions of users from experts or other
-experienced users on a given topic. However, for a given question, users
-sometimes have to sift through a large number of low-quality or irrelevant
-answers to find out the answer which satisfies their information need. To
-alleviate this, the problem of Answer Quality Prediction (AQP) aims to predict
-the quality of an answer posted in response to a forum question. Current AQP
-systems either a) learn models using various hand-crafted features (HCF) or
-b) use deep learning (DL) techniques which automatically learn the required
-feature representations.
- In this paper, we propose a novel approach for AQP known as ""Deep Feature
-Fusion Network (DFFN)"" which leverages the advantages of both hand-crafted
-features and deep learning based systems. Given a question-answer pair along
-with its metadata, DFFN independently a) learns deep features using a
-Convolutional Neural Network (CNN) and b) computes hand-crafted features using
-various external resources, and then combines them using a deep neural network
-trained to predict the final answer quality. DFFN achieves state-of-the-art
-performance on the standard SemEval-2015 and SemEval-2016 benchmark datasets
-and outperforms baseline approaches which individually employ either HCF or DL
-based techniques alone.
-"
-3064,1606.07137,Abeed Sarker,"Automated Extraction of Number of Subjects in Randomised Controlled - Trials",cs.AI cs.CL cs.IR," We present a simple approach for automatically extracting the number of
-subjects involved in randomised controlled trials (RCT). Our approach first
-applies a set of rule-based techniques to extract candidate study sizes from
-the abstracts of the articles. Supervised classification is then performed over
-the candidates with support vector machines, using a small set of lexical,
-structural, and contextual features.
With only a small annotated training set
-of 201 RCTs, we obtained an accuracy of 88\%. We believe that this system will
-aid complex medical text processing tasks such as summarisation and question
-answering.
-"
-3065,1606.07189,"Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan - Bhamidipati, Ananth Nagarajan",Gender and Interest Targeting for Sponsored Post Advertising at Tumblr,cs.CL cs.CY cs.SI," As one of the leading platforms for creative content, Tumblr offers
-advertisers a unique way of creating brand identity. Advertisers can tell their
-story through images, animation, text, music, video, and more, and promote that
-content by sponsoring it to appear as an advertisement in the streams of Tumblr
-users. In this paper we present a framework that enabled one of the key
-targeted advertising components for Tumblr, specifically gender and interest
-targeting. We describe the main challenges involved in development of the
-framework, which include creating the ground truth for training gender
-prediction models, as well as mapping Tumblr content to an interest taxonomy.
-For purposes of inferring user interests we propose a novel semi-supervised
-neural language model for categorization of Tumblr content (i.e., post tags and
-post keywords). The model was trained on a large-scale data set consisting of
-6.8 billion user posts, with a very limited amount of categorized keywords, and
-was shown to have superior performance over the bag-of-words model. We
-successfully deployed gender and interest targeting capability in Yahoo
-production systems, delivering inference for users that cover more than 90% of
-daily activities at Tumblr. Online performance results indicate advantages of
-the proposed approach, where we observed 20% lift in user engagement with
-sponsored posts as compared to untargeted campaigns.
-"
-3066,1606.07211,"Gia-Hung Nguyen, Lynda Tamine, Laure Soulier, Nathalie Bricon-Souf",Toward a Deep Neural Approach for Knowledge-Based IR,cs.IR cs.CL," This paper tackles the problem of the semantic gap between a document and a
-query within an ad-hoc information retrieval task. In this context, knowledge
-bases (KBs) have already been acknowledged as valuable means since they allow
-the representation of explicit relations between entities. However, they do not
-necessarily represent implicit relations that could be hidden in a corpus.
-This latter issue is tackled by recent works dealing with deep representation
-learning of texts. With this in mind, we argue that embedding KBs within deep
-neural architectures supporting document-query matching would give rise to
-fine-grained latent representations of both words and their semantic relations.
-In this paper, we review the main approaches of neural-based document ranking
-as well as those approaches for latent representation of entities and relations
-via KBs. We then propose some avenues to incorporate KBs in deep neural
-approaches for document ranking. More specifically, this paper advocates that
-KBs can be used either to support enhanced latent representations of queries
-and documents based on both distributional and relational semantics or to serve
-as a semantic translator between their latent distributional representations.
-" -3067,1606.07287,"Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro - Moreo Fern\'andez","Picture It In Your Mind: Generating High Level Visual Representations - From Textual Descriptions",cs.IR cs.CL cs.CV cs.NE," In this paper we tackle the problem of image search when the query is a short -textual description of the image the user is looking for. We choose to -implement the actual search process as a similarity search in a visual feature -space, by learning to translate a textual query into a visual representation. -Searching in the visual feature space has the advantage that any update to the -translation model does not require to reprocess the, typically huge, image -collection on which the search is performed. We propose Text2Vis, a neural -network that generates a visual representation, in the visual feature space of -the fc6-fc7 layers of ImageNet, from a short descriptive text. Text2Vis -optimizes two loss functions, using a stochastic loss-selection method. A -visual-focused loss is aimed at learning the actual text-to-visual feature -mapping, while a text-focused loss is aimed at modeling the higher-level -semantic concepts expressed in language and countering the overfit on -non-relevant visual components of the visual loss. We report preliminary -results on the MS-COCO dataset. -" -3068,1606.07298,"Leila Arras and Franziska Horn and Gr\'egoire Montavon and - Klaus-Robert M\""uller and Wojciech Samek",Explaining Predictions of Non-Linear Classifiers in NLP,cs.CL cs.IR cs.LG cs.NE stat.ML," Layer-wise relevance propagation (LRP) is a recently proposed technique for -explaining predictions of complex non-linear classifiers in terms of input -variables. In this paper, we apply LRP for the first time to natural language -processing (NLP). More precisely, we use it to explain the predictions of a -convolutional neural network (CNN) trained on a topic categorization task. Our -analysis highlights which words are relevant for a specific prediction of the -CNN. We compare our technique to standard sensitivity analysis, both -qualitatively and quantitatively, using a ""word deleting"" perturbation -experiment, a PCA analysis, and various visualizations. All experiments -validate the suitability of LRP for explaining the CNN predictions, which is -also in line with results reported in recent image classification studies. -" -3069,1606.07356,"Aishwarya Agrawal, Dhruv Batra, Devi Parikh",Analyzing the Behavior of Visual Question Answering Models,cs.CL cs.AI cs.CV cs.LG," Recently, a number of deep-learning based models have been proposed for the -task of Visual Question Answering (VQA). The performance of most models is -clustered around 60-70%. In this paper we propose systematic methods to analyze -the behavior of these models as a first step towards recognizing their -strengths and weaknesses, and identifying the most fruitful directions for -progress. We analyze two models, one each from two major classes of VQA models --- with-attention and without-attention and show the similarities and -differences in the behavior of these models. We also analyze the winning entry -of the VQA Challenge 2016. - Our behavior analysis reveals that despite recent progress, today's VQA -models are ""myopic"" (tend to fail on sufficiently novel instances), often ""jump -to conclusions"" (converge on a predicted answer after 'listening' to just half -the question), and are ""stubborn"" (do not change their answers across images). 
-" -3070,1606.07461,"Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. - Rush","LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in - Recurrent Neural Networks",cs.CL cs.AI cs.NE," Recurrent neural networks, and in particular long short-term memory (LSTM) -networks, are a remarkably effective tool for sequence modeling that learn a -dense black-box hidden representation of their sequential input. Researchers -interested in better understanding these models have studied the changes in -hidden state representations over time and noticed some interpretable patterns -but also significant noise. In this work, we present LSTMVIS, a visual analysis -tool for recurrent neural networks with a focus on understanding these hidden -state dynamics. The tool allows users to select a hypothesis input range to -focus on local state changes, to match these states changes to similar patterns -in a large data set, and to align these results with structural annotations -from their domain. We show several use cases of the tool for analyzing specific -hidden state properties on dataset containing nesting, phrase structure, and -chord progressions, and demonstrate how the tool can be used to isolate -patterns for further statistical analysis. We characterize the domain, the -different stakeholders, and their goals and tasks. -" -3071,1606.07470,"Babak Damavandi, Shankar Kumar, Noam Shazeer and Antoine Bruguier","NN-grams: Unifying neural network and n-gram language models for Speech - Recognition",cs.CL stat.ML," We present NN-grams, a novel, hybrid language model integrating n-grams and -neural networks (NN) for speech recognition. The model takes as input both word -histories as well as n-gram counts. Thus, it combines the memorization capacity -and scalability of an n-gram model with the generalization ability of neural -networks. We report experiments where the model is trained on 26B words. -NN-grams are efficient at run-time since they do not include an output soft-max -layer. The model is trained using noise contrastive estimation (NCE), an -approach that transforms the estimation problem of neural networks into one of -binary classification between data samples and noise samples. We present -results with noise samples derived from either an n-gram distribution or from -speech recognition lattices. NN-grams outperforms an n-gram model on an Italian -speech recognition dictation task. -" -3072,1606.07481,"Jind\v{r}ich Libovick\'y, Jind\v{r}ich Helcl, Marek Tlust\'y, Pavel - Pecina, Ond\v{r}ej Bojar","CUNI System for WMT16 Automatic Post-Editing and Multimodal Translation - Tasks",cs.CL," Neural sequence to sequence learning recently became a very promising -paradigm in machine translation, achieving competitive results with statistical -phrase-based systems. In this system description paper, we attempt to utilize -several recently published methods used for neural sequential learning in order -to build systems for WMT 2016 shared tasks of Automatic Post-Editing and -Multimodal Machine Translation. -" -3073,1606.07493,"Harsh Agrawal, Arjun Chandrasekaran, Dhruv Batra, Devi Parikh, Mohit - Bansal",Sort Story: Sorting Jumbled Images and Captions into Stories,cs.CL cs.AI cs.CV cs.LG," Temporal common sense has applications in AI tasks such as QA, multi-document -summarization, and human-AI communication. 
We propose the task of sequencing --
-given a jumbled set of aligned image-caption pairs that belong to a story, the
-task is to sort them such that the output sequence forms a coherent story. We
-present multiple approaches, via unary (position) and pairwise (order)
-predictions, and their ensemble-based combinations, achieving strong results on
-this task. We use both text-based and image-based features, which provide
-complementary improvements. Using qualitative examples, we demonstrate that our
-models have learnt interesting aspects of temporal common sense.
-"
-3074,1606.07496,"Roberto Camacho Barranco (1), Laura M. Rodriguez (1), Rebecca Urbina - (1), and M. Shahriar Hossain (1) ((1) The University of Texas at El Paso)",Is a Picture Worth Ten Thousand Words in a Review Dataset?,cs.CV cs.CL cs.IR cs.LG cs.NE," While textual reviews have become prominent in many recommendation-based
-systems, automated frameworks to provide relevant visual cues against text
-reviews where pictures are not available are a new form of task confronted by
-data mining and machine learning researchers. Suggestions of pictures that are
-relevant to the content of a review could significantly benefit the users by
-increasing the effectiveness of a review. We propose a deep learning-based
-framework to automatically: (1) tag the images available in a review dataset,
-(2) generate a caption for each image that does not have one, and (3) enhance
-each review by recommending relevant images that might not be uploaded by the
-corresponding reviewer. We evaluate the proposed framework using the Yelp
-Challenge Dataset. While a subset of the images in this particular dataset are
-correctly captioned, the majority of the pictures do not have any associated
-text. Moreover, there is no mapping between reviews and images. Each image has
-a corresponding business-tag where the picture was taken, though. The overall
-data setting and unavailability of crucial pieces required for a mapping make
-the problem of recommending images for reviews a major challenge. Qualitative
-and quantitative evaluations indicate that our proposed framework provides high
-quality enhancements through automatic captioning, tagging, and recommendation
-for mapping reviews and images.
-"
-3075,1606.07545,"Camille Jandot, Patrice Simard, Max Chickering, David Grangier, Jina - Suh",Interactive Semantic Featuring for Text Classification,cs.CL stat.ML," In text classification, dictionaries can be used to define
-human-comprehensible features. We propose an improvement to dictionary features
-called smoothed dictionary features. These features recognize document contexts
-instead of n-grams. We describe a principled methodology to solicit dictionary
-features from a teacher, and present results showing that models built using
-these human-comprehensible features are competitive with models trained with
-Bag of Words features.
-"
-3076,1606.07548,"Lu Wang and Hema Raghavan and Vittorio Castelli and Radu Florian and - Claire Cardie","A Sentence Compression Based Framework to Query-Focused Multi-Document - Summarization",cs.CL," We consider the problem of using sentence compression techniques to
-facilitate query-focused multi-document summarization. We present a
-sentence-compression-based framework for the task, and design a series of
-learning-based compression models built on parse trees. An innovative beam
-search decoder is proposed to efficiently find highly probable compressions.
-Under this framework, we show how to integrate various indicative metrics such
-as linguistic motivation and query relevance into the compression process by
-deriving a novel formulation of a compression scoring function. Our best model
-achieves statistically significant improvement over the state-of-the-art
-systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2
-respectively) for the DUC 2006 and 2007 summarization tasks.
-"
-3077,1606.07565,"Daniel Cohen, Qingyao Ai, W. Bruce Croft",Adaptability of Neural Networks on Varying Granularity IR Tasks,cs.IR cs.CL," Recent work in Information Retrieval (IR) using Deep Learning models has
-yielded state-of-the-art results on a variety of IR tasks. Deep neural networks
-(DNN) are capable of learning ideal representations of data during the training
-process, removing the need for independently extracting features. However, the
-structures of these DNNs are often tailored to perform on specific datasets. In
-addition, IR tasks deal with text at varying levels of granularity from single
-factoids to documents containing thousands of words. In this paper, we examine
-the role of granularity on the performance of common state-of-the-art DNN
-structures in IR.
-"
-3078,1606.07572,Subhashree S and P Sreenivasa Kumar,Enriching Linked Datasets with New Object Properties,cs.DB cs.AI cs.CL," Although several RDF knowledge bases are available through the LOD
-initiative, the ontology schema of such linked datasets is not very rich. In
-particular, they lack object properties. The problem of finding new object
-properties (and their instances) between any two given classes has not been
-investigated in detail in the context of Linked Data. In this paper, we present
-DART (Detecting Arbitrary Relations for enriching T-Boxes of Linked Data) - an
-unsupervised solution to enrich the LOD cloud with new object properties
-between two given classes. DART exploits contextual similarity to identify text
-patterns from the web corpus that can potentially represent relations between
-individuals. These text patterns are then clustered by means of paraphrase
-detection to capture the object properties between the two given LOD classes.
-DART also performs fully automated mapping of the discovered relations to the
-properties in the linked dataset. This serves many purposes such as
-identification of completely new relations, elimination of irrelevant
-relations, and generation of prospective property axioms. We have empirically
-evaluated our approach on several pairs of classes and found that the system
-can indeed be used for enriching the linked datasets with new object properties
-and their instances. We compared DART with the newOntExt system, which is an
-offshoot of the NELL (Never-Ending Language Learning) effort. Our experiments
-reveal that DART gives better results than newOntExt with respect to both the
-correctness and the number of relations.
-"
-3079,1606.07601,KeBin Peng,Evaluation method of word embedding by roots and affixes,cs.CL," Word embedding has been shown to be remarkably effective in a lot of Natural
-Language Processing tasks. However, existing models still have a couple of
-limitations in interpreting the dimensions of word vectors. In this paper, we
-provide a new approach---roots and affixes model (RAAM)---to interpret them from
-the intrinsic structures of natural language. It can also be used as an
-evaluation measure of the quality of word embedding.
We introduce
-information entropy into our model and divide the dimensions into two
-categories, just like roots and affixes in lexical semantics. We then consider
-each category as a whole rather than individually. We experimented with the
-English Wikipedia corpus. Our results show that there is a negative linear
-relation between the two attributes and a high positive correlation between our
-model and downstream semantic evaluation tasks.
-"
-3080,1606.07711,Rocco Tripodi and Marcello Pelillo,A Game-Theoretic Approach to Word Sense Disambiguation,cs.AI cs.CL cs.GT," This paper presents a new model for word sense disambiguation formulated in
-terms of evolutionary game theory, where each word to be disambiguated is
-represented as a node on a graph whose edges represent word relations and
-senses are represented as classes. The words simultaneously update their class
-membership preferences according to the senses that neighboring words are
-likely to choose. We use distributional information to weigh the influence that
-each word has on the decisions of the others and semantic similarity
-information to measure the strength of compatibility among the choices. With
-this information we can formulate the word sense disambiguation problem as a
-constraint satisfaction problem and solve it using tools derived from game
-theory, maintaining the textual coherence. The model is based on two ideas:
-similar words should be assigned to similar classes and the meaning of a word
-does not depend on all the words in a text but just on some of them. The paper
-provides an in-depth motivation of the idea of modeling the word sense
-disambiguation problem in terms of game theory, which is illustrated by an
-example. The conclusion presents an extensive analysis on the combination of
-similarity measures to use in the framework and a comparison with
-state-of-the-art systems. The results show that our model outperforms
-state-of-the-art algorithms and can be applied to different tasks and in
-different scenarios.
-"
-3081,1606.07736,Tal Linzen,Issues in evaluating semantic spaces using word analogies,cs.CL," The offset method for solving word analogies has become a standard evaluation
-tool for vector-space semantic models: it is considered desirable for a space
-to represent semantic relations as consistent vector offsets. We show that the
-method's reliance on cosine similarity conflates offset consistency with
-largely irrelevant neighborhood structure, and propose simple baselines that
-should be used to improve the utility of the method in vector space evaluation.
-"
-3082,1606.07770,"Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond - Mooney, Trevor Darrell, Kate Saenko",Captioning Images with Diverse Objects,cs.CV cs.CL," Recent captioning models are limited in their ability to scale and describe
-concepts unseen in paired image-text corpora. We propose the Novel Object
-Captioner (NOC), a deep visual semantic captioning model that can describe a
-large number of object categories not present in existing image-caption
-datasets. Our model takes advantage of external sources -- labeled images from
-object recognition datasets, and semantic knowledge extracted from unannotated
-text. We propose minimizing a joint objective which can learn from these
-diverse data sources and leverage distributional semantic embeddings, enabling
-the model to generalize and describe novel objects outside of image-caption
-datasets.
We demonstrate that our model exploits semantic information to
-generate captions for hundreds of object categories in the ImageNet object
-recognition dataset that are not observed in MSCOCO image-caption training
-data, as well as many categories that are observed very rarely. Both automatic
-evaluations and human judgements show that our model considerably outperforms
-prior work in being able to describe many more categories of objects.
-"
-3083,1606.07772,"Andrew J. Reagan and Lewis Mitchell and Dilan Kiley and Christopher M. - Danforth and Peter Sheridan Dodds",The emotional arcs of stories are dominated by six basic shapes,cs.CL," Advances in computing power, natural language processing, and digitization of
-text now make it possible to study a culture's evolution through its texts
-using a ""big data"" lens. Our ability to communicate relies in part upon a
-shared emotional experience, with stories often following distinct emotional
-trajectories and forming patterns that are meaningful to us. Here, by
-classifying the emotional arcs for a filtered subset of 1,327 stories from
-Project Gutenberg's fiction collection, we find a set of six core emotional
-arcs which form the essential building blocks of complex emotional
-trajectories. We strengthen our findings by separately applying matrix
-decomposition, supervised learning, and unsupervised learning. For each of
-these six core emotional arcs, we examine the closest characteristic stories in
-publication today and find that particular emotional arcs enjoy greater
-success, as measured by downloads.
-"
-3084,1606.07783,Ngoc Thang Vu,"Sequential Convolutional Neural Networks for Slot Filling in Spoken - Language Understanding",cs.CL," We investigate the usage of convolutional neural networks (CNNs) for the slot
-filling task in spoken language understanding. We propose a novel CNN
-architecture for sequence labeling which takes into account the previous
-context words with preserved order information and pays special attention to
-the current word with its surrounding context. Moreover, it combines the
-information from the past and the future words for classification. Our proposed
-CNN architecture outperforms even the previously best ensembling recurrent
-neural network model and achieves state-of-the-art results with an F1-score of
-95.61% on the ATIS benchmark dataset without using any additional linguistic
-knowledge and resources.
-"
-3085,1606.07822,"Jeroen B.P. Vuurens, Carsten Eickhoff, Arjen P. de Vries",Efficient Parallel Learning of Word2Vec,cs.CL cs.DC," Since its introduction, Word2Vec and its variants have been widely used to learn
-semantics-preserving representations of words or entities in an embedding
-space, which can be used to produce state-of-the-art results for various Natural
-Language Processing tasks. Existing implementations aim to learn efficiently by
-running multiple threads in parallel while operating on a single model in
-shared memory, ignoring incidental memory update collisions. We show that these
-collisions can degrade the efficiency of parallel learning, and propose a
-straightforward caching strategy that improves the efficiency by a factor of 4.
-"
-3086,1606.07829,Lu Wang and Claire Cardie,"Unsupervised Topic Modeling Approaches to Decision Summarization in - Spoken Meetings",cs.CL," We present a token-level decision summarization framework that utilizes the
-latent topic structures of utterances to identify ""summary-worthy"" words.
-Concretely, a series of unsupervised topic models is explored and experimental -results show that fine-grained topic models, which discover topics at the -utterance-level rather than the document-level, can better identify the gist of -the decision-making process. Moreover, our proposed token-level summarization -approach, which is able to remove redundancies within utterances, outperforms -existing utterance ranking based summarization methods. Finally, context -information is also investigated to add additional relevant information to the -summary. -" -3087,1606.07839,"Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, - David Crandall, and Dhruv Batra",Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles,cs.CV cs.CL," Many practical perception systems exist within larger processes that include -interactions with users or additional components capable of evaluating the -quality of predicted solutions. In these contexts, it is beneficial to provide -these oracle mechanisms with multiple highly likely hypotheses rather than a -single prediction. In this work, we pose the task of producing multiple outputs -as a learning problem over an ensemble of deep networks -- introducing a novel -stochastic gradient descent based approach to minimize the loss with respect to -an oracle. Our method is simple to implement, agnostic to both architecture and -loss function, and parameter-free. Our approach achieves lower oracle error -compared to existing methods on a wide range of tasks and deep architectures. -We also show qualitatively that the diverse solutions produced often provide -interpretable representations of task ambiguity. -" -3088,1606.07849,Lu Wang and Claire Cardie,Focused Meeting Summarization via Unsupervised Relation Extraction,cs.CL," We present a novel unsupervised framework for focused meeting summarization -that views the problem as an instance of relation extraction. We adapt an -existing in-domain relation learner (Chen et al., 2011) by exploiting a set of -task-specific constraints and features. We evaluate the approach on a decision -summarization task and show that it outperforms unsupervised utterance-level -extractive summarization baselines as well as an existing generic -relation-extraction-based summarization method. Moreover, our approach produces -summaries competitive with those generated by supervised methods in terms of -the standard ROUGE score. -" -3089,1606.07901,"Yadollah Yaghoobzadeh, Hinrich Sch\""utze",Corpus-level Fine-grained Entity Typing Using Contextual Information,cs.CL," This paper addresses the problem of corpus-level entity typing, i.e., -inferring from a large corpus that an entity is a member of a class such as -""food"" or ""artist"". The application of entity typing we are interested in is -knowledge base completion, specifically, to learn which classes an entity is a -member of. We propose FIGMENT to tackle this problem. FIGMENT is -embedding-based and combines (i) a global model that scores based on aggregated -contextual information of an entity and (ii) a context model that first scores -the individual occurrences of an entity and then aggregates the scores. In our -evaluation, FIGMENT strongly outperforms an approach to entity typing that -relies on relations obtained by an open information extraction system. 
-" -3090,1606.07902,"Yadollah Yaghoobzadeh, Hinrich Sch\""utze",Intrinsic Subspace Evaluation of Word Embedding Representations,cs.CL," We introduce a new methodology for intrinsic evaluation of word -representations. Specifically, we identify four fundamental criteria based on -the characteristics of natural language that pose difficulties to NLP systems; -and develop tests that directly show whether or not representations contain the -subspaces necessary to satisfy these criteria. Current intrinsic evaluations -are mostly based on the overall similarity or full-space similarity of words -and thus view vector representations as points. We show the limits of these -point-based intrinsic evaluations. We apply our evaluation methodology to the -comparison of a count vector model and several neural network models and -demonstrate important properties of these models. -" -3091,1606.07947,"Yoon Kim, Alexander M. Rush",Sequence-Level Knowledge Distillation,cs.CL cs.LG cs.NE," Neural machine translation (NMT) offers a novel alternative formulation of -translation that is potentially simpler than statistical approaches. However to -reach competitive performance, NMT models need to be exceedingly large. In this -paper we consider applying knowledge distillation approaches (Bucila et al., -2006; Hinton et al., 2015) that have proven successful for reducing the size of -neural models in other domains to the problem of NMT. We demonstrate that -standard knowledge distillation applied to word-level prediction can be -effective for NMT, and also introduce two novel sequence-level versions of -knowledge distillation that further improve performance, and somewhat -surprisingly, seem to eliminate the need for beam search (even when applied on -the original teacher model). Our best student model runs 10 times faster than -its state-of-the-art teacher with little loss in performance. It is also -significantly better than a baseline model trained without knowledge -distillation: by 4.2/1.7 BLEU with greedy decoding/beam search. Applying weight -pruning on top of knowledge distillation results in a student model that has 13 -times fewer parameters than the original teacher model, with a decrease of 0.4 -BLEU. -" -3092,1606.07950,Edilson A. Correa Jr. and Alneu de Andrade Lopes and Diego R. Amancio,Word sense disambiguation: a complex network approach,cs.CL," In recent years, concepts and methods of complex networks have been employed -to tackle the word sense disambiguation (WSD) task by representing words as -nodes, which are connected if they are semantically similar. Despite the -increasingly number of studies carried out with such models, most of them use -networks just to represent the data, while the pattern recognition performed on -the attribute space is performed using traditional learning techniques. In -other words, the structural relationship between words have not been explicitly -used in the pattern recognition process. In addition, only a few investigations -have probed the suitability of representations based on bipartite networks and -graphs (bigraphs) for the problem, as many approaches consider all possible -links between words. In this context, we assess the relevance of a bipartite -network model representing both feature words (i.e. the words characterizing -the context) and target (ambiguous) words to solve ambiguities in written -texts. Here, we focus on the semantical relationships between these two type of -words, disregarding the relationships between feature words. 
In particular, the
-proposed method not only serves to represent texts as graphs, but also
-constructs a structure on which the discrimination of senses is accomplished.
-Our results revealed that the proposed learning algorithm in such bipartite
-networks provides excellent results mostly when topical features are employed
-to characterize the context. Surprisingly, our method even outperformed the
-support vector machine algorithm in particular cases, with the advantage of
-being robust even if only a small training dataset is available. Taken together,
-the results obtained here show that the proposed representation/classification
-method might be useful to improve the semantic characterization of written
-texts.
-"
-3093,1606.07953,"Abhyuday Jagannatha, Hong Yu","Bidirectional Recurrent Neural Networks for Medical Event Detection in - Electronic Health Records",cs.CL cs.LG cs.NE," Sequence labeling for extraction of medical events and their attributes from
-unstructured text in Electronic Health Record (EHR) notes is a key step towards
-semantic understanding of EHRs. It has important applications in health
-informatics including pharmacovigilance and drug surveillance. The
-state-of-the-art supervised machine learning models in this domain are based on
-Conditional Random Fields (CRFs) with features calculated from fixed context
-windows. In this application, we explored various recurrent neural network
-frameworks and show that they significantly outperformed the CRF models.
-"
-3094,1606.07955,Daniel Winterstein and Joseph Corneli,X575: writing rengas with web services,cs.AI cs.CL," Our software system simulates the classical collaborative Japanese poetry
-form, renga, made of linked haikus. We used NLP methods wrapped up as web
-services. Our experiments were only a partial success, since results fail to
-satisfy classical constraints. To gather ideas for future work, we examine
-related research in semiotics, linguistics, and computing.
-"
-3095,1606.07965,Lu Wang and Claire Cardie,Summarizing Decisions in Spoken Meetings,cs.CL," This paper addresses the problem of summarizing decisions in spoken meetings:
-our goal is to produce a concise {\it decision abstract} for each meeting
-decision. We explore and compare token-level and dialogue act-level automatic
-summarization methods using both unsupervised and supervised learning
-frameworks. In the supervised summarization setting, and given true clusterings
-of decision-related utterances, we find that token-level summaries that employ
-discourse context can approach an upper bound for decision abstracts derived
-directly from dialogue acts. In the unsupervised summarization setting, we find
-that summaries based on unsupervised partitioning of decision-related
-utterances perform comparably to those based on partitions generated using
-supervised techniques (0.22 ROUGE-F1 using LDA-based topic models vs. 0.23
-using SVMs).
-"
-3096,1606.07967,Lu Wang and Larry Heck and Dilek Hakkani-Tur,"Leveraging Semantic Web Search and Browse Sessions for Multi-Turn Spoken - Dialog Systems",cs.CL," Training statistical dialog models in spoken dialog systems (SDS) requires
-large amounts of annotated data. The lack of scalable methods for data mining
-and annotation poses a significant hurdle for state-of-the-art statistical
-dialog managers. This paper presents an approach that directly leverages
-billions of web search and browse sessions to overcome this hurdle.
The key
-insight is that task completion through web search and browse sessions is (a)
-predictable and (b) generalizes to spoken dialog task completion. The new
-method automatically mines behavioral search and browse patterns from web logs
-and translates them into spoken dialog models. We experiment with naturally
-occurring spoken dialogs and large-scale web logs. Our session-based models
-outperform the state-of-the-art method for the entity extraction task in SDS. We
-also achieve better performance for both entity and relation extraction on web
-search queries when compared with nontrivial baselines.
-"
-3097,1606.07993,"Feifan Liu, Jinying Chen, Abhyuday Jagannatha, Hong Yu","Learning for Biomedical Information Extraction: Methodological Review of - Recent Advances",cs.CL," Biomedical information extraction (BioIE) is important to many applications,
-including clinical decision support, integrative biology, and
-pharmacovigilance, and therefore it has been an active research area. Unlike
-existing reviews covering a holistic view on BioIE, this review focuses mainly
-on recent advances in learning-based approaches, by systematically
-summarizing them into different aspects of methodological development. In
-addition, we dive into open information extraction and deep learning, two
-emerging and influential techniques, and envision the next generation of BioIE.
-"
-3098,1606.08003,"Guy Emerson, Ann Copestake",Functional Distributional Semantics,cs.CL," Vector space models have become popular in distributional semantics, despite
-the challenges they face in capturing various semantic phenomena. We propose a
-novel probabilistic framework which draws on both formal semantics and recent
-advances in machine learning. In particular, we separate predicates from the
-entities they refer to, allowing us to perform Bayesian inference based on
-logical forms. We describe an implementation of this framework using a
-combination of Restricted Boltzmann Machines and feedforward neural networks.
-Finally, we demonstrate the feasibility of this approach by training it on a
-parsed corpus and evaluating it on established similarity datasets.
-"
-3099,1606.08089,"Gus Hahn-Powell, Dane Bell, Marco A. Valenzuela-Esc\'arcega, and Mihai - Surdeanu",This before That: Causal Precedence in the Biomedical Domain,cs.CL," Causal precedence between biochemical interactions is crucial in the
-biomedical domain, because it transforms collections of individual
-interactions, e.g., bindings and phosphorylations, into the causal mechanisms
-needed to inform meaningful search and inference. Here, we analyze causal
-precedence in the biomedical domain as distinct from open-domain, temporal
-precedence. First, we describe a novel, hand-annotated text corpus of causal
-precedence in the biomedical domain. Second, we use this corpus to investigate
-a battery of models of precedence, covering rule-based, feature-based, and
-latent representation models. The highest-performing individual model achieved
-a micro F1 of 43 points, approaching the best performers on the simpler
-temporal-only precedence tasks. Feature-based and latent representation models
-each outperform the rule-based models, but their performance is complementary
-to one another. We apply a sieve-based architecture to capitalize on this lack
-of overlap, achieving a micro F1 score of 46 points.
-"
-3100,1606.08140,"Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu and Mark Johnson","STransE: a novel embedding model of entities and relationships in
- knowledge bases",cs.CL cs.AI," Knowledge bases of real-world facts about entities and their relationships
-are useful resources for a variety of natural language processing tasks.
-However, because knowledge bases are typically incomplete, it is useful to be
-able to perform link prediction or knowledge base completion, i.e., predict
-whether a relationship not in the knowledge base is likely to be true. This
-paper combines insights from several previous link prediction models into a new
-embedding model STransE that represents each entity as a low-dimensional
-vector, and each relation by two matrices and a translation vector. STransE is
-a simple combination of the SE and TransE models, but it obtains better link
-prediction performance on two benchmark datasets than previous embedding
-models. Thus, STransE can serve as a new baseline for the more complex models
-in the link prediction task.
-"
-3101,1606.08207,"Sanja \v{S}\'cepanovi\'c, Igor Mishkovski, Bruno Gon\c{c}alves, Nguyen
- Trung Hieu, Pan Hui",Semantic homophily in online communication: evidence from Twitter,physics.soc-ph cs.CL cs.CY cs.HC cs.SI," People are observed to assortatively connect on a set of traits. This
-phenomenon, termed assortative mixing or sometimes homophily, can be quantified
-through the assortativity coefficient in social networks. Uncovering the exact
-causes of the strong assortative mixing found in social networks has been a
-research challenge. Among the main causes suggested by sociology are the
-tendency of similar individuals to connect (often itself referred to as
-homophily) and the social influence among already connected individuals. An
-important question for researchers and practitioners, which we tackle here, is
-understanding the exact mechanisms of the interplay between these tendencies
-and the underlying social network structure. Namely, in addition to the
-mentioned assortativity coefficient, there are several other static and
-temporal network properties and substructures that can be linked to the
-tendencies of homophily and social influence in the social network, and we
-investigate those herein. Concretely, we tackle a computer-mediated
-\textit{communication network} (based on Twitter mentions) and a particular
-type of assortative mixing that can be inferred from the semantic features of
-communication content, which we term \textit{semantic homophily}. Our work, to
-the best of our knowledge, is the first to offer an in-depth analysis of
-semantic homophily in a communication network and of the interplay between the
-two. We quantify diverse levels of semantic homophily, identify the semantic
-aspects that are the drivers of the observed homophily, offer insights into its
-temporal evolution and, finally, present its intricate interplay with the
-communication network on Twitter. By analyzing these mechanisms, we improve our
-understanding of which semantic aspects shape human computer-mediated
-communication, and how they shape it.
One possible evaluation
-metric for such domains is the proximity of spelling variants. We propose how
-such a metric might be computed and how a spelling variant dataset can be
-collected using UrbanDictionary.
-"
-3103,1606.08340,"Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, Wei-Ying Ma",Topic Aware Neural Response Generation,cs.CL," We consider incorporating topic information into the sequence-to-sequence
-framework to generate informative and interesting responses for chatbots. To
-this end, we propose a topic aware sequence-to-sequence (TA-Seq2Seq) model. The
-model utilizes topics to simulate the prior knowledge of humans that guides
-them to form informative and interesting responses in conversation, and
-leverages the topic information in generation by a joint attention mechanism
-and a biased generation probability. The joint attention mechanism summarizes
-the hidden vectors of an input message as context vectors by message attention,
-synthesizes topic vectors by topic attention from the topic words of the
-message obtained from a pre-trained LDA model, and lets these vectors jointly
-affect the generation of words in decoding. To increase the possibility of
-topic words appearing in responses, the model modifies the generation
-probability of topic words by adding an extra probability term to bias the
-overall distribution. An empirical study on both automatic evaluation metrics
-and human annotations shows that TA-Seq2Seq can generate more informative and
-interesting responses, and significantly outperform the state-of-the-art
-response generation models.
-"
-3104,1606.08359,"Thomas Demeester and Tim Rockt\""aschel and Sebastian Riedel",Lifted Rule Injection for Relation Embeddings,cs.LG cs.AI cs.CL," Methods based on representation learning currently hold the state-of-the-art
-in many natural language processing and knowledge base inference tasks. Yet, a
-major challenge is how to efficiently incorporate commonsense knowledge into
-such models. A recent approach regularizes relation and entity representations
-by propositionalization of first-order logic rules. However,
-propositionalization does not scale beyond domains with only a few entities and
-rules. In this paper we present a highly efficient method for incorporating
-implication rules into distributed representations for automated knowledge base
-construction. We map entity-tuple embeddings into an approximately Boolean
-space and encourage a partial ordering over relation embeddings based on
-implication rules mined from WordNet. Surprisingly, we find that the strong
-restriction of the entity-tuple embedding space does not hurt the
-expressiveness of the model and even acts as a regularizer that improves
-generalization. By incorporating a few commonsense rules, we achieve an
-increase of 2 percentage points in mean average precision over a matrix
-factorization baseline, while observing a negligible increase in runtime.
-"
-3105,1606.08425,"Elliot Schumacher, Maxine Eskenazi, Gwen Frishkoff, Kevyn
- Collins-Thompson","Predicting the Relative Difficulty of Single Sentences With and Without
- Surrounding Context",cs.CL," The problem of accurately predicting relative reading difficulty across a set
-of sentences arises in a number of important natural language applications,
-such as finding and curating effective usage examples for intelligent language
-tutoring systems.
Yet while significant research has explored document- and -passage-level reading difficulty, the special challenges involved in assessing -aspects of readability for single sentences have received much less attention, -particularly when considering the role of surrounding passages. We introduce -and evaluate a novel approach for estimating the relative reading difficulty of -a set of sentences, with and without surrounding context. Using different sets -of lexical and grammatical features, we explore models for predicting pairwise -relative difficulty using logistic regression, and examine rankings generated -by aggregating pairwise difficulty labels using a Bayesian rating system to -form a final ranking. We also compare rankings derived for sentences assessed -with and without context, and find that contextual features can help predict -differences in relative difficulty judgments across these two conditions. -" -3106,1606.08495,"Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, - Nemanja Djuric, Vladan Radosavljevic, Gavin Owens","Network-Efficient Distributed Word2vec Training System for Large - Vocabularies",cs.CL," Word2vec is a popular family of algorithms for unsupervised training of dense -vector representations of words on large text corpuses. The resulting vectors -have been shown to capture semantic relationships among their corresponding -words, and have shown promise in reducing a number of natural language -processing (NLP) tasks to mathematical operations on these vectors. While -heretofore applications of word2vec have centered around vocabularies with a -few million words, wherein the vocabulary is the set of words for which vectors -are simultaneously trained, novel applications are emerging in areas outside of -NLP with vocabularies comprising several 100 million words. Existing word2vec -training systems are impractical for training such large vocabularies as they -either require that the vectors of all vocabulary words be stored in the memory -of a single server or suffer unacceptable training latency due to massive -network data transfer. In this paper, we present a novel distributed, parallel -training system that enables unprecedented practical training of vectors for -vocabularies with several 100 million words on a shared cluster of commodity -servers, using far less network traffic than the existing solutions. We -evaluate the proposed system on a benchmark dataset, showing that the quality -of vectors does not degrade relative to non-distributed training. Finally, for -several quarters, the system has been deployed for the purpose of matching -queries to ads in Gemini, the sponsored search advertising platform at Yahoo, -resulting in significant improvement of business metrics. -" -3107,1606.08513,"Tomasz Jurczyk, Michael Zhai, Jinho D. Choi",SelQA: A New Benchmark for Selection-based Question Answering,cs.CL," This paper presents a new selection-based question answering dataset, SelQA. -The dataset consists of questions generated through crowdsourcing and sentence -length answers that are drawn from the ten most prevalent topics in the English -Wikipedia. We introduce a corpus annotation scheme that enhances the generation -of large, diverse, and challenging datasets by explicitly aiming to reduce word -co-occurrences between the question and answers. Our annotation scheme is -composed of a series of crowdsourcing tasks with a view to more effectively -utilize crowdsourcing in the creation of question answering datasets in various -domains. 
Several systems are compared on the tasks of answer sentence selection
-and answer triggering, providing strong baseline results for future work to
-improve upon.
-"
-3108,1606.08689,"Nemanja Djuric, Hao Wu, Vladan Radosavljevic, Mihajlo Grbovic, Narayan
- Bhamidipati","Hierarchical Neural Language Models for Joint Representation of
- Streaming Documents and their Content",cs.CL cs.IR," We consider the problem of learning distributed representations for documents
-in data streams. The documents are represented as low-dimensional vectors and
-are jointly learned with distributed vector representations of word tokens
-using a hierarchical framework with two embedded neural language models. In
-particular, we exploit the context of documents in streams and use one of the
-language models to model the document sequences, and the other to model word
-sequences within them. The models learn continuous vector representations for
-both word tokens and documents such that semantically similar documents and
-words are close in a common vector space. We discuss extensions to our model,
-which can be applied to personalized recommendation and social relationship
-mining by adding further user layers to the hierarchy, thus learning
-user-specific vectors to represent individual preferences. We validated the
-learned representations on a public movie rating data set from MovieLens, as
-well as on a large-scale Yahoo News dataset comprising three months of user
-activity logs collected on Yahoo servers. The results indicate that the
-proposed model can learn useful representations of both documents and word
-tokens, outperforming the current state-of-the-art by a large margin.
-"
-3109,1606.08733,"Ond\v{r}ej Pl\'atek and Petr B\v{e}lohl\'avek and Vojt\v{e}ch
- Hude\v{c}ek and Filip Jur\v{c}\'i\v{c}ek",Recurrent Neural Networks for Dialogue State Tracking,cs.CL," This paper discusses models for dialogue state tracking using recurrent
-neural networks (RNN). We present experiments on the standard dialogue state
-tracking (DST) dataset, DSTC2. On the one hand, RNN models have become the
-state-of-the-art models in DST; on the other hand, most state-of-the-art models
-are only turn-based and require dataset-specific preprocessing (e.g.
-DSTC2-specific) in order to achieve such results. We implemented two
-architectures which can be used in incremental settings and require almost no
-preprocessing. We compare their performance to the benchmarks on DSTC2 and
-discuss their properties. With only trivial preprocessing, the performance of
-our models is close to the state-of-the-art results.
-"
-3110,1606.08777,Gemma Boleda and Sebastian Pad\'o and Marco Baroni,"""Show me the cup"": Reference with Continuous Representations",cs.CL cs.AI cs.LG," One of the most basic functions of language is to refer to objects in a
-shared scene. Modeling reference with continuous representations is challenging
-because it requires individuation, i.e., tracking and distinguishing an
-arbitrary number of referents. We introduce a neural network model that, given
-a definite description and a set of objects represented by natural images,
-points to the intended object if the expression has a unique referent, or
-indicates a failure, if it does not. The model, directly trained on reference
-acts, is competitive with a pipeline manually engineered to perform the same
-task, both when referents are purely visual, and when they are characterized by
-a combination of visual and linguistic properties.
-" -3111,1606.08821,"Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal and - Felix I. Wyss",Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy,cs.CL," Speech recognition, especially name recognition, is widely used in phone -services such as company directory dialers, stock quote providers or location -finders. It is usually challenging due to pronunciation variations. This paper -proposes an efficient and robust data-driven technique which automatically -learns acceptable word pronunciations and updates the pronunciation dictionary -to build a better lexicon without affecting recognition of other words similar -to the target word. It generalizes well on datasets with various sizes, and -reduces the error rate on a database with 13000+ human names by 42%, compared -to a baseline with regular dictionaries already covering canonical -pronunciations of 97%+ words in names, plus a well-trained -spelling-to-pronunciation (STP) engine. -" -3112,1606.08954,"Swabha Swayamdipta and Miguel Ballesteros and Chris Dyer and Noah A. - Smith","Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs",cs.CL cs.AI," We present a transition-based parser that jointly produces syntactic and -semantic dependencies. It learns a representation of the entire algorithm -state, using stack long short-term memories. Our greedy inference algorithm has -linear time, including feature extraction. On the CoNLL 2008--9 English shared -tasks, we obtain the best published parsing performance among models that -jointly learn syntax and semantics. -" -3113,1606.09058,Dimitrios Alikaniotis and John N. Williams,A Distributional Semantics Approach to Implicit Language Learning,cs.CL cs.LG," In the present paper we show that distributional information is particularly -important when considering concept availability under implicit language -learning conditions. Based on results from different behavioural experiments we -argue that the implicit learnability of semantic regularities depends on the -degree to which the relevant concept is reflected in language use. In our -simulations, we train a Vector-Space model on either an English or a Chinese -corpus and then feed the resulting representations to a feed-forward neural -network. The task of the neural network was to find a mapping between the word -representations and the novel words. Using datasets from four behavioural -experiments, which used different semantic manipulations, we were able to -obtain learning patterns very similar to those obtained by humans. -" -3114,1606.09163,Akash Kumar Dhaka and Giampiero Salvi,"Optimising The Input Window Alignment in CD-DNN Based Phoneme - Recognition for Low Latency Processing",cs.CL cs.CV cs.NE stat.ML," We present a systematic analysis on the performance of a phonetic recogniser -when the window of input features is not symmetric with respect to the current -frame. The recogniser is based on Context Dependent Deep Neural Networks -(CD-DNNs) and Hidden Markov Models (HMMs). The objective is to reduce the -latency of the system by reducing the number of future feature frames required -to estimate the current output. Our tests performed on the TIMIT database show -that the performance does not degrade when the input window is shifted up to 5 -frames in the past compared to common practice (no future frame). This -corresponds to improving the latency by 50 ms in our settings. 
Our tests also
-show that the best results are not obtained with the symmetric window commonly
-employed, but with an asymmetric window with eight past and two future context
-frames, although this observation should be confirmed on other data sets. The
-reduction in latency suggested by our results is critical for specific
-applications such as real-time lip synchronisation for tele-presence, but may
-also be beneficial in general applications to reduce the lag in human-machine
-spoken interaction.
-"
-3115,1606.09222,"Salita Ulitia Prini, Ary Setijadi Prihatmanto","Penambahan emosi menggunakan metode manipulasi prosodi untuk sistem text
- to speech bahasa Indonesia",cs.SD cs.CL cs.RO," Adding emotions using a prosody manipulation method for an Indonesian text to
-speech system. Text To Speech (TTS) is a system that can convert text in one
-language into speech, in accordance with the reading of the text in the
-language used. The focus of this research is the natural sounding concept,
-i.e. making the pronunciation of the Text To Speech voice synthesis system more
-""human"". Humans have emotions / intonations that may affect the sound
-produced. The main components used for the Text To Speech system in this
-research are eSpeak, the MBROLA database id1, and a Human Speech Corpus
-database from a website that summarizes the words with the highest frequency
-(Most Common Words) used in a country. Three base types of emotion / intonation
-were designed: happy, angry and sad. The method for developing the emotional
-filter is to manipulate the relevant prosodic features (especially pitch and
-duration values) using a predetermined rate factor, established by analyzing
-the differences between the standard Text To Speech output and voice recordings
-with a particular emotional prosody / intonation. The perception test results
-on the Human Speech Corpus are 95 % for the happy emotion, 96.25 % for the
-angry emotion and 98.75 % for the sad emotion. The system perception test was
-carried out through intelligibility and naturalness tests. The intelligibility
-test accuracy of the sound with respect to the original sentence is 93.3%, and
-the clarity rate for each sentence is 62.8%. For naturalness, the accuracy of
-emotion selection amounted to 75.6 % for the happy emotion, 73.3 % for the
-angry emotion, and 60 % for the sad emotion.
-
- ----
- Text To Speech (TTS) is a system that can convert text in a given language
-into speech, in accordance with how the text is read in the language used.
-"
-3116,1606.09239,"Hao Zhang, Zhiting Hu, Yuntian Deng, Mrinmaya Sachan, Zhicheng Yan,
- Eric P. Xing",Learning Concept Taxonomies from Multi-modal Data,cs.CL cs.CV cs.LG," We study the problem of automatically building hypernym taxonomies from
-textual and visual data. Previous works in taxonomy induction generally ignore
-the increasingly prominent visual data, which encode important perceptual
-semantics. Instead, we propose a probabilistic model for taxonomy induction by
-jointly leveraging text and images. To avoid hand-crafted feature engineering,
-we design end-to-end features based on distributed representations of images
-and words. The model is discriminatively trained given a small set of existing
-ontologies and is capable of building full taxonomies from scratch for a
-collection of unseen conceptual label items with associated images. We evaluate
-our model and features on the WordNet hierarchies, where our system outperforms
-previous approaches by a large gap.
-"
-3117,1606.09274,"Abigail See, Minh-Thang Luong, Christopher D. Manning",Compression of Neural Machine Translation Models via Pruning,cs.AI cs.CL cs.NE," Neural Machine Translation (NMT), like many other deep learning domains,
-typically suffers from over-parameterization, resulting in large storage sizes.
-This paper examines three simple magnitude-based pruning schemes to compress
-NMT models, namely class-blind, class-uniform, and class-distribution, which
-differ in terms of how pruning thresholds are computed for the different
-classes of weights in the NMT architecture. We demonstrate the efficacy of
-weight pruning as a compression technique for a state-of-the-art NMT system. We
-show that an NMT model with over 200 million parameters can be pruned by 40%
-with very little performance loss as measured on the WMT'14 English-German
-translation task. This sheds light on the distribution of redundancy in the NMT
-architecture. Our main result is that with retraining, we can recover and even
-surpass the original performance with an 80%-pruned model.
-"
-3118,1606.09370,"Sunil Kumar Sahu, Ashish Anand, Krishnadev Oruganty, Mahanandeeshwar
- Gattu","Relation extraction from clinical texts using domain invariant
- convolutional neural network",cs.CL," In recent years, extracting relevant information from biomedical and clinical
-texts such as research articles, discharge summaries, or electronic health
-records has been the subject of many research efforts and shared challenges.
-Relation extraction is the process of detecting and classifying the semantic
-relation among entities in a given piece of text. Existing models for this task
-in the biomedical domain use either manually engineered features or kernel
-methods to create feature vectors. These features are then fed to a classifier
-for the prediction of the correct class. It turns out that the results of these
-methods are highly dependent on the quality of user-designed features, and they
-also suffer from the curse of dimensionality. In this work we focus on
-extracting relations from clinical discharge summaries. Our main objective is
-to exploit the power of convolutional neural networks (CNNs) to learn features
-automatically and thus reduce the dependency on manual feature engineering. We
-evaluate the performance of the proposed model on the i2b2-2010 clinical
-relation extraction challenge dataset. Our results indicate that a
-convolutional neural network can be a good model for relation extraction in
-clinical text without being dependent on experts' knowledge for defining
-quality features.
-"
-3119,1606.09371,"Sunil Kumar Sahu, Ashish Anand","Recurrent neural network models for disease name recognition using
- domain invariant features",cs.CL," Hand-crafted features based on linguistic and domain knowledge play a crucial
-role in determining the performance of disease name recognition systems. Such
-methods are further limited by the scope of these features or, in other words,
-their ability to cover the contexts or word dependencies within a sentence. In
-this work, we focus on reducing such dependencies and propose a
-domain-invariant framework for the disease name recognition task. In
-particular, we propose various end-to-end recurrent neural network (RNN) models
-for the tasks of disease name recognition and their classification into four
-pre-defined categories. We also utilize a convolutional neural network (CNN) in
-cascade with the RNN to get character-based embedded features and employ them
-with word-embedded features in our model.
We compare our models with the -state-of-the-art results for the two tasks on NCBI disease dataset. Our results -for the disease mention recognition task indicate that state-of-the-art -performance can be obtained without relying on feature engineering. Further the -proposed models obtained improved performance on the classification task of -disease names. -" -3120,1606.09403,"Long Duong, Hiroshi Kanayama, Tengfei Ma, Steven Bird, Trevor Cohn",Learning Crosslingual Word Embeddings without Bilingual Corpora,cs.CL cs.AI," Crosslingual word embeddings represent lexical items from different languages -in the same vector space, enabling transfer of NLP tools. However, previous -attempts had expensive resource requirements, difficulty incorporating -monolingual data or were unable to handle polysemy. We address these drawbacks -in our method which takes advantage of a high coverage dictionary in an EM -style training algorithm over monolingual corpora in two languages. Our model -achieves state-of-the-art performance on bilingual lexicon induction task -exceeding models using large bilingual corpora, and competitive results on the -monolingual word similarity and cross-lingual document classification task. -" -3121,1606.09560,"Joel Legrand, Michael Auli, Ronan Collobert",Neural Network-based Word Alignment through Score Aggregation,cs.CL," We present a simple neural network for word alignment that builds source and -target word window representations to compute alignment scores for sentence -pairs. To enable unsupervised training, we use an aggregation operation that -summarizes the alignment scores for a given target word. A soft-margin -objective increases scores for true target words while decreasing scores for -target words that are not present. Compared to the popular Fast Align model, -our approach improves alignment accuracy by 7 AER on English-Czech, by 6 AER on -Romanian-English and by 1.7 AER on English-French alignment. -" -3122,1606.09600,"Daniel Beck, Lucia Specia, Trevor Cohn","Exploring Prediction Uncertainty in Machine Translation Quality - Estimation",cs.CL," Machine Translation Quality Estimation is a notoriously difficult task, which -lessens its usefulness in real-world translation environments. Such scenarios -can be improved if quality predictions are accompanied by a measure of -uncertainty. However, models in this task are traditionally evaluated only in -terms of point estimate metrics, which do not take prediction uncertainty into -account. We investigate probabilistic methods for Quality Estimation that can -provide well-calibrated uncertainty estimates and evaluate them in terms of -their full posterior predictive distributions. We also show how this posterior -information can be useful in an asymmetric risk scenario, which aims to capture -typical situations in translation workflows. -" -3123,1606.09604,"Marco A. Valenzuela-Escarcega, Gus Hahn-Powell, Dane Bell, Mihai - Surdeanu","SnapToGrid: From Statistical to Interpretable Models for Biomedical - Information Extraction",cs.CL," We propose an approach for biomedical information extraction that marries the -advantages of machine learning models, e.g., learning directly from data, with -the benefits of rule-based approaches, e.g., interpretability. Our approach -starts by training a feature-based statistical model, then converts this model -to a rule-based variant by converting its features to rules, and ""snapping to -grid"" the feature weights to discrete votes. 
In doing so, our proposal takes -advantage of the large body of work in machine learning, but it produces an -interpretable model, which can be directly edited by experts. We evaluate our -approach on the BioNLP 2009 event extraction task. Our results show that there -is a small performance penalty when converting the statistical model to rules, -but the gain in interpretability compensates for that: with minimal effort, -human experts improve this model to have similar performance to the statistical -model that served as starting point. -" -3124,1606.09636,"Henrique F. de Arruda, Filipi N. Silva, Vanessa Q. Marinho, Diego R. - Amancio, Luciano da F. Costa",Representation of texts as complex networks: a mesoscopic approach,cs.CL," Statistical techniques that analyze texts, referred to as text analytics, -have departed from the use of simple word count statistics towards a new -paradigm. Text mining now hinges on a more sophisticated set of methods, -including the representations in terms of complex networks. While -well-established word-adjacency (co-occurrence) methods successfully grasp -syntactical features of written texts, they are unable to represent important -aspects of textual data, such as its topical structure, i.e. the sequence of -subjects developing at a mesoscopic level along the text. Such aspects are -often overlooked by current methodologies. In order to grasp the mesoscopic -characteristics of semantical content in written texts, we devised a network -model which is able to analyze documents in a multi-scale fashion. In the -proposed model, a limited amount of adjacent paragraphs are represented as -nodes, which are connected whenever they share a minimum semantical content. To -illustrate the capabilities of our model, we present, as a case example, a -qualitative analysis of ""Alice's Adventures in Wonderland"". We show that the -mesoscopic structure of a document, modeled as a network, reveals many semantic -traits of texts. Such an approach paves the way to a myriad of semantic-based -applications. In addition, our approach is illustrated in a machine learning -context, in which texts are classified among real texts and randomized -instances. -" -3125,1607.00030,"Alexandra Birch, Omri Abend, Ondrej Bojar, Barry Haddow",HUME: Human UCCA-Based Evaluation of Machine Translation,cs.CL," Human evaluation of machine translation normally uses sentence-level measures -such as relative ranking or adequacy scales. However, these provide no insight -into possible errors, and do not scale well with sentence length. We argue for -a semantics-based evaluation, which captures what meaning components are -retained in the MT output, thus providing a more fine-grained analysis of -translation quality, and enabling the construction and tuning of -semantics-based MT. We present a novel human semantic evaluation measure, Human -UCCA-based MT Evaluation (HUME), building on the UCCA semantic representation -scheme. HUME covers a wider range of semantic phenomena than previous methods -and does not rely on semantic annotation of the potentially garbled MT output. -We experiment with four language pairs, demonstrating HUME's broad -applicability, and report good inter-annotator agreement rates and correlation -with human adequacy scores. -" -3126,1607.00070,Layla El Asri and Jing He and Kaheer Suleman,"A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue - Systems",cs.CL," User simulation is essential for generating enough data to train a -statistical spoken dialogue system. 
Previous models for user simulation suffer -from several drawbacks, such as the inability to take dialogue history into -account, the need of rigid structure to ensure coherent user behaviour, heavy -dependence on a specific domain, the inability to output several user -intentions during one dialogue turn, or the requirement of a summarized action -space for tractability. This paper introduces a data-driven user simulator -based on an encoder-decoder recurrent neural network. The model takes as input -a sequence of dialogue contexts and outputs a sequence of dialogue acts -corresponding to user intentions. The dialogue contexts include information -about the machine acts and the status of the user goal. We show on the Dialogue -State Tracking Challenge 2 (DSTC2) dataset that the sequence-to-sequence model -outperforms an agenda-based simulator and an n-gram simulator, according to -F-score. Furthermore, we show how this model can be used on the original action -space and thereby models user behaviour with finer granularity. -" -3127,1607.00139,Mike Thelwall,"TensiStrength: Stress and relaxation magnitude detection for social - media texts",cs.CL," Computer systems need to be able to react to stress in order to perform -optimally on some tasks. This article describes TensiStrength, a system to -detect the strength of stress and relaxation expressed in social media text -messages. TensiStrength uses a lexical approach and a set of rules to detect -direct and indirect expressions of stress or relaxation, particularly in the -context of transportation. It is slightly more effective than a comparable -sentiment analysis program, although their similar performances occur despite -differences on almost half of the tweets gathered. The effectiveness of -TensiStrength depends on the nature of the tweets classified, with tweets that -are rich in stress-related terms being particularly problematic. Although -generic machine learning methods can give better performance than TensiStrength -overall, they exploit topic-related terms in a way that may be undesirable in -practical applications and that may not work as well in more focused contexts. -In conclusion, TensiStrength and generic machine learning approaches work well -enough to be practical choices for intelligent applications that need to take -advantage of stress information, and the decision about which to use depends on -the nature of the texts analysed and the purpose of the task. -" -3128,1607.00167,"Jo\~ao Oliveira, Mike Pinto, Pedro Saleiro, Jorge Teixeira","SentiBubbles: Topic Modeling and Sentiment Visualization of - Entity-centric Tweets",cs.SI cs.CL cs.IR," Social Media users tend to mention entities when reacting to news events. The -main purpose of this work is to create entity-centric aggregations of tweets on -a daily basis. By applying topic modeling and sentiment analysis, we create -data visualization insights about current events and people reactions to those -events from an entity-centric perspective. -" -3129,1607.00186,David M. W. Powers,"Throwing fuel on the embers: Probability or Dichotomy, Cognitive or - Linguistic?",cs.CL cs.AI," Prof. Robert Berwick's abstract for his forthcoming invited talk at the -ACL2016 workshop on Cognitive Aspects of Computational Language Learning -revives an ancient debate. Entitled ""Why take a chance?"", Berwick seems to -refer implicitly to Chomsky's critique of the statistical approach of Harris as -well as the currently dominant paradigms in CoNLL. 
-
- Berwick avoids Chomsky's use of ""innate"" but states that ""the debate over the
-existence of sophisticated mental grammars was settled with Chomsky's Logical
-Structure of Linguistic Theory (1957/1975)"", acknowledging that ""this debate
-has often been revived"".
- This paper agrees with the view that this debate has long since been settled,
-but with the opposite outcome! Given the embers have not yet died away, and the
-questions remain fundamental, perhaps it is appropriate to refuel the debate,
-so I would like to join Bob in throwing fuel on this fire by reviewing the
-evidence against the Chomskian position!
-"
-3130,1607.00198,"Rudra Murthy V, Mitesh Khapra and Pushpak Bhattacharyya",Sharing Network Parameters for Crosslingual Named Entity Recognition,cs.CL," Most state-of-the-art approaches for Named Entity Recognition rely on
-hand-crafted features and annotated corpora. Recently, neural network based
-models have been proposed which do not require handcrafted features but still
-require annotated corpora. However, such annotated corpora may not be available
-for many languages. In this paper, we propose a neural network based model
-which allows sharing the decoder as well as word and character level parameters
-between two languages, thereby allowing a resource-fortunate language to aid a
-resource-deprived language. Specifically, we focus on the case when limited
-annotated corpora are available in one language ($L_1$) and abundant annotated
-corpora are available in another language ($L_2$). Sharing the network
-architecture and parameters between $L_1$ and $L_2$ leads to improved
-performance in $L_1$. Further, our approach does not require any hand-crafted
-features but instead directly learns meaningful feature representations from
-the training data itself. We experiment with 4 language pairs and show that
-indeed, in a resource-constrained setup (less annotated data), a model jointly
-trained with data from another language performs better than a model trained
-only on the limited corpora in one language.
-"
-3131,1607.00225,"St\'ephan Tulkens, Chris Emmery, Walter Daelemans",Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource,cs.CL," Word embeddings have recently seen a strong increase in interest as a result
-of strong performance gains on a variety of tasks. However, most of this
-research also underlined the importance of benchmark datasets, and the
-difficulty of constructing these for a variety of language-specific tasks.
-Still, many of the datasets used in these tasks could prove to be fruitful
-linguistic resources, allowing for unique observations into language use and
-variability. In this paper we demonstrate the performance of multiple types of
-embeddings, created with both count and prediction-based architectures on a
-variety of corpora, in two language-specific tasks: relation evaluation and
-dialect identification. For the latter, we compare unsupervised methods with a
-traditional, hand-crafted dictionary. With this research, we provide the
-embeddings themselves, the relation evaluation task benchmark for use in
-further research, and demonstrate how the benchmarked embeddings prove to be a
-useful unsupervised linguistic resource, effectively used in a downstream task.
-"
-3132,1607.00325,"Dong Yu, Morten Kolb{\ae}k, Zheng-Hua Tan, and Jesper Jensen","Permutation Invariant Training of Deep Models for Speaker-Independent
- Multi-talker Speech Separation",cs.CL cs.LG cs.SD eess.AS," We propose a novel deep learning model, which supports permutation invariant
-training (PIT), for speaker independent multi-talker speech separation,
-commonly known as the cocktail-party problem. Different from most of the prior
-art, which treats speech separation as a multi-class regression problem, and
-from the deep clustering technique, which considers it a segmentation (or
-clustering) problem, our model optimizes for the separation regression error,
-ignoring the order of the mixing sources. This strategy cleverly solves the
-long-lasting label permutation problem that has prevented progress on deep
-learning based techniques for speech separation. Experiments on the
-equal-energy mixing setup of a Danish corpus confirm the effectiveness of PIT.
-We believe improvements built upon PIT can eventually solve the cocktail-party
-problem and enable real-world adoption of, e.g., automatic meeting
-transcription and multi-party human-computer interaction, where overlapping
-speech is common.
-"
-3133,1607.00359,S\'ebastien Gagnon and Jean Rouat,Moving Toward High Precision Dynamical Modelling in Hidden Markov Models,cs.CL," The Hidden Markov Model (HMM) is often regarded as the dynamical model of
-choice in many fields and applications. It has also been at the heart of most
-state-of-the-art speech recognition systems since the 70's. However, from
-Gaussian mixture model HMMs (GMM-HMM) to deep neural network HMMs (DNN-HMM),
-the underlying Markovian chain of state-of-the-art models has not changed much.
-The ""left-to-right"" topology is almost always employed because very few other
-alternatives exist. In this paper, we propose that finely-tuned HMM topologies
-are essential for precise temporal modelling and that this approach should be
-investigated in state-of-the-art HMM systems. As such, we propose a
-proof-of-concept framework for learning efficient topologies by pruning down
-complex generic models. Speech recognition experiments that were conducted
-indicate that complex time dependencies can be better learned by this approach
-than with classical ""left-to-right"" models.
-"
-3134,1607.00410,"Yusuke Watanabe, Kazuma Hashimoto, Yoshimasa Tsuruoka",Domain Adaptation for Neural Networks by Parameter Augmentation,cs.CL cs.AI cs.LG," We propose a simple domain adaptation method for neural networks in a
-supervised setting. Supervised domain adaptation is a way of improving the
-generalization performance on the target domain by using the source domain
-dataset, assuming that both of the datasets are labeled. Recently, recurrent
-neural networks have been shown to be successful on a variety of NLP tasks such
-as caption generation; however, the existing domain adaptation techniques are
-limited to (1) tuning the model parameters on the target dataset after training
-on the source dataset, or (2) designing the network to have a dual output, one
-for the source domain and the other for the target domain. Reformulating the
-idea of the domain adaptation technique proposed by Daume (2007), we propose a
-simple domain adaptation method, which can be applied to neural networks
-trained with a cross-entropy loss. On captioning datasets, we show performance
-improvements over other domain adaptation methods.
-" -3135,1607.00424,"Dileep Viswanathan and Ameet Soni and Jude Shavlik and Sriraam - Natarajan",Learning Relational Dependency Networks for Relation Extraction,cs.AI cs.CL cs.LG," We consider the task of KBP slot filling -- extracting relation information -from newswire documents for knowledge base construction. We present our -pipeline, which employs Relational Dependency Networks (RDNs) to learn -linguistic patterns for relation extraction. Additionally, we demonstrate how -several components such as weak supervision, word2vec features, joint learning -and the use of human advice, can be incorporated in this relational framework. -We evaluate the different components in the benchmark KBP 2015 task and show -that RDNs effectively model a diverse set of features and perform competitively -with current state-of-the-art relation extraction. -" -3136,1607.00534,Hendrik Heuer,"Text comparison using word vector representations and dimensionality - reduction",cs.CL," This paper describes a technique to compare large text sources using word -vector representations (word2vec) and dimensionality reduction (t-SNE) and how -it can be implemented using Python. The technique provides a bird's-eye view of -text sources, e.g. text summaries and their source material, and enables users -to explore text sources like a geographical map. Word vector representations -capture many linguistic properties such as gender, tense, plurality and even -semantic concepts like ""capital city of"". Using dimensionality reduction, a 2D -map can be computed where semantically similar words are close to each other. -The technique uses the word2vec model from the gensim Python library and t-SNE -from scikit-learn. -" -3137,1607.00570,"Cedric De Boom, Steven Van Canneyt, Thomas Demeester, Bart Dhoedt","Representation learning for very short texts using weighted word - embedding aggregation",cs.IR cs.CL," Short text messages such as tweets are very noisy and sparse in their use of -vocabulary. Traditional textual representations, such as tf-idf, have -difficulty grasping the semantic meaning of such texts, which is important in -applications such as event detection, opinion mining, news recommendation, etc. -We constructed a method based on semantic word embeddings and frequency -information to arrive at low-dimensional representations for short texts -designed to capture semantic similarity. For this purpose we designed a -weight-based model and a learning procedure based on a novel median-based loss -function. This paper discusses the details of our model and the optimization -methods, together with the experimental results on both Wikipedia and Twitter -data. We find that our method outperforms the baseline approaches in the -experiments, and that it generalizes well on different word embeddings without -retraining. Our method is therefore capable of retaining most of the semantic -information in the text, and is applicable out-of-the-box. -" -3138,1607.00578,Heeyoul Choi and Kyunghyun Cho and Yoshua Bengio,Context-Dependent Word Representation for Neural Machine Translation,cs.CL," We first observe a potential weakness of continuous vector representations of -symbols in neural machine translation. That is, the continuous vector -representation, or a word embedding vector, of a symbol encodes multiple -dimensions of similarity, equivalent to encoding more than one meaning of the -word. 
This has the consequence that the encoder and decoder recurrent networks
-in neural machine translation need to spend a substantial amount of their
-capacity on disambiguating source and target words based on the context defined
-by the source sentence. Based on this observation, in this paper we propose to
-contextualize the word embedding vectors using a nonlinear bag-of-words
-representation of the source sentence. Additionally, we propose to represent
-special tokens (such as numbers, proper nouns and acronyms) with typed symbols
-to facilitate translating those words that are not well-suited to be translated
-via continuous vectors. The experiments on En-Fr and En-De reveal that the
-proposed approaches of contextualization and symbolization improve the
-translation quality of neural machine translation systems significantly.
-"
-3139,1607.00623,Kaveh Hassani and Won-Sook Lee,Visualizing Natural Language Descriptions: A Survey,cs.CL cs.AI cs.CV cs.GR cs.HC," A natural language interface exploits the conceptual simplicity and
-naturalness of language to create a high-level, user-friendly communication
-channel between humans and machines. One of the promising applications of such
-interfaces is generating visual interpretations of the semantic content of a
-given natural language text, which can then be visualized either as a static
-scene or a dynamic animation. This survey discusses the requirements and
-challenges of developing such systems, reports on 26 graphical systems that
-exploit natural language interfaces, and addresses both artificial intelligence
-and visualization aspects. This work serves as a frame of reference for
-researchers and aims to enable further advances in the field.
-"
-3140,1607.00718,"Minsoo Kim, Moirangthem Dennis Singh, and Minho Lee","Towards Abstraction from Extraction: Multiple Timescale Gated Recurrent
- Unit for Summarization",cs.CL," In this work, we introduce temporal hierarchies to the sequence to sequence
-(seq2seq) model to tackle the problem of abstractive summarization of
-scientific articles. The proposed Multiple Timescale model of the Gated
-Recurrent Unit (MTGRU) is implemented in the encoder-decoder setting to better
-deal with the presence of multiple compositionalities in larger texts. The
-proposed model is compared to the conventional RNN encoder-decoder, and the
-results demonstrate that our model trains faster and shows significant
-performance gains. The results also show that the temporal hierarchies help
-improve the ability of seq2seq models to capture compositionalities better
-without the presence of highly complex architectural hierarchies.
-"
-3141,1607.00970,"Lili Mou, Yiping Song, Rui Yan, Ge Li, Lu Zhang, Zhi Jin","Sequence to Backward and Forward Sequences: A Content-Introducing
- Approach to Generative Short-Text Conversation",cs.CL cs.LG," Using neural networks to generate replies in human-computer dialogue systems
-has attracted increasing attention over the past few years. However, the
-performance is not satisfactory: the neural network tends to generate safe,
-universally relevant replies which carry little meaning. In this paper, we
-propose a content-introducing approach to neural network-based generative
-dialogue systems. We first use pointwise mutual information (PMI) to predict a
-noun as a keyword, reflecting the main gist of the reply. We then propose
-seq2BF, a ""sequence to backward and forward sequences"" model, which generates a
-reply containing the given keyword.
Experimental results show that our approach -significantly outperforms traditional sequence-to-sequence models in terms of -human evaluation and the entropy measure, and that the predicted keyword can -appear at an appropriate position in the reply. -" -3142,1607.00976,"Silvio Amir, Byron C. Wallace, Hao Lyu, Paula Carvalho M\'ario J. - Silva","Modelling Context with User Embeddings for Sarcasm Detection in Social - Media",cs.CL cs.AI," We introduce a deep neural network for automated sarcasm detection. Recent -work has emphasized the need for models to capitalize on contextual features, -beyond lexical and syntactic cues present in utterances. For example, different -speakers will tend to employ sarcasm regarding different subjects and, thus, -sarcasm detection models ought to encode such speaker information. Current -methods have achieved this by way of laborious feature engineering. By -contrast, we propose to automatically learn and then exploit user embeddings, -to be used in concert with lexical signals to recognize sarcasm. Our approach -does not require elaborate feature engineering (and concomitant data scraping); -fitting user embeddings requires only the text from their previous posts. The -experimental results show that our model outperforms a state-of-the-art -approach leveraging an extensive set of carefully crafted features. -" -3143,1607.00992,Jay Pujara and Lise Getoor,Generic Statistical Relational Entity Resolution in Knowledge Graphs,cs.AI cs.CL," Entity resolution, the problem of identifying the underlying entity of -references found in data, has been researched for many decades in many -communities. A common theme in this research has been the importance of -incorporating relational features into the resolution process. Relational -entity resolution is particularly important in knowledge graphs (KGs), which -have a regular structure capturing entities and their interrelationships. We -identify three major problems in KG entity resolution: (1) intra-KG reference -ambiguity; (2) inter-KG reference ambiguity; and (3) ambiguity when extending -KGs with new facts. We implement a framework that generalizes across these -three settings and exploits this regular structure of KGs. Our framework has -many advantages over custom solutions widely deployed in industry, including -collective inference, scalability, and interpretability. We apply our framework -to two real-world KG entity resolution problems, ambiguity in NELL and merging -data from Freebase and MusicBrainz, demonstrating the importance of relational -features. -" -3144,1607.01133,Meng Fang and Trevor Cohn,"Learning when to trust distant supervision: An application to - low-resource POS tagging using cross-lingual projection",cs.CL," Cross lingual projection of linguistic annotation suffers from many sources -of bias and noise, leading to unreliable annotations that cannot be used -directly. In this paper, we introduce a novel approach to sequence tagging that -learns to correct the errors from cross-lingual projection using an explicit -debiasing layer. This is framed as joint learning over two corpora, one tagged -with gold standard and the other with projected tags. We evaluated with only -1,000 tokens tagged with gold standard tags, along with more plentiful parallel -data. Our system equals or exceeds the state-of-the-art on eight simulated -low-resource settings, as well as two real low-resource languages, Malagasy and -Kinyarwanda. 
-" -3145,1607.01149,"Ale\v{s} Tamchyna, Alexander Fraser, Ond\v{r}ej Bojar, Marcin - Junczys-Dowmunt","Target-Side Context for Discriminative Models in Statistical Machine - Translation",cs.CL," Discriminative translation models utilizing source context have been shown to -help statistical machine translation performance. We propose a novel extension -of this work using target context information. Surprisingly, we show that this -model can be efficiently integrated directly in the decoding process. Our -approach scales to large training data sizes and results in consistent -improvements in translation quality on four language pairs. We also provide an -analysis comparing the strengths of the baseline source-context model with our -extended source-context and target-context model and we show that our extension -allows us to better capture morphological coherence. Our work is freely -available as part of Moses. -" -3146,1607.01274,"Baiyang Wang, Diego Klabjan",Temporal Topic Analysis with Endogenous and Exogenous Processes,cs.CL cs.IR cs.LG," We consider the problem of modeling temporal textual data taking endogenous -and exogenous processes into account. Such text documents arise in real world -applications, including job advertisements and economic news articles, which -are influenced by the fluctuations of the general economy. We propose a -hierarchical Bayesian topic model which imposes a ""group-correlated"" -hierarchical structure on the evolution of topics over time incorporating both -processes, and show that this model can be estimated from Markov chain Monte -Carlo sampling methods. We further demonstrate that this model captures the -intrinsic relationships between the topic distribution and the time-dependent -factors, and compare its performance with latent Dirichlet allocation (LDA) and -two other related models. The model is applied to two collections of documents -to illustrate its empirical performance: online job advertisements from -DirectEmployers Association and journalists' postings on BusinessInsider.com. -" -3147,1607.01426,"Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum","Chains of Reasoning over Entities, Relations, and Text using Recurrent - Neural Networks",cs.CL," Our goal is to combine the rich multistep inference of symbolic logical -reasoning with the generalization capabilities of neural networks. We are -particularly interested in complex reasoning about entities and relations in -text and large-scale knowledge bases (KBs). Neelakantan et al. (2015) use RNNs -to compose the distributed semantics of multi-hop paths in KBs; however for -multiple reasons, the approach lacks accuracy and practicality. This paper -proposes three significant modeling advances: (1) we learn to jointly reason -about relations, entities, and entity-types; (2) we use neural attention -modeling to incorporate multiple paths; (3) we learn to share strength in a -single RNN that represents logical composition across all relations. On a -largescale Freebase+ClueWeb prediction task, we achieve 25% error reduction, -and a 53% error reduction on sparse relations due to shared strength. On chains -of reasoning in WordNet we reduce error in mean quantile by 84% versus previous -state-of-the-art. 
The code and data are available at -https://rajarshd.github.io/ChainsofReasoning -" -3148,1607.01432,"Kenton Lee, Mike Lewis, Luke Zettlemoyer",Global Neural CCG Parsing with Optimality Guarantees,cs.CL," We introduce the first global recursive neural parsing model with optimality -guarantees during decoding. To support global features, we give up dynamic -programs and instead search directly in the space of all possible subtrees. -Although this space is exponentially large in the sentence length, we show it -is possible to learn an efficient A* parser. We augment existing parsing -models, which have informative bounds on the outside score, with a global model -that has loose bounds but only needs to model non-local phenomena. The global -model is trained with a new objective that encourages the parser to explore a -tiny fraction of the search space. The approach is applied to CCG parsing, -improving state-of-the-art accuracy by 0.4 F1. The parser finds the optimal -parse for 99.9% of held-out sentences, exploring on average only 190 subtrees. -" -3149,1607.01485,"John J. Camilleri, Normunds Gruzitis, Gerardo Schneider",Extracting Formal Models from Normative Texts,cs.CL," Normative texts are documents based on the deontic notions of obligation, -permission, and prohibition. Our goal is to model such texts using the C-O -Diagram formalism, making them amenable to formal analysis, in particular -verifying that a text satisfies properties concerning causality of actions and -timing constraints. We present an experimental, semi-automatic aid to bridge -the gap between a normative text and its formal representation. Our approach -uses dependency trees combined with our own rules and heuristics for extracting -the relevant components. The resulting tabular data can then be converted into -a C-O Diagram. -" -3150,1607.01490,"Ren\=ars Liepi\c{n}\v{s}, Uldis Boj\=ars, Normunds Gr\=uz\=itis, - K\=arlis \v{C}er\=ans, Edgars Celms","Towards Self-explanatory Ontology Visualization with Contextual - Verbalization",cs.AI cs.CL," Ontologies are one of the core foundations of the Semantic Web. To -participate in Semantic Web projects, domain experts need to be able to -understand the ontologies involved. Visual notations can provide an overview of -the ontology and help users to understand the connections among entities. -However, the users first need to learn the visual notation before they can -interpret it correctly. Controlled natural language representation would be -readable right away and might be preferred in case of complex axioms, however, -the structure of the ontology would remain less apparent. We propose to combine -ontology visualizations with contextual ontology verbalizations of selected -ontology (diagram) elements, displaying controlled natural language (CNL) -explanations of OWL axioms corresponding to the selected visual notation -elements. Thus, the domain experts will benefit from both the high-level -overview provided by the graphical notation and the detailed textual -explanations of particular elements in the diagram. -" -3151,1607.01628,"Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter",Guided Alignment Training for Topic-Aware Neural Machine Translation,cs.CL cs.NE," In this paper, we propose an effective way for biasing the attention -mechanism of a sequence-to-sequence neural machine translation (NMT) model -towards the well-studied statistical word alignment models. 
We show that our -novel guided alignment training approach improves translation quality on -real-life e-commerce texts consisting of product titles and descriptions, -overcoming the problems posed by many unknown words and a large type/token -ratio. We also show that meta-data associated with input texts such as topic or -category information can significantly improve translation quality when used as -an additional signal to the decoder part of the network. With both novel -features, the BLEU score of the NMT system on a product title set improves from -18.6 to 21.3%. Even larger MT quality gains are obtained through domain -adaptation of a general domain NMT system to e-commerce data. The developed NMT -system also performs well on the IWSLT speech translation task, where an -ensemble of four variant systems outperforms the phrase-based baseline by 2.1% -BLEU absolute. -" -3152,1607.01759,"Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov",Bag of Tricks for Efficient Text Classification,cs.CL," This paper explores a simple and efficient baseline for text classification. -Our experiments show that our fast text classifier fastText is often on par -with deep learning classifiers in terms of accuracy, and many orders of -magnitude faster for training and evaluation. We can train fastText on more -than one billion words in less than ten minutes using a standard multicore CPU, -and classify half a million sentences among 312K classes in less than a minute. -" -3153,1607.01856,"Xiaoqing Li, Jiajun Zhang and Chengqing Zong",Neural Name Translation Improves Neural Machine Translation,cs.CL," In order to control computational complexity, neural machine translation -(NMT) systems convert all rare words outside the vocabulary into a single unk -symbol. A previous solution (Luong et al., 2015) resorts to using multiple numbered -unks to learn the correspondence between source and target rare words. However, -words unseen in the training corpus cannot be handled by this method at test time. -It also suffers from noisy word alignment. In this paper, we focus on a -major type of rare words -- named entities (NEs), and propose to translate them -with a character-level sequence-to-sequence model. The NE translation model is -further used to derive high-quality NE alignment in the bilingual training -corpus. With the integration of NE translation and alignment modules, our NMT -system is able to surpass the baseline system by 2.9 BLEU points on the Chinese -to English task. -" -3154,1607.01869,"Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Fabrizio - Silvestri, Ricardo Baeza-Yates, Andrew Feng, Erik Ordentlich, Lee Yang, Gavin - Owens","Scalable Semantic Matching of Queries to Ads in Sponsored Search - Advertising",cs.IR cs.AI cs.CL," Sponsored search represents a major source of revenue for web search engines. -This popular advertising model brings a unique possibility for advertisers to -target users' immediate intent communicated through a search query, usually by -displaying their ads alongside organic search results for queries deemed -relevant to their products or services. However, due to a large number of -unique queries it is challenging for advertisers to identify all such relevant -queries. For this reason search engines often provide a service of advanced -matching, which automatically finds additional relevant queries for advertisers -to bid on. We present a novel advanced matching approach based on the idea of -semantic embeddings of queries and ads. 
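As an aside on the fastText classifier of 1607.01759 above: a minimal, untrained sketch of the recipe (averaged embeddings of words plus hashed bigrams, followed by a linear softmax). The bucket count, dimensions, and random weights are placeholders; the actual system learns them from data:

    import numpy as np

    rng = np.random.default_rng(0)
    buckets, dim, n_classes = 1_000, 16, 2           # all sizes hypothetical
    E = rng.normal(scale=0.1, size=(buckets, dim))   # shared embedding table
    W = rng.normal(scale=0.1, size=(n_classes, dim)) # linear classifier

    def features(tokens):
        """Indices for unigrams plus hashed bigrams (the 'bag of tricks')."""
        grams = tokens + [a + "_" + b for a, b in zip(tokens, tokens[1:])]
        return [hash(g) % buckets for g in grams]

    def predict(tokens):
        x = E[features(tokens)].mean(axis=0)   # averaged text embedding
        z = W @ x
        return np.exp(z) / np.exp(z).sum()     # softmax over classes

    # Untrained, so the probabilities are meaningless; the point is the shape
    # of the model: embedding lookup, average, one linear layer.
    print(predict("this movie was great".split()))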
The embeddings were learned using a -large data set of user search sessions, consisting of search queries, clicked -ads and search links, while utilizing contextual information such as dwell time -and skipped ads. To address the large-scale nature of our problem, both in -terms of data and vocabulary size, we propose a novel distributed algorithm for -training of the embeddings. Finally, we present an approach for overcoming a -cold-start problem associated with new ads and queries. We report results of -editorial evaluation and online tests on actual search traffic. The results -show that our approach significantly outperforms baselines in terms of -relevance, coverage, and incremental revenue. Lastly, we open-source learned -query embeddings to be used by researchers in computational advertising and -related fields. -" -3155,1607.01958,"Joshi Kalyani, Prof. H. N. Bharathi, Prof. Rao Jyothi",Stock trend prediction using news sentiment analysis,cs.CL cs.IR cs.LG," The Efficient Market Hypothesis is the popular theory about stock prediction. -With its failure, much research has been carried out in the area of stock -prediction. This project is about taking non-quantifiable data such as financial -news articles about a company and predicting its future stock trend with news -sentiment classification. Assuming that news articles have an impact on the -stock market, this is an attempt to study the relationship between news and -stock trend. To show this, we created three different classification models -which depict the polarity of news articles as being positive or negative. -Observations show that RF and SVM perform well in all types of testing. -Na\""ive Bayes gives good results, but not comparable to the other two. -Experiments are conducted to evaluate various aspects of the proposed model and -encouraging results are obtained in all of the experiments. The accuracy of the -prediction model is more than 80%; compared with random labeling of news at 50% -accuracy, the model increases the accuracy by 30%. -" -3156,1607.01963,Liang Lu,Sequence Training and Adaptation of Highway Deep Neural Networks,cs.CL cs.LG cs.NE," Highway deep neural network (HDNN) is a type of depth-gated feedforward -neural network, which has been shown to be easier to train with more hidden -layers and also to generalise better compared to conventional plain deep neural -networks (DNNs). Previously, we investigated a structured HDNN architecture for -speech recognition, in which the two gate functions were tied across all the -hidden layers, and we were able to train a much smaller model without -sacrificing the recognition accuracy. In this paper, we carry on the study of -this architecture with sequence-discriminative training criteria and speaker -adaptation techniques on the AMI meeting speech recognition corpus. We show -that these two techniques improve speech recognition accuracy on top of the -model trained with the cross entropy criterion. Furthermore, we demonstrate -that the two gate functions that are tied across all the hidden layers are able -to control the information flow over the whole network, and we can achieve -considerable improvements by only updating these gate functions in both -sequence training and adaptation experiments. -" -3157,1607.01990,"N\'uria Bel, Mikel L. 
Forcada and Asunci\'on G\'omez-P\'erez","A Maturity Model for Public Administration as Open Translation Data - Providers",cs.CY cs.CL," Any public administration that produces translation data can be a provider of -useful reusable data to meet its own translation needs and those of other -public organizations and private companies that work with texts of the same -domain. These data can also be crucial to produce domain-tuned Machine -Translation systems. The organization's management of the translation process, -the characteristics of the archives of the generated resources and of the -infrastructure available to support them determine the efficiency and the -effectiveness with which the materials produced can be converted into reusable -data. However, it is of utmost importance that the organizations themselves -first become aware of the goods they are producing and, second, adapt their -internal processes to become optimal providers. In this article, we propose a -Maturity Model to help these organizations achieve this by identifying the -different stages of the management of translation data that determine the path -to the aforementioned goal. -" -3158,1607.02061,"Emmanuele Chersoni, Enrico Santus, Alessandro Lenci, Philippe Blache - and Chu-Ren Huang",Representing Verbs with Rich Contexts: an Evaluation on Verb Similarity,cs.CL cs.AI," Several studies on sentence processing suggest that the mental lexicon keeps -track of the mutual expectations between words. Current DSMs, however, -represent context words as separate features, thereby losing important -information for word expectations, such as word interrelations. In this paper, -we present a DSM that addresses this issue by defining verb contexts as joint -syntactic dependencies. We test our representation in a verb similarity task on -two datasets, showing that joint contexts achieve performances comparable to -single dependencies or even better. Moreover, they are able to overcome the -data sparsity problem of joint feature spaces, in spite of the limited size of -our training corpus. -" -3159,1607.02109,John J. Nay,"Predicting and Understanding Law-Making with Word Vectors and an - Ensemble Model",cs.CL physics.soc-ph stat.AP stat.ML," Out of nearly 70,000 bills introduced in the U.S. Congress from 2001 to 2015, -only 2,513 were enacted. We developed a machine learning approach to -forecasting the probability that any bill will become law. Starting in 2001 -with the 107th Congress, we trained models on data from previous Congresses, -predicted all bills in the current Congress, and repeated until the 113th -Congress served as the test. For prediction we scored each sentence of a bill -with a language model that embeds legislative vocabulary into a -high-dimensional, semantic-laden vector space. This language representation -enables our investigation into which words increase the probability of -enactment for any topic. To test the relative importance of text and context, -we compared the text model to a context-only model that uses variables such as -whether the bill's sponsor is in the majority party. To test the effect of -changes to bills after their introduction on our ability to predict their final -outcome, we compared using the bill text and meta-data available at the time of -introduction with using the most recent data. At the time of introduction -context-only predictions outperform text-only, and with the newest data -text-only outperforms context-only. Combining text and context always performs -best. 
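A small sketch of the text-plus-context combination just described in 1607.02109: concatenate text-derived features with contextual variables and fit one classifier. The data below is synthetic, and the paper uses an ensemble rather than a single logistic regression; this only shows the combination idea:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 200
    text_feats = rng.normal(size=(n, 50))            # e.g. sentence scores per bill
    context_feats = rng.integers(0, 2, size=(n, 3))  # e.g. sponsor-in-majority flags
    y = rng.integers(0, 2, size=n)                   # enacted or not (synthetic)

    X = np.hstack([text_feats, context_feats])       # text + context features
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict_proba(X[:3]))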
We conducted a global sensitivity analysis on the combined model to -determine important variables predicting enactment. -" -3160,1607.02250,"Yiming Cui, Ting Liu, Zhipeng Chen, Shijin Wang and Guoping Hu","Consensus Attention-based Neural Networks for Chinese Reading - Comprehension",cs.CL cs.NE," Reading comprehension has seen a boom in recent NLP research. Several -institutes have released Cloze-style reading comprehension data, and these -have greatly accelerated the research of machine comprehension. In this work, -we first present Chinese reading comprehension datasets, which consist of -the People Daily news dataset and the Children's Fairy Tale (CFT) dataset. Also, we -propose a consensus attention-based neural network architecture to tackle the -Cloze-style reading comprehension problem, which aims to induce a consensus -attention over every word in the query. Experimental results show that the -proposed neural network significantly outperforms the state-of-the-art -baselines on several public datasets. Furthermore, we set up a baseline for -the Chinese reading comprehension task, and hopefully this will speed up the -process for future research. -" -3161,1607.02310,Tamara Polajnar,"Collaborative Training of Tensors for Compositional Distributional - Semantics",cs.CL cs.LG," Type-based compositional distributional semantic models present an -interesting line of research into functional representations of linguistic -meaning. One of the drawbacks of such models, however, is the lack of training -data required to train each word-type combination. In this paper we address -this by introducing training methods that share parameters between similar -words. We show that these methods enable zero-shot learning for words that have -no training data at all, as well as enabling construction of high-quality -tensors from very few training examples per word. -" -3162,1607.02355,"Aurangzeb khan, Khairullah khan, Shakeel Ahmad, Fazal Masood Kundi, - Irum Tareen, Muhammad Zubair Asghar",Lexical Based Semantic Orientation of Online Customer Reviews and Blogs,cs.CL cs.IR," Rapid increase in internet users along with the growing power of online review -sites and social media has given birth to sentiment analysis or opinion mining, -which aims at determining what other people think and comment. Sentiments or -opinions contain publicly generated content about products, services, policies -and politics. People are usually interested in seeking positive and negative -opinions containing likes and dislikes, shared by users for features of a -particular product or service. This paper proposes a sentence-level, -lexicon-based, domain-independent sentiment classification method for different -types of data such as reviews and blogs. The proposed method is based on -general lexicons, i.e. WordNet, SentiWordNet, and user-defined lexical -dictionaries for semantic orientation. The relations and glosses of these -dictionaries provide a solution to the domain portability problem. The method -performs better than word- and text-level corpus-based machine learning methods -for semantic orientation. The results show that the proposed method performs -well, with precision of 87% and 83% at the document and sentence levels -respectively for online comments. -" -3163,1607.02436,Rocco Tripodi and Marcello Pelillo,Document Clustering Games in Static and Dynamic Scenarios,cs.AI cs.CL cs.GT," In this work we propose a game theoretic model for document clustering. 
Each -document to be clustered is represented as a player and each cluster as a -strategy. The players receive a reward for interacting with other players, which -they try to maximize by choosing their best strategies. The geometry of the data -is modeled with a weighted graph that encodes the pairwise similarity among -documents, so that similar players are constrained to choose similar -strategies, updating their strategy preferences at each iteration of the games. -We used different approaches to find the prototypical elements of the clusters -and with this information we divided the players into two disjoint sets, one -collecting players with a definite strategy and the other one collecting -players that try to learn from others the correct strategy to play. The latter -set of players can be considered as new data points that have to be clustered -according to previous information. This representation is useful in scenarios -in which the data are streamed continuously. The evaluation of the system was -conducted on 13 document datasets using different settings. It shows that the -proposed method performs well compared to different document clustering -algorithms. -" -3164,1607.02467,"Marc Dymetman, Chunyang Xiao","Log-Linear RNNs: Towards Recurrent Neural Networks with Flexible Prior - Knowledge",cs.AI cs.CL cs.LG cs.NE," We introduce LL-RNNs (Log-Linear RNNs), an extension of Recurrent Neural -Networks that replaces the softmax output layer by a log-linear output layer, -of which the softmax is a special case. This conceptually simple move has two -main advantages. First, it allows the learner to combat training data sparsity -by allowing it to model words (or more generally, output symbols) as complex -combinations of attributes without requiring that each combination is directly -observed in the training data (as the softmax does). Second, it permits the -inclusion of flexible prior knowledge in the form of a priori specified modular -features, where the neural network component learns to dynamically control the -weights of a log-linear distribution exploiting these features. - We conduct experiments in the domain of language modelling of French that -exploit morphological prior knowledge and show an important decrease in -perplexity relative to a baseline RNN. - We provide other motivating illustrations, and finally argue that the -log-linear and the neural-network components contribute complementary strengths -to the LL-RNN: the LL aspect allows the model to incorporate rich prior -knowledge, while the NN aspect, according to the ""representation learning"" -paradigm, allows the model to discover novel combinations of characteristics. -" -3165,1607.02501,"Adithya Rao, Nemanja Spasojevic","Actionable and Political Text Classification using Word Embeddings and - LSTM",cs.CL cs.IR," In this work, we apply word embeddings and neural networks with Long -Short-Term Memory (LSTM) to text classification problems, where the -classification criteria are decided by the context of the application. We -examine two applications in particular. The first is that of Actionability, -where we build models to classify social media messages from customers of -service providers as Actionable or Non-Actionable. We build models for over 30 -different languages for actionability, and most of the models achieve accuracy -around 85%, with some reaching over 90% accuracy. We also show that using LSTM -neural networks with word embeddings vastly outperforms traditional techniques. 
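To make the log-linear output layer of 1607.02467 above concrete, a minimal sketch; the feature matrices below are invented, and with one-hot features the layer reduces to an ordinary softmax, exactly as the abstract states:

    import numpy as np

    def log_linear_output(h, F, V):
        """p(y) proportional to exp(f(y) . (V h)) where f(y) is a row of F.

        h: hidden state of the RNN, shape (d,)
        F: (n_symbols, n_features) a-priori feature matrix
        V: (n_features, d) maps the hidden state to feature weights
        """
        scores = F @ (V @ h)
        scores -= scores.max()          # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    rng = np.random.default_rng(0)
    d, n_feat, n_sym = 4, 5, 3
    h = rng.normal(size=d)
    V = rng.normal(size=(n_feat, d))

    # With one-hot features per symbol this reduces to an ordinary softmax.
    F_softmax = np.eye(n_sym, n_feat)
    # With shared (e.g. morphological) features, unseen attribute combinations
    # can borrow strength from observed ones.
    F_shared = rng.integers(0, 2, size=(n_sym, n_feat)).astype(float)

    print(log_linear_output(h, F_softmax, V))
    print(log_linear_output(h, F_shared, V))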
-Second, we explore classification of messages with respect to political -leaning, where social media messages are classified as Democratic or -Republican. The model is able to classify messages with a high accuracy of -87.57%. As part of our experiments, we vary different hyperparameters of the -neural networks, and report the effect of such variation on the accuracy. These -actionability models have been deployed to production and help company agents -provide customer support by prioritizing which messages to respond to. The -model for political leaning has been open-sourced and made available for wider use. -" -3166,1607.02576,K Paramesha and K C Ravishankar,Analysis of opinionated text for opinion mining,cs.CL cs.AI cs.IR," In sentiment analysis, the polarities of the opinions expressed on an -object/feature are determined to assess whether the sentiment of a sentence or -document is positive/negative/neutral. Naturally, the object/feature is a -noun representation which refers to a product or a component of a product, let -us say, the ""lens"" in a camera, and opinions emanating from it are captured in -adjectives, verbs, adverbs and noun words themselves. Apart from such words, -other meta-information and diverse effective features are also going to play an -important role in influencing the sentiment polarity and contribute -significantly to the performance of the system. In this paper, some of the -associated information/meta-data are explored and investigated in sentiment -text. Based on the analysis results presented here, there is scope for further -assessment and utilization of the meta-information as features in text -categorization, ranking of text documents, identification of spam documents and -polarity classification problems. -" -3167,1607.02769,Gitit Kehat and James Pustejovsky,Annotation Methodologies for Vision and Language Dataset Creation,cs.CV cs.CL," Annotated datasets are commonly used in the training and evaluation of tasks -involving natural language and vision (image description generation, action -recognition and visual question answering). However, many of the existing -datasets reflect problems that emerge in the process of data selection and -annotation. Here we point out some of the difficulties and problems one -confronts when creating and validating annotated vision and language datasets. -" -3168,1607.02784,Duc-Thuan Vo and Ebrahim Bagheri,Open Information Extraction,cs.CL cs.AI," Open Information Extraction (Open IE) systems aim to obtain relation tuples -with highly scalable extraction that is portable across domains, by identifying -a variety of relation phrases and their arguments in arbitrary sentences. The -first generation of Open IE learns linear chain models based on unlexicalized -features such as Part-of-Speech (POS) or shallow tags to label the intermediate -words between pairs of potential arguments for identifying extractable -relations. The second generation of Open IE is able to extract instances of the -most frequently observed relation types such as Verb, Noun and Prep, Verb and -Prep, and Infinitive, with deep linguistic analysis. These systems expose simple -yet principled ways in which verbs express relationships in linguistics, such as -verb phrase-based extraction or clause-based extraction, and obtain -significantly higher performance than first-generation systems. In this paper, -we present an overview of the two Open IE generations, including their -strengths, weaknesses and application areas. 
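A toy sketch in the spirit of the shallow-pattern extraction just described: the patterns are simplified to a verb plus an optional preposition, and the POS-tagged input is assumed to come from an upstream tagger (real Open IE systems are far more careful about argument boundaries):

    # Toy verb-pattern relation extractor over POS-tagged tokens.
    def extract_triples(tagged):
        triples = []
        for i, (w, t) in enumerate(tagged):
            if t.startswith("VB"):                            # relation head: a verb
                rel = [w]
                j = i + 1
                if j < len(tagged) and tagged[j][1] == "IN":  # optional preposition
                    rel.append(tagged[j][0])
                    j += 1
                left = [w2 for w2, t2 in tagged[:i] if t2.startswith("NN")]
                right = [w2 for w2, t2 in tagged[j:] if t2.startswith("NN")]
                if left and right:
                    triples.append((left[-1], " ".join(rel), right[0]))
        return triples

    sent = [("Berlin", "NNP"), ("is", "VBZ"), ("located", "VBN"),
            ("in", "IN"), ("Germany", "NNP")]
    # Prints both a bare-verb and a verb+preposition pattern:
    # [('Berlin', 'is', 'Germany'), ('Berlin', 'located in', 'Germany')]
    print(extract_triples(sent))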
-" -3169,1607.02789,"John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu",Charagram: Embedding Words and Sentences via Character n-grams,cs.CL," We present Charagram embeddings, a simple approach for learning -character-based compositional models to embed textual sequences. A word or -sentence is represented using a character n-gram count vector, followed by a -single nonlinear transformation to yield a low-dimensional embedding. We use -three tasks for evaluation: word similarity, sentence similarity, and -part-of-speech tagging. We demonstrate that Charagram embeddings outperform -more complex architectures based on character-level recurrent and convolutional -neural networks, achieving new state-of-the-art performance on several -similarity tasks. -" -3170,1607.02791,"Kevin Shu, Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde - Marcolli",Syntactic Phylogenetic Trees,cs.CL," In this paper we identify several serious problems that arise in the use of -syntactic data from the SSWL database for the purpose of computational -phylogenetic reconstruction. We show that the most naive approach fails to -produce reliable linguistic phylogenetic trees. We identify some of the sources -of the observed problems and we discuss how they may be, at least partly, -corrected by using additional information, such as prior subdivision into -language families and subfamilies, and a better use of the information about -ancient languages. We also describe how the use of phylogenetic algebraic -geometry can help in estimating to what extent the probability distribution at -the leaves of the phylogenetic tree obtained from the SSWL data can be -considered reliable, by testing it on phylogenetic trees established by other -forms of linguistic analysis. In simple examples, we find that, after -restricting to smaller language subfamilies and considering only those SSWL -parameters that are fully mapped for the whole subfamily, the SSWL data match -extremely well reliable phylogenetic trees, according to the evaluation of -phylogenetic invariants. This is a promising sign for the use of SSWL data for -linguistic phylogenetics. -" -3171,1607.02802,Franck Dernoncourt,Mapping distributional to model-theoretic semantic spaces: a baseline,cs.CL cs.AI stat.ML," Word embeddings have been shown to be useful across state-of-the-art systems -in many natural language processing tasks, ranging from question answering -systems to dependency parsing. (Herbelot and Vecchi, 2015) explored word -embeddings and their utility for modeling language semantics. In particular, -they presented an approach to automatically map a standard distributional -semantic space onto a set-theoretic model using partial least squares -regression. We show in this paper that a simple baseline achieves a +51% -relative improvement compared to their model on one of the two datasets they -used, and yields competitive results on the second dataset. -" -3172,1607.02810,"Mahnoosh Kholghi, Lance De Vine, Laurianne Sitbon, Guido Zuccon, - Anthony Nguyen","The Benefits of Word Embeddings Features for Active Learning in Clinical - Information Extraction",cs.CL," This study investigates the use of unsupervised word embeddings and sequence -features for sample representation in an active learning framework built to -extract clinical concepts from clinical free text. The objective is to further -reduce the manual annotation effort while achieving higher effectiveness -compared to a set of baseline features. 
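As an aside on 1607.02789 above: a minimal sketch of a Charagram-style embedding, i.e. a character n-gram count vector followed by a single nonlinearity. The hashing into a fixed number of buckets, the boundary markers, and all sizes are our simplifications:

    import numpy as np

    rng = np.random.default_rng(0)
    buckets, dim = 2048, 32
    W = rng.normal(scale=0.1, size=(dim, buckets))  # single projection layer
    b = np.zeros(dim)

    def char_ngrams(text, n_min=2, n_max=4):
        s = "#" + text + "#"                         # boundary markers (an assumption)
        return [s[i:i + n] for n in range(n_min, n_max + 1)
                for i in range(len(s) - n + 1)]

    def charagram_embed(text):
        """Count character n-grams (hashed into buckets), then one nonlinearity."""
        counts = np.zeros(buckets)
        for g in char_ngrams(text.lower()):
            counts[hash(g) % buckets] += 1
        return np.tanh(W @ counts + b)

    v1, v2 = charagram_embed("vacation"), charagram_embed("vacations")
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    print(f"cosine(vacation, vacations) = {cos:.3f}")  # high: shared n-grams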
Unsupervised features are derived from -skip-gram word embeddings and a sequence representation approach. The -comparative performance of unsupervised features and baseline hand-crafted -features in an active learning framework is investigated using a wide range of -selection criteria including least confidence, information diversity, -information density and diversity, and domain knowledge informativeness. Two -clinical datasets are used for evaluation: the i2b2/VA 2010 NLP challenge and -the ShARe/CLEF 2013 eHealth Evaluation Lab. Our results demonstrate significant -improvements in terms of effectiveness as well as annotation effort savings -across both datasets. Using unsupervised features along with baseline features -for sample representation leads to further savings of up to 9% and 10% of the -token and concept annotation rates, respectively. -" -3173,1607.03055,Derek Greene and James P. Cross,"Exploring the Political Agenda of the European Parliament Using a - Dynamic Topic Modeling Approach",cs.CL cs.CY," This study analyzes the political agenda of the European Parliament (EP) -plenary, how it has evolved over time, and the manner in which Members of the -European Parliament (MEPs) have reacted to external and internal stimuli when -making plenary speeches. To unveil the plenary agenda and detect latent themes -in legislative speeches over time, MEP speech content is analyzed using a new -dynamic topic modeling method based on two layers of Non-negative Matrix -Factorization (NMF). This method is applied to a new corpus of all English -language legislative speeches in the EP plenary from the period 1999-2014. Our -findings suggest that two-layer NMF is a valuable alternative to existing -dynamic topic modeling approaches found in the literature, and can unveil niche -topics and associated vocabularies not captured by existing methods. -Substantively, our findings suggest that the political agenda of the EP evolves -significantly over time and reacts to exogenous events such as EU Treaty -referenda and the emergence of the Euro-crisis. MEP contributions to the -plenary agenda are also found to be impacted upon by voting behaviour and the -committee structure of the Parliament. -" -3174,1607.03316,Dirk Weissenborn,Separating Answers from Queries for Neural Reading Comprehension,cs.CL cs.NE," We present a novel neural architecture for answering queries, designed to -optimally leverage explicit support in the form of query-answer memories. Our -model is able to refine and update a given query while separately accumulating -evidence for predicting the answer. Its architecture reflects this separation -with dedicated embedding matrices and loosely connected information pathways -(modules) for updating the query and accumulating evidence. This separation of -responsibilities effectively decouples the search for query related support and -the prediction of the answer. On recent benchmark datasets for reading -comprehension, our model achieves state-of-the-art results. A qualitative -analysis reveals that the model effectively accumulates weighted evidence from -the query and over multiple support retrieval cycles which results in a robust -answer prediction. -" -3175,1607.03474,"Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutn\'ik and - J\""urgen Schmidhuber",Recurrent Highway Networks,cs.LG cs.CL cs.NE," Many sequential processing tasks require complex nonlinear transition -functions from one step to the next. 
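A compact sketch of the two-layer NMF idea from 1607.03055 above using scikit-learn, with random matrices standing in for the per-window document-term matrices; window counts and component numbers are arbitrary:

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    vocab_size, k_window, k_dynamic = 100, 5, 3

    # Layer 1: fit topics separately on each time window's doc-term matrix.
    window_topics = []
    for _ in range(4):                               # four synthetic time windows
        X = rng.random((30, vocab_size))             # stand-in for tf-idf counts
        model = NMF(n_components=k_window, init="nndsvda", max_iter=300)
        model.fit(X)
        window_topics.append(model.components_)      # (k_window, vocab_size)

    # Layer 2: stack all window topics and factorize again; the second-layer
    # topics link related window topics across time into dynamic topics.
    B = np.vstack(window_topics)
    layer2 = NMF(n_components=k_dynamic, init="nndsvda", max_iter=300)
    memberships = layer2.fit_transform(B)            # window topic -> dynamic topic
    print(memberships.argmax(axis=1))                # dynamic-topic assignment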
However, recurrent neural networks with -'deep' transition functions remain difficult to train, even when using Long -Short-Term Memory (LSTM) networks. We introduce a novel theoretical analysis of -recurrent networks based on Gersgorin's circle theorem that illuminates several -modeling and optimization issues and improves our understanding of the LSTM -cell. Based on this analysis we propose Recurrent Highway Networks, which -extend the LSTM architecture to allow step-to-step transition depths larger -than one. Several language modeling experiments demonstrate that the proposed -architecture results in powerful and efficient models. On the Penn Treebank -corpus, solely increasing the transition depth from 1 to 10 improves word-level -perplexity from 90.6 to 65.4 using the same number of parameters. On the larger -Wikipedia datasets for character prediction (text8 and enwik8), RHNs outperform -all previous results and achieve an entropy of 1.27 bits per character. -" -3176,1607.03542,Matt Gardner and Jayant Krishnamurthy,"Open-Vocabulary Semantic Parsing with both Distributional Statistics and - Formal Knowledge",cs.CL," Traditional semantic parsers map language onto compositional, executable -queries in a fixed schema. This mapping allows them to effectively leverage the -information contained in large, formal knowledge bases (KBs, e.g., Freebase) to -answer questions, but it is also fundamentally limiting---these semantic -parsers can only assign meaning to language that falls within the KB's -manually-produced schema. Recently proposed methods for open vocabulary -semantic parsing overcome this limitation by learning execution models for -arbitrary language, essentially using a text corpus as a kind of knowledge -base. However, all prior approaches to open vocabulary semantic parsing replace -a formal KB with textual information, making no use of the KB in their models. -We show how to combine the disparate representations used by these two -approaches, presenting for the first time a semantic parser that (1) produces -compositional, executable representations of language, (2) can successfully -leverage the information contained in both a formal KB and a large corpus, and -(3) is not limited to the schema of the underlying KB. We demonstrate -significantly improved performance over state-of-the-art baselines on an -open-domain natural language question answering task. -" -3177,1607.03707,"Hwiyeol Jo, Yohan Moon, Jong In Kim, and Jeong Ryu","Re-presenting a Story by Emotional Factors using Sentimental Analysis - Method",cs.CL cs.LG," Remembering an event is affected by personal emotional status. We examined -the psychological status and personal factors; depression (Center for -Epidemiological Studies - Depression, Radloff, 1977), present affective -(Positive Affective and Negative Affective Schedule, Watson et al., 1988), life -orient (Life Orient Test, Scheier & Carver, 1985), self-awareness (Core Self -Evaluation Scale, Judge et al., 2003), and social factor (Social Support, -Sarason et al., 1983) of undergraduate students (N=64) and got summaries of a -story, Chronicle of a Death Foretold (Gabriel Garcia Marquez, 1981) from them. -We implement a sentimental analysis model based on convolutional neural network -(LeCun & Bengio, 1995) to evaluate each summary. From the same vein used for -transfer learning (Pan & Yang, 2010), we collected 38,265 movie review data to -train the model and then use them to score summaries of each student. 
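Returning to 1607.03474 above: a minimal sketch of a single Recurrent Highway Network time step with recurrence depth greater than one. Biases are omitted and the carry gate is coupled as c = 1 - t, one common configuration, so details may differ from the paper's exact setup:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def rhn_step(x, s, Wh, Wt, Rh, Rt, depth):
        """One RHN time step: `depth` stacked highway micro-layers.

        Only the first micro-layer receives the input x; each micro-layer
        either transforms the state or carries it through unchanged.
        """
        for l in range(depth):
            inp_h = Wh @ x if l == 0 else 0.0
            inp_t = Wt @ x if l == 0 else 0.0
            h = np.tanh(inp_h + Rh[l] @ s)     # candidate state
            t = sigmoid(inp_t + Rt[l] @ s)     # transform gate
            s = h * t + s * (1.0 - t)          # highway: transform or carry
        return s

    rng = np.random.default_rng(0)
    d, depth = 8, 5
    x, s = rng.normal(size=d), np.zeros(d)
    Wh = rng.normal(scale=0.1, size=(d, d))
    Wt = rng.normal(scale=0.1, size=(d, d))
    Rh = rng.normal(scale=0.1, size=(depth, d, d))
    Rt = rng.normal(scale=0.1, size=(depth, d, d))
    print(rhn_step(x, s, Wh, Wt, Rh, Rt, depth))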
The -results of CES-D and PANAS show the relationship between emotion and memory -retrieval as follows: depressed people showed a tendency to represent a -story more negatively, and they seemed less expressive. People full of -emotion - high in PANAS - retrieved their memory more expressively than -others, using more negative words than others. The contributions of this study -can be summarized as follows: First, illuminating the relationship between -emotion and its effect during times of storing or retrieving a memory. Second, -suggesting objective methods to evaluate the intensity of emotion in natural -language format, using a sentimental analysis model. -" -3178,1607.03766,"Sebastian Sager and Benjamin Elizalde and Damian Borth and Christian - Schulze and Bhiksha Raj and Ian Lane","AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content - Analysis",cs.SD cs.CL," Recently, sound recognition has been used to identify sounds, such as car and -river. However, sounds have nuances that may be better described by -adjective-noun pairs such as slow car, and verb-noun pairs such as flying -insects, which are underexplored. Therefore, in this work we investigate the -relation between audio content and both adjective-noun pairs and verb-noun -pairs. Due to the lack of datasets with these kinds of annotations, we -collected and processed the AudioPairBank corpus consisting of a combined total -of 1,123 pairs and over 33,000 audio files. One contribution is the previously -unavailable documentation of the challenges and implications of collecting -audio recordings with these types of labels. A second contribution is to show -the degree of correlation between the audio content and the labels through -sound recognition experiments, which yielded results of 70% accuracy, hence -also providing a performance benchmark. The results and study in this paper -encourage further exploration of the nuances in audio and are meant to -complement similar research performed on images and text in multimedia -analysis. -" -3179,1607.03780,James Henderson and Diana Nicoleta Popa,A Vector Space for Distributional Semantics for Entailment,cs.CL cs.LG," Distributional semantics creates vector-space representations that capture -many forms of semantic similarity, but their relation to semantic entailment -has been less clear. We propose a vector-space model which provides a formal -foundation for a distributional semantics of entailment. Using a mean-field -approximation, we develop approximate inference procedures and entailment -operators over vectors of probabilities of features being known (versus -unknown). We use this framework to reinterpret an existing -distributional-semantic model (Word2Vec) as approximating an entailment-based -model of the distributions of words in contexts, thereby predicting lexical -entailment relations. In both unsupervised and semi-supervised experiments on -hyponymy detection, we get substantial improvements over previous results. -" -3180,1607.03827,"Matthias Plappert, Christian Mandery, Tamim Asfour",The KIT Motion-Language Dataset,cs.RO cs.CL cs.CV cs.LG," Linking human motion and natural language is of great interest for the -generation of semantic representations of human activities as well as for the -generation of robot activities based on natural language input. However, while -there have been years of research in this area, no standardized and openly -available dataset exists to support the development and evaluation of such -systems. 
We therefore propose the KIT Motion-Language Dataset, which is large, -open, and extensible. We aggregate data from multiple motion capture databases -and include them in our dataset using a unified representation that is -independent of the capture system or marker set, making it easy to work with -the data regardless of its origin. To obtain motion annotations in natural -language, we apply a crowd-sourcing approach and a web-based tool that was -specifically built for this purpose, the Motion Annotation Tool. We thoroughly -document the annotation process itself and discuss gamification methods that we -used to keep annotators motivated. We further propose a novel method, -perplexity-based selection, which systematically selects motions for further -annotation that are either under-represented in our dataset or that have -erroneous annotations. We show that our method mitigates the two aforementioned -problems and ensures a systematic annotation process. We provide an in-depth -analysis of the structure and contents of our resulting dataset, which, as of -October 10, 2016, contains 3911 motions with a total duration of 11.23 hours -and 6278 annotations in natural language that contain 52,903 words. We believe -this makes our dataset an excellent choice that enables more transparent and -comparable research in this important area. -" -3181,1607.03895,Liye Fu and Cristian Danescu-Niculescu-Mizil and Lillian Lee,"Tie-breaker: Using language models to quantify gender bias in sports - journalism",cs.CL physics.soc-ph," Gender bias is an increasingly important issue in sports journalism. In this -work, we propose a language-model-based approach to quantify differences in -questions posed to female vs. male athletes, and apply it to tennis post-match -interviews. We find that journalists ask male players questions that are -generally more focused on the game when compared with the questions they ask -their female counterparts. We also provide a fine-grained analysis of the -extent to which the salience of this bias depends on various factors, such as -question type, game outcome or player rank. -" -3182,1607.04110,"Giulio Petrucci, Chiara Ghidini, Marco Rospocher",Using Recurrent Neural Network for Learning Expressive Ontologies,cs.CL cs.AI," Recently, Neural Networks have proven extremely effective in many -natural language processing tasks such as sentiment analysis, question -answering, or machine translation. Aiming to exploit such advantages in the -Ontology Learning process, in this technical report we present a detailed -description of a Recurrent Neural Network based system to be used to pursue -this goal. -" -3183,1607.04315,Tsendsuren Munkhdalai and Hong Yu,Neural Semantic Encoders,cs.LG cs.CL stat.ML," We present a memory augmented neural network for natural language -understanding: Neural Semantic Encoders. NSE is equipped with a novel memory -update rule and has a variable sized encoding memory that evolves over time and -maintains the understanding of input sequences through read, compose and write -operations. NSE can also access multiple and shared memories. In this paper, we -demonstrate the effectiveness and the flexibility of NSE on five different -natural language tasks: natural language inference, question answering, -sentence classification, document sentiment analysis and machine translation -where NSE achieved state-of-the-art performance when evaluated on publicly -available benchmarks. 
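As an aside on 1607.03895 above: a toy sketch of a language-model-based bias measure, here with add-one-smoothed unigram models over two invented micro-corpora of interview questions; the paper's models and data are far richer:

    from collections import Counter
    import math

    def unigram_lm(corpus):
        """Add-one smoothed unigram log-probabilities from a token list."""
        counts = Counter(corpus)
        total, vocab = len(corpus), len(counts) + 1
        return lambda w: math.log((counts[w] + 1) / (total + vocab))

    # Tiny synthetic corpora of questions to male vs. female players.
    to_male = "how did the match go how was your serve today".split()
    to_female = "who designed your outfit how do you balance family".split()

    lm_m, lm_f = unigram_lm(to_male), unigram_lm(to_female)

    def bias_score(question):
        """Log-likelihood ratio: > 0 means more typical of questions to men."""
        return sum(lm_m(w) - lm_f(w) for w in question.split())

    print(bias_score("how was the match"))
    print(bias_score("who designed your outfit"))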
For example, our shared-memory model showed an -encouraging result on neural machine translation, improving an attention-based -baseline by approximately 1.0 BLEU. -" -3184,1607.04423,"Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu and Guoping Hu",Attention-over-Attention Neural Networks for Reading Comprehension,cs.CL cs.NE," Cloze-style queries are representative problems in reading comprehension. -Over the past few months, we have seen much progress in utilizing neural -network approaches to solve Cloze-style questions. In this paper, we present a -novel model called the attention-over-attention reader for the Cloze-style reading -comprehension task. Our model aims to place another attention mechanism over -the document-level attention, and induces ""attended attention"" for final -predictions. Unlike previous works, our neural network model requires fewer -pre-defined hyper-parameters and uses an elegant architecture for modeling. -Experimental results show that the proposed attention-over-attention model -significantly outperforms various state-of-the-art systems by a large margin on -public datasets, such as the CNN and Children's Book Test datasets. -" -3185,1607.04492,Tsendsuren Munkhdalai and Hong Yu,Neural Tree Indexers for Text Understanding,cs.CL cs.LG stat.ML," Recurrent neural networks (RNNs) process input text sequentially and model -the conditional transition between word tokens. In contrast, the advantages of -recursive networks include that they explicitly model the compositionality and -the recursive structure of natural language. However, the current recursive -architecture is limited by its dependence on syntactic trees. In this paper, we -introduce a robust syntactic parsing-independent tree structured model, Neural -Tree Indexers (NTI), that provides a middle ground between the sequential RNNs -and the syntactic tree-based recursive models. NTI constructs a full n-ary tree -by processing the input text with its node function in a bottom-up fashion. -An attention mechanism can then be applied to both structure and node function. We -implemented and evaluated a binary-tree model of NTI, showing that the model achieved -state-of-the-art performance on three different NLP tasks: natural language -inference, answer sentence selection, and sentence classification, -outperforming state-of-the-art recurrent and recursive neural networks. -" -3186,1607.04576,"John M. Pierre, Mark Butler, Jacob Portnoff, and Luis Aguilar",Neural Discourse Modeling of Conversations,cs.CL cs.NE," Deep neural networks have shown recent promise in many language-related tasks -such as the modeling of conversations. We extend RNN-based sequence to sequence -models to capture the long range discourse across many turns of conversation. -We perform a sensitivity analysis on how much additional context affects -performance, and provide quantitative and qualitative evidence that these -models are able to capture discourse relationships across multiple utterances. -Our results quantify how adding an additional RNN layer for modeling -discourse improves the quality of output utterances, and show that providing more -of the previous conversation as input also improves performance. By searching the -generated outputs for specific discourse markers we show how neural discourse -models can exhibit increased coherence and cohesion in conversations. 
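A sketch of our reading of the attention-over-attention mechanism in 1607.04423 above; the encoded document and query states are assumed given, and the dimensions are arbitrary:

    import numpy as np

    def softmax(z, axis):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention_over_attention(D, Q):
        """D: (n_doc, d) document states; Q: (n_q, d) query states.

        Returns one weight per document position: per-query-word document
        attentions ('attention') are combined using a query-word importance
        distribution (the 'attention over attention').
        """
        M = D @ Q.T                      # pairwise matching scores (n_doc, n_q)
        doc_att = softmax(M, axis=0)     # per query word: distribution over doc
        q_att = softmax(M, axis=1)       # per doc word: distribution over query
        q_importance = q_att.mean(axis=0)      # averaged query-level attention
        return doc_att @ q_importance          # (n_doc,) attended attention

    rng = np.random.default_rng(0)
    print(attention_over_attention(rng.normal(size=(7, 4)),
                                   rng.normal(size=(3, 4))))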
-" -3187,1607.04606,"Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov",Enriching Word Vectors with Subword Information,cs.CL cs.LG," Continuous word representations, trained on large unlabeled corpora are -useful for many natural language processing tasks. Popular models that learn -such representations ignore the morphology of words, by assigning a distinct -vector to each word. This is a limitation, especially for languages with large -vocabularies and many rare words. In this paper, we propose a new approach -based on the skipgram model, where each word is represented as a bag of -character $n$-grams. A vector representation is associated to each character -$n$-gram; words being represented as the sum of these representations. Our -method is fast, allowing to train models on large corpora quickly and allows us -to compute word representations for words that did not appear in the training -data. We evaluate our word representations on nine different languages, both on -word similarity and analogy tasks. By comparing to recently proposed -morphological word representations, we show that our vectors achieve -state-of-the-art performance on these tasks. -" -3188,1607.04660,Victor Andrei and Ognjen Arandjelovic,"Identification of promising research directions using machine learning - aided medical literature analysis",cs.CL cs.IR," The rapidly expanding corpus of medical research literature presents major -challenges in the understanding of previous work, the extraction of maximum -information from collected data, and the identification of promising research -directions. We present a case for the use of advanced machine learning -techniques as an aide in this task and introduce a novel methodology that is -shown to be capable of extracting meaningful information from large -longitudinal corpora, and of tracking complex temporal changes within it. -" -3189,1607.04683,"Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin",On the efficient representation and execution of deep acoustic models,cs.LG cs.CL," In this paper we present a simple and computationally efficient quantization -scheme that enables us to reduce the resolution of the parameters of a neural -network from 32-bit floating point values to 8-bit integer values. The proposed -quantization scheme leads to significant memory savings and enables the use of -optimized hardware instructions for integer arithmetic, thus significantly -reducing the cost of inference. Finally, we propose a ""quantization aware"" -training process that applies the proposed scheme during network training and -find that it allows us to recover most of the loss in accuracy introduced by -quantization. We validate the proposed techniques by applying them to a long -short-term memory-based acoustic model on an open-ended large vocabulary speech -recognition task. -" -3190,1607.04853,Anirban Laha and Vikas Raykar,"An Empirical Evaluation of various Deep Learning Architectures for - Bi-Sequence Classification Tasks",cs.CL," Several tasks in argumentation mining and debating, question-answering, and -natural language inference involve classifying a sequence in the context of -another sequence (referred as bi-sequence classification). For several single -sequence classification tasks, the current state-of-the-art approaches are -based on recurrent and convolutional neural networks. On the other hand, for -bi-sequence classification problems, there is not much understanding as to the -best deep learning architecture. 
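To make the quantization scheme of 1607.04683 above concrete, a sketch of plain affine 8-bit quantization; the paper's exact scheme and its quantization-aware training are not reproduced here:

    import numpy as np

    def quantize_uint8(w):
        """Affine 8-bit quantization: map [min, max] linearly onto 0..255."""
        lo, hi = float(w.min()), float(w.max())
        scale = (hi - lo) / 255.0 or 1.0          # avoid zero scale
        q = np.round((w - lo) / scale).astype(np.uint8)
        return q, scale, lo

    def dequantize(q, scale, lo):
        return q.astype(np.float32) * scale + lo

    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4)).astype(np.float32)
    q, scale, lo = quantize_uint8(w)
    err = np.abs(w - dequantize(q, scale, lo)).max()
    print(f"max reconstruction error: {err:.4f}")  # bounded by about scale/2

This is the memory-saving half of the story; the accuracy recovery the abstract describes comes from applying such a scheme during training.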
In this paper, we attempt to get an -understanding of this category of problems by extensive empirical evaluation of -19 different deep learning architectures (specifically on different ways of -handling context) for various problems originating in natural language -processing like debating, textual entailment and question-answering. Following -the empirical evaluation, we offer our insights and conclusions regarding the -architectures we have considered. We also establish the first deep learning -baselines for three argumentation mining tasks. -" -3191,1607.04982,Juntao Yu and Bernd Bohnet,Dependency Language Models for Transition-based Dependency Parsing,cs.CL," In this paper, we present an approach to improve the accuracy of a strong -transition-based dependency parser by exploiting dependency language models -that are extracted from a large parsed corpus. We integrated a small number of -features based on the dependency language models into the parser. To -demonstrate the effectiveness of the proposed approach, we evaluate our parser -on standard English and Chinese data where the base parser could achieve -competitive accuracy scores. Our enhanced parser achieved state-of-the-art -accuracy on Chinese data and competitive results on English data. We gained a -large absolute improvement of one point (UAS) on Chinese and 0.5 points for -English. -" -3192,1607.05014,Steffen Eger and Armin Hoenen and Alexander Mehler,Language classification from bilingual word embedding graphs,cs.CL," We study the role of the second language in bilingual word embeddings in -monolingual semantic evaluation tasks. We find strongly and weakly positive -correlations between down-stream task performance and second language -similarity to the target language. Additionally, we show how bilingual word -embeddings can be employed for the task of semantic language classification and -that joint semantic spaces vary in meaningful ways across second languages. Our -results support the hypothesis that semantic language similarity is influenced -by both structural similarity as well as geography/contact. -" -3193,1607.05108,"Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, Alex Smola",Neural Machine Translation with Recurrent Attention Modeling,cs.NE cs.CL," Knowing which words have been attended to in previous time steps while -generating a translation is a rich source of information for predicting what -words will be attended to in the future. We improve upon the attention model of -Bahdanau et al. (2014) by explicitly modeling the relationship between previous -and subsequent attention levels for each word using one recurrent network per -input word. This architecture easily captures informative features, such as -fertility and regularities in relative distortion. In experiments, we show our -parameterization of attention improves translation quality. -" -3194,1607.05142,"Matthias Galle, Jean-Michel Renders, Guillaume Jacquet",Joint Event Detection and Entity Resolution: a Virtuous Cycle,cs.CL," Clustering web documents has numerous applications, such as aggregating news -articles into meaningful events, detecting trends and hot topics on the Web, -preserving diversity in search results, etc. At the same time, the importance -of named entities and, in particular, the ability to recognize them and to -solve the associated co-reference resolution problem are widely recognized as -key enabling factors when mining, aggregating and comparing content on the Web. 
- Instead of considering these two problems separately, we propose in this -paper a method that jointly tackles the problem of clustering news articles -into events and cross-document co-reference resolution of named entities. The -co-occurrence of named entities in the same clusters is used as an additional -signal to decide whether two referents should be merged into one entity. These -refined entities can in turn be used as enhanced features to re-cluster the -documents and then be refined again, entering into a virtuous cycle that -simultaneously improves the performance of both tasks. We implemented a -prototype system and report results using the TDT5 collection of news articles, -demonstrating the potential of our approach. -" -3195,1607.05174,Roger K. Moore,"Is spoken language all-or-nothing? Implications for future speech-based - human-machine interaction",cs.HC cs.AI cs.CL cs.RO," Recent years have seen significant market penetration for voice-based -personal assistants such as Apple's Siri. However, despite this success, user -take-up is frustratingly low. This position paper argues that there is a -habitability gap caused by the inevitable mismatch between the capabilities and -expectations of human users and the features and benefits provided by -contemporary technology. Suggestions are made as to how such problems might be -mitigated, but a more worrisome question emerges: ""is spoken language -all-or-nothing""? The answer, based on contemporary views on the special nature -of (spoken) language, is that there may indeed be a fundamental limit to the -interaction that can take place between mismatched interlocutors (such as -humans and machines). However, it is concluded that interactions between native -and non-native speakers, or between adults and children, or even between humans -and dogs, might provide critical inspiration for the design of future -speech-based human-machine interaction. -" -3196,1607.05241,Khanh Nguyen,Imitation Learning with Recurrent Neural Networks,cs.CL cs.LG stat.ML," We present a novel view that unifies two frameworks that aim to solve -sequential prediction problems: learning to search (L2S) and recurrent neural -networks (RNN). We point out equivalences between elements of the two -frameworks. By complementing what is missing from one framework compared to -the other, we introduce a more advanced imitation learning framework that, on -one hand, augments L2S's notion of search space and, on the other hand, -enhances the RNN training procedure to be more robust to compounding errors -arising from training on highly correlated examples. -" -3197,1607.05368,"Jey Han Lau, Timothy Baldwin","An Empirical Evaluation of doc2vec with Practical Insights into Document - Embedding Generation",cs.CL," Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec -(Mikolov et al., 2013a) to learn document-level embeddings. Despite promising -results in the original paper, others have struggled to reproduce those -results. This paper presents a rigorous empirical evaluation of doc2vec over -two tasks. We compare doc2vec to two baselines and two state-of-the-art -document embedding methodologies. We found that doc2vec performs robustly when -using models trained on large external corpora, and can be further improved by -using pre-trained word embeddings. We also provide recommendations on -hyper-parameter settings for general purpose applications, and release source -code to induce document embeddings using our trained doc2vec models. 
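A minimal usage sketch for inducing document embeddings with gensim's doc2vec, in the spirit of the released code mentioned in 1607.05368 above; the corpus and hyperparameters below are illustrative only, not the paper's recommended settings:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [
        "the cat sat on the mat",
        "dogs and cats are pets",
        "stock markets fell sharply today",
    ]
    docs = [TaggedDocument(words=s.split(), tags=[i])
            for i, s in enumerate(corpus)]

    # Train a small doc2vec model (toy-sized settings).
    model = Doc2Vec(documents=docs, vector_size=32, min_count=1, epochs=50)

    # Infer an embedding for an unseen document.
    vec = model.infer_vector("my cat chased the dog".split())
    print(vec[:5])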
-" -3198,1607.05408,"Will Radford, Matthias Galle","Discriminating between similar languages in Twitter using label - propagation",cs.CL," Identifying the language of social media messages is an important first step -in linguistic processing. Existing models for Twitter focus on content -analysis, which is successful for dissimilar language pairs. We propose a label -propagation approach that takes the social graph of tweet authors into account -as well as content to better tease apart similar languages. This results in -state-of-the-art shared task performance of $76.63\%$, $1.4\%$ higher than the -top system. -" -3199,1607.05422,"Abhijit Adhikari, Shivang Singh, Deepjyoti Mondal, Biswanath Dutta, - Animesh Dutta","A Novel Information Theoretic Framework for Finding Semantic Similarity - in WordNet",cs.IR cs.CL," Information content (IC) based measures for finding semantic similarity is -gaining preferences day by day. Semantics of concepts can be highly -characterized by information theory. The conventional way for calculating IC is -based on the probability of appearance of concepts in corpora. Due to data -sparseness and corpora dependency issues of those conventional approaches, a -new corpora independent intrinsic IC calculation measure has evolved. In this -paper, we mainly focus on such intrinsic IC model and several topological -aspects of the underlying ontology. Accuracy of intrinsic IC calculation and -semantic similarity measure rely on these aspects deeply. Based on these -analysis we propose an information theoretic framework which comprises an -intrinsic IC calculator and a semantic similarity model. Our approach is -compared with state of the art semantic similarity measures based on corpora -dependent IC calculation as well as intrinsic IC based methods using several -benchmark data set. We also compare our model with the related Edge based, -Feature based and Distributional approaches. Experimental results show that our -intrinsic IC model gives high correlation value when applied to different -semantic similarity models. Our proposed semantic similarity model also -achieves significant results when embedded with some state of the art IC models -including ours. -" -3200,1607.05650,"Shanta Phani, Shibamouli Lahiri and Arindam Biswas",A Supervised Authorship Attribution Framework for Bengali Language,cs.CL cs.DL," Authorship Attribution is a long-standing problem in Natural Language -Processing. Several statistical and computational methods have been used to -find a solution to this problem. In this paper, we have proposed methods to -deal with the authorship attribution problem in Bengali. -" -3201,1607.05666,"Yuxuan Wang, Pascal Getreuer, Thad Hughes, Richard F. Lyon, Rif A. - Saurous",Trainable Frontend For Robust and Far-Field Keyword Spotting,cs.CL cs.NE," Robust and far-field speech recognition is critical to enable true hands-free -communication. In far-field conditions, signals are attenuated due to distance. -To improve robustness to loudness variation, we introduce a novel frontend -called per-channel energy normalization (PCEN). The key ingredient of PCEN is -the use of an automatic gain control based dynamic compression to replace the -widely used static (such as log or root) compression. We evaluate PCEN on the -keyword spotting task. On our large rerecorded noisy and far-field eval sets, -we show that PCEN significantly improves recognition performance. 
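A sketch of per-channel energy normalization as we understand it from the PCEN literature, using a first-order IIR smoother; the constants below are illustrative defaults, not the trained values the abstract goes on to describe:

    import numpy as np

    def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
        """Per-channel energy normalization of a mel spectrogram E (freq, time).

        A first-order IIR filter M smooths each channel; E is divided by
        M**alpha (the AGC stage), then offset-root compressed.
        """
        M = np.zeros_like(E)
        M[:, 0] = E[:, 0]
        for t in range(1, E.shape[1]):
            M[:, t] = (1 - s) * M[:, t - 1] + s * E[:, t]  # smoothed energy
        return (E / (eps + M) ** alpha + delta) ** r - delta ** r

    rng = np.random.default_rng(0)
    E = rng.random((40, 100)) ** 2       # stand-in for mel filterbank energies
    print(pcen(E).shape)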
Furthermore, -we model PCEN as neural network layers and optimize high-dimensional PCEN -parameters jointly with the keyword spotting acoustic model. The trained PCEN -frontend demonstrates significant further improvements without increasing model -complexity or inference-time cost. -" -3202,1607.05755,"Shanta Phani, Shibamouli Lahiri and Arindam Biswas",A New Bengali Readability Score,cs.CL," In this paper we have proposed methods to analyze the readability of Bengali -language texts. We have got some exceptionally good results out of the -experiments. -" -3203,1607.05809,"Kun Xiong, Anqi Cui, Zefeng Zhang, Ming Li","Neural Contextual Conversation Learning with Labeled Question-Answering - Pairs",cs.CL cs.AI," Neural conversational models tend to produce generic or safe responses in -different contexts, e.g., reply \textit{""Of course""} to narrative statements or -\textit{""I don't know""} to questions. In this paper, we propose an end-to-end -approach to avoid such problem in neural generative models. Additional memory -mechanisms have been introduced to standard sequence-to-sequence (seq2seq) -models, so that context can be considered while generating sentences. Three -seq2seq models, which memorize a fix-sized contextual vector from hidden input, -hidden input/output and a gated contextual attention structure respectively, -have been trained and tested on a dataset of labeled question-answering pairs -in Chinese. The model with contextual attention outperforms others including -the state-of-the-art seq2seq models on perplexity test. The novel contextual -model generates diverse and robust responses, and is able to carry out -conversations on a wide range of topics appropriately. -" -3204,1607.05818,"Ruey-Cheng Chen, Reid Swanson, and Andrew S. Gordon",An Adaptation of Topic Modeling to Sentences,cs.CL," Advances in topic modeling have yielded effective methods for characterizing -the latent semantics of textual data. However, applying standard topic modeling -approaches to sentence-level tasks introduces a number of challenges. In this -paper, we adapt the approach of latent-Dirichlet allocation to include an -additional layer for incorporating information about the sentence boundaries in -documents. We show that the addition of this minimal information of document -structure improves the perplexity results of a trained model. -" -3205,1607.05822,Ruey-Cheng Chen,"Incremental Learning for Fully Unsupervised Word Segmentation Using - Penalized Likelihood and Model Selection",cs.CL," We present a novel incremental learning approach for unsupervised word -segmentation that combines features from probabilistic modeling and model -selection. This includes super-additive penalties for addressing the cognitive -burden imposed by long word formation, and new model selection criteria based -on higher-order generative assumptions. Our approach is fully unsupervised; it -relies on a small number of parameters that permits flexible modeling and a -mechanism that automatically learns parameters from the data. Through -experimentation, we show that this intricate design has led to top-tier -performance in both phonemic and orthographic word segmentation. -" -3206,1607.05968,Michael Spranger and Jakob Suchan and Mehul Bhatt,"Robust Natural Language Processing - Combining Reasoning, Cognitive - Semantics and Construction Grammar for Spatial Language",cs.AI cs.CL," We present a system for generating and understanding of dynamic and static -spatial relations in robotic interaction setups. 
-Robots describe an environment of moving blocks using English phrases that
-include spatial relations such as ""across"" and ""in front of"". We evaluate the
-system in robot-robot interactions and show that the system can robustly deal
-with visual perception errors, language omissions and ungrammatical utterances.
-"
-3207,1607.06025,Janez Starc and Dunja Mladeni\'c,"Constructing a Natural Language Inference Dataset using Generative
- Neural Networks",cs.AI cs.CL cs.NE," Natural Language Inference is an important task for Natural Language
-Understanding. It is concerned with classifying the logical relation between
-two sentences. In this paper, we propose several text generative neural
-networks for generating text hypotheses, which allows the construction of new
-Natural Language Inference datasets. To evaluate the models, we propose a new
-metric -- the accuracy of the classifier trained on the generated dataset. The
-accuracy obtained by our best generative model is only 2.7% lower than the
-accuracy of the classifier trained on the original, human crafted dataset.
-Furthermore, the best generated dataset combined with the original dataset
-achieves the highest accuracy. The best model learns a mapping embedding for
-each training example. By comparing various metrics we show that datasets that
-obtain higher ROUGE or METEOR scores do not necessarily yield higher
-classification accuracies. We also provide an analysis of the characteristics
-of a good dataset, including the distinguishability of the generated datasets
-from the original one.
-"
-3208,1607.06153,"Marek Rei, Helen Yannakoudakis","Compositional Sequence Labeling Models for Error Detection in Learner
- Writing",cs.CL cs.NE," In this paper, we present the first experiments using neural network models
-for the task of error detection in learner writing. We perform a systematic
-comparison of alternative compositional architectures and propose a framework
-for error detection based on bidirectional LSTMs. Experiments on the CoNLL-14
-shared task dataset show that the model is able to outperform other
-participants in detecting errors in learner writing. Finally, the model is
-integrated with a publicly deployed self-assessment system, leading to
-performance comparable to human annotators.
-"
-3209,1607.06208,Xiaochang Peng and Daniel Gildea,Exploring phrase-compositionality in skip-gram models,cs.CL," In this paper, we introduce a variation of the skip-gram model which jointly
-learns distributed word vector representations and the way they compose to
-form phrase embeddings. In particular, we propose a learning procedure that
-incorporates a phrase-compositionality function which can capture how we want
-to compose phrase vectors from their component word vectors. Our experiments
-show improvement in word and phrase similarity tasks as well as syntactic tasks
-like dependency parsing using the proposed joint models.
-"
-3210,1607.06215,"Kaiye Wang, Qiyue Yin, Wei Wang, Shu Wu, Liang Wang",A Comprehensive Survey on Cross-modal Retrieval,cs.MM cs.CL cs.IR," In recent years, cross-modal retrieval has drawn much attention due to the
-rapid growth of multimodal data. It takes one type of data as the query to
-retrieve relevant data of another type. For example, a user can use a text to
-retrieve relevant pictures or videos. Since the query and its retrieved results
-can be of different modalities, how to measure the content similarity between
-different modalities of data remains a challenge.
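Once both modalities have been mapped into a real-valued common space (the first family of methods surveyed below), retrieval itself reduces to nearest-neighbour ranking. A minimal sketch, assuming hypothetical text and image embeddings already projected into the shared space:

```python
import numpy as np

def cross_modal_retrieve(query_vec, gallery, k=5):
    """Rank `gallery` rows (items of the other modality) by cosine
    similarity to `query_vec`; both live in the learned common space."""
    q = query_vec / np.linalg.norm(query_vec)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(G @ q))[:k]   # indices of the top-k matches
```

Binary (Hamming-space) methods replace the cosine ranking with XOR-and-popcount over bit codes, trading accuracy for speed and memory.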
-Various methods have been proposed to deal with such a problem. In this paper,
-we first review a number of representative methods for cross-modal retrieval
-and classify them into two main groups: 1) real-valued representation
-learning, and 2) binary representation learning. Real-valued representation
-learning methods aim to learn real-valued common representations for different
-modalities of data. To speed up the cross-modal retrieval, a number of binary
-representation learning methods are proposed to map different modalities of
-data into a common Hamming space. Then, we introduce several multimodal
-datasets in the community, and show the experimental results on two commonly
-used multimodal datasets. The comparison reveals the characteristics of
-different kinds of cross-modal retrieval methods, which is expected to benefit
-both practical applications and future research. Finally, we discuss open
-problems and future research directions.
-"
-3211,1607.06221,K Paramesha and K C Ravishankar,A Perspective on Sentiment Analysis,cs.CL," Sentiment Analysis (SA) is indeed a fascinating area of research which has
-captured the attention of researchers, as it has many facets and, more
-importantly, promises economic stakes in the corporate and governance sectors.
-SA stemmed out of text analytics and has established itself as a separate
-identity and a domain of research. The wide-ranging results of SA have proved
-to influence the way some critical decisions are taken. Hence, a thorough
-understanding of the different dimensions of the input, output and the
-processes and approaches of SA has become relevant.
-"
-3212,1607.06275,"Peng Li, Wei Li, Zhengyan He, Xuguang Wang, Ying Cao, Jie Zhou, Wei Xu","Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain
- Factoid Question Answering",cs.CL cs.AI cs.NE," While question answering (QA) with neural networks, i.e. neural QA, has
-achieved promising results in recent years, the lack of large-scale real-world
-QA datasets is still a challenge for developing and evaluating neural QA
-systems. To alleviate this problem, we propose a large scale human annotated
-real-world QA dataset WebQA with more than 42k questions and 556k evidences.
-As existing neural QA methods resolve QA either as a sequence generation or a
-classification/ranking problem, they face the challenges of expensive softmax
-computation, handling unseen answers, or requiring a separate candidate answer
-generation component. In this work, we cast neural QA as a sequence labeling
-problem and propose an end-to-end sequence labeling model, which overcomes all
-the above challenges. Experimental results on WebQA show that our model
-outperforms the baselines significantly with an F1 score of 74.69% with
-word-based input, and the performance drops only 3.72 F1 points with more
-challenging character-based input.
-"
-3213,1607.06299,"Janik Jaskolski, Fabian Siegberg, Thomas Tibroni, Philipp Cimiano,
- Roman Klinger",Opinion Mining in Online Reviews About Distance Education Programs,cs.CL," The popularity of distance education programs is increasing at a fast pace.
-On par with this development, online communication in fora, social media and
-reviewing platforms between students is increasing as well. Exploiting this
-information to support fellow students or institutions requires extracting the
-relevant opinions in order to automatically generate reports providing an
-overview of the pros and cons of different distance education programs.
-We report on an experiment involving distance education experts with the goal
-of developing a dataset of reviews annotated with the relevant categories and
-the aspects in each category discussed in the specific review, together with
-an indication of the sentiment.
- Based on this experiment, we present an approach to extract general
-categories and specific aspects under discussion in a review together with
-their sentiment. We frame this task as a multi-label hierarchical text
-classification problem and empirically investigate the performance of different
-classification architectures to couple the prediction of a category with the
-prediction of particular aspects in this category. We evaluate different
-architectures and show that a hierarchical approach leads to superior results
-in comparison to a flat model which makes decisions independently.
-"
-3214,1607.06330,Antonio San Mart\'in,"La representaci\'on de la variaci\'on contextual mediante definiciones
- terminol\'ogicas flexibles",cs.CL," In this doctoral thesis, we apply premises of cognitive linguistics to
-terminological definitions and present a proposal called the flexible
-terminological definition. This consists of a set of definitions of the same
-concept made up of a general definition (in this case, one encompassing the
-entire environmental domain) along with additional definitions describing the
-concept from the perspective of the subdomains in which it is relevant. Since
-context is a determining factor in the construction of the meaning of lexical
-units (including terms), we assume that terminological definitions can, and
-should, reflect the effects of context, even though definitions have
-traditionally been treated as the expression of meaning void of any contextual
-effect. The main objective of this thesis is to analyze the effects of
-contextual variation on specialized environmental concepts with a view to their
-representation in terminological definitions. Specifically, we focused on
-contextual variation based on thematic restrictions. To accomplish the
-objectives of this doctoral thesis, we conducted an empirical study consisting
-of the analysis of a set of contextually variable concepts and the creation of
-a flexible definition for two of them. As a result of the first part of our
-empirical study, we divided our notion of domain-dependent contextual variation
-into three different phenomena: modulation, perspectivization and
-subconceptualization. These phenomena are additive in that all concepts
-experience modulation, some concepts also undergo perspectivization, and
-finally, a small number of concepts are additionally subjected to
-subconceptualization. In the second part, we applied these notions to
-terminological definitions and presented guidelines on how to build flexible
-definitions, from the extraction of knowledge to the actual writing of the
-definition.
-"
-3215,1607.06520,"Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam
- Kalai","Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word
- Embeddings",cs.CL cs.AI cs.LG stat.ML," The blind application of machine learning runs the risk of amplifying biases
-present in data. Such a danger is facing us with word embedding, a popular
-framework to represent text data as vectors which has been used in many machine
-learning and natural language processing tasks. We show that even word
-embeddings trained on Google News articles exhibit female/male gender
-stereotypes to a disturbing extent.
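The geometric core of the debiasing recipe summarized in this abstract fits in a few lines. A sketch of the "neutralize" step, using a single definitional pair he/she for the bias direction (the paper derives the direction from a PCA over several such pairs, so this is a simplification):

```python
import numpy as np

def neutralize(w, g):
    """Remove the component of word vector `w` along bias direction `g`,
    then renormalize -- applied to gender-neutral words such as
    'receptionist', while definitional words like 'queen' are left alone."""
    g = g / np.linalg.norm(g)
    w = w - np.dot(w, g) * g
    return w / np.linalg.norm(w)

# e.g. g = emb["he"] - emb["she"]   # one definitional pair, for illustration
```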
-This raises concerns because their widespread use, as we describe, often tends
-to amplify these biases. Geometrically, gender bias is first shown to be
-captured by a direction in the word embedding. Second, gender neutral words
-are shown to be linearly separable from gender definition words in the word
-embedding. Using these properties, we provide a methodology for modifying an
-embedding to remove gender stereotypes, such as the association between the
-words receptionist and female, while maintaining desired associations such as
-between the words queen and female. We define metrics to quantify both direct
-and indirect gender biases in embeddings, and develop algorithms to ""debias""
-the embedding. Using crowd-worker evaluation as well as standard benchmarks, we
-empirically demonstrate that our algorithms significantly reduce gender bias in
-embeddings while preserving its useful properties such as the ability to
-cluster related concepts and to solve analogy tasks. The resulting embeddings
-can be used in applications without amplifying gender bias.
-"
-3216,1607.06532,"Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Hsin-Hsi Chen","Novel Word Embedding and Translation-based Language Modeling for
- Extractive Speech Summarization",cs.CL cs.AI cs.IR cs.MM," Word embedding methods revolve around learning continuous distributed vector
-representations of words with neural networks, which can capture semantic
-and/or syntactic cues, and in turn be used to induce similarity measures among
-words, sentences and documents in context. Celebrated methods can be
-categorized as prediction-based and count-based methods according to the
-training objectives and model architectures. Their pros and cons have been
-extensively analyzed and evaluated in recent studies, but there is relatively
-less work continuing the line of research to develop an enhanced learning
-method that brings together the advantages of the two model families. In
-addition, the interpretation of the learned word representations still remains
-somewhat opaque. Motivated by the observations and considering the pressing
-need, this paper presents a novel method for learning the word representations,
-which not only inherits the advantages of classic word embedding methods but
-also offers a clearer and more rigorous interpretation of the learned word
-representations. Built upon the proposed word embedding method, we further
-formulate a translation-based language modeling framework for the extractive
-speech summarization task. A series of empirical evaluations demonstrate the
-effectiveness of the proposed word representation learning and language
-modeling techniques in extractive speech summarization.
-"
-3217,1607.06556,PengFei Liu and Xipeng Qiu and Xuanjing Huang,Syntax-based Attention Model for Natural Language Inference,cs.CL," Introducing an attentional mechanism into neural networks is a powerful
-concept, and has achieved impressive results in many natural language
-processing tasks. However, most of the existing models impose an attentional
-distribution on a flat topology, namely the entire input representation
-sequence. Clearly, any well-formed sentence has its accompanying syntactic tree
-structure, which is a much richer topology. Applying attention to such a
-topology not only exploits the underlying syntax, but also makes attention more
-interpretable. In this paper, we explore this direction in the context of
-natural language inference. The results demonstrate its efficacy.
-We also perform extensive qualitative analysis, deriving insights and
-intuitions into why and how our model works.
-"
-3218,1607.06560,"Amol S Patwardhan, Jacob Badeaux, Siavash, Gerald M Knapp",Automated Prediction of Temporal Relations,cs.CL cs.AI," Background: There has been growing research interest in automated answering
-of questions or generation of summaries of free-form text such as news
-articles. In order to implement this task, the computer should be able to
-identify the sequence of events, the duration of events, the time at which an
-event occurred and the relationship type between event pairs, time pairs or
-event-time pairs. Specific Problem: It is important to accurately identify the
-relationship type between combinations of event and time before the temporal
-ordering of events can be defined. The machine learning approach taken in Mani
-et al. (2006) provides an accuracy of only 62.5 on the baseline data from
-TimeBank. The researchers used a maximum entropy classifier in their
-methodology. TimeML uses the TLINK annotation to tag a relationship type
-between events and time. The time complexity is quadratic when it comes to
-tagging documents with TLINK using human annotation. This research proposes
-using decision trees and parsing to improve the relationship type tagging.
-This research addresses the gaps in human annotation by automating the task of
-relationship type tagging, in an attempt to improve the accuracy of event and
-time relationships in annotated documents. Scope information: Documents from
-the news domain will be used. The tagging will be performed within the same
-document and not across documents. The relationship types will be identified
-only for a pair of event and time and not a chain of events. The research
-focuses on documents tagged using the TimeML specification which contains tags
-such as EVENT, TLINK, and TIMEX. Each tag has attributes such as identifier,
-relation, POS, time etc.
-"
-3219,1607.06852,"Adam James Summerville, James Ryan, Michael Mateas, Noah Wardrip-Fruin","CFGs-2-NLU: Sequence-to-Sequence Learning for Mapping Utterances to
- Semantics and Pragmatics",cs.CL," In this paper, we present a novel approach to natural language understanding
-that utilizes context-free grammars (CFGs) in conjunction with
-sequence-to-sequence (seq2seq) deep learning. Specifically, we take a CFG
-authored to generate dialogue for our target application for NLU, a videogame,
-and train a long short-term memory (LSTM) recurrent neural network (RNN) to map
-the surface utterances that it produces to traces of the grammatical expansions
-that yielded them. Critically, this CFG was authored using a tool we have
-developed that supports arbitrary annotation of the nonterminal symbols in the
-grammar. Because we already annotated the symbols in this grammar for the
-semantic and pragmatic considerations that our game's dialogue manager operates
-over, we can use the grammatical trace associated with any surface utterance to
-infer such information. During gameplay, we translate player utterances into
-grammatical traces (using our RNN), collect the mark-up attributed to the
-symbols included in that trace, and pass this information to the dialogue
-manager, which updates the conversation state accordingly. From an offline
-evaluation task, we demonstrate that our trained RNN translates surface
-utterances to grammatical traces with great accuracy.
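Training data for such a mapping can be produced directly from the grammar: sample an expansion, emit the surface tokens, and record the rule choices as the target trace. A toy sketch (the grammar and rule-naming scheme here are invented for illustration; the paper's authored CFG and annotations are far richer):

```python
import random

GRAMMAR = {  # toy CFG: nonterminals map to lists of right-hand sides
    "S": [["GREET"], ["ASK"]],
    "GREET": [["hello", "NAME"], ["hi", "NAME"]],
    "ASK": [["where", "is", "the", "ITEM"]],
    "NAME": [["friend"], ["stranger"]],
    "ITEM": [["key"], ["door"]],
}

def sample(symbol="S"):
    """Expand `symbol`; return (surface tokens, trace of rule choices)."""
    if symbol not in GRAMMAR:                    # terminal symbol
        return [symbol], []
    i = random.randrange(len(GRAMMAR[symbol]))   # pick a production
    tokens, trace = [], ["%s->%d" % (symbol, i)]
    for child in GRAMMAR[symbol][i]:
        t, tr = sample(child)
        tokens += t
        trace += tr
    return tokens, trace

utterance, trace = sample()   # one seq2seq training pair: utterance -> trace
```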
-To our knowledge, this is the first usage of seq2seq learning for
-conversational agents (our game's characters) who explicitly reason over
-semantic and pragmatic considerations.
-"
-3220,1607.06875,"Steve Doubleday, Sean Trott, Jerome Feldman",Processing Natural Language About Ongoing Actions,cs.AI cs.CL cs.HC cs.RO," Actions may not proceed as planned; they may be interrupted, resumed or
-overridden. This is a challenge to handle in a natural language understanding
-system. We describe extensions to an existing implementation for the control of
-autonomous systems by natural language, to enable such systems to handle
-incoming language requests regarding actions. Language Communication with
-Autonomous Systems (LCAS) has been extended with support for X-nets,
-parameterized executable schemas representing actions. X-nets enable the system
-to control actions at a desired level of granularity, while providing a
-mechanism for language requests to be processed asynchronously. Standard
-semantics supported include requests to stop, continue, or override the
-existing action. The specific domain demonstrated is the control of motion of a
-simulated robot, but the approach is general, and could be applied to other
-domains.
-"
-3221,1607.06952,"Xinchi Chen, Xipeng Qiu, Xuanjing Huang",Neural Sentence Ordering,cs.CL," Sentence ordering is a general and critical task for natural language
-generation applications. Previous works have focused on improving its
-performance in an external, downstream task, such as multi-document
-summarization. Given its importance, we propose to study it as an isolated
-task. We collect a large corpus of academic texts, and derive a data-driven
-approach to learn the pairwise ordering of sentences, and validate the efficacy
-with extensive experiments. The source code and dataset of this paper will be
-made publicly available.
-"
-3222,1607.06961,Vanessa Queiroz Marinho and Graeme Hirst and Diego Raphael Amancio,Authorship attribution via network motifs identification,cs.CL," Concepts and methods of complex networks can be used to analyse texts at
-their different complexity levels. Examples of natural language processing
-(NLP) tasks studied via topological analysis of networks are keyword
-identification, automatic extractive summarization and authorship attribution.
-Even though a myriad of network measurements have been applied to study the
-authorship attribution problem, the use of motifs for text analysis has been
-restricted to a few works. The goal of this paper is to apply the concept of
-motifs, recurrent interconnection patterns, in the authorship attribution task.
-The absolute frequencies of all thirteen directed motifs with three nodes were
-extracted from the co-occurrence networks and used as classification features.
-The effectiveness of these features was verified with four machine learning
-methods. The results show that motifs are able to distinguish the writing style
-of different authors. In our best scenario, 57.5% of the books were correctly
-classified. The chance baseline for this problem is 12.5%. In addition, we have
-found that function words play an important role in these recurrent patterns.
-Taken together, our findings suggest that motifs should be further explored in
-other related linguistic tasks.
-"
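Extracting the thirteen directed three-node motif frequencies amounts to a triad census over the word co-occurrence graph. A brute-force sketch, assuming the graph fits in memory as a set of directed edge pairs (dedicated graph libraries offer much faster censuses):

```python
import itertools
from collections import Counter

def triad_motifs(edges):
    """Count weakly connected three-node motifs in a directed graph.

    `edges` is a set of (u, v) pairs. Each connected node triple is
    reduced to a canonical 6-bit edge signature (minimized over node
    orderings), so the 13 isomorphism classes are counted together.
    O(n^3); fine for small per-book co-occurrence networks.
    """
    nodes = {n for e in edges for n in e}
    counts = Counter()
    for triple in itertools.combinations(sorted(nodes), 3):
        present = [e for e in itertools.permutations(triple, 2)
                   if e in edges]
        if len({n for e in present for n in e}) < 3:
            continue  # some node is isolated within this triple
        sig = min(tuple(int((p[i], p[j]) in edges)
                        for i in range(3) for j in range(3) if i != j)
                  for p in itertools.permutations(triple))
        counts[sig] += 1
    return counts
```

The resulting thirteen counts, optionally normalized, form the per-author feature vector handed to the classifiers.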
-" -3223,1607.07057,Tomas Brychcin,Latent Tree Language Model,cs.CL," In this paper we introduce Latent Tree Language Model (LTLM), a novel -approach to language modeling that encodes syntax and semantics of a given -sentence as a tree of word roles. - The learning phase iteratively updates the trees by moving nodes according to -Gibbs sampling. We introduce two algorithms to infer a tree for a given -sentence. The first one is based on Gibbs sampling. It is fast, but does not -guarantee to find the most probable tree. The second one is based on dynamic -programming. It is slower, but guarantees to find the most probable tree. We -provide comparison of both algorithms. - We combine LTLM with 4-gram Modified Kneser-Ney language model via linear -interpolation. Our experiments with English and Czech corpora show significant -perplexity reductions (up to 46% for English and 49% for Czech) compared with -standalone 4-gram Modified Kneser-Ney language model. -" -3224,1607.07514,"Soroush Vosoughi, Prashanth Vijayaraghavan and Deb Roy","Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM - Encoder-Decoder",cs.CL cs.AI cs.NE cs.SI," We present Tweet2Vec, a novel method for generating general-purpose vector -representation of tweets. The model learns tweet embeddings using -character-level CNN-LSTM encoder-decoder. We trained our model on 3 million, -randomly selected English-language tweets. The model was evaluated using two -methods: tweet semantic similarity and tweet sentiment categorization, -outperforming the previous state-of-the-art in both tasks. The evaluations -demonstrate the power of the tweet embeddings generated by our model for -various tweet categorization tasks. The vector representations generated by our -model are generic, and hence can be applied to a variety of tasks. Though the -model presented in this paper is trained on English-language tweets, the method -presented can be used to learn tweet embeddings for different languages. -" -3225,1607.07565,Michael Spranger and Jakob Suchan and Mehul Bhatt and Manfred Eppe,Grounding Dynamic Spatial Relations for Embodied (Robot) Interaction,cs.CL," This paper presents a computational model of the processing of dynamic -spatial relations occurring in an embodied robotic interaction setup. A -complete system is introduced that allows autonomous robots to produce and -interpret dynamic spatial phrases (in English) given an environment of moving -objects. The model unites two separate research strands: computational -cognitive semantics and on commonsense spatial representation and reasoning. -The model for the first time demonstrates an integration of these different -strands. -" -3226,1607.07602,Niraj Kumar and Premkumar Devanbu,OntoCat: Automatically categorizing knowledge in API Documentation,cs.SE cs.AI cs.CL," Most application development happens in the context of complex APIs; -reference documentation for APIs has grown tremendously in variety, complexity, -and volume, and can be difficult to navigate. There is a growing need to -develop well-organized ways to access the knowledge latent in the -documentation; several research efforts deal with the organization (ontology) -of API-related knowledge. Extensive knowledge-engineering work, supported by a -rigorous qualitative analysis, by Maalej & Robillard [3] has identified a -useful taxonomy of API knowledge. Based on this taxonomy, we introduce a domain -independent technique to extract the knowledge types from the given API -reference documentation. 
-Our system, OntoCat, introduces a total of nine different features and their
-semantic and statistical combinations to classify the different knowledge
-types. We tested OntoCat on Python API reference documentation. Our
-experimental results show the effectiveness of the system and open up related
-research areas (e.g., user behavior, documentation quality, etc.).
-"
-3227,1607.07630,Michael Spranger,Grounded Lexicon Acquisition - Case Studies in Spatial Language,cs.CL," This paper discusses grounded acquisition experiments of increasing
-complexity. Humanoid robots acquire English spatial lexicons from robot tutors.
-We identify how various spatial language systems, such as projective, absolute
-and proximal, can be learned. The proposed learning mechanisms do not rely on
-direct meaning transfer or direct access to world models of interlocutors.
-Finally, we show how multiple systems can be acquired at the same time.
-"
-3228,1607.07657,"Yiou Lin, Hang Lei, Prince Clement Addo and Xiaoyu Li",Machine Learned Resume-Job Matching Solution,cs.CL," Job search through online matching engines is nowadays very prominent and
-beneficial to both job seekers and employers. But the solutions of traditional
-engines, which do not understand the semantic meaning of different resumes,
-have not kept pace with the incredible advances in machine learning techniques
-and computing capability. These solutions are usually driven by manual rules
-and predefined weights of keywords, which lead to an inefficient and
-frustrating search experience. To this end, we present a machine learned
-solution with rich features and deep learning methods. Our solution includes
-three configurable modules that can be plugged in with little restriction.
-Namely, unsupervised feature extraction, base classifier training and ensemble
-method learning. In our solution, rather than using manual rules, we propose
-machine learned methods to automatically detect the semantic similarity of
-positions. Then four competitive ""shallow"" estimators and ""deep"" estimators
-are selected. Finally, ensemble methods to bag these estimators and aggregate
-their individual predictions to form a final prediction are verified.
-Experimental results on over 47 thousand resumes show that our solution can
-significantly improve the prediction precision of current position, salary,
-educational background and company scale.
-"
-3229,1607.07788,"Daria Micaela Hernandez, Monica Becue-Bertaut, Igor Barahona","How scientific literature has been evolving over the time? A novel
- statistical approach using tracking verbal-based methods",cs.CL cs.DL," This paper provides a global vision of the scientific publications related
-to Systemic Lupus Erythematosus (SLE), taking article abstracts as its
-starting point. Over time, abstracts have been evolving towards more complex
-terminology, which makes the use of sophisticated statistical methods
-necessary for answering questions including: How is the vocabulary evolving
-over time? Which are the most influential articles? And which articles
-introduced new terms and vocabulary? To answer these, we analyze a dataset
-composed of 506 abstracts downloaded from 115 different journals, covering an
-18-year period.
-"
-3230,1607.07931,Stuart Bradley,Synthetic Language Generation and Model Validation in BEAST2,cs.CL," Generating synthetic languages aids in the testing and validation of future
-computational linguistic models and methods.
-This thesis extends the BEAST2 phylogenetic framework to add linguistic
-sequence generation under multiple models. The new plugin is then used to test
-the effects of the phenomenon of word borrowing on the inference process under
-two widely used phylolinguistic models.
-"
-3231,1607.07956,"Yuezhang Li, Ronghuo Zheng, Tian Tian, Zhiting Hu, Rahul Iyer, Katia
- Sycara","Joint Embedding of Hierarchical Categories and Entities for Concept
- Categorization and Dataless Classification",cs.CL cs.AI," Due to the lack of structured knowledge applied in learning distributed
-representations of categories, existing work cannot incorporate category
-hierarchies into entity information. We propose a framework that embeds
-entities and categories into a semantic space by integrating structured
-knowledge and taxonomy hierarchy from large knowledge bases. The framework
-allows computing meaningful semantic relatedness between entities and
-categories. Our framework can handle both single-word concepts and
-multiple-word concepts, with superior performance on concept categorization,
-and yields state-of-the-art results on dataless hierarchical classification.
-"
-3232,1607.08074,"Adrian Groza, Oana Popa","Mining Arguments from Cancer Documents Using Natural Language Processing
- and Ontologies",cs.AI cs.CL," In the medical domain, the continuous stream of scientific research contains
-contradictory results supported by arguments and counter-arguments. As medical
-expertise occurs at different levels, some human agents have difficulties
-facing the huge number of studies, but also understanding the reasons and
-pieces of evidence claimed by the proponents and the opponents of the debated
-topic. To better understand the supporting arguments for new findings related
-to the current state of the art in the medical domain, we need tools able to
-identify arguments in scientific papers. Our work here aims to fill the above
-technological gap.
- Quite aware of the difficulty of this task, we embark on this road by relying
-on the well-known interleaving of domain knowledge with natural language
-processing. To formalise the existing medical knowledge, we rely on ontologies.
-To structure the argumentation model, we also use the expressivity and
-reasoning capabilities of Description Logics. To perform argumentation mining
-we formalise various linguistic patterns in a rule-based language. We tested
-our solution against a corpus of scientific papers related to breast cancer.
-The experiments show an F-measure between 0.71 and 0.86 for identifying the
-conclusions of an argument, and between 0.65 and 0.86 for identifying the
-premises of an argument.
-"
-3233,1607.08592,Erkki Luuk,Modeling selectional restrictions in a relational type system,cs.CL cs.AI," Selectional restrictions are semantic constraints on forming certain complex
-types in natural language. The paper gives an overview of modeling selectional
-restrictions in a relational type system with morphological and syntactic
-types. We discuss some foundations of the system and ways of formalizing
-selectional restrictions.
- Keywords: type theory, selectional restrictions, syntax, morphology
-"
-3234,1607.08692,"Rui Wang, Hai Zhao, Sabine Ploux, Bao-Liang Lu, Masao Utiyama and
- Eiichiro Sumita","A Novel Bilingual Word Embedding Method for Lexical Translation Using
- Bilingual Sense Clique",cs.CL," Most of the existing methods for bilingual word embedding only consider
-shallow context or simple co-occurrence information.
-In this paper, we propose a latent bilingual sense unit (Bilingual Sense
-Clique, BSC), which is derived from a maximum complete sub-graph of a
-pointwise mutual information based graph over the bilingual corpus. In this
-way, we treat source and target words equally, and the separate bilingual
-projection step that has to be used in most existing works is no longer
-necessary. Several dimension reduction methods are evaluated to summarize the
-BSC-word relationship. The proposed method is evaluated on bilingual lexicon
-translation tasks and empirical results show that bilingual sense embedding
-methods outperform existing bilingual word embedding methods.
-"
-3235,1607.08693,"Rui Wang, Hai Zhao, Bao-Liang Lu, Masao Utiyama and Eiichro Sumita",Connecting Phrase based Statistical Machine Translation Adaptation,cs.CL," Although more additional corpora are now available for Statistical Machine
-Translation (SMT), only the ones which belong to the same or similar domains
-as the original corpus can indeed enhance SMT performance directly. Most of
-the existing adaptation methods focus on sentence selection. In comparison,
-the phrase is a smaller and more fine-grained unit for data selection; we
-therefore propose a straightforward and efficient connecting-phrase based
-adaptation method, which is applied to both bilingual phrase pair and
-monolingual n-gram adaptation. The proposed method is evaluated on IWSLT/NIST
-data sets, and the results show that phrase-based SMT performance is
-significantly improved (up to +1.6 in comparison with the phrase-based SMT
-baseline system and +0.9 in comparison with existing methods).
-"
-3236,1607.08720,"Jiazhen He, Benjamin I. P. Rubinstein, James Bailey, Rui Zhang, Sandra
- Milligan","TopicResponse: A Marriage of Topic Modelling and Rasch Modelling for
- Automatic Measurement in MOOCs",cs.LG cs.CL cs.IR stat.ML," This paper explores the suitability of using automatically discovered topics
-from MOOC discussion forums for modelling students' academic abilities. The
-Rasch model from psychometrics is a popular generative probabilistic model that
-relates latent student skill, latent item difficulty, and observed student-item
-responses within a principled, unified framework. According to scholarly
-educational theory, discovered topics can be regarded as appropriate
-measurement items if (1) students' participation across the discovered topics
-is well fit by the Rasch model, and if (2) the topics are interpretable to
-subject-matter experts as being educationally meaningful. Such Rasch-scaled
-topics, with associated difficulty levels, could be of potential benefit to
-curriculum refinement, student assessment and personalised feedback. The
-technical challenge that remains is to discover meaningful topics that
-simultaneously achieve good statistical fit with the Rasch model. To address
-this challenge, we combine the Rasch model with non-negative matrix
-factorisation based topic modelling, jointly fitting both models. We
-demonstrate the suitability of our approach with quantitative experiments on
-data from three Coursera MOOCs, and with qualitative survey results on topic
-interpretability on a Discrete Optimisation MOOC.
-"
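For readers unfamiliar with the Rasch model referenced in this abstract, its core is a one-parameter logistic response function. A minimal sketch (here a "response" is a student participating in a Rasch-scaled topic, with `theta` as latent ability and `b` as topic difficulty):

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch model: P(response = 1) for ability theta and difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def log_likelihood(R, theta, b):
    """Log-likelihood of a binary (students x items) response matrix R,
    used to judge how well discovered topics fit as measurement items."""
    P = rasch_prob(theta[:, None], b[None, :])
    return np.sum(R * np.log(P) + (1 - R) * np.log(1 - P))
```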
-" -3237,1607.08723,Emmanuel Dupoux,"Cognitive Science in the era of Artificial Intelligence: A roadmap for - reverse-engineering the infant language-learner",cs.CL cs.AI cs.LG," During their first years of life, infants learn the language(s) of their -environment at an amazing speed despite large cross cultural variations in -amount and complexity of the available language input. Understanding this -simple fact still escapes current cognitive and linguistic theories. Recently, -spectacular progress in the engineering science, notably, machine learning and -wearable technology, offer the promise of revolutionizing the study of -cognitive development. Machine learning offers powerful learning algorithms -that can achieve human-like performance on many linguistic tasks. Wearable -sensors can capture vast amounts of data, which enable the reconstruction of -the sensory experience of infants in their natural environment. The project of -'reverse engineering' language development, i.e., of building an effective -system that mimics infant's achievements appears therefore to be within reach. -Here, we analyze the conditions under which such a project can contribute to -our scientific understanding of early language development. We argue that -instead of defining a sub-problem or simplifying the data, computational models -should address the full complexity of the learning situation, and take as input -the raw sensory signals available to infants. This implies that (1) accessible -but privacy-preserving repositories of home data be setup and widely shared, -and (2) models be evaluated at different linguistic levels through a benchmark -of psycholinguist tests that can be passed by machines and humans alike, (3) -linguistically and psychologically plausible learning architectures be scaled -up to real data using probabilistic/optimization principles from machine -learning. We discuss the feasibility of this approach and present preliminary -results. -" -3238,1607.08725,"Biao Zhang, Deyi Xiong and Jinsong Su",Cseq2seq: Cyclic Sequence-to-Sequence Learning,cs.CL," The vanilla sequence-to-sequence learning (seq2seq) reads and encodes a -source sequence into a fixed-length vector only once, suffering from its -insufficiency in modeling structural correspondence between the source and -target sequence. Instead of handling this insufficiency with a linearly -weighted attention mechanism, in this paper, we propose to use a recurrent -neural network (RNN) as an alternative (Cseq2seq-I). During decoding, -Cseq2seq-I cyclically feeds the previous decoding state back to the encoder as -the initial state of the RNN, and reencodes source representations to produce -context vectors. We surprisingly find that the introduced RNN succeeds in -dynamically detecting translationrelated source tokens according to the partial -target sequence. Based on this finding, we further hypothesize that the partial -target sequence can act as a feedback to improve the understanding of the -source sequence. To test this hypothesis, we propose cyclic -sequence-to-sequence learning (Cseq2seq-II) which differs from the seq2seq only -in the reintroduction of previous decoding state into the same encoder. We -further perform parameter sharing on Cseq2seq-II to reduce parameter redundancy -and enhance regularization. In particular, we share the weights of the encoder -and decoder, and two targetside word embeddings, making Cseq2seq-II equivalent -to a single conditional RNN model, with 31% parameters pruned but even better -performance. 
-Cseq2seq-II not only preserves the simplicity of seq2seq but also yields
-comparable and promising results on machine translation tasks. Experiments on
-Chinese-English and English-German translation show that Cseq2seq achieves
-significant and consistent improvements over seq2seq and is as competitive as
-the attention-based seq2seq model.
-"
-3239,1607.08822,"Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould",SPICE: Semantic Propositional Image Caption Evaluation,cs.CV cs.CL," There is considerable interest in the task of automatically generating image
-captions. However, evaluation is challenging. Existing automatic evaluation
-metrics are primarily sensitive to n-gram overlap, which is neither necessary
-nor sufficient for the task of simulating human judgment. We hypothesize that
-semantic propositional content is an important component of human caption
-evaluation, and propose a new automated caption evaluation metric defined over
-scene graphs coined SPICE. Extensive evaluations across a range of models and
-datasets indicate that SPICE captures human judgments over model-generated
-captions better than other automatic metrics (e.g., system-level correlation of
-0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and
-0.53 for METEOR). Furthermore, SPICE can answer questions such as `which
-caption-generator best understands colors?' and `can caption-generators count?'
-"
-3240,1607.08864,Christoph Redl,"The DLVHEX System for Knowledge Representation: Recent Advances (System
- Description)",cs.CL cs.AI cs.PL," The DLVHEX system implements the HEX-semantics, which integrates answer set
-programming (ASP) with arbitrary external sources. Since its first release ten
-years ago, significant advancements were achieved. Most importantly, the
-exploitation of properties of external sources led to efficiency improvements
-and flexibility enhancements of the language, and technical improvements on the
-system side increased users' convenience. In this paper, we present the current
-status of the system and point out the most important recent enhancements over
-early versions. While existing literature focuses on theoretical aspects and
-specific components, a bird's eye view of the overall system is missing. In
-order to promote the system for real-world applications, we further present
-applications which were already successfully realized on top of DLVHEX. This
-paper is under consideration for acceptance in Theory and Practice of Logic
-Programming.
-"
-3241,1607.08883,"Satanu Ghosh, Souvick Ghosh, Dipankar Das",Labeling of Query Words using Conditional Random Field,cs.IR cs.CL," This paper describes our approach to Query Word Labeling, an attempt at the
-shared task on Mixed Script Information Retrieval at the Forum for Information
-Retrieval Evaluation (FIRE) 2015. The queries are written in Roman script and
-the words are in English or transliterated from Indian regional languages. A
-total of eight Indian languages were present in addition to English. We also
-identified the Named Entities and special symbols as part of our task. A
-CRF-based machine learning framework was used for labeling the individual
-words with their corresponding language labels. We used a dictionary-based
-approach for language identification. We also took into account the context of
-the word while identifying the language. Our system demonstrated an overall
-accuracy of 75.5% for token-level language identification.
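A sketch of this kind of token-level CRF setup, using the sklearn-crfsuite library; the feature template and the `en_dict` word list are illustrative assumptions, not the authors' exact features:

```python
import sklearn_crfsuite

def token_features(tokens, i, en_dict):
    """Surface cues, a dictionary-membership flag, and the neighbouring
    words, so the labeler can use context when assigning a language."""
    w = tokens[i]
    return {
        "lower": w.lower(),
        "suffix3": w[-3:],
        "is_digit": w.isdigit(),
        "in_en_dict": w.lower() in en_dict,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# X: one feature-dict sequence per query; y: label sequences such as
# ["EN", "HI", "BN", "NE", ...]. Train and predict with a linear-chain CRF:
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
# crf.fit(X, y); crf.predict(X_new)
```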
-The strict F-measure scores for the identification of token-level language
-labels for Bengali, English and Hindi are 0.7486, 0.892 and 0.7972
-respectively. The overall weighted F-measure of our system was 0.7498.
-"
-3242,1607.08885,"Promita Maitra, Souvick Ghosh, Dipankar Das",Authorship Verification - An Approach based on Random Forest,cs.CL," Authorship attribution, being an important problem in many areas including
-information retrieval, computational linguistics, law and journalism, has been
-identified as a subject of increasing research interest in recent years. In
-the case of the Author Identification task in PAN at CLEF 2015, the main focus
-was on cross-genre and cross-topic author verification tasks. We have used
-several word-based and style-based features to identify the differences
-between the known and unknown problems of one given set and label the unknown
-ones accordingly using a Random Forest based classifier.
-"
-3243,1607.08898,Tao Ding and Shimei Pan,Personalized Emphasis Framing for Persuasive Message Generation,cs.AI cs.CL," In this paper, we present a study on personalized emphasis framing which can
-be used to tailor the content of a message to enhance its appeal to different
-individuals. With this framework, we directly model content selection decisions
-based on a set of psychologically-motivated domain-independent personal traits
-including personality (e.g., extraversion and conscientiousness) and basic
-human values (e.g., self-transcendence and hedonism). We also demonstrate how
-the analysis results can be used in automated personalized content selection
-for persuasive message generation.
-"
-3244,1608.00104,"Chenguang Wang, Yangqiu Song, Dan Roth, Ming Zhang, Jiawei Han",World Knowledge as Indirect Supervision for Document Clustering,cs.LG cs.CL cs.IR," One of the key obstacles in making learning protocols realistic in
-applications is the need to supervise them, a costly process that often
-requires hiring domain experts. We consider a framework that uses world
-knowledge as indirect supervision. World knowledge is general-purpose
-knowledge, which is not designed for any specific domain. Then the key
-challenges are how to adapt the world knowledge to domains and how to represent
-it for learning. In this paper, we provide an example of using world knowledge
-for domain dependent document clustering. We provide three ways to specify the
-world knowledge to domains by resolving the ambiguity of the entities and their
-types, and represent the data with world knowledge as a heterogeneous
-information network. Then we propose a clustering algorithm that can cluster
-multiple types and incorporate the sub-type information as constraints. In the
-experiments, we use two existing knowledge bases as our sources of world
-knowledge. One is Freebase, which is collaboratively collected knowledge about
-entities and their organizations. The other is YAGO2, a knowledge base
-automatically extracted from Wikipedia that maps knowledge to the linguistic
-knowledge base WordNet. Experimental results on two text benchmark datasets
-(20newsgroups and RCV1) show that incorporating world knowledge as indirect
-supervision can significantly outperform the state-of-the-art clustering
-algorithms as well as clustering algorithms enhanced with world knowledge
-features.
-"
-" -3245,1608.00112,Haitao Mi and Zhiguo Wang and Abe Ittycheriah,Supervised Attentions for Neural Machine Translation,cs.CL," In this paper, we improve the attention or alignment accuracy of neural -machine translation by utilizing the alignments of training sentence pairs. We -simply compute the distance between the machine attentions and the ""true"" -alignments, and minimize this cost in the training procedure. Our experiments -on large-scale Chinese-to-English task show that our model improves both -translation and alignment qualities significantly over the large-vocabulary -neural machine translation system, and even beats a state-of-the-art -traditional syntax-based system. -" -3246,1608.00255,Justyna Grudzinska and Marek Zawadowski,"Continuation semantics for multi-quantifier sentences: operation-based - approaches",math.LO cs.CL cs.LO," Classical scope-assignment strategies for multi-quantifier sentences involve -quantifier phrase (QP)-movement. More recent continuation-based approaches -provide a compelling alternative, for they interpret QP's in situ - without -resorting to Logical Forms or any structures beyond the overt syntax. The -continuation-based strategies can be divided into two groups: those that locate -the source of scope-ambiguity in the rules of semantic composition and those -that attribute it to the lexical entries for the quantifier words. In this -paper, we focus on the former operation-based approaches and the nature of the -semantic operations involved. More specifically, we discuss three such possible -operation-based strategies for multi-quantifier sentences, together with their -relative merits and costs. -" -3247,1608.00272,"Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. - Berg",Modeling Context in Referring Expressions,cs.CV cs.CL," Humans refer to objects in their environments all the time, especially in -dialogue with other people. We explore generating and comprehending natural -language referring expressions for objects in images. In particular, we focus -on incorporating better measures of visual context into referring expression -models and find that visual comparison to other objects within an image helps -improve performance significantly. We also develop methods to tie the language -generation process together, so that we generate expressions for all objects of -a particular category jointly. Evaluation on three recent datasets - RefCOCO, -RefCOCO+, and RefCOCOg, shows the advantages of our methods for both referring -expression generation and comprehension. -" -3248,1608.00293,Hiroshi Noji,"Left-corner Methods for Syntactic Modeling with Universal Structural - Constraints",cs.CL," The primary goal in this thesis is to identify better syntactic constraint or -bias, that is language independent but also efficiently exploitable during -sentence processing. We focus on a particular syntactic construction called -center-embedding, which is well studied in psycholinguistics and noted to cause -particular difficulty for comprehension. Since people use language as a tool -for communication, one expects such complex constructions to be avoided for -communication efficiency. From a computational perspective, center-embedding is -closely relevant to a left-corner parsing algorithm, which can capture the -degree of center-embedding of a parse tree being constructed. This connection -suggests left-corner methods can be a tool to exploit the universal syntactic -constraint that people avoid generating center-embedded structures. 
-We explore such uses of center-embedding as well as left-corner methods
-extensively through several theoretical and empirical examinations.
- Our primary task is unsupervised grammar induction. In this task, the input
-to the algorithm is a collection of sentences, from which the model tries to
-extract the salient patterns as a grammar. This is a particularly hard
-problem, although we expect the universal constraint may help in improving the
-performance, since it can effectively restrict the possible search space for
-the model. We build the model by extending the left-corner parsing algorithm
-to efficiently tabulate the search space, excluding parses whose degree of
-center-embedding exceeds a specific bound. We examine the effectiveness of our
-approach on many treebanks, and demonstrate that often our constraint leads to
-better parsing performance. We thus conclude that left-corner methods are
-particularly useful for syntax-oriented systems, as they can efficiently
-exploit the inherent universal constraints in languages.
-"
-3249,1608.00318,"Sungjin Ahn, Heeyoul Choi, Tanel P\""arnamaa, Yoshua Bengio",A Neural Knowledge Language Model,cs.CL cs.LG," Current language models have a significant limitation in the ability to
-encode and decode factual knowledge. This is mainly because they acquire such
-knowledge from statistical co-occurrences although most of the knowledge words
-are rarely observed. In this paper, we propose a Neural Knowledge Language
-Model (NKLM) which combines symbolic knowledge provided by the knowledge graph
-with the RNN language model. By predicting whether the word to generate has an
-underlying fact or not, the model can generate such knowledge-related words by
-copying from the description of the predicted fact. In experiments, we show
-that the NKLM significantly improves the performance while generating a much
-smaller number of unknown words.
-"
-3250,1608.00329,Sujatha Das Gollapalli and Xiao-li Li,Keyphrase Extraction using Sequential Labeling,cs.CL cs.AI cs.IR," Keyphrases efficiently summarize a document's content and are used in various
-document processing and retrieval tasks. Several unsupervised techniques and
-classifiers exist for extracting keyphrases from text documents. Most of these
-methods operate at a phrase-level and rely on part-of-speech (POS) filters for
-candidate phrase generation. In addition, they do not directly handle
-keyphrases of varying lengths. We overcome these modeling shortcomings by
-addressing keyphrase extraction as a sequential labeling task in this paper. We
-explore a basic set of features commonly used in NLP tasks as well as
-predictions from various unsupervised methods to train our taggers. In addition
-to a more natural modeling for the keyphrase extraction problem, we show that
-tagging models yield significant performance benefits over existing
-state-of-the-art extraction methods.
-"
-3251,1608.00339,"Jekaterina Novikova, Oliver Lemon and Verena Rieser",Crowd-sourcing NLG Data: Pictures Elicit Better Data,cs.CL," Recent advances in corpus-based Natural Language Generation (NLG) hold the
-promise of being easily portable across domains, but require costly training
-data, consisting of meaning representations (MRs) paired with Natural Language
-(NL) utterances. In this work, we propose a novel framework for crowdsourcing
-high quality NLG training data, using automatic quality control measures and
-evaluating different MRs with which to elicit data.
-We show that pictorial MRs result in better NL data being collected than
-logic-based MRs: utterances elicited by pictorial MRs are judged as
-significantly more natural, more informative, and better phrased, with a
-significant increase in average quality ratings (around 0.5 points on a
-6-point scale), compared to using the logical MRs. As the MR becomes more
-complex, the benefits of pictorial stimuli increase. The collected data will
-be released as part of this submission.
-"
-3252,1608.00466,"Madhusudan Lakshmana, Sundararajan Sellamanickam, Shirish Shevade,
- Keerthi Selvaraj","Learning Semantically Coherent and Reusable Kernels in Convolution
- Neural Nets for Sentence Classification",cs.CL cs.LG cs.NE," State-of-the-art CNN models give good performance on sentence classification
-tasks. The purpose of this work is to empirically study desirable properties
-such as semantic coherence, attention mechanism and reusability of CNNs in
-these tasks. Semantically coherent kernels are preferable as they are a lot
-more interpretable for explaining the decision of the learned CNN model. We
-observe that the learned kernels do not have semantic coherence. Motivated by
-this observation, we propose to learn kernels with semantic coherence using a
-clustering scheme combined with Word2Vec representation and domain knowledge
-such as SentiWordNet. We suggest a technique to visualize the attention
-mechanism of CNNs for decision explanation purposes. The reusability property
-enables kernels learned on one problem to be used in another problem. This
-helps in efficient learning as only a few additional domain specific filters
-may have to be learned. We demonstrate the efficacy of our core ideas of
-learning semantically coherent kernels and leveraging reusable kernels for
-efficient learning on several benchmark datasets. Experimental results show
-the usefulness of our approach by achieving performance close to the
-state-of-the-art methods but with semantic and reusable properties.
-"
-3253,1608.00470,"Nikolaos Aletras, Arpit Mittal",Labeling Topics with Images using Neural Networks,cs.CL cs.CV," Topics generated by topic models are usually represented by lists of $t$
-terms or alternatively using short phrases and images. The current
-state-of-the-art work on labeling topics using images selects images by
-re-ranking a small set of candidates for a given topic. In this paper, we
-present a more generic method that can estimate the degree of association
-between any arbitrary pair of an unseen topic and image using a deep neural
-network. Our method has better runtime performance $O(n)$ compared to $O(n^2)$
-for the current state-of-the-art method, and is also significantly more
-accurate.
-"
-3254,1608.00508,"Paul Michel, Okko R\""as\""anen, Roland Thiolli\`ere, Emmanuel Dupoux",Blind phoneme segmentation with temporal prediction errors,cs.CL," Phonemic segmentation of speech is a critical step of speech recognition
-systems. We propose a novel unsupervised algorithm based on sequence prediction
-models such as Markov chains and recurrent neural networks. Our approach
-consists in analyzing the error profile of a model trained to predict speech
-features frame-by-frame. Specifically, we try to learn the dynamics of speech
-in the MFCC space and hypothesize boundaries from local maxima in the
-prediction error. We evaluate our system on the TIMIT dataset, with
-improvements over similar methods.
-"
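The boundary-hypothesis step of that segmentation approach is essentially peak picking on the model's frame-wise prediction error. A minimal sketch (the minimum peak distance and height threshold are tunable assumptions):

```python
import numpy as np
from scipy.signal import find_peaks

def boundaries_from_errors(err, min_gap=3, height=None):
    """Hypothesize phoneme boundaries at local maxima of `err`, the
    per-frame prediction error of a model forecasting MFCC frames.
    `min_gap` is the minimum allowed distance between peaks, in frames."""
    peaks, _ = find_peaks(np.asarray(err), distance=min_gap, height=height)
    return peaks  # frame indices of hypothesized boundaries
```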
-" -3255,1608.00612,"Abhyuday Jagannatha, Hong Yu","Structured prediction models for RNN based sequence labeling in clinical - text",cs.CL," Sequence labeling is a widely used method for named entity recognition and -information extraction from unstructured natural language data. In clinical -domain one major application of sequence labeling involves extraction of -medical entities such as medication, indication, and side-effects from -Electronic Health Record narratives. Sequence labeling in this domain, presents -its own set of challenges and objectives. In this work we experimented with -various CRF based structured learning models with Recurrent Neural Networks. We -extend the previously studied LSTM-CRF models with explicit modeling of -pairwise potentials. We also propose an approximate version of skip-chain CRF -inference with RNN potentials. We use these methodologies for structured -prediction in order to improve the exact phrase detection of various medical -entities. -" -3256,1608.00789,Luk\'a\v{s} Svoboda and Tom\'a\v{s} Brychc\'in,New word analogy corpus for exploring embeddings of Czech words,cs.CL," The word embedding methods have been proven to be very useful in many tasks -of NLP (Natural Language Processing). Much has been investigated about word -embeddings of English words and phrases, but only little attention has been -dedicated to other languages. - Our goal in this paper is to explore the behavior of state-of-the-art word -embedding methods on Czech, the language that is characterized by very rich -morphology. We introduce new corpus for word analogy task that inspects -syntactic, morphosyntactic and semantic properties of Czech words and phrases. -We experiment with Word2Vec and GloVe algorithms and discuss the results on -this corpus. The corpus is available for the research community. -" -3257,1608.00841,"Jos\'e Camacho-Collados, Ignacio Iacobacci, Roberto Navigli and - Mohammad Taher Pilehvar",Semantic Representations of Word Senses and Concepts,cs.CL," Representing the semantics of linguistic items in a machine-interpretable -form has been a major goal of Natural Language Processing since its earliest -days. Among the range of different linguistic items, words have attracted the -most research attention. However, word representations have an important -limitation: they conflate different meanings of a word into a single vector. -Representations of word senses have the potential to overcome this inherent -limitation. Indeed, the representation of individual word senses and concepts -has recently gained in popularity with several experimental results showing -that a considerable performance improvement can be achieved across different -NLP applications upon moving from word level to the deeper sense and concept -levels. Another interesting point regarding the representation of concepts and -word senses is that these models can be seamlessly applied to other linguistic -items, such as words, phrases and sentences. -" -3258,1608.00869,"Daniela Gerz, Ivan Vuli\'c, Felix Hill, Roi Reichart, Anna Korhonen",SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity,cs.CL," Verbs play a critical role in the meaning of sentences, but these ubiquitous -words have received little attention in recent distributional semantics -research. We introduce SimVerb-3500, an evaluation resource that provides human -ratings for the similarity of 3,500 verb pairs. 
SimVerb-3500 covers all normed -verb types from the USF free-association database, providing at least three -examples for every VerbNet class. This broad coverage facilitates detailed -analyses of how syntactic and semantic phenomena together influence human -understanding of verb meaning. Further, with significantly larger development -and test sets than existing benchmarks, SimVerb-3500 enables more robust -evaluation of representation learning architectures and promotes the -development of methods tailored to verbs. We hope that SimVerb-3500 will enable -a richer understanding of the diversity and complexity of verb semantics and -guide the development of systems that can effectively represent and interpret -this meaning. -" -3259,1608.00892,Liang Lu and Michelle Guo and Steve Renals,Knowledge Distillation for Small-footprint Highway Networks,cs.CL," Deep learning has significantly advanced state-of-the-art of speech -recognition in the past few years. However, compared to conventional Gaussian -mixture acoustic models, neural network models are usually much larger, and are -therefore not very deployable in embedded devices. Previously, we investigated -a compact highway deep neural network (HDNN) for acoustic modelling, which is a -type of depth-gated feedforward neural network. We have shown that HDNN-based -acoustic models can achieve comparable recognition accuracy with much smaller -number of model parameters compared to plain deep neural network (DNN) acoustic -models. In this paper, we push the boundary further by leveraging on the -knowledge distillation technique that is also known as {\it teacher-student} -training, i.e., we train the compact HDNN model with the supervision of a high -accuracy cumbersome model. Furthermore, we also investigate sequence training -and adaptation in the context of teacher-student training. Our experiments were -performed on the AMI meeting speech recognition corpus. With this technique, we -significantly improved the recognition accuracy of the HDNN acoustic model with -less than 0.8 million parameters, and narrowed the gap between this model and -the plain DNN with 30 million parameters. -" -3260,1608.00895,"Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf - Schl\""uter, Hermann Ney","RETURNN: The RWTH Extensible Training framework for Universal Recurrent - Neural Networks",cs.LG cs.CL cs.NE," In this work we release our extensible and easily configurable neural network -training software. It provides a rich set of functional layers with a -particular focus on efficient training of recurrent neural network topologies -on multiple GPUs. The source of the software package is public and freely -available for academic research purposes and can be used as a framework or as a -standalone tool which supports a flexible configuration. The software allows to -train state-of-the-art deep bidirectional long short-term memory (LSTM) models -on both one dimensional data like speech or two dimensional data like -handwritten text and was used to develop successful submission systems in -several evaluation campaigns. -" -3261,1608.00929,"Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu",Efficient Segmental Cascades for Speech Recognition,cs.CL," Discriminative segmental models offer a way to incorporate flexible feature -functions into speech recognition. However, their appeal has been limited by -their computational requirements, due to the large number of possible segments -to consider. 
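To make that cost concrete, here is a toy semi-Markov (segmental) decoder in plain Python/numpy: every (start, end, label) triple is scored, which is the segment-space blow-up the passage refers to. The scoring function and data are invented for illustration.

```python
import numpy as np

def segmental_viterbi(score, T, labels, max_len):
    """Toy semi-Markov decoder: every (start, end, label) triple is a
    candidate segment, so even with a bounded segment length the
    decoder makes O(T * max_len * |labels|) score evaluations."""
    best = np.full(T + 1, -np.inf)
    best[0], back = 0.0, {}
    for t in range(1, T + 1):                      # segment end
        for s in range(max(0, t - max_len), t):    # segment start
            for y in labels:                       # segment label
                cand = best[s] + score(s, t, y)
                if cand > best[t]:
                    best[t], back[t] = cand, (s, y)
    segs, t = [], T
    while t > 0:                                   # trace back best path
        s, y = back[t]
        segs.append((s, t, y))
        t = s
    return best[T], segs[::-1]

# Invented frame scores: label 'A' fits frames 0-4, label 'B' frames 5-9.
affinity = {"A": np.r_[np.ones(5), -np.ones(5)],
            "B": np.r_[-np.ones(5), np.ones(5)]}
score = lambda s, t, y: float(affinity[y][s:t].sum())
print(segmental_viterbi(score, T=10, labels=["A", "B"], max_len=6))
```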
Multi-pass cascades of segmental models introduce features of -increasing complexity in different passes, where in each pass a segmental model -rescores lattices produced by a previous (simpler) segmental model. In this -paper, we explore several ways of making segmental cascades efficient and -practical: reducing the feature set in the first pass, frame subsampling, and -various pruning approaches. In experiments on phonetic recognition, we find -that with a combination of such techniques, it is possible to maintain -competitive performance while greatly reducing decoding, pruning, and training -time. -" -3262,1608.00938,"Christopher A. Ahern, Mitchell G. Newberry, Robin Clark, Joshua B. - Plotkin",Evolutionary forces in language change,q-bio.PE cs.CL," Languages and genes are both transmitted from generation to generation, with -opportunity for differential reproduction and survivorship of forms. Here we -apply a rigorous inference framework, drawn from population genetics, to -distinguish between two broad mechanisms of language change: drift and -selection. Drift is change that results from stochasticity in transmission and -it may occur in the absence of any intrinsic difference between linguistic -forms; whereas selection is truly an evolutionary force arising from intrinsic -differences -- for example, when one form is preferred by members of the -population. Using large corpora of parsed texts spanning the 12th century to -the 21st century, we analyze three examples of grammatical changes in English: -the regularization of past-tense verbs, the rise of the periphrastic `do', and -syntactic variation in verbal negation. We show that we can reject stochastic -drift in favor of a selective force driving some of these language changes, but -not others. The strength of drift depends on a word's frequency, and so drift -provides an alternative explanation for why some words are more prone to change -than others. Our results suggest an important role for stochasticity in -language change, and they provide a null model against which selective theories -of language evolution must be compared. -" -3263,1608.01018,"Dimitrios Kartsaklis, Martha Lewis, Laura Rimell","Proceedings of the 2016 Workshop on Semantic Spaces at the Intersection - of NLP, Physics and Cognitive Science",cs.CL cs.AI math.CT quant-ph," This volume contains the Proceedings of the 2016 Workshop on Semantic Spaces -at the Intersection of NLP, Physics and Cognitive Science (SLPCS 2016), which -was held on the 11th of June at the University of Strathclyde, Glasgow, and was -co-located with Quantum Physics and Logic (QPL 2016). Exploiting the common -ground provided by the concept of a vector space, the workshop brought together -researchers working at the intersection of Natural Language Processing (NLP), -cognitive science, and physics, offering them an appropriate forum for -presenting their uniquely motivated work and ideas. The interplay between these -three disciplines inspired theoretically motivated approaches to the -understanding of how word meanings interact with each other in sentences and -discourse, how diagrammatic reasoning depicts and simplifies this interaction, -how language models are determined by input from the world, and how word and -sentence meanings interact logically. This first edition of the workshop -consisted of three invited talks from distinguished speakers (Hans Briegel, -Peter G\""ardenfors, Dominic Widdows) and eight presentations of selected -contributed papers. 
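Relating to the drift-versus-selection analysis in the evolutionary-forces abstract above: tests of this flavor rescale frequency increments so that pure drift yields zero-mean noise, then test for a systematic trend. The sketch below follows the frequency increment test of Feder et al. (2014) on an invented frequency series; it is an illustration, not the authors' pipeline.

```python
import numpy as np
from scipy.stats import ttest_1samp

def frequency_increment_test(freqs, times):
    """Under pure drift the rescaled increments are zero-mean Gaussian,
    so a significantly nonzero mean is evidence for selection
    on the linguistic variant."""
    f, t = np.asarray(freqs, float), np.asarray(times, float)
    y = (f[1:] - f[:-1]) / np.sqrt(
        2 * f[:-1] * (1 - f[:-1]) * (t[1:] - t[:-1]))
    return ttest_1samp(y, 0.0)

# Toy series: the variant's frequency rises steadily (selection-like).
times = np.arange(12, dtype=float)
freqs = np.clip(0.1 + 0.07 * times
                + np.random.default_rng(9).normal(0, 0.01, 12), 0.01, 0.99)
print(frequency_increment_test(freqs, times))
```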
Each submission was refereed by at least three members of -the Programme Committee, who delivered detailed and insightful comments and -suggestions. -" -3264,1608.01056,Parminder Bhatia and Robert Guthrie and Jacob Eisenstein,Morphological Priors for Probabilistic Neural Word Embeddings,cs.CL," Word embeddings allow natural language processing systems to share -statistical information across related words. These embeddings are typically -based on distributional statistics, making it difficult for them to generalize -to rare or unseen words. We propose to improve word embeddings by incorporating -morphological information, capturing shared sub-word features. Unlike previous -work that constructs word embeddings directly from morphemes, we combine -morphological and distributional information in a unified probabilistic -framework, in which the word embedding is a latent variable. The morphological -information provides a prior distribution on the latent word embeddings, which -in turn condition a likelihood function over an observed corpus. This approach -yields improvements on intrinsic word similarity evaluations, and also in the -downstream task of part-of-speech tagging. -" -3265,1608.01084,Christian Hadiwinoto and Yang Liu and Hwee Tou Ng,"To Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering - in Statistical Machine Translation",cs.CL," Reordering poses a major challenge in machine translation (MT) between two -languages with significant differences in word order. In this paper, we present -a novel reordering approach utilizing sparse features based on dependency word -pairs. Each instance of these features captures whether two words, which are -related by a dependency link in the source sentence dependency parse tree, -follow the same order or are swapped in the translation output. Experiments on -Chinese-to-English translation show a statistically significant improvement of -1.21 BLEU point using our approach, compared to a state-of-the-art statistical -MT system that incorporates prior reordering approaches. -" -3266,1608.01238,Manuel R. Ciosici,Improving Quality of Hierarchical Clustering for Large Data Series,cs.CL cs.LG," Brown clustering is a hard, hierarchical, bottom-up clustering of words in a -vocabulary. Words are assigned to clusters based on their usage pattern in a -given corpus. The resulting clusters and hierarchical structure can be used in -constructing class-based language models and for generating features to be used -in NLP tasks. Because of its high computational cost, the most-used version of -Brown clustering is a greedy algorithm that uses a window to restrict its -search space. Like other clustering algorithms, Brown clustering finds a -sub-optimal, but nonetheless effective, mapping of words to clusters. Because -of its ability to produce high-quality, human-understandable cluster, Brown -clustering has seen high uptake the NLP research community where it is used in -the preprocessing and feature generation steps. - Little research has been done towards improving the quality of Brown -clusters, despite the greedy and heuristic nature of the algorithm. The -approaches tried so far have focused on: studying the effect of the -initialisation in a similar algorithm; tuning the parameters used to define the -desired number of clusters and the behaviour of the algorithm; and including a -separate parameter to differentiate the window from the desired number of -clusters. 
However, some of these approaches have not yielded significant -improvements in cluster quality. - In this thesis, a close analysis of the Brown algorithm is provided, -revealing important under-specifications and weaknesses in the original -algorithm. These have serious effects on cluster quality and reproducibility of -research using Brown clustering. In the second part of the thesis, two -modifications are proposed. Finally, a thorough evaluation is performed, -considering both the optimization criterion of Brown clustering and the -performance of the resulting class-based language models. -" -3267,1608.01247,S.K Kolluru and Prasenjit Mukherjee,Query Clustering using Segment Specific Context Embeddings,cs.IR cs.CL," This paper presents a novel query clustering approach to capture the broad -interest areas of users querying search engines. We make use of recent advances -in NLP - word2vec and extend it to get query2vec, vector representations of -queries, based on query contexts, obtained from the top search results for the -query and use a highly scalable Divide & Merge clustering algorithm on top of -the query vectors, to get the clusters. We have tried this approach on a -variety of segments, including Retail, Travel, Health, Phones and found the -clusters to be effective in discovering user's interest areas which have high -monetization potential. -" -3268,1608.01281,"Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, Ilya Sutskever",Learning Online Alignments with Continuous Rewards Policy Gradient,cs.LG cs.CL," Sequence-to-sequence models with soft attention had significant success in -machine translation, speech recognition, and question answering. Though capable -and easy to use, they require that the entirety of the input sequence is -available at the beginning of inference, an assumption that is not valid for -instantaneous translation and speech recognition. To address this problem, we -present a new method for solving sequence-to-sequence problems using hard -online alignments instead of soft offline alignments. The online alignments -model is able to start producing outputs without the need to first process the -entire input sequence. A highly accurate online sequence-to-sequence model is -useful because it can be used to build an accurate voice-based instantaneous -translator. Our model uses hard binary stochastic decisions to select the -timesteps at which outputs will be produced. The model is trained to produce -these stochastic decisions using a standard policy gradient method. In our -experiments, we show that this model achieves encouraging performance on TIMIT -and Wall Street Journal (WSJ) speech recognition datasets. -" -3269,1608.01298,"S\'andor Dar\'anyi, Peter Wittek, Konstantinos Konstantinidis, Symeon - Papadopoulos, Efstratios Kontopoulos",A Physical Metaphor to Study Semantic Drift,cs.CL cs.NE stat.ML," In accessibility tests for digital preservation, over time we experience -drifts of localized and labelled content in statistical models of evolving -semantics represented as a vector field. This articulates the need to detect, -measure, interpret and model outcomes of knowledge dynamics. To this end we -employ a high-performance machine learning algorithm for the training of -extremely large emergent self-organizing maps for exploratory data analysis. -The working hypothesis we present here is that the dynamics of semantic drifts -can be modeled on a relaxed version of Newtonian mechanics called social -mechanics. 
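For the query-clustering abstract above, a minimal version of the pipeline can be sketched as: embed each query by averaging word vectors from its top result snippets, then cluster the query vectors. Everything here (vocabulary, snippets, vectors) is toy data, and sklearn's KMeans stands in for the scalable divide-and-merge algorithm the authors use.

```python
import numpy as np
from sklearn.cluster import KMeans

def query_vector(snippets, word_vecs):
    """Represent a query by the average embedding of the words in the
    top search-result snippets that the query retrieves."""
    words = [w for s in snippets for w in s.lower().split()
             if w in word_vecs]
    return np.mean([word_vecs[w] for w in words], axis=0)

rng = np.random.default_rng(10)
vocab = ["flight", "hotel", "fever", "cough", "paris", "doctor"]
word_vecs = {w: rng.normal(size=16) for w in vocab}
contexts = {                      # invented result snippets per query
    "cheap tickets": ["flight paris", "hotel paris flight"],
    "city breaks":   ["hotel paris", "paris flight hotel"],
    "flu symptoms":  ["fever cough doctor", "doctor fever"],
}
Q = np.array([query_vector(s, word_vecs) for s in contexts.values()])
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Q))
```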
By using term distances as a measure of semantic relatedness vs. -their PageRank values indicating social importance and applied as variable -`term mass', gravitation as a metaphor to express changes in the semantic -content of a vector field lends a new perspective for experimentation. From -`term gravitation' over time, one can compute its generating potential whose -fluctuations manifest modifications in pairwise term similarity vs. social -importance, thereby updating Osgood's semantic differential. The dataset -examined is the public catalog metadata of Tate Galleries, London. -" -3270,1608.01401,"Daniela Ashoush (Univesity of Oxford), Bob Coecke (Univesity of - Oxford)",Dual Density Operators and Natural Language Meaning,cs.CL cs.LO quant-ph," Density operators allow for representing ambiguity about a vector -representation, both in quantum theory and in distributional natural language -meaning. Formally equivalently, they allow for discarding part of the -description of a composite system, where we consider the discarded part to be -the context. We introduce dual density operators, which allow for two -independent notions of context. We demonstrate the use of dual density -operators within a grammatical-compositional distributional framework for -natural language meaning. We show that dual density operators can be used to -simultaneously represent: (i) ambiguity about word meanings (e.g. queen as a -person vs. queen as a band), and (ii) lexical entailment (e.g. tiger -> -mammal). We provide a proof-of-concept example. -" -3271,1608.01402,"Josef Bolt (Univesity of Oxford), Bob Coecke (Univesity of Oxford), - Fabrizio Genovese (Univesity of Oxford), Martha Lewis (Univesity of Oxford), - Daniel Marsden (Univesity of Oxford), Robin Piedeleu (Univesity of Oxford)",Interacting Conceptual Spaces,cs.AI cs.CL cs.LO," We propose applying the categorical compositional scheme of [6] to conceptual -space models of cognition. In order to do this we introduce the category of -convex relations as a new setting for categorical compositional semantics, -emphasizing the convex structure important to conceptual space applications. We -show how conceptual spaces for composite types such as adjectives and verbs can -be constructed. We illustrate this new model on detailed examples. -" -3272,1608.01403,"Stephen McGregor (Queen Mary University of London), Matthew Purver - (Queen Mary University of London), Geraint Wiggins (Queen Mary University of - London)","Words, Concepts, and the Geometry of Analogy",cs.CL," This paper presents a geometric approach to the problem of modelling the -relationship between words and concepts, focusing in particular on analogical -phenomena in language and cognition. Grounded in recent theories regarding -geometric conceptual spaces, we begin with an analysis of existing static -distributional semantic models and move on to an exploration of a dynamic -approach to using high dimensional spaces of word meaning to project subspaces -where analogies can potentially be solved in an online, contextualised way. The -crucial element of this analysis is the positioning of statistics in a -geometric environment replete with opportunities for interpretation. -" -3273,1608.01404,Mehrnoosh Sadrzadeh (Queen Mary University of London),Quantifier Scope in Categorical Compositional Distributional Semantics,cs.CL cs.AI cs.LO," In previous work with J. 
Hedges, we formalised a generalised quantifiers -theory of natural language in categorical compositional distributional -semantics with the help of bialgebras. In this paper, we show how quantifier -scope ambiguity can be represented in that setting and how this representation -can be generalised to branching quantifiers. -" -3274,1608.01405,John van de Wetering (Radboud University),Entailment Relations on Distributions,cs.CL," In this paper we give an overview of partial orders on the space of -probability distributions that carry a notion of information content and serve -as a generalisation of the Bayesian order given in (Coecke and Martin, 2011). -We investigate what constraints are necessary in order to get a unique notion -of information content. These partial orders can be used to give an ordering on -words in vector space models of natural language meaning relating to the -contexts in which words are used, which is useful for a notion of entailment -and word disambiguation. The construction used also points towards a way to -create orderings on the space of density operators which allow a more -fine-grained study of entailment. The partial orders in this paper are directed -complete and form domains in the sense of domain theory. -" -3275,1608.01406,"William Zeng (Rigetti Computing), Bob Coecke (Univesity of Oxford)",Quantum Algorithms for Compositional Natural Language Processing,cs.CL quant-ph," We propose a new application of quantum computing to the field of natural -language processing. Ongoing work in this field attempts to incorporate -grammatical structure into algorithms that compute meaning. In (Coecke, -Sadrzadeh and Clark, 2010), the authors introduce such a model (the CSC model) -based on tensor product composition. While this algorithm has many advantages, -its implementation is hampered by the large classical computational resources -that it requires. In this work we show how computational shortcomings of the -CSC approach could be resolved using quantum computation (possibly in addition -to existing techniques for dimension reduction). We address the value of -quantum RAM (Giovannetti,2008) for this model and extend an algorithm from -Wiebe, Braun and Lloyd (2012) into a quantum algorithm to categorize sentences -in CSC. Our new algorithm demonstrates a quadratic speedup over classical -methods under certain conditions. -" -3276,1608.01413,Subhro Roy and Dan Roth,Solving General Arithmetic Word Problems,cs.CL," This paper presents a novel approach to automatically solving arithmetic word -problems. This is the first algorithmic approach that can handle arithmetic -problems with multiple steps and operations, without depending on additional -annotations or predefined templates. We develop a theory for expression trees -that can be used to represent and evaluate the target arithmetic expressions; -we use it to uniquely decompose the target arithmetic problem to multiple -classification problems; we then compose an expression tree, combining these -with world knowledge through a constrained inference framework. Our classifiers -gain from the use of {\em quantity schemas} that supports better extraction of -features. Experimental results show that our method outperforms existing -systems, achieving state of the art performance on benchmark datasets of -arithmetic word problems. 
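The expression-tree representation at the heart of this approach is simple to state in code: leaves are quantities extracted from the problem text and internal nodes are operations, so a candidate tree can be evaluated directly. A minimal sketch:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Node:
    """Binary expression tree over the quantities of a word problem."""
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Node, float]

def evaluate(e: Expr) -> float:
    if isinstance(e, (int, float)):          # leaf: a quantity
        return float(e)
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    return ops[e.op](evaluate(e.left), evaluate(e.right))

# "Dan has 5 boxes of 12 pencils and gives away 8": (5 * 12) - 8
tree = Node("-", Node("*", 5, 12), 8)
print(evaluate(tree))  # 52.0
```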
-" -3277,1608.01448,"Qingrong Xia, Zhenghua Li, Jiayuan Chao, Min Zhang","Word Segmentation on Micro-blog Texts with External Lexicon and - Heterogeneous Data",cs.CL," This paper describes our system designed for the NLPCC 2016 shared task on -word segmentation on micro-blog texts. -" -3278,1608.01561,"Paheli Bhattacharya, Pawan Goyal and Sudeshna Sarkar","UsingWord Embeddings for Query Translation for Hindi to English Cross - Language Information Retrieval",cs.CL," Cross-Language Information Retrieval (CLIR) has become an important problem -to solve in the recent years due to the growth of content in multiple languages -in the Web. One of the standard methods is to use query translation from source -to target language. In this paper, we propose an approach based on word -embeddings, a method that captures contextual clues for a particular word in -the source language and gives those words as translations that occur in a -similar context in the target language. Once we obtain the word embeddings of -the source and target language pairs, we learn a projection from source to -target word embeddings, making use of a dictionary with word translation -pairs.We then propose various methods of query translation and aggregation. The -advantage of this approach is that it does not require the corpora to be -aligned (which is difficult to obtain for resource-scarce languages), a -dictionary with word translation pairs is enough to train the word vectors for -translation. We experiment with Forum for Information Retrieval and Evaluation -(FIRE) 2008 and 2012 datasets for Hindi to English CLIR. The proposed word -embedding based approach outperforms the basic dictionary based approach by 70% -and when the word embeddings are combined with the dictionary, the hybrid -approach beats the baseline dictionary based method by 77%. It outperforms the -English monolingual baseline by 15%, when combined with the translations -obtained from Google Translate and Dictionary. -" -3279,1608.01884,Ernest Davis,Winograd Schemas and Machine Translation,cs.AI cs.CL," A Winograd schema is a pair of sentences that differ in a single word and -that contain an ambiguous pronoun whose referent is different in the two -sentences and requires the use of commonsense knowledge or world knowledge to -disambiguate. This paper discusses how Winograd schemas and other sentence -pairs could be used as challenges for machine translation using distinctions -between pronouns, such as gender, that appear in the target language but not in -the source. -" -3280,1608.01910,"Pranava Swaroop Madhyastha, Cristina Espa\~na-Bonet","Resolving Out-of-Vocabulary Words with Bilingual Embeddings in Machine - Translation",cs.CL," Out-of-vocabulary words account for a large proportion of errors in machine -translation systems, especially when the system is used on a different domain -than the one where it was trained. In order to alleviate the problem, we -propose to use a log-bilinear softmax-based model for vocabulary expansion, -such that given an out-of-vocabulary source word, the model generates a -probabilistic list of possible translations in the target language. Our model -uses only word embeddings trained on significantly large unlabelled monolingual -corpora and trains over a fairly small, word-to-word bilingual dictionary. We -input this probabilistic list into a standard phrase-based statistical machine -translation system and obtain consistent improvements in translation quality on -the English-Spanish language pair. 
Especially, we get an improvement of 3.9 -BLEU points when tested over an out-of-domain test set. -" -3281,1608.01961,Mohammad Taher Pilehvar and Nigel Collier,De-Conflated Semantic Representations,cs.CL cs.AI," One major deficiency of most semantic representation techniques is that they -usually model a word type as a single point in the semantic space, hence -conflating all the meanings that the word can have. Addressing this issue by -learning distinct representations for individual meanings of words has been the -subject of several research studies in the past few years. However, the -generated sense representations are either not linked to any sense inventory or -are unreliable for infrequent word senses. We propose a technique that tackles -these problems by de-conflating the representations of words based on the deep -knowledge it derives from a semantic network. Our approach provides multiple -advantages in comparison to the past work, including its high coverage and the -ability to generate accurate representations even for infrequent word senses. -We carry out evaluations on six datasets across two semantic similarity tasks -and report state-of-the-art results on most of them. -" -3282,1608.01965,Camilo Akimushkin and Diego R. Amancio and Osvaldo N. Oliveira Jr,"Text authorship identified using the dynamics of word co-occurrence - networks",cs.CL," The identification of authorship in disputed documents still requires human -expertise, which is now unfeasible for many tasks owing to the large volumes of -text and authors in practical applications. In this study, we introduce a -methodology based on the dynamics of word co-occurrence networks representing -written texts to classify a corpus of 80 texts by 8 authors. The texts were -divided into sections with equal number of linguistic tokens, from which time -series were created for 12 topological metrics. The series were proven to be -stationary (p-value>0.05), which permits to use distribution moments as -learning attributes. With an optimized supervised learning procedure using a -Radial Basis Function Network, 68 out of 80 texts were correctly classified, -i.e. a remarkable 85% author matching success rate. Therefore, fluctuations in -purely dynamic network metrics were found to characterize authorship, thus -opening the way for the description of texts in terms of small evolving -networks. Moreover, the approach introduced allows for comparison of texts with -diverse characteristics in a simple, fast fashion. -" -3283,1608.01972,"Sun Kim, Nicolas Fiorini, W. John Wilbur and Zhiyong Lu","Bridging the Gap: Incorporating a Semantic Similarity Measure for - Effectively Mapping PubMed Queries to Documents",cs.CL cs.IR," The main approach of traditional information retrieval (IR) is to examine how -many words from a query appear in a document. A drawback of this approach, -however, is that it may fail to detect relevant documents where no or only few -words from a query are found. The semantic analysis methods such as LSA (latent -semantic analysis) and LDA (latent Dirichlet allocation) have been proposed to -address the issue, but their performance is not superior compared to common IR -approaches. Here we present a query-document similarity measure motivated by -the Word Mover's Distance. Unlike other similarity measures, the proposed -method relies on neural word embeddings to compute the distance between words. -This process helps identify related words when no direct matches are found -between a query and a document. 
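A simplified, relaxed variant of this idea can be written in a few lines: align each query word with its nearest document word in embedding space and average the similarities. This is a common relaxation of the Word Mover's Distance, not necessarily the paper's exact measure; the vectors below are toy data.

```python
import numpy as np

def soft_match_score(query, doc, vectors):
    """Align each query word to its closest document word in embedding
    space and average, so related words count even without
    exact lexical matches."""
    def unit(w):
        v = vectors[w]
        return v / np.linalg.norm(v)
    doc_vecs = np.array([unit(w) for w in doc if w in vectors])
    scores = [float(np.max(doc_vecs @ unit(w)))
              for w in query if w in vectors]
    return float(np.mean(scores)) if scores else 0.0

rng = np.random.default_rng(2)
vecs = {w: rng.normal(size=20)
        for w in ["gene", "dna", "protein", "car", "engine"]}
vecs["dna"] = vecs["gene"] + 0.1 * rng.normal(size=20)  # related words
print(soft_match_score(["gene"], ["dna", "protein"], vecs))  # high
print(soft_match_score(["gene"], ["car", "engine"], vecs))   # low
```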
Our method is efficient and straightforward to -implement. The experimental results on TREC Genomics data show that our -approach outperforms the BM25 ranking function by an average of 12% in mean -average precision. Furthermore, for a real-world dataset collected from the -PubMed search logs, we combine the semantic measure with BM25 using a learning -to rank method, which leads to improved ranking scores by up to 25%. This -experiment demonstrates that the proposed approach and BM25 nicely complement -each other and together produce superior performance. -" -3284,1608.02025,Jake Ryland Williams,Boundary-based MWE segmentation with text partitioning,cs.CL," This work presents a fine-grained, text-chunking algorithm designed for the -task of multiword expressions (MWEs) segmentation. As a lexical class, MWEs -include a wide variety of idioms, whose automatic identification are a -necessity for the handling of colloquial language. This algorithm's core -novelty is its use of non-word tokens, i.e., boundaries, in a bottom-up -strategy. Leveraging boundaries refines token-level information, forging -high-level performance from relatively basic data. The generality of this -model's feature space allows for its application across languages and domains. -Experiments spanning 19 different languages exhibit a broadly-applicable, -state-of-the-art model. Evaluation against recent shared-task data places text -partitioning as the overall, best performing MWE segmentation algorithm, -covering all MWE classes and multiple English domains (including user-generated -text). This performance, coupled with a non-combinatorial, fast-running design, -produces an ideal combination for implementations at scale, which are -facilitated through the release of open-source software. -" -3285,1608.02071,"Yun Liu, Kun-Ta Chuang, Fu-Wen Liang, Huey-Jen Su, Collin M. Stultz, - John V. Guttag",Transferring Knowledge from Text to Predict Disease Onset,cs.LG cs.CL," In many domains such as medicine, training data is in short supply. In such -cases, external knowledge is often helpful in building predictive models. We -propose a novel method to incorporate publicly available domain expertise to -build accurate models. Specifically, we use word2vec models trained on a -domain-specific corpus to estimate the relevance of each feature's text -description to the prediction problem. We use these relevance estimates to -rescale the features, causing more important features to experience weaker -regularization. - We apply our method to predict the onset of five chronic diseases in the next -five years in two genders and two age groups. Our rescaling approach improves -the accuracy of the model, particularly when there are few positive examples. -Furthermore, our method selects 60% fewer features, easing interpretation by -physicians. Our method is applicable to other domains where feature and outcome -descriptions are available. -" -3286,1608.02076,Hao Cheng and Hao Fang and Xiaodong He and Jianfeng Gao and Li Deng,Bi-directional Attention with Agreement for Dependency Parsing,cs.CL cs.AI cs.LG," We develop a novel bi-directional attention model for dependency parsing, -which learns to agree on headword predictions from the forward and backward -parsing directions. The parsing procedure for each direction is formulated as -sequentially querying the memory component that stores continuous headword -embeddings. 
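The memory query amounts to soft attention over candidate headword embeddings, roughly as sketched below with plain numpy: score candidates against a query, softmax, and return the weighted average instead of committing to one head. The dimensions and vectors are invented for illustration.

```python
import numpy as np

def soft_headword(query, head_embeddings):
    """Soft attention over candidate headwords: return the
    attention-weighted average of their embeddings, letting later
    decisions condition on uncertain parsing history."""
    scores = head_embeddings @ query              # dot-product attention
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax
    return weights, weights @ head_embeddings    # soft head embedding

rng = np.random.default_rng(3)
heads = rng.normal(size=(6, 32))                 # memory of 6 candidates
q = heads[2] + 0.05 * rng.normal(size=32)        # query resembling word 3
w, soft = soft_headword(q, heads)
print(np.round(w, 3))   # attention mass should concentrate on index 2
```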
The proposed parser makes use of {\it soft} headword embeddings, -allowing the model to implicitly capture high-order parsing history without -dramatically increasing the computational complexity. We conduct experiments on -English, Chinese, and 12 other languages from the CoNLL 2006 shared task, -showing that the proposed model achieves state-of-the-art unlabeled attachment -scores on 6 languages. -" -3287,1608.02094,Leon Derczynski,Desiderata for Vector-Space Word Representations,cs.CL," A plethora of vector-space representations for words is currently available, -which is growing. These consist of fixed-length vectors containing real values, -which represent a word. The result is a representation upon which the power of -many conventional information processing and data mining techniques can be -brought to bear, as long as the representations are designed with some -forethought and fit certain constraints. This paper details desiderata for the -design of vector space representations of words. -" -3288,1608.02097,"Su Zhu, Kai Yu","Encoder-decoder with Focus-mechanism for Sequence Labelling Based Spoken - Language Understanding",cs.CL," This paper investigates the framework of encoder-decoder with attention for -sequence labelling based spoken language understanding. We introduce -Bidirectional Long Short Term Memory - Long Short Term Memory networks -(BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep -learning. In the sequence labelling task, the input and output sequences are -aligned word by word, while the attention mechanism cannot provide the exact -alignment. To address this limitation, we propose a novel focus mechanism for -encoder-decoder framework. Experiments on the standard ATIS dataset showed that -BLSTM-LSTM with focus mechanism defined the new state-of-the-art by -outperforming standard BLSTM and attention based encoder-decoder. Further -experiments also show that the proposed model is more robust to speech -recognition errors. -" -3289,1608.02117,"Ivan Vuli\'c, Daniela Gerz, Douwe Kiela, Felix Hill, and Anna Korhonen",HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment,cs.CL," We introduce HyperLex - a dataset and evaluation resource that quantifies the -extent of of the semantic category membership, that is, type-of relation also -known as hyponymy-hypernymy or lexical entailment (LE) relation between 2,616 -concept pairs. Cognitive psychology research has established that typicality -and category/class membership are computed in human semantic memory as a -gradual rather than binary relation. Nevertheless, most NLP research, and -existing large-scale invetories of concept category membership (WordNet, -DBPedia, etc.) treat category membership and LE as binary. To address this, we -asked hundreds of native English speakers to indicate typicality and strength -of category membership between a diverse range of concept pairs on a -crowdsourcing platform. Our results confirm that category membership and LE are -indeed more gradual than binary. We then compare these human judgements with -the predictions of automatic systems, which reveals a huge gap between human -performance and state-of-the-art LE, distributional and representation learning -models, and substantial differences between the models themselves. We discuss a -pathway for improving semantic models to overcome this discrepancy, and -indicate future application areas for improved graded LE systems. -" -3290,1608.02153,"U. Springmann and A. 
L\""udeling","OCR of historical printings with an application to building diachronic - corpora: A case study using the RIDGES herbal corpus",cs.CL cs.DL," This article describes the results of a case study that applies Neural -Network-based Optical Character Recognition (OCR) to scanned images of books -printed between 1487 and 1870 by training the OCR engine OCRopus -[@breuel2013high] on the RIDGES herbal text corpus [@OdebrechtEtAlSubmitted]. -Training specific OCR models was possible because the necessary *ground truth* -is available as error-corrected diplomatic transcriptions. The OCR results have -been evaluated for accuracy against the ground truth of unseen test sets. -Character and word accuracies (percentage of correctly recognized items) for -the resulting machine-readable texts of individual documents range from 94% to -more than 99% (character level) and from 76% to 97% (word level). This includes -the earliest printed books, which were thought to be inaccessible by OCR -methods until recently. Furthermore, OCR models trained on one part of the -corpus consisting of books with different printing dates and different typesets -*(mixed models)* have been tested for their predictive power on the books from -the other part containing yet other fonts, mostly yielding character accuracies -well above 90%. It therefore seems possible to construct generalized models -trained on a range of fonts that can be applied to a wide variety of historical -printings still giving good results. A moderate postcorrection effort of some -pages will then enable the training of individual models with even better -accuracies. Using this method, diachronic corpora including early printings can -be constructed much faster and cheaper than by manual transcription. The OCR -methods reported here open up the possibility of transforming our printed -textual cultural heritage into electronic text by largely automatic means, -which is a prerequisite for the mass conversion of scanned books. -" -3291,1608.02195,Felix Biessmann,Automating Political Bias Prediction,cs.SI cs.CL," Every day media generate large amounts of text. An unbiased view on media -reports requires an understanding of the political bias of media content. -Assistive technology for estimating the political bias of texts can be helpful -in this context. This study proposes a simple statistical learning approach to -predict political bias from text. Standard text features extracted from -speeches and manifestos of political parties are used to predict political bias -in terms of political party affiliation and in terms of political views. -Results indicate that political bias can be predicted with above chance -accuracy. Mistakes of the model can be interpreted with respect to changes of -policies of political actors. Two approaches are presented to make the results -more interpretable: a) discriminative text features are related to the -political orientation of a party and b) sentiment features of texts are -correlated with a measure of political power. Political power appears to be -strongly correlated with positive sentiment of a text. To highlight some -potential use cases a web application shows how the model can be used for texts -for which the political bias is not clear such as news articles. -" -3292,1608.02214,"Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme",Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network,cs.CL," Language processing mechanism by humans is generally more robust than -computers. 
The Cmabrigde Uinervtisy (Cambridge University) effect from the -psycholinguistics literature has demonstrated such a robust word processing -mechanism, where jumbled words (e.g. Cmabrigde / Cambridge) are recognized with -little cost. On the other hand, computational models for word recognition (e.g. -spelling checkers) perform poorly on data with such noise. Inspired by the -findings from the Cmabrigde Uinervtisy effect, we propose a word recognition -model based on a semi-character level recurrent neural network (scRNN). In our -experiments, we demonstrate that scRNN has significantly more robust -performance in word spelling correction (i.e. word recognition) compared to -existing spelling checkers and character-based convolutional neural network. -Furthermore, we demonstrate that the model is cognitively plausible by -replicating a psycholinguistics experiment about human reading difficulty using -our model. -" -3293,1608.02254,"Max Kanovich, Stepan Kuznetsov, Andre Scedrov","Reconciling Lambek's restriction, cut-elimination, and substitution in - the presence of exponential modalities",math.LO cs.CL," The Lambek calculus can be considered as a version of non-commutative -intuitionistic linear logic. One of the interesting features of the Lambek -calculus is the so-called ""Lambek's restriction,"" that is, the antecedent of -any provable sequent should be non-empty. In this paper we discuss ways of -extending the Lambek calculus with the linear logic exponential modality while -keeping Lambek's restriction. Interestingly enough, we show that for any system -equipped with a reasonable exponential modality the following holds: if the -system enjoys cut elimination and substitution to the full extent, then the -system necessarily violates Lambek's restriction. Nevertheless, we show that -two of the three conditions can be implemented. Namely, we design a system with -Lambek's restriction and cut elimination and another system with Lambek's -restriction and substitution. For both calculi we prove that they are -undecidable, even if we take only one of the two divisions provided by the -Lambek calculus. The system with cut elimination and substitution and without -Lambek's restriction is folklore and known to be undecidable. -" -3294,1608.02272,"Ali Khodabakhsh, Seyyed Saeed Sarfjoo, Umut Uludag, Osman Soyyigit, - Cenk Demiroglu","Incorporation of Speech Duration Information in Score Fusion of Speaker - Recognition Systems",cs.SD cs.CL," In recent years identity-vector (i-vector) based speaker verification (SV) -systems have become very successful. Nevertheless, environmental noise and -speech duration variability still have a significant effect on degrading the -performance of these systems. In many real-life applications, duration of -recordings are very short; as a result, extracted i-vectors cannot reliably -represent the attributes of the speaker. Here, we investigate the effect of -speech duration on the performance of three state-of-the-art speaker -recognition systems. In addition, using a variety of available score fusion -methods, we investigate the effect of score fusion for those speaker -verification techniques to benefit from the performance difference of different -methods under different enrollment and test speech duration conditions. This -technique performed significantly better than the baseline score fusion -methods. 
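A simple duration-aware score fusion along these lines can be set up as a logistic regression over subsystem scores plus duration features, including an interaction term so the fuser can downweight duration-sensitive systems on short utterances. The data below is synthetic and the setup is a sketch, not the authors' fusion method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 1000
labels = rng.integers(0, 2, n)                 # 1 = target speaker trial
duration = rng.uniform(2.0, 60.0, n)           # seconds of speech
# Three synthetic subsystem scores; system 2 degrades on short utterances.
s1 = labels + rng.normal(0, 1.0, n)
s2 = labels + rng.normal(0, 1.0, n) * (10.0 / np.sqrt(duration))
s3 = labels + rng.normal(0, 1.5, n)
X = np.column_stack([s1, s2, s3, np.log(duration),
                     s2 * np.log(duration)])   # duration interaction term
fuser = LogisticRegression(max_iter=1000).fit(X, labels)
print(fuser.score(X, labels))   # training accuracy of the fused system
```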
-" -3295,1608.02289,"Rossano Schifanella, Paloma de Juan, Joel Tetreault, Liangliang Cao",Detecting Sarcasm in Multimodal Social Platforms,cs.CV cs.CL cs.MM," Sarcasm is a peculiar form of sentiment expression, where the surface -sentiment differs from the implied sentiment. The detection of sarcasm in -social media platforms has been applied in the past mainly to textual -utterances where lexical indicators (such as interjections and intensifiers), -linguistic markers, and contextual information (such as user profiles, or past -conversations) were used to detect the sarcastic tone. However, modern social -media platforms allow to create multimodal messages where audiovisual content -is integrated with the text, making the analysis of a mode in isolation -partial. In our work, we first study the relationship between the textual and -visual aspects in multimodal posts from three major social media platforms, -i.e., Instagram, Tumblr and Twitter, and we run a crowdsourcing task to -quantify the extent to which images are perceived as necessary by human -annotators. Moreover, we propose two different computational frameworks to -detect sarcasm that integrate the textual and visual modalities. The first -approach exploits visual semantics trained on an external dataset, and -concatenates the semantics features with state-of-the-art textual features. The -second method adapts a visual neural network initialized with parameters -trained on ImageNet to multimodal sarcastic posts. Results show the positive -effect of combining modalities for the detection of sarcasm across platforms -and methods. -" -3296,1608.02519,"Marina Sokolova, Kanyi Huang, Stan Matwin, Joshua Ramisch, Vera - Sazonova, Renee Black, Chris Orwa, Sidney Ochieng, Nanjira Sambuli",Topic Modelling and Event Identification from Twitter Textual Data,cs.SI cs.CL," The tremendous growth of social media content on the Internet has inspired -the development of the text analytics to understand and solve real-life -problems. Leveraging statistical topic modelling helps researchers and -practitioners in better comprehension of textual content as well as provides -useful information for further analysis. Statistical topic modelling becomes -especially important when we work with large volumes of dynamic text, e.g., -Facebook or Twitter datasets. In this study, we summarize the message content -of four data sets of Twitter messages relating to challenging social events in -Kenya. We use Latent Dirichlet Allocation (LDA) topic modelling to analyze the -content. Our study uses two evaluation measures, Normalized Mutual Information -(NMI) and topic coherence analysis, to select the best LDA models. The obtained -LDA results show that the tool can be effectively used to extract discussion -topics and summarize them for further manual analysis -" -3297,1608.02689,Nanyun Peng and Mark Dredze,Multi-task Domain Adaptation for Sequence Tagging,cs.CL cs.LG," Many domain adaptation approaches rely on learning cross domain shared -representations to transfer the knowledge learned in one domain to other -domains. Traditional domain adaptation only considers adapting for one task. In -this paper, we explore multi-task representation learning under the domain -adaptation scenario. We propose a neural network framework that supports domain -adaptation for multiple tasks simultaneously, and learns shared representations -that better generalize for domain adaptation. 
We apply the proposed framework -to domain adaptation for sequence tagging problems considering two tasks: -Chinese word segmentation and named entity recognition. Experiments show that -multi-task domain adaptation works better than disjoint domain adaptation for -each task, and achieves the state-of-the-art results for both tasks in the -social media domain. -" -3298,1608.02717,Ashkan Mokarian and Mateusz Malinowski and Mario Fritz,"Mean Box Pooling: A Rich Image Representation and Output Embedding for - the Visual Madlibs Task",cs.CV cs.AI cs.CL cs.LG," We present Mean Box Pooling, a novel visual representation that pools over -CNN representations of a large number, highly overlapping object proposals. We -show that such representation together with nCCA, a successful multimodal -embedding technique, achieves state-of-the-art performance on the Visual -Madlibs task. Moreover, inspired by the nCCA's objective function, we extend -classical CNN+LSTM approach to train the network by directly maximizing the -similarity between the internal representation of the deep learning -architecture and candidate answers. Again, such approach achieves a significant -improvement over the prior work that also uses CNN+LSTM approach on Visual -Madlibs. -" -3299,1608.02784,"Nikos Papasarantopoulos, Helen Jiang, Shay B. Cohen",Canonical Correlation Inference for Mapping Abstract Scenes to Text,cs.CL," We describe a technique for structured prediction, based on canonical -correlation analysis. Our learning algorithm finds two projections for the -input and the output spaces that aim at projecting a given input and its -correct output into points close to each other. We demonstrate our technique on -a language-vision problem, namely the problem of giving a textual description -to an ""abstract scene"". -" -3300,1608.02893,David Cox,Syntactically Informed Text Compression with Recurrent Neural Networks,cs.LG cs.CL cs.IT math.IT," We present a self-contained system for constructing natural language models -for use in text compression. Our system improves upon previous neural network -based models by utilizing recent advances in syntactic parsing -- Google's -SyntaxNet -- to augment character-level recurrent neural networks. RNNs have -proven exceptional in modeling sequence data such as text, as their -architecture allows for modeling of long-term contextual information. -" -3301,1608.02904,"Jeniya Tabassum, Alan Ritter, Wei Xu","TweeTime: A Minimally Supervised Method for Recognizing and Normalizing - Time Expressions in Twitter",cs.IR cs.CL," We describe TweeTIME, a temporal tagger for recognizing and normalizing time -expressions in Twitter. Most previous work in social media analysis has to rely -on temporal resolvers that are designed for well-edited text, and therefore -suffer from the reduced performance due to domain mismatch. We present a -minimally supervised method that learns from large quantities of unlabeled data -and requires no hand-engineered rules or hand-annotated training corpora. -TweeTIME achieves 0.68 F1 score on the end-to-end task of resolving date -expressions, outperforming a broad range of state-of-the-art systems. -" -3302,1608.02926,Michael Henry Tessler and Noah D. Goodman,The Language of Generalization,cs.CL," Language provides simple ways of communicating generalizable knowledge to -each other (e.g., ""Birds fly"", ""John hikes"", ""Fire makes smoke""). 
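The probabilistic account this abstract develops can be caricatured in a few lines: a generic "Ks F" is endorsed when the prevalence of F among Ks exceeds an uncertain threshold, integrated over a prior under which most properties are rare. The Beta prior parameters below are arbitrary choices for illustration, not the paper's fitted model.

```python
import numpy as np

def endorsement(prevalence, a=0.5, b=2.0, n=100_000, seed=0):
    """Graded endorsement of a generic 'Ks F': probability that the
    property's prevalence among Ks exceeds a threshold drawn from a
    prior under which most properties are rare."""
    theta = np.random.default_rng(seed).beta(a, b, n)
    return float(np.mean(prevalence > theta))

print(round(endorsement(0.50), 2))  # moderate prevalence ("birds lay
                                    # eggs"): still strongly endorsed
print(round(endorsement(0.01), 2))  # rare property: weakly endorsed
```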
Though found -in every language and emerging early in development, the language of -generalization is philosophically puzzling and has resisted precise -formalization. Here, we propose the first formal account of generalizations -conveyed with language that makes quantitative predictions about human -understanding. We test our model in three diverse domains: generalizations -about categories (generic language), events (habitual language), and causes -(causal language). The model explains the gradience in human endorsement -through the interplay between a simple truth-conditional semantic theory and -diverse beliefs about properties, formalized in a probabilistic model of -language understanding. This work opens the door to understanding precisely how -abstract knowledge is learned from language. -" -3303,1608.02927,"Baskaran Sankaran, Haitao Mi, Yaser Al-Onaizan, Abe Ittycheriah",Temporal Attention Model for Neural Machine Translation,cs.CL," Attention-based Neural Machine Translation (NMT) models suffer from attention -deficiency issues as has been observed in recent research. We propose a novel -mechanism to address some of these limitations and improve the NMT attention. -Specifically, our approach memorizes the alignments temporally (within each -sentence) and modulates the attention with the accumulated temporal memory, as -the decoder generates the candidate translation. We compare our approach -against the baseline NMT model and two other related approaches that address -this issue either explicitly or implicitly. Large-scale experiments on two -language pairs show that our approach achieves better and robust gains over the -baseline and related NMT approaches. Our model further outperforms strong SMT -baselines in some settings even without using ensembles. -" -3304,1608.02996,Antonio Valerio Miceli Barone,"Towards cross-lingual distributed representations without parallel text - trained with adversarial autoencoders",cs.CL cs.LG cs.NE," Current approaches to learning vector representations of text that are -compatible between different languages usually require some amount of parallel -text, aligned at word, sentence or at least document level. We hypothesize -however, that different natural languages share enough semantic structure that -it should be possible, in principle, to learn compatible vector representations -just by analyzing the monolingual distribution of words. - In order to evaluate this hypothesis, we propose a scheme to map word vectors -trained on a source language to vectors semantically compatible with word -vectors trained on a target language using an adversarial autoencoder. - We present preliminary qualitative results and discuss possible future -developments of this technique, such as applications to cross-lingual sentence -representations. -" -3305,1608.03000,"Nicholas Locascio, Karthik Narasimhan, Eduardo DeLeon, Nate Kushman, - Regina Barzilay","Neural Generation of Regular Expressions from Natural Language with - Minimal Domain Knowledge",cs.CL cs.AI," This paper explores the task of translating natural language queries into -regular expressions which embody their meaning. In contrast to prior work, the -proposed neural model does not utilize domain-specific crafting, learning to -translate directly from a parallel corpus. To fully explore the potential of -neural models, we propose a methodology for collecting a large corpus of -regular expression, natural language pairs. 
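One common way to bootstrap such a corpus is to synthesize (regex, description) pairs from compositional templates, and later have crowdworkers paraphrase the stilted descriptions. The fragments and templates below are invented toy examples of that pattern, not the paper's grammar.

```python
import random
import re

FRAGMENTS = [                      # atomic character classes
    (r"[A-Z]", "a capital letter"),
    (r"[0-9]", "a digit"),
    (r"[aeiou]", "a vowel"),
]
TEMPLATES = [                      # compositional wrappers
    (lambda f: f"{f[0]}+",   lambda f: f"one or more of {f[1]}"),
    (lambda f: f"{f[0]}*",   lambda f: f"zero or more of {f[1]}"),
    (lambda f: f".*{f[0]}",  lambda f: f"anything ending in {f[1]}"),
]

def sample_pair(rng):
    frag = rng.choice(FRAGMENTS)
    make_re, make_nl = rng.choice(TEMPLATES)
    return make_re(frag), make_nl(frag)

rng = random.Random(7)
for _ in range(3):
    regex, desc = sample_pair(rng)
    print(f"{regex!r:12} <-> {desc!r}")
    re.compile(regex)   # sanity check: every sampled regex is valid
```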
Our resulting model achieves a -performance gain of 19.6% over previous state-of-the-art models. -" -3306,1608.03030,"Aaron Jaech and George Mulcaire and Shobhit Hathi and Mari Ostendorf - and Noah A. Smith",Hierarchical Character-Word Models for Language Identification,cs.CL," Social media messages' brevity and unconventional spelling pose a challenge -to language identification. We introduce a hierarchical model that learns -character and contextualized word-level representations for language -identification. Our method performs well against strong base- lines, and can -also reveal code-switching. -" -3307,1608.03065,C. Maria Keet,"An assessment of orthographic similarity measures for several African - languages",cs.CL," Natural Language Interfaces and tools such as spellcheckers and Web search in -one's own language are known to be useful in ICT-mediated communication. Most -languages in Southern Africa are under-resourced, however. Therefore, it would -be very useful if both the generic and the few language-specific NLP tools -could be reused or easily adapted across languages. This depends on the notion, -and extent, of similarity between the languages. We assess this from the angle -of orthography and corpora. Twelve versions of the Universal Declaration of -Human Rights (UDHR) are examined, showing clusters of languages, and which are -thus more or less amenable to cross-language adaptation of NLP tools, which do -not match with Guthrie zones. To examine the generalisability of these results, -we zoom in on isiZulu both quantitatively and qualitatively with four other -corpora and texts in different genres. The results show that the UDHR is a -typical text document orthographically. The results also provide insight into -usability of typical measures such as lexical diversity and genre, and that the -same statistic may mean different things in different documents. While NLTK for -Python could be used for basic analyses of text, it, and similar NLP tools, -will need considerable customization. -" -3308,1608.03192,"Salvador Agui\~naga and Rodrigo Palacios and David Chiang and Tim - Weninger",Growing Graphs with Hyperedge Replacement Graph Grammars,cs.SI cs.CL," Discovering the underlying structures present in large real world graphs is a -fundamental scientific problem. In this paper we show that a graph's clique -tree can be used to extract a hyperedge replacement grammar. If we store an -ordering from the extraction process, the extracted graph grammar is guaranteed -to generate an isomorphic copy of the original graph. Or, a stochastic -application of the graph grammar rules can be used to quickly create random -graphs. In experiments on large real world networks, we show that random -graphs, generated from extracted graph grammars, exhibit a wide range of -properties that are very similar to the original graphs. In addition to graph -properties like degree or eigenvector centrality, what a graph ""looks like"" -ultimately depends on small details in local graph substructures that are -difficult to define at a global level. We show that our generative graph model -is able to preserve these local substructures when generating new graphs and -performs well on new and difficult tests of model robustness. -" -3309,1608.03448,Stefania Raimondo and Frank Rudzicz,"Sex, drugs, and violence",cs.CL," Automatically detecting inappropriate content can be a difficult NLP task, -requiring understanding context and innuendo, not just identifying specific -keywords. 
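In the spirit of the approach this abstract goes on to describe, the pipeline can be sketched with sklearn: infer per-document topic mixtures with LDA, then regress appropriateness ratings on the topic proportions. The corpus and ratings below are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

# Tiny corpus standing in for the narratives; ratings are invented
# crowd appropriateness scores (higher = more appropriate).
docs = [
    "the knight rode to the castle at dawn",
    "the dragon burned the village in fury",
    "graphic violence blood and gore everywhere",
    "a quiet picnic by the lake with friends",
    "explicit violence and gore in the battle",
    "friends laughed and shared tea by the lake",
]
ratings = [5.0, 4.0, 1.0, 5.0, 1.5, 5.0]

X = CountVectorizer().fit_transform(docs)
topics = LatentDirichletAllocation(n_components=3, random_state=0)
theta = topics.fit_transform(X)               # per-document topic mixtures
reg = LinearRegression().fit(theta, ratings)  # regress ratings on topics
print(reg.predict(theta).round(2))            # automated suitability scores
```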
Due to the large quantity of online user-generated content, automatic -detection is becoming increasingly necessary. We take a largely unsupervised -approach using a large corpus of narratives from a community-based -self-publishing website and a small segment of crowd-sourced annotations. We -explore topic modelling using latent Dirichlet allocation (and a variation), -and use these to regress appropriateness ratings, effectively automating rating -for suitability. The results suggest that certain topics inferred may be useful -in detecting latent inappropriateness -- yielding recall up to 96% and low -regression errors. -" -3310,1608.03542,"Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, - Andrew Fandrianto, Jay Han, Matthew Kelcey, David Berthelot","WikiReading: A Novel Large-scale Language Understanding Task over - Wikipedia",cs.CL," We present WikiReading, a large-scale natural language understanding task and -publicly-available dataset with 18 million instances. The task is to predict -textual values from the structured knowledge base Wikidata by reading the text -of the corresponding Wikipedia articles. The task contains a rich variety of -challenging classification and extraction sub-tasks, making it well-suited for -end-to-end models such as deep neural networks (DNNs). We compare various -state-of-the-art DNN-based architectures for document classification, -information extraction, and question answering. We find that models supporting -a rich answer space, such as word or character sequences, perform best. Our -best-performing model, a word-level sequence to sequence model with a mechanism -to copy out-of-vocabulary words, obtains an accuracy of 71.8%. -" -3311,1608.03587,"Alexander Koplenig, Peter Meyer, Sascha Wolfer, and Carolin - Mueller-Spitzer","The statistical trade-off between word order and word structure - - large-scale evidence for the principle of least effort",cs.CL," Languages employ different strategies to transmit structural and grammatical -information. While, for example, grammatical dependency relationships in -sentences are mainly conveyed by the ordering of the words for languages like -Mandarin Chinese, or Vietnamese, the word ordering is much less restricted for -languages such as Inupiatun or Quechua, as those languages (also) use the -internal structure of words (e.g. inflectional morphology) to mark grammatical -relationships in a sentence. Based on a quantitative analysis of more than -1,500 unique translations of different books of the Bible in more than 1,100 -different languages that are spoken as a native language by approximately 6 -billion people (more than 80% of the world population), we present large-scale -evidence for a statistical trade-off between the amount of information conveyed -by the ordering of words and the amount of information conveyed by internal -word structure: languages that rely more strongly on word order information -tend to rely less on word structure information and vice versa. In addition, we -find that - despite differences in the way information is expressed - there is -also evidence for a trade-off between different books of the biblical canon -that recurs with little variation across languages: the more informative the -word order of the book, the less informative its word structure and vice versa. -We argue that this might suggest that, on the one hand, languages encode -information in very different (but efficient) ways. 
On the other hand,
-content-related and stylistic features are statistically encoded in very
-similar ways.
-"
-3312,1608.03764,Michael Spranger and Sucheendra K. Palaniappan and Samik Ghosh,Extracting Biological Pathway Models From NLP Event Representations,cs.CL q-bio.MN," This paper describes an open-source software system for the automatic
-conversion of NLP event representations to system biology structured data
-interchange formats such as SBML and BioPAX. It is part of a larger effort to
-make results of the NLP community available for system biology pathway
-modelers.
-"
-3313,1608.03767,Michael Spranger and Sucheendra K. Palaniappan and Samik Ghosh,"Measuring the State of the Art of Automated Pathway Curation Using Graph
- Algorithms - A Case Study of the mTOR Pathway",cs.CL q-bio.MN," This paper evaluates the difference between human pathway curation and
-current NLP systems. We propose graph analysis methods for quantifying the gap
-between human curated pathway maps and the output of state-of-the-art automatic
-NLP systems. Evaluation is performed on the popular mTOR pathway. Based on
-analyzing where current systems perform well and where they fail, we identify
-possible avenues for progress.
-"
-3314,1608.03785,"Yaared Al-Mehairi, Bob Coecke, Martha Lewis",Compositional Distributional Cognition,cs.AI cs.CL math.CT," We accommodate the Integrated Connectionist/Symbolic Architecture (ICS) of
-[32] within the categorical compositional semantics (CatCo) of [13], forming a
-model of categorical compositional cognition (CatCog). This resolves intrinsic
-problems with ICS such as the fact that representations inhabit an unbounded
-space and that sentences with differing tree structures cannot be directly
-compared. We do so in a way that makes the most of the grammatical structure
-available, in contrast to strategies like circular convolution. Using the CatCo
-model also allows us to make use of tools developed for CatCo such as the
-representation of ambiguity and logical reasoning via density matrices,
-structural meanings for words such as relative pronouns, and addressing over-
-and under-extension, all of which are present in cognitive processes. Moreover
-the CatCog framework is sufficiently flexible to allow for entirely different
-representations of meaning, such as conceptual spaces. Interestingly, since the
-CatCo model was largely inspired by categorical quantum mechanics, so is
-CatCog.
-"
-3315,1608.03803,"Andrey Kutuzov, Erik Velldal, Lilja {\O}vrelid",Redefining part-of-speech classes with distributional semantic models,cs.CL," This paper studies how word embeddings trained on the British National Corpus
-interact with part of speech boundaries. Our work targets the Universal PoS tag
-set, which is currently actively being used for annotation of a range of
-languages. We experiment with training classifiers for predicting PoS tags for
-words based on their embeddings. The results show that the information about
-PoS affiliation contained in the distributional vectors allows us to discover
-groups of words with distributional patterns that differ from other words of
-the same part of speech.
- This data often reveals hidden inconsistencies of the annotation process or
-guidelines. At the same time, it supports the notion of `soft' or `graded' part
-of speech affiliations. Finally, we show that information about PoS is
-distributed among dozens of vector components, not limited to only one or two
-features.
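The classifier experiment in the entry above (1608.03803) is easy to reproduce in outline. A minimal sketch, assuming scikit-learn and a matrix of pre-trained word vectors with Universal PoS labels; the vectors and labels below are random placeholders, not the authors' data or code:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 300))                # placeholder word vectors
y = rng.choice(["NOUN", "VERB", "ADJ"], size=1000)  # placeholder UPoS labels

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())      # PoS predictability from vectors
# Words the probe misclassifies with high confidence are candidates for
# annotation inconsistencies or 'graded' PoS membership, as the paper argues.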
-" -3316,1608.03902,"Dat Tien Nguyen, Kamela Ali Al Mannai, Shafiq Joty, Hassan Sajjad, - Muhammad Imran, Prasenjit Mitra","Rapid Classification of Crisis-Related Data on Social Networks using - Convolutional Neural Networks",cs.CL cs.LG cs.SI," The role of social media, in particular microblogging platforms such as -Twitter, as a conduit for actionable and tactical information during disasters -is increasingly acknowledged. However, time-critical analysis of big crisis -data on social media streams brings challenges to machine learning techniques, -especially the ones that use supervised learning. The Scarcity of labeled data, -particularly in the early hours of a crisis, delays the machine learning -process. The current state-of-the-art classification methods require a -significant amount of labeled data specific to a particular event for training -plus a lot of feature engineering to achieve best results. In this work, we -introduce neural network based classification methods for binary and -multi-class tweet classification task. We show that neural network based models -do not require any feature engineering and perform better than state-of-the-art -methods. In the early hours of a disaster when no labeled data is available, -our proposed method makes the best use of the out-of-event data and achieves -good results. -" -3317,1608.03938,"Christopher Thompson, Josh Introne, and Clint Young",Determining Health Utilities through Data Mining of Social Media,cs.CL cs.AI cs.CY cs.SI," 'Health utilities' measure patient preferences for perfect health compared to -specific unhealthy states, such as asthma, a fractured hip, or colon cancer. -When integrated over time, these estimations are called quality adjusted life -years (QALYs). Until now, characterizing health utilities (HUs) required -detailed patient interviews or written surveys. While reliable and specific, -this data remained costly due to efforts to locate, enlist and coordinate -participants. Thus the scope, context and temporality of diseases examined has -remained limited. - Now that more than a billion people use social media, we propose a novel -strategy: use natural language processing to analyze public online -conversations for signals of the severity of medical conditions and correlate -these to known HUs using machine learning. In this work, we filter a dataset -that originally contained 2 billion tweets for relevant content on 60 diseases. -Using this data, our algorithm successfully distinguished mild from severe -diseases, which had previously been categorized only by traditional techniques. -This represents progress towards two related applications: first, predicting -HUs where such information is nonexistent; and second, (where rich HU data -already exists) estimating temporal or geographic patterns of disease severity -through data mining. -" -3318,1608.03995,"Chandler May, Ryan Cotterell, Benjamin Van Durme","An Analysis of Lemmatization on Topic Models of Morphologically Rich - Language",cs.CL," Topic models are typically represented by top-$m$ word lists for human -interpretation. The corpus is often pre-processed with lemmatization (or -stemming) so that those representations are not undermined by a proliferation -of words with similar meanings, but there is little public work on the effects -of that pre-processing. Recent work studied the effect of stemming on topic -models of English texts and found no supporting evidence for the practice. 
We -study the effect of lemmatization on topic models of Russian Wikipedia -articles, finding in one configuration that it significantly improves -interpretability according to a word intrusion metric. We conclude that -lemmatization may benefit topic models on morphologically rich languages, but -that further investigation is needed. -" -3319,1608.04020,"Max Kanovich, Stepan Kuznetsov, Andre Scedrov","Undecidability of the Lambek calculus with subexponential and bracket - modalities",math.LO cs.CL," The Lambek calculus is a well-known logical formalism for modelling natural -language syntax. The original calculus covered a substantial number of -intricate natural language phenomena, but only those restricted to the -context-free setting. In order to address more subtle linguistic issues, the -Lambek calculus has been extended in various ways. In particular, Morrill and -Valentin (2015) introduce an extension with so-called exponential and bracket -modalities. Their extension is based on a non-standard contraction rule for the -exponential that interacts with the bracket structure in an intricate way. The -standard contraction rule is not admissible in this calculus. In this paper we -prove undecidability of the derivability problem in their calculus. We also -investigate restricted decidable fragments considered by Morrill and Valentin -and we show that these fragments belong to the NP class. -" -3320,1608.04089,"Kerry Zhang, Jussi Karlgren, Cheng Zhang, Jens Lagergren",Viewpoint and Topic Modeling of Current Events,cs.CL cs.IR stat.ML," There are multiple sides to every story, and while statistical topic models -have been highly successful at topically summarizing the stories in corpora of -text documents, they do not explicitly address the issue of learning the -different sides, the viewpoints, expressed in the documents. In this paper, we -show how these viewpoints can be learned completely unsupervised and -represented in a human interpretable form. We use a novel approach of applying -CorrLDA2 for this purpose, which learns topic-viewpoint relations that can be -used to form groups of topics, where each group represents a viewpoint. A -corpus of documents about the Israeli-Palestinian conflict is then used to -demonstrate how a Palestinian and an Israeli viewpoint can be learned. By -leveraging the magnitudes and signs of the feature weights of a linear SVM, we -introduce a principled method to evaluate associations between topics and -viewpoints. With this, we demonstrate, both quantitatively and qualitatively, -that the learned topic groups are contextually coherent, and form consistently -correct topic-viewpoint associations. -" -3321,1608.04147,"Georgios P. Spithourakis, Isabelle Augenstein, Sebastian Riedel",Numerically Grounded Language Models for Semantic Error Correction,cs.CL cs.NE," Semantic error detection and correction is an important task for applications -such as fact checking, speech-to-text or grammatical error correction. Current -approaches generally focus on relatively shallow semantics and do not account -for numeric quantities. Our approach uses language models grounded in numbers -within the text. Such groundings are easily achieved for recurrent neural -language model architectures, which can be further conditioned on incomplete -background knowledge bases. Our evaluation on clinical reports shows that -numerical grounding improves perplexity by 33% and F1 for semantic error -correction by 5 points when compared to ungrounded approaches. 
Conditioning on -a knowledge base yields further improvements. -" -3322,1608.04207,"Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg","Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction - Tasks",cs.CL," There is a lot of research interest in encoding variable length sentences -into fixed length vectors, in a way that preserves the sentence meanings. Two -common methods include representations based on averaging word vectors, and -representations based on the hidden states of recurrent neural networks such as -LSTMs. The sentence vectors are used as features for subsequent machine -learning tasks or for pre-training in the context of deep learning. However, -not much is known about the properties that are encoded in these sentence -representations and about the language information they capture. We propose a -framework that facilitates better understanding of the encoded representations. -We define prediction tasks around isolated aspects of sentence structure -(namely sentence length, word content, and word order), and score -representations by the ability to train a classifier to solve each prediction -task when using the representation as input. We demonstrate the potential -contribution of the approach by analyzing different sentence representation -mechanisms. The analysis sheds light on the relative strengths of different -sentence embedding methods with respect to these low level prediction tasks, -and on the effect of the encoded vector's dimensionality on the resulting -representations. -" -3323,1608.04434,"Emre Erturk, Hong Shi",Natural Language Processing using Hadoop and KOSHIK,cs.CL," Natural language processing, as a data analytics related technology, is used -widely in many research areas such as artificial intelligence, human language -processing, and translation. At present, due to explosive growth of data, there -are many challenges for natural language processing. Hadoop is one of the -platforms that can process the large amount of data required for natural -language processing. KOSHIK is one of the natural language processing -architectures, and utilizes Hadoop and contains language processing components -such as Stanford CoreNLP and OpenNLP. This study describes how to build a -KOSHIK platform with the relevant tools, and provides the steps to analyze wiki -data. Finally, it evaluates and discusses the advantages and disadvantages of -the KOSHIK architecture, and gives recommendations on improving the processing -performance. -" -3324,1608.04465,"Ehsan Shareghi, Matthias Petri, Gholamreza Haffari and Trevor Cohn","Fast, Small and Exact: Infinite-order Language Modelling with Compressed - Suffix Trees",cs.CL," Efficient methods for storing and querying are critical for scaling -high-order n-gram language models to large corpora. We propose a language model -based on compressed suffix trees, a representation that is highly compact and -can be easily held in memory, while supporting queries needed in computing -language model probabilities on-the-fly. We present several optimisations which -improve query runtimes up to 2500x, despite only incurring a modest increase in -construction time and memory usage. For large corpora and high Markov orders, -our method is highly competitive with the state-of-the-art KenLM package. It -imposes much lower memory requirements, often by orders of magnitude, and has -runtimes that are either similar (for training) or comparable (for querying). 
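The count queries that entry 1608.04465 answers with compressed suffix trees can be illustrated with a plain hash map. This is only a sketch of the query side, using an unsmoothed relative-frequency estimate; the paper instead computes Kneser-Ney probabilities on the fly, at corpus scale and in far less memory:

from collections import Counter

def ngram_counts(tokens, max_n):
    """Count all n-grams up to length max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def mle_prob(counts, context, word):
    """P(word | context) as a ratio of the two count queries."""
    ctx = counts[tuple(context)]
    return counts[tuple(context) + (word,)] / ctx if ctx else 0.0

counts = ngram_counts("the cat sat on the mat".split(), 3)
print(mle_prob(counts, ("the",), "cat"))  # 0.5: 'the' occurs twice, once before 'cat'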
-" -3325,1608.04485,Douglas Bagnall,Authorship clustering using multi-headed recurrent neural networks,cs.CL," A recurrent neural network that has been trained to separately model the -language of several documents by unknown authors is used to measure similarity -between the documents. It is able to find clues of common authorship even when -the documents are very short and about disparate topics. While it is easy to -make statistically significant predictions regarding authorship, it is -difficult to group documents into definite clusters with high accuracy. -" -3326,1608.04631,"Luisa Bentivogli, Arianna Bisazza, Mauro Cettolo, Marcello Federico",Neural versus Phrase-Based Machine Translation Quality: a Case Study,cs.CL," Within the field of Statistical Machine Translation (SMT), the neural -approach (NMT) has recently emerged as the first technology able to challenge -the long-standing dominance of phrase-based approaches (PBMT). In particular, -at the IWSLT 2015 evaluation campaign, NMT outperformed well established -state-of-the-art PBMT systems on English-German, a language pair known to be -particularly hard because of morphology and syntactic differences. To -understand in what respects NMT provides better translation quality than PBMT, -we perform a detailed analysis of neural versus phrase-based SMT outputs, -leveraging high quality post-edits performed by professional translators on the -IWSLT data. For the first time, our analysis provides useful insights on what -linguistic phenomena are best modeled by neural models -- such as the -reordering of verbs -- while pointing out other aspects that remain to be -improved. -" -3327,1608.04670,Ajinkya More,Attribute Extraction from Product Titles in eCommerce,cs.CL cs.IR," This paper presents a named entity extraction system for detecting attributes -in product titles of eCommerce retailers like Walmart. The absence of syntactic -structure in such short pieces of text makes extracting attribute values a -challenging problem. We find that combining sequence labeling algorithms such -as Conditional Random Fields and Structured Perceptron with a curated -normalization scheme produces an effective system for the task of extracting -product attribute values from titles. To keep the discussion concrete, we will -illustrate the mechanics of the system from the point of view of a particular -attribute - brand. We also discuss the importance of an attribute extraction -system in the context of retail websites with large product catalogs, compare -our approach to other potential approaches to this problem and end the paper -with a discussion of the performance of our system for extracting attributes. -" -3328,1608.04738,"Shenjian Zhao, Zhihua Zhang",An Efficient Character-Level Neural Machine Translation,cs.CL stat.ML," Neural machine translation aims at building a single large neural network -that can be trained to maximize translation performance. The encoder-decoder -architecture with an attention mechanism achieves a translation performance -comparable to the existing state-of-the-art phrase-based systems on the task of -English-to-French translation. However, the use of large vocabulary becomes the -bottleneck in both training and improving the performance. In this paper, we -propose an efficient architecture to train a deep character-level neural -machine translation by introducing a decimator and an interpolator. 
The
-decimator is used to sample the source sequence before encoding while the
-interpolator is used to resample after decoding. Such a deep model has two
-major advantages. It avoids the large vocabulary issue radically; at the same
-time, it is much faster and more memory-efficient in training than conventional
-character-based models. More interestingly, our model is able to translate
-misspelled words as human beings do.
-"
-3329,1608.04767,"Steven Neale, Valeria de Paiva, Arantxa Otegi, Alexandre Rademaker",Proceedings of the LexSem+Logics Workshop 2016,cs.CL," Lexical semantics continues to play an important role in driving research
-directions in NLP, with the recognition and understanding of context becoming
-increasingly important in delivering successful outcomes in NLP tasks. Besides
-traditional processing areas such as word sense and named entity
-disambiguation, the creation and maintenance of dictionaries, annotated corpora
-and resources have become cornerstones of lexical semantics research and
-produced a wealth of contextual information that NLP processes can exploit. New
-efforts both to link and construct from scratch such information - as Linked
-Open Data or by way of formal tools coming from logic, ontologies and automated
-reasoning - have increased the interoperability and accessibility of resources
-for lexical and computational semantics, even in those languages for which they
-have previously been limited.
- LexSem+Logics 2016 combines the 1st Workshop on Lexical Semantics for
-Lesser-Resourced Languages and the 3rd Workshop on Logics and Ontologies. The
-accepted papers in our program covered topics across these two areas,
-including: the encoding of plurals in Wordnets, the creation of a thesaurus
-from multiple sources based on semantic similarity metrics, and the use of
-cross-lingual treebanks and annotations for universal part-of-speech tagging.
-We also welcomed talks from two distinguished speakers: on Portuguese lexical
-knowledge bases (different approaches, results and their application in NLP
-tasks) and on new strategies for open information extraction (the capture of
-verb-based propositions from massive text corpora).
-"
-3330,1608.04808,"Hao Fang, Hao Cheng, Mari Ostendorf","Learning Latent Local Conversation Modes for Predicting Community
- Endorsement in Online Discussions",cs.SI cs.CL," Many social media platforms offer a mechanism for readers to react to
-comments, both positively and negatively, which in aggregate can be thought of
-as community endorsement. This paper addresses the problem of predicting
-community endorsement in online discussions, leveraging both the participant
-response structure and the text of the comment. The different types of features
-are integrated in a neural network that uses a novel architecture to learn
-latent modes of discussion structure that perform as well as deep neural
-networks but are more interpretable. In addition, the latent modes can be used
-to weight text features thereby improving prediction accuracy.
-"
-3331,1608.04868,"Keunwoo Choi and George Fazekas and Brian McFee and Kyunghyun Cho and
- Mark Sandler",Towards Music Captioning: Generating Music Playlist Descriptions,cs.MM cs.AI cs.CL," Descriptions are often provided along with recommendations to help users'
-discovery. Recommending automatically generated music playlists (e.g.
-personalised playlists) introduces the problem of generating descriptions.
In
-this paper, we propose a method for generating music playlist descriptions,
-which we call music captioning. In the proposed method, audio content
-analysis and natural language processing are adopted to utilise the information
-of each track.
-"
-3332,1608.04917,"Darko Cherepnalkoski, Andreas Karpf, Igor Mozetic, Miha Grcar","Cohesion and Coalition Formation in the European Parliament: Roll-Call
- Votes and Twitter Activities",cs.CL cs.CY cs.SI," We study the cohesion within and the coalitions between political groups in
-the Eighth European Parliament (2014--2019) by analyzing two entirely different
-aspects of the behavior of the Members of the European Parliament (MEPs) in the
-policy-making processes. On one hand, we analyze their co-voting patterns and,
-on the other, their retweeting behavior. We make use of two diverse datasets in
-the analysis. The first one is the roll-call vote dataset, where cohesion is
-regarded as the tendency to co-vote within a group, and a coalition is formed
-when the members of several groups exhibit a high degree of co-voting agreement
-on a subject. The second dataset comes from Twitter; it captures the retweeting
-(i.e., endorsing) behavior of the MEPs and implies cohesion (retweets within
-the same group) and coalitions (retweets between groups) from a completely
-different perspective.
- We employ two different methodologies to analyze the cohesion and coalitions.
-The first one is based on Krippendorff's Alpha reliability, used to measure the
-agreement between raters in data-analysis scenarios, and the second one is
-based on Exponential Random Graph Models, often used in social-network
-analysis. We give general insights into the cohesion of political groups in the
-European Parliament, explore whether coalitions are formed in the same way for
-different policy areas, and examine to what degree the retweeting behavior of
-MEPs corresponds to their co-voting patterns. A novel and interesting aspect of
-our work is the relationship between the co-voting and retweeting patterns.
-"
-3333,1608.04983,"Jeehye Lee, Myungin Lee, and Joon-Hyuk Chang","Ensemble of Jointly Trained Deep Neural Network-Based Acoustic Models
- for Reverberant Speech Recognition",cs.CL," Distant speech recognition is a challenge, particularly due to the corruption
-of speech signals by reverberation caused by large distances between the
-speaker and microphone. In order to cope with a wide range of reverberations in
-real-world situations, we present novel approaches for acoustic modeling
-including an ensemble of deep neural networks (DNNs) and an ensemble of jointly
-trained DNNs. First, multiple DNNs are established, each of which corresponds
-to a different reverberation time 60 (RT60) in a setup step. Also, each model
-in the ensemble of DNN acoustic models is further jointly trained, including
-both feature mapping and acoustic modeling, where the feature mapping is
-designed for the dereverberation as a front-end. In a testing phase, the two
-most likely DNNs are chosen from the DNN ensemble using maximum a posteriori
-(MAP) probabilities, computed in an online fashion by using maximum likelihood
-(ML)-based blind RT60 estimation and then the posterior probability outputs
-from two DNNs are combined using the ML-based weights as a simple average.
-Extensive experiments demonstrate that the proposed approach leads to
-substantial improvements in speech recognition accuracy over the conventional
-DNN baseline systems under diverse reverberant conditions.
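The combination step in the entry above (1608.04983) reduces to a likelihood-weighted average of the two selected DNNs' posteriors. A minimal numpy sketch, with the blind RT60 likelihoods passed in as plain numbers; the estimation step itself is not shown and the values are illustrative:

import numpy as np

def combine_posteriors(post_a, post_b, lik_a, lik_b):
    """Weighted average of two models' frame posteriors, with weights
    proportional to the ML-based RT60 likelihood of each model."""
    w_a = lik_a / (lik_a + lik_b)
    return w_a * post_a + (1.0 - w_a) * post_b

frame_a = np.array([0.7, 0.2, 0.1])  # posteriors from the DNN matched to a short RT60
frame_b = np.array([0.5, 0.3, 0.2])  # posteriors from the DNN matched to a long RT60
print(combine_posteriors(frame_a, frame_b, lik_a=0.8, lik_b=0.2))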
-" -3334,1608.05014,Vered Shwartz and Ido Dagan,"Path-based vs. Distributional Information in Recognizing Lexical - Semantic Relations",cs.CL," Recognizing various semantic relations between terms is beneficial for many -NLP tasks. While path-based and distributional information sources are -considered complementary for this task, the superior results the latter showed -recently suggested that the former's contribution might have become obsolete. -We follow the recent success of an integrated neural method for hypernymy -detection (Shwartz et al., 2016) and extend it to recognize multiple relations. -The empirical results show that this method is effective in the multiclass -setting as well. We further show that the path-based information source always -contributes to the classification, and analyze the cases in which it mostly -complements the distributional information. -" -3335,1608.05129,"Liang Wu, Fred Morstatter, Huan Liu","SlangSD: Building and Using a Sentiment Dictionary of Slang Words for - Short-Text Sentiment Classification",cs.CL," Sentiment in social media is increasingly considered as an important resource -for customer segmentation, market understanding, and tackling other -socio-economic issues. However, sentiment in social media is difficult to -measure since user-generated content is usually short and informal. Although -many traditional sentiment analysis methods have been proposed, identifying -slang sentiment words remains untackled. One of the reasons is that slang -sentiment words are not available in existing dictionaries or sentiment -lexicons. To this end, we propose to build the first sentiment dictionary of -slang words to aid sentiment analysis of social media content. It is laborious -and time-consuming to collect and label the sentiment polarity of a -comprehensive list of slang words. We present an approach to leverage web -resources to construct an extensive Slang Sentiment word Dictionary (SlangSD) -that is easy to maintain and extend. SlangSD is publicly available for research -purposes. We empirically show the advantages of using SlangSD, the newly-built -slang sentiment word dictionary for sentiment classification, and provide -examples demonstrating its ease of use with an existing sentiment system. -" -3336,1608.05243,Ana Marasovi\'c and Anette Frank,"Multilingual Modal Sense Classification using a Convolutional Neural - Network",cs.CL," Modal sense classification (MSC) is a special WSD task that depends on the -meaning of the proposition in the modal's scope. We explore a CNN architecture -for classifying modal sense in English and German. We show that CNNs are -superior to manually designed feature-based classifiers and a standard NN -classifier. We analyze the feature maps learned by the CNN and identify known -and previously unattested linguistic features. We benchmark the CNN on a -standard WSD task, where it compares favorably to models using -sense-disambiguated target vectors. -" -3337,1608.05374,Srikanth Ronanki and Siva Reddy and Bajibabu Bollepalli and Simon King,DNN-based Speech Synthesis for Indian Languages from ASCII text,cs.CL," Text-to-Speech synthesis in Indian languages has a seen lot of progress over -the decade partly due to the annual Blizzard challenges. These systems assume -the text to be written in Devanagari or Dravidian scripts which are nearly -phonemic orthography scripts. However, the most common form of computer -interaction among Indians is ASCII written transliterated text. 
Such text is
-generally noisy with many variations in spelling for the same word. In this
-paper we evaluate three approaches to synthesize speech from such noisy ASCII
-text: a naive Uni-Grapheme approach, a Multi-Grapheme approach, and a
-supervised Grapheme-to-Phoneme (G2P) approach. These methods first convert the
-ASCII text to a phonetic script, and then learn a Deep Neural Network to
-synthesize speech from that. We train and test our models on Blizzard Challenge
-datasets that were transliterated to ASCII using crowdsourcing. Our experiments
-on Hindi, Tamil and Telugu demonstrate that our models generate speech of
-competitive quality from ASCII text compared to the speech synthesized from the
-native scripts. All the accompanying transliterated datasets are released for
-public access.
-"
-3338,1608.05426,"Omer Levy, Anders S{\o}gaard, Yoav Goldberg","A Strong Baseline for Learning Cross-Lingual Word Embeddings from
- Sentence Alignments",cs.CL," While cross-lingual word embeddings have been studied extensively in recent
-years, the qualitative differences between the different algorithms remain
-vague. We observe that whether or not an algorithm uses a particular feature
-set (sentence IDs) accounts for a significant performance gap among these
-algorithms. This feature set is also used by traditional alignment algorithms,
-such as IBM Model-1, which demonstrate similar performance to state-of-the-art
-embedding algorithms on a variety of benchmarks. Overall, we observe that
-different algorithmic approaches for utilizing the sentence ID feature space
-result in similar performance. This paper draws both empirical and theoretical
-parallels between the embedding and alignment literature, and suggests that
-adding additional sources of information, which go beyond the traditional
-signal of bilingual sentence-aligned corpora, may substantially improve
-cross-lingual word embeddings, and that future baselines should at least take
-such features into account.
-"
-3339,1608.05457,"Takeshi Onishi, Hai Wang, Mohit Bansal, Kevin Gimpel and David
- McAllester",Who did What: A Large-Scale Person-Centered Cloze Dataset,cs.CL," We have constructed a new ""Who-did-What"" dataset of over 200,000
-fill-in-the-gap (cloze) multiple choice reading comprehension problems
-constructed from the LDC English Gigaword newswire corpus. The WDW dataset has
-a variety of novel features. First, in contrast with the CNN and Daily Mail
-datasets (Hermann et al., 2015) we avoid using article summaries for question
-formation. Instead, each problem is formed from two independent articles --- an
-article given as the passage to be read and a separate article on the same
-events used to form the question. Second, we avoid anonymization --- each
-choice is a person named entity. Third, the problems have been filtered to
-remove a fraction that are easily solved by simple baselines, while remaining
-84% solvable by humans. We report performance benchmarks of standard systems
-and propose the WDW dataset as a challenge task for the community.
-"
-3340,1608.05528,"Ivan Vuli\'c, Roy Schwartz, Ari Rappoport, Roi Reichart, and Anna
- Korhonen","Automatic Selection of Context Configurations for Improved
- Class-Specific Word Representations",cs.CL," This paper is concerned with identifying contexts useful for training word
-representation models for different word classes such as adjectives (A), verbs
-(V), and nouns (N).
We introduce a simple yet effective framework for an
-automatic selection of class-specific context configurations. We construct a
-context configuration space based on universal dependency relations between
-words, and efficiently search this space with an adapted beam search algorithm.
-In word similarity tasks for each word class, we show that our framework is
-both effective and efficient. Particularly, it improves the Spearman's rho
-correlation with human scores on SimLex-999 over the best previously proposed
-class-specific contexts by 6 (A), 6 (V) and 5 (N) rho points. With our selected
-context configurations, we train on only 14% (A), 26.2% (V), and 33.6% (N) of
-all dependency-based contexts, resulting in a reduced training time. Our
-results generalise: we show that the configurations our algorithm learns for
-one English training setup outperform previously proposed context types in
-another training setup for English. Moreover, basing the configuration space on
-universal dependencies, it is possible to transfer the learned configurations
-to German and Italian. We also demonstrate improved per-class results over
-other context types in these two languages.
-"
-3341,1608.05554,Qingfu Zhu and Weinan Zhang and Lianqiang Zhou and Ting Liu,Learning to Start for Sequence to Sequence Architecture,cs.CL," The sequence to sequence architecture is widely used in response
-generation and neural machine translation to model the potential relationship
-between two sentences. It typically consists of two parts: an encoder that
-reads from the source sentence and a decoder that generates the target sentence
-word by word according to the encoder's output and the last generated word.
-However, it faces the cold start problem when generating the first word as
-there is no previous word to refer to. Existing work mainly uses a special start
-symbol to generate the first word. An obvious drawback of these works is
-that there is no learnable relationship between words and the start symbol.
-Furthermore, it may lead to error accumulation during decoding when the first
-word is incorrectly generated. In this paper, we propose a novel approach to
-learning to generate the first word in the sequence to sequence architecture
-rather than using the start symbol. Experimental results on the task of
-response generation of short text conversation show that the proposed approach
-outperforms the state-of-the-art approach in both of the automatic and manual
-evaluations.
-"
-3342,1608.05604,"Michael Hahn, Frank Keller",Modeling Human Reading with Neural Attention,cs.CL," When humans read text, they fixate some words and skip others. However, there
-have been few attempts to explain skipping behavior with computational models,
-as most existing work has focused on predicting reading times (e.g.,~using
-surprisal). In this paper, we propose a novel approach that models both
-skipping and reading, using an unsupervised architecture that combines neural
-attention with autoencoding, trained on raw text using reinforcement learning.
-Our model explains human reading behavior as a tradeoff between precision of
-language understanding (encoding the input accurately) and economy of attention
-(fixating as few words as possible). We evaluate the model on the Dundee
-eye-tracking corpus, showing that it accurately predicts skipping behavior and
-reading times, is competitive with surprisal, and captures known qualitative
-features of human reading.
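One way to picture the idea in entry 1608.05554 is a decoder whose first input is a trainable vector rather than the embedding of a fixed start symbol, so the first word is generated from a learned, encoder-conditioned state. A PyTorch sketch of that single detail; the module and method names are illustrative, not the authors' code:

import torch
import torch.nn as nn

class LearnedStartDecoder(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.start = nn.Parameter(torch.zeros(dim))  # trainable 'start' input
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def first_word_logits(self, enc_state):
        """enc_state: (batch, dim) summary of the source sentence."""
        x = self.start.expand(enc_state.size(0), 1, -1)  # replaces embed(<start>)
        h0 = enc_state.unsqueeze(0)                      # (1, batch, dim)
        output, _ = self.rnn(x, h0)
        return self.out(output.squeeze(1))

dec = LearnedStartDecoder(vocab_size=100, dim=16)
print(dec.first_word_logits(torch.zeros(2, 16)).shape)  # torch.Size([2, 100])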
-" -3343,1608.05605,St\'ephan Tulkens and Simon \v{S}uster and Walter Daelemans,"Using Distributed Representations to Disambiguate Biomedical and - Clinical Concepts",cs.CL," In this paper, we report a knowledge-based method for Word Sense -Disambiguation in the domains of biomedical and clinical text. We combine word -representations created on large corpora with a small number of definitions -from the UMLS to create concept representations, which we then compare to -representations of the context of ambiguous terms. Using no relational -information, we obtain comparable performance to previous approaches on the -MSH-WSD dataset, which is a well-known dataset in the biomedical domain. -Additionally, our method is fast and easy to set up and extend to other -domains. Supplementary materials, including source code, can be found at https: -//github.com/clips/yarn -" -3344,1608.05777,"Lei Xu, Ziyun Wang, Ayana, Zhiyuan Liu, Maosong Sun",Topic Sensitive Neural Headline Generation,cs.CL," Neural models have recently been used in text summarization including -headline generation. The model can be trained using a set of document-headline -pairs. However, the model does not explicitly consider topical similarities and -differences of documents. We suggest to categorizing documents into various -topics so that documents within the same topic are similar in content and share -similar summarization patterns. Taking advantage of topic information of -documents, we propose topic sensitive neural headline generation model. Our -model can generate more accurate summaries guided by document topics. We test -our model on LCSTS dataset, and experiments show that our method outperforms -other baselines on each topic and achieves the state-of-art performance. -" -3345,1608.05813,"Ying Hua Tan, Chee Seng Chan",phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning,cs.CL cs.CV," A picture is worth a thousand words. Not until recently, however, we noticed -some success stories in understanding of visual scenes: a model that is able to -detect/name objects, describe their attributes, and recognize their -relationships/interactions. In this paper, we propose a phrase-based -hierarchical Long Short-Term Memory (phi-LSTM) model to generate image -description. The proposed model encodes sentence as a sequence of combination -of phrases and words, instead of a sequence of words alone as in those -conventional solutions. The two levels of this model are dedicated to i) learn -to generate image relevant noun phrases, and ii) produce appropriate image -description from the phrases and other words in the corpus. Adopting a -convolutional neural network to learn image features and the LSTM to learn the -word sequence in a sentence, the proposed model has shown better or competitive -results in comparison to the state-of-the-art models on Flickr8k and Flickr30k -datasets. -" -3346,1608.05852,"Jifan Chen, Kan Chen, Xipeng Qiu, Qi Zhang, Xuanjing Huang, Zheng - Zhang",Learning Word Embeddings from Intrinsic and Extrinsic Views,cs.CL cs.AI," While word embeddings are currently predominant for natural language -processing, most of existing models learn them solely from their contexts. -However, these context-based word embeddings are limited since not all words' -meaning can be learned based on only context. Moreover, it is also difficult to -learn the representation of the rare words due to data sparsity problem. 
In
-this work, we address these issues by learning the representations of words by
-integrating their intrinsic (descriptive) and extrinsic (contextual)
-information. To prove the effectiveness of our model, we evaluate it on four
-tasks, including word similarity, reverse dictionaries, Wiki link prediction,
-and document classification. Experiment results show that our model is powerful
-in both word and document modeling.
-"
-3347,1608.05859,"Ofir Press, Lior Wolf",Using the Output Embedding to Improve Language Models,cs.CL," We study the topmost weight matrix of neural network language models. We show
-that this matrix constitutes a valid word embedding. When training language
-models, we recommend tying the input embedding and this output embedding. We
-analyze the resulting update rules and show that the tied embedding evolves in
-a more similar way to the output embedding than to the input embedding in the
-untied model. We also offer a new method of regularizing the output embedding.
-Our methods lead to a significant reduction in perplexity, as we are able to
-show on a variety of neural network language models. Finally, we show that
-weight tying can reduce the size of neural translation models to less than half
-of their original size without harming their performance.
-"
-3348,1608.06043,Zhaopeng Tu and Yang Liu and Zhengdong Lu and Xiaohua Liu and Hang Li,Context Gates for Neural Machine Translation,cs.CL," In neural machine translation (NMT), generation of a target word depends on
-both source and target contexts. We find that source contexts have a direct
-impact on the adequacy of a translation while target contexts affect the
-fluency. Intuitively, generation of a content word should rely more on the
-source context and generation of a functional word should rely more on the
-target context. Due to the lack of effective control over the influence from
-source and target contexts, conventional NMT tends to yield fluent but
-inadequate translations. To address this problem, we propose context gates
-which dynamically control the ratios at which source and target contexts
-contribute to the generation of target words. In this way, we can enhance both
-the adequacy and fluency of NMT with more careful control of the information
-flow from contexts. Experiments show that our approach significantly improves
-upon a standard attention-based NMT system by +2.3 BLEU points.
-"
-3349,1608.06111,"Marco Damonte, Shay B. Cohen, Giorgio Satta",An Incremental Parser for Abstract Meaning Representation,cs.CL," Abstract Meaning Representation (AMR) is a semantic representation for natural
-language that embeds annotations related to traditional tasks such as named
-entity recognition, semantic role labeling, word sense disambiguation and
-co-reference resolution. We describe a transition-based parser for AMR that
-parses sentences left-to-right, in linear time. We further propose a test-suite
-that assesses specific subtasks that are helpful in comparing AMR parsers, and
-show that our parser is competitive with the state of the art on the LDC2015E86
-dataset and that it outperforms state-of-the-art parsers for recovering named
-entities and handling polarity.
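Of the entries above, the weight tying recommendation in 1608.05859 is the easiest to show concretely: the output projection simply reuses the input embedding matrix, which requires the embedding and hidden sizes to match. A minimal PyTorch sketch of the tying idiom, not the authors' own code:

import torch.nn as nn

class TiedLM(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.decode = nn.Linear(dim, vocab_size, bias=False)
        self.decode.weight = self.embed.weight  # tie: one matrix for input and output

    def forward(self, token_ids):
        hidden, _ = self.rnn(self.embed(token_ids))
        return self.decode(hidden)  # logits over the vocabulary

Besides the perplexity gains the abstract reports, the tied model stores one embedding matrix instead of two, which is where the roughly halved size of the translation models comes from.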
-" -3350,1608.06134,Srikanth Ronanki and Oliver Watts and Simon King and Gustav Eje Henter,"Median-Based Generation of Synthetic Speech Durations using a - Non-Parametric Approach",cs.CL," This paper proposes a new approach to duration modelling for statistical -parametric speech synthesis in which a recurrent statistical model is trained -to output a phone transition probability at each timestep (acoustic frame). -Unlike conventional approaches to duration modelling -- which assume that -duration distributions have a particular form (e.g., a Gaussian) and use the -mean of that distribution for synthesis -- our approach can in principle model -any distribution supported on the non-negative integers. Generation from this -model can be performed in many ways; here we consider output generation based -on the median predicted duration. The median is more typical (more probable) -than the conventional mean duration, is robust to training-data irregularities, -and enables incremental generation. Furthermore, a frame-level approach to -duration prediction is consistent with a longer-term goal of modelling -durations and acoustic features together. Results indicate that the proposed -method is competitive with baseline approaches in approximating the median -duration of held-out natural speech. -" -3351,1608.06378,"Bo-Hsiang Tseng, Sheng-Syun Shen, Hung-Yi Lee, Lin-Shan Lee","Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening - Comprehension Test by Machine",cs.CL," Multimedia or spoken content presents more attractive information than plain -text content, but it's more difficult to display on a screen and be selected by -a user. As a result, accessing large collections of the former is much more -difficult and time-consuming than the latter for humans. It's highly attractive -to develop a machine which can automatically understand spoken content and -summarize the key information for humans to browse over. In this endeavor, we -propose a new task of machine comprehension of spoken content. We define the -initial goal as the listening comprehension test of TOEFL, a challenging -academic English examination for English learners whose native language is not -English. We further propose an Attention-based Multi-hop Recurrent Neural -Network (AMRNN) architecture for this task, achieving encouraging results in -the initial tests. Initial results also have shown that word-level attention is -probably more robust than sentence-level attention for this task with ASR -errors. -" -3352,1608.06386,"Soham Dan, Sanyam Agarwal, Mayank Singh, Pawan Goyal and Animesh - Mukherjee","Which techniques does your application use?: An information extraction - framework for scientific articles",cs.CL," Every field of research consists of multiple application areas with various -techniques routinely used to solve problems in these wide range of application -areas. With the exponential growth in research volumes, it has become difficult -to keep track of the ever-growing number of application areas as well as the -corresponding problem solving techniques. In this paper, we consider the -computational linguistics domain and present a novel information extraction -system that automatically constructs a pool of all application areas in this -domain and appropriately links them with corresponding problem solving -techniques. Further, we categorize individual research articles based on their -application area and the techniques proposed/used in the article. 
A k-gram based
-discounting method along with handwritten rules and bootstrapped pattern
-learning is employed to extract application areas. Subsequently, a language
-modeling approach is proposed to characterize each article based on its
-application area. Similarly, regular expressions and high-scoring noun phrases
-are used for the extraction of the problem solving techniques. We propose a
-greedy approach to characterize each article based on the techniques. Towards
-the end, we present a table representing the most frequent techniques adopted
-for a particular application area. Finally, we propose three use cases
-presenting an extensive temporal analysis of the usage of techniques and
-application areas.
-"
-3353,1608.06459,Henrik Hermansson and James P. Cross,"Tracking Amendments to Legislation and Other Political Texts with a
- Novel Minimum-Edit-Distance Algorithm: DocuToads",cs.CL cs.CY," Political scientists often find themselves tracking amendments to political
-texts. As different actors weigh in, texts change as they are drafted and
-redrafted, reflecting political preferences and power. This study provides a
-novel solution to the problem of detecting amendments to political text based
-upon minimum edit distances. We demonstrate the usefulness of two
-language-insensitive, transparent, and efficient minimum-edit-distance
-algorithms suited for the task. These algorithms are capable of providing an
-account of the types (insertions, deletions, substitutions, and
-transpositions) and substantive amount of amendments made between versions of texts.
-To illustrate the usefulness and efficiency of the approach we replicate two
-existing studies from the field of legislative studies. Our results
-demonstrate that minimum edit distance methods can produce superior measures of
-text amendments to hand-coded efforts in a fraction of the time and resource
-costs.
-"
-3354,1608.06549,Jun-Wei Lin and Farn Wang,"Using Semantic Similarity for Input Topic Identification in
- Crawling-based Web Application Testing",cs.SE cs.CL," To automatically test web applications, crawling-based techniques are usually
-adopted to mine the behavior models, explore the state spaces or detect the
-violated invariants of the applications. However, in existing crawlers, rules
-for identifying the topics of input text fields, such as login ids, passwords,
-emails, dates and phone numbers, have to be manually configured. Moreover, the
-rules for one application are very often not suitable for another. In addition,
-when several rules conflict and match an input text field to more than one
-topic, it can be difficult to determine which rule suggests a better match.
-This paper presents a natural-language approach to automatically identify the
-topics of encountered input fields during crawling by semantically comparing
-their similarities with the input fields in a labeled corpus. In our evaluation
-with 100 real-world forms, the proposed approach demonstrated comparable
-performance to the rule-based one. Our experiments also show that the accuracy
-of the rule-based approach can be improved by up to 19% when integrated with
-our approach.
-"
-3355,1608.06651,"Christophe Van Gysel, Maarten de Rijke, Marcel Worring","Unsupervised, Efficient and Semantic Expertise Retrieval",cs.IR cs.AI cs.CL cs.LG," We introduce an unsupervised discriminative model for the task of retrieving
-experts in online document collections.
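The four edit types tracked by DocuToads (1608.06459, above) correspond to the Damerau-Levenshtein distance. A compact dynamic-programming sketch over word tokens, using the restricted variant with adjacent transpositions; whether this matches the tool's exact algorithm is an assumption here:

def dl_distance(a, b):
    """Minimum number of insertions, deletions, substitutions and adjacent
    transpositions needed to turn token sequence a into b."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

old = "the committee shall decide".split()
new = "the committee decide shall".split()
print(dl_distance(old, new))  # 1: a single transposition of adjacent words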
We exclusively employ textual evidence -and avoid explicit feature engineering by learning distributed word -representations in an unsupervised way. We compare our model to -state-of-the-art unsupervised statistical vector space and probabilistic -generative approaches. Our proposed log-linear model achieves the retrieval -performance levels of state-of-the-art document-centric methods with the low -inference cost of so-called profile-centric approaches. It yields a -statistically significant improved ranking over vector space and generative -models in most cases, matching the performance of supervised methods on various -benchmarks. That is, by using solely text we can do as well as methods that -work with external evidence and/or relevance feedback. A contrastive analysis -of rankings produced by discriminative and generative approaches shows that -they have complementary strengths due to the ability of the unsupervised -discriminative model to perform semantic matching. -" -3356,1608.06656,"Christophe Van Gysel, Evangelos Kanoulas, Maarten de Rijke",Lexical Query Modeling in Session Search,cs.IR cs.CL," Lexical query modeling has been the leading paradigm for session search. In -this paper, we analyze TREC session query logs and compare the performance of -different lexical matching approaches for session search. Naive methods based -on term frequency weighing perform on par with specialized session models. In -addition, we investigate the viability of lexical query models in the setting -of session search. We give important insights into the potential and -limitations of lexical query modeling for session search and propose future -directions for the field of session search. -" -3357,1608.06697,"Cliff Goddard, Maite Taboada, Radoslava Trnavac","Semantic descriptions of 24 evaluational adjectives, for application in - sentiment analysis",cs.CL," We apply the Natural Semantic Metalanguage (NSM) approach (Goddard and -Wierzbicka 2014) to the lexical-semantic analysis of English evaluational -adjectives and compare the results with the picture developed in the Appraisal -Framework (Martin and White 2005). The analysis is corpus-assisted, with -examples mainly drawn from film and book reviews, and supported by -collocational and statistical information from WordBanks Online. We propose NSM -explications for 24 evaluational adjectives, arguing that they fall into five -groups, each of which corresponds to a distinct semantic template. The groups -can be sketched as follows: ""First-person thought-plus-affect"", e.g. wonderful; -""Experiential"", e.g. entertaining; ""Experiential with bodily reaction"", e.g. -gripping; ""Lasting impact"", e.g. memorable; ""Cognitive evaluation"", e.g. -complex, excellent. These groupings and semantic templates are compared with -the classifications in the Appraisal Framework's system of Appreciation. In -addition, we are particularly interested in sentiment analysis, the automatic -identification of evaluation and subjectivity in text. We discuss the relevance -of the two frameworks for sentiment analysis and other language technology -applications. -" -3358,1608.06718,"Jos\'e Camacho Collados and Claudio Delli Bovi and Alessandro Raganato - and Roberto Navigli",A Large-Scale Multilingual Disambiguation of Glosses,cs.CL," Linking concepts and named entities to knowledge bases has become a crucial -Natural Language Understanding task. 
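The naive term-frequency baseline that entry 1608.06656 finds competitive for session search can be stated in a few lines. A sketch under the assumption that a session is just a list of query strings; the function name is illustrative:

from collections import Counter

def session_query_model(queries):
    """Merge all queries in a session into one term-frequency-weighted query,
    the kind of plain lexical baseline the paper compares against."""
    terms = Counter()
    for q in queries:
        terms.update(q.lower().split())
    total = sum(terms.values())
    return {t: c / total for t, c in terms.items()}

print(session_query_model(["jaguar speed", "jaguar top speed mph"]))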
In this respect, recent works have shown
-the key advantage of exploiting textual definitions in various Natural Language
-Processing applications. However, to date there are no reliable large-scale
-corpora of sense-annotated textual definitions available to the research
-community. In this paper we present a large-scale high-quality corpus of
-disambiguated glosses in multiple languages, comprising sense annotations of
-both concepts and named entities from a unified sense inventory. Our approach
-for the construction and disambiguation of the corpus builds upon the structure
-of a large multilingual semantic network and a state-of-the-art disambiguation
-system; first, we gather complementary information of equivalent definitions
-across different languages to provide context for disambiguation, and then we
-combine it with a semantic similarity-based refinement. As a result we obtain a
-multilingual corpus of textual definitions featuring over 38 million
-definitions in 263 languages, and we make it freely available at
-http://lcl.uniroma1.it/disambiguated-glosses. Experiments on Open Information
-Extraction and Sense Clustering show how two state-of-the-art approaches
-improve their performance by integrating our disambiguated corpus into their
-pipeline.
-"
-3359,1608.06757,"Sebastian Arnold, Felix A. Gers, Torsten Kilias, Alexander L\""oser",Robust Named Entity Recognition in Idiosyncratic Domains,cs.CL," Named entity recognition often fails in idiosyncratic domains. That causes a
-problem for dependent tasks, such as entity linking and relation extraction. We
-propose a generic and robust approach for high-recall named entity recognition.
-Our approach is easy to train and offers strong generalization over diverse
-domain-specific language, such as news documents (e.g. Reuters) or biomedical
-text (e.g. Medline). Our approach is based on deep contextual sequence learning
-and utilizes stacked bidirectional LSTM networks. Our model is trained with
-only a few hundred labeled sentences and does not rely on further external
-knowledge. We report F1 scores in the range of 84-94% on
-standard datasets.
-"
-3360,1608.06794,"Thomas Kober, Julie Weeds, Jeremy Reffin and David Weir","Improving Sparse Word Representations with Distributional Inference for
- Semantic Composition",cs.CL," Distributional models are derived from co-occurrences in a corpus, where only
-a small proportion of all possible plausible co-occurrences will be observed.
-This results in a very sparse vector space, requiring a mechanism for inferring
-missing knowledge. Most methods face this challenge in ways that render the
-resulting word representations uninterpretable, with the consequence that
-semantic composition becomes hard to model. In this paper we explore an
-alternative which involves explicitly inferring unobserved co-occurrences using
-the distributional neighbourhood. We show that distributional inference
-improves sparse word representations on several word similarity benchmarks and
-demonstrate that our model is competitive with the state-of-the-art for
-adjective-noun, noun-noun and verb-object compositions while being fully
-interpretable.
-"
-3361,1608.07076,"Ond\v{r}ej Du\v{s}ek, Filip Jur\v{c}\'i\v{c}ek",A Context-aware Natural Language Generator for Dialogue Systems,cs.CL," We present a novel natural language generation system for spoken dialogue
-systems capable of entraining (adapting) to users' way of speaking, providing
-contextually appropriate responses.
The generator is based on recurrent neural -networks and the sequence-to-sequence approach. It is fully trainable from data -which include preceding context along with responses to be generated. We show -that the context-aware generator yields significant improvements over the -baseline in both automatic metrics and a human pairwise preference test. -" -3362,1608.07094,"D S Guru, Mahamad Suhil",A Novel Term_Class Relevance Measure for Text Categorization,cs.IR cs.CL," In this paper, we introduce a new measure called Term_Class relevance to -compute the relevancy of a term in classifying a document into a particular -class. The proposed measure estimates the degree of relevance of a given term, -in placing an unlabeled document to be a member of a known class, as a product -of Class_Term weight and Class_Term density; where the Class_Term weight is the -ratio of the number of documents of the class containing the term to the total -number of documents containing the term and the Class_Term density is the -relative density of occurrence of the term in the class to the total occurrence -of the term in the entire population. Unlike the other existing term weighting -schemes such as TF-IDF and its variants, the proposed relevance measure takes -into account the degree of relative participation of the term across all -documents of the class to the entire population. To demonstrate the -significance of the proposed measure experimentation has been conducted on the -20 Newsgroups dataset. Further, the superiority of the novel measure is brought -out through a comparative analysis. -" -3363,1608.07115,"David Weir, Julie Weeds, Jeremy Reffin and Thomas Kober","Aligning Packed Dependency Trees: a theory of composition for - distributional semantics",cs.CL," We present a new framework for compositional distributional semantics in -which the distributional contexts of lexemes are expressed in terms of anchored -packed dependency trees. We show that these structures have the potential to -capture the full sentential contexts of a lexeme and provide a uniform basis -for the composition of distributional knowledge in a way that captures both -mutual disambiguation and generalization. -" -3364,1608.07187,"Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan","Semantics derived automatically from language corpora contain human-like - biases",cs.AI cs.CL cs.CY cs.LG," Artificial intelligence and machine learning are in a period of astounding -growth. However, there are concerns that these technologies may be used, either -with or without intention, to perpetuate the prejudice and unfairness that -unfortunately characterizes many human institutions. Here we show for the first -time that human-like semantic biases result from the application of standard -machine learning to ordinary language---the same sort of language humans are -exposed to every day. We replicate a spectrum of standard human biases as -exposed by the Implicit Association Test and other well-known psychological -studies. We replicate these using a widely used, purely statistical -machine-learning model---namely, the GloVe word embedding---trained on a corpus -of text from the Web. Our results indicate that language itself contains -recoverable and accurate imprints of our historic biases, whether these are -morally neutral as towards insects or flowers, problematic as towards race or -gender, or even simply veridical, reflecting the {\em status quo} for the -distribution of gender with respect to careers or first names. 
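The Term_Class relevance measure defined in prose in entry 1608.07094 above is a one-liner once the four counts are available. A sketch following that prose definition directly; the argument names are ours, and the example counts are made up:

def term_class_relevance(df_tc, df_t, tf_tc, tf_t):
    """Term_Class relevance = Class_Term weight * Class_Term density.

    df_tc: documents of class c containing term t
    df_t:  documents in the whole collection containing t
    tf_tc: occurrences of t within class c
    tf_t:  occurrences of t in the entire population
    """
    weight = df_tc / df_t if df_t else 0.0
    density = tf_tc / tf_t if tf_t else 0.0
    return weight * density

# e.g. a term appearing in 8 of the 10 documents that contain it, with 40 of
# its 50 total occurrences falling inside class c:
print(term_class_relevance(df_tc=8, df_t=10, tf_tc=40, tf_t=50))  # 0.64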
These
-regularities are captured by machine learning along with the rest of semantics.
-In addition to our empirical findings concerning language, we also contribute
-new methods for evaluating bias in text, the Word Embedding Association Test
-(WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results
-have implications not only for AI and machine learning, but also for the fields
-of psychology, sociology, and human ethics, since they raise the possibility
-that mere exposure to everyday language can account for the biases we replicate
-here.
-"
-3365,1608.07253,"Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas",Learning Latent Vector Spaces for Product Search,cs.IR cs.AI cs.CL," We introduce a novel latent vector space model that jointly learns the latent
-representations of words, e-commerce products and a mapping between the two
-without the need for explicit annotations. The power of the model lies in its
-ability to directly model the discriminative relation between products and a
-particular word. We compare our method to existing latent vector space models
-(LSI, LDA and word2vec) and evaluate it as a feature in a learning-to-rank
-setting. Our latent vector space model achieves its enhanced performance as it
-learns better product representations. Furthermore, the mapping from words to
-products and the representations of words benefit directly from the errors
-propagated back from the product representations during parameter estimation.
-We provide an in-depth analysis of the performance of our model and analyze the
-structure of the learned representations.
-"
-3366,1608.07639,"Yuval Atzmon, Jonathan Berant, Vahid Kezami, Amir Globerson and Gal
- Chechik",Learning to generalize to new compositions in image understanding,cs.CV cs.AI cs.CL cs.LG," Recurrent neural networks have recently been used for learning to describe
-images using natural language. However, it has been observed that these models
-generalize poorly to scenes that were not observed during training, possibly
-depending too strongly on the statistics of the text in the training data. Here
-we propose to describe images using short structured representations, aiming to
-capture the crux of a description. These structured representations allow us to
-tease out and evaluate separately two types of generalization: standard
-generalization to new images with similar scenes, and generalization to new
-combinations of known entities. We compare two learning approaches on the
-MS-COCO dataset: a state-of-the-art recurrent network based on an LSTM (Show,
-Attend and Tell), and a simple structured prediction model on top of a deep
-network. We find that the structured model generalizes to new compositions
-substantially better than the LSTM, achieving ~7 times the accuracy at
-predicting structured representations. By providing a concrete method to
-quantify generalization for unseen combinations, we argue that structured
-representations and compositional splits are a useful benchmark for image
-captioning, and advocate compositional models that capture linguistic and
-visual structure.
-"
-3367,1608.07720,"Fei Li, Meishan Zhang, Guohong Fu, Tao Qian, Donghong Ji","A Bi-LSTM-RNN Model for Relation Classification Using Low-Cost Sequence
- Features",cs.CL," Relation classification is associated with many potential applications in the
-artificial intelligence area. Recent approaches usually leverage neural
-networks based on structure features such as syntactic or dependency features
-to solve this problem. 
However, high-cost structure features make such
-approaches inconvenient to use directly. In addition, structure features
-are probably domain-dependent. Therefore, this paper proposes a bi-directional
-long-short-term-memory recurrent-neural-network (Bi-LSTM-RNN) model based on
-low-cost sequence features to address relation classification. This model
-divides a sentence or text segment into five parts, namely two target entities
-and their three contexts. It learns the representations of entities and their
-contexts, and uses them to classify relations. We evaluate our model on two
-standard benchmark datasets in different domains, namely SemEval-2010 Task 8
-and BioNLP-ST 2016 Task BB3. On the former dataset, our model achieves
-performance comparable to that of other models using sequence features. On the
-latter dataset, our model obtains the third-best results among the models in
-the official evaluation. Moreover, we find that the context between two target
-entities plays the most important role in relation classification. Furthermore,
-statistical experiments show that the context between two target entities can
-be used as an approximate replacement of the shortest dependency path when
-dependency parsing is not used.
-"
-3368,1608.07738,"Enrico Santus, Emmanuele Chersoni, Alessandro Lenci, Chu-Ren Huang,
- Philippe Blache",Testing APSyn against Vector Cosine on Similarity Estimation,cs.CL," In Distributional Semantic Models (DSMs), Vector Cosine is widely used to
-estimate similarity between word vectors, although this measure has been
-observed to suffer from several shortcomings. The recent literature has
-proposed other methods which attempt to mitigate such biases. In this paper, we
-investigate APSyn, a measure that computes the extent of the intersection
-between the most associated contexts of two target words, weighting it by
-context relevance. We evaluated this metric in a similarity estimation task on
-several popular test sets, and our results show that APSyn is in fact highly
-competitive, even with respect to the results reported in the literature for
-word embeddings. Moreover, APSyn addresses some of the weaknesses of Vector
-Cosine, also performing well on genuine similarity estimation.
-"
-3369,1608.07775,"Wei Fang, Jui-Yang Hsu, Hung-yi Lee, Lin-Shan Lee","Hierarchical Attention Model for Improved Machine Comprehension of
- Spoken Content",cs.CL," Multimedia or spoken content presents more attractive information than plain
-text content, but the former is more difficult to display on a screen and for a
-user to select. As a result, accessing large collections of the former is much
-more difficult and time-consuming for humans than accessing the latter. It is
-therefore highly attractive to develop machines which can automatically
-understand spoken content and summarize the key information for humans to
-browse over. In this endeavor, a new task of machine comprehension of spoken
-content was proposed recently. The initial goal was defined as the listening
-comprehension test of TOEFL, a challenging academic English examination for
-English learners whose native languages are not English. An Attention-based
-Multi-hop Recurrent Neural Network (AMRNN) architecture was also proposed for
-this task, which considered only the sequential relationship within the speech
-utterances. 
In this paper, we propose a new Hierarchical Attention Model (HAM),
-which constructs a multi-hop attention mechanism over tree-structured rather
-than sequential representations of the utterances. Improved comprehension
-performance, robust with respect to ASR errors, was obtained.
-"
-3370,1608.07836,Barbara Plank,What to do about non-standard (or non-canonical) language in NLP,cs.CL," Real world data differs radically from the benchmark corpora we use in
-natural language processing (NLP). As soon as we apply our technologies to the
-real world, performance drops. The reason for this problem is obvious: NLP
-models are trained on samples from a limited set of canonical varieties that
-are considered standard, most prominently English newswire. However, there are
-many dimensions, e.g., socio-demographics, language, genre, sentence type,
-etc., on which texts can differ from the standard. The solution is not obvious:
-we cannot control for all factors, and it is not clear how best to go beyond
-the current practice of training on homogeneous data from a single domain and
-language.
- In this paper, I review the notion of canonicity, and how it shapes our
-community's approach to language. I argue for leveraging what I call fortuitous
-data, i.e., non-obvious data that is hitherto neglected, hidden in plain sight,
-or raw data that needs to be refined. If we embrace the variety of this
-heterogeneous data by combining it with proper algorithms, we will not only
-produce more robust models, but will also enable adaptive language technology
-capable of addressing natural language variation.
-"
-3371,1608.07852,Chao-Lin Liu,"Quantitative Analyses of Chinese Poetry of Tang and Song Dynasties:
- Using Changing Colors and Innovative Terms as Examples",cs.CL cs.CY cs.DL cs.IR," Tang (618-907 AD) and Song (960-1279) dynasties are two very important
-periods in the development of Chinese literature. The most influential forms of
-poetry in Tang and Song were Shi and Ci, respectively. Tang Shi and Song Ci
-established crucial foundations of Chinese literature, and their influence on
-both literary works and the daily lives of Chinese communities lasts until
-today.
- We can analyze and compare the Complete Tang Shi and the Complete Song Ci
-from various viewpoints. In this presentation, we report our findings about the
-differences in their vocabularies. Interesting new words that started to appear
-in Song Ci and continue to be used in modern Chinese were identified. Colors
-are an important ingredient of the imagery in poetry, and we discuss the most
-frequent color words that appeared in Tang Shi and Song Ci.
-"
-3372,1608.07905,Shuohang Wang and Jing Jiang,Machine Comprehension Using Match-LSTM and Answer Pointer,cs.CL cs.AI," Machine comprehension of text is an important problem in natural language
-processing. A recently released dataset, the Stanford Question Answering
-Dataset (SQuAD), offers a large number of real questions and their answers
-created by humans through crowdsourcing. SQuAD provides a challenging testbed
-for evaluating machine comprehension algorithms, partly because compared with
-previous datasets, in SQuAD the answers do not come from a small set of
-candidate answers and they have variable lengths. We propose an end-to-end
-neural architecture for the task. 
The architecture is based on match-LSTM, a
-model we proposed previously for textual entailment, and Pointer Net, a
-sequence-to-sequence model proposed by Vinyals et al. (2015) to constrain the
-output tokens to be from the input sequences. We propose two ways of using
-Pointer Net for our task. Our experiments show that both of our two models
-substantially outperform the best results obtained by Rajpurkar et al. (2016)
-using logistic regression and manually crafted features.
-"
-3373,1608.08176,"Amritanshu Agrawal, Wei Fu, Tim Menzies","What is Wrong with Topic Modeling? (and How to Fix it Using Search-based
- Software Engineering)",cs.SE cs.AI cs.CL cs.IR," Context: Topic modeling finds human-readable structures in unstructured
-textual data. A widely used topic modeler is Latent Dirichlet allocation. When
-run on different datasets, LDA suffers from ""order effects"", i.e. different
-topics are generated if the order of training data is shuffled. Such order
-effects introduce a systematic error for any study. This error can lead to
-misleading results; specifically, inaccurate topic descriptions and a reduction
-in the efficacy of text mining classification results. Objective: To provide a
-method in which distributions generated by LDA are more stable and can be used
-for further analysis. Method: We use LDADE, a search-based software engineering
-tool that tunes LDA's parameters using DE (Differential Evolution). LDADE is
-evaluated on data from a programmer information exchange site (Stackoverflow),
-title and abstract text of thousands of Software Engineering (SE) papers, and
-software defect reports from NASA. Results were collected across different
-implementations of LDA (Python+Scikit-Learn, Scala+Spark); across different
-platforms (Linux, Macintosh); and for different kinds of LDAs (VEM, or using
-Gibbs sampling). Results were scored via topic stability and text mining
-classification accuracy. Results: In all treatments: (i) standard LDA exhibits
-very large topic instability; (ii) LDADE's tunings dramatically reduce cluster
-instability; (iii) LDADE also leads to improved performances for supervised as
-well as unsupervised learning. Conclusion: Due to topic instability, using
-standard LDA with its ""off-the-shelf"" settings should now be deprecated. Also,
-in the future, we should require SE papers that use LDA to test and (if needed)
-mitigate LDA topic instability. Finally, LDADE is a candidate technology for
-effectively and efficiently reducing that instability.
-"
-3374,1608.08188,Danna Gurari and Kristen Grauman,Visual Question: Predicting If a Crowd Will Agree on the Answer,cs.AI cs.CL cs.CV cs.HC," Visual question answering (VQA) systems are emerging from a desire to empower
-users to ask any natural language question about visual content and receive a
-valid answer in response. However, close examination of the VQA problem reveals
-an unavoidable, entangled problem that multiple humans may or may not always
-agree on a single answer to a visual question. We train a model to
-automatically predict from a visual question whether a crowd would agree on a
-single answer. We then propose how to exploit this system in a novel
-application to efficiently allocate human effort to collect answers to visual
-questions. Specifically, we propose a crowdsourcing system that automatically
-solicits fewer human responses when answer agreement is expected and more human
-responses when answer disagreement is expected. 
Our system improves upon
-existing crowdsourcing systems, typically eliminating at least 20% of human
-effort with no loss to the information collected from the crowd.
-"
-3375,1608.08339,Taehwan Kim,"American Sign Language fingerspelling recognition from video: Methods
- for unrestricted recognition and signer-independence",cs.CL cs.CV," In this thesis, we study the problem of recognizing video sequences of
-fingerspelled letters in American Sign Language (ASL). Fingerspelling comprises
-a significant but relatively understudied part of ASL, and recognizing it is
-challenging for a number of reasons: It involves quick, small motions that are
-often highly coarticulated; it exhibits significant variation between signers;
-and there has been a dearth of continuous fingerspelling data collected. In
-this work, we propose several types of recognition approaches, and explore the
-signer variation problem. Our best-performing models are segmental
-(semi-Markov) conditional random fields using deep neural network-based
-features. In the signer-dependent setting, our recognizers achieve letter error
-rates as low as about 8%. The signer-independent setting is much more
-challenging, but with neural network adaptation we achieve letter error rates
-as low as 17%.
-"
-3376,1608.08515,"Ivana Balazevic, Mikio Braun, Klaus-Robert M\""uller",Language Detection For Short Text Messages In Social Media,cs.CL cs.AI," With the constant growth of the World Wide Web and, accordingly, of the
-number of documents in different languages, the need for reliable language
-detection tools has increased as well. Platforms such as Twitter with
-predominantly short texts are becoming important information resources, which
-additionally imposes the need for short-text language detection algorithms. In
-this paper, we show how incorporating personalized user-specific information
-into the language detection algorithm leads to an important improvement in
-detection results. To choose the best algorithm for language detection for
-short text messages, we investigate several machine learning approaches. These
-approaches include the use of well-known classifiers such as SVM and logistic
-regression, a dictionary-based approach, and a probabilistic model based on
-modified Kneser-Ney smoothing. Furthermore, the extension of the probabilistic
-model to include additional user-specific information such as evidence
-accumulation per user and user interface language is explored, with the goal of
-improving the classification performance. The proposed approaches are evaluated
-on randomly collected Twitter data containing Latin as well as non-Latin
-alphabet languages and the quality of the obtained results is compared,
-followed by the selection of the best performing algorithm. This algorithm is
-then evaluated against two already existing general language detection tools:
-Chromium Compact Language Detector 2 (CLD2) and langid, where our method
-significantly outperforms the results achieved by both of the mentioned
-methods. Additionally, a preview of benefits and possible applications of
-having a reliable language detection algorithm is given.
-"
-3377,1608.08716,"C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret
- Mitchell, Dhruv Batra, Devi Parikh",Measuring Machine Intelligence Through Visual Question Answering,cs.AI cs.CL cs.CV cs.LG," As machines have become more intelligent, there has been a renewed interest
-in methods for measuring their intelligence. 
A common approach is to propose
-tasks at which humans excel but machines find difficult. However,
-an ideal task should also be easy to evaluate and not be easily gameable. We
-begin with a case study exploring the recently popular task of image captioning
-and its limitations as a task for measuring machine intelligence. An
-alternative and more promising task is Visual Question Answering, which tests a
-machine's ability to reason about language and vision. We describe a dataset
-unprecedented in size created for the task that contains over 760,000
-human-generated questions about images. Using around 10 million human-generated
-answers, machines may be easily evaluated.
-"
-3378,1608.08738,"St\'ephan Tulkens, Lisa Hilte, Elise Lodewyckx, Ben Verhoeven, Walter
- Daelemans",A Dictionary-based Approach to Racism Detection in Dutch Social Media,cs.CL," We present a dictionary-based approach to racism detection in Dutch social
-media comments, which were retrieved from two public Belgian social media sites
-likely to attract racist reactions. These comments were labeled as racist or
-non-racist by multiple annotators. For our approach, three discourse
-dictionaries were created: first, we created a dictionary by retrieving
-possibly racist and more neutral terms from the training data, and then
-augmenting these with more general words to remove some bias. A second
-dictionary was created through automatic expansion using a \texttt{word2vec}
-model trained on a large corpus of general Dutch text. Finally, a third
-dictionary was created by manually filtering out incorrect expansions. We
-trained multiple Support Vector Machines, using the distribution of words over
-the different categories in the dictionaries as features. The best-performing
-model used the manually cleaned dictionary and obtained an F-score of 0.46 for
-the racist class on a test set consisting of unseen Dutch comments, retrieved
-from the same sites used for the training set. The automated expansion of the
-dictionary only slightly boosted the model's performance, and this increase in
-performance was not statistically significant. The fact that the coverage of
-the expanded dictionaries did increase indicates that the words that were
-automatically added did occur in the corpus, but were not able to meaningfully
-impact performance. The dictionaries, code, and the procedure for requesting
-the corpus are available at: https://github.com/clips/hades
-"
-3379,1608.08868,"Su Lin Blodgett, Lisa Green, and Brendan O'Connor","Demographic Dialectal Variation in Social Media: A Case Study of
- African-American English",cs.CL," Though dialectal language is increasingly abundant on social media, few
-resources exist for developing NLP tools to handle such language. We conduct a
-case study of dialectal language in online conversational text by investigating
-African-American English (AAE) on Twitter. We propose a distantly supervised
-model to identify AAE-like language from demographics associated with
-geo-located messages, and we verify that this language follows well-known AAE
-linguistic phenomena. In addition, we analyze the quality of existing language
-identification and dependency parsing tools on AAE-like text, demonstrating
-that they perform poorly on such text compared to text associated with white
-speakers. We also provide an ensemble classifier for language identification
-which eliminates this disparity and release a new corpus of tweets containing
-AAE-like language. 
-" -3380,1608.08927,Payam Siyari and Matthias Gall\'e,The Generalized Smallest Grammar Problem,cs.CL cs.AI cs.DS cs.IT math.IT," The Smallest Grammar Problem -- the problem of finding the smallest -context-free grammar that generates exactly one given sequence -- has never -been successfully applied to grammatical inference. We investigate the reasons -and propose an extended formulation that seeks to minimize non-recursive -grammars, instead of straight-line programs. In addition, we provide very -efficient algorithms that approximate the minimization problem of this class of -grammars. Our empirical evaluation shows that we are able to find smaller -models than the current best approximations to the Smallest Grammar Problem on -standard benchmarks, and that the inferred rules capture much better the -syntactic structure of natural language. -" -3381,1608.08940,"Luis Argerich, Joaqu\'in Torr\'e Zaffaroni, Mat\'ias J Cano","Hash2Vec, Feature Hashing for Word Embeddings",cs.CL cs.IR cs.LG," In this paper we propose the application of feature hashing to create word -embeddings for natural language processing. Feature hashing has been used -successfully to create document vectors in related tasks like document -classification. In this work we show that feature hashing can be applied to -obtain word embeddings in linear time with the size of the data. The results -show that this algorithm, that does not need training, is able to capture the -semantic meaning of words. We compare the results against GloVe showing that -they are similar. As far as we know this is the first application of feature -hashing to the word embeddings problem and the results indicate this is a -scalable technique with practical results for NLP applications. -" -3382,1608.08953,"Mehrnoosh Sameki, Mattia Gentil, Kate K. Mays, Lei Guo, and Margrit - Betke","Dynamic Allocation of Crowd Contributions for Sentiment Analysis during - the 2016 U.S. Presidential Election",cs.HC cs.CL cs.SI," Opinions about the 2016 U.S. Presidential Candidates have been expressed in -millions of tweets that are challenging to analyze automatically. Crowdsourcing -the analysis of political tweets effectively is also difficult, due to large -inter-rater disagreements when sarcasm is involved. Each tweet is typically -analyzed by a fixed number of workers and majority voting. We here propose a -crowdsourcing framework that instead uses a dynamic allocation of the number of -workers. We explore two dynamic-allocation methods: (1) The number of workers -queried to label a tweet is computed offline based on the predicted difficulty -of discerning the sentiment of a particular tweet. (2) The number of crowd -workers is determined online, during an iterative crowd sourcing process, based -on inter-rater agreements between labels.We applied our approach to 1,000 -twitter messages about the four U.S. presidential candidates Clinton, Cruz, -Sanders, and Trump, collected during February 2016. We implemented the two -proposed methods using decision trees that allocate more crowd efforts to -tweets predicted to be sarcastic. We show that our framework outperforms the -traditional static allocation scheme. It collects opinion labels from the crowd -at a much lower cost while maintaining labeling accuracy. 
-" -3383,1608.08974,"Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra","Towards Transparent AI Systems: Interpreting Visual Question Answering - Models",cs.CV cs.AI cs.CL cs.LG," Deep neural networks have shown striking progress and obtained -state-of-the-art results in many AI research fields in the recent years. -However, it is often unsatisfying to not know why they predict what they do. In -this paper, we address the problem of interpreting Visual Question Answering -(VQA) models. Specifically, we are interested in finding what part of the input -(pixels in images or words in questions) the VQA model focuses on while -answering the question. To tackle this problem, we use two visualization -techniques -- guided backpropagation and occlusion -- to find important words -in the question and important regions in the image. We then present qualitative -and quantitative analyses of these importance maps. We found that even without -explicit attention mechanisms, VQA models may sometimes be implicitly attending -to relevant regions in the image, and often to appropriate words in the -question. -" -3384,1609.00070,Arun Tejasvi Chaganty and Percy Liang,"How Much is 131 Million Dollars? Putting Numbers in Perspective with - Compositional Descriptions",cs.CL," How much is 131 million US dollars? To help readers put such numbers in -context, we propose a new task of automatically generating short descriptions -known as perspectives, e.g. ""$131 million is about the cost to employ everyone -in Texas over a lunch period"". First, we collect a dataset of numeric mentions -in news articles, where each mention is labeled with a set of rated -perspectives. We then propose a system to generate these descriptions -consisting of two steps: formula construction and description generation. In -construction, we compose formulae from numeric facts in a knowledge base and -rank the resulting formulas based on familiarity, numeric proximity and -semantic compatibility. In generation, we convert a formula into natural -language using a sequence-to-sequence recurrent neural network. Our system -obtains a 15.2% F1 improvement over a non-compositional baseline at formula -construction and a 12.5 BLEU point improvement over a baseline description -generation. -" -3385,1609.00081,Tanmoy Chakraborty and Ramasuri Narayanam,"All Fingers are not Equal: Intensity of References in Scientific - Articles",cs.CL cs.DL," Research accomplishment is usually measured by considering all citations with -equal importance, thus ignoring the wide variety of purposes an article is -being cited for. Here, we posit that measuring the intensity of a reference is -crucial not only to perceive better understanding of research endeavor, but -also to improve the quality of citation-based applications. To this end, we -collect a rich annotated dataset with references labeled by the intensity, and -propose a novel graph-based semi-supervised model, GraLap to label the -intensity of references. Experiments with AAN datasets show a significant -improvement compared to the baselines to achieve the true labels of the -references (46% better correlation). Finally, we provide four applications to -demonstrate how the knowledge of reference intensity leads to design better -real-world applications. 
-" -3386,1609.00425,Ethan Fast and Eric Horvitz,Identifying Dogmatism in Social Media: Signals and Models,cs.CL cs.SI," We explore linguistic and behavioral features of dogmatism in social media -and construct statistical models that can identify dogmatic comments. Our model -is based on a corpus of Reddit posts, collected across a diverse set of -conversational topics and annotated via paid crowdsourcing. We operationalize -key aspects of dogmatism described by existing psychology theories (such as -over-confidence), finding they have predictive power. We also find evidence for -new signals of dogmatism, such as the tendency of dogmatic posts to refrain -from signaling cognitive processes. When we use our predictive model to analyze -millions of other Reddit posts, we find evidence that suggests dogmatism is a -deeper personality trait, present for dogmatic users across many different -domains, and that users who engage on dogmatic comments tend to show increases -in dogmatic posts themselves. -" -3387,1609.00435,"David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky",Citation Classification for Behavioral Analysis of a Scientific Field,cs.CL cs.DL," Citations are an important indicator of the state of a scientific field, -reflecting how authors frame their work, and influencing uptake by future -scholars. However, our understanding of citation behavior has been limited to -small-scale manual citation analysis. We perform the largest behavioral study -of citations to date, analyzing how citations are both framed and taken up by -scholars in one entire field: natural language processing. We introduce a new -dataset of nearly 2,000 citations annotated for function and centrality, and -use it to develop a state-of-the-art classifier and label the entire ACL -Reference Corpus. We then study how citations are framed by authors and use -both papers and online traces to track how citations are followed by readers. -We demonstrate that authors are sensitive to discourse structure and -publication venue when citing, that online readers follow temporal links to -previous and future work rather than methodological links, and that how a paper -cites related work is predictive of its citation count. Finally, we use changes -in citation roles to show that the field of NLP is undergoing a significant -increase in consensus. -" -3388,1609.00464,"Trey Grainger, Khalifeh AlJadda, Mohammed Korayem, Andries Smith","The Semantic Knowledge Graph: A compact, auto-generated model for - real-time traversal and ranking of any relationship within a domain",cs.IR cs.AI cs.CL," This paper describes a new kind of knowledge representation and mining system -which we are calling the Semantic Knowledge Graph. At its heart, the Semantic -Knowledge Graph leverages an inverted index, along with a complementary -uninverted index, to represent nodes (terms) and edges (the documents within -intersecting postings lists for multiple terms/nodes). This provides a layer of -indirection between each pair of nodes and their corresponding edge, enabling -edges to materialize dynamically from underlying corpus statistics. As a -result, any combination of nodes can have edges to any other nodes materialize -and be scored to reveal latent relationships between the nodes. 
This provides
-numerous benefits: the knowledge graph can be built automatically from a
-real-world corpus of data; new nodes - along with their combined edges - can be
-instantly materialized from any arbitrary combination of preexisting nodes
-(using set operations); and a full model of the semantic relationships between
-all entities within a domain can be represented and dynamically traversed using
-a highly compact representation of the graph. Such a system has widespread
-applications in areas as diverse as knowledge modeling and reasoning, natural
-language processing, anomaly detection, data cleansing, semantic search,
-analytics, data classification, root cause analysis, and recommendation
-systems. The main contribution of this paper is the introduction of a novel
-system - the Semantic Knowledge Graph - which is able to dynamically discover
-and score interesting relationships between any arbitrary combination of
-entities (words, phrases, or extracted concepts) by dynamically materializing
-nodes and edges from a compact graphical representation built automatically
-from a corpus of data representative of a knowledge domain.
-"
-3389,1609.00514,"Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten Marx","On Horizontal and Vertical Separation in Hierarchical Text
- Classification",cs.IR cs.CL cs.IT math.IT," Hierarchy is a common and effective way of organizing data and representing
-their relationships at different levels of abstraction. However, hierarchical
-data dependencies cause difficulties in the estimation of ""separable"" models
-that can distinguish between the entities in the hierarchy. Extracting
-separable models of hierarchical entities requires us to take their relative
-position into account and to consider the different types of dependencies in
-the hierarchy. In this paper, we present an investigation of the effect of
-separability in text-based entity classification and argue that in hierarchical
-classification, a separation property should be established between entities
-not only in the same layer, but also in different layers. Our main findings are
-the following. First, we analyse the importance of separability of the data
-representation in the task of classification and, based on that, we introduce a
-""Strong Separation Principle"" for optimizing the expected effectiveness of
-classifiers' decisions based on the separation property. Second, we present
-Hierarchical Significant Words Language Models (HSWLM) which capture all, and
-only, the essential features of hierarchical entities according to their
-relative position in the hierarchy, resulting in horizontally and vertically
-separable models. Third, we validate our claims on real-world data and
-demonstrate how HSWLM improves the accuracy of classification and how it
-provides transferable models over time. Although the discussion in this paper
-focuses on the classification problem, the models are applicable to any
-information access task on data that has, or can be mapped to, a hierarchical
-structure.
-"
-3390,1609.00559,Bridget T. McInnes and Ted Pedersen,"Improving Correlation with Human Judgments by Integrating Semantic
- Similarity with Second--Order Vectors",cs.CL," Vector space methods that measure semantic similarity and relatedness often
-rely on distributional information such as co-occurrence frequencies or
-statistical measures of association to weight the importance of particular
-co-occurrences. 
In this paper, we extend these methods by incorporating a
-measure of semantic similarity based on a human-curated taxonomy into a
-second-order vector representation. This results in a measure of semantic
-relatedness that combines both the contextual information available in a
-corpus-based vector space representation with the semantic knowledge found in
-a biomedical ontology. Our results show that incorporating semantic similarity
-into second-order co-occurrence matrices improves correlation with human
-judgments for both similarity and relatedness, and that our method compares
-favorably to various different word embedding methods that have recently been
-evaluated on the same reference standards we have used.
-"
-3391,1609.00565,"Lingxun Meng, Yan Li, Mengyi Liu and Peng Shu","Skipping Word: A Character-Sequential Representation based Framework for
- Question Answering",cs.CL," Recent works using artificial neural networks based on distributed word
-representations greatly boost the performance of various natural language
-learning tasks, especially question answering. However, they also carry along
-some attendant problems, such as corpus selection for embedding learning,
-dictionary transformation for different learning tasks, etc. In this paper, we
-propose to straightforwardly model sentences by means of character sequences,
-and then utilize convolutional neural networks to integrate character embedding
-learning together with point-wise answer selection training. Compared with deep
-models pre-trained with a word embedding (WE) strategy, our
-character-sequential representation (CSR) based method shows a much simpler
-procedure and more stable performance across different benchmarks. Extensive
-experiments on two benchmark answer selection datasets exhibit competitive
-performance compared with the state-of-the-art methods.
-"
-3392,1609.00626,"Shinichi Nakajima, Sebastian Krause, Dirk Weissenborn, Sven Schmeier,
- Nico Goernitz, Feiyu Xu",SynsetRank: Degree-adjusted Random Walk for Relation Identification,cs.CL stat.AP," In relation extraction, a key process is to obtain good detectors that find
-relevant sentences describing the target relation. To minimize the necessity of
-labeled data for refining detectors, previous work successfully made use of
-BabelNet, a semantic graph structure expressing relationships between synsets,
-as side information or prior knowledge. The goal of this paper is to enhance
-the use of graph structure in the framework of random walk with a few
-adjustable parameters. Actually, a straightforward application of random walk
-degrades the performance even after parameter optimization. With the insight
-from this unsuccessful trial, we propose SynsetRank, which adjusts the initial
-probability so that high-degree nodes influence their neighbors as strongly as
-low-degree nodes. In our experiment on 13 relations in the FB15K-237 dataset,
-SynsetRank significantly outperforms baselines and the plain random walk
-approach.
-"
-3393,1609.00718,Rie Johnson and Tong Zhang,"Convolutional Neural Networks for Text Categorization: Shallow
- Word-level vs. Deep Character-level",cs.CL cs.LG stat.ML," This paper reports the performance of shallow word-level convolutional
-neural networks (CNN), our earlier work (2015), on the eight datasets with
-relatively large training data that were used for testing the very deep
-character-level CNN of Conneau et al. (2016). Our findings are as follows. 
The
-shallow word-level CNNs achieve better error rates than those
-reported in Conneau et al., though the results should be interpreted with some
-consideration due to the unique pre-processing of Conneau et al. The shallow
-word-level CNN uses more parameters and therefore requires more storage than
-the deep character-level CNN; however, the shallow word-level CNN computes much
-faster.
-"
-3394,1609.00777,"Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen,
- Faisal Ahmed, Li Deng","Towards End-to-End Reinforcement Learning of Dialogue Agents for
- Information Access",cs.CL cs.LG," This paper proposes KB-InfoBot -- a multi-turn dialogue agent which helps
-users search Knowledge Bases (KBs) without composing complicated queries. Such
-goal-oriented dialogue agents typically need to interact with an external
-database to access real-world knowledge. Previous systems achieved this by
-issuing a symbolic query to the KB to retrieve entries based on their
-attributes. However, such symbolic operations break the differentiability of
-the system and prevent end-to-end training of neural dialogue agents. In this
-paper, we address this limitation by replacing symbolic queries with an induced
-""soft"" posterior distribution over the KB that indicates which entities the
-user is interested in. Integrating the soft retrieval process with a
-reinforcement learner leads to higher task success rate and reward in both
-simulations and against real users. We also present a fully neural end-to-end
-agent, trained entirely from user feedback, and discuss its application towards
-personalized dialogue agents. The source code is available at
-https://github.com/MiuLab/KB-InfoBot.
-"
-3395,1609.00799,"Danilo S. Carvalho, Minh-Tien Nguyen, Tran Xuan Chien and Minh Le
- Nguyen",Lexical-Morphological Modeling for Legal Text Analysis,cs.IR cs.CL," In the context of the Competition on Legal Information Extraction/Entailment
-(COLIEE), we propose a method comprising the necessary steps for finding
-documents relevant to a legal question and deciding on textual entailment
-evidence to provide a correct answer. The proposed method is based on the
-combination of several lexical and morphological characteristics, used to build
-a language model and a set of features for Machine Learning algorithms. We
-provide a detailed study of the proposed method's performance and failure
-cases, indicating that it is competitive with state-of-the-art approaches to
-Legal Information Retrieval and Question Answering, while requiring neither
-extensive training data nor expert-produced knowledge. The proposed method
-achieved significant results in the competition, indicating a substantial level
-of adequacy for the tasks addressed.
-"
-3396,1609.01188,"Fahad Al-Obaidli, Stephen Cox, Preslav Nakov","Bi-Text Alignment of Movie Subtitles for Spoken English-Arabic
- Statistical Machine Translation",cs.CL," We describe efforts towards getting better resources for English-Arabic
-machine translation of spoken text. In particular, we look at movie subtitles
-as a unique, rich resource, as subtitles in one language often get translated
-into other languages. Movie subtitles are not new as a resource and have been
-explored in previous research; however, here we create a much larger bi-text
-(the biggest to date), and we further generate better-quality alignment for it.
-Given the subtitles for the same movie in different languages, a key problem is
-how to align them at the fragment level. 
Typically, this is done using
-length-based alignment, but for movie subtitles, there is also time
-information. Here we exploit this information to develop an original algorithm
-that outperforms the current best subtitle alignment tool, subalign. The
-evaluation results show that adding our bi-text to the IWSLT training bi-text
-yields an improvement of over two BLEU points absolute.
-"
-3397,1609.01235,"Oren Melamud, Ido Dagan, Jacob Goldberger",PMI Matrix Approximations with Applications to Neural Language Modeling,cs.CL," The negative sampling (NEG) objective function, used in word2vec, is a
-simplification of the Noise Contrastive Estimation (NCE) method. NEG was found
-to be highly effective in learning continuous word representations. However,
-unlike NCE, it was considered inapplicable for the purpose of learning the
-parameters of a language model. In this study, we refute this assertion by
-providing a principled derivation for NEG-based language modeling, founded on a
-novel analysis of a low-dimensional approximation of the matrix of pointwise
-mutual information between the contexts and the predicted words. The obtained
-language modeling is closely related to NCE language models but is based on a
-simplified objective function. We thus provide a unified formulation for two
-main language processing tasks, namely word embedding and language modeling,
-based on the NEG objective function. Experimental results on two popular
-language modeling benchmarks show comparable perplexity results, with a small
-advantage to NEG over NCE.
-"
-3398,1609.01454,"Bing Liu, Ian Lane","Attention-Based Recurrent Neural Network Models for Joint Intent
- Detection and Slot Filling",cs.CL," Attention-based encoder-decoder neural network models have recently shown
-promising results in machine translation and speech recognition. In this work,
-we propose an attention-based neural network model for joint intent detection
-and slot filling, both of which are critical steps for many speech
-understanding and dialog systems. Unlike in machine translation and speech
-recognition, alignment is explicit in slot filling. We explore different
-strategies for incorporating this alignment information into the
-encoder-decoder framework. Learning from the attention mechanism in the
-encoder-decoder model, we further propose introducing attention to the
-alignment-based RNN models. Such attention provides additional information for
-intent classification and slot label prediction. Our independent task models
-achieve state-of-the-art intent detection error rate and slot filling F1 score
-on the benchmark ATIS task. Our joint training model further obtains 0.56%
-absolute (23.8% relative) error reduction on intent detection and 0.23%
-absolute gain on slot filling over the independent task models.
-"
-3399,1609.01462,"Bing Liu, Ian Lane","Joint Online Spoken Language Understanding and Language Modeling with
- Recurrent Neural Networks",cs.CL," Speaker intent detection and semantic slot filling are two critical tasks in
-spoken language understanding (SLU) for dialogue systems. In this paper, we
-describe a recurrent neural network (RNN) model that jointly performs intent
-detection, slot filling, and language modeling. The neural network model keeps
-updating the intent estimation as each word in the transcribed utterance
-arrives and uses it as contextual features in the joint model. Evaluation of
-the language model and online SLU model is made on the ATIS benchmarking data
-set. 
On
-the language modeling task, our joint model achieves an 11.8% relative
-reduction in perplexity compared to the independently trained language model.
-On the SLU tasks, our joint model outperforms the independent task training
-model by 22.3% on intent detection error rate, with a slight degradation in
-slot filling F1 score. The joint model also shows advantageous performance in
-realistic ASR settings with noisy speech input.
-"
-3400,1609.01574,"Prakash Reddy Putta, John J. Dzak III, Siddhartha R. Jonnalagadda","Automatically extracting, ranking and visually summarizing the
- treatments for a disease",cs.CL cs.IR," Clinicians are expected to have up-to-date and broad knowledge of disease
-treatment options for a patient. Online health knowledge resources contain a
-wealth of information. However, because of the time investment needed to
-disseminate and rank pertinent information, there is a need to summarize the
-information in a more concise format. The aim of our study is to provide
-clinicians with a concise overview of popular treatments for a given disease
-using information automatically computed from Medline abstracts. We analyzed
-the treatments of two disorders - Atrial Fibrillation and Congestive Heart
-Failure. We calculated the precision, recall, and F-scores of our two ranking
-methods to measure the accuracy of the results. For the Atrial Fibrillation
-disorder, the maximum F-score for the New Treatments weighting method is 0.611,
-which occurs at 60 treatments. For the Congestive Heart Failure disorder, the
-maximum F-score for the New Treatments weighting method is 0.503, which occurs
-at 80 treatments.
-"
-3401,1609.01580,"Shu Dong, R Kannan Mutharasan, Siddhartha Jonnalagadda","Using Natural Language Processing to Screen Patients with Active Heart
- Failure: An Exploration for Hospital-wide Surveillance",cs.CL cs.CY," In this paper, we proposed two different approaches, a rule-based approach
-and a machine-learning based approach, to identify active heart failure cases
-automatically by analyzing electronic health records (EHR). For the rule-based
-approach, we extracted cardiovascular data elements from clinical notes and
-matched patients to different colors according to their heart failure condition
-by using rules provided by experts in heart failure. It achieved 69.4% accuracy
-and a 0.729 F1-score. For the machine learning approach, with bigrams of
-clinical notes as features, we tried four different models, while SVM with a
-linear kernel achieved the best performance, with 87.5% accuracy and a 0.86
-F1-score. Also, from the classification comparison between the four different
-models, we believe that linear models fit this problem better. Once we combine
-the machine-learning and rule-based algorithms, we will enable hospital-wide
-surveillance of active heart failure through increased accuracy and
-interpretability of the outputs.
-"
-3402,1609.01586,"Ravi Garg, Shu Dong, Sanjiv Shah, Siddhartha R Jonnalagadda","A Bootstrap Machine Learning Approach to Identify Rare Disease Patients
- from Electronic Health Records",cs.LG cs.CL," Rare diseases are very difficult to identify among a large number of other
-possible diagnoses. Better availability of patient data and improvement in
-machine learning algorithms empower us to tackle this problem computationally.
-In this paper, we target one such rare disease - cardiac amyloidosis. 
We aim to
-automate the process of identifying potential cardiac amyloidosis patients with
-the help of machine learning algorithms and also to learn the most predictive
-factors. With the help of experienced cardiologists, we prepared a gold
-standard with 73 positive (cardiac amyloidosis) and 197 negative instances. We
-achieved a high average cross-validation F1 score of 0.98 using an ensemble
-machine learning classifier. Some of the predictive variables were age and
-diagnosis of cardiac arrest, chest pain, congestive heart failure,
-hypertension, primary open-angle glaucoma, and shoulder arthritis. Further
-studies are needed to validate the accuracy of the system across an entire
-health system and its generalizability to other diseases.
-"
-3403,1609.01592,"Ravi P Garg, Kalpana Raja, Siddhartha R Jonnalagadda",CRTS: A type system for representing clinical recommendations,cs.CL cs.CY," Background: Clinical guidelines and recommendations are the driving wheels of
-the evidence-based medicine (EBM) paradigm, but these are available primarily
-as unstructured text and are generally highly heterogeneous in nature. This
-significantly reduces the dissemination and automatic application of these
-recommendations at the point of care. A comprehensive structured representation
-of these recommendations is highly beneficial in this regard. Objective: The
-objective of this paper is to present the Clinical Recommendation Type System
-(CRTS), a common type system that can effectively represent a clinical
-recommendation in a structured form. Methods: CRTS is built by analyzing 125
-recommendations and 195 research articles corresponding to 6 different diseases
-available from UpToDate, a publicly available clinical knowledge system, and
-from the National Guideline Clearinghouse, a public resource for evidence-based
-clinical practice guidelines. Results: We show that CRTS not only covers the
-recommendations but is also flexible enough to be extended to represent
-information from the primary literature. We also describe how our developed
-type system can be applied for clinical decision support, medical knowledge
-summarization, and citation retrieval. Conclusion: We showed that our proposed
-type system is precise and comprehensive in representing a large sample of
-recommendations available for various disorders. CRTS can now be used to build
-interoperable information extraction systems that automatically extract
-clinical recommendations and related data elements from clinical evidence
-resources, guidelines, systematic reviews and primary publications.
- Keywords: guidelines and recommendations, type system, clinical decision
-support, evidence-based medicine, information storage and retrieval
-"
-3404,1609.01594,"Abhishek Kalyan Adupa, Ravi Prakash Garg, Jessica Corona-Cox, Sanjiv.
- J. Shah, Siddhartha R. Jonnalagadda","An Information Extraction Approach to Prescreen Heart Failure Patients
- for Clinical Trials",cs.CL cs.CY," To reduce the large amount of time spent screening, identifying, and
-recruiting patients into clinical trials, we need prescreening systems that are
-able to automate the data extraction and decision-making tasks that are
-typically relegated to clinical research study coordinators. However, a major
-obstacle is the vast amount of patient data available as unstructured free-form
-text in electronic health records. Here we propose an information
-extraction-based approach that first automatically converts unstructured text
-into a structured form. 
The structured data are then compared against a list of
-eligibility criteria using a rule-based system to determine which patients
-qualify for enrollment in a heart failure clinical trial. We show that we can
-achieve highly accurate results, with recall and precision values of 0.95 and
-0.86, respectively. Our system allowed us to significantly reduce the time
-needed for prescreening patients from a few weeks to a few minutes. Our
-open-source information extraction modules are available for researchers and
-could be tested and validated in other cardiovascular trials. An approach such
-as the one we demonstrate here may decrease costs and expedite clinical trials,
-and could enhance the reproducibility of trials across institutions and
-populations.
-"
-3405,1609.01597,"Kalpana Raja, Andrew J Sauer, Ravi P Garg, Melanie R Klerer,
- Siddhartha R Jonnalagadda","A Hybrid Citation Retrieval Algorithm for Evidence-based Clinical
- Knowledge Summarization: Combining Concept Extraction, Vector Similarity and
- Query Expansion for High Precision",cs.CL cs.IR," Novel information retrieval methods to identify citations relevant to a
-clinical topic can overcome the knowledge gap existing between the primary
-literature (MEDLINE) and online clinical knowledge resources such as UpToDate.
-Searching the MEDLINE database directly or with query expansion methods returns
-a large number of citations that are not relevant to the query. The current
-study presents a citation retrieval system that retrieves citations for
-evidence-based clinical knowledge summarization. This approach combines query
-expansion, a concept-based screening algorithm, and concept-based vector
-similarity. We also propose an information extraction framework for automated
-concept (Population, Intervention, Comparison, and Disease) extraction. We
-evaluated our proposed system on all topics (as queries) available from
-UpToDate for two diseases, heart failure (HF) and atrial fibrillation (AFib).
-The system achieved an overall F-score of 41.2% on HF topics and 42.4% on AFib
-topics on a gold standard of citations available in UpToDate. This is
-significantly higher than a query-expansion based baseline (F-score of 1.3% on
-HF and 2.2% on AFib) and a system that uses query expansion with disease
-hyponyms and journal names, concept-based screening, and term-based vector
-similarity (F-score of 37.5% on HF and 39.5% on AFib). Evaluating the system
-with the top K relevant citations, where K is the number of citations in the
-gold standard, achieved a much higher overall F-score of 69.9% on HF topics
-and 75.1% on AFib topics. In addition, the system retrieved up to 18 new
-relevant citations per topic when tested on ten HF and six AFib clinical
-topics.
-"
-3406,1609.01926,"Giovanni Sirio Carmantini, Peter beim Graben, Mathieu Desroches,
- Serafim Rodrigues","A modular architecture for transparent computation in Recurrent Neural
- Networks",cs.NE cs.AI cs.CL cs.FL cs.SC," Computation is classically studied in terms of automata, formal languages and
-algorithms; yet, the relation between neural dynamics and symbolic
-representations and operations is still unclear in traditional eliminative
-connectionism. Therefore, we suggest a unique perspective on this central
-issue, which we would like to refer to as transparent connectionism, by
-proposing accounts of how symbolic computation can be implemented in neural
-substrates. 
In this study we first introduce a new model of dynamics on a
-symbolic space, the versatile shift, showing that it supports the real-time
-simulation of a range of automata. We then show that the Goedelization of
-versatile shifts defines nonlinear dynamical automata, dynamical systems
-evolving on a vectorial space. Finally, we present a mapping between nonlinear
-dynamical automata and recurrent artificial neural networks. The mapping
-defines an architecture characterized by its granular modularity, where data,
-symbolic operations and their control are not only distinguishable in
-activation space, but also spatially localizable in the network itself, while
-maintaining a distributed encoding of symbolic representations. The resulting
-networks simulate automata in real time and are programmed directly, in the
-absence of network training. To discuss the unique characteristics of the
-architecture and their consequences, we present two examples: i) the design of
-a Central Pattern Generator from a finite-state locomotive controller, and ii)
-the creation of a network simulating a system of interactive automata that
-supports the parsing of garden-path sentences as investigated in
-psycholinguistics experiments.
-"
-3407,1609.01933,Hua Feng and Ruixi Lin,Sentiment Classification of Food Reviews,cs.CL," Sentiment analysis of reviews is a popular task in natural language
-processing. In this work, the goal is to predict the score of food reviews on a
-scale of 1 to 5 with two recurrent neural networks that are carefully tuned. As
-a baseline, we train a simple RNN for classification. Then we extend the
-baseline to a GRU. In addition, we present two different methods to deal with
-highly skewed data, which is a common problem for reviews. Models are evaluated
-using accuracy.
-"
-3408,1609.01962,"Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Arkaitz Zubiaga, Maria
- Liakata, Rob Procter","Using Gaussian Processes for Rumour Stance Classification in Social
- Media",cs.CL cs.IR cs.SI," Social media tend to be rife with rumours while new reports are released
-piecemeal during breaking news. Interestingly, one can mine multiple reactions
-expressed by social media users in those situations, exploring their stance
-towards rumours, ultimately enabling the flagging of highly disputed rumours as
-being potentially false. In this work, we set out to develop an automated,
-supervised classifier that uses multi-task learning to classify the stance
-expressed in each individual tweet in a rumourous conversation as either
-supporting, denying or questioning the rumour. Using a classifier based on
-Gaussian Processes, and exploring its effectiveness on two datasets with very
-different characteristics and varying distributions of stances, we show that
-our approach consistently outperforms competitive baseline classifiers. Our
-classifier is especially effective in estimating the distribution of different
-types of stance associated with a given rumour, which we set forth as a desired
-characteristic for a rumour-tracking system that will warn both ordinary users
-of Twitter and professional news practitioners when a rumour is being rebutted.
-"
-3409,1609.02043,"Purushotam Radadia, Shirish Karande","Feasibility of Post-Editing Speech Transcriptions with a Mismatched
- Crowd",cs.AI cs.CL," Manual correction of speech transcription can involve a selection from
-plausible transcriptions. Recent work has shown the feasibility of employing a
-mismatched crowd for speech transcription. 
However, it is yet to be established
-whether a mismatched worker has sufficiently fine-granular speech perception to
-choose among the phonetically proximate options that are likely to be generated
-from the trellis of an ASRU. Hence, we consider five languages, Arabic, German,
-Hindi, Russian and Spanish. For each, we generate synthetic, phonetically
-proximate options that emulate post-editing scenarios of varying difficulty.
-We consistently observe non-trivial crowd ability to choose among fine-granular
-options.
-"
-3410,1609.02075,"Rahul Goel, Sandeep Soni, Naman Goyal, John Paparrizos, Hanna Wallach,
- Fernando Diaz, Jacob Eisenstein",The Social Dynamics of Language Change in Online Networks,cs.CL cs.SI physics.soc-ph," Language change is a complex social phenomenon, revealing pathways of
-communication and sociocultural influence. But, while language change has long
-been a topic of study in sociolinguistics, traditional linguistic research
-methods rely on circumstantial evidence, estimating the direction of change
-from differences between older and younger speakers. In this paper, we use a
-data set of several million Twitter users to track language changes in
-progress. First, we show that language change can be viewed as a form of social
-influence: we observe complex contagion for phonetic spellings and ""netspeak""
-abbreviations (e.g., lol), but not for older dialect markers from spoken
-language. Next, we test whether specific types of social network connections
-are more influential than others, using a parametric Hawkes process model. We
-find that tie strength plays an important role: densely embedded social ties
-are significantly better conduits of linguistic influence. Geographic locality
-appears to play a more limited role: we find relatively little evidence to
-support the hypothesis that individuals are more influenced by geographically
-local social ties, even in their usage of geographical dialect markers.
-"
-3411,1609.02082,"Christian Huemmer, Ram\'on Fern\'andez Astudillo and Walter Kellermann","An improved uncertainty decoding scheme with weighted samples for
- DNN-HMM hybrid systems",cs.LG cs.CL cs.SD," In this paper, we advance a recently-proposed uncertainty decoding scheme for
-DNN-HMM (deep neural network - hidden Markov model) hybrid systems. This
-numerical sampling concept averages DNN outputs produced by a finite set of
-feature samples (drawn from a probabilistic distortion model) to approximate
-the posterior likelihoods of the context-dependent HMM states. As the main
-innovation, we propose a weighted DNN-output averaging based on a minimum
-classification error criterion and apply it to a probabilistic distortion model
-for spatial diffuseness features. The experimental evaluation is performed on
-the 8-channel REVERB Challenge task using a DNN-HMM hybrid system with
-multichannel front-end signal enhancement. We show that the recognition
-accuracy of the DNN-HMM hybrid system improves by incorporating uncertainty
-decoding based on random sampling and that the proposed weighted DNN-output
-averaging further reduces the word error rate scores.
-"
-3412,1609.02116,"Trapit Bansal, David Belanger, Andrew McCallum",Ask the GRU: Multi-Task Learning for Deep Text Recommendations,stat.ML cs.CL cs.LG," In a variety of application domains the content to be recommended to users is
-associated with text. This includes research papers, movies with associated
-plot summaries, news articles, blog posts, etc.
Recommendation approaches based
-on latent factor models can be extended naturally to leverage text by employing
-an explicit mapping from text to factors. This enables recommendations for new,
-unseen content, and may generalize better, since the factors for all items are
-produced by a compactly-parametrized model. Previous work has used topic models
-or averages of word embeddings for this mapping. In this paper we present a
-method leveraging deep recurrent neural networks to encode the text sequence
-into a latent vector, specifically gated recurrent units (GRUs) trained
-end-to-end on the collaborative filtering task. For the task of scientific
-paper recommendation, this yields models with significantly higher accuracy. In
-cold-start scenarios, we beat the previous state-of-the-art, all of which
-ignore word order. Performance is further improved by multi-task learning,
-where the text encoder network is trained for a combination of content
-recommendation and item metadata prediction. This regularizes the collaborative
-filtering model, ameliorating the problem of sparsity of the observed rating
-matrix.
-"
-3413,1609.02549,"Junjie Hu, Jean Oh, Anatole Gershman",Learning Lexical Entries for Robotic Commands using Crowdsourcing,cs.CL," Robotic commands in natural language usually contain various spatial
-descriptions that are semantically similar but syntactically different. Mapping
-such syntactic variants into semantic concepts that can be understood by robots
-is challenging due to the high flexibility of natural language expressions. To
-tackle this problem, we collect robotic commands for navigation and
-manipulation tasks using crowdsourcing. We further define a robot language and
-use a generative machine translation model to translate robotic commands from
-natural language to robot language. The main purpose of this paper is to
-simulate the interaction process between humans and robots using crowdsourcing
-platforms, and investigate the possibility of translating natural language to
-robot language with paraphrases.
-"
-3414,1609.02727,"Vlad Sandulescu, Martin Ester",Detecting Singleton Review Spammers Using Semantic Similarity,cs.CL cs.LG," Online reviews have increasingly become a very important resource for
-consumers when making purchases. However, it is becoming more and more
-difficult for people to make well-informed buying decisions without being
-deceived by fake reviews. Prior works on the opinion spam problem mostly
-considered classifying fake reviews using behavioral user patterns. They
-focused on prolific users who write more than a couple of reviews, discarding
-one-time reviewers. The number of singleton reviewers, however, is expected to
-be high for many review websites. While behavioral patterns are effective when
-dealing with elite users, for one-time reviewers, the review text needs to be
-exploited. In this paper we tackle the problem of detecting fake reviews
-written by the same person using multiple names, posting each review under a
-different name. We propose two methods to detect similar reviews and show the
-results generally outperform the vectorial similarity measures used in prior
-works. The first method extends the semantic similarity between words to the
-reviews level. The second method is based on topic modeling and exploits the
-similarity of the reviews topic distributions using two models: bag-of-words
-and bag-of-opinion-phrases.
The experiments were conducted on reviews from three
-different datasets: Yelp (57K reviews), Trustpilot (9K reviews) and the Ott
-dataset (800 reviews).
-"
-3415,1609.02745,"Sebastian Ruder, Parsa Ghaffari, and John G. Breslin",A Hierarchical Model of Reviews for Aspect-based Sentiment Analysis,cs.CL cs.LG," Opinion mining from customer reviews has become pervasive in recent years.
-Sentences in reviews, however, are usually classified independently, even
-though they form part of a review's argumentative structure. Intuitively,
-sentences in a review build and elaborate upon each other; knowledge of the
-review structure and sentential context should thus inform the classification
-of each sentence. We demonstrate this hypothesis for the task of aspect-based
-sentiment analysis by modeling the interdependencies of sentences in a review
-with a hierarchical bidirectional LSTM. We show that the hierarchical model
-outperforms two non-hierarchical baselines, obtains results competitive with
-the state-of-the-art, and outperforms the state-of-the-art on five
-multilingual, multi-domain datasets without any hand-engineered features or
-external resources.
-"
-3416,1609.02746,"Sebastian Ruder, Parsa Ghaffari, and John G. Breslin","INSIGHT-1 at SemEval-2016 Task 4: Convolutional Neural Networks for
- Sentiment Classification and Quantification",cs.CL cs.LG," This paper describes our deep learning-based approach to sentiment analysis
-in Twitter as part of SemEval-2016 Task 4. We use a convolutional neural
-network to determine sentiment and participate in all subtasks, i.e. two-point,
-three-point, and five-point scale sentiment classification and two-point and
-five-point scale sentiment quantification. We achieve competitive results for
-two-point scale sentiment classification and quantification, ranking fifth and
-a close fourth (third and second by alternative metrics) respectively despite
-using only pre-trained embeddings that contain no sentiment information. We
-achieve good performance on three-point scale sentiment classification, ranking
-eighth out of 35, while performing poorly on five-point scale sentiment
-classification and quantification. An error analysis reveals that this is due
-to the model's limited ability to capture negative sentiment, as well as an
-inability to take ordinal information into account. We propose improvements in
-order to address these and other issues.
-"
-3417,1609.02748,"Sebastian Ruder, Parsa Ghaffari, and John G. Breslin","INSIGHT-1 at SemEval-2016 Task 5: Deep Learning for Multilingual
- Aspect-based Sentiment Analysis",cs.CL cs.LG," This paper describes our deep learning-based approach to multilingual
-aspect-based sentiment analysis as part of SemEval 2016 Task 5. We use a
-convolutional neural network (CNN) for both aspect extraction and aspect-based
-sentiment analysis. We cast aspect extraction as a multi-label classification
-problem, outputting probabilities over aspects parameterized by a threshold. To
-determine the sentiment towards an aspect, we concatenate an aspect vector with
-every word embedding and apply a convolution over it. Our constrained system
-(unconstrained for English) achieves competitive results across all languages
-and domains, placing first or second in 5 and 7 out of 11 language-domain pairs
-for aspect category detection (slot 1) and sentiment polarity (slot 3)
-respectively, thereby demonstrating the viability of a deep learning-based
-approach for multilingual aspect-based sentiment analysis.
-" -3418,1609.02809,"Alexei Bastidas, Edward Dixon, Chris Loo, John Ryan",Harassment detection: a benchmark on the #HackHarassment dataset,cs.CL," Online harassment has been a problem to a greater or lesser extent since the -early days of the internet. Previous work has applied anti-spam techniques like -machine-learning based text classification (Reynolds, 2011) to detecting -harassing messages. However, existing public datasets are limited in size, with -labels of varying quality. The #HackHarassment initiative (an alliance of 1 -tech companies and NGOs devoted to fighting bullying on the internet) has begun -to address this issue by creating a new dataset superior to its predecssors in -terms of both size and quality. As we (#HackHarassment) complete further rounds -of labelling, later iterations of this dataset will increase the available -samples by at least an order of magnitude, enabling corresponding improvements -in the quality of machine learning models for harassment detection. In this -paper, we introduce the first models built on the #HackHarassment dataset v1.0 -(a new open dataset, which we are delighted to share with any interested -researcherss) as a benchmark for future research. -" -3419,1609.02846,"Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Pei-Hao Su, - Stefan Ultes, David Vandyke, Tsung-Hsien Wen and Steve Young","Dialogue manager domain adaptation using Gaussian process reinforcement - learning",cs.CL," Spoken dialogue systems allow humans to interact with machines using natural -speech. As such, they have many benefits. By using speech as the primary -communication medium, a computer interface can facilitate swift, human-like -acquisition of information. In recent years, speech interfaces have become ever -more popular, as is evident from the rise of personal assistants such as Siri, -Google Now, Cortana and Amazon Alexa. Recently, data-driven machine learning -methods have been applied to dialogue modelling and the results achieved for -limited-domain applications are comparable to or outperform traditional -approaches. Methods based on Gaussian processes are particularly effective as -they enable good models to be estimated from limited training data. -Furthermore, they provide an explicit estimate of the uncertainty which is -particularly useful for reinforcement learning. This article explores the -additional steps that are necessary to extend these methods to model multiple -dialogue domains. We show that Gaussian process reinforcement learning is an -elegant framework that naturally supports a range of methods, including prior -knowledge, Bayesian committee machines and multi-agent learning, for -facilitating extensible and adaptable dialogue systems. -" -3420,1609.02960,"Salam Khalifa, Nizar Habash, Dana Abdulrahim, Sara Hassan",A Large Scale Corpus of Gulf Arabic,cs.CL," Most Arabic natural language processing tools and resources are developed to -serve Modern Standard Arabic (MSA), which is the official written language in -the Arab World. Some Dialectal Arabic varieties, notably Egyptian Arabic, have -received some attention lately and have a growing collection of resources that -include annotated corpora and morphological analyzers and taggers. Gulf Arabic, -however, lags behind in that respect. In this paper, we present the Gumar -Corpus, a large-scale corpus of Gulf Arabic consisting of 110 million words -from 1,200 forum novels. We annotate the corpus for sub-dialect information at -the document level. 
We also present results of a preliminary study in the
-morphological annotation of Gulf Arabic, which includes developing guidelines
-for a conventional orthography. The text of the corpus is publicly browsable
-through a web interface we developed for it.
-"
-3421,1609.03148,Diego Gabriel Krivochen,"Divide and...conquer? On the limits of algorithmic approaches to
- syntactic semantic structure",cs.CL cs.FL," In computer science, divide and conquer (D&C) is an algorithm design paradigm
-based on multi-branched recursion. A D&C algorithm works by recursively and
-monotonically breaking down a problem into subproblems of the same (or a
-related) type, until these become simple enough to be solved directly. The
-solutions to the subproblems are then combined to give a solution to the
-original problem. The present work identifies D&C algorithms assumed within
-contemporary syntactic theory, and discusses the limits of their applicability
-in the realms of the syntax-semantics and syntax-morphophonology interfaces. We
-will propose that D&C algorithms, while valid for some processes, fall short on
-flexibility given a mixed approach to the structure of linguistic phrase
-markers. Arguments in favour of a computationally mixed approach to linguistic
-structure will be presented as an alternative that offers advantages to uniform
-D&C approaches.
-"
-3422,1609.03193,"Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve",Wav2Letter: an End-to-End ConvNet-based Speech Recognition System,cs.LG cs.AI cs.CL," This paper presents a simple end-to-end model for speech recognition,
-combining a convolutional network based acoustic model and graph decoding. It
-is trained to output letters, with transcribed speech, without the need for
-forced alignment of phonemes. We introduce an automatic segmentation criterion
-for training from sequence annotation without alignment that is on par with CTC
-while being simpler. We show competitive results in word error rate on the
-Librispeech corpus with MFCC features, and promising results from raw waveform.
-"
-3423,1609.03204,"Ella Rabinovich, Sergiu Nisioi, Noam Ordan, Shuly Wintner","On the Similarities Between Native, Non-native and Translated Texts",cs.CL," We present a computational analysis of three language varieties: native,
-advanced non-native, and translation. Our goal is to investigate the
-similarities and differences between non-native language productions and
-translations, contrasting both with native language. Using a collection of
-computational methods we establish three main results: (1) the three types of
-texts are easily distinguishable; (2) non-native language and translations are
-closer to each other than each of them is to native language; and (3) some of
-these characteristics depend on the source or native language, while others do
-not, reflecting, perhaps, unified principles that similarly affect translations
-and non-native language.
-"
-3424,1609.03205,Ella Rabinovich and Shuly Wintner,Unsupervised Identification of Translationese,cs.CL," Translated texts are distinctively different from original ones, to the
-extent that supervised text classification methods can distinguish between them
-with high accuracy. These differences were proven useful for statistical
-machine translation. However, it has been suggested that the accuracy of
-translation detection deteriorates when the classifier is evaluated outside the
-domain it was trained on. We show that this is indeed the case, in a variety of
-evaluation scenarios.
We then show that unsupervised classification is highly
-accurate on this task. We suggest a method for determining the correct labels
-of the clustering outcomes, and then use the labels for voting, improving the
-accuracy even further. Moreover, we suggest a simple method for clustering in
-the challenging case of mixed-domain datasets, in spite of the dominance of
-domain-related features over translation-related ones. The result is an
-effective, fully-unsupervised method for distinguishing between original and
-translated texts that can be applied to new domains with reasonable accuracy.
-"
-3425,1609.03207,"Massimo Stella, Nicole M. Beckage and Markus Brede","Multiplex lexical networks reveal patterns in early word acquisition in
- children",physics.soc-ph cond-mat.dis-nn cs.CL cs.LG," Network models of language have provided a way of linking cognitive processes
-to the structure and connectivity of language. However, one shortcoming of
-current approaches is focusing on only one type of linguistic relationship at a
-time, missing the complex multi-relational nature of language. In this work, we
-overcome this limitation by modelling the mental lexicon of English-speaking
-toddlers as a multiplex lexical network, i.e. a multi-layered network where
-N=529 words/nodes are connected according to four types of relationships: (i)
-free associations, (ii) feature sharing, (iii) co-occurrence, and (iv)
-phonological similarity. We provide an analysis of the topology of the
-resulting multiplex and then proceed to evaluate single layers as well as the
-full multiplex structure on their ability to predict empirically observed
-age-of-acquisition data of English-speaking toddlers. We find that the emerging
-multiplex network topology is an important proxy of the cognitive processes of
-acquisition, capable of capturing emergent lexicon structure. In fact, we show
-that the multiplex topology is fundamentally more powerful than individual
-layers in predicting the ordering with which words are acquired. Furthermore,
-multiplex analysis allows for a quantification of distinct phases of lexical
-acquisition in early learners: while initially all the multiplex layers
-contribute to word learning, after about month 23 free associations take the
-lead in driving word acquisition.
-"
-3426,1609.03286,"Yun-Nung Chen, Dilek Hakkani-Tur, Gokhan Tur, Asli Celikyilmaz,
- Jianfeng Gao, Li Deng",Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,cs.AI cs.CL," Natural language understanding (NLU) is a core component of a spoken dialogue
-system. Recently, recurrent neural networks (RNN) obtained strong results on
-NLU due to their superior ability of preserving sequential information over
-time. Traditionally, the NLU module tags semantic slots for utterances
-considering their flat structures, as the underlying RNN structure is a linear
-chain. However, natural language exhibits linguistic properties that provide
-rich, structured information for better understanding. This paper introduces a
-novel model, knowledge-guided structural attention networks (K-SAN), a
-generalization of RNN to additionally incorporate non-flat network topologies
-guided by prior knowledge.
There are two characteristics: 1) important substructures can be
-captured from small training data, allowing the model to generalize to
-previously unseen test data; 2) the model automatically figures out the salient
-substructures that are essential to predict the semantic tags of the given
-sentences, so that the understanding performance can be improved. The
-experiments on the benchmark Air Travel Information System (ATIS) data show
-that the proposed K-SAN architecture can effectively extract salient knowledge
-from substructures with an attention mechanism, and outperforms
-state-of-the-art neural network based frameworks.
-"
-3427,1609.03357,Anna Jordanous and Bill Keller,"Modelling Creativity: Identifying Key Components through a Corpus-Based
- Approach",cs.CL cs.AI," Creativity is a complex, multi-faceted concept encompassing a variety of
-related aspects, abilities, properties and behaviours. If we wish to study
-creativity scientifically, then a tractable and well-articulated model of
-creativity is required. Such a model would be of great value to researchers
-investigating the nature of creativity and in particular, those concerned with
-the evaluation of creative practice. This paper describes a unique approach to
-developing a suitable model of how creative behaviour emerges that is based on
-the words people use to describe the concept. Using techniques from the field
-of statistical natural language processing, we identify a collection of
-fourteen key components of creativity through an analysis of a corpus of
-academic papers on the topic. Words are identified which appear significantly
-often in connection with discussions of the concept. Using a measure of lexical
-similarity to help cluster these words, a number of distinct themes emerge,
-which collectively contribute to a comprehensive and multi-perspective model of
-creativity. The components provide an ontology of creativity: a set of building
-blocks which can be used to model creative practice in a variety of domains.
-The components have been employed in two case studies to evaluate the
-creativity of computational systems and have proven useful in articulating
-achievements of this work and directions for further research.
-"
-3428,1609.03376,Ahmed El Kholy and Nizar Habash,"Morphological Constraints for Phrase Pivot Statistical Machine
- Translation",cs.CL," The lack of parallel data for many language pairs is an important challenge
-to statistical machine translation (SMT). One common solution is to pivot
-through a third language for which there exist parallel corpora with the source
-and target languages. Although pivoting is a robust technique, it introduces
-some low-quality translations, especially when a poor morphology language is
-used as the pivot between rich morphology languages. In this paper, we examine
-the use of synchronous morphology constraint features to improve the quality of
-phrase pivot SMT. We compare hand-crafted constraints to those learned from
-limited parallel data between source and target languages. The learned
-morphology constraints are based on projected alignments between the source
-and target phrases in the pivot phrase table. We show positive results on
-Hebrew-Arabic SMT (pivoting on English). We get 1.5 BLEU points over a phrase
-pivot baseline and 0.8 BLEU points over a system combination baseline with a
-direct model built from parallel data.
-" -3429,1609.03441,"Jan Chorowski, Micha{\l} Zapotoczny, Pawe{\l} Rychlikowski","Read, Tag, and Parse All at Once, or Fully-neural Dependency Parsing",cs.CL," We present a dependency parser implemented as a single deep neural network -that reads orthographic representations of words and directly generates -dependencies and their labels. Unlike typical approaches to parsing, the model -doesn't require part-of-speech (POS) tagging of the sentences. With proper -regularization and additional supervision achieved with multitask learning we -reach state-of-the-art performance on Slavic languages from the Universal -Dependencies treebank: with no linguistic features other than characters, our -parser is as accurate as a transition- based system trained on perfect POS -tags. -" -3430,1609.03528,"W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu - and G. Zweig",The Microsoft 2016 Conversational Speech Recognition System,cs.CL eess.AS," We describe Microsoft's conversational speech recognition system, in which we -combine recent developments in neural-network-based acoustic and language -modeling to advance the state of the art on the Switchboard recognition task. -Inspired by machine learning ensemble techniques, the system uses a range of -convolutional and recurrent neural networks. I-vector modeling and lattice-free -MMI training provide significant gains for all acoustic model architectures. -Language model rescoring with multiple forward and backward running RNNLMs, and -word posterior-based system combination provide a 20% boost. The best single -system uses a ResNet architecture acoustic model with RNNLM rescoring, and -achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The -combined system has an error rate of 6.2%, representing an improvement over -previously reported results on this benchmark task. -" -3431,1609.03632,Bishan Yang and Tom Mitchell,Joint Extraction of Events and Entities within a Document Context,cs.CL cs.AI," Events and entities are closely related; entities are often actors or -participants in events and events without entities are uncommon. The -interpretation of events and entities is highly contextually dependent. -Existing work in information extraction typically models events separately from -entities, and performs inference at the sentence level, ignoring the rest of -the document. In this paper, we propose a novel approach that models the -dependencies among variables of events, entities, and their relations, and -performs joint inference of these variables across a document. The goal is to -enable access to document-level contextual information and facilitate -context-aware predictions. We demonstrate that our approach substantially -outperforms the state-of-the-art methods for event extraction as well as a -strong baseline for entity extraction. -" -3432,1609.03663,"Tong Wang, Ping Chen, Kevin Amaral, Jipeng Qiang","An Experimental Study of LSTM Encoder-Decoder Model for Text - Simplification",cs.CL cs.LG," Text simplification (TS) aims to reduce the lexical and structural complexity -of a text, while still retaining the semantic meaning. Current automatic TS -techniques are limited to either lexical-level applications or manually -defining a large amount of rules. 
Since deep neural networks are powerful
-models that have achieved excellent performance on many difficult tasks, in
-this paper, we propose to use the Long Short-Term Memory (LSTM) Encoder-Decoder
-model for sentence-level TS, which makes minimal assumptions about word
-sequence. We conduct preliminary experiments to find that the model is able to
-learn operation rules such as reversing, sorting and replacing from sequence
-pairs, which shows that the model may potentially discover and apply rules such
-as modifying sentence structure, substituting words, and removing words for TS.
-"
-3433,1609.03777,"Kyuyeon Hwang, Wonyong Sung","Character-Level Language Modeling with Hierarchical Recurrent Neural
- Networks",cs.LG cs.CL cs.NE," Recurrent neural network (RNN) based character-level language models (CLMs)
-are extremely useful for modeling out-of-vocabulary words by nature. However,
-their performance is generally much worse than the word-level language models
-(WLMs), since CLMs need to consider a longer history of tokens to properly
-predict the next one. We address this problem by proposing hierarchical RNN
-architectures, which consist of multiple modules with different timescales.
-Despite the multi-timescale structures, the input and output layers operate
-with the character-level clock, which allows the existing RNN CLM training
-approaches to be directly applicable without any modifications. Our CLM models
-show better perplexity than Kneser-Ney (KN) 5-gram WLMs on the One Billion Word
-Benchmark with only 2% of parameters. Also, we present real-time
-character-level end-to-end speech recognition examples on the Wall Street
-Journal (WSJ) corpus, where replacing traditional mono-clock RNN CLMs with the
-proposed models results in better recognition accuracies even though the number
-of parameters is reduced to 30%.
-"
-3434,1609.03976,"Ozan Caglayan, Lo\""ic Barrault, Fethi Bougares",Multimodal Attention for Neural Machine Translation,cs.CL cs.NE," The attention mechanism is an important part of neural machine
-translation (NMT), where it was reported to produce a richer source
-representation compared to fixed-length encoding sequence-to-sequence models.
-Recently, the effectiveness of attention has also been explored in the context
-of image captioning. In this work, we assess the feasibility of a multimodal
-attention mechanism that simultaneously focuses on an image and its natural
-language description for generating a description in another language. We train
-several variants of our proposed attention mechanism on the Multi30k
-multilingual image captioning dataset. We show that a dedicated attention for
-each modality achieves up to 1.6 points in BLEU and METEOR compared to a
-textual NMT baseline.
-"
-3435,1609.04186,"Lemao Liu, Masao Utiyama, Andrew Finch and Eiichiro Sumita",Neural Machine Translation with Supervised Attention,cs.CL," The attention mechanism is appealing for neural machine translation, since
-it is able to dynamically encode a source sentence by generating an alignment
-between a target word and source words. Unfortunately, it has been proved to be
-worse than conventional alignment models in alignment accuracy. In this paper,
-we analyze and explain this issue from the point of view of reordering, and
-propose a supervised attention which is learned with guidance from conventional
-alignment models.
Experiments on two Chinese-to-English translation tasks show
-that the supervised attention mechanism yields better alignments, leading to
-substantial gains over the standard attention based NMT.
-"
-3436,1609.04253,Amir H. Jadidinejad,Neural Machine Transliteration: Preliminary Results,cs.CL," Machine transliteration is the process of automatically transforming the
-script of a word from a source language to a target language, while preserving
-pronunciation. Sequence to sequence learning has recently emerged as a new
-paradigm in supervised learning. In this paper, a character-based
-encoder-decoder model is proposed that consists of two Recurrent Neural
-Networks. The encoder is a Bidirectional recurrent neural network that encodes
-a sequence of symbols into a fixed-length vector representation, and the
-decoder generates the target sequence using an attention-based recurrent neural
-network. The encoder, the decoder and the attention mechanism are jointly
-trained to maximize the conditional probability of a target sequence given a
-source sequence. Our experiments on different datasets show that the proposed
-encoder-decoder model is able to achieve significantly higher transliteration
-quality than traditional statistical models.
-"
-3437,1609.04309,"Edouard Grave, Armand Joulin, Moustapha Ciss\'e, David Grangier,
- Herv\'e J\'egou",Efficient softmax approximation for GPUs,cs.CL cs.LG," We propose an approximate strategy to efficiently train neural network based
-language models over very large vocabularies. Our approach, called adaptive
-softmax, circumvents the linear dependency on the vocabulary size by exploiting
-the unbalanced word distribution to form clusters that explicitly minimize the
-expectation of computation time. Our approach further reduces the computational
-time by exploiting the specificities of modern architectures and matrix-matrix
-vector operations, making it particularly suited for graphical processing
-units. Our experiments, carried out on standard benchmarks such as EuroParl and
-One Billion Word, show that our approach brings a large gain in efficiency over
-standard approximations while achieving an accuracy close to that of the full
-softmax. The code of our method is available at
-https://github.com/facebookresearch/adaptive-softmax.
-"
-3438,1609.04325,"Stephen Mayhew, Christos Christodoulopoulos, Dan Roth",Transliteration in Any Language with Surrogate Languages,cs.CL," We introduce a method for transliteration generation that can produce
-transliterations in every language. Where previous results are only as
-multilingual as Wikipedia, we show how to use training data from Wikipedia as
-surrogate training for any language. Thus, the problem becomes one of ranking
-Wikipedia languages in order of suitability with respect to a target language.
-We introduce several task-specific methods for ranking languages, and show that
-our approach is comparable to the oracle ceiling, and even outperforms it in
-some cases.
-"
-3439,1609.04417,"Peng Dai, Xue Teng, Frank Rudzicz, Ing Yann Soon",An Adaptive Psychoacoustic Model for Automatic Speech Recognition,cs.CL cs.SD," Compared with automatic speech recognition (ASR), the human auditory system
-is more adept at handling noise-adverse situations, including environmental
-noise and channel distortion. To mimic this adeptness, auditory models have
-been widely incorporated in ASR systems to improve their robustness.
This paper
-proposes a novel auditory model which incorporates psychoacoustics and
-otoacoustic emissions (OAEs) into ASR. In particular, we successfully implement
-the frequency-dependent property of psychoacoustic models and effectively
-improve resulting system performance. We also present a novel double-transform
-spectrum-analysis technique, which can qualitatively predict ASR performance
-for different noise types. Detailed theoretical analysis is provided to show
-the effectiveness of the proposed algorithm. Experiments are carried out on the
-AURORA2 database and show that the word recognition rate using our proposed
-feature extraction method is significantly increased over the baseline. Given
-models trained with clean speech, our proposed method achieves up to 85.39%
-word recognition accuracy on noisy data.
-"
-3440,1609.04621,"Mercedes Garc\'ia-Mart\'inez, Lo\""ic Barrault and Fethi Bougares",Factored Neural Machine Translation,cs.CL," We present a new approach for neural machine translation (NMT) using the
-morphological and grammatical decomposition of the words (factors) in the
-output side of the neural network. This architecture addresses two main
-problems occurring in MT, namely dealing with a large target language
-vocabulary and the out of vocabulary (OOV) words. By means of factors, we
-are able to handle a larger vocabulary and reduce the training time (for
-systems with equivalent target language vocabulary size). In addition, we can
-produce new words that are not in the vocabulary. We use a morphological
-analyser to get a factored representation of each word (lemmas, Part of Speech
-tag, tense, person, gender and number). We have extended the NMT approach with
-an attention mechanism in order to have two different outputs, one for the
-lemmas and the other for the rest of the factors. The final translation is
-built using some \textit{a priori} linguistic information. We compare our
-extension with a word-based NMT system. The experiments, performed on the
-IWSLT'15 dataset translating from English to French, show that while the
-performance does not always increase, the system can manage a much larger
-vocabulary and consistently reduce the OOV rate. We observe up to 2% BLEU point
-improvement in a simulated out of domain translation setup.
-"
-3441,1609.04628,"Rocco Tripodi, Sebastiano Vascon, Marcello Pelillo",Context Aware Nonnegative Matrix Factorization Clustering,cs.CV cs.AI cs.CL cs.GT," In this article we propose a method to refine the clustering results obtained
-with the nonnegative matrix factorization (NMF) technique, imposing consistency
-constraints on the final labeling of the data. The research community focused
-its effort on the initialization and on the optimization part of this method,
-without paying attention to the final cluster assignments. We propose a game
-theoretic framework in which each object to be clustered is represented as a
-player, which has to choose its cluster membership. The information obtained
-with NMF is used to initialize the strategy space of the players and a weighted
-graph is used to model the interactions among the players. These interactions
-allow the players to choose a cluster which is coherent with the clusters
-chosen by similar players, a property which is not guaranteed by NMF, since it
-produces a soft clustering of the data. The results on common benchmarks show
-that our model is able to improve the performance of many NMF formulations.
-" -3442,1609.04779,Trang Tran and Mari Ostendorf,"Characterizing the Language of Online Communities and its Relation to - Community Reception",cs.CL," This work investigates style and topic aspects of language in online -communities: looking at both utility as an identifier of the community and -correlation with community reception of content. Style is characterized using a -hybrid word and part-of-speech tag n-gram language model, while topic is -represented using Latent Dirichlet Allocation. Experiments with several Reddit -forums show that style is a better indicator of community identity than topic, -even for communities organized around specific topics. Further, there is a -positive correlation between the community reception to a contribution and the -style similarity to that community, but not so for topic similarity. -" -3443,1609.04873,Chris Quirk and Hoifung Poon,Distant Supervision for Relation Extraction beyond the Sentence Boundary,cs.CL," The growing demand for structured knowledge has led to great interest in -relation extraction, especially in cases with limited supervision. However, -existing distance supervision approaches only extract relations expressed in -single sentences. In general, cross-sentence relation extraction is -under-explored, even in the supervised-learning setting. In this paper, we -propose the first approach for applying distant supervision to cross- sentence -relation extraction. At the core of our approach is a graph representation that -can incorporate both standard dependencies and discourse relations, thus -providing a unifying way to model relations within and across sentences. We -extract features from multiple paths in this graph, increasing accuracy and -robustness when confronted with linguistic variation and analysis error. -Experiments on an important extraction task for precision medicine show that -our approach can learn an accurate cross-sentence extractor, using only a small -existing knowledge base and unlabeled text from biomedical research articles. -Compared to the existing distant supervision paradigm, our approach extracted -twice as many relations at similar precision, thus demonstrating the prevalence -of cross-sentence relations and the promise of our approach. -" -3444,1609.04904,Ethan Fast and Eric Horvitz,Long-Term Trends in the Public Perception of Artificial Intelligence,cs.CL cs.AI cs.CY," Analyses of text corpora over time can reveal trends in beliefs, interest, -and sentiment about a topic. We focus on views expressed about artificial -intelligence (AI) in the New York Times over a 30-year period. General -interest, awareness, and discussion about AI has waxed and waned since the -field was founded in 1956. We present a set of measures that captures levels of -engagement, measures of pessimism and optimism, the prevalence of specific -hopes and concerns, and topics that are linked to discussions about AI over -decades. We find that discussion of AI has increased sharply since 2009, and -that these discussions have been consistently more optimistic than pessimistic. -However, when we examine specific concerns, we find that worries of loss of -control of AI, ethical concerns for AI, and the negative impact of AI on work -have grown in recent years. We also find that hopes for AI in healthcare and -education have increased over time. -" -3445,1609.04909,"Shourya Roy, Himanshu S. Bhatt, Y. 
Narahari","An Iterative Transfer Learning Based Ensemble Technique for Automatic
- Short Answer Grading",cs.CL," Automatic short answer grading (ASAG) techniques are designed to
-automatically assess short answers to questions in natural language, having a
-length of a few words to a few sentences. Supervised ASAG techniques have been
-demonstrated to be effective but suffer from a couple of key practical
-limitations. They are greatly reliant on instructor-provided model answers and
-need labeled training data in the form of graded student answers for every
-assessment task. To overcome these, in this paper, we introduce an ASAG
-technique with two novel features. First, we propose an iterative technique on
-an ensemble of (a) a text classifier of student answers and (b) a classifier
-using numeric features derived from various similarity measures with respect to
-model answers. Second, we employ canonical correlation analysis based transfer
-learning on a common feature representation to build the classifier ensemble
-for questions having no labeled data. The proposed technique handsomely beats
-all winning supervised entries on the SCIENTSBANK dataset from the Student
-Response Analysis task of SemEval 2013. Additionally, we demonstrate
-generalizability and benefits of the proposed technique through evaluation on
-multiple ASAG datasets from different subject topics and standards.
-"
-3446,1609.04938,"Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush",Image-to-Markup Generation with Coarse-to-Fine Attention,cs.CV cs.CL cs.LG cs.NE," We present a neural encoder-decoder model to convert images into
-presentational markup based on a scalable coarse-to-fine attention mechanism.
-Our method is evaluated in the context of image-to-LaTeX generation, and we
-introduce a new dataset of real-world rendered mathematical expressions paired
-with LaTeX markup. We show that unlike neural OCR techniques using CTC-based
-models, attention-based approaches can tackle this non-standard OCR task. Our
-approach outperforms classical mathematical OCR systems by a large margin on
-in-domain rendered data, and, with pretraining, also performs well on
-out-of-domain handwritten data. To reduce the inference complexity associated
-with the attention-based approaches, we introduce a new coarse-to-fine
-attention layer that selects a support region before applying attention.
-"
-3447,1609.05104,T.V. Ananthapadmanabha and A.G. Ramakrishnan,"Intrinsic normalization and extrinsic denormalization of formant data of
- vowels",cs.SD cs.CL," Using a known speaker-intrinsic normalization procedure, formant data are
-scaled by the reciprocal of the geometric mean of the first three formant
-frequencies. This reduces the influence of the talker but results in a
-distorted vowel space. The proposed speaker-extrinsic procedure re-scales the
-normalized values by the mean formant values of vowels. When tested on the
-formant data of vowels published by Peterson and Barney, the combined approach
-leads to well separated clusters by reducing the spread due to talkers. The
-proposed procedure performs better than two top-ranked normalization procedures
-based on the accuracy of vowel classification as the objective measure.
-"
-3448,1609.05180,"Shuhan Wang, Erik Andersen","Grammatical Templates: Improving Text Difficulty Evaluation for Language
- Learners",cs.CL cs.AI," Language students are most engaged while reading texts at an appropriate
-difficulty level.
However, existing methods of evaluating text difficulty focus
-mainly on vocabulary and do not prioritize grammatical features, hence they do
-not work well for language learners with limited knowledge of grammar. In this
-paper, we introduce grammatical templates, the expert-identified units of
-grammar that students learn from class, as an important feature of text
-difficulty evaluation. Experimental classification results show that
-grammatical template features significantly improve text difficulty prediction
-accuracy over baseline readability features by 7.4%. Moreover, we build a
-simple and human-understandable text difficulty evaluation approach with 87.7%
-accuracy, using only 5 grammatical template features.
-"
-3449,1609.05234,"Yen-Chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-Yi Lee, Lin-Shan Lee",Interactive Spoken Content Retrieval by Deep Reinforcement Learning,cs.CL cs.IR," User-machine interaction is important for spoken content retrieval. For text
-content retrieval, the user can easily scan through and select from a list of
-retrieved items. This is impossible for spoken content retrieval, because the
-retrieved items are difficult to show on screen. Besides, due to the high
-degree of uncertainty for speech recognition, the retrieval results can be very
-noisy. One way to counter such difficulties is through user-machine
-interaction. The machine can take different actions to interact with the user
-to obtain better retrieval results before showing them to the user. The
-suitable actions depend on the retrieval status, for example requesting extra
-information from the user, returning a list of topics for the user to select,
-etc. In our previous work, some hand-crafted states estimated from the present
-retrieval results are used to determine the proper actions. In this paper, we
-propose to use Deep-Q-Learning techniques instead to determine the machine
-actions for interactive spoken content retrieval. Deep-Q-Learning bypasses the
-need for estimation of the hand-crafted states, and directly determines the
-best action based on the present retrieval status even without any human
-knowledge. It is shown to achieve significantly better performance compared
-with the previous hand-crafted states.
-"
-3450,1609.05244,"Haohan Wang, Aaksha Meghawat, Louis-Philippe Morency and Eric P. Xing","Select-Additive Learning: Improving Generalization in Multimodal
- Sentiment Analysis",cs.CL cs.IR," Multimodal sentiment analysis is drawing an increasing amount of attention
-these days. It enables mining of opinions in video reviews which are now
-available aplenty on online platforms. However, multimodal sentiment analysis
-has only a few high-quality data sets annotated for training machine learning
-algorithms. These limited resources restrict the generalizability of models,
-where, for example, the unique characteristics of a few speakers (e.g., wearing
-glasses) may become a confounding factor for the sentiment classification task.
-In this paper, we propose a Select-Additive Learning (SAL) procedure that
-improves the generalizability of trained neural networks for multimodal
-sentiment analysis. In our experiments, we show that our SAL approach improves
-prediction accuracy significantly in all three modalities (verbal, acoustic,
-visual), as well as in their fusion. Our results show that SAL, even when
-trained on one dataset, achieves good generalization across two new test
-datasets.
-" -3451,1609.05511,Dafydd Gibbon and Sascha Griffiths,Multilinear Grammar: Ranks and Interpretations,cs.CL," Multilinear Grammar provides a framework for integrating the many different -syntagmatic structures of language into a coherent semiotically based Rank -Interpretation Architecture, with default linear grammars at each rank. The -architecture defines a Sui Generis Condition on ranks, from discourse through -utterance and phrasal structures to the word, with its sub-ranks of morphology -and phonology. Each rank has unique structures and its own semantic-pragmatic -and prosodic-phonetic interpretation models. Default computational models for -each rank are proposed, based on a Procedural Plausibility Condition: -incremental processing in linear time with finite working memory. We suggest -that the Rank Interpretation Architecture and its multilinear properties -provide systematic design features of human languages, contrasting with -unordered lists of key properties or single structural properties at one rank, -such as recursion, which have previously been been put forward as language -design features. The framework provides a realistic background for the gradual -development of complexity in the phylogeny and ontogeny of language, and -clarifies a range of challenges for the evaluation of realistic linguistic -theories and applications. The empirical objective of the paper is to -demonstrate unique multilinear properties at each rank and thereby motivate the -Multilinear Grammar and Rank Interpretation Architecture framework as a -coherent approach to capturing the complexity of human languages in the -simplest possible way. -" -3452,1609.05600,"Damien Teney, Lingqiao Liu, Anton van den Hengel",Graph-Structured Representations for Visual Question Answering,cs.CV cs.AI cs.CL," This paper proposes to improve visual question answering (VQA) with -structured representations of both scene contents and questions. A key -challenge in VQA is to require joint reasoning over the visual and text -domains. The predominant CNN/LSTM-based approach to VQA is limited by -monolithic vector representations that largely ignore structure in the scene -and in the form of the question. CNN feature vectors cannot effectively capture -situations as simple as multiple object instances, and LSTMs process questions -as series of words, which does not reflect the true complexity of language -structure. We instead propose to build graphs over the scene objects and over -the question words, and we describe a deep neural network that exploits the -structure in these representations. This shows significant benefit over the -sequential processing of LSTMs. The overall efficacy of our approach is -demonstrated by significant improvements over the state-of-the-art, from 71.2% -to 74.4% in accuracy on the ""abstract scenes"" multiple-choice benchmark, and -from 34.7% to 39.1% in accuracy over pairs of ""balanced"" scenes, i.e. images -with fine-grained differences and opposite yes/no answers to a same question. -" -3453,1609.05625,"Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, - Steve Renals, Yifan Zhang",The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition,cs.CL," This paper describes the Arabic Multi-Genre Broadcast (MGB-2) Challenge for -SLT-2016. Unlike last year's English MGB Challenge, which focused on -recognition of diverse TV genres, this year, the challenge has an emphasis on -handling the diversity in dialect in Arabic speech. 
Audio data comes from 19
-distinct programmes from the Aljazeera Arabic TV channel between March 2005 and
-December 2015. Programmes are split into three groups: conversations,
-interviews, and reports. A total of 1,200 hours have been released with lightly
-supervised transcriptions for the acoustic modelling. For language modelling,
-we made available over 110M words crawled from the Aljazeera Arabic website
-Aljazeera.net over a 10-year period (2000-2011). Two lexicons have been
-provided, one phoneme-based and one grapheme-based. Finally, two tasks were
-proposed for this year's challenge: standard speech transcription, and word
-alignment. This paper describes the task data and evaluation process used in
-the MGB challenge, and summarises the results obtained.
-"
-3454,1609.05650,"Sameer Khurana, Ahmed Ali, Steve Renals","Multi-view Dimensionality Reduction for Dialect Identification of Arabic
- Broadcast Speech",cs.CL," In this work, we present a new Vector Space Model (VSM) of speech utterances
-for the task of spoken dialect identification. Generally, DID systems are built
-using two sets of features that are extracted from speech utterances; acoustic
-and phonetic. The acoustic and phonetic features are used to form vector
-representations of speech utterances in an attempt to encode information about
-the spoken dialects. The Phonotactic and Acoustic VSMs, thus formed, are used
-for the task of DID. The aim of this paper is to construct a single VSM that
-encodes information about spoken dialects from both the Phonotactic and
-Acoustic VSMs. Given the two views of the data, we make use of a well-known
-multi-view dimensionality reduction technique known as Canonical Correlation
-Analysis (CCA), to form a single vector representation for each speech
-utterance that encodes dialect specific discriminative information from both
-the phonetic and acoustic representations. We refer to this approach as feature
-space combination approach and show that our CCA based feature vector
-representation performs better on the Arabic DID task than the phonetic and
-acoustic feature representations used alone. We also present the feature space
-combination approach as a viable alternative to the model based combination
-approach, where two DID systems are built using the two VSMs (Phonotactic and
-Acoustic) and the final prediction score is the output score combination from
-the two systems.
-"
-3455,1609.05935,"G. Zweig, C. Yu, J. Droppo and A. Stolcke",Advances in All-Neural Speech Recognition,cs.CL," This paper advances the design of CTC-based all-neural (or end-to-end) speech
-recognizers. We propose a novel symbol inventory, and a novel iterated-CTC
-method in which a second system is used to transform a noisy initial output
-into a cleaner version. We present a number of stabilization and initialization
-methods we have found useful in training these networks. We evaluate our system
-on the commonly used NIST 2000 conversational telephony test set, and
-significantly exceed the previously published performance of similar systems,
-both with and without the use of an external language model and decoding
-technology.
-"
-3456,1609.06038,"Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, Diana Inkpen",Enhanced LSTM for Natural Language Inference,cs.CL," Reasoning and inference are central to human and artificial intelligence.
-Modeling inference in human language is very challenging.
With the availability
-of large annotated data (Bowman et al., 2015), it has recently become feasible
-to train neural network based inference models, which have been shown to be
-very effective. In this paper, we present a new state-of-the-art result,
-achieving an accuracy of 88.6% on the Stanford Natural Language Inference
-Dataset. Unlike the previous top models that use very complicated network
-architectures, we first demonstrate that carefully designing sequential
-inference models based on chain LSTMs can outperform all previous models. Based
-on this, we further show that by explicitly considering recursive architectures
-in both local inference modeling and inference composition, we achieve
-additional improvement. Particularly, incorporating syntactic parsing
-information contributes to our best result---it further improves the
-performance even when added to the already very strong model.
-"
-3457,1609.06049,"Ngoc-Tien Le, Benjamin Lecouteux, Laurent Besacier","Automatic Quality Assessment for Speech Translation Using Joint ASR and
- MT Features",cs.CL," This paper addresses automatic quality assessment of spoken language
-translation (SLT). This relatively new task is defined and formalized as a
-sequence labeling problem where each word in the SLT hypothesis is tagged as
-good or bad according to a large feature set. We propose several word
-confidence estimators (WCE) based on our automatic evaluation of transcription
-(ASR) quality, translation (MT) quality, or both (combined ASR+MT). This
-research work is possible because we built a specific corpus of 6.7k
-utterances, for each of which a quintuplet is built containing: ASR output,
-verbatim transcript, text translation, speech translation and a post-edition of
-the translation. The conclusion of our multiple experiments using joint
-ASR and MT features for WCE is that MT features remain the most influential
-while ASR features can bring interesting complementary information. Our robust
-quality estimators for SLT can be used for re-scoring speech translation graphs
-or for providing feedback to the user in interactive speech translation or
-computer-assisted speech-to-text scenarios.
-"
-3458,1609.06082,Yitong Li and Trevor Cohn and Timothy Baldwin,Learning Robust Representations of Text,cs.CL," Deep neural networks have achieved remarkable results across many language
-processing tasks; however, these methods are highly sensitive to noise and
-adversarial attacks. We present a regularization based method for limiting
-network sensitivity to its inputs, inspired by ideas from computer vision, thus
-learning models that are more robust. Empirical evaluation over a range of
-sentiment datasets with a convolutional neural network shows that, compared to
-a baseline model and the dropout method, our method achieves superior
-performance over noisy inputs and out-of-domain data.
-"
-3459,1609.06127,Diana Jlailaty and Daniela Grigori and Khalid Belhajjame,A framework for mining process models from emails logs,cs.CL cs.LG," Due to its wide use in personal, but most importantly, professional contexts,
-email represents a valuable source of information that can be harvested for
-understanding, reengineering and repurposing undocumented business processes of
-companies and institutions. Towards this aim, a few researchers investigated
-the problem of extracting process-oriented information from email logs in order
-to take advantage of the many available process mining techniques and tools.
In this paper we go further in this direction by proposing a new method for
-mining process models from email logs that leverages unsupervised machine
-learning techniques with little human involvement. Moreover, our method allows
-us to semi-automatically label emails with activity names, which can be used
-for activity recognition in new incoming emails. A use case demonstrates the
-usefulness of the proposed solution using a modest-sized, yet real-world,
-dataset containing emails that belong to two different process models.
-"
-3460,1609.06204,Alessio Palmero Aprosio and Giovanni Moretti,Italy goes to Stanford: a collection of CoreNLP modules for Italian,cs.CL," In this paper we present Tint, an easy-to-use set of fast, accurate and
-extendable Natural Language Processing modules for Italian. It is based on
-Stanford CoreNLP and is freely available as standalone software or as a
-library that can be integrated into an existing project.
-"
-3461,1609.06239,John Beieler,Generating Politically-Relevant Event Data,cs.CL," Automatically generated political event data is an important part of the
-social science data ecosystem. The approaches for generating this data,
-though, have remained largely the same for two decades. During this time, the
-field of computational linguistics has progressed tremendously. This paper
-presents an overview of political event data, including methods and
-ontologies, and a set of experiments to determine the applicability of deep
-neural networks to the extraction of political events from news text.
-"
-3462,1609.06268,"Yun Zhu, Faizan Javed, Ozgur Ozturk",Semantic Similarity Strategies for Job Title Classification,cs.AI cs.CL," Automatic and accurate classification of items enables numerous downstream
-applications in many domains. These applications can range from faceted
-browsing of items to product recommendations and big data analytics. In the
-online recruitment domain, we refer to classifying job ads into pre-defined or
-custom occupation categories as job title classification. A large-scale job
-title classification system can power various downstream applications such as
-semantic search, job recommendations and labor market analytics. In this
-paper, we discuss experiments conducted to improve our in-house job title
-classification system. The classification component of the system is composed
-of a two-stage coarse and fine level classifier cascade that classifies input
-text such as job titles and/or job ads to one of the thousands of job titles
-in our taxonomy. To improve classification accuracy and effectiveness, we
-experiment with various semantic representation strategies such as average W2V
-vectors and document similarity measures such as Word Mover's Distance (WMD).
-Our initial results show an overall improvement in the accuracy of
-Carotene [1].
-"
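As a rough illustration of the Word Mover's Distance strategy mentioned in
record 3462 above, gensim exposes WMD over pre-trained word vectors (it also
needs an optimal-transport backend such as POT installed). The vector file
name and the example job titles below are assumptions for the sketch, not
artifacts from the paper.

```python
from gensim.models import KeyedVectors

# Assumed local copy of standard pre-trained word2vec vectors.
wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

title_a = "senior software engineer".split()
title_b = "lead developer".split()

# Word Mover's Distance: smaller values mean semantically closer titles.
print(wv.wmdistance(title_a, title_b))
```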
3463,1609.06380,"Yang Liu and Sujian Li","Recognizing Implicit Discourse Relations via Repeated Reading: Neural
- Networks with Multi-Level Attention",cs.CL cs.AI," Recognizing implicit discourse relations is a challenging but important task
-in the field of Natural Language Processing. For such a complex text
-processing task, and in contrast to previous studies, we argue that it is
-necessary to repeatedly read the arguments and dynamically exploit the
-features useful for recognizing discourse relations. To mimic the repeated
-reading strategy, we propose neural networks with multi-level attention
-(NNMA), combining the attention mechanism and external memories to gradually
-focus attention on the specific words helpful for judging the discourse
-relations. Experiments on the PDTB dataset show that our proposed method
-achieves state-of-the-art results. Visualizing the attention weights also
-illustrates how our model examines the arguments at each level and
-progressively locates the important words.
-"
-3464,1609.06404,"Suwon Shon, Seongkyu Mun, John H.L. Hansen, Hanseok Ko","KU-ISPL Language Recognition System for NIST 2015 i-Vector Machine
- Learning Challenge",cs.SD cs.CL," In language recognition, the task of rejecting/differentiating closely
-spaced versus acoustically far-spaced languages remains a major challenge. For
-confusable closely spaced languages, the system needs input test material of
-longer duration to obtain sufficient information to distinguish between
-languages. Alternatively, if languages are distinct and not
-acoustically/linguistically similar to others, duration is not a sufficient
-remedy. The solution proposed here is to explore duration distribution
-analysis for near/far languages based on the Language Recognition i-Vector
-Machine Learning Challenge 2015 (LRiMLC15) database. Using this knowledge, we
-propose a likelihood ratio-based fusion approach that leverages both score and
-duration information. The experimental results show that the use of duration
-and score fusion improves language recognition performance by a relative 5% in
-the LRiMLC15 cost.
-"
-3465,1609.06490,"Xiaoqing Li, Jiajun Zhang and Chengqing Zong",One Sentence One Model for Neural Machine Translation,cs.CL," Neural machine translation (NMT) has become the new state of the art,
-achieving promising translation results with a simple encoder-decoder neural
-network. This neural network is trained once on the parallel corpus, and the
-fixed network is then used to translate all the test sentences. We argue that
-the general fixed network cannot best fit every specific test sentence. In
-this paper, we propose dynamic NMT, which learns a general network as usual
-and then fine-tunes the network for each test sentence. Fine-tuning is
-performed on a small set of bilingual training data obtained through a
-similarity search based on the test sentence. Extensive experiments
-demonstrate that this method can significantly improve the translation
-performance, especially when highly similar sentences are available.
-"
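The similarity-search step behind the per-sentence fine-tuning in record 3465
above might look like the following sketch. TF-IDF cosine similarity is only a
stand-in for whatever retrieval the authors used, and the toy corpus, k value,
and variable names are invented; the fine-tuning of the NMT model itself is
only indicated in comments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy source side of a bilingual training corpus (invented examples).
train_src = ["the cat sat on the mat",
             "stock prices fell sharply",
             "the market rallied late in the day"]
test_sentence = "prices fell again today"

vec = TfidfVectorizer().fit(train_src + [test_sentence])
sims = cosine_similarity(vec.transform([test_sentence]),
                         vec.transform(train_src))[0]
top_k = sims.argsort()[::-1][:2]  # indices of the most similar sentence pairs

# A dynamic-NMT system would now fine-tune the general model on the retrieved
# sentence pairs, translate test_sentence, then restore the general weights.
print([train_src[i] for i in top_k])
```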
3466,1609.06492,"Darko Brodic, Alessia Amelio, Zoran N. Milivojevic, Milena Jevtic",Document Image Coding and Clustering for Script Discrimination,cs.CV cs.AI cs.CL cs.LG cs.NE," The paper introduces a new method for discrimination of documents given in
-different scripts. The document is mapped into a uniformly coded text of
-numerical values, derived from the positions of the letters in the text line
-and based on their typographical characteristics. Each code is considered as a
-gray level. Accordingly, the coded text determines a 1-D image, on which
-texture analysis by run-length statistics and local binary patterns is
-performed. This defines feature vectors representing the script content of the
-document. A modified clustering approach applied to the document feature
-vectors groups documents written in the same script. Experiments performed on
-two custom-built databases of historical documents in old Cyrillic, angular
-and round Glagolitic, as well as Antiqua and Fraktur scripts demonstrate the
-superiority of the proposed method with respect to well-known methods in the
-state of the art.
-"
-3467,1609.06530,"Sameer Bansal, Herman Kamper, Sharon Goldwater, Adam Lopez","Weakly supervised spoken term discovery using cross-lingual side
- information",cs.CL," Recent work on unsupervised term discovery (UTD) aims to identify and
-cluster repeated word-like units from audio alone. These systems are promising
-for some very low-resource languages where transcribed audio is unavailable,
-or where no written form of the language exists. However, in some cases it may
-still be feasible (e.g., through crowdsourcing) to obtain (possibly noisy)
-text translations of the audio. If so, this information could be used as a
-source of side information to improve UTD. Here, we present a simple method
-for rescoring the output of a UTD system using text translations, and test it
-on a corpus of Spanish audio with English translations. We show that it
-greatly improves the average precision of the results over a wide range of
-system configurations and data preprocessing methods.
-"
-3468,1609.06577,"Fabio Del Vigna, Marinella Petrocchi, Alessandro Tommasi, Cesare
- Zavattari, Maurizio Tesconi","Semi-supervised knowledge extraction for detection of drugs and their
- effects",cs.CL," New Psychoactive Substances (NPS) are drugs that lie in a grey area of
-legislation: they are not internationally and officially banned, which can
-make their trade impossible to prosecute. The phenomenon is exacerbated by the
-fact that NPS can easily be sold and bought online. Here, we consider large
-corpora of textual posts published on online forums specialized in drug
-discussions, plus a small set of known substances and associated effects,
-which we call seeds. We propose a semi-supervised approach to knowledge
-extraction, applied to the detection of drugs (comprising NPS) and effects
-from the corpora under investigation. Based on the very small set of initial
-seeds, the work highlights how a contrastive approach and context deduction
-are effective in detecting substances and effects from the corpora. Our
-promising results, which feature an F1 score close to 0.9, pave the way for
-shortening the detection time of new psychoactive substances, once these are
-discussed and advertised on the Internet.
-"
-3469,1609.06578,"Kar Wai Lim, Wray Buntine","Twitter Opinion Topic Model: Extracting Product Opinions from Tweets by
- Leveraging Hashtags and Sentiment Lexicon",cs.CL cs.IR cs.LG," Aspect-based opinion mining is widely applied to review data to aggregate or
-summarize opinions of a product, and the current state of the art is achieved
-with Latent Dirichlet Allocation (LDA)-based models. Although social media
-data like tweets are laden with opinions, their ""dirty"" nature (as natural
-language) has discouraged researchers from applying LDA-based opinion models
-to product review mining. Tweets are often informal, unstructured and lacking
-labeled data such as categories and ratings, which makes product opinion
-mining challenging. In this paper, we propose an LDA-based opinion model named
-the Twitter Opinion Topic Model (TOTM) for opinion mining and sentiment
-analysis. TOTM leverages hashtags, mentions, emoticons and strong sentiment
-words that are present in tweets in its discovery process.
It improves opinion prediction by modeling the target-opinion interaction
-directly, thus discovering target-specific opinion words, which are neglected
-in existing approaches. Moreover, we propose a new formulation for
-incorporating sentiment prior information into a topic model by utilizing an
-existing public sentiment lexicon. This formulation is novel in that it learns
-from and updates with the data. We conduct experiments on 9 million tweets on
-electronic products, and demonstrate the improved performance of TOTM in both
-quantitative evaluations and qualitative analysis. We show that aspect-based
-opinion analysis on a massive volume of tweets provides useful opinions on
-products.
-"
-3470,1609.06616,John J. Nay,"Gov2Vec: Learning Distributed Representations of Institutions and Their
- Legal Text",cs.CL cs.IR cs.NE cs.SI," We compare policy differences across institutions by embedding
-representations of the entire legal corpus of each institution and the
-vocabulary shared across all corpora into a continuous vector space. We apply
-our method, Gov2Vec, to Supreme Court opinions, Presidential actions, and
-official summaries of Congressional bills. The model discerns meaningful
-differences between government branches. We also learn representations for
-more fine-grained word sources: individual Presidents and (2-year) Congresses.
-The similarities between learned representations of Congresses over time and
-sitting Presidents are negatively correlated with the bill veto rate, and the
-temporal ordering of Presidents and Congresses was implicitly learned from
-text alone. With the resulting vectors we answer questions such as: how do
-Obama and the 113th House differ in addressing climate change, and how does
-this vary between environmental and economic perspectives? Our work
-illustrates vector-arithmetic-based investigations of complex relationships
-between word sources based on their texts. We are extending this to create a
-more comprehensive legal semantic map.
-"
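Queries of the kind record 3470 above describes reduce to vector arithmetic
once institution vectors and word vectors share one space. A minimal sketch
with gensim follows; the vector file name and the source tokens ("Obama",
"113th_House") are assumptions about how such a model might be saved, not
artifacts of the paper.

```python
from gensim.models import KeyedVectors

# Assumed: jointly trained word and word-source vectors saved in the standard
# word2vec text format under this (hypothetical) file name.
vecs = KeyedVectors.load_word2vec_format("gov2vec_vectors.txt")

# "climate" shifted toward one institution and away from another: words close
# to this point suggest how the two sources differ on the topic.
print(vecs.most_similar(positive=["climate", "Obama"],
                        negative=["113th_House"], topn=5))
```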
3471,1609.06649,"Ke Wu, Kyle Gorman, and Richard Sproat",Minimally Supervised Written-to-Spoken Text Normalization,cs.CL," In speech applications such as text-to-speech (TTS) or automatic speech
-recognition (ASR), \emph{text normalization} refers to the task of converting
-from a \emph{written} representation into a representation of how the text is
-to be \emph{spoken}. In all real-world speech applications, the text
-normalization engine is developed---in large part---by hand. For example, a
-hand-built grammar may be used to enumerate the possible ways of saying a
-given token in a given language, and a statistical model used to select the
-most appropriate pronunciation in context. In this study we examine the
-tradeoffs associated with using more or less language-specific domain
-knowledge in a text normalization engine. In the most data-rich scenario, we
-have access to a carefully constructed hand-built normalization grammar that
-for any given token will produce the set of all possible verbalizations for
-that token. We also assume a corpus of aligned written-spoken utterances, from
-which we can train a ranking model that selects the appropriate verbalization
-for the given context. As a substitute for the carefully constructed grammar,
-we also consider a scenario with a language-universal normalization
-\emph{covering grammar}, where the developer merely needs to provide a set of
-lexical items particular to the language. As a substitute for the aligned
-corpus, we also consider a scenario where one only has the spoken side, and
-the corresponding written side is ""hallucinated"" by composing the spoken side
-with the inverted normalization grammar. We investigate the accuracy of a text
-normalization engine under each of these scenarios. We report the results of
-experiments on English and Russian.
-"
-3472,1609.06657,"Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada","The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question
- Answering (FSVQA)",cs.CV cs.CL," The Visual Question Answering (VQA) task has showcased a new stage of
-interaction between language and vision, two of the most pivotal components of
-artificial intelligence. However, it has mostly focused on generating short
-and repetitive answers, mostly single words, which fall short of the rich
-linguistic capabilities of humans. We introduce the Full-Sentence Visual
-Question Answering (FSVQA) dataset, consisting of nearly 1 million pairs of
-questions and full-sentence answers for images, built by applying a number of
-rule-based natural language processing techniques to the original VQA dataset
-and captions in the MS COCO dataset. This poses many additional complexities
-compared to the conventional VQA task, and we provide a baseline for
-approaching and evaluating the task, on top of which we invite the research
-community to build further improvements.
-"
-3473,1609.06686,"Sebastian Ruder, Parsa Ghaffari, John G. Breslin","Character-level and Multi-channel Convolutional Neural Networks for
- Large-scale Authorship Attribution",cs.CL cs.LG," Convolutional neural networks (CNNs) have demonstrated superior capability
-for extracting information from raw signals in computer vision. Recently,
-character-level and multi-channel CNNs have exhibited excellent performance
-for sentence classification tasks. We apply CNNs to large-scale authorship
-attribution, which aims to determine an unknown text's author among many
-candidate authors, motivated by their ability to process character-level
-signals and to differentiate between a large number of classes, while making
-fast predictions in comparison to state-of-the-art approaches. We extensively
-evaluate CNN-based approaches that leverage word and character channels and
-compare them against state-of-the-art methods for a large range of author
-numbers, shedding new light on traditional approaches. We show that
-character-level CNNs outperform the state of the art on four out of five
-datasets in different domains. Additionally, we present the first application
-of authorship attribution to reddit.
-"
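A character-level CNN of the general kind used in record 3473 above fits in a
short PyTorch module. Every hyperparameter here (alphabet size, embedding
width, filter counts and widths, number of authors) is an invented
placeholder, and the architecture is a generic sketch rather than the authors'
exact network.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character CNN: parallel convolutions of several widths, max-pooled
    over time and concatenated into one text representation."""
    def __init__(self, n_chars=128, emb_dim=32, n_filters=64,
                 widths=(3, 4, 5), n_authors=50):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, w) for w in widths])
        self.out = nn.Linear(n_filters * len(widths), n_authors)

    def forward(self, char_ids):                   # (batch, seq_len)
        x = self.emb(char_ids).transpose(1, 2)     # (batch, emb, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))  # author logits

logits = CharCNN()(torch.randint(0, 128, (8, 200)))  # 8 texts, 200 chars each
```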
3474,1609.06773,"Suyoun Kim, Takaaki Hori, Shinji Watanabe","Joint CTC-Attention based End-to-End Speech Recognition using Multi-task
- Learning",cs.CL," Recently, there has been an increasing interest in end-to-end speech
-recognition that directly transcribes speech to text without any predefined
-alignments. One approach is the attention-based encoder-decoder framework that
-learns a mapping between variable-length input and output sequences in one
-step using a purely data-driven method. The attention model has often been
-shown to improve the performance over another end-to-end approach, the
-Connectionist Temporal Classification (CTC), mainly because it explicitly uses
-the history of the target character without any conditional independence
-assumptions. However, we observed that the attention model performs poorly in
-noisy conditions and is hard to train in the initial stage with long input
-sequences. This is because the attention model is too flexible to predict
-proper alignments in such cases, due to the lack of the left-to-right
-constraints used in CTC. This paper presents a novel method for end-to-end
-speech recognition that improves robustness and achieves fast convergence by
-using a joint CTC-attention model within the multi-task learning framework,
-thereby mitigating the alignment issue. Experiments on the WSJ and CHiME-4
-tasks demonstrate its advantages over both the CTC and attention-based
-encoder-decoder baselines, showing 5.4-14.6% relative improvements in
-Character Error Rate (CER).
-"
-3475,1609.06783,"Kar Wai Lim, Wray Buntine, Changyou Chen, Lan Du","Nonparametric Bayesian Topic Modelling with the Hierarchical Pitman-Yor
- Processes",stat.ML cs.CL cs.LG," The Dirichlet process and its extension, the Pitman-Yor process, are
-stochastic processes that take probability distributions as a parameter. These
-processes can be stacked up to form a hierarchical nonparametric Bayesian
-model. In this article, we present efficient methods for the use of these
-processes in this hierarchical context, and apply them to latent variable
-models for text analytics. In particular, we propose a general framework for
-designing these Bayesian models, which are called topic models in the computer
-science community. We then propose a specific nonparametric Bayesian topic
-model for modelling text from social media. We focus on tweets (posts on
-Twitter) in this article due to their ease of access. We find that our
-nonparametric model performs better than existing parametric models in both
-goodness of fit and real-world applications.
-"
-3476,1609.06791,"Kar Wai Lim, Changyou Chen, Wray Buntine","Twitter-Network Topic Model: A Full Bayesian Treatment for Social
- Network and Text Modeling",cs.CL cs.IR cs.SI," Twitter data is extremely noisy -- each tweet is short, unstructured and
-written in informal language, which poses a challenge for current topic
-modeling. On the other hand, tweets are accompanied by extra information such
-as authorship, hashtags and the user-follower network. Exploiting this
-additional information, we propose the Twitter-Network (TN) topic model to
-jointly model the text and the social network in a fully Bayesian
-nonparametric way. The TN topic model employs hierarchical Poisson-Dirichlet
-processes (PDP) for text modeling and a Gaussian process random function model
-for social network modeling. We show that the TN topic model significantly
-outperforms several existing nonparametric models due to its flexibility.
-Moreover, the TN topic model enables additional informative inference, such as
-authors' interests and hashtag analysis, and leads to further applications
-such as author recommendation, automatic topic labeling and hashtag
-suggestion. Note that our general inference framework can readily be applied
-to other topic models with embedded PDP nodes.
-"
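The multi-task objective in record 3474 above is commonly written as an
interpolation of the CTC and attention losses. The sketch below shows one way
to express it in PyTorch; the tensor shapes, the padding conventions, and the
weight lam=0.2 are assumptions for illustration, not the paper's exact setup.

```python
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_logits, att_logits, targets,
                             input_lengths, target_lengths, lam=0.2):
    """lam * L_ctc + (1 - lam) * L_attention.

    ctc_logits:  (T, batch, vocab) encoder outputs for CTC
    att_logits:  (batch, S, vocab) decoder outputs for attention
    targets:     (batch, S) label ids (0 reserved for CTC blank / padding)
    """
    log_probs = F.log_softmax(ctc_logits, dim=-1)
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0)
    att = F.cross_entropy(att_logits.reshape(-1, att_logits.size(-1)),
                          targets.reshape(-1), ignore_index=0)
    return lam * ctc + (1.0 - lam) * att
```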
3477,1609.07028,"Ruobing Xie, Zhiyuan Liu, Huanbo Luan, Maosong Sun",Image-embodied Knowledge Representation Learning,cs.CV cs.CL," Entity images can provide significant visual information for knowledge
-representation learning. Most conventional methods learn knowledge
-representations merely from structured triples, ignoring the rich visual
-information available in entity images. In this paper, we propose a novel
-Image-embodied Knowledge Representation Learning model (IKRL), where knowledge
-representations are learned with both triple facts and images. More
-specifically, we first construct representations for all images of an entity
-with a neural image encoder. These image representations are then integrated
-into an aggregated image-based representation via an attention-based method.
-We evaluate our IKRL models on knowledge graph completion and triple
-classification. Experimental results demonstrate that our models outperform
-all baselines on both tasks, which indicates the significance of visual
-information for knowledge representations and the capability of our models in
-learning knowledge representations with images.
-"
-3478,1609.07033,"Siddhartha Banerjee, Prasenjit Mitra and Kazunari Sugiyama",Generating Abstractive Summaries from Meeting Transcripts,cs.CL," Summaries of meetings are very important as they convey the essential
-content of discussions in a concise form. It is generally time-consuming to
-read and understand whole documents; summaries therefore play an important
-role, as readers are interested only in the important content of the
-discussions. In this work, we address the task of meeting document
-summarization. Automatic summarization systems for meeting conversations
-developed so far have been primarily extractive, resulting in unacceptable
-summaries that are hard to read. The extracted utterances contain disfluencies
-that affect the quality of the extractive summaries. To make summaries much
-more readable, we propose an approach to generating abstractive summaries by
-fusing important content from several utterances. We first separate meeting
-transcripts into various topic segments, and then identify the important
-utterances in each segment using a supervised learning approach. The important
-utterances are then combined together to generate a one-sentence summary. In
-the text generation step, the dependency parses of the utterances in each
-segment are combined together to create a directed graph. The most informative
-and well-formed sub-graph obtained by integer linear programming (ILP) is
-selected to generate a one-sentence summary for each topic segment. The ILP
-formulation reduces disfluencies by leveraging grammatical relations that are
-more prominent in non-conversational text, and therefore generates summaries
-that are comparable to human-written abstractive summaries. Experimental
-results show that our method can generate more informative summaries than the
-baselines. In addition, readability assessments by human judges as well as
-log-likelihood estimates obtained from the dependency parser show that our
-generated summaries are highly readable and well-formed.
-"
-3479,1609.07034,"Siddhartha Banerjee, Prasenjit Mitra and Kazunari Sugiyama","Multi-document abstractive summarization using ILP based multi-sentence
- compression",cs.CL," Abstractive summarization is an ideal form of summarization since it can
-synthesize information from multiple documents to create concise informative
-summaries. In this work, we aim at developing an abstractive summarizer.
-First, our proposed approach identifies the most important document in the
-multi-document set. The sentences in the most important document are aligned
-to sentences in other documents to generate clusters of similar sentences.
-Second, we generate K-shortest paths from the sentences in each cluster using
-a word-graph structure.
Finally, we select sentences from the set of shortest paths generated from
-all the clusters by employing a novel integer linear programming (ILP) model
-whose objective maximizes the information content and readability of the
-final summary. Our ILP model represents the shortest paths as binary variables
-and considers the length of the path, the information score and the linguistic
-quality score in the objective function. Experimental results on the DUC 2004
-and 2005 multi-document summarization datasets show that our proposed approach
-outperforms all the baselines and state-of-the-art extractive summarizers as
-measured by the ROUGE scores. Our method also outperforms a recent abstractive
-summarization technique. In manual evaluation, our approach achieves promising
-results on informativeness and readability.
-"
-3480,1609.07035,"Siddhartha Banerjee, Prasenjit Mitra and Kazunari Sugiyama",Abstractive Meeting Summarization Using Dependency Graph Fusion,cs.CL," Automatic summarization techniques on meeting conversations developed so far
-have been primarily extractive, resulting in poor summaries. To improve this,
-we propose an approach to generate abstractive summaries by fusing important
-content from several utterances. A meeting generally comprises several
-discussion topic segments. For each topic segment within a meeting
-conversation, we aim to generate a one-sentence summary from the most
-important utterances using an integer linear programming-based sentence fusion
-approach. Experimental results show that our method can generate more
-informative summaries than the baselines.
-"
-3481,1609.07053,Johannes Bjerva and Barbara Plank and Johan Bos,Semantic Tagging with Deep Residual Networks,cs.CL," We propose a novel semantic tagging task, sem-tagging, tailored for the
-purpose of multilingual semantic parsing, and present the first tagger using
-deep residual networks (ResNets). Our tagger uses both word and character
-representations, and includes a novel residual bypass architecture. We
-evaluate the tagset both intrinsically on the new task of semantic tagging, as
-well as on Part-of-Speech (POS) tagging. Our system, consisting of a ResNet
-and an auxiliary loss function predicting our semantic tags, significantly
-outperforms prior results on English Universal Dependencies POS tagging
-(95.71% accuracy on UD v1.2 and 95.67% accuracy on UD v1.3).
-"
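An ILP of the general shape described in record 3479 above (binary path
variables, an objective mixing information and quality scores, and a length
constraint) can be prototyped with PuLP. All scores, lengths, and bounds below
are invented toy numbers, and this is a simplified single-cluster version, not
the paper's full model.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

# Toy scores for three candidate shortest paths from one sentence cluster.
info = [3.2, 2.7, 4.1]    # information content per path
qual = [0.8, 0.9, 0.6]    # linguistic quality per path
length = [12, 9, 15]      # path length in words

prob = LpProblem("path_selection", LpMaximize)
x = [LpVariable(f"path_{i}", cat="Binary") for i in range(len(info))]

# Objective: trade off information content against linguistic quality.
prob += lpSum((info[i] + qual[i]) * x[i] for i in range(len(x)))
# Constraints: one path per cluster, bounded summary length.
prob += lpSum(x) == 1
prob += lpSum(length[i] * x[i] for i in range(len(x))) <= 20

prob.solve()
print([i for i in range(len(x)) if value(x[i]) == 1])  # selected path index
```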
3482,1609.07075,"Jiawei Wu, Ruobing Xie, Zhiyuan Liu, Maosong Sun","Knowledge Representation via Joint Learning of Sequential Text and
- Knowledge Graphs",cs.CL," Textual information is considered a significant supplement to knowledge
-representation learning (KRL). There are two main challenges for constructing
-knowledge representations from plain text: (1) how to take full advantage of
-the sequential contexts of entities in plain text for KRL, and (2) how to
-dynamically select informative sentences about the corresponding entities for
-KRL. In this paper, we propose Sequential Text-embodied Knowledge
-Representation Learning to build knowledge representations from multiple
-sentences. Given each reference sentence of an entity, we first utilize a
-recurrent neural network with pooling, or a long short-term memory network, to
-encode the semantic information of the sentence with respect to the entity. We
-then design an attention model to measure the informativeness of each
-sentence, and build text-based representations of entities. We evaluate our
-method on two tasks, including triple classification and link prediction.
-Experimental results demonstrate that our method outperforms other baselines
-on both tasks, which indicates that our method is capable of selecting
-informative sentences and encoding the textual information well into knowledge
-representations.
-"
-3483,1609.07197,Shyam Upadhyay and Ming-Wei Chang,"Annotating Derivations: A New Evaluation Strategy and Dataset for
- Algebra Word Problems",cs.CL," We propose a new evaluation for automatic solvers for algebra word problems,
-which can identify mistakes that existing evaluations overlook. Our proposal
-is to evaluate such solvers using derivations, which reflect how an equation
-system was constructed from the word problem. To accomplish this, we develop
-an algorithm for checking the equivalence between two derivations, and show
-how derivation annotations can be semi-automatically added to existing
-datasets. To make our experiments more comprehensive, we include the
-derivation annotation for DRAW-1K, a new dataset containing 1000 general
-algebra word problems. In our experiments, we found that the annotated
-derivations enable a more accurate evaluation of automatic solvers than
-previously used metrics. We release derivation annotations for over 2300
-algebra word problems for future evaluations.
-"
-3484,1609.07222,Pengfei Liu and Xipeng Qiu and Xuanjing Huang,Deep Multi-Task Learning with Shared Memory,cs.CL," Neural network based models have achieved impressive results on various
-specific tasks. However, in previous works, most models are learned separately
-based on single-task supervised objectives, which often suffer from
-insufficient training data. In this paper, we propose two deep architectures
-which can be trained jointly on multiple related tasks. More specifically, we
-augment a neural model with an external memory, which is shared by several
-tasks. Experiments on two groups of text classification tasks show that our
-proposed architectures can improve the performance of a task with the help of
-other related tasks.
-"
-3485,1609.07245,Xiaodong Zhuang,"A New Statistic Feature of the Short-Time Amplitude Spectrum Values for
- Human's Unvoiced Pronunciation",cs.SD cs.CL," In this paper, a new statistical feature of the discrete short-time
-amplitude spectrum is discovered experimentally for signals of unvoiced
-pronunciation. For the randomly varying short-time spectrum, this feature
-reveals the relationship between the amplitude's average and its standard
-deviation for every frequency component. On the other hand, the association
-between the amplitude distributions for different frequency components is also
-studied. A new model representing such association is inspired by the
-normalized histogram of amplitude. By mathematical analysis, the newly
-discovered statistical feature is proved to be necessary evidence supporting
-the proposed model, and can also serve as direct evidence for the widely used
-hypothesis of ""identical distribution of amplitude for all frequencies"".
-"
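The per-frequency statistics studied in record 3485 above are easy to
reproduce on any signal with an STFT. The snippet below uses synthetic noise
as a stand-in for unvoiced speech and invented analysis parameters; it merely
demonstrates how one would compute the per-bin amplitude mean and standard
deviation that the abstract relates.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
x = np.random.randn(fs)                # 1 s of noise as a stand-in signal
f, t, Z = stft(x, fs=fs, nperseg=512)  # discrete short-time spectrum
amp = np.abs(Z)                        # amplitude per frequency bin and frame

mean_per_bin = amp.mean(axis=1)        # average amplitude per frequency
std_per_bin = amp.std(axis=1)          # standard deviation per frequency
print(np.corrcoef(mean_per_bin, std_per_bin)[0, 1])  # their relationship
```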
3486,1609.07317,Yishu Miao and Phil Blunsom,"Language as a Latent Variable: Discrete Generative Models for Sentence
- Compression",cs.CL cs.AI," In this work we explore deep generative models of text in which the latent
-representation of a document is itself drawn from a discrete language model
-distribution. We formulate a variational auto-encoder for inference in this
-model and apply it to the task of compressing sentences. In this application
-the generative model first draws a latent summary sentence from a background
-language model, and then subsequently draws the observed sentence conditioned
-on this latent summary. In our empirical evaluation we show that generative
-formulations of both abstractive and extractive compression yield
-state-of-the-art results when trained on a large amount of supervised data.
-Further, we explore semi-supervised compression scenarios where we show that
-it is possible to achieve performance competitive with previously proposed
-supervised models while training on a fraction of the supervised data.
-"
-3487,1609.07451,"Linfeng Song, Yue Zhang, Xiaochang Peng, Zhiguo Wang and Daniel Gildea",AMR-to-text generation as a Traveling Salesman Problem,cs.CL," The task of AMR-to-text generation is to generate grammatical text that
-preserves the semantic meaning of a given AMR graph. We attack the task by
-first partitioning the AMR graph into smaller fragments, and then generating
-the translation for each fragment, before finally deciding the order by
-solving an asymmetric generalized traveling salesman problem (AGTSP). A
-Maximum Entropy classifier is trained to estimate the traveling costs, and a
-TSP solver is used to find the optimized solution. The final model reports a
-BLEU score of 22.44 on the SemEval-2016 Task 8 dataset.
-"
-3488,1609.07479,"Wenyuan Zeng, Yankai Lin, Zhiyuan Liu, Maosong Sun",Incorporating Relation Paths in Neural Relation Extraction,cs.CL," Distantly supervised relation extraction has been widely used to find novel
-relational facts from plain text. To predict the relation between a pair of
-target entities, existing methods rely solely on the direct sentences
-containing both entities. In fact, there are also many sentences containing
-only one of the target entities, which provide rich and useful information for
-relation extraction. To address this issue, we build inference chains between
-two target entities via intermediate entities, and propose a path-based neural
-relation extraction model to encode the relational semantics from both direct
-sentences and inference chains. Experimental results on real-world datasets
-show that our model can make full use of those sentences containing only one
-target entity, and achieves significant and consistent improvements on
-relation extraction as compared with baselines. The source code of this paper
-can be obtained from https://github.com/thunlp/PathNRE.
-"
-3489,1609.07498,"Saeid Safavi, Maryam Najafian, Abualsoud Hanani, Martin J Russell,
- Peter Jancovic, Michael J Carey",Speaker Recognition for Children's Speech,cs.SD cs.CL," This paper presents results on Speaker Recognition (SR) for children's
-speech, using the OGI Kids corpus and GMM-UBM and GMM-SVM SR systems. Regions
-of the spectrum containing important speaker information for children are
-identified by conducting SR experiments over 21 frequency bands. As for
-adults, the spectrum can be split into four regions, with the first
-(containing primary vocal tract resonance information) and third
-(corresponding to high-frequency speech sounds) being most useful for SR.
-However, the frequencies at which these regions occur are from 11% to 38%
-higher for children. It is also noted that subband SR rates are lower for
-younger children. Finally, results are presented of SR experiments to identify
-a child in a class (30 children, similar age) and school (288 children,
-varying ages).
Class performance depends on age, with accuracy varying from 90% for young
-children to 99% for older children. The identification rate achieved for a
-child in a school is 81%.
-"
-3490,1609.07561,"Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah
- A. Smith",Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser,cs.CL," We introduce two first-order graph-based dependency parsers achieving a new
-state of the art. The first is a consensus parser built from an ensemble of
-independently trained greedy LSTM transition-based parsers with different
-random initializations. We cast this approach as minimum Bayes risk decoding
-(under the Hamming cost) and argue that weaker consensus within the ensemble
-is a useful signal of difficulty or ambiguity. The second parser is a
-""distillation"" of the ensemble into a single model. We train the distillation
-parser using a structured hinge loss objective with a novel cost that
-incorporates ensemble uncertainty estimates for each possible attachment,
-thereby avoiding the intractable cross-entropy computations required by
-applying standard distillation objectives to problems with structured outputs.
-The first-order distillation parser matches or surpasses the state of the art
-on English, Chinese, and German.
-"
-3491,1609.07568,"Yonatan Belinkov, James Glass","A Character-level Convolutional Neural Network for Distinguishing
- Similar Languages and Dialects",cs.CL," Discriminating between closely-related language varieties is considered a
-challenging and important task. This paper describes our submission to the DSL
-2016 shared task, which included two sub-tasks: one on discriminating similar
-languages and one on identifying Arabic dialects. We developed a
-character-level neural network for this task. Given a sequence of characters,
-our model embeds each character in vector space, runs the sequence through
-multiple convolutions with different filter widths, and pools the
-convolutional representations to obtain a hidden vector representation of the
-text that is used for predicting the language or dialect. We primarily focused
-on the Arabic dialect identification task and obtained an F1 score of 0.4834,
-ranking 6th out of 18 participants. We also analyze the errors made by our
-system on the Arabic data in some detail, and point to the challenges that
-such an approach faces.
-"
-3492,1609.07585,"Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi","An Investigation of Recurrent Neural Architectures for Drug Name
- Recognition",cs.CL," Drug name recognition (DNR) is an essential step in the Pharmacovigilance
-(PV) pipeline. DNR aims to find drug name mentions in unstructured biomedical
-texts and classify them into predefined categories. State-of-the-art DNR
-approaches rely heavily on hand-crafted features and domain-specific
-resources, which are difficult to collect and tune. For this reason, this
-paper investigates the effectiveness of contemporary recurrent neural
-architectures - the Elman and Jordan networks and the bidirectional LSTM with
-CRF decoding - at performing DNR straight from the text. The experimental
-results achieved on the authoritative SemEval-2013 Task 9.1 benchmarks show
-that the bidirectional LSTM-CRF ranks close to highly dedicated, hand-crafted
-systems.
-"
-3493,1609.07680,"Shuiyuan Yu, Junying Liang, Haitao Liu","Existence of Hierarchies and Human's Pursuit of Top Hierarchy Lead to
- Power Law",cs.CL physics.soc-ph," The power law is ubiquitous in natural and social phenomena, and is
-considered a universal relationship between the frequency and its rank for
-diverse social systems. However, a general model interpreting why these
-seemingly unrelated systems share great similarity is still lacking. Through a
-detailed analysis of natural language texts and simulation experiments based
-on the proposed 'Hierarchical Selection Model', we found that the existence of
-hierarchies and humans' pursuit of the top hierarchy lead to the power law.
-Further, the power law is a statistical, emergent property of hierarchies, and
-it is the universality of hierarchies that contributes to the ubiquity of the
-power law.
-"
-3494,1609.07681,"Shuiyuan Yu, Jin Cong, Junying Liang, Haitao Liu",The distribution of information content in English sentences,cs.CL," The sentence is a basic linguistic unit; however, little is known about how
-information content is distributed across the different positions of a
-sentence. Based on authentic English language data, the present study
-calculated the entropy and other entropy-related statistics for different
-sentence positions. The statistics indicate a three-step, staircase-shaped
-distribution pattern, with entropy in the initial position lower than in the
-medial positions (positions other than the initial and final), entropy in the
-medial positions lower than in the final position, and the medial positions
-showing no significant difference among themselves. The results suggest that:
-(1) the hypotheses of Constant Entropy Rate and Uniform Information Density do
-not hold for the sentence-medial positions; (2) the context of a word in a
-sentence should not be simply defined as all the words preceding it in the
-same sentence; and (3) the contextual information content in a sentence does
-not accumulate incrementally but follows a pattern of ""the whole is greater
-than the sum of parts"".
-"
-3495,1609.07701,"Yonatan Belinkov, James Glass","Large-Scale Machine Translation between Arabic and Hebrew: Available
- Corpora and Initial Results",cs.CL," Machine translation between Arabic and Hebrew has so far been limited by a
-lack of parallel corpora, despite the political and cultural importance of
-this language pair. Previous work relied on manually-crafted grammars or on
-pivoting via English, both of which are unsatisfactory for building a scalable
-and accurate MT system. In this work, we compare standard phrase-based and
-neural systems on Arabic-Hebrew translation. We experiment with tokenization
-by external tools and sub-word modeling by character-level neural models, and
-show that both methods lead to improved translation performance, with a small
-advantage for the neural models.
-"
-3496,1609.07730,"Jinsong Su, Zhixing Tan, Deyi Xiong, Rongrong Ji, Xiaodong Shi, Yang
- Liu","Lattice-Based Recurrent Neural Network Encoders for Neural Machine
- Translation",cs.CL," Neural machine translation (NMT) relies heavily on word-level modelling to
-learn semantic representations of input sentences. However, for languages
-without natural word delimiters (e.g., Chinese), where input sentences have to
-be tokenized first, conventional NMT is confronted with two issues: 1) it is
-difficult to find an optimal tokenization granularity for source sentence
-modelling, and 2) errors in 1-best tokenizations may propagate to the encoder
-of NMT.
To handle these issues, we propose word-lattice based Recurrent Neural
-Network (RNN) encoders for NMT, which generalize the standard RNN to word
-lattice topology. The proposed encoders take as input a word lattice that
-compactly encodes multiple tokenizations, and learn to generate new hidden
-states from arbitrarily many inputs and hidden states in preceding time steps.
-As such, the word-lattice based encoders not only alleviate the negative
-impact of tokenization errors but are also more expressive and flexible in
-embedding input sentences. Experimental results on Chinese-English translation
-demonstrate the superiority of the proposed encoders over the conventional
-encoder.
-"
-3497,1609.07756,Lilach Edelstein and Roi Reichart,"A Factorized Model for Transitive Verbs in Compositional Distributional
- Semantics",cs.CL," We present a factorized compositional distributional semantics model for the
-representation of transitive verb constructions. Our model first produces
-(subject, verb) and (verb, object) vector representations based on the
-similarity of the nouns in the construction to each of the nouns in the
-vocabulary, and on the tendency of these nouns to take the subject and object
-roles of the verb. These vectors are then combined into a final
-(subject, verb, object) representation through simple vector operations. On
-two established tasks for the transitive verb construction, our model
-outperforms recent previous work.
-"
-3498,1609.07843,Stephen Merity and Caiming Xiong and James Bradbury and Richard Socher,Pointer Sentinel Mixture Models,cs.CL cs.AI," Recent neural network sequence models with softmax classifiers have achieved
-their best language modeling performance only with very large hidden states
-and large vocabularies. Even then, they struggle to predict rare or unseen
-words, even when the context makes the prediction unambiguous. We introduce
-the pointer sentinel mixture architecture for neural sequence models, which
-has the ability either to reproduce a word from the recent context or to
-produce a word from a standard softmax classifier. Our pointer sentinel-LSTM
-model achieves state-of-the-art language modeling performance on the Penn
-Treebank (70.9 perplexity) while using far fewer parameters than a standard
-softmax LSTM. In order to evaluate how well language models can exploit longer
-contexts and deal with more realistic vocabularies and larger corpora, we also
-introduce the freely available WikiText corpus.
-"
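The mixture in record 3498 above combines a vocabulary softmax with a pointer
over the recent context, gated by the mass assigned to a sentinel slot. The
following PyTorch sketch shows that combination step only (no LSTM, no
training); tensor shapes and names are assumptions for illustration.

```python
import torch

def pointer_sentinel_mixture(p_vocab, attn_scores, context_ids):
    """Combine softmax and pointer distributions via the sentinel gate.

    p_vocab:     (batch, vocab) softmax over the vocabulary
    attn_scores: (batch, ctx_len + 1) scores over context positions plus a
                 final sentinel slot
    context_ids: (batch, ctx_len) word ids at the context positions
    """
    attn = torch.softmax(attn_scores, dim=1)
    g = attn[:, -1:]                     # gate: mass on the sentinel
    p_ptr = torch.zeros_like(p_vocab)
    # Pointer mass per word id; the non-sentinel weights sum to (1 - g).
    p_ptr.scatter_add_(1, context_ids, attn[:, :-1])
    return g * p_vocab + p_ptr           # final mixture distribution

p = pointer_sentinel_mixture(torch.softmax(torch.randn(2, 100), dim=1),
                             torch.randn(2, 6), torch.randint(0, 100, (2, 5)))
```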
3499,1609.07876,"Taehwan Kim, Jonathan Keane, Weiran Wang, Hao Tang, Jason Riggle,
- Gregory Shakhnarovich, Diane Brentari, Karen Livescu","Lexicon-Free Fingerspelling Recognition from Video: Data, Models, and
- Signer Adaptation",cs.CL cs.CV," We study the problem of recognizing video sequences of fingerspelled letters
-in American Sign Language (ASL). Fingerspelling comprises a significant but
-relatively understudied part of ASL. Recognizing fingerspelling is challenging
-for a number of reasons: it involves quick, small motions that are often
-highly coarticulated; it exhibits significant variation between signers; and
-there has been a dearth of continuous fingerspelling data collected. In this
-work we collect and annotate a new dataset of continuous fingerspelling
-videos, compare several types of recognizers, and explore the problem of
-signer variation. Our best-performing models are segmental (semi-Markov)
-conditional random fields using deep neural network-based features. In the
-signer-dependent setting, our recognizers achieve up to about 92% letter
-accuracy. The multi-signer setting is much more challenging, but with neural
-network adaptation we achieve up to 83% letter accuracy in this setting.
-"
-3500,1609.08075,Yi Yang and Ming-Wei Chang,"S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet
- Entity Linking",cs.CL," Non-linear models have recently received a lot of attention as people are
-starting to discover the power of statistical and embedding features. However,
-tree-based models are seldom studied in the context of structured learning,
-despite their recent success on various classification and ranking tasks. In
-this paper, we propose S-MART, a tree-based structured learning framework
-based on multiple additive regression trees. S-MART is especially suitable for
-handling tasks with dense features, and can be used to learn many different
-structures under various loss functions.
- We apply S-MART to the task of tweet entity linking --- a core component of
-tweet information extraction, which aims to identify and link name mentions to
-entities in a knowledge base. A novel inference algorithm is proposed to
-handle the special structure of the task. The experimental results show that
-S-MART significantly outperforms state-of-the-art tweet entity linking
-systems.
-"
-3501,1609.08084,"Yi Yang, Ming-Wei Chang, Jacob Eisenstein","Toward Socially-Infused Information Extraction: Embedding Authors,
- Mentions, and Entities",cs.CL," Entity linking is the task of identifying mentions of entities in text and
-linking them to entries in a knowledge base. This task is especially difficult
-in microblogs, as there is little additional text to provide disambiguating
-context; rather, authors rely on an implicit common ground of shared knowledge
-with their readers. In this paper, we attempt to capture some of this implicit
-context by exploiting the social network structure in microblogs. We build on
-the theory of homophily, which implies that socially linked individuals share
-interests, and are therefore likely to mention the same sorts of entities. We
-implement this idea by encoding authors, mentions, and entities in a
-continuous vector space, which is constructed so that socially-connected
-authors have similar vector representations. These vectors are incorporated
-into a neural structured prediction model, which captures structural
-constraints that are inherent in the entity linking task. Together, these
-design decisions yield F1 improvements of 1%-5% on benchmark datasets, as
-compared to the previous state of the art.
-"
-3502,1609.08097,"Rebecca Sharp, Mihai Surdeanu, Peter Jansen, Peter Clark, and Michael
- Hammond","Creating Causal Embeddings for Question Answering with Minimal
- Supervision",cs.CL," A common model for question answering (QA) is that a good answer is one that
-is closely related to the question, where relatedness is often determined
-using general-purpose lexical models such as word embeddings. We argue that a
-better approach is to look for answers that are related to the question in a
-relevant way, according to the information need of the question, which may be
-determined through task-specific embeddings. With causality as a use case, we
-implement this insight in three steps. First, we generate causal embeddings
-cost-effectively by bootstrapping cause-effect pairs extracted from free text
-using a small set of seed patterns.
Second, we train dedicated embeddings over this data, using task-specific
-contexts, i.e., the context of a cause is its effect. Finally, we extend a
-state-of-the-art reranking approach for QA to incorporate these causal
-embeddings. We evaluate the causal embedding models both directly, with a
-causal implication task, and indirectly, in a downstream causal QA task using
-data from Yahoo! Answers. We show that explicitly modeling causality improves
-performance in both tasks. In the QA task our best model achieves 37.3% P@1,
-significantly outperforming a strong baseline by 7.7% (relative).
-"
-3503,1609.08139,"Antonios Anastasopoulos, David Chiang, Long Duong","An Unsupervised Probability Model for Speech-to-Translation Alignment of
- Low-Resource Languages",cs.CL," For many low-resource languages, spoken language resources are more likely
-to be annotated with translations than with transcriptions. Translated speech
-data is potentially valuable for documenting endangered languages or for
-training speech translation systems. A first step towards making use of such
-data would be to automatically align spoken words with their translations. We
-present a model that combines Dyer et al.'s reparameterization of IBM Model 2
-(fast-align) and k-means clustering using Dynamic Time Warping as a distance
-metric. The two components are trained jointly using expectation-maximization.
-In an extremely low-resource scenario, our model performs significantly better
-than both a neural model and a strong baseline.
-"
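The task-specific contexts in record 3502 above (the context of a cause is
its effect) can be emulated by training a skip-gram model on two-token
cause-effect "sentences". The pairs below are invented examples, and this
gensim sketch is only an approximation of the authors' training setup.

```python
from gensim.models import Word2Vec

# Invented stand-ins for bootstrapped cause-effect pairs.
pairs = [("smoking", "cancer"), ("rain", "flooding"),
         ("virus", "infection"), ("drought", "famine")]

# Each training 'sentence' is one (cause, effect) pair, so with window=1 the
# only context of a cause is its effect (and vice versa).
model = Word2Vec(sentences=[[c, e] for c, e in pairs], vector_size=50,
                 window=1, min_count=1, sg=1, epochs=100)
print(model.wv.most_similar("rain", topn=2))
```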
3504,1609.08144,"Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi,
- Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff
- Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, {\L}ukasz Kaiser,
- Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens,
- George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason
- Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey
- Dean","Google's Neural Machine Translation System: Bridging the Gap between
- Human and Machine Translation",cs.CL cs.AI cs.LG," Neural Machine Translation (NMT) is an end-to-end learning approach for
-automated translation, with the potential to overcome many of the weaknesses
-of conventional phrase-based translation systems. Unfortunately, NMT systems
-are known to be computationally expensive both in training and in translation
-inference. Also, most NMT systems have difficulty with rare words. These
-issues have hindered NMT's use in practical deployments and services, where
-both accuracy and speed are essential. In this work, we present GNMT, Google's
-Neural Machine Translation system, which attempts to address many of these
-issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder
-layers using attention and residual connections. To improve parallelism and
-therefore decrease training time, our attention mechanism connects the bottom
-layer of the decoder to the top layer of the encoder. To accelerate the final
-translation speed, we employ low-precision arithmetic during inference
-computations. To improve handling of rare words, we divide words into a
-limited set of common sub-word units (""wordpieces"") for both input and output.
-This method provides a good balance between the flexibility of
-""character""-delimited models and the efficiency of ""word""-delimited models,
-naturally handles translation of rare words, and ultimately improves the
-overall accuracy of the system. Our beam search technique employs a
-length-normalization procedure and uses a coverage penalty, which encourages
-generation of an output sentence that is most likely to cover all the words in
-the source sentence. On the WMT'14 English-to-French and English-to-German
-benchmarks, GNMT achieves results competitive with the state of the art. Using
-a human side-by-side evaluation on a set of isolated simple sentences, it
-reduces translation errors by an average of 60% compared to Google's
-phrase-based production system.
-"
-3505,1609.08194,"Lei Yu, Jan Buys and Phil Blunsom",Online Segment to Segment Neural Transduction,cs.CL cs.AI cs.NE," We introduce an online neural sequence-to-sequence model that learns to
-alternate between encoding and decoding segments of the input as it is read.
-By independently tracking the encoding and decoding representations, our
-algorithm permits exact polynomial marginalization of the latent segmentation
-during training, and during decoding beam search is employed to find the best
-alignment path together with the predicted output sequence. Our model tackles
-the bottleneck of vanilla encoder-decoders that have to read and memorize the
-entire input sequence in their fixed-length hidden states before producing any
-output. It differs from previous attentive models in that, instead of treating
-the attention weights as the output of a deterministic function, our model
-assigns attention weights to a sequential latent variable which can be
-marginalized out and permits online generation. Experiments on abstractive
-sentence summarization and morphological inflection show significant
-performance gains over the baseline encoder-decoders.
-"
-3506,1609.08210,Ferhan Ture and Elizabeth Boschee,Learning to Translate for Multilingual Question Answering,cs.CL cs.AI," In multilingual question answering, either the question needs to be
-translated into the document language, or vice versa. In addition to
-direction, there are multiple methods of performing the translation, four of
-which we explore in this paper: word-based, 10-best, context-based, and
-grammar-based. We build a feature for each combination of translation
-direction and method, and train a model that learns optimal feature weights.
-On a large forum dataset consisting of posts in English, Arabic, and Chinese,
-our novel learn-to-translate approach was more effective than a strong
-baseline (p<0.05): translating all text into English, then training a
-classifier based only on English (original or translated) text.
-"
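Record 3504 above refers to the published GNMT length normalization and
coverage penalty: in Wu et al. (2016) a hypothesis is scored as
log P(Y|X) / lp(Y) + cp(X;Y), with lp(Y) = (5 + |Y|)^alpha / 6^alpha and a
coverage term summed over source words. A small sketch follows; the default
alpha and beta values and the attention layout follow the paper's
formulation, but the function itself is illustrative.

```python
import math

def gnmt_score(log_prob, length, attn_by_source, alpha=0.6, beta=0.2):
    """Beam-search score with length normalization and coverage penalty.

    log_prob:       total log P(Y|X) of the hypothesis
    length:         number of target tokens |Y|
    attn_by_source: attn_by_source[j][i] = attention that target step i
                    places on source word j
    """
    lp = ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)
    cp = beta * sum(math.log(min(sum(steps), 1.0))
                    for steps in attn_by_source)
    return log_prob / lp + cp
```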
3507,1609.08237,"Tao Ge, Qing Dou, Xiaoman Pan, Heng Ji, Lei Cui, Baobao Chang, Zhifang
- Sui, Ming Zhou","Aligning Coordinated Text Streams through Burst Information Network
- Construction and Decipherment",cs.CL," Aligning coordinated text streams from multiple sources and multiple
-languages has opened many new research venues for cross-lingual knowledge
-discovery. In this paper we aim to advance the state of the art by (1)
-extending coarse-grained topic-level knowledge mining to fine-grained
-information units such as entities and events, and (2) following a novel
-Data-to-Network-to-Knowledge (D2N2K) paradigm to construct and utilize network
-structures to capture and propagate reliable evidence. We introduce a novel
-Burst Information Network (BINet) representation that can display the most
-important information and illustrate the connections among bursty entities,
-events and keywords in the corpus. We propose an effective approach to
-constructing and deciphering BINets, incorporating novel criteria based on
-multi-dimensional clues from pronunciation, translation, burst, neighbors and
-graph topological structure. The experimental results on Chinese and English
-coordinated text streams show that our approach can accurately decipher the
-nodes with high confidence in the BINets, and that the algorithm can be run
-efficiently in parallel, which makes it possible to apply it to huge amounts
-of streaming data for never-ending language and information decipherment.
-"
-3508,1609.08293,Magnus Sahlgren and Alessandro Lenci,"The Effects of Data Size and Frequency Range on Distributional Semantic
- Models",cs.CL," This paper investigates the effects of data size and frequency range on
-distributional semantic models. We compare the performance of a number of
-representative models for several test settings over data of varying sizes,
-and over test items of various frequencies. Our results show that neural
-network-based models underperform when the data is small, and that the most
-reliable model over data of varying sizes and frequency ranges is the inverted
-factorized model.
-"
-3509,1609.08337,"Zhiyuan Tang, Lantian Li and Dong Wang",Multi-task Recurrent Model for True Multilingual Speech Recognition,cs.CL cs.LG cs.NE," Research on multilingual speech recognition remains attractive yet
-challenging. Recent studies focus on learning shared structures under the
-multi-task paradigm, in particular a feature-sharing structure. This approach
-has been found effective in improving performance on each individual language.
-However, this approach is only useful when the deployed system supports just
-one language. In a true multilingual scenario where multiple languages are
-allowed, performance will be significantly reduced due to the competition
-among languages in the decoding space. This paper presents a multi-task
-recurrent model that involves a multilingual speech recognition (ASR)
-component and a language recognition (LR) component, where the ASR component
-is informed of the language information by the LR component, leading to
-language-aware recognition. We tested the approach on an English-Chinese
-bilingual recognition task. The results show that the proposed multi-task
-recurrent model can improve the performance of multilingual recognition
-systems.
-"
-3510,1609.08359,"Ben Eisner, Tim Rockt\""aschel, Isabelle Augenstein, Matko Bo\v{s}njak,
- Sebastian Riedel",emoji2vec: Learning Emoji Representations from their Description,cs.CL," Many current natural language processing applications for social media rely
-on representation learning and utilize pre-trained word embeddings. There
-currently exist several publicly available, pre-trained sets of word
-embeddings, but they contain few or no emoji representations, even as emoji
-usage in social media has increased. In this paper we release emoji2vec,
-pre-trained embeddings for all Unicode emoji which are learned from their
-description in the Unicode emoji standard. The resulting emoji embeddings can
-be readily used in downstream social natural language processing applications
-alongside word2vec. We demonstrate, for the downstream task of sentiment
-analysis, that emoji embeddings learned from short descriptions outperform a
-skip-gram model trained on a large collection of tweets, while avoiding the
-need for contexts in which emoji need to appear frequently in order to
-estimate a representation.
-" -3511,1609.08389,"Orna Almogi (UHH), Lena Dankin (TAU-CS), Nachum Dershowitz (TAU-CS), - Lior Wolf (TAU-CS)",A Hackathon for Classical Tibetan,cs.CL cs.CY," We describe the course of a hackathon dedicated to the development of -linguistic tools for Tibetan Buddhist studies. Over a period of five days, a -group of seventeen scholars, scientists, and students developed and compared -algorithms for intertextual alignment and text classification, along with some -basic language tools, including a stemmer and word segmenter. -" -3512,1609.08409,"Savelie Cornegruta, Robert Bakewell, Samuel Withey, Giovanni Montana","Modelling Radiological Language with Bidirectional Long Short-Term - Memory Networks",cs.CL stat.ML," Motivated by the need to automate medical information extraction from -free-text radiological reports, we present a bi-directional long short-term -memory (BiLSTM) neural network architecture for modelling radiological -language. The model has been used to address two NLP tasks: medical -named-entity recognition (NER) and negation detection. We investigate whether -learning several types of word embeddings improves BiLSTM's performance on -those tasks. Using a large dataset of chest x-ray reports, we compare the -proposed model to a baseline dictionary-based NER system and a negation -detection system that leverages the hand-crafted rules of the NegEx algorithm -and the grammatical relations obtained from the Stanford Dependency Parser. -Compared to these more traditional rule-based systems, we argue that BiLSTM -offers a strong alternative for both our tasks. -" -3513,1609.08412,"Dong Wang, Zhiyuan Tang, Difei Tang and Qing Chen","OC16-CE80: A Chinese-English Mixlingual Database and A Speech - Recognition Baseline",cs.CL," We present the OC16-CE80 Chinese-English mixlingual speech database which was -released as a main resource for training, development and test for the -Chinese-English mixlingual speech recognition (MixASR-CHEN) challenge on -O-COCOSDA 2016. This database consists of 80 hours of speech signals recorded -from more than 1,400 speakers, where the utterances are in Chinese but each -involves one or several English words. Based on the database and another two -free data resources (THCHS30 and the CMU dictionary), a speech recognition -(ASR) baseline was constructed with the deep neural network-hidden Markov model -(DNN-HMM) hybrid system. We then report the baseline results following the -MixASR-CHEN evaluation rules and demonstrate that OC16-CE80 is a reasonable -data resource for mixlingual research. -" -3514,1609.08433,"Chenghui Zhao, Lantian Li, Dong Wang, April Pu",Local Training for PLDA in Speaker Verification,cs.SD cs.CL," PLDA is a popular normalization approach for the i-vector model, and it has -delivered state-of-the-art performance in speaker verification. However, PLDA -training requires a large amount of labeled development data, which is highly -expensive in most cases. A possible approach to mitigate the problem is various -unsupervised adaptation methods, which use unlabeled data to adapt the PLDA -scattering matrices to the target domain. - In this paper, we present a new `local training' approach that utilizes -inaccurate but much cheaper local labels to train the PLDA model. These local -labels discriminate speakers within a single conversion only, and so are much -easier to obtain compared to the normal `global labels'. 
Our experiments show
-that the proposed approach can deliver a significant performance improvement,
-particularly with limited globally-labeled data.
-"
-3515,1609.08441,"Lantian Li, Yixiang Chen, Dong Wang, Chenghui Zhao",Weakly Supervised PLDA Training,cs.LG cs.AI cs.CL cs.SD," PLDA is a popular normalization approach for the i-vector model, and it has
-delivered state-of-the-art performance in speaker verification. However, PLDA
-training requires a large amount of labelled development data, which is highly
-expensive in most cases. We present a cheap PLDA training approach, which
-assumes that speakers in the same session can be easily separated, and speakers
-in different sessions are simply different. This results in `weak labels' which
-are not fully accurate but cheap, leading to weak PLDA training.
- Our experimental results on real-life, large-scale telephony customer-service
-archives demonstrated that the weak training can offer good performance when
-human-labelled data are limited. More interestingly, the weak training can be
-employed as a discriminative adaptation approach, which is more efficient than
-the prevailing unsupervised method when human-labelled data are insufficient.
-"
-3516,1609.08442,"Lantian Li, Zhiyuan Tang, Dong Wang, Andrew Abel, Yang Feng, Shiyue
- Zhang",Collaborative Learning for Language and Speaker Recognition,cs.SD cs.CL," This paper presents a unified model to perform language and speaker
-recognition simultaneously and jointly. The model is based on a multi-task
-recurrent neural network where the output of one task is fed as the input of
-the other, leading to a collaborative learning framework that can improve both
-language and speaker recognition by borrowing information from each other. Our
-experiments demonstrated that the multi-task model outperforms the
-task-specific models on both tasks.
-"
-3517,1609.08445,"Dong Wang, Lantian Li, Difei Tang, Qing Chen","AP16-OL7: A Multilingual Database for Oriental Languages and A Language
- Recognition Baseline",cs.CL cs.AI," We present the AP16-OL7 database which was released as the training and test
-data for the oriental language recognition (OLR) challenge at APSIPA 2016.
-Based on the database, a baseline system was constructed on the basis of the
-i-vector model. We report the baseline results evaluated with the various
-metrics defined by the AP16-OLR evaluation plan and demonstrate that AP16-OL7
-is a reasonable data resource for multilingual research.
-"
-3518,1609.08492,"Miguel J. Rodrigues, Miguel Fal\'e, Andre Lamurias, and Francisco M.
- Couto","WS4A: a Biomedical Question and Answering System based on public Web
- Services and Ontologies",cs.CL cs.IR," This paper describes our system, dubbed WS4A (Web Services for All), which
-participated in the fourth edition of the BioASQ challenge (2016). We used WS4A
-to perform the Question Answering (QA) task 4b, which consisted of the
-retrieval of relevant concepts, documents, snippets, RDF triples, exact answers
-and ideal answers for each given question. The novelty of our approach consists
-in the maximum exploitation of existing web services in each step of WS4A, such
-as the annotation of text and the retrieval of metadata for each annotation.
-The information retrieved included concept identifiers, ontologies, ancestors,
-and, most importantly, PubMed identifiers. The paper describes the WS4A
-pipeline and also presents the precision, recall and F-measure values obtained
-in task 4b.
Our system achieved two second places in two subtasks on one of the five
-batches.
-"
-3519,1609.08496,"Jipeng Qiang, Ping Chen, Tong Wang, Xindong Wu",Topic Modeling over Short Texts by Incorporating Word Embeddings,cs.CL cs.IR cs.LG," Inferring topics from the overwhelming amount of short texts becomes a
-critical but challenging task for many content analysis tasks, such as content
-characterization, user interest profiling, and emerging topic detection.
-Existing methods such as probabilistic latent semantic analysis (PLSA) and
-latent Dirichlet allocation (LDA) cannot solve this problem very well since
-only very limited word co-occurrence information is available in short texts.
-This paper studies how to incorporate external word correlation knowledge into
-short texts to improve the coherence of topic modeling. Based on recent results
-in word embeddings that learn semantic representations for words from a large
-corpus, we introduce a novel method, Embedding-based Topic Model (ETM), to
-learn latent topics from short texts. ETM not only solves the problem of very
-limited word co-occurrence information by aggregating short texts into long
-pseudo-texts, but also utilizes a Markov Random Field regularized model that
-gives correlated words a better chance to be put into the same topic. The
-experiments on real-world datasets validate the effectiveness of our model
-compared with state-of-the-art models.
-"
-3520,1609.08667,Kevin Clark and Christopher D. Manning,Deep Reinforcement Learning for Mention-Ranking Coreference Models,cs.CL," Coreference resolution systems are typically trained with heuristic loss
-functions that require careful tuning. In this paper we instead apply
-reinforcement learning to directly optimize a neural mention-ranking model for
-coreference evaluation metrics. We experiment with two approaches: the
-REINFORCE policy gradient algorithm and a reward-rescaled max-margin objective.
-We find the latter to be more effective, resulting in significant improvements
-over the current state-of-the-art on the English and Chinese portions of the
-CoNLL 2012 Shared Task.
-"
-3521,1609.08703,"Franck Dernoncourt, Ji Young Lee","Optimizing Neural Network Hyperparameters with Gaussian Processes for
- Dialog Act Classification",cs.CL cs.NE stat.ML," Systems based on artificial neural networks (ANNs) have achieved
-state-of-the-art results in many natural language processing tasks. Although
-ANNs do not require manually engineered features, they have many
-hyperparameters to be optimized. The choice of hyperparameters significantly
-impacts a model's performance. However, ANN hyperparameters are typically
-chosen by manual, grid, or random search, which either requires expert
-experience or is computationally expensive. Recent approaches based on
-Bayesian optimization using Gaussian processes (GPs) are a more systematic way
-to automatically pinpoint optimal or near-optimal machine learning
-hyperparameters. Using a previously published ANN model yielding
-state-of-the-art results for dialog act classification, we demonstrate that
-optimizing hyperparameters using GPs further improves the results, and reduces
-the computational time by a factor of 4 compared to a random search. Therefore,
-it is a useful technique for tuning ANN models to yield the best performances
-for natural language processing tasks.
-"
-3522,1609.08777,"Kazuya Kawakami, Chris Dyer, Bryan R. Routledge, Noah A.
Smith",Character Sequence Models for ColorfulWords,cs.CL," We present a neural network architecture to predict a point in color space -from the sequence of characters in the color's name. Using large scale -color--name pairs obtained from an online color design forum, we evaluate our -model on a ""color Turing test"" and find that, given a name, the colors -predicted by our model are preferred by annotators to color names created by -humans. Our datasets and demo system are available online at colorlab.us. -" -3523,1609.08779,"Desmond Upton Patton (Columbia University), Kathleen McKeown (Columbia - University), Owen Rambow (Columbia University), Jamie Macbeth (Columbia - University)","Using Natural Language Processing and Qualitative Analysis to Intervene - in Gang Violence: A Collaboration Between Social Work Researchers and Data - Scientists",cs.CY cs.CL," The U.S. has the highest rate of firearm-related deaths when compared to -other industrialized countries. Violence particularly affects low-income, urban -neighborhoods in cities like Chicago, which saw a 40% increase in firearm -violence from 2014 to 2015 to more than 3,000 shooting victims. While recent -studies have found that urban, gang-involved individuals curate a unique and -complex communication style within and between social media platforms, -organizations focused on reducing gang violence are struggling to keep up with -the growing complexity of social media platforms and the sheer volume of data -they present. In this paper, describe the Digital Urban Violence Analysis -Approach (DUVVA), a collaborative qualitative analysis method used in a -collaboration between data scientists and social work researchers to develop a -suite of systems for decoding the high- stress language of urban, gang-involved -youth. Our approach leverages principles of grounded theory when analyzing -approximately 800 tweets posted by Chicago gang members and participation of -youth from Chicago neighborhoods to create a language resource for natural -language processing (NLP) methods. In uncovering the unique language and -communication style, we developed automated tools with the potential to detect -aggressive language on social media and aid individuals and groups in -performing violence prevention and interruption. -" -3524,1609.08789,"Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng and Shiyue Zhang","Memory Visualization for Gated Recurrent Neural Networks in Speech - Recognition",cs.LG cs.CL cs.NE," Recurrent neural networks (RNNs) have shown clear superiority in sequence -modeling, particularly the ones with gated units, such as long short-term -memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties -behind the remarkable performance remain unclear in many applications, e.g., -automatic speech recognition (ASR). This paper employs visualization techniques -to study the behavior of LSTM and GRU when performing speech recognition tasks. -Our experiments show some interesting patterns in the gated memory, and some of -them have inspired simple yet effective modifications on the network structure. -We report two of such modifications: (1) lazy cell update in LSTM, and (2) -shortcut connections for residual learning. Both modifications lead to more -comprehensible and powerful networks. 
-" -3525,1609.08810,Hagar Loeub and Roi Reichart,"Effective Combination of Language and Vision Through Model Composition - and the R-CCA Method",cs.CL," We address the problem of integrating textual and visual information in -vector space models for word meaning representation. We first present the -Residual CCA (R-CCA) method, that complements the standard CCA method by -representing, for each modality, the difference between the original signal and -the signal projected to the shared, max correlation, space. We then show that -constructing visual and textual representations and then post-processing them -through composition of common modeling motifs such as PCA, CCA, R-CCA and -linear interpolation (a.k.a sequential modeling) yields high quality models. On -five standard semantic benchmarks our sequential models outperform recent -multimodal representation learning alternatives, including ones that rely on -joint representation learning. For two of these benchmarks our R-CCA method is -part of the Best configuration our algorithm yields. -" -3526,1609.08824,"Subhro Roy, Shyam Upadhyay, Dan Roth",Equation Parsing: Mapping Sentences to Grounded Equations,cs.CL," Identifying mathematical relations expressed in text is essential to -understanding a broad range of natural language text from election reports, to -financial news, to sport commentaries to mathematical word problems. This paper -focuses on identifying and understanding mathematical relations described -within a single sentence. We introduce the problem of Equation Parsing -- given -a sentence, identify noun phrases which represent variables, and generate the -mathematical equation expressing the relation described in the sentence. We -introduce the notion of projective equation parsing and provide an efficient -algorithm to parse text to projective equations. Our system makes use of a high -precision lexicon of mathematical expressions and a pipeline of structured -predictors, and generates correct equations in $70\%$ of the cases. In $60\%$ -of the time, it also identifies the correct noun phrase $\rightarrow$ variables -mapping, significantly outperforming baselines. We also release a new annotated -dataset for task evaluation. -" -3527,1609.08843,"Jiaming Xu, Jing Shi, Yiqun Yao, Suncong Zheng, Bo Xu, Bo Xu",Hierarchical Memory Networks for Answer Selection on Unknown Words,cs.IR cs.AI cs.CL," Recently, end-to-end memory networks have shown promising results on Question -Answering task, which encode the past facts into an explicit memory and perform -reasoning ability by making multiple computational steps on the memory. -However, memory networks conduct the reasoning on sentence-level memory to -output coarse semantic vectors and do not further take any attention mechanism -to focus on words, which may lead to the model lose some detail information, -especially when the answers are rare or unknown words. In this paper, we -propose a novel Hierarchical Memory Networks, dubbed HMN. First, we encode the -past facts into sentence-level memory and word-level memory respectively. Then, -(k)-max pooling is exploited following reasoning module on the sentence-level -memory to sample the (k) most relevant sentences to a question and feed these -sentences into attention mechanism on the word-level memory to focus the words -in the selected sentences. Finally, the prediction is jointly learned over the -outputs of the sentence-level reasoning module and the word-level attention -mechanism. 
The experimental results demonstrate that our approach successfully
-conducts answer selection on unknown words and achieves better performance
-than memory networks.
-"
-3528,1609.09004,Johannes Bjerva,Byte-based Language Identification with Deep Convolutional Networks,cs.CL," We report on our system for the shared task on discriminating between similar
-languages (DSL 2016). The system uses only byte representations in a deep
-residual network (ResNet). The system, named ResIdent, is trained only on the
-data released with the task (closed training). We obtain 84.88% accuracy on
-subtask A, 68.80% accuracy on subtask B1, and 69.80% accuracy on subtask B2. A
-large difference in accuracy on development data can be observed with
-relatively minor changes in our network's architecture and hyperparameters. We
-therefore expect fine-tuning of these parameters to yield higher accuracies.
-"
-3529,1609.09007,"Ke Tran, Yonatan Bisk, Ashish Vaswani, Daniel Marcu, Kevin Knight",Unsupervised Neural Hidden Markov Models,cs.CL cs.LG," In this work, we present the first results for neuralizing an Unsupervised
-Hidden Markov Model. We evaluate our approach on tag induction. Our approach
-outperforms existing generative models and is competitive with the state of the
-art, though with a simpler model that is easily extended to include additional
-context.
-"
-3530,1609.09019,Ekaterina Shutova and Patricia Lichtenstein,Psychologically Motivated Text Mining,cs.CL," Natural language processing techniques are increasingly applied to identify
-social trends and predict behavior based on large text collections. Existing
-methods typically rely on surface lexical and syntactic information. Yet,
-research in psychology shows that patterns of human conceptualisation, such as
-metaphorical framing, are reliable predictors of human expectations and
-decisions. In this paper, we present a method to learn patterns of metaphorical
-framing from large text collections, using statistical techniques. We apply the
-method to data in three different languages and evaluate the identified
-patterns, demonstrating their psychological validity.
-"
-3531,1609.09028,"Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal
- Lukasik","Stance Classification in Rumours as a Sequential Task Exploiting the
- Tree Structure of Social Media Conversations",cs.CL cs.SI," Rumour stance classification, the task that determines whether each tweet in
-a collection discussing a rumour is supporting, denying, questioning or simply
-commenting on the rumour, has been attracting substantial interest. Here we
-introduce a novel approach that makes use of the sequence of transitions
-observed in tree-structured conversation threads in Twitter. The conversation
-threads are formed by harvesting users' replies to one another, which results
-in a nested tree-like structure. Previous work addressing the stance
-classification task has treated each tweet as a separate unit. Here we analyse
-tweets by virtue of their position in a sequence and test two sequential
-classifiers, Linear-Chain CRF and Tree CRF, each of which makes different
-assumptions about the conversational structure. We experiment with eight
-Twitter datasets, collected during breaking news, and show that exploiting the
-sequential structure of Twitter conversations achieves significant improvements
-over the non-sequential methods.
Our work is the first to model Twitter
-conversations as a tree structure in this manner, introducing a novel way of
-tackling NLP tasks on Twitter conversations.
-"
-3532,1609.09171,"Lei Shen, Junlin Zhang","Empirical Evaluation of RNN Architectures on Sentence Classification
- Task",cs.CL," Recurrent Neural Networks have achieved state-of-the-art results for many
-problems in NLP, and the two most popular RNN architectures are the Tail Model
-and the Pooling Model. In this paper, a hybrid architecture is proposed and we
-present the first empirical study using LSTMs to compare the performance of
-the three RNN structures on the sentence classification task. Experimental
-results show that the Max Pooling Model or Hybrid Max Pooling Model achieves
-the best performance on most datasets, while the Tail Model does not outperform
-the other models.
-"
-3533,1609.09188,Leonard K.M. Poon and Nevin L. Zhang,"Topic Browsing for Research Papers with Hierarchical Latent Tree
- Analysis",cs.CL cs.IR cs.LG," Academic researchers often face a large collection of research papers in the
-literature. This problem may be even worse for postgraduate students who are
-new to a field and may not know where to start. To address this problem, we
-have developed an online catalog of research papers where the papers have been
-automatically categorized by a topic model. The catalog contains 7719 papers
-from the proceedings of two artificial intelligence conferences from 2000 to
-2015. Rather than the commonly used Latent Dirichlet Allocation, we use a
-recently proposed method called hierarchical latent tree analysis for topic
-modeling. The resulting topic model contains a hierarchy of topics so that
-users can browse the topics from the top level to the bottom level. The topic
-model contains a manageable number of general topics at the top level and
-allows thousands of fine-grained topics at the bottom level. It can also detect
-topics that have emerged recently.
-"
-3534,1609.09189,"Shaonan Wang, Jiajun Zhang and Chengqing Zong",Learning Sentence Representation with Guidance of Human Attention,cs.CL," Recently, much progress has been made in learning general-purpose sentence
-representations that can be used across domains. However, most of the existing
-models typically treat each word in a sentence equally. In contrast, extensive
-studies have proven that humans read sentences efficiently by making a sequence
-of fixations and saccades. This motivates us to improve sentence
-representations by assigning different weights to the vectors of the component
-words, which can be treated as an attention mechanism on single sentences. To
-that end, we propose two novel attention models, in which the attention weights
-are derived using significant predictors of human reading time, i.e.,
-Surprisal, POS tags and CCG supertags. The extensive experiments demonstrate
-that the proposed methods significantly improve upon the state-of-the-art
-sentence representation models.
-"
-3535,1609.09247,"Zhenghua Li, Yue Zhang, Jiayuan Chao, Min Zhang",Training Dependency Parsers with Partial Annotation,cs.CL cs.LG," Recently, there has been a surge of interest in obtaining partially annotated
-data for model supervision. However, a systematic study on how to train
-statistical models with partial annotation (PA) is still lacking. Taking
-dependency parsing as our case study, this paper describes and compares two
-straightforward approaches for three mainstream dependency parsers.
The first
-approach was previously proposed: directly training a log-linear graph-based
-parser (LLGPar) with PA based on a forest-based objective. This work proposes
-the second approach for the first time: directly training a linear graph-based
-parser (LGPar) and a linear transition-based parser (LTPar) with PA based on
-the idea of constrained decoding. We conduct extensive experiments on Penn
-Treebank under three different settings for simulating PA, i.e., random
-dependencies, most uncertain dependencies, and dependencies with divergent
-outputs from the three parsers. The results show that LLGPar is most effective
-in learning from PA and that LTPar lags behind the graph-based counterparts by
-a large margin. Moreover, LGPar and LTPar can achieve the best performance by
-using LLGPar to complete PA into full annotation (FA).
-"
-3536,1609.09315,"Tom\'a\v{s} Ko\v{c}isk\'y and G\'abor Melis and Edward Grefenstette
- and Chris Dyer and Wang Ling and Phil Blunsom and Karl Moritz Hermann",Semantic Parsing with Semi-Supervised Sequential Autoencoders,cs.CL cs.AI cs.NE," We present a novel semi-supervised approach for sequence transduction and
-apply it to semantic parsing. The unsupervised component is based on a
-generative model in which latent sentences generate the unpaired logical forms.
-We apply this method to a number of semantic parsing tasks focusing on domains
-with limited access to labelled training data and extend those datasets with
-synthetically generated logical forms.
-"
-3537,1609.09382,Othman Zennaki and Nasredine Semmar and Laurent Besacier,"Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent
- Neural Networks",cs.CL," This work focuses on the rapid development of linguistic annotation tools for
-resource-poor languages. We experiment with several cross-lingual annotation
-projection methods using Recurrent Neural Network (RNN) models. The
-distinctive feature of our approach is that our multilingual word
-representation requires only a parallel corpus between the source and target
-languages. More precisely, our method has the following characteristics: (a) it
-does not use word alignment information, (b) it does not assume any knowledge
-about foreign languages, which makes it applicable to a wide range of
-resource-poor languages, and (c) it provides truly multilingual taggers. We
-investigate both uni- and bi-directional RNN models and propose a method to
-include external information (for instance, low-level information from POS) in
-the RNN to train higher-level taggers (for instance, super sense taggers). We
-demonstrate the validity and genericity of our model by using parallel corpora
-(obtained by manual or automatic translation). Our experiments are conducted to
-induce cross-lingual POS and super sense taggers.
-"
-3538,1609.09405,"Yonatan Bisk, Siva Reddy, John Blitzer, Julia Hockenmaier, Mark
- Steedman",Evaluating Induced CCG Parsers on Grounded Semantic Parsing,cs.CL cs.AI," We compare the effectiveness of four different syntactic CCG parsers for a
-semantic slot-filling task to explore how much syntactic supervision is
-required for downstream semantic analysis. This extrinsic, task-based
-evaluation provides a unique window to explore the strengths and weaknesses of
-semantics captured by unsupervised grammar induction systems. We release a new
-Freebase semantic parsing dataset called SPADES (Semantic PArsing of
-DEclarative Sentences) containing 93K cloze-style questions paired with
-answers. We evaluate all our models on this dataset.
Our code and data are -available at https://github.com/sivareddyg/graph-parser. -" -3539,1609.09552,"Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura and Manabu - Okumura",Controlling Output Length in Neural Encoder-Decoders,cs.CL," Neural encoder-decoder models have shown great success in many sequence -generation tasks. However, previous work has not investigated situations in -which we would like to control the length of encoder-decoder outputs. This -capability is crucial for applications such as text summarization, in which we -have to generate concise summaries with a desired length. In this paper, we -propose methods for controlling the output sequence length for neural -encoder-decoder models: two decoding-based methods and two learning-based -methods. Results show that our learning-based methods have the capability to -control length without degrading summary quality in a summarization task. -" -3540,1609.09580,Michael Spranger and Katrien Beuls,"Referential Uncertainty and Word Learning in High-dimensional, - Continuous Meaning Spaces",cs.CL," This paper discusses lexicon word learning in high-dimensional meaning spaces -from the viewpoint of referential uncertainty. We investigate various -state-of-the-art Machine Learning algorithms and discuss the impact of scaling, -representation and meaning space structure. We demonstrate that current Machine -Learning techniques successfully deal with high-dimensional meaning spaces. In -particular, we show that exponentially increasing dimensions linearly impact -learner performance and that referential uncertainty from word sensitivity has -no impact. -" -3541,1610.00030,"Marcos Zampieri, Shervin Malmasi, Mark Dras",Modeling Language Change in Historical Corpora: The Case of Portuguese,cs.CL," This paper presents a number of experiments to model changes in a historical -Portuguese corpus composed of literary texts for the purpose of temporal text -classification. Algorithms were trained to classify texts with respect to their -publication date taking into account lexical variation represented as word -n-grams, and morphosyntactic variation represented by part-of-speech (POS) -distribution. We report results of 99.8% accuracy using word unigram features -with a Support Vector Machines classifier to predict the publication date of -documents in time intervals of both one century and half a century. A feature -analysis is performed to investigate the most informative features for this -task and how they are linked to language change. -" -3542,1610.00031,"Cyril Goutte, Serge L\'eger, Shervin Malmasi, Marcos Zampieri",Discriminating Similar Languages: Evaluations and Explorations,cs.CL," We present an analysis of the performance of machine learning classifiers on -discriminating between similar languages and language varieties. We carried out -a number of experiments using the results of the two editions of the -Discriminating between Similar Languages (DSL) shared task. We investigate the -progress made between the two tasks, estimate an upper bound on possible -performance using ensemble and oracle combination, and provide learning curves -to help us understand which languages are more challenging. A number of -difficult sentences are identified and investigated further with human -annotation. 
-" -3543,1610.00072,"Gurvan L'Hostis, David Grangier, Michael Auli",Vocabulary Selection Strategies for Neural Machine Translation,cs.CL," Classical translation models constrain the space of possible outputs by -selecting a subset of translation rules based on the input sentence. Recent -work on improving the efficiency of neural translation models adopted a similar -strategy by restricting the output vocabulary to a subset of likely candidates -given the source. In this paper we experiment with context and embedding-based -selection methods and extend previous work by examining speed and accuracy -trade-offs in more detail. We show that decoding time on CPUs can be reduced by -up to 90% and training time by 25% on the WMT15 English-German and WMT16 -English-Romanian tasks at the same or only negligible change in accuracy. This -brings the time to decode with a state of the art neural translation system to -just over 140 msec per sentence on a single CPU core for English-German. -" -3544,1610.00211,"Marcos Vin\'icius Treviso, Christopher Shulby, Sandra Maria Alu\'isio","Sentence Segmentation in Narrative Transcripts from Neuropsychological - Tests using Recurrent Convolutional Neural Networks",cs.CL," Automated discourse analysis tools based on Natural Language Processing (NLP) -aiming at the diagnosis of language-impairing dementias generally extract -several textual metrics of narrative transcripts. However, the absence of -sentence boundary segmentation in the transcripts prevents the direct -application of NLP methods which rely on these marks to function properly, such -as taggers and parsers. We present the first steps taken towards automatic -neuropsychological evaluation based on narrative discourse analysis, presenting -a new automatic sentence segmentation method for impaired speech. Our model -uses recurrent convolutional neural networks with prosodic, Part of Speech -(PoS) features, and word embeddings. It was evaluated intrinsically on -impaired, spontaneous speech, as well as, normal, prepared speech, and presents -better results for healthy elderly (CTL) (F1 = 0.74) and Mild Cognitive -Impairment (MCI) patients (F1 = 0.70) than the Conditional Random Fields method -(F1 = 0.55 and 0.53, respectively) used in the same context of our study. The -results suggest that our model is robust for impaired speech and can be used in -automated discourse analysis tools to differentiate narratives produced by MCI -and CTL. -" -3545,1610.00219,"Junxian He, Ying Huang, Changfeng Liu, Jiaming Shen, Yuting Jia, - Xinbing Wang",Text Network Exploration via Heterogeneous Web of Topics,cs.SI cs.CL cs.IR," A text network refers to a data type that each vertex is associated with a -text document and the relationship between documents is represented by edges. -The proliferation of text networks such as hyperlinked webpages and academic -citation networks has led to an increasing demand for quickly developing a -general sense of a new text network, namely text network exploration. In this -paper, we address the problem of text network exploration through constructing -a heterogeneous web of topics, which allows people to investigate a text -network associating word level with document level. To achieve this, a -probabilistic generative model for text and links is proposed, where three -different relationships in the heterogeneous topic web are quantified. 
We also
-develop a prototype demo system named TopicAtlas to exhibit such a
-heterogeneous topic web, and demonstrate how this system can facilitate the
-task of text network exploration. Extensive qualitative analyses are included
-to verify the effectiveness of this heterogeneous topic web. We also validate
-our model on real-life text networks, showing that it preserves good
-performance on objective evaluation metrics.
-"
-3546,1610.00277,"Yanmin Qian, Philip C Woodland",Very Deep Convolutional Neural Networks for Robust Speech Recognition,cs.CL," This paper describes the extension and optimization of our previous work on
-very deep convolutional neural networks (CNNs) for effective recognition of
-noisy speech in the Aurora 4 task. The appropriate number of convolutional
-layers, the sizes of the filters, pooling operations and input feature maps are
-all modified: the filter and pooling sizes are reduced and the dimensions of
-input feature maps are extended to allow adding more convolutional layers.
-Furthermore, appropriate input padding and input feature map selection
-strategies are developed. In addition, an adaptation framework using joint
-training of the very deep CNN with auxiliary i-vector and fMLLR features is
-developed. These modifications give substantial word error rate reductions over
-the standard CNN used as the baseline. Finally, the very deep CNN is combined
-with an LSTM-RNN acoustic model, and it is shown that state-level weighted log
-likelihood score combination in a joint acoustic model decoding scheme is very
-effective. On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%,
-which improves to 7.99% with auxiliary-feature joint training, and to 7.09%
-with LSTM-RNN joint decoding.
-"
-3547,1610.00311,Kevin Shu and Matilde Marcolli,Syntactic Structures and Code Parameters,cs.CL," We assign binary and ternary error-correcting codes to the data of syntactic
-structures of world languages and we study the distribution of code points in
-the space of code parameters. We show that, while most codes populate the lower
-region approximating a superposition of Thomae functions, there is a
-substantial presence of codes above the Gilbert-Varshamov bound and even above
-the asymptotic bound and the Plotkin bound. We investigate the dynamics induced
-on the space of code parameters by spin glass models of language change, and
-show that, in the presence of entailment relations between syntactic
-parameters, the dynamics can sometimes improve the code. For large sets of
-languages and syntactic data, one can gain information on the spin glass
-dynamics from the induced dynamics in the space of code parameters.
-"
-3548,1610.00369,"A. Hassan, M. R. Amin, N. Mohammed, A. K. A. Azad","Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep
- Recurrent models",cs.CL cs.IR cs.LG cs.NE," Sentiment Analysis (SA) is an active research area in the digital age. With
-the rapid and constant growth of online social media sites and services, and
-the increasing amount of textual data such as statuses, comments and reviews
-available in them, the application of automatic SA is on the rise. However,
-most of the research work on SA in natural language processing (NLP) is based
-on the English language. Despite being the sixth most widely spoken language in
-the world, Bangla still does not have a large and standard dataset.
Because of
-this, recent research works in Bangla have failed to produce results that can
-be both comparable to work done by others and reusable as stepping stones for
-future researchers to progress in this field. Therefore, we first tried to
-provide a textual dataset that includes not just Bangla but Romanized Bangla
-texts as well, and that is substantial, post-processed and multiply validated,
-ready to be used in SA experiments. We tested this dataset on a deep recurrent
-model, specifically Long Short-Term Memory (LSTM), using two types of loss
-functions, binary cross-entropy and categorical cross-entropy, and also did
-some experimental pre-training by using data from one validation to pre-train
-the other and vice versa. Lastly, we documented the results along with some
-analysis of them, which were promising.
-"
-3549,1610.00388,"Jiatao Gu, Graham Neubig, Kyunghyun Cho and Victor O.K. Li",Learning to Translate in Real-time with Neural Machine Translation,cs.CL cs.LG," Translating in real-time, a.k.a. simultaneous translation, outputs
-translation words before the input sentence ends, which is a challenging
-problem for conventional machine translation methods. We propose a neural
-machine translation (NMT) framework for simultaneous translation in which an
-agent learns to make decisions on when to translate from the interaction with a
-pre-trained NMT environment. To trade off quality and delay, we extensively
-explore various targets for delay and design a method for beam search
-applicable in the simultaneous MT setting. Experiments against state-of-the-art
-baselines on two language pairs demonstrate the efficacy of the proposed
-framework both quantitatively and qualitatively.
-"
-3550,1610.00479,"Hinrich Schuetze, Heike Adel, Ehsaneddin Asgari",Nonsymbolic Text Representation,cs.CL," We introduce the first generic text representation model that is completely
-nonsymbolic, i.e., it does not require the availability of a segmentation or
-tokenization method that attempts to identify words or other symbolic units in
-text. This applies to training the parameters of the model on a training corpus
-as well as to applying it when computing the representation of a new text. We
-show that our model performs better than prior work on an information
-extraction and a text denoising task.
-"
-3551,1610.00520,Akash Kumar Dhaka and Giampiero Salvi,"Semi-supervised Learning with Sparse Autoencoders in Phone
- Classification",stat.ML cs.CL cs.LG," We propose the application of a semi-supervised learning method to improve
-the performance of acoustic modelling for automatic speech recognition based
-on deep neural networks. As opposed to unsupervised initialisation followed by
-supervised fine-tuning, our method takes advantage of both unlabelled and
-labelled data simultaneously through mini-batch stochastic gradient descent.
-We tested the method with varying proportions of labelled vs unlabelled
-observations in frame-based phoneme classification on the TIMIT database. Our
-experiments show that the method outperforms standard supervised training for
-an equal amount of labelled data and provides competitive error rates compared
-to state-of-the-art graph-based semi-supervised learning techniques.
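The joint use of labelled and unlabelled frames in one mini-batch boils down to summing a supervised cross-entropy term and an unsupervised reconstruction term in each SGD step. The PyTorch sketch below is a generic rendering of that recipe, not the authors' exact architecture; the sparsity penalty weight, layer sizes and data are invented.

import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(39, 128), nn.Sigmoid())     # 39-d acoustic frames
dec = nn.Linear(128, 39)                                  # reconstruction head
clf = nn.Linear(128, 48)                                  # 48 phone classes
opt = torch.optim.SGD([*enc.parameters(), *dec.parameters(), *clf.parameters()], lr=0.01)

x_lab, y_lab = torch.randn(32, 39), torch.randint(0, 48, (32,))
x_unl = torch.randn(96, 39)                               # more unlabelled frames

for step in range(5):
    h_lab, h_unl = enc(x_lab), enc(x_unl)
    loss = (nn.functional.cross_entropy(clf(h_lab), y_lab)    # supervised term
            + nn.functional.mse_loss(dec(h_unl), x_unl)       # reconstruction term
            + 1e-3 * h_unl.abs().mean())                      # assumed sparsity penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, float(loss))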
-" -3552,1610.00552,"Minjae Lee, Kyuyeon Hwang, Jinhwan Park, Sungwook Choi, Sungho Shin, - Wonyong Sung",FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks,cs.CL cs.LG cs.SD," In this paper, a neural network based real-time speech recognition (SR) -system is developed using an FPGA for very low-power operation. The implemented -system employs two recurrent neural networks (RNNs); one is a -speech-to-character RNN for acoustic modeling (AM) and the other is for -character-level language modeling (LM). The system also employs a statistical -word-level LM to improve the recognition accuracy. The results of the AM, the -character-level LM, and the word-level LM are combined using a fairly simple -N-best search algorithm instead of the hidden Markov model (HMM) based network. -The RNNs are implemented using massively parallel processing elements (PEs) for -low latency and high throughput. The weights are quantized to 6 bits to store -all of them in the on-chip memory of an FPGA. The proposed algorithm is -implemented on a Xilinx XC7Z045, and the system can operate much faster than -real-time. -" -3553,1610.00572,Mauro Cettolo,An Arabic-Hebrew parallel corpus of TED talks,cs.CL cs.IR," We describe an Arabic-Hebrew parallel corpus of TED talks built upon WIT3, -the Web inventory that repurposes the original content of the TED website in a -way which is more convenient for MT researchers. The benchmark consists of -about 2,000 talks, whose subtitles in Arabic and Hebrew have been accurately -aligned and rearranged in sentences, for a total of about 3.5M tokens per -language. Talks have been partitioned in train, development and test sets -similarly in all respects to the MT tasks of the IWSLT 2016 evaluation -campaign. In addition to describing the benchmark, we list the problems -encountered in preparing it and the novel methods designed to solve them. -Baseline MT results and some measures on sentence length are provided as an -extrinsic evaluation of the quality of the benchmark. -" -3554,1610.00602,Nikhil Krishnaswamy and James Pustejovsky,"Multimodal Semantic Simulations of Linguistically Underspecified Motion - Events",cs.CL," In this paper, we describe a system for generating three-dimensional visual -simulations of natural language motion expressions. We use a rich formal model -of events and their participants to generate simulations that satisfy the -minimal constraints entailed by the associated utterance, relying on semantic -knowledge of physical objects and motion events. This paper outlines technical -considerations and discusses implementing the aforementioned semantic models -into such a system. -" -3555,1610.00634,Anoop Kunchukuttan and Pushpak Bhattacharyya,Orthographic Syllable as basic unit for SMT between Related Languages,cs.CL," We explore the use of the orthographic syllable, a variable-length -consonant-vowel sequence, as a basic unit of translation between related -languages which use abugida or alphabetic scripts. We show that orthographic -syllable level translation significantly outperforms models trained over other -basic units (word, morpheme and character) when training over small parallel -corpora. -" -3556,1610.00765,"Edoardo Maria Ponti, Elisabetta Jezek, Bernardo Magnini","Distributed Representations of Lexical Sets and Prototypes in Causal - Alternation Verbs",cs.CL," Lexical sets contain the words filling an argument slot of a verb, and are in -part determined by selectional preferences. 
The purpose of this paper is to
-unravel the properties of lexical sets through distributional semantics. We
-investigate 1) whether lexical sets behave as prototypical categories with a
-centre and a periphery; 2) whether they are polymorphic, i.e. composed of
-subcategories; and 3) whether the distance between lexical sets of different
-arguments is explanatory of verb properties. In particular, our case study is
-the lexical sets of causative-inchoative verbs in Italian. Having studied
-several vector models, we find that 1) based on spatial distance from the
-centroid, object fillers are scattered uniformly across the category, whereas
-intransitive subject fillers lie on its edge; 2) a correlation exists between
-the number of verb senses and that of clusters discovered automatically,
-especially for intransitive subjects; and 3) the distance between the centroids
-of object and intransitive subject is correlated with other properties of
-verbs, such as their cross-lingual tendency to appear in the intransitive
-pattern rather than the transitive one. This paper is noncommittal with respect
-to the hypothesis that this connection is underpinned by a semantic reason,
-namely the spontaneity of the event denoted by the verb.
-"
-3557,1610.00842,"Yandi Xia, Yang Liu",Chinese Event Extraction Using Deep Neural Network with Word Embedding,cs.CL," A lot of prior work on event extraction has exploited a variety of features
-to represent events. Such methods have several drawbacks: 1) the features are
-often specific to a particular domain and do not generalize well; 2) the
-features are derived from various linguistic analyses and are error-prone; and
-3) some features may be expensive and require domain experts. In this paper, we
-develop a Chinese event extraction system that uses word embedding vectors to
-represent language, and deep neural networks to learn the abstract feature
-representation, in order to greatly reduce the effort of feature engineering.
-In addition, in this framework, we leverage a large amount of unlabeled data,
-which addresses the problem of the limited labeled corpus for this task. Our
-experiments show that our proposed method performs better than the system
-using rich language features, and that using unlabeled data benefits the word
-embeddings. This study suggests the potential of DNNs and word embeddings for
-the event extraction task.
-"
-3558,1610.00852,"Joey Hong, Chris Mattmann, Paul Ramirez","Ensemble Maximum Entropy Classification and Linear Regression for Author
- Age Prediction",cs.LG cs.CL," The evolution of the internet has created an abundance of unstructured data
-on the web, a significant part of which is textual. The task of author
-profiling seeks to find the demographics of people solely from their linguistic
-and content-based features in text. The ability to describe traits of authors
-clearly has applications in fields such as security and forensics, as well as
-marketing. Instead of seeing age as just a classification problem, we also
-frame age as a regression one, but use an ensemble chain method that
-incorporates the power of both classification and regression to learn the
-author's exact age.
-"
-3559,1610.00879,"Aditya Joshi, Abhijit Mishra, Balamurali AR, Pushpak Bhattacharyya,
- Mark Carman",A Computational Approach to Automatic Prediction of Drunk Texting,cs.CL," Alcohol abuse may lead to unsociable behavior such as crime, drunk driving,
-or privacy leaks.
We introduce automatic drunk-texting prediction as the task
-of identifying whether a text was written under the influence of alcohol.
-We experiment with tweets labeled using hashtags as distant supervision. Our
-classifiers use a set of N-gram and stylistic features to detect drunk tweets.
-Our observations present the first quantitative evidence that text contains
-signals that can be exploited to detect drunk-texting.
-"
-3560,1610.00883,"Aditya Joshi, Vaibhav Tripathi, Kevin Patel, Pushpak Bhattacharyya,
- Mark Carman",Are Word Embedding-based Features Useful for Sarcasm Detection?,cs.CL," This paper makes a simple increment to the state of the art in sarcasm
-detection research. Existing approaches are unable to capture subtle forms of
-context incongruity which lie at the heart of sarcasm. We explore whether prior
-work can be enhanced using semantic similarity/discordance between word
-embeddings. We augment four feature sets reported in the past with word
-embedding-based features. We also experiment with four types of word
-embeddings. We observe an improvement in sarcasm detection, irrespective of the
-word embedding used or the original feature set to which our features are
-added. For example, this augmentation results in an improvement in F-score of
-around 4\% for three out of these four feature sets, and a minor degradation in
-the case of the fourth, when Word2Vec embeddings are used. Finally, a
-comparison of the four embeddings shows that Word2Vec and dependency
-weight-based features outperform LSA and GloVe in terms of their benefit to
-sarcasm detection.
-"
-3561,1610.00956,"Ondrej Bajgar, Rudolf Kadlec and Jan Kleindienst",Embracing data abundance: BookTest Dataset for Reading Comprehension,cs.CL cs.AI cs.LG cs.NE," There is a practically unlimited amount of natural language data available.
-Still, recent work in text comprehension has focused on datasets which are
-small relative to current computing possibilities. This article makes a case
-for the community to move to larger data and, as a step in that direction,
-proposes the BookTest, a new dataset similar to the popular Children's Book
-Test (CBT) but more than 60 times larger. We show that training on the new data
-improves the accuracy of our Attention-Sum Reader model on the original CBT
-test data by a much larger margin than many recent attempts to improve the
-model architecture. On one version of the dataset our ensemble even exceeds the
-human baseline provided by Facebook. We then show in our own human study that
-there is still space for further improvement.
-"
-3562,1610.01030,"Dat Tien Nguyen, Shafiq Joty, Muhammad Imran, Hassan Sajjad, Prasenjit
- Mitra","Applications of Online Deep Learning for Crisis Response Using Social
- Media Information",cs.CL cs.CY cs.LG," During natural or man-made disasters, humanitarian response organizations
-look for useful information to support their decision-making processes. Social
-media platforms such as Twitter have been considered a vital source of useful
-information for disaster response and management. Despite advances in natural
-language processing techniques, processing short and informal Twitter messages
-is a challenging task. In this paper, we propose to use Deep Neural Networks
-(DNNs) to address two types of information needs of response organizations:
-1) identifying informative tweets and 2) classifying them into topical classes.
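The paper trains DNNs online; as a stand-in, the sketch below shows the same streaming regime for the first of these two tasks with a linear classifier, since scikit-learn's partial_fit and a stateless HashingVectorizer make the mini-batch-at-a-time pattern explicit. The tweets and labels are fabricated.

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**16)       # stateless: safe for streams
clf = SGDClassifier(loss="log_loss")

stream = [                                      # fabricated mini-batches
    (["bridge collapsed near the station", "lol great game tonight"], [1, 0]),
    (["volunteers needed for shelter", "happy birthday!!"], [1, 0]),
]
for texts, labels in stream:                    # one SGD update per incoming batch
    clf.partial_fit(vec.transform(texts), labels, classes=[0, 1])

print(clf.predict(vec.transform(["urgent: medical supplies required"])))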
DNNs use distributed representations of words and automatically learn the
-representation as well as higher-level features for the classification task.
-We propose a new online algorithm based on stochastic gradient descent to train
-DNNs in an online fashion during disaster situations. We test our models using
-a crisis-related real-world Twitter dataset.
-"
-3563,1610.01076,Mateusz Malinowski and Mario Fritz,Tutorial on Answering Questions about Images with Deep Learning,cs.CV cs.AI cs.CL cs.LG cs.NE," Together with the development of more accurate methods in Computer Vision and
-Natural Language Understanding, holistic architectures that answer questions
-about the content of real-world images have emerged. In this tutorial, we build
-a neural-based approach to answer questions about images. We base our tutorial
-on two datasets: (mostly on) DAQUAR, and (a bit on) VQA. With small tweaks the
-models that we present here can achieve competitive performance on both
-datasets; in fact, they are among the best methods that use a combination of an
-LSTM with a global, full-frame CNN representation of an image. We hope that
-after reading this tutorial, the reader will be able to use Deep Learning
-frameworks, such as Keras and the introduced Kraino, to build various
-architectures that will lead to further performance improvements on this
-challenging task.
-"
-3564,1610.01108,Marcin Junczys-Dowmunt and Tomasz Dwojak and Hieu Hoang,"Is Neural Machine Translation Ready for Deployment? A Case Study on 30
- Translation Directions",cs.CL," In this paper we provide the largest published comparison of translation
-quality for phrase-based SMT and neural machine translation across 30
-translation directions. For ten directions we also include hierarchical
-phrase-based MT. Experiments are performed for the recently published United
-Nations Parallel Corpus v1.0 and its large six-way sentence-aligned subcorpus.
-In the second part of the paper we investigate aspects of translation speed,
-introducing AmuNMT, our efficient neural machine translation decoder. We
-demonstrate that current neural machine translation could already be used for
-in-production systems when comparing words-per-second ratios.
-"
-3565,1610.01247,"Tuan Do, Nikhil Krishnaswamy, James Pustejovsky",ECAT: Event Capture Annotation Tool,cs.CL cs.CV," This paper introduces the Event Capture Annotation Tool (ECAT), a
-user-friendly, open-source interface tool for annotating events and their
-participants in video, capable of extracting the 3D positions and orientations
-of objects in video captured by Microsoft's Kinect(R) hardware. The modeling
-language VoxML (Pustejovsky and Krishnaswamy, 2016) underlies ECAT's object,
-program, and attribute representations, although ECAT uses its own spec for
-explicit labeling of motion instances. The demonstration will show the tool's
-workflow and the options available for capturing event-participant relations
-and browsing visual data. Mapping ECAT's output to VoxML will also be
-addressed.
-"
-3566,1610.01291,"Christophe Servan and Alexandre Berard and Zied Elloumi and Herv\'e
- Blanchon and Laurent Besacier","Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or
- Lexical Resources?",cs.CL," This paper presents an approach combining lexico-semantic resources and
-distributed representations of words applied to evaluation in machine
-translation (MT). This study is carried out through the enrichment of a
-well-known MT evaluation metric: METEOR.
This metric enables an approximate match (synonymy
-or morphological similarity) between an automatic and a reference translation.
-Our experiments are conducted within the framework of the Metrics task of WMT
-2014. We show that distributed representations are a good alternative to
-lexico-semantic resources for MT evaluation, and that they can even bring
-interesting additional information. The augmented versions of METEOR, using
-vector representations, are made available on our GitHub page.
-"
-3567,1610.01367,Mahdi Khademian and Mohammad Mehdi Homayounpour,"Monaural Multi-Talker Speech Recognition using Factorial Speech
- Processing Models",cs.CL cs.SD," A PASCAL challenge entitled monaural multi-talker speech recognition was
-developed, targeting the problem of robust automatic speech recognition against
-speech-like noises, which significantly degrade the performance of automatic
-speech recognition systems. In this challenge, two competing speakers say a
-simple command simultaneously and the objective is to recognize the speech of
-the target speaker. Surprisingly, during the challenge a team from IBM Research
-achieved performance better than that of human listeners on this task. The
-method proposed by the IBM team consists of an intermediate speech separation
-step followed by single-talker speech recognition. This paper reconsiders the
-task of this challenge using gain-adapted factorial speech processing models.
-It develops a joint-token passing algorithm for direct utterance decoding of
-both target and masker speakers simultaneously. In contrast to the challenge
-winner, it retains maximum uncertainty during decoding, which is not possible
-in the earlier two-phase method. It provides a detailed derivation of inference
-on these models based on general inference procedures for probabilistic
-graphical models. As another improvement, it uses deep neural networks for
-joint speaker identification and gain estimation, which makes these two steps
-easier than before while producing competitive results for them. The proposed
-method outperforms the past super-human results and even the results achieved
-recently by Microsoft Research using deep neural networks: it achieved a 5.5%
-absolute task-performance improvement over the first super-human system and a
-2.7% absolute task-performance improvement over its recent competitor.
-"
-3568,1610.01382,"Abdul Malik Badshah, Jamil Ahmad, Mi Young Lee, Sung Wook Baik","Divide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC
- and Random Forest",cs.SD cs.CL," Besides spoken words, speech signals also carry information about speaker
-gender, age, and emotional state, which can be used in a variety of speech
-analysis applications. In this paper, a divide-and-conquer strategy for
-ensemble classification has been proposed to recognize emotions in speech. The
-intrinsic hierarchy of emotions has been utilized to construct an emotion tree,
-which assists in breaking the emotion recognition task down into smaller
-subtasks. The proposed framework generates predictions in three phases. First,
-emotions are detected in the input speech signal by classifying it as neutral
-or emotional. If the speech is classified as emotional, then in the second
-phase it is further classified into positive and negative classes. Finally,
-individual positive or negative emotions are identified based on the outcomes
-of the previous stages. Several experiments have been performed on a widely
-used benchmark dataset.
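The three-phase cascade described above maps directly onto three small classifiers chained by their predictions. Below is a schematic version with random-forest stages and fabricated MFCC-like features; the class inventory and feature dimension are placeholders, not the paper's setup.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 13))                     # stand-in MFCC feature vectors
emotion = rng.integers(0, 5, 200)                  # 0=neutral 1=happy 2=calm 3=angry 4=sad
is_emotional = (emotion > 0).astype(int)
is_negative = (emotion >= 3).astype(int)

stage1 = RandomForestClassifier().fit(X, is_emotional)
stage2 = RandomForestClassifier().fit(X[emotion > 0], is_negative[emotion > 0])
stage3_pos = RandomForestClassifier().fit(X[(emotion == 1) | (emotion == 2)],
                                          emotion[(emotion == 1) | (emotion == 2)])
stage3_neg = RandomForestClassifier().fit(X[emotion >= 3], emotion[emotion >= 3])

def predict(x):
    x = x.reshape(1, -1)
    if stage1.predict(x)[0] == 0:
        return 0                                   # neutral
    branch = stage3_neg if stage2.predict(x)[0] == 1 else stage3_pos
    return int(branch.predict(x)[0])

print(predict(rng.normal(size=13)))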
The proposed method was able to achieve improved
-recognition rates as compared to several other approaches.
-"
-3569,1610.01465,"Kushal Kafle, Christopher Kanan","Visual Question Answering: Datasets, Algorithms, and Future Challenges",cs.CV cs.AI cs.CL," Visual Question Answering (VQA) is a recent problem in computer vision and
-natural language processing that has garnered a large amount of interest from
-the deep learning, computer vision, and natural language processing
-communities. In VQA, an algorithm needs to answer text-based questions about
-images. Since the release of the first VQA dataset in 2014, additional datasets
-have been released and many algorithms have been proposed. In this review, we
-critically examine the current state of VQA in terms of problem formulation,
-existing datasets, evaluation metrics, and algorithms. In particular, we
-discuss the limitations of current datasets with regard to their ability to
-properly train and assess VQA algorithms. We then exhaustively review existing
-algorithms for VQA. Finally, we discuss possible future directions for VQA and
-image understanding research.
-"
-3570,1610.01486,Tiago Tresoldi,"A tentative model for dimensionless phoneme distance from binary
- distinctive features",cs.CL," This work proposes a tentative model for the calculation of dimensionless
-distances between phonemes; sounds are described with binary distinctive
-features and distances show linear consistency in terms of such features. The
-model can be used as a scoring function for local and global pairwise alignment
-of phoneme sequences, and the distances can be used as prior probabilities for
-Bayesian analyses on the phylogenetic relationship between languages,
-particularly for cognate identification in cases where no empirical prior
-probability is available.
-"
-3571,1610.01508,James Pustejovsky and Nikhil Krishnaswamy,VoxML: A Visualization Modeling Language,cs.CL," We present the specification for a modeling language, VoxML, which encodes
-semantic knowledge of real-world objects represented as three-dimensional
-models, and of events and attributes related to and enacted over these objects.
-VoxML is intended to overcome the limitations of existing 3D visual markup
-languages by allowing for the encoding of a broad range of semantic knowledge
-that can be exploited by a variety of systems and platforms, leading to
-multimodal simulations of real-world scenarios using conceptual objects that
-represent their semantic values.
-"
-3572,1610.01520,"Edgar Altszyler, Mariano Sigman, Sidarta Ribeiro and Diego Fern\'andez
- Slezak","Comparative study of LSA vs Word2vec embeddings in small corpora: a case
- study in dreams database",cs.CL cs.IR," Word embeddings have been extensively studied in large text datasets.
-However, only a few studies analyze semantic representations of small corpora,
-particularly relevant in single-person text production studies. In the present
-paper, we compare Skip-gram and LSA capabilities in this scenario, and we test
-both techniques to extract relevant semantic patterns in single-series dream
-reports. LSA showed better performance than Skip-gram on small training
-corpora in two semantic tests. As a case study, we show that LSA can capture
-relevant word associations in dream report series, even in cases with a small
-number of dreams or low-frequency words. 
We propose that LSA can be used to
-explore word associations in dream reports, which could bring new insight
-into this classic research area of psychology.
-"
-3573,1610.01546,"Yueming Sun, Yi Zhang, Yunfei Chen, Roger Jin",Conversational Recommendation System with Unsupervised Learning,cs.CL cs.IR cs.LG," We will demonstrate a conversational product recommendation agent. This
-system shows how we combine research in personalized recommendation systems
-with research in dialogue systems to build a virtual sales agent. Based on new
-deep learning technologies we developed, the virtual agent is capable of
-learning how to interact with users, how to answer user questions, what is the
-next question to ask, and what to recommend when chatting with a human user.
- Normally, a decent conversational agent for a particular domain requires tens
-of thousands of hand-labeled conversational examples or hand-written rules.
-This is a major barrier when launching a conversational agent for a new domain.
-We will explore and demonstrate the effectiveness of the learning solution even
-when there are no hand-written rules or hand-labeled training data.
-"
-3574,1610.01561,"Koustav Rudra, Siddhartha Banerjee, Niloy Ganguly, Pawan Goyal,
- Muhammad Imran, Prasenjit Mitra",Summarizing Situational and Topical Information During Crises,cs.SI cs.CL," The use of microblogging platforms such as Twitter during crises has become
-widespread. More importantly, information disseminated by affected people
-contains useful information like reports of missing and found people, requests
-for urgent needs, etc. For rapid crisis response, humanitarian organizations
-look for situational awareness information to understand and assess the
-severity of the crisis. In this paper, we present a novel framework (i) to
-generate abstractive summaries useful for situational awareness, and (ii) to
-capture sub-topics and present a short informative summary for each of these
-topics. A summary is generated using a two-stage framework that first extracts
-a set of important tweets from the whole set of information through an
-integer linear programming (ILP) based optimization technique and then follows
-a word-graph and concept-event based abstractive summarization technique to
-produce the final summary. The high accuracies obtained for all the tasks show
-the effectiveness of the proposed framework.
-"
-3575,1610.01588,Yftah Ziser and Roi Reichart,Neural Structural Correspondence Learning for Domain Adaptation,cs.CL," Domain adaptation, adapting models from domains rich in labeled training data
-to domains poor in such data, is a fundamental NLP challenge. We introduce a
-neural network model that marries together ideas from two prominent strands of
-research on domain adaptation through representation learning: structural
-correspondence learning (SCL, (Blitzer et al., 2006)) and autoencoder neural
-networks. Particularly, our model is a three-layer neural network that learns
-to encode the nonpivot features of an input example into a low-dimensional
-representation, so that the existence of pivot features (features that are
-prominent in both domains and convey useful information for the NLP task) in
-the example can be decoded from that representation. The low-dimensional
-representation is then employed in a learning algorithm for the task. Moreover,
-we show how to inject pre-trained word embeddings into our model in order to
-improve generalization across examples with similar pivot features. 
On the task
-of cross-domain product sentiment classification (Blitzer et al., 2007),
-consisting of 12 domain pairs, our model outperforms both the SCL and the
-marginalized stacked denoising autoencoder (MSDA, (Chen et al., 2012)) methods
-by 3.77% and 2.17%, respectively, on average across domain pairs.
-"
-3576,1610.01713,James Pustejovsky and Nikhil Krishnaswamy,Generating Simulations of Motion Events from Verbal Descriptions,cs.CL," In this paper, we describe a computational model for motion events in natural
-language that maps from linguistic expressions, through a dynamic event
-interpretation, into three-dimensional temporal simulations in a model.
-Starting with the model from (Pustejovsky and Moszkowicz, 2011), we analyze
-motion events using temporally-traced Labelled Transition Systems. We model the
-distinction between path- and manner-motion in an operational semantics, and
-further distinguish different types of manner-of-motion verbs in terms of the
-mereo-topological relations that hold throughout the process of movement. From
-these representations, we generate minimal models, which are realized as
-three-dimensional simulations in software developed with the game engine,
-Unity. The generated simulations act as a conceptual ""debugger"" for the
-semantics of different motion verbs: that is, by testing for consistency and
-informativeness in the model, simulations expose the presuppositions associated
-with linguistic expressions and their compositions. Because the model
-generation component is still incomplete, this paper focuses on an
-implementation which maps directly from linguistic interpretations into the
-Unity code snippets that create the simulations.
-"
-3577,1610.01720,"French Pope III, Rouzbeh A. Shirvani, Mugizi Robert Rwebangira,
- Mohamed Chouikha, Ayo Taylor, Andres Alarcon Ramirez, Amirsina Torfi","Automatic Detection of Small Groups of Persons, Influential Members,
- Relations and Hierarchy in Written Conversations Using Fuzzy Logic",cs.CL cs.SI," Nowadays a lot of data is collected in online forums. One of the key tasks is
-to determine the social structure of these online groups, for example the
-identification of subgroups within a larger group. We will approach the
-grouping of individuals as a classification problem. The classifier will be
-based on fuzzy logic. The input to the classifier will be linguistic features
-and the degree of relationship (among individuals). The output of the
-classifier will be the groupings of individuals. We also incorporate a method
-that ranks the members of each detected subgroup to identify the hierarchies
-in each subgroup. Data from the HBO television show The Wire is used to analyze
-the efficacy and usefulness of fuzzy-logic-based methods as alternatives to the
-classical statistical methods usually used for these problems. The proposed
-methodology could automatically detect the most influential members of each
-organization in The Wire with 90% accuracy.
-"
-3578,1610.01858,"Muhammad Imran, Sanjay Chawla, Carlos Castillo","A Robust Framework for Classifying Evolving Document Streams in an
- Expert-Machine-Crowd Setting",cs.CL cs.IR," An emerging challenge in the online classification of social media data
-streams is to keep the categories used for classification up-to-date. In this
-paper, we propose an innovative framework based on an Expert-Machine-Crowd
-(EMC) triad to help categorize items by continuously identifying novel concepts
-in heterogeneous data streams often riddled with outliers. 
We unify constrained
-clustering and outlier detection by formulating a novel optimization problem:
-COD-Means. We design an algorithm to solve the COD-Means problem and show that
-COD-Means will not only help detect novel categories but also seamlessly
-discover human annotation errors and improve the overall quality of the
-categorization process. Experiments on diverse real data sets demonstrate that
-our approach is both effective and efficient.
-"
-3579,1610.01874,"Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu",Neural-based Noise Filtering from Word Embeddings,cs.CL," Word embeddings have been demonstrated to benefit NLP tasks impressively.
-Yet, there is room for improvement in the vector representations, because
-current word embeddings typically contain unnecessary information, i.e., noise.
-We propose two novel models to improve word embeddings by unsupervised
-learning, in order to yield word denoising embeddings. The word denoising
-embeddings are obtained by strengthening salient information and weakening
-noise in the original word embeddings, based on a deep feed-forward neural
-network filter. Results from benchmark tasks show that the filtered word
-denoising embeddings outperform the original word embeddings.
-"
-3580,1610.01891,"Sadikin Mujiono, Mohamad Ivan Fanany, Chan Basaruddin","A New Data Representation Based on Training Data Characteristics to
- Extract Drug Named-Entity in Medical Text",cs.CL cs.AI cs.LG cs.NE," One essential task in information extraction from the medical corpus is drug
-name recognition. Compared with text sources from other domains, medical text
-is special and has unique characteristics. In addition, medical text mining
-poses more challenges, e.g., more unstructured text, the fast-growing addition
-of new terms, and a wide range of name variations for the same drug. The mining
-is even more challenging due to the lack of labeled dataset sources and
-external knowledge, as well as the multiple token representations for a single
-drug name that are more common in real application settings. Although many
-approaches have been proposed to tackle the task, some problems remain, with
-poor F-score performance (less than 0.75). This paper presents a new treatment
-of data representation techniques to overcome some of those challenges. We
-propose three data representation techniques based on the characteristics of
-word distribution and word similarities as a result of word embedding training.
-The first technique is evaluated with a standard NN model, i.e., an MLP
-(Multi-Layer Perceptron). The second technique involves two deep network
-classifiers, i.e., DBN (Deep Belief Networks) and SAE (Stacked Denoising
-Encoders). The third technique represents the sentence as a sequence that is
-evaluated with a recurrent NN model, i.e., LSTM (Long Short-Term Memory). In
-extracting the drug name entities, the third technique gives the best F-score
-performance compared to the state of the art, with its average F-score being
-0.8645.
-"
-3581,1610.01910,"Amit Navindgi, Caroline Brun, C\'ecile Boulard Masson, Scott Nowson","Toward Automatic Understanding of the Function of Affective Language in
- Support Groups",cs.CL," Understanding expressions of emotions in support forums has considerable
-value and NLP methods are key to automating this. Many approaches
-understandably use subjective categories which are more fine-grained than a
-straightforward polarity-based spectrum. 
However, the definition of such
-categories is non-trivial and, in fact, we argue for a need to incorporate
-communicative elements even beyond subjectivity. To support our position, we
-report experiments on a sentiment-labelled corpus of posts taken from a medical
-support forum. We argue that not only is a more fine-grained approach to text
-analysis important, but simultaneously recognising the social function behind
-affective expressions enables a more accurate and valuable level of
-understanding.
-"
-3582,1610.02003,Paul Baltescu,Scalable Machine Translation in Memory Constrained Environments,cs.CL," Machine translation is the discipline concerned with developing automated
-tools for translating from one human language to another. Statistical machine
-translation (SMT) is the dominant paradigm in this field. In SMT, translations
-are generated by means of statistical models whose parameters are learned from
-bilingual data. Scalability is a key concern in SMT, as one would like to make
-use of as much data as possible to train better translation systems.
- In recent years, mobile devices with adequate computing power have become
-widely available. Despite being very successful, mobile applications relying on
-NLP systems continue to follow a client-server architecture, which is of
-limited use because access to the internet is often limited and expensive. The
-goal of this dissertation is to show how to construct a scalable machine
-translation system that can operate with the limited resources available on a
-mobile device.
- The main challenge for porting translation systems to mobile devices is
-memory usage. The amount of memory available on a mobile device is far less
-than what is typically available on the server side of a client-server
-application. In this thesis, we investigate alternatives for the two components
-which prevent standard translation systems from working on mobile devices due
-to high memory usage. We show that once these standard components are replaced
-with our proposed alternatives, we obtain a scalable translation system that
-can work on a device with limited memory.
-"
-3583,1610.02124,"Courtney Napoles, Keisuke Sakaguchi, and Joel Tetreault","There's No Comparison: Reference-less Evaluation Metrics in Grammatical
- Error Correction",cs.CL," Current methods for automatically evaluating grammatical error correction
-(GEC) systems rely on gold-standard references. However, these methods suffer
-from penalizing grammatical edits that are correct but not in the gold
-standard. We show that reference-less grammaticality metrics correlate very
-strongly with human judgments and are competitive with the leading
-reference-based evaluation metrics. By interpolating both methods, we achieve
-state-of-the-art correlation with human judgments. Finally, we show that GEC
-metrics are much more reliable when they are calculated at the sentence level
-instead of the corpus level. We have set up a CodaLab site for benchmarking GEC
-output using a common dataset and different evaluation metrics.
-"
-3584,1610.02209,Marta R. Costa-juss\`a and Carlos Escolano,"Morphology Generation for Statistical Machine Translation using Deep
- Learning Techniques",cs.CL stat.ML," Morphology in unbalanced languages remains a big challenge in the context of
-machine translation. In this paper, we propose to de-couple machine translation
-from morphology generation in order to better deal with the problem. 
We
-investigate morphology simplification with a reasonable trade-off between
-expected gain and generation complexity. For the Chinese-Spanish task, the
-optimum morphological simplification is in gender and number. For this purpose,
-we design a new classification architecture which, compared to other standard
-machine learning techniques, obtains the best results. The proposed
-neural-based architecture consists of several layers: an embedding layer, a
-convolutional layer followed by a recurrent neural network, and, finally,
-sigmoid and softmax layers. We obtain classification results of over 98%
-accuracy in gender classification, over 93% in number classification, and an
-overall translation improvement of 0.7 METEOR.
-"
-3585,1610.02213,"\""Ozlem \c{C}etino\u{g}lu and Sarah Schulz and Ngoc Thang Vu",Challenges of Computational Processing of Code-Switching,cs.CL," This paper addresses challenges of Natural Language Processing (NLP) on
-non-canonical multilingual data in which two or more languages are mixed. It
-refers to code-switching, which has become more popular in our daily life and
-therefore receives an increasing amount of attention from the research
-community. We report our experience that covers not only core NLP tasks such
-as normalisation, language identification, language modelling, part-of-speech
-tagging and dependency parsing but also more downstream ones such as machine
-translation and automatic speech recognition. We highlight and discuss the key
-problems for each of the tasks with supporting examples from different language
-pairs and relevant previous work.
-"
-3586,1610.02424,"Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing
- Sun, Stefan Lee, David Crandall, Dhruv Batra","Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence
- Models",cs.AI cs.CL cs.CV," Neural sequence models are widely used to model time-series data. Equally
-ubiquitous is the usage of beam search (BS) as an approximate inference
-algorithm to decode output sequences from these models. BS explores the search
-space in a greedy left-right fashion retaining only the top-B candidates -
-resulting in sequences that differ only slightly from each other. Producing
-lists of nearly identical sequences is not only computationally wasteful but
-also typically fails to capture the inherent ambiguity of complex AI tasks. To
-overcome this problem, we propose Diverse Beam Search (DBS), an alternative to
-BS that decodes a list of diverse outputs by optimizing for a
-diversity-augmented objective. We observe that our method finds better top-1
-solutions by controlling for the exploration and exploitation of the search
-space - implying that DBS is a better search algorithm. Moreover, these gains
-are achieved with minimal computational or memory overhead as compared to
-beam search. To demonstrate the broad applicability of our method, we present
-results on image captioning, machine translation and visual question generation
-using both standard quantitative metrics and qualitative human studies.
-Further, we study the role of diversity for image-grounded language generation
-tasks as the complexity of the image changes. We observe that our method
-consistently outperforms BS and previously proposed techniques for diverse
-decoding from neural sequence models. 
-
" 
-3587,1610.02493,"Mourad Mars, Mounir Zrigui, Mohamed Belgacem, Anis Zouaghi","A Semantic Analyzer for the Comprehension of the Spontaneous Arabic
- Speech",cs.CL," This work is part of a large research project entitled ""Or\'eodule"" aimed at
-developing tools for automatic speech recognition, translation, and synthesis
-for the Arabic language. Our attention has mainly been focused on an attempt to
-improve the probabilistic model on which our semantic decoder is based. To
-achieve this goal, we have decided to test the influence of pertinent context
-use, and of the integration of different types of contextual data, on the
-effectiveness of the semantic decoder. The findings are quite satisfactory.
-"
-3588,1610.02544,"Aaron Steven White, Drew Reisinger, Rachel Rudinger, Kyle Rawlins,
- Benjamin Van Durme",Computational linking theory,cs.CL," A linking theory explains how verbs' semantic arguments are mapped to their
-syntactic arguments---the inverse of the Semantic Role Labeling task from the
-shallow semantic parsing literature. In this paper, we develop the
-Computational Linking Theory framework as a method for implementing and testing
-linking theories proposed in the theoretical literature. We deploy this
-framework to assess two cross-cutting types of linking theory: local v. global
-models and categorical v. featural models. To further investigate the behavior
-of these models, we develop a measurement model in the spirit of previous work
-in semantic role induction: the Semantic Proto-Role Linking Model. We use this
-model, which implements a generalization of Dowty's seminal Proto-Role Theory,
-to induce semantic proto-roles, which we compare to those Dowty proposes.
-"
-3589,1610.02567,"Abbas Chokor, Abeed Sarker, Graciela Gonzalez","Mining the Web for Pharmacovigilance: the Case Study of Duloxetine and
- Venlafaxine",cs.CL cs.CY," Adverse reactions caused by drugs following their release into the market are
-among the leading causes of death in many countries. The rapid growth of
-electronically available health-related information, and the ability to process
-large volumes of it automatically, using natural language processing (NLP)
-and machine learning algorithms, have opened new opportunities for
-pharmacovigilance. Surveys have found that more than 70% of US Internet users
-consult the Internet when they require medical information. In recent years,
-research in this area has addressed Adverse Drug Reaction (ADR)
-pharmacovigilance using social media, mainly Twitter and medical forums and
-websites. This paper will show the information which can be collected from a
-variety of Internet data sources and search engines, mainly Google Trends and
-Google Correlate. While considering the case study of two popular Major
-Depressive Disorder (MDD) drugs, Duloxetine and Venlafaxine, we will provide a
-comparative analysis of their reactions using publicly-available alternative
-data sources.
-"
-3590,1610.02633,"Ahmad Musleh and Nadir Durrani and Irina Temnikova and Preslav Nakov
- and Stephan Vogel and Osama Alsaad",Enabling Medical Translation for Low-Resource Languages,cs.CL," We present research towards bridging the language gap between migrant workers
-in Qatar and medical staff. In particular, we present the first steps towards
-the development of a real-world Hindi-English machine translation system for
-doctor-patient communication. 
As this is a low-resource language pair,
-especially for speech and for the medical domain, our initial focus has been on
-gathering suitable training data from various sources. We applied a variety of
-methods ranging from fully automatic extraction from the Web to manual
-annotation of test data. Moreover, we developed a method for automatically
-augmenting the training data with synthetically generated variants, which
-yielded a very sizable improvement of more than 3 BLEU points absolute.
-"
-3591,1610.02683,"Malika Aubakirova, Mohit Bansal",Interpreting Neural Networks to Improve Politeness Comprehension,cs.CL cs.AI," We present an interpretable neural network approach to predicting and
-understanding politeness in natural language requests. Our models are based on
-simple convolutional neural networks directly on raw text, avoiding any manual
-identification of complex sentiment or syntactic features, while performing
-better than such feature-based models from previous work. More importantly, we
-use the challenging task of politeness prediction as a testbed to next present
-a much-needed understanding of what these successful networks are actually
-learning. For this, we present several network visualizations based on
-activation clusters, first derivative saliency, and embedding space
-transformations, helping us automatically identify several subtle linguistic
-markers of politeness theories. Further, this analysis reveals multiple novel,
-high-scoring politeness strategies which, when added back as new features,
-reduce the accuracy gap between the original featurized system and the neural
-model, thus providing a clear quantitative interpretation of the success of
-these neural networks.
-"
-3592,1610.02692,"Issey Masuda, Santiago Pascual de la Puente and Xavier Giro-i-Nieto",Open-Ended Visual Question-Answering,cs.CL cs.CV cs.MM," This thesis report studies methods to solve Visual Question-Answering (VQA)
-tasks with a Deep Learning framework. As a preliminary step, we explore Long
-Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to
-tackle Question-Answering (text-based). We then modify the previous model to
-accept an image as an input in addition to the question. For this purpose, we
-explore the VGG-16 and K-CNN convolutional neural networks to extract visual
-features from the image. These are merged with the word embedding or with a
-sentence embedding of the question to predict the answer. This work was
-successfully submitted to the Visual Question Answering Challenge 2016, where
-it achieved 53.62% accuracy on the test dataset. The developed software has
-followed the best programming practices and Python code style, providing a
-consistent baseline in Keras for different configurations.
-"
-3593,1610.02736,"Ivan Gonzalez Torre, Bartolo Luque, Lucas Lacasa, Jordi Luque and
- Antoni Hernandez-Fernandez",Emergence of linguistic laws in human voice,physics.soc-ph cs.CL," Linguistic laws constitute one of the quantitative cornerstones of modern
-cognitive sciences and have been routinely investigated in written corpora, or
-in the equivalent transcription of oral corpora. This means that inferences of
-statistical patterns of language in acoustics are biased by the arbitrary,
-language-dependent segmentation of the signal, and virtually precludes the
-possibility of making comparative studies between human voice and other animal
-communication systems. 
Here we bridge this gap by proposing a method that
-allows measuring such patterns in acoustic signals of arbitrary origin, without
-needing access to the underlying language corpus. The method has been applied
-to six different human languages, successfully recovering some well-known laws
-of human communication at timescales even below the phoneme and finding yet
-another link between complexity and criticality in a biological system. These
-methods further pave the way for new comparative studies in animal
-communication or the analysis of signals of unknown code.
-"
-3594,1610.02749,"Huijia Wu, Jiajun Zhang and Chengqing Zong",A Dynamic Window Neural Network for CCG Supertagging,cs.CL," Combinatory Categorial Grammar (CCG) supertagging is a task to assign lexical
-categories to each word in a sentence. Almost all previous methods use fixed
-context window sizes as input features. However, it is obvious that different
-tags usually rely on different context window sizes. This motivates us to build
-a supertagger with a dynamic window approach, which can be treated as an
-attention mechanism on the local contexts. Applying dropout on the dynamic
-filters can be seen as dropping words directly, which is superior to the
-regular dropout on word embeddings. We use this approach to achieve
-state-of-the-art CCG supertagging performance on the standard test set.
-"
-3595,1610.02751,Shiyou Lian,"A New Theoretical and Technological System of Imprecise-Information
- Processing",cs.CL cs.AI," Imprecise-information processing will play an indispensable role in
-intelligent systems, especially in anthropomorphic intelligent systems (such as
-intelligent robots). A new theoretical and technological system of
-imprecise-information processing has been founded in Principles of
-Imprecise-Information Processing: A New Theoretical and Technological System
-[1], which is different from fuzzy technology. The system has a clear hierarchy
-and rigorous structure, which results from the formation principle of imprecise
-information and has solid mathematical and logical bases, and which has many
-advantages beyond fuzzy technology. The system provides a technological
-platform for relevant applications and lays a theoretical foundation for
-further research.
-"
-3596,1610.02806,"Yao Zhou, Cong Liu and Yan Pan",Modelling Sentence Pairs with Tree-structured Attentive Encoder,cs.CL," We describe an attentive encoder that combines tree-structured recursive
-neural networks and sequential recurrent neural networks for modelling sentence
-pairs. Since existing attentive models exert attention on the sequential
-structure, we propose a way to incorporate attention into the tree topology.
-Specifically, given a pair of sentences, our attentive encoder uses the
-representation of one sentence, which is generated via an RNN, to guide the
-structural encoding of the other sentence on the dependency parse tree. We
-evaluate the proposed attentive encoder on three tasks: semantic similarity,
-paraphrase identification and true-false question selection. Experimental
-results show that our encoder outperforms all baselines and achieves
-state-of-the-art results on two tasks.
-"
-3597,1610.02891,"Kaixiang Mo, Shuangyin Li, Yu Zhang, Jiajun Li, Qiang Yang",Personalizing a Dialogue System with Transfer Reinforcement Learning,cs.AI cs.CL cs.LG," It is difficult to train a personalized task-oriented dialogue system because
-the data collected from each individual is often insufficient. 
Personalized
-dialogue systems trained on a small dataset can overfit and make it difficult
-to adapt to different user needs. One way to solve this problem is to consider
-a collection of multiple users' data as a source domain and an individual
-user's data as a target domain, and to perform a transfer learning from the
-source to the target domain. By following this idea, we propose
-""PETAL"" (PErsonalized Task-oriented diALogue), a transfer-learning framework
-based on POMDP to learn a personalized dialogue system. The system first learns
-common dialogue knowledge from the source domain and then adapts this knowledge
-to the target user. This framework can avoid the negative transfer problem by
-considering differences between source and target users. The policy in the
-personalized POMDP can learn to choose different actions appropriately for
-different users. Experimental results on real-world coffee-shopping data and
-simulation data show that our personalized dialogue system can choose different
-optimal actions for different users, and thus effectively improve the dialogue
-quality under the personalized setting.
-"
-3598,1610.02906,"Xiaofei Sun, Jiang Guo, Xiao Ding and Ting Liu",A General Framework for Content-enhanced Network Representation Learning,cs.SI cs.CL cs.LG," This paper investigates the problem of network embedding, which aims at
-learning low-dimensional vector representations of nodes in networks. Most
-existing network embedding methods rely solely on the network structure, i.e.,
-the linkage relationships between nodes, but ignore the rich content
-information associated with it, which is common in real-world networks and
-beneficial to describing the characteristics of a node. In this paper, we
-propose content-enhanced network embedding (CENE), which is capable of jointly
-leveraging the network structure and the content information. Our approach
-integrates text modeling and structure modeling in a general framework by
-treating the content information as a special kind of node. Experiments on
-several real-world networks with application to node classification show that
-our models outperform all existing network embedding methods, demonstrating the
-merits of content information and joint learning.
-"
-3599,1610.03009,"Ali Khodabakhsh, Cenk Demiroglu","Investigation of Synthetic Speech Detection Using Frame- and
- Segment-Specific Importance Weighting",cs.SD cs.CL," Speaker verification systems are vulnerable to spoofing attacks, which
-presents a major problem in their real-life deployment. To date, most of the
-proposed synthetic speech detectors (SSDs) have weighted the importance of
-different segments of speech equally. However, different attack methods have
-different strengths and weaknesses and the traces that they leave may be short-
-or long-term acoustic artifacts. Moreover, those may occur for only particular
-phonemes or sounds. Here, we propose three algorithms that weight
-likelihood-ratio scores of individual frames, phonemes, and sound classes
-depending on their importance for the SSD. Significant improvement over the
-baseline system has been obtained for known attack methods that were used in
-training the SSDs. However, improvement with unknown attack types was not
-substantial. Thus, the types of distortions caused by the unknown systems were
-different and could not be captured better with the proposed SSD
-compared to the baseline SSD. 
-
" 
-3600,1610.03017,"Jason Lee, Kyunghyun Cho and Thomas Hofmann","Fully Character-Level Neural Machine Translation without Explicit
- Segmentation",cs.CL cs.LG," Most existing machine translation systems operate at the level of words,
-relying on explicit segmentation to extract tokens. We introduce a neural
-machine translation (NMT) model that maps a source character sequence to a
-target character sequence without any segmentation. We employ a character-level
-convolutional network with max-pooling at the encoder to reduce the length of
-source representation, allowing the model to be trained at a speed comparable
-to subword-level models while capturing local regularities. Our
-character-to-character model outperforms a recently proposed baseline with a
-subword-level encoder on WMT'15 DE-EN and CS-EN, and gives comparable
-performance on FI-EN and RU-EN. We then demonstrate that it is possible to
-share a single character-level encoder across multiple languages by training a
-model on a many-to-one translation task. In this multilingual setting, the
-character-level encoder significantly outperforms the subword-level encoder on
-all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality
-of the multilingual character-level translation even surpasses the models
-specifically trained on that language pair alone, both in terms of BLEU score
-and human judgment.
-"
-3601,1610.03022,"Yu Zhang, William Chan, Navdeep Jaitly",Very Deep Convolutional Networks for End-to-End Speech Recognition,cs.CL," Sequence-to-sequence models have shown success in end-to-end speech
-recognition. However these models have only used shallow acoustic encoder
-networks. In our work, we successively train very deep convolutional networks
-to add more expressive power and better generalization for end-to-end ASR
-models. We apply network-in-network principles, batch normalization, residual
-connections and convolutional LSTMs to build very deep recurrent and
-convolutional structures. Our models exploit the spectral structure in the
-feature space and add computational depth without overfitting issues. We
-experiment with the WSJ ASR task and achieve 10.5\% word error rate without any
-dictionary or language model using a 15-layer deep network.
-"
-3602,1610.03035,"William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly",Latent Sequence Decompositions,stat.ML cs.CL cs.LG," We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes
-sequences with variable-length output units as a function of both the input
-sequence and the output sequence. We present a training algorithm which samples
-valid extensions and an approximate decoding algorithm. We experiment with the
-Wall Street Journal speech recognition task. Our LSD model achieves 12.9% WER
-compared to a character baseline of 14.8% WER. When combined with a
-convolutional network on the encoder, we achieve 9.6% WER.
-"
-3603,1610.03098,"Aaditya Prakash, Sadid A. Hasan, Kathy Lee, Vivek Datla, Ashequl
- Qadir, Joey Liu, Oladimeji Farri",Neural Paraphrase Generation with Stacked Residual LSTM Networks,cs.CL," In this paper, we propose a novel neural approach for paraphrase generation.
-Conventional paraphrase generation methods either leverage hand-written rules
-and thesauri-based alignments, or use statistical machine learning principles.
-To the best of our knowledge, this work is the first to explore deep learning
-models for paraphrase generation. 
Our primary contribution is a stacked
-residual LSTM network, where we add residual connections between LSTM layers.
-This allows for efficient training of deep LSTMs. We evaluate our model and
-other state-of-the-art deep learning models on three different datasets: PPDB,
-WikiAnswers and MSCOCO. Evaluation results demonstrate that our model
-outperforms sequence-to-sequence, attention-based and bidirectional LSTM
-models on BLEU, METEOR, TER and an embedding-based sentence similarity metric.
-"
-3604,1610.03106,"Hussam Hamdan, Patrice Bellot, Frederic Bechet",Supervised Term Weighting Metrics for Sentiment Analysis in Short Text,cs.CL cs.IR cs.LG," Term weighting metrics assign weights to terms in order to discriminate the
-important terms from the less crucial ones. Due to this characteristic, these
-metrics have attracted growing attention in text classification and recently in
-sentiment analysis. Using the weights given by such metrics could lead to more
-accurate document representation, which may improve the performance of the
-classification. While previous studies have focused on proposing or comparing
-different weighting metrics for two-class document-level sentiment analysis,
-this study proposes to analyse the results given by each metric in order to
-find out the characteristics of good and bad weighting metrics. We therefore
-present an empirical study of fifteen global supervised weighting metrics
-combined with four local weighting metrics adopted from information retrieval.
-We also analyse the behavior of each metric by observing how it distributes
-the terms, and deduce some characteristics which may distinguish the good
-metrics from the bad ones. The evaluation has been done using Support Vector
-Machines on three different datasets: Twitter, restaurant and laptop reviews.
-"
-3605,1610.03112,"Tiancheng Zhao, Ran Zhao, Zhao Meng, Justine Cassell","Leveraging Recurrent Neural Networks for Multimodal Recognition of
- Social Norm Violation in Dialog",cs.CL," Social norms are shared rules that govern and facilitate social interaction.
-Violating such social norms via teasing and insults may serve to upend power
-imbalances or, on the contrary, reinforce solidarity and rapport in
-conversation, rapport which is highly situated and context-dependent. In this
-work, we investigate the task of automatically identifying the phenomena of
-social norm violation in discourse. Towards this goal, we leverage the power of
-recurrent neural networks and the multimodal information present in the
-interaction, and propose a predictive model to recognize social norm violation.
-Using long-term temporal and contextual information, our model achieves an F1
-score of 0.705. Implications of our work regarding developing a social-aware
-agent are discussed.
-"
-3606,1610.03120,Hussam Hamdan,Correlation-Based Method for Sentiment Classification,cs.CL cs.IR," The classic supervised classification algorithms are efficient, but
-time-consuming, complicated and not interpretable, which makes it difficult to
-analyze their results; this limits the possibility of improving them based on
-real observations. In this paper, we propose a new and simple classifier to
-predict the sentiment label of a short text. This model keeps the capacity of
-human interpretability and can be extended to integrate NLP techniques in a
-more interpretable way. Our model is based on a correlation metric which
-measures the degree of association between a sentiment label and a word. 
Ten
-correlation metrics are proposed and evaluated intrinsically. Then, a
-classifier based on each metric is proposed, evaluated and compared to the
-classic classification algorithms, which have proved their performance in many
-studies. Our model outperforms these algorithms with several correlation
-metrics.
-"
-3607,1610.03164,Andrea F. Daniele and Mohit Bansal and Matthew R. Walter,"Navigational Instruction Generation as Inverse Reinforcement Learning
- with Neural Machine Translation",cs.RO cs.AI cs.CL cs.LG," Modern robotics applications that involve human-robot interaction require
-robots to be able to communicate with humans seamlessly and effectively.
-Natural language provides a flexible and efficient medium through which robots
-can exchange information with their human partners. Significant advancements
-have been made in developing robots capable of interpreting free-form
-instructions, but less attention has been devoted to endowing robots with the
-ability to generate natural language. We propose a navigational guide model
-that enables robots to generate natural language instructions that allow humans
-to navigate a priori unknown environments. We first decide which information to
-share with the user according to their preferences, using a policy trained from
-human demonstrations via inverse reinforcement learning. We then ""translate""
-this information into a natural language instruction using a neural
-sequence-to-sequence model that learns to generate free-form instructions from
-natural language corpora. We evaluate our method on a benchmark route
-instruction dataset and achieve a BLEU score of 72.18% when compared to
-human-generated reference instructions. We additionally conduct navigation
-experiments with human participants that demonstrate that our method generates
-instructions that people follow as accurately and easily as those produced by
-humans.
-"
-3608,1610.03165,Xiangang Li and Xihong Wu,"Long Short-Term Memory based Convolutional Recurrent Neural Networks for
- Large Vocabulary Speech Recognition",cs.CL cs.NE," Long short-term memory (LSTM) recurrent neural networks (RNNs) have been
-shown to give state-of-the-art performance on many speech recognition tasks, as
-they are able to provide the learned dynamically changing contextual window of
-all sequence history. On the other hand, convolutional neural networks (CNNs)
-have brought significant improvements to deep feed-forward neural networks
-(FFNNs), as they are able to better reduce spectral variation in the input
-signal. In this paper, a network architecture called the convolutional
-recurrent neural network (CRNN) is proposed by combining the CNN and LSTM RNN.
-In the proposed CRNNs, each speech frame, without adjacent context frames, is
-organized as a number of local feature patches along the frequency axis, and
-then an LSTM network is applied to each feature patch along the time axis. We
-train and compare FFNNs, LSTM RNNs and the proposed LSTM CRNNs in various
-configurations. Experimental results show that the LSTM CRNNs can exceed
-state-of-the-art speech recognition performance.
-"
-3609,1610.03167,"Huijia Wu, Jiajun Zhang and Chengqing Zong",An Empirical Exploration of Skip Connections for Sequential Tagging,cs.CL," In this paper, we empirically explore the effects of various kinds of skip
-connections in stacked bidirectional LSTMs for sequential tagging. 
We
-investigate three kinds of skip connections connecting to LSTM cells: (a) skip
-connections to the gates, (b) skip connections to the internal states and (c)
-skip connections to the cell outputs. We present comprehensive experiments
-showing that skip connections to cell outputs outperform the remaining two.
-Furthermore, we observe that using gated identity functions as skip mappings
-works well. Based on these novel skip connections, we successfully train
-deep stacked bidirectional LSTM models and obtain state-of-the-art results on
-CCG supertagging and comparable results on POS tagging.
-"
-3610,1610.03246,Maisa C. Duarte and Pierre Maret,Toward a new instances of NELL,cs.CL," We are developing a method to start new instances of NELL in various
-languages and thereby develop NELL multilingualism. We base our method on our
-experience with NELL Portuguese and NELL French. This report explains our
-method and develops some research perspectives.
-"
-3611,1610.03256,"G\'abor Gosztolya, Tam\'as Gr\'osz, L\'aszl\'o T\'oth",GMM-Free Flat Start Sequence-Discriminative DNN Training,cs.CL," Recently, attempts have been made to remove Gaussian mixture models (GMM)
-from the training process of deep neural network-based hidden Markov models
-(HMM/DNN). For the GMM-free training of an HMM/DNN hybrid we have to solve two
-problems, namely the initial alignment of the frame-level state labels and the
-creation of context-dependent states. Although flat-start training via
-iteratively realigning and retraining the DNN using a frame-level error
-function is viable, it is quite cumbersome. Here, we propose to use a
-sequence-discriminative training criterion for flat start. While
-sequence-discriminative training is routinely applied only in the final phase
-of model training, we show that with proper caution it is also suitable for
-getting an alignment of context-independent DNN models. For the construction of
-tied states we apply a recently proposed KL-divergence-based state clustering
-method, hence our whole training process is GMM-free. In the experimental
-evaluation we found that the sequence-discriminative flat-start training method
-is not only significantly faster than the straightforward approach of iterative
-retraining and realignment, but the word error rates attained are slightly
-better as well.
-"
-3612,1610.03321,Barbara Plank,Keystroke dynamics as signal for shallow syntactic parsing,cs.CL," Keystroke dynamics have been extensively used in psycholinguistic and writing
-research to gain insights into cognitive processing. But do keystroke logs
-contain actual signal that can be used to learn better natural language
-processing models?
- We postulate that keystroke dynamics contain information about syntactic
-structure that can inform shallow syntactic parsing. To test this hypothesis,
-we explore labels derived from keystroke logs as an auxiliary task in a
-multi-task bidirectional Long Short-Term Memory (bi-LSTM). Our experiments show
-promising results on two shallow syntactic parsing tasks, chunking and CCG
-supertagging. Our model is simple, has the advantage that data can come from
-distinct sources, and produces models that are significantly better than models
-trained on the text annotations alone. 
-
" 
-3613,1610.03342,Lieke Gelderloos and Grzegorz Chrupa{\l}a,"From phonemes to images: levels of representation in a recurrent neural
- model of visually-grounded language learning",cs.CL cs.LG," We present a model of visually-grounded language learning based on stacked
-gated recurrent neural networks which learns to predict visual features given
-an image description in the form of a sequence of phonemes. The learning task
-resembles that faced by human language learners who need to discover both
-structure and meaning from noisy and ambiguous data across modalities. We show
-that our model indeed learns to predict features of the visual context given
-phonetically transcribed image descriptions, and show that it represents
-linguistic information in a hierarchy of levels: lower layers in the stack are
-comparatively more sensitive to form, whereas higher layers are more sensitive
-to meaning.
-"
-3614,1610.03349,"Helen O'Horan, Yevgeni Berzak, Ivan Vuli\'c, Roi Reichart, Anna
- Korhonen","Survey on the Use of Typological Information in Natural Language
- Processing",cs.CL," In recent years linguistic typology, which classifies the world's languages
-according to their functional and structural properties, has been widely used
-to support multilingual NLP. While the growing importance of typological
-information in supporting multilingual tasks has been recognised, no systematic
-survey of existing typological resources and their use in NLP has been
-published. This paper provides such a survey as well as a discussion which we
-hope will both inform and inspire future work in the area.
-"
-3615,1610.03585,"Jon Gauthier, Igor Mordatch",A Paradigm for Situated and Goal-Driven Language Learning,cs.CL," A distinguishing property of human intelligence is the ability to flexibly
-use language in order to communicate complex ideas with other humans in a
-variety of contexts. Research in natural language dialogue should focus on
-designing communicative agents which can integrate themselves into these
-contexts and productively collaborate with humans. In this abstract, we propose
-a general situated language learning paradigm which is designed to bring about
-robust language agents able to cooperate productively with humans.
-"
-3616,1610.03708,"Hendrik Heuer, Christof Monz, Arnold W.M. Smeulders",Generating captions without looking beyond objects,cs.CV cs.CL," This paper explores new evaluation perspectives for image captioning and
-introduces a noun translation task that achieves comparable image caption
-generation performance by translating from a set of nouns to captions. This
-implies that in image captioning, all word categories other than nouns can be
-evoked by a powerful language model without sacrificing performance on n-gram
-precision. The paper also investigates lower and upper bounds of how much
-individual word categories in the captions contribute to the final BLEU score.
-A large possible improvement exists for nouns, verbs, and prepositions.
-"
-3617,1610.03750,"Shanshan Zhang, Slobodan Vucetic","Semi-supervised Discovery of Informative Tweets During the Emerging
- Disasters",cs.CL cs.SI," The first objective towards the effective use of microblogging services such
-as Twitter for situational awareness during emerging disasters is the
-discovery of disaster-related postings. Given the wide range of possible
-disasters, using a pre-selected set of disaster-related keywords for the
-discovery is suboptimal. 
An alternative that we focus on in this work is to train a
-classifier using a small set of labeled postings that become available as a
-disaster is emerging. Our hypothesis is that utilizing large quantities of
-historical microblogs could improve the quality of classification, as compared
-to training a classifier only on the labeled data. We propose to use unlabeled
-microblogs to cluster words into a limited number of clusters and use the word
-clusters as features for classification. To evaluate the proposed
-semi-supervised approach, we used Twitter data from 6 different disasters. Our
-results indicate that when the number of labeled tweets is 100 or less, the
-proposed approach is superior to the standard classification based on the
-bag-of-words feature representation. Our results also reveal that the choice of
-the unlabeled corpus, the choice of word clustering algorithm, and the choice
-of hyperparameters can have a significant impact on the classification
-accuracy.
-"
-3618,1610.03759,"Victor Makarenkov, Bracha Shapira, Lior Rokach",Language Models with Pre-Trained (GloVe) Word Embeddings,cs.CL," In this work we implement the training of a Language Model (LM), using a
-Recurrent Neural Network (RNN) and the GloVe word embeddings introduced by
-Pennington et al. in [1]. The implementation follows the general idea of
-training RNNs for LM tasks presented in [2], but uses a Gated Recurrent Unit
-(GRU) [3] as the memory cell rather than the more commonly used LSTM [4].
-"
-3619,1610.03771,"Marzieh Saeidi, Guillaume Bouchard, Maria Liakata, Sebastian Riedel","SentiHood: Targeted Aspect Based Sentiment Analysis Dataset for Urban
- Neighbourhoods",cs.CL," In this paper, we introduce the task of targeted aspect-based sentiment
-analysis. The goal is to extract fine-grained information with respect to
-entities mentioned in user comments. This work extends both aspect-based
-sentiment analysis, which assumes a single entity per document, and targeted
-sentiment analysis, which assumes a single sentiment towards a target entity.
-In particular, we identify the sentiment towards each aspect of one or more
-entities. As a testbed for this task, we introduce the SentiHood dataset,
-extracted from a question answering (QA) platform where urban neighbourhoods
-are discussed by users. In this context, units of text often mention several
-aspects of one or more neighbourhoods. This is the first time that a generic
-social media platform, in this case a QA platform, is used for fine-grained
-opinion mining. Text coming from QA platforms is far less constrained than
-text from review-specific platforms, on which current datasets are based. We
-develop several strong baselines, relying on logistic regression and
-state-of-the-art recurrent neural networks.
-"
-3620,1610.03807,Linfeng Song and Lin Zhao,Question Generation from a Knowledge Base with Web Exploration,cs.CL," Question generation from a knowledge base (KB) is the task of generating
-questions related to the domain of the input KB. We propose a system for
-generating fluent and natural questions from a KB, which significantly reduces
-the human effort by leveraging massive web resources. 
In more detail, a seed
-question set is first generated by applying a small number of hand-crafted
-templates to the input KB; more questions are then retrieved by iteratively
-issuing already obtained questions as search queries to a standard search
-engine, before finally questions are selected by estimating their fluency and
-domain relevance. Evaluated on 500 randomly selected triples from Freebase, the
-questions generated by our system are judged by human graders to be more fluent
-than those of \newcite{serban-EtAl:2016:P16-1}.
-"
-3621,1610.03914,"Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Christopher Baldassano,
- Janice Chen, Esther Yong, Christopher Honey, Uri Hasson, Peter Ramadge, Ken
- Norman, Sanjeev Arora","Mapping Between fMRI Responses to Movies and their Natural Language
- Annotations",q-bio.NC cs.CL cs.LG," Several research groups have shown how to correlate fMRI responses to the
-meanings of presented stimuli. This paper presents new methods for doing so
-when only a natural language annotation is available as the description of the
-stimulus. We study fMRI data gathered from subjects watching an episode of
-BBC's Sherlock [1], and learn bidirectional mappings between fMRI responses and
-natural language representations. We show how to leverage data from multiple
-subjects watching the same movie to improve the accuracy of the mappings,
-allowing us to succeed at a scene classification task with 72% accuracy (random
-guessing would give 4%) and at a scene ranking task with average rank in the
-top 4% (random guessing would give 50%). The key ingredients are (a) the use of
-the Shared Response Model (SRM) and its variant SRM-ICA [2, 3] to aggregate
-fMRI data from multiple subjects, both of which are shown to be superior to
-standard PCA in producing low-dimensional representations for the tasks in this
-paper; (b) a sentence embedding technique adapted from the natural language
-processing (NLP) literature [4] that produces semantic vector representation of
-the annotations; (c) using previous timestep information in the featurization
-of the predictor data.
-"
-3622,1610.03934,"Hans Krupakar, Keerthika Rajvel, Bharathi B, Angel Deborah S,
- Vallidevi Krishnamurthy",A Survey of Voice Translation Methodologies - Acoustic Dialect Decoder,cs.CL cs.NE cs.SD stat.ML," Speech Translation has always been about giving source text or audio input
-and waiting for the system to give translated output in the desired form. In
-this paper, we present the Acoustic Dialect Decoder (ADD) - a voice-to-voice
-ear-piece translation device. We introduce and survey the recent advances made
-in the field of Speech Engineering, to employ in the ADD, particularly focusing
-on the three major processing steps of Recognition, Translation and Synthesis.
-We tackle the problem of machine understanding of natural language by designing
-a recognition unit for source audio to text, a translation unit for source
-language text to target language text, and a synthesis unit for target language
-text to target language speech. Speech from the surroundings will be recorded
-by the recognition unit present on the ear-piece and translation will start as
-soon as one sentence is successfully read. This way, we hope to give translated
-output as and when input is being read. The recognition unit will use the
-Hidden Markov Model (HMM) based Toolkit (HTK) and hybrid RNN systems with gated
-memory cells, and the synthesis unit will use the HMM-based speech synthesis
-system HTS. 
This
-system will initially be built as an English to Tamil translation device.
-"
-3623,1610.03946,"Jessica Ficler, Yoav Goldberg",A Neural Network for Coordination Boundary Prediction,cs.CL," We propose a neural-network based model for coordination boundary prediction.
-The network is designed to incorporate two signals: the similarity between
-conjuncts and the observation that replacing the whole coordination phrase with
-a conjunct tends to produce a coherent sentence. The modeling makes use of
-several LSTM networks. The model is trained solely on conjunction annotations
-in a Treebank, without using external resources. We show improvements on
-predicting coordination boundaries on the PTB compared to two state-of-the-art
-parsers, as well as improvements over previous coordination boundary prediction
-systems on the Genia corpus.
-"
-3624,1610.03950,"Yunchuan Chen, Lili Mou, Yan Xu, Ge Li, Zhi Jin",Compressing Neural Language Models by Sparse Word Representations,cs.CL cs.LG," Neural networks are among the state-of-the-art techniques for language
-modeling. Existing neural language models typically map discrete words to
-distributed, dense vector representations. After information processing of the
-preceding context words by hidden layers, an output layer estimates the
-probability of the next word. Such approaches are time- and memory-intensive
-because of the large numbers of parameters for word embeddings and the output
-layer. In this paper, we propose to compress neural language models by sparse
-word representations. In the experiments, the number of parameters in our model
-increases very slowly with the growth of the vocabulary size, which is almost
-imperceptible. Moreover, our approach not only reduces the parameter space to a
-large extent, but also improves the performance in terms of the perplexity
-measure.
-"
-3625,1610.03955,"Yiping Song, Lili Mou, Rui Yan, Li Yi, Zinan Zhu, Xiaohua Hu, Ming
 - Zhang",Dialogue Session Segmentation by Embedding-Enhanced TextTiling,cs.CL cs.HC," In human-computer conversation systems, the context of a user-issued
-utterance is particularly important because it provides useful background
-information of the conversation. However, it is unwise to track all previous
-utterances in the current session as not all of them are equally important. In
-this paper, we address the problem of session segmentation. We propose an
-embedding-enhanced TextTiling approach, inspired by the observation that
-conversation utterances are highly noisy, and that word embeddings provide a
-robust way of capturing semantics. Experimental results show that our approach
-achieves better performance than the TextTiling and MMD approaches.
-"
-3626,1610.04120,"Lina M. Rojas Barahona, Milica Gasic, Nikola Mrk\v{s}i\'c, Pei-Hao Su,
 - Stefan Ultes, Tsung-Hsien Wen and Steve Young","Exploiting Sentence and Context Representations in Deep Neural Models
 - for Spoken Language Understanding",cs.AI cs.CL cs.NE," This paper presents a deep learning architecture for the semantic decoder
-component of a Statistical Spoken Dialogue System. In a slot-filling dialogue,
-the semantic decoder predicts the dialogue act and a set of slot-value pairs
-from a set of n-best hypotheses returned by the Automatic Speech Recognition.
-Most current models for spoken language understanding assume (i) word-aligned
-semantic annotations as in sequence taggers and (ii) delexicalisation, or a
-mapping of input words to domain-specific concepts using heuristics that try to
-capture morphological variation but that do not scale to other domains nor to
-language variation (e.g., morphology, synonyms, paraphrasing). In this work
-the semantic decoder is trained using unaligned semantic annotations and it
-uses distributed semantic representation learning to overcome the limitations
-of explicit delexicalisation. The proposed architecture uses a convolutional
-neural network for the sentence representation and a long short-term memory
-network for the context representation. Results are presented for the publicly
-available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a
-significantly higher word error rate (WER).
-"
-3627,1610.04211,Julien Perez and Fei Liu,Gated End-to-End Memory Networks,cs.CL stat.ML," Machine reading using differentiable reasoning models has recently shown
-remarkable progress. In this context, End-to-End trainable Memory Networks,
-MemN2N, have demonstrated promising performance on simple natural language
-based reasoning tasks such as factual reasoning and basic deduction. However,
-other tasks, namely multi-fact question-answering, positional reasoning or
-dialog related tasks, remain challenging particularly due to the necessity of
-more complex interactions between the memory and controller modules composing
-this family of models. In this paper, we introduce a novel end-to-end memory
-access regulation mechanism inspired by the current progress on the connection
-short-cutting principle in the field of computer vision. Concretely, we develop
-a Gated End-to-End trainable Memory Network architecture, GMemN2N. From the
-machine learning perspective, this new capability is learned in an end-to-end
-fashion without the use of any additional supervision signal which is, as far
-as our knowledge goes, the first of its kind. Our experiments show significant
-improvements on the most challenging tasks in the 20 bAbI dataset, without the
-use of any domain knowledge. Then, we show improvements on the dialog bAbI
-tasks including the Dialog State Tracking Challenge (DSTC-2) dataset, which is
-based on real human-bot conversations. On these two datasets, our model sets
-the new state of the art.
-"
-3628,1610.04265,"Hieu Hoang, Nikolay Bogoychev, Lane Schwartz, Marcin Junczys-Dowmunt","Fast, Scalable Phrase-Based SMT Decoding",cs.CL," The utilization of statistical machine translation (SMT) has grown enormously
-over the last decade, with many deployments relying on open-source software
-developed by the NLP community. As commercial use has increased, there is a
-need for software that is optimized for commercial requirements, in particular,
-fast phrase-based decoding and more efficient utilization of modern multicore
-servers.
- In this paper we re-examine the major components of phrase-based decoding and
-decoder implementation with particular emphasis on speed and scalability on
-multicore machines. The result is a drop-in replacement for the Moses decoder
-which is up to fifteen times faster and scales monotonically with the number of
-cores.
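Sentence-level parallelism, one source of the multicore scaling discussed above, can be sketched with a generic worker pool (decode_sentence is a stand-in; none of the actual decoder's internals are reproduced here):

    # Generic sketch of sentence-parallel decoding on a multicore machine.
    from multiprocessing import Pool

    def decode_sentence(source):
        # Stand-in: a real phrase-based decoder would run beam search
        # over phrase pairs and language-model scores here.
        return " ".join(reversed(source.split()))

    def decode_corpus(sentences, workers=4):
        # Sentences are independent, so throughput can scale roughly
        # monotonically with the number of worker processes.
        with Pool(processes=workers) as pool:
            return pool.map(decode_sentence, sentences)

    if __name__ == "__main__":
        print(decode_corpus(["das ist ein test", "guten morgen"]))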
-"
-3629,1610.04345,Fei Liu and Julien Perez and Scott Nowson,"A Language-independent and Compositional Model for Personality Trait
 - Recognition from Short Texts",cs.CL stat.ML," Many methods have been used to recognize author personality traits from text,
-typically combining linguistic feature engineering with shallow learning
-models, e.g. linear regression or Support Vector Machines. This work uses
-deep-learning-based models and atomic features of text, the characters, to
-build hierarchical, vectorial word and sentence representations for trait
-inference. This method, applied to a corpus of tweets, shows state-of-the-art
-performance across five traits and three languages (English, Spanish and
-Italian) compared with prior work in author profiling. The results, supported
-by preliminary visualisation work, are encouraging for the ability to detect
-complex human traits.
-"
-3630,1610.04377,"Diptesh Kanojia, Vishwajeet Kumar, and Krithi Ramamritham",Civique: Using Social Media to Detect Urban Emergencies,cs.CL cs.CY cs.SI," We present the Civique system for emergency detection in urban areas by
-monitoring microblogs like tweets. The system detects emergency-related
-events, and classifies them into appropriate categories like ""fire"",
-""accident"", ""earthquake"", etc. We demonstrate our ideas by classifying Twitter
-posts in real time, visualizing the ongoing event on a map interface and
-alerting users with options to contact relevant authorities, both online and
-offline. We evaluate our classifiers for both the steps, i.e., emergency
-detection and categorization, and obtain F-scores exceeding 70% and 90%,
-respectively. We demonstrate Civique using a web interface and on an Android
-application, in real time, and show its use for both tweet detection and
-visualization.
-"
-3631,1610.04416,"Dimitri Kartsaklis, Mehrnoosh Sadrzadeh",Distributional Inclusion Hypothesis for Tensor-based Composition,cs.CL cs.AI," According to the distributional inclusion hypothesis, entailment between
-words can be measured via the feature inclusions of their distributional
-vectors. In recent work, we showed how this hypothesis can be extended from
-words to phrases and sentences in the setting of compositional distributional
-semantics. This paper focuses on inclusion properties of tensors; its main
-contribution is a theoretical and experimental analysis of how feature
-inclusion works in different concrete models of verb tensors. We present
-results for relational, Frobenius, projective, and holistic methods and compare
-them to the simple vector addition, multiplication, min, and max models. The
-degrees of entailment thus obtained are evaluated via a variety of existing
-word-based measures, such as Weed's and Clarke's, KL-divergence, APinc,
-balAPinc, and two of our previously proposed metrics at the phrase/sentence
-level. We perform experiments on three entailment datasets, investigating which
-version of tensor-based composition achieves the highest performance when
-combined with the sentence-level measures.
-"
-3632,1610.04533,"Issa Atoum, Ahmed Otoom and Narayanan Kulathuramaiyer","A Comprehensive Comparative Study of Word and Sentence Similarity
 - Measures",cs.IR cs.CL," Sentence similarity is considered the basis of many natural language tasks
-such as information retrieval, question answering and text summarization. The
-semantic meaning between compared text fragments is based on the words'
-semantic features and their relationships.
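For illustration, one of the simplest corpus-based measures of the kind compared in such studies is the cosine similarity of averaged word vectors (the toy vectors below are hypothetical):

    import math

    # Toy word vectors; a real corpus-based measure would use embeddings
    # trained on a large corpus.
    VECS = {"cat": [0.9, 0.1], "dog": [0.8, 0.2],
            "sat": [0.1, 0.9], "ran": [0.2, 0.8]}

    def sentence_vector(tokens):
        # Average the vectors of known words into one sentence vector
        # (assumes at least one token is in the vocabulary).
        known = [VECS[t] for t in tokens if t in VECS]
        return [sum(dim) / len(known) for dim in zip(*known)]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    print(cosine(sentence_vector(["cat", "sat"]), sentence_vector(["dog", "ran"])))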
This article reviews a set of word and
-sentence similarity measures and compares them on benchmark datasets. On the
-studied datasets, results showed that hybrid semantic measures perform better
-than both knowledge- and corpus-based measures.
-"
-3633,1610.04658,"Yacine Jernite, Anna Choromanska and David Sontag","Simultaneous Learning of Trees and Representations for Extreme
 - Classification and Density Estimation",stat.ML cs.CL cs.LG," We consider multi-class classification where the predictor has a hierarchical
-structure that allows for a very large number of labels both at train and test
-time. The predictive power of such models can heavily depend on the structure
-of the tree, and although past work showed how to learn the tree structure, it
-expected that the feature vectors remained static. We provide a novel algorithm
-to simultaneously perform representation learning for the input data and
-learning of the hierarchical predictor. Our approach optimizes an objective
-function which favors balanced and easily-separable multi-way node partitions.
-We theoretically analyze this objective, showing that it gives rise to a
-boosting style property and a bound on classification error. We next show how
-to extend the algorithm to conditional density estimation. We empirically
-validate both variants of the algorithm on text classification and language
-modeling, respectively, and show that they compare favorably to common
-baselines in terms of accuracy and running time.
-"
-3634,1610.04718,"Roman Samarev, Andrey Vasnetsov, Elizaveta Smelkova","Generalization of metric classification algorithms for sequences
 - classification and labelling",cs.LG cs.CL," The article deals with the issue of modification of metric classification
-algorithms. In particular, it studies the k-Nearest Neighbours algorithm and
-its application to sequential data. A method of generalization of metric
-classification algorithms is proposed. As a part of it, an algorithm has been
-developed for solving the problem of classification and labelling of sequential
-data. The advantages of the developed classification algorithm in comparison
-with the existing one are also discussed in the article. The effectiveness of
-the proposed algorithm is compared with that of CRF on the chunking task using
-the open CoNLL-2000 dataset.
-"
-3635,1610.04814,D S Guru and Mahamad Suhil,"Term-Class-Max-Support (TCMS): A Simple Text Document Categorization
 - Approach Using Term-Class Relevance Measure",cs.IR cs.CL," In this paper, a simple text categorization method using term-class relevance
-measures is proposed. Initially, text documents are processed to extract
-significant terms present in them. For every term extracted from a document, we
-compute its importance in preserving the content of a class through a novel
-term-weighting scheme known as the Term_Class Relevance (TCR) measure proposed
-by Guru and Suhil (2015) [1]. In this way, for every term, its relevance for
-all the classes present in the corpus is computed and stored in the
-knowledgebase. During testing, the terms present in the test document are
-extracted and the term-class relevance of each term is obtained from the stored
-knowledgebase. To achieve quick search of term weights, a B-tree indexing data
-structure has been adopted. Finally, the class which receives the maximum
-support in terms of term-class relevance is chosen as the class of the given
-test document.
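The max-support decision rule described above is compact enough to sketch directly; the relevance weights here are hypothetical stand-ins for the stored TCR knowledgebase:

    # Sketch of the max-support rule over a term-class relevance table.
    # The weights are hypothetical; the paper computes them with the TCR
    # measure and stores them in a knowledgebase (indexed with a B-tree).
    TCR = {"goal":     {"sports": 0.90, "politics": 0.10},
           "election": {"sports": 0.05, "politics": 0.95},
           "team":     {"sports": 0.70, "politics": 0.20}}

    def classify(terms, classes=("sports", "politics")):
        support = dict.fromkeys(classes, 0.0)
        for term in terms:
            for cls, weight in TCR.get(term, {}).items():
                support[cls] += weight          # accumulate relevance
        return max(support, key=support.get)    # class with maximum support

    print(classify(["goal", "team"]))            # -> sports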
The
-proposed method runs in logarithmic time during testing and is simple to
-implement when compared to other text categorization techniques available in
-the literature. The experiments conducted on various benchmarking datasets have
-revealed that the performance of the proposed method is satisfactory and
-encouraging.
-"
-3636,1610.04841,Raj Nath Patel and Sasikumar M,Translation Quality Estimation using Recurrent Neural Network,cs.CL," This paper describes our submission to the shared task on word/phrase level
-Quality Estimation (QE) in the First Conference on Statistical Machine
-Translation (WMT16). The objective of the shared task was to predict if the
-given word/phrase is a correct/incorrect (OK/BAD) translation in the given
-sentence. In this paper, we propose a novel approach for word level Quality
-Estimation using the Recurrent Neural Network Language Model (RNN-LM)
-architecture. RNN-LMs have been found to be very effective in different Natural
-Language Processing (NLP) applications. RNN-LM is mainly used for vector space
-language modeling for different NLP problems. For this task, we modify the
-architecture of RNN-LM. The modified system predicts a label (OK/BAD) in the
-slot rather than predicting the word. The input to the system is a word
-sequence, similar to the standard RNN-LM. The approach is language independent
-and requires only the translated text for QE. To estimate the phrase level
-quality, we use the output of the word level QE system.
-"
-3637,1610.04989,"Jiacheng Xu, Danlu Chen, Xipeng Qiu and Xuanjing Huang","Cached Long Short-Term Memory Neural Networks for Document-Level
 - Sentiment Classification",cs.CL cs.NE," Recently, neural networks have achieved great success on sentiment
-classification due to their ability to alleviate feature engineering. However,
-one of the remaining challenges is to model long texts in document-level
-sentiment classification under a recurrent architecture because of the
-deficiency of the memory unit. To address this problem, we present a Cached
-Long Short-Term Memory neural network (CLSTM) to capture the overall semantic
-information in long texts. CLSTM introduces a cache mechanism, which divides
-memory into several groups with different forgetting rates and thus enables the
-network to keep sentiment information better within a recurrent unit. The
-proposed CLSTM outperforms the state-of-the-art models on three publicly
-available document-level sentiment analysis datasets.
-"
-3638,1610.05011,"Fandong Meng, Zhengdong Lu, Hang Li, Qun Liu",Interactive Attention for Neural Machine Translation,cs.CL," Conventional attention-based Neural Machine Translation (NMT) conducts
-dynamic alignment in generating the target sentence. By repeatedly reading the
-representation of the source sentence, which stays fixed after being generated
-by the encoder (Bahdanau et al., 2015), the attention mechanism has greatly
-enhanced state-of-the-art NMT. In this paper, we propose a new attention
-mechanism, called INTERACTIVE ATTENTION, which models the interaction between
-the decoder and the representation of the source sentence during translation by
-both reading and writing operations. INTERACTIVE ATTENTION can keep track of
-the interaction history and therefore improve the translation performance.
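A toy numpy sketch of attention that both reads from and writes to the source representation at each decoding step (shapes and the update rule are illustrative, not the paper's parameterization):

    import numpy as np

    rng = np.random.default_rng(0)
    memory = rng.normal(size=(5, 8))   # source annotations: 5 positions x 8 dims
    state = rng.normal(size=8)         # decoder state

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    for step in range(3):
        weights = softmax(memory @ state)          # alignment weights
        context = weights @ memory                 # READ: attentive context
        state = np.tanh(state + context)           # toy decoder update
        memory += 0.1 * np.outer(weights, state)   # WRITE: revise attended cells

    print(memory.shape, state.shape)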
Experiments on the NIST
-Chinese-English translation task show that INTERACTIVE ATTENTION can achieve
-significant improvements over both the previous attention-based NMT baseline
-and some state-of-the-art variants of attention-based NMT (i.e., coverage
-models (Tu et al., 2016)). A neural machine translator with our INTERACTIVE
-ATTENTION can outperform the open-source attention-based NMT system Groundhog
-by 4.22 BLEU points and the open-source phrase-based system Moses by 3.94 BLEU
-points on average over multiple test sets.
-"
-3639,1610.05150,"Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, Min Zhang",Neural Machine Translation Advised by Statistical Machine Translation,cs.CL," Neural Machine Translation (NMT) is a new approach to machine translation
-that has made great progress in recent years. However, recent studies show that
-NMT generally produces fluent but inadequate translations (Tu et al. 2016b; Tu
-et al. 2016a; He et al. 2016; Tu et al. 2017). This is in contrast to
-conventional Statistical Machine Translation (SMT), which usually yields
-adequate but non-fluent translations. It is natural, therefore, to leverage the
-advantages of both models for better translations, and in this work we propose
-to incorporate the SMT model into the NMT framework. More specifically, at each
-decoding step, SMT offers additional recommendations of generated words based
-on the decoding information from NMT (e.g., the generated partial translation
-and attention history). Then we employ an auxiliary classifier to score the SMT
-recommendations and a gating function to combine the SMT recommendations with
-NMT generations, both of which are jointly trained within the NMT architecture
-in an end-to-end manner. Experimental results on Chinese-English translation
-show that the proposed approach achieves significant and consistent
-improvements over state-of-the-art NMT and SMT systems on multiple NIST test
-sets.
-"
-3640,1610.05243,"Jan Niehues, Eunah Cho, Thanh-Le Ha and Alex Waibel",Pre-Translation for Neural Machine Translation,cs.CL," Recently, the development of neural machine translation (NMT) has
-significantly improved the translation quality of automatic machine
-translation. While most sentences are more accurate and fluent than
-translations by statistical machine translation (SMT)-based systems, in some
-cases, the NMT system produces translations that have a completely different
-meaning. This is especially the case when rare words occur.
- When using statistical machine translation, it has already been shown that
-significant gains can be achieved by simplifying the input in a preprocessing
-step. A commonly used example is the pre-reordering approach.
- In this work, we used phrase-based machine translation to pre-translate the
-input into the target language. Then a neural machine translation system
-generates the final hypothesis using the pre-translation. Thereby, we use
-either only the output of the phrase-based machine translation (PBMT) system or
-a combination of the PBMT output and the source sentence.
- We evaluate the technique on the English to German translation task. Using
-this approach we are able to outperform the PBMT system as well as the baseline
-neural MT system by up to 2 BLEU points. We analyzed the influence of the
-quality of the initial system on the final result.
-"
-3641,1610.05256,"W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu
-and G. 
Zweig",Achieving Human Parity in Conversational Speech Recognition,cs.CL eess.AS," Conversational speech recognition has served as a flagship speech recognition
-task since the release of the Switchboard corpus in the 1990s. In this paper,
-we measure the human error rate on the widely used NIST 2000 test set, and find
-that our latest automated system has reached human parity. The error rate of
-professional transcribers is 5.9% for the Switchboard portion of the data, in
-which newly acquainted pairs of people discuss an assigned topic, and 11.3% for
-the CallHome portion where friends and family members have open-ended
-conversations. In both cases, our automated system establishes a new state of
-the art, and edges past the human benchmark, achieving error rates of 5.8% and
-11.0%, respectively. The key to our system's performance is the use of various
-convolutional and LSTM acoustic model architectures, combined with a novel
-spatial smoothing method and lattice-free MMI acoustic training, multiple
-recurrent neural network language modeling approaches, and a systematic use of
-system combination.
-"
-3642,1610.05361,Hassan Taherian,End-to-end attention-based distant speech recognition with Highway LSTM,cs.CL," End-to-end attention-based models have been shown to be competitive
-alternatives to conventional DNN-HMM models in speech recognition systems.
-In this paper, we extend existing end-to-end attention-based models so that
-they can be applied to the Distant Speech Recognition (DSR) task. Specifically,
-we propose an end-to-end attention-based speech recognizer with multichannel
-input that performs sequence prediction directly at the character level. To
-achieve better performance, we also incorporate Highway long short-term memory
-(HLSTM), which outperforms previous models on the AMI distant speech
-recognition task.
-"
-3643,1610.05461,"Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia and
 - Shuly Wintner",Personalized Machine Translation: Preserving Original Author Traits,cs.CL," The language that we produce reflects our personality, and various personal
-and demographic characteristics can be detected in natural language texts. We
-focus on one particular personal trait of the author, gender, and study how it
-is manifested in original texts and in translations. We show that an author's
-gender has a powerful, clear signal in original texts, but this signal is
-obfuscated in human and machine translation. We then propose simple
-domain-adaptation techniques that help retain the original gender traits in the
-translation, without harming the quality of the translation, thereby creating
-more personalized machine translation systems.
-"
-3644,1610.05522,"Giovanni Da San Martino, Alberto Barr\'on-Cede\~no, Salvatore Romeo,
 - Alessandro Moschitti, Shafiq Joty, Fahad A. Al Obaidli, Kateryna Tymoshenko,
 - Antonio Uva",Addressing Community Question Answering in English and Arabic,cs.CL," This paper studies the impact of different types of features applied to
-learning to re-rank questions in community Question Answering. We tested our
-models on two datasets released in SemEval-2016 Task 3 on ""Community Question
-Answering"". Task 3 targeted real-life Web fora both in English and Arabic. Our
-models include bag-of-words features (BoW), syntactic tree kernels (TKs), rank
-features, embeddings, and machine translation evaluation features.
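How such heterogeneous signals might be concatenated into one re-ranking feature vector, with every feature simplified to a stand-in (tree-kernel and embedding scores are passed in precomputed):

    # Simplified stand-ins for the feature families combined in re-ranking.
    def bow_overlap(q1, q2):
        a, b = set(q1.split()), set(q2.split())
        return len(a & b) / max(len(a | b), 1)   # Jaccard over bags of words

    def features(query, candidate, search_rank, tk_score, emb_cosine):
        return [bow_overlap(query, candidate),   # BoW feature
                1.0 / search_rank,               # rank feature (e.g., from GR)
                tk_score,                        # syntactic tree-kernel score
                emb_cosine]                      # embedding similarity

    print(features("how to renew a passport", "passport renewal steps",
                   search_rank=3, tk_score=0.42, emb_cosine=0.77))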
To the best
-of our knowledge, structural kernels have barely been applied to the question
-reranking task, where they have to model paraphrase relations. In the case of
-the English question re-ranking task, we compare our learning to rank (L2R)
-algorithms against a strong baseline given by the Google-generated ranking
-(GR). The results show that i) the shallow structures used in our TKs are
-robust to noisy data and ii) improving GR is possible, but effective BoW
-features and TKs, along with an accurate model of GR features in the L2R
-algorithm used, are required. In the case of the Arabic question re-ranking
-task, we applied tree kernels to syntactic trees of Arabic sentences for the
-first time. Our approaches to both tasks obtained the second-best results on
-SemEval-2016 subtasks B on English and D on Arabic.
-"
-3645,1610.05540,"Josep Crego, Jungi Kim, Guillaume Klein, Anabel Rebollo, Kathy Yang,
 - Jean Senellart, Egor Akhanov, Patrice Brunelle, Aurelien Coquard, Yongchao
 - Deng, Satoshi Enoue, Chiyo Geiss, Joshua Johanson, Ardas Khalsa, Raoum
 - Khiari, Byeongil Ko, Catherine Kobus, Jean Lorieux, Leidiana Martins,
 - Dang-Chuan Nguyen, Alexandra Priori, Thomas Riccardi, Natalia Segal,
 - Christophe Servan, Cyril Tiquet, Bo Wang, Jin Yang, Dakun Zhang, Jing Zhou,
 - Peter Zoldan",SYSTRAN's Pure Neural Machine Translation Systems,cs.CL," Since the first online demonstration of Neural Machine Translation (NMT) by
-LISA, NMT development has recently moved from laboratory to production systems
-as demonstrated by several entities announcing roll-out of NMT engines to
-replace their existing technologies. NMT systems have a large number of
-training configurations and the training process of such systems is usually
-very long, often a few weeks, so the role of experimentation is critical and
-important to share. In this work, we present our approach to production-ready
-systems simultaneously with release of online demonstrators covering a large
-variety of languages (12 languages, for 32 language pairs). We explore
-different practical choices: an efficient and evolutive open-source framework;
-data preparation; network architecture; additional implemented features; tuning
-for production; etc. We discuss evaluation methodology, present our first
-findings, and finally outline further work.
- Our ultimate goal is to share our expertise to build competitive production
-systems for ""generic"" translation. We aim at contributing to setting up a
-collaborative framework to speed up adoption of the technology, foster further
-research efforts and enable the delivery and adoption to/by industry of
-use-case specific engines integrated in real production workflows. Mastering
-the technology would allow us to build translation engines suited for
-particular needs, outperforming current simplest/uniform systems.
-"
-3646,1610.05652,Phuong Le-Hong,"Vietnamese Named Entity Recognition using Token Regular Expressions and
 - Bidirectional Inference",cs.CL," This paper describes an efficient approach to improve the accuracy of a named
-entity recognition system for Vietnamese. The approach combines regular
-expressions over tokens and a bidirectional inference method in a sequence
-labelling model. The proposed method achieves an overall $F_1$ score of 89.66%
-on a test set of an evaluation campaign, organized in late 2016 by the
-Vietnamese Language and Speech Processing (VLSP) community.
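Token regular expressions of this kind usually enter the sequence labeller as features; a small sketch of such a feature extractor (the two patterns are hypothetical, ASCII-simplified, and not the system's actual rules):

    import re

    # Hypothetical token regular expressions fed to a sequence labeller;
    # the actual Vietnamese NER system's rules are not reproduced here.
    PATTERNS = {"is_capitalized": re.compile(r"^[A-Z]"),
                "is_number": re.compile(r"^\d+([.,]\d+)?$")}

    def token_features(tokens, i):
        feats = {"word": tokens[i].lower()}
        for name, pattern in PATTERNS.items():
            feats[name] = bool(pattern.match(tokens[i]))
        # Context on both sides, in the spirit of bidirectional inference.
        feats["prev"] = tokens[i - 1].lower() if i > 0 else "<s>"
        feats["next"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
        return feats

    print(token_features(["Phuong", "Le-Hong", "works", "in", "Hanoi"], 0))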
-"
-3647,1610.05654,Antoni Hern\'andez-Fern\'andez and Ramon Ferrer-i-Cancho,The infochemical core,q-bio.NC cs.CL," Vocalizations and less often gestures have been the object of linguistic
-research over decades. However, the development of a general theory of
-communication with human language as a particular case requires a clear
-understanding of the organization of communication through other means.
-Infochemicals are chemical compounds that carry information and are employed by
-small organisms that cannot emit acoustic signals of optimal frequency to
-achieve successful communication. Here the distribution of infochemicals across
-species is investigated when infochemicals are ranked by their degree, i.e.,
-the number of species with which each is associated (because those species
-produce it or are sensitive to it). The quality of the fit of different
-functions to the dependency between degree and rank is evaluated with a penalty
-for the number of parameters of the function. Surprisingly, a double Zipf (a
-Zipf distribution with two regimes, each with a different exponent) is the
-model yielding the best fit although it is the function with the largest number
-of parameters. This suggests that the worldwide repertoire of infochemicals
-contains a chemical nucleus shared by many species and reminiscent of the core
-vocabularies found for human language in dictionaries or large corpora.
-"
-3648,1610.05670,"Mark Eisen, Santiago Segarra, Gabriel Egan, Alejandro Ribeiro",Stylometric Analysis of Early Modern Period English Plays,cs.CL cs.LG," Function word adjacency networks (WANs) are used to study the authorship of
-plays from the Early Modern English period. In these networks, nodes are
-function words and directed edges between two nodes represent the relative
-frequency of directed co-appearance of the two words. For every analyzed play,
-a WAN is constructed and these are aggregated to generate author profile
-networks. We first study the similarity of writing styles between Early English
-playwrights by comparing the profile WANs. The accuracy of using WANs for
-authorship attribution is then demonstrated by attributing known plays among
-six popular playwrights. Moreover, the WAN method is shown to outperform other
-frequency-based methods on attributing Early English plays. In addition, WANs
-are shown to be reliable classifiers even when attributing collaborative plays.
-For several plays of disputed co-authorship, a deeper analysis is performed by
-attributing every act and scene separately, in which we both corroborate
-existing breakdowns and provide evidence of new assignments.
-"
-3649,1610.05688,"Pranay Dighe, Afsaneh Asaei and Herve Bourlard",Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models,cs.CL cs.AI cs.HC cs.LG," Conventional deep neural networks (DNN) for speech acoustic modeling rely on
-Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary
-class labels as the targets for DNN training. Subword classes in speech
-recognition systems correspond to context-dependent tied states or senones. The
-present work addresses some limitations of GMM-HMM senone alignments for DNN
-training. We hypothesize that the senone probabilities obtained from a DNN
-trained with binary labels can provide more accurate targets to learn better
-acoustic models. However, DNN outputs bear inaccuracies which are exhibited as
-high-dimensional unstructured noise, whereas the informative components are
-structured and low-dimensional.
We exploit principal component analysis (PCA)
-and sparse coding to characterize the senone subspaces. Enhanced probabilities
-obtained from low-rank and sparse reconstructions are used as soft targets for
-DNN acoustic modeling, which also enables training with untranscribed data.
-Experiments conducted on the AMI corpus show a 4.6% relative reduction in word
-error rate.
-"
-3650,1610.05812,Liang Lu and Steve Renals,Small-footprint Highway Deep Neural Networks for Speech Recognition,cs.CL cs.LG," State-of-the-art speech recognition systems typically employ neural network
-acoustic models. However, compared to Gaussian mixture models, deep neural
-network (DNN) based acoustic models often have many more model parameters,
-making it challenging for them to be deployed on resource-constrained
-platforms, such as mobile devices. In this paper, we study the application of
-the recently proposed highway deep neural network (HDNN) for training
-small-footprint acoustic models. HDNNs are depth-gated feedforward neural
-networks, which include two types of gate functions to facilitate the
-information flow through different layers. Our study demonstrates that HDNNs
-are more compact than regular DNNs for acoustic modeling, i.e., they can
-achieve comparable recognition accuracy with many fewer model parameters.
-Furthermore, HDNNs are more controllable than DNNs: the gate functions of an
-HDNN can control the behavior of the whole network using a very small number of
-model parameters. Finally, we show that HDNNs are more adaptable than DNNs. For
-example, simply updating the gate functions using adaptation data can result in
-considerable gains in accuracy. We demonstrate these aspects by experiments
-using the publicly available AMI corpus, which has around 80 hours of training
-data.
-"
-3651,1610.05858,"Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi",Bidirectional LSTM-CRF for Clinical Concept Extraction,cs.CL," Extraction of concepts present in patient clinical records is an essential
-step in clinical research. The 2010 i2b2/VA Workshop on Natural Language
-Processing Challenges for clinical records presented the concept extraction
-(CE) task, with the aim of identifying concepts (such as treatments, tests,
-problems) and classifying them into predefined categories. State-of-the-art CE
-approaches heavily rely on hand-crafted features and domain-specific resources
-which are hard to collect and tune. For this reason, this paper employs a
-bidirectional LSTM with CRF decoding initialized with general-purpose
-off-the-shelf word embeddings for CE. The experimental results achieved on the
-2010 i2b2/VA reference standard corpora using the bidirectional LSTM-CRF rank
-closely with the top-ranked systems.
-"
-3652,1610.05948,"Dhananjay Ram, Debasis Kundu, Rajesh M. Hegde",A Bayesian Approach to Estimation of Speaker Normalization Parameters,cs.SD cs.CL stat.AP," In this work, a Bayesian approach to speaker normalization is proposed to
-compensate for the degradation in performance of a speaker independent speech
-recognition system. The speaker normalization method proposed herein uses the
-technique of vocal tract length normalization (VTLN). The VTLN parameters are
-estimated using a novel Bayesian approach which utilizes the Gibbs sampler, a
-special type of Markov Chain Monte Carlo method. Additionally, the
-hyperparameters are estimated using a maximum likelihood approach. This model
-is used assuming that the human vocal tract can be modeled as a tube of uniform
-cross section.
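As a generic illustration of the Gibbs-sampling machinery (a toy conjugate normal model over synthetic warp factors, not the paper's uniform-tube vocal-tract model):

    import numpy as np

    # Toy Gibbs sampler for the mean and variance of warp factors under a
    # normal likelihood; illustrative only, not the paper's model.
    rng = np.random.default_rng(1)
    warps = rng.normal(1.0, 0.05, size=50)   # synthetic warp-factor estimates
    n = len(warps)

    mu, var, samples = 1.0, 0.01, []
    for _ in range(1000):
        # Sample mu | var, data (flat prior on mu).
        mu = rng.normal(warps.mean(), np.sqrt(var / n))
        # Sample var | mu, data (scaled inverse chi-square).
        var = ((warps - mu) ** 2).sum() / rng.chisquare(n)
        samples.append(mu)

    print(np.mean(samples[200:]))   # posterior mean of mu after burn-in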
This model captures the variation in length of the vocal tract of
-different speakers more effectively than the linear model used in the
-literature. The work has also investigated different methods like minimization
-of the Mean Square Error (MSE) and Mean Absolute Error (MAE) for the estimation
-of VTLN parameters. Both single-pass and two-pass approaches are then used to
-build a VTLN-based speech recognizer. Experimental results on recognition of
-vowels and Hindi phrases from a medium vocabulary indicate that the Bayesian
-method improves the performance by a considerable margin.
-"
-3653,1610.06053,Taraka Rama,"Chinese Restaurant Process for cognate clustering: A threshold free
 - approach",cs.CL," In this paper, we introduce a threshold-free approach, motivated by the
-Chinese Restaurant Process, for the purpose of cognate clustering. We show that
-our approach yields similar results to a linguistically motivated cognate
-clustering system known as LexStat. Our Chinese Restaurant Process system is
-fast, requires no threshold, and can be applied to any language family of the
-world.
-"
-3654,1610.06210,"Rik Koncel-Kedziorski, Ioannis Konstas, Luke Zettlemoyer, and Hannaneh
 - Hajishirzi",A Theme-Rewriting Approach for Generating Algebra Word Problems,cs.CL," Texts present coherent stories that have a particular theme or overall
-setting, for example science fiction or western. In this paper, we present a
-text generation method called {\it rewriting} that edits existing
-human-authored narratives to change their theme without changing the underlying
-story. We apply the approach to math word problems, where it might help
-students stay more engaged by quickly transforming all of their homework
-assignments to the theme of their favorite movie without changing the math
-concepts that are being taught. Our rewriting method uses a two-stage decoding
-process, which proposes new words from the target theme and scores the
-resulting stories according to a number of factors defining aspects of
-syntactic, semantic, and thematic coherence. Experiments demonstrate that the
-final stories typically represent the new theme well while still testing the
-original math concepts, outperforming a number of baselines. We also release a
-new dataset of human-authored rewrites of math word problems in several themes.
-"
-3655,1610.06227,"Mohammad Sadegh Rasooli, Michael Collins",Cross-Lingual Syntactic Transfer with Limited Resources,cs.CL," We describe a simple but effective method for cross-lingual syntactic
-transfer of dependency parsers, in the scenario where a large amount of
-translation data is not available. The method makes use of three steps: 1) a
-method for deriving cross-lingual word clusters, which can then be used in a
-multilingual parser; 2) a method for transferring lexical information from a
-target language to source language treebanks; 3) a method for integrating these
-steps with the density-driven annotation projection method of Rasooli and
-Collins (2015). Experiments show improvements over the state-of-the-art in
-several languages used in previous work, in a setting where the only source of
-translation data is the Bible, a considerably smaller corpus than the Europarl
-corpus used in previous work. Results using the Europarl corpus as a source of
-translation data show additional improvements over the results of Rasooli and
-Collins (2015). We conclude with results on 38 datasets from the Universal
-Dependencies corpora.
-"
-3656,1610.06272,"Bonggun Shin, Timothy Lee and Jinho D. 
Choi",Lexicon Integrated CNN Models with Attention for Sentiment Analysis,cs.CL," With the advent of word embeddings, lexicons are no longer fully utilized for
-sentiment analysis although they still provide important features in the
-traditional setting. This paper introduces a novel approach to sentiment
-analysis that integrates lexicon embeddings and an attention mechanism into
-Convolutional Neural Networks. Our approach performs separate convolutions for
-word and lexicon embeddings and provides a global view of the document using
-attention. Our models are evaluated on both the SemEval'16 Task 4 dataset
-and the Stanford Sentiment Treebank, and show comparable or better results
-against the existing state-of-the-art systems. Our analysis shows that lexicon
-embeddings allow building high-performing models with much smaller word
-embeddings, and the attention mechanism effectively dims out noisy words for
-sentiment analysis.
-"
-3657,1610.06370,Georgios P. Spithourakis and Steffen E. Petersen and Sebastian Riedel,"Clinical Text Prediction with Numerically Grounded Conditional Language
 - Models",cs.CL cs.HC cs.NE," Assisted text input techniques can save time and effort and improve text
-quality. In this paper, we investigate how grounded and conditional extensions
-to standard neural language models can bring improvements in the tasks of word
-prediction and completion. These extensions incorporate a structured knowledge
-base and numerical values from the text into the context used to predict the
-next word. Our automated evaluation on a clinical dataset shows extended models
-significantly outperform standard models. Our best system uses both
-conditioning and grounding, because of their orthogonal benefits. For word
-prediction with a list of 5 suggestions, it improves recall from 25.03% to
-71.28% and for word completion it improves keystroke savings from 34.35% to
-44.81%, where the theoretical bound for this dataset is 58.78%. We also perform
-a qualitative investigation of how models with lower perplexity occasionally
-fare better at the tasks. We found that at test time numbers have more
-influence on the document level than on individual word probabilities.
-"
-3658,1610.06454,Tsendsuren Munkhdalai and Hong Yu,"Reasoning with Memory Augmented Neural Networks for Language
 - Comprehension",cs.CL cs.AI cs.NE stat.ML," Hypothesis testing is an important cognitive process that supports human
-reasoning. In this paper, we introduce a computational hypothesis testing
-approach based on memory augmented neural networks. Our approach involves a
-hypothesis testing loop that reconsiders and progressively refines a previously
-formed hypothesis in order to generate new hypotheses to test. We apply the
-proposed approach to the language comprehension task by using Neural Semantic
-Encoders (NSE). Our NSE models achieve the state-of-the-art results showing an
-absolute improvement of 1.2% to 2.6% accuracy over previous results obtained by
-single and ensemble systems on standard machine comprehension benchmarks such
-as the Children's Book Test (CBT) and Who-Did-What (WDW) news article datasets.
-"
-3659,1610.06498,"Jeaneth Machicao, Edilson A. Corr\^ea Jr., Gisele H. B. Miranda, Diego
 - R. Amancio, Odemir M. Bruno",Authorship Attribution Based on Life-Like Network Automata,cs.CL," Authorship attribution is a problem of considerable practical and
-technical interest. Several methods have been designed to infer the authorship
-of disputed documents in multiple contexts.
While traditional statistical
-methods based solely on word counts and related measurements have provided a
-simple, yet effective solution in particular cases, they are prone to
-manipulation. Recently, texts have been successfully modeled as networks, where
-words are represented by nodes linked according to textual similarity
-measurements. Such models are useful to identify informative topological
-patterns for the authorship recognition task. However, there is no consensus on
-which measurements should be used. Thus, we proposed a novel method to
-characterize text networks by considering both their topological and dynamical
-aspects. Using concepts and methods from cellular automata theory, we devised a
-strategy to grasp informative spatio-temporal patterns from this model. Our
-experiments revealed that this method outperforms traditional analyses relying
-only on topological measurements. Remarkably, we also found that the obtained
-results depend on pre-processing steps (such as lemmatization), a factor that
-has mostly been disregarded in related works. The optimized results obtained
-here pave the way for a better characterization of textual networks.
-"
-3660,1610.06510,Anoop Kunchukuttan and Pushpak Bhattacharyya,"Learning variable length units for SMT between related languages via
 - Byte Pair Encoding",cs.CL," We explore the use of segments learnt using Byte Pair Encoding (referred to
-as BPE units) as basic units for statistical machine translation between
-related languages and compare it with orthographic syllables, which are
-currently the best performing basic units for this translation task. BPE
-identifies the most frequent character sequences as basic units, while
-orthographic syllables are linguistically motivated pseudo-syllables. We show
-that BPE units modestly outperform orthographic syllables as units of
-translation, showing up to an 11% increase in BLEU score. While orthographic
-syllables can be used only for languages whose writing systems use vowel
-representations, BPE is writing system independent and we show that BPE
-outperforms other units for non-vowel writing systems too. Our results are
-supported by extensive experimentation spanning multiple language families and
-writing systems.
-"
-3661,1610.06540,"Shubham Toshniwal, Karen Livescu","Jointly Learning to Align and Convert Graphemes to Phonemes with Neural
 - Attention Models",cs.CL cs.AI," We propose an attention-enabled encoder-decoder model for the problem of
-grapheme-to-phoneme conversion. Most previous work has tackled the problem via
-joint sequence models that require explicit alignments for training. In
-contrast, the attention-enabled encoder-decoder model allows for jointly
-learning to align and convert characters to phonemes. We explore different
-types of attention models, including global and local attention, and our best
-models achieve state-of-the-art results on three standard data sets (CMUDict,
-Pronlex, and NetTalk).
-"
-3662,1610.06542,Graham Neubig,"Lexicons and Minimum Risk Training for Neural Machine Translation:
 - NAIST-CMU at WAT2016",cs.CL," This year, the Nara Institute of Science and Technology (NAIST)/Carnegie
-Mellon University (CMU) submission to the Japanese-English translation track of
-the 2016 Workshop on Asian Translation was based on attentional neural machine
-translation (NMT) models.
In addition to the standard NMT model, we make a
-number of improvements, most notably the use of discrete translation lexicons
-to improve probability estimates, and the use of minimum risk training to
-optimize the MT system for BLEU score. As a result, our system achieved the
-highest translation evaluation scores for the task.
-"
-3663,1610.06550,"Alexander Rosenberg Johansen, Jonas Meinertz Hansen, Elias Khazen
 - Obeid, Casper Kaae S{\o}nderby, Ole Winther",Neural Machine Translation with Characters and Hierarchical Encoding,cs.CL," Most existing Neural Machine Translation models use groups of characters or
-whole words as their unit of input and output. We propose a model with a
-hierarchical char2word encoder, that takes individual characters both as input
-and output. We first argue that this hierarchical representation of the
-character encoder reduces computational complexity, and show that it improves
-translation performance. Secondly, by qualitatively studying attention plots
-from the decoder we find that the model learns to compress common words into a
-single embedding whereas rare words, such as names and places, are represented
-character by character.
-"
-3664,1610.06601,"Alok Ranjan Pal, Anupam Munshi and Diganta Saha","An Approach to Speed-up the Word Sense Disambiguation Procedure through
 - Sense Filtering",cs.CL," In this paper, we focus on speeding up the Word Sense Disambiguation
-procedure by filtering the relevant senses of an ambiguous word through
-Part-of-Speech Tagging. First, the proposed approach performs Part-of-Speech
-Tagging before the disambiguation procedure using a bigram approximation. As a
-result, the exact Part-of-Speech of the ambiguous word at a particular text
-instance is derived. In the next stage, only those dictionary definitions
-(glosses) that are associated with that particular Part-of-Speech are retrieved
-from an online dictionary to disambiguate the exact sense of the ambiguous
-word. In the training phase, we have used the Brown Corpus for Part-of-Speech
-Tagging and WordNet as an online dictionary. The proposed approach reduces the
-execution time to approximately half of the normal execution time for a text
-containing around 200 sentences. Moreover, we have found several instances
-where the correct sense of an ambiguous word is found as a result of applying
-Part-of-Speech Tagging before the disambiguation procedure.
-"
-3665,1610.06602,"Roman Novak, Michael Auli, David Grangier",Iterative Refinement for Machine Translation,cs.CL," Existing machine translation decoding algorithms generate translations in a
-strictly monotonic fashion and never revisit previous decisions. As a result,
-earlier mistakes cannot be corrected at a later stage. In this paper, we
-present a translation scheme that starts from an initial guess and then makes
-iterative improvements that may revisit previous decisions. We parameterize our
-model as a convolutional neural network that predicts discrete substitutions to
-an existing translation based on an attention mechanism over both the source
-sentence as well as the current translation output. By making fewer than one
-modification per sentence, we improve the output of a phrase-based translation
-system by up to 0.4 BLEU on WMT15 German-English translation.
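The refinement loop itself is simple to sketch: keep applying the single best-scoring substitution until no change helps (score() and VOCAB are stand-ins for the paper's learned convolutional model):

    # Sketch of an iterative-refinement loop over token substitutions.
    VOCAB = ["the", "a", "cat", "dog", "sat"]

    def score(tokens):
        # Stand-in scorer; a learned model would judge fluency/adequacy.
        return sum(t == "cat" for t in tokens) - abs(len(tokens) - 3)

    def refine(tokens):
        while True:
            best, best_score = None, score(tokens)
            for i in range(len(tokens)):
                for word in VOCAB:
                    candidate = tokens[:i] + [word] + tokens[i + 1:]
                    if score(candidate) > best_score:
                        best, best_score = candidate, score(candidate)
            if best is None:
                return tokens   # converged: no substitution improves the score
            tokens = best

    print(refine(["the", "dog", "sat"]))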
-" -3666,1610.06620,"Omid Bakhshandeh, Trung Bui, Zhe Lin, Walter Chang",Proposing Plausible Answers for Open-ended Visual Question Answering,cs.CL cs.AI cs.CV," Answering open-ended questions is an essential capability for any intelligent -agent. One of the most interesting recent open-ended question answering -challenges is Visual Question Answering (VQA) which attempts to evaluate a -system's visual understanding through its answers to natural language questions -about images. There exist many approaches to VQA, the majority of which do not -exhibit deeper semantic understanding of the candidate answers they produce. We -study the importance of generating plausible answers to a given question by -introducing the novel task of `Answer Proposal': for a given open-ended -question, a system should generate a ranked list of candidate answers informed -by the semantics of the question. We experiment with various models including a -neural generative model as well as a semantic graph matching one. We provide -both intrinsic and extrinsic evaluations for the task of Answer Proposal, -showing that our best model learns to propose plausible answers with a high -recall and performs competitively with some other solutions to VQA. -" -3667,1610.06700,"Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu",End-to-End Training Approaches for Discriminative Segmental Models,cs.CL cs.LG stat.ML," Recent work on discriminative segmental models has shown that they can -achieve competitive speech recognition performance, using features based on -deep neural frame classifiers. However, segmental models can be more -challenging to train than standard frame-based approaches. While some segmental -models have been successfully trained end to end, there is a lack of -understanding of their training under different settings and with different -losses. - We investigate a model class based on recent successful approaches, -consisting of a linear model that combines segmental features based on an LSTM -frame classifier. Similarly to hybrid HMM-neural network models, segmental -models of this class can be trained in two stages (frame classifier training -followed by linear segmental model weight training), end to end (joint training -of both frame classifier and linear weights), or with end-to-end fine-tuning -after two-stage training. - We study segmental models trained end to end with hinge loss, log loss, -latent hinge loss, and marginal log loss. We consider several losses for the -case where training alignments are available as well as where they are not. - We find that in general, marginal log loss provides the most consistent -strong performance without requiring ground-truth alignments. We also find that -training with dropout is very important in obtaining good performance with -end-to-end training. Finally, the best results are typically obtained by a -combination of two-stage training and fine-tuning. -" -3668,1610.06856,"Khudran Alzhrani, Ethan M. Rudd, Terrance E. Boult, and C. Edward Chow",Automated Big Text Security Classification,cs.CR cs.AI cs.CL cs.CY," In recent years, traditional cybersecurity safeguards have proven ineffective -against insider threats. Famous cases of sensitive information leaks caused by -insiders, including the WikiLeaks release of diplomatic cables and the Edward -Snowden incident, have greatly harmed the U.S. government's relationship with -other governments and with its own citizens. 
Data Leak Prevention (DLP) is a
-solution for detecting and preventing information leaks from within an
-organization's network. However, state-of-the-art DLP detection models are only
-able to detect very limited types of sensitive information, and research in the
-field has been hindered due to the lack of available sensitive texts. Many
-researchers have focused on document-based detection with artificially labeled
-""confidential documents"" for which security labels are assigned to the entire
-document, when in reality only a portion of the document is sensitive. This
-type of whole-document based security labeling increases the chances of
-preventing authorized users from accessing non-sensitive information within
-sensitive documents. In this paper, we introduce Automated Classification
-Enabled by Security Similarity (ACESS), a new and innovative detection model
-that penetrates the complexity of big text security classification/detection.
-To analyze the ACESS system, we constructed a novel dataset, containing
-formerly classified paragraphs from diplomatic cables made public by the
-WikiLeaks organization. To our knowledge this paper is the first to analyze a
-dataset that contains actual formerly sensitive information annotated at
-paragraph granularity.
-"
-3669,1610.07091,"Aditya Joshi, Pranav Goel, Pushpak Bhattacharyya, Mark Carman",Automatic Identification of Sarcasm Target: An Introductory Approach,cs.CL," Past work in computational sarcasm deals primarily with sarcasm detection. In
-this paper, we introduce a novel, related problem: sarcasm target
-identification (i.e., extracting the target of ridicule in a sarcastic
-sentence). We present an introductory approach for sarcasm target
-identification. Our approach employs two types of extractors: one based on
-rules, and another consisting of a statistical classifier. To compare our
-approach, we use two baselines: a na\""ive baseline and another baseline based
-on work in sentiment target identification. We perform our experiments on book
-snippets and tweets, and show that our hybrid approach performs better than the
-two baselines, and also better than using the two extractors individually. Our
-introductory approach establishes the viability of sarcasm target
-identification, and will serve as a baseline for future work.
-"
-3670,1610.07149,"Yiping Song, Rui Yan, Xiang Li, Dongyan Zhao, Ming Zhang","Two are Better than One: An Ensemble of Retrieval- and Generation-Based
 - Dialog Systems",cs.CL," Open-domain human-computer conversation has attracted much attention in the
-field of NLP. Contrary to rule- or template-based domain-specific dialog
-systems, open-domain conversation usually requires data-driven approaches,
-which can be roughly divided into two categories: retrieval-based and
-generation-based systems. Retrieval systems search a user-issued utterance
-(called a query) in a large database, and return a reply that best matches the
-query. Generative approaches, typically based on recurrent neural networks
-(RNNs), can synthesize new replies, but they suffer from the problem of
-generating short, meaningless utterances. In this paper, we propose a novel
-ensemble of retrieval-based and generation-based dialog systems in the open
-domain. In our approach, the retrieved candidate, in addition to the original
-query, is fed to an RNN-based reply generator, so that the neural model is
-aware of more information. The generated reply is then fed back as a new
-candidate for post-reranking.
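The ensemble flow just described, with every neural component replaced by a stand-in:

    # Sketch of the retrieval + generation ensemble: the retrieved candidate
    # is fed to the generator along with the query, and both replies are then
    # re-ranked. All three components are stand-ins for the neural models.
    DB = {"how are you": "i am fine, thanks"}

    def retrieve(query):
        return DB.get(query, "i do not know")

    def generate(query, candidate):
        # A real system runs an RNN conditioned on both query and candidate.
        return "i am fine, and you?"

    def rerank(query, candidates):
        # Stand-in scorer (prefer longer replies); the paper learns this.
        return max(candidates, key=len)

    query = "how are you"
    retrieved = retrieve(query)
    generated = generate(query, retrieved)
    print(rerank(query, [retrieved, generated]))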
Experimental results show that such an ensemble
-outperforms each of its components by a large margin.
-"
-3671,1610.07272,Jiajun Zhang and Chengqing Zong,Bridging Neural Machine Translation and Bilingual Dictionaries,cs.CL," Neural Machine Translation (NMT) has become the new state-of-the-art in
-several language pairs. However, how to integrate NMT with a bilingual
-dictionary, which mainly contains words rarely or never seen in the bilingual
-training data, remains a challenging problem. In this paper, we propose two
-methods to bridge NMT and the bilingual dictionaries. The core idea is to
-design novel models that transform the bilingual dictionaries into adequate
-sentence pairs, so that NMT can distil latent bilingual mappings from the ample
-and repetitive phenomena. One method leverages a mixed word/character model and
-the other attempts to synthesize parallel sentences that guarantee massive
-occurrence of the translation lexicon. Extensive experiments demonstrate that
-the proposed methods can remarkably improve the translation quality, and most
-of the rare words in the test sentences can obtain correct translations if they
-are covered by the dictionary.
-"
-3672,1610.07363,"Arkaitz Zubiaga, Maria Liakata, Rob Procter","Learning Reporting Dynamics during Breaking News for Rumour Detection in
 - Social Media",cs.CL cs.IR cs.SI," Breaking news leads to situations of fast-paced reporting in social media,
-producing all kinds of updates related to news stories, albeit with the caveat
-that some of those early updates tend to be rumours, i.e., information with an
-unverified status at the time of posting. Flagging information that is
-unverified can be helpful to avoid the spread of information that may turn out
-to be false. Detection of rumours can also feed a rumour tracking system that
-ultimately determines their veracity. In this paper we introduce a novel
-approach to rumour detection that learns from the sequential dynamics of
-reporting during breaking news in social media to detect rumours in new
-stories. Using Twitter datasets collected during five breaking news stories, we
-experiment with Conditional Random Fields as a sequential classifier that
-leverages context learnt during an event for rumour detection, which we compare
-with the state-of-the-art rumour detection system as well as other baselines.
-In contrast to existing work, our classifier does not need to observe tweets
-querying a piece of information to deem it a rumour, but instead we detect
-rumours from the tweet alone by exploiting context learnt during the event. Our
-classifier achieves competitive performance, beating the state-of-the-art
-classifier that relies on querying tweets with improved precision and recall,
-as well as outperforming our best baseline with nearly 40% improvement in terms
-of F1 score. The scale and diversity of our experiments reinforces the
-generalisability of our classifier.
-"
-3673,1610.07365,"Thierry Poibeau (LaTTICe), Shravan Vasishth",Introduction: Cognitive Issues in Natural Language Processing,cs.CL cs.AI cs.HC," This special issue is dedicated to getting a better picture of the
-relationships between computational linguistics and cognitive science. It
-specifically raises two questions: ""what is the potential contribution of
-computational language modeling to cognitive science?"" and conversely: ""what is
-the influence of cognitive science in contemporary computational linguistics?""
-"
-3674,1610.07418,"Raj Nath Patel, Prakash B. 
Pimpale, Sasikumar M",Statistical Machine Translation for Indian Languages: Mission Hindi,cs.CL," This paper discusses Centre for Development of Advanced Computing Mumbai's
-(CDACM) submission to the NLP Tools Contest on Statistical Machine Translation
-in Indian Languages (ILSMT) 2014 (collocated with ICON 2014). The objective of
-the contest was to explore the effectiveness of Statistical Machine Translation
-(SMT) for Indian language to Indian language and English-Hindi machine
-translation. In this paper, we have proposed that suffix separation and word
-splitting for SMT from agglutinative languages to Hindi significantly improves
-over the baseline (BL). We have also shown that the factored model with
-reordering outperforms the phrase-based SMT for English-Hindi (\enhi). We
-report our work on all five pairs of languages, namely Bengali-Hindi (\bnhi),
-Marathi-Hindi (\mrhi), Tamil-Hindi (\tahi), Telugu-Hindi (\tehi), and \enhi for
-Health, Tourism, and General domains.
-"
-3675,1610.07420,"Raj Nath Patel, Rohit Gupta, Prakash B. Pimpale, Sasikumar M",Reordering rules for English-Hindi SMT,cs.CL," Reordering is a preprocessing stage for a Statistical Machine Translation
-(SMT) system where the words of the source sentence are reordered as per the
-syntax of the target language. We are proposing a rich set of rules for better
-reordering. The idea is to facilitate the training process by better alignments
-and parallel phrase extraction for a phrase-based SMT system. Reordering also
-helps the decoding process and hence improves the machine translation quality.
-We have observed significant improvements in the translation quality by using
-our approach over the baseline SMT. We have used BLEU, NIST, multi-reference
-word error rate, and multi-reference position independent error rate for
-judging the improvements. We have exploited the open source SMT toolkit MOSES
-to develop the system.
-"
-3676,1610.07432,Douwe Kiela and Luana Bulat and Anita L. Vero and Stephen Clark,"Virtual Embodiment: A Scalable Long-Term Strategy for Artificial
- Intelligence Research",cs.AI cs.CL cs.CV," Meaning has been called the ""holy grail"" of a variety of scientific
-disciplines, ranging from linguistics to philosophy, psychology and the
-neurosciences. The field of Artificial Intelligence (AI) is very much a part of
-that list: the development of sophisticated natural language semantics is a
-sine qua non for achieving a level of intelligence comparable to humans.
-Embodiment theories in cognitive science hold that human semantic
-representation depends on sensori-motor experience; the abundant evidence that
-human meaning representation is grounded in the perception of physical reality
-leads to the conclusion that meaning must depend on a fusion of multiple
-(perceptual) modalities. Despite this, AI research in general, and its
-subdisciplines such as computational linguistics and computer vision in
-particular, have focused primarily on tasks that involve a single modality.
-Here, we propose virtual embodiment as an alternative, long-term strategy for
-AI research that is multi-modal in nature and that allows for the kind of
-scalability required to develop the field coherently and incrementally, in an
-ethically responsible fashion.
-"
-3677,1610.07569,"Jiaqi Mu, Suma Bhat, Pramod Viswanath",Geometry of Polysemy,cs.CL cs.LG stat.ML," Vector representations of words have heralded a transformational approach to
-classical problems in NLP; the most popular example is word2vec.
However, a -single vector does not suffice to model the polysemous nature of many -(frequent) words, i.e., words with multiple meanings. In this paper, we propose -a three-fold approach for unsupervised polysemy modeling: (a) context -representations, (b) sense induction and disambiguation and (c) lexeme (as a -word and sense pair) representations. A key feature of our work is the finding -that a sentence containing a target word is well represented by a low rank -subspace, instead of a point in a vector space. We then show that the subspaces -associated with a particular sense of the target word tend to intersect over a -line (one-dimensional subspace), which we use to disambiguate senses using a -clustering algorithm that harnesses the Grassmannian geometry of the -representations. The disambiguation algorithm, which we call $K$-Grassmeans, -leads to a procedure to label the different senses of the target word in the -corpus -- yielding lexeme vector representations, all in an unsupervised manner -starting from a large (Wikipedia) corpus in English. Apart from several -prototypical target (word,sense) examples and a host of empirical studies to -intuit and justify the various geometric representations, we validate our -algorithms on standard sense induction and disambiguation datasets and present -new state-of-the-art results. -" -3678,1610.07647,"Mark Neumann, Pontus Stenetorp, Sebastian Riedel",Learning to Reason With Adaptive Computation,cs.CL cs.NE stat.ML," Multi-hop inference is necessary for machine learning systems to successfully -solve tasks such as Recognising Textual Entailment and Machine Reading. In this -work, we demonstrate the effectiveness of adaptive computation for learning the -number of inference steps required for examples of different complexity and -that learning the correct number of inference steps is difficult. We introduce -the first model involving Adaptive Computation Time which provides a small -performance benefit on top of a similar model without an adaptive component as -well as enabling considerable insight into the reasoning process of the model. -" -3679,1610.07651,"Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, - Navid Shokouhi, John H.L. Hansen",UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation,cs.CL," This document briefly describes the systems submitted by the Center for -Robust Speech Systems (CRSS) from The University of Texas at Dallas (UTD) to -the 2016 National Institute of Standards and Technology (NIST) Speaker -Recognition Evaluation (SRE). We developed several UBM and DNN i-Vector based -speaker recognition systems with different data sets and feature -representations. Given that the emphasis of the NIST SRE 2016 is on language -mismatch between training and enrollment/test data, so-called domain mismatch, -in our system development we focused on: (1) using unlabeled in-domain data for -centralizing data to alleviate the domain mismatch problem, (2) finding the -best data set for training LDA/PLDA, (3) using newly proposed dimension -reduction technique incorporating unlabeled in-domain data before PLDA -training, (4) unsupervised speaker clustering of unlabeled data and using them -alone or with previous SREs for PLDA training, (5) score calibration using only -unlabeled data and combination of unlabeled and development (Dev) data as -separate experiments. 
-" -3680,1610.07708,"Amit Sheth, Sujan Perera, Sanjaya Wijeratne","Knowledge will Propel Machine Understanding of Content: Extrapolating - from Current Examples",cs.AI cs.CL," Machine Learning has been a big success story during the AI resurgence. One -particular stand out success relates to unsupervised learning from a massive -amount of data, albeit much of it relates to one modality/type of data at a -time. In spite of early assertions of the unreasonable effectiveness of data, -there is increasing recognition of utilizing knowledge whenever it is available -or can be created purposefully. In this paper, we focus on discussing the -indispensable role of knowledge for deeper understanding of complex text and -multimodal data in situations where (i) large amounts of training data -(labeled/unlabeled) are not available or labor intensive to create, (ii) the -objects (particularly text) to be recognized are complex (i.e., beyond simple -entity-person/location/organization names), such as implicit entities and -highly subjective content, and (iii) applications need to use complementary or -related data in multiple modalities/media. What brings us to the cusp of rapid -progress is our ability to (a) create knowledge, varying from comprehensive or -cross domain to domain or application specific, and (b) carefully exploit the -knowledge to further empower or extend the applications of ML/NLP techniques. -Using the early results in several diverse situations - both in data types and -applications - we seek to foretell unprecedented progress in our ability for -deeper understanding and exploitation of multimodal data. -" -3681,1610.07710,"Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran",EmojiNet: Building a Machine Readable Sense Inventory for Emoji,cs.CL," Emoji are a contemporary and extremely popular way to enhance electronic -communication. Without rigid semantics attached to them, emoji symbols take on -different meanings based on the context of a message. Thus, like the word sense -disambiguation task in natural language processing, machines also need to -disambiguate the meaning or sense of an emoji. In a first step toward achieving -this goal, this paper presents EmojiNet, the first machine readable sense -inventory for emoji. EmojiNet is a resource enabling systems to link emoji with -their context-specific meaning. It is automatically constructed by integrating -multiple emoji resources with BabelNet, which is the most comprehensive -multilingual sense inventory available to date. The paper discusses its -construction, evaluates the automatic resource creation process, and presents a -use case where EmojiNet disambiguates emoji usage in tweets. EmojiNet is -available online for use at http://emojinet.knoesis.org. -" -3682,1610.07796,"Carsten Schnober and Steffen Eger and Erik-L\^an Do Dinh and Iryna - Gurevych","Still not there? Comparing Traditional Sequence-to-Sequence Models to - Encoder-Decoder Neural Networks on Monotone String Translation Tasks",cs.CL," We analyze the performance of encoder-decoder neural models and compare them -with well-known established methods. The latter represent different classes of -traditional approaches that are applied to the monotone sequence-to-sequence -tasks OCR post-correction, spelling correction, grapheme-to-phoneme conversion, -and lemmatization. Such tasks are of practical relevance for various -higher-level research fields including digital humanities, automatic text -correction, and speech recognition. 
We investigate how well generic
-deep-learning approaches adapt to these tasks, and how they perform in
-comparison with established and more specialized methods, including our own
-adaptation of pruned CRFs.
-"
-3683,1610.07809,"Florian Boudin, Hugo Mougard, Damien Cram",How Document Pre-processing affects Keyphrase Extraction Performance,cs.CL," The SemEval-2010 benchmark dataset has brought renewed attention to the task
-of automatic keyphrase extraction. This dataset is made up of scientific
-articles that were automatically converted from PDF format to plain text and
-thus require careful preprocessing so that irrelevant spans of text do not
-negatively affect keyphrase extraction performance. In previous work, a wide
-range of document preprocessing techniques were described but their impact on
-the overall performance of keyphrase extraction models is still unexplored.
-Here, we re-assess the performance of several keyphrase extraction models and
-measure their robustness against increasingly sophisticated levels of document
-preprocessing.
-"
-3684,1610.07844,Marcel Bollmann and Anders S{\o}gaard,"Improving historical spelling normalization with bi-directional LSTMs
- and multi-task learning",cs.CL," Natural-language processing of historical documents is complicated by the
-abundance of variant spellings and lack of annotated data. A common approach is
-to normalize the spelling of historical words to modern forms. We explore the
-suitability of a deep neural network architecture for this task, particularly a
-deep bi-LSTM network applied on a character level. Our model compares well to
-previously established normalization algorithms when evaluated on a diverse set
-of texts from Early New High German. We show that multi-task learning with
-additional normalization data can improve our model's performance further.
-"
-3685,1610.07918,"Yossi Adi, Joseph Keshet, Emily Cibelli, Matthew Goldrick",Sequence Segmentation Using Joint RNN and Structured Prediction Models,cs.CL," We describe and analyze a simple and effective algorithm for sequence
-segmentation applied to speech processing tasks. We propose a neural
-architecture that is composed of two modules trained jointly: a recurrent
-neural network (RNN) module and a structured prediction model. The RNN outputs
-are considered as feature functions to the structured model. The overall model
-is trained with a structured loss function which can be designed for the given
-segmentation task. We demonstrate the effectiveness of our method by applying
-it to two simple tasks commonly used in phonetic studies: word segmentation and
-voice onset time segmentation. Results suggest the proposed model is superior
-to previous methods, obtaining state-of-the-art results on the tested
-datasets.
-"
-3686,1610.08000,"Raj Nath Patel, Prakash B. Pimpale",Statistical Machine Translation for Indian Languages: Mission Hindi 2,cs.CL," This paper presents Centre for Development of Advanced Computing Mumbai's
-(CDACM) submission to NLP Tools Contest on Statistical Machine Translation in
-Indian Languages (ILSMT) 2015 (collocated with ICON 2015). The aim of the
-contest was to collectively explore the effectiveness of Statistical Machine
-Translation (SMT) while translating within Indian languages and between English
-and Indian languages.
In this paper, we report our work on all five language -pairs, namely Bengali-Hindi (\bnhi), Marathi-Hindi (\mrhi), Tamil-Hindi -(\tahi), Telugu-Hindi (\tehi), and English-Hindi (\enhi) for Health, Tourism, -and General domains. We have used suffix separation, compound splitting and -preordering prior to SMT training and testing. -" -3687,1610.08078,"Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan and Mohammad Al Hasan",Dis-S2V: Discourse Informed Sen2Vec,cs.CL cs.IR," Vector representation of sentences is important for many text processing -tasks that involve clustering, classifying, or ranking sentences. Recently, -distributed representation of sentences learned by neural models from unlabeled -data has been shown to outperform the traditional bag-of-words representation. -However, most of these learning methods consider only the content of a sentence -and disregard the relations among sentences in a discourse by and large. - In this paper, we propose a series of novel models for learning latent -representations of sentences (Sen2Vec) that consider the content of a sentence -as well as inter-sentence relations. We first represent the inter-sentence -relations with a language network and then use the network to induce contextual -information into the content-based Sen2Vec models. Two different approaches are -introduced to exploit the information in the network. Our first approach -retrofits (already trained) Sen2Vec vectors with respect to the network in two -different ways: (1) using the adjacency relations of a node, and (2) using a -stochastic sampling method which is more flexible in sampling neighbors of a -node. The second approach uses a regularizer to encode the information in the -network into the existing Sen2Vec model. Experimental results show that our -proposed models outperform existing methods in three fundamental information -system tasks demonstrating the effectiveness of our approach. The models -leverage the computational power of multi-core CPUs to achieve fine-grained -computational efficiency. We make our code publicly available upon acceptance. -" -3688,1610.08095,"Mengting Wan, Julian McAuley","Modeling Ambiguity, Subjectivity, and Diverging Viewpoints in Opinion - Question Answering Systems",cs.IR cs.CL," Product review websites provide an incredible lens into the wide variety of -opinions and experiences of different people, and play a critical role in -helping users discover products that match their personal needs and -preferences. To help address questions that can't easily be answered by reading -others' reviews, some review websites also allow users to pose questions to the -community via a question-answering (QA) system. As one would expect, just as -opinions diverge among different reviewers, answers to such questions may also -be subjective, opinionated, and divergent. This means that answering such -questions automatically is quite different from traditional QA tasks, where it -is assumed that a single `correct' answer is available. While recent work -introduced the idea of question-answering using product reviews, it did not -account for two aspects that we consider in this paper: (1) Questions have -multiple, often divergent, answers, and this full spectrum of answers should -somehow be used to train the system; and (2) What makes a `good' answer depends -on the asker and the answerer, and these factors should be incorporated in -order for the system to be more personalized. 
Here we build a new QA dataset
-with 800 thousand questions---and over 3.1 million answers---and show that
-explicitly accounting for personalization and ambiguity leads not only to
-quantitatively better answers, but also to a more nuanced view of the range of
-supporting, but subjective, opinions.
-"
-3689,1610.08229,Amit Mandelbaum and Adi Shalev,Word Embeddings and Their Use In Sentence Classification Tasks,cs.LG cs.CL," This paper has two parts. In the first part we discuss word embeddings. We
-discuss the need for them, some of the methods to create them, and some of
-their interesting properties. We also compare them to image embeddings and see
-how word embedding and image embedding can be combined to perform different
-tasks. In the second part we implement a convolutional neural network trained
-on top of pre-trained word vectors. The network is used for several
-sentence-level classification tasks, and achieves state-of-the-art (or
-comparable) results, demonstrating the great power of pre-trained word
-embeddings over random ones.
-"
-3690,1610.08375,Dimitra Gkatzia,Content Selection in Data-to-Text Systems: A Survey,cs.CL," Data-to-text systems are powerful in generating reports from data
-automatically and thus they simplify the presentation of complex data. Rather
-than presenting data using visualisation techniques, data-to-text systems use
-natural (human) language, which is the most common way for human-human
-communication. In addition, data-to-text systems can adapt their output content
-to users' preferences, background or interests and therefore they can be
-pleasant for users to interact with. Content selection is an important part of
-every data-to-text system, because it is the module that determines which of
-the available information should be conveyed to the user. This survey initially
-introduces the field of data-to-text generation, describes the general
-data-to-text system architecture and then it reviews the state-of-the-art
-content selection methods. Finally, it provides recommendations for choosing an
-approach and discusses opportunities for future research.
-"
-3691,1610.08431,"Zewei Chu, Hai Wang, Kevin Gimpel, David McAllester",Broad Context Language Modeling as Reading Comprehension,cs.CL," Progress in text understanding has been driven by large datasets that test
-particular capabilities, like recent datasets for reading comprehension
-(Hermann et al., 2015). We focus here on the LAMBADA dataset (Paperno et al.,
-2016), a word prediction task requiring broader context than the immediate
-sentence. We view LAMBADA as a reading comprehension problem and apply
-comprehension models based on neural networks. Though these models are
-constrained to choose a word from the context, they improve the state of the
-art on LAMBADA from 7.3% to 49%. We analyze 100 instances, finding that neural
-network readers perform well in cases that involve selecting a name from the
-context based on dialogue or discourse cues but struggle when coreference
-resolution or external knowledge is needed.
-"
-3692,1610.08462,"Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang",Distraction-Based Neural Networks for Document Summarization,cs.CL," Distributed representation learned with neural networks has recently been
-shown to be effective in modeling natural languages at fine granularities such
-as words, phrases, and even sentences.
Whether and how such an approach can be extended -to help model larger spans of text, e.g., documents, is intriguing, and further -investigation would still be desirable. This paper aims to enhance neural -network models for such a purpose. A typical problem of document-level modeling -is automatic summarization, which aims to model documents in order to generate -summaries. In this paper, we propose neural models to train computers not just -to pay attention to specific regions and content of input documents with -attention models, but also distract them to traverse between different content -of a document so as to better grasp the overall meaning for summarization. -Without engineering any features, we train the models on two large datasets. -The models achieve the state-of-the-art performance, and they significantly -benefit from the distraction modeling, particularly when input documents are -long. -" -3693,1610.08557,"A.K.M. Sabbir, Antonio Jimeno Yepes, and Ramakanth Kavuluru","Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept - Embeddings",cs.CL," Biomedical word sense disambiguation (WSD) is an important intermediate task -in many natural language processing applications such as named entity -recognition, syntactic parsing, and relation extraction. In this paper, we -employ knowledge-based approaches that also exploit recent advances in neural -word/concept embeddings to improve over the state-of-the-art in biomedical WSD -using the MSH WSD dataset as the test set. Our methods involve weak supervision -- we do not use any hand-labeled examples for WSD to build our prediction -models; however, we employ an existing well known named entity recognition and -concept mapping program, MetaMap, to obtain our concept vectors. Over the MSH -WSD dataset, our linear time (in terms of numbers of senses and words in the -test instance) method achieves an accuracy of 92.24% which is an absolute 3% -improvement over the best known results obtained via unsupervised or -knowledge-based means. A more expensive approach that we developed relies on a -nearest neighbor framework and achieves an accuracy of 94.34%. Employing dense -vector representations learned from unlabeled free text has been shown to -benefit many language processing tasks recently and our efforts show that -biomedical WSD is no exception to this trend. For a complex and rapidly -evolving domain such as biomedicine, building labeled datasets for larger sets -of ambiguous terms may be impractical. Here, we show that weak supervision that -leverages recent advances in representation learning can rival supervised -approaches in biomedical WSD. However, external knowledge bases (here sense -inventories) play a key role in the improvements achieved. -" -3694,1610.08597,"Sanjaya Wijeratne, Lakshika Balasuriya, Derek Doran, Amit Sheth",Word Embeddings to Enhance Twitter Gang Member Profile Identification,cs.SI cs.CL cs.CY cs.IR," Gang affiliates have joined the masses who use social media to share thoughts -and actions publicly. Interestingly, they use this public medium to express -recent illegal actions, to intimidate others, and to share outrageous images -and statements. Agencies able to unearth these profiles may thus be able to -anticipate, stop, or hasten the investigation of gang-related crimes. This -paper investigates the use of word embeddings to help identify gang members on -Twitter. 
Building on our previous work, we generate word embeddings that
-translate what Twitter users post in their profile descriptions, tweets,
-profile images, and linked YouTube content to a real vector format amenable for
-machine learning classification. Our experimental results show that pre-trained
-word embeddings can boost the accuracy of supervised learning algorithms
-trained over gang members' social media posts.
-"
-3695,1610.08613,{\L}ukasz Kaiser and Samy Bengio,Can Active Memory Replace Attention?,cs.LG cs.CL," Several mechanisms to focus attention of a neural network on selected parts
-of its input or memory have been used successfully in deep learning models in
-recent years. Attention has improved image classification, image captioning,
-speech recognition, generative models, and learning algorithmic tasks, but it
-had probably the largest impact on neural machine translation.
- Recently, similar improvements have been obtained using alternative
-mechanisms that do not focus on a single part of a memory but operate on all of
-it in parallel, in a uniform way. Such a mechanism, which we call active
-memory, improved over attention in algorithmic tasks, image processing, and in
-generative modelling.
- So far, however, active memory has not improved over attention for most
-natural language processing tasks, in particular for machine translation. We
-analyze this shortcoming in this paper and propose an extended model of active
-memory that matches existing attention models on neural machine translation and
-generalizes better to longer sentences. We investigate this model and explain
-why previous active memory models did not succeed. Finally, we discuss when
-active memory brings most benefits and where attention can be a better choice.
-"
-3696,1610.08694,Vered Shwartz and Ido Dagan,"CogALex-V Shared Task: LexNET - Integrated Path-based and Distributional
- Method for the Identification of Semantic Relations",cs.CL," We present a submission to the CogALex 2016 shared task on the corpus-based
-identification of semantic relations, using LexNET (Shwartz and Dagan, 2016),
-an integrated path-based and distributional method for semantic relation
-classification. The reported results in the shared task bring this submission
-to third place on subtask 1 (word relatedness), and first place on subtask 2
-(semantic relation classification), demonstrating the utility of integrating
-the complementary path-based and distributional information sources in
-recognizing concrete semantic relations. Combined with a common similarity
-measure, LexNET performs fairly well on the word relatedness task (subtask 1).
-The relatively low performance of LexNET and all other systems on subtask 2,
-however, confirms the difficulty of the semantic relation classification task,
-and stresses the need to develop additional methods for this task.
-"
-3697,1610.08763,"Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek
- F. Abdelzaher, Jiawei Han","CoType: Joint Extraction of Typed Entities and Relations with Knowledge
- Bases",cs.CL cs.LG," Extracting entities and relations for types of interest from text is
-important for understanding massive text corpora. Traditionally, systems of
-entity relation extraction have relied on human-annotated corpora for training
-and adopted an incremental pipeline. Such systems require additional human
-expertise to be ported to a new domain, and are vulnerable to errors cascading
-down the pipeline.
In this paper, we investigate joint extraction of typed -entities and relations with labeled data heuristically obtained from knowledge -bases (i.e., distant supervision). As our algorithm for type labeling via -distant supervision is context-agnostic, noisy training data poses unique -challenges for the task. We propose a novel domain-independent framework, -called CoType, that runs a data-driven text segmentation algorithm to extract -entity mentions, and jointly embeds entity mentions, relation mentions, text -features and type labels into two low-dimensional spaces (for entity and -relation mentions respectively), where, in each space, objects whose types are -close will also have similar representations. CoType, then using these learned -embeddings, estimates the types of test (unlinkable) mentions. We formulate a -joint optimization problem to learn embeddings from text corpora and knowledge -bases, adopting a novel partial-label loss function for noisy labeled data and -introducing an object ""translation"" function to capture the cross-constraints -of entities and relations on each other. Experiments on three public datasets -demonstrate the effectiveness of CoType across different domains (e.g., news, -biomedical), with an average of 25% improvement in F1 score compared to the -next best method. -" -3698,1610.08815,"Soujanya Poria, Erik Cambria, Devamanyu Hazarika, Prateek Vij","A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural - Networks",cs.CL," Sarcasm detection is a key task for many natural language processing tasks. -In sentiment analysis, for example, sarcasm can flip the polarity of an -""apparently positive"" sentence and, hence, negatively affect polarity detection -performance. To date, most approaches to sarcasm detection have treated the -task primarily as a text categorization problem. Sarcasm, however, can be -expressed in very subtle ways and requires a deeper understanding of natural -language that standard text categorization techniques cannot grasp. In this -work, we develop models based on a pre-trained convolutional neural network for -extracting sentiment, emotion and personality features for sarcasm detection. -Such features, along with the network's baseline features, allow the proposed -models to outperform the state of the art on benchmark datasets. We also -address the often ignored generalizability issue of classifying data that have -not been seen by the models at learning phase. -" -3699,1610.08914,"Ellery Wulczyn, Nithum Thain, Lucas Dixon",Ex Machina: Personal Attacks Seen at Scale,cs.CL," The damage personal attacks cause to online discourse motivates many -platforms to try to curb the phenomenon. However, understanding the prevalence -and impact of personal attacks in online platforms at scale remains -surprisingly difficult. The contribution of this paper is to develop and -illustrate a method that combines crowdsourcing and machine learning to analyze -personal attacks at scale. We show an evaluation method for a classifier in -terms of the aggregated number of crowd-workers it can approximate. We apply -our methodology to English Wikipedia, generating a corpus of over 100k high -quality human-labeled comments and 63M machine-labeled ones from a classifier -that is as good as the aggregate of 3 crowd-workers, as measured by the area -under the ROC curve and Spearman correlation. 
Using this corpus of
-machine-labeled scores, our methodology allows us to explore some of the open
-questions about the nature of online personal attacks. This reveals that the
-majority of personal attacks on Wikipedia are not the result of a few malicious
-users, nor primarily the consequence of allowing anonymous contributions from
-unregistered users.
-"
-3700,1610.09091,"Shijia E, Yang Xiang, Mohan Zhang",Representation Learning Models for Entity Search,cs.CL," We focus on the problem of learning distributed representations for entity
-search queries, named entities, and their short descriptions. With our
-representation learning models, the entity search query, named entity and
-description can be represented as low-dimensional vectors. Our goal is to
-develop a simple but effective model that can make the distributed
-representations of query related entities similar to the query in the vector
-space. Hence, we propose three kinds of learning strategies, and the difference
-between them mainly lies in how to deal with the relationship between an entity
-and its description. We analyze the strengths and weaknesses of each learning
-strategy and validate our methods on public datasets which contain four kinds
-of named entities, i.e., movies, TV shows, restaurants and celebrities. The
-experimental results indicate that our proposed methods can adapt to different
-types of entity search queries, and outperform the current state-of-the-art
-methods based on keyword matching and vanilla word2vec models. Besides, the
-proposed methods can be trained fast and be easily extended to other similar
-tasks.
-"
-3701,1610.09158,"Sebastian Ruder, Parsa Ghaffari, and John G. Breslin",Towards a continuous modeling of natural language domains,cs.CL cs.LG," Humans continuously adapt their style and language to a variety of domains.
-However, a reliable definition of `domain' has eluded researchers thus far.
-Additionally, the notion of discrete domains stands in contrast to the
-multiplicity of heterogeneous domains that humans navigate, many of which
-overlap. In order to better understand the change and variation of human
-language, we draw on research in domain adaptation and extend the notion of
-discrete domains to the continuous spectrum. We propose representation
-learning-based models that can adapt to continuous domains and detail how these
-can be used to investigate variation in language. To this end, we propose to
-use dialogue modeling as a test bed due to its proximity to language modeling
-and its social component.
-"
-3702,1610.09225,"Venkata Sasank Pagolu, Kamal Nayan Reddy Challa, Ganapati Panda,
- Babita Majhi",Sentiment Analysis of Twitter Data for Predicting Stock Market Movements,cs.IR cs.CL cs.SI," Predicting stock market movements is a well-known problem of interest.
-Nowadays, social media perfectly represents the public sentiment and opinion
-about current events. Especially, Twitter has attracted a lot of attention
-from researchers for studying the public sentiments. Stock market prediction
-on the basis of public sentiments expressed on Twitter has been an intriguing
-field of research. Previous studies have concluded that the aggregate public
-mood collected from Twitter may well be correlated with Dow Jones Industrial
-Average Index (DJIA). The thesis of this work is to observe how well the
-changes in stock prices of a company, the rises and falls, are correlated with
-the public opinions being expressed in tweets about that company.
Understanding an author's opinion from a piece of text is the objective
-of sentiment analysis. The present paper has employed two different textual
-representations, Word2vec and N-gram, for analyzing the public sentiments in
-tweets. In this paper, we have applied sentiment analysis and supervised
-machine learning principles to the tweets extracted from Twitter and analyze
-the correlation between stock market movements of a company and sentiments in
-tweets. In an elaborate way, positive news and tweets in social media about a
-company would definitely encourage people to invest in the stocks of that
-company and as a result the stock price of that company would increase. At the
-end of the paper, it is shown that a strong correlation exists between the
-rises and falls in stock prices and the public sentiments in tweets.
-"
-3703,1610.09226,Pavlina Fragkou,"Text Segmentation using Named Entity Recognition and Co-reference
- Resolution in English and Greek Texts",cs.CL cs.IR," In this paper we examine the benefit of performing named entity recognition
-(NER) and co-reference resolution on an English and a Greek corpus used for
-text segmentation. The aim here is to examine whether the combination of text
-segmentation and information extraction can be beneficial for the
-identification of the various topics that appear in a document. NER was
-performed manually in the English corpus and was compared with the output
-produced by publicly available annotation tools, while an already existing
-tool was used for the Greek corpus. Produced annotations from both corpora were
-manually corrected and enriched to cover four types of named entities.
-Co-reference resolution, i.e., substitution of every reference of the same
-instance with the same named entity identifier, was subsequently performed. The
-evaluation, using five text segmentation algorithms for the English corpus and
-four for the Greek corpus, leads to the conclusion that the benefit highly
-depends on the segment's topic, the number of named entity instances appearing
-in it, as well as the segment's length.
-"
-3704,1610.09333,"Antoine J.-P. Tixier, Michalis Vazirgiannis, Matthew R. Hallowell",Word Embeddings for the Construction Domain,cs.CL," We introduce word vectors for the construction domain. Our vectors were
-obtained by running word2vec on an 11M-word corpus that we created from scratch
-by leveraging freely-accessible online sources of construction-related text. We
-first explore the embedding space and show that our vectors capture meaningful
-construction-specific concepts. We then evaluate the performance of our vectors
-against that of ones trained on a 100B-word corpus (Google News) within the
-framework of an injury report classification task. Without any parameter
-tuning, our embeddings give competitive results, and outperform the Google News
-vectors in many cases. Using a keyword-based compression of the reports also
-leads to a significant speed-up with only a limited loss in performance. We
-release our corpus and the data set we created for the classification task as
-publicly available, in the hope that they will be used by future studies for
-benchmarking and building on our work.
-"
-3705,1610.09516,"Lakshika Balasuriya, Sanjaya Wijeratne, Derek Doran, Amit Sheth",Finding Street Gang Members on Twitter,cs.SI cs.CL cs.CY cs.IR," Most street gang members use Twitter to intimidate others, to present
-outrageous images and statements to the world, and to share recent illegal
-activities.
Their tweets may thus be useful to law enforcement agencies to -discover clues about recent crimes or to anticipate ones that may occur. -Finding these posts, however, requires a method to discover gang member Twitter -profiles. This is a challenging task since gang members represent a very small -population of the 320 million Twitter users. This paper studies the problem of -automatically finding gang members on Twitter. It outlines a process to curate -one of the largest sets of verifiable gang member profiles that have ever been -studied. A review of these profiles establishes differences in the language, -images, YouTube links, and emojis gang members use compared to the rest of the -Twitter population. Features from this review are used to train a series of -supervised classifiers. Our classifier achieves a promising F1 score with a low -false positive rate. -" -3706,1610.09565,"Mihaela Rosca, Thomas Breuel",Sequence-to-sequence neural network models for transliteration,cs.CL," Transliteration is a key component of machine translation systems and -software internationalization. This paper demonstrates that neural -sequence-to-sequence models obtain state of the art or close to state of the -art results on existing datasets. In an effort to make machine transliteration -accessible, we open source a new Arabic to English transliteration dataset and -our trained models. -" -3707,1610.09704,"Ji Young Lee, Franck Dernoncourt, Ozlem Uzuner, Peter Szolovits",Feature-Augmented Neural Networks for Patient Note De-identification,cs.CL cs.NE stat.ML," Patient notes contain a wealth of information of potentially great interest -to medical investigators. However, to protect patients' privacy, Protected -Health Information (PHI) must be removed from the patient notes before they can -be legally released, a process known as patient note de-identification. The -main objective for a de-identification system is to have the highest possible -recall. Recently, the first neural-network-based de-identification system has -been proposed, yielding state-of-the-art results. Unlike other systems, it does -not rely on human-engineered features, which allows it to be quickly deployed, -but does not leverage knowledge from human experts or from electronic health -records (EHRs). In this work, we explore a method to incorporate -human-engineered features as well as features derived from EHRs to a -neural-network-based de-identification system. Our results show that the -addition of features, especially the EHR-derived features, further improves the -state-of-the-art in patient note de-identification, including for some of the -most sensitive PHI types such as patient names. Since in a real-life setting -patient notes typically come with EHRs, we recommend developers of -de-identification systems to leverage the information EHRs contain. -" -3708,1610.09722,Jason Naradowsky and Sebastian Riedel,"Represent, Aggregate, and Constrain: A Novel Architecture for Machine - Reading from Noisy Sources",cs.CL," In order to extract event information from text, a machine reading model must -learn to accurately read and interpret the ways in which that information is -expressed. But it must also, as the human reader must, aggregate numerous -individual value hypotheses into a single coherent global analysis, applying -global constraints which reflect prior knowledge of the domain. 
- In this work we focus on the task of extracting plane crash event information
-from clusters of related news articles whose labels are derived via distant
-supervision. Unlike previous machine reading work, we assume that while most
-target values will occur frequently in most clusters, they may also be missing
-or incorrect.
- We introduce a novel neural architecture to explicitly model the noisy nature
-of the data and to deal with these aforementioned learning issues. Our models
-are trained end-to-end and achieve an improvement of more than 12.1 F$_1$ over
-previous work, despite using far less linguistic annotation. We apply factor
-graph constraints to promote more coherent event analyses, with belief
-propagation inference formulated within the transitions of a recurrent neural
-network. We show this technique additionally improves maximum F$_1$ by up to
-2.8 points, resulting in a relative improvement of $50\%$ over the previous
-state-of-the-art.
-"
-3709,1610.09756,"Vinayak Athavale, Shreenivas Bharadwaj, Monik Pamecha, Ameya Prabhu
- and Manish Shrivastava","Towards Deep Learning in Hindi NER: An approach to tackle the Labelled
- Data Scarcity",cs.CL cs.LG," In this paper we describe an end-to-end neural model for Named Entity
-Recognition (NER) which is based on a bi-directional RNN-LSTM. Almost all NER
-systems for Hindi use language-specific features and handcrafted rules with
-gazetteers. Our model is language independent and uses no domain-specific
-features or any handcrafted rules. Our models rely on semantic information in
-the form of word vectors which are learnt by an unsupervised learning algorithm
-on an unannotated corpus. Our model attained state-of-the-art performance in
-both English and Hindi without the use of any morphological analysis or
-gazetteers of any sort.
-"
-3710,1610.09799,"Prakash B. Pimpale, Raj Nath Patel",Experiments with POS Tagging Code-mixed Indian Social Media Text,cs.CL," This paper presents Centre for Development of Advanced Computing Mumbai's
-(CDACM) submission to the NLP Tools Contest on Part-Of-Speech (POS) Tagging For
-Code-mixed Indian Social Media Text (POSCMISMT) 2015 (collocated with ICON
-2015). We submitted results for Hindi (hi), Bengali (bn), and Telugu (te)
-languages mixed with English (en). In this paper, we have described the POS
-tagging approaches we exploited for this task. Machine learning has been used
-to POS tag the mixed language text. For POS tagging, distributed
-representations of words in vector space (word2vec) for feature extraction and
-log-linear models have been tried. We report our work on all three languages
-hi, bn, and te mixed with en.
-"
-3711,1610.09889,"Zhe Wang, Wei He, Hua Wu, Haiyang Wu, Wei Li, Haifeng Wang, Enhong
- Chen",Chinese Poetry Generation with Planning based Neural Network,cs.CL cs.AI," Chinese poetry generation is a very challenging task in natural language
-processing. In this paper, we propose a novel two-stage poetry generating
-method which first plans the sub-topics of the poem according to the user's
-writing intent, and then generates each line of the poem sequentially, using a
-modified recurrent neural network encoder-decoder framework. The proposed
-planning-based method can ensure that the generated poem is coherent and
-semantically consistent with the user's intent.
A comprehensive evaluation with -human judgments demonstrates that our proposed approach outperforms the -state-of-the-art poetry generating methods and the poem quality is somehow -comparable to human poets. -" -3712,1610.09893,Xiang Li and Tao Qin and Jian Yang and Tie-Yan Liu,LightRNN: Memory and Computation-Efficient Recurrent Neural Networks,cs.CL cs.LG," Recurrent neural networks (RNNs) have achieved state-of-the-art performances -in many natural language processing tasks, such as language modeling and -machine translation. However, when the vocabulary is large, the RNN model will -become very big (e.g., possibly beyond the memory capacity of a GPU device) and -its training will become very inefficient. In this work, we propose a novel -technique to tackle this challenge. The key idea is to use 2-Component (2C) -shared embedding for word representations. We allocate every word in the -vocabulary into a table, each row of which is associated with a vector, and -each column associated with another vector. Depending on its position in the -table, a word is jointly represented by two components: a row vector and a -column vector. Since the words in the same row share the row vector and the -words in the same column share the column vector, we only need $2 \sqrt{|V|}$ -vectors to represent a vocabulary of $|V|$ unique words, which are far less -than the $|V|$ vectors required by existing approaches. Based on the -2-Component shared embedding, we design a new RNN algorithm and evaluate it -using the language modeling task on several benchmark datasets. The results -show that our algorithm significantly reduces the model size and speeds up the -training process, without sacrifice of accuracy (it achieves similar, if not -better, perplexity as compared to state-of-the-art language models). -Remarkably, on the One-Billion-Word benchmark Dataset, our algorithm achieves -comparable perplexity to previous language models, whilst reducing the model -size by a factor of 40-100, and speeding up the training process by a factor of -2. We name our proposed algorithm \emph{LightRNN} to reflect its very small -model size and very high training speed. -" -3713,1610.09914,"Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Timothy Baldwin",Named Entity Recognition for Novel Types by Transfer Learning,cs.CL," In named entity recognition, we often don't have a large in-domain training -corpus or a knowledge base with adequate coverage to train a model directly. In -this paper, we propose a method where, given training data in a related domain -with similar (but not identical) named entity (NE) types and a small amount of -in-domain training data, we use transfer learning to learn a domain-specific NE -model. That is, the novelty in the task setup is that we assume not just domain -mismatch, but also label mismatch. -" -3714,1610.09935,"Dominic Seyler, Mohamed Yahya, Klaus Berberich",Knowledge Questions from Knowledge Graphs,cs.CL," We address the novel problem of automatically generating quiz-style knowledge -questions from a knowledge graph such as DBpedia. Questions of this kind have -ample applications, for instance, to educate users about or to evaluate their -knowledge in a specific domain. To solve the problem, we propose an end-to-end -approach. The approach first selects a named entity from the knowledge graph as -an answer. It then generates a structured triple-pattern query, which yields -the answer as its sole result. 
If a multiple-choice question is desired, the
-approach selects alternative answer options. Finally, our approach uses a
-template-based method to verbalize the structured query and yield a natural
-language question. A key challenge is estimating how difficult the generated
-question is to human users. To do this, we make use of historical data from the
-Jeopardy! quiz show and a semantically annotated Web-scale document collection,
-engineer suitable features, and train a logistic regression classifier to
-predict question difficulty. Experiments demonstrate the viability of our
-overall approach.
-"
-3715,1610.09964,Vinu E.V and P Sreenivasa Kumar,Ontology Verbalization using Semantic-Refinement,cs.AI cs.CL," We propose a rule-based technique to generate redundancy-free NL descriptions
-of OWL entities. The existing approaches which address the problem of
-verbalizing OWL ontologies generate NL text segments which are close to their
-counterpart OWL statements. Some of these approaches also perform grouping and
-aggregating of these NL text segments to generate a more fluent and
-comprehensive form of the content. Restricting our attention to the description
-of individuals and concepts, we find that the approach currently followed in
-the available tools is that of determining the set of all logical conditions
-that are satisfied by the given individual/concept name and translating these
-conditions verbatim into corresponding NL descriptions. Human-understandability
-of such descriptions is affected by the presence of repetitions and
-redundancies, as they have high fidelity to their OWL representation. In the
-literature, no efforts have been made to remove redundancies and repetitions at
-the logical level before generating the NL descriptions of entities, and we
-find this to be the main reason for the lack of readability of the generated
-text. Herein, we propose a technique called semantic-refinement (SR) to
-generate meaningful and easily-understandable descriptions of individuals and
-concepts of a given OWL ontology. We identify the combinations of OWL/DL
-constructs that lead to repetitive/redundant descriptions and propose a series
-of refinement rules to rewrite the conditions that are satisfied by an
-individual/concept in a meaning-preserving manner. The reduced set of
-conditions is then employed for generating NL descriptions. Our experiments
-show that SR leads to significantly improved descriptions of ontology
-entities. We also test the effectiveness and usefulness of the generated
-descriptions for the purpose of validating the ontologies and find that the
-proposed technique is indeed helpful in this context.
-"
-3716,1610.09975,"Hagen Soltau, Hank Liao, Hasim Sak","Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large
- Vocabulary Speech Recognition",cs.CL cs.LG cs.NE," We present results that show it is possible to build a competitive, greatly
-simplified, large vocabulary continuous speech recognition system with whole
-words as acoustic units. We model the output vocabulary of about 100,000 words
-directly using deep bi-directional LSTM RNNs with CTC loss. The model is
-trained on 125,000 hours of semi-supervised acoustic training data, which
-enables us to alleviate the data sparsity problem for word models.
We show that
-the CTC word models work very well as an end-to-end all-neural speech
-recognition model without the use of traditional context-dependent sub-word
-phone units that require a pronunciation lexicon, and without any language
-model, removing the need to decode. We demonstrate that the CTC word models
-perform better than a strong, more complex, state-of-the-art baseline with
-sub-word units.
-"
-3717,1610.09982,"Lopamudra Dey, Sanjay Chakraborty, Anuraag Biswas, Beepa Bose, Sweta
- Tiwari","Sentiment Analysis of Review Datasets Using Naive Bayes and K-NN
- Classifier",cs.IR cs.CL," The advent of Web 2.0 has led to an increase in the amount of sentimental
-content available on the Web. Such content is often found in social media web
-sites in the form of movie or product reviews, user comments, testimonials,
-messages in discussion forums, etc. Timely discovery of the sentimental or
-opinionated web content has a number of advantages, the most important of all
-being monetization. Understanding of the sentiments of human masses towards
-different entities and products enables better services for contextual
-advertisements, recommendation systems and analysis of market trends. The focus
-of our project is a sentiment-focused web crawling framework to facilitate the
-quick discovery of sentimental contents of movie and hotel reviews and the
-analysis of the same. We use statistical methods to capture elements of
-subjective style and the sentence polarity. The paper elaborately discusses two
-supervised machine learning algorithms: K-Nearest Neighbour (K-NN) and Naive
-Bayes, and compares their overall accuracy, precision, and recall values. It
-was seen that in the case of movie reviews Naive Bayes gave far better results
-than K-NN, but for hotel reviews the two algorithms gave lower, almost
-identical accuracies.
-"
-3718,1610.09995,Uladzimir Sidarenka and Manfred Stede,Generating Sentiment Lexicons for German Twitter,cs.CL," Despite substantial progress made in developing new sentiment lexicon
-generation (SLG) methods for English, the task of transferring these approaches
-to other languages and domains in a sound way still remains open. In this
-paper, we contribute to the solution of this problem by systematically
-comparing semi-automatic translations of common English polarity lists with the
-results of the original automatic SLG algorithms, which were applied directly
-to German data. We evaluate these lexicons on a corpus of 7,992 manually
-annotated tweets. In addition to that, we also collate the results of
-dictionary- and corpus-based SLG methods in order to find out which of these
-paradigms is better suited for the inherently noisy domain of social media. Our
-experiments show that semi-automatic translations notably outperform automatic
-systems (reaching a macro-averaged F1-score of 0.589), and that
-dictionary-based techniques produce much better polarity lists as compared to
-corpus-based approaches (whose best F1-scores run up to 0.479 and 0.419
-respectively) even for the non-standard Twitter genre.
-"
-3719,1610.09996,"Yang Yu, Wei Zhang, Kazi Hasan, Mo Yu, Bing Xiang, Bowen Zhou",End-to-End Answer Chunk Extraction and Ranking for Reading Comprehension,cs.CL," This paper proposes dynamic chunk reader (DCR), an end-to-end neural reading
-comprehension (RC) model that is able to extract and rank a set of answer
-candidates from a given document to answer questions.
DCR is able to predict -answers of variable lengths, whereas previous neural RC models primarily -focused on predicting single tokens or entities. DCR encodes a document and an -input question with recurrent neural networks, and then applies a word-by-word -attention mechanism to acquire question-aware representations for the document, -followed by the generation of chunk representations and a ranking module to -propose the top-ranked chunk as the answer. Experimental results show that DCR -achieves state-of-the-art exact match and F1 scores on the SQuAD dataset. -" -3720,1610.10099,"Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, - Alex Graves, Koray Kavukcuoglu",Neural Machine Translation in Linear Time,cs.CL cs.LG," We present a novel neural network for processing sequences. The ByteNet is a -one-dimensional convolutional neural network that is composed of two parts, one -to encode the source sequence and the other to decode the target sequence. The -two network parts are connected by stacking the decoder on top of the encoder -and preserving the temporal resolution of the sequences. To address the -differing lengths of the source and the target, we introduce an efficient -mechanism by which the decoder is dynamically unfolded over the representation -of the encoder. The ByteNet uses dilation in the convolutional layers to -increase its receptive field. The resulting network has two core properties: it -runs in time that is linear in the length of the sequences and it sidesteps the -need for excessive memorization. The ByteNet decoder attains state-of-the-art -performance on character-level language modelling and outperforms the previous -best results obtained with recurrent networks. The ByteNet also achieves -state-of-the-art performance on character-to-character machine translation on -the English-to-German WMT translation task, surpassing comparable neural -translation models that are based on recurrent networks with attentional -pooling and run in quadratic time. We find that the latent alignment structure -contained in the representations reflects the expected alignment between the -tokens. -" -3721,1611.00020,"Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao","Neural Symbolic Machines: Learning Semantic Parsers on Freebase with - Weak Supervision",cs.CL cs.AI cs.LG," Harnessing the statistical power of neural networks to perform language -understanding and symbolic reasoning is difficult, when it requires executing -efficient discrete operations against a large knowledge-base. In this work, we -introduce a Neural Symbolic Machine, which contains (a) a neural ""programmer"", -i.e., a sequence-to-sequence model that maps language utterances to programs -and utilizes a key-variable memory to handle compositionality (b) a symbolic -""computer"", i.e., a Lisp interpreter that performs program execution, and helps -find good programs by pruning the search space. We apply REINFORCE to directly -optimize the task reward of this structured prediction problem. To train with -weak supervision and improve the stability of REINFORCE, we augment it with an -iterative maximum-likelihood training process. NSM outperforms the -state-of-the-art on the WebQuestionsSP dataset when trained from -question-answer pairs only, without requiring any feature engineering or -domain-specific knowledge. -" -3722,1611.00027,"Mahmoud El-Defrawy, Yasser El-Sonbaty and Nahla A. 
Belal",CBAS: context based arabic stemmer,cs.CL cs.IR," Arabic morphology encapsulates many valuable features such as word root. -Arabic roots are being utilized for many tasks; the process of extracting a -word root is referred to as stemming. Stemming is an essential part of most -Natural Language Processing tasks, especially for derivative languages such as -Arabic. However, stemming is faced with the problem of ambiguity, where two or -more roots could be extracted from the same word. On the other hand, -distributional semantics is a powerful co-occurrence model. It captures the -meaning of a word based on its context. In this paper, a distributional -semantics model utilizing Smoothed Pointwise Mutual Information (SPMI) is -constructed to investigate its effectiveness on the stemming analysis task. It -showed an accuracy of 81.5%, with a at least 9.4% improvement over other -stemmers. -" -3723,1611.00068,"Richard Sproat, Navdeep Jaitly",RNN Approaches to Text Normalization: A Challenge,cs.CL," This paper presents a challenge to the community: given a large corpus of -written text aligned to its normalized spoken form, train an RNN to learn the -correct normalization function. We present a data set of general text where the -normalizations were generated using an existing text normalization component of -a text-to-speech system. This data set will be released open-source in the near -future. - We also present our own experiments with this data set with a variety of -different RNN architectures. While some of the architectures do in fact produce -very good results when measured in terms of overall accuracy, the errors that -are produced are problematic, since they would convey completely the wrong -message if such a system were deployed in a speech application. On the other -hand, we show that a simple FST-based filter can mitigate those errors, and -achieve a level of accuracy not achievable by the RNN alone. - Though our conclusions are largely negative on this point, we are actually -not arguing that the text normalization problem is intractable using an pure -RNN approach, merely that it is not going to be something that can be solved -merely by having huge amounts of annotated text data and feeding that to a -general RNN model. And when we open-source our data, we will be providing a -novel data set for sequence-to-sequence modeling in the hopes that the the -community can find better solutions. - The data used in this work have been released and are available at: -https://github.com/rwsproat/text-normalization-data -" -3724,1611.00126,Shufeng Xiong,"Improving Twitter Sentiment Classification via Multi-Level - Sentiment-Enriched Word Embeddings",cs.CL," Most of existing work learn sentiment-specific word representation for -improving Twitter sentiment classification, which encoded both n-gram and -distant supervised tweet sentiment information in learning process. They assume -all words within a tweet have the same sentiment polarity as the whole tweet, -which ignores the word its own sentiment polarity. To address this problem, we -propose to learn sentiment-specific word embedding by exploiting both lexicon -resource and distant supervised information. We develop a multi-level -sentiment-enriched word embedding learning method, which uses parallel -asymmetric neural network to model n-gram, word level sentiment and tweet level -sentiment in learning process. Experiments on standard benchmarks show our -approach outperforms state-of-the-art methods. 
-" -3725,1611.00138,Sebastian Raschka,"MusicMood: Predicting the mood of music from song lyrics using machine - learning",cs.LG cs.CL cs.IR," Sentiment prediction of contemporary music can have a wide-range of -applications in modern society, for instance, selecting music for public -institutions such as hospitals or restaurants to potentially improve the -emotional well-being of personnel, patients, and customers, respectively. In -this project, music recommendation system built upon on a naive Bayes -classifier, trained to predict the sentiment of songs based on song lyrics -alone. The experimental results show that music corresponding to a happy mood -can be detected with high precision based on text features obtained from song -lyrics. -" -3726,1611.00179,"Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, - Wei-Ying Ma",Dual Learning for Machine Translation,cs.CL," While neural machine translation (NMT) is making good progress in the past -two years, tens of millions of bilingual sentence pairs are needed for its -training. However, human labeling is very costly. To tackle this training data -bottleneck, we develop a dual-learning mechanism, which can enable an NMT -system to automatically learn from unlabeled data through a dual-learning game. -This mechanism is inspired by the following observation: any machine -translation task has a dual task, e.g., English-to-French translation (primal) -versus French-to-English translation (dual); the primal and dual tasks can form -a closed loop, and generate informative feedback signals to train the -translation models, even if without the involvement of a human labeler. In the -dual-learning mechanism, we use one agent to represent the model for the primal -task and the other agent to represent the model for the dual task, then ask -them to teach each other through a reinforcement learning process. Based on the -feedback signals generated during this process (e.g., the language-model -likelihood of the output of a model, and the reconstruction error of the -original sentence after the primal and dual translations), we can iteratively -update the two models until convergence (e.g., using the policy gradient -methods). We call the corresponding approach to neural machine translation -\emph{dual-NMT}. Experiments show that dual-NMT works very well on -English$\leftrightarrow$French translation; especially, by learning from -monolingual data (with 10% bilingual data for warm start), it achieves a -comparable accuracy to NMT trained from the full bilingual data for the -French-to-English translation task. -" -3727,1611.00196,"Wei Li, Brian Kan Wing Mak","Recurrent Neural Network Language Model Adaptation Derived Document - Vector",cs.CL," In many natural language processing (NLP) tasks, a document is commonly -modeled as a bag of words using the term frequency-inverse document frequency -(TF-IDF) vector. One major shortcoming of the frequency-based TF-IDF feature -vector is that it ignores word orders that carry syntactic and semantic -relationships among the words in a document, and they can be important in some -NLP tasks such as genre classification. This paper proposes a novel distributed -vector representation of a document: a simple recurrent-neural-network language -model (RNN-LM) or a long short-term memory RNN language model (LSTM-LM) is -first created from all documents in a task; some of the LM parameters are then -adapted by each document, and the adapted parameters are vectorized to -represent the document. 
The new document vectors are labeled as DV-RNN and -DV-LSTM respectively. We believe that our new document vectors can capture some -high-level sequential information in the documents, which other current -document representations fail to capture. The new document vectors were -evaluated in the genre classification of documents in three corpora: the Brown -Corpus, the BNC Baby Corpus and an artificially created Penn Treebank dataset. -Their classification performances are compared with the performance of TF-IDF -vector and the state-of-the-art distributed memory model of paragraph vector -(PV-DM). The results show that DV-LSTM significantly outperforms TF-IDF and -PV-DM in most cases, and combinations of the proposed document vectors with -TF-IDF or PV-DM may further improve performance. -" -3728,1611.00354,"Anoop Kunchukuttan, Pushpak Bhattacharyya","Faster decoding for subword level Phrase-based SMT between related - languages",cs.CL," A common and effective way to train translation systems between related -languages is to consider sub-word level basic units. However, this increases -the length of the sentences resulting in increased decoding time. The increase -in length is also impacted by the specific choice of data format for -representing the sentences as subwords. In a phrase-based SMT framework, we -investigate different choices of decoder parameters as well as data format and -their impact on decoding time and translation accuracy. We suggest best options -for these settings that significantly improve decoding time with little impact -on the translation accuracy. -" -3729,1611.00356,"Renato Rocha Souza, Flavio Codeco Coelho, Rohan Shah, Matthew Connelly",Using Artificial Intelligence to Identify State Secrets,cs.CY cs.CL cs.LG," Whether officials can be trusted to protect national security information has -become a matter of great public controversy, reigniting a long-standing debate -about the scope and nature of official secrecy. The declassification of -millions of electronic records has made it possible to analyze these issues -with greater rigor and precision. Using machine-learning methods, we examined -nearly a million State Department cables from the 1970s to identify features of -records that are more likely to be classified, such as international -negotiations, military operations, and high-level communications. Even with -incomplete data, algorithms can use such features to identify 90% of classified -cables with <11% false positives. But our results also show that there are -longstanding problems in the identification of sensitive information. Error -analysis reveals many examples of both overclassification and -underclassification. This indicates both the need for research on inter-coder -reliability among officials as to what constitutes classified material and the -opportunity to develop recommender systems to better manage both classification -and declassification. -" -3730,1611.00384,"Oren Barkan, Noam Koenigstein, Eylon Yogev and Ori Katz","CB2CF: A Neural Multiview Content-to-Collaborative Filtering Model for - Completely Cold Item Recommendations",cs.IR cs.CL cs.LG," In Recommender Systems research, algorithms are often characterized as either -Collaborative Filtering (CF) or Content Based (CB). CF algorithms are trained -using a dataset of user preferences while CB algorithms are typically based on -item profiles. These approaches harness different data sources and therefore -the resulting recommended items are generally very different. 
This paper
-presents the CB2CF, a deep neural multiview model that serves as a bridge from
-items' content into their CF representations. CB2CF is a real-world algorithm
-designed for Microsoft Store services that handle around a billion users
-worldwide. CB2CF is demonstrated on movie and app recommendations, where it
-is shown to outperform an alternative CB model on completely cold items.
-"
-3731,1611.00440,"Elvyna Tunggawan, Yustinus Eko Soelistio","And the Winner is ...: Bayesian Twitter-based Prediction on 2016 U.S.
-  Presidential Election",cs.IR cs.CL cs.SI," This paper describes a Naive-Bayesian predictive model for the 2016 U.S.
-Presidential Election based on Twitter data. We use 33,708 tweets gathered
-from December 16, 2015 until February 29, 2016. We introduce a simpler data
-preprocessing method to label the data and train the model. The model achieves
-95.8% accuracy on 10-fold cross validation and predicts Ted Cruz and Bernie
-Sanders as the Republican and Democratic nominees respectively. It achieves
-results comparable to those of competing methods.
-"
-3732,1611.00448,"Hao Wang, Xingjian Shi, Dit-Yan Yeung",Natural-Parameter Networks: A Class of Probabilistic Neural Networks,cs.LG cs.AI cs.CL cs.CV stat.ML," Neural networks (NN) have achieved state-of-the-art performance in various
-applications. Unfortunately, in applications where training data is
-insufficient, they are often prone to overfitting. One effective way to
-alleviate this problem is to exploit the Bayesian approach by using Bayesian
-neural networks (BNN). Another shortcoming of NN is the lack of flexibility to
-customize different distributions for the weights and neurons according to the
-data, as is often done in probabilistic graphical models. To address these
-problems, we propose a class of probabilistic neural networks, dubbed
-natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment
-of NN. NPN allows the usage of arbitrary exponential-family distributions to
-model the weights and neurons. Different from traditional NN and BNN, NPN takes
-distributions as input and goes through layers of transformation before
-producing distributions to match the target output distributions. As a Bayesian
-treatment, efficient backpropagation (BP) is performed to learn the natural
-parameters for the distributions over both the weights and neurons. The output
-distributions of each layer, as byproducts, may be used as second-order
-representations for the associated tasks such as link prediction. Experiments
-on real-world datasets show that NPN can achieve state-of-the-art performance.
-"
-3733,1611.00454,"Hao Wang, Xingjian Shi, Dit-Yan Yeung","Collaborative Recurrent Autoencoder: Recommend while Learning to Fill in
-  the Blanks",cs.LG cs.AI cs.CL cs.CV stat.ML," Hybrid methods that utilize both content and rating information are commonly
-used in many recommender systems. However, most of them use either handcrafted
-features or the bag-of-words representation as a surrogate for the content
-information, but these are neither effective nor natural enough. To address
-this problem, we develop a collaborative recurrent autoencoder (CRAE) which is
-a denoising recurrent autoencoder (DRAE) that models the generation of content
-sequences in the collaborative filtering (CF) setting. The model generalizes
-recent advances in recurrent deep learning from i.i.d. input to non-i.i.d. 
-(CF-based) input and provides a new denoising scheme along with a novel
-learnable pooling scheme for the recurrent autoencoder. To do this, we first
-develop a hierarchical Bayesian model for the DRAE and then generalize it to
-the CF setting. The synergy between denoising and CF enables CRAE to make
-accurate recommendations while learning to fill in the blanks in sequences.
-Experiments on real-world datasets from different domains (CiteULike and
-Netflix) show that, by jointly modeling the order-aware generation of sequences
-for the content information and performing CF for the ratings, CRAE is able to
-significantly outperform the state of the art on both the recommendation task
-based on ratings and the sequence generation task based on content information.
-"
-3734,1611.00456,"Bo Wang, Yanshu Yu, Yuan Wang","Measuring Asymmetric Opinions on Online Social Interrelationship with
-  Language and Network Features",cs.SI cs.CL," Instead of studying the properties of social relationships from an objective
-view, in this paper, we focus on individuals' subjective and asymmetric
-opinions on their interrelationships. Inspired by theories from
-sociolinguistics, we investigate two individuals' opinions on their
-interrelationship through their interactive language features. Eliminating the
-differences of personal language style, we clarify that the asymmetry of
-interactive language feature values can indicate individuals' asymmetric
-opinions on their interrelationship. We also discuss how the degree of
-opinions' asymmetry is related to the individuals' personality traits.
-Furthermore, to measure the individuals' asymmetric opinions on
-interrelationship concretely, we develop a novel model synthesizing interactive
-language and social network features. The experimental results on the Enron
-email dataset provide multiple pieces of evidence of asymmetric opinions on
-interrelationship, and also verify the effectiveness of the proposed model in
-measuring the degree of opinions' asymmetry.
-"
-3735,1611.00457,"Bo Wang, Yingjun Sun, Yuan Wang","Structure vs. Language: Investigating the Multi-factors of Asymmetric
-  Opinions on Online Social Interrelationship with a Case Study",cs.SI cs.CL," Though current research often studies the properties of online social
-relationships from an objective view, we also need to understand individuals'
-subjective opinions on their interrelationships in social computing studies.
-Inspired by theories from sociolinguistics, the latest work indicates that
-interactive language can reveal individuals' asymmetric opinions on their
-interrelationship. In this work, in order to explain the opinions' asymmetry on
-interrelationship with more latent factors, we extend the investigation from
-single relationships to the structural context in online social networks. We
-analyze the correlation between interactive language features and the
-structural context of interrelationships. The structural contexts of vertices,
-edges and triangles in the social network are considered. With statistical
-analysis on the Enron email dataset, we find that individuals' opinions
-(measured by interactive language features) on their interrelationship are
-related to some of their important structural context in the social network.
-This result can help us to understand and measure individuals' opinions on
-their interrelationship with more intrinsic information. 
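-The core quantity behind the two abstracts above, the asymmetry between the
-two directions of a dyad's interactive language, can be sketched in a few
-lines. The features and the normalization here are illustrative assumptions,
-not the papers' exact model:
-
-import numpy as np
-
-def asymmetry(feats_ab, feats_ba, eps=1e-9):
-    # feats_ab: language features of A's messages to B; feats_ba: B's to A.
-    # Returns per-feature values in [0, 1]: 0 = symmetric, 1 = one-sided.
-    a, b = np.asarray(feats_ab, float), np.asarray(feats_ba, float)
-    return np.abs(a - b) / (np.abs(a) + np.abs(b) + eps)
-
-# Illustrative features: politeness markers per message, mean message
-# length, and rate of questions, for each direction of an Enron-style dyad.
-print(asymmetry([0.8, 120.0, 0.30], [0.2, 40.0, 0.05]))  # ~[0.6 0.5 0.714]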
-" -3736,1611.00472,"Ameya Prabhu, Aditya Joshi, Manish Shrivastava and Vasudeva Varma","Towards Sub-Word Level Compositions for Sentiment Analysis of - Hindi-English Code Mixed Text",cs.CL," Sentiment analysis (SA) using code-mixed data from social media has several -applications in opinion mining ranging from customer satisfaction to social -campaign analysis in multilingual societies. Advances in this area are impeded -by the lack of a suitable annotated dataset. We introduce a Hindi-English -(Hi-En) code-mixed dataset for sentiment analysis and perform empirical -analysis comparing the suitability and performance of various state-of-the-art -SA methods in social media. - In this paper, we introduce learning sub-word level representations in LSTM -(Subword-LSTM) architecture instead of character-level or word-level -representations. This linguistic prior in our architecture enables us to learn -the information about sentiment value of important morphemes. This also seems -to work well in highly noisy text containing misspellings as shown in our -experiments which is demonstrated in morpheme-level feature maps learned by our -model. Also, we hypothesize that encoding this linguistic prior in the -Subword-LSTM architecture leads to the superior performance. Our system attains -accuracy 4-5% greater than traditional approaches on our dataset, and also -outperforms the available system for sentiment analysis in Hi-En code-mixed -text by 18%. -" -3737,1611.00483,"Chaozhuo Li, Yu Wu, Wei Wu, Chen Xing, Zhoujun Li, Ming Zhou",Detecting Context Dependent Messages in a Conversational Environment,cs.CL," While automatic response generation for building chatbot systems has drawn a -lot of attention recently, there is limited understanding on when we need to -consider the linguistic context of an input text in the generation process. The -task is challenging, as messages in a conversational environment are short and -informal, and evidence that can indicate a message is context dependent is -scarce. After a study of social conversation data crawled from the web, we -observed that some characteristics estimated from the responses of messages are -discriminative for identifying context dependent messages. With the -characteristics as weak supervision, we propose using a Long Short Term Memory -(LSTM) network to learn a classifier. Our method carries out text -representation and classifier learning in a unified framework. Experimental -results show that the proposed method can significantly outperform baseline -methods on accuracy of classification. -" -3738,1611.00514,"Abbas Khosravani, Cornelius Glackin, Nazim Dugan, G\'erard Chollet, - Nigel Cannings",The Intelligent Voice 2016 Speaker Recognition System,cs.SD cs.CL stat.ML," This paper presents the Intelligent Voice (IV) system submitted to the NIST -2016 Speaker Recognition Evaluation (SRE). The primary emphasis of SRE this -year was on developing speaker recognition technology which is robust for novel -languages that are much more heterogeneous than those used in the current -state-of-the-art, using significantly less training data, that does not contain -meta-data from those languages. The system is based on the state-of-the-art -i-vector/PLDA which is developed on the fixed training condition, and the -results are reported on the protocol defined on the development set of the -challenge. 
-" -3739,1611.00601,"Sheng Zhang, Rachel Rudinger, Kevin Duh, Benjamin Van Durme",Ordinal Common-sense Inference,cs.CL," Humans have the capacity to draw common-sense inferences from natural -language: various things that are likely but not certain to hold based on -established discourse, and are rarely stated explicitly. We propose an -evaluation of automated common-sense inference based on an extension of -recognizing textual entailment: predicting ordinal human responses on the -subjective likelihood of an inference holding in a given context. We describe a -framework for extracting common-sense knowledge from corpora, which is then -used to construct a dataset for this ordinal entailment task. We train a neural -sequence-to-sequence model on this dataset, which we use to score and generate -possible inferences. Further, we annotate subsets of previously established -datasets via our ordinal annotation protocol in order to then analyze the -distinctions between these and what we have constructed. -" -3740,1611.00674,"Yuanzhi Ke, Masafumi Hagiwara",Fuzzy paraphrases in learning word representations with a lexicon,cs.CL," A synonym of a polysemous word is usually only the paraphrase of one sense -among many. When lexicons are used to improve vector-space word -representations, such paraphrases are unreliable and bring noise to the -vector-space. The prior works use a coefficient to adjust the overall learning -of the lexicons. They regard the paraphrases equally. In this paper, we propose -a novel approach that regards the paraphrases diversely to alleviate the -adverse effects of polysemy. We annotate each paraphrase with a degree of -reliability. The paraphrases are randomly eliminated according to the degrees -when our model learns word representations. In this way, our approach drops the -unreliable paraphrases, keeping more reliable paraphrases at the same time. The -experimental results show that the proposed method improves the word vectors. -Our approach is an attempt to address the polysemy problem keeping one vector -per word. It makes the approach easier to use than the conventional methods -that estimate multiple vectors for a word. Our approach also outperforms the -prior works in the experiments. -" -3741,1611.00801,Mingbin Xu and Hui Jiang,"A FOFE-based Local Detection Approach for Named Entity Recognition and - Mention Detection",cs.CL," In this paper, we study a novel approach for named entity recognition (NER) -and mention detection in natural language processing. Instead of treating NER -as a sequence labelling problem, we propose a new local detection approach, -which rely on the recent fixed-size ordinally forgetting encoding (FOFE) method -to fully encode each sentence fragment and its left/right contexts into a -fixed-size representation. Afterwards, a simple feedforward neural network is -used to reject or predict entity label for each individual fragment. The -proposed method has been evaluated in several popular NER and mention detection -tasks, including the CoNLL 2003 NER task and TAC-KBP2015 and TAC-KBP2016 -Tri-lingual Entity Discovery and Linking (EDL) tasks. Our methods have yielded -pretty strong performance in all of these examined tasks. This local detection -approach has shown many advantages over the traditional sequence labelling -methods. 
-" -3742,1611.00995,"Dat Quoc Nguyen, Mark Dras, Mark Johnson",An empirical study for Vietnamese dependency parsing,cs.CL," This paper presents an empirical comparison of different dependency parsers -for Vietnamese, which has some unusual characteristics such as copula drop and -verb serialization. Experimental results show that the neural network-based -parsers perform significantly better than the traditional parsers. We report -the highest parsing scores published to date for Vietnamese with the labeled -attachment score (LAS) at 73.53% and the unlabeled attachment score (UAS) at -80.66%. -" -3743,1611.01083,"Alok Ranjan Pal, Anirban Kundu, Abhay Singh, Raj Shekhar and Kunal - Sinha","A Hybrid Approach to Word Sense Disambiguation Combining Supervised and - Unsupervised Learning",cs.CL," In this paper, we are going to find meaning of words based on distinct -situations. Word Sense Disambiguation is used to find meaning of words based on -live contexts using supervised and unsupervised approaches. Unsupervised -approaches use online dictionary for learning, and supervised approaches use -manual learning sets. Hand tagged data are populated which might not be -effective and sufficient for learning procedure. This limitation of information -is main flaw of the supervised approach. Our proposed approach focuses to -overcome the limitation using learning set which is enriched in dynamic way -maintaining new data. Trivial filtering method is utilized to achieve -appropriate training data. We introduce a mixed methodology having Modified -Lesk approach and Bag-of-Words having enriched bags using learning methods. Our -approach establishes the superiority over individual Modified Lesk and -Bag-of-Words approaches based on experimentation. -" -3744,1611.01101,"Emmanuele Chersoni, Giulia Rambelli, Enrico Santus",CogALex-V Shared Task: ROOT18,cs.CL," In this paper, we describe ROOT 18, a classifier using the scores of several -unsupervised distributional measures as features to discriminate between -semantically related and unrelated words, and then to classify the related -pairs according to their semantic relation (i.e. synonymy, antonymy, hypernymy, -part-whole meronymy). Our classifier participated in the CogALex-V Shared Task, -showing a solid performance on the first subtask, but a poor performance on the -second subtask. The low scores reported on the second subtask suggest that -distributional measures are not sufficient to discriminate between multiple -semantic relations at once. -" -3745,1611.01116,Karol Grzegorczyk and Marcin Kurdziel,Binary Paragraph Vectors,cs.CL," Recently Le & Mikolov described two log-linear models, called Paragraph -Vector, that can be used to learn state-of-the-art distributed representations -of documents. Inspired by this work, we present Binary Paragraph Vector models: -simple neural networks that learn short binary codes for fast information -retrieval. We show that binary paragraph vectors outperform autoencoder-based -binary codes, despite using fewer bits. We also evaluate their precision in -transfer learning settings, where binary codes are inferred for documents -unrelated to the training corpus. Results from these experiments indicate that -binary paragraph vectors can capture semantics relevant for various -domain-specific documents. Finally, we present a model that simultaneously -learns short binary codes and longer, real-valued representations. 
This model -can be used to rapidly retrieve a short list of highly relevant documents from -a large document collection. -" -3746,1611.01242,"Mohit Iyyer, Wen-tau Yih, Ming-Wei Chang","Answering Complicated Question Intents Expressed in Decomposed Question - Sequences",cs.CL," Recent work in semantic parsing for question answering has focused on long -and complicated questions, many of which would seem unnatural if asked in a -normal conversation between two humans. In an effort to explore a -conversational QA setting, we present a more realistic task: answering -sequences of simple but inter-related questions. We collect a dataset of 6,066 -question sequences that inquire about semi-structured tables from Wikipedia, -with 17,553 question-answer pairs in total. Existing QA systems face two major -problems when evaluated on our dataset: (1) handling questions that contain -coreferences to previous questions or answers, and (2) matching words or -phrases in a question to corresponding entries in the associated table. We -conclude by proposing strategies to handle both of these issues. -" -3747,1611.01259,"Avrim Blum, Nika Haghtalab",Generalized Topic Modeling,cs.LG cs.CL cs.DS cs.IR," Recently there has been significant activity in developing algorithms with -provable guarantees for topic modeling. In standard topic models, a topic (such -as sports, business, or politics) is viewed as a probability distribution $\vec -a_i$ over words, and a document is generated by first selecting a mixture $\vec -w$ over topics, and then generating words i.i.d. from the associated mixture -$A{\vec w}$. Given a large collection of such documents, the goal is to recover -the topic vectors and then to correctly classify new documents according to -their topic mixture. - In this work we consider a broad generalization of this framework in which -words are no longer assumed to be drawn i.i.d. and instead a topic is a complex -distribution over sequences of paragraphs. Since one could not hope to even -represent such a distribution in general (even if paragraphs are given using -some natural feature representation), we aim instead to directly learn a -document classifier. That is, we aim to learn a predictor that given a new -document, accurately predicts its topic mixture, without learning the -distributions explicitly. We present several natural conditions under which one -can do this efficiently and discuss issues such as noise tolerance and sample -complexity in this model. More generally, our model can be viewed as a -generalization of the multi-view or co-training setting in machine learning. -" -3748,1611.01368,"Tal Linzen, Emmanuel Dupoux and Yoav Goldberg",Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies,cs.CL," The success of long short-term memory (LSTM) neural networks in language -processing is typically attributed to their ability to capture long-distance -statistical regularities. Linguistic regularities are often sensitive to -syntactic structure; can such dependencies be captured by LSTMs, which do not -have explicit structural representations? We begin addressing this question -using number agreement in English subject-verb dependencies. We probe the -architecture's grammatical competence both using training objectives with an -explicit grammatical target (number prediction, grammaticality judgments) and -using language models. 
In the strongly supervised settings, the LSTM achieved
-very high overall accuracy (less than 1% errors), but errors increased when
-sequential and structural information conflicted. The frequency of such errors
-rose sharply in the language-modeling setting. We conclude that LSTMs can
-capture a non-trivial amount of grammatical structure given targeted
-supervision, but stronger architectures may be required to further reduce
-errors; furthermore, the language modeling signal is insufficient for capturing
-syntax-sensitive dependencies, and should be supplemented with more direct
-supervision if such dependencies need to be captured.
-"
-3749,1611.01400,"Jesse M Lingeman, Hong Yu",Learning to Rank Scientific Documents from the Crowd,cs.IR cs.CL cs.DL cs.LG cs.SI," Finding related published articles is an important task in any science, but
-with the explosion of new work in the biomedical domain it has become
-especially challenging. Most existing methodologies use text similarity metrics
-to identify whether two articles are related or not. However, biomedical
-knowledge discovery is hypothesis-driven. The most related articles may not be
-the ones with the highest text similarities. In this study, we first develop an
-innovative crowd-sourcing approach to build an expert-annotated
-document-ranking corpus. Using this corpus as the gold standard, we then
-evaluate the approaches of using text similarity to rank the relatedness of
-articles. Finally, we develop and evaluate a new supervised model to
-automatically rank related scientific articles. Our results show that authors'
-rankings differ significantly from rankings by text-similarity-based models. By
-training a learning-to-rank model on a subset of the annotated corpus, we found
-the best supervised learning-to-rank model (SVM-Rank) significantly surpassed
-state-of-the-art baseline systems.
-"
-3750,1611.01436,"Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das,
-  Jonathan Berant","Learning Recurrent Span Representations for Extractive Question
-  Answering",cs.CL," The reading comprehension task, which asks questions about a given evidence
-document, is a central problem in natural language understanding. Recent
-formulations of this task have typically focused on answer selection from a set
-of candidates pre-defined manually or through the use of an external NLP
-pipeline. However, Rajpurkar et al. (2016) recently released the SQuAD dataset
-in which the answers can be arbitrary strings from the supplied text. In this
-paper, we focus on this answer extraction task, presenting a novel model
-architecture that efficiently builds fixed length representations of all spans
-in the evidence document with a recurrent network. We show that scoring
-explicit span representations significantly improves performance over other
-approaches that factor the prediction into separate predictions about words or
-start and end markers. Our approach improves upon the best published results of
-Wang & Jiang (2016) by 5% and decreases the error of Rajpurkar et al.'s
-baseline by > 50%.
-"
-3751,1611.01462,"Hakan Inan, Khashayar Khosravi, Richard Socher","Tying Word Vectors and Word Classifiers: A Loss Framework for Language
-  Modeling",cs.LG cs.CL stat.ML," Recurrent neural networks have been very successful at predicting sequences
-of words in tasks such as language modeling. 
However, all such models are based -on the conventional classification framework, where the model is trained -against one-hot targets, and each word is represented both as an input and as -an output in isolation. This causes inefficiencies in learning both in terms of -utilizing all of the information and in terms of the number of parameters -needed to train. We introduce a novel theoretical framework that facilitates -better learning in language modeling, and show that our framework leads to -tying together the input embedding and the output projection matrices, greatly -reducing the number of trainable variables. Our framework leads to state of the -art performance on the Penn Treebank with a variety of network models. -" -3752,1611.01487,"Roee Aharoni, Yoav Goldberg",Morphological Inflection Generation with Hard Monotonic Attention,cs.CL," We present a neural model for morphological inflection generation which -employs a hard attention mechanism, inspired by the nearly-monotonic alignment -commonly found between the characters in a word and the characters in its -inflection. We evaluate the model on three previously studied morphological -inflection generation datasets and show that it provides state of the art -results in various setups compared to previous neural and non-neural -approaches. Finally we present an analysis of the continuous representations -learned by both the hard and soft attention \cite{bahdanauCB14} models for the -task, shedding some light on the features such models extract. -" -3753,1611.01547,"Philip Blair, Yuval Merhav, and Joel Barry","Automated Generation of Multilingual Clusters for the Evaluation of - Distributed Representations",cs.CL cs.LG," We propose a language-agnostic way of automatically generating sets of -semantically similar clusters of entities along with sets of ""outlier"" -elements, which may then be used to perform an intrinsic evaluation of word -embeddings in the outlier detection task. We used our methodology to create a -gold-standard dataset, which we call WikiSem500, and evaluated multiple -state-of-the-art embeddings. The results show a correlation between performance -on this dataset and performance on sentiment analysis. -" -3754,1611.01576,"James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher",Quasi-Recurrent Neural Networks,cs.NE cs.AI cs.CL cs.LG," Recurrent neural networks are a powerful tool for modeling sequential data, -but the dependence of each timestep's computation on the previous timestep's -output limits parallelism and makes RNNs unwieldy for very long sequences. We -introduce quasi-recurrent neural networks (QRNNs), an approach to neural -sequence modeling that alternates convolutional layers, which apply in parallel -across timesteps, and a minimalist recurrent pooling function that applies in -parallel across channels. Despite lacking trainable recurrent layers, stacked -QRNNs have better predictive accuracy than stacked LSTMs of the same hidden -size. Due to their increased parallelism, they are up to 16 times faster at -train and test time. Experiments on language modeling, sentiment -classification, and character-level neural machine translation demonstrate -these advantages and underline the viability of QRNNs as a basic building block -for a variety of sequence tasks. 
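-The recurrent part of a QRNN reduces to an element-wise pooling over gate
-sequences produced by the convolutions. A minimal sketch of fo-pooling, with
-random stand-ins for the convolution outputs:
-
-import numpy as np
-
-def fo_pool(Z, F, O):
-    # QRNN fo-pooling: convolutions produce candidates Z and gates F, O
-    # (all of shape [T, channels]); the only sequential computation is
-    #     c_t = f_t * c_{t-1} + (1 - f_t) * z_t,   h_t = o_t * c_t
-    T, d = Z.shape
-    c = np.zeros(d)
-    H = np.empty((T, d))
-    for t in range(T):
-        c = F[t] * c + (1.0 - F[t]) * Z[t]
-        H[t] = O[t] * c
-    return H
-
-rng = np.random.default_rng(0)
-sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
-Z = np.tanh(rng.normal(size=(5, 3)))       # stand-in for conv output
-F = sigmoid(rng.normal(size=(5, 3)))       # forget gate
-O = sigmoid(rng.normal(size=(5, 3)))       # output gate
-print(fo_pool(Z, F, O).shape)              # (5, 3)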
-" -3755,1611.01587,"Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher",A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks,cs.CL cs.AI," Transfer and multi-task learning have traditionally focused on either a -single source-target pair or very few, similar tasks. Ideally, the linguistic -levels of morphology, syntax and semantics would benefit each other by being -trained in a single model. We introduce a joint many-task model together with a -strategy for successively growing its depth to solve increasingly complex -tasks. Higher layers include shortcut connections to lower-level task -predictions to reflect linguistic hierarchies. We use a simple regularization -term to allow for optimizing all model weights to improve one task's loss -without exhibiting catastrophic interference of the other tasks. Our single -end-to-end model obtains state-of-the-art or competitive results on five -different tasks from tagging, parsing, relatedness, and entailment tasks. -" -3756,1611.01599,"Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de - Freitas",LipNet: End-to-End Sentence-level Lipreading,cs.LG cs.CL cs.CV," Lipreading is the task of decoding text from the movement of a speaker's -mouth. Traditional approaches separated the problem into two stages: designing -or learning visual features, and prediction. More recent deep lipreading -approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, -2016a). However, existing work on models trained end-to-end perform only word -classification, rather than sentence-level sequence prediction. Studies have -shown that human lipreading performance increases for longer words (Easton & -Basala, 1982), indicating the importance of features capturing temporal context -in an ambiguous communication channel. Motivated by this observation, we -present LipNet, a model that maps a variable-length sequence of video frames to -text, making use of spatiotemporal convolutions, a recurrent network, and the -connectionist temporal classification loss, trained entirely end-to-end. To the -best of our knowledge, LipNet is the first end-to-end sentence-level lipreading -model that simultaneously learns spatiotemporal visual features and a sequence -model. On the GRID corpus, LipNet achieves 95.2% accuracy in sentence-level, -overlapped speaker split task, outperforming experienced human lipreaders and -the previous 86.4% word-level state-of-the-art accuracy (Gergen et al., 2016). -" -3757,1611.01603,"Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi",Bidirectional Attention Flow for Machine Comprehension,cs.CL," Machine comprehension (MC), answering a query about a given context -paragraph, requires modeling complex interactions between the context and the -query. Recently, attention mechanisms have been successfully extended to MC. -Typically these methods use attention to focus on a small portion of the -context and summarize it with a fixed-size vector, couple attentions -temporally, and/or often form a uni-directional attention. In this paper we -introduce the Bi-Directional Attention Flow (BIDAF) network, a multi-stage -hierarchical process that represents the context at different levels of -granularity and uses bi-directional attention flow mechanism to obtain a -query-aware context representation without early summarization. 
Our
-experimental evaluations show that our model achieves state-of-the-art
-results on the Stanford Question Answering Dataset (SQuAD) and the
-CNN/DailyMail cloze test.
-"
-3758,1611.01604,"Caiming Xiong, Victor Zhong, Richard Socher",Dynamic Coattention Networks For Question Answering,cs.CL cs.AI," Several deep learning models have been proposed for question answering.
-However, due to their single-pass nature, they have no way to recover from
-local maxima corresponding to incorrect answers. To address this problem, we
-introduce the Dynamic Coattention Network (DCN) for question answering. The DCN
-first fuses co-dependent representations of the question and the document in
-order to focus on relevant parts of both. Then a dynamic pointing decoder
-iterates over potential answer spans. This iterative procedure enables the
-model to recover from initial local maxima corresponding to incorrect answers.
-On the Stanford question answering dataset, a single DCN model improves the
-previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains
-80.4% F1.
-"
-3759,1611.01628,"Zichao Yang, Phil Blunsom, Chris Dyer, Wang Ling",Reference-Aware Language Models,cs.CL," We propose a general class of language models that treat reference as an
-explicit stochastic latent variable. This architecture allows models to create
-mentions of entities and their attributes by accessing external databases
-(required by, e.g., dialogue generation and recipe generation) and internal
-state (required by, e.g. language models which are aware of coreference). This
-facilitates the incorporation of information that can be accessed in
-predictable locations in databases or discourse context, even when the targets
-of the reference may be rare words. Experiments on three tasks show that our
-model variants outperform baselines based on deterministic attention.
-"
-3760,1611.01702,"Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley",TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency,cs.CL cs.AI cs.LG stat.ML," In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based
-language model designed to directly capture the global semantic meaning
-relating words in a document via latent topics. Because of their sequential
-nature, RNNs are good at capturing the local structure of a word sequence -
-both semantic and syntactic - but might face difficulty remembering long-range
-dependencies. Intuitively, these long-range dependencies are of semantic
-nature. In contrast, latent topic models are able to capture the global
-underlying semantic structure of a document but do not account for word
-ordering. The proposed TopicRNN model integrates the merits of RNNs and latent
-topic models: it captures local (syntactic) dependencies using an RNN and
-global (semantic) dependencies using latent topics. Unlike previous work on
-contextual RNN language modeling, our model is learned end-to-end. Empirical
-results on word prediction show that TopicRNN outperforms existing contextual
-RNN baselines. In addition, TopicRNN can be used as an unsupervised feature
-extractor for documents. We do this for sentiment analysis on the IMDB movie
-review dataset and report an error rate of $6.28\%$. This is comparable to the
-state-of-the-art $5.91\%$ resulting from a semi-supervised approach. Finally,
-TopicRNN also yields sensible topics, making it a useful alternative to
-document models such as latent Dirichlet allocation.
-"
-3761,1611.01714,"Ark Anderson, Kyle Shaffer, Artem Yankov, Court D. Corley, Nathan O. 
- Hodas",Beyond Fine Tuning: A Modular Approach to Learning on Small Data,cs.LG cs.CL," In this paper we present a technique to train neural network models on small -amounts of data. Current methods for training neural networks on small amounts -of rich data typically rely on strategies such as fine-tuning a pre-trained -neural network or the use of domain-specific hand-engineered features. Here we -take the approach of treating network layers, or entire networks, as modules -and combine pre-trained modules with untrained modules, to learn the shift in -distributions between data sets. The central impact of using a modular approach -comes from adding new representations to a network, as opposed to replacing -representations via fine-tuning. Using this technique, we are able surpass -results using standard fine-tuning transfer learning approaches, and we are -also able to significantly increase performance over such approaches when using -smaller amounts of data. -" -3762,1611.01724,"Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, - Ruslan Salakhutdinov",Words or Characters? Fine-grained Gating for Reading Comprehension,cs.CL cs.LG," Previous work combines word-level and character-level representations using -concatenation or scalar weighting, which is suboptimal for high-level tasks -like reading comprehension. We present a fine-grained gating mechanism to -dynamically combine word-level and character-level representations based on -properties of the words. We also extend the idea of fine-grained gating to -modeling the interaction between questions and paragraphs for reading -comprehension. Experiments show that our approach can improve the performance -on reading comprehension tasks, achieving new state-of-the-art results on the -Children's Book Test dataset. To demonstrate the generality of our gating -mechanism, we also show improved results on a social media tag prediction task. -" -3763,1611.01734,Timothy Dozat and Christopher D. Manning,Deep Biaffine Attention for Neural Dependency Parsing,cs.CL cs.NE," This paper builds off recent work from Kiperwasser & Goldberg (2016) using -neural attention in a simple graph-based dependency parser. We use a larger but -more thoroughly regularized parser than other recent BiLSTM-based approaches, -with biaffine classifiers to predict arcs and labels. Our parser gets state of -the art or near state of the art performance on standard treebanks for six -different languages, achieving 95.7% UAS and 94.1% LAS on the most popular -English PTB dataset. This makes it the highest-performing graph-based parser on -this benchmark---outperforming Kiperwasser Goldberg (2016) by 1.8% and -2.2%---and comparable to the highest performing transition-based parser -(Kuncoro et al., 2016), which achieves 95.8% UAS and 94.6% LAS. We also show -which hyperparameter choices had a significant effect on parsing accuracy, -allowing us to achieve large gains over other graph-based approaches. -" -3764,1611.01747,Shuohang Wang and Jing Jiang,A Compare-Aggregate Model for Matching Text Sequences,cs.CL cs.AI," Many NLP tasks including machine comprehension, answer selection and text -entailment require the comparison between sequences. Matching the important -units between sequences is a key to solve these problems. In this paper, we -present a general ""compare-aggregate"" framework that performs word-level -matching followed by aggregation using Convolutional Neural Networks. 
We
-particularly focus on the different comparison functions we can use to match
-two vectors. We use four different datasets to evaluate the model. We find that
-some simple comparison functions based on element-wise operations can work
-better than standard neural networks and neural tensor networks.
-"
-3765,1611.01783,"Yehoshua Dissen, Joseph Keshet, Jacob Goldberger and Cynthia Clopper",Domain Adaptation For Formant Estimation Using Deep Learning,cs.CL cs.SD," In this paper we present a domain adaptation technique for formant estimation
-using a deep network. We first train a deep learning network on a small read
-speech dataset. We then freeze the parameters of the trained network and use
-several different datasets to train an adaptation layer that makes the obtained
-network universal in the sense that it works well for a variety of speakers and
-speech domains with very different characteristics. We evaluated our adapted
-network on three datasets, each of which has different speaker characteristics
-and speech styles. The performance of our method compares favorably with
-alternative methods for formant estimation.
-"
-3766,1611.01802,"Ricardo Usbeck, Jonathan Huthmann, Nico Duldhardt, Axel-Cyrille Ngonga
-  Ngomo",Self-Wiring Question Answering Systems,cs.AI cs.CL cs.IR," Question answering (QA) has been the subject of a resurgence over the past
-years. This resurgence has led to a multitude of question answering (QA)
-systems being developed both by companies and research facilities. While a few
-components of QA systems get reused across implementations, most systems do not
-leverage the full potential of component reuse. Hence, the development of QA
-systems is currently still a tedious and time-consuming process. We address the
-challenge of accelerating the creation of novel or tailored QA systems by
-presenting a concept for a self-wiring approach to composing QA systems. Our
-approach will allow the reuse of existing, web-based QA systems or modules
-while developing new QA platforms. To this end, it will rely on QA modules
-being described using the Web Ontology Language. Based on these descriptions,
-our approach will be able to automatically compose QA systems using a
-data-driven approach.
-"
-3767,1611.01839,"Eunsol Choi, Daniel Hewlett, Alexandre Lacoste, Illia Polosukhin,
-  Jakob Uszkoreit, Jonathan Berant",Hierarchical Question Answering for Long Documents,cs.CL," We present a framework for question answering that can efficiently scale to
-longer documents while maintaining or even improving performance of
-state-of-the-art models. While most successful approaches for reading
-comprehension rely on recurrent neural networks (RNNs), running them over long
-documents is prohibitively slow because it is difficult to parallelize over
-sequences. Inspired by how people first skim the document, identify relevant
-parts, and carefully read these parts to produce an answer, we combine a
-coarse, fast model for selecting relevant sentences and a more expensive RNN
-for producing the answer from those sentences. We treat sentence selection as a
-latent variable trained jointly from the answer only using reinforcement
-learning. Experiments demonstrate state-of-the-art performance on a challenging
-subset of WikiReading and on a new dataset, while speeding up the model by
-3.5x-6.7x. 
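-The element-wise comparison functions highlighted in the compare-aggregate
-abstract above are easy to state. A minimal sketch; the names and the combined
-'submult' variant are illustrative choices:
-
-import numpy as np
-
-def compare(q_weighted, a, mode="submult"):
-    # Compare an answer word vector `a` with its attention-weighted
-    # question summary `q_weighted` (same dimension) element-wise.
-    if mode == "sub":                 # squared element-wise difference
-        return (q_weighted - a) ** 2
-    if mode == "mult":                # element-wise product
-        return q_weighted * a
-    if mode == "submult":             # concatenate both views
-        return np.concatenate([(q_weighted - a) ** 2, q_weighted * a])
-    raise ValueError(mode)
-
-q, a = np.array([0.1, 0.9]), np.array([0.2, 0.7])
-print(compare(q, a))  # 4-dimensional comparison vector for 2-dim inputs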
-" -3768,1611.01867,"Xinyun Chen, Chang Liu, Richard Shin, Dawn Song, Mingcheng Chen",Latent Attention For If-Then Program Synthesis,cs.CL," Automatic translation from natural language descriptions into programs is a -longstanding challenging problem. In this work, we consider a simple yet -important sub-problem: translation from textual descriptions to If-Then -programs. We devise a novel neural network architecture for this task which we -train end-to-end. Specifically, we introduce Latent Attention, which computes -multiplicative weights for the words in the description in a two-stage process -with the goal of better leveraging the natural language structures that -indicate the relevant parts for predicting program elements. Our architecture -reduces the error rate by 28.57% compared to prior art. We also propose a -one-shot learning scenario of If-Then program synthesis and simulate it with -our existing dataset. We demonstrate a variation on the training procedure for -this scenario that outperforms the original procedure, significantly closing -the gap to the model trained with all data. -" -3769,1611.01868,"Luyang Li, Bing Qin, Wenjing Ren, Ting Liu",Truth Discovery with Memory Network,cs.CL cs.DB," Truth discovery is to resolve conflicts and find the truth from -multiple-source statements. Conventional methods mostly research based on the -mutual effect between the reliability of sources and the credibility of -statements, however, pay no attention to the mutual effect among the -credibility of statements about the same object. We propose memory network -based models to incorporate these two ideas to do the truth discovery. We use -feedforward memory network and feedback memory network to learn the -representation of the credibility of statements which are about the same -object. Specially, we adopt memory mechanism to learn source reliability and -use it through truth prediction. During learning models, we use multiple types -of data (categorical data and continuous data) by assigning different weights -automatically in the loss function based on their own effect on truth discovery -prediction. The experiment results show that the memory network based models -much outperform the state-of-the-art method and other baseline methods. -" -3770,1611.01874,"Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li",Neural Machine Translation with Reconstruction,cs.CL," Although end-to-end Neural Machine Translation (NMT) has achieved remarkable -progress in the past two years, it suffers from a major drawback: translations -generated by NMT systems often lack of adequacy. It has been widely observed -that NMT tends to repeatedly translate some source words while mistakenly -ignoring other words. To alleviate this problem, we propose a novel -encoder-decoder-reconstructor framework for NMT. The reconstructor, -incorporated into the NMT model, manages to reconstruct the input source -sentence from the hidden layer of the output target sentence, to ensure that -the information in the source side is transformed to the target side as much as -possible. Experiments show that the proposed framework significantly improves -the adequacy of NMT output and achieves superior translation result over -state-of-the-art NMT and statistical MT systems. 
-" -3771,1611.01884,"Depeng Liang, Yongdong Zhang","AC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text - Classification",cs.CL," Recently deeplearning models have been shown to be capable of making -remarkable performance in sentences and documents classification tasks. In this -work, we propose a novel framework called AC-BLSTM for modeling sentences and -documents, which combines the asymmetric convolution neural network (ACNN) with -the Bidirectional Long Short-Term Memory network (BLSTM). Experiment results -demonstrate that our model achieves state-of-the-art results on five tasks, -including sentiment analysis, question type classification, and subjectivity -classification. In order to further improve the performance of AC-BLSTM, we -propose a semi-supervised learning framework called G-AC-BLSTM for text -classification by combining the generative model with AC-BLSTM. -" -3772,1611.02007,"Adrien Bougouin, Florian Boudin, B\'eatrice Daille",Keyphrase Annotation with Graph Co-Ranking,cs.CL," Keyphrase annotation is the task of identifying textual units that represent -the main content of a document. Keyphrase annotation is either carried out by -extracting the most important phrases from a document, keyphrase extraction, or -by assigning entries from a controlled domain-specific vocabulary, keyphrase -assignment. Assignment methods are generally more reliable. They provide -better-formed keyphrases, as well as keyphrases that do not occur in the -document. But they are often silent on the contrary of extraction methods that -do not depend on manually built resources. This paper proposes a new method to -perform both keyphrase extraction and keyphrase assignment in an integrated and -mutual reinforcing manner. Experiments have been carried out on datasets -covering different domains of humanities and social sciences. They show -statistically significant improvements compared to both keyphrase extraction -and keyphrase assignment state-of-the art methods. -" -3773,1611.02025,"Xavier Holt, Will Radford, Ben Hachey",Presenting a New Dataset for the Timeline Generation Problem,cs.CL," The timeline generation task summarises an entity's biography by selecting -stories representing key events from a large pool of relevant documents. This -paper addresses the lack of a standard dataset and evaluative methodology for -the problem. We present and make publicly available a new dataset of 18,793 -news articles covering 39 entities. For each entity, we provide a gold standard -timeline and a set of entity-related articles. We propose ROUGE as an -evaluation metric and validate our dataset by showing that top Google results -outperform straw-man baselines. -" -3774,1611.02027,Will Radford and Andrew Chisholm and Ben Hachey and Bo Han,":telephone::person::sailboat::whale::okhand:; or ""Call me Ishmael"" - How - do you translate emoji?",cs.CL," We report on an exploratory analysis of Emoji Dick, a project that leverages -crowdsourcing to translate Melville's Moby Dick into emoji. This distinctive -use of emoji removes textual context, and leads to a varying translation -quality. In this paper, we use statistical word alignment and part-of-speech -tagging to explore how people use emoji. Despite these simple methods, we -observed differences in token and part-of-speech distributions. Experiments -also suggest that semantics are preserved in the translation, and repetition is -more common in emoji. 
-" -3775,1611.02091,"Bin He, Bin Dong, Yi Guan, Jinfeng Yang, Zhipeng Jiang, Qiubin Yu, - Jianyi Cheng, Chunyan Qu","Building a comprehensive syntactic and semantic corpus of Chinese - clinical texts",cs.CL," Objective: To build a comprehensive corpus covering syntactic and semantic -annotations of Chinese clinical texts with corresponding annotation guidelines -and methods as well as to develop tools trained on the annotated corpus, which -supplies baselines for research on Chinese texts in the clinical domain. - Materials and methods: An iterative annotation method was proposed to train -annotators and to develop annotation guidelines. Then, by using annotation -quality assurance measures, a comprehensive corpus was built, containing -annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, -and relations. Inter-annotator agreement (IAA) was calculated to evaluate the -annotation quality and a Chinese clinical text processing and information -extraction system (CCTPIES) was developed based on our annotated corpus. - Results: The syntactic corpus consists of 138 Chinese clinical documents with -47,424 tokens and 2553 full parsing trees, while the semantic corpus includes -992 documents that annotated 39,511 entities with their assertions and 7695 -relations. IAA evaluation shows that this comprehensive corpus is of good -quality, and the system modules are effective. - Discussion: The annotated corpus makes a considerable contribution to natural -language processing (NLP) research into Chinese texts in the clinical domain. -However, this corpus has a number of limitations. Some additional types of -clinical text should be introduced to improve corpus coverage and active -learning methods should be utilized to promote annotation efficiency. - Conclusions: In this study, several annotation guidelines and an annotation -method for Chinese clinical texts were proposed, and a comprehensive corpus -with its NLP modules were constructed, providing a foundation for further study -of applying NLP techniques to Chinese texts in the clinical domain. -" -3776,1611.02266,Liwen Zhang and John Winn and Ryota Tomioka,"Gaussian Attention Model and Its Application to Knowledge Base Embedding - and Question Answering",stat.ML cs.AI cs.CL cs.LG," We propose the Gaussian attention model for content-based neural memory -access. With the proposed attention model, a neural network has the additional -degree of freedom to control the focus of its attention from a laser sharp -attention to a broad attention. It is applicable whenever we can assume that -the distance in the latent space reflects some notion of semantics. We use the -proposed attention model as a scoring function for the embedding of a knowledge -base into a continuous vector space and then train a model that performs -question answering about the entities in the knowledge base. The proposed -attention model can handle both the propagation of uncertainty when following a -series of relations and also the conjunction of conditions in a natural way. On -a dataset of soccer players who participated in the FIFA World Cup 2014, we -demonstrate that our model can handle both path queries and conjunctive queries -well. -" -3777,1611.02337,"Daniel Robins, Fernando Emmanuel Frati, Jonatan Alvarez, Jose Texier","Balotage in Argentina 2015, a sentiment analysis of tweets",cs.IR cs.CL cs.SI," Twitter social network contains a large amount of information generated by -its users. 
That information is composed of opinions and comments that may
-reflect trends in social behavior. We speak of a trend when it is possible to
-identify opinions and comments oriented in the same direction and shared by
-many people. To determine whether two or more written opinions share the same
-orientation, Natural Language Processing (NLP) techniques are used. This paper
-proposes a methodology for predicting social behaviors reflected in Twitter
-through the use of NLP-based sentiment analysis functions. The 2015
-presidential election in Argentina was selected as the case study, and a Big
-Data software architecture composed of a Vertica database with the component
-called Pulse was used. Through the analysis it was possible to detect trends in
-voting intentions with regard to the presidential candidates, achieving greater
-accuracy in prediction than that achieved with traditional survey systems.
-"
-3778,1611.02344,"Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin",A Convolutional Encoder Model for Neural Machine Translation,cs.CL," The prevalent approach to neural machine translation relies on bi-directional
-LSTMs to encode the source sentence. In this paper we present a faster and
-simpler architecture based on a succession of convolutional layers. This allows
-the entire source sentence to be encoded simultaneously, in contrast to
-recurrent networks, for which computation is constrained by temporal
-dependencies. On WMT'16 English-Romanian translation we achieve accuracy
-competitive with the state-of-the-art and we outperform several recently
-published results on the WMT'15 English-German task. Our models obtain almost
-the same accuracy as a very deep LSTM setup on WMT'14 English-French
-translation. Our convolutional encoder speeds up CPU decoding by more than two
-times at the same or higher accuracy as a strong bi-directional LSTM baseline.
-"
-3779,1611.02360,"Dragomir Radev, Rui Zhang, Steve Wilson, Derek Van Assche, Henrique
- Spyra Gubert, Alisa Krivokapic, MeiXing Dong, Chongruo Wu, Spruce Bondera,
- Luke Brandl, Jeremy Dohmann",Cruciform: Solving Crosswords with Natural Language Processing,cs.CL," Crossword puzzles are popular word games that require not only a large
-vocabulary, but also a broad knowledge of topics. Answering each clue is a
-natural language task on its own as many clues contain nuances, puns, or
-counter-intuitive word definitions. Additionally, it can be extremely difficult
-to ascertain definitive answers without the constraints of the crossword grid
-itself. This task is challenging for both humans and computers. We describe
-here a new crossword solving system, Cruciform. We employ a group of natural
-language components, each of which returns a list of candidate words with
-scores when given a clue. These lists are used in conjunction with the fill
-intersections in the puzzle grid to formulate a constraint satisfaction
-problem, in a manner similar to the one used in the Dr. Fill system. We
-describe the results of several of our experiments with the system.
-"
-3780,1611.02361,"Rui Zhang, Honglak Lee, Dragomir Radev","Dependency Sensitive Convolutional Neural Networks for Modeling
- Sentences and Documents",cs.CL," The goal of sentence and document modeling is to accurately represent the
-meaning of sentences and documents for various Natural Language Processing
-tasks. In this work, we present Dependency Sensitive Convolutional Neural
-Networks (DSCNN) as a general-purpose classification system for both sentences
-and documents.
DSCNN hierarchically builds textual representations by
-processing pretrained word embeddings via Long Short-Term Memory networks and
-subsequently extracting features with convolution operators. Compared with
-existing recursive neural models with tree structures, DSCNN does not rely on
-parsers and expensive phrase labeling, and thus is not restricted to
-sentence-level tasks. Moreover, unlike other CNN-based models that analyze
-sentences locally by sliding windows, our system captures both the dependency
-information within each sentence and relationships across sentences in the same
-document. Experimental results demonstrate that our approach achieves
-state-of-the-art performance on several tasks, including sentiment analysis,
-question type classification, and subjectivity classification.
-"
-3781,1611.02378,"Yufeng Ma, Long Xia, Wenqi Shen, Mi Zhou, Weiguo Fan",A Surrogate-based Generic Classifier for Chinese TV Series Reviews,cs.CL," With the emergence of various online video platforms such as Youtube, Youku
-and LeTV, online TV series reviews have become more and more important both for
-viewers and producers. Customers rely heavily on these reviews before selecting
-TV series, while producers use them to improve the quality. As a result,
-automatically classifying reviews according to different requirements has
-evolved into a popular research topic and is essential in our daily life. In
-this paper, we focused on reviews of hot TV series in China and successfully
-trained generic classifiers based on eight predefined categories. The
-experimental results showed promising performance and effective generalization
-to different TV series.
-"
-3782,1611.02550,Shane Settle and Karen Livescu,"Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based
- Approaches",cs.CL," Acoustic word embeddings --- fixed-dimensional vector representations of
-variable-length spoken word segments --- have begun to be considered for tasks
-such as speech recognition and query-by-example search. Such embeddings can be
-learned discriminatively so that they are similar for speech segments
-corresponding to the same word, while being dissimilar for segments
-corresponding to different words. Recent work has found that acoustic word
-embeddings can outperform dynamic time warping on query-by-example search and
-related word discrimination tasks. However, the space of embedding models and
-training approaches is still relatively unexplored. In this paper we present
-new discriminative embedding models based on recurrent neural networks (RNNs).
-We consider training losses that have been successful in prior work, in
-particular a cross entropy loss for word classification and a contrastive loss
-that explicitly aims to separate same-word and different-word pairs in a
-""Siamese network"" training setting. We find that both classifier-based and
-Siamese RNN embeddings improve over previously reported results on a word
-discrimination task, with Siamese RNNs outperforming classification models. In
-addition, we present analyses of the learned embeddings and the effects of
-variables such as dimensionality and network structure.
-"
-3783,1611.02554,"Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky",The Neural Noisy Channel,cs.CL cs.AI cs.NE," We formulate sequence to sequence transduction as a noisy channel decoding
-problem and use recurrent neural networks to parameterise the source and
-channel models.
Unlike direct models, which can suffer from explaining-away
-effects during training, noisy channel models must produce outputs that explain
-their inputs, and their component models can be trained with not only paired
-training samples but also unpaired samples from the marginal output
-distribution. Using a latent variable to control how much of the conditioning
-sequence the channel model needs to read in order to generate a subsequent
-symbol, we obtain a tractable and effective beam search decoder. Experimental
-results on abstractive sentence summarisation, morphological inflection, and
-machine translation show that noisy channel models outperform direct models,
-and that they significantly benefit from increased amounts of unpaired output
-data that direct models cannot easily use.
-"
-3784,1611.02588,Piroska Lendvai and Uwe D. Reichel,Contradiction Detection for Rumorous Claims,cs.CL," The utilization of social media material in journalistic workflows is
-increasing, demanding automated methods for the identification of mis- and
-disinformation. Since textual contradiction across social media posts can be a
-signal of rumorousness, we seek to model how claims in Twitter posts are being
-textually contradicted. We identify two different contexts in which
-contradiction emerges: its broader form can be observed across independently
-posted tweets and its more specific form in threaded conversations. We define
-how the two scenarios differ in terms of central elements of argumentation:
-claims and conversation structure. We design and evaluate models for the two
-scenarios uniformly as 3-way Recognizing Textual Entailment tasks in order to
-represent claims and conversation structure implicitly in a generic inference
-model, while previous studies used explicit or no representation of these
-properties. To address noisy text, our classifiers use simple similarity
-features derived from the string and part-of-speech level. Corpus statistics
-reveal distribution differences for these features in contradictory as opposed
-to non-contradictory tweet relations, and the classifiers yield
-state-of-the-art performance.
-"
-3785,1611.02590,Uwe D. Reichel and Piroska Lendvai,Veracity Computing from Lexical Cues and Perceived Certainty Trends,cs.CL," We present a data-driven method for determining the veracity of a set of
-rumorous claims on social media data. Tweets from different sources pertaining
-to a rumor are processed on three levels: first, factuality values are assigned
-to each tweet based on four textual cue categories relevant for our journalism
-use case; these amalgamate speaker support in terms of polarity and commitment
-in terms of certainty and speculation. Next, the proportions of these lexical
-cues are utilized as predictors for tweet certainty in a generalized linear
-regression model. Subsequently, lexical cue proportions, predicted certainty,
-as well as their time course characteristics are used to compute veracity for
-each rumor in terms of the identity of the rumor-resolving tweet and its binary
-resolution value judgment. The system operates without access to
-extralinguistic resources. Evaluated on the data portion for which hand-labeled
-examples were available, it achieves .74 F1-score on identifying rumor
-resolving tweets and .76 F1-score on predicting if a rumor is resolved as true
-or false.
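A minimal sketch of the cue-proportion step of record 3785 above, assuming hypothetical cue lexicons and toy certainty labels; ordinary least squares stands in here for the paper's generalized linear regression.

```python
# Compute per-tweet proportions of four cue categories and fit a simple
# linear predictor of certainty. Lexicons and tweets are invented.
import numpy as np

CUE_LEXICONS = {
    "support":     {"confirmed", "definitely", "true"},
    "denial":      {"false", "hoax", "debunked"},
    "certainty":   {"know", "sure", "certain"},
    "speculation": {"maybe", "possibly", "allegedly"},
}

def cue_proportions(tweet):
    """Proportion of tokens falling into each cue category."""
    tokens = tweet.lower().split()
    n = max(len(tokens), 1)
    return [sum(t in lex for t in tokens) / n for lex in CUE_LEXICONS.values()]

# Toy tweets with hand-assigned certainty scores in [0, 1].
tweets = [
    ("officials confirmed the report is definitely true", 0.9),
    ("maybe this is possibly a hoax , allegedly", 0.2),
    ("we know for sure , confirmed", 0.8),
]
X = np.array([cue_proportions(t) for t, _ in tweets])
X = np.hstack([np.ones((len(X), 1)), X])          # intercept term
y = np.array([c for _, c in tweets])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS stand-in for the GLM
print("predicted certainty:", X @ coef)
```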
-" -3786,1611.02654,"Lajanugen Logeswaran, Honglak Lee, Dragomir Radev",Sentence Ordering and Coherence Modeling using Recurrent Neural Networks,cs.CL cs.AI cs.LG," Modeling the structure of coherent texts is a key NLP problem. The task of -coherently organizing a given set of sentences has been commonly used to build -and evaluate models that understand such structure. We propose an end-to-end -unsupervised deep learning approach based on the set-to-sequence framework to -address this problem. Our model strongly outperforms prior methods in the order -discrimination task and a novel task of ordering abstracts from scientific -articles. Furthermore, our work shows that useful text representations can be -obtained by learning to order sentences. Visualizing the learned sentence -representations shows that the model captures high-level logical structure in -paragraphs. Our representations perform comparably to state-of-the-art -pre-training methods on sentence similarity and paraphrase detection tasks. -" -3787,1611.02683,"Prajit Ramachandran, Peter J. Liu, Quoc V. Le",Unsupervised Pretraining for Sequence to Sequence Learning,cs.CL cs.LG cs.NE," This work presents a general unsupervised learning method to improve the -accuracy of sequence to sequence (seq2seq) models. In our method, the weights -of the encoder and decoder of a seq2seq model are initialized with the -pretrained weights of two language models and then fine-tuned with labeled -data. We apply this method to challenging benchmarks in machine translation and -abstractive summarization and find that it significantly improves the -subsequent supervised models. Our main result is that pretraining improves the -generalization of seq2seq models. We achieve state-of-the art results on the -WMT English$\rightarrow$German task, surpassing a range of methods using both -phrase-based machine translation and neural machine translation. Our method -achieves a significant improvement of 1.3 BLEU from the previous best models on -both WMT'14 and WMT'15 English$\rightarrow$German. We also conduct human -evaluations on abstractive summarization and find that our method outperforms a -purely supervised learning baseline in a statistically significant manner. -" -3788,1611.02695,"Samuel Fernando, Roger K. Moore, David Cameron, Emily C. Collins, - Abigail Millings, Amanda J. Sharkey, Tony J. Prescott","Automatic recognition of child speech for robotic applications in noisy - environments",cs.CL cs.SD," Automatic speech recognition (ASR) allows a natural and intuitive interface -for robotic educational applications for children. However there are a number -of challenges to overcome to allow such an interface to operate robustly in -realistic settings, including the intrinsic difficulties of recognising child -speech and high levels of background noise often present in classrooms. As part -of the EU EASEL project we have provided several contributions to address these -challenges, implementing our own ASR module for use in robotics applications. -We used the latest deep neural network algorithms which provide a leap in -performance over the traditional GMM approach, and apply data augmentation -methods to improve robustness to noise and speaker variation. We provide a -close integration between the ASR module and the rest of the dialogue system, -allowing the ASR to receive in real-time the language models relevant to the -current section of the dialogue, greatly improving the accuracy. 
We integrated
-our ASR module into an interactive, multimodal system using a small humanoid
-robot to help children learn about exercise and energy. The system was
-installed at a public museum event as part of a research study where 320
-children (aged 3 to 14) interacted with the robot, with our ASR achieving 90%
-accuracy for fluent and near-fluent speech.
-"
-3789,1611.02815,Emad Fawzi Al-Shalabi,"An Automated System for Essay Scoring of Online Exams in Arabic based on
- Stemming Techniques and Levenshtein Edit Operations",cs.IR cs.CL," In this article, an automated system is proposed for essay scoring in the
-Arabic language for online exams based on stemming techniques and Levenshtein
-edit operations. An online exam has been developed based on the proposed
-mechanisms, exploiting the capabilities of light and heavy stemming. The
-implemented online grading system has been shown to be an efficient tool for
-automated scoring of essay questions.
-"
-3790,1611.02839,"Kimmo Kettunen, Eetu M\""akel\""a, Teemu Ruokolainen, Juha Kuokkala and
- Laura L\""ofberg","Old Content and Modern Tools - Searching Named Entities in a Finnish
- OCRed Historical Newspaper Collection 1771-1910",cs.CL," Named Entity Recognition (NER), the search, classification and tagging of
-names and name-like frequent informational elements in texts, has become a
-standard information extraction procedure for textual data. NER has been
-applied to many types of texts and different types of entities: newspapers,
-fiction, historical records, persons, locations, chemical compounds, protein
-families, animals, etc. In general, a NER system's performance is genre- and
-domain-dependent, and the entity categories used also vary (Nadeau and Sekine,
-2007). The most general set of named entities is usually some version of a
-tripartite categorization of locations, persons and organizations. In this
-paper we report the first large-scale trials and evaluation of NER with data
-from the digitized Finnish historical newspaper collection Digi. The
-experiments, results and discussion of this research serve the development of
-the Web collection of historical Finnish newspapers.
- The Digi collection contains 1,960,921 pages of newspaper material from the
-years 1771-1910, both in Finnish and Swedish. We use only the Finnish-language
-documents in our evaluation. The OCRed newspaper collection has lots of OCR
-errors; its estimated word-level correctness is about 70-75 % (Kettunen and
-P\""a\""akk\""onen, 2016). Our principal NER tagger is a rule-based tagger of
-Finnish, FiNER, provided by the FIN-CLARIN consortium. We also show results of
-limited-category semantic tagging with tools of the Semantic Computing Research
-Group (SeCo) of Aalto University. Three other tools are also evaluated briefly.
- This research reports the first published large-scale results of NER in a
-historical Finnish OCRed newspaper collection. The results of the research
-supplement NER results for other languages with similar noisy data.
-"
-3791,1611.02879,"Abhinav Thanda, Shankar M Venkatesan",Audio Visual Speech Recognition using Deep Recurrent Neural Networks,cs.CV cs.CL cs.LG," In this work, we propose a training algorithm for an audio-visual automatic
-speech recognition (AV-ASR) system using a deep recurrent neural network
-(RNN). First, we train a deep RNN acoustic model with a Connectionist Temporal
-Classification (CTC) objective function.
The frame labels obtained from the
-acoustic model are then used to perform a non-linear dimensionality reduction
-of the visual features using a deep bottleneck network. Audio and visual
-features are fused and used to train a fusion RNN. The use of bottleneck
-features for the visual modality helps the model to converge properly during
-training. Our system is evaluated on the GRID corpus. Our results show that the
-presence of the visual modality gives a significant improvement in character
-error rate (CER) at various levels of noise, even when the model is trained
-without noisy data. We also provide a comparison of two fusion methods: feature
-fusion and decision fusion.
-"
-3792,1611.02944,"Jernej Vi\v{c}i\v{c}, Andrej Brodnik",Increasing the throughput of machine translation systems using clouds,cs.CL cs.DC," The manuscript presents an experiment in implementing a Machine Translation
-system in the MapReduce model. The empirical evaluation was done using fully
-implemented translation systems embedded into the MapReduce programming model.
-Two machine translation paradigms were studied: shallow-transfer Rule Based
-Machine Translation and Statistical Machine Translation.
- The results show that the MapReduce model can be successfully used to
-increase the throughput of a machine translation system. Furthermore, this
-method enhances the throughput of a machine translation system without
-decreasing the quality of the translation output.
- Thus, the present manuscript also represents a contribution to the seminal
-work in natural language processing, specifically Machine Translation. It
-first points toward the importance of defining a throughput metric for
-translation systems and, second, toward the applicability of the machine
-translation task to the MapReduce paradigm.
-"
-3793,1611.02956,"Hong Jin Kang, Tao Chen, Muthu Kumar Chandrasekaran, Min-Yen Kan","A Comparison of Word Embeddings for English and Cross-Lingual Chinese
- Word Sense Disambiguation",cs.CL," Word embeddings are now ubiquitous forms of word representation in natural
-language processing. There have been applications of word embeddings for
-monolingual word sense disambiguation (WSD) in English, but few comparisons
-have been done. This paper attempts to bridge that gap by examining popular
-embeddings for the task of monolingual English WSD. Our simplified method leads
-to comparable state-of-the-art performance without expensive retraining.
-Cross-Lingual WSD - where the word senses of a word in a source language e come
-from a separate target translation language f - can also assist in language
-learning; for example, when providing translations of target vocabulary for
-learners. Thus we have also applied word embeddings to the novel task of
-cross-lingual WSD for Chinese and provide a public dataset for further
-benchmarking. We have also experimented with using word embeddings for LSTM
-networks and surprisingly found that a basic LSTM network does not work well.
-We discuss the ramifications of this outcome.
-"
-3794,1611.02988,Chris Pool and Malvina Nissim,Distant supervision for emotion detection using Facebook reactions,cs.CL," We exploit the Facebook reaction feature in a distantly supervised fashion to
-train a support vector machine classifier for emotion detection, using several
-feature combinations and combining different Facebook pages.
We test our models
-on existing benchmarks for emotion detection and show that, employing only
-information that is derived completely automatically, thus without relying on
-any handcrafted lexicon as is usually done, we can achieve competitive
-results. The results also show that there is large room for improvement,
-especially by gearing the collection of Facebook pages towards the target
-domain.
-"
-3795,1611.03057,Barbara Plank and Malvina Nissim,"When silver glitters more than gold: Bootstrapping an Italian
- part-of-speech tagger for Twitter",cs.CL," We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter
-data, in the context of the Evalita 2016 PoSTWITA shared task. We show that
-training the tagger on native Twitter data enriched with small amounts of
-specifically selected gold data and additional silver-labelled data scraped
-from Facebook yields better results than using large amounts of manually
-annotated data from a mix of genres.
-"
-3796,1611.03218,"Emilio Jorge, Mikael K{\aa}geb\""ack, Fredrik D. Johansson, Emil
- Gustavsson","Learning to Play Guess Who? and Inventing a Grounded Language as a
- Consequence",cs.AI cs.CL cs.LG cs.MA," Acquiring your first language is an incredible feat and not easily
-duplicated. Learning to communicate using nothing but a few pictureless books,
-a corpus, would likely be impossible even for humans. Nevertheless, this is the
-dominating approach in most natural language processing today. As an
-alternative, we propose the use of situated interactions between agents as a
-driving force for communication, and the framework of Deep Recurrent Q-Networks
-for evolving a shared language grounded in the provided environment. We task
-the agents with interactive image search in the form of the game Guess Who?.
-The images from the game provide a non-trivial environment for the agents to
-discuss and a natural grounding for the concepts they decide to encode in their
-communication. Our experiments show that the agents learn not only to encode
-physical concepts in their words, i.e. grounding, but also to hold a multi-step
-dialogue, remembering the state of the dialogue from step to step.
-"
-3797,1611.03279,Marco Del Tredici and Malvina Nissim and Andrea Zaninello,Tracing metaphors in time through self-distance in vector spaces,cs.CL," From a diachronic corpus of Italian, we build consecutive vector spaces in
-time and use them to compare a term's cosine similarity to itself in different
-time spans. We assume that a drop in similarity might be related to the
-emergence of a metaphorical sense at a given time. Similarity-based
-observations are matched to the actual year when a figurative meaning was
-documented in a reference dictionary and through manual inspection of corpus
-occurrences.
-"
-3798,1611.03305,"Kezban Dilek Onal, Ismail Sengor Altingovde, Pinar Karagoz, Maarten de
- Rijke",Getting Started with Neural Models for Semantic Matching in Web Search,cs.IR cs.CL," The vocabulary mismatch problem is a long-standing problem in information
-retrieval. Semantic matching holds the promise of solving the problem. Recent
-advances in language technology have given rise to unsupervised neural models
-for learning representations of words as well as bigger textual units. Such
-representations enable powerful semantic matching methods. This survey is meant
-as an introduction to the use of neural models for semantic matching. To remain
-focused, we limit ourselves to web search.
We detail the required background and
-terminology, a taxonomy grouping the rapidly growing body of work in the area,
-and then survey work on neural models for semantic matching in the context of
-three tasks: query suggestion, ad retrieval, and document retrieval. We include
-a section on resources and best practices that we believe will help readers who
-are new to the area. We conclude with an assessment of the state-of-the-art and
-suggestions for future work.
-"
-3799,1611.03382,"Wenyuan Zeng, Wenjie Luo, Sanja Fidler, Raquel Urtasun",Efficient Summarization with Read-Again and Copy Mechanism,cs.CL," Encoder-decoder models have been widely used to solve sequence to sequence
-prediction tasks. However, current approaches suffer from two shortcomings.
-First, the encoders compute a representation of each word taking into account
-only the history of the words it has read so far, yielding suboptimal
-representations. Second, current decoders utilize large vocabularies in order
-to minimize the problem of unknown words, resulting in slow decoding times. In
-this paper we address both shortcomings. Towards this goal, we first introduce
-a simple mechanism that reads the input sequence before committing to a
-representation of each word. Furthermore, we propose a simple copy mechanism
-that is able to exploit very small vocabularies and handle out-of-vocabulary
-words. We demonstrate the effectiveness of our approach on the Gigaword dataset
-and DUC competition outperforming the state-of-the-art.
-"
-3800,1611.03466,Vikram Krishnamurthy and Sijia Gao,"Syntactic Enhancement to VSIMM for Roadmap Based Anomalous Trajectory
- Detection: A Natural Language Processing Approach",cs.CL," The aim of syntactic tracking is to classify spatio-temporal patterns of a
-target's motion using natural language processing models. In this paper, we
-generalize earlier work by considering a constrained stochastic context free
-grammar (CSCFG) for modeling patterns confined to a roadmap. The constrained
-grammar facilitates modeling specific directions and road names in a roadmap.
-We present a novel particle filtering algorithm that exploits the CSCFG model
-for estimating the target's patterns. This meta-level algorithm operates in
-conjunction with a base-level tracking algorithm. Extensive numerical results
-using simulated ground moving target indicator (GMTI) radar measurements show
-substantial improvement in target tracking accuracy.
-"
-3801,1611.03533,"Xiang Kong, Xuesong Yang, Mark Hasegawa-Johnson, Jeung-Yoon Choi,
- Stefanie Shattuck-Hufnagel",Landmark-based consonant voicing detection on multilingual corpora,cs.CL cs.SD," This paper tests the hypothesis that distinctive feature classifiers anchored
-at phonetic landmarks can be transferred cross-lingually without loss of
-accuracy. Three consonant voicing classifiers were developed: (1) manually
-selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either
-averaged across the segment or anchored at the landmark), and (3) acoustic
-features computed using a convolutional neural network (CNN). All detectors are
-trained on English data (TIMIT), and tested on English, Turkish, and Spanish
-(performance measured using F1 and accuracy). Experiments demonstrate that
-manual features outperform all MFCC classifiers, while CNN features outperform
-both. MFCC-based classifiers suffer an F1 reduction of 16% absolute when
-generalized from English to other languages.
Manual features suffer only a 5%
-F1 reduction, and CNN features actually perform better in Turkish and Spanish
-than in the training language, demonstrating that features capable of
-representing long-term spectral dynamics (CNN and landmark-based features) are
-able to generalize cross-lingually with little or no loss of accuracy.
-"
-3802,1611.03558,Dan Liu and Wei Lin and Shiliang Zhang and Si Wei and Hui Jiang,Neural Networks Models for Entity Discovery and Linking,cs.CL cs.AI cs.IR," This paper describes the USTC_NELSLIP systems submitted to the Trilingual
-Entity Detection and Linking (EDL) track in the 2016 TAC Knowledge Base
-Population (KBP) contests. We have built two systems for entity discovery and
-mention detection (MD): one uses the conditional RNNLM and the other one uses
-the attention-based encoder-decoder framework. The entity linking (EL) system
-consists of two modules: rule-based candidate generation and a neural network
-probability ranking model. Moreover, some simple string-matching rules are used
-for NIL clustering. In the end, our best system achieved an F1 score of 0.624
-on the end-to-end typed mention CEAF-plus metric.
-"
-3803,1611.03596,"Eduardo G. Altmann, Laercio Dias, and Martin Gerlach",Generalized Entropies and the Similarity of Texts,physics.soc-ph cs.CL," We show how generalized Gibbs-Shannon entropies can provide new insights on
-the statistical properties of texts. The universal distribution of word
-frequencies (Zipf's law) implies that the generalized entropies, computed at
-the word level, are dominated by words in a specific range of frequencies. Here
-we show that this is the case not only for the generalized entropies but also
-for the generalized (Jensen-Shannon) divergences, used to compute the
-similarity between different texts. This finding allows us to identify the
-contribution of specific words (and word frequencies) to the different
-generalized entropies and also to estimate the size of the databases needed to
-obtain a reliable estimation of the divergences. We test our results in large
-databases of books (from the Google n-gram database) and scientific papers
-(indexed by Web of Science).
-"
-3804,1611.03599,Wei-Fan Chen and Lun-Wei Ku,"UTCNN: a Deep Learning Model of Stance Classification on Social Media
- Text",cs.CL cs.AI cs.LG," Most neural network models for document classification on social media focus
-on text information to the neglect of other information on these platforms. In
-this paper, we classify post stance on social media channels and develop UTCNN,
-a neural network model that incorporates user tastes, topic tastes, and user
-comments on posts. UTCNN not only works on social media texts, but also
-analyzes texts in forums and message boards. Experiments performed on Chinese
-Facebook data and English online debate forum data show that UTCNN achieves a
-0.755 macro-average f-score for supportive, neutral, and unsupportive stance
-classes on Facebook data, which is significantly better than models in which
-either user, topic, or comment information is withheld. This model design
-greatly mitigates the lack of data for the minor class without the use of
-oversampling. In addition, UTCNN yields a 0.842 accuracy on English online
-debate forum data, which also significantly outperforms results from previous
-work as well as other deep learning models, showing that UTCNN performs well
-regardless of language or platform.
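Record 3804 above reports a 0.755 macro-average f-score over three stance classes; the following self-contained sketch shows how such a macro-averaged score is computed, with invented gold and predicted labels.

```python
# Macro-averaged F1 over the three stance classes from record 3804.
# The gold/pred label sequences are toy examples.
def macro_f1(gold, pred, labels):
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)   # unweighted mean over classes

gold = ["supportive", "neutral", "unsupportive", "supportive"]
pred = ["supportive", "neutral", "supportive", "supportive"]
print(macro_f1(gold, pred, ["supportive", "neutral", "unsupportive"]))
```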
-" -3805,1611.03641,Oded Avraham and Yoav Goldberg,"Improving Reliability of Word Similarity Evaluation by Redesigning - Annotation Task and Performance Measure",cs.CL," We suggest a new method for creating and using gold-standard datasets for -word similarity evaluation. Our goal is to improve the reliability of the -evaluation, and we do this by redesigning the annotation task to achieve higher -inter-rater agreement, and by defining a performance measure which takes the -reliability of each annotation decision in the dataset into account. -" -3806,1611.03932,"Jangho Lee, Gyuwan Kim, Jaeyoon Yoo, Changwoo Jung, Minseok Kim, - Sungroh Yoon",Training IBM Watson using Automatically Generated Question-Answer Pairs,cs.CL," IBM Watson is a cognitive computing system capable of question answering in -natural languages. It is believed that IBM Watson can understand large corpora -and answer relevant questions more effectively than any other -question-answering system currently available. To unleash the full power of -Watson, however, we need to train its instance with a large number of -well-prepared question-answer pairs. Obviously, manually generating such pairs -in a large quantity is prohibitively time consuming and significantly limits -the efficiency of Watson's training. Recently, a large-scale dataset of over 30 -million question-answer pairs was reported. Under the assumption that using -such an automatically generated dataset could relieve the burden of manual -question-answer generation, we tried to use this dataset to train an instance -of Watson and checked the training efficiency and accuracy. According to our -experiments, using this auto-generated dataset was effective for training -Watson, complementing manually crafted question-answer pairs. To the best of -the authors' knowledge, this work is the first attempt to use a large-scale -dataset of automatically generated question-answer pairs for training IBM -Watson. We anticipate that the insights and lessons obtained from our -experiments will be useful for researchers who want to expedite Watson training -leveraged by automatically generated question-answer pairs. -" -3807,1611.03949,"Qiao Qian, Minlie Huang, Jinhao Lei, Xiaoyan Zhu",Linguistically Regularized LSTMs for Sentiment Classification,cs.CL," Sentiment understanding has been a long-term goal of AI in the past decades. -This paper deals with sentence-level sentiment classification. Though a variety -of neural network models have been proposed very recently, however, previous -models either depend on expensive phrase-level annotation, whose performance -drops substantially when trained with only sentence-level annotation; or do not -fully employ linguistic resources (e.g., sentiment lexicons, negation words, -intensity words), thus not being able to produce linguistically coherent -representations. In this paper, we propose simple models trained with -sentence-level annotation, but also attempt to generating linguistically -coherent representations by employing regularizers that model the linguistic -role of sentiment lexicons, negation words, and intensity words. Results show -that our models are effective to capture the sentiment shifting effect of -sentiment, negation, and intensity words, while still obtain competitive -results without sacrificing the models' simplicity. 
-" -3808,1611.03954,"Muhao Chen, Yingtao Tian, Mohan Yang, Carlo Zaniolo","Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge - Alignment",cs.AI cs.CL," Many recent works have demonstrated the benefits of knowledge graph -embeddings in completing monolingual knowledge graphs. Inasmuch as related -knowledge bases are built in several different languages, achieving -cross-lingual knowledge alignment will help people in constructing a coherent -knowledge base, and assist machines in dealing with different expressions of -entity relationships across diverse human languages. Unfortunately, achieving -this highly desirable crosslingual alignment by human labor is very costly and -errorprone. Thus, we propose MTransE, a translation-based model for -multilingual knowledge graph embeddings, to provide a simple and automated -solution. By encoding entities and relations of each language in a separated -embedding space, MTransE provides transitions for each embedding vector to its -cross-lingual counterparts in other spaces, while preserving the -functionalities of monolingual embeddings. We deploy three different techniques -to represent cross-lingual transitions, namely axis calibration, translation -vectors, and linear transformations, and derive five variants for MTransE using -different loss functions. Our models can be trained on partially aligned -graphs, where just a small portion of triples are aligned with their -cross-lingual counterparts. The experiments on cross-lingual entity matching -and triple-wise alignment verification show promising results, with some -variants consistently outperforming others on different tasks. We also explore -how MTransE preserves the key properties of its monolingual counterpart TransE. -" -3809,1611.04010,Vrishabh Ajay Lakhani and Rohan Mahadev,"Multi-Language Identification Using Convolutional Recurrent Neural - Network",cs.CL," Language Identification, being an important aspect of Automatic Speaker -Recognition has had many changes and new approaches to ameliorate performance -over the last decade. We compare the performance of using audio spectrum in the -log scale and using Polyphonic sound sequences from raw audio samples to train -the neural network and to classify speech as either English or Spanish. To -achieve this, we use the novel approach of using a Convolutional Recurrent -Neural Network using Long Short Term Memory (LSTM) or a Gated Recurrent Unit -(GRU) for forward propagation of the neural network. Our hypothesis is that the -performance of using polyphonic sound sequence as features and both LSTM and -GRU as the gating mechanisms for the neural network outperform the traditional -MFCC features using a unidirectional Deep Neural Network. -" -3810,1611.04033,Ibrahim Abu El-khair,1.5 billion words Arabic Corpus,cs.CL cs.DL cs.IR," This study is an attempt to build a contemporary linguistic corpus for Arabic -language. The corpus produced, is a text corpus includes more than five million -newspaper articles. It contains over a billion and a half words in total, out -of which, there is about three million unique words. The data were collected -from newspaper articles in ten major news sources from eight Arabic countries, -over a period of fourteen years. The corpus was encoded with two types of -encoding, namely: UTF-8, and Windows CP-1256. Also it was marked with two -mark-up languages, namely: SGML, and XML. 
-" -3811,1611.04052,Xiaojun Zhang,Semi-automatic Simultaneous Interpreting Quality Evaluation,cs.CL," Increasing interpreting needs a more objective and automatic measurement. We -hold a basic idea that 'translating means translating meaning' in that we can -assessment interpretation quality by comparing the meaning of the interpreting -output with the source input. That is, a translation unit of a 'chunk' named -Frame which comes from frame semantics and its components named Frame Elements -(FEs) which comes from Frame Net are proposed to explore their matching rate -between target and source texts. A case study in this paper verifies the -usability of semi-automatic graded semantic-scoring measurement for human -simultaneous interpreting and shows how to use frame and FE matches to score. -Experiments results show that the semantic-scoring metrics have a significantly -correlation coefficient with human judgment. -" -3812,1611.04122,Yangqiu Song and Stephen Mayhew and Dan Roth,"Cross-lingual Dataless Classification for Languages with Small Wikipedia - Presence",cs.CL," This paper presents an approach to classify documents in any language into an -English topical label space, without any text categorization training data. The -approach, Cross-Lingual Dataless Document Classification (CLDDC) relies on -mapping the English labels or short category description into a Wikipedia-based -semantic representation, and on the use of the target language Wikipedia. -Consequently, performance could suffer when Wikipedia in the target language is -small. In this paper, we focus on languages with small Wikipedias, -(Small-Wikipedia languages, SWLs). We use a word-level dictionary to convert -documents in a SWL to a large-Wikipedia language (LWLs), and then perform CLDDC -based on the LWL's Wikipedia. This approach can be applied to thousands of -languages, which can be contrasted with machine translation, which is a -supervision heavy approach and can be done for about 100 languages. We also -develop a ranking algorithm that makes use of language similarity metrics to -automatically select a good LWL, and show that this significantly improves -classification of SWLs' documents, performing comparably to the best bridge -possible. -" -3813,1611.04125,"Xu Han, Zhiyuan Liu, Maosong Sun","Joint Representation Learning of Text and Knowledge for Knowledge Graph - Completion",cs.CL," Joint representation learning of text and knowledge within a unified semantic -space enables us to perform knowledge graph completion more accurately. In this -work, we propose a novel framework to embed words, entities and relations into -the same continuous vector space. In this model, both entity and relation -embeddings are learned by taking knowledge graph and plain text into -consideration. In experiments, we evaluate the joint learning model on three -tasks including entity prediction, relation prediction and relation -classification from text. The experiment results show that our model can -significantly and consistently improve the performance on the three tasks as -compared with other baselines. -" -3814,1611.04230,"Ramesh Nallapati, Feifei Zhai, Bowen Zhou","SummaRuNNer: A Recurrent Neural Network based Sequence Model for - Extractive Summarization of Documents",cs.CL," We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model -for extractive summarization of documents and show that it achieves performance -better than or comparable to state-of-the-art. 
Our model has the additional
-advantage of being very interpretable, since it allows visualization of its
-predictions broken up by abstract features such as information content,
-salience and novelty. Another novel contribution of our work is abstractive
-training of our extractive model that can train on human-generated reference
-summaries alone, eliminating the need for sentence-level extractive labels.
-"
-3815,1611.04233,"Shuming Ma, Xu Sun",A New Recurrent Neural CRF for Learning Non-linear Edge Features,cs.CL," Conditional Random Field (CRF) and recurrent neural models have achieved
-success in structured prediction. More recently, there is a marriage of CRF and
-recurrent neural models, so that we can gain from both non-linear dense
-features and a globally normalized CRF objective. These recurrent neural CRF
-models mainly focus on encoding node features in CRF undirected graphs.
-However, edge features prove important to CRF in structured prediction. In this
-work, we introduce a new recurrent neural CRF model, which learns non-linear
-edge features, and thus encodes non-linear features completely. We compare our
-model with different neural models in well-known structured prediction tasks.
-Experiments show that our model outperforms state-of-the-art methods in NP
-chunking, shallow parsing, Chinese word segmentation and POS tagging.
-"
-3816,1611.04234,"Hangfeng He, Xu Sun","F-Score Driven Max Margin Neural Network for Named Entity Recognition in
- Chinese Social Media",cs.CL," We focus on named entity recognition (NER) for Chinese social media. With
-massive unlabeled text and a quite limited labelled corpus, we propose a
-semi-supervised learning model based on a B-LSTM neural network. To take
-advantage of traditional methods in NER such as CRF, we combine transition
-probability with deep learning in our model. To bridge the gap between label
-accuracy and F-score of NER, we construct a model which can be directly trained
-on F-score. Considering the instability of the F-score-driven method and the
-meaningful information provided by label accuracy, we propose an integrated
-method to train on both F-score and label accuracy. Our integrated model yields
-a 7.44\% improvement over the previous state-of-the-art result.
-"
-3817,1611.04244,"Ramesh Nallapati, Bowen Zhou, Mingbo Ma","Classify or Select: Neural Architectures for Extractive Document
- Summarization",cs.CL," We present two novel and contrasting Recurrent Neural Network (RNN) based
-architectures for extractive summarization of documents. The Classifier based
-architecture sequentially accepts or rejects each sentence in the original
-document order for its membership in the final summary. The Selector
-architecture, on the other hand, is free to pick one sentence at a time in any
-arbitrary order to piece together the summary. Our models under both
-architectures jointly capture the notions of salience and redundancy of
-sentences. In addition, these models have the advantage of being very
-interpretable, since they allow visualization of their predictions broken up by
-abstract features such as information content, salience and redundancy. We show
-that our models reach or outperform state-of-the-art supervised models on two
-different corpora. We also recommend the conditions under which one
-architecture is superior to the other based on experimental evidence.
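As a toy counterpart to the Selector architecture in record 3817 above, the following greedy sketch trades salience against redundancy using simple frequency and overlap arithmetic; the paper itself uses learned RNN representations, so this is illustrative only.

```python
# Free-order greedy sentence selection: at each step pick the sentence
# with the best salience-minus-redundancy score. Toy scoring throughout.
from collections import Counter

def sentence_score(sentence, df, summary):
    tokens = sentence.lower().split()
    salience = sum(df[t] for t in tokens) / len(tokens)          # frequent terms
    chosen = {w for s in summary for w in s.lower().split()}
    redundancy = sum(t in chosen for t in tokens) / len(tokens)  # overlap with picks
    return salience - 2.0 * redundancy

def greedy_select(sentences, k=2):
    # df: in how many sentences each token appears (document frequency)
    df = Counter(w for s in sentences for w in set(s.lower().split()))
    summary, candidates = [], list(sentences)
    while candidates and len(summary) < k:
        best = max(candidates, key=lambda s: sentence_score(s, df, summary))
        summary.append(best)
        candidates.remove(best)
    return summary

doc = [
    "the storm hit the coast on monday",
    "the storm caused major flooding on the coast",
    "officials opened shelters for displaced residents",
]
print(greedy_select(doc))  # avoids picking two near-duplicate storm sentences
```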
-" -3818,1611.04326,"Aditya Joshi, Prayas Jain, Pushpak Bhattacharyya, Mark Carman","`Who would have thought of that!': A Hierarchical Topic Model for - Extraction of Sarcasm-prevalent Topics and Sarcasm Detection",cs.CL," Topic Models have been reported to be beneficial for aspect-based sentiment -analysis. This paper reports a simple topic model for sarcasm detection, a -first, to the best of our knowledge. Designed on the basis of the intuition -that sarcastic tweets are likely to have a mixture of words of both sentiments -as against tweets with literal sentiment (either positive or negative), our -hierarchical topic model discovers sarcasm-prevalent topics and topic-level -sentiment. Using a dataset of tweets labeled using hashtags, the model -estimates topic-level, and sentiment-level distributions. Our evaluation shows -that topics such as `work', `gun laws', `weather' are sarcasm-prevalent topics. -Our model is also able to discover the mixture of sentiment-bearing words that -exist in a text of a given sentiment-related label. Finally, we apply our model -to predict sarcasm in tweets. We outperform two prior work based on statistical -classifiers with specific features, by around 25\%. -" -3819,1611.04358,"Weijie Huang, Jun Wang","Character-level Convolutional Network for Text Classification Applied to - Chinese Corpus",cs.CL," This article provides an interesting exploration of character-level -convolutional neural network solving Chinese corpus text classification -problem. We constructed a large-scale Chinese language dataset, and the result -shows that character-level convolutional neural network works better on Chinese -corpus than its corresponding pinyin format dataset. This is the first time -that character-level convolutional neural network applied to text -classification problem. -" -3820,1611.04361,"Marek Rei, Gamal K.O. Crichton, Sampo Pyysalo",Attending to Characters in Neural Sequence Labeling Models,cs.CL cs.LG cs.NE," Sequence labeling architectures use word embeddings for capturing similarity, -but suffer when handling previously unseen or rare words. We investigate -character-level extensions to such models and propose a novel architecture for -combining alternative word representations. By using an attention mechanism, -the model is able to dynamically decide how much information to use from a -word- or character-level component. We evaluated different architectures on a -range of sequence labeling datasets, and character-level extensions were found -to improve performance on every benchmark. In addition, the proposed -attention-based architecture delivered the best results even with a smaller -number of trainable parameters. -" -3821,1611.04491,"Jinying Chen, Abhyuday N. Jagannatha, Samah J. Jarad, Hong Yu","Ranking medical jargon in electronic health record notes by adapted - distant supervision",cs.CL," Objective: Allowing patients to access their own electronic health record -(EHR) notes through online patient portals has the potential to improve -patient-centered care. However, medical jargon, which abounds in EHR notes, has -been shown to be a barrier for patient EHR comprehension. Existing knowledge -bases that link medical jargon to lay terms or definitions play an important -role in alleviating this problem but have low coverage of medical jargon in -EHRs. We developed a data-driven approach that mines EHRs to identify and rank -medical jargon based on its importance to patients, to support the building of -EHR-centric lay language resources. 
- Methods: We developed an innovative adapted distant supervision (ADS) model -based on support vector machines to rank medical jargon from EHRs. For distant -supervision, we utilized the open-access, collaborative consumer health -vocabulary, a large, publicly available resource that links lay terms to -medical jargon. We explored both knowledge-based features from the Unified -Medical Language System and distributed word representations learned from -unlabeled large corpora. We evaluated the ADS model using physician-identified -important medical terms. - Results: Our ADS model significantly surpassed two state-of-the-art automatic -term recognition methods, TF*IDF and C-Value, yielding 0.810 ROC-AUC versus -0.710 and 0.667, respectively. Our model identified 10K important medical -jargon terms after ranking over 100K candidate terms mined from over 7,500 EHR -narratives. - Conclusion: Our work is an important step towards enriching lexical resources -that link medical jargon to lay terms/definitions to support patient EHR -comprehension. The identified medical jargon terms and their rankings are -available upon request. -" -3822,1611.04496,"Wanjia He, Weiran Wang, Karen Livescu",Multi-view Recurrent Neural Acoustic Word Embeddings,cs.CL," Recent work has begun exploring neural acoustic word -embeddings---fixed-dimensional vector representations of arbitrary-length -speech segments corresponding to words. Such embeddings are applicable to -speech retrieval and recognition tasks, where reasoning about whole words may -make it possible to avoid ambiguous sub-word representations. The main idea is -to map acoustic sequences to fixed-dimensional vectors such that examples of -the same word are mapped to similar vectors, while different-word examples are -mapped to very different vectors. In this work we take a multi-view approach to -learning acoustic word embeddings, in which we jointly learn to embed acoustic -sequences and their corresponding character sequences. We use deep -bidirectional LSTM embedding models and multi-view contrastive losses. We study -the effect of different loss variants, including fixed-margin and -cost-sensitive losses. Our acoustic word embeddings improve over previous -approaches for the task of word discrimination. We also present results on -other tasks that are enabled by the multi-view approach, including cross-view -word discrimination and word similarity. -" -3823,1611.04503,Hideki Nakayama and Noriki Nishida,"Zero-resource Machine Translation by Multimodal Encoder-decoder Network - with Multimedia Pivot",cs.CL cs.AI cs.CV cs.MM," We propose an approach to build a neural machine translation system with no -supervised resources (i.e., no parallel corpora) using multimodal embedded -representation over texts and images. Based on the assumption that text -documents are often likely to be described with other multimedia information -(e.g., images) somewhat related to the content, we try to indirectly estimate -the relevance between two languages. Using multimedia as the ""pivot"", we -project all modalities into one common hidden space where samples belonging to -similar semantic concepts should come close to each other, whatever the -observed space of each sample is. This modality-agnostic representation is the -key to bridging the gap between different modalities. Putting a decoder on top -of it, our network can flexibly draw the outputs from any input modality. -Notably, in the testing phase, we need only source language texts as the input -for translation. 
In experiments, we tested our method on two benchmarks to show -that it can achieve reasonable translation performance. We compared and -investigated several possible implementations and found that an end-to-end -model that simultaneously optimized both rank loss in multimodal encoders and -cross-entropy loss in decoders performed the best. -" -3824,1611.04558,"Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, - Zhifeng Chen, Nikhil Thorat, Fernanda Vi\'egas, Martin Wattenberg, Greg - Corrado, Macduff Hughes, Jeffrey Dean","Google's Multilingual Neural Machine Translation System: Enabling - Zero-Shot Translation",cs.CL cs.AI," We propose a simple solution to use a single Neural Machine Translation (NMT) -model to translate between multiple languages. Our solution requires no change -in the model architecture from our base system but instead introduces an -artificial token at the beginning of the input sentence to specify the required -target language. The rest of the model, which includes encoder, decoder and -attention, remains unchanged and is shared across all languages. Using a shared -wordpiece vocabulary, our approach enables Multilingual NMT using a single -model without any increase in parameters, which is significantly simpler than -previous proposals for Multilingual NMT. Our method often improves the -translation quality of all involved language pairs, even while keeping the -total number of model parameters constant. On the WMT'14 benchmarks, a single -multilingual model achieves comparable performance for -English$\rightarrow$French and surpasses state-of-the-art results for -English$\rightarrow$German. Similarly, a single multilingual model surpasses -state-of-the-art results for French$\rightarrow$English and -German$\rightarrow$English on WMT'14 and WMT'15 benchmarks respectively. On -production corpora, multilingual models of up to twelve language pairs allow -for better translation of many individual pairs. In addition to improving the -translation quality of language pairs that the model was trained with, our -models can also learn to perform implicit bridging between language pairs never -seen explicitly during training, showing that transfer learning and zero-shot -translation is possible for neural translation. Finally, we show analyses that -hints at a universal interlingua representation in our models and show some -interesting examples when mixing languages. -" -3825,1611.04642,"Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao",Link Prediction using Embedded Knowledge Graphs,cs.AI cs.CL cs.LG," Since large knowledge bases are typically incomplete, missing facts need to -be inferred from observed facts in a task called knowledge base completion. The -most successful approaches to this task have typically explored explicit paths -through sequences of triples. These approaches have usually resorted to -human-designed sampling procedures, since large knowledge graphs produce -prohibitively large numbers of possible paths, most of which are uninformative. -As an alternative approach, we propose performing a single, short sequence of -interactive lookup operations on an embedded knowledge graph which has been -trained through end-to-end backpropagation to be an optimized and compressed -version of the initial knowledge base. Our proposed model, called Embedded -Knowledge Graph Network (EKGN), achieves new state-of-the-art results on -popular knowledge base completion benchmarks. 
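A toy sketch of the "short sequence of interactive lookup operations" described in record 3825 above, with random key/value matrices standing in for the trained embedded knowledge graph; the dimensions, step count, and update rule are invented for illustration.

```python
# Each step attends over an embedded, compressed knowledge base and
# refines the query with the retrieved information.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, n_facts, n_steps = 16, 50, 3
keys = rng.normal(size=(n_facts, d))     # stand-in for the embedded KB
values = rng.normal(size=(n_facts, d))

query = rng.normal(size=d)               # encodes (head entity, relation)
for _ in range(n_steps):                 # short sequence of lookups
    attn = softmax(keys @ query)         # relevance of each stored fact
    query = query + attn @ values        # refine query with retrieved info

print("final query norm:", float(np.linalg.norm(query)))
```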
-" -3826,1611.04684,"Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou",Knowledge Enhanced Hybrid Neural Network for Text Matching,cs.CL," Long text brings a big challenge to semantic matching due to their -complicated semantic and syntactic structures. To tackle the challenge, we -consider using prior knowledge to help identify useful information and filter -out noise to matching in long text. To this end, we propose a knowledge -enhanced hybrid neural network (KEHNN). The model fuses prior knowledge into -word representations by knowledge gates and establishes three matching channels -with words, sequential structures of sentences given by Gated Recurrent Units -(GRU), and knowledge enhanced representations. The three channels are processed -by a convolutional neural network to generate high level features for matching, -and the features are synthesized as a matching score by a multilayer -perceptron. The model extends the existing methods by conducting matching on -words, local structures of sentences, and global context of sentences. -Evaluation results from extensive experiments on public data sets for question -answering and conversation show that KEHNN can significantly outperform -the-state-of-the-art matching models and particularly improve the performance -on pairs with long text. -" -3827,1611.04741,"Biswajit Paria, K. M. Annervaz, Ambedkar Dukkipati, Ankush Chatterjee, - Sanjay Podder","A Neural Architecture Mimicking Humans End-to-End for Natural Language - Inference",cs.CL," In this work we use the recent advances in representation learning to propose -a neural architecture for the problem of natural language inference. Our -approach is aligned to mimic how a human does the natural language inference -process given two statements. The model uses variants of Long Short Term Memory -(LSTM), attention mechanism and composable neural networks, to carry out the -task. Each part of our model can be mapped to a clear functionality humans do -for carrying out the overall task of natural language inference. The model is -end-to-end differentiable enabling training by stochastic gradient descent. On -Stanford Natural Language Inference(SNLI) dataset, the proposed model achieves -better accuracy numbers than all published models in literature. -" -3828,1611.04798,Thanh-Le Ha and Jan Niehues and Alexander Waibel,"Toward Multilingual Neural Machine Translation with Universal Encoder - and Decoder",cs.CL," In this paper, we present our first attempts in building a multilingual -Neural Machine Translation framework under a unified approach. We are then able -to employ attention-based NMT for many-to-many multilingual translation tasks. -Our approach does not require any special treatment on the network architecture -and it allows us to learn minimal number of free parameters in a standard way -of training. Our approach has shown its effectiveness in an under-resourced -translation scenario with considerable improvements up to 2.6 BLEU points. In -addition, the approach has achieved interesting and promising results when -applied in the translation task that there is no direct parallel corpus between -source and target languages. -" -3829,1611.04822,"Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, - Sourish Dasgupta and Jens Lehmann",SimDoc: Topic Sequence Alignment based Document Similarity Framework,cs.CL," Document similarity is the problem of estimating the degree to which a given -pair of documents has similar semantic content. 
An accurate document similarity
-measure can improve several enterprise relevant tasks such as document
-clustering, text mining, and question-answering. In this paper, we show that a
-document's thematic flow, which is often disregarded by bag-of-word techniques,
-is pivotal in estimating their similarity. To this end, we propose a novel
-semantic document similarity framework, called SimDoc. We model documents as
-topic-sequences, where topics represent latent generative clusters of related
-words. Then, we use a sequence alignment algorithm to estimate their semantic
-similarity. We further conceptualize a novel mechanism to compute topic-topic
-similarity to fine tune our system. In our experiments, we show that SimDoc
-outperforms many contemporary bag-of-words techniques in accurately computing
-document similarity, and on practical applications such as document clustering.
-"
-3830,1611.04837,"Sophie J. Lee, Howard Liu, and Michael D. Ward",Lost in Space: Geolocation in Event Data,cs.CL," Extracting the ""correct"" location information from text data, i.e.,
-determining the place of event, has long been a goal for automated text
-processing. To approximate human-like coding schema, we introduce a supervised
-machine learning algorithm that classifies each location word to be either
-correct or incorrect. We use news articles collected from around the world
-(Integrated Crisis Early Warning System [ICEWS] data and Open Event Data
-Alliance [OEDA] data) to test our algorithm that consists of two stages. In the
-feature selection stage, we extract contextual information from texts, namely,
-the N-gram patterns for location words, the frequency of mention, and the
-context of the sentences containing location words. In the classification
-stage, we use three classifiers to estimate the model parameters in the
-training set and then to predict whether a location word in the test set news
-articles is the place of the event. The validation results show that our
-algorithm improves the accuracy rate of current dictionary-based geolocation
-methods by as much as 25%.
-"
-3831,1611.04841,"R.R. Xie, W.B. Deng, D.J. Wang and L.P. Csernai",Quantitative Entropy Study of Language Complexity,cs.CL physics.soc-ph," We study the entropy of Chinese and English texts, based on characters in
-case of Chinese texts and based on words for both languages. Significant
-differences are found between the languages and between different personal
-styles of debating partners. The entropy analysis points in the direction of
-lower entropy, that is of higher complexity. Such a text analysis could be
-applied to individuals of different styles, a single individual at different
-ages, as well as different groups of the population.
-"
-3832,1611.04842,Francesco Fumarola,The Role of Word Length in Semantic Topology,q-bio.NC cs.CL," A topological argument is presented concerning the structure of semantic
-space, based on the negative correlation between polysemy and word length. The
-resulting graph structure is applied to the modeling of free-recall
-experiments, resulting in predictions on the comparative values of recall
-probabilities. Associative recall is found to favor longer words whereas
-sequential recall is found to favor shorter words. Data from the PEERS
-experiments of Lohnas et al. (2015) and Healey and Kahana (2016) confirm both
-predictions, with correlation coefficients $r_{seq}= -0.17$ and $r_{ass}=
-+0.17$.
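-
-The entropy study above rests on the standard Shannon estimate over word
-frequencies; a minimal version:
-
-import math
-from collections import Counter
-
-def word_entropy(text):
-    words = text.lower().split()
-    counts = Counter(words)
-    n = len(words)
-    # H = -sum p(w) * log2 p(w), with p estimated by relative frequency
-    return -sum((c / n) * math.log2(c / n) for c in counts.values())
-
-print(word_entropy("the cat sat on the mat the end"))
-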
The argument is then applied to predicting global properties of list
-recall, which leads to a novel explanation for the word-length effect based on
-the optimization of retrieval strategies.
-"
-3833,1611.04887,"J Ganesh, Manish Gupta, Vasudeva Varma","Interpreting the Syntactic and Social Elements of the Tweet
- Representations via Elementary Property Prediction Tasks",cs.CL cs.SI," Research in social media analysis is experiencing a recent surge with a large
-number of works applying representation learning models to solve high-level
-syntactico-semantic tasks such as sentiment analysis, semantic textual
-similarity computation, hashtag prediction and so on. Although the performance
-of the representation learning models is better than the traditional baselines
-for the tasks, little is known about the core properties of a tweet encoded
-within the representations. Understanding these core properties would empower
-us in making generalizable conclusions about the quality of representations.
-Our work presented here constitutes the first step in opening the black-box of
-vector embedding for social media posts, with emphasis on tweets in particular.
- In order to understand the core properties encoded in a tweet representation,
-we evaluate the representations to estimate the extent to which they can model
-each of those properties such as tweet length, presence of words, hashtags,
-mentions, capitalization, and so on. This is done with the help of multiple
-classifiers which take the representation as input. Essentially, each
-classifier evaluates one of the syntactic or social properties which are
-arguably salient for a tweet. This is also the first holistic study on
-extensively analysing the ability to encode these properties for a wide variety
-of tweet representation models including the traditional unsupervised methods
-(BOW, LDA), unsupervised representation learning methods (Siamese CBOW,
-Tweet2Vec) as well as supervised methods (CNN, BLSTM).
-"
-3834,1611.04928,"Yong Cheng, Yang Liu, Qian Yang, Maosong Sun and Wei Xu",Neural Machine Translation with Pivot Languages,cs.CL," While recent neural machine translation approaches have delivered
-state-of-the-art performance for resource-rich language pairs, they suffer from
-the data scarcity problem for resource-scarce language pairs. Although this
-problem can be alleviated by exploiting a pivot language to bridge the source
-and target languages, the source-to-pivot and pivot-to-target translation
-models are usually independently trained. In this work, we introduce a joint
-training algorithm for pivot-based neural machine translation. We propose three
-methods to connect the two models and enable them to interact with each other
-during training. Experiments on Europarl and WMT corpora show that joint
-training of source-to-pivot and pivot-to-target models leads to significant
-improvements over independent training across various languages.
-"
-3835,1611.04953,"Jingjing Gong, Xinchi Chen, Xipeng Qiu, Xuanjing Huang",End-to-End Neural Sentence Ordering Using Pointer Network,cs.CL," Sentence ordering is one of the important tasks in NLP. Previous works mainly
-focused on improving its performance by using a pair-wise strategy. However, it
-is nontrivial for pair-wise models to incorporate the contextual sentence
-information. In addition, error propagation could be introduced by using the
-pipeline strategy in pair-wise models.
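-
-A minimal probing setup in the spirit of the tweet-representation abstract
-above: a simple classifier is trained to read one elementary property back
-out of fixed embeddings. Here both the embeddings and the "property" label
-are synthetic stand-ins, not real tweet data.
-
-import numpy as np
-from sklearn.linear_model import LogisticRegression
-
-rng = np.random.default_rng(2)
-emb = rng.normal(size=(500, 64))            # stand-in tweet embeddings
-length_label = (emb[:, 0] > 0).astype(int)  # synthetic elementary property
-
-# If the probe scores well, the property is linearly encoded in the vectors.
-probe = LogisticRegression().fit(emb[:400], length_label[:400])
-print("probing accuracy:", probe.score(emb[400:], length_label[400:]))
-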
In this paper, we propose an end-to-end
-neural approach to address the sentence ordering problem, which uses the
-pointer network (Ptr-Net) to alleviate the error propagation problem and
-utilize the whole contextual information. Experimental results show the
-effectiveness of the proposed model. The source code and dataset of this paper are
-available.
-"
-3836,1611.04989,"Raj Nath Patel, Prakash B. Pimpale, M Sasikumar","Recurrent Neural Network based Part-of-Speech Tagger for Code-Mixed
- Social Media Text",cs.CL," This paper describes Centre for Development of Advanced Computing's (CDACM)
-submission to the shared task-'Tool Contest on POS tagging for Code-Mixed
-Indian Social Media (Facebook, Twitter, and Whatsapp) Text', collocated with
-ICON-2016. The shared task was to predict the Part of Speech (POS) tag at the word
-level for a given text. The code-mixed text is generated mostly on social media
-by multilingual users. The presence of multilingual words,
-transliterations, and spelling variations make such content linguistically
-complex. In this paper, we propose an approach to POS tag code-mixed social
-media text using Recurrent Neural Network Language Model (RNN-LM) architecture.
-We submitted the results for Hindi-English (hi-en), Bengali-English (bn-en),
-and Telugu-English (te-en) code-mixed data.
-"
-3837,1611.05010,"Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos",Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm,stat.ML cs.CL cs.IR cs.SI," In topic modeling, many algorithms that guarantee identifiability of the
-topics have been developed under the premise that there exist anchor words --
-i.e., words that only appear (with positive probability) in one topic.
-Follow-up work has resorted to third- or higher-order statistics of the data
-corpus to relax the anchor word assumption. Reliable estimates of higher-order
-statistics are hard to obtain, however, and the identification of topics under
-those models hinges on uncorrelatedness of the topics, which can be
-unrealistic. This paper revisits topic modeling based on second-order moments,
-and proposes an anchor-free topic mining framework. The proposed approach
-guarantees the identification of the topics under a much milder condition
-compared to the anchor-word assumption, thereby exhibiting much better
-robustness in practice. The associated algorithm only involves one
-eigen-decomposition and a few small linear programs. This makes it easy to
-implement and scale up to very large problem instances. Experiments using the
-TDT2 and Reuters-21578 corpus demonstrate that the proposed anchor-free
-approach exhibits very favorable performance (measured using coherence,
-similarity count, and clustering accuracy metrics) compared to the prior art.
-"
-3838,1611.05104,"Shayne Longpre, Sabeek Pradhan, Caiming Xiong, Richard Socher","A Way out of the Odyssey: Analyzing and Combining Recent Insights for
- LSTMs",cs.CL cs.AI," LSTMs have become a basic building block for many deep NLP models. In recent
-years, many improvements and variations have been proposed for deep sequence
-models in general, and LSTMs in particular. We propose and analyze a series of
-augmentations and modifications to LSTM networks resulting in improved
-performance for text classification datasets. We observe compounding
-improvements on traditional LSTMs using Monte Carlo test-time model averaging,
-average pooling, and residual connections, along with four other suggested
-modifications.
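-
-Monte Carlo test-time model averaging, one of the LSTM augmentations listed
-in the abstract above, amounts to averaging predictions over several
-stochastic forward passes with dropout left active. A schematic version with
-a dummy linear model (the architecture and dropout rate are assumptions):
-
-import numpy as np
-
-rng = np.random.default_rng(3)
-W = rng.normal(size=(10, 64))   # stand-in for a trained model
-
-def stochastic_forward(x, drop=0.5):
-    # Dropout stays on at test time, giving a different prediction per pass.
-    mask = rng.random(x.shape) > drop
-    logits = W @ (x * mask / (1 - drop))
-    e = np.exp(logits - logits.max())
-    return e / e.sum()
-
-x = rng.normal(size=64)
-probs = np.mean([stochastic_forward(x) for _ in range(50)], axis=0)
-print(probs.argmax())
-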
Our analysis provides a simple, reliable, and high-quality
-baseline model.
-"
-3839,1611.05118,"Mohit Iyyer, Varun Manjunatha, Anupam Guha, Yogarshi Vyas, Jordan
- Boyd-Graber, Hal Daum\'e III, Larry Davis","The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels
- in Comic Book Narratives",cs.CV cs.CL," Visual narrative is often a combination of explicit information and judicious
-omissions, relying on the viewer to supply missing details. In comics, most
-movements in time and space are hidden in the ""gutters"" between panels. To
-follow the story, readers logically connect panels together by inferring unseen
-actions through a process called ""closure"". While computers can now describe
-what is explicitly depicted in natural images, in this paper we examine whether
-they can understand the closure-driven narratives conveyed by stylized artwork
-and dialogue in comic book panels. We construct a dataset, COMICS, that
-consists of over 1.2 million panels (120 GB) paired with automatic textbox
-transcriptions. An in-depth analysis of COMICS demonstrates that neither text
-nor image alone can tell a comic book story, so a computer must understand both
-modalities to keep up with the plot. We introduce three cloze-style tasks that
-ask models to predict narrative and character-centric aspects of a panel given
-n preceding panels as context. Various deep neural architectures underperform
-human baselines on these tasks, suggesting that COMICS contains fundamental
-challenges for both vision and language.
-"
-3840,1611.05239,Kimmo Kettunen,"How to do lexical quality estimation of a large OCRed historical Finnish
- newspaper collection with scarce resources",cs.CL," The National Library of Finland has digitized the historical newspapers
-published in Finland between 1771 and 1910. This collection contains
-approximately 1.95 million pages in Finnish and Swedish. The Finnish part of the
-collection consists of about 2.40 billion words. The National Library's Digital
-Collections are offered via the digi.kansalliskirjasto.fi web service, also
-known as Digi. Part of the newspaper material (from 1771 to 1874) is also
-freely downloadable from The Language Bank of Finland, provided by the
-FINCLARIN consortium. The collection can also be accessed through the Korp
-environment that has been developed by Spr{\aa}kbanken at the University of
-Gothenburg and extended by the FINCLARIN team at the University of Helsinki to
-provide concordances of text resources. A Cranfield style information retrieval
-test collection has also been produced out of a small part of the Digi
-newspaper material at the University of Tampere.
- Quality of OCRed collections is an important topic in digital humanities, as
-it affects the general usability and searchability of collections. There is no
-single available method to assess the quality of large collections, but different
-methods can be used to approximate quality. This paper discusses different
-corpus analysis style methods to approximate the overall lexical quality of the
-Finnish part of the Digi collection. Methods include usage of parallel samples
-and word error rates, usage of morphological analyzers, frequency analysis of
-words and comparisons to comparable edited lexical data. Our aim in the quality
-analysis is twofold: firstly, to analyze the present state of the lexical data
-and secondly, to establish a set of assessment methods that build up a compact
-procedure for quality assessment after e.g. new OCRing or post-correction of
-the material.
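-
-One of the corpus-analysis methods mentioned above, checking how many tokens
-a reference lexicon or morphological analyzer recognizes, reduces to a simple
-coverage ratio; the toy lexicon below is an assumption:
-
-def lexical_quality(tokens, lexicon):
-    # Fraction of tokens recognized by the reference lexicon; a rough
-    # proxy for the OCR quality of a collection.
-    recognized = sum(1 for t in tokens if t.lower() in lexicon)
-    return recognized / len(tokens)
-
-lexicon = {"the", "ship", "arrived", "in", "port"}
-print(lexical_quality("The shlp arrived in port".split(), lexicon))  # 0.8
-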
In the discussion part of the paper we shall synthesize results
-of our different analyses.
-"
-3841,1611.05360,Javier de la Rosa and Juan-Luis Su\'arez,The Life of Lazarillo de Tormes and of His Machine Learning Adversities,cs.CL," A summit work of the Spanish Golden Age and forefather of the so-called
-picaresque novel, The Life of Lazarillo de Tormes and of His Fortunes and
-Adversities still remains an anonymous text. Although distinguished scholars
-have tried to attribute it to different authors based on a variety of criteria,
-a consensus has yet to be reached. The list of candidates is long and not all
-of them enjoy the same support within the scholarly community. Analyzing their
-works from a data-driven perspective and applying machine learning techniques
-for style and text fingerprinting, we shed light on the authorship of the
-Lazarillo. As in a state-of-the-art survey, we discuss the methods used and how
-they perform in our specific case. According to our methodology, the most
-likely author seems to be Juan Arce de Ot\'alora, closely followed by Alfonso
-de Vald\'es. The method states that no certain attribution can be made with
-the given corpus.
-"
-3842,1611.05379,Prof. Roger K. Moore,"PCT and Beyond: Towards a Computational Framework for `Intelligent'
- Communicative Systems",cs.AI cs.CL cs.HC cs.RO," Recent years have witnessed increasing interest in the potential benefits of
-`intelligent' autonomous machines such as robots. Honda's Asimo humanoid robot,
-iRobot's Roomba robot vacuum cleaner and Google's driverless cars have fired
-the imagination of the general public, and social media buzz with speculation
-about a utopian world of helpful robot assistants or the coming robot
-apocalypse! However, there is a long way to go before autonomous systems reach
-the level of capabilities required for even the simplest of tasks involving
-human-robot interaction - especially if it involves communicative behaviour
-such as speech and language. Of course the field of Artificial Intelligence
-(AI) has made great strides in these areas, and has moved on from abstract
-high-level rule-based paradigms to embodied architectures whose operations are
-grounded in real physical environments. What is still missing, however, is an
-overarching theory of intelligent communicative behaviour that informs
-system-level design decisions in order to provide a more coherent approach to
-system integration. This chapter introduces the beginnings of such a framework
-inspired by the principles of Perceptual Control Theory (PCT). In particular,
-it is observed that PCT has hitherto tended to view perceptual processes as a
-relatively straightforward series of transformations from sensation to
-perception, and has overlooked the potential of powerful generative model-based
-solutions that have emerged in practical fields such as visual or auditory
-scene analysis. Starting from first principles, a sequence of arguments is
-presented which not only shows how these ideas might be integrated into PCT,
-but which also extends PCT towards a remarkably symmetric architecture for a
-needs-driven communicative agent. It is concluded that, if behaviour is the
-control of perception, then perception is the simulation of behaviour.
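-
-A toy version of the style-fingerprinting step behind the Lazarillo
-attribution study above: candidate authors are ranked by cosine similarity of
-character trigram profiles. The snippets and the trigram choice are
-assumptions; real systems use far richer features and full corpora.
-
-import numpy as np
-from collections import Counter
-
-def trigram_profile(text, vocab):
-    # Normalized character-trigram frequency vector over a fixed vocabulary.
-    c = Counter(text[i:i + 3] for i in range(len(text) - 2))
-    v = np.array([c[g] for g in vocab], dtype=float)
-    return v / (np.linalg.norm(v) + 1e-9)
-
-anon = "el lazarillo de tormes y de sus fortunas"
-candidates = {"A": "el coloquio de los perros y gatos",
-              "B": "de sus fortunas y adversidades"}
-vocab = sorted({anon[i:i + 3] for i in range(len(anon) - 2)})
-scores = {a: trigram_profile(t, vocab) @ trigram_profile(anon, vocab)
-          for a, t in candidates.items()}
-print(max(scores, key=scores.get))   # most stylistically similar candidate
-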
-" -3843,1611.05384,"Xinchi Chen, Xipeng Qiu, Xuanjing Huang","A Feature-Enriched Neural Model for Joint Chinese Word Segmentation and - Part-of-Speech Tagging",cs.CL," Recently, neural network models for natural language processing tasks have -been increasingly focused on for their ability of alleviating the burden of -manual feature engineering. However, the previous neural models cannot extract -the complicated feature compositions as the traditional methods with discrete -features. In this work, we propose a feature-enriched neural model for joint -Chinese word segmentation and part-of-speech tagging task. Specifically, to -simulate the feature templates of traditional discrete feature based models, we -use different filters to model the complex compositional features with -convolutional and pooling layer, and then utilize long distance dependency -information with recurrent layer. Experimental results on five different -datasets show the effectiveness of our proposed model. -" -3844,1611.05527,"Tsubasa Ochiai, Shigeki Matsuda, Hideyuki Watanabe, Shigeru Katagiri","Automatic Node Selection for Deep Neural Networks using Group Lasso - Regularization",cs.CL cs.LG stat.ML," We examine the effect of the Group Lasso (gLasso) regularizer in selecting -the salient nodes of Deep Neural Network (DNN) hidden layers by applying a -DNN-HMM hybrid speech recognizer to TED Talks speech data. We test two types of -gLasso regularization, one for outgoing weight vectors and another for incoming -weight vectors, as well as two sizes of DNNs: 2048 hidden layer nodes and 4096 -nodes. Furthermore, we compare gLasso and L2 regularizers. Our experiment -results demonstrate that our DNN training, in which the gLasso regularizer was -embedded, successfully selected the hidden layer nodes that are necessary and -sufficient for achieving high classification power. -" -3845,1611.05546,"Damien Teney, Anton van den Hengel",Zero-Shot Visual Question Answering,cs.CV cs.AI cs.CL," Part of the appeal of Visual Question Answering (VQA) is its promise to -answer new questions about previously unseen images. Most current methods -demand training questions that illustrate every possible concept, and will -therefore never achieve this capability, since the volume of required training -data would be prohibitive. Answering general questions about images requires -methods capable of Zero-Shot VQA, that is, methods able to answer questions -beyond the scope of the training questions. We propose a new evaluation -protocol for VQA methods which measures their ability to perform Zero-Shot VQA, -and in doing so highlights significant practical deficiencies of current -approaches, some of which are masked by the biases in current datasets. We -propose and evaluate several strategies for achieving Zero-Shot VQA, including -methods based on pretrained word embeddings, object classifiers with semantic -embeddings, and test-time retrieval of example images. Our extensive -experiments are intended to serve as baselines for Zero-Shot VQA, and they also -achieve state-of-the-art performance in the standard VQA evaluation setting. -" -3846,1611.05774,"Adhiguna Kuncoro and Miguel Ballesteros and Lingpeng Kong and Chris - Dyer and Graham Neubig and Noah A. Smith",What Do Recurrent Neural Network Grammars Learn About Syntax?,cs.CL," Recurrent neural network grammars (RNNG) are a recently proposed -probabilistic generative modeling family for natural language. They show -state-of-the-art language modeling and parsing performance. 
We investigate what
-information they learn, from a linguistic perspective, through various
-ablations to the model and the data, and by augmenting the model with an
-attention mechanism (GA-RNNG) to enable closer inspection. We find that
-explicit modeling of composition is crucial for achieving the best performance.
-Through the attention mechanism, we find that headedness plays a central role
-in phrasal representation (with the model's latent attention largely agreeing
-with predictions made by hand-crafted head rules, albeit with some important
-differences). By training grammars without nonterminal labels, we find that
-phrasal representations depend minimally on nonterminals, providing support for
-the endocentricity hypothesis.
-"
-3847,1611.05962,Siwei Lai,Word and Document Embeddings based on Neural Network Approaches,cs.CL," Data representation is a fundamental task in machine learning. The
-representation of data affects the performance of the whole machine learning
-system. For a long time, the representation of data was done by feature
-engineering, with researchers aiming to design better features for specific
-tasks. Recently, the rapid development of deep learning and representation
-learning has brought new inspiration to various domains.
- In natural language processing, the most widely used feature representation
-is the Bag-of-Words model. This model has the data sparsity problem and cannot
-keep the word order information. Other features such as part-of-speech tagging
-or more complex syntax features can only fit specific tasks in most cases.
-This thesis focuses on word representation and document representation. We
-compare the existing systems and present our new model.
- First, for generating word embeddings, we make comprehensive comparisons
-among existing word embedding models. In terms of theory, we figure out the
-relationship between the two most important models, i.e., Skip-gram and GloVe.
-In our experiments, we analyze three key points in generating word embeddings,
-including the model construction, the training corpus and parameter design. We
-evaluate word embeddings with three types of tasks, and we argue that they
-cover the existing use of word embeddings. Through theory and practical
-experiments, we present some guidelines for how to generate a good word
-embedding.
- Second, for Chinese character and word representation, we introduce the joint
-training of Chinese characters and words. ...
- Third, for document representation, we analyze the existing document
-representation models, including recursive NNs, recurrent NNs and convolutional
-NNs. We point out the drawbacks of these models and present our new model, the
-recurrent convolutional neural networks. ...
-"
-3848,1611.06188,"Yacine Jernite, Edouard Grave, Armand Joulin, Tomas Mikolov",Variable Computation in Recurrent Neural Networks,stat.ML cs.AI cs.CL cs.LG," Recurrent neural networks (RNNs) have been used extensively and with
-increasing success to model various types of sequential data. Much of this
-progress has been achieved through devising recurrent units and architectures
-with the flexibility to capture complex statistics in the data, such as long
-range dependency or localized attention phenomena.
However, while many types of
-sequential data (such as video, speech or language) can have highly variable
-information flow, most recurrent models still consume input features at a
-constant rate and perform a constant number of computations per time step,
-which can be detrimental to both speed and model capacity. In this paper, we
-explore a modification to existing recurrent units which allows them to learn
-to vary the amount of computation they perform at each step, without prior
-knowledge of the sequence's time structure. We show experimentally that not
-only do our models require fewer operations, they also lead to better
-performance overall on evaluation tasks.
-"
-3849,1611.06204,"Volkan Cirik, Eduard Hovy, Louis-Philippe Morency","Visualizing and Understanding Curriculum Learning for Long Short-Term
- Memory Networks",cs.CL cs.LG cs.NE," Curriculum Learning emphasizes the order of training instances in a
-computational learning setup. The core hypothesis is that simpler instances
-should be learned early as building blocks to learn more complex ones. Despite
-its usefulness, it is still unknown how exactly the internal representations of
-models are affected by curriculum learning. In this paper, we study the effect
-of curriculum learning on Long Short-Term Memory (LSTM) networks, which have
-shown strong competency in many Natural Language Processing (NLP) problems. Our
-experiments on a sentiment analysis task and a synthetic task similar to sequence
-prediction tasks in NLP show that curriculum learning has a positive effect on
-the LSTM's internal states by biasing the model towards building constructive
-representations, i.e., the internal representations at previous timesteps are
-used as building blocks for the final prediction. We also find that smaller
-models improve significantly when they are trained with curriculum learning.
-Lastly, we show that curriculum learning helps more when the amount of training
-data is limited.
-"
-3850,1611.06216,"Iulian Vlad Serban, Ryan Lowe, Laurent Charlin, Joelle Pineau",Generative Deep Neural Networks for Dialogue: A Short Review,cs.CL cs.AI cs.NE," Researchers have recently started investigating deep neural networks for
-dialogue applications. In particular, generative sequence-to-sequence (Seq2Seq)
-models have shown promising results for unstructured tasks, such as word-level
-dialogue response generation. The hope is that such models will be able to
-leverage massive amounts of data to learn meaningful natural language
-representations and response generation strategies, while requiring a minimum
-amount of domain knowledge and hand-crafting. An important challenge is to
-develop models that can effectively incorporate dialogue context and generate
-meaningful and diverse responses. In support of this goal, we review recently
-proposed models based on generative encoder-decoder neural network
-architectures, and show that these models have better ability to incorporate
-long-term dialogue history, to model uncertainty and ambiguity in dialogue, and
-to generate responses with high-level compositional structure.
-"
-3851,1611.06320,Chao-Lin Liu and Kuo-Feng Luo,"Tracking Words in Chinese Poetry of Tang and Song Dynasties with the
- China Biographical Database",cs.CL," Large-scale comparisons between the poetry of Tang and Song dynasties shed
-light on how words, collocations, and expressions were used and shared among
-the poets.
That some words were used only in
-the Tang poetry and some only in the Song poetry could lead to interesting
-research in linguistics. That the most frequent colors are different in the
-Tang and Song poetry provides a trace of the changing social circumstances in
-the dynasties. Results of the current work link to research topics of
-lexicography, semantics, and social transitions. We discuss our findings and
-present our algorithms for efficient comparisons among the poems, which are
-crucial for completing billions of comparisons within an acceptable time.
-"
-3852,1611.06322,"Yumeng Qin, Dominik Wurzer, Victor Lavrenko, Cunchen Tang",Spotting Rumors via Novelty Detection,cs.SI cs.CL cs.IR," Rumour detection is hard because the most accurate systems operate
-retrospectively, only recognizing rumours once they have collected repeated
-signals. By then the rumours might have already spread and caused harm. We
-introduce a new category of features based on novelty, tailored to detect
-rumours early on. To compensate for the absence of repeated signals, we make
-use of news wire as an additional data source. Unconfirmed (novel) information
-with respect to the news articles is considered as an indication of rumours.
-Additionally we introduce pseudo feedback, which assumes that documents that
-are similar to previous rumours are more likely to also be a rumour.
-Comparison with other real-time approaches shows that novelty based features in
-conjunction with pseudo feedback perform significantly better when detecting
-rumours instantly after their publication.
-"
-3853,1611.06423,"A. K. Sarkar, Zheng-Hua Tan","Incorporating Pass-Phrase Dependent Background Models for Text-Dependent
- Speaker Verification",cs.CL," In this paper, we propose pass-phrase dependent background models (PBMs) for
-text-dependent (TD) speaker verification (SV) to integrate the pass-phrase
-identification process into the conventional TD-SV system, where a PBM is
-derived from a text-independent background model through adaptation using the
-utterances of a particular pass-phrase. During training, pass-phrase specific
-target speaker models are derived from the particular PBM using the training
-data for the respective target model. During testing, the best PBM is first
-selected for the test utterance in the maximum likelihood (ML) sense and the
-selected PBM is then used for the log likelihood ratio (LLR) calculation with
-respect to the claimant model. The proposed method incorporates the pass-phrase
-identification step in the LLR calculation, which is not considered in
-conventional standalone TD-SV systems. The performance of the proposed method
-is compared to conventional text-independent background model based TD-SV
-systems using either Gaussian mixture model (GMM)-universal background model
-(UBM) or Hidden Markov model (HMM)-UBM or i-vector paradigms. In addition, we
-consider two approaches to build PBMs: speaker-independent and
-speaker-dependent. We show that the proposed method significantly reduces the
-error rates of text-dependent speaker verification for the non-target types:
-target-wrong and imposter-wrong while it maintains comparable TD-SV performance
-when imposters speak a correct utterance with respect to the conventional
-system. Experiments are conducted on the RedDots challenge and the RSR2015
-databases that consist of short utterances.
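-
-A schematic of the scoring rule in the pass-phrase dependent background model
-abstract above: the PBM with maximum likelihood on the test utterance is
-selected, and the claim is scored by a log-likelihood ratio against it.
-Unit-variance Gaussians stand in for the GMM/HMM models; all values are
-synthetic.
-
-import numpy as np
-
-def log_lik(X, mean):
-    # Log-likelihood under a unit-variance diagonal Gaussian (up to a constant).
-    return -0.5 * np.sum((X - mean) ** 2)
-
-rng = np.random.default_rng(5)
-X = rng.normal(size=(50, 20))                   # test utterance features
-pbms = [rng.normal(size=20) for _ in range(3)]  # pass-phrase background models
-target = rng.normal(size=20)                    # claimed speaker model
-
-best_pbm = max(pbms, key=lambda m: log_lik(X, m))  # ML pass-phrase selection
-llr = log_lik(X, target) - log_lik(X, best_pbm)    # accept if llr > threshold
-print(llr)
-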
-" -3854,1611.06459,"Supun Nakandala, Giovanni Luca Ciampaglia, Norman Makoto Su, Yong-Yeol - Ahn",Gendered Conversation in a Social Game-Streaming Platform,cs.SI cs.CL cs.CY," Online social media and games are increasingly replacing offline social -activities. Social media is now an indispensable mode of communication; online -gaming is not only a genuine social activity but also a popular spectator -sport. With support for anonymity and larger audiences, online interaction -shrinks social and geographical barriers. Despite such benefits, social -disparities such as gender inequality persist in online social media. In -particular, online gaming communities have been criticized for persistent -gender disparities and objectification. As gaming evolves into a social -platform, persistence of gender disparity is a pressing question. Yet, there -are few large-scale, systematic studies of gender inequality and -objectification in social gaming platforms. Here we analyze more than one -billion chat messages from Twitch, a social game-streaming platform, to study -how the gender of streamers is associated with the nature of conversation. -Using a combination of computational text analysis methods, we show that -gendered conversation and objectification is prevalent in chats. Female -streamers receive significantly more objectifying comments while male streamers -receive more game-related comments. This difference is more pronounced for -popular streamers. There also exists a large number of users who post only on -female or male streams. Employing a neural vector-space embedding (paragraph -vector) method, we analyze gendered chat messages and create prediction models -that (i) identify the gender of streamers based on messages posted in the -channel and (ii) identify the gender a viewer prefers to watch based on their -chat messages. Our findings suggest that disparities in social game-streaming -platforms is a nuanced phenomenon that involves the gender of streamers as well -as those who produce gendered and game-related conversation. -" -3855,1611.06468,"Rui Liu, Xiaoli Zhang","Generating machine-executable plans from end-user's natural-language - instructions",cs.AI cs.CL cs.RO," It is critical for advanced manufacturing machines to autonomously execute a -task by following an end-user's natural language (NL) instructions. However, NL -instructions are usually ambiguous and abstract so that the machines may -misunderstand and incorrectly execute the task. To address this NL-based -human-machine communication problem and enable the machines to appropriately -execute tasks by following the end-user's NL instructions, we developed a -Machine-Executable-Plan-Generation (exePlan) method. The exePlan method -conducts task-centered semantic analysis to extract task-related information -from ambiguous NL instructions. In addition, the method specifies machine -execution parameters to generate a machine-executable plan by interpreting -abstract NL instructions. To evaluate the exePlan method, an industrial robot -Baxter was instructed by NL to perform three types of industrial tasks {'drill -a hole', 'clean a spot', 'install a screw'}. The experiment results proved that -the exePlan method was effective in generating machine-executable plans from -the end-user's NL instructions. Such a method has the promise to endow a -machine with the ability of NL-instructed task execution. 
-" -3856,1611.06478,"Salman Mahmood, Rami Al-Rfou, Klaus Mueller",Visualizing Linguistic Shift,cs.CL cs.HC," Neural network based models are a very powerful tool for creating word -embeddings, the objective of these models is to group similar words together. -These embeddings have been used as features to improve results in various -applications such as document classification, named entity recognition, etc. -Neural language models are able to learn word representations which have been -used to capture semantic shifts across time and geography. The objective of -this paper is to first identify and then visualize how words change meaning in -different text corpus. We will train a neural language model on texts from a -diverse set of disciplines philosophy, religion, fiction etc. Each text will -alter the embeddings of the words to represent the meaning of the word inside -that text. We will present a computational technique to detect words that -exhibit significant linguistic shift in meaning and usage. We then use enhanced -scatterplots and storyline visualization to visualize the linguistic shift. -" -3857,1611.06492,"Arnav Kumar Jain, Abhinav Agarwalla, Kumar Krishna Agrawal, Pabitra - Mitra",Recurrent Memory Addressing for describing videos,cs.CV cs.CL," In this paper, we introduce Key-Value Memory Networks to a multimodal setting -and a novel key-addressing mechanism to deal with sequence-to-sequence models. -The proposed model naturally decomposes the problem of video captioning into -vision and language segments, dealing with them as key-value pairs. More -specifically, we learn a semantic embedding (v) corresponding to each frame (k) -in the video, thereby creating (k, v) memory slots. We propose to find the next -step attention weights conditioned on the previous attention distributions for -the key-value memory slots in the memory addressing schema. Exploiting this -flexibility of the framework, we additionally capture spatial dependencies -while mapping from the visual to semantic embedding. Experiments done on the -Youtube2Text dataset demonstrate usefulness of recurrent key-addressing, while -achieving competitive scores on BLEU@4, METEOR metrics against state-of-the-art -models. -" -3858,1611.06607,"Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei",A Hierarchical Approach for Generating Descriptive Image Paragraphs,cs.CV cs.CL," Recent progress on image captioning has made it possible to generate novel -sentences describing images in natural language, but compressing an image into -a single sentence can describe visual content in only coarse detail. While one -new captioning approach, dense captioning, can potentially describe images in -finer levels of detail by captioning many regions within an image, it in turn -is unable to produce a coherent story for an image. In this paper we overcome -these limitations by generating entire paragraphs for describing images, which -can tell detailed, unified stories. We develop a model that decomposes both -images and paragraphs into their constituent parts, detecting semantic regions -in images and using a hierarchical recurrent neural network to reason about -language. Linguistic analysis confirms the complexity of the paragraph -generation task, and thorough experiments on a new dataset of image and -paragraph pairs demonstrate the effectiveness of our approach. 
-" -3859,1611.06639,"Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, Bo Xu","Text Classification Improved by Integrating Bidirectional LSTM with - Two-dimensional Max Pooling",cs.CL," Recurrent Neural Network (RNN) is one of the most popular architectures used -in Natural Language Processsing (NLP) tasks because its recurrent structure is -very suitable to process variable-length text. RNN can utilize distributed -representations of words by first converting the tokens comprising each text -into vectors, which form a matrix. And this matrix includes two dimensions: the -time-step dimension and the feature vector dimension. Then most existing models -usually utilize one-dimensional (1D) max pooling operation or attention-based -operation only on the time-step dimension to obtain a fixed-length vector. -However, the features on the feature vector dimension are not mutually -independent, and simply applying 1D pooling operation over the time-step -dimension independently may destroy the structure of the feature -representation. On the other hand, applying two-dimensional (2D) pooling -operation over the two dimensions may sample more meaningful features for -sequence modeling tasks. To integrate the features on both dimensions of the -matrix, this paper explores applying 2D max pooling operation to obtain a -fixed-length representation of the text. This paper also utilizes 2D -convolution to sample more meaningful information of the matrix. Experiments -are conducted on six text classification tasks, including sentiment analysis, -question classification, subjectivity classification and newsgroup -classification. Compared with the state-of-the-art models, the proposed models -achieve excellent performance on 4 out of 6 tasks. Specifically, one of the -proposed models achieves highest accuracy on Stanford Sentiment Treebank binary -classification and fine-grained classification tasks. -" -3860,1611.06671,"Mark Abraham Magumba, Peter Nabende",Ontology Driven Disease Incidence Detection on Twitter,cs.CL cs.IR," In this work we address the issue of generic automated disease incidence -monitoring on twitter. We employ an ontology of disease related concepts and -use it to obtain a conceptual representation of tweets. Unlike previous key -word based systems and topic modeling approaches, our ontological approach -allows us to apply more stringent criteria for determining which messages are -relevant such as spatial and temporal characteristics whilst giving a stronger -guarantee that the resulting models will perform well on new data that may be -lexically divergent. We achieve this by training learners on concepts rather -than individual words. For training we use a dataset containing mentions of -influenza and Listeria and use the learned models to classify datasets -containing mentions of an arbitrary selection of other diseases. We show that -our ontological approach achieves good performance on this task using a variety -of Natural Language Processing Techniques. We also show that word vectors can -be learned directly from our concepts to achieve even better results. -" -3861,1611.06722,"Yanqing Chen, Steven Skiena","False-Friend Detection and Entity Matching via Unsupervised - Transliteration",cs.CL," Transliterations play an important role in multilingual entity reference -resolution, because proper names increasingly travel between languages in news -and social media. 
Previous work associated with machine translation targets
-transliteration only between single language pairs, focuses on specific classes
-of entities (such as cities and celebrities) and relies on manual curation,
-which limits the expressive power of transliteration in a multilingual
-environment.
- By contrast, we present an unsupervised transliteration model covering 69
-major languages that can generate good transliterations for arbitrary strings
-between any language pair. Our model yields top-(1, 20, 100) averages of
-(32.85%, 60.44%, 83.20%) in matching gold standard transliteration compared to
-results from a recently-published system of (26.71%, 50.27%, 72.79%). We also
-show the quality of our model in detecting true and false friends from
-Wikipedia high frequency lexicons. Our method indicates a strong signal of
-pronunciation similarity and boosts the probability of finding true friends in
-68 out of 69 languages.
-"
-3862,1611.06788,Zhiyang Teng and Yue Zhang,Bidirectional Tree-Structured LSTM with Head Lexicalization,cs.CL," Sequential LSTM has been extended to model tree structures, giving
-competitive results for a number of tasks. Existing methods model constituent
-trees by bottom-up combinations of constituent nodes, making direct use of
-input word information only for leaf nodes. This is different from sequential
-LSTMs, which contain reference to input words for each node. In this paper, we
-propose a method for automatic head-lexicalization for tree-structure LSTMs,
-propagating head words from leaf nodes to every constituent node. In addition,
-enabled by head lexicalization, we build a tree LSTM in the top-down direction,
-which corresponds to bidirectional sequential LSTM structurally. Experiments
-show that both extensions give better representations of tree structures. Our
-final model gives the best results on the Stanford Sentiment Treebank and
-highly competitive results on the TREC question type classification task.
-"
-3863,1611.06933,Jacob Eisenstein,Unsupervised Learning for Lexicon-Based Classification,cs.LG cs.CL stat.ML," In lexicon-based classification, documents are assigned labels by comparing
-the number of words that appear from two opposed lexicons, such as positive and
-negative sentiment. Creating such word lists is often easier than labeling
-instances, and they can be debugged by non-experts if classification
-performance is unsatisfactory. However, there is little analysis or
-justification of this classification heuristic. This paper describes a set of
-assumptions that can be used to derive a probabilistic justification for
-lexicon-based classification, as well as an analysis of its expected accuracy.
-One key assumption behind lexicon-based classification is that all words in
-each lexicon are equally predictive. This is rarely true in practice, which is
-why lexicon-based approaches are usually outperformed by supervised classifiers
-that learn distinct weights on each word from labeled instances. This paper
-shows that it is possible to learn such weights without labeled data, by
-leveraging co-occurrence statistics across the lexicons. This offers the best
-of both worlds: light supervision in the form of lexicons, and data-driven
-classification with higher accuracy than traditional word-counting heuristics.
-"
-3864,1611.06950,"Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E.
- Milios",Statistical Learning for OCR Text Correction,cs.CV cs.CL cs.LG," The accuracy of Optical Character Recognition (OCR) is crucial to the success -of subsequent applications used in text analyzing pipeline. Recent models of -OCR post-processing significantly improve the quality of OCR-generated text, -but are still prone to suggest correction candidates from limited observations -while insufficiently accounting for the characteristics of OCR errors. In this -paper, we show how to enlarge candidate suggestion space by using external -corpus and integrating OCR-specific features in a regression approach to -correct OCR-generated errors. The evaluation results show that our model can -correct 61.5% of the OCR-errors (considering the top 1 suggestion) and 71.5% of -the OCR-errors (considering the top 3 suggestions), for cases where the -theoretical correction upper-bound is 78%. -" -3865,1611.06986,"Ramon Sanabria, Florian Metze and Fernando De La Torre",Robust end-to-end deep audiovisual speech recognition,cs.CL cs.LG cs.SD," Speech is one of the most effective ways of communication among humans. Even -though audio is the most common way of transmitting speech, very important -information can be found in other modalities, such as vision. Vision is -particularly useful when the acoustic signal is corrupted. Multi-modal speech -recognition however has not yet found wide-spread use, mostly because the -temporal alignment and fusion of the different information sources is -challenging. - This paper presents an end-to-end audiovisual speech recognizer (AVSR), based -on recurrent neural networks (RNN) with a connectionist temporal classification -(CTC) loss function. CTC creates sparse ""peaky"" output activations, and we -analyze the differences in the alignments of output targets (phonemes or -visemes) between audio-only, video-only, and audio-visual feature -representations. We present the first such experiments on the large vocabulary -IBM ViaVoice database, which outperform previously published approaches on -phone accuracy in clean and noisy conditions. -" -3866,1611.06997,Hongyuan Mei and Mohit Bansal and Matthew R. Walter,Coherent Dialogue with Attention-based Language Models,cs.CL cs.AI," We model coherent conversation continuation via RNN-based dialogue models -equipped with a dynamic attention mechanism. Our attention-RNN language model -dynamically increases the scope of attention on the history as the conversation -continues, as opposed to standard attention (or alignment) models with a fixed -input scope in a sequence-to-sequence model. This allows each generated word to -be associated with the most relevant words in its corresponding conversation -history. We evaluate the model on two popular dialogue datasets, the -open-domain MovieTriples dataset and the closed-domain Ubuntu Troubleshoot -dataset, and achieve significant improvements over the state-of-the-art and -baselines on several metrics, including complementary diversity-based metrics, -human evaluation, and qualitative visualizations. We also show that a vanilla -RNN with dynamic attention outperforms more complex memory models (e.g., LSTM -and GRU) by allowing for flexible, long-distance memory. We promote further -coherence via topic modeling-based reranking. 
-" -3867,1611.07139,"Reza Rawassizadeh, Chelsea Dobbins, Manouchehr Nourizadeh, Zahra - Ghamchili, Michael Pazzani","A Natural Language Query Interface for Searching Personal Information on - Smartwatches",cs.HC cs.CL cs.IR," Currently, personal assistant systems, run on smartphones and use natural -language interfaces. However, these systems rely mostly on the web for finding -information. Mobile and wearable devices can collect an enormous amount of -contextual personal data such as sleep and physical activities. These -information objects and their applications are known as quantified-self, mobile -health or personal informatics, and they can be used to provide a deeper -insight into our behavior. To our knowledge, existing personal assistant -systems do not support all types of quantified-self queries. In response to -this, we have undertaken a user study to analyze a set of ""textual -questions/queries"" that users have used to search their quantified-self or -mobile health data. Through analyzing these questions, we have constructed a -light-weight natural language based query interface, including a text parser -algorithm and a user interface, to process the users' queries that have been -used for searching quantified-self information. This query interface has been -designed to operate on small devices, i.e. smartwatches, as well as augmenting -the personal assistant systems by allowing them to process end users' natural -language queries about their quantified-self data. -" -3868,1611.07174,"Zewang Zhang, Zheng Sun, Jiaqi Liu, Jingwen Chen, Zhao Huo, Xiao Zhang","Deep Recurrent Convolutional Neural Network: Improving Performance For - Speech Recognition",cs.CL cs.LG," A deep learning approach has been widely applied in sequence modeling -problems. In terms of automatic speech recognition (ASR), its performance has -significantly been improved by increasing large speech corpus and deeper neural -network. Especially, recurrent neural network and deep convolutional neural -network have been applied in ASR successfully. Given the arising problem of -training speed, we build a novel deep recurrent convolutional network for -acoustic modeling and then apply deep residual learning to it. Our experiments -show that it has not only faster convergence speed but better recognition -accuracy over traditional deep convolutional recurrent network. In the -experiments, we compare the convergence speed of our novel deep recurrent -convolutional networks and traditional deep convolutional recurrent networks. -With faster convergence speed, our novel deep recurrent convolutional networks -can reach the comparable performance. We further show that applying deep -residual learning can boost the convergence speed of our novel deep recurret -convolutional networks. Finally, we evaluate all our experimental networks by -phoneme error rate (PER) with our proposed bidirectional statistical n-gram -language model. Our evaluation results show that our newly proposed deep -recurrent convolutional network applied with deep residual learning can reach -the best PER of 17.33\% with the fastest convergence speed on TIMIT database. -The outstanding performance of our novel deep recurrent convolutional neural -network with deep residual learning indicates that it can be potentially -adopted in other sequential problems. 
-" -3869,1611.07206,"Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, and Hsin-Min Wang",Learning to Distill: The Essence Vector Modeling Framework,cs.CL," In the context of natural language processing, representation learning has -emerged as a newly active research subject because of its excellent performance -in many applications. Learning representations of words is a pioneering study -in this school of research. However, paragraph (or sentence and document) -embedding learning is more suitable/reasonable for some tasks, such as -sentiment classification and document summarization. Nevertheless, as far as we -are aware, there is relatively less work focusing on the development of -unsupervised paragraph embedding methods. Classic paragraph embedding methods -infer the representation of a given paragraph by considering all of the words -occurring in the paragraph. Consequently, those stop or function words that -occur frequently may mislead the embedding learning process to produce a misty -paragraph representation. Motivated by these observations, our major -contributions in this paper are twofold. First, we propose a novel unsupervised -paragraph embedding method, named the essence vector (EV) model, which aims at -not only distilling the most representative information from a paragraph but -also excluding the general background information to produce a more informative -low-dimensional vector representation for the paragraph. Second, in view of the -increasing importance of spoken content processing, an extension of the EV -model, named the denoising essence vector (D-EV) model, is proposed. The D-EV -model not only inherits the advantages of the EV model but also can infer a -more robust representation for a given spoken paragraph against imperfect -speech recognition. -" -3870,1611.07232,"Xixun Lin, Yanchun Liang, Fausto Giunchiglia, Xiaoyue Feng, Renchu - Guan","Compositional Learning of Relation Path Embedding for Knowledge Base - Completion",cs.CL," Large-scale knowledge bases have currently reached impressive sizes; however, -these knowledge bases are still far from complete. In addition, most of the -existing methods for knowledge base completion only consider the direct links -between entities, ignoring the vital impact of the consistent semantics of -relation paths. In this paper, we study the problem of how to better embed -entities and relations of knowledge bases into different low-dimensional spaces -by taking full advantage of the additional semantics of relation paths, and we -propose a compositional learning model of relation path embedding (RPE). -Specifically, with the corresponding relation and path projections, RPE can -simultaneously embed each entity into two types of latent spaces. It is also -proposed that type constraints could be extended from traditional -relation-specific constraints to the new proposed path-specific constraints. -The results of experiments show that the proposed model achieves significant -and consistent improvements compared with the state-of-the-art algorithms. -" -3871,1611.07804,N. Astrakhantsev,"ATR4S: Toolkit with State-of-the-art Automatic Terms Recognition Methods - in Scala",cs.CL," Automatically recognized terminology is widely used for various -domain-specific texts processing tasks, such as machine translation, -information retrieval or sentiment analysis. However, there is still no -agreement on which methods are best suited for particular settings and, -moreover, there is no reliable comparison of already developed methods. 
We
-believe that one of the main reasons is the lack of implementations of
-state-of-the-art methods, which are usually non-trivial to recreate. In order
-to address these issues, we present ATR4S, an open-source software package
-written in Scala that comprises more than 15 methods for automatic terminology
-recognition (ATR) and implements the whole pipeline from text document
-preprocessing, to term candidate collection, term candidate scoring, and
-finally, term candidate ranking. It is a highly scalable, modular and
-configurable tool with support for automatic caching. We also compare 10
-state-of-the-art methods on 7 open datasets by average precision and
-processing time. Experimental comparison reveals that no single method
-demonstrates the best average precision for all datasets and that other
-available tools for ATR do not contain the best methods.
-"
-3872,1611.07837,"Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin",Adaptive Feature Abstraction for Translating Video to Text,cs.CV cs.CL," Previous models for video captioning often use the output from a specific
-layer of a Convolutional Neural Network (CNN) as video features. However, the
-variable context-dependent semantics in the video may make it more appropriate
-to adaptively select features from the multiple CNN layers. We propose a new
-approach for generating adaptive spatiotemporal representations of videos for
-the captioning task. A novel attention mechanism is developed that adaptively
-and sequentially focuses on different layers of CNN features (levels of feature
-""abstraction""), as well as local spatiotemporal regions of the feature maps at
-each layer. The proposed approach is evaluated on three benchmark datasets:
-YouTube2Text, M-VAD and MSR-VTT. Along with visualizing the results and how the
-model works, these experiments quantitatively demonstrate the effectiveness of
-the proposed adaptive spatiotemporal feature abstraction for translating videos
-to sentences with rich semantics.
-"
-3873,1611.07897,"Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence
- Carin","Learning Generic Sentence Representations Using Convolutional Neural
- Networks",cs.CL cs.LG," We propose a new encoder-decoder approach to learn distributed sentence
-representations that are applicable to multiple purposes. The model is learned
-by using a convolutional neural network as an encoder to map an input sentence
-into a continuous vector, and using a long short-term memory recurrent neural
-network as a decoder. Several tasks are considered, including sentence
-reconstruction and future sentence prediction. Further, a hierarchical
-encoder-decoder model is proposed to encode a sentence to predict multiple
-future sentences. By training our models on a large collection of novels, we
-obtain a highly generic convolutional sentence encoder that performs well in
-practice. Experimental results on several benchmark datasets, and across a
-broad range of applications, demonstrate the superiority of the proposed model
-over competing methods.
-"
-3874,1611.07954,"Hai Wang, Takeshi Onishi, Kevin Gimpel, David McAllester",Emergent Predication Structure in Hidden State Vectors of Neural Readers,cs.CL," A significant number of neural architectures for reading comprehension have
-recently been developed and evaluated on large cloze-style datasets. We present
-experiments supporting the emergence of ""predication structure"" in the hidden
-state vectors of these readers.
More specifically, we provide evidence that the
-hidden state vectors represent atomic formulas $\Phi[c]$ where $\Phi$ is a
-semantic property (predicate) and $c$ is a constant symbol entity identifier.
-"
-3875,1611.08002,"Zhe Gan, Chuang Gan, Xiaodong He, Yunchen Pu, Kenneth Tran, Jianfeng
- Gao, Lawrence Carin, Li Deng",Semantic Compositional Networks for Visual Captioning,cs.CV cs.CL cs.LG," A Semantic Compositional Network (SCN) is developed for image captioning, in
-which semantic concepts (i.e., tags) are detected from the image, and the
-probability of each tag is used to compose the parameters in a long short-term
-memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an
-ensemble of tag-dependent weight matrices. The degree to which each member of
-the ensemble is used to generate an image caption is tied to the
-image-dependent probability of the corresponding tag. In addition to captioning
-images, we also extend the SCN to generate captions for video clips. We
-qualitatively analyze semantic composition in SCNs, and quantitatively evaluate
-the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text.
-Experimental results show that the proposed method significantly outperforms
-prior state-of-the-art approaches, across multiple evaluation metrics.
-"
-3876,1611.08034,"Zhe Gan, Chunyuan Li, Changyou Chen, Yunchen Pu, Qinliang Su, Lawrence
- Carin","Scalable Bayesian Learning of Recurrent Neural Networks for Language
- Modeling",cs.CL cs.LG," Recurrent neural networks (RNNs) have shown promising performance for
-language modeling. However, traditional training of RNNs using back-propagation
-through time often suffers from overfitting. One reason for this is that
-stochastic optimization (used for large training sets) does not provide good
-estimates of model uncertainty. This paper leverages recent advances in
-stochastic gradient Markov Chain Monte Carlo (also appropriate for large
-training sets) to learn weight uncertainty in RNNs. It yields a principled
-Bayesian learning algorithm, adding gradient noise during training (enhancing
-exploration of the model-parameter space) and model averaging when testing.
-Extensive experiments on various RNN models and across a broad range of
-applications demonstrate the superiority of the proposed approach over
-stochastic optimization.
-"
-3877,1611.08096,Zheqian Chen and Ben Gao and Huimin Zhang and Zhou Zhao and Deng Cai,"User Personalized Satisfaction Prediction via Multiple Instance Deep
- Learning",cs.IR cs.CL," Community based question answering services have arisen as a popular
-knowledge sharing pattern for netizens. With abundant interactions among users,
-individuals are capable of obtaining satisfactory information. However, users
-often cannot attain answers within minutes. They have to check the progress
-over time until a satisfying answer is submitted. We address this problem as a
-user personalized satisfaction prediction task. Existing methods usually
-exploit manual feature selection. This is not desirable, as it requires careful
-design and is labor intensive. In this paper, we settle this issue by
-developing a new multiple instance deep learning framework. Specifically, in
-our settings, each question follows a weakly supervised multiple instance
-learning assumption, where its obtained answers can be regarded as instance
-sets, and we define a question as resolved when it has at least one
-satisfactory answer.
We thus design an efficient framework that exploits the
-multiple instance learning property with deep learning to model the
-question-answer pairs. Extensive experiments on large scale datasets from
-Stack Exchange demonstrate the feasibility of our proposed framework in
-predicting askers' personalized satisfaction. Our framework can be extended to
-numerous applications such as UI satisfaction prediction, multi-armed bandit
-problems, expert finding, and so on.
-"
-3878,1611.08135,Zheqian Chen and Chi Zhang and Zhou Zhao and Deng Cai,"Question Retrieval for Community-based Question Answering via
- Heterogeneous Network Integration Learning",cs.IR cs.CL," Community based question answering platforms have attracted substantial users
-to share knowledge and learn from each other. With the rapid growth of CQA
-platforms, large numbers of overlapping questions emerge, leaving users
-confounded when selecting a proper reference. Effective automated algorithms
-are urgently needed to reuse historical questions with their corresponding
-answers. In this paper we focus on the problem of question retrieval, which
-aims to match historical questions that are relevant or semantically
-equivalent in order to resolve one's query directly. The challenges in this
-task are the lexical gaps between questions caused by word ambiguity and word
-mismatch. Furthermore, the limited number of words in queried sentences causes
-sparsity of word features. To alleviate these challenges, we propose a novel
-framework named HNIL which encodes not only the question contents but also the
-askers' social interactions to enhance the question embedding performance.
-More specifically, we apply a random walk based learning method with a
-recurrent neural network to match the similarities between an asker's question
-and historical questions posed by other users. Extensive experiments on a
-large scale dataset from a real world CQA site show that employing the
-heterogeneous social network information outperforms other state-of-the-art
-solutions on this task.
-"
-3879,1611.08307,"Avishkar Bhoopchand, Tim Rockt\""aschel, Earl Barr, Sebastian Riedel",Learning Python Code Suggestion with a Sparse Pointer Network,cs.NE cs.AI cs.CL cs.SE," To enhance developer productivity, all modern integrated development
-environments (IDEs) include code suggestion functionality that proposes likely
-next tokens at the cursor. While current IDEs work well for statically-typed
-languages, their reliance on type annotations means that they do not provide
-the same level of support for dynamic programming languages as for
-statically-typed languages. Moreover, suggestion engines in modern IDEs do not
-propose expressions or multi-statement idiomatic code. Recent work has shown
-that language models can improve code suggestion systems by learning from
-software repositories. This paper introduces a neural language model with a
-sparse pointer network aimed at capturing very long-range dependencies. We
-release a large-scale code suggestion corpus of 41M lines of Python code
-crawled from GitHub. On this corpus, we found standard neural language models
-to perform well at suggesting local phenomena, but struggle to refer to
-identifiers that are introduced many tokens in the past. By augmenting a neural
-language model with a pointer network specialized in referring to predefined
-classes of identifiers, we obtain a much lower perplexity and a 5 percentage
-point increase in accuracy for code suggestion compared to an LSTM baseline.
-In fact, this increase in code suggestion accuracy is due to a 13 times more
-accurate prediction of identifiers. Furthermore, a qualitative analysis shows
-this model indeed captures interesting long-range dependencies, like referring
-to a class member defined over 60 tokens in the past.
-"
-3880,1611.08321,"Junhua Mao, Jiajing Xu, Yushi Jing, Alan Yuille","Training and Evaluating Multimodal Word Embeddings with Large-scale Web
- Annotated Images",cs.LG cs.CL cs.CV," In this paper, we focus on training and evaluating effective word embeddings
-with both text and visual information. More specifically, we introduce a
-large-scale dataset with 300 million sentences describing over 40 million
-images crawled and downloaded from publicly available Pins (i.e. an image with
-sentence descriptions uploaded by users) on Pinterest. This dataset is more
-than 200 times larger than MS COCO, the standard large-scale image dataset with
-sentence descriptions. In addition, we construct an evaluation dataset to
-directly assess the effectiveness of word embeddings in terms of finding
-semantically similar or related words and phrases. The word/phrase pairs in
-this evaluation dataset are collected from the click data with millions of
-users in an image search system, and thus contain rich semantic relationships.
-Based on these datasets, we propose and compare several Recurrent Neural
-Network (RNN) based multimodal (text and image) models. Experiments show that
-our model benefits from incorporating the visual information into the word
-embeddings, and that a weight sharing strategy is crucial for learning such
-multimodal embeddings. The project page is:
-http://www.stat.ucla.edu/~junhua.mao/multimodal_embedding.html
-"
-3881,1611.08358,"A N Akshatha, Chandana G Upadhyaya, Rajashekara S Murthy",Kannada Spell Checker with Sandhi Splitter,cs.CL," Spelling errors are introduced in text either during typing, or when the user
-does not know the correct phoneme or grapheme. If a language contains complex
-words formed by sandhi, where two or more morphemes join based on some rules,
-spell checking becomes very tedious. In such situations, having a spell checker
-with a sandhi splitter which alerts the user by flagging the errors and
-providing suggestions is very useful. A novel algorithm for sandhi splitting is
-proposed in this paper. The sandhi splitter can split about 7000 of the most
-common sandhi words in the Kannada language, used as test samples. The sandhi
-splitter was integrated with a Kannada spell checker and a mechanism for
-generating suggestions was added. A comprehensive, platform-independent,
-standalone spell checker with sandhi splitter application software was thus
-developed and tested extensively for its efficiency and correctness. A
-comparative analysis of this spell checker with sandhi splitter was made and
-the results showed that the Kannada spell checker with sandhi splitter has
-improved performance. It is twice as fast, 200 times more space efficient, and
-it is 90% accurate in the case of complex nouns and 50% accurate for complex
-verbs. Such a spell checker with sandhi splitter will be of foremost
-significance in machine translation systems, voice processing, etc. This is the
-first sandhi splitter in Kannada and the advantage of the novel algorithm is
-that it can be extended to all Indian languages.
-"
-3882,1611.08373,"Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi",Bidirectional LSTM-CRF for Clinical Concept Extraction,stat.ML cs.CL cs.LG," Automated extraction of concepts from patient clinical records is an
-essential facilitator of clinical research. For this reason, the 2010 i2b2/VA
-Natural Language Processing Challenges for Clinical Records introduced a
-concept extraction task aimed at identifying and classifying concepts into
-predefined categories (i.e., treatments, tests and problems). State-of-the-art
-concept extraction approaches heavily rely on handcrafted features and
-domain-specific resources which are hard to collect and define. For this
-reason, this paper proposes an alternative, streamlined approach: a recurrent
-neural network (the bidirectional LSTM with CRF decoding) initialized with
-general-purpose, off-the-shelf word embeddings. The experimental results
-achieved on the 2010 i2b2/VA reference corpora using the proposed framework
-outperform all recent methods and rank closely to the best submission from the
-original 2010 i2b2/VA challenge.
-"
-3883,1611.08459,"Joji Toyama, Masanori Misono, Masahiro Suzuki, Kotaro Nakayama and
- Yutaka Matsuo",Neural Machine Translation with Latent Semantic of Image and Text,cs.CL," Although attention-based Neural Machine Translation has achieved great
-success, the attention mechanism cannot capture the entire meaning of the
-source sentence because it generates each target word depending heavily on the
-relevant parts of the source sentence. Earlier studies have introduced a latent
-variable to capture the entire meaning of a sentence and achieved improvements
-on attention-based Neural Machine Translation. We follow this approach, and we
-believe that capturing the meaning of a sentence benefits from image
-information because human beings understand the meaning of language not only
-from textual information but also from perceptual information such as that
-gained from vision. As described herein, we propose a neural machine
-translation model that introduces a continuous latent variable containing
-underlying semantics extracted from texts and images. Our model, which can be
-trained end-to-end, requires image information only during training.
-Experiments conducted with an English--German translation task show that our
-model outperforms the baseline.
-"
-3884,1611.08562,"Jiwei Li, Will Monroe and Dan Jurafsky","A Simple, Fast Diverse Decoding Algorithm for Neural Generation",cs.CL," In this paper, we propose a simple, fast decoding algorithm that fosters
-diversity in neural generation. The algorithm modifies the standard beam search
-algorithm by adding an inter-sibling ranking penalty, favoring hypotheses from
-diverse parents. We evaluate the proposed model on the tasks of dialogue
-response generation, abstractive summarization and machine translation. We find
-that diverse decoding helps across all tasks, especially those for which
-reranking is needed.
- We further propose a variation that is capable of automatically adjusting its
-diversity decoding rates for different inputs using reinforcement learning
-(RL). We observe a further performance boost from this RL technique. This paper
-includes material from the unpublished script ""Mutual Information and Diverse
-Decoding Improve Neural Machine Translation"" (Li and Jurafsky, 2016).
-"
-3885,1611.08656,"Da-Rong Liu, Shun-Po Chuang, Hung-yi Lee",Attention-based Memory Selection Recurrent Network for Language Modeling,cs.CL," Recurrent neural networks (RNNs) have achieved great success in language
-modeling. However, since RNNs have a fixed memory size, their memory cannot
-store all the information about the words they have seen before in the
-sentence, and thus useful long-term information may be ignored when predicting
-the next words. In this paper, we propose the Attention-based Memory Selection
-Recurrent Network (AMSRN), in which the model can review the information stored
-in the memory at each previous time step and select the relevant information to
-help generate the outputs. In AMSRN, the attention mechanism finds the time
-steps storing the relevant information in the memory, and memory selection
-determines which dimensions of the memory are involved in computing the
-attention weights and from which the information is extracted. In the
-experiments, AMSRN outperformed long short-term memory (LSTM) based language
-models on both English and Chinese corpora. Moreover, we investigate using
-entropy as a regularizer for attention weights and visualize how the attention
-mechanism helps language modeling.
-"
-3886,1611.08661,"Jiacheng Xu, Kan Chen, Xipeng Qiu and Xuanjing Huang","Knowledge Graph Representation with Jointly Structural and Textual
- Encoding",cs.CL," The objective of knowledge graph embedding is to encode both entities and
-relations of knowledge graphs into continuous low-dimensional vector spaces.
-Previously, most works focused on the symbolic representation of knowledge
-graphs with structure information, which cannot handle new entities or
-entities with few facts well. In this paper, we propose a novel deep
-architecture to utilize both structural and textual information of entities.
-Specifically, we introduce three neural models to encode the valuable
-information from the text description of an entity, among which an attentive
-model can select related information as needed. Then, a gating mechanism is
-applied to integrate representations of structure and text into a unified
-architecture. Experiments show that our models outperform the baseline by a
-margin on link prediction and triplet classification tasks. The source code of
-this paper will be available on GitHub.
-"
-3887,1611.08669,"Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav,
- Jos\'e M. F. Moura, Devi Parikh, Dhruv Batra",Visual Dialog,cs.CV cs.AI cs.CL cs.LG," We introduce the task of Visual Dialog, which requires an AI agent to hold a
-meaningful dialog with humans in natural, conversational language about visual
-content. Specifically, given an image, a dialog history, and a question about
-the image, the agent has to ground the question in the image, infer context
-from history, and answer the question accurately. Visual Dialog is disentangled
-enough from a specific downstream task so as to serve as a general test of
-machine intelligence, while being grounded in vision enough to allow objective
-evaluation of individual responses and benchmark progress. We develop a novel
-two-person chat data-collection protocol to curate a large-scale Visual Dialog
-dataset (VisDial). VisDial v0.9 has been released and contains 1 dialog with 10
-question-answer pairs on ~120k images from COCO, with a total of ~1.2M dialog
-question-answer pairs.
- We introduce a family of neural encoder-decoder models for Visual Dialog with
-3 encoders -- Late Fusion, Hierarchical Recurrent Encoder and Memory Network --
-and 2 decoders (generative and discriminative), which outperform a number of
-sophisticated baselines. We propose a retrieval-based evaluation protocol for
-Visual Dialog where the AI agent is asked to sort a set of candidate answers
-and is evaluated on metrics such as mean-reciprocal-rank of human response. We
-quantify the gap between machine and human performance on the Visual Dialog
-task via human studies. Putting it all together, we demonstrate the first
-'visual chatbot'! Our dataset, code, trained models and visual chatbot are
-available on https://visualdialog.org
-"
-3888,1611.08675,"Heriberto Cuay\'ahuitl, Seunghak Yu, Ashley Williamson, Jacob Carse",Deep Reinforcement Learning for Multi-Domain Dialogue Systems,cs.AI cs.CL cs.LG," Standard deep reinforcement learning methods such as Deep Q-Networks (DQN)
-for multiple tasks (domains) face scalability problems. We propose a method for
-multi-domain dialogue policy learning---termed NDQN, and apply it to an
-information-seeking spoken dialogue system in the domains of restaurants and
-hotels. Experimental results comparing DQN (baseline) versus NDQN (proposed)
-using simulations report that our proposed method exhibits better scalability
-and is promising for optimising the behaviour of multi-domain dialogue systems.
-"
-3889,1611.08737,"Nana Li, Shuangfei Zhai, Zhongfei Zhang, Boying Liu","Structural Correspondence Learning for Cross-lingual Sentiment
- Classification with One-to-many Mappings",cs.LG cs.CL stat.ML," Structural correspondence learning (SCL) is an effective method for
-cross-lingual sentiment classification. This approach uses unlabeled documents
-along with a word translation oracle to automatically induce task specific,
-cross-lingual correspondences. It transfers knowledge through identifying
-important features, i.e., pivot features. For simplicity, however, it assumes
-that the word translation oracle maps each pivot feature in the source language
-to exactly one word in the target language. This one-to-one mapping between
-words in different languages is too strict. Moreover, the context is not
-considered at all. In this paper, we propose a cross-lingual SCL based on
-distributed representations of words; it can learn meaningful one-to-many
-mappings for pivot words using large amounts of monolingual data and a small
-dictionary. We conduct experiments on the NLP\&CC 2013 cross-lingual sentiment
-analysis dataset, employing English as the source language and Chinese as the
-target language. Our method does not rely on parallel corpora, and the
-experimental results show that our approach is more competitive than the
-state-of-the-art methods in cross-lingual sentiment classification.
-"
-3890,1611.08765,"Liang Sun, Jason Mielens, Jason Baldridge","Fill it up: Exploiting partial dependency annotations in a minimum
- spanning tree parser",cs.CL," Unsupervised models of dependency parsing typically require large amounts of
-clean, unlabeled data plus gold-standard part-of-speech tags. Adding indirect
-supervision (e.g. language universals and rules) can help, but we show that
-obtaining small amounts of direct supervision - here, partial dependency
-annotations - provides a strong balance between zero and full supervision. We
-adapt the unsupervised ConvexMST dependency parser to learn from partial
-dependencies expressed in the Graph Fragment Language.
With less than 24 hours
-of total annotation, we obtain 7% and 17% absolute improvement in unlabeled
-dependency scores for English and Spanish, respectively, compared to the same
-parser using only universal grammar constraints.
-"
-3891,1611.08807,"Bernardino Casas, Neus Catal\`a, Ramon Ferrer-i-Cancho, Antoni
- Hern\'andez-Fern\'andez and Jaume Baixeries",The polysemy of the words that children learn over time,cs.CL physics.soc-ph," Here we study polysemy as a potential learning bias in vocabulary learning in
-children. Words of low polysemy could be preferred as they reduce the
-disambiguation effort for the listener. However, such preference could be a
-side-effect of another bias: the preference of children for nouns in
-combination with the lower polysemy of nouns with respect to other
-part-of-speech categories. Our results show that mean polysemy in children
-increases over time in two phases, i.e. a fast growth till the 31st month
-followed by a slower tendency towards adult speech. In contrast, this evolution
-is not found in adults interacting with children. This suggests that children
-have a preference for non-polysemous words in their early stages of vocabulary
-acquisition. Interestingly, the evolutionary pattern described above weakens
-when controlling for syntactic category (noun, verb, adjective or adverb) but
-it does not disappear completely, suggesting that it could result from a
-combination of a standalone bias for low polysemy and a preference for nouns.
-"
-3892,1611.08813,Hila Gonen and Yoav Goldberg,Semi Supervised Preposition-Sense Disambiguation using Multilingual Data,cs.CL," Prepositions are very common and very ambiguous, and understanding their
-sense is critical for understanding the meaning of the sentence. Supervised
-corpora for the preposition-sense disambiguation task are small, suggesting a
-semi-supervised approach to the task. We show that signals from unannotated
-multilingual data can be used to improve supervised preposition-sense
-disambiguation. Our approach pre-trains an LSTM encoder for predicting the
-translation of a preposition, and then incorporates the pre-trained encoder as
-a component in a supervised classification system, and fine-tunes it for the
-task. The multilingual signals consistently improve results on two
-preposition-sense datasets.
-"
-3893,1611.08928,Francesco Fumarola,A theory of interpretive clustering in free recall,q-bio.NC cs.CL," A stochastic model of short-term verbal memory is proposed, in which the
-psychological state of the subject is encoded as the instantaneous position of
-a particle diffusing over a semantic graph with a probabilistic structure. The
-model is particularly suitable for studying the dependence of free-recall
-observables on semantic properties of the words to be recalled. Besides
-predicting some well-known experimental features (contiguity effect, forward
-asymmetry, word-length effect), a novel prediction is obtained on the
-relationship between the contiguity effect and the syllabic length of words;
-shorter words, by way of their wider semantic range, are predicted to be
-characterized by stronger forward contiguity. A fresh analysis of archival data
-allows us to confirm this prediction.
-"
-3894,1611.08945,"Arvind Neelakantan, Quoc V.
Le, Martin Abadi, Andrew McCallum, Dario
- Amodei",Learning a Natural Language Interface with Neural Programmer,cs.CL cs.LG stat.ML," Learning a natural language interface for database tables is a challenging
-task that involves deep language understanding and multi-step reasoning. The
-task is often approached by mapping natural language queries to logical forms
-or programs that provide the desired response when executed on the database. To
-our knowledge, this paper presents the first weakly supervised, end-to-end
-neural network model to induce such programs on a real-world dataset. We
-enhance the objective function of Neural Programmer, a neural network with
-built-in discrete operations, and apply it on WikiTableQuestions, a natural
-language question-answering dataset. The model is trained end-to-end with weak
-supervision of question-answer pairs, and does not require domain-specific
-grammars, rules, or annotations that are key elements in previous approaches to
-program induction. The main experimental result in this paper is that a single
-Neural Programmer model achieves 34.2% accuracy using only 10,000 examples with
-weak supervision. An ensemble of 15 models, with a trivial combination
-technique, achieves 37.7% accuracy, which is competitive with the current
-state-of-the-art accuracy of 37.1% obtained by a traditional natural language
-semantic parser.
-"
-3895,1611.08987,Zhuoran Liu and Yang Liu,Exploiting Unlabeled Data for Neural Grammatical Error Detection,cs.CL," Identifying and correcting grammatical errors in text written by
-non-native writers has received increasing attention in recent years. Although
-a number of annotated corpora have been established to facilitate data-driven
-grammatical error detection and correction approaches, they are still limited
-in terms of quantity and coverage because human annotation is labor-intensive,
-time-consuming, and expensive. In this work, we propose to utilize unlabeled
-data to train neural network based grammatical error detection models. The
-basic idea is to cast error detection as a binary classification problem and
-derive positive and negative training examples from unlabeled data. We
-introduce an attention-based neural network to capture long-distance
-dependencies that influence the word being detected. Experiments show that the
-proposed approach significantly outperforms SVMs and convolutional networks
-with a fixed-size context window.
-"
-3896,1611.09020,"Jia Su, Bin He, Yi Guan, Jingchi Jiang, Jinfeng Yang","Developing a cardiovascular disease risk factor annotated corpus of
- Chinese electronic medical records",cs.CL," Cardiovascular disease (CVD) has become the leading cause of death in China,
-and most of the cases can be prevented by controlling risk factors. The goal of
-this study was to build a corpus of CVD risk factor annotations based on
-Chinese electronic medical records (CEMRs). This corpus is intended to be used
-to develop a risk factor information extraction system that, in turn, can be
-applied as a foundation for the further study of the progress of risk factors
-and CVD. We designed a light annotation task to capture CVD risk factors with
-indicators, temporal attributes and assertions that were explicitly or
-implicitly displayed in the records.
The task included: 1)
-preparing data; 2) creating guidelines for capturing annotations (these were
-created with the help of clinicians); 3) proposing an annotation method,
-including building the guidelines draft, training the annotators and updating
-the guidelines, and corpus construction. Then, a risk factor annotated corpus
-based on de-identified discharge summaries and progress notes from 600
-patients was developed. Built with the help of clinicians, this corpus has an
-inter-annotator agreement (IAA) F1-measure of 0.968, indicating high
-reliability. To the best of our knowledge, this is the first annotated corpus
-concerning CVD risk factors in CEMRs, and guidelines for capturing CVD risk
-factor annotations from CEMRs were proposed. The obtained document-level
-annotations can be applied in future studies to monitor risk factors and CVD
-over the long term.
-"
-3897,1611.09028,"Fotis Jannidis, Isabella Reger, Albin Zehe, Martin Becker, Lena
- Hettinger, Andreas Hotho",Analyzing Features for the Detection of Happy Endings in German Novels,cs.IR cs.AI cs.CL," With regard to a computational representation of literary plot, this paper
-looks at the use of sentiment analysis for happy ending detection in German
-novels. Its focus lies on the investigation of previously proposed sentiment
-features in order to gain insight into the relevance of specific features on
-the one hand and the implications of their performance on the other hand.
-Therefore, we study various partitionings of novels, considering the highly
-variable concept of ""ending"". We also show that our approach, even though still
-rather simple, can potentially lead to substantial findings relevant to
-literary studies.
-"
-3898,1611.09100,"Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang
- Ling",Learning to Compose Words into Sentences with Reinforcement Learning,cs.CL," We use reinforcement learning to learn tree-structured neural networks for
-computing representations of natural language sentences. In contrast with prior
-work on tree-structured models in which the trees are either provided as input
-or predicted using supervision from explicit treebank annotations, the tree
-structures in this work are optimized to improve performance on a downstream
-task. Experiments demonstrate the benefit of learning task-specific composition
-orders, outperforming both sequential encoders and recursive encoders based on
-treebank annotations. We analyze the induced trees and show that while they
-discover some linguistically intuitive structures (e.g., noun phrases, simple
-verb phrases), they are different from conventional English syntactic
-structures.
-"
-3899,1611.09122,"Andronik Arutyunov, Leonid Borisov, Sergey Fedorov, Anastasiya
- Ivchenko, Elizabeth Kirina-Lilinskaya, Yurii Orlov, Konstantin Osminin,
- Sergey Shilin, Dmitriy Zeniuk","Statistical Properties of European Languages and Voynich Manuscript
- Analysis",stat.AP cs.CL," The statistical properties of letter frequencies in European literary texts
-are investigated. The logarithmic dependence of the letter sequence for
-one-language and two-language texts is examined. A pair of languages is
-suggested for the Voynich Manuscript. The internal structure of the Manuscript
-is considered. Spectral portraits of the two-letter distribution are
-constructed.
-"
-3900,1611.09207,"Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson,
- Rif A. Saurous, D.
Sculley",AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech,cs.CL cs.LG stat.ML," Developers of text-to-speech synthesizers (TTS) often make use of human
-raters to assess the quality of synthesized speech. We demonstrate that we can
-model human raters' mean opinion scores (MOS) of synthesized speech using a
-deep recurrent neural network whose inputs consist solely of a raw waveform.
-Our best models provide utterance-level estimates of MOS only moderately
-inferior to sampled human ratings, as shown by Pearson and Spearman
-correlations. When multiple utterances are scored and averaged, a scenario
-common in synthesizer quality assessment, AutoMOS achieves correlations
-approaching those of human raters. The AutoMOS model has a number of
-applications, such as the ability to explore the parameter space of a speech
-synthesizer without requiring a human-in-the-loop.
-"
-3901,1611.09235,"Ziqiang Cao, Chuwei Luo, Wenjie Li, Sujian Li",Joint Copying and Restricted Generation for Paraphrase,cs.CL cs.IR," Many natural language generation tasks, such as abstractive summarization and
-text simplification, are paraphrase-oriented. In these tasks, copying and
-rewriting are two main writing modes. Most previous sequence-to-sequence
-(Seq2Seq) models use a single decoder and neglect this fact. In this paper, we
-develop a novel Seq2Seq model to fuse a copying decoder and a restricted
-generative decoder. The copying decoder finds the position to be copied based
-on a typical attention model. The generative decoder produces words limited in
-the source-specific vocabulary. To combine the two decoders and determine the
-final output, we develop a predictor to predict the mode of copying or
-rewriting. This predictor can be guided by the actual writing mode in the
-training data. We conduct extensive experiments on two different paraphrase
-datasets. The result shows that our model outperforms the state-of-the-art
-approaches in terms of both informativeness and language quality.
-"
-3902,1611.09238,"Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei",Improving Multi-Document Summarization via Text Classification,cs.CL cs.IR," As developed so far, multi-document summarization has reached a bottleneck
-due to the lack of sufficient training data and diverse categories of
-documents. Text classification just makes up for these deficiencies. In this
-paper, we propose a novel summarization system called TCSum, which leverages
-plentiful text classification data to improve the performance of
-multi-document summarization. TCSum projects documents onto distributed
-representations which act as a bridge between text classification and
-summarization. It also utilizes the classification results to produce
-summaries of different styles. Extensive experiments on DUC generic
-multi-document summarization datasets show that TCSum can achieve
-state-of-the-art performance without using any hand-crafted features and has
-the capability to catch the variations of summary styles with respect to
-different text categories.
-"
-3903,1611.09268,"Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao,
- Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen,
- Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary and Tong Wang",MS MARCO: A Human Generated MAchine Reading COmprehension Dataset,cs.CL cs.IR," We introduce a large scale MAchine Reading COmprehension dataset, which we
-name MS MARCO.
The dataset comprises 1,010,916 anonymized
-questions---sampled from Bing's search query logs---each with a human generated
-answer, along with 182,669 answers that have been completely rewritten by
-humans. In addition, the dataset contains 8,841,823 passages---extracted from
-3,563,535 web documents retrieved by Bing---that provide the information
-necessary for curating the natural language answers. A question in the MS MARCO
-dataset may have multiple answers or no answers at all. Using this dataset, we
-propose three different tasks with varying levels of difficulty: (i) predict if
-a question is answerable given a set of context passages, and extract and
-synthesize the answer as a human would; (ii) generate a well-formed answer (if
-possible) based on the context passages that can be understood with the
-question and passage context; and finally (iii) rank a set of retrieved
-passages given a question. The size of the dataset and the fact that the
-questions are derived from real user search queries distinguishes MS MARCO from
-other well-known publicly available datasets for machine reading comprehension
-and question-answering. We believe that the scale and the real-world nature of
-this dataset makes it attractive for benchmarking machine reading comprehension
-and question-answering models.
-"
-3904,1611.09288,Tom Sercu and Vaibhava Goel,"Dense Prediction on Sequences with Time-Dilated Convolutions for Speech
- Recognition",cs.CL cs.LG cs.NE," In computer vision, pixelwise dense prediction is the task of predicting a
-label for each pixel in the image. Convolutional neural networks achieve good
-performance on this task, while being computationally efficient. In this paper
-we carry these ideas over to the problem of assigning a sequence of labels to a
-set of speech frames, a task commonly known as framewise classification. We
-show that the dense prediction view of framewise classification offers several
-advantages and insights, including computational efficiency and the ability to
-apply batch normalization. When doing dense prediction, we pay specific
-attention to strided pooling in time and introduce an asymmetric dilated
-convolution, called time-dilated convolution, that allows for efficient and
-elegant implementation of pooling in time. We show results using time-dilated
-convolutions in a very deep VGG-style CNN with batch normalization on the Hub5
-Switchboard-2000 benchmark task. With a big n-gram language model, we achieve
-7.7% WER, which is the best single-model single-pass performance reported so
-far.
-"
-3905,1611.09405,"Chris Lengerich, Awni Hannun","An End-to-End Architecture for Keyword Spotting and Voice Activity
- Detection",cs.CL," We propose a single neural network architecture for two tasks: on-line
-keyword spotting and voice activity detection. We develop novel inference
-algorithms for an end-to-end Recurrent Neural Network trained with the
-Connectionist Temporal Classification loss function which allow our model to
-achieve high accuracy on both keyword spotting and voice activity detection
-without retraining. In contrast to prior voice activity detection models, our
-architecture does not require aligned training data and uses the same
-parameters as the keyword spotting model. This allows us to deploy a high
-quality voice activity detector with no additional memory or maintenance
-requirements.
-"
-3906,1611.09434,"Jakob N.
Foerster, Justin Gilmer, Jan Chorowski, Jascha
- Sohl-Dickstein, David Sussillo","Input Switched Affine Networks: An RNN Architecture Designed for
- Interpretability",cs.AI cs.CL cs.LG cs.NE," There exist many problem domains where the interpretability of neural network
-models is essential for deployment. Here we introduce a recurrent architecture
-composed of input-switched affine transformations - in other words an RNN
-without any explicit nonlinearities, but with input-dependent recurrent
-weights. This simple form allows the RNN to be analyzed via straightforward
-linear methods: we can exactly characterize the linear contribution of each
-input to the model predictions; we can use a change-of-basis to disentangle
-input, output, and computational hidden unit subspaces; we can fully
-reverse-engineer the architecture's solution to a simple task. Despite this
-ease of interpretation, the input switched affine network achieves reasonable
-performance on text modeling tasks, and allows greater computational
-efficiency than networks with standard nonlinearities.
-"
-3907,1611.09441,"Lahari Poddar, Kishaloy Halder, and Xianyan Jia",Sentiment Analysis for Twitter : Going Beyond Tweet Text,cs.CL cs.SI," Analysing the sentiment of tweets is important as it helps to determine
-users' opinions. Knowing people's opinions is crucial for several purposes,
-from gathering knowledge about a customer base, to e-governance, campaigning,
-and many more. In this report, we aim to develop a system to detect the
-sentiment from tweets. We employ several linguistic features along with some
-other external sources of information to detect the sentiment of a tweet. We
-show that augmenting the 140-character-long tweet with information harvested
-from external URLs shared in the tweet as well as social media features
-enhances the sentiment prediction accuracy significantly.
-"
-3908,1611.09534,"Tom Zahavy and Alessandro Magnani and Abhinandan Krishnan and Shie
- Mannor","Is a picture worth a thousand words? A Deep Multi-Modal Fusion
- Architecture for Product Classification in e-commerce",cs.CV cs.CL," Classifying products into categories precisely and efficiently is a major
-challenge in modern e-commerce. The high traffic of new products uploaded daily
-and the dynamic nature of the categories raise the need for machine learning
-models that can reduce the cost and time of human editors. In this paper, we
-propose a decision-level fusion approach for multi-modal product classification
-using text and image inputs. We train input-specific state-of-the-art deep
-neural networks for each input source, show the potential of forging them
-together into a multi-modal architecture, and train a novel policy network that
-learns to choose between them. Finally, we demonstrate that our multi-modal
-network improves top-1 accuracy over both networks on a real-world large-scale
-product classification dataset that we collected from Walmart.com. While we
-focus on image-text fusion that characterizes e-commerce domains, our
-algorithms can be easily applied to other modalities such as audio, video,
-physical sensors, etc.
-"
-3909,1611.09573,"V. S. Anoop, S. Asharaf and P. Deepak",Learning Concept Hierarchies through Probabilistic Topic Modeling,cs.AI cs.CL cs.IR," With the advent of the semantic web, various tools and techniques have been
-introduced for presenting and organizing knowledge.
Concept hierarchies are one
-such technique, which has gained significant attention due to its usefulness in
-creating domain ontologies that are considered an integral part of the semantic
-web. Automated concept hierarchy learning algorithms focus on extracting
-relevant concepts from unstructured text corpora and connecting them by
-identifying potential relations that exist between them. In this paper, we
-propose a novel approach for identifying relevant concepts from plain text and
-then learning a hierarchy of concepts by exploiting the subsumption relation
-between them. To start with, we model topics using a probabilistic topic model
-and then make use of a lightweight linguistic process to extract semantically
-rich concepts. Then we connect concepts by identifying an ""is-a"" relationship
-between pairs of concepts. The proposed method is completely unsupervised and
-there is no need for a domain-specific training corpus for concept extraction
-and learning. Experiments on large, real-world text corpora such as the BBC
-News dataset and the Reuters News corpus show that the proposed method
-outperforms some of the existing methods for concept extraction, and that
-efficient concept hierarchy learning is possible if the overall task is guided
-by a probabilistic topic modeling algorithm.
-"
-3910,1611.09703,"Cezary Kaliszyk, Josef Urban, Ji\v{r}\'i Vysko\v{c}il","Semantic Parsing of Mathematics by Context-based Learning from Aligned
- Corpora and Theorem Proving",cs.CL cs.AI," We study methods for automated parsing of informal mathematical expressions
-into formal ones, a main prerequisite for deep computer understanding of
-informal mathematical texts. We propose a context-based parsing approach that
-combines efficient statistical learning of deep parse trees with their semantic
-pruning by type checking and large-theory automated theorem proving. We show
-that the methods very significantly improve on previous results in parsing
-theorems from the Flyspeck corpus.
-"
-3911,1611.09799,"Hongyu Gong, Suma Bhat, Pramod Viswanath",Geometry of Compositionality,cs.CL," This paper proposes a simple test for compositionality (i.e., literal usage)
-of a word or phrase in a context-specific way. The test is computationally
-simple, relies on no external resources, and only uses a set of trained word
-vectors. Experiments show that the proposed method is competitive with state of
-the art and displays high accuracy in context-specific compositionality
-detection of a variety of natural language phenomena (idiomaticity, sarcasm,
-metaphor) for different datasets in multiple languages. The key insight is to
-connect compositionality to a curious geometric property of word embeddings,
-which is of independent interest.
-"
-3912,1611.09823,"Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato,
- Jason Weston",Dialogue Learning With Human-In-The-Loop,cs.AI cs.CL," An important aspect of developing conversational agents is to give a bot the
-ability to improve through communicating with humans and to learn from the
-mistakes that it makes. Most research has focused on learning from fixed
-training sets of labeled data rather than interacting with a dialogue partner
-in an online fashion. In this paper we explore this direction in a
-reinforcement learning setting where the bot improves its question-answering
-ability from feedback a teacher gives following its generated responses.
We
-build a simulator that tests various aspects of such learning in a synthetic
-environment, and introduce models that work in this regime. Finally, real
-experiments with Mechanical Turk validate the approach.
-"
-3913,1611.09830,"Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro
- Sordoni, Philip Bachman, Kaheer Suleman",NewsQA: A Machine Comprehension Dataset,cs.CL cs.AI," We present NewsQA, a challenging machine comprehension dataset of over
-100,000 human-generated question-answer pairs. Crowdworkers supply questions
-and answers based on a set of over 10,000 news articles from CNN, with answers
-consisting of spans of text from the corresponding articles. We collect this
-dataset through a four-stage process designed to solicit exploratory questions
-that require reasoning. A thorough analysis confirms that NewsQA demands
-abilities beyond simple word matching and recognizing textual entailment. We
-measure human performance on the dataset and compare it to several strong
-neural models. The performance gap between humans and machines (0.198 in F1)
-indicates that significant progress can be made on NewsQA through future
-research. The dataset is freely available at
-https://datasets.maluuba.com/NewsQA.
-"
-3914,1611.09878,"Jian Tang, Meng Qu, and Qiaozhu Mei",Identity-sensitive Word Embedding through Heterogeneous Networks,cs.CL cs.LG stat.ML," Most existing word embedding approaches do not distinguish the same words in
-different contexts, therefore ignoring their contextual meanings. As a result,
-the learned embeddings of these words are usually a mixture of multiple
-meanings. In this paper, we acknowledge multiple identities of the same word in
-different contexts and learn the \textbf{identity-sensitive} word embeddings.
-Based on an identity-labeled text corpus, a heterogeneous network of words and
-word identities is constructed to model different levels of word
-co-occurrences. The heterogeneous network is further embedded into a
-low-dimensional space through a principled network embedding approach, through
-which we are able to obtain the embeddings of words and the embeddings of word
-identities. We study three different types of word identities including topics,
-sentiments and categories. Experimental results on real-world data sets show
-that the identity-sensitive word embeddings learned by our approach indeed
-capture different meanings of words and outperform competitive methods on
-tasks including text classification and word similarity computation.
-"
-3915,1611.09900,"Jian Tang, Yifan Yang, Sam Carton, Ming Zhang, and Qiaozhu Mei",Context-aware Natural Language Generation with Recurrent Neural Networks,cs.CL," This paper studies generating natural language in particular contexts or
-situations. We propose two novel approaches which encode the contexts into a
-continuous semantic representation and then decode the semantic representation
-into text sequences with recurrent neural networks. During decoding, the
-context information is attended to through a gating mechanism, addressing the
-problem of long-range dependency caused by lengthy sequences. We evaluate the
-effectiveness of the proposed approaches on user review data, in which rich
-contexts are available and two informative contexts, sentiments and products,
-are selected for evaluation. Experiments show that the fake reviews generated
-by our approaches are very natural.
Results of fake review detection with human
-judges show that more than 50\% of the fake reviews are misclassified as real
-reviews, and more than 90\% are misclassified by an existing state-of-the-art
-fake review detection algorithm.
-"
-3916,1611.09921,"Jian Tang, Cheng Li, Ming Zhang, and Qiaozhu Mei","Less is More: Learning Prominent and Diverse Topics for Data
- Summarization",cs.LG cs.CL cs.IR," Statistical topic models efficiently facilitate the exploration of
-large-scale data sets. Many models have been developed and broadly used to
-summarize the semantic structure in news, science, social media, and digital
-humanities. However, a common and practical objective in data exploration tasks
-is not to enumerate all existing topics, but to quickly extract representative
-ones that broadly cover the content of the corpus, i.e., a few topics that
-serve as a good summary of the data. Most existing topic models fit exactly the
-same number of topics as a user specifies, which imposes an unnecessary burden
-on users who have limited prior knowledge. We instead propose new models that
-are able to learn fewer but more representative topics for the purpose of data
-summarization. We propose a reinforced random walk that allows prominent topics
-to absorb tokens from similar and smaller topics, thus enhancing the diversity
-among the top topics extracted. With this reinforced random walk as a general
-process embedded in classical topic models, we obtain \textit{diverse topic
-models} that are able to extract the most prominent and diverse topics from
-data. The inference procedures of these diverse topic models remain as simple
-and efficient as those of the classical models. Experimental results
-demonstrate that the diverse topic models not only discover topics that better
-summarize the data, but also require minimal prior knowledge from the users.
-"
-3917,1611.10038,Si Li and Nianwen Xue,Towards Accurate Word Segmentation for Chinese Patents,cs.CL," A patent is a property right for an invention granted by the government to
-the inventor. An invention is a solution to a specific technological problem.
-So patents often have a high concentration of scientific and technical terms
-that are rare in everyday language. A Chinese word segmentation model trained
-on currently available everyday language data sets performs poorly because it
-cannot effectively recognize these scientific and technical terms. In this
-paper we describe a pragmatic approach to Chinese word segmentation on patents
-where we train a character-based semi-supervised sequence labeling model by
-extracting features from a manually segmented corpus of 142 patents, enhanced
-with information extracted from the Chinese TreeBank. Experiments show that the
-accuracy of our model reached 95.08% (F1 score) on a held-out test set and
-96.59% on the development set, compared with an F1 score of 91.48% on the
-development set if the model is trained on the Chinese TreeBank. We also
-experimented with some existing domain adaptation techniques; the results show
-that the amount of target domain data and the selected features impact the
-performance of the domain adaptation techniques.
-"
-3918,1611.10122,"Jack Bowers (OEAW), Laurent Romary (CMB, ALPAGE)",Deep encoding of etymological information in TEI,cs.CL," This paper aims to provide a comprehensive modeling and representation of
-etymological data in digital dictionaries.
The purpose is to integrate into one
-coherent framework both digital representations of legacy dictionaries and
-born-digital lexical databases that are constructed manually or
-semi-automatically. We want to propose a systematic and coherent set of
-modeling principles for a variety of etymological phenomena that may contribute
-to the creation of a continuum between existing and future lexical constructs,
-where anyone interested in tracing the history of words and their meanings will
-be able to seamlessly query lexical resources. Instead of designing an ad hoc
-model and representation language for digital etymological data, we will focus
-on identifying all the possibilities offered by the TEI guidelines for the
-representation of lexical information.
-"
-3919,1611.10277,"Ryan J. Gallagher, Kyle Reing, David Kale, Greg Ver Steeg","Anchored Correlation Explanation: Topic Modeling with Minimal Domain
- Knowledge",cs.CL cs.IR cs.IT math.IT stat.ML," While generative models such as Latent Dirichlet Allocation (LDA) have proven
-fruitful in topic modeling, they often require detailed assumptions and careful
-specification of hyperparameters. Such model complexity issues only compound
-when trying to generalize generative models to incorporate human input. We
-introduce Correlation Explanation (CorEx), an alternative approach to topic
-modeling that does not assume an underlying generative model, and instead
-learns maximally informative topics through an information-theoretic framework.
-This framework naturally generalizes to hierarchical and semi-supervised
-extensions with no additional modeling assumptions. In particular, word-level
-domain knowledge can be flexibly incorporated within CorEx through anchor
-words, allowing topic separability and representation to be promoted with
-minimal human intervention. Across a variety of datasets, metrics, and
-experiments, we demonstrate that CorEx produces topics that are comparable in
-quality to those produced by unsupervised and semi-supervised variants of LDA.
-"
-3920,1612.00148,"Vivek Kulkarni, Yashar Mehdad, Troy Chevalier","Domain Adaptation for Named Entity Recognition in Online Media with Word
- Embeddings",cs.CL cs.IR," Content on the Internet is heterogeneous and arises from various domains like
-News, Entertainment, Finance and Technology. Understanding such content
-requires identifying named entities (persons, places and organizations) as one
-of the key steps. Traditionally, Named Entity Recognition (NER) systems have
-been built using available annotated datasets (like CoNLL, MUC) and demonstrate
-excellent performance. However, these models fail to generalize onto other
-domains like Sports and Finance where conventions and language use can differ
-significantly. Furthermore, several domains do not have large amounts of
-annotated labeled data for training robust Named Entity Recognition models. A
-key step towards this challenge is to adapt models learned on domains where
-large amounts of annotated training data are available to domains with scarce
-annotated data.
- In this paper, we propose methods to effectively adapt models learned on one
-domain onto other domains using distributed word representations. First, we
-analyze the linguistic variation present across domains to identify key
-linguistic insights that can boost performance across domains. We propose
-methods to capture domain-specific semantics of word usage in addition to
-global semantics.
We then demonstrate how to effectively use such domain
-specific knowledge to learn NER models that outperform previous baselines in
-the domain adaptation setting.
-"
-3921,1612.00227,"Stefano Borgo, Loris Bozzato, Alessio Palmero Aprosio, Marco Rospocher
-  and Luciano Serafini","On Coreferring Text-extracted Event Descriptions with the aid of
-  Ontological Reasoning",cs.AI cs.CL," Systems for automatic extraction of semantic information about events from
-large textual resources are now available: these tools are capable of
-generating RDF datasets about text extracted events and this knowledge can be
-used to reason over the recognized events. On the other hand, text based tasks
-for event recognition, as for example event coreference (i.e. recognizing
-whether two textual descriptions refer to the same event), do not take into
-account ontological information of the extracted events in their process. In
-this paper, we propose a method to derive event coreference on text extracted
-event data using semantic based rule reasoning. We demonstrate our method
-considering a limited (yet representative) set of event types: we introduce a
-formal analysis on their ontological properties and, on the basis of this, we
-define a set of coreference criteria. We then implement these criteria as
-RDF-based reasoning rules to be applied on text extracted event data. We
-evaluate the effectiveness of our approach over a standard coreference
-benchmark dataset.
-"
-3922,1612.00246,Lahari Poddar,Multilingual Multiword Expressions,cs.CL," The project aims to provide a semi-supervised approach to identify Multiword
-Expressions in a multilingual context consisting of English and most of the
-major Indian languages. A multiword expression is a group of words that refers
-to some conventional or regional way of saying things. If it is literally
-translated from one language to another, the expression will lose its inherent
-meaning.
- To automatically extract multiword expressions from a corpus, an extraction
-pipeline has been constructed which consists of a combination of rule based
-and statistical approaches. There are several types of multiword expressions
-which differ from each other widely by construction. We employ different
-methods to detect different types of multiword expressions. Given a POS tagged
-corpus in English or any Indian language, the system initially applies some
-regular expression filters to narrow down the search space to certain patterns
-(like reduplication, partial reduplication, compound nouns, compound verbs,
-conjunct verbs etc.). The word sequences matching the required pattern are
-subjected to a series of linguistic tests which include verb filtering, named
-entity filtering and hyphenation filtering tests to exclude false positives.
-The candidates are then checked for semantic relationships among themselves
-(using Wordnet). In order to detect partial reduplication we make use of
-Wordnet as a lexical database as well as a tool for lemmatising. We detect
-complex predicates by investigating the features of the constituent words.
-Statistical methods are applied to detect collocations. Finally, lexicographers
-examine the list of automatically extracted candidates to validate whether they
-are true multiword expressions or not and add them to the multiword dictionary
-accordingly.
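- As a rough illustration of the statistical collocation step mentioned above
-(a sketch under assumed conventions, not the project's code; the token list,
-min_count and threshold are hypothetical), adjacent word pairs can be scored
-by pointwise mutual information (PMI) in Python:
-import math
-from collections import Counter
-
-def pmi_collocations(tokens, min_count=5, threshold=3.0):
-    # Score adjacent word pairs by PMI and keep those above a threshold.
-    unigrams = Counter(tokens)
-    bigrams = Counter(zip(tokens, tokens[1:]))
-    n = len(tokens)
-    kept = []
-    for (w1, w2), count in bigrams.items():
-        if count < min_count:
-            continue  # rare pairs give unreliable PMI estimates
-        p_pair = count / (n - 1)
-        p_indep = (unigrams[w1] / n) * (unigrams[w2] / n)
-        pmi = math.log(p_pair / p_indep)
-        if pmi >= threshold:
-            kept.append(((w1, w2), pmi))
-    return sorted(kept, key=lambda item: -item[1])
- Candidates passing such a threshold would then go to the lexicographers for
-validation, as described above.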
-" -3923,1612.00347,"Dimitrios Kalatzis, Arash Eshghi, Oliver Lemon","Bootstrapping incremental dialogue systems: using linguistic knowledge - to learn from minimal data",cs.CL cs.AI cs.HC," We present a method for inducing new dialogue systems from very small amounts -of unannotated dialogue data, showing how word-level exploration using -Reinforcement Learning (RL), combined with an incremental and semantic grammar -- Dynamic Syntax (DS) - allows systems to discover, generate, and understand -many new dialogue variants. The method avoids the use of expensive and -time-consuming dialogue act annotations, and supports more natural -(incremental) dialogues than turn-based systems. Here, language generation and -dialogue management are treated as a joint decision/optimisation problem, and -the MDP model for RL is constructed automatically. With an implemented system, -we show that this method enables a wide range of dialogue variations to be -automatically captured, even when the system is trained from only a single -dialogue. The variants include question-answer pairs, over- and -under-answering, self- and other-corrections, clarification interaction, -split-utterances, and ellipsis. This generalisation property results from the -structural knowledge and constraints present within the DS grammar, and -highlights some limitations of recent systems built using machine learning -techniques only. -" -3924,1612.00370,"Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, Kevin Murphy",Improved Image Captioning via Policy Gradient optimization of SPIDEr,cs.CV cs.CL," Current image captioning methods are usually trained via (penalized) maximum -likelihood estimation. However, the log-likelihood score of a caption does not -correlate well with human assessments of quality. Standard syntactic evaluation -metrics, such as BLEU, METEOR and ROUGE, are also not well correlated. The -newer SPICE and CIDEr metrics are better correlated, but have traditionally -been hard to optimize for. In this paper, we show how to use a policy gradient -(PG) method to directly optimize a linear combination of SPICE and CIDEr (a -combination we call SPIDEr): the SPICE score ensures our captions are -semantically faithful to the image, while CIDEr score ensures our captions are -syntactically fluent. The PG method we propose improves on the prior MIXER -approach, by using Monte Carlo rollouts instead of mixing MLE training with PG. -We show empirically that our algorithm leads to easier optimization and -improved results compared to MIXER. Finally, we show that using our PG method -we can optimize any of the metrics, including the proposed SPIDEr metric which -results in image captions that are strongly preferred by human raters compared -to captions generated by the same model but trained to optimize MLE or the COCO -metrics. -" -3925,1612.00377,"Iulian V. Serban, Alexander G. Ororbia II, Joelle Pineau, Aaron - Courville",Piecewise Latent Variables for Neural Variational Text Processing,cs.CL cs.AI cs.LG cs.NE," Advances in neural variational inference have facilitated the learning of -powerful directed graphical models with continuous latent variables, such as -variational autoencoders. The hope is that such models will learn to represent -rich, multi-modal latent factors in real-world data, such as natural language -text. However, current models often assume simplistic priors on the latent -variables - such as the uni-modal Gaussian distribution - which are incapable -of representing complex latent factors efficiently. 
To overcome this -restriction, we propose the simple, but highly flexible, piecewise constant -distribution. This distribution has the capacity to represent an exponential -number of modes of a latent target distribution, while remaining mathematically -tractable. Our results demonstrate that incorporating this new latent -distribution into different models yields substantial improvements in natural -language processing tasks such as document modeling and natural language -generation for dialogue. -" -3926,1612.00385,"Wenjie Pei, Tadas Baltru\v{s}aitis, David M.J. Tax, Louis-Philippe - Morency",Temporal Attention-Gated Model for Robust Sequence Classification,cs.CV cs.CL," Typical techniques for sequence classification are designed for -well-segmented sequences which have been edited to remove noisy or irrelevant -parts. Therefore, such methods cannot be easily applied on noisy sequences -expected in real-world applications. In this paper, we present the Temporal -Attention-Gated Model (TAGM) which integrates ideas from attention models and -gated recurrent networks to better deal with noisy or unsegmented sequences. -Specifically, we extend the concept of attention model to measure the relevance -of each observation (time step) of a sequence. We then use a novel gated -recurrent network to learn the hidden representation for the final prediction. -An important advantage of our approach is interpretability since the temporal -attention weights provide a meaningful value for the salience of each time step -in the sequence. We demonstrate the merits of our TAGM approach, both for -prediction accuracy and interpretability, on three different tasks: spoken -digit recognition, text-based sentiment analysis and visual event recognition. -" -3927,1612.00394,"Thanapon Noraset, Chen Liang, Larry Birnbaum, Doug Downey","Definition Modeling: Learning to define word embeddings in natural - language",cs.CL," Distributed representations of words have been shown to capture lexical -semantics, as demonstrated by their effectiveness in word similarity and -analogical relation tasks. But, these tasks only evaluate lexical semantics -indirectly. In this paper, we study whether it is possible to utilize -distributed representations to generate dictionary definitions of words, as a -more direct and transparent representation of the embeddings' semantics. We -introduce definition modeling, the task of generating a definition for a given -word and its embedding. We present several definition model architectures based -on recurrent neural networks, and experiment with the models over multiple data -sets. Our results show that a model that controls dependencies between the word -being defined and the definition words performs significantly better, and that -a character-level convolution layer designed to leverage morphology can -complement word-level embeddings. Finally, an error analysis suggests that the -errors made by a definition model may provide insight into the shortcomings of -word embeddings. -" -3928,1612.00467,"Paulina Grnarova, Florian Schmidt, Stephanie L. Hyland and Carsten - Eickhoff","Neural Document Embeddings for Intensive Care Patient Mortality - Prediction",cs.CL," We present an automatic mortality prediction scheme based on the unstructured -textual content of clinical notes. 
We propose a convolutional document embedding
-approach; our empirical investigation using the MIMIC-III intensive care
-database shows significant performance gains compared to previously employed
-methods such as latent topic distributions or generic doc2vec embeddings. These
-improvements are especially pronounced for the difficult problem of
-post-discharge mortality prediction.
-"
-3929,1612.00567,Jiangming Liu and Yue Zhang,Shift-Reduce Constituent Parsing with Neural Lookahead Features,cs.CL," Transition-based models can be fast and accurate for constituent parsing.
-Compared with chart-based models, they leverage richer features by extracting
-history information from a parser stack, which spans over non-local
-constituents. On the other hand, during incremental parsing, constituent
-information on the right hand side of the current word is not utilized, which
-is a relative weakness of shift-reduce parsing. To address this limitation, we
-leverage a fast neural model to extract lookahead features. In particular, we
-build a bidirectional LSTM model, which leverages the full sentence information
-to predict the hierarchy of constituents that each word starts and ends. The
-results are then passed to a strong transition-based constituent parser as
-lookahead features. The resulting parser gives a 1.3% absolute improvement on
-WSJ and 2.3% on CTB compared to the baseline, yielding the highest reported
-accuracies for fully-supervised parsing.
-"
-3930,1612.00584,"Yuanzhi Ke, Masafumi Hagiwara","Alleviating Overfitting for Polysemous Words for Word Representation
-  Estimation Using Lexicons",cs.CL," Though there are some works on improving distributed word representations
-using lexicons, overfitting on words that have multiple meanings remains an
-unsolved issue that deteriorates the learning when lexicons are used. An
-alternative method is to allocate a vector per sense instead of a vector per
-word. However, the word representations estimated in the former way are not as
-easy to use as the latter. Our previous work uses a probabilistic method to
-alleviate the overfitting, but it is not robust with a small corpus. In this
-paper, we propose a new neural network to estimate distributed word
-representations using a lexicon and a corpus. We add a lexicon layer in the
-continuous bag-of-words model and a threshold node after the output of the
-lexicon layer. The threshold rejects the unreliable outputs of the lexicon
-layer that are less likely to be the same as their inputs. In this way, it
-alleviates the overfitting of the polysemous words. The proposed neural network
-can be trained using negative sampling, which maximizes the log probabilities
-of target words given the context words by distinguishing the target words from
-random noise. We compare the proposed neural network with the continuous
-bag-of-words model, other works that improve it, and previous works that
-estimate distributed word representations using both a lexicon and a corpus.
-The experimental results show that the proposed neural network is more
-efficient and balanced for both semantic tasks and syntactic tasks than the
-previous works, and robust to the size of the corpus.
-"
-3931,1612.00694,"Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li,
-  Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. 
Dally",ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA,cs.CL," Long Short-Term Memory (LSTM) is widely used in speech recognition. In order
-to achieve higher prediction accuracy, machine learning scientists have built
-larger and larger models. Such a large model is both computation-intensive and
-memory-intensive. Deploying such a bulky model results in high power
-consumption and leads to a high total cost of ownership (TCO) of a data center.
-In order to speed up the prediction and make it energy efficient, we first
-propose a load-balance-aware pruning method that can compress the LSTM model
-size by 20x (10x from pruning and 2x from quantization) with negligible loss of
-the prediction accuracy. The pruned model is friendly for parallel processing.
-Next, we propose a scheduler that encodes and partitions the compressed model
-to each PE for parallelism, and schedules the complicated LSTM data flow.
-Finally, we design the hardware architecture, named Efficient Speech
-Recognition Engine (ESE), that works directly on the compressed model.
-Implemented on a Xilinx XCKU060 FPGA running at 200MHz, ESE has a performance
-of 282 GOPS working directly on the compressed LSTM network, corresponding to
-2.52 TOPS on the uncompressed one, and processes a full LSTM for speech
-recognition with a power dissipation of 41 Watts. Evaluated on the LSTM for
-speech recognition benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU
-and Pascal Titan X GPU implementations. It achieves 40x and 11.5x higher energy
-efficiency compared with the CPU and GPU respectively.
-"
-3932,1612.00729,Sowmya Vajjala,"Automated assessment of non-native learner essays: Investigating the
-  role of linguistic features",cs.CL," Automatic essay scoring (AES) refers to the process of scoring free text
-responses to given prompts, considering human grader scores as the gold
-standard. Writing such essays is an essential component of many language and
-aptitude exams. Hence, AES became an active and established area of research,
-and there are many proprietary systems used in real life applications today.
-However, not much is known about which specific linguistic features are useful
-for prediction and how much of this is consistent across datasets. This article
-addresses that by exploring the role of various linguistic features in
-automatic essay scoring using two publicly available datasets of non-native
-English essays written in test taking scenarios. The linguistic properties are
-modeled by encoding lexical, syntactic, discourse and error types of learner
-language in the feature set. Predictive models are then developed using these
-features on both datasets and the most predictive features are compared. While
-the results show that the feature set used results in good predictive models
-with both datasets, the question ""what are the most predictive features?"" has a
-different answer for each dataset.
-"
-3933,1612.00837,"Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh","Making the V in VQA Matter: Elevating the Role of Image Understanding in
-  Visual Question Answering",cs.CV cs.AI cs.CL cs.LG," Problems at the intersection of vision and language are of significant
-importance both as challenging research questions and for the rich set of
-applications they enable. 
However, inherent structure in our world and bias in
-our language tend to be a simpler signal for learning than visual modalities,
-resulting in models that ignore visual information, leading to an inflated
-sense of their capability.
- We propose to counter these language priors for the task of Visual Question
-Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance
-the popular VQA dataset by collecting complementary images such that every
-question in our balanced dataset is associated with not just a single image,
-but rather a pair of similar images that result in two different answers to the
-question. Our dataset is by construction more balanced than the original VQA
-dataset and has approximately twice the number of image-question pairs. Our
-complete balanced dataset is publicly available at www.visualqa.org as part of
-the 2nd iteration of the Visual Question Answering Dataset and Challenge (VQA
-v2.0).
- We further benchmark a number of state-of-the-art VQA models on our balanced
-dataset. All models perform significantly worse on our balanced dataset,
-suggesting that these models have indeed learned to exploit language priors.
-This finding provides the first concrete empirical evidence for what seems to
-be a qualitative sense among practitioners.
- Finally, our data collection protocol for identifying complementary images
-enables us to develop a novel interpretable model, which in addition to
-providing an answer to the given (image, question) pair, also provides a
-counter-example based explanation. Specifically, it identifies an image that is
-similar to the original image, but that it believes has a different answer to
-the same question. This can help in building trust for machines among their
-users.
-"
-3934,1612.00866,John Beieler,"Creating a Real-Time, Reproducible Event Dataset",cs.CL," The generation of political event data has remained much the same since the
-mid-1990s, both in terms of data acquisition and the process of coding text
-into data. Since the 1990s, however, there have been significant improvements
-in open-source natural language processing software and in the availability of
-digitized news content. This paper presents a new, next-generation event
-dataset, named Phoenix, that builds from these and other advances. This dataset
-includes improvements in the underlying news collection process and event
-coding software, along with the creation of a general processing pipeline
-necessary to produce daily-updated data. This paper provides a face validity
-check by briefly examining the data for the conflict in Syria, and a
-comparison between Phoenix and the Integrated Crisis Early Warning System data.
-"
-3935,1612.00913,"Xuesong Yang, Yun-Nung Chen, Dilek Hakkani-Tur, Paul Crook, Xiujun Li,
-  Jianfeng Gao, Li Deng","End-to-End Joint Learning of Natural Language Understanding and Dialogue
-  Manager",cs.CL cs.LG," Natural language understanding and dialogue policy learning are both
-essential in conversational systems that predict the next system actions in
-response to a current user utterance. Conventional approaches aggregate
-separate models of natural language understanding (NLU) and system action
-prediction (SAP) as a pipeline that is sensitive to noisy outputs of
-error-prone NLU. To address the issues, we propose an end-to-end deep recurrent
-neural network with limited contextual dialogue memory by jointly training NLU
-and SAP on DSTC4 multi-domain human-human dialogues. 
Experiments show that our
-proposed model significantly outperforms the state-of-the-art pipeline models
-for both NLU and SAP, which indicates that our joint model is capable of
-mitigating the effects of noisy NLU outputs, and that the NLU model can be
-refined by error flows backpropagated from the extra supervised signals of
-system actions.
-"
-3936,1612.00944,"Muthu Kumar Chandrasekaran, Carrie Demmans Epp, Min-Yen Kan, Diane
-  Litman",Using Discourse Signals for Robust Instructor Intervention Prediction,cs.AI cs.CL cs.CY," We tackle the prediction of instructor intervention in student posts from
-discussion forums in Massive Open Online Courses (MOOCs). Our key finding is
-that using automatically obtained discourse relations improves the prediction
-of when instructors intervene in student discussions, when compared with a
-state-of-the-art, feature-rich baseline. Our supervised classifier makes use of
-an automatic discourse parser which outputs Penn Discourse Treebank (PDTB) tags
-that represent in-post discourse features. We show PDTB relation-based features
-increase the robustness of the classifier and complement baseline features in
-recalling more diverse instructor intervention patterns. In comprehensive
-experiments over 14 MOOC offerings from several disciplines, the PDTB discourse
-features improve performance on average. The resultant models are less
-dependent on domain-specific vocabulary, allowing them to better generalize to
-new courses.
-"
-3937,1612.00969,Subhro Roy and Dan Roth,"Unit Dependency Graph and its Application to Arithmetic Word Problem
-  Solving",cs.CL," Math word problems provide a natural abstraction to a range of natural
-language understanding problems that involve reasoning about quantities, such
-as interpreting election results, news about casualties, and the financial
-section of a newspaper. Units associated with the quantities often provide
-information that is essential to support this reasoning. This paper proposes a
-principled way to capture and reason about units and shows how it can benefit
-an arithmetic word problem solver. This paper presents the concept of Unit
-Dependency Graphs (UDGs), which provides a compact representation of the
-dependencies between units of numbers mentioned in a given problem. Inducing
-the UDG alleviates the brittleness of the unit extraction system and allows for
-a natural way to leverage domain knowledge about unit compatibility, for word
-problem solving. We introduce a decomposed model for inducing UDGs with minimal
-additional annotations, and use it to augment the expressions used in the
-arithmetic word problem solver of (Roy and Roth 2015) via a constrained
-inference framework. We show that introduction of UDGs reduces the error of the
-solver by over 10%, surpassing all existing systems for solving arithmetic
-word problems. In addition, it also makes the system more robust to adaptation
-to new vocabulary and equation forms.
-"
-3938,1612.01039,"Hu Xu, Sihong Xie, Lei Shu, Philip S. Yu","CER: Complementary Entity Recognition via Knowledge Expansion on Large
-  Unlabeled Product Reviews",cs.CL," Product reviews contain a lot of useful information about product features
-and customer opinions. One important product feature is the complementary
-entity (products) that may potentially work together with the reviewed product.
-Knowing complementary entities of the reviewed product is very important
-because customers want to buy compatible products and avoid incompatible ones. 
-In this paper, we address the problem of Complementary Entity Recognition -(CER). Since no existing method can solve this problem, we first propose a -novel unsupervised method to utilize syntactic dependency paths to recognize -complementary entities. Then we expand category-level domain knowledge about -complementary entities using only a few general seed verbs on a large amount of -unlabeled reviews. The domain knowledge helps the unsupervised method to adapt -to different products and greatly improves the precision of the CER task. The -advantage of the proposed method is that it does not require any labeled data -for training. We conducted experiments on 7 popular products with about 1200 -reviews in total to demonstrate that the proposed approach is effective. -" -3939,1612.01197,"Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao","Neural Symbolic Machines: Learning Semantic Parsers on Freebase with - Weak Supervision (Short Version)",cs.CL cs.AI cs.LG," Extending the success of deep neural networks to natural language -understanding and symbolic reasoning requires complex operations and external -memory. Recent neural program induction approaches have attempted to address -this problem, but are typically limited to differentiable memory, and -consequently cannot scale beyond small synthetic tasks. In this work, we -propose the Manager-Programmer-Computer framework, which integrates neural -networks with non-differentiable memory to support abstract, scalable and -precise operations through a friendly neural computer interface. Specifically, -we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence -neural ""programmer"", and a non-differentiable ""computer"" that is a Lisp -interpreter with code assist. To successfully apply REINFORCE for training, we -augment it with approximate gold programs found by an iterative maximum -likelihood training process. NSM is able to learn a semantic parser from weak -supervision over a large knowledge base. It achieves new state-of-the-art -performance on WebQuestionsSP, a challenging semantic parsing dataset, with -weak supervision. Compared to previous approaches, NSM is end-to-end, therefore -does not rely on feature engineering or domain specific knowledge. -" -3940,1612.01340,"Ankesh Anand, Tanmoy Chakraborty, Noseong Park","We used Neural Networks to Detect Clickbaits: You won't believe what - happened Next!",cs.CL cs.IR," Online content publishers often use catchy headlines for their articles in -order to attract users to their websites. These headlines, popularly known as -clickbaits, exploit a user's curiosity gap and lure them to click on links that -often disappoint them. Existing methods for automatically detecting clickbaits -rely on heavy feature engineering and domain knowledge. Here, we introduce a -neural network architecture based on Recurrent Neural Networks for detecting -clickbaits. Our model relies on distributed word representations learned from a -large unannotated corpora, and character embeddings learned via Convolutional -Neural Networks. Experimental results on a dataset of news headlines show that -our model outperforms existing techniques for clickbait detection with an -accuracy of 0.98 with F1-score of 0.98 and ROC-AUC of 0.99. 
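- To make the recurrent architecture above concrete, here is a minimal sketch
-of the word-level branch in PyTorch (a library choice we assume; layer sizes
-are hypothetical, and the character-level CNN and pretrained embeddings are
-omitted):
-import torch
-import torch.nn as nn
-
-class ClickbaitRNN(nn.Module):
-    # Minimal BiLSTM headline classifier: embeddings -> LSTM -> sigmoid.
-    def __init__(self, vocab_size, embed_dim=100, hidden_dim=64):
-        super().__init__()
-        self.embed = nn.Embedding(vocab_size, embed_dim)
-        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
-                            bidirectional=True)
-        self.out = nn.Linear(2 * hidden_dim, 1)
-
-    def forward(self, token_ids):  # token_ids: (batch, seq_len)
-        hidden, _ = self.lstm(self.embed(token_ids))
-        logits = self.out(hidden[:, -1, :])  # last step summarises the headline
-        return torch.sigmoid(logits).squeeze(-1)  # probability of clickbait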
-"
-3941,1612.01404,"Eug\'enio Ribeiro, Ricardo Ribeiro and David Martins de Matos","Mapping the Dialog Act Annotations of the LEGO Corpus into the
-  Communicative Functions of ISO 24617-2",cs.CL," In this paper we present strategies for mapping the dialog act annotations of
-the LEGO corpus into the communicative functions of the ISO 24617-2 standard.
-Using these strategies, we obtained an additional 347 dialogs annotated
-according to the standard. This is particularly important given the reduced
-amount of existing data in those conditions due to the recency of the standard.
-Furthermore, these are dialogs from a widely explored corpus for dialog related
-tasks. However, its dialog annotations have been neglected due to their high
-domain-dependency, which renders them unusable outside the context of the
-corpus. Thus, through our mapping process, we both obtain more data annotated
-according to a recent standard and provide useful dialog act annotations for a
-widely explored corpus in the context of dialog research.
-"
-3942,1612.01556,"Mika Viking M\""antyl\""a, Daniel Graziotin, Miikka Kuutila","The Evolution of Sentiment Analysis - A Review of Research Topics,
-  Venues, and Top Cited Papers",cs.CL cs.DL cs.SI," Sentiment analysis is one of the fastest growing research areas in computer
-science, making it challenging to keep track of all the activities in the area.
-We present a computer-assisted literature review, where we utilize both text
-mining and qualitative coding, and analyze 6,996 papers from Scopus. We find
-that the roots of sentiment analysis are in the studies on public opinion
-analysis at the beginning of the 20th century and in the text subjectivity
-analysis performed by the computational linguistics community in the 1990s.
-However, the outbreak of computer-based sentiment analysis only occurred with
-the availability of subjective texts on the Web. Consequently, 99% of the
-papers have been published after 2004. Sentiment analysis papers are scattered
-across multiple publication venues, and the combined number of papers in the
-top-15 venues only represents ca. 30% of the papers in total. We present the
-top-20 cited papers from Google Scholar and Scopus and a taxonomy of research
-topics. In recent years, sentiment analysis has shifted from analyzing online
-product reviews to social media texts from Twitter and Facebook. Many topics
-beyond product reviews like stock markets, elections, disasters, medicine,
-software engineering and cyberbullying extend the utilization of sentiment
-analysis.
-"
-3943,1612.01627,"Yu Wu, Wei Wu, Chen Xing, Ming Zhou, Zhoujun Li","Sequential Matching Network: A New Architecture for Multi-turn Response
-  Selection in Retrieval-based Chatbots",cs.CL," We study response selection for multi-turn conversation in retrieval-based
-chatbots. Existing work either concatenates utterances in context or finally
-matches a response with a highly abstract context vector, which may lose
-relationships among utterances or important contextual information. We propose
-a sequential matching network (SMN) to address both problems. SMN first matches
-a response with each utterance in the context on multiple levels of
-granularity, and distills important matching information from each pair as a
-vector with convolution and pooling operations. The vectors are then
-accumulated in a chronological order through a recurrent neural network (RNN)
-which models relationships among utterances. The final matching score is
-calculated with the hidden states of the RNN. 
An empirical study on two public
-data sets shows that SMN can significantly outperform state-of-the-art methods
-for response selection in multi-turn conversation.
-"
-3944,1612.01744,"Alexandre Berard and Olivier Pietquin and Christophe Servan and
-  Laurent Besacier","Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text
-  Translation",cs.CL," This paper proposes a first attempt to build an end-to-end speech-to-text
-translation system, which does not use source language transcription during
-learning or decoding. We propose a model for direct speech-to-text translation,
-which gives promising results on a small French-English synthetic corpus.
-Relaxing the need for source language transcription would drastically change
-the data collection methodology in speech translation, especially in
-under-resourced scenarios. For instance, in the former project DARPA TRANSTAC
-(speech translation from spoken Arabic dialects), a large effort was devoted to
-the collection of speech transcripts (and a prerequisite to obtain transcripts
-was often a detailed transcription guide for languages with little standardized
-spelling). Now, if end-to-end approaches for speech-to-text translation are
-successful, one might consider collecting data by asking bilingual speakers to
-directly utter speech in the source language from target language text
-utterances. Such an approach has the advantage of being applicable to any
-unwritten (source) language.
-"
-3945,1612.01848,"Aaditya Prakash, Siyuan Zhao, Sadid A. Hasan, Vivek Datla, Kathy Lee,
-  Ashequl Qadir, Joey Liu, Oladimeji Farri",Condensed Memory Networks for Clinical Diagnostic Inferencing,cs.CL," Diagnosis of a clinical condition is a challenging task, which often requires
-significant medical investigation. Previous work related to diagnostic
-inferencing problems mostly considers multivariate observational data (e.g.
-physiological signals, lab tests etc.). In contrast, we explore the problem
-using free-text medical notes recorded in an electronic health record (EHR).
-Complex tasks like these can benefit from structured knowledge bases, but those
-are not scalable. We instead exploit raw text from Wikipedia as a knowledge
-source. Memory networks have been demonstrated to be effective in tasks which
-require comprehension of free-form text. They use the final iteration of the
-learned representation to predict probable classes. We introduce condensed
-memory neural networks (C-MemNNs), a novel model with iterative condensation of
-memory representations that preserves the hierarchy of features in the memory.
-Experiments on the MIMIC-III dataset show that the proposed model outperforms
-other variants of memory networks to predict the most probable diagnoses given
-a complex clinical scenario.
-"
-3946,1612.01892,"Gautam Singh, Saemi Jang, Mun Y. Yi",Cross-Lingual Predicate Mapping Between Linked Data Ontologies,cs.AI cs.CL," Ontologies in different natural languages often differ in quality in terms of
-richness of schema or richness of internal links. This difference is markedly
-visible when comparing a rich English language ontology with a non-English
-language counterpart. Discovering alignment between them is a useful endeavor
-as it serves as a starting point in bridging the disparity. In particular, our
-work is motivated by the absence of inter-language links for predicates in the
-localised versions of DBpedia. 
In this paper, we propose and demonstrate an
-ad-hoc system to find possible owl:equivalentProperty links between predicates
-in ontologies of different natural languages. We seek to achieve this mapping
-by using pre-existing inter-language links of the resources connected by the
-given predicate. Thus, our methodology stresses semantic similarity rather
-than lexical similarity. Moreover, through an evaluation, we show that our
-system is capable of outperforming a baseline system that is similar to the one
-used in recent OAEI campaigns.
-"
-3947,1612.01928,"Dmitriy Serdyuk, Kartik Audhkhasi, Phil\'emon Brakel, Bhuvana
-  Ramabhadran, Samuel Thomas, Yoshua Bengio",Invariant Representations for Noisy Speech Recognition,cs.CL cs.CV cs.LG cs.SD stat.ML," Modern automatic speech recognition (ASR) systems need to be robust under
-acoustic variability arising from environmental, speaker, channel, and
-recording conditions. Ensuring such robustness to variability is a challenge in
-modern day neural network-based ASR systems, especially when all types of
-variability are not seen during training. We attempt to address this problem by
-encouraging the neural network acoustic model to learn invariant feature
-representations. We use ideas from recent research on image generation using
-Generative Adversarial Networks and domain adaptation ideas extending
-adversarial gradient-based training. A recent work from Ganin et al. proposes
-to use adversarial training for image domain adaptation by using an
-intermediate representation from the main target classification network to
-deteriorate the domain classifier performance through a separate neural
-network. Our work focuses on investigating neural architectures which produce
-representations invariant to noise conditions for ASR. We evaluate the proposed
-architecture on the Aurora-4 task, a popular benchmark for noise robust ASR. We
-show that our method generalizes better than the standard multi-condition
-training especially when only a few noise categories are seen during training.
-"
-3948,1612.02251,H\'ector Mart\'inez Alonso and Barbara Plank,"When is multitask learning effective? Semantic sequence prediction under
-  varying data conditions",cs.CL," Multitask learning has been applied successfully to a range of tasks, mostly
-morphosyntactic. However, little is known about when MTL works and whether
-there are data characteristics that help to determine its success. In this
-paper we evaluate a range of semantic sequence labeling tasks in an MTL setup.
-We examine different auxiliary tasks, amongst which a novel setup, and
-correlate their impact with data-dependent conditions. Our results show that
-MTL is not always effective; significant improvements are obtained only for 1
-out of 5 tasks. When successful, auxiliary tasks with compact and more uniform
-label distributions are preferable.
-"
-3949,1612.02482,"Krupakar Hans, R S Milton","Improving the Performance of Neural Machine Translation Involving
-  Morphologically Rich Languages",cs.CL cs.LG cs.NE," The advent of the attention mechanism in neural machine translation models
-has improved the performance of machine translation systems by enabling
-selective lookup into the source sentence. In this paper, the efficiencies of
-translation using bidirectional encoder attention decoder models were studied
-with respect to translation involving morphologically rich languages. The
-English-Tamil language pair was selected for this analysis. 
First, the use of
-Word2Vec embedding for both the English and Tamil words improved the
-translation results by 0.73 BLEU points over the baseline RNNSearch model with
-a 4.84 BLEU score. The use of morphological segmentation before word
-vectorization to split the morphologically rich Tamil words into their
-respective morphemes before the translation caused a reduction in the target
-vocabulary size by a factor of 8. Also, this model (RNNMorph) improved the
-performance of neural machine translation by 7.05 BLEU points over the
-RNNSearch model used over the same corpus. Since the BLEU evaluation of the
-RNNMorph model might be unreliable due to an increase in the number of matching
-tokens per sentence, the performances of the translations were also compared by
-means of human evaluation metrics of adequacy, fluency and relative ranking.
-Further, the use of morphological segmentation also improved the efficacy of
-the attention mechanism.
-"
-3950,1612.02695,Jan Chorowski and Navdeep Jaitly,"Towards better decoding and language model integration in sequence to
-  sequence models",cs.NE cs.CL cs.LG stat.ML," The recently proposed Sequence-to-Sequence (seq2seq) framework advocates
-replacing complex data processing pipelines, such as an entire automatic speech
-recognition system, with a single neural network trained in an end-to-end
-fashion. In this contribution, we analyse an attention-based seq2seq speech
-recognition system that directly transcribes recordings into characters. We
-observe two shortcomings: overconfidence in its predictions and a tendency to
-produce incomplete transcriptions when language models are used. We propose
-practical solutions to both problems achieving competitive speaker independent
-word error rates on the Wall Street Journal dataset: without separate language
-models we reach 10.6% WER, while together with a trigram language model, we
-reach 6.7% WER.
-"
-3951,1612.02703,"Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci and
-  Roberto Navigli","Embedding Words and Senses Together via Joint Knowledge-Enhanced
-  Training",cs.CL," Word embeddings are widely used in Natural Language Processing, mainly due to
-their success in capturing semantic information from massive corpora. However,
-their creation process does not allow the different meanings of a word to be
-automatically separated, as it conflates them into a single vector. We address
-this issue by proposing a new model which learns word and sense embeddings
-jointly. Our model exploits large corpora and knowledge from semantic networks
-in order to produce a unified vector space of word and sense embeddings. We
-evaluate the main features of our approach both qualitatively and
-quantitatively in a variety of tasks, highlighting the advantages of the
-proposed method in comparison to state-of-the-art word- and sense-based models.
-"
-3952,1612.02706,Karl Stratos,Entity Identification as Multitasking,cs.CL," Standard approaches in entity identification hard-code boundary detection and
-type prediction into labels (e.g., John/B-PER Smith/I-PER) and then perform
-Viterbi. This has two disadvantages: 1. the runtime complexity grows
-quadratically in the number of types, and 2. there is no natural segment-level
-representation. In this paper, we propose a novel neural architecture that
-addresses these disadvantages. We frame the problem as multitasking, separating
-boundary detection and type prediction but optimizing them jointly. 
Despite its -simplicity, this architecture performs competitively with fully structured -models such as BiLSTM-CRFs while scaling linearly in the number of types. -Furthermore, by construction, the model induces type-disambiguating embeddings -of predicted mentions. -" -3953,1612.02741,"Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin",Coupling Distributed and Symbolic Execution for Natural Language Queries,cs.LG cs.AI cs.CL cs.NE cs.SE," Building neural networks to query a knowledge base (a table) with natural -language is an emerging research topic in deep learning. An executor for table -querying typically requires multiple steps of execution because queries may -have complicated structures. In previous studies, researchers have developed -either fully distributed executors or symbolic executors for table querying. A -distributed executor can be trained in an end-to-end fashion, but is weak in -terms of execution efficiency and explicit interpretability. A symbolic -executor is efficient in execution, but is very difficult to train especially -at initial stages. In this paper, we propose to couple distributed and symbolic -execution for natural language queries, where the symbolic executor is -pretrained with the distributed executor's intermediate execution results in a -step-by-step fashion. Experiments show that our approach significantly -outperforms both distributed and symbolic executors, exhibiting high accuracy, -high learning efficiency, high execution efficiency, and high interpretability. -" -3954,1612.02801,"Wenchao Du, Pascal Poupart, Wei Xu",Discovering Conversational Dependencies between Messages in Dialogs,cs.CL," We investigate the task of inferring conversational dependencies between -messages in one-on-one online chat, which has become one of the most popular -forms of customer service. We propose a novel probabilistic classifier that -leverages conversational, lexical and semantic information. The approach is -evaluated empirically on a set of customer service chat logs from a Chinese -e-commerce website. It outperforms heuristic baselines. -" -3955,1612.03205,"Peter Potash, Alexey Romanov, Anna Rumshisky","Evaluating Creative Language Generation: The Case of Rap Lyric - Ghostwriting",cs.CL," Language generation tasks that seek to mimic human ability to use language -creatively are difficult to evaluate, since one must consider creativity, -style, and other non-trivial aspects of the generated text. The goal of this -paper is to develop evaluation methods for one such task, ghostwriting of rap -lyrics, and to provide an explicit, quantifiable foundation for the goals and -future directions of this task. Ghostwriting must produce text that is similar -in style to the emulated artist, yet distinct in content. We develop a novel -evaluation methodology that addresses several complementary aspects of this -task, and illustrate how such evaluation can be used to meaningfully analyze -system performance. We provide a corpus of lyrics for 13 rap artists, annotated -for stylistic similarity, which allows us to assess the feasibility of manual -evaluation for generated verse. -" -3956,1612.03216,"Peter Potash, Alexey Romanov, Anna Rumshisky",#HashtagWars: Learning a Sense of Humor,cs.CL," In this work, we present a new dataset for computational humor, specifically -comparative humor ranking, which attempts to eschew the ubiquitous binary -approach to humor detection. The dataset consists of tweets that are humorous -responses to a given hashtag. 
We describe the motivation for this new dataset,
-as well as the collection process, which includes a description of our
-semi-automated system for data collection. We also present initial experiments
-for this dataset using both unsupervised and supervised approaches. Our best
-supervised system achieved 63.7% accuracy, suggesting that this task is much
-more difficult than comparable humor detection tasks. Initial experiments
-indicate that a character-level model is more suitable for this task than a
-token-level model, likely due to a large number of puns that can be captured by
-a character-level model.
-"
-3957,1612.03226,"Jiaji Huang, Rewon Child, Vinay Rao, Hairong Liu, Sanjeev Satheesh,
-  Adam Coates",Active Learning for Speech Recognition: the Power of Gradients,cs.CL cs.LG stat.ML," In training speech recognition systems, labeling audio clips can be
-expensive, and not all data is equally valuable. Active learning aims to label
-only the most informative samples to reduce cost. For speech recognition,
-confidence scores and other likelihood-based active learning methods have been
-shown to be effective. Gradient-based active learning methods, however, are
-still not well-understood. This work investigates the Expected Gradient Length
-(EGL) approach in active learning for end-to-end speech recognition. We justify
-EGL from a variance reduction perspective, and observe that EGL's measure of
-informativeness picks novel samples uncorrelated with confidence scores.
-Experimentally, we show that EGL can reduce word errors by 11\%, or
-alternatively, reduce the number of samples to label by 50\%, when compared to
-random sampling.
-"
-3958,1612.03231,"Yongjun Zhu, Erjia Yan, Il-Yeol Song","A natural language interface to a graph-based bibliographic information
-  retrieval system",cs.IR cs.CL," With the ever-increasing scientific literature, there is a need for a natural
-language interface to bibliographic information retrieval systems to retrieve
-related information effectively. In this paper, we propose a natural language
-interface, NLI-GIBIR, to a graph-based bibliographic information retrieval
-system. In designing NLI-GIBIR, we developed a novel framework that can be
-applicable to graph-based bibliographic information retrieval systems. Our
-framework integrates algorithms/heuristics for interpreting and analyzing
-natural language bibliographic queries. NLI-GIBIR allows users to search for a
-variety of bibliographic data through natural language. A series of text- and
-linguistic-based techniques are used to analyze and answer natural language
-queries, including tokenization, named entity recognition, and syntactic
-analysis. We find that our framework can effectively represent and address
-complex bibliographic information needs. Thus, the contributions of this paper
-are as follows: First, to our knowledge, it is the first attempt to propose a
-natural language interface to graph-based bibliographic information retrieval.
-Second, we propose a novel customized natural language processing framework
-that integrates a few original algorithms/heuristics for interpreting and
-analyzing natural language bibliographic queries. Third, we show that the
-proposed framework and natural language interface provide a practical solution
-in building real-world natural language interface-based bibliographic
-information retrieval systems. 
Our experimental results show that the presented
-system can correctly answer 39 out of 40 example natural language queries with
-varying lengths and complexities.
-"
-3959,1612.03266,"Matti Lankinen, Hannes Heikinheimo, Pyry Takala, Tapani Raiko and Juha
-  Karhunen",A Character-Word Compositional Neural Language Model for Finnish,cs.CL," Inspired by recent research, we explore ways to model the highly
-morphological Finnish language at the level of characters while maintaining the
-performance of word-level models. We propose a new
-Character-to-Word-to-Character (C2W2C) compositional language model that uses
-characters as input and output while still internally processing word level
-embeddings. Our preliminary experiments, using the Finnish Europarl V7 corpus,
-indicate that C2W2C can respond well to the challenges of morphologically rich
-languages such as high out of vocabulary rates, the prediction of novel words,
-and growing vocabulary size. Notably, the model is able to correctly score
-inflectional forms that are not present in the training data and sample
-grammatically and semantically correct Finnish sentences character by
-character.
-"
-3960,1612.03277,"Seyed-Mehdi-Reza Beheshti and Alireza Tabebordbar and Boualem
-  Benatallah and Reza Nouri",Data Curation APIs,cs.IR cs.CL," Understanding and analyzing big data is firmly recognized as a powerful and
-strategic priority. For deeper interpretation of and better intelligence with
-big data, it is important to transform raw data (unstructured, semi-structured
-and structured data sources, e.g., text, video, image data sets) into curated
-data: contextualized data and knowledge that is maintained and made available
-for use by end-users and applications. In particular, data curation acts as the
-glue between raw data and analytics, providing an abstraction layer that
-relieves users from time consuming, tedious and error prone curation tasks. In
-this context, the data curation process becomes a vital analytics asset for
-increasing added value and insights.
- In this paper, we identify and implement a set of curation APIs and make them
-available (on GitHub) to researchers and developers to assist them in
-transforming their raw data into curated data. The curation APIs enable
-developers to easily add features - such as extracting keywords, parts of
-speech, and named entities such as Persons, Locations, Organizations,
-Companies, Products, Diseases, Drugs, etc.; providing synonyms and stems for
-extracted information items leveraging lexical knowledge bases for the English
-language such as WordNet; linking extracted entities to external knowledge
-bases such as Google Knowledge Graph and Wikidata; discovering similarity among
-the extracted information items, such as calculating similarity between string,
-number, date and time data; classifying, sorting and categorizing data into
-various types, forms or any other distinct class; and indexing structured and
-unstructured data - into their applications.
-"
-3961,1612.03494,Vasileios Lampos,"Flu Detector: Estimating influenza-like illness rates from online
-  user-generated content",cs.AI cs.CL cs.SI," We provide a brief technical description of an online platform for disease
-monitoring, titled the Flu Detector (fludetector.cs.ucl.ac.uk). Flu
-Detector, in its current version (v.0.5), uses either Twitter or Google search
-data in conjunction with statistical Natural Language Processing models to
-estimate the rate of influenza-like illness in the population of England. 
Its
-back-end is a live service that collects online data, utilises modern
-technologies for large-scale text processing, and finally applies statistical
-inference models that are trained offline. The front-end visualises the various
-disease rate estimates. Notably, the models based on Google data achieve a high
-level of accuracy with respect to the most recent four flu seasons in England
-(2012/13 to 2015/16). This highlighted Flu Detector as having great potential
-to become a complementary source to the traditional domestic flu surveillance
-schemes.
-"
-3962,1612.03551,"Xun Wang, Katsuhito Sudoh, Masaaki Nagata, Tomohide Shibata, Daisuke
-  Kawahara and Sadao Kurohashi",Reading Comprehension using Entity-based Memory Network,cs.CL cs.AI," This paper introduces a novel neural network model for question answering,
-the \emph{entity-based memory network}. It enhances neural networks' ability to
-represent and calculate information over a long period by keeping records
-of entities contained in text. The core component is a memory pool which
-comprises entities' states. These entities' states are continuously updated
-according to the input text. Questions with regard to the input text are used
-to search the memory pool for related entities and answers are further
-predicted based on the states of retrieved entities. Compared with previous
-memory network models, the proposed model is capable of handling fine-grained
-information and more sophisticated relations based on entities. We formulated
-several different tasks as question answering problems and tested the proposed
-model. Experiments reported satisfactory results.
-"
-3963,1612.03597,"Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis",Search Personalization with Embeddings,cs.IR cs.CL," Recent research has shown that the performance of search personalization
-depends on the richness of user profiles which normally represent the user's
-topical interests. In this paper, we propose a new embedding approach to
-learning user profiles, where users are embedded in a topical interest space.
-We then directly utilize the user profiles for search personalization.
-Experiments on query logs from a major commercial web search engine demonstrate
-that our embedding approach improves the performance of the search engine and
-also achieves better search performance than other strong baselines.
-"
-3964,1612.03628,"Marc Bola\~nos, \'Alvaro Peris, Francisco Casacuberta, Petia Radeva","VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question
-  Answering",cs.CV cs.CL," In this paper, we address the problem of visual question answering by
-proposing a novel model, called VIBIKNet. Our model is based on integrating
-Kernelized Convolutional Neural Networks and Long-Short Term Memory units to
-generate an answer given a question about an image. We prove that VIBIKNet is
-an optimal trade-off between accuracy and computational load, in terms of
-memory and time consumption. We validate our method on the VQA challenge
-dataset and compare it to the top performing methods in order to illustrate its
-performance and speed.
-"
-3965,1612.03651,"Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze,
-  H\'erve J\'egou, Tomas Mikolov",FastText.zip: Compressing text classification models,cs.CL cs.LG," We consider the problem of producing compact architectures for text
-classification, such that the full model fits in a limited amount of memory. 
-After considering different solutions inspired by the hashing literature, we
-propose a method built upon product quantization to store word embeddings.
-While the original technique leads to a loss in accuracy, we adapt this method
-to circumvent quantization artefacts. Our experiments carried out on several
-benchmarks show that our approach typically requires two orders of magnitude
-less memory than fastText while being only slightly inferior with respect to
-accuracy. As a result, it outperforms the state of the art by a good margin in
-terms of the compromise between memory usage and accuracy.
-"
-3966,1612.03659,"Iris Hendrickx, Louis Onrust, Florian Kunneman, Ali
-  H\""urriyeto\u{g}lu, Antal van den Bosch, Wessel Stoop",Unraveling reported dreams with text analytics,cs.CL," We investigate what distinguishes reported dreams from other personal
-narratives. The continuity hypothesis, stemming from psychological dream
-analysis work, states that most dreams refer to a person's daily life and
-personal concerns, similar to other personal narratives such as diary entries.
-Differences between the two texts may reveal the linguistic markers of dream
-text, which could be the basis for new dream analysis work and for the
-automatic detection of dream descriptions. We used three text analytics
-methods: text classification, topic modeling, and text coherence analysis, and
-applied these methods to a balanced set of texts representing dreams, diary
-entries, and other personal stories. We observed that dream texts could be
-distinguished from other personal narratives nearly perfectly, mostly based on
-the presence of uncertainty markers and descriptions of scenes. Important
-markers for non-dream narratives are specific time expressions and
-conversational expressions. Dream texts also exhibit a lower discourse
-coherence than other personal narratives.
-"
-3967,1612.03762,"Carlo Combi, Margherita Zorzi, Gabriele Pozzani, Ugo Moretti","From narrative descriptions to MedDRA: automagically encoding adverse
-  drug reactions",cs.CL," The collection of narrative spontaneous reports is an irreplaceable source
-for the prompt detection of suspected adverse drug reactions (ADRs): qualified
-domain experts manually revise a huge amount of narrative descriptions and then
-encode texts according to MedDRA standard terminology. The manual annotation of
-narrative documents with medical terminology is a subtle and expensive task,
-since the number of reports is growing day by day. MagiCoder, a Natural
-Language Processing algorithm, is proposed for the automatic encoding of
-free-text descriptions into MedDRA terms. The MagiCoder procedure is efficient
-in terms of computational complexity (in particular, it is linear in the size
-of the narrative input and the terminology). We tested it on a large dataset of
-about 4500 manually revised reports, by performing an automated comparison
-between human and MagiCoder revisions. For the current base version of
-MagiCoder, we measured: on short descriptions, an average recall of $86\%$ and
-an average precision of $88\%$; on medium-long descriptions (up to 255
-characters), an average recall of $64\%$ and an average precision of $63\%$.
-From a practical point of view, MagiCoder reduces the time required for
-encoding ADR reports. Pharmacologists simply have to review and validate the
-MagiCoder terms proposed by the application, instead of choosing the right
-terms among the 70K low level terms of MedDRA. 
Such improvement in the
-efficiency of pharmacologists' work also has a relevant impact on the quality
-of the subsequent data analysis. We developed MagiCoder for the Italian
-pharmacovigilance language. However, our proposal is based on a general
-approach, depending neither on the considered language nor on the term dictionary.
-"
-3968,1612.03769,"Yushi Yao, Guangjian Li",Context-aware Sentiment Word Identification: sentiword2vec,cs.CL cs.AI," Traditional sentiment analysis often uses a sentiment dictionary to extract
-sentiment information from text and classify documents. However, emerging
-informal words and phrases in user-generated content call for context-aware
-analysis, since such words usually have special meanings in a particular context.
-Because of their strong performance in representing inter-word relations, we use
-sentiment word vectors to identify these special words. Building on the distributed
-language model word2vec, in this paper we present a novel method for
-representing the sentiment of a word in a particular context; specifically, we
-identify words with abnormal sentiment polarity in long answers. Results
-show that the improved model performs better at representing words
-with special meanings, while still doing well at representing special idiomatic
-patterns. Finally, we discuss what the vectors represent in the
-field of sentiment, which may differ from general object-based
-conditions.
-"
-3969,1612.03791,Felix Stahlberg and Adri\`a de Gispert and Eva Hasler and Bill Byrne,"Neural Machine Translation by Minimising the Bayes-risk with Respect to
- Syntactic Translation Lattices",cs.CL," We present a novel scheme to combine neural machine translation (NMT) with
-traditional statistical machine translation (SMT). Our approach borrows ideas
-from linearised lattice minimum Bayes-risk decoding for SMT. The NMT score is
-combined with the Bayes-risk of the translation according to the SMT lattice. This
-makes our approach much more flexible than $n$-best list or lattice rescoring
-as the neural decoder is not restricted to the SMT search space. We show an
-efficient and simple way to integrate risk estimation into the NMT decoder
-which is suitable for word-level as well as subword-unit-level NMT. We test our
-method on English-German and Japanese-English and report significant gains over
-lattice rescoring on several data sets for both single and ensembled NMT. The
-MBR decoder produces entirely new hypotheses far beyond simply rescoring the
-SMT search space or fixing UNKs in the NMT output.
-"
-3970,1612.03929,"Nabiha Asghar, Pascal Poupart, Xin Jiang, Hang Li",Deep Active Learning for Dialogue Generation,cs.CL cs.AI cs.NE," We propose an online, end-to-end, neural generative conversational model for
-open-domain dialogue. It is trained using a unique combination of offline
-two-phase supervised learning and online human-in-the-loop active learning.
-While most existing research proposes offline supervision or hand-crafted
-reward functions for online reinforcement, we devise a novel interactive
-learning mechanism based on hamming-diverse beam search for response generation
-and one-character user-feedback at each step. Experiments show that our model
-inherently promotes the generation of semantically relevant and interesting
-responses, and can be used to train agents with customized personas, moods and
-conversational styles.
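The hamming-diverse beam search mentioned in this abstract can be approximated with a simple per-step re-scoring pass; the following minimal Python sketch is illustrative only (the penalty form and all names are assumptions, not the authors' implementation):

    def hamming_diverse_rescore(candidate_logprobs, other_beam_tokens, penalty=0.5):
        """Penalise candidates already emitted by sibling beams, so that
        beams are pushed apart in Hamming distance."""
        return {token: logprob - penalty * other_beam_tokens.count(token)
                for token, logprob in candidate_logprobs.items()}

    # Toy decoding step: "yes" was chosen twice by sibling beams, so it drops.
    step_scores = {"yes": -0.1, "maybe": -0.7, "no": -1.2}
    print(hamming_diverse_rescore(step_scores, ["yes", "yes", "no"]))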
-" -3971,1612.03969,"Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes and Yann - LeCun",Tracking the World State with Recurrent Entity Networks,cs.CL," We introduce a new model, the Recurrent Entity Network (EntNet). It is -equipped with a dynamic long-term memory which allows it to maintain and update -a representation of the state of the world as it receives new data. For -language understanding tasks, it can reason on-the-fly as it reads text, not -just when it is required to answer a question or respond as is the case for a -Memory Network (Sukhbaatar et al., 2015). Like a Neural Turing Machine or -Differentiable Neural Computer (Graves et al., 2014; 2016) it maintains a fixed -size memory and can learn to perform location and content-based read and write -operations. However, unlike those models it has a simple parallel architecture -in which several memory locations can be updated simultaneously. The EntNet -sets a new state-of-the-art on the bAbI tasks, and is the first method to solve -all the tasks in the 10k training examples setting. We also demonstrate that it -can solve a reasoning task which requires a large number of supporting facts, -which other methods are not able to solve, and can generalize past its training -horizon. It can also be practically used on large scale datasets such as -Children's Book Test, where it obtains competitive performance, reading the -story in a single pass. -" -3972,1612.03975,"Robyn Speer, Joshua Chin and Catherine Havasi",ConceptNet 5.5: An Open Multilingual Graph of General Knowledge,cs.CL," Machine learning about language can be improved by supplying it with specific -knowledge and sources of external information. We present here a new version of -the linked open data resource ConceptNet that is particularly well suited to be -used with modern NLP techniques such as word embeddings. - ConceptNet is a knowledge graph that connects words and phrases of natural -language with labeled edges. Its knowledge is collected from many sources that -include expert-created resources, crowd-sourcing, and games with a purpose. It -is designed to represent the general knowledge involved in understanding -language, improving natural language applications by allowing the application -to better understand the meanings behind the words people use. - When ConceptNet is combined with word embeddings acquired from distributional -semantics (such as word2vec), it provides applications with understanding that -they would not acquire from distributional semantics alone, nor from narrower -resources such as WordNet or DBPedia. We demonstrate this with state-of-the-art -results on intrinsic evaluations of word relatedness that translate into -improvements on applications of word vectors, including solving SAT-style -analogies. -" -3973,1612.03990,"Xiang Kong, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel","Evaluating Automatic Speech Recognition Systems in Comparison With Human - Perception Results Using Distinctive Feature Measures",cs.CL," This paper describes methods for evaluating automatic speech recognition -(ASR) systems in comparison with human perception results, using measures -derived from linguistic distinctive features. Error patterns in terms of -manner, place and voicing are presented, along with an examination of confusion -matrices via a distinctive-feature-distance metric. 
These evaluation methods
-contrast with conventional performance criteria that focus on the phone or word
-level, and are intended to provide a more detailed profile of ASR system
-performance, as well as a means for direct comparison with human perception
-results at the sub-phonemic level.
-"
-3974,1612.03991,"Xiang Kong, Preethi Jyothi, Mark Hasegawa-Johnson","Performance Improvements of Probabilistic Transcript-adapted ASR with
- Recurrent Neural Network and Language-specific Constraints",cs.CL," Mismatched transcriptions have been proposed as a means to acquire
-probabilistic transcriptions from non-native speakers of a language. Prior work
-has demonstrated the value of these transcriptions by successfully adapting
-cross-lingual ASR systems for different target languages. In this work, we
-describe two techniques to refine these probabilistic transcriptions: a
-noisy-channel model of non-native phone misperception is trained using a
-recurrent neural network, and decoded using minimally-resourced
-language-dependent pronunciation constraints. Both innovations improve the quality
-of the transcript, and both reduce the phone error rate of a
-trained ASR, by 7% and 9%, respectively.
-"
-3975,1612.04061,"Aditya Singh, Saurabh Saini, Rajvi Shah, PJ Narayanan",Learning to Hash-tag Videos with Tag2Vec,cs.CV cs.CL," User-given tags or labels are valuable resources for semantic understanding
-of visual media such as images and videos. Recently, a new type of labeling
-mechanism known as hash-tags has become increasingly popular on social media
-sites. In this paper, we study the problem of generating relevant and useful
-hash-tags for short video clips. Traditional data-driven approaches for tag
-enrichment and recommendation use direct visual similarity for label transfer
-and propagation. We attempt to learn a direct low-cost mapping from video to
-hash-tags using a two-step training process. We first employ a natural language
-processing (NLP) technique, skip-gram models with neural network training, to
-learn a low-dimensional vector representation of hash-tags (Tag2Vec) using a
-corpus of 10 million hash-tags. We then train an embedding function to map
-video features to the low-dimensional Tag2Vec space. We learn this embedding
-for 29 categories of short video clips with hash-tags. A query video without
-any tag-information can then be directly mapped to the vector space of tags
-using the learned embedding, and relevant tags can be found by performing a
-simple nearest-neighbor retrieval in the Tag2Vec space. We validate the
-relevance of the tags suggested by our system qualitatively and quantitatively
-with a user study.
-"
-3976,1612.04113,Gustavo Henrique Paetzold and Lucia Specia,Vicinity-Driven Paragraph and Sentence Alignment for Comparable Corpora,cs.CL," Parallel corpora have driven great progress in the field of Text
-Simplification. However, most sentence alignment algorithms either support a
-limited range of alignment types, or simply ignore valuable clues
-present in comparable documents. We address this problem by introducing a new
-set of flexible vicinity-driven paragraph and sentence alignment algorithms
-that support 1-N, N-1, N-N and long-distance null alignments without the need for
-hard-to-replicate supervised models.
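To make the vicinity idea concrete, here is a minimal 1-1 alignment sketch in Python; it only illustrates restricting the search to a window around the last aligned position (the actual algorithms also handle 1-N, N-1, N-N and null alignments, and this interface is hypothetical):

    def align_sentences(src, tgt, sim, window=3, threshold=0.5):
        """Greedy 1-1 alignment that searches the target only in the
        vicinity of the previously aligned position."""
        alignments, last = [], 0
        for i, s in enumerate(src):
            lo, hi = max(0, last - window), min(len(tgt), last + window + 1)
            j, best = max(((j, sim(s, tgt[j])) for j in range(lo, hi)),
                          key=lambda pair: pair[1])
            if best >= threshold:        # below threshold: null alignment
                alignments.append((i, j))
                last = j
        return alignments

    overlap = lambda a, b: len(set(a.split()) & set(b.split())) / max(len(a.split()), 1)
    print(align_sentences(["the cat sat", "it slept"],
                          ["the cat sat down", "later", "it slept well"], overlap))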
-" -3977,1612.04118,Philipp Meerkamp (Bloomberg LP) and Zhengyi Zhou (AT&T Labs Research),"Information Extraction with Character-level Neural Networks and Free - Noisy Supervision",cs.CL cs.IR cs.LG," We present an architecture for information extraction from text that augments -an existing parser with a character-level neural network. The network is -trained using a measure of consistency of extracted data with existing -databases as a form of noisy supervision. Our architecture combines the ability -of constraint-based information extraction systems to easily incorporate domain -knowledge and constraints with the ability of deep neural networks to leverage -large amounts of data to learn complex features. Boosting the existing parser's -precision, the system led to large improvements over a mature and highly tuned -constraint-based production information extraction system used at Bloomberg for -financial language text. -" -3978,1612.04174,Bruno Nicenboim and Shravan Vasishth,"Models of retrieval in sentence comprehension: A computational - evaluation using Bayesian hierarchical modeling",cs.CL stat.AP stat.ML," Research on interference has provided evidence that the formation of -dependencies between non-adjacent words relies on a cue-based retrieval -mechanism. Two different models can account for one of the main predictions of -interference, i.e., a slowdown at a retrieval site, when several items share a -feature associated with a retrieval cue: Lewis and Vasishth's (2005) -activation-based model and McElree's (2000) direct access model. Even though -these two models have been used almost interchangeably, they are based on -different assumptions and predict differences in the relationship between -reading times and response accuracy. The activation-based model follows the -assumptions of ACT-R, and its retrieval process behaves as a lognormal race -between accumulators of evidence with a single variance. Under this model, -accuracy of the retrieval is determined by the winner of the race and retrieval -time by its rate of accumulation. In contrast, the direct access model assumes -a model of memory where only the probability of retrieval varies between items; -in this model, differences in latencies are a by-product of the possibility and -repairing incorrect retrievals. We implemented both models in a Bayesian -hierarchical framework in order to evaluate them and compare them. We show that -some aspects of the data are better fit under the direct access model than -under the activation-based model. We suggest that this finding does not rule -out the possibility that retrieval may be behaving as a race model with -assumptions that follow less closely the ones from the ACT-R framework. We show -that by introducing a modification of the activation model, i.e, by assuming -that the accumulation of evidence for retrieval of incorrect items is not only -slower but noisier (i.e., different variances for the correct and incorrect -items), the model can provide a fit as good as the one of the direct access -model. -" -3979,1612.04211,"Zhiguo Wang, Haitao Mi, Wael Hamza and Radu Florian",Multi-Perspective Context Matching for Machine Comprehension,cs.CL," Previous machine comprehension (MC) datasets are either too small to train -end-to-end deep learning models, or not difficult enough to evaluate the -ability of current MC techniques. The newly released SQuAD dataset alleviates -these limitations, and gives us a chance to develop more realistic MC models. 
-Based on this dataset, we propose a Multi-Perspective Context Matching (MPCM)
-model, which is an end-to-end system that directly predicts the answer
-beginning and ending points in a passage. Our model first adjusts each
-word-embedding vector in the passage by multiplying a relevancy weight computed
-against the question. Then, we encode the question and weighted passage by
-using bi-directional LSTMs. For each point in the passage, our model matches
-the context of this point against the encoded question from multiple
-perspectives and produces a matching vector. Given those matched vectors, we
-employ another bi-directional LSTM to aggregate all the information and predict
-the beginning and ending points. Experimental results on the test set of SQuAD
-show that our model achieves a competitive result on the leaderboard.
-"
-3980,1612.04342,Radu Soricut and Nan Ding,"Building Large Machine Reading-Comprehension Datasets using Paragraph
- Vectors",cs.CL," We present a dual contribution to the task of machine reading-comprehension:
-a technique for creating large-sized machine-comprehension (MC) datasets using
-paragraph-vector models; and a novel, hybrid neural-network architecture that
-combines the representation power of recurrent neural networks with the
-discriminative power of fully-connected multi-layered networks. We use the
-MC-dataset generation technique to build a dataset of around 2 million
-examples, for which we empirically determine the high-ceiling of human
-performance (around 91% accuracy), as well as the performance of a variety of
-computer models. Among all the models we have experimented with, our hybrid
-neural-network architecture achieves the highest performance (83.2% accuracy).
-The remaining gap to the human-performance ceiling provides enough room for
-future model improvements.
-"
-3981,1612.04403,Mason Bretan,"You Are What You Eat... Listen to, Watch, and Read",cs.SI cs.CL cs.IR," This article describes a data-driven method for deriving the relationship
-between personality and media preferences. A quantifiable representation of
-such a relationship can be leveraged for use in recommendation systems and
-to ameliorate the ""cold start"" problem. Here, the data consists of an original
-collection of 1,316 Okcupid dating profiles. Of these profiles, 800 are labeled
-with one of 16 possible Myers-Briggs Type Indicators (MBTI). A personality-specific
-topic model describing a person's favorite books, movies, shows,
-music, and food was generated using latent Dirichlet allocation (LDA). There
-were several significant findings, for example, intuitive thinking types
-preferred sci-fi/fantasy entertainment, extraversion correlated positively with
-upbeat dance music, and jazz, folk, and international cuisine correlated
-positively with those characterized by openness to experience. Many other
-correlations confirmed previous findings describing the relationship among
-personality, writing style, and personal preferences. (For complete
-word/personality type associations, see the Appendix).
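For readers unfamiliar with LDA, a topic model of the kind used above can be fit in a few lines with gensim; the toy documents below merely stand in for tokenized profile text (the real study used 1,316 profiles):

    from gensim import corpora, models

    # Toy stand-ins for tokenized "favorites" sections of dating profiles.
    docs = [["jazz", "folk", "international", "cuisine"],
            ["sci-fi", "fantasy", "novels", "films"],
            ["dance", "pop", "clubbing", "upbeat"],
            ["jazz", "folk", "films", "cuisine"]]
    dictionary = corpora.Dictionary(docs)
    bow = [dictionary.doc2bow(d) for d in docs]
    lda = models.LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)
    for topic_id, terms in lda.print_topics():
        print(topic_id, terms)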
-" -3982,1612.04418,"Alexey Drutsa (Yandex, Moscow, Russia), Andrey Shutovich (Yandex, - Moscow, Russia), Philipp Pushnyakov (Yandex, Moscow, Russia), Evgeniy - Krokhalyov (Yandex, Moscow, Russia), Gleb Gusev (Yandex, Moscow, Russia), - Pavel Serdyukov (Yandex, Moscow, Russia)",User Model-Based Intent-Aware Metrics for Multilingual Search Evaluation,cs.IR cs.CL cs.HC cs.LG stat.ML," Despite the growing importance of multilingual aspect of web search, no -appropriate offline metrics to evaluate its quality are proposed so far. At the -same time, personal language preferences can be regarded as intents of a query. -This approach translates the multilingual search problem into a particular task -of search diversification. Furthermore, the standard intent-aware approach -could be adopted to build a diversified metric for multilingual search on the -basis of a classical IR metric such as ERR. The intent-aware approach estimates -user satisfaction under a user behavior model. We show however that the -underlying user behavior models is not realistic in the multilingual case, and -the produced intent-aware metric do not appropriately estimate the user -satisfaction. We develop a novel approach to build intent-aware user behavior -models, which overcome these limitations and convert to quality metrics that -better correlate with standard online metrics of user satisfaction. -" -3983,1612.04426,"Edouard Grave, Armand Joulin, Nicolas Usunier",Improving Neural Language Models with a Continuous Cache,cs.CL cs.LG," We propose an extension to neural network language models to adapt their -prediction to the recent history. Our model is a simplified version of memory -augmented networks, which stores past hidden activations as memory and accesses -them through a dot product with the current hidden activation. This mechanism -is very efficient and scales to very large memory sizes. We also draw a link -between the use of external memory in neural network and cache models used with -count based language models. We demonstrate on several language model datasets -that our approach performs significantly better than recent memory augmented -networks. -" -3984,1612.04460,"Vered Shwartz, Enrico Santus, and Dominik Schlechtweg","Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy - Detection",cs.CL," The fundamental role of hypernymy in NLP has motivated the development of -many methods for the automatic identification of this relation, most of which -rely on word distribution. We investigate an extensive number of such -unsupervised measures, using several distributional semantic models that differ -by context type and feature weighting. We analyze the performance of the -different methods based on their linguistic motivation. Comparison to the -state-of-the-art supervised methods shows that while supervised methods -generally outperform the unsupervised ones, the former are sensitive to the -distribution of training instances, hurting their reliability. Being based on -general linguistic hypotheses and independent from training data, unsupervised -measures are more robust, and therefore are still useful artillery for -hypernymy detection. -" -3985,1612.04499,"Hu Xu, Lei Shu, Jingyuan Zhang, Philip S. 
Yu","Mining Compatible/Incompatible Entities from Question and Answering via - Yes/No Answer Classification using Distant Label Expansion",cs.CL," Product Community Question Answering (PCQA) provides useful information about -products and their features (aspects) that may not be well addressed by product -descriptions and reviews. We observe that a product's compatibility issues with -other products are frequently discussed in PCQA and such issues are more -frequently addressed in accessories, i.e., via a yes/no question ""Does this -mouse work with windows 10?"". In this paper, we address the problem of -extracting compatible and incompatible products from yes/no questions in PCQA. -This problem can naturally have a two-stage framework: first, we perform -Complementary Entity (product) Recognition (CER) on yes/no questions; second, -we identify the polarities of yes/no answers to assign the complementary -entities a compatibility label (compatible, incompatible or unknown). We -leverage an existing unsupervised method for the first stage and a 3-class -classifier by combining a distant PU-learning method (learning from positive -and unlabeled examples) together with a binary classifier for the second stage. -The benefit of using distant PU-learning is that it can help to expand more -implicit yes/no answers without using any human annotated data. We conduct -experiments on 4 products to show that the proposed method is effective. -" -3986,1612.04538,Gayatri Bhat and Monojit Choudhury and Kalika Bali,"Grammatical Constraints on Intra-sentential Code-Switching: From - Theories to Working Models",cs.CL," We make one of the first attempts to build working models for -intra-sentential code-switching based on the Equivalence-Constraint (Poplack -1980) and Matrix-Language (Myers-Scotton 1993) theories. We conduct a detailed -theoretical analysis, and a small-scale empirical study of the two models for -Hindi-English CS. Our analyses show that the models are neither sound nor -complete. Taking insights from the errors made by the models, we propose a new -model that combines features of both the theories. -" -3987,1612.04609,"Ruobing Xie, Zhiyuan Liu, Rui Yan, Maosong Sun",Neural Emoji Recommendation in Dialogue Systems,cs.CL," Emoji is an essential component in dialogues which has been broadly utilized -on almost all social platforms. It could express more delicate feelings beyond -plain texts and thus smooth the communications between users, making dialogue -systems more anthropomorphic and vivid. In this paper, we focus on -automatically recommending appropriate emojis given the contextual information -in multi-turn dialogue systems, where the challenges locate in understanding -the whole conversations. More specifically, we propose the hierarchical long -short-term memory model (H-LSTM) to construct dialogue representations, -followed by a softmax classifier for emoji classification. We evaluate our -models on the task of emoji classification in a real-world dataset, with some -further explorations on parameter sensitivity and case study. Experimental -results demonstrate that our method achieves the best performances on all -evaluation metrics. It indicates that our method could well capture the -contextual information and emotion flow in dialogues, which is significant for -emoji recommendation. -" -3988,1612.04629,Rico Sennrich,"How Grammatical is Character-level Neural Machine Translation? 
Assessing
- MT Quality with Contrastive Translation Pairs",cs.CL," Analysing translation quality with regard to specific linguistic phenomena has
-historically been difficult and time-consuming. Neural machine translation has
-the attractive property that it can produce scores for arbitrary translations,
-and we propose a novel method to assess how well NMT systems model specific
-linguistic phenomena such as agreement over long distances, the production of
-novel words, and the faithful translation of polarity. The core idea is that we
-measure whether a reference translation is more probable under an NMT model than
-a contrastive translation which introduces a specific type of error. We present
-LingEval97, a large-scale data set of 97000 contrastive translation pairs based
-on the WMT English->German translation task, with errors automatically created
-with simple rules. We report results for a number of systems, and find that
-recently introduced character-level NMT systems perform better at
-transliteration than models with byte-pair encoding (BPE) segmentation, but
-perform more poorly at morphosyntactic agreement, and translating discontiguous
-units of meaning.
-"
-3989,1612.04675,"Peidong Wang, Zhongqiu Wang, Deliang Wang",Recurrent Deep Stacking Networks for Speech Recognition,cs.CL cs.SD," This paper presents our work on applying Recurrent Deep Stacking Networks
-(RDSNs) to Robust Automatic Speech Recognition (ASR) tasks. In the paper, we
-also propose a more efficient yet comparable substitute for RDSNs, the Bi-Pass
-Stacking Network (BPSN). The main idea of these two models is to add
-phoneme-level information into acoustic models, transforming an acoustic model
-into the combination of an acoustic model and a phoneme-level N-gram model.
-Experiments showed that RDSNs and BPSNs can substantially improve
-performance over conventional DNNs.
-"
-3990,1612.04683,"Mauro Cettolo, Mara Chinea Rios, Roldano Cattoni","Unsupervised Clustering of Commercial Domains for Adaptive Machine
- Translation",cs.CL," In this paper, we report on domain clustering in the ambit of an adaptive MT
-architecture. A standard bottom-up hierarchical clustering algorithm has been
-instantiated with five different distances, which have been compared, on an MT
-benchmark built on 40 commercial domains, in terms of dendrograms, intrinsic
-and extrinsic evaluations. The main outcome is that the most expensive distance
-is also the only one able to allow the MT engine to guarantee good performance
-even with few, but highly populated clusters of domains.
-"
-3991,1612.04732,Radu Soricut and Nan Ding,Multilingual Word Embeddings using Multigraphs,cs.CL," We present a family of neural-network--inspired models for computing
-continuous word representations, specifically designed to exploit both
-monolingual and multilingual text. This framework allows us to perform
-unsupervised training of embeddings that exhibit higher accuracy on syntactic
-and semantic compositionality, as well as multilingual semantic similarity,
-compared to previous models trained in an unsupervised fashion. We also show
-that such multilingual embeddings, optimized for semantic similarity, can
-improve the performance of statistical machine translation with respect to how
-it handles words not present in the parallel data.
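The practical payoff of such shared multilingual spaces is that cross-lingual semantic similarity reduces to vector geometry; a minimal sketch (with made-up vectors; real embeddings would come from a trained model):

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Hypothetical 3-d vectors assumed to live in one shared multilingual space.
    vec = {"dog": np.array([0.8, 0.1, 0.3]),
           "chien": np.array([0.7, 0.2, 0.3]),
           "car": np.array([0.0, 0.9, 0.4])}
    print(cosine(vec["dog"], vec["chien"]))  # translation pair: high similarity
    print(cosine(vec["dog"], vec["car"]))    # unrelated pair: lower similarity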
-" -3992,1612.04744,"Peidong Wang, Deliang Wang",Incorporating Language Level Information into Acoustic Models,cs.CL cs.LG cs.SD," This paper proposed a class of novel Deep Recurrent Neural Networks which can -incorporate language-level information into acoustic models. For simplicity, we -named these networks Recurrent Deep Language Networks (RDLNs). Multiple -variants of RDLNs were considered, including two kinds of context information, -two methods to process the context, and two methods to incorporate the -language-level information. RDLNs provided possible methods to fine-tune the -whole Automatic Speech Recognition (ASR) system in the acoustic modeling -process. -" -3993,1612.04757,"Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, - Trevor Darrell, Marcus Rohrbach","Attentive Explanations: Justifying Decisions and Pointing to the - Evidence",cs.CV cs.AI cs.CL," Deep models are the defacto standard in visual decision models due to their -impressive performance on a wide array of visual tasks. However, they are -frequently seen as opaque and are unable to explain their decisions. In -contrast, humans can justify their decisions with natural language and point to -the evidence in the visual world which led to their decisions. We postulate -that deep models can do this as well and propose our Pointing and Justification -(PJ-X) model which can justify its decision with a sentence and point to the -evidence by introspecting its decision and explanation process using an -attention mechanism. Unfortunately there is no dataset available with reference -explanations for visual decision making. We thus collect two datasets in two -domains where it is interesting and challenging to explain decisions. First, we -extend the visual question answering task to not only provide an answer but -also a natural language explanation for the answer. Second, we focus on -explaining human activities which is traditionally more challenging than object -classification. We extensively evaluate our PJ-X model, both on the -justification and pointing tasks, by comparing it to prior models and ablations -using both automatic and human evaluations. -" -3994,1612.04765,Uwe D. Reichel,"CoPaSul Manual -- Contour-based parametric and superpositional - intonation stylization",cs.CL," The purposes of the CoPaSul toolkit are (1) automatic prosodic annotation and -(2) prosodic feature extraction from syllable to utterance level. CoPaSul -stands for contour-based, parametric, superpositional intonation stylization. -In this framework intonation is represented as a superposition of global and -local contours that are described parametrically in terms of polynomial -coefficients. On the global level (usually associated but not necessarily -restricted to intonation phrases) the stylization serves to represent register -in terms of time-varying F0 level and range. On the local level (e.g. accent -groups), local contour shapes are described. From this parameterization several -features related to prosodic boundaries and prominence can be derived. -Furthermore, by coefficient clustering prosodic contour classes can be obtained -in a bottom-up way. Next to the stylization-based feature extraction also -standard F0 and energy measures (e.g. mean and variance) as well as rhythmic -aspects can be calculated. At the current state automatic annotation comprises: -segmentation into interpausal chunks, syllable nucleus extraction, and -unsupervised localization of prosodic phrase boundaries and prominent -syllables. 
F0 and partly also energy feature sets can be derived for: standard
-measurements (such as median and IQR), register in terms of F0 level and range,
-prosodic boundaries, local contour shapes, bottom-up derived contour classes,
-Gestalt of accent groups in terms of their deviation from higher level prosodic
-units, as well as for rhythmic aspects quantifying the relation between F0 and
-energy contours and prosodic event rates.
-"
-3995,1612.04868,"I. Lopez-Gazpio and M. Maritxalar and A. Gonzalez-Agirre and G. Rigau
- and L. Uria and E. Agirre","Interpretable Semantic Textual Similarity: Finding and explaining
- differences between sentences",cs.CL cs.AI cs.LG," User acceptance of artificial intelligence agents might depend on their
-ability to explain their reasoning, which requires adding an interpretability
-layer that facilitates users' understanding of their behavior. This paper focuses
-on adding an interpretable layer on top of Semantic Textual Similarity (STS),
-which measures the degree of semantic equivalence between two sentences. The
-interpretability layer is formalized as the alignment between pairs of segments
-across the two sentences, where the relation between the segments is labeled
-with a relation type and a similarity score. We present a publicly available
-dataset of sentence pairs annotated following the formalization. We then
-develop a system trained on this dataset which, given a sentence pair, explains
-what is similar and different, in the form of graded and typed segment
-alignments. When evaluated on the dataset, the system performs better than an
-informed baseline, showing that the dataset and task are well-defined and
-feasible. Most importantly, two user studies show how the system output can be
-used to automatically produce explanations in natural language. Users performed
-better when having access to the explanations, providing preliminary evidence
-that our dataset and method to automatically produce explanations are useful in
-real applications.
-"
-3996,1612.04936,"Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato,
- Jason Weston",Learning through Dialogue Interactions by Asking Questions,cs.CL cs.AI," A good dialogue agent should have the ability to interact with users by both
-responding to questions and by asking questions, and importantly to learn from
-both types of interaction. In this work, we explore this direction by designing
-a simulator and a set of synthetic tasks in the movie domain that allow such
-interactions between a learner and a teacher. We investigate how a learner can
-benefit from asking questions in both offline and online reinforcement learning
-settings, and demonstrate that the learner improves when asking questions.
-Finally, real experiments with Mechanical Turk validate the approach. Our work
-represents a first step in developing such end-to-end learned interactive
-dialogue agents.
-"
-3997,1612.04949,"Hao Liu, Yang Yang, Fumin Shen, Lixin Duan and Heng Tao Shen","Recurrent Image Captioner: Describing Images with Spatial-Invariant
- Transformation and Attention Filtering",cs.CV cs.CL," Along with the success of recurrent neural networks in modelling sequential
-data and the power of attention mechanisms in automatically identifying salient
-information, image captioning, a.k.a. image description, has been remarkably
-advanced in recent years.
Nonetheless, most existing paradigms may suffer from -the deficiency of invariance to images with different scaling, rotation, etc.; -and effective integration of standalone attention to form a holistic end-to-end -system. In this paper, we propose a novel image captioning architecture, termed -Recurrent Image Captioner (\textbf{RIC}), which allows visual encoder and -language decoder to coherently cooperate in a recurrent manner. Specifically, -we first equip CNN-based visual encoder with a differentiable layer to enable -spatially invariant transformation of visual signals. Moreover, we deploy an -attention filter module (differentiable) between encoder and decoder to -dynamically determine salient visual parts. We also employ bidirectional LSTM -to preprocess sentences for generating better textual representations. Besides, -we propose to exploit variational inference to optimize the whole architecture. -Extensive experimental results on three benchmark datasets (i.e., Flickr8k, -Flickr30k and MS COCO) demonstrate the superiority of our proposed architecture -as compared to most of the state-of-the-art methods. -" -3998,1612.04988,Prajna Upadhyay and Tanuma Patra and Ashwini Purkar and Maya Ramanath,"TeKnowbase: Towards Construction of a Knowledge-base of Technical - Concepts",cs.CL cs.AI," In this paper, we describe the construction of TeKnowbase, a knowledge-base -of technical concepts in computer science. Our main information sources are -technical websites such as Webopedia and Techtarget as well as Wikipedia and -online textbooks. We divide the knowledge-base construction problem into two -parts -- the acquisition of entities and the extraction of relationships among -these entities. Our knowledge-base consists of approximately 100,000 triples. -We conducted an evaluation on a sample of triples and report an accuracy of a -little over 90\%. We additionally conducted classification experiments on -StackOverflow data with features from TeKnowbase and achieved improved -classification accuracy. -" -3999,1612.05131,"Fugen Zhou, Fuxiang Wu, Zhengchen Zhang, Minghui Dong","Transition-based Parsing with Context Enhancement and Future Reward - Reranking",cs.CL," This paper presents a novel reranking model, future reward reranking, to -re-score the actions in a transition-based parser by using a global scorer. -Different to conventional reranking parsing, the model searches for the best -dependency tree in all feasible trees constraining by a sequence of actions to -get the future reward of the sequence. The scorer is based on a first-order -graph-based parser with bidirectional LSTM, which catches different parsing -view compared with the transition-based parser. Besides, since context -enhancement has shown substantial improvement in the arc-stand transition-based -parsing over the parsing accuracy, we implement context enhancement on an -arc-eager transition-base parser with stack LSTMs, the dynamic oracle and -dropout supporting and achieve further improvement. With the global scorer and -context enhancement, the results show that UAS of the parser increases as much -as 1.20% for English and 1.66% for Chinese, and LAS increases as much as 1.32% -for English and 1.63% for Chinese. Moreover, we get state-of-the-art LASs, -achieving 87.58% for Chinese and 93.37% for English. -" -4000,1612.05202,"Mickael Rouvier, Benoit Favre",Building a robust sentiment lexicon with (almost) no resource,cs.CL," Creating sentiment polarity lexicons is labor intensive. 
Automatically
-translating them from resource-rich languages requires in-domain machine
-translation systems, which rely on large quantities of bi-texts. In this paper,
-we propose to replace machine translation by transferring words from the
-lexicon through word embeddings aligned across languages with a simple linear
-transform. The approach leads to no degradation, compared to machine
-translation, when tested on sentiment polarity classification on tweets from
-four languages.
-"
-4001,1612.05251,"Franck Dernoncourt, Ji Young Lee, Peter Szolovits","Neural Networks for Joint Sentence Classification in Medical Paper
- Abstracts",cs.CL cs.AI cs.NE stat.ML," Existing models based on artificial neural networks (ANNs) for sentence
-classification often do not incorporate the context in which sentences appear,
-and classify sentences individually. However, traditional sentence
-classification approaches have been shown to greatly benefit from jointly
-classifying subsequent sentences, such as with conditional random fields. In
-this work, we present an ANN architecture that combines the effectiveness of
-typical ANN models to classify sentences in isolation, with the strength of
-structured prediction. Our model achieves state-of-the-art results on two
-different datasets for sequential sentence classification in medical abstracts.
-"
-4002,1612.05270,"Eric S. Tellez, Sabino Miranda Jim\'enez, Mario Graff, Daniela
- Moctezuma, Ranyart R. Su\'arez, Oscar S. Siordia",A Simple Approach to Multilingual Polarity Classification in Twitter,cs.CL cs.LG stat.ML," Recently, sentiment analysis has received a lot of attention due to the
-interest in mining opinions of social media users. Sentiment analysis consists
-of determining the polarity of a given text, i.e., its degree of positiveness
-or negativeness. Traditionally, Sentiment Analysis algorithms have been
-tailored to a specific language given the complexity of having a number of
-lexical variations and errors introduced by the people generating content. In
-this contribution, our aim is to provide a simple-to-implement and easy-to-use
-multilingual framework, that can serve as a baseline for sentiment analysis
-contests, and as a starting point to build new sentiment analysis systems. We
-compare our approach in eight different languages, three of which have important
-international contests, namely, SemEval (English), TASS (Spanish), and
-SENTIPOLC (Italian). Within the competitions, our approach reaches medium
-to high positions in the rankings, whereas in the remaining languages our
-approach outperforms the reported results.
-"
-4003,1612.05310,Luis Gerardo Mojica,Modeling Trolling in Social Media Conversations,cs.CL," Social media websites, electronic newspapers and Internet forums allow
-visitors to leave comments for others to read and interact with. This exchange is
-not free from participants with malicious intentions, who troll others by
-posting messages that are intended to be provocative, offensive, or menacing.
-With the goal of facilitating the computational modeling of trolling, we
-propose a trolling categorization that is novel in the sense that it allows
-comment-based analysis from both the trolls' and the responders' perspectives,
-characterizing these two perspectives using four aspects, namely, the troll's
-intention and his intention disclosure, as well as the responder's
-interpretation of the troll's intention and her response strategy.
Using this -categorization, we annotate and release a dataset containing excerpts of Reddit -conversations involving suspected trolls and their interactions with other -users. Finally, we identify the difficult-to-classify cases in our corpus and -suggest potential solutions for them. -" -4004,1612.05340,"Shraey Bhatia, Jey Han Lau, Timothy Baldwin",Automatic Labelling of Topics with Neural Embeddings,cs.CL," Topics generated by topic models are typically represented as list of terms. -To reduce the cognitive overhead of interpreting these topics for end-users, we -propose labelling a topic with a succinct phrase that summarises its theme or -idea. Using Wikipedia document titles as label candidates, we compute neural -embeddings for documents and words to select the most relevant labels for -topics. Compared to a state-of-the-art topic labelling system, our methodology -is simpler, more efficient, and finds better topic labels. -" -4005,1612.05348,"Ndapandula Nakashole, Tom M. Mitchell",Machine Reading with Background Knowledge,cs.AI cs.CL," Intelligent systems capable of automatically understanding natural language -text are important for many artificial intelligence applications including -mobile phone voice assistants, computer vision, and robotics. Understanding -language often constitutes fitting new information into a previously acquired -view of the world. However, many machine reading systems rely on the text alone -to infer its meaning. In this paper, we pursue a different approach; machine -reading methods that make use of background knowledge to facilitate language -understanding. To this end, we have developed two methods: The first method -addresses prepositional phrase attachment ambiguity. It uses background -knowledge within a semi-supervised machine learning algorithm that learns from -both labeled and unlabeled data. This approach yields state-of-the-art results -on two datasets against strong baselines; The second method extracts -relationships from compound nouns. Our knowledge-aware method for compound noun -analysis accurately extracts relationships and significantly outperforms a -baseline that does not make use of background knowledge. -" -4006,1612.05420,"Arkanath Pathak, Pawan Goyal and Plaban Bhowmick","A Two-Phase Approach Towards Identifying Argument Structure in Natural - Language",cs.CL," We propose a new approach for extracting argument structure from natural -language texts that contain an underlying argument. Our approach comprises of -two phases: Score Assignment and Structure Prediction. The Score Assignment -phase trains models to classify relations between argument units (Support, -Attack or Neutral). To that end, different training strategies have been -explored. We identify different linguistic and lexical features for training -the classifiers. Through ablation study, we observe that our novel use of -word-embedding features is most effective for this task. The Structure -Prediction phase makes use of the scores from the Score Assignment phase to -arrive at the optimal structure. We perform experiments on three argumentation -datasets, namely, AraucariaDB, Debatepedia and Wikipedia. We also propose two -baselines and observe that the proposed approach outperforms baseline systems -for the final task of Structure Prediction. 
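As a rough illustration of how phase-2 Structure Prediction can consume phase-1 relation scores, here is a greedy Python sketch (the paper searches for an optimal structure; greedy attachment and the score interface here are simplifying assumptions):

    def predict_structure(units, score):
        """Attach every argument unit to the parent that maximises the
        phase-1 relation score; units[0] is taken to be the main claim."""
        edges = []
        for child in units[1:]:
            parent = max((u for u in units if u != child),
                         key=lambda p: score(child, p))
            edges.append((child, parent))
        return edges

    toy_scores = {("B", "A"): 0.9, ("C", "B"): 0.7, ("C", "A"): 0.2}
    score = lambda c, p: toy_scores.get((c, p), 0.0)
    print(predict_structure(["A", "B", "C"], score))  # [('B', 'A'), ('C', 'B')]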
-" -4007,1612.05555,"\'Alvaro Peris, Mara Chinea-Rios and Francisco Casacuberta","Neural Networks Classifier for Data Selection in Statistical Machine - Translation",cs.CL," We address the data selection problem in statistical machine translation -(SMT) as a classification task. The new data selection method is based on a -neural network classifier. We present a new method description and empirical -results proving that our data selection method provides better translation -quality, compared to a state-of-the-art method (i.e., Cross entropy). Moreover, -the empirical results reported are coherent across different language pairs. -" -4008,1612.05688,"Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, - Yun-Nung Chen",A User Simulator for Task-Completion Dialogues,cs.LG cs.AI cs.CL," Despite widespread interests in reinforcement-learning for task-oriented -dialogue systems, several obstacles can frustrate research and development -progress. First, reinforcement learners typically require interaction with the -environment, so conventional dialogue corpora cannot be used directly. Second, -each task presents specific challenges, requiring separate corpus of -task-specific annotated data. Third, collecting and annotating human-machine or -human-human conversations for task-oriented dialogues requires extensive domain -knowledge. Because building an appropriate dataset can be both financially -costly and time-consuming, one popular approach is to build a user simulator -based upon a corpus of example dialogues. Then, one can train reinforcement -learning agents in an online fashion as they interact with the simulator. -Dialogue agents trained on these simulators can serve as an effective starting -point. Once agents master the simulator, they may be deployed in a real -environment to interact with humans, and continue to be trained online. To ease -empirical algorithmic comparisons in dialogues, this paper introduces a new, -publicly available simulation framework, where our simulator, designed for the -movie-booking domain, leverages both rules and collected data. The simulator -supports two tasks: movie ticket booking and movie seeking. Finally, we -demonstrate several agents and detail the procedure to add and test your own -agent in the proposed framework. -" -4009,1612.05734,"Valentina Franzoni, Giulio Biondi, Alfredo Milani, Yuanxi Li",Web-based Semantic Similarity for Emotion Recognition in Web Objects,cs.CL cs.AI cs.SI," In this project we propose a new approach for emotion recognition using -web-based similarity (e.g. confidence, PMI and PMING). We aim to extract basic -emotions from short sentences with emotional content (e.g. news titles, tweets, -captions), performing a web-based quantitative evaluation of semantic proximity -between each word of the analyzed sentence and each emotion of a psychological -model (e.g. Plutchik, Ekman, Lovheim). The phases of the extraction include: -text preprocessing (tokenization, stop words, filtering), search engine -automated query, HTML parsing of results (i.e. scraping), estimation of -semantic proximity, ranking of emotions according to proximity measures. The -main idea is that, since it is possible to generalize semantic similarity under -the assumption that similar concepts co-occur in documents indexed in search -engines, therefore also emotions can be generalized in the same way, through -tags or terms that express them in a particular language, ranking emotions. 
-Training results are compared to human evaluation; then additional comparative
-tests are performed, both for the global ranking correlation (e.g.
-Kendall, Spearman, Pearson) and for the evaluation of the emotion linked to
-each single word. Different from sentiment analysis, our approach works at a
-deeper level of abstraction, aiming at recognizing specific emotions and not
-only the positive/negative sentiment, in order to predict emotions as semantic
-data.
-"
-4010,1612.06027,"Katharina Kann and Ryan Cotterell and Hinrich Sch\""utze",Neural Multi-Source Morphological Reinflection,cs.CL," We explore the task of multi-source morphological reinflection, which
-generalizes the standard, single-source version. The input consists of (i) a
-target tag and (ii) multiple pairs of source form and source tag for a lemma.
-The motivation is that it is beneficial to have access to more than one source
-form since different source forms can provide complementary information, e.g.,
-different stems. We further present a novel extension to the encoder-decoder
-recurrent neural architecture, consisting of multiple encoders, to better solve
-the task. We show that our new architecture outperforms single-source
-reinflection models and publish our dataset for multi-source morphological
-reinflection to facilitate future research.
-"
-4011,1612.06043,Raphael Shu and Hideki Nakayama,"An Empirical Study of Adequate Vision Span for Attention-Based Neural
- Machine Translation",cs.CL cs.AI," Recently, the attention mechanism has played a key role in achieving high
-performance for Neural Machine Translation models. However, as it computes a
-score function for the encoder states in all positions at each decoding step,
-the attention model greatly increases the computational complexity. In this
-paper, we investigate the adequate vision span of attention models in the
-context of machine translation, by proposing a novel attention framework that
-is capable of reducing redundant score computation dynamically. The term
-""vision span"" means a window of the encoder states considered by the attention
-model in one step. In our experiments, we found that the average window size of
-the vision span can be reduced by over 50% with modest loss in accuracy on
-English-Japanese and German-English translation tasks. These results indicate
-that the conventional attention mechanism performs a significant amount of
-redundant computation.
-"
-4012,1612.06062,"Ganesh J, Manish Gupta and Vasudeva Varma",Improving Tweet Representations using Temporal and User Context,cs.CL cs.AI," In this work we propose a novel representation learning model which computes
-semantic representations for tweets accurately. Our model systematically
-exploits the chronologically adjacent tweets ('context') from users' Twitter
-timelines for this task. Further, we make our model user-aware so that it can
-do well in modeling the target tweet by exploiting the rich knowledge about the
-user such as the way the user writes the post and also summarizing the topics
-on which the user writes. We empirically demonstrate that the proposed models
-outperform the state-of-the-art models in predicting the user profile
-attributes like spouse, education and job by 19.66%, 2.27% and 2.22%
-respectively.
-"
-4013,1612.06138,Dakun Zhang and Jungi Kim and Josep Crego and Jean Senellart,Boosting Neural Machine Translation,cs.CL," Training efficiency is one of the main problems for Neural Machine
-Translation (NMT).
Deep networks need very large amounts of data as well as many
-training iterations to achieve state-of-the-art performance. This results in
-very high computation cost, slowing down research and industrialisation. In
-this paper, we propose to alleviate this problem with several training methods
-based on data boosting and bootstrap with no modifications to the neural
-network. It imitates the learning process of humans, who typically spend more
-time when learning ""difficult"" concepts than easier ones. We experiment on an
-English-French translation task showing accuracy improvements of up to 1.63
-BLEU while saving 20% of training time.
-"
-4014,1612.06139,Josep Crego and Jean Senellart,Neural Machine Translation from Simplified Translations,cs.CL," Text simplification aims at reducing the lexical, grammatical and structural
-complexity of a text while keeping the same meaning. In the context of machine
-translation, we introduce the idea of simplified translations in order to boost
-the learning ability of deep neural translation models. We conduct preliminary
-experiments showing that translation complexity is actually reduced in a
-translation of a source bi-text compared to the target reference of the bi-text
-while using a neural machine translation (NMT) system learned on the exact same
-bi-text. Based on the knowledge distillation idea, we then train an NMT system
-using the simplified bi-text, and show that it outperforms the initial system
-that was built over the reference data set. Performance is further boosted when
-both reference and automatic translations are used to learn the network. We
-perform an elementary analysis of the translated corpus and report accuracy
-results of the proposed approach on English-to-French and English-to-German
-translation tasks.
-"
-4015,1612.06140,Catherine Kobus and Josep Crego and Jean Senellart,Domain Control for Neural Machine Translation,cs.CL," Machine translation systems are very sensitive to the domains they were
-trained on. Several domain adaptation techniques have been deeply studied. We
-propose a new technique for neural machine translation (NMT) that we call
-domain control, which is performed at runtime using a unique neural network
-covering multiple domains. The presented approach shows quality improvements
-when compared to dedicated-domain models, translating on any of the covered domains
-and even on out-of-domain data. In addition, model parameters do not need to be
-re-estimated for each domain, making this approach effective for real use cases.
-Evaluation is carried out on English-to-French translation for two different
-testing scenarios. We first consider the case where an end-user performs
-translations on a known domain. Secondly, we consider the scenario where the
-domain is not known and is predicted at the sentence level before translating.
-Results show consistent accuracy improvements for both conditions.
-"
-4016,1612.06141,Christophe Servan and Josep Crego and Jean Senellart,"Domain specialization: a post-training domain adaptation for Neural
- Machine Translation",cs.CL," Domain adaptation is a key feature in Machine Translation. It generally
-encompasses terminology, domain and style adaptation, especially for human
-post-editing workflows in Computer Assisted Translation (CAT). With Neural
-Machine Translation (NMT), we introduce a new notion of domain adaptation that
-we call ""specialization"" and which shows promising results both in
-learning speed and in adaptation accuracy.
In this paper, we propose to explore
-this approach from several perspectives.
-"
-4017,1612.06212,Thomas Laurent and James von Brecht,A recurrent neural network without chaos,cs.NE cs.CL cs.LG," We introduce an exceptionally simple gated recurrent neural network (RNN)
-that achieves performance comparable to well-known gated architectures, such as
-LSTMs and GRUs, on the word-level language modeling task. We prove that our
-model has simple, predictable and non-chaotic dynamics. This stands in stark
-contrast to more standard gated architectures, whose underlying dynamical
-systems exhibit chaotic behavior.
-"
-4018,1612.06391,"Chenhao Tan, Lillian Lee","Talk it up or play it down? (Un)expected correlations between
- (de-)emphasis and recurrence of discussion points in consequential U.S.
- economic policy meetings",cs.SI cs.CL physics.soc-ph," In meetings where important decisions get made, what items receive more
-attention may influence the outcome. We examine how different types of
-rhetorical (de-)emphasis -- including hedges, superlatives, and contrastive
-conjunctions -- correlate with what gets revisited later, controlling for item
-frequency and speaker. Our data consists of transcripts of recurring meetings
-of the Federal Reserve's Open Market Committee (FOMC), where important aspects
-of U.S. monetary policy are decided on. Surprisingly, we find that words
-appearing in the context of hedging, which is usually considered a way to
-express uncertainty, are more likely to be repeated in subsequent meetings,
-while strong emphasis indicated by superlatives has a slightly negative effect
-on word recurrence in subsequent meetings. We also observe interesting patterns
-in how these effects vary depending on social factors such as status and gender
-of the speaker. For instance, the positive effects of hedging are more
-pronounced for female speakers than for male speakers.
-"
-4019,1612.06475,James Cross and Liang Huang,"Span-Based Constituency Parsing with a Structure-Label System and
- Provably Optimal Dynamic Oracles",cs.CL," Parsing accuracy using efficient greedy transition systems has improved
-dramatically in recent years thanks to neural networks. Despite striking
-results in dependency parsing, however, neural models have not surpassed
-state-of-the-art approaches in constituency parsing. To remedy this, we
-introduce a new shift-reduce system whose stack contains merely sentence spans,
-represented by a bare minimum of LSTM features. We also design the first
-provably optimal dynamic oracle for constituency parsing, which runs in
-amortized O(1) time, compared to O(n^3) oracles for standard dependency
-parsing. Training with this oracle, we achieve the best F1 scores on both
-English and French of any parser that does not use reranking or external data.
-"
-4020,1612.06530,"Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang",Automatic Generation of Grounded Visual Questions,cs.CV cs.CL," In this paper, we propose the first model to be able to generate visually
-grounded questions with diverse types for a single image. Visual question
-generation is an emerging topic which aims to ask questions in natural language
-based on visual input. To the best of our knowledge, there is a lack of automatic methods
-to generate meaningful questions with various types for the same visual input.
-To circumvent the problem, we propose a model that automatically generates
-visually grounded questions with varying types.
Our model takes as input both -images and the captions generated by a dense caption model, samples the most -probable question types, and generates the questions in sequence. The -experimental results on two real world datasets show that our model outperforms -the strongest baseline in terms of both correctness and diversity by a wide -margin. -" -4021,1612.06549,"Heike Adel and Hinrich Sch\""utze",Exploring Different Dimensions of Attention for Uncertainty Detection,cs.CL," Neural networks with attention have proven effective for many natural -language processing tasks. In this paper, we develop attention mechanisms for -uncertainty detection. In particular, we generalize standardly used attention -mechanisms by introducing external attention and sequence-preserving attention. -These novel architectures differ from standard approaches in that they use -external resources to compute attention weights and preserve sequence -information. We compare them to other configurations along different dimensions -of attention. Our novel architectures set the new state of the art on a -Wikipedia benchmark dataset and perform similarly to the state-of-the-art model -on a biomedical benchmark which uses a large set of linguistic features. -" -4022,1612.06572,Tom\'a\v{s} Brychc\'in and Pavel Kr\'al,Unsupervised Dialogue Act Induction using Gaussian Mixtures,cs.CL," This paper introduces a new unsupervised approach for dialogue act induction. -Given the sequence of dialogue utterances, the task is to assign them the -labels representing their function in the dialogue. - Utterances are represented as real-valued vectors encoding their meaning. We -model the dialogue as a Hidden Markov model with emission probabilities -estimated by Gaussian mixtures. We use Gibbs sampling for posterior inference. - We present the results on the standard Switchboard-DAMSL corpus. Our -algorithm achieves promising results compared with strong supervised baselines -and outperforms other unsupervised algorithms. -" -4023,1612.06581,C. Maria Keet and Langa Khumalo,Grammar rules for the isiZulu complex verb,cs.CL," The isiZulu verb is known for its morphological complexity, which is a -subject for on-going linguistics research, as well as for prospects of -computational use, such as controlled natural language interfaces, machine -translation, and spellcheckers. To this end, we seek to answer the question as -to what the precise grammar rules for the isiZulu complex verb are (and, by -extension, the Bantu verb morphology). To answer it, we iteratively specify the -grammar as a Context Free Grammar, and evaluate it computationally. The grammar -presented in this paper covers the subject and object concords, negation, -present tense, aspect, mood, and the causative, applicative, stative, and the -reciprocal verbal extensions, politeness, the wh-question modifiers, and aspect -doubling, ensuring their correct order as they appear in verbs. The grammar -conforms to the specification. -" -4024,1612.06671,"Max Berggren, Jussi Karlgren, Robert \""Ostling, and Mikael Parkvall",Inferring the location of authors from words in their texts,cs.CL," For the purposes of computational dialectology or other geographically bound -text analysis tasks, texts must be annotated with their or their authors' -location. Many texts are locatable through explicit labels but most have no -explicit annotation of place.
This paper describes a series of experiments to -determine how positionally annotated microblog posts can be used to learn -location-indicating words which then can be used to locate blog texts and their -authors. A Gaussian distribution is used to model the locational qualities of -words. We introduce the notion of placeness to describe how locational a word -is. - We find that modelling word distributions to account for several locations -and thus several Gaussian distributions per word, defining a filter which picks -out words with high placeness based on their local distributional context, and -aggregating locational information in a centroid for each text gives the most -useful results. The results are applied to data in the Swedish language. -" -4025,1612.06685,"Konstantinos Pappas, Steven Wilson, and Rada Mihalcea","Stateology: State-Level Interactive Charting of Language, Feelings, and - Values",cs.CL," People's personality and motivations are manifest in their everyday language -usage. With the emergence of social media, ample examples of such usage are -procurable. In this paper, we aim to analyze the vocabulary used by close to -200,000 Blogger users in the U.S. with the purpose of geographically portraying -various demographic, linguistic, and psychological dimensions at the state -level. We give a description of a web-based tool for viewing maps that depict -various characteristics of the social media users as derived from this large -blog dataset of over two billion words. -" -4026,1612.06778,"Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick","SCDV : Sparse Composite Document Vectors using soft clustering over - distributional representations",cs.CL," We present a feature vector formation technique for documents - Sparse -Composite Document Vector (SCDV) - which overcomes several shortcomings of the -current distributional paragraph vector representations that are widely used -for text representation. In SCDV, word embeddings are clustered to capture -multiple semantic contexts in which words occur. They are then chained together -to form document topic-vectors that can express complex, multi-topic documents. -Through extensive experiments on multi-class and multi-label classification -tasks, we outperform the previous state-of-the-art method, NTSG (Liu et al., -2015a). We also show that SCDV embeddings perform well on heterogeneous tasks -like Topic Coherence, context-sensitive learning and Information Retrieval. -Moreover, we achieve significant reduction in training and prediction times -compared to other representation methods. SCDV achieves the best of both worlds -- better performance with lower time and space complexity. -" -4027,1612.06821,"Rahul Wadbude, Vivek Gupta, Dheeraj Mekala, Harish Karnick",User Bias Removal in Review Score Prediction,cs.CL," Review score prediction of text reviews has recently gained a lot of -attention in recommendation systems. A major problem in models for review score -prediction is the presence of noise due to user-bias in review scores. We -propose two simple statistical methods to remove such noise and improve review -score prediction. Compared to other methods that use multiple classifiers, one -for each user, our model uses a single global classifier to predict review -scores. We empirically evaluate our methods on two major categories -(\textit{Electronics} and \textit{Movies and TV}) of the SNAP published Amazon -e-Commerce Reviews data-set and Amazon \textit{Fine Food} reviews data-set.
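The user-bias removal idea in the review score prediction entry above reduces to simple score centering, which a short sketch can make concrete. This is a minimal illustration rather than the authors' exact method: the column names, the toy data, and the choice of a ridge regressor over TF-IDF features are all assumptions.

    # Minimal sketch of user-bias removal for review score prediction.
    # Assumption: a user's bias is the deviation of their mean score from
    # the global mean; labels are de-biased before training one global model.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge

    df = pd.DataFrame({
        "user":  ["u1", "u1", "u2", "u2"],
        "text":  ["great phone", "bad battery", "decent camera", "awful screen"],
        "score": [5.0, 4.0, 3.0, 1.0],
    })

    global_mean = df["score"].mean()
    user_bias = df.groupby("user")["score"].mean() - global_mean
    df["debiased"] = df["score"] - df["user"].map(user_bias)

    X = TfidfVectorizer().fit_transform(df["text"])
    model = Ridge().fit(X, df["debiased"])
    # At prediction time, add the user's bias back (or use the global mean
    # for unseen users) to recover a calibrated score.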
We -obtain improved review score prediction for three commonly used text feature -representations. -" -4028,1612.06890,"Justin Johnson and Bharath Hariharan and Laurens van der Maaten and Li - Fei-Fei and C. Lawrence Zitnick and Ross Girshick","CLEVR: A Diagnostic Dataset for Compositional Language and Elementary - Visual Reasoning",cs.CV cs.CL cs.LG," When building artificial intelligence systems that can reason and answer -questions about visual data, we need diagnostic tests to analyze our progress -and discover shortcomings. Existing benchmarks for visual question answering -can help, but have strong biases that models can exploit to correctly answer -questions without reasoning. They also conflate multiple sources of error, -making it hard to pinpoint model weaknesses. We present a diagnostic dataset -that tests a range of visual reasoning abilities. It contains minimal biases -and has detailed annotations describing the kind of reasoning each question -requires. We use this dataset to analyze a variety of modern visual reasoning -systems, providing novel insights into their abilities and limitations. -" -4029,1612.06897,Markus Freitag and Yaser Al-Onaizan,Fast Domain Adaptation for Neural Machine Translation,cs.CL," Neural Machine Translation (NMT) is a new approach for automatic translation -of text from one human language into another. The basic concept in NMT is to -train a large Neural Network that maximizes the translation performance on a -given parallel corpus. NMT is gaining popularity in the research community -because it outperformed traditional SMT approaches in several translation tasks -at WMT and other evaluation tasks/benchmarks, at least for some language pairs. -However, many of the enhancements in SMT over the years have not been -incorporated into the NMT framework. In this paper, we focus on one such -enhancement, namely domain adaptation. We propose an approach for adapting an -NMT system to a new domain. The main idea behind domain adaptation is to -exploit the availability of large out-of-domain training data together with a -small amount of in-domain training data. We report significant gains with our -proposed method in both automatic metrics and a human subjective evaluation -metric on two language pairs. With our adaptation method, we show a large -improvement on the new domain while the performance of our general domain only -degrades slightly. In addition, our approach is fast enough to adapt an already -trained system to a new domain within a few hours without the need to retrain -the NMT model on the combined data, which usually takes several days/weeks -depending on the volume of the data. -" -4030,1612.07040,"Ze Hu, Zhan Zhang, Qing Chen, Haiqin Yang, Decheng Zuo","A deep learning approach for predicting the quality of online health - expert question-answering services",cs.IR cs.CL," Currently, a growing number of health consumers are asking health-related -questions online, at any time and from anywhere, which effectively lowers the -cost of health care. The most common approach is using online health expert -question-answering (HQA) services, as health consumers are more willing to -trust answers from professional physicians. However, these answers can be of -varying quality depending on circumstance. In addition, as the available HQA -services grow, how to predict the answer quality of HQA services via machine -learning becomes increasingly important and challenging. In an HQA service, -answers are normally short texts, which are severely affected by the data -sparsity problem.
Furthermore, HQA services lack community features such as -best answer and user votes. Therefore, the wisdom of the crowd is not available -to rate answer quality. To address these problems, in this paper, the -prediction of HQA answer quality is defined as a classification task. First, -based on the characteristics of HQA services and feedback from medical experts, -a standard for HQA service answer quality evaluation is defined. Next, based on -the characteristics of HQA services, several novel non-textual features are -proposed, including surface linguistic features and social features. Finally, a -deep belief network (DBN)-based HQA answer quality prediction framework is -proposed to predict the quality of answers by learning the high-level hidden -semantic representation from the physicians' answers. Our results prove that -the proposed framework overcomes the problem of overly sparse textual features -in short text answers and effectively identifies high-quality answers. -" -4031,1612.07130,G\'abor Berend,"Sparse Coding of Neural Word Embeddings for Multilingual Sequence - Labeling",cs.CL," In this paper we propose and carefully evaluate a sequence labeling framework -which solely utilizes sparse indicator features derived from dense distributed -word representations. The proposed model obtains (near) state-of-the-art -performance for both part-of-speech tagging and named entity recognition for a -variety of languages. Our model relies only on a few thousand sparse -coding-derived features, without applying any modification of the word -representations employed for the different tasks. The proposed model has -favorable generalization properties as it retains over 89.8% of its average POS -tagging accuracy when trained at 1.2% of the total available training data, -i.e. 150 sentences per language. -" -4032,1612.07182,"Angeliki Lazaridou, Alexander Peysakhovich, Marco Baroni",Multi-Agent Cooperation and the Emergence of (Natural) Language,cs.CL cs.CV cs.GT cs.LG cs.MA," The current mainstream approach to train natural language systems is to -expose them to large amounts of text. This passive learning is problematic if -we are interested in developing interactive machines, such as conversational -agents. We propose a framework for language learning that relies on multi-agent -communication. We study this learning in the context of referential games. In -these games, a sender and a receiver see a pair of images. The sender is told -one of them is the target and is allowed to send a message from a fixed, -arbitrary vocabulary to the receiver. The receiver must rely on this message to -identify the target. Thus, the agents develop their own language interactively -out of the need to communicate. We show that two networks with simple -configurations are able to learn to coordinate in the referential game. We -further explore how to make changes to the game environment to cause the ""word -meanings"" induced in the game to better reflect intuitive semantic properties -of the images. In addition, we present a simple strategy for grounding the -agents' code into natural language. Both of these are necessary steps towards -developing machines that are able to communicate with humans productively. -" -4033,1612.07215,"Tengfei Ma, Tetsuya Nasukawa","Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel - Data",cs.CL," Topic models have been successfully applied in lexicon extraction. However, -most previous methods are limited to document-aligned data.
In this paper, we -try to address two challenges of applying topic models to lexicon extraction in -non-parallel data: 1) the difficulty of modelling word relationships, and 2) -the noise in the seed dictionary. To solve these two challenges, we propose two -new bilingual topic models to better capture the semantic information of each -word while discriminating the multiple translations in a noisy seed dictionary. -We extend the scope of topic models by inverting the roles of ""word"" and -""document"". In addition, to solve the problem of noise in the seed dictionary, -we incorporate the probability of translation selection in our models. -Moreover, we also propose an effective measure to evaluate the similarity of -words in different languages and select the optimal translation pairs. -Experimental results using real world data demonstrate the utility and efficacy -of the proposed models. -" -4034,1612.07411,"Huayu Li, Martin Renqiang Min, Yong Ge, Asim Kadav",A Context-aware Attention Network for Interactive Question Answering,cs.CL cs.LG," Neural network based sequence-to-sequence models in an encoder-decoder -framework have been successfully applied to solve Question Answering (QA) -problems, predicting answers from statements and questions. However, almost all -previous models have failed to consider detailed context information and -unknown states under which systems do not have enough information to answer -given questions. These scenarios with incomplete or ambiguous information are -very common in the setting of Interactive Question Answering (IQA). To address -this challenge, we develop a novel model, employing context-dependent -word-level attention for more accurate statement representations and -question-guided sentence-level attention for better context modeling. We also -generate unique IQA datasets to test our model, which will be made publicly -available. Employing these attention mechanisms, our model accurately -understands when it can output an answer or when it requires generating a -supplementary question for additional input depending on different contexts. -When available, the user's feedback is encoded and directly applied to update -sentence-level attention to infer an answer. Extensive experiments on QA and -IQA datasets quantitatively demonstrate the effectiveness of our model with -significant improvement over state-of-the-art conventional QA models. -" -4035,1612.07486,"Robert \""Ostling, J\""org Tiedemann",Continuous multilinguality with language vectors,cs.CL," Most existing models for multilingual natural language processing (NLP) treat -language as a discrete category, and make predictions for either one language -or the other. In contrast, we propose using continuous vector representations -of language. We show that these can be learned efficiently with a -character-based neural language model, and used to improve inference about -language varieties not seen during training. In experiments with 1303 Bible -translations into 990 different languages, we empirically explore the capacity -of multilingual language models, and also show that the language vectors -capture genetic relationships between languages. -" -4036,1612.07495,"Yadollah Yaghoobzadeh and Heike Adel and Hinrich Sch\""utze",Noise Mitigation for Neural Entity Typing and Relation Extraction,cs.CL," In this paper, we address two different types of noise in information -extraction models: noise from distant supervision and noise from pipeline input -features. Our target tasks are entity typing and relation extraction.
For the -first noise type, we introduce multi-instance multi-label learning algorithms -using neural network models, and apply them to fine-grained entity typing for -the first time. This gives our models comparable performance with the -state-of-the-art supervised approach which uses global embeddings of entities. -For the second noise type, we propose ways to improve the integration of noisy -entity type predictions into relation extraction. Our experiments show that -probabilistic predictions are more robust than discrete predictions and that -joint training of the two tasks performs best. -" -4037,1612.07600,"Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem",Re-evaluating Automatic Metrics for Image Captioning,cs.CL cs.CV," The task of generating natural language descriptions from images has received -a lot of attention in recent years. Consequently, it is becoming increasingly -important to evaluate such image captioning approaches in an automatic manner. -In this paper, we provide an in-depth evaluation of the existing image -captioning metrics through a series of carefully designed experiments. -Moreover, we explore the utilization of the recently proposed Word Mover's -Distance (WMD) document metric for the purpose of image captioning. Our -findings outline the differences and/or similarities between metrics and their -relative robustness by means of extensive correlation, accuracy and distraction -based evaluations. Our results also demonstrate that WMD provides strong -advantages over other metrics. -" -4038,1612.07602,"Hai Ye, Wenhan Chao, Zhunchen Luo, Zhoujun Li",Jointly Extracting Relations with Class Ties via Effective Deep Ranking,cs.AI cs.CL," Connections between relations in relation extraction, which we call class -ties, are common. In distantly supervised scenario, one entity tuple may have -multiple relation facts. Exploiting class ties between relations of one entity -tuple is promising for distantly supervised relation extraction. However, -previous models either fail to model this property effectively or ignore it -altogether. In this work, to effectively leverage class ties, we propose to -make joint relation extraction with a unified model that integrates -convolutional neural network (CNN) with a general pairwise ranking framework, -in which three novel ranking loss functions are introduced. Additionally, an -effective method is presented to relieve the severe class imbalance problem -caused by NR (not relation) instances during model training. Experiments on a -widely used dataset show that leveraging class ties enhances extraction and -demonstrate the effectiveness of our model to learn class ties. Our model -outperforms the baselines significantly, achieving state-of-the-art -performance. -" -4039,1612.07833,Nan Ding and Sebastian Goodman and Fei Sha and Radu Soricut,"Understanding Image and Text Simultaneously: a Dual Vision-Language - Machine Comprehension Task",cs.CL cs.CV," We introduce a new multi-modal task for computer systems, posed as a combined -vision-language comprehension challenge: identifying the most suitable text -describing a scene, given several similar options. Accomplishing the task -entails demonstrating comprehension beyond just recognizing ""keywords"" (or -key-phrases) and their corresponding visual concepts. Instead, it requires an -alignment between the representations of the two modalities that achieves a -visually-grounded ""understanding"" of various linguistic elements and their -dependencies.
This new task also admits an easy-to-compute and well-studied -metric: the accuracy in detecting the true target among the decoys. - The paper makes several contributions: an effective and extensible mechanism -for generating decoys from (human-created) image captions; an instance of -applying this mechanism, yielding a large-scale machine comprehension dataset -(based on the COCO images and captions) that we make publicly available; human -evaluation results on this dataset, informing a performance upper-bound; and -several baseline and competitive learning approaches that illustrate the -utility of the proposed task and dataset in advancing both image and language -comprehension. We also show that, in a multi-task learning setting, the -performance on the proposed task is positively correlated with the end-to-end -task of image captioning. -" -4040,1612.07843,"Leila Arras, Franziska Horn, Gr\'egoire Montavon, Klaus-Robert - M\""uller, Wojciech Samek","""What is Relevant in a Text Document?"": An Interpretable Machine - Learning Approach",cs.CL cs.IR cs.LG stat.ML," Text documents can be described by a number of abstract concepts such as -semantic category, writing style, or sentiment. Machine learning (ML) models -have been trained to automatically map documents to these abstract concepts, -making it possible to annotate very large text collections, more than could be -processed by a human in a lifetime. Besides predicting the text's category very -accurately, it is also highly desirable to understand how and why the -categorization process takes place. In this paper, we demonstrate that such -understanding can be achieved by tracing the classification decision back to -individual words using layer-wise relevance propagation (LRP), a recently -developed technique for explaining predictions of complex non-linear -classifiers. We train two word-based ML models, a convolutional neural network -(CNN) and a bag-of-words SVM classifier, on a topic categorization task and -adapt the LRP method to decompose the predictions of these models onto words. -The resulting scores indicate how much individual words contribute to the -overall classification decision. This enables one to distill relevant -information from text documents without an explicit semantic information -extraction step. We further use the word-wise relevance scores for generating -novel vector-based document representations which capture semantic information. -Based on these document vectors, we introduce a measure of model explanatory -power and show that, although the SVM and CNN models perform similarly in terms -of classification accuracy, the latter exhibits a higher level of -explainability which makes it more comprehensible for humans and potentially -more useful for other applications. -" -4041,1612.07940,"Lei Shu, Bing Liu, Hu Xu, Annice Kim","Supervised Opinion Aspect Extraction by Exploiting Past Extraction - Results",cs.CL cs.LG," One of the key tasks of sentiment analysis of product reviews is to extract -product aspects or features that users have expressed opinions on. In this -work, we focus on using supervised sequence labeling as the base approach to -performing the task. Although several extraction methods using sequence -labeling methods such as Conditional Random Fields (CRF) and Hidden Markov -Models (HMM) have been proposed, we show that this supervised approach can be -significantly improved by exploiting the idea of concept sharing across -multiple domains.
For example, ""screen"" is an aspect in iPhone, but not only -iPhone has a screen, many electronic devices have screens too. When ""screen"" -appears in a review of a new domain (or product), it is likely to be an aspect -too. Knowing this information enables us to do much better extraction in the -new domain. This paper proposes a novel extraction method exploiting this idea -in the context of supervised sequence labeling. Experimental results show that -it produces markedly better results than without using the past information. -" -4042,1612.07956,Kamal Sarkar,A CRF Based POS Tagger for Code-mixed Indian Social Media Text,cs.CL," In this work, we describe a conditional random fields (CRF) based system for -Part-Of- Speech (POS) tagging of code-mixed Indian social media text as part of -our participation in the tool contest on POS tagging for codemixed Indian -social media text, held in conjunction with the 2016 International Conference -on Natural Language Processing, IIT(BHU), India. We participated only in -constrained mode contest for all three language pairs, Bengali-English, -Hindi-English and Telegu-English. Our system achieves the overall average F1 -score of 79.99, which is the highest overall average F1 score among all 16 -systems participated in constrained mode contest. -" -4043,1612.08083,"Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier",Language Modeling with Gated Convolutional Networks,cs.CL," The pre-dominant approach to language modeling to date is based on recurrent -neural networks. Their success on this task is often linked to their ability to -capture unbounded context. In this paper we develop a finite context approach -through stacked convolutions, which can be more efficient since they allow -parallelization over sequential tokens. We propose a novel simplified gating -mechanism that outperforms Oord et al (2016) and investigate the impact of key -architectural decisions. The proposed approach achieves state-of-the-art on the -WikiText-103 benchmark, even though it features long-term dependencies, as well -as competitive results on the Google Billion Words benchmark. Our model reduces -the latency to score a sentence by an order of magnitude compared to a -recurrent baseline. To our knowledge, this is the first time a non-recurrent -approach is competitive with strong recurrent models on these large scale -language tasks. -" -4044,1612.08171,Kamal Sarkar,"KS_JU@DPIL-FIRE2016:Detecting Paraphrases in Indian Languages Using - Multinomial Logistic Regression Model",cs.CL," In this work, we describe a system that detects paraphrases in Indian -Languages as part of our participation in the shared Task on detecting -paraphrases in Indian Languages (DPIL) organized by Forum for Information -Retrieval Evaluation (FIRE) in 2016. Our paraphrase detection method uses a -multinomial logistic regression model trained with a variety of features which -are basically lexical and semantic level similarities between two sentences in -a pair. The performance of the system has been evaluated against the test set -released for the FIRE 2016 shared task on DPIL. Our system achieves the highest -f-measure of 0.95 on task1 in Punjabi language.The performance of our system on -task1 in Hindi language is f-measure of 0.90. Out of 11 teams participated in -the shared task, only four teams participated in all four languages, Hindi, -Punjabi, Malayalam and Tamil, but the remaining 7 teams participated in one of -the four languages. 
We also participated in both task1 and task2 for all four -Indian languages. The overall average performance of our system including task1 -and task2 over all four languages is an F1-score of 0.81, which is the second -highest score among the four systems that participated in all four languages. -" -4045,1612.08178,"Kamal Sarkar, Debanjan Das, Indra Banerjee, Mamta Kumari and Prasenjit - Biswas",JU_KS_Group@FIRE 2016: Consumer Health Information Search,cs.IR cs.CL," In this paper, we describe the methodology used and the results obtained by -us for completing the tasks given under the shared task on Consumer Health -Information Search (CHIS) collocated with the Forum for Information Retrieval -Evaluation (FIRE) 2016, ISI Kolkata. The shared task consists of two sub-tasks -- (1) task1: given a query and a document/set of documents associated with that -query, the task is to classify the sentences in the document as relevant to the -query or not and (2) task 2: the relevant sentences need to be further -classified as supporting the claim made in the query, or opposing the claim -made in the query. We have participated in both the sub-tasks. The percentage -accuracy obtained by our developed system for task1 was 73.39, which is the -third highest among the 9 teams that participated in the shared task. -" -4046,1612.08205,Konstantinos Pappas and Rada Mihalcea,Predicting the Industry of Users on Social Media,cs.CL cs.SI," Automatic profiling of social media users is an important task for supporting -a multitude of downstream applications. While a number of studies have used -social media content to extract and study collective social attributes, there -is a lack of substantial research that addresses the detection of a user's -industry. We frame this task as classification using both feature engineering -and ensemble learning. Our industry-detection system uses both posted content -and profile information to detect a user's industry with 64.3% accuracy, -significantly outperforming the majority baseline in a taxonomy of fourteen -industry classes. Our qualitative analysis suggests that a person's industry -not only affects the words used and their perceived meanings, but also the -number and type of emotions being expressed. -" -4047,1612.08220,"Jiwei Li, Will Monroe and Dan Jurafsky",Understanding Neural Networks through Representation Erasure,cs.CL," While neural networks have been successfully applied to many natural language -processing tasks, they come at the cost of interpretability. In this paper, we -propose a general methodology to analyze and interpret decisions from a neural -model by observing the effects on the model of erasing various parts of the -representation, such as input word-vector dimensions, intermediate hidden -units, or input words. We present several approaches to analyzing the effects -of such erasure, from computing the relative difference in evaluation metrics, -to using reinforcement learning to erase the minimum set of input words in -order to flip a neural model's decision. In a comprehensive analysis of -multiple NLP tasks, including linguistic feature classification, sentence-level -sentiment analysis, and document level sentiment aspect prediction, we show -that the proposed methodology not only offers clear explanations about neural -model decisions, but also provides a way to conduct error analysis on neural -models.
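The representation-erasure entry above rests on a simple loop: delete a part of the input and measure how the model's decision changes. A minimal word-level sketch, assuming only a generic predict_proba callable (a stand-in for any classifier, not the paper's models or its reinforcement-learning variant):

    # Generic word-erasure importance: remove one input word at a time and
    # measure how much the model's confidence in its original decision drops.
    import numpy as np

    def erasure_importance(tokens, predict_proba):
        base = predict_proba(tokens)          # class probabilities on full input
        label = int(np.argmax(base))          # the model's original decision
        scores = []
        for i in range(len(tokens)):
            reduced = tokens[:i] + tokens[i + 1:]
            drop = base[label] - predict_proba(reduced)[label]
            scores.append((tokens[i], drop))  # larger drop = more important word
        return scores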
-" -4048,1612.08333,Karthik Bangalore Mani,Text Summarization using Deep Learning and Ridge Regression,cs.CL," We develop models and extract relevant features for automatic text -summarization and investigate the performance of different models on the DUC -2001 dataset. Two different models were developed, one being a ridge regressor -and the other one was a multi-layer perceptron. The hyperparameters were varied -and their performance were noted. We segregated the summarization task into 2 -main steps, the first being sentence ranking and the second step being sentence -selection. In the first step, given a document, we sort the sentences based on -their Importance, and in the second step, in order to obtain non-redundant -sentences, we weed out the sentences that are have high similarity with the -previously selected sentences. -" -4049,1612.08354,"Gwangbeen Park, Woobin Im","Image-Text Multi-Modal Representation Learning by Adversarial - Backpropagation",cs.CV cs.CL cs.LG," We present novel method for image-text multi-modal representation learning. -In our knowledge, this work is the first approach of applying adversarial -learning concept to multi-modal learning and not exploiting image-text pair -information to learn multi-modal feature. We only use category information in -contrast with most previous methods using image-text pair information for -multi-modal embedding. In this paper, we show that multi-modal feature can be -achieved without image-text pair information and our method makes more similar -distribution with image and text in multi-modal feature space than other -methods which use image-text pair information. And we show our multi-modal -feature has universal semantic information, even though it was trained for -category prediction. Our model is end-to-end backpropagation, intuitive and -easily extended to other multi-modal learning work. -" -4050,1612.08375,"Lang-Chi Yu, Hung-yi Lee, Lin-shan Lee","Abstractive Headline Generation for Spoken Content by Attentive - Recurrent Neural Networks with ASR Error Modeling",cs.CL," Headline generation for spoken content is important since spoken content is -difficult to be shown on the screen and browsed by the user. It is a special -type of abstractive summarization, for which the summaries are generated word -by word from scratch without using any part of the original content. Many deep -learning approaches for headline generation from text document have been -proposed recently, all requiring huge quantities of training data, which is -difficult for spoken document summarization. In this paper, we propose an ASR -error modeling approach to learn the underlying structure of ASR error patterns -and incorporate this model in an Attentive Recurrent Neural Network (ARNN) -architecture. In this way, the model for abstractive headline generation for -spoken content can be learned from abundant text data and the ASR data for some -recognizers. Experiments showed very encouraging results and verified that the -proposed ASR error model works well even when the input spoken content is -recognized by a recognizer very different from the one the model learned from. -" -4051,1612.08504,"Antonin Bergeaud, Yoann Potiron and Juste Raimbault",Classifying Patents Based on their Semantic Content,physics.soc-ph cs.CL," In this paper, we extend some usual techniques of classification resulting -from a large-scale data-mining and network approach. 
This new technology, which -in particular is designed to be suited to big data, is used to construct an -open consolidated database from raw data on 4 million patents taken from the US -patent office from 1976 onward. To build the pattern network, not only do we -look at each patent title, but we also examine the full abstract and extract -the relevant keywords accordingly. We refer to this classification as the -semantic approach, in contrast with the more common technological approach, -which takes the topology induced by the US Patent Office technological classes. -Moreover, we document that the two approaches have highly different topological -measures, and we find strong statistical evidence that they follow different -models. This suggests that our method is a useful tool to extract endogenous -information. -" -4052,1612.08543,Amir Hossein Akhavan Rahnama,Distributed Real-Time Sentiment Analysis for Big Data Social Streams,stat.ML cs.CL cs.DB cs.DC cs.IR," The big data trend has forced data-centric systems to handle continuous, fast -data streams. In recent years, real-time analytics on stream data has formed -into a new research field, which aims to answer queries about -what-is-happening-now with a negligible delay. The real challenge with -real-time stream data processing is that it is impossible to store instances of -data, and therefore online analytical algorithms are utilized. To perform -real-time analytics, pre-processing of data should be performed in a way that -only a short summary of the stream is stored in main memory. In addition, due -to the high speed of arrival, the average processing time for each instance of -data must be low enough that incoming instances are not lost. Lastly, the -learner needs to provide high analytical accuracy measures. Sentinel is a -distributed system written in Java that aims to solve this challenge by -enforcing both the processing and learning process to be done in distributed -form. Sentinel is built on top of Apache Storm, a distributed computing -platform. Sentinel's learner, the Vertical Hoeffding Tree, is a parallel -decision tree-learning algorithm based on the VFDT, with the ability to perform -parallel classification in distributed environments. Sentinel also uses -SpaceSaving to keep a summary of the data stream and stores its summary in a -synopsis data structure. The application of Sentinel to the Twitter Public -Stream API is shown and the results are discussed. -" -4053,1612.08989,"Yonatan Belinkov, Alexander Magidow, Maxim Romanov, Avi Shmidman, - Moshe Koppel",Shamela: A Large-Scale Historical Arabic Corpus,cs.CL," Arabic is a widely-spoken language with a rich and long history spanning more -than fourteen centuries. Yet existing Arabic corpora largely focus on the -modern period or lack sufficient diachronic information. We develop a -large-scale, historical corpus of Arabic of about 1 billion words from diverse -periods of time. We clean this corpus, process it with a morphological -analyzer, and enhance it by detecting parallel passages and automatically -dating undated texts. We demonstrate its utility with selected case-studies in -which we show its application to the digital humanities. -" -4054,1612.08994,"Peter Potash, Alexey Romanov and Anna Rumshisky",Here's My Point: Joint Pointer Architecture for Argument Mining,cs.CL," One of the major goals in automated argumentation mining is to uncover the -argument structure present in argumentative text.
In order to determine this -structure, one must understand how different individual components of the -overall argument are linked. General consensus in this field dictates that the -argument components form a hierarchy of persuasion, which manifests itself in a -tree structure. This work provides the first neural network-based approach to -argumentation mining, focusing on the two tasks of extracting links between -argument components, and classifying types of argument components. In order to -solve this problem, we propose to use a joint model that is based on a Pointer -Network architecture. A Pointer Network is appealing for this task for the -following reasons: 1) It takes into account the sequential nature of argument -components; 2) By construction, it enforces certain properties of the tree -structure present in argument relations; 3) The hidden representations can be -applied to auxiliary tasks. In order to extend the contribution of the original -Pointer Network model, we construct a joint model that simultaneously attempts -to learn the type of each argument component while continuing to predict links -between argument components. The proposed joint model achieves state-of-the-art -results on two separate evaluation corpora, achieving far superior performance -to a regular Pointer Network model. Our results show that optimizing for both -tasks, and adding a fully-connected layer prior to recurrent neural network -input, is crucial for high performance. -" -4055,1612.09113,"Jonathan Godwin, Pontus Stenetorp, Sebastian Riedel","Deep Semi-Supervised Learning with Linguistically Motivated Sequence - Labeling Task Hierarchies",cs.CL," In this paper we present a novel Neural Network algorithm for conducting -semi-supervised learning for sequence labeling tasks arranged in a -linguistically motivated hierarchy. This relationship is exploited to -regularise the representations of supervised tasks by backpropagating the error -of the unsupervised task through the supervised tasks. We introduce a neural -network where lower layers are supervised by junior downstream tasks and the -final layer task is an auxiliary unsupervised task. The architecture shows -improvements of up to two percentage points F1 for Chunking compared to a -plausible baseline. -" -4056,1612.09213,"Vladimir V. Bochkarev, Eduard Yu.Lerner and Anna V. Shevlyakova",Verifying Heaps' law using Google Books Ngram data,cs.CL physics.soc-ph," This article is devoted to the verification of the empirical Heaps law in -European languages using Google Books Ngram corpus data. The connection between -word distribution frequency and expected dependence of individual word number -on text size is analysed in terms of a simple probability model of text -generation. It is shown that the Heaps exponent varies significantly within -characteristic time intervals of 60-100 years. -" -4057,1612.09268,"Natalia Bezerra Mota, Sylvia Pinheiro, Mariano Sigman, Diego Fernandez - Slezak, Guillermo Cecchi, Mauro Copelli, Sidarta Ribeiro",The ontogeny of discourse structure mimics the development of literature,q-bio.NC cs.CL physics.soc-ph," Discourse varies with age, education, psychiatric state and historical epoch, -but the ontogenetic and cultural dynamics of discourse structure remain to be -quantitatively characterized. To this end we investigated word graphs obtained -from verbal reports of 200 subjects ages 2-58, and 676 literary texts spanning -~5,000 years.
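The word graphs in the discourse-structure entry above are easy to illustrate: nodes are word types and directed edges connect consecutive tokens, after which size and recurrence measures can be read off the graph. A toy sketch using networkx, assuming this simple consecutive-word construction rather than the authors' exact protocol:

    # Word graph: nodes are word types, directed edges link consecutive tokens.
    import networkx as nx

    def word_graph(tokens):
        g = nx.DiGraph()
        for a, b in zip(tokens, tokens[1:]):
            g.add_edge(a.lower(), b.lower())
        return g

    g = word_graph("the dog chased the cat and the cat ran".split())
    # Graph size; repeated edges (e.g. "the cat") reflect short-range recurrence.
    print(g.number_of_nodes(), g.number_of_edges())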
In healthy subjects, lexical diversity, graph size, and -long-range recurrence departed from initial near-random levels through a -monotonic asymptotic increase across ages, while short-range recurrence showed -a corresponding decrease. These changes were explained by education and suggest -a hierarchical development of discourse structure: short-range recurrence and -lexical diversity stabilize after elementary school, but graph size and -long-range recurrence only stabilize after high school. This gradual maturation -was blurred in psychotic subjects, who maintained in adulthood a near-random -structure. In literature, monotonic asymptotic changes over time were -remarkable: While lexical diversity, long-range recurrence and graph size -increased away from near-randomness, short-range recurrence declined, from -above to below random levels. Bronze Age texts are structurally similar to -childish or psychotic discourses, but subsequent texts converge abruptly to the -healthy adult pattern around the onset of the Axial Age (800-200 BC), a period -of pivotal cultural change. Thus, individually as well as historically, -discourse maturation increases the range of word recurrence away from -randomness. -" -4058,1612.09327,"Ahlam Ansari, Moonish Maknojia and Altamash Shaikh",Intelligent information extraction based on artificial neural network,cs.CL cs.AI," Question Answering System (QAS) is used for information retrieval and natural -language processing (NLP) to reduce human effort. Numerous QAS based on user -documents exist today, but they are all limited to providing objective answers -and can process only simple questions. Complex questions cannot -be answered by the existing QAS, as they require interpretation of the current -and old data as well as the question asked by the user. The above limitations -can be overcome by using deep cases and a neural network. Hence we propose a -modified QAS in which we create a deep artificial neural network with -associative memory from text documents. The modified QAS processes the contents -of the text document provided to it and finds the answer to even complex -questions in the documents. -" -4059,1612.09535,"Concei\c{c}\~ao Rocha, Al\'ipio Jorge, Roberta Sionara, Paula Brito, - Carlos Pimenta and Solange Rezende","PAMPO: using pattern matching and pos-tagging for effective Named - Entities recognition in Portuguese",cs.IR cs.CL," This paper deals with the entity extraction task (named entity recognition) -of a text mining process that aims at unveiling non-trivial semantic -structures, such as relationships and interaction between entities or -communities. In this paper we present a simple and efficient named entity -extraction algorithm. The method, named PAMPO (PAttern Matching and POs tagging -based algorithm for NER), relies on flexible pattern matching, part-of-speech -tagging and lexical-based rules. It was developed to process texts written in -Portuguese; however, it is potentially applicable to other languages as well. - We compare our approach with current alternatives that support Named Entity -Recognition (NER) for content written in Portuguese. These are Alchemy, Zemanta -and Rembrandt. Evaluation of the efficacy of the entity extraction method on -several texts written in Portuguese indicates a considerable improvement on -$recall$ and $F_1$ measures. -" -4060,1612.09542,"Licheng Yu, Hao Tan, Mohit Bansal, Tamara L.
Berg",A Joint Speaker-Listener-Reinforcer Model for Referring Expressions,cs.CV cs.AI cs.CL," Referring expressions are natural language constructions used to identify -particular objects within a scene. In this paper, we propose a unified -framework for the tasks of referring expression comprehension and generation. -Our model is composed of three modules: speaker, listener, and reinforcer. The -speaker generates referring expressions, the listener comprehends referring -expressions, and the reinforcer introduces a reward function to guide sampling -of more discriminative expressions. The listener-speaker modules are trained -jointly in an end-to-end learning framework, allowing the modules to be aware -of one another during learning while also benefiting from the discriminative -reinforcer's feedback. We demonstrate that this unified framework and training -achieves state-of-the-art results for both comprehension and generation on -three referring expression datasets. Project and demo page: -https://vision.cs.unc.edu/refer -" -4061,1612.09574,Massimiliano Dal Mas,"Automatic Data Deformation Analysis on Evolving Folksonomy Driven - Environment",cs.IR cs.CL cs.CY cs.SI," The Folksodriven framework makes it possible for data scientists to define an -ontology environment where searching for buried patterns that have some kind of -predictive power to build predictive models more effectively. It accomplishes -this through an abstractions that isolate parameters of the predictive modeling -process searching for patterns and designing the feature set, too. To reflect -the evolving knowledge, this paper considers ontologies based on folksonomies -according to a new concept structure called ""Folksodriven"" to represent -folksonomies. So, the studies on the transformational regulation of the -Folksodriven tags are regarded to be important for adaptive folksonomies -classifications in an evolving environment used by Intelligent Systems to -represent the knowledge sharing. Folksodriven tags are used to categorize -salient data points so they can be fed to a machine-learning system and -""featurizing"" the data. -" -4062,1701.00066,Sree Harsha Ramesh and Raveena R Kumar,"A POS Tagger for Code Mixed Indian Social Media Text - ICON-2016 NLP - Tools Contest Entry from Surukam",cs.CL," Building Part-of-Speech (POS) taggers for code-mixed Indian languages is a -particularly challenging problem in computational linguistics due to a dearth -of accurately annotated training corpora. ICON, as part of its NLP tools -contest has organized this challenge as a shared task for the second -consecutive year to improve the state-of-the-art. This paper describes the POS -tagger built at Surukam to predict the coarse-grained and fine-grained POS tags -for three language pairs - Bengali-English, Telugu-English and Hindi-English, -with the text spanning three popular social media platforms - Facebook, -WhatsApp and Twitter. We employed Conditional Random Fields as the sequence -tagging algorithm and used a library called sklearn-crfsuite - a thin wrapper -around CRFsuite for training our model. Among the features we used include - -character n-grams, language information and patterns for emoji, number, -punctuation and web-address. Our submissions in the constrained -environment,i.e., without making any use of monolingual POS taggers or the -like, obtained an overall average F1-score of 76.45%, which is comparable to -the 2015 winning score of 76.79%. 
-" -4063,1701.00138,"Jun Suzuki, Masaaki Nagata","Cutting-off Redundant Repeating Generations for Neural Abstractive - Summarization",cs.CL cs.AI stat.ML," This paper tackles the reduction of redundant repeating generation that is -often observed in RNN-based encoder-decoder models. Our basic idea is to -jointly estimate the upper-bound frequency of each target vocabulary in the -encoder and control the output words based on the estimation in the decoder. -Our method shows significant improvement over a strong RNN-based -encoder-decoder baseline and achieved its best results on an abstractive -summarization benchmark. -" -4064,1701.00145,"Silvio Amir, R\'amon Astudillo, Wang Ling, Paula C. Carvalho, M\'ario - J. Silva","Expanding Subjective Lexicons for Social Media Mining with Embedding - Subspaces",cs.CL," Recent approaches for sentiment lexicon induction have capitalized on -pre-trained word embeddings that capture latent semantic properties. However, -embeddings obtained by optimizing performance of a given task (e.g. predicting -contextual words) are sub-optimal for other applications. In this paper, we -address this problem by exploiting task-specific representations, induced via -embedding sub-space projection. This allows us to expand lexicons describing -multiple semantic properties. For each property, our model jointly learns -suitable representations and the concomitant predictor. Experiments conducted -over multiple subjective lexicons, show that our model outperforms previous -work and other baselines; even in low training data regimes. Furthermore, -lexicon-based sentiment classifiers built on top of our lexicons outperform -similar resources and yield performances comparable to those of supervised -models. -" -4065,1701.00168,Jan \v{S}najder,"Social Media Argumentation Mining: The Quest for Deliberateness in - Raucousness",cs.CL," Argumentation mining from social media content has attracted increasing -attention. The task is both challenging and rewarding. The informal nature of -user-generated content makes the task dauntingly difficult. On the other hand, -the insights that could be gained by a large-scale analysis of social media -argumentation make it a very worthwhile task. In this position paper I discuss -the motivation for social media argumentation mining, as well as the tasks and -challenges involved. -" -4066,1701.00185,"Jiaming Xu, Bo Xu, Peng Wang, Suncong Zheng, Guanhua Tian, Jun Zhao, - Bo Xu",Self-Taught Convolutional Neural Networks for Short Text Clustering,cs.IR cs.CL," Short text clustering is a challenging problem due to its sparseness of text -representation. Here we propose a flexible Self-Taught Convolutional neural -network framework for Short Text Clustering (dubbed STC^2), which can flexibly -and successfully incorporate more useful semantic features and learn non-biased -deep text representation in an unsupervised manner. In our framework, the -original raw text features are firstly embedded into compact binary codes by -using one existing unsupervised dimensionality reduction methods. Then, word -embeddings are explored and fed into convolutional neural networks to learn -deep feature representations, meanwhile the output units are used to fit the -pre-trained binary codes in the training process. Finally, we get the optimal -clusters by employing K-means to cluster the learned representations. 
Extensive -experimental results demonstrate that the proposed framework is effective, -flexible and outperforms several popular clustering methods when tested on -three public short text datasets. -" -4067,1701.00188,"Yuan Zhang, Regina Barzilay, Tommi Jaakkola",Aspect-augmented Adversarial Networks for Domain Adaptation,cs.CL," We introduce a neural method for transfer learning between two (source and -target) classification tasks or aspects over the same domain. Rather than -training on target labels, we use a few keywords pertaining to source and -target aspects indicating sentence relevance instead of document class labels. -Documents are encoded by learning to embed and softly select relevant sentences -in an aspect-dependent manner. A shared classifier is trained on the source -encoded documents and labels, and applied to target encoded documents. We -ensure transfer through aspect-adversarial training so that encoded documents -are, as sets, aspect-invariant. Experimental results demonstrate that our -approach outperforms different baselines and model variants on two datasets, -yielding an improvement of 27% on a pathology dataset and 5% on a review -dataset. -" -4068,1701.00289,"David J.P. O'Sullivan and Guillermo Gardu\~no-Hern\'andez and James P. - Gleeson and Mariano Beguerisse-D\'iaz","Integrating sentiment and social structure to determine preference - alignments: The Irish Marriage Referendum",cs.SI cs.CL cs.IR physics.soc-ph," We examine the relationship between social structure and sentiment through -the analysis of a large collection of tweets about the Irish Marriage -Referendum of 2015. We obtain the sentiment of every tweet with the hashtags -#marref and #marriageref that was posted in the days leading to the referendum, -and construct networks to aggregate sentiment and use it to study the -interactions among users. Our results show that the sentiment of mention tweets -posted by users is correlated with the sentiment of received mentions, and -there are significantly more connections between users with similar sentiment -scores than among users with opposite scores in the mention and follower -networks. We combine the community structure of the two networks with the -activity level of the users and sentiment scores to find groups of users who -support voting `yes' or `no' in the referendum. There were numerous -conversations between users on opposing sides of the debate in the absence of -follower connections, which suggests that there were efforts by some users to -establish dialogue and debate across ideological divisions. Our analysis shows -that social structure can be integrated successfully with sentiment to analyse -and understand the disposition of social media users. These results have -potential applications in the integration of data and meta-data to study -opinion dynamics, public opinion modelling, and polling. -" -4069,1701.00504,"Peter Krejzl, Barbora Hourov\'a, Josef Steinberger",Stance detection in online discussions,cs.CL," This paper describes our system created to detect stance in online -discussions. The goal is to identify whether the author of a comment is in -favor of the given target or against. Our approach is based on a maximum -entropy classifier, which uses surface-level, sentiment and domain-specific -features. The system was originally developed to detect stance in English -tweets. We adapted it to process Czech news commentaries.
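The stance detector described above is a maximum entropy classifier, which in scikit-learn terms is multinomial logistic regression. A minimal sketch with bag-of-words TF-IDF features only (the entry also mentions sentiment and domain-specific features; the toy data is invented):

    # Maximum-entropy (logistic regression) stance classification sketch.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["I fully support this", "this is a terrible idea",
             "great proposal", "strongly against it"]
    stances = ["favor", "against", "favor", "against"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(texts, stances)
    print(clf.predict(["I am in favor of this"]))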
-" -4070,1701.00562,"Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li and Yifan Gong",End-to-End Attention based Text-Dependent Speaker Verification,cs.CL stat.ML," A new type of End-to-End system for text-dependent speaker verification is -presented in this paper. Previously, using the phonetically -discriminative/speaker discriminative DNNs as feature extractors for speaker -verification has shown promising results. The extracted frame-level (DNN -bottleneck, posterior or d-vector) features are equally weighted and aggregated -to compute an utterance-level speaker representation (d-vector or i-vector). In -this work we use speaker discriminative CNNs to extract the noise-robust -frame-level features. These features are smartly combined to form an -utterance-level speaker vector through an attention mechanism. The proposed -attention model takes the speaker discriminative information and the phonetic -information to learn the weights. The whole system, including the CNN and -attention model, is joint optimized using an end-to-end criterion. The training -algorithm imitates exactly the evaluation process --- directly mapping a test -utterance and a few target speaker utterances into a single verification score. -The algorithm can automatically select the most similar impostor for each -target speaker to train the network. We demonstrated the effectiveness of the -proposed end-to-end system on Windows $10$ ""Hey Cortana"" speaker verification -task. -" -4071,1701.00576,"Huijia Wu, Jiajun Zhang, Chengqing Zong",Shortcut Sequence Tagging,cs.CL," Deep stacked RNNs are usually hard to train. Adding shortcut connections -across different layers is a common way to ease the training of stacked -networks. However, extra shortcuts make the recurrent step more complicated. To -simply the stacked architecture, we propose a framework called shortcut block, -which is a marriage of the gating mechanism and shortcuts, while discarding the -self-connected part in LSTM cell. We present extensive empirical experiments -showing that this design makes training easy and improves generalization. We -propose various shortcut block topologies and compositions to explore its -effectiveness. Based on this architecture, we obtain a 6% relatively -improvement over the state-of-the-art on CCGbank supertagging dataset. We also -get comparable results on POS tagging task. -" -4072,1701.00660,Dan Marsden (University of Oxford),Ambiguity and Incomplete Information in Categorical Models of Language,cs.LO cs.CL math.CT," We investigate notions of ambiguity and partial information in categorical -distributional models of natural language. Probabilistic ambiguity has -previously been studied using Selinger's CPM construction. This construction -works well for models built upon vector spaces, as has been shown in quantum -computational applications. Unfortunately, it doesn't seem to provide a -satisfactory method for introducing mixing in other compact closed categories -such as the category of sets and binary relations. We therefore lack a uniform -strategy for extending a category to model imprecise linguistic information. - In this work we adopt a different approach. We analyze different forms of -ambiguous and incomplete information, both with and without quantitative -probabilistic data. Each scheme then corresponds to a suitable enrichment of -the category in which we model language. 
We view different monads as -encapsulating the informational behaviour of interest, by analogy with their -use in modelling side effects in computation. Previous results of Jacobs then -allow us to systematically construct suitable bases for enrichment. - We show that we can freely enrich arbitrary dagger compact closed categories -in order to capture all the phenomena of interest, whilst retaining the -important dagger compact closed structure. This allows us to construct a model -with real convex combination of binary relations that makes non-trivial use of -the scalars. Finally we relate our various different enrichments, showing that -finite subconvex algebra enrichment covers all the effects under consideration. -" -4073,1701.00728,"Pashutan Modaresi, Philipp Gross, Siavash Sefidrodi, Mirja Eckhof, - Stefan Conrad","On (Commercial) Benefits of Automatic Text Summarization Systems in the - News Domain: A Case of Media Monitoring and Media Response Analysis",cs.CL," In this work, we present the results of a systematic study to investigate the -(commercial) benefits of automatic text summarization systems in a real-world -scenario. More specifically, we define a use case in the context of media -monitoring and media response analysis and claim that even using a simple -query-based extractive approach can dramatically save the processing time of -the employees without significantly reducing the quality of their work. -" -4074,1701.00749,Christophe Van Gysel and Evangelos Kanoulas and Maarten de Rijke,Pyndri: a Python Interface to the Indri Search Engine,cs.IR cs.CL," We introduce pyndri, a Python interface to the Indri search engine. Pyndri -allows access to Indri indexes from Python at two levels: (1) the dictionary and -tokenized document collection, and (2) evaluating queries on the index. We hope -that with the release of pyndri, we will stimulate reproducible, open and -fast-paced IR research. -" -4075,1701.00798,"Amir Hossein Yazdavar, Monireh Ebrahimi, Naomie Salim",Fuzzy Based Implicit Sentiment Analysis on Quantitative Sentences,cs.CL," With the rapid growth of social media on the web, emotional polarity -computation has become a flourishing frontier in the text mining community. -However, it is challenging to understand the latest trends and summarize the -state or general opinions about products due to the big diversity and size of -social media data, and this creates the need for automated and real-time opinion -extraction and mining. On the other hand, the bulk of current research has been -devoted to studying subjective sentences which contain opinion keywords, and -limited work has been reported for objective statements that imply sentiment. -In this paper, a fuzzy-based knowledge engineering model has been developed for -sentiment classification of a special group of such sentences, including the -change or deviation from a desired range or value. Drug reviews are a rich -source of such statements. Therefore, in this research, some experiments were -carried out on patients' reviews on several different cholesterol-lowering -drugs to determine their sentiment polarity. The main conclusion of this -study is that, in order to increase the accuracy of existing drug opinion -mining systems, objective sentences which imply opinion should be taken into -account. Our experimental results demonstrate that our proposed model obtains -over 72 percent F1 value.
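A toy sketch of the fuzzy idea behind such models: the polarity of a quantitative statement is read off the degree to which the reported value falls inside a desired range. The trapezoidal membership function and the range bounds below are invented for illustration, not taken from the paper:

# Hedged sketch: fuzzy membership of a measured value (e.g. an LDL reading
# in a drug review) in an assumed desired range, with linear shoulders.
def in_desired_range(value, low=70.0, high=130.0, slack=20.0):
    """Degree (0..1) to which `value` lies in [low, high]; bounds are invented."""
    if low <= value <= high:
        return 1.0
    if value < low:
        return max(0.0, 1.0 - (low - value) / slack)
    return max(0.0, 1.0 - (value - high) / slack)

positive_degree = in_desired_range(90.0)         # 1.0 -> clearly positive statement
negative_degree = 1.0 - in_desired_range(160.0)  # 1.0 -> clearly negative statement
print(positive_degree, negative_degree)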
-" -4076,1701.00851,Herman Kamper,"Unsupervised neural and Bayesian models for zero-resource speech - processing",cs.CL cs.LG," In settings where only unlabelled speech data is available, zero-resource -speech technology needs to be developed without transcriptions, pronunciation -dictionaries, or language modelling text. There are two central problems in -zero-resource speech processing: (i) finding frame-level feature -representations which make it easier to discriminate between linguistic units -(phones or words), and (ii) segmenting and clustering unlabelled speech into -meaningful units. In this thesis, we argue that a combination of top-down and -bottom-up modelling is advantageous in tackling these two problems. - To address the problem of frame-level representation learning, we present the -correspondence autoencoder (cAE), a neural network trained with weak top-down -supervision from an unsupervised term discovery system. By combining this -top-down supervision with unsupervised bottom-up initialization, the cAE yields -much more discriminative features than previous approaches. We then present our -unsupervised segmental Bayesian model that segments and clusters unlabelled -speech into hypothesized words. By imposing a consistent top-down segmentation -while also using bottom-up knowledge from detected syllable boundaries, our -system outperforms several others on multi-speaker conversational English and -Xitsonga speech data. Finally, we show that the clusters discovered by the -segmental Bayesian model can be made less speaker- and gender-specific by using -features from the cAE instead of traditional acoustic features. - In summary, the different models and systems presented in this thesis show -that both top-down and bottom-up modelling can improve representation learning, -segmentation and clustering of unlabelled speech data. -" -4077,1701.00874,"Xuezhe Ma, Eduard Hovy",Neural Probabilistic Model for Non-projective MST Parsing,cs.CL cs.LG stat.ML," In this paper, we propose a probabilistic parsing model, which defines a -proper conditional probability distribution over non-projective dependency -trees for a given sentence, using neural representations as inputs. The neural -network architecture is based on bi-directional LSTM-CNNs which benefits from -both word- and character-level representations automatically, by using -combination of bidirectional LSTM and CNN. On top of the neural network, we -introduce a probabilistic structured layer, defining a conditional log-linear -model over non-projective trees. We evaluate our model on 17 different -datasets, across 14 different languages. By exploiting Kirchhoff's Matrix-Tree -Theorem (Tutte, 1984), the partition functions and marginals can be computed -efficiently, leading to a straight-forward end-to-end model training procedure -via back-propagation. Our parser achieves state-of-the-art parsing performance -on nine datasets. -" -4078,1701.00946,"Ryan Cotterell and Hinrich Sch\""utze",Joint Semantic Synthesis and Morphological Analysis of the Derived Word,cs.CL," Much like sentences are composed of words, words themselves are composed of -smaller units. For example, the English word questionably can be analyzed as -question+able+ly. However, this structural decomposition of the word does not -directly give us a semantic representation of the word's meaning. Since -morphology obeys the principle of compositionality, the semantics of the word -can be systematically derived from the meaning of its parts. 
In this work, we -propose a novel probabilistic model of word formation that captures both the -analysis of a word w into its constituent segments and the synthesis of the -meaning of w from the meanings of those segments. Our model jointly learns to -segment words into morphemes and compose distributional semantic vectors of -those morphemes. We experiment with the model on English CELEX data and German -DerivBase (Zeller et al., 2013) data. We show that jointly modeling semantics -increases both segmentation accuracy and morpheme F1 by between 3% and 5%. -Additionally, we investigate different models of vector composition, showing -that recurrent neural networks yield an improvement over simple additive -models. Finally, we study the degree to which the representations correspond to -a linguist's notion of morphological productivity. -" -4079,1701.00991,"Christoph Hube, Frank Fischer, Robert J\""aschke, Gerhard Lauer, Mads - Rosendahl Thomsen","World Literature According to Wikipedia: Introduction to a DBpedia-Based - Framework",cs.IR cs.CL," Among the manifold takes on world literature, it is our goal to contribute to -the discussion from a digital point of view by analyzing the representation of -world literature in Wikipedia with its millions of articles in hundreds of -languages. As a preliminary, we introduce and compare three different -approaches to identify writers on Wikipedia using data from DBpedia, a -community project with the goal of extracting and providing structured -information from Wikipedia. Equipped with our basic set of writers, we analyze -how they are represented throughout the 15 biggest Wikipedia language versions. -We combine intrinsic measures (mostly examining the connectedness of articles) -with extrinsic ones (analyzing how often articles are frequented by readers) -and develop methods to evaluate our results. The better part of our findings -seems to convey a rather conservative, old-fashioned version of world -literature, but a version derived from reproducible facts revealing an implicit -literary canon based on the editing and reading behavior of millions of people. -While still having to solve some known issues, the introduced methods will help -us build an observatory of world literature to further investigate its -representativeness and biases. -" -4080,1701.01126,"Kai Zhao, Liang Huang, Mingbo Ma",Textual Entailment with Structured Attentions and Composition,cs.CL," Deep learning techniques are increasingly popular in the textual entailment -task, overcoming the fragility of traditional discrete models with hard -alignments and logics. In particular, the recently proposed attention models -(Rockt\""aschel et al., 2015; Wang and Jiang, 2015) achieve state-of-the-art -accuracy by computing soft word alignments between the premise and hypothesis -sentences. However, there remains a major limitation: this line of work -completely ignores syntax and recursion, which is helpful in many traditional -efforts. We show that it is beneficial to extend the attention model to tree -nodes between premise and hypothesis. More importantly, this subtree-level -attention reveals information about the entailment relation. We study the recursive -composition of this subtree-level entailment relation, which can be viewed as a -soft version of the Natural Logic framework (MacCartney and Manning, 2009).
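A numpy sketch of the soft word alignment such attention models compute: dot-product scores between premise and hypothesis token vectors, normalized row-wise by a softmax. The embeddings are random stand-ins, and the tree-level extension described above is not modeled:

# Hedged sketch: soft attention alignment between two token sequences.
import numpy as np

rng = np.random.default_rng(0)
premise = rng.normal(size=(5, 8))     # 5 premise tokens, 8-dim embeddings
hypothesis = rng.normal(size=(4, 8))  # 4 hypothesis tokens

scores = premise @ hypothesis.T                # (5, 4) alignment scores
scores -= scores.max(axis=1, keepdims=True)    # numerical stabilization
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
aligned = attn @ hypothesis                    # soft hypothesis context per premise token
print(attn.shape, aligned.shape)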
-Experiments show that our structured attention and entailment composition model -can correctly identify and infer entailment relations from the bottom up, and -bring significant improvements in accuracy. -" -4081,1701.01417,Pranav Agrawal,Exploration of Proximity Heuristics in Length Normalization,cs.IR cs.CL," Ranking functions used in information retrieval are primarily used in -search engines, and they are often adopted for various language processing -applications. However, features used in the construction of ranking functions -should be analyzed before applying them to a data set. This paper gives -guidelines on the construction of generalized ranking functions with -application-dependent features. The paper prescribes a specific case of a -generalized function for a recommendation system using feature engineering -guidelines on the given data set. The behavior of both the generalized and specific -functions is studied, and both are implemented on unstructured textual data. The -proximity-feature-based ranking function outperformed regular -BM25 by 52%. -" -4082,1701.01505,"Da Kuang, P. Jeffrey Brantingham, Andrea L. Bertozzi",Crime Topic Modeling,cs.CL," The classification of crime into discrete categories entails a massive loss -of information. Crimes emerge out of a complex mix of behaviors and situations, -yet most of these details cannot be captured by singular crime type labels. -This information loss impacts our ability to not only understand the causes of -crime, but also how to develop optimal crime prevention strategies. We apply -machine learning methods to short narrative text descriptions accompanying -crime records with the goal of discovering ecologically more meaningful latent -crime classes. We term these latent classes ""crime topics"" in reference to -text-based topic modeling methods that produce them. We use topic distributions -to measure clustering among formally recognized crime types. Crime topics -replicate broad distinctions between violent and property crime, but also -reveal nuances linked to target characteristics, situational conditions and the -tools and methods of attack. Formal crime types are not discrete in topic -space. Rather, crime types are distributed across a range of crime topics. -Similarly, individual crime topics are distributed across a range of formal -crime types. Key ecological groups include identity theft, shoplifting, -burglary and theft, car crimes and vandalism, criminal threats and confidence -crimes, and violent crimes. Though not a replacement for formal legal crime -classifications, crime topics provide a unique window into the heterogeneous -causal processes underlying crime. -" -4083,1701.01565,"Edison Marrese-Taylor, Yutaka Matsuo",Replication issues in syntax-based aspect extraction for opinion mining,cs.CL," Reproducing experiments is an important instrument to validate previous work -and build upon existing approaches. It has been tackled numerous times in -different areas of science. In this paper, we introduce an empirical -replicability study of three well-known algorithms for syntax-centric -aspect-based opinion mining. We show that reproducing results continues to be a -difficult endeavor, mainly due to the lack of details regarding preprocessing -and parameter setting, as well as due to the absence of available -implementations that clarify these details.
We consider these to be important -threats to the validity of research in the field, specifically when compared to -other problems in NLP where public datasets and code availability are critical -validity components. We conclude by encouraging code-based research, which we -think has a key role in helping researchers to understand the meaning of the -state-of-the-art better and to generate continuous advances. -" -4084,1701.01574,"Haoyue Shi, Caihua Li and Junfeng Hu","Real Multi-Sense or Pseudo Multi-Sense: An Approach to Improve Word - Representation",cs.CL," Previous research has shown that learning multiple representations for -polysemous words can improve the performance of word embeddings on many tasks. -However, this leads to another problem. Several vectors of a word may actually -point to the same meaning, namely pseudo multi-sense. In this paper, we -introduce the concept of pseudo multi-sense, and then propose an algorithm to -detect such cases. With the consideration of the detected pseudo multi-sense -cases, we try to refine the existing word embeddings to eliminate the influence -of pseudo multi-sense. Moreover, we apply our algorithm to previously released -multi-sense word embeddings and test it on artificial word similarity tasks -and the analogy task. The result of the experiments shows that diminishing -pseudo multi-sense can improve the quality of word representations. Thus, our -method is actually an efficient way to reduce linguistic complexity. -" -4085,1701.01614,"Tsutomu Hirao, Masaaki Nishino, Jun Suzuki, Masaaki Nagata",Enumeration of Extractive Oracle Summaries,cs.CL," To analyze the limitations and the future directions of the extractive -summarization paradigm, this paper proposes an Integer Linear Programming (ILP) -formulation to obtain extractive oracle summaries in terms of ROUGE-N. We also -propose an algorithm that enumerates all of the oracle summaries for a set of -reference summaries to exploit F-measures that evaluate which system summaries -contain how many sentences that are extracted as an oracle summary. Our -experimental results obtained from Document Understanding Conference (DUC) -corpora demonstrated the following: (1) room still exists to improve the -performance of extractive summarization; (2) the F-measures derived from the -enumerated oracle summaries have significantly stronger correlations with human -judgment than those derived from single oracle summaries. -" -4086,1701.01623,Michael Sejr Schlichtkrull and Anders S{\o}gaard,"Cross-Lingual Dependency Parsing with Late Decoding for Truly - Low-Resource Languages",cs.CL," In cross-lingual dependency annotation projection, information is often lost -during transfer because of early decoding. We present an end-to-end graph-based -neural network dependency parser that can be trained to reproduce matrices of -edge scores, which can be directly projected across word alignments. We show -that our approach to cross-lingual dependency parsing is not only simpler, but -also achieves an absolute improvement of 2.25% averaged across 10 languages -compared to the previous state of the art. -" -4087,1701.01811,"Filippos Kokkinos, Alexandros Potamianos",Structural Attention Neural Networks for improved sentiment analysis,cs.CL cs.NE," We introduce a tree-structured attention neural network for sentences and -small phrases and apply it to the problem of sentiment classification.
Our -model expands the current recursive models by incorporating structural -information around a node of a syntactic tree using both bottom-up and top-down -information propagation. Also, the model utilizes structural attention to -identify the most salient representations during the construction of the -syntactic tree. To our knowledge, the proposed models achieve state-of-the-art -performance on the Stanford Sentiment Treebank dataset. -" -4088,1701.01854,"Mohaddeseh Bastan, Shahram Khadivi, Mohammad Mehdi Homayounpour","Neural Machine Translation on Scarce-Resource Condition: A case-study on - Persian-English",cs.CL," Neural Machine Translation (NMT) is a new approach for Machine Translation -(MT), and due to its success, it has attracted the attention of many researchers -in the field. In this paper, we study the NMT model on the Persian-English language -pair, to analyze the model and investigate the appropriateness of the model -for scarce-resourced scenarios, the situation that exists for Persian-centered -translation systems. We adjust the model for the Persian language and find the -best parameters and hyperparameters for two tasks: translation and -transliteration. We also apply some preprocessing tasks to the Persian dataset, -which yields an increase of about one BLEU point. Also, we -have modified the loss function to enhance the word alignment of the model. -This new loss function yields a total improvement of 1.87 BLEU points in -translation quality. -" -4089,1701.01908,"Fan Xu, Mingwen Wang and Maoxi Li",Sentence-level dialects identification in the greater China region,cs.CL," Identifying the different varieties of the same language is more challenging -than identifying unrelated languages. In this paper, we propose an approach -to discriminate language varieties or dialects of Mandarin Chinese for -Mainland China, Hong Kong, Taiwan, Macao, Malaysia and Singapore, a.k.a., the -Greater China Region (GCR). When applied to dialect identification in the -GCR, we find that the commonly used character-level or word-level uni-gram -feature is not very effective, since there exist several specific problems such -as the ambiguity and context-dependent characteristic of words in the dialects -of the GCR. To overcome these challenges, we use not only the general features -like character-level n-gram, but also many new word-level features, including -PMI-based and word alignment-based features. A series of evaluation results on -both the news and open-domain dataset from Wikipedia show the effectiveness of -the proposed approach. -" -4090,1701.02025,"Yadollah Yaghoobzadeh and Hinrich Sch\""utze","Multi-level Representations for Fine-Grained Typing of Knowledge Base - Entities",cs.CL cs.AI," Entities are essential elements of natural language. In this paper, we -present methods for learning multi-level representations of entities on three -complementary levels: character (character patterns in entity names extracted, -e.g., by neural networks), word (embeddings of words in entity names) and -entity (entity embeddings). We investigate state-of-the-art learning methods on -each level and find large differences, e.g., for deep learning models, -traditional ngram features and the subword model of fasttext (Bojanowski et -al., 2016) on the character level; for word2vec (Mikolov et al., 2013) on the -word level; and for the order-aware model wang2vec (Ling et al., 2015a) on the -entity level.
We confirm experimentally that each level of representation -contributes complementary information, and a joint representation of all three -levels improves the existing embedding-based baseline for fine-grained entity -typing by a large margin. Additionally, we show that adding information from -entity descriptions further improves multi-level representations of entities. -" -4091,1701.02073,"Weinan Zhang, Ting Liu, Yifa Wang, Qingfu Zhu",Neural Personalized Response Generation as Domain Adaptation,cs.CL," In this paper, we focus on personalized response generation for -conversational systems. Based on sequence-to-sequence learning, especially -the encoder-decoder framework, we propose a two-phase approach, namely -initialization then adaptation, to model the responding style of a human and then -generate personalized responses. For evaluation, we propose a novel human-aided -method to evaluate the performance of the personalized response generation -models by online real-time conversation and offline human judgement. Moreover, -the lexical divergence of the responses generated by the 5 personalized models -indicates that the proposed two-phase approach achieves good results on -modeling the responding style of a human and generating personalized responses -for the conversational systems. -" -4092,1701.02149,"Wenpeng Yin and Hinrich Sch\""utze","Task-Specific Attentive Pooling of Phrase Alignments Contributes to - Sentence Matching",cs.CL," This work studies comparatively two typical sentence matching tasks: textual -entailment (TE) and answer selection (AS), observing that weaker phrase -alignments are more critical in TE, while stronger phrase alignments deserve -more attention in AS. The key to reaching this observation lies in phrase -detection, phrase representation, phrase alignment, and more importantly how to -connect those aligned phrases of different matching degrees with the final -classifier. Prior work (i) has limitations in phrase generation and -representation, or (ii) conducts alignment at word and phrase levels by -handcrafted features or (iii) utilizes a single framework of alignment without -considering the characteristics of specific tasks, which limits the framework's -effectiveness across tasks. We propose an architecture based on the Gated Recurrent -Unit that supports (i) representation learning of phrases of arbitrary -granularity and (ii) task-specific attentive pooling of phrase alignments -between two sentences. Experimental results on TE and AS match our observation -and show the effectiveness of our approach. -" -4093,1701.02163,Valentina Franzoni,"Just an Update on PMING Distance for Web-based Semantic Similarity in - Artificial Intelligence and Data Mining",cs.AI cs.CL cs.IR math.PR," One of the main problems that emerges in the classic approach to semantics is -the difficulty in acquisition and maintenance of ontologies and semantic -annotations. On the other hand, the Internet explosion and the massive -diffusion of mobile smart devices lead to the creation of a worldwide system, -whose information is checked and fueled daily by the contributions of millions -of users who interact in a collaborative way. Search engines, continually -exploring the Web, are a natural source of information on which to base a -modern approach to semantic annotation.
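For reference, the two quantities the PMING Distance discussed next combines, Pointwise Mutual Information and the Normalized Google Distance (Cilibrasi and Vitanyi), can both be computed from nothing but search hit counts; a sketch, with made-up placeholder counts:

# Hedged sketch: PMI and NGD from hit counts f(x), f(y), f(x,y) and index size n.
from math import log

def pmi(fx, fy, fxy, n):
    """Pointwise Mutual Information from co-occurrence counts."""
    return log((n * fxy) / (fx * fy))

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from the same counts."""
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(n) - min(lx, ly))

fx, fy, fxy, n = 120_000, 80_000, 25_000, 50_000_000  # placeholder counts
print(pmi(fx, fy, fxy, n), ngd(fx, fy, fxy, n))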
A promising idea is that it is possible -to generalize the semantic similarity, under the assumption that semantically -similar terms behave similarly, and define collaborative proximity measures -based on the indexing information returned by search engines. The PMING -Distance is a proximity measure used in data mining and information retrieval, -whose collaborative information expresses the degree of relationship between two -terms, using only the number of documents returned as the result of a query on a -search engine. In this work, the PMING Distance is updated, providing a novel -formal algebraic definition, which corrects previous works. The novel point of -view underlines that the PMING is a locally normalized linear -combination of the Pointwise Mutual Information and the Normalized Google Distance. -The analyzed measure dynamically reflects the collaborative changes made to -web resources. -" -4094,1701.02185,"Anca Dumitrache, Lora Aroyo, Chris Welty",Crowdsourcing Ground Truth for Medical Relation Extraction,cs.CL cs.HC," Cognitive computing systems require human-labeled data for evaluation, and -often for training. The standard practice used in gathering this data minimizes -disagreement between annotators, and we have found this results in data that -fails to account for the ambiguity inherent in language. We have proposed the -CrowdTruth method for collecting ground truth through crowdsourcing, which -reconsiders the role of people in machine learning based on the observation -that disagreement between annotators provides a useful signal for phenomena -such as ambiguity in the text. We report on using this method to build an -annotated data set for medical relation extraction for the $cause$ and $treat$ -relations, and how this data performed in a supervised training experiment. We -demonstrate that by modeling ambiguity, labeled data gathered from crowd -workers can (1) reach the level of quality of domain experts for this task -while reducing the cost, and (2) provide better training data at scale than -distant supervision. We further propose and validate new weighted measures for -precision, recall, and F-measure, that account for ambiguity in both human and -machine performance on this task. -" -4095,1701.02477,"Abhinav Thanda, Shankar M Venkatesan","Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic - Speech Recognition",cs.CL cs.AI cs.CV cs.LG," Multi-task learning (MTL) involves the simultaneous training of two or more -related tasks over shared representations. In this work, we apply MTL to -audio-visual automatic speech recognition (AV-ASR). Our primary task is to learn -a mapping between audio-visual fused features and frame labels obtained from an -acoustic GMM/HMM model. This is combined with an auxiliary task which maps -visual features to frame labels obtained from a separate visual GMM/HMM model. -The MTL model is tested at various levels of babble noise and the results are -compared with a baseline hybrid DNN-HMM AV-ASR model. Our results indicate -that MTL is especially useful at higher levels of noise. Compared to the baseline, -up to 7\% relative improvement in WER is reported at -3 dB SNR. -" -4096,1701.02481,Yang Xu and Jiawei Liu,Implicitly Incorporating Morphological Information into Word Embedding,cs.CL cs.LG," In this paper, we propose three novel models to enhance word embedding by -implicitly using morphological information.
Experiments on word similarity and -syntactic analogy show that the implicit models are superior to traditional -explicit ones. Our models outperform all state-of-the-art baselines and -significantly improve the performance on both tasks. Moreover, our performance -on the smallest corpus is similar to the performance of CBOW on the corpus -which is five times the size of ours. Parameter analysis indicates that the -implicit models can supplement semantic information during the word embedding -training process. -" -4097,1701.02593,"Diego Marcheggiani, Anton Frolov, Ivan Titov","A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based - Semantic Role Labeling",cs.CL cs.AI," We introduce a simple and accurate neural model for dependency-based semantic -role labeling. Our model predicts predicate-argument dependencies relying on -states of a bidirectional LSTM encoder. The semantic role labeler achieves -competitive performance on English, even without any kind of syntactic -information and only using local inference. However, when automatically -predicted part-of-speech tags are provided as input, it substantially -outperforms all previous local models and approaches the best reported results -on the English CoNLL-2009 dataset. We also consider Chinese, Czech and Spanish -where our approach also achieves competitive results. Syntactic parsers are -unreliable on out-of-domain data, so standard (i.e., syntactically-informed) -SRL models are hindered when tested in this setting. Our syntax-agnostic model -appears more robust, resulting in the best reported results on standard -out-of-domain test sets. -" -4098,1701.02720,"Ying Zhang, Mohammad Pezeshki, Philemon Brakel, Saizheng Zhang, Cesar - Laurent Yoshua Bengio, Aaron Courville","Towards End-to-End Speech Recognition with Deep Convolutional Neural - Networks",cs.CL cs.LG stat.ML," Convolutional Neural Networks (CNNs) are effective models for reducing -spectral variations and modeling spectral correlations in acoustic features for -automatic speech recognition (ASR). Hybrid speech recognition systems -incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models -(HMMs/GMMs) have achieved the state-of-the-art in various benchmarks. -Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural -Networks (RNNs), which is proposed for labeling unsegmented sequences, makes it -feasible to train an end-to-end speech recognition system instead of hybrid -settings. However, RNNs are computationally expensive and sometimes difficult -to train. In this paper, inspired by the advantages of both CNNs and the CTC -approach, we propose an end-to-end speech framework for sequence labeling, by -combining hierarchical CNNs with CTC directly without recurrent connections. By -evaluating the approach on the TIMIT phoneme recognition task, we show that the -proposed model is not only computationally efficient, but also competitive with -the existing baseline systems. Moreover, we argue that CNNs have the capability -to model temporal correlations with appropriate context information. -" -4099,1701.02795,Hardie Cate and Zeshan Hussain,Bidirectional American Sign Language to English Translation,cs.CL," We outline a bidirectional translation system that converts sentences from -American Sign Language (ASL) to English, and vice versa. To perform machine -translation between ASL and English, we utilize a generative approach. 
-Specifically, we employ an adjustment to the IBM word-alignment model 1 (IBM -WAM1), where we define language models for English and ASL, as well as a -translation model, and attempt to generate a translation that maximizes the -posterior distribution defined by these models. Then, using these models, we -are able to quantify the concepts of fluency and faithfulness of a translation -between languages. -" -4100,1701.02810,"Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, Alexander M. - Rush",OpenNMT: Open-Source Toolkit for Neural Machine Translation,cs.CL cs.AI cs.NE," We describe an open-source toolkit for neural machine translation (NMT). The -toolkit prioritizes efficiency, modularity, and extensibility with the goal of -supporting NMT research into model architectures, feature representations, and -source modalities, while maintaining competitive performance and reasonable -training requirements. The toolkit consists of modeling and translation -support, as well as detailed pedagogical documentation about the underlying -techniques. -" -4101,1701.02854,"Cong Duy Vu Hoang (University of Melbourne), Gholamreza Haffari - (Monash University), Trevor Cohn (University of Melbourne)","Towards Decoding as Continuous Optimization in Neural Machine - Translation",cs.CL cs.AI," We propose a novel decoding approach for neural machine translation (NMT) -based on continuous optimisation. We convert decoding - basically a discrete -optimization problem - into a continuous optimization problem. The resulting -constrained continuous optimisation problem is then tackled using -gradient-based methods. Our powerful decoding framework enables decoding -intractable models such as the intersection of left-to-right and right-to-left -(bidirectional) as well as source-to-target and target-to-source (bilingual) -NMT models. Our empirical results show that our decoding framework is -effective, and leads to substantial improvements in translations generated from -the intersected models where the typical greedy or beam search is not feasible. -We also compare our framework against reranking, and analyse its advantages and -disadvantages. -" -4102,1701.02877,"Isabelle Augenstein, Leon Derczynski, Kalina Bontcheva",Generalisation in Named Entity Recognition: A Quantitative Analysis,cs.CL," Named Entity Recognition (NER) is a key NLP task, which is all the more -challenging on Web and user-generated content with their diverse and -continuously changing language. This paper aims to quantify how this diversity -impacts state-of-the-art NER methods, by measuring named entity (NE) and -context variability, feature sparsity, and their effects on precision and -recall. In particular, our findings indicate that NER approaches struggle to -generalise in diverse genres with limited training data. Unseen NEs, in -particular, play an important role, which have a higher incidence in diverse -genres such as social media than in more regular genres such as newswire. -Coupled with a higher incidence of unseen features more generally and the lack -of large training corpora, this leads to significantly lower F1 scores for -diverse genres as compared to more regular ones. We also find that leading -systems rely heavily on surface forms found in training data, having problems -generalising beyond these, and offer explanations for this observation. -" -4103,1701.02901,Antonio Toral and V\'ictor M. 
S\'anchez-Cartagena,"A Multifaceted Evaluation of Neural versus Phrase-Based Machine - Translation for 9 Language Directions",cs.CL," We aim to shed light on the strengths and weaknesses of the newly introduced -neural machine translation paradigm. To that end, we conduct a multifaceted -evaluation in which we compare outputs produced by state-of-the-art neural -machine translation and phrase-based machine translation systems for 9 language -directions across a number of dimensions. Specifically, we measure the -similarity of the outputs, their fluency and amount of reordering, the effect -of sentence length and performance across different error categories. We find -that translations produced by neural machine translation systems are -considerably different, more fluent and more accurate in terms of word order -compared to those produced by phrase-based systems. Neural machine translation -systems are also more accurate at producing inflected forms, but they perform -poorly when translating very long sentences. -" -4104,1701.02925,"Waheeb Ahmed, Dr. Anto P Babu",Question Analysis for Arabic Question Answering Systems,cs.CL," The first step of processing a question in Question Answering (QA) systems is -to carry out a detailed analysis of the question for the purpose of determining -what it is asking for and how best to approach answering it. Our question -analysis uses several techniques to analyze any question given in natural -language: a Stanford POS tagger and parser for the Arabic language, a named entity -recognizer, a tokenizer, stop-word removal, question expansion, question -classification and question focus extraction components. We employ numerous -detection rules and a trained classifier using features from this analysis to -detect important elements of the question, including: 1) the portion of the -question that refers to the answer (the focus); 2) different terms in -the question that identify what type of entity is being asked for (the lexical -answer types); 3) question expansion; and 4) a process of classifying the question -into one or more of several different types. We describe how these -elements are identified and evaluate the effect of accurate detection on our -question-answering system using the Mean Reciprocal Rank (MRR) accuracy measure. -" -4105,1701.02946,Chlo\'e Braud and Maximin Coavoux and Anders S{\o}gaard,Cross-lingual RST Discourse Parsing,cs.CL," Discourse parsing is an integral part of understanding information flow and -argumentative structure in documents. Most previous research has focused on -inducing and evaluating models from the English RST Discourse Treebank. -However, discourse treebanks for other languages exist, including Spanish, -German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same -underlying linguistic theory, but differ slightly in the way documents are -annotated. In this paper, we present (a) a new discourse parser which is -simpler, yet competitive with the state of the -art for English (significantly better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages, -enabling us to present (c) what to the best of our knowledge are the first -experiments on cross-lingual discourse parsing. -" -4106,1701.02962,Kim Anh Nguyen and Sabine Schulte im Walde and Ngoc Thang Vu,Distinguishing Antonyms and Synonyms in a Pattern-based Neural Network,cs.CL," Distinguishing between antonyms and synonyms is a key task to achieve high -performance in NLP systems.
While they are notoriously difficult to distinguish -by distributional co-occurrence models, pattern-based methods have proven -effective at differentiating between the relations. In this paper, we present a -novel neural network model, AntSynNET, that exploits lexico-syntactic patterns -from syntactic parse trees. In addition to the lexical and syntactic -information, we successfully integrate the distance between the related words -along the syntactic path as a new pattern feature. The results from -classification experiments show that AntSynNET improves the performance over -prior pattern-based methods. -" -4107,1701.03038,"Arturo Argueta, David Chiang",Decoding with Finite-State Transducers on GPUs,cs.CL cs.DC," Weighted finite automata and transducers (including hidden Markov models and -conditional random fields) are widely used in natural language processing (NLP) -to perform tasks such as morphological analysis, part-of-speech tagging, -chunking, named entity recognition, speech recognition, and others. -Parallelizing finite state algorithms on graphics processing units (GPUs) would -benefit many areas of NLP. Although researchers have implemented GPU versions -of basic graph algorithms, limited previous work, to our knowledge, has been -done on GPU algorithms for weighted finite automata. We introduce a GPU -implementation of the Viterbi and forward-backward algorithm, achieving -decoding speedups of up to 5.2x over our serial implementation running on -different computer architectures and 6093x over OpenFST. -" -4108,1701.03051,"Tapan Sahni, Chinmay Chandak, Naveen Reddy Chedeti, Manish Singh","Efficient Twitter Sentiment Classification using Subjective Distant - Supervision",cs.SI cs.CL cs.IR," As microblogging services like Twitter are becoming more and more influential -in today's globalised world, their facets like sentiment analysis are being -extensively studied. We are no longer constrained by our own opinion. Others' -opinions and sentiments play a huge role in shaping our perspective. In this -paper, we build on previous work on Twitter sentiment analysis using Distant -Supervision. The existing approach requires huge computational resources for -analysing a large number of tweets. In this paper, we propose techniques to speed -up the computation process for sentiment analysis. We use tweet subjectivity -to select the right training samples. We also introduce the concept of EFWS -(Effective Word Score) of a tweet that is derived from polarity scores of -frequently used words, which is an additional heuristic that can be used to -speed up the sentiment classification with standard machine learning -algorithms. We performed our experiments using 1.6 million tweets. Experimental -evaluations show that our proposed technique is more efficient and has higher -accuracy compared to previously proposed methods. We achieve overall accuracies -of around 80% (the EFWS heuristic gives an accuracy of around 85%) on a training -dataset of 100K tweets, which is half the size of the dataset used for the -baseline model. The accuracy of our proposed model is 2-3% higher than the -baseline model, and the model effectively trains at twice the speed of the -baseline model. -" -4109,1701.03079,"Chongyang Tao, Lili Mou, Dongyan Zhao, Rui Yan","RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain - Dialog Systems",cs.CL cs.HC cs.IR," Open-domain human-computer conversation has been attracting increasing -attention over the past few years.
However, there does not exist a standard -automatic evaluation metric for open-domain dialog systems; researchers usually -resort to human annotation for model evaluation, which is time- and -labor-intensive. In this paper, we propose RUBER, a Referenced metric and -Unreferenced metric Blended Evaluation Routine, which evaluates a reply by -taking into consideration both a groundtruth reply and a query (previous -user-issued utterance). Our metric is learnable, but its training does not -require labels of human satisfaction. Hence, RUBER is flexible and extensible -to different datasets and languages. Experiments on both retrieval and -generative dialog systems show that RUBER has a high correlation with human -annotation. -" -4110,1701.03092,Besat Kassaie,Job Detection in Twitter,cs.CL," In this report, we propose a new application for Twitter data called -\textit{job detection}. We identify people's job category based on their -tweets. As preliminary work, we limited our task to identifying only IT workers -among other job holders. We have used and compared both a simple bag-of-words -model and a document representation based on the Skip-gram model. Our results show -that the model based on Skip-gram achieves 76\% precision and 82\% recall. -" -4111,1701.03126,"Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, John R. - Hershey, Tim K. Marks",Attention-Based Multimodal Fusion for Video Description,cs.CV cs.CL cs.MM," Currently successful methods for video description are based on -encoder-decoder sentence generation using recurrent neural networks (RNNs). -Recent work has shown the advantage of integrating temporal and/or spatial -attention mechanisms into these models, in which the decoder network predicts -each word in the description by selectively giving more weight to encoded -features from specific time frames (temporal attention) or to features from -specific spatial regions (spatial attention). In this paper, we propose to -expand the attention model to selectively attend not just to specific times or -spatial regions, but to specific modalities of input such as image features, -motion features, and audio features. Our new modality-dependent attention -mechanism, which we call multimodal attention, provides a natural way to fuse -multimodal information for video description. We evaluate our method on the -Youtube2Text dataset, achieving results that are competitive with current state -of the art. More importantly, we demonstrate that our model incorporating -multimodal attention as well as temporal attention significantly outperforms -the model that uses temporal attention alone. -" -4112,1701.03129,Besat Kassaie,De-identification In practice,cs.CL," We report our effort to identify the sensitive information, a subset of the data -items listed by HIPAA (the Health Insurance Portability and Accountability Act), from -medical text using the recent advances in natural language processing and -machine learning techniques. We represent the words with high-dimensional -continuous vectors learned by a variant of Word2Vec called Continuous Bag Of -Words (CBOW). We feed the word vectors into a simple neural network with a Long -Short-Term Memory (LSTM) architecture. Without any attempts to extract manually -crafted features, and considering that our medical dataset is too small to be -fed into a neural network, we obtained promising results. These results encourage us -to pursue the project at a larger scale, with precise parameter tuning -and other possible improvements.
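A minimal PyTorch sketch of the tagging setup just described: word vectors feeding an LSTM that scores each token as sensitive or not. The sizes, random ids and binary label scheme are illustrative assumptions, and a trainable embedding table stands in for the pretrained CBOW vectors:

# Hedged sketch: LSTM token tagger for de-identification.
import torch
import torch.nn as nn

class DeidTagger(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=50, hidden=64, n_labels=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # stand-in for CBOW vectors
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_labels)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)                             # per-token label scores

tagger = DeidTagger()
scores = tagger(torch.randint(0, 1000, (1, 12)))       # one 12-token sentence
print(scores.shape)                                    # torch.Size([1, 12, 2])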
-" -4113,1701.03163,"H\'ector Mart\'inez Alonso and \v{Z}eljko Agi\'c and Barbara Plank and - Anders S{\o}gaard",Parsing Universal Dependencies without training,cs.CL," We propose UDP, the first training-free parser for Universal Dependencies -(UD). Our algorithm is based on PageRank and a small set of head attachment -rules. It features two-step decoding to guarantee that function words are -attached as leaf nodes. The parser requires no training, and it is competitive -with a delexicalized transfer system. UDP offers a linguistically sound -unsupervised alternative to cross-lingual parsing for UD, which can be used as -a baseline for such systems. The parser has very few parameters and is -distinctly robust to domain change across languages. -" -4114,1701.03185,"Louis Shao, Stephan Gouws, Denny Britz, Anna Goldie, Brian Strope, Ray - Kurzweil","Generating High-Quality and Informative Conversation Responses with - Sequence-to-Sequence Models",cs.CL," Sequence-to-sequence models have been applied to the conversation response -generation problem where the source sequence is the conversation history and -the target sequence is the response. Unlike translation, conversation -responding is inherently creative. The generation of long, informative, -coherent, and diverse responses remains a hard task. In this work, we focus on -the single turn setting. We add self-attention to the decoder to maintain -coherence in longer responses, and we propose a practical approach, called the -glimpse-model, for scaling to large datasets. We introduce a stochastic -beam-search algorithm with segment-by-segment reranking which lets us inject -diversity earlier in the generation process. We trained on a combined data set -of over 2.3B conversation messages mined from the web. In human evaluation -studies, our method produces longer responses overall, with a higher proportion -rated as acceptable and excellent as length increases, compared to baseline -sequence-to-sequence models with explicit length-promotion. A back-off strategy -produces better responses overall, in the full spectrum of lengths. -" -4115,1701.03214,"Chenhui Chu, Raj Dabre, and Sadao Kurohashi","An Empirical Comparison of Simple Domain Adaptation Methods for Neural - Machine Translation",cs.CL," In this paper, we propose a novel domain adaptation method named ""mixed fine -tuning"" for neural machine translation (NMT). We combine two existing -approaches namely fine tuning and multi domain NMT. We first train an NMT model -on an out-of-domain parallel corpus, and then fine tune it on a parallel corpus -which is a mix of the in-domain and out-of-domain corpora. All corpora are -augmented with artificial tags to indicate specific domains. We empirically -compare our proposed method against fine tuning and multi domain methods and -discuss its benefits and shortcomings. -" -4116,1701.03227,"Angela Fan, Finale Doshi-Velez, Luke Miratrix","Prior matters: simple and general methods for evaluating and improving - topic quality in topic modeling",cs.CL cs.IR cs.LG," Latent Dirichlet Allocation (LDA) models trained without stopword removal -often produce topics with high posterior probabilities on uninformative words, -obscuring the underlying corpus content. Even when canonical stopwords are -manually removed, uninformative words common in that corpus will still dominate -the most probable words in a topic. 
In this work, we first show how the -standard topic quality measures of coherence and pointwise mutual information -act counter-intuitively in the presence of common but irrelevant words, making -it difficult to even quantitatively identify situations in which topics may be -dominated by stopwords. We propose an additional topic quality metric that -targets the stopword problem, and show that it, unlike the standard measures, -correctly correlates with human judgements of quality. We also propose a -simple-to-implement strategy for generating topics that are evaluated to be of -much higher quality by both human assessment and our new metric. This approach, -a collection of informative priors easily introduced into most LDA-style -inference methods, automatically promotes terms with domain relevance and -demotes domain-specific stop words. We demonstrate this approach's -effectiveness in three very different domains: Department of Labor accident -reports, online health forum posts, and NIPS abstracts. Overall we find that -current practices thought to solve this problem do not do so adequately, and -that our proposal offers a substantial improvement for those interested in -interpreting their topics as objects in their own right. -" -4117,1701.03231,Manuel Amunategui,"Single-Pass, Adaptive Natural Language Filtering: Measuring Value in - User Generated Comments on Large-Scale, Social Media News Forums",cs.CL," There are large amounts of insight and social discovery potential in mining -crowd-sourced comments left on popular news forums like Reddit.com, Tumblr.com, -Facebook.com and Hacker News. Unfortunately, due to the overwhelming amount of -participation with its varying quality of commentary, extracting value out of -such data is not always obvious or timely. By designing efficient, single-pass -and adaptive natural language filters to quickly prune spam, noise, copy-cats, -marketing diversions, and out-of-context posts, we can remove over a third of -entries and return the comments with a higher probability of relatedness to the -original article in question. The approach presented here uses an adaptive, -two-step filtering process. It first leverages the original article posted in -the thread as a starting corpus to parse comments by matching intersecting -words and term-ratio balance per sentence, then grows the corpus by adding new -words harvested from high-matching comments to increase filtering accuracy over -time. -" -4118,1701.03329,"Andreas van Cranenburgh, Rens Bod",A Data-Oriented Model of Literary Language,cs.CL," We consider the task of predicting how literary a text is, with a gold -standard from human ratings. Aside from a standard bigram baseline, we apply -rich syntactic tree fragments, mined from the training set, and a series of -hand-picked features. Our model is the first to distinguish degrees of highly -and less literary novels using a variety of lexical and syntactic features, and -explains 76.0% of the variation in literary ratings. -" -4119,1701.03338,"Tom Kocmi, Ond\v{r}ej Bojar",LanideNN: Multilingual Language Identification on Character Window,cs.CL," In language identification, a common first step in natural language -processing, we want to automatically determine the language of some input text. -Monolingual language identification assumes that the given document is written -in one language. In multilingual language identification, the document is -usually in two or three languages and we just want their names.
We go one step -further and propose a method for textual language identification where -languages can change arbitrarily and the goal is to identify the spans of each -of the languages. Our method is based on Bidirectional Recurrent Neural -Networks and it performs well in monolingual and multilingual language -identification tasks on six datasets covering 131 languages. The method retains -its accuracy for short documents and across domains, so it is ideal for -off-the-shelf use without preparation of training data. -" -4120,1701.03434,"Noura Farra, Kathleen McKeown",SMARTies: Sentiment Models for Arabic Target Entities,cs.CL," We consider entity-level sentiment analysis in Arabic, a morphologically rich -language with increasing resources. We present a system that is applied to -complex posts written in response to Arabic newspaper articles. Our goal is to -identify important entity ""targets"" within the post along with the polarity -expressed about each target. We achieve significant improvements over multiple -baselines, demonstrating that the use of specific morphological representations -improves the performance of identifying both important targets and their -sentiment, and that the use of distributional semantic clusters further boosts -performance for these representations, especially when richer linguistic -resources are not available. -" -4121,1701.03492,Emrah Budur,"Scalable, Trie-based Approximate Entity Extraction for Real-Time - Financial Transaction Screening",cs.CL cs.IR," Financial institutions have to screen their transactions to ensure that they -are not affiliated with terrorism entities. Developing appropriate solutions to -detect such affiliations precisely, while avoiding any kind of interruption to the -large amount of legitimate transactions, is essential. In this paper, we present -building blocks of a scalable solution that may help financial institutions to -build their own software to extract terrorism entities out of both structured -and unstructured financial messages in real time and with an approximate -similarity matching approach. -" -4122,1701.03577,"Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, - Aur\'elien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, - Michael Picheny, Fei Sha",Kernel Approximation Methods for Speech Recognition,stat.ML cs.AI cs.CL cs.LG," We study large-scale kernel methods for acoustic modeling in speech -recognition and compare their performance to deep neural networks (DNNs). We -perform experiments on four speech recognition datasets, including the TIMIT -and Broadcast News benchmark tasks, and compare these two types of models on -frame-level performance metrics (accuracy, cross-entropy), as well as on -recognition metrics (word/character error rate). In order to scale kernel -methods to these large datasets, we use the random Fourier feature method of -Rahimi and Recht (2007). We propose two novel techniques for improving the -performance of kernel acoustic models. First, in order to reduce the number of -random features required by kernel models, we propose a simple but effective -method for feature selection. The method is able to explore a large number of -non-linear features while maintaining a compact model more efficiently than -existing approaches.
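A numpy sketch of the random Fourier feature map of Rahimi and Recht (2007) mentioned above, which approximates the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) with an explicit finite-dimensional feature map; dimensions are illustrative:

# Hedged sketch: random Fourier features for the RBF kernel.
import numpy as np

def rff(X, D=2048, sigma=1.0, seed=0):
    """Map rows of X to D random Fourier features; z(x).z(y) ~ k(x, y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))  # samples from the kernel's spectral density
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 40))   # 5 frames, 40-dim acoustic features (assumed)
Z = rff(X)
print(Z @ Z.T)  # approximates the 5x5 RBF kernel matrix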
Second, we present a number of frame-level metrics which -correlate very strongly with recognition performance when computed on the -heldout set; we take advantage of these correlations by monitoring these -metrics during training in order to decide when to stop learning. This -technique can noticeably improve the recognition performance of both DNN and -kernel models, while narrowing the gap between them. Additionally, we show that -the linear bottleneck method of Sainath et al. (2013) improves the performance -of our kernel models significantly, in addition to speeding up training and -making the models more compact. Together, these three methods dramatically -improve the performance of kernel acoustic models, making their performance -comparable to DNNs on the tasks we explored. -" -4123,1701.03578,"Seunghyun Yoon, Hyeongu Yun, Yuna Kim, Gyu-tae Park, Kyomin Jung","Efficient Transfer Learning Schemes for Personalized Language Modeling - using Recurrent Neural Network",cs.CL cs.AI," In this paper, we propose efficient transfer learning methods for training -a personalized language model using a recurrent neural network with a long -short-term memory architecture. With our proposed fast transfer learning -schemes, a general language model is updated to a personalized language model -with a small amount of user data and a limited computing resource. These -methods are especially useful for a mobile device environment where the data is -prevented from transferring out of the device for privacy purposes. Through -experiments on dialogue data in a drama, it is verified that our transfer -learning methods have successfully generated the personalized language model, -whose output is more similar to the personal language style in both qualitative -and quantitative aspects. -" -4124,1701.03682,"Priyank Mathur, Arkajyoti Misra, Emrah Budur",LIDE: Language Identification from Text Documents,cs.CL cs.NE," The increase in the use of microblogging came along with the rapid growth -of short linguistic data. On the other hand, deep learning is considered to be the -new frontier to extract meaningful information out of large amounts of raw data -in an automated manner. In this study, we engaged these two emerging fields to -come up with a robust language identifier on demand, namely the Language -Identification Engine (LIDE). As a result, we achieved 95.12% accuracy on the -Discriminating between Similar Languages (DSL) Shared Task 2015 dataset, which -is comparable to the maximum reported accuracy of 95.54% achieved so far. -" -4125,1701.03849,"Ladislav Lenc, Pavel Kr\'al",Deep Neural Networks for Czech Multi-label Document Classification,cs.CL," This paper is focused on automatic multi-label document classification of -Czech text documents. The current approaches usually use some pre-processing, -which can have a negative impact (loss of information, additional implementation -work, etc). Therefore, we would like to omit it and use deep neural networks -that learn from simple features. This choice was motivated by their successful -usage in many other machine learning fields. Two different networks are -compared: the first one is a standard multi-layer perceptron, while the second -one is a popular convolutional network. The experiments on a Czech newspaper -corpus show that both networks significantly outperform the baseline method, which -uses a rich set of features with a maximum entropy classifier. We have also shown -that the convolutional network gives the best results.
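A minimal Keras sketch of the better-performing configuration just described: a convolutional network over token ids with sigmoid outputs, one per label, trained with binary cross-entropy for multi-label classification. All sizes, including the label count, are illustrative assumptions:

# Hedged sketch: text CNN for multi-label document classification.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_labels = 37  # assumed number of newspaper categories
model = keras.Sequential([
    layers.Embedding(input_dim=20000, output_dim=100),  # token-id embeddings
    layers.Conv1D(128, 5, activation="relu"),           # n-gram-like filters
    layers.GlobalMaxPooling1D(),
    layers.Dense(n_labels, activation="sigmoid"),       # independent per-label probabilities
])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.randint(0, 20000, size=(2, 200))          # 2 toy documents, 200 tokens each
print(model.predict(x).shape)                           # (2, 37)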
-" -4126,1701.03924,"Nadir Durrani and Fahim Dalvi and Hassan Sajjad, Stephan Vogel",QCRI Machine Translation Systems for IWSLT 16,cs.CL," This paper describes QCRI's machine translation systems for the IWSLT 2016 -evaluation campaign. We participated in the Arabic->English and English->Arabic -tracks. We built both Phrase-based and Neural machine translation models, in an -effort to probe whether the newly emerged NMT framework surpasses the -traditional phrase-based systems in Arabic-English language pairs. We trained a -very strong phrase-based system including, a big language model, the Operation -Sequence Model, Neural Network Joint Model and Class-based models along with -different domain adaptation techniques such as MML filtering, mixture modeling -and using fine tuning over NNJM model. However, a Neural MT system, trained by -stacking data from different genres through fine-tuning, and applying ensemble -over 8 models, beat our very strong phrase-based system by a significant 2 BLEU -points margin in Arabic->English direction. We did not obtain similar gains in -the other direction but were still able to outperform the phrase-based system. -We also applied system combination on phrase-based and NMT outputs. -" -4127,1701.03947,"Tuan Tran, Claudia Nieder\'ee, Nattiya Kanhabua, Ujwal Gadiraju, - Avishek Anand","Balancing Novelty and Salience: Adaptive Learning to Rank Entities for - Timeline Summarization of High-impact Events",cs.IR cs.CL," Long-running, high-impact events such as the Boston Marathon bombing often -develop through many stages and involve a large number of entities in their -unfolding. Timeline summarization of an event by key sentences eases story -digestion, but does not distinguish between what a user remembers and what she -might want to re-check. In this work, we present a novel approach for timeline -summarization of high-impact events, which uses entities instead of sentences -for summarizing the event at each individual point in time. Such entity -summaries can serve as both (1) important memory cues in a retrospective event -consideration and (2) pointers for personalized event exploration. In order to -automatically create such summaries, it is crucial to identify the ""right"" -entities for inclusion. We propose to learn a ranking function for entities, -with a dynamically adapted trade-off between the in-document salience of -entities and the informativeness of entities across documents, i.e., the level -of new information associated with an entity for a time point under -consideration. Furthermore, for capturing collective attention for an entity we -use an innovative soft labeling approach based on Wikipedia. Our experiments on -a real large news datasets confirm the effectiveness of the proposed methods. -" -4128,1701.03980,"Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed - Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel - Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, - Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya - Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha - Swayamdipta, Pengcheng Yin",DyNet: The Dynamic Neural Network Toolkit,stat.ML cs.CL cs.MS," We describe DyNet, a toolkit for implementing neural network models based on -dynamic declaration of network structure. 
In the static declaration strategy
-that is used in toolkits like Theano, CNTK, and TensorFlow, the user first
-defines a computation graph (a symbolic representation of the computation), and
-then examples are fed into an engine that executes this computation and
-computes its derivatives. In DyNet's dynamic declaration strategy, computation
-graph construction is mostly transparent, being implicitly constructed by
-executing procedural code that computes the network outputs, and the user is
-free to use different network structures for each input. Dynamic declaration
-thus facilitates the implementation of more complicated network architectures,
-and DyNet is specifically designed to allow users to implement their models in
-a way that is idiomatic in their preferred programming language (C++ or
-Python). One challenge with dynamic declaration is that because the symbolic
-computation graph is defined anew for every training example, its construction
-must have low overhead. To achieve this, DyNet has an optimized C++ backend and
-lightweight graph representation. Experiments show that DyNet's speeds are
-faster than or comparable with static declaration toolkits, and significantly
-faster than Chainer, another dynamic declaration toolkit. DyNet is released
-open-source under the Apache 2.0 license and available at
-http://github.com/clab/dynet.
-"
-4129,1701.04024,Mihail Eric and Christopher D. Manning,"A Copy-Augmented Sequence-to-Sequence Architecture Gives Good
- Performance on Task-Oriented Dialogue",cs.CL cs.AI," Task-oriented dialogue focuses on conversational agents that participate in
-user-initiated dialogues on domain-specific topics. In contrast to chatbots,
-which simply seek to sustain open-ended meaningful discourse, existing
-task-oriented agents usually explicitly model user intent and belief states.
-This paper examines bypassing such an explicit representation by depending on a
-latent neural embedding of state and learning selective attention to dialogue
-history together with copying to incorporate relevant prior context. We
-complement recent work by showing the effectiveness of simple
-sequence-to-sequence neural architectures with a copy mechanism. Our model
-outperforms more complex memory-augmented models by 7% in per-response
-generation and is on par with the current state-of-the-art on DSTC2.
-"
-4130,1701.04027,"Feifei Zhai, Saloni Potdar, Bing Xiang, Bowen Zhou",Neural Models for Sequence Chunking,cs.CL," Many natural language understanding (NLU) tasks, such as shallow parsing
-(i.e., text chunking) and semantic slot filling, require the assignment of
-representative labels to the meaningful chunks in a sentence. Most of the
-current deep neural network (DNN) based methods consider these tasks as a
-sequence labeling problem, in which a word, rather than a chunk, is treated as
-the basic unit for labeling. These chunks are then inferred by the standard IOB
-(Inside-Outside-Beginning) labels. In this paper, we propose an alternative
-approach by investigating the use of DNN for sequence chunking, and propose
-three neural models so that each chunk can be treated as a complete unit for
-labeling. Experimental results show that the proposed neural sequence chunking
-models can achieve state-of-the-art performance on both the text chunking and
-slot filling tasks.
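The IOB (Inside-Outside-Beginning) labeling scheme named in the chunking abstract above is decoded into chunks with a few lines of Python; this generic helper is a sketch, not the paper's code:

def iob_to_chunks(tags):
    # Convert IOB labels (e.g., B-NP, I-NP, O) into (type, start, end) chunks.
    chunks, start, ctype = [], None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes the last chunk
        if tag.startswith("B-") or tag == "O":
            if ctype is not None:
                chunks.append((ctype, start, i))
            ctype, start = (tag[2:], i) if tag.startswith("B-") else (None, None)
        elif tag.startswith("I-") and ctype != tag[2:]:
            if ctype is not None:
                chunks.append((ctype, start, i))
            ctype, start = tag[2:], i                # tolerate I- without a B-
    return chunks

print(iob_to_chunks(["B-NP", "I-NP", "O", "B-VP"]))  # [('NP', 0, 2), ('VP', 3, 4)]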
-" -4131,1701.04039,"David Graus, Daan Odijk, Maarten de Rijke","The Birth of Collective Memories: Analyzing Emerging Entities in Text - Streams",cs.IR cs.CL," We study how collective memories are formed online. We do so by tracking -entities that emerge in public discourse, that is, in online text streams such -as social media and news streams, before they are incorporated into Wikipedia, -which, we argue, can be viewed as an online place for collective memory. By -tracking how entities emerge in public discourse, i.e., the temporal patterns -between their first mention in online text streams and subsequent incorporation -into collective memory, we gain insights into how the collective remembrance -process happens online. Specifically, we analyze nearly 80,000 entities as they -emerge in online text streams before they are incorporated into Wikipedia. The -online text streams we use for our analysis comprise of social media and news -streams, and span over 579 million documents in a timespan of 18 months. We -discover two main emergence patterns: entities that emerge in a ""bursty"" -fashion, i.e., that appear in public discourse without a precedent, blast into -activity and transition into collective memory. Other entities display a -""delayed"" pattern, where they appear in public discourse, experience a period -of inactivity, and then resurface before transitioning into our cultural -collective memory. -" -4132,1701.04056,"Bing Liu, Ian Lane",Dialog Context Language Modeling with Recurrent Neural Networks,cs.CL," In this work, we propose contextual language models that incorporate dialog -level discourse information into language modeling. Previous works on -contextual language model treat preceding utterances as a sequence of inputs, -without considering dialog interactions. We design recurrent neural network -(RNN) based contextual language models that specially track the interactions -between speakers in a dialog. Experiment results on Switchboard Dialog Act -Corpus show that the proposed model outperforms conventional single turn based -RNN language model by 3.3% on perplexity. The proposed models also demonstrate -advantageous performance over other competitive contextual language models. -" -4133,1701.04189,"Cheng Li, Xiaoxiao Guo and Qiaozhu Mei",Deep Memory Networks for Attitude Identification,cs.CL," We consider the task of identifying attitudes towards a given set of entities -from text. Conventionally, this task is decomposed into two separate subtasks: -target detection that identifies whether each entity is mentioned in the text, -either explicitly or implicitly, and polarity classification that classifies -the exact sentiment towards an identified entity (the target) into positive, -negative, or neutral. - Instead, we show that attitude identification can be solved with an -end-to-end machine learning architecture, in which the two subtasks are -interleaved by a deep memory network. In this way, signals produced in target -detection provide clues for polarity classification, and reversely, the -predicted polarity provides feedback to the identification of targets. -Moreover, the treatments for the set of targets also influence each other -- -the learned representations may share the same semantics for some targets but -vary for others. 
The proposed deep memory network, the AttNet, outperforms
-methods that do not consider the interactions between the subtasks or those
-among the targets, including conventional machine learning methods and the
-state-of-the-art deep learning models.
-"
-4134,1701.04290,"Nadeem Jadoon Khan, Waqas Anwar, Nadir Durrani",Machine Translation Approaches and Survey for Indian Languages,cs.CL," In this study, we present an analysis regarding the performance of
-state-of-the-art Phrase-based Statistical Machine Translation (SMT) on multiple
-Indian languages. We report baseline systems on several language pairs. The
-motivation of this study is to promote the development of SMT and linguistic
-resources for these language pairs, as the current state-of-the-art is quite
-bleak due to sparse data resources. The success of an SMT system is contingent
-on the availability of a large parallel corpus. Such data is necessary to
-reliably estimate translation probabilities. We report the performance of
-baseline systems translating from Indian languages (Bengali, Gujarati, Hindi,
-Malayalam, Punjabi, Tamil, Telugu and Urdu) into English, with an average
-accuracy of 10% across all the language pairs.
-"
-4135,1701.04292,"Piotr Borkowski and Krzysztof Ciesielski and Mieczys{\l}aw A.
- K{\l}opotek",Semantic classifier approach to document classification,cs.IR cs.CL," In this paper we propose a new document classification method, bridging the
-discrepancies (the so-called semantic gap) between the training set and the
-application sets of textual data. We demonstrate its superiority over classical
-text classification approaches, including traditional classifier ensembles. The
-method consists of combining a document categorization technique with a single
-classifier or a classifier ensemble (SEMCOM algorithm - Committee with Semantic
-Categorizer).
-"
-4136,1701.04313,"Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana
- Ramabhadran, Brian Kingsbury",End-to-End ASR-free Keyword Search from Speech,cs.CL cs.IR cs.LG cs.NE," End-to-end (E2E) systems have achieved competitive results compared to
-conventional hybrid hidden Markov model (HMM)-deep neural network based
-automatic speech recognition (ASR) systems. Such E2E systems are attractive due
-to the lack of dependence on alignments between input acoustic and output
-grapheme or HMM state sequence during training. This paper explores the design
-of an ASR-free end-to-end system for text query-based keyword search (KWS) from
-speech trained with minimal supervision. Our E2E KWS system consists of three
-sub-systems. The first sub-system is a recurrent neural network (RNN)-based
-acoustic auto-encoder trained to reconstruct the audio through a
-finite-dimensional representation. The second sub-system is a character-level
-RNN language model using embeddings learned from a convolutional neural
-network. Since the acoustic and text query embeddings occupy different
-representation spaces, they are input to a third feed-forward neural network
-that predicts whether the query occurs in the acoustic utterance or not. This
-E2E ASR-free KWS system performs respectably despite lacking a conventional ASR
-system and trains much faster.
-"
-4137,1701.04653,"Marzieh Saeidi, Alessandro Venerandi, Licia Capra and Sebastian Riedel","Community Question Answering Platforms vs.
Twitter for Predicting
- Characteristics of Urban Neighbourhoods",cs.CL cs.SI," In this paper, we investigate whether text from a Community Question
-Answering (QA) platform can be used to predict and describe real-world
-attributes. We experiment with predicting a wide range of 62 demographic
-attributes for neighbourhoods of London. We use the text from the QA platform
-of Yahoo! Answers and compare our results to the ones obtained from Twitter
-microblogs. Outcomes show that the correlation between the predicted
-demographic attributes using text from Yahoo! Answers discussions and the
-observed demographic attributes can reach an average Pearson correlation
-coefficient of \rho = 0.54, slightly higher than the predictions obtained
-using Twitter data. Our qualitative analysis indicates that there is semantic
-relatedness between the highest correlated terms extracted from both datasets
-and their respective demographic attributes. Furthermore, the correlations
-highlight the different natures of the information contained in Yahoo! Answers
-and Twitter. While the former seems to offer more encyclopedic content, the
-latter provides information related to current sociocultural aspects or
-phenomena.
-"
-4138,1701.05011,"Eug\'enio Ribeiro, Fernando Batista, Isabel Trancoso, Jos\'e Lopes,
- Ricardo Ribeiro, and David Martins de Matos",Assessing User Expertise in Spoken Dialog System Interactions,cs.CL," Identifying the level of expertise of its users is important for a system,
-since it can lead to better interaction through adaptation techniques.
-Furthermore, this information can be used in offline processes of root cause
-analysis. However, not much effort has been put into automatically identifying
-the level of expertise of a user, especially in dialog-based interactions. In
-this paper we present an approach based on a specific set of task-related
-features. Based on the distribution of the features among the two classes -
-Novice and Expert - we used Random Forests as a classification approach.
-Furthermore, we used a Support Vector Machine classifier in order to compare
-results. By applying these approaches to data from a real system,
-Let's Go, we obtained preliminary results that we consider positive, given the
-difficulty of the task and the lack of competing approaches for comparison.
-"
-4139,1701.05311,"Valentina Franzoni, Yuanxi Li, Clement H.C.Leung and Alfredo Milani","Semantic Evolutionary Concept Distances for Effective Information
- Retrieval in Query Expansion",cs.IR cs.AI cs.CL math.PR," In this work several semantic approaches to concept-based query expansion and
-reranking schemes are studied and compared with different ontology-based
-expansion methods in web document search and retrieval. In particular, we focus
-on concept-based query expansion schemes, where, in order to effectively
-increase the precision of web document retrieval and to decrease the users'
-browsing time, the main goal is to quickly provide users with the most suitable
-query expansion. Two key tasks for query expansion in web document retrieval
-are to find the expansion candidates, as the closest concepts in the web
-document domain, and to rank the expanded queries properly. The approach we
-propose aims at improving the expansion phase for better web document retrieval
-and precision. The basic idea is to measure the distance between candidate
-concepts using the PMING distance, a collaborative semantic proximity measure,
-i.e.
a
-measure which can be computed by using statistical results from a web search
-engine. Experiments show that the proposed technique can provide users with
-more satisfying expansion results and improve the quality of web document
-retrieval.
-"
-4140,1701.05334,"Farman Ali, D. Kwak, Pervez Khan, S.M. Riazul Islam, K.H. Kim, and
- K.S. Kwak","Fuzzy Ontology-Based Sentiment Analysis of Transportation and City
- Feature Reviews for Safe Traveling",cs.AI cs.CL," Traffic congestion is rapidly increasing in urban areas, particularly in mega
-cities. To date, there exist a few sensor network based systems to address this
-problem. However, these techniques are not suitable enough in terms of
-monitoring an entire transportation system and delivering emergency services
-when needed. These techniques require real-time data and intelligent ways to
-quickly determine traffic activity from useful information. In addition, the
-existing systems and websites on city transportation and travel rely on rating
-scores for different factors (e.g., safety, low crime rate, cleanliness, etc.).
-These rating scores are not efficient enough to deliver precise information,
-whereas reviews or tweets are significant, because they help travelers and
-transportation administrators to know about each aspect of the city. However,
-it is difficult for travelers to read, and for transportation systems to
-process, all reviews and tweets to obtain expressive sentiments regarding the
-needs of the city. The optimum solution for this kind of problem is analyzing
-the information available on social network platforms and performing sentiment
-analysis. On the other hand, crisp ontology-based frameworks cannot extract
-blurred information from tweets and reviews; therefore, they produce inadequate
-results. In this regard, this paper proposes fuzzy ontology-based sentiment
-analysis and SWRL rule-based decision-making to monitor transportation
-activities and to make a city-feature polarity map for travelers. This system
-retrieves reviews and tweets related to city features and transportation
-activities. The feature opinions are extracted from these retrieved data, and
-then fuzzy ontology is used to determine the transportation and city-feature
-polarity. A fuzzy ontology and an intelligent system prototype are developed by
-using Prot\'eg\'e OWL and Java, respectively.
-"
-4141,1701.05343,"Zhongyu Wei, Chen Li and Yang Liu","A Joint Framework for Argumentative Text Analysis Incorporating Domain
- Knowledge",cs.CL," For argumentation mining, there are several sub-tasks such as argumentation
-component type classification and relation classification. Existing research
-tends to solve such sub-tasks separately, but ignores the close relations
-between them. In this paper, we present a joint framework incorporating logical
-relations between sub-tasks to improve the performance of argumentation
-structure generation. We design an objective function to combine the
-predictions from individual models for each sub-task and solve the problem with
-some constraints constructed from background knowledge. We evaluate our
-proposed model on two public corpora and the experiment results show that our
-model significantly outperforms the baseline that uses a separate model for
-each sub-task. Our model also shows advantages on component-related sub-tasks
-compared to a state-of-the-art joint model based on the evidence graph.
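The Pearson correlation coefficient behind the \rho = 0.54 figure in the neighbourhood-prediction abstract above is computed as follows; a generic NumPy sketch with made-up numbers:

import numpy as np

def pearson(x, y):
    # Pearson correlation coefficient between two attribute vectors.
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

predicted = [0.3, 0.7, 0.2, 0.9, 0.4]   # predicted demographic attribute values
observed = [0.35, 0.6, 0.25, 0.8, 0.5]  # census ground truth
print(pearson(predicted, observed))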
-" -4142,1701.05574,"Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey and Pushpak - Bhattacharyya",Harnessing Cognitive Features for Sarcasm Detection,cs.CL," In this paper, we propose a novel mechanism for enriching the feature vector, -for the task of sarcasm detection, with cognitive features extracted from -eye-movement patterns of human readers. Sarcasm detection has been a -challenging research problem, and its importance for NLP applications such as -review summarization, dialog systems and sentiment analysis is well recognized. -Sarcasm can often be traced to incongruity that becomes apparent as the full -sentence unfolds. This presence of incongruity- implicit or explicit- affects -the way readers eyes move through the text. We observe the difference in the -behaviour of the eye, while reading sarcastic and non sarcastic sentences. -Motivated by his observation, we augment traditional linguistic and stylistic -features for sarcasm detection with the cognitive features obtained from -readers eye movement data. We perform statistical classification using the -enhanced feature set so obtained. The augmented cognitive features improve -sarcasm detection by 3.7% (in terms of F-score), over the performance of the -best reported system. -" -4143,1701.05581,"Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey and Pushpak - Bhattacharyya",Leveraging Cognitive Features for Sentiment Analysis,cs.CL," Sentiments expressed in user-generated short text and sentences are nuanced -by subtleties at lexical, syntactic, semantic and pragmatic levels. To address -this, we propose to augment traditional features used for sentiment analysis -and sarcasm detection, with cognitive features derived from the eye-movement -patterns of readers. Statistical classification using our enhanced feature set -improves the performance (F-score) of polarity detection by a maximum of 3.7% -and 9.3% on two datasets, over the systems that use only traditional features. -We perform feature significance analysis, and experiment on a held-out dataset, -showing that cognitive features indeed empower sentiment analyzers to handle -complex constructs. -" -4144,1701.05625,"Saeedeh Shekarpour, Faisal Alshargi, Valerie Shalin, Krishnaprasad - Thirunarayan, Amit P. Sheth",CEVO: Comprehensive EVent Ontology Enhancing Cognitive Annotation,cs.CL," While the general analysis of named entities has received substantial -research attention on unstructured as well as structured data, the analysis of -relations among named entities has received limited focus. In fact, a review of -the literature revealed a deficiency in research on the abstract -conceptualization required to organize relations. We believe that such an -abstract conceptualization can benefit various communities and applications -such as natural language processing, information extraction, machine learning, -and ontology engineering. In this paper, we present Comprehensive EVent -Ontology (CEVO), built on Levin's conceptual hierarchy of English verbs that -categorizes verbs with shared meaning, and syntactic behavior. We present the -fundamental concepts and requirements for this ontology. Furthermore, we -present three use cases employing the CEVO ontology on annotation tasks: (i) -annotating relations in plain text, (ii) annotating ontological properties, and -(iii) linking textual relations to ontological properties. 
These use-cases -demonstrate the benefits of using CEVO for annotation: (i) annotating English -verbs from an abstract conceptualization, (ii) playing the role of an upper -ontology for organizing ontological properties, and (iii) facilitating the -annotation of text relations using any underlying vocabulary. This resource is -available at https://shekarpour.github.io/cevo.io/ using https://w3id.org/cevo -namespace. -" -4145,1701.05847,"Stavros Petridis, Zuwei Li, Maja Pantic",End-To-End Visual Speech Recognition With LSTMs,cs.CV cs.CL," Traditional visual speech recognition systems consist of two stages, feature -extraction and classification. Recently, several deep learning approaches have -been presented which automatically extract features from the mouth images and -aim to replace the feature extraction stage. However, research on joint -learning of features and classification is very limited. In this work, we -present an end-to-end visual speech recognition system based on Long-Short -Memory (LSTM) networks. To the best of our knowledge, this is the first model -which simultaneously learns to extract features directly from the pixels and -perform classification and also achieves state-of-the-art performance in visual -speech classification. The model consists of two streams which extract features -directly from the mouth and difference images, respectively. The temporal -dynamics in each stream are modelled by an LSTM and the fusion of the two -streams takes place via a Bidirectional LSTM (BLSTM). An absolute improvement -of 9.7% over the base line is reported on the OuluVS2 database, and 1.5% on the -CUAVE database when compared with other methods which use a similar visual -front-end. -" -4146,1701.06233,"Tianran Hu, Haoyuan Xiao, Thuy-vy Thi Nguyen, Jiebo Luo",What the Language You Tweet Says About Your Occupation,cs.CY cs.AI cs.CL cs.LG," Many aspects of people's lives are proven to be deeply connected to their -jobs. In this paper, we first investigate the distinct characteristics of major -occupation categories based on tweets. From multiple social media platforms, we -gather several types of user information. From users' LinkedIn webpages, we -learn their proficiencies. To overcome the ambiguity of self-reported -information, a soft clustering approach is applied to extract occupations from -crowd-sourced data. Eight job categories are extracted, including Marketing, -Administrator, Start-up, Editor, Software Engineer, Public Relation, Office -Clerk, and Designer. Meanwhile, users' posts on Twitter provide cues for -understanding their linguistic styles, interests, and personalities. Our -results suggest that people of different jobs have unique tendencies in certain -language styles and interests. Our results also clearly reveal distinctive -levels in terms of Big Five Traits for different jobs. Finally, a classifier is -built to predict job types based on the features extracted from tweets. A high -accuracy indicates a strong discrimination power of language features for job -prediction task. -" -4147,1701.06247,"Hongjie Shi, Takashi Ushio, Mitsuru Endo, Katsuyoshi Yamagami, Noriaki - Horii","A Multichannel Convolutional Neural Network For Cross-language Dialog - State Tracking",cs.CL cs.AI cs.LG," The fifth Dialog State Tracking Challenge (DSTC5) introduces a new -cross-language dialog state tracking scenario, where the participants are asked -to build their trackers based on the English training corpus, while evaluating -them with the unlabeled Chinese corpus. 
Although the computer-generated -translations for both English and Chinese corpus are provided in the dataset, -these translations contain errors and careless use of them can easily hurt the -performance of the built trackers. To address this problem, we propose a -multichannel Convolutional Neural Networks (CNN) architecture, in which we -treat English and Chinese language as different input channels of one single -CNN model. In the evaluation of DSTC5, we found that such multichannel -architecture can effectively improve the robustness against translation errors. -Additionally, our method for DSTC5 is purely machine learning based and -requires no prior knowledge about the target language. We consider this a -desirable property for building a tracker in the cross-language context, as not -every developer will be familiar with both languages. -" -4148,1701.06279,Patrick Ng,dna2vec: Consistent vector representations of variable-length k-mers,q-bio.QM cs.CL cs.LG stat.ML," One of the ubiquitous representation of long DNA sequence is dividing it into -shorter k-mer components. Unfortunately, the straightforward vector encoding of -k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse -yet, the distance between any pair of one-hot vectors is equidistant. This is -particularly problematic when applying the latest machine learning algorithms -to solve problems in biological sequence analysis. In this paper, we propose a -novel method to train distributed representations of variable-length k-mers. -Our method is based on the popular word embedding model word2vec, which is -trained on a shallow two-layer neural network. Our experiments provide evidence -that the summing of dna2vec vectors is akin to nucleotides concatenation. We -also demonstrate that there is correlation between Needleman-Wunsch similarity -score and cosine similarity of dna2vec vectors. -" -4149,1701.06521,Iacer Calixto and Qun Liu and Nick Campbell,"Incorporating Global Visual Features into Attention-Based Neural Machine - Translation",cs.CL," We introduce multi-modal, attention-based neural machine translation (NMT) -models which incorporate visual features into different parts of both the -encoder and the decoder. We utilise global image features extracted using a -pre-trained convolutional neural network and incorporate them (i) as words in -the source sentence, (ii) to initialise the encoder hidden state, and (iii) as -additional data to initialise the decoder hidden state. In our experiments, we -evaluate how these different strategies to incorporate global image features -compare and which ones perform best. We also study the impact that adding -synthetic multi-modal, multilingual data brings and find that the additional -data have a positive impact on multi-modal models. We report new -state-of-the-art results and our best models also significantly improve on a -comparable phrase-based Statistical MT (PBSMT) model trained on the Multi30k -data set according to all metrics evaluated. To the best of our knowledge, it -is the first time a purely neural model significantly improves over a PBSMT -model on all metrics evaluated on this data set. -" -4150,1701.06538,"Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc - Le, Geoffrey Hinton, Jeff Dean","Outrageously Large Neural Networks: The Sparsely-Gated - Mixture-of-Experts Layer",cs.LG cs.CL cs.NE stat.ML," The capacity of a neural network to absorb information is limited by its -number of parameters. 
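The dna2vec abstract above adapts word2vec to k-mer tokens; a minimal gensim sketch of that idea with a fixed k=4 (dna2vec itself trains on variable-length k-mers over a far larger corpus, so everything below is purely illustrative):

from gensim.models import Word2Vec

def kmerize(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Toy corpus: DNA fragments tokenized into overlapping 4-mers.
fragments = ["ACGTACGTGACC", "TTGACCGTACGA", "ACGTGACCTTGA"]
sentences = [kmerize(f, 4) for f in fragments]

model = Word2Vec(sentences, vector_size=32, window=5, min_count=1, sg=1)
print(model.wv.similarity("ACGT", "GACC"))  # cosine similarity of k-mer vectors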
Conditional computation, where parts of the network are
-active on a per-example basis, has been proposed in theory as a way of
-dramatically increasing model capacity without a proportional increase in
-computation. In practice, however, there are significant algorithmic and
-performance challenges. In this work, we address these challenges and finally
-realize the promise of conditional computation, achieving greater than 1000x
-improvements in model capacity with only minor losses in computational
-efficiency on modern GPU clusters. We introduce a Sparsely-Gated
-Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward
-sub-networks. A trainable gating network determines a sparse combination of
-these experts to use for each example. We apply the MoE to the tasks of
-language modeling and machine translation, where model capacity is critical for
-absorbing the vast quantities of knowledge available in the training corpora.
-We present model architectures in which a MoE with up to 137 billion parameters
-is applied convolutionally between stacked LSTM layers. On large language
-modeling and machine translation benchmarks, these models achieve significantly
-better results than the state-of-the-art at lower computational cost.
-"
-4151,1701.06547,"Jiwei Li, Will Monroe, Tianlin Shi, S\'ebastien Jean, Alan Ritter and
- Dan Jurafsky",Adversarial Learning for Neural Dialogue Generation,cs.CL," In this paper, drawing intuition from the Turing test, we propose using
-adversarial training for open-domain dialogue generation: the system is trained
-to produce sequences that are indistinguishable from human-generated dialogue
-utterances. We cast the task as a reinforcement learning (RL) problem where we
-jointly train two systems, a generative model to produce response sequences,
-and a discriminator---analogous to the human evaluator in the Turing test---to
-distinguish between the human-generated dialogues and the machine-generated
-ones. The outputs from the discriminator are then used as rewards for the
-generative model, pushing the system to generate dialogues that mostly resemble
-human dialogues.
- In addition to adversarial training we describe a model for adversarial {\em
-evaluation} that uses success in fooling an adversary as a dialogue evaluation
-metric, while avoiding a number of potential pitfalls. Experimental results on
-several metrics, including adversarial evaluation, demonstrate that the
-adversarially-trained system generates higher-quality responses than previous
-baselines.
-"
-4152,1701.06549,"Jiwei Li, Will Monroe and Dan Jurafsky",Learning to Decode for Future Success,cs.CL," We introduce a simple, general strategy to manipulate the behavior of a
-neural decoder that enables it to generate outputs that have specific
-properties of interest (e.g., sequences of a pre-specified length). The model
-can be thought of as a simple version of the actor-critic model that uses an
-interpolation of the actor (the MLE-based token generation policy) and the
-critic (a value function that estimates the future values of the desired
-property) for decision making. We demonstrate that the approach is able to
-incorporate a variety of properties that cannot be handled by standard neural
-sequence decoders, such as sequence length and backward probability
-(probability of sources given targets), in addition to yielding consistent
-improvements in abstractive summarization and machine translation when the
-property to be optimized is BLEU or ROUGE scores.
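The sparse gating at the heart of the Mixture-of-Experts abstract above can be sketched in a few lines of NumPy; this omits the paper's noise term and load-balancing loss, and all names are illustrative:

import numpy as np

def top_k_gating(x, W_g, k=2):
    # Sparse gating: softmax over only the top-k expert logits.
    logits = x @ W_g                          # one logit per expert
    top = np.argsort(logits)[-k:]             # indices of the k largest logits
    gates = np.zeros_like(logits)
    e = np.exp(logits[top] - logits[top].max())
    gates[top] = e / e.sum()                  # sparse combination weights
    return gates

rng = np.random.default_rng(0)
x = rng.normal(size=16)                       # token representation
W_g = rng.normal(size=(16, 8))                # trainable gating matrix, 8 experts
print(top_k_gating(x, W_g))                   # only 2 nonzero expert weights

Only the experts with nonzero gates are evaluated for a given example, which is what decouples model capacity from per-example computation.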
-" -4153,1701.07149,"Chen Xing, Wei Wu, Yu Wu, Ming Zhou, Yalou Huang, Wei-Ying Ma",Hierarchical Recurrent Attention Network for Response Generation,cs.CL," We study multi-turn response generation in chatbots where a response is -generated according to a conversation context. Existing work has modeled the -hierarchy of the context, but does not pay enough attention to the fact that -words and utterances in the context are differentially important. As a result, -they may lose important information in context and generate irrelevant -responses. We propose a hierarchical recurrent attention network (HRAN) to -model both aspects in a unified framework. In HRAN, a hierarchical attention -mechanism attends to important parts within and among utterances with word -level attention and utterance level attention respectively. With the word level -attention, hidden vectors of a word level encoder are synthesized as utterance -vectors and fed to an utterance level encoder to construct hidden -representations of the context. The hidden vectors of the context are then -processed by the utterance level attention and formed as context vectors for -decoding the response. Empirical studies on both automatic evaluation and human -judgment show that HRAN can significantly outperform state-of-the-art models -for multi-turn response generation. -" -4154,1701.07481,David Harwath and James R. Glass,Learning Word-Like Units from Joint Audio-Visual Analysis,cs.CL cs.CV," Given a collection of images and spoken audio captions, we present a method -for discovering word-like acoustic units in the continuous speech signal and -grounding them to semantically relevant image regions. For example, our model -is able to detect spoken instances of the word 'lighthouse' within an utterance -and associate them with image regions containing lighthouses. We do not use any -form of conventional automatic speech recognition, nor do we use any text -transcriptions or conventional linguistic annotations. Our model effectively -implements a form of spoken language acquisition, in which the computer learns -not only to recognize word categories by sound, but also to enrich the words it -learns with semantics by grounding them in images. -" -4155,1701.07795,"Aaron Jaech and Hetunandan Kamisetty and Eric Ringger and Charlie - Clarke",Match-Tensor: a Deep Relevance Model for Search,cs.IR cs.CL," The application of Deep Neural Networks for ranking in search engines may -obviate the need for the extensive feature engineering common to current -learning-to-rank methods. However, we show that combining simple relevance -matching features like BM25 with existing Deep Neural Net models often -substantially improves the accuracy of these models, indicating that they do -not capture essential local relevance matching signals. We describe a novel -deep Recurrent Neural Net-based model that we call Match-Tensor. The -architecture of the Match-Tensor model simultaneously accounts for both local -relevance matching and global topicality signals allowing for a rich interplay -between them when computing the relevance of a document to a query. On a large -held-out test set consisting of social media documents, we demonstrate not only -that Match-Tensor outperforms BM25 and other classes of DNNs but also that it -largely subsumes signals present in these models. 
-" -4156,1701.07880,D\'avid M\'ark Nemeskey,emLam -- a Hungarian Language Modeling baseline,cs.CL," This paper aims to make up for the lack of documented baselines for Hungarian -language modeling. Various approaches are evaluated on three publicly available -Hungarian corpora. Perplexity values comparable to models of similar-sized -English corpora are reported. A new, freely downloadable Hungar- ian benchmark -corpus is introduced. -" -4157,1701.07955,"Syed Mehedi Hasan Nirob, Md. Kazi Nayeem and Md. Saiful Islam","Statistical Analysis on Bangla Newspaper Data to Extract Trending Topic - and Visualize Its Change Over Time",cs.IR cs.CL," Trending topic of newspapers is an indicator to understand the situation of a -country and also a way to evaluate the particular newspaper. This paper -represents a model describing few techniques to select trending topics from -Bangla Newspaper. Topics that are discussed more frequently than other in -Bangla newspaper will be marked and how a very famous topic loses its -importance with the change of time and another topic takes its place will be -demonstrated. Data from two popular Bangla Newspaper with date and time were -collected. Statistical analysis was performed after on these data after -preprocessing. Popular and most used keywords were extracted from the stream of -Bangla keyword with this analysis. This model can also cluster category wise -news trend or a list of news trend in daily or weekly basis with enough data. A -pattern can be found on their news trend too. Comparison among past news trend -of Bangla newspapers will give a visualization of the situation of Bangladesh. -This visualization will be helpful to predict future trending topics of Bangla -Newspaper. -" -4158,1701.08071,Vladimir Chernykh and Pavel Prikhodko,Emotion Recognition From Speech With Recurrent Neural Networks,cs.CL," In this paper the task of emotion recognition from speech is considered. -Proposed approach uses deep recurrent neural network trained on a sequence of -acoustic features calculated over small speech intervals. At the same time -special probabilistic-nature CTC loss function allows to consider long -utterances containing both emotional and neutral parts. The effectiveness of -such an approach is shown in two ways. Firstly, the comparison with recent -advances in this field is carried out. Secondly, human performance on the same -task is measured. Both criteria show the high quality of the proposed method. -" -4159,1701.08118,"Bj\""orn Ross, Michael Rist, Guillermo Carbonell, Benjamin Cabrera, - Nils Kurowsky, Michael Wojatzki","Measuring the Reliability of Hate Speech Annotations: The Case of the - European Refugee Crisis",cs.CL," Some users of social media are spreading racist, sexist, and otherwise -hateful content. For the purpose of training a hate speech detection system, -the reliability of the annotations is crucial, but there is no universally -agreed-upon definition. We collected potentially hateful messages and asked two -groups of internet users to determine whether they were hate speech or not, -whether they should be banned or not and to rate their degree of offensiveness. -One of the groups was shown a definition prior to completing the survey. We -aimed to assess whether hate speech can be annotated reliably, and the extent -to which existing definitions are in accordance with subjective ratings. 
Our
-results indicate that showing users a definition caused them to partially align
-their own opinion with the definition but did not improve reliability, which
-was very low overall. We conclude that the presence of hate speech should
-perhaps not be considered a binary yes-or-no decision, and raters need more
-detailed instructions for the annotation.
-"
-4160,1701.08156,"Sadia Tasnim Swarna, Shamim Ehsan, Md. Saiful Islam and Marium E
- Jannat",A Comprehensive Survey on Bengali Phoneme Recognition,cs.SD cs.CL," Various hidden Markov model based phoneme recognition methods for the Bengali
-language are reviewed. Automatic phoneme recognition for Bengali using
-multilayer neural networks is also reviewed, and the usefulness of multilayer
-neural networks over single-layer neural networks is discussed. Bangla phonetic
-feature table construction and enhancement for Bengali speech recognition are
-also discussed. A comparison among these methods is presented.
-"
-4161,1701.08198,Anjuli Kannan and Oriol Vinyals,Adversarial Evaluation of Dialogue Models,cs.CL," The recent application of RNN encoder-decoder models has resulted in
-substantial progress in fully data-driven dialogue systems, but evaluation
-remains a challenge. An adversarial loss could be a way to directly evaluate
-the extent to which generated dialogue responses sound like they came from a
-human. This could reduce the need for human evaluation, while more directly
-evaluating on a generative task. In this work, we investigate this idea by
-training an RNN to discriminate a dialogue model's samples from human-generated
-samples. Although we find some evidence this setup could be viable, we also
-note that many issues remain in its practical application. We discuss both
-aspects and conclude that future work is warranted.
-"
-4162,1701.08229,Danielle Mowery and Craig Bryan and Mike Conway,"Feature Studies to Inform the Classification of Depressive Symptoms from
- Twitter Data for Population Health",cs.IR cs.CL cs.CY cs.SI," The utility of Twitter data as a medium to support population-level mental
-health monitoring is not well understood. In an effort to better understand the
-predictive power of supervised machine learning classifiers and the influence
-of feature sets for efficiently classifying depression-related tweets on a
-large scale, we conducted two feature study experiments. In the first
-experiment, we assessed the contribution of feature groups such as lexical
-information (e.g., unigrams) and emotions (e.g., strongly negative) using a
-feature ablation study. In the second experiment, we determined the percentile
-of top ranked features that produced the optimal classification performance by
-applying a three-step feature elimination approach. In the first experiment, we
-observed that lexical features are critical for identifying depressive
-symptoms, specifically for depressed mood (-35 points) and for disturbed sleep
-(-43 points). In the second experiment, we observed that the optimal F1-score
-performance of top ranked features variably ranged across classes, e.g., from
-fatigue or loss of energy (5th percentile, 288 features) to depressed mood
-(55th percentile, 3,168 features), suggesting there is no consistent count of
-features for predicting depression-related tweets. We conclude that simple
-lexical features and reduced feature sets can produce results comparable to
-larger feature sets.
-"
-4163,1701.08251,"Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley,
- Jianfeng Gao, Georgios P.
Spithourakis, Lucy Vanderwende","Image-Grounded Conversations: Multimodal Context for Natural Question - and Response Generation",cs.CL cs.AI cs.CV," The popularity of image sharing on social media and the engagement it creates -between users reflects the important role that visual context plays in everyday -conversations. We present a novel task, Image-Grounded Conversations (IGC), in -which natural-sounding conversations are generated about a shared image. To -benchmark progress, we introduce a new multiple-reference dataset of -crowd-sourced, event-centric conversations on images. IGC falls on the -continuum between chit-chat and goal-directed conversation models, where visual -grounding constrains the topic of conversation to event-driven utterances. -Experiments with models trained on social media data show that the combination -of visual and textual context enhances the quality of generated conversational -turns. In human evaluation, the gap between human performance and that of both -neural and retrieval architectures suggests that multi-modal IGC presents an -interesting challenge for dialogue research. -" -4164,1701.08269,"Rui Liu, Xiaoli Zhang","Systems of natural-language-facilitated human-robot cooperation: A - review",cs.RO cs.AI cs.CL cs.HC," Natural-language-facilitated human-robot cooperation (NLC), in which natural -language (NL) is used to share knowledge between a human and a robot for -conducting intuitive human-robot cooperation (HRC), is continuously developing -in the recent decade. Currently, NLC is used in several robotic domains such as -manufacturing, daily assistance and health caregiving. It is necessary to -summarize current NLC-based robotic systems and discuss the future developing -trends, providing helpful information for future NLC research. In this review, -we first analyzed the driving forces behind the NLC research. Regarding to a -robot s cognition level during the cooperation, the NLC implementations then -were categorized into four types {NL-based control, NL-based robot training, -NL-based task execution, NL-based social companion} for comparison and -discussion. Last based on our perspective and comprehensive paper review, the -future research trends were discussed. -" -4165,1701.08303,"Sunil Kumar Sahu, Ashish Anand","Drug-Drug Interaction Extraction from Biomedical Text Using Long Short - Term Memory Network",cs.CL," Simultaneous administration of multiple drugs can have synergistic or -antagonistic effects as one drug can affect activities of other drugs. -Synergistic effects lead to improved therapeutic outcomes, whereas, -antagonistic effects can be life-threatening, may lead to increased healthcare -cost, or may even cause death. Thus identification of unknown drug-drug -interaction (DDI) is an important concern for efficient and effective -healthcare. Although multiple resources for DDI exist, they are often unable to -keep pace with rich amount of information available in fast growing biomedical -texts. Most existing methods model DDI extraction from text as a classification -problem and mainly rely on handcrafted features. Some of these features further -depend on domain specific tools. Recently neural network models using latent -features have been shown to give similar or better performance than the other -existing models dependent on handcrafted features. In this paper, we present -three models namely, {\it B-LSTM}, {\it AB-LSTM} and {\it Joint AB-LSTM} based -on long short-term memory (LSTM) network. 
All three models utilize word and -position embedding as latent features and thus do not rely on explicit feature -engineering. Further use of bidirectional long short-term memory (Bi-LSTM) -networks allow implicit feature extraction from the whole sentence. The two -models, {\it AB-LSTM} and {\it Joint AB-LSTM} also use attentive pooling in the -output of Bi-LSTM layer to assign weights to features. Our experimental results -on the SemEval-2013 DDI extraction dataset show that the {\it Joint AB-LSTM} -model outperforms all the existing methods, including those relying on -handcrafted features. The other two proposed LSTM models also perform -competitively with state-of-the-art methods. -" -4166,1701.08339,"Ebrahim Ansari, M.H. Sadreddini, Mostafa Sheikhalishahi, Richard - Wallace, Fatemeh Alimardani","Using English as Pivot to Extract Persian-Italian Parallel Sentences - from Non-Parallel Corpora",cs.CL," The effectiveness of a statistical machine translation system (SMT) is very -dependent upon the amount of parallel corpus used in the training phase. For -low-resource language pairs there are not enough parallel corpora to build an -accurate SMT. In this paper, a novel approach is presented to extract bilingual -Persian-Italian parallel sentences from a non-parallel (comparable) corpus. In -this study, English is used as the pivot language to compute the matching -scores between source and target sentences and candidate selection phase. -Additionally, a new monolingual sentence similarity metric, Normalized Google -Distance (NGD) is proposed to improve the matching process. Moreover, some -extensions of the baseline system are applied to improve the quality of -extracted sentences measured with BLEU. Experimental results show that using -the new pivot based extraction can increase the quality of bilingual corpus -significantly and consequently improves the performance of the Persian-Italian -SMT system. -" -4167,1701.08340,"Ebrahim Ansari, M.H. Sadreddini, Lucio Grandinetti, Mahsa Radinmehr, - Ziba Khosravan, and Mehdi Sheikhalishahi","Extracting Bilingual Persian Italian Lexicon from Comparable Corpora - Using Different Types of Seed Dictionaries",cs.CL," Bilingual dictionaries are very important in various fields of natural -language processing. In recent years, research on extracting new bilingual -lexicons from non-parallel (comparable) corpora have been proposed. Almost all -use a small existing dictionary or other resources to make an initial list -called the ""seed dictionary"". In this paper, we discuss the use of different -types of dictionaries as the initial starting list for creating a bilingual -Persian-Italian lexicon from a comparable corpus. Our experiments apply -state-of-the-art techniques on three different seed dictionaries; an existing -dictionary, a dictionary created with pivot-based schema, and a dictionary -extracted from a small Persian-Italian parallel text. The interesting challenge -of our approach is to find a way to combine different dictionaries together in -order to produce a better and more accurate lexicon. In order to combine seed -dictionaries, we propose two different combination models and examine the -effect of our novel combination models on various comparable corpora that have -differing degrees of comparability. We conclude with a proposal for a new -weighting system to improve the extracted lexicon. The experimental results -produced by our implementation show the efficiency of our proposed models. 
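The Normalized Google Distance mentioned in the pivot-based extraction abstract above follows the classic definition of Cilibrasi and Vitanyi; a sketch of that formula with invented hit counts (the paper adapts it into a monolingual sentence similarity metric, which is not shown here):

import math

def ngd(fx, fy, fxy, n):
    # Normalized Google Distance from search hit counts: fx and fy are hits
    # for each term alone, fxy for both together, n the total page count.
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Invented counts: terms that co-occur often yield a small distance.
print(ngd(fx=9_000_000, fy=8_000_000, fxy=6_500_000, n=25_000_000_000))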
-" -4168,1701.08533,"Mohammad Aliannejadi, Masoud Kiaeeha, Shahram Khadivi, Saeed Shiry - Ghidary","Graph-Based Semi-Supervised Conditional Random Fields For Spoken - Language Understanding Using Unaligned Data",cs.CL," We experiment graph-based Semi-Supervised Learning (SSL) of Conditional -Random Fields (CRF) for the application of Spoken Language Understanding (SLU) -on unaligned data. The aligned labels for examples are obtained using IBM -Model. We adapt a baseline semi-supervised CRF by defining new feature set and -altering the label propagation algorithm. Our results demonstrate that our -proposed approach significantly improves the performance of the supervised -model by utilizing the knowledge gained from the graph. -" -4169,1701.08655,"Shrikant Malviya, Rohit Mishra and Uma Shanker Tiwary","Structural Analysis of Hindi Phonetics and A Method for Extraction of - Phonetically Rich Sentences from a Very Large Hindi Text Corpus",cs.CL," Automatic speech recognition (ASR) and Text to speech (TTS) are two prominent -area of research in human computer interaction nowadays. A set of phonetically -rich sentences is in a matter of importance in order to develop these two -interactive modules of HCI. Essentially, the set of phonetically rich sentences -has to cover all possible phone units distributed uniformly. Selecting such a -set from a big corpus with maintaining phonetic characteristic based similarity -is still a challenging problem. The major objective of this paper is to devise -a criteria in order to select a set of sentences encompassing all phonetic -aspects of a corpus with size as minimum as possible. First, this paper -presents a statistical analysis of Hindi phonetics by observing the structural -characteristics. Further a two stage algorithm is proposed to extract -phonetically rich sentences with a high variety of triphones from the EMILLE -Hindi corpus. The algorithm consists of a distance measuring criteria to select -a sentence in order to improve the triphone distribution. Moreover, a special -preprocessing method is proposed to score each triphone in terms of inverse -probability in order to fasten the algorithm. The results show that the -approach efficiently build uniformly distributed phonetically-rich corpus with -optimum number of sentences. -" -4170,1701.08694,"Md. Saiful Islam, Fazla Elahi Md Jubayer and Syed Ikhtiar Ahmed","A Comparative Study on Different Types of Approaches to Bengali document - Categorization",cs.CL cs.LG," Document categorization is a technique where the category of a document is -determined. In this paper three well-known supervised learning techniques which -are Support Vector Machine(SVM), Na\""ive Bayes(NB) and Stochastic Gradient -Descent(SGD) compared for Bengali document categorization. Besides classifier, -classification also depends on how feature is selected from dataset. For -analyzing those classifier performances on predicting a document against twelve -categories several feature selection techniques are also applied in this -article namely Chi square distribution, normalized TFIDF (term -frequency-inverse document frequency) with word analyzer. So, we attempt to -explore the efficiency of those three-classification algorithms by using two -different feature selection techniques in this article. -" -4171,1701.08702,"Dipaloke Saha, Md Saddam Hossain, MD. 
Saiful Islam and Sabir Ismail","Bangla Word Clustering Based on Tri-gram, 4-gram and 5-gram Language
- Model",cs.CL," In this paper, we describe a research method that generates Bangla word
-clusters on the basis of semantic relatedness and contextual similarity. The
-importance of word clustering lies in parts of speech (POS) tagging, word sense
-disambiguation, text classification, recommender systems, spell checking,
-grammar checking, knowledge discovery and many other Natural Language
-Processing (NLP) applications. Efficient word clustering methods have already
-been implemented for English and some other languages. But due to a lack of
-resources, word clustering in Bangla has still not been implemented
-efficiently. Presently, its implementation is at the beginning stage. Some
-research on word clustering in English, based on the five preceding and
-following words of a key word, has reported efficient results. Now, we are
-trying to implement the tri-gram, 4-gram and 5-gram models of word clustering
-for Bangla to observe which one is the best among them. We have started our
-research with quite a large corpus of approximately 1 lakh Bangla words. We are
-using a machine learning technique in this research. We will generate word
-clusters and analyze the clusters by testing some different threshold values.
-"
-4172,1701.08706,"Md. Fahad Hasan, Tasmin Afroz, Sabir Ismail and Md. Saiful Islam",Document Decomposition of Bangla Printed Text,cs.CV cs.CL," Today all kinds of information are getting digitized, and along with all this
-digitization, the huge archive of various kinds of documents is being digitized
-too. We know that Optical Character Recognition is the method through which
-newspapers and other paper documents are converted into digital resources. But
-this method works on text only. As a result, if we try to process any document
-which contains non-textual zones, then we will get garbage text as output. That
-is why, in order to digitize documents properly, they should be preprocessed
-carefully. And while preprocessing, properly segmenting the document into
-different regions according to category is most important. But the Optical
-Character Recognition processes available for the Bangla language have no such
-algorithm that can categorize a newspaper/book page fully. So we worked to
-decompose a document into its several parts, like headlines, sub-headlines,
-columns, images, etc., and if the input was skewed or rotated, it was also
-deskewed and de-rotated. To decompose any Bangla document we found the edges of
-the input image. Then we found the horizontal and vertical area in which every
-pixel lies. Later on, the input image was cut according to these areas. Then we
-picked each and every sub-image and found its height-width ratio and line
-height. Then, according to these values, the sub-images were categorized. To
-deskew the image we found the skew angle and deskewed the image according to
-this angle. To de-rotate the image we used the line height, the matra line, and
-the pixel ratio of the matra line.
-"
-4173,1701.08711,Vinci Chow,"Predicting Auction Price of Vehicle License Plate with Deep Recurrent
- Neural Network",cs.CL cs.LG q-fin.EC stat.ML," In Chinese societies, superstition is of paramount importance, and vehicle
-license plates with desirable numbers can fetch very high prices in auctions.
-Unlike other valuable items, license plates are not allocated an estimated
-price before auction.
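The context-window clustering in the Bangla word clustering abstract above can be prototyped by counting co-occurrences within a +/-2 window (a 5-gram-style context) and clustering the count vectors; the toy romanized corpus and cluster count below are invented:

import numpy as np
from sklearn.cluster import KMeans

corpus = "ami bhat khai ami machh khai tumi bhat khao".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# For each key word, count which words appear within two positions of it.
vec = np.zeros((len(vocab), len(vocab)))
for p, w in enumerate(corpus):
    for q in range(max(0, p - 2), min(len(corpus), p + 3)):
        if q != p:
            vec[idx[w], idx[corpus[q]]] += 1

labels = KMeans(n_clusters=3, n_init=10).fit_predict(vec)
print(dict(zip(vocab, labels)))   # words with similar contexts share a cluster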
I propose that the task of predicting plate prices can be -viewed as a natural language processing (NLP) task, as the value depends on the -meaning of each individual character on the plate and its semantics. I -construct a deep recurrent neural network (RNN) to predict the prices of -vehicle license plates in Hong Kong, based on the characters on a plate. I -demonstrate the importance of having a deep network and of retraining. -Evaluated on 13 years of historical auction prices, the deep RNN's predictions -can explain over 80 percent of price variations, outperforming previous models -by a significant margin. I also demonstrate how the model can be extended to -become a search engine for plates and to provide estimates of the expected -price distribution. -" -4174,1702.00756,"Rui Liu, Xiaoli Zhang","A Review of Methodologies for Natural-Language-Facilitated Human-Robot - Cooperation",cs.RO cs.AI cs.CL cs.HC," Natural-language-facilitated human-robot cooperation (NLC) refers to using -natural language (NL) to facilitate interactive information sharing and task -executions with a common goal constraint between robots and humans. Recently, -NLC research has received increasing attention. Typical NLC scenarios include -robotic daily assistance, robotic health caregiving, intelligent manufacturing, -autonomous navigation, and robot social accompaniment. However, a thorough review -that reveals the latest methodologies for using NL to facilitate human-robot -cooperation is missing. In this review, a comprehensive summary of -methodologies for NLC is presented. NLC research includes three main research -focuses: NL instruction understanding, NL-based execution plan generation, and -knowledge-world mapping. In-depth analyses of theoretical methods, -applications, and model advantages and disadvantages are made. Based on our -paper review and perspective, potential research directions of NLC are -summarized. -" -4175,1702.00888,"Guang-Neng Hu, Xin-Yu Dai","Integrating Reviews into Personalized Ranking for Cold Start - Recommendation",cs.IR cs.AI cs.CL," The item recommendation task predicts a personalized ranking over a set of items -for each individual user. One paradigm is the rating-based methods that -concentrate on explicit feedback and hence face difficulties in collecting -it. Meanwhile, the ranking-based methods are presented with rated items and -then rank the rated above the unrated. This paradigm takes advantage of widely -available implicit feedback. It, however, usually ignores a kind of important -information: item reviews. Item reviews not only justify the preferences of -users, but also help alleviate the cold-start problem that defeats -collaborative filtering. In this paper, we propose two novel and simple models -to integrate item reviews into Bayesian personalized ranking. In each model, we -make use of text features extracted from item reviews using word embeddings. On -top of text features we uncover the review dimensions that explain the -variation in users' feedback, and these review factors represent a prior -preference of users. Experiments on six real-world data sets show the benefits -of leveraging item reviews on ranking prediction. We also conduct analyses to -understand the proposed models.
-" -4176,1701.08954,"Marco Baroni, Armand Joulin, Allan Jabri, Germ\`an Kruszewski, - Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov",CommAI: Evaluating the first steps towards a useful general AI,cs.LG cs.AI cs.CL," With machine learning successfully applied to new daunting problems almost -every day, general AI starts looking like an attainable goal. However, most -current research focuses instead on important but narrow applications, such as -image classification or machine translation. We believe this to be largely due -to the lack of objective ways to measure progress towards broad machine -intelligence. In order to fill this gap, we propose here a set of concrete -desiderata for general AI, together with a platform to test machines on how -well they satisfy such desiderata, while keeping all further complexities to a -minimum. -" -4177,1701.09123,Rodrigo Agerri and German Rigau,"Robust Multilingual Named Entity Recognition with Shallow - Semi-Supervised Features",cs.CL cs.AI," We present a multilingual Named Entity Recognition approach based on a robust -and general set of features across languages and datasets. Our system combines -shallow local information with clustering semi-supervised features induced on -large amounts of unlabeled text. Understanding via empirical experimentation -how to effectively combine various types of clustering features allows us to -seamlessly export our system to other datasets and languages. The result is a -simple but highly competitive system which obtains state of the art results -across five languages and twelve datasets. The results are reported on standard -shared task evaluation data such as CoNLL for English, Spanish and Dutch. -Furthermore, and despite the lack of linguistically motivated features, we also -report best results for languages such as Basque and German. In addition, we -demonstrate that our method also obtains very competitive results even when the -amount of supervised data is cut by half, alleviating the dependency on -manually annotated data. Finally, the results show that our emphasis on -clustering features is crucial to develop robust out-of-domain models. The -system and models are freely available to facilitate its use and guarantee the -reproducibility of results. -" -4178,1702.00167,"Deepak Gupta, Shubham Tripathi, Asif Ekbal, Pushpak Bhattacharyya",SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text,cs.CL," Use of social media has grown dramatically during the last few years. Users -follow informal languages in communicating through social media. The language -of communication is often mixed in nature, where people transcribe their -regional language with English and this technique is found to be extremely -popular. Natural language processing (NLP) aims to infer the information from -these text where Part-of-Speech (PoS) tagging plays an important role in -getting the prosody of the written text. For the task of PoS tagging on -Code-Mixed Indian Social Media Text, we develop a supervised system based on -Conditional Random Field classifier. In order to tackle the problem -effectively, we have focused on extracting rich linguistic features. We -participate in three different language pairs, ie. English-Hindi, -English-Bengali and English-Telugu on three different social media platforms, -Twitter, Facebook & WhatsApp. The proposed system is able to successfully -assign coarse as well as fine-grained PoS tag labels for a given a code-mixed -sentence. 
Experiments show that our system is quite generic and shows -encouraging performance levels on all three language pairs in all the -domains. -" -4179,1702.00210,Scott A. Hale and Irene Eleta,Foreign-language Reviews: Help or Hindrance?,cs.HC cs.CL cs.CY," The number and quality of user reviews greatly affect consumer purchasing -decisions. While reviews in all languages are increasing, it is still often the -case (especially for non-English speakers) that there are only a few reviews in -a person's first language. Using an online experiment, we examine the value -that potential purchasers receive from interfaces showing additional reviews in -a second language. The results paint a complicated picture with both positive -and negative reactions to the inclusion of foreign-language reviews. Roughly -26-28% of subjects clicked to see translations of the foreign-language content -when given the opportunity, and those who did so were more likely to select the -product with foreign-language reviews than those who did not. -" -4180,1702.00500,"Linfeng Song, Xiaochang Peng, Yue Zhang, Zhiguo Wang and Daniel Gildea",AMR-to-text Generation with Synchronous Node Replacement Grammar,cs.CL," This paper addresses the task of AMR-to-text generation by leveraging -synchronous node replacement grammar. During training, graph-to-string rules -are learned using a heuristic extraction algorithm. At test time, a graph -transducer is applied to collapse input AMRs and generate output sentences. -Evaluated on SemEval-2016 Task 8, our method gives a BLEU score of 25.62, which -is the best reported so far. -" -4181,1702.00523,Satish Palaniappan and Ronojoy Adhikari,Deep Learning the Indus Script,cs.CV cs.CL cs.LG," Standardized corpora of undeciphered scripts, a necessary starting point for -computational epigraphy, require laborious human effort for their preparation -from raw archaeological records. Automating this process through machine -learning algorithms can be of significant aid to epigraphical research. Here, -we take the first steps in this direction and present a deep learning pipeline -that takes as input images of the undeciphered Indus script, as found in -archaeological artifacts, and returns as output a string of graphemes, suitable -for inclusion in a standard corpus. The image is first decomposed into regions -using Selective Search and these regions are classified as containing textual -and/or graphical information using a convolutional neural network. Regions -classified as potentially containing text are hierarchically merged and trimmed -to remove non-textual information. The remaining textual part of the image is -segmented using standard image processing techniques to isolate individual -graphemes. This set is finally passed to a second convolutional neural network -to classify the graphemes, based on a standard corpus. The classifier can -identify the presence or absence of the most frequent Indus grapheme, the ""jar"" -sign, with an accuracy of 92%. Our results demonstrate the great potential of -deep learning approaches in computational epigraphy and, more generally, in the -digital humanities.
-" -4182,1702.00564,"Shravan Vasishth, Nicolas Chopin, Robin Ryder, Bruno Nicenboim","Modelling dependency completion in sentence comprehension as a Bayesian - hierarchical mixture process: A case study involving Chinese relative clauses",stat.AP cs.CL stat.ME stat.ML," We present a case-study demonstrating the usefulness of Bayesian hierarchical -mixture modelling for investigating cognitive processes. In sentence -comprehension, it is widely assumed that the distance between linguistic -co-dependents affects the latency of dependency resolution: the longer the -distance, the longer the retrieval time (the distance-based account). An -alternative theory, direct-access, assumes that retrieval times are a mixture -of two distributions: one distribution represents successful retrievals (these -are independent of dependency distance) and the other represents an initial -failure to retrieve the correct dependent, followed by a reanalysis that leads -to successful retrieval. We implement both models as Bayesian hierarchical -models and show that the direct-access model explains Chinese relative clause -reading time data better than the distance account. -" -4183,1702.00700,Egoitz Laparra and Rodrigo Agerri and Itziar Aldabe and German Rigau,Multilingual and Cross-lingual Timeline Extraction,cs.CL cs.AI," In this paper we present an approach to extract ordered timelines of events, -their participants, locations and times from a set of multilingual and -cross-lingual data sources. Based on the assumption that event-related -information can be recovered from different documents written in different -languages, we extend the Cross-document Event Ordering task presented at -SemEval 2015 by specifying two new tasks for, respectively, Multilingual and -Cross-lingual Timeline Extraction. We then develop three deterministic -algorithms for timeline extraction based on two main ideas. First, we address -implicit temporal relations at document level since explicit time-anchors are -too scarce to build a wide coverage timeline extraction system. Second, we -leverage several multilingual resources to obtain a single, inter-operable, -semantic representation of events across documents and across languages. The -result is a highly competitive system that strongly outperforms the current -state-of-the-art. Nonetheless, further analysis of the results reveals that -linking the event mentions with their target entities and time-anchors remains -a difficult challenge. The systems, resources and scorers are freely available -to facilitate its use and guarantee the reproducibility of results. -" -4184,1702.00716,"Simon Gottschalk, Elena Demidova",Analysing Temporal Evolution of Interlingual Wikipedia Article Pairs,cs.CL," Wikipedia articles representing an entity or a topic in different language -editions evolve independently within the scope of the language-specific user -communities. This can lead to different points of views reflected in the -articles, as well as complementary and inconsistent information. An analysis of -how the information is propagated across the Wikipedia language editions can -provide important insights in the article evolution along the temporal and -cultural dimensions and support quality control. To facilitate such analysis, -we present MultiWiki - a novel web-based user interface that provides an -overview of the similarities and differences across the article pairs -originating from different language editions on a timeline. 
MultiWiki enables -users to observe the changes in the interlingual article similarity over time -and to perform a detailed visual comparison of the article snapshots at a -particular time point. -" -4185,1702.00764,Lorenzo Ferrone and Fabio Massimo Zanzotto,"Symbolic, Distributed and Distributional Representations for Natural - Language Processing in the Era of Deep Learning: a Survey",cs.CL," Natural language is inherently a discrete symbolic representation of human -knowledge. Recent advances in machine learning (ML) and in natural language -processing (NLP) seem to contradict the above intuition: discrete symbols are -fading away, erased by vectors or tensors called distributed and distributional -representations. However, there is a strict link between -distributed/distributional representations and discrete symbols, the -former being an approximation of the latter. A clearer understanding of the strict -link between distributed/distributional representations and symbols may -certainly lead to radically new deep learning networks. In this paper we present a -survey that aims to renew the link between symbolic representations and -distributed/distributional representations. This is the right time to -revitalize the area of interpreting how discrete symbols are represented inside -neural networks. -" -4186,1702.00860,"Colin Allen and Hongliang Luo and Jaimie Murdock and Jianghuai Pu and - Xiaohong Wang and Yanjie Zhai and Kun Zhao",Topic Modeling the H\`an di\u{a}n Ancient Classics,cs.CL cs.CY cs.DL cs.HC cs.IR," Ancient Chinese texts present an area of enormous challenge and opportunity -for humanities scholars interested in exploiting computational methods to -assist in the development of new insights and interpretations of culturally -significant materials. In this paper we describe a collaborative effort between -Indiana University and Xi'an Jiaotong University to support exploration and -interpretation of a digital corpus of over 18,000 ancient Chinese documents, -which we refer to as the ""Handian"" ancient classics corpus (H\`an di\u{a}n -g\u{u} j\'i, i.e., the ""Han canon"" or ""Chinese classics""). It contains classics -of ancient Chinese philosophy, documents of historical and biographical -significance, and literary works. We begin by describing the Digital Humanities -context of this joint project, and the advances in humanities computing that -made this project feasible. We describe the corpus and introduce our -application of probabilistic topic modeling to this corpus, with attention to -the particular challenges posed by modeling ancient Chinese documents. We give -a specific example of how the software we have developed can be used to aid -discovery and interpretation of themes in the corpus. We outline more advanced -forms of computer-aided interpretation that are also made possible by the -programming interface provided by our system, and the general implications of -these methods for understanding the nature of meaning in these texts. -" -4187,1702.00887,"Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush",Structured Attention Networks,cs.CL cs.LG cs.NE," Attention networks have proven to be an effective approach for embedding -categorical inference within a deep neural network. However, for many tasks we -may want to model richer structural dependencies without abandoning end-to-end -training. In this work, we experiment with incorporating richer structural -distributions, encoded using graphical models, within deep networks.
We show -that these structured attention networks are simple extensions of the basic -attention procedure, and that they allow for extending attention beyond the -standard soft-selection approach, such as attending to partial segmentations or -to subtrees. We experiment with two different classes of structured attention -networks: a linear-chain conditional random field and a graph-based parsing -model, and describe how these models can be practically implemented as neural -network layers. Experiments show that this approach is effective for -incorporating structural biases, and structured attention networks outperform -baseline attention models on a variety of synthetic and real tasks: tree -transduction, neural machine translation, question answering, and natural -language inference. We further find that models trained in this way learn -interesting unsupervised hidden representations that generalize simple -attention. -" -4188,1702.00956,"Suwon Shon, Hanseok Ko","KU-ISPL Speaker Recognition Systems under Language mismatch condition - for NIST 2016 Speaker Recognition Evaluation",cs.SD cs.CL," Korea University Intelligent Signal Processing Lab. (KU-ISPL) developed a -speaker recognition system for the SRE16 fixed training condition. Data for -evaluation trials are collected from outside North America, spoken in Tagalog -and Cantonese, while the training data is spoken only in English. Thus, the main issue for -SRE16 is compensating for the discrepancy between different languages. Using a -development dataset spoken in Cebuano and Mandarin, we could prepare for -the evaluation trials through preliminary experiments that compensate for the -language-mismatched condition. Our team developed 4 different approaches to -extract i-vectors and applied state-of-the-art techniques as the backend. To -compensate for language mismatch, we investigated unique methods such -as unsupervised language clustering, inter-language variability compensation -and gender/language dependent score normalization. -" -4189,1702.00992,"Eric Malmi, Daniele Pighin, Sebastian Krause, Mikhail Kozhevnikov",Automatic Prediction of Discourse Connectives,cs.CL," Accurate prediction of suitable discourse connectives (however, furthermore, -etc.) is a key component of any system aimed at building coherent and fluent -discourses from shorter sentences and passages. As an example, a dialog system -might assemble a long and informative answer by sampling passages extracted -from different documents retrieved from the Web. We formulate the task of -discourse connective prediction and release a dataset of 2.9M sentence pairs -separated by discourse connectives for this task. Then, we evaluate the -hardness of the task for human raters, apply a recently proposed decomposable -attention (DA) model to this task and observe that the automatic predictor has -a higher F1 than human raters (32 vs. 30). Nevertheless, under specific -conditions the raters still outperform the DA model, suggesting that there is -headroom for future improvements.
-" -4190,1702.01090,"Jaimie Murdock and Colin Allen and Katy B\""orner and Robert Light and - Simon McAlister and Andrew Ravenscroft and Robert Rose and Doori Rose and Jun - Otsuka and David Bourget and John Lawrence and Chris Reed","Multi-level computational methods for interdisciplinary research in the - HathiTrust Digital Library",cs.DL cs.CL cs.IR," We show how faceted search using a combination of traditional classification -systems and mixed-membership topic models can go beyond keyword search to -inform resource discovery, hypothesis formulation, and argument extraction for -interdisciplinary research. Our test domain is the history and philosophy of -scientific work on animal mind and cognition. The methods can be generalized to -other research areas and ultimately support a system for semi-automatic -identification of argument structures. We provide a case study for the -application of the methods to the problem of identifying and extracting -arguments about anthropomorphism during a critical period in the development of -comparative psychology. We show how a combination of classification systems and -mixed-membership models trained over large digital libraries can inform -resource discovery in this domain. Through a novel approach of ""drill-down"" -topic modeling---simultaneously reducing both the size of the corpus and the -unit of analysis---we are able to reduce a large collection of fulltext volumes -to a much smaller set of pages within six focal volumes containing arguments of -interest to historians and philosophers of comparative psychology. The volumes -identified in this way did not appear among the first ten results of the -keyword search in the HathiTrust digital library and the pages bear the kind of -""close reading"" needed to generate original interpretations that is the heart -of scholarly work in the humanities. Zooming back out, we provide a way to -place the books onto a map of science originally constructed from very -different data and for different purposes. The multilevel approach advances -understanding of the intellectual and societal contexts in which writings are -interpreted. -" -4191,1702.01101,Iacer Calixto and Qun Liu and Nick Campbell,Multilingual Multi-modal Embeddings for Natural Language Processing,cs.CL," We propose a novel discriminative model that learns embeddings from -multilingual and multi-modal data, meaning that our model can take advantage of -images and descriptions in multiple languages to improve embedding quality. To -that end, we introduce a modification of a pairwise contrastive estimation -optimisation function as our training objective. We evaluate our embeddings on -an image-sentence ranking (ISR), a semantic textual similarity (STS), and a -neural machine translation (NMT) task. We find that the additional multilingual -signals lead to improvements on both the ISR and STS tasks, and the -discriminative cost can also be used in re-ranking $n$-best lists produced by -NMT models, yielding strong improvements. -" -4192,1702.01147,"Maria Nadejde, Siva Reddy, Rico Sennrich, Tomasz Dwojak, Marcin - Junczys-Dowmunt, Philipp Koehn, Alexandra Birch","Predicting Target Language CCG Supertags Improves Neural Machine - Translation",cs.CL," Neural machine translation (NMT) models are able to partially learn syntactic -information from sequential lexical information. Still, some complex syntactic -phenomena such as prepositional phrase attachment are poorly modeled. 
This work -aims to answer two questions: 1) Does explicitly modeling target language -syntax help NMT? 2) Is tight integration of words and syntax better than -multitask training? We introduce syntactic information in the form of CCG -supertags in the decoder, by interleaving the target supertags with the word -sequence. Our results on WMT data show that explicitly modeling target-syntax -improves machine translation quality for German->English, a high-resource pair, -and for Romanian->English, a low-resource pair, and also improves several syntactic -phenomena including prepositional phrase attachment. Furthermore, a tight -coupling of words and syntax improves translation quality more than multitask -training. By combining target-syntax with adding source-side dependency labels -in the embedding layer, we obtain a total improvement of 0.9 BLEU for -German->English and 1.2 BLEU for Romanian->English. -" -4193,1702.01172,"Helge Holzmann, Thomas Risse",Insights into Entity Name Evolution on Wikipedia,cs.CL cs.DL," Working with Web archives raises a number of issues caused by their temporal -characteristics. Depending on the age of the content, additional knowledge -might be needed to find and understand older texts. Especially facts about -entities are subject to change. Most severe in terms of information retrieval -are name changes. In order to find entities that have changed their name over -time, search engines need to be aware of this evolution. We tackle this problem -by analyzing Wikipedia in terms of entity evolutions mentioned in articles -regardless of the structural elements. We gathered statistics and automatically -extracted minimum excerpts covering name changes by incorporating lists -dedicated to that subject. In future work, these excerpts are going to be used -to discover patterns and detect changes in other sources. In this work we -investigate whether or not Wikipedia is a suitable source for extracting the -required knowledge. -" -4194,1702.01176,"Helge Holzmann, Thomas Risse",Named Entity Evolution Analysis on Wikipedia,cs.CL cs.DL," Accessing Web archives raises a number of issues caused by their temporal -characteristics. Additional knowledge is needed to find and understand older -texts. Especially entities mentioned in texts are subject to change. Most -severe in terms of information retrieval are name changes. In order to find -entities that have changed their name over time, search engines need to be -aware of this evolution. We tackle this problem by analyzing Wikipedia in terms -of entity evolutions mentioned in articles. We present statistical data on -excerpts covering name changes, which will be used to discover similar text -passages and extract evolution knowledge in future work. -" -4195,1702.01179,"Helge Holzmann, Thomas Risse",Extraction of Evolution Descriptions from the Web,cs.CL cs.DL," The evolution of named entities affects exploration and retrieval tasks in -digital libraries. An information retrieval system that is aware of name -changes can actively support users in finding former occurrences of evolved -entities. However, current structured knowledge bases, such as DBpedia or -Freebase, do not provide enough information about evolutions, even though the -data is available on their resources, like Wikipedia. Our \emph{Evolution Base} -prototype will demonstrate how excerpts describing name evolutions can be -identified on these websites with promising precision.
The descriptions are -classified by means of models that we trained based on a recent analysis of -named entity evolutions on Wikipedia. -" -4196,1702.01187,"Helge Holzmann, Nina Tahmasebi, Thomas Risse",Named Entity Evolution Recognition on the Blogosphere,cs.CL cs.DL," Advancements in technology and culture lead to changes in our language. These -changes create a gap between the language known by users and the language -stored in digital archives. It affects users' ability, firstly, to find -content and, secondly, to interpret that content. In previous work we introduced our -approach for Named Entity Evolution Recognition~(NEER) in newspaper -collections. Lately, increasing efforts in Web preservation have led to increased -availability of Web archives covering longer time spans. However, language on -the Web is more dynamic than in traditional media and many of the basic -assumptions from the newspaper domain do not hold for Web data. In this paper -we discuss the limitations of existing methodology for NEER. We approach these -by adapting an existing NEER method to work on noisy data like the Web and the -Blogosphere in particular. We develop novel filters that reduce the noise and -make use of Semantic Web resources to obtain more information about terms. Our -evaluation shows the potential of the proposed approach. -" -4197,1702.01287,Iacer Calixto and Qun Liu and Nick Campbell,Doubly-Attentive Decoder for Multi-modal Neural Machine Translation,cs.CL," We introduce a Multi-modal Neural Machine Translation model in which a -doubly-attentive decoder naturally incorporates spatial visual features -obtained using pre-trained convolutional neural networks, bridging the gap -between image description and translation. Our decoder learns to attend to -source-language words and parts of an image independently by means of two -separate attention mechanisms as it generates words in the target language. We -find that our model can efficiently exploit not just back-translated in-domain -multi-modal data but also large general-domain text-only MT corpora. We also -report state-of-the-art results on the Multi30k data set. -" -4198,1702.01360,"Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, Alena Rott, Lucas - Ondel, Pegah Ghahremani, Najim Dehak, Lukas Burget, Sanjeev Khudanpur",An Empirical Evaluation of Zero Resource Acoustic Unit Discovery,cs.CL," Acoustic unit discovery (AUD) is a process of automatically identifying a -categorical acoustic unit inventory from speech and producing corresponding -acoustic unit tokenizations. AUD provides an important avenue for unsupervised -acoustic model training in a zero resource setting where expert-provided -linguistic knowledge and transcribed speech are unavailable. Therefore, to -further facilitate the zero-resource AUD process, in this paper, we demonstrate that -acoustic feature representations can be significantly improved by (i) -performing linear discriminant analysis (LDA) in an unsupervised self-trained -fashion, and (ii) leveraging resources of other languages through building a -multilingual bottleneck (BN) feature extractor to give effective cross-lingual -generalization. Moreover, we perform comprehensive evaluations of AUD efficacy -on multiple downstream speech applications, and their correlated performance -suggests that AUD evaluations are feasible using different alternative language -resources when only a subset of these evaluation resources is available in -typical zero resource applications.
-" -4199,1702.01417,"Jiaqi Mu, Suma Bhat, Pramod Viswanath","All-but-the-Top: Simple and Effective Postprocessing for Word - Representations",cs.CL stat.ML," Real-valued word representations have transformed NLP applications; popular -examples are word2vec and GloVe, recognized for their ability to capture -linguistic regularities. In this paper, we demonstrate a {\em very simple}, and -yet counter-intuitive, postprocessing technique -- eliminate the common mean -vector and a few top dominating directions from the word vectors -- that -renders off-the-shelf representations {\em even stronger}. The postprocessing -is empirically validated on a variety of lexical-level intrinsic tasks (word -similarity, concept categorization, word analogy) and sentence-level tasks -(semantic textural similarity and { text classification}) on multiple datasets -and with a variety of representation methods and hyperparameter choices in -multiple languages; in each case, the processed representations are -consistently better than the original ones. -" -4200,1702.01466,"Hongyu Gong, Jiaqi Mu, Suma Bhat, Pramod Viswanath",Prepositions in Context,cs.CL," Prepositions are highly polysemous, and their variegated senses encode -significant semantic information. In this paper we match each preposition's -complement and attachment and their interplay crucially to the geometry of the -word vectors to the left and right of the preposition. Extracting such features -from the vast number of instances of each preposition and clustering them makes -for an efficient preposition sense disambigution (PSD) algorithm, which is -comparable to and better than state-of-the-art on two benchmark datasets. Our -reliance on no external linguistic resource allows us to scale the PSD -algorithm to a large WikiCorpus and learn sense-specific preposition -representations -- which we show to encode semantic relations and paraphrasing -of verb particle compounds, via simple vector operations. -" -4201,1702.01517,"Zhongqing Wang, Yue Zhang",Opinion Recommendation using Neural Memory Model,cs.CL," We present opinion recommendation, a novel task of jointly predicting a -custom review with a rating score that a certain user would give to a certain -product or service, given existing reviews and rating scores to the product or -service by other users, and the reviews that the user has given to other -products and services. A characteristic of opinion recommendation is the -reliance of multiple data sources for multi-task joint learning, which is the -strength of neural models. We use a single neural network to model users and -products, capturing their correlation and generating customised product -representations using a deep memory network, from which customised ratings and -reviews are constructed jointly. Results show that our opinion recommendation -system gives ratings that are closer to real user ratings on Yelp.com data -compared with Yelp's own ratings, and our methods give better results compared -to several pipelines baselines using state-of-the-art sentiment rating and -summarization systems. -" -4202,1702.01569,"Jonathan Herzig, Jonathan Berant",Neural Semantic Parsing over Multiple Knowledge-bases,cs.CL," A fundamental challenge in developing semantic parsers is the paucity of -strong supervision in the form of language utterances annotated with logical -form. 
In this paper, we propose to exploit structural regularities in language -in different domains, and train semantic parsers over multiple knowledge-bases -(KBs), while sharing information across datasets. We find that we can -substantially improve parsing accuracy by training a single -sequence-to-sequence model over multiple KBs, when providing an encoding of the -domain at decoding time. Our model achieves state-of-the-art performance on the -Overnight dataset (containing eight domains), improves performance over a -single KB baseline from 75.6% to 79.6%, while obtaining a 7x reduction in the -number of model parameters. -" -4203,1702.01587,"Omkar Dhariya, Shrikant Malviya and Uma Shanker Tiwary",A Hybrid Approach For Hindi-English Machine Translation,cs.CL," In this paper, an extended combined approach of phrase based statistical -machine translation (SMT), example based MT (EBMT) and rule based MT (RBMT) is -proposed to develop a novel hybrid data driven MT system capable of -outperforming the baseline SMT, EBMT and RBMT systems from which it is derived. -In short, the proposed hybrid MT process is guided by the rule based MT after -getting a set of partial candidate translations provided by EBMT and SMT -subsystems. Previous works have shown that EBMT systems are capable of -outperforming the phrase-based SMT systems and that the RBMT approach has the strength -of generating structurally and morphologically more accurate results. This -hybrid approach increases the fluency, accuracy and grammatical precision, which -improves the quality of a machine translation system. A comparison of the -proposed hybrid machine translation (HMT) model with renowned translators, i.e. -Google, BING and Babylonian, is also presented, which shows that the proposed -model works better than the others on sentences with ambiguity as well as those -comprised of idioms. -" -4204,1702.01711,"I\~naki San Vicente, Rodrigo Agerri, German Rigau","Q-WordNet PPV: Simple, Robust and (almost) Unsupervised Generation of - Polarity Lexicons for Multiple Languages",cs.CL," This paper presents a simple, robust and (almost) unsupervised -dictionary-based method, qwn-ppv (Q-WordNet as Personalized PageRanking Vector) -to automatically generate polarity lexicons. We show that qwn-ppv outperforms -other automatically generated lexicons for the four extrinsic evaluations -presented here. It also shows very competitive and robust results with respect -to manually annotated ones. Results suggest that no single lexicon is best for -every task and dataset and that the intrinsic evaluation of polarity lexicons -is not a good performance indicator on a Sentiment Analysis task. The qwn-ppv -method allows one to easily create quality polarity lexicons whenever no -domain-based annotated corpora are available for a given language. -" -4205,1702.01714,"Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, - Marco Turchi",DNN adaptation by automatic quality estimation of ASR hypotheses,cs.CL," In this paper we propose to exploit the automatic Quality Estimation (QE) of -ASR hypotheses to perform the unsupervised adaptation of a deep neural network -modeling acoustic probabilities. Our hypothesis is that significant -improvements can be achieved by: i) automatically transcribing the evaluation -data we are currently trying to recognise, and ii) selecting from it a subset -of ""good quality"" instances based on the word error rate (WER) scores predicted -by a QE component.
To validate this hypothesis, we run several experiments on -the evaluation data sets released for the CHiME-3 challenge. First, we operate -in oracle conditions in which manual transcriptions of the evaluation data are -available, thus allowing us to compute the ""true"" sentence WER. In this -scenario, we perform the adaptation with variable amounts of data, which are -characterised by different levels of quality. Then, we move to realistic -conditions in which the manual transcriptions of the evaluation data are not -available. In this case, the adaptation is performed on data selected according -to the WER scores ""predicted"" by a QE component. Our results indicate that: i) -QE predictions allow us to closely approximate the adaptation results obtained -in oracle conditions, and ii) the overall ASR performance based on the proposed -QE-driven adaptation method is significantly better than the strong, most -recent, CHiME-3 baseline. -" -4206,1702.01776,"Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier","Multi-task memory networks for category-specific aspect and opinion - terms co-extraction",cs.CL," In aspect-based sentiment analysis, most existing methods either focus on -aspect/opinion terms extraction or aspect terms categorization. However, each -task by itself only provides partial information to end users. To generate more -detailed and structured opinion analysis, we propose a finer-grained problem, -which we call category-specific aspect and opinion terms extraction. This -problem involves the identification of aspect and opinion terms within each -sentence, as well as the categorization of the identified terms. To this end, -we propose an end-to-end multi-task attention model, where each task -corresponds to aspect/opinion terms extraction for a specific category. Our -model benefits from exploring the commonalities and relationships among -different tasks to address the data sparsity issue. We demonstrate its -state-of-the-art performance on three benchmark datasets. -" -4207,1702.01802,"Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran",Ensemble Distillation for Neural Machine Translation,cs.CL," Knowledge distillation describes a method for training a student network to -perform better by learning from a stronger teacher network. Translating a -sentence with a Neural Machine Translation (NMT) engine is time-expensive and -having a smaller model speeds up this process. We demonstrate how to transfer -the translation quality of an ensemble and an oracle BLEU teacher network into -a single NMT system. Further, we present translation improvements from a -teacher network that has the same architecture and dimensions of the student -network. As the training of the student model is still expensive, we introduce -a data filtering method based on the knowledge of the teacher model that not -only speeds up the training, but also leads to better translation quality. Our -techniques need no code change and can be easily reproduced with any NMT -architecture to speed up the decoding process. -" -4208,1702.01806,Markus Freitag and Yaser Al-Onaizan,Beam Search Strategies for Neural Machine Translation,cs.CL," The basic concept in Neural Machine Translation (NMT) is to train a large -Neural Network that maximizes the translation performance on a given parallel -corpus. NMT is then using a simple left-to-right beam-search decoder to -generate new translations that approximately maximize the trained conditional -probability.
The current beam search strategy generates the target sentence -word by word from left-to-right while keeping a fixed number of active -candidates at each time step. First, this simple search is less adaptive as it -also expands candidates whose scores are much worse than the current best. -Secondly, it does not expand hypotheses if they are not within the best scoring -candidates, even if their scores are close to the best one. The latter one can -be avoided by increasing the beam size until no performance improvement can be -observed. While you can reach better performance, this has the drawback of a -slower decoding speed. In this paper, we concentrate on speeding up the decoder -by applying a more flexible beam search strategy whose candidate size may vary -at each time step depending on the candidate scores. We speed up the original -decoder by up to 43% for the two language pairs German-English and -Chinese-English without losing any translation quality. -" -4209,1702.01815,"Gemma Boleda, Sebastian Pad\'o, Nghia The Pham, Marco Baroni","Living a discrete life in a continuous world: Reference with distributed - representations",cs.CL," Reference is a crucial property of language that allows us to connect -linguistic expressions to the world. Modeling it requires handling both -continuous and discrete aspects of meaning. Data-driven models excel at the -former, but struggle with the latter, and the reverse is true for symbolic -models. - This paper (a) introduces a concrete referential task to test both aspects, -called cross-modal entity tracking; (b) proposes a neural network architecture -that uses external memory to build an entity library inspired by the DRSs of -DRT, with a mechanism to dynamically introduce new referents or add information -to referents that are already in the library. - Our model shows promise: it beats traditional neural network architectures on -the task. However, it is still outperformed by Memory Networks, another model -with external memory. -" -4210,1702.01829,"Yangfeng Ji, Noah Smith",Neural Discourse Structure for Text Categorization,cs.CL cs.LG," We show that discourse structure, as defined by Rhetorical Structure Theory -and provided by an existing discourse parser, benefits text categorization. Our -approach uses a recursive neural network and a newly proposed attention -mechanism to compute a representation of the text that focuses on salient -content, from the perspective of both RST and the task. Experiments consider -variants of the approach and illustrate its strengths and weaknesses. -" -4211,1702.01841,"Roy Schwartz, Maarten Sap, Ioannis Konstas, Li Zilles, Yejin Choi and - Noah A. Smith","The Effect of Different Writing Tasks on Linguistic Style: A Case Study - of the ROC Story Cloze Task",cs.CL," A writer's style depends not just on personal traits but also on her intent -and mental state. In this paper, we show how variants of the same writing task -can lead to measurable differences in writing style. We present a case study -based on the story cloze task (Mostafazadeh et al., 2016a), where annotators -were assigned similar writing tasks with different constraints: (1) writing an -entire story, (2) adding a story ending for a given story context, and (3) -adding an incoherent ending to a story. We show that a simple linear classifier -informed by stylistic features is able to successfully distinguish among the -three cases, without even looking at the story context.
In addition, combining -our stylistic features with language model predictions reaches state of the art -performance on the story cloze challenge. Our results demonstrate that -different task framings can dramatically affect the way people write. -" -4212,1702.01923,"Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Sch\""utze",Comparative Study of CNN and RNN for Natural Language Processing,cs.CL," Deep neural networks (DNN) have revolutionized the field of natural language -processing (NLP). Convolutional neural network (CNN) and recurrent neural -network (RNN), the two main types of DNN architectures, are widely explored to -handle various NLP tasks. CNN is supposed to be good at extracting -position-invariant features and RNN at modeling units in sequence. The state of -the art on many NLP tasks often switches due to the battle between CNNs and -RNNs. This work is the first systematic comparison of CNN and RNN on a wide -range of representative NLP tasks, aiming to give basic guidance for DNN -selection. -" -4213,1702.01925,Ibrahim Abu El-Khair,"Effects of Stop Words Elimination for Arabic Information Retrieval: A - Comparative Study",cs.CL cs.IR," The effectiveness of three stop words lists for Arabic Information -Retrieval---General Stoplist, Corpus-Based Stoplist, Combined Stoplist---were -investigated in this study. Three popular weighting schemes were examined: the -inverse document frequency weight, probabilistic weighting, and statistical -language modelling. The idea is to combine the statistical approaches with -linguistic approaches to reach optimal performance, and compare their effect -on retrieval. The LDC (Linguistic Data Consortium) Arabic Newswire data set was -used with the Lemur Toolkit. The Best Match weighting scheme used in the Okapi -retrieval system had the best overall performance of the three weighting -algorithms used in the study; stoplists improved retrieval effectiveness -especially when used with the BM25 weight. The overall performance of a general -stoplist was better than the other two lists. -" -4214,1702.01932,"Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, - Jianfeng Gao, Wen-tau Yih, Michel Galley",A Knowledge-Grounded Neural Conversation Model,cs.CL," Neural network models are capable of generating extremely natural sounding -conversational interactions. Nevertheless, these models have yet to demonstrate -that they can incorporate content in the form of factual information or -entity-grounded opinion that would enable them to serve in more task-oriented -conversational applications. This paper presents a novel, fully data-driven, -and knowledge-grounded neural conversation model aimed at producing more -contentful responses without slot filling. We generalize the widely-used -Seq2Seq approach by conditioning responses on both conversation history and -external ""facts"", allowing the model to be versatile and applicable in an -open-domain setting. Our approach yields significant improvements over a -competitive Seq2Seq baseline. Human judges found that our outputs are -significantly more informative. -" -4215,1702.01944,"I\~naki San Vicente, Xabier Saralegi, Rodrigo Agerri",EliXa: A Modular and Flexible ABSA Platform,cs.CL," This paper presents a supervised Aspect Based Sentiment Analysis (ABSA) -system. Our aim is to develop a modular platform which allows us to easily conduct -experiments by replacing the modules or adding new features.
We obtain the best -result in the Opinion Target Extraction (OTE) task (slot 2) using an -off-the-shelf sequence labeler. The target polarity classification (slot 3) is -addressed by means of a multiclass SVM algorithm which includes lexical based -features such as the polarity values obtained from domain and open polarity -lexicons. The system obtains accuracies of 0.70 and 0.73 for the restaurant and -laptop domain respectively, and performs second best in the out-of-domain -hotel, achieving an accuracy of 0.80. -" -4216,1702.01991,"Grzegorz Chrupa{\l}a, Lieke Gelderloos, Afra Alishahi","Representations of language in a model of visually grounded speech - signal",cs.CL cs.AI cs.LG," We present a visually grounded model of speech perception which projects -spoken utterances and images to a joint semantic space. We use a multi-layer -recurrent highway network to model the temporal nature of spoken speech, and -show that it learns to extract both form and meaning-based linguistic knowledge -from the input signal. We carry out an in-depth analysis of the representations -used by different components of the trained model and show that encoding of -semantic aspects tends to become richer as we go up the hierarchy of layers, -whereas encoding of form-related aspects of the language input tends to -initially increase and then plateau or decrease. -" -4217,1702.02052,"Sebastian Ruder, Parsa Ghaffari, and John G. Breslin",Knowledge Adaptation: Teaching to Adapt,cs.CL cs.LG," Domain adaptation is crucial in many real-world applications where the -distribution of the training data differs from the distribution of the test -data. Previous Deep Learning-based approaches to domain adaptation need to be -trained jointly on source and target domain data and are therefore unappealing -in scenarios where models need to be adapted to a large number of domains or -where a domain is evolving, e.g. spam detection where attackers continuously -change their tactics. - To fill this gap, we propose Knowledge Adaptation, an extension of Knowledge -Distillation (Bucilua et al., 2006; Hinton et al., 2015) to the domain -adaptation scenario. We show how a student model achieves state-of-the-art -results on unsupervised domain adaptation from multiple sources on a standard -sentiment analysis benchmark by taking into account the domain-specific -expertise of multiple teachers and the similarities between their domains. - When learning from a single teacher, using domain similarity to gauge -trustworthiness is inadequate. To this end, we propose a simple metric that -correlates well with the teacher's accuracy in the target domain. We -demonstrate that incorporating high-confidence examples selected by this metric -enables the student model to achieve state-of-the-art performance in the -single-source scenario. -" -4218,1702.02092,"Tom A. F. Anderson, David M. W. Powers",Characterisation of speech diversity using self-organising maps,cs.CL cs.NE cs.SD," We report investigations into speaker classification of larger quantities of -unlabelled speech data using small sets of manually phonemically annotated -speech. The Kohonen speech typewriter is a semi-supervised method comprised of -self-organising maps (SOMs) that achieves low phoneme error rates. A SOM is a -2D array of cells that learn vector representations of the data based on -neighbourhoods. 
In this paper, we report a method to evaluate pronunciation -using multilevel SOMs with /hVd/ single-syllable utterances for the study of -vowels in Australian pronunciation. -" -4219,1702.02098,"Emma Strubell, Patrick Verga, David Belanger, Andrew McCallum",Fast and Accurate Entity Recognition with Iterated Dilated Convolutions,cs.CL," Today when many practitioners run basic NLP on the entire web and -large-volume traffic, faster methods are paramount to saving time and energy -costs. Recent advances in GPU hardware have led to the emergence of -bi-directional LSTMs as a standard method for obtaining per-token vector -representations serving as input to labeling tasks such as NER (often followed -by prediction in a linear-chain CRF). Though expressive and accurate, these -models fail to fully exploit GPU parallelism, limiting their computational -efficiency. This paper proposes a faster alternative to Bi-LSTMs for NER: -Iterated Dilated Convolutional Neural Networks (ID-CNNs), which have better -capacity than traditional CNNs for large context and structured prediction. -Unlike LSTMs whose sequential processing on sentences of length N requires O(N) -time even in the face of parallelism, ID-CNNs permit fixed-depth convolutions -to run in parallel across entire documents. We describe a distinct combination -of network structure, parameter sharing and training procedures that enable -dramatic 14-20x test-time speedups while retaining accuracy comparable to the -Bi-LSTM-CRF. Moreover, ID-CNNs trained to aggregate context from the entire -document are even more accurate while maintaining 8x faster test time speeds. -" -4220,1702.02170,"Stanis{\l}aw Jastrzebski, Damian Le\'sniak, Wojciech Marian Czarnecki","How to evaluate word embeddings? On importance of data efficiency and - simple supervised tasks",cs.CL," Maybe the single most important goal of representation learning is making -subsequent learning faster. Surprisingly, this fact is not well reflected in -the way embeddings are evaluated. In addition, recent practice in word -embeddings points towards the importance of learning specialized representations. -We argue that the focus of word representation evaluation should reflect those -trends and shift towards evaluating what useful information is easily -accessible. Specifically, we propose that evaluation should focus on data -efficiency and simple supervised tasks, where the amount of available data is -varied and scores of a supervised model are reported for each subset (as -commonly done in transfer learning). - In order to illustrate the significance of such analysis, a comprehensive -evaluation of selected word embeddings is presented. The proposed approach yields a -more complete picture and brings new insight into performance characteristics; for -instance, information about word similarity or analogy tends to be -non-linearly encoded in the embedding space, which questions the cosine-based, -unsupervised, evaluation methods. All results and analysis scripts are -available online. -" -4221,1702.02171,"Sewon Min, Minjoon Seo, Hannaneh Hajishirzi","Question Answering through Transfer Learning from Large Fine-grained - Supervision Data",cs.CL," We show that the task of question answering (QA) can significantly benefit -from the transfer learning of models trained on a different large, fine-grained -QA dataset. We achieve the state of the art in two well-studied QA datasets, -WikiQA and SemEval-2016 (Task 3A), through a basic transfer learning technique -from SQuAD.
For WikiQA, our model outperforms the previous best model by more -than 8%. We demonstrate that finer supervision provides better guidance for -learning lexical and syntactic information than coarser supervision, through -quantitative results and visual analysis. We also show that a similar transfer -learning procedure achieves the state of the art on an entailment task. -" -4222,1702.02206,"Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, William W. Cohen",Semi-Supervised QA with Generative Domain-Adaptive Nets,cs.CL cs.LG," We study the problem of semi-supervised question answering: utilizing -unlabeled text to boost the performance of question answering models. We -propose a novel training framework, the Generative Domain-Adaptive Nets. In -this framework, we train a generative model to generate questions based on the -unlabeled text, and combine model-generated questions with human-generated -questions for training question answering models. We develop novel domain -adaptation algorithms, based on reinforcement learning, to alleviate the -discrepancy between the model-generated data distribution and the -human-generated data distribution. Experiments show that our proposed framework -obtains substantial improvement from unlabeled text. -" -4223,1702.02211,"Tarek Sakakini, Suma Bhat, Pramod Viswanath",Fixing the Infix: Unsupervised Discovery of Root-and-Pattern Morphology,cs.CL," We present an unsupervised and language-agnostic method for learning -root-and-pattern morphology in Semitic languages. This form of morphology, -abundant in Semitic languages, has not been handled in prior unsupervised -approaches. We harness the syntactico-semantic information in distributed word -representations to solve the long-standing problem of root-and-pattern -discovery in Semitic languages. Moreover, we construct an unsupervised root -extractor based on the learned rules. We prove the validity of learned rules -across Arabic, Hebrew, and Amharic, alongside showing that our root extractor -compares favorably with a widely used, carefully engineered root extractor: -ISRI. -" -4224,1702.02212,"Tarek Sakakini, Suma Bhat, Pramod Viswanath",MORSE: Semantic-ally Drive-n MORpheme SEgment-er,cs.CL," We present in this paper a novel framework for morpheme segmentation which -uses the morpho-syntactic regularities preserved by word representations, in -addition to orthographic features, to segment words into morphemes. This -framework is the first to consider vocabulary-wide syntactico-semantic -information for this task. We also analyze the deficiencies of available -benchmarking datasets and introduce our own dataset that was created on the -basis of compositionality. We validate our algorithm across datasets and -present state-of-the-art results. -" -4225,1702.02261,"Pramod Bharadwaj Chandrashekar (1), Arjun Magge (1), Abeed Sarker (2), - Graciela Gonzalez (2) ((1) Arizona State University, (2) University of - Pennsylvania)","Social media mining for identification and exploration of health-related - information from pregnant women",cs.CL," Widespread use of social media has led to the generation of substantial -amounts of information about individuals, including health-related information. -Social media provides the opportunity to study health-related information about -selected population groups who may be of interest for a particular study.
In
-this paper, we explore the possibility of utilizing social media to perform
-targeted data collection and analysis from a particular population group --
-pregnant women. We hypothesize that we can use social media to identify cohorts
-of pregnant women and follow them over time to analyze crucial health-related
-information. To identify potentially pregnant women, we employ simple
-rule-based searches that attempt to detect pregnancy announcements with
-moderate precision. To further filter out false positives and noise, we employ
-a supervised classifier trained on a small amount of hand-annotated data. We
-then collect their posts over time to create longitudinal health timelines and
-attempt to divide the timelines into different pregnancy trimesters. Finally,
-we assess the usefulness of the timelines by performing a preliminary analysis
-to estimate drug intake patterns of our cohort at different trimesters. Our
-rule-based cohort identification technique collected 53,820 users over thirty
-months from Twitter. Our pregnancy announcement classification technique
-achieved an F-measure of 0.81 for the pregnancy class, resulting in 34,895 user
-timelines. Analysis of the timelines revealed that pertinent health-related
-information, such as drug intake and adverse reactions, can be mined from the
-data. Our approach to using user timelines in this fashion has produced very
-encouraging results and can be employed for other important tasks where
-cohorts, for which health-related information may not be available from other
-sources, are required to be followed over time to derive population-based
-estimates.
-"
-4226,1702.02265,"Kazuma Hashimoto, Yoshimasa Tsuruoka",Neural Machine Translation with Source-Side Latent Graph Parsing,cs.CL," This paper presents a novel neural machine translation model which jointly
-learns translation and source-side latent graph representations of sentences.
-Unlike existing pipelined approaches using syntactic parsers, our end-to-end
-model learns a latent graph parser as part of the encoder of an attention-based
-neural machine translation model, and thus the parser is optimized according to
-the translation objective. In experiments, we first show that our model
-compares favorably with state-of-the-art sequential and pipelined syntax-based
-NMT models. We also show that the performance of our model can be further
-improved by pre-training it with a small amount of treebank annotations. Our
-final ensemble model significantly outperforms the previous best models on the
-standard English-to-Japanese translation dataset.
-"
-4227,1702.02287,"Baichuan Zhang, Mohammad Al Hasan",Name Disambiguation in Anonymized Graphs using Network Embedding,cs.SI cs.CL cs.IR," In the real world, our DNA is unique but many people share names. This
-phenomenon often causes erroneous aggregation of documents of multiple persons
-who are namesakes of one another. Such mistakes deteriorate the performance of
-document retrieval, web search, and more seriously, cause improper attribution
-of credit or blame in digital forensics. To resolve this issue, the name
-disambiguation task is designed, which aims to partition the documents
-associated with a name reference such that each partition contains documents
-pertaining to a unique real-life person. Existing solutions to this task
-substantially rely on feature engineering, such as biographical feature
-extraction, or construction of auxiliary features from Wikipedia. 
However, for many scenarios, such features
-may be costly to obtain or unavailable due to the risk of privacy violation. In
-this work, we propose a novel name disambiguation method. Our proposed method
-is non-intrusive with respect to privacy because instead of using attributes
-pertaining to a real-life person, our method leverages only relational data in
-the form of anonymized graphs. In the methodological aspect, the proposed
-method uses a novel representation learning model to embed each document in a
-low-dimensional vector space where name disambiguation can be solved by a
-hierarchical agglomerative clustering algorithm. Our experimental results
-demonstrate that the proposed method is significantly better than the existing
-name disambiguation methods working in a similar setting.
-"
-4228,1702.02363,"H. Bahadir Sahin, Caglar Tirkaz, Eray Yildiz, Mustafa Tolga Eren, Ozan
- Sonmez","Automatically Annotated Turkish Corpus for Named Entity Recognition and
- Text Categorization using Large-Scale Gazetteers",cs.CL," The Turkish Wikipedia Named-Entity Recognition and Text Categorization
-(TWNERTC) dataset is a collection of automatically categorized and annotated
-sentences obtained from Wikipedia. We constructed large-scale gazetteers by
-using a graph crawler algorithm to extract relevant entity and domain
-information from a semantic knowledge base, Freebase. The constructed
-gazetteers contain approximately 300K entities with thousands of fine-grained
-entity types under 77 different domains. Since automated processes are prone to
-ambiguity, we also introduce two new content-specific noise reduction
-methodologies. Moreover, we map fine-grained entity types to the equivalent
-four coarse-grained types: person, loc, org, misc. Eventually, we construct six
-different dataset versions and evaluate the quality of annotations by comparing
-ground truths from human annotators. We make these datasets publicly available
-to support studies on Turkish named-entity recognition (NER) and text
-categorization (TC).
-"
-4229,1702.02367,"Claudio Greco, Alessandro Suglia, Pierpaolo Basile, Gaetano Rossiello,
- Giovanni Semeraro",Iterative Multi-document Neural Attention for Multiple Answer Prediction,cs.CL," People have information needs of varying complexity, which can be solved by
-an intelligent agent able to answer questions formulated in a proper way,
-eventually considering user context and preferences. In a scenario in which the
-user profile can be considered as a question, intelligent agents able to answer
-questions can be used to find the most relevant answers for a given user. In
-this work we propose a novel model based on Artificial Neural Networks to
-answer questions with multiple answers by exploiting multiple facts retrieved
-from a knowledge base. The model is evaluated on the factoid Question Answering
-and top-n recommendation tasks of the bAbI Movie Dialog dataset. After
-assessing the performance of the model on both tasks, we try to define the
-long-term goal of a conversational recommender system able to interact using
-natural language and to support users in their information seeking processes in
-a personalized way.
-"
-4230,1702.02390,"Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth",A Hybrid Convolutional Variational Autoencoder for Text Generation,cs.CL," In this paper we explore the effect of architectural choices on learning a
-Variational Autoencoder (VAE) for text generation. 
In contrast to the
-previously introduced VAE model for text where both the encoder and decoder are
-RNNs, we propose a novel hybrid architecture that blends fully feed-forward
-convolutional and deconvolutional components with a recurrent language model.
-Our architecture exhibits several attractive properties such as faster run time
-and convergence and the ability to better handle long sequences; more
-importantly, it helps to avoid some of the major difficulties posed by training
-VAE models on textual data.
-"
-4231,1702.02426,"Sebastian Ruder, Parsa Ghaffari, and John G. Breslin",Data Selection Strategies for Multi-Domain Sentiment Analysis,cs.CL cs.LG," Domain adaptation is important in sentiment analysis as sentiment-indicating
-words vary between domains. Recently, multi-domain adaptation has become more
-pervasive, but existing approaches train on all available source domains
-including dissimilar ones. However, the selection of appropriate training data
-is as important as the choice of algorithm. We undertake -- to our knowledge
-for the first time -- an extensive study of domain similarity metrics in the
-context of sentiment analysis and propose novel representations, metrics, and a
-new scope for data selection. We evaluate the proposed methods on two
-large-scale multi-domain adaptation settings on tweets and reviews and
-demonstrate that they consistently outperform strong random and balanced
-baselines, while our proposed selection strategy outperforms instance-level
-selection and yields the best score on a large reviews corpus.
-"
-4232,1702.02429,"Jiatao Gu, Kyunghyun Cho and Victor O.K. Li",Trainable Greedy Decoding for Neural Machine Translation,cs.CL cs.LG," Recent research in neural machine translation has largely focused on two
-aspects: neural network architectures and end-to-end learning algorithms. The
-problem of decoding, however, has received relatively little attention from the
-research community. In this paper, we solely focus on the problem of decoding
-given a trained neural machine translation model. Instead of trying to build a
-new decoding algorithm for any specific decoding objective, we propose the idea
-of a trainable decoding algorithm in which we train a decoding algorithm to
-find a translation that maximizes an arbitrary decoding objective. More
-specifically, we design an actor that observes and manipulates the hidden state
-of the neural machine translation decoder and propose to train it using a
-variant of deterministic policy gradient. We extensively evaluate the proposed
-algorithm using four language pairs and two decoding objectives and show that
-we can indeed train a trainable greedy decoder that generates a better
-translation (in terms of a target decoding objective) with minimal
-computational overhead.
-"
-4233,1702.02535,"Ye Zhang, Matthew Lease, Byron C. Wallace","Exploiting Domain Knowledge via Grouped Weight Sharing with Application
- to Text Categorization",cs.CL," A fundamental advantage of neural models for NLP is their ability to learn
-representations from scratch. However, in practice this often means ignoring
-existing external linguistic resources, e.g., WordNet or domain-specific
-ontologies such as the Unified Medical Language System (UMLS). We propose a
-general, novel method for exploiting such resources via weight sharing. Prior
-work on weight sharing in neural networks has considered it largely as a means
-of model compression. 
In contrast, we treat weight sharing as a flexible
-mechanism for incorporating prior knowledge into neural models. We show that
-this approach consistently yields improved performance on classification tasks
-compared to baseline strategies that do not exploit weight sharing.
-"
-4234,1702.02540,W. James Murdoch and Arthur Szlam,Automatic Rule Extraction from Long Short Term Memory Networks,cs.CL cs.AI cs.NE stat.ML," Although deep learning models have proven effective at solving problems in
-natural language processing, the mechanism by which they come to their
-conclusions is often unclear. As a result, these models are generally treated
-as black boxes, yielding no insight into the underlying learned patterns. In
-this paper we consider Long Short Term Memory networks (LSTMs) and demonstrate
-a new approach for tracking the importance of a given input to the LSTM for a
-given output. By identifying consistently important patterns of words, we are
-able to distill state-of-the-art LSTMs on sentiment analysis and question
-answering into a set of representative phrases. This representation is then
-quantitatively validated by using the extracted phrases to construct a simple,
-rule-based classifier which approximates the output of the LSTM.
-"
-4235,1702.02584,Lei Chen and Chong MIn Lee,Predicting Audience's Laughter Using Convolutional Neural Network,cs.CL," For the purpose of automatically evaluating speakers' humor usage, we build a
-presentation corpus containing humorous utterances based on TED talks. Compared
-to previous data resources supporting humor recognition research, ours has
-several advantages, including (a) both positive and negative instances coming
-from a homogeneous data set, (b) containing a large number of speakers, and (c)
-being open. Focusing on using lexical cues for humor recognition, we
-systematically compare a newly emerging text classification method based on
-Convolutional Neural Networks (CNNs) with a well-established conventional
-method using linguistic knowledge. The advantages of the CNN method are that it
-achieves higher detection accuracies and is able to learn essential features
-automatically.
-"
-4236,1702.02640,"Zhe Gan, P. D. Singh, Ameet Joshi, Xiaodong He, Jianshu Chen, Jianfeng
- Gao, Li Deng",Character-level Deep Conflation for Business Data Analytics,cs.CL cs.LG," Connecting different text attributes associated with the same entity
-(conflation) is important in business data analytics since it could help merge
-two different tables in a database to provide a more comprehensive profile of
-an entity. However, the conflation task is challenging because two text strings
-that describe the same entity could be quite different from each other for
-reasons such as misspelling. It is therefore critical to develop a conflation
-model that is able to truly understand the semantic meaning of the strings and
-match them at the semantic level. To this end, we develop a character-level
-deep conflation model that encodes the input text strings from character level
-into finite dimension feature vectors, which are then used to compute the
-cosine similarity between the text strings. The model is trained in an
-end-to-end manner using back propagation and stochastic gradient descent to
-maximize the likelihood of the correct association. Specifically, we propose
-two variants of the deep conflation model, based on long short-term memory
-(LSTM) recurrent neural network (RNN) and convolutional neural network (CNN),
-respectively. 
Both models perform well on a real-world business analytics
-dataset and significantly outperform the baseline bag-of-character (BoC) model.
-"
-4237,1702.02736,"Chieh-Yang Huang, Ting-Hao (Kenneth) Huang and Lun-Wei Ku","Challenges in Providing Automatic Affective Feedback in Instant
- Messaging Applications",cs.CL cs.HC," Instant messaging is one of the major channels of computer-mediated
-communication. However, humans are known to be very limited in understanding
-others' emotions via text-based communication. Aiming at introducing emotion
-sensing technologies to instant messaging, we developed EmotionPush, a system
-that automatically detects the emotions of the messages end-users received on
-Facebook Messenger and provides colored cues on their smartphones accordingly.
-We conducted a deployment study with 20 participants during a time span of two
-weeks. In this paper, we revealed five challenges, along with examples, that we
-observed in our study based on both users' feedback and chat logs, including
-(i) the continuum of emotions, (ii) multi-user conversations, (iii) different
-dynamics between different users, (iv) misclassification of emotions and
-(v) unconventional content. We believe this discussion will benefit the future
-exploration of affective computing for instant messaging, and also shed light
-on research of conversational emotion sensing.
-"
-4238,1702.02737,"Xuan-Son Vu, Seong-Bae Park","Mining User/Movie Preferred Features Based on Reviews for Video
- Recommendation System",cs.IR cs.CL," In this work, we present an approach for mining user preferences and
-recommendation based on reviews. There have been various studies on the
-recommendation problem. However, most of these studies rely on a single aspect
-of user-generated content, such as user ratings or user feedback, to state
-user preferences, and such one-aspect mining is insufficient for stating user
-preferences. As a demonstration, in collaborative filtering recommendation, we
-try to figure out the preference trend of a crowd of users, then use that
-trend to predict the current user's preference. Therefore, there is a gap
-between real user preferences and the trend of the crowd. Additionally, user
-preferences can be addressed by mining user reviews, since users often comment
-about various aspects of products. To solve this problem, we mainly focus on
-mining product aspects and user aspects inside user reviews to directly state
-user preferences. We also take Social Network Analysis into account for the
-cold-start item problem. For the cold-start user problem, a collaborative
-filtering algorithm is employed in our work. The framework is general enough
-to be applied to different recommendation domains. Theoretically, our method
-would achieve a significant enhancement.
-"
-4239,1702.03033,"Markus Freitag, Jan-Thorsten Peter, Stephan Peitz, Minwei Feng and
- Hermann Ney",Local System Voting Feature for Machine Translation System Combination,cs.CL," In this paper, we enhance the traditional confusion network system
-combination approach with an additional model trained by a neural network. This
-work is motivated by the fact that the commonly used binary system voting
-models only assign each input system a global weight which is responsible for
-the global impact of each input system on all translations. This prevents
-individual systems with low system weights from having influence on the system
-combination output, although in some situations this could be helpful. 
Further,
-words which have only been seen by one or few systems rarely have a chance of
-being present in the combined output. We train a local system voting model by a
-neural network which is based on the words themselves and the combinatorial
-occurrences of the different system outputs. This gives system combination the
-option to prefer other systems at different word positions even for the same
-sentence.
-"
-4240,1702.03082,"J. Ferrero, F. Agnes, L. Besacier, D. Schwab",Using Word Embedding for Cross-Language Plagiarism Detection,cs.CL," This paper proposes to use distributed representation of words (word
-embeddings) in cross-language textual similarity detection. The main
-contributions of this paper are the following: (a) we introduce new
-cross-language similarity detection methods based on distributed representation
-of words; (b) we combine the different methods proposed to verify their
-complementarity and finally obtain an overall F1 score of 89.15% for
-English-French similarity detection at chunk level (88.5% at sentence level) on
-a very challenging corpus.
-"
-4241,1702.03121,"Ashutosh Modi, Ivan Titov, Vera Demberg, Asad Sayeed and Manfred
- Pinkal","Modeling Semantic Expectation: Using Script Knowledge for Referent
- Prediction",cs.CL cs.AI stat.ML," Recent research in psycholinguistics has provided increasing evidence that
-humans predict upcoming content. Prediction also affects perception and might
-be a key to robustness in human language processing. In this paper, we
-investigate the factors that affect human prediction by building a
-computational model that can predict upcoming discourse referents based on
-linguistic knowledge alone vs. linguistic knowledge jointly with common-sense
-knowledge in the form of scripts. We find that script knowledge significantly
-improves model estimates of human predictions. In a second study, we test the
-highly controversial hypothesis that predictability influences referring
-expression type but do not find evidence for such an effect.
-"
-4242,1702.03196,"Siva Reddy, Oscar T\""ackstr\""om, Slav Petrov, Mark Steedman, Mirella
- Lapata",Universal Semantic Parsing,cs.CL," Universal Dependencies (UD) offer a uniform cross-lingual syntactic
-representation, with the aim of advancing multilingual applications. Recent
-work shows that semantic parsing can be accomplished by transforming syntactic
-dependencies to logical forms. However, this work is limited to English, and
-cannot process dependency graphs, which allow handling complex phenomena such
-as control. In this work, we introduce UDepLambda, a semantic interface for UD,
-which maps natural language to logical forms in an almost language-independent
-fashion and can process dependency graphs. We perform experiments on question
-answering against Freebase and provide German and Spanish translations of the
-WebQuestions and GraphQuestions datasets to facilitate multilingual evaluation.
-Results show that UDepLambda outperforms strong baselines across languages and
-datasets. For English, it achieves a 4.9 F1 point improvement over the
-state-of-the-art on GraphQuestions. Our code and data can be downloaded at
-https://github.com/sivareddyg/udeplambda.
-"
-4243,1702.03197,"Abdulaziz M. Alayba, Vasile Palade, Matthew England and Rahat Iqbal",Arabic Language Sentiment Analysis on Health Services,cs.CL cs.NE cs.SI," The social media network phenomenon leads to a massive amount of valuable
-data that is available online and easy to access. 
Many users share images,
-videos, comments, reviews, news and opinions on different social network
-sites, with Twitter being one of the most popular ones. Data collected from
-Twitter is highly unstructured, and extracting useful information from tweets
-is a challenging task. Twitter has a huge number of Arabic users who mostly
-post and write their tweets using the Arabic language. While there has been a
-lot of research on sentiment analysis in English, the amount of research and
-datasets in the Arabic language is limited. This paper introduces an Arabic
-language dataset which is about opinions on health services and has been
-collected from Twitter. The paper will first detail the process of collecting
-the data from Twitter and also the process of filtering, pre-processing and
-annotating the Arabic text in order to build a big sentiment analysis dataset
-in Arabic. Several Machine Learning algorithms (Naive Bayes, Support Vector
-Machine and Logistic Regression) alongside Deep and Convolutional Neural
-Networks were utilized in our experiments of sentiment analysis on our health
-dataset.
-"
-4244,1702.03274,"Jason D. Williams, Kavosh Asadi, Geoffrey Zweig","Hybrid Code Networks: practical and efficient end-to-end dialog control
- with supervised and reinforcement learning",cs.AI cs.CL," End-to-end learning of recurrent neural networks (RNNs) is an attractive
-solution for dialog systems; however, current techniques are data-intensive and
-require thousands of dialogs to learn simple behaviors. We introduce Hybrid
-Code Networks (HCNs), which combine an RNN with domain-specific knowledge
-encoded as software and system action templates. Compared to existing
-end-to-end approaches, HCNs considerably reduce the amount of training data
-required, while retaining the key benefit of inferring a latent representation
-of dialog state. In addition, HCNs can be optimized with supervised learning,
-reinforcement learning, or a mixture of both. HCNs attain state-of-the-art
-performance on the bAbI dialog dataset, and outperform two commercially
-deployed customer-facing dialog systems.
-"
-4245,1702.03305,"Federico Fancellu, Siva Reddy, Adam Lopez, Bonnie Webber",Universal Dependencies to Logical Forms with Negation Scope,cs.CL," Many language technology applications would benefit from the ability to
-represent negation and its scope on top of widely-used linguistic resources. In
-this paper, we investigate the possibility of obtaining a first-order logic
-representation with negation scope marked using Universal Dependencies. To do
-so, we enhance UDepLambda, a framework that converts dependency graphs to
-logical forms. The resulting UDepLambda$\lnot$ is able to handle phenomena
-related to scope by means of a higher-order type theory, relevant not only to
-negation but also to universal quantification and other complex semantic
-phenomena. The initial conversion we did for English is promising, in that one
-can represent the scope of negation also in the presence of more complex
-phenomena such as universal quantifiers.
-"
-4246,1702.03342,"Walid Shalaby, Wlodek Zadrozny",Learning Concept Embeddings for Efficient Bag-of-Concepts Densification,cs.CL," Explicit concept space models have proven efficacy for text representation in
-many natural language and text mining applications. The idea is to embed
-textual structures into a semantic space of concepts which captures the main
-ideas, objects, and the characteristics of these structures. 
The so-called Bag
-of Concepts (BoC) representation suffers from data sparsity, causing low
-similarity scores between similar texts due to low concept overlap. To address
-this problem, we propose two neural embedding models to learn continuous
-concept vectors. Once they are learned, we propose an efficient vector
-aggregation method to generate fully continuous BoC representations. We
-evaluate our concept embedding models on three tasks: 1) measuring entity
-semantic relatedness and ranking, where we achieve a 1.6% improvement in
-correlation scores, 2) dataless concept categorization, where we achieve
-state-of-the-art performance and reduce the categorization error rate by more
-than 5% compared to five prior word and entity embedding models, and 3)
-dataless document classification, where our models outperform the sparse BoC
-representations. In addition, by exploiting our efficient linear-time vector
-aggregation method, we achieve better accuracy scores with far fewer concept
-dimensions compared to previous BoC densification methods, which operate in
-polynomial time and require hundreds of dimensions in the BoC representation.
-"
-4247,1702.03402,"Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, Georges Linar\`es,
- Renato De Mori",Parallel Long Short-Term Memory for Multi-stream Classification,cs.LG cs.CL," Recently, machine learning methods have provided a broad spectrum of original
-and efficient algorithms based on Deep Neural Networks (DNN) to automatically
-predict an outcome with respect to a sequence of inputs. Recurrent hidden cells
-allow DNN-based models such as Recurrent Neural Networks (RNN) and Long
-Short-Term Memory (LSTM) to manage long-term dependencies. Nevertheless, these
-RNNs process a single input stream in one (LSTM) or two (Bidirectional LSTM)
-directions. But most of the information available nowadays is from multistreams
-or multimedia documents, and requires RNNs to process this information
-synchronously during training. This paper presents an original LSTM-based
-architecture, named Parallel LSTM (PLSTM), that processes multiple parallel
-synchronized input sequences in order to predict a common output. The proposed
-PLSTM method could be used for parallel sequence classification purposes. The
-PLSTM approach is evaluated on an automatic telecast genre sequence
-classification task and compared with different state-of-the-art architectures.
-Results show that the proposed PLSTM method outperforms the baseline n-gram
-models as well as the state-of-the-art LSTM approach.
-"
-4248,1702.03470,"Ehsan Sherkat, Evangelos Milios",Vector Embedding of Wikipedia Concepts and Entities,cs.CL," Using deep learning for different machine learning tasks such as image
-classification and word embedding has recently gained much attention. Its
-appealing performance reported across specific Natural Language Processing
-(NLP) tasks in comparison with other approaches is the reason for its
-popularity. Word embedding is the task of mapping words or phrases to a low
-dimensional numerical vector. In this paper, we use deep learning to embed
-Wikipedia Concepts and Entities. The English version of Wikipedia contains more
-than five million pages, which suggests its capability to cover many English
-Entities, Phrases, and Concepts. Each Wikipedia page is considered as a
-concept. Some concepts correspond to entities, such as a person's name, an
-organization or a place. 
Contrary to word embeddings, Wikipedia concept
-embeddings are not ambiguous, so there are different vectors for concepts with
-similar surface forms but different mentions. We propose several approaches and
-evaluate their performance on Concept Analogy and Concept Similarity tasks. The
-results show that the proposed approaches achieve performance comparable to,
-and in some cases even higher than, that of state-of-the-art methods.
-"
-4249,1702.03525,"Akiko Eriguchi, Yoshimasa Tsuruoka, Kyunghyun Cho",Learning to Parse and Translate Improves Neural Machine Translation,cs.CL," There has been relatively little attention paid to incorporating linguistic
-priors into neural machine translation. Much of the previous work was further
-constrained to considering linguistic priors on the source side. In this paper,
-we propose a hybrid model, called NMT+RNNG, that learns to parse and translate
-by combining the recurrent neural network grammar into the attention-based
-neural machine translation. Our approach encourages the neural machine
-translation model to incorporate linguistic priors during training, and lets it
-translate on its own afterward. Extensive experiments with four language pairs
-show the effectiveness of the proposed NMT+RNNG.
-"
-4250,1702.03654,"Eray Yildiz, Caglar Tirkaz, H. Bahadir Sahin, Mustafa Tolga Eren, Ozan
- Sonmez",A Morphology-aware Network for Morphological Disambiguation,cs.CL," Agglutinative languages such as Turkish, Finnish and Hungarian require
-morphological disambiguation before further processing due to the complex
-morphology of words. A morphological disambiguator is used to select the
-correct morphological analysis of a word. Morphological disambiguation is
-important because it generally is one of the first steps of natural language
-processing and its performance affects subsequent analyses. In this paper, we
-propose a system that uses deep learning techniques for morphological
-disambiguation. Many of the state-of-the-art results in computer vision, speech
-recognition and natural language processing have been obtained through deep
-learning models. However, applying deep learning techniques to morphologically
-rich languages is not well studied. In this work, while we focus on Turkish
-morphological disambiguation, we also present results for French and German in
-order to show that the proposed architecture achieves high accuracy with no
-language-specific feature engineering or additional resources. In the
-experiments, we achieve 84.12%, 88.35% and 93.78% morphological disambiguation
-accuracy among the ambiguous words for Turkish, German and French,
-respectively.
-"
-4251,1702.03706,"Daniele Bonadiman, Antonio Uva, Alessandro Moschitti","Multitask Learning with Deep Neural Networks for Community Question
- Answering",cs.CL," In this paper, we developed a deep neural network (DNN) that learns to solve
-simultaneously the three tasks of the cQA challenge proposed by the
-SemEval-2016 Task 3, i.e., question-comment similarity, question-question
-similarity and new question-comment similarity. The latter is the main task,
-which can exploit the previous two for achieving better results. Our DNN is
-trained jointly on all the three cQA tasks and learns to encode questions and
-comments into a single vector representation shared across the multiple tasks.
-The results on the official challenge test set show that our approach produces
-higher accuracy and faster convergence rates than the individual neural
-networks. 
Additionally, our method, which does not use any manual feature
-engineering, approaches the state of the art established with methods that make
-heavy use of it.
-"
-4252,1702.03814,"Zhiguo Wang, Wael Hamza, Radu Florian",Bilateral Multi-Perspective Matching for Natural Language Sentences,cs.AI cs.CL," Natural language sentence matching is a fundamental technology for a variety
-of tasks. Previous approaches either match sentences from a single direction or
-only apply single granular (word-by-word or sentence-by-sentence) matching. In
-this work, we propose a bilateral multi-perspective matching (BiMPM) model
-under the ""matching-aggregation"" framework. Given two sentences $P$ and $Q$,
-our model first encodes them with a BiLSTM encoder. Next, we match the two
-encoded sentences in two directions $P \rightarrow Q$ and $P \leftarrow Q$. In
-each matching direction, each time step of one sentence is matched against all
-time-steps of the other sentence from multiple perspectives. Then, another
-BiLSTM layer is utilized to aggregate the matching results into a fixed-length
-matching vector. Finally, based on the matching vector, the decision is made
-through a fully connected layer. We evaluate our model on three tasks:
-paraphrase identification, natural language inference and answer sentence
-selection. Experimental results on standard benchmark datasets show that our
-model achieves the state-of-the-art performance on all tasks.
-"
-4253,1702.03856,"Sameer Bansal, Herman Kamper, Adam Lopez and Sharon Goldwater",Towards speech-to-text translation without speech recognition,cs.CL," We explore the problem of translating speech to text in low-resource
-scenarios where neither automatic speech recognition (ASR) nor machine
-translation (MT) is available, but we have training data in the form of audio
-paired with text translations. We present the first system for this problem
-applied to a realistic multi-speaker dataset, the CALLHOME Spanish-English
-speech translation corpus. Our approach uses unsupervised term discovery (UTD)
-to cluster repeated patterns in the audio, creating a pseudotext, which we pair
-with translations to create a parallel text and train a simple bag-of-words MT
-model. We identify the challenges faced by the system, finding that the
-difficulty of cross-speaker UTD results in low recall, but that our system is
-still able to correctly translate some content words in test data.
-"
-4254,1702.03859,"Samuel L. Smith, David H. P. Turban, Steven Hamblin and Nils Y.
- Hammerla","Offline bilingual word vectors, orthogonal transformations and the
- inverted softmax",cs.CL cs.AI cs.IR," Usually bilingual word vectors are trained ""online"". Mikolov et al. showed
-they can also be found ""offline"", whereby two pre-trained embeddings are
-aligned with a linear transformation, using dictionaries compiled from expert
-knowledge. In this work, we prove that the linear transformation between two
-spaces should be orthogonal. This transformation can be obtained using the
-singular value decomposition. We introduce a novel ""inverted softmax"" for
-identifying translation pairs, with which we improve the precision @1 of
-Mikolov's original mapping from 34% to 43%, when translating a test set
-composed of both common and rare English words into Italian. 
Orthogonal
-transformations are more robust to noise, enabling us to learn the
-transformation without expert bilingual signal by constructing a
-""pseudo-dictionary"" from the identical character strings which appear in both
-languages, achieving 40% precision on the same test set. Finally, we extend our
-method to retrieve the true translations of English sentences from a corpus of
-200k Italian sentences with a precision @1 of 68%.
-"
-4255,1702.03964,"Lasha Abzianidze, Johannes Bjerva, Kilian Evang, Hessel Haagsma, Rik
- van Noord, Pierre Ludmann, Duc-Duy Nguyen, Johan Bos","The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations
- Annotated with Compositional Meaning Representations",cs.CL," The Parallel Meaning Bank is a corpus of translations annotated with shared,
-formal meaning representations comprising over 11 million words divided over
-four languages (English, German, Italian, and Dutch). Our approach is based on
-cross-lingual projection: automatically produced (and manually corrected)
-semantic annotations for English sentences are mapped onto their word-aligned
-translations, assuming that the translations are meaning-preserving. The
-semantic annotation consists of five main steps: (i) segmentation of the text
-in sentences and lexical items; (ii) syntactic parsing with Combinatory
-Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and
-(v) compositional semantic analysis based on Discourse Representation Theory.
-These steps are performed using statistical models trained in a semi-supervised
-manner. The employed annotation models are all language-neutral. Our first
-results are promising.
-"
-4256,1702.04066,"Courtney Napoles, Keisuke Sakaguchi, and Joel Tetreault",JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction,cs.CL," We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for
-developing and evaluating grammatical error correction (GEC). Unlike other
-corpora, it represents a broad range of language proficiency levels and uses
-holistic fluency edits to not only correct grammatical errors but also make the
-original text more native-sounding. We describe the types of corrections made
-and benchmark four leading GEC systems on this corpus, identifying specific
-areas in which they do well and how they can improve. JFLEG fulfills the need
-for a new gold standard to properly assess the current state of GEC.
-"
-4257,1702.04241,Alok Ranjan Pal and Diganta Saha,Detection of Slang Words in e-Data using semi-Supervised Learning,cs.CL," The proposed algorithmic approach deals with finding the sense of a word in
-electronic data. Nowadays, in different communication media like the internet,
-mobile services etc., people use a few words which are slang in nature. This
-approach detects those abusive words using a supervised learning procedure. But
-in real-life scenarios, slang words are not always used in their complete word
-forms. Most of the time, those words are used in different abbreviated forms
-like sound-alike forms, taboo morphemes etc. The proposed approach can also
-detect those abbreviated forms using a semi-supervised learning procedure.
-Using synset and concept analysis of the text, the probability of a suspicious
-word being a slang word is also evaluated.
-"
-4258,1702.04333,"Angel Mario Castro Martinez, Sri Harish Mallidi, Bernd T. 
Meyer","On the Relevance of Auditory-Based Gabor Features for Deep Learning in - Automatic Speech Recognition",cs.CL," Previous studies support the idea of merging auditory-based Gabor features -with deep learning architectures to achieve robust automatic speech -recognition, however, the cause behind the gain of such combination is still -unknown. We believe these representations provide the deep learning decoder -with more discriminable cues. Our aim with this paper is to validate this -hypothesis by performing experiments with three different recognition tasks -(Aurora 4, CHiME 2 and CHiME 3) and assess the discriminability of the -information encoded by Gabor filterbank features. Additionally, to identify the -contribution of low, medium and high temporal modulation frequencies subsets of -the Gabor filterbank were used as features (dubbed LTM, MTM and HTM -respectively). With temporal modulation frequencies between 16 and 25 Hz, HTM -consistently outperformed the remaining ones in every condition, highlighting -the robustness of these representations against channel distortions, low -signal-to-noise ratios and acoustically challenging real-life scenarios with -relative improvements from 11 to 56% against a Mel-filterbank-DNN baseline. To -explain the results, a measure of similarity between phoneme classes from DNN -activations is proposed and linked to their acoustic properties. We find this -measure to be consistent with the observed error rates and highlight specific -differences on phoneme level to pinpoint the benefit of the proposed features. -" -4259,1702.04372,Antonios Anastasopoulos and David Chiang,"A case study on using speech-to-translation alignments for language - documentation",cs.CL," For many low-resource or endangered languages, spoken language resources are -more likely to be annotated with translations than with transcriptions. Recent -work exploits such annotations to produce speech-to-translation alignments, -without access to any text transcriptions. We investigate whether providing -such information can aid in producing better (mismatched) crowdsourced -transcriptions, which in turn could be valuable for training speech recognition -systems, and show that they can indeed be beneficial through a small-scale case -study as a proof-of-concept. We also present a simple phonetically aware string -averaging technique that produces transcriptions of higher quality. -" -4260,1702.04457,"Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei - Han",Automated Phrase Mining from Massive Text Corpora,cs.CL," As one of the fundamental tasks in text analysis, phrase mining aims at -extracting quality phrases from a text corpus. Phrase mining is important in -various tasks such as information extraction/retrieval, taxonomy construction, -and topic modeling. Most existing methods rely on complex, trained linguistic -analyzers, and thus likely have unsatisfactory performance on text corpora of -new domains and genres without extra but expensive adaption. Recently, a few -data-driven methods have been developed successfully for extraction of phrases -from massive domain-specific text. However, none of the state-of-the-art models -is fully automated because they require human experts for designing rules or -labeling phrases. 
- Since one can easily obtain many quality phrases from public knowledge bases
-at a scale that is much larger than that produced by human experts, in this
-paper, we propose a novel framework for automated phrase mining, AutoPhrase,
-which leverages this large amount of high-quality phrases in an effective way
-and achieves better performance compared to limited human-labeled phrases. In
-addition, we develop a POS-guided phrasal segmentation model, which
-incorporates the shallow syntactic information in part-of-speech (POS) tags to
-further enhance the performance, when a POS tagger is available. Note that
-AutoPhrase can support any language as long as a general knowledge base (e.g.,
-Wikipedia) in that language is available, while benefiting from, but not
-requiring, a POS tagger. Compared to the state-of-the-art methods, the new
-method has shown significant improvements in effectiveness on five real-world
-datasets across different domains and languages.
-"
-4261,1702.04488,Jingjing Xu and Xu Sun,"Transfer Deep Learning for Low-Resource Chinese Word Segmentation with a
- Novel Neural Network",cs.CL," Recent studies have shown effectiveness in using neural networks for Chinese
-word segmentation. However, these models rely on large-scale data and are less
-effective for low-resource datasets because of insufficient training data. We
-propose a transfer learning method to improve low-resource word segmentation by
-leveraging high-resource corpora. First, we train a teacher model on
-high-resource corpora and then use the learned knowledge to initialize a
-student model. Second, a weighted data similarity method is proposed to train
-the student model on low-resource data. Experimental results show that our work
-significantly improves the performance on low-resource datasets: 2.3% and 1.5%
-F-score on the PKU and CTB datasets. Furthermore, this paper achieves
-state-of-the-art results: 96.1% and 96.2% F-score on the PKU and CTB datasets.
-"
-4262,1702.04510,"Christian Hadiwinoto, Hwee Tou Ng","A Dependency-Based Neural Reordering Model for Statistical Machine
- Translation",cs.CL," In machine translation (MT) that involves translating between two languages
-with significant differences in word order, determining the correct word order
-of translated words is a major challenge. The dependency parse tree of a source
-sentence can help to determine the correct word order of the translated words.
-In this paper, we present a novel reordering approach utilizing a neural
-network and dependency-based embeddings to predict whether the translations of
-two source words linked by a dependency relation should remain in the same
-order or should be swapped in the translated sentence. Experiments on
-Chinese-to-English translation show that our approach yields a statistically
-significant improvement of 0.57 BLEU point on benchmark NIST test sets,
-compared to our prior state-of-the-art statistical MT system that uses sparse
-dependency-based reordering features.
-"
-4263,1702.04521,"Micha{\l} Daniluk, Tim Rockt\""aschel, Johannes Welbl, Sebastian Riedel",Frustratingly Short Attention Spans in Neural Language Modeling,cs.CL cs.AI cs.LG cs.NE," Neural language models predict the next token using a latent representation
-of the immediate token history. Recently, various methods for augmenting neural
-language models with an attention mechanism over a differentiable memory have
-been proposed. 
For predicting the next token, these models query information -from a memory of the recent history which can facilitate learning mid- and -long-range dependencies. However, conventional attention mechanisms used in -memory-augmented neural language models produce a single output vector per time -step. This vector is used both for predicting the next token as well as for the -key and value of a differentiable memory of a token history. In this paper, we -propose a neural language model with a key-value attention mechanism that -outputs separate representations for the key and value of a differentiable -memory, as well as for encoding the next-word distribution. This model -outperforms existing memory-augmented neural language models on two corpora. -Yet, we found that our method mainly utilizes a memory of the five most recent -output representations. This led to the unexpected main finding that a much -simpler model based only on the concatenation of recent output representations -from previous time steps is on par with more sophisticated memory-augmented -neural language models. -" -4264,1702.04615,Daniel Miller,"Automated Identification of Drug-Drug Interactions in Pediatric - Congestive Heart Failure Patients",cs.CL," Congestive Heart Failure, or CHF, is a serious medical condition that can -result in fluid buildup in the body as a result of a weak heart. When the heart -can't pump enough blood to efficiently deliver nutrients and oxygen to the -body, kidney function may be impaired, resulting in fluid retention. CHF -patients require a broad drug regimen to maintain the delicate system balance, -particularly between their heart and kidneys. These drugs include ACE -inhibitors and Beta Blockers to control blood pressure, anticoagulants to -prevent blood clots, and diuretics to reduce fluid overload. Many of these -drugs may interact, and potential effects of these interactions must be weighed -against their benefits. For this project, we consider a set of 44 drugs -identified as specifically relevant for treating CHF by pediatric cardiologists -at Lucile Packard Children's Hospital. This list was generated as part of our -current work at the LPCH Heart Center. The goal of this project is to identify -and evaluate potentially harmful drug-drug interactions (DDIs) within pediatric -patients with Congestive Heart Failure. This identification will be done -autonomously, so that it may continuously update by evaluating newly published -literature. -" -4265,1702.04770,"Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu - Sun, Soumith Chintala, Nicolas Vasilache",Training Language Models Using Target-Propagation,cs.CL cs.LG cs.NE," While Truncated Back-Propagation through Time (BPTT) is the most popular -approach to training Recurrent Neural Networks (RNNs), it suffers from being -inherently sequential (making parallelization difficult) and from truncating -gradient flow between distant time-steps. We investigate whether Target -Propagation (TPROP) style approaches can address these shortcomings. -Unfortunately, extensive experiments suggest that TPROP generally underperforms -BPTT, and we end with an analysis of this phenomenon, and suggestions for -future work. -" -4266,1702.04811,"John P. Lalor, Hao Wu, Tsendsuren Munkhdalai, Hong Yu","Understanding Deep Learning Performance through an Examination of Test - Set Difficulty: A Psychometric Case Study",cs.CL," Interpreting the performance of deep learning models beyond test set accuracy -is challenging. 
Characteristics of individual data points are often not
-considered during evaluation, and each data point is treated equally. We
-examine the impact of a test set question's difficulty to determine if there is
-a relationship between difficulty and performance. We model difficulty using
-well-studied psychometric methods on human response patterns. Experiments on
-Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the
-likelihood of answering a question correctly is impacted by the question's
-difficulty. As DNNs are trained with more data, easy examples are learned more
-quickly than hard examples.
-"
-4267,1702.04938,"Taraka Rama, Johannes Wahle, Pavel Sofroniev, and Gerhard J\""ager",Fast and unsupervised methods for multilingual cognate clustering,cs.CL," In this paper we explore the use of unsupervised methods for detecting
-cognates in multilingual word lists. We use online EM to train sound segment
-similarity weights for computing similarity between two words. We tested our
-online systems on geographically spread sixteen different language groups of
-the world and show that the Online PMI system (Pointwise Mutual Information)
-outperforms a HMM based system and two linguistically motivated systems:
-LexStat and ALINE. Our results suggest that a PMI system trained in an online
-fashion can be used by historical linguists for fast and accurate
-identification of cognates in not so well-studied language families.
-"
-4268,1702.05053,"Xiaochang Peng, Chuan Wang, Daniel Gildea and Nianwen Xue",Addressing the Data Sparsity Issue in Neural AMR Parsing,cs.CL," Neural attention models have achieved great success in different NLP tasks.
-However, they have not fulfilled their promise on the AMR parsing task due to
-the data sparsity issue. In this paper, we describe a sequence-to-sequence
-model for AMR parsing and present different ways to tackle the data sparsity
-problem. We show that our methods achieve significant improvement over a
-baseline neural attention model and our results are also competitive
-against state-of-the-art systems that do not use extra linguistic resources.
-"
-4269,1702.05270,"Sandro Pezzelle, Marco Marelli, Raffaella Bernardi","Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers
- from Vision",cs.CL cs.AI cs.CV," People can refer to quantities in a visual scene by using either exact
-cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few,
-most, all). In humans, these two processes underlie fairly different cognitive
-and neural mechanisms. Inspired by this evidence, the present study proposes
-two models for learning the objective meaning of cardinals and quantifiers from
-visual scenes containing multiple objects. We show that a model capitalizing on
-a 'fuzzy' measure of similarity is effective for learning quantifiers, whereas
-the learning of exact cardinals is better accomplished when information about
-number is provided.
-"
-4270,1702.05398,"Pradeep Dasigi, Gully A.P.C. Burns, Eduard Hovy, and Anita de Waard","Experiment Segmentation in Scientific Discourse as Clause-level
- Structured Prediction using Recurrent Neural Networks",cs.CL," We propose a deep learning model for identifying structure within experiment
-narratives in scientific literature. We take a sequence labeling approach to
-this problem, and label clauses within experiment narratives to identify the
-different parts of the experiment. 
Our dataset consists of paragraphs taken
-from open-access PubMed papers labeled with rhetorical information as a result
-of our pilot annotation. Our model is a Recurrent Neural Network (RNN) with
-Long Short-Term Memory (LSTM) cells that labels clauses. The clause
-representations are computed by combining word representations using a novel
-attention mechanism that involves a separate RNN. We compare this model against
-LSTMs where the input layer has simple or no attention and a feature-rich CRF
-model. Furthermore, we describe how our work could be useful for information
-extraction from scientific literature.
-"
-4271,1702.05512,"Parminder Bhatia, Marsal Gavalda and Arash Einolghozati",soc2seq: Social Embedding meets Conversation Model,cs.SI cs.CL," While liking or upvoting a post on a mobile app is easy to do, replying with
-a written note is much more difficult, due to both the cognitive load of coming
-up with a meaningful response as well as the mechanics of entering the text.
-Here we present a novel textual reply generation model that goes beyond the
-current auto-reply and predictive text entry models by taking into account the
-content preferences of the user, the idiosyncrasies of their conversational
-style, and even the structure of their social graph. Specifically, we have
-developed two types of models for personalized user interactions: a
-content-based conversation model, which makes use of location together with
-user information, and a social-graph-based conversation model, which combines
-content-based conversation models with social graphs.
-"
-4272,1702.05531,Vladimir Zolotov and David Kung,Analysis and Optimization of fastText Linear Text Classifier,cs.CL," The paper [1] shows that a simple linear classifier can compete with complex
-deep learning algorithms in text classification applications. Combining bag of
-words (BoW) and linear classification techniques, fastText [1] attains the same
-or only slightly lower accuracy than deep learning algorithms [2-9] that are
-orders of magnitude slower. We proved formally that fastText can be transformed
-into a simpler equivalent classifier, which unlike fastText does not have any
-hidden layer. We also proved that the necessary and sufficient dimensionality
-of the word vector embedding space is exactly the number of document classes.
-These results help in constructing more optimal linear text classifiers with
-guaranteed maximum classification capabilities. The results are proven exactly
-by pure formal algebraic methods without appealing to any empirical data.
-"
-4273,1702.05624,Roberto Santana,"Reproducing and learning new algebraic operations on word embeddings
- using genetic programming",cs.CL," Word-vector representations associate a high-dimensional real vector with
-every word from a corpus. Recently, neural-network based methods have been
-proposed for learning this representation from large corpora. This type of
-word-to-vector embedding is able to keep, in the learned vector space, some of
-the syntactic and semantic relationships present in the original word corpus.
-This, in turn, serves to address different types of language classification
-tasks by doing algebraic operations defined on the vectors. The general
-practice is to assume that the semantic relationships between the words can be
-inferred by the application of a-priori specified algebraic operations. Our
-general goal in this paper is to show that it is possible to learn methods for
-word composition in semantic spaces. 
Instead of expressing the compositional
-method as an algebraic operation, we will encode it as a program, which can be
-linear, nonlinear, or involve more intricate expressions. More remarkably, this
-program will be evolved from a set of initial random programs by means of
-genetic programming (GP). We show that our method is able to reproduce the same
-behavior as human-designed algebraic operators. Using a word analogy task as a
-benchmark, we also show that GP-generated programs are able to obtain accuracy
-values above those produced by the commonly used human-designed rule for
-algebraic manipulation of word vectors. Finally, we show the robustness of our
-approach by executing the evolved programs on the word2vec GoogleNews vectors,
-learned over 3 billion running words, and assessing their accuracy in the same
-word analogy task.
-"
-4274,1702.05638,"Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff,
- Benno Stein",A Stylometric Inquiry into Hyperpartisan and Fake News,cs.CL," This paper reports on a writing style analysis of hyperpartisan (i.e.,
-extremely one-sided) news in connection to fake news. It presents a large
-corpus of 1,627 articles that were manually fact-checked by professional
-journalists from BuzzFeed. The articles originated from 9 well-known political
-publishers, 3 each from the mainstream, the hyperpartisan left-wing, and the
-hyperpartisan right-wing. In sum, the corpus contains 299 fake news articles,
-97% of which originated from hyperpartisan publishers.
- We propose and demonstrate a new way of assessing style similarity between
-text categories via Unmasking---a meta-learning approach originally devised for
-authorship verification---revealing that the styles of left-wing and right-wing
-news have a lot more in common than either has with the mainstream.
-Furthermore, we show that hyperpartisan news can be discriminated well by its
-style from the mainstream (F1=0.78), as can satire from both (F1=0.81).
-Unsurprisingly, style-based fake news detection is not up to scratch (F1=0.46).
-Nevertheless, the former results are important to implement pre-screening for
-fake news detectors.
-"
-4275,1702.05793,Ann Irvine and Mark Dredze,"Harmonic Grammar, Optimality Theory, and Syntax Learnability: An
- Empirical Exploration of Czech Word Order",cs.CL," This work presents a systematic theoretical and empirical comparison of the
-major algorithms that have been proposed for learning Harmonic and Optimality
-Theory grammars (HG and OT, respectively). By comparing learning algorithms, we
-are also able to compare the closely related OT and HG frameworks themselves.
-Experimental results show that the additional expressivity of the HG framework
-over OT affords performance gains in the task of predicting the surface word
-order of Czech sentences. We compare the perceptron with the classic Gradual
-Learning Algorithm (GLA), which learns OT grammars, as well as the popular
-Maximum Entropy model. In addition to showing that the perceptron is
-theoretically appealing, our work shows that the performance of the HG model it
-learns approaches that of the upper bound in prediction accuracy on a held-out
-test set and that it is capable of accurately modeling observed variation. 
-"
-4276,1702.05821,"Bo Han, Will Radford, Ana\""is Cadilhac, Art Harol, Andrew Chisholm,
- Ben Hachey",Post-edit Analysis of Collective Biography Generation,cs.CL," Text generation is increasingly common but often requires manual post-editing
-where high precision is critical to end users. However, manual editing is
-expensive, so we want to ensure this effort is focused on high-value tasks. And
-we want to maintain stylistic consistency, a particular challenge in crowd
-settings. We present a case study, analysing human post-editing in the context
-of a template-based biography generation system. An edit flow visualisation
-combined with manual characterisation of edits helps identify and prioritise
-work for improving end-to-end efficiency and accuracy.
-"
-4277,1702.05962,Kris Cao and Stephen Clark,Latent Variable Dialogue Models and their Diversity,cs.CL," We present a dialogue generation model that directly captures the variability
-in possible responses to a given input, which reduces the `boring output' issue
-of deterministic dialogue models. Experiments show that our model generates
-more diverse outputs than baseline models, and also generates more consistently
-acceptable output than sampling from a deterministic encoder-decoder model.
-"
-4278,1702.06027,Ibrahim Cimentepe and Haluk O. Bingol,Parent Oriented Teacher Selection Causes Language Diversity,cs.CL," An evolutionary model for the emergence of diversity in language is
-developed. We investigated the effects of two real-life observations, namely,
-that people prefer people that they communicate with well, and that people
-interact with people that are physically close to each other. Clearly these
-groups are relatively small compared to the entire population. We restrict
-selection of the teachers to such small groups, called imitation sets, around
-parents. Then the child learns language from a teacher selected within the
-imitation set of her parent. As a result, subcommunities with their own
-languages develop. Comprehension within a subcommunity is found to be high. The
-number of languages is related to the relative size of the imitation set by a
-power law.
-"
-4279,1702.06135,"Raj Dabre, Fabien Cromieres, Sadao Kurohashi","Enabling Multi-Source Neural Machine Translation By Concatenating Source
- Sentences In Multiple Languages",cs.CL," In this paper, we explore a simple solution to ""Multi-Source Neural Machine
-Translation"" (MSNMT) which only relies on preprocessing an N-way multilingual
-corpus without modifying the Neural Machine Translation (NMT) architecture or
-training procedure. We simply concatenate the source sentences to form a single
-long multi-source input sentence while keeping the target side sentence as it
-is and train an NMT system using this preprocessed corpus. We evaluate our
-method in resource poor as well as resource rich settings and show its
-effectiveness (up to 4 BLEU using 2 source languages and up to 6 BLEU using 5
-source languages). We also compare against existing methods for MSNMT and show
-that our solution gives competitive results despite its simplicity. We also
-provide some insights on how the NMT system leverages multilingual information
-in such a scenario by visualizing attention.
-"
-4280,1702.06216,"Alan Mishler, Kevin Wonus, Wendy Chambers and Michael Bloodgood",Filtering Tweets for Social Unrest,cs.CL cs.IR cs.LG stat.ML," Since the events of the Arab Spring, there has been increased interest in
-using social media to anticipate social unrest.
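As an aside, the corpus preprocessing behind the multi-source NMT entry above amounts to little more than pasting aligned source lines together; a sketch under assumed file names (train.fr, train.de and the output name are hypothetical placeholders).

# Aligned source files for N languages; the target file (e.g. train.en)
# is left untouched, exactly as described in the abstract above.
src_files = ["train.fr", "train.de"]
sources = [open(f, encoding="utf-8") for f in src_files]

with open("train.multi-src", "w", encoding="utf-8") as out:
    # zip keeps the N-way sentence alignment; each output line is one
    # long multi-source input sentence.
    for parallel_lines in zip(*sources):
        out.write(" ".join(line.strip() for line in parallel_lines) + "\n")

for f in sources:
    f.close()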
While efforts have been made
-toward automated unrest prediction, we focus on filtering the vast volume of
-tweets to identify tweets relevant to unrest, which can be provided to
-downstream users for further analysis. We train a supervised classifier that is
-able to label Arabic language tweets as relevant to unrest with high
-reliability. We examine the relationship between training data size and
-performance and investigate ways to optimize the model building process while
-minimizing cost. We also explore how confidence thresholds can be set to
-achieve desired levels of performance.
-"
-4281,1702.06235,"Andrew Chisholm, Will Radford, Ben Hachey",Learning to generate one-sentence biographies from Wikidata,cs.CL," We investigate the generation of one-sentence Wikipedia biographies from
-facts derived from Wikidata slot-value pairs. We train a recurrent neural
-network sequence-to-sequence model with attention to select facts and generate
-textual summaries. Our model incorporates a novel secondary objective that
-helps ensure it generates sentences that contain the input facts. The model
-achieves a BLEU score of 41, improving significantly upon the vanilla
-sequence-to-sequence model and scoring roughly twice that of a simple template
-baseline. Human preference evaluation suggests the model is nearly as good as
-the Wikipedia reference. Manual analysis explores content selection, suggesting
-the model can trade the ability to infer knowledge against the risk of
-hallucinating incorrect information.
-"
-4282,1702.06239,"Yang Gao, Hao Wang, Chen Zhang, Wei Wang",Reinforcement Learning Based Argument Component Detection,cs.CL," Argument component detection (ACD) is an important sub-task in argumentation
-mining. ACD aims at detecting and classifying different argument components in
-natural language texts. Historical annotations (HAs) are important features the
-human annotators consider when they manually perform the ACD task. However, HAs
-are largely ignored by existing automatic ACD techniques. Reinforcement
-learning (RL) has proven to be an effective method for using HAs in some
-natural language processing tasks. In this work, we propose an RL-based ACD
-technique, and evaluate its performance on two well-annotated corpora. Results
-suggest that, in terms of classification accuracy, HAs-augmented RL outperforms
-plain RL by up to 17.85%, and outperforms the state-of-the-art supervised
-learning algorithm by up to 11.94%.
-"
-4283,1702.06336,"Miroslav Vodol\'an, Rudolf Kadlec, Jan Kleindienst",Hybrid Dialog State Tracker with ASR Features,cs.CL," This paper presents a hybrid dialog state tracker enhanced by trainable
-Spoken Language Understanding (SLU) for slot-filling dialog systems. Our
-architecture is inspired by previously proposed neural-network-based
-belief-tracking systems. In addition, we extended some parts of our modular
-architecture with differentiable rules to allow end-to-end training. We
-hypothesize that these rules allow our tracker to generalize better than pure
-machine-learning based systems. For evaluation, we used the Dialog State
-Tracking Challenge (DSTC) 2 dataset - a popular belief tracking testbed with
-dialogs from a restaurant information system. To our knowledge, our hybrid
-tracker sets a new state-of-the-art result in three out of four categories
-within the DSTC2.
-"
-4284,1702.06378,"Liang Lu, Lingpeng Kong, Chris Dyer and Noah A.
Smith",Multitask Learning with CTC and Segmental CRF for Speech Recognition,cs.CL," Segmental conditional random fields (SCRFs) and connectionist temporal -classification (CTC) are two sequence labeling methods used for end-to-end -training of speech recognition models. Both models define a transcription -probability by marginalizing decisions about latent segmentation alternatives -to derive a sequence probability: the former uses a globally normalized joint -model of segment labels and durations, and the latter classifies each frame as -either an output symbol or a ""continuation"" of the previous label. In this -paper, we train a recognition model by optimizing an interpolation between the -SCRF and CTC losses, where the same recurrent neural network (RNN) encoder is -used for feature extraction for both outputs. We find that this multitask -objective improves recognition accuracy when decoding with either the SCRF or -CTC models. Additionally, we show that CTC can also be used to pretrain the RNN -encoder, which improves the convergence rate when learning the joint model. -" -4285,1702.06467,"Carlos-Emiliano Gonz\'alez-Gallardo, Juan-Manuel Torres-Moreno, - Azucena Montes Rend\'on and Gerardo Sierra","Efficient Social Network Multilingual Classification using Character, - POS n-grams and Dynamic Normalization",cs.IR cs.CL cs.SI," In this paper we describe a dynamic normalization process applied to social -network multilingual documents (Facebook and Twitter) to improve the -performance of the Author profiling task for short texts. After the -normalization process, $n$-grams of characters and n-grams of POS tags are -obtained to extract all the possible stylistic information encoded in the -documents (emoticons, character flooding, capital letters, references to other -users, hyperlinks, hashtags, etc.). Experiments with SVM showed up to 90% of -performance. -" -4286,1702.06478,"Xavier Bost, Ilaria Brunetti, Luis Adri\'an Cabrera-Diego, - Jean-Val\`ere Cossu, Andr\'ea Linhares, Mohamed Morchid, Juan-Manuel - Torres-Moreno, Marc El-B\`eze, Richard Dufour",Syst\`emes du LIA \`a DEFT'13,cs.CL cs.IR," The 2013 D\'efi de Fouille de Textes (DEFT) campaign is interested in two -types of language analysis tasks, the document classification and the -information extraction in the specialized domain of cuisine recipes. We present -the systems that the LIA has used in DEFT 2013. Our systems show interesting -results, even though the complexity of the proposed tasks. -" -4287,1702.06510,"Luis Adri\'an Cabrera-Diego, St\'ephane Huet, Bassam Jabaian, - Alejandro Molina, Juan-Manuel Torres-Moreno, Marc El-B\`eze, Barth\'el\'emy - Durette","Algorithmes de classification et d'optimisation: participation du - LIA/ADOC \'a DEFT'14",cs.IR cs.CL," This year, the DEFT campaign (D\'efi Fouilles de Textes) incorporates a task -which aims at identifying the session in which articles of previous TALN -conferences were presented. We describe the three statistical systems developed -at LIA/ADOC for this task. A fusion of these systems enables us to obtain -interesting results (micro-precision score of 0.76 measured on the test corpus) -" -4288,1702.06589,Till Haug and Octavian-Eugen Ganea and Paulina Grnarova,"Neural Multi-Step Reasoning for Question Answering on Semi-Structured - Tables",cs.CL," Advances in natural language processing tasks have gained momentum in recent -years due to the increasingly popular neural network methods. 
In this paper, we
-explore deep learning techniques for answering multi-step reasoning questions
-that operate on semi-structured tables. Challenges here arise from the level of
-logical compositionality expressed by questions, as well as the domain
-openness. Our approach is weakly supervised, trained on question-answer-table
-triples without requiring intermediate strong supervision. It consists of two
-phases: first, machine understandable logical forms (programs) are generated
-from natural language questions following the work of [Pasupat and Liang,
-2015]. Second, paraphrases of logical forms and questions are embedded in a
-jointly learned vector space using word and character convolutional neural
-networks. A neural scoring function is further used to rank and retrieve the
-most probable logical form (interpretation) of a question. Our best single
-model achieves 34.8% accuracy on the WikiTableQuestions dataset, while the best
-ensemble of our models pushes the state-of-the-art score on this task to 38.7%,
-thus slightly surpassing both the engineered feature scoring baseline, as well
-as the Neural Programmer model of [Neelakantan et al., 2016].
-"
-4289,1702.06594,Marco Kuhlmann and Giorgio Satta and Peter Jonsson,On the Complexity of CCG Parsing,cs.CL," We study the parsing complexity of Combinatory Categorial Grammar (CCG) in
-the formalism of Vijay-Shanker and Weir (1994). As our main result, we prove
-that any parsing algorithm for this formalism will take in the worst case
-exponential time when the size of the grammar, and not only the length of the
-input sentence, is included in the analysis. This sets the formalism of
-Vijay-Shanker and Weir (1994) apart from weakly equivalent formalisms such as
-Tree-Adjoining Grammar (TAG), for which parsing can be performed in time
-polynomial in the combined size of grammar and input sentence. Our results
-contribute to a refined understanding of the class of mildly context-sensitive
-grammars, and inform the search for new, mildly context-sensitive versions of
-CCG.
-"
-4290,1702.06663,"Saurav Ghosh, Prithwish Chakraborty, Bryan L. Lewis, Maimuna S.
- Majumder, Emily Cohn, John S. Brownstein, Madhav V. Marathe, Naren
- Ramakrishnan","Guided Deep List: Automating the Generation of Epidemiological Line
- Lists from Open Sources",cs.CL cs.IR," Real-time monitoring and responses to emerging public health threats rely on
-the availability of timely surveillance data. During the early stages of an
-epidemic, the ready availability of line lists with detailed tabular
-information about laboratory-confirmed cases can assist epidemiologists in
-making reliable inferences and forecasts. Such inferences are crucial to
-understand the epidemiology of a specific disease early enough to stop or
-control the outbreak. However, constructing such line lists requires
-considerable human supervision, making them difficult to generate in real time.
-In this paper, we motivate Guided Deep List, the first tool for building
-automated line lists (in near real-time) from open source reports of emerging
-disease outbreaks. Specifically, we focus on deriving epidemiological
-characteristics of an emerging disease and the affected population from reports
-of illness. Guided Deep List uses distributed vector representations (a la
-word2vec) to discover a set of indicators for each line list feature. This
-discovery of indicators is followed by the use of dependency parsing based
-techniques for final extraction in tabular form.
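The indicator-discovery step just described can be approximated with off-the-shelf tooling; a sketch assuming gensim 4.x, with an invented toy corpus and seed word (this is an illustration of the idea, not the tool's actual code).

from gensim.models import Word2Vec  # assumes gensim 4.x

# Toy tokenized outbreak reports (invented).
reports = [
    ["patient", "hospitalized", "in", "riyadh", "with", "mers"],
    ["case", "confirmed", "by", "laboratory", "testing"],
    ["victim", "admitted", "to", "hospital", "after", "onset"],
]

model = Word2Vec(reports, vector_size=50, window=3, min_count=1, epochs=50)

# Expand a seed term for one line-list feature (e.g. hospitalization)
# into a set of indicator words by embedding similarity.
for word, score in model.wv.most_similar(positive=["hospitalized"], topn=5):
    print(f"{word}\t{score:.2f}")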
We evaluate the performance of -Guided Deep List against a human annotated line list provided by HealthMap -corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that Guided -Deep List extracts line list features with increased accuracy compared to a -baseline method. We further show how these automatically extracted line list -features can be used for making epidemiological inferences, such as inferring -demographics and symptoms-to-hospitalization period of affected individuals. -" -4291,1702.06672,"Aida Nematzadeh and Barend Beekhuizen and Shanshan Huang and Suzanne - Stevenson",Calculating Probabilities Simplifies Word Learning,cs.CL," Children can use the statistical regularities of their environment to learn -word meanings, a mechanism known as cross-situational learning. We take a -computational approach to investigate how the information present during each -observation in a cross-situational framework can affect the overall acquisition -of word meanings. We do so by formulating various in-the-moment learning -mechanisms that are sensitive to different statistics of the environment, such -as counts and conditional probabilities. Each mechanism introduces a unique -source of competition or mutual exclusivity bias to the model; the mechanism -that maximally uses the model's knowledge of word meanings performs the best. -Moreover, the gap between this mechanism and others is amplified in more -challenging learning scenarios, such as learning from few examples. -" -4292,1702.06675,"Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin and Trevor Cohn",Context-Aware Prediction of Derivational Word-forms,cs.CL," Derivational morphology is a fundamental and complex characteristic of -language. In this paper we propose the new task of predicting the derivational -form of a given base-form lemma that is appropriate for a given context. We -present an encoder--decoder style neural network to produce a derived form -character-by-character, based on its corresponding character-level -representation of the base form and the context. We demonstrate that our model -is able to generate valid context-sensitive derivations from known base forms, -but is less accurate under a lexicon agnostic setting. -" -4293,1702.06677,George Berry and Sean J. Taylor,Discussion quality diffuses in the digital public square,cs.CY cs.CL cs.SI," Studies of online social influence have demonstrated that friends have -important effects on many types of behavior in a wide variety of settings. -However, we know much less about how influence works among relative strangers -in digital public squares, despite important conversations happening in such -spaces. We present the results of a study on large public Facebook pages where -we randomly used two different methods--most recent and social feedback--to -order comments on posts. We find that the social feedback condition results in -higher quality viewed comments and response comments. After measuring the -average quality of comments written by users before the study, we find that -social feedback has a positive effect on response quality for both low and high -quality commenters. We draw on a theoretical framework of social norms to -explain this empirical result. In order to examine the influence mechanism -further, we measure the similarity between comments viewed and written during -the study, finding that similarity increases for the highest quality -contributors under the social feedback condition. 
This suggests that, in
-addition to norms, some individuals may respond with increased relevance to
-high-quality comments.
-"
-4294,1702.06696,"Thomas Kober and Julie Weeds and John Wilkie and Jeremy Reffin and
- David Weir",One Representation per Word - Does it make Sense for Composition?,cs.CL," In this paper, we investigate whether an a priori disambiguation of word
-senses is strictly necessary or whether the meaning of a word in context can be
-disambiguated through composition alone. We evaluate the performance of
-off-the-shelf single-vector and multi-sense vector models on a benchmark phrase
-similarity task and a novel task for word-sense discrimination. We find that
-single-sense vector models perform as well or better than multi-sense vector
-models despite arguably less clean elementary representations. Our findings
-furthermore show that simple composition functions such as pointwise addition
-are able to recover sense specific information from a single-sense vector model
-remarkably well.
-"
-4295,1702.06700,"Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang","Task-driven Visual Saliency and Attention-based Visual Question
- Answering",cs.CV cs.AI cs.CL cs.NE," Visual question answering (VQA) has witnessed great progress since May 2015
-as a classic problem unifying visual and textual data into a system. Many
-enlightening VQA works delve deep into image and question encodings and fusing
-methods, of which attention is the most effective and most widely used
-mechanism. Current attention-based methods focus on adequate fusion of visual
-and textual features, but pay little attention to where people focus when
-asking questions about the image. Traditional attention-based methods attach a
-single value to the feature at each spatial location, which loses much useful
-information. To remedy these problems, we propose a general method to perform
-saliency-like pre-selection on overlapped region features by the interrelation
-of bidirectional LSTM (BiLSTM), and use a novel element-wise multiplication
-based attention method to capture more competent correlation information
-between visual and textual features. We conduct experiments on the large-scale
-COCO-VQA dataset and analyze the effectiveness of our model demonstrated by
-strong empirical results.
-"
-4296,1702.06703,"Jiwei Li, Will Monroe and Dan Jurafsky",Data Distillation for Controlling Specificity in Dialogue Generation,cs.CL," People speak at different levels of specificity in different situations,
-depending on their knowledge, interlocutors, mood, etc. A conversational agent
-should have this ability and know when to be specific and when to be general.
-We propose an approach that gives a neural network--based conversational agent
-this ability. Our approach involves alternating between \emph{data
-distillation} and model training: removing training examples that are closest
-to the responses most commonly produced by the model trained from the last
-round and then retraining the model on the remaining dataset. Dialogue
-generation models trained with different degrees of data distillation manifest
-different levels of specificity.
- We then train a reinforcement learning system for selecting among this pool
-of generation models, to choose the best level of specificity for a given
-input.
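A toy sketch of the data-distillation loop just described, with string overlap standing in for the paper's notion of response similarity and a counter standing in for a trained generator (all data invented).

from collections import Counter
from difflib import SequenceMatcher

# Invented (input, response) pairs; a Counter stands in for "the response
# the trained model produces most often".
data = [
    ("how are you", "i don't know"),
    ("what's up", "i don't know"),
    ("what did you eat", "i don't know"),
    ("where do you live", "near the old bridge, by the river"),
    ("favourite film", "anything by kurosawa, honestly"),
]

def similar(a, b, threshold=0.8):
    return SequenceMatcher(None, a, b).ratio() >= threshold

models = []
for _ in range(2):  # two rounds of distillation
    generic = Counter(r for _, r in data).most_common(1)[0][0]
    models.append(generic)  # stand-in for the model trained this round
    # Remove training examples closest to the most common response.
    data = [(q, r) for q, r in data if not similar(r, generic)]

print(models)  # one "model" per round, decreasingly generic
print(data)    # the remaining, more specific training pairs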
Compared to the original generative model trained without distillation,
-the proposed system is capable of generating more interesting and
-higher-quality responses, in addition to appropriately adjusting specificity
-depending on the context.
- Our research constitutes a specific case of a broader approach involving
-training multiple subsystems from a single dataset distinguished by differences
-in a specific property one wishes to model. We show that from such a set of
-subsystems, one can use reinforcement learning to build a system that tailors
-its output to different input contexts at test time.
-"
-4297,1702.06709,"Abhishek, Ashish Anand and Amit Awekar","Fine-Grained Entity Type Classification by Jointly Learning
- Representations and Label Embeddings",cs.CL," Fine-grained entity type classification (FETC) is the task of classifying an
-entity mention to a broad set of types. The distant supervision paradigm is
-extensively used to generate training data for this task. However, the
-generated training data assigns the same set of labels to every mention of an
-entity without considering its local context. Existing FETC systems have two
-major drawbacks: they assume training data to be noise-free and they rely on
-hand-crafted features. Our work overcomes both drawbacks. We propose a neural
-network model that jointly learns entity mentions and their context
-representation to eliminate the use of hand-crafted features. Our model treats
-training data as noisy and uses a non-parametric variant of the hinge loss
-function. Experiments show that the proposed model outperforms previous
-state-of-the-art methods on two publicly available datasets, namely FIGER
-(GOLD) and BBN, with an average relative improvement of 2.69% in micro-F1
-score. Knowledge learnt by our model on one dataset can be transferred to other
-datasets while using the same model or other FETC systems. These approaches of
-transferring knowledge further improve the performance of the respective
-models.
-"
-4298,1702.06733,Jessica Ficler and Yoav Goldberg,Improving a Strong Neural Parser with Conjunction-Specific Features,cs.CL," While dependency parsers reach very high overall accuracy, some dependency
-relations are much harder than others. In particular, dependency parsers
-perform poorly on coordination constructions (i.e., correctly attaching the
-""conj"" relation). We extend a state-of-the-art dependency parser with
-conjunction-specific features, focusing on the similarity between the
-conjuncts' head words. Training the extended parser yields an improvement in
-""conj"" attachment as well as in overall dependency parsing accuracy on the
-Stanford dependency conversion of the Penn TreeBank.
-"
-4299,1702.06740,"Qiaolin Xia, Baobao Chang, Zhifang Sui",Improving Chinese SRL with Heterogeneous Annotations,cs.CL," Previous studies on Chinese semantic role labeling (SRL) have concentrated on
-a single semantically annotated corpus. But the training data of a single
-corpus is often limited. Meanwhile, there usually exist other semantically
-annotated corpora for Chinese SRL scattered across different annotation
-frameworks. Data sparsity remains a bottleneck. This situation calls for larger
-training datasets, or effective approaches which can take advantage of highly
-heterogeneous data. In this paper, we focus mainly on the latter, that is, on
-improving Chinese SRL by using heterogeneous corpora together. We propose a
-novel progressive learning model which augments the Progressive Neural Network
-with Gated Recurrent Adapters.
The model can accommodate heterogeneous inputs and
-effectively transfer knowledge between them. We also release a new corpus,
-Chinese SemBank, for Chinese SRL. Experiments on CPB 1.0 show that our model
-outperforms state-of-the-art methods.
-"
-4300,1702.06777,"Gonzalo Donoso, David Sanchez",Dialectometric analysis of language variation in Twitter,cs.CL cs.IR cs.SI physics.soc-ph," In the last few years, microblogging platforms such as Twitter have given
-rise to a deluge of textual data that can be used for the analysis of informal
-communication between millions of individuals. In this work, we propose an
-information-theoretic approach to geographic language variation using a corpus
-based on Twitter. We test our models with tens of concepts and their associated
-keywords detected in Spanish tweets geolocated in Spain. We employ
-dialectometric measures (cosine similarity and Jensen-Shannon divergence) to
-quantify the linguistic distance on the lexical level between cells created in
-a uniform grid over the map. This can be done for a single concept or in the
-general case taking into account an average of the considered variants. The
-latter permits an analysis of the dialects that naturally emerge from the data.
-Interestingly, our results reveal the existence of two dialect macrovarieties.
-The first group includes a region-specific speech spoken in small towns and
-rural areas whereas the second cluster encompasses cities that tend to use a
-more uniform variety. Since the results obtained with the two different metrics
-qualitatively agree, our work suggests that social media corpora can be
-efficiently used for dialectometric analyses.
-"
-4301,1702.06794,"Minh Le, Antske Fokkens","Tackling Error Propagation through Reinforcement Learning: A Case of
- Greedy Dependency Parsing",cs.CL," Error propagation is a common problem in NLP. Reinforcement learning explores
-erroneous states during training and can therefore be more robust when mistakes
-are made early in a process. In this paper, we apply reinforcement learning to
-greedy dependency parsing, which is known to suffer from error propagation.
-Reinforcement learning improves the accuracy of both labeled and unlabeled
-dependencies of the Stanford Neural Dependency Parser, a high-performance
-greedy parser, while maintaining its efficiency. We investigate the portion of
-errors which are the result of error propagation and confirm that reinforcement
-learning reduces the occurrence of error propagation.
-"
-4302,1702.06875,"Arman Cohan, Sydney Young, Andrew Yates, Nazli Goharian",Triaging Content Severity in Online Mental Health Forums,cs.CL cs.IR cs.SI," Mental health forums are online communities where people express their issues
-and seek help from moderators and other users. In such forums, there are often
-posts with severe content indicating that the user is in acute distress and
-there is a risk of attempted self-harm. Moderators need to respond to these
-severe posts in a timely manner to prevent potential self-harm. However, the
-large volume of daily posted content makes it difficult for the moderators to
-locate and respond to these critical posts. We present a framework for triaging
-user content into four severity categories which are defined based on
-indications of self-harm ideation. Our models are based on a feature-rich
-classification framework which includes lexical, psycholinguistic, contextual
-and topic modeling features.
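A minimal feature-rich classifier in the spirit of the framework just described can be assembled with scikit-learn; here word and character n-grams stand in for the richer lexical, psycholinguistic, contextual and topic features, and the posts and labels are invented.

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Two feature families combined; real systems would add psycholinguistic
# lexicon counts, contextual and topic-model features here as well.
features = FeatureUnion([
    ("word", TfidfVectorizer(ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])
clf = Pipeline([("features", features), ("svm", LinearSVC())])

# Invented posts with severity labels (0 = low ... 3 = critical).
posts = ["feeling a bit down today",
         "i can't see any way out anymore",
         "had a rough week but coping",
         "i have a plan and the means"]
labels = [0, 2, 1, 3]

clf.fit(posts, labels)
print(clf.predict(["everything feels hopeless"]))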
Our approaches improve the state of the art in
-triaging the content severity in mental health forums by large margins (up to
-17% improvement in F-1 scores). Using the proposed model, we analyze the mental
-state of users and show that overall, long-term users of the forum demonstrate
-decreased severity of risk over time. Our analysis of the interaction of the
-moderators with the users further indicates that without an automatic way to
-identify critical content, it is indeed challenging for the moderators to
-provide timely responses to the users in need.
-"
-4303,1702.06891,M. Atif Qureshi and Derek Greene,EVE: Explainable Vector Based Embedding Technique Using Wikipedia,cs.CL," We present an unsupervised explainable word embedding technique, called EVE,
-which is built upon the structure of Wikipedia. The proposed model defines the
-dimensions of a semantic vector representing a word using human-readable
-labels, thereby making it readily interpretable. Specifically, each vector is
-constructed using the Wikipedia category graph structure together with the
-Wikipedia article link structure. To test the effectiveness of the proposed
-word embedding model, we consider its usefulness in three fundamental tasks: 1)
-intruder detection - to evaluate its ability to identify a non-coherent vector
-from a list of coherent vectors, 2) ability to cluster - to evaluate its
-tendency to group related vectors together while keeping unrelated vectors in
-separate clusters, and 3) sorting relevant items first - to evaluate its
-ability to rank vectors (items) relevant to the query in the top order of the
-result. For each task, we also propose a strategy to generate a task-specific
-human-interpretable explanation from the model. These demonstrate the overall
-effectiveness of the explainable embeddings generated by EVE. Finally, we
-compare EVE with the Word2Vec, FastText, and GloVe embedding techniques across
-the three tasks, and report improvements over the state-of-the-art.
-"
-4304,1702.07015,"Jiaming Luo, Karthik Narasimhan, Regina Barzilay",Unsupervised Learning of Morphological Forests,cs.CL," This paper focuses on unsupervised modeling of morphological families,
-collectively comprising a forest over the language vocabulary. This formulation
-enables us to capture edgewise properties reflecting single-step morphological
-derivations, along with global distributional properties of the entire forest.
-These global properties constrain the size of the affix set and encourage
-formation of tight morphological families. The resulting objective is solved
-using Integer Linear Programming (ILP) paired with contrastive estimation. We
-train the model by alternating between optimizing the local log-linear model
-and the global ILP objective. We evaluate our system on three tasks: root
-detection, clustering of morphological families and segmentation. Our
-experiments demonstrate that our model yields consistent gains in all three
-tasks compared with the best published results.
-"
-4305,1702.07046,"Travis Wolfe, Mark Dredze, Benjamin Van Durme",Feature Generation for Robust Semantic Role Labeling,cs.CL," Hand-engineered feature sets are a well understood method for creating robust
-NLP models, but they require a lot of expertise and effort to create. In this
-work we describe how to automatically generate rich feature sets from simple
-units called featlets, requiring less engineering.
Using information gain to
-guide the generation process, we train models which rival the state of the art
-on two standard Semantic Role Labeling datasets with almost no task or
-linguistic insight.
-"
-4306,1702.07071,Keith Y. Patarroyo and Vladimir Vargas-Calder\'on,"Pronunciation recognition of English phonemes /\textipa{@}/, /{\ae}/,
- /\textipa{A}:/ and /\textipa{2}/ using Formants and Mel Frequency Cepstral
- Coefficients",cs.CL cs.SD," The Vocal Joystick Vowel Corpus, by Washington University, was used to study
-monophthongs pronounced by native English speakers. The objective of this study
-was to quantitatively measure the extent to which speech recognition methods
-can distinguish between similar sounding vowels. In particular, the phonemes
-/\textipa{@}/, /{\ae}/, /\textipa{A}:/ and /\textipa{2}/ were analysed. 748
-sound files from the corpus were used and subjected to Linear Predictive Coding
-(LPC) to compute their formants, and to the Mel Frequency Cepstral Coefficients
-(MFCC) algorithm to compute the cepstral coefficients. A Decision Tree
-Classifier was used to build a predictive model that learnt the patterns of the
-two first formants measured in the data set, as well as the patterns of the 13
-cepstral coefficients. An accuracy of 70\% was achieved using formants for the
-mentioned phonemes. For the MFCC analysis an accuracy of 52\% was achieved, and
-an accuracy of 71\% when /\textipa{@}/ was ignored. The results obtained show
-that the studied algorithms are far from mimicking the ability to distinguish
-subtle differences in sounds that human hearing has.
-"
-4307,1702.07092,"Arman Cohan, Allan Fong, Nazli Goharian, and Raj Ratwani",A Neural Attention Model for Categorizing Patient Safety Events,cs.CL cs.IR," Medical errors are leading causes of death in the US and as such, prevention
-of these errors is paramount to promoting health care. Patient Safety Event
-reports are narratives describing potential adverse events to the patients and
-are important in identifying and preventing medical errors. We present a neural
-network architecture for identifying the type of safety events which is the
-first step in understanding these narratives. Our proposed model is based on a
-soft neural attention model to improve the effectiveness of encoding long
-sequences. Empirical results on two large-scale real-world datasets of patient
-safety reports demonstrate the effectiveness of our method with significant
-improvements over existing methods.
-"
-4308,1702.07117,"Jarvan Law, Hankz Hankui Zhuo, Junhua He and Erhu Rong (Dept. of
- Computer Science, Sun Yat-Sen University, GuangZhou, China.)","LTSG: Latent Topical Skip-Gram for Mutually Learning Topic Model and
- Vector Representations",cs.CL," Topic models have been widely used in discovering latent topics which are
-shared across documents in text mining. Vector representations, word embeddings
-and topic embeddings, map words and topics into a low-dimensional and dense
-real-value vector space, and have obtained high performance in NLP tasks.
-However, most of the existing models assume the results trained by one of them
-are perfectly correct and use them as prior knowledge for improving the other
-model. Other models use information learned from an external large corpus to
-help improve a smaller corpus. In this paper, we aim to build an algorithmic
-framework that makes topic models and vector representations mutually improve
-each other within the same corpus.
An EM-style algorithm framework is employed
-to iteratively optimize both the topic model and the vector representations.
-Experimental results show that our model outperforms state-of-the-art methods
-on various NLP tasks.
-"
-4309,1702.07186,Mark Belford and Brian Mac Namee and Derek Greene,Stability of Topic Modeling via Matrix Factorization,cs.IR cs.CL cs.LG stat.ML," Topic models can provide us with an insight into the underlying latent
-structure of a large corpus of documents. A range of methods have been proposed
-in the literature, including probabilistic topic models and techniques based on
-matrix factorization. However, in both cases, standard implementations rely on
-stochastic elements in their initialization phase, which can potentially lead
-to different results being generated on the same corpus when using the same
-parameter values. This corresponds to the concept of ""instability"" which has
-previously been studied in the context of $k$-means clustering. In many
-applications of topic modeling, this problem of instability is not considered
-and topic models are treated as being definitive, even though the results may
-change considerably if the initialization process is altered. In this paper we
-demonstrate the inherent instability of popular topic modeling approaches,
-using a number of new measures to assess stability. To address this issue in
-the context of matrix factorization for topic modeling, we propose the use of
-ensemble learning strategies. Based on experiments performed on annotated text
-corpora, we show that a K-Fold ensemble strategy, combining both ensembles and
-structured initialization, can significantly reduce instability, while
-simultaneously yielding more accurate topic models.
-"
-4310,1702.07203,"Anoop Kunchukuttan, Maulik Shah, Pradyot Prakash, Pushpak
- Bhattacharyya","Utilizing Lexical Similarity between Related, Low-resource Languages for
- Pivot-based SMT",cs.CL," We investigate pivot-based translation between related languages in a low
-resource, phrase-based SMT setting. We show that a subword-level pivot-based
-SMT model using a related pivot language is substantially better than word and
-morpheme-level pivot models. It is also highly competitive with the best direct
-translation model, which is encouraging as no direct source-target training
-corpus is used. We also show that combining multiple related language pivot
-models can rival a direct translation model. Thus, the use of subwords as
-translation units coupled with multiple related pivot languages can compensate
-for the lack of a direct parallel corpus.
-"
-4311,1702.07285,"Francesco Barbieri, Miguel Ballesteros, Horacio Saggion",Are Emojis Predictable?,cs.CL," Emojis are ideograms which are naturally combined with plain text to visually
-complement or condense the meaning of a message. Despite being widely used in
-social media, their underlying semantics have received little attention from a
-Natural Language Processing standpoint. In this paper, we investigate the
-relation between words and emojis, studying the novel task of predicting which
-emojis are evoked by text-based tweet messages. We train several models based
-on Long Short-Term Memory networks (LSTMs) on this task. Our experimental
-results show that our neural model outperforms two baselines as well as humans
-solving the same task, suggesting that computational models are able to better
-capture the underlying semantics of emojis.
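A bare-bones version of such an LSTM emoji classifier, with toy token ids and sizes; this sketch assumes PyTorch and is not the authors' architecture.

import torch
import torch.nn as nn

class EmojiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64, n_emojis=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_emojis)

    def forward(self, token_ids):
        # Use the final hidden state as the tweet representation.
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return self.out(h_n.squeeze(0))  # logits over emoji classes

model = EmojiLSTM(vocab_size=100)
tweets = torch.randint(0, 100, (8, 12))  # 8 toy tweets, 12 token ids each
emojis = torch.randint(0, 5, (8,))       # which emoji each tweet evokes

loss = nn.CrossEntropyLoss()(model(tweets), emojis)
loss.backward()  # a full training loop would step an optimizer here
print(float(loss))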
-"
-4312,1702.07324,Amanda Doucette,"Inherent Biases of Recurrent Neural Networks for Phonological
- Assimilation and Dissimilation",cs.CL," A recurrent neural network model of phonological pattern learning is
-proposed. The model is a relatively simple neural network with one recurrent
-layer, and displays biases in learning that mimic observed biases in human
-learning. Single-feature patterns are learned faster than two-feature patterns,
-and vowel or consonant-only patterns are learned faster than patterns involving
-vowels and consonants, mimicking the results of laboratory learning
-experiments. In non-recurrent models, capturing these biases requires the use
-of alpha features or some other representation of repeated features, but with a
-recurrent neural network, these elaborations are not necessary.
-"
-4313,1702.07495,Shaohua Li,Dirichlet-vMF Mixture Model,cs.CL," This document is about the multi-document Von-Mises-Fisher mixture model with
-a Dirichlet prior, referred to as VMFMix. VMFMix is analogous to Latent
-Dirichlet Allocation (LDA) in that they can capture the co-occurrence patterns
-across multiple documents. The difference is that in VMFMix, the topic-word
-distribution is defined on a continuous n-dimensional hypersphere. Hence VMFMix
-is used to derive topic embeddings, i.e., representative vectors, from multiple
-sets of embedding vectors. An efficient Variational Expectation-Maximization
-inference algorithm is derived. The performance of VMFMix on two document
-classification tasks is reported, with some preliminary analysis.
-"
-4314,1702.07507,Nafise Sadat Moosavi and Michael Strube,"Use Generalized Representations, But Do Not Forget Surface Features",cs.CL," Only a year ago, all state-of-the-art coreference resolvers were using an
-extensive amount of surface features. Recently, there was a paradigm shift
-towards using word embeddings and deep neural networks, where the use of
-surface features is very limited. In this paper, we show that a simple SVM
-model with surface features outperforms more complex neural models for
-detecting anaphoric mentions. Our analysis suggests that generalized
-representations and surface features have different strengths that should both
-be taken into account for improving coreference resolution.
-"
-4315,1702.07680,"Cem Safak Sahin, Rajmonda S. Caceres, Brandon Oselio, William M.
- Campbell",Consistent Alignment of Word Embedding Models,cs.CL cs.IR stat.ML," Word embedding models offer continuous vector representations that can
-capture rich contextual semantics based on their word co-occurrence patterns.
-While these word vectors can provide very effective features used in many NLP
-tasks such as clustering similar words and inferring learning relationships,
-many challenges and open research questions remain. In this paper, we propose a
-solution that aligns variations of the same model (or different models) in a
-joint low-dimensional latent space leveraging carefully generated synthetic
-data points. This generative process is inspired by the observation that a
-variety of linguistic relationships is captured by simple linear operations in
-embedded space. We demonstrate that our approach can lead to substantial
-improvements in recovering embeddings of local neighborhoods.
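One simple, standard way to align two embedding spaces over a shared vocabulary is orthogonal Procrustes; the sketch below uses synthetic numpy data and is a stand-in for, not a reproduction of, the joint latent-space method described above.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))                 # embeddings from run A
R = np.linalg.qr(rng.normal(size=(50, 50)))[0]  # an arbitrary rotation
Y = X @ R + 0.01 * rng.normal(size=X.shape)     # run B: rotated + noise

# Orthogonal Procrustes: the rotation W minimizing ||XW - Y||_F comes
# from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print(np.linalg.norm(X @ W - Y))  # small residual: the spaces now agree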
-"
-4316,1702.07717,Liye Fu and Lillian Lee and Cristian Danescu-Niculescu-Mizil,"When confidence and competence collide: Effects on online
- decision-making discussions",cs.CL cs.CY cs.HC cs.SI physics.soc-ph," Group discussions are a way for individuals to exchange ideas and arguments
-in order to reach better decisions than they could on their own. One of the
-premises of productive discussions is that better solutions will prevail, and
-that the idea selection process is mediated by the (relative) competence of the
-individuals involved. However, since people may not know their actual
-competence on a new task, their behavior is influenced by their self-estimated
-competence --- that is, their confidence --- which can be misaligned with their
-actual competence.
- Our goal in this work is to understand the effects of confidence-competence
-misalignment on the dynamics and outcomes of discussions. To this end, we
-design a large-scale natural setting, in the form of an online team-based
-geography game, that allows us to disentangle confidence from competence and
-thus separate their effects.
- We find that in task-oriented discussions, the more-confident individuals
-have a larger impact on the group's decisions even when these individuals are
-at the same level of competence as their teammates. Furthermore, this
-unjustified role of confidence in the decision-making process often leads teams
-to under-perform. We explore this phenomenon by investigating the effects of
-confidence on conversational dynamics.
-"
-4317,1702.07793,"Yisen Wang, Xuejiao Deng, Songbai Pu, Zhiheng Huang",Residual Convolutional CTC Networks for Automatic Speech Recognition,cs.CL," Deep learning approaches have been widely used in Automatic Speech
-Recognition (ASR) and they have achieved a significant accuracy improvement. In
-particular, Convolutional Neural Networks (CNNs) have been revisited in ASR
-recently. However, most CNNs used in existing work have fewer than 10 layers,
-which may not be deep enough to capture all human speech signal information. In
-this paper, we propose a novel deep and wide CNN architecture denoted as
-RCNN-CTC, which has residual connections and a Connectionist Temporal
-Classification (CTC) loss function. RCNN-CTC is an end-to-end system which can
-exploit temporal and spectral structures of speech signals simultaneously.
-Furthermore, we introduce a CTC-based system combination, which is different
-from the conventional frame-wise senone-based one. The basic subsystems adopted
-in the combination are of different types and thus mutually complementary to
-each other. Experimental results show that our proposed single system RCNN-CTC
-can achieve the lowest word error rate (WER) on the WSJ and Tencent Chat data
-sets, compared to several widely used neural network systems in ASR. In
-addition, the proposed system combination can offer a further error reduction
-on these two data sets, resulting in relative WER reductions of $14.91\%$ and
-$6.52\%$ on the WSJ dev93 and Tencent Chat data sets respectively.
-"
-4318,1702.07825,"Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew
- Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman,
- Shubho Sengupta, Mohammad Shoeybi",Deep Voice: Real-time Neural Text-to-Speech,cs.CL cs.LG cs.NE cs.SD," We present Deep Voice, a production-quality text-to-speech system constructed
-entirely from deep neural networks. Deep Voice lays the groundwork for truly
-end-to-end neural speech synthesis.
The system comprises five major building
-blocks: a segmentation model for locating phoneme boundaries, a
-grapheme-to-phoneme conversion model, a phoneme duration prediction model, a
-fundamental frequency prediction model, and an audio synthesis model. For the
-segmentation model, we propose a novel way of performing phoneme boundary
-detection with deep neural networks using connectionist temporal classification
-(CTC) loss. For the audio synthesis model, we implement a variant of WaveNet
-that requires fewer parameters and trains faster than the original. By using a
-neural network for each component, our system is simpler and more flexible than
-traditional text-to-speech systems, where each component requires laborious
-feature engineering and extensive domain expertise. Finally, we show that
-inference with our system can be performed faster than real time and describe
-optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x
-speedups over existing implementations.
-"
-4319,1702.07826,"Upol Ehsan, Brent Harrison, Larry Chan, Mark O. Riedl","Rationalization: A Neural Machine Translation Approach to Generating
- Natural Language Explanations",cs.AI cs.CL cs.HC cs.LG," We introduce AI rationalization, an approach for generating explanations of
-autonomous system behavior as if a human had performed the behavior. We
-describe a rationalization technique that uses neural machine translation to
-translate internal state-action representations of an autonomous agent into
-natural language. We evaluate our technique in the Frogger game environment,
-training an autonomous game playing agent to rationalize its action choices
-using natural language. A natural language training corpus is collected from
-human players thinking out loud as they play the game. We motivate the use of
-rationalization as an approach to explanation generation and show the results
-of two experiments evaluating the effectiveness of rationalization. Results of
-these evaluations show that neural machine translation is able to accurately
-generate rationalizations that describe agent behavior, and that
-rationalizations are more satisfying to humans than other alternative methods
-of explanation.
-"
-4320,1702.07835,Wajdi Zaghouani,Critical Survey of the Freely Available Arabic Corpora,cs.CL," The availability of corpora is a major factor in building natural language
-processing applications. However, the costs of acquiring corpora can prevent
-some researchers from going further in their endeavours. Easy access to freely
-available corpora is urgently needed in the NLP research community, especially
-for languages such as Arabic. Currently, there is no easy way to access a
-comprehensive and updated list of freely available Arabic corpora. In this
-paper, we present the results of a recent survey conducted to identify the
-freely available Arabic corpora and language resources. Our preliminary results
-showed an initial list of 66 sources. We present our findings in the various
-categories studied and provide direct links to the data where possible.
-"
-4321,1702.07983,"Tong Che, Yanran Li, Ruixiang Zhang, R Devon Hjelm, Wenjie Li, Yangqiu
- Song, Yoshua Bengio",Maximum-Likelihood Augmented Discrete Generative Adversarial Networks,cs.AI cs.CL cs.LG," Despite the successes in capturing continuous distributions, the application
-of generative adversarial networks (GANs) to discrete settings, like natural
-language tasks, is rather restricted.
The fundamental reason is the difficulty
-of back-propagation through discrete random variables combined with the
-inherent instability of the GAN training objective. To address these problems,
-we propose Maximum-Likelihood Augmented Discrete Generative Adversarial
-Networks. Instead of directly optimizing the GAN objective, we derive a novel
-and low-variance objective using the discriminator's output that corresponds to
-the log-likelihood. Compared with the original, the new objective is proved to
-be consistent in theory and beneficial in practice. The experimental results on
-various discrete datasets demonstrate the effectiveness of the proposed
-approach.
-"
-4322,1702.07998,"Yinfei Yang, Forrest Sheng Bao, Ani Nenkova",Detecting (Un)Important Content for Single-Document News Summarization,cs.CL," We present a robust approach for detecting intrinsic sentence importance in
-news, by training on two corpora of document-summary pairs. When used for
-single-document summarization, our approach, combined with the ""beginning of
-document"" heuristic, outperforms a state-of-the-art summarizer and the
-beginning-of-article baseline in both automatic and manual evaluations. These
-results represent an important advance because in the absence of cross-document
-repetition, single document summarizers for news have not been able to
-consistently outperform the strong beginning-of-article baseline.
-"
-4323,1702.08021,"Mirko Lai, Delia Iraz\'u Hern\'andez Far\'ias, Viviana Patti, Paolo
- Rosso","Friends and Enemies of Clinton and Trump: Using Context for Detecting
- Stance in Political Tweets",cs.CL," Stance detection, the task of identifying the speaker's opinion towards a
-particular target, has attracted the attention of researchers. This paper
-describes a novel approach for detecting stance in Twitter. We define a set of
-features in order to consider the context surrounding a target of interest with
-the final aim of training a model for predicting the stance towards the
-mentioned targets. In particular, we are interested in investigating political
-debates in social media. For this reason we evaluated our approach focusing on
-two targets of the SemEval-2016 Task 6 on Detecting stance in tweets, which are
-related to the political campaign for the 2016 U.S. presidential elections:
-Hillary Clinton vs. Donald Trump. For the sake of comparison with the state of
-the art, we evaluated our model against the dataset released in the
-SemEval-2016 Task 6 shared task competition. Our results outperform the best
-ones obtained by participating teams, and show that information about enemies
-and friends of politicians helps in detecting stance towards them.
-"
-4324,1702.08139,"Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick","Improved Variational Autoencoders for Text Modeling using Dilated
- Convolutions",cs.NE cs.CL cs.LG," Recent work on generative modeling of text has found that variational
-auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM
-language models (Bowman et al., 2015). This negative result is so far poorly
-understood, but has been attributed to the propensity of LSTM decoders to
-ignore conditioning information from the encoder. In this paper, we experiment
-with a new type of decoder for VAE: a dilated CNN. By changing the decoder's
-dilation architecture, we control the effective context from previously
-generated words.
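The receptive-field arithmetic behind such a dilated decoder is easy to verify; below is a causal dilated stack in PyTorch with illustrative sizes (not the paper's configuration).

import torch
import torch.nn as nn

kernel, dilations = 3, [1, 2, 4, 8]
layers = []
for d in dilations:
    # Left-only padding keeps the convolution causal: a position never
    # sees words generated after it.
    layers += [nn.ConstantPad1d((d * (kernel - 1), 0), 0.0),
               nn.Conv1d(16, 16, kernel_size=kernel, dilation=d),
               nn.ReLU()]
decoder = nn.Sequential(*layers)

x = torch.randn(1, 16, 40)      # (batch, channels, generated words)
print(decoder(x).shape)         # length preserved by the causal padding

# Doubling the dilation grows the context exponentially with depth:
print(1 + (kernel - 1) * sum(dilations))  # receptive field = 31 words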
In experiments, we find that there is a trade-off between the
-contextual capacity of the decoder and the amount of encoding information used.
-We show that with the right decoder, VAE can outperform LSTM language models.
-We demonstrate perplexity gains on two datasets, representing the first
-positive experimental result on the use of VAE for generative modeling of text.
-Further, we conduct an in-depth investigation of the use of VAE (with our new
-decoding architecture) for semi-supervised and unsupervised labeling tasks,
-demonstrating gains over several strong baselines.
-"
-4325,1702.08217,"Sreelekha S, Pushpak Bhattacharyya",A case study on English-Malayalam Machine Translation,cs.CL," In this paper we present our work on a case study of Statistical Machine
-Translation (SMT) and Rule-based Machine Translation (RBMT) for translation
-from English to Malayalam and Malayalam to English. One of the motivations of
-our study is to make a three-way performance comparison: a) SMT and RBMT, b)
-English to Malayalam SMT and Malayalam to English SMT, c) English to Malayalam
-RBMT and Malayalam to English RBMT. We describe the development of English to
-Malayalam and Malayalam to English baseline phrase-based SMT systems and the
-evaluation of their performance compared against the RBMT systems. Based on our
-study the observations are: a) SMT systems outperform RBMT systems, b) in the
-case of SMT, English - Malayalam systems perform better than Malayalam -
-English systems, c) in the case of RBMT, Malayalam to English systems perform
-better than English to Malayalam systems. Based on our evaluations and detailed
-error analysis, we describe the requirements for incorporating morphological
-processing into the SMT to improve the accuracy of translation.
-"
-4326,1702.08303,Joachim Bingel and Anders S{\o}gaard,"Identifying beneficial task relations for multi-task learning in deep
- neural networks",cs.CL," Multi-task learning (MTL) in deep neural networks for NLP has recently
-received increasing interest due to some compelling benefits, including its
-potential to efficiently regularize models and to reduce the need for labeled
-data. While it has brought significant improvements in a number of NLP tasks,
-mixed results have been reported, and little is known about the conditions
-under which MTL leads to gains in NLP. This paper sheds light on the specific
-task relations that can lead to gains from MTL models over single-task setups.
-"
-4327,1702.08388,"Arkaitz Zubiaga, Bo Wang, Maria Liakata, Rob Procter","Political Homophily in Independence Movements: Analysing and Classifying
- Social Media Users by National Identity",cs.CL cs.SI," Social media and data mining are increasingly being used to analyse political
-and societal issues. Here we undertake the classification of social media users
-as supporting or opposing ongoing independence movements in their territories.
-Independence movements occur in territories whose citizens have conflicting
-national identities; users with opposing national identities will then support
-or oppose the sense of being part of an independent nation that differs from
-the officially recognised country. We describe a methodology that relies on
-users' self-reported location to build large-scale datasets for three
-territories -- Catalonia, the Basque Country and Scotland.
An analysis of these
-datasets shows that homophily plays an important role in determining who people
-connect with, as users predominantly choose to follow and interact with others
-from the same national identity. We show that a classifier relying on users'
-follow networks can achieve accurate, language-independent classification
-performances ranging from 85% to 97% for the three territories.
-"
-4328,1702.08450,Mokhtar Billami (LIF),"A Knowledge-Based Approach to Word Sense Disambiguation by
- distributional selection and semantic features",cs.CL," Word sense disambiguation improves many Natural Language Processing (NLP)
-applications such as Information Retrieval, Information Extraction, Machine
-Translation, or Lexical Simplification. Roughly speaking, the aim is to choose
-for each word in a text its best sense. One of the most popular methods
-estimates local semantic similarity relatedness between two word senses and
-then extends it to all words from the text. The most direct method computes a
-rough score for every pair of word senses and chooses the lexical chain that
-has the best score (one can imagine the exponential complexity that this
-comprehensive approach entails). In this paper, we propose to use a
-combinatorial optimization metaheuristic for choosing the nearest neighbors
-obtained by distributional selection around the word to disambiguate. The test
-and the evaluation of our method concern a corpus written in French, by means
-of the semantic network BabelNet. The obtained accuracy rate is 78% on all
-nouns and verbs chosen for the evaluation.
-"
-4329,1702.08451,"Mokhtar Billami (LIF), N\'uria Gala (LIF)","Approches d'analyse distributionnelle pour am\'eliorer la
- d\'esambigu\""isation s\'emantique",cs.CL," Word sense disambiguation (WSD) improves many Natural Language Processing
-(NLP) applications such as Information Retrieval, Machine Translation or
-Lexical Simplification. WSD is the ability to determine a word sense among
-different ones within a polysemic lexical unit, taking into account the
-context. The most straightforward approach uses a semantic proximity measure
-between the word sense candidates of the target word and those of its context.
-Such a method very easily entails a combinatorial explosion. In this paper, we
-propose two methods based on distributional analysis which enable us to reduce
-the exponential complexity without losing the coherence. We present a
-comparison between the selection of distributional neighbors and the linearly
-nearest neighbors. The figures obtained show that selecting distributional
-neighbors leads to better results.
-"
-4330,1702.08563,"John P. Lalor, Hao Wu, Hong Yu",Soft Label Memorization-Generalization for Natural Language Inference,cs.CL," Often when multiple labels are obtained for a training example it is assumed
-that there is an element of noise that must be accounted for. It has been shown
-that this disagreement can be considered signal instead of noise. In this work
-we investigate using soft labels for training data to improve generalization in
-machine learning models. However, using soft labels for training Deep Neural
-Networks (DNNs) is not practical due to the costs involved in obtaining
-multiple labels for large data sets. We propose soft label
-memorization-generalization (SLMG), a fine-tuning approach to using soft labels
-for training DNNs. We assume that differences in labels provided by human
-annotators represent ambiguity about the true label instead of noise.
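Training against soft labels reduces to cross-entropy with a distribution rather than a one-hot target; a minimal PyTorch sketch with invented annotator proportions follows.

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3, requires_grad=True)  # 4 examples, 3 NLI classes

# Invented annotator proportions, e.g. 7 of 10 annotators chose the
# first class for the first example.
soft_targets = torch.tensor([[0.7, 0.2, 0.1],
                             [0.1, 0.8, 0.1],
                             [0.3, 0.3, 0.4],
                             [0.0, 0.5, 0.5]])

# Cross-entropy against a distribution: -sum_c p(c) log q(c).
loss = -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
loss.backward()
print(float(loss))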
-Experiments with SLMG demonstrate improved generalization performance on the -Natural Language Inference (NLI) task. Our experiments show that by injecting a -small percentage of soft label training data (0.03% of training set size) we -can improve generalization performance over several baselines. -" -4331,1702.08653,Asli Celikyilmaz and Li Deng and Lihong Li and Chong Wang,"Scaffolding Networks: Incremental Learning and Teaching Through - Questioning",cs.CL," We introduce a new paradigm of learning for reasoning, understanding, and -prediction, as well as the scaffolding network to implement this paradigm. The -scaffolding network embodies an incremental learning approach that is -formulated as a teacher-student network architecture to teach machines how to -understand text and do reasoning. The key to our computational scaffolding -approach is the interactions between the teacher and the student through -sequential questioning. The student observes each sentence in the text -incrementally, and it uses an attention-based neural net to discover and -register the key information in relation to its current memory. Meanwhile, the -teacher asks questions about the observed text, and the student network gets -rewarded by correctly answering these questions. The entire network is updated -continually using reinforcement learning. Our experimental results on synthetic -and real datasets show that the scaffolding network not only outperforms -state-of-the-art methods but also learns to do reasoning in a scalable way even -with little human generated input. -" -4332,1702.08866,"Marina Sokolova, Vera Sazonova, Kanyi Huang, Rudraneel Chakraboty, - Stan Matwin",Studying Positive Speech on Twitter,cs.CL," We present results of empirical studies on positive speech on Twitter. By -positive speech we understand speech that works for the betterment of a given -situation, in this case relations between different communities in a -conflict-prone country. We worked with four Twitter data sets. Through -semi-manual opinion mining, we found that positive speech accounted for < 1% of -the data. In fully automated studies, we tested two approaches: unsupervised -statistical analysis, and supervised text classification based on distributed -word representation. We discuss benefits and challenges of those approaches and -report empirical evidence obtained in the study. -" -4333,1703.00050,"Angel X. Chang, Mihail Eric, Manolis Savva, Christopher D. Manning",SceneSeer: 3D Scene Design with Natural Language,cs.GR cs.CL cs.HC," Designing 3D scenes is currently a creative task that requires significant -expertise and effort in using complex 3D design interfaces. This effortful -design process stands in stark contrast to the ease with which people can -use language to describe real and imaginary environments. We present SceneSeer: -an interactive text to 3D scene generation system that allows a user to design -3D scenes using natural language. A user provides input text from which we -extract explicit constraints on the objects that should appear in the scene. -Given these explicit constraints, the system then uses a spatial knowledge base -learned from an existing database of 3D scenes and 3D object models to infer an -arrangement of the objects forming a natural scene matching the input -description. Using textual commands the user can then iteratively refine the -created scene by adding, removing, replacing, and manipulating objects.
We -evaluate the quality of 3D scenes generated by SceneSeer in a perceptual -evaluation experiment where we compare against manually designed scenes and -simpler baselines for 3D scene generation. We demonstrate how the generated -scenes can be iteratively refined through simple natural language commands. -" -4334,1703.00089,"Fan Zhang, Diane Litman",A Joint Identification Approach for Argumentative Writing Revisions,cs.CL," Prior work on revision identification typically uses a pipeline method: -revision extraction is first conducted to identify the locations of revisions -and revision classification is then conducted on the identified revisions. Such -a setting propagates the errors of the revision extraction step to the revision -classification step. This paper proposes an approach that identifies the -revision location and the revision type jointly to solve the issue of error -propagation. It utilizes a sequence representation of revisions and conducts -sequence labeling for revision identification. A mutation-based approach is -utilized to update identification sequences. Results demonstrate that our -proposed approach yields better performance on both revision location -extraction and revision type classification compared to a pipeline baseline. -" -4335,1703.00096,"Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh","Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence - Labelling",cs.CL cs.LG cs.NE," Most existing sequence labelling models rely on a fixed decomposition of a -target sequence into a sequence of basic units. These methods suffer from two -major drawbacks: 1) the set of basic units is fixed, such as the set of words, -characters or phonemes in speech recognition, and 2) the decomposition of -target sequences is fixed. These drawbacks usually result in sub-optimal -performance in modeling sequences. In this paper, we extend the popular CTC -loss criterion to alleviate these limitations, and propose a new loss function -called Gram-CTC. While preserving the advantages of CTC, Gram-CTC automatically -learns the best set of basic units (grams), as well as the most suitable -decomposition of target sequences. Unlike CTC, Gram-CTC allows the model to -output a variable number of characters at each time step, which enables the model -to capture longer-term dependencies and improves the computational efficiency. We -demonstrate that the proposed Gram-CTC improves CTC in terms of both -performance and efficiency on the large vocabulary speech recognition task at -multiple scales of data, and that with Gram-CTC we can outperform the -state-of-the-art on a standard speech benchmark. -" -4336,1703.00099,"Zhou Yu, Alan W Black and Alexander I. Rudnicky","Learning Conversational Systems that Interleave Task and Non-Task - Content",cs.CL cs.AI cs.HC," Task-oriented dialog systems have been applied in various tasks, such as -automated personal assistants, customer service providers and tutors. These -systems work well when users have clear and explicit intentions that are -well-aligned to the systems' capabilities. However, they fail if users' -intentions are not explicit. To address this shortcoming, we propose a -framework to interleave non-task content (i.e. everyday social conversation) -into task conversations. When the task content fails, the system can still keep -the user engaged with the non-task content.
We trained a policy using -reinforcement learning algorithms to promote long-turn conversation coherence -and consistency, so that the system can have smooth transitions between task -and non-task content. To test the effectiveness of the proposed framework, we -developed a movie promotion dialog system. Experiments with human users -indicate that a system that interleaves social and task content achieves a -better task success rate and is also rated as more engaging compared to a pure -task-oriented system. -" -4337,1703.00203,"Quentin Feltgen, Benjamin Fagard and Jean-Pierre Nadal","Frequency patterns of semantic change: Corpus-based evidence of a - near-critical dynamics in language change",physics.soc-ph cs.CL," It is generally believed that, when a linguistic item acquires a new meaning, -its overall frequency of use in the language rises with time with an S-shaped -growth curve. Yet, this claim has only been supported by a limited number of -case studies. In this paper, we provide the first corpus-based quantitative -confirmation of the genericity of the S-curve in language change. Moreover, we -uncover another generic pattern, a latency phase of variable duration preceding -the S-growth, during which the frequency of use of the semantically expanding -word remains low and more or less constant. We also propose a usage-based model -of language change supported by cognitive considerations, which predicts that -both phases, the latency and the fast S-growth, take place. The driving -mechanism is a stochastic dynamics, a random walk in the space of frequency of -use. The underlying deterministic dynamics highlights the role of a control -parameter, the strength of the cognitive impetus governing the onset of change, -which tunes the system at the vicinity of a saddle-node bifurcation. In the -neighborhood of the critical point, the latency phase corresponds to the -diffusion time over the critical region, and the S-growth to the fast -convergence that follows. The duration of the two phases is computed as -specific first passage times of the random walk process, leading to -distributions that fit well the ones extracted from our dataset. We argue that -our results are not specific to the studied corpus, but apply to semantic -change in general. -" -4338,1703.00317,"Ceyda Sanli, Anupam Mondal, Erik Cambria","Tracing Linguistic Relations in Winning and Losing Sides of Explicit - Opposing Groups",cs.CL cs.AI," Linguistic relations in oral conversations present how opinions are -constructed and developed in a restricted time. The relations bond ideas, -arguments, thoughts, and feelings, re-shape them during a speech, and finally -build knowledge out of all information provided in the conversation. Speakers -share a common interest to discuss. It is expected that each speaker's reply -includes duplicated forms of words from previous speakers. However, linguistic -adaptation is observed and evolves in a more complex path than just -transferring slightly modified versions of common concepts. A conversation -aiming at a benefit at the end shows an emergent cooperation inducing the -adaptation. Not only cooperation, but also competition drives the adaptation or -an opposite scenario, and one can capture the dynamic process by tracking how -the concepts are linguistically linked. To uncover salient complex dynamic -events in verbal communications, we attempt to discover self-organized -linguistic relations hidden in a conversation with explicitly stated winners -and losers.
We examine open access data of the United States Supreme Court. Our -understanding is crucial in big data research to guide how transition states in -opinion mining and decision-making should be modeled and how the knowledge -required to guide the model should be pinpointed, by filtering large amounts of -data. -" -4339,1703.00538,Jinying Chen and Hong Yu,"Unsupervised Ensemble Ranking of Terms in Electronic Health Record Notes - Based on Their Importance to Patients",cs.CL," Background: Electronic health record (EHR) notes contain abundant medical -jargon that can be difficult for patients to comprehend. One way to help -patients is to reduce information overload and help them focus on medical terms -that matter most to them. - Objective: The aim of this work was to develop FIT (Finding Important Terms -for patients), an unsupervised natural language processing (NLP) system that -ranks medical terms in EHR notes based on their importance to patients. - Methods: We built FIT on a new unsupervised ensemble ranking model derived -from the biased random walk algorithm to combine heterogeneous information -resources for ranking candidate terms from each EHR note. Specifically, FIT -integrates four single views for term importance: patient use of medical -concepts, document-level term salience, word-occurrence based term relatedness, -and topic coherence. It also incorporates partial information of term -importance as conveyed by terms' unfamiliarity levels and semantic types. We -evaluated FIT on 90 expert-annotated EHR notes and compared it with three -benchmark unsupervised ensemble ranking methods. - Results: FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR -notes to identify important terms. When including term identification, the -performance of FIT for identifying important terms from EHR notes was 0.813 -AUC-ROC. It outperformed the three ensemble rankers for most metrics. Its -performance is relatively insensitive to its parameter. - Conclusions: FIT can automatically identify EHR terms important to patients -and may help develop personalized interventions to improve quality of care. By -using unsupervised learning as well as a robust and flexible framework for -information fusion, FIT can be readily applied to other domains and -applications. -" -4340,1703.00565,Jason S. Kessler,Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ,cs.CL cs.IR," Scattertext is an open source tool for visualizing linguistic variation -between document categories in a language-independent way. The tool presents a -scatterplot, where each axis corresponds to the rank-frequency at which a term occurs in -a category of documents. Through a tie-breaking strategy, the tool is able to -display thousands of visible term-representing points and find space to legibly -label hundreds of them. Scattertext also lends itself to a query-based -visualization of how the use of terms with similar embeddings differs between -document categories, as well as a visualization for comparing the importance -scores of bag-of-words features to univariate metrics. -" -4341,1703.00572,"Rui Liu, Junjie Hu, Wei Wei, Zi Yang, Eric Nyberg",Structural Embedding of Syntactic Trees for Machine Comprehension,cs.CL," Deep neural networks for machine comprehension typically utilize only word -or character embeddings without explicitly taking advantage of structured -linguistic information such as constituency trees and dependency trees.
In this -paper, we propose structural embedding of syntactic trees (SEST), an algorithmic -framework to utilize structured information and encode it into vector -representations that can boost the performance of algorithms for machine -comprehension. We evaluate our approach using a state-of-the-art neural -attention model on the SQuAD dataset. Experimental results demonstrate that our -model can accurately identify the syntactic boundaries of the sentences and -extract answers that are syntactically coherent, improving over the baseline methods. -" -4342,1703.00607,"Zijun Yao, Yifan Sun, Weicong Ding, Nikhil Rao, Hui Xiong",Dynamic Word Embeddings for Evolving Semantic Discovery,cs.CL stat.ML," Word evolution refers to the changing meanings and associations of words -throughout time, as a byproduct of human language evolution. By studying word -evolution, we can infer social trends and language constructs over different -periods of human history. However, traditional techniques such as word -representation learning do not adequately capture the evolving language -structure and vocabulary. In this paper, we develop a dynamic statistical model -to learn time-aware word vector representation. We propose a model that -simultaneously learns time-aware embeddings and solves the resulting ""alignment -problem"". This model is trained on a crawled NYTimes dataset. Additionally, we -develop multiple intuitive evaluation strategies of temporal word embeddings. -Our qualitative and quantitative tests indicate that our method not only -reliably captures this evolution over time, but also consistently outperforms -state-of-the-art temporal embedding approaches on both semantic accuracy and -alignment quality. -" -4343,1703.00782,Xu Sun and Shuming Ma,Lock-Free Parallel Perceptron for Graph-based Dependency Parsing,cs.CL," Dependency parsing is an important NLP task. A popular approach for -dependency parsing is the structured perceptron. Still, graph-based dependency -parsing has the time complexity of $O(n^3)$, and it suffers from slow training. -To deal with this problem, we propose a parallel algorithm called parallel -perceptron. The parallel algorithm can make full use of a multi-core computer, -which saves a lot of training time. Based on experiments we observe that -dependency parsing with parallel perceptron can achieve 8-fold faster training -speed than traditional structured perceptron methods when using 10 threads, and -with no loss at all in accuracy. -" -4344,1703.00786,Shuming Ma and Xu Sun,A Generic Online Parallel Learning Framework for Large Margin Models,cs.CL cs.LG," To speed up the training process, many existing systems use parallel -technology for online learning algorithms. However, most research mainly focuses -on stochastic gradient descent (SGD) instead of other algorithms. We propose a -generic online parallel learning framework for large margin models, and also -analyze our framework on popular large margin algorithms, including MIRA and -Structured Perceptron. Our framework is lock-free and easy to implement on -existing systems. Experiments show that systems with our framework can gain -near-linear speed-up by increasing running threads, and with no loss in -accuracy. -" -4345,1703.00948,"Nemanja Spasojevic, Preeti Bhargava, Guoning Hu",DAWT: Densely Annotated Wikipedia Texts across multiple languages,cs.IR cs.AI cs.CL cs.SI," In this work, we open up the DAWT dataset - Densely Annotated Wikipedia Texts -across multiple languages.
The annotations include labeled text mentions -mapping to entities (represented by their Freebase machine ids) as well as the -type of the entity. The data set contains a total of 13.6M articles, 5.0B tokens, -and 13.8M mention-entity co-occurrences. DAWT contains 4.8 times more anchor text -to entity links than originally present in the Wikipedia markup. Moreover, it -spans several languages including English, Spanish, Italian, German, French and -Arabic. We also present the methodology used to generate the dataset, which -enriches Wikipedia markup in order to increase the number of links. In addition to -the main dataset, we open up several derived datasets including mention-entity -co-occurrence counts and entity embeddings, as well as mappings between -Freebase ids and Wikidata item ids. We also discuss two applications of these -datasets and hope that opening them up would prove useful for the Natural -Language Processing and Information Retrieval communities, as well as -facilitate multi-lingual research. -" -4346,1703.00955,"Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P. - Xing",Toward Controlled Generation of Text,cs.LG cs.AI cs.CL stat.ML," Generic generation and manipulation of text is challenging and has limited -success compared to recent deep generative modeling in the visual domain. This -paper aims at generating plausible natural language sentences, whose attributes -are dynamically controlled by learning disentangled latent representations with -designated semantics. We propose a new neural generative model which combines -variational auto-encoders and holistic attribute discriminators for effective -imposition of semantic structures. With differentiable approximation to -discrete text samples, explicit constraints on independent attribute controls, -and efficient collaborative learning of generator and discriminators, our model -learns highly interpretable representations from even only word annotations, -and produces realistic sentences with desired attributes. Quantitative -evaluation validates the accuracy of sentence and attribute generation. -" -4347,1703.00993,"Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, William W. Cohen",A Comparative Study of Word Embeddings for Reading Comprehension,cs.CL," The focus of past machine learning research for Reading Comprehension tasks -has been primarily on the design of novel deep learning architectures. Here we -show that seemingly minor choices made on (1) the use of pre-trained word -embeddings, and (2) the representation of out-of-vocabulary tokens at test -time, can turn out to have a larger impact than architectural choices on the -final performance. We systematically explore several options for these choices, -and provide recommendations to researchers working in this area. -" -4348,1703.01008,"Xiujun Li and Yun-Nung Chen and Lihong Li and Jianfeng Gao and Asli - Celikyilmaz",End-to-End Task-Completion Neural Dialogue Systems,cs.CL cs.AI," One of the major drawbacks of modularized task-completion dialogue systems is -that each module is trained individually, which presents several challenges. -For example, downstream modules are affected by earlier modules, and the -performance of the entire system is not robust to the accumulated errors. This -paper presents a novel end-to-end learning framework for task-completion -dialogue systems to tackle such issues. Our neural dialogue system can directly -interact with a structured database to assist users in accessing information -and accomplishing certain tasks.
The reinforcement learning based dialogue -manager offers robust capabilities to handle noise caused by other components -of the dialogue system. Our experiments in a movie-ticket booking domain show -that our end-to-end system not only outperforms modularized dialogue system -baselines for both objective and subjective evaluation, but also is robust to -noise as demonstrated by several systematic experiments with different error -granularity and rates specific to the language understanding module. -" -4349,1703.01024,"Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei",Exponential Moving Average Model in Parallel Speech Recognition Training,cs.CL," With the rapid growth of training data, large-scale parallel training with -multi-GPU clusters is now widely applied in neural network model learning. We -present a new approach that applies the exponential moving average method in -large-scale parallel training of neural network models. It is a non-interference -strategy: the exponential moving average model is not broadcast to -distributed workers to update their local models after model synchronization in -the training process, and it is used as the final model of the training -system. Fully-connected feed-forward neural networks (DNNs) and deep -unidirectional Long short-term memory (LSTM) recurrent neural networks (RNNs) -are successfully trained with the proposed method for large vocabulary continuous -speech recognition on Shenma voice search data in Mandarin. The character error -rate (CER) of Mandarin speech recognition decreases further than with -state-of-the-art approaches to parallel training. -" -4350,1703.01485,"Sreelekha S, Pushpak Bhattacharyya",Lexical Resources for Hindi Marathi MT,cs.CL," In this paper we describe some ways to utilize various lexical resources to -improve the quality of a statistical machine translation system. We have -augmented the training corpus with various lexical resources such as the -IndoWordnet semantic relation set, function words, kridanta pairs and verb -phrases. Our research on the usage of lexical resources mainly focused on -two directions: augmenting the parallel corpus with more vocabulary, and augmenting -it with various word forms. We have described case studies, evaluations and -detailed error analysis for both Marathi to Hindi and Hindi to Marathi machine -translation systems. From the evaluations we observed that there is -incremental growth in the quality of machine translation as the usage of -various lexical resources increases. Moreover, the usage of various lexical -resources helps to improve the coverage and quality of machine translation -where only a limited parallel corpus is available. -" -4351,1703.01557,Lidong Bing and William W. Cohen and Bhuwan Dhingra,"Using Graphs of Classifiers to Impose Declarative Constraints on - Semi-supervised Learning",cs.LG cs.CL stat.ML," We propose a general approach to modeling semi-supervised learning (SSL) -algorithms. Specifically, we present a declarative language for modeling both -traditional supervised classification tasks and many SSL heuristics, including -both well-known heuristics such as co-training and novel domain-specific -heuristics. In addition to representing individual SSL heuristics, we show that -multiple heuristics can be automatically combined using Bayesian optimization -methods. We experiment with two classes of tasks, link-based text -classification and relation extraction.
We show modest improvements on -well-studied link-based classification benchmarks, and state-of-the-art results -on relation-extraction tasks for two realistic domains. -" -4352,1703.01619,Graham Neubig,Neural Machine Translation and Sequence-to-sequence Models: A Tutorial,cs.CL cs.LG stat.ML," This tutorial introduces a new and powerful set of techniques variously -called ""neural machine translation"" or ""neural sequence-to-sequence models"". -These techniques have been used in a number of tasks regarding the handling of -human language, and can be a powerful tool in the toolbox of anyone who wants -to model sequential data of some sort. The tutorial assumes that the reader -knows the basics of math and programming, but does not assume any particular -experience with neural networks or natural language processing. It attempts to -explain the intuition behind the various methods covered, then delves into them -with enough mathematical detail to understand them concretely, and culminates -with a suggestion for an implementation exercise, where readers can test that -they understood the content in practice. -" -4353,1703.01671,"Virgile Landeiro, Aron Culotta","Controlling for Unobserved Confounds in Classification Using - Correlational Constraints",cs.AI cs.CL," As statistical classifiers become integrated into real-world applications, it -is important to consider not only their accuracy but also their robustness to -changes in the data distribution. In this paper, we consider the case where -there is an unobserved confounding variable $z$ that influences both the -features $\mathbf{x}$ and the class variable $y$. When the influence of $z$ -changes from training to testing data, we find that the classifier accuracy can -degrade rapidly. In our approach, we assume that we can predict the value of -$z$ at training time with some error. The prediction for $z$ is then fed to -Pearl's back-door adjustment to build our model. Because of the attenuation -bias caused by measurement error in $z$, standard approaches to controlling for -$z$ are ineffective. In response, we propose a method to properly control for -the influence of $z$ by first estimating its relationship with the class -variable $y$, then updating predictions for $z$ to match that estimated -relationship. By adjusting the influence of $z$, we show that we can build a -model that exceeds competing baselines on accuracy as well as on robustness -over a range of confounding relationships. -" -4354,1703.01694,Stephan C. Meylan and Thomas L. Griffiths,"Word forms - not just their lengths- are optimized for efficient - communication",cs.CL," The inverse relationship between the length of a word and the frequency of -its use, first identified by G.K. Zipf in 1935, is a classic empirical law that -holds across a wide range of human languages. We demonstrate that length is one -aspect of a much more general property of words: how distinctive they are with -respect to other words in a language. Distinctiveness plays a critical role in -recognizing words in fluent speech, in that it reflects the strength of -potential competitors when selecting the best candidate for an ambiguous -signal. Phonological information content, a measure of a word's string -probability under a statistical model of a language's sound or character -sequences, concisely captures distinctiveness. Examining large-scale corpora -from 13 languages, we find that distinctiveness significantly outperforms word -length as a predictor of frequency.
This finding provides evidence that -listeners' processing constraints shape fine-grained aspects of word forms -across languages. -" -4355,1703.01720,"Ashwin K Vijayakumar, Ramakrishna Vedantam, Devi Parikh",Sound-Word2Vec: Learning Word Representations Grounded in Sounds,cs.CL cs.AI cs.SD," To be able to interact better with humans, it is crucial for machines to -understand sound - a primary modality of human perception. Previous works have -used sound to learn embeddings for improved generic textual similarity -assessment. In this work, we treat sound as a first-class citizen, studying -downstream textual tasks which require aural grounding. To this end, we propose -sound-word2vec - a new embedding scheme that learns specialized word embeddings -grounded in sounds. For example, we learn that two seemingly (semantically) -unrelated concepts, like leaves and paper, are similar due to the similar -rustling sounds they make. Our embeddings prove useful in textual tasks -requiring aural reasoning like text-based sound retrieval and discovering Foley -sound effects (used in movies). Moreover, our embedding space captures -interesting dependencies between words and onomatopoeia and outperforms prior -work on aurally-relevant word relatedness datasets such as AMEN and ASLex. -" -4356,1703.01725,"Jack Hessel, Lillian Lee, David Mimno","Cats and Captions vs. Creators and the Clock: Comparing Multimodal - Content to Context in Predicting Relative Popularity",cs.SI cs.CL cs.CV physics.soc-ph," The content of today's social media is becoming more and more rich, -increasingly mixing text, images, videos, and audio. It is an intriguing -research question to model the interplay between these different modes in -attracting user attention and engagement. But in order to pursue this study of -multimodal content, we must also account for context: timing effects, community -preferences, and social factors (e.g., which authors are already popular) also -affect the amount of feedback and reaction that social-media posts receive. In -this work, we separate out the influence of these non-content factors in -several ways. First, we focus on ranking pairs of submissions posted to the -same community in quick succession, e.g., within 30 seconds; this framing -encourages models to focus on time-agnostic and community-specific content -features. Within that setting, we determine the relative performance of author -vs. content features. We find that victory usually belongs to ""cats and -captions,"" as visual and textual features together tend to outperform -identity-based features. Moreover, our experiments show that when considered in -isolation, simple unigram text features and deep neural network visual features -yield the highest accuracy individually, and that the combination of the two -modalities generally leads to the best accuracies overall. -" -4357,1703.01726,"Xiao-gang Zhang, Shou-qian Sun, Ke-jun Zhang","A Novel Comprehensive Approach for Estimating Concept Semantic - Similarity in WordNet",cs.CL," Computation of semantic similarity between concepts is an important -foundation for many research works. This paper focuses on IC computing methods -and IC measures, which estimate the semantic similarities between concepts by -exploiting the topological parameters of the taxonomy. Based on analyzing -representative IC computing methods and typical semantic similarity measures, -we propose a new hybrid IC computing method.
Through adopting the parameters -dhyp and lch, we utilize the new IC computing method and propose a novel -comprehensive measure of semantic similarity between concepts. An experiment -based on the WordNet ""is a"" taxonomy has been designed to test representative -measures and our measure on the benchmark dataset R&G, and the results show that -our measure can noticeably improve the similarity accuracy. We evaluate the -proposed approach by comparing the correlation coefficients between five -measures and the artificial data. The results show that our proposal -outperforms the previous measures. -" -4358,1703.01898,"Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom","Generative and Discriminative Text Classification with Recurrent Neural - Networks",stat.ML cs.CL cs.LG," We empirically characterize the performance of discriminative and generative -LSTM models for text classification. We find that although RNN-based generative -models are more powerful than their bag-of-words ancestors (e.g., they account -for conditional dependencies across words in a document), they have higher -asymptotic error rates than discriminatively trained RNN models. However we -also find that generative models approach their asymptotic error rate more -rapidly than their discriminative counterparts---the same pattern that Ng & -Jordan (2001) proved holds for linear classification models that make more -naive conditional independence assumptions. Building on this finding, we -hypothesize that RNN-based generative classification models will be more robust -to shifts in the data distribution. This hypothesis is confirmed in a series of -experiments in zero-shot and continual learning settings that show that -generative models substantially outperform discriminative models. -" -4359,1703.02019,"Gourav G. Shenoy, Erika H. Dsouza, Sandra K\""ubler","Performing Stance Detection on Twitter Data using Computational - Linguistics Techniques",cs.CL," As humans, we can often detect from a person's utterances if he or she is in -favor of or against a given target entity (topic, product, another person, -etc). But from the perspective of a computer, we need means to automatically -deduce the stance of the tweeter, given just the tweet text. In this paper, we -present our results of performing stance detection on Twitter data using a -supervised approach. We begin by extracting bag-of-words to perform -classification using TIMBL, then try and optimize the features to improve -stance detection accuracy, followed by extending the dataset with two sets of -lexicons - arguing, and MPQA subjectivity; next we explore the MALT parser and -construct features using its dependency triples, finally we perform analysis -using the Scikit-learn Random Forest implementation. -" -4360,1703.02031,Jean-Fran\c{c}ois Delpech and Sabine Ploux,Random vector generation of a semantic space,cs.CL," We show how random vectors and random projection can be implemented in the -usual vector space model to construct a Euclidean semantic space from a French -synonym dictionary. We evaluate theoretically the resulting noise and show the -experimental distribution of the similarities of terms in a neighborhood -according to the choice of parameters. We also show that the Schmidt -orthogonalization process is applicable and can be used to separate homonyms -with distinct semantic meanings.
Neighboring terms are easily arranged into -semantically significant clusters which are well suited to the generation of -realistic lists of synonyms and to such applications as word selection for -automatic text generation. This process, applicable to any language, can easily -be extended to collocations, is extremely fast and can be updated in real time -whenever new synonyms are proposed. -" -4361,1703.02136,"George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel - Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael - Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall","English Conversational Telephone Speech Recognition by Humans and - Machines",cs.CL," One of the most difficult speech recognition tasks is accurate recognition of -human to human communication. Advances in deep learning over the last few years -have produced major speech recognition improvements on the representative -Switchboard conversational corpus. Word error rates that just a few years ago -were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now -believed to be within striking range of human performance. This then raises two -issues - what IS human performance, and how far down can we still drive speech -recognition error rates? A recent paper by Microsoft suggests that we have -already achieved human performance. In trying to verify this statement, we -performed an independent set of human performance measurements on two -conversational tasks and found that human performance may be considerably -better than what was earlier reported, giving the community a significantly -harder goal to achieve. We also report on our own efforts in this area, -presenting a set of acoustic and language modeling techniques that lowered the -word error rate of our own English conversational telephone LVCSR system to the -level of 5.5%/10.3% on the Switchboard/CallHome subsets of the Hub5 2000 -evaluation, which - at least at the writing of this paper - is a new -performance milestone (albeit not at what we measure to be human performance!). -On the acoustic side, we use a score fusion of three models: one LSTM with -multiple feature inputs, a second LSTM trained with speaker-adversarial -multi-task learning and a third residual net (ResNet) with 25 convolutional -layers and time-dilated convolutions. On the language modeling side, we use -word and character LSTMs and convolutional WaveNet-style language models. -" -4362,1703.02166,Nam Tran Van,"Building a Syllable Database to Solve the Problem of Khmer Word - Segmentation",cs.CL," Word segmentation is a basic problem in natural language processing. For -languages with a complex writing system, like the Khmer language of southern -Vietnam, this problem is truly intractable and poses significant challenges. -Although some experts in Vietnam and internationally have researched this -problem in depth, there are still no results that meet the demand; in -particular, the ambiguity phenomenon in Khmer language processing has not yet -been treated thoroughly. This paper presents a solution based on dividing -syllables into component clusters using two proposed syllable models, thereby -building a Khmer syllable database, which is still not actually available. This -method uses a lexical database updated from the online Khmer dictionaries, with -some supporting dictionaries serving as training data and providing -complementary linguistic characteristics.
Each component -cluster is labelled and located by its first and last letter so as to identify -a syllable in its entirety. This approach is workable, and the test results achieve -high accuracy, eliminate the ambiguity, and contribute to solving the problem of -word segmentation and to efficient Khmer language processing. -" -4363,1703.02504,"Jan Deriu, Aurelien Lucchi, Valeria De Luca, Aliaksei Severyn, Simon - M\""uller, Mark Cieliebak, Thomas Hofmann, Martin Jaggi","Leveraging Large Amounts of Weakly Supervised Data for Multi-Language - Sentiment Classification",cs.CL cs.IR cs.LG," This paper presents a novel approach for multi-lingual sentiment -classification in short texts. This is a challenging task as the amount of -training data in languages other than English is very limited. Previously -proposed multi-lingual approaches typically require establishing a -correspondence to English for which powerful classifiers are already available. -In contrast, our method does not require such supervision. We leverage large -amounts of weakly-supervised data in various languages to train a multi-layer -convolutional network and demonstrate the importance of using pre-training of -such networks. We thoroughly evaluate our approach on various multi-lingual -datasets, including the recent SemEval-2016 sentiment prediction benchmark -(Task 4), where we achieved state-of-the-art performance. We also compare the -performance of our model trained individually for each language to a variant -trained for all languages at once. We show that the latter model reaches -slightly worse - but still acceptable - performance when compared to the single -language model, while benefiting from better generalization properties across -languages. -" -4364,1703.02507,"Matteo Pagliardini, Prakhar Gupta, Martin Jaggi","Unsupervised Learning of Sentence Embeddings using Compositional n-Gram - Features",cs.CL cs.AI cs.IR," The recent tremendous success of unsupervised word embeddings in a multitude -of applications raises the obvious question if similar methods could be derived -to improve embeddings (i.e. semantic representations) of word sequences as -well. We present a simple but efficient unsupervised objective to train -distributed representations of sentences. Our method outperforms the -state-of-the-art unsupervised models on most benchmark tasks, highlighting the -robustness of the produced general-purpose sentence embeddings. -" -4365,1703.02517,Aleksei Nazarov and Joe Pater,Learning opacity in Stratal Maximum Entropy Grammar,cs.CL," Opaque phonological patterns are sometimes claimed to be difficult to learn; -specific hypotheses have been advanced about the relative difficulty of -particular kinds of opaque processes (Kiparsky 1971, 1973), and the kind of -data that will be helpful in learning an opaque pattern (Kiparsky 2000). In -this paper, we present a computationally implemented learning theory for one -grammatical theory of opacity: a Maximum Entropy version of Stratal OT -(Berm\'udez-Otero 1999, Kiparsky 2000), and test it on simplified versions of -opaque French tense-lax vowel alternations and the opaque interaction of -diphthong raising and flapping in Canadian English. We find that the difficulty -of opacity can be influenced by evidence for stratal affiliation: the Canadian -English case is easier if the learner encounters application of raising outside -the flapping context, or non-application of raising between words (i.e., -with a raised vowel; with a non-raised vowel).
-" -4366,1703.02573,"Ziang Xie, Sida I. Wang, Jiwei Li, Daniel L\'evy, Aiming Nie, Dan - Jurafsky, Andrew Y. Ng",Data Noising as Smoothing in Neural Network Language Models,cs.LG cs.CL," Data noising is an effective technique for regularizing neural network -models. While noising is widely adopted in application domains such as vision -and speech, commonly used noising primitives have not been developed for -discrete sequence-level settings such as language modeling. In this paper, we -derive a connection between input noising in neural network language models and -smoothing in $n$-gram models. Using this connection, we draw upon ideas from -smoothing to develop effective noising schemes. We demonstrate performance -gains when applying the proposed schemes to language modeling and machine -translation. Finally, we provide empirical analysis validating the relationship -between noising and smoothing. -" -4367,1703.02620,"Bhuwan Dhingra, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov",Linguistic Knowledge as Memory for Recurrent Neural Networks,cs.CL," Training recurrent neural networks to model long term dependencies is -difficult. Hence, we propose to use external linguistic knowledge as an -explicit signal to inform the model which memories it should utilize. -Specifically, external knowledge is used to augment a sequence with typed edges -between arbitrarily distant elements, and the resulting graph is decomposed -into directed acyclic subgraphs. We introduce a model that encodes such graphs -as explicit memory in recurrent neural networks, and use it to model -coreference relations in text. We apply our model to several text comprehension -tasks and achieve new state-of-the-art results on all considered benchmarks, -including CNN, bAbi, and LAMBADA. On the bAbi QA tasks, our model solves 15 out -of the 20 tasks with only 1000 training examples per task. Analysis of the -learned representations further demonstrates the ability of our model to encode -fine-grained entity information across a document. -" -4368,1703.02819,Dmitry I. Ignatov,"Introduction to Formal Concept Analysis and Its Applications in - Information Retrieval and Related Fields",cs.IR cs.AI cs.CL cs.DM stat.ML," This paper is a tutorial on Formal Concept Analysis (FCA) and its -applications. FCA is an applied branch of Lattice Theory, a mathematical -discipline which enables formalisation of concepts as basic units of human -thinking and analysing data in the object-attribute form. Originated in early -80s, during the last three decades, it became a popular human-centred tool for -knowledge representation and data analysis with numerous applications. Since -the tutorial was specially prepared for RuSSIR 2014, the covered FCA topics -include Information Retrieval with a focus on visualisation aspects, Machine -Learning, Data Mining and Knowledge Discovery, Text Mining and several others. -" -4369,1703.02859,"Tianran Hu, Ruihua Song, Maya Abtahian, Philip Ding, Xing Xie, Jiebo - Luo",A World of Difference: Divergent Word Interpretations among People,cs.CL," Divergent word usages reflect differences among people. In this paper, we -present a novel angle for studying word usage divergence -- word -interpretations. We propose an approach that quantifies semantic differences in -interpretations among different groups of people. The effectiveness of our -approach is validated by quantitative evaluations. Experiment results indicate -that divergences in word interpretations exist. 
We further apply the approach -to two well-studied types of differences between people -- gender and region. -The detected words with divergent interpretations reveal the unique features of -specific groups of people. For gender, we discover that certain different -interests, social attitudes, and characters between males and females are -reflected in their divergent interpretations of many words. For region, we find -that specific interpretations of certain words reveal the geographical and -cultural features of different regions. -" -4370,1703.02860,"Tianran Hu, Han Guo, Hao Sun, Thuy-vy Thi Nguyen, Jiebo Luo",Spice up Your Chat: The Intentions and Sentiment Effects of Using Emoji,cs.CL cs.HC," Emojis, as a new way of conveying nonverbal cues, are widely adopted in -computer-mediated communications. In this paper, first from a message sender -perspective, we focus on people's motives in using four types of emojis -- -positive, neutral, negative, and non-facial. We compare the willingness levels -of using these emoji types for seven typical intentions that people usually -apply nonverbal cues for in communication. The results of extensive statistical -hypothesis tests not only report the popularities of the intentions, but also -uncover the subtle differences between emoji types in terms of intended uses. -Second, from a perspective of message recipients, we further study the -sentiment effects of emojis, as well as their duplications, on verbal messages. -Different from previous studies in emoji sentiment, we study the sentiments of -emojis and their contexts as a whole. The experiment results indicate that the -powers of conveying sentiment are different between the four emoji types, and the -sentiment effects of emojis vary in the contexts of different valences. -" -4371,1703.03091,Marc Moreno Lopez and Jugal Kalita,Deep Learning applied to NLP,cs.CL," Convolutional Neural Networks (CNNs) are typically associated with Computer -Vision. CNNs are responsible for major breakthroughs in Image Classification -and are the core of most Computer Vision systems today. More recently CNNs have -been applied to problems in Natural Language Processing and have achieved some -interesting results. In this paper, we will try to explain the basics of CNNs, -their different variations and how they have been applied to NLP. -" -4372,1703.03097,"Mayank Kejriwal, Pedro Szekely",Information Extraction in Illicit Domains,cs.CL cs.AI," Extracting useful entities and attribute values from illicit domains such as -human trafficking is a challenging problem with the potential for widespread -social impact. Such domains employ atypical language models, have `long tails' -and suffer from the problem of concept drift. In this paper, we propose a -lightweight, feature-agnostic Information Extraction (IE) paradigm specifically -designed for such domains. Our approach uses raw, unlabeled text from an -initial corpus, and a few (12-120) seed annotations per domain-specific -attribute, to learn robust IE models for unobserved pages and websites. -Empirically, we demonstrate that our approach can outperform feature-centric -Conditional Random Field baselines by over 18\% F-Measure on five annotated -sets of real-world human trafficking datasets in both low-supervision and -high-supervision settings. We also show that our approach is demonstrably -robust to concept drift, and can be efficiently bootstrapped even in a serial -computing environment.
-" -4373,1703.03130,"Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing - Xiang, Bowen Zhou, Yoshua Bengio",A Structured Self-attentive Sentence Embedding,cs.CL cs.AI cs.LG cs.NE," This paper proposes a new model for extracting an interpretable sentence -embedding by introducing self-attention. Instead of using a vector, we use a -2-D matrix to represent the embedding, with each row of the matrix attending on -a different part of the sentence. We also propose a self-attention mechanism -and a special regularization term for the model. As a side effect, the -embedding comes with an easy way of visualizing what specific parts of the -sentence are encoded into the embedding. We evaluate our model on 3 different -tasks: author profiling, sentiment classification, and textual entailment. -Results show that our model yields a significant performance gain compared to -other sentence embedding methods in all of the 3 tasks. -" -4374,1703.03149,Marjan Hosseinia and Arjun Mukherjee,Detecting Sockpuppets in Deceptive Opinion Spam,cs.CL," This paper explores the problem of sockpuppet detection in deceptive opinion -spam using authorship attribution and verification approaches. Two methods are -explored. The first is a feature subsampling scheme that uses the KL-Divergence -on stylistic language models of an author to find discriminative features. The -second is a transduction scheme, spy induction that leverages the diversity of -authors in the unlabeled test set by sending a set of spies (positive samples) -from the training set to retrieve hidden samples in the unlabeled test set -using nearest and farthest neighbors. Experiments using ground truth sockpuppet -data show the effectiveness of the proposed schemes. -" -4375,1703.03200,"Burcu Can, Ahmet \""Ust\""un, Murathan Kurfal{\i}","Turkish PoS Tagging by Reducing Sparsity with Morpheme Tags in Small - Datasets",cs.CL," Sparsity is one of the major problems in natural language processing. The -problem becomes even more severe in agglutinating languages that are highly -prone to be inflected. We deal with sparsity in Turkish by adopting -morphological features for part-of-speech tagging. We learn inflectional and -derivational morpheme tags in Turkish by using conditional random fields (CRF) -and we employ the morpheme tags in part-of-speech (PoS) tagging by using hidden -Markov models (HMMs) to mitigate sparsity. Results show that using morpheme -tags in PoS tagging helps alleviate the sparsity in emission probabilities. Our -model outperforms other hidden Markov model based PoS tagging models for small -training datasets in Turkish. We obtain an accuracy of 94.1% in morpheme -tagging and 89.2% in PoS tagging on a 5K training dataset. -" -4376,1703.03386,"William L. Hamilton, Justine Zhang, Cristian Danescu-Niculescu-Mizil, - Dan Jurafsky, Jure Leskovec",Loyalty in Online Communities,cs.SI cs.CL," Loyalty is an essential component of multi-community engagement. When users -have the choice to engage with a variety of different communities, they often -become loyal to just one, focusing on that community at the expense of others. -However, it is unclear how loyalty is manifested in user behavior, or whether -loyalty is encouraged by certain community characteristics. - In this paper we operationalize loyalty as a user-community relation: users -loyal to a community consistently prefer it over all others; loyal communities -retain their loyal users over time. 
By exploring this relation using a large -dataset of discussion communities from Reddit, we reveal that loyalty is -manifested in remarkably consistent behaviors across a wide spectrum of -communities. Loyal users employ language that signals collective identity and -engage with more esoteric, less popular content, indicating they may play a -curational role in surfacing new material. Loyal communities have denser -user-user interaction networks and lower rates of triadic closure, suggesting -that community-level loyalty is associated with more cohesive interactions and -less fragmentation into subgroups. We exploit these general patterns to predict -future rates of loyalty. Our results show that a user's propensity to become -loyal is apparent from their first interactions with a community, suggesting -that some users are intrinsically loyal from the very beginning. -" -4377,1703.03429,Nancy Fulda and Daniel Ricks and Ben Murdoch and David Wingate,What can you do with a rock? Affordance extraction via word embeddings,cs.AI cs.CL," Autonomous agents must often detect affordances: the set of behaviors enabled -by a situation. Affordance detection is particularly helpful in domains with -large action spaces, allowing the agent to prune its search space by avoiding -futile behaviors. This paper presents a method for affordance extraction via -word embeddings trained on a Wikipedia corpus. The resulting word vectors are -treated as a common knowledge database which can be queried using linear -algebra. We apply this method to a reinforcement learning agent in a text-only -environment and show that affordance-based action selection improves -performance most of the time. Our method increases the computational complexity -of each learning step but significantly reduces the total number of steps -needed. In addition, the agent's action selections begin to resemble those a -human would choose. -" -4378,1703.03442,"Vanessa Ferdinand, Simon Kirby, Kenny Smith",The cognitive roots of regularization in language,cs.CL q-bio.NC," Regularization occurs when the output a learner produces is less variable -than the linguistic data they observed. In an artificial language learning -experiment, we show that there exist at least two independent sources of -regularization bias in cognition: a domain-general source based on cognitive -load and a domain-specific source triggered by linguistic stimuli. Both of -these factors modulate how frequency information is encoded and produced, but -only the production-side modulations result in regularization (i.e. cause -learners to eliminate variation from the observed input). We formalize the -definition of regularization as the reduction of entropy and find that entropy -measures are better at identifying regularization behavior than frequency-based -analyses. Using our experimental data and a model of cultural transmission, we -generate predictions for the amount of regularity that would develop in each -experimental condition if the artificial language were transmitted over several -generations of learners. Here we find that the effect of cognitive constraints -can become more complex when put into the context of cultural evolution: -although learning biases certainly carry information about the course of -language evolution, we should not expect a one-to-one correspondence between -the micro-level processes that regularize linguistic datasets and the -macro-level evolution of linguistic regularity.
-" -4379,1703.03609,"Saeedreza Shehnepoor, Mostafa Salehi, Reza Farahbakhsh and Noel Crespi","NetSpam: a Network-based Spam Detection Framework for Reviews in Online - Social Media",cs.SI cs.CL cs.IR physics.soc-ph," Nowadays, a big part of people rely on available content in social media in -their decisions (e.g. reviews and feedback on a topic or product). The -possibility that anybody can leave a review provide a golden opportunity for -spammers to write spam reviews about products and services for different -interests. Identifying these spammers and the spam content is a hot topic of -research and although a considerable number of studies have been done recently -toward this end, but so far the methodologies put forth still barely detect -spam reviews, and none of them show the importance of each extracted feature -type. In this study, we propose a novel framework, named NetSpam, which -utilizes spam features for modeling review datasets as heterogeneous -information networks to map spam detection procedure into a classification -problem in such networks. Using the importance of spam features help us to -obtain better results in terms of different metrics experimented on real-world -review datasets from Yelp and Amazon websites. The results show that NetSpam -outperforms the existing methods and among four categories of features; -including review-behavioral, user-behavioral, reviewlinguistic, -user-linguistic, the first type of features performs better than the other -categories. -" -4380,1703.03640,Christina Lioma and Niels Dalum Hansen,"A Study of Metrics of Distance and Correlation Between Ranked Lists for - Compositionality Detection",cs.CL," Compositionality in language refers to how much the meaning of some phrase -can be decomposed into the meaning of its constituents and the way these -constituents are combined. Based on the premise that substitution by synonyms -is meaning-preserving, compositionality can be approximated as the semantic -similarity between a phrase and a version of that phrase where words have been -replaced by their synonyms. Different ways of representing such phrases exist -(e.g., vectors [1] or language models [2]), and the choice of representation -affects the measurement of semantic similarity. - We propose a new compositionality detection method that represents phrases as -ranked lists of term weights. Our method approximates the semantic similarity -between two ranked list representations using a range of well-known distance -and correlation metrics. In contrast to most state-of-the-art approaches in -compositionality detection, our method is completely unsupervised. Experiments -with a publicly available dataset of 1048 human-annotated phrases shows that, -compared to strong supervised baselines, our approach provides superior -measurement of compositionality using any of the distance and correlation -metrics considered. -" -4381,1703.03666,"Sreelekha. S, Pushpak Bhattacharyya","Comparison of SMT and RBMT; The Requirement of Hybridization for - Marathi-Hindi MT",cs.CL," We present in this paper our work on comparison between Statistical Machine -Translation (SMT) and Rule-based machine translation for translation from -Marathi to Hindi. Rule Based systems although robust take lots of time to -build. On the other hand statistical machine translation systems are easier to -create, maintain and improve upon. We describe the development of a basic -Marathi-Hindi SMT system and evaluate its performance. 
Through a detailed error analysis, we point out the relative strengths and
-weaknesses of both systems. Effectively, we shall see that even with a small
-training corpus, a statistical machine translation system has many advantages
-for high-quality, domain-specific machine translation over a rule-based
-counterpart.
-"
-4382,1703.03714,"Matthew Marge, Claire Bonial, Brendan Byrne, Taylor Cassidy, A. William Evans, Susan G. Hill, Clare Voss",Applying the Wizard-of-Oz Technique to Multimodal Human-Robot Dialogue,cs.CL cs.AI cs.HC cs.RO," Our overall program objective is to provide more natural ways for soldiers to
-interact and communicate with robots, much like how soldiers communicate with
-other soldiers today. We describe how the Wizard-of-Oz (WOz) method can be
-applied to multimodal human-robot dialogue in a collaborative exploration task.
-While the WOz method can help design robot behaviors, traditional approaches
-place the burden of decisions on a single wizard. In this work, we consider two
-wizards to stand in for robot navigation and dialogue management software
-components. The scenario used to elicit data is one in which a human-robot team
-is tasked with exploring an unknown environment: a human gives verbal
-instructions from a remote location and the robot follows them, clarifying
-possible misunderstandings as needed via dialogue. We found the division of
-labor between wizards to be workable, which holds promise for future software
-development.
-"
-4383,1703.03771,"Jena D. Hwang, Archna Bhatia, Na-Rae Han, Tim O'Gorman, Vivek Srikumar, Nathan Schneider","Coping with Construals in Broad-Coverage Semantic Annotation of Adpositions",cs.CL," We consider the semantics of prepositions, revisiting a broad-coverage
-annotation scheme used for annotating all 4,250 preposition tokens in a
-55,000-word corpus of English. Attempts to apply the scheme to adpositions and
-case markers in other languages, as well as some problematic cases in English,
-have led us to reconsider the assumption that a preposition's lexical
-contribution is equivalent to the role/relation that it mediates. Our proposal
-is to embrace the potential for construal in adposition use, expressing such
-phenomena directly at the token level to manage complexity and avoid sense
-proliferation. We suggest a framework to represent both the scene role and the
-adposition's lexical function so they can be annotated at scale---supporting
-automatic, statistical processing of domain-general language---and sketch how
-this representation would inform a constructional analysis.
-"
-4384,1703.03842,B. Goodman and P. F. Tupper,"Effects of Limiting Memory Capacity on the Behaviour of Exemplar Dynamics",cs.CL," Exemplar models are a popular class of models used to describe language
-change. Here we study how limiting the memory capacity of an individual in
-these models affects the system's behaviour. In particular we demonstrate the
-effect this change has on the extinction of categories. Previous work in
-exemplar dynamics has not addressed this question. In order to investigate
-this, we will inspect a simplified exemplar model. We will prove for the
-simplified model that all the sound categories but one will always become
-extinct, whether memory storage is limited or not. However, computer
-simulations show that changing the number of stored memories alters how fast
-categories become extinct.
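A deliberately crude toy version of this setup (not the authors' model) can be simulated in a few lines of Python: two categories share a fixed-size exemplar store, each production copies a randomly chosen stored exemplar, and the oldest exemplar is forgotten once capacity is exceeded. One category always fixates eventually, and the capacity setting changes how quickly:

    # Toy exemplar dynamics with a capped memory store.
    import random

    def simulate(capacity=50, max_steps=500000, seed=1):
        random.seed(seed)
        store = ["A"] * (capacity // 2) + ["B"] * (capacity // 2)
        for step in range(max_steps):
            store.append(random.choice(store))  # produce and store a copy
            if len(store) > capacity:
                store.pop(0)                    # limited memory: forget oldest
            if len(set(store)) == 1:            # the other category is extinct
                return store[0], step
        return None, max_steps

    for cap in (20, 50, 200):
        survivor, steps = simulate(capacity=cap)
        print(f"capacity={cap}: category {survivor} fixated after {steps} steps")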
-" -4385,1703.03906,"Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc Le",Massive Exploration of Neural Machine Translation Architectures,cs.CL," Neural Machine Translation (NMT) has shown remarkable progress over the past -few years with production systems now being deployed to end-users. One major -drawback of current architectures is that they are expensive to train, -typically requiring days to weeks of GPU time to converge. This makes -exhaustive hyperparameter search, as is commonly done with other neural network -architectures, prohibitively expensive. In this work, we present the first -large-scale analysis of NMT architecture hyperparameters. We report empirical -results and variance numbers for several hundred experimental runs, -corresponding to over 250,000 GPU hours on the standard WMT English to German -translation task. Our experiments lead to novel insights and practical advice -for building and extending NMT architectures. As part of this contribution, we -release an open-source NMT framework that enables researchers to easily -experiment with novel techniques and reproduce state of the art results. -" -4386,1703.03923,"Juan-Manuel Torres-Moreno, Gerardo Sierra, Peter Peinl",A German Corpus for Text Similarity Detection Tasks,cs.IR cs.CL," Text similarity detection aims at measuring the degree of similarity between -a pair of texts. Corpora available for text similarity detection are designed -to evaluate the algorithms to assess the paraphrase level among documents. In -this paper we present a textual German corpus for similarity detection. The -purpose of this corpus is to automatically assess the similarity between a pair -of texts and to evaluate different similarity measures, both for whole -documents or for individual sentences. Therefore we have calculated several -simple measures on our corpus based on a library of similarity functions. -" -4387,1703.03939,"Govardana Sachithanandam Ramachandran, Ajay Sohmshetty",Ask Me Even More: Dynamic Memory Tensor Networks (Extended Model),cs.CL cs.LG cs.NE," We examine Memory Networks for the task of question answering (QA), under -common real world scenario where training examples are scarce and under weakly -supervised scenario, that is only extrinsic labels are available for training. -We propose extensions for the Dynamic Memory Network (DMN), specifically within -the attention mechanism, we call the resulting Neural Architecture as Dynamic -Memory Tensor Network (DMTN). Ultimately, we see that our proposed extensions -results in over 80% improvement in the number of task passed against the -baselined standard DMN and 20% more task passed compared to state-of-the-art -End-to-End Memory Network for Facebook's single task weakly trained 1K bAbi -dataset. -" -4388,1703.04001,"Suman Kalyan Maity, Aman Kharb and Animesh Mukherjee","Language Use Matters: Analysis of the Linguistic Structure of Question - Texts Can Characterize Answerability in Quora",cs.CL cs.SI," Quora is one of the most popular community Q&A sites of recent times. -However, many question posts on this Q&A site often do not get answered. In -this paper, we quantify various linguistic activities that discriminates an -answered question from an unanswered one. Our central finding is that the way -users use language while writing the question text can be a very effective -means to characterize answerability. 
This characterization helps us to predict early if a question remaining
-unanswered for a specific time period t will eventually be answered or not,
-achieving an accuracy of 76.26% (t = 1 month) and 68.33% (t = 3 months).
-Notably, features representing the language use patterns of the users are most
-discriminative and alone account for an accuracy of 74.18%. We also compare our
-method with similar works (Dror et al., Yang et al.), achieving a maximum
-improvement of ~39% in terms of accuracy.
-"
-4389,1703.04009,"Thomas Davidson, Dana Warmsley, Michael Macy, Ingmar Weber",Automated Hate Speech Detection and the Problem of Offensive Language,cs.CL," A key challenge for automatic hate-speech detection on social media is the
-separation of hate speech from other instances of offensive language. Lexical
-detection methods tend to have low precision because they classify all messages
-containing particular terms as hate speech, and previous work using supervised
-learning has failed to distinguish between the two categories. We used a
-crowd-sourced hate speech lexicon to collect tweets containing hate speech
-keywords. We use crowd-sourcing to label a sample of these tweets into three
-categories: those containing hate speech, those containing only offensive
-language, and those with neither. We train a multi-class classifier to
-distinguish between these different categories. Close analysis of the
-predictions and the errors shows when we can reliably separate hate speech from
-other offensive language and when this differentiation is more difficult. We
-find that racist and homophobic tweets are more likely to be classified as hate
-speech but that sexist tweets are generally classified as offensive. Tweets
-without explicit hate keywords are also more difficult to classify.
-"
-4390,1703.04081,"Shravan Vasishth, Lena A. J\""ager, Bruno Nicenboim","Feature overwriting as a finite mixture process: Evidence from comprehension data",stat.ML cs.CL stat.AP," The ungrammatical sentence ""The key to the cabinets are on the table"" is
-known to lead to an illusion of grammaticality. As discussed in the
-meta-analysis by Jaeger et al., 2017, faster reading times are observed at the
-verb are in the agreement-attraction sentence above compared to the equally
-ungrammatical sentence ""The key to the cabinet are on the table"". One
-explanation for this facilitation effect is the feature percolation account:
-the plural feature on cabinets percolates up to the head noun key, leading to
-the illusion. An alternative account is in terms of cue-based retrieval (Lewis
-& Vasishth, 2005), which assumes that the non-subject noun cabinets is
-misretrieved due to a partial feature-match when a dependency completion
-process at the auxiliary initiates a memory access for a subject with plural
-marking. We present evidence for yet another explanation for the observed
-facilitation. Because the second sentence has two nouns with identical number,
-it is possible that these are, in some proportion of trials, more difficult to
-keep distinct, leading to slower reading times at the verb in the first
-sentence above; this is the feature overwriting account of Nairne, 1990. We
-show that the feature overwriting proposal can be implemented as a finite
-mixture process. We reanalysed ten published data-sets, fitting hierarchical
-Bayesian mixture models to these data assuming a two-mixture distribution.
We show that in nine out of the ten studies, a mixture distribution
-corresponding to feature overwriting furnishes a superior fit over both the
-feature percolation and the cue-based retrieval accounts.
-"
-4391,1703.04178,Jose Camacho-Collados,"Why we have switched from building full-fledged taxonomies to simply detecting hypernymy relations",cs.CL," The study of taxonomies and hypernymy relations has been extensive in the
-Natural Language Processing (NLP) literature. However, the evaluation of
-taxonomy learning approaches has been traditionally troublesome, as it mainly
-relies on ad-hoc experiments which are hardly reproducible and manually
-expensive. Partly because of this, current research has lately been focusing on
-the hypernymy detection task. In this paper we reflect on this trend, analyzing
-issues related to current evaluation procedures. Finally, we propose three
-potential avenues for future work so that is-a relations and resources based on
-them play a more important role in downstream NLP applications.
-"
-4392,1703.04213,"Meng Jiang, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance M. Kaplan, Timothy P. Hanratty, Jiawei Han",MetaPAD: Meta Pattern Discovery from Massive Text Corpora,cs.CL," Mining textual patterns in news, tweets, papers, and many other kinds of text
-corpora has been an active theme in text mining and NLP research. Previous
-studies adopt a dependency parsing-based pattern discovery approach. However,
-the parsing results lose rich context around entities in the patterns, and the
-process is costly for a corpus of large scale. In this study, we propose a
-novel typed textual pattern structure, called meta pattern, which is extended
-to a frequent, informative, and precise subsequence pattern in certain context.
-We propose an efficient framework, called MetaPAD, which discovers meta
-patterns from massive corpora with three techniques: (1) it develops a
-context-aware segmentation method to carefully determine the boundaries of
-patterns with a learnt pattern quality assessment function, which avoids costly
-dependency parsing and generates high-quality patterns; (2) it identifies and
-groups synonymous meta patterns from multiple facets---their types, contexts,
-and extractions; and (3) it examines type distributions of entities in the
-instances extracted by each group of patterns, and looks for appropriate type
-levels to make discovered patterns precise. Experiments demonstrate that our
-proposed framework discovers high-quality typed textual patterns efficiently
-from different genres of massive corpora and facilitates information
-extraction.
-"
-4393,1703.04247,"Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He",DeepFM: A Factorization-Machine based Neural Network for CTR Prediction,cs.IR cs.CL," Learning sophisticated feature interactions behind user behaviors is critical
-in maximizing CTR for recommender systems. Despite great progress, existing
-methods seem to have a strong bias towards low- or high-order interactions, or
-require expert feature engineering. In this paper, we show that it is possible
-to derive an end-to-end learning model that emphasizes both low- and high-order
-feature interactions. The proposed model, DeepFM, combines the power of
-factorization machines for recommendation and deep learning for feature
-learning in a new neural network architecture.
Compared to the latest Wide \&
-Deep model from Google, DeepFM has a shared input to its ""wide"" and ""deep""
-parts, with no need for feature engineering besides raw features. Comprehensive
-experiments are conducted to demonstrate the effectiveness and efficiency of
-DeepFM over the existing models for CTR prediction, on both benchmark data and
-commercial data.
-"
-4394,1703.04330,"Todor Mihaylov, Anette Frank",Story Cloze Ending Selection Baselines and Data Examination,cs.CL," This paper describes two supervised baseline systems for the Story Cloze Test
-Shared Task (Mostafazadeh et al., 2016a). We first build a classifier using
-features based on word embeddings and semantic similarity computation. We
-further implement a neural LSTM system with different encoding strategies that
-try to model the relation between the story and the provided endings. Our
-experiments show that a model using representation features based on average
-word embedding vectors over the given story words and the candidate ending
-sentence words, combined with similarity features between the story and
-candidate ending representations, performed better than the neural models. Our
-best model achieves an accuracy of 72.42, ranking 3rd in the official
-evaluation.
-"
-4395,1703.04336,Anca Bucur and Sergiu Nisioi,A Visual Representation of Wittgenstein's Tractatus Logico-Philosophicus,cs.IR cs.CL," In this paper we present a data visualization method together with its
-potential usefulness in digital humanities and philosophy of language. We
-compile a multilingual parallel corpus from different versions of
-Wittgenstein's Tractatus Logico-Philosophicus, including the original in German
-and translations into English, Spanish, French, and Russian. Using this corpus,
-we compute a similarity measure between propositions and render a visual
-network of relations for different languages.
-"
-4396,1703.04357,"Rico Sennrich and Orhan Firat and Kyunghyun Cho and Alexandra Birch and Barry Haddow and Julian Hitschler and Marcin Junczys-Dowmunt and Samuel L\""aubli and Antonio Valerio Miceli Barone and Jozef Mokry and Maria N\u{a}dejde",Nematus: a Toolkit for Neural Machine Translation,cs.CL," We present Nematus, a toolkit for Neural Machine Translation. The toolkit
-prioritizes high translation accuracy, usability, and extensibility. Nematus
-has been used to build top-performing submissions to shared translation tasks
-at WMT and IWSLT, and has been used to train systems for production
-environments.
-"
-4397,1703.04417,Franco M. Luque,El Lenguaje Natural como Lenguaje Formal,cs.CL cs.FL," Formal language theory is useful for the study of natural language. In
-particular, it is of interest to study the adequacy of grammatical formalisms
-to express syntactic phenomena present in natural language. First, it helps to
-draw hypotheses about the nature and complexity of the speaker-hearer
-linguistic competence, a fundamental question in linguistics and other
-cognitive sciences. Moreover, from an engineering point of view, it reveals the
-practical limitations of applications based on those formalisms. In this
-article I introduce the adequacy problem of grammatical formalisms for natural
-language, also introducing some formal language theory concepts required for
-this discussion. Then, I review the formalisms that have been proposed
-throughout history, and the arguments that have been given to support or reject
-their adequacy.
-"
-4398,1703.04474,"Lingpeng Kong, Chris Alberti, Daniel Andor, Ivan Bogatyy, David Weiss","DRAGNN: A Transition-based Framework for Dynamically Connected Neural Networks",cs.CL," In this work, we present a compact, modular framework for constructing novel
-recurrent neural architectures. Our basic module is a new generic unit, the
-Transition Based Recurrent Unit (TBRU). In addition to hidden layer
-activations, TBRUs have discrete state dynamics that allow network connections
-to be built dynamically as a function of intermediate activations. By
-connecting multiple TBRUs, we can extend and combine commonly used
-architectures such as sequence-to-sequence, attention mechanisms, and
-recursive tree-structured models. A TBRU can also serve as both an encoder for
-downstream tasks and as a decoder for its own task simultaneously, resulting in
-more accurate multi-task learning. We call our approach Dynamic Recurrent
-Acyclic Graphical Neural Networks, or DRAGNN. We show that DRAGNN is
-significantly more accurate and efficient than seq2seq with attention for
-syntactic dependency parsing and yields more accurate multi-task learning for
-extractive summarization tasks.
-"
-4399,1703.04481,John Goldsmith and Eric Rosen,Geometrical morphology,cs.CL," We explore inflectional morphology as an example of the relationship of the
-discrete and the continuous in linguistics. The grammar requests a form of a
-lexeme by specifying a set of feature values, which corresponds to a corner M
-of a hypercube in feature value space. The morphology responds to that request
-by providing a morpheme, or a set of morphemes, whose vector sum is
-geometrically closest to the corner M. In short, the chosen morpheme $\mu$ is
-the morpheme (or set of morphemes) that maximizes the inner product of $\mu$
-and M.
-"
-4400,1703.04489,"Georgiana Dinu, Wael Hamza and Radu Florian",Reinforcement Learning for Transition-Based Mention Detection,cs.CL cs.AI," This paper describes an application of reinforcement learning to the mention
-detection task. We define a novel action-based formulation for the mention
-detection task, in which a model can flexibly revise past labeling decisions by
-grouping together tokens and assigning partial mention labels. We devise a
-method to create mention-level episodes and we train a model by rewarding
-correctly labeled complete mentions, irrespective of the inner structure
-created.
The model yields results which are on par with a competitive
-supervised counterpart while being more flexible in terms of achieving targeted
-behavior through reward modeling and generating internal mention structure,
-especially on longer mentions.
-"
-4401,1703.04498,"Preeti Bhargava, Nemanja Spasojevic, Guoning Hu","High-Throughput and Language-Agnostic Entity Disambiguation and Linking on User Generated Data",cs.IR cs.AI cs.CL," The Entity Disambiguation and Linking (EDL) task matches entity mentions in
-text to a unique Knowledge Base (KB) identifier such as a Wikipedia or Freebase
-id. It plays a critical role in the construction of a high-quality information
-network, and can be further leveraged for a variety of information retrieval
-and NLP tasks such as text categorization and document tagging. EDL is a
-complex and challenging problem due to the ambiguity of mentions and the
-multilingual nature of real-world text. Moreover, EDL systems need to have high
-throughput and should be lightweight in order to scale to large datasets and
-run on off-the-shelf machines. More importantly, these systems need to be able
-to extract and disambiguate dense annotations from the data in order to enable
-an Information Retrieval or Extraction task running on the data to be more
-efficient and accurate. In order to address all these challenges, we present
-the Lithium EDL system and algorithm - a high-throughput, lightweight,
-language-agnostic EDL system that extracts and correctly disambiguates 75% more
-entities than state-of-the-art EDL systems and is significantly faster than
-them.
-"
-4402,1703.04512,"Henri Hudrisier, Ben Henda Mokhtar","Normalisation de la langue et de lecriture arabe : enjeux culturels regionaux et mondiaux",cs.CY cs.CL," Arabic language and writing are now facing a resurgence of international
-normative solutions that challenge most of their local or network-based
-operating principles. Even though the multilingual digital coding solutions,
-especially those proposed by Unicode, have solved many difficulties of Arabic
-writing, the linguistic aspect is still in search of more adapted solutions.
-Terminology is one of the sectors in which the Arabic language requires a deep
-modernization of its classical productivity models. The normative approach, in
-particular that of the ISO TC37, is proposed as one of the solutions that would
-allow it to align with international standards to better integrate the
-knowledge society under construction.
-" -4403,1703.04617,"Junbei Zhang, Xiaodan Zhu, Qian Chen, Lirong Dai, Si Wei, and Hui - Jiang","Exploring Question Understanding and Adaptation in Neural-Network-Based - Question Answering",cs.CL," The last several years have seen intensive interest in exploring -neural-network-based models for machine comprehension (MC) and question -answering (QA). In this paper, we approach the problems by closely modelling -questions in a neural network framework. We first introduce syntactic -information to help encode questions. We then view and model different types of -questions and the information shared among them as an adaptation task and -proposed adaptation models for them. On the Stanford Question Answering Dataset -(SQuAD), we show that these approaches can help attain better results over a -competitive baseline. -" -4404,1703.04650,"Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas Raykar, Lili - Kotlerman, Guy Lev","Joint Learning of Correlated Sequence Labelling Tasks Using - Bidirectional Recurrent Neural Networks",cs.CL," The stream of words produced by Automatic Speech Recognition (ASR) systems is -typically devoid of punctuations and formatting. Most natural language -processing applications expect segmented and well-formatted texts as input, -which is not available in ASR output. This paper proposes a novel technique of -jointly modeling multiple correlated tasks such as punctuation and -capitalization using bidirectional recurrent neural networks, which leads to -improved performance for each of these tasks. This method could be extended for -joint modeling of any other correlated sequence labeling tasks. -" -4405,1703.04677,"Paul M\""atzig, Shravan Vasishth, Felix Engelmann, David Caplan","A computational investigation of sources of variability in sentence - comprehension difficulty in aphasia",cs.CL cs.AI," We present a computational evaluation of three hypotheses about sources of -deficit in sentence comprehension in aphasia: slowed processing, intermittent -deficiency, and resource reduction. The ACT-R based Lewis and Vasishth (2005) -model is used to implement these three proposals. Slowed processing is -implemented as slowed default production-rule firing time; intermittent -deficiency as increased random noise in activation of chunks in memory; and -resource reduction as reduced goal activation. As data, we considered subject -vs. object rela- tives whose matrix clause contained either an NP or a -reflexive, presented in a self-paced listening modality to 56 individuals with -aphasia (IWA) and 46 matched controls. The participants heard the sentences and -carried out a picture verification task to decide on an interpretation of the -sentence. These response accuracies are used to identify the best parameters -(for each participant) that correspond to the three hypotheses mentioned above. -We show that controls have more tightly clustered (less variable) parameter -values than IWA; specifically, compared to controls, among IWA there are more -individuals with low goal activations, high noise, and slow default action -times. This suggests that (i) individual patients show differential amounts of -deficit along the three dimensions of slowed processing, intermittent -deficient, and resource reduction, (ii) overall, there is evidence for all -three sources of deficit playing a role, and (iii) IWA have a more variable -range of parameter values than controls. 
In sum, this study contributes a proof
-of concept of a quantitative implementation of, and evidence for, these three
-accounts of comprehension deficits in aphasia.
-"
-4406,1703.04718,"Iria da Cunha, Eric SanJuan, Juan-Manuel Torres-Moreno, Irene Castell\'on","Extending Automatic Discourse Segmentation for Texts in Spanish to Catalan",cs.CL," At present, automatic discourse analysis is a relevant research topic in the
-field of NLP. However, discourse is one of the phenomena most difficult to
-process. Although discourse parsers have already been developed for several
-languages, this tool does not exist for Catalan. In order to implement this
-kind of parser, the first step is to develop a discourse segmenter. In this
-article we present the first discourse segmenter for texts in Catalan. This
-segmenter is based on Rhetorical Structure Theory (RST) for Spanish, and uses
-lexical and syntactic information to translate rules valid for Spanish into
-rules for Catalan. We have evaluated the system by using a gold standard corpus
-including manually segmented texts, and the results are promising.
-"
-4407,1703.04783,"Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey",Multichannel End-to-end Speech Recognition,cs.SD cs.CL," The field of speech recognition is in the midst of a paradigm shift:
-end-to-end neural networks are challenging the dominance of hidden Markov
-models as a core technology. Using an attention mechanism in a recurrent
-encoder-decoder architecture solves the dynamic time alignment problem,
-allowing joint end-to-end training of the acoustic and language modeling
-components. In this paper we extend the end-to-end framework to encompass
-microphone array signal processing for noise suppression and speech enhancement
-within the acoustic encoding network. This allows the beamforming components to
-be optimized jointly within the recognition architecture to improve the
-end-to-end speech recognition objective. Experiments on the noisy speech
-benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system
-outperformed the attention-based baseline with input from a conventional
-adaptive beamformer.
-"
-4408,1703.04816,Dirk Weissenborn and Georg Wiese and Laura Seiffe,Making Neural QA as Simple as Possible but not Simpler,cs.CL cs.AI cs.NE," The recent development of large-scale question answering (QA) datasets has
-triggered a substantial amount of research into end-to-end neural architectures
-for QA. Increasingly complex systems have been conceived without comparison to
-simpler neural baseline systems that would justify their complexity. In this
-work, we propose a simple heuristic that guides the development of neural
-baseline systems for the extractive QA task. We find that there are two
-ingredients necessary for building a high-performing neural QA system: first,
-the awareness of question words while processing the context and second, a
-composition function that goes beyond simple bag-of-words modeling, such as
-recurrent neural networks. Our results show that FastQA, a system that meets
-these two requirements, can achieve very competitive performance compared with
-existing models. We argue that this surprising finding puts results of previous
-systems and the complexity of recent QA datasets into perspective.
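Those two ingredients are easy to see in code. The PyTorch sketch below is illustrative only (toy vocabulary, random weights, and no training loop; it is not the FastQA implementation): a binary word-in-question feature is appended to each context embedding, and a recurrent layer supplies the composition function before answer-start positions are scored:

    # Ingredient 1: question-awareness via a word-in-question (wiq) feature.
    # Ingredient 2: composition beyond bag-of-words via an LSTM encoder.
    import torch
    import torch.nn as nn

    context  = ["the", "capital", "of", "france", "is", "paris"]
    question = ["what", "is", "the", "capital", "of", "france"]
    vocab = {w: i for i, w in enumerate(sorted(set(context + question)))}

    emb = nn.Embedding(len(vocab), 16)
    ids = torch.tensor([vocab[w] for w in context])
    wiq = torch.tensor([[1.0] if w in question else [0.0] for w in context])

    x = torch.cat([emb(ids), wiq], dim=-1).unsqueeze(0)    # (1, 6, 17)
    h, _ = nn.LSTM(input_size=17, hidden_size=32, batch_first=True)(x)
    start_scores = nn.Linear(32, 1)(h).squeeze(-1)         # (1, 6) start logits
    print(start_scores.shape)

Removing the wiq column or swapping the LSTM for mean pooling ablates one ingredient each, which is the kind of baseline comparison the heuristic calls for.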
-" -4409,1703.04826,Diego Marcheggiani and Ivan Titov,"Encoding Sentences with Graph Convolutional Networks for Semantic Role - Labeling",cs.CL cs.LG," Semantic role labeling (SRL) is the task of identifying the -predicate-argument structure of a sentence. It is typically regarded as an -important step in the standard NLP pipeline. As the semantic representations -are closely related to syntactic ones, we exploit syntactic information in our -model. We propose a version of graph convolutional networks (GCNs), a recent -class of neural networks operating on graphs, suited to model syntactic -dependency graphs. GCNs over syntactic dependency trees are used as sentence -encoders, producing latent feature representations of words in a sentence. We -observe that GCN layers are complementary to LSTM ones: when we stack both GCN -and LSTM layers, we obtain a substantial improvement over an already -state-of-the-art LSTM SRL model, resulting in the best reported scores on the -standard benchmark (CoNLL-2009) both for Chinese and English. -" -4410,1703.04854,"Junhua He, Hankz Hankui Zhuo and Jarvan Law","Distributed-Representation Based Hybrid Recommender System with Short - Item Descriptions",cs.IR cs.CL," Collaborative filtering (CF) aims to build a model from users' past behaviors -and/or similar decisions made by other users, and use the model to recommend -items for users. Despite of the success of previous collaborative filtering -approaches, they are all based on the assumption that there are sufficient -rating scores available for building high-quality recommendation models. In -real world applications, however, it is often difficult to collect sufficient -rating scores, especially when new items are introduced into the system, which -makes the recommendation task challenging. We find that there are often ""short"" -texts describing features of items, based on which we can approximate the -similarity of items and make recommendation together with rating scores. In -this paper we ""borrow"" the idea of vector representation of words to capture -the information of short texts and embed it into a matrix factorization -framework. We empirically show that our approach is effective by comparing it -with state-of-the-art approaches. -" -4411,1703.04879,Ai Hirata and Mamoru Komachi,Sparse Named Entity Classification using Factorization Machines,cs.CL," Named entity classification is the task of classifying text-based elements -into various categories, including places, names, dates, times, and monetary -values. A bottleneck in named entity classification, however, is the data -problem of sparseness, because new named entities continually emerge, making it -rather difficult to maintain a dictionary for named entity classification. -Thus, in this paper, we address the problem of named entity classification -using matrix factorization to overcome the problem of feature sparsity. -Experimental results show that our proposed model, with fewer features and a -smaller size, achieves competitive accuracy to state-of-the-art models. -" -4412,1703.04887,"Zhen Yang, Wei Chen, Feng Wang and Bo Xu","Improving Neural Machine Translation with Conditional Sequence - Generative Adversarial Nets",cs.CL," This paper proposes an approach for applying GANs to NMT. We build a -conditional sequence generative adversarial net which comprises of two -adversarial sub models, a generator and a discriminator. 
The generator aims to
-generate sentences that are hard to discriminate from human-translated
-sentences (i.e., the golden target sentences), and the discriminator makes
-efforts to discriminate the machine-generated sentences from human-translated
-ones. The two sub-models play a minimax game and achieve a win-win situation
-when they reach a Nash equilibrium. Additionally, the static sentence-level
-BLEU is utilized as the reinforced objective for the generator, which biases
-the generation towards high BLEU points. During training, both the dynamic
-discriminator and the static BLEU objective are employed to evaluate the
-generated sentences and feed the evaluations back to guide the learning of the
-generator. Experimental results show that the proposed model consistently
-outperforms the traditional RNNSearch and the newly emerged state-of-the-art
-Transformer on English-German and Chinese-English translation tasks.
-"
-4413,1703.04908,"Igor Mordatch, Pieter Abbeel",Emergence of Grounded Compositional Language in Multi-Agent Populations,cs.AI cs.CL," By capturing statistical patterns in large corpora, machine learning has
-enabled significant advances in natural language processing, including in
-machine translation, question answering, and sentiment analysis. However, for
-agents to intelligently interact with humans, simply capturing the statistical
-patterns is insufficient. In this paper we investigate if, and how, grounded
-compositional language can emerge as a means to achieve goals in multi-agent
-populations. Towards this end, we propose a multi-agent learning environment
-and learning methods that bring about emergence of a basic compositional
-language. This language is represented as streams of abstract discrete symbols
-uttered by agents over time, but nonetheless has a coherent structure that
-possesses a defined vocabulary and syntax. We also observe emergence of
-non-verbal communication such as pointing and guiding when language
-communication is unavailable.
-"
-4414,1703.04914,"Ikuya Yamada, Motoki Sato, Hiroyuki Shindo",Ensemble of Neural Classifiers for Scoring Knowledge Base Triples,cs.CL cs.IR," This paper describes our approach for the triple scoring task at the WSDM Cup
-2017. The task required participants to assign a relevance score for each pair
-of entities and their types in a knowledge base in order to enhance the ranking
-results in entity retrieval tasks. We propose an approach wherein the outputs
-of multiple neural network classifiers are combined using a supervised machine
-learning model. The experimental results showed that our proposed method
-achieved the best performance in one out of three measures (i.e., Kendall's
-tau), and performed competitively in the other two measures (i.e., accuracy and
-average score difference).
-"
-4415,1703.04929,"Chris Alberti, Daniel Andor, Ivan Bogatyy, Michael Collins, Dan Gillick, Lingpeng Kong, Terry Koo, Ji Ma, Mark Omernick, Slav Petrov, Chayut Thanapirom, Zora Tung, David Weiss",SyntaxNet Models for the CoNLL 2017 Shared Task,cs.CL," We describe a baseline dependency parsing system for the CoNLL 2017 Shared
-Task. This system, which we call ""ParseySaurus,"" uses the DRAGNN framework
-[Kong et al., 2017] to combine transition-based recurrent parsing and tagging
-with character-based word representations.
On the v1.3 Universal Dependencies
-Treebanks, the new system outperforms the publicly available, state-of-the-art
-""Parsey's Cousins"" models by 3.47% absolute Labeled Accuracy Score (LAS) across
-52 treebanks.
-"
-4416,1703.05122,"Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Prithwish Mukherjee, Monojit Choudhury, Animesh Mukherjee","Is this word borrowed? An automatic approach to quantify the likeliness of borrowing in social media",cs.CL," Code-mixing or code-switching are the effortless phenomena of natural
-switching between two or more languages in a single conversation. Use of a
-foreign word in a language, however, does not necessarily mean that the speaker
-is code-switching, because often languages borrow lexical items from other
-languages. If a word is borrowed, it becomes a part of the lexicon of a
-language; whereas, during code-switching, the speaker is aware that the
-conversation involves foreign words or phrases. Identifying whether a foreign
-word used by a bilingual speaker is due to borrowing or code-switching is of
-fundamental importance to theories of multilingualism, and an essential
-prerequisite towards the development of language and speech technologies for
-multilingual communities. In this paper, we present a series of novel
-computational methods to identify the borrowed likeliness of a word, based on
-social media signals. We first propose a context-based clustering method to
-sample a set of candidate words from the social media data. Next, we propose
-three novel and similar metrics based on the usage of these words by the users
-in different tweets; these metrics were used to score and rank the candidate
-words indicating their borrowed likeliness. We compare these rankings with a
-ground truth ranking constructed through a human judgment experiment. The
-Spearman's rank correlation between the two rankings (nearly 0.62 for all the
-three metric variants) is more than double the value (0.26) of the most
-competitive existing baseline reported in the literature. Some other striking
-observations are: (i) the correlation is higher for the ground truth data
-elicited from the younger participants (age less than 30) than that from the
-older participants, and (ii) those participants who use mixed language for
-tweeting the least provide the best signals of borrowing.
-"
-4417,1703.05123,"Svitlana Vakulenko, Lyndon Nixon, Mihai Lupu",Character-based Neural Embeddings for Tweet Clustering,cs.IR cs.CL," In this paper we show how the performance of tweet clustering can be improved
-by leveraging character-based neural networks. The proposed approach overcomes
-the limitations related to the vocabulary explosion in the word-based models
-and allows for the seamless processing of the multilingual content. Our
-evaluation results and code are available on-line at
-https://github.com/vendi12/tweet2vec_clustering
-"
-4418,1703.05260,"Ashutosh Modi and Tatjana Anikina and Simon Ostermann and Manfred Pinkal",InScript: Narrative texts annotated with script information,cs.CL cs.AI," This paper presents the InScript corpus (Narrative Texts Instantiating Script
-structure). InScript is a corpus of 1,000 stories centered around 10 different
-scenarios. Verbs and noun phrases are annotated with event and participant
-types, respectively. Additionally, the text is annotated with coreference
-information. The corpus shows rich lexical variation and will serve as a unique
-resource for the study of the role of script knowledge in natural language
-processing.
-" -4419,1703.05320,"Phong-Khac Do, Huy-Tien Nguyen, Chien-Xuan Tran, Minh-Tien Nguyen, and - Minh-Le Nguyen","Legal Question Answering using Ranking SVM and Deep Convolutional Neural - Network",cs.CL cs.AI," This paper presents a study of employing Ranking SVM and Convolutional Neural -Network for two missions: legal information retrieval and question answering in -the Competition on Legal Information Extraction/Entailment. For the first task, -our proposed model used a triple of features (LSI, Manhattan, Jaccard), and is -based on paragraph level instead of article level as in previous studies. In -fact, each single-paragraph article corresponds to a particular paragraph in a -huge multiple-paragraph article. For the legal question answering task, -additional statistical features from information retrieval task integrated into -Convolutional Neural Network contribute to higher accuracy. -" -4420,1703.05390,"Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew - Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates","Convolutional Recurrent Neural Networks for Small-Footprint Keyword - Spotting",cs.CL cs.AI cs.LG," Keyword spotting (KWS) constitutes a major component of human-technology -interfaces. Maximizing the detection accuracy at a low false alarm (FA) rate, -while minimizing the footprint size, latency and complexity are the goals for -KWS. Towards achieving them, we study Convolutional Recurrent Neural Networks -(CRNNs). Inspired by large-scale state-of-the-art speech recognition systems, -we combine the strengths of convolutional layers and recurrent layers to -exploit local structure and long-range context. We analyze the effect of -architecture parameters, and propose training strategies to improve -performance. With only ~230k parameters, our CRNN model yields acceptably low -latency, and achieves 97.71% accuracy at 0.5 FA/hour for 5 dB signal-to-noise -ratio. -" -4421,1703.05423,"Florian Strub and Harm de Vries and Jeremie Mary and Bilal Piot and - Aaron Courville and Olivier Pietquin","End-to-end optimization of goal-driven and visually grounded dialogue - systems",cs.CL," End-to-end design of dialogue systems has recently become a popular research -topic thanks to powerful tools such as encoder-decoder architectures for -sequence-to-sequence learning. Yet, most current approaches cast human-machine -dialogue management as a supervised learning problem, aiming at predicting the -next utterance of a participant given the full history of the dialogue. This -vision is too simplistic to render the intrinsic planning problem inherent to -dialogue as well as its grounded nature, making the context of a dialogue -larger than the sole history. This is why only chit-chat and question answering -tasks have been addressed so far using end-to-end architectures. In this paper, -we introduce a Deep Reinforcement Learning method to optimize visually grounded -task-oriented dialogues, based on the policy gradient algorithm. This approach -is tested on a dataset of 120k dialogues collected through Mechanical Turk and -provides encouraging results at solving both the problem of generating natural -dialogues and the task of discovering a specific object in a complex picture. -" -4422,1703.05465,Wenli Zhuang and Ernie Chang,"Neobility at SemEval-2017 Task 1: An Attention-based Sentence Similarity - Model",cs.CL," This paper describes a neural-network model which performed competitively -(top 6) at the SemEval 2017 cross-lingual Semantic Textual Similarity (STS) -task. 
Our system employs an attention-based recurrent neural network model that
-optimizes the sentence similarity. In this paper, we describe our participation
-in the multilingual STS task, which measures similarity across English,
-Spanish, and Arabic.
-"
-4423,1703.05706,"Myungha Jang, Jinho D. Choi, James Allan",Improving Document Clustering by Eliminating Unnatural Language,cs.IR cs.CL," Technical documents contain a fair amount of unnatural language, such as
-tables, formulas, pseudo-code, etc. Unnatural language can be an important
-source of confusion for existing NLP tools. This paper presents an effective
-method of distinguishing unnatural language from natural language, and
-evaluates the impact of unnatural language detection on NLP tasks such as
-document clustering. We view this problem as an information extraction task and
-build a multiclass classification model that identifies unnatural language
-components and assigns them to one of four categories. First, we create a new
-annotated corpus by collecting slides and papers in various formats (PPT, PDF,
-and HTML), where unnatural language components are annotated into four
-categories. We then explore features available from plain text to build a
-statistical model that can handle any format as long as it is converted into
-plain text. Our experiments show that removing unnatural language components
-gives an absolute improvement in document clustering of up to 15%. Our corpus
-and tool are publicly available.
-"
-4424,1703.05851,"Yuanliang Meng, Anna Rumshisky, Alexey Romanov","Temporal Information Extraction for Question Answering Using Syntactic Dependencies in an LSTM-based Architecture",cs.IR cs.CL," In this paper, we propose to use a set of simple LSTM-based models, uniform
-in architecture, to recover different kinds of temporal relations from text.
-Using the shortest dependency path between entities as input, the same
-architecture is used to extract intra-sentence, cross-sentence, and document
-creation time relations. A ""double-checking"" technique reverses entity pairs in
-classification, boosting the recall of positive cases and reducing
-misclassifications between opposite classes. An efficient pruning algorithm
-resolves conflicts globally. Evaluated on QA-TempEval (SemEval2015 Task 5), our
-proposed technique outperforms state-of-the-art methods by a large margin.
-"
-4425,1703.05880,"Wenpeng Li, BinBin Zhang, Lei Xie, Dong Yu","Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling",cs.CL cs.LG cs.SD eess.AS," Deep learning models (DLMs) are state-of-the-art techniques in speech
-recognition. However, training good DLMs can be time-consuming, especially for
-production-size models and corpora. Although several parallel training
-algorithms have been proposed to improve training efficiency, there is no clear
-guidance on which one to choose for the task at hand due to the lack of a
-systematic and fair comparison among them. In this paper we aim at filling this
-gap by comparing four popular parallel training algorithms in speech
-recognition, namely asynchronous stochastic gradient descent (ASGD), blockwise
-model-update filtering (BMUF), bulk synchronous parallel (BSP) and elastic
-averaging stochastic gradient descent (EASGD), on 1000-hour LibriSpeech corpora
-using feed-forward deep neural networks (DNNs) and convolutional, long
-short-term memory, DNNs (CLDNNs).
Based on our experiments, we recommend using BMUF as the
-top choice to train acoustic models since it is most stable, scales well with
-the number of GPUs, can achieve reproducible results, and in many cases even
-outperforms single-GPU SGD. ASGD can be used as a substitute in some cases.
-"
-4426,1703.05908,Yao-Hung Hubert Tsai and Liang-Kang Huang and Ruslan Salakhutdinov,Learning Robust Visual-Semantic Embeddings,cs.CV cs.CL cs.LG," Many of the existing methods for learning joint embedding of images and text
-use only supervised information from paired images and their textual
-attributes. Taking advantage of the recent success of unsupervised learning in
-deep neural networks, we propose an end-to-end learning framework that is able
-to extract more robust multi-modal representations across domains. The proposed
-method combines representation learning models (i.e., auto-encoders) together
-with cross-domain learning criteria (i.e., Maximum Mean Discrepancy loss) to
-learn joint embeddings for semantic and visual features. A novel technique of
-unsupervised-data adaptation inference is introduced to construct more
-comprehensive embeddings for both labeled and unlabeled data. We evaluate our
-method on the Animals with Attributes and Caltech-UCSD Birds 200-2011 datasets
-with a wide range of applications, including zero- and few-shot image
-recognition and retrieval, from inductive to transductive settings.
-Empirically, we show that our framework improves over the current state of the
-art on many of the considered tasks.
-"
-4427,1703.05916,Yuya Sakaizawa and Mamoru Komachi,Construction of a Japanese Word Similarity Dataset,cs.CL," An evaluation of distributed word representation is generally conducted using
-a word similarity task and/or a word analogy task. There are many datasets
-readily available for these tasks in English. However, evaluating distributed
-representation in languages that do not have such resources (e.g., Japanese) is
-difficult. Therefore, as a first step toward evaluating distributed
-representations in Japanese, we constructed a Japanese word similarity dataset.
-To the best of our knowledge, our dataset is the first resource that can be
-used to evaluate distributed representations in Japanese. Moreover, our dataset
-contains various parts of speech and includes rare words in addition to common
-words.
-"
-4428,1703.06108,"Prantik Bhattacharyya, Nemanja Spasojevic",Global Entity Ranking Across Multiple Languages,cs.IR cs.CL cs.SI," We present work on building a global long-tailed ranking of entities across
-multiple languages using the Wikipedia and Freebase knowledge bases. We
-identify multiple features and build a model to rank entities using a
-ground-truth dataset of more than 10 thousand labels. The final system ranks 27
-million entities with 75% precision and 48% F1 score. We provide a performance
-evaluation and empirical evidence of the quality of ranking across languages,
-and open the final ranked lists for future research.
-"
-4429,1703.06345,"Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen","Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks",cs.CL cs.LG," Recent papers have shown that neural networks obtain state-of-the-art
-performance on several different sequence tagging tasks. One appealing property
-of such systems is their generality, as excellent performance can be achieved
-with a unified architecture and without task-specific feature engineering.
-However, it is unclear if such systems can be used for tasks without large
-amounts of training data. In this paper we explore the problem of transfer
-learning for neural sequence taggers, where a source task with plentiful
-annotations (e.g., POS tagging on the Penn Treebank) is used to improve
-performance on a target task with fewer available annotations (e.g., POS
-tagging for microblogs). We examine the effects of transfer learning for deep
-hierarchical recurrent networks across domains, applications, and languages,
-and show that significant improvement can often be obtained. These gains lead
-to improvements over the current state of the art on several well-studied
-tasks.
-"
-4430,1703.06492,"Jia-Hong Huang, Modar Alfadly, Bernard Ghanem",VQABQ: Visual Question Answering by Basic Questions,cs.CV cs.CL," Our method takes an image and a question as input and outputs a text-based
-answer to the query question about the given image, a task called Visual
-Question Answering (VQA). There are two main modules in our algorithm. Given a
-natural language question about an image, the first module takes the question
-as input and then outputs the basic questions of the main given question. The
-second module takes the main question, the image, and these basic questions as
-input and then outputs the text-based answer to the main question. We formulate
-the basic question generation problem as a LASSO optimization problem, and also
-propose a criterion for how to exploit these basic questions to help answer the
-main question. Our method is evaluated on the challenging VQA dataset and
-yields state-of-the-art accuracy, 60.34% in the open-ended task.
-"
-4431,1703.06501,"Elvys Linhares Pontes, Thiago Gouveia da Silva, Andr\'ea Carneiro Linhares, Juan-Manuel Torres-Moreno, St\'ephane Huet","M\'etodos de Otimiza\c{c}\~ao Combinat\'oria Aplicados ao Problema de Compress\~ao MultiFrases",cs.CL," The Internet has led to a dramatic increase in the amount of available
-information. In this context, reading and understanding this flow of
-information have become costly tasks. In recent years, to help people
-understand textual data, various Natural Language Processing (NLP) applications
-based on Combinatorial Optimization have been devised. However, for
-Multi-Sentence Compression (MSC), a method which reduces sentence length
-without removing core information, the use of optimization methods requires
-further study to improve the performance of MSC. This article describes a
-method for MSC using Combinatorial Optimization and Graph Theory to generate
-more informative sentences while maintaining their grammaticality. An
-experiment conducted on a corpus of 40 clusters of sentences shows that our
-system achieves very good quality and outperforms the state of the art.
-"
-4432,1703.06541,Shervin Malmasi and Mark Dras,Native Language Identification using Stacked Generalization,cs.CL," Ensemble methods using multiple classifiers have proven to be the most
-successful approach for the task of Native Language Identification (NLI),
-achieving the current state of the art. However, a systematic examination of
-ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble
-architectures such as classifier stacking have not been closely evaluated. We
-present a set of experiments using three ensemble-based models, testing each
-with multiple configurations and algorithms.
This includes a rigorous
-application of meta-classification models for NLI, achieving state-of-the-art
-results on three datasets from different languages. We also present the first
-use of statistical significance testing for comparing NLI systems, showing that
-our results are significantly better than the previous state of the art. We
-make available a collection of test set predictions to facilitate future
-statistical tests.
-"
-4433,1703.06585,"Abhishek Das, Satwik Kottur, Jos\'e M. F. Moura, Stefan Lee, Dhruv Batra","Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning",cs.CV cs.AI cs.CL cs.LG," We introduce the first goal-driven training for visual question answering and
-dialog agents. Specifically, we pose a cooperative 'image guessing' game
-between two agents -- Qbot and Abot -- who communicate in natural language
-dialog so that Qbot can select an unseen image from a lineup of images. We use
-deep reinforcement learning (RL) to learn the policies of these agents
-end-to-end -- from pixels to multi-agent multi-round dialog to game reward.
- We demonstrate two experimental results.
- First, as a 'sanity check' demonstration of pure RL (from scratch), we show
-results on a synthetic world, where the agents communicate in ungrounded
-vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find
-that two bots invent their own communication protocol and start using certain
-symbols to ask/answer about certain visual attributes (shape/color/style).
-Thus, we demonstrate the emergence of grounded language and communication among
-'visual' dialog agents with no human supervision.
- Second, we conduct large-scale real-image experiments on the VisDial dataset,
-where we pretrain with supervised dialog data and show that the RL 'fine-tuned'
-agents significantly outperform SL agents. Interestingly, the RL Qbot learns to
-ask questions that Abot is good at, ultimately resulting in more informative
-dialog and a better team.
-"
-4434,1703.06630,"Mohamed Morchid, Juan-Manuel Torres-Moreno, Richard Dufour, Javier Ram\'irez-Rodr\'iguez, Georges Linar\`es","Automatic Text Summarization Approaches to Speed up Topic Model Learning Process",cs.IR cs.CL," The number of documents available on the Internet grows every day. For this
-reason, processing this amount of information effectively and efficiently
-becomes a major concern for companies and scientists. Methods that represent a
-textual document by a topic representation are widely used in Information
-Retrieval (IR) to process big data such as Wikipedia articles. One of the main
-difficulties in using topic models on huge data collections is the
-computational resources (CPU time and memory) required for model estimation. To
-deal with this issue, we propose to build topic spaces from summarized
-documents. In this paper, we present a study of topic space representation in
-the context of big data. The topic space representation behavior is analyzed
-for different languages. Experiments show that topic spaces estimated from text
-summaries are as relevant as those estimated from the complete documents. The
-real advantage of such an approach is the processing time gain: we show that
-the processing time can be drastically reduced using summarized documents (more
-than 60\% in general). This study finally points out the differences between
-thematic representations of documents depending on the targeted language, such
-as English or the Latin languages.
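The time/relevance trade-off behind this approach is easy to reproduce at a small scale. The scikit-learn sketch below is illustrative only: crude leading-sentence "summaries" and the 20 Newsgroups corpus stand in for the summarization systems and document collections used in the study:

    # Fit a topic space on full documents vs. summarized documents, timing both.
    import time
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = fetch_20newsgroups(subset="train",
                              remove=("headers", "footers", "quotes")).data[:500]

    def summarize(text, n_sentences=3):
        # Crude stand-in for a real summarizer: keep the leading sentences.
        return ". ".join(text.split(". ")[:n_sentences])

    def fit_topic_space(texts, n_topics=10):
        bow = CountVectorizer(max_features=5000,
                              stop_words="english").fit_transform(texts)
        start = time.time()
        LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(bow)
        return time.time() - start

    print("full documents:", fit_topic_space(docs))
    print("summaries     :", fit_topic_space([summarize(d) for d in docs]))

The shorter bag-of-words matrix makes the second fit substantially faster; the study's point is that the resulting topic space remains about as relevant.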
-" -4435,1703.06642,"Diederik Aerts, Jonito Aerts Arguelles, Lester Beltran, Lyneth - Beltran, Isaac Distrito, Massimiliano Sassoli de Bianchi, Sandro Sozzo and - Tomas Veloz",Towards a Quantum World Wide Web,cs.AI cs.CL quant-ph," We elaborate a quantum model for the meaning associated with corpora of -written documents, like the pages forming the World Wide Web. To that end, we -are guided by how physicists constructed quantum theory for microscopic -entities, which unlike classical objects cannot be fully represented in our -spatial theater. We suggest that a similar construction needs to be carried out -by linguists and computational scientists, to capture the full meaning carried -by collections of documental entities. More precisely, we show how to associate -a quantum-like 'entity of meaning' to a 'language entity formed by printed -documents', considering the latter as the collection of traces that are left by -the former, in specific results of search actions that we describe as -measurements. In other words, we offer a perspective where a collection of -documents, like the Web, is described as the space of manifestation of a more -complex entity - the QWeb - which is the object of our modeling, drawing its -inspiration from previous studies on operational-realistic approaches to -quantum physics and quantum modeling of human cognition and decision-making. We -emphasize that a consistent QWeb model needs to account for the observed -correlations between words appearing in printed documents, e.g., -co-occurrences, as the latter would depend on the 'meaning connections' -existing between the concepts that are associated with these words. In that -respect, we show that both 'context and interference (quantum) effects' are -required to explain the probabilities calculated by counting the relative -number of documents containing certain words and co-ocurrrences of words. -" -4436,1703.06676,"Hao Dong, Jingqing Zhang, Douglas McIlwraith, Yike Guo",I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation,cs.CV cs.CL," Translating information between text and image is a fundamental problem in -artificial intelligence that connects natural language processing and computer -vision. In the past few years, performance in image caption generation has seen -significant improvement through the adoption of recurrent neural networks -(RNN). Meanwhile, text-to-image generation begun to generate plausible images -using datasets of specific categories like birds and flowers. We've even seen -image generation from multi-category datasets such as the Microsoft Common -Objects in Context (MSCOCO) through the use of generative adversarial networks -(GANs). Synthesizing objects with a complex shape, however, is still -challenging. For example, animals and humans have many degrees of freedom, -which means that they can take on many complex shapes. We propose a new -training method called Image-Text-Image (I2T2I) which integrates text-to-image -and image-to-text (image captioning) synthesis to improve the performance of -text-to-image synthesis. We demonstrate that %the capability of our method to -understand the sentence descriptions, so as to I2T2I can generate better -multi-categories images using MSCOCO than the state-of-the-art. 
We also
-demonstrate that I2T2I can achieve transfer learning by using a pre-trained
-image captioning module to generate human images on the MPII Human Pose
-dataset.
-"
-4437,1703.07055,"Xiujun Li and Yun-Nung Chen and Lihong Li and Jianfeng Gao and Asli
- Celikyilmaz","Investigation of Language Understanding Impact for Reinforcement
- Learning Based Dialogue Systems",cs.CL cs.AI cs.LG," Language understanding is a key component in a spoken dialogue system. In
-this paper, we investigate how the language understanding module influences the
-dialogue system performance by conducting a series of systematic experiments on
-a task-oriented neural dialogue system in a reinforcement learning based
-setting. The empirical study shows that among different types of language
-understanding errors, slot-level errors can have more impact on the overall
-performance of a dialogue system compared to intent-level errors. In addition,
-our experiments demonstrate that the reinforcement learning based dialogue
-system is able to learn when and what to confirm in order to achieve better
-performance and greater robustness.
-"
-4438,1703.07090,"Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang
- Situ, Shuai Li, Yang Zhang",Deep LSTM for Large Vocabulary Continuous Speech Recognition,cs.CL," Recurrent neural networks (RNNs), especially long short-term memory (LSTM)
-RNNs, are effective networks for sequential tasks like speech recognition.
-Deeper LSTM models perform well on large vocabulary continuous speech
-recognition because of their impressive learning ability. However, it is more
-difficult to train a deeper network. We introduce a training framework with
-layer-wise training and exponential moving average methods for deeper LSTM
-models. Within this competitive framework, LSTM models of more than 7 layers
-are successfully trained on Shenma voice search data in Mandarin, and they
-outperform deep LSTM models trained by the conventional approach. Moreover, for
-online streaming speech recognition applications, a shallow model with a low
-real-time factor is distilled from the very deep model. The recognition
-accuracy suffers little loss in the distillation process. Therefore, the model
-trained with the proposed training framework reduces the character error rate
-by a relative 14\%, compared to the original model with similar real-time
-capability. Furthermore, a novel transfer learning strategy with segmental
-Minimum Bayes-Risk is also introduced in the framework. The strategy makes it
-possible for training with only a small part of the dataset to outperform
-training on the full dataset from the beginning.
-"
-4439,1703.07438,"Nathan Schneider, Chuck Wooters","The NLTK FrameNet API: Designing for Discoverability with a Rich
- Linguistic Resource",cs.CL," A new Python API, integrated within the NLTK suite, offers access to the
-FrameNet 1.7 lexical database. The lexicon (structured in terms of frames) as
-well as annotated sentences can be processed programmatically, or browsed with
-human-readable displays via the interactive Python prompt.
-"
-4440,1703.07476,"Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev
- Khudanpur",Topic Identification for Speech without ASR,cs.CL," Modern topic identification (topic ID) systems for speech use automatic
-speech recognition (ASR) to produce speech transcripts, and perform supervised
-classification on such ASR outputs.
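Entry 4439 describes the NLTK FrameNet API itself, so a brief usage sketch is natural here. It assumes `nltk` is installed and the `framenet_v17` resource has been downloaded; exact outputs depend on the NLTK version:

```python
# Browsing FrameNet 1.7 through NLTK, per entry 4439.
import nltk
nltk.download('framenet_v17', quiet=True)
from nltk.corpus import framenet as fn

frames = fn.frames(r'(?i)motion')       # search frames by regex
print([f.name for f in frames][:5])
frame = fn.frame('Motion')              # inspect one frame by name
print(frame.definition[:80])
print(sorted(frame.FE.keys())[:5])      # its frame elements
```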
However, under resource-limited conditions, -the manually transcribed speech required to develop standard ASR systems can be -severely limited or unavailable. In this paper, we investigate alternative -unsupervised solutions to obtaining tokenizations of speech in terms of a -vocabulary of automatically discovered word-like or phoneme-like units, without -depending on the supervised training of ASR systems. Moreover, using automatic -phoneme-like tokenizations, we demonstrate that a convolutional neural network -based framework for learning spoken document representations provides -competitive performance compared to a standard bag-of-words representation, as -evidenced by comprehensive topic ID evaluations on both single-label and -multi-label classification tasks. -" -4441,1703.07588,"Yu-Hsuan Wang, Cheng-Tao Chung, Hung-yi Lee","Gate Activation Signal Analysis for Gated Recurrent Neural Networks and - Its Correlation with Phoneme Boundaries",cs.SD cs.CL cs.LG," In this paper we analyze the gate activation signals inside the gated -recurrent neural networks, and find the temporal structure of such signals is -highly correlated with the phoneme boundaries. This correlation is further -verified by a set of experiments for phoneme segmentation, in which better -results compared to standard approaches were obtained. -" -4442,1703.07713,"Zhao Meng, Lili Mou, Zhi Jin","Hierarchical RNN with Static Sentence-Level Attention for Text-Based - Speaker Change Detection",cs.CL," Speaker change detection (SCD) is an important task in dialog modeling. Our -paper addresses the problem of text-based SCD, which differs from existing -audio-based studies and is useful in various scenarios, for example, processing -dialog transcripts where speaker identities are missing (e.g., OpenSubtitle), -and enhancing audio SCD with textual information. We formulate text-based SCD -as a matching problem of utterances before and after a certain decision point; -we propose a hierarchical recurrent neural network (RNN) with static -sentence-level attention. Experimental results show that neural networks -consistently achieve better performance than feature-based approaches, and that -our attention-based model significantly outperforms non-attention neural -networks. -" -4443,1703.07754,"Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, - David Nahamoo","Direct Acoustics-to-Word Models for English Conversational Speech - Recognition",cs.CL cs.NE stat.ML," Recent work on end-to-end automatic speech recognition (ASR) has shown that -the connectionist temporal classification (CTC) loss can be used to convert -acoustics to phone or character sequences. Such systems are used with a -dictionary and separately-trained Language Model (LM) to produce word -sequences. However, they are not truly end-to-end in the sense of mapping -acoustics directly to words without an intermediate phone representation. In -this paper, we present the first results employing direct acoustics-to-word CTC -models on two well-known public benchmark tasks: Switchboard and CallHome. -These models do not require an LM or even a decoder at run-time and hence -recognize speech with minimal complexity. However, due to the large number of -word output units, CTC word models require orders of magnitude more data to -train reliably compared to traditional systems. We present some techniques to -mitigate this issue. 
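Entry 4440 compares learned spoken-document representations against a standard bag-of-words baseline over automatically discovered word-like units. A toy version of that baseline (the unit sequences and topic labels below are invented) looks like this:

```python
# Bag-of-words topic-ID baseline over discovered units, cf. entry 4440.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["u3 u7 u7 u12 u3", "u5 u1 u5 u9", "u3 u12 u12 u7", "u1 u5 u9 u9"]
labels = ["weather", "sports", "weather", "sports"]

clf = make_pipeline(CountVectorizer(token_pattern=r"\S+"),  # units, not words
                    LogisticRegression())
clf.fit(docs, labels)
print(clf.predict(["u7 u3 u12"]))
```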
Our CTC word model achieves a word error rate of -13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or -decoder compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM. We also -present rescoring results on CTC word model lattices to quantify the -performance benefits of a LM, and contrast the performance of word and phone -CTC models. -" -4444,1703.07805,"Mayank Kejriwal, Pedro Szekely",Supervised Typing of Big Graphs using Semantic Embeddings,cs.CL cs.AI," We propose a supervised algorithm for generating type embeddings in the same -semantic vector space as a given set of entity embeddings. The algorithm is -agnostic to the derivation of the underlying entity embeddings. It does not -require any manual feature engineering, generalizes well to hundreds of types -and achieves near-linear scaling on Big Graphs containing many millions of -triples and instances by virtue of an incremental execution. We demonstrate the -utility of the embeddings on a type recommendation task, outperforming a -non-parametric feature-agnostic baseline while achieving 15x speedup and -near-constant memory usage on a full partition of DBpedia. Using -state-of-the-art visualization, we illustrate the agreement of our -extensionally derived DBpedia type embeddings with the manually curated domain -ontology. Finally, we use the embeddings to probabilistically cluster about 4 -million DBpedia instances into 415 types in the DBpedia ontology. -" -4445,1703.08002,"Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio",A network of deep neural networks for distant speech recognition,cs.CL cs.LG," Despite the remarkable progress recently made in distant speech recognition, -state-of-the-art technology still suffers from a lack of robustness, especially -when adverse acoustic conditions characterized by non-stationary noises and -reverberation are met. A prominent limitation of current systems lies in the -lack of matching and communication between the various technologies involved in -the distant speech recognition process. The speech enhancement and speech -recognition modules are, for instance, often trained independently. Moreover, -the speech enhancement normally helps the speech recognizer, but the output of -the latter is not commonly used, in turn, to improve the speech enhancement. To -address both concerns, we propose a novel architecture based on a network of -deep neural networks, where all the components are jointly trained and better -cooperate with each other thanks to a full communication scheme between them. -Experiments, conducted using different datasets, tasks and acoustic conditions, -revealed that the proposed framework can overtake other competitive solutions, -including recent joint training approaches. -" -4446,1703.08052,"Maja Rudolph, David Blei",Dynamic Bernoulli Embeddings for Language Evolution,stat.ML cs.CL," Word embeddings are a powerful approach for unsupervised analysis of -language. Recently, Rudolph et al. (2016) developed exponential family -embeddings, which cast word embeddings in a probabilistic framework. Here, we -develop dynamic embeddings, building on exponential family embeddings to -capture how the meanings of words change over time. We use dynamic embeddings -to analyze three large collections of historical texts: the U.S. Senate -speeches from 1858 to 2009, the history of computer science ACM abstracts from -1951 to 2014, and machine learning papers on the Arxiv from 2007 to 2015. 
We -find dynamic embeddings provide better fits than classical embeddings and -capture interesting patterns about how language changes. -" -4447,1703.08068,"Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow",Sequential Recurrent Neural Networks for Language Modeling,cs.CL," Feedforward Neural Network (FNN)-based language models estimate the -probability of the next word based on the history of the last N words, whereas -Recurrent Neural Networks (RNN) perform the same task based only on the last -word and some context information that cycles in the network. This paper -presents a novel approach, which bridges the gap between these two categories -of networks. In particular, we propose an architecture which takes advantage of -the explicit, sequential enumeration of the word history in FNN structure while -enhancing each word representation at the projection layer through recurrent -context information that evolves in the network. The context integration is -performed using an additional word-dependent weight matrix that is also learned -during the training. Extensive experiments conducted on the Penn Treebank (PTB) -and the Large Text Compression Benchmark (LTCB) corpus showed a significant -reduction of the perplexity when compared to state-of-the-art feedforward as -well as recurrent neural network architectures. -" -4448,1703.08084,"Jean-Benoit Delbrouck, Stephane Dupont","Multimodal Compact Bilinear Pooling for Multimodal Neural Machine - Translation",cs.CL," In state-of-the-art Neural Machine Translation, an attention mechanism is -used during decoding to enhance the translation. At every step, the decoder -uses this mechanism to focus on different parts of the source sentence to -gather the most useful information before outputting its target word. Recently, -the effectiveness of the attention mechanism has also been explored for -multimodal tasks, where it becomes possible to focus both on sentence parts and -image regions. Approaches to pool two modalities usually include element-wise -product, sum or concatenation. In this paper, we evaluate the more advanced -Multimodal Compact Bilinear pooling method, which takes the outer product of -two vectors to combine the attention features for the two modalities. This has -been previously investigated for visual question answering. We try out this -approach for multimodal image caption translation and show improvements -compared to basic combination methods. -" -4449,1703.08088,Vineet John,"Rapid-Rate: A Framework for Semi-supervised Real-time Sentiment Trend - Detection in Unstructured Big Data",cs.CL," Commercial establishments like restaurants, service centres and retailers -have several sources of customer feedback about products and services, most of -which need not be as structured as rated reviews provided by services like -Yelp, or Amazon, in terms of sentiment conveyed. For instance, Amazon provides -a fine-grained score on a numeric scale for product reviews. Some sources, -however, like social media (Twitter, Facebook), mailing lists (Google Groups) -and forums (Quora) contain text data that is much more voluminous, but -unstructured and unlabelled. It might be in the best interests of a business -establishment to assess the general sentiment towards their brand on these -platforms as well. This text could be pipelined into a system with a built-in -prediction model, with the objective of generating real-time graphs on opinion -and sentiment trends. 
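Entry 4448 evaluates Multimodal Compact Bilinear (MCB) pooling, which approximates the outer product of two feature vectors via Count Sketch projections combined by FFT convolution. A self-contained numerical sketch (dimensions are illustrative, not the paper's):

```python
# Compact bilinear pooling of two modalities, cf. entry 4448.
import numpy as np

def count_sketch(v, h, s, d):
    out = np.zeros(d)
    np.add.at(out, h, s * v)  # scatter signed entries into d buckets
    return out

rng = np.random.default_rng(0)
d, n1, n2 = 1024, 512, 512
x, y = rng.normal(size=n1), rng.normal(size=n2)      # text / image features
h1, s1 = rng.integers(0, d, n1), rng.choice([-1, 1], n1)
h2, s2 = rng.integers(0, d, n2), rng.choice([-1, 1], n2)

# Convolution of the two sketches equals the sketch of the outer product.
mcb = np.fft.irfft(np.fft.rfft(count_sketch(x, h1, s1, d)) *
                   np.fft.rfft(count_sketch(y, h2, s2, d)), n=d)
print(mcb.shape)  # (1024,) instead of a 512*512 outer product
```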
Although tasks like the one described
-above have been explored with respect to document classification problems in
-the past, the implementation described in this paper, by virtue of learning a
-continuous function rather than a discrete one, offers much more depth of
-insight compared to document classification approaches. This study aims to
-explore the validity of such a continuous-function prediction model for
-quantifying sentiment about an entity, without the additional overhead of
-manual labelling and computational preprocessing & feature extraction. This
-research project also aims to design and implement a re-usable document
-regression pipeline as a framework, Rapid-Rate, that can be used to predict
-document scores in real-time.
-"
-4450,1703.08098,Dat Quoc Nguyen,"A survey of embedding models of entities and relationships for knowledge
- graph completion",cs.CL cs.AI cs.IR," Knowledge graphs (KGs) of real-world facts about entities and their
-relationships are useful resources for a variety of natural language processing
-tasks. However, because knowledge graphs are typically incomplete, it is useful
-to perform knowledge graph completion or link prediction, i.e. predict whether
-a relationship not in the knowledge graph is likely to be true. This paper
-serves as a comprehensive survey of embedding models of entities and
-relationships for knowledge graph completion, summarizing up-to-date
-experimental results on standard benchmark datasets and pointing out potential
-future research directions.
-"
-4451,1703.08120,"Abhijit Sharang, Eric Lau",Recurrent and Contextual Models for Visual Question Answering,cs.CL cs.CV," We propose a series of recurrent and contextual neural network models for
-multiple choice visual question answering on the Visual7W dataset. Motivated by
-divergent trends in model complexities in the literature, we explore the
-balance between model expressiveness and simplicity by studying incrementally
-more complex architectures. We start with LSTM-encoding of input questions and
-answers; build on this with context generation by LSTM-encodings of neural
-image and question representations and attention over images; and evaluate the
-diversity and predictive power of our models and the ensemble thereof. All
-models are evaluated against a simple baseline inspired by the current
-state-of-the-art, consisting of a simple concatenation of bag-of-words and CNN
-representations for the text and images, respectively. Generally, we observe
-marked variation in image-reasoning performance between our models not obvious
-from their overall performance, as well as evidence of dataset bias. Our
-standalone models achieve accuracies up to $64.6\%$, while the ensemble of all
-models achieves the best accuracy of $66.67\%$, within $0.5\%$ of the current
-state-of-the-art for Visual7W.
-"
-4452,1703.08135,"Herman Kamper, Karen Livescu, Sharon Goldwater","An embedded segmental K-means model for unsupervised segmentation and
- clustering of speech",cs.CL cs.LG," Unsupervised segmentation and clustering of unlabelled speech are core
-problems in zero-resource speech processing. Most approaches lie at
-methodological extremes: some use probabilistic Bayesian models with
-convergence guarantees, while others opt for more efficient heuristic
-techniques. Despite competitive performance in previous work, the full Bayesian
-approach is difficult to scale to large speech corpora.
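Entry 4450 surveys embedding models for knowledge graph completion. For concreteness, here is a toy scoring function in the style of TransE, one of the classic models such surveys cover; the embeddings below are random placeholders, not trained:

```python
# TransE-style triple scoring, cf. the survey in entry 4450.
import numpy as np

rng = np.random.default_rng(0)
ent = {name: rng.normal(size=50) for name in ["paris", "france", "tokyo", "japan"]}
rel = {"capital_of": rng.normal(size=50)}

def transe_score(h, r, t):
    # TransE: plausible triples should satisfy h + r ~ t (small distance).
    return -np.linalg.norm(ent[h] + rel[r] - ent[t])

for t in ["france", "japan"]:
    print("paris capital_of", t, "->", round(transe_score("paris", "capital_of", t), 3))
```

Link prediction then amounts to ranking candidate tails (or heads) by this score; trained embeddings would make the true triple score highest.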
We introduce an -approximation to a recent Bayesian model that still has a clear objective -function but improves efficiency by using hard clustering and segmentation -rather than full Bayesian inference. Like its Bayesian counterpart, this -embedded segmental K-means model (ES-KMeans) represents arbitrary-length word -segments as fixed-dimensional acoustic word embeddings. We first compare -ES-KMeans to previous approaches on common English and Xitsonga data sets (5 -and 2.5 hours of speech): ES-KMeans outperforms a leading heuristic method in -word segmentation, giving similar scores to the Bayesian model while being 5 -times faster with fewer hyperparameters. However, its clusters are less pure -than those of the other models. We then show that ES-KMeans scales to larger -corpora by applying it to the 5 languages of the Zero Resource Speech Challenge -2017 (up to 45 hours), where it performs competitively compared to the -challenge baseline. -" -4453,1703.08136,"Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu","Visually grounded learning of keyword prediction from untranscribed - speech",cs.CL cs.CV," During language acquisition, infants have the benefit of visual cues to -ground spoken language. Robots similarly have access to audio and visual -sensors. Recent work has shown that images and spoken captions can be mapped -into a meaningful common space, allowing images to be retrieved using speech -and vice versa. In this setting of images paired with untranscribed spoken -captions, we consider whether computer vision systems can be used to obtain -textual labels for the speech. Concretely, we use an image-to-words multi-label -visual classifier to tag images with soft textual labels, and then train a -neural network to map from the speech to these soft targets. We show that the -resulting speech system is able to predict which words occur in an -utterance---acting as a spoken bag-of-words classifier---without seeing any -parallel speech and text. We find that the model often confuses semantically -related words, e.g. ""man"" and ""person"", making it even more effective as a -semantic keyword spotter. -" -4454,1703.08244,"Fabian Fl\""ock, Kenan Erdogan, Maribel Acosta","TokTrack: A Complete Token Provenance and Change Tracking Dataset for - the English Wikipedia",cs.CL," We present a dataset that contains every instance of all tokens (~ words) -ever written in undeleted, non-redirect English Wikipedia articles until -October 2016, in total 13,545,349,787 instances. Each token is annotated with -(i) the article revision it was originally created in, and (ii) lists with all -the revisions in which the token was ever deleted and (potentially) re-added -and re-deleted from its article, enabling a complete and straightforward -tracking of its history. This data would be exceedingly hard to create by an -average potential user as it is (i) very expensive to compute and as (ii) -accurately tracking the history of each token in revisioned documents is a -non-trivial task. Adapting a state-of-the-art algorithm, we have produced a -dataset that allows for a range of analyses and metrics, already popular in -research and going beyond, to be generated on complete-Wikipedia scale; -ensuring quality and allowing researchers to forego expensive text-comparison -computation, which so far has hindered scalable usage. 
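Entry 4452's ES-KMeans replaces full Bayesian inference with hard clustering over fixed-dimensional acoustic word embeddings. The toy sketch below captures only that core idea, using uniform resampling as a stand-in embedding and synthetic MFCC-like segments:

```python
# Hard clustering of fixed-dimensional segment embeddings, cf. entry 4452.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
segments = [rng.normal(size=(rng.integers(20, 80), 13)) for _ in range(100)]

def embed(seg, n=10):
    idx = np.linspace(0, len(seg) - 1, n).astype(int)  # uniform resampling
    return seg[idx].ravel()                            # fixed 10*13 dims

X = np.stack([embed(s) for s in segments])
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # cluster sizes
```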
We show how this data -enables, on token-level, computation of provenance, measuring survival of -content over time, very detailed conflict metrics, and fine-grained -interactions of editors like partial reverts, re-additions and other metrics, -in the process gaining several novel insights. -" -4455,1703.08314,"Joe Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Dan Marsden and - Robin Piedeleu",Interacting Conceptual Spaces I : Grammatical Composition of Concepts,cs.LO cs.CL," The categorical compositional approach to meaning has been successfully -applied in natural language processing, outperforming other models in -mainstream empirical language processing tasks. We show how this approach can -be generalized to conceptual space models of cognition. In order to do this, -first we introduce the category of convex relations as a new setting for -categorical compositional semantics, emphasizing the convex structure important -to conceptual space applications. We then show how to construct conceptual -spaces for various types such as nouns, adjectives and verbs. Finally we show -by means of examples how concepts can be systematically combined to establish -the meanings of composite phrases from the meanings of their constituent parts. -This provides the mathematical underpinnings of a new compositional approach to -cognition. -" -4456,1703.08324,"Ramon Ferrer-i-Cancho, Carlos Gomez-Rodriguez and J.L. Esteban",Are crossing dependencies really scarce?,physics.soc-ph cond-mat.stat-mech cs.CL physics.data-an," The syntactic structure of a sentence can be modelled as a tree, where -vertices correspond to words and edges indicate syntactic dependencies. It has -been claimed recurrently that the number of edge crossings in real sentences is -small. However, a baseline or null hypothesis has been lacking. Here we -quantify the amount of crossings of real sentences and compare it to the -predictions of a series of baselines. We conclude that crossings are really -scarce in real sentences. Their scarcity is unexpected by the hubiness of the -trees. Indeed, real sentences are close to linear trees, where the potential -number of crossings is maximized. -" -4457,1703.08428,"Justin Cranshaw, Emad Elwany, Todd Newman, Rafal Kocielnik, Bowen Yu, - Sandeep Soni, Jaime Teevan, Andr\'es Monroy-Hern\'andez","Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans - in the Loop",cs.HC cs.AI cs.CL," Although information workers may complain about meetings, they are an -essential part of their work life. Consequently, busy people spend a -significant amount of time scheduling meetings. We present Calendar.help, a -system that provides fast, efficient scheduling through structured workflows. -Users interact with the system via email, delegating their scheduling needs to -the system as if it were a human personal assistant. Common scheduling -scenarios are broken down using well-defined workflows and completed as a -series of microtasks that are automated when possible and executed by a human -otherwise. Unusual scenarios fall back to a trained human assistant who -executes them as unstructured macrotasks. We describe the iterative approach we -used to develop Calendar.help, and share the lessons learned from scheduling -thousands of meetings during a year of real-world deployments. Our findings -provide insight into how complex information tasks can be broken down into -repeatable components that can be executed efficiently to improve productivity. 
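Entry 4456 counts edge crossings in dependency trees. Two arcs cross exactly when one endpoint of one arc lies strictly inside the other arc's span while its other endpoint lies strictly outside; a direct O(n^2) count over toy arcs:

```python
# Counting crossing dependencies, cf. entry 4456. Arc positions are invented.
from itertools import combinations

arcs = [(1, 4), (2, 3), (3, 6), (5, 7)]  # (head, dependent) word positions

def crosses(e, f):
    (a, b), (c, d) = sorted(e), sorted(f)
    return (a < c < b < d) or (c < a < d < b)

print(sum(crosses(e, f) for e, f in combinations(arcs, 2)), "crossings")
```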
-" -4458,1703.08471,"Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio",Batch-normalized joint training for DNN-based distant speech recognition,cs.CL cs.LG," Improving distant speech recognition is a crucial step towards flexible -human-machine interfaces. Current technology, however, still exhibits a lack of -robustness, especially when adverse acoustic conditions are met. Despite the -significant progress made in the last years on both speech enhancement and -speech recognition, one potential limitation of state-of-the-art technology -lies in composing modules that are not well matched because they are not -trained jointly. To address this concern, a promising approach consists in -concatenating a speech enhancement and a speech recognition deep neural network -and to jointly update their parameters as if they were within a single bigger -network. Unfortunately, joint training can be difficult because the output -distribution of the speech enhancement system may change substantially during -the optimization procedure. The speech recognition module would have to deal -with an input distribution that is non-stationary and unnormalized. To mitigate -this issue, we propose a joint training approach based on a fully -batch-normalized architecture. Experiments, conducted using different datasets, -tasks and acoustic conditions, revealed that the proposed framework -significantly overtakes other competitive solutions, especially in challenging -environments. -" -4459,1703.08513,Stefan Heinrich and Stefan Wermter,"Interactive Natural Language Acquisition in a Multi-modal Recurrent - Neural Architecture",cs.CL q-bio.NC," For the complex human brain that enables us to communicate in natural -language, we gathered good understandings of principles underlying language -acquisition and processing, knowledge about socio-cultural conditions, and -insights about activity patterns in the brain. However, we were not yet able to -understand the behavioural and mechanistic characteristics for natural language -and how mechanisms in the brain allow to acquire and process language. In -bridging the insights from behavioural psychology and neuroscience, the goal of -this paper is to contribute a computational understanding of appropriate -characteristics that favour language acquisition. Accordingly, we provide -concepts and refinements in cognitive modelling regarding principles and -mechanisms in the brain and propose a neurocognitively plausible model for -embodied language acquisition from real world interaction of a humanoid robot -with its environment. In particular, the architecture consists of a continuous -time recurrent neural network, where parts have different leakage -characteristics and thus operate on multiple timescales for every modality and -the association of the higher level nodes of all modalities into cell -assemblies. The model is capable of learning language production grounded in -both, temporal dynamic somatosensation and vision, and features hierarchical -concept abstraction, concept decomposition, multi-modal integration, and -self-organisation of latent representations. -" -4460,1703.08537,"Victor Soto, Julia Hirschberg",Crowdsourcing Universal Part-Of-Speech Tags for Code-Switching,cs.CL," Code-switching is the phenomenon by which bilingual speakers switch between -multiple languages during communication. The importance of developing language -technologies for codeswitching data is immense, given the large populations -that routinely code-switch. 
High-quality linguistic annotations are extremely -valuable for any NLP task, and performance is often limited by the amount of -high-quality labeled data. However, little such data exists for code-switching. -In this paper, we describe crowd-sourcing universal part-of-speech tags for the -Miami Bangor Corpus of Spanish-English code-switched speech. We split the -annotation task into three subtasks: one in which a subset of tokens are -labeled automatically, one in which questions are specifically designed to -disambiguate a subset of high frequency words, and a more general cascaded -approach for the remaining data in which questions are displayed to the worker -following a decision tree structure. Each subtask is extended and adapted for a -multilingual setting and the universal tagset. The quality of the annotation -process is measured using hidden check questions annotated with gold labels. -The overall agreement between gold standard labels and the majority vote is -between 0.95 and 0.96 for just three labels and the average recall across -part-of-speech tags is between 0.87 and 0.99, depending on the task. -" -4461,1703.08544,"Joshua J. Michalenko, Andrew S. Lan, Richard G. Baraniuk",Data-Mining Textual Responses to Uncover Misconception Patterns,stat.ML cs.CL," An important, yet largely unstudied, problem in student data analysis is to -detect misconceptions from students' responses to open-response questions. -Misconception detection enables instructors to deliver more targeted feedback -on the misconceptions exhibited by many students in their class, thus improving -the quality of instruction. In this paper, we propose a new natural language -processing-based framework to detect the common misconceptions among students' -textual responses to short-answer questions. We propose a probabilistic model -for students' textual responses involving misconceptions and experimentally -validate it on a real-world student-response dataset. Experimental results show -that our proposed framework excels at classifying whether a response exhibits -one or more misconceptions. More importantly, it can also automatically detect -the common misconceptions exhibited across responses from multiple students to -multiple questions; this property is especially important at large scale, since -instructors will no longer need to manually specify all possible misconceptions -that students might exhibit. -" -4462,1703.08581,"Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen",Sequence-to-Sequence Models Can Directly Translate Foreign Speech,cs.CL cs.LG stat.ML," We present a recurrent encoder-decoder deep neural network architecture that -directly translates speech in one language into text in another. The model does -not explicitly transcribe the speech into text in the source language, nor does -it require supervision from the ground truth source language transcription -during training. We apply a slightly modified sequence-to-sequence with -attention architecture that has previously been used for speech recognition and -show that it can be repurposed for this more complex task, illustrating the -power of attention-based models. A single model trained end-to-end obtains -state-of-the-art performance on the Fisher Callhome Spanish-English speech -translation task, outperforming a cascade of independently trained -sequence-to-sequence speech recognition and machine translation models by 1.8 -BLEU points on the Fisher test set. 
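Entry 4460 measures crowd-annotation quality with hidden check questions carrying gold labels. A minimal sketch of that agreement computation (tokens and labels are invented):

```python
# Majority-vote vs. gold agreement, cf. the quality checks in entry 4460.
from collections import Counter

crowd = {"tok1": ["NOUN", "NOUN", "VERB"],
         "tok2": ["ADP", "ADP", "ADP"],
         "tok3": ["VERB", "NOUN", "VERB"]}
gold = {"tok1": "NOUN", "tok2": "ADP", "tok3": "VERB"}

majority = {t: Counter(v).most_common(1)[0][0] for t, v in crowd.items()}
agreement = sum(majority[t] == gold[t] for t in gold) / len(gold)
print("agreement with gold:", agreement)
```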
In addition, we find that making use of the
-training data in both languages by multi-task training sequence-to-sequence
-speech translation and recognition models with a shared encoder network can
-improve performance by a further 1.4 BLEU points.
-"
-4463,1703.08646,Yohan Jo,"Simplifying the Bible and Wikipedia Using Statistical Machine
- Translation",cs.CL," I started this work with the hope of generating a text synthesizer (like a
-musical synthesizer) that can imitate certain linguistic styles. Most of the
-report focuses on text simplification using statistical machine translation
-(SMT) techniques. I applied MOSES to a parallel corpus of the Bible (King James
-Version and Easy-to-Read Version) and that of Wikipedia articles (normal and
-simplified). I report the importance of the three main components of
-SMT---phrase translation, language model, and reordering---by changing their
-weights and comparing the resulting quality of simplified text in terms of
-METEOR and BLEU. Toward the end of the report, some examples of text
-""synthesized"" into the King James style are presented.
-"
-4464,1703.08701,Claudia Borg and Albert Gatt,"Morphological Analysis for the Maltese Language: The Challenges of a
- Hybrid System",cs.CL," Maltese is a morphologically rich language with a hybrid morphological system
-which features both concatenative and non-concatenative processes. This paper
-analyses the impact of this hybridity on the performance of machine learning
-techniques for morphological labelling and clustering. In particular, we
-analyse a dataset of morphologically related word clusters to evaluate the
-difference in results for concatenative and non-concatenative clusters. We also
-describe research carried out in morphological labelling, with a particular
-focus on the verb category. Two evaluations were carried out, one using an
-unseen dataset, and another one using a gold standard dataset which was
-manually labelled. The gold standard dataset was split into concatenative and
-non-concatenative to analyse the difference in results between the two
-morphological systems.
-"
-4465,1703.08705,"Sebastian Gehrmann, Franck Dernoncourt, Yeran Li, Eric T. Carlson, Joy
- T. Wu, Jonathan Welt, John Foote Jr., Edward T. Moseley, David W. Grant,
- Patrick D. Tyler, Leo Anthony Celi",Comparing Rule-Based and Deep Learning Models for Patient Phenotyping,cs.CL cs.AI cs.NE stat.ML," Objective: We investigate whether deep learning techniques for natural
-language processing (NLP) can be used efficiently for patient phenotyping.
-Patient phenotyping is a classification task for determining whether a patient
-has a medical condition, and is a crucial part of secondary analysis of
-healthcare data. We assess the performance of deep learning algorithms and
-compare them with classical NLP approaches.
- Materials and Methods: We compare convolutional neural networks (CNNs),
-n-gram models, and approaches based on cTAKES that extract pre-defined medical
-concepts from clinical notes and use them to predict patient phenotypes. The
-performance is tested on 10 different phenotyping tasks using 1,610 discharge
-summaries extracted from the MIMIC-III database.
- Results: CNNs outperform other phenotyping algorithms in all 10 tasks. The
-average F1-score of our model is 76 (PPV of 83, and sensitivity of 71) with our
-model having an F1-score up to 37 points higher than alternative approaches.
We
-additionally assess the interpretability of our model by presenting a method
-that extracts the most salient phrases for a particular prediction.
- Conclusion: We show that NLP methods based on deep learning improve the
-performance of patient phenotyping. Our CNN-based algorithm automatically
-learns the phrases associated with each patient phenotype. As such, it reduces
-the annotation complexity for clinical domain experts, who are normally
-required to develop task-specific annotation rules and identify relevant
-phrases. Our method performs well in terms of both performance and
-interpretability, which indicates that deep learning is an effective approach
-to patient phenotyping based on clinicians' notes.
-"
-4466,1703.08748,Lifeng Han,LEPOR: An Augmented Machine Translation Evaluation Metric,cs.CL," Machine translation (MT) has developed into one of the most active research
-topics in the natural language processing (NLP) literature. One important
-issue in MT is how to evaluate an MT system reliably and determine whether the
-translation system has improved. Traditional manual judgment methods are
-expensive, time-consuming, unrepeatable, and sometimes show low agreement. On
-the other hand, the popular automatic MT evaluation methods have some
-weaknesses. Firstly, they tend to perform well on language pairs with English
-as the target language, but poorly when English is the source. Secondly, some
-methods rely on many additional linguistic features to achieve good
-performance, which makes the metrics hard to replicate and apply to other
-language pairs. Thirdly, some popular metrics utilize an incomplete set of
-factors, which results in low performance on some practical tasks. In this
-thesis, to address the existing problems, we design novel MT evaluation
-methods and investigate their performance on different languages. Firstly, we
-design augmented factors to yield highly accurate evaluation. Secondly, we
-design a tunable evaluation model in which the weighting of factors can be
-optimized according to the characteristics of languages. Thirdly, in the
-enhanced version of our methods, we design concise linguistic features using
-part-of-speech (POS) tags to show that our methods can yield even higher
-performance when using some external linguistic resources. Finally, we report
-the practical performance of our metrics in the ACL-WMT workshop shared tasks,
-which shows that the proposed methods are robust across different languages.
-In addition, we also present some novel work on quality estimation of MT
-without using reference translations, including the use of Na\""ive Bayes (NB)
-probability models, support vector machine (SVM) classification algorithms,
-and CRFs.
-"
-4467,1703.08864,"Alexander G. Ororbia II, Tomas Mikolov, and David Reitter",Learning Simpler Language Models with the Differential State Framework,cs.CL," Learning useful information across long time lags is a critical and difficult
-problem for temporal neural models in tasks such as language modeling. Existing
-architectures that address the issue are often complex and costly to train. The
-Differential State Framework (DSF) is a simple and high-performing design that
-unifies previously introduced gated neural models. DSF models maintain
-longer-term memory by learning to interpolate between a fast-changing
-data-driven representation and a slowly changing, implicitly stable state. This
-requires hardly any more parameters than a classical, simple recurrent network.
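Entry 4466 studies automatic MT evaluation metrics. For orientation, here is the kind of n-gram baseline (BLEU via NLTK) that LEPOR-style metrics augment with further factors; the sentences are toy examples:

```python
# Sentence-level BLEU baseline for MT evaluation, cf. entry 4466.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

# Smoothing avoids zero scores on short sentences with missing n-gram orders.
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.3f}")
```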
-Within the DSF framework, a new architecture is presented, the Delta-RNN. In -language modeling at the word and character levels, the Delta-RNN outperforms -popular complex architectures, such as the Long Short Term Memory (LSTM) and -the Gated Recurrent Unit (GRU), and, when regularized, performs comparably to -several state-of-the-art baselines. At the subword level, the Delta-RNN's -performance is comparable to that of complex gated architectures. -" -4468,1703.08885,"Yusuke Watanabe, Bhuwan Dhingra, Ruslan Salakhutdinov",Question Answering from Unstructured Text by Retrieval and Comprehension,cs.CL," Open domain Question Answering (QA) systems must interact with external -knowledge sources, such as web pages, to find relevant information. Information -sources like Wikipedia, however, are not well structured and difficult to -utilize in comparison with Knowledge Bases (KBs). In this work we present a -two-step approach to question answering from unstructured text, consisting of a -retrieval step and a comprehension step. For comprehension, we present an RNN -based attention model with a novel mixture mechanism for selecting answers from -either retrieved articles or a fixed vocabulary. For retrieval we introduce a -hand-crafted model and a neural model for ranking relevant articles. We achieve -state-of-the-art performance on W IKI M OVIES dataset, reducing the error by -40%. Our experimental results further demonstrate the importance of each of the -introduced components. -" -4469,1703.09013,"Christina Niklaus, Bernhard Bermeitinger, Siegfried Handschuh, Andr\'e - Freitas",A Sentence Simplification System for Improving Relation Extraction,cs.CL," In this demo paper, we present a text simplification approach that is -directed at improving the performance of state-of-the-art Open Relation -Extraction (RE) systems. As syntactically complex sentences often pose a -challenge for current Open RE approaches, we have developed a simplification -framework that performs a pre-processing step by taking a single sentence as -input and using a set of syntactic-based transformation rules to create a -textual input that is easier to process for subsequently applied Open RE -systems. -" -4470,1703.09046,"Mika V. M\""antyl\""a, Nicole Novielli, Filippo Lanubile, Ma\""elick - Claes, Miikka Kuutila",Bootstrapping a Lexicon for Emotional Arousal in Software Engineering,cs.SE cs.CL," Emotional arousal increases activation and performance but may also lead to -burnout in software development. We present the first version of a Software -Engineering Arousal lexicon (SEA) that is specifically designed to address the -problem of emotional arousal in the software developer ecosystem. SEA is built -using a bootstrapping approach that combines word embedding model trained on -issue-tracking data and manual scoring of items in the lexicon. We show that -our lexicon is able to differentiate between issue priorities, which are a -source of emotional activation and then act as a proxy for arousal. The best -performance is obtained by combining SEA (428 words) with a previously created -general purpose lexicon by Warriner et al. (13,915 words) and it achieves -Cohen's d effect sizes up to 0.5. -" -4471,1703.09137,"Marc Tanti (1), Albert Gatt (1), Kenneth P. 
Camilleri (1) ((1) - University of Malta)",Where to put the Image in an Image Caption Generator,cs.NE cs.CL cs.CV," When a recurrent neural network language model is used for caption -generation, the image information can be fed to the neural network either by -directly incorporating it in the RNN -- conditioning the language model by -`injecting' image features -- or in a layer following the RNN -- conditioning -the language model by `merging' image features. While both options are attested -in the literature, there is as yet no systematic comparison between the two. In -this paper we empirically show that it is not especially detrimental to -performance whether one architecture is used or another. The merge architecture -does have practical advantages, as conditioning by merging allows the RNN's -hidden state vector to shrink in size by up to four times. Our results suggest -that the visual and linguistic modalities for caption generation need not be -jointly encoded by the RNN as that yields large, memory-intensive models with -few tangible advantages in performance; rather, the multimodal integration -should be delayed to a subsequent stage. -" -4472,1703.09398,Benjamin D. Horne and Sibel Adali,"This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive - Content in Text Body, More Similar to Satire than Real News",cs.SI cs.CL," The problem of fake news has gained a lot of attention as it is claimed to -have had a significant impact on 2016 US Presidential Elections. Fake news is -not a new problem and its spread in social networks is well-studied. Often an -underlying assumption in fake news discussion is that it is written to look -like real news, fooling the reader who does not check for reliability of the -sources or the arguments in its content. Through a unique study of three data -sets and features that capture the style and the language of articles, we show -that this assumption is not true. Fake news in most cases is more similar to -satire than to real news, leading us to conclude that persuasion in fake news -is achieved through heuristics rather than the strength of arguments. We show -overall title structure and the use of proper nouns in titles are very -significant in differentiating fake from real. This leads us to conclude that -fake news is targeted for audiences who are not likely to read beyond titles -and is aimed at creating mental associations between entities and claims. -" -4473,1703.09400,"Md Main Uddin Rony, Naeemul Hassan, Mohammad Yousuf","Diving Deep into Clickbaits: Who Use Them to What Extents in Which - Topics with What Effects?",cs.SI cs.CL," The use of alluring headlines (clickbait) to tempt the readers has become a -growing practice nowadays. For the sake of existence in the highly competitive -media industry, most of the on-line media including the mainstream ones, have -started following this practice. Although the wide-spread practice of clickbait -makes the reader's reliability on media vulnerable, a large scale analysis to -reveal this fact is still absent. In this paper, we analyze 1.67 million -Facebook posts created by 153 media organizations to understand the extent of -clickbait practice, its impact and user engagement by using our own developed -clickbait detection model. The model uses distributed sub-word embeddings -learned from a large corpus. The accuracy of the model is 98.3%. Powered with -this model, we further study the distribution of topics in clickbait and -non-clickbait contents. 
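Entry 4471 contrasts "inject" and "merge" conditioning for caption generators. The PyTorch sketch below shows the merge variant, where image features join the pipeline only after the RNN has encoded the caption prefix; all sizes are invented and the model is untrained:

```python
# "Merge" conditioning for caption generation, cf. entry 4471.
import torch
import torch.nn as nn

class MergeCaptioner(nn.Module):
    def __init__(self, vocab=1000, emb=128, hid=256, img=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid + img, vocab)  # merge point: RNN state + image

    def forward(self, tokens, img_feat):
        h, _ = self.rnn(self.embed(tokens))     # purely linguistic encoding
        merged = torch.cat([h[:, -1], img_feat], dim=-1)
        return self.out(merged)                 # next-word scores

model = MergeCaptioner()
logits = model(torch.randint(0, 1000, (2, 7)), torch.randn(2, 512))
print(logits.shape)  # (2, 1000)
```

Because the image vector never enters the RNN, the hidden state can stay small, which is the practical advantage the paper reports for merging.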
-" -4474,1703.09439,"Yichao Lu, Phillip Keung, Shaonan Zhang, Jason Sun, Vikas Bhardwaj",A practical approach to dialogue response generation in closed domains,cs.CL cs.NE," We describe a prototype dialogue response generation model for the customer -service domain at Amazon. The model, which is trained in a weakly supervised -fashion, measures the similarity between customer questions and agent answers -using a dual encoder network, a Siamese-like neural network architecture. -Answer templates are extracted from embeddings derived from past agent answers, -without turn-by-turn annotations. Responses to customer inquiries are generated -by selecting the best template from the final set of templates. We show that, -in a closed domain like customer service, the selected templates cover $>$70\% -of past customer inquiries. Furthermore, the relevance of the model-selected -templates is significantly higher than templates selected by a standard tf-idf -baseline. -" -4475,1703.09527,"Santiago Castro and Mat\'ias Cubero and Diego Garat and Guillermo - Moncecchi",Is This a Joke? Detecting Humor in Spanish Tweets,cs.CL cs.AI," While humor has been historically studied from a psychological, cognitive and -linguistic standpoint, its study from a computational perspective is an area -yet to be explored in Computational Linguistics. There exist some previous -works, but a characterization of humor that allows its automatic recognition -and generation is far from being specified. In this work we build a -crowdsourced corpus of labeled tweets, annotated according to its humor value, -letting the annotators subjectively decide which are humorous. A humor -classifier for Spanish tweets is assembled based on supervised learning, -reaching a precision of 84% and a recall of 69%. -" -4476,1703.09570,Taylor Arnold,A Tidy Data Model for Natural Language Processing using cleanNLP,cs.CL stat.CO," The package cleanNLP provides a set of fast tools for converting a textual -corpus into a set of normalized tables. The underlying natural language -processing pipeline utilizes Stanford's CoreNLP library, exposing a number of -annotation tasks for text written in English, French, German, and Spanish. -Annotators include tokenization, part of speech tagging, named entity -recognition, entity linking, sentiment analysis, dependency parsing, -coreference resolution, and information extraction. -" -4477,1703.09684,Kushal Kafle and Christopher Kanan,An Analysis of Visual Question Answering Algorithms,cs.CV cs.AI cs.CL," In visual question answering (VQA), an algorithm must answer text-based -questions about images. While multiple datasets for VQA have been created since -late 2014, they all have flaws in both their content and the way algorithms are -evaluated on them. As a result, evaluation scores are inflated and -predominantly determined by answering easier questions, making it difficult to -compare different methods. In this paper, we analyze existing VQA algorithms -using a new dataset. It contains over 1.6 million questions organized into 12 -different categories. We also introduce questions that are meaningless for a -given image to force a VQA system to reason about image content. We propose new -evaluation schemes that compensate for over-represented question-types and make -it easier to study the strengths and weaknesses of algorithms. 
We analyze the
-performance of both baseline and state-of-the-art VQA models, including
-multi-modal compact bilinear pooling (MCB), neural module networks, and
-recurrent answering units. Our experiments establish how attention helps
-certain categories more than others, determine which models work better than
-others, and explain how simple models (e.g. MLP) can surpass more complex
-models (MCB) by simply learning to answer large, easy question categories.
-"
-4478,1703.09749,"Kouakou Ive Arsene Koffi, Konan Marcellin Brou, Souleymane Oumtanaga","Developpement de Methodes Automatiques pour la Reutilisation des
- Composants Logiciels",cs.SE cs.CL cs.DB," The large amount of information and the increasing complexity of applications
-push developers towards stand-alone, reusable components from libraries and
-component markets. Our approach consists in developing methods to evaluate the
-quality of the software components in these libraries, and to optimize the
-financial cost and adaptation time of the selected components. Our objective
-function defines a metric that maximizes the value of software component
-quality while minimizing financial cost and maintenance time. This model
-should make it possible to classify the components and rank them in order to
-choose the most suitable one.
- KEYWORDS: method development, reuse, software components, component quality.
-"
-4479,1703.09817,"Einat Naaman, Yossi Adi, and Joseph Keshet",Learning Similarity Functions for Pronunciation Variations,cs.CL," A significant source of errors in Automatic Speech Recognition (ASR) systems
-is due to pronunciation variations which occur in spontaneous and
-conversational speech. Usually ASR systems use a finite lexicon that provides
-one or more pronunciations for each word. In this paper, we focus on learning a
-similarity function between two pronunciations. The pronunciations can be the
-canonical and the surface pronunciations of the same word or they can be two
-surface pronunciations of different words. This task generalizes problems such
-as lexical access (the problem of learning the mapping between words and their
-possible pronunciations), and defining word neighborhoods. It can also be used
-to dynamically increase the size of the pronunciation lexicon, or in predicting
-ASR errors. We propose two methods, which are based on recurrent neural
-networks, to learn the similarity function. The first is based on binary
-classification, and the second is based on learning the ranking of the
-pronunciations. We demonstrate the efficiency of our approach on the task of
-lexical access using a subset of the Switchboard conversational speech corpus.
-Results suggest that on this task our methods are superior to previous methods
-which are based on graphical Bayesian methods.
-"
-4480,1703.09825,Areej Alhothali and Jesse Hoey,"Semi-Supervised Affective Meaning Lexicon Expansion Using Semantic and
- Distributed Word Representations",cs.CL," In this paper, we propose an extension to graph-based sentiment lexicon
-induction methods by incorporating distributed and semantic word
-representations in building the similarity graph to expand a three-dimensional
-sentiment lexicon. We also implemented and evaluated the label propagation
-using four different word representations and similarity metrics.
Our
-comprehensive evaluation of the four approaches was performed on a single data
-set, demonstrating that all four methods can generate a significant number of
-new sentiment assignments with high accuracy. The highest correlation
-(tau=0.51) and the lowest error (mean absolute error < 1.1%) were obtained by
-combining both the semantic and the distributional features, outperforming the
-distributional-based and semantic-based label-propagation models and
-approaching a supervised algorithm.
-"
-4481,1703.09831,"Haonan Yu, Haichao Zhang, and Wei Xu","A Deep Compositional Framework for Human-like Language Acquisition in
- Virtual Environment",cs.CL cs.LG," We tackle a task where an agent learns to navigate in a 2D maze-like
-environment called XWORLD. In each session, the agent perceives a sequence of
-raw-pixel frames, a natural language command issued by a teacher, and a set of
-rewards. The agent learns the teacher's language from scratch in a grounded and
-compositional manner, such that after training it is able to correctly execute
-zero-shot commands: 1) the combination of words in the command never appeared
-before, and/or 2) the command contains new object concepts that are learned
-from another task but never learned from navigation. Our deep framework for the
-agent is trained end to end: it learns simultaneously the visual
-representations of the environment, the syntax and semantics of the language,
-and the action module that outputs actions. The zero-shot learning capability
-of our framework results from its compositionality and modularity with
-parameter tying. We visualize the intermediate outputs of the framework,
-demonstrating that the agent truly understands how to solve the problem. We
-believe that our results provide some preliminary insights on how to train an
-agent with similar abilities in a 3D environment.
-"
-4482,1703.09902,Albert Gatt and Emiel Krahmer,"Survey of the State of the Art in Natural Language Generation: Core
- tasks, applications and evaluation",cs.CL cs.AI cs.NE," This paper surveys the current state of the art in Natural Language
-Generation (NLG), defined as the task of generating text or speech from
-non-linguistic input. A survey of NLG is timely in view of the changes that the
-field has undergone over the past decade or so, especially in relation to new
-(usually data-driven) methods, as well as new applications of NLG technology.
-This survey therefore aims to (a) give an up-to-date synthesis of research on
-the core tasks in NLG and the architectures adopted in which such tasks are
-organised; (b) highlight a number of relatively recent research topics that
-have arisen partly as a result of growing synergies between NLG and other areas
-of artificial intelligence; (c) draw attention to the challenges in NLG
-evaluation, relating them to similar challenges faced in other areas of Natural
-Language Processing, with an emphasis on different evaluation methods and the
-relationships between them.
-"
-4483,1703.10065,Soumia Bougrine and Hadda Cherroun and Djelloul Ziadi,"Hierarchical Classification for Spoken Arabic Dialect Identification
- using Prosody: Case of Algerian Dialects",cs.CL," In daily communication, Arabs use local dialects which are hard to identify
-automatically using conventional classification methods. The challenging task
-of dialect identification becomes more complicated when dealing with
-under-resourced dialects belonging to the same country/region.
In this paper, we start by
-statistically analyzing Algerian dialects in order to capture their prosodic
-specificities, which are extracted at the utterance level after a
-coarse-grained consonant/vowel segmentation. According to these analysis
-findings, we propose a Hierarchical classification approach for spoken
-Arabic Algerian Dialect IDentification (HADID). It takes advantage of the
-fact that dialects are naturally structured into a hierarchy. Within HADID,
-a top-down hierarchical classification is applied, in which we use Deep
-Neural Networks (DNNs) to build a local classifier for every parent node in
-the dialect hierarchy. Our framework is implemented and evaluated on an
-Algerian Arabic dialect corpus, while the dialect hierarchy itself is
-deduced from historical and linguistic knowledge. The results reveal that
-within HADID, DNNs outperform Support Vector Machines as local classifiers.
-In addition, compared with a baseline Flat classification system, our HADID
-gives an improvement of 63.5% in terms of precision. Furthermore, the
-overall results evidence the suitability of our prosody-based HADID for
-speaker-independent dialect identification, while requiring less than 6 s of
-test utterance.
-"
-4484,1703.10090,Simon \v{S}uster and St\'ephan Tulkens and Walter Daelemans,
-"A Short Review of Ethical Challenges in Clinical Natural Language
- Processing",cs.CL cs.CY," Clinical NLP has an immense potential in
-contributing to how clinical practice will be revolutionized by the advent
-of large scale processing of clinical records. However, this potential has
-remained largely untapped due to slow progress primarily caused by strict
-data access policies for researchers. In this paper, we discuss the concern
-for privacy and the measures it entails. We also suggest sources of less
-sensitive data. Finally, we draw attention to biases that can compromise the
-validity of empirical research and lead to socially harmful applications.
-"
-4485,1703.10135,"Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron
- J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy
- Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous",
-Tacotron: Towards End-to-End Speech Synthesis,cs.CL cs.LG cs.SD," A
-text-to-speech synthesis system typically consists of multiple stages, such
-as a text analysis frontend, an acoustic model and an audio synthesis
-module. Building these components often requires extensive domain expertise
-and may contain brittle design choices. In this paper, we present Tacotron,
-an end-to-end generative text-to-speech model that synthesizes speech
-directly from characters. Given <text, audio> pairs, the model can be
-trained completely from scratch with random initialization. We present
-several key techniques to make the sequence-to-sequence framework perform
-well for this challenging task. Tacotron achieves a 3.82 subjective 5-scale
-mean opinion score on US English, outperforming a production parametric
-system in terms of naturalness. In addition, since Tacotron generates speech
-at the frame level, it's substantially faster than sample-level
-autoregressive methods.
-"
-4486,1703.10152,Haixia Liu,Automatic Argumentative-Zoning Using
- Word2vec,cs.CL," In comparison with document summarization of articles from
-social media and newswire, argumentative zoning (AZ) is an important task in
-scientific paper analysis.
The traditional methodology for carrying
-out this task relies on feature engineering at different levels. In this
-paper, three models for generating sentence vectors for the task of sentence
-classification were explored and compared. The proposed approach builds
-sentence representations using learned embeddings based on neural networks.
-The learned word embeddings form a feature space to which the examined
-sentence is mapped. Those features are input into the classifiers for
-supervised classification. Using a 10-fold cross-validation scheme,
-evaluation was conducted on the Argumentative-Zoning (AZ) annotated
-articles. The results showed that simply averaging the word vectors in a
-sentence works better than the paragraph-to-vector algorithm, and that
-integrating specific cue words into the loss function of the neural network
-can improve the classification performance. In comparison with the
-hand-crafted features, the word2vec method won for most of the categories.
-However, the hand-crafted features showed their strength in classifying some
-of the categories.
-"
-4487,1703.10186,"Will Monroe, Robert X.D. Hawkins, Noah D. Goodman,
- Christopher Potts","Colors in Context: A Pragmatic Neural Model for
- Grounded Language Understanding",cs.CL," We present a model of pragmatic
-referring expression interpretation in a grounded communication task
-(identifying colors from descriptions) that draws upon predictions from two
-recurrent neural network classifiers, a speaker and a listener, unified by a
-recursive pragmatic reasoning framework. Experiments show that this combined
-pragmatic model interprets color descriptions more accurately than the
-classifiers from which it is built, and that much of this improvement
-results from combining the speaker and listener perspectives. We observe
-that pragmatic reasoning helps primarily in the hardest cases: when the
-model must distinguish very similar colors, or when few utterances
-adequately express the target color. Our findings make use of a
-newly-collected corpus of human utterances in color reference games, which
-exhibit a variety of pragmatic behaviors. We also show that the embedded
-speaker model reproduces many of these pragmatic behaviors.
-"
-4488,1703.10252,"Dimitrios Kartsaklis, Sanjaye Ramgoolam, Mehrnoosh
- Sadrzadeh",Linguistic Matrix Theory,cs.CL hep-th math.CO," Recent research
-in computational linguistics has developed algorithms which associate
-matrices with adjectives and verbs, based on the distribution of words in a
-corpus of text. These matrices are linear operators on a vector space of
-context words. They are used to construct the meaning of composite
-expressions from that of the elementary constituents, forming part of a
-compositional distributional approach to semantics. We propose a Matrix
-Theory approach to this data, based on permutation symmetry along with
-Gaussian weights and their perturbations. A simple Gaussian model is tested
-against word matrices created from a large corpus of text. We characterize
-the cubic and quartic departures from the model, which we propose, alongside
-the Gaussian parameters, as signatures for comparison of linguistic corpora.
-We propose that perturbed Gaussian models with permutation symmetry provide
-a promising framework for characterizing the nature of universality in the
-statistical properties of word matrices. The matrix theory framework
-developed here exploits the view of statistics as zero-dimensional
-perturbative quantum field theory.
It perceives language as a physical
-system realizing a universality class of matrix statistics characterized by
-permutation symmetry.
-"
-4489,1703.10339,Besnik Fetahu and Katja Markert and Wolfgang Nejdl and
- Avishek Anand,Finding News Citations for Wikipedia,cs.IR cs.CL cs.SI," An
-important editing policy in Wikipedia is to provide citations for added
-statements in Wikipedia pages, where statements can be arbitrary pieces of
-text, ranging from a sentence to a paragraph. In many cases citations are
-either outdated or missing altogether.
- In this work we address the problem of finding and updating news citations
-for statements in entity pages. We propose a two-stage supervised approach
-for this problem. In the first step, we construct a classifier to find out
-whether statements need a news citation or other kinds of citations (web,
-book, journal, etc.). In the second step, we develop a news citation
-algorithm for Wikipedia statements, which recommends appropriate citations
-from a given news collection. Apart from IR techniques that use the
-statement to query the news collection, we also formalize three properties
-of an appropriate citation, namely: (i) the citation should entail the
-Wikipedia statement, (ii) the statement should be central to the citation,
-and (iii) the citation should be from an authoritative source.
- We perform an extensive evaluation of both steps, using 20 million articles
-from a real-world news collection. Our results are quite promising, and show
-that we can perform this task with high precision and at scale.
-"
-4490,1703.10344,Besnik Fetahu and Katja Markert and Avishek Anand,Automated
- News Suggestions for Populating Wikipedia Entity Pages,cs.IR cs.CL cs.SI,"
-Wikipedia entity pages are a valuable source of information for direct
-consumption and for knowledge-base construction, update and maintenance.
-Facts in these entity pages are typically supported by references. Recent
-studies show that as much as 20\% of the references are from online news
-sources. However, many entity pages are incomplete even if relevant
-information is already available in existing news articles. Even for the
-already present references, there is often a delay between the news article
-publication time and the reference time. In this work, we therefore look at
-Wikipedia through the lens of news and propose a novel news-article
-suggestion task to improve news coverage in Wikipedia, and reduce the lag of
-newsworthy references. Our work finds direct application, as a precursor, to
-Wikipedia page generation and knowledge-base acceleration tasks that rely on
-relevant and high quality input sources.
- We propose a two-stage supervised approach for suggesting news articles to
-entity pages for a given state of Wikipedia. First, we suggest news articles
-to Wikipedia entities (article-entity placement) relying on a rich set of
-features which take into account the \emph{salience} and \emph{relative
-authority} of entities, and the \emph{novelty} of news articles to entity
-pages. Second, we determine the exact section in the entity page for the
-input article (article-section placement) guided by class-based section
-templates. We perform an extensive evaluation of our approach based on
-ground-truth data that is extracted from external references in Wikipedia.
-We achieve a high precision value of up to 93\% in the \emph{article-entity}
-suggestion stage and up to 84\% for the \emph{article-section placement}.
Finally, we compare our approach
-against competitive baselines and show significant improvements.
-"
-4491,1703.10356,"Lior Fritz, David Burshtein",Simplified End-to-End MMI
- Training and Voting for ASR,cs.LG cs.CL cs.NE," A simplified speech
-recognition system that uses the maximum mutual information (MMI) criterion
-is considered. End-to-end training using gradient descent is suggested,
-similarly to the training of connectionist temporal classification (CTC). We
-use an MMI criterion with a simple language model in the training stage, and
-a standard HMM decoder. Our method compares favorably to CTC in terms of
-performance, robustness, decoding time, disk footprint and quality of
-alignments. The good alignments enable the use of a straightforward ensemble
-method, obtained by simply averaging the predictions of several neural
-network models that were trained separately end-to-end. The ensemble method
-yields a considerable reduction in the word error rate.
-"
-4492,1703.10476,"Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks,
- Mario Fritz, Bernt Schiele","Speaking the Same Language: Matching Machine to
- Human Captions by Adversarial Training",cs.CV cs.AI cs.CL," While strong
-progress has been made in image captioning over the last years, machine and
-human captions are still quite distinct. A closer look reveals that this is
-due to the deficiencies in the generated word distribution, vocabulary size,
-and strong bias in the generators towards frequent captions. Furthermore,
-humans -- rightfully so -- generate multiple, diverse captions, due to the
-inherent ambiguity in the captioning task which is not considered in today's
-systems.
- To address these challenges, we change the training objective of the
-caption generator from reproducing groundtruth captions to generating a set
-of captions that is indistinguishable from human generated captions. Instead
-of handcrafting such a learning target, we employ adversarial training in
-combination with an approximate Gumbel sampler to implicitly match the
-generated distribution to the human one. While our method achieves
-comparable performance to the state-of-the-art in terms of the correctness
-of the captions, we generate a set of diverse captions that are
-significantly less biased and match the word statistics better in several
-aspects.
-"
-4493,1703.10661,"Mithun Biswas, Rafiqul Islam, Gautam Kumar Shom, Md Shopon,
- Nabeel Mohammed, Sifat Momen, Md Anowarul Abedin","BanglaLekha-Isolated: A
- Comprehensive Bangla Handwritten Character Dataset",cs.CL," Bangla
-handwriting recognition is becoming a very important issue nowadays. It is
-potentially a very important task, especially for the Bangla-speaking
-population of Bangladesh and West Bengal. With that in mind, we introduce a
-comprehensive Bangla handwritten character dataset named
-BanglaLekha-Isolated. This dataset contains Bangla handwritten numerals,
-basic characters and compound characters. It was collected from multiple
-geographical locations within Bangladesh and includes samples from a variety
-of age groups. The dataset can also be used for other classification
-problems, e.g. gender, age, district. This is the largest dataset on Bangla
-handwritten characters yet.
-"
-4494,1703.10698,"Damian Ruck, R. Alexander Bentley, Alberto Acerbi, Philip
- Garnett and Daniel J. 
Hruschka",Neutral evolution and turnover over centuries of English word popularity,cs.CL physics.soc-ph," Here we test Neutral models against the evolution of English word frequency -and vocabulary at the population scale, as recorded in annual word frequencies -from three centuries of English language books. Against these data, we test -both static and dynamic predictions of two neutral models, including the -relation between corpus size and vocabulary size, frequency distributions, and -turnover within those frequency distributions. Although a commonly used Neutral -model fails to replicate all these emergent properties at once, we find that -modified two-stage Neutral model does replicate the static and dynamic -properties of the corpus data. This two-stage model is meant to represent a -relatively small corpus (population) of English books, analogous to a `canon', -sampled by an exponentially increasing corpus of books in the wider population -of authors. More broadly, this mode -- a smaller neutral model within a larger -neutral model -- could represent more broadly those situations where mass -attention is focused on a small subset of the cultural variants. -" -4495,1703.10722,"Oleksii Kuchaiev, Boris Ginsburg",Factorization tricks for LSTM networks,cs.CL cs.NE stat.ML," We present two simple ways of reducing the number of parameters and -accelerating the training of large Long Short-Term Memory (LSTM) networks: the -first one is ""matrix factorization by design"" of LSTM matrix into the product -of two smaller matrices, and the second one is partitioning of LSTM matrix, its -inputs and states into the independent groups. Both approaches allow us to -train large LSTM networks significantly faster to the near state-of the art -perplexity while using significantly less RNN parameters. -" -4496,1703.10724,"Ciprian Chelba, Mohammad Norouzi, Samy Bengio",N-gram Language Modeling using Recurrent Neural Network Estimation,cs.CL," We investigate the effective memory depth of RNN models by using them for -$n$-gram language model (LM) smoothing. - Experiments on a small corpus (UPenn Treebank, one million words of training -data and 10k vocabulary) have found the LSTM cell with dropout to be the best -model for encoding the $n$-gram state when compared with feed-forward and -vanilla RNN models. When preserving the sentence independence assumption the -LSTM $n$-gram matches the LSTM LM performance for $n=9$ and slightly -outperforms it for $n=13$. When allowing dependencies across sentence -boundaries, the LSTM $13$-gram almost matches the perplexity of the unlimited -history LSTM LM. - LSTM $n$-gram smoothing also has the desirable property of improving with -increasing $n$-gram order, unlike the Katz or Kneser-Ney back-off estimators. -Using multinomial distributions as targets in training instead of the usual -one-hot target is only slightly beneficial for low $n$-gram orders. - Experiments on the One Billion Words benchmark show that the results hold at -larger scale: while LSTM smoothing for short $n$-gram contexts does not provide -significant advantages over classic N-gram models, it becomes effective with -long contexts ($n > 5$); depending on the task and amount of data it can match -fully recurrent LSTM models at about $n=13$. This may have implications when -modeling short-format text, e.g. voice search/query LMs. 
- Building LSTM $n$-gram LMs may be appealing for some practical situations:
-the state in an $n$-gram LM can be succinctly represented with $(n-1)*4$
-bytes storing the identity of the words in the context, and batches of
-$n$-gram contexts can be processed in parallel. On the downside, the
-$n$-gram context encoding computed by the LSTM is discarded, making the
-model more expensive than a regular recurrent LSTM LM.
-"
-4497,1703.10772,"Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava and
- Dipti Misra Sharma","Joining Hands: Exploiting Monolingual Treebanks for
- Parsing of Code-mixing Data",cs.CL," In this paper, we propose efficient
-and less resource-intensive strategies for parsing of code-mixed data. These
-strategies are not constrained by in-domain annotations, rather they
-leverage pre-existing monolingual annotated resources for training. We show
-that these methods can produce significantly better results as compared to
-an informed baseline. We also present a data set of 450 Hindi and English
-code-mixed tweets of Hindi multilingual speakers for evaluation. The data
-set is manually annotated with Universal Dependencies.
-"
-4498,1703.10931,"Xingxing Zhang, Mirella Lapata",Sentence Simplification
- with Deep Reinforcement Learning,cs.CL cs.LG," Sentence simplification aims
-to make sentences easier to read and understand. Most recent approaches draw
-on insights from machine translation to learn simplification rewrites from
-monolingual corpora of complex and simple sentences. We address the
-simplification problem with an encoder-decoder model coupled with a deep
-reinforcement learning framework. Our model, which we call {\sc Dress} (as
-shorthand for {\bf D}eep {\bf RE}inforcement {\bf S}entence {\bf
-S}implification), explores the space of possible simplifications while
-learning to optimize a reward function that encourages outputs which are
-simple, fluent, and preserve the meaning of the input. Experiments on three
-datasets demonstrate that our model outperforms competitive simplification
-systems.
-"
-4499,1703.10960,"Tiancheng Zhao, Ran Zhao and Maxine Eskenazi","Learning
- Discourse-level Diversity for Neural Dialog Models using Conditional
- Variational Autoencoders",cs.CL cs.AI," While recent neural encoder-decoder
-models have shown great promise in modeling open-domain conversations, they
-often generate dull and generic responses. Unlike past work that has focused
-on diversifying the output of the decoder at word-level to alleviate this
-problem, we present a novel framework based on conditional variational
-autoencoders that captures the discourse-level diversity in the encoder. Our
-model uses latent variables to learn a distribution over potential
-conversational intents and generates diverse responses using only greedy
-decoders. We have further developed a novel variant that is integrated with
-linguistic prior knowledge for better performance. Finally, the training
-procedure is improved by introducing a bag-of-word loss. Our proposed models
-have been validated to generate significantly more diverse responses than
-baseline approaches and exhibit competence in discourse-level
-decision-making.
-"
-4500,1704.00016,Esra Akbas,Opinion Mining on Non-English Short
- Text,cs.CL cs.IR," As the type and the number of such venues increase,
-automated analysis of sentiment on textual resources has become an essential
-data mining task. In this paper, we investigate the problem of mining
-opinions in collections of informal short texts.
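The succinct state representation noted in the preceding paragraph is easy
to make concrete: an $n$-gram context is just $n-1$ vocabulary indices, each
storable in 4 bytes. A small sketch, with arbitrary context values:

import struct

def pack_context(word_ids):
    """Encode an (n-1)-word context as 4 bytes per vocabulary index."""
    return struct.pack(f"<{len(word_ids)}I", *word_ids)

def unpack_context(blob):
    return list(struct.unpack(f"<{len(blob) // 4}I", blob))

ctx = [10234, 77, 40001, 5]          # an n=5 context of vocabulary indices
blob = pack_context(ctx)
assert len(blob) == (5 - 1) * 4      # (n-1)*4 bytes, as stated above
assert unpack_context(blob) == ctx
print(len(blob), "bytes")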
Both positive and negative sentiment
-strength of texts are detected. We focus on a non-English language that has
-few resources for text mining. This approach would help enhance sentiment
-analysis in languages where a list of opinionated words does not exist. We
-propose a new method that projects the text into dense, low-dimensional
-feature vectors according to the sentiment strength of the words. We detect
-the mixture of positive and negative sentiments on a multi-variant scale.
-Empirical evaluation of the proposed framework on Turkish tweets shows that
-our approach achieves good results for opinion mining.
-"
-4501,1704.00051,"Danqi Chen, Adam Fisch, Jason Weston and Antoine
- Bordes",Reading Wikipedia to Answer Open-Domain Questions,cs.CL," This
-paper proposes to tackle open-domain question answering using Wikipedia as
-the unique knowledge source: the answer to any factoid question is a text
-span in a Wikipedia article. This task of machine reading at scale combines
-the challenges of document retrieval (finding the relevant articles) with
-that of machine comprehension of text (identifying the answer spans from
-those articles). Our approach combines a search component based on bigram
-hashing and TF-IDF matching with a multi-layer recurrent neural network
-model trained to detect answers in Wikipedia paragraphs. Our experiments on
-multiple existing QA datasets indicate that (1) both modules are highly
-competitive with respect to existing counterparts and (2) multitask learning
-using distant supervision on their combination is an effective complete
-system on this challenging task.
-"
-4502,1704.00052,"Katharina Kann, Ryan Cotterell, Hinrich Sch\""utze",One-Shot
- Neural Cross-Lingual Transfer for Paradigm Completion,cs.CL," We present a
-novel cross-lingual transfer method for paradigm completion, the task of
-mapping a lemma to its inflected forms, using a neural encoder-decoder
-model, the state of the art for the monolingual task. We use labeled data
-from a high-resource language to increase performance on a low-resource
-language. In experiments on 21 language pairs from four different language
-families, we obtain up to 58% higher accuracy than without transfer and show
-that even zero-shot and one-shot learning are possible. We further find that
-the degree of language relatedness strongly influences the ability to
-transfer morphological knowledge.
-"
-4503,1704.00057,"Layla El Asri and Hannes Schulz and Shikhar Sharma and
- Jeremie Zumer and Justin Harris and Emery Fine and Rahul Mehrotra and Kaheer
- Suleman",Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue
- Systems,cs.CL," This paper presents the Frames dataset (Frames is available
-at http://datasets.maluuba.com/Frames), a corpus of 1369 human-human
-dialogues with an average of 15 turns per dialogue. We developed this
-dataset to study the role of memory in goal-oriented dialogue systems. Based
-on Frames, we introduce a task called frame tracking, which extends state
-tracking to a setting where several states are tracked simultaneously. We
-propose a baseline model for this task. We show that Frames can also be used
-to study memory in dialogue management and information presentation through
-natural language generation.
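The retrieval component of the open-domain QA entry above (bigram hashing
plus TF-IDF matching) can be approximated in a few lines. This is a hedged
sketch, not the authors' implementation: the hashing dimension, the tiny
corpus, and the smoothing constants are illustrative.

import numpy as np
from collections import Counter

DIM = 2 ** 20                         # hash buckets for unigrams + bigrams

def features(text):
    toks = text.lower().split()
    grams = toks + [" ".join(p) for p in zip(toks, toks[1:])]
    return Counter(hash(g) % DIM for g in grams)

def tfidf_vectors(docs):
    tfs = [features(d) for d in docs]
    df = Counter(k for tf in tfs for k in tf)       # document frequencies
    n = len(docs)
    idf = {k: np.log((n + 1) / (v + 1)) + 1 for k, v in df.items()}
    vecs = []
    for tf in tfs:
        v = {k: c * idf[k] for k, c in tf.items()}
        norm = np.sqrt(sum(x * x for x in v.values())) or 1.0
        vecs.append({k: x / norm for k, x in v.items()})
    return vecs, idf

def query(q, vecs, idf):
    qv = {k: c * idf.get(k, 1.0) for k, c in features(q).items()}
    scores = [sum(qv.get(k, 0.0) * x for k, x in v.items()) for v in vecs]
    return int(np.argmax(scores))     # index of the best-matching document

docs = ["the capital of france is paris",
        "neural networks learn representations",
        "the eiffel tower is in paris france"]
vecs, idf = tfidf_vectors(docs)
print(query("what is the capital of france", vecs, idf))

Hashing the n-grams keeps the feature space a fixed size regardless of
vocabulary, at the cost of occasional bucket collisions.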
-" -4504,1704.00119,"Meysam Alizadeh, Ingmar Weber, Claudio Cioffi-Revilla, Santo - Fortunato, Michael Macy",Psychological and Personality Profiles of Political Extremists,cs.CL cs.CY cs.SI physics.soc-ph," Global recruitment into radical Islamic movements has spurred renewed -interest in the appeal of political extremism. Is the appeal a rational -response to material conditions or is it the expression of psychological and -personality disorders associated with aggressive behavior, intolerance, -conspiratorial imagination, and paranoia? Empirical answers using surveys have -been limited by lack of access to extremist groups, while field studies have -lacked psychological measures and failed to compare extremists with contrast -groups. We revisit the debate over the appeal of extremism in the U.S. context -by comparing publicly available Twitter messages written by over 355,000 -political extremist followers with messages written by non-extremist U.S. -users. Analysis of text-based psychological indicators supports the moral -foundation theory which identifies emotion as a critical factor in determining -political orientation of individuals. Extremist followers also differ from -others in four of the Big Five personality traits. -" -4505,1704.00135,Vadim Markovtsev and Eiso Kant,"Topic modeling of public repositories at scale using names in source - code",cs.PL cs.CL," Programming languages themselves have a limited number of reserved keywords -and character based tokens that define the language specification. However, -programmers have a rich use of natural language within their code through -comments, text literals and naming entities. The programmer defined names that -can be found in source code are a rich source of information to build a high -level understanding of the project. The goal of this paper is to apply topic -modeling to names used in over 13.6 million repositories and perceive the -inferred topics. One of the problems in such a study is the occurrence of -duplicate repositories not officially marked as forks (obscure forks). We show -how to address it using the same identifiers which are extracted for topic -modeling. - We open with a discussion on naming in source code, we then elaborate on our -approach to remove exact duplicate and fuzzy duplicate repositories using -Locality Sensitive Hashing on the bag-of-words model and then discuss our work -on topic modeling; and finally present the results from our data analysis -together with open-access to the source code, tools and datasets. -" -4506,1704.00177,Haixia Liu,Sentiment Analysis of Citations Using Word2vec,cs.CL," Citation sentiment analysis is an important task in scientific paper -analysis. Existing machine learning techniques for citation sentiment analysis -are focusing on labor-intensive feature engineering, which requires large -annotated corpus. As an automatic feature extraction tool, word2vec has been -successfully applied to sentiment analysis of short texts. In this work, I -conducted empirical research with the question: how well does word2vec work on -the sentiment analysis of citations? The proposed method constructed sentence -vectors (sent2vec) by averaging the word embeddings, which were learned from -Anthology Collections (ACL-Embeddings). I also investigated polarity-specific -word embeddings (PS-Embeddings) for classifying positive and negative -citations. The sentence vectors formed a feature space, to which the examined -citation sentence was mapped to. 
Those features were input into
-classifiers (support vector machines) for supervised classification. Using
-a 10-fold cross-validation scheme, evaluation was conducted on a set of
-annotated citations. The results showed that word embeddings are effective
-for classifying positive and negative citations. However, hand-crafted
-features performed better for the overall classification.
-"
-4507,1704.00200,"Amrita Saha, Mitesh Khapra, Karthik
- Sankaranarayanan","Towards Building Large Scale Multimodal Domain-Aware
- Conversation Systems",cs.CL," While multimodal conversation agents are
-gaining importance in several domains such as retail, travel etc., deep
-learning research in this area has been limited primarily due to the lack of
-availability of large-scale, open chatlogs. To overcome this bottleneck, in
-this paper we introduce the task of multimodal, domain-aware conversations,
-and propose the MMD benchmark dataset. This dataset was gathered by working
-in close coordination with a large number of domain experts in the retail
-domain. These experts suggested various conversation flows and dialog states
-which are typically seen in multimodal conversations in the fashion domain.
-Keeping these flows and states in mind, we created a dataset consisting of
-over 150K conversation sessions between shoppers and sales agents, with the
-help of in-house annotators, using a semi-automated, manually intensive,
-iterative process. With this dataset, we propose 5 new sub-tasks for
-multimodal conversations along with their evaluation methodology. We also
-propose two multimodal neural models in the encode-attend-decode paradigm
-and demonstrate their performance on two of the sub-tasks, namely text
-response generation and best image response selection. These experiments
-serve to establish baseline performance and open new research directions for
-each of these sub-tasks. Further, for each of the sub-tasks, we present a
-`per-state evaluation' of the 9 most significant dialog states, which would
-enable more focused research into understanding the challenges and
-complexities involved in each of these states.
-"
-4508,1704.00217,"Lianhui Qin, Zhisong Zhang, Hai Zhao, Zhiting Hu, Eric P.
- Xing","Adversarial Connective-exploiting Networks for Implicit Discourse
- Relation Classification",cs.CL cs.AI cs.LG stat.ML," Implicit discourse
-relation classification is highly challenging due to the lack of connectives
-as strong linguistic cues, which motivates the use of annotated implicit
-connectives to improve the recognition. We propose a feature imitation
-framework in which an implicit relation network is driven to learn from
-another neural network with access to connectives, and thus encouraged to
-extract similarly salient features for accurate classification. We develop
-an adversarial model to enable an adaptive imitation scheme through
-competition between the implicit network and a rival feature discriminator.
-Our method effectively transfers discriminability of connectives to the
-implicit features, and achieves state-of-the-art performance on the PDTB
-benchmark.
-"
-4509,1704.00253,"Jaehong Park, Jongyoon Song, Sungroh Yoon","Building a
- Neural Machine Translation System Using Only Synthetic Parallel
- Data",cs.CL," Recent works have shown that synthetic parallel data
-automatically generated by translation models can be effective for various
-neural machine translation (NMT) issues. In this study, we build NMT systems
-using only synthetic parallel data.
As an efficient alternative to real
-parallel data, we also present a new type of synthetic parallel corpus. The
-proposed pseudo parallel data are distinct from previous works in that
-ground truth and synthetic examples are mixed on both sides of sentence
-pairs. Experiments on Czech-German and French-German translations
-demonstrate the efficacy of the proposed pseudo parallel corpus, which shows
-not only enhanced results for bidirectional translation tasks but also
-substantial improvement with the aid of a ground truth real parallel corpus.
-"
-4510,1704.00380,"Junki Matsuo, Mamoru Komachi and Katsuhito
- Sudoh","Word-Alignment-Based Segment-Level Machine Translation Evaluation
- using Word Embeddings",cs.CL," One of the most important problems in
-machine translation (MT) evaluation is to evaluate the similarity between
-translation hypotheses with different surface forms from the reference,
-especially at the segment level. We propose to use word embeddings to
-perform word alignment for segment-level MT evaluation. We performed
-experiments with three types of alignment methods using word embeddings. We
-evaluated our proposed methods with various translation datasets.
-Experimental results show that our proposed methods outperform previous word
-embeddings-based methods.
-"
-4511,1704.00405,"Feng Qian, Lei Sha, Baobao Chang, Lu-chen Liu, Ming
- Zhang",Syntax Aware LSTM Model for Chinese Semantic Role Labeling,cs.CL,"
-For the semantic role labeling (SRL) task, both traditional methods and
-recent recurrent neural network (RNN) based methods utilize parsing
-information through feature engineering. In this paper, we propose the
-Syntax-Aware Long Short-Term Memory (SA-LSTM). The structure of SA-LSTM is
-modified according to dependency parsing information, so that parsing
-information is modeled directly through architecture engineering rather than
-feature engineering. We experimentally demonstrate that SA-LSTM gains its
-improvement from the model architecture. Furthermore, SA-LSTM significantly
-outperforms the state-of-the-art on CPB 1.0 according to Student's t-test
-($p<0.05$).
-"
-4512,1704.00440,"Yinfei Yang, Ani Nenkova","Combining Lexical and Syntactic
- Features for Detecting Content-dense Texts in News",cs.CL," Content-dense
-news report important factual information about an event in a direct,
-succinct manner. Information seeking applications such as information
-extraction, question answering and summarization normally assume all text
-they deal with is content-dense. Here we empirically test this assumption on
-news articles from the business, U.S. international relations, sports and
-science journalism domains. Our findings clearly indicate that about half of
-the news texts in our study are in fact not content-dense and motivate the
-development of a supervised content-density detector. We heuristically label
-a large training corpus for the task and train a two-layer classifying model
-based on lexical and unlexicalized syntactic features. On manually annotated
-data, we compare the performance of domain-specific classifiers, trained on
-data only from a given news domain, and a general classifier in which data
-from all four domains is pooled together. Our annotation and prediction
-experiments demonstrate that the concept of content density varies depending
-on the domain and that naive annotators provide judgements biased toward the
-stereotypical domain label.
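One simple greedy variant of the word-alignment scoring used in the
MT-evaluation entry above: align each hypothesis word to its most similar
reference word by cosine similarity and average the maxima. The random
embedding table is a stand-in for trained vectors, and this particular
scoring rule is an assumption for illustration, not necessarily the paper's
exact method.

import numpy as np

rng = np.random.default_rng(0)
words = "the cat sat on a mat feline rested".split()
E = {w: rng.normal(size=32) for w in words}   # stand-in word embeddings

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def alignment_score(hyp, ref):
    """Average best cosine match of each hypothesis word in the reference."""
    hyp, ref = hyp.split(), ref.split()
    sims = [max(cos(E[h], E[r]) for r in ref if r in E)
            for h in hyp if h in E]
    return sum(sims) / len(sims) if sims else 0.0

print(alignment_score("the feline rested on a mat",
                      "the cat sat on a mat"))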
Domain-specific classifiers are more
-accurate for domains in which content-dense texts are typically rarer.
-Domain-independent classifiers better reproduce naive crowdsourced
-judgements. Classification accuracy is high across all conditions, at
-around 80%.
-"
-4513,1704.00514,"Isabelle Augenstein, Anders S{\o}gaard",Multi-Task Learning
- of Keyphrase Boundary Classification,cs.CL cs.AI stat.ML," Keyphrase
-boundary classification (KBC) is the task of detecting keyphrases in
-scientific articles and labelling them with respect to predefined types.
-Although important in practice, this task is so far underexplored, partly
-due to the lack of labelled data. To overcome this, we explore several
-auxiliary tasks, including semantic super-sense tagging and identification
-of multi-word expressions, and cast the task as a multi-task learning
-problem with deep recurrent neural networks. Our multi-task models perform
-significantly better than previous state of the art approaches on two
-scientific KBC datasets, particularly for long keyphrases.
-"
-4514,1704.00552,"Daniel Hershcovich, Omri Abend and Ari Rappoport",A
- Transition-Based Directed Acyclic Graph Parser for UCCA,cs.CL," We present
-the first parser for UCCA, a cross-linguistically applicable framework for
-semantic representation, which builds on extensive typological work and
-supports rapid annotation. UCCA poses a challenge for existing parsing
-techniques, as it exhibits reentrancy (resulting in DAG structures),
-discontinuous structures and non-terminal nodes corresponding to complex
-semantic units. To our knowledge, the conjunction of these formal properties
-is not supported by any existing parser. Our transition-based parser, which
-uses a novel transition set and features based on bidirectional LSTMs, has
-value not just for UCCA parsing: its ability to handle more general graph
-structures can inform the development of parsers for other semantic DAG
-structures, and in languages that frequently use discontinuous structures.
-"
-4515,1704.00559,"Matthias Sperber, Graham Neubig, Jan Niehues, Alex
- Waibel",Neural Lattice-to-Sequence Models for Uncertain Inputs,cs.CL," The
-input to a neural sequence-to-sequence model is often determined by an
-up-stream system, e.g. a word segmenter, part of speech tagger, or speech
-recognizer. These up-stream models are potentially error-prone. Representing
-inputs through word lattices allows making this uncertainty explicit by
-capturing alternative sequences and their posterior probabilities in a
-compact form. In this work, we extend the TreeLSTM (Tai et al., 2015) into a
-LatticeLSTM that is able to consume word lattices, and can be used as
-encoder in an attentional encoder-decoder model. We integrate lattice
-posterior scores into this architecture by extending the TreeLSTM's
-child-sum and forget gates and introducing a bias term into the attention
-mechanism. We experiment with speech translation lattices and report
-consistent improvements over baselines that translate either the 1-best
-hypothesis or the lattice without posterior scores.
-"
-4516,1704.00656,"Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria
- Liakata, Rob Procter",Detection and Resolution of Rumours in Social Media:
- A Survey,cs.CL cs.HC cs.IR cs.SI," Despite the increasing use of social
-media platforms for information and news gathering, their unmoderated nature
-often leads to the emergence and spread of rumours, i.e. pieces of
-information that are unverified at the time of posting.
-At the same time, the openness of social media platforms provides
-opportunities to study how users share and discuss rumours, and to explore
-how natural language processing and data mining techniques may be used to
-find ways of determining their veracity. In this survey we introduce and
-discuss two types of rumours that circulate on social media: long-standing
-rumours that circulate for long periods of time, and newly-emerging rumours
-spawned during fast-paced events such as breaking news, where reports are
-released piecemeal and often with an unverified status in their early
-stages. We provide an overview of research into social media rumours with
-the ultimate goal of developing a rumour classification system that consists
-of four components: rumour detection, rumour tracking, rumour stance
-classification and rumour veracity classification. We delve into the
-approaches presented in the scientific literature for the development of
-each of these four components. We summarise the efforts and achievements so
-far towards the development of rumour classification systems and conclude
-with suggestions for avenues for future research in social media mining for
-detection and resolution of rumours.
-"
-4517,1704.00717,"Arjun Chandrasekaran, Deshraj Yadav, Prithvijit
- Chattopadhyay, Viraj Prabhu, Devi Parikh",It Takes Two to Tango: Towards
- Theory of AI's Mind,cs.CV cs.AI cs.CL," Theory of Mind is the ability to
-attribute mental states (beliefs, intents, knowledge, perspectives, etc.) to
-others and recognize that these mental states may differ from one's own.
-Theory of Mind is critical to effective communication and to teams
-demonstrating higher collective performance. To effectively leverage the
-progress in Artificial Intelligence (AI) to make our lives more productive,
-it is important for humans and AI to work well together in a team.
-Traditionally, there has been much emphasis on research to make AI more
-accurate, and (to a lesser extent) on having it better understand human
-intentions, tendencies, beliefs, and contexts. The latter involves making AI
-more human-like and having it develop a theory of our minds. In this work,
-we argue that for human-AI teams to be effective, humans must also develop a
-theory of AI's mind (ToAIM) - get to know its strengths, weaknesses,
-beliefs, and quirks. We instantiate these ideas within the domain of Visual
-Question Answering (VQA). We find that using just a few examples (50), lay
-people can be trained to better predict responses and oncoming failures of a
-complex VQA model. We further evaluate the role existing explanation (or
-interpretability) modalities play in helping humans build ToAIM. Explainable
-AI has received considerable scientific and popular attention in recent
-times. Surprisingly, we find that having access to the model's internal
-states - its confidence in its top-k predictions, explicit or implicit
-attention maps which highlight regions in the image (and words in the
-question) the model is looking at (and listening to) while answering a
-question about an image - does not help people better predict its behavior.
-"
-4518,1704.00774,"Alexandre Salle, Aline Villavicencio","Restricted Recurrent
- Neural Tensor Networks: Exploiting Word Frequency and
- Compositionality",cs.CL," Increasing the capacity of recurrent neural
-networks (RNN) usually involves augmenting the size of the hidden layer,
-with a significant increase in computational cost.
Recurrent neural tensor networks (RNTN) increase capacity -using distinct hidden layer weights for each word, but with greater costs in -memory usage. In this paper, we introduce restricted recurrent neural tensor -networks (r-RNTN) which reserve distinct hidden layer weights for frequent -vocabulary words while sharing a single set of weights for infrequent words. -Perplexity evaluations show that for fixed hidden layer sizes, r-RNTNs improve -language model performance over RNNs using only a small fraction of the -parameters of unrestricted RNTNs. These results hold for r-RNTNs using Gated -Recurrent Units and Long Short-Term Memory. -" -4519,1704.00784,"Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas - Eck",Online and Linear-Time Attention by Enforcing Monotonic Alignments,cs.LG cs.CL," Recurrent neural network models with an attention mechanism have proven to be -extremely effective on a wide variety of sequence-to-sequence problems. -However, the fact that soft attention mechanisms perform a pass over the entire -input sequence when producing each element in the output sequence precludes -their use in online settings and results in a quadratic time complexity. Based -on the insight that the alignment between input and output sequence elements is -monotonic in many problems of interest, we propose an end-to-end differentiable -method for learning monotonic alignments which, at test time, enables computing -attention online and in linear time. We validate our approach on sentence -summarization, machine translation, and online speech recognition problems and -achieve results competitive with existing sequence-to-sequence models. -" -4520,1704.00849,"Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, and Hsin-Min Wang","Voice Conversion from Unaligned Corpora using Variational Autoencoding - Wasserstein Generative Adversarial Networks",cs.CL," Building a voice conversion (VC) system from non-parallel speech corpora is -challenging but highly valuable in real application scenarios. In most -situations, the source and the target speakers do not repeat the same texts or -they may even speak different languages. In this case, one possible, although -indirect, solution is to build a generative model for speech. Generative models -focus on explaining the observations with latent variables instead of learning -a pairwise transformation function, thereby bypassing the requirement of speech -frame alignment. In this paper, we propose a non-parallel VC framework with a -variational autoencoding Wasserstein generative adversarial network (VAW-GAN) -that explicitly considers a VC objective when building the speech model. -Experimental results corroborate the capability of our framework for building a -VC system from unaligned data, and demonstrate improved conversion quality. -" -4521,1704.00898,"J Ganesh, Manish Gupta, Vasudeva Varma",Interpretation of Semantic Tweet Representations,cs.CL," Research in analysis of microblogging platforms is experiencing a renewed -surge with a large number of works applying representation learning models for -applications like sentiment analysis, semantic textual similarity computation, -hashtag prediction, etc. Although the performance of the representation -learning models has been better than the traditional baselines for such tasks, -little is known about the elementary properties of a tweet encoded within these -representations, or why particular representations work better for certain -tasks. 
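The restriction in the r-RNTN entry above is straightforward to sketch: the
K most frequent words each own a recurrent matrix, and all remaining words
share one. The sizes, K, and the simplified recurrence below are
illustrative, not the paper's exact parameterization.

import numpy as np

n, K, vocab_size = 16, 3, 10
rng = np.random.default_rng(0)
emb = rng.normal(size=(vocab_size, n)) * 0.1    # input word embeddings
per_word_W = rng.normal(size=(K, n, n)) * 0.1   # distinct weights, ids < K
shared_W = rng.normal(size=(n, n)) * 0.1        # single set for rare words

def step(h, word_id):
    """h' = tanh(W[word] @ h + emb[word]); W depends on frequency rank."""
    W = per_word_W[word_id] if word_id < K else shared_W
    return np.tanh(W @ h + emb[word_id])

h = np.zeros(n)
for wid in [0, 7, 2, 9]:     # ids 0..2 are "frequent", others share weights
    h = step(h, wid)
print(h[:4])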
Our work presented here constitutes the
-first step in opening the black box of vector embeddings for tweets.
-Traditional feature engineering methods for high-level applications have
-exploited various elementary properties of tweets. We believe that a tweet
-representation is effective for an application because it meticulously
-encodes the application-specific elementary properties of tweets. To
-understand the elementary properties encoded in a tweet representation, we
-evaluate the representations on the accuracy with which they can model each
-of those properties, such as tweet length, presence of particular words,
-hashtags, mentions, capitalization, etc. Our systematic, extensive study of
-nine supervised and four unsupervised tweet representations against the
-eight most popular textual and five social elementary properties reveals
-that Bi-directional LSTMs (BLSTMs) and Skip-Thought Vectors (STV) best
-encode the textual and social properties of tweets respectively. FastText is
-the best model for low resource settings, providing very little degradation
-with reduction in embedding size. Finally, we draw interesting insights by
-correlating the model performance obtained for elementary property
-prediction tasks with the high-level downstream applications.
-"
-4522,1704.00924,Ryosuke Miyazaki and Mamoru Komachi,"Japanese Sentiment
- Classification using a Tree-Structured Long Short-Term Memory with
- Attention",cs.CL," Previous approaches to training syntax-based sentiment
-classification models required phrase-level annotated corpora, which are not
-readily available in many languages other than English. Thus, we propose the
-use of tree-structured Long Short-Term Memory with an attention mechanism
-that pays attention to each subtree of the parse tree. Experimental results
-indicate that our model achieves the state-of-the-art performance in a
-Japanese sentiment classification task.
-"
-4523,1704.00939,"Youness Mansar, Lorenzo Gatti, Sira Ferradans, Marco
- Guerini, Jacopo Staiano","Fortia-FBK at SemEval-2017 Task 5: Bullish or
- Bearish? Inferring Sentiment towards Brands from Financial News
- Headlines",cs.CL cs.CY," In this paper, we describe a methodology to infer
-Bullish or Bearish sentiment towards companies/brands. More specifically,
-our approach leverages affective lexica and word embeddings in combination
-with convolutional neural networks to infer the sentiment of financial news
-headlines towards a target company. Such architecture was used and evaluated
-in the context of the SemEval 2017 challenge (task 5, subtask 2), in which
-it obtained the best performance.
-"
-4524,1704.01074,"Hao Zhou, Minlie Huang, Tianyang Zhang, Xiaoyan Zhu, Bing
- Liu","Emotional Chatting Machine: Emotional Conversation Generation with
- Internal and External Memory",cs.CL cs.AI," Perception and expression of
-emotion are key factors to the success of dialogue systems or conversational
-agents. However, this problem has not been studied in large-scale
-conversation generation so far. In this paper, we propose the Emotional
-Chatting Machine (ECM), which can generate appropriate responses not only in
-content (relevant and grammatical) but also in emotion (emotionally
-consistent). To the best of our knowledge, this is the first work that
-addresses the emotion factor in large-scale conversation generation.
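Probing for an elementary property, as in the tweet-representation study
just described, reduces to training a simple classifier from embeddings to
the property. In this sketch the embeddings are random with a planted
length signal, standing in for real tweet encoders; the property probed
("is a long tweet") and all sizes are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
lengths = rng.integers(5, 40, size=n)     # surrogate tweet lengths
emb = rng.normal(size=(n, 64))
emb[:, 0] += 0.1 * lengths                # plant a recoverable signal
y = (lengths > 20).astype(int)            # elementary property to probe

Xtr, Xte, ytr, yte = train_test_split(emb, y, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
print("probe accuracy:", probe.score(Xte, yte))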
ECM addresses
-the factor using three new mechanisms that respectively (1) model the
-high-level abstraction of emotion expressions by embedding emotion
-categories, (2) capture the change of implicit internal emotion states, and
-(3) use explicit emotion expressions with an external emotion vocabulary.
-Experiments show that the proposed model can generate responses appropriate
-not only in content but also in emotion.
-"
-4525,1704.01314,"Yan Shao and Christian Hardmeier and J\""org Tiedemann and
- Joakim Nivre","Character-based Joint Segmentation and POS Tagging for
- Chinese using Bidirectional RNN-CRF",cs.CL," We present a character-based
-model for joint segmentation and POS tagging for Chinese. The bidirectional
-RNN-CRF architecture for general sequence tagging is adapted and applied
-with novel vector representations of Chinese characters that capture rich
-contextual information and lower-than-character level features. The proposed
-model is extensively evaluated and compared with a state-of-the-art tagger
-respectively on CTB5, CTB9 and UD Chinese. The experimental results indicate
-that our model is accurate and robust across datasets in different sizes,
-genres and annotation schemes. We obtain state-of-the-art performance on
-CTB5, achieving 94.38 F1-score for joint segmentation and POS tagging.
-"
-4526,1704.01346,"Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier
- Schwab","CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism
- Detection Methods for Semantic Textual Similarity",cs.CL," We present our
-submitted systems for Semantic Textual Similarity (STS) Track 4 at
-SemEval-2017. Given a pair of Spanish-English sentences, each system must
-estimate their semantic similarity by a score between 0 and 5. In our
-submission, we use syntax-based, dictionary-based, context-based, and
-MT-based methods. We also combine these methods in unsupervised and
-supervised ways. Our best run ranked 1st on track 4a with a correlation of
-83.02% with human annotations.
-"
-4527,1704.01419,"Avo Murom\""agi, Kairit Sirts, Sven Laur",Linear Ensembles
- of Word Embedding Models,cs.CL," This paper explores linear methods for
-combining several word embedding models into an ensemble. We construct the
-combined models using an iterative method based on either ordinary least
-squares regression or the solution to the orthogonal Procrustes problem.
- We evaluate the proposed approaches on Estonian---a morphologically complex
-language, for which the available corpora for training word embeddings are
-relatively small. We compare both combined models with each other and with
-the input word embedding models using synonym and analogy tests. The results
-show that while ordinary least squares regression performs poorly in our
-experiments, using orthogonal Procrustes to combine several word embedding
-models into an ensemble model leads to 7-10% relative improvements over the
-mean result of the initial models in synonym tests and 19-47% in analogy
-tests.
-"
-4528,1704.01444,"Alec Radford, Rafal Jozefowicz, Ilya Sutskever",Learning to
- Generate Reviews and Discovering Sentiment,cs.LG cs.CL cs.NE," We explore
-the properties of byte-level recurrent language models. When given
-sufficient amounts of capacity, training data, and compute time, the
-representations learned by these models include disentangled features
-corresponding to high-level concepts. Specifically, we find a single unit
-which performs sentiment analysis.
These representations, learned in an
-unsupervised manner, achieve state of the art on the binary subset of the
-Stanford Sentiment Treebank. They are also very data efficient. When using
-only a handful of labeled examples, our approach matches the performance of
-strong baselines trained on full datasets. We also demonstrate that the
-sentiment unit has a direct influence on the generative process of the
-model. Simply fixing its value to be positive or negative generates samples
-with the corresponding positive or negative sentiment.
-"
-4529,1704.01523,"Ji Young Lee, Franck Dernoncourt, Peter Szolovits","MIT at
- SemEval-2017 Task 10: Relation Extraction with Convolutional Neural
- Networks",cs.CL cs.AI cs.NE stat.ML," Over 50 million scholarly articles
-have been published: they constitute a unique repository of knowledge. In
-particular, one may infer from them relations between scientific concepts,
-such as synonyms and hyponyms. Artificial neural networks have recently been
-explored for relation extraction. In this work, we continue this line of
-work and present a system based on a convolutional neural network to extract
-relations. Our model ranked first in the SemEval-2017 task 10 (ScienceIE)
-for relation extraction in scientific articles (subtask C).
-"
-4530,1704.01599,Christina Lioma and Birger Larsen and Wei Lu,Rhetorical
- relations for information retrieval,cs.IR cs.CL," Typically, every part in
-most coherent text has some plausible reason for its presence, some function
-that it performs to the overall semantics of the text. Rhetorical relations,
-e.g. contrast, cause, explanation, describe how the parts of a text are
-linked to each other. Knowledge about this so-called discourse structure has
-been applied successfully to several natural language processing tasks. This
-work studies the use of rhetorical relations for Information Retrieval
-(IR): Is there a correlation between certain rhetorical relations and
-retrieval performance? Can knowledge about a document's rhetorical relations
-be useful to IR? We present a language model modification that considers
-rhetorical relations when estimating the relevance of a document to a query.
-Empirical evaluation of different versions of our model on TREC settings
-shows that certain rhetorical relations can benefit retrieval effectiveness
-notably (> 10% in mean average precision over a state-of-the-art baseline).
-"
-4531,1704.01631,"Shubham Toshniwal, Hao Tang, Liang Lu, Karen
- Livescu","Multitask Learning with Low-Level Auxiliary Tasks for
- Encoder-Decoder Based Speech Recognition",cs.CL cs.AI," End-to-end training
-of deep learning-based models allows for implicit learning of intermediate
-representations based on the final task loss. However, the end-to-end
-approach ignores the useful domain knowledge encoded in explicit
-intermediate-level supervision. We hypothesize that using intermediate
-representations as auxiliary supervision at lower levels of deep networks
-may be a good way of combining the advantages of end-to-end training and
-more traditional pipeline approaches. We present experiments on
-conversational speech recognition where we use lower-level tasks, such as
-phoneme recognition, in a multitask training approach with an
-encoder-decoder model for direct character transcription. We compare
-multiple types of lower-level tasks and analyze the effects of the auxiliary
-tasks.
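The multitask setup in this entry can be summarized as a weighted sum of a
main-task loss at the top layer and an auxiliary loss at a lower layer. A
toy sketch with made-up shapes, targets, and weighting follows; the real
systems use sequence-level CTC/attention losses rather than single-frame
cross-entropy.

import numpy as np

rng = np.random.default_rng(0)
d, n_phone, n_char = 32, 10, 28
x = rng.normal(size=d)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wp, Wc = rng.normal(size=(n_phone, d)), rng.normal(size=(n_char, d))

def xent(logits, target):
    """Cross-entropy of a softmax over logits against an integer target."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[target])

h1 = np.tanh(W1 @ x)     # lower layer: receives auxiliary phoneme supervision
h2 = np.tanh(W2 @ h1)    # upper layer: main character-transcription task
loss = xent(Wc @ h2, target=3) + 0.3 * xent(Wp @ h1, target=7)
print("joint loss:", loss)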
Our results on the Switchboard corpus
-show that this approach improves recognition accuracy over a standard
-encoder-decoder model on the Eval2000 test set.
-"
-4532,1704.01653,"Yaniv Sheena, M\'i\v{s}a Hejn\'a, Yossi Adi, Joseph
- Keshet",Automatic Measurement of Pre-aspiration,cs.CL," Pre-aspiration is
-defined as the period of glottal friction occurring in sequences of
-vocalic/consonantal sonorants and phonetically voiceless obstruents. We
-propose two machine learning methods for automatic measurement of
-pre-aspiration duration: a feedforward neural network, which works at the
-frame level; and a structured prediction model, which relies on manually
-designed feature functions, and works at the segment level. The input for
-both algorithms is a speech signal of an arbitrary length containing a
-single obstruent, and the output is a pair of times which constitutes the
-pre-aspiration boundaries. We train both models on a set of manually
-annotated examples. Results suggest that the structured model is superior to
-the frame-based model as it yields higher accuracy in predicting the
-boundaries and generalizes to new speakers and new languages. Finally, we
-demonstrate the applicability of our structured prediction algorithm by
-replicating linguistic analysis of pre-aspiration in Aberystwyth English
-with high correlation.
-"
-4533,1704.01691,Chunting Zhou and Graham Neubig,"Multi-space Variational
- Encoder-Decoders for Semi-supervised Labeled Sequence Transduction",
-cs.CL cs.LG," Labeled sequence transduction is a task of transforming one
-sequence into another sequence that satisfies desiderata specified by a set
-of labels. In this paper we propose multi-space variational
-encoder-decoders, a new model for labeled sequence transduction with
-semi-supervised learning. The generative model can use neural networks to
-handle both discrete and continuous latent variables to exploit various
-features of data. Experiments show that our model provides not only a
-powerful supervised framework but also can effectively take advantage of the
-unlabeled data. On the SIGMORPHON morphological inflection benchmark, our
-model outperforms single-model state-of-the-art results by a large margin
-for the majority of languages.
-"
-4534,1704.01696,"Pengcheng Yin, Graham Neubig",A Syntactic Neural Model for
- General-Purpose Code Generation,cs.CL cs.PL cs.SE," We consider the problem
-of parsing natural language descriptions into source code written in a
-general-purpose programming language like Python. Existing data-driven
-methods treat this problem as a language generation task without considering
-the underlying syntax of the target programming language. Informed by
-previous work in semantic parsing, in this paper we propose a novel neural
-architecture powered by a grammar model to explicitly capture the target
-syntax as prior knowledge. Experiments find this an effective way to scale
-up to generation of complex programs from natural language descriptions,
-achieving state-of-the-art results that well outperform previous code
-generation and semantic parsing approaches.
-"
-4535,1704.01748,"Lu\'is Campos, Francisco Couto","MRA - Proof of Concept of
- a Multilingual Report Annotator Web Application",cs.CL," MRA (Multilingual
-Report Annotator) is a web application that translates Radiology text and
-annotates it with RadLex terms. Its goal is to explore translation of
-non-English radiology reports as a way around the problem that most text
-mining tools are developed for English.
In this
-brief paper we explain the language barrier problem and briefly describe the
-application. MRA can be found at https://github.com/lasigeBioTM/MRA .
-"
-4536,1704.01792,"Qingyu Zhou, Nan Yang, Furu Wei, Chuanqi Tan, Hangbo Bao, Ming Zhou",Neural Question Generation from Text: A Preliminary Study,cs.CL," Automatic question generation aims to generate questions from a text passage
-where the generated questions can be answered by certain sub-spans of the given
-passage. Traditional methods mainly use rigid heuristic rules to transform a
-sentence into related questions. In this work, we propose to apply the neural
-encoder-decoder model to generate meaningful and diverse questions from natural
-language sentences. The encoder reads the input text and the answer position,
-to produce an answer-aware input representation, which is fed to the decoder to
-generate an answer-focused question. We conduct a preliminary study on neural
-question generation from text with the SQuAD dataset, and the experiment
-results show that our method can produce fluent and diverse questions.
-"
-4537,1704.01938,Oded Avraham and Yoav Goldberg,The Interplay of Semantics and Morphology in Word Embeddings,cs.CL," We explore the ability of word embeddings to capture both semantic and
-morphological similarity, as affected by the different types of linguistic
-properties (surface form, lemma, morphological tag) used to compose the
-representation of each word. We train several models, where each uses a
-different subset of these properties to compose its representations. By
-evaluating the models on semantic and morphological measures, we reveal some
-useful insights on the relationship between semantics and morphology.
-"
-4538,1704.01975,"Eric S. Tellez, Daniela Moctezuma, Sabino Miranda-J\'imenez, Mario
- Graff","An Automated Text Categorization Framework based on Hyperparameter
- Optimization",cs.CL cs.AI stat.ML," A great variety of text tasks such as topic or spam identification, user
-profiling, and sentiment analysis can be posed as a supervised learning problem
-and tackled using a text classifier. A text classifier consists of several
-subprocesses, some of which are general enough to be applied to any supervised
-learning problem, whereas others are specifically designed to tackle a
-particular task, using complex and computationally expensive processes such as
-lemmatization, syntactic analysis, etc. Contrary to traditional approaches, we
-propose a minimalistic and wide system able to tackle text classification tasks
-independent of domain and language, namely microTC. It is composed of some
-easy-to-implement text transformations, text representations, and a supervised
-learning algorithm. These pieces produce a competitive classifier even in the
-domain of informally written text. We provide a detailed description of microTC
-along with an extensive experimental comparison with relevant state-of-the-art
-methods. microTC was compared on 30 different datasets. Regarding accuracy,
-microTC obtained the best performance in 20 datasets while achieving
-competitive results in the remaining 10. The compared datasets include several
-problems like topic and polarity classification, spam detection, user profiling
-and authorship attribution. Furthermore, it is important to state that our
-approach allows the usage of the technology even without knowledge of machine
-learning and natural language processing.
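- As a loose illustration of the minimalistic pipeline described above (this
-is not the authors' microTC code, and all data below is invented), a
-character n-gram TF-IDF representation feeding a linear classifier already
-forms a language- and domain-independent text categorizer:
-
-# Hedged sketch of a microTC-style pipeline (illustrative only).
-from sklearn.feature_extraction.text import TfidfVectorizer
-from sklearn.pipeline import make_pipeline
-from sklearn.svm import LinearSVC
-
-texts = ['great product, loved it', 'terrible, broke after a day',
-         'works as advertised', 'awful customer service']
-labels = [1, 0, 1, 0]  # toy polarity labels, invented for this sketch
-
-# Character n-grams need no tokenizer, lemmatizer or syntactic analysis,
-# so the same pipeline applies across domains and languages.
-model = make_pipeline(
-    TfidfVectorizer(analyzer='char_wb', ngram_range=(1, 4)),
-    LinearSVC(),
-)
-model.fit(texts, labels)
-print(model.predict(['loved the service', 'it broke immediately']))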
-" -4539,1704.02080,Vicky Zayats and Mari Ostendorf,Conversation Modeling on Reddit using a Graph-Structured LSTM,cs.CL," This paper presents a novel approach for modeling threaded discussions on -social media using a graph-structured bidirectional LSTM which represents both -hierarchical and temporal conversation structure. In experiments with a task of -predicting popularity of comments in Reddit discussions, the proposed model -outperforms a node-independent architecture for different sets of input -features. Analyses show a benefit to the model over the full course of the -discussion, improving detection in both early and late stages. Further, the use -of language cues with the bidirectional tree state updates helps with -identifying controversial comments. -" -4540,1704.02090,"Yi-Kun Tang, Xian-Ling Mao, Heyan Huang, Guihua Wen",Conceptualization Topic Modeling,cs.CL cs.IR," Recently, topic modeling has been widely used to discover the abstract topics -in text corpora. Most of the existing topic models are based on the assumption -of three-layer hierarchical Bayesian structure, i.e. each document is modeled -as a probability distribution over topics, and each topic is a probability -distribution over words. However, the assumption is not optimal. Intuitively, -it's more reasonable to assume that each topic is a probability distribution -over concepts, and then each concept is a probability distribution over words, -i.e. adding a latent concept layer between topic layer and word layer in -traditional three-layer assumption. In this paper, we verify the proposed -assumption by incorporating the new assumption in two representative topic -models, and obtain two novel topic models. Extensive experiments were conducted -among the proposed models and corresponding baselines, and the results show -that the proposed models significantly outperform the baselines in terms of -case study and perplexity, which means the new assumption is more reasonable -than traditional one. -" -4541,1704.02134,"Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Archna Bhatia, Na-Rae - Han, Tim O'Gorman, Sarah R. Moeller, Omri Abend, Adi Shalev, Austin Blodgett, - Jakob Prange",Adposition and Case Supersenses v2.6: Guidelines for English,cs.CL," This document offers a detailed linguistic description of SNACS (Semantic -Network of Adposition and Case Supersenses; Schneider et al., 2018), an -inventory of 52 semantic labels (""supersenses"") that characterize the use of -adpositions and case markers at a somewhat coarse level of granularity, as -demonstrated in the STREUSLE corpus (https://github.com/nert-nlp/streusle/ ; -version 4.5 tracks guidelines version 2.6). Though the SNACS inventory aspires -to be universal, this document is specific to English; documentation for other -languages will be published separately. - Version 2 is a revision of the supersense inventory proposed for English by -Schneider et al. (2015, 2016) (henceforth ""v1""), which in turn was based on -previous schemes. The present inventory was developed after extensive review of -the v1 corpus annotations for English, plus previously unanalyzed genitive case -possessives (Blodgett and Schneider, 2018), as well as consideration of -adposition and case phenomena in Hebrew, Hindi, Korean, and German. Hwang et -al. (2017) present the theoretical underpinnings of the v2 scheme. Schneider et -al. (2018) summarize the scheme, its application to English corpus data, and an -automatic disambiguation task. Liu et al. 
(2021) offer an English Lexical
-Semantic Recognition tagger that includes SNACS labels in its output.
- This documentation can also be browsed alongside corpus data on the Xposition
-website (Gessler et al., 2022): http://www.xposition.org/
-"
-4542,1704.02156,Rik van Noord and Johan Bos,"The Meaning Factory at SemEval-2017 Task 9: Producing AMRs with Neural
- Semantic Parsing",cs.CL," We evaluate a semantic parser based on a character-based sequence-to-sequence
-model in the context of the SemEval-2017 shared task on semantic parsing for
-AMRs. With data augmentation, super characters, and POS-tagging we gain major
-improvements in performance compared to a baseline character-level model.
-Although we improve on previous character-based neural semantic parsing models,
-the overall accuracy is still lower than a state-of-the-art AMR parser. An
-ensemble combining our neural semantic parser with an existing, traditional
-parser, yields a small gain in performance.
-"
-4543,1704.02215,"Steffen Eger, Erik-L\^an Do Dinh, Ilia Kuznetsov, Masoud Kiaeeha,
- Iryna Gurevych","EELECTION at SemEval-2017 Task 10: Ensemble of nEural Learners for
- kEyphrase ClassificaTION",cs.CL," This paper describes our approach to the SemEval 2017 Task 10: ""Extracting
-Keyphrases and Relations from Scientific Publications"", specifically to Subtask
-(B): ""Classification of identified keyphrases"". We explored three different
-deep learning approaches: a character-level convolutional neural network (CNN),
-a stacked learner with an MLP meta-classifier, and an attention-based Bi-LSTM.
-From these approaches, we created an ensemble of differently
-hyper-parameterized systems, achieving a micro-F1-score of 0.63 on the test
-data. Our approach ranks 2nd (score of 1st placed system: 0.64) out of four
-according to this official score. However, we erroneously trained 2 out of 3
-neural nets (the stacker and the CNN) on only roughly 15% of the full data,
-namely, the original development set. When trained on the full data
-(training+development), our ensemble has a micro-F1-score of 0.69. Our code is
-available from https://github.com/UKPLab/semeval2017-scienceie.
-"
-4544,1704.02263,"Edilson A. Corr\^ea Jr., Vanessa Queiroz Marinho, Leandro Borges dos
- Santos","NILC-USP at SemEval-2017 Task 4: A Multi-view Ensemble for Twitter
- Sentiment Analysis",cs.CL cs.LG," This paper describes our multi-view ensemble approach to SemEval-2017 Task 4
-on Sentiment Analysis in Twitter, specifically, the Message Polarity
-Classification subtask for English (subtask A). Our system is a voting
-ensemble, where each base classifier is trained in a different feature space.
-The first space is a bag-of-words model and has a Linear SVM as base
-classifier. The second and third spaces are two different strategies of
-combining word embeddings to represent sentences and use a Linear SVM and a
-Logistic Regressor as base classifiers. The proposed system was ranked 18th out
-of 38 systems considering F1 score and 20th considering recall.
-"
-4545,1704.02293,"Lo\""ic Vial and Andon Tchechmedjiev and Didier Schwab",Comparison of Global Algorithms in Word Sense Disambiguation,cs.CL," This article compares four probabilistic algorithms (global algorithms) for
-Word Sense Disambiguation (WSD) in terms of the number of scorer calls (local
-algorithm) and the F1 score as determined by a gold-standard scorer.
Two
-algorithms come from the state of the art, a Simulated Annealing Algorithm
-(SAA) and a Genetic Algorithm (GA), as well as two algorithms that we first
-adapt to WSD and that are state-of-the-art probabilistic search algorithms,
-namely a Cuckoo search algorithm (CSA) and a Bat Search algorithm (BS). As WSD
-requires evaluating exponentially many word sense combinations (with branching
-factors of up to 6 or more), probabilistic algorithms allow finding approximate
-solutions in tractable time by sampling the search space. We find that CSA, GA
-and SAA all eventually converge to similar results (0.98 F1 score), but CSA
-gets there faster and reaches 0.95 F1 in fewer scorer calls than SAA. In BS, a
-strict convergence criterion prevents it from reaching above 0.89 F1.
-"
-4546,1704.02298,"Rose Catherine, William Cohen",TransNets: Learning to Transform for Recommendation,cs.IR cs.CL cs.LG," Recently, deep learning methods have been shown to improve the performance of
-recommender systems over traditional methods, especially when review text is
-available. For example, a recent model, DeepCoNN, uses neural nets to learn one
-latent representation for the text of all reviews written by a target user, and
-a second latent representation for the text of all reviews for a target item,
-and then combines these latent representations to obtain state-of-the-art
-performance on recommendation tasks. We show that (unsurprisingly) much of the
-predictive value of review text comes from reviews of the target user for the
-target item. We then introduce a way in which this information can be used in
-recommendation, even when the target user's review for the target item is not
-available. Our model, called TransNets, extends the DeepCoNN model by
-introducing an additional latent layer representing the target user-target item
-pair. We then regularize this layer, at training time, to be similar to another
-latent representation of the target user's review of the target item. We show
-that TransNets and extensions of it improve substantially over the previous
-state-of-the-art.
-"
-4547,1704.02312,"Yaoyuan Zhang, Zhenxu Ye, Yansong Feng, Dongyan Zhao, Rui Yan","A Constrained Sequence-to-Sequence Neural Model for Sentence
- Simplification",cs.CL cs.AI cs.NE," Sentence simplification reduces semantic complexity to benefit people with
-language impairments. Previous simplification studies on the sentence level and
-word level have achieved promising results but also meet great challenges. For
-sentence-level studies, sentences after simplification are fluent but sometimes
-are not really simplified. For word-level studies, words are simplified but
-also have potential grammar errors due to different usages of words before and
-after simplification. In this paper, we propose a two-step simplification
-framework by combining both the word-level and the sentence-level
-simplifications, making use of their corresponding advantages. Based on the
-two-step framework, we implement a novel constrained neural generation model to
-simplify sentences given simplified words. The final results on Wikipedia and
-Simple Wikipedia aligned datasets indicate that our method yields better
-performance than various baselines.
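- To make the two-step framework above concrete, here is a toy sketch (not the
-paper's model; the lexicon and function names are invented): step one performs
-word-level simplification from a substitution lexicon, while step two, which
-in the paper is a constrained neural sentence generator, is only stubbed:
-
-# Toy illustration of a two-step simplification pipeline (assumptions only).
-SIMPLE_LEXICON = {'utilize': 'use', 'commence': 'begin', 'terminate': 'end'}
-
-def simplify_words(sentence):
-    # Step 1: word-level simplification via dictionary substitution.
-    tokens = sentence.lower().split()
-    return ' '.join(SIMPLE_LEXICON.get(tok, tok) for tok in tokens)
-
-def simplify_sentence(sentence):
-    # Step 2 placeholder: the paper instead rewrites the whole sentence with
-    # a constrained sequence-to-sequence model given the simplified words.
-    return simplify_words(sentence)
-
-print(simplify_sentence('We commence the test and terminate it early'))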
-" -4548,1704.02360,"Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, and Hiroshi - Saruwatari","Voice Conversion Using Sequence-to-Sequence Learning of Context - Posterior Probabilities",cs.SD cs.CL cs.LG," Voice conversion (VC) using sequence-to-sequence learning of context -posterior probabilities is proposed. Conventional VC using shared context -posterior probabilities predicts target speech parameters from the context -posterior probabilities estimated from the source speech parameters. Although -conventional VC can be built from non-parallel data, it is difficult to convert -speaker individuality such as phonetic property and speaking rate contained in -the posterior probabilities because the source posterior probabilities are -directly used for predicting target speech parameters. In this work, we assume -that the training data partly include parallel speech data and propose -sequence-to-sequence learning between the source and target posterior -probabilities. The conversion models perform non-linear and variable-length -transformation from the source probability sequence to the target one. Further, -we propose a joint training algorithm for the modules. In contrast to -conventional VC, which separately trains the speech recognition that estimates -posterior probabilities and the speech synthesis that predicts target speech -parameters, our proposed method jointly trains these modules along with the -proposed probability conversion modules. Experimental results demonstrate that -our approach outperforms the conventional VC. -" -4549,1704.02362,"Zhe Liu, Anbang Xu, Mengdi Zhang, Jalal Mahmud and Vibha Sinha","Fostering User Engagement: Rhetorical Devices for Applause Generation - Learnt from TED Talks",cs.CL," One problem that every presenter faces when delivering a public discourse is -how to hold the listeners' attentions or to keep them involved. Therefore, many -studies in conversation analysis work on this issue and suggest qualitatively -con-structions that can effectively lead to audience's applause. To investigate -these proposals quantitatively, in this study we an-alyze the transcripts of -2,135 TED Talks, with a particular fo-cus on the rhetorical devices that are -used by the presenters for applause elicitation. Through conducting regression -anal-ysis, we identify and interpret 24 rhetorical devices as triggers of -audience applauding. We further build models that can rec-ognize -applause-evoking sentences and conclude this work with potential implications. -" -4550,1704.02385,Luis Gerardo Mojica,"A Trolling Hierarchy in Social Media and A Conditional Random Field For - Trolling Detection",cs.CL," An-ever increasing number of social media websites, electronic newspapers and -Internet forums allow visitors to leave comments for others to read and -interact. This exchange is not free from participants with malicious -intentions, which do not contribute with the written conversation. Among -different communities users adopt strategies to handle such users. In this -paper we present a comprehensive categorization of the trolling phenomena -resource, inspired by politeness research and propose a model that jointly -predicts four crucial aspects of trolling: intention, interpretation, intention -disclosure and response strategy. Finally, we present a new annotated dataset -containing excerpts of conversations involving trolls and the interactions with -other users that we hope will be a useful resource for the research community. 
-" -4551,1704.02497,Steffen Eger and Alexander Mehler,"On the Linearity of Semantic Change: Investigating Meaning Variation via - Dynamic Graph Models",cs.CL," We consider two graph models of semantic change. The first is a time-series -model that relates embedding vectors from one time period to embedding vectors -of previous time periods. In the second, we construct one graph for each word: -nodes in this graph correspond to time points and edge weights to the -similarity of the word's meaning across two time points. We apply our two -models to corpora across three different languages. We find that semantic -change is linear in two senses. Firstly, today's embedding vectors (= meaning) -of words can be derived as linear combinations of embedding vectors of their -neighbors in previous time periods. Secondly, self-similarity of words decays -linearly in time. We consider both findings as new laws/hypotheses of semantic -change. -" -4552,1704.02565,Dafydd Gibbon,Prosody: The Rhythms and Melodies of Speech,cs.CL," The present contribution is a tutorial on selected aspects of prosody, the -rhythms and melodies of speech, based on a course of the same name at the -Summer School on Contemporary Phonetics and Phonology at Tongji University, -Shanghai, China in July 2016. The tutorial is not intended as an introduction -to experimental methodology or as an overview of the literature on the topic, -but as an outline of observationally accessible aspects of fundamental -frequency and timing patterns with the aid of computational visualisation, -situated in a semiotic framework of sign ranks and interpretations. After an -informal introduction to the basic concepts of prosody in the introduction and -a discussion of the place of prosody in the architecture of language, a -selection of acoustic phonetic topics in phonemic tone and accent prosody, word -prosody, phrasal prosody and discourse prosody are discussed, and a stylisation -method for visualising aspects of prosody is introduced. Examples are taken -from a number of typologically different languages: Anyi/Agni (Niger-Congo>Kwa, -Ivory Coast), English, Kuki-Thadou (Sino-Tibetan, North-East India and -Myanmar), Mandarin Chinese, Tem (Niger-Congo>Gur, Togo) and Farsi. The main -focus is on fundamental frequency patterns, but issues of timing and rhythm are -also discussed. In the final section, further reading and possible future -research directions are outlined. -" -4553,1704.02686,Eric Bailey and Shuchin Aeron,Word Embeddings via Tensor Factorization,stat.ML cs.CL cs.LG," Most popular word embedding techniques involve implicit or explicit -factorization of a word co-occurrence based matrix into low rank factors. In -this paper, we aim to generalize this trend by using numerical methods to -factor higher-order word co-occurrence based arrays, or \textit{tensors}. We -present four word embeddings using tensor factorization and analyze their -advantages and disadvantages. One of our main contributions is a novel joint -symmetric tensor factorization technique related to the idea of coupled tensor -factorization. We show that embeddings based on tensor factorization can be -used to discern the various meanings of polysemous words without being -explicitly trained to do so, and motivate the intuition behind why this works -in a way that doesn't with existing methods. 
We also modify an existing word -embedding evaluation metric known as Outlier Detection [Camacho-Collados and -Navigli, 2016] to evaluate the quality of the order-$N$ relations that a word -embedding captures, and show that tensor-based methods outperform existing -matrix-based methods at this task. Experimentally, we show that all of our word -embeddings either outperform or are competitive with state-of-the-art baselines -commonly used today on a variety of recent datasets. Suggested applications of -tensor factorization-based word embeddings are given, and all source code and -pre-trained vectors are publicly available online. -" -4554,1704.02709,"Quynh Ngoc Thi Do, Steven Bethard, Marie-Francine Moens","Improving Implicit Semantic Role Labeling by Predicting Semantic Frame - Arguments",cs.CL," Implicit semantic role labeling (iSRL) is the task of predicting the semantic -roles of a predicate that do not appear as explicit arguments, but rather -regard common sense knowledge or are mentioned earlier in the discourse. We -introduce an approach to iSRL based on a predictive recurrent neural semantic -frame model (PRNSFM) that uses a large unannotated corpus to learn the -probability of a sequence of semantic arguments given a predicate. We leverage -the sequence probabilities predicted by the PRNSFM to estimate selectional -preferences for predicates and their arguments. On the NomBank iSRL test set, -our approach improves state-of-the-art performance on implicit semantic role -labeling with less reliance than prior work on manually constructed language -resources. -" -4555,1704.02788,"Chuanqi Tan, Furu Wei, Pengjie Ren, Weifeng Lv, Ming Zhou",Entity Linking for Queries by Searching Wikipedia Sentences,cs.CL," We present a simple yet effective approach for linking entities in queries. -The key idea is to search sentences similar to a query from Wikipedia articles -and directly use the human-annotated entities in the similar sentences as -candidate entities for the query. Then, we employ a rich set of features, such -as link-probability, context-matching, word embeddings, and relatedness among -candidate entities as well as their related entities, to rank the candidates -under a regression based framework. The advantages of our approach lie in two -aspects, which contribute to the ranking process and final linking result. -First, it can greatly reduce the number of candidate entities by filtering out -irrelevant entities with the words in the query. Second, we can obtain the -query sensitive prior probability in addition to the static link-probability -derived from all Wikipedia articles. We conduct experiments on two benchmark -datasets on entity linking for queries, namely the ERD14 dataset and the GERDAQ -dataset. Experimental results show that our method outperforms state-of-the-art -systems and yields 75.0% in F1 on the ERD14 dataset and 56.9% on the GERDAQ -dataset. -" -4556,1704.02813,"Lyan Verwimp, Joris Pelemans, Hugo Van hamme and Patrick Wambacq",Character-Word LSTM Language Models,cs.CL," We present a Character-Word Long Short-Term Memory Language Model which both -reduces the perplexity with respect to a baseline word-level language model and -reduces the number of parameters of the model. Character information can reveal -structural (dis)similarities between words and can even be used when a word is -out-of-vocabulary, thus improving the modeling of infrequent and unknown words. 
-By concatenating word and character embeddings, we achieve up to 2.77% relative
-improvement on English compared to a baseline model with a similar number of
-parameters and 4.57% on Dutch. Moreover, we also outperform baseline word-level
-models with a larger number of parameters.
-"
-4557,1704.02841,"Maria Chiara Caschera, Fernando Ferri, Patrizia Grifoni",From Modal to Multimodal Ambiguities: a Classification Approach,cs.HC cs.CL," This paper deals with classifying ambiguities for Multimodal Languages. It
-evolves the classifications and the methods of the literature on ambiguities
-for Natural Language and Visual Language, empirically defining an original
-classification of ambiguities for multimodal interaction using a linguistic
-perspective. This classification distinguishes between Semantic and Syntactic
-multimodal ambiguities and their subclasses, which are intercepted using a
-rule-based method implemented in a software module. The experiments achieved an
-accuracy of the obtained classification, compared to the expected one defined
-by human judgment, of 94.6% for the semantic ambiguity classes and 92.1% for
-the syntactic ambiguity classes.
-"
-4558,1704.02853,"Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman,
- Andrew McCallum","SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations
- from Scientific Publications",cs.CL cs.AI stat.ML," We describe the SemEval task of extracting keyphrases and relations between
-them from scientific documents, which is crucial for understanding which
-publications describe which processes, tasks and materials. Although this was a
-new task, we had a total of 26 submissions across 3 evaluation scenarios. We
-expect the task and the findings reported in this paper to be relevant for
-researchers working on understanding scientific content, as well as the broader
-knowledge base population and information extraction communities.
-"
-4559,1704.02923,"Ionut Sorodoc, Sandro Pezzelle, Aur\'elie Herbelot, Mariella
- Dimiccoli, Raffaella Bernardi",Pay Attention to Those Sets! Learning Quantification from Images,cs.CL cs.AI cs.CV," Major advances have recently been made in merging language and vision
-representations. But most tasks considered so far have confined themselves to
-the processing of objects and lexicalised relations amongst objects (content
-words). We know, however, that humans (even pre-school children) can abstract
-over raw data to perform certain types of higher-level reasoning, expressed in
-natural language by function words. A case in point is given by their ability
-to learn quantifiers, i.e. expressions like 'few', 'some' and 'all'. From
-formal semantics and cognitive linguistics, we know that quantifiers are
-relations over sets which, as a simplification, we can see as proportions. For
-instance, in 'most fish are red', most encodes the proportion of fish which are
-red fish. In this paper, we study how well current language and vision
-strategies model such relations. We show that state-of-the-art attention
-mechanisms coupled with a traditional linguistic formalisation of quantifiers
-gives best performance on the task. Additionally, we provide insights on the
-role of 'gist' representations in quantification. A 'logical' strategy to
-tackle the task would be to first obtain a numerosity estimation for the two
-involved sets and then compare their cardinalities.
We however argue that
-precisely identifying the composition of the sets is not only beyond current
-state-of-the-art models but perhaps even detrimental to a task that is most
-efficiently performed by refining the approximate numerosity estimator of the
-system.
-"
-4560,1704.02963,Thales Felipe Costa Bertaglia and Maria das Gra\c{c}as Volpe Nunes,"Exploring Word Embeddings for Unsupervised Textual User-Generated
- Content Normalization",cs.CL cs.AI," Text normalization techniques based on rules, lexicons or supervised training
-requiring large corpora are neither scalable nor domain-interchangeable, and
-this makes them unsuitable for normalizing user-generated content (UGC).
-Current tools available for Brazilian Portuguese make use of such techniques.
-In this work we propose a technique based on distributed representation of
-words (or word embeddings). It generates continuous numeric vectors of high
-dimensionality to represent words. The vectors explicitly encode many
-linguistic regularities and patterns, as well as syntactic and semantic word
-relationships. Words that share semantic similarity are represented by similar
-vectors. Based on these features, we present a totally unsupervised, expandable
-and language- and domain-independent method for learning normalization lexicons
-from word embeddings. Our approach obtains a high correction rate for
-orthographic errors and internet slang in product reviews, outperforming the
-currently available tools for Brazilian Portuguese.
-"
-4561,1704.03013,"Nathan Siegle Hartmann and Livia Cucatto and Danielle Brants and
- Sandra Alu\'isio","Automatic Classification of the Complexity of Nonfiction Texts in
- Portuguese for Early School Years",cs.CL," Recent research shows that most Brazilian students have serious problems
-regarding their reading skills. The full development of this skill is key for
-the academic and professional future of every citizen. Tools for classifying
-the complexity of reading materials for children aim to improve the quality of
-the model of teaching reading and text comprehension. For English, Feng's work
-[11] is considered the state of the art in grade level prediction and achieved
-74% accuracy in automatically classifying 4 levels of textual complexity for
-close school grades. There are no classifiers for nonfiction texts for close
-grades in Portuguese. In this article, we propose a scheme for manual
-annotation of texts in 5 grade levels, which will be used for customized
-reading to avoid the lack of interest by students who are more advanced in
-reading and the blocking of those who still need to make further progress. We
-obtained 52% accuracy in classifying texts into 5 levels and 74% in 3 levels.
-The results prove to be promising when compared to the state-of-the-art work.
-"
-4562,1704.03016,"Nathan Siegle Hartmann and Magali Sanches Duran and Sandra Maria
- Alu\'isio","Automatic semantic role labeling on non-revised syntactic trees of
- journalistic texts",cs.CL," Semantic Role Labeling (SRL) is a Natural Language Processing task that
-enables the detection of events described in sentences and the participants of
-these events. For Brazilian Portuguese (BP), there are two recently concluded
-studies that perform SRL on journalistic texts. [1] obtained F1-measure scores
-of 79.6, using the PropBank.Br corpus, which has manually revised syntactic
-trees; [8], without using a treebank for training, obtained F1-measure scores
-of 68.0 for the same corpus.
However, the use of manually
-revised syntactic trees for this task does not represent a real application
-scenario. The goal of this paper is to evaluate the performance of SRL on
-revised and non-revised syntactic trees using a larger and balanced corpus of
-BP journalistic texts. First, we have shown that [1]'s system also performs
-better than [8]'s system on the larger corpus. Second, the SRL system trained
-on non-revised syntactic trees performs better over non-revised trees than a
-system trained on gold-standard data.
-"
-4563,1704.03084,"Baolin Peng and Xiujun Li and Lihong Li and Jianfeng Gao and Asli
- Celikyilmaz and Sungjin Lee and Kam-Fai Wong","Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep
- Reinforcement Learning",cs.CL cs.AI cs.LG," Building a dialogue agent to fulfill complex tasks, such as travel planning,
-is challenging because the agent has to learn to collectively complete multiple
-subtasks. For example, the agent needs to reserve a hotel and book a flight so
-that enough time is left to commute between arrival and hotel check-in.
-This paper addresses this challenge by formulating the task in the mathematical
-framework of options over Markov Decision Processes (MDPs), and proposing a
-hierarchical deep reinforcement learning approach to learning a dialogue
-manager that operates at different temporal scales. The dialogue manager
-consists of: (1) a top-level dialogue policy that selects among subtasks or
-options, (2) a low-level dialogue policy that selects primitive actions to
-complete the subtask given by the top-level policy, and (3) a global state
-tracker that helps ensure all cross-subtask constraints are satisfied.
-Experiments on a travel planning task with simulated and real users show that
-our approach leads to significant improvements over three baselines, two based
-on handcrafted rules and the other based on flat deep reinforcement learning.
-"
-4564,1704.03169,Raphael Shu and Hideki Nakayama,Later-stage Minimum Bayes-Risk Decoding for Neural Machine Translation,cs.CL," Sequence generation models have long relied on the beam search algorithm to
-generate output sequences. However, the correctness of beam search degrades
-when a model is over-confident about a suboptimal prediction. In this paper, we
-propose to perform minimum Bayes-risk (MBR) decoding for some extra steps at a
-later stage. In order to speed up MBR decoding, we compute the Bayes risks on
-GPU in batch mode. In our experiments, we found that MBR reranking works with a
-large beam size. Later-stage MBR decoding is shown to outperform simple MBR
-reranking in machine translation tasks.
-"
-4565,1704.03223,"Zahra Mousavi, Heshaam Faili",Persian Wordnet Construction using Supervised Learning,cs.CL cs.LG stat.ML," This paper presents an automated supervised method for Persian wordnet
-construction. Using a Persian corpus and a bi-lingual dictionary, the initial
-links between Persian words and Princeton WordNet synsets have been generated.
-These links will be discriminated later as correct or incorrect by employing
-seven features in a trained classification system. The whole method is just a
-classification system, which has been trained on a training set containing
-FarsNet as a set of correct instances. State-of-the-art results are achieved on
-the automatically derived Persian wordnet. The resulting wordnet, with a
-precision of 91.18%, includes more than 16,000 words and 22,000 synsets.
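- A hedged sketch of the link-classification step described above (not the
-paper's system; the seven features are reduced to three invented ones):
-
-from sklearn.ensemble import RandomForestClassifier
-
-# Each row holds hypothetical features of one (Persian word, synset) link,
-# e.g. dictionary-translation overlap, corpus co-occurrence, gloss similarity.
-X_train = [[0.9, 0.7, 0.8], [0.1, 0.2, 0.1], [0.8, 0.6, 0.9], [0.2, 0.1, 0.3]]
-y_train = [1, 0, 1, 0]  # 1 = correct link (e.g. seeded from FarsNet)
-
-clf = RandomForestClassifier(n_estimators=100, random_state=0)
-clf.fit(X_train, y_train)
-# Keep a candidate link only if the classifier deems it likely correct.
-print(clf.predict_proba([[0.85, 0.65, 0.7]])[0][1])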
-" -4566,1704.03242,Santosh Kumar Bharti and Korra Sathya Babu,Automatic Keyword Extraction for Text Summarization: A Survey,cs.CL," In recent times, data is growing rapidly in every domain such as news, social -media, banking, education, etc. Due to the excessiveness of data, there is a -need of automatic summarizer which will be capable to summarize the data -especially textual data in original document without losing any critical -purposes. Text summarization is emerged as an important research area in recent -past. In this regard, review of existing work on text summarization process is -useful for carrying out further research. In this paper, recent literature on -automatic keyword extraction and text summarization are presented since text -summarization process is highly depend on keyword extraction. This literature -includes the discussion about different methodology used for keyword extraction -and text summarization. It also discusses about different databases used for -text summarization in several domains along with evaluation matrices. Finally, -it discusses briefly about issues and research challenges faced by researchers -along with future direction. -" -4567,1704.03279,Felix Stahlberg and Bill Byrne,Unfolding and Shrinking Neural Machine Translation Ensembles,cs.CL," Ensembling is a well-known technique in neural machine translation (NMT) to -improve system performance. Instead of a single neural net, multiple neural -nets with the same topology are trained separately, and the decoder generates -predictions by averaging over the individual models. Ensembling often improves -the quality of the generated translations drastically. However, it is not -suitable for production systems because it is cumbersome and slow. This work -aims to reduce the runtime to be on par with a single system without -compromising the translation quality. First, we show that the ensemble can be -unfolded into a single large neural network which imitates the output of the -ensemble system. We show that unfolding can already improve the runtime in -practice since more work can be done on the GPU. We proceed by describing a set -of techniques to shrink the unfolded network by reducing the dimensionality of -layers. On Japanese-English we report that the resulting network has the size -and decoding speed of a single NMT network but performs on the level of a -3-ensemble system. -" -4568,1704.03407,"Hwiyeol Jo, Soo-Min Kim, Jeong Ryu","What we really want to find by Sentiment Analysis: The Relationship - between Computational Models and Psychological State",cs.CL cs.IR," As the first step to model emotional state of a person, we build sentiment -analysis models with existing deep neural network algorithms and compare the -models with psychological measurements to enlighten the relationship. In the -experiments, we first examined psychological state of 64 participants and asked -them to summarize the story of a book, Chronicle of a Death Foretold (Marquez, -1981). Secondly, we trained models using crawled 365,802 movie review data; -then we evaluated participants' summaries using the pretrained model as a -concept of transfer learning. With the background that emotion affects on -memories, we investigated the relationship between the evaluation score of the -summaries from computational models and the examined psychological -measurements. The result shows that although CNN performed the best among other -deep neural network algorithms (LSTM, GRU), its results are not related to the -psychological state. 
Rather, GRU shows more explainable results depending on
-the psychological state. The contributions of this paper can be summarized as
-follows: (1) we illuminate the relationship between computational models and
-psychological measurements; (2) we suggest this framework as an objective
-method to evaluate emotion: the real sentiment analysis of a person.
-"
-4569,1704.03471,"Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James
- Glass",What do Neural Machine Translation Models Learn about Morphology?,cs.CL," Neural machine translation (MT) models obtain state-of-the-art performance
-while maintaining a simple, end-to-end architecture. However, little is known
-about what these models learn about source and target languages during the
-training process. In this work, we analyze the representations learned by
-neural MT models at various levels of granularity and empirically evaluate the
-quality of the representations for learning morphology through extrinsic
-part-of-speech and morphological tagging tasks. We conduct a thorough
-investigation along several parameters: word-based vs. character-based
-representations, depth of the encoding layer, the identity of the target
-language, and encoder vs. decoder representations. Our data-driven,
-quantitative evaluation sheds light on important aspects in the neural MT
-system and its ability to capture word structure.
-"
-4570,1704.03520,Felix Mannhardt and Niek Tax,"Unsupervised Event Abstraction using Pattern Abstraction and Local
- Process Models",cs.DB cs.AI cs.CL," Process mining analyzes business processes based on events stored in event
-logs. However, some recorded events may correspond to activities on a very low
-level of abstraction. When events are recorded at too low a level of
-granularity, process discovery methods tend to generate overgeneralizing
-process models. Grouping low-level events to higher level activities, i.e.,
-event abstraction, can be used to discover better process models. Existing
-event abstraction methods are mainly based on common sub-sequences and
-clustering techniques. In this paper, we propose to first discover local
-process models and then use those models to lift the event log to a higher
-level of abstraction. Our conjecture is that process models discovered on the
-obtained high-level event log return process models of higher quality: their
-fitness and precision scores are more balanced. We show this with preliminary
-results on several real-life event logs.
-"
-4571,1704.03543,Peter D. Turney,"Leveraging Term Banks for Answering Complex Questions: A Case for Sparse
- Vectors",cs.IR cs.CL cs.LG," While open-domain question answering (QA) systems have proven effective for
-answering simple questions, they struggle with more complex questions. Our goal
-is to answer more complex questions reliably, without incurring a significant
-cost in knowledge resource construction to support the QA. One readily
-available knowledge resource is a term bank, enumerating the key concepts in a
-domain. We have developed an unsupervised learning approach that leverages a
-term bank to guide a QA system, by representing the terminological knowledge
-with thousands of specialized vector spaces. In experiments with complex
-science questions, we show that this approach significantly outperforms several
-state-of-the-art QA systems, demonstrating that significant leverage can be
-gained from continuous vector representations of domain terminology.
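- As a rough illustration of scoring answer options with sparse vector spaces
-(a toy stand-in, not the paper's method of thousands of specialized spaces;
-the corpus and question below are invented):
-
-from sklearn.feature_extraction.text import TfidfVectorizer
-from sklearn.metrics.pairwise import cosine_similarity
-
-corpus = ['photosynthesis converts light energy into chemical energy',
-          'mitochondria produce chemical energy for the cell',
-          'light is focused by the lens onto the retina']
-vec = TfidfVectorizer()
-space = vec.fit_transform(corpus)  # sparse term-document matrix
-
-question = vec.transform(['what organelle produces energy for the cell'])
-for option in ['mitochondria', 'retina']:
-    o = vec.transform([option])
-    # Crude relevance: question and option should match the same corpus rows.
-    score = float((cosine_similarity(question, space) *
-                   cosine_similarity(o, space)).sum())
-    print(option, round(score, 3))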
-" -4572,1704.03560,Robyn Speer and Joanna Lowry-Duda,"ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with - Multilingual Relational Knowledge",cs.CL," This paper describes Luminoso's participation in SemEval 2017 Task 2, -""Multilingual and Cross-lingual Semantic Word Similarity"", with a system based -on ConceptNet. ConceptNet is an open, multilingual knowledge graph that focuses -on general knowledge that relates the meanings of words and phrases. Our -submission to SemEval was an update of previous work that builds high-quality, -multilingual word embeddings from a combination of ConceptNet and -distributional semantics. Our system took first place in both subtasks. It -ranked first in 4 out of 5 of the separate languages, and also ranked first in -all 10 of the cross-lingual language pairs. -" -4573,1704.03617,"Matthew Riemer, Elham Khabiri, and Richard Goodwin","Representation Stability as a Regularizer for Improved Text Analytics - Transfer Learning",cs.CL cs.LG," Although neural networks are well suited for sequential transfer learning -tasks, the catastrophic forgetting problem hinders proper integration of prior -knowledge. In this work, we propose a solution to this problem by using a -multi-task objective based on the idea of distillation and a mechanism that -directly penalizes forgetting at the shared representation layer during the -knowledge integration phase of training. We demonstrate our approach on a -Twitter domain sentiment analysis task with sequential knowledge transfer from -four related tasks. We show that our technique outperforms networks fine-tuned -to the target task. Additionally, we show both through empirical evidence and -examples that it does not forget useful knowledge from the source task that is -forgotten during standard fine-tuning. Surprisingly, we find that first -distilling a human made rule based sentiment engine into a recurrent neural -network and then integrating the knowledge with the target task data leads to a -substantial gain in generalization performance. Our experiments demonstrate the -power of multi-source transfer techniques in practical text analytics problems -when paired with distillation. In particular, for the SemEval 2016 Task 4 -Subtask A (Nakov et al., 2016) dataset we surpass the state of the art -established during the competition with a comparatively simple model -architecture that is not even competitive when trained on only the labeled task -specific data. -" -4574,1704.03627,"Ting-Hao 'Kenneth' Huang, Yun-Nung Chen, Jeffrey P. Bigham",Real-time On-Demand Crowd-powered Entity Extraction,cs.HC cs.AI cs.CL," Output-agreement mechanisms such as ESP Game have been widely used in human -computation to obtain reliable human-generated labels. In this paper, we argue -that a ""time-limited"" output-agreement mechanism can be used to create a fast -and robust crowd-powered component in interactive systems, particularly -dialogue systems, to extract key information from user utterances on the fly. -Our experiments on Amazon Mechanical Turk using the Airline Travel Information -System (ATIS) dataset showed that the proposed approach achieves high-quality -results with an average response time shorter than 9 seconds. 
-" -4575,1704.03693,Thiago castro Ferreira and Ivandre Paraboni,"Trainable Referring Expression Generation using Overspecification - Preferences",cs.CL," Referring expression generation (REG) models that use speaker-dependent -information require a considerable amount of training data produced by every -individual speaker, or may otherwise perform poorly. In this work we present a -simple REG experiment that allows the use of larger training data sets by -grouping speakers according to their overspecification preferences. Intrinsic -evaluation shows that this method generally outperforms the personalised method -found in previous work. -" -4576,1704.03809,"Merlijn Blaauw, Jordi Bonada",A Neural Parametric Singing Synthesizer,cs.SD cs.CL cs.LG," We present a new model for singing synthesis based on a modified version of -the WaveNet architecture. Instead of modeling raw waveform, we model features -produced by a parametric vocoder that separates the influence of pitch and -timbre. This allows conveniently modifying pitch to match any target melody, -facilitates training on more modest dataset sizes, and significantly reduces -training and generation times. Our model makes frame-wise predictions using -mixture density outputs rather than categorical outputs in order to reduce the -required parameter count. As we found overfitting to be an issue with the -relatively small datasets used in our experiments, we propose a method to -regularize the model and make the autoregressive generation process more robust -to prediction errors. Using a simple multi-stream architecture, harmonic, -aperiodic and voiced/unvoiced components can all be predicted in a coherent -manner. We compare our method to existing parametric statistical and -state-of-the-art concatenative methods using quantitative metrics and a -listening test. While naive implementations of the autoregressive generation -algorithm tend to be inefficient, using a smart algorithm we can greatly speed -up the process and obtain a system that's competitive in both speed and -quality. -" -4577,1704.03940,"Kai Hui, Andrew Yates, Klaus Berberich, Gerard de Melo",PACRR: A Position-Aware Neural IR Model for Relevance Matching,cs.IR cs.CL," In order to adopt deep learning for information retrieval, models are needed -that can capture all relevant information required to assess the relevance of a -document to a given user query. While previous works have successfully captured -unigram term matches, how to fully employ position-dependent information such -as proximity and term dependencies has been insufficiently explored. In this -work, we propose a novel neural IR model named PACRR aiming at better modeling -position-dependent interactions between a query and a document. Extensive -experiments on six years' TREC Web Track data confirm that the proposed model -yields better results under multiple benchmarks. -" -4578,1704.03956,"Nobuhiro Kaji, Hayato Kobayashi",Incremental Skip-gram Model with Negative Sampling,cs.CL," This paper explores an incremental training strategy for the skip-gram model -with negative sampling (SGNS) from both empirical and theoretical perspectives. -Existing methods of neural word embeddings, including SGNS, are multi-pass -algorithms and thus cannot perform incremental model update. To address this -problem, we present a simple incremental extension of SGNS and provide a -thorough theoretical analysis to demonstrate its validity. 
Empirical
-experiments demonstrated the correctness of the theoretical analysis as well as
-the practical usefulness of the incremental algorithm.
-"
-4579,1704.03987,"Tom Ouyang, David Rybach, Fran\c{c}oise Beaufays, Michael Riley",Mobile Keyboard Input Decoding with Finite-State Transducers,cs.CL," We propose a finite-state transducer (FST) representation for the models used
-to decode keyboard inputs on mobile devices. Drawing on lessons from the field
-of speech recognition, we describe a decoding framework that can satisfy the
-strict memory and latency constraints of keyboard input. We extend this
-framework to support functionalities typically not present in speech
-recognition, such as literal decoding, autocorrections, word completions, and
-next word predictions.
- We describe the general framework of what we call for short the keyboard ""FST
-decoder"" as well as the implementation details that are new compared to a
-speech FST decoder. We demonstrate that the FST decoder enables new UX features
-such as post-corrections. Finally, we sketch how this decoder can support
-advanced features such as personalization and contextualization.
-"
-4580,1704.04008,"Afshin Rahimi, Trevor Cohn, Timothy Baldwin",A Neural Model for User Geolocation and Lexical Dialectology,cs.CL," We propose a simple yet effective text-based user geolocation model based on
-a neural network with one hidden layer, which achieves state-of-the-art
-performance over three Twitter benchmark geolocation datasets, in addition to
-producing word and phrase embeddings in the hidden layer that we show to be
-useful for detecting dialectal terms. As part of our analysis of dialectal
-terms, we release DAREDS, a dataset for evaluating dialect term detection
-methods.
-"
-4581,1704.04100,Chlo\'e Braud and Oph\'elie Lacroix and Anders S{\o}gaard,"Cross-lingual and cross-domain discourse segmentation of entire
- documents",cs.CL," Discourse segmentation is a crucial step in building end-to-end discourse
-parsers. However, discourse segmenters only exist for a few languages and
-domains. Typically they only detect intra-sentential segment boundaries,
-assuming gold standard sentence and token segmentation, and relying on
-high-quality syntactic parses and rich heuristics that are not generally
-available across languages and domains. In this paper, we propose statistical
-discourse segmenters for five languages and three domains that do not rely on
-gold pre-annotations. We also consider the problem of learning discourse
-segmenters when no labeled data is available for a language. Our fully
-supervised system obtains 89.5% F1 for English newswire, with slight drops in
-performance on other domains, and we report supervised and unsupervised
-(cross-lingual) results for five languages in total.
-"
-4582,1704.04154,Holger Schwenk and Matthijs Douze,"Learning Joint Multilingual Sentence Representations with Neural Machine
- Translation",cs.CL," In this paper, we use the framework of neural machine translation to learn
-joint sentence representations across six very different languages. Our aim is
-that a representation which is independent of the language is likely to
-capture the underlying semantics. We define a new cross-lingual similarity
-measure, compare up to 1.4M sentence representations and study the
-characteristics of close sentences. We provide experimental evidence that
-sentences that are close in embedding space are indeed semantically highly
-related, but often have quite different structure and syntax.
These relations
-also hold when comparing sentences in different languages.
-"
-4583,1704.04198,"Emiel van Miltenburg, Desmond Elliott",Room for improvement in automatic image description: an error analysis,cs.CL," In recent years we have seen rapid and significant progress in automatic
-image description but what are the open problems in this area? Most work has
-been evaluated using text-based similarity metrics, which only indicate that
-there have been improvements, without explaining what has improved. In this
-paper, we present a detailed error analysis of the descriptions generated by a
-state-of-the-art attention-based model. Our analysis operates on two levels:
-first we check the descriptions for accuracy, and then we categorize the types
-of errors we observe in the inaccurate descriptions. We find that only 20% of
-the descriptions are free from errors, and surprisingly that 26% are unrelated
-to the image. Finally, we manually correct the most frequently occurring error
-types (e.g. gender identification) to estimate the performance reward for
-addressing these errors, observing gains of 0.2--1 BLEU point per type.
-"
-4584,1704.04222,"Wei-Ning Hsu, Yu Zhang, James Glass",Learning Latent Representations for Speech Generation and Transformation,cs.CL cs.LG stat.ML," An ability to model a generative process and learn a latent representation
-for speech in an unsupervised fashion will be crucial to process vast
-quantities of unlabelled speech data. Recently, deep probabilistic generative
-models such as Variational Autoencoders (VAEs) have achieved tremendous success
-in modeling natural images. In this paper, we apply a convolutional VAE to
-model the generative process of natural speech. We derive latent space
-arithmetic operations to disentangle learned latent representations. We
-demonstrate the capability of our model to modify the phonetic content or the
-speaker identity for speech segments using the derived operations, without the
-need for parallel supervisory data.
-"
-4585,1704.04259,Piek Vossen and Agata Cybulska,Identity and Granularity of Events in Text,cs.CL," In this paper we describe a method to detect event descriptions in different
-news articles and to model the semantics of events and their components using
-RDF representations. We compare these descriptions to solve a cross-document
-event coreference task. Our component approach to event semantics defines
-identity and granularity of events at different levels. It performs close to
-state-of-the-art approaches on the cross-document event coreference task, while
-outperforming other works when assuming similar quality of event detection. We
-demonstrate how granularity and identity are interconnected and we discuss how
-semantic anomaly could be used to define differences between coreference,
-subevent and topical relations.
-"
-4586,1704.04336,"Fan Xu, Shujing Du, Maoxi Li and Mingwen Wang","An entity-driven recursive neural network model for chinese discourse
- coherence modeling",cs.CL," Chinese discourse coherence modeling remains a challenging task in the
-Natural Language Processing field. Existing approaches mostly focus on feature
-engineering, adopting sophisticated features to capture the logical, syntactic
-or semantic relationships across sentences within a text. In this paper, we
-present an entity-driven recursive deep model for Chinese discourse coherence
-evaluation based on a current English discourse coherence neural network model.
Specifically, to overcome the current model's shortcoming in identifying the
-entity (noun) overlap across sentences, our combined model successfully
-incorporates entity information into the recursive neural network framework.
-Evaluation results on both sentence ordering and machine translation coherence
-rating tasks show the effectiveness of the proposed model, which significantly
-outperforms the existing strong baseline.
-"
-4587,1704.04347,"Longyue Wang, Zhaopeng Tu, Andy Way, Qun Liu",Exploiting Cross-Sentence Context for Neural Machine Translation,cs.CL," In translation, considering the document as a whole can help to resolve
-ambiguities and inconsistencies. In this paper, we propose a cross-sentence
-context-aware approach and investigate the influence of historical contextual
-information on the performance of neural machine translation (NMT). First, this
-history is summarized in a hierarchical way. We then integrate the historical
-representation into NMT in two strategies: 1) a warm-start of encoder and
-decoder states, and 2) an auxiliary context source for updating decoder states.
-Experimental results on a large Chinese-English translation task show that our
-approach significantly improves upon a strong attention-based NMT system by up
-to +2.1 BLEU points.
-"
-4588,1704.04368,"Abigail See, Peter J. Liu, Christopher D. Manning",Get To The Point: Summarization with Pointer-Generator Networks,cs.CL," Neural sequence-to-sequence models have provided a viable new approach for
-abstractive text summarization (meaning they are not restricted to simply
-selecting and rearranging passages from the original text). However, these
-models have two shortcomings: they are liable to reproduce factual details
-inaccurately, and they tend to repeat themselves. In this work we propose a
-novel architecture that augments the standard sequence-to-sequence attentional
-model in two orthogonal ways. First, we use a hybrid pointer-generator network
-that can copy words from the source text via pointing, which aids accurate
-reproduction of information, while retaining the ability to produce novel words
-through the generator. Second, we use coverage to keep track of what has been
-summarized, which discourages repetition. We apply our model to the CNN / Daily
-Mail summarization task, outperforming the current abstractive state-of-the-art
-by at least 2 ROUGE points.
-"
-4589,1704.04441,"Georg Heigold and G\""unter Neumann and Josef van Genabith","How Robust Are Character-Based Word Embeddings in Tagging and MT Against
- Wrod Scramlbing or Randdm Nouse?",cs.CL," This paper investigates the robustness of NLP against perturbed word forms.
-While neural approaches can achieve (almost) human-like accuracy for certain
-tasks and conditions, they often are sensitive to small changes in the input
-such as non-canonical input (e.g., typos). Yet both stability and robustness
-are desired properties in applications involving user-generated content, all
-the more so as humans easily cope with such noisy or adversarial conditions. In
-this paper, we study the impact of noisy input. We consider different noise
-distributions (one type of noise, combination of noise types) and mismatched
-noise distributions for training and testing.
Moreover, we empirically evaluate
-the robustness of different models (convolutional neural networks, recurrent
-neural networks, non-neural models), different basic units (characters, byte
-pair encoding units), and different NLP tasks (morphological tagging, machine
-translation).
-"
-4590,1704.04451,Phong Le and Ivan Titov,Optimizing Differentiable Relaxations of Coreference Evaluation Metrics,cs.CL cs.AI cs.LG," Coreference evaluation metrics are hard to optimize directly as they are
-non-differentiable functions, not easily decomposable into elementary
-decisions. Consequently, most approaches optimize objectives only indirectly
-related to the end goal, resulting in suboptimal performance. Instead, we
-propose a differentiable relaxation that lends itself to gradient-based
-optimisation, thus bypassing the need for reinforcement learning or heuristic
-modification of cross-entropy. We show that by modifying the training objective
-of a competitive neural coreference system, we obtain a substantial gain in
-performance. This suggests that our approach can be regarded as a viable
-alternative to using reinforcement learning or more computationally expensive
-imitation learning.
-"
-4591,1704.04452,Tobias Falke and Iryna Gurevych,"Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of
- Concept Maps",cs.CL," Concept maps can be used to concisely represent important information and
-bring structure into large document collections. Therefore, we study a variant
-of multi-document summarization that produces summaries in the form of concept
-maps. However, suitable evaluation datasets for this task are currently
-missing. To close this gap, we present a newly created corpus of concept maps
-that summarize heterogeneous collections of web documents on educational
-topics. It was created using a novel crowdsourcing approach that allows us to
-efficiently determine important elements in large document collections. We
-release the corpus along with a baseline system and proposed evaluation
-protocol to enable further research on this variant of summarization.
-"
-4592,1704.04455,"Paramita Mirza, Simon Razniewski, Fariz Darari, Gerhard Weikum",Cardinal Virtues: Extracting Relation Cardinalities from Text,cs.CL," Information extraction (IE) from text has largely focused on relations
-between individual entities, such as who has won which award. However, some
-facts are never fully mentioned, and no IE method has perfect recall. Thus, it
-is beneficial to also tap contents about the cardinalities of these relations,
-for example, how many awards someone has won. We introduce this novel problem
-of extracting cardinalities and discuss the specific challenges that set it
-apart from standard IE. We present a distant supervision method using
-conditional random fields. A preliminary evaluation results in precision
-between 3% and 55%, depending on the difficulty of relations.
-"
-4593,1704.04517,Alexander Kuhnle and Ann Copestake,"ShapeWorld - A new test methodology for multimodal language
- understanding",cs.CL cs.AI cs.CV," We introduce a novel framework for evaluating multimodal deep learning models
-with respect to their language understanding and generalization abilities. In
-this approach, artificial data is automatically generated according to the
-experimenter's specifications. 
The content of the data, both during training
-and evaluation, can be controlled in detail, which enables tasks to be created
-that require true generalization abilities, in particular the combination of
-previously introduced concepts in novel ways. We demonstrate the potential of
-our methodology by evaluating various visual question answering models on four
-different tasks, and show how our framework gives us detailed insights into
-their capabilities and limitations. By open-sourcing our framework, we hope to
-stimulate progress in the field of multimodal language understanding.
-"
-4594,1704.04520,"Zi Long, Ryuichiro Kimura, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio
- Yamamoto","Neural Machine Translation Model with a Large Vocabulary Selected by
- Branching Entropy",cs.CL," Neural machine translation (NMT), a new approach to machine translation, has
-achieved promising results comparable to those of traditional approaches such
-as statistical machine translation (SMT). Despite its recent success, NMT
-cannot handle a larger vocabulary because the training complexity and decoding
-complexity proportionally increase with the number of target words. This
-problem becomes even more serious when translating patent documents, which
-contain many technical terms that are observed infrequently. In this paper, we
-propose to select phrases that contain out-of-vocabulary words using the
-statistical approach of branching entropy. This allows the proposed NMT system
-to be applied to a translation task of any language pair without any
-language-specific knowledge about technical term identification. The selected
-phrases are then replaced with tokens during training and post-translated by
-the phrase translation table of SMT. Evaluation on Japanese-to-Chinese,
-Chinese-to-Japanese, Japanese-to-English and English-to-Japanese patent
-sentence translation proved the effectiveness of phrases selected with
-branching entropy, where the proposed NMT model achieves a substantial
-improvement over a baseline NMT model without our proposed technique. Moreover,
-the proposed NMT model roughly halves the number of under-translation errors
-made by the baseline NMT model without our proposed technique.
-"
-4595,1704.04521,"Zi Long, Takehito Utsuro, Tomoharu Mitsuhashi, Mikio Yamamoto","Translation of Patent Sentences with a Large Vocabulary of Technical
- Terms Using Neural Machine Translation",cs.CL," Neural machine translation (NMT), a new approach to machine translation, has
-achieved promising results comparable to those of traditional approaches such
-as statistical machine translation (SMT). Despite its recent success, NMT
-cannot handle a larger vocabulary because training complexity and decoding
-complexity proportionally increase with the number of target words. This
-problem becomes even more serious when translating patent documents, which
-contain many technical terms that are observed infrequently. In NMTs, words
-that are out of vocabulary are represented by a single unknown token. In this
-paper, we propose a method that enables NMT to translate patent sentences
-comprising a large vocabulary of technical terms. We train an NMT system on
-bilingual data wherein technical terms are replaced with technical term tokens;
-this allows it to translate most of the source sentences except technical
-terms. Further, we use it as a decoder to translate source sentences with
-technical term tokens and replace the tokens with technical term translations
-using SMT. 
We also use it to rerank the 1,000-best SMT translations on the -basis of the average of the SMT score and that of the NMT rescoring of the -translated sentences with technical term tokens. Our experiments on -Japanese-Chinese patent sentences show that the proposed NMT system achieves a -substantial improvement of up to 3.1 BLEU points and 2.3 RIBES points over -traditional SMT systems and an improvement of approximately 0.6 BLEU points and -0.8 RIBES points over an equivalent NMT system without our proposed technique. -" -4596,1704.04530,"Shashi Narayan, Nikos Papasarantopoulos, Shay B. Cohen, Mirella Lapata",Neural Extractive Summarization with Side Information,cs.CL," Most extractive summarization methods focus on the main body of the document -from which sentences need to be extracted. However, the gist of the document -may lie in side information, such as the title and image captions which are -often available for newswire articles. We propose to explore side information -in the context of single-document extractive summarization. We develop a -framework for single-document summarization composed of a hierarchical document -encoder and an attention-based extractor with attention over side information. -We evaluate our model on a large scale news dataset. We show that extractive -summarization with side information consistently outperforms its counterpart -that does not use any side information, in terms of both informativeness and -fluency. -" -4597,1704.04539,Marco Damonte and Shay B. Cohen,Cross-lingual Abstract Meaning Representation Parsing,cs.CL," Abstract Meaning Representation (AMR) annotation efforts have mostly focused -on English. In order to train parsers on other languages, we propose a method -based on annotation projection, which involves exploiting annotations in a -source language and a parallel corpus of the source language and a target -language. Using English as the source language, we show promising results for -Italian, Spanish, German and Chinese as target languages. Besides evaluating -the target parsers on non-gold datasets, we further propose an evaluation -method that exploits the English gold annotations and does not require access -to gold annotations for the target languages. This is achieved by inverting the -projection process: a new English parser is learned from the target language -parser and evaluated on the existing English gold standard. -" -4598,1704.04550,"Su Wang, Stephen Roller, Katrin Erk",Distributional Modeling on a Diet: One-shot Word Learning from Text Only,cs.CL," We test whether distributional models can do one-shot learning of -definitional properties from text only. Using Bayesian models, we find that -first learning overarching structure in the known data, regularities in textual -contexts and in properties, helps one-shot learning, and that individual -context items can be highly informative. Our experiments show that our model -can learn properties from a single exposure when given an informative -utterance. -" -4599,1704.04565,"Gaurav Singh Tomar and Thyago Duque and Oscar T\""ackstr\""om and Jakob - Uszkoreit and Dipanjan Das",Neural Paraphrase Identification of Questions with Noisy Pretraining,cs.CL," We present a solution to the problem of paraphrase identification of -questions. 
We focus on a recent dataset of question pairs annotated with binary
-paraphrase labels and show that a variant of the decomposable attention model
-(Parikh et al., 2016) results in accurate performance on this task, while being
-far simpler than many competing neural architectures. Furthermore, when the
-model is pretrained on a noisy dataset of automatically collected question
-paraphrases, it obtains the best reported performance on the dataset.
-"
-4600,1704.04601,Guang-He Lee and Yun-Nung Chen,MUSE: Modularizing Unsupervised Sense Embeddings,cs.CL," This paper proposes to address the word sense ambiguity issue in an
-unsupervised manner, where word sense representations are learned along with a
-word sense selection mechanism given contexts. Prior work focused on designing
-a single model to deliver both mechanisms, and thus suffered from either
-coarse-grained representation learning or inefficient sense selection. The
-proposed modular approach, MUSE, implements flexible modules to optimize
-distinct mechanisms, achieving the first purely sense-level representation
-learning system with linear-time sense selection. We leverage reinforcement
-learning to enable joint training on the proposed modules, and introduce
-various exploration techniques on sense selection for better robustness. The
-experiments on benchmark data show that the proposed approach achieves the
-state-of-the-art performance on synonym selection as well as on contextual word
-similarities in terms of MaxSimC.
-"
-4601,1704.04664,"Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, Tetsunari
- Inamura","Online Spatial Concept and Lexical Acquisition with Simultaneous
- Localization and Mapping",cs.AI cs.CL cs.RO," In this paper, we propose an online learning algorithm based on a
-Rao-Blackwellized particle filter for spatial concept acquisition and mapping.
-We have proposed a nonparametric Bayesian spatial concept acquisition model
-(SpCoA). We propose a novel method (SpCoSLAM) integrating SpCoA and FastSLAM in
-the theoretical framework of the Bayesian generative model. The proposed method
-can simultaneously learn place categories and lexicons while incrementally
-generating an environmental map. Furthermore, the proposed method adds scene
-image features and a language model to SpCoA. In the experiments, we tested
-online learning of spatial concepts and environmental maps in a novel
-environment for which the robot had no map. Then, we evaluated the results of
-online learning of spatial concepts and lexical acquisition. The experimental
-results demonstrated that the robot was able to more accurately learn the
-relationships between words and places in the environmental map incrementally
-by using the proposed method.
-"
-4602,1704.04675,"Jasmijn Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil
- Sima'an",Graph Convolutional Encoders for Syntax-aware Neural Machine Translation,cs.CL," We present a simple and effective approach to incorporating syntactic
-structure into neural attention-based encoder-decoder models for machine
-translation. We rely on graph-convolutional networks (GCNs), a recent class of
-neural networks developed for modeling graph-structured data. Our GCNs use
-predicted syntactic dependency trees of source sentences to produce
-representations of words (i.e. hidden states of the encoder) that are sensitive
-to their syntactic neighborhoods. 
GCNs take word representations as input and
-produce word representations as output, so they can easily be incorporated as
-layers into standard encoders (e.g., on top of bidirectional RNNs or
-convolutional neural networks). We evaluate their effectiveness with
-English-German and English-Czech translation experiments for different types of
-encoders and observe substantial improvements over their syntax-agnostic
-versions in all the considered setups.
-"
-4603,1704.04683,"Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy",RACE: Large-scale ReAding Comprehension Dataset From Examinations,cs.CL cs.AI cs.LG," We present RACE, a new dataset for benchmark evaluation of methods in the
-reading comprehension task. Collected from the English exams for middle and
-high school Chinese students in the age range between 12 and 18, RACE consists
-of nearly 28,000 passages and nearly 100,000 questions generated by human
-experts (English instructors), and covers a variety of topics which are
-carefully designed for evaluating the students' ability in understanding and
-reasoning. In particular, the proportion of questions that require reasoning is
-much larger in RACE than that in other benchmark datasets for reading
-comprehension, and there is a significant gap between the performance of the
-state-of-the-art models (43%) and the ceiling human performance (95%). We hope
-this new dataset can serve as a valuable resource for research and evaluation
-in machine comprehension. The dataset is freely available at
-http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at
-https://github.com/qizhex/RACE_AR_baselines.
-"
-4604,1704.04743,Roee Aharoni and Yoav Goldberg,Towards String-to-Tree Neural Machine Translation,cs.CL," We present a simple method to incorporate syntactic information about the
-target language in a neural machine translation system by translating into
-linearized, lexicalized constituency trees. An experiment on the WMT16
-German-English news translation task resulted in an improved BLEU score when
-compared to a syntax-agnostic NMT baseline trained on the same dataset. An
-analysis of the translations from the syntax-aware system shows that it
-performs more reordering during translation in comparison to the baseline. A
-small-scale human evaluation also showed an advantage to the syntax-aware
-system.
-"
-4605,1704.04856,"Pablo Loyola, Edison Marrese-Taylor and Yutaka Matsuo","A Neural Architecture for Generating Natural Language Descriptions from
- Source Code Changes",cs.CL," We propose a model to automatically describe changes introduced in the source
-code of a program using natural language. Our method receives as input a set of
-code commits, which contains both the modifications and the message introduced
-by a user. These two modalities are used to train an encoder-decoder
-architecture. We evaluated our approach on twelve real world open source
-projects from four different programming languages. Quantitative and
-qualitative results showed that the proposed approach can generate feasible and
-semantically sound descriptions not only in standard in-project settings, but
-also in a cross-project setting.
-"
-4606,1704.04859,"Frederick Liu, Han Lu, Chieh Lo and Graham Neubig",Learning Character-level Compositionality with Visual Features,cs.CL," Previous work has modeled the compositionality of words by creating
-character-level models of meaning, reducing problems of sparsity for rare
-words. 
However, in many writing systems compositionality has an effect even at
-the character level: the meaning of a character is derived from the sum of its
-parts. In this paper, we model this effect by creating embeddings for
-characters based on their visual characteristics, creating an image for the
-character and running it through a convolutional neural network to produce a
-visual character embedding. Experiments on a text classification task
-demonstrate that such a model allows for better processing of instances with
-rare characters in languages such as Chinese, Japanese, and Korean.
-Additionally, qualitative analyses demonstrate that our proposed model learns
-to focus on the parts of characters that carry semantic content, resulting in
-embeddings that are coherent in visual space.
-"
-4607,1704.04920,Octavian-Eugen Ganea and Thomas Hofmann,Deep Joint Entity Disambiguation with Local Neural Attention,cs.CL," We propose a novel deep learning model for joint document-level entity
-disambiguation, which leverages learned neural representations. Key components
-are entity embeddings, a neural attention mechanism over local context windows,
-and a differentiable joint inference stage for disambiguation. Our approach
-thereby combines benefits of deep learning with more traditional approaches
-such as graphical models and probabilistic mention-entity maps. Extensive
-experiments show that we are able to obtain competitive or state-of-the-art
-accuracy at moderate computational costs.
-"
-4608,1704.05021,Alham Fikri Aji and Kenneth Heafield,Sparse Communication for Distributed Gradient Descent,cs.CL cs.DC cs.LG," We make distributed stochastic gradient descent faster by exchanging sparse
-updates instead of dense updates. Gradient updates are positively skewed as
-most updates are near zero, so we map the 99% smallest updates (by absolute
-value) to zero then exchange sparse matrices. This method can be combined with
-quantization to further improve the compression. We explore different
-configurations and apply them to neural machine translation and MNIST image
-classification tasks. Most configurations work on MNIST, whereas different
-configurations reduce convergence rate on the more complex translation task.
-Our experiments show that we can achieve up to 49% speed up on MNIST and 22% on
-NMT without damaging the final accuracy or BLEU.
-"
-4609,1704.05091,"Pedro Saleiro, Eduarda Mendes Rodrigues, Carlos Soares, Eug\'enio
- Oliveira","FEUP at SemEval-2017 Task 5: Predicting Sentiment Polarity and Intensity
- with Financial Word Embeddings",cs.CL cs.IR," This paper presents the approach developed at the Faculty of Engineering of
-the University of Porto to participate in SemEval 2017, Task 5: Fine-grained
-Sentiment Analysis on Financial Microblogs and News. The task consisted in
-predicting a real continuous variable from -1.0 to +1.0 representing the
-polarity and intensity of sentiment concerning companies/stocks mentioned in
-short texts. We modeled the task as a regression analysis problem and combined
-traditional techniques such as pre-processing short texts, bag-of-words
-representations and lexical-based features with enhanced financial specific
-bag-of-embeddings. We used an external collection of tweets and news headlines
-mentioning companies/stocks from S\&P 500 to create financial word embeddings
-which are able to capture domain-specific syntactic and semantic similarities. 
-The resulting approach obtained a cosine similarity score of 0.69 in sub-task
-5.1 - Microblogs and 0.68 in sub-task 5.2 - News Headlines.
-"
-4610,1704.05119,"Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta",Exploring Sparsity in Recurrent Neural Networks,cs.LG cs.CL," Recurrent Neural Networks (RNN) are widely used to solve a variety of
-problems and as the quantity of data and the amount of available compute have
-increased, so have model sizes. The number of parameters in recent
-state-of-the-art networks makes them hard to deploy, especially on mobile
-phones and embedded devices. The challenge is due to both the size of the model
-and the time it takes to evaluate it. In order to deploy these RNNs
-efficiently, we propose a technique to reduce the parameters of a network by
-pruning weights during the initial training of the network. At the end of
-training, the parameters of the network are sparse while accuracy is still
-close to the original dense neural network. The network size is reduced by 8x
-and the time required to train the model remains constant. Additionally, we can
-prune a larger dense network to achieve better than baseline performance while
-still reducing the total number of parameters significantly. Pruning RNNs
-reduces the size of the model and can also help achieve significant inference
-time speed-up using sparse matrix multiply. Benchmarks show that using our
-technique, model size can be reduced by 90% and speed-up is around 2x to 7x.
-"
-4611,1704.05135,"Sebastien Jean, Stanislas Lauly, Orhan Firat, Kyunghyun Cho",Does Neural Machine Translation Benefit from Larger Context?,stat.ML cs.CL cs.LG," We propose a neural machine translation architecture that models the
-surrounding text in addition to the source sentence. These models lead to
-better performance, both in terms of general translation quality and pronoun
-prediction, when trained on small corpora, although this improvement largely
-disappears when trained with a larger corpus. We also discover that
-attention-based neural machine translation is well suited for pronoun
-prediction and compares favorably with other approaches that were specifically
-designed for this task.
-"
-4612,1704.05162,Majid Laali and Leila Kosseim,Automatic Disambiguation of French Discourse Connectives,cs.CL," Discourse connectives (e.g. however, because) are terms that can explicitly
-convey a discourse relation within a text. While discourse connectives have
-been shown to be an effective clue to automatically identify discourse
-relations, they are not always used to convey such relations, thus they should
-first be disambiguated between discourse usage and non-discourse usage. In this
-paper, we investigate the applicability of features proposed for the
-disambiguation of English discourse connectives for French. Our results with
-the French Discourse Treebank (FDTB) show that syntactic and lexical features
-developed for English texts are as effective for French and allow the
-disambiguation of French discourse connectives with an accuracy of 94.2%.
-"
-4613,1704.05179,"Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik
- and Kyunghyun Cho",SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine,cs.CL," We publicly release a new large-scale dataset, called SearchQA, for machine
-comprehension, or question-answering. 
Unlike recently released datasets, such -as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to -reflect a full pipeline of general question-answering. That is, we start not -from an existing article and generate a question-answer pair, but start from an -existing question-answer pair, crawled from J! Archive, and augment it with -text snippets retrieved by Google. Following this approach, we built SearchQA, -which consists of more than 140k question-answer pairs with each pair having -49.6 snippets on average. Each question-answer-context tuple of the SearchQA -comes with additional meta-data such as the snippet's URL, which we believe -will be valuable resources for future research. We conduct human evaluation as -well as test two baseline methods, one simple word selection and the other deep -learning based, on the SearchQA. We show that there is a meaningful gap between -the human and machine performances. This suggests that the proposed dataset -could well serve as a benchmark for question-answering. -" -4614,1704.05228,"Mathias Kraus, Stefan Feuerriegel","Sentiment analysis based on rhetorical structure theory: Learning deep - neural networks from discourse trees",cs.CL," Prominent applications of sentiment analysis are countless, covering areas -such as marketing, customer service and communication. The conventional -bag-of-words approach for measuring sentiment merely counts term frequencies; -however, it neglects the position of the terms within the discourse. As a -remedy, we develop a discourse-aware method that builds upon the discourse -structure of documents. For this purpose, we utilize rhetorical structure -theory to label (sub-)clauses according to their hierarchical relationships and -then assign polarity scores to individual leaves. To learn from the resulting -rhetorical structure, we propose a tensor-based, tree-structured deep neural -network (named Discourse-LSTM) in order to process the complete discourse tree. -The underlying tensors infer the salient passages of narrative materials. In -addition, we suggest two algorithms for data augmentation (node reordering and -artificial leaf insertion) that increase our training set and reduce -overfitting. Our benchmarks demonstrate the superior performance of our -approach. Moreover, our tensor structure reveals the salient text passages and -thereby provides explanatory insights. -" -4615,1704.05295,"S\'ebastien Harispe, Sylvie Ranwez, Stefan Janaqi, Jacky Montmain",Semantic Similarity from Natural Language and Ontology Analysis,cs.AI cs.CL," Artificial Intelligence federates numerous scientific fields in the aim of -developing machines able to assist human operators performing complex -treatments -- most of which demand high cognitive skills (e.g. learning or -decision processes). Central to this quest is to give machines the ability to -estimate the likeness or similarity between things in the way human beings -estimate the similarity between stimuli. - In this context, this book focuses on semantic measures: approaches designed -for comparing semantic entities such as units of language, e.g. words, -sentences, or concepts and instances defined into knowledge bases. The aim of -these measures is to assess the similarity or relatedness of such semantic -entities by taking into account their semantics, i.e. 
their meaning --
-intuitively, the words tea and coffee, which both refer to a stimulating
-beverage, will be estimated to be more semantically similar than the words
-toffee (a confection) and coffee, even though the latter pair has a higher
-syntactic similarity. The two state-of-the-art approaches for estimating and
-quantifying semantic similarities/relatedness of semantic entities are
-presented in detail: the first one relies on corpora analysis and is based on
-Natural Language Processing techniques and semantic models while the second is
-based on more or less formal, computer-readable and workable forms of knowledge
-such as semantic networks, thesauri or ontologies. (...) Beyond a simple
-inventory and categorization of existing measures, the aim of this monograph is
-to guide novices as well as researchers in these domains towards a better
-understanding of semantic similarity estimation and more generally semantic
-measures.
-"
-4616,1704.05347,\v{Z}eljko Agi\'c and Natalie Schluter,Baselines and test data for cross-lingual inference,cs.CL," The recent years have seen a revival of interest in textual entailment,
-sparked by i) the emergence of powerful deep neural network learners for
-natural language processing and ii) the timely development of large-scale
-evaluation datasets such as SNLI. Recast as natural language inference, the
-problem now amounts to detecting the relation between pairs of statements: they
-either contradict or entail one another, or they are mutually neutral. Current
-research in natural language inference is effectively exclusive to English. In
-this paper, we propose to advance the research in SNLI-style natural language
-inference toward multilingual evaluation. To that end, we provide test data for
-four major languages: Arabic, French, Spanish, and Russian. We experiment with
-a set of baselines. Our systems are based on cross-lingual word embeddings and
-machine translation. While our best system scores an average accuracy of just
-over 75%, we focus largely on enabling further research in multilingual
-inference.
-"
-4617,1704.05358,"Jiaqi Mu, Suma Bhat, Pramod Viswanath",Representing Sentences as Low-Rank Subspaces,cs.CL," Sentences are important semantic units of natural language. A generic,
-distributional representation of sentences that can capture the latent
-semantics is beneficial to multiple downstream applications. We observe a
-simple geometry of sentences -- the word representations of a given sentence
-(on average 10.23 words in all SemEval datasets with a standard deviation 4.84)
-roughly lie in a low-rank subspace (roughly, rank 4). Motivated by this
-observation, we represent a sentence by the low-rank subspace spanned by its
-word vectors. Such an unsupervised representation is empirically validated via
-semantic textual similarity tasks on 19 different datasets, where it
-outperforms the sophisticated neural network models, including skip-thought
-vectors, by 15% on average.
-"
-4618,1704.05393,"Michela Fazzolari, Marinella Petrocchi, Alessandro Tommasi, Cesare
- Zavattari","Mining Worse and Better Opinions. Unsupervised and Agnostic Aggregation
- of Online Reviews",cs.SI cs.CL cs.IR," In this paper, we propose a novel approach for aggregating online reviews,
-according to the opinions they express. Our methodology is unsupervised - due
-to the fact that it does not rely on pre-labeled reviews - and it is agnostic -
-since it does not make any assumption about the domain or the language of the
-review content. 
We measure the adherence of a review's content to the domain
-terminology extracted from a review set. First, we demonstrate the
-informativeness of the adherence metric with respect to the score associated
-with a review. Then, we exploit the metric values to group reviews, according
-to the opinions they express. Our experimental campaign has been carried out on
-two large datasets collected from Booking and Amazon, respectively.
-"
-4619,1704.05415,"Cristina Espa\~na-Bonet, \'Ad\'am Csaba Varga, Alberto
- Barr\'on-Cede\~no and Josef van Genabith","An Empirical Analysis of NMT-Derived Interlingual Embeddings and their
- Use in Parallel Sentence Identification",cs.CL," End-to-end neural machine translation has overtaken statistical machine
-translation in terms of translation quality for some language pairs, especially
-those with large amounts of parallel data. Besides this palpable improvement,
-neural networks provide several new properties. A single system can be trained
-to translate between many languages at almost no additional cost other than
-training time. Furthermore, internal representations learned by the network
-serve as a new semantic representation of words -or sentences- which, unlike
-standard word embeddings, are learned in an essentially bilingual or even
-multilingual context. In view of these properties, the contribution of the
-present work is two-fold. First, we systematically study the NMT context
-vectors, i.e. output of the encoder, and their power as an interlingua
-representation of a sentence. We assess their quality and effectiveness by
-measuring similarities across translations, as well as semantically related and
-semantically unrelated sentence pairs. Second, as extrinsic evaluation of the
-first point, we identify parallel sentences in comparable corpora, obtaining an
-F1=98.2% on data from a shared task when using only NMT context vectors. Using
-context vectors jointly with similarity measures, F1 reaches 98.9%.
-"
-4620,1704.05426,"Adina Williams, Nikita Nangia and Samuel R. Bowman","A Broad-Coverage Challenge Corpus for Sentence Understanding through
- Inference",cs.CL," This paper introduces the Multi-Genre Natural Language Inference (MultiNLI)
-corpus, a dataset designed for use in the development and evaluation of machine
-learning models for sentence understanding. In addition to being one of the
-largest corpora available for the task of NLI, at 433k examples, this corpus
-improves upon available resources in its coverage: it offers data from ten
-distinct genres of written and spoken English--making it possible to evaluate
-systems on nearly the full complexity of the language--and it offers an
-explicit setting for the evaluation of cross-genre domain adaptation.
-"
-4621,1704.05513,"Pierre-Hadrien Arnoux, Anbang Xu, Neil Boyette, Jalal Mahmud, Rama
- Akkiraju, Vibha Sinha","25 Tweets to Know You: A New Model to Predict Personality with Social
- Media",cs.SI cs.AI cs.CL cs.HC," Predicting personality is essential for social applications supporting
-human-centered activities, yet prior modeling methods with users' written text
-require too much input data to be realistically used in the context of social
-media. In this work, we aim to drastically reduce the data requirement for
-personality modeling and develop a model that is applicable to most users on
-Twitter. Our model integrates Word Embedding features with Gaussian Processes
-regression. 
Based on the evaluation of over 1.3K users on Twitter, we find that -our model achieves comparable or better accuracy than state of the art -techniques with 8 times fewer data. -" -4622,1704.05543,"Gaurav Singh Tomar, Sreecharan Sankaranarayanan, Xu Wang and Carolyn - Penstein Ros\'e",Coordinating Collaborative Chat in Massive Open Online Courses,cs.CY cs.AI cs.CL cs.HC," An earlier study of a collaborative chat intervention in a Massive Open -Online Course (MOOC) identified negative effects on attrition stemming from a -requirement for students to be matched with exactly one partner prior to -beginning the activity. That study raised questions about how to orchestrate a -collaborative chat intervention in a MOOC context in order to provide the -benefit of synchronous social engagement without the coordination difficulties. -In this paper we present a careful analysis of an intervention designed to -overcome coordination difficulties by welcoming students into the chat on a -rolling basis as they arrive rather than requiring them to be matched with a -partner before beginning. The results suggest the most positive impact when -experiencing a chat with exactly one partner rather than more or less. A -qualitative analysis of the chat data reveals differential experiences between -these configurations that suggests a potential explanation for the effect and -raises questions for future research. -" -4623,1704.05550,Rakesh Verma and Daniel Lee,"Extractive Summarization: Limits, Compression, Generalized Model and - Heuristics",cs.CL cs.IR," Due to its promise to alleviate information overload, text summarization has -attracted the attention of many researchers. However, it has remained a serious -challenge. Here, we first prove empirical limits on the recall (and F1-scores) -of extractive summarizers on the DUC datasets under ROUGE evaluation for both -the single-document and multi-document summarization tasks. Next we define the -concept of compressibility of a document and present a new model of -summarization, which generalizes existing models in the literature and -integrates several dimensions of the summarization, viz., abstractive versus -extractive, single versus multi-document, and syntactic versus semantic. -Finally, we examine some new and existing single-document summarization -algorithms in a single framework and compare with state of the art summarizers -on DUC data. -" -4624,1704.05571,Mayank Kejriwal,"Predicting Role Relevance with Minimal Domain Expertise in a Financial - Domain",cs.CL," Word embeddings have made enormous inroads in recent years in a wide variety -of text mining applications. In this paper, we explore a word embedding-based -architecture for predicting the relevance of a role between two financial -entities within the context of natural language sentences. In this extended -abstract, we propose a pooled approach that uses a collection of sentences to -train word embeddings using the skip-gram word2vec architecture. We use the -word embeddings to obtain context vectors that are assigned one or more labels -based on manual annotations. We train a machine learning classifier using the -labeled context vectors, and use the trained classifier to predict contextual -role relevance on test data. Our approach serves as a good minimal-expertise -baseline for the task as it is simple and intuitive, uses open-source modules, -requires little feature crafting effort and performs well across roles. 
-"
-4625,1704.05572,Tushar Khot and Ashish Sabharwal and Peter Clark,Answering Complex Questions Using Open Information Extraction,cs.AI cs.CL," While there has been substantial progress in factoid question-answering (QA),
-answering complex questions remains challenging, typically requiring both a
-large body of knowledge and inference techniques. Open Information Extraction
-(Open IE) provides a way to generate semi-structured knowledge for QA, but to
-date such knowledge has only been used to answer simple questions with
-retrieval-based methods. We overcome this limitation by presenting a method for
-reasoning with Open IE knowledge, allowing more complex questions to be
-handled. Using a recently proposed support graph optimization framework for QA,
-we develop a new inference model for Open IE, in particular one that can work
-effectively with multiple short facts, noise, and the relational structure of
-tuples. Our model significantly outperforms a state-of-the-art structured
-solver on complex questions of varying difficulty, while also removing the
-reliance on manually curated knowledge.
-"
-4626,1704.05579,"Mikhail Khodak, Nikunj Saunshi and Kiran Vodrahalli",A Large Self-Annotated Corpus for Sarcasm,cs.CL cs.AI cs.LG," We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for
-sarcasm research and for training and evaluating systems for sarcasm detection.
-The corpus has 1.3 million sarcastic statements -- 10 times more than any
-previous dataset -- and many times more instances of non-sarcastic statements,
-allowing for learning in both balanced and unbalanced label regimes. Each
-statement is furthermore self-annotated -- sarcasm is labeled by the author,
-not an independent annotator -- and provided with user, topic, and conversation
-context. We evaluate the corpus for accuracy, construct benchmarks for sarcasm
-detection, and evaluate baseline methods.
-"
-4627,1704.05611,"Vijay Krishna Menon, S Rajendran, M Anandkumar, K P Soman","Dependency resolution and semantic mining using Tree Adjoining Grammars
- for Tamil Language",cs.CL," Tree adjoining grammars (TAGs) provide an ample tool to capture syntax of
-many Indian languages. Tamil represents a special challenge to computational
-formalisms as it has extensive agglutinative morphology and a comparatively
-difficult argument structure. Modelling Tamil syntax and morphology using TAG
-is an interesting problem which has not been in focus even though TAGs are over
-four decades old. Our research with Tamil TAGs has shown us that we can not
-only represent the syntax of the language, but also, to an extent, mine out
-semantics through dependency resolution of the sentence. But in order to
-demonstrate this property, we need to parse Tamil sentences using the TAGs we
-have built and, through parsing, obtain a derivation we can use to resolve
-dependencies, thus establishing the semantic property. We use an in-house
-developed pseudo-lexical TAG chart parser, based on the algorithm given by
-Schabes and Joshi (1988), for generating derivations of sentences. We do not
-use any statistics to rank ambiguous derivations but rather use all of them to
-understand the mentioned semantic relations within TAGs for Tamil. We shall
-also present a brief parser analysis for the completeness of our discussions. 
-"
-4628,1704.05742,Pengfei Liu and Xipeng Qiu and Xuanjing Huang,Adversarial Multi-task Learning for Text Classification,cs.CL," Neural network models have shown promise for multi-task learning, which
-focuses on learning shared layers to extract common, task-invariant features.
-However, in most existing approaches, the extracted shared features are prone
-to be contaminated by task-specific features or the noise brought by other
-tasks. In this paper, we propose an adversarial multi-task learning framework,
-preventing the shared and private latent feature spaces from interfering with
-each other. We conduct extensive experiments on 16 different text
-classification tasks, which demonstrate the benefits of our approach. Besides,
-we show that the shared knowledge learned by our proposed model can be regarded
-as off-the-shelf knowledge and easily transferred to new tasks. The datasets of
-all 16 tasks are publicly available at \url{http://nlp.fudan.edu.cn/data/}
-"
-4629,1704.05753,"Youxuan Jiang, Jonathan K. Kummerfeld and Walter S. Lasecki","Understanding Task Design Trade-offs in Crowdsourced Paraphrase
- Collection",cs.CL cs.HC," Linguistically diverse datasets are critical for training and evaluating
-robust machine learning systems, but data collection is a costly process that
-often requires experts. Crowdsourcing the process of paraphrase generation is
-an effective means of expanding natural language datasets, but there has been
-limited analysis of the trade-offs that arise when designing tasks. In this
-paper, we present the first systematic study of the key factors in
-crowdsourcing paraphrase collection. We consider variations in instructions,
-incentives, data domains, and workflows. We manually analyzed paraphrases for
-correctness, grammaticality, and linguistic diversity. Our observations provide
-new insight into the trade-offs between accuracy and diversity in crowd
-responses that arise as a result of task design, providing guidance for future
-paraphrase generation procedures.
-"
-4630,1704.05781,"Pierre Lison, Andrey Kutuzov","Redefining Context Windows for Word Embedding Models: An Experimental
- Study",cs.CL," Distributional semantic models learn vector representations of words through
-the contexts they occur in. Although the choice of context (which often takes
-the form of a sliding window) has a direct influence on the resulting
-embeddings, the exact role of this model component is still not fully
-understood. This paper presents a systematic analysis of context windows based
-on a set of four distinct hyper-parameters. We train continuous Skip-Gram
-models on two English-language corpora for various combinations of these
-hyper-parameters, and evaluate them on both lexical similarity and analogy
-tasks. Notable experimental results are the positive impact of cross-sentential
-contexts and the surprisingly good performance of right-context windows.
-"
-4631,1704.05907,Hongyu Guo and Colin Cherry and Jiang Su,End-to-End Multi-View Networks for Text Classification,cs.CL cs.LG cs.NE," We propose a multi-view network for text classification. Our method
-automatically creates various views of its input text, each taking the form of
-soft attention weights that distribute the classifier's focus among a set of
-base features. For a bag-of-words representation, each view focuses on a
-different subset of the text's words. Aggregating many such views results in a
-more discriminative and robust representation. 
Through a novel architecture
-that both stacks and concatenates views, we produce a network that emphasizes
-both depth and width, allowing training to converge quickly. Using our
-multi-view architecture, we establish new state-of-the-art accuracies on two
-benchmark tasks.
-"
-4632,1704.05908,"Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy",An Interpretable Knowledge Transfer Model for Knowledge Base Completion,cs.CL cs.AI cs.LG," Knowledge bases are important resources for a variety of natural language
-processing tasks but suffer from incompleteness. We propose a novel embedding
-model, \emph{ITransF}, to perform knowledge base completion. Equipped with a
-sparse attention mechanism, ITransF discovers hidden concepts of relations and
-transfers statistical strength through the sharing of concepts. Moreover, the
-learned associations between relations and concepts, which are represented by
-sparse attention vectors, can be interpreted easily. We evaluate ITransF on two
-benchmark datasets---WN18 and FB15k for knowledge base completion and obtain
-improvements on both the mean rank and Hits@10 metrics, over all baselines that
-do not use additional information.
-"
-4633,1704.05958,"Yu Su, Honglei Liu, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan",Global Relation Embedding for Relation Extraction,cs.CL," We study the problem of textual relation embedding with distant supervision.
-To combat the wrong labeling problem of distant supervision, we propose to
-embed textual relations with global statistics of relations, i.e., the
-co-occurrence statistics of textual and knowledge base relations collected from
-the entire corpus. This approach turns out to be more robust to the training
-noise introduced by distant supervision. On a popular relation extraction
-dataset, we show that the learned textual relation embedding can be used to
-augment existing relation extraction models and significantly improve their
-performance. Most remarkably, for the top 1,000 relational facts discovered by
-the best existing model, the precision can be improved from 83.9% to 89.3%.
-"
-4634,1704.05972,"Leon Derczynski and Kalina Bontcheva and Maria Liakata and Rob Procter
- and Geraldine Wong Sak Hoi and Arkaitz Zubiaga","SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support
- for rumours",cs.CL cs.AI," Media is full of false claims. Even Oxford Dictionaries named ""post-truth"" as
-the word of 2016. This makes it more important than ever to build systems that
-can identify the veracity of a story, and the kind of discourse there is around
-it. RumourEval is a SemEval shared task that aims to identify and handle
-rumours and reactions to them, in text. We present an annotation scheme, a
-large dataset covering multiple topics - each having their own families of
-claims and replies - and use these to pose two concrete challenges as well as
-the results achieved by participants on these challenges.
-"
-4635,1704.05973,"Tong Chen, Lin Wu, Xue Li, Jun Zhang, Hongzhi Yin, Yang Wang","Call Attention to Rumors: Deep Attention Based Recurrent Neural Networks
- for Early Rumor Detection",cs.CL cs.SI," The proliferation of social media in communication and information
-dissemination has made it an ideal platform for spreading rumors. Automatically
-debunking rumors at their stage of diffusion is known as \textit{early rumor
-detection}, which refers to dealing with sequential posts regarding disputed
-factual claims with certain variations and high textual duplication over
-time. 
Thus, identifying trending rumors demands an efficient yet flexible model
-that is able to capture long-range dependencies among postings and produce
-distinct representations for accurate early detection. However, it is a
-challenging task to apply conventional classification algorithms to early rumor
-detection, since they rely on hand-crafted features which require intensive
-manual effort in the case of a large amount of posts. This paper presents a
-deep attention model on the basis of recurrent neural networks (RNN) to learn
-\textit{selectively} temporal hidden representations of sequential posts for
-identifying rumors. The proposed model embeds soft attention into the
-recurrence to simultaneously pool out distinct features with particular focus
-and produce hidden representations that capture contextual variations of
-relevant posts over time. Extensive experiments on real datasets collected from
-social media websites demonstrate that (1) the deep attention based RNN model
-outperforms state-of-the-art methods that rely on hand-crafted features; (2)
-the introduction of the soft attention mechanism can effectively distill
-relevant parts to rumors from original posts in advance; (3) the proposed
-method detects rumors more quickly and accurately than competitors.
-"
-4636,1704.05974,"Yu Su, Xifeng Yan",Cross-domain Semantic Parsing via Paraphrasing,cs.CL," Existing studies on semantic parsing mainly focus on the in-domain setting.
-We formulate cross-domain semantic parsing as a domain adaptation problem:
-train a semantic parser on some source domains and then adapt it to the target
-domain. Due to the diversity of logical forms in different domains, this
-problem presents unique and intriguing challenges. By converting logical forms
-into canonical utterances in natural language, we reduce semantic parsing to
-paraphrasing, and develop an attentive sequence-to-sequence paraphrase model
-that is general and flexible to adapt to different domains. We discover two
-problems, small micro variance and large macro variance, of pre-trained word
-embeddings that hinder their direct use in neural networks, and propose
-standardization techniques as a remedy. On the popular Overnight dataset, which
-contains eight domains, we show that both cross-domain training and
-standardized pre-trained word embeddings can bring significant improvement.
-"
-4637,1704.06104,Steffen Eger and Johannes Daxenberger and Iryna Gurevych,Neural End-to-End Learning for Computational Argumentation Mining,cs.CL," We investigate neural techniques for end-to-end computational argumentation
-mining (AM). We frame AM both as a token-based dependency parsing and as a
-token-based sequence tagging problem, including a multi-task learning setup.
-Contrary to models that operate on the argument component level, we find that
-framing AM as dependency parsing leads to subpar performance results. In
-contrast, less complex (local) tagging models based on BiLSTMs perform robustly
-across classification scenarios, being able to catch long-range dependencies
-inherent to the AM problem. Moreover, we find that jointly learning 'natural'
-subtasks, in a multi-task learning setup, improves performance.
-"
-4638,1704.06125,Mathieu Cliche,"BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and
- LSTMs",cs.CL stat.ML," In this paper we describe our attempt at producing a state-of-the-art Twitter
-sentiment classifier using Convolutional Neural Networks (CNNs) and Long
-Short-Term Memory (LSTM) networks. 
Our system leverages a large amount of unlabeled
-data to pre-train word embeddings. We then use a subset of the unlabeled data
-to fine-tune the embeddings using distant supervision. The final CNNs and LSTMs
-are trained on the SemEval-2017 Twitter dataset where the embeddings are
-fine-tuned again. To boost performance, we ensemble several CNNs and LSTMs
-together. Our approach achieved first rank on all of the five English subtasks
-amongst 40 teams.
-"
-4639,1704.06194,"Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cicero dos Santos, Bing Xiang,
- Bowen Zhou",Improved Neural Relation Detection for Knowledge Base Question Answering,cs.CL cs.AI cs.NE," Relation detection is a core component for many NLP applications including
-Knowledge Base Question Answering (KBQA). In this paper, we propose a
-hierarchical recurrent neural network enhanced by residual learning that
-detects KB relations given an input question. Our method uses deep residual
-bidirectional LSTMs to compare questions and relation names via different
-hierarchies of abstraction. Additionally, we propose a simple KBQA system that
-integrates entity linking and our proposed relation detector to enable one to
-enhance the other. Experimental results show that our approach achieves not
-only outstanding relation detection performance, but more importantly, it helps
-our KBQA system to achieve state-of-the-art accuracy for both single-relation
-(SimpleQuestions) and multi-relation (WebQSP) QA benchmarks.
-"
-4640,1704.06217,"Ji He, Mari Ostendorf, Xiaodong He","Reinforcement Learning with External Knowledge and Two-Stage Q-functions
- for Predicting Popular Reddit Threads",cs.CL," This paper addresses the problem of predicting popularity of comments in an
-online discussion forum using reinforcement learning, particularly addressing
-two challenges that arise from having natural language state and action spaces.
-First, the state representation, which characterizes the history of comments
-tracked in a discussion at a particular point, is augmented to incorporate the
-global context represented by discussions on world events available in an
-external knowledge source. Second, a two-stage Q-learning framework is
-introduced, making it feasible to search the combinatorial action space while
-also accounting for redundancy among sub-actions. We experiment with five
-Reddit communities, showing that the two methods improve over previously
-reported results on this task.
-"
-4641,1704.06259,"Ping Chen, Fei Wu, Tong Wang, Wei Ding",A Semantic QA-Based Approach for Text Summarization Evaluation,cs.CL cs.AI," Many Natural Language Processing and Computational Linguistics applications
-involve the generation of new texts based on some existing texts, such as
-summarization, text simplification and machine translation. However, there has
-been a serious problem haunting these applications for decades, that is, how to
-automatically and accurately assess the quality of these applications. In this
-paper, we will present some preliminary results on one especially useful and
-challenging problem in NLP system evaluation: how to pinpoint content
-differences of two text passages (especially for large passages such as
-articles and books). Our idea is intuitive and very different from existing
-approaches. We treat one text passage as a small knowledge base, and ask it a
-large number of questions to exhaustively identify all content points in it. 
By
-comparing the correctly answered questions from two text passages, we will be
-able to compare their content precisely. The experiment using the 2007 DUC
-summarization corpus clearly shows promising results.
-"
-4642,1704.06358,"Benjamin Goodman, Paul Tupper",Stability and Fluctuations in a Simple Model of Phonetic Category Change,cs.CL math.DS," In spoken languages, speakers divide up the space of phonetic possibilities
-into different regions, corresponding to different phonemes. We consider a
-simple exemplar model of how this division of phonetic space varies over time
-among a population of language users. In the particular model we consider, we
-show that, once the system is initialized with a given set of phonemes,
-phonemes do not become extinct: all phonemes will be maintained in the system
-for all time. This is in contrast to what is observed in more complex models.
-Furthermore, we show that the boundaries between phonemes fluctuate and we
-quantitatively study the fluctuations in a simple instance of our model. These
-results prepare the ground for more sophisticated models in which some phonemes
-go extinct or new phonemes emerge through other processes.
-"
-4643,1704.06360,"Jason Fries, Sen Wu, Alex Ratner, Christopher R\'e","SwellShark: A Generative Model for Biomedical Named Entity Recognition
- without Labeled Data",cs.CL," We present SwellShark, a framework for building biomedical named entity
-recognition (NER) systems quickly and without hand-labeled data. Our approach
-views biomedical resources like lexicons as function primitives for
-autogenerating weak supervision. We then use a generative model to unify and
-denoise this supervision and construct large-scale, probabilistically labeled
-datasets for training high-accuracy NER taggers. In three biomedical NER tasks,
-SwellShark achieves competitive scores with state-of-the-art supervised
-benchmarks using no hand-labeled training data. In a drug name extraction task
-using patient medical records, one domain expert using SwellShark achieved
-within 5.1% of a crowdsourced annotation approach -- which originally utilized
-20 teams over the course of several weeks -- in 24 hours.
-"
-4644,1704.06380,Aaron Jaech and Mari Ostendorf,Improving Context Aware Language Models,cs.CL," Increased adaptability of RNN language models leads to improved predictions
-that benefit many applications. However, current methods do not take full
-advantage of the RNN structure. We show that the most widely-used approach to
-adaptation (concatenating the context with the word embedding at the input to
-the recurrent layer) is outperformed by a model that has some low-cost
-improvements: adaptation of both the hidden and output layers, and a feature
-hashing bias term to capture context idiosyncrasies. Experiments on language
-modeling and classification tasks using three different corpora demonstrate the
-advantages of the proposed techniques.
-"
-4645,1704.06393,"Long Zhou, Wenpeng Hu, Jiajun Zhang and Chengqing Zong",Neural System Combination for Machine Translation,cs.CL," Neural machine translation (NMT) has become a new approach to machine
-translation and generates much more fluent results compared to statistical
-machine translation (SMT).
- However, SMT is usually better than NMT in translation adequacy. It is
-therefore a promising direction to combine the advantages of both NMT and SMT. 
- In this paper, we propose a neural system combination framework leveraging
-multi-source NMT, which takes as input the outputs of NMT and SMT systems and
-produces the final translation.
- Extensive experiments on the Chinese-to-English translation task show that
-our model achieves a significant improvement of 5.3 BLEU points over the best
-single system output and 3.4 BLEU points over the state-of-the-art traditional
-system combination methods.
-"
-4646,1704.06485,"Cesc Chunseong Park, Byeongchang Kim, Gunhee Kim","Attend to You: Personalized Image Captioning with Context Sequence
- Memory Networks",cs.CV cs.CL," We address personalization issues of image captioning, which have not been
-discussed yet in previous research. For a query image, we aim to generate a
-descriptive sentence, accounting for prior knowledge such as the user's active
-vocabularies in previous documents. As applications of personalized image
-captioning, we tackle two post automation tasks: hashtag prediction and post
-generation, on our newly collected Instagram dataset, consisting of 1.1M posts
-from 6.3K users. We propose a novel captioning model named Context Sequence
-Memory Network (CSMN). Its unique updates over previous memory network models
-include (i) exploiting memory as a repository for multiple types of context
-information, (ii) appending previously generated words into memory to capture
-long-term information without suffering from the vanishing gradient problem,
-and (iii) adopting a CNN memory structure to jointly represent nearby ordered
-memory slots for better context understanding. With quantitative evaluation and
-user studies via Amazon Mechanical Turk, we show the effectiveness of the three
-novel features of CSMN and its performance enhancement for personalized image
-captioning over state-of-the-art captioning models.
-"
-4647,1704.06497,"Julia Kreutzer, Artem Sokolov, Stefan Riezler",Bandit Structured Prediction for Neural Sequence-to-Sequence Learning,stat.ML cs.CL cs.LG," Bandit structured prediction describes a stochastic optimization framework
-where learning is performed from partial feedback. This feedback is received in
-the form of a task loss evaluation to a predicted output structure, without
-having access to gold standard structures. We advance this framework by lifting
-linear bandit learning to neural sequence-to-sequence learning problems using
-attention-based recurrent neural networks. Furthermore, we show how to
-incorporate control variates into our learning algorithms for variance
-reduction and improved generalization. We present an evaluation on a neural
-machine translation task that shows improvements of up to 5.89 BLEU points for
-domain adaptation from simulated bandit feedback.
-"
-4648,1704.06567,Jind\v{r}ich Libovick\'y and Jind\v{r}ich Helcl,Attention Strategies for Multi-Source Sequence-to-Sequence Learning,cs.CL cs.NE," Modeling attention in neural multi-source sequence-to-sequence learning
-remains a relatively unexplored area, despite its usefulness in tasks that
-incorporate multiple source languages or modalities. We propose two novel
-approaches to combine the outputs of attention mechanisms over each source
-sequence: flat and hierarchical. We compare the proposed methods with existing
-techniques and present results of a systematic evaluation of those methods on
-the WMT16 Multimodal Translation and Automatic Post-editing tasks. We show that
-the proposed methods achieve competitive results on both tasks.
-"
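-To make the flat/hierarchical distinction in the entry above concrete, here is
-a minimal NumPy sketch of the two combination strategies. The dot-product
-scoring function, the shapes, and all names are illustrative assumptions, not
-the configuration used in the paper.
-
-import numpy as np
-
-def softmax(x):
-    e = np.exp(x - x.max())
-    return e / e.sum()
-
-def attend(query, states):
-    # Dot-product attention over one source's encoder states (T, d).
-    weights = softmax(states @ query)   # (T,) attention distribution
-    return weights @ states             # (d,) context vector
-
-def flat_combination(query, sources):
-    # Flat strategy: a single attention distribution over the states of
-    # ALL sources at once.
-    return attend(query, np.concatenate(sources, axis=0))
-
-def hierarchical_combination(query, sources):
-    # Hierarchical strategy: attend within each source first, then attend
-    # over the resulting per-source context vectors.
-    contexts = np.stack([attend(query, s) for s in sources])  # (K, d)
-    return attend(query, contexts)
-
-# Toy usage: two sources (e.g. two input modalities), hidden size 4.
-rng = np.random.default_rng(0)
-sources = [rng.normal(size=(5, 4)), rng.normal(size=(7, 4))]
-query = rng.normal(size=4)
-print(flat_combination(query, sources).shape)          # (4,)
-print(hierarchical_combination(query, sources).shape)  # (4,)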
-" -4649,1704.06619,Arman Cohan and Nazli Goharian,"Scientific Article Summarization Using Citation-Context and Article's - Discourse Structure",cs.CL cs.IR," We propose a summarization approach for scientific articles which takes -advantage of citation-context and the document discourse model. While citations -have been previously used in generating scientific summaries, they lack the -related context from the referenced article and therefore do not accurately -reflect the article's content. Our method overcomes the problem of -inconsistency between the citation summary and the article's content by -providing context for each citation. We also leverage the inherent scientific -article's discourse for producing better summaries. We show that our proposed -method effectively improves over existing summarization approaches (greater -than 30% improvement over the best performing baseline) in terms of -\textsc{Rouge} scores on TAC2014 scientific summarization dataset. While the -dataset we use for evaluation is in the biomedical domain, most of our -approaches are general and therefore adaptable to other domains. -" -4650,1704.06692,"Thomas Kober, Julie Weeds, Jeremy Reffin and David Weir",Improving Semantic Composition with Offset Inference,cs.CL," Count-based distributional semantic models suffer from sparsity due to -unobserved but plausible co-occurrences in any text collection. This problem is -amplified for models like Anchored Packed Trees (APTs), that take the -grammatical type of a co-occurrence into account. We therefore introduce a -novel form of distributional inference that exploits the rich type structure in -APTs and infers missing data by the same mechanism that is used for semantic -composition. -" -4651,1704.06779,Nafise Sadat Moosavi and Michael Strube,Lexical Features in Coreference Resolution: To be Used With Caution,cs.CL," Lexical features are a major source of information in state-of-the-art -coreference resolvers. Lexical features implicitly model some of the linguistic -phenomena at a fine granularity level. They are especially useful for -representing the context of mentions. In this paper we investigate a drawback -of using many lexical features in state-of-the-art coreference resolvers. We -show that if coreference resolvers mainly rely on lexical features, they can -hardly generalize to unseen domains. Furthermore, we show that the current -coreference resolution evaluation is clearly flawed by only evaluating on a -specific split of a specific dataset in which there is a notable overlap -between the training, development and test sets. -" -4652,1704.06836,Lotem Peled and Roi Reichart,"Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual - Machine Translation",cs.CL," Sarcasm is a form of speech in which speakers say the opposite of what they -truly mean in order to convey a strong sentiment. In other words, ""Sarcasm is -the giant chasm between what I say, and the person who doesn't get it."". In -this paper we present the novel task of sarcasm interpretation, defined as the -generation of a non-sarcastic utterance conveying the same message as the -original sarcastic one. We introduce a novel dataset of 3000 sarcastic tweets, -each interpreted by five human judges. Addressing the task as monolingual -machine translation (MT), we experiment with MT algorithms and evaluation -measures. We then present SIGN: an MT based sarcasm interpretation algorithm -that targets sentiment words, a defining element of textual sarcasm. 
We show
-that while the scores of n-gram based automatic measures are similar for all
-interpretation models, SIGN's interpretations are scored higher by humans for
-adequacy and sentiment polarity. We conclude with a discussion on future
-research directions for our new task.
-"
-4653,1704.06841,"Mark Hughes, Irene Li, Spyros Kotoulas, Toyotaro Suzumura",Medical Text Classification using Convolutional Neural Networks,cs.CL," We present an approach to automatically classify clinical text at a sentence
-level. We use deep convolutional neural networks to represent complex
-features. We train the network on a dataset providing a broad categorization of
-health information. Through a detailed evaluation, we demonstrate that our
-method outperforms several approaches widely used in natural language
-processing tasks by about 15%.
-"
-4654,1704.06851,"Sayan Ghosh, Mathieu Chollet, Eugene Laksana, Louis-Philippe Morency,
- Stefan Scherer","Affect-LM: A Neural Language Model for Customizable Affective Text
- Generation",cs.CL," Human verbal communication includes affective messages which are conveyed
-through use of emotionally colored words. There has been a lot of research in
-this direction but the problem of integrating state-of-the-art neural language
-models with affective information remains an area ripe for exploration. In this
-paper, we propose an extension to an LSTM (Long Short-Term Memory) language
-model for generating conversational text, conditioned on affect categories. Our
-proposed model, Affect-LM, enables us to customize the degree of emotional
-content in generated sentences through an additional design parameter.
-Perception studies conducted using Amazon Mechanical Turk show that Affect-LM
-generates natural-looking emotional sentences without sacrificing grammatical
-correctness. Affect-LM also learns affect-discriminative word representations,
-and perplexity experiments show that additional affective information in
-conversational text can improve language model prediction.
-"
-4655,1704.06855,"Hao Peng, Sam Thomson, Noah A. Smith",Deep Multitask Learning for Semantic Dependency Parsing,cs.CL," We present a deep neural architecture that parses sentences into three
-semantic dependency graph formalisms. By using efficient, nearly arc-factored
-inference and a bidirectional-LSTM composed with a multi-layer perceptron, our
-base system is able to significantly improve the state of the art for semantic
-dependency parsing, without using hand-engineered features or syntax. We then
-explore two multitask learning approaches---one that shares parameters across
-formalisms, and one that uses higher-order structures to predict the graphs
-jointly. We find that both approaches improve performance across formalisms on
-average, achieving a new state of the art. Our code is open-source and
-available at https://github.com/Noahs-ARK/NeurboParser.
-"
-4656,1704.06869,"Vlad Niculae, Joonsuk Park, Claire Cardie",Argument Mining with Structured SVMs and RNNs,cs.CL," We propose a novel factor graph model for argument mining, designed for
-settings in which the argumentative relations in a document do not necessarily
-form a tree structure. (This is the case in over 20% of the web comments
-dataset we release.) Our model jointly learns elementary unit type
-classification and argumentative relation prediction. 
Moreover, our model
-supports SVM and RNN parametrizations, can enforce structure constraints (e.g.,
-transitivity), and can express dependencies between adjacent relations and
-propositions. Our approaches outperform unstructured baselines in both web
-comments and argumentative essay datasets.
-"
-4657,1704.06877,"Adams Wei Yu, Hongrae Lee, Quoc V. Le",Learning to Skim Text,cs.CL cs.LG," Recurrent Neural Networks are showing much promise in many sub-areas of
-natural language processing, ranging from document classification to machine
-translation to automatic question answering. Despite their promise, many
-recurrent models have to read the whole text word by word, making it slow to
-handle long documents. For example, it is difficult to use a recurrent network
-to read a book and answer questions about it. In this paper, we present an
-approach to reading text while skipping irrelevant information if needed. The
-underlying model is a recurrent network that learns how far to jump after
-reading a few words of the input text. We employ a standard policy gradient
-method to train the model to make discrete jumping decisions. In our benchmarks
-on four different tasks, including number prediction, sentiment analysis, news
-article classification and automatic Q\&A, our proposed model, a modified LSTM
-with jumping, is up to 6 times faster than the standard sequential LSTM, while
-maintaining the same or even better accuracy.
-"
-4658,1704.06879,"Rui Meng, Sanqiang Zhao, Shuguang Han, Daqing He, Peter Brusilovsky,
- Yu Chi",Deep Keyphrase Generation,cs.CL," Keyphrases provide highly condensed information that can be effectively used
-for understanding, organizing and retrieving text content. Though previous
-studies have provided many workable solutions for automated keyphrase
-extraction, they commonly divide the to-be-summarized content into multiple
-text chunks, then rank and select the most meaningful ones. These approaches
-can neither identify keyphrases that do not appear in the text, nor capture
-the real semantic meaning behind the text. We propose a generative model for
-keyphrase prediction with an encoder-decoder framework, which can effectively
-overcome the above drawbacks. We name it deep keyphrase generation since it
-attempts to capture the deep semantic meaning of the content with a deep
-learning method. Empirical analysis on six datasets demonstrates that our
-proposed model not only achieves a significant performance boost on extracting
-keyphrases that appear in the source text, but also can generate absent
-keyphrases based on the semantic meaning of the text. Code and dataset are
-available at https://github.com/memray/OpenNMT-kpg-release.
-"
-4659,1704.06913,"Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux",Learning weakly supervised multimodal phoneme embeddings,cs.CL cs.LG," Recent works have explored deep architectures for learning multimodal speech
-representation (e.g. audio and images, articulation and audio) in a supervised
-way. Here we investigate the role of combining different speech modalities,
-i.e. audio and visual information representing the lip movements, in a weakly
-supervised way using Siamese networks and lexical same-different side
-information. In particular, we ask whether one modality can benefit from the
-other to provide a richer representation for phone recognition in a weakly
-supervised setting. We introduce mono-task and multi-task methods for merging
-speech and visual modalities for phone recognition. 
The mono-task learning
-consists of applying a Siamese network on the concatenation of the two
-modalities, while the multi-task learning receives several different
-combinations of modalities at train time. We show that multi-task learning
-enhances discriminability for visual and multimodal inputs while minimally
-impacting auditory inputs. Furthermore, we present a qualitative analysis of
-the obtained phone embeddings, and show that cross-modal visual input can
-improve the discriminability of phonological features which are visually
-discernible (rounding, open/close, labial place of articulation), resulting in
-representations that are closer to abstract linguistic features than those
-based on audio only.
-"
-4660,1704.06918,"Yusuke Oda, Philip Arthur, Graham Neubig, Koichiro Yoshino, Satoshi
- Nakamura",Neural Machine Translation via Binary Code Prediction,cs.CL," In this paper, we propose a new method for calculating the output layer in
-neural machine translation systems. The method is based on predicting a binary
-code for each word and can reduce computation time/memory requirements of the
-output layer to be logarithmic in vocabulary size in the best case. In
-addition, we also introduce two advanced approaches to improve the robustness
-of the proposed model: using error-correcting codes and combining softmax and
-binary codes. Experiments on two English-Japanese bidirectional translation
-tasks show that the proposed models achieve BLEU scores that approach the
-softmax, while reducing memory usage to the order of less than 1/10 and
-improving decoding speed on CPUs by 5x to 10x.
-"
-4661,1704.06933,"Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jianhuang Lai,
- Tie-Yan Liu",Adversarial Neural Machine Translation,cs.CL cs.LG stat.ML," In this paper, we study a new learning paradigm for Neural Machine
-Translation (NMT). Instead of maximizing the likelihood of the human
-translation as in previous works, we minimize the distinction between human
-translation and the translation given by an NMT model. To achieve this goal,
-inspired by the recent success of generative adversarial networks (GANs), we
-employ an adversarial training architecture and name it Adversarial-NMT. In
-Adversarial-NMT, the training of the NMT model is assisted by an adversary,
-which is an elaborately designed Convolutional Neural Network (CNN). The goal
-of the adversary is to differentiate the translation result generated by the
-NMT model from that by a human. The goal of the NMT model is to produce
-high-quality translations so as to cheat the adversary. A policy gradient
-method is leveraged to co-train the NMT model and the adversary. Experimental
-results on English$\rightarrow$French and German$\rightarrow$English
-translation tasks show that Adversarial-NMT can achieve significantly better
-translation quality than several strong baselines.
-"
-4662,1704.06936,Masashi Yoshikawa and Hiroshi Noji and Yuji Matsumoto,A* CCG Parsing with a Supertag and Dependency Factored Model,cs.CL," We propose a new A* CCG parsing model in which the probability of a tree is
-decomposed into factors of CCG categories and its syntactic dependencies, both
-defined on bi-directional LSTMs. Our factored model allows the precomputation
-of all probabilities and runs very efficiently, while modeling sentence
-structures explicitly via dependencies. Our model achieves state-of-the-art
-results on English and Japanese CCG parsing.
-"
-4663,1704.06956,Sida I. Wang and Samuel Ginn and Percy Liang and Christopher D. 
Manning,Naturalizing a Programming Language via Interactive Learning,cs.CL cs.AI cs.HC cs.LG," Our goal is to create a convenient natural language interface for performing
-well-specified but complex actions such as analyzing data, manipulating text,
-and querying databases. However, existing natural language interfaces for such
-tasks are quite primitive compared to the power one wields with a programming
-language. To bridge this gap, we start with a core programming language and
-allow users to ""naturalize"" the core language incrementally by defining
-alternative, more natural syntax and increasingly complex concepts in terms of
-compositions of simpler ones. In a voxel world, we show that a community of
-users can simultaneously teach a common system a diverse language and use it to
-build hundreds of complex voxel structures. Over the course of three days,
-these users went from using only the core language to using the naturalized
-language in 85.9\% of the last 10K utterances.
-"
-4664,1704.06960,"Jacob Andreas, Anca Dragan, Dan Klein",Translating Neuralese,cs.CL cs.NE," Several approaches have recently been proposed for learning decentralized
-deep multiagent policies that coordinate via a differentiable communication
-channel. While these policies are effective for many tasks, interpretation of
-their induced communication strategies has remained a challenge. Here we
-propose to interpret agents' messages by translating them. Unlike in typical
-machine translation problems, we have no parallel data to learn from. Instead
-we develop a translation model based on the insight that agent messages and
-natural language strings mean the same thing if they induce the same belief
-about the world in a listener. We present theoretical guarantees and empirical
-evidence that our approach preserves both the semantics and pragmatics of
-messages by ensuring that players communicating through a translation layer do
-not suffer a substantial loss in reward relative to players with a common
-language.
-"
-4665,1704.06970,"Kartik Goyal, Chris Dyer and Taylor Berg-Kirkpatrick",Differentiable Scheduled Sampling for Credit Assignment,cs.CL cs.LG cs.NE," We demonstrate that a continuous relaxation of the argmax operation can be
-used to create a differentiable approximation to greedy decoding for
-sequence-to-sequence (seq2seq) models. By incorporating this approximation into
-the scheduled sampling training procedure (Bengio et al., 2015)--a well-known
-technique for correcting exposure bias--we introduce a new training objective
-that is continuous and differentiable everywhere and that can provide
-informative gradients near points where previous decoding decisions change
-their value. In addition, by using a related approximation, we demonstrate a
-similar approach to sample-based training. Finally, we show that our approach
-outperforms cross-entropy training and scheduled sampling procedures in two
-sequence prediction tasks: named entity recognition and machine translation.
-"
-4666,1704.06986,"Kazuya Kawakami, Chris Dyer, Phil Blunsom","Learning to Create and Reuse Words in Open-Vocabulary Neural Language
- Modeling",cs.CL," Fixed-vocabulary language models fail to account for one of the most
-characteristic statistical facts of natural language: the frequent creation and
-reuse of new word types. 
Although character-level language models offer a
-partial solution in that they can create word types not attested in the
-training corpus, they do not capture the ""bursty"" distribution of such words.
-In this paper, we augment a hierarchical LSTM language model that generates
-sequences of word tokens character by character with a caching mechanism that
-learns to reuse previously generated words. To validate our model we construct
-a new open-vocabulary language modeling corpus (the Multilingual Wikipedia
-Corpus, MWC) from comparable Wikipedia articles in 7 typologically diverse
-languages and demonstrate the effectiveness of our model across this range of
-languages.
-"
-4667,1704.07047,"Deng Cai, Hai Zhao, Zhisong Zhang, Yuan Xin, Yongjian Wu, Feiyue Huang",Fast and Accurate Neural Word Segmentation for Chinese,cs.CL," Neural models with minimal feature engineering have achieved competitive
-performance against traditional methods for the task of Chinese word
-segmentation. However, both training and working procedures of the current
-neural models are computationally inefficient. This paper presents a greedy
-neural word segmenter with balanced word and character embedding inputs to
-alleviate the existing drawbacks. Our segmenter is truly end-to-end, capable of
-performing segmentation much faster and even more accurately than
-state-of-the-art neural models on Chinese benchmark datasets.
-"
-4668,1704.07050,Michael Bloodgood and Benjamin Strauss,Using Global Constraints and Reranking to Improve Cognates Detection,cs.CL cs.LG stat.ML," Global constraints and reranking have not been used in cognates detection
-research to date. We propose methods for using global constraints by performing
-rescoring of the score matrices produced by state-of-the-art cognates detection
-systems. Rescoring with global constraints is complementary to existing
-state-of-the-art methods and yields significant performance improvements beyond
-the current state of the art on publicly available datasets spanning different
-language pairs, different levels of baseline performance, and different data
-sizes, including larger and more realistic data sizes than have been evaluated
-in the past.
-"
-4669,1704.07073,"Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou",Selective Encoding for Abstractive Sentence Summarization,cs.CL," We propose a selective encoding model to extend the sequence-to-sequence
-framework for abstractive sentence summarization. It consists of a sentence
-encoder, a selective gate network, and an attention equipped decoder. The
-sentence encoder and decoder are built with recurrent neural networks. The
-selective gate network constructs a second-level sentence representation by
-controlling the information flow from encoder to decoder. The second-level
-representation is tailored for the sentence summarization task, which leads to
-better performance. We evaluate our model on the English Gigaword, DUC 2004 and
-MSR abstractive sentence summarization datasets. The experimental results show
-that the proposed selective encoding model outperforms the state-of-the-art
-baseline models.
-"
-4670,1704.07092,Jan Buys and Phil Blunsom,Robust Incremental Neural Semantic Graph Parsing,cs.CL," Parsing sentences to linguistically-expressive semantic representations is a
-key goal of Natural Language Processing. 
Yet statistical parsing has focused
-almost exclusively on bilexical dependencies or domain-specific logical forms.
-We propose a neural encoder-decoder transition-based parser which is the first
-full-coverage semantic graph parser for Minimal Recursion Semantics (MRS). The
-model architecture uses stack-based embedding features, predicting graphs
-jointly with unlexicalized predicates and their token alignments. Our parser is
-more accurate than attention-based baselines on MRS and on an additional
-Abstract Meaning Representation (AMR) benchmark, and GPU batch processing makes
-it an order of magnitude faster than a high-precision grammar-based parser.
-Further, the 86.69% Smatch score of our MRS parser is higher than the
-upper-bound on AMR parsing, making MRS an attractive choice as a semantic
-representation.
-"
-4671,1704.07121,"Wei-Lun Chao, Hexiang Hu, Fei Sha","Being Negative but Constructively: Lessons Learnt from Creating Better
- Visual Question Answering Datasets",cs.CL cs.AI cs.CV cs.LG," Visual question answering (Visual QA) has attracted a lot of attention
-lately, seen essentially as a form of (visual) Turing test that artificial
-intelligence should strive to achieve. In this paper, we study a crucial
-component of this task: how can we design good datasets for the task? We focus
-on the design of multiple-choice based datasets where the learner has to select
-the right answer from a set of candidate ones including the target (i.e., the
-correct one) and the decoys (i.e., the incorrect ones). Through careful analysis
-of the results attained by state-of-the-art learning models and human
-annotators on existing datasets, we show that the design of the decoy answers
-has a significant impact on how and what the learning models learn from the
-datasets. In particular, the resulting learner can ignore the visual
-information, the question, or both while still doing well on the task. Inspired
-by this, we propose automatic procedures to remedy such design deficiencies. We
-apply the procedures to re-construct decoy answers for two popular Visual QA
-datasets as well as to create a new Visual QA dataset from the Visual Genome
-project, resulting in the largest dataset for this task. Extensive empirical
-studies show that the design deficiencies have been alleviated in the remedied
-datasets and the performance on them is likely a more faithful indicator of the
-difference among learning models. The datasets are released and publicly
-available via http://www.teds.usc.edu/website_vqa/.
-"
-4672,1704.07129,"Spandana Gella, Frank Keller",An Analysis of Action Recognition Datasets for Language and Vision Tasks,cs.CL cs.CV," A large amount of recent research has focused on tasks that combine language
-and vision, resulting in a proliferation of datasets and methods. One such task
-is action recognition, whose applications include image annotation, scene
-understanding and image retrieval. In this survey, we categorize the existing
-approaches based on how they conceptualize this problem and provide a detailed
-review of existing datasets, highlighting their diversity as well as advantages
-and disadvantages. We focus on recently developed datasets which link visual
-information with linguistic resources and provide a fine-grained syntactic and
-semantic analysis of actions in images. 
-" -4673,1704.07130,He He and Anusha Balakrishnan and Mihail Eric and Percy Liang,"Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge - Graph Embeddings",cs.CL," We study a symmetric collaborative dialogue setting in which two agents, each -with private knowledge, must strategically communicate to achieve a common -goal. The open-ended dialogue state in this setting poses new challenges for -existing dialogue systems. We collected a dataset of 11K human-human dialogues, -which exhibits interesting lexical, semantic, and strategic elements. To model -both structured knowledge and unstructured language, we propose a neural model -with dynamic knowledge graph embeddings that evolve as the dialogue progresses. -Automatic and human evaluations show that our model is both more effective at -achieving the goal and more human-like than baseline neural and rule-based -models. -" -4674,1704.07138,Chris Hokamp and Qun Liu,"Lexically Constrained Decoding for Sequence Generation Using Grid Beam - Search",cs.CL," We present Grid Beam Search (GBS), an algorithm which extends beam search to -allow the inclusion of pre-specified lexical constraints. The algorithm can be -used with any model that generates a sequence $ \mathbf{\hat{y}} = -\{y_{0}\ldots y_{T}\} $, by maximizing $ p(\mathbf{y} | \mathbf{x}) = -\prod\limits_{t}p(y_{t} | \mathbf{x}; \{y_{0} \ldots y_{t-1}\}) $. Lexical -constraints take the form of phrases or words that must be present in the -output sequence. This is a very general way to incorporate additional knowledge -into a model's output without requiring any modification of the model -parameters or training data. We demonstrate the feasibility and flexibility of -Lexically Constrained Decoding by conducting experiments on Neural -Interactive-Predictive Translation, as well as Domain Adaptation for Neural -Machine Translation. Experiments show that GBS can provide large improvements -in translation quality in interactive scenarios, and that, even without any -user input, GBS can be used to achieve significant gains in performance in -domain adaptation scenarios. -" -4675,1704.07146,"Ella Rabinovich, Noam Ordan, Shuly Wintner","Found in Translation: Reconstructing Phylogenetic Language Trees from - Translations",cs.CL," Translation has played an important role in trade, law, commerce, politics, -and literature for thousands of years. Translators have always tried to be -invisible; ideal translations should look as if they were written originally in -the target language. We show that traces of the source language remain in the -translation product to the extent that it is possible to uncover the history of -the source language by looking only at the translation. Specifically, we -automatically reconstruct phylogenetic language trees from monolingual texts -(translated from several source languages). The signal of the source language -is so powerful that it is retained even after two phases of translation. This -strongly indicates that source language interference is the most dominant -characteristic of translated texts, overshadowing the more subtle signals of -universal properties of translation. -" -4676,1704.07156,Marek Rei,Semi-supervised Multitask Learning for Sequence Labeling,cs.CL cs.LG cs.NE," We propose a sequence labeling framework with a secondary training objective, -learning to predict surrounding words for every word in the dataset. 
This
-language modeling objective incentivises the system to learn general-purpose
-patterns of semantic and syntactic composition, which are also useful for
-improving accuracy on different sequence labeling tasks. The architecture was
-evaluated on a range of datasets, covering the tasks of error detection in
-learner texts, named entity recognition, chunking and POS-tagging. The novel
-language modeling objective provided consistent performance improvements on
-every benchmark, without requiring any additional annotated or unannotated
-data.
-"
-4677,1704.07157,"Dmitry Ustalov, Alexander Panchenko and Chris Biemann",Watset: Automatic Induction of Synsets from a Graph of Synonyms,cs.CL," This paper presents a new graph-based approach that induces synsets using
-synonymy dictionaries and word embeddings. First, we build a weighted graph of
-synonyms extracted from commonly available resources, such as Wiktionary.
-Second, we apply word sense induction to deal with ambiguous words. Finally, we
-cluster the disambiguated version of the ambiguous input graph into synsets.
-Our meta-clustering approach lets us use an efficient hard clustering algorithm
-to perform a fuzzy clustering of the graph. Despite its simplicity, our
-approach shows excellent results, outperforming five competitive
-state-of-the-art methods in terms of F-score on three gold standard datasets
-for English and Russian derived from large-scale manually constructed lexical
-resources.
-"
-4678,1704.07203,"Johannes Daxenberger, Steffen Eger, Ivan Habernal, Christian Stab,
- Iryna Gurevych",What is the Essence of a Claim? Cross-Domain Claim Identification,cs.CL," Argument mining has become a popular research area in NLP. It typically
-includes the identification of argumentative components, e.g. claims, as the
-central component of an argument. We perform a qualitative analysis across six
-different datasets and show that these appear to conceptualize claims quite
-differently. To learn about the consequences of such different
-conceptualizations of claims for practical applications, we carried out
-extensive experiments using state-of-the-art feature-rich and deep learning
-systems to identify claims in a cross-domain fashion. While the divergent
-perception of claims in different datasets is indeed harmful to cross-domain
-classification, we show that there are shared properties on the lexical level
-as well as system configurations that can help to overcome these gaps.
-"
-4679,1704.07221,"Elena Kochkina, Maria Liakata, Isabelle Augenstein","Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance
- Classification with Branch-LSTM",cs.CL cs.AI," This paper describes team Turing's submission to SemEval 2017 RumourEval:
-Determining rumour veracity and support for rumours (SemEval 2017 Task 8,
-Subtask A). Subtask A addresses the challenge of rumour stance classification,
-which involves identifying the attitude of Twitter users towards the
-truthfulness of the rumour they are discussing. Stance classification is
-considered to be an important step towards rumour verification, therefore
-performing well in this task is expected to be useful in debunking false
-rumours. In this work we classify a set of Twitter posts discussing rumours
-into either supporting, denying, questioning or commenting on the underlying
-rumours. 
We propose an LSTM-based sequential model that, through modelling the
-conversational structure of tweets, achieves an accuracy of 0.784 on the
-RumourEval test set, outperforming all other systems in Subtask A.
-"
-4680,1704.07287,"Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen
- Livescu, Mari Ostendorf","Parsing Speech: A Neural Approach to Integrating Lexical and
- Acoustic-Prosodic Information",cs.CL cs.LG cs.SD," In conversational speech, the acoustic signal provides cues that help
-listeners disambiguate difficult parses. For automatically parsing spoken
-utterances, we introduce a model that integrates transcribed text and
-acoustic-prosodic features using a convolutional neural network over energy and
-pitch trajectories coupled with an attention-based recurrent neural network
-that accepts text and prosodic features. We find that different types of
-acoustic-prosodic features are individually helpful, and together give
-statistically significant improvements in parse and disfluency detection F1
-scores over a strong text-only baseline. For this study with known sentence
-boundaries, error analyses show that the main benefit of acoustic-prosodic
-features is in sentences with disfluencies, attachment decisions are most
-improved, and transcription errors obscure gains from prosody.
-"
-4681,1704.07329,"Murathan Kurfal{\i}, Ahmet \""Ust\""un, Burcu Can","A Trie-Structured Bayesian Model for Unsupervised Morphological
- Segmentation",cs.CL," In this paper, we introduce a trie-structured Bayesian model for unsupervised
-morphological segmentation. We adopt prior information from different sources
-in the model. We use neural word embeddings to discover words that are
-morphologically derived from each other and are thereby semantically similar.
-We use letter successor variety counts obtained from tries that are built by
-neural word embeddings. Our results show that using different information
-sources such as neural word embeddings and letter successor variety as prior
-information improves morphological segmentation in a Bayesian model. Our model
-outperforms other unsupervised morphological segmentation models on Turkish and
-gives promising results on English and German for scarce resources.
-"
-4682,1704.07398,"Yevgeni Berzak, Chie Nakamura, Suzanne Flynn and Boris Katz",Predicting Native Language from Gaze,cs.CL," A fundamental question in language learning concerns the role of a speaker's
-first language in second language acquisition. We present a novel methodology
-for studying this question: analysis of eye-movement patterns in second
-language reading of free-form text. Using this methodology, we demonstrate for
-the first time that the native language of English learners can be predicted
-from their gaze fixations when reading English. We provide analysis of
-classifier uncertainty and learned features, which indicates that differences
-in English reading are likely to be rooted in linguistic divergences across
-native languages. The presented framework complements production studies and
-offers new ground for advancing research on multilingualism.
-"
-4683,1704.07415,"Yichen Gong, Samuel R. Bowman",Ruminating Reader: Reasoning with Gated Multi-Hop Attention,cs.CL," To answer the question in the machine comprehension (MC) task, the models
-need to establish the interaction between the question and the context. To
-tackle the problem that the single-pass model cannot reflect on and correct its
-answer, we present Ruminating Reader. 
Ruminating Reader adds a second pass of attention
-and a novel information fusion component to the Bi-Directional Attention Flow
-model (BiDAF). We propose novel layer structures that construct a query-aware
-context vector representation and fuse the encoding representation with the
-intermediate representation on top of the BiDAF model. We show that a multi-hop
-attention mechanism can be applied to a bi-directional attention structure. In
-experiments on SQuAD, we find that the Reader outperforms the BiDAF baseline by
-a substantial margin, and matches or surpasses the performance of all other
-published systems.
-"
-4684,1704.07427,"Yanqing Chen, Steven Skiena",Recognizing Descriptive Wikipedia Categories for Historical Figures,cs.CL," Wikipedia is a useful knowledge source that benefits many applications in
-language processing and knowledge representation. An important feature of
-Wikipedia is that of categories. Wikipedia pages are assigned different
-categories according to their contents as human-annotated labels which can be
-used in information retrieval, ad hoc search improvements, entity ranking and
-tag recommendations. However, important pages are usually assigned too many
-categories, which makes it difficult to recognize the most important ones that
-give the best descriptions.
- In this paper, we propose an approach to recognize the most descriptive
-Wikipedia categories. We observe that historical figures in a precise category
-presumably are mutually similar and such categorical coherence could be
-evaluated via texts or Wikipedia links of corresponding members in the
-category. We rank the descriptive level of Wikipedia categories according to
-their coherence, and our ranking yields an overall agreement of 88.27% compared
-with human wisdom.
-"
-4685,1704.07431,"Pierre Isabelle, Colin Cherry, and George Foster",A Challenge Set Approach to Evaluating Machine Translation,cs.CL," Neural machine translation represents an exciting leap forward in translation
-quality. But what longstanding weaknesses does it resolve, and which remain? We
-address these questions with a challenge set approach to translation evaluation
-and error analysis. A challenge set consists of a small set of sentences, each
-hand-designed to probe a system's capacity to bridge a particular structural
-divergence between languages. To exemplify this approach, we present an
-English-French challenge set, and use it to analyze phrase-based and neural
-systems. The resulting analysis provides not only a more fine-grained picture
-of the strengths of neural systems, but also insight into which linguistic
-phenomena remain out of reach.
-"
-4686,1704.07441,"Yanqing Chen, Rami Al-Rfou', Yejin Choi",Detecting English Writing Styles For Non Native Speakers,cs.CL," This paper presents the first attempt, to our knowledge, to classify English
-writing styles on this scale, with the challenge of classifying day-to-day
-language written by writers with different backgrounds covering various areas
-of topics. The paper proposes simple machine learning algorithms and
-easy-to-generate features to solve hard problems, relying on the scale of the
-data available from large sources of knowledge like Wikipedia. We believe such
-sources of data are crucial to generate robust, highly accurate solutions for
-the web that are easy to deploy in practice. The paper achieves 74\% accuracy
-classifying native versus non-native speakers' writing styles. 
- Moreover, the paper shows some interesting observations on the similarity
-between different languages measured by the similarity of their users' English
-writing styles. This technique could be used to show some well-known facts
-about languages, such as their grouping into families, which our experiments
-support.
-"
-4687,1704.07463,"Chandler May, Kevin Duh, Benjamin Van Durme, Ashwin Lall",Streaming Word Embeddings with the Space-Saving Algorithm,cs.CL," We develop a streaming (one-pass, bounded-memory) word embedding algorithm
-based on the canonical skip-gram with negative sampling algorithm implemented
-in word2vec. We compare our streaming algorithm to word2vec empirically by
-measuring the cosine similarity between word pairs under each algorithm and by
-applying each algorithm in the downstream task of hashtag prediction on a
-two-month interval of the Twitter sample stream. We then discuss the results of
-these experiments, concluding they provide partial validation of our approach
-as a streaming replacement for word2vec. Finally, we discuss potential failure
-modes and suggest directions for future work.
-"
-4688,1704.07468,"Ritambhara Singh, Arshdeep Sekhon, Kamran Kowsari, Jack Lanchantin,
- Beilun Wang and Yanjun Qi",GaKCo: a Fast GApped k-mer string Kernel using COunting,cs.LG cs.AI cs.CC cs.CL cs.DS," String Kernel (SK) techniques, especially those using gapped $k$-mers as
-features (gk), have obtained great success in classifying sequences like DNA,
-protein, and text. However, the state-of-the-art gk-SK runs extremely slowly
-when we increase the dictionary size ($\Sigma$) or allow more mismatches ($M$).
-This is because current gk-SK uses a trie-based algorithm to calculate
-co-occurrence of mismatched substrings, resulting in a time cost proportional
-to $O(\Sigma^{M})$. We propose a \textbf{fast} algorithm for calculating
-\underline{Ga}pped $k$-mer \underline{K}ernel using \underline{Co}unting
-(GaKCo). GaKCo uses associative arrays to calculate the co-occurrence of
-substrings using cumulative counting. This algorithm is fast, scalable to
-larger $\Sigma$ and $M$, and naturally parallelizable. We provide a rigorous
-asymptotic analysis that compares GaKCo with the state-of-the-art gk-SK.
-Theoretically, the time cost of GaKCo is independent of the $\Sigma^{M}$ term
-that slows down the trie-based approach. Experimentally, we observe that GaKCo
-achieves the same accuracy as the state-of-the-art and outperforms its speed by
-factors of 2, 100, and 4, on classifying sequences of DNA (5 datasets), protein
-(12 datasets), and character-based English text (2 datasets), respectively.
- GaKCo is shared as an open source tool at
-\url{https://github.com/QData/GaKCo-SVM}
-"
-4689,1704.07489,"Ramakanth Pasunuru, Mohit Bansal",Multi-Task Video Captioning with Video and Entailment Generation,cs.CL cs.AI cs.CV," Video captioning, the task of describing the content of a video, has seen
-some promising improvements in recent years with sequence-to-sequence models,
-but accurately learning the temporal and logical dynamics involved in the task
-still remains a challenge, especially given the lack of sufficient annotated
-data. We improve video captioning by sharing knowledge with two related
-directed-generation tasks: a temporally-directed unsupervised video prediction
-task to learn richer context-aware video encoder representations, and a
-logically-directed language entailment generation task to learn better
-video-entailed caption decoder representations. 
For this, we present a
-many-to-many multi-task learning model that shares parameters across the
-encoders and decoders of the three tasks. We achieve significant improvements
-and the new state-of-the-art on several standard video captioning datasets
-using diverse automatic and human evaluations. We also show mutual multi-task
-improvements on the entailment generation task.
-"
-4690,1704.07535,"Maxim Rabinovich, Mitchell Stern, Dan Klein",Abstract Syntax Networks for Code Generation and Semantic Parsing,cs.CL cs.AI cs.LG stat.ML," Tasks like code generation and semantic parsing require mapping unstructured
-(or partially structured) inputs to well-formed, executable outputs. We
-introduce abstract syntax networks, a modeling framework for these problems.
-The outputs are represented as abstract syntax trees (ASTs) and constructed by
-a decoder with a dynamically-determined modular structure paralleling the
-structure of the output tree. On the benchmark Hearthstone dataset for code
-generation, our model obtains 79.2 BLEU and 22.7% exact match accuracy,
-compared to previous state-of-the-art values of 67.1 and 6.1%. Furthermore, we
-perform competitively on the Atis, Jobs, and Geo semantic parsing datasets with
-no task-specific engineering.
-"
-4691,1704.07556,"Xinchi Chen, Zhan Shi, Xipeng Qiu, Xuanjing Huang",Adversarial Multi-Criteria Learning for Chinese Word Segmentation,cs.CL," Different linguistic perspectives cause many diverse segmentation criteria
-for Chinese word segmentation (CWS). Most existing methods focus on improving
-the performance for each single criterion. However, it is interesting to
-exploit these different criteria and mine their common underlying knowledge. In
-this paper, we propose adversarial multi-criteria learning for CWS by
-integrating shared knowledge from multiple heterogeneous segmentation criteria.
-Experiments on eight corpora with heterogeneous segmentation criteria show that
-the performance of each corpus obtains a significant improvement, compared to
-single-criterion learning. Source codes of this paper are available on Github.
-"
-4692,1704.07616,"Liner Yang, Meishan Zhang, Yang Liu, Nan Yu, Maosong Sun, Guohong Fu","Joint POS Tagging and Dependency Parsing with Transition-based Neural
- Networks",cs.CL," While part-of-speech (POS) tagging and dependency parsing are observed to be
-closely related, existing work on joint modeling with manually crafted feature
-templates suffers from the feature sparsity and incompleteness problems. In
-this paper, we propose an approach to joint POS tagging and dependency parsing
-using transition-based neural networks. Three neural network based classifiers
-are designed to resolve shift/reduce, tagging, and labeling conflicts.
-Experiments show that our approach significantly outperforms previous methods
-for joint POS tagging and dependency parsing across a variety of natural
-languages.
-"
-4693,1704.07624,Amit Gupta and R\'emi Lebret and Hamza Harkous and Karl Aberer,"280 Birds with One Stone: Inducing Multilingual Taxonomies from
- Wikipedia using Character-level Classification",cs.CL cs.AI cs.IR," We propose a simple, yet effective, approach towards inducing multilingual
-taxonomies from Wikipedia. Given an English taxonomy, our approach leverages
-the interlanguage links of Wikipedia followed by character-level classifiers to
-induce high-precision, high-coverage taxonomies in other languages. 
Through
-experiments, we demonstrate that our approach significantly outperforms the
-state-of-the-art, heuristics-heavy approaches for six languages. As a
-consequence of our work, we release presumably the largest and the most
-accurate multilingual taxonomic resource spanning over 280 languages.
-"
-4694,1704.07626,"Amit Gupta, R\'emi Lebret, Hamza Harkous and Karl Aberer",Taxonomy Induction using Hypernym Subsequences,cs.AI cs.CL cs.IR," We propose a novel, semi-supervised approach towards domain taxonomy
-induction from an input vocabulary of seed terms. Unlike all previous
-approaches, which typically extract direct hypernym edges for terms, our
-approach utilizes a novel probabilistic framework to extract hypernym
-subsequences. Taxonomy induction from extracted subsequences is cast as an
-instance of the minimum-cost flow problem on a carefully designed directed
-graph. Through experiments, we demonstrate that our approach outperforms
-state-of-the-art taxonomy induction approaches across four languages.
-Importantly, we also show that our approach is robust to the presence of noise
-in the input vocabulary. To the best of our knowledge, no previous approach has
-been empirically shown to be robust to noise in the input vocabulary.
-"
-4695,1704.07734,"Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim",DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning,cs.SE cs.CL cs.NE," Computer programs written in one language are often required to be ported to
-other languages to support multiple devices and environments. When programs use
-language-specific APIs (Application Programming Interfaces), it is very
-challenging to migrate these APIs to the corresponding APIs written in other
-languages. Existing approaches mine API mappings from projects that have
-corresponding versions in two languages. They rely on the sparse availability
-of bilingual projects, thus producing a limited number of API mappings. In this
-paper, we propose an intelligent system called DeepAM for automatically mining
-API mappings from a large-scale code corpus without bilingual projects. The key
-component of DeepAM is based on the multi-modal sequence-to-sequence learning
-architecture that aims to learn joint semantic representations of bilingual API
-sequences from big source code data. Experimental results indicate that DeepAM
-significantly increases the accuracy of API mappings as well as the number of
-API mappings, when compared with the state-of-the-art approaches.
-"
-4696,1704.07751,"Maxim Rabinovich, Dan Klein",Fine-Grained Entity Typing with High-Multiplicity Assignments,cs.CL cs.AI cs.IR cs.LG stat.ML," As entity type systems become richer and more fine-grained, we expect the
-number of types assigned to a given entity to increase. However, most
-fine-grained typing work has focused on datasets that exhibit a low degree of
-type multiplicity. In this paper, we consider the high-multiplicity regime
-inherent in data sources such as Wikipedia that have semi-open type systems. We
-introduce a set-prediction approach to this problem and show that our model
-outperforms unstructured baselines on a new Wikipedia-based fine-grained typing
-corpus.
-"
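-One simple way to realize the set-valued type prediction described in the
-entry above is to score each type independently and keep every type whose
-probability clears a threshold. The sketch below illustrates only that
-reading; the type inventory, the scorer, and the threshold are made-up
-assumptions, not the set-prediction model proposed in the paper.
-
-import numpy as np
-
-TYPES = ["person", "artist", "musician", "politician", "author"]
-
-def sigmoid(x):
-    return 1.0 / (1.0 + np.exp(-x))
-
-def predict_type_set(entity_vec, type_embeddings, threshold=0.5):
-    # Independent per-type probabilities; an entity may receive many types
-    # (the high-multiplicity regime). Fall back to the single best type so
-    # that no entity is left untyped.
-    probs = sigmoid(type_embeddings @ entity_vec)   # (num_types,)
-    chosen = {t for t, p in zip(TYPES, probs) if p >= threshold}
-    return chosen or {TYPES[int(probs.argmax())]}
-
-# Toy usage with random vectors standing in for a learned encoder.
-rng = np.random.default_rng(1)
-type_embeddings = rng.normal(size=(len(TYPES), 8))
-entity_vec = rng.normal(size=8)
-print(predict_type_set(entity_vec, type_embeddings))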
-" -4697,1704.07759,"Emeric Bernard-Jones, Jeremiah Onaolapo, Gianluca Stringhini","Email Babel: Does Language Affect Criminal Activity in Compromised - Webmail Accounts?",cs.CY cs.CL," We set out to understand the effects of differing language on the ability of -cybercriminals to navigate webmail accounts and locate sensitive information in -them. To this end, we configured thirty Gmail honeypot accounts with English, -Romanian, and Greek language settings. We populated the accounts with email -messages in those languages by subscribing them to selected online newsletters. -We hid email messages about fake bank accounts in fifteen of the accounts to -mimic real-world webmail users that sometimes store sensitive information in -their accounts. We then leaked credentials to the honey accounts via paste -sites on the Surface Web and the Dark Web, and collected data for fifteen days. -Our statistical analyses on the data show that cybercriminals are more likely -to discover sensitive information (bank account information) in the Greek -accounts than the remaining accounts, contrary to the expectation that Greek -ought to constitute a barrier to the understanding of non-Greek visitors to the -Greek accounts. We also extracted the important words among the emails that -cybercriminals accessed (as an approximation of the keywords that they searched -for within the honey accounts), and found that financial terms featured among -the top words. In summary, we show that language plays a significant role in -the ability of cybercriminals to access sensitive information hidden in -compromised webmail accounts. -" -4698,1704.07828,"Chenhao Tan, Dallas Card, Noah A. Smith","Friendships, Rivalries, and Trysts: Characterizing Relations between - Ideas in Texts",cs.SI cs.CL physics.soc-ph," Understanding how ideas relate to each other is a fundamental question in -many domains, ranging from intellectual history to public communication. -Because ideas are naturally embedded in texts, we propose the first framework -to systematically characterize the relations between ideas based on their -occurrence in a corpus of documents, independent of how these ideas are -represented. Combining two statistics --- cooccurrence within documents and -prevalence correlation over time --- our approach reveals a number of different -ways in which ideas can cooperate and compete. For instance, two ideas can -closely track each other's prevalence over time, and yet rarely cooccur, almost -like a ""cold war"" scenario. We observe that pairwise cooccurrence and -prevalence correlation exhibit different distributions. We further demonstrate -that our approach is able to uncover intriguing relations between ideas through -in-depth case studies on news articles and research papers. -" -4699,1704.07875,"Maria Ryskina, Hannah Alpert-Abrams, Dan Garrette, Taylor - Berg-Kirkpatrick",Automatic Compositor Attribution in the First Folio of Shakespeare,cs.CL," Compositor attribution, the clustering of pages in a historical printed -document by the individual who set the type, is a bibliographic task that -relies on analysis of orthographic variation and inspection of visual details -of the printed page. In this paper, we introduce a novel unsupervised model -that jointly describes the textual and visual features needed to distinguish -compositors. 
Applied to images of Shakespeare's First Folio, our model predicts
-attributions that agree with the manual judgements of bibliographers with an
-accuracy of 87%, even on text that is the output of OCR.
-"
-4700,1704.07986,"Akira Sasaki, Kazuaki Hanawa, Naoaki Okazaki, Kentaro Inui","Other Topics You May Also Agree or Disagree: Modeling Inter-Topic
- Preferences using Tweets and Matrix Factorization",cs.CL," We present in this paper our approach for modeling inter-topic preferences of
-Twitter users: for example, those who agree with the Trans-Pacific Partnership
-(TPP) also agree with free trade. This kind of knowledge is useful not only for
-stance detection across multiple topics but also for various real-world
-applications including public opinion surveys, electoral predictions, electoral
-campaigns, and online debates. In order to extract users' preferences on
-Twitter, we design linguistic patterns in which people agree and disagree about
-specific topics (e.g., ""A is completely wrong""). By applying these linguistic
-patterns to a collection of tweets, we extract statements agreeing and
-disagreeing with various topics. Inspired by previous work on item
-recommendation, we formalize the task of modeling inter-topic preferences as
-matrix factorization: representing users' preferences as a user-topic matrix
-and mapping both users and topics onto a latent feature space that abstracts
-the preferences. Our experimental results demonstrate both that our proposed
-approach is useful in predicting missing preferences of users and that the
-latent vector representations of topics successfully encode inter-topic
-preferences.
-"
-4701,1704.08012,Jey Han Lau and Timothy Baldwin and Trevor Cohn,Topically Driven Neural Language Model,cs.CL," Language models are typically applied at the sentence level, without access
-to the broader document context. We present a neural language model that
-incorporates document context in the form of a topic model-like architecture,
-thus providing a succinct representation of the broader document context
-outside of the current sentence. Experiments over a range of datasets
-demonstrate that our model outperforms a pure sentence-based model in terms of
-language model perplexity, and leads to topics that are potentially more
-coherent than those produced by a standard LDA topic model. Our model also has
-the ability to generate related sentences for a topic, providing another way to
-interpret topics.
-"
-4702,1704.08059,"Alexander Fonarev, Oleksii Hrinchuk, Gleb Gusev, Pavel Serdyukov, and
- Ivan Oseledets",Riemannian Optimization for Skip-Gram Negative Sampling,cs.CL," The Skip-Gram Negative Sampling (SGNS) word embedding model, well known
-through its implementation in the ""word2vec"" software, is usually optimized by
-stochastic gradient descent. However, the optimization of the SGNS objective
-can be viewed as a problem of searching for a good matrix with a low-rank
-constraint. The most standard way to solve this type of problem is to apply a
-Riemannian optimization framework to optimize the SGNS objective over the
-manifold of required low-rank matrices. In this paper, we propose an algorithm
-that optimizes the SGNS objective using Riemannian optimization and demonstrate
-its superiority over popular competitors, such as the original method to train
-SGNS and SVD over the SPPMI matrix.
-"
-4703,1704.08088,"Leandro B. dos Santos, Edilson A. Corr\^ea Jr, Osvaldo N. Oliveira Jr,
- Diego R. Amancio, Let\'icia L. Mansur and Sandra M. 
Alu\'isio","Enriching Complex Networks with Word Embeddings for Detecting Mild
- Cognitive Impairment from Speech Transcripts",cs.CL," Mild Cognitive Impairment (MCI) is a mental disorder difficult to diagnose.
-Linguistic features, mainly from parsers, have been used to detect MCI, but
-this is not suitable for large-scale assessments. MCI disfluencies produce
-non-grammatical speech that requires manual or high precision automatic
-correction of transcripts. In this paper, we modeled transcripts as complex
-networks and enriched them with word embeddings (CNE) to better represent short
-texts produced in neuropsychological assessments. The network measurements were
-applied with well-known classifiers to automatically identify MCI in
-transcripts, in a binary classification task. A comparison was made with the
-performance of traditional approaches using Bag of Words (BoW) and linguistic
-features for three datasets: DementiaBank in English, and Cinderella and
-Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than using
-only complex networks, while Support Vector Machine was superior to other
-classifiers. CNE provided the highest accuracies for DementiaBank and
-Cinderella, but BoW was more efficient for the Arizona-Battery dataset, probably
-owing to its short narratives. The approach using linguistic features yielded
-higher accuracy if the transcriptions of the Cinderella dataset were manually
-revised. Taken together, the results indicate that complex networks enriched
-with word embeddings are promising for detecting MCI in large-scale assessments.
-"
-4704,1704.08092,"Samuel R\""onnqvist, Niko Schenk, Christian Chiarcos","A Recurrent Neural Model with Attention for the Recognition of Chinese
- Implicit Discourse Relations",cs.CL cs.AI cs.LG cs.NE," We introduce an attention-based Bi-LSTM for Chinese implicit discourse
-relations and demonstrate that modeling argument pairs as a joint sequence can
-outperform word order-agnostic approaches. Our model benefits from a partial
-sampling scheme and is conceptually simple, yet achieves state-of-the-art
-performance on the Chinese Discourse Treebank. We also visualize its attention
-activity to illustrate the model's ability to selectively focus on the relevant
-parts of an input sequence.
-"
-4705,1704.08224,Arjun Chandrasekaran and Devi Parikh and Mohit Bansal,Punny Captions: Witty Wordplay in Image Descriptions,cs.CL cs.AI cs.CV," Wit is a form of rich interaction that is often grounded in a specific
-situation (e.g., a comment in response to an event). In this work, we attempt
-to build computational models that can produce witty descriptions for a given
-image. Inspired by a cognitive account of humor appreciation, we employ
-linguistic wordplay, specifically puns, in image descriptions. We develop two
-approaches which involve retrieving witty descriptions for a given image from a
-large corpus of sentences, or generating them via an encoder-decoder neural
-network architecture. We compare our approach against meaningful baseline
-approaches via human studies and show substantial improvements. We find that
-when a human is subject to similar constraints as the model regarding word
-usage and style, people vote the image descriptions generated by our model to
-be slightly wittier than human-written witty descriptions. Unsurprisingly,
-humans are almost always wittier than the model when they are free to choose
-the vocabulary, style, etc.
-"
-4706,1704.08243,"Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh","C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0
- Dataset",cs.CV cs.AI cs.CL cs.LG," Visual Question Answering (VQA) has received a lot of attention over the past
-couple of years. A number of deep learning models have been proposed for this
-task. However, it has been shown that these models are heavily driven by
-superficial correlations in the training data and lack compositionality -- the
-ability to answer questions about unseen compositions of seen concepts. This
-compositionality is desirable and central to intelligence. In this paper, we
-propose a new setting for Visual Question Answering where the test
-question-answer pairs are compositionally novel compared to training
-question-answer pairs. To facilitate developing models under this setting, we
-present a new compositional split of the VQA v1.0 dataset, which we call
-Compositional VQA (C-VQA). We analyze the distribution of questions and answers
-in the C-VQA splits. Finally, we evaluate several existing VQA models under
-this new setting and show that the performances of these models degrade by a
-significant amount compared to the original VQA setting.
-"
-4707,1704.08300,"Preksha Nema, Mitesh Khapra, Anirban Laha, Balaraman Ravindran","Diversity driven Attention Model for Query-based Abstractive
- Summarization",cs.CL," Abstractive summarization aims to generate a shorter version of the document
-covering all the salient points in a compact and coherent fashion. On the other
-hand, query-based summarization highlights those points that are relevant in
-the context of a given query. The encode-attend-decode paradigm has achieved
-notable success in machine translation, extractive summarization, dialog
-systems, etc. But it suffers from the drawback of generating repeated phrases.
-In this work we propose a model for the query-based summarization task
-based on the encode-attend-decode paradigm with two key additions: (i) a query
-attention model (in addition to the document attention model) which learns to focus
-on different portions of the query at different time steps (instead of using a
-static representation for the query) and (ii) a new diversity based attention
-model which aims to alleviate the problem of repeating phrases in the summary.
-In order to enable the testing of this model, we introduce a new query-based
-summarization dataset building on debatepedia. Our experiments show that with
-these two additions the proposed model clearly outperforms vanilla
-encode-attend-decode models with a gain of 28% (absolute) in ROUGE-L scores.
-"
-4708,1704.08352,Clara Vania and Adam Lopez,From Characters to Words to in Between: Do We Capture Morphology?,cs.CL," Words can be represented by composing the representations of subword units
-such as word segments, characters, and/or character n-grams. While such
-representations are effective and may capture the morphological regularities of
-words, they have not been systematically compared, and it is not understood how
-they interact with different morphological typologies. On a language modeling
-task, we present experiments that systematically vary (1) the basic unit of
-representation, (2) the composition of these representations, and (3) the
-morphological typology of the language modeled.
Our results extend previous
-findings that character representations are effective across typologies, and we
-find that a previously unstudied combination of character trigram
-representations composed with bi-LSTMs outperforms most others. But we also
-find room for improvement: none of the character-level models match the
-predictive accuracy of a model with access to true morphological analyses, even
-when learned from an order of magnitude more data.
-"
-4709,1704.08381,"Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi and Luke
- Zettlemoyer",Neural AMR: Sequence-to-Sequence Models for Parsing and Generation,cs.CL," Sequence-to-sequence models have shown strong performance across a broad
-range of applications. However, their application to parsing and generating
-text using Abstract Meaning Representation (AMR) has been limited, due to the
-relatively limited amount of labeled data and the non-sequential nature of the
-AMR graphs. We present a novel training procedure that can lift this limitation
-using millions of unlabeled sentences and careful preprocessing of the AMR
-graphs. For AMR parsing, our model achieves competitive results of 62.1 SMATCH,
-the current best score reported without significant use of external semantic
-resources. For AMR generation, our model establishes a new state-of-the-art
-performance of BLEU 33.8. We present extensive ablative and qualitative
-analysis including strong evidence that sequence-based AMR models are robust
-against ordering variations of graph-to-sequence conversions.
-"
-4710,1704.08384,"Rajarshi Das, Manzil Zaheer, Siva Reddy, Andrew McCallum","Question Answering on Knowledge Bases and Text using Universal Schema
- and Memory Networks",cs.CL," Existing question answering methods infer answers either from a knowledge
-base or from raw text. While knowledge base (KB) methods are good at answering
-compositional questions, their performance is often affected by the
-incompleteness of the KB. Au contraire, web text contains millions of facts
-that are absent in the KB, albeit in an unstructured form. {\it Universal
-schema} can support reasoning on the union of both structured KBs and
-unstructured text by aligning them in a common embedded space. In this paper we
-extend universal schema to natural language question answering, employing
-\emph{memory networks} to attend to the large body of facts in the combination
-of text and KB. Our models can be trained in an end-to-end fashion on
-question-answer pairs. Evaluation results on the SPADES fill-in-the-blank
-question answering dataset show that exploiting universal schema for question
-answering is better than using either a KB or text alone. This model also
-outperforms the current state-of-the-art by 8.5 $F_1$ points.\footnote{Code and
-data available in \url{https://rajarshd.github.io/TextKBQA}}
-"
-4711,1704.08387,"Jianpeng Cheng, Siva Reddy, Vijay Saraswat, Mirella Lapata","Learning Structured Natural Language Representations for Semantic
- Parsing",cs.CL," We introduce a neural semantic parser that converts natural language
-utterances to intermediate representations in the form of predicate-argument
-structures, which are induced with a transition system and subsequently mapped
-to target domains. The semantic parser is trained end-to-end using annotated
-logical forms or their denotations. We obtain competitive results on various
-datasets.
The induced predicate-argument structures shed light on the types of
-representations useful for semantic parsing and how these are different from
-linguistically motivated ones.
-"
-4712,1704.08388,Ted Pedersen,"Duluth at Semeval-2017 Task 7 : Puns upon a midnight dreary, Lexical
- Semantics for the weak and weary",cs.CL," This paper describes the Duluth systems that participated in SemEval-2017
-Task 7 : Detection and Interpretation of English Puns. The Duluth systems
-participated in all three subtasks, and relied on methods that included word
-sense disambiguation and measures of semantic relatedness.
-"
-4713,1704.08390,Xinru Yan and Ted Pedersen,Duluth at SemEval-2017 Task 6: Language Models in Humor Detection,cs.CL," This paper describes the Duluth system that participated in SemEval-2017 Task
-6 #HashtagWars: Learning a Sense of Humor. The system participated in Subtasks
-A and B using N-gram language models, ranking highly in the task evaluation.
-This paper discusses the results of our system in the development and
-evaluation stages and from two post-evaluation runs.
-"
-4714,1704.08424,"Ben Athiwaratkun, Andrew Gordon Wilson",Multimodal Word Distributions,stat.ML cs.AI cs.CL cs.LG," Word embeddings provide point representations of words containing useful
-semantic information. We introduce multimodal word distributions formed from
-Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty
-information. To learn these distributions, we propose an energy-based
-max-margin objective. We show that the resulting approach captures uniquely
-expressive semantic information, and outperforms alternatives, such as word2vec
-skip-grams, and Gaussian embeddings, on benchmark datasets such as word
-similarity and entailment.
-"
-4715,1704.08430,"Biao Zhang, Deyi Xiong, Jinsong Su",A GRU-Gated Attention Model for Neural Machine Translation,cs.CL," Neural machine translation (NMT) heavily relies on an attention network to
-produce a context vector for each target word prediction. In practice, we find
-that context vectors for different target words are quite similar to one
-another and therefore are insufficient in discriminatively predicting target
-words. The reason for this might be that context vectors produced by the
-vanilla attention network are just a weighted sum of source representations
-that are invariant to decoder states. In this paper, we propose a novel
-GRU-gated attention model (GAtt) for NMT which enhances the degree of
-discrimination of context vectors by enabling source representations to be
-sensitive to the partial translation generated by the decoder. GAtt uses a
-gated recurrent unit (GRU) to combine two types of information: treating a
-source annotation vector originally produced by the bidirectional encoder as
-the history state and the corresponding previous decoder state as the input
-to the GRU. The GRU-combined information forms a new source annotation vector.
-In this way, we can obtain translation-sensitive source representations which
-are then fed into the attention network to generate discriminative context
-vectors. We further propose a variant that regards a source annotation vector
-as the current input and the previous decoder state as the history.
-Experiments on NIST Chinese-English translation tasks show that both GAtt-based
-models achieve significant improvements over the vanilla attention-based NMT.
-Further analyses on attention weights and context vectors demonstrate the
-effectiveness of GAtt in improving the discrimination power of representations
-and handling the challenging issue of over-translation.
-"
-4716,1704.08531,Vineet John,A Survey of Neural Network Techniques for Feature Extraction from Text,cs.CL," This paper aims to catalyze the discussions about text feature extraction
-techniques using neural network architectures. The research questions discussed
-in the paper focus on the state-of-the-art neural network techniques that have
-proven to be useful tools for language processing, language generation, text
-classification and other computational linguistics tasks.
-"
-4717,1704.08619,"Panagiotis Tzirakis, George Trigeorgis, Mihalis A. Nicolaou, Bj\""orn
- Schuller, and Stefanos Zafeiriou",End-to-End Multimodal Emotion Recognition using Deep Neural Networks,cs.CV cs.CL," Automatic affect recognition is a challenging task due to the various
-modalities emotions can be expressed with. Applications can be found in many
-domains including multimedia retrieval and human computer interaction. In
-recent years, deep neural networks have been used with great success in
-determining emotional states. Inspired by this success, we propose an emotion
-recognition system using auditory and visual modalities. To capture the
-emotional content for various styles of speaking, robust features need to be
-extracted. For this purpose, we utilize a Convolutional Neural Network (CNN) to
-extract features from the speech, while for the visual modality we use a deep
-residual network (ResNet) of 50 layers. In addition to the importance of feature
-extraction, a machine learning algorithm also needs to be insensitive to
-outliers while being able to model the context. To tackle this problem, Long
-Short-Term Memory (LSTM) networks are utilized. The system is then trained in
-an end-to-end fashion where - by also taking advantage of the correlations of
-each of the streams - we manage to significantly outperform the traditional
-approaches based on auditory and visual handcrafted features for the prediction
-of spontaneous and natural emotions on the RECOLA database of the AVEC 2016
-research challenge on emotion recognition.
-"
-4718,1704.08760,"Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy,
- Luke Zettlemoyer",Learning a Neural Semantic Parser from User Feedback,cs.CL," We present an approach to rapidly and easily build natural language
-interfaces to databases for new domains, whose performance improves over time
-based on user feedback, and requires minimal intervention. To achieve this, we
-adapt neural sequence models to map utterances directly to SQL with its full
-expressivity, bypassing any intermediate meaning representations. These models
-are immediately deployed online to solicit feedback from real users to flag
-incorrect queries. Finally, the popularity of SQL facilitates gathering
-annotations for incorrect predictions using the crowd, which is directly used
-to improve our models. This complete feedback loop, without intermediate
-representations or database specific engineering, opens up new ways of building
-high quality semantic parsers. Experiments suggest that this approach can be
-deployed quickly for any new target domain, as we show by learning a semantic
-parser for an online academic database from scratch.
-"
-4719,1704.08795,Dipendra Misra and John Langford and Yoav Artzi,"Mapping Instructions and Visual Observations to Actions with
- Reinforcement Learning",cs.CL," We propose to directly map raw visual observations and text input to actions
-for instruction execution. While existing approaches assume access to
-structured environment representations or use a pipeline of separately trained
-models, we learn a single model to jointly reason about linguistic and visual
-input. We use reinforcement learning in a contextual bandit setting to train a
-neural network agent. To guide the agent's exploration, we use reward shaping
-with different forms of supervision. Our approach does not require intermediate
-representations, planning procedures, or training different models. We evaluate
-in a simulated environment, and show significant improvements over supervised
-learning and common reinforcement learning variants.
-"
-4720,1704.08798,Saif M. Mohammad,Word Affect Intensities,cs.CL," Words often convey affect -- emotions, feelings, and attitudes. Further,
-different words can convey affect to various degrees (intensities). However,
-existing manually created lexicons for basic emotions (such as anger and fear)
-indicate only coarse categories of affect association (for example, associated
-with anger or not associated with anger). Automatic lexicons of affect provide
-fine degrees of association, but they tend not to be as accurate as
-human-created lexicons. Here, for the first time, we present a manually created
-affect intensity lexicon with real-valued scores of intensity for four basic
-emotions: anger, fear, joy, and sadness. (We will subsequently add entries for
-more emotions such as disgust, anticipation, trust, and surprise.) We refer to
-this dataset as the NRC Affect Intensity Lexicon, or AIL for short. AIL has
-entries for close to 6,000 English words. We used a technique called best-worst
-scaling (BWS) to create the lexicon. BWS improves annotation consistency and
-obtains reliable fine-grained scores (split-half reliability > 0.91). We also
-compare the entries in AIL with the entries in the NRC VAD Lexicon, which has
-valence, arousal, and dominance (VAD) scores for 20K English words. We find
-that anger, fear, and sadness words, on average, have very similar VAD scores.
-However, sadness words tend to have slightly lower dominance scores than fear
-and anger words. The Affect Intensity Lexicon has applications in automatic
-emotion analysis in a number of domains such as commerce, education,
-intelligence, and public health. AIL is also useful in the building of natural
-language generation systems.
-"
-4721,1704.08803,"Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce
- Croft",Neural Ranking Models with Weak Supervision,cs.IR cs.CL cs.LG," Despite the impressive improvements achieved by unsupervised deep neural
-networks in computer vision and NLP tasks, such improvements have not yet been
-observed in ranking for information retrieval. The reason may be the complexity
-of the ranking problem, as it is not obvious how to learn from queries and
-documents when no supervised signal is available. Hence, in this paper, we
-propose to train a neural ranking model using weak supervision, where labels
-are obtained automatically without human annotators or any external resources
-(e.g., click data). To this aim, we use the output of an unsupervised ranking
-model, such as BM25, as a weak supervision signal.
We further train a set of
-simple yet effective ranking models based on feed-forward neural networks. We
-study their effectiveness under various learning scenarios (point-wise and
-pair-wise models) and using different input representations (i.e., from
-encoding query-document pairs into dense/sparse vectors to using word embedding
-representation). We train our networks using tens of millions of training
-instances and evaluate them on two standard collections: a homogeneous news
-collection (Robust) and a heterogeneous large-scale web collection (ClueWeb).
-Our experiments indicate that employing proper objective functions and letting
-the networks learn the input representation based on weakly supervised data
-leads to impressive performance, with over 13% and 35% MAP improvements over
-the BM25 model on the Robust and the ClueWeb collections. Our findings also
-suggest that supervised neural ranking models can greatly benefit from
-pre-training on large amounts of weakly labeled data that can be easily
-obtained from unsupervised IR models.
-"
-4722,1704.08893,"Vera Demberg, Fatemeh Torabi Asr, Merel Scholman","How compatible are our discourse annotations? Insights from mapping
- RST-DT and PDTB annotations",cs.CL," Discourse-annotated corpora are an important resource for the community, but
-they are often annotated according to different frameworks. This makes
-comparison of the annotations difficult, thereby also preventing researchers
-from searching the corpora in a unified way, or using all annotated data
-jointly to train computational systems. Several theoretical proposals have
-recently been made for mapping the relational labels of different frameworks to
-each other, but these proposals have so far not been validated against existing
-annotations. The two largest discourse relation annotated resources, the Penn
-Discourse Treebank and the Rhetorical Structure Theory Discourse Treebank, have
-however been annotated on the same text, allowing for a direct comparison of
-the annotation layers. We propose a method for automatically aligning the
-discourse segments, and then evaluate existing mapping proposals by comparing
-the empirically observed mappings against the proposed ones. Our analysis
-highlights the influence of segmentation on subsequent discourse relation
-labeling, and shows that while agreement between frameworks is reasonable for
-explicit relations, agreement on implicit relations is low. We identify several
-sources of systematic discrepancies between the two annotation schemes and
-discuss consequences of these discrepancies for future annotation and for the
-training of automatic discourse relation labellers.
-"
-4723,1704.08914,"Ehsaneddin Asgari and Hinrich Sch\""utze","Past, Present, Future: A Computational Investigation of the Typology of
- Tense in 1000 Languages",cs.CL cs.AI cs.LG," We present SuperPivot, an analysis method for low-resource languages that
-occur in a superparallel corpus, i.e., in a corpus that contains an order of
-magnitude more languages than parallel corpora currently in use. We show that
-SuperPivot performs well for the crosslingual analysis of the linguistic
-phenomenon of tense. We produce analysis results for more than 1000 languages,
-conducting - to the best of our knowledge - the largest crosslingual
-computational study performed to date.
We extend existing methodology for
-leveraging parallel corpora for typological analysis by overcoming a limiting
-assumption of earlier work: We only require that a linguistic feature is
-overtly marked in a few of the thousands of languages as opposed to requiring
-that it be marked in all languages under investigation.
-"
-4724,1704.08960,Jie Yang and Yue Zhang and Fei Dong,Neural Word Segmentation with Rich Pretraining,cs.CL," Neural word segmentation research has benefited from large-scale raw texts by
-leveraging them for pretraining character and word embeddings. On the other
-hand, statistical segmentation research has exploited richer sources of
-external information, such as punctuation, automatic segmentation and POS. We
-investigate the effectiveness of a range of external training sources for
-neural word segmentation by building a modular segmentation model, pretraining
-the most important submodule using rich external sources. Results show that
-such pretraining significantly improves the model, leading to accuracies
-competitive to the best methods on six benchmarks.
-"
-4725,1704.08966,"Pierre Lison, Serge Bibauw","Not All Dialogues are Created Equal: Instance Weighting for Neural
- Conversational Models",cs.CL cs.AI," Neural conversational models require substantial amounts of dialogue data for
-their parameter estimation and are therefore usually learned on large corpora
-such as chat forums or movie subtitles. These corpora are, however, often
-challenging to work with, notably due to their frequent lack of turn
-segmentation and the presence of multiple references external to the dialogue
-itself. This paper shows that these challenges can be mitigated by adding a
-weighting model into the architecture. The weighting model, which is itself
-estimated from dialogue data, associates each training example with a numerical
-weight that reflects its intrinsic quality for dialogue modelling. At training
-time, these sample weights are included into the empirical loss to be
-minimised. Evaluation results on retrieval-based models trained on movie and TV
-subtitles demonstrate that the inclusion of such a weighting model improves the
-model performance on unsupervised metrics.
-"
-4726,1705.00045,Xinyu Hua and Lu Wang,Understanding and Detecting Supporting Arguments of Diverse Types,cs.CL," We investigate the problem of sentence-level supporting argument detection
-from relevant documents for user-specified claims. A dataset containing claims
-and associated citation articles is collected from the online debate website
-idebate.org. We then manually label sentence-level supporting arguments from
-the documents along with their types as study, factual, opinion, or reasoning.
-We further characterize arguments of different types, and explore whether
-leveraging type information can facilitate the supporting argument detection
-task. Experimental results show that a LambdaMART (Burges, 2010) ranker that
-uses features informed by argument types yields better performance than the
-same ranker trained without type information.
-"
-4727,1705.00105,"Sumit Sidana, Mikhail Trofimov, Oleg Horodnitskii, Charlotte Laclau,
- Yury Maximov, Massih-Reza Amini","Representation Learning and Pairwise Ranking for Implicit Feedback in
- Recommendation Systems",stat.ML cs.CL cs.IR," In this paper, we propose a novel ranking framework for collaborative
-filtering with the overall aim of learning user preferences over items by
-minimizing a pairwise ranking loss.
We show that the minimization problem involves
-dependent random variables and provide a theoretical analysis by proving the
-consistency of the empirical risk minimization in the worst case where all
-users choose a minimal number of positive and negative items. We further derive
-a Neural-Network model that jointly learns a new representation of users and
-items in an embedded space as well as the preference relation of users over the
-pairs of items. The learning objective is based on three scenarios of ranking
-losses that control the ability of the model to maintain the ordering over the
-items induced from the users' preferences, as well as the capacity of the
-dot-product defined in the learned embedded space to produce the ordering. The
-proposed model is by nature suitable for implicit feedback and involves the
-estimation of only very few parameters. Through extensive experiments on
-several real-world benchmarks on implicit data, we show the benefit of
-learning the preference and the embedding simultaneously, compared to
-learning them separately. We also demonstrate that our approach is very
-competitive with the best state-of-the-art collaborative filtering techniques
-proposed for implicit feedback.
-"
-4728,1705.00106,"Xinya Du, Junru Shao and Claire Cardie",Learning to Ask: Neural Question Generation for Reading Comprehension,cs.CL cs.AI," We study automatic question generation for sentences from text passages in
-reading comprehension. We introduce an attention-based sequence learning model
-for the task and investigate the effect of encoding sentence- vs.
-paragraph-level information. In contrast to all previous work, our model does
-not rely on hand-crafted rules or a sophisticated NLP pipeline; it is instead
-trainable end-to-end via sequence-to-sequence learning. Automatic evaluation
-results show that our system significantly outperforms the state-of-the-art
-rule-based system. In human evaluations, questions generated by our system are
-also rated as being more natural (i.e., grammaticality, fluency) and as more
-difficult to answer (in terms of syntactic and lexical divergence from the
-original text and reasoning needed to answer).
-"
-4729,1705.00108,"Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, Russell Power",Semi-supervised sequence tagging with bidirectional language models,cs.CL," Pre-trained word embeddings learned from unlabeled text have become a
-standard component of neural network architectures for NLP tasks. However, in
-most cases, the recurrent network that operates on word-level representations
-to produce context sensitive representations is trained on relatively little
-labeled data. In this paper, we demonstrate a general semi-supervised approach
-for adding pre-trained context embeddings from bidirectional language models
-to NLP systems and apply it to sequence labeling tasks. We evaluate our model
-on two standard datasets for named entity recognition (NER) and chunking, and
-in both cases achieve state of the art results, surpassing previous systems
-that use other forms of transfer or joint learning with additional labeled data
-and task specific gazetteers.
-"
-4730,1705.00217,"Mikhail Khodak, Andrej Risteski, Christiane Fellbaum, Sanjeev Arora",Extending and Improving Wordnet via Unsupervised Word Embeddings,cs.CL cs.IR," This work presents an unsupervised approach for improving WordNet that builds
-upon recent advances in document and sense representation via distributional
-semantics.
We apply our methods to construct Wordnets in French and Russian,
-languages which both lack good manual constructions. These are evaluated on
-two new 600-word test sets for word-to-synset matching and found to improve
-greatly upon synset recall, outperforming the best automated Wordnets in
-F-score. Our methods require very few linguistic resources, thus being
-applicable for Wordnet construction in low-resource languages, and may further
-be applied to sense clustering and other Wordnet improvements.
-"
-4731,1705.00251,"Lei Shu, Hu Xu, Bing Liu",Lifelong Learning CRF for Supervised Aspect Extraction,cs.CL," This paper makes a focused contribution to supervised aspect extraction. It
-shows that if the system has performed aspect extraction from many past domains
-and retained their results as knowledge, Conditional Random Fields (CRF) can
-leverage this knowledge in a lifelong learning manner to extract in a new
-domain markedly better than the traditional CRF without using this prior
-knowledge. The key innovation is that even after CRF training, the model can
-still improve its extraction with experiences in its applications.
-"
-4732,1705.00316,"Xiaoyu Shen, Hui Su, Yanran Li, Wenjie Li, Shuzi Niu, Yang Zhao, Akiko
- Aizawa and Guoping Long",A Conditional Variational Framework for Dialog Generation,cs.CL," Deep latent variable models have been shown to facilitate the response
-generation for open-domain dialog systems. However, these latent variables are
-highly randomized, leading to uncontrollable generated responses. In this
-paper, we propose a framework allowing conditional response generation based on
-specific attributes. These attributes can be either manually assigned or
-automatically detected. Moreover, the dialog states for both speakers are
-modeled separately in order to reflect personal features. We validate this
-framework on two different scenarios, where the attribute refers to genericness
-and sentiment states respectively. The experimental results testify to the
-potential of our model, where meaningful responses can be generated in
-accordance with the specified attributes.
-"
-4733,1705.00321,"Ganbin Zhou, Ping Luo, Rongyu Cao, Yijun Xiao, Fen Lin, Bo Chen, Qing
- He",Tree-Structured Neural Machine for Linguistics-Aware Sentence Generation,cs.AI cs.CL cs.LG," Different from other sequential data, sentences in natural language are
-structured by linguistic grammars. Previous generative conversational models
-with a chain-structured decoder ignore this structure in human language and
-might generate plausible responses with less satisfactory relevance and
-fluency. In this study, we aim to incorporate the results from linguistic
-analysis into the process of sentence generation for high-quality conversation
-generation. Specifically, we use a dependency parser to transform each response
-sentence into a dependency tree and construct a training corpus of
-sentence-tree pairs. A tree-structured decoder is developed to learn the
-mapping from a sentence to its tree, where different types of hidden states are
-used to depict the local dependencies from an internal tree node to its
-children. For training acceleration, we propose a tree canonicalization method,
-which transforms trees into equivalent ternary trees. Then, with a proposed
-tree-structured search method, the model is able to generate the most probable
-responses in the form of dependency trees, which are finally flattened into
-sequences as the system output.
Experimental results demonstrate that the proposed X2Tree framework
-outperforms baseline methods with an 11.15% increase in acceptance ratio.
-"
-4734,1705.00335,"Silvio Amir, Glen Coppersmith, Paula Carvalho, M\'ario J. Silva, Byron
- C. Wallace",Quantifying Mental Health from Social Media with Neural User Embeddings,cs.CL cs.AI cs.SI," Mental illnesses adversely affect a significant proportion of the population
-worldwide. However, the methods traditionally used for estimating and
-characterizing the prevalence of mental health conditions are time-consuming
-and expensive. Consequently, best-available estimates concerning the prevalence
-of mental health conditions are often years out of date. Automated approaches
-to supplement these survey methods with broad, aggregated information derived
-from social media content provide a potential means for near real-time
-estimates at scale. These may, in turn, provide grist for supporting,
-evaluating and iteratively improving upon public health programs and
-interventions.
- We propose a novel model for automated mental health status quantification
-that incorporates user embeddings. This builds upon recent work exploring
-representation learning methods that induce embeddings by leveraging social
-media post histories. Such embeddings capture latent characteristics of
-individuals (e.g., political leanings) and encode a soft notion of homophily.
-In this paper, we investigate whether user embeddings learned from Twitter post
-histories encode information that correlates with mental health statuses. To
-this end, we estimated user embeddings for a set of users known to be affected
-by depression and post-traumatic stress disorder (PTSD), and for a set of
-demographically matched `control' users. We then evaluated these embeddings
-with respect to: (i) their ability to capture homophilic relations with respect
-to mental health status; and (ii) the performance of downstream mental health
-prediction models based on these features. Our experimental results demonstrate
-that the user embeddings capture similarities between users with respect to
-mental conditions, and are predictive of mental health.
-"
-4735,1705.00364,John Wieting and Kevin Gimpel,Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings,cs.CL," We consider the problem of learning general-purpose, paraphrastic sentence
-embeddings, revisiting the setting of Wieting et al. (2016b). While they found
-LSTM recurrent networks to underperform word averaging, we present several
-developments that together produce the opposite conclusion. These include
-training on sentence pairs rather than phrase pairs, averaging states to
-represent sequences, and regularizing aggressively. These improve LSTMs in both
-transfer learning and supervised settings. We also introduce a new recurrent
-architecture, the Gated Recurrent Averaging Network, that is inspired by
-averaging and LSTMs while outperforming them both. We analyze our learned
-models, finding evidence of preferences for particular parts of speech and
-dependency relations.
-"
-4736,1705.00390,Ted Pedersen,"Duluth at SemEval--2016 Task 14 : Extending Gloss Overlaps to Enrich
- Semantic Taxonomies",cs.CL," This paper describes the Duluth systems that participated in Task 14 of
-SemEval 2016, Semantic Taxonomy Enrichment. There were three related systems in
-the formal evaluation which are discussed here, along with numerous
-post--evaluation runs.
All of these systems identified synonyms between WordNet
-and other dictionaries by measuring the gloss overlaps between them. These
-systems perform better than the random baseline, and one post--evaluation
-variation was within a respectable margin of the median result attained by all
-participating systems.
-"
-4737,1705.00403,Emma Strubell and Andrew McCallum,Dependency Parsing with Dilated Iterated Graph CNNs,cs.CL," Dependency parses are an effective way to inject linguistic knowledge into
-many downstream tasks, and many practitioners wish to efficiently parse
-sentences at scale. Recent advances in GPU hardware have enabled neural
-networks to achieve significant gains over the previous best models, but these
-models still fail to leverage GPUs' capability for massive parallelism due to
-their requirement of sequential processing of the sentence. In response, we
-propose Dilated Iterated Graph Convolutional Neural Networks (DIG-CNNs) for
-graph-based dependency parsing, a graph convolutional architecture that allows
-for efficient end-to-end GPU parsing. In experiments on the English Penn
-TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best
-neural network parsers.
-"
-4738,1705.00424,Meng Fang and Trevor Cohn,"Model Transfer for Tagging Low-resource Languages using a Bilingual
- Dictionary",cs.CL," Cross-lingual model transfer is a compelling and popular method for
-predicting annotations in a low-resource language, whereby parallel corpora
-provide a bridge to a high-resource language and its associated annotated
-corpora. However, parallel data is not readily available for many languages,
-limiting the applicability of these approaches. We address these drawbacks in
-our framework which takes advantage of cross-lingual word embeddings trained
-solely on a high coverage bilingual dictionary. We propose a novel neural
-network model for joint training from both sources of data based on
-cross-lingual word embeddings, and show substantial empirical improvements over
-baseline techniques. We also propose several active learning heuristics, which
-result in improvements over competitive benchmark methods.
-"
-4739,1705.00440,"Marzieh Fadaee, Arianna Bisazza, Christof Monz",Data Augmentation for Low-Resource Neural Machine Translation,cs.CL," The quality of a Neural Machine Translation system depends substantially on
-the availability of sizable parallel corpora. For low-resource language pairs
-this is not the case, resulting in poor translation quality. Inspired by work
-in computer vision, we propose a novel data augmentation approach that targets
-low-frequency words by generating new sentence pairs containing rare words in
-new, synthetically created contexts. Experimental results on simulated
-low-resource settings show that our method improves translation quality by up
-to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.
-"
-4740,1705.00441,"Marzieh Fadaee, Arianna Bisazza, Christof Monz",Learning Topic-Sensitive Word Representations,cs.CL," Distributed word representations are widely used for modeling words in NLP
-tasks. Most of the existing models generate one representation per word and do
-not consider different meanings of a word. We present two approaches to learn
-multiple topic-sensitive representations per word by using a Hierarchical
-Dirichlet Process.
We observe that by modeling topics and integrating topic
-distributions for each document we obtain representations that are able to
-distinguish between different meanings of a given word. Our models yield
-statistically significant improvements for the lexical substitution task,
-indicating that commonly used single word representations, even when combined
-with contextual information, are insufficient for this task.
-"
-4741,1705.00464,"Ted Zhang, Dengxin Dai, Tinne Tuytelaars, Marie-Francine Moens, Luc
- Van Gool",Speech-Based Visual Question Answering,cs.CL cs.CV," This paper introduces speech-based visual question answering (VQA), the task
-of generating an answer given an image and a spoken question. Two methods are
-studied: an end-to-end, deep neural network that directly uses audio waveforms
-as input versus a pipelined approach that performs ASR (Automatic Speech
-Recognition) on the question, followed by text-based visual question answering.
-Furthermore, we investigate the robustness of both methods by injecting various
-levels of noise into the spoken question and find that both methods tolerate
-noise at similar levels.
-"
-4742,1705.00545,"Vanessa Q. Marinho, Graeme Hirst, Diego R. Amancio",Labelled network subgraphs reveal stylistic subtleties in written texts,cs.CL physics.data-an," The vast amount of data and the increase in computational capacity have
-allowed the analysis of texts from several perspectives, including the
-representation of texts as complex networks. Nodes of the network represent the
-words, and edges represent some relationship, usually word co-occurrence. Even
-though networked representations have been applied to study some tasks, such
-approaches are not usually combined with traditional models relying upon
-statistical paradigms. Because networked models are able to grasp textual
-patterns, we devised a hybrid classifier, called labelled subgraphs, that
-combines the frequency of common words with small structures found in the
-topology of the network, known as motifs. Our approach is illustrated in two
-contexts, authorship attribution and translationese identification. In the
-former, a set of novels written by different authors is analyzed. To identify
-translationese, texts from the Canadian Hansard and the European parliament
-were classified as to original and translated instances. Our results suggest
-that labelled subgraphs are able to represent texts and should be further
-explored in other tasks, such as the analysis of text complexity, language
-proficiency, and machine translation.
-"
-4743,1705.00557,"Yacine Jernite, Samuel R. Bowman and David Sontag","Discourse-Based Objectives for Fast Unsupervised Sentence Representation
- Learning",cs.CL cs.LG cs.NE stat.ML," This work presents a novel objective function for the unsupervised training
-of neural network sentence encoders. It exploits signals from paragraph-level
-discourse coherence to train these models to understand text. Our objective is
-purely discriminative, allowing us to train models many times faster than was
-possible under prior methods, and it yields models which perform well in
-extrinsic evaluations.
-"
-4744,1705.00571,"Andrew Moore, Paul Rayson","Lancaster A at SemEval-2017 Task 5: Evaluation metrics matter:
- predicting sentiment from financial news headlines",cs.CL," This paper describes our participation in Task 5 track 2 of SemEval 2017 to
-predict the sentiment of financial news headlines for a specific company on a
-continuous scale between -1 and 1.
We tackled the problem using a number of
-approaches, utilising a Support Vector Regression (SVR) and a Bidirectional
-Long Short-Term Memory (BLSTM). We found an improvement of 4-6% using the LSTM
-model over the SVR and came fourth in the track. We report a number of
-different evaluations using a finance-specific word embedding model and reflect
-on the effects of using different evaluation metrics.
-"
-4745,1705.00581,"Arun Balajee Vasudevan, Michael Gygli, Anna Volokitin and Luc Van Gool","Query-adaptive Video Summarization via Quality-aware Relevance
- Estimation",cs.CV cs.CL cs.MM," Although the problem of automatic video summarization has recently received a
-lot of attention, the problem of creating a video summary that also highlights
-elements relevant to a search query has been less studied. We address this
-problem by posing query-relevant summarization as a video frame subset
-selection problem, which lets us optimise for summaries which are
-simultaneously diverse, representative of the entire video, and relevant to a
-text query. We quantify relevance by measuring the distance between frames and
-queries in a common textual-visual semantic embedding space induced by a neural
-network. In addition, we extend the model to capture query-independent
-properties, such as frame quality. We compare our method against previous state
-of the art on textual-visual embeddings for thumbnail selection and show that
-our model outperforms them on relevance prediction. Furthermore, we introduce a
-new dataset, annotated with diversity and query-specific relevance labels. On
-this dataset, we train and test our complete model for video summarization and
-show that it outperforms standard baselines such as Maximal Marginal Relevance.
-"
-4746,1705.00601,"Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee","The Promise of Premise: Harnessing Question Premises in Visual Question
- Answering",cs.CV cs.CL," In this paper, we make a simple observation that questions about images often
-contain premises - objects and relationships implied by the question - and that
-reasoning about premises can help Visual Question Answering (VQA) models
-respond more intelligently to irrelevant or previously unseen questions. When
-presented with a question that is irrelevant to an image, state-of-the-art VQA
-models will still answer purely based on learned language biases, resulting in
-nonsensical or even misleading answers. We note that a visual question is
-irrelevant to an image if at least one of its premises is false (i.e. not
-depicted in the image). We leverage this observation to construct a dataset for
-Question Relevance Prediction and Explanation (QRPE) by searching for false
-premises. We train novel question relevance detection models and show that
-models that reason about premises consistently outperform models that do not.
-We also find that forcing standard VQA models to reason about premises during
-training can lead to improvements on tasks requiring compositional reasoning.
-"
-4747,1705.00648,William Yang Wang,"""Liar, Liar Pants on Fire"": A New Benchmark Dataset for Fake News
- Detection",cs.CL cs.CY," Automatic fake news detection is a challenging problem in deception
-detection, and it has tremendous real-world political and social impacts.
-However, statistical approaches to combating fake news have been dramatically
-limited by the lack of labeled benchmark datasets. In this paper, we present
-LIAR: a new, publicly available dataset for fake news detection.
We collected a
-decade-long corpus of 12.8K manually labeled short statements in various
-contexts from PolitiFact.com, which provides a detailed analysis report and
-links to source documents for each case. This dataset can be used for
-fact-checking research as well. Notably, this new dataset is an order of
-magnitude larger than the previously largest public fake news datasets of
-similar type. Empirically, we investigate automatic fake news detection based
-on surface-level linguistic patterns. We have designed a novel, hybrid
-convolutional neural network to integrate meta-data with text. We show that
-this hybrid approach can improve a text-only deep learning model.
-"
-4748,1705.00652,"Matthew Henderson, Rami Al-Rfou, Brian Strope, Yun-hsuan Sung, Laszlo
- Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, Ray Kurzweil",Efficient Natural Language Response Suggestion for Smart Reply,cs.CL," This paper presents a computationally efficient machine-learned method for
-natural language response suggestion. Feed-forward neural networks using n-gram
-embedding features encode messages into vectors which are optimized to give
-message-response pairs a high dot-product value. An optimized search finds
-response suggestions. The method is evaluated in a large-scale commercial
-e-mail application, Inbox by Gmail. Compared to a sequence-to-sequence
-approach, the new system achieves the same quality at a small fraction of the
-computational requirements and latency.
-"
-4749,1705.00694,"Max Kanovich, Stepan Kuznetsov, Glyn Morrill, Andre Scedrov","A polynomial time algorithm for the Lambek calculus with brackets of
- bounded order",cs.LO cs.CL cs.DS cs.FL," Lambek calculus is a logical foundation of categorial grammar, a linguistic
-paradigm of grammar as logic and parsing as deduction. Pentus (2010) gave a
-polynomial-time algorithm for determining provability of bounded depth
-formulas in the Lambek calculus with empty antecedents allowed. Pentus'
-algorithm is based on tabularisation of proof nets. Lambek calculus with
-brackets is a conservative extension of Lambek calculus with bracket
-modalities, suitable for the modeling of syntactical domains. In this paper we
-give an algorithm for provability in the Lambek calculus with brackets allowing
-empty antecedents. Our algorithm runs in polynomial time when both the formula
-depth and the bracket nesting depth are bounded. It combines a Pentus-style
-tabularisation of proof nets with an automata-theoretic treatment of
-bracketing.
-"
-4750,1705.00697,"Juan Andr\'es Laura, Gabriel Masi, Luis Argerich","From Imitation to Prediction, Data Compression vs Recurrent Neural
- Networks for Natural Language Processing",cs.CL cs.AI cs.IT math.IT," In recent studies [1][13][12] Recurrent Neural Networks were used for
-generative processes and their surprising performance can be explained by their
-ability to create good predictions. In addition, data compression is also based
-on predictions. What the problem comes down to is whether a data compressor
-could be used to perform as well as recurrent neural networks in natural
-language processing tasks. If this is possible, then the problem comes down to
-determining if a compression algorithm is even more intelligent than a neural
-network in specific tasks related to human language. In our journey we
-discovered what we think is the fundamental difference between a Data
-Compression Algorithm and a Recurrent Neural Network.
-"
-4751,1705.00746,Satoshi Akasaki and Nobuhiro Kaji,"Chat Detection in an Intelligent Assistant: Combining Task-oriented and
- Non-task-oriented Spoken Dialogue Systems",cs.CL," Recently emerged intelligent assistants on smartphones and home electronics
-(e.g., Siri and Alexa) can be seen as novel hybrids of domain-specific
-task-oriented spoken dialogue systems and open-domain non-task-oriented ones.
-To realize such hybrid dialogue systems, this paper investigates determining
-whether or not a user is going to have a chat with the system. To address the
-lack of benchmark datasets for this task, we construct a new dataset consisting
-of 15,160 utterances collected from the real log data of a commercial
-intelligent assistant (and will release the dataset to facilitate future
-research activity). In addition, we investigate using tweets and Web search
-queries for handling open-domain user utterances, which characterize the task
-of chat detection. Experiments demonstrated that, while simple supervised
-methods are effective, the use of the tweets and search queries further
-improves the F1-score from 86.21 to 87.53.
-"
-4752,1705.00753,"Yun Chen, Yang Liu, Yong Cheng, Victor O.K. Li",A Teacher-Student Framework for Zero-Resource Neural Machine Translation,cs.CL," While end-to-end neural machine translation (NMT) has made remarkable
-progress recently, it still suffers from the data scarcity problem for
-low-resource language pairs and domains. In this paper, we propose a method for
-zero-resource NMT by assuming that parallel sentences have close probabilities
-of generating a sentence in a third language. Based on this assumption, our
-method is able to train a source-to-target NMT model (""student"") without
-parallel corpora available, guided by an existing pivot-to-target NMT model
-(""teacher"") on a source-pivot parallel corpus. Experimental results show that
-the proposed method significantly improves over a baseline pivot-based model by
-+3.0 BLEU points across various language pairs.
-"
-4753,1705.00823,"Yuya Yoshikawa, Yutaro Shigeto, Akikazu Takeuchi","STAIR Captions: Constructing a Large-Scale Japanese Image Caption
- Dataset",cs.CL cs.CV," In recent years, automatic generation of image descriptions (captions), that
-is, image captioning, has attracted a great deal of attention. In this paper,
-we particularly consider generating Japanese captions for images. Since most
-available caption datasets have been constructed for the English language, there
-are few datasets for Japanese. To tackle this problem, we construct a
-large-scale Japanese image caption dataset based on images from MS-COCO, which
-is called STAIR Captions. STAIR Captions consists of 820,310 Japanese captions
-for 164,062 images. In the experiment, we show that a neural network trained
-using STAIR Captions can generate more natural and better Japanese captions,
-compared to those generated using English-Japanese machine translation after
-generating English captions.
-"
-4754,1705.00861,"Mingxuan Wang, Zhengdong Lu, Jie Zhou, Qun Liu",Deep Neural Machine Translation with Linear Associative Unit,cs.CL cs.LG," Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art
-Neural Machine Translation (NMT) with their capability in modeling complex
-functions and capturing complex linguistic structures.
However, NMT systems with
-deep architecture in their encoder or decoder RNNs often suffer from severe
-gradient diffusion due to the non-linear recurrent activations, which often
-make the optimization much more difficult. To address this problem, we propose
-novel linear associative units (LAU) to reduce the gradient propagation length
-inside the recurrent unit. Different from conventional approaches (LSTM unit
-and GRU), LAUs utilize linear associative connections between the input and
-output of the recurrent unit, which allow unimpeded information flow in both
-the space and time directions. The model is quite simple, but it is surprisingly
-effective. Our empirical study on Chinese-English translation shows that our
-model with proper configuration can improve by 11.7 BLEU upon Groundhog and the
-best reported results in the same setting. On the WMT14 English-German task and
-a larger WMT14 English-French task, our model achieves comparable results with
-the state-of-the-art.
-"
-4755,1705.00995,Amir Karami and Aryya Gangopadhyay and Bin Zhou and Hadi Kharrazi,Fuzzy Approach Topic Discovery in Health and Medical Corpora,stat.ML cs.CL cs.IR," The majority of medical documents and electronic health records (EHRs) are in
-text format, which poses a challenge for data processing and finding relevant
-documents. Looking for ways to automatically retrieve the enormous amount of
-health and medical knowledge has always been an intriguing topic. Powerful
-methods have been developed in recent years to make the text processing
-automatic. One of the popular approaches to retrieve information based on
-discovering the themes in health & medical corpora is topic modeling; however,
-this approach still needs new perspectives. In this research we describe fuzzy
-latent semantic analysis (FLSA), a novel approach in topic modeling using a
-fuzzy perspective. FLSA can handle the redundancy issue in health & medical
-corpora and provides a new method to estimate the number of topics. The
-quantitative evaluations show that FLSA produces superior performance and
-features compared to latent Dirichlet allocation (LDA), the most popular topic
-model.
-"
-4756,1705.01020,"Junhui Li, Deyi Xiong, Zhaopeng Tu, Muhua Zhu, Min Zhang, Guodong Zhou",Modeling Source Syntax for Neural Machine Translation,cs.CL," Even though a linguistics-free sequence to sequence model in neural machine
-translation (NMT) has a certain capability of implicitly learning syntactic
-information of source sentences, this paper shows that source syntax can be
-explicitly incorporated into NMT effectively to provide further improvements.
-Specifically, we linearize parse trees of source sentences to obtain structural
-label sequences. On this basis, we propose three different sorts of encoders to
-incorporate source syntax into NMT: 1) Parallel RNN encoder that learns word
-and label annotation vectors in parallel; 2) Hierarchical RNN encoder that
-learns word and label annotation vectors in a two-level hierarchy; and 3) Mixed
-RNN encoder that stitchingly learns word and label annotation vectors over
-sequences where words and labels are mixed. Experimentation on
-Chinese-to-English translation demonstrates that all three proposed
-syntactic encoders are able to improve translation accuracy. It is interesting
-to note that the simplest RNN encoder, i.e., the Mixed RNN encoder, yields the
-best performance with a significant improvement of 1.4 BLEU points. Moreover, an
-in-depth analysis from several perspectives is provided to reveal how source
-syntax benefits NMT.
-" -4757,1705.01042,"Weiqian Yan, Kanchan Khurad",Entity Linking with people entity on Wikipedia,cs.CL," This paper introduces a new model that uses named entity recognition, -coreference resolution, and entity linking techniques, to approach the task of -linking people entities on Wikipedia people pages to their corresponding -Wikipedia pages if applicable. Our task is different from general and -traditional entity linking because we are working in a limited domain, namely, -people entities, and we are including pronouns as entities, whereas in the -past, pronouns were never considered as entities in entity linking. We have -built 2 models, both outperforms our baseline model significantly. The purpose -of our project is to build a model that could be use to generate cleaner data -for future entity linking tasks. Our contribution include a clean data set -consisting of 50Wikipedia people pages, and 2 entity linking models, -specifically tuned for this domain. -" -4758,1705.01214,"Maira Gatti de Bayser, Paulo Cavalin, Renan Souza, Alan Braz, Heloisa - Candello, Claudio Pinhanez, Jean-Pierre Briot",A Hybrid Architecture for Multi-Party Conversational Systems,cs.CL," Multi-party Conversational Systems are systems with natural language -interaction between one or more people or systems. From the moment that an -utterance is sent to a group, to the moment that it is replied in the group by -a member, several activities must be done by the system: utterance -understanding, information search, reasoning, among others. In this paper we -present the challenges of designing and building multi-party conversational -systems, the state of the art, our proposed hybrid architecture using both -rules and machine learning and some insights after implementing and evaluating -one on the finance domain. -" -4759,1705.01253,"Hongyang Xue, Zhou Zhao, Deng Cai",The Forgettable-Watcher Model for Video Question Answering,cs.CV cs.CL," A number of visual question answering approaches have been proposed recently, -aiming at understanding the visual scenes by answering the natural language -questions. While the image question answering has drawn significant attention, -video question answering is largely unexplored. - Video-QA is different from Image-QA since the information and the events are -scattered among multiple frames. In order to better utilize the temporal -structure of the videos and the phrasal structures of the answers, we propose -two mechanisms: the re-watching and the re-reading mechanisms and combine them -into the forgettable-watcher model. Then we propose a TGIF-QA dataset for video -question answering with the help of automatic question generation. Finally, we -evaluate the models on our dataset. The experimental results show the -effectiveness of our proposed models. -" -4760,1705.01265,"Georgios Balikas, Ioannis Partalas","On the effectiveness of feature set augmentation using clusters of word - embeddings",cs.CL," Word clusters have been empirically shown to offer important performance -improvements on various tasks. Despite their importance, their incorporation in -the standard pipeline of feature engineering relies more on a trial-and-error -procedure where one evaluates several hyper-parameters, like the number of -clusters to be used. In order to better understand the role of such features we -systematically evaluate their effect on four tasks, those of named entity -segmentation and classification as well as, those of five-point sentiment -classification and quantification. 
Our results strongly suggest that cluster
-membership features improve performance.
-"
-4761,1705.01306,"Alon Rozental, Daniel Fleischer","Amobee at SemEval-2017 Task 4: Deep Learning System for Sentiment
- Detection on Twitter",cs.CL stat.ML," This paper describes the Amobee sentiment analysis system, adapted to compete
-in SemEval-2017 Task 4. The system consists of two parts: a supervised training
-of RNN models based on a Twitter sentiment treebank, and the use of feedforward
-NN, Naive Bayes and logistic regression classifiers to produce predictions for
-the different sub-tasks. The algorithm reached 3rd place on the 5-label
-classification task (sub-task C).
-"
-4762,1705.01346,"Danhao Zhu, Si Shen, Xin-Yu Dai and Jiajun Chen",Going Wider: Recurrent Neural Network With Parallel Cells,cs.CL cs.LG cs.NE," The Recurrent Neural Network (RNN) has been widely applied to sequence
-modeling. In an RNN, the hidden states at the current step are fully connected
-to those at the previous step, so the influence of less related features from
-the previous step may decrease the model's learning ability. We propose a
-simple technique called parallel cells (PCs) to enhance the learning ability
-of RNNs. In each layer, we run multiple small RNN cells rather than one single
-large cell. In this paper, we evaluate PCs on two tasks. On the language
-modeling task on PTB (Penn Tree Bank), our model outperforms state-of-the-art
-models by decreasing perplexity from 78.6 to 75.3. On the Chinese-English
-translation task, our model increases the BLEU score by 0.39 points over the
-baseline model.
-"
-4763,1705.01359,"Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot,
- Moin Nabi, Enver Sangineto, Raffaella Bernardi",FOIL it! Find One mismatch between Image and Language caption,cs.CV cs.CL cs.MM," In this paper, we aim to understand whether current language and vision
-(LaVi) models truly grasp the interaction between the two modalities. To this
-end, we propose an extension of the MSCOCO dataset, FOIL-COCO, which associates
-images with both correct and ""foil"" captions, that is, descriptions of the
-image that are highly similar to the original ones, but contain one single
-mistake (""foil word""). We show that current LaVi models fall into the traps of
-this data and perform badly on three tasks: a) caption classification (correct
-vs. foil); b) foil word detection; c) foil word correction. Humans, in
-contrast, have near-perfect performance on those tasks. We demonstrate that
-merely utilising language cues is not enough to model FOIL-COCO and that it
-challenges the state-of-the-art by requiring a fine-grained understanding of
-the relation between text and image.
-"
-4764,1705.01452,"Hao Zhou, Zhaopeng Tu, Shujian Huang, Xiaohua Liu, Hang Li, Jiajun
- Chen",Chunk-Based Bi-Scale Decoder for Neural Machine Translation,cs.CL," In typical neural machine translation (NMT), the decoder generates a sentence
-word by word, packing all linguistic granularities into the same RNN
-time-scale. In this paper, we propose a new type of decoder for NMT, which
-splits the decoder state into two parts and updates them on two different
-time-scales. Specifically, we first predict a chunk time-scale state for
-phrasal modeling, on top of which multiple word time-scale states are
-generated. In this way, the target sentence is translated hierarchically from
-chunks to words, leveraging information at different granularities.
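-The parallel-cells layer described a few entries above is small enough to
-sketch directly: several narrow recurrent cells run side by side and their
-outputs are concatenated, instead of one wide cell. The PyTorch layer below is
-our minimal rendering under assumed sizes, not the paper's configuration.
-
-# Sketch of "parallel cells": k small GRU cells whose outputs are concatenated.
-import torch
-import torch.nn as nn
-
-class ParallelCells(nn.Module):
-    def __init__(self, input_size, cell_size, n_cells):
-        super().__init__()
-        self.cells = nn.ModuleList(
-            [nn.GRUCell(input_size, cell_size) for _ in range(n_cells)])
-
-    def forward(self, x, hs):
-        # x: (batch, input_size); hs: list of per-cell hidden states.
-        new_hs = [cell(x, h) for cell, h in zip(self.cells, hs)]
-        return torch.cat(new_hs, dim=-1), new_hs
-
-layer = ParallelCells(input_size=32, cell_size=16, n_cells=4)
-x = torch.randn(8, 32)
-hs = [torch.zeros(8, 16) for _ in range(4)]
-out, hs = layer(x, hs)
-print(out.shape)  # torch.Size([8, 64]), same width as one large cell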
Experiments show that
-our proposed model significantly improves the translation performance over the
-state-of-the-art NMT model.
-"
-4765,1705.01684,Ryan Cotterell and Jason Eisner,Probabilistic Typology: Deep Generative Models of Vowel Inventories,cs.CL," Linguistic typology studies the range of structures present in human
-language. The main goal of the field is to discover which sets of possible
-phenomena are universal, and which are merely frequent. For example, all
-languages have vowels, while most---but not all---languages have an /u/ sound.
-In this paper we present the first probabilistic treatment of a basic question
-in phonological typology: What makes a natural vowel inventory? We introduce a
-series of deep stochastic point processes, and contrast them with previous
-computational, simulation-based approaches. We provide a comprehensive suite of
-experiments on over 200 distinct languages.
-"
-4766,1705.01833,Somnath Roy,"A Finite State and Rule-based Akshara to Prosodeme (A2P) Converter in
- Hindi",cs.CL," This article describes a software module called Akshara to Prosodeme (A2P)
-converter in Hindi. It converts an input grapheme into a prosodeme (a sequence
-of phonemes with the specification of syllable boundaries and prosodic
-labels). The software is based on two proposed finite state machines: one for
-syllabification and another for syllable labeling. In addition, it uses a set
-of nonlinear phonological rules proposed for foot formation in Hindi, which
-encompass solutions to schwa-deletion in simple, compound, derived and
-inflected words. The nonlinear phonological rules are based on metrical
-phonology with the provision of recursive foot structure. The software module
-is implemented in Python. The testing of the software for syllabification,
-syllable labeling, schwa deletion and prosodic labeling yields an accuracy of
-more than 99% on a lexicon of 28664 words.
-"
-4767,1705.01991,Jacob Devlin,"Sharp Models on Dull Hardware: Fast and Accurate Neural Machine
- Translation Decoding on the CPU",cs.CL," Attentional sequence-to-sequence models have become the new standard for
-machine translation, but one challenge of such models is a significant increase
-in training and decoding cost compared to phrase-based systems. Here, we focus
-on efficient decoding, with a goal of achieving accuracy close to the
-state-of-the-art in neural machine translation (NMT), while achieving CPU
-decoding speed/throughput close to that of a phrasal decoder.
- We approach this problem from two angles: First, we describe several
-techniques for speeding up an NMT beam search decoder, which obtain a 4.4x
-speedup over a very efficient baseline decoder without changing the decoder
-output. Second, we propose a simple but powerful network architecture which
-uses an RNN (GRU/LSTM) layer at bottom, followed by a series of stacked
-fully-connected layers applied at every timestep. This architecture achieves
-similar accuracy to a deep recurrent model, at a small fraction of the training
-and decoding cost. By combining these techniques, our best system achieves a
-very competitive accuracy of 38.3 BLEU on WMT English-French NewsTest2014,
-while decoding at 100 words/sec on single-threaded CPU. We believe this is the
-best published accuracy/speed trade-off of an NMT system.
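-The "RNN at bottom, stacked fully-connected layers on top" architecture from
-the CPU-decoding entry above is easy to render as a sketch; the PyTorch module
-below uses layer sizes and a GRU choice that are ours for illustration, not
-the paper's exact setup.
-
-# Sketch: one recurrent layer followed by per-timestep fully-connected layers.
-import torch
-import torch.nn as nn
-
-class BottomRNNTopMLP(nn.Module):
-    def __init__(self, vocab, dim, n_fc):
-        super().__init__()
-        self.emb = nn.Embedding(vocab, dim)
-        self.rnn = nn.GRU(dim, dim, batch_first=True)   # single RNN layer
-        self.fc = nn.Sequential(*[m for _ in range(n_fc)
-                                  for m in (nn.Linear(dim, dim), nn.ReLU())])
-        self.out = nn.Linear(dim, vocab)
-
-    def forward(self, tokens):
-        h, _ = self.rnn(self.emb(tokens))   # (batch, time, dim)
-        return self.out(self.fc(h))         # FC stack applied at every timestep
-
-model = BottomRNNTopMLP(vocab=1000, dim=64, n_fc=3)
-logits = model(torch.randint(0, 1000, (2, 7)))
-print(logits.shape)  # torch.Size([2, 7, 1000])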
-" -4768,1705.02012,"Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip - Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler",Machine Comprehension by Text-to-Text Neural Question Generation,cs.CL," We propose a recurrent neural model that generates natural-language questions -from documents, conditioned on answers. We show how to train the model using a -combination of supervised and reinforcement learning. After teacher forcing for -standard maximum likelihood training, we fine-tune the model using policy -gradient techniques to maximize several rewards that measure question quality. -Most notably, one of these rewards is the performance of a question-answering -system. We motivate question generation as a means to improve the performance -of question answering systems. Our model is trained and evaluated on the recent -question-answering dataset SQuAD. -" -4769,1705.02023,Hussam Hamdan,"Senti17 at SemEval-2017 Task 4: Ten Convolutional Neural Network Voters - for Tweet Polarity Classification",cs.CL," This paper presents Senti17 system which uses ten convolutional neural -networks (ConvNet) to assign a sentiment label to a tweet. The network consists -of a convolutional layer followed by a fully-connected layer and a Softmax on -top. Ten instances of this network are initialized with the same word -embeddings as inputs but with different initializations for the network -weights. We combine the results of all instances by selecting the sentiment -label given by the majority of the ten voters. This system is ranked fourth in -SemEval-2017 Task4 over 38 systems with 67.4% -" -4770,1705.02073,Ruochen Xu and Yiming Yang,Cross-lingual Distillation for Text Classification,cs.CL," Cross-lingual text classification(CLTC) is the task of classifying documents -written in different languages into the same taxonomy of categories. This paper -presents a novel approach to CLTC that builds on model distillation, which -adapts and extends a framework originally proposed for model compression. Using -soft probabilistic predictions for the documents in a label-rich language as -the (induced) supervisory labels in a parallel corpus of documents, we train -classifiers successfully for new languages in which labeled training data are -not available. An adversarial feature adaptation technique is also applied -during the model training to reduce distribution mismatch. We conducted -experiments on two benchmark CLTC datasets, treating English as the source -language and German, French, Japan and Chinese as the unlabeled target -languages. The proposed approach had the advantageous or comparable performance -of the other state-of-art methods. -" -4771,1705.02077,"Mengxue Li, Shiqiang Geng, Yang Gao, Haijing Liu, Hao Wang",Crowdsourcing Argumentation Structures in Chinese Hotel Reviews,cs.CL," Argumentation mining aims at automatically extracting the premises-claim -discourse structures in natural language texts. There is a great demand for -argumentation corpora for customer reviews. However, due to the controversial -nature of the argumentation annotation task, there exist very few large-scale -argumentation corpora for customer reviews. In this work, we novelly use the -crowdsourcing technique to collect argumentation annotations in Chinese hotel -reviews. 
As the first Chinese argumentation dataset, our corpus includes 4814
-argument component annotations and 411 argument relation annotations, and its
-annotation quality is comparable to some widely used argumentation corpora in
-other languages.
-"
-4772,1705.02131,"Minglan Li, Yang Gao, Hui Wen, Yang Du, Haijing Liu and Hao Wang",Joint RNN Model for Argument Component Boundary Detection,cs.CL," Argument Component Boundary Detection (ACBD) is an important sub-task in
-argumentation mining; it aims at identifying the word sequences that constitute
-argument components, and is usually considered the first sub-task in the
-argumentation mining pipeline. Existing ACBD methods heavily depend on
-task-specific knowledge, and require considerable human effort for feature
-engineering. To tackle these problems, in this work, we formulate ACBD as a
-sequence labeling problem and propose a variety of Recurrent Neural Network
-(RNN) based methods, which do not use domain-specific or handcrafted features
-beyond the relative position of the sentence in the document. In particular,
-we propose a novel joint RNN model that can predict whether sentences are
-argumentative or not, and use the predicted results to more precisely detect
-the argument component boundaries. We evaluate our techniques on two corpora
-from two different genres; results suggest that our joint RNN model obtains
-state-of-the-art performance on both datasets.
-"
-4773,1705.02203,"Tesfamariam M. Abuhay, Sergey V. Kovalchuk, Klavdiya O. Bochenina,
- George Kampis, Valeria V. Krzhizhanovskaya, Michael H. Lees","Analysis of Computational Science Papers from ICCS 2001-2016 using Topic
- Modeling and Graph Theory",cs.DL cs.CL cs.IR cs.SI," This paper presents results of topic modeling and network models of topics
-using the International Conference on Computational Science corpus, which
-contains domain-specific (computational science) papers over sixteen years (a
-total of 5695 papers). We discuss topical structures of the International
-Conference on Computational Science, how these topics evolve over time in
-response to the topicality of various problems, technologies and methods, and
-how all these topics relate to one another. This analysis illustrates
-multidisciplinary research and collaborations among scientific communities, by
-constructing static and dynamic networks from the topic modeling results and
-the keywords of authors. The results of this study give insights into the past
-and future trends of core discussion topics in computational science. We used
-the Non-negative Matrix Factorization topic modeling algorithm to discover
-topics, and labeled and grouped the results hierarchically.
-"
-4774,1705.02269,"Sebastian Brarda, Philip Yeres, Samuel R. Bowman","Sequential Attention: A Context-Aware Alignment Function for Machine
- Reading",cs.CL cs.LG," In this paper we propose a neural network model with a novel Sequential
-Attention layer that extends soft attention by assigning weights to words in an
-input sequence in a way that takes into account not just how well that word
-matches a query, but how well surrounding words match. We evaluate this
-approach on the task of reading comprehension (on the Who did What and CNN
-datasets) and show that it dramatically improves a strong baseline, the
-Stanford Reader, and is competitive with the state of the art.
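-One way to picture the Sequential Attention idea just described: each word's
-attention weight depends not only on its own match with the query but on its
-neighbors' matches. The window-sum smoothing below is our stand-in for the
-paper's actual layer, which the abstract does not fully specify.
-
-# Sketch: attention weights that also reflect neighboring words' match scores.
-import numpy as np
-
-def sequential_attention(H, q, window=2):
-    # H: (time, dim) encoder states; q: (dim,) query vector.
-    scores = H @ q                                     # per-word match score
-    smoothed = np.array([scores[max(0, t - window): t + window + 1].sum()
-                         for t in range(len(scores))])
-    weights = np.exp(smoothed - smoothed.max())
-    weights /= weights.sum()                           # softmax
-    return weights @ H                                 # attended summary
-
-H = np.random.rand(6, 4); q = np.random.rand(4)
-print(sequential_attention(H, q).shape)  # (4,)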
-" -4775,1705.02304,"Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, - Ying Cao, Ajay Kannan, Zhenyao Zhu",Deep Speaker: an End-to-End Neural Speaker Embedding System,cs.CL," We present Deep Speaker, a neural speaker embedding system that maps -utterances to a hypersphere where speaker similarity is measured by cosine -similarity. The embeddings generated by Deep Speaker can be used for many -tasks, including speaker identification, verification, and clustering. We -experiment with ResCNN and GRU architectures to extract the acoustic features, -then mean pool to produce utterance-level speaker embeddings, and train using -triplet loss based on cosine similarity. Experiments on three distinct datasets -suggest that Deep Speaker outperforms a DNN-based i-vector baseline. For -example, Deep Speaker reduces the verification equal error rate by 50% -(relatively) and improves the identification accuracy by 60% (relatively) on a -text-independent dataset. We also present results that suggest adapting from a -model trained with Mandarin can improve accuracy for English speaker -recognition. -" -4776,1705.02314,"Serkan Ozen, Burcu Can",Building Morphological Chains for Agglutinative Languages,cs.CL," In this paper, we build morphological chains for agglutinative languages by -using a log-linear model for the morphological segmentation task. The model is -based on the unsupervised morphological segmentation system called -MorphoChains. We extend MorphoChains log linear model by expanding the -candidate space recursively to cover more split points for agglutinative -languages such as Turkish, whereas in the original model candidates are -generated by considering only binary segmentation of each word. The results -show that we improve the state-of-art Turkish scores by 12% having a F-measure -of 72% and we improve the English scores by 3% having a F-measure of 74%. -Eventually, the system outperforms both MorphoChains and other well-known -unsupervised morphological segmentation systems. The results indicate that -candidate generation plays an important role in such an unsupervised log-linear -model that is learned using contrastive estimation with negative samples. -" -4777,1705.02315,"Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri and - Ronald M. Summers","ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on - Weakly-Supervised Classification and Localization of Common Thorax Diseases",cs.CV cs.CL," The chest X-ray is one of the most commonly accessible radiological -examinations for screening and diagnosis of many lung diseases. A tremendous -number of X-ray imaging studies accompanied by radiological reports are -accumulated and stored in many modern hospitals' Picture Archiving and -Communication Systems (PACS). On the other side, it is still an open question -how this type of hospital-size knowledge database containing invaluable imaging -informatics (i.e., loosely labeled) can be used to facilitate the data-hungry -deep learning paradigms in building truly large-scale high precision -computer-aided diagnosis (CAD) systems. - In this paper, we present a new chest X-ray database, namely ""ChestX-ray8"", -which comprises 108,948 frontal-view X-ray images of 32,717 unique patients -with the text-mined eight disease image labels (where each image can have -multi-labels), from the associated radiological reports using natural language -processing. 
Importantly, we demonstrate that these commonly occurring thoracic
-diseases can be detected and even spatially located via a unified
-weakly-supervised multi-label image classification and disease localization
-framework, which is validated using our proposed dataset. Although the initial
-quantitative results are promising as reported, deep convolutional neural
-network based ""reading chest X-rays"" (i.e., recognizing and locating the common
-disease patterns trained with only image-level labels) remains a strenuous task
-for fully-automated high precision CAD systems. Data download link:
-https://nihcc.app.box.com/v/ChestXray-NIHCC
-"
-4778,1705.02364,"Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine
- Bordes","Supervised Learning of Universal Sentence Representations from Natural
- Language Inference Data",cs.CL," Many modern NLP systems rely on word embeddings, previously trained in an
-unsupervised manner on large corpora, as base features. Efforts to obtain
-embeddings for larger chunks of text, such as sentences, have, however, not
-been so successful. Several attempts at learning unsupervised representations
-of sentences have not reached performance satisfactory enough to be widely
-adopted. In this paper, we show how universal sentence representations trained
-using the supervised data of the Stanford Natural Language Inference datasets
-can consistently outperform unsupervised methods like SkipThought vectors on a
-wide range of transfer tasks. Much like how computer vision uses ImageNet to
-obtain features, which can then be transferred to other tasks, our work tends
-to indicate the suitability of natural language inference for transfer learning
-to other NLP tasks. Our encoder is publicly available.
-"
-4779,1705.02394,"Jonathan Chang, Stefan Scherer","Learning Representations of Emotional Speech with Deep Convolutional
- Generative Adversarial Networks",cs.CL cs.LG stat.ML," Automatically assessing emotional valence in human speech has historically
-been a difficult task for machine learning algorithms. The subtle changes in
-the voice of the speaker that are indicative of positive or negative emotional
-states are often ""overshadowed"" by voice characteristics relating to emotional
-intensity or emotional activation. In this work we explore a representation
-learning approach that automatically derives discriminative representations of
-emotional speech. In particular, we investigate two machine learning strategies
-to improve classifier performance: (1) utilization of unlabeled data using a
-deep convolutional generative adversarial network (DCGAN), and (2) multitask
-learning. In our extensive experiments, we leverage a multitask annotated
-emotional corpus as well as a large unlabeled meeting corpus (around 100
-hours). Our speaker-independent classification experiments show that, in
-particular, the use of unlabeled data improves the performance of the
-classifiers, and both fully supervised baseline approaches are outperformed
-considerably. We improve the classification of emotional valence on a discrete
-5-point scale to 43.88% and on a 3-point scale to 49.80%, which is competitive
-with state-of-the-art performance.
-"
-4780,1705.02395,"Markus Borg, Iben Lennerstad, Rasmus Ros, Elizabeth Bjarnason","On Using Active Learning and Self-Training when Mining Performance
- Discussions on Stack Overflow",cs.CL cs.HC cs.LG cs.SE," Abundant data is the key to successful machine learning.
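-The weakly-supervised multi-label setup behind the ChestX-ray8 entry above
-reduces, at its core, to one sigmoid output per disease trained with binary
-cross-entropy on image-level labels. The toy backbone and sizes below are our
-placeholders, not the paper's architecture.
-
-# Sketch: multi-label classification with image-level labels only.
-import torch
-import torch.nn as nn
-
-backbone = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 8))  # toy stand-in
-criterion = nn.BCEWithLogitsLoss()     # one independent sigmoid per label
-
-images = torch.randn(4, 1, 64, 64)                 # batch of grayscale images
-labels = torch.randint(0, 2, (4, 8)).float()       # 8 disease indicators each
-
-logits = backbone(images)
-loss = criterion(logits, labels)
-print(float(loss))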
However, supervised
-learning requires annotated data that are often hard to obtain. In a
-classification task with limited resources, Active Learning (AL) promises to
-guide annotators to examples that bring the most value for a classifier. AL can
-be successfully combined with self-training, i.e., extending a training set
-with the unlabelled examples for which a classifier is the most certain. We
-report our experiences on using AL in a systematic manner to train an SVM
-classifier for Stack Overflow posts discussing performance of software
-components. We show that the training examples deemed as the most valuable to
-the classifier are also the most difficult for humans to annotate. Despite
-carefully evolved annotation criteria, we report low inter-rater agreement, but
-we also propose mitigation strategies. Finally, based on one annotator's work,
-we show that self-training can improve the classification accuracy. We conclude
-the paper by discussing implications for future text miners aspiring to use AL
-and self-training.
-"
-4781,1705.02411,"Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan,
- Gengshen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni","Max-Pooling Loss Training of Long Short-Term Memory Networks for
- Small-Footprint Keyword Spotting",cs.CL cs.LG stat.ML," We propose a max-pooling based loss function for training Long Short-Term
-Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low
-CPU, memory, and latency requirements. The max-pooling loss training can be
-further guided by initializing with a cross-entropy loss trained network. A
-posterior smoothing based evaluation approach is employed to measure keyword
-spotting performance. Our experimental results show that LSTM models trained
-using cross-entropy loss or max-pooling loss outperform a cross-entropy loss
-trained baseline feed-forward Deep Neural Network (DNN). In addition, a
-max-pooling loss trained LSTM with a randomly initialized network performs
-better than a cross-entropy loss trained LSTM. Finally, the max-pooling loss
-trained LSTM initialized with a cross-entropy pre-trained network shows the
-best performance, yielding a $67.6\%$ relative reduction in the Area Under the
-Curve (AUC) measure compared to the baseline feed-forward DNN.
-"
-4782,1705.02426,"Hanxiao Liu, Yuexin Wu, Yiming Yang",Analogical Inference for Multi-Relational Embeddings,cs.LG cs.AI cs.CL," Large-scale multi-relational embedding refers to the task of learning the
-latent representations for entities and relations in large knowledge graphs. An
-effective and scalable solution for this problem is crucial for the true
-success of knowledge-based inference in a broad range of applications. This
-paper proposes a novel framework for optimizing the latent representations with
-respect to the \textit{analogical} properties of the embedded entities and
-relations. By formulating the learning objective in a differentiable fashion,
-our model enjoys both theoretical power and computational scalability, and
-significantly outperformed a large number of representative baseline methods on
-benchmark datasets. Furthermore, the model offers an elegant unification of
-several well-known methods in multi-relational embedding, which can be proven
-to be special instantiations of our framework.
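-The self-training step described in the Stack Overflow entry above, extending
-the labelled set with the unlabelled examples the classifier is most certain
-about, follows a generic recipe; the SVM, the confidence threshold, and the
-toy data below are our illustrative choices.
-
-# Sketch: one self-training round with a confidence threshold.
-import numpy as np
-from sklearn.svm import SVC
-
-def self_train_step(clf, X_lab, y_lab, X_unlab, threshold=0.9):
-    clf.fit(X_lab, y_lab)
-    probs = clf.predict_proba(X_unlab)
-    confident = probs.max(axis=1) >= threshold
-    # Add confident predictions as pseudo-labels; keep the rest unlabelled.
-    X_new = np.vstack([X_lab, X_unlab[confident]])
-    y_new = np.concatenate(
-        [y_lab, clf.classes_[probs[confident].argmax(axis=1)]])
-    return X_new, y_new, X_unlab[~confident]
-
-X_lab = np.random.rand(20, 5); y_lab = np.random.randint(0, 2, 20)
-X_unlab = np.random.rand(100, 5)
-clf = SVC(probability=True)
-X_lab, y_lab, X_unlab = self_train_step(clf, X_lab, y_lab, X_unlab)
-print(len(y_lab), len(X_unlab))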
-" -4783,1705.02452,"Pramod Pandey, Somnath Roy",A Generative Model of a Pronunciation Lexicon for Hindi,cs.CL," Voice browser applications in Text-to- Speech (TTS) and Automatic Speech -Recognition (ASR) systems crucially depend on a pronunciation lexicon. The -present paper describes the model of pronunciation lexicon of Hindi developed -to automatically generate the output forms of Hindi at two levels, the - and the (PS, in short for Prosodic Structure). The latter level -involves both syllable-division and stress placement. The paper describes the -tool developed for generating the two-level outputs of lexica in Hindi. -" -4784,1705.02494,"Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji","Learning Distributed Representations of Texts and Entities from - Knowledge Base",cs.CL cs.NE," We describe a neural network model that jointly learns distributed -representations of texts and knowledge base (KB) entities. Given a text in the -KB, we train our proposed model to predict entities that are relevant to the -text. Our model is designed to be generic with the ability to address various -NLP tasks with ease. We train the model using a large corpus of texts and their -entity annotations extracted from Wikipedia. We evaluated the model on three -important NLP tasks (i.e., sentence textual similarity, entity linking, and -factoid question answering) involving both unsupervised and supervised -settings. As a result, we achieved state-of-the-art results on all three of -these tasks. Our code and trained models are publicly available for further -academic research. -" -4785,1705.02518,"Subhabrata Mukherjee, Kashyap Popat, Gerhard Weikum",Exploring Latent Semantic Factors to Find Useful Product Reviews,cs.AI cs.CL cs.IR cs.SI stat.ML," Online reviews provided by consumers are a valuable asset for e-Commerce -platforms, influencing potential consumers in making purchasing decisions. -However, these reviews are of varying quality, with the useful ones buried deep -within a heap of non-informative reviews. In this work, we attempt to -automatically identify review quality in terms of its helpfulness to the end -consumers. In contrast to previous works in this domain exploiting a variety of -syntactic and community-level features, we delve deep into the semantics of -reviews as to what makes them useful, providing interpretable explanation for -the same. We identify a set of consistency and semantic factors, all from the -text, ratings, and timestamps of user-generated reviews, making our approach -generalizable across all communities and domains. We explore review semantics -in terms of several latent factors like the expertise of its author, his -judgment about the fine-grained facets of the underlying product, and his -writing style. These are cast into a Hidden Markov Model -- Latent Dirichlet -Allocation (HMM-LDA) based model to jointly infer: (i) reviewer expertise, (ii) -item facets, and (iii) review helpfulness. Large-scale experiments on five -real-world datasets from Amazon show significant improvement over -state-of-the-art baselines in predicting and ranking useful reviews. -" -4786,1705.02519,"Subhabrata Mukherjee, Hemank Lamba, Gerhard Weikum",Item Recommendation with Evolving User Preferences and Experience,cs.AI cs.CL cs.IR cs.SI stat.ML," Current recommender systems exploit user and item similarities by -collaborative filtering. Some advanced methods also consider the temporal -evolution of item ratings as a global background process. 
However, all prior
-methods disregard the individual evolution of a user's experience level and how
-this is expressed in the user's writing in a review community. In this paper,
-we model the joint evolution of user experience, interest in specific item
-facets, writing style, and rating behavior. This way we can generate individual
-recommendations that take into account the user's maturity level (e.g.,
-recommending art movies rather than blockbusters for a cinematography expert).
-As only item ratings and review texts are observables, we capture the user's
-experience and interests in a latent model learned from her reviews, vocabulary
-and writing style. We develop a generative HMM-LDA model to trace user
-evolution, where the Hidden Markov Model (HMM) traces her latent experience
-progressing over time -- with solely user reviews and ratings as observables
-over time. The facets of a user's interest are drawn from a Latent Dirichlet
-Allocation (LDA) model derived from her reviews, as a function of her (again
-latent) experience level. In experiments with five real-world datasets, we show
-that our model improves the rating prediction over state-of-the-art baselines
-by a substantial margin. We also show, in a use-case study, that our model
-performs well in the assessment of user experience levels.
-"
-4787,1705.02522,"Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil",People on Drugs: Credibility of User Statements in Health Communities,cs.AI cs.CL cs.IR cs.SI stat.ML," Online health communities are a valuable source of information for patients
-and physicians. However, such user-generated resources are often plagued by
-inaccuracies and misinformation. In this work we propose a method for
-automatically establishing the credibility of user-generated medical statements
-and the trustworthiness of their authors by exploiting linguistic cues and
-distant supervision from expert sources. To this end we introduce a
-probabilistic graphical model that jointly learns user trustworthiness,
-statement credibility, and language objectivity. We apply this methodology to
-the task of extracting rare or unknown side-effects of medical drugs --- this
-being one of the problems where large-scale non-expert data has the potential
-to complement expert medical knowledge. We show that our method can reliably
-extract side-effects and filter out false statements, while identifying
-trustworthy users that are likely to contribute valuable medical information.
-"
-4788,1705.02667,"Subhabrata Mukherjee, Gerhard Weikum","People on Media: Jointly Identifying Credible News and Trustworthy
- Citizen Journalists in Online Communities",cs.AI cs.CL cs.IR cs.SI stat.ML," Media seems to have become more partisan, often providing a biased coverage
-of news catering to the interests of specific groups. It is therefore essential
-to identify credible information content that provides an objective narrative
-of an event. News communities such as digg, reddit, or newstrust offer
-recommendations, reviews, quality ratings, and further insights on journalistic
-works. However, there is a complex interaction between different factors in
-such online communities: fairness and style of reporting, language clarity and
-objectivity, topical perspectives (like political viewpoint), expertise and
-bias of community members, and more. This paper presents a model to
-systematically analyze the different interactions in a news community between
-users, news, and sources.
We develop a probabilistic graphical model that
-leverages this joint interaction to identify 1) highly credible news articles,
-2) trustworthy news sources, and 3) expert users who perform the role of
-""citizen journalists"" in the community. Our method extends CRF models to
-incorporate real-valued ratings, as some communities have very fine-grained
-scales that cannot be easily discretized without losing information. To the
-best of our knowledge, this paper is the first full-fledged analysis of
-credibility, trust, and expertise in news communities.
-"
-4789,1705.02668,"Subhabrata Mukherjee, Sourav Dutta, Gerhard Weikum","Credible Review Detection with Limited Information using Consistency
- Analysis",cs.AI cs.CL cs.IR cs.SI stat.ML," Online reviews provide viewpoints on the strengths and shortcomings of
-products/services, influencing potential customers' purchasing decisions.
-However, the proliferation of non-credible reviews -- either fake
-(promoting/demoting an item), incompetent (involving irrelevant aspects), or
-biased -- entails the problem of identifying credible reviews. Prior works
-involve classifiers harnessing rich information about items/users -- which
-might not be readily available in several domains -- that provide only limited
-interpretability as to why a review is deemed non-credible. This paper presents
-a novel approach to address the above issues. We utilize latent topic models
-leveraging review texts, item ratings, and timestamps to derive consistency
-features without relying on item/user histories, unavailable for ""long-tail""
-items/users. We develop models for computing review credibility scores that
-provide interpretable evidence for non-credible reviews and are also
-transferable to other domains -- addressing the scarcity of labeled data.
-Experiments on real-world datasets demonstrate improvements over
-state-of-the-art baselines.
-"
-4790,1705.02669,"Subhabrata Mukherjee, Stephan Guennemann, Gerhard Weikum","Item Recommendation with Continuous Experience Evolution of Users using
- Brownian Motion",cs.AI cs.CL cs.IR cs.SI stat.ML," Online review communities are dynamic as users join and leave, adopt new
-vocabulary, and adapt to evolving trends. Recent work has shown that
-recommender systems benefit from explicit consideration of user experience.
-However, prior work assumes a fixed number of discrete experience levels,
-whereas in reality users gain experience and mature continuously over time.
-This paper presents a new model that captures the continuous evolution of user
-experience, and the resulting language model in reviews and other posts. Our
-model is unsupervised and combines principles of Geometric Brownian Motion,
-Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
-progression of user experience and language model, respectively. We develop
-practical algorithms for estimating the model parameters from data and for
-inference with our model (e.g., to recommend items). Extensive experiments with
-five real-world datasets show that our model not only fits data better than
-discrete-model baselines, but also outperforms state-of-the-art methods for
-predicting item ratings.
-"
-4791,1705.02700,"Vincent Fiorentini, Megan Shao, Julie Medero",Generating Memorable Mnemonic Encodings of Numbers,cs.CL," The major system is a mnemonic system that can be used to memorize sequences
-of numbers. In this work, we present a method to automatically generate
-sentences that encode a given number.
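-The major system named in the mnemonic-encoding entry above maps each digit to
-a class of consonant sounds, so a sentence encodes the number spelled out by
-its consonants. The sketch below approximates the mapping at the letter level
-(the real system works on sounds, not spelling), purely for illustration.
-
-# Sketch: letter-level approximation of the major system's digit code.
-DIGIT_OF = {c: d for d, letters in {
-    "0": "sz", "1": "td", "2": "n", "3": "m", "4": "r",
-    "5": "l", "6": "jg", "7": "kc", "8": "fv", "9": "pb"}.items()
-    for c in letters}
-
-def encodes(word, number):
-    # Keep only coding consonants and compare the resulting digit string.
-    digits = "".join(DIGIT_OF[c] for c in word.lower() if c in DIGIT_OF)
-    return digits == number
-
-print(encodes("moon", "32"))   # m -> 3, n -> 2  -> True
-print(encodes("rat", "41"))    # r -> 4, t -> 1  -> True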
We propose several encoding models and
-compare the most promising ones in a password memorability study. The results
-of the study show that a model combining part-of-speech sentence templates with
-an $n$-gram language model produces the most memorable password
-representations.
-"
-4792,1705.02735,"Edmund Tong, Amir Zadeh, Cara Jones, Louis-Philippe Morency",Combating Human Trafficking with Deep Multimodal Models,cs.CL cs.CY," Human trafficking is a global epidemic affecting millions of people across
-the planet. Sex trafficking, the dominant form of human trafficking, has seen a
-significant rise mostly due to the abundance of escort websites, where human
-traffickers can openly advertise among at-will escort advertisements. In this
-paper, we take a major step in the automatic detection of advertisements
-suspected to pertain to human trafficking. We present a novel dataset called
-Trafficking-10k, with more than 10,000 advertisements annotated for this task.
-The dataset contains two sources of information per advertisement: text and
-images. For the accurate detection of trafficking advertisements, we designed
-and trained a deep multimodal model called the Human Trafficking Deep Network
-(HTDN).
-"
-4793,1705.02750,"Hayate Iso, Shoko Wakamiya, Eiji Aramaki","Density Estimation for Geolocation via Convolutional Mixture Density
- Network",cs.CL," Nowadays, geographic information related to Twitter is crucially important
-for fine-grained applications. However, the amount of geographic information
-available on Twitter is low, which makes the pursuit of many applications
-challenging. Under such circumstances, estimating the location of a tweet is an
-important goal of the study. Unlike most previous studies, which treat location
-estimation as classification over pre-defined districts, this study employs a
-probability distribution to represent richer information about the tweet: not
-only the location but also its ambiguity. To realize this modeling, we propose
-the convolutional mixture density network (CMDN), which uses text data to
-estimate the mixture model parameters. Experimentally obtained results reveal
-that CMDN achieved the highest prediction performance among methods for
-predicting exact coordinates. It also provides a quantitative representation of
-the location ambiguity for each tweet, which is useful for extracting reliable
-location estimates.
-"
-4794,1705.02798,"Minghao Hu and Yuxing Peng and Zhen Huang and Xipeng Qiu and Furu Wei
- and Ming Zhou",Reinforced Mnemonic Reader for Machine Reading Comprehension,cs.CL," In this paper, we introduce the Reinforced Mnemonic Reader for machine
-reading comprehension tasks, which enhances previous attentive readers in two
-aspects. First, a reattention mechanism is proposed to refine current
-attentions by directly accessing past attentions that are temporally memorized
-in a multi-round alignment architecture, so as to avoid the problems of
-attention redundancy and attention deficiency. Second, a new optimization
-approach, called dynamic-critical reinforcement learning, is introduced to
-extend the standard supervised method. It always encourages the prediction of
-a more acceptable answer, addressing the convergence suppression problem that
-occurs in traditional reinforcement learning algorithms. Extensive experiments
-on the Stanford Question Answering Dataset (SQuAD) show that our model achieves
-state-of-the-art results.
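-The mixture-density output behind the CMDN entry above is worth a small
-worked sketch: the network head emits mixing weights, component means, and
-spreads, and the location density is their weighted sum. The isotropic
-Gaussian components and all values below are our simplifications.
-
-# Sketch: evaluating a 2-D mixture density for a candidate location.
-import numpy as np
-
-def mixture_density(point, weights, means, sigmas):
-    # point: (2,) lat/lon; weights: (K,); means: (K, 2); sigmas: (K,)
-    diff2 = ((point - means) ** 2).sum(axis=1)
-    comp = np.exp(-diff2 / (2 * sigmas ** 2)) / (2 * np.pi * sigmas ** 2)
-    return float(weights @ comp)
-
-w = np.array([0.6, 0.4])                          # mixing coefficients
-mu = np.array([[35.7, 139.7], [34.7, 135.5]])     # e.g. Tokyo, Osaka
-s = np.array([0.5, 0.8])
-print(mixture_density(np.array([35.6, 139.8]), w, mu, s))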
Meanwhile, our model outperforms previous systems by
-over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD
-datasets.
-"
-4795,1705.02925,Pradeep Dasigi and Waleed Ammar and Chris Dyer and Eduard Hovy,Ontology-Aware Token Embeddings for Prepositional Phrase Attachment,cs.CL," Type-level word embeddings use the same set of parameters to represent all
-instances of a word regardless of its context, ignoring the inherent lexical
-ambiguity in language. Instead, we embed semantic concepts (or synsets) as
-defined in WordNet and represent a word token in a particular context by
-estimating a distribution over relevant semantic concepts. We use the new,
-context-sensitive embeddings in a model for predicting prepositional phrase
-(PP) attachments and jointly learn the concept embeddings and model
-parameters. We show that using context-sensitive embeddings improves the
-accuracy of the PP attachment model by 5.4% absolute points, which amounts to
-a 34.4% relative reduction in errors.
-"
-4796,1705.03122,"Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N.
- Dauphin",Convolutional Sequence to Sequence Learning,cs.CL," The prevalent approach to sequence-to-sequence learning maps an input
-sequence to a variable-length output sequence via recurrent neural networks. We
-introduce an architecture based entirely on convolutional neural networks.
-Compared to recurrent models, computations over all elements can be fully
-parallelized during training and optimization is easier since the number of
-non-linearities is fixed and independent of the input length. Our use of gated
-linear units eases gradient propagation and we equip each decoder layer with a
-separate attention module. We outperform the accuracy of the deep LSTM setup of
-Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French
-translation at an order of magnitude faster speed, both on GPU and CPU.
-"
-4797,1705.03127,Stefan Jansen,Word and Phrase Translation with word2vec,cs.CL cs.AI," Word and phrase tables are key inputs to machine translation, but costly to
-produce. New unsupervised learning methods represent words and phrases in a
-high-dimensional vector space, and these monolingual embeddings have been shown
-to encode syntactic and semantic relationships between language elements. The
-information captured by these embeddings can be exploited for bilingual
-translation by learning a transformation matrix that allows matching relative
-positions across two monolingual vector spaces. This method aims to identify
-high-quality candidates for word and phrase translation more cost-effectively
-from unlabeled data.
- This paper expands the scope of previous attempts at bilingual translation to
-four languages (English, German, Spanish, and French). It shows how to process
-the source data, train a neural network to learn the high-dimensional
-embeddings for individual languages, and expand the framework for testing their
-quality beyond the English language. Furthermore, it shows how to learn
-bilingual transformation matrices and obtain candidates for word and phrase
-translation, and assess their quality.
-"
-4798,1705.03151,"Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li and Andrew Abel",Phonetic Temporal Neural Model for Language Identification,cs.CL cs.LG cs.NE," Deep neural models, particularly the LSTM-RNN model, have shown great
-potential for language identification (LID).
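-The transformation matrix in the word2vec translation entry two items above
-can be learned from seed translation pairs by least squares; the numpy sketch
-below uses synthetic vectors purely to show the shape of the computation.
-
-# Sketch: linear map between monolingual embedding spaces via least squares.
-import numpy as np
-
-d = 50
-X = np.random.rand(200, d)                 # source-language word vectors
-W_true = np.random.rand(d, d)
-Z = X @ W_true                             # aligned target-language vectors
-W, *_ = np.linalg.lstsq(X, Z, rcond=None)  # solve min ||XW - Z||
-
-def translate(src_vec, target_vecs):
-    # Nearest target vector (by cosine) to the mapped source vector.
-    mapped = src_vec @ W
-    sims = (target_vecs @ mapped) / (
-        np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(mapped))
-    return int(sims.argmax())
-
-print(translate(X[0], Z))  # 0: recovers the paired word in this toy setup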
However, the use of phonetic
-information has been largely overlooked by most existing neural LID methods,
-although this information has been used very successfully in conventional
-phonetic LID systems. We present a phonetic temporal neural model for LID,
-which is an LSTM-RNN LID system that accepts phonetic features produced by a
-phone-discriminative DNN as the input, rather than raw acoustic features. This
-new model is similar to traditional phonetic LID methods, but the phonetic
-knowledge here is much richer: it is at the frame level and involves compacted
-information of all phones. Our experiments conducted on the Babel database and
-the AP16-OLR database demonstrate that the temporal phonetic neural approach is
-very effective, and significantly outperforms existing acoustic neural models.
-It also outperforms the conventional i-vector approach on short utterances and
-in noisy conditions.
-"
-4799,1705.03152,"Zhiyuan Tang, Dong Wang, Yixiang Chen, Ying Shi, Lantian Li",Phone-aware Neural Language Identification,cs.CL cs.LG cs.NE," Pure acoustic neural models, particularly the LSTM-RNN model, have shown
-great potential in language identification (LID). However, the phonetic
-information has been largely overlooked by most existing neural LID models,
-although this information has been used in conventional phonetic LID systems
-with great success. We present a phone-aware neural LID architecture, which is
-a deep LSTM-RNN LID system but accepts output from an RNN-based ASR system. By
-utilizing the phonetic knowledge, the LID performance can be significantly
-improved. Interestingly, even if the test language is not involved in the ASR
-training, the phonetic knowledge still makes a large contribution. Our
-experiments conducted on four languages within the Babel corpus demonstrated
-that the phone-aware approach is highly effective.
-"
-4800,1705.03202,"Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin","Does William Shakespeare REALLY Write Hamlet? Knowledge Representation
- Learning with Confidence",cs.CL," Knowledge graphs (KGs), which could provide essential relational information
-between entities, have been widely utilized in various knowledge-driven
-applications. Since overall human knowledge is vast, still growing explosively,
-and changing frequently, knowledge construction and updating inevitably involve
-automatic mechanisms with less human supervision, which usually introduce
-noise and conflicts into KGs. However, most conventional knowledge
-representation learning methods assume that all triple facts in existing KGs
-share the same significance without any noise. To address this problem, we
-propose a novel confidence-aware knowledge representation learning framework
-(CKRL), which detects possible noises in KGs while learning knowledge
-representations with confidence simultaneously. Specifically, we introduce the
-triple confidence to conventional translation-based methods for knowledge
-representation learning. To make triple confidence more flexible and universal,
-we only utilize the internal structural information in KGs, and propose three
-kinds of triple confidences considering both local and global structural
-information. In experiments, we evaluate our models on knowledge graph noise
-detection, knowledge graph completion and triple classification.
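-The "triple confidence added to translation-based methods" idea in the CKRL
-entry above can be pictured as a confidence-weighted TransE-style margin loss;
-the margin and the confidence values in this sketch are illustrative, and the
-paper's exact formulation may differ.
-
-# Sketch: per-triple confidence weighting a TransE-style margin loss.
-import torch
-import torch.nn.functional as F
-
-def ckrl_loss(h, r, t, h_neg, t_neg, confidence, margin=1.0):
-    pos = (h + r - t).norm(p=2, dim=-1)          # TransE score ||h + r - t||
-    neg = (h_neg + r - t_neg).norm(p=2, dim=-1)  # corrupted triple score
-    # Low-confidence (possibly noisy) triples contribute less to training.
-    return (confidence * F.relu(margin + pos - neg)).mean()
-
-e = lambda: torch.randn(16, 50)                  # toy embedding batches
-loss = ckrl_loss(e(), e(), e(), e(), e(), confidence=torch.rand(16))
-print(float(loss))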
Experimental results demonstrate that our
-confidence-aware models achieve significant and consistent improvements on all
-tasks, which confirms the capability of CKRL modeling confidence with
-structural information in both KG noise detection and knowledge representation
-learning.
-"
-4801,1705.03247,Somnath Roy,A Systematic Review of Hindi Prosody,cs.CL," Prosody describes both form and function of a sentence using the
-suprasegmental features of speech. Prosodic phenomena are explored in the
-domain of higher phonological constituents such as the word, the phonological
-phrase and the intonational phrase. The study of prosody at the word level is
-called word prosody, and above the word level it is called sentence prosody.
-Word prosody describes a word's stress pattern by comparing the prosodic
-features of its constituent syllables. Sentence prosody involves the study of
-the phrasing and intonational patterns of a language. The aim of this study is
-to summarize the existing works on Hindi prosody carried out in different
-domains of language and speech processing. The review is presented in a
-systematic fashion so that it could be a useful resource for anyone who wants
-to build on the existing work.
-"
-4802,1705.03261,"Zibo Yi, Shasha Li, Jie Yu, Qingbo Wu","Drug-drug Interaction Extraction via Recurrent Neural Network with
- Multiple Attention Layers",cs.CL," Drug-drug interaction (DDI) is vital information when physicians and
-pharmacists intend to co-administer two or more drugs. Thus, several DDI
-databases have been constructed to avoid mistaken combined use. In recent
-years, automatically extracting DDIs from biomedical text has drawn
-researchers' attention. However, the existing work utilizes either complex
-feature engineering or NLP tools, both of which are insufficient for sentence
-comprehension. Inspired by the deep learning approaches in natural language
-processing, we propose a recurrent neural network model with multiple
-attention layers for DDI classification. We evaluate our model on the SemEval
-2013 DDIExtraction dataset. The experiments show that our model classifies most
-of the drug pairs into correct DDI categories, outperforming the existing NLP
-and deep learning methods.
-"
-4803,1705.03389,"Liang Li, Pengyu Li, Yifan Liu, Tao Wan, Zengchang Qin","Logical Parsing from Natural Language Based on a Neural Translation
- Model",cs.CL," Semantic parsing has emerged as a significant and powerful paradigm for
-natural language interface and question answering systems. Traditional methods
-of building a semantic parser rely on high-quality lexicons, hand-crafted
-grammars and linguistic features, which are limited by the applied domain or
-representation. In this paper, we propose a general approach to learn from
-denotations based on a Seq2Seq model augmented with an attention mechanism. We
-encode the input sequence into vectors and use dynamic programming to infer
-candidate logical forms. We utilize the fact that similar utterances should
-have similar logical forms to help reduce the search space. Under our learning
-policy, the Seq2Seq model can learn mappings gradually despite noise.
-Curriculum learning is adopted to make the learning smoother. We test our
-method on the arithmetic domain, which shows that our model can successfully
-infer the correct logical forms and learn the word meanings, compositionality
-and operation orders simultaneously.
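-A generic picture of one attention layer over recurrent states, of the kind
-the DDI entry above stacks several of, is sketched below; the tanh-based
-scoring form is a standard choice of ours, not necessarily the paper's exact
-one.
-
-# Sketch: attentive pooling of RNN states for sentence classification.
-import numpy as np
-
-def attentive_pooling(H, w):
-    # H: (time, dim) RNN states; w: (dim,) learned scoring vector.
-    scores = np.tanh(H) @ w
-    alpha = np.exp(scores - scores.max())
-    alpha /= alpha.sum()                  # softmax attention weights
-    return alpha @ H                      # weighted sentence representation
-
-H = np.random.rand(10, 32); w = np.random.rand(32)
-print(attentive_pooling(H, w).shape)  # (32,)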
-" -4804,1705.03454,Matthew Lamm and Mihail Eric,The Pragmatics of Indirect Commands in Collaborative Discourse,cs.CL cs.AI," Today's artificial assistants are typically prompted to perform tasks through -direct, imperative commands such as \emph{Set a timer} or \emph{Pick up the -box}. However, to progress toward more natural exchanges between humans and -these assistants, it is important to understand the way non-imperative -utterances can indirectly elicit action of an addressee. In this paper, we -investigate command types in the setting of a grounded, collaborative game. We -focus on a less understood family of utterances for eliciting agent action, -locatives like \emph{The chair is in the other room}, and demonstrate how these -utterances indirectly command in specific game state contexts. Our work shows -that models with domain-specific grounding can effectively realize the -pragmatic reasoning that is necessary for more robust natural language -interaction. -" -4805,1705.03455,"Ankur Bapna, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck",Sequential Dialogue Context Modeling for Spoken Language Understanding,cs.CL cs.AI cs.LG," Spoken Language Understanding (SLU) is a key component of goal oriented -dialogue systems that would parse user utterances into semantic frame -representations. Traditionally SLU does not utilize the dialogue history beyond -the previous system turn and contextual ambiguities are resolved by the -downstream components. In this paper, we explore novel approaches for modeling -dialogue context in a recurrent neural network (RNN) based language -understanding system. We propose the Sequential Dialogue Encoder Network, that -allows encoding context from the dialogue history in chronological order. We -compare the performance of our proposed architecture with two context models, -one that uses just the previous turn context and another that encodes dialogue -context in a memory network, but loses the order of utterances in the dialogue -history. Experiments with a multi-domain dialogue dataset demonstrate that the -proposed architecture results in reduced semantic frame error rates. -" -4806,1705.03487,"Masahiro Kazama, Minami Sugimoto, Chizuru Hosokawa, Keisuke - Matsushima, Lav R. Varshney, and Yoshiki Ishikawa",A neural network system for transformation of regional cuisine style,cs.CY cs.CL," We propose a novel system which can transform a recipe into any selected -regional style (e.g., Japanese, Mediterranean, or Italian). This system has two -characteristics. First the system can identify the degree of regional cuisine -style mixture of any selected recipe and visualize such regional cuisine style -mixtures using barycentric Newton diagrams. Second, the system can suggest -ingredient substitutions through an extended word2vec model, such that a recipe -becomes more authentic for any selected regional cuisine style. Drawing on a -large number of recipes from Yummly, an example shows how the proposed system -can transform a traditional Japanese recipe, Sukiyaki, into French style. -" -4807,1705.03508,"Hamid Reza Hassanzadeh, Ying Sha, May D. Wang","DeepDeath: Learning to Predict the Underlying Cause of Death with Big - Data",cs.CL cs.LG stat.ML," Multiple cause-of-death data provides a valuable source of information that -can be used to enhance health standards by predicting health related -trajectories in societies with large populations. These data are often -available in large quantities across U.S. 
states and require Big Data
-techniques to uncover complex hidden patterns. We design two different classes
-of models suitable for large-scale analysis of mortality data: a Hadoop-based
-ensemble of random forests trained over N-grams, and DeepDeath, a deep
-classifier based on recurrent neural networks (RNNs). We apply both classes to
-the mortality data provided by the National Center for Health Statistics and
-show that while both perform significantly better than the random classifier,
-the deep model that utilizes long short-term memory networks (LSTMs) surpasses
-the N-gram based models and is capable of learning the temporal aspect of the
-data without a need for building ad-hoc, expert-driven features.
-"
-4808,1705.03551,"Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer","TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for
- Reading Comprehension",cs.CL," We present TriviaQA, a challenging reading comprehension dataset containing
-over 650K question-answer-evidence triples. TriviaQA includes 95K
-question-answer pairs authored by trivia enthusiasts and independently gathered
-evidence documents, six per question on average, that provide high quality
-distant supervision for answering the questions. We show that, in comparison to
-other recently introduced large-scale datasets, TriviaQA (1) has relatively
-complex, compositional questions, (2) has considerable syntactic and lexical
-variability between questions and corresponding answer-evidence sentences, and
-(3) requires more cross sentence reasoning to find answers. We also present two
-baseline algorithms: a feature-based classifier and a state-of-the-art neural
-network, which performs well on SQuAD reading comprehension. Neither approach
-comes close to human performance (23% and 40% vs. 80%), suggesting that
-TriviaQA is a challenging testbed that is worth significant future study. Data
-and code available at -- http://nlp.cs.washington.edu/triviaqa/
-"
-4809,1705.03556,"Hamed Zamani, W. Bruce Croft",Relevance-based Word Embedding,cs.IR cs.CL cs.LG cs.NE," Learning a high-dimensional dense representation for vocabulary terms, also
-known as a word embedding, has recently attracted much attention in natural
-language processing and information retrieval tasks. The embedding vectors are
-typically learned based on term proximity in a large corpus. This means that
-the objective in well-known word embedding algorithms, e.g., word2vec, is to
-accurately predict adjacent word(s) for a given word or context. However, this
-objective is not necessarily equivalent to the goal of many information
-retrieval (IR) tasks. The primary objective in various IR tasks is to capture
-relevance instead of term proximity, syntactic, or even semantic similarity.
-This is the motivation for developing unsupervised relevance-based word
-embedding models that learn word representations based on query-document
-relevance information. In this paper, we propose two learning models with
-different objective functions; one learns a relevance distribution over the
-vocabulary set for each query, and the other classifies each term as belonging
-to the relevant or non-relevant class for each query. To train our models, we
-used over six million unique queries and the top-ranked documents retrieved in
-response to each query, which are assumed to be relevant to the query. We
-extrinsically evaluate our learned word representation models using two IR
-tasks: query expansion and query classification.
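-The "random forests over N-grams" flavor of the DeepDeath entry above can be
-sketched on a single machine with scikit-learn; this stand-in for the
-Hadoop-based ensemble uses toy cause-code sequences and labels of our own
-invention.
-
-# Sketch: n-gram features feeding a random-forest classifier.
-from sklearn.ensemble import RandomForestClassifier
-from sklearn.feature_extraction.text import CountVectorizer
-from sklearn.pipeline import make_pipeline
-
-records = ["I25 J44 I50", "C34 J44", "I25 I50"]   # toy cause-code sequences
-labels = [0, 1, 0]                                 # toy underlying causes
-
-clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
-                    RandomForestClassifier(n_estimators=50, random_state=0))
-clf.fit(records, labels)
-print(clf.predict(["I25 J44"]))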
Both query expansion
-experiments on four TREC collections and query classification experiments on
-the KDD Cup 2005 dataset suggest that the relevance-based word embedding models
-significantly outperform state-of-the-art proximity-based embedding models,
-such as word2vec and GloVe.
-"
-4810,1705.03557,"Ahmed Khalifa, Gabriella A. B. Barros, Julian Togelius",DeepTingle,cs.CL cs.LG," DeepTingle is a text prediction and classification system trained on the
-collected works of the renowned fantastic gay erotica author Chuck Tingle.
-Whereas the writing assistance tools you use every day (in the form of
-predictive text, translation, grammar checking and so on) are trained on
-generic, purportedly ""neutral"" datasets, DeepTingle is trained on a very
-specific, internally consistent but externally arguably eccentric dataset. This
-allows us to foreground and confront the norms embedded in data-driven
-creativity and productivity assistance tools. As such tools effectively
-function as extensions of our cognition into technology, it is important to
-identify the norms they embed within themselves and, by extension, us.
-DeepTingle is realized as a web application based on LSTM networks and the
-GloVe word embedding, implemented in JavaScript with Keras-JS.
-"
-4811,1705.03633,"Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy
- Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick",Inferring and Executing Programs for Visual Reasoning,cs.CV cs.CL cs.LG," Existing methods for visual reasoning attempt to directly map inputs to
-outputs using black-box architectures without explicitly modeling the
-underlying reasoning processes. As a result, these black-box models often learn
-to exploit biases in the data rather than learning to perform visual reasoning.
-Inspired by module networks, this paper proposes a model for visual reasoning
-that consists of a program generator that constructs an explicit representation
-of the reasoning process to be performed, and an execution engine that executes
-the resulting program to produce an answer. Both the program generator and the
-execution engine are implemented by neural networks, and are trained using a
-combination of backpropagation and REINFORCE. Using the CLEVR benchmark for
-visual reasoning, we show that our model significantly outperforms strong
-baselines and generalizes better in a variety of settings.
-"
-4812,1705.03645,Shantanu Kumar,A Survey of Deep Learning Methods for Relation Extraction,cs.CL," Relation Extraction is an important sub-task of Information Extraction which
-has the potential of employing deep learning (DL) models with the creation of
-large datasets using distant supervision. In this review, we compare the
-contributions and pitfalls of the various DL models that have been used for the
-task, to help guide the path ahead.
-"
-4813,1705.03670,"Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang",Deep Speaker Feature Learning for Text-independent Speaker Verification,cs.SD cs.CL cs.LG," Recently, deep neural networks (DNNs) have been used to learn speaker
-features. However, the quality of the learned features is not sufficiently
-good, so a complex back-end model, either neural or probabilistic, has to be
-used to address the residual uncertainty when applied to speaker verification,
-just as with raw features. This paper presents a convolutional time-delay deep
-neural network structure (CT-DNN) for speaker feature learning.
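The CT-DNN entry above centers on splicing a window of neighbouring frames into each layer's input. A minimal editorial sketch of such a time-delay layer follows; the sizes are illustrative, not the paper's configuration:

```python
# Editorial sketch of a time-delay (frame-splicing) layer, the building
# block behind CT-DNN-style speaker feature extractors. Not the authors'
# code; shapes and sizes are illustrative.
import numpy as np

def tdnn_layer(frames, weights, bias, context=2):
    """frames: (T, F) acoustic features; weights: ((2*context+1)*F, H)."""
    T, F = frames.shape
    outputs = []
    for t in range(context, T - context):
        window = frames[t - context:t + context + 1].reshape(-1)  # splice frames
        outputs.append(np.maximum(window @ weights + bias, 0.0))  # ReLU
    return np.stack(outputs)  # (T - 2*context, H)

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 40))            # 100 frames of 40-dim filterbanks
W = rng.normal(scale=0.05, size=(5 * 40, 64))
h = tdnn_layer(feats, W, np.zeros(64))
speaker_vec = h.mean(axis=0)                  # average pooling -> utterance vector
print(h.shape, speaker_vec.shape)
```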
Our
-experimental results on the Fisher database demonstrated that this CT-DNN can
-produce high-quality speaker features: even with a single feature (0.3 seconds
-including the context), the EER can be as low as 7.68%. This effectively
-confirmed that the speaker trait is largely a deterministic short-time property
-rather than a long-time distributional pattern, and therefore can be extracted
-from just dozens of frames.
-"
-4814,1705.03751,Gagan Madan,A Survey of Distant Supervision Methods using PGMs,cs.AI cs.CL," Relation Extraction refers to the task of populating a database with tuples
-of the form $r(e_1, e_2)$, where $r$ is a relation and $e_1$, $e_2$ are
-entities. Distant supervision is a technique which tries to
-automatically generate training examples based on an existing KB such as
-Freebase. This paper is a survey of some of the techniques in distant
-supervision which primarily rely on Probabilistic Graphical Models (PGMs).
-"
-4815,1705.03773,"Jiyuan Zhang, Yang Feng, Dong Wang, Yang Wang, Andrew Abel, Shiyue
- Zhang, Andi Zhang",Flexible and Creative Chinese Poetry Generation Using Neural Memory,cs.AI cs.CL," It has been shown that Chinese poems can be successfully generated by
-sequence-to-sequence neural models, particularly with the attention mechanism.
-A potential problem of this approach, however, is that neural models can only
-learn abstract rules, while poem generation is a highly creative process that
-involves not only rules but also innovations for which pure statistical models
-are not appropriate in principle. This work proposes a memory-augmented neural
-model for Chinese poem generation, where the neural model and the augmented
-memory work together to balance the requirements of linguistic accordance and
-aesthetic innovation, leading to innovative generations that are still
-rule-compliant. In addition, it is found that the memory mechanism provides
-interesting flexibility that can be used to generate poems with different
-styles.
-"
-4816,1705.03802,Laura Perez-Beltrachini and Claire Gardent,Analysing Data-To-Text Generation Benchmarks,cs.CL," Recently, several data-sets associating data to text have been created to
-train data-to-text surface realisers. It is unclear, however, to what extent the
-surface realisation task exercised by these data-sets is linguistically
-challenging. Do these data-sets provide enough variety to encourage the
-development of generic, high-quality data-to-text surface realisers? In this
-paper, we argue that these data-sets have important drawbacks. We back up our
-claim using statistics, metrics and manual evaluation. We conclude by eliciting
-a set of criteria for the creation of a data-to-text benchmark which could help
-better support the development, evaluation and comparison of linguistically
-sophisticated data-to-text surface realisers.
-"
-4817,1705.03865,Akshay Kumar Gupta,Survey of Visual Question Answering: Datasets and Techniques,cs.CL cs.AI cs.CV," Visual question answering (or VQA) is a new and exciting problem that
-combines natural language processing and computer vision techniques. We present
-a survey of the various datasets and models that have been used to tackle this
-task. The first part of the survey details the various datasets for VQA and
-compares them along some common factors.
The second part of this survey details
-the different approaches for VQA, classified into four types: non-deep learning
-models, deep learning models without attention, deep learning models with
-attention, and other models which do not fit into the first three. Finally, we
-compare the performances of these approaches and provide some directions for
-future work.
-"
-4818,1705.03919,"Mitchell Stern, Jacob Andreas, Dan Klein",A Minimal Span-Based Neural Constituency Parser,cs.CL," In this work, we present a minimal neural model for constituency parsing
-based on independent scoring of labels and spans. We show that this model is
-not only compatible with classical dynamic programming techniques, but also
-admits a novel greedy top-down inference algorithm based on recursive
-partitioning of the input. We demonstrate empirically that both prediction
-schemes are competitive with recent work, and when combined with basic
-extensions to the scoring model are capable of achieving state-of-the-art
-single-model performance on the Penn Treebank (91.79 F1) and strong performance
-on the French Treebank (82.23 F1).
-"
-4819,1705.03995,"Bingfeng Luo, Yansong Feng, Zheng Wang, Zhanxing Zhu, Songfang Huang,
- Rui Yan and Dongyan Zhao","Learning with Noise: Enhance Distantly Supervised Relation Extraction
- with Dynamic Transition Matrix",cs.CL," Distant supervision significantly reduces human efforts in building training
-data for many classification tasks. While promising, this technique often
-introduces noise to the generated training data, which can severely affect the
-model performance. In this paper, we take a deep look at the application of
-distant supervision in relation extraction. We show that the dynamic transition
-matrix can effectively characterize the noise in the training data built by
-distant supervision. The transition matrix can be effectively trained using a
-novel curriculum learning based method without any direct supervision about the
-noise. We thoroughly evaluate our approach under a wide range of extraction
-scenarios. Experimental results show that our approach consistently improves
-the extraction results and outperforms the state-of-the-art in various
-evaluation scenarios.
-"
-4820,1705.04003,"Thai-Hoang Pham, Phuong Le-Hong",Content-based Approach for Vietnamese Spam SMS Filtering,cs.CL," Short Message Service (SMS) spam is a serious problem in Vietnam because of
-the availability of very cheap pre-paid SMS packages. There are some systems to
-detect and filter spam messages for English, most of which use machine learning
-techniques to analyze the content of messages and classify them. For
-Vietnamese, there is some research on spam email filtering but none has focused
-on SMS. In this work, we propose the first system for filtering Vietnamese spam
-SMS. We first propose an appropriate preprocessing method since existing tools
-for Vietnamese preprocessing cannot give good accuracy on our dataset. We then
-experiment with vector representations and classifiers to find the best model
-for this problem. Our system achieves an accuracy of 94% when labelling spam
-messages while the misclassification rate of legitimate messages is relatively
-small, only about 0.4%. This is an encouraging result compared to that of
-English and can serve as a strong baseline for future development of
-Vietnamese SMS spam prevention systems.
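As an editorial illustration of the content-based recipe in the entry above (vectorise messages, then classify), here is a minimal scikit-learn pipeline; the tiny inline examples are placeholders, not the paper's data:

```python
# Minimal sketch of a content-based spam filter: TF-IDF features plus a
# linear classifier. Illustrative only; not the authors' implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["khuyen mai soc nhan ngay 100k", "toi dang tren duong ve nha",
         "trung thuong lon bam vao link", "hen gap ban toi nay nhe"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = legitimate

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["nhan ngay khuyen mai lon"]))  # expect spam (1)
```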
-" -4821,1705.04038,"Thai-Hoang Pham, Xuan-Khoai Pham, Phuong Le-Hong",Building a Semantic Role Labelling System for Vietnamese,cs.CL," Semantic role labelling (SRL) is a task in natural language processing which -detects and classifies the semantic arguments associated with the predicates of -a sentence. It is an important step towards understanding the meaning of a -natural language. There exists SRL systems for well-studied languages like -English, Chinese or Japanese but there is not any such system for the -Vietnamese language. In this paper, we present the first SRL system for -Vietnamese with encouraging accuracy. We first demonstrate that a simple -application of SRL techniques developed for English could not give a good -accuracy for Vietnamese. We then introduce a new algorithm for extracting -candidate syntactic constituents, which is much more accurate than the common -node-mapping algorithm usually used in the identification step. Finally, in the -classification step, in addition to the common linguistic features, we propose -novel and useful features for use in SRL. Our SRL system achieves an $F_1$ -score of 73.53\% on the Vietnamese PropBank corpus. This system, including -software and corpus, is available as an open source project and we believe that -it is a good baseline for the development of future Vietnamese SRL systems. -" -4822,1705.04044,"Thai-Hoang Pham, Phuong Le-Hong","End-to-end Recurrent Neural Network Models for Vietnamese Named Entity - Recognition: Word-level vs. Character-level",cs.CL," This paper demonstrates end-to-end neural network architectures for -Vietnamese named entity recognition. Our best model is a combination of -bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network -(CNN), Conditional Random Field (CRF), using pre-trained word embeddings as -input, which achieves an F1 score of 88.59% on a standard test set. Our system -is able to achieve a comparable performance to the first-rank system of the -VLSP campaign without using any syntactic or hand-crafted features. We also -give an extensive empirical study on using common deep learning models for -Vietnamese NER, at both word and character level. -" -4823,1705.04146,"Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom","Program Induction by Rationale Generation : Learning to Solve and - Explain Algebraic Word Problems",cs.AI cs.CL cs.LG," Solving algebraic word problems requires executing a series of arithmetic -operations---a program---to obtain a final answer. However, since programs can -be arbitrarily complicated, inducing them directly from question-answer pairs -is a formidable challenge. To make this task more feasible, we solve these -problems by generating answer rationales, sequences of natural language and -human-readable mathematical expressions that derive the final answer through a -series of small steps. Although rationales do not explicitly specify programs, -they provide a scaffolding for their structure via intermediate milestones. To -evaluate our approach, we have created a new 100,000-sample dataset of -questions, answers and rationales. Experimental results show that indirect -supervision of program learning via answer rationales is a promising strategy -for inducing arithmetic programs. -" -4824,1705.04153,"Pengfei Liu, Xipeng Qiu, Xuanjing Huang",Dynamic Compositional Neural Networks over Tree Structure,cs.CL," Tree-structured neural networks have proven to be effective in learning -semantic representations by exploiting syntactic information. 
In spite of their
-success, most existing models suffer from the underfitting problem: they
-recursively use the same shared compositional function throughout the whole
-compositional process and lack expressive power due to their inability to
-capture the richness of compositionality. In this paper, we address this issue
-by introducing the dynamic compositional neural networks over tree structure
-(DC-TreeNN), in which the compositional function is dynamically generated by a
-meta network. The role of the meta network is to capture the meta-knowledge
-across the different compositional rules and formulate them. Experimental
-results on two typical tasks show the effectiveness of the proposed models.
-"
-4825,1705.04187,Camilo Akimushkin and Diego R. Amancio and Osvaldo N. Oliveira Jr,"On the role of words in the network structure of texts: application to
- authorship attribution",cs.CL cs.SI," Well-established automatic analyses of texts mainly consider frequencies of
-linguistic units, e.g. letters, words and bigrams, while methods based on
-co-occurrence networks consider the structure of texts regardless of the nodes'
-labels (i.e. the words' semantics). In this paper, we reconcile these distinct
-viewpoints by introducing a generalized similarity measure to compare texts
-which accounts for both the network structure of texts and the role of
-individual words in the networks. We use the similarity measure for authorship
-attribution of three collections of books, each composed of 8 authors and 10
-books per author. High accuracy rates were obtained with typical values from
-90% to 98.75%, much higher than with the traditional TF-IDF approach for
-the same collections. These accuracies are also higher than taking only the
-topology of networks into account. We conclude that the different properties of
-specific words on the macroscopic scale structure of a whole text are as
-relevant as their frequency of appearance; conversely, considering the identity
-of nodes brings further knowledge about a piece of text represented as a
-network.
-"
-4826,1705.04253,"Behrang QasemiZadeh, Laura Kallmeyer",Sketching Word Vectors Through Hashing,cs.CL," We propose a new fast word embedding technique using hash functions. The
-method is a derandomization of a new type of random projections: By
-disregarding the classic constraint used in designing random projections (i.e.,
-preserving pairwise distances in a particular normed space), our solution
-exploits extremely sparse non-negative random projections. Our experiments show
-that the proposed method can achieve competitive results, comparable to those
-of neural embedding learning techniques, with only a fraction of the
-computational complexity of these methods. While the proposed derandomization
-enhances the computational and space complexity of our method, the possibility
-of applying weighting methods such as positive pointwise mutual information
-(PPMI) to our models after their construction (and at a reduced dimensionality)
-imparts a high discriminatory power to the resulting embeddings. Obviously,
-this method comes with other known benefits of random projection-based
-techniques such as ease of update.
-"
-4827,1705.04304,"Romain Paulus, Caiming Xiong and Richard Socher",A Deep Reinforced Model for Abstractive Summarization,cs.CL," Attentional, RNN-based encoder-decoder models for abstractive summarization
-have achieved good performance on short input and output sequences.
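An editorial sketch of the hashing idea in the word-vector sketching entry above (4826): each context word is mapped by a hash function to a few non-negative coordinates, and a word's vector accumulates its contexts' projections. Parameters are illustrative:

```python
# Editorial sketch of sparse non-negative hashed projections for word
# vectors. Not the authors' implementation; DIM and NONZEROS are arbitrary.
import hashlib
from collections import defaultdict

DIM, NONZEROS = 64, 2

def projection(word):
    """Deterministic sparse projection: a few hashed coordinates per word."""
    coords = []
    for i in range(NONZEROS):
        h = hashlib.md5(f"{word}:{i}".encode()).hexdigest()
        coords.append(int(h, 16) % DIM)
    return coords

def sketch_vectors(corpus, window=2):
    vecs = defaultdict(lambda: [0.0] * DIM)
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i == j:
                    continue
                for c in projection(sent[j]):
                    vecs[w][c] += 1.0   # accumulate hashed context counts
    return vecs

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
print(sketch_vectors(corpus)["cat"][:8])
```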
For longer
-documents and summaries, however, these models often include repetitive and
-incoherent phrases. We introduce a neural network model with a novel
-intra-attention that attends over the input and continuously generated output
-separately, and a new training method that combines standard supervised word
-prediction and reinforcement learning (RL). Models trained only with supervised
-learning often exhibit ""exposure bias"" - they assume ground truth is provided
-at each step during training. However, when standard word prediction is
-combined with the global sequence prediction training of RL, the resulting
-summaries become more readable. We evaluate this model on the CNN/Daily Mail
-and New York Times datasets. Our model obtains a 41.16 ROUGE-1 score on the
-CNN/Daily Mail dataset, an improvement over previous state-of-the-art models.
-Human evaluation also shows that our model produces higher quality summaries.
-"
-4828,1705.04350,"Desmond Elliott, \'Akos K\'ad\'ar",Imagination improves Multimodal Translation,cs.CL cs.CV," We decompose multimodal translation into two sub-tasks: learning to translate
-and learning visually grounded representations. In a multitask learning
-framework, translations are learned in an attention-based encoder-decoder, and
-grounded representations are learned through image representation prediction.
-Our approach improves translation performance compared to the state of the art
-on the Multi30K dataset. Furthermore, it is equally effective if we train the
-image prediction task on the external MS COCO dataset, and we find improvements
-if we train the translation model on the external News Commentary parallel
-text.
-"
-4829,1705.04400,"Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner,
- Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul
- Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop
- Sriram, Zhenyao Zhu",Reducing Bias in Production Speech Models,cs.CL," Replacing hand-engineered pipelines with end-to-end deep learning systems has
-enabled strong results in applications like speech and object recognition.
-However, the causality and latency constraints of production systems put
-end-to-end speech models back into the underfitting regime and expose biases in
-the model that we show cannot be overcome by ""scaling up"", i.e., training
-bigger models on more data. In this work we systematically identify and address
-sources of bias, reducing error rates by up to 20% while remaining practical
-for deployment. We achieve this by utilizing improved neural architectures for
-streaming inference, solving optimization issues, and employing strategies that
-increase audio and label modelling versatility.
-"
-4830,1705.04416,"Dawn Chen, Joshua C. Peterson, Thomas L. Griffiths",Evaluating vector-space models of analogy,cs.CL," Vector-space representations provide geometric tools for reasoning about the
-similarity of a set of objects and their relationships. Recent machine learning
-methods for deriving vector-space embeddings of words (e.g., word2vec) have
-achieved considerable success in natural language processing. These vector
-spaces have also been shown to exhibit a surprising capacity to capture verbal
-analogies, with similar results for natural images, giving new life to a
-classic model of analogies as parallelograms that was first proposed by
-cognitive scientists.
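An editorial illustration of the parallelogram model just mentioned: the analogy a:b :: c:? is answered by the offset b - a + c. The toy 2-D vectors below are hand-picked for clarity, not learned embeddings:

```python
# Editorial sketch of the parallelogram model of analogy: solve
# man:woman :: king:? by nearest neighbour to the offset vector.
import numpy as np

emb = {"man":   np.array([1.0, 0.0]),
       "woman": np.array([1.0, 1.0]),
       "king":  np.array([3.0, 0.0]),
       "queen": np.array([3.0, 1.0])}

def analogy(a, b, c):
    target = emb[b] - emb[a] + emb[c]
    cands = [w for w in emb if w not in (a, b, c)]
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max(cands, key=lambda w: cos(emb[w], target))

print(analogy("man", "woman", "king"))  # -> "queen"
```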
We evaluate the parallelogram model of analogy as applied
-to modern word embeddings, providing a detailed analysis of the extent to which
-this approach captures human relational similarity judgments in a large
-benchmark dataset. We find that some semantic relationships are better
-captured than others. We then provide evidence for deeper limitations of the
-parallelogram model based on the intrinsic geometric constraints of vector
-spaces, paralleling classic results for first-order similarity.
-"
-4831,1705.04434,Peng Qi and Christopher D. Manning,Arc-swift: A Novel Transition System for Dependency Parsing,cs.CL," Transition-based dependency parsers often need sequences of local shift and
-reduce operations to produce certain attachments. Correct individual decisions
-hence require global information about the sentence context and mistakes cause
-error propagation. This paper proposes a novel transition system, arc-swift,
-that enables direct attachments between tokens farther apart with a single
-transition. This allows the parser to leverage lexical information more
-directly in transition decisions. Hence, arc-swift can achieve significantly
-better performance with a very small beam size. Our parsers reduce error by
-3.7--7.6% relative to those using existing transition systems on the Penn
-Treebank dependency parsing task and English Universal Dependencies.
-"
-4832,1705.04815,Kyle Richardson and Jonas Kuhn,Learning Semantic Correspondences in Technical Documentation,cs.CL," We consider the problem of translating high-level textual descriptions to
-formal representations in technical documentation as part of an effort to model
-the meaning of such documentation. We focus specifically on the problem of
-learning translational correspondences between text descriptions and grounded
-representations in the target documentation, such as formal representation of
-functions or code templates. Our approach exploits the parallel nature of such
-documentation, or the tight coupling between high-level text and the low-level
-representations we aim to learn. Data is collected by mining technical
-documents for such parallel text-representation pairs, which we use to train a
-simple semantic parsing model. We report new baseline results on sixteen novel
-datasets, including the standard library documentation for nine popular
-programming languages across seven natural languages, and a small collection of
-Unix utility manuals.
-"
-4833,1705.04839,"Firoj Alam, Morena Danieli and Giuseppe Riccardi",Annotating and Modeling Empathy in Spoken Conversations,cs.CL," Empathy, as defined in behavioral sciences, expresses the ability of human
-beings to recognize, understand and react to emotions, attitudes and beliefs of
-others. The lack of an operational definition of empathy makes it difficult to
-measure it. In this paper, we address two related problems in automatic
-affective behavior analysis: the design of the annotation protocol and the
-automatic recognition of empathy from spoken conversations. We propose and
-evaluate an annotation scheme for empathy inspired by the modal model of
-emotions. The annotation scheme was evaluated on a corpus of real-life, dyadic
-spoken conversations. In the context of behavioral analysis, we designed an
-automatic segmentation and classification system for empathy. Given the
-different speech and language levels of representation where empathy may be
-communicated, we investigated features derived from the lexical and acoustic
-spaces.
The feature development process was designed to support both the fusion
-and automatic selection of relevant features from a high-dimensional space. The
-automatic classification system was evaluated on call center conversations
-where it showed significantly better performance than the baseline.
-"
-4834,1705.05039,"Kechen Qin, Lu Wang, and Joseph Kim",Joint Modeling of Content and Discourse Relations in Dialogues,cs.CL," We present a joint modeling approach to identify salient discussion points in
-spoken meetings as well as to label the discourse relations between speaker
-turns. A variation of our model is also discussed when discourse relations are
-treated as latent variables. Experimental results on two popular meeting
-corpora show that our joint model can outperform state-of-the-art approaches
-for both phrase-based content selection and discourse relation prediction
-tasks. We also evaluate our model on predicting the consistency among team
-members' understanding of their group decisions. Classifiers trained with
-features constructed from our model achieve significantly better predictive
-performance than the state-of-the-art.
-"
-4835,1705.05040,"Lu Wang, Nick Beauchamp, Sarah Shugars, and Kechen Qin","Winning on the Merits: The Joint Effects of Content and Style on Debate
- Outcomes",cs.CL," Debate and deliberation play essential roles in politics and government, but
-most models presume that debates are won mainly via superior style or agenda
-control. Ideally, however, debates would be won on the merits, as a function of
-which side has the stronger arguments. We propose a predictive model of debate
-that estimates the effects of linguistic features and the latent persuasive
-strengths of different topics, as well as the interactions between the two.
-Using a dataset of 118 Oxford-style debates, our model's combination of content
-(as latent topics) and style (as linguistic features) allows us to predict
-audience-adjudicated winners with 74% accuracy, significantly outperforming
-linguistic features alone (66%). Our model finds that winning sides employ
-stronger arguments, and allows us to identify the linguistic features
-associated with strong or weak arguments.
-"
-4836,1705.05183,Sahil Manchanda and Ashish Anand,Representation learning of drug and disease terms for drug repositioning,cs.CL," Drug repositioning (DR) refers to the identification of novel indications for
-approved drugs. The huge investment of time and money required, and the risk of
-failure in clinical trials, have led to a surge of interest in drug
-repositioning. DR exploits two major aspects associated with drugs and
-diseases: existence of similarity among drugs and among diseases due to their
-shared involved genes or pathways or common biological effects. Existing
-methods of identifying drug-disease associations rely mainly on the information
-available in structured databases. On the other hand, abundant
-information available in the form of free text in biomedical research articles
-is not being fully exploited. Word embedding, i.e., obtaining vector
-representations of words from a large corpus of free text using neural network
-methods, has been shown to give significant performance for several natural
-language processing tasks. In this work we propose a novel way of
-representation learning to obtain features of drugs and diseases by combining
-complementary information available in unstructured texts and structured
-datasets.
Next, we use a matrix completion
-approach on these feature vectors to learn a projection matrix between the drug
-and disease vector spaces. The proposed method has shown competitive performance
-with state-of-the-art methods. Further, case studies on Alzheimer's disease and
-hypertension have shown that the predicted associations match existing
-knowledge.
-"
-4837,1705.05311,"Lukas Galke, Florian Mai, Alan Schelten, Dennis Brunsch, Ansgar Scherp","Using Titles vs. Full-text as Source for Automated Semantic Document
- Annotation",cs.DL cs.CL," A significant part of the largest Knowledge Graph today, the Linked Open Data
-cloud, consists of metadata about documents such as publications, news reports,
-and other media articles. While the widespread access to the document metadata
-is a tremendous advancement, it is yet not so easy to assign semantic
-annotations and organize the documents along semantic concepts. Providing
-semantic annotations like concepts in SKOS thesauri is a classical research
-topic, but typically it is conducted on the full-text of the documents. For the
-first time, we offer a systematic comparison of classification approaches to
-investigate how far semantic annotations can be conducted using just the
-metadata of the documents such as titles published as labels on the Linked Open
-Data cloud. We compare the classifications obtained from analyzing the
-documents' titles with semantic annotations obtained from analyzing the
-full-text. Apart from the prominent text classification baselines kNN and SVM,
-we also compare recent techniques of Learning to Rank and neural networks and
-revisit the traditional methods logistic regression, Rocchio, and Naive Bayes.
-The results show that across three of our four datasets, the performance of the
-classifications using only titles reaches over 90% of the quality compared to
-the classification performance when using the full-text. Thus, conducting
-document classification by just using the titles is a reasonable approach for
-automated semantic annotation and opens up new possibilities for enriching
-Knowledge Graphs.
-"
-4838,1705.05414,Mihail Eric and Christopher D. Manning,Key-Value Retrieval Networks for Task-Oriented Dialogue,cs.CL," Neural task-oriented dialogue systems often struggle to smoothly interface
-with a knowledge base. In this work, we seek to address this problem by
-proposing a new neural dialogue agent that is able to effectively sustain
-grounded, multi-domain discourse through a novel key-value retrieval mechanism.
-The model is end-to-end differentiable and does not need to explicitly model
-dialogue state or belief trackers. We also release a new dataset of 3,031
-dialogues that are grounded through underlying knowledge bases and span three
-distinct tasks in the in-car personal assistant space: calendar scheduling,
-weather information retrieval, and point-of-interest navigation. Our
-architecture is simultaneously trained on data from all domains and
-significantly outperforms a competitive rule-based system and other existing
-neural dialogue architectures on the provided domains according to both
-automatic and human evaluation metrics.
-"
-4839,1705.05437,Surag Nair,A Biomedical Information Extraction Primer for NLP Researchers,cs.CL," Biomedical Information Extraction is an exciting field at the crossroads of
-Natural Language Processing, Biology and Medicine.
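As an editorial sketch loosely following the drug-repositioning entry above (4836): given feature vectors for drugs and diseases, a linear map between the two spaces can be fit from known associations and used to score new pairs. Plain least squares stands in here for the paper's matrix completion; all data are random placeholders:

```python
# Editorial sketch of cross-space projection for drug repositioning.
# Not the authors' method: least squares replaces matrix completion.
import numpy as np

rng = np.random.default_rng(0)
drug_vecs = rng.normal(size=(20, 16))      # 20 drugs, 16-dim features
disease_vecs = rng.normal(size=(10, 16))   # 10 diseases

# Known associations: (drug index, disease index), placeholders only.
known = [(0, 1), (1, 1), (2, 3), (3, 7), (4, 0)]
X = np.stack([drug_vecs[d] for d, _ in known])
Y = np.stack([disease_vecs[z] for _, z in known])

W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # map drug space -> disease space

scores = drug_vecs[5] @ W @ disease_vecs.T  # rank diseases for a new drug
print(np.argsort(-scores)[:3])
```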
It encompasses a variety of
-different tasks that require application of state-of-the-art NLP techniques,
-such as NER and Relation Extraction. This paper provides an overview of the
-problems in the field and discusses some of the techniques used for solving
-them.
-"
-4840,1705.05487,"Franck Dernoncourt, Ji Young Lee, Peter Szolovits","NeuroNER: an easy-to-use program for named-entity recognition based on
- neural networks",cs.CL cs.NE stat.ML," Named-entity recognition (NER) aims at identifying entities of interest in a
-text. Artificial neural networks (ANNs) have recently been shown to outperform
-existing NER systems. However, ANNs remain challenging to use for non-expert
-users. In this paper, we present NeuroNER, an easy-to-use named-entity
-recognition tool based on ANNs. Users can annotate entities using a graphical
-web-based user interface (BRAT): the annotations are then used to train an ANN,
-which in turn predicts entities' locations and categories in new texts. NeuroNER
-makes this annotation-training-prediction flow smooth and accessible to anyone.
-"
-4841,1705.05633,"Tao Ding, Warren K. Bickel, Shimei Pan",Social Media-based Substance Use Prediction,cs.CL cs.LG cs.SI," In this paper, we demonstrate how the state-of-the-art machine learning and
-text mining techniques can be used to build effective social media-based
-substance use detection systems. Since a substance use ground truth is
-difficult to obtain on a large scale, to maximize system performance, we
-explore different feature learning methods to take advantage of a large amount
-of unsupervised social media data. We also demonstrate the benefit of using
-multi-view unsupervised feature learning to combine heterogeneous user
-information such as Facebook ""likes"" and ""status updates"" to enhance system
-performance. Based on our evaluation, our best models achieved 86% AUC for
-predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which
-significantly outperformed existing methods. Our investigation has also
-uncovered interesting relations between a user's social media behavior (e.g.,
-word usage) and substance use.
-"
-4842,1705.05742,"Rakshit Trivedi, Hanjun Dai, Yichen Wang, Le Song",Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs,cs.AI cs.CL cs.LG," The availability of large scale event data with time stamps has given rise to
-dynamically evolving knowledge graphs that contain temporal information for
-each edge. Reasoning over time in such dynamic knowledge graphs is not yet well
-understood. To this end, we present Know-Evolve, a novel deep evolutionary
-knowledge network that learns non-linearly evolving entity representations over
-time. The occurrence of a fact (edge) is modeled as a multivariate point
-process whose intensity function is modulated by the score for that fact
-computed based on the learned entity embeddings. We demonstrate significantly
-improved performance over various relational learning approaches on two large
-scale real-world datasets. Further, our method effectively predicts occurrence
-or recurrence time of a fact, which is novel compared to prior reasoning
-approaches in a multi-relational setting.
-"
-4843,1705.05762,Felipe Urbina and Javier Vera,A decentralized route to the origins of scaling in human language,physics.soc-ph cs.CL nlin.AO," Zipf's law establishes that if the words of a (large) text are ordered by
-decreasing frequency, the frequency versus the rank decreases as a power law
-with exponent close to $-1$.
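An editorial illustration of the rank-frequency relation just stated: sorting frequencies by rank and fitting a line in log-log space recovers an exponent near -1 for natural text. The toy counts are illustrative:

```python
# Editorial sketch: estimate the Zipf exponent from a rank-frequency list.
import numpy as np

freqs = np.array(sorted([1000, 480, 330, 240, 200, 160, 140, 120], reverse=True))
ranks = np.arange(1, len(freqs) + 1)

slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
print(f"estimated Zipf exponent: {slope:.2f}")  # close to -1 for real corpora
```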
Previous work has stressed that this pattern
-arises from a conflict of interests of the participants of communication. The
-challenge here is to define a computational multi-agent language game, mainly
-based on a parameter that measures the relative participant's interests.
-Numerical simulations suggest that at critical values of the parameter a
-human-like vocabulary, exhibiting scaling properties, seems to appear. The
-appearance of an intermediate distribution of frequencies at some critical
-values of the parameter suggests that on a population of artificial agents the
-emergence of scaling partly arises as a self-organized process only from local
-interactions between agents.
-"
-4844,1705.05940,"Enes Avcu, Chihiro Shibata and Jeffrey Heinz",Subregular Complexity and Deep Learning,cs.CL," This paper argues that the judicious use of formal language theory and
-grammatical inference are invaluable tools in understanding how deep neural
-networks can and cannot represent and learn long-term dependencies in temporal
-sequences. Learning experiments were conducted with two types of Recurrent
-Neural Networks (RNNs) on six formal languages drawn from the Strictly Local
-(SL) and Strictly Piecewise (SP) classes. The networks were Simple RNNs
-(s-RNNs) and Long Short-Term Memory RNNs (LSTMs) of varying sizes. The SL and
-SP classes are among the simplest in a mathematically well-understood hierarchy
-of subregular classes. They encode local and long-term dependencies,
-respectively. The grammatical inference algorithm Regular Positive and Negative
-Inference (RPNI) provided a baseline. According to earlier research, the LSTM
-architecture should be capable of learning long-term dependencies and should
-outperform s-RNNs. The results of these experiments challenge this narrative.
-First, the LSTMs' performance was generally worse in the SP experiments than in
-the SL ones. Second, the s-RNNs out-performed the LSTMs on the most complex SP
-experiment and performed comparably to them on the others.
-"
-4845,1705.05952,"Dat Quoc Nguyen, Mark Dras, Mark Johnson","A Novel Neural Network Model for Joint POS Tagging and Graph-based
- Dependency Parsing",cs.CL," We present a novel neural network model that learns POS tagging and
-graph-based dependency parsing jointly. Our model uses bidirectional LSTMs to
-learn feature representations shared for both POS tagging and dependency
-parsing tasks, thus handling the feature-engineering problem. Our extensive
-experiments, on 19 languages from the Universal Dependencies project, show that
-our model outperforms the state-of-the-art neural network-based
-Stack-propagation model for joint POS tagging and transition-based dependency
-parsing, resulting in a new state of the art. Our code is open-source and
-available together with pre-trained models at:
-https://github.com/datquocnguyen/jPTDP
-"
-4846,1705.05992,"Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei",Frame Stacking and Retaining for Recurrent Neural Network Acoustic Model,cs.CL," Frame stacking is broadly applied in end-to-end neural network training like
-connectionist temporal classification (CTC), and it leads to more accurate
-models and faster decoding. However, it is not well suited to conventional
-neural networks based on context-dependent state acoustic models, if the decoder
-is unchanged. In this paper, we propose a novel frame retaining method which is
-applied in decoding.
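An editorial sketch of frame stacking and frame retaining as described in the entry above: consecutive frames are concatenated (stacking), and only every k-th stacked frame is kept (retaining), shortening the sequence the network and decoder must process. Sizes are illustrative:

```python
# Editorial sketch of frame stacking + retaining. Not the authors' code.
import numpy as np

def stack_and_retain(frames, stack=3, retain_every=3):
    T, F = frames.shape
    stacked = [frames[t:t + stack].reshape(-1)      # concatenate neighbours
               for t in range(0, T - stack + 1)]
    return np.stack(stacked[::retain_every])        # keep every k-th frame

feats = np.random.default_rng(0).normal(size=(30, 40))
out = stack_and_retain(feats)
print(out.shape)  # (10, 120): one third the frames, three times the width
```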
The system that combines frame retaining with frame
-stacking reduces the time consumption of both training and decoding. Long
-short-term memory (LSTM) recurrent neural networks (RNNs) using it achieve an
-almost linear training speedup and reduce the real time factor (RTF) by a
-relative 41\%. At the same time, recognition performance shows no degradation,
-or improves slightly, on the Shenma voice search dataset in Mandarin.
-"
-4847,1705.06031,Wei Wei and Xiaojun Wan,Learning to Identify Ambiguous and Misleading News Headlines,cs.CL cs.CY," Accuracy is one of the basic principles of journalism. However, it is
-increasingly hard to manage due to the diversity of news media. Some editors of
-online news tend to use catchy headlines which trick readers into clicking.
-These headlines are either ambiguous or misleading, degrading the reading
-experience of the audience. Thus, identifying inaccurate news headlines is a
-task worth studying. Previous work names these headlines ""clickbaits"" and
-mainly focuses on the features extracted from the headlines, which limits the
-performance since the consistency between headlines and news bodies is
-underappreciated. In this paper, we clearly redefine the problem and identify
-ambiguous and misleading headlines separately. We utilize class sequential
-rules to exploit structure information when detecting ambiguous headlines. For
-the identification of misleading headlines, we extract features based on the
-congruence between headlines and bodies. To make use of the large unlabeled
-data set, we apply a co-training method and gain an increase in performance.
-The experiment results show the effectiveness of our methods. Then we use our
-classifiers to detect inaccurate headlines crawled from different sources and
-conduct a data analysis.
-"
-4848,1705.06106,"Katharina Kann and Hinrich Sch\""utze","Unlabeled Data for Morphological Generation With Character-Based
- Sequence-to-Sequence Models",cs.CL," We present a semi-supervised way of training a character-based
-encoder-decoder recurrent neural network for morphological reinflection, the
-task of generating one inflected word form from another. This is achieved by
-using unlabeled tokens or random strings as training data for an autoencoding
-task, adapting a network for morphological reinflection, and performing
-multi-task training. We thus use limited labeled data more effectively,
-obtaining up to 9.9% improvement over state-of-the-art baselines for 8
-different languages.
-"
-4849,1705.06262,"Vincent Major, Alisa Surkis, and Yindalon Aphinyanaphongs","Utility of General and Specific Word Embeddings for Classifying
- Translational Stages of Research",cs.CL stat.ML," Conventional text classification models make a bag-of-words assumption
-reducing text into word occurrence counts per document. Recent algorithms such
-as word2vec are capable of learning semantic meaning and similarity between
-words in an entirely unsupervised manner using a contextual window and doing so
-much faster than previous methods. Each word is projected into vector space
-such that similar meaning words such as ""strong"" and ""powerful"" are projected
-into the same general Euclidean space. Open questions about these embeddings
-include their utility across classification tasks and the optimal properties
-and source of documents to construct broadly functional embeddings.
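An editorial sketch of the comparison set up in the surrounding entry (4849): train word2vec on an in-domain corpus with gensim and average word vectors into document features for a downstream classifier. The corpus and sizes are placeholders:

```python
# Editorial sketch: domain-specific word2vec features for classification.
# Illustrative only; not the authors' pipeline or data.
import numpy as np
from gensim.models import Word2Vec

corpus = [["basic", "science", "discovery", "in", "mice"],
          ["randomized", "clinical", "trial", "in", "patients"],
          ["molecular", "mechanism", "of", "disease"],
          ["phase", "three", "trial", "results"]]

model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, epochs=50)

def doc_vector(tokens):
    """Average the in-vocabulary word vectors of a document."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0)

print(doc_vector(["clinical", "trial"]).shape)  # (16,)
```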
In this
-work, we demonstrate the usefulness of pre-trained embeddings for
-classification in our task and show that custom word embeddings, built
-in the domain and for the tasks, can improve performance over word embeddings
-learnt on more general data including news articles or Wikipedia.
-"
-4850,1705.06273,"Ji Young Lee, Franck Dernoncourt, Peter Szolovits",Transfer Learning for Named-Entity Recognition with Neural Networks,cs.CL cs.AI cs.NE stat.ML," Recent approaches based on artificial neural networks (ANNs) have shown
-promising results for named-entity recognition (NER). In order to achieve high
-performance, ANNs need to be trained on a large labeled dataset. However,
-labels might be difficult to obtain for the dataset on which the user wants to
-perform NER: label scarcity is particularly pronounced for patient note
-de-identification, which is an instance of NER. In this work, we analyze to
-what extent transfer learning may address this issue. In particular, we
-demonstrate that transferring an ANN model trained on a large labeled dataset
-to another dataset with a limited number of labels improves upon the
-state-of-the-art results on two different datasets for patient note
-de-identification.
-"
-4851,1705.06353,Christophe Bruchansky,"Political Footprints: Political Discourse Analysis using Pre-Trained
- Word Vectors",cs.CL," In this paper, we discuss how machine learning could be used to produce a
-systematic and more objective political discourse analysis. Political
-footprints are vector space models (VSMs) applied to political discourse. Each
-of their vectors represents a word, and is produced by training the English
-lexicon on large text corpora. This paper presents a simple implementation of
-political footprints, some heuristics on how to use them, and their application
-to four cases: the U.N. Kyoto Protocol and Paris Agreement, and two U.S.
-presidential elections. The reader will be offered a number of reasons to
-believe that political footprints produce meaningful results, along with some
-suggestions on how to improve their implementation.
-"
-4852,1705.06369,"Edoardo Maria Ponti, Ivan Vuli\'c, Anna Korhonen",Decoding Sentiment from Distributed Representations of Sentences,cs.CL," Distributed representations of sentences have been developed recently to
-represent their meaning as real-valued vectors. However, it is not clear how
-much information such representations retain about the polarity of sentences.
-To study this question, we decode sentiment from unsupervised sentence
-representations learned with different architectures (sensitive to the order of
-words, the order of sentences, or none) in 9 typologically diverse languages.
-Sentiment results from the (recursive) composition of lexical items and
-grammatical strategies such as negation and concession. The results are
-manifold: we show that there is no `one-size-fits-all' representation
-architecture outperforming the others across the board. Rather, the top-ranking
-architectures depend on the language and data at hand. Moreover, we find that
-in several cases the additive composition model based on skip-gram word vectors
-may surpass supervised state-of-the-art architectures such as bidirectional
-LSTMs. Finally, we provide a possible explanation of the observed variation
-based on the type of negative constructions in each language.
-" -4853,1705.06400,"Matthias Plappert, Christian Mandery, Tamim Asfour","Learning a bidirectional mapping between human whole-body motion and - natural language using deep recurrent neural networks",cs.LG cs.CL cs.RO stat.ML," Linking human whole-body motion and natural language is of great interest for -the generation of semantic representations of observed human behaviors as well -as for the generation of robot behaviors based on natural language input. While -there has been a large body of research in this area, most approaches that -exist today require a symbolic representation of motions (e.g. in the form of -motion primitives), which have to be defined a-priori or require complex -segmentation algorithms. In contrast, recent advances in the field of neural -networks and especially deep learning have demonstrated that sub-symbolic -representations that can be learned end-to-end usually outperform more -traditional approaches, for applications such as machine translation. In this -paper we propose a generative model that learns a bidirectional mapping between -human whole-body motion and natural language using deep recurrent neural -networks (RNNs) and sequence-to-sequence learning. Our approach does not -require any segmentation or manual feature engineering and learns a distributed -representation, which is shared for all motions and descriptions. We evaluate -our approach on 2,846 human whole-body motions and 6,187 natural language -descriptions thereof from the KIT Motion-Language Dataset. Our results clearly -demonstrate the effectiveness of the proposed model: We show that our model -generates a wide variety of realistic motions only from descriptions thereof in -form of a single sentence. Conversely, our model is also capable of generating -correct and detailed natural language descriptions from human motions. -" -4854,1705.06457,"Augustin Speyer, Robin Lemke","Information Density as a Factor for Variation in the Embedding of - Relative Clauses",cs.CL," In German, relative clauses can be positioned in-situ or extraposed. A -potential factor for the variation might be information density. In this study, -this hypothesis is tested with a corpus of 17th century German funeral sermons. -For each referent in the relative clauses and their matrix clauses, the -attention state was determined (first calculation). In a second calculation, -for each word the surprisal values were determined, using a bi-gram language -model. In a third calculation, the surprisal values were accommodated as to -whether it is the first occurrence of the word in question or not. All three -calculations pointed in the same direction: With in-situ relative clauses, the -rate of new referents was lower and the average surprisal values were lower, -especially the accommodated surprisal values, than with extraposed relative -clauses. This indicated that in-formation density is a factor governing the -choice between in-situ and extraposed relative clauses. The study also sheds -light on the intrinsic relation-ship between the information theoretic concept -of information density and in-formation structural concepts such as givenness -which are used under a more linguistic perspective. 
-" -4855,1705.06463,"Hongmin Wang, Yue Zhang, GuangYong Leonard Chan, Jie Yang, Hai Leong - Chieu",Universal Dependencies Parsing for Colloquial Singaporean English,cs.CL," Singlish can be interesting to the ACL community both linguistically as a -major creole based on English, and computationally for information extraction -and sentiment analysis of regional social media. We investigate dependency -parsing of Singlish by constructing a dependency treebank under the Universal -Dependencies scheme, and then training a neural network model by integrating -English syntactic knowledge into a state-of-the-art parser trained on the -Singlish treebank. Results show that English knowledge can lead to 25% relative -error reduction, resulting in a parser of 84.47% accuracies. To the best of our -knowledge, we are the first to use neural stacking to improve cross-lingual -dependency parsing on low-resource languages. We make both our annotation and -parser available for further research. -" -4856,1705.06476,"Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, - Antoine Bordes, Devi Parikh, Jason Weston",ParlAI: A Dialog Research Software Platform,cs.CL," We introduce ParlAI (pronounced ""par-lay""), an open-source software platform -for dialog research implemented in Python, available at http://parl.ai. Its -goal is to provide a unified framework for sharing, training and testing of -dialog models, integration of Amazon Mechanical Turk for data collection, human -evaluation, and online/reinforcement learning; and a repository of machine -learning models for comparing with others' models, and improving upon existing -architectures. Over 20 tasks are supported in the first release, including -popular datasets such as SQuAD, bAbI tasks, MCTest, WikiQA, QACNN, QADailyMail, -CBT, bAbI Dialog, Ubuntu, OpenSubtitles and VQA. Several models are integrated, -including neural models such as memory networks, seq2seq and attentive LSTMs. -" -4857,1705.06510,Andrea Martini and Alessio Cardillo and Paolo De Los Rios,"Entropic selection of concepts unveils hidden topics in documents - corpora",physics.soc-ph cs.CL cs.DL cs.SI," The organization and evolution of science has recently become itself an -object of scientific quantitative investigation, thanks to the wealth of -information that can be extracted from scientific documents, such as citations -between papers and co-authorship between researchers. However, only few studies -have focused on the concepts that characterize full documents and that can be -extracted and analyzed, revealing the deeper organization of scientific -knowledge. Unfortunately, several concepts can be so common across documents -that they hinder the emergence of the underlying topical structure of the -document corpus, because they give rise to a large amount of spurious and -trivial relations among documents. To identify and remove common concepts, we -introduce a method to gauge their relevance according to an objective -information-theoretic measure related to the statistics of their occurrence -across the document corpus. After progressively removing concepts that, -according to this metric, can be considered as generic, we find that the topic -organization displays a correspondingly more refined structure. 
-" -4858,1705.06824,"Zhengyang Wang, Shuiwang Ji","Learning Convolutional Text Representations for Visual Question - Answering",cs.LG cs.CL cs.NE stat.ML," Visual question answering is a recently proposed artificial intelligence task -that requires a deep understanding of both images and texts. In deep learning, -images are typically modeled through convolutional neural networks, and texts -are typically modeled through recurrent neural networks. While the requirement -for modeling images is similar to traditional computer vision tasks, such as -object recognition and image classification, visual question answering raises a -different need for textual representation as compared to other natural language -processing tasks. In this work, we perform a detailed analysis on natural -language questions in visual question answering. Based on the analysis, we -propose to rely on convolutional neural networks for learning textual -representations. By exploring the various properties of convolutional neural -networks specialized for text data, such as width and depth, we present our -""CNN Inception + Gate"" model. We show that our model improves question -representations and thus the overall accuracy of visual question answering -models. We also show that the text representation requirement in visual -question answering is more complicated and comprehensive than that in -conventional natural language processing tasks, making it a better task to -evaluate textual representation methods. Shallow models like fastText, which -can obtain comparable results with deep learning models in tasks like text -classification, are not suitable in visual question answering. -" -4859,1705.07008,"Leandro B. dos Santos, Magali S. Duran, Nathan S. Hartmann, Arnaldo - Candido Jr., Gustavo H. Paetzold, Sandra M. Aluisio","A Lightweight Regression Method to Infer Psycholinguistic Properties for - Brazilian Portuguese",cs.CL," Psycholinguistic properties of words have been used in various approaches to -Natural Language Processing tasks, such as text simplification and readability -assessment. Most of these properties are subjective, involving costly and -time-consuming surveys to be gathered. Recent approaches use the limited -datasets of psycholinguistic properties to extend them automatically to large -lexicons. However, some of the resources used by such approaches are not -available to most languages. This study presents a method to infer -psycholinguistic properties for Brazilian Portuguese (BP) using regressors -built with a light set of features usually available for less resourced -languages: word length, frequency lists, lexical databases composed of school -dictionaries and word embedding models. The correlations between the properties -inferred are close to those obtained by related works. The resulting resource -contains 26,874 words in BP annotated with concreteness, age of acquisition, -imageability and subjective frequency. -" -4860,1705.07136,"Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy","Softmax Q-Distribution Estimation for Structured Prediction: A - Theoretical Interpretation for RAML",cs.LG cs.CL stat.ML," Reward augmented maximum likelihood (RAML), a simple and effective learning -framework to directly optimize towards the reward function in structured -prediction tasks, has led to a number of impressive empirical successes. 
RAML
-incorporates task-specific reward by performing maximum-likelihood updates on
-candidate outputs sampled according to an exponentiated payoff distribution,
-which gives higher probabilities to candidates that are close to the reference
-output. While RAML is notable for its simplicity, efficiency, and its
-impressive empirical successes, the theoretical properties of RAML, especially
-the behavior of the exponentiated payoff distribution, have not been examined
-thoroughly. In this work, we introduce softmax Q-distribution estimation, a
-novel theoretical interpretation of RAML, which reveals the relation between
-RAML and Bayesian decision theory. The softmax Q-distribution can be regarded
-as a smooth approximation of the Bayes decision boundary, and the Bayes
-decision rule is achieved by decoding with this Q-distribution. We further show
-that RAML is equivalent to approximately estimating the softmax Q-distribution,
-with the temperature $\tau$ controlling approximation error. We perform two
-experiments, one on synthetic data of multi-class classification and one on
-real data of image captioning, to demonstrate the relationship between RAML and
-the proposed softmax Q-distribution estimation method, verifying our
-theoretical analysis. Additional experiments on three structured prediction
-tasks with rewards defined on sequential (named entity recognition), tree-based
-(dependency parsing) and irregular (machine translation) structures show
-notable improvements over maximum likelihood baselines.
-"
-4861,1705.07267,"Jiatao Gu, Yong Wang, Kyunghyun Cho and Victor O.K. Li",Search Engine Guided Non-Parametric Neural Machine Translation,cs.CL cs.AI cs.LG," In this paper, we extend an attention-based neural machine translation (NMT)
-model by allowing it to access an entire training set of parallel sentence
-pairs even after training. The proposed approach consists of two stages. In the
-first stage, the retrieval stage, an off-the-shelf, black-box search engine is
-used to retrieve a small subset of sentence pairs from a training set given a
-source sentence. These pairs are further filtered based on a fuzzy matching
-score based on edit distance. In the second stage, the translation stage, a novel
-translation model, called translation memory enhanced NMT (TM-NMT), seamlessly
-uses both the source sentence and a set of retrieved sentence pairs to perform
-the translation. Empirical evaluation on three language pairs (En-Fr, En-De,
-and En-Es) shows that the proposed approach significantly outperforms the
-baseline approach and the improvement is more significant when more relevant
-sentence pairs were retrieved.
-"
-4862,1705.07318,Chun Tian,Formalized Lambek Calculus in Higher Order Logic (HOL4),cs.CL cs.LO," In this project, a rather complete proof-theoretical formalization of Lambek
-Calculus (non-associative with arbitrary extensions) has been ported from the
-Coq proof assistant to the HOL4 theorem prover, with some improvements and new
-theorems.
- Three deduction systems (Syntactic Calculus, Natural Deduction and Sequent
-Calculus) of Lambek Calculus are defined with many related theorems proved. The
-equivalence between these systems is formally proved. Finally, a formalization
-of Sequent Calculus proofs (where Coq has built-in support) has been designed
-and implemented in HOL4. Some basic results including the sub-formula
-properties of the so-called ""cut-free"" proofs are formally proved.
- This work can be considered as preliminary work towards a language parser
-based on category grammars, one that is not multimodal but still has the
-ability to support context-sensitive languages through customized extensions.
-"
-4863,1705.07368,James Foulds,Mixed Membership Word Embeddings for Computational Social Science,cs.CL cs.AI cs.LG," Word embeddings improve the performance of NLP systems by revealing the
-hidden structural relationships between words. Despite their success in many
-applications, word embeddings have seen very little use in computational social
-science NLP tasks, presumably due to their reliance on big data, and to a lack
-of interpretability. I propose a probabilistic model-based word embedding
-method which can recover interpretable embeddings, without big data. The key
-insight is to leverage mixed membership modeling, in which global
-representations are shared, but individual entities (i.e. dictionary words) are
-free to use these representations to uniquely differing degrees. I show how to
-train the model using a combination of state-of-the-art training techniques for
-word embeddings and topic models. The experimental results show an improvement
-in predictive language modeling of up to 63% in MRR over the skip-gram, and
-demonstrate that the representations are beneficial for supervised learning. I
-illustrate the interpretability of the models with computational social science
-case studies on State of the Union addresses and NIPS articles.
-"
-4864,1705.07371,"Yingbo Zhou, Utkarsh Porwal, Roberto Konow",Spelling Correction as a Foreign Language,cs.CL," In this paper, we reformulate the spelling correction problem as a machine
-translation task under the encoder-decoder framework. This reformulation
-enables us to use a single model for solving a problem that is traditionally
-formulated as learning a language model and an error model. This model employs
-multi-layer recurrent neural networks as an encoder and a decoder. We
-demonstrate the effectiveness of this model using an internal dataset, where
-the training data is automatically obtained from user logs. The model offers
-competitive performance compared to state-of-the-art methods but does
-not require any feature engineering or hand tuning between models.
-"
-4865,1705.07393,"Kenton Lee, Omer Levy, Luke Zettlemoyer",Recurrent Additive Networks,cs.CL," We introduce recurrent additive networks (RANs), a new gated RNN which is
-distinguished by the use of purely additive latent state updates. At every time
-step, the new state is computed as a gated component-wise sum of the input and
-the previous state, without any of the non-linearities commonly used in RNN
-transition dynamics. We formally show that RAN states are weighted sums of the
-input vectors, and that the gates only contribute to computing the weights of
-these sums. Despite this relatively simple functional form, experiments
-demonstrate that RANs perform on par with LSTMs on benchmark language modeling
-problems. This result shows that many of the non-linear computations in LSTMs
-and related networks are not essential, at least for the problems we consider,
-and suggests that the gates are doing more of the computational work than
-previously understood. 
-" -4866,1705.07425,"Thomas Niebler, Martin Becker, Christian P\""olitz, Andreas Hotho",Learning Semantic Relatedness From Human Feedback Using Metric Learning,cs.CL cs.LG," Assessing the degree of semantic relatedness between words is an important -task with a variety of semantic applications, such as ontology learning for the -Semantic Web, semantic search or query expansion. To accomplish this in an -automated fashion, many relatedness measures have been proposed. However, most -of these metrics only encode information contained in the underlying corpus and -thus do not directly model human intuition. To solve this, we propose to -utilize a metric learning approach to improve existing semantic relatedness -measures by learning from additional information, such as explicit human -feedback. For this, we argue to use word embeddings instead of traditional -high-dimensional vector representations in order to leverage their semantic -density and to reduce computational cost. We rigorously test our approach on -several domains including tagging data as well as publicly available embeddings -based on Wikipedia texts and navigation. Human feedback about semantic -relatedness for learning and evaluation is extracted from publicly available -datasets such as MEN or WS-353. We find that our method can significantly -improve semantic relatedness measures by learning from additional information, -such as explicit human feedback. For tagging data, we are the first to generate -and study embeddings. Our results are of special interest for ontology and -recommendation engineers, but also for any other researchers and practitioners -of Semantic Web techniques. -" -4867,1705.07687,"Aitor Garc\'ia-Pablos, Montse Cuadros, German Rigau",W2VLDA: Almost Unsupervised System for Aspect Based Sentiment Analysis,cs.CL," With the increase of online customer opinions in specialised websites and -social networks, the necessity of automatic systems to help to organise and -classify customer reviews by domain-specific aspect/categories and sentiment -polarity is more important than ever. Supervised approaches to Aspect Based -Sentiment Analysis obtain good results for the domain/language their are -trained on, but having manually labelled data for training supervised systems -for all domains and languages are usually very costly and time consuming. In -this work we describe W2VLDA, an almost unsupervised system based on topic -modelling, that combined with some other unsupervised methods and a minimal -configuration, performs aspect/category classifiation, -aspect-terms/opinion-words separation and sentiment polarity classification for -any given domain and language. We evaluate the performance of the aspect and -sentiment classification in the multilingual SemEval 2016 task 5 (ABSA) -dataset. We show competitive results for several languages (English, Spanish, -French and Dutch) and domains (hotels, restaurants, electronic-devices). -" -4868,1705.07704,Vlad Niculae and Mathieu Blondel,A Regularized Framework for Sparse and Structured Neural Attention,stat.ML cs.CL cs.LG," Modern neural networks are often augmented with an attention mechanism, which -tells the network where to focus within the input. We propose in this paper a -new framework for sparse and structured attention, building upon a smoothed max -operator. We show that the gradient of this operator defines a mapping from -real values to probabilities, suitable as an attention mechanism. 
Our framework
-includes softmax and a slight generalization of the recently-proposed sparsemax
-as special cases. However, we also show how our framework can incorporate
-modern structured penalties, resulting in more interpretable attention
-mechanisms that focus on entire segments or groups of an input. We derive
-efficient algorithms to compute the forward and backward passes of our
-attention mechanisms, enabling their use in a neural network trained with
-backpropagation. To showcase their potential as a drop-in replacement for
-existing ones, we evaluate our attention mechanisms on three large-scale tasks:
-textual entailment, machine translation, and sentence summarization. Our
-attention mechanisms improve interpretability without sacrificing performance;
-notably, on textual entailment and summarization, we outperform the standard
-attention mechanisms based on softmax and sparsemax.
-"
-4869,1705.07830,"Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech
- Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang","Ask the Right Questions: Active Question Reformulation with
- Reinforcement Learning",cs.CL cs.AI," We frame Question Answering (QA) as a Reinforcement Learning task, an
-approach that we call Active Question Answering. We propose an agent that sits
-between the user and a black-box QA system and learns to reformulate questions
-to elicit the best possible answers. The agent probes the system with,
-potentially many, natural language reformulations of an initial question and
-aggregates the returned evidence to yield the best answer. The reformulation
-system is trained end-to-end to maximize answer quality using policy gradient.
-We evaluate on SearchQA, a dataset of complex questions extracted from
-Jeopardy!. The agent outperforms a state-of-the-art base model, playing the
-role of the environment, and other benchmarks. We also analyze the language
-that the agent has learned while interacting with the question answering
-system. We find that successful question reformulations look quite different
-from natural language paraphrases. The agent is able to discover non-trivial
-reformulation strategies that resemble classic information retrieval techniques
-such as term re-weighting (tf-idf) and stemming.
-"
-4870,1705.07860,Graham Neubig and Yoav Goldberg and Chris Dyer,On-the-fly Operation Batching in Dynamic Computation Graphs,cs.LG cs.CL stat.ML," Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer
-more flexibility for implementing models that cope with data of varying
-dimensions and structure, relative to toolkits that operate on statically
-declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing
-toolkits - both static and dynamic - require that the developer organize the
-computations into the batches necessary for exploiting high-performance
-algorithms and hardware. This batching task is generally difficult, but it
-becomes a major hurdle as architectures become complex. In this paper, we
-present an algorithm, and its implementation in the DyNet toolkit, for
-automatically batching operations. Developers simply write minibatch
-computations as aggregations of single-instance computations, and the batching
-algorithm seamlessly executes them, on the fly, using computationally efficient
-batched operations. 
On a variety of tasks, we obtain throughput similar to that
-obtained with manual batches, as well as comparable speedups over
-single-instance learning on architectures that are impractical to batch
-manually.
-"
-4871,1705.07962,Tony Beltramelli,pix2code: Generating Code from a Graphical User Interface Screenshot,cs.LG cs.AI cs.CL cs.CV cs.NE," Transforming a graphical user interface screenshot created by a designer into
-computer code is a typical task conducted by a developer in order to build
-customized software, websites, and mobile applications. In this paper, we show
-that deep learning methods can be leveraged to train a model end-to-end to
-automatically generate code from a single input image with over 77% accuracy
-for three different platforms (i.e. iOS, Android and web-based technologies).
-"
-4872,1705.08018,"Ashwini Jaya Kumar, Camilo Morales, Maria-Esther Vidal, Christoph
- Schmidt, S\""oren Auer","Use of Knowledge Graph in Rescoring the N-Best List in Automatic Speech
- Recognition",cs.CL," With the evolution of neural network based methods, the automatic speech
-recognition (ASR) field has advanced to a level where building an
-application with a speech interface is a reality. In spite of these advances,
-building a real-time speech recogniser faces several problems such as low
-recognition accuracy, domain constraints, and out-of-vocabulary words. The low
-recognition accuracy problem is addressed by improving the acoustic model, the
-language model and the decoder, and by rescoring the N-best list at the output
-of the decoder. We consider the N-best list rescoring approach to improve the
-recognition accuracy. Most of the methods in the literature use the
-grammatical, lexical, syntactic and semantic connections between the words in a
-recognised sentence as features for rescoring. In this paper, we instead use
-the semantic relatedness between the words in a sentence to rescore the
-N-best list. Semantic relatedness is computed using
-TransE~\cite{bordes2013translating}, a method for low-dimensional embedding of
-triples in a knowledge graph. The novelty of the paper is the application of
-semantic web technology to automatic speech recognition.
-"
-4873,1705.08038,"Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski,
- Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz","Latent Human Traits in the Language of Social Media: An Open-Vocabulary
- Approach",cs.CL," Over the past century, personality theory and research have successfully
-identified core sets of characteristics that consistently describe and explain
-fundamental differences in the way people think, feel and behave. Such
-characteristics were derived through theory, dictionary analyses, and survey
-research using explicit self-reports. The availability of social media data
-spanning millions of users now makes it possible to automatically derive
-characteristics from language use -- at large scale. Taking advantage of
-linguistic information available through Facebook, we study the process of
-inferring a new set of potential human traits based on unprompted language use.
-We subject these new traits to a comprehensive set of evaluations and compare
-them with a popular five factor model of personality. We find that our
-language-based trait construct is often more generalizable in that it often
-predicts non-questionnaire-based outcomes better than questionnaire-based
-traits (e.g. 
entities someone likes, income and intelligence quotient), while
-the factors remain nearly as stable as traditional factors. Our approach
-suggests a value in new constructs of personality derived from everyday human
-language use.
-"
-4874,1705.08063,"Arman Cohan, Nazli Goharian","Contextualizing Citations for Scientific Summarization using Word
- Embeddings and Domain Knowledge",cs.CL cs.IR," Citation texts are sometimes not very informative or in some cases inaccurate
-by themselves; they need the appropriate context from the referenced paper to
-reflect its exact contributions. To address this problem, we propose an
-unsupervised model that uses distributed representations of words as well as
-domain knowledge to extract the appropriate context from the reference paper.
-Evaluation results show the effectiveness of our model by significantly
-outperforming the state-of-the-art. We furthermore demonstrate how an effective
-contextualization method results in improving citation-based summarization of
-scientific articles.
-"
-4875,1705.08091,"Andros Tjandra, Sakriani Sakti, Satoshi Nakamura","Local Monotonic Attention Mechanism for End-to-End Speech and Language
- Processing",cs.CL," Recently, encoder-decoder neural networks have shown impressive performance
-on many sequence-related tasks. The architecture commonly uses an attentional
-mechanism which allows the model to learn alignments between the source and the
-target sequence. Most attentional mechanisms used today are based on a global
-attention property, which requires computing a weighted summarization of the
-whole input sequence generated by the encoder states. However, this is
-computationally expensive and often produces misalignment on longer input
-sequences. Furthermore, it does not fit the monotonic, left-to-right nature of
-several tasks, such as automatic speech recognition (ASR),
-grapheme-to-phoneme (G2P), etc. In this paper, we propose a novel attention
-mechanism that has local and monotonic properties. Various ways to control
-those properties are also explored. Experimental results on ASR, G2P and
-machine translation between two languages with similar sentence structures
-demonstrate that the proposed encoder-decoder model with local monotonic
-attention achieves significant performance improvements and reduces
-computational complexity in comparison with the one that uses the standard
-global attention architecture.
-"
-4876,1705.08094,"Zhengkui Wang, Guangdong Bai, Soumyadeb Chowdhury, Quanqing Xu, Zhi
- Lin Seow",TwiInsight: Discovering Topics and Sentiments from Social Media Datasets,cs.IR cs.CL," Social media platforms contain a great wealth of information which provides
-opportunities for us to explore hidden patterns or unknown correlations, and
-understand people's satisfaction with what they are discussing. As one
-showcase, in this paper, we present a system, TwiInsight, which explores the
-insights of Twitter data. Different from other Twitter analysis systems,
-TwiInsight automatically extracts the popular topics under different categories
-(e.g., healthcare, food, technology, sports and transport) discussed on Twitter
-via topic modeling and also identifies the correlated topics across different
-categories. Additionally, it discovers people's opinions on the tweets
-and topics via sentiment analysis. The system also employs an intuitive and
-informative visualization to show the uncovered insights. 
Furthermore, we also
-develop and compare six of the most popular algorithms - three for sentiment
-analysis and three for topic modeling.
-"
-4877,1705.08142,"Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders
- S{\o}gaard",Latent Multi-task Architecture Learning,stat.ML cs.AI cs.CL cs.LG cs.NE," Multi-task learning (MTL) allows deep neural networks to learn from related
-tasks by sharing parameters with other networks. In practice, however, MTL
-involves searching an enormous space of possible parameter sharing
-architectures to find (a) the layers or subspaces that benefit from sharing,
-(b) the appropriate amount of sharing, and (c) the appropriate relative weights
-of the different task losses. Recent work has addressed each of the above
-problems in isolation. In this work we present an approach that learns a latent
-multi-task architecture that jointly addresses (a)--(c). We present experiments
-on synthetic data and data from OntoNotes 5.0, including four different tasks
-and seven different domains. Our extension consistently outperforms previous
-approaches to learning latent architectures for multi-task problems and
-achieves up to 15% average error reductions over common approaches to MTL.
-"
-4878,1705.08321,"Roman Gurinovich, Alexander Pashuk, Yuriy Petrovskiy, Alex
- Dmitrievskij, Oleg Kuryan, Alexei Scerbacov, Antonia Tiggre, Elena Moroz,
- Yuri Nikolsky","Increasing Papers' Discoverability with Precise Semantic Labeling: the
- sci.AI Platform",cs.IR cs.CL," The number of published findings in biomedicine increases continually. At the
-same time, the specifics of the domain's terminology complicate the task of
-retrieving relevant publications. In the current research, we investigate the
-influence of term variability and ambiguity on a paper's likelihood of being
-retrieved. We obtained statistics that demonstrate the significance of the
-issue and its challenges, and we present the sci.AI platform, which allows
-precise term labeling as a resolution.
-"
-4879,1705.08386,"Karol Kurach, Sylvain Gelly, Michal Jastrzebski, Philip Haeusser,
- Olivier Teytaud, Damien Vincent, Olivier Bousquet",Better Text Understanding Through Image-To-Text Transfer,cs.CL cs.CV cs.LG," Generic text embeddings are successfully used in a variety of tasks. However,
-they are often learnt by capturing the co-occurrence structure from pure text
-corpora, which limits their ability to generalize. In this
-paper, we explore models that incorporate visual information into the text
-representation. Based on comprehensive ablation studies, we propose a
-conceptually simple, yet well-performing architecture. It outperforms previous
-multimodal approaches on a set of well-established benchmarks. We also improve
-the state-of-the-art results for image-related text datasets, using orders of
-magnitude less data.
-"
-4880,1705.08432,"Hamid Palangi, Paul Smolensky, Xiaodong He, Li Deng",Question-Answering with Grammatically-Interpretable Representations,cs.CL," We introduce an architecture, the Tensor Product Recurrent Network (TPRN). In
-our application of TPRN, internal representations learned by end-to-end
-optimization in a deep neural network performing a textual question-answering
-(QA) task can be interpreted using basic concepts from linguistic theory. No
-performance penalty need be paid for this increased interpretability: the
-proposed model performs comparably to a state-of-the-art system on the SQuAD QA
-task. 
The internal representation which is interpreted is a Tensor Product -Representation: for each input word, the model selects a symbol to encode the -word, and a role in which to place the symbol, and binds the two together. The -selection is via soft attention. The overall interpretation is built from -interpretations of the symbols, as recruited by the trained model, and -interpretations of the roles as used by the model. We find support for our -initial hypothesis that symbols can be interpreted as lexical-semantic word -meanings, while roles can be interpreted as approximations of grammatical roles -(or categories) such as subject, wh-word, determiner, etc. Fine-grained -analysis reveals specific correspondences between the learned roles and parts -of speech as assigned by a standard tagger (Toutanova et al. 2003), and finds -several discrepancies in the model's favor. In this sense, the model learns -significant aspects of grammar, after having been exposed solely to -linguistically unannotated text, questions, and answers: no prior linguistic -knowledge is given to the model. What is given is the means to build -representations using symbols and roles, with an inductive bias favoring use of -these in an approximately discrete manner. -" -4881,1705.08488,Denis Newman-Griffis and Eric Fosler-Lussier,Second-Order Word Embeddings from Nearest Neighbor Topological Features,cs.CL cs.AI," We introduce second-order vector representations of words, induced from -nearest neighborhood topological features in pre-trained contextual word -embeddings. We then analyze the effects of using second-order embeddings as -input features in two deep natural language processing models, for named entity -recognition and recognizing textual entailment, as well as a linear model for -paraphrase recognition. Surprisingly, we find that nearest neighbor information -alone is sufficient to capture most of the performance benefits derived from -using pre-trained word embeddings. Furthermore, second-order embeddings are -able to handle highly heterogeneous data better than first-order -representations, though at the cost of some specificity. Additionally, -augmenting contextual embeddings with second-order information further improves -model performance in some cases. Due to variance in the random initializations -of word embeddings, utilizing nearest neighbor features from multiple -first-order embedding samples can also contribute to downstream performance -gains. Finally, we identify intriguing characteristics of second-order -embedding spaces for further research, including much higher density and -different semantic interpretations of cosine similarity. -" -4882,1705.08557,"Ankit Vani, Yacine Jernite, David Sontag",Grounded Recurrent Neural Networks,stat.ML cs.CL cs.LG cs.NE," In this work, we present the Grounded Recurrent Neural Network (GRNN), a -recurrent neural network architecture for multi-label prediction which -explicitly ties labels to specific dimensions of the recurrent hidden state (we -call this process ""grounding""). The approach is particularly well-suited for -extracting large numbers of concepts from text. We apply the new model to -address an important problem in healthcare of understanding what medical -concepts are discussed in clinical text. Using a publicly available dataset -derived from Intensive Care Units, we learn to label a patient's diagnoses and -procedures from their discharge summary. 
Our evaluation shows a clear advantage
-to using our proposed architecture over a variety of strong baselines.
-"
-4883,1705.08828,"Jeremy Ferrero, Laurent Besacier, Didier Schwab and Frederic Agnes",Deep Investigation of Cross-Language Plagiarism Detection Methods,cs.CL," This paper is a deep investigation of cross-language plagiarism detection
-methods on a new, recently introduced open dataset, which contains parallel and
-comparable collections of documents with multiple characteristics (different
-genres, languages and sizes of texts). We investigate cross-language plagiarism
-detection methods for 6 language pairs on 2 granularities of text units in
-order to draw robust conclusions on the best methods while deeply analyzing
-correlations across document styles and languages.
-"
-4884,1705.08843,Fabio Massimo Zanzotto and Giordano Cristini and Giorgio Satta,Parsing with CYK over Distributed Representations,cs.CL," Syntactic parsing is a key task in natural language processing. This task has
-been dominated by symbolic, grammar-based parsers. Neural networks, with their
-distributed representations, are challenging these methods. In this article we
-show that existing symbolic parsing algorithms can cross the border and be
-entirely formulated over distributed representations. To this end we introduce
-a version of the traditional Cocke-Younger-Kasami (CYK) algorithm, called
-D-CYK, which is entirely defined over distributed representations. Our D-CYK
-uses matrix multiplication on real-number matrices of size independent of the
-length of the input string. These operations are compatible with traditional
-neural networks. Experiments show that our D-CYK approximates the original CYK
-algorithm. By showing that CYK can be entirely performed on distributed
-representations, we open the way to the definition of recurrent layers of
-CYK-informed neural networks.
-"
-4885,1705.08942,"Necva B\""ol\""uc\""u and Burcu Can",Joint PoS Tagging and Stemming for Agglutinative Languages,cs.CL," The number of word forms in agglutinative languages is theoretically infinite,
-and this variety in word forms introduces sparsity in many natural language
-processing tasks. Part-of-speech tagging (PoS tagging) is one of these tasks
-that often suffers from sparsity. In this paper, we present an unsupervised
-Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and
-stemming for agglutinative languages. We use stemming to reduce sparsity in PoS
-tagging. The two tasks are performed jointly to provide a mutual benefit in
-both tasks. Our results show that joint PoS tagging and stemming improves PoS
-tagging scores. We present results for Turkish and Finnish as agglutinative
-languages and English as a morphologically poor language.
-"
-4886,1705.08947,"Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan
- Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou",Deep Voice 2: Multi-Speaker Neural Text-to-Speech,cs.CL," We introduce a technique for augmenting neural text-to-speech (TTS) with
-low-dimensional trainable speaker embeddings to generate different voices from
-a single model. As a starting point, we show improvements over the two
-state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and
-Tacotron. We introduce Deep Voice 2, which is based on a similar pipeline to
-Deep Voice 1, but constructed with higher-performance building blocks, and
-demonstrates a significant audio quality improvement over Deep Voice 1. 
We
-improve Tacotron by introducing a post-processing neural vocoder, and
-demonstrate a significant audio quality improvement. We then demonstrate our
-technique for multi-speaker speech synthesis for both Deep Voice 2 and Tacotron
-on two multi-speaker TTS datasets. We show that a single neural TTS system can
-learn hundreds of unique voices from less than half an hour of data per
-speaker, while achieving high audio quality synthesis and preserving the
-speaker identities almost perfectly.
-"
-4887,1705.08992,"Nicholas Harvey, Vahab Mirrokni, David Karger, Virginia Savova, Leonid
- Peshkin",Matroids Hitting Sets and Unsupervised Dependency Grammar Induction,cs.DM cs.CL cs.DS," This paper formulates a novel problem on graphs: find the minimal subset of
-edges in a fully connected graph, such that the resulting graph contains all
-spanning trees for a set of specified subgraphs. This formulation is motivated
-by an unsupervised grammar induction problem from computational linguistics.
-We present a reduction to some known problems and algorithms from graph theory,
-provide computational complexity results, and describe an approximation
-algorithm.
-"
-4888,1705.09037,"Tao Lei, Wengong Jin, Regina Barzilay and Tommi Jaakkola",Deriving Neural Architectures from Sequence and Graph Kernels,cs.NE cs.CL cs.LG," The design of neural architectures for structured objects is typically guided
-by experimental insights rather than a formal process. In this work, we appeal
-to kernels over combinatorial structures, such as sequences and graphs, to
-derive appropriate neural operations. We introduce a class of deep recurrent
-neural operations and formally characterize their associated kernel spaces. Our
-recurrent modules compare the input to virtual reference objects (cf. filters
-in CNNs) via the kernels. Similar to traditional neural operations, these
-reference objects are parameterized and directly optimized in end-to-end
-training. We empirically evaluate the proposed class of neural architectures on
-standard applications such as language modeling and molecular graph regression,
-achieving state-of-the-art results across these applications.
-"
-4889,1705.09054,Zhipeng Xie and Junfeng Hu,"Max-Cosine Matching Based Neural Models for Recognizing Textual
- Entailment",cs.CL," Recognizing textual entailment is a fundamental task in a variety of text
-mining and natural language processing applications. This paper proposes a
-simple neural model for the RTE problem. It first matches each word in the
-hypothesis with its most-similar word in the premise, producing an augmented
-representation of the hypothesis conditioned on the premise as a sequence of
-word pairs. An LSTM model is then used to model this augmented sequence, and
-the final output from the LSTM is fed into a softmax layer to make the
-prediction. Besides the base model, in order to enhance its performance, we
-also propose three techniques: the integration of multiple word-embedding
-libraries, bi-way integration, and an ensemble based on model averaging.
-Experimental results on the SNLI dataset show that the three techniques
-are effective in boosting the predictive accuracy and that our method
-outperforms several state-of-the-art ones.
-"
-4890,1705.09189,"Jean Maillard, Stephen Clark, Dani Yogatama","Jointly Learning Sentence Embeddings and Syntax with Unsupervised
- Tree-LSTMs",cs.CL," We introduce a neural network that represents sentences by composing their
-words according to induced binary parse trees. 
We use Tree-LSTM as our
-composition function, applied along a tree structure found by a fully
-differentiable natural language chart parser. Our model simultaneously
-optimises both the composition function and the parser, thus eliminating the
-need for externally-provided parse trees, which are normally required for
-Tree-LSTM. It can therefore be seen as a tree-based RNN that is unsupervised
-with respect to the parse trees. As it is fully differentiable, our model is
-easily trained with an off-the-shelf gradient descent method and
-backpropagation. We demonstrate that it achieves better performance compared to
-various supervised Tree-LSTM architectures on a textual entailment task and a
-reverse dictionary task.
-"
-4891,1705.09207,Yang Liu and Mirella Lapata,Learning Structured Text Representations,cs.CL cs.AI," In this paper, we focus on learning structure-aware document representations
-from data without recourse to a discourse parser or additional annotations.
-Drawing inspiration from recent efforts to empower neural networks with a
-structural bias, we propose a model that can encode a document while
-automatically inducing rich structural dependencies. Specifically, we embed a
-differentiable non-projective parsing algorithm into a neural model and use
-attention mechanisms to incorporate the structural biases. Experimental
-evaluation across different tasks and datasets shows that the proposed model
-achieves state-of-the-art results on document modeling tasks while inducing
-intermediate structures which are both interpretable and meaningful.
-"
-4892,1705.09222,"Ashwini Jaya Kumar, S\""oren Auer, Christoph Schmidt, Joachim k\""ohler",Towards a Knowledge Graph based Speech Interface,cs.HC cs.CL," Applications which use human speech as an input require a speech interface
-with high recognition accuracy. The words or phrases in the recognised text are
-annotated with a machine-understandable meaning and linked to knowledge graphs
-for further processing by the target application. These semantic annotations of
-recognised words can be represented as subject-predicate-object triples, which
-collectively form a graph often referred to as a knowledge graph. This type of
-knowledge representation makes it possible to use speech interfaces with any
-spoken-input application, since the information is represented in a logical,
-semantic form and can be retrieved and stored using standard web query
-languages. In this work, we develop a methodology for linking speech input to
-knowledge graphs and study the impact of recognition errors on the overall
-process. We show that for a corpus with a lower WER, the annotation and linking
-of entities to the DBpedia knowledge graph is considerable. DBpedia Spotlight,
-a tool to interlink text documents with linked open data, is used to link
-the speech recognition output to the DBpedia knowledge graph. Such a
-knowledge-based speech recognition interface is useful for applications such as
-question answering or spoken dialog systems.
-"
-4893,1705.09296,Dallas Card and Chenhao Tan and Noah A. Smith,Neural Models for Documents with Metadata,stat.ML cs.CL," Most real-world document collections involve various types of metadata, such
-as author, source, and date, and yet the most commonly-used approaches to
-modeling text corpora ignore this information. While specialized models have
-been developed for particular applications, few are widely used in practice, as
-customization typically requires derivation of a custom inference algorithm. 
In
-this paper, we build on recent advances in variational inference methods and
-propose a general neural framework, based on topic models, to enable flexible
-incorporation of metadata and allow for rapid exploration of alternative
-models. Our approach achieves strong performance, with a manageable tradeoff
-between perplexity, coherence, and sparsity. Finally, we demonstrate the
-potential of our framework through an exploration of a corpus of articles about
-US immigration.
-"
-4894,1705.09515,"Edwin Simonnet (LIUM), Sahar Ghannay (LIUM), Nathalie Camelin (LIUM),
- Yannick Est\`eve (LIUM), Renato De Mori (LIA)",ASR error management for improving spoken language understanding,cs.CL cs.AI cs.NE," This paper addresses the problem of automatic speech recognition (ASR) error
-detection and its use for improving spoken language understanding (SLU)
-systems. In this study, the SLU task consists in automatically extracting, from
-ASR transcriptions, semantic concepts and concept/value pairs in, for example,
-a touristic information system. An approach is proposed for enriching the set
-of semantic labels with error-specific labels and for using a recently proposed
-neural approach based on word embeddings to compute well-calibrated ASR
-confidence measures. Experimental results are reported showing that it is
-possible to significantly decrease the Concept/Value Error Rate with a
-state-of-the-art system, outperforming previously published results on the
-same experimental data. It is also shown that, by combining an SLU approach
-based on conditional random fields with a neural encoder/decoder
-attention-based architecture, it is possible to effectively identify confidence
-islands and uncertain semantic output segments useful for deciding appropriate
-error handling actions by the dialogue manager strategy.
-"
-4895,1705.09516,"Patchigolla V S S Rahul, Sunil Kumar Sahu, Ashish Anand","Biomedical Event Trigger Identification Using Bidirectional Recurrent
- Neural Network Based Models",cs.CL," Biomedical events describe complex interactions between various biomedical
-entities. An event trigger is a word or a phrase which typically signifies the
-occurrence of an event. Event trigger identification is an important first step
-in all event extraction methods. However, many of the current approaches either
-rely on complex hand-crafted features or consider features only within a
-window. In this paper we propose a method that takes advantage of a recurrent
-neural network (RNN) to extract higher-level features present across the
-sentence. The hidden-state representation of the RNN, along with word and
-entity-type embeddings as features, thus avoids relying on the complex
-hand-crafted features generated using various NLP toolkits. Our experiments
-achieve a state-of-the-art F1-score on the Multi Level Event Extraction (MLEE)
-corpus. We have also performed a category-wise analysis of the results and
-discussed the importance of various features in the trigger identification
-task.
-"
-4896,1705.09585,"Rohan Kshirsagar, Robert Morris, Sam Bowman",Detecting and Explaining Crisis,cs.CL," Individuals on social media may reveal themselves to be in various states of
-crisis (e.g. suicide, self-harm, abuse, or eating disorders). Detecting crisis
-from social media text automatically and accurately can have profound
-consequences. However, detecting a general state of crisis without explaining
-why has limited applications. 
An explanation in this context is a coherent, -concise subset of the text that rationalizes the crisis detection. We explore -several methods to detect and explain crisis using a combination of neural and -non-neural techniques. We evaluate these techniques on a unique data set -obtained from Koko, an anonymous emotional support network available through -various messaging applications. We annotate a small subset of the samples -labeled with crisis with corresponding explanations. Our best technique -significantly outperforms the baseline for detection and explanation. -" -4897,1705.09655,"Tianxiao Shen, Tao Lei, Regina Barzilay, Tommi Jaakkola",Style Transfer from Non-Parallel Text by Cross-Alignment,cs.CL cs.LG," This paper focuses on style transfer on the basis of non-parallel text. This -is an instance of a broad family of problems including machine translation, -decipherment, and sentiment modification. The key challenge is to separate the -content from other aspects such as style. We assume a shared latent content -distribution across different text corpora, and propose a method that leverages -refined alignment of latent representations to perform style transfer. The -transferred sentences from one style should match example sentences from the -other style as a population. We demonstrate the effectiveness of this -cross-alignment method on three tasks: sentiment modification, decipherment of -word substitution ciphers, and recovery of word order. -" -4898,1705.09656,"Terrence Szymanski, Claudia Orellana-Rodriguez and Mark T. Keane","Helping News Editors Write Better Headlines: A Recommender to Improve - the Keyword Contents & Shareability of News Headlines",cs.CL cs.HC cs.IR," We present a software tool that employs state-of-the-art natural language -processing (NLP) and machine learning techniques to help newspaper editors -compose effective headlines for online publication. The system identifies the -most salient keywords in a news article and ranks them based on both their -overall popularity and their direct relevance to the article. The system also -uses a supervised regression model to identify headlines that are likely to be -widely shared on social media. The user interface is designed to simplify and -speed the editor's decision process on the composition of the headline. As -such, the tool provides an efficient way to combine the benefits of automated -predictors of engagement and search-engine optimization (SEO) with human -judgments of overall headline quality. -" -4899,1705.09665,"Justine Zhang and William L. Hamilton and Cristian - Danescu-Niculescu-Mizil and Dan Jurafsky and Jure Leskovec",Community Identity and User Engagement in a Multi-Community Landscape,cs.SI cs.CL cs.CY physics.soc-ph," A community's identity defines and shapes its internal dynamics. Our current -understanding of this interplay is mostly limited to glimpses gathered from -isolated studies of individual communities. In this work we provide a -systematic exploration of the nature of this relation across a wide variety of -online communities. To this end we introduce a quantitative, language-based -typology reflecting two key aspects of a community's identity: how distinctive, -and how temporally dynamic it is. By mapping almost 300 Reddit communities into -the landscape induced by this typology, we reveal regularities in how patterns -of user engagement vary with the characteristics of a community. 
- Our results suggest that the way new and existing users engage with a
-community depends strongly and systematically on the nature of the collective
-identity it fosters, in ways that are highly consequential to community
-maintainers. For example, communities with distinctive and highly dynamic
-identities are more likely to retain their users. However, such niche
-communities also exhibit much larger acculturation gaps between existing users
-and newcomers, which potentially hinder the integration of the latter.
- More generally, our methodology reveals differences in how various social
-phenomena manifest across communities, and shows that structuring the
-multi-community landscape can lead to a better understanding of the systematic
-nature of this diversity.
-"
-4900,1705.09724,"Shane Walker, Morten Pedersen, Iroro Orife and Jason Flaks","Semi-Supervised Model Training for Unbounded Conversational Speech
- Recognition",cs.CL," For conversational large-vocabulary continuous speech recognition (LVCSR)
-tasks, up to about two thousand hours of audio is commonly used to train
-state-of-the-art models. Collection of labeled conversational audio, however,
-is prohibitively expensive, laborious and error-prone. Furthermore, academic
-corpora like Fisher English (2004) or Switchboard (1992) are inadequate to
-train models with sufficient accuracy in the unbounded space of conversational
-speech. These corpora are also timeworn due to dated acoustic telephony
-features and the rapid advancement of colloquial vocabulary and idiomatic
-speech over the last decades. Utilizing the colossal scale of our unlabeled
-telephony dataset, we propose a technique to construct a modern, high-quality
-conversational speech training corpus on the order of hundreds of millions of
-utterances (or tens of thousands of hours) for both acoustic and language model
-training. We describe the data collection, selection and training, evaluating
-the results of our updated speech recognition system on a test corpus of 7K
-manually transcribed utterances. We show relative word error rate (WER)
-reductions of {35%, 19%} on {agent, caller} utterances over our seed model and
-5% absolute WER improvements over IBM Watson STT on this conversational speech
-task.
-"
-4901,1705.09731,"Massimo Stella, Nicole M. Beckage, Markus Brede and Manlio De Domenico",Multiplex model of mental lexicon reveals explosive learning in humans,physics.soc-ph cs.CL cs.SI nlin.AO," Word similarities affect language acquisition and use in a multi-relational
-way barely accounted for in the literature. We propose a multiplex network
-representation of this mental lexicon of word similarities as a natural
-framework for investigating large-scale cognitive patterns. Our representation
-accounts for semantic, taxonomic, and phonological interactions and it
-identifies a cluster of words which are used with greater frequency, are
-identified, memorised, and learned more easily, and have more meanings than
-expected at random. This cluster emerges around age 7 through an explosive
-transition not reproduced by null models. We relate this explosive emergence to
-polysemy -- redundancy in word meanings. Results indicate that the word cluster
-acts as a core for the lexicon, increasing both lexical navigability and
-robustness to linguistic degradation. 
Our findings provide quantitative -confirmation of existing conjectures about core structure in the mental lexicon -and the importance of integrating multi-relational word-word interactions in -psycholinguistic frameworks. -" -4902,1705.09755,"Andrew J. Landgraf, Jeremy Bellay",word2vec Skip-Gram with Negative Sampling is a Weighted Logistic PCA,cs.CL stat.ML," We show that the skip-gram formulation of word2vec trained with negative -sampling is equivalent to a weighted logistic PCA. This connection allows us to -better understand the objective, compare it to other word embedding methods, -and extend it to higher dimensional models. -" -4903,1705.09837,Carlos G\'omez-Rodr\'iguez,"On the relation between dependency distance, crossing dependencies, and - parsing. Comment on ""Dependency distance: a new perspective on syntactic - patterns in natural languages"" by Haitao Liu et al",cs.CL," Liu et al. (2017) provide a comprehensive account of research on dependency -distance in human languages. While the article is a very rich and useful report -on this complex subject, here I will expand on a few specific issues where -research in computational linguistics (specifically natural language -processing) can inform DDM research, and vice versa. These aspects have not -been explored much in the article by Liu et al. or elsewhere, probably due to -the little overlap between both research communities, but they may provide -interesting insights for improving our understanding of the evolution of human -languages, the mechanisms by which the brain processes and understands -language, and the construction of effective computer systems to achieve this -goal. -" -4904,1705.09899,"Zeerak Waseem, Thomas Davidson, Dana Warmsley, Ingmar Weber",Understanding Abuse: A Typology of Abusive Language Detection Subtasks,cs.CL," As the body of research on abusive language detection and analysis grows, -there is a need for critical consideration of the relationships between -different subtasks that have been grouped under this label. Based on work on -hate speech, cyberbullying, and online abuse we propose a typology that -captures central similarities and differences between subtasks and we discuss -its implications for data annotation and feature construction. We emphasize the -practical actions that can be taken by researchers to best approach their -abusive language detection subtask of interest. -" -4905,1705.09906,"Haichao Zhang, Haonan Yu, and Wei Xu","Listen, Interact and Talk: Learning to Speak via Interaction",cs.CL," One of the long-term goals of artificial intelligence is to build an agent -that can communicate intelligently with human in natural language. Most -existing work on natural language learning relies heavily on training over a -pre-collected dataset with annotated labels, leading to an agent that -essentially captures the statistics of the fixed external training data. As the -training data is essentially a static snapshot representation of the knowledge -from the annotator, the agent trained this way is limited in adaptiveness and -generalization of its behavior. Moreover, this is very different from the -language learning process of humans, where language is acquired during -communication by taking speaking action and learning from the consequences of -speaking action in an interactive manner. 
This paper presents an interactive
-setting for grounded natural language learning, where an agent learns natural
-language by interacting with a teacher and learning from feedback, thus
-learning and improving language skills while taking part in the conversation.
-To achieve this goal, we propose a model which incorporates both imitation and
-reinforcement by jointly leveraging sentence and reward feedback from the
-teacher. Experiments are conducted to validate the effectiveness of the
-proposed approach.
-"
-4906,1705.09932,Ramon Ferrer-i-Cancho,"The placement of the head that maximizes predictability. An information
- theoretic approach",cs.CL nlin.AO physics.soc-ph q-bio.NC," The minimization of the length of syntactic dependencies is a
-well-established principle of word order and the basis of a mathematical theory
-of word order. Here we complete that theory from the perspective of information
-theory, adding a competing word order principle: the maximization of
-predictability of a target element. These two principles are in conflict: to
-maximize the predictability of the head, the head should appear last, which
-maximizes the costs with respect to dependency length minimization. The
-implications of such a broad theoretical framework for understanding the
-optimality, diversity and evolution of the six possible orderings of subject,
-object and verb are reviewed.
-"
-4907,1705.09975,"Nazli Farajidavar, Sefki Kolozali and Payam Barnaghi","A Deep Multi-View Learning Framework for City Event Extraction from
- Twitter Data Streams",cs.SI cs.CL," Cities have been thriving places for citizens over the centuries due to
-their complex infrastructure. The emergence of Cyber-Physical-Social
-Systems (CPSS) and context-aware technologies boosts a growing interest in
-analysing, extracting and eventually understanding city events, which can
-subsequently be utilised to leverage citizen observations of their
-cities. In this paper, we investigate the feasibility of using Twitter textual
-streams for extracting city events. We propose a hierarchical multi-view deep
-learning approach to contextualise citizen observations of various city systems
-and services. Our goal has been to build a flexible architecture that can learn
-representations useful for tasks, thus avoiding excessive task-specific feature
-engineering. We apply our approach to a real-world dataset consisting of event
-reports and over four months of tweets from the San Francisco Bay Area, and
-additional datasets collected from London. The results of our evaluations show
-that our proposed solution outperforms the existing models and can be used for
-extracting city-related events with an average accuracy of 81% over all
-classes. To further evaluate the impact of our Twitter event extraction model,
-we have used two sources of authorised reports, collecting road traffic
-disruption data from the Transport for London API and parsing the Time Out
-London website for sociocultural events. The analysis showed that 49.5% of the
-Twitter traffic comments are reported approximately five hours prior to the
-authorities' official records. Moreover, we discovered that amongst the
-scheduled sociocultural event topics, tweets reporting transportation, cultural
-and social events are 31.75% more likely to influence the distribution of the
-Twitter comments than sport, weather and crime topics. 
-" -4908,1705.09980,Rik van Noord and Johan Bos,"Neural Semantic Parsing by Character-based Translation: Experiments with - Abstract Meaning Representations",cs.CL," We evaluate the character-level translation method for neural semantic -parsing on a large corpus of sentences annotated with Abstract Meaning -Representations (AMRs). Using a sequence-to-sequence model, and some trivial -preprocessing and postprocessing of AMRs, we obtain a baseline accuracy of 53.1 -(F-score on AMR-triples). We examine five different approaches to improve this -baseline result: (i) reordering AMR branches to match the word order of the -input sentence increases performance to 58.3; (ii) adding part-of-speech tags -(automatically produced) to the input shows improvement as well (57.2); (iii) -So does the introduction of super characters (conflating frequent sequences of -characters to a single character), reaching 57.4; (iv) optimizing the training -process by using pre-training and averaging a set of models increases -performance to 58.7; (v) adding silver-standard training data obtained by an -off-the-shelf parser yields the biggest improvement, resulting in an F-score of -64.0. Combining all five techniques leads to an F-score of 71.0 on holdout -data, which is state-of-the-art in AMR parsing. This is remarkable because of -the relative simplicity of the approach. -" -4909,1705.09993,John Pavlopoulos and Prodromos Malakasiotis and Ion Androutsopoulos,Deep Learning for User Comment Moderation,cs.CL cs.LG," Experimenting with a new dataset of 1.6M user comments from a Greek news -portal and existing datasets of English Wikipedia comments, we show that an RNN -outperforms the previous state of the art in moderation. A deep, -classification-specific attention mechanism improves further the overall -performance of the RNN. We also compare against a CNN and a word-list baseline, -considering both fully automatic and semi-automatic moderation. -" -4910,1705.09995,"Nisansa de Silva, Danaja Maldeniya, Chamilka Wijeratne","Subject Specific Stream Classification Preprocessing Algorithm for - Twitter Data Stream",cs.CL," Micro-blogging service Twitter is a lucrative source for data mining -applications on global sentiment. But due to the omnifariousness of the -subjects mentioned in each data item; it is inefficient to run a data mining -algorithm on the raw data. This paper discusses an algorithm to accurately -classify the entire stream in to a given number of mutually exclusive -collectively exhaustive streams upon each of which the data mining algorithm -can be run separately yielding more relevant results with a high efficiency. -" -4911,1705.10030,"Hu Xu, Lei Shu, Philip S. Yu","Supervised Complementary Entity Recognition with Augmented Key-value - Pairs of Knowledge",cs.CL," Extracting opinion targets is an important task in sentiment analysis on -product reviews and complementary entities (products) are one important type of -opinion targets that may work together with the reviewed product. In this -paper, we address the problem of Complementary Entity Recognition (CER) as a -supervised sequence labeling with the capability of expanding domain knowledge -as key-value pairs from unlabeled reviews, by automatically learning and -enhancing knowledge-based features. We use Conditional Random Field (CRF) as -the base learner and augment CRF with knowledge-based features (called the -Knowledge-based CRF or KCRF for short). 
We conduct experiments to show that
-KCRF effectively improves the performance of the supervised CER task.
-"
-4912,1705.10112,"Valery D. Solovyev, Vladimir V. Bochkarev, Anna V. Shevlyakova",Dynamics of core of language vocabulary,cs.CL," Studies of the overall structure of vocabulary and its dynamics became
-possible due to the creation of diachronic text corpora, especially Google
-Books Ngram. This article discusses the question of the core change rate and
-the degree to which the core words cover the texts. Different periods of the
-last three centuries and six main European languages presented in Google Books
-Ngram are compared. The main result is the high stability of the core change
-rate, which is analogous to the stability of the Swadesh list.
-"
-4913,1705.10130,"Murtadha Talib AL-Sharuee, Fei Liu, Mahardhika Pratama","An Automatic Contextual Analysis and Clustering Classifiers Ensemble
- approach to Sentiment Analysis",cs.CL," Product reviews are one of the major resources for determining public
-sentiment. The existing literature on review sentiment analysis mainly
-utilizes the supervised paradigm, which needs labeled data to be trained on and
-suffers from domain dependency. This article addresses these issues by
-describing a completely automatic approach for sentiment analysis based on
-unsupervised ensemble learning. The method consists of two phases. The first
-phase is contextual analysis, which has five processes, namely (1) data
-preparation; (2) spelling correction; (3) intensifier handling; (4) negation
-handling and (5) contrast handling. The second phase comprises the unsupervised
-learning approach, which is an ensemble of clustering classifiers using a
-majority voting mechanism with different weight schemes. The base classifier of
-the ensemble method is a modified k-means algorithm. The base classifier is
-modified by extracting initial centroids from the feature set by using
-SentWordNet (SWN). We also introduce new sentiment analysis problems of
-Australian airlines and home builders, which offer potential benchmark problems
-in the sentiment analysis field. Our experiments on datasets from different
-domains show that the contextual analysis and ensemble phases improve the
-clustering performance in terms of accuracy, stability and generalization
-ability.
-"
-4914,1705.10209,"Micha{\l} Zapotoczny, Pawe{\l} Rychlikowski, and Jan Chorowski",On Multilingual Training of Neural Dependency Parsers,cs.CL cs.LG cs.NE," We show that a recently proposed neural dependency parser can be improved by
-joint training on multiple languages from the same family. The parser is
-implemented as a deep neural network whose only input is orthographic
-representations of words. In order to successfully parse, the network has to
-discover how linguistically relevant concepts can be inferred from word
-spellings. We analyze the representations of characters and words that are
-learned by the network to establish which properties of languages were
-accounted for. In particular we show that the parser has approximately learned
-to associate Latin characters with their Cyrillic counterparts and that it can
-group Polish and Russian words that have a similar grammatical function.
-Finally, we evaluate the parser on selected languages from the Universal
-Dependencies dataset and show that it is competitive with other recently
-proposed state-of-the-art methods, while having a simple structure. 
-" -4915,1705.10229,"Tsung-Hsien Wen, Yishu Miao, Phil Blunsom, Steve Young",Latent Intention Dialogue Models,cs.CL cs.LG cs.NE stat.ML," Developing a dialogue agent that is capable of making autonomous decisions -and communicating by natural language is one of the long-term goals of machine -learning research. Traditional approaches either rely on hand-crafting a small -state-action set for applying reinforcement learning that is not scalable or -constructing deterministic models for learning dialogue sentences that fail to -capture natural conversational variability. In this paper, we propose a Latent -Intention Dialogue Model (LIDM) that employs a discrete latent variable to -learn underlying dialogue intentions in the framework of neural variational -inference. In a goal-oriented dialogue scenario, these latent intentions can be -interpreted as actions guiding the generation of machine responses, which can -be further refined autonomously by reinforcement learning. The experimental -evaluation of LIDM shows that the model out-performs published benchmarks for -both corpus-based and human evaluation, demonstrating the effectiveness of -discrete latent variable models for learning goal-oriented dialogues. -" -4916,1705.10272,Xinru Yan and Ted Pedersen,"Who's to say what's funny? A computer using Language Models and Deep - Learning, That's Who!",cs.CL," Humor is a defining characteristic of human beings. Our goal is to develop -methods that automatically detect humorous statements and rank them on a -continuous scale. In this paper we report on results using a Language Model -approach, and outline our plans for using methods from Deep Learning. -" -4917,1705.10369,"Katrina Evtimova, Andrew Drozdov, Douwe Kiela, Kyunghyun Cho","Emergent Communication in a Multi-Modal, Multi-Step Referential Game",cs.LG cs.CL cs.CV cs.IT cs.MA math.IT," Inspired by previous work on emergent communication in referential games, we -propose a novel multi-modal, multi-step referential game, where the sender and -receiver have access to distinct modalities of an object, and their information -exchange is bidirectional and of arbitrary duration. The multi-modal multi-step -setting allows agents to develop an internal communication significantly closer -to natural language, in that they share a single set of messages, and that the -length of the conversation may vary according to the difficulty of the task. We -examine these properties empirically using a dataset consisting of images and -textual descriptions of mammals, where the agents are tasked with identifying -the correct object. Our experiments indicate that a robust and efficient -communication protocol emerges, where gradual information exchange informs -better predictions and higher communication bandwidth improves generalization. -" -4918,1705.10415,"Vanessa Q. Marinho, Henrique F. de Arruda, Thales S. Lima, Luciano F. - Costa and Diego R. Amancio","On the ""Calligraphy"" of Books",cs.CL," Authorship attribution is a natural language processing task that has been -widely studied, often by considering small order statistics. In this paper, we -explore a complex network approach to assign the authorship of texts based on -their mesoscopic representation, in an attempt to capture the flow of the -narrative. Indeed, as reported in this work, such an approach allowed the -identification of the dominant narrative structure of the studied authors. 
This -has been achieved due to the ability of the mesoscopic approach to take into -account relationships between different, not necessarily adjacent, parts of the -text, which is able to capture the story flow. The potential of the proposed -approach has been illustrated through principal component analysis, a -comparison with the chance baseline method, and network visualization. Such -visualizations reveal individual characteristics of the authors, which can be -understood as a kind of calligraphy. -" -4919,1705.10586,Zhenzhou Wu and Xin Zheng and Daniel Dahlmeier,"Character-Based Text Classification using Top Down Semantic Model for - Sentence Representation",cs.CL cs.LG," Despite the success of deep learning on many fronts especially image and -speech, its application in text classification often is still not as good as a -simple linear SVM on n-gram TF-IDF representation especially for smaller -datasets. Deep learning tends to emphasize on sentence level semantics when -learning a representation with models like recurrent neural network or -recursive neural network, however from the success of TF-IDF representation, it -seems a bag-of-words type of representation has its strength. Taking advantage -of both representions, we present a model known as TDSM (Top Down Semantic -Model) for extracting a sentence representation that considers both the -word-level semantics by linearly combining the words with attention weights and -the sentence-level semantics with BiLSTM and use it on text classification. We -apply the model on characters and our results show that our model is better -than all the other character-based and word-based convolutional neural network -models by \cite{zhang15} across seven different datasets with only 1\% of their -parameters. We also demonstrate that this model beats traditional linear models -on TF-IDF vectors on small and polished datasets like news article in which -typically deep learning models surrender. -" -4920,1705.10610,"Thai-Hoang Pham, Phuong Le-Hong","The Importance of Automatic Syntactic Features in Vietnamese Named - Entity Recognition",cs.CL," This paper presents a state-of-the-art system for Vietnamese Named Entity -Recognition (NER). By incorporating automatic syntactic features with word -embeddings as input for bidirectional Long Short-Term Memory (Bi-LSTM), our -system, although simpler than some deep learning architectures, achieves a much -better result for Vietnamese NER. The proposed method achieves an overall F1 -score of 92.05% on the test set of an evaluation campaign, organized in late -2016 by the Vietnamese Language and Speech Processing (VLSP) community. Our -named entity recognition system outperforms the best previous systems for -Vietnamese NER by a large margin. -" -4921,1705.10754,Francisco Rangel and Marc Franco-Salvador and Paolo Rosso,A Low Dimensionality Representation for Language Variety Identification,cs.CL," Language variety identification aims at labelling texts in a native language -(e.g. Spanish, Portuguese, English) with its specific variation (e.g. -Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work -we propose a low dimensionality representation (LDR) to address this task with -five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain. -We compare our LDR method with common state-of-the-art representations and show -an increase in accuracy of ~35%. Furthermore, we compare LDR with two reference -distributed representation models. 
Experimental results show competitive -performance while dramatically reducing the dimensionality --and increasing the -big data suitability-- to only 6 features per variety. Additionally, we analyse -the behaviour of the employed machine learning algorithms and the most -discriminating features. Finally, we employ an alternative dataset to test the -robustness of our low dimensionality representation with another set of similar -languages. -" -4922,1705.10814,Xiang Yu and Ngoc Thang Vu,"Character Composition Model with Convolutional Neural Networks for - Dependency Parsing on Morphologically Rich Languages",cs.CL," We present a transition-based dependency parser that uses a convolutional -neural network to compose word representations from characters. The character -composition model shows great improvement over the word-lookup model, -especially for parsing agglutinative languages. These improvements are even -better than using pre-trained word embeddings from extra data. On the SPMRL -data sets, our system outperforms the previous best greedy parser (Ballesteros -et al., 2015) by a margin of 3% on average. -" -4923,1705.10874,"Zixing Zhang, J\""urgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, - Wenyu Jin, Bj\""orn Schuller","Deep Learning for Environmentally Robust Speech Recognition: An Overview - of Recent Developments",cs.SD cs.CL cs.LG," Eliminating the negative effect of non-stationary environmental noise is a -long-standing research topic for automatic speech recognition that stills -remains an important challenge. Data-driven supervised approaches, including -ones based on deep neural networks, have recently emerged as potential -alternatives to traditional unsupervised approaches and with sufficient -training, can alleviate the shortcomings of the unsupervised methods in various -real-life acoustic environments. In this light, we review recently developed, -representative deep learning approaches for tackling non-stationary additive -and convolutional degradation of speech with the aim of providing guidelines -for those involved in the development of environmentally robust speech -recognition systems. We separately discuss single- and multi-channel techniques -developed for the front-end and back-end of speech recognition systems, as well -as joint front-end and back-end training frameworks. -" -4924,1705.10900,"Paul Michel, Abhilasha Ravichander, Shruti Rijhwani","Does the Geometry of Word Embeddings Help Document Classification? A - Case Study on Persistent Homology Based Representations",cs.CL," We investigate the pertinence of methods from algebraic topology for text -data analysis. These methods enable the development of -mathematically-principled isometric-invariant mappings from a set of vectors to -a document embedding, which is stable with respect to the geometry of the -document in the selected metric space. In this work, we evaluate the utility of -these topology-based document representations in traditional NLP tasks, -specifically document clustering and sentiment classification. We find that the -embeddings do not benefit text analysis. In fact, performance is worse than -simple techniques like $\textit{tf-idf}$, indicating that the geometry of the -document does not provide enough variability for classification on the basis of -topic or sentiment in the chosen datasets. 
-" -4925,1705.10929,"Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, - Aaron Courville",Adversarial Generation of Natural Language,cs.CL cs.AI cs.NE stat.ML," Generative Adversarial Networks (GANs) have gathered a lot of attention from -the computer vision community, yielding impressive results for image -generation. Advances in the adversarial generation of natural language from -noise however are not commensurate with the progress made in generating images, -and still lag far behind likelihood based methods. In this paper, we take a -step towards generating natural language with a GAN objective alone. We -introduce a simple baseline that addresses the discrete output space problem -without relying on gradient estimators and show that it is able to achieve -state-of-the-art results on a Chinese poem generation dataset. We present -quantitative results on generating sentences from context-free and -probabilistic context-free grammars, and qualitative language modeling results. -A conditional version is also described that can generate sequences conditioned -on sentence characteristics. -" -4926,1705.10962,"Koichiro Yoshino, Shinsuke Mori, Satoshi Nakamura","Analysis of the Effect of Dependency Information on Predicate-Argument - Structure Analysis and Zero Anaphora Resolution",cs.CL," This paper investigates and analyzes the effect of dependency information on -predicate-argument structure analysis (PASA) and zero anaphora resolution (ZAR) -for Japanese, and shows that a straightforward approach of PASA and ZAR works -effectively even if dependency information was not available. We constructed an -analyzer that directly predicts relationships of predicates and arguments with -their semantic roles from a POS-tagged corpus. The features of the system are -designed to compensate for the absence of syntactic information by using -features used in dependency parsing as a reference. We also constructed -analyzers that use the oracle dependency and the real dependency parsing -results, and compared with the system that does not use any syntactic -information to verify that the improvement provided by dependencies is not -crucial. -" -4927,1705.11001,"Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun",Adversarial Ranking for Language Generation,cs.CL cs.LG," Generative adversarial networks (GANs) have great successes on synthesizing -data. However, the existing GANs restrict the discriminator to be a binary -classifier, and thus limit their learning capacity for tasks that need to -synthesize output with rich structures such as natural language descriptions. -In this paper, we propose a novel generative adversarial network, RankGAN, for -generating high-quality language descriptions. Rather than training the -discriminator to learn and assign absolute binary predicate for individual data -sample, the proposed RankGAN is able to analyze and rank a collection of -human-written and machine-written sentences by giving a reference group. By -viewing a set of data samples collectively and evaluating their quality through -relative ranking scores, the discriminator is able to make better assessment -which in turn helps to learn a better generator. The proposed RankGAN is -optimized through the policy gradient technique. Experimental results on -multiple public datasets clearly demonstrate the effectiveness of the proposed -approach. 
-" -4928,1705.11122,"Qizhe Xie, Zihang Dai, Yulun Du, Eduard Hovy, Graham Neubig",Controllable Invariance through Adversarial Feature Learning,cs.LG cs.AI cs.CL," Learning meaningful representations that maintain the content necessary for a -particular task while filtering away detrimental variations is a problem of -great interest in machine learning. In this paper, we tackle the problem of -learning representations invariant to a specific factor or trait of data. The -representation learning process is formulated as an adversarial minimax game. -We analyze the optimal equilibrium of such a game and find that it amounts to -maximizing the uncertainty of inferring the detrimental factor given the -representation while maximizing the certainty of making task-specific -predictions. On three benchmark tasks, namely fair and bias-free -classification, language-independent generation, and lighting-independent image -classification, we show that the proposed framework induces an invariant -representation, and leads to better generalization evidenced by the improved -performance. -" -4929,1705.11160,Junhui Li and Muhua Zhu,Learning When to Attend for Neural Machine Translation,cs.CL," In the past few years, attention mechanisms have become an indispensable -component of end-to-end neural machine translation models. However, previous -attention models always refer to some source words when predicting a target -word, which contradicts with the fact that some target words have no -corresponding source words. Motivated by this observation, we propose a novel -attention model that has the capability of determining when a decoder should -attend to source words and when it should not. Experimental results on NIST -Chinese-English translation tasks show that the new model achieves an -improvement of 0.8 BLEU score over a state-of-the-art baseline. -" -4930,1705.11168,"Li Lucy, Jon Gauthier","Are distributional representations ready for the real world? Evaluating - word vectors for grounded perceptual meaning",cs.CL," Distributional word representation methods exploit word co-occurrences to -build compact vector encodings of words. While these representations enjoy -widespread use in modern natural language processing, it is unclear whether -they accurately encode all necessary facets of conceptual meaning. In this -paper, we evaluate how well these representations can predict perceptual and -conceptual features of concrete concepts, drawing on two semantic norm datasets -sourced from human participants. We find that several standard word -representations fail to encode many salient perceptual features of concepts, -and show that these deficits correlate with word-word similarity prediction -errors. Our analyses provide motivation for grounded and embodied language -learning approaches, which may help to remedy these deficits. -" -4931,1705.11192,"Serhii Havrylov, Ivan Titov","Emergence of Language with Multi-agent Games: Learning to Communicate - with Sequences of Symbols",cs.LG cs.CL cs.CV cs.MA," Learning to communicate through interaction, rather than relying on explicit -supervision, is often considered a prerequisite for developing a general AI. We -study a setting where two agents engage in playing a referential game and, from -scratch, develop a communication protocol necessary to succeed in this game. -Unlike previous work, we require that messages they exchange, both at train and -test time, are in the form of a language (i.e. sequences of discrete symbols). 
-We compare a reinforcement learning approach and one using a differentiable -relaxation (straight-through Gumbel-softmax estimator) and observe that the -latter is much faster to converge and it results in more effective protocols. -Interestingly, we also observe that the protocol we induce by optimizing the -communication success exhibits a degree of compositionality and variability -(i.e. the same information can be phrased in different ways), both properties -characteristic of natural languages. As the ultimate goal is to ensure that -communication is accomplished in natural language, we also perform experiments -where we inject prior information about natural language into our model and -study properties of the resulting protocol. -" -4932,1706.00130,"Huan Ling, Sanja Fidler",Teaching Machines to Describe Images via Natural Language Feedback,cs.CL cs.AI cs.CV cs.HC," Robots will eventually be part of every household. It is thus critical to -enable algorithms to learn from and be guided by non-expert users. In this -paper, we bring a human in the loop, and enable a human teacher to give -feedback to a learning agent in the form of natural language. We argue that a -descriptive sentence can provide a much stronger learning signal than a numeric -reward in that it can easily point to where the mistakes are and how to correct -them. We focus on the problem of image captioning in which the quality of the -output can easily be judged by non-experts. We propose a hierarchical -phrase-based captioning model trained with policy gradients, and design a -feedback network that provides reward to the learner by conditioning on the -human-provided feedback. We show that by exploiting descriptive feedback our -model learns to perform better than when given independently written human -captions. -" -4933,1706.00134,"Van-Khanh Tran, Le-Minh Nguyen","Semantic Refinement GRU-based Neural Language Generation for Spoken - Dialogue Systems",cs.CL," Natural language generation (NLG) plays a critical role in spoken dialogue -systems. This paper presents a new approach to NLG by using recurrent neural -networks (RNN), in which a gating mechanism is applied before RNN computation. -This allows the proposed model to generate appropriate sentences. The RNN-based -generator can be learned from unaligned data by jointly training sentence -planning and surface realization to produce natural language responses. The -model was extensively evaluated on four different NLG domains. The results show -that the proposed generator achieved better performance on all the NLG domains -compared to previous generators. -" -4934,1706.00139,"Van-Khanh Tran, Le-Minh Nguyen","Natural Language Generation for Spoken Dialogue System using RNN - Encoder-Decoder Networks",cs.CL," Natural language generation (NLG) is a critical component in a spoken -dialogue system. This paper presents a Recurrent Neural Network based -Encoder-Decoder architecture, in which an LSTM-based decoder is introduced to -select, aggregate semantic elements produced by an attention mechanism over the -input elements, and to produce the required utterances. The proposed generator -can be jointly trained both sentence planning and surface realization to -produce natural language sentences. The proposed model was extensively -evaluated on four different NLG datasets. 
The experimental results showed that
-the proposed generators not only consistently outperform the previous methods
-across all the NLG domains but also show an ability to generalize to a new,
-unseen domain and learn from multi-domain datasets.
-"
-4935,1706.00188,"Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, Vasudeva Varma",Deep Learning for Hate Speech Detection in Tweets,cs.CL cs.IR," Hate speech detection on Twitter is critical for applications like
-controversial event extraction, building AI chatterbots, content
-recommendation, and sentiment analysis. We define this task as being able to
-classify a tweet as racist, sexist or neither. The complexity of the natural
-language constructs makes this task very challenging. We perform extensive
-experiments with multiple deep learning architectures to learn semantic word
-embeddings to handle this complexity. Our experiments on a benchmark dataset of
-16K annotated tweets show that such deep learning methods outperform
-state-of-the-art char/word n-gram methods by ~18 F1 points.
-"
-4936,1706.00245,"Danijel Kor\v{z}inek, Krzysztof Marasek, {\L}ukasz Brocki and
- Krzysztof Wo{\l}k",Polish Read Speech Corpus for Speech Tools and Services,cs.CL," This paper describes the speech processing activities conducted at the Polish
-consortium of the CLARIN project. The purpose of this segment of the project
-was to develop specific tools that would allow for automatic and semi-automatic
-processing of large quantities of acoustic speech data. The tools include the
-following: grapheme-to-phoneme conversion, speech-to-text alignment, voice
-activity detection, speaker diarization, keyword spotting and automatic speech
-transcription. Furthermore, in order to develop these tools, a large
-high-quality studio speech corpus was recorded and released under an open
-license, to encourage development in the area of Polish speech research.
-Another purpose of the corpus was to serve as a reference for studies in
-phonetics and pronunciation. All the tools and resources were released on the
-Polish CLARIN website. This paper discusses the current status and future
-plans for the project.
-"
-4937,1706.00286,"Dzmitry Bahdanau, Tom Bosc, Stanis{\l}aw Jastrz\k{e}bski, Edward
- Grefenstette, Pascal Vincent, Yoshua Bengio",Learning to Compute Word Embeddings On the Fly,cs.LG cs.CL," Words in natural language follow a Zipfian distribution whereby some words
-are frequent but most are rare. Learning representations for words in the ""long
-tail"" of this distribution requires enormous amounts of data. Representations
-of rare words trained directly on end tasks are usually poor, requiring us to
-pre-train embeddings on external data, or treat all rare words as
-out-of-vocabulary words with a unique representation. We provide a method for
-predicting embeddings of rare words on the fly from small amounts of auxiliary
-data with a network trained end-to-end for the downstream task. We show that
-this improves results against baselines where embeddings are trained on the end
-task for reading comprehension, recognizing textual entailment and language
-modeling.
-"
-4938,1706.00290,"Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens
- Johannsmeier and Sebastian Stober",Transfer Learning for Speech Recognition on a Budget,cs.LG cs.CL cs.NE stat.ML," End-to-end training of automated speech recognition (ASR) systems requires
-massive data and compute resources. We explore transfer learning based on model
-adaptation as an approach for training ASR models under constrained GPU memory,
-throughput and training data. We conduct several systematic experiments
-adapting a Wav2Letter convolutional neural network originally trained for
-English ASR to the German language. We show that this technique allows faster
-training on consumer-grade resources while requiring less training data in
-order to achieve the same accuracy, thereby lowering the cost of training ASR
-models in other languages. Model introspection revealed that small adaptations
-to the network's weights were sufficient for good performance, especially for
-inner layers.
-"
-4939,1706.00321,"Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post,
- Paul McNamee",Using of heterogeneous corpora for training of an ASR system,cs.CL," The paper summarizes the development of the LVCSR system built as a part of
-the Pashto speech-translation system at the SCALE (Summer Camp for Applied
-Language Exploration) 2015 workshop on ""Speech-to-text-translation for
-low-resource languages"". The Pashto language was chosen as a good ""proxy""
-low-resource language, exhibiting multiple phenomena which make the development
-of speech-recognition and speech-to-text-translation systems hard.
- Even when the amount of data is seemingly sufficient, given the fact that the
-data originates from multiple sources, the preliminary experiments reveal that
-there is little to no benefit in merging (concatenating) the corpora and more
-elaborate ways of making use of all of the data must be worked out.
- This paper concentrates only on the LVCSR part and presents a range of
-different techniques that were found to be useful in order to benefit from
-multiple different corpora.
-"
-4940,1706.00359,"Yishu Miao, Edward Grefenstette, Phil Blunsom",Discovering Discrete Latent Topics with Neural Variational Inference,cs.CL cs.AI cs.IR cs.LG," Topic models have been widely explored as probabilistic generative models of
-documents. Traditional inference methods have sought closed-form derivations
-for updating the models; however, as the expressiveness of these models grows,
-so does the difficulty of performing fast and accurate inference over their
-parameters. This paper presents alternative neural approaches to topic
-modelling by providing parameterisable distributions over topics which permit
-training by backpropagation in the framework of neural variational inference.
-In addition, with the help of a stick-breaking construction, we propose a
-recurrent network that is able to discover a notionally unbounded number of
-topics, analogous to Bayesian non-parametric topic models. Experimental results
-on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the
-effectiveness and efficiency of these neural topic models.
-"
-4941,1706.00374,"Nikola Mrk\v{s}i\'c, Ivan Vuli\'c, Diarmuid \'O S\'eaghdha, Ira
- Leviant, Roi Reichart, Milica Ga\v{s}i\'c, Anna Korhonen and Steve Young","Semantic Specialisation of Distributional Word Vector Spaces using
- Monolingual and Cross-Lingual Constraints",cs.CL cs.AI cs.LG," We present Attract-Repel, an algorithm for improving the semantic quality of
-word vectors by injecting constraints extracted from lexical resources.
-Attract-Repel facilitates the use of constraints from mono- and cross-lingual
-resources, yielding semantically specialised cross-lingual vector spaces. Our
-evaluation shows that the method can make use of existing cross-lingual
-lexicons to construct high-quality vector spaces for a plethora of different
-languages, facilitating semantic transfer from high- to lower-resource ones.
-The effectiveness of our approach is demonstrated with state-of-the-art results
-on semantic similarity datasets in six languages. We next show that
-Attract-Repel-specialised vectors boost performance in the downstream task of
-dialogue state tracking (DST) across multiple languages. Finally, we show that
-cross-lingual vector spaces produced by our algorithm facilitate the training
-of multilingual DST models, which brings further performance improvements.
-"
-4942,1706.00377,"Ivan Vuli\'c, Nikola Mrk\v{s}i\'c, Roi Reichart, Diarmuid \'O
- S\'eaghdha, Steve Young, and Anna Korhonen","Morph-fitting: Fine-Tuning Word Vector Spaces with Simple
- Language-Specific Rules",cs.CL," Morphologically rich languages accentuate two properties of distributional
-vector space models: 1) the difficulty of inducing accurate representations for
-low-frequency word forms; and 2) insensitivity to distinct lexical relations
-that have similar distributional signatures. These effects are detrimental for
-language understanding systems, which may infer that 'inexpensive' is a
-rephrasing for 'expensive' or may not associate 'acquire' with 'acquires'. In
-this work, we propose a novel morph-fitting procedure which moves past the use
-of curated semantic lexicons for improving distributional vector spaces.
-Instead, our method injects morphological constraints generated using simple
-language-specific rules, pulling inflectional forms of the same word close
-together and pushing derivational antonyms far apart. In intrinsic evaluation
-over four languages, we show that our approach: 1) improves low-frequency word
-estimates; and 2) boosts the semantic quality of the entire word vector
-collection. Finally, we show that morph-fitted vectors yield large gains in the
-downstream task of dialogue state tracking, highlighting the importance of
-morphology for tackling long-tail phenomena in language understanding tasks.
-"
-4943,1706.00457,"Ozan Caglayan, Mercedes Garc\'ia-Mart\'inez, Adrien Bardet, Walid
- Aransa, Fethi Bougares, Lo\""ic Barrault","NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation
- Systems",cs.CL," In this paper, we present nmtpy, a flexible Python toolkit based on Theano
-for training Neural Machine Translation and other neural sequence-to-sequence
-architectures. nmtpy decouples the specification of a network from the training
-and inference utilities to simplify the addition of a new architecture and
-reduce the amount of boilerplate code to be written. nmtpy has been used for
-LIUM's top-ranked submissions to WMT Multimodal Machine Translation and News
-Translation tasks in 2016 and 2017.
-"
-4944,1706.00465,"Elodie Gauthier, Laurent Besacier, Sylvie Voisin",Machine Assisted Analysis of Vowel Length Contrasts in Wolof,cs.CL," Growing digital archives and improving algorithms for automatic analysis of
-text and speech create new opportunities for fundamental research in
-phonetics. Such empirical approaches allow statistical evaluation of a much
-larger set of hypotheses about phonetic variation and its conditioning factors
-(among them geographical / dialectal variants). This paper illustrates this
-vision and proposes to challenge automatic methods with the analysis of a
-phenomenon that is not easily observable: vowel length contrast. We focus on
-Wolof, an under-resourced language from Sub-Saharan Africa. In particular, we
-propose multiple features to make a fine evaluation of the degree of length
-contrast under different factors such as read vs. semi-spontaneous speech and
-standard vs. dialectal Wolof. Our measures, made fully automatically on more
-than 20k vowel tokens, show that our proposed features can highlight different
-degrees of contrast for each vowel considered. We notably show that contrast
-is weaker in semi-spontaneous speech and in a non-standard semi-spontaneous
-dialect.
-"
-4945,1706.00468,Kyle Richardson and Jonas Kuhn,Function Assistant: A Tool for NL Querying of APIs,cs.CL," In this paper, we describe Function Assistant, a lightweight Python-based
-toolkit for querying and exploring source code repositories using natural
-language. The toolkit is designed to help end-users of a target API quickly
-find information about functions through high-level natural language queries
-and descriptions. For a given text query and background API, the tool finds
-candidate functions by performing a translation from the text to known
-representations in the API using the semantic parsing approach of Richardson
-and Kuhn (2017). Translations are automatically learned from example text-code
-pairs in example APIs. The toolkit includes features for building translation
-pipelines and query engines for arbitrary source code projects. To explore this
-last feature, we perform new experiments on 27 well-known Python projects
-hosted on Github.
-"
-4946,1706.00506,"Onur Gungor, Eray Yildiz, Suzan Uskudarli, Tunga Gungor","Morphological Embeddings for Named Entity Recognition in Morphologically
- Rich Languages",cs.CL," In this work, we present new state-of-the-art results of 93.59% and 79.59%
-for Turkish and Czech named entity recognition based on the model of Lample et
-al. (2016). We contribute by proposing several schemes for representing the
-morphological analysis of a word in the context of named entity recognition. We
-show that a concatenation of this representation with the word and character
-embeddings improves the performance. The effect of these representation schemes
-on the tagging performance is also investigated.
-"
-4947,1706.00593,"Jooyeon Kim, Dongwoo Kim, Alice Oh","Joint Modeling of Topics, Citations, and Topical Authority in Academic
- Corpora",cs.CL cs.DL cs.SI," Much of scientific progress stems from previously published findings, but
-searching through the vast sea of scientific publications is difficult. We
-often rely on metrics of scholarly authority to find the prominent authors,
-but these authority indices do not differentiate authority based on research
-topics. We present Latent Topical-Authority Indexing (LTAI) for jointly
-modeling the topics, citations, and topical authority in a corpus of academic
-papers. Compared to previous models, LTAI differs in two main aspects. First,
-it explicitly models the generative process of the citations, rather than
-treating the citations as given. Second, it models each author's influence on
-citations of a paper based on the topics of the cited papers, as well as the
-citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS,
-and Citeseer. We compare the performance of LTAI against various baselines,
-ranging from latent Dirichlet allocation to more advanced models, including
-the author-link topic model and the dynamic author citation topic model.
The -results show that LTAI achieves improved accuracy over other similar models -when predicting words, citations and authors of publications. -" -4948,1706.00612,"Michael Neumann, Ngoc Thang Vu","Attentive Convolutional Neural Network based Speech Emotion Recognition: - A Study on the Impact of Input Features, Signal Length, and Acted Speech",cs.CL," Speech emotion recognition is an important and challenging task in the realm -of human-computer interaction. Prior work proposed a variety of models and -feature sets for training a system. In this work, we conduct extensive -experiments using an attentive convolutional neural network with multi-view -learning objective function. We compare system performance using different -lengths of the input signal, different types of acoustic features and different -types of emotion speech (improvised/scripted). Our experimental results on the -Interactive Emotional Motion Capture (IEMOCAP) database reveal that the -recognition performance strongly depends on the type of speech data independent -of the choice of input features. Furthermore, we achieved state-of-the-art -results on the improvised speech data of IEMOCAP. -" -4949,1706.00741,Sabrina Stehwien and Ngoc Thang Vu,"Prosodic Event Recognition using Convolutional Neural Networks with - Context Information",cs.CL," This paper demonstrates the potential of convolutional neural networks (CNN) -for detecting and classifying prosodic events on words, specifically pitch -accents and phrase boundary tones, from frame-based acoustic features. Typical -approaches use not only feature representations of the word in question but -also its surrounding context. We show that adding position features indicating -the current word benefits the CNN. In addition, this paper discusses the -generalization from a speaker-dependent modelling approach to a -speaker-independent setup. The proposed method is simple and efficient and -yields strong results not only in speaker-dependent but also -speaker-independent cases. -" -4950,1706.00884,"Shuhan Yuan, Xintao Wu, Yang Xiang","Task-specific Word Identification from Short Texts Using a Convolutional - Neural Network",cs.CL cs.IR cs.LG," Task-specific word identification aims to choose the task-related words that -best describe a short text. Existing approaches require well-defined seed words -or lexical dictionaries (e.g., WordNet), which are often unavailable for many -applications such as social discrimination detection and fake review detection. -However, we often have a set of labeled short texts where each short text has a -task-related class label, e.g., discriminatory or non-discriminatory, specified -by users or learned by classification algorithms. In this paper, we focus on -identifying task-specific words and phrases from short texts by exploiting -their class labels rather than using seed words or lexical dictionaries. We -consider the task-specific word and phrase identification as feature learning. -We train a convolutional neural network over a set of labeled texts and use -score vectors to localize the task-specific words and phrases. Experimental -results on sentiment word identification show that our approach significantly -outperforms existing methods. We further conduct two case studies to show the -effectiveness of our approach. One case study on a crawled tweets dataset -demonstrates that our approach can successfully capture the -discrimination-related words/phrases. 
The other case study on fake review -detection shows that our approach can identify the fake-review words/phrases. -" -4951,1706.00887,"Shuhan Yuan, Panpan Zheng, Xintao Wu, Yang Xiang",Wikipedia Vandal Early Detection: from User Behavior to User Embedding,cs.CR cs.CL cs.CY," Wikipedia is the largest online encyclopedia that allows anyone to edit -articles. In this paper, we propose the use of deep learning to detect vandals -based on their edit history. In particular, we develop a multi-source -long-short term memory network (M-LSTM) to model user behaviors by using a -variety of user edit aspects as inputs, including the history of edit reversion -information, edit page titles and categories. With M-LSTM, we can encode each -user into a low dimensional real vector, called user embedding. Meanwhile, as a -sequential model, M-LSTM updates the user embedding each time after the user -commits a new edit. Thus, we can predict whether a user is benign or vandal -dynamically based on the up-to-date user embedding. Furthermore, those user -embeddings are crucial to discover collaborative vandals. -" -4952,1706.00927,Su Zhu and Kai Yu,Concept Transfer Learning for Adaptive Language Understanding,cs.CL," Concept definition is important in language understanding (LU) adaptation -since literal definition difference can easily lead to data sparsity even if -different data sets are actually semantically correlated. To address this -issue, in this paper, a novel concept transfer learning approach is proposed. -Here, substructures within literal concept definition are investigated to -reveal the relationship between concepts. A hierarchical semantic -representation for concepts is proposed, where a semantic slot is represented -as a composition of {\em atomic concepts}. Based on this new hierarchical -representation, transfer learning approaches are developed for adaptive LU. The -approaches are applied to two tasks: value set mismatch and domain adaptation, -and evaluated on two LU benchmarks: ATIS and DSTC 2\&3. Thorough empirical -studies validate both the efficiency and effectiveness of the proposed method. -In particular, we achieve state-of-the-art performance ($F_1$-score 96.08\%) on -ATIS by only using lexicon features. -" -4953,1706.01038,"Danilo S. Carvalho, Duc-Vu Tran, Van-Khanh Tran, Le-Nguyen Minh","Improving Legal Information Retrieval by Distributional Composition with - Term Order Probabilities",cs.IR cs.CL," Legal professionals worldwide are currently trying to get up-to-pace with the -explosive growth in legal document availability through digital means. This -drives a need for high efficiency Legal Information Retrieval (IR) and Question -Answering (QA) methods. The IR task in particular has a set of unique -challenges that invite the use of semantic motivated NLP techniques. In this -work, a two-stage method for Legal Information Retrieval is proposed, combining -lexical statistics and distributional sentence representations in the context -of Competition on Legal Information Extraction/Entailment (COLIEE). The -combination is done with the use of disambiguation rules, applied over the -rankings obtained through n-gram statistics. After the ranking is done, its -results are evaluated for ambiguity, and disambiguation is done if a result is -decided to be unreliable for a given query. Competition and experimental -results indicate small gains in overall retrieval performance using the -proposed approach. 
Additionally, an analysis of error and improvement cases is
-presented for a better understanding of the contributions.
-"
-4954,1706.01069,"Xinyu Fu, Eugene Ch'ng, Uwe Aickelin, Simon See",CRNN: A Joint Neural Network for Redundancy Detection,cs.CL," This paper proposes a novel framework for detecting redundancy in supervised
-sentence categorisation. Unlike traditional singleton neural networks, our
-model combines a character-aware convolutional neural network (Char-CNN) with
-a character-aware recurrent neural network (Char-RNN) to form a convolutional
-recurrent neural network (CRNN). Our model benefits from Char-CNN in that only
-salient features are selected and fed into the integrated Char-RNN, which
-effectively learns long-sequence semantics via a sophisticated update
-mechanism. We compare our framework against state-of-the-art text
-classification algorithms on four popular benchmarking corpora. For instance,
-our model achieves a competitive precision rate, recall ratio, and F1 score on
-the Google-news dataset. For the twenty-news-groups data stream, our algorithm
-obtains the optimal precision rate, recall ratio, and F1 score. For the Brown
-Corpus, our framework obtains the best F1 score and an almost equivalent
-precision rate and recall ratio compared with the top competitor. For the
-question classification collection, CRNN produces the optimal recall rate and
-F1 score and a comparable precision rate. We also analyse the impact of three
-different RNN hidden recurrent cells on performance and their runtime
-efficiency, and observe that MGU achieves the optimal runtime with performance
-comparable to GRU and LSTM. For TFIDF-based algorithms, we experiment with
-word2vec, GloVe, and sent2vec embeddings and report their performance
-differences.
-"
-4955,1706.01084,"Ting Chen, Liangjie Hong, Yue Shi, Yizhou Sun",Joint Text Embedding for Personalized Content-based Recommendation,cs.IR cs.CL cs.LG," Learning a good representation of text is key to many recommendation
-applications. Examples include news recommendation, where texts to be
-recommended are constantly published every day. However, most existing
-recommendation techniques, such as matrix factorization based methods, mainly
-rely on interaction histories to learn representations of items. While latent
-factors of items can be learned effectively from user interaction data, in many
-cases, such data is not available, especially for newly emerged items.
- In this work, we aim to address the problem of personalized recommendation
-for completely new items with text information available. We cast the problem
-as a personalized text ranking problem and propose a general framework that
-combines text embedding with personalized recommendation. Users and textual
-content are embedded into a latent feature space. The text embedding function
-can be learned end-to-end by predicting user interactions with items. To
-alleviate sparsity in interaction data, and to leverage large amounts of text
-data with little or no user interactions, we further propose a joint text
-embedding model that incorporates unsupervised text embedding with a
-combination module. Experimental results show that our model can significantly
-improve the effectiveness of recommendation systems on real-world datasets.
-"
-4956,1706.01206,Ji Ho Park and Pascale Fung,"One-step and Two-step Classification for Abusive Language Detection on
- Twitter",cs.CL," Automatic abusive language detection is a difficult but important task for
-online social media. Our research explores a two-step approach, performing
-classification on abusive language first and then classifying it into specific
-types, and compares it with a one-step approach of doing one multi-class
-classification for detecting sexist and racist language. With a public English
-Twitter corpus of 20 thousand tweets covering sexism and racism, our approach
-shows a promising performance of 0.827 F-measure using HybridCNN in one step
-and 0.824 F-measure using logistic regression in two steps.
-"
-4957,1706.01322,Alexander Kuhnle and Ann Copestake,Deep learning evaluation using deep linguistic processing,cs.CL cs.AI cs.CV cs.LG," We discuss problems with the standard approaches to evaluation for tasks like
-visual question answering, and argue that artificial data can be used to
-address these as a complement to current practice. We demonstrate that with the
-help of existing 'deep' linguistic processing technology we are able to create
-challenging abstract datasets, which enable us to investigate the language
-understanding abilities of multimodal deep learning models in detail, as
-compared to a single performance value on a static and monolithic dataset.
-"
-4958,1706.01331,"Lara J. Martin, Prithviraj Ammanabrolu, Xinyu Wang, William Hancock,
- Shruti Singh, Brent Harrison, Mark O. Riedl","Event Representations for Automated Story Generation with Deep Neural
- Nets",cs.CL cs.AI cs.LG cs.NE," Automated story generation is the problem of automatically selecting a
-sequence of events, actions, or words that can be told as a story. We seek to
-develop a system that can generate stories by learning everything it needs to
-know from textual story corpora. To date, recurrent neural networks that learn
-language models at character, word, or sentence levels have had little success
-generating coherent stories. We explore the question of event representations
-that provide a mid-level of abstraction between words and sentences in order to
-retain the semantic information of the original data while minimizing event
-sparsity. We present a technique for preprocessing textual story data into
-event sequences. We then present a technique for automated story generation
-whereby we decompose the problem into the generation of successive events
-(event2event) and the generation of natural language sentences from events
-(event2sentence). We give empirical results comparing different event
-representations and their effects on event successor generation and the
-translation of events to natural language.
-"
-4959,1706.01340,"Robin Ruede, Markus M\""uller, Sebastian St\""uker, Alex Waibel","Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor",cs.CL cs.CV cs.HC cs.LG cs.SD," Using supporting backchannel (BC) cues can make human-computer interaction
-more social. BCs provide feedback from the listener to the speaker, indicating
-that the speaker is still being listened to. BCs can be expressed in different
-ways, depending on the modality of the interaction, for example as gestures or
-acoustic cues. In this work, we only considered acoustic cues. We propose an
-approach for detecting BC opportunities based on acoustic input features like
-power and pitch. While other works in the field rely on the use of a
-hand-written rule set or specialized features, we made use of artificial neural
-networks, which are capable of deriving higher-order features from input
-features themselves. In our setup, we first used a fully connected feed-forward
-network to establish an updated baseline in comparison to our previously
-proposed setup. We also extended this setup with Long Short-Term Memory (LSTM)
-networks, which have been shown to outperform feed-forward setups on various
-tasks. Our best system achieved an F1-score of 0.37 using power and pitch
-features. When linguistic information was added using word2vec, the score
-increased to 0.39.
-"
-4960,1706.01399,"Ofir Press, Amir Bar, Ben Bogin, Jonathan Berant, Lior Wolf","Language Generation with Recurrent Generative Adversarial Networks
- without Pre-training",cs.CL," Generative Adversarial Networks (GANs) have shown great promise recently in
-image generation. Training GANs for language generation has proven to be more
-difficult, because of the non-differentiable nature of generating text with
-recurrent neural networks. Consequently, past work has either resorted to
-pre-training with maximum-likelihood or used convolutional networks for
-generation. In this work, we show that recurrent neural networks can be trained
-to generate text with GANs from scratch using curriculum learning, by slowly
-teaching the model to generate sequences of increasing and variable length. We
-empirically show that our approach vastly improves the quality of generated
-sequences compared to a convolutional baseline.
-"
-4961,1706.01427,"Adam Santoro, David Raposo, David G.T. Barrett, Mateusz Malinowski,
- Razvan Pascanu, Peter Battaglia, Timothy Lillicrap",A simple neural network module for relational reasoning,cs.CL cs.LG," Relational reasoning is a central component of generally intelligent
-behavior, but has proven difficult for neural networks to learn. In this paper,
-we describe how to use Relation Networks (RNs) as a simple plug-and-play module
-to solve problems that fundamentally hinge on relational reasoning. We tested
-RN-augmented networks on three tasks: visual question answering using a
-challenging dataset called CLEVR, on which we achieve state-of-the-art,
-super-human performance; text-based question answering using the bAbI suite of
-tasks; and complex reasoning about dynamic physical systems. Then, using a
-curated dataset called Sort-of-CLEVR, we show that powerful convolutional
-networks do not have a general capacity to solve relational questions, but can
-gain this capacity when augmented with RNs. Our work shows how a deep learning
-architecture equipped with an RN module can implicitly discover and learn to
-reason about entities and their relations.
-"
-4962,1706.01450,Tong Wang and Xingdi Yuan and Adam Trischler,A Joint Model for Question Answering and Question Generation,cs.CL cs.AI cs.LG cs.NE," We propose a generative machine comprehension model that learns jointly to
-ask and answer questions based on documents. The proposed model uses a
-sequence-to-sequence framework that encodes the document and generates a
-question (answer) given an answer (question). Significant improvement in model
-performance is observed empirically on the SQuAD corpus, confirming our
-hypothesis that the model benefits from jointly learning to perform both tasks.
-We believe the joint model's novelty offers a new perspective on machine
-comprehension beyond architectural engineering, and serves as a first step
-towards autonomous information seeking.
-" -4963,1706.01554,"Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, Dhruv Batra","Best of Both Worlds: Transferring Knowledge from Discriminative Learning - to a Generative Visual Dialog Model",cs.CV cs.AI cs.CL," We present a novel training framework for neural sequence models, -particularly for grounded dialog generation. The standard training paradigm for -these models is maximum likelihood estimation (MLE), or minimizing the -cross-entropy of the human responses. Across a variety of domains, a recurring -problem with MLE trained generative neural dialog models (G) is that they tend -to produce 'safe' and generic responses (""I don't know"", ""I can't tell""). In -contrast, discriminative dialog models (D) that are trained to rank a list of -candidate human responses outperform their generative counterparts; in terms of -automatic metrics, diversity, and informativeness of the responses. However, D -is not useful in practice since it cannot be deployed to have real -conversations with users. - Our work aims to achieve the best of both worlds -- the practical usefulness -of G and the strong performance of D -- via knowledge transfer from D to G. Our -primary contribution is an end-to-end trainable generative visual dialog model, -where G receives gradients from D as a perceptual (not adversarial) loss of the -sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) -approximation to the discrete distribution -- specifically, an RNN augmented -with a sequence of GS samplers, coupled with the straight-through gradient -estimator to enable end-to-end differentiability. We also introduce a stronger -encoder for visual dialog, and employ a self-attention mechanism for answer -encoding along with a metric learning loss to aid D in better capturing -semantic similarities in answer responses. Overall, our proposed model -outperforms state-of-the-art on the VisDial dataset by a significant margin -(2.67% on recall@10). The source code can be downloaded from -https://github.com/jiasenlu/visDial.pytorch. -" -4964,1706.01556,Yifan Peng and Zhiyong Lu,"Deep learning for extracting protein-protein interactions from - biomedical literature",cs.CL cs.LG q-bio.QM," State-of-the-art methods for protein-protein interaction (PPI) extraction are -primarily feature-based or kernel-based by leveraging lexical and syntactic -information. But how to incorporate such knowledge in the recent deep learning -methods remains an open question. In this paper, we propose a multichannel -dependency-based convolutional neural network model (McDepCNN). It applies one -channel to the embedding vector of each word in the sentence, and another -channel to the embedding vector of the head of the corresponding word. -Therefore, the model can use richer information obtained from different -channels. Experiments on two public benchmarking datasets, AIMed and BioInfer, -demonstrate that McDepCNN compares favorably to the state-of-the-art -rich-feature and single-kernel based methods. In addition, McDepCNN achieves -24.4% relative improvement in F1-score over the state-of-the-art methods on -cross-corpus evaluation and 12% improvement in F1-score over kernel-based -methods on ""difficult"" instances. These results suggest that McDepCNN -generalizes more easily over different corpora, and is capable of capturing -long distance features in the sentences. 
-" -4965,1706.01570,Michael Bloodgood and Benjamin Strauss,"Acquisition of Translation Lexicons for Historically Unwritten Languages - via Bridging Loanwords",cs.CL," With the advent of informal electronic communications such as social media, -colloquial languages that were historically unwritten are being written for the -first time in heavily code-switched environments. We present a method for -inducing portions of translation lexicons through the use of expert knowledge -in these settings where there are approximately zero resources available other -than a language informant, potentially not even large amounts of monolingual -data. We investigate inducing a Moroccan Darija-English translation lexicon via -French loanwords bridging into English and find that a useful lexicon is -induced for human-assisted translation and statistical machine translation. -" -4966,1706.01678,"Shibhansh Dohare, Harish Karnick, Vivek Gupta",Text Summarization using Abstract Meaning Representation,cs.CL," With an ever increasing size of text present on the Internet, automatic -summary generation remains an important problem for natural language -understanding. In this work we explore a novel full-fledged pipeline for text -summarization with an intermediate step of Abstract Meaning Representation -(AMR). The pipeline proposed by us first generates an AMR graph of an input -story, through which it extracts a summary graph and finally, generate summary -sentences from this summary graph. Our proposed method achieves -state-of-the-art results compared to the other text summarization routines -based on AMR. We also point out some significant problems in the existing -evaluation methods, which make them unsuitable for evaluating summary quality. -" -4967,1706.01690,"Hannes Schulz, Jeremie Zumer, Layla El Asri, Shikhar Sharma",A Frame Tracking Model for Memory-Enhanced Dialogue Systems,cs.CL," Recently, resources and tasks were proposed to go beyond state tracking in -dialogue systems. An example is the frame tracking task, which requires -recording multiple frames, one for each user goal set during the dialogue. This -allows a user, for instance, to compare items corresponding to different goals. -This paper proposes a model which takes as input the list of frames created so -far during the dialogue, the current user utterance as well as the dialogue -acts, slot types, and slot values associated with this utterance. The model -then outputs the frame being referenced by each triple of dialogue act, slot -type, and slot value. We show that on the recently published Frames dataset, -this model significantly outperforms a previously proposed rule-based baseline. -In addition, we propose an extensive analysis of the frame tracking task by -dividing it into sub-tasks and assessing their difficulty with respect to our -model. -" -4968,1706.01723,Xiang Yu and Agnieszka Fale\'nska and Ngoc Thang Vu,A General-Purpose Tagger with Convolutional Neural Networks,cs.CL," We present a general-purpose tagger based on convolutional neural networks -(CNN), used for both composing word vectors and encoding context information. -The CNN tagger is robust across different tagging tasks: without task-specific -tuning of hyper-parameters, it achieves state-of-the-art results in -part-of-speech tagging, morphological tagging and supertagging. The CNN tagger -is also robust against the out-of-vocabulary problem, it performs well on -artificially unnormalized texts. 
-" -4969,1706.01740,Yoann Dupont and Marco Dinarelli and Isabelle Tellier,Label-Dependencies Aware Recurrent Neural Networks,cs.CL," In the last few years, Recurrent Neural Networks (RNNs) have proved effective -on several NLP tasks. Despite such great success, their ability to model -\emph{sequence labeling} is still limited. This lead research toward solutions -where RNNs are combined with models which already proved effective in this -domain, such as CRFs. In this work we propose a solution far simpler but very -effective: an evolution of the simple Jordan RNN, where labels are re-injected -as input into the network, and converted into embeddings, in the same way as -words. We compare this RNN variant to all the other RNN models, Elman and -Jordan RNN, LSTM and GRU, on two well-known tasks of Spoken Language -Understanding (SLU). Thanks to label embeddings and their combination at the -hidden layer, the proposed variant, which uses more parameters than Elman and -Jordan RNNs, but far fewer than LSTM and GRU, is more effective than other -RNNs, but also outperforms sophisticated CRF models. -" -4970,1706.01758,"Ming Li, Peilun Xiao, Ju Zhang",A WL-SPPIM Semantic Model for Document Classification,cs.CL cs.AI," In this paper, we explore SPPIM-based text classification method, and the -experiment reveals that the SPPIM method is equal to or even superior than SGNS -method in text classification task on three international and standard text -datasets, namely 20newsgroups, Reuters52 and WebKB. Comparing to SGNS, although -SPPMI provides a better solution, it is not necessarily better than SGNS in -text classification tasks. Based on our analysis, SGNS takes into the -consideration of weight calculation during decomposition process, so it has -better performance than SPPIM in some standard datasets. Inspired by this, we -propose a WL-SPPIM semantic model based on SPPIM model, and experiment shows -that WL-SPPIM approach has better classification and higher scalability in the -text classification task compared with LDA, SGNS and SPPIM approaches. -" -4971,1706.01839,Lawrence Phillips and Nathan Hodas,"Assessing the Linguistic Productivity of Unsupervised Deep Neural - Networks",cs.CL," Increasingly, cognitive scientists have demonstrated interest in applying -tools from deep learning. One use for deep learning is in language acquisition -where it is useful to know if a linguistic phenomenon can be learned through -domain-general means. To assess whether unsupervised deep learning is -appropriate, we first pose a smaller question: Can unsupervised neural networks -apply linguistic rules productively, using them in novel situations? We draw -from the literature on determiner/noun productivity by training an -unsupervised, autoencoder network measuring its ability to combine nouns with -determiners. Our simple autoencoder creates combinations it has not previously -encountered and produces a degree of overlap matching adults. While this -preliminary work does not provide conclusive evidence for productivity, it -warrants further investigation with more complex models. Further, this work -helps lay the foundations for future collaboration between the deep learning -and cognitive science communities. -" -4972,1706.01847,"John Wieting, Jonathan Mallinson, Kevin Gimpel",Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext,cs.CL," We consider the problem of learning general-purpose, paraphrastic sentence -embeddings in the setting of Wieting et al. (2016b). 
We use neural machine -translation to generate sentential paraphrases via back-translation of -bilingual sentence pairs. We evaluate the paraphrase pairs by their ability to -serve as training data for learning paraphrastic sentence embeddings. We find -that the data quality is stronger than prior work based on bitext and on par -with manually-written English paraphrase pairs, with the advantage that our -approach can scale up to generate large training sets for many languages and -domains. We experiment with several language pairs and data sources, and -develop a variety of data filtering techniques. In the process, we explore how -neural machine translation output differs from human-written sentences, finding -clear differences in length, the amount of repetition, and the use of rare -words. -" -4973,1706.01863,"Peter Sch\""uller and K\""ubra C{\i}ng{\i}ll{\i} and Ferit Tun\c{c}er - and Bar{\i}\c{s} G\""un S\""urmeli and Ay\c{s}eg\""ul Pekel and Ay\c{s}e Hande - Karatay and Hacer Ezgi Karaka\c{s}",Marmara Turkish Coreference Corpus and Coreference Resolution Baseline,cs.CL cs.AI," We describe the Marmara Turkish Coreference Corpus, which is an annotation of -the whole METU-Sabanci Turkish Treebank with mentions and coreference chains. -Collecting eight or more independent annotations for each document allowed for -fully automatic adjudication. We provide a baseline system for Turkish mention -detection and coreference resolution and evaluate it on the corpus. -" -4974,1706.01875,"Rishab Nithyanand, Brian Schaffner, Phillipa Gill",Measuring Offensive Speech in Online Political Discourse,cs.CL cs.CY cs.SI," The Internet and online forums such as Reddit have become an increasingly -popular medium for citizens to engage in political conversations. However, the -online disinhibition effect resulting from the ability to use pseudonymous -identities may manifest in the form of offensive speech, consequently making -political discussions more aggressive and polarizing than they already are. -Such environments may result in harassment and self-censorship from its -targets. In this paper, we present preliminary results from a large-scale -temporal measurement aimed at quantifying offensiveness in online political -discussions. - To enable our measurements, we develop and evaluate an offensive speech -classifier. We then use this classifier to quantify and compare offensiveness -in the political and general contexts. We perform our study using a database of -over 168M Reddit comments made by over 7M pseudonyms between January 2015 and -January 2017 -- a period covering several divisive political events including -the 2016 US presidential elections. -" -4975,1706.01967,"Keet Sugathadasa, Buddhi Ayesha, Nisansa de Silva, Amal Shehan Perera, - Vindula Jayawardana, Dimuthu Lakmal, Madhavi Perera","Synergistic Union of Word2Vec and Lexicon for Domain Specific Semantic - Similarity",cs.CL," Semantic similarity measures are an important part in Natural Language -Processing tasks. However Semantic similarity measures built for general use do -not perform well within specific domains. Therefore in this study we introduce -a domain specific semantic similarity measure that was created by the -synergistic union of word2vec, a word embedding method that is used for -semantic similarity calculation and lexicon based (lexical) semantic similarity -methods. 
We prove that this proposed methodology outperforms word embedding
-methods trained on a generic corpus, as well as methods trained on a
-domain-specific corpus that do not use lexical semantic similarity methods to
-augment the results. Further, we prove that text lemmatization can improve the
-performance of word embedding methods.
-"
-4976,1706.02027,"Duyu Tang, Nan Duan, Tao Qin, Zhao Yan and Ming Zhou",Question Answering and Question Generation as Dual Tasks,cs.CL," We study the problem of joint question answering (QA) and question generation
-(QG) in this paper.
- Our intuition is that QA and QG have intrinsic connections and these two
-tasks could improve each other.
- On one side, the QA model judges whether the generated question of a QG model
-is relevant to the answer.
- On the other side, the QG model provides the probability of generating a
-question given the answer, which is useful evidence that in turn facilitates
-QA.
- In this paper we regard QA and QG as dual tasks.
- We propose a training framework that trains the models of QA and QG
-simultaneously, and explicitly leverages their probabilistic correlation to
-guide the training process of both models.
- We implement a QG model based on sequence-to-sequence learning, and a QA
-model based on a recurrent neural network.
- As all the components of the QA and QG models are differentiable, all the
-parameters involved in these two models could be conventionally learned with
-back propagation.
- We conduct experiments on three datasets. Empirical results show that our
-training framework improves both QA and QG tasks.
- The improved QA model performs comparably with strong baseline approaches on
-all three datasets.
-"
-4977,1706.02095,Diego Molla-Aliod,"Macquarie University at BioASQ 5b -- Query-based Summarisation
- Techniques for Selecting the Ideal Answers",cs.CL," Macquarie University's contribution to the BioASQ challenge (Task 5b Phase B)
-focused on the use of query-based extractive summarisation techniques for the
-generation of the ideal answers. Four runs were submitted, with approaches
-ranging from a trivial system that selected the first $n$ snippets, to the use
-of deep learning approaches under a regression framework. Our experiments and
-the ROUGE results of the five test batches of BioASQ indicate surprisingly good
-results for the trivial approach. Overall, most of our runs on the first three
-test batches achieved the best ROUGE-SU4 results in the challenge.
-"
-4978,1706.02124,"Marian Tietz, Tayfun Alpay, Johannes Twiefel, Stefan Wermter",Semi-Supervised Phoneme Recognition with Recurrent Ladder Networks,cs.CL cs.LG cs.NE," Ladder networks are a notable new concept in the field of semi-supervised
-learning, showing state-of-the-art results in image recognition tasks while
-being compatible with many existing neural architectures. We present the
-recurrent ladder network, a novel modification of the ladder network, for
-semi-supervised learning of recurrent neural networks which we evaluate with a
-phoneme recognition task on the TIMIT corpus. Our results show that the model
-is able to consistently outperform the baseline and achieve fully-supervised
-baseline performance with only 75% of all labels, which demonstrates that the
-model is capable of using unsupervised data as an effective regulariser.
-"
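The recurrent ladder network of 1706.02124 above combines a supervised classification loss on labelled speech with a denoising reconstruction cost to which unlabelled data can also contribute. Below is a minimal sketch of such a combined objective; the layer sizes, input-level noise injection, single top-layer denoising cost, and lambda weighting are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a ladder-style semi-supervised objective for a recurrent model:
# cross-entropy on labelled frames plus a denoising reconstruction loss
# that unlabelled frames can also provide.
import torch
import torch.nn as nn

class RecurrentLadderSketch(nn.Module):
    def __init__(self, n_feats=40, n_hidden=128, n_phones=61, noise_std=0.3):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.GRU(n_feats, n_hidden, batch_first=True)
        self.classifier = nn.Linear(n_hidden, n_phones)
        self.denoiser = nn.Linear(n_hidden, n_hidden)

    def forward(self, x):
        clean, _ = self.encoder(x)                              # clean pass
        noisy, _ = self.encoder(x + self.noise_std * torch.randn_like(x))
        logits = self.classifier(noisy)                         # predict from noisy pass
        denoised = self.denoiser(noisy)                         # reconstruct clean states
        return logits, denoised, clean.detach()

model = RecurrentLadderSketch()
ce, mse, lam = nn.CrossEntropyLoss(), nn.MSELoss(), 0.5
x_lab, y_lab = torch.randn(8, 100, 40), torch.randint(0, 61, (8, 100))
x_unlab = torch.randn(8, 100, 40)                               # no labels needed

logits, den, cln = model(x_lab)
loss = ce(logits.reshape(-1, 61), y_lab.reshape(-1)) + lam * mse(den, cln)
_, den_u, cln_u = model(x_unlab)
loss = loss + lam * mse(den_u, cln_u)                           # unlabelled contribution
loss.backward()
```

The point of the sketch is only that the denoising term acts as a regulariser usable with or without labels, which is the property the abstract highlights.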
-4979,1706.02141,"Carlos G\'omez-Rodr\'iguez, Iago Alonso-Alonso, David Vilares","How Important is Syntactic Parsing Accuracy? An Empirical Evaluation on
- Rule-Based Sentiment Analysis",cs.CL cs.AI," Syntactic parsing, the process of obtaining the internal structure of
-sentences in natural languages, is a crucial task for artificial intelligence
-applications that need to extract meaning from natural language text or speech.
-Sentiment analysis is one example of an application for which parsing has
-recently proven useful.
- In recent years, there have been significant advances in the accuracy of
-parsing algorithms. In this article, we perform an empirical, task-oriented
-evaluation to determine how parsing accuracy influences the performance of a
-state-of-the-art rule-based sentiment analysis system that determines the
-polarity of sentences from their parse trees. In particular, we evaluate the
-system using four well-known dependency parsers, including both current models
-with state-of-the-art accuracy and less accurate models which, however,
-require fewer computational resources.
- The experiments show that all of the parsers produce similarly good results
-in the sentiment analysis task, without their accuracy having any relevant
-influence on the results. Since parsing is currently a task with a relatively
-high computational cost that varies strongly between algorithms, this suggests
-that sentiment analysis researchers and users should prioritize speed over
-accuracy when choosing a parser; and parsing researchers should investigate
-models that improve speed further, even at some cost to accuracy.
-"
-4980,1706.02222,"Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani and
- Satoshi Nakamura",Gated Recurrent Neural Tensor Network,cs.LG cs.CL stat.ML," Recurrent Neural Networks (RNNs), which are a powerful scheme for modeling
-temporal and sequential data, need to capture long-term dependencies on datasets
-and represent them in hidden layers with a powerful model to capture more
-information from inputs. For modeling long-term dependencies in a dataset, the
-gating mechanism concept can help RNNs remember and forget previous
-information. Representing the hidden layers of an RNN with more expressive
-operations (i.e., tensor products) helps it learn a more complex relationship
-between the current input and the previous hidden layer information. These
-ideas can generally improve RNN performance. In this paper, we propose a
-novel RNN architecture that combines the concepts of the gating mechanism and
-the tensor product in a single model. By combining these two concepts into a
-single RNN, our proposed models learn long-term dependencies by modeling with
-gating units and obtain more expressive and direct interaction between input
-and hidden layers using a tensor product on 3-dimensional array (tensor) weight
-parameters. We use the Long Short-Term Memory (LSTM) RNN and the Gated Recurrent
-Unit (GRU) RNN and combine them with a tensor product inside their formulations.
-Our proposed RNNs, which are called the Long Short-Term Memory Recurrent Neural
-Tensor Network (LSTMRNTN) and the Gated Recurrent Unit Recurrent Neural Tensor
-Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the
-tensor product. We conducted experiments with our proposed models on word-level
-and character-level language modeling tasks and found that our proposed
-models significantly improve performance compared to our baseline models.
-"
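As a rough illustration of the gated tensor-product idea in 1706.02222 above, the sketch below adds a bilinear (3-dimensional tensor) interaction between the input and the gated hidden state to a GRU-style cell. The gate placement, initialization scale, and cell layout are assumptions for illustration; the paper's exact formulation may differ.

```python
# GRU-style cell whose candidate activation includes a bilinear
# (tensor-product) term between the input and the reset-gated hidden state.
import torch
import torch.nn as nn

class TensorGRUCellSketch(nn.Module):
    def __init__(self, n_in, n_hid):
        super().__init__()
        self.gates = nn.Linear(n_in + n_hid, 2 * n_hid)    # update & reset gates
        self.lin = nn.Linear(n_in + n_hid, n_hid)          # standard candidate part
        # 3-d tensor weight: one bilinear form per hidden unit
        self.W_t = nn.Parameter(0.01 * torch.randn(n_hid, n_in, n_hid))

    def forward(self, x, h):
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], -1))), 2, -1)
        # direct input-hidden interaction through the tensor weight:
        bilinear = torch.einsum('bi,kij,bj->bk', x, self.W_t, r * h)
        h_tilde = torch.tanh(self.lin(torch.cat([x, r * h], -1)) + bilinear)
        return (1 - z) * h + z * h_tilde

cell = TensorGRUCellSketch(16, 32)
h = torch.zeros(4, 32)
for t in range(10):                 # unroll over a toy sequence
    h = cell(torch.randn(4, 16), h)
print(h.shape)                      # torch.Size([4, 32])
```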
-" -4981,1706.02241,"Denis Newman-Griffis, Albert M Lai, Eric Fosler-Lussier",Insights into Analogy Completion from the Biomedical Domain,cs.CL," Analogy completion has been a popular task in recent years for evaluating the -semantic properties of word embeddings, but the standard methodology makes a -number of assumptions about analogies that do not always hold, either in recent -benchmark datasets or when expanding into other domains. Through an analysis of -analogies in the biomedical domain, we identify three assumptions: that of a -Single Answer for any given analogy, that the pairs involved describe the Same -Relationship, and that each pair is Informative with respect to the other. We -propose modifying the standard methodology to relax these assumptions by -allowing for multiple correct answers, reporting MAP and MRR in addition to -accuracy, and using multiple example pairs. We further present BMASS, a novel -dataset for evaluating linguistic regularities in biomedical embeddings, and -demonstrate that the relationships described in the dataset pose significant -semantic challenges to current word embedding methods. -" -4982,1706.02256,"Ana Marasovi\'c, Leo Born, Juri Opitz and Anette Frank",A Mention-Ranking Model for Abstract Anaphora Resolution,cs.CL stat.ML," Resolving abstract anaphora is an important, but difficult task for text -understanding. Yet, with recent advances in representation learning this task -becomes a more tangible aim. A central property of abstract anaphora is that it -establishes a relation between the anaphor embedded in the anaphoric sentence -and its (typically non-nominal) antecedent. We propose a mention-ranking model -that learns how abstract anaphors relate to their antecedents with an -LSTM-Siamese Net. We overcome the lack of training data by generating -artificial anaphoric sentence--antecedent pairs. Our model outperforms -state-of-the-art results on shell noun resolution. We also report first -benchmark results on an abstract anaphora subset of the ARRAU corpus. This -corpus presents a greater challenge due to a mixture of nominal and pronominal -anaphors and a greater range of confounders. We found model variants that -outperform the baselines for nominal anaphors, without training on individual -anaphor data, but still lag behind for pronominal anaphors. Our model selects -syntactically plausible candidates and -- if disregarding syntax -- -discriminates candidates using deeper features. -" -4983,1706.02427,"Zhao Yan and Duyu Tang and Nan Duan and Junwei Bao and Yuanhua Lv and - Ming Zhou and Zhoujun Li",Content-Based Table Retrieval for Web Queries,cs.CL," Understanding the connections between unstructured text and semi-structured -table is an important yet neglected problem in natural language processing. In -this work, we focus on content-based table retrieval. Given a query, the task -is to find the most relevant table from a collection of tables. Further -progress towards improving this area requires powerful models of semantic -matching and richer training and evaluation resources. To remedy this, we -present a ranking based approach, and implement both carefully designed -features and neural network architectures to measure the relevance between a -query and the content of a table. Furthermore, we release an open-domain -dataset that includes 21,113 web queries for 273,816 tables. We conduct -comprehensive experiments on both real world and synthetic datasets. 
Results -verify the effectiveness of our approach and present the challenges for this -task. -" -4984,1706.02459,"Shuming Ma, Xu Sun, Jingjing Xu, Houfeng Wang, Wenjie Li, Qi Su","Improving Semantic Relevance for Sequence-to-Sequence Learning of - Chinese Social Media Text Summarization",cs.CL," Current Chinese social media text summarization models are based on an -encoder-decoder framework. Although its generated summaries are similar to -source texts literally, they have low semantic relevance. In this work, our -goal is to improve semantic relevance between source texts and summaries for -Chinese social media summarization. We introduce a Semantic Relevance Based -neural model to encourage high semantic similarity between texts and summaries. -In our model, the source text is represented by a gated attention encoder, -while the summary representation is produced by a decoder. Besides, the -similarity score between the representations is maximized during training. Our -experiments show that the proposed model outperforms baseline systems on a -social media corpus. -" -4985,1706.02490,"Karla Stepanova and Matej Hoffmann and Zdenek Straka and Frederico B. - Klein and Angelo Cangelosi and Michal Vavrecka","Where is my forearm? Clustering of body parts from simultaneous tactile - and linguistic input using sequential mapping",cs.NE cs.AI cs.CL cs.LG cs.RO," Humans and animals are constantly exposed to a continuous stream of sensory -information from different modalities. At the same time, they form more -compressed representations like concepts or symbols. In species that use -language, this process is further structured by this interaction, where a -mapping between the sensorimotor concepts and linguistic elements needs to be -established. There is evidence that children might be learning language by -simply disambiguating potential meanings based on multiple exposures to -utterances in different contexts (cross-situational learning). In existing -models, the mapping between modalities is usually found in a single step by -directly using frequencies of referent and meaning co-occurrences. In this -paper, we present an extension of this one-step mapping and introduce a newly -proposed sequential mapping algorithm together with a publicly available Matlab -implementation. For demonstration, we have chosen a less typical scenario: -instead of learning to associate objects with their names, we focus on body -representations. A humanoid robot is receiving tactile stimulations on its -body, while at the same time listening to utterances of the body part names -(e.g., hand, forearm and torso). With the goal at arriving at the correct ""body -categories"", we demonstrate how a sequential mapping algorithm outperforms -one-step mapping. In addition, the effect of data set size and noise in the -linguistic input are studied. -" -4986,1706.02496,Franziska Horn,Context encoders as a simple but powerful extension of word2vec,stat.ML cs.CL cs.LG," With a simple architecture and the ability to learn meaningful word -embeddings efficiently from texts containing billions of words, word2vec -remains one of the most popular neural language models used today. However, as -only a single embedding is learned for every word in the vocabulary, the model -fails to optimally represent words with multiple meanings. Additionally, it is -not possible to create embeddings for new (out-of-vocabulary) words on the -spot. 
Based on an intuitive interpretation of the continuous bag-of-words -(CBOW) word2vec model's negative sampling training objective in terms of -predicting context based similarities, we motivate an extension of the model we -call context encoders (ConEc). By multiplying the matrix of trained word2vec -embeddings with a word's average context vector, out-of-vocabulary (OOV) -embeddings and representations for a word with multiple meanings can be created -based on the word's local contexts. The benefits of this approach are -illustrated by using these word embeddings as features in the CoNLL 2003 named -entity recognition (NER) task. -" -4987,1706.02551,T.M. Sadykov and T.A. Zhukov,"The Algorithmic Inflection of Russian and Generation of Grammatically - Correct Text",cs.CL," We present a deterministic algorithm for Russian inflection. This algorithm -is implemented in a publicly available web-service www.passare.ru which -provides functions for inflection of single words, word matching and synthesis -of grammatically correct Russian text. The inflectional functions have been -tested against the annotated corpus of Russian language OpenCorpora. -" -4988,1706.02596,"Dirk Weissenborn, Tom\'a\v{s} Ko\v{c}isk\'y, Chris Dyer",Dynamic Integration of Background Knowledge in Neural NLU Systems,cs.CL cs.AI cs.NE," Common-sense and background knowledge is required to understand natural -language, but in most neural natural language understanding (NLU) systems, this -knowledge must be acquired from training corpora during learning, and then it -is static at test time. We introduce a new architecture for the dynamic -integration of explicit background knowledge in NLU models. A general-purpose -reading module reads background knowledge in the form of free-text statements -(together with task-specific text inputs) and yields refined word -representations to a task-specific NLU architecture that reprocesses the task -inputs with these representations. Experiments on document question answering -(DQA) and recognizing textual entailment (RTE) demonstrate the effectiveness -and flexibility of the approach. Analysis shows that our model learns to -exploit knowledge in a semantically appropriate way. -" -4989,1706.02737,"Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan","Advances in Joint CTC-Attention based End-to-End Speech Recognition with - a Deep CNN Encoder and RNN-LM",cs.CL," We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) -model. We learn to listen and write characters with a joint Connectionist -Temporal Classification (CTC) and attention-based encoder-decoder network. The -encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. -The CTC network sits on top of the encoder and is jointly trained with the -attention-based decoder. During the beam search process, we combine the CTC -predictions, the attention-based decoder predictions and a separately trained -LSTM language model. We achieve a 5-10\% error reduction compared to prior -systems on spontaneous Japanese and Chinese speech, and our end-to-end model -beats out traditional hybrid ASR systems. -" -4990,1706.02757,"Jekaterina Novikova, Christian Dondrup, Ioannis Papaioannou and Oliver - Lemon","Sympathy Begins with a Smile, Intelligence Begins with a Word: Use of - Multimodal Features in Spoken Human-Robot Interaction",cs.RO cs.CL cs.HC," Recognition of social signals, from human facial expressions or prosody of -speech, is a popular research topic in human-robot interaction studies. 
There -is also a long line of research in the spoken dialogue community that -investigates user satisfaction in relation to dialogue characteristics. -However, very little research relates a combination of multimodal social -signals and language features detected during spoken face-to-face human-robot -interaction to the resulting user perception of a robot. In this paper we show -how different emotional facial expressions of human users, in combination with -prosodic characteristics of human speech and features of human-robot dialogue, -correlate with users' impressions of the robot after a conversation. We find -that happiness in the user's recognised facial expression strongly correlates -with likeability of a robot, while dialogue-related features (such as number of -human turns or number of sentences per robot utterance) correlate with -perceiving a robot as intelligent. In addition, we show that facial expression, -emotional features, and prosody are better predictors of human ratings related -to perceived robot likeability and anthropomorphism, while linguistic and -non-linguistic features more often predict perceived robot intelligence and -interpretability. As such, these characteristics may in future be used as an -online reward signal for in-situ Reinforcement Learning based adaptive -human-robot dialogue systems. -" -4991,1706.02776,Matt Shannon,Optimizing expected word error rate via sampling for speech recognition,cs.CL cs.LG cs.NE stat.ML," State-level minimum Bayes risk (sMBR) training has become the de facto -standard for sequence-level training of speech recognition acoustic models. It -has an elegant formulation using the expectation semiring, and gives large -improvements in word error rate (WER) over models trained solely using -cross-entropy (CE) or connectionist temporal classification (CTC). sMBR -training optimizes the expected number of frames at which the reference and -hypothesized acoustic states differ. It may be preferable to optimize the -expected WER, but WER does not interact well with the expectation semiring, and -previous approaches based on computing expected WER exactly involve expanding -the lattices used during training. In this paper we show how to perform -optimization of the expected WER by sampling paths from the lattices used -during conventional sMBR training. The gradient of the expected WER is itself -an expectation, and so may be approximated using Monte Carlo sampling. We show -experimentally that optimizing WER during acoustic model training gives 5% -relative improvement in WER over a well-tuned sMBR baseline on a 2-channel -query recognition task (Google Home). -" -4992,1706.02807,"Lifu Tu, Kevin Gimpel, Karen Livescu",Learning to Embed Words in Context for Syntactic Tasks,cs.CL," We present models for embedding words in the context of surrounding words. -Such models, which we refer to as token embeddings, represent the -characteristics of a word that are specific to a given context, such as word -sense, syntactic category, and semantic role. We explore simple, efficient -token embedding models based on standard neural network architectures. We learn -token embeddings on a large amount of unannotated text and evaluate them as -features for part-of-speech taggers and dependency parsers trained on much -smaller amounts of annotated data. We find that predictors endowed with token -embeddings consistently outperform baseline predictors across a range of -context window and training set sizes. 
-" -4993,1706.02861,"Qiao Qian, Minlie Huang, Haizhou Zhao, Jingfang Xu, Xiaoyan Zhu","Assigning personality/identity to a chatting machine for coherent - conversation generation",cs.CL," Endowing a chatbot with personality or an identity is quite challenging but -critical to deliver more realistic and natural conversations. In this paper, we -address the issue of generating responses that are coherent to a pre-specified -agent profile. We design a model consisting of three modules: a profile -detector to decide whether a post should be responded using the profile and -which key should be addressed, a bidirectional decoder to generate responses -forward and backward starting from a selected profile value, and a position -detector that predicts a word position from which decoding should start given a -selected profile value. We show that general conversation data from social -media can be used to generate profile-coherent responses. Manual and automatic -evaluation shows that our model can deliver more coherent, natural, and -diversified responses. -" -4994,1706.02883,"Xipeng Qiu, Jingjing Gong, Xuanjing Huang","Overview of the NLPCC 2017 Shared Task: Chinese News Headline - Categorization",cs.CL," In this paper, we give an overview for the shared task at the CCF Conference -on Natural Language Processing \& Chinese Computing (NLPCC 2017): Chinese News -Headline Categorization. The dataset of this shared task consists 18 classes, -12,000 short texts along with corresponded labels for each class. The dataset -and example code can be accessed at -https://github.com/FudanNLP/nlpcc2017_news_headline_categorization. -" -4995,1706.02901,"Che-Wei Huang, Shrikanth. S. Narayanan","Characterizing Types of Convolution in Deep Convolutional Recurrent - Neural Networks for Robust Speech Emotion Recognition",cs.LG cs.CL cs.MM cs.SD," Deep convolutional neural networks are being actively investigated in a wide -range of speech and audio processing applications including speech recognition, -audio event detection and computational paralinguistics, owing to their ability -to reduce factors of variations, for learning from speech. However, studies -have suggested to favor a certain type of convolutional operations when -building a deep convolutional neural network for speech applications although -there has been promising results using different types of convolutional -operations. In this work, we study four types of convolutional operations on -different input features for speech emotion recognition under noisy and clean -conditions in order to derive a comprehensive understanding. Since affective -behavioral information has been shown to reflect temporally varying of mental -state and convolutional operation are applied locally in time, all deep neural -networks share a deep recurrent sub-network architecture for further temporal -modeling. We present detailed quantitative module-wise performance analysis to -gain insights into information flows within the proposed architectures. In -particular, we demonstrate the interplay of affective information and the other -irrelevant information during the progression from one module to another. -Finally we show that all of our deep neural networks provide state-of-the-art -performance on the eNTERFACE'05 corpus. 
-" -4996,1706.02909,"Vindula Jayawardana, Dimuthu Lakmal, Nisansa de Silva, Amal Shehan - Perera, Keet Sugathadasa, Buddhi Ayesha","Deriving a Representative Vector for Ontology Classes with Instance Word - Vector Embeddings",cs.CL," Selecting a representative vector for a set of vectors is a very common -requirement in many algorithmic tasks. Traditionally, the mean or median vector -is selected. Ontology classes are sets of homogeneous instance objects that can -be converted to a vector space by word vector embeddings. This study proposes a -methodology to derive a representative vector for ontology classes whose -instances were converted to the vector space. We start by deriving five -candidate vectors which are then used to train a machine learning model that -would calculate a representative vector for the class. We show that our -methodology out-performs the traditional mean and median vector -representations. -" -4997,1706.03059,"Lukasz Kaiser, Aidan N. Gomez, Francois Chollet",Depthwise Separable Convolutions for Neural Machine Translation,cs.CL cs.LG," Depthwise separable convolutions reduce the number of parameters and -computation used in convolutional operations while increasing representational -efficiency. They have been shown to be successful in image classification -models, both in obtaining better models than previously possible for a given -parameter count (the Xception architecture) and considerably reducing the -number of parameters required to perform at a given level (the MobileNets -family of architectures). Recently, convolutional sequence-to-sequence networks -have been applied to machine translation tasks with good results. In this work, -we study how depthwise separable convolutions can be applied to neural machine -translation. We introduce a new architecture inspired by Xception and ByteNet, -called SliceNet, which enables a significant reduction of the parameter count -and amount of computation needed to obtain results like ByteNet, and, with a -similar parameter count, achieves new state-of-the-art results. In addition to -showing that depthwise separable convolutions perform well for machine -translation, we investigate the architectural changes that they enable: we -observe that thanks to depthwise separability, we can increase the length of -convolution windows, removing the need for filter dilation. We also introduce a -new ""super-separable"" convolution operation that further reduces the number of -parameters and computational cost for obtaining state-of-the-art results. -" -4998,1706.03146,"Shuai Tang, Hailin Jin, Chen Fang, Zhaowen Wang, Virginia R. de Sa",Rethinking Skip-thought: A Neighborhood based Approach,cs.CL cs.AI cs.NE," We study the skip-thought model with neighborhood information as weak -supervision. More specifically, we propose a skip-thought neighbor model to -consider the adjacent sentences as a neighborhood. We train our skip-thought -neighbor model on a large corpus with continuous sentences, and then evaluate -the trained model on 7 tasks, which include semantic relatedness, paraphrase -detection, and classification benchmarks. Both quantitative comparison and -qualitative investigation are conducted. We empirically show that, our -skip-thought neighbor model performs as well as the skip-thought model on -evaluation tasks. In addition, we found that, incorporating an autoencoder path -in our model didn't aid our model to perform better, while it hurts the -performance of the skip-thought model. 
-" -4999,1706.03148,"Shuai Tang, Hailin Jin, Chen Fang, Zhaowen Wang, Virginia R. de Sa",Trimming and Improving Skip-thought Vectors,cs.CL," The skip-thought model has been proven to be effective at learning sentence -representations and capturing sentence semantics. In this paper, we propose a -suite of techniques to trim and improve it. First, we validate a hypothesis -that, given a current sentence, inferring the previous and inferring the next -sentence provide similar supervision power, therefore only one decoder for -predicting the next sentence is preserved in our trimmed skip-thought model. -Second, we present a connection layer between encoder and decoder to help the -model to generalize better on semantic relatedness tasks. Third, we found that -a good word embedding initialization is also essential for learning better -sentence representations. We train our model unsupervised on a large corpus -with contiguous sentences, and then evaluate the trained model on 7 supervised -tasks, which includes semantic relatedness, paraphrase detection, and text -classification benchmarks. We empirically show that, our proposed model is a -faster, lighter-weight and equally powerful alternative to the original -skip-thought model. -" -5000,1706.03191,Shadi Diab and Badie Sartawi,"Classification of Questions and Learning Outcome Statements (LOS) Into - Blooms Taxonomy (BT) By Similarity Measurements Towards Extracting Of - Learning Outcome from Learning Material",cs.CL," Blooms Taxonomy (BT) have been used to classify the objectives of learning -outcome by dividing the learning into three different domains; the cognitive -domain, the effective domain and the psychomotor domain. In this paper, we are -introducing a new approach to classify the questions and learning outcome -statements (LOS) into Blooms taxonomy (BT) and to verify BT verb lists, which -are being cited and used by academicians to write questions and (LOS). An -experiment was designed to investigate the semantic relationship between the -action verbs used in both questions and LOS to obtain more accurate -classification of the levels of BT. A sample of 775 different action verbs -collected from different universities allows us to measure an accurate and -clear-cut cognitive level for the action verb. It is worth mentioning that -natural language processing techniques were used to develop our rules as to -induce the questions into chunks in order to extract the action verbs. Our -proposed solution was able to classify the action verb into a precise level of -the cognitive domain. We, on our side, have tested and evaluated our proposed -solution using confusion matrix. The results of evaluation tests yielded 97% -for the macro average of precision and 90% for F1. Thus, the outcome of the -research suggests that it is crucial to analyse and verify the action verbs -cited and used by academicians to write LOS and classify their questions based -on blooms taxonomy in order to obtain a definite and more accurate -classification. -" -5001,1706.03196,"\'Alvaro Peris, Luis Cebri\'an and Francisco Casacuberta",Online Learning for Neural Machine Translation Post-editing,cs.LG cs.CL," Neural machine translation has meant a revolution of the field. Nevertheless, -post-editing the outputs of the system is mandatory for tasks requiring high -translation quality. 
-translation quality. Post-editing offers a unique opportunity for improving
-neural machine translation systems, using online learning techniques and
-treating the post-edited translations as new, fresh training data. We review
-classical learning methods and propose a new optimization algorithm. We
-thoroughly compare online learning algorithms in a post-editing scenario.
-Results show significant improvements in translation quality and effort
-reduction.
-"
-5002,1706.03216,"Johan Sjons, Thomas H\""orberg, Robert \""Ostling, Johannes Bjerva","Articulation rate in Swedish child-directed speech increases as a
- function of the age of the child even when surprisal is controlled for",cs.CL," In earlier work, we have shown that articulation rate in Swedish
-child-directed speech (CDS) increases as a function of the age of the child,
-even when utterance length and differences in articulation rate between
-subjects are controlled for. In this paper we show on utterance level in
-spontaneous Swedish speech that i) for the youngest children, articulation rate
-in CDS is lower than in adult-directed speech (ADS), ii) there is a significant
-negative correlation between articulation rate and surprisal (the negative log
-probability) in ADS, and iii) the increase in articulation rate in Swedish CDS
-as a function of the age of the child holds, even when surprisal along with
-utterance length and differences in articulation rate between speakers are
-controlled for. These results indicate that adults adjust their articulation
-rate to make it fit the linguistic capacity of the child.
-"
-5003,1706.03335,Amber Nigam,Exploring Automated Essay Scoring for Nonnative English Speakers,cs.CL," Automated Essay Scoring (AES) has been quite popular and is being widely
-used. However, the lack of an appropriate methodology for rating nonnative
-English speakers' essays has meant lopsided advancement in this field. In this
-paper, we report initial results of our experiments with nonnative AES that
-learns from manual evaluation of nonnative essays. For this purpose, we
-conducted an exercise in which essays written by nonnative English speakers in
-a test environment were rated both manually and by the automated system
-designed for the experiment. In the process, we experimented with a few
-features to learn about nuances linked to nonnative evaluation. The proposed
-methodology of automated essay evaluation has yielded a correlation coefficient
-of 0.750 with the manual evaluation.
-"
-5004,1706.03357,"Anssi Yli-Jyr\""a, Carlos G\'omez-Rodr\'iguez","Generic Axiomatization of Families of Noncrossing Graphs in Dependency
- Parsing",cs.CL cs.FL," We present a simple encoding for unlabeled noncrossing graphs and show how
-its latent counterpart helps us to represent several families of directed and
-undirected graphs used in syntactic and semantic parsing of natural language as
-context-free languages. The families are separated purely on the basis of
-forbidden patterns in latent encoding, eliminating the need to differentiate
-the families of non-crossing graphs in inference algorithms: one algorithm
-works for all when the search space can be controlled in parser input.
-"
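The surprisal measure used in 1706.03216 above is the negative log probability of a word given its context. A toy illustration with a made-up corpus and an add-one-smoothed bigram estimate (the corpus, sentence, and smoothing choice are all illustrative assumptions):

```python
# Per-word surprisal, surprisal(w) = -log2 p(w | previous word),
# estimated from a tiny bigram model with add-one smoothing.
import math
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def surprisal(prev, word):
    v = len(unigrams)                       # observed vocabulary size
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + v)
    return -math.log2(p)

for prev, word in zip(corpus, corpus[1:]):
    print(f"{prev:>4} -> {word:<4} surprisal = {surprisal(prev, word):.2f} bits")
```

Predictable continuations (e.g. "the" followed by "cat") come out with low surprisal, while rarer transitions cost more bits, which is exactly the quantity the abstract controls for.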
-" -5005,1706.03367,"Daniel Fern\'andez-Gonz\'alez, Carlos G\'omez-Rodr\'iguez","A Full Non-Monotonic Transition System for Unrestricted Non-Projective - Parsing",cs.CL," Restricted non-monotonicity has been shown beneficial for the projective -arc-eager dependency parser in previous research, as posterior decisions can -repair mistakes made in previous states due to the lack of information. In this -paper, we propose a novel, fully non-monotonic transition system based on the -non-projective Covington algorithm. As a non-monotonic system requires -exploration of erroneous actions during the training process, we develop -several non-monotonic variants of the recently defined dynamic oracle for the -Covington parser, based on tight approximations of the loss. Experiments on -datasets from the CoNLL-X and CoNLL-XI shared tasks show that a non-monotonic -dynamic oracle outperforms the monotonic version in the majority of languages. -" -5006,1706.03441,"Vinodkumar Prabhakaran, Owen Rambow","Dialog Structure Through the Lens of Gender, Gender Environment, and - Power",cs.CL," Understanding how the social context of an interaction affects our dialog -behavior is of great interest to social scientists who study human behavior, as -well as to computer scientists who build automatic methods to infer those -social contexts. In this paper, we study the interaction of power, gender, and -dialog behavior in organizational interactions. In order to perform this study, -we first construct the Gender Identified Enron Corpus of emails, in which we -semi-automatically assign the gender of around 23,000 individuals who authored -around 97,000 email messages in the Enron corpus. This corpus, which is made -freely available, is orders of magnitude larger than previously existing gender -identified corpora in the email domain. Next, we use this corpus to perform a -large-scale data-oriented study of the interplay of gender and manifestations -of power. We argue that, in addition to one's own gender, the ""gender -environment"" of an interaction, i.e., the gender makeup of one's interlocutors, -also affects the way power is manifested in dialog. We focus especially on -manifestations of power in the dialog structure --- both, in a shallow sense -that disregards the textual content of messages (e.g., how often do the -participants contribute, how often do they get replies etc.), as well as the -structure that is expressed within the textual content (e.g., who issues -requests and how are they made, whose requests get responses etc.). We find -that both gender and gender environment affect the ways power is manifested in -dialog, resulting in patterns that reveal the underlying factors. Finally, we -show the utility of gender information in the problem of automatically -predicting the direction of power between pairs of participants in email -interactions. -" -5007,1706.03449,"Arman Cohan, Nazli Goharian","Scientific document summarization via citation contextualization and - scientific discourse",cs.CL cs.DL," The rapid growth of scientific literature has made it difficult for the -researchers to quickly learn about the developments in their respective fields. -Scientific document summarization addresses this challenge by providing -summaries of the important contributions of scientific papers. We present a -framework for scientific summarization which takes advantage of the citations -and the scientific discourse structure. 
Citation texts often lack the evidence -and context to support the content of the cited paper and are even sometimes -inaccurate. We first address the problem of inaccuracy of the citation texts by -finding the relevant context from the cited paper. We propose three approaches -for contextualizing citations which are based on query reformulation, word -embeddings, and supervised learning. We then train a model to identify the -discourse facets for each citation. We finally propose a method for summarizing -scientific papers by leveraging the faceted citations and their corresponding -contexts. We evaluate our proposed method on two scientific summarization -datasets in the biomedical and computational linguistics domains. Extensive -evaluation results show that our methods can improve over the state of the art -by large margins. -" -5008,1706.03499,"Robert \""Ostling and Johannes Bjerva","SU-RUG at the CoNLL-SIGMORPHON 2017 shared task: Morphological - Inflection with Attentional Sequence-to-Sequence Models",cs.CL," This paper describes the Stockholm University/University of Groningen -(SU-RUG) system for the SIGMORPHON 2017 shared task on morphological -inflection. Our system is based on an attentional sequence-to-sequence neural -network model using Long Short-Term Memory (LSTM) cells, with joint training of -morphological inflection and the inverse transformation, i.e. lemmatization and -morphological analysis. Our system outperforms the baseline with a large -margin, and our submission ranks as the 4th best team for the track we -participate in (task 1, high-resource). -" -5009,1706.03530,Ildik\'o Pil\'an and Elena Volodina and Lars Borin,"Candidate sentence selection for language learning exercises: from a - comprehensive framework to an empirical evaluation",cs.CL," We present a framework and its implementation relying on Natural Language -Processing methods, which aims at the identification of exercise item -candidates from corpora. The hybrid system combining heuristics and machine -learning methods includes a number of relevant selection criteria. We focus on -two fundamental aspects: linguistic complexity and the dependence of the -extracted sentences on their original context. Previous work on exercise -generation addressed these two criteria only to a limited extent, and a refined -overall candidate sentence selection framework appears also to be lacking. In -addition to a detailed description of the system, we present the results of an -empirical evaluation conducted with language teachers and learners which -indicate the usefulness of the system for educational purposes. We have -integrated our system into a freely available online learning platform. -" -5010,1706.03542,"Emile Enguehard, Yoav Goldberg and Tal Linzen",Exploring the Syntactic Abilities of RNNs with Multi-task Learning,cs.CL," Recent work has explored the syntactic abilities of RNNs using the -subject-verb agreement task, which diagnoses sensitivity to sentence structure. -RNNs performed this task well in common cases, but faltered in complex -sentences (Linzen et al., 2016). We test whether these errors are due to -inherent limitations of the architecture or to the relatively indirect -supervision provided by most agreement dependencies in a corpus. We trained a -single RNN to perform both the agreement task and an additional task, either -CCG supertagging or language modeling. 
Multi-task training led to significantly
-lower error rates, in particular on complex sentences, suggesting that RNNs
-have the ability to evolve more sophisticated syntactic representations than
-shown before. We also show that easily available agreement training data can
-improve performance on other syntactic tasks, in particular when only a limited
-amount of training data is available for those tasks. The multi-task paradigm
-can also be leveraged to inject grammatical knowledge into language models.
-"
-5011,1706.03610,"Georg Wiese, Dirk Weissenborn, Mariana Neves",Neural Domain Adaptation for Biomedical Question Answering,cs.CL cs.AI cs.NE," Factoid question answering (QA) has recently benefited from the development
-of deep learning (DL) systems. Neural network models outperform traditional
-approaches in domains where large datasets exist, such as SQuAD (ca. 100,000
-questions) for Wikipedia articles. However, these systems have not yet been
-applied to QA in more specific domains, such as biomedicine, because datasets
-are generally too small to train a DL system from scratch. For example, the
-BioASQ dataset for biomedical QA comprises fewer than 900 factoid (single
-answer) and list (multiple answers) QA instances. In this work, we adapt a
-neural QA system trained on a large open-domain dataset (SQuAD, source) to a
-biomedical dataset (BioASQ, target) by employing various transfer learning
-techniques. Our network architecture is based on a state-of-the-art QA system,
-extended with biomedical word embeddings and a novel mechanism to answer list
-questions. In contrast to existing biomedical QA systems, our system does not
-rely on domain-specific ontologies, parsers or entity taggers, which are
-expensive to create. Despite this fact, our systems achieve state-of-the-art
-results on factoid questions and competitive results on list questions.
-"
-5012,1706.03747,"Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur","Acoustic data-driven lexicon learning based on a greedy pronunciation
- selection framework",cs.CL," Speech recognition systems for irregularly-spelled languages like English
-normally require hand-written pronunciations. In this paper, we describe a
-system for automatically obtaining pronunciations of words for which
-pronunciations are not available, but for which transcribed data exists. Our
-method integrates information from the letter sequence and from the acoustic
-evidence. The novel aspect of the problem that we address is how to prune
-entries from such a lexicon (since, empirically, lexicons with too many entries
-do not tend to be good for ASR performance). Experiments on various ASR tasks
-show that, with the proposed framework, starting with an initial lexicon of
-several thousand words, we are able to learn a lexicon which performs close to
-a full expert lexicon in terms of WER performance on test data, and is better
-than lexicons built using G2P alone or with a pruning criterion based on
-pronunciation probability.
-"
-5013,1706.03757,"Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas",Semantic Entity Retrieval Toolkit,cs.CL cs.AI cs.IR," Unsupervised learning of low-dimensional, semantic representations of words
-and entities has recently gained attention. In this paper we describe the
-Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our
-previously published entity representation models.
The toolkit provides a -unified interface to different representation learning algorithms, fine-grained -parsing configuration and can be used transparently with GPUs. In addition, -users can easily modify existing models or implement their own models in the -framework. After model training, SERT can be used to rank entities according to -a textual query and extract the learned entity/word representation for use in -downstream algorithms, such as clustering or recommendation. -" -5014,1706.03762,"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion - Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin",Attention Is All You Need,cs.CL cs.LG," The dominant sequence transduction models are based on complex recurrent or -convolutional neural networks in an encoder-decoder configuration. The best -performing models also connect the encoder and decoder through an attention -mechanism. We propose a new simple network architecture, the Transformer, based -solely on attention mechanisms, dispensing with recurrence and convolutions -entirely. Experiments on two machine translation tasks show these models to be -superior in quality while being more parallelizable and requiring significantly -less time to train. Our model achieves 28.4 BLEU on the WMT 2014 -English-to-German translation task, improving over the existing best results, -including ensembles by over 2 BLEU. On the WMT 2014 English-to-French -translation task, our model establishes a new single-model state-of-the-art -BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction -of the training costs of the best models from the literature. We show that the -Transformer generalizes well to other tasks by applying it successfully to -English constituency parsing both with large and limited training data. -" -5015,1706.03799,"Maxwell Forbes, Yejin Choi",Verb Physics: Relative Physical Knowledge of Actions and Objects,cs.CL," Learning commonsense knowledge from natural language text is nontrivial due -to reporting bias: people rarely state the obvious, e.g., ""My house is bigger -than me."" However, while rarely stated explicitly, this trivial everyday -knowledge does influence the way people talk about the world, which provides -indirect clues to reason about the world. For example, a statement like, ""Tyler -entered his house"" implies that his house is bigger than Tyler. - In this paper, we present an approach to infer relative physical knowledge of -actions and objects along five dimensions (e.g., size, weight, and strength) -from unstructured natural language text. We frame knowledge acquisition as -joint inference over two closely related problems: learning (1) relative -physical knowledge of object pairs and (2) physical implications of actions -when applied to those object pairs. Empirical results demonstrate that it is -possible to extract knowledge of actions and objects from language and that -joint inference over different types of knowledge improves performance. -" -5016,1706.03815,"Afra Alishahi, Marie Barking, Grzegorz Chrupa{\l}a",Encoding of phonology in a recurrent neural model of grounded speech,cs.CL cs.LG cs.SD," We study the representation and encoding of phonemes in a recurrent neural -network model of grounded speech. We use a model which processes images and -their spoken descriptions, and projects the visual and auditory representations -into the same semantic space. 
We perform a number of analyses on how
-information about individual phonemes is encoded in the MFCC features extracted
-from the speech signal, and the activations of the layers of the model. Via
-experiments with phoneme decoding and phoneme discrimination we show that
-phoneme representations are most salient in the lower layers of the model,
-where low-level signals are processed at a fine-grained level, although a large
-amount of phonological information is retained at the top recurrent layer. We
-further find that the attention mechanism following the top recurrent layer
-significantly attenuates the encoding of phonology and makes the utterance
-embeddings much more invariant to synonymy. Moreover, a hierarchical clustering
-of phoneme representations learned by the network shows an organizational
-structure of phonemes similar to those proposed in linguistics.
-"
-5017,1706.03818,"Shane Settle, Keith Levin, Herman Kamper, Karen Livescu","Query-by-Example Search with Discriminative Neural Acoustic Word
- Embeddings",cs.CL," Query-by-example search often uses dynamic time warping (DTW) for comparing
-queries and proposed matching segments. Recent work has shown that comparing
-speech segments by representing them as fixed-dimensional vectors --- acoustic
-word embeddings --- and measuring their vector distance (e.g., cosine distance)
-can discriminate between words more accurately than DTW-based approaches. We
-consider an approach to query-by-example search that embeds both the query and
-database segments according to a neural model, followed by nearest-neighbor
-search to find the matching segments. Earlier work on embedding-based
-query-by-example, using template-based acoustic word embeddings, achieved
-competitive performance. We find that our embeddings, based on recurrent neural
-networks trained to optimize word discrimination, achieve substantial
-improvements in performance and run-time efficiency over the previous
-approaches.
-"
-5018,1706.03824,"Baskaran Sankaran, Markus Freitag and Yaser Al-Onaizan",Attention-based Vocabulary Selection for NMT Decoding,cs.CL," Neural Machine Translation (NMT) models usually use large target vocabulary
-sizes to capture most of the words in the target language. The vocabulary size
-is a big factor when decoding new sentences as the final softmax layer
-normalizes over all possible target words. To address this problem, it is
-common to restrict the target vocabulary with candidate lists based on the
-source sentence. Usually, the candidate lists are a combination of external
-word-to-word alignments, phrase table entries or most frequent words. In this
-work, we propose a simple yet novel approach to learn candidate lists directly
-from the attention layer during NMT training. The candidate lists are highly
-optimized for the current NMT model and do not need any external computation of
-the candidate pool. We show significant decoding speedup compared with using
-the entire vocabulary, without losing any translation quality for two language
-pairs.
-"
-5019,1706.03850,"Yizhe Zhang, Zhe Gan, Kai Fan, Zhi Chen, Ricardo Henao, Dinghan Shen,
- Lawrence Carin",Adversarial Feature Matching for Text Generation,stat.ML cs.CL cs.LG," The Generative Adversarial Network (GAN) has achieved great success in
-generating realistic (real-valued) synthetic data. However, convergence issues
-and difficulties dealing with discrete data hinder the applicability of GAN to
-text.
We propose a framework for generating realistic text via adversarial
-training. We employ a long short-term memory network as generator, and a
-convolutional network as discriminator. Instead of using the standard objective
-of GAN, we propose matching the high-dimensional latent feature distributions
-of real and synthetic sentences, via a kernelized discrepancy metric. This
-eases adversarial training by alleviating the mode-collapsing problem. Our
-experiments show superior performance in quantitative evaluation, and
-demonstrate that our model can generate realistic-looking sentences.
-"
-5020,1706.03872,Philipp Koehn and Rebecca Knowles,Six Challenges for Neural Machine Translation,cs.CL," We explore six challenges for neural machine translation: domain mismatch,
-amount of training data, rare words, long sentences, word alignment, and beam
-search. We show both deficiencies and improvements over the quality of
-phrase-based statistical machine translation.
-"
-5021,1706.03946,Ed Collins and Isabelle Augenstein and Sebastian Riedel,A Supervised Approach to Extractive Summarisation of Scientific Papers,cs.CL cs.AI cs.NE stat.AP stat.ML," Automatic summarisation is a popular approach to reduce a document to its
-main arguments. Recent research in the area has focused on neural approaches to
-summarisation, which can be very data-hungry. However, few large datasets exist
-and none for the traditionally popular domain of scientific publications, which
-opens up challenging research avenues centered on encoding large, complex
-documents. In this paper, we introduce a new dataset for summarisation of
-computer science publications by exploiting a large resource of author-provided
-summaries and show straightforward ways of extending it further. We develop
-models on the dataset making use of both neural sentence encoding and
-traditionally used summarisation features and show that models which encode
-sentences as well as their local and global context perform best, significantly
-outperforming well-established baseline methods.
-"
-5022,1706.03952,Jean-Philippe Bernardy and Charalambos Themistocleous,Modelling prosodic structure using Artificial Neural Networks,cs.CL," The ability to accurately perceive whether a speaker is asking a question or
-is making a statement is crucial for any successful interaction. However,
-learning and classifying tonal patterns has been a challenging task for
-automatic speech recognition and for models of tonal representation, as tonal
-contours are characterized by significant variation. This paper provides a
-classification model of Cypriot Greek questions and statements. We evaluate two
-state-of-the-art network architectures: a Long Short-Term Memory (LSTM) network
-and a convolutional network (ConvNet). The ConvNet outperforms the LSTM in the
-classification task and exhibits excellent performance, with 95%
-classification accuracy.
-"
-5023,1706.04115,"Omer Levy, Minjoon Seo, Eunsol Choi, Luke Zettlemoyer",Zero-Shot Relation Extraction via Reading Comprehension,cs.CL cs.AI cs.LG," We show that relation extraction can be reduced to answering simple reading
-comprehension questions, by associating one or more natural-language questions
-with each relation slot. 
This reduction has several advantages: we can (1)
-learn relation-extraction models by extending recent neural
-reading-comprehension techniques, (2) build very large training sets for those
-models by combining relation-specific crowd-sourced questions with distant
-supervision, and even (3) do zero-shot learning by extracting new relation
-types that are only specified at test-time, for which we have no labeled
-training examples. Experiments on a Wikipedia slot-filling task demonstrate
-that the approach can generalize to new questions for known relation types with
-high accuracy, and that zero-shot generalization to unseen relation types is
-possible, at lower accuracy levels, setting the bar for future work on this
-task.
-"
-5024,1706.04138,Marcin Junczys-Dowmunt and Roman Grundkiewicz,"An Exploration of Neural Sequence-to-Sequence Architectures for
- Automatic Post-Editing",cs.CL," In this work, we explore multiple neural architectures adapted for the task
-of automatic post-editing of machine translation output. We focus on neural
-end-to-end models that combine both inputs $mt$ (raw MT output) and $src$
-(source language input) in a single neural architecture, modeling $\{mt, src\}
-\rightarrow pe$ directly. Apart from that, we investigate the influence of
-hard-attention models which seem to be well-suited for monolingual tasks, as
-well as combinations of both ideas. We report results on data sets provided
-during the WMT-2016 shared task on automatic post-editing and can demonstrate
-that dual-attention models that incorporate all available data in the APE
-scenario in a single model improve on the best shared task system and on all
-other published results after the shared task. Dual-attention models that are
-combined with hard attention remain competitive despite applying fewer changes
-to the input.
-"
-5025,1706.04206,"Hossein Hematialam, Wlodek Zadrozny","Identifying Condition-Action Statements in Medical Guidelines Using
- Domain-Independent Features",cs.CL cs.IR," This paper advances the state of the art in text understanding of medical
-guidelines by releasing two new annotated clinical guidelines datasets, and
-establishing baselines for using machine learning to extract condition-action
-pairs. In contrast to prior work that relies on manually created rules, we
-report experiments with several supervised machine learning techniques to
-classify sentences as to whether they express conditions and actions. We show
-the limitations and possible extensions of this work on text mining of medical
-guidelines.
-"
-5026,1706.04223,"Jake Zhao (Junbo), Yoon Kim, Kelly Zhang, Alexander M. Rush and Yann
- LeCun",Adversarially Regularized Autoencoders,cs.LG cs.CL cs.NE," Deep latent variable models, trained using variational autoencoders or
-generative adversarial networks, are now a key technique for representation
-learning of continuous structures. However, applying similar methods to
-discrete structures, such as text sequences or discretized images, has proven
-to be more challenging. In this work, we propose a flexible method for training
-deep latent variable models of discrete structures. Our approach is based on
-the recently-proposed Wasserstein autoencoder (WAE) which formalizes the
-adversarial autoencoder (AAE) as an optimal transport problem. We first extend
-this framework to model discrete sequences, and then further explore different
-learned priors targeting a controllable representation. 
This adversarially
-regularized autoencoder (ARAE) allows us to generate natural textual outputs as
-well as perform manipulations in the latent space to induce change in the
-output space. Finally, we show that the latent representation can be trained to
-perform unaligned textual style transfer, giving improvements both in
-automatic/human evaluation compared to existing methods.
-"
-5027,1706.04326,"Xing Fan, Emilio Monti, Lambert Mathias, Markus Dreyer",Transfer Learning for Neural Semantic Parsing,cs.CL cs.LG," The goal of semantic parsing is to map natural language to a machine
-interpretable meaning representation language (MRL). One of the constraints
-that limits full exploration of deep learning technologies for semantic parsing
-is the lack of sufficient annotated training data. In this paper, we propose
-using sequence-to-sequence in a multi-task setup for semantic parsing with a
-focus on transfer learning. We explore three multi-task architectures for
-sequence-to-sequence modeling and compare their performance with an
-independently trained model. Our experiments show that the multi-task setup
-aids transfer learning from an auxiliary task with large labeled data to a
-target task with smaller labeled data. We see absolute accuracy gains ranging
-from 1.0% to 4.4% in our in-house data set, and we also see good gains ranging
-from 2.5% to 7.0% on the ATIS semantic parsing tasks with syntactic and
-semantic auxiliary tasks.
-"
-5028,1706.04389,"Filip Klubi\v{c}ka, Antonio Toral, V\'ictor M. S\'anchez-Cartagena","Fine-grained human evaluation of neural versus phrase-based machine
- translation",cs.CL," We compare three approaches to statistical machine translation (pure
-phrase-based, factored phrase-based and neural) by performing a fine-grained
-manual evaluation via error annotation of the systems' outputs. The error types
-in our annotation are compliant with the multidimensional quality metrics
-(MQM), and the annotation is performed by two annotators. Inter-annotator
-agreement is high for such a task, and results show that the best performing
-system (neural) reduces the errors produced by the worst system (phrase-based)
-by 54%.
-"
-5029,1706.04432,{\L}ukasz D\k{e}bowski,"Is Natural Language a Perigraphic Process? The Theorem about Facts and
- Words Revisited",cs.IT cs.CL math.IT," As we discuss, a stationary stochastic process is nonergodic when a random
-persistent topic can be detected in the infinite random text sampled from the
-process, whereas we call the process strongly nonergodic when an infinite
-sequence of independent random bits, called probabilistic facts, is needed to
-describe this topic completely. Replacing probabilistic facts with an
-algorithmically random sequence of bits, called algorithmic facts, we adapt
-this property back to ergodic processes. Subsequently, we call a process
-perigraphic if the number of algorithmic facts which can be inferred from a
-finite text sampled from the process grows like a power of the text length. We
-present a simple example of such a process. Moreover, we demonstrate an
-assertion which we call the theorem about facts and words. This proposition
-states that the number of probabilistic or algorithmic facts which can be
-inferred from a text drawn from a process must be roughly smaller than the
-number of distinct word-like strings detected in this text by means of the PPM
-compression algorithm. 
We also observe that the number of word-like strings
-for a sample of plays by Shakespeare follows an empirical stepwise power law,
-in stark contrast to Markov processes. Hence we suppose that natural language
-considered as a process is not only non-Markov but also perigraphic.
-"
-5030,1706.04473,"Kairit Sirts, Olivier Piguet, Mark Johnson",Idea density for predicting Alzheimer's disease from transcribed speech,cs.CL," Idea Density (ID) measures the rate at which ideas or elementary predications
-are expressed in an utterance or in a text. Lower ID is found to be associated
-with an increased risk of developing Alzheimer's disease (AD) (Snowdon et al.,
-1996; Engelman et al., 2010). ID has been used in two different versions:
-propositional idea density (PID) counts the expressed ideas and can be applied
-to any text while semantic idea density (SID) counts pre-defined information
-content units and is naturally more applicable to normative domains, such as
-picture description tasks. In this paper, we develop DEPID, a novel
-dependency-based method for computing PID, and its version DEPID-R that enables
-us to exclude repeating ideas---a feature characteristic of AD speech. We
-conduct the first comparison of automatically extracted PID and SID in the
-diagnostic classification task on two different AD datasets covering both
-closed-topic and free-recall domains. While SID performs better on the
-normative dataset, adding PID leads to a small but significant improvement
-(+1.7 F-score). On the free-topic dataset, PID performs better than SID as
-expected (77.6 vs 72.3 in F-score) but adding the features derived from the
-word embedding clustering underlying the automatic SID increases the results
-considerably, leading to an F-score of 84.8.
-"
-5031,1706.04560,"Sandeep Subramanian, Tong Wang, Xingdi Yuan, Saizheng Zhang, Yoshua
- Bengio, Adam Trischler",Neural Models for Key Phrase Detection and Question Generation,cs.CL cs.AI cs.NE," We propose a two-stage neural model to tackle question generation from
-documents. First, our model estimates the probability that word sequences in a
-document are ones that a human would pick when selecting candidate answers by
-training a neural key-phrase extractor on the answers in a question-answering
-corpus. Predicted key phrases then act as target answers and condition a
-sequence-to-sequence question-generation model with a copy mechanism.
-Empirically, our key-phrase extraction model significantly outperforms an
-entity-tagging baseline and existing rule-based approaches. We further
-demonstrate that our question generation system formulates fluent, answerable
-questions from key phrases. This two-stage system could be used to augment or
-generate reading comprehension datasets, which may be leveraged to improve
-machine reading systems or in educational settings.
-"
-5032,1706.04815,"Chuanqi Tan, Furu Wei, Nan Yang, Bowen Du, Weifeng Lv, Ming Zhou","S-Net: From Answer Extraction to Answer Generation for Machine Reading
- Comprehension",cs.CL," In this paper, we present a novel approach to machine reading comprehension
-for the MS-MARCO dataset. Unlike the SQuAD dataset that aims to answer a
-question with exact text spans in a passage, the MS-MARCO dataset defines the
-task as answering a question from multiple passages and the words in the answer
-are not necessarily in the passages. We therefore develop an
-extraction-then-synthesis framework to synthesize answers from extraction
-results. 
Specifically, the answer extraction model is first employed to predict
-the most important sub-spans from the passage as evidence, and the answer
-synthesis model takes the evidence as additional features along with the
-question and passage to further elaborate the final answers. We build the
-answer extraction model with state-of-the-art neural networks for single
-passage reading comprehension, and propose an additional task of passage
-ranking to help answer extraction in multiple passages. The answer synthesis
-model is based on sequence-to-sequence neural networks with the extracted
-evidence as features. Experiments show that our extraction-then-synthesis
-method outperforms state-of-the-art methods.
-"
-5033,1706.04872,Ramon Ferrer-i-Cancho,"Towards a theory of word order. Comment on ""Dependency distance: a new
- perspective on syntactic patterns in natural language"" by Haitao Liu et al",cs.CL physics.soc-ph," Comment on ""Dependency distance: a new perspective on syntactic patterns in
-natural language"" by Haitao Liu et al
-"
-5034,1706.04902,"Sebastian Ruder, Ivan Vuli\'c, Anders S{\o}gaard",A Survey Of Cross-lingual Word Embedding Models,cs.CL cs.LG," Cross-lingual representations of words enable us to reason about word meaning
-in multilingual contexts and are a key facilitator of cross-lingual transfer
-when developing natural language processing models for low-resource languages.
-In this survey, we provide a comprehensive typology of cross-lingual word
-embedding models. We compare their data requirements and objective functions.
-The recurring theme of the survey is that many of the models presented in the
-literature optimize for the same objectives, and that seemingly different
-models are often equivalent modulo optimization strategies, hyper-parameters,
-and such. We also discuss the different ways cross-lingual word embeddings are
-evaluated, as well as future challenges and research horizons.
-"
-5035,1706.04922,"Gia-Hung Nguyen, Laure Soulier, Lynda Tamine, Nathalie Bricon-Souf","DSRIM: A Deep Neural Information Retrieval Model Enhanced by a Knowledge
- Resource Driven Representation of Documents",cs.IR cs.CL," The state-of-the-art solutions to the vocabulary mismatch in information
-retrieval (IR) mainly aim at leveraging either the relational semantics
-provided by external resources or the distributional semantics, recently
-investigated by deep neural approaches. Guided by the intuition that the
-relational semantics might improve the effectiveness of deep neural approaches,
-we propose the Deep Semantic Resource Inference Model (DSRIM) that relies on:
-1) a representation of raw data that models the relational semantics of text by
-jointly considering objects and relations expressed in a knowledge resource,
-and 2) an end-to-end neural architecture that learns the query-document
-relevance by leveraging the distributional and relational semantics of
-documents and queries. The experimental evaluation carried out on two TREC
-datasets from TREC Terabyte and TREC CDS tracks relying respectively on WordNet
-and MeSH resources, indicates that our model outperforms state-of-the-art
-semantic and deep neural IR models.
-"
-5036,1706.04971,"Dominik Schlechtweg, Stefanie Eckmann, Enrico Santus, Sabine Schulte
- im Walde, Daniel Hole",German in Flux: Detecting Metaphoric Change via Word Entropy,cs.CL," This paper explores the information-theoretic measure entropy to detect
-metaphoric change, transferring ideas from hypernym detection to research on
-language change. 
We also build the first diachronic test set for German as a
-standard for metaphoric change annotation. Our model shows high performance, is
-unsupervised, language-independent and generalizable to other processes of
-semantic change.
-"
-5037,1706.04997,John J. Camilleri and Normunds Gr\=uz\={\i}tis and Gerardo Schneider,Extracting Formal Models from Normative Texts,cs.CL," We are concerned with the analysis of normative texts - documents based on
-the deontic notions of obligation, permission, and prohibition. Our goal is to
-make queries about these notions and verify that a text satisfies certain
-properties concerning causality of actions and timing constraints. This
-requires taking the original text and building a representation (model) of it
-in a formal language, in our case the C-O Diagram formalism. We present an
-experimental, semi-automatic aid that helps to bridge the gap between a
-normative text in natural language and its C-O Diagram representation. Our
-approach consists of using dependency structures obtained from the
-state-of-the-art Stanford Parser, and applying our own rules and heuristics in
-order to extract the relevant components. The result is a tabular data
-structure where each sentence is split into suitable fields, which can then be
-converted into a C-O Diagram. The process is not fully automatic, however, and
-some post-editing is generally required of the user. We apply our tool and
-perform experiments on documents from different domains, and report an initial
-evaluation of the accuracy and feasibility of our approach.
-"
-5038,1706.05075,"Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, Bo Xu","Joint Extraction of Entities and Relations Based on a Novel Tagging
- Scheme",cs.CL cs.AI cs.LG," Joint extraction of entities and relations is an important task in
-information extraction. To tackle this problem, we first propose a novel
-tagging scheme that can convert the joint extraction task to a tagging problem.
-Then, based on our tagging scheme, we study different end-to-end models to
-extract entities and their relations directly, without identifying entities and
-relations separately. We conduct experiments on a public dataset produced by
-the distant supervision method, and the experimental results show that the
-tagging-based methods are better than most of the existing pipelined and joint
-learning methods. What's more, the end-to-end model proposed in this paper
-achieves the best results on the public dataset.
-"
-5039,1706.05083,Chris Hokamp,"Ensembling Factored Neural Machine Translation Models for Automatic
- Post-Editing and Quality Estimation",cs.CL," This work presents a novel approach to Automatic Post-Editing (APE) and
-Word-Level Quality Estimation (QE) using ensembles of specialized Neural
-Machine Translation (NMT) systems. Word-level features that have proven
-effective for QE are included as input factors, expanding the representation of
-the original source and the machine translation hypothesis, which are used to
-generate an automatically post-edited hypothesis. We train a suite of NMT
-models that use different input representations, but share the same output
-space. These models are then ensembled together, and tuned for both the APE and
-the QE task. We thus attempt to connect the state-of-the-art approaches to APE
-and QE within a single framework. Our models achieve state-of-the-art results
-in both tasks, with the only difference being the tuning step, which learns
-weights for each component of the ensemble. 
-"
-5040,1706.05084,Kelsey MacMillan and James D. Wilson,Topic supervised non-negative matrix factorization,cs.CL cs.IR cs.LG stat.ML," Topic models have been extensively used to organize and interpret the
-contents of large, unstructured corpora of text documents. Although topic
-models often perform well on traditional training vs. test set evaluations, it
-is often the case that the results of a topic model do not align with human
-interpretation. This interpretability fallacy is largely due to the
-unsupervised nature of topic models, which prohibits any user guidance on the
-results of a model. In this paper, we introduce a semi-supervised method called
-topic supervised non-negative matrix factorization (TS-NMF) that enables the
-user to provide labeled example documents to promote the discovery of more
-meaningful semantic structure of a corpus. In this way, the results of TS-NMF
-better match the intuition and desired labeling of the user. The core of TS-NMF
-relies on solving a non-convex optimization problem for which we derive an
-iterative algorithm that is shown to be monotonic and convergent to a local
-optimum. We demonstrate the practical utility of TS-NMF on the Reuters and
-PubMed corpora, and find that TS-NMF is especially useful for conceptual or
-broad topics, where topic key terms are not well understood. Although
-identifying an optimal latent structure for the data is not a primary objective
-of the proposed approach, we find that TS-NMF achieves higher weighted Jaccard
-similarity scores than the contemporary methods, (unsupervised) NMF and latent
-Dirichlet allocation, at supervision rates as low as 10% to 20%.
-"
-5041,1706.05087,"Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio","Plan, Attend, Generate: Character-level Neural Machine Translation with
- Planning in the Decoder",cs.CL cs.NE," We investigate the integration of a planning mechanism into an
-encoder-decoder architecture with an explicit alignment for character-level
-machine translation. We develop a model that plans ahead when it computes
-alignments between the source and target sequences, constructing a matrix of
-proposed future alignments and a commitment vector that governs whether to
-follow or recompute the plan. This mechanism is inspired by the strategic
-attentive reader and writer (STRAW) model. Our proposed model is end-to-end
-trainable with fully differentiable operations. We show that it outperforms a
-strong baseline on three character-level decoder neural machine translation
-tasks on the WMT'15 corpus. Our analysis demonstrates that our model can
-compute qualitatively intuitive alignments and achieves superior performance
-with fewer parameters.
-"
-5042,1706.05089,"Go Sugimoto (ACDH-\""OAW)",Number game,cs.CL cs.DL," CLARIN (Common Language Resources and Technology Infrastructure) is regarded
-as one of the most important European research infrastructures, offering and
-promoting a wide array of useful services for (digital) research in linguistics
-and humanities. However, the assessment of the users for its core technical
-development has been highly limited; therefore, it is unclear if the community
-is thoroughly aware of the status quo of the growing infrastructure. In
-addition, CLARIN does not seem to have fully materialised its marketing and
-business plans and strategies despite its strong technical assets. 
This article analyses
-the web traffic of the Virtual Language Observatory, one of the main web
-applications of CLARIN and a symbol of pan-European research cooperation, to
-evaluate the users and performance of the service in a transparent and
-scientific way. It is envisaged that the paper can raise awareness of the
-pressing issues on objective and transparent operation of the infrastructure
-through Open Evaluation, and the synergy between marketing and technical
-development. It also investigates the ""science of web analytics"" in an attempt
-to document the research process for the purpose of reusability and
-reproducibility, and thus to find universal lessons for the use of web
-analytics, rather than to merely produce a statistical report of a particular
-website which loses its value outside its context.
-"
-5043,1706.05111,"Dai Quoc Nguyen, Dat Quoc Nguyen, Ashutosh Modi, Stefan Thater and
- Manfred Pinkal",A Mixture Model for Learning Multi-Sense Word Embeddings,cs.CL," Word embeddings are now a standard technique for inducing meaning
-representations for words. To get good representations, it is important to
-take into account different senses of a word. In this paper, we propose a
-mixture model for learning multi-sense word embeddings. Our model generalizes
-previous work in that it allows us to induce different weights for different
-senses of a word. The experimental results show that our model outperforms
-previous models on standard evaluation tasks.
-"
-5044,1706.05122,Takuma Yoneda and Koki Mori and Makoto Miwa and Yutaka Sasaki,Bib2vec: An Embedding-based Search System for Bibliographic Information,cs.CL cs.AI cs.IR," We propose a novel embedding model that represents relationships among
-several elements in bibliographic information with high representation ability
-and flexibility. Based on this model, we present a novel search system that
-shows the relationships among the elements in the ACL Anthology Reference
-Corpus. The evaluation results show that our model can achieve a high
-prediction ability and produce reasonable search results.
-"
-5045,1706.05125,"Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh and Dhruv Batra",Deal or No Deal? End-to-End Learning for Negotiation Dialogues,cs.AI cs.CL," Much of human dialogue occurs in semi-cooperative settings, where agents with
-different goals attempt to agree on common decisions. Negotiations require
-complex communication and reasoning skills, but success is easy to measure,
-making this an interesting task for AI. We gather a large dataset of
-human-human negotiations on a multi-issue bargaining task, where agents who
-cannot observe each other's reward functions must reach an agreement (or a
-deal) via natural language dialogue. For the first time, we show it is possible
-to train end-to-end models for negotiation, which must learn both linguistic
-and reasoning skills with no annotated dialogue states. We also introduce
-dialogue rollouts, in which the model plans ahead by simulating possible
-complete continuations of the conversation, and find that this technique
-dramatically improves performance. Our code and dataset are publicly available
-(https://github.com/facebookresearch/end-to-end-negotiator).
-"
-5046,1706.05140,Shraey Bhatia and Jey Han Lau and Timothy Baldwin,An Automatic Approach for Document-level Topic Model Evaluation,cs.CL," Topic models jointly learn topics and document-level topic distributions. 
-Extrinsic evaluation of topic models tends to focus exclusively on topic-level
-evaluation, e.g. by assessing the coherence of topics. We demonstrate that
-there can be large discrepancies between topic- and document-level model
-quality, and that basing model evaluation on topic-level analysis can be highly
-misleading. We propose a method for automatically predicting topic model
-quality based on analysis of document-level topic allocations, and provide
-empirical evidence for its robustness.
-"
-5047,1706.05349,"Jean-Val\`ere Cossu, Alejandro Molina-Villegas, Mariana Tello-Signoret",Active learning in annotating micro-blogs dealing with e-reputation,cs.SI cs.CL," Elections unleash strong political views on Twitter, but what do people
-really think about politics? Opinion and trend mining on micro blogs dealing
-with politics has recently attracted researchers in several fields including
-Information Retrieval and Machine Learning (ML). Since the performance of ML
-and Natural Language Processing (NLP) approaches is limited by the amount and
-quality of data available, one promising alternative for some tasks is the
-automatic propagation of expert annotations. This paper intends to develop a
-so-called active learning process for automatically annotating French language
-tweets that deal with the image (i.e., representation, web reputation) of
-politicians. Our main focus is on the methodology followed to build an original
-annotated dataset expressing opinions about two French politicians over time. We
-therefore review state-of-the-art NLP-based ML algorithms to automatically
-annotate tweets using a manual initiation step as bootstrap. This paper focuses
-on key issues about active learning while building a large annotated data set
-from noise. Such noise is introduced by human annotators, the abundance of data,
-and the label distribution across data and entities. In turn, we show that
-Twitter characteristics such as the author's name or hashtags can be considered
-as the bearing point not only to improve automatic systems for Opinion Mining
-(OM) and Topic Classification but also to reduce noise in human annotations.
-However, a later thorough analysis shows that reducing noise might induce the
-loss of crucial information.
-"
-5048,1706.05549,"Liliya Akhtyamova, Andrey Ignatov, John Cardiff",A Large-Scale CNN Ensemble for Medication Safety Analysis,cs.IR cs.CL," Revealing Adverse Drug Reactions (ADR) is an essential part of post-marketing
-drug surveillance, and data from health-related forums and medical communities
-can be of a great significance for estimating such effects. In this paper, we
-propose an end-to-end CNN-based method for predicting drug safety on user
-comments from healthcare discussion forums. We present an architecture that is
-based on a vast ensemble of CNNs with varied structural parameters, where the
-prediction is determined by the majority vote. To evaluate the performance of
-the proposed solution, we present a large-scale dataset collected from a
-medical website that consists of over 50 thousand reviews for more than 4000
-drugs. The results demonstrate that our model significantly outperforms
-conventional approaches and predicts medicine safety with an accuracy of 87.17%
-for binary and 62.88% for multi-classification tasks.
-"
-5049,1706.05565,"Po-Sen Huang, Chong Wang, Sitao Huang, Dengyong Zhou, Li Deng",Towards Neural Phrase-based Machine Translation,cs.CL stat.ML," In this paper, we present Neural Phrase-based Machine Translation (NPMT). 
Our
-method explicitly models the phrase structures in output sequences using
-Sleep-WAke Networks (SWAN), a recently proposed segmentation-based sequence
-modeling method. To mitigate the monotonic alignment requirement of SWAN, we
-introduce a new layer to perform (soft) local reordering of input sequences.
-Different from existing neural machine translation (NMT) approaches, NPMT does
-not use attention-based decoding mechanisms. Instead, it directly outputs
-phrases in a sequential order and can decode in linear time. Our experiments
-show that NPMT achieves superior performance on IWSLT 2014
-German-English/English-German and IWSLT 2015 English-Vietnamese machine
-translation tasks compared with strong NMT baselines. We also observe that our
-method produces meaningful phrases in output languages.
-"
-5050,1706.05585,"Tom Hope, Joel Chan, Aniket Kittur, Dafna Shahaf",Accelerating Innovation Through Analogy Mining,cs.CL cs.AI stat.ML," The availability of large idea repositories (e.g., the U.S. patent database)
-could significantly accelerate innovation and discovery by providing people
-with inspiration from solutions to analogous problems. However, finding useful
-analogies in these large, messy, real-world repositories remains a persistent
-challenge for either human or automated methods. Previous approaches include
-costly hand-created databases that have high relational structure (e.g.,
-predicate calculus representations) but are very sparse. Simpler
-machine-learning/information-retrieval similarity metrics can scale to large,
-natural-language datasets, but struggle to account for structural similarity,
-which is central to analogy. In this paper we explore the viability and value
-of learning simpler structural representations, specifically, ""problem
-schemas"", which specify the purpose of a product and the mechanisms by which it
-achieves that purpose. Our approach combines crowdsourcing and recurrent neural
-networks to extract purpose and mechanism vector representations from product
-descriptions. We demonstrate that these learned vectors allow us to find
-analogies with higher precision and recall than traditional
-information-retrieval methods. In an ideation experiment, analogies retrieved
-by our models significantly increased people's likelihood of generating
-creative ideas compared to analogies retrieved by traditional methods. Our
-results suggest that a promising approach to enabling computational analogy at
-scale is to learn and leverage weaker structural representations.
-"
-5051,1706.05656,"Stefan Frank, Jinbiao Yang","Lexical representation explains cortical entrainment during speech
- comprehension",q-bio.NC cs.CL," Results from a recent neuroimaging study on spoken sentence comprehension
-have been interpreted as evidence for cortical entrainment to hierarchical
-syntactic structure. We present a simple computational model that predicts the
-power spectra from this study, even though the model's linguistic knowledge is
-restricted to the lexical level, and word-level representations are not
-combined into higher-level units (phrases or sentences). Hence, the cortical
-entrainment results can also be explained from the lexical properties of the
-stimuli, without recourse to hierarchical syntax. 
-"
-5052,1706.05674,"Takuo Hamaguchi, Hidekazu Oiwa, Masashi Shimbo, and Yuji Matsumoto","Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural
- Network Approach",cs.CL," Knowledge base completion (KBC) aims to predict missing information in a
-knowledge base. In this paper, we address the out-of-knowledge-base (OOKB)
-entity problem in KBC: how to answer queries concerning test entities not
-observed at training time. Existing embedding-based KBC models assume that all
-test entities are available at training time, making it unclear how to obtain
-embeddings for new entities without costly retraining. To solve the OOKB entity
-problem without retraining, we use graph neural networks (Graph-NNs) to compute
-the embeddings of OOKB entities, exploiting the limited auxiliary knowledge
-provided at test time. The experimental results show the effectiveness of our
-proposed model in the OOKB setting. Additionally, in the standard KBC setting in
-which OOKB entities are not involved, our model achieves state-of-the-art
-performance on the WordNet dataset. The code and dataset are available at
-https://github.com/takuo-h/GNN-for-OOKB
-"
-5053,1706.05719,Thomas Krause,"Towards the Improvement of Automated Scientific Document Categorization
- by Deep Learning",cs.IR cs.CL," This master's thesis describes an algorithm for automated categorization of
-scientific documents using deep learning techniques and compares the results to
-the results of existing classification algorithms. As an additional goal, a
-reusable API is to be developed, allowing the automation of classification tasks
-in existing software. A design will be proposed using a convolutional neural
-network as a classifier and integrating this into a REST based API. This is
-then used as the basis for an actual proof of concept implementation presented
-as well in this thesis. It will be shown that the deep learning classifier
-provides very good results in the context of multi-class document categorization
-and that it is feasible to integrate such classifiers into a larger ecosystem
-using REST based services.
-"
-5054,1706.05723,"Louis Chartrand, Jackie C.K. Cheung, Mohamed Bouguessa",Detecting Large Concept Extensions for Conceptual Analysis,cs.CL," When performing a conceptual analysis of a concept, philosophers are
-interested in all forms of expression of a concept in a text---be it direct or
-indirect, explicit or implicit. In this paper, we experiment with topic-based
-methods of automating the detection of concept expressions in order to
-facilitate philosophical conceptual analysis. We propose six methods based on
-LDA, and evaluate them on a new corpus of court decisions that we had annotated
-by experts and non-experts. Our results indicate that these methods can yield
-important improvements over the keyword heuristic, which is often used as a
-concept detection heuristic in many contexts. While more work remains to be
-done, this indicates that detecting concepts through topics can serve as a
-general-purpose method for at least some forms of concept expression that are
-not captured using naive keyword approaches.
-"
-5055,1706.05765,"Makoto Morishita, Yusuke Oda, Graham Neubig, Koichiro Yoshino,
- Katsuhito Sudoh, Satoshi Nakamura","An Empirical Study of Mini-Batch Creation Strategies for Neural Machine
- Translation",cs.CL," Training of neural machine translation (NMT) models usually uses mini-batches
-for efficiency purposes. 
During the mini-batched training process, it is
-necessary to pad shorter sentences in a mini-batch to be equal in length to the
-longest sentence therein for efficient computation. Previous work has noted
-that sorting the corpus based on the sentence length before making mini-batches
-reduces the amount of padding and increases the processing speed. However,
-despite the fact that mini-batch creation is an essential step in NMT training,
-widely used NMT toolkits implement disparate strategies for doing so, which
-have not been empirically validated or compared. This work investigates
-mini-batch creation strategies with experiments over two different datasets.
-Our results suggest that the choice of a mini-batch creation strategy has a
-large effect on NMT training, and some length-based sorting strategies do not
-always work well compared with simple shuffling.
-"
-5056,1706.06177,"Efsun Sarioglu Kayi, Kabir Yadav, James M. Chamberlain, Hyeong-Ah Choi",Topic Modeling for Classification of Clinical Reports,cs.CL," Electronic health records (EHRs) contain important clinical information about
-patients. Efficient and effective use of this information could supplement or
-even replace manual chart review as a means of studying and improving the
-quality and safety of healthcare delivery. However, some of these clinical data
-are in the form of free text and require pre-processing before use in automated
-systems. A common free text data source is radiology reports, typically
-dictated by radiologists to explain their interpretations. We sought to
-demonstrate machine learning classification of computed tomography (CT) imaging
-reports into binary outcomes, i.e. positive and negative for fracture, using
-regular text classification and classifiers based on topic modeling. Topic
-modeling provides interpretable themes (topic distributions) in reports, a
-representation that is more compact than the commonly used bag-of-words
-representation and can be processed faster than raw text in subsequent
-automated processes. We demonstrate new classifiers based on this topic
-modeling representation of the reports. Aggregate topic classifier (ATC) and
-confidence-based topic classifier (CTC) use a single topic that is determined
-from the training dataset based on different measures to classify the reports
-on the test dataset. Alternatively, similarity-based topic classifier (STC)
-measures the similarity between the reports' topic distributions to determine
-the predicted class. Our proposed topic modeling-based classifier systems are
-shown to be competitive with existing text classification techniques and
-provide an efficient and interpretable representation.
-"
-5057,1706.06197,"Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang","meProp: Sparsified Back Propagation for Accelerated Deep Learning with
- Reduced Overfitting",cs.LG cs.AI cs.CL cs.CV," We propose a simple yet effective technique for neural network learning. The
-forward propagation is computed as usual. In back propagation, only a small
-subset of the full gradient is computed to update the model parameters. The
-gradient vectors are sparsified in such a way that only the top-$k$ elements
-(in terms of magnitude) are kept. As a result, only $k$ rows or columns
-(depending on the layout) of the weight matrix are modified, leading to a
-linear reduction ($k$ divided by the vector dimension) in the computational
-cost. 
Surprisingly, experimental results demonstrate that we can update only
-1-4% of the weights at each back propagation pass. This does not result in a
-larger number of training iterations. More interestingly, the accuracy of the
-resulting models is actually improved rather than degraded, and a detailed
-analysis is given. The code is available at https://github.com/lancopku/meProp
-"
-5058,1706.06210,"Pawe{\l} Budzianowski, Stefan Ultes, Pei-Hao Su, Nikola Mrk\v{s}i\'c,
- Tsung-Hsien Wen, I\~nigo Casanueva, Lina Rojas-Barahona, Milica Ga\v{s}i\'c","Sub-domain Modelling for Dialogue Management with Hierarchical
- Reinforcement Learning",cs.CL cs.AI," Human conversation is inherently complex, often spanning many different
-topics/domains. This makes policy learning for dialogue systems very
-challenging. Standard flat reinforcement learning methods do not provide an
-efficient framework for modelling such dialogues. In this paper, we focus on
-the under-explored problem of multi-domain dialogue management. First, we
-propose a new method for hierarchical reinforcement learning using the option
-framework. Next, we show that the proposed architecture learns faster and
-arrives at a better policy than the existing flat ones do. Moreover, we show
-how pretrained policies can be adapted to more complex systems with an
-additional set of new actions. In doing that, we show that our approach has the
-potential to facilitate policy optimisation for more sophisticated multi-domain
-dialogue systems.
-"
-5059,1706.06363,"Krzysztof Wr\'obel, Maciej Wielgosz, Marcin Pietro\'n, Micha{\l}
- Karwatowski, Aleksander Smywi\'nski-Pohl",Improving text classification with vectors of reduced precision,cs.CL," This paper presents an analysis of the impact of a floating-point number
-precision reduction on the quality of text classification. The precision
-reduction of the vectors representing the data (e.g. TF-IDF representation in
-our case) allows for a decrease of computing time and memory footprint on
-dedicated hardware platforms. The impact of precision reduction on the
-classification quality was evaluated on 5 corpora, using 4 different
-classifiers. Also, dimensionality reduction was taken into account. Results
-indicate that the precision reduction improves classification accuracy for most
-cases (up to 25% error reduction). In general, the reduction from 64 to 4
-bits gives the best scores and ensures that the results will not be worse than
-with the full floating-point representation.
-"
-5060,1706.06415,"Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun,
- Huanbo Luan, Yang Liu",THUMT: An Open Source Toolkit for Neural Machine Translation,cs.CL," This paper introduces THUMT, an open-source toolkit for neural machine
-translation (NMT) developed by the Natural Language Processing Group at
-Tsinghua University. THUMT implements the standard attention-based
-encoder-decoder framework on top of Theano and supports three training
-criteria: maximum likelihood estimation, minimum risk training, and
-semi-supervised training. It features a visualization tool for displaying the
-relevance between hidden states in neural networks and contextual words, which
-helps to analyze the internal workings of NMT. Experiments on Chinese-English
-datasets show that THUMT using minimum risk training significantly outperforms
-GroundHog, a state-of-the-art toolkit for NMT. 
-"
-5061,1706.06428,"Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin
- Swersky, Ilya Sutskever, Navdeep Jaitly",An online sequence-to-sequence model for noisy speech recognition,cs.CL cs.LG stat.ML," Generative models have long been the dominant approach for speech
-recognition. The success of these models, however, relies on the use of
-sophisticated recipes and complicated machinery that is not easily accessible
-to non-practitioners. Recent innovations in Deep Learning have given rise to an
-alternative - discriminative models called Sequence-to-Sequence models, that
-can almost match the accuracy of state-of-the-art generative models. While
-these models are easy to train as they can be trained end-to-end in a single
-step, they have a practical limitation that they can only be used for offline
-recognition. This is because the models require that the entirety of the input
-sequence be available at the beginning of inference, an assumption that is not
-valid for instantaneous speech recognition. To address this problem, online
-sequence-to-sequence models were recently introduced. These models are able to
-start producing outputs as data arrives, once the model feels confident enough
-to output partial transcripts. These models, like sequence-to-sequence models,
-are causal - the output produced by the model until any time, $t$, affects the
-features that are computed subsequently. This makes the model inherently more
-powerful than generative models that are unable to change features that are
-computed from the data. This paper highlights two main contributions - an
-improvement to online sequence-to-sequence model training, and its application
-to noisy settings with mixed speech from two speakers.
-"
-5062,1706.06542,"Mir Tafseer Nayeem, Yllias Chali",Extract with Order for Coherent Multi-Document Summarization,cs.CL," In this work, we aim at developing an extractive summarizer in the
-multi-document setting. We implement a rank-based sentence selection using
-continuous vector representations along with key-phrases. Furthermore, we
-propose a model to tackle summary coherence for increasing readability. We
-conduct experiments on the Document Understanding Conference (DUC) 2004
-datasets using the ROUGE toolkit. Our experiments demonstrate that the methods
-bring significant improvements over state-of-the-art methods in terms of
-informativity and coherence.
-"
-5063,1706.06551,"Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan
- Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max
- Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis,
- Phil Blunsom",Grounded Language Learning in a Simulated 3D World,cs.CL cs.LG stat.ML," We are increasingly surrounded by artificially intelligent technology that
-takes decisions and executes actions on our behalf. This creates a pressing
-need for general means to communicate with, instruct and guide artificial
-agents, with human language the most compelling means for such communication.
-To achieve this in a scalable fashion, agents must be able to relate language
-to the world and to actions; that is, their understanding of language must be
-grounded and embodied. However, learning grounded language is a notoriously
-challenging problem in artificial intelligence research. Here we present an
-agent that learns to interpret language in a simulated 3D environment where it
-is rewarded for the successful execution of written instructions. 
Trained via a -combination of reinforcement and unsupervised learning, and beginning with -minimal prior knowledge, the agent learns to relate linguistic symbols to -emergent perceptual representations of its physical surroundings and to -pertinent sequences of actions. The agent's comprehension of language extends -beyond its prior experience, enabling it to apply familiar language to -unfamiliar situations and to interpret entirely novel instructions. Moreover, -the speed with which this agent learns new words increases as its semantic -knowledge grows. This facility for generalising and bootstrapping semantic -knowledge indicates the potential of the present approach for reconciling -ambiguous natural language with the complexity of the physical world. -" -5064,1706.06613,"Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell - Power",End-to-End Neural Ad-hoc Ranking with Kernel Pooling,cs.IR cs.CL," This paper proposes K-NRM, a kernel based neural model for document ranking. -Given a query and a set of documents, K-NRM uses a translation matrix that -models word-level similarities via word embeddings, a new kernel-pooling -technique that uses kernels to extract multi-level soft match features, and a -learning-to-rank layer that combines those features into the final ranking -score. The whole model is trained end-to-end. The ranking layer learns desired -feature patterns from the pairwise ranking loss. The kernels transfer the -feature patterns into soft-match targets at each similarity level and enforce -them on the translation matrix. The word embeddings are tuned accordingly so -that they can produce the desired soft matches. Experiments on a commercial -search engine's query log demonstrate the improvements of K-NRM over prior -feature-based and neural-based states-of-the-art, and explain the source of -K-NRM's advantage: Its kernel-guided embedding encodes a similarity metric -tailored for matching query words to document words, and provides effective -multi-level soft matches. -" -5065,1706.06681,"Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan - Srinivasan and Dragomir Radev",Graph-based Neural Multi-Document Summarization,cs.CL cs.LG," We propose a neural multi-document summarization (MDS) system that -incorporates sentence relation graphs. We employ a Graph Convolutional Network -(GCN) on the relation graphs, with sentence embeddings obtained from Recurrent -Neural Networks as input node features. Through multiple layer-wise -propagation, the GCN generates high-level hidden sentence features for salience -estimation. We then use a greedy heuristic to extract salient sentences while -avoiding redundancy. In our experiments on DUC 2004, we consider three types of -sentence relation graphs and demonstrate the advantage of combining sentence -relations in graphs with the representation power of deep neural networks. Our -model improves upon traditional graph-based extractive approaches and the -vanilla GRU sequence model with no graph, and it achieves competitive results -against other state-of-the-art multi-document summarization systems. -" -5066,1706.06714,Van-Khanh Tran and Le-Minh Nguyen,"Neural-based Natural Language Generation in Dialogue using RNN - Encoder-Decoder with Semantic Aggregation",cs.CL cs.LG," Natural language generation (NLG) is an important component in spoken -dialogue systems. 
This paper presents a model called Encoder-Aggregator-Decoder
-which is an extension of a Recurrent Neural Network based Encoder-Decoder
-architecture. The proposed Semantic Aggregator consists of two components: an
-Aligner and a Refiner. The Aligner is a conventional attention mechanism
-calculated over the encoded input information, while the Refiner is another
-attention or gating mechanism stacked over the attentive Aligner in order to
-further select and aggregate the semantic elements. The proposed model can be
-jointly trained on both sentence planning and surface realization to produce
-natural language utterances. The model was extensively assessed on four
-different NLG domains, in which the experimental results showed that the
-proposed generator consistently outperforms the previous methods on all the NLG
-domains.
-"
-5067,1706.06749,"Shafiq Joty, Preslav Nakov, Llu\'is M\`arquez and Israa Jaradat","Cross-language Learning with Adversarial Neural Networks: Application to
- Community Question Answering",cs.CL," We address the problem of cross-language adaptation for question-question
-similarity reranking in community question answering, with the objective to
-port a system trained on one input language to another input language given
-labeled training data for the first language and only unlabeled data for the
-second language. In particular, we propose to use adversarial training of
-neural networks to learn high-level features that are discriminative for the
-main learning task, and at the same time are invariant across the input
-languages. The evaluation results show sizable improvements for our
-cross-language adversarial neural network (CLANN) model over a strong
-non-adversarial system.
-"
-5068,1706.06802,"Andrea Esuli, Tiziano Fagni, Alejandro Moreo Fernandez",JaTeCS an open-source JAva TExt Categorization System,cs.CL," JaTeCS is an open source Java library that supports research on automatic
-text categorization and other related problems, such as ordinal regression and
-quantification, which are of special interest in opinion mining applications.
-It covers all the steps of an experimental activity, from reading the corpus to
-the evaluation of the experimental results. As JaTeCS is focused on text as the
-main input data, it provides the user with many text-dedicated tools, e.g.:
-data readers for many formats, including the most commonly used text corpora
-and lexical resources, natural language processing tools, multi-language
-support, methods for feature selection and weighting, the implementation of
-many machine learning algorithms as well as wrappers for well-known external
-software (e.g., SVM_light) which enable their full control from code. JaTeCS
-supports its expansion by abstracting through interfaces many of the typical
-tools and procedures used in text processing tasks. The library also provides a
-number of ""template"" implementations of typical experimental setups (e.g.,
-train-test, k-fold validation, grid-search optimization, randomized runs) which
-enable fast realization of experiments just by connecting the templates with
-data readers, learning algorithms and evaluation measures.
-"
-5069,1706.06894,"Dilek K\""u\c{c}\""uk",Stance Detection in Turkish Tweets,cs.CL," Stance detection is a classification problem in natural language processing
-where for a text and target pair, a class result from the set {Favor, Against,
-Neither} is expected. 
It is similar to the sentiment analysis problem but
-instead of the sentiment of the text author, the stance expressed for a
-particular target is investigated in stance detection. In this paper, we
-present a stance detection tweet data set for Turkish comprising stance
-annotations of tweets for two popular sports clubs as targets.
-Additionally, we provide the evaluation results of SVM classifiers for each
-target on this data set, where the classifiers use unigram, bigram, and hashtag
-features. This study is significant as it presents one of the initial stance
-detection data sets proposed so far and the first one for the Turkish language,
-to the best of our knowledge. The data set and the evaluation results of the
-corresponding SVM-based approaches will form plausible baselines for the
-comparison of future studies on stance detection.
-"
-5070,1706.06896,"Marco Dinarelli, Yoann Dupont, Isabelle Tellier",Effective Spoken Language Labeling with Deep Recurrent Neural Networks,cs.CL," Understanding spoken language is a highly complex problem, which can be
-decomposed into several simpler tasks. In this paper, we focus on Spoken
-Language Understanding (SLU), the module of spoken dialog systems responsible
-for extracting a semantic interpretation from the user utterance. The task is
-treated as a labeling problem. In the past, SLU has been performed with a wide
-variety of probabilistic models. The rise of neural networks, in the last
-couple of years, has opened new interesting research directions in this domain.
-Recurrent Neural Networks (RNNs) in particular are able not only to represent
-several pieces of information as embeddings but also, thanks to their recurrent
-architecture, to encode as embeddings relatively long contexts. Such long
-contexts are in general out of reach for models previously used for SLU. In
-this paper we propose novel RNN architectures for SLU which outperform
-previous ones. Starting from a published idea as a base block, we design new
-deep RNNs achieving state-of-the-art results on two widely used corpora for
-SLU: ATIS (Air Traveling Information System), in English, and MEDIA (Hotel
-information and reservation in France), in French.
-"
-5071,1706.06987,"Hannah Morrison, Chris Martens",A Generative Model of Group Conversation,cs.CL cs.HC," Conversations with non-player characters (NPCs) in games are typically
-confined to dialogue between a human player and a virtual agent, where the
-conversation is initiated and controlled by the player. To create richer, more
-believable environments for players, we need conversational behavior to reflect
-initiative on the part of the NPCs, including conversations that include
-multiple NPCs who interact with one another as well as the player. We describe
-a generative computational model of group conversation between agents, an
-abstract simulation of discussion in a small group setting. We define
-conversational interactions in terms of rules for turn taking and interruption,
-as well as belief change, sentiment change, and emotional response, all of
-which are dependent on agent personality, context, and relationships. We
-evaluate our model using a parameterized expressive range analysis, observing
-correlations between simulation parameters and features of the resulting
-conversations. This analysis confirms, for example, that character
-personalities will predict how often they speak, and that heterogeneous groups
-of characters will generate more belief change. 
-
"
-5072,1706.06996,"Nicolas Pr\""ollochs, Stefan Feuerriegel, Dirk Neumann",Statistical Inferences for Polarity Identification in Natural Language,cs.CL stat.AP," Information forms the basis for all human behavior, including the ubiquitous
-decision-making that people constantly perform in their everyday lives. It is
-thus the mission of researchers to understand how humans process information to
-reach decisions. In order to facilitate this task, this work proposes a novel
-method of studying the reception of granular expressions in natural language.
-The approach utilizes LASSO regularization as a statistical tool to extract
-decisive words from textual content and draw statistical inferences based on
-the correspondence between the occurrences of words and an exogenous response
-variable. Accordingly, the method immediately suggests significant implications
-for social sciences and Information Systems research: everyone can now identify
-text segments and word choices that are statistically relevant to authors or
-readers and, based on this knowledge, test hypotheses from behavioral research.
-We demonstrate the contribution of our method by examining how authors
-communicate subjective information through narrative materials. This allows us
-to answer the question of which words to choose when communicating negative
-information. On the other hand, we show that investors trade not only upon
-facts in financial disclosures but are distracted by filler words and
-non-informative language. Practitioners - for example those in the fields of
-investor communications or marketing - can exploit our insights to enhance
-their writings based on the true perception of word choice.
-"
-5073,1706.07179,"Trapit Bansal, Arvind Neelakantan, Andrew McCallum",RelNet: End-to-End Modeling of Entities & Relations,cs.CL cs.LG," We introduce RelNet: a new model for relational reasoning. RelNet is a memory
-augmented neural network which models entities as abstract memory slots and is
-equipped with an additional relational memory which models relations between
-all memory pairs. The model thus builds an abstract knowledge graph on the
-entities and relations present in a document which can then be used to answer
-questions about the document. It is trained end-to-end: the only supervision to
-the model is in the form of correct answers to the questions. We test the model on
-the 20 bAbI question-answering tasks with 10k examples per task and find that
-it solves all the tasks with a mean error of 0.3%, achieving 0% error on 11 of
-the 20 tasks.
-"
-5074,1706.07206,"Leila Arras, Gr\'egoire Montavon, Klaus-Robert M\""uller, Wojciech
- Samek",Explaining Recurrent Neural Network Predictions in Sentiment Analysis,cs.CL cs.AI cs.NE stat.ML," Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown
-to deliver insightful explanations in the form of input space relevances for
-understanding feed-forward neural network classification decisions. In the
-present work, we extend the usage of LRP to recurrent neural networks. We
-propose a specific propagation rule applicable to multiplicative connections as
-they arise in recurrent network architectures such as LSTMs and GRUs. We apply
-our technique to a word-based bi-directional LSTM model on a five-class
-sentiment prediction task, and evaluate the resulting LRP relevances both
-qualitatively and quantitatively, obtaining better results than a
-gradient-based related method which was used in previous work. 
-
"
-5075,1706.07230,"Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar
- Pasumarthi, Dheeraj Rajagopal, Ruslan Salakhutdinov",Gated-Attention Architectures for Task-Oriented Language Grounding,cs.LG cs.AI cs.CL cs.RO," To perform tasks specified by natural language instructions, autonomous
-agents need to extract semantically meaningful representations of language and
-map them to visual elements and actions in the environment. This problem is
-called task-oriented language grounding. We propose an end-to-end trainable
-neural architecture for task-oriented language grounding in 3D environments
-which assumes no prior linguistic or perceptual knowledge and requires only raw
-pixels from the environment and the natural language instruction as input. The
-proposed model combines the image and text representations using a
-Gated-Attention mechanism and learns a policy to execute the natural language
-instruction using standard reinforcement and imitation learning methods. We
-show the effectiveness of the proposed model on unseen instructions as well as
-unseen maps, both quantitatively and qualitatively. We also introduce a novel
-environment based on a 3D game engine to simulate the challenges of
-task-oriented language grounding over a rich set of instructions and
-environment states.
-"
-5076,1706.07238,"Shahab Jalalvand and Matteo Negri and Daniele Falavigna and Marco
- Matassoni and Marco Turchi",Automatic Quality Estimation for ASR System Combination,cs.CL," Recognizer Output Voting Error Reduction (ROVER) has been widely used for
-system combination in automatic speech recognition (ASR). In order to select
-the most appropriate words to insert at each position in the output
-transcriptions, some ROVER extensions rely on critical information such as
-confidence scores and other ASR decoder features. This information, which is
-not always available, highly depends on the decoding process and sometimes
-tends to overestimate the real quality of the recognized words. In this paper
-we propose a novel variant of ROVER that takes advantage of ASR quality
-estimation (QE) for ranking the transcriptions at ""segment level"" instead of:
-i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
-hypotheses. We first introduce an effective set of features to compensate for
-the absence of ASR decoder information. Then, we apply QE techniques to perform
-accurate hypothesis ranking at segment-level before starting the fusion
-process. The evaluation is carried out on two different tasks, in which we
-respectively combine hypotheses coming from independent ASR systems and
-multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
-information is not available. The proposed approach significantly outperforms
-standard ROVER and it is competitive with two strong oracles that exploit
-prior knowledge about the real quality of the hypotheses to be combined.
-Compared to standard ROVER, the absolute WER improvements in the two
-evaluation scenarios range from 0.5% to 7.3%.
-"
-5077,1706.07276,"Bei Shi, Wai Lam, Shoaib Jameel, Steven Schockaert, Kwun Ping Lai",Jointly Learning Word Embeddings and Latent Topics,cs.CL cs.IR cs.LG," Word embedding models such as Skip-gram learn a vector-space representation
-for each word, based on the local word collocation patterns that are observed
-in a text corpus. 
Latent topic models, on the other hand, take a more global
-view, looking at the word distributions across the corpus to assign a topic to
-each word occurrence. These two paradigms are complementary in how they
-represent the meaning of word occurrences. While some previous works have
-already looked at using word embeddings for improving the quality of latent
-topics, and conversely, at using latent topics for improving word embeddings,
-such ""two-step"" methods cannot capture the mutual interaction between the two
-paradigms. In this paper, we propose STE, a framework which can learn word
-embeddings and latent topics in a unified manner. STE naturally obtains
-topic-specific word embeddings, and thus addresses the issue of polysemy. At
-the same time, it also learns the term distributions of the topics, and the
-topic distributions of the documents. Our experimental results demonstrate that
-the STE model can indeed generate useful topic-specific word embeddings and
-coherent latent topics in an effective and efficient way.
-"
-5078,1706.07440,"Chiori Hori, Takaaki Hori",End-to-end Conversation Modeling Track in DSTC6,cs.CL," End-to-end training of neural networks is a promising approach to automatic
-construction of dialog systems using a human-to-human dialog corpus. Recently,
-Vinyals et al. tested neural conversation models using OpenSubtitles. Lowe et
-al. released the Ubuntu Dialogue Corpus for researching unstructured multi-turn
-dialogue systems. Furthermore, the approach has been extended to accomplish
-task-oriented dialogs to provide information properly with natural
-conversation. For example, Ghazvininejad et al. proposed a knowledge grounded
-neural conversation model [3], where the research aims at combining
-conversational dialogs with task-oriented knowledge using unstructured data
-such as Twitter data for conversation and Foursquare data for external
-knowledge. However, the task is still limited to a restaurant information
-service, and has not yet been tested with a wide variety of dialog tasks. In
-addition, it is still unclear how to create intelligent dialog systems that can
-respond like a human agent.
- In consideration of these problems, we proposed a challenge track to the 6th
-dialog system technology challenges (DSTC6) using human-to-human dialog data to
-mimic human dialog behaviors. The focus of the challenge track is to train
-end-to-end conversation models from human-to-human conversation and accomplish
-end-to-end dialog tasks in various situations assuming a customer service, in
-which a system plays the role of a human agent and generates natural and
-informative sentences in response to user's questions or comments given dialog
-context.
-"
-5079,1706.07503,"Chaitanya K. Joshi, Fei Mi and Boi Faltings",Personalization in Goal-Oriented Dialog,cs.CL cs.LG," The main goal of modeling human conversation is to create agents which can
-interact with people in both open-ended and goal-oriented scenarios. End-to-end
-trained neural dialog systems are an important line of research for such
-generalized dialog models as they do not resort to any situation-specific
-handcrafting of rules. However, incorporating personalization into such systems
-is a largely unexplored topic as there are no existing corpora to facilitate
-such work. In this paper, we present a new dataset of goal-oriented dialogs
-which are influenced by speaker profiles attached to them. 
We analyze the
-shortcomings of an existing end-to-end dialog system based on Memory Networks
-and propose modifications to the architecture which enable personalization. We
-also investigate personalization in dialog as a multi-task learning problem,
-and show that a single model which shares features among various profiles
-outperforms separate models for each profile.
-"
-5080,1706.07518,"Jiatao Gu, Daniel Jiwoong Im and Victor O.K. Li",Neural Machine Translation with Gumbel-Greedy Decoding,cs.CL," Previous neural machine translation models used some heuristic search
-algorithms (e.g., beam search) in order to avoid solving the maximum a
-posteriori problem over translation sentences at test time. In this paper, we
-propose the Gumbel-Greedy Decoding which trains a generative network to predict
-translation under a trained model. We solve such a problem using the
-Gumbel-Softmax reparameterization, which makes our generative network
-differentiable and trainable through standard stochastic gradient methods. We
-empirically demonstrate that our proposed model is effective for generating
-sequences of discrete words.
-"
-5081,1706.07598,"Quan Tran, Andrew MacKinlay and Antonio Jimeno Yepes","Named Entity Recognition with stack residual LSTM and trainable bias
- decoding",cs.CL," Recurrent Neural Network models are the state-of-the-art for Named Entity
-Recognition (NER). We present two innovations to improve the performance of
-these models. The first innovation is the introduction of residual connections
-between the layers of the stacked Recurrent Neural Network model to address the
-degradation problem of deep neural networks. The second innovation is a bias decoding
-mechanism that allows the trained system to adapt to non-differentiable and
-externally computed objectives, such as the entity-based F-measure. Our work
-improves the state-of-the-art results for both Spanish and English languages on
-the standard train/development/test split of the CoNLL 2003 Shared Task NER
-dataset.
-"
-5082,1706.07786,Ismail Rusli,"Comparison of Modified Kneser-Ney and Witten-Bell Smoothing Techniques
- in Statistical Language Model of Bahasa Indonesia",cs.CL," Smoothing is one technique to overcome data sparsity in statistical language
-models. Although its mathematical definition has no explicit dependency upon a
-specific natural language, the differing natures of natural languages result
-in different effects of smoothing techniques. This is true for the Russian
-language, as shown by Whittaker (1998). In this paper, we compared Modified
-Kneser-Ney and Witten-Bell smoothing techniques in a statistical language model
-of Bahasa Indonesia. We used training sets totaling 22M words that we extracted
-from the Indonesian version of Wikipedia. As far as we know, this is the largest
-training set used to build a statistical language model for Bahasa Indonesia. The
-experiments with 3-gram, 5-gram, and 7-gram showed that Modified Kneser-Ney
-consistently outperforms the Witten-Bell smoothing technique in terms of
-perplexity values. It is interesting to note that our experiments showed that
-the 5-gram model with Modified Kneser-Ney smoothing outperforms the 7-gram
-model. Meanwhile, Witten-Bell smoothing improves consistently as the n-gram
-order increases. 
-
"
-5083,1706.07859,Dong Wang and Lantian Li and Zhiyuan Tang and Thomas Fang Zheng,Deep Speaker Verification: Do We Need End to End?,cs.SD cs.CL," End-to-end learning treats the entire system as a whole adaptable black box,
-which, if sufficient data are available, may learn a system that works very
-well for the target task. This principle has recently been applied to several
-prototype studies on speaker verification (SV), where the feature learning and
-classifier are learned together with an objective function that is consistent
-with the evaluation metric. An opposite approach to end-to-end is feature
-learning, which first trains a feature learning model, and then constructs a
-back-end classifier separately to perform SV. Recently, both approaches
-achieved significant performance gains on SV, mainly attributed to the smart
-utilization of deep neural networks. However, the two approaches have not been
-carefully compared, and their respective advantages have not been well
-discussed. In this paper, we compare the end-to-end and feature learning
-approaches on a text-independent SV task. Our experiments on a dataset sampled
-from the Fisher database and involving 5,000 speakers demonstrated that the
-feature learning approach outperformed the end-to-end approach. This is
-strong support for the feature learning approach, at least with data and
-computation resources similar to ours.
-"
-5084,1706.07860,Miao Zhang and Yixiang Chen and Lantian Li and Dong Wang,"Speaker Recognition with Cough, Laugh and ""Wei""",cs.SD cs.CL," This paper proposes a speaker recognition (SRE) task with trivial speech
-events, such as cough and laugh. These trivial events are ubiquitous in
-conversations and less subject to intentional change, therefore offering
-valuable particularities to discover the genuine speaker from disguised speech.
-However, trivial events are often short and idiosyncratic in spectral patterns,
-making SRE extremely difficult. Fortunately, we found a very powerful deep
-feature learning structure that can extract highly speaker-sensitive features.
-By employing this tool, we studied the SRE performance on three types of
-trivial events: cough, laugh and ""Wei"" (a short Chinese ""Hello""). The results
-show that there is rich speaker information within these trivial events, even
-for cough, which is intuitively less speaker-distinguishable. With the deep
-feature approach, the EER can reach 10%-14% with the three trivial events,
-despite their extremely short durations (0.2-1.0 seconds).
-"
-5085,1706.07861,Lantian Li and Dong Wang and Askar Rozi and Thomas Fang Zheng,Cross-lingual Speaker Verification with Deep Feature Learning,cs.SD cs.CL," Existing speaker verification (SV) systems often suffer from performance
-degradation if there is any language mismatch between model training, speaker
-enrollment, and test. A major cause of this degradation is that most existing
-SV methods rely on a probabilistic model to infer the speaker factor, so any
-significant change in the distribution of the speech signal will impact the
-inference. Recently, we proposed a deep learning model that can learn how to
-extract the speaker factor by a deep neural network (DNN). With this feature
-learning, an SV system can be constructed with a very simple back-end model. In
-this paper, we investigate the robustness of the feature-based SV system in
-situations with language mismatch. 
Our experiments were conducted on a complex
-cross-lingual scenario, where the model training was in English, and the
-enrollment and test were in Chinese or Uyghur. The experiments demonstrated
-that the feature-based system outperformed the i-vector system by a large
-margin, particularly with language mismatch between enrollment and test.
-"
-5086,1706.07905,Jiangming Liu and Yue Zhang,Encoder-Decoder Shift-Reduce Syntactic Parsing,cs.CL," Starting from NMT, encoder-decoder neural networks have been used for many
-NLP problems. Graph-based models and transition-based models borrowing the
-encoder components achieve state-of-the-art performance on dependency parsing and
-constituent parsing, respectively. However, there has not been work
-empirically studying the encoder-decoder neural networks for transition-based
-parsing. We apply a simple encoder-decoder to this end, achieving comparable
-results to the parser of Dyer et al. (2015) on standard dependency parsing,
-and outperforming the parser of Vinyals et al. (2015) on constituent parsing.
-"
-5087,1706.07912,"Lavanya Narayana Raju, Mahamad Suhil, D S Guru and Harsha S Gowda",Cluster Based Symbolic Representation for Skewed Text Categorization,cs.IR cs.CL," In this work, a problem associated with imbalanced text corpora is addressed.
-A method of converting an imbalanced text corpus into a balanced one is
-presented. The presented method employs a clustering algorithm for conversion.
-Initially, to avoid the curse of dimensionality, an effective representation
-scheme based on a term-class relevancy measure is adapted, which drastically
-reduces the dimension to the number of classes in the corpus. Subsequently, the
-samples of larger sized classes are grouped into a number of subclasses of
-smaller sizes to make the entire corpus balanced. Each subclass is then given a
-single symbolic vector representation by the use of interval-valued features.
-This symbolic representation, in addition to being compact, helps in reducing
-the space requirement and also the classification time. The proposed model has
-been empirically demonstrated for its superiority on benchmark datasets, viz.
-Reuters 21578 and TDT2. Further, it has been compared against several other
-existing contemporary models including a model based on support vector machines.
-The comparative analysis indicates that the proposed model outperforms the
-other existing models.
-"
-5088,1706.07913,"Harsha S. Gowda, Mahamad Suhil, D.S. Guru, and Lavanya Narayana Raju",Semi-supervised Text Categorization Using Recursive K-means Clustering,cs.LG cs.CL cs.IR," In this paper, we present a semi-supervised learning algorithm for
-classification of text documents. A method of labeling unlabeled text documents
-is presented. The presented method is based on the divide-and-conquer strategy.
-It uses a recursive K-means algorithm for partitioning both labeled and
-unlabeled data collections. The K-means algorithm is applied recursively on
-each partition until a desired level of partitioning is achieved such
-that each partition contains labeled documents of a single class. Once the
-desired clusters are obtained, the respective cluster centroids are considered
-as representatives of the clusters and the nearest neighbor rule is used for
-classifying an unknown text document. A series of experiments has been conducted
-to demonstrate the superiority of the proposed model over other recent
-state-of-the-art models on the 20Newsgroups dataset. 
-
"
-5089,1706.08032,Huy Nguyen and Minh-Le Nguyen,"A Deep Neural Architecture for Sentence-level Sentiment Classification
- in Twitter Social Networking",cs.CL," This paper introduces a novel deep learning framework including a
-lexicon-based approach for sentence-level prediction of sentiment label
-distribution. We propose to first apply semantic rules and then use a Deep
-Convolutional Neural Network (DeepCNN) for character-level embeddings in order
-to increase the information available for word-level embeddings. After that, a
-Bidirectional Long Short-Term Memory Network (Bi-LSTM) produces a sentence-wide
-feature representation from the word-level embedding. We evaluate our approach
-on three Twitter sentiment classification datasets. Experimental results show
-that our model can improve the classification accuracy of sentence-level
-sentiment analysis in Twitter social networking.
-"
-5090,1706.08160,"Shyam Upadhyay and Kai-Wei Chang and Matt Taddy and Adam Kalai and
- James Zou",Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context,cs.CL cs.LG," Word embeddings, which represent a word as a point in a vector space, have
-become ubiquitous in several NLP tasks. A recent line of work uses bilingual
-(two languages) corpora to learn a different vector for each sense of a word,
-by exploiting crosslingual signals to aid sense identification. We present a
-multi-view Bayesian non-parametric algorithm which improves multi-sense word
-embeddings by (a) using multilingual (i.e., more than two languages) corpora to
-significantly improve sense embeddings beyond what one achieves with bilingual
-information, and (b) using a principled approach to learn a variable number of
-senses per word, in a data-driven manner. Ours is the first approach with the
-ability to leverage multilingual corpora efficiently for multi-sense
-representation learning. Experiments show that multilingual training
-significantly improves performance over monolingual and bilingual training, by
-allowing us to combine different parallel corpora to leverage multilingual
-context. Multilingual training yields comparable performance to a
-state-of-the-art monolingual model trained on five times more training data.
-"
-5091,1706.08162,"Abeed Sarker, Diego Molla, Cecile Paris","Automated text summarisation and evidence-based medicine: A survey of
- two domains",cs.CL," The practice of evidence-based medicine (EBM) urges medical practitioners to
-utilise the latest research evidence when making clinical decisions. Because of
-the massive and growing volume of published research on various medical topics,
-practitioners often find themselves overloaded with information. As such,
-natural language processing research has recently commenced exploring medical
-domain-specific automated text summarisation (ATS) techniques, targeted towards
-the task of condensing large medical texts.
-However, the development of effective summarisation techniques for this task
-requires cross-domain knowledge. We present a survey of EBM, the
-domain-specific needs for EBM, automated summarisation techniques, and how they
-have been applied hitherto. We envision that this survey will serve as a first
-resource for the development of future operational text summarisation
-techniques for EBM.
-"
-5092,1706.08186,"Meng Qu, Xiang Ren, Jiawei Han",Automatic Synonym Discovery with Knowledge Bases,cs.CL," Recognizing entity synonyms from text has become a crucial task in many
-entity-leveraging applications. 
However, discovering entity synonyms from
-domain-specific text corpora (e.g., news articles, scientific papers) is rather
-challenging. Current systems take an entity name string as input to find out
-other names that are synonymous, ignoring the fact that oftentimes a name
-string can refer to multiple entities (e.g., ""apple"" could refer to both Apple
-Inc and the fruit apple). Moreover, most existing methods require training data
-manually created by domain experts to construct supervised-learning systems. In
-this paper, we study the problem of automatic synonym discovery with knowledge
-bases, that is, identifying synonyms for knowledge base entities in a given
-domain-specific corpus. The manually-curated synonyms for each entity stored in
-a knowledge base not only form a set of name strings to disambiguate the
-meaning for each other, but also can serve as ""distant"" supervision to help
-determine important features for the task. We propose a novel framework, called
-DPE, to integrate two kinds of mutually-complementing signals for synonym
-discovery, i.e., distributional features based on corpus-level statistics and
-textual patterns based on local contexts. In particular, DPE jointly optimizes
-the two kinds of signals in conjunction with distant supervision, so that they
-can mutually enhance each other in the training stage. At the inference stage,
-both signals will be utilized to discover synonyms for the given entities.
-Experimental results demonstrate the effectiveness of the proposed framework.
-"
-5093,1706.08198,"Yukio Matsumura, Takayuki Sato, Mamoru Komachi","English-Japanese Neural Machine Translation with
- Encoder-Decoder-Reconstructor",cs.CL," Neural machine translation (NMT) has recently become popular in the field of
-machine translation. However, NMT suffers from the problem of repeating or
-missing words in the translation. To address this problem, Tu et al. (2017)
-proposed an encoder-decoder-reconstructor framework for NMT using
-back-translation. In this method, they selected the best forward translation
-model in the same manner as Bahdanau et al. (2015), and then trained a
-bi-directional translation model via fine-tuning. Their experiments show that it
-offers significant improvement in BLEU scores in a Chinese-English translation
-task. We confirm that our re-implementation also shows the same tendency and
-alleviates the problem of repeating and missing words in the translation on an
-English-Japanese task as well. In addition, we evaluate the effectiveness of
-pre-training by comparing it with a jointly-trained model of forward
-translation and back-translation.
-"
-5094,1706.08476,"Tiancheng Zhao, Allen Lu, Kyusong Lee and Maxine Eskenazi","Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog
- Systems with Chatting Capability",cs.CL cs.AI," Generative encoder-decoder models offer great promise in developing
-domain-general dialog systems. However, they have mainly been applied to
-open-domain conversations. This paper presents a practical and novel framework
-for building task-oriented dialog systems based on encoder-decoder models. This
-framework enables encoder-decoder models to accomplish slot-value independent
-decision-making and interact with external databases. Moreover, this paper
-shows the flexibility of the proposed method by interleaving chatting
-capability with a slot-filling system for better out-of-domain recovery. The
-models were trained on both real-user data from a bus information system and
-human-human chat data. 
Results show that the proposed framework achieves good
-performance in both offline evaluation metrics and in task success rate with
-human users.
-"
-5095,1706.08502,"Satwik Kottur, Jos\'e M.F. Moura, Stefan Lee, Dhruv Batra",Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog,cs.CL cs.AI cs.CV," A number of recent works have proposed techniques for end-to-end learning of
-communication protocols among cooperative multi-agent populations, and have
-simultaneously found the emergence of grounded human-interpretable language in
-the protocols developed by the agents, all learned without any human
-supervision!
- In this paper, using a Task and Tell reference game between two agents as a
-testbed, we present a sequence of 'negative' results culminating in a
-'positive' one -- showing that while most agent-invented languages are
-effective (i.e. achieve near-perfect task rewards), they are decidedly not
-interpretable or compositional.
- In essence, we find that natural language does not emerge 'naturally',
-despite the semblance of ease of natural-language-emergence that one may gather
-from recent literature. We discuss how it is possible to coax the invented
-languages to become more and more human-like and compositional by increasing
-restrictions on how two agents may communicate.
-"
-5096,1706.08568,"Georg Wiese, Dirk Weissenborn, Mariana Neves",Neural Question Answering at BioASQ 5B,cs.CL cs.AI cs.NE," This paper describes our submission to the 2017 BioASQ challenge. We
-participated in Task B, Phase B, which is concerned with biomedical question
-answering (QA). We focus on factoid and list questions, using an extractive QA
-model, that is, we restrict our system to output substrings of the provided
-text snippets. At the core of our system, we use FastQA, a state-of-the-art
-neural QA system. We extended it with biomedical word embeddings and changed
-its answer layer to be able to answer list questions in addition to factoid
-questions. We pre-trained the model on a large-scale open-domain QA dataset,
-SQuAD, and then fine-tuned the parameters on the BioASQ training set. With our
-approach, we achieve state-of-the-art results on factoid questions and
-competitive results on list questions.
-"
-5097,1706.08609,"Artemy Kolchinsky, Nakul Dhande, Kengjeun Park, Yong-Yeol Ahn","The Minor Fall, the Major Lift: Inferring Emotional Valence of Musical
- Chords through Lyrics",cs.CL cs.SD," We investigate the association between musical chords and lyrics by analyzing
-a large dataset of user-contributed guitar tablatures. Motivated by the idea
-that the emotional content of chords is reflected in the words used in
-corresponding lyrics, we analyze associations between lyrics and chord
-categories. We also examine the usage patterns of chords and lyrics in
-different musical genres, historical eras, and geographical regions. Our
-overall results confirm a previously known association between Major chords
-and positive valence. We also report a wide variation in this association
-across regions, genres, and eras. Our results suggest the possible existence of
-different emotional associations for other types of chords.
-"
-5098,1706.08683,Shiyue Zhang and Gulnigar Mahmut and Dong Wang and Askar Hamdulla,Memory-augmented Chinese-Uyghur Neural Machine Translation,cs.CL," Neural machine translation (NMT) has achieved notable performance recently. 
-
However, this approach has not been widely applied to the translation task
-between Chinese and Uyghur, partly due to the limited parallel data resource
-and the large proportion of rare words caused by the agglutinative nature of
-Uyghur. In this paper, we collect ~200,000 sentence pairs and show that with
-this medium-scale database, an attention-based NMT can perform very well on
-Chinese-Uyghur/Uyghur-Chinese translation. To tackle rare words, we propose a
-novel memory structure to assist the NMT inference. Our experiments
-demonstrated that the memory-augmented NMT (M-NMT) outperforms both the vanilla
-NMT and the phrase-based statistical machine translation (SMT). Interestingly,
-the memory structure provides an elegant way for dealing with words that are
-out of vocabulary.
-"
-5099,1706.08746,"Andrew Yates, Kai Hui",DE-PACRR: Exploring Layers Inside the PACRR Model,cs.IR cs.CL," Recent neural IR models have demonstrated deep learning's utility in ad-hoc
-information retrieval. However, deep models have a reputation for being black
-boxes, and the roles of a neural IR model's components may not be obvious at
-first glance. In this work, we attempt to shed light on the inner workings of a
-recently proposed neural IR model, namely the PACRR model, by visualizing the
-output of intermediate layers and by investigating the relationship between
-intermediate weights and the ultimate relevance score produced. We highlight
-several insights, hoping that such insights will be generally applicable.
-"
-5100,1706.09031,"Ryan Cotterell, Christo Kirov, John Sylak-Glassman, G\'eraldine
- Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra K\""ubler,
- David Yarowsky, Jason Eisner and Mans Hulden","CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection
- in 52 Languages",cs.CL," The CoNLL-SIGMORPHON 2017 shared task on supervised morphological generation
-required systems to be trained and tested in each of 52 typologically diverse
-languages. In sub-task 1, submitted systems were asked to predict a specific
-inflected form of a given lemma. In sub-task 2, systems were given a lemma and
-some of its specific inflected forms, and asked to complete the inflectional
-paradigm by predicting all of the remaining inflected forms. Both sub-tasks
-included high, medium, and low-resource conditions. Sub-task 1 received 24
-system submissions, while sub-task 2 received 3 system submissions. Following
-the success of neural sequence-to-sequence models in the SIGMORPHON 2016 shared
-task, all but one of the submissions included a neural component. The results
-show that high performance can be achieved with small training datasets, so
-long as models have appropriate inductive bias or make use of additional
-unlabeled data or synthetic data. However, different biasing and data
-augmentation resulted in disjoint sets of inflected forms being predicted
-correctly, suggesting that there is room for future improvement.
-"
-5101,1706.09055,"Christopher Dane Shulby, Martha Dais Ferreira, Rodrigo F. de Mello,
- Sandra Maria Aluisio",Acoustic Modeling Using a Shallow CNN-HTSVM Architecture,cs.SD cs.CL," High-accuracy speech recognition is especially challenging when large
-datasets are not available. It is possible to bridge this gap with careful and
-knowledge-driven parsing combined with the biologically inspired CNN and the
-learning guarantees of the Vapnik-Chervonenkis (VC) theory. 
This work presents
-a Shallow-CNN-HTSVM (Hierarchical Tree Support Vector Machine classifier)
-architecture which uses a predefined knowledge-based set of rules with
-statistical machine learning techniques. Here we show that gross errors present
-even in state-of-the-art systems can be avoided and that an accurate acoustic
-model can be built in a hierarchical fashion. The CNN-HTSVM acoustic model
-outperforms traditional GMM-HMM models and the HTSVM structure outperforms an
-MLP multi-class classifier. More importantly, we isolate the performance of the
-acoustic model and provide results at both the frame and phoneme levels,
-reflecting the true robustness of the model. We show that even with a small
-amount of data, accurate and robust recognition rates can be obtained.
-"
-5102,1706.09147,"Yotam Eshel, Noam Cohen, Kira Radinsky, Shaul Markovitch, Ikuya
- Yamada, Omer Levy",Named Entity Disambiguation for Noisy Text,cs.CL," We address the task of Named Entity Disambiguation (NED) for noisy text. We
-present WikilinksNED, a large-scale NED dataset of text fragments from the web,
-which is significantly noisier and more challenging than existing news-based
-datasets. To capture the limited and noisy local context surrounding each
-mention, we design a neural model and train it with a novel method for sampling
-informative negative examples. We also describe a new way of initializing word
-and entity embeddings that significantly improves performance. Our model
-significantly outperforms existing state-of-the-art methods on WikilinksNED
-while achieving comparable performance on a smaller newswire dataset.
-"
-5103,1706.09254,"Jekaterina Novikova, Ond\v{r}ej Du\v{s}ek, Verena Rieser",The E2E Dataset: New Challenges For End-to-End Generation,cs.CL," This paper describes the E2E data, a new dataset for training end-to-end,
-data-driven natural language generation systems in the restaurant domain, which
-is ten times bigger than existing, frequently used datasets in this area. The
-E2E dataset poses new challenges: (1) its human reference texts show more
-lexical richness and syntactic variation, including discourse phenomena; (2)
-generating from this set requires content selection. As such, learning from
-this dataset promises more natural, varied and less template-like system
-utterances. We also establish a baseline on this dataset, which illustrates
-some of the difficulties associated with this data.
-"
-5104,1706.09335,"Gaurush Hiranandani, Pranav Maneriker, Harsh Jhamtani",Generating Appealing Brand Names,cs.CL," Providing appealing brand names to newly launched products, newly formed
-companies or for renaming existing companies is highly important as it can play
-a crucial role in deciding their success or failure. In this work, we propose a
-computational method to generate appealing brand names based on the description
-of such entities. We use quantitative scores for readability, pronounceability,
-memorability and uniqueness of the generated names to rank order them. A set of
-diverse appealing names is recommended to the user for the brand naming task.
-Experimental results show that the names generated by our approach are more
-appealing than names which prior approaches and recruited humans could come up
-with. 
-
"
-5105,1706.09433,"Jekaterina Novikova, Ond\v{r}ej Du\v{s}ek and Verena Rieser",Data-driven Natural Language Generation: Paving the Road to Success,cs.CL," We argue that there are currently two major bottlenecks to the commercial use
-of statistical machine learning approaches for natural language generation
-(NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b)
-The scarcity of high-quality in-domain corpora. We address the first problem by
-thoroughly analysing current evaluation metrics and motivating the need for a
-new, more reliable metric. The second problem is addressed by presenting a
-novel framework for developing and evaluating a high-quality corpus for NLG
-training.
-"
-5106,1706.09453,Liang Lu,"Toward Computation and Memory Efficient Neural Network Acoustic Models
- with Binary Weights and Activations",cs.CL," Neural network acoustic models have significantly advanced state-of-the-art
-speech recognition over the past few years. However, they are usually
-computationally expensive due to the large number of matrix-vector
-multiplications and nonlinearity operations. Neural network models also require
-significant amounts of memory for inference because of the large model size.
-For these two reasons, it is challenging to deploy neural network based speech
-recognizers on resource-constrained platforms such as embedded devices. This
-paper investigates the use of binary weights and activations for computation
-and memory efficient neural network acoustic models. Compared to real-valued
-weight matrices, binary weights require far fewer bits for storage, thereby
-cutting down the memory footprint. Furthermore, with binary weights or
-activations, the matrix-vector multiplications are turned into addition and
-subtraction operations, which are computationally much faster and more energy
-efficient for hardware platforms. In this paper, we study the applications of
-binary weights and activations for neural network acoustic modeling, reporting
-encouraging results on the WSJ and AMI corpora.
-"
-5107,1706.09528,"Swabha Swayamdipta, Sam Thomson, Chris Dyer, Noah A. Smith","Frame-Semantic Parsing with Softmax-Margin Segmental RNNs and a
- Syntactic Scaffold",cs.CL," We present a new, efficient frame-semantic parser that labels semantic
-arguments to FrameNet predicates. Built using an extension to the segmental RNN
-that emphasizes recall, our basic system achieves competitive performance
-without any calls to a syntactic parser. We then introduce a method that uses
-phrase-syntactic annotations from the Penn Treebank during training only,
-through a multitask objective; no parsing is required at training or test time.
-This ""syntactic scaffold"" offers a cheaper alternative to traditional syntactic
-pipelining, and achieves state-of-the-art performance.
-"
-5108,1706.09562,"Francis Ferraro, Adam Poliak, Ryan Cotterell, Benjamin Van Durme","Frame-Based Continuous Lexical Semantics through Exponential Family
- Tensor Factorization and Semantic Proto-Roles",cs.CL," We study how different frame annotations complement one another when learning
-continuous lexical semantics. We learn the representations from a tensorized
-skip-gram model that consistently encodes syntactic-semantic content better,
-with multiple 10% gains over baselines.
-"
-5109,1706.09569,"Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi","Recurrent neural networks with specialized word embeddings for
- health-domain named-entity recognition",cs.CL," Background. 
Previous state-of-the-art systems on Drug Name Recognition (DNR)
-and Clinical Concept Extraction (CCE) have focused on a combination of text
-""feature engineering"" and conventional machine learning algorithms such as
-conditional random fields and support vector machines. However, developing good
-features is inherently time-consuming. Conversely, more modern machine
-learning approaches such as recurrent neural networks (RNNs) have proved
-capable of automatically learning effective features from either random
-assignments or automated word ""embeddings"". Objectives. (i) To create a highly
-accurate DNR and CCE system that avoids conventional, time-consuming feature
-engineering. (ii) To create richer, more specialized word embeddings by using
-health domain datasets such as MIMIC-III. (iii) To evaluate our systems over
-three contemporary datasets. Methods. Two deep learning methods, namely the
-Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model
-is set as the baseline to compare the deep learning systems to a traditional
-machine learning approach. The same features are used for all the models.
-Results. We have obtained the best results with the Bidirectional LSTM-CRF
-model, which has outperformed all previously proposed systems. The specialized
-embeddings have helped to cover unusual words in DDI-DrugBank and DDI-MedLine,
-but not in the 2010 i2b2/VA IRB Revision dataset. Conclusion. We present a
-state-of-the-art system for DNR and CCE. Automated word embeddings have allowed
-us to avoid costly feature engineering and achieve higher accuracy.
-Nevertheless, the embeddings need to be retrained over datasets that are
-adequate for the domain, in order to adequately cover the domain-specific
-vocabulary.
-"
-5110,1706.09588,"Naoya Takahashi, Yuki Mitsufuji",Multi-scale Multi-band DenseNets for Audio Source Separation,cs.SD cs.CL cs.MM," This paper deals with the problem of audio source separation. To handle the
-complex and ill-posed nature of this problem, the
-current state-of-the-art approaches employ deep neural networks to obtain
-instrumental spectra from a mixture. In this study, we propose a novel network
-architecture that extends the recently developed densely connected
-convolutional network (DenseNet), which has shown excellent results on image
-classification tasks. To deal with the specific problem of audio source
-separation, an up-sampling layer, block skip connection and band-dedicated
-dense blocks are incorporated on top of DenseNet. The proposed approach takes
-advantage of long contextual information and outperforms state-of-the-art
-results on the SiSEC 2016 competition by a large margin in terms of
-signal-to-distortion ratio. Moreover, the proposed architecture requires
-significantly fewer parameters and considerably less training time compared
-with other methods.
-"
-5111,1706.09673,Ganesh J,Improving Distributed Representations of Tweets - Present and Future,cs.CL," Unsupervised representation learning for tweets is an important research
-field which helps in several business applications such as sentiment
-analysis, hashtag prediction, paraphrase detection and microblog ranking. A
-good tweet representation learning model must handle the idiosyncratic nature
-of tweets which poses several challenges such as short length, informal words,
-unusual grammar and misspellings. However, there is a lack of prior work which
-surveys the representation learning models with a focus on tweets. 
In this
-work, we organize the models based on their objective functions, which aids the
-understanding of the literature. We also provide interesting future directions,
-which we believe are fruitful in advancing this field by building high-quality
-tweet representation learning models.
-"
-5112,1706.09733,Michael Denkowski and Graham Neubig,Stronger Baselines for Trustable Results in Neural Machine Translation,cs.CL," Interest in neural machine translation has grown rapidly as its effectiveness
-has been demonstrated across language and data scenarios. New research
-regularly introduces architectural and algorithmic improvements that lead to
-significant gains over ""vanilla"" NMT implementations. However, these new
-techniques are rarely evaluated in the context of previously published
-techniques, specifically those that are widely used in state-of-the-art
-production and shared-task systems. As a result, it is often difficult to
-determine whether improvements from research will carry over to systems
-deployed for real-world use. In this work, we recommend three specific methods
-that are relatively easy to implement and result in much stronger experimental
-systems. Beyond reporting significantly higher BLEU scores, we conduct an
-in-depth analysis of where improvements originate and what inherent weaknesses
-of basic NMT models are being addressed. We then compare the relative gains
-afforded by several other techniques proposed in the literature when starting
-with vanilla systems versus our stronger baselines, showing that experimental
-conclusions may change depending on the baseline chosen. This indicates that
-choosing a strong baseline is crucial for reporting reliable experimental
-results.
-"
-5113,1706.09742,"Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen","AP17-OLR Challenge: Data, Plan, and Baseline",cs.CL," We present the data profile and the evaluation plan of the second oriental
-language recognition (OLR) challenge AP17-OLR. Compared to the event last year
-(AP16-OLR), the new challenge involves more languages and focuses more on short
-utterances. The data is offered by SpeechOcean and the NSFC M2ASR project. Two
-types of baselines are constructed to assist the participants: one is based on
-the i-vector model and the other on various neural networks. We report
-the baseline results evaluated with various metrics defined by the AP17-OLR
-evaluation plan and demonstrate that the combined database is a reasonable data
-resource for multilingual research. All the data is free for participants, and
-the Kaldi recipes for the baselines have been published online.
-"
-5114,1706.09789,"David Golub, Po-Sen Huang, Xiaodong He, Li Deng","Two-Stage Synthesis Networks for Transfer Learning in Machine
- Comprehension",cs.CL," We develop a technique for transfer learning in machine comprehension (MC)
-using a novel two-stage synthesis network (SynNet). Given a high-performing MC
-model in one domain, our technique aims to answer questions about documents in
-another domain, where we use no labeled data of question-answer pairs. Using
-the proposed SynNet with a pretrained model from the SQuAD dataset on the
-challenging NewsQA dataset, we achieve an F1 measure of 44.3% with a single
-model and 46.6% with an ensemble, approaching the performance of in-domain
-models (F1 measure of 50.0%) and outperforming the out-of-domain baseline of
-7.6%, without use of provided annotations. 
-
"
-5115,1706.09799,"Shikhar Sharma, Layla El Asri, Hannes Schulz, Jeremie Zumer","Relevance of Unsupervised Metrics in Task-Oriented Dialogue for
- Evaluating Natural Language Generation",cs.CL," Automated metrics such as BLEU are widely used in the machine translation
-literature. They have also been used recently in the dialogue community for
-evaluating dialogue response generation. However, previous work in dialogue
-response generation has shown that these metrics do not correlate strongly with
-human judgment in the non-task-oriented dialogue setting. Task-oriented
-dialogue responses are expressed on narrower domains and exhibit lower
-diversity. It is thus reasonable to think that these automated metrics would
-correlate well with human judgment in the task-oriented setting where the
-generation task consists of translating dialogue acts into a sentence. We
-conduct an empirical study to confirm whether this is the case. Our findings
-indicate that these automated metrics have stronger correlation with human
-judgments in the task-oriented setting compared to what has been observed in
-the non-task-oriented setting. We also observe that these metrics correlate
-even better for datasets which provide multiple ground truth reference
-sentences. In addition, we show that some of the currently available corpora
-for task-oriented language generation can be solved with simple models and
-advocate for more challenging datasets.
-"
-5116,1706.09856,Majid Laali and Leila Kosseim,"Automatic Mapping of French Discourse Connectives to PDTB Discourse
- Relations",cs.CL," In this paper, we present an approach to exploit phrase tables generated by
-statistical machine translation in order to map French discourse connectives to
-discourse relations. Using this approach, we created ConcoLeDisCo, a lexicon of
-French discourse connectives and their PDTB relations. When evaluated against
-LEXCONN, ConcoLeDisCo achieves a recall of 0.81 and an Average Precision of
-0.68 for the Concession and Condition relations.
-"
-5117,1706.10006,"Konstantinos Drossos, Sharath Adavanne, Tuomas Virtanen",Automated Audio Captioning with Recurrent Neural Networks,cs.SD cs.CL cs.LG," We present the first approach to automated audio captioning. We employ an
-encoder-decoder scheme with an alignment model in between. The input to the
-encoder is a sequence of log mel-band energies calculated from an audio file,
-while the output is a sequence of words, i.e. a caption. The encoder is a
-multi-layered, bi-directional gated recurrent unit (GRU) and the decoder a
-multi-layered GRU with a classification layer connected to the last GRU of the
-decoder. The classification layer and the alignment model are fully connected
-layers with shared weights between timesteps. The proposed method is evaluated
-using data drawn from a commercial sound effects library, ProSound Effects. The
-resulting captions were rated through metrics utilized in the machine
-translation and image captioning fields. Results from metrics show that the
-proposed method can predict words appearing in the original caption, though not
-always in the correct order.
-"
-5118,1706.10192,"Kai Hui, Andrew Yates, Klaus Berberich, Gerard de Melo",Co-PACRR: A Context-Aware Neural IR Model for Ad-hoc Retrieval,cs.IR cs.CL," Neural IR models, such as DRMM and PACRR, have achieved strong results by
-successfully capturing relevance matching signals. We argue that the context of
-these matching signals is also important. 
Intuitively, when extracting,
-modeling, and combining matching signals, one would like to consider the
-surrounding text (local context) as well as other signals from the same
-document that can contribute to the overall relevance score. In this work, we
-highlight three potential shortcomings caused by not considering context
-information and propose three neural ingredients to address them: a
-disambiguation component, cascade k-max pooling, and a shuffling combination
-layer. Incorporating these components into the PACRR model yields Co-PACRR, a
-novel context-aware neural IR model. Extensive comparisons with established
-models on TREC Web Track data confirm that the proposed model can achieve
-superior search results. In addition, an ablation analysis is conducted to gain
-insights into the impact of and interactions between different components. We
-release our code to enable future comparisons.
-"
-5119,1707.00061,"Su Lin Blodgett, Brendan O'Connor","Racial Disparity in Natural Language Processing: A Case Study of Social
- Media African-American English",cs.CY cs.CL," We highlight an important frontier in algorithmic fairness: disparity in the
-quality of natural language processing algorithms when applied to language from
-authors of different social groups. For example, current systems sometimes
-analyze the language of females and minorities more poorly than they do that of
-whites and males. We conduct an empirical analysis of racial disparity in
-language identification for tweets written in African-American English, and
-discuss implications of disparity in NLP.
-"
-5120,1707.00079,Hany Hassan and Mostafa Elaraby and Ahmed Tawfik,Synthetic Data for Neural Machine Translation of Spoken-Dialects,cs.CL," In this paper, we introduce a novel approach to generate synthetic data for
-training Neural Machine Translation systems. The proposed approach transforms a
-given parallel corpus between a written language and a target language to a
-parallel corpus between a spoken dialect variant and the target language. Our
-approach is language-independent and can be used to generate data for any
-variant of the source language such as slang or spoken dialect or even for a
-different language that is closely related to the source language.
- The proposed approach is based on local embedding projection of distributed
-representations which utilizes monolingual embeddings to transform parallel
-data across language variants. We report experimental results on Levantine to
-English translation using Neural Machine Translation. We show that the
-generated data can improve a very large-scale system by more than 2.8 BLEU
-points using synthetic spoken data which shows that it can be used to provide a
-reliable translation system for a spoken dialect that does not have sufficient
-parallel data.
-"
-5121,1707.00110,"Denny Britz, Melody Y. Guan, Minh-Thang Luong",Efficient Attention using a Fixed-Size Memory Representation,cs.CL," The standard content-based attention mechanism typically used in
-sequence-to-sequence models is computationally expensive as it requires the
-comparison of large encoder and decoder states at each time step. In this work,
-we propose an alternative attention mechanism based on a fixed size memory
-representation that is more efficient. Our technique predicts a compact set of
-K attention contexts during encoding and lets the decoder compute an efficient
-lookup that does not need to consult the memory. 
We show that our approach
-performs on par with the standard attention mechanism while yielding inference
-speedups of 20% for real-world translation tasks and more for tasks with longer
-sequences. By visualizing attention scores we demonstrate that our models learn
-distinct, meaningful alignments.
-"
-5122,1707.00117,"Wenbo Hu, Lifeng Hua, Lei Li, Hang Su, Tian Wang, Ning Chen, Bo Zhang","SAM: Semantic Attribute Modulation for Language Modeling and Style
- Variation",cs.CL cs.LG stat.ML," This paper presents a Semantic Attribute Modulation (SAM) for language
-modeling and style variation. The semantic attribute modulation includes
-various document attributes, such as titles, authors, and document categories.
-We consider two types of attributes (title attributes and category
-attributes) and a flexible attribute selection scheme that automatically scores
-them via an attribute attention mechanism. The semantic attributes are embedded
-into the hidden semantic space as the generation inputs. With the attributes
-properly harnessed, our proposed SAM can generate interpretable texts with
-regard to the input attributes. Qualitative analysis, including word semantic
-analysis and attention values, shows the interpretability of SAM. On several
-typical text datasets, we empirically demonstrate the superiority of the
-Semantic Attribute Modulated language model with different combinations of
-document attributes. Moreover, we present a style variation for lyric
-generation using SAM, which shows a strong connection between the style
-variation and the semantic attributes.
-"
-5123,1707.00130,"Pei-Hao Su, Pawel Budzianowski, Stefan Ultes, Milica Gasic, and Steve
- Young","Sample-efficient Actor-Critic Reinforcement Learning with Supervised
- Data for Dialogue Management",cs.CL cs.AI cs.LG," Deep reinforcement learning (RL) methods have significant potential for
-dialogue policy optimisation. However, they suffer from poor performance in
-the early stages of learning. This is especially problematic for on-line
-learning with real users. Two approaches are introduced to tackle this problem.
-Firstly, to speed up the learning process, two sample-efficient neural network
-algorithms are presented: trust region actor-critic with experience replay
-(TRACER) and episodic natural actor-critic with experience replay (eNACER).
-For TRACER, the trust region helps to control the learning step size and avoid
-catastrophic model changes. For eNACER, the natural gradient identifies the
-steepest ascent direction in policy space to speed up the convergence. Both
-models employ off-policy learning with experience replay to improve
-sample-efficiency. Secondly, to mitigate the cold start issue, a corpus of
-demonstration data is utilised to pre-train the models prior to on-line
-reinforcement learning. Combining these two approaches, we present a
-practical approach to learning deep RL-based dialogue policies and demonstrate
-their effectiveness in a task-oriented information-seeking domain.
-"
-5124,1707.00166,"Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji and Jiawei
- Han","Heterogeneous Supervision for Relation Extraction: A Representation
- Learning Approach",cs.CL," Relation extraction is a fundamental task in information extraction. Most
-existing methods rely heavily on annotations labeled by human experts,
-which are costly and time-consuming. 
To overcome this drawback, we propose a
-novel framework, REHession, to conduct relation extractor learning using
-annotations from heterogeneous information sources, e.g., knowledge bases and
-domain heuristics. These annotations, referred to as heterogeneous supervision,
-often conflict with each other, which brings a new challenge to the original
-relation extraction task: how to infer the true label from noisy labels for a
-given instance. Identifying context information as the backbone of both
-relation extraction and true label discovery, we adopt embedding techniques to
-learn the distributed representations of context, which bridges all components
-with mutual enhancement in an iterative fashion. Extensive experimental results
-demonstrate the superiority of REHession over the state-of-the-art.
-"
-5125,1707.00189,"Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder",Content-Based Weak Supervision for Ad-Hoc Re-Ranking,cs.IR cs.CL," One challenge with neural ranking is the need for a large amount of
-manually-labeled relevance judgments for training. In contrast with prior work,
-we examine the use of weak supervision sources for training that yield pseudo
-query-document pairs that already exhibit relevance (e.g., newswire
-headline-content pairs and encyclopedic heading-paragraph pairs). We also
-propose filtering techniques to eliminate training samples that are too far out
-of domain using two techniques: a heuristic-based approach and a novel
-supervised filter that re-purposes a neural ranker. Using several leading
-neural ranking architectures and multiple weak supervision datasets, we show
-that these sources of training pairs are effective on their own (outperforming
-prior weak supervision techniques), and that filtering can further improve
-performance.
-"
-5126,1707.00201,"Ziteng Wang, Emmanuel Vincent, Romain Serizel, Yonghong Yan","Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in
- Noisy Environments",cs.SD cs.CL," Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and
-the Generalized Eigenvalue (GEV) beamformer, are popular signal processing
-techniques which can improve speech recognition performance. In this paper, we
-present an experimental study on these linear filters in a specific speech
-recognition task, namely the CHiME-4 challenge, which features real recordings
-in multiple noisy environments. Specifically, the rank-1 MWF is employed for
-noise reduction and a new constant residual noise power constraint is derived
-which enhances the recognition performance. To fulfill the underlying rank-1
-assumption, the speech covariance matrix is reconstructed based on eigenvectors
-or generalized eigenvectors. Then the rank-1 constrained MWF is evaluated with
-alternative multichannel linear filters under the same framework, which
-involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask
-estimation. The proposed filter outperforms alternative ones, leading to a 40%
-relative Word Error Rate (WER) reduction compared with the baseline Weighted
-Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER
-reduction compared with the GEV-BAN method. The results also suggest that the
-speech recognition accuracy correlates more with the Mel-frequency cepstral
-coefficients (MFCC) feature variance than with the noise reduction or the
-speech distortion level.
-"
-5127,1707.00206,"Junxian He, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, Eric P.
- Xing",Efficient Correlated Topic Modeling with Topic Embedding,cs.LG cs.CL stat.ML," Correlated topic modeling has been limited to small model and problem sizes -due to their high computational cost and poor scaling. In this paper, we -propose a new model which learns compact topic embeddings and captures topic -correlations through the closeness between the topic vectors. Our method -enables efficient inference in the low-dimensional embedding space, reducing -previous cubic or quadratic time complexity to linear w.r.t the topic size. We -further speedup variational inference with a fast sampler to exploit sparsity -of topic occurrence. Extensive experiments show that our approach is capable of -handling model and data scales which are several orders of magnitude larger -than existing correlation results, without sacrificing modeling quality by -providing competitive or superior performance in document classification and -retrieval. -" -5128,1707.00248,"Xinchi Chen, Zhan Shi, Xipeng Qiu, Xuanjing Huang",DAG-based Long Short-Term Memory for Neural Word Segmentation,cs.CL," Neural word segmentation has attracted more and more research interests for -its ability to alleviate the effort of feature engineering and utilize the -external resource by the pre-trained character or word embeddings. In this -paper, we propose a new neural model to incorporate the word-level information -for Chinese word segmentation. Unlike the previous word-based models, our model -still adopts the framework of character-based sequence labeling, which has -advantages on both effectiveness and efficiency at the inference stage. To -utilize the word-level information, we also propose a new long short-term -memory (LSTM) architecture over directed acyclic graph (DAG). Experimental -results demonstrate that our model leads to better performances than the -baseline models. -" -5129,1707.00299,"Keisuke Sakaguchi, Matt Post, Benjamin Van Durme",Grammatical Error Correction with Neural Reinforcement Learning,cs.CL," We propose a neural encoder-decoder model with reinforcement learning (NRL) -for grammatical error correction (GEC). Unlike conventional maximum likelihood -estimation (MLE), the model directly optimizes towards an objective that -considers a sentence-level, task-specific evaluation metric, avoiding the -exposure bias issue in MLE. We demonstrate that NRL outperforms MLE both in -human and automated evaluation metrics, achieving the state-of-the-art on a -fluency-oriented GEC corpus. -" -5130,1707.00621,"Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Liviu P. Dinu",Including Dialects and Language Varieties in Author Profiling,cs.CL," This paper presents a computational approach to author profiling taking -gender and language variety into account. We apply an ensemble system with the -output of multiple linear SVM classifiers trained on character and word -$n$-grams. We evaluate the system using the dataset provided by the organizers -of the 2017 PAN lab on author profiling. Our approach achieved 75% average -accuracy on gender identification on tweets written in four languages and 97% -accuracy on language variety identification for Portuguese. -" -5131,1707.00683,"Harm de Vries, Florian Strub, J\'er\'emie Mary, Hugo Larochelle, - Olivier Pietquin, Aaron Courville",Modulating early visual processing by language,cs.CV cs.CL cs.LG," It is commonly assumed that language refers to high-level visual concepts -while leaving low-level visual processing unaffected. 
This view dominates the
-current literature in computational models for language-vision tasks, where
-visual and linguistic input are mostly processed independently before being
-fused into a single representation. In this paper, we deviate from this classic
-pipeline and propose to modulate the \emph{entire visual processing} by
-linguistic input. Specifically, we condition the batch normalization parameters
-of a pretrained residual network (ResNet) on a language embedding. This
-approach, which we call MOdulated RESnet (MODERN), significantly improves strong
-baselines on two visual question answering tasks. Our ablation study shows that
-modulating from the early stages of the visual processing is beneficial.
-"
-5132,1707.00722,Jayadev Billa,"Improving LSTM-CTC based ASR performance in domains with limited
- training data",cs.CL," This paper addresses the observed performance gap between automatic speech
-recognition (ASR) systems based on Long Short Term Memory (LSTM) neural
-networks trained with the connectionist temporal classification (CTC) loss
-function and systems based on hybrid Deep Neural Networks (DNNs) trained with
-the cross entropy (CE) loss function on domains with limited data. We step
-through a number of experiments that show incremental improvements on a
-baseline EESEN toolkit based LSTM-CTC ASR system trained on the Librispeech
-100hr (train-clean-100) corpus. Our results show that with an effective
-combination of data augmentation and regularization, an LSTM-CTC based system
-can exceed the performance of a strong Kaldi based baseline trained on the same
-data.
-"
-5133,1707.00781,"Bruno Gon\c{c}alves, Luc\'ia Loureiro-Porto, Jos\'e J. Ramasco, David
- S\'anchez",Mapping the Americanization of English in Space and Time,cs.CL cond-mat.stat-mech cs.CY physics.soc-ph stat.AP," As global political preeminence gradually shifted from the United Kingdom to
-the United States, so did the capacity to culturally influence the rest of the
-world. In this work, we analyze how the world-wide varieties of written English
-are evolving. We study both the spatial and temporal variations of vocabulary
-and spelling of English using a large corpus of geolocated tweets and the
-Google Books datasets corresponding to books published in the US and the UK.
-The advantage of our approach is that we can address both standard written
-language (Google Books) and the more colloquial forms of microblogging messages
-(Twitter). We find that American English is the dominant form of English
-outside the UK and that its influence is felt even within the UK borders.
-Finally, we analyze how this trend has evolved over time and the impact that
-some cultural events have had in shaping it.
-"
-5134,1707.00836,"Kyung-Min Kim, Min-Oh Heo, Seong-Ho Choi, and Byoung-Tak Zhang",DeepStory: Video Story QA by Deep Embedded Memory Networks,cs.CV cs.AI cs.CL," Question-answering (QA) on video content is a significant challenge for
-achieving human-level intelligence as it involves both vision and language in
-real-world settings. Here we demonstrate the possibility of an AI agent
-performing video story QA by learning from a large number of cartoon videos. We
-develop a video-story learning model, i.e. Deep Embedded Memory Networks
-(DEMN), to reconstruct stories from a joint scene-dialogue video stream using a
-latent embedding space of observed data. The video stories are stored in a
-long-term memory component.
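-Conditioning batch-normalization parameters on a language embedding, as in
-the MODERN model above, can be sketched as follows; this PyTorch snippet uses
-illustrative names and shapes, not the authors' code:
-
-import torch
-import torch.nn as nn
-
-class ConditionalBatchNorm2d(nn.Module):
-    # A parameter-free batch norm whose per-channel scale and shift are
-    # predicted from the language embedding instead of being learned
-    # directly.
-    def __init__(self, channels, embed_dim):
-        super().__init__()
-        self.bn = nn.BatchNorm2d(channels, affine=False)
-        self.scale = nn.Linear(embed_dim, channels)
-        self.shift = nn.Linear(embed_dim, channels)
-
-    def forward(self, x, lang_emb):
-        # x: (N, C, H, W) feature maps; lang_emb: (N, embed_dim)
-        gamma = self.scale(lang_emb).unsqueeze(-1).unsqueeze(-1)
-        beta = self.shift(lang_emb).unsqueeze(-1).unsqueeze(-1)
-        return (1 + gamma) * self.bn(x) + beta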
For a given question, an LSTM-based attention model
-uses the long-term memory to recall the best question-story-answer triplet by
-focusing on specific words containing key information. We trained the DEMN on a
-novel QA dataset of children's cartoon video series, Pororo. The dataset
-contains 16,066 scene-dialogue pairs of 20.5-hour videos, 27,328 fine-grained
-sentences for scene description, and 8,913 story-related QA pairs. Our
-experimental results show that the DEMN outperforms other QA models. This is
-mainly due to 1) the reconstruction of video stories in a scene-dialogue
-combined form that utilizes the latent embedding and 2) attention. DEMN also
-achieved state-of-the-art results on the MovieQA benchmark.
-"
-5135,1707.00896,Nikolaos Pappas and Andrei Popescu-Belis,Multilingual Hierarchical Attention Networks for Document Classification,cs.CL," Hierarchical attention networks have recently achieved remarkable performance
-for document classification in a given language. However, when multilingual
-document collections are considered, training such models separately for each
-language entails linear parameter growth and lack of cross-language transfer.
-Learning a single multilingual model with fewer parameters is therefore a
-challenging but potentially beneficial objective. To this end, we propose
-multilingual hierarchical attention networks for learning document structures,
-with shared encoders and/or shared attention mechanisms across languages, using
-multi-task learning and an aligned semantic space as input. We evaluate the
-proposed models on multilingual document classification with disjoint label
-sets, on a large dataset which we provide, with 600k news documents in 8
-languages, and 5k labels. The multilingual models outperform monolingual ones
-in low-resource as well as full-resource settings, and use fewer parameters,
-thus confirming their computational efficiency and the utility of
-cross-language transfer.
-"
-5136,1707.00995,"Jean-Benoit Delbrouck, St\'ephane Dupont","An empirical study on the effectiveness of images in Multimodal Neural
- Machine Translation",cs.CL," In state-of-the-art Neural Machine Translation (NMT), an attention mechanism
-is used during decoding to enhance the translation. At every step, the decoder
-uses this mechanism to focus on different parts of the source sentence to
-gather the most useful information before outputting its target word. Recently,
-the effectiveness of the attention mechanism has also been explored for
-multimodal tasks, where it becomes possible to focus both on sentence parts and
-image regions that they describe. In this paper, we compare several attention
-mechanisms on the multimodal translation task (English, image to German) and
-evaluate the ability of the model to make use of images to improve translation.
-We surpass state-of-the-art scores on the Multi30k data set; we nevertheless
-identify and report different misbehaviors of the machine while translating.
-"
-5137,1707.01009,"Jean-Benoit Delbrouck, St\'ephane Dupont, Omar Seddati","Visually Grounded Word Embeddings and Richer Visual Features for
- Improving Multimodal Neural Machine Translation",cs.CL," In Multimodal Neural Machine Translation (MNMT), a neural model generates a
-translated sentence that describes an image, given the image itself and one
-source description in English. This is considered the multimodal image
-caption translation task.
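-The block shared across languages in the multilingual hierarchical attention
-networks above is an attention-pooling layer applied first over words and then
-over sentences; a minimal numpy sketch with the learned context vector assumed
-given:
-
-import numpy as np
-
-def attention_pool(states, context):
-    # states: (n, d) word or sentence encodings; context: (d,) learned
-    # query vector, shareable across languages in the multilingual setup.
-    logits = states @ context
-    weights = np.exp(logits - logits.max())
-    weights /= weights.sum()
-    return weights @ states  # (d,) attention-weighted summary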
The images are processed with a Convolutional Neural
-Network (CNN) to extract visual features exploitable by the translation model.
-So far, the CNNs used are pre-trained on an object detection and localization
-task. We hypothesize that richer architectures, such as dense captioning
-models, may be more suitable for MNMT and could lead to improved translations.
-We extend this intuition to the word embeddings, where we compute both
-linguistic and visual representations for our corpus vocabulary. We combine and
-compare different configurations.
-"
-5138,1707.01066,"Lifu Huang, Heng Ji, Kyunghyun Cho, Clare R. Voss",Zero-Shot Transfer Learning for Event Extraction,cs.CL," Most previous event extraction studies have relied heavily on features
-derived from annotated event mentions, thus cannot be applied to new event
-types without annotation effort. In this work, we take a fresh look at event
-extraction and model it as a grounding problem. We design a transferable neural
-architecture, mapping event mentions and types jointly into a shared semantic
-space using structural and compositional neural networks, where the type of
-each event mention can be determined by the closest of all candidate types. By
-leveraging (1)~available manual annotations for a small set of existing event
-types and (2)~existing event ontologies, our framework applies to new event
-types without requiring additional annotation. Experiments on both existing
-event types (e.g., ACE, ERE) and new event types (e.g., FrameNet) demonstrate
-the effectiveness of our approach. \textit{Without any manual annotations} for
-23 new event types, our zero-shot framework achieved performance comparable to
-a state-of-the-art supervised model which is trained from the annotations of
-500 event mentions.
-"
-5139,1707.01075,"Lifu Huang, Avirup Sil, Heng Ji, Radu Florian","Improving Slot Filling Performance with Attentive Neural Networks on
- Dependency Structures",cs.CL," Slot Filling (SF) aims to extract the values of certain types of attributes
-(or slots, such as person:cities\_of\_residence) for a given entity from a
-large collection of source documents. In this paper we propose an effective DNN
-architecture for SF with the following new strategies: (1) take a regularized
-dependency graph instead of a raw sentence as input to the DNN, to compress the
-wide contexts between query and candidate filler; (2) incorporate two
-attention mechanisms: local attention learned from query and candidate filler,
-and global attention learned from external knowledge bases, to guide the model
-to better select indicative contexts to determine slot type. Experiments show
-that this framework outperforms state-of-the-art on both relation extraction
-(16\% absolute F-score gain) and slot filling validation for each individual
-system (up to 8.5\% absolute F-score gain).
-"
-5140,1707.01090,Daniel Dzibela and Armin Sehr,Hidden-Markov-Model Based Speech Enhancement,cs.SD cs.CL," The goal of this contribution is to use a parametric speech synthesis system
-for reducing background noise and other interferences from recorded speech
-signals. In a first step, Hidden Markov Models of the synthesis system are
-trained.
- Two adequate training corpora consisting of text and corresponding speech
-files have been set up and cleared of various faults, including inaudible
-utterances or incorrect assignments between audio and text data. Those are
-tested and compared against each other regarding, e.g., flaws in the
-synthesized speech, its naturalness and intelligibility.
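-Once mentions and types live in a shared space, the decision rule in the
-zero-shot event-extraction framework above is nearest-neighbour search; a
-sketch with the embeddings assumed given:
-
-import numpy as np
-
-def closest_type(mention_vec, type_vecs):
-    # type_vecs: mapping from event-type name to its vector in the shared
-    # semantic space; returns the candidate type nearest to the mention.
-    def cos(u, v):
-        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
-    return max(type_vecs, key=lambda t: cos(mention_vec, type_vecs[t]))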
Thus different voices have been
-synthesized, whose quality depends less on the number of training samples used
-than on the cleanliness and signal-to-noise ratio of those samples.
-Generalized voice models have been used for synthesis and the results greatly
-differ between the two speech corpora.
- Tests regarding the adaptation to different speakers show that a resemblance
-to the original speaker is audible throughout all recordings, yet the
-synthesized voices sound robotic and unnatural in parts. The spoken
-text, however, is usually intelligible, which shows that the models are working
-well.
- In a novel approach, speech is synthesized using side information of the
-original audio signal, particularly the pitch frequency. Results show an
-increase in speech quality and intelligibility in comparison to speech
-synthesized solely from text, up to the point of being nearly indistinguishable
-from the original.
-"
-5141,1707.01161,"Harsh Jhamtani, Varun Gangal, Eduard Hovy, Eric Nyberg","Shakespearizing Modern Language Using Copy-Enriched Sequence-to-Sequence
- Models",cs.CL," Variations in writing styles are commonly used to adapt the content to a
-specific context, audience, or purpose. However, applying stylistic variations
-is still by and large a manual process, and there have been few efforts
-towards automating it. In this paper we explore automated methods to transform
-text from modern English to Shakespearean English using an end-to-end trainable
-neural model with pointers to enable copy action. To tackle the limited amount
-of parallel data, we pre-train embeddings of words by leveraging external
-dictionaries mapping Shakespearean words to modern English words as well as
-additional text. Our methods are able to achieve a BLEU score of 31+, an
-improvement of ~6 points above the strongest baseline. We publicly release our
-code to foster further research in this area.
-"
-5142,1707.01176,"Varun Gangal, Harsh Jhamtani, Graham Neubig, Eduard Hovy, Eric Nyberg",CharManteau: Character Embedding Models For Portmanteau Creation,cs.CL," Portmanteaus are a word formation phenomenon where two words are combined to
-form a new word. We propose character-level neural sequence-to-sequence (S2S)
-methods for the task of portmanteau generation that are end-to-end-trainable,
-language independent, and do not explicitly use additional phonetic
-information. We propose a noisy-channel-style model, which allows for the
-incorporation of unsupervised word lists, improving performance over a standard
-source-to-target model. This model is made possible by an exhaustive candidate
-generation strategy specifically enabled by the features of the portmanteau
-task. Experiments find our approach superior to a state-of-the-art FST-based
-baseline with respect to ground truth accuracy and human evaluation.
-"
-5143,1707.01183,"Souvick Ghosh, Satanu Ghosh, and Dipankar Das",Complexity Metric for Code-Mixed Social Media Text,cs.CL cs.SI," An evaluation metric is an absolute necessity for measuring the performance
-of any system and the complexity of any data. In this paper, we have discussed
-how to determine the level of complexity of code-mixed social media texts that
-are growing rapidly due to multilingual interference. In general, texts written
-in multiple languages are often hard to comprehend and analyze. At the same
-time, in order to meet the demands of analysis, it is also necessary to
-determine the complexity of a particular document or a text segment.
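-The copy mechanism referred to in the Shakespearizing abstract above mixes a
-generation distribution with attention mass over source tokens; a small
-sketch of a generic pointer mixture, not the paper's exact model:
-
-def pointer_mixture(p_vocab, attention, source_tokens, p_gen):
-    # p_vocab: dict word -> generation probability; attention: one weight
-    # per source token; p_gen: probability of generating vs. copying.
-    mixed = {w: p_gen * p for w, p in p_vocab.items()}
-    for weight, token in zip(attention, source_tokens):
-        mixed[token] = mixed.get(token, 0.0) + (1 - p_gen) * weight
-    return mixed  # sums to 1 when p_vocab and attention each sum to 1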
Thus, in the present
-paper, we have discussed the existing metrics for determining the code-mixing
-complexity of a corpus, their advantages and shortcomings, and have proposed
-several improvements to those metrics. The new index better reflects the
-variety and complexity of a multilingual document. Also, the index can be
-applied to a sentence and seamlessly extended to a paragraph or an entire
-document. We have employed two existing code-mixed corpora to suit the
-requirements of our study.
-"
-5144,1707.01184,"Souvick Ghosh, Satanu Ghosh, and Dipankar Das",Sentiment Identification in Code-Mixed Social Media Text,cs.CL cs.AI cs.SI," Sentiment analysis is the Natural Language Processing (NLP) task dealing with
-the detection and classification of sentiments in texts. While some tasks deal
-with identifying the presence of sentiment in the text (Subjectivity analysis),
-other tasks aim at determining the polarity of the text categorizing them as
-positive, negative and neutral. Whenever there is a presence of sentiment in
-the text, it has a source (people, group of people or any entity) and the
-sentiment is directed towards some entity, object, event or person. Sentiment
-analysis tasks aim to determine the subject, the target and the polarity or
-valence of the sentiment. In our work, we try to automatically extract
-sentiment (positive or negative) from Facebook posts using a machine learning
-approach. While some work has been done on code-mixed social media data and on
-sentiment analysis separately, our work is the first attempt (as of now) that
-aims at performing sentiment analysis of code-mixed social media text. We have
-used extensive pre-processing to remove noise from raw text. A Multilayer
-Perceptron model has been used to determine the polarity of the sentiment. We
-have also developed the corpus for this task by manually labeling Facebook
-posts with their associated sentiments.
-"
-5145,1707.01265,Jonggu Kim and Jong-Hyeok Lee,"Multiple Range-Restricted Bidirectional Gated Recurrent Units with
- Attention for Relation Classification",cs.CL," Most neural approaches to relation classification have focused on finding
-short patterns that represent the semantic relation using Convolutional Neural
-Networks (CNNs) and those approaches have generally achieved better
-performance than using Recurrent Neural Networks (RNNs). Following an
-intuition similar to that of the CNN models, we propose a novel RNN-based model
-that strongly focuses on only important parts of a sentence using multiple
-range-restricted bidirectional layers and attention for relation
-classification. Experimental results on the SemEval-2010 relation
-classification task show that our model is comparable to the state-of-the-art
-CNN-based and RNN-based models that use additional linguistic information.
-"
-5146,1707.01321,"Sanda Martin\v{c}i\'c-Ip\v{s}i\'c, Tanja Mili\v{c}i\'c, Ljup\v{c}o
- Todorovski","The Influence of Feature Representation of Text on the Performance of
- Document Classification",cs.CL," In this paper we perform a comparative analysis of three models for feature
-representation of text documents in the context of document classification. In
-particular, we consider the most often used family of models bag-of-words,
-recently proposed continuous space models word2vec and doc2vec, and the model
-based on the representation of text documents as language networks.
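-One widely used baseline among the code-mixing metrics discussed above is a
-code-mixing index over token-level language tags; a simplified sketch (the
-improved index proposed in the paper differs from this form):
-
-def code_mixing_index(lang_tags):
-    # lang_tags: one language label per token, with "univ" marking
-    # language-neutral tokens (hashtags, numbers, punctuation).
-    tagged = [t for t in lang_tags if t != "univ"]
-    if not tagged:
-        return 0.0
-    counts = {}
-    for tag in tagged:
-        counts[tag] = counts.get(tag, 0) + 1
-    # 0 for monolingual text; approaches 100 as languages mix evenly.
-    return 100.0 * (len(tagged) - max(counts.values())) / len(tagged)
-
-# code_mixing_index(["en", "en", "hi", "hi", "univ"]) -> 50.0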
While
-bag-of-words models have been extensively used for the document classification
-task, the performance of the other two models for the same task has not been
-well understood. This is especially true for the network-based model, which has
-rarely been considered for representation of text documents for classification.
-In this study, we measure the performance of the document classifiers trained
-using the method of random forests for features generated by the three models
-and their variants. The results of the empirical comparison show that the
-commonly used bag-of-words model has performance comparable to the one obtained
-by the emerging continuous-space model of doc2vec. In particular, the
-low-dimensional variants of doc2vec generating up to 75 features are among the
-top-performing document representation models. The results finally point out
-that doc2vec shows superior performance in the task of classifying large
-documents.
-"
-5147,1707.01355,"Peter Makarov, Tatiana Ruzsics, Simon Clematide","Align and Copy: UZH at SIGMORPHON 2017 Shared Task for Morphological
- Reinflection",cs.CL," This paper presents the submissions by the University of Zurich to the
-SIGMORPHON 2017 shared task on morphological reinflection. The task is to
-predict the inflected form given a lemma and a set of morpho-syntactic
-features. We focus on neural network approaches that can tackle the task in a
-limited-resource setting. As the transduction of the lemma into the inflected
-form is dominated by copying over lemma characters, we propose two recurrent
-neural network architectures with hard monotonic attention that are strong at
-copying and yet substantially different in how they achieve this. The first
-approach is an encoder-decoder model with a copy mechanism. The second approach
-is a neural state-transition system over a set of explicit edit actions,
-including a designated COPY action. We experiment with character alignment and
-find that naive, greedy alignment consistently produces strong results for some
-languages. Our best system combination is the overall winner of the SIGMORPHON
-2017 Shared Task 1 without external resources. At a setting with 100 training
-samples, both our approaches, as ensembles of models, outperform the next best
-competitor.
-"
-5148,1707.01378,"Yoram Bachrach, Andrej Zukov-Gregoric, Sam Coope, Ed Tovell, Bogdan
- Maksak, Jose Rodriguez, Conan McMurtie","An Attention Mechanism for Answer Selection Using a Combined Global and
- Local View",cs.CL," We propose a new attention mechanism for neural-based question answering,
-which depends on varying granularities of the input. Previous work focused on
-augmenting recurrent neural networks with simple attention mechanisms which are
-a function of the similarity between a question embedding and an answer
-embedding across time. We extend this by making the attention mechanism
-dependent on a global embedding of the answer obtained using a separate
-network.
- We evaluate our system on InsuranceQA, a large question answering dataset.
-Our model outperforms current state-of-the-art results on InsuranceQA. Further,
-we visualize which sections of text our attention mechanism focuses on, and
-explore its performance across different parameter settings.
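-A deliberately naive illustration of the greedy character alignment and
-explicit edit actions described in the reinflection abstract above (the
-shared-task systems are more refined than this prefix-only scheme):
-
-def edit_actions(lemma, inflected):
-    # Copy the longest common prefix, then delete the remaining lemma
-    # characters and insert the remaining target characters.
-    i = 0
-    while i < min(len(lemma), len(inflected)) and lemma[i] == inflected[i]:
-        i += 1
-    return (["COPY"] * i
-            + ["DELETE(%s)" % c for c in lemma[i:]]
-            + ["INSERT(%s)" % c for c in inflected[i:]])
-
-# edit_actions("walk", "walking")
-# -> ['COPY', 'COPY', 'COPY', 'COPY', 'INSERT(i)', 'INSERT(n)', 'INSERT(g)']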
-"
-5149,1707.01425,"Souvick Ghosh, Dipankar Das and Tanmoy Chakraborty","Determining sentiment in citation text and analyzing its impact on the
- proposed ranking index",cs.IR cs.CL cs.DL," Whenever human beings interact with each other, they exchange or express
-opinions, emotions, and sentiments. These opinions can be expressed in text,
-speech or images. Analysis of these sentiments is one of the popular research
-areas of present-day researchers. Sentiment analysis, also known as opinion
-mining, tries to identify or classify these sentiments or opinions into two
-broad categories - positive and negative. In recent years, the scientific
-community has taken a lot of interest in analyzing sentiment in textual data
-available in various social media platforms. Much work has been done on social
-media conversations, blog posts, newspaper articles and various narrative
-texts. However, when it comes to identifying emotions from scientific papers,
-researchers have faced some difficulties due to the implicit and hidden nature
-of opinion. By default, citation instances are considered inherently positive
-in emotion. Popular ranking and indexing paradigms often neglect the opinion
-present while citing. In this paper, we have tried to achieve three objectives.
-First, we try to identify the major sentiment in the citation text and assign a
-score to the instance. We have used a statistical classifier for this purpose.
-Second, we have proposed a new index (we shall refer to it hereafter as
-M-index) which takes into account both the quantitative and qualitative factors
-while scoring a paper. Third, we developed a ranking of research papers based
-on the M-index. We also try to explain how the M-index impacts the ranking of
-scientific papers.
-"
-5150,1707.01450,Romain Laroche,The Complex Negotiation Dialogue Game,cs.AI cs.CL," This position paper formalises an abstract model for complex negotiation
-dialogue. This model is to be used for benchmarking optimisation algorithms
-ranging from Reinforcement Learning to Stochastic Games, through Transfer
-Learning, One-Shot Learning or others.
-"
-5151,1707.01477,"Reuben Binns, Michael Veale, Max Van Kleek, Nigel Shadbolt","Like trainer, like bot? Inheritance of bias in algorithmic content
- moderation",cs.CY cs.CL cs.LG," The internet has become a central medium through which `networked publics'
-express their opinions and engage in debate. Offensive comments and personal
-attacks can inhibit participation in these spaces. Automated content moderation
-aims to overcome this problem using machine learning classifiers trained on
-large corpora of texts manually annotated for offence. While such systems could
-help encourage more civil debate, they must navigate inherently normatively
-contestable boundaries, and are subject to the idiosyncratic norms of the human
-raters who provide the training data. An important objective for platforms
-implementing such measures might be to ensure that they are not unduly biased
-towards or against particular norms of offence. This paper provides some
-exploratory methods by which the normative biases of algorithmic content
-moderation systems can be measured, by way of a case study using an existing
-dataset of comments labelled for offence. We train classifiers on comments
-labelled by different demographic subsets (men and women) to understand how
-differences in conceptions of offence between these groups might affect the
-performance of the resulting models on various test sets.
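-The cross-group comparison just described can be run with any off-the-shelf
-text classifier; a sketch in which the model and features are stand-ins, not
-what the study necessarily used:
-
-from sklearn.feature_extraction.text import TfidfVectorizer
-from sklearn.linear_model import LogisticRegression
-from sklearn.metrics import f1_score
-from sklearn.pipeline import make_pipeline
-
-def cross_group_f1(train_texts, train_labels, test_texts, test_labels):
-    # Train on comments labelled by one demographic group and score on a
-    # test set labelled by another to expose normative disagreement.
-    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
-    model.fit(train_texts, train_labels)
-    return f1_score(test_labels, model.predict(test_texts))
-
-# fit on men-labelled data, score on women-labelled data, and vice versa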
We conclude by
-discussing some of the ethical choices facing the implementers of algorithmic
-moderation systems, given various desired levels of diversity of viewpoints
-amongst discussion participants.
-"
-5152,1707.01521,"Zhaocheng Zhu, Junfeng Hu",Context Aware Document Embedding,cs.CL," Recently, doc2vec has achieved excellent results in different tasks. In this
-paper, we present a context aware variant of doc2vec. We introduce a novel
-weight estimating mechanism that generates weights for each word occurrence
-according to its contribution in the context, using deep neural networks. Our
-context aware model can achieve similar results compared to doc2vec initialized
-by Wikipedia-trained vectors, while being much more efficient and free from
-dependence on a heavy external corpus. Analysis of context aware weights shows
-they are a kind of enhanced IDF weights that capture sub-topic level keywords
-in documents. They might result from deep neural networks that learn hidden
-representations with the least entropy.
-"
-5153,1707.01555,Hongyu Guo,A Deep Network with Visual Text Composition Behavior,cs.CL cs.AI cs.NE," While natural languages are compositional, how state-of-the-art neural models
-achieve compositionality is still unclear. We propose a deep network, which not
-only achieves competitive accuracy for text classification, but also exhibits
-compositional behavior. That is, while creating hierarchical representations of
-a piece of text, such as a sentence, the lower layers of the network distribute
-their layer-specific attention weights to individual words. In contrast, the
-higher layers compose meaningful phrases and clauses, whose lengths increase as
-the networks get deeper until fully composing the sentence.
-"
-5154,1707.01561,"Felipe Costa, Sixun Ouyang, Peter Dolog, Aonghus Lawlor",Automatic Generation of Natural Language Explanations,cs.CL cs.LG," An important task for a recommender system is to generate explanations
-according to a user's preferences. Most of the current methods for explainable
-recommendations use structured sentences to provide descriptions along with the
-recommendations they produce. However, those methods have neglected the
-review-oriented way of writing a text, even though it is known that these
-reviews have a strong influence over users' decisions.
- In this paper, we propose a method for the automatic generation of natural
-language explanations, for predicting how a user would write about an item,
-based on user ratings from different items' features. We design a
-character-level recurrent neural network (RNN) model, which generates an item's
-review explanations using long-short term memories (LSTM). The model generates
-text reviews given a combination of the review and ratings score that express
-opinions about different factors or aspects of an item. Our network is trained
-on a sub-sample from the large real-world dataset BeerAdvocate. Our empirical
-evaluation using natural language processing metrics shows the generated text's
-quality is close to a real user-written review, identifying negation,
-misspellings, and domain specific vocabulary.
-"
-5155,1707.01626,"Mohamed Abdalla, Graeme Hirst",Cross-Lingual Sentiment Analysis Without (Good) Translation,cs.CL," Current approaches to cross-lingual sentiment analysis try to leverage the
-wealth of labeled English data using bilingual lexicons, bilingual vector space
-embeddings, or machine translation systems.
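-The occurrence-level weighting behind the context-aware document embedding
-above amounts to a weighted average of word vectors; a numpy sketch in which
-weight_fn stands in for the paper's deep network:
-
-import numpy as np
-
-def weighted_doc_vector(tokens, embeddings, weight_fn):
-    # embeddings: dict word -> vector; weight_fn(i, tokens) -> float gives
-    # one weight per occurrence rather than one per word type.
-    pairs = [(weight_fn(i, tokens), embeddings[w])
-             for i, w in enumerate(tokens) if w in embeddings]
-    if not pairs:
-        return None
-    weights = np.array([p[0] for p in pairs])
-    vectors = np.stack([p[1] for p in pairs])
-    return weights @ vectors / weights.sum()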
Here we show that it is possible to
-use a single linear transformation, with as few as 2000 word pairs, to capture
-fine-grained sentiment relationships between words in a cross-lingual setting.
-We apply these cross-lingual sentiment models to a diverse set of tasks to
-demonstrate their functionality in a non-English context. By effectively
-leveraging English sentiment knowledge without the need for accurate
-translation, we can analyze and extract features from other languages with
-scarce data at a very low cost, thus making sentiment and related analyses for
-many languages inexpensive.
-"
-5156,1707.01662,"Seunghak Yu, Nilesh Kulkarni, Haejun Lee, Jihie Kim",An Embedded Deep Learning based Word Prediction,cs.CL," Recent developments in deep learning with application to language modeling
-have led to success in tasks of text processing, summarizing and machine
-translation. However, deploying huge language models on mobile devices such as
-on-device keyboards poses computation as a bottleneck due to their puny
-computation capacities. In this work we propose an embedded deep learning based
-word prediction method that optimizes run-time memory and also provides a real
-time prediction environment. Our model size is 7.40MB and has an average
-prediction time of 6.47 ms. We improve over the existing methods for word
-prediction in terms of keystroke savings and word prediction rate.
-"
-5157,1707.01736,"Emiel van Miltenburg, Desmond Elliott, Piek Vossen",Cross-linguistic differences and similarities in image descriptions,cs.CL cs.AI cs.CV," Automatic image description systems are commonly trained and evaluated on
-large image description datasets. Recently, researchers have started to collect
-such datasets for languages other than English. An unexplored question is how
-different these datasets are from English and, if there are any differences,
-what causes them to differ. This paper provides a cross-linguistic comparison
-of Dutch, English, and German image descriptions. We find that these
-descriptions are similar in many respects, but the familiarity of crowd workers
-with the subjects of the images has a noticeable influence on description
-specificity.
-"
-5158,1707.01780,Jose Camacho-Collados and Mohammad Taher Pilehvar,"On the Role of Text Preprocessing in Neural Network Architectures: An
- Evaluation Study on Text Categorization and Sentiment Analysis",cs.CL cs.IR," Text preprocessing is often the first step in the pipeline of a Natural
-Language Processing (NLP) system, with potential impact on its final
-performance. Despite its importance, text preprocessing has not received much
-attention in the deep learning literature. In this paper we investigate the
-impact of simple text preprocessing decisions (particularly tokenizing,
-lemmatizing, lowercasing and multiword grouping) on the performance of a
-standard neural text classifier. We perform an extensive evaluation on standard
-benchmarks from text categorization and sentiment analysis. While our
-experiments show that a simple tokenization of input text is generally
-adequate, they also highlight significant degrees of variability across
-preprocessing techniques. This reveals the importance of paying attention to
-this usually-overlooked step in the pipeline, particularly when comparing
-different models. Finally, our evaluation provides insights into the best
-preprocessing practices for training word embeddings.
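-The single linear transformation used for cross-lingual sentiment above has a
-closed-form least-squares fit; a sketch assuming row-aligned embedding
-matrices for the translation pairs:
-
-import numpy as np
-
-def fit_linear_map(source_vecs, target_vecs):
-    # source_vecs, target_vecs: (n_pairs, d) arrays, row i of each holding
-    # the two embeddings of one translation pair; solves
-    # min_W || source_vecs @ W - target_vecs ||.
-    W, *_ = np.linalg.lstsq(source_vecs, target_vecs, rcond=None)
-    return W
-
-# sentiment features learned in English can then be read off X_other @ W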
-"
-5159,1707.01793,"Yifan Sun, Nikhil Rao, Weicong Ding",A Simple Approach to Learn Polysemous Word Embeddings,cs.CL," Many NLP applications require disambiguating polysemous words. Existing
-methods that learn polysemous word vector representations involve first
-detecting various senses and optimizing the sense-specific embeddings
-separately, which are invariably more involved than single-sense learning
-methods such as word2vec. Evaluating these methods is also problematic, as
-rigorous quantitative evaluations in this space are limited, especially when
-compared with single-sense embeddings. In this paper, we propose a simple
-method to learn a word representation, given any context. Our method only
-requires learning the usual single sense representation, and coefficients that
-can be learnt via a single pass over the data. We propose several new test sets
-for evaluating word sense induction, relevance detection, and contextual word
-similarity, significantly supplementing the currently available tests. Results
-on these and other tests show that while our method is embarrassingly simple,
-it achieves excellent results when compared to the state of the art models for
-unsupervised polysemous word representation learning.
-"
-5160,1707.01830,Raphael Shu and Hideki Nakayama,Single-Queue Decoding for Neural Machine Translation,cs.CL," Neural machine translation models rely on the beam search algorithm for
-decoding. In practice, we found that the quality of hypotheses in the search
-space is negatively affected owing to the fixed beam size. To mitigate this
-problem, we store all hypotheses in a single priority queue and use a universal
-score function for hypothesis selection. The proposed algorithm is more
-flexible as the discarded hypotheses can be revisited in a later step. We
-further design a penalty function to punish the hypotheses that tend to produce
-a final translation that is much longer or shorter than expected. Despite its
-simplicity, we show that the proposed decoding algorithm is able to select
-hypotheses with better qualities and improve the translation performance.
-"
-5161,1707.01890,"Gaurav Trivedi, Phuong Pham, Wendy Chapman, Rebecca Hwa, Janyce Wiebe,
- Harry Hochheiser",An Interactive Tool for Natural Language Processing on Clinical Text,cs.HC cs.CL cs.IR," Natural Language Processing (NLP) systems often make use of machine learning
-techniques that are unfamiliar to end-users who are interested in analyzing
-clinical records. Although NLP has been widely used in extracting information
-from clinical text, current systems generally do not support model revision
-based on feedback from domain experts.
- We present a prototype tool that allows end users to visualize and review the
-outputs of an NLP system that extracts binary variables from clinical text. Our
-tool combines multiple visualizations to help the users understand these
-results and make any necessary corrections, thus forming a feedback loop and
-helping improve the accuracy of the NLP models. We have tested our prototype in
-a formative think-aloud user study with clinicians and researchers involved in
-colonoscopy research. Results from semi-structured interviews and a System
-Usability Scale (SUS) analysis show that the users are able to quickly start
-refining NLP models, despite having very little or no experience with machine
-learning. Observations from these sessions suggest revisions to the interface
-to better support review workflow and interpretation of results.
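-The single-priority-queue decoding described above is easy to sketch with
-heapq; expand, score, and is_final are placeholders for the model-specific
-pieces:
-
-import heapq
-import itertools
-
-def single_queue_decode(initial, expand, score, is_final, max_pops=10000):
-    # All hypotheses live in one max-score queue; unlike beam search, a
-    # hypothesis set aside now can still be revisited later.
-    ticket = itertools.count()  # tie-breaker for equal scores
-    heap = [(-score(initial), next(ticket), initial)]
-    for _ in range(max_pops):
-        if not heap:
-            break
-        _, _, hyp = heapq.heappop(heap)
-        if is_final(hyp):
-            return hyp
-        for successor in expand(hyp):
-            heapq.heappush(heap, (-score(successor), next(ticket), successor))
-    return None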
-"
-5162,1707.01917,Madhav Nimishakavi and Partha Talukdar,"Higher-order Relation Schema Induction using Tensor Factorization with
- Back-off and Aggregation",cs.CL cs.IR," Relation Schema Induction (RSI) is the problem of identifying type signatures
-of arguments of relations from unlabeled text. Most of the previous work in
-this area has focused only on binary RSI, i.e., inducing only the subject and
-object type signatures per relation. However, in practice, many relations are
-high-order, i.e., they have more than two arguments and inducing type
-signatures of all arguments is necessary. For example, in the sports domain,
-inducing a schema win(WinningPlayer, OpponentPlayer, Tournament, Location) is
-more informative than inducing just win(WinningPlayer, OpponentPlayer). We
-refer to this problem as Higher-order Relation Schema Induction (HRSI). In this
-paper, we propose Tensor Factorization with Back-off and Aggregation (TFBA), a
-novel framework for the HRSI problem. To the best of our knowledge, this is the
-first attempt at inducing higher-order relation schemata from unlabeled text.
-Using experimental analysis on three real-world datasets, we show how TFBA
-helps in dealing with sparsity and induces higher-order schemata.
-"
-5163,1707.01961,"Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh,
- Tong Sun, Jing Gao",Long-Term Memory Networks for Question Answering,cs.CL cs.AI," Question answering is an important and difficult task in the natural language
-processing domain, because many basic natural language processing tasks can be
-cast into a question answering task. Several deep neural network architectures
-have been developed recently, which employ memory and inference components to
-memorize and reason over text information, and generate answers to questions.
-However, a major drawback of many such models is that they are capable of only
-generating single-word answers. In addition, they require large amounts of
-training data to generate accurate answers. In this paper, we introduce the
-Long-Term Memory Network (LTMN), which incorporates both an external memory
-module and a Long Short-Term Memory (LSTM) module to comprehend the input data
-and generate multi-word answers. The LTMN model can be trained end-to-end using
-back-propagation and requires minimal supervision. We test our model on two
-synthetic data sets (based on Facebook's bAbI data set) and the real-world
-Stanford question answering data set, and show that it can achieve
-state-of-the-art performance.
-"
-5164,1707.02026,"Jianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven
- Truong, Jianfeng Gao",A Nested Attention Neural Hybrid Model for Grammatical Error Correction,cs.CL," Grammatical error correction (GEC) systems strive to correct both global
-errors in word order and usage, and local errors in spelling and inflection.
-Building further upon recent work on neural machine translation, we propose a
-new hybrid neural model with nested attention layers for GEC. Experiments show
-that the new model can effectively correct errors of both types by
-incorporating word and character-level information, and that the model
-significantly outperforms previous neural models for GEC as measured on the
-standard CoNLL-14 benchmark dataset.
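-The external-memory lookup at the heart of memory networks such as the LTMN
-above reduces to attention over stored fact embeddings; a numpy sketch with
-the embeddings assumed given:
-
-import numpy as np
-
-def memory_read(query, memory):
-    # memory: (n_slots, d) embedded input sentences; query: (d,) embedded
-    # question. Soft-select the slots most relevant to the question.
-    logits = memory @ query
-    weights = np.exp(logits - logits.max())
-    weights /= weights.sum()
-    return weights @ memory  # (d,) evidence vector passed to the decoder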
Further analysis also shows that the
-superiority of the proposed model can be largely attributed to the use of the
-nested attention mechanism, which has proven particularly effective in
-correcting local errors that involve small edits in orthography.
-"
-5165,1707.02063,Wojciech Kusa and Michael Spranger,"External Evaluation of Event Extraction Classifiers for Automatic
- Pathway Curation: An extended study of the mTOR pathway",cs.CL," This paper evaluates the impact of various event extraction systems on
-automatic pathway curation using the popular mTOR pathway. We quantify the
-impact of training data sets as well as different machine learning classifiers
-and show that some improve the quality of automatically extracted pathways.
-"
-5166,1707.02230,Jens Nevens and Michael Spranger,Computational Models of Tutor Feedback in Language Acquisition,cs.CL," This paper investigates the role of tutor feedback in language learning using
-computational models. We compare two dominant paradigms in language learning:
-interactive learning and cross-situational learning - which differ primarily in
-the role of social feedback such as gaze or pointing. We analyze the
-relationship between these two paradigms and propose a new mixed paradigm that
-combines the two paradigms and allows to test algorithms in experiments that
-combine no feedback and social feedback. To deal with mixed feedback
-experiments, we develop new algorithms and show how they perform with respect
-to traditional k-NN and prototype approaches.
-"
-5167,1707.02268,"Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei,
- Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut",Text Summarization Techniques: A Brief Survey,cs.CL," In recent years, there has been an explosion in the amount of text data from
-a variety of sources. This volume of text is an invaluable source of
-information and knowledge which needs to be effectively summarized to be
-useful. In this review, the main approaches to automatic text summarization are
-described. We review the different processes for summarization and describe the
-effectiveness and shortcomings of the different methods.
-"
-5168,1707.02275,Antonio Valerio Miceli Barone and Rico Sennrich,"A parallel corpus of Python functions and documentation strings for
- automated code documentation and code generation",cs.CL cs.AI," Automated documentation of programming source code and automated code
-generation from natural language are challenging tasks of both practical and
-scientific interest. Progress in these areas has been limited by the low
-availability of parallel corpora of code and natural language descriptions,
-which tend to be small and constrained to specific domains.
- In this work we introduce a large and diverse parallel corpus of a hundred
-thousand Python functions with their documentation strings (""docstrings"")
-generated by scraping open source repositories on GitHub. We describe baseline
-results for the code documentation and code generation tasks obtained by neural
-machine translation. We also experiment with data augmentation techniques to
-further increase the amount of training data.
- We release our datasets and processing scripts in order to stimulate research
-in these areas.
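-Scraping (function, docstring) pairs of the kind described above needs only
-the standard library; a minimal sketch, not the authors' processing scripts:
-
-import ast
-
-def function_docstring_pairs(source_code):
-    # Walk a module's AST and keep each function together with its
-    # docstring, skipping functions that do not have one.
-    pairs = []
-    for node in ast.walk(ast.parse(source_code)):
-        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
-            doc = ast.get_docstring(node)
-            if doc:
-                pairs.append((node.name, doc))
-    return pairs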
-"
-5169,1707.02363,"Ankur Bapna, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck",Towards Zero-Shot Frame Semantic Parsing for Domain Scaling,cs.AI cs.CL," State-of-the-art slot filling models for goal-oriented human/machine
-conversational language understanding systems rely on deep learning methods.
-While multi-task training of such models alleviates the need for large
-in-domain annotated datasets, bootstrapping a semantic parsing model for a new
-domain using only the semantic frame, such as the back-end API or knowledge
-graph schema, is still one of the holy grail tasks of language understanding
-for dialogue systems. This paper proposes a deep learning based approach that
-can utilize only the slot description in context without the need for any
-labeled or unlabeled in-domain examples, to quickly bootstrap a new domain. The
-main idea of this paper is to leverage the encoding of the slot names and
-descriptions within a multi-task deep learned slot filling model, to implicitly
-align slots across domains. The proposed approach is promising for solving the
-domain scaling problem and eliminating the need for any manually annotated data
-or explicit schema alignment. Furthermore, our experiments on multiple domains
-show that this approach results in significantly better slot-filling
-performance when compared to using only in-domain data, especially in the
-low-data regime.
-"
-5170,1707.02377,Minmin Chen,Efficient Vector Representation for Documents through Corruption,cs.CL cs.LG," We present an efficient document representation learning framework, Document
-Vector through Corruption (Doc2VecC). Doc2VecC represents each document as a
-simple average of word embeddings. It ensures that a representation generated
-in this way captures the semantic meaning of the document during learning. A
-corruption model is included, which introduces a data-dependent regularization
-that favors informative or rare words while forcing the embeddings of common
-and non-discriminative ones to be close to zero. Doc2VecC produces
-significantly better word embeddings than Word2Vec. We compare Doc2VecC with
-several state-of-the-art document representation learning algorithms. The
-simple model architecture introduced by Doc2VecC matches or out-performs the
-state-of-the-art in generating high-quality document representations for
-sentiment analysis, document classification as well as semantic relatedness
-tasks. The simplicity of the model enables training on billions of words per
-hour on a single machine. At the same time, the model is very efficient in
-generating representations of unseen documents at test time.
-"
-5171,1707.02459,Jian Ni and Radu Florian,"Improving Multilingual Named Entity Recognition with Wikipedia Entity
- Type Mapping",cs.CL cs.AI cs.IR," The state-of-the-art named entity recognition (NER) systems are statistical
-machine learning models that have strong generalization capability (i.e., can
-recognize unseen entities that do not appear in training data) based on lexical
-and contextual information. However, such a model could still make mistakes if
-its features favor a wrong entity type. In this paper, we utilize Wikipedia as
-an open knowledge base to improve multilingual NER systems. Central to our
-approach is the construction of high-accuracy, high-coverage multilingual
-Wikipedia entity type mappings. These mappings are built from weakly annotated
-data and can be extended to new languages with no human annotation or
-language-dependent knowledge involved.
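-The corruption idea in Doc2VecC above is plain word dropout around an
-averaging representation; a numpy sketch of one training-time pass, under
-assumed names and a conventional 1/keep_prob rescaling:
-
-import numpy as np
-
-def corrupted_doc_vector(tokens, embeddings, keep_prob=0.7, rng=None):
-    # Randomly drop words, average the survivors' embeddings, and rescale
-    # by 1/keep_prob so the estimate stays unbiased; at test time the
-    # plain average over all words is used instead.
-    rng = rng or np.random.default_rng()
-    kept = [embeddings[w] for w in tokens
-            if w in embeddings and rng.random() < keep_prob]
-    if not kept:
-        return None
-    return np.mean(kept, axis=0) / keep_prob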
Based on these mappings, we develop
-several approaches to improve an NER system. We evaluate the performance of the
-approaches via experiments on NER systems trained for 6 languages. Experimental
-results show that the proposed approaches are effective in improving the
-accuracy of such systems on unseen entities, especially when a system is
-applied to a new domain or it is trained with little training data (up to 18.3
-F1 score improvement).
-"
-5172,1707.02483,Jian Ni and Georgiana Dinu and Radu Florian,"Weakly Supervised Cross-Lingual Named Entity Recognition via Effective
- Annotation and Representation Projection",cs.CL cs.IR," The state-of-the-art named entity recognition (NER) systems are supervised
-machine learning models that require large amounts of manually annotated data
-to achieve high accuracy. However, annotating NER data by humans is expensive
-and time-consuming, and can be quite difficult for a new language. In this
-paper, we present two weakly supervised approaches for cross-lingual NER with
-no human annotation in a target language. The first approach is to create
-automatically labeled NER data for a target language via annotation projection
-on comparable corpora, where we develop a heuristic scheme that effectively
-selects good-quality projection-labeled data from noisy data. The second
-approach is to project distributed representations of words (word embeddings)
-from a target language to a source language, so that the source-language NER
-system can be applied to the target language without re-training. We also
-design two co-decoding schemes that effectively combine the outputs of the two
-projection-based approaches. We evaluate the performance of the proposed
-approaches on both in-house and open NER data for several target languages. The
-results show that the combined systems outperform three other weakly supervised
-approaches on the CoNLL data.
-"
-5173,1707.02499,"Tong Wang, Ping Chen, Boyang Li",Predicting the Quality of Short Narratives from Social Media,cs.CL cs.SI," An important and difficult challenge in building computational models for
-narratives is the automatic evaluation of narrative quality. Quality evaluation
-connects narrative understanding and generation as generation systems need to
-evaluate their own products. To circumvent difficulties in acquiring
-annotations, we employ upvotes in social media as an approximate measure for
-story quality. We collected 54,484 answers from a crowd-powered
-question-and-answer website, Quora, and then used active learning to build a
-classifier that labeled 28,320 answers as stories. To predict the number of
-upvotes without the use of social network features, we create neural networks
-that model textual regions and the interdependence among regions, which serve
-as strong benchmarks for future research. To the best of our knowledge, this is
-the first large-scale study for automatic evaluation of narrative quality.
-"
-5174,1707.02575,Sun-Chong Wang,Neural Machine Translation between Herbal Prescriptions and Diseases,cs.CL cs.LG," The current study applies deep learning to herbalism. Toward this goal, we
-acquired the de-identified health insurance reimbursements that were claimed in
-a 10-year period from 2004 to 2013 in the National Health Insurance Database of
-Taiwan, the total number of reimbursement records equaling 340 million.
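-Annotation projection as used for cross-lingual NER above copies source
-labels across word alignments; a simplified sketch (real systems add the
-quality filtering the abstract describes):
-
-def project_labels(source_labels, alignment, target_length):
-    # alignment: (source_index, target_index) word-alignment links.
-    target_labels = ["O"] * target_length
-    for s, t in alignment:
-        if source_labels[s] != "O":
-            target_labels[t] = source_labels[s]
-    return target_labels
-
-# project_labels(["B-PER", "O"], [(0, 1), (1, 0)], 2) -> ['O', 'B-PER']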
Two
-artificial intelligence techniques were applied to the dataset: a residual
-convolutional neural network multitask classifier and an attention-based
-recurrent neural network. The former works to translate from herbal
-prescriptions to diseases, and the latter from diseases to herbal
-prescriptions. Analysis of the classification results indicates that herbal
-prescriptions are specific to: anatomy, pathophysiology, sex and age of the
-patient, and season and year of the prescription. Further analysis identifies
-temperature and gross domestic product as the meteorological and socioeconomic
-factors that are associated with herbal prescriptions. Analysis of the neural
-machine translation result indicates that the recurrent neural network learnt
-not only the syntax but also the semantics of diseases and herbal
-prescriptions.
-"
-5175,1707.02633,Jessica Ficler and Yoav Goldberg,Controlling Linguistic Style Aspects in Neural Language Generation,cs.CL," Most work on neural natural language generation (NNLG) focuses on controlling
-the content of the generated text. We experiment with controlling several
-stylistic aspects of the generated text, in addition to its content. The method
-is based on a conditioned RNN language model, where the desired content as well
-as the stylistic parameters serve as conditioning contexts. We demonstrate the
-approach on the movie reviews domain and show that it is successful in
-generating coherent sentences corresponding to the required linguistic style
-and content.
-"
-5176,1707.02657,"Edilson A. Corr\^ea Jr, Vanessa Q. Marinho, Leandro B. dos Santos,
- Thales F. C. Bertaglia, Marcos V. Treviso, Henrico B. Brum",PELESent: Cross-domain polarity classification using distant supervision,cs.CL cs.AI cs.LG," The enormous amount of texts published daily by Internet users has fostered
-the development of methods to analyze this content in several natural language
-processing areas, such as sentiment analysis. The main goal of this task is to
-classify the polarity of a message. Even though many approaches have been
-proposed for sentiment analysis, some of the most successful ones rely on the
-availability of a large annotated corpus, whose construction is an expensive
-and time-consuming process. In recent years, distant supervision has been used
-to obtain larger datasets. So, inspired by these techniques, in this paper we
-extend such approaches to incorporate popular graphic symbols used in
-electronic messages, the emojis, in order to create a large sentiment corpus
-for Portuguese. Trained on almost one million tweets, several models were
-tested in both same-domain and cross-domain corpora. Our methods obtained very
-competitive results in five annotated corpora from mixed domains (Twitter and
-product reviews), which proves the domain-independent property of this
-approach. In addition, our results suggest that the combination of emoticons
-and emojis is able to properly capture the sentiment of a message.
-"
-5177,1707.02774,"Alexander Baturo, Niheer Dasandi, Slava J. Mikhaylov","Understanding State Preferences With Text As Data: Introducing the UN
- General Debate Corpus",cs.CL cs.AI stat.ML," Every year at the United Nations, member states deliver statements during the
-General Debate discussing major issues in world politics. These speeches
-provide invaluable information on governments' perspectives and preferences on
-a wide range of issues, but have largely been overlooked in the study of
-international politics.
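-The distant-supervision rule behind the emoji-labelled corpus above can be
-stated in a few lines; the emoticon sets here are illustrative, not the
-paper's lists:
-
-POSITIVE = {":)", ":D", "\U0001F60A", "\U0001F60D"}
-NEGATIVE = {":(", ":'(", "\U0001F620", "\U0001F622"}
-
-def distant_label(message):
-    # Keep a message only when its emoticons/emojis give an unambiguous
-    # polarity signal; mixed or absent signals yield no label.
-    has_pos = any(e in message for e in POSITIVE)
-    has_neg = any(e in message for e in NEGATIVE)
-    if has_pos and not has_neg:
-        return "positive"
-    if has_neg and not has_pos:
-        return "negative"
-    return None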
This paper introduces a new dataset consisting of
-7,701 English-language country statements from 1970 to 2016. We demonstrate how
-the UN General Debate Corpus (UNGDC) can be used to derive country positions on
-different policy dimensions using text analytic methods. The paper provides
-applications of these estimates, demonstrating the contribution the UNGDC can
-make to the study of international politics.
-"
-5178,1707.02786,"Jihun Choi, Kang Min Yoo, Sang-goo Lee",Learning to Compose Task-Specific Tree Structures,cs.CL," For years, recursive neural networks (RvNNs) have been shown to be suitable
-for representing text as fixed-length vectors and have achieved good performance
-on several natural language processing tasks. However, the main drawback of
-RvNNs is that they require structured input, which makes data preparation and
-model implementation hard. In this paper, we propose Gumbel Tree-LSTM, a novel
-tree-structured long short-term memory architecture that learns how to compose
-task-specific tree structures efficiently from plain text data alone. Our model
-uses the Straight-Through Gumbel-Softmax estimator to decide the parent node among
-candidates dynamically and to calculate gradients of the discrete decision. We
-evaluate the proposed model on natural language inference and sentiment
-analysis, and show that our model outperforms or is at least comparable to
-previous models. We also find that our model converges significantly faster
-than other models.
-"
-5179,1707.02812,"Suranjana Samanta, Sameep Mehta",Towards Crafting Text Adversarial Samples,cs.LG cs.AI cs.CL cs.CV," Adversarial samples are strategically modified samples, which are crafted
-with the purpose of fooling a classifier at hand. An attacker introduces
-specially crafted adversarial samples to a deployed classifier, which are then
-misclassified by the classifier. However, the samples are perceived to be
-drawn from entirely different classes and thus it becomes hard to detect the
-adversarial samples. Most prior work has focused on synthesizing
-adversarial samples in the image domain. In this paper, we propose a new method
-of crafting adversarial text samples by modification of the original samples.
-Modifications of the original text samples are done by deleting or replacing
-the important or salient words in the text or by introducing new words in the
-text sample. Our algorithm works best for datasets which have
-sub-categories within each of the classes of examples. While crafting
-adversarial samples, one of the key constraints is to generate meaningful
-sentences which can pass as legitimate from a language (English)
-viewpoint. Experimental results on the IMDB movie review dataset for sentiment
-analysis and a Twitter dataset for gender detection show the efficiency of our
-proposed method.
-"
-5180,1707.02892,"Honglun Zhang, Liqiang Xiao, Yongkun Wang, Yaohui Jin","A Generalized Recurrent Neural Architecture for Text Classification with
- Multi-Task Learning",cs.CL," Multi-task learning leverages potential correlations among related tasks to
-extract common features and yield performance gains. However, most previous
-works only consider simple or weak interactions, thereby failing to model
-complex correlations among three or more tasks. In this paper, we propose a
-multi-task learning architecture with four types of recurrent neural layers to
-fuse information across multiple related tasks. 
The architecture is
-structurally flexible and considers various interactions among tasks, which can
-be regarded as a generalized case of many previous works. Extensive experiments
-on five benchmark datasets for text classification show that our model can
-significantly improve the performance of related tasks with additional
-information from others.
-"
-5181,1707.02919,"Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saied Safaei,
- Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut","A Brief Survey of Text Mining: Classification, Clustering and Extraction
- Techniques",cs.CL cs.AI cs.IR," The amount of text that is generated every day is increasing dramatically.
-This tremendous volume of mostly unstructured text cannot be simply processed
-and perceived by computers. Therefore, efficient and effective techniques and
-algorithms are required to discover useful patterns. Text mining is the task of
-extracting meaningful information from text, which has gained significant
-attention in recent years. In this paper, we describe several of the most
-fundamental text mining tasks and techniques including text pre-processing,
-classification and clustering. Additionally, we briefly explain text mining in
-the biomedical and health care domains.
-"
-5182,1707.03017,"Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron
- Courville",Learning Visual Reasoning Without Strong Priors,cs.CV cs.AI cs.CL stat.ML," Achieving artificial visual reasoning - the ability to answer image-related
-questions which require a multi-step, high-level process - is an important step
-towards artificial general intelligence. This multi-modal task requires
-learning a question-dependent, structured reasoning process over images from
-language. Standard deep learning approaches tend to exploit biases in the data
-rather than learn this underlying structure, while leading methods learn to
-visually reason successfully but are hand-crafted for reasoning. We show that a
-general-purpose, Conditional Batch Normalization approach achieves
-state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%
-error rate. We outperform the next best end-to-end method (4.5%) and even
-methods that use extra supervision (3.1%). We probe our model to shed light on
-how it reasons, showing it has learned a question-dependent, multi-step
-process. Previous work has operated under the assumption that visual reasoning
-calls for a specialized architecture, but we show that a general architecture
-with proper conditioning can learn to visually reason effectively.
-"
-5183,1707.03058,"Daniel Fried, Mitchell Stern, Dan Klein","Improving Neural Parsing by Disentangling Model Combination and
- Reranking Effects",cs.CL," Recent work has proposed several generative neural models for constituency
-parsing that achieve state-of-the-art results. Since direct search in these
-generative models is difficult, they have primarily been used to rescore
-candidate outputs from base parsers in which decoding is more straightforward.
-We first present an algorithm for direct search in these generative models. We
-then demonstrate that the rescoring results are at least partly due to implicit
-model combination rather than reranking effects. Finally, we show that explicit
-model combination can improve performance even further, resulting in new
-state-of-the-art numbers on the PTB of 94.25 F1 when training only on gold data
-and 94.66 F1 when using external data. 
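The explicit model combination described above can be illustrated with a minimal Python sketch. This is not the authors' implementation; the function names and toy scorers are hypothetical stand-ins for trained generative parsers. Each candidate parse receives the average of its per-model scores, and the highest-scoring candidate wins.

```python
# Minimal sketch of explicit model combination for candidate reranking:
# every model scores every candidate, the combined score is the average,
# and the best-scoring candidate is returned. Toy scorers stand in for
# real generative parsers that would return log P(tree, sentence).
from typing import Callable, List

def combine_and_rerank(
    candidates: List[str],
    scorers: List[Callable[[str], float]],
) -> str:
    """Return the candidate with the highest average model score."""
    def combined_score(cand: str) -> float:
        return sum(score(cand) for score in scorers) / len(scorers)
    return max(candidates, key=combined_score)

if __name__ == "__main__":
    scorer_a = lambda c: -len(c)          # toy model: prefers shorter parses
    scorer_b = lambda c: c.count("NP")    # toy model: prefers more NP brackets
    trees = ["(S (NP x) (VP y))", "(S (NP x) (NP y))"]
    print(combine_and_rerank(trees, [scorer_a, scorer_b]))
```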
-" -5184,1707.03095,"Ben Curran, Kyle Higham, Elisenda Ortiz, Demival Vasques Filho","Look Who's Talking: Bipartite Networks as Representations of a Topic - Model of New Zealand Parliamentary Speeches",cs.CL cs.DL cs.SI physics.soc-ph," Quantitative methods to measure the participation to parliamentary debate and -discourse of elected Members of Parliament (MPs) and the parties they belong to -are lacking. This is an exploratory study in which we propose the development -of a new approach for a quantitative analysis of such participation. We utilize -the New Zealand government's digital Hansard database to construct a topic -model of parliamentary speeches consisting of nearly 40 million words in the -period 2003-2016. A Latent Dirichlet Allocation topic model is implemented in -order to reveal the thematic structure of our set of documents. This generative -statistical model enables the detection of major themes or topics that are -publicly discussed in the New Zealand parliament, as well as permitting their -classification by MP. Information on topic proportions is subsequently analyzed -using a combination of statistical methods. We observe patterns arising from -time-series analysis of topic frequencies which can be related to specific -social, economic and legislative events. We then construct a bipartite network -representation, linking MPs to topics, for each of four parliamentary terms in -this time frame. We build projected networks (onto the set of nodes represented -by MPs) and proceed to the study of the dynamical changes of their topology, -including community structure. By performing this longitudinal network -analysis, we can observe the evolution of the New Zealand parliamentary topic -network and its main parties in the period studied. -" -5185,1707.03103,"Jorge A. Balazs, Edison Marrese-Taylor, Pablo Loyola, Yutaka Matsuo","Refining Raw Sentence Representations for Textual Entailment Recognition - via Attention",cs.CL," In this paper we present the model used by the team Rivercorners for the 2017 -RepEval shared task. First, our model separately encodes a pair of sentences -into variable-length representations by using a bidirectional LSTM. Later, it -creates fixed-length raw representations by means of simple aggregation -functions, which are then refined using an attention mechanism. Finally it -combines the refined representations of both sentences into a single vector to -be used for classification. With this model we obtained test accuracies of -72.057% and 72.055% in the matched and mismatched evaluation tracks -respectively, outperforming the LSTM baseline, and obtaining performances -similar to a model that relies on shared information between sentences (ESIM). -When using an ensemble both accuracies increased to 72.247% and 72.827% -respectively. -" -5186,1707.03172,"Florin Brad, Radu Iacob, Ionel Hosu, Traian Rebedea",Dataset for a Neural Natural Language Interface for Databases (NNLIDB),cs.CL," Progress in natural language interfaces to databases (NLIDB) has been slow -mainly due to linguistic issues (such as language ambiguity) and domain -portability. Moreover, the lack of a large corpus to be used as a standard -benchmark has made data-driven approaches difficult to develop and compare. In -this paper, we revisit the problem of NLIDBs and recast it as a sequence -translation problem. 
To this end, we introduce a large dataset extracted from
-the Stack Exchange Data Explorer website, which can be used for training neural
-natural language interfaces for databases. We also report encouraging baseline
-results on a smaller manually annotated test corpus, obtained using an
-attention-based sequence-to-sequence neural network.
-"
-5187,1707.03228,David Vilares and Carlos G\'omez-Rodr\'iguez,A non-projective greedy dependency parser with bidirectional LSTMs,cs.CL," The LyS-FASTPARSE team presents BIST-COVINGTON, a neural implementation of
-the Covington (2001) algorithm for non-projective dependency parsing. The
-bidirectional LSTM approach by Kiperwasser and Goldberg (2016) is used to
-train a greedy parser with a dynamic oracle to mitigate error propagation. The
-model participated in the CoNLL 2017 UD Shared Task. In spite of not using any
-ensemble methods and using the baseline segmentation and PoS tagging, the
-parser obtained good results on both macro-average LAS and UAS in the big
-treebanks category (55 languages), ranking 7th out of 33 teams. In the all
-treebanks category (LAS and UAS) we ranked 16th and 12th. The gap between the
-all and big categories is mainly due to the poor performance on four parallel
-PUD treebanks, suggesting that some `suffixed' treebanks (e.g. Spanish-AnCora)
-perform poorly in cross-treebank settings, which does not occur with the
-corresponding `unsuffixed' treebank (e.g. Spanish). By changing that, we obtain
-the 11th best LAS among all runs (official and unofficial). The code is made
-available at https://github.com/CoNLL-UD-2017/LyS-FASTPARSE
-"
-5188,1707.03253,"Andreas Niekler, Gregor Wiedemann, Gerhard Heyer","Leipzig Corpus Miner - A Text Mining Infrastructure for Qualitative Data
- Analysis",cs.CL," This paper presents the ""Leipzig Corpus Miner"", a technical infrastructure
-for supporting qualitative and quantitative content analysis. The
-infrastructure aims at the integration of 'close reading' procedures on
-individual documents with procedures of 'distant reading', e.g. lexical
-characteristics of large document collections. Therefore, information retrieval
-systems, lexicometric statistics and machine learning procedures are combined
-in a coherent framework which enables qualitative data analysts to make use of
-state-of-the-art Natural Language Processing techniques on very large document
-collections. Applicability of the framework ranges from social sciences to
-media studies and market research. As an example we introduce the usage of the
-framework in a political science study on post-democracy and neoliberalism.
-"
-5189,1707.03255,"Gerhard Heyer, Cathleen Kantner, Andreas Niekler, Max Overbeck, Gregor
- Wiedemann","Modeling the dynamics of domain specific terminology in diachronic
- corpora",cs.CL," In terminology work, natural language processing, and digital humanities,
-several studies address the analysis of variations in context and meaning of
-terms in order to detect semantic change and the evolution of terms. We
-distinguish three different approaches to describe contextual variations:
-methods based on the analysis of patterns and linguistic clues, methods
-exploring the latent semantic space of single words, and methods for the
-analysis of topic membership. The paper presents the notion of context
-volatility as a new measure for detecting semantic change and applies it to key
-term extraction in a political science case study. 
The measure quantifies the
-dynamics of a term's contextual variation within a diachronic corpus to
-identify periods of time that are characterised by intense controversial
-debates or substantial semantic transformations.
-"
-5190,1707.03264,"Benjamin Riedel, Isabelle Augenstein, Georgios P. Spithourakis,
- Sebastian Riedel","A simple but tough-to-beat baseline for the Fake News Challenge stance
- detection task",cs.CL," Identifying public misinformation is a complicated and challenging task. An
-important part of checking the veracity of a specific claim is to evaluate the
-stance different news sources take towards the assertion. Automatic stance
-evaluation, i.e. stance detection, would arguably facilitate the process of
-fact checking. In this paper, we present our stance detection system which
-claimed third place in Stage 1 of the Fake News Challenge. Despite our
-straightforward approach, our system performs at a competitive level with the
-complex ensembles of the top two winning teams. We therefore propose our system
-as the 'simple but tough-to-beat baseline' for the Fake News Challenge stance
-detection task.
-"
-5191,1707.03457,"Joost Engelfriet, Andreas Maletti, Sebastian Maneth",Multiple Context-Free Tree Grammars: Lexicalization and Characterization,cs.FL cs.CL," Multiple (simple) context-free tree grammars are investigated, where ""simple""
-means ""linear and nondeleting"". Every multiple context-free tree grammar that
-is finitely ambiguous can be lexicalized; i.e., it can be transformed into an
-equivalent one (generating the same tree language) in which each rule of the
-grammar contains a lexical symbol. Due to this transformation, the rank of the
-nonterminals increases at most by 1, and the multiplicity (or fan-out) of the
-grammar increases at most by the maximal rank of the lexical symbols; in
-particular, the multiplicity does not increase when all lexical symbols have
-rank 0. Multiple context-free tree grammars have the same tree generating power
-as multi-component tree adjoining grammars (provided the latter can use a
-root-marker). Moreover, every multi-component tree adjoining grammar that is
-finitely ambiguous can be lexicalized. Multiple context-free tree grammars have
-the same string generating power as multiple context-free (string) grammars, and
-they admit polynomial-time parsing algorithms. A tree language can be generated
-by a multiple context-free tree grammar if and only if it is the image of a regular
-tree language under a deterministic finite-copying macro tree transducer.
-Multiple context-free tree grammars can be used as a synchronous translation
-device.
-"
-5192,1707.03490,Stefano Gurciullo and Slava Mikhaylov,"Detecting Policy Preferences and Dynamics in the UN General Debate with
- Neural Word Embeddings",cs.CL cs.AI stat.ML," Foreign policy analysis has been struggling to find ways to measure policy
-preferences and paradigm shifts in international political systems. This paper
-presents a novel, potential solution to this challenge, through the application
-of a neural word embedding (Word2vec) model on a dataset featuring speeches by
-heads of state or government in the United Nations General Debate. The paper
-provides three key contributions based on the output of the Word2vec model.
-First, it presents a set of policy attention indices, synthesizing the semantic
-proximity of political speeches to specific policy themes. 
Second, it
-introduces country-specific semantic centrality indices, based on topological
-analyses of countries' semantic positions with respect to each other. Third, it
-tests the hypothesis that there exists a statistical relation between the
-semantic content of political speeches and UN voting behavior, falsifying it
-and suggesting that political speeches contain information of a different
-nature than that behind voting outcomes. The paper concludes with a discussion
-of the practical use of its results and consequences for foreign policy analysis,
-public accountability, and transparency.
-"
-5193,1707.03550,Yingjie Hu,Geospatial Semantics,cs.CL," Geospatial semantics is a broad field that involves a variety of research
-areas. The term semantics refers to the meaning of things, and is in contrast
-with the term syntactics. Accordingly, studies on geospatial semantics usually
-focus on understanding the meaning of geographic entities as well as their
-counterparts in the cognitive and digital world, such as cognitive geographic
-concepts and digital gazetteers. Geospatial semantics can also facilitate the
-design of geographic information systems (GIS) by enhancing the
-interoperability of distributed systems and developing more intelligent
-interfaces for user interactions. Over the past years, a great deal of research
-has been conducted, approaching geospatial semantics from different perspectives,
-using a variety of methods, and targeting different problems. Meanwhile, the
-arrival of big geo data, especially the large amount of unstructured text data
-on the Web, and the fast development of natural language processing methods
-enable new research directions in geospatial semantics. This chapter,
-therefore, provides a systematic review of the existing geospatial semantic
-research. Six major research areas are identified and discussed, including
-semantic interoperability, digital gazetteers, geographic information
-retrieval, geospatial Semantic Web, place semantics, and cognitive geographic
-concepts.
-"
-5194,1707.03569,"Georgios Balikas, Simon Moura, Massih-Reza Amini",Multitask Learning for Fine-Grained Twitter Sentiment Analysis,cs.IR cs.CL cs.LG," Traditional sentiment analysis approaches tackle problems like ternary
-(3-category) and fine-grained (5-category) classification by learning the tasks
-separately. We argue that such classification tasks are correlated and we
-propose a multitask approach based on a recurrent neural network that benefits
-by jointly learning them. Our study demonstrates the potential of multitask
-models on this type of problem and improves the state-of-the-art results in
-the fine-grained sentiment classification problem.
-"
-5195,1707.03736,"Georgi Karadjov, Tsvetomila Mihaylova, Yasen Kiprov, Georgi Georgiev,
- Ivan Koychev, and Preslav Nakov","The Case for Being Average: A Mediocrity Approach to Style Masking and
- Author Obfuscation",cs.CL," Users posting online expect to remain anonymous unless they have logged in,
-which is often needed for them to be able to discuss freely on various topics.
-Preserving the anonymity of a text's writer can also be important in some other
-contexts, e.g., in the case of witness protection or anonymity programs.
-However, each person has his/her own style of writing, which can be analyzed
-using stylometry, and as a result, the true identity of the author of a piece
-of text can be revealed even if s/he has tried to hide it. 
Thus, it could be
-helpful to design automatic tools that can help a person obfuscate his/her
-identity when writing text. In particular, here we propose an approach that
-changes the text, so that it is pushed towards average values for some general
-stylometric characteristics, thus making the use of these characteristics less
-discriminative. The approach consists of three main steps: first, we calculate
-the values for some popular stylometric metrics that can indicate authorship;
-then we apply various transformations to the text, so that these metrics are
-adjusted towards the average level, while preserving the semantics and the
-soundness of the text; and finally, we add random noise. This approach turned
-out to be very effective, and yielded the best performance on the Author
-Obfuscation task at the PAN-2016 competition.
-"
-5196,1707.03764,"Angelo Basile, Gareth Dwyer, Maria Medvedeva, Josine Rawee, Hessel
- Haagsma and Malvina Nissim",N-GrAM: New Groningen Author-profiling Model,cs.CL," We describe our participation in the PAN 2017 shared task on Author
-Profiling, identifying authors' gender and language variety for English,
-Spanish, Arabic and Portuguese. We describe both the final, submitted system,
-and a series of negative results. Our aim was to create a single model for both
-gender and language, and for all language varieties. Our best-performing system
-(on cross-validated results) is a linear support vector machine (SVM) with word
-unigrams and character 3- to 5-grams as features. A set of additional features,
-including POS tags, additional datasets, geographic entities, and Twitter
-handles, hurt, rather than improved, performance. Results from cross-validation
-indicated high performance overall, and results on the test set confirmed this,
-at 0.86 average accuracy, with performance on sub-tasks ranging from 0.68 to
-0.98.
-"
-5197,1707.03804,"Hao Tan, Mohit Bansal",Source-Target Inference Models for Spatial Instruction Understanding,cs.CL cs.AI cs.LG cs.RO," Models that can execute natural language instructions for situated robotic
-tasks such as assembly and navigation have several useful applications in
-homes, offices, and remote scenarios. We study the semantics of
-spatially-referred configuration and arrangement instructions, based on the
-challenging Bisk-2016 blank-labeled block dataset. This task involves finding a
-source block and moving it to the target position (mentioned via a reference
-block and offset), where the blocks have no names or colors and are just
-referred to via spatial location features. We present novel models for the
-subtasks of source block classification and target position regression, based
-on joint-loss language and spatial-world representation learning, as well as
-CNN-based and dual attention models to compute the alignment between the world
-blocks and the instruction phrases. For target position prediction, we compare
-two inference approaches: annealed sampling via policy gradient versus
-expectation inference via supervised regression. Our models achieve the new
-state-of-the-art on this task, with an improvement of 47% on source block
-accuracy and 22% on target position distance.
-"
-5198,1707.03819,Minh Le,"A Critique of a Critique of Word Similarity Datasets: Sanity Check or
- Unnecessary Confusion?",cs.CL," Critical evaluation of word similarity datasets is very important for
-computational lexical semantics. This short report concerns the sanity check
-proposed in Batchkarov et al. 
(2016) to evaluate several popular datasets such
-as MC, RG and MEN -- the first two reportedly failed. I argue that this test is
-unstable, offers no added insight, and needs major revision in order to fulfill
-its purported goal.
-"
-5199,1707.03903,"Dmitry Ustalov, Nikolay Arefyev, Chris Biemann, Alexander Panchenko","Negative Sampling Improves Hypernymy Extraction Based on Projection
- Learning",cs.CL," We present a new approach to extraction of hypernyms based on projection
-learning and word embeddings. In contrast to classification-based approaches,
-projection-based methods require no candidate hyponym-hypernym pairs. While it
-is natural to use both positive and negative training examples in supervised
-relation extraction, the impact of negative examples on hypernym prediction has
-not been studied so far. In this paper, we show that explicit negative examples
-used for regularization of the model significantly improve performance compared
-to the state-of-the-art approach of Fu et al. (2014) on three datasets from
-different languages.
-"
-5200,1707.03904,"Bhuwan Dhingra, Kathryn Mazaitis and William W. Cohen",Quasar: Datasets for Question Answering by Search and Reading,cs.CL cs.IR cs.LG," We present two new large-scale datasets aimed at evaluating systems designed
-to comprehend a natural language query and extract its answer from a large
-corpus of text. The Quasar-S dataset consists of 37,000 cloze-style
-(fill-in-the-gap) queries constructed from definitions of software entity tags
-on the popular website Stack Overflow. The posts and comments on the website
-serve as the background corpus for answering the cloze questions. The Quasar-T
-dataset consists of 43,000 open-domain trivia questions and their answers
-obtained from various internet sources. ClueWeb09 serves as the background
-corpus for extracting these answers. We pose these datasets as a challenge for
-two related subtasks of factoid Question Answering: (1) searching for relevant
-pieces of text that include the correct answer to a query, and (2) reading the
-retrieved text to answer the query. We also describe a retrieval system for
-extracting relevant sentences and documents from the corpus given a query, and
-include these in the release for researchers wishing to only focus on (2). We
-evaluate several baselines on both datasets, ranging from simple heuristics to
-powerful neural models, and show that these lag behind human performance by
-16.4% and 32.1% for Quasar-S and -T respectively. The datasets are available at
-https://github.com/bdhingra/quasar .
-"
-5201,1707.03938,"Michael Janner, Karthik Narasimhan, Regina Barzilay",Representation Learning for Grounded Spatial Reasoning,cs.CL cs.AI cs.LG," The interpretation of spatial references is highly contextual, requiring
-joint inference over both language and the environment. We consider the task of
-spatial reasoning in a simulated environment, where an agent can act and
-receive rewards. The proposed model learns a representation of the world
-steered by instruction text. This design allows for precise alignment of local
-neighborhoods with corresponding verbalizations, while also handling global
-references in the instructions. We train our model with reinforcement learning
-using a variant of generalized value iteration. The model outperforms
-state-of-the-art approaches on several metrics, yielding a 45% reduction in
-goal localization error. 
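For readers unfamiliar with the value-iteration machinery this abstract builds on, here is a hedged sketch of classical value iteration on a toy deterministic grid world. The paper itself trains a learned, text-conditioned variant, so the grid, rewards, and update rule below are purely illustrative assumptions.

```python
# A compact value-iteration sketch on a toy deterministic grid world,
# repeatedly applying V(s) <- max_a [R(s') + gamma * V(s')] where s' is
# the cell reached by action a (moves off the grid are clamped).
import numpy as np

def value_iteration(rewards, gamma=0.9, iters=50):
    """rewards: 2-D array of per-cell rewards; returns the value map."""
    h, w = rewards.shape
    values = np.zeros((h, w))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    for _ in range(iters):
        new_values = np.copy(values)
        for i in range(h):
            for j in range(w):
                best = -np.inf
                for di, dj in moves:
                    ni = max(0, min(h - 1, i + di))  # clamp to grid
                    nj = max(0, min(w - 1, j + dj))
                    best = max(best, rewards[ni, nj] + gamma * values[ni, nj])
                new_values[i, j] = best
        values = new_values
    return values

if __name__ == "__main__":
    grid = np.zeros((4, 4))
    grid[3, 3] = 1.0  # single goal cell
    print(value_iteration(grid).round(2))  # values decay away from the goal
```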
-" -5202,1707.03968,"Shumpei Sano, Nobuhiro Kaji, and Manabu Sassano",Predicting Causes of Reformulation in Intelligent Assistants,cs.CL," Intelligent assistants (IAs) such as Siri and Cortana conversationally -interact with users and execute a wide range of actions (e.g., searching the -Web, setting alarms, and chatting). IAs can support these actions through the -combination of various components such as automatic speech recognition, natural -language understanding, and language generation. However, the complexity of -these components hinders developers from determining which component causes an -error. To remove this hindrance, we focus on reformulation, which is a useful -signal of user dissatisfaction, and propose a method to predict the -reformulation causes. We evaluate the method using the user logs of a -commercial IA. The experimental results have demonstrated that features -designed to detect the error of a specific component improve the performance of -reformulation cause detection. -" -5203,1707.03997,John J. Camilleri and Mohammad Reza Haghshenas and Gerardo Schneider,A Web-Based Tool for Analysing Normative Documents in English,cs.CL cs.CY," Our goal is to use formal methods to analyse normative documents written in -English, such as privacy policies and service-level agreements. This requires -the combination of a number of different elements, including information -extraction from natural language, formal languages for model representation, -and an interface for property specification and verification. We have worked on -a collection of components for this task: a natural language extraction tool, a -suitable formalism for representing such documents, an interface for building -models in this formalism, and methods for answering queries asked of a given -model. In this work, each of these concerns is brought together in a web-based -tool, providing a single interface for analysing normative texts in English. -Through the use of a running example, we describe each component and -demonstrate the workflow established by our tool. -" -5204,1707.04095,Chlo\'e Braud and Anders S{\o}gaard,Is writing style predictive of scientific fraud?,cs.CL," The problem of detecting scientific fraud using machine learning was recently -introduced, with initial, positive results from a model taking into account -various general indicators. The results seem to suggest that writing style is -predictive of scientific fraud. We revisit these initial experiments, and show -that the leave-one-out testing procedure they used likely leads to a slight -over-estimate of the predictability, but also that simple models can outperform -their proposed model by some margin. We go on to explore more abstract -linguistic features, such as linguistic complexity and discourse structure, -only to obtain negative results. Upon analyzing our models, we do see some -interesting patterns, though: Scientific fraud, for examples, contains less -comparison, as well as different types of hedging and ways of presenting -logical reasoning. -" -5205,1707.04108,"Hoa T. Le, Christophe Cerisara, Alexandre Denis",Do Convolutional Networks need to be Deep for Text Classification ?,cs.CL," We study in this work the importance of depth in convolutional models for -text classification, either when character or word inputs are considered. 
We
-show on 5 standard text classification and sentiment analysis tasks that deep
-models indeed give better performance than shallow networks when the text
-input is represented as a sequence of characters. However, a simple
-shallow-and-wide network outperforms deep models such as DenseNet with word
-inputs. Our shallow word model further establishes new state-of-the-art
-results on two datasets: Yelp Binary (95.9\%) and Yelp Full (64.9\%).
-"
-5206,1707.04218,Yanpeng Li,Learning Features from Co-occurrences: A Theoretical Analysis,cs.CL cs.LG math.ST stat.ML stat.TH," Representing a word by its co-occurrences with other words in context is an
-effective way to capture the meaning of the word. However, the theory behind it
-remains a challenge. In this work, taking the example of a word classification
-task, we give a theoretical analysis of the approaches that represent a word X
-by a function f(P(C|X)), where C is a context feature, P(C|X) is the
-conditional probability estimated from a text corpus, and the function f maps
-the co-occurrence measure to a prediction score. We investigate the impact of
-the context feature C and the function f. We also explain the reasons why using
-the co-occurrences with multiple context features may be better than just using
-a single one. In addition, some of the results shed light on the theory of
-feature learning and machine learning in general.
-"
-5207,1707.04221,Jonathan K. Kummerfeld and Dan Klein,"Parsing with Traces: An $O(n^4)$ Algorithm and a Structural
- Representation",cs.CL cs.DM," General treebank analyses are graph structured, but parsers are typically
-restricted to tree structures for efficiency and modeling reasons. We propose a
-new representation and algorithm for a class of graph structures that is
-flexible enough to cover almost all treebank structures, while still admitting
-efficient learning and inference. In particular, we consider directed, acyclic,
-one-endpoint-crossing graph structures, which cover most long-distance
-dislocation, shared argumentation, and similar tree-violating linguistic
-phenomena. We describe how to convert phrase structure parses, including
-traces, to our new representation in a reversible manner. Our dynamic program
-uniquely decomposes structures, is sound and complete, and covers 97.3% of the
-Penn English Treebank. We also implement a proof-of-concept parser that
-recovers a range of null elements and trace types.
-"
-5208,1707.04227,"Seppo Enarvi, Peter Smit, Sami Virpioja, Mikko Kurimo","Automatic Speech Recognition with Very Large Conversational Finnish and
- Estonian Vocabularies",cs.CL cs.SD," Today, the vocabulary size for language models in large vocabulary speech
-recognition is typically several hundreds of thousands of words. While this is
-already sufficient in some applications, out-of-vocabulary words still limit
-usability in others. In agglutinative languages the vocabulary for
-conversational speech should include millions of word forms to cover the
-spelling variations due to colloquial pronunciations, in addition to the word
-compounding and inflections. Very large vocabularies are also needed, for
-example, when the recognition of rare proper names is important.
-"
-5209,1707.04242,"Tom Kenter, Alexey Borisov, Christophe Van Gysel, Mostafa Dehghani,
- Maarten de Rijke, Bhaskar Mitra",Neural Networks for Information Retrieval,cs.IR cs.AI cs.CL," Machine learning plays a role in many aspects of modern IR systems, and deep
-learning is applied in all of them. 
The fast pace of modern-day research has
-given rise to many different approaches for many different IR problems. The
-amount of information available can be overwhelming both for junior students
-and for experienced researchers looking for new research topics and directions.
-Additionally, it is interesting to see what key insights into IR problems the
-new technologies are able to give us. The aim of this full-day tutorial is to
-give a clear overview of current tried-and-trusted neural methods in IR and how
-they benefit IR research. It covers key architectures, as well as the most
-promising future directions.
-"
-5210,1707.04244,Preeti Bhargava and Nemanja Spasojevic and Guoning Hu,"Lithium NLP: A System for Rich Information Extraction from Noisy User
- Generated Text on Social Media",cs.AI cs.CL cs.IR," In this paper, we describe the Lithium Natural Language Processing (NLP)
-system - a resource-constrained, high-throughput and language-agnostic system
-for information extraction from noisy user generated text on social media.
-Lithium NLP extracts a rich set of information including entities, topics,
-hashtags and sentiment from text. We discuss several real-world applications of
-the system currently incorporated in Lithium products. We also compare our
-system with existing commercial and academic NLP systems in terms of
-performance, information extracted and languages supported. We show that
-Lithium NLP is on par with, and in some cases outperforms, state-of-the-art
-commercial NLP systems.
-"
-5211,1707.04408,"Rajiv Bajpai, Soujanya Poria, Danyun Ho, and Erik Cambria","Developing a concept-level knowledge base for sentiment analysis in
- Singlish",cs.CL," In this paper, we present the Singlish sentiment lexicon, a concept-level
-knowledge base for sentiment analysis that associates multiword expressions
-with a set of emotion labels and a polarity value. Unlike many other sentiment
-analysis resources, this lexicon is not built by manually labeling pieces of
-knowledge coming from general NLP resources such as WordNet or DBPedia.
-Instead, it is automatically constructed by applying graph-mining and
-multi-dimensional scaling techniques on the affective common-sense knowledge
-collected from three different sources. This knowledge is represented
-redundantly at three levels: semantic network, matrix, and vector space.
-Subsequently, the concepts are labeled by emotions and polarity through the
-ensemble application of spreading activation, neural networks and an emotion
-categorization model.
-"
-5212,1707.04412,"Alon Talmor, Mor Geva, Jonathan Berant","Evaluating Semantic Parsing against a Simple Web-based Question
- Answering Model",cs.CL," Semantic parsing shines at analyzing complex natural language that involves
-composition and computation over multiple pieces of evidence. However, datasets
-for semantic parsing contain many factoid questions that can be answered from a
-single web document. In this paper, we propose to evaluate semantic
-parsing-based question answering models by comparing them to a question
-answering baseline that queries the web and extracts the answer only from web
-snippets, without access to the target knowledge-base. We investigate this
-approach on COMPLEXQUESTIONS, a dataset designed to focus on compositional
-language, and find that our model obtains reasonable performance (35 F1
-compared to 41 F1 for the state of the art). 
We find in our analysis that our model
-performs well on complex questions involving conjunctions, but struggles on
-questions that involve relation composition and superlatives.
-"
-5213,1707.04481,"Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes
- Garc\'ia-Mart\'inez, Fethi Bougares, Lo\""ic Barrault, Marc Masana, Luis
- Herranz and Joost van de Weijer",LIUM-CVC Submissions for WMT17 Multimodal Translation Task,cs.CL," This paper describes the monomodal and multimodal Neural Machine Translation
-systems developed by LIUM and CVC for the WMT17 Shared Task on Multimodal
-Translation. We mainly explored two multimodal architectures where either
-global visual features or convolutional feature maps are integrated in order to
-benefit from visual context. Our final systems ranked first for both En-De and
-En-Fr language pairs according to the automatic evaluation metrics METEOR and
-BLEU.
-"
-5214,1707.04499,"Mercedes Garc\'ia-Mart\'inez, Ozan Caglayan, Walid Aransa, Adrien
- Bardet, Fethi Bougares, Lo\""ic Barrault",LIUM Machine Translation Systems for WMT17 News Translation Task,cs.CL," This paper describes LIUM submissions to the WMT17 News Translation Task for
-the English-German, English-Turkish, English-Czech and English-Latvian language
-pairs. We train BPE-based attentive Neural Machine Translation systems with and
-without factored outputs using the open source nmtpy framework. Competitive
-scores were obtained by ensembling various systems and exploiting the
-availability of target monolingual corpora for back-translation. The impact of
-back-translation quantity and quality is also analyzed for English-Turkish,
-where our post-deadline submission surpassed the best entry by +1.6 BLEU.
-"
-5215,1707.04538,"Tomasz Jurczyk, Jinho D. Choi","Cross-genre Document Retrieval: Matching between Conversational and
- Formal Writings",cs.CL," This paper addresses a cross-genre document retrieval task, where the
-queries are in formal writing and the target documents are in conversational
-writing. In this task, a query is a sentence extracted from either a summary
-or a plot of an episode in a TV show, and the target document consists of
-transcripts from the corresponding episode. To establish a strong baseline, we
-employ the current state-of-the-art search engine to perform document retrieval
-on the dataset collected for this work. We then introduce a structure reranking
-approach to improve the initial ranking by utilizing syntactic and semantic
-structures generated by NLP tools. Our evaluation shows an improvement of more
-than 4% when the structure reranking is applied, which is very promising.
-"
-5216,1707.04546,"Shrimai Prabhumoye and Samridhi Choudhary, Evangelia Spiliopoulou,
- Christopher Bogart, Carolyn Penstein Rose, Alan W Black",Linguistic Markers of Influence in Informal Interactions,cs.CL cs.SI," There has been a long-standing interest in understanding `Social Influence'
-both in Social Sciences and in Computational Linguistics. In this paper, we
-present a novel approach to study and measure interpersonal influence in daily
-interactions. Motivated by the basic principles of influence, we attempt to
-identify indicative linguistic features of the posts in an online knitting
-community. We present the scheme used to operationalize and label the posts
-with indicator features. Experiments with the identified features show a 3.15%
-improvement in the classification accuracy of influence. 
Our results
-illustrate the important correlation between the characteristics of the
-language and its potential to influence others.
-"
-5217,1707.04550,Jind\v{r}ich Helcl and Jind\v{r}ich Libovick\'y,CUNI System for the WMT17 Multimodal Translation Task,cs.CL cs.NE," In this paper, we describe our submissions to the WMT17 Multimodal
-Translation Task. For Task 1 (multimodal translation), our best scoring system
-is a purely textual neural translation of the source image caption to the
-target language. The main feature of the system is the use of additional data
-that was acquired by selecting similar sentences from parallel corpora and by
-data synthesis with back-translation. For Task 2 (cross-lingual image
-captioning), our best submitted system generates an English caption which is
-then translated by the best system used in Task 1. We also present negative
-results, which are based on ideas that we believe have the potential to yield
-improvements, but did not prove to be useful in our particular setup.
-"
-5218,1707.04596,"Sheng Chen, Akshay Soni, Aasish Pappu, Yashar Mehdad","DocTag2Vec: An Embedding Based Multi-label Learning Approach for
- Document Tagging",cs.CL cs.IR," Tagging news articles or blog posts with relevant tags from a collection of
-predefined ones is coined as document tagging in this work. Accurate tagging of
-articles can benefit several downstream applications such as recommendation and
-search. In this work, we propose a novel yet simple approach called DocTag2Vec
-to accomplish this task. We substantially extend Word2Vec and Doc2Vec---two
-popular models for learning distributed representations of words and documents.
-In DocTag2Vec, we simultaneously learn the representation of words, documents,
-and tags in a joint vector space during training, and employ the simple
-$k$-nearest neighbor search to predict tags for unseen documents. In contrast
-to previous multi-label learning methods, DocTag2Vec directly deals with raw
-text instead of a provided feature vector, and in addition enjoys advantages
-such as learning tag representations and the ability to handle newly
-created tags. To demonstrate the effectiveness of our approach, we conduct
-experiments on several datasets and show promising results against
-state-of-the-art methods.
-"
-5219,1707.04652,"Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran",EmojiNet: An Open Service and API for Emoji Sense Discovery,cs.CL cs.SI," This paper presents the release of EmojiNet, the largest machine-readable
-emoji sense inventory that links Unicode emoji representations to their English
-meanings extracted from the Web. EmojiNet is a dataset consisting of: (i)
-12,904 sense labels over 2,389 emoji, which were extracted from the web and
-linked to machine-readable sense definitions seen in BabelNet, (ii) context
-words associated with each emoji sense, which are inferred through word
-embedding models trained over the Google News corpus and a Twitter message
-corpus for each emoji sense definition, and (iii) recognizing discrepancies in
-the presentation of emoji on different platforms, specification of the most
-likely platform-based emoji sense for a selected set of emoji. The dataset is
-hosted as an open service with a REST API and is available at
-http://emojinet.knoesis.org/. The development of this dataset, evaluation of
-its quality, and its applications including emoji sense disambiguation and
-emoji sense similarity are discussed. 
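A hedged sketch of how a client might query an open REST service such as the one this abstract describes. The endpoint path and response fields below are hypothetical placeholders, not the documented EmojiNet API; consult http://emojinet.knoesis.org/ for the actual interface.

```python
# Hypothetical client sketch for an emoji-sense REST service.
# The route "/api/senses" and the "senses" response field are ASSUMED
# placeholders; the real EmojiNet API may differ.
import requests

def fetch_emoji_senses(emoji: str) -> list:
    """Return sense labels for an emoji (assumed response schema)."""
    url = "http://emojinet.knoesis.org/api/senses"  # hypothetical route
    resp = requests.get(url, params={"emoji": emoji}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("senses", [])  # assumed field name

if __name__ == "__main__":
    print(fetch_emoji_senses("😂"))
```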
-" -5220,1707.04653,"Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran",A Semantics-Based Measure of Emoji Similarity,cs.CL cs.SI," Emoji have grown to become one of the most important forms of communication -on the web. With its widespread use, measuring the similarity of emoji has -become an important problem for contemporary text processing since it lies at -the heart of sentiment analysis, search, and interface design tasks. This paper -presents a comprehensive analysis of the semantic similarity of emoji through -embedding models that are learned over machine-readable emoji meanings in the -EmojiNet knowledge base. Using emoji descriptions, emoji sense labels and emoji -sense definitions, and with different training corpora obtained from Twitter -and Google News, we develop and test multiple embedding models to measure emoji -similarity. To evaluate our work, we create a new dataset called EmoSim508, -which assigns human-annotated semantic similarity scores to a set of 508 -carefully selected emoji pairs. After validation with EmoSim508, we present a -real-world use-case of our emoji embedding models using a sentiment analysis -task and show that our models outperform the previous best-performing emoji -embedding model on this task. The EmoSim508 dataset and our emoji embedding -models are publicly released with this paper and can be downloaded from -http://emojinet.knoesis.org/. -" -5221,1707.04662,Alexey Zobnin,"Rotations and Interpretability of Word Embeddings: the Case of the - Russian Language",cs.CL," Consider a continuous word embedding model. Usually, the cosines between word -vectors are used as a measure of similarity of words. These cosines do not -change under orthogonal transformations of the embedding space. We demonstrate -that, using some canonical orthogonal transformations from SVD, it is possible -both to increase the meaning of some components and to make the components more -stable under re-learning. We study the interpretability of components for -publicly available models for the Russian language (RusVectores, fastText, -RDT). -" -5222,1707.04678,Alexandros Tsaptsinos,"Lyrics-Based Music Genre Classification Using a Hierarchical Attention - Network",cs.IR cs.CL cs.NE," Music genre classification, especially using lyrics alone, remains a -challenging topic in Music Information Retrieval. In this study we apply -recurrent neural network models to classify a large dataset of intact song -lyrics. As lyrics exhibit a hierarchical layer structure - in which words -combine to form lines, lines form segments, and segments form a complete song - -we adapt a hierarchical attention network (HAN) to exploit these layers and in -addition learn the importance of the words, lines, and segments. We test the -model over a 117-genre dataset and a reduced 20-genre dataset. Experimental -results show that the HAN outperforms both non-neural models and simpler neural -models, whilst also classifying over a higher number of genres than previous -research. Through the learning process we can also visualise which words or -lines in a song the model believes are important to classifying the genre. As a -result the HAN provides insights, from a computational perspective, into -lyrical structure and language features that differentiate musical genres. 
-" -5223,1707.04724,Samer Abdallah,"Memoisation: Purely, Left-recursively, and with (Continuation Passing) - Style",cs.LO cs.CL cs.PL," Memoisation, or tabling, is a well-known technique that yields large -improvements in the performance of some recursive computations. Tabled -resolution in Prologs such as XSB and B-Prolog can transform so called -left-recursive predicates from non-terminating computations into finite and -well-behaved ones. In the functional programming literature, memoisation has -usually been implemented in a way that does not handle left-recursion, -requiring supplementary mechanisms to prevent non-termination. A notable -exception is Johnson's (1995) continuation passing approach in Scheme. This, -however, relies on mutation of a memo table data structure and coding in -explicit continuation passing style. We show how Johnson's approach can be -implemented purely functionally in a modern, strongly typed functional language -(OCaml), presented via a monadic interface that hides the implementation -details, yet providing a way to return a compact represention of the memo -tables at the end of the computation. -" -5224,1707.04817,Shervin Malmasi,Open-Set Language Identification,cs.CL," We present the first open-set language identification experiments using -one-class classification. We first highlight the shortcomings of traditional -feature extraction methods and propose a hashing-based feature vectorization -approach as a solution. Using a dataset of 10 languages from different writing -systems, we train a One- Class Support Vector Machine using only a monolingual -corpus for each language. Each model is evaluated against a test set of data -from all 10 languages and we achieve an average F-score of 0.99, highlighting -the effectiveness of this approach for open-set language identification. -" -5225,1707.04848,Shuntaro Takahashi and Kumiko Tanaka-Ishii,Do Neural Nets Learn Statistical Laws behind Natural Language?,cs.CL," The performance of deep learning in natural language processing has been -spectacular, but the reasons for this success remain unclear because of the -inherent complexity of deep learning. This paper provides empirical evidence of -its effectiveness and of a limitation of neural networks for language -engineering. Precisely, we demonstrate that a neural language model based on -long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, -two representative statistical properties underlying natural language. We -discuss the quality of reproducibility and the emergence of Zipf's law and -Heaps' law as training progresses. We also point out that the neural language -model has a limitation in reproducing long-range correlation, another -statistical property of natural language. This understanding could provide a -direction for improving the architectures of neural networks. -" -5226,1707.04860,"Amir Bakarov, Olga Gureenkova","Automated Detection of Non-Relevant Posts on the Russian Imageboard - ""2ch"": Importance of the Choice of Word Representations",cs.CL," This study considers the problem of automated detection of non-relevant posts -on Web forums and discusses the approach of resolving this problem by -approximation it with the task of detection of semantic relatedness between the -given post and the opening post of the forum discussion thread. The -approximated task could be resolved through learning the supervised classifier -with a composed word embeddings of two posts. 
Considering that the success in
-this task could be quite sensitive to the choice of word representations, we
-propose a comparison of the performance of different word embedding models. We
-train 7 models (Word2Vec, Glove, Word2Vec-f, Wang2Vec, AdaGram, FastText,
-Swivel), evaluate embeddings produced by them on a dataset of human judgements
-and compare their performance on the task of non-relevant posts detection. To
-make the comparison, we propose a dataset of semantic relatedness with posts
-from one of the most popular Russian Web forums, the imageboard ""2ch"", which
-has challenging lexical and grammatical features.
-"
-5227,1707.04879,"Andros Tjandra, Sakriani Sakti, Satoshi Nakamura",Listening while Speaking: Speech Chain by Deep Learning,cs.CL cs.LG cs.SD," Despite the close relationship between speech perception and production,
-research in automatic speech recognition (ASR) and text-to-speech synthesis
-(TTS) has progressed more or less independently without exerting much mutual
-influence on each other. In human communication, on the other hand, a
-closed-loop speech chain mechanism with auditory feedback from the speaker's
-mouth to her ear is crucial. In this paper, we take a step further and develop
-a closed-loop speech chain model based on deep learning. The
-sequence-to-sequence model in a closed-loop architecture allows us to train our
-model on the concatenation of both labeled and unlabeled data. While ASR
-transcribes the unlabeled speech features, TTS attempts to reconstruct the
-original speech waveform based on the text from ASR. In the opposite direction,
-ASR also attempts to reconstruct the original text transcription given the
-synthesized speech. To the best of our knowledge, this is the first deep
-learning model that integrates human speech perception and production
-behaviors. Our experimental results show that the proposed approach
-significantly improved performance compared with separate systems that were
-trained only on labeled data.
-"
-5228,1707.04913,"Rasmus Berg Palm, Dirk Hovy, Florian Laws, Ole Winther",End-to-End Information Extraction without Token-Level Supervision,cs.CL," Most state-of-the-art information extraction approaches rely on token-level
-labels to find the areas of interest in text. Unfortunately, these labels are
-time-consuming and costly to create, and consequently, not available for many
-real-life IE tasks. To make matters worse, token-level labels are usually not
-the desired output, but just an intermediary step. End-to-end (E2E) models,
-which take raw text as input and produce the desired output directly, need not
-depend on token-level labels. We propose an E2E model based on pointer
-networks, which can be trained directly on pairs of raw input and output text.
-We evaluate our model on the ATIS data set, the MIT restaurant corpus and the
-MIT movie corpus and compare to neural baselines that do use token-level
-labels. We achieve competitive results, within a few percentage points of the
-baselines, showing the feasibility of E2E information extraction without the
-need for token-level labels. This opens up new possibilities, as for many tasks
-currently addressed by human extractors, raw input and output data are
-available, but not token-level labels. 
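The core pointer-network step that the abstract's E2E model builds on can be sketched in a few lines: rather than a softmax over a fixed output vocabulary, the decoder scores the encoder states and emits a distribution over input positions, so outputs are copied directly from the raw text. A minimal numpy illustration, with all names hypothetical and the encoders omitted:

```python
# Sketch of one pointer-network decoding step: a decoder state is scored
# against every encoder state (here by dot product) and the resulting
# softmax is a distribution over *input token positions*, not vocabulary.
import numpy as np

def pointer_step(decoder_state, encoder_states):
    """Return an attention distribution over input token positions."""
    scores = encoder_states @ decoder_state   # one score per input token
    exp = np.exp(scores - scores.max())       # numerically stable softmax
    return exp / exp.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    enc = rng.normal(size=(6, 4))   # 6 input tokens, hidden size 4
    dec = rng.normal(size=4)        # current decoder state
    probs = pointer_step(dec, enc)
    print("pointed-to token index:", int(probs.argmax()))
```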
-" -5229,1707.04935,"Serhii Hamotskyi, Anis Rojbi, Sergii Stirenko, and Yuri Gordienko",Automatized Generation of Alphabets of Symbols,cs.HC cs.CL cs.CY," In this paper, we discuss the generation of symbols (and alphabets) based on -specific user requirements (medium, priorities, type of information that needs -to be conveyed). A framework for the generation of alphabets is proposed, and -its use for the generation of a shorthand writing system is explored. We -discuss the possible use of machine learning and genetic algorithms to gather -inputs for generation of such alphabets and for optimization of already -generated ones. The alphabets generated using such methods may be used in very -different fields, from the creation of synthetic languages and constructed -scripts to the creation of sensible commands for multimodal interaction through -Human-Computer Interfaces, such as mouse gestures, touchpads, body gestures, -eye-tracking cameras, and brain-computing Interfaces, especially in -applications for elderly care and people with disabilities. -" -5230,1707.04968,"Chao Ma, Chunhua Shen, Anthony Dick, Qi Wu, Peng Wang, Anton van den - Hengel, Ian Reid",Visual Question Answering with Memory-Augmented Networks,cs.CV cs.CL," In this paper, we exploit a memory-augmented neural network to predict -accurate answers to visual questions, even when those answers occur rarely in -the training set. The memory network incorporates both internal and external -memory blocks and selectively pays attention to each training exemplar. We show -that memory-augmented neural networks are able to maintain a relatively -long-term memory of scarce training exemplars, which is important for visual -question answering due to the heavy-tailed distribution of answers in a general -VQA setting. Experimental results on two large-scale benchmark datasets show -the favorable performance of the proposed algorithm with a comparison to state -of the art. -" -5231,1707.05000,Jiangming Liu and Yue Zhang,In-Order Transition-based Constituent Parsing,cs.CL," Both bottom-up and top-down strategies have been used for neural -transition-based constituent parsing. The parsing strategies differ in terms of -the order in which they recognize productions in the derivation tree, where -bottom-up strategies and top-down strategies take post-order and pre-order -traversal over trees, respectively. Bottom-up parsers benefit from rich -features from readily built partial parses, but lack lookahead guidance in the -parsing process; top-down parsers benefit from non-local guidance for local -decisions, but rely on a strong encoder over the input to predict a constituent -hierarchy before its construction.To mitigate both issues, we propose a novel -parsing system based on in-order traversal over syntactic trees, designing a -set of transition actions to find a compromise between bottom-up constituent -information and top-down lookahead information. Based on stack-LSTM, our -psycholinguistically motivated constituent parsing system achieves 91.8 F1 on -WSJ benchmark. Furthermore, the system achieves 93.6 F1 with supervised -reranking and 94.2 F1 with semi-supervised reranking, which are the best -results on the WSJ benchmark. 
-" -5232,1707.05005,"Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, - Lihui Chen, Yang Liu and Shantanu Jaiswal",graph2vec: Learning Distributed Representations of Graphs,cs.AI cs.CL cs.CR cs.NE cs.SE," Recent works on representation learning for graph structured data -predominantly focus on learning distributed representations of graph -substructures such as nodes and subgraphs. However, many graph analytics tasks -such as graph classification and clustering require representing entire graphs -as fixed length feature vectors. While the aforementioned approaches are -naturally unequipped to learn such representations, graph kernels remain as the -most effective way of obtaining them. However, these graph kernels use -handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are -hampered by problems such as poor generalization. To address this limitation, -in this work, we propose a neural embedding framework named graph2vec to learn -data-driven distributed representations of arbitrary sized graphs. graph2vec's -embeddings are learnt in an unsupervised manner and are task agnostic. Hence, -they could be used for any downstream task such as graph classification, -clustering and even seeding supervised representation learning approaches. Our -experiments on several benchmark and large real-world datasets show that -graph2vec achieves significant improvements in classification and clustering -accuracies over substructure representation learning approaches and are -competitive with state-of-the-art graph kernels. -" -5233,1707.05015,"Ethan Fast, Binbin Chen, Julia Mendelsohn, Jonathan Bassen, Michael - Bernstein",Iris: A Conversational Agent for Complex Tasks,cs.HC cs.CL," Today's conversational agents are restricted to simple standalone commands. -In this paper, we present Iris, an agent that draws on human conversational -strategies to combine commands, allowing it to perform more complex tasks that -it has not been explicitly designed to support: for example, composing one -command to ""plot a histogram"" with another to first ""log-transform the data"". -To enable this complexity, we introduce a domain specific language that -transforms commands into automata that Iris can compose, sequence, and execute -dynamically by interacting with a user through natural language, as well as a -conversational type system that manages what kinds of commands can be combined. -We have designed Iris to help users with data science tasks, a domain that -requires support for command combination. In evaluation, we find that data -scientists complete a predictive modeling task significantly faster (2.6 times -speedup) with Iris than a modern non-conversational programming environment. -Iris supports the same kinds of commands as today's agents, but empowers users -to weave together these commands to accomplish complex goals. -" -5234,1707.05114,"Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, Jingbo Zhu","Towards Bidirectional Hierarchical Representations for Attention-Based - Neural Machine Translation",cs.CL," This paper proposes a hierarchical attentional neural translation model which -focuses on enhancing source-side hierarchical representations by covering both -local and global semantic information using a bidirectional tree-based encoder. -To maximize the predictive likelihood of target words, a weighted variant of an -attention mechanism is used to balance the attentive information between -lexical and phrase vectors. 
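One plausible way to "balance the attentive information between lexical and phrase vectors", as the hierarchical NMT entry above puts it, is a learned gate over the two context vectors. The gating form below is our assumption, not necessarily the paper's exact weighting scheme, and all names are illustrative:

```python
# Hedged sketch: gate-based interpolation of word-level and phrase-level
# attention outputs (the specific gating form is an assumption).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def balanced_context(c_lex, c_phr, dec_state, W_g, b_g):
    """Interpolate lexical and phrase context vectors, per dimension."""
    g = sigmoid(W_g @ np.concatenate([c_lex, c_phr, dec_state]) + b_g)
    return g * c_lex + (1.0 - g) * c_phr

rng = np.random.default_rng(1)
d = 4
c = balanced_context(rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=d),
                     rng.normal(size=(d, 3 * d)), np.zeros(d))
print(c.shape)  # (4,)
```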
Using a tree-based rare word encoding, the proposed
-model is extended to sub-word level to alleviate the out-of-vocabulary (OOV)
-problem. Empirical results reveal that the proposed model significantly
-outperforms sequence-to-sequence attention-based and tree-based neural
-translation models in English-Chinese translation tasks.
-"
-5235,1707.05115,"Anssi Yli-Jyr\""a",The Power of Constraint Grammars Revisited,cs.FL cs.AI cs.CL," Sequential Constraint Grammar (SCG) (Karlsson, 1990) and its extensions have
-lacked clear connections to formal language theory. The purpose of this article
-is to lay a foundation for these connections by simplifying the definition of
-strings processed by the grammar and by showing that Nonmonotonic SCG is
-undecidable and that derivations similar to the Generative Phonology exist. The
-current investigations propose resource bounds that restrict the generative
-power of SCG to a subset of context-sensitive languages and present a strong
-finite-state condition for grammars as wholes. We show that a grammar is
-equivalent to a finite-state transducer if it is implemented with a Turing
-machine that runs in o(n log n) time. This condition opens new finite-state
-hypotheses and avenues for deeper analysis of SCG instances in the way inspired
-by Finite-State Phonology.
-"
-5236,1707.05116,Rob van der Goot and Barbara Plank and Malvina Nissim,"To Normalize, or Not to Normalize: The Impact of Normalization on
-  Part-of-Speech Tagging",cs.CL," Does normalization help Part-of-Speech (POS) tagging accuracy on noisy,
-non-canonical data? To the best of our knowledge, little is known about the actual
-impact of normalization in a real-world scenario, where gold error detection is
-not available. We investigate the effect of automatic normalization on POS
-tagging of tweets. We also compare normalization to strategies that leverage
-large amounts of unlabeled data kept in its raw form. Our results show that
-normalization helps, but does not add consistently beyond just word embedding
-layer initialization. The latter approach yields a tagging model that is
-competitive with a Twitter state-of-the-art tagger.
-"
-5237,1707.05118,Alexandre Berard and Olivier Pietquin and Laurent Besacier,LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task,cs.CL," This paper presents the LIG-CRIStAL submission to the shared Automatic Post-
-Editing task of WMT 2017. We propose two neural post-editing models: a
-monosource model with a task-specific attention mechanism, which performs
-particularly well in a low-resource scenario; and a chained architecture which
-makes use of the source sentence to provide extra context. This latter
-architecture manages to slightly improve our results when more training data is
-available. We present and discuss our results on two datasets (en-de and de-en)
-that are made available for the task.
-"
-5238,1707.05127,Jie Yang and Yue Zhang and Fei Dong,Neural Reranking for Named Entity Recognition,cs.CL," We propose a neural reranking system for named entity recognition (NER). The
-basic idea is to leverage recurrent neural network models to learn
-sentence-level patterns that involve named entity mentions. In particular,
-given an output sentence produced by a baseline NER model, we replace all
-entity mentions, such as \textit{Barack Obama}, with their entity types, such
-as \textit{PER}. The resulting sentence patterns contain direct output
-information, yet are less sparse without specific named entities. 
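The mention-to-type substitution that the NER reranking entry above describes is simple enough to sketch directly. The helper below is our own simplified illustration (span format and names are assumptions), not the paper's code:

```python
# Minimal illustration: collapse predicted entity mentions into their types
# to form the pattern a reranker reads.
def to_pattern(tokens, entities):
    """entities: list of (start, end, type) spans over tokens, end exclusive."""
    out, i = [], 0
    spans = {s: (e, t) for s, e, t in entities}
    while i < len(tokens):
        if i in spans:
            end, etype = spans[i]
            out.append(etype)     # the whole mention becomes its type label
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = "Barack Obama was born in Hawaii".split()
print(" ".join(to_pattern(tokens, [(0, 2, "PER"), (5, 6, "LOC")])))
# -> "PER was born in LOC"
```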
For example, -""PER was born in LOC"" can be such a pattern. LSTM and CNN structures are -utilised for learning deep representations of such sentences for reranking. -Results show that our system can significantly improve the NER accuracies over -two different baselines, giving the best reported results on a standard -benchmark. -" -5239,1707.05227,"Marek Rei, Helen Yannakoudakis",Auxiliary Objectives for Neural Error Detection Models,cs.CL cs.LG cs.NE," We investigate the utility of different auxiliary objectives and training -strategies within a neural sequence labeling approach to error detection in -learner writing. Auxiliary costs provide the model with additional linguistic -information, allowing it to learn general-purpose compositional features that -can then be exploited for other objectives. Our experiments show that a joint -learning approach trained with parallel labels on in-domain data improves -performance over the previous best error detection system. While the resulting -model has the same number of parameters, the additional objectives allow it to -be optimised more efficiently and achieve better performance. -" -5240,1707.05233,Marek Rei,Detecting Off-topic Responses to Visual Prompts,cs.CL cs.LG cs.NE," Automated methods for essay scoring have made great progress in recent years, -achieving accuracies very close to human annotators. However, a known weakness -of such automated scorers is not taking into account the semantic relevance of -the submitted text. While there is existing work on detecting answer relevance -given a textual prompt, very little previous research has been done to -incorporate visual writing prompts. We propose a neural architecture and -several extensions for detecting off-topic responses to visual prompts and -evaluate it on a dataset of texts written by language learners. -" -5241,1707.05236,"Marek Rei, Mariano Felice, Zheng Yuan, Ted Briscoe","Artificial Error Generation with Machine Translation and Syntactic - Patterns",cs.CL cs.LG," Shortage of available training data is holding back progress in the area of -automated error detection. This paper investigates two alternative methods for -artificially generating writing errors, in order to create additional -resources. We propose treating error generation as a machine translation task, -where grammatically correct text is translated to contain errors. In addition, -we explore a system for extracting textual patterns from an annotated corpus, -which can then be used to insert errors into grammatically correct sentences. -Our experiments show that the inclusion of artificially generated errors -significantly improves error detection accuracy on both FCE and CoNLL 2014 -datasets. -" -5242,1707.05246,"Sebastian Ruder, Barbara Plank",Learning to select data for transfer learning with Bayesian Optimization,cs.CL cs.LG," Domain similarity measures can be used to gauge adaptability and select -suitable data for transfer learning, but existing approaches define ad hoc -measures that are deemed suitable for respective tasks. Inspired by work on -curriculum learning, we propose to \emph{learn} data selection measures using -Bayesian Optimization and evaluate them across models, domains and tasks. Our -learned measures outperform existing domain similarity measures significantly -on three tasks: sentiment analysis, part-of-speech tagging, and parsing. 
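The data-selection idea in the transfer-learning entry above can be pictured as scoring candidate training examples with a weighted combination of features, where the weights are what Bayesian optimization tunes. The features and weights below are invented for illustration; this is a sketch of the general scheme, not the paper's implementation:

```python
# Hedged sketch of learned data selection: rank candidates by a weighted sum
# of similarity/diversity features; BayesOpt would search over the weights w.
import numpy as np

def select_top_k(feature_matrix, w, k):
    """feature_matrix: (n_examples, n_features); w: learned feature weights."""
    scores = feature_matrix @ w          # higher = more useful for the target
    return np.argsort(-scores)[:k]       # indices of the k best candidates

rng = np.random.default_rng(2)
feats = rng.random(size=(100, 3))        # e.g. [domain sim, LM xent, diversity]
w = np.array([0.7, -0.2, 0.5])           # one candidate point BayesOpt might try
print(select_top_k(feats, w, 5))
```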
We
-show the importance of complementing similarity with diversity, and that
-learned measures are -- to some degree -- transferable across models, domains,
-and even tasks.
-"
-5243,1707.05254,"Rose Catherine, Kathryn Mazaitis, Maxine Eskenazi and William Cohen",Explainable Entity-based Recommendations with Knowledge Graphs,cs.IR cs.CL," Explainable recommendation is an important task. Many methods have been
-proposed which generate explanations from the content and reviews written for
-items. When review text is unavailable, generating explanations is still a hard
-problem. In this paper, we illustrate how explanations can be generated in such
-a scenario by leveraging external knowledge in the form of knowledge graphs.
-Our method jointly ranks items and knowledge graph entities using a
-Personalized PageRank procedure to produce recommendations together with their
-explanations.
-"
-5244,1707.05261,"Franziska Horn, Leila Arras, Gr\'egoire Montavon, Klaus-Robert
-  M\""uller, and Wojciech Samek",Exploring text datasets by visualizing relevant words,cs.CL," When working with a new dataset, it is important to first explore and
-familiarize oneself with it before applying any advanced machine learning
-algorithms. However, to the best of our knowledge, no tools exist that quickly
-and reliably give insight into the contents of a selection of documents with
-respect to what distinguishes them from other documents belonging to different
-categories. In this paper we propose to extract `relevant words' from a
-collection of texts, which summarize the contents of documents belonging to a
-certain class (or discovered cluster in the case of unlabeled datasets), and
-visualize them in word clouds to allow for a survey of salient features at a
-glance. We compare three methods for extracting relevant words and demonstrate
-the usefulness of the resulting word clouds by providing an overview of the
-classes contained in a dataset of scientific publications as well as by
-discovering trending topics from recent New York Times article snippets.
-"
-5245,1707.05266,"Oren Melamud, Ido Dagan, Jacob Goldberger",A Simple Language Model based on PMI Matrix Approximations,cs.CL," In this study, we introduce a new approach for learning language models by
-training them to estimate word-context pointwise mutual information (PMI), and
-then deriving the desired conditional probabilities from PMI at test time.
-Specifically, we show that with minor modifications to word2vec's algorithm, we
-get principled language models that are closely related to the well-established
-Noise Contrastive Estimation (NCE) based language models. A compelling aspect
-of our approach is that our models are trained with the same simple negative
-sampling objective function that is commonly used in word2vec to learn word
-embeddings.
-"
-5246,1707.05288,"Diego Moussallem, Ricardo Usbeck, Michael R\""oder and Axel-Cyrille
-  Ngonga Ngomo","MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity
-  Linking Approach",cs.CL," Entity linking has recently been the subject of a significant body of
-research. Currently, the best performing approaches rely on trained
-mono-lingual models. Porting these approaches to other languages is
-consequently a difficult endeavor as it requires corresponding training data
-and retraining of the models. We address this drawback by presenting a novel
-multilingual, knowledge-base agnostic and deterministic approach to entity
-linking, dubbed MAG. 
MAG is based on a combination of context-based retrieval
-on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data
-sets and in 7 languages. Our results show that the best approach trained on
-English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
-on datasets in other languages. MAG, on the other hand, achieves
-state-of-the-art performance on English datasets and reaches a micro F-measure
-that is up to 0.6 higher than that of PBOH on non-English languages.
-"
-5247,1707.05315,"Cheng-Tao Chung, Cheng-Yu Tsai, Chia-Hsiang Liu and Lin-Shan Lee","Unsupervised Iterative Deep Learning of Speech Features and Acoustic
-  Tokens with Applications to Spoken Term Detection",cs.CL," In this paper we aim to automatically discover high-quality frame-level
-speech features and acoustic tokens directly from unlabeled speech data. A
-Multi-granular Acoustic Tokenizer (MAT) was proposed for automatic discovery of
-multiple sets of acoustic tokens from the given corpus. Each acoustic token set
-is specified by a set of hyperparameters describing the model configuration.
-These different sets of acoustic tokens carry different characteristics for the
-given corpus and the language behind, and thus can be mutually reinforced. The
-multiple sets of token labels are then used as the targets of a Multi-target
-Deep Neural Network (MDNN) trained on frame-level acoustic features. Bottleneck
-features extracted from the MDNN are then used as the feedback input to the MAT
-and the MDNN itself in the next iteration. The multi-granular acoustic token
-sets and the frame-level speech features can be iteratively optimized in the
-iterative deep learning framework. We call this framework the Multi-granular
-Acoustic Tokenizing Deep Neural Network (MATDNN). The results were evaluated
-using the metrics and corpora defined in the Zero Resource Speech Challenge
-organized at Interspeech 2015, and improved performance was obtained with a set
-of experiments of query-by-example spoken term detection on the same corpora.
-Visualization of the discovered tokens against the English phonemes was also
-shown.
-"
-5248,1707.05436,"Huadong Chen, Shujian Huang, David Chiang, Jiajun Chen","Improved Neural Machine Translation with a Syntax-Aware Encoder and
-  Decoder",cs.CL," Most neural machine translation (NMT) models are based on the sequential
-encoder-decoder framework, which makes no use of syntactic information. In this
-paper, we improve this model by explicitly incorporating source-side syntactic
-trees. More specifically, we propose (1) a bidirectional tree encoder which
-learns both sequential and tree structured representations; (2) a tree-coverage
-model that lets the attention depend on the source-side syntax. Experiments on
-Chinese-English translation demonstrate that our proposed models outperform the
-sequential attentional model as well as a stronger baseline with a bottom-up
-tree encoder and word coverage.
-"
-5249,1707.05438,"Huadong Chen, Shujian Huang, David Chiang, Xinyu Dai, Jiajun Chen","Top-Rank Enhanced Listwise Optimization for Statistical Machine
-  Translation",cs.CL," Pairwise ranking methods are the basis of many widely used discriminative
-training approaches for structure prediction problems in natural language
-processing (NLP). Decomposing the problem of ranking hypotheses into pairwise
-comparisons enables simple and efficient solutions. However, neglecting the
-global ordering of the hypothesis list may hinder learning. 
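The pairwise-versus-listwise contrast drawn in the entry above can be illustrated with two toy losses. This is not the paper's formulation; it contrasts a generic pairwise hinge loss with a ListMLE-style listwise loss under a Plackett-Luce model, with all numbers invented:

```python
# Illustrative contrast: a pairwise hinge loss sees one pair at a time; a
# listwise (ListMLE-style) loss scores the entire gold ordering at once.
import numpy as np

def pairwise_hinge(scores, better, worse, margin=1.0):
    return max(0.0, margin - (scores[better] - scores[worse]))

def listmle(scores, gold_order):
    """-log P(gold permutation) under a Plackett-Luce model of the scores."""
    s = np.asarray(scores, dtype=float)[list(gold_order)]
    loss = 0.0
    for i in range(len(s)):
        loss += np.log(np.exp(s[i:]).sum()) - s[i]
    return loss

scores = [2.0, 0.5, 1.0]            # model scores for 3 hypotheses
print(pairwise_hinge(scores, 0, 2)) # compares just one pair
print(listmle(scores, [0, 2, 1]))   # penalizes the whole misordering
```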
We propose a
-listwise learning framework for structure prediction problems such as machine
-translation. Our framework directly models the entire translation list's
-ordering to learn parameters which may better fit the given listwise samples.
-Furthermore, we propose top-rank enhanced loss functions, which are more
-sensitive to ranking errors at higher positions. Experiments on a large-scale
-Chinese-English translation task show that both our listwise learning framework
-and top-rank enhanced listwise losses lead to significant improvements in
-translation quality.
-"
-5250,1707.05468,Elena Mikhalkova and Yuri Karyakin,Detecting Intentional Lexical Ambiguity in English Puns,cs.CL," The article describes a model of automatic analysis of puns, where a word is
-intentionally used in two meanings at the same time (the target word). We
-employ Roget's Thesaurus to discover two groups of words which, in a pun, form
-around two abstract bits of meaning (semes). They become a semantic vector,
-based on which an SVM classifier learns to recognize puns, reaching a score
-of 0.73 for F-measure. We apply several rule-based methods to locate intentionally
-ambiguous (target) words, based on structural and semantic criteria. It appears
-that the structural criterion is more effective, although it possibly
-characterizes only the tested dataset. The results we get correlate with the
-results of other teams at the SemEval-2017 competition (Task 7 Detection and
-Interpretation of English Puns) considering effects of using supervised
-learning models and word statistics.
-"
-5251,1707.05479,Elena Mikhalkova and Yuri Karyakin,"PunFields at SemEval-2017 Task 7: Employing Roget's Thesaurus in
-  Automatic Pun Recognition and Interpretation",cs.CL," The article describes a model of automatic interpretation of English puns,
-based on Roget's Thesaurus, and its implementation, PunFields. In a pun, the
-algorithm discovers two groups of words that belong to two main semantic
-fields. The fields become a semantic vector based on which an SVM classifier
-learns to recognize puns. A rule-based model is then applied for recognition of
-intentionally ambiguous (target) words and their definitions. In SemEval Task 7
-PunFields shows a considerably good result in pun classification, but requires
-improvement in searching for the target word and its definition.
-"
-5252,1707.05481,Elena Mikhalkova and Nadezhda Ganzherli and Yuri Karyakin,"A Comparative Analysis of Social Network Pages by Interests of Their
-  Followers",cs.CL cs.IR cs.SI," Being a matter of cognition, user interests should be amenable to classification
-independent of the language of users, social network and content of interest
-itself. To prove it, we analyze a collection of English and Russian Twitter and
-Vkontakte community pages by interests of their followers. First, we create a
-model of Major Interests (MaIs) with the help of expert analysis and then
-classify a set of pages using machine learning algorithms (SVM, Neural Network,
-Naive Bayes, and some others). We take three interest domains that are typical
-of both English and Russian-speaking communities: football, rock music,
-vegetarianism. The results of classification show a greater correlation between
-Russian-Vkontakte and Russian-Twitter pages while English-Twitter pages appear
-to provide the highest score. 
-"
-5253,1707.05501,"Parag Jain, Priyanka Agrawal, Abhijit Mishra, Mohak Sukhwani, Anirban
-  Laha, Karthik Sankaranarayanan",Story Generation from Sequence of Independent Short Descriptions,cs.CL," Existing Natural Language Generation (NLG) systems are weak AI systems and
-exhibit limited capabilities when language generation tasks demand higher
-levels of creativity, originality and brevity. Effective solutions, or at least
-evaluations, of modern NLG paradigms for such creative tasks have been elusive,
-unfortunately. This paper introduces and addresses the task of coherent story
-generation from independent descriptions, describing a scene or an event.
-Towards this, we explore two popular text-generation paradigms -- (1)
-Statistical Machine Translation (SMT), posing story generation as a translation
-problem and (2) Deep Learning, posing story generation as a
-sequence-to-sequence learning problem. In SMT, we chose two popular
-methods, phrase-based SMT (PB-SMT) and syntax-based SMT (SYNTAX-SMT), to
-`translate' the incoherent input text into stories. We then implement a deep
-recurrent neural network (RNN) architecture that encodes sequences of
-variable-length input descriptions to corresponding latent representations and
-decodes them to produce well-formed, comprehensive, story-like summaries. The
-efficacy of the suggested approaches is demonstrated on a publicly available
-dataset with the help of popular machine translation and summarization
-evaluation metrics.
-"
-5254,1707.05589,"G\'abor Melis, Chris Dyer, Phil Blunsom",On the State of the Art of Evaluation in Neural Language Models,cs.CL," Ongoing innovations in recurrent neural network architectures have provided a
-steady influx of apparently state-of-the-art results on language modelling
-benchmarks. However, these have been evaluated using differing code bases and
-limited computational resources, which represent uncontrolled sources of
-experimental variation. We reevaluate several popular architectures and
-regularisation methods with large-scale automatic black-box hyperparameter
-tuning and arrive at the somewhat surprising conclusion that standard LSTM
-architectures, when properly regularised, outperform more recent models. We
-establish a new state of the art on the Penn Treebank and Wikitext-2 corpora,
-as well as strong baselines on the Hutter Prize dataset.
-"
-5255,1707.05612,"Fartash Faghri, David J. Fleet, Jamie Ryan Kiros and Sanja Fidler",VSE++: Improving Visual-Semantic Embeddings with Hard Negatives,cs.LG cs.CL cs.CV," We present a new technique for learning visual-semantic embeddings for
-cross-modal retrieval. Inspired by hard negative mining, the use of hard
-negatives in structured prediction, and ranking loss functions, we introduce a
-simple change to common loss functions used for multi-modal embeddings. That,
-combined with fine-tuning and use of augmented data, yields significant gains
-in retrieval performance. We showcase our approach, VSE++, on MS-COCO and
-Flickr30K datasets, using ablation studies and comparisons with existing
-methods. On MS-COCO our approach outperforms state-of-the-art methods by 8.8%
-in caption retrieval and 11.3% in image retrieval (at R@1).
-"
-5256,1707.05635,"Ruqing Zhang, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng",Spherical Paragraph Model,cs.CL," Representing texts as fixed-length vectors is central to many language
-processing tasks. 
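The "simple change" in the VSE++ entry above is usually described as replacing the sum over negatives in a triplet ranking loss with the hardest in-batch negative. The sketch below follows that reading; shapes, margin, and names are illustrative assumptions rather than the authors' code:

```python
# Sketch of a max-violation (hardest in-batch negative) triplet loss in the
# spirit of VSE++; illustrative assumptions throughout.
import numpy as np

def hard_negative_loss(img, txt, margin=0.2):
    """img, txt: L2-normalized (n, d) embeddings; row i of each is a true pair."""
    sims = img @ txt.T                          # cosine similarities
    pos = np.diag(sims)
    mask = np.eye(sims.shape[0], dtype=bool)
    # hardest negative caption per image, and hardest image per caption
    hard_txt = np.where(mask, -np.inf, sims).max(axis=1)
    hard_img = np.where(mask, -np.inf, sims).max(axis=0)
    loss_i = np.maximum(0.0, margin + hard_txt - pos)
    loss_t = np.maximum(0.0, margin + hard_img - pos)
    return (loss_i + loss_t).mean()

def l2norm(x): return x / np.linalg.norm(x, axis=1, keepdims=True)
rng = np.random.default_rng(3)
print(hard_negative_loss(l2norm(rng.normal(size=(4, 8))),
                         l2norm(rng.normal(size=(4, 8)))))
```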
Most traditional methods build text representations based on
-the simple Bag-of-Words (BoW) representation, which loses the rich semantic
-relations between words. Recent advances in natural language processing have
-shown that semantically meaningful representations of words can be efficiently
-acquired by distributed models, making it possible to build text
-representations based on a better foundation called the Bag-of-Word-Embedding
-(BoWE) representation. However, existing text representation methods using BoWE
-often lack sound probabilistic foundations or cannot adequately capture the semantic
-relatedness encoded in word vectors. To address these problems, we introduce
-the Spherical Paragraph Model (SPM), a probabilistic generative model based on
-BoWE, for text representation. SPM has good probabilistic interpretability and
-can fully leverage the rich semantics of words, the word co-occurrence
-information as well as the corpus-wide information to help the representation
-learning of texts. Experimental results on topical classification and sentiment
-analysis demonstrate that SPM can achieve new state-of-the-art performances on
-several benchmark datasets.
-"
-5257,1707.05720,"Mohit Shridhar, David Hsu","Grounding Spatio-Semantic Referring Expressions for Human-Robot
-  Interaction",cs.RO cs.AI cs.CL," The human language is one of the most natural interfaces for humans to
-interact with robots. This paper presents a robot system that retrieves
-everyday objects with unconstrained natural language descriptions. A core issue
-for the system is semantic and spatial grounding, which is to infer objects and
-their spatial relationships from images and natural language expressions. We
-introduce a two-stage neural-network grounding pipeline that maps natural
-language referring expressions directly to objects in the images. The first
-stage uses visual descriptions in the referring expressions to generate a
-candidate set of relevant objects. The second stage examines all pairwise
-relationships between the candidates and predicts the most likely referred
-object according to the spatial descriptions in the referring expressions. A
-key feature of our system is that by leveraging a large dataset of images
-labeled with text descriptions, it allows unrestricted object types and natural
-language referring expressions. Preliminary results indicate that our system
-outperforms a near state-of-the-art object comprehension system on standard
-benchmark datasets. We also present a robot system that follows voice commands
-to pick and place previously unseen objects.
-"
-5258,1707.05850,Elham Shahab,A Short Survey of Biomedical Relation Extraction Techniques,cs.CL," Biomedical information is growing rapidly in recent years and retrieving
-useful data through information extraction systems is getting more attention. In
-the current research, we focus on different aspects of relation extraction
-techniques in the biomedical domain and briefly describe the state-of-the-art for
-relation extraction between a variety of biological elements.
-"
-5259,1707.05853,Glorianna Jagfeld and Ngoc Thang Vu,"Encoding Word Confusion Networks with Recurrent Neural Networks for
-  Dialog State Tracking",cs.CL," This paper presents our novel method to encode word confusion networks, which
-can represent a rich hypothesis space of automatic speech recognition systems,
-via recurrent neural networks. 
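A common way to feed a word confusion network to an RNN, and one plausible reading of the encoding the entry above describes, is to collapse each slot into the posterior-weighted average of its word embeddings. The sketch below illustrates that general idea only; data, embedding table, and weighting are assumptions:

```python
# Hedged sketch: represent each confusion-network slot as the
# posterior-weighted average of its arc word embeddings.
import numpy as np

def encode_confusion_network(slots, emb):
    """slots: list of [(word, posterior), ...]; emb: dict word -> vector."""
    vecs = []
    for arcs in slots:
        total = sum(p for _, p in arcs)
        v = sum(p / total * emb[w] for w, p in arcs)
        vecs.append(v)
    return np.stack(vecs)      # (n_slots, d): one input vector per RNN step

emb = {w: np.random.default_rng(len(w)).normal(size=4)
       for w in ["flights", "flight", "lights", "to", "two"]}
cn = [[("flights", 0.6), ("flight", 0.3), ("lights", 0.1)],
      [("to", 0.7), ("two", 0.3)]]
print(encode_confusion_network(cn, emb).shape)   # (2, 4)
```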
We demonstrate the utility of our approach for -the task of dialog state tracking in spoken dialog systems that relies on -automatic speech recognition output. Encoding confusion networks outperforms -encoding the best hypothesis of the automatic speech recognition in a neural -system for dialog state tracking on the well-known second Dialog State Tracking -Challenge dataset. -" -5260,1707.05928,"Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, Animashree - Anandkumar",Deep Active Learning for Named Entity Recognition,cs.CL," Deep learning has yielded state-of-the-art performance on many natural -language processing tasks including named entity recognition (NER). However, -this typically requires large amounts of labeled data. In this work, we -demonstrate that the amount of labeled training data can be drastically reduced -when deep learning is combined with active learning. While active learning is -sample-efficient, it can be computationally expensive since it requires -iterative retraining. To speed this up, we introduce a lightweight architecture -for NER, viz., the CNN-CNN-LSTM model consisting of convolutional character and -word encoders and a long short term memory (LSTM) tag decoder. The model -achieves nearly state-of-the-art performance on standard datasets for the task -while being computationally much more efficient than best performing models. We -carry out incremental active learning, during the training process, and are -able to nearly match state-of-the-art performance with just 25\% of the -original training data. -" -5261,1707.05967,"Enrico Santus, Emmanuele Chersoni, Alessandro Lenci, Philippe Blache",Measuring Thematic Fit with Distributional Feature Overlap,cs.CL," In this paper, we introduce a new distributional method for modeling -predicate-argument thematic fit judgments. We use a syntax-based DSM to build a -prototypical representation of verb-specific roles: for every verb, we extract -the most salient second order contexts for each of its roles (i.e. the most -salient dimensions of typical role fillers), and then we compute thematic fit -as a weighted overlap between the top features of candidate fillers and role -prototypes. Our experiments show that our method consistently outperforms a -baseline re-implementing a state-of-the-art system, and achieves better or -comparable results to those reported in the literature for the other -unsupervised systems. Moreover, it provides an explicit representation of the -features characterizing verb-specific semantic roles. -" -5262,1707.06002,"Ivan Habernal, Raffael Hannemann, Christian Pollak, Christopher Klamm, - Patrick Pauli and Iryna Gurevych",Argotario: Computational Argumentation Meets Serious Games,cs.CL," An important skill in critical thinking and argumentation is the ability to -spot and recognize fallacies. Fallacious arguments, omnipresent in -argumentative discourse, can be deceptive, manipulative, or simply leading to -`wrong moves' in a discussion. Despite their importance, argumentation scholars -and NLP researchers with focus on argumentation quality have not yet -investigated fallacies empirically. The nonexistence of resources dealing with -fallacious argumentation calls for scalable approaches to data acquisition and -annotation, for which the serious games methodology offers an appealing, yet -unexplored, alternative. We present Argotario, a serious game that deals with -fallacies in everyday argumentation. 
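The weighted feature overlap that the thematic-fit entry above computes between candidate fillers and role prototypes can be sketched with a standard Lin-style overlap measure. This is our simplified reading, not the authors' exact scoring function, and the toy features are invented:

```python
# Sketch of thematic fit as weighted overlap between a candidate filler's
# top context features and a verb-specific role prototype (assumed form).
def weighted_overlap(filler_feats, prototype_feats):
    """Both args: dict feature -> salience weight (top-k contexts only)."""
    shared = set(filler_feats) & set(prototype_feats)
    num = sum(filler_feats[f] + prototype_feats[f] for f in shared)
    den = sum(filler_feats.values()) + sum(prototype_feats.values())
    return num / den if den else 0.0

proto = {"eat": 3.0, "slice": 2.0, "ripe": 1.5}   # prototypical patient contexts
print(weighted_overlap({"eat": 2.5, "peel": 1.0, "ripe": 0.5}, proto))  # ~0.714
```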
Argotario is a multilingual, open-source, -platform-independent application with strong educational aspects, accessible at -www.argotario.net. -" -5263,1707.06012,"Ale\v{s} Tamchyna, Marion Weller-Di Marco, Alexander Fraser",Modeling Target-Side Inflection in Neural Machine Translation,cs.CL," NMT systems have problems with large vocabulary sizes. Byte-pair encoding -(BPE) is a popular approach to solving this problem, but while BPE allows the -system to generate any target-side word, it does not enable effective -generalization over the rich vocabulary in morphologically rich languages with -strong inflectional phenomena. We introduce a simple approach to overcome this -problem by training a system to produce the lemma of a word and its -morphologically rich POS tag, which is then followed by a deterministic -generation step. We apply this strategy for English-Czech and English-German -translation scenarios, obtaining improvements in both settings. We furthermore -show that the improvement is not due to only adding explicit morphological -information. -" -5264,1707.06065,"Taesup Kim, Inchul Song, Yoshua Bengio","Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in - Speech Recognition",cs.CL cs.LG," Layer normalization is a recently introduced technique for normalizing the -activities of neurons in deep neural networks to improve the training speed and -stability. In this paper, we introduce a new layer normalization technique -called Dynamic Layer Normalization (DLN) for adaptive neural acoustic modeling -in speech recognition. By dynamically generating the scaling and shifting -parameters in layer normalization, DLN adapts neural acoustic models to the -acoustic variability arising from various factors such as speakers, channel -noises, and environments. Unlike other adaptive acoustic models, our proposed -approach does not require additional adaptation data or speaker information -such as i-vectors. Moreover, the model size is fixed as it dynamically -generates adaptation parameters. We apply our proposed DLN to deep -bidirectional LSTM acoustic models and evaluate them on two benchmark datasets -for large vocabulary ASR experiments: WSJ and TED-LIUM release 2. The -experimental results show that our DLN improves neural acoustic models in terms -of transcription accuracy by dynamically adapting to various speakers and -environments. -" -5265,1707.06100,"Franziska Horn, Leila Arras, Gr\'egoire Montavon, Klaus-Robert - M\""uller, and Wojciech Samek",Discovering topics in text datasets by visualizing relevant words,cs.CL," When dealing with large collections of documents, it is imperative to quickly -get an overview of the texts' contents. In this paper we show how this can be -achieved by using a clustering algorithm to identify topics in the dataset and -then selecting and visualizing relevant words, which distinguish a group of -documents from the rest of the texts, to summarize the contents of the -documents belonging to each topic. We demonstrate our approach by discovering -trending topics in a collection of New York Times article snippets. -" -5266,1707.06130,"Fr\'ederic Godin, Joni Dambre and Wesley De Neve","Improving Language Modeling using Densely Connected Recurrent Neural - Networks",cs.CL," In this paper, we introduce the novel concept of densely connected layers -into recurrent neural networks. We evaluate our proposed architecture on the -Penn Treebank language modeling task. 
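The densely connected recurrent layers in the language-modeling entry above admit a compact sketch: every layer, and the output layer, sees the concatenation of the word embedding and all lower layers' outputs. The PyTorch code below is our own simplified rendering under that assumption, not the authors' implementation:

```python
# Sketch of a densely connected stacked LSTM language model (sizes are toy).
import torch
import torch.nn as nn

class DenseRNNLM(nn.Module):
    def __init__(self, vocab, d_emb=32, d_hid=32, n_layers=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_emb)
        self.layers = nn.ModuleList(
            [nn.LSTM(d_emb + i * d_hid, d_hid, batch_first=True)
             for i in range(n_layers)])
        # the softmax layer also sees all layer outputs, not just the top one
        self.out = nn.Linear(d_emb + n_layers * d_hid, vocab)

    def forward(self, tokens):
        h = self.emb(tokens)                    # (batch, time, d_emb)
        for lstm in self.layers:
            out, _ = lstm(h)
            h = torch.cat([h, out], dim=-1)     # dense (skip) connections
        return self.out(h)

model = DenseRNNLM(vocab=100)
logits = model(torch.randint(0, 100, (2, 5)))
print(logits.shape)                             # torch.Size([2, 5, 100])
```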
We show that we can obtain similar -perplexity scores with six times fewer parameters compared to a standard -stacked 2-layer LSTM model trained with dropout (Zaremba et al. 2014). In -contrast with the current usage of skip connections, we show that densely -connecting only a few stacked layers with skip connections already yields -significant perplexity reductions. -" -5267,1707.06151,"Aditya Joshi, Samarth Agrawal, Pushpak Bhattacharyya, Mark Carman","Expect the unexpected: Harnessing Sentence Completion for Sarcasm - Detection",cs.CL," The trigram `I love being' is expected to be followed by positive words such -as `happy'. In a sarcastic sentence, however, the word `ignored' may be -observed. The expected and the observed words are, thus, incongruous. We model -sarcasm detection as the task of detecting incongruity between an observed and -an expected word. In order to obtain the expected word, we use Context2Vec, a -sentence completion library based on Bidirectional LSTM. However, since the -exact word where such an incongruity occurs may not be known in advance, we -present two approaches: an All-words approach (which consults sentence -completion for every content word) and an Incongruous words-only approach -(which consults sentence completion for the 50% most incongruous content -words). The approaches outperform reported values for tweets but not for -discussion forum posts. This is likely to be because of redundant consultation -of sentence completion for discussion forum posts. Therefore, we consider an -oracle case where the exact incongruous word is manually labeled in a corpus -reported in past work. In this case, the performance is higher than the -all-words approach. This sets up the promise for using sentence completion for -sarcasm detection. -" -5268,1707.06163,"Georgi Dzhambazov, Andre Holzapfel, Ajay Srinivasamurthy, Xavier Serra",Metrical-accent Aware Vocal Onset Detection in Polyphonic Audio,cs.SD cs.CL cs.MM," The goal of this study is the automatic detection of onsets of the singing -voice in polyphonic audio recordings. Starting with a hypothesis that the -knowledge of the current position in a metrical cycle (i.e. metrical accent) -can improve the accuracy of vocal note onset detection, we propose a novel -probabilistic model to jointly track beats and vocal note onsets. The proposed -model extends a state of the art model for beat and meter tracking, in which -a-priori probability of a note at a specific metrical accent interacts with the -probability of observing a vocal note onset. We carry out an evaluation on a -varied collection of multi-instrument datasets from two music traditions -(English popular music and Turkish makam) with different types of metrical -cycles and singing styles. Results confirm that the proposed model reasonably -improves vocal note onset detection accuracy compared to a baseline model that -does not take metrical position into account. -" -5269,1707.06167,Eleftherios Avramidis,"Sentence-level quality estimation by predicting HTER as a - multi-component metric",cs.CL," This submission investigates alternative machine learning models for -predicting the HTER score on the sentence level. Instead of directly predicting -the HTER score, we suggest a model that jointly predicts the amount of the 4 -distinct post-editing operations, which are then used to calculate the HTER -score. This also gives the possibility to correct invalid (e.g. negative) -predicted values prior to the calculation of the HTER score. 
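The two-step scheme just described, predicting the four post-editing operation counts and only then deriving HTER, is easy to make concrete. The sketch below assumes HTER is the total number of edit operations divided by the reference length, and shows the negative-value correction the abstract mentions:

```python
# Sketch: derive HTER from four predicted edit-operation counts, clipping
# invalid negative predictions to zero first.
def hter_from_operations(ins, dele, sub, shift, ref_len):
    """HTER = total post-editing operations / reference length."""
    ops = [max(0.0, x) for x in (ins, dele, sub, shift)]  # fix negative preds
    return sum(ops) / max(ref_len, 1)

# e.g. a regressor predicted a slightly negative insertion count
print(hter_from_operations(-0.2, 1.1, 2.4, 0.5, ref_len=20))  # 0.2
```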
Without any
-feature exploration, a multi-layer perceptron with 4 outputs yields small but
-significant improvements over the baseline.
-"
-5270,1707.06195,"Yuri Khokhlov, Natalia Tomashenko, Ivan Medennikov, Alexei Romanenko",Fast and Accurate OOV Decoder on High-Level Features,cs.CL," This work proposes a novel approach to the out-of-vocabulary (OOV) keyword search
-(KWS) task. The proposed approach is based on using high-level features from an
-automatic speech recognition (ASR) system, so-called phoneme posterior based
-(PPB) features, for decoding. These features are obtained by calculating
-time-dependent phoneme posterior probabilities from word lattices, followed by
-their smoothing. For the PPB features we developed a novel, very fast,
-simple and efficient OOV decoder. Experimental results are presented on the
-Georgian language from the IARPA Babel Program, which was the test language in
-the OpenKWS 2016 evaluation campaign. The results show that in terms of the maximum
-term weighted value (MTWV) metric and computational speed, for single ASR
-systems, the proposed approach significantly outperforms the state-of-the-art
-approach based on using in-vocabulary proxies for OOV keywords in the indexed
-database. The comparison of the two OOV KWS approaches on the fusion results of
-the nine different ASR systems demonstrates that the proposed OOV decoder
-outperforms the proxy-based approach in terms of the MTWV metric given
-comparable processing speed. Other important advantages of the OOV decoder
-include extremely low memory consumption and simplicity of its implementation
-and parameter optimization.
-"
-5271,1707.06209,"Johannes Welbl, Nelson F. Liu, Matt Gardner",Crowdsourcing Multiple Choice Science Questions,cs.HC cs.AI cs.CL stat.ML," We present a novel method for obtaining high-quality, domain-targeted
-multiple choice questions from crowd workers. Generating these questions can be
-difficult without trading away originality, relevance or diversity in the
-answer options. Our method addresses these problems by leveraging a large
-corpus of domain-specific text and a small set of existing questions. It
-produces model suggestions for document selection and answer distractor choice
-which aid the human question generation process. With this method we have
-assembled SciQ, a dataset of 13.7K multiple choice science exam questions
-(Dataset available at http://allenai.org/data.html). We demonstrate that the
-method produces in-domain questions by providing an analysis of this new
-dataset and by showing that humans cannot distinguish the crowdsourced
-questions from original questions. When using SciQ as additional training data
-to existing questions, we observe accuracy improvements on real science exams.
-"
-5272,1707.06226,"Debanjan Ghosh, Alexander Richard Fabbri, Smaranda Muresan","The Role of Conversation Context for Sarcasm Detection in Online
-  Interactions",cs.CL cs.AI cs.LG," Computational models for sarcasm detection have often relied on the content
-of utterances in isolation. However, a speaker's sarcastic intent is not always
-obvious without additional context. Focusing on social media discussions, we
-investigate two issues: (1) does modeling of conversation context help in
-sarcasm detection and (2) can we understand what part of conversation context
-triggered the sarcastic reply. To address the first issue, we investigate
-several types of Long Short-Term Memory (LSTM) networks that can model both the
-conversation context and the sarcastic response. 
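One of the architectures the sarcasm entry above refers to, the conditional LSTM, can be sketched as encoding the conversation context with one LSTM and initializing the response LSTM with its final state. The PyTorch code below is an illustration of that general idea under our own toy sizes, not the paper's model:

```python
# Sketch of conditional encoding: the response LSTM starts from the final
# state of the context LSTM (toy sizes, illustrative only).
import torch
import torch.nn as nn

class ConditionalLSTM(nn.Module):
    def __init__(self, vocab, d=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.ctx_lstm = nn.LSTM(d, d, batch_first=True)
        self.rsp_lstm = nn.LSTM(d, d, batch_first=True)
        self.clf = nn.Linear(d, 2)              # sarcastic vs. not

    def forward(self, context, response):
        _, state = self.ctx_lstm(self.emb(context))
        out, _ = self.rsp_lstm(self.emb(response), state)  # conditioned on context
        return self.clf(out[:, -1])             # logits from last response step

model = ConditionalLSTM(vocab=50)
logits = model(torch.randint(0, 50, (2, 7)), torch.randint(0, 50, (2, 5)))
print(logits.shape)                              # torch.Size([2, 2])
```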
We show that the conditional -LSTM network (Rocktaschel et al., 2015) and LSTM networks with sentence level -attention on context and response outperform the LSTM model that reads only the -response. To address the second issue, we present a qualitative analysis of -attention weights produced by the LSTM models with attention and discuss the -results compared with human performance on the task. -" -5273,1707.06265,"Wei-Ning Hsu, Yu Zhang, James Glass","Unsupervised Domain Adaptation for Robust Speech Recognition via - Variational Autoencoder-Based Data Augmentation",cs.CL cs.LG," Domain mismatch between training and testing can lead to significant -degradation in performance in many machine learning scenarios. Unfortunately, -this is not a rare situation for automatic speech recognition deployments in -real-world applications. Research on robust speech recognition can be regarded -as trying to overcome this domain mismatch issue. In this paper, we address the -unsupervised domain adaptation problem for robust speech recognition, where -both source and target domain speech are presented, but word transcripts are -only available for the source domain speech. We present novel -augmentation-based methods that transform speech in a way that does not change -the transcripts. Specifically, we first train a variational autoencoder on both -source and target domain data (without supervision) to learn a latent -representation of speech. We then transform nuisance attributes of speech that -are irrelevant to recognition by modifying the latent representations, in order -to augment labeled training data with additional data whose distribution is -more similar to the target domain. The proposed method is evaluated on the -CHiME-4 dataset and reduces the absolute word error rate (WER) by as much as -35% compared to the non-adapted baseline. -" -5274,1707.06299,"Stefan Ultes, Pawe{\l} Budzianowski, I\~nigo Casanueva, Nikola - Mrk\v{s}i\'c, Lina Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica - Ga\v{s}i\'c and Steve Young","Reward-Balancing for Statistical Spoken Dialogue Systems using - Multi-objective Reinforcement Learning",cs.CL stat.ML," Reinforcement learning is widely used for dialogue policy optimization where -the reward function often consists of more than one component, e.g., the -dialogue success and the dialogue length. In this work, we propose a structured -method for finding a good balance between these components by searching for the -optimal reward component weighting. To render this search feasible, we use -multi-objective reinforcement learning to significantly reduce the number of -training dialogues required. We apply our proposed method to find optimized -component weights for six domains and compare them to a default baseline. -" -5275,1707.06320,"Douwe Kiela, Alexis Conneau, Allan Jabri and Maximilian Nickel",Learning Visually Grounded Sentence Representations,cs.CL cs.CV," We introduce a variety of models, trained on a supervised image captioning -corpus to predict the image features for a given caption, to perform sentence -representation grounding. We train a grounded sentence encoder that achieves -good performance on COCO caption and image retrieval and subsequently show that -this encoder can successfully be transferred to various NLP tasks, with -improved performance over text-only models. Lastly, we analyze the contribution -of grounding, and show that word embeddings learned by this system outperform -non-grounded ones. 
-"
-5276,1707.06341,Karl Stratos,A Sub-Character Architecture for Korean Language Processing,cs.CL," We introduce a novel sub-character architecture that exploits a unique
-compositional structure of the Korean language. Our method decomposes each
-character into a small set of primitive phonetic units called jamo letters from
-which character- and word-level representations are induced. The jamo letters
-divulge syntactic and semantic information that is difficult to access with
-conventional character-level units. They greatly alleviate the data sparsity
-problem, reducing the observation space to 1.6% of the original while
-increasing accuracy in our experiments. We apply our architecture to dependency
-parsing and achieve dramatic improvement over strong lexical baselines.
-"
-5277,1707.06355,"Yunan Ye, Zhou Zhao, Yimeng Li, Long Chen, Jun Xiao and Yueting Zhuang","Video Question Answering via Attribute-Augmented Attention Network
-  Learning",cs.CV cs.AI cs.CL," Video Question Answering is a challenging problem in visual information
-retrieval, which provides the answer to the referenced video content according
-to the question. However, the existing visual question answering approaches
-mainly tackle the problem of static image questions, which may be ineffective
-for video question answering due to insufficient modeling of the temporal
-dynamics of video content. In this paper, we study the problem of video
-question answering by modeling its temporal dynamics with a frame-level attention
-mechanism. We propose the attribute-augmented attention network learning
-framework that enables the joint frame-level attribute detection and unified
-video representation learning for video question answering. We then incorporate
-the multi-step reasoning process for our proposed attention network to further
-improve the performance. We construct a large-scale video question answering
-dataset. We conduct the experiments on both multiple-choice and open-ended
-video question answering tasks to show the effectiveness of the proposed
-method.
-"
-5278,1707.06357,Majid Laali and Leila Kosseim,"Improving Discourse Relation Projection to Build Discourse Annotated
-  Corpora",cs.CL," The naive approach to annotation projection is not effective for projecting
-discourse annotations from one language to another because implicit discourse
-relations are often changed to explicit ones and vice-versa in the translation.
-In this paper, we propose a novel approach based on the intersection between
-statistical word-alignment models to identify unsupported discourse
-annotations. This approach identified 65% of the unsupported annotations in the
-English-French parallel sentences from Europarl. By filtering out these
-unsupported annotations, we induced the first PDTB-style discourse annotated
-corpus for French from Europarl. We then used this corpus to train a classifier
-to identify the discourse-usage of French discourse connectives and show a 15%
-improvement of F1-score compared to the classifier trained on the non-filtered
-annotations.
-"
-5279,1707.06378,"Todor Mihaylov, Daniel Belchev, Yasen Kiprov, Ivan Koychev, Preslav
-  Nakov",Large-Scale Goodness Polarity Lexicons for Community Question Answering,cs.CL," We transfer a key idea from the field of sentiment analysis to a new domain:
-community question answering (cQA). 
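The jamo decomposition that the Korean sub-character entry above relies on follows directly from standard Unicode Hangul arithmetic (each precomposed syllable encodes an initial consonant, a medial vowel, and an optional final consonant). The following self-contained snippet demonstrates it:

```python
# Decompose precomposed Hangul syllables into jamo via Unicode arithmetic.
CHO = [chr(0x1100 + i) for i in range(19)]           # initial consonants
JUNG = [chr(0x1161 + i) for i in range(21)]          # medial vowels
JONG = [""] + [chr(0x11A8 + i) for i in range(27)]   # optional final consonants

def to_jamo(syllable):
    code = ord(syllable) - 0xAC00
    assert 0 <= code < 11172, "not a precomposed Hangul syllable"
    cho, rest = divmod(code, 21 * 28)
    jung, jong = divmod(rest, 28)
    return CHO[cho], JUNG[jung], JONG[jong]

for ch in "한국":
    print(ch, "->", to_jamo(ch))
# 한 -> ('ᄒ', 'ᅡ', 'ᆫ'), 국 -> ('ᄀ', 'ᅮ', 'ᆨ')
```

Since 11,172 possible syllables reduce to 19 + 21 + 28 jamo symbols, the drastic shrinkage of the observation space reported in the abstract is exactly what one would expect.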
The cQA task we are interested in is the
-following: given a question and a thread of comments, we want to re-rank the
-comments so that the ones that are good answers to the question would be ranked
-higher than the bad ones. We notice that good vs. bad comments use specific
-vocabulary and that one can often predict the goodness/badness of a comment
-even ignoring the question, based on the comment contents only. This leads us
-to the idea of building a good/bad polarity lexicon as an analogy to the
-positive/negative sentiment polarity lexicons, commonly used in sentiment
-analysis. In particular, we use pointwise mutual information in order to build
-large-scale goodness polarity lexicons in a semi-supervised manner starting
-with a small number of initial seeds. The evaluation results show an
-improvement of 0.7 MAP points absolute over a very strong baseline and
-state-of-the-art performance on SemEval-2016 Task 3.
-"
-5280,1707.06456,"Benjamin Heinzerling, Nafise Sadat Moosavi and Michael Strube",Revisiting Selectional Preferences for Coreference Resolution,cs.CL," Selectional preferences have long been claimed to be essential for
-coreference resolution. However, they are mainly modeled only implicitly by
-current coreference resolvers. We propose a dependency-based embedding model of
-selectional preferences which allows fine-grained compatibility judgments with
-high coverage. We show that the incorporation of our model improves coreference
-resolution performance on the CoNLL dataset, matching the state-of-the-art
-results of a more complex system. However, it comes with a cost that makes it
-debatable how worthwhile such improvements are.
-"
-5281,1707.06480,"Zhenisbek Assylbekov, Rustem Takhanov, Bagdat Myrzakhmetov and
-  Jonathan N. Washington","Syllable-aware Neural Language Models: A Failure to Beat Character-aware
-  Ones",cs.CL cs.NE stat.ML," Syllabification does not seem to improve word-level RNN language modeling
-quality when compared to character-based segmentation. However, our best
-syllable-aware language model, achieving performance comparable to the
-competitive character-aware model, has 18%-33% fewer parameters and is trained
-1.2-2.2 times faster.
-"
-5282,1707.06519,"Chia-Hao Shen, Janet Y. Sung, Hung-Yi Lee","Language Transfer of Audio Word2Vec: Learning Audio Segment
-  Representations without Target Language Data",cs.CL cs.LG," Audio Word2Vec offers vector representations of fixed dimensionality for
-variable-length audio segments using Sequence-to-sequence Autoencoder (SA).
-These vector representations are shown to describe the sequential phonetic
-structures of the audio segments to a good degree, with real-world applications
-such as query-by-example Spoken Term Detection (STD). This paper examines the
-capability of language transfer of Audio Word2Vec. We train SA from one
-language (source language) and use it to extract the vector representation of
-the audio segments of another language (target language). We found that SA can
-still catch phonetic structure from the audio segments of the target language
-if the source and target languages are similar. In query-by-example STD, we
-obtain the vector representations from the SA learned from a large amount of
-source language data, and found that they surpass the representations from the naive
-encoder and the SA directly learned from a small amount of target language data.
-The result shows that it is possible to learn an Audio Word2Vec model from
-high-resource languages and use it on low-resource languages. 
This further
-expands the usability of Audio Word2Vec.
-"
-5283,1707.06527,"Yanmin Qian, Xuankai Chang and Dong Yu","Single-Channel Multi-talker Speech Recognition with Permutation
-  Invariant Training",cs.SD cs.CL cs.LG eess.AS," Although great progress has been made in automatic speech recognition
-(ASR), significant performance degradation is still observed when recognizing
-multi-talker mixed speech. In this paper, we propose and evaluate several
-architectures to address this problem under the assumption that only a single
-channel of mixed signal is available. Our technique extends permutation
-invariant training (PIT) by introducing the front-end feature separation module
-with the minimum mean square error (MSE) criterion and the back-end recognition
-module with the minimum cross entropy (CE) criterion. More specifically, during
-training we compute the average MSE or CE over the whole utterance for each
-possible utterance-level output-target assignment, pick the one with the
-minimum MSE or CE, and optimize for that assignment. This strategy elegantly
-solves the label permutation problem observed in the deep learning based
-multi-talker mixed speech separation and recognition systems. The proposed
-architectures are evaluated and compared on an artificially mixed AMI dataset
-with both two- and three-talker mixed speech. The experimental results indicate
-that our proposed architectures can cut the word error rate (WER) by 45.0% and
-25.0% relatively against the state-of-the-art single-talker speech recognition
-system across all speakers when their energies are comparable, for two- and
-three-talker mixed speech, respectively. To our knowledge, this is the first
-work on multi-talker mixed speech recognition on the challenging
-speaker-independent spontaneous large vocabulary continuous speech task.
-"
-5284,1707.06556,Aurelie Herbelot and Marco Baroni,High-risk learning: acquiring new word vectors from tiny data,cs.CL cs.LG," Distributional semantics models are known to struggle with small data. It is
-generally accepted that in order to learn 'a good vector' for a word, a model
-must have sufficient examples of its usage. This contradicts the fact that
-humans can guess the meaning of a word from a few occurrences only. In this
-paper, we show that a neural language model such as Word2Vec only necessitates
-minor modifications to its standard architecture to learn new terms from tiny
-data, using background knowledge from a previously learnt semantic space. We
-test our model on word definitions and on a nonce task involving 2-6 sentences'
-worth of context, showing a large increase in performance over state-of-the-art
-models on the definitional task.
-"
-5285,1707.06562,Steffen Schnitzer and Svenja Neitzel and Christoph Rensing,"From Task Classification Towards Similarity Measures for Recommendation
-  in Crowdsourcing Systems",cs.IR cs.CL," Task selection in micro-task markets can be supported by recommender systems
-to help individuals to find appropriate tasks. Previous work showed that for
-the selection process of a micro-task the semantic aspects, such as the
-required action and the comprehensibility, are rated more important than
-factual aspects, such as the payment or the required completion time. This work
-lays a foundation for creating such similarity measures. Therefore, we show that
-an automatic classification based on task descriptions is possible.
-Additionally, we propose similarity measures to cluster micro-tasks according
-to semantic aspects. 
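The utterance-level assignment step in the permutation invariant training (PIT) entry above is the part worth sketching: compute the loss for every output-to-target permutation over the whole utterance and keep the minimum. The snippet below illustrates that with a plain MSE criterion and invented toy data; it is not the paper's implementation:

```python
# Illustrative PIT loss: try every output-to-target assignment over the whole
# utterance and train on the cheapest one.
import numpy as np
from itertools import permutations

def pit_mse(outputs, targets):
    """outputs, targets: (n_speakers, time, feat) arrays."""
    best = np.inf
    for perm in permutations(range(len(targets))):
        mse = np.mean((outputs - targets[list(perm)]) ** 2)
        best = min(best, mse)      # utterance-level assignment, then min
    return best

rng = np.random.default_rng(4)
tgt = rng.normal(size=(2, 10, 5))
out = tgt[::-1] + 0.01 * rng.normal(size=(2, 10, 5))  # outputs swapped
print(round(pit_mse(out, tgt), 4))  # small: PIT recovers the right permutation
```

Taking the minimum over permutations is what removes the arbitrary labeling of speakers, which is exactly the label permutation problem the abstract describes.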
-"
-5286,1707.06588,"Yaniv Taigman, Lior Wolf, Adam Polyak, Eliya Nachmani",VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop,cs.LG cs.CL cs.SD," We present a new neural text-to-speech (TTS) method that is able to transform
-text to speech in voices that are sampled in the wild. Unlike other systems,
-our solution is able to deal with unconstrained voice samples without
-requiring aligned phonemes or linguistic features. The network architecture is
-simpler than those in the existing literature and is based on a novel shifting
-buffer working memory. The same buffer is used for estimating the attention,
-computing the output audio, and for updating the buffer itself. The input
-sentence is encoded using a context-free lookup table that contains one entry
-per character or phoneme. The speakers are similarly represented by a short
-vector that can also be fitted to new identities, even with only a few samples.
-Variability in the generated speech is achieved by priming the buffer prior to
-generating the audio. Experimental results on several datasets demonstrate
-convincing capabilities, making TTS accessible to a wider range of
-applications. In order to promote reproducibility, we release our source code
-and models.
-"
-5287,1707.06598,Navid Rekabsaz and Bhaskar Mitra and Mihai Lupu and Allan Hanbury,Toward Incorporation of Relevant Documents in word2vec,cs.IR cs.CL," Recent advances in neural word embedding provide significant benefit to
-various information retrieval tasks. However, as shown by recent studies,
-adapting the embedding models for the needs of IR tasks can bring considerable
-further improvements. The embedding models in general define the term
-relatedness by exploiting the terms' co-occurrences in short-window contexts.
-An alternative (and well-studied) approach in IR for related terms to a query
-is using local information, i.e., a set of top-retrieved documents. In view of
-these two methods of term relatedness, in this work, we report our study on
-incorporating the local information of the query in the word embeddings. One
-main challenge in this direction is that the dense vectors of word embeddings
-and their estimation of term-to-term relatedness remain difficult to interpret
-and hard to analyze. As an alternative, explicit word representations propose
-vectors whose dimensions are easily interpretable, and recent methods show
-competitive performance to the dense vectors. We introduce a neural-based
-explicit representation, rooted in the conceptual ideas of the word2vec
-Skip-Gram model. The method provides interpretable explicit vectors while
-keeping the effectiveness of the Skip-Gram model. The evaluation of various
-explicit representations on word association collections shows that the newly
-proposed method outperforms the state-of-the-art explicit representations
-when tasked with ranking highly similar terms. Based on the introduced
-explicit representation, we discuss our approaches to integrating local documents
-in globally-trained embedding models and discuss the preliminary results.
-"
-5288,1707.06690,Wenhan Xiong and Thien Hoang and William Yang Wang,DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning,cs.CL cs.AI," We study the problem of learning to reason in large-scale knowledge graphs
-(KGs). 
More specifically, we describe a novel reinforcement learning framework
-for learning multi-hop relational paths: we use a policy-based agent with
-continuous states based on knowledge graph embeddings, which reasons in a KG
-vector space by sampling the most promising relation to extend its path. In
-contrast to prior work, our approach includes a reward function that takes the
-accuracy, diversity, and efficiency into consideration. Experimentally, we show
-that our proposed method outperforms a path-ranking based algorithm and
-knowledge graph embedding methods on Freebase and Never-Ending Language
-Learning datasets.
-"
-5289,1707.06799,Nils Reimers and Iryna Gurevych,"Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling
-  Tasks",cs.CL,"  Selecting optimal parameters for a neural network architecture can often make
-the difference between mediocre and state-of-the-art performance. However,
-little is published on which parameters and design choices should be evaluated
-or selected, making correct hyperparameter optimization often a ""black art that
-requires expert experiences"" (Snoek et al., 2012). In this paper, we evaluate
-the importance of different network design choices and hyperparameters for five
-common linguistic sequence tagging tasks (POS, Chunking, NER, Entity
-Recognition, and Event Detection). We evaluated over 50,000 different setups
-and found that some parameters, like the pre-trained word embeddings or the
-last layer of the network, have a large impact on the performance, while other
-parameters, for example the number of LSTM layers or the number of recurrent
-units, are of minor importance. We give a recommendation on a configuration
-that performs well across different tasks.
-"
-5290,1707.06806,"Wojciech Stokowiec, Tomasz Trzcinski, Krzysztof Wolk, Krzysztof
-  Marasek, Przemyslaw Rokita","Shallow reading with Deep Learning: Predicting popularity of online
-  content using only its title",cs.CL,"  With the ever-decreasing attention span of contemporary Internet users, the
-title of online content (such as a news article or video) can be a major factor
-in determining its popularity. To take advantage of this phenomenon, we propose
-a new method based on a bidirectional Long Short-Term Memory (LSTM) neural
-network designed to predict the popularity of online content using only its
-title. We evaluate the proposed architecture on two distinct datasets of news
-articles and news videos distributed in social media that contain over 40,000
-samples in total. On those datasets, our approach improves the performance over
-traditional shallow approaches by a margin of 15%. Additionally, we show that
-using pre-trained word vectors in the embedding layer improves the results of
-LSTM models, especially when the training set is small. To our knowledge, this
-is the first attempt to apply popularity prediction using only textual
-information from the title.
-"
-5291,1707.06841,"Youmna Farag, Marek Rei, Ted Briscoe",An Error-Oriented Approach to Word Embedding Pre-Training,cs.CL cs.LG cs.NE,"  We propose a novel word embedding pre-training approach that exploits writing
-errors in learners' scripts. We compare our method to previous models that tune
-the embeddings based on script scores and the discrimination between correct
-and corrupt word contexts in addition to the generic commonly-used embeddings
-pre-trained on large corpora.
The comparison is achieved by using the
-aforementioned models to bootstrap a neural network that learns to predict a
-holistic score for scripts. Furthermore, we investigate augmenting our model
-with error corrections and monitor the impact on performance. Our results show
-that our error-oriented approach outperforms other comparable ones, which is
-further demonstrated when training on more data. Additionally, extending the
-model with corrections provides further performance gains when data sparsity is
-an issue.
-"
-5292,1707.06875,"Jekaterina Novikova, Ond\v{r}ej Du\v{s}ek, Amanda Cercas Curry and
-  Verena Rieser",Why We Need New Evaluation Metrics for NLG,cs.CL,"  The majority of NLG evaluation relies on automatic metrics, such as BLEU. In
-this paper, we motivate the need for novel, system- and data-independent
-automatic evaluation methods: We investigate a wide range of metrics, including
-state-of-the-art word-based and novel grammar-based ones, and demonstrate that
-they only weakly reflect human judgements of system outputs as generated by
-data-driven, end-to-end NLG. We also show that metric performance is data- and
-system-specific. Nevertheless, our results also suggest that automatic metrics
-perform reliably at the system level and can support system development by
-finding cases where a system performs poorly.
-"
-5293,1707.06878,"Alexander Panchenko, Fide Marten, Eugen Ruppert, Stefano Faralli,
-  Dmitry Ustalov, Simone Paolo Ponzetto, Chris Biemann","Unsupervised, Knowledge-Free, and Interpretable Word Sense
-  Disambiguation",cs.CL,"  Interpretability of a predictive model is a powerful feature that gains the
-trust of users in the correctness of the predictions. In word sense
-disambiguation (WSD), knowledge-based systems tend to be much more
-interpretable than knowledge-free counterparts as they rely on the wealth of
-manually-encoded elements representing word senses, such as hypernyms, usage
-examples, and images. We present a WSD system that bridges the gap between
-these two so far disconnected groups of methods. Namely, our system, providing
-access to several state-of-the-art WSD models, aims to be interpretable as a
-knowledge-based system while it remains completely unsupervised and
-knowledge-free. The presented tool features a Web interface for all-word
-disambiguation of texts that makes the sense predictions human readable by
-providing interpretable word sense inventories, sense representations, and
-disambiguation results. We provide a public API, enabling seamless integration.
-"
-5294,1707.06885,"Felix Stahlberg, Eva Hasler, Danielle Saunders and Bill Byrne","SGNMT -- A Flexible NMT Decoding Platform for Quick Prototyping of New
-  Models and Search Strategies",cs.CL,"  This paper introduces SGNMT, our experimental platform for machine
-translation research. SGNMT provides a generic interface to neural and symbolic
-scoring modules (predictors) with left-to-right semantics, such as translation
-models like NMT, language models, translation lattices, $n$-best lists or other
-kinds of scores and constraints. Predictors can be combined with other
-predictors to form complex decoding tasks. SGNMT implements a number of search
-strategies for traversing the space spanned by the predictors which are
-appropriate for different predictor constellations. Adding new predictors or
-decoding strategies is particularly easy, making it a very efficient tool for
-prototyping new research ideas.
SGNMT is actively being used by students in the
-MPhil program in Machine Learning, Speech and Language Technology at the
-University of Cambridge for course work and theses, as well as for most of the
-research work in our group.
-"
-5295,1707.06932,"Michela Fazzolari and Vittoria Cozza and Marinella Petrocchi and
-  Angelo Spognardi",A study on text-score disagreement in online reviews,cs.CL cs.IR cs.SI,"  In this paper, we focus on online reviews and employ artificial intelligence
-tools, taken from the cognitive computing field, to help understand the
-relationships between the textual part of the review and the assigned numerical
-score. We move from the intuitions that 1) a set of textual reviews expressing
-different sentiments may feature the same score (and vice-versa); and 2)
-detecting and analyzing the mismatches between the review content and the
-actual score may benefit both service providers and consumers, by highlighting
-specific factors of satisfaction (and dissatisfaction) in texts.
-  To prove the intuitions, we adopt sentiment analysis techniques and we
-concentrate on hotel reviews, to find polarity mismatches therein. In
-particular, we first train a text classifier with a set of annotated hotel
-reviews, taken from the Booking website. Then, we analyze a large dataset, with
-around 160k hotel reviews collected from Tripadvisor, with the aim of detecting
-a polarity mismatch, indicating if the textual content of the review is in
-line, or not, with the associated score.
-  Using well-established artificial intelligence techniques and analyzing in
-depth the reviews featuring a mismatch between the text polarity and the score,
-we find that -on a scale of five stars- those reviews ranked with middle scores
-include a mixture of positive and negative aspects.
-  The approach proposed here, besides acting as a polarity detector, provides an
-effective selection of reviews -on an initial very large dataset- that may
-allow both consumers and providers to focus directly on the review subset
-featuring a text/score disagreement, which conveniently conveys to the user a
-summary of positive and negative features of the review target.
-"
-5296,1707.06939,Xipei Liu and James P. Bagrow,"Autocompletion interfaces make crowd workers slower, but their use
-  promotes response diversity",cs.HC cs.CL cs.CY cs.IR,"  Creative tasks such as ideation or question proposal are powerful
-applications of crowdsourcing, yet the quantity of workers available for
-addressing practical problems is often insufficient. Enabling scalable
-crowdsourcing thus requires gaining all possible efficiency and information
-from available workers. One option for text-focused tasks is to allow assistive
-technology, such as an autocompletion user interface (AUI), to help workers
-input text responses. But support for the efficacy of AUIs is mixed. Here we
-designed and conducted a randomized experiment where workers were asked to
-provide short text responses to given questions. Our experimental goal was to
-determine if an AUI helps workers respond more quickly and with improved
-consistency by mitigating typos and misspellings. Surprisingly, we found that
-neither occurred: workers assigned to the AUI treatment were slower than those
-assigned to the non-AUI control and their responses were more diverse, not
-less, than those of the control. Both the lexical and semantic diversities of
-responses were higher, with the latter measured using word2vec.
A crowdsourcer -interested in worker speed may want to avoid using an AUI, but using an AUI to -boost response diversity may be valuable to crowdsourcers interested in -receiving as much novel information from workers as possible. -" -5297,1707.06945,"Ivan Vuli\'c, Nikola Mrk\v{s}i\'c, and Anna Korhonen","Cross-Lingual Induction and Transfer of Verb Classes Based on Word - Vector Space Specialisation",cs.CL," Existing approaches to automatic VerbNet-style verb classification are -heavily dependent on feature engineering and therefore limited to languages -with mature NLP pipelines. In this work, we propose a novel cross-lingual -transfer method for inducing VerbNets for multiple languages. To the best of -our knowledge, this is the first study which demonstrates how the architectures -for learning word embeddings can be applied to this challenging -syntactic-semantic task. Our method uses cross-lingual translation pairs to tie -each of the six target languages into a bilingual vector space with English, -jointly specialising the representations to encode the relational information -from English VerbNet. A standard clustering algorithm is then run on top of the -VerbNet-specialised representations, using vector dimensions as features for -learning verb classes. Our results show that the proposed cross-lingual -transfer approach sets new state-of-the-art verb classification performance -across all six target languages explored in this work. -" -5298,1707.06957,Karl Stratos,Reconstruction of Word Embeddings from Sub-Word Parameters,cs.CL," Pre-trained word embeddings improve the performance of a neural model at the -cost of increasing the model size. We propose to benefit from this resource -without paying the cost by operating strictly at the sub-lexical level. Our -approach is quite simple: before task-specific training, we first optimize -sub-word parameters to reconstruct pre-trained word embeddings using various -distance measures. We report interesting results on a variety of tasks: word -similarity, word analogy, and part-of-speech tagging. -" -5299,1707.06961,"Yuval Pinter, Robert Guthrie, Jacob Eisenstein",Mimicking Word Embeddings using Subword RNNs,cs.CL," Word embeddings improve generalization over lexical features by placing each -word in a lower-dimensional space, using distributional information obtained -from unlabeled data. However, the effectiveness of word embeddings for -downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which -embeddings do not exist. In this paper, we present MIMICK, an approach to -generating OOV word embeddings compositionally, by learning a function from -spellings to distributional embeddings. Unlike prior work, MIMICK does not -require re-training on the original word embedding corpus; instead, learning is -performed at the type level. Intrinsic and extrinsic evaluations demonstrate -the power of this simple approach. On 23 languages, MIMICK improves performance -over a word-based baseline for tagging part-of-speech and morphosyntactic -attributes. It is competitive with (and complementary to) a supervised -character-based model in low-resource settings. -" -5300,1707.06971,"Shashi Narayan and Claire Gardent and Shay B. Cohen and Anastasia - Shimorina",Split and Rephrase,cs.CL," We propose a new sentence simplification task (Split-and-Rephrase) where the -aim is to split a complex sentence into a meaning preserving sequence of -shorter sentences. 
Like sentence simplification, splitting-and-rephrasing has
-the potential to benefit both natural language processing and societal
-applications. Because shorter sentences are generally better processed by NLP
-systems, it could be used as a preprocessing step that facilitates and
-improves the performance of parsers, semantic role labellers and machine
-translation systems. It should also be of use for people with reading
-disabilities because it allows the conversion of longer sentences into shorter
-ones. This paper makes two contributions towards this new task. First, we
-create and make available a benchmark consisting of 1,066,115 tuples mapping a
-single complex sentence to a sequence of sentences expressing the same meaning.
-Second, we propose five models (vanilla sequence-to-sequence to
-semantically-motivated models) to understand the difficulty of the proposed
-task.
-"
-5301,1707.06996,"Umang Gupta, Ankush Chatterjee, Radhakrishnan Srikanth, Puneet Agrawal","A Sentiment-and-Semantics-Based Approach for Emotion Detection in
-  Textual Conversations",cs.CL,"  Emotions are physiological states generated in humans in reaction to internal
-or external events. They are complex and studied across numerous fields
-including computer science. As humans, on reading ""Why don't you ever text me!""
-we can either interpret it as a sad or angry emotion and the same ambiguity
-exists for machines. Lack of facial expressions and voice modulations makes
-detecting emotions from text a challenging problem. However, as humans
-increasingly communicate using text messaging applications, and digital agents
-gain popularity in our society, it is essential that these digital agents are
-emotion aware, and respond accordingly.
-  In this paper, we propose a novel approach to detect emotions like happy, sad
-or angry in textual conversations using an LSTM-based Deep Learning model. Our
-approach consists of semi-automated techniques to gather training data for our
-model. We exploit advantages of semantic and sentiment based embeddings and
-propose a solution combining both. Our work is evaluated on real-world
-conversations and significantly outperforms traditional Machine Learning
-baselines as well as other off-the-shelf Deep Learning models.
-"
-5302,1707.07045,"Kenton Lee, Luheng He, Mike Lewis, Luke Zettlemoyer",End-to-end Neural Coreference Resolution,cs.CL,"  We introduce the first end-to-end coreference resolution model and show that
-it significantly outperforms all previous work without using a syntactic parser
-or a hand-engineered mention detector. The key idea is to directly consider all
-spans in a document as potential mentions and learn distributions over possible
-antecedents for each. The model computes span embeddings that combine
-context-dependent boundary representations with a head-finding attention
-mechanism. It is trained to maximize the marginal likelihood of gold antecedent
-spans from coreference clusters and is factored to enable aggressive pruning of
-potential mentions. Experiments demonstrate state-of-the-art performance, with
-a gain of 1.5 F1 on the OntoNotes benchmark and of 3.1 F1 using a 5-model
-ensemble, despite the fact that this is the first approach to be successfully
-trained with no external resources.
-" -5303,1707.07048,"Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong","Progressive Joint Modeling in Unsupervised Single-channel Overlapped - Speech Recognition",cs.CL cs.AI," Unsupervised single-channel overlapped speech recognition is one of the -hardest problems in automatic speech recognition (ASR). Permutation invariant -training (PIT) is a state of the art model-based approach, which applies a -single neural network to solve this single-input, multiple-output modeling -problem. We propose to advance the current state of the art by imposing a -modular structure on the neural network, applying a progressive pretraining -regimen, and improving the objective function with transfer learning and a -discriminative training criterion. The modular structure splits the problem -into three sub-tasks: frame-wise interpreting, utterance-level speaker tracing, -and speech recognition. The pretraining regimen uses these modules to solve -progressively harder tasks. Transfer learning leverages parallel clean speech -to improve the training targets for the network. Our discriminative training -formulation is a modification of standard formulations, that also penalizes -competing outputs of the system. Experiments are conducted on the artificial -overlapped Switchboard and hub5e-swb dataset. The proposed framework achieves -over 30% relative improvement of WER over both a strong jointly trained system, -PIT for ASR, and a separately optimized system, PIT for speech separation with -clean speech ASR model. The improvement comes from better model generalization, -training efficiency and the sequence level linguistic knowledge integration. -" -5304,1707.07062,"Xinyu Hua, Lu Wang","A Pilot Study of Domain Adaptation Effect for Neural Abstractive - Summarization",cs.CL," We study the problem of domain adaptation for neural abstractive -summarization. We make initial efforts in investigating what information can be -transferred to a new domain. Experimental results on news stories and opinion -articles indicate that neural summarization model benefits from pre-training -based on extractive summaries. We also find that the combination of in-domain -and out-of-domain setup yields better summaries when in-domain data is -insufficient. Further analysis shows that, the model is capable to select -salient content even trained on out-of-domain data, but requires in-domain data -to capture the style for a target domain. -" -5305,1707.07066,Hayafumi Watanabe,"Ultraslow diffusion in language: Dynamics of appearance of already - popular adjectives on Japanese blogs",physics.soc-ph cs.CL cs.CY stat.AP," What dynamics govern a time series representing the appearance of words in -social media data? In this paper, we investigate an elementary dynamics, from -which word-dependent special effects are segregated, such as breaking news, -increasing (or decreasing) concerns, or seasonality. To elucidate this problem, -we investigated approximately three billion Japanese blog articles over a -period of six years, and analysed some corresponding solvable mathematical -models. From the analysis, we found that a word appearance can be explained by -the random diffusion model based on the power-law forgetting process, which is -a type of long memory point process related to ARFIMA(0,0.5,0). In particular, -we confirmed that ultraslow diffusion (where the mean squared displacement -grows logarithmically), which the model predicts in an approximate manner, -reproduces the actual data. 
In addition, we also show that the model can -reproduce other statistical properties of a time series: (i) the fluctuation -scaling, (ii) spectrum density, and (iii) shapes of the probability density -functions. -" -5306,1707.07086,"Katherine A. Keith, Abram Handler, Michael Pinkham, Cara Magliozzi, - Joshua McDuffie, and Brendan O'Connor","Identifying civilians killed by police with distantly supervised - entity-event extraction",cs.CL," We propose a new, socially-impactful task for natural language processing: -from a news corpus, extract names of persons who have been killed by police. We -present a newly collected police fatality corpus, which we release publicly, -and present a model to solve this problem that uses EM-based distant -supervision with logistic regression and convolutional neural network -classifiers. Our model outperforms two off-the-shelf event extractor systems, -and it can suggest candidate victim names in some cases faster than one of the -major manually-collected police fatality databases. -" -5307,1707.07102,"Xuwang Yin, Vicente Ordonez",OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts,cs.CV cs.CL," Generating captions for images is a task that has recently received -considerable attention. In this work we focus on caption generation for -abstract scenes, or object layouts where the only information provided is a set -of objects and their locations. We propose OBJ2TEXT, a sequence-to-sequence -model that encodes a set of objects and their locations as an input sequence -using an LSTM network, and decodes this representation using an LSTM language -model. We show that our model, despite encoding object layouts as a sequence, -can represent spatial relationships between objects, and generate descriptions -that are globally coherent and semantically relevant. We test our approach in a -task of object-layout captioning by using only object annotations as inputs. We -additionally show that our model, combined with a state-of-the-art object -detector, improves an image captioning model from 0.863 to 0.950 (CIDEr score) -in the test benchmark of the standard MS-COCO Captioning task. -" -5308,1707.07129,Ali Akbar Septiandri,Predicting the Gender of Indonesian Names,cs.CL," We investigated a way to predict the gender of a name using character-level -Long-Short Term Memory (char-LSTM). We compared our method with some -conventional machine learning methods, namely Naive Bayes, logistic regression, -and XGBoost with n-grams as the features. We evaluated the models on a dataset -consisting of the names of Indonesian people. It is not common to use a family -name as the surname in Indonesian culture, except in some ethnicities. -Therefore, we inferred the gender from both full names and first names. The -results show that we can achieve 92.25% accuracy from full names, while using -first names only yields 90.65% accuracy. These results are better than the ones -from applying the classical machine learning algorithms to n-grams. -" -5309,1707.07167,"Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie",Attention-Based End-to-End Speech Recognition on Voice Search,cs.CL cs.SD," Recently, there has been a growing interest in end-to-end speech recognition -that directly transcribes speech to text without any predefined alignments. In -this paper, we explore the use of attention-based encoder-decoder model for -Mandarin speech recognition on a voice search task. 
Previous attempts have -shown that applying attention-based encoder-decoder to Mandarin speech -recognition was quite difficult due to the logographic orthography of Mandarin, -the large vocabulary and the conditional dependency of the attention model. In -this paper, we use character embedding to deal with the large vocabulary. -Several tricks are used for effective model training, including L2 -regularization, Gaussian weight noise and frame skipping. We compare two -attention mechanisms and use attention smoothing to cover long context in the -attention model. Taken together, these tricks allow us to finally achieve a -character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43% on -the MiTV voice search dataset. While together with a trigram language model, -CER and SER reach 2.81% and 5.77%, respectively. -" -5310,1707.07182,"Marcos Zampieri, Alina Maria Ciobanu, Liviu P. Dinu",Native Language Identification on Text and Speech,cs.CL," This paper presents an ensemble system combining the output of multiple SVM -classifiers to native language identification (NLI). The system was submitted -to the NLI Shared Task 2017 fusion track which featured students essays and -spoken responses in form of audio transcriptions and iVectors by non-native -English speakers of eleven native languages. Our system competed in the -challenge under the team name ZCD and was based on an ensemble of SVM -classifiers trained on character n-grams achieving 83.58% accuracy and ranking -3rd in the shared task. -" -5311,1707.07191,"Chieh-Yang Huang, Tristan Labetoulle, Ting-Hao Kenneth Huang, Yi-Pei - Chen, Hung-Chen Chen, Vallari Srivastava, Lun-Wei Ku","MoodSwipe: A Soft Keyboard that Suggests Messages Based on - User-Specified Emotions",cs.CL cs.HC," We present MoodSwipe, a soft keyboard that suggests text messages given the -user-specified emotions utilizing the real dialog data. The aim of MoodSwipe is -to create a convenient user interface to enjoy the technology of emotion -classification and text suggestion, and at the same time to collect labeled -data automatically for developing more advanced technologies. While users -select the MoodSwipe keyboard, they can type as usual but sense the emotion -conveyed by their text and receive suggestions for their message as a benefit. -In MoodSwipe, the detected emotions serve as the medium for suggested texts, -where viewing the latter is the incentive to correcting the former. We conduct -several experiments to show the superiority of the emotion classification -models trained on the dialog data, and further to verify good emotion cues are -important context for text suggestion. -" -5312,1707.07212,"Sandesh Swamy, Alan Ritter and Marie-Catherine de Marneffe","""i have a feeling trump will win.................."": Forecasting Winners - and Losers from User Predictions on Twitter",cs.CL," Social media users often make explicit predictions about upcoming events. -Such statements vary in the degree of certainty the author expresses toward the -outcome:""Leonardo DiCaprio will win Best Actor"" vs. ""Leonardo DiCaprio may win"" -or ""No way Leonardo wins!"". Can popular beliefs on social media predict who -will win? To answer this question, we build a corpus of tweets annotated for -veridicality on which we train a log-linear classifier that detects positive -veridicality with high precision. We then forecast uncertain outcomes using the -wisdom of crowds, by aggregating users' explicit predictions. 
Our method for
-forecasting winners is fully automated, relying only on a set of contenders as
-input. It requires no training data of past outcomes and outperforms sentiment
-and tweet volume baselines on a broad range of contest prediction tasks. We
-further demonstrate how our approach can be used to measure the reliability of
-individual accounts' predictions and retrospectively identify surprise
-outcomes.
-"
-5313,1707.07240,Bin Wang and Zhijian Ou,Language modeling with Neural trans-dimensional random fields,cs.CL cs.LG stat.ML,"  Trans-dimensional random field language models (TRF LMs) have recently been
-introduced, where sentences are modeled as a collection of random fields. The
-TRF approach has been shown to have the advantages of being computationally
-more efficient in inference than LSTM LMs while achieving close performance,
-and of being able to flexibly integrate rich features. In this paper we propose
-neural TRFs, going beyond the previous discrete TRFs, which only use linear
-potentials with discrete features. The idea is to use nonlinear potentials with
-continuous features, implemented by neural networks (NNs), in the TRF
-framework. Neural TRFs combine the advantages of both NNs and TRFs. The
-benefits of word embedding, nonlinear feature learning and larger context
-modeling are inherited from the use of NNs. At the same time, the strength of
-efficient inference by avoiding expensive softmax is preserved. A number of
-technical contributions, including employing deep convolutional neural networks
-(CNNs) to define the potentials and incorporating the joint stochastic
-approximation (JSA) strategy in the training algorithm, are developed in this
-work, which enable us to successfully train neural TRF LMs. Various LMs are
-evaluated in terms of speech recognition WERs by rescoring the 1000-best lists
-of WSJ'92 test data. The results show that neural TRF LMs not only improve over
-discrete TRF LMs, but also perform slightly better than LSTM LMs with only one
-fifth of the parameters and 16x faster inference.
-"
-5314,1707.07250,"Amir Zadeh, Minghai Chen, Soujanya Poria, Erik Cambria, Louis-Philippe
-  Morency",Tensor Fusion Network for Multimodal Sentiment Analysis,cs.CL,"  Multimodal sentiment analysis is an increasingly popular research area, which
-extends the conventional language-based definition of sentiment analysis to a
-multimodal setup where other relevant modalities accompany language. In this
-paper, we pose the problem of multimodal sentiment analysis as modeling
-intra-modality and inter-modality dynamics. We introduce a novel model, termed
-Tensor Fusion Network, which learns both such dynamics end-to-end. The proposed
-approach is tailored for the volatile nature of spoken language in online
-videos as well as accompanying gestures and voice. In the experiments, our
-model outperforms state-of-the-art approaches for both multimodal and unimodal
-sentiment analysis.
-"
-5315,1707.07265,"Sho Takase, Naoaki Okazaki, Kentaro Inui",Composing Distributed Representations of Relational Patterns,cs.CL,"  Learning distributed representations for relation instances is a central
-technique in downstream NLP applications. In order to address semantic modeling
-of relational patterns, this paper constructs a new dataset that provides
-multiple similarity ratings for every pair of relational patterns on the
-existing dataset.
In addition, we conduct a comparative study of different -encoders including additive composition, RNN, LSTM, and GRU for composing -distributed representations of relational patterns. We also present Gated -Additive Composition, which is an enhancement of additive composition with the -gating mechanism. Experiments show that the new dataset does not only enable -detailed analyses of the different encoders, but also provides a gauge to -predict successes of distributed representations of relational patterns in the -relation classification task. -" -5316,1707.07270,"Yixing Fan, Liang Pang, JianPeng Hou, Jiafeng Guo, Yanyan Lan, Xueqi - Cheng",MatchZoo: A Toolkit for Deep Text Matching,cs.IR cs.CL," In recent years, deep neural models have been widely adopted for text -matching tasks, such as question answering and information retrieval, showing -improved performance as compared with previous methods. In this paper, we -introduce the MatchZoo toolkit that aims to facilitate the designing, comparing -and sharing of deep text matching models. Specifically, the toolkit provides a -unified data preparation module for different text matching problems, a -flexible layer-based model construction process, and a variety of training -objectives and evaluation metrics. In addition, the toolkit has implemented two -schools of representative deep text matching models, namely -representation-focused models and interaction-focused models. Finally, users -can easily modify existing models, create and share their own models for text -matching in MatchZoo. -" -5317,1707.07273,"Kim Anh Nguyen, Maximilian K\""oper, Sabine Schulte im Walde, Ngoc - Thang Vu",Hierarchical Embeddings for Hypernymy Detection and Directionality,cs.CL," We present a novel neural model HyperVec to learn hierarchical embeddings for -hypernymy detection and directionality. While previous embeddings have shown -limitations on prototypical hypernyms, HyperVec represents an unsupervised -measure where embeddings are learned in a specific order and capture the -hypernym$-$hyponym distributional hierarchy. Moreover, our model is able to -generalize over unseen hypernymy pairs, when using only small sets of training -data, and by mapping to other languages. Results on benchmark datasets show -that HyperVec outperforms both state$-$of$-$the$-$art unsupervised measures and -embedding models on hypernymy detection and directionality, and on predicting -graded lexical entailment. -" -5318,1707.07278,Besnik Fetahu and Katja Markert and Avishek Anand,Fine Grained Citation Span for References in Wikipedia,cs.CL," \emph{Verifiability} is one of the core editing principles in Wikipedia, -editors being encouraged to provide citations for the added content. For a -Wikipedia article, determining the \emph{citation span} of a citation, i.e. -what content is covered by a citation, is important as it helps decide for -which content citations are still missing. - We are the first to address the problem of determining the \emph{citation -span} in Wikipedia articles. We approach this problem by classifying which -textual fragments in an article are covered by a citation. We propose a -sequence classification approach where for a paragraph and a citation, we -determine the citation span at a fine-grained level. - We provide a thorough experimental evaluation and compare our approach -against baselines adopted from the scientific domain, where we show improvement -for all evaluation metrics. 
-" -5319,1707.07279,"Haijing Liu (1 and 2), Yang Gao (1), Pin Lv (1), Mengxue Li (1 and 2), - Shiqiang Geng (3), Minglan Li (1 and 2), Hao Wang (4) ((1) Institute of - Software, Chinese Academy of Sciences, (2) University of Chinese Academy of - Sciences, (3) School of Automation, Beijing Information Science and - Technology University, (4) Qihoo 360 Search Lab)",Using Argument-based Features to Predict and Analyse Review Helpfulness,cs.CL," We study the helpful product reviews identification problem in this paper. We -observe that the evidence-conclusion discourse relations, also known as -arguments, often appear in product reviews, and we hypothesise that some -argument-based features, e.g. the percentage of argumentative sentences, the -evidences-conclusions ratios, are good indicators of helpful reviews. To -validate this hypothesis, we manually annotate arguments in 110 hotel reviews, -and investigate the effectiveness of several combinations of argument-based -features. Experiments suggest that, when being used together with the -argument-based features, the state-of-the-art baseline features can enjoy a -performance boost (in terms of F1) of 11.01\% in average. -" -5320,1707.07328,Robin Jia and Percy Liang,Adversarial Examples for Evaluating Reading Comprehension Systems,cs.CL cs.LG," Standard accuracy metrics indicate that reading comprehension systems are -making rapid progress, but the extent to which these systems truly understand -language remains unclear. To reward systems with real language understanding -abilities, we propose an adversarial evaluation scheme for the Stanford -Question Answering Dataset (SQuAD). Our method tests whether systems can answer -questions about paragraphs that contain adversarially inserted sentences, which -are automatically generated to distract computer systems without changing the -correct answer or misleading humans. In this adversarial setting, the accuracy -of sixteen published models drops from an average of $75\%$ F1 score to $36\%$; -when the adversary is allowed to add ungrammatical sequences of words, average -accuracy on four models decreases further to $7\%$. We hope our insights will -motivate the development of new models that understand language more precisely. -" -5321,1707.07331,Natalie Ahn,"Rule-Based Spanish Morphological Analyzer Built From Spell Checking - Lexicon",cs.CL," Preprocessing tools for automated text analysis have become more widely -available in major languages, but non-English tools are often still limited in -their functionality. When working with Spanish-language text, researchers can -easily find tools for tokenization and stemming, but may not have the means to -extract more complex word features like verb tense or mood. Yet Spanish is a -morphologically rich language in which such features are often identifiable -from word form. Conjugation rules are consistent, but many special verbs and -nouns take on different rules. While building a complete dictionary of known -words and their morphological rules would be labor intensive, resources to do -so already exist, in spell checkers designed to generate valid forms of known -words. This paper introduces a set of tools for Spanish-language morphological -analysis, built using the COES spell checking tools, to label person, mood, -tense, gender and number, derive a word's root noun or verb infinitive, and -convert verbs to their nominal form. 
-" -5322,1707.07343,Prafulla Kumar Choubey and Ruihong Huang,"A Sequential Model for Classifying Temporal Relations between - Intra-Sentence Events",cs.CL," We present a sequential model for temporal relation classification between -intra-sentence events. The key observation is that the overall syntactic -structure and compositional meanings of the multi-word context between events -are important for distinguishing among fine-grained temporal relations. -Specifically, our approach first extracts a sequence of context words that -indicates the temporal relation between two events, which well align with the -dependency path between two event mentions. The context word sequence, together -with a parts-of-speech tag sequence and a dependency relation sequence that are -generated corresponding to the word sequence, are then provided as input to -bidirectional recurrent neural network (LSTM) models. The neural nets learn -compositional syntactic and semantic representations of contexts surrounding -the two events and predict the temporal relation between them. Evaluation of -the proposed approach on TimeBank corpus shows that sequential modeling is -capable of accurately recognizing temporal relations between events, which -outperforms a neural net model using various discrete features as input that -imitates previous feature based models. -" -5323,1707.07344,Prafulla Kumar Choubey and Ruihong Huang,"Event Coreference Resolution by Iteratively Unfolding Inter-dependencies - among Events",cs.CL," We introduce a novel iterative approach for event coreference resolution that -gradually builds event clusters by exploiting inter-dependencies among event -mentions within the same chain as well as across event chains. Among event -mentions in the same chain, we distinguish within- and cross-document event -coreference links by using two distinct pairwise classifiers, trained -separately to capture differences in feature distributions of within- and -cross-document event clusters. Our event coreference approach alternates -between WD and CD clustering and combines arguments from both event clusters -after every merge, continuing till no more merge can be made. And then it -performs further merging between event chains that are both closely related to -a set of other chains of events. Experiments on the ECB+ corpus show that our -model outperforms state-of-the-art methods in joint task of WD and CD event -coreference resolution. -" -5324,1707.07402,"Khanh Nguyen, Hal Daum\'e III and Jordan Boyd-Graber","Reinforcement Learning for Bandit Neural Machine Translation with - Simulated Human Feedback",cs.CL cs.AI cs.HC cs.LG," Machine translation is a natural candidate problem for reinforcement learning -from human feedback: users provide quick, dirty ratings on candidate -translations to guide a system to improve. Yet, current neural machine -translation training focuses on expensive human-generated reference -translations. We describe a reinforcement learning algorithm that improves -neural machine translation systems from simulated human feedback. Our algorithm -combines the advantage actor-critic algorithm (Mnih et al., 2016) with the -attention-based neural encoder-decoder architecture (Luong et al., 2015). This -algorithm (a) is well-designed for problems with a large action space and -delayed rewards, (b) effectively optimizes traditional corpus-level machine -translation metrics, and (c) is robust to skewed, high-variance, granular -feedback modeled after actual human behaviors. 
-" -5325,1707.07413,"Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, - Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao - Zhu",Exploring Neural Transducers for End-to-End Speech Recognition,cs.CL cs.NE," In this work, we perform an empirical comparison among the CTC, -RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech -recognition. We show that, without any language model, Seq2Seq and -RNN-Transducer models both outperform the best reported CTC models with a -language model, on the popular Hub5'00 benchmark. On our internal diverse -dataset, these trends continue - RNNTransducer models rescored with a language -model after beam search outperform our best CTC models. These results simplify -the speech recognition pipeline so that decoding can now be expressed purely as -neural network operations. We also study how the choice of encoder architecture -affects the performance of the three models - when all encoder layers are -forward only, and when encoders downsample the input representation -aggressively. -" -5326,1707.07469,"Han Yang, Marta R. Costa-juss\`a and Jos\'e A. R. Fonollosa",Character-level Intra Attention Network for Natural Language Inference,cs.CL cs.LG," Natural language inference (NLI) is a central problem in language -understanding. End-to-end artificial neural networks have reached -state-of-the-art performance in NLI field recently. - In this paper, we propose Character-level Intra Attention Network (CIAN) for -the NLI task. In our model, we use the character-level convolutional network to -replace the standard word embedding layer, and we use the intra attention to -capture the intra-sentence semantics. The proposed CIAN model provides improved -results based on a newly published MNLI corpus. -" -5327,1707.07499,"Rudolf Schneider, Tom Oberhauser, Tobias Klatt, Felix A. Gers, - Alexander L\""oser",Analysing Errors of Open Information Extraction Systems,cs.CL," We report results on benchmarking Open Information Extraction (OIE) systems -using RelVis, a toolkit for benchmarking Open Information Extraction systems. -Our comprehensive benchmark contains three data sets from the news domain and -one data set from Wikipedia with overall 4522 labeled sentences and 11243 -binary or n-ary OIE relations. In our analysis on these data sets we compared -the performance of four popular OIE systems, ClausIE, OpenIE 4.2, Stanford -OpenIE and PredPatt. In addition, we evaluated the impact of five common error -classes on a subset of 749 n-ary tuples. From our deep analysis we unreveal -important research directions for a next generation of OIE systems. -" -5328,1707.07554,"Victor Prokhorov, Mohammad Taher Pilehvar, Dimitri Kartsaklis, Pietro - Li\'o and Nigel Collier",Learning Rare Word Representations using Semantic Bridging,cs.CL cs.AI," We propose a methodology that adapts graph embedding techniques (DeepWalk -(Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016)) as well as -cross-lingual vector space mapping approaches (Least Squares and Canonical -Correlation Analysis) in order to merge the corpus and ontological sources of -lexical knowledge. We also perform comparative analysis of the used algorithms -in order to identify the best combination for the proposed system. We then -apply this to the task of enhancing the coverage of an existing word -embedding's vocabulary with rare and unseen words. 
We show that our technique
-can provide considerable extra coverage (over 99%), leading to a consistent
-performance gain (around 10% absolute gain is achieved with w2v-gn-500K,
-cf. \S 3.3) on the Rare Word Similarity dataset.
-"
-5329,1707.07568,"C\'edric Lopez, Ioannis Partalas, Georgios Balikas, Nadia Derbas,
-  Am\'elie Martin, Coralie Reutenauer, Fr\'ed\'erique Segond, Massih-Reza Amini",CAp 2017 challenge: Twitter Named Entity Recognition,cs.CL,"  The paper describes the CAp 2017 challenge. The challenge concerns the
-problem of Named Entity Recognition (NER) for tweets written in French. We
-first present the data preparation steps we followed for constructing the
-dataset released in the framework of the challenge. We begin by demonstrating
-why NER for tweets is a challenging problem especially when the number of
-entities increases. We detail the annotation process and the necessary
-decisions we made. We provide statistics on the inter-annotator agreement, and
-we conclude the data description part with examples and statistics for the
-data. We then describe the participation in the challenge, where 8 teams
-participated, with a focus on the methods employed by the challenge
-participants and the scores achieved in terms of F$_1$ measure. Importantly,
-the constructed dataset, comprising $\sim$6,000 tweets annotated for 13 types of
-entities, which to the best of our knowledge is the first such dataset in
-French, is publicly available at \url{http://cap2017.imag.fr/competition.html}.
-"
-5330,1707.07585,"Zeya Zhang, Weizheng Chen and Hongfei Yan","Stock Prediction: a method based on extraction of news features and
-  recurrent neural networks",q-fin.ST cs.CL cs.IR cs.LG,"  This paper proposes a method for stock prediction. In terms of feature
-extraction, we extract the features of stock-related news besides stock prices.
-We first select some seed words based on experience, which are the symbols of
-good news and bad news. Then we propose an optimization method and calculate
-the positive polarity of all words. After that, we construct the features of
-news based on the positive polarity of their words. In consideration of
-sequential stock prices and continuous news effects, we propose a recurrent
-neural network model to help predict stock prices. Compared to an SVM
-classifier with price features, we find our proposed method has an over 5%
-improvement in stock prediction accuracy in experiments.
-"
-5331,1707.07591,Timo Schick,Transition-Based Generation from Abstract Meaning Representations,cs.CL,"  This work addresses the task of generating English sentences from Abstract
-Meaning Representation (AMR) graphs. To cope with this task, we transform each
-input AMR graph into a structure similar to a dependency tree and annotate it
-with syntactic information by applying various predefined actions to it.
-Subsequently, a sentence is obtained from this tree structure by visiting its
-nodes in a specific order. We train maximum entropy models to estimate the
-probability of each individual action and devise an algorithm that efficiently
-approximates the best sequence of actions to be applied. Using a substandard
-language model, our generator achieves a Bleu score of 27.4 on the LDC2014T12
-test set, the best result reported so far without using silver standard
-annotations from another corpus as additional training data.
-" -5332,1707.07601,"Spandana Gella, Rico Sennrich, Frank Keller, Mirella Lapata",Image Pivoting for Learning Multilingual Multimodal Representations,cs.CL cs.CV," In this paper we propose a model to learn multimodal multilingual -representations for matching images and sentences in different languages, with -the aim of advancing multilingual versions of image search and image -understanding. Our model learns a common representation for images and their -descriptions in two different languages (which need not be parallel) by -considering the image as a pivot between two languages. We introduce a new -pairwise ranking loss function which can handle both symmetric and asymmetric -similarity between the two modalities. We evaluate our models on -image-description ranking for German and English, and on semantic textual -similarity of image descriptions in English. In both cases we achieve -state-of-the-art performance. -" -5333,1707.07605,"Mostafa Dehghani, Hosein Azarbonyad, Jaap Kamps, Maarten de Rijke","Share your Model instead of your Data: Privacy Preserving Mimic Learning - for Ranking",cs.IR cs.AI cs.CL cs.LG," Deep neural networks have become a primary tool for solving problems in many -fields. They are also used for addressing information retrieval problems and -show strong performance in several tasks. Training these models requires large, -representative datasets and for most IR tasks, such data contains sensitive -information from users. Privacy and confidentiality concerns prevent many data -owners from sharing the data, thus today the research community can only -benefit from research on large-scale datasets in a limited manner. In this -paper, we discuss privacy preserving mimic learning, i.e., using predictions -from a privacy preserving trained model instead of labels from the original -sensitive training data as a supervision signal. We present the results of -preliminary experiments in which we apply the idea of mimic learning and -privacy preserving mimic learning for the task of document re-ranking as one of -the core IR tasks. This research is a step toward laying the ground for -enabling researchers from data-rich environments to share knowledge learned -from actual users' data, which should facilitate research collaborations. -" -5334,1707.07628,Yuanzhi Ke and Masafumi Hagiwara,Improve Lexicon-based Word Embeddings By Word Sense Disambiguation,cs.CL," There have been some works that learn a lexicon together with the corpus to -improve the word embeddings. However, they either model the lexicon separately -but update the neural networks for both the corpus and the lexicon by the same -likelihood, or minimize the distance between all of the synonym pairs in the -lexicon. Such methods do not consider the relatedness and difference of the -corpus and the lexicon, and may not be the best optimized. In this paper, we -propose a novel method that considers the relatedness and difference of the -corpus and the lexicon. It trains word embeddings by learning the corpus to -predicate a word and its corresponding synonym under the context at the same -time. For polysemous words, we use a word sense disambiguation filter to -eliminate the synonyms that have different meanings for the context. To -evaluate the proposed method, we compare the performance of the word embeddings -trained by our proposed model, the control groups without the filter or the -lexicon, and the prior works in the word similarity tasks and text -classification task. 
The experimental results show that the proposed model -provides better embeddings for polysemous words and improves the performance -for text classification. -" -5335,1707.07631,"Antonio Valerio Miceli Barone and Jind\v{r}ich Helcl and Rico Sennrich - and Barry Haddow and Alexandra Birch",Deep Architectures for Neural Machine Translation,cs.CL," It has been shown that increasing model depth improves the quality of neural -machine translation. However, different architectural variants to increase -model depth have been proposed, and so far, there has been no thorough -comparative study. - In this work, we describe and evaluate several existing approaches to -introduce depth in neural machine translation. Additionally, we explore novel -architectural variants, including deep transition RNNs, and we vary how -attention is used in the deep decoder. We introduce a novel ""BiDeep"" RNN -architecture that combines deep transition RNNs and stacked RNNs. - Our evaluation is carried out on the English to German WMT news translation -dataset, using a single-GPU machine for both training and inference. We find -that several of our proposed architectures improve upon existing approaches in -terms of speed and translation quality. We obtain best improvements with a -BiDeep RNN of combined depth 8, obtaining an average improvement of 1.5 BLEU -over a strong shallow baseline. - We release our code for ease of adoption. -" -5336,1707.07660,"Dat Tien Nguyen, Shafiq Joty, Basma El Amel Boussaha, Maarten de Rijke","Thread Reconstruction in Conversational Data using Neural Coherence - Models",cs.IR cs.CL," Discussion forums are an important source of information. They are often used -to answer specific questions a user might have and to discover more about a -topic of interest. Discussions in these forums may evolve in intricate ways, -making it difficult for users to follow the flow of ideas. We propose a novel -approach for automatically identifying the underlying thread structure of a -forum discussion. Our approach is based on a neural model that computes -coherence scores of possible reconstructions and then selects the highest -scoring, i.e., the most coherent one. Preliminary experiments demonstrate -promising results outperforming a number of strong baseline methods. -" -5337,1707.07678,Tom Jansen and Tobias Kuhn,Extracting Core Claims from Scientific Articles,cs.IR cs.CL cs.DL," The number of scientific articles has grown rapidly over the years and there -are no signs that this growth will slow down in the near future. Because of -this, it becomes increasingly difficult to keep up with the latest developments -in a scientific field. To address this problem, we present here an approach to -help researchers learn about the latest developments and findings by extracting -in a normalized form core claims from scientific articles. This normalized -representation is a controlled natural language of English sentences called -AIDA, which has been proposed in previous work as a method to formally -structure and organize scientific findings and discourse. We show how such AIDA -sentences can be automatically extracted by detecting the core claim of an -article, checking for AIDA compliance, and - if necessary - transforming it -into a compliant sentence. While our algorithm is still far from perfect, our -results indicate that the different steps are feasible and they support the -claim that AIDA sentences might be a promising approach to improve scientific -communication in the future. 
-" -5338,1707.07719,"Heike Adel and Hinrich Sch\""utze","Global Normalization of Convolutional Neural Networks for Joint Entity - and Relation Classification",cs.CL," We introduce globally normalized convolutional neural networks for joint -entity classification and relation extraction. In particular, we propose a way -to utilize a linear-chain conditional random field output layer for predicting -entity types and relations between entities at the same time. Our experiments -show that global normalization outperforms a locally normalized softmax layer -on a benchmark dataset. -" -5339,1707.07755,Miguel Ballesteros and Yaser Al-Onaizan,AMR Parsing using Stack-LSTMs,cs.CL," We present a transition-based AMR parser that directly generates AMR parses -from plain text. We use Stack-LSTMs to represent our parser state and make -decisions greedily. In our experiments, we show that our parser achieves very -competitive scores on English using only AMR training data. Adding additional -information, such as POS tags and dependency trees, improves the results -further. -" -5340,1707.07792,"Jinfeng Rao, Hua He, Haotian Zhang, Ferhan Ture, Royal Sequiera, - Salman Mohammed, and Jimmy Lin","Integrating Lexical and Temporal Signals in Neural Ranking Models for - Searching Social Media Streams",cs.IR cs.CL," Time is an important relevance signal when searching streams of social media -posts. The distribution of document timestamps from the results of an initial -query can be leveraged to infer the distribution of relevant documents, which -can then be used to rerank the initial results. Previous experiments have shown -that kernel density estimation is a simple yet effective implementation of this -idea. This paper explores an alternative approach to mining temporal signals -with recurrent neural networks. Our intuition is that neural networks provide a -more expressive framework to capture the temporal coherence of neighboring -documents in time. To our knowledge, we are the first to integrate lexical and -temporal signals in an end-to-end neural network architecture, in which -existing neural ranking models are used to generate query-document similarity -vectors that feed into a bidirectional LSTM layer for temporal modeling. Our -results are mixed: existing neural models for document ranking alone yield -limited improvements over simple baselines, but the integration of lexical and -temporal signals yield significant improvements over competitive temporal -baselines. -" -5341,1707.07804,"Royal Sequiera, Gaurav Baruah, Zhucheng Tu, Salman Mohammed, Jinfeng - Rao, Haotian Zhang, and Jimmy Lin","Exploring the Effectiveness of Convolutional Neural Networks for Answer - Selection in End-to-End Question Answering",cs.IR cs.CL," Most work on natural language question answering today focuses on answer -selection: given a candidate list of sentences, determine which contains the -answer. Although important, answer selection is only one stage in a standard -end-to-end question answering pipeline. This paper explores the effectiveness -of convolutional neural networks (CNNs) for answer selection in an end-to-end -context using the standard TrecQA dataset. We observe that a simple -idf-weighted word overlap algorithm forms a very strong baseline, and that -despite substantial efforts by the community in applying deep learning to -tackle answer selection, the gains are modest at best on this dataset. 
-Furthermore, it is unclear if a CNN is more effective than the baseline in an -end-to-end context based on standard retrieval metrics. To further explore this -finding, we conducted a manual user evaluation, which confirms that answers -from the CNN are detectably better than those from idf-weighted word overlap. -This result suggests that users are sensitive to relatively small differences -in answer selection quality. -" -5342,1707.07806,"Yuchen Zhang, Panupong Pasupat and Percy Liang",Macro Grammars and Holistic Triggering for Efficient Semantic Parsing,cs.CL," To learn a semantic parser from denotations, a learning algorithm must search -over a combinatorially large space of logical forms for ones consistent with -the annotated denotations. We propose a new online learning algorithm that -searches faster as training progresses. The two key ideas are using macro -grammars to cache the abstract patterns of useful logical forms found thus far, -and holistic triggering to efficiently retrieve the most relevant patterns -based on sentence similarity. On the WikiTableQuestions dataset, we first -expand the search space of an existing model to improve the state-of-the-art -accuracy from 38.7% to 42.7%, and then use macro grammars and holistic -triggering to achieve an 11x speedup and an accuracy of 43.7%. -" -5343,1707.07835,"Ajinkya Kale, Thrivikrama Taula, Sanjika Hewavitharana, Amit - Srivastava",Towards Semantic Query Segmentation,cs.IR cs.CL," Query Segmentation is one of the critical components for understanding users' -search intent in Information Retrieval tasks. It involves grouping tokens in -the search query into meaningful phrases which help downstream tasks like -search relevance and query understanding. In this paper, we propose a novel -approach to segment user queries using distributed query embeddings. Our key -contribution is a supervised approach to the segmentation task using -low-dimensional feature vectors for queries, getting rid of traditional -hand-tuned and heuristic NLP features which are quite expensive. - We benchmark on a 50,000 human-annotated web search engine query corpus, -achieving comparable accuracy to state-of-the-art techniques. The advantage of -our technique is that it is fast and does not use an external knowledge base -like Wikipedia for score boosting. This helps us generalize our approach to -other domains like eCommerce without any fine-tuning. We demonstrate the -effectiveness of this method on another 50,000 human-annotated eCommerce query -corpus from eBay search logs. Our approach is easy to implement and generalizes -well across different search domains, proving the power of low-dimensional -embeddings in the query segmentation task and opening up a new direction of -research for this problem. -" -5344,1707.07847,"Yi Tay, Luu Anh Tuan, Siu Cheung Hui","Hyperbolic Representation Learning for Fast and Efficient Neural - Question Answering",cs.IR cs.CL," The dominant neural architectures in question answer retrieval are based on -recurrent or convolutional encoders configured with complex word matching -layers. Given that recent architectural innovations are mostly new word -interaction layers or attention-based matching mechanisms, it seems to be a -well-established fact that these components are mandatory for good performance. -Unfortunately, the memory and computation cost incurred by these complex -mechanisms are undesirable for practical applications.
As such, this paper -tackles the question of whether it is possible to achieve competitive -performance with simple neural architectures. We propose a simple but novel -deep learning architecture for fast and efficient question-answer ranking and -retrieval. More specifically, our proposed model, \textsc{HyperQA}, is a -parameter-efficient neural network that outperforms other parameter-intensive -models such as Attentive Pooling BiLSTMs and Multi-Perspective CNNs on multiple -QA benchmarks. The novelty behind \textsc{HyperQA} is a pairwise ranking -objective that models the relationship between question and answer embeddings -in Hyperbolic space instead of Euclidean space. This empowers our model with a -self-organizing ability and enables automatic discovery of latent hierarchies -while learning embeddings of questions and answers. Our model requires no -feature engineering, no similarity matrix matching, no complicated attention -mechanisms and no over-parameterized layers, and yet outperforms or remains -competitive with many models that have these functionalities on multiple -benchmarks. -" -5345,1707.07911,"Pavel Levin, Nishikant Dhanuka, Maxim Khalilov",Machine Translation at Booking.com: Journey and Lessons Learned,cs.CL," We describe our recently developed neural machine translation (NMT) system -and benchmark it against our own statistical machine translation (SMT) system -as well as two other general purpose online engines (statistical and neural). -We present automatic and human evaluation results of the translation output -provided by each system. We also analyze the effect of sentence length on the -quality of output for SMT and NMT systems. -" -5346,1707.07922,Andrea Madotto and Giuseppe Attardi,Question Dependent Recurrent Entity Network for Question Answering,cs.CL," Question Answering is a task which requires building models capable of -providing answers to questions expressed in human language. Full question -answering involves some form of reasoning ability. We introduce a neural -network architecture for this task, which is a form of $Memory\ Network$, that -recognizes entities and their relations to answers through a focus attention -mechanism. Our model is named $Question\ Dependent\ Recurrent\ Entity\ Network$ -and extends $Recurrent\ Entity\ Network$ by exploiting aspects of the question -during the memorization process. We validate the model on both synthetic and -real datasets: the $bAbI$ question answering dataset and the $CNN\ \&\ Daily\ -News$ $reading\ comprehension$ dataset. In our experiments, the models achieved -state-of-the-art results in the former and competitive results in the latter. -" -5347,1707.07930,"Christophe Van Gysel, Maarten de Rijke and Evangelos Kanoulas",Structural Regularities in Text-based Entity Vector Spaces,cs.IR cs.AI cs.CL," Entity retrieval is the task of finding entities such as people or products -in response to a query, based solely on the textual documents they are -associated with. Recent semantic entity retrieval algorithms represent queries -and experts in finite-dimensional vector spaces, where both are constructed -from text sequences. - We investigate entity vector spaces and the degree to which they capture -structural regularities. Such vector spaces are constructed in an unsupervised -manner without explicit information about structural aspects. For concreteness, -we address these questions for a specific type of entity: experts in the -context of expert finding.
We discover how clusterings of experts correspond to -committees in organizations, the ability of expert representations to encode -the co-author graph, and the degree to which they encode academic rank. We -compare latent, continuous representations created using methods based on -distributional semantics (LSI), topic models (LDA) and neural networks -(word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as -doc2vec and SERT, systematically perform better at clustering than LSI, LDA and -word2vec. When it comes to encoding entity relations, SERT performs best. -" -5348,1707.08041,"Michael Filhol, Gilles Falquet","Synthesising Sign Language from semantics, approaching ""from the target - and back""",cs.CL," We present a Sign Language modelling approach allowing to build grammars and -create linguistic input for Sign synthesis through avatars. We comment on the -type of grammar it allows to build, and observe a resemblance between the -resulting expressions and traditional semantic representations. Comparing the -ways in which the paradigms are designed, we name and contrast two essentially -different strategies for building higher-level linguistic input: -""source-and-forward"" vs. ""target-and-back"". We conclude by favouring the -latter, acknowledging the power of being able to automatically generate output -from semantically relevant input straight into articulations of the target -language. -" -5349,1707.08052,"Sam Wiseman, Stuart M. Shieber, Alexander M. Rush",Challenges in Data-to-Document Generation,cs.CL," Recent neural models have shown significant progress on the problem of -generating short descriptive texts conditioned on a small number of database -records. In this work, we suggest a slightly more difficult data-to-text -generation task, and investigate how effective current approaches are on this -task. In particular, we introduce a new, large-scale corpus of data records -paired with descriptive documents, propose a series of extractive evaluation -methods for analyzing performance, and obtain baseline results using current -neural generation methods. Experiments show that these models produce fluent -text, but fail to convincingly approximate human-generated documents. Moreover, -even templated baselines exceed the performance of these neural models on some -metrics, though copy- and reconstruction-based extensions lead to noticeable -improvements. -" -5350,1707.08081,"Guy D. Rosin, Eytan Adar, Kira Radinsky",Learning Word Relatedness over Time,cs.CL," Search systems are often focused on providing relevant results for the ""now"", -assuming both corpora and user needs that focus on the present. However, many -corpora today reflect significant longitudinal collections ranging from 20 -years of the Web to hundreds of years of digitized newspapers and books. -Understanding the temporal intent of the user and retrieving the most relevant -historical content has become a significant challenge. Common search features, -such as query expansion, leverage the relationship between terms but cannot -function well across all times when relationships vary temporally. In this -work, we introduce a temporal relationship model that is extracted from -longitudinal data collections. The model supports the task of identifying, -given two words, when they relate to each other. We present an algorithmic -framework for this task and show its application for the task of query -expansion, achieving high gain. -" -5351,1707.08084,"Andrei M. 
Butnaru, Radu Tudor Ionescu, Florentina Hristea","ShotgunWSD: An unsupervised algorithm for global word sense - disambiguation inspired by DNA sequencing",cs.CL," In this paper, we present a novel unsupervised algorithm for word sense -disambiguation (WSD) at the document level. Our algorithm is inspired by a -widely-used approach in the field of genetics for whole genome sequencing, -known as the Shotgun sequencing technique. The proposed WSD algorithm is based -on three main steps. First, a brute-force WSD algorithm is applied to short -context windows (up to 10 words) selected from the document in order to -generate a short list of likely sense configurations for each window. In the -second step, these local sense configurations are assembled into longer -composite configurations based on suffix and prefix matching. The resulting -configurations are ranked by their length, and the sense of each word is chosen -based on a voting scheme that considers only the top k configurations in which -the word appears. We compare our algorithm with other state-of-the-art -unsupervised WSD algorithms and demonstrate better performance, sometimes by a -very large margin. We also show that our algorithm can yield better performance -than the Most Common Sense (MCS) baseline on one data set. Moreover, our -algorithm has a very small number of parameters, is robust to parameter tuning, -and, unlike other bio-inspired methods, it gives a deterministic solution (it -does not involve random choices). -" -5352,1707.08098,Andrei M. Butnaru and Radu Tudor Ionescu,"From Image to Text Classification: A Novel Approach based on Clustering - Word Embeddings",cs.CL," In this paper, we propose a novel approach for text classification based on -clustering word embeddings, inspired by the bag of visual words model, which is -widely used in computer vision. After each word in a collection of documents is -represented as a word vector using a pre-trained word embeddings model, a k-means -algorithm is applied on the word vectors in order to obtain a fixed-size set of -clusters. The centroid of each cluster is interpreted as a super word embedding -that embodies all the semantically related word vectors in a certain region of -the embedding space. Every embedded word in the collection of documents is then -assigned to the nearest cluster centroid. In the end, each document is -represented as a bag of super word embeddings by computing the frequency of -each super word embedding in the respective document. We also diverge from the -idea of building a single vocabulary for the entire collection of documents, -and propose to build class-specific vocabularies for better performance. Using -this kind of representation, we report results on two text mining tasks, namely -text categorization by topic and polarity classification. On both tasks, our -model yields better performance than the standard bag of words. -" -5353,1707.08139,Jacob Andreas and Dan Klein,Analogs of Linguistic Structure in Deep Representations,cs.CL cs.NE," We investigate the compositional structure of message vectors computed by a -deep network trained on a communication game. By comparing truth-conditional -representations of encoder-produced message vectors to human-produced referring -expressions, we are able to identify aligned (vector, utterance) pairs with the -same meaning. We then search for structured relationships among these aligned -pairs to discover simple vector space transformations corresponding to -negation, conjunction, and disjunction.
Our results suggest that neural -representations are capable of spontaneously developing a ""syntax"" with -functional analogues to qualitative properties of natural language. -" -5354,1707.08172,"Nikita Nangia, Adina Williams, Angeliki Lazaridou, Samuel R. Bowman","The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference - with Sentence Representations",cs.CL," This paper presents the results of the RepEval 2017 Shared Task, which -evaluated neural network sentence representation learning models on the -Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by -Williams et al. (2017). All of the five participating teams beat the -bidirectional LSTM (BiLSTM) and continuous bag of words baselines reported in -Williams et al.. The best single model used stacked BiLSTMs with residual -connections to extract sentence features and reached 74.5% accuracy on the -genre-matched test set. Surprisingly, the results of the competition were -fairly consistent across the genre-matched and genre-mismatched test sets, and -across subsets of the test data representing a variety of linguistic phenomena, -suggesting that all of the submitted systems learned reasonably -domain-independent representations for sentence meaning. -" -5355,1707.08209,"Jaydeep Chipalkatti, Mihir Kulkarni",On the letter frequencies and entropy of written Marathi,cs.IT cs.CL math.IT," We carry out a comprehensive analysis of letter frequencies in contemporary -written Marathi. We determine sets of letters which statistically predominate -any large generic Marathi text, and use these sets to estimate the entropy of -Marathi. -" -5356,1707.08214,"Fr\'ederic Godin, Jonas Degrave, Joni Dambre, Wesley De Neve","Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation - Functions in Quasi-Recurrent Neural Networks",cs.CL cs.LG cs.NE," In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), -called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an -unbounded positive and negative image, can be used as a drop-in replacement for -a tanh activation function in the recurrent step of Quasi-Recurrent Neural -Networks (QRNNs) (Bradbury et al. (2017)). Similar to ReLUs, DReLUs are less -prone to the vanishing gradient problem, they are noise robust, and they induce -sparse activations. - We independently reproduce the QRNN experiments of Bradbury et al. (2017) and -compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long -Short-Term Memory networks (LSTMs) on sentiment classification and word-level -language modeling. Additionally, we evaluate on character-level language -modeling, showing that we are able to stack up to eight QRNN layers with -DReLUs, thus making it possible to improve the current state-of-the-art in -character-level language modeling over shallow architectures based on LSTMs. -" -5357,1707.08290,"Antoni Lozano, Bernardino Casas, Chris Bentz and Ramon Ferrer-i-Cancho",Fast calculation of entropy with Zhang's estimator,cs.CL," Entropy is a fundamental property of a repertoire. Here, we present an -efficient algorithm to estimate the entropy of types with the help of Zhang's -estimator. The algorithm takes advantage of the fact that the number of -different frequencies in a text is in general much smaller than the number of -types. We justify the convenience of the algorithm by means of an analysis of -the statistical properties of texts from more than 1000 languages. Our work -opens up various possibilities for future research. 
-" -5358,1707.08309,Subhabrata Mukherjee,"Probabilistic Graphical Models for Credibility Analysis in Evolving - Online Communities",cs.SI cs.AI cs.CL cs.IR stat.ML," One of the major hurdles preventing the full exploitation of information from -online communities is the widespread concern regarding the quality and -credibility of user-contributed content. Prior works in this domain operate on -a static snapshot of the community, making strong assumptions about the -structure of the data (e.g., relational tables), or consider only shallow -features for text classification. - To address the above limitations, we propose probabilistic graphical models -that can leverage the joint interplay between multiple factors in online -communities --- like user interactions, community dynamics, and textual content ---- to automatically assess the credibility of user-contributed online content, -and the expertise of users and their evolution with user-interpretable -explanation. To this end, we devise new models based on Conditional Random -Fields for different settings like incorporating partial expert knowledge for -semi-supervised learning, and handling discrete labels as well as numeric -ratings for fine-grained analysis. This enables applications such as extracting -reliable side-effects of drugs from user-contributed posts in healthforums, and -identifying credible content in news communities. - Online communities are dynamic, as users join and leave, adapt to evolving -trends, and mature over time. To capture this dynamics, we propose generative -models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian -Motion to trace the continuous evolution of user expertise and their language -model over time. This allows us to identify expert users and credible content -jointly over time, improving state-of-the-art recommender systems by explicitly -considering the maturity of users. This also enables applications such as -identifying helpful product reviews, and detecting fake and anomalous reviews -with limited information. -" -5359,1707.08349,Radu Tudor Ionescu and Marius Popescu,"Can string kernels pass the test of time in Native Language - Identification?",cs.CL," We describe a machine learning approach for the 2017 shared task on Native -Language Identification (NLI). The proposed approach combines several kernels -using multiple kernel learning. While most of our kernels are based on -character p-grams (also known as n-grams) extracted from essays or speech -transcripts, we also use a kernel based on i-vectors, a low-dimensional -representation of audio recordings, provided by the shared task organizers. For -the learning stage, we choose Kernel Discriminant Analysis (KDA) over Kernel -Ridge Regression (KRR), because the former classifier obtains better results -than the latter one on the development set. In our previous work, we have used -a similar machine learning approach to achieve state-of-the-art NLI results. -The goal of this paper is to demonstrate that our shallow and simple approach -based on string kernels (with minor improvements) can pass the test of time and -reach state-of-the-art performance in the 2017 NLI shared task, despite the -recent advances in natural language processing. We participated in all three -tracks, in which the competitors were allowed to use only the essays (essay -track), only the speech transcripts (speech track), or both (fusion track). 
-Using only the data provided by the organizers for training our models, we have -reached a macro F1 score of 86.95% in the closed essay track, a macro F1 score -of 87.55% in the closed speech track, and a macro F1 score of 93.19% in the -closed fusion track. With these scores, our team (UnibucKernel) ranked in the -first group of teams in all three tracks, while attaining the best scores in -the speech and the fusion tracks. -" -5360,1707.08435,"William Havard, Laurent Besacier, Olivier Rosec","SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO - Data Set",cs.CL," This paper presents an augmentation of MSCOCO dataset where speech is added -to image and text. Speech captions are generated using text-to-speech (TTS) -synthesis resulting in 616,767 spoken captions (more than 600h) paired with -images. Disfluencies and speed perturbation are added to the signal in order to -sound more natural. Each speech signal (WAV) is paired with a JSON file -containing exact timecode for each word/syllable/phoneme in the spoken caption. -Such a corpus could be used for Language and Vision (LaVi) tasks including -speech input or output instead of text. Investigating multimodal learning -schemes for unsupervised speech pattern discovery is also possible with this -corpus, as demonstrated by a preliminary study conducted on a subset of the -corpus (10h, 10k spoken captions). The dataset is available on Zenodo: -https://zenodo.org/record/4282267 -" -5361,1707.08446,"Jasabanta Patro, Bidisha Samanta, Saurabh Singh, Abhipsa Basu, - Prithwish Mukherjee, Monojit Choudhury, Animesh Mukherjee","All that is English may be Hindi: Enhancing language identification - through automatic ranking of likeliness of word borrowing in social media",cs.CL," In this paper, we present a set of computational methods to identify the -likeliness of a word being borrowed, based on the signals from social media. In -terms of Spearman correlation coefficient values, our methods perform more than -two times better (nearly 0.62) in predicting the borrowing likeliness compared -to the best performing baseline (nearly 0.26) reported in literature. Based on -this likeliness estimate we asked annotators to re-annotate the language tags -of foreign words in predominantly native contexts. In 88 percent of cases the -annotators felt that the foreign language tag should be replaced by native -language tag, thus indicating a huge scope for improvement of automatic -language identification systems. -" -5362,1707.08458,"Ekaterina Vylomova, Andrei Shcherbakov, Yuriy Philippovich, Galina - Cherkasova","Men Are from Mars, Women Are from Venus: Evaluation and Modelling of - Verbal Associations",cs.CL," We present a quantitative analysis of human word association pairs and study -the types of relations presented in the associations. We put our main focus on -the correlation between response types and respondent characteristics such as -occupation and gender by contrasting syntagmatic and paradigmatic associations. -Finally, we propose a personalised distributed word association model and show -the importance of incorporating demographic factors into the models commonly -used in natural language processing. -" -5363,1707.08470,"Sujan Perera, Pablo N. Mendes, Adarsh Alex, Amit Sheth, Krishnaprasad - Thirunarayan",Implicit Entity Linking in Tweets,cs.CL cs.AI," Over the years, Twitter has become one of the largest communication platforms -providing key data to various applications such as brand monitoring, trend -detection, among others. 
Entity linking is one of the major tasks in natural -language understanding from tweets and it associates entity mentions in text to -corresponding entries in knowledge bases in order to provide unambiguous -interpretation and additional context. State-of-the-art techniques have -focused on linking explicitly mentioned entities in tweets with reasonable -success. However, we argue that in addition to explicit mentions, i.e. ""The movie -Gravity was more expensive than the mars orbiter mission"", entities (movie -Gravity) can also be mentioned implicitly, i.e. ""This new space movie is crazy. -you must watch it!"". This paper introduces the problem of implicit entity -linking in tweets. We propose an approach that models the entities by -exploiting their factual and contextual knowledge. We demonstrate how to use -these models to perform implicit entity linking on a ground truth dataset with -397 tweets from two domains, namely, Movie and Book. Specifically, we show: 1) -the importance of linking implicit entities and its value addition to the -standard entity linking task, and 2) the importance of exploiting contextual -knowledge associated with an entity for linking their implicit mentions. We -also make the ground truth dataset publicly available to foster research in -this new research area. -" -5364,1707.08559,"Cheng-Yang Fu, Joon Lee, Mohit Bansal, Alexander C. Berg",Video Highlight Prediction Using Audience Chat Reactions,cs.CL cs.AI cs.CV cs.LG cs.MM," Sports channel video portals offer an exciting domain for research on -multimodal, multilingual analysis. We present methods addressing the problem of -automatic video highlight prediction based on joint visual features and textual -analysis of the real-world audience discourse with complex slang, in both -English and traditional Chinese. We present a novel dataset based on League of -Legends championships recorded from North American and Taiwanese Twitch.tv -channels (will be released for further research), and demonstrate strong -results on these using multimodal, character-level CNN-RNN model architectures. -" -5365,1707.08588,"Yikang Shen, Shawn Tan, Christopher Pal and Aaron Courville",Self-organized Hierarchical Softmax,cs.CL cs.LG," We propose a new self-organizing hierarchical softmax formulation for -neural-network-based language models over large vocabularies. Instead of using -a predefined hierarchical structure, our approach is capable of learning word -clusters with clear syntactical and semantic meaning during the language model -training process. We provide experiments on standard benchmarks for language -modeling and sentence compression tasks. We find that this approach is as fast -as other efficient softmax approximations, while achieving comparable or even -better performance relative to similar full softmax models. -" -5366,1707.08608,"Jay Yoon Lee, Sanket Vaibhav Mehta, Michael Wick, Jean-Baptiste - Tristan, Jaime Carbonell",Gradient-based Inference for Networks with Output Constraints,cs.CL," Practitioners apply neural networks to increasingly complex problems in -natural language processing, such as syntactic parsing and semantic role -labeling that have rich output structures. Many such structured-prediction -problems require deterministic constraints on the output values; for example, -in sequence-to-sequence syntactic parsing, we require that the sequential -outputs encode valid trees.
While hidden units might capture such properties, -the network is not always able to learn such constraints from the training data -alone, and practitioners must then resort to post-processing. In this paper, we -present an inference method for neural networks that enforces deterministic -constraints on outputs without performing rule-based post-processing or -expensive discrete search. Instead, in the spirit of gradient-based training, -we enforce constraints with gradient-based inference (GBI): for each input at -test-time, we nudge continuous model weights until the network's unconstrained -inference procedure generates an output that satisfies the constraints. We -study the efficacy of GBI on three tasks with hard constraints: semantic role -labeling, syntactic parsing, and sequence transduction. In each case, the -algorithm not only satisfies constraints but improves accuracy, even when the -underlying network is state-of-the-art. -" -5367,1707.08616,"Brent Harrison, Upol Ehsan, Mark O. Riedl",Guiding Reinforcement Learning Exploration Using Natural Language,cs.AI cs.CL cs.LG stat.ML," In this work we present a technique to use natural language to help -reinforcement learning generalize to unseen environments. This technique uses -neural machine translation, specifically the use of encoder-decoder networks, -to learn associations between natural language behavior descriptions and -state-action information. We then use this learned model to guide agent -exploration using a modified version of policy shaping to make it more -effective at learning in unseen environments. We evaluate this technique using -the popular arcade game, Frogger, under ideal and non-ideal conditions. This -evaluation shows that our modified policy shaping algorithm improves over a -Q-learning agent as well as a baseline version of policy shaping. -" -5368,1707.08660,"Andrey Kutuzov, Erik Velldal, Lilja {\O}vrelid","Temporal dynamics of semantic relations in word embeddings: an - application to predicting armed conflict participants",cs.CL," This paper deals with using word embedding models to trace the temporal -dynamics of semantic relations between pairs of words. The set-up is similar to -the well-known analogies task, but expanded with a time dimension. To this end, -we apply incremental updating of the models with new training texts, including -incremental vocabulary expansion, coupled with learned transformation matrices -that let us map between members of the relation. The proposed approach is -evaluated on the task of predicting insurgent armed groups based on -geographical locations. The gold standard data for the time span 1994--2010 is -extracted from the UCDP Armed Conflicts dataset. The results show that the -method is feasible and outperforms the baselines, but also that important work -still remains to be done. -" -5369,1707.08668,"Siddharth Karamcheti, Edward C. Williams, Dilip Arumugam, Mina Rhee, - Nakul Gopalan, Lawson L. S. Wong, and Stefanie Tellex","A Tale of Two DRAGGNs: A Hybrid Approach for Interpreting - Action-Oriented and Goal-Oriented Instructions",cs.AI cs.CL," Robots operating alongside humans in diverse, stochastic environments must be -able to accurately interpret natural language commands. These instructions -often fall into one of two categories: those that specify a goal condition or -target state, and those that specify explicit actions, or how to perform a -given task. 
Recent approaches have used reward functions as a semantic -representation of goal-based commands, which allows for the use of a -state-of-the-art planner to find a policy for the given task. However, these -reward functions cannot be directly used to represent action-oriented commands. -We introduce a new hybrid approach, the Deep Recurrent Action-Goal Grounding -Network (DRAGGN), for task grounding and execution that handles natural -language from either category as input, and generalizes to unseen environments. -Our robot-simulation results demonstrate that a system successfully -interpreting both goal-oriented and action-oriented task specifications brings -us closer to robust natural language understanding for human-robot interaction. -" -5370,1707.08713,"Hitomi Yanaka, Koji Mineshima, Pascual Martinez-Gomez, Daisuke Bekki",Determining Semantic Textual Similarity using Natural Deduction Proofs,cs.CL," Determining semantic textual similarity is a core research subject in natural -language processing. Since vector-based models for sentence representation -often use shallow information, capturing accurate semantics is difficult. By -contrast, logical semantic representations capture deeper levels of sentence -semantics, but their symbolic nature does not offer graded notions of textual -similarity. We propose a method for determining semantic textual similarity by -combining shallow features with features extracted from natural deduction -proofs of bidirectional entailment relations between sentence pairs. For the -natural deduction proofs, we use ccg2lambda, a higher-order automatic inference -system, which converts Combinatory Categorial Grammar (CCG) derivation trees -into semantic representations and conducts natural deduction proofs. -Experiments show that our system was able to outperform other logic-based -systems and that features derived from the proofs are effective for learning -textual similarity. -" -5371,1707.08783,"Rocco Tripodi, Stefano Li Pira",Analysis of Italian Word Embeddings,cs.CL cs.AI," In this work we analyze the performance of two of the most widely used word -embedding algorithms, skip-gram and continuous bag of words, on the Italian -language. These algorithms have many hyper-parameters that have to be carefully -tuned in order to obtain accurate word representations in vector space. We -provide an accurate analysis and an evaluation, showing the best -configurations of parameters for specific tasks. -" -5372,1707.08852,"Dongyeop Kang, Varun Gangal, Ang Lu, Zheng Chen, Eduard Hovy",Detecting and Explaining Causes From Text For a Time Series Event,cs.CL cs.AI cs.LG," Explaining underlying causes or effects about events is a challenging but -valuable task. We define a novel problem of generating explanations of a time -series event by (1) searching cause and effect relationships of the time series -with textual data and (2) constructing a connecting chain between them to -generate an explanation. To detect causal features from text, we propose a -novel method based on the Granger causality of time series between features -extracted from text such as N-grams, topics, sentiments, and their composition. -The generation of the sequence of causal entities requires a commonsense -causative knowledge base with efficient reasoning. To ensure good -interpretability and appropriate lexical usage we combine symbolic and neural -representations, using a neural reasoning algorithm trained on commonsense -causal tuples to predict the next cause step.
Our quantitative and human -analyses show empirical evidence that our method successfully extracts -meaningful causality relationships between time series with textual features -and generates appropriate explanations between them. -" -5373,1707.08866,"Yi Yao Huang, William Yang Wang",Deep Residual Learning for Weakly-Supervised Relation Extraction,cs.CL cs.AI," Deep residual learning (ResNet) is a new method for training very deep neural -networks using identity mapping for shortcut connections. ResNet has won the -ImageNet ILSVRC 2015 classification task, and achieved state-of-the-art -performances in many computer vision tasks. However, the effect of residual -learning on noisy natural language processing tasks is still not well -understood. In this paper, we design a novel convolutional neural network (CNN) -with residual learning, and investigate its impact on the task of distantly -supervised noisy relation extraction. Contrary to the popular belief that -ResNet only works well for very deep networks, we found that even with 9 layers -of CNNs, using identity mapping could significantly improve the performance for -distantly-supervised relation extraction. -" -5374,1707.08939,Kyunghyun Cho,Strawman: an Ensemble of Deep Bag-of-Ngrams for Sentiment Analysis,cs.CL," This paper describes a builder entry, named ""strawman"", to the sentence-level -sentiment analysis task of the ""Build It, Break It"" shared task of the First -Workshop on Building Linguistically Generalizable NLP Systems. The goal of a -builder is to provide an automated sentiment analyzer that would serve as a -target for breakers whose goal is to find pairs of minimally-differing -sentences that break the analyzer. -" -5375,1707.08976,"Mitchell Stern, Daniel Fried, Dan Klein",Effective Inference for Generative Neural Parsing,cs.CL," Generative neural models have recently achieved state-of-the-art results for -constituency parsing. However, without a feasible search procedure, their use -has so far been limited to reranking the output of external parsers in which -decoding is more tractable. We describe an alternative to the conventional -action-level beam search used for discriminative neural models that enables us -to decode directly in these generative models. We then show that by improving -our basic candidate selection strategy and using a coarse pruning function, we -can improve accuracy while exploring significantly less of the search space. -Applied to the model of Choe and Charniak (2016), our inference procedure -obtains 92.56 F1 on section 23 of the Penn Treebank, surpassing prior -state-of-the-art results for single-model systems. -" -5376,1707.08998,"Im\`ene Guellil (ESI), Fai\c{c}al Azouaou (ESI)","ASDA : Analyseur Syntaxique du Dialecte Alg{\'e}rien dans un but - d'analyse s{\'e}mantique",cs.CL," Opinion mining and sentiment analysis in social media is a research issue -of great interest to the scientific community. However, before beginning this -analysis, we are faced with a set of problems; in particular, the richness of -languages and dialects within these media. To address this -problem, we propose in this paper an approach for the construction and -implementation of a syntactic analyzer named ASDA. This tool represents a parser -for the Algerian dialect that labels the terms of a given corpus. Thus, we -construct a labeling table containing, for each term, its stem and its different -prefixes and suffixes, allowing us to determine the different grammatical parts, -a sort of POS tagging.
This labeling will serve us later in the semantic -processing of the Algerian dialect, like the automatic translation of this -dialect or sentiment analysis. -" -5377,1707.09050,"Artem Sokolov, Julia Kreutzer, Kellen Sunderland, Pavel Danchenko, - Witold Szymaniak, Hagen F\""urstenau, Stefan Riezler",A Shared Task on Bandit Learning for Machine Translation,cs.CL stat.ML," We introduce and describe the results of a novel shared task on bandit -learning for machine translation. The task was organized jointly by Amazon and -Heidelberg University for the first time at the Second Conference on Machine -Translation (WMT 2017). The goal of the task is to encourage research on -learning machine translation from weak user feedback instead of human -references or post-edits. On each of a sequence of rounds, a machine -translation system is required to propose a translation for an input, and -receives a real-valued estimate of the quality of the proposed translation for -learning. This paper describes the shared task's learning and evaluation setup, -using services hosted on Amazon Web Services (AWS), the data and evaluation -metrics, and the results of various machine translation architectures and -learning protocols. -" -5378,1707.09067,"Allen Schmaltz, Yoon Kim, Alexander M. Rush, Stuart M. Shieber",Adapting Sequence Models for Sentence Correction,cs.CL," In a controlled experiment of sequence-to-sequence approaches for the task of -sentence correction, we find that character-based models are generally more -effective than word-based models and models that encode subword information via -convolutions, and that modeling the output data as a series of diffs improves -effectiveness over standard approaches. Our strongest sequence-to-sequence -model improves over our strongest phrase-based statistical machine translation -model, with access to the same data, by 6 M2 (0.5 GLEU) points. Additionally, -in the data environment of the standard CoNLL-2014 setup, we demonstrate that -modeling (and tuning against) diffs yields similar or better M2 scores with -simpler models and/or significantly less data than previous -sequence-to-sequence approaches. -" -5379,1707.09098,"Boyuan Pan, Hao Li, Zhou Zhao, Bin Cao, Deng Cai, Xiaofei He","MEMEN: Multi-layer Embedding with Memory Networks for Machine - Comprehension",cs.AI cs.CL," Machine comprehension (MC) style question answering is a representative -problem in natural language processing. Previous methods rarely spend time on -improving the encoding layer, especially the embedding of syntactic -information and named entities of the words, which are crucial to the quality -of encoding. Moreover, existing attention methods represent each query word as -a vector or use a single vector to represent the whole query sentence; neither -of them can handle the proper weight of the key words in the query sentence. In -this paper, we introduce a novel neural network architecture called Multi-layer -Embedding with Memory Network (MEMEN) for the machine reading task. In the encoding -layer, we apply the classic skip-gram model to the syntactic and semantic -information of the words to train a new kind of embedding layer. We also -propose a memory network of full-orientation matching of the query and passage -to catch more pivotal information.
Experiments show that our model achieves -competitive results, in terms of both precision and efficiency, on the -Stanford Question Answering Dataset (SQuAD) among all published results, and -achieves state-of-the-art results on the TriviaQA dataset. -" -5380,1707.09118,"Carolin Lawrence, Artem Sokolov, Stefan Riezler","Counterfactual Learning from Bandit Feedback under Deterministic - Logging: A Case Study in Statistical Machine Translation",stat.ML cs.CL cs.LG," The goal of counterfactual learning for statistical machine translation (SMT) -is to optimize a target SMT system from logged data that consist of user -feedback to translations that were predicted by another, historic SMT system. A -challenge arises from the fact that risk-averse commercial SMT systems -deterministically log the most probable translation. The lack of sufficient -exploration of the SMT output space seemingly contradicts the theoretical -requirements for counterfactual learning. We show that counterfactual learning -from deterministic bandit logs is possible nevertheless by smoothing out -deterministic components in learning. This can be achieved by additive and -multiplicative control variates that avoid degenerate behavior in empirical -risk minimization. Our simulation experiments show improvements of up to 2 BLEU -points by counterfactual learning from deterministic bandit feedback. -" -5381,1707.09168,"Bingfeng Luo, Yansong Feng, Jianbo Xu, Xiang Zhang and Dongyan Zhao",Learning to Predict Charges for Criminal Cases with Legal Basis,cs.CL," The charge prediction task is to determine appropriate charges for a given -case, which is helpful for legal assistant systems where the user input is a fact -description. We argue that relevant law articles play an important role in this -task, and therefore propose an attention-based neural network method to jointly -model the charge prediction task and the relevant article extraction task in a -unified framework. The experimental results show that, besides providing legal -basis, the relevant articles can also clearly improve the charge prediction -results, and our full model can effectively predict appropriate charges for -cases with different expression styles. -" -5382,1707.09231,"Ina R\""osiger, Sabrina Stehwien, Arndt Riester, Ngoc Thang Vu","Improving coreference resolution with automatically predicted prosodic - information",cs.CL," Adding manually annotated prosodic information, specifically pitch accents -and phrasing, to the typical text-based feature set for coreference resolution -has previously been shown to have a positive effect on German data. Practical -applications on spoken language, however, would rely on automatically predicted -prosodic information. In this paper we predict pitch accents (and phrase -boundaries) using a convolutional neural network (CNN) model from acoustic -features extracted from the speech signal. After an assessment of the quality -of these automatic prosodic annotations, we show that they also significantly -improve coreference resolution. -" -5383,1707.09406,"Wenlin Yao, Zeyu Dai, Ruihong Huang, James Caverlee",Online Deception Detection Refueled by Real World Data Collection,cs.CL," The lack of large realistic datasets presents a bottleneck in online -deception detection studies. In this paper, we apply a data collection method -based on social network analysis to quickly identify high-quality deceptive and -truthful online reviews from Amazon.
The dataset contains more than 10,000 -deceptive reviews and is diverse in product domains and reviewers. Using this -dataset, we explore effective general features for online deception detection -that perform well across domains. We demonstrate that with generalized features -- advertising speak and writing complexity scores - deception detection -performance can be further improved by adding additional deceptive reviews from -assorted domains in training. Finally, reviewer level evaluation gives an -interesting insight into different deceptive reviewers' writing styles. -" -5384,1707.09410,"Wenlin Yao, Saipravallika Nettyam, Ruihong Huang","A Weakly Supervised Approach to Train Temporal Relation Classifiers and - Acquire Regular Event Pairs Simultaneously",cs.CL," Capabilities of detecting temporal relations between two events can benefit -many applications. Most of existing temporal relation classifiers were trained -in a supervised manner. Instead, we explore the observation that regular event -pairs show a consistent temporal relation despite of their various contexts, -and these rich contexts can be used to train a contextual temporal relation -classifier, which can further recognize new temporal relation contexts and -identify new regular event pairs. We focus on detecting after and before -temporal relations and design a weakly supervised learning approach that -extracts thousands of regular event pairs and learns a contextual temporal -relation classifier simultaneously. Evaluation shows that the acquired regular -event pairs are of high quality and contain rich commonsense knowledge and -domain specific knowledge. In addition, the weakly supervised trained temporal -relation classifier achieves comparable performance with the state-of-the-art -supervised systems. -" -5385,1707.09443,Ulrich Germann,Bilingual Document Alignment with Latent Semantic Indexing,cs.CL," We apply cross-lingual Latent Semantic Indexing to the Bilingual Document -Alignment Task at WMT16. Reduced-rank singular value decomposition of a -bilingual term-document matrix derived from known English/French page pairs in -the training data allows us to map monolingual documents into a joint semantic -space. Two variants of cosine similarity between the vectors that place each -document into the joint semantic space are combined with a measure of string -similarity between corresponding URLs to produce 1:1 alignments of -English/French web pages in a variety of domains. The system achieves a recall -of ca. 88% if no in-domain data is used for building the latent semantic model, -and 93% if such data is included. - Analysing the system's errors on the training data, we argue that evaluating -aligner performance based on exact URL matches under-estimates their true -performance and propose an alternative that is able to account for duplicates -and near-duplicates in the underlying data. -" -5386,1707.09448,"Vineet John, Olga Vechtomova","Sentiment Analysis on Financial News Headlines using Training Dataset - Augmentation",cs.CL," This paper discusses the approach taken by the UWaterloo team to arrive at a -solution for the Fine-Grained Sentiment Analysis problem posed by Task 5 of -SemEval 2017. The paper describes the document vectorization and sentiment -score prediction techniques used, as well as the design and implementation -decisions taken while building the system for this task. 
The system uses text -vectorization models, such as N-gram, TF-IDF and paragraph embeddings, coupled -with regression model variants to predict the sentiment scores. Amongst the -methods examined, unigrams and bigrams coupled with simple linear regression -obtained the best baseline accuracy. The paper also explores data augmentation -methods to supplement the training dataset. This system was designed for -Subtask 2 (News Statements and Headlines). -" -5387,1707.09457,"Jieyu Zhao and Tianlu Wang and Mark Yatskar and Vicente Ordonez and - Kai-Wei Chang","Men Also Like Shopping: Reducing Gender Bias Amplification using - Corpus-level Constraints",cs.AI cs.CL cs.CV stat.ML," Language is increasingly being used to define rich visual recognition -problems with supporting image collections sourced from the web. Structured -prediction models are used in these tasks to take advantage of correlations -between co-occurring labels and visual input but risk inadvertently encoding -social biases found in web corpora. In this work, we study data and models -associated with multilabel object classification and visual semantic role -labeling. We find that (a) datasets for these tasks contain significant gender -bias and (b) models trained on these datasets further amplify existing bias. -For example, the activity cooking is over 33% more likely to involve females -than males in a training set, and a trained model further amplifies the -disparity to 68% at test time. We propose to inject corpus-level constraints -for calibrating existing structured prediction models and design an algorithm -based on Lagrangian relaxation for collective inference. Our method results in -almost no performance loss for the underlying recognition task but decreases -the magnitude of bias amplification by 47.5% and 40.5% for multilabel -classification and visual semantic role labeling, respectively. -" -5388,1707.09468,"Rowan Zellers, Yejin Choi",Zero-Shot Activity Recognition with Verb Attribute Induction,cs.CL cs.CV," In this paper, we investigate large-scale zero-shot activity recognition by -modeling the visual and linguistic attributes of action verbs. For example, the -verb ""salute"" has several properties, such as being a light movement, a social -act, and short in duration. We use these attributes as the internal mapping -between visual and textual representations to reason about a previously unseen -action. In contrast to much prior work that assumes access to gold standard -attributes for zero-shot classes and focuses primarily on object attributes, -our model uniquely learns to infer action attributes from dictionary -definitions and distributed word representations. Experimental results confirm -that action attributes inferred from language can provide a predictive signal -for zero-shot prediction of previously unseen activities. -" -5389,1707.09491,Stefano Gurciullo and Slava Mikhaylov,"Topology Analysis of International Networks Based on Debates in the - United Nations",cs.CL cs.CG math.AT stat.AP," In complex, high dimensional and unstructured data it is often difficult to -extract meaningful patterns. This is especially the case when dealing with -textual data. Recent studies in machine learning, information theory and -network science have developed several novel instruments to extract the -semantics of unstructured data, and harness it to build a network of relations. -Such approaches serve as an efficient tool for dimensionality reduction and -pattern detection. 
This paper applies semantic network science to extract -ideological proximity in the international arena, by focusing on the data from -General Debates in the UN General Assembly on topics of high salience to the -international community. The UN General Debate corpus (UNGDC) covers all high-level -debates in the UN General Assembly from 1970 to 2014, covering all UN member -states. The research proceeds in three main steps. First, Latent Dirichlet -Allocation (LDA) is used to extract the topics of the UN speeches, and -therefore semantic information. Each country is then assigned a vector -specifying its exposure to each of the topics identified. This intermediate -output is then used to construct a network of countries based on -information-theoretical metrics, where the links capture similar vectorial patterns in the -topic distributions. The topology of the networks is then analyzed through network -properties like density, path length and clustering. Finally, we identify -specific topological features of our networks using the map equation framework -to detect communities in our networks of countries. -" -5390,1707.09533,Tom Kocmi and Ondrej Bojar,"Curriculum Learning and Minibatch Bucketing in Neural Machine - Translation",cs.CL," We examine the effects of particular orderings of sentence pairs on the -on-line training of neural machine translation (NMT). We focus on two types of -such orderings: (1) ensuring that each minibatch contains sentences similar in -some aspect and (2) gradual inclusion of some sentence types as the training -progresses (so-called ""curriculum learning""). In our English-to-Czech -experiments, the internal homogeneity of minibatches has no effect on the -training but some of our ""curricula"" achieve a small improvement over the -baseline. -" -5391,1707.09538,"Erik Cambria, Devamanyu Hazarika, Soujanya Poria, Amir Hussain, R.B.V. - Subramaanyam",Benchmarking Multimodal Sentiment Analysis,cs.MM cs.CL," We propose a framework for multimodal sentiment analysis and emotion -recognition using convolutional neural network-based feature extraction from -text and visual modalities. We obtain a performance improvement of 10% over the -state of the art by combining visual, text and audio features. We also discuss -some major issues frequently ignored in multimodal sentiment analysis research: -the role of speaker-independent models, the importance of the modalities and -generalizability. The paper thus serves as a new benchmark for further research -in multimodal sentiment analysis and also demonstrates the different facets of -analysis to be considered while performing such tasks. -" -5392,1707.09569,"Chaitanya Malaviya, Graham Neubig, Patrick Littell",Learning Language Representations for Typology Prediction,cs.CL," One central mystery of neural NLP is what neural models ""know"" about their -subject matter. When a neural machine translation system learns to translate -from one language to another, does it learn the syntax or semantics of the -languages? Can this knowledge be extracted from the system to fill holes in -human scientific knowledge? Existing typological databases contain relatively -full feature specifications for only a few hundred languages. Exploiting the -existence of parallel texts in more than a thousand languages, we build a -massive many-to-one neural machine translation (NMT) system from 1017 languages -into English, and use this to predict information missing from typological -databases.
Experiments show that the proposed method is able to infer not only -syntactic, but also phonological and phonetic inventory features, and improves -over a baseline that has access to information about the languages' geographic -and phylogenetic neighbors. -" -5393,1707.09611,"Dilek K\""u\c{c}\""uk",Joint Named Entity Recognition and Stance Detection in Tweets,cs.CL," Named entity recognition (NER) is a well-established task of information -extraction which has been studied for decades. More recently, studies reporting -NER experiments on social media texts have emerged. On the other hand, stance -detection is a considerably new research topic usually considered within the -scope of sentiment analysis. Stance detection studies are mostly applied to -texts of online debates where the stance of the text owner for a particular -target, either explicitly or implicitly mentioned in text, is explored. In this -study, we investigate the possible contribution of named entities to the stance -detection task in tweets. We report the evaluation results of NER experiments -as well as those of the subsequent stance detection experiments using named -entities, on a publicly-available stance-annotated data set of tweets. Our -results indicate that named entities obtained with a high-performance NER -system can contribute to stance detection performance on tweets. -" -5394,1707.09751,"Le Van-Duyet, Vo Minh Quan, Dang Quang An","Skill2vec: Machine Learning Approach for Determining the Relevant Skills - from Job Description",cs.CL," Unsupervised word embeddings have seen tremendous success in numerous -Natural Language Processing (NLP) tasks in recent years. The main contribution -of this paper is to develop a technique called Skill2vec, which applies machine -learning techniques to recruitment to enhance the search strategy for finding -candidates possessing the appropriate skills. Skill2vec is a neural network -architecture inspired by Word2vec, developed by Mikolov et al. in 2013. It -transforms skills into a new vector space, which supports vector arithmetic and -captures relationships between skills. We conducted a manual experimental -evaluation by a recruitment company's domain experts to demonstrate -the effectiveness of our approach. -" -5395,1707.09769,"Ottokar Tilk and Tanel Alum\""ae",Low-Resource Neural Headline Generation,cs.CL," Recent neural headline generation models have shown great results, but are -generally trained on very large datasets. We focus our efforts on improving -headline quality on smaller datasets by means of pretraining. We propose -new methods that enable pre-training all the parameters of the model and -utilize all available text, resulting in improvements by up to 32.4% relative -in perplexity and 2.84 points in ROUGE. -" -5396,1707.09816,Natalia Loukachevitch and Michael Nokel and Kirill Ivanov,Combining Thesaurus Knowledge and Probabilistic Topic Models,cs.CL," In this paper we present the approach of introducing thesaurus knowledge into -probabilistic topic models. The main idea of the approach is based on the -assumption that the frequencies of semantically related words and phrases, -which are met in the same texts, should be enhanced: this action leads to their -larger contribution into topics found in these texts. We have conducted -experiments with several thesauri and found that for improving topic models, it -is useful to utilize domain-specific knowledge.
If a general thesaurus, such as
-WordNet, is used, the thesaurus-based improvement of topic models can be
-achieved by excluding hyponymy relations in combined topic models.
-"
-5397,1707.09823,"Di Jiang, Zeyu Chen, Rongzhong Lian, Siqi Bao, Chen Li",Familia: An Open-Source Toolkit for Industrial Topic Modeling,cs.IR cs.CL," Familia is an open-source toolkit for pragmatic topic modeling in industry.
-Familia abstracts the utilities of topic modeling in industry as two paradigms:
-semantic representation and semantic matching. Efficient implementations of the
-two paradigms are made publicly available for the first time. Furthermore, we
-provide off-the-shelf topic models trained on large-scale industrial corpora,
-including Latent Dirichlet Allocation (LDA), SentenceLDA and Topical Word
-Embedding (TWE). We further describe typical applications which are
-successfully powered by topic modeling, in order to ease the confusion and
-difficulties of software engineers during topic model selection and
-utilization.
-"
-5398,1707.09861,Nils Reimers and Iryna Gurevych,"Reporting Score Distributions Makes a Difference: Performance Study of
- LSTM-networks for Sequence Tagging",cs.CL stat.ML," In this paper we show that reporting a single performance score is
-insufficient to compare non-deterministic approaches. We demonstrate for common
-sequence tagging tasks that the seed value for the random number generator can
-result in statistically significant (p < 10^-4) differences for
-state-of-the-art systems. For two recent systems for NER, we observe an
-absolute difference of one percentage point F1-score depending on the selected
-seed value, making these systems perceived either as state-of-the-art or
-mediocre. Instead of publishing and reporting single performance scores, we
-propose to compare score distributions based on multiple executions. Based on
-the evaluation of 50,000 LSTM networks for five sequence tagging tasks, we
-present network architectures that both produce superior performance and are
-more stable with respect to the remaining hyperparameters.
-"
-5399,1707.09872,"Armand Vilalta, Dario Garcia-Gasulla, Ferran Par\'es, Eduard
- Ayguad\'e, Jesus Labarta, Ulises Cort\'es, Toyotaro Suzumura",Full-Network Embedding in a Multimodal Embedding Pipeline,cs.CV cs.CL cs.NE," The current state-of-the-art for image annotation and image retrieval tasks
-is obtained through deep neural networks, which combine an image representation
-and a text representation into a shared embedding space. In this paper we
-evaluate the impact of using the Full-Network embedding in this setting,
-replacing the original image representation in a competitive multimodal
-embedding generation scheme. Unlike the one-layer image embeddings typically
-used by most approaches, the Full-Network embedding provides a multi-scale
-representation of images, which results in richer characterizations. To measure
-the influence of the Full-Network embedding, we evaluate its performance on
-three different datasets, and compare the results with the original multimodal
-embedding generation scheme when using a one-layer image embedding, and with
-the rest of the state-of-the-art. Results for image annotation and image
-retrieval tasks indicate that the Full-Network embedding is consistently
-superior to the one-layer embedding. These results motivate the integration of
-the Full-Network embedding on any multimodal embedding generation scheme,
-something feasible thanks to the flexibility of the approach.
-" -5400,1707.09879,"Duygu Ataman, Matteo Negri, Marco Turchi, Marcello Federico","Linguistically Motivated Vocabulary Reduction for Neural Machine - Translation from Turkish to English",cs.CL," The necessity of using a fixed-size word vocabulary in order to control the -model complexity in state-of-the-art neural machine translation (NMT) systems -is an important bottleneck on performance, especially for morphologically rich -languages. Conventional methods that aim to overcome this problem by using -sub-word or character-level representations solely rely on statistics and -disregard the linguistic properties of words, which leads to interruptions in -the word structure and causes semantic and syntactic losses. In this paper, we -propose a new vocabulary reduction method for NMT, which can reduce the -vocabulary of a given input corpus at any rate while also considering the -morphological properties of the language. Our method is based on unsupervised -morphology learning and can be, in principle, used for pre-processing any -language pair. We also present an alternative word segmentation method based on -supervised morphological analysis, which aids us in measuring the accuracy of -our model. We evaluate our method in Turkish-to-English NMT task where the -input language is morphologically rich and agglutinative. We analyze different -representation methods in terms of translation accuracy as well as the semantic -and syntactic properties of the generated output. Our method obtains a -significant improvement of 2.3 BLEU points over the conventional vocabulary -reduction technique, showing that it can provide better accuracy in open -vocabulary translation of morphologically rich languages. -" -5401,1707.09920,"Antonio Valerio Miceli Barone and Barry Haddow and Ulrich Germann and - Rico Sennrich",Regularization techniques for fine-tuning in neural machine translation,cs.CL," We investigate techniques for supervised domain adaptation for neural machine -translation where an existing model trained on a large out-of-domain dataset is -adapted to a small in-domain dataset. In this scenario, overfitting is a major -challenge. We investigate a number of techniques to reduce overfitting and -improve transfer learning, including regularization techniques such as dropout -and L2-regularization towards an out-of-domain prior. In addition, we introduce -tuneout, a novel regularization technique inspired by dropout. We apply these -techniques, alone and in combination, to neural machine translation, obtaining -improvements on IWSLT datasets for English->German and English->Russian. We -also investigate the amounts of in-domain training data needed for domain -adaptation in NMT, and find a logarithmic relationship between the amount of -training data and gain in BLEU score. -" -5402,1708.00055,"Daniel Cer, Mona Diab, Eneko Agirre, I\~nigo Lopez-Gazpio and Lucia - Specia","SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and - Cross-lingual Focused Evaluation",cs.CL," Semantic Textual Similarity (STS) measures the meaning similarity of -sentences. Applications include machine translation (MT), summarization, -generation, question answering (QA), short answer grading, semantic search, -dialog and conversational systems. The STS shared task is a venue for assessing -the current state-of-the-art. The 2017 task focuses on multilingual and -cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) -data. 
The task obtained strong participation from 31 teams, with 17 -participating in all language tracks. We summarize performance and review a -selection of well performing methods. Analysis highlights common errors, -providing insight into the limitations of existing models. To support ongoing -work on semantic representations, the STS Benchmark is introduced as a new -shared training and evaluation set carefully selected from the corpus of -English STS shared task data (2012-2017). -" -5403,1708.00077,"Ekaterina Lobacheva, Nadezhda Chirkova, Dmitry Vetrov",Bayesian Sparsification of Recurrent Neural Networks,stat.ML cs.CL cs.LG," Recurrent neural networks show state-of-the-art results in many text analysis -tasks but often require a lot of memory to store their weights. Recently -proposed Sparse Variational Dropout eliminates the majority of the weights in a -feed-forward neural network without significant loss of quality. We apply this -technique to sparsify recurrent neural networks. To account for recurrent -specifics we also rely on Binary Variational Dropout for RNN. We report 99.5% -sparsity level on sentiment analysis task without a quality drop and up to 87% -sparsity level on language modeling task with slight loss of accuracy. -" -5404,1708.00098,"Kyle Richardson, Sina Zarrie{\ss} and Jonas Kuhn",The Code2Text Challenge: Text Generation in Source Code Libraries,cs.CL," We propose a new shared task for tactical data-to-text generation in the -domain of source code libraries. Specifically, we focus on text generation of -function descriptions from example software projects. Data is drawn from -existing resources used for studying the related problem of semantic parser -induction (Richardson and Kuhn, 2017b; Richardson and Kuhn, 2017a), and spans a -wide variety of both natural languages and programming languages. In this -paper, we describe these existing resources, which will serve as training and -development data for the task, and discuss plans for building new independent -test sets. -" -5405,1708.00107,"Bryan McCann, James Bradbury, Caiming Xiong and Richard Socher",Learned in Translation: Contextualized Word Vectors,cs.CL cs.AI cs.LG," Computer vision has benefited from initializing multiple deep layers with -weights pretrained on large supervised training sets like ImageNet. Natural -language processing (NLP) typically sees initialization of only the lowest -layer of deep models with pretrained word vectors. In this paper, we use a deep -LSTM encoder from an attentional sequence-to-sequence model trained for machine -translation (MT) to contextualize word vectors. We show that adding these -context vectors (CoVe) improves performance over using only unsupervised word -and character vectors on a wide variety of common NLP tasks: sentiment analysis -(SST, IMDb), question classification (TREC), entailment (SNLI), and question -answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe -improves performance of our baseline models to the state of the art. -" -5406,1708.00111,"Kartik Goyal, Graham Neubig, Chris Dyer and Taylor Berg-Kirkpatrick","A Continuous Relaxation of Beam Search for End-to-end Training of Neural - Sequence Models",cs.LG cs.CL cs.NE," Beam search is a desirable choice of test-time decoding algorithm for neural -sequence models because it potentially avoids search errors made by simpler -greedy methods. However, typical cross entropy training procedures for these -models do not directly consider the behaviour of the final decoding method. 
As -a result, for cross-entropy trained models, beam decoding can sometimes yield -reduced test performance when compared with greedy decoding. In order to train -models that can more effectively make use of beam search, we propose a new -training procedure that focuses on the final loss metric (e.g. Hamming loss) -evaluated on the output of beam search. While well-defined, this ""direct loss"" -objective is itself discontinuous and thus difficult to optimize. Hence, in our -approach, we form a sub-differentiable surrogate objective by introducing a -novel continuous approximation of the beam search decoding procedure. In -experiments, we show that optimizing this new training objective yields -substantially better results on two sequence tasks (Named Entity Recognition -and CCG Supertagging) when compared with both cross entropy trained greedy -decoding and cross entropy trained beam decoding baselines. -" -5407,1708.00112,"Benjamin J. Lengerich, Andrew L. Maas, Christopher Potts","Retrofitting Distributional Embeddings to Knowledge Graphs with - Functional Relations",stat.ML cs.CL cs.LG," Knowledge graphs are a versatile framework to encode richly structured data -relationships, but it can be challenging to combine these graphs with -unstructured data. Methods for retrofitting pre-trained entity representations -to the structure of a knowledge graph typically assume that entities are -embedded in a connected space and that relations imply similarity. However, -useful knowledge graphs often contain diverse entities and relations (with -potentially disjoint underlying corpora) which do not accord with these -assumptions. To overcome these limitations, we present Functional Retrofitting, -a framework that generalizes current retrofitting methods by explicitly -modeling pairwise relations. Our framework can directly incorporate a variety -of pairwise penalty functions previously developed for knowledge graph -completion. Further, it allows users to encode, learn, and extract information -about relation semantics. We present both linear and neural instantiations of -the framework. Functional Retrofitting significantly outperforms existing -retrofitting methods on complex knowledge graphs and loses no accuracy on -simpler graphs (in which relations do imply similarity). Finally, we -demonstrate the utility of the framework by predicting new drug--disease -treatment pairs in a large, complex health knowledge graph. -" -5408,1708.00133,"Karthik Narasimhan, Regina Barzilay and Tommi Jaakkola",Grounding Language for Transfer in Deep Reinforcement Learning,cs.CL cs.AI cs.LG," In this paper, we explore the utilization of natural language to drive -transfer for reinforcement learning (RL). Despite the wide-spread application -of deep RL techniques, learning generalized policy representations that work -across domains remains a challenging problem. We demonstrate that textual -descriptions of environments provide a compact intermediate channel to -facilitate effective policy transfer. Specifically, by learning to ground the -meaning of text to the dynamics of the environment such as transitions and -rewards, an autonomous agent can effectively bootstrap policy learning on a new -domain given its description. We employ a model-based RL approach consisting of -a differentiable planning module, a model-free component and a factorized state -representation to effectively use entity descriptions. 
Our model outperforms
-prior work on both transfer and multi-task scenarios in a variety of different
-environments. For instance, we achieve up to 14% and 11.5% absolute improvement
-over previously existing models in terms of average and initial rewards,
-respectively.
-"
-5409,1708.00154,"Piji Li, Zihao Wang, Zhaochun Ren, Lidong Bing, Wai Lam","Neural Rating Regression with Abstractive Tips Generation for
- Recommendation",cs.CL cs.AI cs.IR," Recently, some E-commerce sites have launched a new interaction box called
-Tips on their mobile apps. Users can express their experience and feelings or
-provide suggestions using short texts, typically several words or one sentence.
-In essence, writing some tips and giving a numerical rating are two facets of a
-user's product assessment action, expressing the user experience and feelings.
-Jointly modeling these two facets is helpful for designing a better
-recommendation system. While some existing models integrate text information
-such as item specifications or user reviews into user and item latent factors
-for improving the rating prediction, no existing works consider tips for
-improving recommendation quality. We propose a deep learning based framework
-named NRT which can simultaneously predict precise ratings and generate
-abstractive tips with good linguistic quality simulating user experience and
-feelings. For abstractive tips generation, gated recurrent neural networks are
-employed to ""translate"" user and item latent representations into a concise
-sentence. Extensive experiments on benchmark datasets from different domains
-show that NRT achieves significant improvements over the state-of-the-art
-methods. Moreover, the generated tips can vividly predict the user experience
-and feelings.
-"
-5410,1708.00160,Nafise Sadat Moosavi and Michael Strube,"Using Linguistic Features to Improve the Generalization Capability of
- Neural Coreference Resolvers",cs.CL," Coreference resolution is an intermediate step for text understanding. It is
-used in tasks and domains for which we do not necessarily have coreference
-annotated corpora. Therefore, generalization is of special importance for
-coreference resolution. However, while recent coreference resolvers have
-achieved notable improvements on the CoNLL dataset, they struggle to generalize
-properly to new domains or datasets. In this paper, we investigate the role of
-linguistic features in building more generalizable coreference resolvers. We
-show that generalization improves only slightly by merely using a set of
-additional linguistic features. However, employing features and subsets of
-their values that are informative for coreference resolution considerably
-improves generalization. Thanks to better generalization, our system achieves
-state-of-the-art results in out-of-domain evaluations, e.g., on WikiCoref, our
-system, which is trained on CoNLL, achieves on-par performance with a system
-designed for this dataset.
-"
-5411,1708.00179,"Emily Sheng, Prem Natarajan, Jonathan Gordon, and Gully Burns",An Investigation into the Pedagogical Features of Documents,cs.CL," Characterizing the content of a technical document in terms of its learning
-utility can be useful for applications related to education, such as generating
-reading lists from large collections of documents. We refer to this learning
-utility as the ""pedagogical value"" of the document to the learner.
While
-pedagogical value is an important concept that has been studied extensively
-within the education domain, there has been little work exploring it from a
-computational, i.e., natural language processing (NLP), perspective. To allow a
-computational exploration of this concept, we introduce the notion of
-""pedagogical roles"" of documents (e.g., Tutorial and Survey) as an intermediary
-component for the study of pedagogical value. Given the lack of available
-corpora for our exploration, we create the first annotated corpus of
-pedagogical roles and use it to test baseline techniques for automatic
-prediction of such roles.
-"
-5412,1708.00214,"Jan A. Botha, Emily Pitler, Ji Ma, Anton Bakalov, Alex Salcianu, David
- Weiss, Ryan McDonald, Slav Petrov",Natural Language Processing with Small Feed-Forward Networks,cs.CL cs.NE," We show that small and shallow feed-forward neural networks can achieve near
-state-of-the-art results on a range of unstructured and structured language
-processing tasks while being considerably cheaper in memory and computational
-requirements than deep recurrent models. Motivated by resource-constrained
-environments like mobile phones, we showcase simple techniques for obtaining
-such small neural network models, and investigate different tradeoffs when
-deciding how to allocate a small memory budget.
-"
-5413,1708.00241,"Vishaal Jatav, Ravi Teja, Srini Bharadwaj and Venkat Srinivasan",Improving Part-of-Speech Tagging for NLP Pipelines,cs.CL," This paper outlines the results of sentence-level, linguistics-based rules
-for improving part-of-speech tagging. It is well known that the performance of
-complex NLP systems is negatively affected if one of the preliminary stages is
-less than perfect. Errors in the initial stages of the pipeline have a
-snowballing effect on the pipeline's end performance. We have created a set of
-linguistics-based rules at the sentence level which adjust part-of-speech tags
-from state-of-the-art taggers. Comparison with state-of-the-art taggers on
-widely used benchmarks demonstrates significant improvements in tagging
-accuracy and consequently in the quality and accuracy of NLP systems.
-"
-5414,1708.00308,"Ramesh Nallapati, Igor Melnyk, Abhishek Kumar and Bowen Zhou",SenGen: Sentence Generating Neural Variational Topic Model,cs.CL cs.LG stat.ML," We present a new topic model that generates documents by sampling a topic for
-one whole sentence at a time, and generating the words in the sentence using an
-RNN decoder that is conditioned on the topic of the sentence. We argue that
-this novel formalism will help us not only visualize and model the topical
-discourse structure in a document better, but also potentially lead to more
-interpretable topics since we can now illustrate topics by sampling
-representative sentences instead of bags of words or phrases. We present a
-variational auto-encoder approach for learning in which we use a factorized
-variational encoder that independently models the posterior over topical
-mixture vectors of documents using a feed-forward network, and the posterior
-over topic assignments to sentences using an RNN. Our preliminary experiments
-on two different datasets indicate early promise, but also expose many
-challenges that remain to be addressed.
-"
-5415,1708.00391,"Wuwei Lan, Siyu Qiu, Hua He and Wei Xu",A Continuously Growing Dataset of Sentential Paraphrases,cs.CL," A major challenge in paraphrase research is the lack of parallel corpora.
In
-this paper, we present a new method to collect large-scale sentential
-paraphrases from Twitter by linking tweets through shared URLs. The main
-advantage of our method is its simplicity: it eliminates the classifier or
-human-in-the-loop data selection needed before annotation and the subsequent
-application of paraphrase identification algorithms in previous work. We
-present the largest human-labeled paraphrase corpus to date, comprising 51,524
-sentence pairs, and the first cross-domain benchmarking for automatic
-paraphrase identification. In addition, we show that more than 30,000 new
-sentential paraphrases can be easily and continuously captured every month at
-~70% precision, and demonstrate their utility for downstream NLP tasks through
-phrasal paraphrase extraction. We make our code and data freely available.
-"
-5416,1708.00415,"Jianpeng Cheng, Adam Lopez and Mirella Lapata",A Generative Parser with a Discriminative Recognition Algorithm,cs.CL," Generative models defining joint distributions over parse trees and sentences
-are useful for parsing and language modeling, but impose restrictions on the
-scope of features and are often outperformed by discriminative models. We
-propose a framework for parsing and language modeling which marries a
-generative model with a discriminative recognition model in an encoder-decoder
-setting. We provide interpretations of the framework based on expectation
-maximization and variational inference, and show that it enables parsing and
-language modeling within a single implementation. On the English Penn
-Treebank, our framework obtains competitive performance on constituency
-parsing while matching the state-of-the-art single-model language modeling
-score.
-"
-5417,1708.00416,"Joao Sedoc, Derry Wijaya, Masoud Rouhizadeh, Andy Schwartz, Lyle Ungar",Deriving Verb Predicates By Clustering Verbs with Arguments,cs.CL," Hand-built verb clusters such as the widely used Levin classes (Levin, 1993)
-have proved useful, but have limited coverage. Verb classes automatically
-induced from corpus data such as those from VerbKB (Wijaya, 2016), on the other
-hand, can give clusters with much larger coverage, and can be adapted to
-specific corpora such as Twitter. We present a method for clustering the
-outputs of VerbKB: verbs with their multiple argument types, e.g.
-""marry(person, person)"", ""feel(person, emotion)."" We make use of a novel
-low-dimensional embedding of verbs and their arguments to produce high quality
-clusters in which the same verb can be in different clusters depending on its
-argument type. The resulting verb clusters do a better job than hand-built
-clusters of predicting sarcasm, sentiment, and locus of control in tweets.
-"
-5418,1708.00481,"Hidekazu Oiwa, Yoshihiko Suhara, Jiyu Komiya, Andrei Lopatenko",A Lightweight Front-end Tool for Interactive Entity Population,cs.CL cs.IR," Entity population, a task of collecting entities that belong to a particular
-category, has attracted attention from vertical domains. There is still a high
-demand for creating entity dictionaries in vertical domains, which are not
-covered by existing knowledge bases. We develop a lightweight front-end tool
-for facilitating interactive entity population. We implement key components
-necessary for effective interactive entity population: 1) GUI-based dashboards
-to quickly modify an entity dictionary, and 2) entity highlighting on documents
-for quickly viewing the current progress.
We aim to reduce user cost from -beginning to end, including package installation and maintenance. The -implementation enables users to use this tool on their web browsers without any -additional packages --- users can focus on their missions to create entity -dictionaries. Moreover, an entity expansion module is implemented as external -APIs. This design makes it easy to continuously improve interactive entity -population pipelines. We are making our demo publicly available -(http://bit.ly/luwak-demo). -" -5419,1708.00531,"Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris - Dyer, Noah A. Smith, Steve Renals",End-to-End Neural Segmental Models for Speech Recognition,cs.CL cs.LG cs.SD," Segmental models are an alternative to frame-based models for sequence -prediction, where hypothesized path weights are based on entire segment scores -rather than a single frame at a time. Neural segmental models are segmental -models that use neural network-based weight functions. Neural segmental models -have achieved competitive results for speech recognition, and their end-to-end -training has been explored in several studies. In this work, we review neural -segmental models, which can be viewed as consisting of a neural network-based -acoustic encoder and a finite-state transducer decoder. We study end-to-end -segmental models with different weight functions, including ones based on -frame-level neural classifiers and on segmental recurrent neural networks. We -study how reducing the search space size impacts performance under different -weight functions. We also compare several loss functions for end-to-end -training. Finally, we explore training approaches, including multi-stage vs. -end-to-end training and multitask training that combines segmental and -frame-level losses. -" -5420,1708.00549,"Xiang Li, Luke Vilnis, Andrew McCallum",Improved Representation Learning for Predicting Commonsense Ontologies,cs.CL stat.ML," Recent work in learning ontologies (hierarchical and partially-ordered -structures) has leveraged the intrinsic geometry of spaces of learned -representations to make predictions that automatically obey complex structural -constraints. We explore two extensions of one such model, the order-embedding -model for hierarchical relation learning, with an aim towards improved -performance on text data for commonsense knowledge representation. Our first -model jointly learns ordering relations and non-hierarchical knowledge in the -form of raw text. Our second extension exploits the partial order structure of -the training data to find long-distance triplet constraints among embeddings -which are poorly enforced by the pairwise training procedure. We find that both -incorporating free text and augmented training constraints improve over the -original order-embedding model and other strong baselines. -" -5421,1708.00553,"Dung Thai, Shikhar Murty, Trapit Bansal, Luke Vilnis, David Belanger, - Andrew McCallum",Low-Rank Hidden State Embeddings for Viterbi Sequence Labeling,cs.CL," In textual information extraction and other sequence labeling tasks it is now -common to use recurrent neural networks (such as LSTM) to form rich embedded -representations of long-term input co-occurrence patterns. Representation of -output co-occurrence patterns is typically limited to a hand-designed graphical -model, such as a linear-chain CRF representing short-term Markov dependencies -among successive labels. 
This paper presents a method that learns embedded
-representations of latent output structure in sequence data. Our model takes
-the form of a finite-state machine with a large number of latent states per
-label (a latent variable CRF), where the state-transition matrix is
-factorized---effectively forming an embedded representation of
-state-transitions capable of enforcing long-term label dependencies, while
-supporting exact Viterbi inference over output labels. We demonstrate accuracy
-improvements and interpretable latent structure in a synthetic but complex task
-based on CoNLL named entity recognition.
-"
-5422,1708.00563,"Jan Niehues, Eunah Cho, Thanh-Le Ha, Alex Waibel",Analyzing Neural MT Search and Model Performance,cs.CL," In this paper, we offer an in-depth analysis of modeling and search
-performance. We address the question of whether a more complex search algorithm
-is necessary. Furthermore, we investigate whether more complex models, which
-might only be applicable during rescoring, are promising.
- By separating the search space and the modeling using $n$-best list
-reranking, we analyze the influence of both parts of an NMT system
-independently. By comparing differently performing NMT systems, we show that
-the better translation is already in the search space of the weaker translation
-systems. These results indicate that the current search algorithms are
-sufficient for NMT systems. Furthermore, we show that even a relatively small
-$n$-best list of $50$ hypotheses already contains notably better translations.
-"
-5423,1708.00625,"Piji Li, Wai Lam, Lidong Bing, Zihao Wang",Deep Recurrent Generative Decoder for Abstractive Text Summarization,cs.CL cs.AI," We propose a new framework for abstractive text summarization based on a
-sequence-to-sequence oriented encoder-decoder model equipped with a deep
-recurrent generative decoder (DRGN).
- Latent structure information implied in the target summaries is learned based
-on a recurrent latent random model for improving the summarization quality.
- Neural variational inference is employed to address the intractable posterior
-inference for the recurrent latent variables.
- Abstractive summaries are generated based on both the generative latent
-variables and the discriminative deterministic states.
- Extensive experiments on some benchmark datasets in different languages show
-that DRGN achieves improvements over the state-of-the-art methods.
-"
-5424,1708.00667,"Takuya Hiraoka, Masaaki Tsuchida, Yotaro Watanabe","Deep Reinforcement Learning for Inquiry Dialog Policies with Logical
- Formula Embeddings",cs.AI cs.CL," This paper is the first attempt to learn the policy of an inquiry dialog
-system (IDS) by using deep reinforcement learning (DRL). Most IDS frameworks
-represent dialog states and dialog acts with logical formulae. In order to make
-learning inquiry dialog policies more effective, we introduce a logical formula
-embedding framework based on a recursive neural network. The results of
-experiments to evaluate the effect of 1) the DRL and 2) the logical formula
-embedding framework show that the combination of the two is as effective as or
-even better than existing rule-based methods for inquiry dialog policies.
-" -5425,1708.00712,"Marlies van der Wees, Arianna Bisazza and Christof Monz",Dynamic Data Selection for Neural Machine Translation,cs.CL," Intelligent selection of training data has proven a successful technique to -simultaneously increase training efficiency and translation performance for -phrase-based machine translation (PBMT). With the recent increase in popularity -of neural machine translation (NMT), we explore in this paper to what extent -and how NMT can also benefit from data selection. While state-of-the-art data -selection (Axelrod et al., 2011) consistently performs well for PBMT, we show -that gains are substantially lower for NMT. Next, we introduce dynamic data -selection for NMT, a method in which we vary the selected subset of training -data between different training epochs. Our experiments show that the best -results are achieved when applying a technique we call gradual fine-tuning, -with improvements up to +2.6 BLEU over the original data selection approach and -up to +3.1 BLEU over a general baseline. -" -5426,1708.00726,"Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry - Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone and Philip Williams",The University of Edinburgh's Neural MT Systems for WMT17,cs.CL," This paper describes the University of Edinburgh's submissions to the WMT17 -shared news translation and biomedical translation tasks. We participated in 12 -translation directions for news, translating between English and Czech, German, -Latvian, Russian, Turkish and Chinese. For the biomedical task we submitted -systems for English to Czech, German, Polish and Romanian. Our systems are -neural machine translation systems trained with Nematus, an attentional -encoder-decoder. We follow our setup from last year and build BPE-based models -with parallel and back-translated monolingual training data. Novelties this -year include the use of deep architectures, layer normalization, and more -compact models due to weight tying and improvements in BPE segmentations. We -perform extensive ablative experiments, reporting on the effectivenes of layer -normalization, deep architectures, and different ensembling techniques. -" -5427,1708.00781,"Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, Noah A. - Smith",Dynamic Entity Representations in Neural Language Models,cs.CL cs.LG," Understanding a long document requires tracking how entities are introduced -and evolve over time. We present a new type of language model, EntityNLM, that -can explicitly model entities, dynamically update their representations, and -contextually generate their mentions. Our model is generative and flexible; it -can model an arbitrary number of entities in context while generating each -entity mention at an arbitrary length. In addition, it can be used for several -different tasks such as language modeling, coreference resolution, and entity -prediction. Experimental results with all these tasks demonstrate that our -model consistently outperforms strong baselines and prior work. -" -5428,1708.00790,"Yong Jiang, Wenjuan Han, Kewei Tu","Combining Generative and Discriminative Approaches to Unsupervised - Dependency Parsing via Dual Decomposition",cs.CL," Unsupervised dependency parsing aims to learn a dependency parser from -unannotated sentences. Existing work focuses on either learning generative -models using the expectation-maximization algorithm and its variants, or -learning discriminative models using the discriminative clustering algorithm. 
-In this paper, we propose a new learning strategy that learns a generative
-model and a discriminative model jointly based on the dual decomposition
-method. Our method is simple and general, yet effective to capture the
-advantages of both models and improve their learning results. We tested our
-method on the UD treebank and achieved state-of-the-art performance on thirty
-languages.
-"
-5429,1708.00801,"Wenjuan Han, Yong Jiang, Kewei Tu","Dependency Grammar Induction with Neural Lexicalization and Big Training
- Data",cs.CL," We study the impact of big models (in terms of the degree of lexicalization)
-and big data (in terms of the training corpus size) on dependency grammar
-induction. We experimented with L-DMV, a lexicalized version of the Dependency
-Model with Valence, and L-NDMV, our lexicalized extension of the Neural
-Dependency Model with Valence. We find that L-DMV only benefits from very small
-degrees of lexicalization and moderate sizes of training corpora. L-NDMV can
-benefit from big training data and lexicalization of greater degrees,
-especially when enhanced with good model initialization, and it achieves a
-result that is competitive with the current state-of-the-art.
-"
-5430,1708.00818,"Grishma Jena, Mansi Vashisht, Abheek Basu, Lyle Ungar, Jo\~ao Sedoc",Enterprise to Computer: Star Trek chatbot,cs.CL," Human interactions and human-computer interactions are strongly influenced by
-style as well as content. Adding a persona to a chatbot makes it more
-human-like and contributes to a better and more engaging user experience. In
-this work, we propose a design for a chatbot that captures the ""style"" of Star
-Trek by incorporating references from the show along with peculiar tones of the
-fictional characters therein. Our Enterprise to Computer bot (E2Cbot) treats
-Star Trek dialog style and general dialog style differently, using two
-recurrent neural network Encoder-Decoder models. The Star Trek dialog style
-uses sequence to sequence (SEQ2SEQ) models (Sutskever et al., 2014; Bahdanau et
-al., 2014) trained on Star Trek dialogs. The general dialog style uses Word
-Graph to shift the response of the SEQ2SEQ model into the Star Trek domain. We
-evaluate the bot both in terms of perplexity and word overlap with Star Trek
-vocabulary and subjectively using human evaluators.
-"
-5431,1708.00850,"Wlodek Zadrozny, Hossein Hematialam and Luciana Garbayo","Towards Semantic Modeling of Contradictions and Disagreements: A Case
- Study of Medical Guidelines",cs.CL," We introduce a formal distinction between contradictions and disagreements in
-natural language texts, motivated by the need to formally reason about
-contradictory medical guidelines. This is a novel and potentially very useful
-distinction, and has not been discussed so far in NLP and logic. We also
-describe an NLP system capable of automatically finding contradictory medical
-guidelines; the system uses a combination of text analysis and information
-retrieval modules. We also report positive evaluation results on a small corpus
-of contradictory medical recommendations.
-"
-5432,1708.00897,"Sajal Choudhary, Prerna Srivastava, Lyle Ungar, Jo\~ao Sedoc",Domain Aware Neural Dialog System,cs.CL," We investigate the task of building a domain aware chat system which
-generates intelligent responses in a conversation comprising different
-domains. The domain, in this case, is the topic or theme of the conversation.
-To achieve this, we present DOM-Seq2Seq, a domain-aware neural network model
-based on the novel technique of using domain-targeted sequence-to-sequence
-models (Sutskever et al., 2014) and a domain classifier. The model captures
-features from the current utterance and the domains of the previous utterances
-to facilitate the formation of relevant responses. We evaluate our model on
-automatic metrics and compare our performance with the Seq2Seq model.
-"
-5433,1708.00993,"Jan Niehues, Eunah Cho","Exploiting Linguistic Resources for Neural Machine Translation Using
- Multi-task Learning",cs.CL," Linguistic resources such as part-of-speech (POS) tags have been extensively
-used in statistical machine translation (SMT) frameworks and have yielded
-better performances. However, usage of such linguistic annotations in neural
-machine translation (NMT) systems has been left under-explored.
- In this work, we show that multi-task learning is a successful and easy
-approach to introduce additional knowledge into an end-to-end neural
-attentional model. By jointly training several natural language processing
-(NLP) tasks in one system, we are able to leverage common information and
-improve the performance of the individual tasks.
- We analyze the impact of three design decisions in multi-task learning: the
-tasks used in training, the training schedule, and the degree of parameter
-sharing across the tasks, which is defined by the network architecture. The
-experiments are conducted for a German-to-English translation task. As
-additional linguistic resources, we exploit POS information and named entities
-(NE). Experiments show that the translation quality can be improved by up to
-1.5 BLEU points under the low-resource condition. The performance of the POS
-tagger is also improved using the multi-task learning scheme.
-"
-5434,1708.01009,"Stephen Merity, Bryan McCann, Richard Socher",Revisiting Activation Regularization for Language RNNs,cs.CL cs.NE," Recurrent neural networks (RNNs) serve as a fundamental building block for
-many sequence tasks across natural language processing. Recent research has
-focused on recurrent dropout techniques or custom RNN cells in order to improve
-performance. Both of these can require substantial modifications to the machine
-learning model or to the underlying RNN configurations. We revisit traditional
-regularization techniques, specifically L2 regularization on RNN activations
-and slowness regularization over successive hidden states, to improve the
-performance of RNNs on the task of language modeling. Both of these techniques
-require minimal modification to existing RNN architectures and result in
-performance improvements comparable or superior to more complicated
-regularization techniques or custom cell architectures. These regularization
-techniques can be used without any modification on optimized LSTM
-implementations such as the NVIDIA cuDNN LSTM.
-"
-5435,1708.01018,"Jiong Cai, Yong Jiang, Kewei Tu",CRF Autoencoder for Unsupervised Dependency Parsing,cs.CL," Unsupervised dependency parsing, which tries to discover linguistic
-dependency structures from unannotated data, is a very challenging task. Almost
-all previous work on this task focuses on learning generative models. In this
-paper, we develop an unsupervised dependency parsing model based on the CRF
-autoencoder. The encoder part of our model is discriminative and globally
-normalized, which allows us to use rich features as well as universal
-linguistic priors.
We propose an exact algorithm for parsing as well as a tractable
-learning algorithm. We evaluated the performance of our model on eight
-multilingual treebanks and found that our model achieved comparable performance
-with state-of-the-art approaches.
-"
-5436,1708.01065,"Piji Li, Lidong Bing, Wai Lam","Reader-Aware Multi-Document Summarization: An Enhanced Model and The
- First Dataset",cs.CL cs.AI," We investigate the problem of reader-aware multi-document summarization
-(RA-MDS) and introduce a new dataset for this problem. To tackle RA-MDS, we
-extend a variational auto-encoder (VAE) based MDS framework by jointly
-considering news documents and reader comments. To evaluate summarization
-performance, we prepare a new dataset. We describe the methods for data
-collection, aspect annotation, and summary writing, as well as scrutiny of the
-summaries by experts. Experimental results show that reader comments can
-improve the summarization performance, which also demonstrates the usefulness
-of the proposed dataset. The annotated dataset for RA-MDS is available online.
-"
-5437,1708.01318,"Amr Sharaf, Shi Feng, Khanh Nguyen, Kiant\'e Brantley, Hal Daum\'e III",The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task,cs.CL cs.AI cs.HC," We describe the University of Maryland machine translation systems submitted
-to the WMT17 German-English Bandit Learning Task. The task is to adapt a
-translation system to a new domain, using only bandit feedback: the system
-receives a German sentence to translate, produces an English sentence, and only
-gets a scalar score as feedback. Targeting these two challenges (adaptation and
-bandit learning), we built a standard neural machine translation system and
-extended it in two ways: (1) robust reinforcement learning techniques to learn
-effectively from the bandit feedback, and (2) domain adaptation using data
-selection from a large corpus of parallel data.
-"
-5438,1708.01336,"Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin
- Farfade, Alexander Hauptmann",MemexQA: Visual Memex Question Answering,cs.CV cs.CL," This paper proposes a new task, MemexQA: given a collection of photos or
-videos from a user, the goal is to automatically answer questions that help
-users recover their memory about events captured in the collection. Towards
-solving the task, we 1) present the MemexQA dataset, a large, realistic
-multimodal dataset consisting of real personal photos and crowd-sourced
-questions/answers, 2) propose MemexNet, a unified, end-to-end trainable network
-architecture for image, text and video question answering. Experimental results
-on the MemexQA dataset demonstrate that MemexNet outperforms strong baselines
-and yields the state-of-the-art on this novel and challenging task. The
-promising results on TextQA and VideoQA suggest MemexNet's efficacy and
-scalability across various QA tasks.
-"
-5439,1708.01353,"Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, Diana Inkpen","Recurrent Neural Network-Based Sentence Encoder with Gated Attention for
- Natural Language Inference",cs.CL," The RepEval 2017 Shared Task aims to evaluate natural language understanding
-models for sentence representation, in which a sentence is represented as a
-fixed-length vector with neural networks and the quality of the representation
-is tested with a natural language inference task.
This paper describes our
-system (alpha), which ranked among the top systems in the Shared Task, on both
-the in-domain test set (obtaining a 74.9% accuracy) and on the cross-domain
-test set (also attaining a 74.9% accuracy), demonstrating that the model
-generalizes well to the cross-domain data. Our model is equipped with
-intra-sentence gated-attention composition which helps achieve better
-performance. In addition to submitting our model to the Shared Task, we have
-also tested it on the Stanford Natural Language Inference (SNLI) dataset. We
-obtain an accuracy of 85.5%, which is the best reported result on SNLI when
-cross-sentence attention is not allowed, the same condition enforced in
-RepEval 2017.
-"
-5440,1708.01372,"Benjamin Shickel, Martin Heesacker, Sherry Benton, Parisa Rashidi","Hashtag Healthcare: From Tweets to Mental Health Journals Using Deep
- Transfer Learning",cs.CL cs.CY," As the popularity of social media platforms continues to rise, an
-ever-increasing amount of human communication and self-expression takes place
-online. Most recent research has focused on mining social media for public user
-opinion about external entities such as product reviews or sentiment towards
-political news. However, less attention has been paid to analyzing users'
-internalized thoughts and emotions from a mental health perspective. In this
-paper, we quantify the semantic difference between public Tweets and private
-mental health journals used in online cognitive behavioral therapy. We use
-deep transfer learning techniques to analyze the semantic gap between the
-two domains. We show that for the task of emotional valence prediction, social
-media can be successfully harnessed to create more accurate, robust, and
-personalized mental health models. Our results suggest that the semantic gap
-between public and private self-expression is small, and that utilizing the
-abundance of available social media is one way to overcome the small sample
-sizes of mental health data, which are commonly limited by availability and
-privacy concerns.
-"
-5441,1708.01425,Ivan Habernal and Henning Wachsmuth and Iryna Gurevych and Benno Stein,"The Argument Reasoning Comprehension Task: Identification and
- Reconstruction of Implicit Warrants",cs.CL cs.AI," Reasoning is a crucial part of natural language argumentation. To comprehend
-an argument, one must analyze its warrant, which explains why its claim follows
-from its premises. As arguments are highly contextualized, warrants are usually
-presupposed and left implicit. Thus, the comprehension does not only require
-language understanding and logic skills, but also depends on common sense. In
-this paper we develop a methodology for reconstructing warrants systematically.
-We operationalize it in a scalable crowdsourcing process, resulting in a freely
-licensed dataset with warrants for 2k authentic arguments from news comments.
-On this basis, we present a new challenging task, the argument reasoning
-comprehension task. Given an argument with a claim and a premise, the goal is
-to choose the correct implicit warrant from two options. Both warrants are
-plausible and lexically close, but lead to contradicting claims. A solution to
-this task will define a substantial step towards automatic warrant
-reconstruction. However, experiments with several neural attention and language
-models reveal that current approaches do not suffice.
-" -5442,1708.01464,"Ben Peters, Jon Dehdari and Josef van Genabith",Massively Multilingual Neural Grapheme-to-Phoneme Conversion,cs.CL," Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and -automatic speech recognition systems. Most g2p systems are monolingual: they -require language-specific data or handcrafting of rules. Such systems are -difficult to extend to low resource languages, for which data and handcrafted -rules are not available. As an alternative, we present a neural -sequence-to-sequence approach to g2p which is trained on -spelling--pronunciation pairs in hundreds of languages. The system shares a -single encoder and decoder across all languages, allowing it to utilize the -intrinsic similarities between different writing systems. We show an 11% -improvement in phoneme error rate over an approach based on adapting -high-resource monolingual g2p models to low-resource languages. Our model is -also much more compact relative to previous approaches. -" -5443,1708.01525,"Angel J. Gallego, Roman Orus",Language Design as Information Renormalization,cs.CL cond-mat.str-el physics.hist-ph quant-ph," Here we consider some well-known facts in syntax from a physics perspective, -allowing us to establish equivalences between both fields with many -consequences. Mainly, we observe that the operation MERGE, put forward by N. -Chomsky in 1995, can be interpreted as a physical information coarse-graining. -Thus, MERGE in linguistics entails information renormalization in physics, -according to different time scales. We make this point mathematically formal in -terms of language models. In this setting, MERGE amounts to a probability -tensor implementing a coarse-graining, akin to a probabilistic context-free -grammar. The probability vectors of meaningful sentences are given by -stochastic tensor networks (TN) built from diagonal tensors and which are -mostly loop-free, such as Tree Tensor Networks and Matrix Product States, thus -being computationally very efficient to manipulate. We show that this implies -the polynomially-decaying (long-range) correlations experimentally observed in -language, and also provides arguments in favour of certain types of neural -networks for language processing. Moreover, we show how to obtain such language -models from quantum states that can be efficiently prepared on a quantum -computer, and use this to find bounds on the perplexity of the probability -distribution of words in a sentence. Implications of our results are discussed -across several ambits. -" -5444,1708.01565,"Michael Wand, Juergen Schmidhuber","Improving Speaker-Independent Lipreading with Domain-Adversarial - Training",cs.CV cs.CL," We present a Lipreading system, i.e. a speech recognition system using only -visual features, which uses domain-adversarial training for speaker -independence. Domain-adversarial training is integrated into the optimization -of a lipreader based on a stack of feedforward and LSTM (Long Short-Term -Memory) recurrent neural networks, yielding an end-to-end trainable system -which only requires a very small number of frames of untranscribed target data -to substantially improve the recognition accuracy on the target speaker. On -pairs of different source and target speakers, we achieve a relative accuracy -improvement of around 40% with only 15 to 20 seconds of untranscribed target -speech data. On multi-speaker training setups, the accuracy improvements are -smaller but still substantial. -" -5445,1708.01677,Martin Gerlach and Tiago P. 
Peixoto and Eduardo G. Altmann,A network approach to topic models,stat.ML cs.CL physics.data-an physics.soc-ph," One of the main computational and scientific challenges in the modern age is
-to extract useful information from unstructured texts. Topic models are one
-popular machine-learning approach which infers the latent topical structure of
-a collection of documents. Despite their success --- in particular of their
-most widely used variant called Latent Dirichlet Allocation (LDA) --- and
-numerous applications in sociology, history, and linguistics, topic models are
-known to suffer from severe conceptual and practical problems, e.g. a lack of
-justification for the Bayesian priors, discrepancies with statistical
-properties of real texts, and the inability to properly choose the number of
-topics. Here we obtain a fresh view on the problem of identifying topical
-structures by relating it to the problem of finding communities in complex
-networks. This is achieved by representing text corpora as bipartite networks
-of documents and words. By adapting existing community-detection methods --
-using a stochastic block model (SBM) with non-parametric priors -- we obtain a
-more versatile and principled framework for topic modeling (e.g., it
-automatically detects the number of topics and hierarchically clusters both the
-words and documents). The analysis of artificial and real corpora demonstrates
-that our SBM approach leads to better topic models than LDA in terms of
-statistical model selection. More importantly, our work shows how to formally
-relate methods from community detection and topic modeling, opening the
-possibility of cross-fertilization between these two fields.
-"
-5446,1708.01681,"Octavia-Maria Sulea, Marcos Zampieri, Mihaela Vela, Josef van Genabith",Predicting the Law Area and Decisions of French Supreme Court Cases,cs.CL," In this paper, we investigate the application of text classification methods
-to predict the law area and the decision of cases judged by the French Supreme
-Court. We also investigate the influence of the time period in which a ruling
-was made over the textual form of the case description and the extent to which
-it is necessary to mask the judge's motivation for a ruling to emulate a
-real-world test scenario. We report results of 96% f1 score in predicting a
-case ruling, 90% f1 score in predicting the law area of a case, and 75.9% f1
-score in estimating the time span when a ruling has been issued using a linear
-Support Vector Machine (SVM) classifier trained on lexical features.
-"
-5447,1708.01713,"Shervin Minaee, Zhu Liu",Automatic Question-Answering Using A Deep Similarity Neural Network,cs.CL," Automatic question-answering is a classical problem in natural language
-processing, which aims at designing systems that can automatically answer a
-question in the same way as a human does. In this work, we propose a deep
-learning based model for automatic question-answering. First, the questions and
-answers are embedded using neural probabilistic modeling. Then a deep
-similarity neural network is trained to find the similarity score of a pair of
-answer and question. Then for each question, the best answer is found as the
-one with the highest similarity score. We first train this model on a
-large-scale public question-answering database, and then fine-tune it to
-transfer to the customer-care chat data. We have also tested our framework on a
-public question-answering database and achieved very good performance.
-" -5448,1708.01759,"Ond\v{r}ej Du\v{s}ek, Jekaterina Novikova, Verena Rieser",Referenceless Quality Estimation for Natural Language Generation,cs.CL," Traditional automatic evaluation measures for natural language generation -(NLG) use costly human-authored references to estimate the quality of a system -output. In this paper, we propose a referenceless quality estimation (QE) -approach based on recurrent neural networks, which predicts a quality score for -a NLG system output by comparing it to the source meaning representation only. -Our method outperforms traditional metrics and a constant baseline in most -respects; we also show that synthetic data helps to increase correlation -results by 21% compared to the base system. Our results are comparable to -results obtained in similar QE tasks despite the more challenging setting. -" -5449,1708.01766,"Sanghyuk Choi, Taeuk Kim, Jinseok Seol, Sang-goo Lee",A Syllable-based Technique for Word Embeddings of Korean Words,cs.CL," Word embedding has become a fundamental component to many NLP tasks such as -named entity recognition and machine translation. However, popular models that -learn such embeddings are unaware of the morphology of words, so it is not -directly applicable to highly agglutinative languages such as Korean. We -propose a syllable-based learning model for Korean using a convolutional neural -network, in which word representation is composed of trained syllable vectors. -Our model successfully produces morphologically meaningful representation of -Korean words compared to the original Skip-gram embeddings. The results also -show that it is quite robust to the Out-of-Vocabulary problem. -" -5450,1708.01769,Jorge V. Tohalino and Diego R. Amancio,"Extractive Multi Document Summarization using Dynamical Measurements of - Complex Networks",cs.CL," Due to the large amount of textual information available on Internet, it is -of paramount relevance to use techniques that find relevant and concise -content. A typical task devoted to the identification of informative sentences -in documents is the so called extractive document summarization task. In this -paper, we use complex network concepts to devise an extractive Multi Document -Summarization (MDS) method, which extracts the most central sentences from -several textual sources. In the proposed model, texts are represented as -networks, where nodes represent sentences and the edges are established based -on the number of shared words. Differently from previous works, the -identification of relevant terms is guided by the characterization of nodes via -dynamical measurements of complex networks, including symmetry, accessibility -and absorption time. The evaluation of the proposed system revealed that -excellent results were obtained with particular dynamical measurements, -including those based on the exploration of networks via random walks. -" -5451,1708.01771,"Rongxiang Weng, Shujian Huang, Zaixiang Zheng, Xinyu Dai and Jiajun - Chen",Neural Machine Translation with Word Predictions,cs.CL," In the encoder-decoder architecture for neural machine translation (NMT), the -hidden states of the recurrent structures in the encoder and decoder carry the -crucial information about the sentence.These vectors are generated by -parameters which are updated by back-propagation of translation errors through -time. We argue that propagating errors through the end-to-end recurrent -structures are not a direct way of control the hidden vectors. 
In this paper, -we propose to use word predictions as a mechanism for direct supervision. More -specifically, we require these vectors to be able to predict the vocabulary in -target sentence. Our simple mechanism ensures better representations in the -encoder and decoder without using any extra data or annotation. It is also -helpful in reducing the target side vocabulary and improving the decoding -efficiency. Experiments on Chinese-English and German-English machine -translation tasks show BLEU improvements by 4.53 and 1.3, respectively -" -5452,1708.01776,"Clemens Rosenbaum, Tian Gao, Tim Klinger",e-QRAQ: A Multi-turn Reasoning Dataset and Simulator with Explanations,cs.LG cs.AI cs.CL," In this paper we present a new dataset and user simulator e-QRAQ (explainable -Query, Reason, and Answer Question) which tests an Agent's ability to read an -ambiguous text; ask questions until it can answer a challenge question; and -explain the reasoning behind its questions and answer. The User simulator -provides the Agent with a short, ambiguous story and a challenge question about -the story. The story is ambiguous because some of the entities have been -replaced by variables. At each turn the Agent may ask for the value of a -variable or try to answer the challenge question. In response the User -simulator provides a natural language explanation of why the Agent's query or -answer was useful in narrowing down the set of possible answers, or not. To -demonstrate one potential application of the e-QRAQ dataset, we train a new -neural architecture based on End-to-End Memory Networks to successfully -generate both predictions and partial explanations of its current understanding -of the problem. We observe a strong correlation between the quality of the -prediction and explanation. -" -5453,1708.01809,"Eva Hasler, Felix Stahlberg, Marcus Tomalin, Adri`a de Gispert, Bill - Byrne",A Comparison of Neural Models for Word Ordering,cs.CL," We compare several language models for the word-ordering task and propose a -new bag-to-sequence neural model based on attention-based sequence-to-sequence -models. We evaluate the model on a large German WMT data set where it -significantly outperforms existing models. We also describe a novel search -strategy for LM-based word ordering and report results on the English Penn -Treebank. Our best model setup outperforms prior work both in terms of speed -and quality. -" -5454,1708.01944,"Abram Handler, Brendan O'Connor",Rookie: A unique approach for exploring news archives,cs.HC cs.CL," News archives are an invaluable primary source for placing current events in -historical context. But current search engine tools do a poor job at uncovering -broad themes and narratives across documents. We present Rookie: a practical -software system which uses natural language processing (NLP) to help readers, -reporters and editors uncover broad stories in news archives. Unlike prior -work, Rookie's design emerged from 18 months of iterative development in -consultation with editors and computational journalists. This process lead to a -dramatically different approach from previous academic systems with similar -goals. Our efforts offer a generalizable case study for others building -real-world journalism software using NLP. 
-" -5455,1708.01980,"Xing Wang, Zhaopeng Tu, Deyi Xiong and Min Zhang",Translating Phrases in Neural Machine Translation,cs.CL," Phrases play an important role in natural language understanding and machine -translation (Sag et al., 2002; Villavicencio et al., 2005). However, it is -difficult to integrate them into current neural machine translation (NMT) which -reads and generates sentences word by word. In this work, we propose a method -to translate phrases in NMT by integrating a phrase memory storing target -phrases from a phrase-based statistical machine translation (SMT) system into -the encoder-decoder architecture of NMT. At each decoding step, the phrase -memory is first re-written by the SMT model, which dynamically generates -relevant target phrases with contextual information provided by the NMT model. -Then the proposed model reads the phrase memory to make probability estimations -for all phrases in the phrase memory. If phrase generation is carried on, the -NMT decoder selects an appropriate phrase from the memory to perform phrase -translation and updates its decoding state by consuming the words in the -selected phrase. Otherwise, the NMT decoder generates a word from the -vocabulary as the general NMT decoder does. Experiment results on the Chinese -to English translation show that the proposed model achieves significant -improvements over the baseline on various test sets. -" -5456,1708.02005,"Yang Feng, Shiyue Zhang, Andi Zhang, Dong Wang, Andrew Abel",Memory-augmented Neural Machine Translation,cs.CL," Neural machine translation (NMT) has achieved notable success in recent -times, however it is also widely recognized that this approach has limitations -with handling infrequent words and word pairs. This paper presents a novel -memory-augmented NMT (M-NMT) architecture, which stores knowledge about how -words (usually infrequently encountered ones) should be translated in a memory -and then utilizes them to assist the neural model. We use this memory mechanism -to combine the knowledge learned from a conventional statistical machine -translation system and the rules learned by an NMT system, and also propose a -solution for out-of-vocabulary (OOV) words based on this framework. Our -experiments on two Chinese-English translation tasks demonstrated that the -M-NMT architecture outperformed the NMT baseline by $9.0$ and $2.7$ BLEU points -on the two tasks, respectively. Additionally, we found this architecture -resulted in a much more effective OOV treatment compared to competitive -methods. -" -5457,1708.02043,"Marc Tanti, Albert Gatt and Kenneth P. Camilleri","What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption - Generator?",cs.CL cs.CV cs.NE," In neural image captioning systems, a recurrent neural network (RNN) is -typically viewed as the primary `generation' component. This view suggests that -the image features should be `injected' into the RNN. This is in fact the -dominant view in the literature. Alternatively, the RNN can instead be viewed -as only encoding the previously generated words. This view suggests that the -RNN should only be used to encode linguistic features and that only the final -representation should be `merged' with the image features at a later stage. -This paper compares these two architectures. We find that, in general, late -merging outperforms injection, suggesting that RNNs are better viewed as -encoders, rather than generators. 
-"
-5458,1708.02099,"Chi Thang Duong, Remi Lebret, Karl Aberer",Multimodal Classification for Analysing Social Media,cs.CL cs.IR cs.SI," Classification of social media data is an important approach to understanding
-user behavior on the Web. Although information on social media can be of
-different modalities such as texts, images, audio or videos, traditional
-approaches in classification usually leverage only one prominent modality.
-Techniques that are able to leverage multiple modalities are often complex and
-susceptible to the absence of some modalities. In this paper, we present simple
-models that combine information from different modalities to classify social
-media content and are able to handle the above problems of existing
-techniques. Our models combine information from different modalities using a
-pooling layer, and an auxiliary learning task is used to learn a common feature
-space. We demonstrate the performance of our models and their robustness when
-some modalities are missing in the emotion classification domain. Our
-approaches, although simple, not only achieve significantly higher
-accuracies than traditional fusion approaches but also yield comparable results
-when only one modality is available.
-"
-5459,1708.02182,"Stephen Merity, Nitish Shirish Keskar, Richard Socher",Regularizing and Optimizing LSTM Language Models,cs.CL cs.LG cs.NE," Recurrent neural networks (RNNs), such as long short-term memory networks
-(LSTMs), serve as a fundamental building block for many sequence learning
-tasks, including machine translation, language modeling, and question
-answering. In this paper, we consider the specific problem of word-level
-language modeling and investigate strategies for regularizing and optimizing
-LSTM-based models. We propose the weight-dropped LSTM, which uses DropConnect on
-hidden-to-hidden weights as a form of recurrent regularization. Further, we
-introduce NT-ASGD, a variant of the averaged stochastic gradient method,
-wherein the averaging trigger is determined using a non-monotonic condition as
-opposed to being tuned by the user. Using these and other regularization
-strategies, we achieve state-of-the-art word-level perplexities on two data
-sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the
-effectiveness of a neural cache in conjunction with our proposed model, we
-achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and
-52.0 on WikiText-2.
-"
-5460,1708.02210,"Qing Ping, Chaomei Chen","Video Highlights Detection and Summarization with Lag-Calibration based
- on Concept-Emotion Mapping of Crowd-sourced Time-Sync Comments",cs.CL cs.IR," With the prevalence of video sharing, there are increasing demands for
-automatic video digestion such as highlight detection. Recently, platforms with
-crowdsourced time-sync video comments have emerged worldwide, providing a good
-opportunity for highlight detection. However, this task is non-trivial: (1)
-time-sync comments often lag behind their corresponding shot; (2) time-sync
-comments are semantically sparse and noisy; (3) to determine which shots are
-highlights is highly subjective. The present paper aims to tackle these
-challenges by proposing a framework that (1) uses concept-mapped lexical-chains
-for lag calibration; (2) models video highlights based on comment intensity and
-the combination of emotion and concept concentration of each shot; (3) summarizes
-each detected highlight using an improved SumBasic with emotion and concept
-mapping. Experiments on large real-world datasets show that our highlight
-detection method and summarization method both outperform other benchmarks by
-considerable margins.
-"
-5461,1708.02214,"Qing Ping, Chaomei Chen","LitStoryTeller: An Interactive System for Visual Exploration of
- Scientific Papers Leveraging Named Entities and Comparative Sentences",cs.HC cs.CL cs.DL," The present study proposes LitStoryTeller, an interactive system for visually
-exploring the semantic structure of a scientific article. We demonstrate how
-LitStoryTeller could be used to answer some of the most fundamental research
-questions, such as how a new method was built on top of existing methods, based
-on what theoretical proofs and experimental evidence. More importantly,
-LitStoryTeller can assist users in understanding the full and interesting story
-of a scientific paper, with a concise outline and important details. The
-proposed system borrows a metaphor from screenplays, and visualizes the
-storyline of a scientific paper by arranging its characters (scientific
-concepts or terminologies) and scenes (paragraphs/sentences) into a progressive
-and interactive storyline. Such storylines help to preserve the semantic
-structure and logical thinking process of a scientific paper. Semantic
-structures, such as scientific concepts and comparative sentences, are
-extracted automatically from a scientific paper using existing named entity
-recognition APIs and supervised classifiers. Two supplementary views, a ranked
-entity frequency view and an entity co-occurrence network view, are provided to
-help users identify the ""main plot"" of such scientific storylines. When a
-collection of documents is ready, LitStoryTeller also provides a temporal
-entity evolution view and an entity community view for collection digestion.
-"
-5462,1708.02254,"Justine Zhang, Arthur Spirling, Cristian Danescu-Niculescu-Mizil",Asking Too Much? The Rhetorical Role of Questions in Political Discourse,cs.CL cs.AI cs.CY cs.SI physics.soc-ph," Questions play a prominent role in social interactions, performing rhetorical
-functions that go beyond that of simple informational exchange. The surface
-form of a question can signal the intention and background of the person asking
-it, as well as the nature of their relation with the interlocutor. While the
-informational nature of questions has been extensively examined in the context
-of question-answering applications, their rhetorical aspects have been largely
-understudied.
- In this work we introduce an unsupervised methodology for extracting surface
-motifs that recur in questions, and for grouping them according to their latent
-rhetorical role. By applying this framework to the setting of question sessions
-in the UK parliament, we show that the resulting typology encodes key aspects
-of the political discourse---such as the bifurcation in questioning behavior
-between government and opposition parties---and reveals new insights into the
-effects of a legislator's tenure and political career ambitions.
-"
-5463,1708.02255,"Hiroaki Tsushima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii","Generative Statistical Models with Self-Emergent Grammar of Chord
- Sequences",cs.AI cs.CL cs.SD," Generative statistical models of chord sequences play crucial roles in music
-processing. To capture syntactic similarities among certain chords (e.g. in the
-C major key, between G and G7 and between F and Dm), we study hidden Markov
-models and probabilistic context-free grammar models with latent variables
-describing syntactic categories of chord symbols and their unsupervised
-learning techniques for inducing the latent grammar from data. Surprisingly, we
-find that these models often outperform conventional Markov models in
-predictive power, and the self-emergent categories often correspond to
-traditional harmonic functions. This implies the need for chord categories in
-harmony models from the informatics perspective.
-"
-5464,1708.02267,Ali Ahmadvand and Jinho D. Choi,"ISS-MULT: Intelligent Sample Selection for Multi-Task Learning in
- Question Answering",cs.CL," Transferring knowledge from a source domain to another domain is useful,
-especially when gathering new data is very expensive and time-consuming. Deep
-networks have been well-studied for question answering tasks in recent years;
-however, no prominent research on transfer learning through deep neural
-networks exists in the question answering field. In this paper, two main
-methods (INIT and MULT) in this field are examined. Then, a new method named
-Intelligent Sample Selection (ISS-MULT) is proposed to improve the MULT method
-for question answering tasks. Different datasets, specifically SQuAD, SelQA,
-WikiQA, NewWikiQA and InforBoxQA, are used for evaluation. Moreover, two
-different tasks of question answering - answer selection and answer triggering
-- are evaluated to examine the effectiveness of transfer learning. The results
-show that using transfer learning generally improves the performance if the
-corpora are related and are based on the same policy. In addition, using
-ISS-MULT can further improve the MULT method for question answering tasks, and
-these improvements prove more significant in the answer triggering task.
-"
-5465,1708.02275,"Yadollah Yaghoobzadeh, Heike Adel and Hinrich Sch\""utze",Corpus-level Fine-grained Entity Typing,cs.CL," This paper addresses the problem of corpus-level entity typing, i.e.,
-inferring from a large corpus that an entity is a member of a class such as
-""food"" or ""artist"". The application of entity typing we are interested in is
-knowledge base completion, specifically, to learn which classes an entity is a
-member of. We propose FIGMENT to tackle this problem. FIGMENT is
-embedding-based and combines (i) a global model that scores based on aggregated
-contextual information of an entity and (ii) a context model that first scores
-the individual occurrences of an entity and then aggregates the scores. Each of
-the two proposed models has some specific properties. For the global model,
-learning high-quality entity representations is crucial because it is the only
-source used for the predictions. Therefore, we introduce representations using
-the names and contexts of entities on the three levels of entity, word, and
-character. We show that each carries complementary information and that a
-multi-level representation is the best. For the context model, we need to use
-distant supervision since the context-level labels are not available for
-entities. Distantly supervised labels are noisy and this harms the performance
-of models. Therefore, we introduce and apply new algorithms for noise
-mitigation using multi-instance learning. We show the effectiveness of our
-models on a large entity typing dataset built from Freebase.
-"
-5466,1708.02300,"Ramakanth Pasunuru, Mohit Bansal",Reinforced Video Captioning with Entailment Rewards,cs.CL cs.AI cs.CV cs.LG," Sequence-to-sequence models have shown promising improvements on the temporal
-task of video captioning, but they optimize word-level cross-entropy loss
-during training. First, using policy gradient and mixed-loss methods for
-reinforcement learning, we directly optimize sentence-level task-based metrics
-(as rewards), achieving significant improvements over the baseline, based on
-both automatic metrics and human evaluation on multiple datasets. Next, we
-propose a novel entailment-enhanced reward (CIDEnt) that corrects
-phrase-matching based metrics (such as CIDEr) to only allow for
-logically-implied partial matches and avoid contradictions, achieving further
-significant improvements over the CIDEr-reward model. Overall, our
-CIDEnt-reward model achieves the new state-of-the-art on the MSR-VTT dataset.
-"
-5467,1708.02312,"Yixin Nie, Mohit Bansal",Shortcut-Stacked Sentence Encoders for Multi-Domain Inference,cs.CL cs.AI cs.LG," We present a simple sequential sentence encoder for multi-domain natural
-language inference. Our encoder is based on stacked bidirectional LSTM-RNNs
-with shortcut connections and fine-tuning of word embeddings. The overall
-supervised model uses the above encoder to encode two input sentences into two
-vectors, and then uses a classifier over the vector combination to label the
-relationship between these two sentences as that of entailment, contradiction,
-or neutral. Our Shortcut-Stacked sentence encoders achieve strong improvements
-over existing encoders on matched and mismatched multi-domain natural language
-inference (top non-ensemble single-model result in the EMNLP RepEval 2017
-Shared Task (Nangia et al., 2017)). Moreover, they achieve the new
-state-of-the-art encoding result on the original SNLI dataset (Bowman et al.,
-2015).
-"
-5468,1708.02383,"Meng Fang, Yuan Li and Trevor Cohn",Learning how to Active Learn: A Deep Reinforcement Learning Approach,cs.CL cs.AI cs.LG," Active learning aims to select a small subset of data for annotation such
-that a classifier learned on the data is highly accurate. This is usually done
-using heuristic selection methods; however, the effectiveness of such methods
-is limited and, moreover, the performance of heuristics varies between
-datasets. To address these shortcomings, we introduce a novel formulation by
-reframing active learning as a reinforcement learning problem and explicitly
-learning a data selection policy, where the policy takes the role of the active
-learning heuristic. Importantly, our method allows the selection policy learned
-using simulation on one language to be transferred to other languages. We
-demonstrate our method using cross-lingual named entity recognition, observing
-uniform improvements over traditional active learning.
-"
-5469,1708.02420,"Edison Marrese-Taylor, Jorge A. Balazs, Yutaka Matsuo","Mining fine-grained opinions on closed captions of YouTube videos with
- an attention-RNN",cs.CL," Video reviews are the natural evolution of written product reviews. In this
-paper we target this phenomenon and introduce the first dataset created from
-closed captions of YouTube product review videos as well as a new attention-RNN
-model for aspect extraction and joint aspect extraction and sentiment
-classification.
Our model provides state-of-the-art performance on aspect -extraction without requiring the usage of hand-crafted features on the SemEval -ABSA corpus, while it outperforms the baseline on the joint task. In our -dataset, the attention-RNN model outperforms the baseline for both tasks, but -we observe important performance drops for all models in comparison to SemEval. -These results, as well as further experiments on domain adaptation for aspect -extraction, suggest that differences between speech and written text, which -have been discussed extensively in the literature, also extend to the domain of -product reviews, where they are relevant for fine-grained opinion mining. -" -5470,1708.02561,Daniel Ortega and Ngoc Thang Vu,"Neural-based Context Representation Learning for Dialog Act - Classification",cs.CL," We explore context representation learning methods in neural-based models for -dialog act classification. We propose and compare extensively different methods -which combine recurrent neural network architectures and attention mechanisms -(AMs) at different context levels. Our experimental results on two benchmark -datasets show consistent improvements compared to the models without contextual -information and reveal that the most suitable AM in the architecture depends on -the nature of the dataset. -" -5471,1708.02657,"Xiang Zhang, Yann LeCun","Which Encoding is the Best for Text Classification in Chinese, English, - Japanese and Korean?",cs.CL cs.LG," This article offers an empirical study on the different ways of encoding -Chinese, Japanese, Korean (CJK) and English languages for text classification. -Different encoding levels are studied, including UTF-8 bytes, characters, -words, romanized characters and romanized words. For all encoding levels, -whenever applicable, we provide comparisons with linear models, fastText and -convolutional networks. For convolutional networks, we compare between encoding -mechanisms using character glyph images, one-hot (or one-of-n) encoding, and -embedding. In total there are 473 models, using 14 large-scale text -classification datasets in 4 languages including Chinese, English, Japanese and -Korean. Some conclusions from these results include that byte-level one-hot -encoding based on UTF-8 consistently produces competitive results for -convolutional networks, that word-level n-grams linear models are competitive -even without perfect word segmentation, and that fastText provides the best -result using character-level n-gram encoding but can overfit when the features -are overly rich. -" -5472,1708.02702,"Christophe Van Gysel, Maarten de Rijke and Evangelos Kanoulas",Neural Vector Spaces for Unsupervised Information Retrieval,cs.IR cs.CL," We propose the Neural Vector Space Model (NVSM), a method that learns -representations of documents in an unsupervised manner for news article -retrieval. In the NVSM paradigm, we learn low-dimensional representations of -words and documents from scratch using gradient descent and rank documents -according to their similarity with query representations that are composed from -word representations. We show that NVSM performs better at document ranking -than existing latent semantic vector space methods. The addition of NVSM to a -mixture of lexical language models and a state-of-the-art baseline vector space -model yields a statistically significant increase in retrieval effectiveness. -Consequently, NVSM adds a complementary relevance signal. 
Next to semantic -matching, we find that NVSM performs well in cases where lexical matching is -needed. - NVSM learns a notion of term specificity directly from the document -collection without feature engineering. We also show that NVSM learns -regularities related to Luhn significance. Finally, we give advice on how to -deploy NVSM in situations where model selection (e.g., cross-validation) is -infeasible. We find that an unsupervised ensemble of multiple models trained -with different hyperparameter values performs better than a single -cross-validated model. Therefore, NVSM can safely be used for ranking documents -without supervised relevance judgments. -" -5473,1708.02709,"Tom Young, Devamanyu Hazarika, Soujanya Poria and Erik Cambria",Recent Trends in Deep Learning Based Natural Language Processing,cs.CL," Deep learning methods employ multiple processing layers to learn hierarchical -representations of data and have produced state-of-the-art results in many -domains. Recently, a variety of model designs and methods have blossomed in the -context of natural language processing (NLP). In this paper, we review -significant deep learning related models and methods that have been employed -for numerous NLP tasks and provide a walk-through of their evolution. We also -summarize, compare and contrast the various models and put forward a detailed -understanding of the past, present and future of deep learning in NLP. -" -5474,1708.02711,"Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel","Tips and Tricks for Visual Question Answering: Learnings from the 2017 - Challenge",cs.CV cs.CL," This paper presents a state-of-the-art model for visual question answering -(VQA), which won the first place in the 2017 VQA Challenge. VQA is a task of -significant importance for research in artificial intelligence, given its -multimodal nature, clear evaluation protocol, and potential real-world -applications. The performance of deep neural networks for VQA is very dependent -on choices of architectures and hyperparameters. To help further research in -the area, we describe in detail our high-performing, though relatively simple -model. Through a massive exploration of architectures and hyperparameters -representing more than 3,000 GPU-hours, we identified tips and tricks that lead -to its success, namely: sigmoid outputs, soft training targets, image features -from bottom-up attention, gated tanh activations, output embeddings initialized -using GloVe and Google Images, large mini-batches, and smart shuffling of -training data. We provide a detailed analysis of their impact on performance to -assist others in making an appropriate selection. -" -5475,1708.02912,"Tharindu Weerasooriya, Nandula Perera and S.R. Liyanage","KeyXtract Twitter Model - An Essential Keywords Extraction Model for - Twitter Designed using NLP Tools",cs.CL cs.IR," Since a tweet is limited to 140 characters, it is ambiguous and difficult for -traditional Natural Language Processing (NLP) tools to analyse. This research -presents KeyXtract which enhances the machine learning based Stanford CoreNLP -Part-of-Speech (POS) tagger with the Twitter model to extract essential -keywords from a tweet. The system was developed using rule-based parsers and -two corpora. The data for the research was obtained from a Twitter profile of a -telecommunication company. The system development consisted of two stages. At -the initial stage, a domain specific corpus was compiled after analysing the -tweets. 
The POS tagger extracted the Noun Phrases and Verb Phrases while the
-parsers removed noise and extracted any other keywords missed by the POS
-tagger. The system was evaluated using the Turing Test. After it was tested and
-compared against Stanford CoreNLP, the second stage of the system was developed
-addressing the shortcomings of the first stage. It was enhanced using Named
-Entity Recognition and Lemmatization. The second stage was also tested using
-the Turing Test and its pass rate increased from 50.00% to 83.33%. The
-performance of the final system output was measured using the F1 score.
-Stanford CoreNLP with the Twitter model had an average F1 of 0.69 while the
-improved system had an F1 of 0.77. The accuracy of the system could be improved
-by using a complete domain-specific corpus. Since the system used linguistic
-features of a sentence, it could be applied to other NLP tools.
-"
-5476,1708.02977,Licheng Yu and Mohit Bansal and Tamara L. Berg,Hierarchically-Attentive RNN for Album Summarization and Storytelling,cs.CL cs.AI cs.CV cs.LG," We address the problem of end-to-end visual storytelling. Given a photo
-album, our model first selects the most representative (summary) photos, and
-then composes a natural language story for the album. For this task, we make
-use of the Visual Storytelling dataset and a model composed of three
-hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album
-photos, select representative (summary) photos, and compose the story.
-Automatic and human evaluations show our model achieves better performance on
-selection, generation, and retrieval than baselines.
-"
-5477,1708.02989,"Luis Moraes, Shahryar Baki, Rakesh Verma, Daniel Lee",Identifying Reference Spans: Topic Modeling and Word Embeddings help IR,cs.CL," The CL-SciSumm 2016 shared task introduced an interesting problem: given a
-document D and a piece of text that cites D, how do we identify the text spans
-of D being referenced by the piece of text? The shared task provided the first
-annotated dataset for studying this problem. We present an analysis of our
-continued work in improving our system's performance on this task. We
-demonstrate how topic models and word embeddings can be used to surpass the
-previously best performing system.
-"
-5478,1708.03044,"Ting-Hao Kenneth Huang and Walter S. Lasecki and Amos Azaria and
- Jeffrey P. Bigham","""Is there anything else I can help you with?"": Challenges in Deploying
- an On-Demand Crowd-Powered Conversational Agent",cs.HC cs.AI cs.CL," Intelligent conversational assistants, such as Apple's Siri, Microsoft's
-Cortana, and Amazon's Echo, have quickly become a part of our digital life.
-However, these assistants have major limitations, which prevent users from
-conversing with them as they would with human dialog partners. This limits our
-ability to observe how users really want to interact with the underlying
-system. To address this problem, we developed a crowd-powered conversational
-assistant, Chorus, and deployed it to see how users and workers would interact
-together when mediated by the system. Chorus sophisticatedly converses with end
-users over time by recruiting workers on demand, who in turn decide what
-might be the best response for each user sentence. Over the first month of our
-deployment, 59 users have held conversations with Chorus during 320
-conversational sessions.
In this paper, we present an account of Chorus' -deployment, with a focus on four challenges: (i) identifying when conversations -are over, (ii) malicious users and workers, (iii) on-demand recruiting, and -(iv) settings in which consensus is not enough. Our observations could assist -the deployment of crowd-powered conversation systems and crowd-powered systems -in general. -" -5479,1708.03052,"Lee Gao, Ronghuo Zheng",Communication-Free Parallel Supervised Topic Models,cs.LG cs.CL cs.IR stat.ML," Embarrassingly (communication-free) parallel Markov chain Monte Carlo (MCMC) -methods are commonly used in learning graphical models. However, MCMC cannot be -directly applied in learning topic models because of the quasi-ergodicity -problem caused by multimodal distribution of topics. In this paper, we develop -an embarrassingly parallel MCMC algorithm for sLDA. Our algorithm works by -switching the order of sampled topics combination and labeling variable -prediction in sLDA, in which it overcomes the quasi-ergodicity problem because -high-dimension topics that follow a multimodal distribution are projected into -one-dimension document labels that follow a unimodal distribution. Our -empirical experiments confirm that the out-of-sample prediction performance -using our embarrassingly parallel algorithm is comparable to non-parallel sLDA -while the computation time is significantly reduced. -" -5480,1708.03105,"Hussein S. Al-Olimat, Krishnaprasad Thirunarayan, Valerie Shalin, and - Amit Sheth","Location Name Extraction from Targeted Text Streams using - Gazetteer-based Statistical Language Models",cs.CL," Extracting location names from informal and unstructured social media data -requires the identification of referent boundaries and partitioning compound -names. Variability, particularly systematic variability in location names -(Carroll, 1983), challenges the identification task. Some of this variability -can be anticipated as operations within a statistical language model, in this -case drawn from gazetteers such as OpenStreetMap (OSM), Geonames, and DBpedia. -This permits evaluation of an observed n-gram in Twitter targeted text as a -legitimate location name variant from the same location-context. Using n-gram -statistics and location-related dictionaries, our Location Name Extraction tool -(LNEx) handles abbreviations and automatically filters and augments the -location names in gazetteers (handling name contractions and auxiliary -contents) to help detect the boundaries of multi-word location names and -thereby delimit them in texts. - We evaluated our approach on 4,500 event-specific tweets from three targeted -streams to compare the performance of LNEx against that of ten state-of-the-art -taggers that rely on standard semantic, syntactic and/or orthographic features. -LNEx improved the average F-Score by 33-179%, outperforming all taggers. -Further, LNEx is capable of stream processing. -" -5481,1708.03152,"Zhao Meng, Lili Mou, Zhi Jin","Towards Neural Speaker Modeling in Multi-Party Conversation: The Task, - Dataset, and Models",cs.CL," Neural network-based dialog systems are attracting increasing attention in -both academia and industry. Recently, researchers have begun to realize the -importance of speaker modeling in neural dialog systems, but there lacks -established tasks and datasets. In this paper, we propose speaker -classification as a surrogate task for general speaker modeling, and collect -massive data to facilitate research in this direction. 
We further investigate -temporal-based and content-based models of speakers, and propose several -hybrids of them. Experiments show that speaker classification is feasible, and -that hybrid models outperform each single component. -" -5482,1708.03186,"Shahram Khadivi, Patrick Wilken, Leonard Dahlmann, Evgeny Matusov","Neural and Statistical Methods for Leveraging Meta-information in - Machine Translation",cs.CL," In this paper, we discuss different methods which use meta information and -richer context that may accompany source language input to improve machine -translation quality. We focus on category information of input text as meta -information, but the proposed methods can be extended to all textual and -non-textual meta information that might be available for the input text or -automatically predicted using the text content. The main novelty of this work -is to use state-of-the-art neural network methods to tackle this problem within -a statistical machine translation (SMT) framework. We observe translation -quality improvements up to 3% in terms of BLEU score in some text categories. -" -5483,1708.03246,"Dasha Bogdanova, Majid Yazdani",SESA: Supervised Explicit Semantic Analysis,cs.CL cs.AI," In recent years supervised representation learning has provided state of the -art or close to the state of the art results in semantic analysis tasks -including ranking and information retrieval. The core idea is to learn how to -embed items into a latent space such that they optimize a supervised objective -in that latent space. The dimensions of the latent space have no clear -semantics, and this reduces the interpretability of the system. For example, in -personalization models, it is hard to explain why a particular item is ranked -high for a given user profile. We propose a novel model of representation -learning called Supervised Explicit Semantic Analysis (SESA) that is trained in -a supervised fashion to embed items to a set of dimensions with explicit -semantics. The model learns to compare two objects by representing them in this -explicit space, where each dimension corresponds to a concept from a knowledge -base. This work extends Explicit Semantic Analysis (ESA) with a supervised -model for ranking problems. We apply this model to the task of Job-Profile -relevance in LinkedIn in which a set of skills defines our explicit dimensions -of the space. Every profile and job are encoded to this set of skills their -similarity is calculated in this space. We use RNNs to embed text input into -this space. In addition to interpretability, our model makes use of the -web-scale collaborative skills data that is provided by users for each LinkedIn -profile. Our model provides state of the art result while it remains -interpretable. -" -5484,1708.03271,"Leonard Dahlmann, Evgeny Matusov, Pavel Petrushkov, Shahram Khadivi","Neural Machine Translation Leveraging Phrase-based Models in a Hybrid - Search",cs.CL," In this paper, we introduce a hybrid search for attention-based neural -machine translation (NMT). A target phrase learned with statistical MT models -extends a hypothesis in the NMT beam search when the attention of the NMT model -focuses on the source words translated by this phrase. Phrases added in this -way are scored with the NMT model, but also with SMT features including -phrase-level translation probabilities and a target language model. 
-Experimental results on German->English news domain and English->Russian -e-commerce domain translation tasks show that using phrase-based models in NMT -search improves MT quality by up to 2.3% BLEU absolute as compared to a strong -NMT baseline. -" -5485,1708.03312,Yuanzhi Ke and Masafumi Hagiwara,"Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of - Chinese and Japanese",cs.CL," The character vocabulary can be very large in non-alphabetic languages such -as Chinese and Japanese, which makes neural network models huge to process such -languages. We explored a model for sentiment classification that takes the -embeddings of the radicals of the Chinese characters, i.e, hanzi of Chinese and -kanji of Japanese. Our model is composed of a CNN word feature encoder and a -bi-directional RNN document feature encoder. The results achieved are on par -with the character embedding-based models, and close to the state-of-the-art -word embedding-based models, with 90% smaller vocabulary, and at least 13% and -80% fewer parameters than the character embedding-based models and word -embedding-based models respectively. The results suggest that the radical -embedding-based approach is cost-effective for machine learning on Chinese and -Japanese. -" -5486,1708.03390,"Maria Pelevina, Nikolay Arefyev, Chris Biemann, Alexander Panchenko",Making Sense of Word Embeddings,cs.CL," We present a simple yet effective approach for learning word sense -embeddings. In contrast to existing techniques, which either directly learn -sense representations from corpora or rely on sense inventories from lexical -resources, our approach can induce a sense inventory from existing word -embeddings via clustering of ego-networks of related words. An integrated WSD -mechanism enables labeling of words in context with learned sense vectors, -which gives rise to downstream applications. Experiments show that the -performance of our method is comparable to state-of-the-art unsupervised WSD -systems. -" -5487,1708.03418,"Mostafa Dehghani, Sascha Rothe, Enrique Alfonseca, Pascal Fleury","Learning to Attend, Copy, and Generate for Session-Based Query - Suggestion",cs.IR cs.AI cs.CL," Users try to articulate their complex information needs during search -sessions by reformulating their queries. To make this process more effective, -search engines provide related queries to help users in specifying the -information need in their search process. In this paper, we propose a -customized sequence-to-sequence model for session-based query suggestion. In -our model, we employ a query-aware attention mechanism to capture the structure -of the session context. is enables us to control the scope of the session from -which we infer the suggested next query, which helps not only handle the noisy -data but also automatically detect session boundaries. Furthermore, we observe -that, based on the user query reformulation behavior, within a single session a -large portion of query terms is retained from the previously submitted queries -and consists of mostly infrequent or unseen terms that are usually not included -in the vocabulary. We therefore empower the decoder of our model to access the -source words from the session context during decoding by incorporating a copy -mechanism. Moreover, we propose evaluation metrics to assess the quality of the -generative models for query suggestion. We conduct an extensive set of -experiments and analysis. 
e results suggest that our model outperforms the -baselines both in terms of the generating queries and scoring candidate queries -for the task of query suggestion. -" -5488,1708.03421,Andre Cianflone and Leila Kosseim,N-gram and Neural Language Models for Discriminating Similar Languages,cs.CL," This paper describes our submission (named clac) to the 2016 Discriminating -Similar Languages (DSL) shared task. We participated in the closed Sub-task 1 -(Set A) with two separate machine learning techniques. The first approach is a -character based Convolution Neural Network with a bidirectional long short term -memory (BiLSTM) layer (CLSTM), which achieved an accuracy of 78.45% with -minimal tuning. The second approach is a character-based n-gram model. This -last approach achieved an accuracy of 88.45% which is close to the accuracy of -89.38% achieved by the best submission, and allowed us to rank #7 overall. -" -5489,1708.03425,Sohail Hooda and Leila Kosseim,"Argument Labeling of Explicit Discourse Relations using LSTM Neural - Networks",cs.CL," Argument labeling of explicit discourse relations is a challenging task. The -state of the art systems achieve slightly above 55% F-measure but require -hand-crafted features. In this paper, we propose a Long Short Term Memory -(LSTM) based model for argument labeling. We experimented with multiple -configurations of our model. Using the PDTB dataset, our best model achieved an -F1 measure of 23.05% without any feature engineering. This is significantly -higher than the 20.52% achieved by the state of the art RNN approach, but -significantly lower than the feature based state of the art systems. On the -other hand, because our approach learns only from the raw dataset, it is more -widely applicable to multiple textual genres and languages. -" -5490,1708.03446,"Sunil Kumar Sahu, Ashish Anand","What matters in a transferable neural network model for relation - classification in the biomedical domain?",cs.CL," Lack of sufficient labeled data often limits the applicability of advanced -machine learning algorithms to real life problems. However efficient use of -Transfer Learning (TL) has been shown to be very useful across domains. TL -utilizes valuable knowledge learned in one task (source task), where sufficient -data is available, to the task of interest (target task). In biomedical and -clinical domain, it is quite common that lack of sufficient training data do -not allow to fully exploit machine learning models. In this work, we present -two unified recurrent neural models leading to three transfer learning -frameworks for relation classification tasks. We systematically investigate -effectiveness of the proposed frameworks in transferring the knowledge under -multiple aspects related to source and target tasks, such as, similarity or -relatedness between source and target tasks, and size of training data for -source task. Our empirical results show that the proposed frameworks in general -improve the model performance, however these improvements do depend on aspects -related to source and target tasks. This dependence then finally determine the -choice of a particular TL framework. -" -5491,1708.03447,"Sunil Kumar Sahu, Ashish Anand","Unified Neural Architecture for Drug, Disease and Clinical Entity - Recognition",cs.CL," Most existing methods for biomedical entity recognition task rely on explicit -feature engineering where many features either are specific to a particular -task or depends on output of other existing NLP tools. 
Neural architectures -have been shown across various domains that efforts for explicit feature design -can be reduced. In this work we propose an unified framework using -bi-directional long short term memory network (BLSTM) for named entity -recognition (NER) tasks in biomedical and clinical domains. Three important -characteristics of the framework are as follows - (1) model learns contextual -as well as morphological features using two different BLSTM in hierarchy, (2) -model uses first order linear conditional random field (CRF) in its output -layer in cascade of BLSTM to infer label or tag sequence, (3) model does not -use any domain specific features or dictionary, i.e., in another words, same -set of features are used in the three NER tasks, namely, disease name -recognition (Disease NER), drug name recognition (Drug NER) and clinical entity -recognition (Clinical NER). We compare performance of the proposed model with -existing state-of-the-art models on the standard benchmark datasets of the -three tasks. We show empirically that the proposed framework outperforms all -existing models. Further our analysis of CRF layer and word-embedding obtained -using character based embedding show their importance. -" -5492,1708.03492,"Lucas Sterckx, Jason Naradowsky, Bill Byrne, Thomas Demeester and - Chris Develder",Break it Down for Me: A Study in Automated Lyric Annotation,cs.CL," Comprehending lyrics, as found in songs and poems, can pose a challenge to -human and machine readers alike. This motivates the need for systems that can -understand the ambiguity and jargon found in such creative texts, and provide -commentary to aid readers in reaching the correct interpretation. We introduce -the task of automated lyric annotation (ALA). Like text simplification, a goal -of ALA is to rephrase the original text in a more easily understandable manner. -However, in ALA the system must often include additional information to clarify -niche terminology and abstract concepts. To stimulate research on this task, we -release a large collection of crowdsourced annotations for song lyrics. We -analyze the performance of translation and retrieval models on this task, -measuring performance with both automated and human evaluation. We find that -each model captures a unique type of information important to the task. -" -5493,1708.03541,Elnaz Davoodi and Leila Kosseim,Automatic Identification of AltLexes using Monolingual Parallel Corpora,cs.CL," The automatic identification of discourse relations is still a challenging -task in natural language processing. Discourse connectives, such as ""since"" or -""but"", are the most informative cues to identify explicit relations; however -discourse parsers typically use a closed inventory of such connectives. As a -result, discourse relations signaled by markers outside these inventories (i.e. -AltLexes) are not detected as effectively. In this paper, we propose a novel -method to leverage parallel corpora in text simplification and lexical -resources to automatically identify alternative lexicalizations that signal -discourse relation. When applied to the Simple Wikipedia and Newsela corpora -along with WordNet and the PPDB, the method allowed the automatic discovery of -91 AltLexes. 
-" -5494,1708.03569,"Erich Schubert and Andreas Spitz and Michael Weiler and Johanna - Gei{\ss} and Michael Gertz","Semantic Word Clouds with Background Corpus Normalization and - t-distributed Stochastic Neighbor Embedding",cs.IR cs.CL," Many word clouds provide no semantics to the word placement, but use a random -layout optimized solely for aesthetic purposes. We propose a novel approach to -model word significance and word affinity within a document, and in comparison -to a large background corpus. We demonstrate its usefulness for generating more -meaningful word clouds as a visual summary of a given document. We then select -keywords based on their significance and construct the word cloud based on the -derived affinity. Based on a modified t-distributed stochastic neighbor -embedding (t-SNE), we generate a semantic word placement. For words that -cooccur significantly, we include edges, and cluster the words according to -their cooccurrence. For this we designed a scalable and memory-efficient -sketch-based approach usable on commodity hardware to aggregate the required -corpus statistics needed for normalization, and for identifying keywords as -well as significant cooccurences. We empirically validate our approch using a -large Wikipedia corpus. -" -5495,1708.03629,Vikas Raunak,Simple and Effective Dimensionality Reduction for Word Embeddings,cs.CL," Word embeddings have become the basic building blocks for several natural -language processing and information retrieval tasks. Pre-trained word -embeddings are used in several downstream applications as well as for -constructing representations for sentences, paragraphs and documents. Recently, -there has been an emphasis on further improving the pre-trained word vectors -through post-processing algorithms. One such area of improvement is the -dimensionality reduction of the word embeddings. Reducing the size of word -embeddings through dimensionality reduction can improve their utility in memory -constrained devices, benefiting several real-world applications. In this work, -we present a novel algorithm that effectively combines PCA based dimensionality -reduction with a recently proposed post-processing algorithm, to construct word -embeddings of lower dimensions. Empirical evaluations on 12 standard word -similarity benchmarks show that our algorithm reduces the embedding -dimensionality by 50%, while achieving similar or (more often) better -performance than the higher dimension embeddings. -" -5496,1708.03696,Saif M. Mohammad and Felipe Bravo-Marquez,Emotion Intensities in Tweets,cs.CL," This paper examines the task of detecting intensity of emotion from text. We -create the first datasets of tweets annotated for anger, fear, joy, and sadness -intensities. We use a technique called best--worst scaling (BWS) that improves -annotation consistency and obtains reliable fine-grained scores. We show that -emotion-word hashtags often impact emotion intensity, usually conveying a more -intense emotion. Finally, we create a benchmark regression system and conduct -experiments to determine: which features are useful for detecting emotion -intensity, and, the extent to which two emotions are similar in terms of how -they manifest in language. 
-" -5497,1708.03699,"John Pavlopoulos, Prodromos Malakasiotis, Juli Bakagianni, Ion - Androutsopoulos",Improved Abusive Comment Moderation with User Embeddings,cs.CL," Experimenting with a dataset of approximately 1.6M user comments from a Greek -news sports portal, we explore how a state of the art RNN-based moderation -method can be improved by adding user embeddings, user type embeddings, user -biases, or user type biases. We observe improvements in all cases, with user -embeddings leading to the biggest performance gains. -" -5498,1708.03700,Saif M. Mohammad and Felipe Bravo-Marquez,WASSA-2017 Shared Task on Emotion Intensity,cs.CL," We present the first shared task on detecting the intensity of emotion felt -by the speaker of a tweet. We create the first datasets of tweets annotated for -anger, fear, joy, and sadness intensities using a technique called best--worst -scaling (BWS). We show that the annotations lead to reliable fine-grained -intensity scores (rankings of tweets by intensity). The data was partitioned -into training, development, and test sets for the competition. Twenty-two teams -participated in the shared task, with the best system obtaining a Pearson -correlation of 0.747 with the gold intensity scores. We summarize the machine -learning setups, resources, and tools used by the participating teams, with a -focus on the techniques and resources that are particularly useful for the -task. The emotion intensity dataset and the shared task are helping improve our -understanding of how we convey more or less intense emotions through language. -" -5499,1708.03743,"Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, Wen-tau - Yih",Cross-Sentence N-ary Relation Extraction with Graph LSTMs,cs.CL," Past work in relation extraction has focused on binary relations in single -sentences. Recent NLP inroads in high-value domains have sparked interest in -the more general setting of extracting n-ary relations that span multiple -sentences. In this paper, we explore a general relation extraction framework -based on graph long short-term memory networks (graph LSTMs) that can be easily -extended to cross-sentence n-ary relation extraction. The graph formulation -provides a unified way of exploring different LSTM approaches and incorporating -various intra-sentential and inter-sentential dependencies, such as sequential, -syntactic, and discourse relations. A robust contextual representation is -learned for the entities, which serves as input to the relation classifier. -This simplifies handling of relations with arbitrary arity, and enables -multi-task learning with related relations. We evaluate this framework in two -important precision medicine settings, demonstrating its effectiveness with -both conventional supervised learning and distant supervision. Cross-sentence -extraction produced larger knowledge bases. and multi-task learning -significantly improved extraction accuracy. A thorough analysis of various LSTM -approaches yielded useful insight the impact of linguistic analysis on -extraction accuracy. -" -5500,1708.03892,"Fabio Calefato, Filippo Lanubile, Nicole Novielli",EmoTxt: A Toolkit for Emotion Recognition from Text,cs.HC cs.CL," We present EmoTxt, a toolkit for emotion recognition from text, trained and -tested on a gold standard of about 9K question, answers, and comments from -online interactions. We provide empirical evidence of the performance of -EmoTxt. 
To the best of our knowledge, EmoTxt is the first open-source toolkit -supporting both emotion recognition from text and training of custom emotion -classification models. -" -5501,1708.03910,Mario Giulianelli,"Semi-supervised emotion lexicon expansion with label propagation and - specialized word embeddings",cs.CL cs.AI cs.NE," There exist two main approaches to automatically extract affective -orientation: lexicon-based and corpus-based. In this work, we argue that these -two methods are compatible and show that combining them can improve the -accuracy of emotion classifiers. In particular, we introduce a novel variant of -the Label Propagation algorithm that is tailored to distributed word -representations, we apply batch gradient descent to accelerate the optimization -of label propagation and to make the optimization feasible for large graphs, -and we propose a reproducible method for emotion lexicon expansion. We conclude -that label propagation can expand an emotion lexicon in a meaningful way and -that the expanded emotion lexicon can be leveraged to improve the accuracy of -an emotion classifier. -" -5502,1708.03920,"Jaebok Kim, Gwenn Englebienne, Khiet P. Truong, Vanessa Evers","Towards Speech Emotion Recognition ""in the wild"" using Aggregated - Corpora and Deep Multi-Task Learning",cs.CL," One of the challenges in Speech Emotion Recognition (SER) ""in the wild"" is -the large mismatch between training and test data (e.g. speakers and tasks). In -order to improve the generalisation capabilities of the emotion models, we -propose to use Multi-Task Learning (MTL) and use gender and naturalness as -auxiliary tasks in deep neural networks. This method was evaluated in -within-corpus and various cross-corpus classification experiments that simulate -conditions ""in the wild"". In comparison to Single-Task Learning (STL) based -state of the art methods, we found that our MTL method proposed improved -performance significantly. Particularly, models using both gender and -naturalness achieved more gains than those using either gender or naturalness -separately. This benefit was also found in the high-level representations of -the feature space, obtained from our method proposed, where discriminative -emotional clusters could be observed. -" -5503,1708.03940,"Tao Yu, Christopher Hidey, Owen Rambow and Kathleen McKeown","Leveraging Sparse and Dense Feature Combinations for Sentiment - Classification",cs.CL cs.IR cs.LG," Neural networks are one of the most popular approaches for many natural -language processing tasks such as sentiment analysis. They often outperform -traditional machine learning models and achieve the state-of-art results on -most tasks. However, many existing deep learning models are complex, difficult -to train and provide a limited improvement over simpler methods. We propose a -simple, robust and powerful model for sentiment classification. This model -outperforms many deep learning models and achieves comparable results to other -deep learning models with complex architectures on sentiment analysis datasets. -We publish the code online. -" -5504,1708.03994,"Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh",Data Sets: Word Embeddings Learned from Tweets and General Data,cs.CL cs.SI," A word embedding is a low-dimensional, dense and real- valued vector -representation of a word. Word embeddings have been used in many NLP tasks. -They are usually gener- ated from a large text corpus. The embedding of a word -cap- tures both its syntactic and semantic aspects. 
Tweets are short, noisy and
-have unique lexical and semantic features that are different from other types
-of text. Therefore, it is necessary to have word embeddings learned
-specifically from tweets. In this paper, we present ten word embedding data
-sets. In addition to the data sets learned from just tweet data, we also built
-embedding sets from the general data and the combination of tweets with the
-general data. The general data consist of news articles, Wikipedia data and
-other web data. These ten embedding models were learned from about 400 million
-tweets and 7 billion words from the general text. In this paper, we also
-present two experiments demonstrating how to use the data sets in some NLP
-tasks, such as tweet sentiment analysis and tweet topic classification.
-"
-5505,1708.03995,"Prathusha Kameswara Sarma, Bill Sethares",Sentiment Analysis by Joint Learning of Word Embeddings and Classifier,cs.CL cs.AI cs.LG stat.ML," Word embeddings are representations of individual words of a text document in
-a vector space and they are often useful for performing natural language
-processing tasks. Current state-of-the-art algorithms for learning word
-embeddings learn vector representations from large corpora of text documents in
-an unsupervised fashion. This paper introduces SWESA (Supervised Word
-Embeddings for Sentiment Analysis), an algorithm for sentiment analysis via
-word embeddings. SWESA leverages document label information to learn vector
-representations of words from a modest corpus of text documents by solving an
-optimization problem that minimizes a cost function with respect to both word
-embeddings as well as classification accuracy. Analysis reveals that SWESA
-provides an efficient way of estimating the dimension of the word embeddings
-that are to be learned. Experiments on several real-world data sets show that
-SWESA has superior performance when compared to previously suggested
-approaches to word embeddings and sentiment analysis tasks.
-"
-5506,1708.04120,"Marc Szafraniec, Gautier Marti, Philippe Donnat",Putting Self-Supervised Token Embedding on the Tables,cs.IR cs.CL," Information distribution by electronic messages is a privileged means of
-transmission for many businesses and individuals, often in the form of
-plain-text tables. As their number grows, it becomes necessary to use an
-algorithm to extract text and numbers instead of a human. Usual methods are
-focused on regular expressions or on a strict structure in the data, but are
-not efficient when we have many variations, fuzzy structure or implicit labels.
-In this paper we introduce SC2T, a totally self-supervised model for
-constructing vector representations of tokens in semi-structured messages by
-using characters and context levels that address these issues. It can then be
-used for an unsupervised labeling of tokens, or be the basis for a
-semi-supervised information extraction system.
-"
-5507,1708.04134,"Q Vera Liao, Biplav Srivastava, Pavan Kapanipathi","A Measure for Dialog Complexity and its Application in Streamlining
- Service Operations",cs.CL cs.AI," Dialog is a natural modality for interaction between customers and businesses
-in the service industry. As customers call up the service provider, their
-interactions may be routine or extraordinary. We believe that these
-interactions, when seen as dialogs, can be analyzed to obtain a better
-understanding of customer needs and how to efficiently address them.
We
-introduce the idea of a dialog complexity measure to characterize multi-party
-interactions, propose a general data-driven method to calculate it, use it to
-discover insights in public and enterprise dialog datasets, and demonstrate its
-beneficial usage in facilitating better handling of customer requests and
-evaluating service agents.
-"
-5508,1708.04299,Sayyed M. Zahiri and Jinho D. Choi,"Emotion Detection on TV Show Transcripts with Sequence-based
- Convolutional Neural Networks",cs.CL," While there have been significant advances in detecting emotions from speech
-and image recognition, emotion detection on text is still under-explored and
-remains an active research field. This paper introduces a corpus for
-text-based emotion detection on multiparty dialogue as well as deep neural
-models that outperform the existing approaches for document classification. We
-first present a new corpus that provides annotation of seven emotions on
-consecutive utterances in dialogues extracted from the show Friends. We then
-suggest four types of sequence-based convolutional neural network models with
-attention that leverage the sequence information encapsulated in dialogue. Our
-best model achieves accuracies of 37.9% and 54% for fine- and coarse-grained
-emotions, respectively. Given the difficulty of this task, this is promising.
-"
-5509,1708.04358,"Afshin Rahimi, Timothy Baldwin and Trevor Cohn","Continuous Representation of Location for Geolocation and Lexical
- Dialectology using Mixture Density Networks",cs.CL cs.IR cs.SI," We propose a method for embedding two-dimensional locations in a continuous
-vector space using a neural network-based model incorporating mixtures of
-Gaussian distributions, presenting two model variants for text-based
-geolocation and lexical dialectology. Evaluated over Twitter data, the proposed
-model outperforms conventional regression-based geolocation and provides a
-better estimate of uncertainty. We also show the effectiveness of the
-representation for predicting words from location in lexical dialectology, and
-evaluate it using the DARE dataset.
-"
-5510,1708.04390,"Weiyu Lan, Xirong Li and Jianfeng Dong",Fluency-Guided Cross-Lingual Image Captioning,cs.CL," Image captioning has so far been explored mostly in English, as most
-available datasets are in this language. However, the application of image
-captioning should not be restricted by language. Only a few studies have been
-conducted for image captioning in a cross-lingual setting. Different from these
-works that manually build a dataset for a target language, we aim to learn a
-cross-lingual captioning model fully from machine-translated sentences. To
-overcome the lack of fluency in the translated sentences, we propose in this
-paper a fluency-guided learning framework. The framework comprises a module to
-automatically estimate the fluency of the sentences and another module to
-utilize the estimated fluency scores to effectively train an image captioning
-model for the target language. As experiments on two bilingual
-(English-Chinese) datasets show, our approach improves both fluency and
-relevance of the generated captions in Chinese, but without using any manually
-written sentences from the target language.
-"
-5511,1708.04439,Sukriti Verma and Vagisha Nidhi,Extractive Summarization using Deep Learning,cs.CL cs.IR cs.LG," This paper proposes a text summarization approach for factual reports using a
-deep learning model.
This approach consists of three phases: feature
-extraction, feature enhancement, and summary generation, which work together to
-assimilate core information and generate a coherent, understandable summary. We
-explore various features to improve the set of sentences selected for the
-summary, and use a Restricted Boltzmann Machine to enhance and abstract
-those features to improve resultant accuracy without losing any important
-information. The sentences are scored based on those enhanced features and an
-extractive summary is constructed. Experimentation carried out on several
-articles demonstrates the effectiveness of the proposed approach. Source code
-available at: https://github.com/vagisha-nidhi/TextSummarizer
-"
-5512,1708.04469,"Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias
- Sperber, Sebastian St\""uker, Alex Waibel",Comparison of Decoding Strategies for CTC Acoustic Models,cs.CL," Connectionist Temporal Classification has recently attracted a lot of
-interest as it offers an elegant approach to building acoustic models (AMs) for
-speech recognition. The CTC loss function maps an input sequence of observable
-feature vectors to an output sequence of symbols. Output symbols are
-conditionally independent of each other under CTC loss, so a language model
-(LM) can be incorporated conveniently during decoding, retaining the
-traditional separation of acoustic and linguistic components in ASR. For fixed
-vocabularies, Weighted Finite State Transducers provide a strong baseline for
-efficient integration of CTC AMs with n-gram LMs. Character-based neural LMs
-provide a straightforward solution for open vocabulary speech recognition and
-all-neural models, and can be decoded with beam search. Finally,
-sequence-to-sequence models can be used to translate a sequence of individual
-sounds into a word string. We compare the performance of these three
-approaches, and analyze their error patterns, which provides insightful
-guidance for future research and development in this important area.
-"
-5513,1708.04557,Alexander Herzog and Slava J. Mikhaylov,"Database of Parliamentary Speeches in Ireland, 1919-2013",cs.CL cs.SI stat.ML," We present a database of parliamentary debates that contains the complete
-record of parliamentary speeches from D\'ail \'Eireann, the lower house and
-principal chamber of the Irish parliament, from 1919 to 2013. In addition, the
-database contains background information on all TDs (Teachta D\'ala, members of
-parliament), such as their party affiliations, constituencies and office
-positions. The current version of the database includes close to 4.5 million
-speeches from 1,178 TDs. The speeches were downloaded from the official
-parliament website and further processed and parsed with a Python script.
-Background information on TDs was collected from the member database of the
-parliament website. Data on cabinet positions (ministers and junior ministers)
-was collected from the official website of the government. A record linkage
-algorithm and human coders were used to match TDs and ministers.
-"
-5514,1708.04559,Sreelekha S,"Statistical Vs Rule Based Machine Translation; A Case Study on Indian
- Language Perspective",cs.CL," In this paper we present our work on a case study between Statistical Machine
-Translation (SMT) and Rule-Based Machine Translation (RBMT) systems from an
-English-Indian language and Indian-to-Indian language perspective.
The main
-objective of our study is to make a five-way performance comparison:
-a) SMT and RBMT; b) SMT on English-Indian languages; c) RBMT on English-Indian
-languages; d) SMT on Indian-to-Indian languages; e) RBMT on Indian-to-Indian
-languages. Through a detailed analysis we describe the development and
-evaluation of the Rule-Based and Statistical Machine Translation
-systems. Through a detailed error analysis, we point out the relative
-strengths and weaknesses of both systems. The observations based on our study
-are: a) SMT systems outperform RBMT; b) in the case of SMT, English-to-Indian
-language MT systems perform better than Indian-to-English language MT systems;
-c) in the case of RBMT, English-to-Indian language MT systems perform better
-than Indian-to-English language MT systems; d) SMT systems perform better than
-RBMT for Indian-to-Indian language MT. Effectively, we shall
-see that even with a small training corpus a statistical machine
-translation system has many advantages for high-quality domain-specific machine
-translation over its rule-based counterpart.
-"
-5515,1708.04587,"Nattapong Sanchan, Ahmet Aker and Kalina Bontcheva",Automatic Summarization of Online Debates,cs.CL cs.AI cs.IR," Debate summarization is one of the novel and challenging research areas in
-automatic text summarization which has been largely unexplored. In this paper,
-we develop a debate summarization pipeline to summarize key topics which are
-discussed or argued in the two opposing sides of online debates. We view that
-the generation of debate summaries can be achieved by clustering, cluster
-labeling, and visualization. In our work, we investigate two different
-clustering approaches for the generation of the summaries. In the first
-approach, we generate the summaries by applying purely term-based clustering
-and cluster labeling. The second approach makes use of X-means for clustering
-and Mutual Information for labeling the clusters. Both approaches are driven by
-ontologies. We visualize the results using bar charts. We believe that our
-results provide a smooth entry point for users aiming to get a first impression
-of what is discussed within a debate topic containing a vast number of
-argumentations.
-"
-5516,1708.04592,"Nattapong Sanchan, Ahmet Aker and Kalina Bontcheva","Gold Standard Online Debates Summaries and First Experiments Towards
- Automatic Summarization of Online Debate Data",cs.CL cs.AI cs.IR," Usage of online textual media is steadily increasing. Daily, more and more
-news stories, blog posts and scientific articles are added to the online
-volumes. These are all freely accessible and have been employed extensively in
-multiple research areas, e.g. automatic text summarization, information
-retrieval, information extraction, etc. Meanwhile, online debate forums have
-recently become popular, but have remained largely unexplored. For this reason,
-there are insufficient resources of annotated debate data available for
-conducting research in this genre. In this paper, we collected and annotated
-debate data for an automatic summarization task. Similar to extractive gold
-standard summary generation, our data contains sentences worthy of inclusion in
-a summary. Five human annotators performed this task. Inter-annotator
-agreement, based on semantic similarity, is 36% for Cohen's kappa and 48% for
-Krippendorff's alpha.
Moreover, we also implement an extractive summarization -system for online debates and discuss prominent features for the task of -summarizing online debate data automatically. -" -5517,1708.04681,"Arman Cohan, Allan Fong, Raj Ratwani, Nazli Goharian",Identifying Harm Events in Clinical Care through Medical Narratives,cs.CL cs.IR," Preventable medical errors are estimated to be among the leading causes of -injury and death in the United States. To prevent such errors, healthcare -systems have implemented patient safety and incident reporting systems. These -systems enable clinicians to report unsafe conditions and cases where patients -have been harmed due to errors in medical care. These reports are narratives in -natural language and while they provide detailed information about the -situation, it is non-trivial to perform large scale analysis for identifying -common causes of errors and harm to the patients. In this work, we present a -method based on attentive convolutional and recurrent networks for identifying -harm events in patient care and categorize the harm based on its severity -level. We demonstrate that our methods can significantly improve the -performance over existing methods in identifying harm in clinical care. -" -5518,1708.04686,"Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong","VQS: Linking Segmentations to Questions and Answers for Supervised - Attention in VQA and Question-Focused Semantic Segmentation",cs.CV cs.CL cs.LG," Rich and dense human labeled datasets are among the main enabling factors for -the recent advance on vision-language understanding. Many seemingly distant -annotations (e.g., semantic segmentation and visual question answering (VQA)) -are inherently connected in that they reveal different levels and perspectives -of human understandings about the same visual scenes --- and even the same set -of images (e.g., of COCO). The popularity of COCO correlates those annotations -and tasks. Explicitly linking them up may significantly benefit both individual -tasks and the unified vision and language modeling. We present the preliminary -work of linking the instance segmentations provided by COCO to the questions -and answers (QAs) in the VQA dataset, and name the collected links visual -questions and segmentation answers (VQS). They transfer human supervision -between the previously separate tasks, offer more effective leverage to -existing problems, and also open the door for new research problems and models. -We study two applications of the VQS data in this paper: supervised attention -for VQA and a novel question-focused semantic segmentation task. For the -former, we obtain state-of-the-art results on the VQA real multiple-choice task -by simply augmenting the multilayer perceptrons with some attention features -that are learned using the segmentation-QA links as explicit supervision. To -put the latter in perspective, we study two plausible methods and compare them -to an oracle method assuming that the instance segmentations are given at the -test stage. -" -5519,1708.04704,"Marcos V. Treviso, Christopher D. Shulby, Sandra M. Aluisio","Evaluating Word Embeddings for Sentence Boundary Detection in Speech - Transcripts",cs.CL," This paper is motivated by the automation of neuropsychological tests -involving discourse analysis in the retellings of narratives by patients with -potential cognitive impairment. 
In this scenario, the task of sentence boundary
-detection in speech transcripts is important as discourse analysis involves the
-application of Natural Language Processing tools, such as taggers and parsers,
-which depend on the sentence as a processing unit. Our aim in this paper is to
-verify which embedding induction method works best for the sentence boundary
-detection task, specifically whether it is one of those proposed to capture
-semantic, syntactic or morphological similarities.
-"
-5520,1708.04729,"Yizhe Zhang, Dinghan Shen, Guoyin Wang, Zhe Gan, Ricardo Henao,
- Lawrence Carin",Deconvolutional Paragraph Representation Learning,cs.CL cs.LG stat.ML," Learning latent representations from long text sequences is an important
-first step in many natural language processing applications. Recurrent Neural
-Networks (RNNs) have become a cornerstone for this challenging task. However,
-the quality of sentences during RNN-based decoding (reconstruction) decreases
-with the length of the text. We propose a sequence-to-sequence, purely
-convolutional and deconvolutional autoencoding framework that is free of the
-above issue, while also being computationally efficient. The proposed method is
-simple, easy to implement and can be leveraged as a building block for many
-applications. We show empirically that compared to RNNs, our framework is
-better at reconstructing and correcting long paragraphs. Quantitative
-evaluation on semi-supervised text classification and summarization tasks
-demonstrates the potential for better utilization of long unlabeled text data.
-"
-5521,1708.04755,Tzu-Ray Su and Hung-Yi Lee,Learning Chinese Word Representations From Glyphs Of Characters,cs.CL," In this paper, we propose new methods to learn Chinese word representations.
-Chinese characters are composed of graphical components, which carry rich
-semantics. It is common for a Chinese learner to comprehend the meaning of a
-word from these graphical components. As a result, we propose models that
-enhance word representations by character glyphs. The character glyph features
-are directly learned from the bitmaps of characters by a convolutional
-auto-encoder (convAE), and the glyph features improve Chinese word
-representations which are already enhanced by character embeddings. Another
-contribution in this paper is that we created several evaluation datasets in
-traditional Chinese and made them public.
-"
-5522,1708.04765,"Thi Lan Ngo, Khac Linh Pham, Minh Son Cao, Son Bao Pham, Xuan Hieu
- Phan","Dialogue Act Segmentation for Vietnamese Human-Human Conversational
- Texts",cs.CL," Dialog act identification plays an important role in understanding
-conversations. It has been widely applied in many fields such as dialogue
-systems, automatic machine translation, and automatic speech recognition, and
-it is especially useful in systems with human-computer natural language dialogue
-interfaces such as virtual assistants and chatbots. The first step in
-identifying a dialog act is identifying its boundary in the
-utterance. In this paper, we focus on segmenting the utterance according to
-the dialog act boundaries, i.e. functional segment identification, for
-Vietnamese utterances.
We carefully investigate functional segment
-identification using two approaches: (1) a machine learning approach using maximum
-entropy (ME) and conditional random fields (CRFs); (2) a deep learning approach
-using bidirectional Long Short-Term Memory (LSTM) with a CRF layer
-(Bi-LSTM-CRF), on two different conversational datasets: (1) Facebook messages
-(Message data); (2) transcriptions of phone conversations (Phone data). To the
-best of our knowledge, this is the first work that applies a deep learning based
-approach to dialog act segmentation. As the results show, the deep learning
-approach performs appreciably better than traditional machine
-learning approaches. Moreover, it is also the first study that tackles dialog
-act and functional segment identification for Vietnamese.
-"
-5523,1708.04776,"Yuxin Peng, Jinwei Qi and Yuxin Yuan","Modality-specific Cross-modal Similarity Measurement with Recurrent
- Attention Network",cs.CV cs.CL cs.MM," Nowadays, cross-modal retrieval plays an indispensable role in flexibly finding
-information across different modalities of data. Effectively measuring the
-similarity between different modalities of data is the key to cross-modal
-retrieval. Different modalities such as image and text have imbalanced and
-complementary relationships, which contain unequal amounts of information when
-describing the same semantics. For example, images often contain more details
-that cannot be demonstrated by textual descriptions and vice versa. Existing
-works based on Deep Neural Network (DNN) mostly construct one common space for
-different modalities to find the latent alignments between them, which lose
-their exclusive modality-specific characteristics. Different from the existing
-works, we propose a modality-specific cross-modal similarity measurement (MCSM)
-approach by constructing an independent semantic space for each modality, which
-adopts an end-to-end framework to directly generate modality-specific cross-modal
-similarity without explicit common representation. For each semantic space,
-modality-specific characteristics within one modality are fully exploited by a
-recurrent attention network, while the data of another modality is projected
-into this space with attention-based joint embedding to utilize the learned
-attention weights for guiding the fine-grained cross-modal correlation
-learning, which can capture the imbalanced and complementary relationships
-between different modalities. Finally, the complementarity between the semantic
-spaces for different modalities is explored by adaptive fusion of the
-modality-specific cross-modal similarities to perform cross-modal retrieval.
-Experiments on the widely-used Wikipedia and Pascal Sentence datasets as well
-as our constructed large-scale XMediaNet dataset verify the effectiveness of
-our proposed approach, outperforming 9 state-of-the-art methods.
-"
-5524,1708.04923,"Naveen Panwar, Shreya Khare, Neelamadhav Gantayat, Rahul Aralikatte,
- Senthil Mani, Anush Sankaran",mAnI: Movie Amalgamation using Neural Imitation,cs.CL cs.LG," Cross-modal data retrieval has been the basis of various creative tasks
-performed by Artificial Intelligence (AI). One such highly challenging task for
-AI is to convert a book into its corresponding movie, which most of the
-creative film makers do as of today. In this research, we take the first step
-towards it by visualizing the content of a book using its corresponding movie
-visuals.
Given a set of sentences from a book or even a fan-fiction written in
-the same universe, we employ deep learning models to visualize the input by
-stitching together relevant frames from the movie. We studied and compared
-three different types of settings to match the book with the movie content: (i)
-Dialog model: using only the dialog from the movie, (ii) Visual model: using
-only the visual content from the movie, and (iii) Hybrid model: using the
-dialog and the visual content from the movie. Experiments on the publicly
-available MovieBook dataset show the effectiveness of the proposed models.
-"
-5525,1708.04968,"Rahul Aralikatte, Giriprasad Sridhara, Neelamadhav Gantayat, Senthil
- Mani",Fault in your stars: An Analysis of Android App Reviews,cs.LG cs.CL," Mobile app distribution platforms such as Google Play Store allow users to
-share their feedback about downloaded apps in the form of a review comment and
-a corresponding star rating. Typically, the star rating ranges from one to five
-stars, with one star denoting a high sense of dissatisfaction with the app and
-five stars denoting a high sense of satisfaction.
- Unfortunately, due to a variety of reasons, often the star rating provided by
-a user is inconsistent with the opinion expressed in the review. For example,
-consider the following review for the Facebook App on Android; ""Awesome App"".
-One would reasonably expect the rating for this review to be five stars, but
-the actual rating is one star!
- Such inconsistent ratings can lead to a deflated (or inflated) overall
-average rating of an app which can affect user downloads, as typically users
-look at the average star ratings while making a decision on downloading an app.
-Also, the app developers receive biased feedback about the application that
-does not represent ground reality. This is especially significant for small
-apps with a few thousand downloads as even a small number of mismatched reviews
-can bring down the average rating drastically.
- In this paper, we conducted a study on this review-rating mismatch problem.
-We manually examined 8600 reviews from 10 popular Android apps and found that
-20% of the ratings in our dataset were inconsistent with the review. Further,
-we developed three systems, two of which were based on traditional machine
-learning and one on deep learning, to automatically identify reviews whose
-rating did not match the opinion expressed in the review. Our deep
-learning system performed the best and had an accuracy of 92% in identifying
-the correct star rating to be associated with a given review.
-"
-5526,1708.05045,"Zequn Sun, Wei Hu, Chengkai Li",Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding,cs.CL cs.AI cs.DB," Entity alignment is the task of finding entities in two knowledge bases (KBs)
-that represent the same real-world object. When facing KBs in different natural
-languages, conventional cross-lingual entity alignment methods rely on machine
-translation to eliminate the language barriers. These approaches often suffer
-from the uneven quality of translations between languages. While recent
-embedding-based techniques encode entities and relationships in KBs and do not
-need machine translation for cross-lingual entity alignment, a significant
-number of attributes remain largely unexplored. In this paper, we propose a
-joint attribute-preserving embedding model for cross-lingual entity alignment.
-It jointly embeds the structures of two KBs into a unified vector space and
-further refines it by leveraging attribute correlations in the KBs. Our
-experimental results on real-world datasets show that this approach
-significantly outperforms the state-of-the-art embedding approaches for
-cross-lingual entity alignment and could be complemented with methods based on
-machine translation.
-"
-5527,1708.05071,"Jaebok Kim, Khiet P. Truong, Gwenn Englebienne, and Vanessa Evers","Learning spectro-temporal features with 3D CNNs for speech emotion
- recognition",cs.CL cs.CV," In this paper, we propose to use deep 3-dimensional convolutional networks
-(3D CNNs) in order to address the challenge of modelling spectro-temporal
-dynamics for speech emotion recognition (SER). Compared to a hybrid of
-Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM), our
-proposed 3D CNNs simultaneously extract short-term and long-term spectral
-features with a moderate number of parameters. We evaluated our proposed method
-and other state-of-the-art methods in a speaker-independent manner using aggregated
-corpora that give a large and diverse set of speakers. We found that 1) shallow
-temporal and moderately deep spectral kernels of a homogeneous architecture are
-optimal for the task; and 2) our 3D CNNs are more effective for
-spectro-temporal feature learning compared to other methods. Finally, we
-visualised the feature space obtained with our proposed method using
-t-distributed stochastic neighbour embedding (T-SNE) and could observe distinct
-clusters of emotions.
-"
-5528,1708.05122,"Prithvijit Chattopadhyay, Deshraj Yadav, Viraj Prabhu, Arjun
- Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, Devi Parikh",Evaluating Visual Conversational Agents via Cooperative Human-AI Games,cs.HC cs.AI cs.CL cs.CV," As AI continues to advance, human-AI teams are inevitable. However, progress
-in AI is routinely measured in isolation, without a human in the loop. It is
-crucial to benchmark progress in AI, not just in isolation, but also in terms
-of how it translates to helping humans perform certain tasks, i.e., the
-performance of human-AI teams.
- In this work, we design a cooperative game - GuessWhich - to measure human-AI
-team performance in the specific context of the AI being a visual
-conversational agent. GuessWhich involves live interaction between the human
-and the AI. The AI, which we call ALICE, is provided an image which is unseen
-by the human. Following a brief description of the image, the human questions
-ALICE about this secret image to identify it from a fixed pool of images.
- We measure performance of the human-ALICE team by the number of guesses it
-takes the human to correctly identify the secret image after a fixed number of
-dialog rounds with ALICE. We compare performance of the human-ALICE teams for
-two versions of ALICE. Our human studies suggest a counterintuitive trend -
-that while AI literature shows that one version outperforms the other when
-paired with an AI questioner bot, we find that this improvement in AI-AI
-performance does not translate to improved human-AI performance. This suggests
-a mismatch between benchmarking of AI in isolation and in the context of
-human-AI teams.
-"
-5529,1708.05148,"Diksha Khurana, Aditya Koli, Kiran Khatter and Sukhdev Singh","Natural Language Processing: State of The Art, Current Trends and
- Challenges",cs.CL," Natural language processing (NLP) has recently gained much attention for
-representing and analysing human language computationally. It has spread its
-applications in various fields such as machine translation, email spam
-detection, information extraction, summarization, medicine, and question
-answering. The paper distinguishes four phases by discussing different levels
-of NLP and components of Natural Language Generation (NLG), followed by the
-history and evolution of NLP, the state of the art and its various
-applications, and current trends and challenges.
-"
-5530,1708.05269,"David Vilares, Marcos Garcia, Miguel A. Alonso, Carlos
- G\'omez-Rodr\'iguez",Towards Syntactic Iberian Polarity Classification,cs.CL," Lexicon-based methods using syntactic rules for polarity classification rely
-on parsers that are dependent on the language and on treebank guidelines. Thus,
-rules are also dependent and require adaptation, especially in multilingual
-scenarios. We tackle this challenge in the context of the Iberian Peninsula,
-releasing the first symbolic syntax-based Iberian system with rules shared
-across five official languages: Basque, Catalan, Galician, Portuguese and
-Spanish. The model is made available.
-"
-5531,1708.05271,Ting Yao and Yingwei Pan and Yehao Li and Tao Mei,"Incorporating Copying Mechanism in Image Captioning for Learning Novel
- Objects",cs.CV cs.CL," Image captioning often requires a large set of training image-sentence pairs.
-In practice, however, acquiring sufficient training pairs is always expensive,
-making the recent captioning models limited in their ability to describe
-objects outside of training corpora (i.e., novel objects). In this paper, we
-present Long Short-Term Memory with Copying Mechanism (LSTM-C) --- a new
-architecture that incorporates copying into the Convolutional Neural Networks
-(CNN) plus Recurrent Neural Networks (RNN) image captioning framework, for
-describing novel objects in captions. Specifically, freely available object
-recognition datasets are leveraged to develop classifiers for novel objects.
-Our LSTM-C then nicely integrates the standard word-by-word sentence generation
-by a decoder RNN with copying mechanism which may instead select words from
-novel objects at proper places in the output sentence. Extensive experiments
-are conducted on both MSCOCO image captioning and ImageNet datasets,
-demonstrating the ability of our proposed LSTM-C architecture to describe novel
-objects. Furthermore, superior results are reported when compared to
-state-of-the-art deep models.
-"
-5532,1708.05286,"Ahmet Aker, Leon Derczynski, Kalina Bontcheva",Simple Open Stance Classification for Rumour Analysis,cs.CL," Stance classification determines the attitude, or stance, in a (typically
-short) text. The task has powerful applications, such as the detection of fake
-news or the automatic extraction of attitudes toward entities or events in the
-media. This paper describes a surprisingly simple and efficient classification
-approach to open stance classification in Twitter, for rumour and veracity
-classification. The approach profits from a novel set of automatically
-identifiable problem-specific features, which significantly boost classifier
-accuracy and achieve above state-of-the-art results on recent benchmark
-datasets.
This calls into question the value of using complex sophisticated
-models for stance classification without first doing informed feature
-extraction.
-"
-5533,1708.05449,"Ian Beaver, Cynthia Freeman, Abdullah Mueen",An Annotated Corpus of Relational Strategies in Customer Service,cs.CL," We create and release the first publicly available commercial customer
-service corpus with annotated relational segments. Human-computer data from
-three live customer service Intelligent Virtual Agents (IVAs) in the domains of
-travel and telecommunications were collected, and reviewers marked all text
-that was deemed unnecessary to the determination of user intention. After
-merging the selections of multiple reviewers to create highlighted texts, a
-second round of annotation was done to determine the classes of language
-present in the highlighted sections such as the presence of Greetings,
-Backstory, Justification, Gratitude, Rants, or Emotions. This resulting corpus
-is a valuable resource for improving the quality and relational abilities of
-IVAs. As well as discussing the corpus itself, we compare the usage of such
-language in human-human interactions on TripAdvisor forums. We show that
-removal of this language from task-based inputs has a positive effect on IVA
-understanding by both an increase in confidence and improvement in responses,
-demonstrating the need for automated methods of its discovery.
-"
-5534,1708.05466,"Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, and Yifan Gong",Large-Scale Domain Adaptation via Teacher-Student Learning,cs.CL," High accuracy speech recognition requires a large amount of transcribed data
-for supervised training. In the absence of such data, domain adaptation of a
-well-trained acoustic model can be performed, but even here, high accuracy
-usually requires significant labeled data from the target domain. In this work,
-we propose an approach to domain adaptation that does not require
-transcriptions but instead uses a corpus of unlabeled parallel data, consisting
-of pairs of samples from the source domain of the well-trained model and the
-desired target domain. To perform adaptation, we employ teacher/student (T/S)
-learning, in which the posterior probabilities generated by the source-domain
-model can be used in lieu of labels to train the target-domain model. We
-evaluate the proposed approach in two scenarios, adapting a clean acoustic
-model to noisy speech and adapting an adult speech acoustic model to children's
-speech. Significant improvements in accuracy are obtained, with reductions in
-word error rate of up to 44% over the original source model without the need
-for transcribed data in the target domain. Moreover, we show that increasing
-the amount of unlabeled data results in additional model robustness, which is
-particularly beneficial when using simulated training data in the
-target domain.
-"
-5535,1708.05482,"Lin Gui and Jiannan Hu and Yulan He and Ruifeng Xu and Qin Lu and
- Jiachen Du",A Question Answering Approach to Emotion Cause Extraction,cs.CL," Emotion cause extraction aims to identify the reasons behind a certain
-emotion expressed in text. It is a much more difficult task compared to emotion
-classification. Inspired by recent advances in using deep memory networks for
-question answering (QA), we propose a new approach which considers emotion
-cause identification as a reading comprehension task in QA.
Inspired by
-convolutional neural networks, we propose a new mechanism to store relevant
-context in different memory slots to model context information. Our proposed
-approach can extract both word level sequence features and lexical features.
-Performance evaluation shows that our method achieves state-of-the-art
-performance on a recently released emotion cause dataset, outperforming a
-number of competitive baselines by at least 3.01% in F-measure.
-"
-5536,1708.05515,"Seunghak Yu, Nilesh Kulkarni, Haejun Lee, Jihie Kim",Syllable-level Neural Language Model for Agglutinative Language,cs.CL," Language models for agglutinative languages have always been hindered in the
-past due to the myriad of agglutinations possible for any given word through
-various affixes. We propose a method to diminish the problem of out-of-vocabulary
-words by introducing an embedding derived from syllables and morphemes which
-leverages the agglutinative property. Our model outperforms character-level
-embedding in perplexity by 16.87 with 9.50M parameters. The proposed method
-achieves state-of-the-art performance over existing input prediction methods in
-terms of Key Stroke Saving and has been commercialized.
-"
-5537,1708.05521,"Edison Marrese-Taylor, Yutaka Matsuo","EmoAtt at EmoInt-2017: Inner attention sentence embedding for Emotion
- Intensity",cs.CL," In this paper we describe a deep learning system that has been designed and
-built for the WASSA 2017 Emotion Intensity Shared Task. We introduce a
-representation learning approach based on inner attention on top of an RNN.
-Results show that our model offers good capabilities and is able to
-successfully identify emotion-bearing words to predict intensity without
-leveraging lexicons, obtaining the 13th place among 22 shared task
-competitors.
-"
-5538,1708.05536,E. Manjavacas and J. de Gussem and W. Daelemans and M. Kestemont,"Assessing the Stylistic Properties of Neurally Generated Text in
- Authorship Attribution",cs.CL," Recent applications of neural language models have led to an increased
-interest in the automatic generation of natural language. However impressive
-these applications may be, the evaluation of neurally generated text has so far
-remained rather informal and anecdotal. Here, we present an attempt at the
-systematic assessment of one aspect of the quality of neurally generated text.
-We focus on a specific aspect of neural language generation: its ability to
-reproduce authorial writing styles. Using established models for authorship
-attribution, we empirically assess the stylistic qualities of neurally
-generated text. In comparison to conventional language models, neural models
-generate fuzzier text that is relatively harder to attribute correctly.
-Nevertheless, our results also suggest that neurally generated text offers more
-valuable perspectives for the augmentation of training data.
-"
-5539,1708.05565,"Yu Wang, Jiayi Liu, Yuxiang Liu, Jun Hao, Yang He, Jinghe Hu, Weipeng
- P. Yan, Mantian Li","LADDER: A Human-Level Bidding Agent for Large-Scale Real-Time Online
- Auctions",cs.LG cs.AI cs.CL cs.GT," We present LADDER, the first deep reinforcement learning agent that can
-successfully learn control policies for large-scale real-world problems
-directly from raw inputs composed of high-level semantic information. The agent
-is based on an asynchronous stochastic variant of DQN (Deep Q Network) named
-DASQN. The inputs of the agent are plain-text descriptions of states of a game
-of incomplete information, i.e.
real-time large-scale online auctions, and the
-rewards are auction profits of very large scale. We apply the agent to an
-essential portion of JD's online RTB (real-time bidding) advertising business
-and find that it easily beats the former state-of-the-art bidding policy that
-had been carefully engineered and calibrated by human experts: during JD.com's
-June 18th anniversary sale, the agent increased the company's ads revenue from
-the portion by more than 50%, while the advertisers' ROI (return on investment)
-also improved significantly.
-"
-5540,1708.05582,"Sushant Hiray, Venkatesh Duppada",Agree to Disagree: Improving Disagreement Detection with Dual GRUs,cs.CL," This paper presents models for detecting agreement/disagreement in online
-discussions. In this work we show that by using a Siamese inspired architecture
-to encode the discussions, we no longer need to rely on hand-crafted features
-to exploit the meta thread structure. We evaluate our model on existing online
-discussion corpora - ABCD, IAC and AWTP. Experimental results on the ABCD dataset
-show that by fusing lexical and word embedding features, our model achieves
-state-of-the-art performance with an average F1 score of 0.804. We also show
-that the model trained on the ABCD dataset performs competitively on relatively
-smaller annotated datasets (IAC and AWTP).
-"
-5541,1708.05592,"Xie Chen, Xunying Liu, Anton Ragni, Yu Wang, Mark Gales",Future Word Contexts in Neural Network Language Models,cs.CL," Recently, bidirectional recurrent network language models (bi-RNNLMs) have
-been shown to outperform standard, unidirectional, recurrent neural network
-language models (uni-RNNLMs) on a range of speech recognition tasks. This
-indicates that future word context information beyond the word history can be
-useful. However, bi-RNNLMs pose a number of challenges as they make use of the
-complete previous and future word context information. This impacts both
-training efficiency and their use within a lattice rescoring framework. In this
-paper these issues are addressed by proposing a novel neural network structure,
-succeeding word RNNLMs (su-RNNLMs). Instead of using a recurrent unit to
-capture the complete future word contexts, a feedforward unit is used to model
-a finite number of succeeding future words. This model can be trained much
-more efficiently than bi-RNNLMs and can also be used for lattice rescoring.
-Experimental results on a meeting transcription task (AMI) show that the proposed
-model consistently outperforms uni-RNNLMs and yields only a slight degradation
-compared to bi-RNNLMs in N-best rescoring. Additionally, performance
-improvements can be obtained using lattice rescoring and subsequent confusion
-network decoding.
-"
-5542,1708.05682,"Lu Huang, Jiasong Sun, Ji Xu and Yi Yang",An Improved Residual LSTM Architecture for Acoustic Modeling,cs.CL cs.AI cs.SD," Long Short-Term Memory (LSTM) is the primary recurrent neural networks
-architecture for acoustic modeling in automatic speech recognition systems.
-Residual learning is an efficient method to help neural networks converge
-easier and faster. In this paper, we propose several types of residual LSTM
-methods for our acoustic modeling. Our experiments indicate that, compared with
-classic LSTM, our architecture shows more than 8% relative reduction in Phone
-Error Rate (PER) on TIMIT tasks. At the same time, our residual fast LSTM
-approach shows 4% relative reduction in PER on the same task.
We also find
-that these architectures achieve good results on the THCHS-30, LibriSpeech and
-Switchboard corpora.
-"
-5543,1708.05719,"J\""org Tiedemann","Cross-Lingual Dependency Parsing for Closely Related Languages -
- Helsinki's Submission to VarDial 2017",cs.CL," This paper describes the submission from the University of Helsinki to the
-shared task on cross-lingual dependency parsing at VarDial 2017. We present
-work on annotation projection and treebank translation that gave good results
-for all three target languages in the test set. In particular, Slovak seems to
-work well with information coming from the Czech treebank, which is in line
-with related work. The attachment scores for cross-lingual models even surpass
-the fully supervised models trained on the target language treebank. Croatian
-is the most difficult language in the test set and the improvements over the
-baseline are rather modest. Norwegian works best with information coming from
-Swedish whereas Danish contributes surprisingly little.
-"
-5544,1708.05729,"Robert \""Ostling and J\""org Tiedemann",Neural machine translation for low-resource languages,cs.CL," Neural machine translation (NMT) approaches have improved the state of the
-art in many machine translation settings over the last couple of years, but
-they require large amounts of training data to produce sensible output. We
-demonstrate that NMT can be used for low-resource languages as well, by
-introducing more local dependencies and using word alignments to learn sentence
-reordering during translation. In addition to our novel model, we also present
-an empirical evaluation of low-resource phrase-based statistical machine
-translation (SMT) and NMT to investigate the lower limits of the respective
-technologies. We find that while SMT remains the best option for low-resource
-settings, our method can produce acceptable translations with only 70000 tokens
-of training data, a level where the baseline NMT system fails completely.
-"
-5545,1708.05763,"Richard Futrell, Edward Gibson, Hal Tily, Idan Blank, Anastasia
- Vishnevetsky, Steven T. Piantadosi, Evelina Fedorenko",The Natural Stories Corpus,cs.CL," It is now a common practice to compare models of human language processing by
-predicting participant reactions (such as reading times) to corpora consisting
-of rich naturalistic linguistic materials. However, many of the corpora used in
-these studies are based on naturalistic text and thus do not contain many of
-the low-frequency syntactic constructions that are often required to
-distinguish processing theories. Here we describe a new corpus consisting of
-English texts edited to contain many low-frequency syntactic constructions
-while still sounding fluent to native speakers. The corpus is annotated with
-hand-corrected parse trees and includes self-paced reading time data. Here we
-give an overview of the content of the corpus and release the data.
-"
-5546,1708.05797,Elnaz Davoodi and Leila Kosseim,CLaC @ QATS: Quality Assessment for Text Simplification,cs.CL," This paper describes our approach to the 2016 QATS quality assessment shared
-task. We trained three independent Random Forest classifiers in order to assess
-the quality of the simplified texts in terms of grammaticality, meaning
-preservation and simplicity. We used the language model of Google-Ngram as a
-feature to predict grammaticality. Meaning preservation is predicted using
-two complementary approaches based on word embedding and WordNet synonyms.
A -wider range of features including TF-IDF, sentence length and frequency of cue -phrases are used to evaluate the simplicity aspect. Overall, the accuracy of -the system ranges from 33.33% for the overall aspect to 58.73% for -grammaticality. -" -5547,1708.05798,"Majid Laali, Andre Cianflone and Leila Kosseim",The CLaC Discourse Parser at CoNLL-2016,cs.CL," This paper describes our submission ""CLaC"" to the CoNLL-2016 shared task on -shallow discourse parsing. We used two complementary approaches for the task. A -standard machine learning approach for the parsing of explicit relations, and a -deep learning approach for non-explicit relations. Overall, our parser achieves -an F1-score of 0.2106 on the identification of discourse relations (0.3110 for -explicit relations and 0.1219 for non-explicit relations) on the blind -CoNLL-2016 test set. -" -5548,1708.05800,Elnaz Davoodi and Leila Kosseim,On the Contribution of Discourse Structure on Text Complexity Assessment,cs.CL," This paper investigates the influence of discourse features on text -complexity assessment. To do so, we created two data sets based on the Penn -Discourse Treebank and the Simple English Wikipedia corpora and compared the -influence of coherence, cohesion, surface, lexical and syntactic features to -assess text complexity. - Results show that with both data sets coherence features are more correlated -to text complexity than the other types of features. In addition, feature -selection revealed that with both data sets the top most discriminating feature -is a coherence feature. -" -5549,1708.05801,Reda Siblini and Leila Kosseim,ClaC: Semantic Relatedness of Words and Phrases,cs.CL," The measurement of phrasal semantic relatedness is an important metric for -many natural language processing applications. In this paper, we present three -approaches for measuring phrasal semantics, one based on a semantic network -model, another on a distributional similarity model, and a hybrid between the -two. Our hybrid approach achieved an F-measure of 77.4% on the task of -evaluating the semantic similarity of words and compositional phrases. -" -5550,1708.05803,Shamima Mithun and Leila Kosseim,Measuring the Effect of Discourse Relations on Blog Summarization,cs.CL," The work presented in this paper attempts to evaluate and quantify the use of -discourse relations in the context of blog summarization and compare their use -to more traditional and factual texts. Specifically, we measured the usefulness -of 6 discourse relations - namely comparison, contingency, illustration, -attribution, topic-opinion, and attributive for the task of text summarization -from blogs. We have evaluated the effect of each relation using the TAC 2008 -opinion summarization dataset and compared them with the results with the DUC -2007 dataset. The results show that in both textual genres, contingency, -comparison, and illustration relations provide a significant improvement on -summarization content; while attribution, topic-opinion, and attributive -relations do not provide a consistent and significant improvement. These -results indicate that, at least for summarization, discourse relations are just -as useful for informal and affective texts as for more traditional news -articles. -" -5551,1708.05857,"Majid Laali, Elnaz Davoodi and Leila Kosseim",The CLaC Discourse Parser at CoNLL-2015,cs.CL," This paper describes our submission (kosseim15) to the CoNLL-2015 shared task -on shallow discourse parsing. 
We used the UIMA framework to develop our parser
-and used ClearTK to add machine learning functionality to the UIMA framework.
-Overall, our parser achieves a result of 17.3 F1 on the identification of
-discourse relations on the blind CoNLL-2015 test set, ranking in sixth place.
-"
-5552,1708.05873,Alexander Baturo and Niheer Dasandi and Slava J. Mikhaylov,"What Drives the International Development Agenda? An NLP Analysis of the
- United Nations General Debate 1970-2016",cs.CL cs.SI," There is surprisingly little known about agenda setting for international
-development in the United Nations (UN) despite it having a significant
-influence on the process and outcomes of development efforts. This paper
-addresses this shortcoming using a novel approach that applies natural language
-processing techniques to countries' annual statements in the UN General Debate.
-Every year UN member states deliver statements during the General Debate on
-their governments' perspective on major issues in world politics. These
-speeches provide invaluable information on state preferences on a wide range of
-issues, including international development, but have largely been overlooked
-in the study of global politics. This paper identifies the main international
-development topics that states raise in these speeches between 1970 and 2016,
-and examines the country-specific drivers of international development rhetoric.
-"
-5553,1708.05891,"Mohamed Eldesouki, Younes Samih, Ahmed Abdelali, Mohammed Attia, Hamdy
- Mubarak, Kareem Darwish, Kallmeyer Laura",Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM,cs.CL," Arabic word segmentation is essential for a variety of NLP applications such
-as machine translation and information retrieval. Segmentation entails breaking
-words into their constituent stems, affixes and clitics. In this paper, we
-compare two approaches for segmenting four major Arabic dialects using only
-several thousand training examples for each dialect. The two approaches involve
-posing the problem as a ranking problem, where an SVM ranker picks the best
-segmentation, and as a sequence labeling problem, where a bi-LSTM RNN coupled
-with CRF determines where best to segment words. We are able to achieve solid
-segmentation results for all dialects using rather limited training data. We
-also show that employing Modern Standard Arabic data for domain adaptation and
-assuming context independence improve overall results.
-"
-5554,1708.05942,"Robert \""Ostling and Yves Scherrer and J\""org Tiedemann and Gongbo
- Tang and Tommi Nieminen",The Helsinki Neural Machine Translation System,cs.CL," We introduce the Helsinki Neural Machine Translation system (HNMT) and how it
-is applied in the news translation task at WMT 2017, where it ranked first in
-both the human and automatic evaluations for English--Finnish. We discuss the
-success of English--Finnish translations and the overall advantage of NMT over
-a strong SMT baseline. We also discuss our submissions for English--Latvian,
-English--Chinese and Chinese--English.
-"
-5555,1708.05943,"J\""org Tiedemann and Yves Scherrer",Neural Machine Translation with Extended Context,cs.CL," We investigate the use of extended context in attention-based neural machine
-translation. We base our experiments on translated movie subtitles and discuss
-the effect of increasing the segments beyond single translation units. We study
-the use of extended source language context as well as bilingual context
-extensions.
The models learn to distinguish between information from different
-segments and are surprisingly robust with respect to translation quality. In
-this pilot study, we observe interesting cross-sentential attention patterns
-that improve textual coherence in translation at least in some selected cases.
-"
-5556,1708.05956,"Bing Liu, Ian Lane","An End-to-End Trainable Neural Network Model with Belief Tracking for
- Task-Oriented Dialog",cs.CL," We present a novel end-to-end trainable neural network model for
-task-oriented dialog systems. The model is able to track dialog state, issue
-API calls to knowledge base (KB), and incorporate structured KB query results
-into system responses to successfully complete task-oriented dialogs. The
-proposed model produces well-structured system responses by jointly learning
-belief tracking and KB result processing conditioning on the dialog history. We
-evaluate the model in a restaurant search domain using a dataset that is
-converted from the second Dialog State Tracking Challenge (DSTC2) corpus.
-Experiment results show that the proposed model can robustly track dialog state
-given the dialog history. Moreover, our model demonstrates promising results in
-producing appropriate system responses, outperforming prior end-to-end
-trainable neural network models using per-response accuracy evaluation metrics.
-"
-5557,1708.05963,"Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko",Neural Networks Compression for Language Modeling,stat.ML cs.CL cs.LG cs.NE," In this paper, we consider several compression techniques for the language
-modeling problem based on recurrent neural networks (RNNs). It is known that
-conventional RNNs, e.g., LSTM-based networks in language modeling, are
-characterized by either high space complexity or substantial inference time.
-This problem is especially crucial for mobile applications, in which the
-constant interaction with the remote server is inappropriate. By using the Penn
-Treebank (PTB) dataset we compare pruning, quantization, low-rank
-factorization, tensor train decomposition for LSTM networks in terms of model
-size and suitability for fast inference.
-"
-5558,1708.05992,Piotr \.Zelasko,"Expanding Abbreviations in a Strongly Inflected Language: Are
- Morphosyntactic Tags Sufficient?",cs.CL," In this paper, the problem of recovery of morphological information lost in
-abbreviated forms is addressed with a focus on highly inflected languages.
-Evidence is presented that the correct inflected form of an expanded
-abbreviation can in many cases be deduced solely from the morphosyntactic tags
-of the context. The prediction model is a deep bidirectional LSTM network with
-tag embedding. The training and evaluation data are gathered by finding the
-words which could have been abbreviated and using their corresponding
-morphosyntactic tags as the labels, while the tags of the context words are
-used as the input features for classification. The network is trained on over
-10 million words from the Polish Sejm Corpus and achieves 74.2% prediction
-accuracy on a smaller, but more general National Corpus of Polish. The analysis
-of errors suggests that performance in this task may improve if some prior
-knowledge about the abbreviated word is incorporated into the model.
-" -5559,1708.05997,"Youssef Oualil, Dietrich Klakow","A Batch Noise Contrastive Estimation Approach for Training Large - Vocabulary Language Models",cs.CL cs.AI," Training large vocabulary Neural Network Language Models (NNLMs) is a -difficult task due to the explicit requirement of the output layer -normalization, which typically involves the evaluation of the full softmax -function over the complete vocabulary. This paper proposes a Batch Noise -Contrastive Estimation (B-NCE) approach to alleviate this problem. This is -achieved by reducing the vocabulary, at each time step, to the target words in -the batch and then replacing the softmax by the noise contrastive estimation -approach, where these words play the role of targets and noise samples at the -same time. In doing so, the proposed approach can be fully formulated and -implemented using optimal dense matrix operations. Applying B-NCE to train -different NNLMs on the Large Text Compression Benchmark (LTCB) and the One -Billion Word Benchmark (OBWB) shows a significant reduction of the training -time with no noticeable degradation of the models performance. This paper also -presents a new baseline comparative study of different standard NNLMs on the -large OBWB on a single Titan-X GPU. -" -5560,1708.06000,"Wei Wei, Kennth Joseph, Kathleen Carley","Efficient Online Inference for Infinite Evolutionary Cluster models with - Applications to Latent Social Event Discovery",cs.AI cs.CL cs.SI," The Recurrent Chinese Restaurant Process (RCRP) is a powerful statistical -method for modeling evolving clusters in large scale social media data. With -the RCRP, one can allow both the number of clusters and the cluster parameters -in a model to change over time. However, application of the RCRP has largely -been limited due to the non-conjugacy between the cluster evolutionary priors -and the Multinomial likelihood. This non-conjugacy makes inference di cult and -restricts the scalability of models which use the RCRP, leading to the RCRP -being applied only in simple problems, such as those that can be approximated -by a single Gaussian emission. In this paper, we provide a novel solution for -the non-conjugacy issues for the RCRP and an example of how to leverage our -solution for one speci c problem - the social event discovery problem. By -utilizing Sequential Monte Carlo methods in inference, our approach can be -massively paralleled and is highly scalable, to the extent it can work on tens -of millions of documents. We are able to generate high quality topical and -location distributions of the clusters that can be directly interpreted as real -social events, and our experimental results suggest that the approaches -proposed achieve much better predictive performance than techniques reported in -prior work. We also demonstrate how the techniques we develop can be used in a -much more general ways toward similar problems. -" -5561,1708.06022,"Li Dong, Jonathan Mallinson, Siva Reddy, Mirella Lapata",Learning to Paraphrase for Question Answering,cs.CL," Question answering (QA) systems are sensitive to the many different ways -natural language expresses the same information need. In this paper we turn to -paraphrases as a means of capturing this knowledge and present a general -framework which learns felicitous paraphrases for various QA tasks. Our method -is trained end-to-end using question-answer pairs as a supervision signal. 
A -question and its paraphrases serve as input to a neural scoring model which -assigns higher weights to linguistic expressions most likely to yield correct -answers. We evaluate our approach on QA over Freebase and answer sentence -selection. Experimental results on three datasets show that our framework -consistently improves performance, achieving competitive results despite the -use of simple QA models. -" -5562,1708.06025,"Nathan Hartmann and Erick Fonseca and Christopher Shulby and Marcos - Treviso and Jessica Rodrigues and Sandra Aluisio","Portuguese Word Embeddings: Evaluating on Word Analogies and Natural - Language Tasks",cs.CL," Word embeddings have been found to provide meaningful representations for -words in an efficient way; therefore, they have become common in Natural -Language Processing systems. In this paper, we evaluated different word -embedding models trained on a large Portuguese corpus, including both Brazilian -and European variants. We trained 31 word embedding models using FastText, -GloVe, Wang2Vec and Word2Vec. We evaluated them intrinsically on syntactic and -semantic analogies and extrinsically on POS tagging and sentence semantic -similarity tasks. The obtained results suggest that word analogies are not -appropriate for word embedding evaluation; task-specific evaluations appear to -be a better option. -" -5563,1708.06068,Barathi Ganesh HB and Anand Kumar M and Soman KP,Vector Space Model as Cognitive Space for Text Classification,cs.CL cs.AI cs.SI," In this era of digitization, knowing the user's sociolect aspects has become -essential for building user-specific recommendation systems. These sociolect -aspects can be found by mining the language users share as text in -social media and reviews. This paper describes the experiment that was -performed in the PAN Author Profiling 2017 shared task. The objective of the -task is to find the sociolect aspects of the users from their tweets. The -sociolect aspects considered in this experiment are the user's gender and native -language information. Here, users' tweets written in a language different from -their native language are represented as a Document - Term Matrix with -document frequency as the constraint. Further classification is done using a -Support Vector Machine by taking gender and native language as target classes. -This experiment attains an average accuracy of 73.42% in gender prediction and -76.26% in the native language identification task. -" -5564,1708.06073,"W. Xiong, L. Wu, F. Alleva, J. Droppo, X. Huang, A. Stolcke",The Microsoft 2017 Conversational Speech Recognition System,cs.CL," We describe the 2017 version of Microsoft's conversational speech recognition -system, in which we update our 2016 system with recent developments in -neural-network-based acoustic and language modeling to further advance the -state of the art on the Switchboard speech recognition task. The system adds a -CNN-BLSTM acoustic model to the set of model architectures we combined -previously, and includes character-based and dialog session aware LSTM language -models in rescoring. For system combination we adopt a two-stage approach, -whereby subsets of acoustic models are first combined at the senone/frame -level, followed by a word-level voting via confusion networks. We also added a -confusion network rescoring step after system combination. The resulting system -yields a 5.1\% word error rate on the 2000 Switchboard evaluation set.
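The profiling pipeline in 1708.06068 above is close to a textbook scikit-learn setup: a Document-Term Matrix feeding a Support Vector Machine. A hedged sketch with invented toy tweets and labels (the actual feature constraints and data differ):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = ["love this new phone", "match day at the stadium",
          "baking bread again", "the transfer window is wild"]
gender = ["F", "M", "F", "M"]        # toy target labels

model = make_pipeline(
    CountVectorizer(min_df=1),       # min_df plays the document-frequency role
    LinearSVC(),
)
model.fit(tweets, gender)
print(model.predict(["bread and phones"]))
```

The same pipeline, refit with native-language labels as the target class, covers the second subtask.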
-" -5565,1708.06075,"Yi Luan, Mari Ostendorf and Hannaneh Hajishirzi",Scientific Information Extraction with Semi-supervised Neural Tagging,cs.CL," This paper addresses the problem of extracting keyphrases from scientific -articles and categorizing them as corresponding to a task, process, or -material. We cast the problem as sequence tagging and introduce semi-supervised -methods to a neural tagging model, which builds on recent advances in named -entity recognition. Since annotated training data is scarce in this domain, we -introduce a graph-based semi-supervised algorithm together with a data -selection scheme to leverage unannotated articles. Both inductive and -transductive semi-supervised learning strategies outperform state-of-the-art -information extraction performance on the 2017 SemEval Task 10 ScienceIE task. -" -5566,1708.06185,"Venkatesh Duppada, Sushant Hiray",Seernet at EmoInt-2017: Tweet Emotion Intensity Estimator,cs.CL," The paper describes experiments on estimating emotion intensity in tweets -using a generalized regressor system. The system combines lexical, syntactic -and pre-trained word embedding features, trains them on general regressors and -finally combines the best performing models to create an ensemble. The proposed -system stood 3rd out of 22 systems in the leaderboard of WASSA-2017 Shared Task -on Emotion Intensity. -" -5567,1708.06266,"Zied Bouraoui, Shoaib Jameel, Steven Schockaert",Probabilistic Relation Induction in Vector Space Embeddings,cs.AI cs.CL," Word embeddings have been found to capture a surprisingly rich amount of -syntactic and semantic knowledge. However, it is not yet sufficiently -well-understood how the relational knowledge that is implicitly encoded in word -embeddings can be extracted in a reliable way. In this paper, we propose two -probabilistic models to address this issue. The first model is based on the -common relations-as-translations view, but is cast in a probabilistic setting. -Our second model is based on the much weaker assumption that there is a linear -relationship between the vector representations of related words. Compared to -existing approaches, our models lead to more accurate predictions, and they are -more explicit about what can and cannot be extracted from the word embedding. -" -5568,1708.06426,"Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates",Cold Fusion: Training Seq2Seq Models Together with Language Models,cs.CL," Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks -which involve generating natural language sentences such as machine -translation, image captioning and speech recognition. Performance has further -been improved by leveraging unlabeled data, often in the form of a language -model. In this work, we present the Cold Fusion method, which leverages a -pre-trained language model during training, and show its effectiveness on the -speech recognition task. We show that Seq2Seq models with Cold Fusion are able -to better utilize language information enjoying i) faster convergence and -better generalization, and ii) almost complete transfer to a new domain while -using less than 10% of the labeled training data. -" -5569,1708.06510,"Frederick Liu, Han Lu, Graham Neubig",Handling Homographs in Neural Machine Translation,cs.CL," Homographs, words with different meanings but the same surface form, have -long caused difficulty for machine translation systems, as it is difficult to -select the correct translation based on the context. 
However, with the advent -of neural machine translation (NMT) systems, which can theoretically take into -account global sentential context, one may hypothesize that this problem has -been alleviated. In this paper, we first provide empirical evidence that -existing NMT systems in fact still have significant problems in properly -translating ambiguous words. We then proceed to describe methods, inspired by -the word sense disambiguation literature, that model the context of the input -word with context-aware word embeddings that help to differentiate the word -sense before feeding it into the encoder. Experiments on three language pairs -demonstrate that such models improve the performance of NMT systems both in -terms of BLEU score and in the accuracy of translating homographs. -" -5570,1708.06550,"Bart{\l}omiej Balcerzak, Rados{\l}aw Nielek","Golden Years, Golden Shores: A Study of Elders in Online Travel - Communities",cs.CL cs.CY," In this paper we present our exploratory findings related to extracting -knowledge and experiences from a community of senior tourists. By using tools -of qualitative analysis as well as a review of the literature, we managed to verify -a set of hypotheses related to the content created by senior tourists when -participating in on-line communities. We also produced a codebook, representing -various themes one may encounter in such communities. This codebook, derived -from our own qualitative research as well as a literature review, will serve as a -basis for further development of automated tools of knowledge extraction. We -also found that older adults, more often than other posters in tourist forums, -mention their age in discussions and more often share their experiences and -motivation to travel; however, they do not differ with respect to describing -barriers encountered while traveling. -" -5571,1708.06555,"Youssef Oualil, Mittul Singh, Clayton Greenberg, Dietrich Klakow",Long-Short Range Context Neural Networks for Language Modeling,cs.CL cs.LG," The goal of language modeling techniques is to capture the statistical and -structural properties of natural languages from training corpora. This task -typically involves the learning of short range dependencies, which generally -model the syntactic properties of a language and/or long range dependencies, -which are semantic in nature. We propose in this paper a new multi-span -architecture, which separately models the short and long context information -while it dynamically merges them to perform the language modeling task. This is -done through a novel recurrent Long-Short Range Context (LSRC) network, which -explicitly models the local (short) and global (long) context using two -separate hidden states that evolve in time. This new architecture is an -adaptation of the Long-Short Term Memory network (LSTM) to take into account -the linguistic properties. Extensive experiments conducted on the Penn Treebank -(PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a -significant reduction of the perplexity when compared to state-of-the-art -language modeling techniques. -" -5572,1708.06708,"Reza Takhshid, Adel Rahimi",A rule based algorithm for detecting negative words in Persian,cs.CL," In this paper, we present a novel method for detecting negative words in -Persian. We first used an algorithm to build an exceptions list, which was later -modified by hand. We then used the mentioned lists and a Persian polarity -corpus in our rule based algorithm to detect negative words.
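The rule-plus-exceptions design of 1708.06708 above can be sketched schematically; the prefixes and the exceptions list below are hypothetical stand-ins, not the authors' actual resources:

```python
# Hypothetical Persian negation prefixes (n-/nemi-) and exception words
# that merely begin with the prefix without being negative.
NEG_PREFIXES = ("نمی", "ن")
EXCEPTIONS = {"نان", "نرم"}

def is_negative(word):
    if word in EXCEPTIONS:       # consult the hand-curated exceptions list
        return False
    return word.startswith(NEG_PREFIXES)

print([w for w in ["نرفت", "نان", "رفت"] if is_negative(w)])  # ['نرفت']
```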
-" -5573,1708.06828,"Bonggun Shin, Falgun H. Chokshi, Timothy Lee and Jinho D. Choi",Classification of Radiology Reports Using Neural Attention Models,cs.CL cs.AI cs.IR," The electronic health record (EHR) contains a large amount of -multi-dimensional and unstructured clinical data of significant operational and -research value. Distinguished from previous studies, our approach embraces a -double-annotated dataset and strays away from obscure ""black-box"" models to -comprehensive deep learning models. In this paper, we present a novel neural -attention mechanism that not only classifies clinically important findings. -Specifically, convolutional neural networks (CNN) with attention analysis are -used to classify radiology head computed tomography reports based on five -categories that radiologists would account for in assessing acute and -communicable findings in daily practice. The experiments show that our CNN -attention models outperform non-neural models, especially when trained on a -larger dataset. Our attention analysis demonstrates the intuition behind the -classifier's decision by generating a heatmap that highlights attended terms -used by the CNN model; this is valuable when potential downstream medical -decisions are to be performed by human experts or the classifier information is -to be used in cohort construction such as for epidemiological studies. -" -5574,1708.06872,"Yilin Zhang, Marie Poux-Berthe, Chris Wells, Karolina Koc-Michalska, - and Karl Rohe","Discovering Political Topics in Facebook Discussion threads with Graph - Contextualization",stat.AP cs.CL physics.soc-ph," We propose a graph contextualization method, pairGraphText, to study -political engagement on Facebook during the 2012 French presidential election. -It is a spectral algorithm that contextualizes graph data with text data for -online discussion thread. In particular, we examine the Facebook posts of the -eight leading candidates and the comments beneath these posts. We find evidence -of both (i) candidate-centered structure, where citizens primarily comment on -the wall of one candidate and (ii) issue-centered structure (i.e. on political -topics), where citizens' attention and expression is primarily directed towards -a specific set of issues (e.g. economics, immigration, etc). To identify -issue-centered structure, we develop pairGraphText, to analyze a network with -high-dimensional features on the interactions (i.e. text). This technique -scales to hundreds of thousands of nodes and thousands of unique words. In the -Facebook data, spectral clustering without the contextualizing text information -finds a mixture of (i) candidate and (ii) issue clusters. The contextualized -information with text data helps to separate these two structures. We conclude -by showing that the novel methodology is consistent under a statistical model. -" -5575,1708.06989,Youssef Oualil and Dietrich Klakow,A Neural Network Approach for Mixing Language Models,cs.CL cs.AI," The performance of Neural Network (NN)-based language models is steadily -improving due to the emergence of new architectures, which are able to learn -different natural language characteristics. This paper presents a novel -framework, which shows that a significant improvement can be achieved by -combining different existing heterogeneous models in a single architecture. -This is done through 1) a feature layer, which separately learns different -NN-based models and 2) a mixture layer, which merges the resulting model -features. 
In doing so, this architecture benefits from the learning -capabilities of each model with no noticeable increase in the number of model -parameters or the training time. Extensive experiments conducted on the Penn -Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a -significant reduction of the perplexity when compared to state-of-the-art -feedforward as well as recurrent neural network architectures. -" -5576,1708.07104,"Ver\'onica P\'erez-Rosas, Bennett Kleinberg, Alexandra Lefevre, Rada - Mihalcea",Automatic Detection of Fake News,cs.CL," The proliferation of misleading information in everyday access media outlets -such as social media feeds, news blogs, and online newspapers has made it -challenging to identify trustworthy news sources, thus increasing the need for -computational tools able to provide insights into the reliability of online -content. In this paper, we focus on the automatic identification of fake -content in online news. Our contribution is twofold. First, we introduce two -novel datasets for the task of fake news detection, covering seven different -news domains. We describe the collection, annotation, and validation process in -detail and present several exploratory analyses on the identification of -linguistic differences in fake and legitimate news content. Second, we conduct -a set of learning experiments to build accurate fake news detectors. In -addition, we provide comparative analyses of the automatic and manual -identification of fake news. -" -5577,1708.07149,"Ryan Lowe, Michael Noseworthy, Iulian V. Serban, Nicolas - Angelard-Gontier, Yoshua Bengio, Joelle Pineau","Towards an Automatic Turing Test: Learning to Evaluate Dialogue - Responses",cs.CL cs.AI cs.LG," Automatically evaluating the quality of dialogue responses for unstructured -domains is a challenging problem. Unfortunately, existing automatic evaluation -metrics are biased and correlate very poorly with human judgements of response -quality. Yet having an accurate automatic evaluation procedure is crucial for -dialogue research, as it allows rapid prototyping and testing of new models -with fewer expensive human evaluations. In response to this challenge, we -formulate automatic dialogue evaluation as a learning problem. We present an -evaluation model (ADEM) that learns to predict human-like scores to input -responses, using a new dataset of human response scores. We show that the ADEM -model's predictions correlate significantly, and at a level much higher than -word-overlap metrics such as BLEU, with human judgements at both the utterance -and system-level. We also show that ADEM can generalize to evaluating dialogue -models unseen during training, an important step for automatic dialogue -evaluation. -" -5578,1708.07241,"Thai-Hoang Pham, Xuan-Khoai Pham, Tuan-Anh Nguyen, Phuong Le-Hong",NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit,cs.CL," This paper demonstrates a neural network-based toolkit, namely NNVLP, for -essential Vietnamese language processing tasks including part-of-speech (POS) -tagging, chunking, and named entity recognition (NER). Our toolkit is a combination -of bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network -(CNN), and Conditional Random Field (CRF), using pre-trained word embeddings as -input, which achieves state-of-the-art results on these three tasks. We provide -both an API and a web demo for this toolkit.
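The learned metric in ADEM (1708.07149 above) is often summarized as a pair of bilinear terms comparing a model response to the dialogue context and to a reference response. A sketch of that scoring form, with placeholder matrices, dimensions and centering constants:

```python
import numpy as np

def adem_score(c, r_ref, r_model, M, N, alpha=0.0, beta=1.0):
    """Bilinear comparison of the model response against context and reference."""
    return (c @ M @ r_model + r_ref @ N @ r_model - alpha) / beta

rng = np.random.default_rng(3)
d = 6
c, r_ref, r_model = (rng.standard_normal(d) for _ in range(3))
M, N = rng.standard_normal((d, d)), rng.standard_normal((d, d))
print(adem_score(c, r_ref, r_model, M, N))
```

In the paper the vectors come from a learned recurrent encoder and M, N are trained against human scores; the random values here are purely illustrative.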
-" -5579,1708.07252,Dengliang Shi,A Study on Neural Network Language Modeling,cs.CL cs.AI," An exhaustive study on neural network language modeling (NNLM) is performed -in this paper. Different architectures of basic neural network language models -are described and examined. A number of different improvements over basic -neural network language models, including importance sampling, word classes, -caching and bidirectional recurrent neural network (BiRNN), are studied -separately, and the advantages and disadvantages of every technique are -evaluated. Then, the limits of neural network language modeling are explored -from the aspects of model architecture and knowledge representation. Part of -the statistical information from a word sequence will loss when it is processed -word by word in a certain order, and the mechanism of training neural network -by updating weight matrixes and vectors imposes severe restrictions on any -significant enhancement of NNLM. For knowledge representation, the knowledge -represented by neural network language models is the approximate probabilistic -distribution of word sequences from a certain training data set rather than the -knowledge of a language itself or the information conveyed by word sequences in -a natural language. Finally, some directions for improving neural network -language modeling further is discussed. -" -5580,1708.07265,"Henrique F. de Arruda, Vanessa Q. Marinho, Thales S. Lima, Diego R. - Amancio, Luciano da F. Costa",An Image Analysis Approach to the Calligraphy of Books,cs.CL cs.CV," Text network analysis has received increasing attention as a consequence of -its wide range of applications. In this work, we extend a previous work founded -on the study of topological features of mesoscopic networks. Here, the -geometrical properties of visualized networks are quantified in terms of -several image analysis techniques and used as subsidies for authorship -attribution. It was found that the visual features account for performance -similar to that achieved by using topological measurements. In addition, the -combination of these two types of features improved the performance. -" -5581,1708.07279,"Jie Yang, Zhiyang Teng, Meishan Zhang, and Yue Zhang",Combining Discrete and Neural Features for Sequence Labeling,cs.CL," Neural network models have recently received heated research attention in the -natural language processing community. Compared with traditional models with -discrete features, neural models have two main advantages. First, they take -low-dimensional, real-valued embedding vectors as inputs, which can be trained -over large raw data, thereby addressing the issue of feature sparsity in -discrete models. Second, deep neural networks can be used to automatically -combine input features, and including non-local features that capture semantic -patterns that cannot be expressed using discrete indicator features. As a -result, neural network models have achieved competitive accuracies compared -with the best discrete models for a range of NLP tasks. - On the other hand, manual feature templates have been carefully investigated -for most NLP tasks over decades and typically cover the most useful indicator -pattern for solving the problems. Such information can be complementary the -features automatically induced from neural networks, and therefore combining -discrete and neural features can potentially lead to better accuracy compared -with models that leverage discrete or neural features only. 
- In this paper, we systematically investigate the effect of discrete and -neural feature combination for a range of fundamental NLP tasks based on -sequence labeling, including word segmentation, POS tagging and named entity -recognition for Chinese and English, respectively. Our results on standard -benchmarks show that state-of-the-art neural models can give accuracies -comparable to the best discrete models in the literature for most tasks and -combining discrete and neural features consistently yields better results. -" -5582,1708.07403,"Rasmus Berg Palm, Ole Winther, Florian Laws","CloudScan - A configuration-free invoice analysis system using recurrent - neural networks",cs.CL," We present CloudScan; an invoice analysis system that requires zero -configuration or upfront annotation. In contrast to previous work, CloudScan -does not rely on templates of invoice layout, instead it learns a single global -model of invoices that naturally generalizes to unseen invoice layouts. The -model is trained using data automatically extracted from end-user provided -feedback. This automatic training data extraction removes the requirement for -users to annotate the data precisely. We describe a recurrent neural network -model that can capture long range context and compare it to a baseline logistic -regression model corresponding to the current CloudScan production system. We -train and evaluate the system on 8 important fields using a dataset of 326,471 -invoices. The recurrent neural network and baseline model achieve 0.891 and -0.887 average F1 scores respectively on seen invoice layouts. For the harder -task of unseen invoice layouts, the recurrent neural network model outperforms -the baseline with 0.840 average F1 compared to 0.788. -" -5583,1708.07476,"Kevin K. Bowden, Grace I. Lin, Lena I. Reed, Marilyn A. Walker",M2D: Monolog to Dialog Generation for Conversational Story Telling,cs.CL," Storytelling serves many different social functions, e.g. stories are used to -persuade, share troubles, establish shared values, learn social behaviors, and -entertain. Moreover, stories are often told conversationally through dialog, -and previous work suggests that information provided dialogically is more -engaging than when provided in monolog. In this paper, we present algorithms -for converting a deep representation of a story into a dialogic storytelling, -that can vary aspects of the telling, including the personality of the -storytellers. We conduct several experiments to test whether dialogic -storytellings are more engaging, and whether automatically generated variants -in linguistic form that correspond to personality differences can be recognized -in an extended storytelling dialog. -" -5584,1708.07524,DeLiang Wang and Jitong Chen,Supervised Speech Separation Based on Deep Learning: An Overview,cs.CL cs.LG cs.NE cs.SD," Speech separation is the task of separating target speech from background -interference. Traditionally, speech separation is studied as a signal -processing problem. A more recent approach formulates speech separation as a -supervised learning problem, where the discriminative patterns of speech, -speakers, and background noise are learned from training data. Over the past -decade, many supervised separation algorithms have been put forward. In -particular, the recent introduction of deep learning to supervised speech -separation has dramatically accelerated progress and boosted separation -performance.
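The discrete-plus-neural combination studied in 1708.07279 above amounts, at the input level, to concatenating indicator template features with embeddings so a downstream tagger can exploit both views. A toy sketch (the templates and the tiny embedding table are invented here):

```python
import numpy as np

# Hypothetical two-dimensional embedding table.
EMB = {"dog": np.array([0.3, -0.1]), "barks": np.array([0.8, 0.5])}

def discrete_features(word, prev_word):
    return np.array([
        word.istitle(),              # capitalization template
        word.endswith("s"),          # suffix template
        prev_word == "the",          # context template
    ], dtype=float)

def combined_input(word, prev_word):
    emb = EMB.get(word, np.zeros(2))
    return np.concatenate([discrete_features(word, prev_word), emb])

print(combined_input("barks", "dog"))   # 3 discrete dims + 2 neural dims
```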
This article provides a comprehensive overview of the research on -deep learning based supervised speech separation in the last several years. We -first introduce the background of speech separation and the formulation of -supervised separation. Then we discuss three main components of supervised -separation: learning machines, training targets, and acoustic features. Much of -the overview is on separation algorithms where we review monaural methods, -including speech enhancement (speech-nonspeech separation), speaker separation -(multi-talker separation), and speech dereverberation, as well as -multi-microphone techniques. The important issue of generalization, unique to -supervised learning, is discussed. This overview provides a historical -perspective on how advances are made. In addition, we discuss a number of -conceptual issues, including what constitutes the target source. -" -5585,1708.07624,"Tommaso Soru, Edgard Marx, Diego Moussallem, Gustavo Publio, Andr\'e - Valdestilhas, Diego Esteves, Ciro Baron Neto",SPARQL as a Foreign Language,cs.CL cs.DB," In the last years, the Linked Data Cloud has achieved a size of more than 100 -billion facts pertaining to a multitude of domains. However, accessing this -information has been significantly challenging for lay users. Approaches to -problems such as Question Answering on Linked Data and Link Discovery have -notably played a role in increasing information access. These approaches are -often based on handcrafted and/or statistical models derived from data -observation. Recently, Deep Learning architectures based on Neural Networks -called seq2seq have shown to achieve state-of-the-art results at translating -sequences into sequences. In this direction, we propose Neural SPARQL Machines, -end-to-end deep architectures to translate any natural language expression into -sentences encoding SPARQL queries. Our preliminary results, restricted to -selected DBpedia classes, show that Neural SPARQL Machines are a promising -approach for Question Answering on Linked Data, as they can deal with known -problems such as vocabulary mismatch and perform graph pattern composition. -" -5586,1708.07690,Demian Gholipour Ghalandari,"Revisiting the Centroid-based Method: A Strong Baseline for - Multi-Document Summarization",cs.CL," The centroid-based model for extractive document summarization is a simple -and fast baseline that ranks sentences based on their similarity to a centroid -vector. In this paper, we apply this ranking to possible summaries instead of -sentences and use a simple greedy algorithm to find the best summary. -Furthermore, we show possibilities to scale up to larger input document -collections by selecting a small number of sentences from each document prior -to constructing the summary. Experiments were done on the DUC2004 dataset for -multi-document summarization. We observe a higher performance over the -original model, on par with more complex state-of-the-art methods. -" -5587,1708.07722,"Xinying Chen, Carlos G\'omez-Rodr\'iguez and Ramon Ferrer-i-Cancho",A dependency look at the reality of constituency,cs.CL cs.SI physics.soc-ph," A comment on ""Neurophysiological dynamics of phrase-structure building during -sentence processing"" by Nelson et al (2017), Proceedings of the National -Academy of Sciences USA 114(18), E3669-E3678.
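The centroid baseline of 1708.07690 above is almost fully specified by its abstract. A compact sketch; note the paper ranks candidate summaries, while this simplified version greedily adds the best-scoring sentences, a deliberate shortcut:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def centroid_summary(sentences, k=2):
    """Pick the k sentences closest (cosine) to the tf-idf centroid."""
    X = TfidfVectorizer().fit_transform(sentences).toarray()
    centroid = X.mean(axis=0)
    def cos(v):
        return v @ centroid / ((np.linalg.norm(v) *
                                np.linalg.norm(centroid)) or 1.0)
    ranked = sorted(range(len(sentences)), key=lambda i: cos(X[i]),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]  # original order

docs = ["the summit discussed trade tariffs",
        "tariffs dominated the trade summit agenda",
        "a light rain fell in the evening"]
print(centroid_summary(docs))
```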
-" -5588,1708.07863,"Zhiguo Wang, Wael Hamza, Linfeng Song",$k$-Nearest Neighbor Augmented Neural Networks for Text Classification,cs.CL cs.AI," In recent years, many deep-learning based models are proposed for text -classification. This kind of models well fits the training set from the -statistical point of view. However, it lacks the capacity of utilizing -instance-level information from individual instances in the training set. In -this work, we propose to enhance neural network models by allowing them to -leverage information from $k$-nearest neighbor (kNN) of the input text. Our -model employs a neural network that encodes texts into text embeddings. -Moreover, we also utilize $k$-nearest neighbor of the input text as an external -memory, and utilize it to capture instance-level information from the training -set. The final prediction is made based on features from both the neural -network encoder and the kNN memory. Experimental results on several standard -benchmark datasets show that our model outperforms the baseline model on all -the datasets, and it even beats a very deep neural network model (with 29 -layers) in several datasets. Our model also shows superior performance when -training instances are scarce, and when the training set is severely -unbalanced. Our model also leverages techniques such as semi-supervised -training and transfer learning quite well. -" -5589,1708.07903,"Junting Ye, Shuchu Han, Yifan Hu, Baris Coskun, Meizhu Liu, Hong Qin, - Steven Skiena",Nationality Classification Using Name Embeddings,cs.SI cs.CL," Nationality identification unlocks important demographic information, with -many applications in biomedical and sociological research. Existing name-based -nationality classifiers use name substrings as features and are trained on -small, unrepresentative sets of labeled names, typically extracted from -Wikipedia. As a result, these methods achieve limited performance and cannot -support fine-grained classification. - We exploit the phenomena of homophily in communication patterns to learn name -embeddings, a new representation that encodes gender, ethnicity, and -nationality which is readily applicable to building classifiers and other -systems. Through our analysis of 57M contact lists from a major Internet -company, we are able to design a fine-grained nationality classifier covering -39 groups representing over 90% of the world population. In an evaluation -against other published systems over 13 common classes, our F1 score (0.795) is -substantial better than our closest competitor Ethnea (0.580). To the best of -our knowledge, this is the most accurate, fine-grained nationality classifier -available. - As a social media application, we apply our classifiers to the followers of -major Twitter celebrities over six different domains. We demonstrate stark -differences in the ethnicities of the followers of Trump and Obama, and in the -sports and entertainments favored by different groups. Finally, we identify an -anomalous political figure whose presumably inflated following appears largely -incapable of reading the language he posts in. -" -5590,1708.07918,"Mo Yu, Xiaoxiao Guo, Jinfeng Yi, Shiyu Chang, Saloni Potdar, Gerald - Tesauro, Haoyu Wang, Bowen Zhou",Robust Task Clustering for Deep Many-Task Learning,cs.LG cs.AI cs.CL stat.ML," We investigate task clustering for deep-learning based multi-task and -few-shot learning in a many-task setting. 
We propose a new method to measure -task similarities with a cross-task transfer performance matrix for the deep -learning scenario. Although this matrix provides us with critical information -regarding similarity between tasks, its asymmetric property and unreliable -performance scores can affect conventional clustering methods adversely. -Additionally, the uncertain task-pairs, i.e., the ones with extremely -asymmetric transfer scores, may collectively mislead clustering algorithms to -output an inaccurate task-partition. To overcome these limitations, we propose -a novel task-clustering algorithm by using the matrix completion technique. The -proposed algorithm constructs a partially-observed similarity matrix based on -the certainty of cluster membership of the task-pairs. We then use a matrix -completion algorithm to complete the similarity matrix. Our theoretical -analysis shows that under mild constraints, the proposed algorithm will -perfectly recover the underlying ""true"" similarity matrix with a high -probability. Our results show that the new task clustering method can discover -task clusters for training flexible and superior neural network models in a -multi-task learning setup for sentiment classification and dialog intent -classification tasks. Our task clustering approach also extends metric-based -few-shot learning methods to adapt multiple metrics, which demonstrates -empirical advantages when the tasks are diverse. -" -5591,1708.07950,"Raj Nath Patel, Prakash B. Pimpale, M Sasikumar",Machine Translation in Indian Languages: Challenges and Resolution,cs.CL," English to Indian language machine translation poses the challenge of -structural and morphological divergence. This paper describes English to Indian -language statistical machine translation using pre-ordering and suffix -separation. The pre-ordering uses rules to transfer the structure of the source -sentences prior to training and translation. This syntactic restructuring helps -statistical machine translation to tackle the structural divergence and hence -achieve better translation quality. The suffix separation is used to tackle the -morphological divergence between English and highly agglutinative Indian -languages. We demonstrate that the use of pre-ordering and suffix separation -helps in improving the quality of English to Indian Language machine -translation. -" -5592,1708.08123,"Ankit Vadehra, Maura R. Grossman and Gordon V. Cormack",Impact of Feature Selection on Micro-Text Classification,cs.IR cs.CL," Social media datasets, especially Twitter tweets, are popular in the field of -text classification. Tweets are a valuable source of micro-text (sometimes -referred to as ""micro-blogs""), and have been studied in domains such as -sentiment analysis, recommendation systems, spam detection, clustering, among -others. Tweets often include keywords referred to as ""Hashtags"" that can be -used as labels for the tweet. Using tweets encompassing 50 labels, we studied -the impact of word versus character-level feature selection and extraction on -different learners to solve a multi-class classification task. We show that -feature extraction of simple character-level groups performs better than simple -word groups and pre-processing methods like normalizing using Porter's Stemming -and Part-of-Speech (""POS"")-Lemmatization.
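The word- versus character-level contrast in 1708.08123 above is easy to reproduce with scikit-learn's analyzers; 'char_wb' keeps n-grams inside word boundaries, which is what makes simple character groups robust to the noisy spelling of micro-text. The example tweet is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer

tweet = ["gr8 goooal #worldcup"]
words = CountVectorizer(analyzer="word").fit(tweet)
chars = CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit(tweet)
print(len(words.vocabulary_), "word features vs",
      len(chars.vocabulary_), "character features")
```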
-" -5593,1708.08289,Dar\'io Garigliotti and Krisztian Balog,Generating Query Suggestions to Support Task-Based Search,cs.IR cs.AI cs.CL," We address the problem of generating query suggestions to support users in -completing their underlying tasks (which motivated them to search in the first -place). Given an initial query, these query suggestions should provide a -coverage of possible subtasks the user might be looking for. We propose a -probabilistic modeling framework that obtains keyphrases from multiple sources -and generates query suggestions from these keyphrases. Using the test suites of -the TREC Tasks track, we evaluate and analyze each component of our model. -" -5594,1708.08291,Dar\'io Garigliotti and Krisztian Balog,On Type-Aware Entity Retrieval,cs.IR cs.AI cs.CL," Today, the practice of returning entities from a knowledge base in response -to search queries has become widespread. One of the distinctive characteristics -of entities is that they are typed, i.e., assigned to some hierarchically -organized type system (type taxonomy). The primary objective of this paper is -to gain a better understanding of how entity type information can be utilized -in entity retrieval. We perform this investigation in an idealized ""oracle"" -setting, assuming that we know the distribution of target types of the relevant -entities for a given query. We perform a thorough analysis of three main -aspects: (i) the choice of type taxonomy, (ii) the representation of -hierarchical type information, and (iii) the combination of type-based and -term-based similarity in the retrieval model. Using a standard entity search -test collection based on DBpedia, we find that type information proves most -useful when using large type taxonomies that provide very specific types. We -provide further insights on the extensional coverage of entities and on the -utility of target types. -" -5595,1708.08484,Kai Zhao and Liang Huang,Joint Syntacto-Discourse Parsing and the Syntacto-Discourse Treebank,cs.CL," Discourse parsing has long been treated as a stand-alone problem independent -from constituency or dependency parsing. Most attempts at this problem are -pipelined rather than end-to-end, sophisticated, and not self-contained: they -assume gold-standard text segmentations (Elementary Discourse Units), and use -external parsers for syntactic features. In this paper we propose the first -end-to-end discourse parser that jointly parses in both syntax and discourse -levels, as well as the first syntacto-discourse treebank by integrating the -Penn Treebank with the RST Treebank. Built upon our recent span-based -constituency parser, this joint syntacto-discourse parser requires no -preprocessing whatsoever (such as segmentation or feature extraction), achieves -the state-of-the-art end-to-end discourse parsing accuracy. -" -5596,1708.08572,Stephanie Lukin and Marilyn Walker,"Really? Well. Apparently Bootstrapping Improves the Performance of - Sarcasm and Nastiness Classifiers for Online Dialogue",cs.CL," More and more of the information on the web is dialogic, from Facebook -newsfeeds, to forum conversations, to comment threads on news articles. In -contrast to traditional, monologic Natural Language Processing resources such -as news, highly social dialogue is frequent in social media, making it a -challenging context for NLP. 
This paper tests a bootstrapping method, -originally proposed in a monologic domain, to train classifiers to identify two -different types of subjective language in dialogue: sarcasm and nastiness. We -explore two methods of developing linguistic indicators to be used in a first -level classifier aimed at maximizing precision at the expense of recall. The -best performing classifier for the first phase achieves 54% precision and 38% -recall for sarcastic utterances. We then use general syntactic patterns from -previous work to create more general sarcasm indicators, improving precision to -62% and recall to 52%. To further test the generality of the method, we then -apply it to bootstrapping a classifier for nastiness dialogic acts. Our first -phase, using crowdsourced nasty indicators, achieves 58% precision and 49% -recall, which increases to 75% precision and 62% recall when we bootstrap over -the first level with generalized syntactic patterns. -" -5597,1708.08573,"Elena Rishes and Stephanie M. Lukin and David K. Elson and Marilyn A. - Walker","Generating Different Story Tellings from Semantic Representations of - Narrative",cs.CL," In order to tell stories in different voices for different audiences, -interactive story systems require: (1) a semantic representation of story -structure, and (2) the ability to automatically generate story and dialogue -from this semantic representation using some form of Natural Language -Generation (NLG). However, there has been limited research on methods for -linking story structures to narrative descriptions of scenes and story events. -In this paper we present an automatic method for converting from Scheherazade's -story intention graph, a semantic representation, to the input required by the -Personage NLG engine. Using 36 Aesop Fables distributed in DramaBank, a -collection of story encodings, we train translation rules on one story and then -test these rules by generating text for the remaining 35. The results are -measured in terms of the string similarity metrics Levenshtein Distance and -BLEU score. The results show that we can generate the 35 stories with correct -content: the test set stories on average are close to the output of the -Scheherazade realizer, which was customized to this semantic representation. We -provide some examples of story variations generated by personage. In future -work, we will experiment with measuring the quality of the same stories -generated in different voices, and with techniques for making storytelling -interactive. -" -5598,1708.08575,"Stephanie M. Lukin and Luke Eisenberg and Thomas Corcoran and Marilyn - A. Walker",Identifying Subjective and Figurative Language in Online Dialogue,cs.CL," More and more of the information on the web is dialogic, from Facebook -newsfeeds, to forum conversations, to comment threads on news articles. In -contrast to traditional, monologic resources such as news, highly social -dialogue is very frequent in social media. We aim to automatically identify -sarcastic and nasty utterances in unannotated online dialogue, extending a -bootstrapping method previously applied to the classification of monologic -subjective sentences in Riloff and Weibe 2003. We have adapted the method to -fit the sarcastic and nasty dialogic domain. 
Our method is as follows: 1) -Explore methods for identifying sarcastic and nasty cue words and phrases in -dialogues; 2) Use the learned cues to train a sarcastic (nasty) Cue-Based -Classifier; 3) Learn general syntactic extraction patterns from the sarcastic -(nasty) utterances and define fine-tuned sarcastic patterns to create a -Pattern-Based Classifier; 4) Combine both Cue-Based and fine-tuned -Pattern-Based Classifiers to maximize precision at the expense of recall and -test on unannotated utterances. -" -5599,1708.08580,Stephanie M. Lukin and Lena I. Reed and Marilyn A. Walker,Generating Sentence Planning Variations for Story Telling,cs.CL," There has been a recent explosion in applications for dialogue interaction -ranging from direction-giving and tourist information to interactive story -systems. Yet the natural language generation (NLG) component for many of these -systems remains largely handcrafted. This limitation greatly restricts the -range of applications; it also means that it is impossible to take advantage of -recent work in expressive and statistical language generation that can -dynamically and automatically produce a large number of variations of given -content. We propose that a solution to this problem lies in new methods for -developing language generation resources. We describe the ES-Translator, a -computational language generator that has previously been applied only to -fables, and quantitatively evaluate the domain independence of the EST by -applying it to personal narratives from weblogs. We then take advantage of -recent work on language generation to create a parameterized sentence planner -for story generation that provides aggregation operations, variations in -discourse and in point of view. Finally, we present a user evaluation of -different personal narrative retellings. -" -5600,1708.08585,Stephanie M. Lukin and Marilyn A. Walker,Narrative Variations in a Virtual Storyteller,cs.CL," Research on storytelling over the last 100 years has distinguished at least -two levels of narrative representation (1) story, or fabula; and (2) discourse, -or sujhet. We use this distinction to create Fabula Tales, a computational -framework for a virtual storyteller that can tell the same story in different -ways through the implementation of general narratological variations, such as -varying direct vs. indirect speech, character voice (style), point of view, and -focalization. A strength of our computational framework is that it is based on -very general methods for re-using existing story content, either from fables or -from personal narratives collected from blogs. We first explain how a simple -annotation tool allows naive annotators to easily create a deep representation -of fabula called a story intention graph, and show how we use this -representation to generate story tellings automatically. Then we present -results of two studies testing our narratological parameters, and showing that -different tellings affect the reader's perception of the story and characters. -" -5601,1708.08615,Andreas Stolcke and Jasha Droppo,"Comparing Human and Machine Errors in Conversational Speech - Transcription",cs.CL," Recent work in automatic recognition of conversational telephone speech (CTS) -has achieved accuracy levels comparable to human transcribers, although there -is some debate how to precisely quantify human performance on this task, using -the NIST 2000 CTS evaluation set. 
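Word error rate, the metric behind the human-machine comparison in 1708.08615 above, is Levenshtein distance over word tokens (substitutions, deletions, insertions) divided by the number of reference words. A standard textbook implementation, not the evaluation system's own scoring code:

```python
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(r)

# The filled-pause/backchannel confusion discussed below, in miniature:
print(wer("uh huh that is right", "uhhuh that is right"))  # 0.4
```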
This raises the question of what systematic -differences, if any, may be found differentiating human from machine -transcription errors. In this paper we approach this question by comparing the -output of our most accurate CTS recognition system to that of a standard speech -transcription vendor pipeline. We find that the most frequent substitution, -deletion and insertion error types of both outputs show a high degree of -overlap. The only notable exception is that the automatic recognizer tends to -confuse filled pauses (""uh"") and backchannel acknowledgments (""uhhuh""). Humans -tend not to make this error, presumably due to the distinctive and opposing -pragmatic functions attached to these words. Furthermore, we quantify the -correlation between human and machine errors at the speaker level, and -investigate the effect of speaker overlap between training and test data. -Finally, we report on an informal ""Turing test"" asking humans to discriminate -between automatic and human transcription error cases. -" -5602,1708.08712,"Hassan Sajjad and Nadir Durrani and Fahim Dalvi and Yonatan Belinkov - and Stephan Vogel",Neural Machine Translation Training in a Multi-Domain Scenario,cs.CL," In this paper, we explore alternative ways to train a neural machine -translation system in a multi-domain scenario. We investigate data -concatenation (with fine tuning), model stacking (multi-level fine tuning), -data selection and multi-model ensemble. Our findings show that the best -translation quality can be achieved by building an initial system on a -concatenation of available out-of-domain data and then fine-tuning it on -in-domain data. Model stacking works best when training begins with the -furthest out-of-domain data and the model is incrementally fine-tuned with the -next furthest domain and so on. Data selection did not give the best results, -but can be considered as a decent compromise between training time and -translation quality. A weighted ensemble of different individual models -performed better than data selection. It is beneficial in a scenario when there -is no time for fine-tuning an already trained model. -" -5603,1708.08959,Mohab Elkaref and Bernd Bohnet,A Simple LSTM model for Transition-based Dependency Parsing,cs.CL," We present a simple LSTM-based transition-based dependency parser. Our model -is composed of a single LSTM hidden layer replacing the hidden layer in the -usual feed-forward network architecture. We also propose a new initialization -method that uses the pre-trained weights from a feed-forward neural network to -initialize our LSTM-based model. We also show that using dropout on the input -layer has a positive effect on performance. Our final parser achieves a 93.06% -unlabeled and 91.01% labeled attachment score on the Penn Treebank. We -additionally replace LSTMs with GRUs and Elman units in our model and explore -the effectiveness of our initialization method on individual gates constituting -all three types of RNN units. -" -5604,1708.09025,"Xiaofeng Zhu, Diego Klabjan, Patrick Bless","Unsupervised Terminological Ontology Learning based on Hierarchical - Topic Modeling",cs.CL cs.IR cs.LG," In this paper, we present hierarchical relation-based latent Dirichlet -allocation (hrLDA), a data-driven hierarchical topic model for extracting -terminological ontologies from a large number of heterogeneous documents.
In -contrast to traditional topic models, hrLDA relies on noun phrases instead of -unigrams, considers syntax and document structures, and enriches topic -hierarchies with topic relations. Through a series of experiments, we -demonstrate the superiority of hrLDA over existing topic models, especially for -building hierarchies. Furthermore, we illustrate the robustness of hrLDA in the -settings of noisy data sets, which are likely to occur in many practical -scenarios. Our ontology evaluation results show that ontologies extracted from -hrLDA are very competitive with the ontologies created by domain experts. -" -5605,1708.09040,"Elahe Rahimtoroghi, Jiaqi Wu, Ruimin Wang, Pranav Anand, Marilyn A - Walker",Modelling Protagonist Goals and Desires in First-Person Narrative,cs.AI cs.CL cs.NE," Many genres of natural language text are narratively structured, a testament -to our predilection for organizing our experiences as narratives. There is -broad consensus that understanding a narrative requires identifying and -tracking the goals and desires of the characters and their narrative outcomes. -However, to date, there has been limited work on computational models for this -problem. We introduce a new dataset, DesireDB, which includes gold-standard -labels for identifying statements of desire, textual evidence for desire -fulfillment, and annotations for whether the stated desire is fulfilled given -the evidence in the narrative context. We report experiments on tracking desire -fulfillment using different methods, and show that an LSTM Skip-Thought model -achieves an F-measure of 0.7 on our corpus. -" -5606,1708.09082,"Stephanie M. Lukin, Kevin Bowden, Casey Barackman and Marilyn A. - Walker","PersonaBank: A Corpus of Personal Narratives and Their Story Intention - Graphs",cs.CL," We present a new corpus, PersonaBank, consisting of 108 personal stories from -weblogs that have been annotated with their Story Intention Graphs, a deep -representation of the fabula of a story. We describe the topics of the stories -and the basis of the Story Intention Graph representation, as well as the -process of annotating the stories to produce the Story Intention Graphs and the -challenges of adapting the tool to this new personal narrative domain. We also -discuss how the corpus can be used in applications that retell the story using -different styles of tellings, co-tellings, or as a content planner. -" -5607,1708.09085,"Stephanie M. Lukin, Pranav Anand, Marilyn Walker and Steve Whittaker","Argument Strength is in the Eye of the Beholder: Audience Effects in - Persuasion",cs.CL," Americans spend about a third of their time online, with many participating -in online conversations on social and political issues. We hypothesize that -social media arguments on such issues may be more engaging and persuasive than -traditional media summaries, and that particular types of people may be more or -less convinced by particular styles of argument, e.g. emotional arguments may -resonate with some personalities while factual arguments resonate with others. -We report a set of experiments testing at large scale how audience variables -interact with argument style to affect the persuasiveness of an argument, an -under-researched topic within natural language processing. We show that belief -change is affected by personality factors, with conscientious, open and -agreeable people being more convinced by emotional arguments. -" -5608,1708.09090,"Stephanie M. Lukin, James O. Ryan and Marilyn A.
Walker",Automating Direct Speech Variations in Stories and Games,cs.CL," Dialogue authoring in large games requires not only content creation but the -subtlety of its delivery, which can vary from character to character. Manually -authoring this dialogue can be tedious, time-consuming, or even altogether -infeasible. This paper utilizes a rich narrative representation for modeling -dialogue and an expressive natural language generation engine for realizing it, -and expands upon a translation tool that bridges the two. We add functionality -to the translator to allow direct speech to be modeled by the narrative -representation, whereas the original translator supports only narratives told -by a third person narrator. We show that we can perform character substitution -in dialogues. We implement and evaluate a potential application to dialogue -implementation: generating dialogue for games with big, dynamic, or -procedurally-generated open worlds. We present a pilot study on human -perceptions of the personalities of characters using direct speech, assuming -unknown personality types at the time of authoring. -" -5609,1708.09151,"Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov and - David Yarowsky",Paradigm Completion for Derivational Morphology,cs.CL," The generation of complex derived word forms has been an overlooked problem -in NLP; we fill this gap by applying neural sequence-to-sequence models to the -task. We overview the theoretical motivation for a paradigmatic treatment of -derivational morphology, and introduce the task of derivational paradigm -completion as a parallel to inflectional paradigm completion. State-of-the-art -neural models, adapted from the inflection task, are able to learn a range of -derivation patterns, and outperform a non-neural baseline by 16.4%. However, -due to semantic, historical, and lexical considerations involved in -derivational morphology, future work will be needed to achieve performance -parity with inflection-generating systems. -" -5610,1708.09157,Ryan Cotterell and Georg Heigold,"Cross-lingual, Character-Level Neural Morphological Tagging",cs.CL," Even for common NLP tasks, sufficient supervision is not available in many -languages -- morphological tagging is no exception. In the work presented here, -we explore a transfer learning scheme, whereby we train character-level -recurrent neural taggers to predict morphological taggings for high-resource -languages and low-resource languages together. Learning joint character -representations among multiple related languages successfully enables knowledge -transfer from the high-resource languages to the low-resource ones, improving -accuracy by up to 30% -" -5611,1708.09163,"Phuong Le-Hong, Minh Pham Quang Nhat, Thai-Hoang Pham, Tuan-Anh Tran, - Dang-Minh Nguyen","An Empirical Study of Discriminative Sequence Labeling Models for - Vietnamese Text Processing",cs.CL," This paper presents an empirical study of two widely-used sequence prediction -models, Conditional Random Fields (CRFs) and Long Short-Term Memory Networks -(LSTMs), on two fundamental tasks for Vietnamese text processing, including -part-of-speech tagging and named entity recognition. We show that a strong -lower bound for labeling accuracy can be obtained by relying only on simple -word-based features with minimal hand-crafted feature engineering, of 90.65\% -and 86.03\% performance scores on the standard test sets for the two tasks -respectively. 
-In particular, we demonstrate empirically the surprising
-efficiency of word embeddings in both tasks, with both models. We point out
-that the state-of-the-art LSTM model does not always significantly outperform
-the traditional CRF model, especially on moderate-sized data sets. Finally, we
-give some suggestions and discussions for efficient use of sequence labeling
-models in practical applications.
-"
-5612,1708.09217,"Long Zhou, Jiajun Zhang, Chengqing Zong",Look-ahead Attention for Generation in Neural Machine Translation,cs.CL," The attention model has become a standard component in neural machine
-translation (NMT) and it guides the translation process by selectively focusing
-on parts of the source sentence when predicting each target word. However, we
-find that the generation of a target word depends not only on the source
-sentence, but also relies heavily on the previously generated target words,
-especially the distant words which are difficult to model by using recurrent
-neural networks. To solve this problem, we propose in this paper a novel
-look-ahead attention mechanism for generation in NMT, which aims at directly
-capturing the dependency relationship between target words. We further design
-three patterns to integrate our look-ahead attention into the conventional
-attention model. Experiments on NIST Chinese-to-English and WMT
-English-to-German translation tasks show that our proposed look-ahead attention
-mechanism achieves substantial improvements over state-of-the-art baselines.
-"
-5613,1708.09230,"Sandro A. Coelho, Diego Moussallem, Gustavo C. Publio and Diego
- Esteves","TANKER: Distributed Architecture for Named Entity Recognition and
- Disambiguation",cs.CL," Named Entity Recognition and Disambiguation (NERD) systems have recently been
-widely researched to deal with the significant growth of the Web. NERD systems
-are crucial for several Natural Language Processing (NLP) tasks such as
-summarization, understanding, and machine translation. However, there is no
-standard interface specification, i.e. these systems may vary significantly
-either in exporting their outputs or in processing the inputs. Thus, when a
-given company desires to implement more than one NERD system, the process is
-quite exhausting and prone to failure. In addition, industrial solutions demand
-critical requirements, e.g., large-scale processing, completeness, versatility,
-and licenses. Commonly, these requirements impose a limitation, causing good
-NERD models to be ignored by companies. This paper presents TANKER, a
-distributed architecture which aims to overcome scalability, reliability and
-failure tolerance limitations related to industrial needs by combining NERD
-systems. To this end, TANKER relies on a micro-services oriented architecture,
-which enables agile development and delivery of complex enterprise
-applications. In addition, TANKER provides a standardized API which makes it
-possible to combine several NERD systems at once.
-"
-5614,1708.09234,"Dmitry Ustalov, Mikhail Chernoskutov, Chris Biemann, and Alexander
- Panchenko",Fighting with the Sparsity of Synonymy Dictionaries,cs.CL," Graph-based synset induction methods, such as MaxMax and Watset, induce
-synsets by performing a global clustering of a synonymy graph. However, such
-methods are sensitive to the structure of the input synonymy graph: sparseness
-of the input dictionary can substantially reduce the quality of the extracted
-synsets.
-In this paper, we propose two different approaches designed to
-alleviate the incompleteness of the input dictionaries. The first one performs
-a pre-processing of the graph by adding missing edges, while the second one
-performs a post-processing by merging similar synset clusters. We evaluate
-these approaches on two datasets for the Russian language and discuss their
-impact on the performance of synset induction methods. Finally, we perform an
-extensive error analysis of each approach and discuss prominent alternative
-methods for coping with the problem of the sparsity of the synonymy
-dictionaries.
-"
-5615,1708.09403,"Tianze Shi, Liang Huang, Lillian Lee","Fast(er) Exact Decoding and Global Training for Transition-Based
- Dependency Parsing via a Minimal Feature Set",cs.CL," We first present a minimal feature set for transition-based dependency
-parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a)
-and Cross and Huang (2016a) of using bi-directional LSTM features. We plug our
-minimal feature set into the dynamic-programming framework of Huang and Sagae
-(2010) and Kuhlmann et al. (2011) to produce the first implementation of
-worst-case O(n^3) exact decoders for arc-hybrid and arc-eager transition
-systems. With our minimal features, we also present O(n^3) global training
-methods. Finally, using ensembles including our new parsers, we achieve the
-best unlabeled attachment score reported (to our knowledge) on the Chinese
-Treebank and the ""second-best-in-class"" result on the English Penn Treebank.
-"
-5616,1708.09417,Lasha Abzianidze,LangPro: Natural Language Theorem Prover,cs.CL," LangPro is an automated theorem prover for natural language
-(https://github.com/kovvalsky/LangPro). Given a set of premises and a
-hypothesis, it is able to prove semantic relations between them. The prover is
-based on a version of the analytic tableau method specially designed for
-natural logic. The proof procedure operates on logical forms that preserve
-linguistic expressions to a large extent. The nature of proofs is deductive
-and transparent. On the FraCaS and SICK textual entailment datasets, the
-prover achieves high results, comparable to the state of the art.
-"
-5617,1708.09450,"Elahe Rahimtoroghi, Ernesto Hernandez, Marilyn A Walker","Learning Fine-Grained Knowledge about Contingent Relations between
- Everyday Events",cs.CL cs.AI," Much of the user-generated content on social media is provided by ordinary
-people telling stories about their daily lives. We develop and test a novel
-method for learning fine-grained common-sense knowledge from these stories
-about contingent (causal and conditional) relationships between everyday
-events. This type of knowledge is useful for text and story understanding,
-information extraction, question answering, and text summarization. We test and
-compare different methods for learning contingency relations, and compare what
-is learned from topic-sorted story collections vs. general-domain stories. Our
-experiments show that using topic-specific datasets enables learning
-finer-grained knowledge about events and results in significant improvement
-over the baselines. An evaluation on Amazon Mechanical Turk shows 82% of the
-relations between events that we learn from topic-sorted stories are judged as
-contingent.
-" -5618,1708.09453,"Zhichao Hu, Elahe Rahimtoroghi, Marilyn A Walker",Inference of Fine-Grained Event Causality from Blogs and Films,cs.CL cs.AI," Human understanding of narrative is mainly driven by reasoning about causal -relations between events and thus recognizing them is a key capability for -computational models of language understanding. Computational work in this area -has approached this via two different routes: by focusing on acquiring a -knowledge base of common causal relations between events, or by attempting to -understand a particular story or macro-event, along with its storyline. In this -position paper, we focus on knowledge acquisition approach and claim that -newswire is a relatively poor source for learning fine-grained causal relations -between everyday events. We describe experiments using an unsupervised method -to learn causal relations between events in the narrative genres of -first-person narratives and film scene descriptions. We show that our method -learns fine-grained causal relations, judged by humans as likely to be causal -over 80% of the time. We also demonstrate that the learned event pairs do not -exist in publicly available event-pair datasets extracted from newswire. -" -5619,1708.09492,"Siyuan Jiang, Ameer Armaly, Collin McMillan","Automatically Generating Commit Messages from Diffs using Neural Machine - Translation",cs.SE cs.CL," Commit messages are a valuable resource in comprehension of software -evolution, since they provide a record of changes such as feature additions and -bug repairs. Unfortunately, programmers often neglect to write good commit -messages. Different techniques have been proposed to help programmers by -automatically writing these messages. These techniques are effective at -describing what changed, but are often verbose and lack context for -understanding the rationale behind a change. In contrast, humans write messages -that are short and summarize the high level rationale. In this paper, we adapt -Neural Machine Translation (NMT) to automatically ""translate"" diffs into commit -messages. We trained an NMT algorithm using a corpus of diffs and human-written -commit messages from the top 1k Github projects. We designed a filter to help -ensure that we only trained the algorithm on higher-quality commit messages. -Our evaluation uncovered a pattern in which the messages we generate tend to be -either very high or very low quality. Therefore, we created a quality-assurance -filter to detect cases in which we are unable to produce good messages, and -return a warning instead. -" -5620,1708.09496,Zhichao Hu and Marilyn A. Walker,Inferring Narrative Causality between Event Pairs in Films,cs.CL," To understand narrative, humans draw inferences about the underlying -relations between narrative events. Cognitive theories of narrative -understanding define these inferences as four different types of causality, -that include pairs of events A, B where A physically causes B (X drop, X -break), to pairs of events where A causes emotional state B (Y saw X, Y felt -fear). Previous work on learning narrative relations from text has either -focused on ""strict"" physical causality, or has been vague about what relation -is being learned. This paper learns pairs of causal events from a corpus of -film scene descriptions which are action rich and tend to be told in -chronological order. 
-We show that event pairs induced using our methods are of high quality and
-are judged to have a stronger causal relation than event pairs from
-Rel-grams.
-"
-5621,1708.09497,"Zhichao Hu, Elahe Rahimtoroghi, Larissa Munishkina, Reid Swanson and
- Marilyn A. Walker",Unsupervised Induction of Contingent Event Pairs from Film Scenes,cs.CL," Human engagement in narrative is partially driven by reasoning about
-discourse relations between narrative events, and the expectations about what
-is likely to happen next that result from such reasoning. Researchers in NLP
-have tackled modeling such expectations from a range of perspectives, including
-treating it as the inference of the contingent discourse relation, or as a type
-of common-sense causal reasoning. Our approach is to model likelihood between
-events by drawing on several of these lines of previous work. We implement and
-evaluate different unsupervised methods for learning event pairs that are
-likely to be contingent on one another. We refine event pairs that we learn
-from a corpus of film scene descriptions utilizing web search counts, and
-evaluate our results by collecting human judgments of contingency. Our results
-indicate that the use of web search counts increases the average accuracy of
-our best method to 85.64% over a baseline of 50%, as compared to an average
-accuracy of 75.15% without web search.
-"
-5622,1708.09516,Vikramjit Mitra and Horacio Franco,"Leveraging Deep Neural Network Activation Entropy to cope with Unseen
- Data in Speech Recognition",cs.LG cs.CL stat.ML," Unseen data conditions can inflict serious performance degradation on systems
-relying on supervised machine learning algorithms. Because data can often be
-unseen, and because traditional machine learning algorithms are trained in a
-supervised manner, unsupervised adaptation techniques must be used to adapt the
-model to the unseen data conditions. However, unsupervised adaptation is often
-challenging, as one must generate some hypothesis given a model and then use
-that hypothesis to bootstrap the model to the unseen data conditions.
-Unfortunately, the reliability of such hypotheses is often poor, given the
-mismatch between the training and testing datasets. In such cases, a model
-hypothesis confidence measure enables performing data selection for the model
-adaptation. Underlying this approach is the fact that for unseen data
-conditions, data variability is introduced to the model, which the model
-propagates to its output decision, impacting decision reliability. In a fully
-connected network, this data variability is propagated as distortions from one
-layer to the next. This work aims to estimate the propagation of such
-distortion in the form of network activation entropy, which is measured over a
-short-time running window on the activation from each neuron of a given hidden
-layer, and these measurements are then used to compute summary entropy. This
-work demonstrates that such an entropy measure can help to select data for
-unsupervised model adaptation, resulting in performance gains in speech
-recognition tasks. Results from standard benchmark speech recognition tasks
-show that the proposed approach can alleviate the performance degradation
-experienced under unseen data conditions by iteratively adapting the model to
-the unseen data's acoustic condition.
-"
-5623,1708.09609,"Greg Durrett, Jonathan K. Kummerfeld, Taylor Berg-Kirkpatrick, Rebecca
- S. Portnoff, Sadia Afroz, Damon McCoy, Kirill Levchenko, Vern Paxson","Identifying Products in Online Cybercrime Marketplaces: A Dataset for
- Fine-grained Domain Adaptation",cs.CL," One weakness of machine-learned NLP models is that they typically perform
-poorly on out-of-domain data. In this work, we study the task of identifying
-products being bought and sold in online cybercrime forums, which exhibits
-particularly challenging cross-domain effects. We formulate a task that
-represents a hybrid of slot-filling information extraction and named entity
-recognition and annotate data from four different forums. Each of these forums
-constitutes its own ""fine-grained domain"" in that the forums cover different
-market sectors with different properties, even though all forums are in the
-broad domain of cybercrime. We characterize these domain differences in the
-context of a learning-based system: supervised models see decreased accuracy
-when applied to new forums, and standard techniques for semi-supervised
-learning and domain adaptation have limited effectiveness on this data, which
-suggests the need to improve these techniques. We release a dataset of 1,938
-annotated posts from across the four forums.
-"
-5624,1708.09666,"Shizhe Chen, Jia Chen, Qin Jin",Generating Video Descriptions with Topic Guidance,cs.CV cs.CL," Generating video descriptions in natural language (a.k.a. video captioning)
-is a more challenging task than image captioning as the videos are
-intrinsically more complicated than images in two aspects. First, videos cover
-a broader range of topics, such as news, music, sports and so on. Second,
-multiple topics could coexist in the same video. In this paper, we propose a
-novel caption model, topic-guided model (TGM), to generate topic-oriented
-descriptions for videos in the wild via exploiting topic information. In
-addition to predefined topics, i.e., category tags crawled from the web, we
-also mine topics in a data-driven way based on training captions by an
-unsupervised topic mining model. We show that data-driven topics reflect a
-better topic schema than the predefined topics. As for testing video topic
-prediction, we treat the topic mining model as the teacher to train the
-student, the topic prediction model, by utilizing the full multi-modalities in
-the video, especially the speech modality. We propose a series of caption
-models to exploit topic guidance, including implicitly using the topics as
-input features to generate words related to the topic and explicitly modifying
-the weights in the decoder with topics to function as an ensemble of
-topic-aware language decoders. Our comprehensive experimental results on the
-current largest video caption dataset MSR-VTT prove the effectiveness of our
-topic-guided model, which significantly surpasses the winning performance in
-the 2016 MSR video to language challenge.
-"
-5625,1708.09667,"Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann",Video Captioning with Guidance of Multimodal Latent Topics,cs.CV cs.CL," The topic diversity of open-domain videos leads to various vocabularies and
-linguistic expressions in describing video contents, and therefore, makes the
-video captioning task even more challenging. In this paper, we propose a
-unified caption framework, M&M TGM, which mines multimodal topics in
-unsupervised fashion from data and guides the caption decoder with these
-topics.
-Compared to pre-defined topics, the mined multimodal topics are more
-semantically and visually coherent and can reflect the topic distribution of
-videos better. We formulate the topic-aware caption generation as a multi-task
-learning problem, in which we add a parallel task, topic prediction, in
-addition to the caption task. For the topic prediction task, we use the mined
-topics as the teacher to train a student topic prediction model, which learns
-to predict the latent topics from multimodal contents of videos. The topic
-prediction provides intermediate supervision to the learning process. As for
-the caption task, we propose a novel topic-aware decoder to generate more
-accurate and detailed video descriptions with the guidance from latent topics.
-The entire learning procedure is end-to-end and it optimizes both tasks
-simultaneously. The results from extensive experiments conducted on the MSR-VTT
-and Youtube2Text datasets demonstrate the effectiveness of our proposed model.
-M&M TGM not only outperforms prior state-of-the-art methods on multiple
-evaluation metrics and on both benchmark datasets, but also achieves better
-generalization ability.
-"
-5626,1708.09702,"Alexander Panchenko, Dmitry Ustalov, Nikolay Arefyev, Denis Paperno,
- Natalia Konstantinova, Natalia Loukachevitch, and Chris Biemann",Human and Machine Judgements for Russian Semantic Relatedness,cs.CL," Semantic relatedness of terms represents similarity of meaning by a numerical
-score. On the one hand, humans easily make judgments about semantic
-relatedness. On the other hand, this kind of information is useful in language
-processing systems. While semantic relatedness has been extensively studied for
-English using numerous language resources, such as associative norms, human
-judgments, and datasets generated from lexical databases, no evaluation
-resources of this kind have been available for Russian to date. Our
-contribution addresses this problem. We present five language resources of
-different scale and purpose for Russian semantic relatedness, each being a list
-of triples (word_i, word_j, relatedness_ij). Four of them are designed for
-evaluation of systems for computing semantic relatedness, complementing each
-other in terms of the semantic relation type they represent. These benchmarks
-were used to organize a shared task on Russian semantic relatedness, which
-attracted 19 teams. We use one of the best approaches identified in this
-competition to generate the fifth high-coverage resource, the first open
-distributional thesaurus of Russian. Multiple evaluations of this thesaurus,
-including a large-scale crowdsourcing study involving native speakers, indicate
-its high accuracy.
-"
-5627,1708.09789,"Lena Reed, Jiaqi Wu, Shereen Oraby, Pranav Anand, Marilyn Walker",Learning Lexico-Functional Patterns for First-Person Affect,cs.CL," Informal first-person narratives are a unique resource for computational
-models of everyday events and people's affective reactions to them. People
-blogging about their day tend not to explicitly say ""I am happy"". Instead they
-describe situations from which other humans can readily infer their affective
-reactions. However, current sentiment dictionaries are missing much of the
-information needed to make similar inferences. We build on recent work that
-models affect in terms of lexical predicate functions and affect on the
-predicate's arguments. We present a method to learn proxies for these functions
-from first-person narratives.
-We construct a novel fine-grained test set, and
-show that the patterns we learn improve our ability to predict first-person
-affective reactions to everyday events, from a Stanford sentiment baseline of
-.67F to .75F.
-"
-5628,1708.09803,"Toan Q. Nguyen, David Chiang","Transfer Learning across Low-Resource, Related Languages for Neural
- Machine Translation",cs.CL," We present a simple method to improve neural translation of a low-resource
-language pair using parallel data from a related, also low-resource, language
-pair. The method is based on the transfer method of Zoph et al., but whereas
-their method ignores any source vocabulary overlap, ours exploits it. First, we
-split words using Byte Pair Encoding (BPE) to increase vocabulary overlap.
-Then, we train a model on the first language pair and transfer its parameters,
-including its source word embeddings, to another model and continue training on
-the second language pair. Our experiments show that transfer learning helps
-word-based translation only slightly, but when used on top of a much stronger
-BPE baseline, it yields larger improvements of up to 4.3 BLEU.
-"
-5629,1709.00023,"Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei
- Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, Jing Jiang",R$^3$: Reinforced Reader-Ranker for Open-Domain Question Answering,cs.CL cs.AI," In recent years researchers have achieved considerable success applying
-neural network methods to question answering (QA). These approaches have
-achieved state-of-the-art results in simplified closed-domain settings such as
-the SQuAD (Rajpurkar et al., 2016) dataset, which provides a pre-selected
-passage, from which the answer to a given question may be extracted. More
-recently, researchers have begun to tackle open-domain QA, in which the model
-is given a question and access to a large corpus (e.g., wikipedia) instead of a
-pre-selected passage (Chen et al., 2017a). This setting is more complex as it
-requires large-scale search for relevant passages by an information retrieval
-component, combined with a reading comprehension model that ""reads"" the
-passages to generate an answer to the question. Performance in this setting
-lags considerably behind closed-domain performance. In this paper, we present a
-novel open-domain QA system called Reinforced Ranker-Reader $(R^3)$, based on
-two algorithmic innovations. First, we propose a new pipeline for open-domain
-QA with a Ranker component, which learns to rank retrieved passages in terms of
-likelihood of generating the ground-truth answer to a given question. Second,
-we propose a novel method that jointly trains the Ranker along with an
-answer-generation Reader model, based on reinforcement learning. We report
-extensive experimental results showing that our method significantly improves
-on the state of the art for multiple open-domain QA datasets.
-"
-5630,1709.00028,Falcon Z. Dai and Zheng Cai,Glyph-aware Embedding of Chinese Characters,cs.CL cs.LG," Given the advantage and recent success of English character-level and
-subword-unit models in several NLP tasks, we consider the equivalent modeling
-problem for Chinese. Chinese script is logographic and many Chinese logograms
-are composed of common substructures that provide semantic, phonetic and
-syntactic hints. In this work, we propose to explicitly incorporate the visual
-appearance of a character's glyph in its representation, resulting in a novel
-glyph-aware embedding of Chinese characters.
-Inspired by the success of
-convolutional neural networks in computer vision, we use them to incorporate
-the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In
-the context of two basic Chinese NLP tasks of language modeling and word
-segmentation, the model learns to represent each character's task-relevant
-semantic and syntactic information in the character-level embedding.
-"
-5631,1709.00071,"Patrick Baylis, Nick Obradovich, Yury Kryvasheyeu, Haohui Chen,
- Lorenzo Coviello, Esteban Moro, Manuel Cebrian, James H. Fowler",Weather impacts expressed sentiment,stat.AP cs.CL," We conduct the largest ever investigation into the relationship between
-meteorological conditions and the sentiment of human expressions. To do this,
-we employ over three and a half billion social media posts from tens of
-millions of individuals from both Facebook and Twitter between 2009 and 2016.
-We find that cold temperatures, hot temperatures, precipitation, narrower daily
-temperature ranges, humidity, and cloud cover are all associated with worsened
-expressions of sentiment, even when excluding weather-related posts. We compare
-the magnitude of our estimates with the effect sizes associated with notable
-historical events occurring within our data.
-"
-5632,1709.00094,"Jiaqi Wu, Marilyn Walker, Pranav Anand and Steve Whittaker",Linguistic Reflexes of Well-Being and Happiness in Echo,cs.CL," Different theories posit different sources for feelings of well-being and
-happiness. Appraisal theory grounds our emotional responses in our goals and
-desires and their fulfillment, or lack of fulfillment. Self Determination
-theory posits that the basis for well-being rests on our assessment of our
-competence, autonomy, and social connection. And surveys that measure happiness
-empirically note that people require their basic needs to be met for food and
-shelter, but beyond that tend to be happiest when socializing, eating or having
-sex. We analyze a corpus of private microblogs from a well-being application
-called ECHO, where users label each written post about daily events with a
-happiness score between 1 and 9. Our goal is to ground the linguistic
-descriptions of events that users experience in theories of well-being and
-happiness, and then examine the extent to which different theoretical accounts
-can explain the variance in the happiness scores. We show that recurrent event
-types, such as OBLIGATION and INCOMPETENCE, which affect people's feelings of
-well-being are not captured in current lexical or semantic resources.
-"
-5633,1709.00103,"Victor Zhong, Caiming Xiong, and Richard Socher","Seq2SQL: Generating Structured Queries from Natural Language using
- Reinforcement Learning",cs.CL cs.AI," A significant amount of the world's knowledge is stored in relational
-databases. However, the ability for users to retrieve facts from a database is
-limited due to a lack of understanding of query languages such as SQL. We
-propose Seq2SQL, a deep neural network for translating natural language
-questions to corresponding SQL queries. Our model leverages the structure of
-SQL queries to significantly reduce the output space of generated queries.
-Moreover, we use rewards from in-the-loop query execution over the database to
-learn a policy to generate unordered parts of the query, which we show are less
-suitable for optimization via cross entropy loss.
-In addition, we will publish
-WikiSQL, a dataset of 80654 hand-annotated examples of questions and SQL
-queries distributed across 24241 tables from Wikipedia. This dataset is
-required to train our model and is an order of magnitude larger than comparable
-datasets. By applying policy-based reinforcement learning with a query
-execution environment to WikiSQL, our model Seq2SQL outperforms attentional
-sequence to sequence models, improving execution accuracy from 35.9% to 59.4%
-and logical form accuracy from 23.4% to 48.3%.
-"
-5634,1709.00149,"Enrique Noriega-Atala, Marco A. Valenzuela-Escarcega, Clayton T.
- Morrison, Mihai Surdeanu",Learning what to read: Focused machine reading,cs.AI cs.CL cs.IR cs.LG," Recent efforts in bioinformatics have achieved tremendous progress in the
-machine reading of biomedical literature, and the assembly of the extracted
-biochemical interactions into large-scale models such as protein signaling
-pathways. However, batch machine reading of literature at today's scale (PubMed
-alone indexes over 1 million papers per year) is unfeasible due to both cost
-and processing overhead. In this work, we introduce a focused reading approach
-to guide the machine reading of biomedical literature towards what literature
-should be read to answer a biomedical query as efficiently as possible. We
-introduce a family of algorithms for focused reading, including an intuitive,
-strong baseline, and a second approach which uses a reinforcement learning (RL)
-framework that learns when to explore (widen the search) or exploit (narrow
-it). We demonstrate that the RL approach is capable of answering more queries
-than the baseline, while being more efficient, i.e., reading fewer documents.
-"
-5635,1709.00155,"Lei Sha, Lili Mou, Tianyu Liu, Pascal Poupart, Sujian Li, Baobao
- Chang, Zhifang Sui",Order-Planning Neural Text Generation From Structured Data,cs.CL cs.AI cs.IR cs.LG," Generating texts from structured data (e.g., a table) is important for
-various natural language processing tasks such as question answering and dialog
-systems. In recent studies, researchers use neural language models and
-encoder-decoder frameworks for table-to-text generation. However, these neural
-network-based approaches do not model the order of contents during text
-generation. When a human writes a summary based on a given table, he or she
-would probably consider the content order before wording. In a biography, for
-example, the nationality of a person is typically mentioned before occupation.
-In this paper, we propose an order-planning text generation model to capture
-the relationship between different fields and to use this relationship to make
-the generated text more fluent and smooth. We conducted experiments on the
-WikiBio dataset and achieve significantly higher performance than previous
-methods in terms of BLEU, ROUGE, and NIST scores.
-"
-5636,1709.00224,Guy Emerson and Ann Copestake,Variational Inference for Logical Inference,cs.CL," Functional Distributional Semantics is a framework that aims to learn, from
-text, semantic representations which can be interpreted in terms of truth. Here
-we make two contributions to this framework. The first is to show how a type of
-logical inference can be performed by evaluating conditional probabilities. The
-second is to make these calculations tractable by means of a variational
-approximation.
-This approximation also enables faster convergence during
-training, allowing us to close the gap with state-of-the-art vector space
-models when evaluating on semantic similarity. We demonstrate promising
-performance on two tasks.
-"
-5637,1709.00226,Guy Emerson and Ann Copestake,Semantic Composition via Probabilistic Model Theory,cs.CL," Semantic composition remains an open problem for vector space models of
-semantics. In this paper, we explain how the probabilistic graphical model used
-in the framework of Functional Distributional Semantics can be interpreted as a
-probabilistic version of model theory. Building on this, we explain how various
-semantic phenomena can be recast in terms of conditional probabilities in the
-graphical model. This connection between formal semantics and machine learning
-is helpful in both directions: it gives us an explicit mechanism for modelling
-context-dependent meanings (a challenge for formal semantics), and also gives
-us well-motivated techniques for composing distributed representations (a
-challenge for distributional semantics). We present results on two datasets
-that go beyond word similarity, showing how these semantically-motivated
-techniques improve on the performance of vector models.
-"
-5638,1709.00345,"Ian Stewart, Jacob Eisenstein","Making ""fetch"" happen: The influence of social and linguistic context on
- nonstandard word growth and decline",cs.CL cs.SI physics.soc-ph," In an online community, new words come and go: today's ""haha"" may be replaced
-by tomorrow's ""lol."" Changes in online writing are usually studied as a social
-process, with innovations diffusing through a network of individuals in a
-speech community. But unlike other types of innovation, language change is
-shaped and constrained by the system in which it takes part. To investigate the
-links between social and structural factors in language change, we undertake a
-large-scale analysis of nonstandard word growth in the online community Reddit.
-We find that dissemination across many linguistic contexts is a sign of growth:
-words that appear in more linguistic contexts grow faster and survive longer.
-We also find that social dissemination likely plays a less important role in
-explaining word growth and decline than previously hypothesized.
-"
-5639,1709.00354,"Chia-Wei Ao, Hung-yi Lee","Query-by-example Spoken Term Detection using Attention-based Multi-hop
- Networks",cs.CL cs.MM," Retrieving spoken content with spoken queries, or query-by-example spoken
-term detection (STD), is attractive because it makes possible the matching of
-signals directly on the acoustic level without transcribing them into text.
-Here, we propose an end-to-end query-by-example STD model based on an
-attention-based multi-hop network, whose input is a spoken query and an audio
-segment containing several utterances; the output states whether the audio
-segment includes the query. The model can be trained in either a supervised
-scenario using labeled data, or in an unsupervised fashion. In the supervised
-scenario, we find that the attention mechanism and multiple hops improve
-performance, and that the attention weights indicate the time span of the
-detected terms. In the unsupervised setting, the model mimics the behavior of
-the existing query-by-example STD system, yielding performance comparable to
-the existing system but with a lower search time complexity.
-" -5640,1709.00387,"Suwon Shon, Ahmed Ali and James Glass","MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre - Broadcast Challenge",cs.CL cs.LG cs.SD," In order to successfully annotate the Arabic speech con- tent found in -open-domain media broadcasts, it is essential to be able to process a diverse -set of Arabic dialects. For the 2017 Multi-Genre Broadcast challenge (MGB-3) -there were two possible tasks: Arabic speech recognition, and Arabic Dialect -Identification (ADI). In this paper, we describe our efforts to create an ADI -system for the MGB-3 challenge, with the goal of distinguishing amongst four -major Arabic dialects, as well as Modern Standard Arabic. Our research fo- -cused on dialect variability and domain mismatches between the training and -test domain. In order to achieve a robust ADI system, we explored both Siamese -neural network models to learn similarity and dissimilarities among Arabic -dialects, as well as i-vector post-processing to adapt domain mismatches. Both -Acoustic and linguistic features were used for the final MGB-3 submissions, -with the best primary system achieving 75% accuracy on the official 10hr test -set. -" -5641,1709.00389,"Jian Tang, Yue Wang, Kai Zheng, Qiaozhu Mei",End-to-end Learning for Short Text Expansion,cs.CL cs.IR," Effectively making sense of short texts is a critical task for many real -world applications such as search engines, social media services, and -recommender systems. The task is particularly challenging as a short text -contains very sparse information, often too sparse for a machine learning -algorithm to pick up useful signals. A common practice for analyzing short text -is to first expand it with external information, which is usually harvested -from a large collection of longer texts. In literature, short text expansion -has been done with all kinds of heuristics. We propose an end-to-end solution -that automatically learns how to expand short text to optimize a given learning -task. A novel deep memory network is proposed to automatically find relevant -information from a collection of longer documents and reformulate the short -text through a gating mechanism. Using short text classification as a -demonstrating task, we show that the deep memory network significantly -outperforms classical text expansion methods with comprehensive experiments on -real world data sets. -" -5642,1709.00489,Miguel Ballesteros and Xavier Carreras,Arc-Standard Spinal Parsing with Stack-LSTMs,cs.CL," We present a neural transition-based parser for spinal trees, a dependency -representation of constituent trees. The parser uses Stack-LSTMs that compose -constituent nodes with dependency-based derivations. In experiments, we show -that this model adapts to different styles of dependency relations, but this -choice has little effect for predicting constituent structure, suggesting that -LSTMs induce useful states by themselves. -" -5643,1709.00541,Rustem Takhanov and Zhenisbek Assylbekov,Patterns versus Characters in Subword-aware Neural Language Modeling,cs.CL cs.LG," Words in some natural languages can have a composite structure. Elements of -this structure include the root (that could also be composite), prefixes and -suffixes with which various nuances and relations to other words can be -expressed. Thus, in order to build a proper word representation one must take -into account its internal structure. From a corpus of texts we extract a set of -frequent subwords and from the latter set we select patterns, i.e. 
-subwords
-which encapsulate information on character $n$-gram regularities. The selection
-is made using the pattern-based Conditional Random Field model with $l_1$
-regularization. Further, for every word we construct a new sequence over an
-alphabet of patterns. The new alphabet's symbols confine a local statistical
-context stronger than the characters; therefore, they allow better
-representations in ${\mathbb{R}}^n$ and are better building blocks for word
-representation. In the task of subword-aware language modeling, pattern-based
-models outperform character-based analogues by 2-20 perplexity points. Also, a
-recurrent neural network in which a word is represented as a sum of embeddings
-of its patterns is on par with a competitive and significantly more
-sophisticated character-based convolutional architecture.
-"
-5644,1709.00575,"Marek Rei, Luana Bulat, Douwe Kiela, Ekaterina Shutova","Grasping the Finer Point: A Supervised Similarity Network for Metaphor
- Detection",cs.CL cs.LG cs.NE," The ubiquity of metaphor in our everyday communication makes it an important
-problem for natural language understanding. Yet, the majority of metaphor
-processing systems to date rely on hand-engineered features and there is still
-no consensus in the field as to which features are optimal for this task. In
-this paper, we present the first deep learning architecture designed to capture
-metaphorical composition. Our results demonstrate that it outperforms the
-existing approaches in the metaphor identification task.
-"
-5645,1709.00616,"Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Ahmed Abdelali, Yonatan
- Belinkov, Stephan Vogel","Challenging Language-Dependent Segmentation for Arabic: An Application
- to Machine Translation and Part-of-Speech Tagging",cs.CL," Word segmentation plays a pivotal role in improving any Arabic NLP
-application. Therefore, much research has been devoted to improving its
-accuracy. Off-the-shelf tools, however, are: i) complicated to use and ii)
-domain/dialect dependent. We explore three language-independent alternatives to
-morphological segmentation using: i) data-driven sub-word units, ii) characters
-as a unit of learning, and iii) word embeddings learned using a character CNN
-(Convolution Neural Network). On the tasks of Machine Translation and POS
-tagging, we found these methods to achieve close to, and occasionally surpass
-state-of-the-art performance. In our analysis, we show that a neural machine
-translation system is sensitive to the ratio of source and target tokens, and
-a ratio close to 1 or greater gives optimal performance.
-"
-5646,1709.00659,"Kushal Chawla, Sunil Kumar Sahu, Ashish Anand","Investigating how well contextual features are captured by
- bi-directional recurrent neural network models",cs.CL," Learning algorithms for natural language processing (NLP) tasks traditionally
-rely on manually defined relevant contextual features. On the other hand,
-neural network models using only a distributional representation of words have
-been successfully applied for several NLP tasks. Such models learn features
-automatically and avoid explicit feature engineering. Across several domains,
-neural models become a natural choice specifically when limited characteristics
-of data are known. However, this flexibility comes at the cost of
-interpretability. In this paper, we define three different methods to
-investigate the ability of bi-directional recurrent neural networks (RNNs) in
-capturing contextual features.
-In particular, we analyze RNNs for sequence
-tagging tasks. We perform a comprehensive analysis on general as well as
-biomedical domain datasets. Our experiments focus on important contextual words
-as features, which can easily be extended to analyze various other feature
-types. We also investigate positional effects of context words and show how the
-developed methods can be used for error analysis.
-"
-5647,1709.00661,Amita Misra and Marilyn Walker,"Topic Independent Identification of Agreement and Disagreement in Social
- Media Dialogue",cs.AI cs.CL," Research on the structure of dialogue has been hampered for years because
-large dialogue corpora have not been available. This has impacted the dialogue
-research community's ability to develop better theories, as well as good
-off-the-shelf tools for dialogue processing. Happily, an increasing amount of
-information and opinion exchange occurs in natural dialogue in online forums,
-where people share their opinions about a vast range of topics. In particular,
-we are interested in rejection in dialogue, also called disagreement and
-denial, where the size of available dialogue corpora, for the first time,
-offers an opportunity to empirically test theoretical accounts of the
-expression and inference of rejection in dialogue. In this paper, we test
-whether topic-independent features motivated by theoretical predictions can be
-used to recognize rejection in online forums in a topic-independent way. Our
-results show that our theoretically motivated features achieve 66% accuracy,
-an improvement of an absolute 6% over a unigram baseline.
-"
-5648,1709.00662,"Amita Misra, Pranav Anand, Jean E Fox Tree and Marilyn Walker","Using Summarization to Discover Argument Facets in Online Ideological
- Dialog",cs.AI cs.CL," More and more of the information available on the web is dialogic, and a
-significant portion of it takes place in online forum conversations about
-current social and political topics. We aim to develop tools to summarize what
-these conversations are about. What are the CENTRAL PROPOSITIONS associated
-with different stances on an issue, what are the abstract objects under
-discussion that are central to a speaker's argument? How can we recognize that
-two CENTRAL PROPOSITIONS realize the same FACET of the argument? We hypothesize
-that the CENTRAL PROPOSITIONS are exactly those arguments that people find most
-salient, and use human summarization as a probe for discovering them. We
-describe our corpus of human summaries of opinionated dialogs, then show how we
-can identify similar repeated arguments, and group them into FACETS across many
-discussions of a topic. We define a new task, ARGUMENT FACET SIMILARITY (AFS),
-and show that we can predict AFS with a .54 correlation score, versus an ngram
-system baseline of .39 and a semantic textual similarity system baseline of
-.45.
-"
-5649,1709.00678,Ngoc-Tien Le and Benjamin Lecouteux and Laurent Besacier,Disentangling ASR and MT Errors in Speech Translation,cs.CL," The main aim of this paper is to investigate automatic quality assessment for
-spoken language translation (SLT). More precisely, we investigate SLT errors
-that can be due to transcription (ASR) or to translation (MT) modules. This
-paper investigates automatic detection of SLT errors using a single classifier
-based on joint ASR and MT features. We evaluate both 2-class (good/bad) and
-3-class (good/badASR/badMT) labeling tasks.
-The 3-class problem necessitates
-disentangling ASR and MT errors in the speech translation output, and we
-propose two label extraction methods for this non-trivial step. This enables -
-as a by-product - qualitative analysis of the SLT errors and their origin (are
-they due to the transcription or to the translation step?) on our large
-in-house corpus for French-to-English speech translation.
-"
-5650,1709.00728,Wen Kokke,Formalising Type-Logical Grammars in Agda,cs.LO cs.CL," In recent years, the interest in using proof assistants to formalise and
-reason about mathematics and programming languages has grown. Type-logical
-grammars, being closely related to type theories and systems used in functional
-programming, are a perfect candidate to apply this curiosity to next. The
-advantages of using proof assistants are that they allow one to write formally
-verified proofs about one's type-logical systems, and that any theory, once
-implemented, can immediately be computed with. The downside is that in many
-cases the formal proofs are written as an afterthought, are incomplete, or use
-obtuse syntax. As a result, the verified proofs are often much more difficult
-to read than the pen-and-paper proofs, and are almost never published directly.
-In this paper, we will try to remedy that by example.
- Concretely, we use Agda to model the Lambek-Grishin calculus, a grammar logic
-with a rich vocabulary of type-forming operations. We then present a verified
-procedure for cut elimination in this system. Then we briefly outline a CPS
-translation from proofs in the Lambek-Grishin calculus to programs in Agda. And
-finally, we will put our system to use in the analysis of a simple example
-sentence.
-"
-5651,1709.00770,"Muhammad Mahbubur Rahman, Tim Finin",Understanding the Logical and Semantic Structure of Large Documents,cs.CL cs.IR cs.LG," Current language understanding approaches focus on small documents, such as
-newswire articles, blog posts, product reviews and discussion forum entries.
-Understanding and extracting information from large documents like legal
-briefs, proposals, technical manuals and research articles is still a
-challenging task. We describe a framework that can analyze a large document and
-help people find where particular information is located in that document. We
-aim to automatically identify and classify semantic sections of documents and
-assign consistent and human-understandable labels to similar sections across
-documents. A key contribution of our research is modeling the logical and
-semantic structure of an electronic document. We apply machine learning
-techniques, including deep learning, in our prototype system. We also make
-available a dataset of information about a collection of scholarly articles
-from the arXiv eprints collection that includes a wide range of metadata for
-each article, including a table of contents, section labels, section
-summarizations and more. We hope that this dataset will be a useful resource
-for the machine learning and NLP communities in information retrieval,
-content-based question answering and language modeling.
-"
-5652,1709.00813,"Samuel Cunningham-Nelson, Mahsa Baktashmotlagh, Wageeh Boles","From Review to Rating: Exploring Dependency Measures for Text
- Classification",cs.CL," Various text analysis techniques exist, which attempt to uncover unstructured
-information from text. In this work, we explore using statistical dependence
-measures for textual classification, representing text as word vectors.
-Student satisfaction scores on a 3-point scale and their free text comments
-written about university subjects are used as the dataset. We compared two
-textual representations: a frequency-based word representation and word
-vectors, and found that word vectors provide greater accuracy. However, these
-word vectors have a large number of features, which aggravates the burden of
-computational complexity. Thus, we explored using a non-linear dependency
-measure for feature selection by maximizing the dependence between the text
-reviews and corresponding scores. Our quantitative and qualitative analysis on
-a student satisfaction dataset shows that our approach achieves comparable
-accuracy to the full feature vector, while being an order of magnitude faster
-in testing. These text analysis and feature reduction techniques can be used
-for other textual data applications such as sentiment analysis.
-"
-5653,1709.00831,Nishant Gurnani,Hypothesis Testing based Intrinsic Evaluation of Word Embeddings,cs.CL," We introduce the cross-match test - an exact, distribution-free,
-high-dimensional hypothesis test - as an intrinsic evaluation metric for word
-embeddings. We show that cross-match is an effective means of measuring
-distributional similarity between different vector representations and of
-evaluating the statistical significance of different vector embedding models.
-Additionally, we find that cross-match can be used to provide a quantitative
-measure of linguistic similarity for selecting bridge languages for machine
-translation. We demonstrate that the results of the hypothesis test align with
-our expectations and note that the framework of two-sample hypothesis testing
-is not limited to word embeddings and can be extended to all vector
-representations.
-"
-5654,1709.00893,"Dehong Ma, Sujian Li, Xiaodong Zhang, Houfeng Wang",Interactive Attention Networks for Aspect-Level Sentiment Classification,cs.AI cs.CL," Aspect-level sentiment classification aims at identifying the sentiment
-polarity of a specific target in its context. Previous approaches have realized
-the importance of targets in sentiment classification and developed various
-methods with the goal of precisely modeling their contexts via generating
-target-specific representations. However, these studies always ignore the
-separate modeling of targets. In this paper, we argue that both targets and
-contexts deserve special treatment and need to learn their own
-representations via interactive learning. Then, we propose the interactive
-attention networks (IAN) to interactively learn attentions in the contexts and
-targets, and generate the representations for targets and contexts separately.
-With this design, the IAN model can well represent a target and its collocative
-context, which is helpful for sentiment classification. Experimental results on
-SemEval 2014 Datasets demonstrate the effectiveness of our model.
-"
-5655,1709.00917,"Shasha Xia, Hao Li and Xueliang Zhang","Using Optimal Ratio Mask as Training Target for Supervised Speech
- Separation",cs.SD cs.CL," Supervised speech separation uses supervised learning algorithms to learn a
-mapping from an input noisy signal to an output target. With the fast
-development of deep learning, supervised separation has become the most
-important direction in the speech separation area in recent years. For a
-supervised algorithm, the training target has a significant impact on the
-performance.
-The ideal ratio mask is a commonly used training target, which can
-improve the intelligibility and quality of the separated speech. However, it
-does not take into account the correlation between noise and clean speech. In
-this paper, we use the optimal ratio mask as the training target of the deep
-neural network (DNN) for speech separation. The experiments are carried out
-under various noise environments and signal to noise ratio (SNR) conditions.
-The results show that the optimal ratio mask outperforms other training
-targets in general.
-"
-5656,1709.00947,"Pedro Saleiro, Lu\'is Sarmento, Eduarda Mendes Rodrigues, Carlos
- Soares, Eug\'enio Oliveira","Learning Word Embeddings from the Portuguese Twitter Stream: A Study of
- some Practical Aspects",cs.CL cs.LG," This paper describes a preliminary study for producing and distributing a
-large-scale database of embeddings from the Portuguese Twitter stream. We start
-by experimenting with a relatively small sample and focusing on three
-challenges: volume of training data, vocabulary size and intrinsic evaluation
-metrics. Using a single GPU, we were able to scale up vocabulary size from 2048
-words embedded and 500K training examples to 32768 words over 10M training
-examples while keeping a stable validation loss and an approximately linear
-trend in training time per epoch. We also observed that using less than 50\% of
-the available training examples for each vocabulary size might result in
-overfitting. Results on intrinsic evaluation show promising performance for a
-vocabulary size of 32768 words. Nevertheless, intrinsic evaluation metrics
-suffer from over-sensitivity to their corresponding cosine similarity
-thresholds, indicating that a wider range of metrics need to be developed to
-track progress.
-"
-5657,1709.01042,"Reid Swanson and Stephanie Lukin and Luke Eisenberg and Thomas Chase
- Corcoran and Marilyn A. Walker",Getting Reliable Annotations for Sarcasm in Online Dialogues,cs.CL," The language used in online forums differs in many ways from that of
-traditional language resources such as news. One difference is the use and
-frequency of nonliteral, subjective dialogue acts such as sarcasm. Whether the
-aim is to develop a theory of sarcasm in dialogue, or engineer automatic
-methods for reliably detecting sarcasm, a major challenge is simply the
-difficulty of getting enough reliably labelled examples. In this paper we
-describe our work on methods for achieving highly reliable sarcasm annotations
-from untrained annotators on Mechanical Turk. We explore the use of a number of
-common statistical reliability measures, such as Kappa, Karger's, Majority
-Class, and EM. We show that more sophisticated measures do not appear to yield
-better results for our data than simple measures such as assuming that the
-correct label is the one that a majority of Turkers apply.
-"
-5658,1709.01058,"Linfeng Song, Zhiguo Wang and Wael Hamza","A Unified Query-based Generative Model for Question Generation and
- Question Answering",cs.CL," We propose a query-based generative model for solving both tasks of question
-generation (QG) and question answering (QA). The model follows the classic
-encoder-decoder framework. The encoder takes a passage and a query as input
-then performs query understanding by matching the query with the passage from
-multiple perspectives. The decoder is an attention-based Long Short Term
-Memory (LSTM) model with copy and coverage mechanisms.
-In the QG task, a
-question is generated by the system given the passage and the target answer,
-whereas in the QA task, the answer is generated given the question and the
-passage. During the training stage, we leverage a policy-gradient reinforcement
-learning algorithm to overcome exposure bias, a major problem resulting from
-sequence learning with cross-entropy loss. For the QG task, our experiments
-show higher performance than the state-of-the-art results. When used as
-additional training data, the automatically generated questions even improve
-the performance of a strong extractive QA system. In addition, our model shows
-better performance than the state-of-the-art baselines on the generative QA
-task.
-"
-5659,1709.01121,"Adina Williams, and Andrew Drozdov and Samuel R. Bowman","Do latent tree learning models identify meaningful structure in
- sentences?",cs.CL," Recent work on the problem of latent tree learning has made it possible to
-train neural networks that learn to both parse a sentence and use the resulting
-parse to interpret the sentence, all without exposure to ground-truth parse
-trees at training time. Surprisingly, these models often perform better at
-sentence understanding tasks than models that use parse trees from conventional
-parsers. This paper aims to investigate what these latent tree learning models
-learn. We replicate two such models in a shared codebase and find that (i) only
-one of these models outperforms conventional tree-structured models on sentence
-classification, (ii) its parsing strategies are not especially consistent
-across random restarts, (iii) the parses it produces tend to be shallower than
-standard Penn Treebank (PTB) parses, and (iv) they do not resemble those of PTB
-or any other semantic or syntactic formalism that the authors are aware of.
-"
-5660,1709.01144,"Pranay Dighe, Afsaneh Asaei, Herv\'e Bourlard",Information Theoretic Analysis of DNN-HMM Acoustic Modeling,cs.SD cs.CL cs.LG," We propose an information theoretic framework for quantitative assessment of
-acoustic modeling for hidden Markov model (HMM) based automatic speech
-recognition (ASR). Acoustic modeling yields the probabilities of HMM sub-word
-states for a short temporal window of speech acoustic features. We cast ASR as
-a communication channel where the input sub-word probabilities convey the
-information about the output HMM state sequence. The quality of the acoustic
-model is thus quantified in terms of the information transmitted through this
-channel. The process of inferring the most likely HMM state sequence from the
-sub-word probabilities is known as decoding. HMM based decoding assumes that an
-acoustic model yields accurate state-level probabilities and the data
-distribution given the underlying hidden state is independent of any other
-state in the sequence. We quantify 1) the acoustic model accuracy and 2) its
-robustness to mismatch between data and the HMM conditional independence
-assumption in terms of some mutual information quantities. In this context,
-exploiting deep neural network (DNN) posterior probabilities leads to a simple
-and straightforward analysis framework to assess shortcomings of the acoustic
-model for HMM based decoding. This analysis enables us to evaluate the Gaussian
-mixture acoustic model (GMM) and the importance of many hidden layers in DNNs
-without any need for explicit speech recognition.
In addition, it sheds light on
-the contribution of low-dimensional models to enhance acoustic modeling for
-better compliance with the HMM based decoding requirements.
-"
-5661,1709.01186,Krasen Samardzhiev and Andrew Gargett and Danushka Bollegala,Learning Neural Word Salience Scores,cs.CL," Measuring the salience of a word is an essential step in numerous NLP tasks.
-Heuristic approaches such as tfidf have been used so far to estimate the
-salience of words. We propose \emph{Neural Word Salience} (NWS) scores that,
-unlike heuristics, are learnt from a corpus. Specifically, we learn word
-salience scores such that, using pre-trained word embeddings as the input, they
-can accurately predict the words that appear in a sentence, given the words
-that appear in the sentences preceding or succeeding that sentence.
-Experimental results on sentence similarity prediction show that the learnt
-word salience scores perform comparably to or better than some of the
-state-of-the-art approaches for representing sentences on benchmark datasets
-for sentence similarity, while using only a fraction of the training and
-prediction times required by prior methods. Moreover, our NWS scores positively
-correlate with psycholinguistic measures such as concreteness and imageability,
-implying a close connection to the salience as perceived by humans.
-"
-5662,1709.01188,"Zhichao Hu, Marilyn A. Walker, Michael Neff and Jean E. Fox Tree",Storytelling Agents with Personality and Adaptivity,cs.HC cs.CL," We explore the expression of personality and adaptivity through the gestures
-of virtual agents in a storytelling task. We conduct two experiments using four
-different dialogic stories. We manipulate agent personality on the extraversion
-scale, whether the agents adapt to one another in their gestural performance
-and agent gender. Our results show that subjects are able to perceive the
-intended variation in extraversion between different virtual agents,
-independently of the story they are telling and the gender of the agent. A
-second study shows that subjects also prefer adaptive to nonadaptive virtual
-agents.
-"
-5663,1709.01189,"Fan Yang, Arjun Mukherjee, Eduard Dragut","Satirical News Detection and Analysis using Attention Mechanism and
- Linguistic Features",cs.CL," Satirical news is considered to be entertainment, but it is potentially
-deceptive and harmful. Despite the genre cues embedded in an article, not
-everyone can recognize them and may therefore believe the news to be true. We
-observe that satirical cues are often reflected in certain paragraphs rather
-than the whole document. Existing works only consider document-level features
-to detect the satire, which could be limiting. We consider paragraph-level
-linguistic features to unveil the satire by incorporating a neural network and
-an attention mechanism. We investigate the difference between paragraph-level
-features and document-level features, and analyze them on a large satirical
-news dataset. The evaluation shows that the proposed model detects satirical
-news effectively and reveals which features are important at which level.
-"
-5664,1709.01193,Huda Hakami and Danushka Bollegala,"Compositional Approaches for Representing Relations Between Words: A
- Comparative Study",cs.CL," Identifying the relations that exist between words (or entities) is important
-for various natural language processing tasks such as relational search,
-noun-modifier classification and analogy detection.
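
For the analogy detection just mentioned, a common baseline (not necessarily the one used in this study) is the vector-offset method over word embeddings; a toy sketch with hand-built vectors:

```python
import numpy as np

def analogy(a, b, c, vocab):
    """Return the word whose vector is closest (cosine) to b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for w, v in vocab.items():
        if w in (a, b, c):
            continue
        sim = v @ target / np.linalg.norm(v)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# Toy embedding space constructed so the analogy holds by design.
vocab = {"man": np.array([1.0, 0.0]), "king": np.array([1.0, 1.0]),
         "woman": np.array([0.0, 0.1]), "queen": np.array([0.0, 1.0])}
print(analogy("man", "king", "woman", vocab))  # -> queen
```
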
A popular approach to
-represent the relations between a pair of words is to extract the patterns in
-which the words co-occur from a corpus, and assign each word-pair a vector
-of pattern frequencies. Despite the simplicity of this approach, it suffers
-from data sparseness, information scalability and linguistic creativity, as the
-model is unable to handle previously unseen word pairs in a corpus. In
-contrast, a compositional approach for representing relations between words
-overcomes these issues by using the attributes of each individual word to
-indirectly compose a representation for the common relations that hold between
-the two words. This study aims to compare different operations for creating
-relation representations from word-level representations. We investigate the
-performance of the compositional methods by measuring the relational
-similarities using several benchmark datasets for word analogy. Moreover, we
-evaluate the different relation representations in a knowledge base completion
-task.
-"
-5665,1709.01199,Danushka Bollegala and Yuichi Yoshida and Ken-ichi Kawarabayashi,Using $k$-way Co-occurrences for Learning Word Embeddings,cs.CL," Co-occurrences between two words provide useful insights into the semantics
-of those words. Consequently, numerous prior works on word embedding learning
-have used co-occurrences between two words as the training signal for learning
-word embeddings. However, in natural language texts it is common for multiple
-words to be related and to co-occur in the same context. We extend the notion
-of co-occurrences to cover $k(\geq\!\!2)$-way co-occurrences among a set of
-$k$-words. Specifically, we prove a theoretical relationship between the joint
-probability of $k(\geq\!\!2)$ words, and the sum of $\ell_2$ norms of their
-embeddings. Next, we propose a learning objective motivated by our theoretical
-result that utilises $k$-way co-occurrences for learning word embeddings. Our
-experimental results show that the derived theoretical relationship does indeed
-hold empirically, and despite data sparsity, for some smaller $k$ values,
-$k$-way embeddings perform comparably to or better than $2$-way embeddings in a
-range of tasks.
-"
-5666,1709.01256,"Xiaofeng Zhu, Diego Klabjan, Patrick Bless","Semantic Document Distance Measures and Unsupervised Document Revision
- Detection",cs.IR cs.CL," In this paper, we model the document revision detection problem as a minimum
-cost branching problem that relies on computing document distances.
-Furthermore, we propose two new document distance measures, word vector-based
-Dynamic Time Warping (wDTW) and word vector-based Tree Edit Distance (wTED).
-Our revision detection system is designed for a large scale corpus and
-implemented in Apache Spark. We demonstrate that our system can more precisely
-detect revisions than state-of-the-art methods by utilizing the Wikipedia
-revision dumps https://snap.stanford.edu/data/wiki-meta.html and simulated data
-sets.
-"
-5667,1709.01562,"Alexander Bauer, Shinichi Nakajima, Nico G\""ornitz, Klaus-Robert
- M\""uller",Optimizing for Measure of Performance in Max-Margin Parsing,cs.CL," Many statistical learning problems in the area of natural language
-processing, including sequence tagging, sequence segmentation and syntactic
-parsing, have been successfully approached by means of structured prediction
-methods.
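
As one concrete instance of such structured prediction methods, a minimal structured perceptron for sequence tagging is sketched below; the indicator features and brute-force argmax are simplifications for illustration, not any particular paper's setup:

```python
import numpy as np
from itertools import product

N_WORDS, N_TAGS = 4, 2

def phi(xs, ys):
    """Indicator features: (word, tag) emissions plus (tag, tag) transitions."""
    f = np.zeros(N_WORDS * N_TAGS + N_TAGS * N_TAGS)
    for x, y in zip(xs, ys):
        f[x * N_TAGS + y] += 1
    for a, b in zip(ys, ys[1:]):
        f[N_WORDS * N_TAGS + a * N_TAGS + b] += 1
    return f

def predict(w, xs):
    """Brute-force argmax over tag sequences (fine for toy lengths)."""
    return max(product(range(N_TAGS), repeat=len(xs)),
               key=lambda ys: w @ phi(xs, ys))

# One structured-perceptron epoch over a toy corpus.
data = [([0, 1, 2], [0, 0, 1]), ([3, 1, 0], [1, 0, 0])]
w = np.zeros(N_WORDS * N_TAGS + N_TAGS * N_TAGS)
for xs, ys in data:
    guess = list(predict(w, xs))
    if guess != ys:
        # Standard perceptron update toward the gold structure.
        w += phi(xs, ys) - phi(xs, guess)
print(predict(w, [0, 1, 2]))
```
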
An
-appealing property of the corresponding discriminative learning algorithms is
-their ability to integrate the loss function of interest directly into the
-optimization process, which can potentially increase the resulting performance
-accuracy. Here, we demonstrate, using the example of constituency parsing, how
-to optimize for the F1-score in the max-margin framework of a structural SVM.
-In particular, the optimization is with respect to the original (not binarized)
-trees.
-"
-5668,1709.01572,Hao Tang,Sequence Prediction with Neural Segmental Models,cs.CL cs.LG cs.SD," Segments that span contiguous parts of inputs, such as phonemes in speech,
-named entities in sentences, and actions in videos, occur frequently in
-sequence prediction problems. Segmental models, a class of models that
-explicitly hypothesizes segments, have allowed the exploration of rich segment
-features for sequence prediction. However, segmental models suffer from slow
-decoding, hampering the use of computationally expensive features.
- In this thesis, we introduce discriminative segmental cascades, a multi-pass
-inference framework that allows us to improve accuracy by adding higher-order
-features and neural segmental features while maintaining efficiency. We also
-show that instead of including more features to obtain better accuracy,
-segmental cascades can be used to speed up training and decoding.
- Segmental models, similarly to conventional speech recognizers, are typically
-trained in multiple stages. In the first stage, a frame classifier is trained
-with manual alignments, and then in the second stage, segmental models are
-trained with manual alignments and the outputs of the frame classifier.
-However, obtaining manual alignments is time-consuming and expensive. We
-explore end-to-end training for segmental models with various loss functions,
-and show how end-to-end training with marginal log loss can eliminate the need
-for detailed manual alignments.
- We draw the connections between the marginal log loss and a popular
-end-to-end training approach called connectionist temporal classification. We
-present a unifying framework for various end-to-end graph search-based models,
-such as hidden Markov models, connectionist temporal classification, and
-segmental models. Finally, we discuss possible extensions of segmental models
-to large-vocabulary sequence prediction tasks.
-"
-5669,1709.01634,J. Michael Herrmann,"The Voynich Manuscript is Written in Natural Language: The Pahlavi
- Hypothesis",cs.CL," The late medieval Voynich Manuscript (VM) has resisted decryption and was
-considered a meaningless hoax or an unsolvable cipher. Here, we provide
-evidence that the VM is written in natural language by establishing a relation
-of the Voynich alphabet and the Iranian Pahlavi script. Many of the Voynich
-characters are upside-down versions of their Pahlavi counterparts, which may be
-an effect of different writing directions. Other Voynich letters can be
-explained as ligatures or departures from Pahlavi with the intent to cope with
-known problems due to the stupendous ambiguity of Pahlavi text. While a
-translation of the VM text is not attempted here, we can confirm the
-Voynich-Pahlavi relation at the character level by the transcription of many
-words from the VM illustrations and from parts of the main text. Many of the
-transcribed words can be identified as terms from Zoroastrian cosmology which
-is in line with the use of Pahlavi script in Zoroastrian communities from
-medieval times.
-" -5670,1709.01679,"Sosuke Kobayashi, Naoaki Okazaki, Kentaro Inui","A Neural Language Model for Dynamically Representing the Meanings of - Unknown Words and Entities in a Discourse",cs.CL," This study addresses the problem of identifying the meaning of unknown words -or entities in a discourse with respect to the word embedding approaches used -in neural language models. We proposed a method for on-the-fly construction and -exploitation of word embeddings in both the input and output layers of a neural -model by tracking contexts. This extends the dynamic entity representation used -in Kobayashi et al. (2016) and incorporates a copy mechanism proposed -independently by Gu et al. (2016) and Gulcehre et al. (2016). In addition, we -construct a new task and dataset called Anonymized Language Modeling for -evaluating the ability to capture word meanings while reading. Experiments -conducted using our novel dataset show that the proposed variant of RNN -language model outperformed the baseline model. Furthermore, the experiments -also demonstrate that dynamic updates of an output layer help a model predict -reappearing entities, whereas those of an input layer are effective to predict -words following reappearing entities. -" -5671,1709.01687,"Shashank Gupta, Sachin Pawar, Nitin Ramrakhiyani, Girish Palshikar and - Vasudeva Varma","Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction - Mention Extraction",cs.IR cs.CL," Social media is an useful platform to share health-related information due to -its vast reach. This makes it a good candidate for public-health monitoring -tasks, specifically for pharmacovigilance. We study the problem of extraction -of Adverse-Drug-Reaction (ADR) mentions from social media, particularly from -twitter. Medical information extraction from social media is challenging, -mainly due to short and highly information nature of text, as compared to more -technical and formal medical reports. - Current methods in ADR mention extraction relies on supervised learning -methods, which suffers from labeled data scarcity problem. The State-of-the-art -method uses deep neural networks, specifically a class of Recurrent Neural -Network (RNN) which are Long-Short-Term-Memory networks (LSTMs) -\cite{hochreiter1997long}. Deep neural networks, due to their large number of -free parameters relies heavily on large annotated corpora for learning the end -task. But in real-world, it is hard to get large labeled data, mainly due to -heavy cost associated with manual annotation. Towards this end, we propose a -novel semi-supervised learning based RNN model, which can leverage unlabeled -data also present in abundance on social media. Through experiments we -demonstrate the effectiveness of our method, achieving state-of-the-art -performance in ADR mention extraction. -" -5672,1709.01713,"Yuan Gao, Brij Mohan Lal Srivastava, and James Salsman","Spoken English Intelligibility Remediation with PocketSphinx Alignment - and Feature Extraction Improves Substantially over the State of the Art",cs.CL stat.ML," We use automatic speech recognition to assess spoken English learner -pronunciation based on the authentic intelligibility of the learners' spoken -responses determined from support vector machine (SVM) classifier or deep -learning neural network model predictions of transcription correctness. 
Using
-numeric features produced by PocketSphinx alignment mode and many recognition
-passes searching for the substitution and deletion of each expected phoneme and
-insertion of unexpected phonemes in sequence, the SVM models achieve 82 percent
-agreement with the accuracy of Amazon Mechanical Turk crowdworker
-transcriptions, up from 75 percent reported by multiple independent
-researchers. Using such features with SVM classifier probability prediction
-models can help computer-aided pronunciation teaching (CAPT) systems provide
-intelligibility remediation.
-"
-5673,1709.01766,"Wen Zhang, Jiawei Hu, Yang Feng and Qun Liu","Information-Propogation-Enhanced Neural Machine Translation by Relation
- Model",cs.CL," Although sequence-to-sequence neural machine translation (NMT) models have
-achieved state-of-the-art performance in recent years, it is widely
-acknowledged that recurrent neural network (RNN) units find it very hard to
-capture long-distance state information, which means an RNN can hardly extract
-features with long-term dependencies as the sequence becomes longer. Similarly,
-convolutional neural networks (CNNs) have recently been introduced into NMT for
-speed; however, CNNs focus on capturing local features of the sequence. To
-relieve this issue, we incorporate a relation network into the standard
-encoder-decoder framework to enhance information propagation in the neural
-network, ensuring that the information of the source sentence can flow into the
-decoder adequately. Experiments show that the proposed framework outperforms
-the statistical MT model and the state-of-the-art NMT model significantly on
-two data sets of different scales.
-"
-5674,1709.01848,"Andrew Yates, Arman Cohan, Nazli Goharian",Depression and Self-Harm Risk Assessment in Online Forums,cs.CL," Users suffering from mental health conditions often turn to online resources
-for support, including specialized online support communities or general
-communities such as Twitter and Reddit. In this work, we present a neural
-framework for supporting and studying users in both types of communities. We
-propose methods for identifying posts in support communities that may indicate
-a risk of self-harm, and demonstrate that our approach outperforms strong
-previously proposed methods for identifying such posts. Self-harm is closely
-related to depression, which makes identifying depressed users on general
-forums a crucial related task. We introduce a large-scale general forum dataset
-(""RSDD"") consisting of users with self-reported depression diagnoses matched
-with control users. We show how our method can be applied to effectively
-identify depressed users from their use of language alone. We demonstrate that
-our method outperforms strong baselines on this general forum dataset.
-"
-5675,1709.01887,"Amita Misra, Brian Ecker, and Marilyn A. Walker",Measuring the Similarity of Sentential Arguments in Dialog,cs.CL cs.AI," When people converse about social or political topics, similar arguments are
-often paraphrased by different speakers, across many different conversations.
-Debate websites produce curated summaries of arguments on such topics; these
-summaries typically consist of lists of sentences that represent frequently
-paraphrased propositions, or labels capturing the essence of one particular
-aspect of an argument, e.g. Morality or Second Amendment. We call these
-frequently paraphrased propositions ARGUMENT FACETS.
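
A rough sketch of the kind of pairwise similarity ranking such a facet-induction pipeline rests on, using TF-IDF cosine similarity as a stand-in scorer (the actual work uses a learned similarity, and these toy sentences are invented):

```python
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

args = ["Guns are needed for self defense.",
        "People should be able to defend themselves.",
        "The second amendment protects gun ownership.",
        "Owning guns is a constitutional right."]

X = TfidfVectorizer().fit_transform(args)
sims = cosine_similarity(X)
# Rank argument pairs by similarity; high-scoring pairs suggest a shared facet.
pairs = sorted(combinations(range(len(args)), 2), key=lambda ij: -sims[ij])
for i, j in pairs[:2]:
    print(round(float(sims[i, j]), 2), args[i], "|", args[j])
```
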
Like these curated sites,
-our goal is to induce and identify argument facets across multiple
-conversations, and produce summaries. However, we aim to do this automatically.
-We frame the problem as consisting of two steps: we first extract sentences
-that express an argument from raw social media dialogs, and then rank the
-extracted arguments in terms of their similarity to one another. Sets of
-similar arguments are used to represent argument facets. We show here that we
-can predict ARGUMENT FACET SIMILARITY with a correlation averaging 0.63
-compared to a human topline averaging 0.68 over three debate topics, easily
-beating several reasonable baselines.
-"
-5676,1709.01888,"Miriam Cha, Youngjune Gwon, H.T. Kung","Language Modeling by Clustering with Word Embeddings for Text
- Readability Assessment",cs.CL cs.LG," We present a clustering-based language model using word embeddings for text
-readability prediction. Presumably, a Euclidean semantic space hypothesis holds
-true for word embeddings whose training is done by observing word
-co-occurrences. We argue that clustering with word embeddings in the metric
-space should yield feature representations in a higher semantic space
-appropriate for text regression. Also, by representing features in terms of
-histograms, our approach can naturally address documents of varying lengths. An
-empirical evaluation using the Common Core Standards corpus reveals that the
-features formed on our clustering-based language model significantly improve
-the previously known results for the same corpus in readability prediction. We
-also evaluate the task of sentence matching based on semantic relatedness using
-the Wiki-SimpleWiki corpus and find that our features lead to superior matching
-performance.
-"
-5677,1709.01895,"Amita Misra, Brian Ecker, Theodore Handleman, Nicolas Hahn and Marilyn
- Walker",A Semi-Supervised Approach to Detecting Stance in Tweets,cs.CL," Stance classification aims to identify, for a particular issue under
-discussion, whether the speaker or author of a conversational turn has Pro
-(Favor) or Con (Against) stance on the issue. Detecting stance in tweets is a
-new task proposed for SemEval-2016 Task6, involving predicting stance for a
-dataset of tweets on the topics of abortion, atheism, climate change, feminism
-and Hillary Clinton. Given the small size of the dataset, our team created our
-own topic-specific training corpus by developing a set of high precision
-hashtags for each topic that were used to query the Twitter API, with the aim
-of developing a large training corpus without additional human labeling of
-tweets for stance. The hashtags selected for each topic were predicted to be
-stance-bearing on their own. Experimental results demonstrate good performance
-for our features for opinion-target pairs based on generalizing dependency
-features using sentiment lexicons.
-"
-5678,1709.01915,"James Bradbury, Richard Socher",Towards Neural Machine Translation with Latent Tree Attention,cs.CL cs.AI," Building models that take advantage of the hierarchical structure of language
-without a priori annotation is a longstanding goal in natural language
-processing. We introduce such a model for the task of machine translation,
-pairing a recurrent neural network grammar encoder with a novel attentional
-RNNG decoder and applying policy gradient reinforcement learning to induce
-unsupervised tree structures on both the source and target.
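
The policy gradient training mentioned above can be illustrated with a generic REINFORCE estimator on a toy action space; the reward values and the running-average baseline here are assumptions for illustration, not the paper's setup:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
theta = np.zeros(3)                      # logits over 3 toy "tree actions"
true_reward = np.array([0.1, 0.9, 0.3])  # e.g., downstream task reward

baseline, lr = 0.0, 0.5
for step in range(200):
    p = softmax(theta)
    a = rng.choice(3, p=p)
    r = true_reward[a] + rng.normal(scale=0.05)
    # REINFORCE: grad log pi(a) * (reward - baseline)
    grad_logp = -p
    grad_logp[a] += 1.0
    theta += lr * (r - baseline) * grad_logp
    baseline = 0.9 * baseline + 0.1 * r  # variance-reducing baseline
print(softmax(theta).round(2))           # mass concentrates on action 1
```
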
When trained on
-character-level datasets with no explicit segmentation or parse annotation, the
-model learns a plausible segmentation and shallow parse, obtaining performance
-close to an attentional baseline.
-"
-5679,1709.01950,"Lakshya Kumar, Arpan Somani, Pushpak Bhattacharyya","""Having 2 hours to write a paper is fun!"": Detecting Sarcasm in
- Numerical Portions of Text",cs.CL," Sarcasm occurring due to the presence of numerical portions in text has been
-quoted as an error made by automatic sarcasm detection approaches in the past.
-We present a first study in detecting sarcasm in numbers, as in the case of the
-sentence 'Love waking up at 4 am'. We analyze the challenges of the problem,
-and present Rule-based, Machine Learning and Deep Learning approaches to detect
-sarcasm in numerical portions of text. Our Deep Learning approach outperforms
-four past works for sarcasm detection, as well as our Rule-based and Machine
-Learning approaches, on a dataset of tweets, obtaining an F1-score of 0.93.
-This shows that special attention to text containing numbers may be useful to
-improve the state-of-the-art in sarcasm detection.
-"
-5680,1709.01991,"Monika Rani, Amit Kumar Dhar and O. P. Vyas",Semi-Automatic Terminology Ontology Learning Based on Topic Modeling,cs.IR cs.CL," Ontologies provide features like a common vocabulary, reusability and
-machine-readable content, and also allow for semantic search, facilitate agent
-interaction, and support the ordering and structuring of knowledge for Semantic
-Web (Web 3.0) applications. However, the challenge in ontology engineering is
-automatic learning, i.e., there is still no fully automatic approach for
-forming an ontology from a text corpus or a dataset of various topics using
-machine learning techniques. In this paper, two topic modeling algorithms are
-explored, namely LSI & SVD and Mr.LDA, for learning a topic ontology. The
-objective is to determine the statistical relationship between documents and
-terms to build a topic ontology and ontology graph with minimum human
-intervention. Experimental analysis on building a topic ontology and
-semantically retrieving the corresponding topic ontology for a user's query
-demonstrates the effectiveness of the proposed approach.
-"
-5681,1709.02076,"Donya Quick, Clayton T. Morrison",Composition by Conversation,cs.SD cs.CL cs.IR cs.PL," Most musical programming languages are developed purely for coding virtual
-instruments or algorithmic compositions. Although there has been some work in
-the domain of musical query languages for music information retrieval, there
-has been little attempt to unify the principles of musical programming and
-query languages with cognitive and natural language processing models that
-would facilitate the activity of composition by conversation. We present a
-prototype framework, called MusECI, that merges these domains, permitting
-score-level algorithmic composition in a text editor while also supporting
-connectivity to existing natural language processing frameworks.
-"
-5682,1709.02184,"Mihael Arcan, Daniel Torregrosa, Paul Buitelaar","Translating Terminological Expressions in Knowledge Bases with Neural
- Machine Translation",cs.CL," Our work presented in this paper focuses on the translation of terminological
-expressions represented in semantically structured resources, like ontologies
-or knowledge graphs.
The challenge of translating ontology labels or
-terminological expressions documented in knowledge bases lies in the highly
-specific vocabulary and the lack of contextual information, which can guide a
-machine translation system to translate ambiguous words into the targeted
-domain. Due to these challenges, we evaluate the translation quality of
-domain-specific expressions in the medical and financial domain with
-statistical as well as with neural machine translation methods and experiment
-with domain adaptation of the translation models using terminological
-expressions only. Furthermore, we perform experiments on the injection of
-external terminological expressions into the translation systems. Through these
-experiments, we observed a significant advantage in domain adaptation for the
-domain-specific resource in the medical and financial domain and the benefit of
-subword models over word-based neural machine translation models for
-terminology translation.
-"
-5683,1709.02271,"Su Wang, Elisa Ferracane, Raymond J. Mooney",Leveraging Discourse Information Effectively for Authorship Attribution,cs.CL," We explore techniques to maximize the effectiveness of discourse information
-in the task of authorship attribution. We present a novel method to embed
-discourse features in a Convolutional Neural Network text classifier, which
-achieves a state-of-the-art result by a substantial margin. We empirically
-investigate several featurization methods to understand the conditions under
-which discourse features contribute non-trivial performance gains, and analyze
-discourse embeddings.
-"
-5684,1709.02279,Amittai Axelrod,Cynical Selection of Language Model Training Data,cs.CL," The Moore-Lewis method of ""intelligent selection of language model training
-data"" is very effective, cheap, efficient... and also has structural problems.
-(1) The method defines relevance by playing language models trained on the
-in-domain and the out-of-domain (or data pool) corpora against each other. This
-powerful idea -- which we set out to preserve -- treats the two corpora as the
-opposing ends of a single spectrum. This lack of nuance does not allow for the
-two corpora to be very similar. In the extreme case where they come from the
-same distribution, all of the sentences have a Moore-Lewis score of zero, so
-there is no resulting ranking. (2) The selected sentences are not guaranteed to
-be able to model the in-domain data, nor to even cover the in-domain data. They
-are simply well-liked by the in-domain model; this is necessary, but not
-sufficient. (3) There is no way to tell what is the optimal number of sentences
-to select, short of picking various thresholds and building the systems.
- We present a greedy, lazy, approximate, and generally efficient
-information-theoretic method of accomplishing the same goal using only
-vocabulary counts. The method has the following properties: (1) Is responsive
-to the extent to which two corpora differ. (2) Quickly reaches near-optimal
-vocabulary coverage. (3) Takes into account what has already been selected. (4)
-Does not involve defining any kind of domain, nor any kind of classifier. (5)
-Knows approximately when to stop. This method can be used as an
-inherently-meaningful measure of similarity, as it measures the bits of
-information to be gained by adding one text to another.
-"
-5685,1709.02349,"Iulian V.
Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng - Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath - Chandar, Nan Rosemary Ke, Sai Rajeshwar, Alexandre de Brebisson, Jose M. R. - Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, - Yoshua Bengio",A Deep Reinforcement Learning Chatbot,cs.CL cs.AI cs.LG cs.NE stat.ML," We present MILABOT: a deep reinforcement learning chatbot developed by the -Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize -competition. MILABOT is capable of conversing with humans on popular small talk -topics through both speech and text. The system consists of an ensemble of -natural language generation and retrieval models, including template-based -models, bag-of-words models, sequence-to-sequence neural network and latent -variable neural network models. By applying reinforcement learning to -crowdsourced data and real-world user interactions, the system has been trained -to select an appropriate response from the models in its ensemble. The system -has been evaluated through A/B testing with real-world users, where it -performed significantly better than many competing systems. Due to its machine -learning architecture, the system is likely to improve with additional data. -" -5686,1709.02755,"Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai and Yoav Artzi",Simple Recurrent Units for Highly Parallelizable Recurrence,cs.CL cs.NE," Common recurrent neural architectures scale poorly due to the intrinsic -difficulty in parallelizing their state computations. In this work, we propose -the Simple Recurrent Unit (SRU), a light recurrent unit that balances model -capacity and scalability. SRU is designed to provide expressive recurrence, -enable highly parallelized implementation, and comes with careful -initialization to facilitate training of deep models. We demonstrate the -effectiveness of SRU on multiple NLP tasks. SRU achieves 5--9x speed-up over -cuDNN-optimized LSTM on classification and question answering datasets, and -delivers stronger results than LSTM and convolutional models. We also obtain an -average of 0.7 BLEU improvement over the Transformer model on translation by -incorporating SRU into the architecture. -" -5687,1709.02783,"Richard Futrell, Roger Levy and Matthew Dryer",A Statistical Comparison of Some Theories of NP Word Order,cs.CL," A frequent object of study in linguistic typology is the order of elements -{demonstrative, adjective, numeral, noun} in the noun phrase. The goal is to -predict the relative frequencies of these orders across languages. Here we use -Poisson regression to statistically compare some prominent accounts of this -variation. We compare feature systems derived from Cinque (2005) to feature -systems given in Cysouw (2010) and Dryer (in prep). In this setting, we do not -find clear reasons to prefer the model of Cinque (2005) or Dryer (in prep), but -we find both of these models have substantially better fit to the typological -data than the model from Cysouw (2010). -" -5688,1709.02828,Jonathan Raiman and John Miller,Globally Normalized Reader,cs.CL," Rapid progress has been made towards question answering (QA) systems that can -extract answers from text. Existing neural approaches make use of expensive -bi-directional attention mechanisms or score all possible answer spans, -limiting scalability. We propose instead to cast extractive QA as an iterative -search problem: select the answer's sentence, start word, and end word. 
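
The iterative search decomposition described above (sentence, then start word, then end word) can be sketched with random scores and a tiny beam; the scores and beam width below are illustrative assumptions, not the model's learned quantities:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sent, sent_len, beam = 3, 6, 2

sent_scores = rng.normal(size=n_sent)
start_scores = rng.normal(size=(n_sent, sent_len))
end_scores = rng.normal(size=(n_sent, sent_len))

# Step 1: keep the top-scoring sentences.
sents = np.argsort(-sent_scores)[:beam]
# Step 2: expand each kept sentence with its best start words.
cands = [(sent_scores[s] + start_scores[s, i], s, i)
         for s in sents for i in np.argsort(-start_scores[s])[:beam]]
cands = sorted(cands, reverse=True)[:beam]
# Step 3: expand with end words (end >= start) and pick the best span.
finals = [(sc + end_scores[s, j], s, i, j)
          for sc, s, i in cands for j in range(i, sent_len)]
score, s, i, j = max(finals)
print(f"sentence {s}, span [{i}, {j}], score {score:.2f}")
```
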
This
-representation reduces the space of each search step and allows computation to
-be conditionally allocated to promising search paths. We show that globally
-normalizing the decision process and back-propagating through beam search makes
-this representation viable and learning efficient. We empirically demonstrate
-the benefits of this approach using our model, Globally Normalized Reader
-(GNR), which achieves the second highest single model performance on the
-Stanford Question Answering Dataset (68.4 EM, 76.21 F1 dev) and is 24.7x faster
-than bi-attention-flow. We also introduce a data-augmentation method to produce
-semantically valid examples by aligning named entities to a knowledge base and
-swapping them with new entities of the same type. This method improves the
-performance of all models considered in this work and is of independent
-interest for a variety of NLP tasks.
-"
-5689,1709.02842,"Yohan Jo, Lisa Lee, Shruti Palaskar",Combining LSTM and Latent Topic Modeling for Mortality Prediction,cs.CL," There is a great need for technologies that can predict the mortality of
-patients in intensive care units with both high accuracy and accountability. We
-present joint end-to-end neural network architectures that combine long
-short-term memory (LSTM) and a latent topic model to simultaneously train a
-classifier for mortality prediction and learn latent topics indicative of
-mortality from textual clinical notes. For topic interpretability, the topic
-modeling layer has been carefully designed as a single-layer network with
-constraints inspired by LDA. Experiments on the MIMIC-III dataset show that our
-models significantly outperform prior models that are based on LDA topics in
-mortality prediction. However, we achieve limited success with our method for
-interpreting topics from the trained models by looking at the neural network
-weights.
-"
-5690,1709.02843,Elnaz Davoodi and Leila Kosseim,"CLaC at SemEval-2016 Task 11: Exploring linguistic and psycho-linguistic
- Features for Complex Word Identification",cs.CL," This paper describes the system deployed by the CLaC-EDLK team for the
-""SemEval 2016, Complex Word Identification task"". The goal of the task is to
-identify if a given word in a given context is ""simple"" or ""complex"". Our
-system relies on linguistic features and cognitive complexity. We used several
-supervised models; however, the Random Forest model outperformed the others.
-Overall our best configuration achieved a G-score of 68.8% in the task, ranking
-our system 21 out of 45.
-"
-5691,1709.02911,"Vindula Jayawardana, Dimuthu Lakmal, Nisansa de Silva, Amal Shehan
- Perera, Keet Sugathadasa, Buddhi Ayesha, Madhavi Perera","Semi-Supervised Instance Population of an Ontology using Word Vector
- Embeddings",cs.CL," In many modern-day systems such as information extraction and knowledge
-management agents, ontologies play a vital role in maintaining the concept
-hierarchies of the selected domain. However, ontology population has become a
-problematic process due to its heavy coupling with manual human intervention.
-With the use of word embeddings in the field of natural language processing,
-ontology population became a popular topic due to the ability of embeddings to
-cope with semantic sensitivity. Hence, in this study, we propose a novel way of
-semi-supervised ontology population with word embeddings as the basis. We built
-several models including traditional benchmark models and new types of models
-which are based on word embeddings.
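
One simple embedding-based baseline for such instance population (an assumption for illustration, not necessarily one of the authors' models) assigns a candidate instance to the ontology class whose seed-instance centroid is most similar:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Toy word vectors; in practice these come from pre-trained embeddings.
vec = {"dog": np.array([1.0, 0.1]), "cat": np.array([0.9, 0.2]),
       "rose": np.array([0.1, 1.0]), "tulip": np.array([0.2, 0.9]),
       "wolf": np.array([0.8, 0.3])}

# Each ontology class starts from a few seed instances.
seeds = {"Animal": ["dog", "cat"], "Flower": ["rose", "tulip"]}
centroids = {c: normalize(sum(vec[w] for w in ws)) for c, ws in seeds.items()}

# Populate: assign an unseen candidate to its most similar class.
cand = "wolf"
cls = max(centroids, key=lambda c: normalize(vec[cand]) @ centroids[c])
print(cand, "->", cls)   # wolf -> Animal
```
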
Finally, we ensemble them together -to come up with a synergistic model with better accuracy. We demonstrate that -our ensemble model can outperform the individual models. -" -5692,1709.02968,Chao-Lin Liu and Hongsu Wang,"Matrix and Graph Operations for Relationship Inference: An Illustration - with the Kinship Inference in the China Biographical Database",cs.DL cs.CL cs.DB," Biographical databases contain diverse information about individuals. Person -names, birth information, career, friends, family and special achievements are -some possible items in the record for an individual. The relationships between -individuals, such as kinship and friendship, provide invaluable insights about -hidden communities which are not directly recorded in databases. We show that -some simple matrix and graph-based operations are effective for inferring -relationships among individuals, and illustrate the main ideas with the China -Biographical Database (CBDB). -" -5693,1709.02984,"Fabio Calefato, Filippo Lanubile, Federico Maiorano, Nicole Novielli",Sentiment Polarity Detection for Software Development,cs.SE cs.CL," The role of sentiment analysis is increasingly emerging to study software -developers' emotions by mining crowd-generated content within social software -engineering tools. However, off-the-shelf sentiment analysis tools have been -trained on non-technical domains and general-purpose social media, thus -resulting in misclassifications of technical jargon and problem reports. Here, -we present Senti4SD, a classifier specifically trained to support sentiment -analysis in developers' communication channels. Senti4SD is trained and -validated using a gold standard of Stack Overflow questions, answers, and -comments manually annotated for sentiment polarity. It exploits a suite of both -lexicon- and keyword-based features, as well as semantic features based on word -embedding. With respect to a mainstream off-the-shelf tool, which we use as a -baseline, Senti4SD reduces the misclassifications of neutral and positive posts -as emotionally negative. To encourage replications, we release a lab package -including the classifier, the word embedding space, and the gold standard with -annotation guidelines. -" -5694,1709.03010,"Di Wang, Nebojsa Jojic, Chris Brockett, Eric Nyberg",Steering Output Style and Topic in Neural Response Generation,cs.CL," We propose simple and flexible training and decoding methods for influencing -output style and topic in neural encoder-decoder based language generation. -This capability is desirable in a variety of applications, including -conversational systems, where successful agents need to produce language in a -specific style and generate responses steered by a human puppeteer or external -knowledge. We decompose the neural generation process into empirically easier -sub-problems: a faithfulness model and a decoding method based on -selective-sampling. We also describe training and sampling algorithms that bias -the generation process with a specific language style restriction, or a topic -restriction. Human evaluation results show that our proposed methods are able -to restrict style and topic without degrading output quality in conversational -tasks. -" -5695,1709.03036,"Kedar Dhamdhere and Kevin S. McCurley and Mukund Sundararajan and - Ankur Taly",Abductive Matching in Question Answering,cs.CL cs.LG," We study question-answering over semi-structured data. 
We introduce a new way
-to apply the technique of semantic parsing by applying machine learning only to
-provide annotations that the system infers to be missing; all the other parsing
-logic is in the form of manually authored rules. In effect, the machine
-learning is used to provide non-syntactic matches, a step that is ill-suited to
-manual rules. The advantage of this approach is in its debuggability and in its
-transparency to the end-user. We demonstrate the effectiveness of the approach
-by achieving state-of-the-art performance of 40.42% accuracy on a standard
-benchmark dataset over tables from Wikipedia.
-"
-5696,1709.03064,"Mayank Singh, Soham Dan, Sanyam Agarwal, Pawan Goyal, Animesh
- Mukherjee","AppTechMiner: Mining Applications and Techniques from Scientific
- Articles",cs.CL," This paper presents AppTechMiner, a rule-based information extraction
-framework that automatically constructs a knowledge base of all application
-areas and problem solving techniques. Techniques include tools, methods,
-datasets or evaluation metrics. We also categorize individual research articles
-based on their application areas and the techniques proposed/improved in the
-article. Our system achieves high average precision (~82%) and recall (~84%) in
-knowledge base creation. It also performs well in application and technique
-assignment to an individual article (average accuracy ~66%). Finally, we
-present two use cases: a trivial information retrieval system, and an extensive
-temporal analysis of the usage of techniques and application areas. At present,
-we demonstrate the framework for the domain of computational linguistics but
-this can be easily generalized to any other field of research.
-"
-5697,1709.03167,"Geetanjali Rakshit, Kevin K. Bowden, Lena Reed, Amita Misra, Marilyn
- Walker","Debbie, the Debate Bot of the Future",cs.CL," Chatbots are a rapidly expanding application of dialogue systems, with
-companies switching to bot services for customer support and new applications
-emerging for users interested in casual conversation. One style of casual
-conversation is argument; many people love nothing more than a good argument.
-Moreover, there are a number of existing corpora of argumentative dialogues,
-annotated for agreement and disagreement, stance, sarcasm and argument quality.
-This paper introduces Debbie, a novel arguing bot, that selects arguments from
-conversational corpora, and aims to use them appropriately in context. We
-present an initial working prototype of Debbie, with some preliminary
-evaluation, and describe future work.
-"
-5698,1709.03190,"Kevin K. Bowden, Shereen Oraby, Amita Misra, Jiaqi Wu, and Stephanie
- Lukin",Data-Driven Dialogue Systems for Social Agents,cs.CL," In order to build dialogue systems to tackle the ambitious task of holding
-social conversations, we argue that we need a data driven approach that
-includes insight into human conversational chit chat, and which incorporates
-different natural language processing modules. Our strategy is to analyze and
-index large corpora of social media data, including Twitter conversations,
-online debates, dialogues between friends, and blog posts, and then to couple
-this data retrieval with modules that perform tasks such as sentiment and style
-analysis, topic modeling, and summarization. We aim for personal assistants
-that can learn more nuanced human language, and to grow from task-oriented
-agents to more personable social bots.
-" -5699,1709.03406,Jo\~ao Filipe Figueiredo Pereira,Social Media Text Processing and Semantic Analysis for Smart Cities,cs.SI cs.CL cs.CY," With the rise of Social Media, people obtain and share information almost -instantly on a 24/7 basis. Many research areas have tried to gain valuable -insights from these large volumes of freely available user generated content. - With the goal of extracting knowledge from social media streams that might be -useful in the context of intelligent transportation systems and smart cities, -we designed and developed a framework that provides functionalities for -parallel collection of geo-located tweets from multiple pre-defined bounding -boxes (cities or regions), including filtering of non-complying tweets, text -pre-processing for Portuguese and English language, topic modeling, and -transportation-specific text classifiers, as well as, aggregation and data -visualization. - We performed an exploratory data analysis of geo-located tweets in 5 -different cities: Rio de Janeiro, S\~ao Paulo, New York City, London and -Melbourne, comprising a total of more than 43 million tweets in a period of 3 -months. Furthermore, we performed a large scale topic modelling comparison -between Rio de Janeiro and S\~ao Paulo. Interestingly, most of the topics are -shared between both cities which despite being in the same country are -considered very different regarding population, economy and lifestyle. - We take advantage of recent developments in word embeddings and train such -representations from the collections of geo-located tweets. We then use a -combination of bag-of-embeddings and traditional bag-of-words to train -travel-related classifiers in both Portuguese and English to filter -travel-related content from non-related. We created specific gold-standard data -to perform empirical evaluation of the resulting classifiers. Results are in -line with research work in other application areas by showing the robustness of -using word embeddings to learn word similarities that bag-of-words is not able -to capture. -" -5700,1709.03544,"Dominic Seyler, Tatiana Dembelova, Luciano Del Corro, Johannes - Hoffart, Gerhard Weikum",KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition,cs.CL," KnowNER is a multilingual Named Entity Recognition (NER) system that -leverages different degrees of external knowledge. A novel modular framework -divides the knowledge into four categories according to the depth of knowledge -they convey. Each category consists of a set of features automatically -generated from different information sources (such as a knowledge-base, a list -of names or document-specific semantic annotations) and is used to train a -conditional random field (CRF). Since those information sources are usually -multilingual, KnowNER can be easily trained for a wide range of languages. In -this paper, we show that the incorporation of deeper knowledge systematically -boosts accuracy and compare KnowNER with state-of-the-art NER approaches across -three languages (i.e., English, German and Spanish) performing amongst -state-of-the art systems in all of them. -" -5701,1709.03637,"Fei Liu, Timothy Baldwin and Trevor Cohn","Capturing Long-range Contextual Dependencies with Memory-enhanced - Conditional Random Fields",cs.CL," Despite successful applications across a broad range of NLP tasks, -conditional random fields (""CRFs""), in particular the linear-chain variant, are -only able to model local features. 
While this has important benefits in terms
-of inference tractability, it limits the ability of the model to capture
-long-range dependencies between items. Attempts to extend CRFs to capture
-long-range dependencies have largely come at the cost of computational
-complexity and approximate inference. In this work, we propose an extension to
-CRFs by integrating external memory, taking inspiration from memory networks,
-thereby allowing CRFs to incorporate information far beyond neighbouring steps.
-Experiments across two tasks show substantial improvements over strong CRF and
-LSTM baselines.
-"
-5702,1709.03665,"Zhiming Wang, Xiaolong Li, Jun Zhou","Small-footprint Keyword Spotting Using Deep Neural Network and
- Connectionist Temporal Classifier",cs.CL," Mainly to address the lack of keyword-specific data, we propose a Keyword
-Spotting (KWS) system using a Deep Neural Network (DNN) and a Connectionist
-Temporal Classifier (CTC) on power-constrained small-footprint mobile devices,
-taking full advantage of the large general-purpose corpora available for
-continuous speech recognition. The DNN directly predicts the posteriors of the
-phoneme units of any personally customized key-phrase, and the CTC produces a
-confidence score for the given phoneme sequence as the decision-making
-mechanism. The CTC-KWS has competitive performance in comparison with purely
-DNN-based keyword-specific KWS, without increasing computational complexity.
-"
-5703,1709.03742,Christina Lioma,Dependencies: Formalising Semantic Catenae for Information Retrieval,cs.IR cs.CL," Building machines that can understand text like humans is an AI-complete
-problem. A great deal of research has already gone into this, with astounding
-results, allowing everyday people to discuss with their telephones, or have
-their reading materials analysed and classified by computers. A prerequisite
-for processing text semantics, common to the above examples, is having some
-computational representation of text as an abstract object. Operations on this
-representation practically correspond to making semantic inferences, and by
-extension simulating understanding text. The complexity and granularity of
-semantic processing that can be realised is constrained by the mathematical and
-computational robustness, expressiveness, and rigour of the tools used.
- This dissertation contributes a series of such tools, diverse in their
-mathematical formulation, but common in their application to model semantic
-inferences when machines process text. These tools are principally expressed in
-nine distinct models that capture aspects of semantic dependence in highly
-interpretable and non-complex ways. This dissertation further reflects on
-present and future problems with the current research paradigm in this area,
-and makes recommendations on how to overcome them.
- The amalgamation of the body of work presented in this dissertation advances
-the complexity and granularity of semantic inferences that can be made
-automatically by machines.
-"
-5704,1709.03756,Yan Shao,"Cross-lingual Word Segmentation and Morpheme Segmentation as Sequence
- Labelling",cs.CL," This paper presents our segmentation system developed for the MLP 2017 shared
-tasks on cross-lingual word segmentation and morpheme segmentation. We model
-both word and morpheme segmentation as character-level sequence labelling
-tasks.
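
Casting segmentation as character-level sequence labelling typically means converting segmented text to per-character boundary tags; B/I/E/S is one common scheme, and whether this system uses exactly these tags is an assumption. A minimal sketch of the conversion in both directions:

```python
def to_bies(words):
    """Convert a segmented sentence to per-character B/I/E/S labels."""
    labels = []
    for w in words:
        if len(w) == 1:
            labels.append("S")
        else:
            labels += ["B"] + ["I"] * (len(w) - 2) + ["E"]
    return labels

def from_bies(chars, labels):
    """Recover the segmentation from characters and predicted labels."""
    words, cur = [], ""
    for c, l in zip(chars, labels):
        cur += c
        if l in ("E", "S"):
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words

words = ["我们", "在", "学习"]
tags = to_bies(words)                     # ['B','E','S','B','E']
print(tags, from_bies("".join(words), tags))
```
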
The prevalent bidirectional recurrent neural network with conditional
-random fields as the output interface is adapted as the baseline system, which
-is further improved via ensemble decoding. Our universal system is applied to
-and extensively evaluated on all the official data sets without any
-language-specific adjustment. The official evaluation results indicate that the
-proposed model achieves outstanding accuracies both for word and morpheme
-segmentation on all the languages of various types when compared to the other
-participating systems.
-"
-5705,1709.03759,"Lyan Verwimp, Joris Pelemans, Marieke Lycke, Hugo Van hamme, Patrick
- Wambacq",Language Models of Spoken Dutch,cs.CL," In Flanders, all TV shows are subtitled. However, the process of subtitling
-is a very time-consuming one and can be sped up by providing the output of a
-speech recognizer run on the audio of the TV show, prior to the subtitling.
-Naturally, this speech recognition will perform much better if the employed
-language model is adapted to the register and the topic of the program. We
-present several language models trained on subtitles of television shows
-provided by the Flemish public-service broadcaster VRT. This data was gathered
-in the context of the project STON, whose purpose is to facilitate the process
-of subtitling TV shows. One model is trained on all available data (46M word
-tokens), but we also trained models on a specific type of TV show or
-domain/topic. Language models of spoken language are quite rare due to the lack
-of training data. The size of this corpus is relatively large for a corpus of
-spoken language (compare with e.g. CGN which has 9M words), but still rather
-small for a language model. Thus, in practice it is advised to interpolate
-these models with a large background language model trained on written
-language. The models can be freely downloaded on
-http://www.esat.kuleuven.be/psi/spraak/downloads/.
-"
-5706,1709.03814,"Yongchao Deng, Jungi Kim, Guillaume Klein, Catherine Kobus, Natalia
- Segal, Christophe Servan, Bo Wang, Dakun Zhang, Josep Crego, Jean Senellart",SYSTRAN Purely Neural MT Engines for WMT2017,cs.CL," This paper describes SYSTRAN's systems submitted to the WMT 2017 shared news
-translation task for English-German, in both translation directions. Our
-systems are built using OpenNMT, an open-source neural machine translation
-system, implementing sequence-to-sequence models with LSTM encoder/decoders and
-attention. We experimented using monolingual data automatically
-back-translated. Our resulting models are further hyper-specialised with an
-adaptation technique that finely tunes models according to the evaluation test
-sentences.
-"
-5707,1709.03815,"Guillaume Klein, Yoon Kim, Yuntian Deng, Josep Crego, Jean Senellart,
- Alexander M. Rush",OpenNMT: Open-source Toolkit for Neural Machine Translation,cs.CL," We introduce an open-source toolkit for neural machine translation (NMT) to
-support research into model architectures, feature representations, and source
-modalities, while maintaining competitive performance, modularity and
-reasonable training requirements.
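
The interpolation advice in the spoken-Dutch language modeling abstract above can be made concrete: a linear mixture of next-word distributions from a domain model and a background model. A minimal sketch with invented probabilities:

```python
def interpolate(p_domain, p_background, lam=0.3):
    """Linear interpolation of two next-word distributions (dicts)."""
    vocab = set(p_domain) | set(p_background)
    return {w: lam * p_domain.get(w, 0.0)
               + (1 - lam) * p_background.get(w, 0.0)
            for w in vocab}

p_dom = {"goal": 0.5, "match": 0.4, "the": 0.1}   # sports subtitles
p_bg = {"the": 0.6, "match": 0.1, "house": 0.3}   # large written corpus
p = interpolate(p_dom, p_bg, lam=0.4)
print(max(p, key=p.get), round(sum(p.values()), 6))  # remains a distribution
```
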
-" -5708,1709.03856,"Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes and - Jason Weston",StarSpace: Embed All The Things!,cs.CL," We present StarSpace, a general-purpose neural embedding model that can solve -a wide variety of problems: labeling tasks such as text classification, ranking -tasks such as information retrieval/web search, collaborative filtering-based -or content-based recommendation, embedding of multi-relational graphs, and -learning word, sentence or document level embeddings. In each case the model -works by embedding those entities comprised of discrete features and comparing -them against each other -- learning similarities dependent on the task. -Empirical results on a number of tasks show that StarSpace is highly -competitive with existing methods, whilst also being generally applicable to -new cases where those methods are not. -" -5709,1709.03925,"Natalia Loukachevitch, Anastasia Gerasimova",Human Associations Help to Detect Conventionalized Multiword Expressions,cs.CL," In this paper we show that if we want to obtain human evidence about -conventionalization of some phrases, we should ask native speakers about -associations they have to a given phrase and its component words. We have shown -that if component words of a phrase have each other as frequent associations, -then this phrase can be considered as conventionalized. Another type of -conventionalized phrases can be revealed using two factors: low entropy of -phrase associations and low intersection of component word and phrase -associations. The association experiments were performed for the Russian -language. -" -5710,1709.03933,"Dan Svenstrup, Jonas Meinertz Hansen and Ole Winther",Hash Embeddings for Efficient Word Representations,cs.CL," We present hash embeddings, an efficient method for representing words in a -continuous vector form. A hash embedding may be seen as an interpolation -between a standard word embedding and a word embedding created using a random -hash function (the hashing trick). In hash embeddings each token is represented -by $k$ $d$-dimensional embeddings vectors and one $k$ dimensional weight -vector. The final $d$ dimensional representation of the token is the product of -the two. Rather than fitting the embedding vectors for each token these are -selected by the hashing trick from a shared pool of $B$ embedding vectors. Our -experiments show that hash embeddings can easily deal with huge vocabularies -consisting of millions of tokens. When using a hash embedding there is no need -to create a dictionary before training nor to perform any kind of vocabulary -pruning after training. We show that models trained using hash embeddings -exhibit at least the same level of performance as models trained using regular -embeddings across a wide range of tasks. Furthermore, the number of parameters -needed by such an embedding is only a fraction of what is required by a regular -embedding. Since standard embeddings and embeddings constructed using the -hashing trick are actually just special cases of a hash embedding, hash -embeddings can be considered an extension and improvement over the existing -regular embedding types. -" -5711,1709.03968,"Nabiha Asghar, Pascal Poupart, Jesse Hoey, Xin Jiang, Lili Mou",Affective Neural Response Generation,cs.CL cs.AI cs.CY cs.HC cs.IR," Existing neural conversational models process natural language primarily on a -lexico-syntactic level, thereby ignoring one of the most crucial components of -human-to-human dialogue: its affective content. 
We take a step in this
-direction by proposing three novel ways to incorporate affective/emotional
-aspects into long short term memory (LSTM) encoder-decoder neural conversation
-models: (1) affective word embeddings, which are cognitively engineered, (2)
-affect-based objective functions that augment the standard cross-entropy loss,
-and (3) affectively diverse beam search for decoding. Experiments show that
-these techniques improve the open-domain conversational prowess of
-encoder-decoder networks by enabling them to produce emotionally rich responses
-that are more interesting and natural.
-"
-5712,1709.03980,"Wen Zhang, Jiawei Hu, Yang Feng, Qun Liu","Refining Source Representations with Relation Networks for Neural
- Machine Translation",cs.CL cs.AI cs.LG," Although neural machine translation (NMT) with the encoder-decoder framework
-has achieved great success in recent times, it still suffers from some
-drawbacks: RNNs tend to forget old information which is often useful, and the
-encoder only operates over words without considering word relationships. To
-solve these problems, we introduce a relation network (RN) into NMT to refine
-the encoding representations of the source. In our method, the RN first
-augments the representation of each source word with its neighbors and reasons
-about all the possible pairwise relations between them. Then the source
-representations and all the relations are fed to the attention module and the
-decoder together, keeping the main encoder-decoder architecture unchanged.
-Experiments on two Chinese-to-English data sets of different scales both show
-that our method can outperform the competitive baselines significantly.
-"
-5713,1709.04005,"Rui Zhang, Honglak Lee, Lazaros Polymenakos, Dragomir Radev","Addressee and Response Selection in Multi-Party Conversations with
- Speaker Interaction RNNs",cs.CL," In this paper, we study the problem of addressee and response selection in
-multi-party conversations. Understanding multi-party conversations is
-challenging because of complex speaker interactions: multiple speakers exchange
-messages with each other, playing different roles (sender, addressee,
-observer), and these roles vary across turns. To tackle this challenge, we
-propose the Speaker Interaction Recurrent Neural Network (SI-RNN). Whereas the
-previous state-of-the-art system updated speaker embeddings only for the
-sender, SI-RNN uses a novel dialog encoder to update speaker embeddings in a
-role-sensitive way. Additionally, unlike the previous work that selected the
-addressee and response separately, SI-RNN selects them jointly by viewing the
-task as a sequence prediction problem. Experimental results show that SI-RNN
-significantly improves the accuracy of addressee and response selection,
-particularly in complex conversations with many speakers and responses to
-distant messages many turns in the past.
-"
-5714,1709.04071,"Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander J. Smola, Le Song",Variational Reasoning for Question Answering with Knowledge Graph,cs.LG cs.AI cs.CL," Knowledge graph (KG) is known to be helpful for the task of question
-answering (QA), since it provides well-structured relational information
-between entities, and allows one to further infer indirect facts. However, it
-is challenging to build QA systems which can learn to reason over knowledge
-graphs based on question-answer pairs alone.
First, when people ask questions,
-their expressions are noisy (for example, typos in texts, or variations in
-pronunciations), which makes it non-trivial for the QA system to match the
-mentioned entities to the knowledge graph. Second, many questions require
-multi-hop logic reasoning over the knowledge graph to retrieve the answers. To
-address these challenges, we propose a novel and unified deep learning
-architecture, and an end-to-end variational learning algorithm which can handle
-noise in questions, and learn multi-hop reasoning simultaneously. Our method
-achieves state-of-the-art performance on a recent benchmark dataset in the
-literature. We also derive a series of new benchmark datasets, including
-questions for multi-hop reasoning, questions paraphrased by a neural translation
-model, and questions in human voice. Our method yields very promising results
-on all these challenging datasets.
-"
-5715,1709.04109,"Liyuan Liu, Jingbo Shang, Frank F. Xu, Xiang Ren, Huan Gui, Jian Peng,
- Jiawei Han",Empower Sequence Labeling with Task-Aware Neural Language Model,cs.CL cs.LG," Linguistic sequence labeling is a general modeling approach that encompasses
-a variety of problems, such as part-of-speech tagging and named entity
-recognition. Recent advances in neural networks (NNs) make it possible to build
-reliable models without handcrafted features. However, in many cases, it is
-hard to obtain sufficient annotations to train these models. In this study, we
-develop a novel neural framework to extract abundant knowledge hidden in raw
-texts to empower the sequence labeling task. Besides word-level knowledge
-contained in pre-trained word embeddings, character-aware neural language
-models are incorporated to extract character-level knowledge. Transfer learning
-techniques are further adopted to mediate different components and guide the
-language model towards the key knowledge. Compared to previous methods, this
-task-specific knowledge allows us to adopt a more concise model and conduct
-more efficient training. Different from most transfer learning methods, the
-proposed framework does not rely on any additional supervision. It extracts
-knowledge from self-contained order information of training sequences.
-Extensive experiments on benchmark datasets demonstrate the effectiveness of
-leveraging character-level knowledge and the efficiency of co-training. For
-example, on the CoNLL03 NER task, model training completes in about 6 hours on
-a single GPU, reaching an F1 score of 91.71$\pm$0.10 without using any extra
-annotation.
-"
-5716,1709.04219,"Jeremy Barnes, Roman Klinger and Sabine Schulte im Walde","Assessing State-of-the-Art Sentiment Models on State-of-the-Art
- Sentiment Datasets",cs.CL cs.AI," There has been a good amount of progress in sentiment analysis over the past
-10 years, including the proposal of new methods and the creation of benchmark
-datasets. In some papers, however, there is a tendency to compare models only
-on one or two datasets, either because of time constraints or because the model
-is tailored to a specific task. Accordingly, it is hard to understand how well
-a certain model generalizes across different tasks and datasets. In this paper,
-we address this situation by comparing several models on six different
-benchmarks, which belong to different domains and additionally have different
-levels of granularity (binary, 3-class, 4-class and 5-class).
We show that
-Bi-LSTMs perform well across datasets and that both LSTMs and Bi-LSTMs are
-particularly good at fine-grained sentiment tasks (i.e., with more than two
-classes). Incorporating sentiment information into word embeddings during
-training gives good results for datasets that are lexically similar to the
-training data. With our experiments, we contribute to a better understanding of
-the performance of different model architectures on different data sets.
-Consequently, we report new state-of-the-art results on the SenTube datasets.
-"
-5717,1709.04250,"Harshit Kumar, Arvind Agarwal, Riddhiman Dasgupta, Sachindra Joshi,
- Arun Kumar",Dialogue Act Sequence Labeling using Hierarchical encoder with CRF,cs.CL," Dialogue act recognition associates dialogue acts (i.e., semantic labels) with
-utterances in a conversation. The problem of associating semantic labels with
-utterances can be treated as a sequence labeling problem. In this work, we
-build a hierarchical recurrent neural network using bidirectional LSTM as a
-base unit and the conditional random field (CRF) as the top layer to classify
-each utterance into its corresponding dialogue act. The hierarchical network
-learns representations at multiple levels, i.e., word level, utterance level,
-and conversation level. The conversation level representations are input to the
-CRF layer, which takes into account not only all previous utterances but also
-their dialogue acts, thus modeling the dependency among both labels and
-utterances, an important consideration in natural dialogue. We validate our
-approach on two different benchmark data sets, Switchboard and Meeting Recorder
-Dialogue Act, and show performance improvement over the state-of-the-art
-methods by $2.2\%$ and $4.1\%$ absolute points, respectively. It is worth
-noting that the inter-annotator agreement on the Switchboard data set is $84\%$,
-and our method is able to achieve an accuracy of about $79\%$ despite being
-trained on noisy data.
-"
-5718,1709.04264,"Wenya Zhu, Kaixiang Mo, Yu Zhang, Zhangbin Zhu, Xuezheng Peng, Qiang
- Yang",Flexible End-to-End Dialogue System for Knowledge Grounded Conversation,cs.CL," In knowledge grounded conversation, domain knowledge plays an important role
-in specialized domains such as music. The response of knowledge grounded
-conversation might contain multiple answer entities or no entity at all.
-Although existing generative question answering (QA) systems can be applied to
-knowledge grounded conversation, they either have at most one entity in a
-response or cannot deal with out-of-vocabulary entities. We propose a fully
-data-driven generative dialogue system GenDS that is capable of generating
-responses based on the input message and a related knowledge base (KB). To generate
-an arbitrary number of answer entities even when these entities never appear in
-the training set, we design a dynamic knowledge enquirer which selects
-different answer entities at different positions in a single response,
-according to different local context. It does not rely on the representations
-of entities, enabling our model to deal with out-of-vocabulary entities. We
-collect a human-human conversation dataset (ConversMusic) with knowledge
-annotations. The proposed method is evaluated on ConversMusic and a public
-question answering dataset. Our proposed GenDS system outperforms baseline
-methods significantly in terms of BLEU, entity accuracy, entity recall and
-human evaluation.
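
A skeletal Python version of the hierarchical encoder in entry 5717: a word-level Bi-LSTM yields one vector per utterance and a conversation-level Bi-LSTM contextualizes those vectors across turns. The CRF layer is replaced here by a plain linear layer, and all sizes are toy values.

import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, n_acts = 500, 16, 10
emb = nn.Embedding(vocab, dim)
word_lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
conv_lstm = nn.LSTM(2 * dim, dim, batch_first=True, bidirectional=True)
to_acts = nn.Linear(2 * dim, n_acts)   # a CRF would replace this output layer

utts = torch.randint(0, vocab, (1, 8, 12))   # 1 conversation, 8 utterances, 12 words
B, U, W = utts.shape
# Word level: encode each utterance, keep final states as utterance vectors.
_, (h, _) = word_lstm(emb(utts.view(B * U, W)))
utt_vecs = torch.cat([h[0], h[1]], dim=-1).view(B, U, -1)
# Conversation level: contextualize utterance vectors across turns.
ctx, _ = conv_lstm(utt_vecs)
print(to_acts(ctx).shape)   # (1, 8, 10): per-utterance dialogue-act scores
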
Moreover, the experiments also demonstrate that GenDS works
-better even on small datasets.
-"
-5719,1709.04348,"Yichen Gong, Heng Luo and Jian Zhang",Natural Language Inference over Interaction Space,cs.CL," The Natural Language Inference (NLI) task requires an agent to determine the
-logical relationship between a natural language premise and a natural language
-hypothesis. We introduce the Interactive Inference Network (IIN), a novel class of
-neural network architectures that is able to achieve high-level understanding
-of the sentence pair by hierarchically extracting semantic features from
-interaction space. We show that an interaction tensor (attention weight)
-contains semantic information to solve natural language inference, and a denser
-interaction tensor contains richer semantic information. One instance of such an
-architecture, the Densely Interactive Inference Network (DIIN), demonstrates
-state-of-the-art performance on large-scale NLI corpora and a large-scale
-NLI-like corpus. It is noteworthy that DIIN achieves a greater than 20% error
-reduction on the challenging Multi-Genre NLI (MultiNLI) dataset with respect to
-the strongest published system.
-"
-5720,1709.04359,"Ekaterina Lapshninova-Koltunski, Marcos Zampieri","Linguistic Features of Genre and Method Variation in Translation: A
- Computational Perspective",cs.CL," In this paper we describe the use of text classification methods to
-investigate genre and method variation in an English - German translation
-corpus. For this purpose we use linguistically motivated features representing
-texts using a combination of part-of-speech tags arranged in bigrams, trigrams,
-and 4-grams. The classification method used in this paper is a Bayesian
-classifier with Laplace smoothing. We use the output of the classifiers to
-carry out an extensive feature analysis on the main differences between genres
-and methods of translation.
-"
-5721,1709.04380,"Tianyu Li, Guillaume Rabusseau, Doina Precup",Neural Network Based Nonlinear Weighted Finite Automata,cs.FL cs.AI cs.CL cs.LG," Weighted finite automata (WFA) can expressively model functions defined over
-strings but are inherently linear models. Given the recent successes of
-nonlinear models in machine learning, it is natural to wonder whether
-extending WFA to the nonlinear setting would be beneficial. In this paper, we
-propose a novel neural-network-based nonlinear WFA model (NL-WFA) along
-with a learning algorithm. Our learning algorithm is inspired by the spectral
-learning algorithm for WFA and relies on a nonlinear decomposition of the
-so-called Hankel matrix, by means of an auto-encoder network. The expressive
-power of NL-WFA and the proposed learning algorithm are assessed on both
-synthetic and real-world data, showing that NL-WFA can lead to smaller model
-sizes and infer complex grammatical structures from data.
-"
-5722,1709.04409,"Amanda Cercas Curry, Helen Hastie and Verena Rieser",A Review of Evaluation Techniques for Social Dialogue Systems,cs.CL," In contrast with goal-oriented dialogue, social dialogue has no clear measure
-of task success. Consequently, evaluation of these systems is notoriously hard.
-In this paper, we review current evaluation methods, focusing on automatic
-metrics. We conclude that turn-based metrics often ignore the context and do
-not account for the fact that several replies are valid, while end-of-dialogue
-rewards are mainly hand-crafted. Both lack grounding in human perceptions.
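
The interaction-space idea of entry 5719 can be pictured with a few lines of numpy: premise and hypothesis word vectors are combined into an interaction tensor that a CNN can then treat like an image. The elementwise product used below is one common interaction operator, not necessarily the paper's exact choice, and all sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
P, Hh, d = 7, 5, 16                 # premise length, hypothesis length, dim
prem = rng.normal(size=(P, d))      # premise word representations
hyp = rng.normal(size=(Hh, d))      # hypothesis word representations

# Interaction tensor: one d-dimensional interaction vector per
# (premise word, hypothesis word) pair, here the elementwise product.
interaction = prem[:, None, :] * hyp[None, :, :]   # (P, Hh, d)
print(interaction.shape)  # a CNN feature extractor would consume this "image"
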
-" -5723,1709.04482,"Yonatan Belinkov, James Glass","Analyzing Hidden Representations in End-to-End Automatic Speech - Recognition Systems",cs.CL cs.NE cs.SD," Neural models have become ubiquitous in automatic speech recognition systems. -While neural networks are typically used as acoustic models in more complex -systems, recent studies have explored end-to-end speech recognition systems -based on neural networks, which can be trained to directly predict text from -input acoustic features. Although such systems are conceptually elegant and -simpler than traditional systems, it is less obvious how to interpret the -trained models. In this work, we analyze the speech representations learned by -a deep end-to-end model that is based on convolutional and recurrent layers, -and trained with a connectionist temporal classification (CTC) loss. We use a -pre-trained model to generate frame-level features which are given to a -classifier that is trained on frame classification into phones. We evaluate -representations from different layers of the deep model and compare their -quality for predicting phone labels. Our experiments shed light on important -aspects of the end-to-end model such as layer depth, model complexity, and -other design choices. -" -5724,1709.04491,"{\L}ukasz Augustyniak, Krzysztof Rajda, Tomasz Kajdanowicz",Method for Aspect-Based Sentiment Annotation Using Rhetorical Analysis,cs.CL," This paper fills a gap in aspect-based sentiment analysis and aims to present -a new method for preparing and analysing texts concerning opinion and -generating user-friendly descriptive reports in natural language. We present a -comprehensive set of techniques derived from Rhetorical Structure Theory and -sentiment analysis to extract aspects from textual opinions and then build an -abstractive summary of a set of opinions. Moreover, we propose aspect-aspect -graphs to evaluate the importance of aspects and to filter out unimportant ones -from the summary. Additionally, the paper presents a prototype solution of data -flow with interesting and valuable results. The proposed method's results -proved the high accuracy of aspect detection when applied to the gold standard -dataset. -" -5725,1709.04558,John S. Ball,"Using NLU in Context for Question Answering: Improving on Facebook's - bAbI Tasks",cs.CL cs.AI," For the next step in human to machine interaction, Artificial Intelligence -(AI) should interact predominantly using natural language because, if it -worked, it would be the fastest way to communicate. Facebook's toy tasks (bAbI) -provide a useful benchmark to compare implementations for conversational AI. -While the published experiments so far have been based on exploiting the -distributional hypothesis with machine learning, our model exploits natural -language understanding (NLU) with the decomposition of language based on Role -and Reference Grammar (RRG) and the brain-based Patom theory. Our combinatorial -system for conversational AI based on linguistics has many advantages: passing -bAbI task tests without parsing or statistics while increasing scalability. Our -model validates both the training and test data to find 'garbage' input and -output (GIGO). It is not rules-based, nor does it use parts of speech, but -instead relies on meaning. While Deep Learning is difficult to debug and fix, -every step in our model can be understood and changed like any non-statistical -computer program. 
Deep Learning's lack of explicable reasoning has raised
-opposition to AI, partly due to fear of the unknown. To support the goals of
-AI, we propose extended tasks to use human-level statements with tense, aspect
-and voice, and embedded clauses with junctures, and answers to be natural
-language generation (NLG) instead of keywords. While machine learning permits
-invalid training data to produce incorrect test responses, our system cannot
-because the context tracking would need to be intentionally broken. We believe
-no existing learning systems can currently solve these extended natural
-language tests. There appears to be a knowledge gap between NLP researchers and
-linguists, but ongoing competitive results such as these promise to narrow that
-gap.
-"
-5726,1709.04625,"Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, C. Huck Yang, Bernard
- Ghanem",Robustness Analysis of Visual QA Models by Basic Questions,cs.CV cs.CL," Visual Question Answering (VQA) models should have both high robustness and
-accuracy. Unfortunately, most of the current VQA research only focuses on
-accuracy because there is a lack of proper methods to measure the robustness of
-VQA models. Our algorithm has two main modules. Given a natural
-language question about an image, the first module takes the question as input
-and then outputs the ranked basic questions, with similarity scores, of the
-main given question. The second module takes the main question, image and these
-basic questions as input and then outputs the text-based answer of the main
-question about the given image. We claim that a robust VQA model is one
-whose performance does not change much when related basic questions are also
-made available to it as input. We formulate the basic question generation problem
-as a LASSO optimization, and also propose a large-scale Basic Question Dataset
-(BQD) and Rscore (a novel robustness measure) for analyzing the robustness of
-VQA models. We hope our BQD will be used as a benchmark to evaluate the
-robustness of VQA models, so as to help the community build more robust and
-accurate VQA models.
-"
-5727,1709.04682,"Neama Abdulaziz Dahan, Fadl Mutaher Ba-Alwi, Ibrahim Ahmed Al-Baltah,
- Ghaleb H. Al-gapheri",Towards an Arabic-English Machine-Translation Based on Semantic Web,cs.CL," Communication tools make the world like a small village, and as a consequence
-people can contact others who come from different societies or who speak
-different languages. This communication cannot happen effectively without
-Machine Translation, because such contacts can occur anytime and everywhere. A
-number of studies have developed Machine Translation between English and many
-other languages, but Arabic has not been considered adequately yet. Therefore,
-we aim to highlight a roadmap for our proposed translation machine to provide
-an enhanced Arabic-English translation based on the Semantic Web.
-"
-5728,1709.04685,"Nabeel T. Alsohybe, Neama Abdulaziz Dahan, Fadl Mutaher Ba-Alwi","Machine-Translation History and Evolution: Survey for Arabic-English
- Translations",cs.CL," As a result of the rapid changes in information and communication technology
-(ICT), the world has become a small village where people from all over the
-world connect with each other in dialogue and communication via the Internet.
-Also, communication has become a daily routine activity due to the new
-globalization, where companies and even universities become global, residing
-across country borders.
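
One way to read the LASSO formulation of entry 5726 is sparse reconstruction: express the main question's embedding as a sparse combination of candidate basic-question embeddings and rank the candidates by weight magnitude. The embeddings and the alpha value below are placeholders.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, n_basic = 32, 50
basic = rng.normal(size=(n_basic, d))     # embeddings of candidate basic questions
main = rng.normal(size=d)                 # embedding of the main question

# Sparse reconstruction: main ~ basic.T @ w with an L1 penalty on w.
lasso = Lasso(alpha=0.05).fit(basic.T, main)
ranking = np.argsort(-np.abs(lasso.coef_))   # most relevant basic questions first
print(ranking[:5], lasso.coef_[ranking[:5]])
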
As a result, translation becomes a needed activity in
-this connected world. ICT made it possible to have a student in one country
-take a course or even a degree from a different country easily, anytime and
-anywhere. The resulting communication still needs a language as a means that
-helps the receiver understand the contents of the sent message. People need an
-automated translation application because human translators are hard to find
-at all times, and human translations are very expensive compared to the
-automated translation process. Several lines of research describe the
-electronic process of Machine-Translation. In this paper, the authors are
-going to study some of these previous works, and they will explore some of
-the tools needed for Machine-Translation. This research is going to
-contribute to the Machine-Translation area by giving future researchers a
-summary of the Machine-Translation research groups and by shedding light
-on the importance of the translation mechanism.
-"
-5729,1709.04696,"Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan and
- Chengqi Zhang","DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language
- Understanding",cs.CL cs.AI," Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely
-used in NLP tasks to capture the long-term and local dependencies,
-respectively. Attention mechanisms have recently attracted enormous interest
-due to their highly parallelizable computation, significantly less training
-time, and flexibility in modeling dependencies. We propose a novel attention
-mechanism in which the attention between elements from input sequence(s) is
-directional and multi-dimensional (i.e., feature-wise). A light-weight neural
-net, ""Directional Self-Attention Network (DiSAN)"", is then proposed to learn
-sentence embedding, based solely on the proposed attention without any RNN/CNN
-structure. DiSAN is only composed of a directional self-attention with temporal
-order encoded, followed by a multi-dimensional attention that compresses the
-sequence into a vector representation. Despite its simple form, DiSAN
-outperforms complicated RNN models on both prediction quality and time
-efficiency. It achieves the best test accuracy among all sentence encoding
-methods and improves the most recent best result by 1.02% on the Stanford
-Natural Language Inference (SNLI) dataset, and shows state-of-the-art test
-accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language
-inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK),
-Customer Review, MPQA, TREC question-type classification and Subjectivity
-(SUBJ) datasets.
-"
-5730,1709.04710,Atsushi Yokoyama,Embedded-Graph Theory,cs.DM cs.CL," In this paper, we propose a new type of graph, denoted as ""embedded-graph"",
-and its theory, which employs a distributed representation to describe the
-relations on the graph edges. Embedded-graphs can express linguistic and
-complicated relations, which cannot be expressed by the existing edge-graphs or
-weighted-graphs. We introduce the mathematical definition of embedded-graph,
-translation, edge distance, and graph similarity. We can transform an
-embedded-graph into a weighted-graph and a weighted-graph into an edge-graph by
-the translation method and by threshold calculation, respectively. The edge
-distance of an embedded-graph is a distance based on the components of a target
-vector, and it is calculated through cosine similarity with the target vector.
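
A toy rendering of the cosine-based edge distance just described for embedded-graphs (entry 5730), with the edge's distributed representation and the target vector as small numpy arrays; the vectors themselves are made up for illustration.

import numpy as np

def edge_distance(edge_vec: np.ndarray, target: np.ndarray) -> float:
    # Cosine similarity between the edge's distributed representation and
    # the target vector; 1 - similarity then behaves like a distance.
    cos = edge_vec @ target / (np.linalg.norm(edge_vec) * np.linalg.norm(target))
    return 1.0 - cos

edge = np.array([0.9, 0.1, 0.0])      # relation vector stored on a graph edge
target = np.array([1.0, 0.0, 0.0])    # component of interest
print(edge_distance(edge, target))
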
-The graph similarity is obtained considering the relations with linguistic -complexity. In addition, we provide some examples and data structures for -embedded-graphs in this paper. -" -5731,1709.04820,"Damien Sileo, Camille Pradel, Philippe Muller, Tim Van de Cruys",Synapse at CAp 2017 NER challenge: Fasttext CRF,cs.CL," We present our system for the CAp 2017 NER challenge which is about named -entity recognition on French tweets. Our system leverages unsupervised learning -on a larger dataset of French tweets to learn features feeding a CRF model. It -was ranked first without using any gazetteer or structured external data, with -an F-measure of 58.89\%. To the best of our knowledge, it is the first system -to use fasttext embeddings (which include subword representations) and an -embedding-based sentence representation for NER. -" -5732,1709.04849,"Lesly Miculicich Werlen, Nikolaos Pappas, Dhananjay Ram and Andrei - Popescu-Belis",Self-Attentive Residual Decoder for Neural Machine Translation,cs.CL," Neural sequence-to-sequence networks with attention have achieved remarkable -performance for machine translation. One of the reasons for their effectiveness -is their ability to capture relevant source-side contextual information at each -time-step prediction through an attention mechanism. However, the target-side -context is solely based on the sequence model which, in practice, is prone to a -recency bias and lacks the ability to capture effectively non-sequential -dependencies among words. To address this limitation, we propose a -target-side-attentive residual recurrent network for decoding, where attention -over previous words contributes directly to the prediction of the next word. -The residual learning facilitates the flow of information from the distant past -and is able to emphasize any of the previously translated words, hence it gains -access to a wider context. The proposed model outperforms a neural MT baseline -as well as a memory and self-attention network on three language pairs. The -analysis of the attention learned by the decoder confirms that it emphasizes a -wider context, and that it captures syntactic-like structures. -" -5733,1709.04857,Kun Xing,A New Semantic Theory of Natural Language,cs.CL cs.LO," Formal Semantics and Distributional Semantics are two important semantic -frameworks in Natural Language Processing (NLP). Cognitive Semantics belongs to -the movement of Cognitive Linguistics, which is based on contemporary cognitive -science. Each framework could deal with some meaning phenomena, but none of -them fulfills all requirements proposed by applications. A unified semantic -theory characterizing all important language phenomena has both theoretical and -practical significance; however, although many attempts have been made in -recent years, no existing theory has achieved this goal yet. - This article introduces a new semantic theory that has the potential to -characterize most of the important meaning phenomena of natural language and to -fulfill most of the necessary requirements for philosophical analysis and for -NLP applications. The theory is based on a unified representation of -information, and constructs a kind of mathematical model called cognitive model -to interpret natural language expressions in a compositional manner. It accepts -the empirical assumption of Cognitive Semantics, and overcomes most -shortcomings of Formal Semantics and of Distributional Semantics. 
The theory,
-however, is not a simple combination of existing theories, but an extensive
-generalization of classic logic and Formal Semantics. It inherits nearly all
-advantages of Formal Semantics, and also provides descriptive content for
-objects and events that is as fine-grained as possible, content which
-represents the results of human cognition.
-"
-5734,1709.04969,"Fred Morstatter, Kai Shu, Suhang Wang, and Huan Liu","Cross-Platform Emoji Interpretation: Analysis, a Solution, and
- Applications",cs.CL," Most social media platforms are largely based on text, and users often write
-posts to describe where they are, what they are seeing, and how they are
-feeling. Because written text lacks the emotional cues of spoken and
-face-to-face dialogue, ambiguities are common in written language. This problem
-is exacerbated by the short, informal nature of many social media posts. To
-bypass this issue, a suite of special characters called ""emojis,"" which are
-small pictograms, is embedded within the text. Many emojis are small
-depictions of facial expressions designed to help disambiguate the emotional
-meaning of the text. However, a new ambiguity arises in the way that emojis are
-rendered. Every platform (Windows, Mac, and Android, to name a few) renders
-emojis according to its own style. In fact, it has been shown that some
-emojis can be rendered so differently that they look ""happy"" on some platforms,
-and ""sad"" on others. In this work, we use real-world data to verify the
-existence of this problem. We verify that the usage of the same emoji can be
-significantly different across platforms, with some emojis exhibiting different
-sentiment polarities on different platforms. We propose a solution to identify
-the intended emoji based on the platform-specific nature of the emoji used by
-the author of a social media post. We apply our solution to sentiment analysis,
-a task that can benefit from the emoji calibration technique we use in this
-work. We conduct experiments to evaluate the effectiveness of the mapping in
-this task.
-"
-5735,1709.05014,Gonzalo Estr\'an Buyo,"WOAH: Preliminaries to Zero-shot Ontology Learning for Conversational
- Agents",cs.CL cs.AI," This paper presents the Weighted Ontology Approximation Heuristic
-(WOAH), a novel zero-shot approach to ontology estimation for conversational
-agent development environments. This methodology extracts verbs and nouns
-separately from data by distilling the dependencies obtained and applying
-similarity and sparsity metrics to generate an ontology estimation configurable
-in terms of the level of generalization.
-"
-5736,1709.05027,"Wei Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang,
- Fang Liu, Bin Hu, Yiran Chen, Hai Li",Learning Intrinsic Sparse Structures within Long Short-Term Memory,cs.LG cs.AI cs.CL cs.NE," Model compression is significant for the wide adoption of Recurrent Neural
-Networks (RNNs) in both user devices possessing limited resources and business
-clusters requiring quick responses to large-scale service requests. This work
-aims to learn structurally-sparse Long Short-Term Memory (LSTM) by reducing the
-sizes of basic structures within LSTM units, including input updates, gates,
-hidden states, cell states and outputs. Independently reducing the sizes of
-basic structures can result in inconsistent dimensions among them, and
-consequently, end up with invalid LSTM units. To overcome the problem, we
-propose Intrinsic Sparse Structures (ISS) in LSTMs.
Removing a component of ISS
-will simultaneously decrease the sizes of all basic structures by one and
-thereby always maintain the dimension consistency. By learning ISS within LSTM
-units, the obtained LSTMs remain regular while having much smaller basic
-structures. Based on group Lasso regularization, our method achieves a 10.59x
-speedup without any perplexity loss on Penn TreeBank language modeling. It is
-also successfully evaluated through a compact model with only 2.69M weights
-for machine question answering on the SQuAD dataset. Our approach is
-successfully extended to non-LSTM RNNs, like Recurrent Highway Networks
-(RHNs). Our source code is publicly available at
-https://github.com/wenwei202/iss-rnns
-"
-5737,1709.05036,"Tzu-Chien Liu, Yu-Hsueh Wu, Hung-Yi Lee",Query-based Attention CNN for Text Similarity Map,cs.AI cs.CL," In this paper, we introduce Query-based Attention CNN (QACNN) for Text
-Similarity Map, an end-to-end neural network for question answering. This
-network is composed of a compare mechanism, a two-staged CNN architecture with
-an attention mechanism, and a prediction layer. First, the compare mechanism
-compares the given passage, query, and multiple answer choices to build
-similarity maps. Then, the two-staged CNN architecture extracts features at the
-word level and sentence level. At the same time, the attention mechanism
-helps the CNN focus more on the important part of the passage based on the query
-information. Finally, the prediction layer finds the most probable answer
-choice. We evaluate this model on the MovieQA dataset using plot synopses only,
-and achieve 79.99% accuracy, which is the state of the art on the dataset.
-"
-5738,1709.05038,"Yang Xian, Yingli Tian","Self-Guiding Multimodal LSTM - when we do not have a perfect training
- dataset for image captioning",cs.CV cs.CL cs.LG," In this paper, a self-guiding multimodal LSTM (sg-LSTM) image captioning
-model is proposed to handle an uncontrolled, imbalanced real-world image-sentence
-dataset. We collect the FlickrNYC dataset from Flickr as our testbed with 306,165
-images, and the original text descriptions uploaded by the users are utilized as
-the ground truth for training. Descriptions in the FlickrNYC dataset vary
-dramatically, ranging from short, term-like descriptions to long
-paragraph descriptions, and can describe any visual aspects, or even refer to
-objects that are not depicted. To deal with the imbalanced and noisy situation
-and to fully explore the dataset itself, we propose a novel guiding textual
-feature extracted using a multimodal LSTM (m-LSTM) model. Training of
-m-LSTM is based on the portion of data in which the image content and the
-corresponding descriptions are strongly bonded. Afterwards, during the training
-of sg-LSTM on the remaining training data, this guiding information serves as
-additional input to the network along with the image representations and the
-ground-truth descriptions. By integrating these input components into a
-multimodal block, we aim to form a training scheme with the textual information
-tightly coupled with the image content. The experimental results demonstrate
-that the proposed sg-LSTM model outperforms the traditional state-of-the-art
-multimodal RNN captioning framework in successfully describing the key
-components of the input images.
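
A compact sketch of the group Lasso penalty that entry 5736 builds on: one L2 norm per parameter group, so whole groups are driven to zero together. Grouping by matrix rows below is a simplification of the actual ISS structure, and the task loss is a placeholder.

import torch

torch.manual_seed(0)
W = torch.randn(8, 16, requires_grad=True)   # toy weight matrix

def group_lasso(weight: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    # Sum of L2 norms of the groups; rows play the role of ISS components.
    return lam * weight.norm(dim=1).sum()

loss = (W ** 2).mean() + group_lasso(W)      # placeholder task loss + penalty
loss.backward()                              # gradients push whole rows to zero
print(W.grad.shape)
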
-" -5739,1709.05074,"Ankush Gupta, Arvind Agarwal, Prawaan Singh, Piyush Rai",A Deep Generative Framework for Paraphrase Generation,cs.CL," Paraphrase generation is an important problem in NLP, especially in question -answering, information retrieval, information extraction, conversation systems, -to name a few. In this paper, we address the problem of generating paraphrases -automatically. Our proposed method is based on a combination of deep generative -models (VAE) with sequence-to-sequence models (LSTM) to generate paraphrases, -given an input sentence. Traditional VAEs when combined with recurrent neural -networks can generate free text but they are not suitable for paraphrase -generation for a given sentence. We address this problem by conditioning the -both, encoder and decoder sides of VAE, on the original sentence, so that it -can generate the given sentence's paraphrases. Unlike most existing models, our -model is simple, modular and can generate multiple paraphrases, for a given -sentence. Quantitative evaluation of the proposed method on a benchmark -paraphrase dataset demonstrates its efficacy, and its performance improvement -over the state-of-the-art methods by a significant margin, whereas qualitative -human evaluation indicate that the generated paraphrases are well-formed, -grammatically correct, and are relevant to the input sentence. Furthermore, we -evaluate our method on a newly released question paraphrase dataset, and -establish a new baseline for future research. -" -5740,1709.05094,"Athanasios Giannakopoulos, Claudiu Musat, Andreea Hossmann and Michael - Baeriswyl","Unsupervised Aspect Term Extraction with B-LSTM & CRF using - Automatically Labelled Datasets",cs.CL," Aspect Term Extraction (ATE) identifies opinionated aspect terms in texts and -is one of the tasks in the SemEval Aspect Based Sentiment Analysis (ABSA) -contest. The small amount of available datasets for supervised ATE and the -costly human annotation for aspect term labelling give rise to the need for -unsupervised ATE. In this paper, we introduce an architecture that achieves -top-ranking performance for supervised ATE. Moreover, it can be used -efficiently as feature extractor and classifier for unsupervised ATE. Our -second contribution is a method to automatically construct datasets for ATE. We -train a classifier on our automatically labelled datasets and evaluate it on -the human annotated SemEval ABSA test sets. Compared to a strong rule-based -baseline, we obtain a dramatically higher F-score and attain precision values -above 80%. Our unsupervised method beats the supervised ABSA baseline from -SemEval, while preserving high precision scores. -" -5741,1709.05227,"Matthias Sperber, Graham Neubig, Jan Niehues, Satoshi Nakamura, Alex - Waibel",Transcribing Against Time,cs.CL," We investigate the problem of manually correcting errors from an automatic -speech transcript in a cost-sensitive fashion. This is done by specifying a -fixed time budget, and then automatically choosing location and size of -segments for correction such that the number of corrected errors is maximized. -The core components, as suggested by previous research [1], are a utility model -that estimates the number of errors in a particular segment, and a cost model -that estimates annotation effort for the segment. In this work we propose a -dynamic updating framework that allows for the training of cost models during -the ongoing transcription process. 
This removes the need for transcriber -enrollment prior to the actual transcription, and improves correction -efficiency by allowing highly transcriber-adaptive cost modeling. We first -confirm and analyze the improvements afforded by this method in a simulated -study. We then conduct a realistic user study, observing efficiency -improvements of 15% relative on average, and 42% for the participants who -deviated most strongly from our initial, transcriber-agnostic cost model. -Moreover, we find that our updating framework can capture dynamically changing -factors, such as transcriber fatigue and topic familiarity, which we observe to -have a large influence on the transcriber's working behavior. -" -5742,1709.05278,"Dion Bailey, Tom Pajak, Daoud Clarke, Carlos Rodriguez",Algorithms and Architecture for Real-time Recommendations at News UK,cs.IR cs.AI cs.CL," Recommendation systems are recognised as being hugely important in industry, -and the area is now well understood. At News UK, there is a requirement to be -able to quickly generate recommendations for users on news items as they are -published. However, little has been published about systems that can generate -recommendations in response to changes in recommendable items and user -behaviour in a very short space of time. In this paper we describe a new -algorithm for updating collaborative filtering models incrementally, and -demonstrate its effectiveness on clickstream data from The Times. We also -describe the architecture that allows recommendations to be generated on the -fly, and how we have made each component scalable. The system is currently -being used in production at News UK. -" -5743,1709.05295,"Shereen Oraby, Lena Reed, Ryan Compton, Ellen Riloff, Marilyn Walker, - and Steve Whittaker","And That's A Fact: Distinguishing Factual and Emotional Argumentation in - Online Dialogue",cs.CL," We investigate the characteristics of factual and emotional argumentation -styles observed in online debates. Using an annotated set of ""factual"" and -""feeling"" debate forum posts, we extract patterns that are highly correlated -with factual and emotional arguments, and then apply a bootstrapping -methodology to find new patterns in a larger pool of unannotated forum posts. -This process automatically produces a large set of patterns representing -linguistic expressions that are highly correlated with factual and emotional -language. Finally, we analyze the most discriminating patterns to better -understand the defining characteristics of factual and emotional arguments. -" -5744,1709.05305,"Shereen Oraby, Vrindavan Harrison, Amita Misra, Ellen Riloff, and - Marilyn Walker","Are you serious?: Rhetorical Questions and Sarcasm in Social Media - Dialog",cs.CL," Effective models of social dialog must understand a broad range of rhetorical -and figurative devices. Rhetorical questions (RQs) are a type of figurative -language whose aim is to achieve a pragmatic goal, such as structuring an -argument, being persuasive, emphasizing a point, or being ironic. While there -are computational models for other forms of figurative language, rhetorical -questions have received little attention to date. We expand a small dataset -from previous work, presenting a corpus of 10,270 RQs from debate forums and -Twitter that represent different discourse functions. We show that we can -clearly distinguish between RQs and sincere questions (0.76 F1). 
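
Entry 5742's abstract does not spell out its update rule; as a point of reference, here is the generic incremental SGD update for a matrix-factorization recommender that such real-time systems commonly build on. All names and sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 200, 8
U = rng.normal(scale=0.1, size=(n_users, k))   # user factors
V = rng.normal(scale=0.1, size=(n_items, k))   # item factors

def incremental_update(u, i, r, lr=0.05, reg=0.01):
    # One SGD step on a single new interaction (user u, item i, signal r);
    # the model is refreshed without retraining from scratch.
    err = r - U[u] @ V[i]
    u_old = U[u].copy()
    U[u] += lr * (err * V[i] - reg * U[u])
    V[i] += lr * (err * u_old - reg * V[i])

incremental_update(u=3, i=17, r=1.0)   # fold in a click as it arrives
print(U[3] @ V[17])
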
We then show
-that RQs can be used both sarcastically and non-sarcastically, observing that
-non-sarcastic (other) uses of RQs are frequently argumentative in forums, and
-persuasive in tweets. We present experiments to distinguish between these uses
-of RQs using SVM and LSTM models that represent linguistic features and
-post-level context, achieving results as high as 0.76 F1 for ""sarcastic"" and
-0.77 F1 for ""other"" in forums, and 0.83 F1 for both ""sarcastic"" and ""other"" in
-tweets. We supplement our quantitative experiments with an in-depth
-characterization of the linguistic variation in RQs.
-"
-5745,1709.05308,"Shereen Oraby, Sheideh Homayon, and Marilyn Walker","Harvesting Creative Templates for Generating Stylistically Varied
- Restaurant Reviews",cs.CL," Many of the creative and figurative elements that make language exciting are
-lost in translation in current natural language generation engines. In this
-paper, we explore a method to harvest templates from positive and negative
-reviews in the restaurant domain, with the goal of vastly expanding the types
-of stylistic variation available to the natural language generator. We learn
-hyperbolic adjective patterns that are representative of the strongly-valenced
-expressive language commonly used in either positive or negative reviews. We
-then identify and delexicalize entities, and use heuristics to extract
-generation templates from review sentences. We evaluate the learned templates
-against more traditional review templates, using subjective measures of
-""convincingness"", ""interestingness"", and ""naturalness"". Our results show that
-the learned templates score highly on these measures. Finally, we analyze the
-linguistic categories that characterize the learned positive and negative
-templates. We plan to use the learned templates to improve the conversational
-style of dialogue systems in the restaurant domain.
-"
-5746,1709.05404,"Shereen Oraby, Vrindavan Harrison, Lena Reed, Ernesto Hernandez, Ellen
- Riloff, and Marilyn Walker",Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue,cs.CL," The use of irony and sarcasm in social media allows us to study them at scale
-for the first time. However, their diversity has made it difficult to construct
-a high-quality corpus of sarcasm in dialogue. Here, we describe the process of
-creating a large-scale, highly-diverse corpus of online debate forums
-dialogue, and our novel methods for operationalizing classes of sarcasm in the
-form of rhetorical questions and hyperbole. We show that we can use
-lexico-syntactic cues to reliably retrieve sarcastic utterances with high
-accuracy. To demonstrate the properties and quality of our corpus, we conduct
-supervised learning experiments with simple features, and show that we achieve
-both higher precision and F-measure than previous work on sarcasm in debate
-forums dialogue. We apply a weakly-supervised linguistic pattern learner and
-qualitatively analyze the linguistic differences in each class.
-"
-5747,1709.05411,"Kevin K. Bowden, Shereen Oraby, Jiaqi Wu, Amita Misra, and Marilyn
- Walker","Combining Search with Structured Data to Create a More Engaging User
- Experience in Open Domain Dialogue",cs.CL," The greatest challenges in building sophisticated open-domain conversational
-agents arise directly from the potential for ongoing mixed-initiative
-multi-turn dialogues, which do not follow a particular plan or pursue a
-particular fixed information need.
In order to make coherent conversational -contributions in this context, a conversational agent must be able to track the -types and attributes of the entities under discussion in the conversation and -know how they are related. In some cases, the agent can rely on structured -information sources to help identify the relevant semantic relations and -produce a turn, but in other cases, the only content available comes from -search, and it may be unclear which semantic relations hold between the search -results and the discourse context. A further constraint is that the system must -produce its contribution to the ongoing conversation in real-time. This paper -describes our experience building SlugBot for the 2017 Alexa Prize, and -discusses how we leveraged search and structured data from different sources to -help SlugBot produce dialogic turns and carry on conversations whose length -over the semi-finals user evaluation period averaged 8:17 minutes. -" -5748,1709.05413,"Shereen Oraby, Pritam Gundecha, Jalal Mahmud, Mansurul Bhuiyan, and - Rama Akkiraju","""How May I Help You?"": Modeling Twitter Customer Service Conversations - Using Fine-Grained Dialogue Acts",cs.CL," Given the increasing popularity of customer service dialogue on Twitter, -analysis of conversation data is essential to understand trends in customer and -agent behavior for the purpose of automating customer service interactions. In -this work, we develop a novel taxonomy of fine-grained ""dialogue acts"" -frequently observed in customer service, showcasing acts that are more suited -to the domain than the more generic existing taxonomies. Using a sequential -SVM-HMM model, we model conversation flow, predicting the dialogue act of a -given turn in real-time. We characterize differences between customer and agent -behavior in Twitter customer service conversations, and investigate the effect -of testing our system on different customer service industries. Finally, we use -a data-driven approach to predict important conversation outcomes: customer -satisfaction, customer frustration, and overall problem resolution. We show -that the type and location of certain dialogue acts in a conversation have a -significant effect on the probability of desirable and undesirable outcomes, -and present actionable rules based on our findings. The patterns and rules we -derive can be used as guidelines for outcome-driven automated customer service -platforms. -" -5749,1709.05453,"Tom Young, Erik Cambria, Iti Chaturvedi, Minlie Huang, Hao Zhou, - Subham Biswas",Augmenting End-to-End Dialog Systems with Commonsense Knowledge,cs.AI cs.CL," Building dialog agents that can converse naturally with humans is a -challenging yet intriguing problem of artificial intelligence. In open-domain -human-computer conversation, where the conversational agent is expected to -respond to human responses in an interesting and engaging way, commonsense -knowledge has to be integrated into the model effectively. In this paper, we -investigate the impact of providing commonsense knowledge about the concepts -covered in the dialog. Our model represents the first attempt to integrating a -large commonsense knowledge base into end-to-end conversational models. In the -retrieval-based scenario, we propose the Tri-LSTM model to jointly take into -account message and commonsense for selecting an appropriate response. Our -experiments suggest that the knowledge-augmented models are superior to their -knowledge-free counterparts in automatic evaluation. 
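
A schematic of a Tri-LSTM-style retrieval scorer in the spirit of entry 5749: three LSTM encoders for message, response, and commonsense assertions, combined by a bilinear match score. The sizes and the scoring function are assumptions for illustration, not the paper's exact design.

import torch
import torch.nn as nn

class TriLSTMScorer(nn.Module):
    def __init__(self, vocab=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.msg_enc = nn.LSTM(dim, dim, batch_first=True)
        self.rsp_enc = nn.LSTM(dim, dim, batch_first=True)
        self.cs_enc = nn.LSTM(dim, dim, batch_first=True)
        self.bilinear = nn.Bilinear(2 * dim, dim, 1)

    def forward(self, msg, rsp, commonsense):
        # Encode each sequence with its own LSTM; keep the final hidden state.
        _, (m, _) = self.msg_enc(self.emb(msg))
        _, (r, _) = self.rsp_enc(self.emb(rsp))
        _, (c, _) = self.cs_enc(self.emb(commonsense))
        # Score how well the response matches message + commonsense context.
        ctx = torch.cat([m[-1], c[-1]], dim=-1)
        return self.bilinear(ctx, r[-1])

scorer = TriLSTMScorer()
msg = torch.randint(0, 1000, (2, 7))          # batch of token-id sequences
rsp = torch.randint(0, 1000, (2, 5))
cs = torch.randint(0, 1000, (2, 9))
print(scorer(msg, rsp, cs).shape)             # (2, 1) match scores

At retrieval time, every candidate response would be scored this way and the highest-scoring one selected.
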
-" -5750,1709.05467,"Ying Lin, Joe Hoover, Morteza Dehghani, Marlon Mooijman, Heng Ji",Acquiring Background Knowledge to Improve Moral Value Prediction,cs.CL cs.CY," In this paper, we address the problem of detecting expressions of moral -values in tweets using content analysis. This is a particularly challenging -problem because moral values are often only implicitly signaled in language, -and tweets contain little contextual information due to length constraints. To -address these obstacles, we present a novel approach to automatically acquire -background knowledge from an external knowledge base to enrich input texts and -thus improve moral value prediction. By combining basic text features with -background knowledge, our overall context-aware framework achieves performance -comparable to a single human annotator. To the best of our knowledge, this is -the first attempt to incorporate background knowledge for the prediction of -implicit psychological variables in the area of computational social science. -" -5751,1709.05475,"Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-Yi Lee, and Lin-shan Lee","Order-Preserving Abstractive Summarization for Spoken Content Based on - Connectionist Temporal Classification",cs.CL," Connectionist temporal classification (CTC) is a powerful approach for -sequence-to-sequence learning, and has been popularly used in speech -recognition. The central ideas of CTC include adding a label ""blank"" during -training. With this mechanism, CTC eliminates the need of segment alignment, -and hence has been applied to various sequence-to-sequence learning problems. -In this work, we applied CTC to abstractive summarization for spoken content. -The ""blank"" in this case implies the corresponding input data are less -important or noisy; thus it can be ignored. This approach was shown to -outperform the existing methods in term of ROUGE scores over Chinese Gigaword -and MATBN corpora. This approach also has the nice property that the ordering -of words or characters in the input documents can be better preserved in the -generated summaries. -" -5752,1709.05487,"Sreelekha S, Pushpak Bhattacharyya",Role of Morphology Injection in Statistical Machine Translation,cs.CL," Phrase-based Statistical models are more commonly used as they perform -optimally in terms of both, translation quality and complexity of the system. -Hindi and in general all Indian languages are morphologically richer than -English. Hence, even though Phrase-based systems perform very well for the less -divergent language pairs, for English to Indian language translation, we need -more linguistic information (such as morphology, parse tree, parts of speech -tags, etc.) on the source side. Factored models seem to be useful in this case, -as Factored models consider word as a vector of factors. These factors can -contain any information about the surface word and use it while translating. -Hence, the objective of this work is to handle morphological inflections in -Hindi and Marathi using Factored translation models while translating from -English. SMT approaches face the problem of data sparsity while translating -into a morphologically rich language. It is very unlikely for a parallel corpus -to contain all morphological forms of words. We propose a solution to generate -these unseen morphological forms and inject them into original training -corpora. In this paper, we study factored models and the problem of sparseness -in context of translation to morphologically rich languages. 
We propose a
-simple and effective solution which is based on enriching the input with
-various morphological forms of words. We observe that morphology injection
-improves the quality of translation in terms of both adequacy and fluency. We
-verify this with experiments on two morphologically rich languages: Hindi
-and Marathi, while translating from English.
-"
-5753,1709.05522,"Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng","AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech
- Recognition Baseline",cs.CL," An open-source Mandarin speech corpus called AISHELL-1 is released. It is by
-far the largest corpus suitable for conducting speech recognition
-research and building speech recognition systems for Mandarin. The recording
-procedure, including audio capturing devices and environments, is presented in
-detail. The preparation of the related resources, including transcriptions and
-the lexicon, is described. The corpus is released with a Kaldi recipe. Experimental
-results imply that the quality of the audio recordings and transcriptions is
-promising.
-"
-5754,1709.05563,Philipp Broniecki and Anna Hanchar and Slava J. Mikhaylov,"Data Innovation for International Development: An overview of natural
- language processing for qualitative data analysis",cs.CL," Availability, collection and access to quantitative data, as well as its
-limitations, often make qualitative data the resource upon which development
-programs heavily rely. Both traditional interview data and social media
-analysis can provide rich contextual information and are essential for
-research, appraisal, monitoring and evaluation. These data may be difficult to
-process and analyze both systematically and at scale. This, in turn, limits
-timely, data-driven decision-making, which is essential in fast-evolving
-complex social systems. In this paper, we discuss the potential of
-using natural language processing to systematize analysis of qualitative data,
-and to inform quick decision-making in the development context. We illustrate
-this with interview data generated in a format of micro-narratives for the UNDP
-Fragments of Impact project.
-"
-5755,1709.05576,"Anna Mastora, Manolis Peponakis, Sarantos Kapidakis","SKOS Concepts and Natural Language Concepts: an Analysis of Latent
- Relationships in KOSs",cs.DL cs.AI cs.CL," The vehicle to represent Knowledge Organization Systems (KOSs) in the
-environment of the Semantic Web and linked data is the Simple Knowledge
-Organization System (SKOS). SKOS provides a way to assign a URI to each
-concept, and this URI functions as a surrogate for the concept. This fact makes
-clarifying the URIs' ontological meaning a main concern. The aim of
-this study is to investigate the relation between the ontological substance of
-KOS concepts and concepts revealed through the grammatical and syntactic
-formalisms of natural language. For this purpose, we examined the divisibility
-of concepts in specific KOSs (i.e. a thesaurus, a subject headings system and a
-classification scheme) by applying Natural Language Processing (NLP) techniques
-(i.e. morphosyntactic analysis) to the lexical representations (i.e. RDF
-literals) of SKOS concepts.
The results of the comparative analysis reveal
-that, despite the use of multi-word units, thesauri tend to represent concepts
-in a way that can hardly be further divided conceptually, while Subject
-Headings and Classification Schemes - to a certain extent - comprise terms that
-can be decomposed into more conceptual constituents. Consequently, SKOS
-concepts deriving from thesauri are more likely to represent atomic conceptual
-units and thus be more appropriate tools for inference and reasoning. Since
-identifiers represent the meaning of a concept, complex concepts are neither
-the most appropriate nor the most efficient way of modelling a KOS for the
-Semantic Web.
-"
-5756,1709.05587,"Chao-Lin Liu, Shuhua Zhang, Yuanli Geng, Huei-ling Lai, Hongsu Wang","Character Distributions of Classical Chinese Literary Texts: Zipf's Law,
- Genres, and Epochs",cs.CL cs.DL," We collect 14 representative corpora for major periods in Chinese history in
-this study. These corpora include poetic works produced in several dynasties,
-novels of the Ming and Qing dynasties, and essays and news reports written in
-modern Chinese. The time span of these corpora ranges between 1046 BCE and 2007
-CE. We analyze their character and word distributions from the viewpoint of
-Zipf's law, and look for factors that affect the deviations and similarities
-between their Zipfian curves. Genres and epochs demonstrated their influences
-in our analyses. Specifically, the character distributions for poetic works
-between 618 CE and 1644 CE exhibit striking similarity. In addition, although
-texts of the same dynasty may tend to use the same set of characters, their
-character distributions still deviate from each other.
-"
-5757,1709.05599,"Wei Li, Yunfang Wu",Hierarchical Gated Recurrent Neural Tensor Network for Answer Triggering,cs.CL," In this paper, we focus on the problem of answer triggering addressed by
-Yang et al. (2015), which is a critical component for a real-world question
-answering system. We employ a hierarchical gated recurrent neural tensor
-(HGRNT) model to capture both the context information and the deep
-interactions between the candidate answers and the question. Our result on F
-value achieves 42.6%, which surpasses the baseline by over 10%.
-"
-5758,1709.05631,"Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio and Laurent
- Besacier","Unwritten Languages Demand Attention Too! Word Discovery with
- Encoder-Decoder Models",cs.CL," Word discovery is the task of extracting words from unsegmented text. In this
-paper we examine to what extent neural networks can be applied to this task in
-a realistic unwritten language scenario, where only small corpora and limited
-annotations are available. We investigate two scenarios: one with no
-supervision and another with limited supervision with access to the most
-frequent words. Obtained results show that it is possible to retrieve at least
-27% of the gold standard vocabulary by training an encoder-decoder neural
-machine translation system with only 5,157 sentences. This result is close to
-those obtained with a task-specific Bayesian nonparametric model. Moreover, our
-approach has the advantage of generating translation alignments, which could be
-used to create a bilingual lexicon. As a future perspective, this approach is
-also well suited to work directly from speech.
-"
-5759,1709.05700,Amin Jaber and Fadi A.
Zaraket,"Morphology-based Entity and Relational Entity Extraction Framework for - Arabic",cs.IR cs.CL," Rule-based techniques to extract relational entities from documents allow -users to specify desired entities with natural language questions, finite state -automata, regular expressions and structured query language. They require -linguistic and programming expertise and lack support for Arabic morphological -analysis. We present a morphology-based entity and relational entity extraction -framework for Arabic (MERF). MERF requires basic knowledge of linguistic -features and regular expressions, and provides the ability to interactively -specify Arabic morphological and synonymity features, tag types associated with -regular expressions, and relations and code actions defined over matches of -subexpressions. MERF constructs entities and relational entities from matches -of the specifications. We evaluated MERF with several case studies. The results -show that MERF requires shorter development time and effort compared to -existing application specific techniques and produces reasonably accurate -results within a reasonable overhead in run time. -" -5760,1709.05729,Chao-Lin Liu,"Flexible Computing Services for Comparisons and Analyses of Classical - Chinese Poetry",cs.CL cs.DL," We collect nine corpora of representative Chinese poetry for the time span of -1046 BCE and 1644 CE for studying the history of Chinese words, collocations, -and patterns. By flexibly integrating our own tools, we are able to provide new -perspectives for approaching our goals. We illustrate the ideas with two -examples. The first example show a new way to compare word preferences of -poets, and the second example demonstrates how we can utilize our corpora in -historical studies of the Chinese words. We show the viability of the tools for -academic research, and we wish to make it helpful for enriching existing -Chinese dictionary as well. -" -5761,1709.05743,"Jan R. Benetka, Krisztian Balog, Kjetil N{\o}rv{\aa}g","Towards Building a Knowledge Base of Monetary Transactions from a News - Collection",cs.IR cs.CL," We address the problem of extracting structured representations of economic -events from a large corpus of news articles, using a combination of natural -language processing and machine learning techniques. The developed techniques -allow for semi-automatic population of a financial knowledge base, which, in -turn, may be used to support a range of data mining and exploration tasks. The -key challenge we face in this domain is that the same event is often reported -multiple times, with varying correctness of details. We address this challenge -by first collecting all information pertinent to a given event from the entire -corpus, then considering all possible representations of the event, and -finally, using a supervised learning method, to rank these representations by -the associated confidence scores. A main innovative element of our approach is -that it jointly extracts and stores all attributes of the event as a single -representation (quintuple). Using a purpose-built test set we demonstrate that -our supervised learning approach can achieve 25% improvement in F1-score over -baseline methods that consider the earliest, the latest or the most frequent -reporting of the event. 
-" -5762,1709.05778,"Bradford Heap, Michael Bain, Wayne Wobcke, Alfred Krzywicki and - Susanne Schmeidl","Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model - for Short Text Multi-class Classification Problems",cs.CL cs.LG," The bag-of-words model is a standard representation of text for many linear -classifier learners. In many problem domains, linear classifiers are preferred -over more complex models due to their efficiency, robustness and -interpretability, and the bag-of-words text representation can capture -sufficient information for linear classifiers to make highly accurate -predictions. However in settings where there is a large vocabulary, large -variance in the frequency of terms in the training corpus, many classes and -very short text (e.g., single sentences or document titles) the bag-of-words -representation becomes extremely sparse, and this can reduce the accuracy of -classifiers. A particular issue in such settings is that short texts tend to -contain infrequently occurring or rare terms which lack class-conditional -evidence. In this work we introduce a method for enriching the bag-of-words -model by complementing such rare term information with related terms from both -general and domain-specific Word Vector models. By reducing sparseness in the -bag-of-words models, our enrichment approach achieves improved classification -over several baseline classifiers in a variety of text classification problems. -Our approach is also efficient because it requires no change to the linear -classifier before or during training, since bag-of-words enrichment applies -only to text being classified. -" -5763,1709.05820,"Pavel Levin, Nishikant Dhanuka, Talaat Khalil, Fedor Kovalev, Maxim - Khalilov","Toward a full-scale neural machine translation in production: the - Booking.com use case",cs.CL," While some remarkable progress has been made in neural machine translation -(NMT) research, there have not been many reports on its development and -evaluation in practice. This paper tries to fill this gap by presenting some of -our findings from building an in-house travel domain NMT system in a large -scale E-commerce setting. The three major topics that we cover are optimization -and training (including different optimization strategies and corpus sizes), -handling real-world content and evaluating results. -" -5764,1709.05914,Mareike Hartmann and Anders Soegaard,Limitations of Cross-Lingual Learning from Image Search,cs.CL," Cross-lingual representation learning is an important step in making NLP -scale to all the world's languages. Recent work on bilingual lexicon induction -suggests that it is possible to learn cross-lingual representations of words -based on similarities between images associated with these words. However, that -work focused on the translation of selected nouns only. In our work, we -investigate whether the meaning of other parts-of-speech, in particular -adjectives and verbs, can be learned in the same way. We also experiment with -combining the representations learned from visual data with embeddings learned -from textual data. Our experiments across five language pairs indicate that -previous work does not scale to the problem of learning cross-lingual -representations beyond simple nouns. 
-" -5765,1709.06033,"Dai Quoc Nguyen, Dat Quoc Nguyen, Cuong Xuan Chu, Stefan Thater and - Manfred Pinkal",Sequence to Sequence Learning for Event Prediction,cs.CL," This paper presents an approach to the task of predicting an event -description from a preceding sentence in a text. Our approach explores -sequence-to-sequence learning using a bidirectional multi-layer recurrent -neural network. Our approach substantially outperforms previous work in terms -of the BLEU score on two datasets derived from WikiHow and DeScript -respectively. Since the BLEU score is not easy to interpret as a measure of -event prediction, we complement our study with a second evaluation that -exploits the rich linguistic annotation of gold paraphrase sets of events. -" -5766,1709.06136,"Bing Liu, Ian Lane","Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural - Dialog Models",cs.CL," In this paper, we present a deep reinforcement learning (RL) framework for -iterative dialog policy optimization in end-to-end task-oriented dialog -systems. Popular approaches in learning dialog policy with RL include letting a -dialog agent to learn against a user simulator. Building a reliable user -simulator, however, is not trivial, often as difficult as building a good -dialog agent. We address this challenge by jointly optimizing the dialog agent -and the user simulator with deep RL by simulating dialogs between the two -agents. We first bootstrap a basic dialog agent and a basic user simulator by -learning directly from dialog corpora with supervised training. We then improve -them further by letting the two agents to conduct task-oriented dialogs and -iteratively optimizing their policies with deep RL. Both the dialog agent and -the user simulator are designed with neural network models that can be trained -end-to-end. Our experiment results show that the proposed method leads to -promising improvements on task success rate and total task reward comparing to -supervised training and single-agent RL training baseline models. -" -5767,1709.06162,Alberto Mor\'on Hern\'andez,Paraphrasing verbal metonymy through computational methods,cs.CL," Verbal metonymy has received relatively scarce attention in the field of -computational linguistics despite the fact that a model to accurately -paraphrase metonymy has applications both in academia and the technology -sector. The method described in this paper makes use of data from the British -National Corpus in order to create word vectors, find instances of verbal -metonymy and generate potential paraphrases. Two different ways of creating -word vectors are evaluated in this study: Continuous bag of words and -Skip-grams. Skip-grams are found to outperform the Continuous bag of words -approach. Furthermore, the Skip-gram model is found to operate with -better-than-chance accuracy and there is a strong positive relationship (phi -coefficient = 0.61) between the model's classification and human judgement of -the ranked paraphrases. This study lends credence to the viability of modelling -verbal metonymy through computational methods based on distributional -semantics. -" -5768,1709.06265,"Zi-Yi Dou, Hao Zhou, Shu-Jian Huang, Xin-Yu Dai, Jia-Jun Chen",Dynamic Oracle for Neural Machine Translation in Decoding Phase,cs.CL," The past several years have witnessed the rapid progress of end-to-end Neural -Machine Translation (NMT). 
However, there exists a discrepancy between training -and inference in NMT when decoding, which may lead to serious problems since -the model might be in a part of the state space it has never seen during -training. To address the issue, Scheduled Sampling has been proposed. However, -there are certain limitations in Scheduled Sampling and we propose two dynamic -oracle-based methods to improve it. We manage to mitigate the discrepancy by -changing the training process towards a less guided scheme and meanwhile -aggregating the oracle's demonstrations. Experimental results show that the -proposed approaches improve translation quality over a standard NMT system. -" -5769,1709.06307,"Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras, Mark Johnson",A Fast and Accurate Vietnamese Word Segmenter,cs.CL," We propose a novel approach to Vietnamese word segmentation. Our approach is -based on the Single Classification Ripple Down Rules methodology (Compton and -Jansen, 1990), where rules are stored in an exception structure and new rules -are only added to correct segmentation errors given by existing rules. -Experimental results on the benchmark Vietnamese treebank show that our -approach outperforms previous state-of-the-art approaches JVnSegmenter, -vnTokenizer, DongDu and UETsegmenter in terms of both accuracy and -speed. Our code is open-source and available at: -https://github.com/datquocnguyen/RDRsegmenter. -" -5770,1709.06309,"Soufian Jebbara, Philipp Cimiano","Aspect-Based Relational Sentiment Analysis Using a Stacked Neural - Network Architecture",cs.CL," Sentiment analysis can be regarded as a relation extraction problem in which -the sentiment of some opinion holder towards a certain aspect of a product, -theme or event needs to be extracted. We present a novel neural architecture -for sentiment analysis as a relation extraction problem that addresses this -task by dividing it into three subtasks: i) identification of aspect and -opinion terms, ii) labeling of opinion terms with a sentiment, and iii) -extraction of relations between opinion terms and aspect terms. For each -subtask, we propose a neural network based component and combine all of them -into a complete system for relational sentiment analysis. The component for -aspect and opinion term extraction is a hybrid architecture consisting of a -recurrent neural network stacked on top of a convolutional neural network. This -approach outperforms a standard convolutional deep neural architecture as well -as a recurrent network architecture and performs competitively compared to -other methods on two datasets of annotated customer reviews. To extract -sentiments for individual opinion terms, we propose a recurrent architecture in -combination with word distance features and achieve promising results, -outperforming a majority baseline by 18% accuracy and providing the first -results for the USAGE dataset. Our relation extraction component outperforms -the current state-of-the-art in aspect-opinion relation extraction by 15% -F-Measure. -" -5771,1709.06311,"Soufian Jebbara, Philipp Cimiano","Aspect-Based Sentiment Analysis Using a Two-Step Neural Network - Architecture",cs.CL," The World Wide Web holds a wealth of information in the form of unstructured -texts such as customer reviews for products, events and more. By extracting and -analyzing the expressed opinions in customer reviews in a fine-grained way, -valuable opportunities and insights for customers and businesses can be gained.
-We propose a neural network based system to address the task of Aspect-Based -Sentiment Analysis to compete in Task 2 of the ESWC-2016 Challenge on Semantic -Sentiment Analysis. Our proposed architecture divides the task into two subtasks: -aspect term extraction and aspect-specific sentiment extraction. This approach -is flexible in that it allows each subtask to be addressed independently. As a first -step, a recurrent neural network is used to extract aspects from a text by -framing the problem as a sequence labeling task. In a second step, a recurrent -network processes each extracted aspect with respect to its context and -predicts a sentiment label. The system uses pretrained semantic word embedding -features which we experimentally enhance with semantic knowledge extracted from -WordNet. Further features extracted from SenticNet prove to be beneficial for -the extraction of sentiment labels. As the best performing system in its -category, our proposed system proves to be an effective approach for -Aspect-Based Sentiment Analysis. -" -5772,1709.06317,"Soufian Jebbara, Philipp Cimiano",Improving Opinion-Target Extraction with Character-Level Word Embeddings,cs.CL," Fine-grained sentiment analysis has been receiving increasing attention in recent -years. Extracting opinion target expressions (OTE) in reviews is often an -important step in fine-grained, aspect-based sentiment analysis. Retrieving -this information from user-generated text, however, can be difficult. Customer -reviews, for instance, are prone to contain misspelled words and are difficult -to process due to their domain-specific language. In this work, we investigate -whether character-level models can improve the performance for the -identification of opinion target expressions. We integrate information about -the character structure of a word into a sequence labeling system using -character-level word embeddings and show their positive impact on the system's -performance. Specifically, we obtain an increase of 3.3 points in F1-score with -respect to our baseline model. In further experiments, we reveal encoded -character patterns of the learned embeddings and give a nuanced view of the -performance differences of both models. -" -5773,1709.06365,"He Zhao, Lan Du, Wray Buntine, Gang Liu",MetaLDA: a Topic Model that Efficiently Incorporates Meta information,cs.CL stat.AP," Besides the text content, documents and their associated words usually come -with rich sets of meta information, such as categories of documents and -semantic/syntactic features of words, like those encoded in word embeddings. -Incorporating such meta information directly into the generative process of -topic models can improve modelling accuracy and topic quality, especially in -the case where the word-occurrence information in the training data is -insufficient. In this paper, we present a topic model, called MetaLDA, which is -able to leverage either document or word meta information, or both of them -jointly. With two data augmentation techniques, we can derive an efficient -Gibbs sampling algorithm, which benefits from the fully local conjugacy of the -model. Moreover, the algorithm is favoured by the sparsity of the meta -information. Extensive experiments on several real world datasets demonstrate -that our model achieves comparable or improved performance in terms of both -perplexity and topic quality, particularly in handling sparse texts. In -addition, compared with other models using meta information, our model runs -significantly faster.
-" -5774,1709.06429,"Shaona Ghosh, Per Ola Kristensson",Neural Networks for Text Correction and Completion in Keyboard Decoding,cs.CL cs.LG," Despite the ubiquity of mobile and wearable text messaging applications, the -problem of keyboard text decoding is not tackled sufficiently in the light of -the enormous success of the deep learning Recurrent Neural Network (RNN) and -Convolutional Neural Networks (CNN) for natural language understanding. In -particular, considering that the keyboard decoders should operate on devices -with memory and processor resource constraints, makes it challenging to deploy -industrial scale deep neural network (DNN) models. This paper proposes a -sequence-to-sequence neural attention network system for automatic text -correction and completion. Given an erroneous sequence, our model encodes -character level hidden representations and then decodes the revised sequence -thus enabling auto-correction and completion. We achieve this by a combination -of character level CNN and gated recurrent unit (GRU) encoder along with and a -word level gated recurrent unit (GRU) attention decoder. Unlike traditional -language models that learn from billions of words, our corpus size is only 12 -million words; an order of magnitude smaller. The memory footprint of our -learnt model for inference and prediction is also an order of magnitude smaller -than the conventional language model based text decoders. We report baseline -performance for neural keyboard decoders in such limited domain. Our models -achieve a word level accuracy of $90\%$ and a character error rate CER of -$2.4\%$ over the Twitter typo dataset. We present a novel dataset of noisy to -corrected mappings by inducing the noise distribution from the Twitter data -over the OpenSubtitles 2009 dataset; on which our model predicts with a word -level accuracy of $98\%$ and sequence accuracy of $68.9\%$. In our user study, -our model achieved an average CER of $2.6\%$ with the state-of-the-art -non-neural touch-screen keyboard decoder at CER of $1.6\%$. -" -5775,1709.06436,"Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy",Language Modeling with Highway LSTM,cs.CL," Language models (LMs) based on Long Short Term Memory (LSTM) have shown good -gains in many automatic speech recognition tasks. In this paper, we extend an -LSTM by adding highway networks inside an LSTM and use the resulting Highway -LSTM (HW-LSTM) model for language modeling. The added highway networks increase -the depth in the time dimension. Since a typical LSTM has two internal states, -a memory cell and a hidden state, we compare various types of HW-LSTM by adding -highway networks onto the memory cell and/or the hidden state. Experimental -results on English broadcast news and conversational telephone speech -recognition show that the proposed HW-LSTM LM improves speech recognition -accuracy on top of a strong LSTM LM baseline. We report 5.1% and 9.9% on the -Switchboard and CallHome subsets of the Hub5 2000 evaluation, which reaches the -best performance numbers reported on these tasks to date. -" -5776,1709.06438,"Shachar Mirkin, Michal Jacovi, Tamar Lavee, Hong-Kwang Kuo, Samuel - Thomas, Leslie Sager, Lili Kotlerman, Elad Venezian, Noam Slonim",A Recorded Debating Dataset,cs.CL," This paper describes an English audio and textual dataset of debating -speeches, a unique resource for the growing research field of computational -argumentation and debating technologies. 
We detail the process of speech -recording by professional debaters, the transcription of the speeches with an -Automatic Speech Recognition (ASR) system, their subsequent automatic -processing to produce a text that is more ""NLP-friendly"", and in parallel -- -the manual transcription of the speeches in order to produce gold-standard -""reference"" transcripts. We release 60 speeches on various controversial -topics, each in five formats corresponding to the different stages in the -production of the data. The intention is to allow this resource to be utilized for -multiple research purposes, be it the addition of in-domain training data for a -debate-specific ASR system, or applying argumentation mining on either noisy or -clean debate transcripts. We intend to make further releases of this data in -the future. -" -5777,1709.06671,"Danushka Bollegala, Kohei Hayashi and Ken-ichi Kawarabayashi","Think Globally, Embed Locally --- Locally Linear Meta-embedding of Words",cs.CL cs.LG cs.NE," Distributed word embeddings have shown superior performance in numerous -Natural Language Processing (NLP) tasks. However, their performance varies -significantly across different tasks, implying that the word embeddings learnt -by those methods capture complementary aspects of lexical semantics. Therefore, -we believe that it is important to combine the existing word embeddings to -produce more accurate and complete \emph{meta-embeddings} of words. For this -purpose, we propose an unsupervised locally linear meta-embedding learning -method that takes pre-trained word embeddings as the input, and produces more -accurate meta embeddings. Unlike previously proposed meta-embedding learning -methods that learn a global projection over all words in a vocabulary, our -proposed method is sensitive to the differences in local neighbourhoods of the -individual source word embeddings. Moreover, we show that vector concatenation, -a previously proposed highly competitive baseline approach for integrating word -embeddings, can be derived as a special case of the proposed method. -Experimental results on semantic similarity, word analogy, relation -classification, and short-text classification tasks show that our -meta-embeddings significantly outperform prior methods on several benchmark -datasets, establishing a new state of the art for meta-embeddings. -" -5778,1709.06673,Huda Hakami and Danushka Bollegala and Hayashi Kohei,"Why PairDiff works? -- A Mathematical Analysis of Bilinear Relational - Compositional Operators for Analogy Detection",cs.CL cs.AI cs.LG cs.NE," Representing the semantic relations that exist between two given words (or -entities) is an important first step in a wide range of NLP applications such -as analogical reasoning, knowledge base completion and relational information -retrieval. A simple, yet surprisingly accurate method for representing a -relation between two words is to compute the vector offset (\PairDiff) between -their corresponding word embeddings. Despite the empirical success, it remains -unclear as to whether \PairDiff is the best operator for obtaining a relational -representation from word embeddings. We conduct a theoretical analysis of -generalised bilinear operators that can be used to measure the $\ell_{2}$ -relational distance between two word-pairs. We show that, if the word -embeddings are standardised and uncorrelated, such an operator will be -independent of bilinear terms, and can be simplified to a linear form, where -\PairDiff is a special case.
For numerous word embedding types, we empirically -verify the uncorrelatedness assumption, demonstrating the general applicability of -our theoretical result. Moreover, we experimentally recover \PairDiff from the -bilinear relational composition operator on several benchmark analogy datasets. -" -5779,1709.06818,"Yan Ji, Licheng Liu, Hongcui Wang, Zhilei Liu, Zhibin Niu, Bruce Denby",Updating the silent speech challenge benchmark with deep learning,cs.CL cs.CV cs.HC," The 2010 Silent Speech Challenge benchmark is updated with new results -obtained with a Deep Learning strategy, using the same input features and -decoding strategy as in the original article. A Word Error Rate of 6.4% is -obtained, compared to the published value of 17.4%. Additional results -comparing new auto-encoder-based features with the original features at reduced -dimensionality, as well as decoding scenarios on two different language models, -are also presented. The Silent Speech Challenge archive has been updated to -contain both the original and the new auto-encoder features, in addition to the -original raw data. -" -5780,1709.06901,"Zhipeng Jiang, Chao Zhao, Bin He, Yi Guan, Jingchi Jiang","De-identification of medical records using conditional random fields and - long short-term memory networks",cs.CL," The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing -focuses on the de-identification of psychiatric evaluation records. This paper -describes two participating systems of our team, based on conditional random -fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing -module was introduced for sentence detection and tokenization before -de-identification. For CRFs, manually extracted rich features were utilized to -train the model. For LSTMs, a character-level bi-directional LSTM network was -applied to represent tokens and classify tags for each token, following which a -decoding layer was stacked to decode the most probable protected health -information (PHI) terms. The LSTM-based system attained an i2b2 strict -micro-F_1 measure of 89.86%, which was higher than that of the CRF-based -system. -" -5781,1709.06907,"Simon Razniewski, Vevake Balaraman, Werner Nutt","Doctoral Advisor or Medical Condition: Towards Entity-specific Rankings - of Knowledge Base Properties [Extended Version]",cs.IR cs.AI cs.CL cs.DB," In knowledge bases such as Wikidata, it is possible to assert a large set of -properties for entities, ranging from generic ones such as name and place of -birth to highly profession-specific or background-specific ones such as -doctoral advisor or medical condition. Determining a preference or ranking in -this large set is a challenge in tasks such as prioritisation of edits or -natural-language generation. Most previous approaches to ranking knowledge base -properties are purely data-driven and, as we show, mistake frequency for -interestingness. - In this work, we have developed a human-annotated dataset of 350 preference -judgments among pairs of knowledge base properties for fixed entities. From -this set, we isolate a subset of pairs for which humans show a high level of -agreement (87.5% on average). We show, however, that baseline and -state-of-the-art techniques achieve only 61.3% precision in predicting human -preferences for this subset.
- We then analyze what contributes to one property being rated as more -important than another, and identify that at least three factors play a -role, namely (i) general frequency, (ii) applicability to similar entities, and -(iii) semantic similarity between property and entity. We experimentally -analyze the contribution of each factor and show that a combination of -techniques addressing all three factors achieves 74% precision on the task. - The dataset is available at -www.kaggle.com/srazniewski/wikidatapropertyranking. -" -5782,1709.06918,"Chao Zhao, Min Zhao, Yi Guan","Constructing a Hierarchical User Interest Structure based on User - Profiles",cs.CL cs.IR," The interests of individual internet users fall into a hierarchical structure -which is useful for building personalized searches and -recommendations. Most studies on this subject construct the interest hierarchy -of a single person from the document perspective. In this study, we constructed -the user interest hierarchy via user profiles. We organized 433,397 user -interests, referred to here as ""attentions"", into a user attention network -(UAN) from 200 million user profiles; we then applied the Louvain algorithm to -detect hierarchical clusters in these attentions. Finally, a 26-level hierarchy -with 34,676 clusters was obtained. We found that these attention clusters were -aggregated according to certain topics, as opposed to hyponymy-relation -based conceptual ontologies. The topics can be entities or concepts, and the -relations were not constrained by hyponymy. The concept relativity encapsulated -in the user's interest can be captured by labeling the attention clusters with -corresponding concepts. -" -5783,1709.06990,Emmanuel Dufourq and Bruce A. Bassett,Text Compression for Sentiment Analysis via Evolutionary Algorithms,cs.NE cs.AI cs.CL stat.ML," Can textual data be compressed intelligently without losing accuracy in -evaluating sentiment? In this study, we propose a novel evolutionary -compression algorithm, PARSEC (PARts-of-Speech for sEntiment Compression), -which makes use of Parts-of-Speech tags to compress text in a way that -sacrifices minimal classification accuracy when used in conjunction with -sentiment analysis algorithms. An analysis of PARSEC with eight commercial and -non-commercial sentiment analysis algorithms on twelve English sentiment data -sets reveals that accurate compression is possible with (0%, 1.3%, 3.3%) loss -in sentiment classification accuracy for (20%, 50%, 75%) data compression with -PARSEC using LingPipe, the most accurate of the sentiment algorithms. Other -sentiment analysis algorithms are more severely affected by compression. We -conclude that significant compression of text data is possible for sentiment -analysis depending on the accuracy demands of the specific application and the -specific sentiment analysis algorithm used. -" -5784,1709.07104,"Thai-Hoang Pham, Xuan-Khoai Pham, Phuong Le-Hong","On the Use of Machine Translation-Based Approaches for Vietnamese - Diacritic Restoration",cs.CL," This paper presents an empirical study of two machine translation-based -approaches for the Vietnamese diacritic restoration problem, including phrase-based -and neural-based machine translation models. This is the first work that -applies a neural-based machine translation method to this problem and gives a -thorough comparison to the phrase-based machine translation method, which is the -current state-of-the-art method for this problem.
On a large dataset, the -phrase-based approach has an accuracy of 97.32% while that of the neural-based -approach is 96.15%. While the neural-based method has a slightly lower -accuracy, it is about twice as fast as the phrase-based method in terms of -inference speed. Moreover, the neural-based machine translation method has much -room for future improvement such as incorporating pre-trained word embeddings -and collecting more training data. -" -5785,1709.07109,"Dinghan Shen, Yizhe Zhang, Ricardo Henao, Qinliang Su, Lawrence Carin",Deconvolutional Latent-Variable Model for Text Sequence Matching,cs.CL cs.LG stat.ML," A latent-variable model is introduced for text matching, inferring sentence -representations by jointly optimizing generative and discriminative objectives. -To alleviate typical optimization challenges in latent-variable models for -text, we employ deconvolutional networks as the sequence decoder (generator), -providing learned latent codes with more semantic information and better -generalization. Our model, trained in an unsupervised manner, yields stronger -empirical predictive performance than a decoder based on Long Short-Term Memory -(LSTM), with fewer parameters and considerably faster training. Further, we -apply it to text sequence-matching problems. The proposed model significantly -outperforms several strong sentence-encoding baselines, especially in the -semi-supervised setting. -" -5786,1709.07276,"Ahmed Ali, Stephan Vogel, Steve Renals",Speech Recognition Challenge in the Wild: Arabic MGB-3,cs.CL," This paper describes the Arabic MGB-3 Challenge - Arabic Speech Recognition -in the Wild. Unlike last year's Arabic MGB-2 Challenge, for which the -recognition task was based on more than 1,200 hours of broadcast TV news -recordings from Aljazeera Arabic TV programs, MGB-3 emphasises dialectal Arabic -using a multi-genre collection of Egyptian YouTube videos. Seven genres were -used for the data collection: comedy, cooking, family/kids, fashion, drama, -sports, and science (TEDx). A total of 16 hours of videos, split evenly across -the different genres, were divided into adaptation, development and evaluation -data sets. The Arabic MGB-Challenge comprised two tasks: A) Speech -transcription, evaluated on the MGB-3 test set, along with the 10-hour MGB-2 -test set to report progress on the MGB-2 evaluation; B) Arabic dialect -identification, introduced this year in order to distinguish between four major -Arabic dialects - Egyptian, Levantine, North African, Gulf, as well as Modern -Standard Arabic. Two hours of audio per dialect were released for development -and a further two hours were used for evaluation. For dialect identification, -both lexical features and i-vector bottleneck features were shared with -participants in addition to the raw audio recordings. Overall, thirteen teams -submitted ten systems to the challenge. We outline the approaches adopted in -each system, and summarise the evaluation results. -" -5787,1709.07357,"Zhiguo Yu, Byron C. Wallace, Todd Johnson, Trevor Cohen","Retrofitting Concept Vector Representations of Medical Concepts to - Improve Estimates of Semantic Similarity and Relatedness",cs.CL," Estimation of semantic similarity and relatedness between biomedical concepts -has utility for many informatics applications. Automated methods fall into two -categories: methods based on distributional statistics drawn from text corpora, -and methods using the structure of existing knowledge resources.
Methods in the -former category disregard taxonomic structure, while those in the latter fail -to consider semantically relevant empirical information. In this paper, we -present a method that retrofits distributional context vector representations -of biomedical concepts using structural information from the UMLS -Metathesaurus, such that the similarity between vector representations of -linked concepts is augmented. We evaluated it on the UMNSRS benchmark. Our -results demonstrate that retrofitting of concept vector representations leads -to better correlation with human raters for both similarity and relatedness, -surpassing the best results reported to date. They also demonstrate a clear -improvement in performance on this reference standard for retrofitted vector -representations, as compared to those without retrofitting. -" -5788,1709.07403,"Sapna Negi, Paul Buitelaar","Inducing Distant Supervision in Suggestion Mining through Part-of-Speech - Embeddings",cs.CL," Mining suggestion-expressing sentences from a given text is a less -investigated sentence classification task, and therefore lacks hand-labeled -benchmark datasets. In this work, we propose and evaluate two approaches for -distant supervision in suggestion mining. The distant supervision is obtained -through a large silver standard dataset, constructed using the text from -wikiHow and Wikipedia. Both the approaches use a LSTM based neural network -architecture to learn a classification model for suggestion mining, but vary in -their method to use the silver standard dataset. The first approach directly -trains the classifier using this dataset, while the second approach only learns -word embeddings from this dataset. In the second approach, we also learn POS -embeddings, which interestingly gives the best classification accuracy. -" -5789,1709.07432,"Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals",Dynamic Evaluation of Neural Sequence Models,cs.NE cs.CL," We present a methodology for using dynamic evaluation to improve neural -sequence models. Models are adapted to recent history via a gradient descent -based mechanism, causing them to assign higher probabilities to re-occurring -sequential patterns. Dynamic evaluation outperforms existing adaptation -approaches in our comparisons. Dynamic evaluation improves the state-of-the-art -word-level perplexities on the Penn Treebank and WikiText-2 datasets to 51.1 -and 44.3 respectively, and the state-of-the-art character-level cross-entropies -on the text8 and Hutter Prize datasets to 1.19 bits/char and 1.08 bits/char -respectively. -" -5790,1709.07434,"Guoning Hu, Preeti Bhargava, Saul Fuhrmann, Sarah Ellinger, Nemanja - Spasojevic","Analyzing users' sentiment towards popular consumer industries and - brands on Twitter",cs.CL cs.IR cs.SI," Social media serves as a unified platform for users to express their thoughts -on subjects ranging from their daily lives to their opinion on consumer brands -and products. These users wield an enormous influence in shaping the opinions -of other consumers and influence brand perception, brand loyalty and brand -advocacy. In this paper, we analyze the opinion of 19M Twitter users towards 62 -popular industries, encompassing 12,898 enterprise and consumer brands, as well -as associated subject matter topics, via sentiment analysis of 330M tweets over -a period spanning a month. We find that users tend to be most positive towards -manufacturing and most negative towards service industries.
In addition, they -tend to be more positive or negative when interacting with brands than -generally on Twitter. We also find that sentiment towards brands within an -industry varies greatly and we demonstrate this using two industries as use -cases. In addition, we discover that there is no strong correlation between -topic sentiments of different industries, demonstrating that topic sentiments -are highly dependent on the context of the industry that they are mentioned in. -We demonstrate the value of such an analysis in order to assess the impact of -brands on social media. We hope that this initial study will prove valuable for -both researchers and companies in understanding users' perception of -industries, brands and associated topics and encourage more research in this -field. -" -5791,1709.07470,"Arpita Roy, Youngja Park, SHimei Pan",Learning Domain-Specific Word Embeddings from Sparse Cybersecurity Texts,cs.CL," Word embedding is a Natural Language Processing (NLP) technique that -automatically maps words from a vocabulary to vectors of real numbers in an -embedding space. It has been widely used in recent years to boost the -performance of a variety of NLP tasks such as Named Entity Recognition, -Syntactic Parsing and Sentiment Analysis. Classic word embedding methods such -as Word2Vec and GloVe work well when they are given a large text corpus. When -the input texts are sparse as in many specialized domains (e.g., -cybersecurity), these methods often fail to produce high-quality vectors. In -this paper, we describe a novel method to train domain-specific word embeddings -from sparse texts. In addition to domain texts, our method also leverages -diverse types of domain knowledge such as domain vocabulary and semantic -relations. Specifically, we first propose a general framework to encode -diverse types of domain knowledge as text annotations. Then we develop a novel -Word Annotation Embedding (WAE) algorithm to incorporate diverse types of text -annotations in word embedding. We have evaluated our method on two -cybersecurity text corpora: a malware description corpus and a Common -Vulnerability and Exposure (CVE) corpus. Our evaluation results have -demonstrated the effectiveness of our method in learning domain-specific word -embeddings. -" -5792,1709.07484,"Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals","WERd: Using Social Text Spelling Variants for Evaluating Dialectal - Speech Recognition",cs.CL," We study the problem of evaluating automatic speech recognition (ASR) systems -that target dialectal speech input. A major challenge in this case is that the -orthography of dialects is typically not standardized. From an ASR evaluation -perspective, this means that there is no clear gold standard for the expected -output, and several possible outputs could be considered correct according to -different human annotators, which makes standard word error rate (WER) -inadequate as an evaluation metric. Such a situation is typical for machine -translation (MT), and thus we borrow ideas from an MT evaluation metric, namely -TERp, an extension of translation error rate which is closely related to WER. -In particular, in the process of comparing a hypothesis to a reference, we make -use of spelling variants for words and phrases, which we mine from Twitter in -an unsupervised fashion.
Our experiments with evaluating ASR output for -Egyptian Arabic, and further manual analysis, show that the resulting WERd -(i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER -for evaluating dialectal ASR. -" -5793,1709.07642,"Wenhao Zheng, Hong-Yu Zhou, Ming Li and Jianxin Wu","Code Attention: Translating Code to Comments by Exploiting Domain - Features",cs.AI cs.CL," Appropriate comments on code snippets provide insight into code functionality -and are helpful for program comprehension. However, due to the great cost of -authoring comments, many code projects do not contain adequate -comments. Automatic comment generation techniques have been proposed to -generate comments from pieces of code in order to alleviate the human efforts -in annotating the code. Most existing approaches attempt to exploit certain -correlations (usually manually given) between code and generated comments, -which could be easily violated if the coding patterns change and hence the -performance of comment generation declines. In this paper, we first build -C2CGit, a large dataset from open projects in GitHub, which is more than -20$\times$ larger than existing datasets. Then we propose a new attention -module called Code Attention to translate code to comments, which is able to -utilize the domain features of code snippets, such as symbols and identifiers. -We conduct ablation studies to determine the effects of different parts of Code -Attention. Experimental results demonstrate that the proposed module has better -performance than existing approaches in both BLEU and METEOR. -" -5794,1709.07758,Farhana Ferdousi Liza and Marek Grzes,Improving Language Modelling with Noise-contrastive estimation,cs.CL," Neural language models do not scale well when the vocabulary is large. -Noise-contrastive estimation (NCE) is a sampling-based method that allows for -fast learning with large vocabularies. Although NCE has shown promising -performance in neural machine translation, it was considered to be an -unsuccessful approach for language modelling. A sufficient investigation of the -hyperparameters in NCE-based neural language models was also missing. In -this paper, we showed that NCE can be a successful approach in neural language -modelling when the hyperparameters of a neural network are tuned appropriately. -We introduced the 'search-then-converge' learning rate schedule for NCE and -designed a heuristic that specifies how to use this schedule. The impact of the -other important hyperparameters, such as the dropout rate and the weight -initialisation range, was also demonstrated. We showed that appropriately tuned -NCE-based neural language models outperform the state-of-the-art -single-model methods on a popular benchmark. -" -5795,1709.07777,Ji Wen,Sentence Correction Based on Large-scale Language Modelling,cs.CL," With the further development of informatization, more and more data is stored -in the form of text. Some text is lost during generation and -transmission. The paper aims to establish a language model based on a -large-scale corpus to restore missing text. In this paper, -we introduce a novel measure to find the missing words, and a way of -establishing a comprehensive candidate lexicon to insert the correct choice of -words. The paper also introduces some effective optimization methods, which -greatly improve the efficiency of text restoration and shorten the time for -dealing with 1000 sentences to 3.6 seconds.
\keywords{language model, -sentence correction, word imputation, parallel optimization} -" -5796,1709.07809,Philipp Koehn,Neural Machine Translation,cs.CL," Draft of a textbook chapter on neural machine translation: a comprehensive -treatment of the topic, ranging from an introduction to neural networks, -computation graphs, and a description of the currently dominant attentional -sequence-to-sequence model, to recent refinements, alternative architectures and -challenges. Written as a chapter for the textbook Statistical Machine -Translation. Used in the JHU Fall 2017 class on machine translation. -" -5797,1709.07814,"Andros Tjandra, Sakriani Sakti, Satoshi Nakamura",Attention-based Wav2Text with Feature Transfer Learning,cs.CL cs.LG cs.SD," Conventional automatic speech recognition (ASR) typically performs -multi-level pattern recognition tasks that map the acoustic speech waveform -into a hierarchy of speech units. However, it is widely known that information loss -in earlier stages can propagate through later stages. After the -resurgence of deep learning, interest has emerged in the possibility of -developing a purely end-to-end ASR system from the raw waveform to the -transcription without any predefined alignments and hand-engineered models. -However, successful attempts at end-to-end architectures still used -spectral-based features, while successful attempts at using the raw waveform -were still based on the hybrid deep neural network - Hidden Markov model -(DNN-HMM) framework. In this paper, we construct the first end-to-end -attention-based encoder-decoder model that maps directly from the raw speech -waveform to the text transcription. We call the model ""Attention-based -Wav2Text"". To assist the training process of the end-to-end model, we propose -to utilize feature transfer learning. Experimental results also reveal that -the proposed Attention-based Wav2Text model, working directly on the raw waveform, could -achieve a better result in comparison with the attentional encoder-decoder -model trained on standard front-end filterbank features. -" -5798,1709.07840,"Igor Shalyminov, Arash Eshghi, Oliver Lemon","Challenging Neural Dialogue Models with Natural Data: Memory Networks - Fail on Incremental Phenomena",cs.CL," Natural, spontaneous dialogue proceeds incrementally on a word-by-word basis; -and it contains many sorts of disfluency such as mid-utterance/sentence -hesitations, interruptions, and self-corrections. But training data for machine -learning approaches to dialogue processing is often either cleaned-up or wholly -synthetic in order to avoid such phenomena. The question then arises of how -well systems trained on such clean data generalise to real spontaneous -dialogue, or indeed whether they are trainable at all on naturally occurring -dialogue data. To answer this question, we created a new corpus called bAbI+ by -systematically adding natural spontaneous incremental dialogue phenomena such -as restarts and self-corrections to Facebook AI Research's bAbI dialogues -dataset. We then explore the performance of a state-of-the-art retrieval model, -MemN2N, on this more natural dataset. Results show that the semantic accuracy -of the MemN2N model drops drastically; and that although it is in principle -able to learn to process the constructions in bAbI+, it needs an impractical -amount of training data to do so.
Finally, we go on to show that an -incremental semantic parser -- DyLan -- shows 100% semantic accuracy on both -bAbI and bAbI+, highlighting the generalisation properties of linguistically -informed dialogue models. -" -5799,1709.07858,"Arash Eshghi, Igor Shalyminov, Oliver Lemon","Bootstrapping incremental dialogue systems from minimal data: the - generalisation power of dialogue grammars",cs.CL," We investigate an end-to-end method for automatically inducing task-based -dialogue systems from small amounts of unannotated dialogue data. It combines -an incremental semantic grammar - Dynamic Syntax and Type Theory with Records -(DS-TTR) - with Reinforcement Learning (RL), where language generation and -dialogue management are a joint decision problem. The systems thus produced are -incremental: dialogues are processed word-by-word, shown previously to be -essential in supporting natural, spontaneous dialogue. We hypothesised that the -rich linguistic knowledge within the grammar should enable a combinatorially -large number of dialogue variations to be processed, even when trained on very -few dialogues. Our experiments show that our model can process 74% of the -Facebook AI bAbI dataset even when trained on only 0.13% of the data (5 -dialogues). It can in addition process 65% of bAbI+, a corpus we created by -systematically adding incremental dialogue phenomena such as restarts and -self-corrections to bAbI. We compare our model with a state-of-the-art -retrieval model, MemN2N. We find that, in terms of semantic accuracy, MemN2N -shows very poor robustness to the bAbI+ transformations even when trained on -the full bAbI dataset. -" -5800,1709.07862,"Pin-Jung Chen, I-Hung Hsu, Yi-Yao Huang, Hung-Yi Lee","Mitigating the Impact of Speech Recognition Errors on Chatbot using - Sequence-to-Sequence Model",cs.CL," We apply a sequence-to-sequence model to mitigate the impact of speech -recognition errors on open-domain end-to-end dialog generation. We cast the -task as a domain adaptation problem where ASR transcriptions and original text -are in two different domains. In this paper, our proposed model includes two -individual encoders, one for each domain's data, and makes their hidden states similar -to ensure the decoder predicts the same dialog text. The method shows that the -sequence-to-sequence model can learn that paired ASR transcriptions and original texts -have the same meaning and can eliminate the speech recognition errors. -Experimental results on the Cornell movie dialog dataset demonstrate that the -domain adaptation system helps the spoken dialog system generate responses more similar -to the original text answers. -" -5801,1709.07871,"Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron - Courville",FiLM: Visual Reasoning with a General Conditioning Layer,cs.CV cs.AI cs.CL stat.ML," We introduce a general-purpose conditioning method for neural networks called -FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network -computation via a simple, feature-wise affine transformation based on -conditioning information. We show that FiLM layers are highly effective for -visual reasoning - answering image-related questions which require a -multi-step, high-level process - a task which has proven difficult for standard -deep learning methods that do not explicitly model reasoning.
Specifically, we -show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error -for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are -robust to ablations and architectural modifications, and 4) generalize well to -challenging, new data from few examples or even zero-shot. -" -5802,1709.07902,"Wei-Ning Hsu, Yu Zhang, and James Glass","Unsupervised Learning of Disentangled and Interpretable Representations - from Sequential Data",cs.LG cs.CL cs.SD eess.AS stat.ML," We present a factorized hierarchical variational autoencoder, which learns -disentangled and interpretable representations from sequential data without -supervision. Specifically, we exploit the multi-scale nature of information in -sequential data by formulating it explicitly within a factorized hierarchical -graphical model that imposes sequence-dependent priors and sequence-independent -priors on different sets of latent variables. The model is evaluated on two -speech corpora to demonstrate, qualitatively, its ability to transform speakers -or linguistic content by manipulating different sets of latent variables; and -quantitatively, its ability to outperform an i-vector baseline for speaker -verification and reduce the word error rate by as much as 35% in mismatched -train/test scenarios for automatic speech recognition tasks. -" -5803,1709.07915,"George Shaw Jr., and Amir Karami","Computational Content Analysis of Negative Tweets for Obesity, Diet, - Diabetes, and Exercise",cs.SI cs.CL stat.AP stat.CO stat.ML," Social media based digital epidemiology has the potential to support faster -response and deeper understanding of public health related threats. This study -proposes a new framework to analyze unstructured health-related textual data -via Twitter users' posts (tweets) to characterize the negative health sentiments -and non-health-related concerns in relation to the corpus of negative -sentiments regarding Diet, Diabetes, Exercise, and Obesity (DDEO). Through the -collection of 6 million Tweets for one month, this study identified the -prominent topics of users as they relate to the negative sentiments. Our -proposed framework uses two text mining methods, sentiment analysis and topic -modeling, to discover negative topics. The negative sentiments of Twitter users -support the literature narratives and the many morbidity issues that are -associated with DDEO and the linkage between obesity and diabetes. The -framework offers a potential method to understand the public's opinions and -sentiments regarding DDEO. More importantly, this research provides new -opportunities for computational social scientists, medical experts, and public -health professionals to collectively address DDEO-related issues. -" -5804,1709.07916,"Amir Karami, Alicia A. Dahl, Gabrielle Turner-McGrievy, Hadi Kharrazi, - Jr., George Shaw","Characterizing Diabetes, Diet, Exercise, and Obesity Comments on Twitter",cs.SI cs.CL stat.AP stat.CO stat.ML," Social media provide a platform for users to express their opinions and share -information. Understanding public health opinions on social media, such as -Twitter, offers a unique approach to characterizing common health issues such -as diabetes, diet, exercise, and obesity (DDEO); however, collecting and -analyzing a large-scale conversational public health data set is a challenging -research task.
The goal of this research is to analyze the characteristics of -the general public's opinions in regard to diabetes, diet, exercise and obesity -(DDEO) as expressed on Twitter. A multi-component semantic and linguistic -framework was developed to collect Twitter data, discover topics of interest -about DDEO, and analyze the topics. From the extracted 4.5 million tweets, 8% -of tweets discussed diabetes, 23.7% diet, 16.6% exercise, and 51.7% obesity. -The strongest correlation among the topics was determined between exercise and -obesity. Other notable correlations were: diabetes and obesity, and diet and -obesity. DDEO terms were also identified as subtopics of each of the DDEO -topics. The frequent subtopics discussed along with Diabetes, excluding the -DDEO terms themselves, were blood pressure, heart attack, yoga, and Alzheimer. -The non-DDEO subtopics for Diet included vegetarian, pregnancy, celebrities, -weight loss, religious, and mental health, while subtopics for Exercise -included computer games, brain, fitness, and daily plan. Non-DDEO subtopics for -Obesity included Alzheimer, cancer, and children. With 2.67 billion social -media users in 2016, publicly available data such as Twitter posts can be -utilized to support clinical providers, public health experts, and social -scientists in better understanding common public opinions in regard to -diabetes, diet, exercise, and obesity. -" -5805,1709.08011,Yoshiaki Kitagawa and Mamoru Komachi,Long Short-Term Memory for Japanese Word Segmentation,cs.CL," This study presents a Long Short-Term Memory (LSTM) neural network approach -to Japanese word segmentation (JWS). Previous studies on Chinese word -segmentation (CWS) succeeded in using recurrent neural networks such as LSTM -and gated recurrent units (GRU). However, in contrast to Chinese, Japanese -includes several character types, such as hiragana, katakana, and kanji, that -produce orthographic variations and increase the difficulty of word -segmentation. Additionally, it is important for JWS tasks to consider a global -context, and yet traditional JWS approaches rely on local features. In order to -address this problem, this study proposes employing an LSTM-based approach to -JWS. The experimental results indicate that the proposed model achieves -state-of-the-art accuracy with respect to various Japanese corpora. -" -5806,1709.08074,"Michael R. Glass, Md Faisal Mahbub Chowdhury and Alfio M. Gliozzo",Language Independent Acquisition of Abbreviations,cs.CL," This paper addresses automatic extraction of abbreviations (encompassing -acronyms and initialisms) and corresponding long-form expansions from plain -unstructured text. We create and are going to release a multilingual resource -for abbreviations and their corresponding expansions, built automatically by -exploiting Wikipedia redirect and disambiguation pages, which can be used as a -benchmark for evaluation. We address a shortcoming of previous work where only -the redirect pages were used, and so every abbreviation had only a single -expansion, even though multiple different expansions are possible for many of -the abbreviations. We also develop a principled machine learning based approach -to scoring expansion candidates using different techniques such as indicators -of near synonymy, topical relatedness, and surface similarity. We show improved -performance across seven languages, including two with a non-Latin alphabet, -relative to strong baselines.
-" -5807,1709.08196,"Johannes Gra\""en","Identifying Phrasemes via Interlingual Association Measures -- A - Data-driven Approach on Dependency-parsed and Word-aligned Parallel Corpora",cs.CL," This is a preprint of the article ""Identifying Phrasemes via Interlingual -Association Measures"" that was presented in February 2016 at the LeKo (Lexical -combinations and typified speech in a multilingual context) conference in -Innsbruck. -" -5808,1709.08267,"Kamran Kowsari, Donald E. Brown, Mojtaba Heidarysafa, Kiana Jafari - Meimandi, Matthew S. Gerber, Laura E. Barnes",HDLTex: Hierarchical Deep Learning for Text Classification,cs.LG cs.AI cs.CL cs.CV cs.IR," The continually increasing number of documents produced each year -necessitates ever improving information processing methods for searching, -retrieving, and organizing text. Central to these information processing -methods is document classification, which has become an important application -for supervised learning. Recently the performance of these traditional -classifiers has degraded as the number of documents has increased. This is -because along with this growth in the number of documents has come an increase -in the number of categories. This paper approaches this problem differently -from current document classification methods that view the problem as -multi-class classification. Instead we perform hierarchical classification -using an approach we call Hierarchical Deep Learning for Text classification -(HDLTex). HDLTex employs stacks of deep learning architectures to provide -specialized understanding at each level of the document hierarchy. -" -5809,1709.08294,"Dinghan Shen, Martin Renqiang Min, Yitong Li, Lawrence Carin",Learning Context-Sensitive Convolutional Filters for Text Processing,cs.CL cs.LG stat.ML," Convolutional neural networks (CNNs) have recently emerged as a popular -building block for natural language processing (NLP). Despite their success, -most existing CNN models employed in NLP share the same learned (and static) -set of filters for all input sentences. In this paper, we consider an approach -of using a small meta network to learn context-sensitive convolutional filters -for text processing. The role of meta network is to abstract the contextual -information of a sentence or document into a set of input-aware filters. We -further generalize this framework to model sentence pairs, where a -bidirectional filter generation mechanism is introduced to encapsulate -co-dependent sentence representations. In our benchmarks on four different -tasks, including ontology classification, sentiment analysis, answer sentence -selection, and paraphrase identification, our proposed model, a modified CNN -with context-sensitive filters, consistently outperforms the standard CNN and -attention-based CNN baselines. By visualizing the learned context-sensitive -filters, we further validate and rationalize the effectiveness of proposed -framework. -" -5810,1709.08299,"Yiming Cui, Ting Liu, Zhipeng Chen, Wentao Ma, Shijin Wang and Guoping - Hu","Dataset for the First Evaluation on Chinese Machine Reading - Comprehension",cs.CL," Machine Reading Comprehension (MRC) has become enormously popular recently -and has attracted a lot of attention. However, existing reading comprehension -datasets are mostly in English. To add diversity in reading comprehension -datasets, in this paper we propose a new Chinese reading comprehension dataset -for accelerating related research in the community. 
The proposed dataset
-contains two different types: cloze-style reading comprehension and user query
-reading comprehension, associated with large-scale training data as well as
-human-annotated validation and hidden test sets. Along with this dataset, we
-also hosted the first Evaluation on Chinese Machine Reading Comprehension
-(CMRC-2017) and successfully attracted tens of participants, which suggests
-the potential impact of this dataset.
-"
-5811,1709.08366,"Vitobha Munigala, Srikanth Tamilselvam, Anush Sankaran","""Let me convince you to buy my product ... "": A Case Study of an
- Automated Persuasive System for Fashion Products",cs.AI cs.CL," Persuasiveness is a creative art aimed at making people believe in a certain
-set of beliefs. Often, such creativity involves adapting the richness of one
-domain to another to strike a chord with the target audience. In this
-research, we present PersuAIDE! - a persuasive system based on linguistic
-creativity that transforms a given sentence to generate various forms of
-persuading sentences. These forms cover multiple foci of persuasion, such as
-memorability and sentiment. For a given simple product line, the algorithm is
-composed of several steps: (i) select an appropriate well-known expression for
-the target domain to add memorability, (ii) identify keywords and entities in
-the given sentence and expression and transform them to produce a creative
-persuading sentence, and (iii) add positive or negative sentiment for further
-persuasion. The persuasive conversions were manually verified using
-qualitative results, and the effectiveness of the proposed approach is
-discussed empirically.
-"
-5812,1709.08448,"Kevin Alex Mathews, P Sreenivasa Kumar",Extracting Ontological Knowledge from Textual Descriptions,cs.AI cs.CL," Authoring of OWL-DL ontologies is intellectually challenging, and to make
-this process simpler, many systems accept natural language text as input. A
-text-based ontology authoring approach can be successful only when it is
-combined with an effective method for extracting ontological axioms from text.
-Extracting axioms from unrestricted English input is a substantially
-challenging task due to the richness of the language. Controlled natural
-languages (CNLs) have been proposed in this context, and these tend to be
-highly restrictive. In this paper, we propose a new CNL called TEDEI (TExtual
-DEscription Identifier) whose grammar is inspired by the different ways OWL-DL
-constructs are expressed in English. We built a system that transforms TEDEI
-sentences into corresponding OWL-DL axioms. Ambiguity due to different
-possible lexicalizations of sentences and semantic ambiguity present in
-sentences are challenges in this context. We find that the best way to handle
-these challenges is to construct axioms corresponding to alternative
-formalizations of the sentence so that the end-user can make an appropriate
-choice. The output is compared against human-authored axioms, and in a
-substantial number of cases the human-authored axiom is indeed one of the
-alternatives given by the system. The proposed system substantially enhances
-the types of sentence structures that can be used for ontology authoring.
-"
-5813,1709.08521,Omar Al-Harbi,"Using objective words in the reviews to improve the colloquial arabic
- sentiment analysis",cs.CL," One of the main difficulties in sentiment analysis of the Arabic language is
-the presence of colloquialism.
In this paper, we examine the effect of
-using objective words in conjunction with sentimental words on sentiment
-classification for colloquial Arabic reviews, specifically Jordanian
-colloquial reviews. The reviews often include both sentimental and objective
-words; however, most existing sentiment analysis models ignore the objective
-words, as they are considered useless. In this work, we created two lexicons:
-the first includes the colloquial sentimental words and compound phrases,
-while the other contains the objective words associated with values of
-sentiment tendency based on a particular estimation method. We used these
-lexicons to extract sentiment features that serve as training input to
-Support Vector Machines (SVM) to classify the sentiment polarity of the
-reviews. The reviews dataset was collected manually from the JEERAN website.
-The results of the experiments show that the proposed approach improves the
-polarity classification in comparison to two baseline models, with an accuracy
-of 95.6%.
-"
-5814,1709.08600,"Maxim Grechkin, Hoifung Poon, Bill Howe",EZLearn: Exploiting Organic Supervision in Large-Scale Data Annotation,cs.CL cs.LG," Many real-world applications require automated data annotation, such as
-identifying tissue origins based on gene expressions and classifying images
-into semantic categories. Annotation classes are often numerous and subject to
-changes over time, and annotating examples has become the major bottleneck for
-supervised learning methods. In science and other high-value domains, large
-repositories of data samples are often available, together with two sources of
-organic supervision: a lexicon for the annotation classes, and text
-descriptions that accompany some data samples. Distant supervision has emerged
-as a promising paradigm for exploiting such indirect supervision by
-automatically annotating examples where the text description contains a class
-mention in the lexicon. However, due to linguistic variations and ambiguities,
-such training data is inherently noisy, which limits the accuracy of this
-approach. In this paper, we introduce an auxiliary natural language processing
-system for the text modality, and incorporate co-training to reduce noise and
-augment signal in distant supervision. Without using any manually labeled data,
-our EZLearn system learned to accurately annotate data samples in functional
-genomics and scientific figure comprehension, substantially outperforming
-state-of-the-art supervised methods trained on tens of thousands of annotated
-examples.
-"
-5815,1709.08624,"Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, Jun Wang",Long Text Generation via Adversarial Training with Leaked Information,cs.CL cs.AI cs.LG," Automatically generating coherent and semantically meaningful text has many
-applications in machine translation, dialogue systems, image captioning, etc.
-Recently, by combining with policy gradient, Generative Adversarial Nets
-(GAN), which use a discriminative model to guide the training of the
-generative model as a reinforcement learning policy, have shown promising
-results in text generation. However, the scalar guiding signal is only
-available after the entire text has been generated and lacks intermediate
-information about text structure during the generative process. As such, this
-limits its success when the generated text samples are long (more than 20
-words).
In this
-paper, we propose a new framework, called LeakGAN, to address the problem of
-long text generation. We allow the discriminative net to leak its own
-high-level extracted features to the generative net to further help the
-guidance. The generator incorporates such informative signals into all
-generation steps through an additional Manager module, which takes the
-extracted features of current generated words and outputs a latent vector to
-guide the Worker module for next-word generation. Our extensive experiments on
-synthetic data and various real-world tasks with a Turing test demonstrate that
-LeakGAN is highly effective in long text generation and also improves the
-performance in short text generation scenarios. More importantly, without any
-supervision, LeakGAN is able to implicitly learn sentence structures only
-through the interaction between Manager and Worker.
-"
-5816,1709.08694,"Luciano Barbosa, Paulo R. Cavalin, Victor Guimaraes and Matthias
- Kormaksson","Methodology and Results for the Competition on Semantic Similarity
- Evaluation and Entailment Recognition for PROPOR 2016",cs.CL cs.LG," In this paper, we present the methodology and the results obtained by our
-team, dubbed Blue Man Group, in the ASSIN (from the Portuguese {\it
-Avalia\c{c}\~ao de Similaridade Sem\^antica e Infer\^encia Textual})
-competition, held at PROPOR 2016\footnote{International Conference on the
-Computational Processing of the Portuguese Language -
-http://propor2016.di.fc.ul.pt/}. Our team's strategy consisted of evaluating
-methods based on semantic word vectors, following two distinct directions: 1)
-to make use of low-dimensional, compact, feature sets, and 2) deep
-learning-based strategies dealing with high-dimensional feature vectors.
-Evaluation results demonstrated that the first strategy was more promising, so
-the results from the second strategy were discarded. As a result, by
-considering the best run of each of the six teams, we were able to achieve
-the best accuracy and F1 values in entailment recognition, in the Brazilian
-Portuguese set, and the best F1 score overall. In the semantic similarity task,
-our team was ranked second in the Brazilian Portuguese set, and third
-considering both sets.
-"
-5817,1709.08698,"Boya Yu, Jiaxu Zhou, Yi Zhang, Yunong Cao",Identifying Restaurant Features via Sentiment Analysis on Yelp Reviews,cs.CL," Many people use Yelp to find a good restaurant. Nonetheless, with only an
-overall rating for each restaurant, Yelp does not offer enough information for
-independently judging its various aspects such as environment, service or
-flavor. In this paper, we introduce a machine learning-based method to
-characterize such aspects for particular types of restaurants. The main
-approach used in this paper is to use a support vector machine (SVM) model to
-decipher the sentiment tendency of each review from word frequency. Word scores
-generated from the SVM models are further processed into a polarity index
-indicating the significance of each word for specific types of restaurants.
-Customers overall tend to express more sentiment regarding service. As for the
-distinction between different cuisines, results that match common sense are
-obtained: Japanese cuisines are usually fresh, some French cuisines are
-overpriced, while Italian restaurants are often famous for their pizzas.
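The word-scoring step described in the Yelp study above (fitting a linear SVM
on word counts and reading per-word polarity off the learned coefficients) can
be sketched in a few lines. The toy reviews below are invented, and this
illustrates the general technique rather than the authors' code.

# A minimal sketch of SVM-based word polarity scoring with scikit-learn;
# the reviews and labels are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

reviews = ["the food was fresh and delicious",
           "terrible service and overpriced pizza",
           "friendly service and great pizza",
           "stale food and rude staff"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vec = CountVectorizer()
X = vec.fit_transform(reviews)
clf = LinearSVC().fit(X, labels)

# Each learned coefficient acts as a polarity score for one word.
scores = dict(zip(vec.get_feature_names_out(), clf.coef_[0]))
for word in sorted(scores, key=scores.get):
    print(f"{word:12s} {scores[word]:+.2f}")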
-" -5818,1709.08716,Lei Shu and Hu Xu and Bing Liu,DOC: Deep Open Classification of Text Documents,cs.CL," Traditional supervised learning makes the closed-world assumption that the -classes appeared in the test data must have appeared in training. This also -applies to text learning or text classification. As learning is used -increasingly in dynamic open environments where some new/test documents may not -belong to any of the training classes, identifying these novel documents during -classification presents an important problem. This problem is called open-world -classification or open classification. This paper proposes a novel deep -learning based approach. It outperforms existing state-of-the-art techniques -dramatically. -" -5819,1709.08853,"Zhengdong Lu and Xianggen Liu and Haotian Cui and Yukun Yan and Daqi - Zheng",Object-oriented Neural Programming (OONP) for Document Understanding,cs.LG cs.AI cs.CL cs.NE," We propose Object-oriented Neural Programming (OONP), a framework for -semantically parsing documents in specific domains. Basically, OONP reads a -document and parses it into a predesigned object-oriented data structure -(referred to as ontology in this paper) that reflects the domain-specific -semantics of the document. An OONP parser models semantic parsing as a decision -process: a neural net-based Reader sequentially goes through the document, and -during the process it builds and updates an intermediate ontology to summarize -its partial understanding of the text it covers. OONP supports a rich family of -operations (both symbolic and differentiable) for composing the ontology, and a -big variety of forms (both symbolic and differentiable) for representing the -state and the document. An OONP parser can be trained with supervision of -different forms and strength, including supervised learning (SL) , -reinforcement learning (RL) and hybrid of the two. Our experiments on both -synthetic and real-world document parsing tasks have shown that OONP can learn -to handle fairly complicated ontology with training data of modest sizes. -" -5820,1709.08858,"Kana Oomoto, Haruka Oikawa, Eiko Yamamoto, Mitsuo Yoshida, Masayuki - Okabe, Kyoji Umemura",Polysemy Detection in Distributed Representation of Word Sense,cs.DS cs.CL," In this paper, we propose a statistical test to determine whether a given -word is used as a polysemic word or not. The statistic of the word in this test -roughly corresponds to the fluctuation in the senses of the neighboring words a -nd the word itself. Even though the sense of a word corresponds to a single -vector, we discuss how polysemy of the words affects the position of vectors. -Finally, we also explain the method to detect this effect. -" -5821,1709.08878,"Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, Percy Liang",Generating Sentences by Editing Prototypes,cs.CL cs.AI cs.LG cs.NE stat.ML," We propose a new generative model of sentences that first samples a prototype -sentence from the training corpus and then edits it into a new sentence. -Compared to traditional models that generate from scratch either left-to-right -or by first sampling a latent sentence vector, our prototype-then-edit model -improves perplexity on language modeling and generates higher quality outputs -according to human evaluation. Furthermore, the model gives rise to a latent -edit vector that captures interpretable semantics such as sentence similarity -and sentence-level analogies. 
-" -5822,1709.08898,"Gyu-Hyeon Choi, Jong-Hun Shin and Young-Kil Kim","Improving a Multi-Source Neural Machine Translation Model with Corpus - Extension for Low-Resource Languages",cs.CL," In machine translation, we often try to collect resources to improve -performance. However, most of the language pairs, such as Korean-Arabic and -Korean-Vietnamese, do not have enough resources to train machine translation -systems. In this paper, we propose the use of synthetic methods for extending a -low-resource corpus and apply it to a multi-source neural machine translation -model. We showed the improvement of machine translation performance through -corpus extension using the synthetic method. We specifically focused on how to -create source sentences that can make better target sentences, including the -use of synthetic methods. We found that the corpus extension could also improve -the performance of multi-source neural machine translation. We showed the -corpus extension and multi-source model to be efficient methods for a -low-resource language pair. Furthermore, when both methods were used together, -we found better machine translation performance. -" -5823,1709.08907,"Sho Takase, Jun Suzuki and Masaaki Nagata",Input-to-Output Gate to Improve RNN Language Models,cs.CL," This paper proposes a reinforcing method that refines the output layers of -existing Recurrent Neural Network (RNN) language models. We refer to our -proposed method as Input-to-Output Gate (IOG). IOG has an extremely simple -structure, and thus, can be easily combined with any RNN language models. Our -experiments on the Penn Treebank and WikiText-2 datasets demonstrate that IOG -consistently boosts the performance of several different types of current -topline RNN language models. -" -5824,1709.09118,"Qiuyuan Huang, Paul Smolensky, Xiaodong He, Li Deng, Dapeng Wu",Tensor Product Generation Networks for Deep NLP Modeling,cs.CV cs.CL," We present a new approach to the design of deep networks for natural language -processing (NLP), based on the general technique of Tensor Product -Representations (TPRs) for encoding and processing symbol structures in -distributed neural networks. A network architecture --- the Tensor Product -Generation Network (TPGN) --- is proposed which is capable in principle of -carrying out TPR computation, but which uses unconstrained deep learning to -design its internal representations. Instantiated in a model for image-caption -generation, TPGN outperforms LSTM baselines when evaluated on the COCO dataset. -The TPR-capable structure enables interpretation of internal representations -and operations, which prove to contain considerable grammatical content. Our -caption-generation model can be interpreted as generating sequences of -grammatical categories and retrieving words by their categories from a plan -encoded as a distributed representation. -" -5825,1709.09119,Paul Christian Sommerhoff,Integration of Japanese Papers Into the DBLP Data Set,cs.CL cs.DL," If someone is looking for a certain publication in the field of computer -science, the searching person is likely to use the DBLP to find the desired -publication. The DBLP data set is continuously extended with new publications, -or rather their metadata, for example the names of involved authors, the title -and the publication date. While the size of the data set is already remarkable, -specific areas can still be improved. 
The DBLP offers a huge collection of
-English papers because most papers concerning computer science are published in
-English. Nevertheless, there are official publications in other languages which
-are supposed to be added to the data set. One such kind is Japanese papers.
-This diploma thesis will show a way to automatically process publication lists
-of Japanese papers and to make them ready for import into the DBLP data set.
-Especially important are the problems encountered along the way, such as
-transcription handling and Personal Name Matching with Japanese names.
-"
-5826,1709.09220,"Athanasios Giannakopoulos, Diego Antognini, Claudiu Musat, Andreea
- Hossmann and Michael Baeriswyl","Dataset Construction via Attention for Aspect Term Extraction with
- Distant Supervision",cs.CL," Aspect Term Extraction (ATE) detects opinionated aspect terms in sentences or
-text spans, with the end goal of performing aspect-based sentiment analysis.
-The small number of available datasets for supervised ATE and the fact that
-they cover only a few domains raise the need for exploiting other data sources
-in new and creative ways. Publicly available review corpora contain a plethora
-of opinionated aspect terms and cover a larger domain spectrum. In this paper,
-we first propose a method for using such review corpora for creating a new
-dataset for ATE. Our method relies on an attention mechanism to select
-sentences that have a high likelihood of containing actual opinionated aspects.
-We thus improve the quality of the extracted aspects. We then use the
-constructed dataset to train a model and perform ATE with distant supervision.
-By evaluating on human annotated datasets, we prove that our method achieves a
-significantly improved performance over various unsupervised and supervised
-baselines. Finally, we prove that sentence selection matters when it comes to
-creating new datasets for ATE. Specifically, we show that using a set of
-selected sentences leads to higher ATE performance compared to using the whole
-sentence set.
-"
-5827,1709.09239,"Hendrik ter Horst, Matthias Hartung, Roman Klinger, Matthias Zwick,
- Philipp Cimiano","Predicting Disease-Gene Associations using Cross-Document Graph-based
- Features",cs.CL," In the context of personalized medicine, text mining methods pose an
-interesting option for identifying disease-gene associations, as they can be
-used to generate novel links between diseases and genes which may complement
-knowledge from structured databases. The most straightforward approach to
-extract such links from text is to rely on a simple assumption postulating an
-association between all genes and diseases that co-occur within the same
-document. However, this approach (i) tends to yield a number of spurious
-associations, (ii) does not capture different relevant types of associations,
-and (iii) is incapable of aggregating knowledge that is spread across
-documents. Thus, we propose an approach in which disease-gene co-occurrences
-and gene-gene interactions are represented in an RDF graph. A machine
-learning-based classifier is trained that incorporates features extracted from
-the graph to separate disease-gene pairs into valid disease-gene associations
-and spurious ones. On the manually curated Genetic Testing Registry, our
-approach yields a 30-point increase in F1 score over a plain co-occurrence
-baseline.
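The plain co-occurrence baseline that the disease-gene study above compares
against is straightforward to sketch: postulate an association between every
gene and disease mentioned in the same document and count the supporting
documents. The entity annotations below are toy stand-ins for the output of a
real named-entity recognizer.

# A minimal sketch of the document co-occurrence baseline for
# disease-gene association; documents and entities are invented.
from collections import Counter
from itertools import product

docs = [
    {"genes": {"BRCA1", "TP53"}, "diseases": {"breast cancer"}},
    {"genes": {"BRCA1"}, "diseases": {"breast cancer", "ovarian cancer"}},
    {"genes": {"APOE"}, "diseases": {"alzheimer disease"}},
]

cooc = Counter()
for doc in docs:
    for gene, disease in product(doc["genes"], doc["diseases"]):
        cooc[(gene, disease)] += 1

for (gene, disease), n in cooc.most_common():
    print(f"{gene} -- {disease}: {n} document(s)")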
-" -5828,1709.09250,"Omar Al-Harbi, Shaidah Jusoh, Norita Md Norwawi",Lexical Disambiguation in Natural Language Questions (NLQs),cs.CL cs.AI," Question processing is a fundamental step in a question answering (QA) -application, and its quality impacts the performance of QA application. The -major challenging issue in processing question is how to extract semantic of -natural language questions (NLQs). A human language is ambiguous. Ambiguity may -occur at two levels; lexical and syntactic. In this paper, we propose a new -approach for resolving lexical ambiguity problem by integrating context -knowledge and concepts knowledge of a domain, into shallow natural language -processing (SNLP) techniques. Concepts knowledge is modeled using ontology, -while context knowledge is obtained from WordNet, and it is determined based on -neighborhood words in a question. The approach will be applied to a university -QA system. -" -5829,1709.09254,"Ke Ni, William Yang Wang",Learning to Explain Non-Standard English Words and Phrases,cs.CL," We describe a data-driven approach for automatically explaining new, -non-standard English expressions in a given sentence, building on a large -dataset that includes 15 years of crowdsourced examples from -UrbanDictionary.com. Unlike prior studies that focus on matching keywords from -a slang dictionary, we investigate the possibility of learning a neural -sequence-to-sequence model that generates explanations of unseen non-standard -English expressions given context. We propose a dual encoder approach---a -word-level encoder learns the representation of context, and a second -character-level encoder to learn the hidden representation of the target -non-standard expression. Our model can produce reasonable definitions of new -non-standard English expressions given their context with certain confidence. -" -5830,1709.09345,"Seil Na, Sangho Lee, Jisung Kim, Gunhee Kim",A Read-Write Memory Network for Movie Story Understanding,cs.CV cs.CL," We propose a novel memory network model named Read-Write Memory Network -(RWMN) to perform question and answering tasks for large-scale, multimodal -movie story understanding. The key focus of our RWMN model is to design the -read network and the write network that consist of multiple convolutional -layers, which enable memory read and write operations to have high capacity and -flexibility. While existing memory-augmented network models treat each memory -slot as an independent block, our use of multi-layered CNNs allows the model to -read and write sequential memory cells as chunks, which is more reasonable to -represent a sequential story because adjacent memory blocks often have strong -correlations. For evaluation, we apply our model to all the six tasks of the -MovieQA benchmark, and achieve the best accuracies on several tasks, especially -on the visual QA task. Our model shows a potential to better understand not -only the content in the story, but also more abstract information, such as -relationships between characters and the reasons for their actions. -" -5831,1709.09360,"Lyndon White, Roberto Togneri, Wei Liu, Mohammed Bennamoun",Learning of Colors from Color Names: Distribution and Point Estimation,cs.CL," Color names are often made up of multiple words. As a task in natural -language understanding we investigate in depth the capacity of neural networks -based on sums of word embeddings (SOWE), recurrence (LSTM and GRU based RNNs) -and convolution (CNN), to estimate colors from sequences of terms. 
We consider
-both point and distribution estimates of color. We argue that the latter has a
-particular value as there is no clear agreement between people as to what a
-particular color describes -- different people have a different idea of what it
-means to be ``very dark orange'', for example. Surprisingly, despite its
-simplicity, the sum of word embeddings generally performs the best on almost
-all evaluations.
-"
-5832,1709.09373,"Luigi Di Caro, Marco Guerzoni, Massimiliano Nuccio, Giovanni Siragusa",A Bimodal Network Approach to Model Topic Dynamics,cs.CL econ.GN q-fin.EC," This paper presents an intertemporal bimodal network to analyze the evolution
-of the semantic content of a scientific field within the framework of topic
-modeling, namely using the Latent Dirichlet Allocation (LDA). The main
-contribution is the conceptualization of the topic dynamics and its
-formalization and codification into an algorithm. To benchmark the
-effectiveness of this approach, we propose three indexes which track the
-transformation of topics over time, their rate of birth and death, and the
-novelty of their content. Applying the LDA, we test the algorithm both on a
-controlled experiment and on a corpus of several thousand scientific papers
-spanning more than 100 years, which accounts for the history of economic
-thought.
-"
-5833,1709.09404,"Wided Bakari (1), Patrice Bellot (1), Mahmoud Neji ((1) LSIS)","A Preliminary Study for Building an Arabic Corpus of Pair
- Questions-Texts from the Web: AQA-Webcorp",cs.CL cs.IR," With the development of electronic media and the heterogeneity of Arabic data
-on the Web, the idea of building a clean corpus for certain applications of
-natural language processing, including machine translation, information
-retrieval, and question answering, becomes more and more pressing. In this
-manuscript, we seek to create and develop our own corpus of question-text
-pairs. This corpus will then provide a better base for our experimentation
-step. Thus, we model its construction with a method for Arabic that recovers
-texts from the web that could prove to be answers to our factual questions. To
-do this, we developed a Java script that can extract a list of HTML pages from
-a given query. We then clean these pages to obtain a database of texts and a
-corpus of question-text pairs. In addition, we give preliminary results of our
-proposed method. Some investigations for the construction of Arabic corpora
-are also presented in this document.
-"
-5834,1709.09443,Lea Frermann and Michael C. Frank,"Prosodic Features from Large Corpora of Child-Directed Speech as
- Predictors of the Age of Acquisition of Words",cs.CL," The impressive ability of children to acquire language is a widely studied
-phenomenon, and the factors influencing the pace and patterns of word learning
-remain a subject of active research. Although many models predicting the age
-of acquisition of words have been proposed, little emphasis has been directed
-to the raw input children receive. In this work we present a comparatively
-large-scale multi-modal corpus of prosody-text aligned child-directed speech.
-Our corpus contains automatically extracted word-level prosodic features, and
-we investigate the utility of this information as predictors of age of
-acquisition.
We show that prosody features boost predictive power in a
-regularized regression, and demonstrate their utility in the context of a
-multi-modal factorized language model trained and tested on child-directed
-speech.
-"
-5835,1709.09500,"Rotem Dror, Gili Baumer, Marina Bogomolov, and Roi Reichart","Replicability Analysis for Natural Language Processing: Testing
- Significance with Multiple Datasets",cs.CL," With the ever-growing amounts of textual data from a large variety of
-languages, domains, and genres, it has become standard to evaluate NLP
-algorithms on multiple datasets in order to ensure consistent performance
-across heterogeneous setups. However, such multiple comparisons pose
-significant challenges to traditional statistical analysis methods in NLP and
-can lead to erroneous conclusions. In this paper, we propose a Replicability
-Analysis framework for a statistically sound analysis of multiple comparisons
-between algorithms for NLP tasks. We discuss the theoretical advantages of this
-framework over the current, statistically unjustified, practice in the NLP
-literature, and demonstrate its empirical value across four applications:
-multi-domain dependency parsing, multilingual POS tagging, cross-domain
-sentiment classification and word similarity prediction.
-"
-5836,1709.09587,"Tal Baumel, Jumana Nassour-Kassis, Raphael Cohen, Michael Elhadad and
- No`emie Elhadad","Multi-Label Classification of Patient Notes a Case Study on ICD Code
- Assignment",cs.CL cs.AI," In the context of the Electronic Health Record, automated diagnosis coding of
-patient notes is a useful task, but a challenging one due to the large number
-of codes and the length of patient notes. We investigate four models for
-assigning multiple ICD codes to discharge summaries taken from both MIMIC II
-and III. We present Hierarchical Attention-GRU (HA-GRU), a hierarchical
-approach to tag a document by identifying the sentences relevant for each
-label. HA-GRU achieves state-of-the-art results. Furthermore, the learned
-sentence-level attention layer highlights the model decision process, allows
-easier error analysis, and suggests future directions for improvement.
-"
-5837,1709.09590,"Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder","An attentive neural architecture for joint segmentation and parsing and
- its application to real estate ads",cs.CL," In processing human-produced text using natural language processing (NLP)
-techniques, two fundamental subtasks that arise are (i) segmentation of the
-plain text into meaningful subunits (e.g., entities), and (ii) dependency
-parsing, to establish relations between subunits. In this paper, we develop a
-relatively simple and effective neural joint model that performs both
-segmentation and dependency parsing together, instead of one after the other as
-in most state-of-the-art works. We will focus in particular on the real estate
-ad setting, aiming to convert an ad to a structured description, which we name
-property tree, comprising the tasks of (1) identifying important entities of a
-property (e.g., rooms) from classifieds and (2) structuring them into a tree
-format. In this work, we propose a new joint model that is able to tackle the
-two tasks simultaneously and construct the property tree by (i) avoiding the
-error propagation that would arise from performing the subtasks one after the
-other in a pipelined fashion, and (ii) exploiting the interactions between the
-subtasks.
-
For this purpose, we perform an extensive comparative study of the pipeline
-methods and the newly proposed joint model, reporting an improvement of over
-three percentage points in the overall edge F1 score of the property tree.
-Also, we propose attention methods to encourage our model to focus on salient
-tokens during the construction of the property tree. Thus we experimentally
-demonstrate the usefulness of attentive neural architectures for the proposed
-joint model, showcasing a further improvement of two percentage points in edge
-F1 score for our application.
-"
-5838,1709.09686,"L. T. Anh, M. Y. Arkhipov, M. S. Burtsev","Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named
- Entity Recognition",cs.CL," Named Entity Recognition (NER) is one of the most common tasks of natural
-language processing. The purpose of NER is to find and classify tokens in text
-documents into predefined categories called tags, such as person names,
-quantity expressions, percentage expressions, names of locations,
-organizations, as well as expressions of time, currency and others. Although a
-number of approaches have been proposed for this task in the Russian language,
-there is still substantial potential for better solutions. In this work, we
-studied several deep neural network models, starting from a vanilla
-bi-directional Long Short-Term Memory (Bi-LSTM) model, then supplementing it
-with Conditional Random Fields (CRF) as well as highway networks, and finally
-adding external word embeddings. All models were evaluated across three
-datasets: Gareev's dataset, Person-1000, and FactRuEval-2016. We found that
-extending the Bi-LSTM model with CRF significantly increased the quality of
-predictions. Encoding input tokens with external word embeddings reduced
-training time and allowed us to achieve state-of-the-art results for the
-Russian NER task.
-"
-5839,1709.09741,"Raj Korpan, Susan L. Epstein, Anoop Aroor, Gil Dekel",WHY: Natural Explanations from a Robot Navigator,cs.AI cs.CL cs.HC cs.RO," Effective collaboration between a robot and a person requires natural
-communication. When a robot travels with a human companion, the robot should be
-able to explain its navigation behavior in natural language. This paper
-explains how a cognitively-based, autonomous robot navigation system produces
-informative, intuitive explanations for its decisions. Language generation here
-is based upon the robot's commonsense, its qualitative reasoning, and its
-learned spatial model. This approach produces natural explanations in real time
-for a robot as it navigates in a large, complex indoor environment.
-"
-5840,1709.09749,Bin Bi and Hao Ma,KeyVec: Key-semantics Preserving Document Representations,cs.CL cs.LG cs.NE," Previous studies have demonstrated the empirical success of word embeddings
-in various applications. In this paper, we investigate the problem of learning
-distributed representations for text documents which many machine learning
-algorithms take as input for a number of NLP tasks.
- We propose a neural network model, KeyVec, which learns document
-representations with the goal of preserving key semantics of the input text. It
-enables the learned low-dimensional vectors to retain the topics and important
-information from the documents that will flow to downstream tasks. Our
-empirical evaluations show the superior quality of KeyVec representations in
-two different document understanding tasks.
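KeyVec itself is a trained neural model; the standard point of comparison for
such document representations is simply averaging pretrained word embeddings.
Below is a minimal sketch of that generic baseline, with an invented
three-dimensional embedding table; it is not the KeyVec method.

# A minimal sketch of the averaged-word-embedding document baseline;
# the tiny embedding table is invented for illustration.
import numpy as np

embeddings = {
    "neural":  np.array([0.9, 0.1, 0.0]),
    "network": np.array([0.8, 0.2, 0.1]),
    "pizza":   np.array([0.0, 0.9, 0.7]),
}

def doc_vector(tokens):
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

d1 = doc_vector("neural network".split())
d2 = doc_vector("pizza network".split())
print(round(cosine(d1, d2), 3))  # moderate similarity via the shared word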
-" -5841,1709.09783,"Francis Gr\'egoire, Philippe Langlais",A Deep Neural Network Approach To Parallel Sentence Extraction,cs.CL," Parallel sentence extraction is a task addressing the data sparsity problem -found in multilingual natural language processing applications. We propose an -end-to-end deep neural network approach to detect translational equivalence -between sentences in two different languages. In contrast to previous -approaches, which typically rely on multiples models and various word alignment -features, by leveraging continuous vector representation of sentences we remove -the need of any domain specific feature engineering. Using a siamese -bidirectional recurrent neural networks, our results against a strong baseline -based on a state-of-the-art parallel sentence extraction system show a -significant improvement in both the quality of the extracted parallel sentences -and the translation performance of statistical machine translation systems. We -believe this study is the first one to investigate deep learning for the -parallel sentence extraction task. -" -5842,1709.09816,"Ben Krause, Marco Damonte, Mihai Dobre, Daniel Duma, Joachim Fainberg, - Federico Fancellu, Emmanuel Kahembwe, Jianpeng Cheng, Bonnie Webber",Edina: Building an Open Domain Socialbot with Self-dialogues,cs.CL cs.AI," We present Edina, the University of Edinburgh's social bot for the Amazon -Alexa Prize competition. Edina is a conversational agent whose responses -utilize data harvested from Amazon Mechanical Turk (AMT) through an innovative -new technique we call self-dialogues. These are conversations in which a single -AMT Worker plays both participants in a dialogue. Such dialogues are -surprisingly natural, efficient to collect and reflective of relevant and/or -trending topics. These self-dialogues provide training data for a generative -neural network as well as a basis for soft rules used by a matching score -component. Each match of a soft rule against a user utterance is associated -with a confidence score which we show is strongly indicative of reply quality, -allowing this component to self-censor and be effectively integrated with other -components. Edina's full architecture features a rule-based system backing off -to a matching score, backing off to a generative neural network. Our hybrid -data-driven methodology thus addresses both coverage limitations of a strictly -rule-based approach and the lack of guarantees of a strictly machine-learning -approach. -" -5843,1709.09885,"Gichang Lee, Jaeyun Jeong, Seungwan Seo, CzangYeob Kim, Pilsung Kang","Sentiment Classification with Word Attention based on Weakly Supervised - Learning with a Convolutional Neural Network",cs.CL," In order to maximize the applicability of sentiment analysis results, it is -necessary to not only classify the overall sentiment (positive/negative) of a -given document but also to identify the main words that contribute to the -classification. However, most datasets for sentiment analysis only have the -sentiment label for each document or sentence. In other words, there is no -information about which words play an important role in sentiment -classification. In this paper, we propose a method for identifying key words -discriminating positive and negative sentences by using a weakly supervised -learning method based on a convolutional neural network (CNN). 
In our model,
-each word is represented as a continuous-valued vector and each sentence is
-represented as a matrix whose rows correspond to the word vectors used in the
-sentence. Then, the CNN model is trained using these sentence matrices as
-inputs and the sentiment labels as the output. Once the CNN model is trained,
-we implement a word attention mechanism that identifies the words contributing
-most strongly to the classification results with a class activation map, using
-the weights from the fully connected layer at the end of the learned CNN model.
-In order to verify the proposed methodology, we evaluated the classification
-accuracy and inclusion rate of polarity words using two movie review datasets.
-Experimental results show that the proposed model can not only correctly
-classify the sentence polarity but also successfully identify the
-corresponding words with high polarity scores.
-"
-5844,1709.09927,Take Yo and Kazutoshi Sasahara,Inference of Personal Attributes from Tweets Using Machine Learning,cs.CY cs.CL cs.SI," Using machine learning algorithms, including deep learning, we studied the
-prediction of personal attributes from the text of tweets, such as gender,
-occupation, and age groups. We applied word2vec to construct word vectors,
-which were then used to vectorize tweet blocks. The resulting tweet vectors
-were used as inputs for training models, and the prediction accuracy of those
-models was examined as a function of the dimension of the tweet vectors and the
-size of the tweet blocks. The results showed that the machine learning
-algorithms could predict the three personal attributes of interest with 60-70%
-accuracy.
-"
-5845,1710.00053,"A. Cetoli, S. Bragaglia, A.D. O'Harney, M. Sloan",Graph Convolutional Networks for Named Entity Recognition,cs.CL," In this paper we investigate the role of the dependency tree in a named
-entity recognizer using a set of GCNs. We perform a comparison among
-different NER architectures and show that the grammar of a sentence positively
-influences the results. Experiments on the OntoNotes dataset demonstrate
-consistent performance improvements, without requiring heavy feature
-engineering nor additional language-specific knowledge.
-"
-5846,1709.10159,"Haji Mohammad Saleem, Kelly P Dillon, Susan Benesch and Derek Ruths",A Web of Hate: Tackling Hateful Speech in Online Social Spaces,cs.CL," Online social platforms are beset with hateful speech - content that
-expresses hatred for a person or group of people. Such content can frighten,
-intimidate, or silence platform users, and some of it can inspire other users
-to commit violence. Despite widespread recognition of the problems posed by
-such content, reliable solutions even for detecting hateful speech are lacking.
-In the present work, we establish why keyword-based methods are insufficient
-for detection. We then propose an approach to detecting hateful speech that
-uses content produced by self-identifying hateful communities as training data.
-Our approach bypasses the expensive annotation process often required to train
-keyword systems and performs well across several established platforms, making
-substantial improvements over current state-of-the-art approaches.
-"
-5847,1709.10191,"Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang and Bowen Zhou","Jointly Trained Sequential Labeling and Classification by Sparse
- Attention Neural Networks",cs.CL," Sentence-level classification and sequential labeling are two fundamental
-tasks in language understanding.
While these two tasks are usually modeled
-separately, in reality, they are often correlated, for example in intent
-classification and slot filling, or in topic classification and named-entity
-recognition. In order to utilize the potential benefits from their
-correlations, we propose a jointly trained model for learning the two tasks
-simultaneously via Long Short-Term Memory (LSTM) networks. This model predicts
-the sentence-level category and the word-level label sequence from the stepwise
-output hidden representations of the LSTM. We also introduce a novel mechanism
-of ""sparse attention"" to weigh words differently based on their semantic
-relevance to sentence-level classification. The proposed method outperforms
-baseline models on the ATIS and TREC datasets.
-"
-5848,1709.10204,Bin Bi and Hao Ma,A Neural Comprehensive Ranker (NCR) for Open-Domain Question Answering,cs.CL cs.AI cs.LG cs.NE," This paper proposes a novel neural machine reading model for open-domain
-question answering at scale. Existing machine comprehension models typically
-assume that a short piece of relevant text containing answers is already
-identified and given to the models, from which the models are designed to
-extract answers. This assumption, however, is not realistic for building a
-large-scale open-domain question answering system, which requires both deep
-text understanding and identifying relevant text from a corpus simultaneously.
- In this paper, we introduce the Neural Comprehensive Ranker (NCR), which
-integrates both passage ranking and answer extraction in one single framework.
-A Q&A system based on this framework allows users to issue an open-domain
-question without needing to provide a piece of text that must contain the
-answer. Experiments show that the unified NCR model is able to outperform the
-state-of-the-art in both retrieval of relevant text and answer extraction.
-"
-5849,1709.10217,"Wei-Nan Zhang, Zhigang Chen, Wanxiang Che, Guoping Hu, Ting Liu",The First Evaluation of Chinese Human-Computer Dialogue Technology,cs.CL," In this paper, we introduce the first evaluation of Chinese human-computer
-dialogue technology. We detail the evaluation scheme, tasks, metrics and how to
-collect and annotate the data for training, development and testing. The
-evaluation includes two tasks, namely user intent classification and online
-testing of task-oriented dialogue. To consider the different sources of the
-data for training and development, the first task can also be divided into two
-subtasks. Both tasks come from real problems encountered when using the
-applications developed by industry. The evaluation data is provided by the
-iFLYTEK Corporation. Meanwhile, in this paper, we publish the evaluation
-results to present the current performance of the participants in the two tasks
-of Chinese human-computer dialogue technology. Moreover, we analyze the
-existing problems of human-computer dialogue as well as the evaluation scheme
-itself.
-"
-5850,1709.10367,"Maja Rudolph, Francisco Ruiz, Susan Athey, David Blei",Structured Embedding Models for Grouped Data,cs.CL cs.LG stat.ML," Word embeddings are a powerful approach for analyzing language, and
-exponential family embeddings (EFE) extend them to other types of data. Here we
-develop structured exponential family embeddings (S-EFE), a method for
-discovering embeddings that vary across related groups of data. We study how
-the word usage of U.S.
Congressional speeches varies across states and party
-affiliation, how words are used differently across sections of the ArXiv, and
-how the co-purchase patterns of groceries can vary across seasons. Key to the
-success of our method is that the groups share statistical information. We
-develop two sharing strategies: hierarchical modeling and amortization. We
-demonstrate the benefits of this approach in empirical studies of speeches,
-abstracts, and shopping baskets. We show how S-EFE enables group-specific
-interpretation of word usage, and outperforms EFE in predicting held-out data.
-"
-5851,1709.10381,"Lasha Abzianidze, Johan Bos",Towards Universal Semantic Tagging,cs.CL," The paper proposes the task of universal semantic tagging---tagging word
-tokens with language-neutral, semantically informative tags. We argue that the
-task, with its independent nature, contributes to better semantic analysis for
-wide-coverage multilingual text. We present the initial version of the semantic
-tagset and show that (a) the tags provide semantically fine-grained
-information, and (b) they are suitable for cross-lingual semantic parsing. An
-application of the semantic tagging in the Parallel Meaning Bank supports both
-of these points as the tags contribute to formal lexical semantics and their
-cross-lingual projection. As a part of the application, we annotate a small
-corpus with the semantic tags and present a new baseline result for universal
-semantic tagging.
-"
-5852,1709.10423,"Yanchao Yu, Arash Eshghi, Oliver Lemon","Learning how to learn: an adaptive dialogue agent for incrementally
- learning visually grounded word meanings",cs.CL cs.AI cs.LG cs.RO," We present an optimised multi-modal dialogue agent for interactive learning
-of visually grounded word meanings from a human tutor, trained on real
-human-human tutoring data. Within a life-long interactive learning period, the
-agent, trained using Reinforcement Learning (RL), must be able to handle
-natural conversations with human users and achieve good learning performance
-(accuracy) while minimising human effort in the learning process. We train and
-evaluate this system in interaction with a simulated human tutor, which is
-built on the BURCHAK corpus -- a Human-Human Dialogue dataset for the visual
-learning task. The results show that: 1) The learned policy can coherently
-interact with the simulated user to achieve the goal of the task (i.e. learning
-visual attributes of objects, e.g. colour and shape); and 2) it finds a better
-trade-off between classifier accuracy and tutoring costs than hand-crafted
-rule-based policies, including ones with dynamic policies.
-"
-5853,1709.10426,"Yanchao Yu, Arash Eshghi, Oliver Lemon","Training an adaptive dialogue policy for interactive learning of
- visually grounded word meanings",cs.CL cs.AI cs.LG cs.RO," We present a multi-modal dialogue system for interactive learning of
-perceptually grounded word meanings from a human tutor. The system integrates
-an incremental, semantic parsing/generation framework - Dynamic Syntax and Type
-Theory with Records (DS-TTR) - with a set of visual classifiers that are
-learned throughout the interaction and which ground the meaning representations
-that it produces. We use this system in interaction with a simulated human
-tutor to study the effects of different dialogue policies and capabilities on
-the accuracy of learned meanings, learning rates, and efforts/costs to the
-tutor.
We show that the overall performance of the learning agent is affected
-by (1) who takes initiative in the dialogues; (2) the ability to express/use
-their confidence level about visual attributes; and (3) the ability to process
-elliptical and incrementally constructed dialogue turns. Ultimately, we train
-an adaptive dialogue policy which optimises the trade-off between classifier
-accuracy and tutoring costs.
-"
-5854,1709.10431,"Yanchao Yu, Arash Eshghi, Gregory Mills, Oliver Joseph Lemon","The BURCHAK corpus: a Challenge Data Set for Interactive Learning of
- Visually Grounded Word Meanings",cs.CL cs.AI cs.LG cs.RO," We motivate and describe a new freely available human-human dialogue dataset
-for interactive learning of visually grounded word meanings through ostensive
-definition by a tutor to a learner. The data has been collected using a novel,
-character-by-character variant of the DiET chat tool (Healey et al., 2003;
-Mills and Healey, submitted) with a novel task, where a Learner needs to learn
-invented visual attribute words (such as "" burchak "" for square) from a tutor.
-As such, the text-based interactions closely resemble face-to-face conversation
-and thus contain many of the linguistic phenomena encountered in natural,
-spontaneous dialogue. These include self- and other-correction, mid-sentence
-continuations, interruptions, overlaps, fillers, and hedges. We also present a
-generic n-gram framework for building user (i.e. tutor) simulations from this
-type of incremental data, which is freely available to researchers. We show
-that the simulations produce outputs that are similar to the original data
-(e.g. 78% turn match similarity). Finally, we train and evaluate a
-Reinforcement Learning dialogue control agent for learning visually grounded
-word meanings, trained from the BURCHAK corpus. The learned policy shows
-comparable performance to a rule-based system built previously.
-"
-5855,1709.10445,"Seunghyun Yoon, Pablo Estrada, Kyomin Jung",Synonym Discovery with Etymology-based Word Embeddings,cs.CL cs.AI," We propose a novel approach to learn word embeddings based on an extended
-version of the distributional hypothesis. Our model derives word embedding
-vectors using the etymological composition of words, rather than the context in
-which they appear. It has the strength of not requiring a large text corpus,
-but instead it requires reliable access to etymological roots of words, making
-it especially fit for languages with logographic writing systems. The model
-consists of three steps: (1) building an etymological graph, which is a
-bipartite network of words and etymological roots, (2) obtaining the
-biadjacency matrix of the etymological graph and reducing its dimensionality,
-(3) using columns/rows of the resulting matrices as embedding vectors. We test
-our model on the Chinese and Sino-Korean vocabularies. Our graphs are formed by
-a set of 117,000 Chinese words, and a set of 135,000 Sino-Korean words. In both
-cases we show that our model performs well in the task of synonym discovery.
-"
-5856,1709.10486,Casey Kennington and Sarah Plane,"Symbol, Conversational, and Societal Grounding with a Toy Robot",cs.CL," Essential to meaningful interaction is grounding at the symbolic,
-conversational, and societal levels. We present ongoing work with Anki's Cozmo
-toy robot as a research platform where we leverage the recent
-words-as-classifiers model of lexical semantics in interactive reference
-resolution tasks for language grounding.
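The three steps of the etymology-embedding model above map directly onto a few
lines of linear algebra: build the biadjacency matrix of the word/root graph,
reduce its dimensionality, and read off rows as vectors. The sketch below uses
invented Latin-derived roots in place of the paper's Chinese and Sino-Korean
data.

# A minimal sketch of etymology-based embeddings via truncated SVD of a
# word-by-root biadjacency matrix; the words and roots are toy examples.
import numpy as np

words = ["television", "telephone", "phonology"]
roots = ["tele", "vis", "phon", "logos"]
root_sets = {"television": {"tele", "vis"},
             "telephone":  {"tele", "phon"},
             "phonology":  {"phon", "logos"}}

# Biadjacency matrix of the bipartite word/root graph.
B = np.array([[1.0 if r in root_sets[w] else 0.0 for r in roots]
              for w in words])

# Truncated SVD: rows of U * S give low-dimensional word vectors.
U, S, _ = np.linalg.svd(B, full_matrices=False)
k = 2
vectors = U[:, :k] * S[:k]
for w, v in zip(words, vectors):
    print(w, np.round(v, 2))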
-" -5857,1710.00164,"Ta-Chung Chi, Po-Chun Chen, Shang-Yu Su, Yun-Nung Chen","Speaker Role Contextual Modeling for Language Understanding and Dialogue - Policy Learning",cs.CL," Language understanding (LU) and dialogue policy learning are two essential -components in conversational systems. Human-human dialogues are not -well-controlled and often random and unpredictable due to their own goals and -speaking habits. This paper proposes a role-based contextual model to consider -different speaker roles independently based on the various speaking patterns in -the multi-turn dialogues. The experiments on the benchmark dataset show that -the proposed role-based model successfully learns role-specific behavioral -patterns for contextual encoding and then significantly improves language -understanding and dialogue policy learning tasks. -" -5858,1710.00165,"Po-Chun Chen, Ta-Chung Chi, Shang-Yu Su, Yun-Nung Chen","Dynamic Time-Aware Attention to Speaker Roles and Contexts for Spoken - Language Understanding",cs.CL," Spoken language understanding (SLU) is an essential component in -conversational systems. Most SLU component treats each utterance independently, -and then the following components aggregate the multi-turn information in the -separate phases. In order to avoid error propagation and effectively utilize -contexts, prior work leveraged history for contextual SLU. However, the -previous model only paid attention to the content in history utterances without -considering their temporal information and speaker roles. In the dialogues, the -most recent utterances should be more important than the least recent ones. -Furthermore, users usually pay attention to 1) self history for reasoning and -2) others' utterances for listening, the speaker of the utterances may provides -informative cues to help understanding. Therefore, this paper proposes an -attention-based network that additionally leverages temporal information and -speaker role for better SLU, where the attention to contexts and speaker roles -can be automatically learned in an end-to-end manner. The experiments on the -benchmark Dialogue State Tracking Challenge 4 (DSTC4) dataset show that the -time-aware dynamic role attention networks significantly improve the -understanding performance. -" -5859,1710.00205,"Diana Nicoleta Popa, James Henderson",Bag-of-Vector Embeddings of Dependency Graphs for Semantic Induction,cs.CL," Vector-space models, from word embeddings to neural network parsers, have -many advantages for NLP. But how to generalise from fixed-length word vectors -to a vector space for arbitrary linguistic structures is still unclear. In this -paper we propose bag-of-vector embeddings of arbitrary linguistic graphs. A -bag-of-vector space is the minimal nonparametric extension of a vector space, -allowing the representation to grow with the size of the graph, but not tying -the representation to any specific tree or graph structure. We propose -efficient training and inference algorithms based on tensor factorisation for -embedding arbitrary graphs in a bag-of-vector space. We demonstrate the -usefulness of this representation by training bag-of-vector embeddings of -dependency graphs and evaluating them on unsupervised semantic induction for -the Semantic Textual Similarity and Natural Language Inference tasks. 
-" -5860,1710.00273,"Jason Xiaotian Dou, Michelle Liu, Haaris Muneer, Adam Schlussel",What Words Do We Use to Lie?: Word Choice in Deceptive Messages,cs.CL," Text messaging is the most widely used form of computer-mediated -communication (CMC). Previous findings have shown that linguistic factors can -reliably indicate messages as deceptive. For example, users take longer and use -more words to craft deceptive messages than they do truthful messages. Existing -research has also examined how factors, such as student status and gender, -affect rates of deception and word choice in deceptive messages. However, this -research has been limited by small sample sizes and has returned contradicting -findings. This paper aims to address these issues by using a dataset of text -messages collected from a large and varied set of participants using an Android -messaging application. The results of this paper show significant differences -in word choice and frequency of deceptive messages between male and female -participants, as well as between students and non-students. -" -5861,1710.00284,"Liqun Shao, Hao Zhang, Ming Jia, Jie Wang","Efficient and Effective Single-Document Summarizations and A - Word-Embedding Measurement of Quality",cs.IR cs.CL," Our task is to generate an effective summary for a given document with -specific realtime requirements. We use the softplus function to enhance keyword -rankings to favor important sentences, based on which we present a number of -summarization algorithms using various keyword extraction and topic clustering -methods. We show that our algorithms meet the realtime requirements and yield -the best ROUGE recall scores on DUC-02 over all previously-known algorithms. We -show that our algorithms meet the realtime requirements and yield the best -ROUGE recall scores on DUC-02 over all previously-known algorithms. To evaluate -the quality of summaries without human-generated benchmarks, we define a -measure called WESM based on word-embedding using Word Mover's Distance. We -show that the orderings of the ROUGE and WESM scores of our algorithms are -highly comparable, suggesting that WESM may serve as a viable alternative for -measuring the quality of a summary. -" -5862,1710.00286,"Liqun Shao, Jie Wang",DTATG: An Automatic Title Generator based on Dependency Trees,cs.IR cs.CL," We study automatic title generation for a given block of text and present a -method called DTATG to generate titles. DTATG first extracts a small number of -central sentences that convey the main meanings of the text and are in a -suitable structure for conversion into a title. DTATG then constructs a -dependency tree for each of these sentences and removes certain branches using -a Dependency Tree Compression Model we devise. We also devise a title test to -determine if a sentence can be used as a title. If a trimmed sentence passes -the title test, then it becomes a title candidate. DTATG selects the title -candidate with the highest ranking score as the final title. Our experiments -showed that DTATG can generate adequate titles. We also showed that -DTATG-generated titles have higher F1 scores than those generated by the -previous methods. 
-" -5863,1710.00341,"Georgi Karadzhov, Preslav Nakov, Lluis Marquez, Alberto Barron-Cedeno, - Ivan Koychev",Fully Automated Fact Checking Using External Sources,cs.CL," Given the constantly growing proliferation of false claims online in recent -years, there has been also a growing research interest in automatically -distinguishing false rumors from factually true claims. Here, we propose a -general-purpose framework for fully-automatic fact checking using external -sources, tapping the potential of the entire Web as a knowledge source to -confirm or reject a claim. Our framework uses a deep neural network with LSTM -text encoding to combine semantic kernels with task-specific embeddings that -encode a claim together with pieces of potentially-relevant text fragments from -the Web, taking the source reliability into account. The evaluation results -show good performance on two different tasks and datasets: (i) rumor detection -and (ii) fact checking of the answers to a question in community question -answering forums. -" -5864,1710.00346,Preslav Nakov and Stephan Vogel,Robust Tuning Datasets for Statistical Machine Translation,cs.CL," We explore the idea of automatically crafting a tuning dataset for -Statistical Machine Translation (SMT) that makes the hyper-parameters of the -SMT system more robust with respect to some specific deficiencies of the -parameter tuning algorithms. This is an under-explored research direction, -which can allow better parameter tuning. In this paper, we achieve this goal by -selecting a subset of the available sentence pairs, which are more suitable for -specific combinations of optimizers, objective functions, and evaluation -measures. We demonstrate the potential of the idea with the pairwise ranking -optimization (PRO) optimizer, which is known to yield too short translations. -We show that the learning problem can be alleviated by tuning on a subset of -the development set, selected based on sentence length. In particular, using -the longest 50% of the tuning sentences, we achieve two-fold tuning speedup, -and improvements in BLEU score that rival those of alternatives, which fix -BLEU+1's smoothing instead. -" -5865,1710.00372,"Roman Orus, Roger Martin, Juan Uriagereka",Mathematical foundations of matrix syntax,cs.CL quant-ph," Matrix syntax is a formal model of syntactic relations in language. The -purpose of this paper is to explain its mathematical foundations, for an -audience with some formal background. We make an axiomatic presentation, -motivating each axiom on linguistic and practical grounds. The resulting -mathematical structure resembles some aspects of quantum mechanics. Matrix -syntax allows us to describe a number of language phenomena that are otherwise -very difficult to explain, such as linguistic chains, and is arguably a more -economical theory of language than most of the theories proposed in the context -of the minimalist program in linguistics. In particular, sentences are -naturally modelled as vectors in a Hilbert space with a tensor product -structure, built from 2x2 matrices belonging to some specific group. -" -5866,1710.00453,"Stephanie Zhou, Alane Suhr, Yoav Artzi",Visual Reasoning with Natural Language,cs.CL," Natural language provides a widely accessible and expressive interface for -robotic agents. To understand language in complex environments, agents must -reason about the full range of language inputs and their correspondence to the -world. 
Such reasoning over language and vision is an open problem that is -receiving increasing attention. While existing data sets focus on visual -diversity, they do not display the full range of natural language expressions, -such as counting, set reasoning, and comparisons. - We propose a simple task for natural language visual reasoning, where images -are paired with descriptive statements. The task is to predict if a statement -is true for the given scene. This abstract describes our existing synthetic -image corpus and our current work on collecting real vision data. -" -5867,1710.00477,"Santiago Castro, Luis Chiruzzo, Aiala Ros\'a, Diego Garat and - Guillermo Moncecchi",A Crowd-Annotated Spanish Corpus for Humor Analysis,cs.CL," Computational Humor involves several tasks, such as humor recognition, humor -generation, and humor scoring, for which it is useful to have human-curated -data. In this work we present a corpus of 27,000 tweets written in Spanish and -crowd-annotated for their humor value and funniness score, with about four -annotations per tweet, tagged by 1,300 people over the Internet. It is equally -divided between tweets coming from humorous and non-humorous accounts. The -inter-annotator agreement Krippendorff's alpha value is 0.5710. The dataset is -available for general use and can serve as a basis for humor detection and as a -first step to tackle subjectivity. -" -5868,1710.00519,"Wenpeng Yin, Hinrich Sch\""utze","Attentive Convolution: Equipping CNNs with RNN-style Attention - Mechanisms",cs.CL," In NLP, convolutional neural networks (CNNs) have benefited less than -recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that -this is because the attention in CNNs has been mainly implemented as attentive -pooling (i.e., it is applied to pooling) rather than as attentive convolution -(i.e., it is integrated into convolution). Convolution is the differentiator of -CNNs in that it can powerfully model the higher-level representation of a word -by taking into account its local fixed-size context in the input text t^x. In -this work, we propose an attentive convolution network, ATTCONV. It extends the -context scope of the convolution operation, deriving higher-level features for -a word not only from local context, but also from information extracted from -nonlocal context by the attention mechanism commonly used in RNNs. This -nonlocal context can come (i) from parts of the input text t^x that are distant -or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence -modeling with zero-context (sentiment analysis), single-context (textual -entailment) and multiple-context (claim verification) demonstrate the -effectiveness of ATTCONV in sentence representation learning with the -incorporation of context. In particular, attentive convolution outperforms -attentive pooling and is a strong competitor to popular attentive RNNs. -" -5869,1710.00641,"Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio",Improving speech recognition by revising gated recurrent units,cs.CL cs.AI cs.LG cs.NE," Speech recognition is largely taking advantage of deep learning, showing that -substantial benefits can be obtained by modern Recurrent Neural Networks -(RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which -typically reach state-of-the-art performance in many tasks thanks to their -ability to learn long-term dependencies and robustness to vanishing gradients. 
-Nevertheless, LSTMs have a rather complex design with three multiplicative -gates, which might impair their efficient implementation. An attempt to simplify -LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just -two multiplicative gates. - This paper builds on these efforts by further revising GRUs and proposing a -simplified architecture potentially more suitable for speech recognition. The -contribution of this work is two-fold. First, we suggest to remove the reset -gate in the GRU design, resulting in a more efficient single-gate architecture. -Second, we propose to replace tanh with ReLU activations in the state update -equations. Results show that, in our implementation, the revised architecture -reduces the per-epoch training time by more than 30% and consistently -improves recognition performance across different tasks, input features, and -noisy conditions when compared to a standard GRU. -" -5870,1710.00683,"Xiaoyong Yan, Petter Minnhagen","The Dependence of Frequency Distributions on Multiple Meanings of Words, - Codes and Signs",cs.CL eess.AS physics.soc-ph," The dependence of frequency distributions on the multiple meanings of -words in a text is investigated by deleting letters. By coding the words with -fewer letters the number of meanings per coded word increases. This increase is -measured and used as an input in a predictive theory. For a text written in -English, the word-frequency distribution is broad and fat-tailed, whereas if -the words are only represented by their first letter the distribution becomes -exponential. Both distributions are well predicted by the theory, as is the -whole sequence obtained by consecutively representing the words by the first -L=6,5,4,3,2,1 letters. Comparisons of texts written by Chinese characters and -the same texts written by letter-codes are made, and the similarity of the -corresponding frequency-distributions is interpreted as a consequence of the -multiple meanings of Chinese characters. This further implies that the -difference in the shape of the word-frequency distributions for an English text written with -letters and a Chinese text written with Chinese characters is due to the coding -and not to the language per se. -" -5871,1710.00689,"Martin Boyanov, Ivan Koychev, Preslav Nakov, Alessandro Moschitti, - Giovanni Da San Martino","Building Chatbots from Forum Data: Model Selection Using Question - Answering Metrics",cs.CL," We propose to use question answering (QA) data from Web forums to train -chatbots from scratch, i.e., without dialog training data. First, we extract -pairs of question and answer sentences from the typically much longer texts of -questions and answers in a forum. We then use these shorter texts to train -seq2seq models in a more efficient way. We further improve the parameter -optimization using a new model selection strategy based on QA measures. -Finally, we propose to use extrinsic evaluation with respect to a QA task as an -automatic evaluation method for chatbots. The evaluation shows that the model -achieves a MAP of 63.5% on the extrinsic task. Moreover, it can answer -correctly 49.5% of the questions when they are similar to questions asked in -the forum, and 47.3% of the questions when they are more conversational in -style. -" -5872,1710.00803,Marcos Zampieri,Compiling and Processing Historical and Contemporary Portuguese Corpora,cs.CL," This technical report describes the framework used for processing three large -Portuguese corpora. 
Two corpora contain texts from newspapers, one published in -Brazil and the other published in Portugal. The third corpus is Colonia, a -historical Portuguese collection containing texts written between the 16th and -the early 20th century. The report presents pre-processing methods, -segmentation, and annotation of the corpora as well as indexing and querying -methods. Finally, it presents published research papers using the corpora. -" -5873,1710.00880,"Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, Andrew McCallum","Distributional Inclusion Vector Embedding for Unsupervised Hypernymy - Detection",cs.CL," Modeling hypernymy, such as poodle is-a dog, is an important generalization -aid to many NLP tasks, such as entailment, coreference, relation extraction, -and question answering. Supervised learning from labeled hypernym sources, such -as WordNet, limits the coverage of these models, which can be addressed by -learning hypernyms from unlabeled text. Existing unsupervised methods either do -not scale to large vocabularies or yield unacceptably poor accuracy. This paper -introduces distributional inclusion vector embedding (DIVE), a -simple-to-implement unsupervised method of hypernym discovery via per-word -non-negative vector embeddings which preserve the inclusion property of word -contexts in a low-dimensional and interpretable space. In experimental -evaluations more comprehensive than any previous literature of which we are -aware (evaluating on 11 datasets using multiple existing as well as newly -proposed scoring functions), we find that our method provides up to double the -precision of previous unsupervised embeddings, and the highest average -performance, using a much more compact word representation, and yielding many -new state-of-the-art results. -" -5874,1710.00888,Jose Berengueres and Dani Castro,Sentiment Perception of Readers and Writers in Emoji use,cs.IR cs.CL cs.HC," Previous research has traditionally analyzed emoji sentiment from the point -of view of the reader of the content, not the author. Here, we analyze emoji -sentiment from the point of view of the author and present an emoji sentiment -benchmark that was built from an employee happiness dataset where emoji happen -to be annotated with the daily happiness of the author of the comment. The data -spans over 3 years and covers 4k employees of 56 companies based in Barcelona. We -compare the sentiment of writers to that of readers. Results indicate that there is an 82% -agreement in how emoji sentiment is perceived by readers and writers. Finally, -we report that when authors use emoji they report higher levels of happiness. -Emoji use was not found to be correlated with differences in author moodiness. -" -5875,1710.00923,Michael Gasser,"Minimal Dependency Translation: a Framework for Computer-Assisted - Translation for Under-Resourced Languages",cs.CL," This paper introduces Minimal Dependency Translation (MDT), an ongoing -project to develop a rule-based framework for the creation of rudimentary -bilingual lexicon-grammars for machine translation and computer-assisted -translation into and out of under-resourced languages as well as initial steps -towards an implementation of MDT for English-to-Amharic translation. The basic -units in MDT, called groups, are headed multi-item sequences. In addition to -wordforms, groups may contain lexemes, syntactic-semantic categories, and -grammatical features. Each group is associated with one or more translations, -each of which is a group in a target language. 
During translation, constraint -satisfaction is used to select a set of source-language groups for the input -sentence and to sequence the words in the associated target-language groups. -" -5876,1710.00936,M. Stone and R. Arora,"Identifying Nominals with No Head Match Co-references Using Deep - Learning",cs.CL," Identifying nominals with no head match is a long-standing challenge in -coreference resolution, with current systems performing significantly worse than -humans. In this paper we present a new neural network architecture which -outperforms the current state-of-the-art system on the English portion of the -CoNLL 2012 Shared Task. This is done by using a logistic regression on features -produced by two submodels, one of which has the architecture proposed in -[CM16a] while the other combines domain-specific embeddings of the antecedent -and the mention. We also propose some simple additional features which seem to -improve performance for all models substantially, increasing F1 by almost 4% on -basic logistic regression and other complex models. -" -5877,1710.00969,"Yukun Yan, Daqi Zheng, Zhengdong Lu, Sen Song","Event Identification as a Decision Process with Non-linear - Representation of Text",cs.CL," We propose the scale-free Identifier Network (sfIN), a novel model for event -identification in documents. In general, sfIN first encodes a document into -multi-scale memory stacks, then extracts special events by conducting -multi-scale actions, which can be considered a special type of sequence -labelling. The design of large-scale actions makes processing a long document -more efficient. The whole model is trained with both supervised learning and -reinforcement learning. -" -5878,1710.00987,Jialiang Zhao and Qi Gao,"Annotation and Detection of Emotion in Text-based Dialogue Systems with - CNN",cs.CL," Knowledge of users' emotion states helps improve human-computer interaction. -In this work, we present EmoNet, an emotion detector for Chinese daily -dialogues based on deep convolutional neural networks. In order to maintain the -original linguistic features, such as word order, commonly used methods like -segmentation and keyword extraction were not adopted; instead, we increased the -depth of the CNN and tried to let the CNN learn inner linguistic relationships. Our -main contribution is that we present a new model and a new pipeline which can -be used in multi-language environments to solve sentiment problems. -Experimental results show that EmoNet has a great capacity for learning the emotion -of dialogues and achieves better results than other state-of-the-art detectors. -" -5879,1710.00998,"Emmanuele Chersoni, Enrico Santus, Philippe Blache, Alessandro Lenci","Is Structure Necessary for Modeling Argument Expectations in - Distributional Semantics?",cs.CL," Despite the number of NLP studies dedicated to thematic fit estimation, -little attention has been paid to the related task of composing and updating -verb argument expectations. The few exceptions have mostly modeled this -phenomenon with structured distributional models, implicitly assuming a -similarly structured representation of events. Recent experimental evidence, -however, suggests that the human processing system could also exploit an -unstructured ""bag-of-arguments"" type of event representation to predict -upcoming input. 
In this paper, we re-implement a traditional structured model -and adapt it to compare the different hypotheses concerning the degree of -structure in our event knowledge, evaluating their relative performance in the -task of updating argument expectations. -" -5880,1710.01025,Raj Dabre and Sadao Kurohashi,"MMCR4NLP: Multilingual Multiway Corpora Repository for Natural Language - Processing",cs.CL," Multilinguality is gradually becoming ubiquitous in the sense that more and -more researchers have successfully shown that using additional languages helps -improve the results in many Natural Language Processing tasks. Multilingual -Multiway Corpora (MMC) contain the same sentence in multiple languages. Such -corpora have been primarily used for Multi-Source and Pivot Language Machine -Translation but are also useful for developing multilingual sequence taggers by -transfer learning. While these corpora are available, they are not organized -for multilingual experiments and researchers need to write boilerplate code -every time they want to use said corpora. Moreover, because there is no -official MMC collection, it becomes difficult to compare against existing -approaches. As such, we present our work on creating a unified and -systematically organized repository of MMC spanning a large number of -languages. We also provide training, development and test splits for corpora -where official splits are unavailable. We hope that this will help speed up the -pace of multilingual NLP research and ensure that NLP researchers obtain -results that are more trustworthy since they can be compared easily. We indicate -corpus sources, extraction procedures (if any) and relevant statistics. We also -make our collection public for research purposes. -" -5881,1710.01093,"Helen L. Bear, Richard W. Harvey, Barry-John Theobald, and Yuxuan Lan","Which phoneme-to-viseme maps best improve visual-only computer - lip-reading?",cs.CV cs.CL eess.AS," A critical assumption of all current visual speech recognition systems is -that there are visual speech units called visemes which can be mapped to units -of acoustic speech, the phonemes. Despite there being a number of published -maps, it is infrequent to see their effectiveness tested, particularly on -visual-only lip-reading (many works use audio-visual speech). Here we examine -120 mappings and consider if any are stable across talkers. We show a method -for devising maps based on phoneme confusions from an automated lip-reading -system, and we present new mappings that show improvements for individual -talkers. -" -5882,1710.01095,"Ingrid Falk, Fabienne Martin",Towards an Inferential Lexicon of Event Selecting Predicates for French,cs.CL," We present a manually constructed seed lexicon encoding the inferential -profiles of French event selecting predicates across different uses. The -inferential profile (Karttunen, 1971a) of a verb is designed to capture the -inferences triggered by the use of this verb in context. It reflects the -influence of the clause-embedding verb on the factuality of the event described -by the embedded clause. 
The resource developed provides evidence for the -following three hypotheses: (i) French implicative verbs have an aspect -dependent profile (their inferential profile varies with outer aspect), while -factive verbs have an aspect independent profile (they keep the same -inferential profile with both imperfective and perfective aspect); (ii) -implicativity decreases with imperfective aspect: the inferences triggered by -French implicative verbs combined with perfective aspect are often weakened -when the same verbs are combined with imperfective aspect; (iii) implicativity -decreases with an animate (deep) subject: the inferences triggered by a verb -which is implicative with an inanimate subject are weakened when the same verb -is used with an animate subject. The resource additionally shows that verbs -with different inferential profiles display clearly distinct sub-categorisation -patterns. In particular, verbs that have both factive and implicative readings -are shown to prefer infinitival clauses in their implicative reading, and -tensed clauses in their factive reading. -" -5883,1710.01142,"Helen L. Bear, Richard W. Harvey, Yuxuan Lan",Finding phonemes: improving machine lip-reading,cs.CV cs.CL eess.AS," In machine lip-reading there is continued debate and research around the -correct classes to be used for recognition. In this paper we use a structured -approach for devising speaker-dependent viseme classes, which enables the -creation of a set of phoneme-to-viseme maps where each has a different number -of visemes ranging from two to 45. Viseme classes are based upon the mapping of -articulated phonemes, which have been confused during phoneme recognition, into -viseme groups. Using these maps, with the LiLIR dataset, we show the effect of -changing the viseme map size in speaker-dependent machine lip-reading, measured -by word recognition correctness, and so demonstrate that word recognition with -phoneme classifiers is not just possible, but often better than word -recognition with viseme classifiers. Furthermore, there are intermediate units -between visemes and phonemes which are better still. -" -5884,1710.01329,"Toan Q. Nguyen, David Chiang",Improving Lexical Choice in Neural Machine Translation,cs.CL," We explore two solutions to the problem of mistranslating rare words in -neural machine translation. First, we argue that the standard output layer, -which computes the inner product of a vector representing the context with all -possible output word embeddings, rewards frequent words disproportionately, and -we propose to fix the norms of both vectors to a constant value. Second, we -integrate a simple lexical module which is jointly trained with the rest of the -model. We evaluate our approaches on eight language pairs with data sizes -ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, -surpassing phrase-based translation in nearly all settings. -" -5885,1710.01411,"Maryam Aminian, Mohammad Sadegh Rasooli, Mona Diab",Transferring Semantic Roles Using Translation and Syntactic Information,cs.CL," Our paper addresses the problem of annotation projection for semantic role -labeling for resource-poor languages using supervised annotations from a -resource-rich language through parallel data. We propose a transfer method that -employs information from source and target syntactic dependencies as well as -word alignment density to improve the quality of an iterative bootstrapping -method. 
Our experiments yield a $3.5$ absolute labeled F-score improvement over -a standard annotation projection method. -" -5886,1710.01487,"Giovanni Da San Martino, Salvatore Romeo, Alberto Barron-Cedeno, - Shafiq Joty, Lluis Marquez, Alessandro Moschitti, Preslav Nakov",Cross-Language Question Re-Ranking,cs.CL," We study how to find relevant questions in community forums when the language -of the new questions is different from that of the existing questions in the -forum. In particular, we explore the Arabic-English language pair. We compare a -kernel-based system with a feed-forward neural network in a scenario where a -large parallel corpus is available for training a machine translation system, -bilingual dictionaries, and cross-language word embeddings. We observe that -both approaches degrade the performance of the system when working on the -translated text, especially the kernel-based system, which depends heavily on a -syntactic kernel. We address this issue using a cross-language tree kernel, -which compares the original Arabic tree to the English trees of the related -questions. We show that this kernel almost closes the performance gap with -respect to the monolingual system. On the neural network side, we use the -parallel corpus to train cross-language embeddings, which we then use to -represent the Arabic input and the English related questions in the same space. -The results also improve to close to those of the monolingual neural network. -Overall, the kernel system shows better performance than the neural -network in all cases. -" -5887,1710.01492,Preslav Nakov,Semantic Sentiment Analysis of Twitter Data,cs.CL," The Internet and the proliferation of smart mobile devices have changed the way -information is created, shared, and spread: microblogs such as Twitter, -weblogs such as LiveJournal, social networks such as Facebook, and instant -messengers such as Skype and WhatsApp are now commonly used to share thoughts -and opinions about anything in the surrounding world. This has resulted in the -proliferation of social media content, thus creating new opportunities to study -public opinion at a scale that was never possible before. Naturally, this -abundance of data has quickly attracted business and research interest from -various fields including marketing, political science, and social studies, -among many others, which are interested in questions like these: Do people like -the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel about -Brexit? Answering these questions requires studying the sentiment of -opinions people express in social media, which has given rise to the fast -growth of the field of sentiment analysis in social media, with Twitter being -especially popular for research due to its scale, representativeness, variety -of topics discussed, as well as ease of public access to its messages. Here we -present an overview of work on sentiment analysis on Twitter. -" -5888,1710.01504,"Shafiq Joty, Francisco Guzm\'an, Llu\'is M\`arquez, Preslav Nakov",Discourse Structure in Machine Translation Evaluation,cs.CL," In this article, we explore the potential of using sentence-level discourse -structure for machine translation evaluation. We first design discourse-aware -similarity measures, which use all-subtree kernels to compare discourse parse -trees in accordance with the Rhetorical Structure Theory (RST). 
Then, we show -that a simple linear combination of these measures can help improve various -existing machine translation evaluation metrics in terms of correlation with -human judgments at both the segment and the system level. This suggests -that discourse information is complementary to the information used by many of -the existing evaluation metrics, and thus it could be taken into account when -developing richer evaluation metrics, such as the WMT-14 winning combined -metric DiscoTK-party. We also provide a detailed analysis of the relevance of -various discourse elements and relations from the RST parse trees for machine -translation evaluation. In particular, we show that: (i) all aspects of the RST -tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) -the similarity of the translation RST tree to the reference tree is positively -correlated with translation quality. -" -5889,1710.01507,"Vaibhav Kumar, Dhruv Khattar, Siddhartha Gairola, Yash Kumar Lal, - Vasudeva Varma",Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks,cs.IR cs.CL cs.CY cs.SI," Online media outlets, in a bid to expand their reach and subsequently -increase revenue through ad monetisation, have begun adopting clickbait -techniques to lure readers to click on articles whose content fails to fulfill -the promise made by the headline. Traditional methods for clickbait detection -have relied heavily on feature engineering which, in turn, is dependent on the -dataset it is built for. The application of neural networks for this task has -only been explored partially. We propose a novel approach considering all -information found in a social media post. We train a bidirectional LSTM with an -attention mechanism to learn the extent to which a word contributes to the -post's clickbait score in a differential manner. We also employ a Siamese net -to capture the similarity between source and target information. Information -gleaned from images has not been considered in previous approaches. We learn -image embeddings from large amounts of data using Convolutional Neural Networks -to add another layer of complexity to our model. Finally, we concatenate the -outputs from the three separate components, serving them as input to a fully -connected layer. We conduct experiments over a test corpus of 19,538 social -media posts, attaining an F1 score of 65.37% on the dataset, bettering the -previous state of the art as well as other proposed approaches, feature -engineering or otherwise. -" -5890,1710.01779,"Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone Paolo - Ponzetto, Chris Biemann",Building a Web-Scale Dependency-Parsed Corpus from CommonCrawl,cs.CL," We present DepCC, the largest-to-date linguistically analyzed corpus in -English, including 365 million documents composed of 252 billion tokens and 7.5 -billion named entity occurrences in 14.3 billion sentences from a web-scale -crawl of the \textsc{Common Crawl} project. The sentences are processed with a -dependency parser and with a named entity tagger and contain provenance -information, enabling various applications ranging from training syntax-based -word embeddings to open information extraction and question answering. We built -an index of all sentences and their linguistic meta-data enabling quick search -across the corpus. 
We demonstrate the utility of this corpus on the verb -similarity task by showing that a distributional model trained on our corpus -yields better results than models trained on smaller corpora, like Wikipedia. -On the SimVerb3500 dataset, this distributional model outperforms the -state-of-the-art verb similarity models trained on smaller corpora. -" -5891,1710.01789,"Aodong Li, Shiyue Zhang, Dong Wang and Thomas Fang Zheng",Enhanced Neural Machine Translation by Learning from Draft,cs.CL," Neural machine translation (NMT) has recently achieved impressive results. A -potential problem of the existing NMT algorithm, however, is that the decoding -is conducted from left to right, without considering the right context. This -paper proposes a two-stage approach to solve the problem. In the first stage, -a conventional attention-based NMT system is used to produce a draft -translation, and in the second stage, a novel double-attention NMT system is -used to refine the translation, by looking at the original input as well as the -draft translation. This drafting-and-refinement can obtain the right-context -information from the draft, hence producing more consistent translations. We -evaluated this approach using two Chinese-English translation tasks, with -44k and 1M sentence pairs respectively. The experiments showed that our approach -achieved positive improvements over the conventional NMT system: the -improvements are 2.4 and 0.9 BLEU points on the small-scale and large-scale -tasks, respectively. -" -5892,1710.01799,"Kenneth C. Arnold, Kai-Wei Chang, Adam T. Kalai",Counterfactual Language Model Adaptation for Suggesting Phrases,cs.CL," Mobile devices use language models to suggest words and phrases for use in -text entry. Traditional language models are based on contextual word frequency -in a static corpus of text. However, certain types of phrases, when offered to -writers as suggestions, may be systematically chosen more often than their -frequency would predict. In this paper, we propose the task of generating -suggestions that writers accept, a task related to but distinct from making accurate -predictions. Although this task is fundamentally interactive, we propose a -counterfactual setting that permits offline training and evaluation. We find -that even a simple language model can capture text characteristics that improve -acceptability. -" -5893,1710.01809,"Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, Dominic Telaar, Tanja - Schultz","Syntactic and Semantic Features For Code-Switching Factored Language - Models",cs.CL," This paper presents our latest investigations on different features for -factored language models for Code-Switching speech and their effect on -automatic speech recognition (ASR) performance. We focus on syntactic and -semantic features which can be extracted from Code-Switching text data and -integrate them into factored language models. Different possible factors, such -as words, part-of-speech tags, Brown word clusters, open class words and -clusters of open class word embeddings are explored. The experimental results -reveal that Brown word clusters, part-of-speech tags and open-class words are -the most effective at reducing the perplexity of factored language models on -the Mandarin-English Code-Switching corpus SEAME. In ASR experiments, the model -containing Brown word clusters and part-of-speech tags and the model also -including clusters of open class word embeddings yield the best mixed error -rate results. 
In summary, the best language model can significantly reduce the -perplexity on the SEAME evaluation set by up to 10.8% relative and the mixed -error rate by up to 3.4% relative. -" -5894,1710.01949,"Herman Kamper, Gregory Shakhnarovich, Karen Livescu","Semantic speech retrieval with a visually grounded model of - untranscribed speech",cs.CL cs.CV eess.AS," There is growing interest in models that can learn from unlabelled speech -paired with visual context. This setting is relevant for low-resource speech -processing, robotics, and human language acquisition research. Here we study -how a visually grounded speech model, trained on images of scenes paired with -spoken captions, captures aspects of semantics. We use an external image tagger -to generate soft text labels from images, which serve as targets for a neural -model that maps untranscribed speech to (semantic) keyword labels. We introduce -a newly collected data set of human semantic relevance judgements and an -associated task, semantic speech retrieval, where the goal is to search for -spoken utterances that are semantically relevant to a given text query. Without -seeing any text, the model trained on parallel speech and images achieves a -precision of almost 60% on its top ten semantic retrievals. Compared to a -supervised model trained on transcriptions, our model matches human judgements -better by some measures, especially in retrieving non-verbatim semantic -matches. We perform an extensive analysis of the model and its resulting -representations. -" -5895,1710.01977,"Xinyue Cao, Thai Le, Jason (Jiasheng) Zhang",Machine Learning Based Detection of Clickbait Posts in Social Media,cs.CL," Clickbait headlines make use of misleading titles that hide critical -information from, or exaggerate the content on, the landing target pages to -entice clicks. As clickbaits often use eye-catching wording to attract viewers, -target contents are often of low quality. Clickbaits are especially widespread -on social media such as Twitter, adversely impacting user experience by causing -immense dissatisfaction. Hence, it has become increasingly important to put -forward a widely applicable approach to identify and detect clickbaits. In this -paper, we make use of a dataset from the clickbait challenge 2017 -(clickbait-challenge.com) comprising over 21,000 headlines/titles, each of -which is annotated by at least five judgments from crowdsourcing on how -clickbait it is. We attempt to build an effective computational clickbait -detection model on this dataset. We first considered a total of 331 features, -filtered out many features to avoid overfitting and improve the running time of -learning, and eventually selected the 60 most important features for our final -model. Using these features, Random Forest Regression achieved the following -results: MSE=0.035, Accuracy=0.82, and F1-score=0.61 on the clickbait class. -" -5896,1710.02076,"Ignacio Cases, Minh-Thang Luong, Christopher Potts",On the Effective Use of Pretraining for Natural Language Inference,cs.CL," Neural networks have excelled at many NLP tasks, but there remain open -questions about the performance of pretrained distributed word representations -and their interaction with weight initialization and other hyperparameters. We -address these questions empirically using attention-based sequence-to-sequence -models for natural language inference (NLI). 
Specifically, we compare three -types of embeddings: random, pretrained (GloVe, word2vec), and retrofitted -(pretrained plus WordNet information). We show that pretrained embeddings -outperform both random and retrofitted ones in a large NLI corpus. Further -experiments on more controlled data sets shed light on the contexts for which -retrofitted embeddings can be useful. We also explore two principled approaches -to initializing the rest of the model parameters, Gaussian and orthogonal, -showing that the latter yields gains of up to 2.9% in the NLI task. -" -5897,1710.02086,"Sreelekha S, Pushpak Bhattacharyya",Indowordnets help in Indian Language Machine Translation,cs.CL," Indian languages being resource-poor, the development of Indian-Indian and -English-Indian MT systems faces difficulties in translating various lexical -phenomena. In this paper, we present our work on a comparative study of 440 -phrase-based statistically trained models for 110 language pairs across 11 Indian -languages. We have developed 110 baseline Statistical Machine Translation -systems. Then we have augmented the training corpus with Indowordnet synset -word entries from the lexical database and further trained 110 models on top of the -baseline systems. We have done a detailed performance comparison using various -evaluation metrics such as BLEU score, METEOR and TER. We observed significant -improvements in translation quality across all the 440 models -after using Indowordnet. These experiments give detailed insight in two -ways: (1) usage of a lexical database with synset mapping for resource-poor -languages; (2) efficient usage of Indowordnet synset mapping. Moreover, synset-mapped -lexical entries helped the SMT system handle ambiguity to a great -extent during translation. -" -5898,1710.02093,"Sreelekha S, Pushpak Bhattacharyya",Morphology Generation for Statistical Machine Translation,cs.CL," When translating into morphologically rich languages, Statistical MT -approaches face the problem of data sparsity. The severity of the sparseness -problem is high when the corpus of the morphologically richer language is -small. Even though we can use factored models to correctly generate -morphological forms of words, the problem of data sparseness limits their -performance. In this paper, we describe a simple and effective solution which -is based on enriching the input corpora with various morphological forms of -words. We use this method in phrase-based and factor-based experiments on -two morphologically rich languages, Hindi and Marathi, when translating from -English. We evaluate the performance of our experiments both in terms of automatic -evaluation and subjective evaluation, such as adequacy and fluency. We observe -that the morphology injection method helps in improving the quality of -translation. We further find that the morphology injection method helps in -handling the data sparseness problem to a great extent. -" -5899,1710.02095,"Francisco Guzm\'an, Shafiq R. Joty, Llu\'is M\`arquez, Preslav Nakov",Machine Translation Evaluation with Neural Networks,cs.CL," We present a framework for machine translation evaluation using neural -networks in a pairwise setting, where the goal is to select the better -translation from a pair of hypotheses, given the reference translation. 
In this -framework, lexical, syntactic and semantic information from the reference and -the two hypotheses is embedded into compact distributed vector representations, -and fed into a multi-layer neural network that models nonlinear interactions -between each of the hypotheses and the reference, as well as between the two -hypotheses. We experiment with the benchmark datasets from the WMT Metrics -shared task, on which we obtain the best results published so far, with the -basic network configuration. We also perform a series of experiments to analyze -and understand the contribution of the different components of the network. We -evaluate variants and extensions, including fine-tuning of the semantic -embeddings, and sentence-based representations modeled with convolutional and -recurrent neural networks. In summary, the proposed framework is flexible and -generalizable, allows for efficient learning and scoring, and provides an MT -evaluation metric that correlates with human judgments and is on par with the -state of the art. -" -5900,1710.02100,"Sreelekha S, Pushpak Bhattacharyya",Phrase Pair Mappings for Hindi-English Statistical Machine Translation,cs.CL," In this paper, we present our work on the creation of lexical resources for -Machine Translation between English and Hindi. We describe the development -of phrase pair mappings for our experiments and a comparative performance -evaluation of different trained models on top of the baseline Statistical -Machine Translation system. We focused on augmenting the parallel corpus with -more vocabulary as well as with various inflected forms by exploring different -ways. We have augmented the training corpus with various lexical resources such -as lexical words, synset words, function words and verb phrases. We have -described case studies, automatic and subjective evaluations, and detailed -error analysis for both the English to Hindi and Hindi to English machine -translation systems. We further observed an incremental growth -in the quality of machine translation with the usage of various lexical -resources. Thus, lexical resources do help uplift the translation quality of -resource-poor languages. -" -5901,1710.02187,Benjamin Heinzerling and Michael Strube,BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages,cs.CL," We present BPEmb, a collection of pre-trained subword unit embeddings in 275 -languages, based on Byte-Pair Encoding (BPE). In an evaluation using -fine-grained entity typing as testbed, BPEmb performs competitively, and for -some languages better than alternative subword approaches, while requiring -vastly fewer resources and no tokenization. BPEmb is available at -https://github.com/bheinzerling/bpemb -" -5902,1710.02318,"Shuming Ma, Xu Sun","A Semantic Relevance Based Neural Network for Text Summarization and - Text Simplification",cs.CL," Text summarization and text simplification are two major ways to simplify the -text for poor readers, including children, non-native speakers, and the -functionally illiterate. Text summarization is to produce a brief summary of -the main ideas of the text, while text simplification aims to reduce the -linguistic complexity of the text and retain the original meaning. Recently, -most approaches for text summarization and text simplification are based on the -sequence-to-sequence model, which achieves much success in many text generation -tasks. 
However, although the generated simplified texts are literally similar to the source -texts, they have low semantic relevance. In this work, our goal is to -improve semantic relevance between source texts and simplified texts for text -summarization and text simplification. We introduce a Semantic Relevance Based -neural model to encourage high semantic similarity between texts and summaries. -In our model, the source text is represented by a gated attention encoder, -while the summary representation is produced by a decoder. In addition, the -similarity score between the representations is maximized during training. Our -experiments show that the proposed model outperforms the state-of-the-art -systems on two benchmark corpora. -" -5903,1710.02365,"Pavel Kr\'al, Ladislav Lenc",Czech Text Document Corpus v 2.0,cs.CL," This paper introduces ""Czech Text Document Corpus v 2.0"", a collection of -text documents for automatic document classification in Czech language. It is -composed of the text documents provided by the Czech News Agency and is freely -available for research purposes at http://ctdc.kiv.zcu.cz/. This corpus was -created in order to facilitate a straightforward comparison of the document -classification approaches on Czech data. It is particularly dedicated to -evaluation of multi-label document classification approaches, because one -document is usually labelled with more than one label. Besides the information -about the document classes, the corpus is also annotated at the morphological -layer. This paper further shows the results of selected state-of-the-art -methods on this corpus to offer the possibility of an easy comparison with -these approaches. -" -5904,1710.02398,"Sreelekha S, Pushpak Bhattacharyya",Bilingual Words and Phrase Mappings for Marathi and Hindi SMT,cs.CL," The lack of proper linguistic resources is the major challenge faced in -Machine Translation system development when dealing with resource-poor -languages. In this paper, we describe effective ways to utilize lexical -resources to improve the quality of statistical machine translation. Our -research on the usage of lexical resources mainly focused on two directions: -augmenting the parallel corpus with more vocabulary and providing various word -forms. We have augmented the training corpus with various lexical resources -such as lexical words, function words, kridanta pairs and verb phrases. We have -described the case studies, evaluations and detailed error analysis for both -Marathi to Hindi and Hindi to Marathi machine translation systems. From the -evaluations we observed that there is an incremental growth in the quality of -machine translation as the usage of various lexical resources increases. -Moreover, usage of various lexical resources helps to improve the coverage and -quality of machine translation where only a limited parallel corpus is available. -" -5905,1710.02437,James Henderson,"Learning Word Embeddings for Hyponymy with Entailment-Based - Distributional Semantics",cs.CL," Lexical entailment, such as hyponymy, is a fundamental issue in the semantics -of natural language. This paper proposes distributional semantic models which -efficiently learn word embeddings for entailment, using a recently-proposed -framework for modelling entailment in a vector space. These models postulate a -latent vector for a pseudo-phrase containing two neighbouring word vectors. 
We -investigate modelling words either as the evidence they contribute about this -phrase vector or as the posterior distribution of a one-word phrase vector, -and find that the posterior vectors perform better. The resulting word -embeddings outperform the best previous results on predicting hyponymy between -words, in unsupervised and semi-supervised experiments. -" -5906,1710.02514,"Monireh Ebrahimi, Amir Hossein Yazdavar, Amit Sheth",On the Challenges of Sentiment Analysis for Dynamic Events,cs.CL," With the proliferation of social media over the last decade, determining -people's attitudes with respect to a specific topic, document, interaction or -event has fueled research interest in natural language processing and -introduced a new research area called sentiment and emotion analysis. For instance, -businesses routinely look to develop systems to automatically understand their -customer conversations by identifying the relevant content to enhance the marketing -of their products and the management of their reputations. Previous efforts to assess -people's sentiment on Twitter have suggested that Twitter may be a valuable -resource for studying political sentiment and that it reflects the offline -political landscape. According to a Pew Research Center report, in January 2016, -44 percent of US adults stated having learned about the presidential election -through social media. Furthermore, 24 percent reported using social media -posts of the two candidates as a source of news and information, which is more -than the 15 percent who have used both candidates' websites or emails combined. -The first presidential debate between Trump and Hillary Clinton was the most tweeted -debate ever, with 17.1 million tweets. -" -5907,1710.02560,Mirco Ravanelli and Maurizio Omologo,"The DIRHA-English corpus and related tasks for distant-speech - recognition in domestic environments",eess.AS cs.CL cs.SD," This paper introduces the contents and the possible usage of the -DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA -project. The reference scenario is a domestic environment equipped with a large -number of microphones and microphone arrays distributed in space. - The corpus is composed of both real and simulated material, and it includes -12 US and 12 UK English native speakers. Each speaker uttered different sets of -phonetically-rich sentences, newspaper articles, conversational speech, -keywords, and commands. From this material, a large set of 1-minute sequences -was generated, which also includes typical domestic background noise as well as -inter/intra-room reverberation effects. Dev and test sets were derived, which -represent very valuable material for different studies on multi-microphone -speech processing and distant-speech recognition. Various tasks and -corresponding Kaldi recipes have already been developed. - The paper reports a first set of baseline results obtained using different -techniques, including Deep Neural Networks (DNN), aligned with the -state of the art at the international level. -" -5908,1710.02569,"Ximena Gutierrez-Vasques, Victor Mijangos","Low-resource bilingual lexicon extraction using graph based word - embeddings",cs.CL," In this work we focus on the task of automatically extracting a bilingual -lexicon for the language pair Spanish-Nahuatl. This is a low-resource setting -where only a small parallel corpus is available. Most -downstream methods do not work well under low-resource conditions. 
This is -especially true for approaches that use vectorial representations like -Word2Vec. Our proposal is to construct bilingual word vectors from a graph. -This graph is generated using translation pairs obtained from an unsupervised -word alignment method. - We show that, in a low-resource setting, such vectors are successful -in representing words in a bilingual semantic space. Moreover, when a linear -transformation is applied to translate words from one language to another, our -graph-based representations considerably outperform the popular setting that -uses Word2Vec. -" -5909,1710.02603,Aaron Jaech and Mari Ostendorf,Low-Rank RNN Adaptation for Context-Aware Language Modeling,cs.CL," A context-aware language model uses location, user and/or domain metadata -(context) to adapt its predictions. In neural language models, context -information is typically represented as an embedding and it is given to the RNN -as an additional input, which has been shown to be useful in many applications. -We introduce a more powerful mechanism for using context to adapt an RNN by -letting the context vector control a low-rank transformation of the recurrent -layer weight matrix. Experiments show that allowing a greater fraction of the -model parameters to be adjusted has benefits in terms of perplexity and -classification for several different types of context. -" -5910,1710.02650,Johannes Schneider,Topic Modeling based on Keywords and Context,cs.CL cs.IR cs.LG," Current topic models often suffer from discovering topics not matching human -intuition, unnatural switching of topics within documents and high -computational demands. We address these concerns by proposing a topic model and -an inference algorithm based on automatically identifying characteristic -keywords for topics. Keywords influence topic-assignments of nearby words. Our -algorithm learns (key)word-topic scores and it self-regulates the number of -topics. Inference is simple and easily parallelizable. Qualitative analysis -yields comparable results to state-of-the-art models (e.g., LDA), but with -different strengths and weaknesses. Quantitative analysis using 9 datasets -shows gains in terms of classification accuracy, PMI score, computational -performance and consistency of topic assignments within documents, while most -often using fewer topics. -" -5911,1710.02717,"Mingbo Ma, Liang Huang, Bing Xiang and Bowen Zhou",Group Sparse CNNs for Question Classification with Answer Sets,cs.CL," Question classification is an important task with wide applications. However, -traditional techniques treat questions as general sentences, ignoring the -corresponding answer data. In order to incorporate answer information into -question modeling, we first introduce novel group sparse autoencoders which -refine question representation by utilizing group information in the answer -set. We then propose novel group sparse CNNs which naturally learn question -representation with respect to their answers by implanting group sparse -autoencoders into traditional CNNs. The proposed model significantly outperforms -strong baselines on four datasets. -" -5912,1710.02718,"Mingbo Ma, Dapeng Li, Kai Zhao and Liang Huang",OSU Multimodal Machine Translation System Report,cs.CL," This paper describes Oregon State University's submissions to the shared -WMT'17 task ""multimodal translation task I"". In this task, all the sentence -pairs are image captions in different languages. 
The key difference between -this task and conventional machine translation is that we have corresponding -images as additional information for each sentence pair. In this paper, we -introduce a simple but effective system which takes an image shared between -different languages and feeds it into both the encoding and decoding sides. We -report our system's performance for English-French and English-German on the -Flickr30K (in-domain) and MSCOCO (out-of-domain) datasets. Our system achieves -the best performance in TER for English-German on the MSCOCO dataset. -" -5913,1710.02745,"Kaustubh Mani, Ishan Verma, Hardik Meisheri, Lipika Dey",Multi-Document Summarization using Distributed Bag-of-Words Model,cs.CL," As the number of documents on the web is growing exponentially, -multi-document summarization is becoming more and more important since it can -provide the main ideas in a document set in a short time. In this paper, we -present an unsupervised centroid-based document-level reconstruction framework -using the distributed bag-of-words model. Specifically, our approach selects -summary sentences in order to minimize the reconstruction error between the -summary and the documents. We apply sentence selection and beam search to -further improve the performance of our model. Experimental results on two -different datasets show significant performance gains compared with the -state-of-the-art baselines. -" -5914,1710.02772,"Zheqian Chen, Rongqin Yang, Bin Cao, Zhou Zhao, Deng Cai, Xiaofei He",Smarnet: Teaching Machines to Read and Comprehend Like Human,cs.CL cs.IR," Machine Comprehension (MC) is a challenging task in the Natural Language -Processing field, which aims to guide the machine to comprehend a passage and -answer the given question. Many existing approaches to the MC task suffer from -bottlenecks such as insufficient lexical -understanding, complex question-passage interaction, and incorrect answer -extraction. In this paper, we address these problems from the -viewpoint of how humans deal with reading tests in a scientific way. -Specifically, we first propose a novel lexical gating mechanism to dynamically -combine the word and character representations. We then guide the machines to -read in an interactive way with an attention mechanism and a memory network. Finally, -we add a checking layer to further refine the answer. The extensive -experiments on two popular datasets, SQuAD and TriviaQA, show that our method -considerably outperforms most state-of-the-art solutions at the time of -submission. -" -5915,1710.02855,"Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya",The IIT Bombay English-Hindi Parallel Corpus,cs.CL," We present the IIT Bombay English-Hindi Parallel Corpus. The corpus is a -compilation of parallel corpora previously available in the public domain as -well as new parallel corpora we collected. The corpus contains 1.49 million -parallel segments, of which 694k segments were not previously available in the -public domain. The corpus has been pre-processed for machine translation, and -we report baseline phrase-based SMT and NMT translation results on this corpus. -This corpus has been used in two editions of shared tasks at the Workshop on -Asian Language Translation (2016 and 2017). The corpus is freely available for -non-commercial research. To the best of our knowledge, this is the largest -publicly available English-Hindi parallel corpus. 
-5916,1710.02861,"Vijayasaradhi Indurthi, Subba Reddy Oota",Clickbait detection using word embeddings,cs.CL cs.IR," Clickbait is a pejorative term describing web content that is aimed at
-generating online advertising revenue, especially at the expense of quality or
-accuracy, relying on sensationalist headlines or eye-catching thumbnail
-pictures to attract click-throughs and to encourage forwarding of the material
-over online social networks. We use distributed word representations of the
-words in the title as features to identify clickbaits in online news media. We
-train a machine learning model using linear regression to predict the clickbait
-score of a given tweet. Our methods achieve an F1-score of 64.98\% and an MSE
-of 0.0791. Compared to other methods, our method is simple, fast to train, does
-not require extensive feature engineering and yet is moderately effective.
-"
-5917,1710.02925,"Alice Lai, Yonatan Bisk, Julia Hockenmaier",Natural Language Inference from Multiple Premises,cs.CL," We define a novel textual entailment task that requires inference over
-multiple premise sentences. We present a new dataset for this task that
-minimizes trivial lexical inferences, emphasizes knowledge of everyday events,
-and presents a more challenging setting for textual entailment. We evaluate
-several strong neural baselines and analyze how the multiple premise task
-differs from standard textual entailment.
-"
-5918,1710.02973,"Alexandros Papangelis, Panagiotis Papadakos, Margarita Kotti, Yannis
- Stylianou, Yannis Tzitzikas, Dimitris Plexousakis","LD-SDS: Towards an Expressive Spoken Dialogue System based on
- Linked-Data",cs.IR cs.CL," In this work we discuss the related challenges and describe an approach
-towards the fusion of state-of-the-art technologies from the Spoken Dialogue
-Systems (SDS) and the Semantic Web and Information Retrieval domains. We
-envision a dialogue system named LD-SDS that will support advanced, expressive,
-and engaging user requests, over multiple, complex, rich, and open-domain data
-sources that will leverage the wealth of the available Linked Data.
-Specifically, we focus on: a) improving the identification, disambiguation and
-linking of entities occurring in data sources and user input; b) offering
-advanced query services for exploiting the semantics of the data, with
-reasoning and exploratory capabilities; and c) expanding the typical
-information seeking dialogue model (slot filling) to better reflect real-world
-conversational search scenarios.
-"
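Entry 5916 (1710.02861) above scores headlines by regressing on averaged word embeddings. A minimal sketch of that recipe follows, with random stand-in embeddings and toy scores in place of real pre-trained vectors and annotated data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy embedding table; a real system would load pre-trained vectors.
emb = {w: np.random.randn(50) for w in "you won't believe what happened next".split()}

def title_vector(title, dim=50):
    # Average the embeddings of the in-vocabulary words in the headline.
    vecs = [emb[w] for w in title.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

titles = ["You won't believe what happened", "What happened next"]
scores = [0.9, 0.4]                     # toy clickbait scores in [0, 1]
model = LinearRegression().fit([title_vector(t) for t in titles], scores)
print(model.predict([title_vector("you won't believe")]))
```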
-5919,1710.03006,Gregor Wiedemann and Gerhard Heyer,"Page Stream Segmentation with Convolutional Neural Nets Combining
- Textual and Visual Features",cs.CL," In recent years, (retro-)digitizing paper-based files has become a major
-undertaking for private and public archives as well as an important task in
-electronic mailroom applications. As a first step, the workflow involves
-scanning and Optical Character Recognition (OCR) of documents. Preservation of
-document contexts of single page scans is a major requirement in this context.
-To facilitate workflows involving very large amounts of paper scans, page
-stream segmentation (PSS) is the task of automatically separating a stream of
-scanned images into multi-page documents. In a digitization project together
-with a German federal archive, we developed a novel approach based on
-convolutional neural networks (CNN) combining image and text features to
-achieve optimal document separation results. Evaluation shows that our PSS
-architecture achieves an accuracy of up to 93%, which can be regarded as a new
-state of the art for this task.
-"
-5920,1710.03203,Yujie Lu and Tatsunori Mori,"Deep Learning Paradigm with Transformed Monolingual Word Embeddings for
- Multilingual Sentiment Analysis",cs.CL," The surge of social media use brings a huge demand for multilingual sentiment
-analysis (MSA) for unveiling cultural differences. So far, traditional methods
-have resorted to machine translation---translating texts in other languages
-into English and then adopting the methods that once worked in English.
-However, this paradigm is constrained by the quality of machine translation. In
-this paper, we propose a new deep learning paradigm to assimilate the
-differences between languages for MSA. We first pre-train monolingual word
-embeddings separately, then map word embeddings in different spaces into a
-shared embedding space, and finally train a parameter-sharing deep neural
-network for MSA. The experimental results show that our paradigm is effective.
-In particular, our CNN model outperforms a state-of-the-art baseline by around
-2.1% in terms of classification accuracy.
-"
-5921,1710.03255,Bowen Shi and Karen Livescu,"Multitask training with unlabeled data for end-to-end sign language
- fingerspelling recognition",cs.CL cs.CV," We address the problem of automatic American Sign Language fingerspelling
-recognition from video. Prior work has largely relied on frame-level labels,
-hand-crafted features, or other constraints, and has been hampered by the
-scarcity of data for this task. We introduce a model for fingerspelling
-recognition that addresses these issues. The model consists of an
-auto-encoder-based feature extractor and an attention-based neural
-encoder-decoder, which are trained jointly. The model receives a sequence of
-image frames and outputs the fingerspelled word, without relying on any
-frame-level training labels or hand-crafted features. In addition, the
-auto-encoder subcomponent makes it possible to leverage unlabeled data to
-improve the feature learning. The model achieves 11.6% and 4.4% absolute letter
-accuracy improvements in signer-independent and signer-adapted fingerspelling
-recognition, respectively, over previous approaches that required frame-level
-training labels.
-"
-5922,1710.03348,"Hamidreza Ghader, Christof Monz",What does Attention in Neural Machine Translation Pay Attention to?,cs.CL," Attention in neural machine translation provides the possibility to encode
-relevant parts of the source sentence at each translation step. As a result,
-attention is considered to be an alignment model as well. However, there is no
-work that specifically studies attention and provides analysis of what is being
-learned by attention models. Thus, the question remains how attention is
-similar to or different from traditional alignment. In this paper, we provide a
-detailed analysis of attention and compare it to traditional alignment. We
-answer the question of whether attention is only capable of modelling
-translational equivalence or whether it captures more information. We show
-that attention is different from alignment in some cases and captures
-useful information other than alignments.
-"
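One way to probe what attention learns, in the spirit of entry 5922 (1710.03348) above, is to measure how concentrated each target token's attention distribution is and where its mass peaks. The sketch below is an illustrative analysis helper, not the paper's actual methodology.

```python
import numpy as np

def attention_stats(attn):
    # attn: (target_len, source_len); each row is a softmax distribution.
    entropy = -(attn * np.log(attn + 1e-12)).sum(axis=1)  # concentration per target token
    hard_align = attn.argmax(axis=1)                      # most-attended source position
    return entropy, hard_align

attn = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4]])
ent, align = attention_stats(attn)
print(ent.round(2), align)  # low entropy ~ alignment-like, high entropy ~ diffuse
```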
-5923,1710.03430,"Seunghyun Yoon, Joongbo Shin, Kyomin Jung","Learning to Rank Question-Answer Pairs using Hierarchical Recurrent
- Encoder with Latent Topic Clustering",cs.CL cs.AI," In this paper, we propose a novel end-to-end neural architecture for ranking
-candidate answers, which adapts a hierarchical recurrent neural network and a
-latent topic clustering module. With our proposed model, a text is encoded into
-a vector representation from the word level to the chunk level to effectively
-capture its entire meaning. In particular, by adapting the hierarchical
-structure, our model shows very small performance degradation in longer text
-comprehension, while other state-of-the-art recurrent neural network models
-suffer from it. Additionally, the latent topic clustering module extracts
-semantic information from target samples. This clustering module is useful for
-any text-related task by allowing each data sample to find its nearest topic
-cluster, thus helping the neural network model analyze the entire data. We
-evaluate our models on the Ubuntu Dialogue Corpus and a consumer electronics
-domain question answering dataset, which is related to Samsung products. The
-proposed model shows state-of-the-art results for ranking question-answer
-pairs.
-"
-5924,1710.03476,Rob van der Goot and Gertjan van Noord,MoNoise: Modeling Noise Using a Modular Normalization System,cs.CL," We propose MoNoise: a normalization model focused on generalizability and
-efficiency that aims at being easily reusable and adaptable. Normalization is
-the task of translating texts from a non-canonical domain to a more canonical
-domain, in our case: from social media data to standard language. Our proposed
-model is based on modular candidate generation in which each module is
-responsible for a different type of normalization action. The most important
-generation modules are a spelling correction system and a word embeddings
-module. Depending on the definition of the normalization task, a static lookup
-list can be crucial for performance. We train a random forest classifier to
-rank the candidates, which generalizes well to all different types of
-normalization actions. Most features for the ranking originate from the
-generation modules; besides these features, N-gram features prove to be an
-important source of information. We show that MoNoise beats the
-state-of-the-art on different normalization benchmarks for English and Dutch,
-which all define the task of normalization slightly differently.
-"
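The modular generate-then-rank design in the MoNoise entry (5924, 1710.03476) above can be caricatured in a few lines: independent modules propose candidates and a ranker orders them. The sketch below uses a toy lookup table, a one-edit spelling module, and raw word frequency as the "ranker"; MoNoise itself adds embedding-based modules and a trained random forest, so everything here is an illustrative stand-in.

```python
LOOKUP = {"u": ["you"], "2moro": ["tomorrow"]}   # toy static lookup module
FREQ = {"you": 1000, "your": 800, "tomorrow": 500, "low": 50}

def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    # Spelling-correction module: all strings one edit away from `word`.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    cands = {l + r[1:] for l, r in splits if r}                          # deletions
    cands |= {l + c + r[1:] for l, r in splits if r for c in alphabet}   # substitutions
    cands |= {l + c + r for l, r in splits for c in alphabet}            # insertions
    return cands

def normalize(word):
    candidates = set(LOOKUP.get(word, [])) | (edits1(word) & FREQ.keys())
    ranked = sorted(candidates, key=lambda c: -FREQ.get(c, 0))
    return ranked[0] if ranked else word

print(normalize("u"), normalize("lo"))  # you low
```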
-5925,1710.03501,"P. Godard, G. Adda, M. Adda-Decker, J. Benjumea, L. Besacier, J.
- Cooper-Leavitt, G-N. Kouarata, L. Lamel, H. Maynard, M. Mueller, A. Rialland,
- S. Stueker, F. Yvon, M. Zanon-Boito","A Very Low Resource Language Speech Corpus for Computational Language
- Documentation Experiments",cs.CL," Most speech and language technologies are trained with massive amounts of
-speech and text information. However, most of the world's languages do not have
-such resources or a stable orthography. Systems constructed under these almost
-zero resource conditions are not only promising for speech technology but also
-for computational language documentation. The goal of computational language
-documentation is to help field linguists to (semi-)automatically analyze and
-annotate audio recordings of endangered and unwritten languages. Example tasks
-are automatic phoneme discovery or lexicon discovery from the speech signal.
-This paper presents a speech corpus collected during a realistic language
-documentation process. It is made up of 5k speech utterances in Mboshi (Bantu
-C25) aligned to French text translations. Speech transcriptions are also made
-available: they correspond to a non-standard graphemic form close to the
-language phonology. We describe how the data was collected, cleaned and
-processed and we illustrate its use through a zero-resource task: spoken term
-discovery. The dataset is made available to the community for reproducible
-computational language documentation experiments and their evaluation.
-"
-5926,1710.03538,"Mirco Ravanelli, Maurizio Omologo","Contaminated speech training methods for robust DNN-HMM distant speech
- recognition",eess.AS cs.CL cs.SD," Despite the significant progress made in recent years, state-of-the-art
-speech recognition technologies provide a satisfactory performance only in the
-close-talking condition. Robustness of distant speech recognition in adverse
-acoustic conditions, on the other hand, remains a crucial open issue for future
-applications of human-machine interaction. To this end, several advances in
-speech enhancement, acoustic scene analysis as well as acoustic modeling have
-recently contributed to improving the state of the art in the field. One of the
-most effective approaches to derive a robust acoustic model is based on
-using contaminated speech, which proved helpful in reducing the acoustic
-mismatch between training and testing conditions.
- In this paper, we revisit this classical approach in the context of modern
-DNN-HMM systems, and propose the adoption of three methods, namely, asymmetric
-context windowing, close-talk based supervision, and close-talk based
-pre-training. The experimental results, obtained using both real and simulated
-data, show a significant advantage in using these three methods, overall
-providing a 15% error rate reduction compared to the baseline systems. The same
-performance trend is confirmed whether using a small high-quality training set
-or a large one.
-"
-5927,1710.03743,"Mat\=iss Rikters, Mark Fishel",Confidence through Attention,cs.CL," Attention distributions of the generated translations are a useful
-by-product of attention-based recurrent neural network translation models and
-can be treated as soft alignments between the input and output tokens. In this
-work, we use attention distributions as a confidence metric for output
-translations. We present two strategies of using the attention distributions:
-filtering out bad translations from a large back-translated corpus, and
-selecting the best translation in a hybrid setup of two different translation
-systems. While manual evaluation indicated only a weak correlation between our
-confidence score and human judgments, the use-cases showed improvements of up
-to 2.22 BLEU points for filtering and 0.99 points for hybrid translation,
-tested on English<->German and English<->Latvian translation.
-"
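Of the three methods named in entry 5926 (1710.03538) above, asymmetric context windowing is the easiest to picture: each frame is spliced together with an unequal number of past and future frames before being fed to the DNN. The sketch below is a generic feature-splicing illustration; the actual window sizes and the direction of the asymmetry are design choices studied in the paper, and the values here are arbitrary.

```python
import numpy as np

def asymmetric_context(feats, left=5, right=1):
    # feats: (frames, dim). For each frame, stack `left` past and `right`
    # future frames; which side gets more context is a tunable choice.
    T, D = feats.shape
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    return np.stack([padded[t:t + left + right + 1].reshape(-1) for t in range(T)])

x = np.random.randn(100, 40)            # e.g., 40-dim filterbank features
print(asymmetric_context(x).shape)      # (100, 280) = (frames, 40 * 7)
```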
-5928,1710.03838,"Dingquan Wang, Jason Eisner","The Galactic Dependencies Treebanks: Getting More Data by Synthesizing
- New Languages",cs.CL," We release Galactic Dependencies 1.0---a large set of synthetic languages not
-found on Earth, but annotated in Universal Dependencies format. This new
-resource aims to provide training and development data for NLP methods that aim
-to adapt to unfamiliar languages. Each synthetic treebank is produced from a
-real treebank by stochastically permuting the dependents of nouns and/or verbs
-to match the word order of other real languages. We discuss the usefulness,
-realism, parsability, perplexity, and diversity of the synthetic languages. As
-a simple demonstration of the use of Galactic Dependencies, we consider
-single-source transfer, which attempts to parse a real target language using a
-parser trained on a ""nearby"" source language. We find that including synthetic
-source languages somewhat increases the diversity of the source pool, which
-significantly improves results for most target languages.
-"
-5929,1710.03877,"Dingquan Wang, Jason Eisner","Fine-Grained Prediction of Syntactic Typology: Discovering Latent
- Structure with Supervised Learning",cs.CL," We show how to predict the basic word-order facts of a novel language given
-only a corpus of part-of-speech (POS) sequences. We predict how often direct
-objects follow their verbs, how often adjectives follow their nouns, and in
-general the directionalities of all dependency relations. Such typological
-properties could be helpful in grammar induction. While such a problem is
-usually regarded as unsupervised learning, our innovation is to treat it as
-supervised learning, using a large collection of realistic synthetic languages
-as training data. The supervised learner must identify surface features of a
-language's POS sequence (hand-engineered or neural features) that correlate
-with the language's deeper structure (latent trees). In our experiments, we
-show: 1) Given a small set of real languages, it helps to add many synthetic
-languages to the training data. 2) Our system is robust even when the POS
-sequences include noise. 3) Our system on this task outperforms a grammar
-induction baseline by a large margin.
-"
-5930,1710.03954,"Mathias Kraus, Stefan Feuerriegel","Decision support from financial disclosures with deep neural networks
- and transfer learning",cs.CL," Company disclosures greatly aid in the process of financial decision-making;
-therefore, they are consulted by financial investors and automated traders
-before exercising ownership in stocks. While humans are usually able to
-correctly interpret the content, the same is rarely true of computerized
-decision support systems, which struggle with the complexity and ambiguity of
-natural language. A possible remedy is represented by deep learning, which
-overcomes several shortcomings of traditional methods of text mining. For
-instance, recurrent neural networks, such as long short-term memories, employ
-hierarchical structures, together with a large number of hidden layers, to
-automatically extract features from ordered sequences of words and capture
-highly non-linear relationships such as context-dependent meanings. However,
-deep learning has only recently started to gain traction, possibly because its
-performance is largely untested. Hence, this paper studies the use of deep
-neural networks for financial decision support. We additionally experiment with
-transfer learning, in which we pre-train the network on a different corpus with
-a length of 139.1 million words. Our results reveal a higher directional
-accuracy as compared to traditional machine learning when predicting stock
-price movements in response to financial disclosures. Our work thereby helps to
-highlight the business value of deep learning and provides recommendations to
-practitioners and executives.
-"
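Entry 5930 (1710.03954) above classifies disclosures with recurrent networks over word sequences. A minimal, hypothetical PyTorch skeleton of such a pipeline is shown below; the vocabulary size, dimensions, and the option of copying pre-trained vectors into the embedding layer (the transfer-learning flavor) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class DisclosureClassifier(nn.Module):
    # Minimal LSTM-over-embeddings classifier; pre-trained vectors can be
    # loaded into the embedding layer to mimic the transfer-learning setup.
    def __init__(self, vocab, dim=100, hidden=64, classes=2, pretrained=None):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        if pretrained is not None:
            self.emb.weight.data.copy_(pretrained)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        _, (h, _) = self.lstm(self.emb(tokens))
        return self.out(h[-1])                  # logits, e.g., up/down movement

model = DisclosureClassifier(vocab=5000)
logits = model(torch.randint(0, 5000, (4, 120)))
print(logits.shape)  # torch.Size([4, 2])
```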
-5931,1710.03957,"Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, Shuzi Niu",DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset,cs.CL," We develop a high-quality multi-turn dialog dataset, DailyDialog, which is
-intriguing in several aspects. The language is human-written and less noisy.
-The dialogues in the dataset reflect the way we communicate in daily life and
-cover various topics about our daily life. We also manually label the developed
-dataset with communication intention and emotion information. Then, we evaluate
-existing approaches on the DailyDialog dataset and hope it will benefit the
-research field of dialog systems.
-"
-5932,1710.04087,"Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic
- Denoyer, Herv\'e J\'egou",Word Translation Without Parallel Data,cs.CL," State-of-the-art methods for learning cross-lingual word embeddings have
-relied on bilingual dictionaries or parallel corpora. Recent studies showed
-that the need for parallel data supervision can be alleviated with
-character-level information. While these methods showed encouraging results,
-they are not on par with their supervised counterparts and are limited to pairs
-of languages sharing a common alphabet. In this work, we show that we can build
-a bilingual dictionary between two languages without using any parallel
-corpora, by aligning monolingual word embedding spaces in an unsupervised way.
-Without using any character information, our model even outperforms existing
-supervised methods on cross-lingual tasks for some language pairs. Our
-experiments demonstrate that our method works very well also for distant
-language pairs, like English-Russian or English-Chinese. We finally describe
-experiments on the English-Esperanto low-resource language pair, on which there
-only exists a limited amount of parallel data, to show the potential impact of
-our method in fully unsupervised machine translation. Our code, embeddings and
-dictionaries are publicly available.
-"
-5933,1710.04099,Finn {\AA}rup Nielsen,Wembedder: Wikidata entity embedding web service,stat.ML cs.CL cs.LG," I present a web service for querying an embedding of entities in the Wikidata
-knowledge graph. The embedding is trained on the Wikidata dump using Gensim's
-Word2Vec implementation and a simple graph walk. A REST API is implemented.
-Together with the Wikidata API the web service exposes a multilingual resource
-for over 600'000 Wikidata items and properties.
-"
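Entry 5933 (1710.04099) above trains Word2Vec on simple walks over the Wikidata graph. The sketch below reproduces that shape on a three-triple toy graph with gensim; the walk policy, lengths, and hyperparameters are illustrative, and `vector_size` is the gensim 4 spelling of the older `size` parameter.

```python
import random
from gensim.models import Word2Vec

# Toy triples standing in for the Wikidata dump.
triples = [("Q42", "P106", "Q36180"), ("Q42", "P31", "Q5"),
           ("Q36180", "P279", "Q482980")]
neighbors = {}
for s, p, o in triples:
    neighbors.setdefault(s, []).append(o)
    neighbors.setdefault(o, []).append(s)

def walk(start, length=10):
    # A simple uniform random walk; each walk becomes one "sentence".
    path = [start]
    for _ in range(length):
        nxt = neighbors.get(path[-1])
        if not nxt:
            break
        path.append(random.choice(nxt))
    return path

walks = [walk(n) for n in neighbors for _ in range(20)]
model = Word2Vec(walks, vector_size=32, window=2, min_count=1, sg=1)
print(model.wv.most_similar("Q42", topn=2))
```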
-5934,1710.04142,"Nishtha Madaan, Sameep Mehta, Mayank Saxena, Aditi Aggarwal, Taneea S
- Agrawaal, Vrinda Malhotra","Bollywood Movie Corpus for Text, Images and Videos",cs.CY cs.CL," In the past few years, several data-sets have been released for text and
-images. We present an approach to create a data-set for use in detecting and
-removing gender bias from text. We also include a set of challenges we have
-faced while creating this corpus. In this work, we have worked with movie data
-from Wikipedia plots and movie trailers from YouTube. Our Bollywood Movie
-corpus contains 4000 movies extracted from Wikipedia and 880 trailers extracted
-from YouTube, released between 1970 and 2017. The corpus contains csv files
-with the following data about each movie - Wikipedia title of movie, cast, plot
-text, co-referenced plot text, soundtrack information, link to movie poster,
-caption of movie poster, number of males in poster, number of females in
-poster. In addition to that, corresponding to each cast member the following
-data is available - cast name, cast gender, cast verbs, cast adjectives, cast
-relations, cast centrality, cast mentions. We present some preliminary results
-on the task of bias removal, which suggest that the data-set is quite useful
-for performing such tasks.
-"
-5935,1710.04203,Giannis Haralabopoulos and Elena Simperl,"Crowdsourcing for Beyond Polarity Sentiment Analysis A Pure Emotion
- Lexicon",cs.CL cs.HC," Sentiment analysis aims to uncover emotions conveyed through information. In
-its simplest form, it is performed on a polarity basis, where the goal is to
-classify information with positive or negative emotion. Recent research has
-explored more nuanced ways to capture emotions that go beyond polarity. For
-these methods to work, they require a critical resource: a lexicon that is
-appropriate for the task at hand, in terms of the range and diversity of
-emotions it captures. In the past, sentiment analysis lexicons have been
-created by experts, such as linguists and behavioural scientists, with strict
-rules. Lexicon evaluation was also performed by experts or against gold
-standards. In our paper, we propose a crowdsourcing method for lexicon
-acquisition, which is scalable, cost-effective, and does not require experts or
-gold standards. We also compare crowd and expert evaluations of the lexicon, to
-assess the overall lexicon quality, and the evaluation capabilities of the
-crowd.
-"
-5936,1710.04312,"Kyle Hundman, Chris A. Mattmann","Measurement Context Extraction from Text: Discovering Opportunities and
- Gaps in Earth Science",cs.IR cs.AI cs.CL," We propose Marve, a system for extracting measurement values, units, and
-related words from natural language text. Marve uses conditional random fields
-(CRF) to identify measurement values and units, followed by a rule-based system
-to find related entities, descriptors and modifiers within a sentence. Sentence
-tokens are represented by an undirected graphical model, and rules are based on
-part-of-speech and word dependency patterns connecting values and units to
-contextual words. Marve is unique in its focus on measurement context, and
-early experimentation demonstrates Marve's ability to generate high-precision
-extractions with strong recall. We also discuss Marve's role in refining
-measurement requirements for NASA's proposed HyspIRI mission, a hyperspectral
-infrared imaging satellite that will study the world's ecosystems. In general,
-our work with HyspIRI demonstrates the value of semantic measurement
-extractions in characterizing quantitative discussion contained in large
-corpora of natural language text. These extractions accelerate broad,
-cross-cutting research and expose scientists to new algorithmic approaches and
-experimental nuances. They also facilitate identification of scientific
-opportunities enabled by HyspIRI, leading to more efficient scientific
-investment and research.
-"
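As a heavily simplified stand-in for the Marve pipeline in entry 5936 (1710.04312) above, the sketch below finds value-unit pairs with a regular expression and keeps a few preceding words as naive context. Marve itself uses a CRF plus dependency-pattern rules for these steps, so this is purely an illustrative approximation with a made-up unit list.

```python
import re

PATTERN = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>nm|km|kg|GHz|%)")

def extract(sentence, window=3):
    # Yield each matched measurement plus the `window` words before it,
    # a crude proxy for Marve's dependency-based "related words".
    for m in PATTERN.finditer(sentence):
        words = sentence[:m.start()].split()[-window:]
        yield {"value": float(m.group("value")), "unit": m.group("unit"),
               "left_context": words}

text = "The instrument resolves spectral bands of 10 nm across 380 km swaths."
for hit in extract(text):
    print(hit)
```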
-5937,1710.04334,"Allen Nie, Erin D. Bennett, Noah D. Goodman","DisSent: Sentence Representation Learning from Explicit Discourse
- Relations",cs.CL cs.AI," Learning effective representations of sentences is one of the core missions
-of natural language understanding. Existing models either train on a vast
-amount of text, or require costly, manually curated sentence relation datasets.
-We show that with dependency parsing and rule-based rubrics, we can curate a
-high-quality sentence relation task by leveraging explicit discourse relations.
-We show that our curated dataset provides an excellent signal for learning
-vector representations of sentence meaning, representing relations that can
-only be determined when the meanings of two sentences are combined. We
-demonstrate that the automatically curated corpus allows a bidirectional LSTM
-sentence encoder to yield high quality sentence embeddings and can serve as a
-supervised fine-tuning dataset for larger models such as BERT. Our fixed
-sentence embeddings achieve high performance on a variety of transfer tasks,
-including SentEval, and we achieve state-of-the-art results on Penn Discourse
-Treebank's implicit relation prediction task.
-"
-5938,1710.04344,"Zeyu Dai, Wenlin Yao, Ruihong Huang","Using Context Events in Neural Network Models for Event Temporal Status
- Identification",cs.CL," Focusing on the task of identifying event temporal status, we find that
-events directly or indirectly governing the target event in a dependency tree
-are the most important contexts. Therefore, we extract dependency chains
-containing context events and use them as input in neural network models, which
-consistently outperform previous models using local context words as input.
-Visualization verifies that the dependency chain representation can effectively
-capture the context events which are closely related to the target event and
-play key roles in predicting event temporal status.
-"
-5939,1710.04437,Yuichiroh Matsubayashi and Kentaro Inui,"Revisiting the Design Issues of Local Models for Japanese
- Predicate-Argument Structure Analysis",cs.CL," The research trend in Japanese predicate-argument structure (PAS) analysis is
-shifting from pointwise prediction models with local features to global models
-designed to search for globally optimal solutions. However, the existing global
-models tend to employ only relatively simple local features; therefore, the
-overall performance gains are rather limited. The importance of designing a
-local model is demonstrated in this study by showing that the performance of a
-sophisticated local model can be considerably improved with recent feature
-embedding methods and a feature combination learning based on a neural network,
-outperforming the state-of-the-art global models in $F_1$ on a common benchmark
-dataset.
-"
-5940,1710.04515,Dan Lim,Convolutional Attention-based Seq2Seq Neural Network for End-to-End ASR,cs.CL," This thesis introduces the sequence-to-sequence model with Luong's attention
-mechanism for end-to-end ASR. It also describes various neural network
-algorithms including Batch normalization, Dropout and Residual network which
-constitute the convolutional attention-based seq2seq neural network. Finally,
-the proposed model proved its effectiveness for speech recognition, achieving a
-15.8% phoneme error rate on the TIMIT dataset.
-"
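Entry 5940 (1710.04515) above builds on Luong's attention, whose "general" form scores each encoder state h_s against the decoder state h_t as h_t^T W h_s, normalizes the scores with a softmax, and averages the encoder states into a context vector. A minimal NumPy sketch under those definitions (dimensions are arbitrary):

```python
import numpy as np

def luong_attention(dec_state, enc_states, W):
    # General Luong score: score(h_t, h_s) = h_t^T W h_s.
    scores = enc_states @ (W @ dec_state)          # (src_len,)
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()
    context = weights @ enc_states                 # weighted sum of encoder states
    return context, weights

H = np.random.randn(6, 32)                         # 6 encoder hidden states
ctx, w = luong_attention(np.random.randn(32), H, np.random.randn(32, 32))
print(ctx.shape, w.round(2))                       # (32,) and the attention weights
```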
-5941,1710.04600,"Deepak Gupta, Pabitra Lenka, Harsimran Bedi, Asif Ekbal, Pushpak
- Bhattacharyya",Auto Analysis of Customer Feedback using CNN and GRU Network,cs.CL," Analyzing customer feedback is the best way to channelize the data into new
-marketing strategies that benefit entrepreneurs as well as customers. Therefore,
-an automated system that can analyze customer behavior is in great demand.
-Users may write feedback in any language, and hence mining appropriate
-information often becomes intractable. Especially in a traditional
-feature-based supervised model, it is difficult to build a generic system, as
-one has to understand the language concerned in order to find the relevant
-features. In order to overcome this, we propose deep Convolutional Neural
-Network (CNN) and Recurrent Neural Network (RNN) based approaches that do not
-require handcrafting of features. We evaluate these techniques for analyzing
-customer feedback sentences in four languages, namely English, French, Japanese
-and Spanish. Our empirical analysis shows that our models perform well in all
-four languages on the setups of the IJCNLP Shared Task on Customer Feedback
-Analysis. Our model achieved second rank in French, with an accuracy of 71.75%,
-and third rank in all the other languages.
-"
-5942,1710.04802,"Jey Han Lau, Lianhua Chi, Khoi-Nguyen Tran, Trevor Cohn",End-to-end Network for Twitter Geolocation Prediction and Hashing,cs.CL," We propose an end-to-end neural network to predict the geolocation of a
-tweet. The network takes as input a number of raw Twitter metadata such as the
-tweet message and associated user account information. Our model is language
-independent, and despite minimal feature engineering, it is interpretable and
-capable of learning location indicative words and timing patterns. Our model
-outperforms state-of-the-art systems by 2%-6%. Additionally, we propose
-extensions to the model to compress the representation learnt by the network
-into binary codes. Experiments show that it produces compact codes compared to
-benchmark hashing algorithms. An implementation of the model is released
-publicly.
-"
-5943,1710.04989,"Marcos Zampieri, Shervin Malmasi, Gustavo Paetzold, Lucia Specia","Complex Word Identification: Challenges in Data Annotation and System
- Performance",cs.CL," This paper revisits the problem of complex word identification (CWI),
-following up on the SemEval CWI shared task. We use ensemble classifiers to
-investigate how well computational methods can discriminate between complex and
-non-complex words. Furthermore, we analyze the classification performance to
-understand what makes lexical complexity challenging. Our findings show that
-most systems performed poorly on the SemEval CWI dataset, and one of the
-reasons for that is the way in which human annotation was performed.
-"
-5944,1710.05094,"Zhihao Zhou, Lifu Huang, Heng Ji",Learning Phrase Embeddings from Paraphrases with GRUs,cs.CL," Learning phrase representations has been widely explored in many Natural
-Language Processing (NLP) tasks (e.g., Sentiment Analysis, Machine Translation)
-and has shown promising improvements. Previous studies either learn
-non-compositional phrase representations with general word embedding learning
-techniques or learn compositional phrase representations based on syntactic
-structures, which either require huge amounts of human annotations or cannot be
-easily generalized to all phrases. In this work, we propose to take advantage
-of a large-scale paraphrase database and present a pair-wise gated recurrent
-units (pairwise-GRU) framework to generate compositional phrase
-representations. Our framework can be re-used to generate representations for
-any phrase. Experimental results show that our framework achieves
-state-of-the-art results on several phrase similarity tasks.
-"
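The pairwise encoding idea in entry 5944 (1710.05094) above - one shared recurrent encoder applied to both sides of a paraphrase pair so that paraphrases land near each other - can be sketched as follows. The loss and dimensions are illustrative assumptions; the paper's gated framework has more machinery than this.

```python
import torch
import torch.nn as nn

class PhraseEncoder(nn.Module):
    # A single GRU encoder shared by both sides of a paraphrase pair.
    def __init__(self, vocab=1000, dim=64, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, hidden, batch_first=True)

    def forward(self, tokens):
        _, h = self.gru(self.emb(tokens))
        return h[-1]                                   # (batch, hidden)

enc = PhraseEncoder()
a = enc(torch.randint(0, 1000, (8, 5)))                # phrase side A
b = enc(torch.randint(0, 1000, (8, 5)))                # its paraphrase, side B
loss = (1 - nn.functional.cosine_similarity(a, b)).mean()  # pull pairs together
loss.backward()
```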
-5945,1710.05298,"Hyemin Ahn, Timothy Ha, Yunho Choi, Hwiyeon Yoo, and Songhwai Oh",Text2Action: Generative Adversarial Synthesis from Language to Action,cs.LG cs.CL cs.RO," In this paper, we propose a generative model which learns the relationship
-between language and human action in order to generate a human action sequence
-given a sentence describing human behavior. The proposed generative model is a
-generative adversarial network (GAN), which is based on the sequence to
-sequence (SEQ2SEQ) model. Using the proposed generative network, we can
-synthesize various actions for a robot or a virtual agent using a text encoder
-recurrent neural network (RNN) and an action decoder RNN. The proposed
-generative network is trained from 29,770 pairs of actions and sentence
-annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video
-dataset. We demonstrate that the network can generate human-like actions which
-can be transferred to a Baxter robot, such that the robot performs an action
-based on a provided sentence. Results show that the proposed generative network
-correctly models the relationship between language and action and can generate
-a diverse set of actions from the same sentence.
-"
-5946,1710.05364,Yiwei Zhou,Clickbait Detection in Tweets Using Self-attentive Network,cs.CL," Clickbait detection in tweets remains an elusive challenge. In this paper, we
-describe the solution for the Zingel Clickbait Detector at the Clickbait
-Challenge 2017, which is capable of evaluating each tweet's level of
-clickbaiting. We first recast the regression problem as a multi-class
-classification problem, based on the annotation scheme. To perform multi-class
-classification, we apply a token-level, self-attentive mechanism on the hidden
-states of bi-directional Gated Recurrent Units (biGRU), which enables the model
-to generate tweets' task-specific vector representations by attending to
-important tokens. The self-attentive neural network can be trained end-to-end,
-without involving any manual feature engineering. Our detector ranked first in
-the final evaluation of Clickbait Challenge 2017.
-"
-5947,1710.05370,"Erik Velldal and Lilja {\O}vrelid and Eivind Alexander Bergem and
- Cathrine Stadsnes and Samia Touileb and Fredrik J{\o}rgensen",NoReC: The Norwegian Review Corpus,cs.CL," This paper presents the Norwegian Review Corpus (NoReC), created for training
-and evaluating models for document-level sentiment analysis. The full-text
-reviews have been collected from major Norwegian news sources and cover a range
-of different domains, including literature, movies, video games, restaurants,
-music and theater, in addition to product reviews across a range of categories.
-Each review is labeled with a manually assigned score of 1-6, as provided by
-the rating of the original author. This first release of the corpus comprises
-more than 35,000 reviews. It is distributed using the CoNLL-U format,
-pre-processed using UDPipe, along with a rich set of metadata. The work
-reported in this paper forms part of the SANT initiative (Sentiment Analysis
-for Norwegian Text), a project seeking to provide resources and tools for
-sentiment analysis and opinion mining for Norwegian. As resources for sentiment
-analysis have so far been unavailable for Norwegian, NoReC represents a highly
-valuable and sought-after addition to Norwegian language technology.
-"
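Since the NoReC corpus in entry 5947 (1710.05370) above is distributed in the CoNLL-U format, a few lines with the third-party `conllu` package show what consuming it could look like. The two-token Norwegian fragment is made up; reading an actual NoReC file would follow the same pattern.

```python
# Requires `pip install conllu`.
from conllu import parse

sample = "\n".join([
    "# sent_id = 1",
    "1\tFilmen\tfilm\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\timponerer\timponere\tVERB\t_\t_\t0\troot\t_\t_",
    "",
])
for sentence in parse(sample):
    for token in sentence:
        # Each token behaves like a dict over the ten CoNLL-U columns.
        print(token["form"], token["lemma"], token["deprel"])
```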
-5948,1710.05429,"Amir Hossein Yazdavar, Hussein S. Al-Olimat, Monireh Ebrahimi,
- Goonmeet Bajaj, Tanvi Banerjee, Krishnaprasad Thirunarayan, Jyotishman
- Pathak, Amit Sheth","Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in
- Social Media",cs.CL," With the rise of social media, millions of people are routinely expressing
-their moods, feelings, and daily struggles with mental health issues on social
-media platforms like Twitter. Unlike traditional observational cohort studies
-conducted through questionnaires and self-reported surveys, we explore the
-reliable detection of clinical depression from tweets obtained unobtrusively.
-Based on the analysis of tweets crawled from users with self-reported
-depressive symptoms in their Twitter profiles, we demonstrate the potential for
-detecting clinical depression symptoms which emulate the PHQ-9 questionnaire
-clinicians use today. Our study uses a semi-supervised statistical model to
-evaluate how the duration of these symptoms and their expression on Twitter (in
-terms of word usage patterns and topical preferences) align with the medical
-findings reported via the PHQ-9. Our proactive and automatic screening tool is
-able to identify clinical depressive symptoms with an accuracy of 68% and
-a precision of 72%.
-"
-5949,1710.05519,Kiem-Hieu Nguyen,BKTreebank: Building a Vietnamese Dependency Treebank,cs.CL," A dependency treebank is an important resource for any language. In this
-paper, we present our work on building BKTreebank, a dependency treebank for
-Vietnamese. Important points on designing the POS tagset, dependency relations,
-and annotation guidelines are discussed. We describe experiments on POS tagging
-and dependency parsing on the treebank. Experimental results show that the
-treebank is a useful resource for Vietnamese language processing.
-"
-5950,1710.05709,Simon Ostermann and Michael Roth and Stefan Thater and Manfred Pinkal,Aligning Script Events with Narrative Texts,cs.CL," Script knowledge plays a central role in text understanding and is relevant
-for a variety of downstream tasks. In this paper, we consider two recent
-datasets which provide a rich and general representation of script events in
-terms of paraphrase sets. We introduce the task of mapping event mentions in
-narrative texts to such script event types, and present a model for this task
-that exploits rich linguistic representations as well as information on
-temporal ordering. The results of our experiments demonstrate that this complex
-task is indeed feasible.
-"
-5951,1710.05780,"Alexander Bartl, Gerasimos Spanakis","A retrieval-based dialogue system utilizing utterance and context
- embeddings",cs.CL cs.AI cs.IR," Finding semantically rich and computer-understandable representations for
-textual dialogues, utterances and words is crucial for dialogue systems (or
-conversational agents), as their performance mostly depends on understanding
-the context of conversations. Recent research aims at finding distributed
-vector representations (embeddings) for words, such that semantically similar
-words are relatively close within the vector-space. Encoding the ""meaning"" of
-text into vectors is a current trend, and text can range from words, phrases
-and documents to actual human-to-human conversations. In recent research
-approaches, responses have been generated utilizing a decoder architecture,
-given the vector representation of the current conversation. In this paper, the
-utilization of embeddings for answer retrieval is explored by using
-Locality-Sensitive Hashing Forest (LSH Forest), an Approximate Nearest Neighbor
-(ANN) model, to find similar conversations in a corpus and rank possible
-candidates.
Experimental results on the well-known Ubuntu Corpus (in English)
-and a customer service chat dataset (in Dutch) show that, in combination with a
-candidate selection method, retrieval-based approaches outperform generative
-ones and reveal promising future research directions towards the usability of
-such a system.
-"
-5952,1710.05978,Andreea Salinca,"Convolutional Neural Networks for Sentiment Classification on Business
- Reviews",cs.CL," Recently, Convolutional Neural Network (CNN) models have achieved remarkable
-results in text classification and sentiment analysis. In this paper, we
-present our approach to the task of classifying business reviews using word
-embeddings on a large-scale dataset provided by Yelp: the Yelp 2017 challenge
-dataset. We compare word-based CNNs using several pre-trained word embeddings
-and end-to-end vector representations for text review classification. We
-conduct several experiments to capture the semantic relationships between
-business reviews, and the results obtained with these deep learning techniques
-prove competitive with traditional methods.
-"
-5953,1710.06071,"Franck Dernoncourt, Ji Young Lee","PubMed 200k RCT: a Dataset for Sequential Sentence Classification in
- Medical Abstracts",cs.CL cs.AI stat.ML," We present PubMed 200k RCT, a new dataset based on PubMed for sequential
-sentence classification. The dataset consists of approximately 200,000
-abstracts of randomized controlled trials, totaling 2.3 million sentences. Each
-sentence of each abstract is labeled with its role in the abstract using one
-of the following classes: background, objective, method, result, or conclusion.
-The purpose of releasing this dataset is twofold. First, the majority of
-datasets for sequential short-text classification (i.e., classification of
-short texts that appear in sequences) are small: we hope that releasing a new
-large dataset will help develop more accurate algorithms for this task. Second,
-from an application perspective, researchers need better tools to efficiently
-skim through the literature. Automatically classifying each sentence in an
-abstract would help researchers read abstracts more efficiently, especially in
-fields where abstracts may be long, such as the medical field.
-"
-5954,1710.06112,Jiawei Hu and Qun Liu,CASICT Tibetan Word Segmentation System for MLWS2017,cs.CL," We participated in the MLWS 2017 Tibetan word segmentation task. Our system
-is trained in an unrestricted way, by combining a baseline system with 760,000
-segmented Tibetan sentences of our own. In the system, the character sequence
-is first processed by the baseline system into a word sequence; a subword unit
-(the BPE algorithm) then splits rare words into subwords with their
-corresponding features; after that, a neural network classifier tags each
-subword with a ""B,M,E,S"" label, and in the decoding step a simple rule is used
-to recover the final word sequence. The candidate system for submission is
-selected by evaluating the F-score on a dev set pre-extracted from the 760,000
-sentences. Experiments show that this method can fix segmentation errors of the
-baseline system, resulting in a significant performance gain.
-"
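Entry 5954 (1710.06112) above tags each subword with a "B,M,E,S" label (begin, middle, end, single). Converting segmented output into that supervision format is a small exercise; the sketch below does it over plain characters, with ASCII placeholders standing in for Tibetan syllables.

```python
def bmes_tags(words):
    # Map each segmented word to per-unit B/M/E/S labels: single-unit words
    # get "S"; longer words get "B", then "M"s, then "E".
    tags = []
    for w in words:
        units = list(w)
        if len(units) == 1:
            tags.append((w, "S"))
        else:
            tags += ([(units[0], "B")]
                     + [(u, "M") for u in units[1:-1]]
                     + [(units[-1], "E")])
    return tags

print(bmes_tags(["ab", "c", "def"]))
# [('a', 'B'), ('b', 'E'), ('c', 'S'), ('d', 'B'), ('e', 'M'), ('f', 'E')]
```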
-5955,1710.06280,"Jun Hatori, Yuta Kikuchi, Sosuke Kobayashi, Kuniyuki Takahashi, Yuta
- Tsuboi, Yuya Unno, Wilson Ko, Jethro Tan","Interactively Picking Real-World Objects with Unconstrained Spoken
- Language Instructions",cs.RO cs.CL," Comprehension of spoken natural language is an essential component for robots
-to communicate with humans effectively. However, handling unconstrained spoken
-instructions is challenging due to (1) complex structures, including a wide
-variety of expressions used in spoken language, and (2) inherent ambiguity in
-the interpretation of human instructions. In this paper, we propose the first
-comprehensive system that can handle unconstrained spoken language and is able
-to effectively resolve ambiguity in spoken instructions. Specifically, we
-integrate deep-learning-based object detection together with natural language
-processing technologies to handle unconstrained spoken instructions, and
-propose a method for robots to resolve instruction ambiguity through dialogue.
-Through our experiments in both a simulated environment and on a physical
-industrial robot arm, we demonstrate the ability of our system to understand
-natural instructions from human operators effectively, and how higher success
-rates of the object picking task can be achieved through an interactive
-clarification process.
-"
-5956,1710.06303,"Aditya Mogadala, Umanga Bista, Lexing Xie and Achim Rettinger","Describing Natural Images Containing Novel Objects with Knowledge Guided
- Assistance",cs.CV cs.CL," Images in the wild encapsulate rich knowledge about varied abstract concepts
-and cannot be sufficiently described with models built only using image-caption
-pairs containing selected objects. We propose to handle such a task with the
-guidance of a knowledge base that incorporates many abstract concepts. Our
-method is a two-step process where we first build a multi-entity-label image
-recognition model to predict abstract concepts as image labels and then
-leverage them in the second step as an external semantic attention and
-constrained inference in the caption generation model for describing images
-that depict unseen/novel objects. Evaluations show that our models outperform
-most of the prior work on out-of-domain captioning on MSCOCO and are useful
-for the integration of knowledge and vision in general.
-"
-5957,1710.06313,Mat\=iss Rikters and Ond\v{r}ej Bojar,Paying Attention to Multi-Word Expressions in Neural Machine Translation,cs.CL," Processing of multi-word expressions (MWEs) is a known problem for any
-natural language processing task. Even neural machine translation (NMT)
-struggles to overcome it. This paper presents the results of experiments
-investigating NMT attention allocation to MWEs and improving the automated
-translation of sentences that contain MWEs in English->Latvian and
-English->Czech NMT systems. Two improvement strategies were explored: (1)
-bilingual pairs of automatically extracted MWE candidates were added to the
-parallel corpus used to train the NMT system, and (2) full sentences containing
-the automatically extracted MWE candidates were added to the parallel corpus.
-Both approaches increased automated evaluation scores. The best result - a 0.99
-BLEU point increase - was reached with the first approach, while the second
-approach achieved only minimal improvements. We also provide open-source
-software and tools used for MWE extraction and alignment inspection.
-"
-5958,1710.06371,Ivan Vuli\'c and Nikola Mrk\v{s}i\'c,Specialising Word Vectors for Lexical Entailment,cs.CL," We present LEAR (Lexical Entailment Attract-Repel), a novel post-processing
-method that transforms any input word vector space to emphasise the asymmetric
-relation of lexical entailment (LE), also known as the IS-A or
-hyponymy-hypernymy relation.
By injecting external linguistic constraints
-(e.g., WordNet links) into the initial vector space, the LE specialisation
-procedure brings true hyponymy-hypernymy pairs closer together in the
-transformed Euclidean space. The proposed asymmetric distance measure adjusts
-the norms of word vectors to reflect the actual WordNet-style hierarchy of
-concepts. Simultaneously, a joint objective enforces semantic similarity using
-the symmetric cosine distance, yielding a vector space specialised for both
-lexical relations at once. LEAR specialisation achieves state-of-the-art
-performance in the tasks of hypernymy directionality, hypernymy detection, and
-graded lexical entailment, demonstrating the effectiveness and robustness of
-the proposed asymmetric specialisation model.
-"
-5959,1710.06390,"Maria Glenski, Ellyn Ayton, Dustin Arendt, and Svitlana Volkova","Fishing for Clickbaits in Social Images and Texts with
- Linguistically-Infused Neural Network Models",cs.LG cs.CL cs.SI," This paper presents the results and conclusions of our participation in the
-Clickbait Challenge 2017 on automatic clickbait detection in social media. We
-first describe linguistically-infused neural network models and identify
-informative representations to predict the level of clickbaiting present in
-Twitter posts. Our models allow us to answer not only whether a post is a
-clickbait, but to what extent it is a clickbait post, e.g., not at all,
-slightly, considerably, or heavily clickbaity, using a score ranging from 0 to
-1. We evaluate the predictive power of models trained on varied text and image
-representations extracted from tweets. Our best performing model, which relies
-on the tweet text and linguistic markers of biased language extracted from the
-tweet and the corresponding page, yields a mean squared error (MSE) of 0.04, a
-mean absolute error (MAE) of 0.16 and an R2 of 0.43 on the held-out test data.
-For the binary classification setup (clickbait vs. non-clickbait), our model
-achieved an F1 score of 0.69. We have not yet found that image representations
-combined with text yield significant performance improvements. Nevertheless,
-this work is the first to present a preliminary analysis of objects extracted
-using the Google TensorFlow object detection API from images in clickbait vs.
-non-clickbait Twitter posts. Finally, we outline several steps to improve model
-performance as part of future work.
-"
-5960,1710.06393,"Aiala Ros\'a and Luis Chiruzzo and Mathias Etcheverry and Santiago
- Castro","RETUYT in TASS 2017: Sentiment Analysis for Spanish Tweets using SVM and
- CNN",cs.CL," This article presents classifiers based on SVM and Convolutional Neural
-Networks (CNN) for the TASS 2017 challenge on tweet sentiment analysis. The
-classifier with the best overall performance uses a combination of SVM and
-CNN. The use of word embeddings was particularly useful for improving the
-classifiers' performance.
-"
-5961,1710.06406,"Claire Bonial, Matthew Marge, Ron Artstein, Ashley Foots, Felix
- Gervits, Cory J. Hayes, Cassidy Henry, Susan G. Hill, Anton Leuski, Stephanie
- M. Lukin, Pooja Moolchandani, Kimberly A. Pollard, David Traum, Clare R. Voss","Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz
- Interface for Collecting Human-Robot Dialogue",cs.CL cs.AI cs.HC cs.RO," We describe the adaptation and refinement of a graphical user interface
-designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot
-dialogue data.
The data collected will be used to develop a dialogue system for
-robot navigation. Building on an interface previously used in the development
-of dialogue systems for virtual agents and video playback, we add templates
-with open parameters which allow the wizard to quickly produce a wide variety
-of utterances. Our research demonstrates that this approach to data collection
-is viable as an intermediate step in developing a dialogue system for physical
-robots in remote locations from their users - a domain in which the human and
-robot need to regularly verify and update a shared understanding of the
-physical environment. We show that our WoZ interface and the fixed set of
-utterances and templates therein provide for a natural pace of dialogue with
-good coverage of the navigation domain.
-"
-5962,1710.06481,"Johannes Welbl, Pontus Stenetorp, Sebastian Riedel","Constructing Datasets for Multi-hop Reading Comprehension Across
- Documents",cs.CL cs.AI," Most Reading Comprehension methods limit themselves to queries which can be
-answered using a single sentence, paragraph, or document. Enabling models to
-combine disjoint pieces of textual evidence would extend the scope of machine
-comprehension methods, but currently there exist no resources to train and test
-this capability. We propose a novel task to encourage the development of models
-for text understanding across multiple documents and to investigate the limits
-of existing methods. In our task, a model learns to seek and combine evidence -
-effectively performing multi-hop (alias multi-step) inference. We devise a
-methodology to produce datasets for this task, given a collection of
-query-answer pairs and thematically linked documents. Two datasets from
-different domains are induced, and we identify potential pitfalls and devise
-circumvention strategies. We evaluate two previously proposed competitive
-models and find that one can integrate information across documents. However,
-both models struggle to select relevant information, as providing documents
-guaranteed to be relevant greatly improves their performance. While the models
-outperform several strong baselines, their best accuracy reaches 42.9% compared
-to human performance at 74.0% - leaving ample room for improvement.
-"
-5963,1710.06524,"Ignacio Arroyo-Fern\'andez, Carlos-Francisco M\'endez-Cruz, Gerardo
- Sierra, Juan-Manuel Torres-Moreno and Grigori Sidorov","Unsupervised Sentence Representations as Word Information Series:
- Revisiting TF--IDF",cs.CL," Sentence representation at the semantic level is a challenging task for
-Natural Language Processing and Artificial Intelligence. Despite the advances
-in word embeddings (i.e. word vector representations), capturing sentence
-meaning is an open question due to complexities of semantic interactions among
-words. In this paper, we present an embedding method, which is aimed at
-learning unsupervised sentence representations from unlabeled text. We propose
-an unsupervised method that models a sentence as a weighted series of word
-embeddings. The weights of the word embeddings are fitted by using Shannon's
-word entropies provided by the Term Frequency--Inverse Document Frequency
-(TF--IDF) transform. The hyperparameters of the model can be selected according
-to the properties of the data (e.g. sentence length and textual genre).
-Hyperparameter selection involves word embedding methods and dimensionalities,
-as well as weighting schemata.
Our method offers advantages over existing
-methods: identifiable modules, short-term training, online inference of
-(unseen) sentence representations, as well as independence from domain,
-external knowledge and language resources. Results showed that our model
-outperformed the state of the art in well-known Semantic Textual Similarity
-(STS) benchmarks. Moreover, our model reached state-of-the-art performance when
-compared to supervised and knowledge-based STS systems.
-"
-5964,1710.06536,"Iti Chaturvedi, Soujanya Poria, Erik Cambria",Basic tasks of sentiment analysis,cs.CL," Subjectivity detection is the task of identifying objective and subjective
-sentences. Objective sentences are those which do not exhibit any sentiment.
-So, it is desirable for a sentiment analysis engine to find and filter out the
-objective sentences before further analysis, e.g., polarity detection. In
-subjective sentences, opinions can often be expressed on one or multiple
-topics. Aspect extraction is a subtask of sentiment analysis that consists in
-identifying opinion targets in opinionated text, i.e., in detecting the
-specific aspects of a product or service the opinion holder is either praising
-or complaining about.
-"
-5965,1710.06554,"Raphael Tang, Jimmy Lin","Honk: A PyTorch Reimplementation of Convolutional Neural Networks for
- Keyword Spotting",cs.CL," We describe Honk, an open-source PyTorch reimplementation of convolutional
-neural networks for keyword spotting that are included as examples in
-TensorFlow. These models are useful for recognizing ""command triggers"" in
-speech-based interfaces (e.g., ""Hey Siri""), which serve as explicit cues for
-audio recordings of utterances that are sent to the cloud for full speech
-recognition. Evaluation on Google's recently released Speech Commands Dataset
-shows that our reimplementation is comparable in accuracy and provides a
-starting point for future work on the keyword spotting task.
-"
-5966,1710.06632,"Mohammad Taher Pilehvar, Jose Camacho-Collados, Roberto Navigli, Nigel
- Collier","Towards a Seamless Integration of Word Senses into Downstream NLP
- Applications",cs.CL," Lexical ambiguity can impede NLP systems from accurate understanding of
-semantics. Despite its potential benefits, the integration of sense-level
-information into NLP systems has remained understudied. By incorporating a
-novel disambiguation algorithm into a state-of-the-art classification model, we
-create a pipeline to integrate sense-level information into downstream NLP
-applications. We show that a simple disambiguation of the input text can lead
-to consistent performance improvement on multiple topic categorization and
-polarity detection datasets, particularly when the fine granularity of the
-underlying sense inventory is reduced and the document is sufficiently large.
-Our results also point to the need for sense representation research to focus
-more on in vivo evaluations which target the performance in downstream NLP
-applications rather than artificial benchmarks.
-"
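Returning to entry 5963 (1710.06524) above: its sentence vectors are a weighted series of word embeddings, with TF-IDF-style (entropy) weights. Under those definitions, a minimal NumPy sketch looks as follows; the random embeddings and toy IDF values are stand-ins for trained resources.

```python
import numpy as np

def sentence_embedding(words, emb, idf):
    # Scale each word vector by its TF-IDF-style weight, then average.
    vecs, weights = [], []
    for w in words:
        if w in emb:
            vecs.append(emb[w])
            weights.append(idf.get(w, 1.0))
    if not vecs:
        return np.zeros(next(iter(emb.values())).shape)
    weights = np.array(weights)
    return (np.array(vecs) * weights[:, None]).sum(0) / weights.sum()

emb = {w: np.random.randn(50) for w in ["the", "market", "plunged"]}
idf = {"the": 0.1, "market": 2.3, "plunged": 4.0}
print(sentence_embedding(["the", "market", "plunged"], emb, idf).shape)  # (50,)
```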
-5967,1710.06700,Hamdy Mubarak,Build Fast and Accurate Lemmatization for Arabic,cs.CL," In this paper we describe the complexity of building a lemmatizer for Arabic,
-which has a rich and complex derivational morphology, and we discuss the need
-for fast and accurate lemmatization to enhance Arabic Information Retrieval
-(IR) results. We also introduce a new data set that can be used to test
-lemmatization accuracy, and an efficient lemmatization algorithm that
-outperforms state-of-the-art Arabic lemmatization in terms of accuracy and
-speed. We make the data set and the code publicly available.
-"
-5968,1710.06917,"Boyang Li, Beth Cardier, Tong Wang, Florian Metze",Annotating High-Level Structures of Short Stories and Personal Anecdotes,cs.CL," Stories are a vital form of communication in human culture; they are employed
-daily to persuade, to elicit sympathy, or to convey a message. Computational
-understanding of human narratives, especially high-level narrative structures,
-remains limited to date. Multiple literary theories for narrative structures
-exist, but operationalization of the theories has remained a challenge. We
-developed an annotation scheme by consolidating and extending existing
-narratological theories, including Labov and Waletzky's (1967) functional
-categorization scheme and Freytag's (1863) pyramid of dramatic tension, and
-present 360 annotated short stories collected from online sources. In the
-future, this research will support an approach that enables systems to
-intelligently sustain complex communications with humans.
-"
-5969,1710.06922,"Jason Lee, Kyunghyun Cho, Jason Weston and Douwe Kiela",Emergent Translation in Multi-Agent Communication,cs.CL cs.AI," While most machine translation systems to date are trained on large parallel
-corpora, humans learn language in a different way: by being grounded in an
-environment and interacting with other humans. In this work, we propose a
-communication game where two agents, native speakers of their own respective
-languages, jointly learn to solve a visual referential task. We find that the
-ability to understand and translate a foreign language emerges as a means to
-achieve shared goals. The emergent translation is interactive and multimodal,
-and crucially does not require parallel corpora, but only monolingual,
-independent text and corresponding images. Our proposed translation model
-achieves this by grounding the source and target languages into a shared visual
-modality, and outperforms several baselines on both word-level and
-sentence-level translation tasks. Furthermore, we show that agents in a
-multilingual community learn to translate better and faster than in a bilingual
-communication setting.
-"
-5970,1710.06923,C. Anantaram and Sunil Kumar Kopparapu,"Adapting general-purpose speech recognition engine output for
- domain-specific natural language question answering",cs.CL cs.AI," Speech-based natural language question-answering interfaces to enterprise
-systems are gaining a lot of attention. General-purpose speech engines can be
-integrated with NLP systems to provide such interfaces. Usually,
-general-purpose speech engines are trained on a large `general' corpus.
-However, when such engines are used for specific domains, they may not
-recognize domain-specific words well, and may produce erroneous output.
-Further, the accent and the environmental conditions in which the speaker
-speaks a sentence may induce the speech engine to inaccurately recognize
-certain words. The subsequent natural language question-answering does not
-produce the requisite results as the question does not accurately represent
-what the speaker intended. Thus, the speech engine's output may need to be
-adapted for a domain before further natural language processing is carried out.
We present two
-mechanisms for such an adaptation, one based on evolutionary development and
-the other based on machine learning, and show how we can repair the speech
-output to improve the subsequent natural language question-answering.
-"
-5971,1710.06931,Dushyanta Dhyani,"OhioState at IJCNLP-2017 Task 4: Exploring Neural Architectures for
-  Multilingual Customer Feedback Analysis",cs.CL," This paper describes our systems for the IJCNLP 2017 Shared Task on Customer
-Feedback Analysis. We experimented with simple neural architectures that gave
-competitive performance on certain tasks. These include shallow CNN and
-Bi-Directional LSTM architectures with Facebook's fastText as a baseline model.
-Our best-performing model was among the Top 5 systems using the Exact-Accuracy
-and Micro-Average-F1 metrics for the Spanish (85.28% for both) and French (70%
-and 73.17% respectively) tasks, and outperformed all the other models on the
-comment (87.28%) and meaningless (51.85%) tags using the Micro Average F1 by
-Tags metric for the French task.
-"
-5972,1710.06937,"Xiaodong Cui, Vaibhava Goel, George Saon",Embedding-Based Speaker Adaptive Training of Deep Neural Networks,cs.CL," An embedding-based speaker adaptive training (SAT) approach is proposed and
-investigated in this paper for deep neural network acoustic modeling. In this
-approach, speaker embedding vectors, which are constant given a particular
-speaker, are mapped through a control network to layer-dependent element-wise
-affine transformations to canonicalize the internal feature representations at
-the output of hidden layers of a main network. The control network for
-generating the speaker-dependent mappings is jointly estimated with the main
-network for the overall speaker adaptive acoustic modeling. Experiments on
-large vocabulary continuous speech recognition (LVCSR) tasks show that the
-proposed SAT scheme can yield superior performance over the widely-used
-speaker-aware training using i-vectors with speaker-adapted input features.
-"
-5973,1710.07032,"Michael Ringgaard, Rahul Gupta, Fernando C. N. Pereira",SLING: A framework for frame semantic parsing,cs.CL," We describe SLING, a framework for parsing natural language into semantic
-frames. SLING supports general transition-based, neural-network parsing with
-bidirectional LSTM input encoding and a Transition Based Recurrent Unit (TBRU)
-for output decoding. The parsing model is trained end-to-end using only the
-text tokens as input. The transition system has been designed to output frame
-graphs directly without any intervening symbolic representation. The SLING
-framework includes an efficient and scalable frame store implementation as well
-as a neural network JIT compiler for fast inference during parsing. SLING is
-implemented in C++ and is available for download on GitHub.
-"
-5974,1710.07045,"Pieter Fivez, Simon \v{S}uster, Walter Daelemans","Unsupervised Context-Sensitive Spelling Correction of English and Dutch
-  Clinical Free-Text with Word and Character N-Gram Embeddings",cs.CL," We present an unsupervised context-sensitive spelling correction method for
-clinical free-text that uses word and character n-gram embeddings. Our method
-generates misspelling replacement candidates and ranks them according to their
-semantic fit, by calculating a weighted cosine similarity between the
-vectorized representation of a candidate and the misspelling context. To tune
-the parameters of this model, we generate self-induced spelling error corpora.
-We perform our experiments for two languages. For English, we greatly
-outperform off-the-shelf spelling correction tools on a manually annotated
-MIMIC-III test set, and counter the frequency bias of a noisy channel model,
-showing that neural embeddings can be successfully exploited to improve upon
-the state of the art. For Dutch, we also outperform an off-the-shelf spelling
-correction tool on manually annotated clinical records from the Antwerp
-University Hospital, but can offer no empirical evidence that our method
-counters the frequency bias of a noisy channel model in this case as well.
-However, both our context-sensitive model and our implementation of the noisy
-channel model obtain high scores on the test set, establishing a state of the
-art for Dutch clinical spelling correction with the noisy channel model.
-"
-5975,1710.07177,"Desmond Elliott, Stella Frank, Lo\""ic Barrault, Fethi Bougares, Lucia
-  Specia","Findings of the Second Shared Task on Multimodal Machine Translation and
-  Multilingual Image Description",cs.CL cs.CV," We present the results from the second shared task on multimodal machine
-translation and multilingual image description. Nine teams submitted 19 systems
-to two tasks. The multimodal translation task, in which the source sentence is
-supplemented by an image, was extended with a new language (French) and two new
-test sets. The multilingual image description task was changed such that at
-test time, only the image is given. Compared to last year, multimodal systems
-improved, but text-only systems remain competitive.
-"
-5976,1710.07210,"Honglun Zhang, Liqiang Xiao, Wenqing Chen, Yongkun Wang, Yaohui Jin",Multi-Task Label Embedding for Text Classification,cs.CL," Multi-task learning in text classification leverages implicit correlations
-among related tasks to extract common features and yield performance gains.
-However, most previous works treat the labels of each task as independent and
-meaningless one-hot vectors, which causes a loss of potential information and
-makes it difficult for these models to jointly learn three or more tasks. In
-this paper, we propose Multi-Task Label Embedding to convert labels in text
-classification into semantic vectors, thereby turning the original tasks into
-vector matching tasks. We implement unsupervised, supervised and
-semi-supervised models of Multi-Task Label Embedding, all utilizing semantic
-correlations among tasks and making it particularly convenient to scale and
-transfer as more tasks are involved. Extensive experiments on five benchmark
-datasets for text classification show that our models can effectively improve
-the performance of related tasks with semantic representations of labels and
-additional information from each other.
-"
-5977,1710.07388,"Yi Luan, Chris Brockett, Bill Dolan, Jianfeng Gao, Michel Galley","Multi-Task Learning for Speaker-Role Adaptation in Neural Conversation
-  Models",cs.CL," Building a persona-based conversation agent is challenging owing to the lack
-of large amounts of speaker-specific conversation data for model training. This
-paper addresses the problem by proposing a multi-task learning approach to
-training neural conversation models that leverages both conversation data
-across speakers and other types of data pertaining to the speaker and speaker
-roles to be modeled. Experiments show that our approach leads to significant
-improvements over baseline model quality, generating responses that capture
-speakers' traits and speaking styles more precisely.
The model offers the
-benefits of being algorithmically simple and easy to implement, and of not
-relying on large quantities of data representing specific individual speakers.
-"
-5978,1710.07394,"Lei Gao, Alexis Kuppersmith, Ruihong Huang","Recognizing Explicit and Implicit Hate Speech Using a Weakly Supervised
-  Two-path Bootstrapping Approach",cs.CL," In the wake of a polarizing election, social media is laden with hateful
-content. To address various limitations of supervised hate speech
-classification methods, including corpus bias and the huge cost of annotation,
-we propose a weakly supervised two-path bootstrapping approach for an online
-hate speech detection model leveraging large-scale unlabeled data. This system
-significantly outperforms hate speech detection systems that are trained in a
-supervised manner using manually annotated data. Applying this model to a large
-quantity of tweets collected before, after, and on election day reveals
-motivations and patterns of inflammatory language.
-"
-5979,1710.07395,"Lei Gao, Ruihong Huang",Detecting Online Hate Speech Using Context Aware Models,cs.CL," In the wake of a polarizing election, the cyber world is laden with hate
-speech. Context accompanying a hate speech text is useful for identifying hate
-speech, which however has been largely overlooked in existing datasets and hate
-speech detection models. In this paper, we provide an annotated corpus of hate
-speech with context information well kept. Then we propose two types of hate
-speech detection models that incorporate context information, a logistic
-regression model with context features and a neural network model with learning
-components for context. Our evaluation shows that both models outperform a
-strong baseline by around 3% to 4% in F1 score and that combining these two
-models further improves the performance by another 7% in F1 score.
-"
-5980,1710.07441,"Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond Wong, Fang Chen",A Semantically Motivated Approach to Compute ROUGE Scores,cs.CL," ROUGE is one of the first and most widely used evaluation metrics for text
-summarization. However, its assessment merely relies on surface similarities
-between peer and model summaries. Consequently, ROUGE is unable to fairly
-evaluate abstractive summaries including lexical variations and paraphrasing.
-Exploring the effectiveness of lexical resource-based models to address this
-issue, we adopt a graph-based algorithm into ROUGE to capture the semantic
-similarities between peer and model summaries. Our semantically motivated
-approach computes ROUGE scores based on both lexical and semantic similarities.
-Experimental results over the TAC AESOP datasets indicate that exploiting the
-lexico-semantic similarity of the words used in summaries would significantly
-help ROUGE correlate better with human judgments.
-"
-5981,1710.07503,Eirini Papagiannopoulou and Grigorios Tsoumakas,Local Word Vectors Guiding Keyphrase Extraction,cs.CL," Automated keyphrase extraction is a fundamental textual information
-processing task concerned with the selection of representative phrases from a
-document that summarize its content. This work presents a novel unsupervised
-method for keyphrase extraction, whose main innovation is the use of local word
-embeddings (in particular GloVe vectors), i.e., embeddings trained from the
-single document under consideration.
We argue that such local representations of
-words and keyphrases are able to accurately capture their semantics in the
-context of the document they are part of, and can therefore help in improving
-keyphrase extraction quality. Empirical results offer evidence that local
-representations indeed lead to better keyphrase extraction results compared
-both to embeddings trained on very large third-party corpora or on larger
-corpora consisting of several documents from the same scientific field, and to
-other state-of-the-art unsupervised keyphrase extraction methods.
-"
-5982,1710.07551,"Tuka Alhanai, Rhoda Au, and James Glass",Spoken Language Biomarkers for Detecting Cognitive Impairment,cs.AI cs.CL q-bio.NC," In this study we developed an automated system that evaluates speech and
-language features from audio recordings of neuropsychological examinations of
-92 subjects in the Framingham Heart Study. A total of 265 features were used in
-an elastic-net regularized binomial logistic regression model to classify the
-presence of cognitive impairment, and to select the most predictive features.
-We compared performance with a demographic model from 6,258 subjects in the
-greater study cohort (0.79 AUC), and found that a system that incorporated both
-audio and text features performed the best (0.92 AUC), with a True Positive
-Rate of 29% (at 0% False Positive Rate) and a good model fit (Hosmer-Lemeshow
-test > 0.05). We also found that decreasing pitch and jitter, shorter segments
-of speech, and responses phrased as questions were positively associated with
-cognitive impairment.
-"
-5983,1710.07654,"Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan,
-  Sharan Narang, Jonathan Raiman, John Miller","Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence
-  Learning",cs.SD cs.AI cs.CL cs.LG eess.AS," We present Deep Voice 3, a fully-convolutional attention-based neural
-text-to-speech (TTS) system. Deep Voice 3 matches state-of-the-art neural
-speech synthesis systems in naturalness while training ten times faster. We
-scale Deep Voice 3 to data set sizes unprecedented for TTS, training on more
-than eight hundred hours of audio from over two thousand speakers. In addition,
-we identify common error modes of attention-based speech synthesis networks,
-demonstrate how to mitigate them, and compare several different waveform
-synthesis methods. We also describe how to scale inference to ten million
-queries per day on one single-GPU server.
-"
-5984,1710.07695,"Wanyun Cui, Xiyou Zhou, Hangyu Lin, Yanghua Xiao, Haixun Wang,
-  Seung-won Hwang and Wei Wang",Verb Pattern: A Probabilistic Semantic Representation on Verbs,cs.CL," Verbs are important in semantic understanding of natural language.
-Traditional verb representations, such as FrameNet, PropBank, VerbNet, focus on
-verbs' roles. These roles are too coarse to represent verbs' semantics. In this
-paper, we introduce verb patterns to represent verbs' semantics, such that each
-pattern corresponds to a single sense of the verb. First we analyze the
-principles for verb patterns: generality and specificity. Then we propose a
-nonparametric model based on description length. Experimental results
-demonstrate the high effectiveness of verb patterns. We further apply verb
-patterns to context-aware conceptualization, to show that verb patterns are
-helpful in semantic-related tasks.
-"
-5985,1710.07728,Jason Anastasopoulos and Jake Ryland Williams,A Computational Framework for Multi-Modal Social Action Identification,cs.SI cs.CL cs.CY physics.soc-ph," We create a computational framework for understanding social action and
-demonstrate how this framework can be used to build an open-source event
-detection tool with scalable statistical machine learning algorithms and a
-subsampled database of over 600 million geo-tagged Tweets from around the
-world. These Tweets were collected between April 1st, 2014 and April 30th,
-2015, most notably when the Black Lives Matter movement began. We demonstrate
-how these methods can be used diagnostically, by researchers, government
-officials and the public, to understand peaceful and violent collective action
-at very fine-grained levels of time and geography.
-"
-5986,1710.07729,Jake Ryland Williams and Giovanni C. Santia,"Is space a word, too?",cs.CL cond-mat.stat-mech," For words, rank-frequency distributions have long been heralded for adherence
-to a potentially-universal phenomenon known as Zipf's law. The hypothetical
-form of this empirical phenomenon was refined by Beno\^{i}t Mandelbrot to that
-which is presently referred to as the Zipf-Mandelbrot law. Parallel to this,
-Herbert Simon proposed a selection model potentially explaining Zipf's law.
-However, a significant dispute between Simon and Mandelbrot, notable empirical
-exceptions, and the lack of a strong empirical connection between Simon's model
-and the Zipf-Mandelbrot law have left the questions of universality and
-mechanistic generation open. We offer a resolution to these issues by
-exhibiting how the dark matter of word segmentation, i.e., space, punctuation,
-etc., connects the Zipf-Mandelbrot law to Simon's mechanistic process. This
-explains Mandelbrot's refinement as no more than a fudge factor, accommodating
-the effects of the exclusion of the rank-frequency dark matter. Thus,
-integrating these non-word objects resolves a more-generalized rank-frequency
-law. Since this relies upon the integration of space, etc., we find support for
-the hypothesis that $all$ are generated by common processes, indicating from a
-physical perspective that space is a word, too.
-"
-5987,1710.07770,"Baiyun Cui, Yingming Li, Yaqing Zhang and Zhongfei Zhang",Text Coherence Analysis Based on Deep Neural Network,cs.CL," In this paper, we propose a novel deep coherence model (DCM) using a
-convolutional neural network architecture to capture the text coherence. The
-text coherence problem is investigated with a new perspective of learning
-sentence distributional representation and text coherence modeling
-simultaneously. In particular, the model captures the interactions between
-sentences by computing the similarities of their distributional
-representations. Further, it can be easily trained in an end-to-end fashion.
-The proposed model is evaluated on a standard Sentence Ordering task. The
-experimental results demonstrate its effectiveness and promise in coherence
-assessment, showing a significant improvement over the state of the art.
-"
-5988,1710.07868,Mohit Yadav and Vivek Tyagi,Deep Triphone Embedding Improves Phoneme Recognition,cs.SD cs.CL cs.LG eess.AS," In this paper, we present a novel Deep Triphone Embedding (DTE)
-representation derived from a Deep Neural Network (DNN) to encapsulate the
-discriminative information present in the adjoining speech frames.
DTEs are
-generated using a four-hidden-layer DNN with 3000 nodes in each hidden layer at
-the first stage. This DNN is trained with the tied-triphone classification
-accuracy as an optimization criterion. Thereafter, we retain the activation
-vectors (3000) of the last hidden layer, for each speech MFCC frame, and
-perform dimension reduction to further obtain a 300-dimensional representation,
-which we term the DTE. DTEs along with MFCC features are fed into a
-second-stage four-hidden-layer DNN, which is subsequently trained for the task
-of tied-triphone classification. Both DNNs are trained using triphone labels
-generated from a tied-state triphone HMM-GMM system, by performing a forced
-alignment between the transcriptions and MFCC feature frames. We conduct the
-experiments on the publicly available TED-LIUM speech corpus. The results show
-that the proposed DTE method provides an absolute improvement of 2.11% in
-phoneme recognition, when compared with a competitive hybrid tied-state
-triphone HMM-DNN system.
-"
-5989,1710.07960,Piotr Przyby{\l}a,"How big is big enough? Unsupervised word sense disambiguation using a
-  very large corpus",cs.CL," In this paper, the problem of disambiguating a target word for Polish is
-approached by searching for related words with known meaning. These relatives
-are used to build a training corpus from unannotated text. This technique is
-improved by proposing new rich sources of replacements that substitute the
-traditional requirement of monosemy with heuristics based on wordnet relations.
-The na\""ive Bayesian classifier has been modified to account for an unknown
-distribution of senses. A corpus of 600 million web documents (594 billion
-tokens), gathered by the NEKST search engine, allows us to assess the
-relationship between training set size and disambiguation accuracy. The
-classifier is evaluated using both a wordnet baseline and a corpus with 17,314
-manually annotated occurrences of 54 ambiguous words.
-"
-5990,1710.08015,"Chenwei Zhang, Nan Du, Wei Fan, Yaliang Li, Chun-Ta Lu, Philip S. Yu","Bringing Semantic Structures to User Intent Detection in Online Medical
-  Queries",cs.CL," The Internet has revolutionized healthcare by offering medical information
-ubiquitously to patients via web search. Patients' healthcare status and
-complex medical information needs are expressed diversely and implicitly in
-their medical text queries. Aiming to better capture a focused picture of
-users' medical-related information search and to shed insights on their
-healthcare information access strategies, it is challenging yet rewarding to
-detect structured user intentions from their diversely expressed medical text
-queries. We introduce a graph-based formulation to explore structured concept
-transitions for effective user intent detection in medical queries, where each
-node represents a medical concept mention and each directed edge indicates a
-medical concept transition. A deep model based on multi-task learning is
-introduced to extract structured semantic transitions from user queries, where
-the model extracts word-level medical concept mentions as well as
-sentence-level concept transitions collectively. A customized graph-based
-mutual transfer loss function is designed to impose explicit constraints and
-further exploit the contribution of mentioning a medical concept word to the
-implication of a semantic transition.
We observe an 8% relative improvement in
-AUC and a 23% relative reduction in coverage error when comparing the proposed
-model with the best baseline model for the concept transition inference task on
-real-world medical text queries.
-"
-5991,1710.08048,"Andres Campero and Bjarke Felbo and Joshua B. Tenenbaum and Rebecca
-  Saxe","A First Step in Combining Cognitive Event Features and Natural Language
-  Representations to Predict Emotions",cs.CL," We explore the representational space of emotions by combining methods from
-different academic fields. Cognitive science has proposed appraisal theory as a
-view on human emotion, with previous research showing how human-rated abstract
-event features can predict fine-grained emotions and capture the similarity
-space of neural patterns in mentalizing brain regions. At the same time,
-natural language processing (NLP) has demonstrated how transfer and multitask
-learning can be used to cope with the scarcity of annotated data for text
-modeling.
- The contribution of this work is to show that appraisal theory can be
-combined with NLP for mutual benefit. First, fine-grained emotion prediction
-can be improved to human-level performance by using NLP representations in
-addition to appraisal features. Second, using the appraisal features as
-auxiliary targets during training can improve predictions even when only text
-is available as input. Third, we obtain a representation with a similarity
-matrix that better correlates with the neural activity across regions. Best
-results are achieved when the model is trained to simultaneously predict
-appraisals, emotions and emojis using a shared representation.
- While these results are preliminary, the integration of cognitive
-neuroscience and NLP techniques opens up an interesting direction for future
-research.
-"
-5992,1710.08246,"Richa Sharma, Muktabh Mayank Srivastava",Testing the limits of unsupervised learning for semantic similarity,cs.CL," Semantic similarity between two sentences can be defined as a way to
-determine how related or unrelated two sentences are. The task of semantic
-similarity in terms of distributed representations can be thought of as
-generating sentence embeddings (dense vectors) which take both the context and
-the meaning of a sentence into account. Such embeddings can be produced by
-multiple methods; in this paper we evaluate LSTM autoencoders for generating
-these embeddings. Unsupervised algorithms (autoencoders, to be specific) just
-try to recreate their inputs, but they can be forced to learn order (and, to
-some extent, some inherent meaning) by creating proper bottlenecks. We evaluate
-how well algorithms trained just on plain English sentences can learn to figure
-out semantic similarity, without being given any sense of what the meaning of a
-sentence is.
-"
-5993,1710.08312,"Patrick Verga, Emma Strubell, Ofer Shai, Andrew McCallum","Attending to All Mention Pairs for Full Abstract Biological Relation
-  Extraction",cs.CL," Most work in relation extraction forms a prediction by looking at a short
-span of text within a single sentence containing a single entity pair mention.
-However, many relation types, particularly in biomedical text, are expressed
-across sentences or require a large context to disambiguate. We propose a model
-to consider all mention and entity pairs simultaneously in order to make a
-prediction. We encode full paper abstracts using an efficient self-attention
-encoder and form pairwise predictions between all mentions with a bi-affine
-operation.
Entity-pair-wise pooling aggregates mention pair scores to make a
-final prediction while alleviating training noise by performing
-within-document multi-instance learning. We improve our model's performance by
-jointly training the model to predict named entities and adding an additional
-corpus of weakly labeled data. We demonstrate our model's effectiveness by
-achieving the state of the art on the Biocreative V Chemical Disease Relation
-dataset for models without KB resources, outperforming ensembles of models
-which use hand-crafted features and additional linguistic resources.
-"
-5994,1710.08321,"Nishant Nikhil, Muktabh Mayank Srivastava",Content Based Document Recommender using Deep Learning,cs.CL cs.IR," With the recent advancements in information technology there has been a huge
-surge in the amount of data available. But information retrieval technology has
-not been able to keep up with this pace of information generation, resulting in
-excessive time spent retrieving relevant information. Even though systems exist
-for assisting users to search a database and for filtering and recommending
-relevant information, recommendation systems which use the content of documents
-still have a long way to mature. Here we present a Deep Learning based
-supervised approach to recommend similar documents based on the similarity of
-their content. We combine the C-DSSM model with Word2Vec distributed
-representations of words to create a novel model that classifies a document
-pair as relevant/irrelevant by assigning a score to it. Using our model,
-retrieval of documents can be done in O(1) time and the memory complexity is
-O(n), where n is the number of documents.
-"
-5995,1710.08396,"Vinayakumar R, Barathi Ganesh HB, Anand Kumar M, Soman KP",Deep Health Care Text Classification,cs.CL cs.AI," Health-related social media mining is a valuable tool for the early
-recognition of diverse adverse medical conditions. Most existing methods are
-based on machine learning with knowledge-based learning. This working note
-presents Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) based
-embeddings for automatic health text classification in social media mining. For
-each task, two systems are built that classify tweets at the tweet level. RNN
-and LSTM are used for extracting features, and a non-linear activation function
-at the last layer facilitates distinguishing tweets of different categories.
-The experiments are conducted on the 2nd Social Media Mining for Health
-Applications Shared Task at AMIA 2017. The experimental results are promising,
-and the proposed method is appropriate for health text classification,
-primarily because it does not rely on any feature engineering mechanisms.
-"
-5996,1710.08451,"Samhaa R. El-Beltagy, Talaat Khalil, Amal Halaby, and Muhammad Hammad","Combining Lexical Features and a Supervised Learning Approach for Arabic
-  Sentiment Analysis",cs.CL," The importance of building sentiment analysis tools for Arabic social media
-has been recognized during the past couple of years, especially with the rapid
-increase in the number of Arabic social media users. One of the main
-difficulties in tackling this problem is that text within social media is
-mostly colloquial, with many dialects being used within social media platforms.
-In this paper, we present a set of features that were integrated with a
-machine learning based sentiment analysis model and applied to Egyptian, Saudi,
-Levantine, and MSA Arabic social media datasets. Many of the proposed features
-were derived through the use of an Arabic Sentiment Lexicon. The model also
-includes emoticon-based features, as well as input-text-related features such
-as the number of segments within the text, the length of the text, whether the
-text ends with a question mark or not, etc. We show that the presented features
-resulted in increased accuracy across six of the seven benchmarked datasets we
-experimented with. Since the developed model outperforms all existing Arabic
-sentiment analysis systems that have publicly available datasets, we can state
-that this model represents the state of the art in Arabic sentiment analysis.
-"
-5997,1710.08458,"Samhaa R. El-Beltagy, Mona El Kalamawy, Abu Bakr Soliman",NileTMRG at SemEval-2017 Task 4: Arabic Sentiment Analysis,cs.CL," This paper describes two systems that were used by the authors for addressing
-Arabic Sentiment Analysis as part of SemEval-2017, task 4. The authors
-participated in three Arabic-related subtasks, which are: Subtask A (Message
-Polarity Classification), Subtask B (Topic-Based Message Polarity
-classification) and Subtask D (Tweet quantification), using the team name of
-NileTMRG. For subtask A, we made use of our previously developed sentiment
-analyzer, which we augmented with a scored lexicon. For subtasks B and D, we
-used an ensemble of three different classifiers. The first classifier was a
-convolutional neural network for which we trained (word2vec) word embeddings.
-The second classifier consisted of a multilayer perceptron, while the third
-classifier was a logistic regression model that takes the same input as the
-second classifier. Voting between the three classifiers was used to determine
-the final outcome. The output from task B was quantified to produce the
-results for task D. In all three Arabic-related tasks in which NileTMRG
-participated, the team ranked first.
-"
-5998,1710.08528,"Olga Papadopoulou, Markos Zampoglou, Symeon Papadopoulos, Ioannis
-  Kompatsiaris","A Two-Level Classification Approach for Detecting Clickbait Posts using
-  Text-Based Features",cs.SI cs.CL," The emergence of social media as a news source has led to the rise of
-clickbait posts attempting to attract users to click on article links without
-informing them on the actual article content. This paper presents our efforts
-to create a clickbait detector inspired by fake news detection algorithms, and
-our submission to the Clickbait Challenge 2017. The detector is based almost
-exclusively on text-based features taken from previous work on clickbait
-detection, our own work on fake post detection, and features we designed
-specifically for the challenge. We use a two-level classification approach,
-combining the outputs of 65 first-level classifiers in a second-level feature
-vector. We present our exploratory results with individual features and their
-combinations, taken from the post text and the target article title, as well as
-feature selection. While our own blind tests with the dataset led to an F-score
-of 0.63, our final evaluation in the Challenge only achieved an F-score of
-0.43. We explore the possible causes of this, and lay out potential future
-steps to achieve more successful results.
-"
-5999,1710.08634,"Ricardo Usbeck and Michael Hoffmann and Michael R\""oder and Jens
-  Lehmann and Axel-Cyrille Ngonga Ngomo",Using Multi-Label Classification for Improved Question Answering,cs.IR cs.CL," A plethora of diverse approaches for question answering over RDF data have
-been developed in recent years. While the accuracy of these systems has
-increased significantly over time, most systems still focus on particular types
-of questions or particular challenges in question answering. What is a curse
-for single systems is a blessing for the combination of these systems. We show
-in this paper how machine learning techniques can be applied to create a more
-accurate question answering metasystem by reusing existing systems. In
-particular, we develop a multi-label classification-based metasystem for
-question answering over 6 existing systems using an innovative set of 14
-question features. The metasystem outperforms the best single system by 14%
-F-measure on the recent QALD-6 benchmark. Furthermore, we analyzed the
-influence and correlation of the underlying features on the metasystem quality.
-"
-6000,1710.08691,"Axel-Cyrille Ngonga Ngomo, Michael R\""oder, Diego Moussallem, Ricardo
-  Usbeck, Ren\'e Speck","BENGAL: An Automatic Benchmark Generator for Entity Recognition and
-  Linking",cs.CL," The manual creation of gold standards for named entity recognition and entity
-linking is time- and resource-intensive. Moreover, recent works show that such
-gold standards contain a large proportion of mistakes in addition to being
-difficult to maintain. We hence present BENGAL, a novel approach for the
-automatic generation of such gold standards as a complement to manually created
-benchmarks. The main advantage of our benchmarks is that they can be readily
-generated at any time. They are also cost-effective while being guaranteed to
-be free of annotation errors. We compare the performance of 11 tools on
-benchmarks in English generated by BENGAL and on 16 benchmarks created
-manually. We show that our approach can be ported easily across languages by
-presenting results achieved by 4 tools on both Brazilian Portuguese and
-Spanish. Overall, our results suggest that our automatic benchmark generation
-approach can create varied benchmarks that have characteristics similar to
-those of existing benchmarks. Our approach is open-source. Our experimental
-results are available at http://faturl.com/bengalexpinlg and the code at
-https://github.com/dice-group/BENGAL.
-"
-6001,1710.08721,Philippe Thomas,Clickbait Identification using Neural Networks,cs.CL," This paper presents the results of our participation in the Clickbait
-Detection Challenge 2017. The system relies on a fusion of neural networks,
-incorporating different types of available information. It does not require
-any linguistic preprocessing, and hence generalizes more easily to new domains
-and languages. The final combined model achieves a mean squared error of
-0.0428, an accuracy of 0.826, and an F1 score of 0.564. According to the
-official evaluation metric, the system ranked 6th among the 13 participating
-teams.
-"
-6002,1710.08963,Patrick O. Perry and Kenneth Benoit,Scaling Text with the Class Affinity Model,stat.ML cs.CL cs.LG," Probabilistic methods for classifying text form a rich tradition in machine
-learning and natural language processing.
For many important problems, however,
-class prediction is uninteresting because the class is known, and instead the
-focus shifts to estimating latent quantities related to the text, such as
-affect or ideology. We focus on one such problem of interest, estimating the
-ideological positions of 55 Irish legislators in the 1991 D\'ail confidence
-vote. To solve the D\'ail scaling problem and others like it, we develop a text
-modeling framework that allows actors to take latent positions on a ""gray""
-spectrum between ""black"" and ""white"" polar opposites. We are able to validate
-results from this model by measuring the influences exhibited by individual
-words, and we are able to quantify the uncertainty in the scaling estimates by
-using a sentence-level block bootstrap. Applying our method to the D\'ail
-debate, we are able to scale the legislators between extreme pro-government and
-pro-opposition in a way that reveals nuances in their speeches not captured by
-their votes or party affiliations.
-"
-6003,1710.09026,"Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad
-  Shoeybi","Trace norm regularization and faster inference for embedded speech
-  recognition RNNs",cs.LG cs.CL eess.AS stat.ML," We propose and evaluate new techniques for compressing and speeding up dense
-matrix multiplications as found in the fully connected and recurrent layers of
-neural networks for embedded large vocabulary continuous speech recognition
-(LVCSR). For compression, we introduce and study a trace norm regularization
-technique for training low rank factored versions of matrix multiplications.
-Compared to standard low rank training, we show that our method leads to good
-trade-offs between accuracy and number of parameters, and can be used to speed
-up training of large models. For speedup, we enable faster inference on ARM
-processors through new open sourced kernels optimized for small batch sizes,
-resulting in 3x to 7x speedups over the widely used gemmlowp library. Beyond
-LVCSR, we expect our techniques and kernels to be more generally applicable to
-embedded neural networks with large fully connected or recurrent layers.
-"
-6004,1710.09085,"Sounak Banerjee, Prasenjit Majumder, Mandar Mitra","Re-evaluating the need for Modelling Term-Dependence in Text
-  Classification Problems",cs.IR cs.CL," A substantial amount of research has been carried out in developing machine
-learning algorithms that account for term dependence in text classification.
-These algorithms offer acceptable performance in most cases, but they are
-associated with a substantial cost: they require significantly greater
-resources to operate. This paper argues against the justification of the higher
-costs of these algorithms, based on their performance in text classification
-problems. To examine this conjecture, the performance of one of the best
-dependence models is compared to several well-established algorithms in text
-classification. A very specific collection of datasets has been designed to
-best reflect the disparity in the nature of text data present in real-world
-applications. The results show that even one of the best term-dependence models
-performs decently at best when compared to independence models. Coupled with
-their substantially greater requirement for hardware resources, this makes them
-an impractical choice for real-world scenarios.
-"
-6005,1710.09137,"Aditya Mogadala, Dominik Jung, Achim Rettinger","Linking Tweets with Monolingual and Cross-Lingual News using Transformed
-  Word Embeddings",cs.CL," Social media platforms have grown into an important medium to spread
-information about an event published by the traditional media, such as news
-articles. Grouping such diverse sources of information that discuss the same
-topic from varied perspectives provides new insights. But the gap in word usage
-between informal social media content such as tweets and diligently written
-content (e.g. news articles) makes such assembly difficult. In this paper, we
-propose a transformation framework to bridge the word usage gap between tweets
-and online news articles across languages by leveraging their word embeddings.
-Using our framework, word embeddings extracted from tweets and news articles
-are aligned closer to each other across languages, thus facilitating the
-identification of similarity between news articles and tweets. Experimental
-results show a notable improvement over baselines for the comparison of
-monolingual tweets and news articles, while new findings are reported for
-cross-lingual comparison.
-"
-6006,1710.09233,Renato Fabbri and Luis Henrique Garcia,"A Simple Text Analytics Model To Assist Literary Criticism: comparative
-  approach and example on James Joyce against Shakespeare and the Bible",cs.CL," Literary analysis, criticism or studies is a highly valued field with
-dedicated journals and researchers, but it remains mostly within the scope of
-the humanities. Text analytics is the computer-aided process of deriving
-information from texts. In this article we describe a simple and generic model
-for performing literary analysis using text analytics. The method relies on
-statistical measures of: 1) token and sentence sizes and 2) Wordnet synset
-features. These measures are then used in Principal Component Analysis where
-the texts to be analyzed are observed against Shakespeare and the Bible,
-regarded as reference literature. The model is validated by analyzing selected
-works from James Joyce (1882-1941), one of the most important writers of the
-20th century. We discuss the consistency of this approach, the reasons why we
-did not use other techniques (e.g. part-of-speech tagging) and the ways by
-which the analysis model might be adapted and enhanced.
-"
-6007,1710.09306,"Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, Mihaela Vela,
-  Liviu P. Dinu, Josef van Genabith",Exploring the Use of Text Classification in the Legal Domain,cs.CL," In this paper, we investigate the application of text classification methods
-to support law professionals. We present several experiments applying machine
-learning techniques to predict with high accuracy the ruling of the French
-Supreme Court and the law area to which a case belongs. We also investigate
-the influence of the time period in which a ruling was made on the form of the
-case description and the extent to which we need to mask information in a full
-case ruling to automatically obtain training and test data that resembles case
-descriptions. We developed a mean probability ensemble system combining the
-output of multiple SVM classifiers. We report results of 98% average F1 score
-in predicting a case ruling, 96% F1 score for predicting the law area of a
-case, and 87.07% F1 score on estimating the date of a ruling.
-"
-6008,1710.09340,Daniel Fern\'andez-Gonz\'alez and Carlos G\'omez-Rodr\'iguez,Non-Projective Dependency Parsing with Non-Local Transitions,cs.CL," We present a novel transition system, based on the Covington non-projective
-parser, introducing non-local transitions that can directly create arcs
-involving nodes to the left of the current focus positions. This avoids the
-need for long sequences of No-Arc transitions to create long-distance arcs,
-thus alleviating error propagation. The resulting parser outperforms the
-original version and achieves the best accuracy on the Stanford Dependencies
-conversion of the Penn Treebank among greedy transition-based algorithms.
-"
-6009,1710.09589,Barbara Plank,ALL-IN-1: Short Text Classification with One Model for All Languages,cs.CL," We present ALL-IN-1, a simple model for multilingual text classification that
-does not require any parallel data. It is based on a traditional Support Vector
-Machine classifier exploiting multilingual word embeddings and character
-n-grams. Our model is simple and easily extendable, yet very effective, overall
-ranking 1st (out of 12 teams) in the IJCNLP 2017 shared task on customer
-feedback analysis in four languages: English, French, Japanese and Spanish.
-"
-6010,1710.09617,"Yanzhang He, Rohit Prabhavalkar, Kanishka Rao, Wei Li, Anton Bakhtin,
-  Ian McGraw","Streaming Small-Footprint Keyword Spotting using Sequence-to-Sequence
-  Models",cs.CL," We develop streaming keyword spotting systems using a recurrent neural
-network transducer (RNN-T) model: an all-neural, end-to-end trained,
-sequence-to-sequence model which jointly learns acoustic and language model
-components. Our models are trained to predict either phonemes or graphemes as
-subword units, thus allowing us to detect arbitrary keyword phrases, without
-any out-of-vocabulary words. In order to adapt the models to the requirements
-of keyword spotting, we propose a novel technique which biases the RNN-T system
-towards a specific keyword of interest.
- Our systems are compared against a strong sequence-trained, connectionist
-temporal classification (CTC) based ""keyword-filler"" baseline, which is
-augmented with a separate phoneme language model. Overall, our RNN-T system
-with the proposed biasing technique significantly improves performance over the
-baseline system.
-"
-6011,1710.09753,"Heike Adel and Hinrich Sch\""utze",Impact of Coreference Resolution on Slot Filling,cs.CL," In this paper, we demonstrate the importance of coreference resolution for
-natural language processing using the example of the TAC Slot Filling shared
-task. We illustrate the strengths and weaknesses of automatic coreference
-resolution systems and provide experimental results to show that they improve
-performance in the slot filling end-to-end setting. Finally, we publish
-KBPchains, a resource containing automatically extracted coreference chains
-from the TAC source corpus in order to support other researchers working on
-this topic.
-"
-6012,1710.09805,"Long Chen, Fajie Yuan, Joemon M. Jose, Weinan Zhang","Improving Negative Sampling for Word Representation using Self-embedded
-  Features",cs.LG cs.CL stat.ML," Although the word-popularity based negative sampler has shown superb
-performance in the skip-gram model, the theoretical motivation behind
-oversampling popular (non-observed) words as negative samples is still not well
-understood.
In this paper, we start from an investigation of the vanishing
-gradient issue in the skip-gram model without a proper negative sampler. By
-performing an insightful analysis from the stochastic gradient descent (SGD)
-learning perspective, we demonstrate that, both theoretically and intuitively,
-negative samples with larger inner product scores are more informative than
-those with lower scores for the SGD learner in terms of both convergence rate
-and accuracy. Understanding this, we propose an alternative sampling algorithm
-that dynamically selects informative negative samples during each SGD update.
-More importantly, the proposed sampler accounts for multi-dimensional
-self-embedded features during the sampling process, which essentially makes it
-more effective than the original popularity-based (one-dimensional) sampler.
-Empirical experiments further verify our observations, and show that our
-fine-grained samplers gain significant improvement over the existing ones
-without increasing computational complexity.
-"
-6013,1710.09867,"Felix Hill, Stephen Clark, Karl Moritz Hermann, Phil Blunsom",Understanding Early Word Learning in Situated Artificial Agents,cs.CL cs.AI cs.NE," Neural network-based systems can now learn to locate the referents of words
-and phrases in images, answer questions about visual scenes, and execute
-symbolic instructions as first-person actors in partially-observable worlds. To
-achieve this so-called grounded language learning, models must overcome
-challenges that infants face when learning their first words. While it is
-notable that models with no meaningful prior knowledge overcome these
-obstacles, researchers currently lack a clear understanding of how they do so,
-a problem that we attempt to address in this paper. For maximum control and
-generality, we focus on a simple neural network-based language learning agent,
-trained via policy-gradient methods, which can interpret single-word
-instructions in a simulated 3D world. Whilst the goal is not to explicitly
-model infant word learning, we take inspiration from experimental paradigms in
-developmental psychology and apply some of these to the artificial agent,
-exploring the conditions under which established human biases and learning
-effects emerge. We further propose a novel method for visualising semantic
-representations in the agent.
-"
-6014,1710.09942,"Tushar Nagarajan, Sharmistha and Partha Talukdar",CANDiS: Coupled & Attention-Driven Neural Distant Supervision,cs.CL," Distant supervision for relation extraction uses text data heuristically
-aligned with an existing knowledge base as training data. The unsupervised
-nature of this technique allows it to scale to web-scale relation extraction
-tasks, at the expense of noise in the training data. Previous work has explored
-relationships among instances of the same entity-pair to reduce this noise, but
-relationships among instances across entity-pairs have not been fully
-exploited. We explore the use of inter-instance couplings based on verb-phrase
-and entity type similarities. We propose a novel technique, CANDiS, which casts
-distant supervision using inter-instance coupling into an end-to-end neural
-network model. CANDiS incorporates an attention module at the instance-level to
-model the multi-instance nature of this problem. CANDiS outperforms existing
-state-of-the-art techniques on a standard benchmark dataset.
-"
-6015,1710.10224,"Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee","BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural
-  Networks and its Application to Distant Speech Recognition",cs.CL cs.SD eess.AS," Despite the remarkable progress achieved on automatic speech recognition,
-recognizing far-field speech mixed with various noise sources is still a
-challenging task. In this paper, we introduce a novel student-teacher transfer
-learning approach, BridgeNet, which can provide a solution to improve distant
-speech recognition. There are two key features in BridgeNet. First, BridgeNet
-extends traditional student-teacher frameworks by providing multiple hints from
-a teacher network. Hints are not limited to the soft labels from a teacher
-network. A teacher's intermediate feature representations can better guide a
-student network to learn how to denoise or dereverberate noisy input. Second,
-the proposed recursive architecture in the BridgeNet can iteratively improve
-denoising and recognition performance. The experimental results of BridgeNet
-showed significant improvements in tackling the distant speech recognition
-problem, where it achieved up to 13.24% relative WER reductions on the AMI
-corpus compared to a baseline neural network without teacher's hints.
-"
-6016,1710.10248,"Vasily Pestun, Yiannis Vlassopoulos",Tensor network language model,cs.CL cond-mat.dis-nn cs.LG cs.NE stat.ML," We propose a new statistical model suitable for machine learning of systems
-with long-distance correlations such as natural languages. The model is based
-on a directed acyclic graph decorated by multi-linear tensor maps in the
-vertices and vector spaces in the edges, called a tensor network. Such tensor
-networks have been previously employed for effective numerical computation of
-the renormalization group flow on the space of effective quantum field theories
-and lattice models of statistical mechanics. We provide an explicit
-algebro-geometric analysis of the parameter moduli space for tree graphs, and
-discuss model properties and applications such as statistical translation.
-"
-6017,1710.10280,"Andrew K. Lampinen, James L. McClelland",One-shot and few-shot learning of word embeddings,cs.CL cs.LG stat.ML," Standard deep learning systems require thousands or millions of examples to
-learn a concept, and cannot integrate new concepts easily. By contrast, humans
-have an incredible ability to do one-shot or few-shot learning. For instance,
-from just hearing a word used in a sentence, humans can infer a great deal
-about it, by leveraging what the syntax and semantics of the surrounding words
-tell us. Here, we draw inspiration from this to highlight a simple technique
-by which deep recurrent networks can similarly exploit their prior knowledge to
-learn a useful representation for a new word from little data. This could make
-natural language processing systems much more flexible, by allowing them to
-learn continually from the new words they encounter.
-"
-6018,1710.10361,"Raphael Tang, Jimmy Lin",Deep Residual Learning for Small-Footprint Keyword Spotting,cs.CL," We explore the application of deep residual learning and dilated convolutions
-to the keyword spotting task, using the recently-released Google Speech
-Commands Dataset as our benchmark. Our best residual network (ResNet)
-implementation significantly outperforms Google's previous convolutional neural
-networks in terms of accuracy.
By varying model depth and width, we can achieve
-compact models that also outperform previous small-footprint variants. To our
-knowledge, we are the first to examine these approaches for keyword spotting,
-and our results establish an open-source state-of-the-art reference to support
-the development of future speech-based interfaces.
-"
-6019,1710.10380,"Shuai Tang, Hailin Jin, Chen Fang, Zhaowen Wang, Virginia R. de Sa","Speeding up Context-based Sentence Representation Learning with
-  Non-autoregressive Convolutional Decoding",cs.NE cs.CL cs.LG," Context plays an important role in human language understanding; thus, it may
-also be useful for machines learning vector representations of language. In
-this paper, we explore an asymmetric encoder-decoder structure for unsupervised
-context-based sentence representation learning. We carefully designed
-experiments to show that neither an autoregressive decoder nor an RNN decoder
-is required. After that, we designed a model which still keeps an RNN as the
-encoder, while using a non-autoregressive convolutional decoder. We further
-combine a suite of effective designs to significantly improve model efficiency
-while also achieving better performance. Our model is trained on two different
-large unlabelled corpora, and in both cases the transferability is evaluated on
-a set of downstream NLP tasks. We empirically show that our model is simple and
-fast while producing rich sentence representations that excel in downstream
-tasks.
-"
-6020,1710.10393,"Xu Sun, Bingzhen Wei, Xuancheng Ren, Shuming Ma","Label Embedding Network: Learning Label Representation for Soft Training
-  of Deep Networks",cs.LG cs.CL cs.CV," We propose a method, called Label Embedding Network, which can learn label
-representation (label embedding) during the training process of deep networks.
-With the proposed method, the label embedding is adaptively and automatically
-learned through back propagation. The original one-hot represented loss
-function is converted into a new loss function with soft distributions, such
-that the originally unrelated labels have continuous interactions with each
-other during the training process. As a result, the trained model achieves
-substantially higher accuracy and faster convergence. Experimental results
-based on competitive tasks demonstrate the effectiveness of the proposed
-method, and the learned label embedding is reasonable and interpretable. The
-proposed method achieves comparable or even better results than the
-state-of-the-art systems. The source code is available at
-\url{https://github.com/lancopku/LabelEmb}.
-"
-6021,1710.10398,"Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu","A Study of All-Convolutional Encoders for Connectionist Temporal
-  Classification",cs.CL," Connectionist temporal classification (CTC) is a popular sequence prediction
-approach for automatic speech recognition that is typically used with models
-based on recurrent neural networks (RNNs). We explore whether deep
-convolutional neural networks (CNNs) can be used effectively instead of RNNs as
-the ""encoder"" in CTC. CNNs lack an explicit representation of the entire
-sequence, but have the advantage that they are much faster to train. We present
-an exploration of CNNs as encoders for CTC models, in the context of
-character-based (lexicon-free) automatic speech recognition. Specifically, we
-explore a range of one-dimensional convolutional layers, which are particularly
-efficient.
We compare the performance of our CNN-based models against typical
-RNN-based models in terms of training time, decoding time, model size and word
-error rate (WER) on the Switchboard Eval2000 corpus. We find that our CNN-based
-models are close in performance to LSTMs, while not matching them, and are much
-faster to train and decode.
-"
-6022,1710.10453,"Mor Cohen, Avi Caciularu, Idan Rejwan, Jonathan Berant",Inducing Regular Grammars Using Recurrent Neural Networks,cs.CL," Grammar induction is the task of learning a grammar from a set of examples.
-Recently, neural networks have been shown to be powerful learning machines that
-can identify patterns in streams of data. In this work we investigate their
-effectiveness in inducing a regular grammar from data, without any assumptions
-about the grammar. We train a recurrent neural network to distinguish between
-strings that are in or outside a regular language, and utilize an algorithm for
-extracting the learned finite-state automaton. We apply this method to several
-regular languages and find unexpected results regarding the connections between
-the network's states that may be regarded as evidence for generalization.
-"
-6023,1710.10467,"Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno",Generalized End-to-End Loss for Speaker Verification,eess.AS cs.CL cs.LG stat.ML," In this paper, we propose a new loss function called generalized end-to-end
-(GE2E) loss, which makes the training of speaker verification models more
-efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike
-TE2E, the GE2E loss function updates the network in a way that emphasizes
-examples that are difficult to verify at each step of the training process.
-Additionally, the GE2E loss does not require an initial stage of example
-selection. With these properties, our model with the new loss function
-decreases speaker verification EER by more than 10%, while reducing the
-training time by 60% at the same time. We also introduce the MultiReader
-technique, which allows us to do domain adaptation: training a more accurate
-model that supports multiple keywords (i.e. ""OK Google"" and ""Hey Google"") as
-well as multiple dialects.
-"
-6024,1710.10498,Sharath T. S. and Shubhangi Tandon,Topic Based Sentiment Analysis Using Deep Learning,cs.CL cs.IR," In this paper, we tackle sentiment analysis conditioned on a topic in
-Twitter data using deep learning. We propose a two-tier approach: in the first
-phase we create our own word embeddings and see that they do perform better
-than state-of-the-art embeddings when used with standard classifiers. We then
-perform inference on these embeddings to learn more about a word with respect
-to all the topics being considered, and also to find the top n influencing
-words for each topic. In the second phase we use these embeddings to predict
-the sentiment of the tweet with respect to a given topic, and all other topics
-under discussion.
-"
-6025,1710.10504,"Rui Liu, Wei Wei, Weiguang Mao, Maria Chikina",Phase Conductor on Multi-layered Attentions for Machine Comprehension,cs.CL," Attention models have been intensively studied to improve NLP tasks such as
-machine comprehension via both question-aware passage attention models and
-self-matching attention models. Our research proposes phase conductor
-(PhaseCond) for attention models in two meaningful ways.
-6025,1710.10504,"Rui Liu, Wei Wei, Weiguang Mao, Maria Chikina",Phase Conductor on Multi-layered Attentions for Machine Comprehension,cs.CL," Attention models have been intensively studied to improve NLP tasks such as machine comprehension via both question-aware passage attention models and self-matching attention models. Our research proposes phase conductor (PhaseCond) for attention models in two meaningful ways. First, PhaseCond, an architecture of multi-layered attention models, consists of multiple phases, each implementing a stack of attention layers producing passage representations and a stack of inner or outer fusion layers regulating the information flow. Second, we extend and improve the dot-product attention function for PhaseCond by simultaneously encoding multiple question and passage embedding layers from different perspectives. We demonstrate the effectiveness of our proposed model PhaseCond on the SQuAD dataset, showing that our model significantly outperforms both state-of-the-art single-layered and multiple-layered attention models. We deepen our results with new findings via both detailed qualitative analysis and visualized examples showing the dynamic changes through multi-layered attention models.
-"
-6026,1710.10520,"Sharath T.S., Shubhangi Tandon, Ryan Bauer","A Dual Encoder Sequence to Sequence Model for Open-Domain Dialogue Modeling",cs.CL," Ever since the successful application of sequence to sequence learning for neural machine translation systems, interest has surged in its applicability towards language generation in other problem domains. Recent work has investigated the use of these neural architectures towards modeling open-domain conversational dialogue, where it has been found that although these models are capable of learning a good distributional language model, dialogue coherence is still of concern. Unlike translation, conversation is much more of a one-to-many mapping from utterance to response, and it is even more pressing that the model be aware of the preceding flow of conversation. In this paper we propose to tackle this problem by introducing previous conversational context in terms of latent representations of dialogue acts over time. We inject the latent context representations into a sequence to sequence neural network in the form of dialog acts using a second encoder to enhance the quality and the coherence of the conversations generated. The main task of this research work is to show that adding latent variables that capture discourse relations does indeed result in more coherent responses when compared to conventional sequence to sequence models.
-"
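At the core of dot-product attention functions like the one PhaseCond extends is a similarity matrix between passage and question words, softmax-normalized and used to mix question vectors into each passage position. A generic NumPy sketch of that primitive follows; it is illustrative only, not the paper's multi-phase architecture, and all names are assumptions.

import numpy as np

def dot_product_attention(passage, question, scale=True):
    # passage: (m, d) and question: (n, d) word vectors.
    scores = passage @ question.T                    # (m, n) alignment scores
    if scale:
        scores = scores / np.sqrt(passage.shape[-1]) # scaled dot product
    scores -= scores.max(axis=-1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # attend over question words
    return weights @ question                        # (m, d) question-aware passage

attended = dot_product_attention(np.random.randn(30, 64), np.random.randn(10, 64))

Stacking several such layers, with fusion layers gating how much of each attended representation is kept, is the general shape of the multi-layered designs the abstract compares against.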
-6027,1710.10574,"Zih-Wei Lin, Tzu-Wei Sung, Hung-Yi Lee, Lin-Shan Lee","Personalized word representations Carrying Personalized Semantics Learned from Social Network Posts",cs.CL," Distributed word representations have been shown to be very useful in various natural language processing (NLP) application tasks. These word vectors learned from huge corpora very often carry both semantic and syntactic information of words. However, it is well known that each individual user has his own language patterns because of different factors such as interested topics, friend groups, social activities, wording habits, etc., which may imply some kind of personalized semantics. With such personalized semantics, the same word may carry slightly different meanings for different users. For example, the word ""Cappuccino"" may imply ""Leisure"", ""Joy"", ""Excellent"" for a user enjoying coffee, but merely a kind of drink for someone else. Such personalized semantics of course cannot be carried by the standard universal word vectors trained with huge corpora produced by many people. In this paper, we propose a framework to train different personalized word vectors for different users based on the very successful continuous skip-gram model, using the social network data posted by many individual users. In this framework, universal background word vectors are first learned from the background corpora, and then adapted by the personalized corpus for each individual user to learn the personalized word vectors. We use two application tasks to evaluate the quality of the personalized word vectors obtained in this way, the user prediction task and the sentence completion task. These personalized word vectors were shown to carry some personalized semantics and offer improved performance on these two evaluation tasks.
-"
-6028,1710.10585,"Denghui Zhang, Pengshan Cai, Yantao Jia, Manling Li, Yuanzhuo Wang, Xueqi Cheng",Path-Based Attention Neural Model for Fine-Grained Entity Typing,cs.CL," Fine-grained entity typing aims to assign entity mentions in free text with types arranged in a hierarchical structure. Traditional distant supervision based methods employ a structured data source as a weak supervision and do not need hand-labeled data, but they neglect the label noise in the automatically labeled training corpus. Although recent studies use many features to prune wrong data ahead of training, they suffer from error propagation and bring much complexity. In this paper, we propose an end-to-end typing model, called the path-based attention neural model (PAN), to achieve noise-robust performance by leveraging the hierarchical structure of types. Experiments demonstrate its effectiveness.
-"
-6029,1710.10586,"Yvette Graham, George Awad, Alan Smeaton",Evaluation of Automatic Video Captioning Using Direct Assessment,cs.CL," We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground truth or correct answer against which to measure. Automatic metrics for comparing automatic video captions against a manual caption such as BLEU and METEOR, drawn from techniques used in evaluating machine translation, were used in the TRECVid video captioning task in 2016 but these are shown to have weaknesses. The work presented here brings human assessment into the evaluation by crowdsourcing how well a caption describes a video. We automatically degrade the quality of some sample captions which are assessed manually, and from this we are able to rate the quality of the human assessors, a factor we take into account in the evaluation. Using data from the TRECVid video-to-text task in 2016, we show how our direct assessment method is replicable and robust and should scale to settings where there are many caption-generation techniques to be evaluated.
-"
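The background-then-adapt recipe in the personalized-word-vector abstract can be approximated with off-the-shelf tooling: train a skip-gram model on a background corpus, then continue training on one user's posts. A rough gensim sketch under that assumption follows; the paper's exact adaptation objective may differ, and the corpora here are toy stand-ins.

from gensim.models import Word2Vec

background = [["i", "drink", "coffee", "every", "morning"],
              ["the", "coffee", "shop", "is", "open", "today"]]   # stand-in background corpus
user_posts = [["cappuccino", "means", "joy", "and", "leisure"],
              ["weekend", "cappuccino", "is", "pure", "excellence"]]  # one user's posts

# Universal background vectors (sg=1 selects the skip-gram objective).
model = Word2Vec(background, vector_size=50, sg=1, min_count=1, epochs=20)
# Adapt to one user: extend the vocabulary, then keep training on their posts.
model.build_vocab(user_posts, update=True)
model.train(user_posts, total_examples=len(user_posts), epochs=20)
personalized = model.wv["cappuccino"]   # this user's adapted representation

Running the adaptation step once per user, always starting from the same background model, yields one personalized vector space per user, which is what the evaluation tasks in the abstract compare against the universal vectors.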
-6030,1710.10609,Dhiraj Madan and Sachindra Joshi,Finding Dominant User Utterances And System Responses in Conversations,cs.CL," There are several dialog frameworks which allow manual specification of intents and rule-based dialog flow. The rule-based framework provides good control to dialog designers at the expense of being more time-consuming and laborious. The job of a dialog designer can be reduced if we could identify pairs of user intents and corresponding responses automatically from prior conversations between users and agents. In this paper we propose an approach to find these frequent user utterances (which serve as examples for intents) and corresponding agent responses. We propose a novel SimCluster algorithm that extends the standard K-means algorithm to simultaneously cluster user utterances and agent utterances by taking their adjacency information into account. The method also aligns these clusters to provide pairs of intents and response groups. We compare our results with those produced by using simple K-means clustering on a real dataset and observe up to 10% absolute improvement in F1-scores. Through our experiments on a synthetic dataset, we show that our algorithm gains more of an advantage over the K-means algorithm when the data has large variance.
-"
-6031,1710.10639,"Reid Pryzant, Yongjoo Chung, Dan Jurafsky, and Denny Britz",JESC: Japanese-English Subtitle Corpus,cs.CL," In this paper we describe the Japanese-English Subtitle Corpus (JESC). JESC is a large Japanese-English parallel corpus covering the underrepresented domain of conversational dialogue. It consists of more than 3.2 million examples, making it the largest freely available dataset of its kind. The corpus was assembled by crawling and aligning subtitles found on the web. The assembly process incorporates a number of novel preprocessing elements to ensure high monolingual fluency and accurate bilingual alignments. We summarize its contents and evaluate its quality using human experts and baseline machine translation (MT) systems.
-"
-6032,1710.10723,Christopher Clark and Matt Gardner,Simple and Effective Multi-Paragraph Reading Comprehension,cs.CL," We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the model to produce globally correct output. We combine this method with a state-of-the-art pipeline for training models on document QA data. Experiments demonstrate strong performance on several document QA datasets. Overall, we are able to achieve a score of 71.3 F1 on the web portion of TriviaQA, a large improvement from the 56.7 F1 of the previous best system.
-"
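The shared-normalization objective mentioned in the multi-paragraph abstract has a concrete form: candidate-span scores from all sampled paragraphs of a document share a single softmax, so paragraph-level confidences stay comparable. A minimal PyTorch sketch, with made-up shapes and names:

import torch

def shared_norm_loss(paragraph_scores, gold_masks):
    # paragraph_scores: list of (k_i,) span-score tensors, one per paragraph.
    # gold_masks: matching list of {0,1} tensors marking spans that hit the answer.
    scores = torch.cat(paragraph_scores)               # one shared candidate pool
    gold = torch.cat(gold_masks).bool()
    log_z = torch.logsumexp(scores, dim=0)             # normalizer shared across paragraphs
    log_p_gold = torch.logsumexp(scores[gold], dim=0)  # the answer may reoccur
    return log_z - log_p_gold                          # negative log-likelihood

loss = shared_norm_loss(
    [torch.randn(5, requires_grad=True), torch.randn(7, requires_grad=True)],
    [torch.tensor([0., 1., 0., 0., 0.]), torch.tensor([0., 0., 0., 0., 1., 0., 0.])])
loss.backward()

Training this way, rather than normalizing each paragraph independently, is what lets the model's per-paragraph confidence scores be ranked against each other at document level.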
-6033,1710.10739,Bin Wang and Zhijian Ou,"Learning neural trans-dimensional random field language models with noise-contrastive estimation",cs.CL stat.ML," Trans-dimensional random field language models (TRF LMs), where sentences are modeled as a collection of random fields, have shown close performance with LSTM LMs in speech recognition and are computationally more efficient in inference. However, the training efficiency of neural TRF LMs is not satisfactory, which limits the scalability of TRF LMs on large training corpora. In this paper, several techniques on both model formulation and parameter estimation are proposed to improve the training efficiency and the performance of neural TRF LMs. First, TRFs are reformulated in the form of exponential tilting of a reference distribution. Second, noise-contrastive estimation (NCE) is introduced to jointly estimate the model parameters and normalization constants. Third, we extend the neural TRF LMs by marrying the deep convolutional neural network (CNN) and the bidirectional LSTM into the potential function to extract the deep hierarchical features and bidirectionally sequential features. Utilizing all the above techniques enables the successful and efficient training of neural TRF LMs on a 40x larger training set with only 1/3 of the training time, and further reduces the WER with a relative reduction of 4.7% on top of a strong LSTM LM baseline.
-"
-6034,1710.10774,"Andros Tjandra, Sakriani Sakti, Satoshi Nakamura",Sequence-to-Sequence ASR Optimization via Reinforcement Learning,cs.CL cs.LG cs.SD eess.AS," Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions. In the sequence-to-sequence architecture, the model is trained to predict the grapheme of the current time-step given the input of speech signal and the ground-truth grapheme history of the previous time-steps. However, it remains unclear how well the model approximates real-world speech during inference. Thus, generating the whole transcription from scratch based on previous predictions is complicated and errors can propagate over time. Furthermore, the model is optimized to maximize the likelihood of training data instead of error rate evaluation metrics that actually quantify recognition quality. This paper presents an alternative strategy for training sequence-to-sequence ASR models by adopting the idea of reinforcement learning (RL). Unlike the standard training scheme with maximum likelihood estimation, our proposed approach utilizes the policy gradient algorithm. We can (1) sample the whole transcription based on the model's prediction in the training process and (2) directly optimize the model with negative Levenshtein distance as the reward. Experimental results demonstrate that we significantly improved the performance compared to a model trained only with maximum likelihood estimation.
-"
-6035,1710.10777,"Yao Ming and Shaozu Cao and Ruixiang Zhang and Zhen Li and Yuanzhe Chen and Yangqiu Song and Huamin Qu",Understanding Hidden Memories of Recurrent Neural Networks,cs.CL cs.AI," Recurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their effectiveness limits further improvements on their architectures. In this paper, we present a visual analytics method for understanding and comparing RNN models for NLP tasks. We propose a technique to explain the function of individual hidden state units based on their expected response to input texts. We then co-cluster hidden state units and words based on the expected response and visualize co-clustering results as memory chips and word clouds to provide more structured knowledge on RNNs' hidden states. We also propose a glyph-based sequence visualization based on aggregate information to analyze the behavior of an RNN's hidden state at the sentence-level. The usability and effectiveness of our method are demonstrated through case studies and reviews from domain experts.
-"
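The policy-gradient recipe in the sequence-to-sequence ASR abstract (sample a transcription, reward it with negative Levenshtein distance) reduces to a REINFORCE-style surrogate loss. A self-contained sketch, with the model's sampled token log-probabilities assumed given; in practice a baseline would be subtracted from the reward to reduce variance.

import torch

def levenshtein(a, b):
    # Classic single-row dynamic program for edit distance between token lists.
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return d[-1]

def reinforce_loss(sample_log_probs, sampled, reference):
    # sample_log_probs: (T,) log-probabilities of the sampled graphemes.
    reward = -float(levenshtein(sampled, reference))  # negative edit distance
    return -reward * sample_log_probs.sum()           # policy-gradient surrogate

logits = torch.randn(3, 5, requires_grad=True)        # toy decoder outputs
logp = torch.log_softmax(logits, dim=-1)[torch.arange(3), torch.tensor([1, 0, 4])]
loss = reinforce_loss(logp, ["h", "e", "y"], ["h", "e", "l", "l", "o"])
loss.backward()

Because the reward is computed on the whole sampled transcription, the gradient directly targets the error-rate-style metric rather than per-step likelihood, which is the mismatch the abstract identifies.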
-" -6036,1710.10994,"Mohammad Ebrahim Khademi, Mohammad Fakhredanesh and Seyed Mojtaba - Hoseini",Conceptual Text Summarizer: A new model in continuous vector space,cs.CL cs.IR," Traditional methods of summarization are not cost-effective and possible -today. Extractive summarization is a process that helps to extract the most -important sentences from a text automatically and generates a short informative -summary. In this work, we propose an unsupervised method to summarize Persian -texts. This method is a novel hybrid approach that clusters the concepts of the -text using deep learning and traditional statistical methods. First we produce -a word embedding based on Hamshahri2 corpus and a dictionary of word -frequencies. Then the proposed algorithm extracts the keywords of the document, -clusters its concepts, and finally ranks the sentences to produce the summary. -We evaluated the proposed method on Pasokh single-document corpus using the -ROUGE evaluation measure. Without using any hand-crafted features, our proposed -method achieves state-of-the-art results. We compared our unsupervised method -with the best supervised Persian methods and we achieved an overall improvement -of ROUGE-2 recall score of 7.5%. -" -6037,1710.11027,Diego Esteves and Rafael Peres and Jens Lehmann and Giulio Napolitano,Named Entity Recognition in Twitter using Images and Text,cs.IR cs.CL," Named Entity Recognition (NER) is an important subtask of information -extraction that seeks to locate and recognise named entities. Despite recent -achievements, we still face limitations with correctly detecting and -classifying entities, prominently in short and noisy text, such as Twitter. An -important negative aspect in most of NER approaches is the high dependency on -hand-crafted features and domain-specific knowledge, necessary to achieve -state-of-the-art results. Thus, devising models to deal with such -linguistically complex contexts is still challenging. In this paper, we propose -a novel multi-level architecture that does not rely on any specific linguistic -resource or encoded rule. Unlike traditional approaches, we use features -extracted from images and text to classify named entities. Experimental tests -against state-of-the-art NER for Twitter on the Ritter dataset present -competitive results (0.59 F-measure), indicating that this approach may lead -towards better NER models. -" -6038,1710.11035,"Pierre-Edouard Honnet, Andrei Popescu-Belis, Claudiu Musat, Michael - Baeriswyl","Machine Translation of Low-Resource Spoken Dialects: Strategies for - Normalizing Swiss German",cs.CL," The goal of this work is to design a machine translation (MT) system for a -low-resource family of dialects, collectively known as Swiss German, which are -widely spoken in Switzerland but seldom written. We collected a significant -number of parallel written resources to start with, up to a total of about 60k -words. Moreover, we identified several other promising data sources for Swiss -German. Then, we designed and compared three strategies for normalizing Swiss -German input in order to address the regional diversity. We found that -character-based neural MT was the best solution for text normalization. In -combination with phrase-based statistical MT, our solution reached 36% BLEU -score when translating from the Bernese dialect. This value, however, decreases -as the testing data becomes more remote from the training one, geographically -and topically. 
These resources and normalization techniques are a first step -towards full MT of Swiss German dialects. -" -6039,1710.11041,"Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho",Unsupervised Neural Machine Translation,cs.CL cs.AI cs.LG," In spite of the recent success of neural machine translation (NMT) in -standard benchmarks, the lack of large parallel corpora poses a major practical -problem for many language pairs. There have been several proposals to alleviate -this issue with, for instance, triangulation and semi-supervised learning -techniques, but they still require a strong cross-lingual signal. In this work, -we completely remove the need of parallel data and propose a novel method to -train an NMT system in a completely unsupervised manner, relying on nothing but -monolingual corpora. Our model builds upon the recent work on unsupervised -embedding mappings, and consists of a slightly modified attentional -encoder-decoder model that can be trained on monolingual corpora alone using a -combination of denoising and backtranslation. Despite the simplicity of the -approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 -French-to-English and German-to-English translation. The model can also profit -from small parallel corpora, and attains 21.81 and 15.24 points when combined -with 100,000 parallel sentences, respectively. Our implementation is released -as an open source project. -" -6040,1710.11154,"Viviana Cotik and Dar\'io Filippo and Roland Roller and Hans Uszkoreit - and Feiyu Xu",Creation of an Annotated Corpus of Spanish Radiology Reports,cs.CL," This paper presents a new annotated corpus of 513 anonymized radiology -reports written in Spanish. Reports were manually annotated with entities, -negation and uncertainty terms and relations. The corpus was conceived as an -evaluation resource for named entity recognition and relation extraction -algorithms, and as input for the use of supervised methods. Biomedical -annotated resources are scarce due to confidentiality issues and associated -costs. This work provides some guidelines that could help other researchers to -undertake similar tasks. -" -6041,1710.11169,"Zeqiu Wu, Xiang Ren, Frank F. Xu, Ji Li, Jiawei Han",Indirect Supervision for Relation Extraction using Question-Answer Pairs,cs.CL cs.AI," Automatic relation extraction (RE) for types of interest is of great -importance for interpreting massive text corpora in an efficient manner. -Traditional RE models have heavily relied on human-annotated corpus for -training, which can be costly in generating labeled data and become obstacles -when dealing with more relation types. Thus, more RE extraction systems have -shifted to be built upon training data automatically acquired by linking to -knowledge bases (distant supervision). However, due to the incompleteness of -knowledge bases and the context-agnostic labeling, the training data collected -via distant supervision (DS) can be very noisy. In recent years, as increasing -attention has been brought to tackling question-answering (QA) tasks, user -feedback or datasets of such tasks become more accessible. In this paper, we -propose a novel framework, ReQuest, to leverage question-answer pairs as an -indirect source of supervision for relation extraction, and study how to use -such supervision to reduce noise induced from DS. 
Our model jointly embeds relation mentions, types, QA entity mention pairs and text features in two low-dimensional spaces (RE and QA), where objects with the same relation types or semantically similar question-answer pairs have similar representations. Shared features connect these two spaces, carrying clearer semantic knowledge from both sources. ReQuest then uses these learned embeddings to estimate the types of test relation mentions. We formulate a global objective function and adopt a novel margin-based QA loss to reduce noise in DS by exploiting semantic evidence from the QA dataset. Our experimental results achieve an average of 11% improvement in F1 score on two public RE datasets combined with the TREC QA dataset.
-"
-6042,1710.11277,"Baolin Peng and Xiujun Li and Jianfeng Gao and Jingjing Liu and Yun-Nung Chen and Kam-Fai Wong","Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning",cs.CL cs.AI cs.LG," This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the discriminator as another critic into the advantage actor-critic (A2C) framework, to encourage the dialogue agent to explore state-action pairs within the regions where the agent takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C can accelerate policy exploration efficiently.
-"
-6043,1710.11301,"Daniel Harasim, Chris Bruno, Eva Portelance, Martin Rohrmeier, Timothy J. O'Donnell",A generalized parsing framework for Abstract Grammars,cs.CL cs.FL," This technical report presents a general framework for parsing a variety of grammar formalisms. We develop a grammar formalism, called an Abstract Grammar, which is general enough to represent grammars at many levels of the hierarchy, including Context Free Grammars, Minimalist Grammars, and Generalized Context-free Grammars. We then develop a single parsing framework which is capable of parsing grammars which are at least up to GCFGs on the hierarchy. Our parsing framework exposes a grammar interface, so that it can parse any particular grammar formalism that can be reduced to an Abstract Grammar.
-"
-6044,1710.11332,Jingjing Xu,"Improving Social Media Text Summarization by Learning Sentence Weight Distribution",cs.CL," Recently, encoder-decoder models are widely used in social media text summarization. However, these models sometimes mistakenly select noise words in irrelevant sentences as part of a summary, thus degrading its quality. In order to inhibit irrelevant sentences and focus on key information, we propose an effective approach by learning sentence weight distribution. In our model, we build a multi-layer perceptron to predict sentence weights. During training, we use the ROUGE score as an alternative to the estimated sentence weight, and try to minimize the gap between estimated weights and predicted weights. In this way, we encourage our model to focus on the key sentences, which have high relevance with the summary. Experimental results show that our approach outperforms baselines on a large-scale social media corpus.
-"
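The sentence-weight idea in the summarization abstract above, using a ROUGE score as a stand-in training target for a small MLP, can be sketched in a few lines. Everything below is illustrative: the features are random stand-ins and the ROUGE-1 recall function is a crude simplification of the real metric.

import torch
import torch.nn as nn

def rouge1_recall(sentence, summary):
    # Crude stand-in for ROUGE: fraction of summary unigrams covered.
    s, ref = set(sentence.split()), set(summary.split())
    return len(s & ref) / max(len(ref), 1)

sentences = ["a storm hit the coast overnight",
             "stocks rose slightly on tuesday",
             "residents near the coast were evacuated"]
summary = "storm hits coast residents evacuated"
targets = torch.tensor([[rouge1_recall(s, summary)] for s in sentences])

mlp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
features = torch.randn(len(sentences), 16)             # stand-in sentence features
loss = nn.functional.mse_loss(mlp(features), targets)  # close the estimated/predicted gap
loss.backward()

At decoding time the predicted weights can then down-weight attention to low-scoring sentences, which is the mechanism the abstract uses to suppress irrelevant content.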
-" -6045,1710.11334,Jingjing Xu,Shallow Discourse Parsing with Maximum Entropy Model,cs.CL," In recent years, more research has been devoted to studying the subtask of -the complete shallow discourse parsing, such as indentifying discourse -connective and arguments of connective. There is a need to design a full -discourse parser to pull these subtasks together. So we develop a discourse -parser turning the free text into discourse relations. The parser includes -connective identifier, arguments identifier, sense classifier and non-explicit -identifier, which connects with each other in pipeline. Each component applies -the maximum entropy model with abundant lexical and syntax features extracted -from the Penn Discourse Tree-bank. The head-based representation of the PDTB is -adopted in the arguments identifier, which turns the problem of indentifying -the arguments of discourse connective into finding the head and end of the -arguments. In the non-explicit identifier, the contextual type features like -words which have high frequency and can reflect the discourse relation are -introduced to improve the performance of non-explicit identifier. Compared with -other methods, experimental results achieve the considerable performance. -" -6046,1710.11342,"Zhengli Zhao, Dheeru Dua, Sameer Singh",Generating Natural Adversarial Examples,cs.LG cs.AI cs.CL cs.CV," Due to their complex nature, it is hard to characterize the ways in which -machine learning models can misbehave or be exploited when deployed. Recent -work on adversarial examples, i.e. inputs with minor perturbations that result -in substantially different model predictions, is helpful in evaluating the -robustness of these models by exposing the adversarial scenarios where they -fail. However, these malicious perturbations are often unnatural, not -semantically meaningful, and not applicable to complicated domains such as -language. In this paper, we propose a framework to generate natural and legible -adversarial examples that lie on the data manifold, by searching in semantic -space of dense and continuous data representation, utilizing the recent -advances in generative adversarial networks. We present generated adversaries -to demonstrate the potential of the proposed approach for black-box classifiers -for a wide range of applications such as image classification, textual -entailment, and machine translation. We include experiments to show that the -generated adversaries are natural, legible to humans, and useful in evaluating -and analyzing black-box classifiers. -" -6047,1710.11344,"Yu Wu, Wei Wu, Chen Xing, Can Xu, Zhoujun Li, Ming Zhou","A Sequential Matching Framework for Multi-turn Response Selection in - Retrieval-based Chatbots",cs.CL," We study the problem of response selection for multi-turn conversation in -retrieval-based chatbots. The task requires matching a response candidate with -a conversation context, whose challenges include how to recognize important -parts of the context, and how to model the relationships among utterances in -the context. Existing matching methods may lose important information in -contexts as we can interpret them with a unified framework in which contexts -are transformed to fixed-length vectors without any interaction with responses -before matching. The analysis motivates us to propose a new matching framework -that can sufficiently carry the important information in contexts to matching -and model the relationships among utterances at the same time. 
-6047,1710.11344,"Yu Wu, Wei Wu, Chen Xing, Can Xu, Zhoujun Li, Ming Zhou","A Sequential Matching Framework for Multi-turn Response Selection in Retrieval-based Chatbots",cs.CL," We study the problem of response selection for multi-turn conversation in retrieval-based chatbots. The task requires matching a response candidate with a conversation context, whose challenges include how to recognize important parts of the context, and how to model the relationships among utterances in the context. Existing matching methods may lose important information in contexts, as we can interpret them with a unified framework in which contexts are transformed to fixed-length vectors without any interaction with responses before matching. The analysis motivates us to propose a new matching framework that can sufficiently carry the important information in contexts to matching and model the relationships among utterances at the same time. The new framework, which we call a sequential matching framework (SMF), lets each utterance in a context interact with a response candidate at the first step and transforms the pair into a matching vector. The matching vectors are then accumulated following the order of the utterances in the context with a recurrent neural network (RNN) which models the relationships among the utterances. The context-response matching is finally calculated with the hidden states of the RNN. Under SMF, we propose a sequential convolutional network and a sequential attention network and conduct experiments on two public data sets to test their performance. Experimental results show that both models can significantly outperform the state-of-the-art matching methods. We also show that the models are interpretable, with visualizations that provide us insights on how they capture and leverage the important information in contexts for matching.
-"
-6048,1710.11350,"Eva Portelance, Amelia Bruno, Daniel Harasim, Leon Bergen and Timothy J. O'Donnell","Grammar Induction for Minimalist Grammars using Variational Bayesian Inference: A Technical Report",cs.CL," The following technical report presents a formal approach to probabilistic minimalist grammar parameter estimation. We describe a formalization of a minimalist grammar. We then present an algorithm for the application of variational Bayesian inference to this formalization.
-"
-6049,1710.11475,"Qiuyuan Huang, Paul Smolensky, Xiaodong He, Li Deng, Dapeng Wu",A Neural-Symbolic Approach to Design of CAPTCHA,cs.CL," CAPTCHAs based on reading text are susceptible to machine-learning-based attacks due to recent significant advances in deep learning (DL). To address this, this paper promotes image/visual captioning based CAPTCHAs, which are robust against machine-learning-based attacks. To develop image/visual-captioning-based CAPTCHAs, this paper proposes a new image captioning architecture by exploiting tensor product representations (TPR), a structured neural-symbolic framework developed in cognitive science over the past 20 years, with the aim of integrating DL with explicit language structures and rules. We call it the Tensor Product Generation Network (TPGN). The key ideas of TPGN are: 1) unsupervised learning of role-unbinding vectors of words via a TPR-based deep neural network, and 2) integration of TPR with typical DL architectures including Long Short-Term Memory (LSTM) models. The novelty of our approach lies in its ability to generate a sentence and extract partial grammatical structure of the sentence by using role-unbinding vectors, which are obtained in an unsupervised manner. Experimental results demonstrate the effectiveness of the proposed approach.
-"
-6050,1710.11601,Lea Frermann and Shay B. Cohen and Mirella Lapata,Whodunnit? Crime Drama as a Case for Natural Language Understanding,cs.CL cs.AI cs.CV," In this paper we argue that crime drama exemplified in television programs such as CSI: Crime Scene Investigation is an ideal testbed for approximating real-world natural language understanding and the complex inferences associated with it. We propose to treat crime drama as a new inference task, capitalizing on the fact that each episode poses the same basic question (i.e., who committed the crime) and naturally provides the answer when the perpetrator is revealed.
We develop a new dataset based on CSI episodes, formalize perpetrator -identification as a sequence labeling problem, and develop an LSTM-based model -which learns from multi-modal data. Experimental results show that an -incremental inference strategy is key to making accurate guesses as well as -learning from representations fusing textual, visual, and acoustic input. -" -6051,1711.00043,"Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio - Ranzato",Unsupervised Machine Translation Using Monolingual Corpora Only,cs.CL cs.AI," Machine translation has recently achieved impressive performance thanks to -recent advances in deep learning and the availability of large-scale parallel -corpora. There have been numerous attempts to extend these successes to -low-resource language pairs, yet requiring tens of thousands of parallel -sentences. In this work, we take this research direction to the extreme and -investigate whether it is possible to learn to translate even without any -parallel data. We propose a model that takes sentences from monolingual corpora -in two different languages and maps them into the same latent space. By -learning to reconstruct in both languages from this shared feature space, the -model effectively learns to translate without using any labeled data. We -demonstrate our model on two widely used datasets and two language pairs, -reporting BLEU scores of 32.8 and 15.1 on the Multi30k and WMT English-French -datasets, without using even a single parallel sentence at training time. -" -6052,1711.00092,"Amita Misra, Shereen Oraby, Shubhangi Tandon, Sharath TS, Pranav Anand - and Marilyn Walker",Summarizing Dialogic Arguments from Social Media,cs.CL," Online argumentative dialog is a rich source of information on popular -beliefs and opinions that could be useful to companies as well as governmental -or public policy agencies. Compact, easy to read, summaries of these dialogues -would thus be highly valuable. A priori, it is not even clear what form such a -summary should take. Previous work on summarization has primarily focused on -summarizing written texts, where the notion of an abstract of the text is well -defined. We collect gold standard training data consisting of five human -summaries for each of 161 dialogues on the topics of Gay Marriage, Gun Control -and Abortion. We present several different computational models aimed at -identifying segments of the dialogues whose content should be used for the -summary, using linguistic features and Word2vec features with both SVMs and -Bidirectional LSTMs. We show that we can identify the most important arguments -by using the dialog context with a best F-measure of 0.74 for gun control, 0.71 -for gay marriage, and 0.67 for abortion. -" -6053,1711.00106,"Caiming Xiong, Victor Zhong, Richard Socher","DCN+: Mixed Objective and Deep Residual Coattention for Question - Answering",cs.CL cs.AI," Traditional models for question answering optimize using cross entropy loss, -which encourages exact answers at the cost of penalizing nearby or overlapping -answers that are sometimes equally accurate. We propose a mixed objective that -combines cross entropy loss with self-critical policy learning. The objective -uses rewards derived from word overlap to solve the misalignment between -evaluation metric and optimization objective. 
In addition to the mixed objective, we improve dynamic coattention networks (DCN) with a deep residual coattention encoder that is inspired by recent work in deep self-attention and residual networks. Our proposals improve model performance across question types and input lengths, especially for long questions that require the ability to capture long-term dependencies. On the Stanford Question Answering Dataset, our model achieves state-of-the-art results with 75.1% exact match accuracy and 83.1% F1, while the ensemble obtains 78.9% exact match accuracy and 86.0% F1.
-"
-6054,1711.00155,"Pavlos Vougiouklis, Hady Elsahar, Lucie-Aim\'ee Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare and Elena Simperl","Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples",cs.CL," Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We train and evaluate our models on two corpora of loosely aligned Wikipedia snippets and DBpedia and Wikidata triples with promising results.
-"
-6055,1711.00179,"Boyuan Pan, Hao Li, Zhou Zhao, Deng Cai, Xiaofei He","Keyword-based Query Comprehending via Multiple Optimized-Demand Augmentation",cs.CL," In this paper, we consider the problem of the machine reading task when the questions are in the form of keywords, rather than natural language. In recent years, researchers have achieved significant success on machine reading comprehension tasks, such as SQuAD and TriviaQA. These datasets provide a natural language question sentence and a pre-selected passage, and the goal is to answer the question according to the passage. However, in the situation of interacting with machines by means of text, people are more likely to raise a query in the form of several keywords rather than a complete sentence. Keyword-based query comprehension is a new challenge, because small variations to a question may completely change its semantic information, thus yielding different answers. In this paper, we propose a novel neural network system that consists of a Demand Optimization Model based on a passage-attention neural machine translation and a Reader Model that can find the answer given the optimized question. The Demand Optimization Model optimizes the original query and outputs multiple reconstructed questions, then the Reader Model takes the new questions as input and locates the answers from the passage. To make predictions robust, an evaluation mechanism scores the reconstructed questions so that the final answer strikes a good balance between the quality of both the Demand Optimization Model and the Reader Model. Experimental results on several datasets show that our framework significantly improves multiple strong baselines on this challenging task.
-"
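The mixed objective in the DCN+ abstract combines maximum-likelihood training with self-critical policy learning, where the word-overlap F1 of a sampled answer is rewarded relative to the greedy decode. An illustrative sketch of the loss combination follows; the names and the lambda weighting are assumptions, not the paper's exact hyperparameters.

from collections import Counter

def overlap_f1(pred, gold):
    # Word-overlap F1, the reward that replaces brittle exact match.
    common = sum((Counter(pred) & Counter(gold)).values())
    if common == 0:
        return 0.0
    p, r = common / len(pred), common / len(gold)
    return 2 * p * r / (p + r)

def mixed_objective(ce_loss, sampled_logp, sampled, greedy, gold, lam=0.5):
    # Self-critical baseline: only samples that beat the greedy decode get
    # a positive advantage, which keeps the policy gradient low-variance.
    advantage = overlap_f1(sampled, gold) - overlap_f1(greedy, gold)
    return ce_loss + lam * (-advantage * sampled_logp)

loss = mixed_objective(ce_loss=2.3, sampled_logp=-4.1,
                       sampled=["in", "1876"], greedy=["1876"], gold=["in", "1876"])

Rewarding overlap rather than exact span identity is what stops nearby or overlapping answers, which are often equally acceptable, from being penalized as hard errors.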
-6056,1711.00247,"Bernardt Duvenhage, Mfundo Ntini, Phala Ramonyai",Improved Text Language Identification for the South African Languages,cs.CL," Virtual assistants and text chatbots have recently been gaining popularity. Given the short-message nature of text-based chat interactions, the language identification systems of these bots might only have 15 or 20 characters to make a prediction. However, accurate text language identification is important, especially in the early stages of many multilingual natural language processing pipelines. This paper investigates the use of a naive Bayes classifier to accurately predict the language family that a piece of text belongs to, combined with a lexicon-based classifier to distinguish the specific South African language that the text is written in. This approach leads to a 31% reduction in the language detection error. In the spirit of reproducible research, the training and testing datasets as well as the code are published on GitHub. We hope they will be useful for creating a text language identification shared task for South African languages.
-"
-6057,1711.00279,"Zichao Li, Xin Jiang, Lifeng Shang, Hang Li",Paraphrase Generation with Deep Reinforcement Learning,cs.CL," Automatic generation of paraphrases from a given sentence is an important yet challenging task in natural language processing (NLP), and plays a key role in a number of applications such as question answering, search, and dialogue. In this paper, we present a deep reinforcement learning approach to paraphrase generation. Specifically, we propose a new framework for the task, which consists of a \textit{generator} and an \textit{evaluator}, both of which are learned from data. The generator, built as a sequence-to-sequence learning model, can produce paraphrases given a sentence. The evaluator, constructed as a deep matching model, can judge whether two sentences are paraphrases of each other. The generator is first trained by deep learning and then further fine-tuned by reinforcement learning in which the reward is given by the evaluator. For the learning of the evaluator, we propose two methods based on supervised learning and inverse reinforcement learning respectively, depending on the type of available training data. Empirical study shows that the learned evaluator can guide the generator to produce more accurate paraphrases. Experimental results demonstrate the proposed models (the generators) outperform the state-of-the-art methods in paraphrase generation in both automatic evaluation and human evaluation.
-"
-6058,1711.00294,"Shikang Du, Xiaojun Wan and Yajie Ye","Towards Automatic Generation of Entertaining Dialogues in Chinese Crosstalks",cs.CL," Crosstalk, also known by its Chinese name xiangsheng, is a traditional Chinese comedic performing art featuring jokes and funny dialogues, and one of China's most popular cultural elements. It is typically in the form of a dialogue between two performers for the purpose of bringing laughter to the audience, with one person acting as the leading comedian and the other as the supporting role. Though general dialogue generation has been widely explored in previous studies, it is unknown whether such entertaining dialogues can be automatically generated or not. In this paper, we for the first time investigate the possibility of automatic generation of entertaining dialogues in Chinese crosstalks. Given the utterance of the leading comedian in each dialogue, our task aims to generate the replying utterance of the supporting role. We propose a humor-enhanced translation model to address this task, and human evaluation results demonstrate the efficacy of our proposed model.
The -feasibility of automatic entertaining dialogue generation is also verified. -" -6059,1711.00309,"Jingyi Zhang, Masao Utiyama, Eiichro Sumita, Graham Neubig, Satoshi - Nakamura","Improving Neural Machine Translation through Phrase-based Forced - Decoding",cs.CL," Compared to traditional statistical machine translation (SMT), neural machine -translation (NMT) often sacrifices adequacy for the sake of fluency. We propose -a method to combine the advantages of traditional SMT and NMT by exploiting an -existing phrase-based SMT model to compute the phrase-based decoding cost for -an NMT output and then using this cost to rerank the n-best NMT outputs. The -main challenge in implementing this approach is that NMT outputs may not be in -the search space of the standard phrase-based decoding algorithm, because the -search space of phrase-based SMT is limited by the phrase-based translation -rule table. We propose a soft forced decoding algorithm, which can always -successfully find a decoding path for any NMT output. We show that using the -forced decoding cost to rerank the NMT outputs can successfully improve -translation quality on four different language pairs. -" -6060,1711.00313,"Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps","Avoiding Your Teacher's Mistakes: Training Neural Networks with - Controlled Weak Supervision",cs.LG cs.CL cs.NE stat.ML," Training deep neural networks requires massive amounts of training data, but -for many tasks only limited labeled data is available. This makes weak -supervision attractive, using weak or noisy signals like the output of -heuristic methods or user click-through data for training. In a semi-supervised -setting, we can use a large set of data with weak labels to pretrain a neural -network and then fine-tune the parameters with a small amount of data with true -labels. This feels intuitively sub-optimal as these two independent stages -leave the model unaware about the varying label quality. What if we could -somehow inform the model about the label quality? In this paper, we propose a -semi-supervised learning method where we train two neural networks in a -multi-task fashion: a ""target network"" and a ""confidence network"". The target -network is optimized to perform a given task and is trained using a large set -of unlabeled data that are weakly annotated. We propose to weight the gradient -updates to the target network using the scores provided by the second -confidence network, which is trained on a small amount of supervised data. Thus -we avoid that the weight updates computed from noisy labels harm the quality of -the target network model. We evaluate our learning strategy on two different -tasks: document ranking and sentiment classification. The results demonstrate -that our approach not only enhances the performance compared to the baselines -but also speeds up the learning process from weak labels. -" -6061,1711.00331,"Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur",Semantic Structure and Interpretability of Word Embeddings,cs.CL," Dense word embeddings, which encode semantic meanings of words to low -dimensional vector spaces have become very popular in natural language -processing (NLP) research due to their state-of-the-art performances in many -NLP tasks. Word embeddings are substantially successful in capturing semantic -relations among words, so a meaningful semantic structure must be present in -the respective vector spaces. 
However, in many cases, this semantic structure -is broadly and heterogeneously distributed across the embedding dimensions, -which makes interpretation a big challenge. In this study, we propose a -statistical method to uncover the latent semantic structure in the dense word -embeddings. To perform our analysis we introduce a new dataset (SEMCAT) that -contains more than 6500 words semantically grouped under 110 categories. We -further propose a method to quantify the interpretability of the word -embeddings; the proposed method is a practical alternative to the classical -word intrusion test that requires human intervention. -" -6062,1711.00350,Brenden M. Lake and Marco Baroni,"Generalization without systematicity: On the compositional skills of - sequence-to-sequence recurrent networks",cs.CL cs.AI cs.LG," Humans can understand and produce new utterances effortlessly, thanks to -their compositional skills. Once a person learns the meaning of a new verb -""dax,"" he or she can immediately understand the meaning of ""dax twice"" or ""sing -and dax."" In this paper, we introduce the SCAN domain, consisting of a set of -simple compositional navigation commands paired with the corresponding action -sequences. We then test the zero-shot generalization capabilities of a variety -of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence -methods. We find that RNNs can make successful zero-shot generalizations when -the differences between training and test commands are small, so that they can -apply ""mix-and-match"" strategies to solve the task. However, when -generalization requires systematic compositional skills (as in the ""dax"" -example above), RNNs fail spectacularly. We conclude with a proof-of-concept -experiment in neural machine translation, suggesting that lack of systematicity -might be partially responsible for neural networks' notorious training data -thirst. -" -6063,1711.00354,"Ryosuke Sonobe, Shinnosuke Takamichi, Hiroshi Saruwatari","JSUT corpus: free large-scale Japanese speech corpus for end-to-end - speech synthesis",cs.CL," Thanks to improvements in machine learning techniques including deep -learning, a free large-scale speech corpus that can be shared between academic -institutions and commercial companies has an important role. However, such a -corpus for Japanese speech synthesis does not exist. In this paper, we designed -a novel Japanese speech corpus, named the ""JSUT corpus,"" that is aimed at -achieving end-to-end speech synthesis. The corpus consists of 10 hours of -reading-style speech data and its transcription and covers all of the main -pronunciations of daily-use Japanese characters. In this paper, we describe how -we designed and analyzed the corpus. The corpus is freely available online. -" -6064,1711.00482,"Jacob Andreas, Dan Klein, Sergey Levine",Learning with Latent Language,cs.CL cs.NE," The named concepts and compositional operators present in natural language -provide a rich source of information about the kinds of abstractions humans use -to navigate the world. Can this linguistic background knowledge improve the -generality and efficiency of learned classifiers and control policies? This -paper aims to show that using the space of natural language strings as a -parameter space is an effective way to capture natural task structure. In a -pretraining phase, we learn a language interpretation model that transforms -inputs (e.g. images) into outputs (e.g. labels) given natural language -descriptions. To learn a new concept (e.g. 
a classifier), we search directly in -the space of descriptions to minimize the interpreter's loss on training -examples. Crucially, our models do not require language data to learn these -concepts: language is used only in pretraining to impose structure on -subsequent learning. Results on image classification, text editing, and -reinforcement learning show that, in all settings, models with a linguistic -parameterization outperform those without. -" -6065,1711.00513,"Rachel Bawden, Rico Sennrich, Alexandra Birch and Barry Haddow",Evaluating Discourse Phenomena in Neural Machine Translation,cs.CL," For machine translation to tackle discourse phenomena, models must have -access to extra-sentential linguistic context. There has been recent interest -in modelling context in neural machine translation (NMT), but models have been -principally evaluated with standard automatic metrics, poorly adapted to -evaluating discourse phenomena. In this article, we present hand-crafted, -discourse test sets, designed to test the models' ability to exploit previous -source and target sentences. We investigate the performance of recently -proposed multi-encoder NMT models trained on subtitles for English to French. -We also explore a novel way of exploiting context from the previous sentence. -Despite gains using BLEU, multi-encoder models give limited improvement in the -handling of discourse phenomena: 50% accuracy on our coreference test set and -53.5% for coherence/cohesion (compared to a non-contextual baseline of 50%). A -simple strategy of decoding the concatenation of the previous and current -sentence leads to good performance, and our novel strategy of multi-encoding -and decoding of two sentences leads to the best performance (72.5% for -coreference and 57% for coherence/cohesion), highlighting the importance of -target-side context. -" -6066,1711.00520,"Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric - Battenberg, Rob Clark, Rif A. Saurous",Uncovering Latent Style Factors for Expressive Speech Synthesis,cs.CL cs.SD," Prosodic modeling is a core problem in speech synthesis. The key challenge is -producing desirable prosody from textual input containing only phonetic -information. In this preliminary study, we introduce the concept of ""style -tokens"" in Tacotron, a recently proposed end-to-end neural speech synthesis -model. Using style tokens, we aim to extract independent prosodic styles from -training data. We show that without annotation data or an explicit supervision -signal, our approach can automatically learn a variety of prosodic variations -in a purely data-driven way. Importantly, each style token corresponds to a -fixed style factor regardless of the given text sequence. As a result, we can -control the prosodic style of synthetic speech in a somewhat predictable and -globally consistent way. -" -6067,1711.00529,"Angus G. Forbes, Kristine Lee, Gus Hahn-Powell, Marco A. - Valenzuela-Esc\'arcega, Mihai Surdeanu",Text Annotation Graphs: Annotating Complex Natural Language Phenomena,cs.CL," This paper introduces a new web-based software tool for annotating text, Text -Annotation Graphs, or TAG. It provides functionality for representing complex -relationships between words and word phrases that are not available in other -software tools, including the ability to define and visualize relationships -between the relationships themselves (semantic hypergraphs). 
Additionally, we include an approach to representing text annotations in which annotation subgraphs, or semantic summaries, are used to show relationships outside of the sequential context of the text itself. Users can use these subgraphs to quickly find similar structures within the current document or external annotated documents. Initially, TAG was developed to support information extraction tasks on a large database of biomedical articles. However, our software is flexible enough to support a wide range of annotation tasks for any domain. Examples are provided that showcase TAG's capabilities on morphological parsing and event extraction tasks. The TAG software is available at: https://github.com/CreativeCodingLab/TextAnnotationGraphs.
-"
-6068,1711.00549,"Anjishnu Kumar, Arpit Gupta, Julian Chan, Sam Tucker, Bjorn Hoffmeister, Markus Dreyer, Stanislav Peshterliev, Ankur Gandhe, Denis Filiminov, Ariya Rastrow, Christian Monson and Agnika Kumar","Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding",cs.CL cs.AI cs.NE cs.SE," This paper presents the design of the machine learning architecture that underlies the Alexa Skills Kit (ASK), a large-scale Spoken Language Understanding (SLU) Software Development Kit (SDK) that enables developers to extend the capabilities of Amazon's virtual assistant, Alexa. At Amazon, the infrastructure powers over 25,000 skills deployed through the ASK, as well as AWS's Amazon Lex SLU Service. The ASK emphasizes flexibility, predictability and a rapid iteration cycle for third party developers. It imposes inductive biases that allow it to learn robust SLU models from extremely small and sparse datasets and, in doing so, removes significant barriers to entry for software developers and dialogue systems researchers.
-"
-6069,1711.00681,"Akbar Karimi, Ebrahim Ansari and Bahram Sadeghi Bigham",Extracting an English-Persian Parallel Corpus from Comparable Corpora,cs.CL cs.IR," Parallel data are an important part of a reliable Statistical Machine Translation (SMT) system. The more of these data are available, the better the quality of the SMT system. However, for some language pairs such as Persian-English, parallel sources of this kind are scarce. In this paper, a bidirectional method is proposed to extract parallel sentences from English and Persian document-aligned Wikipedia. Two machine translation systems are employed to translate from Persian to English and the reverse, after which an IR system is used to measure the similarity of the translated sentences. Adding the extracted sentences to the training data of the existing SMT systems is shown to improve the quality of the translation. Furthermore, the proposed method slightly outperforms the one-directional approach. The extracted corpus consists of about 200,000 sentences which have been sorted by their degree of similarity calculated by the IR system and is freely available for public access on the Web.
-"
-6070,1711.00768,Ana Marasovi\'c and Anette Frank,"SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling",cs.CL," For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question ""Who expressed what kind of sentiment towards what?"". Recent neural approaches do not outperform the state-of-the-art feature-based models for Opinion Role Labeling (ORL).
We suspect this is due to the scarcity of labeled training data -and address this issue using different multi-task learning (MTL) techniques -with a related task which has substantially more data, i.e. Semantic Role -Labeling (SRL). We show that two MTL models improve significantly over the -single-task model for labeling of both holders and targets, on the development -and the test sets. We found that the vanilla MTL model which makes predictions -using only shared ORL and SRL features, performs the best. With deeper analysis -we determine what works and what might be done to make further improvements for -ORL. -" -6071,1711.00894,"Swabha Swayamdipta, Ankur P. Parikh, Tom Kwiatkowski",Multi-Mention Learning for Reading Comprehension with Neural Cascades,cs.CL," Reading comprehension is a challenging task, especially when executed across -longer or across multiple evidence documents, where the answer is likely to -reoccur. Existing neural architectures typically do not scale to the entire -evidence, and hence, resort to selecting a single passage in the document -(either via truncation or other means), and carefully searching for the answer -within that passage. However, in some cases, this strategy can be suboptimal, -since by focusing on a specific passage, it becomes difficult to leverage -multiple mentions of the same answer throughout the document. In this work, we -take a different approach by constructing lightweight models that are combined -in a cascade to find the answer. Each submodel consists only of feed-forward -networks equipped with an attention mechanism, making it trivially -parallelizable. We show that our approach can scale to approximately an order -of magnitude larger evidence documents and can aggregate information at the -representation level from multiple mentions of each answer candidate across the -document. Empirically, our approach achieves state-of-the-art performance on -both the Wikipedia and web domains of the TriviaQA dataset, outperforming more -complex, recurrent architectures. -" -6072,1711.00938,"Manex Agirrezabal, I\~naki Alegria, Mans Hulden",A Comparison of Feature-Based and Neural Scansion of Poetry,cs.CL," Automatic analysis of poetic rhythm is a challenging task that involves -linguistics, literature, and computer science. When the language to be analyzed -is known, rule-based systems or data-driven methods can be used. In this paper, -we analyze poetic rhythm in English and Spanish. We show that the -representations of data learned from character-based neural models are more -informative than the ones from hand-crafted features, and that a -Bi-LSTM+CRF-model produces state-of-the art accuracy on scansion of poetry in -two languages. Results also show that the information about whole word -structure, and not just independent syllables, is highly informative for -performing scansion. -" -6073,1711.01006,"Yining Wang, Yang Zhao, Jiajun Zhang, Chengqing Zong, Zhengshan Xue",Towards Neural Machine Translation with Partially Aligned Corpora,cs.CL," While neural machine translation (NMT) has become the new paradigm, the -parameter optimization requires large-scale parallel data which is scarce in -many domains and language pairs. In this paper, we address a new translation -scenario in which there only exists monolingual corpora and phrase pairs. We -propose a new method towards translation with partially aligned sentence pairs -which are derived from the phrase pairs and monolingual corpora. 
To make full -use of the partially aligned corpora, we adapt the conventional NMT training -method in two aspects. On one hand, different generation strategies are -designed for aligned and unaligned target words. On the other hand, a different -objective function is designed to model the partially aligned parts. The -experiments demonstrate that our method can achieve a relatively good result in -such a translation scenario, and tiny bitexts can boost translation quality to -a large extent. -" -6074,1711.01048,"Saurabh Garg, Tanmay Parekh, Preethi Jyothi",Dual Language Models for Code Switched Speech Recognition,cs.CL," In this work, we present a simple and elegant approach to language modeling -for bilingual code-switched text. Since code-switching is a blend of two or -more different languages, a standard bilingual language model can be improved -upon by using structures of the monolingual language models. We propose a novel -technique called dual language models, which involves building two -complementary monolingual language models and combining them using a -probabilistic model for switching between the two. We evaluate the efficacy of -our approach using a conversational Mandarin-English speech corpus. We prove -the robustness of our model by showing significant improvements in perplexity -measures over the standard bilingual language model without the use of any -external information. Similar consistent improvements are also reflected in -automatic speech recognition error rates. -" -6075,1711.01068,"Raphael Shu, Hideki Nakayama",Compressing Word Embeddings via Deep Compositional Code Learning,cs.CL," Natural language processing (NLP) models often require a massive number of -parameters for word embeddings, resulting in a large storage or memory -footprint. Deploying neural NLP models to mobile devices requires compressing -the word embeddings without any significant sacrifices in performance. For this -purpose, we propose to construct the embeddings with few basis vectors. For -each word, the composition of basis vectors is determined by a hash code. To -maximize the compression rate, we adopt the multi-codebook quantization -approach instead of binary coding scheme. Each code is composed of multiple -discrete numbers, such as (3, 2, 1, 8), where the value of each component is -limited to a fixed range. We propose to directly learn the discrete codes in an -end-to-end neural network by applying the Gumbel-softmax trick. Experiments -show the compression rate achieves 98% in a sentiment analysis task and 94% ~ -99% in machine translation tasks without performance loss. In both tasks, the -proposed method can improve the model performance by slightly lowering the -compression rate. Compared to other approaches such as character-level -segmentation, the proposed method is language-independent and does not require -modifications to the network architecture. -" -6076,1711.01100,Johannes Bjerva,"One Model to Rule them all: Multitask and Multilingual Modelling for - Lexical Analysis",cs.CL," When learning a new skill, you take advantage of your preexisting skills and -knowledge. For instance, if you are a skilled violinist, you will likely have -an easier time learning to play cello. Similarly, when learning a new language -you take advantage of the languages you already speak. For instance, if your -native language is Norwegian and you decide to learn Dutch, the lexical overlap -between these two languages will likely benefit your rate of language -acquisition. 
This thesis deals with the intersection of learning multiple tasks
-and learning multiple languages in the context of Natural Language Processing
-(NLP), which can be defined as the study of computational processing of human
-language. Although these two types of learning may seem different on the
-surface, we will see that they share many similarities.
- The traditional approach in NLP is to consider a single task for a single
-language at a time. However, recent advances allow for broadening this
-approach, by considering data for multiple tasks and languages simultaneously.
-This is an important approach to explore further as the key to improving the
-reliability of NLP, especially for low-resource languages, is to take advantage
-of all relevant data whenever possible. In doing so, the hope is that in the
-long term, low-resource languages can benefit from the advances made in NLP
-which are currently to a large extent reserved for high-resource languages.
-This, in turn, may then have positive consequences for, e.g., language
-preservation, as speakers of minority languages will be under less pressure
-to use high-resource languages. In the short term, answering the
-specific research questions posed should be of use to NLP researchers working
-towards the same goal.
-"
-6077,1711.01161,"Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz,
- Gabriel Synnaeve, Emmanuel Dupoux",Learning Filterbanks from Raw Speech for Phone Recognition,cs.CL," We train a bank of complex filters that operates on the raw waveform and is
-fed into a convolutional neural network for end-to-end phone recognition. These
-time-domain filterbanks (TD-filterbanks) are initialized as an approximation of
-mel-filterbanks, and then fine-tuned jointly with the remaining convolutional
-architecture. We perform phone recognition experiments on TIMIT and show that
-for several architectures, models trained on TD-filterbanks consistently
-outperform their counterparts trained on comparable mel-filterbanks. We get our
-best performance by learning all front-end steps, from pre-emphasis up to
-averaging. Finally, we observe that the filters at convergence have an
-asymmetric impulse response, and that some of them remain almost analytic.
-"
-6078,1711.01362,Venkatesh Duppada,"""Attention"" for Detecting Unreliable News in the Information Age",cs.CL," Unreliable news is any piece of information which is false or misleading,
-deliberately spread to promote political, ideological and financial agendas.
-Recently, the problem of unreliable news has received a lot of attention, as
-the number of instances of news and social media outlets being used for
-propaganda has increased rapidly. This poses a serious threat to society, which
-calls for technology to automatically and reliably identify unreliable news
-sources. This paper is an effort made in this direction to build systems for
-detecting unreliable news articles. In this paper, various NLP algorithms were
-built and evaluated on the Unreliable News Data 2017 dataset. Variants of
-hierarchical attention networks (HAN) are presented for encoding and
-classifying news articles, achieving a best result of 0.944 ROC-AUC. Finally,
-attention layer weights are visualized to understand and give insight into the
-decisions made by HANs. The results obtained are very promising and encourage
-deploying and using these systems in the real world to mitigate the problem of
-unreliable news. 
-" -6079,1711.01386,"Yuan Yang, Pengtao Xie, Xin Gao, Carol Cheng, Christy Li, Hongbao - Zhang and Eric Xing","Predicting Discharge Medications at Admission Time Based on Deep - Learning",cs.CL," Predicting discharge medications right after a patient being admitted is an -important clinical decision, which provides physicians with guidance on what -type of medication regimen to plan for and what possible changes on initial -medication may occur during an inpatient stay. It also facilitates medication -reconciliation process with easy detection of medication discrepancy at -discharge time to improve patient safety. However, since the information -available upon admission is limited and patients' condition may evolve during -an inpatient stay, these predictions could be a difficult decision for -physicians to make. In this work, we investigate how to leverage deep learning -technologies to assist physicians in predicting discharge medications based on -information documented in the admission note. We build a convolutional neural -network which takes an admission note as input and predicts the medications -placed on the patient at discharge time. Our method is able to distill semantic -patterns from unstructured and noisy texts, and is capable of capturing the -pharmacological correlations among medications. We evaluate our method on 25K -patient visits and compare with 4 strong baselines. Our methods demonstrate a -20% increase in macro-averaged F1 score than the best baseline. -" -6080,1711.01416,"Vasily Pestun, John Terilla, Yiannis Vlassopoulos",Language as a matrix product state,cs.CL cond-mat.dis-nn cs.LG cs.NE stat.ML," We propose a statistical model for natural language that begins by -considering language as a monoid, then representing it in complex matrices with -a compatible translation invariant probability measure. We interpret the -probability measure as arising via the Born rule from a translation invariant -matrix product state. -" -6081,1711.01427,"Jingjing Xu, Xu Sun, Sujian Li, Xiaoyan Cai and Bingzhen Wei","Deep Stacking Networks for Low-Resource Chinese Word Segmentation with - Transfer Learning",cs.CL," In recent years, neural networks have proven to be effective in Chinese word -segmentation. However, this promising performance relies on large-scale -training data. Neural networks with conventional architectures cannot achieve -the desired results in low-resource datasets due to the lack of labelled -training data. In this paper, we propose a deep stacking framework to improve -the performance on word segmentation tasks with insufficient data by -integrating datasets from diverse domains. Our framework consists of two parts, -domain-based models and deep stacking networks. The domain-based models are -used to learn knowledge from different datasets. The deep stacking networks are -designed to integrate domain-based models. To reduce model conflicts, we -innovatively add communication paths among models and design various structures -of deep stacking networks, including Gaussian-based Stacking Networks, -Concatenate-based Stacking Networks, Sequence-based Stacking Networks and -Tree-based Stacking Networks. We conduct experiments on six low-resource -datasets from various domains. Our proposed framework shows significant -performance improvements on all datasets compared with several strong -baselines. -" -6082,1711.01505,"Allyson Ettinger, Sudha Rao, Hal Daum\'e III, Emily M. 
Bender","Towards Linguistically Generalizable NLP Systems: A Workshop and Shared - Task",cs.CL," This paper presents a summary of the first Workshop on Building -Linguistically Generalizable Natural Language Processing Systems, and the -associated Build It Break It, The Language Edition shared task. The goal of -this workshop was to bring together researchers in NLP and linguistics with a -shared task aimed at testing the generalizability of NLP systems beyond the -distributions of their training data. We describe the motivation, setup, and -participation of the shared task, provide discussion of some highlighted -results, and discuss lessons learned. -" -6083,1711.01515,Yu-An Chung and James Glass,Learning Word Embeddings from Speech,cs.CL," In this paper, we propose a novel deep neural network architecture, -Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length -vector representations of audio segments excised from a speech corpus, where -the vectors contain semantic information pertaining to the segments, and are -close to other vectors in the embedding space if their corresponding segments -are semantically similar. The design of the proposed model is based on the RNN -Encoder-Decoder framework, and borrows the methodology of continuous skip-grams -for training. The learned vector representations are evaluated on 13 widely -used word similarity benchmarks, and achieved competitive results to that of -GloVe. The biggest advantage of the proposed model is its capability of -extracting semantic information of audio segments taken directly from raw -speech, without relying on any other modalities such as text or images, which -are challenging and expensive to collect and annotate. -" -6084,1711.01563,"Daochen Zha, Chenliang Li",Multi-label Dataless Text Classification with Topic Modeling,cs.IR cs.CL," Manually labeling documents is tedious and expensive, but it is essential for -training a traditional text classifier. In recent years, a few dataless text -classification techniques have been proposed to address this problem. However, -existing works mainly center on single-label classification problems, that is, -each document is restricted to belonging to a single category. In this paper, -we propose a novel Seed-guided Multi-label Topic Model, named SMTM. With a few -seed words relevant to each category, SMTM conducts multi-label classification -for a collection of documents without any labeled document. In SMTM, each -category is associated with a single category-topic which covers the meaning of -the category. To accommodate with multi-labeled documents, we explicitly model -the category sparsity in SMTM by using spike and slab prior and weak smoothing -prior. That is, without using any threshold tuning, SMTM automatically selects -the relevant categories for each document. To incorporate the supervision of -the seed words, we propose a seed-guided biased GPU (i.e., generalized Polya -urn) sampling procedure to guide the topic inference of SMTM. Experiments on -two public datasets show that SMTM achieves better classification accuracy than -state-of-the-art alternatives and even outperforms supervised solutions in some -scenarios. -" -6085,1711.01567,"Anuroop Sriram, Heewoo Jun, Yashesh Gaur and Sanjeev Satheesh",Robust Speech Recognition Using Generative Adversarial Networks,cs.CL cs.LG," This paper describes a general, scalable, end-to-end framework that uses the -generative adversarial network (GAN) objective to enable robust speech -recognition. 
Encoders trained with the proposed approach enjoy improved -invariance by learning to map noisy audio to the same embedding space as that -of clean audio. Unlike previous methods, the new framework does not rely on -domain expertise or simplifying assumptions as are often needed in signal -processing, and directly encourages robustness in a data-driven way. We show -the new approach improves simulated far-field speech recognition of vanilla -sequence-to-sequence models without specialized front-ends or preprocessing. -" -6086,1711.01684,Anjalie Field,Authorship Analysis of Xenophon's Cyropaedia,cs.CL," In the past several decades, many authorship attribution studies have used -computational methods to determine the authors of disputed texts. Disputed -authorship is a common problem in Classics, since little information about -ancient documents has survived the centuries. Many scholars have questioned the -authenticity of the final chapter of Xenophon's Cyropaedia, a 4th century B.C. -historical text. In this study, we use N-grams frequency vectors with a cosine -similarity function and word frequency vectors with Naive Bayes Classifiers -(NBC) and Support Vector Machines (SVM) to analyze the authorship of the -Cyropaedia. Although the N-gram analysis shows that the epilogue of the -Cyropaedia differs slightly from the rest of the work, comparing the analysis -of Xenophon with analyses of Aristotle and Plato suggests that this difference -is not significant. Both NBC and SVM analyses of word frequencies show that the -final chapter of the Cyropaedia is closely related to the other chapters of the -Cyropaedia. Therefore, this analysis suggests that the disputed chapter was -written by Xenophon. This information can help scholars better understand the -Cyropaedia and also demonstrates the usefulness of applying modern authorship -analysis techniques to classical literature. -" -6087,1711.01694,"Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, - Eugene Weinstein, Kanishka Rao",Multilingual Speech Recognition With A Single End-To-End Model,eess.AS cs.AI cs.CL," Training a conventional automatic speech recognition (ASR) system to support -multiple languages is challenging because the sub-word unit, lexicon and word -inventories are typically language specific. In contrast, sequence-to-sequence -models are well suited for multilingual ASR because they encapsulate an -acoustic, pronunciation and language model jointly in a single network. In this -work we present a single sequence-to-sequence ASR model trained on 9 different -Indian languages, which have very little overlap in their scripts. -Specifically, we take a union of language-specific grapheme sets and train a -grapheme-based sequence-to-sequence model jointly on data from all languages. -We find that this model, which is not explicitly given any information about -language identity, improves recognition performance by 21% relative compared to -analogous sequence-to-sequence models trained on each language individually. By -modifying the model to accept a language identifier as an additional input -feature, we further improve performance by an additional 7% relative and -eliminate confusion between different languages. -" -6088,1711.01701,"Wei Li, Zheng Yang","Distributed Representation for Traditional Chinese Medicine Herb via - Deep Learning Models",cs.CL," Traditional Chinese Medicine (TCM) has accumulated a big amount of precious -resource in the long history of development. 
TCM prescriptions, which consist of
-TCM herbs, are an important form of TCM treatment; they are similar to natural
-language documents, but in a weakly ordered fashion. Directly adapting language
-modeling style methods to learn the embeddings of the herbs can be problematic
-because the herbs are not strictly ordered: the herbs at the front of a
-prescription can be connected to the very last ones. In this paper, we propose
-to represent TCM herbs with distributed representations via Prescription Level
-Language Modeling (PLLM). In one of our experiments, the correlation between
-our calculated similarity between medicines and the judgment of professionals
-achieves a Spearman score of 55.35, indicating a strong correlation, which
-surpasses human beginners (bachelor students in TCM-related fields) by a large
-margin (over 10%).
-"
-6089,1711.01731,"Hongshen Chen, Xiaorui Liu, Dawei Yin, and Jiliang Tang",A Survey on Dialogue Systems: Recent Advances and New Frontiers,cs.CL," Dialogue systems have attracted more and more attention. Recent advances in
-dialogue systems are overwhelmingly contributed by deep learning techniques,
-which have been employed to enhance a wide range of big data applications such
-as computer vision, natural language processing, and recommender systems. For
-dialogue systems, deep learning can leverage a massive amount of data to learn
-meaningful feature representations and response generation strategies, while
-requiring a minimum amount of hand-crafting. In this article, we give an
-overview of these recent advances in dialogue systems from various perspectives
-and discuss some possible research directions. In particular, we generally
-divide existing dialogue systems into task-oriented and non-task-oriented
-models, then detail how deep learning techniques help them with representative
-algorithms, and finally discuss some appealing research directions that can
-bring dialogue system research into a new frontier.
-"
-6090,1711.01804,"Lukas Svoboda, Slobodan Beliga",Evaluation of Croatian Word Embeddings,cs.CL," Croatian is a poorly resourced and highly inflected language from the Slavic
-language family. Nowadays, research is focusing mostly on English. We created a
-new word analogy corpus based on the original English Word2vec word analogy
-corpus and added some of the specific linguistic aspects of the Croatian
-language. Next, we created Croatian WordSim353 and RG65 corpora for a basic
-evaluation of word similarities. We compared the created corpora on two popular
-word representation models, based on the Word2Vec and fastText tools. The
-models were trained on a 1.37B-token training corpus and tested on a new
-robust Croatian word analogy corpus. Results show that the models are able to
-create meaningful word representations. This research has shown that free word
-order and the high morphological complexity of the Croatian language influence
-the quality of the resulting word embeddings.
-"
-6091,1711.01921,Rakshith Shetty and Bernt Schiele and Mario Fritz,"$A^{4}NT$: Author Attribute Anonymity by Adversarial Training of Neural
- Machine Translation",cs.CR cs.CL cs.CY cs.SI stat.ML," Text-based analysis methods make it possible to reveal privacy-relevant
-author attributes such as the gender, age and identity of the text's author.
-Such methods can compromise the privacy of an anonymous author even when the
-author tries to remove privacy-sensitive content. 
In this paper, we propose an automatic
-method, called Adversarial Author Attribute Anonymity Neural Translation
-($A^4NT$), to combat such text-based adversaries. We combine
-sequence-to-sequence language models used in machine translation and generative
-adversarial networks to obfuscate author attributes. Unlike machine translation
-techniques, which need paired data, our method can be trained on unpaired
-corpora of text from different authors. Importantly, we propose and
-evaluate techniques to impose constraints on our $A^4NT$ to preserve the
-semantics of the input text. $A^4NT$ learns to make minimal changes to the
-input text to successfully fool author attribute classifiers, while aiming to
-maintain the meaning of the input. We show through experiments on two different
-datasets and three settings that our proposed method is effective in fooling
-the author attribute classifiers and thereby improving the anonymity of
-authors.
-"
-6092,1711.01985,Tomasz Korbak and Paulina \.Zak,"Fine-tuning Tree-LSTM for phrase-level sentiment classification on a
- Polish dependency treebank. Submission to PolEval task 2",cs.CL," We describe a variant of the Child-Sum Tree-LSTM deep neural network (Tai et
-al., 2015) fine-tuned for working with dependency trees and morphologically
-rich languages using the example of Polish. Fine-tuning included applying a
-custom regularization technique (zoneout, described by Krueger et al. (2016),
-and further adapted for Tree-LSTMs) as well as using pre-trained word
-embeddings enhanced with sub-word information (Bojanowski et al., 2016). The
-system was implemented in PyTorch and evaluated on a phrase-level sentiment
-labeling task as part of the PolEval competition.
-"
-6093,1711.02012,"Senthil Mani, Neelamadhav Gantayat, Rahul Aralikatte, Monika Gupta,
- Sampath Dechu, Anush Sankaran, Shreya Khare, Barry Mitchell, Hemamalini
- Subramanian, Hema Venkatarangan","Hi, how can I help you?: Automating enterprise IT support help desks",cs.CL cs.AI," Question answering is one of the primary challenges of natural language
-understanding. In realizing such a system, providing complex long answers to
-questions is a challenging task as opposed to factoid answering, since the
-former needs context disambiguation. The different methods explored in the
-literature can be broadly classified into three categories, namely: 1)
-classification based, 2) knowledge graph based and 3) retrieval based.
-Individually, none of them address the need for an enterprise-wide assistance
-system for an IT support and maintenance domain. In this domain the variance of
-answers is large, ranging from factoid to structured operating procedures; the
-knowledge is spread across heterogeneous data sources like application-specific
-documentation and ticket management systems, and no single technique for
-general-purpose assistance is able to scale to such a landscape. To address
-this, we have built a cognitive platform with capabilities adapted for this
-domain. Further, we have built a general purpose question answering system
-leveraging the platform that can be instantiated for multiple products and
-technologies in the support domain. The system uses a novel hybrid answering
-model that orchestrates across a deep learning classifier, a knowledge graph
-based context disambiguation module and a sophisticated bag-of-words search
-system. 
This
-orchestration performs context switching for a provided question and also does
-a smooth hand-off of the question to a human expert if none of the automated
-techniques can provide a confident answer. This system has been deployed across
-675 internal enterprise IT support and maintenance projects.
-"
-6094,1711.02013,"Yikang Shen, Zhouhan Lin, Chin-Wei Huang, Aaron Courville",Neural Language Modeling by Jointly Learning Syntax and Lexicon,cs.CL cs.AI," We propose a neural language model capable of unsupervised syntactic
-structure induction. The model leverages the structure information to form
-better semantic representations and better language modeling. Standard
-recurrent neural networks are limited by their structure and fail to
-efficiently use syntactic information. On the other hand, tree-structured
-recursive networks usually require additional structural supervision at the
-cost of human expert annotation. In this paper, we propose a novel neural
-language model, called Parsing-Reading-Predict Networks (PRPN), that can
-simultaneously induce the syntactic structure from unannotated sentences and
-leverage the inferred structure to learn a better language model. In our model,
-the gradient can be directly back-propagated from the language model loss into
-the neural parsing network. Experiments show that the proposed model can
-discover the underlying syntactic structure and achieve state-of-the-art
-performance on word/character-level language modeling tasks.
-"
-6095,1711.02085,"Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi",Neural Speed Reading via Skim-RNN,cs.CL," Inspired by the principles of speed reading, we introduce Skim-RNN, a
-recurrent neural network (RNN) that dynamically decides to update only a small
-fraction of the hidden state for relatively unimportant input tokens. Skim-RNN
-gives a computational advantage over an RNN that always updates the entire
-hidden state. Skim-RNN uses the same input and output interfaces as a standard
-RNN and can be easily used instead of RNNs in existing models. In our
-experiments, we show that Skim-RNN can achieve significantly reduced
-computational cost without losing accuracy compared to standard RNNs across
-five different natural language tasks. In addition, we demonstrate that the
-trade-off between accuracy and speed of Skim-RNN can be dynamically controlled
-during inference time in a stable manner. Our analysis also shows that Skim-RNN
-running on a single CPU offers lower latency compared to standard RNNs on GPUs.
-"
-6096,1711.02132,"Karim Ahmed, Nitish Shirish Keskar, Richard Socher",Weighted Transformer Network for Machine Translation,cs.AI cs.CL," State-of-the-art results on neural machine translation often use attentional
-sequence-to-sequence models with some form of convolution or recursion. Vaswani
-et al. (2017) propose a new architecture that avoids recurrence and convolution
-completely. Instead, it uses only self-attention and feed-forward layers. While
-the proposed architecture achieves state-of-the-art results on several machine
-translation tasks, it requires a large number of parameters and training
-iterations to converge. We propose the Weighted Transformer, a Transformer with
-modified attention layers, that not only outperforms the baseline network in
-BLEU score but also converges 15-40% faster. Specifically, we replace the
-multi-head attention by multiple self-attention branches that the model learns
-to combine during the training process. 
Our model improves the state-of-the-art
-performance by 0.5 BLEU points on the WMT 2014 English-to-German translation
-task and by 0.4 on the English-to-French translation task.
-"
-6097,1711.02162,Prafulla Kumar Choubey and Ruihong Huang,TAMU at KBP 2017: Event Nugget Detection and Coreference Resolution,cs.CL," In this paper, we describe TAMU's system submitted to the TAC KBP 2017 event
-nugget detection and coreference resolution task. Our system builds on the
-statistical and empirical observations made on training and development data.
-We found that modifiers of event nuggets tend to have a unique syntactic
-distribution. Their parts-of-speech tags and dependency relations provide them
-with essential characteristics that are useful in identifying their span and
-also defining their types and realis status. We further found that the joint
-modeling of event span detection and realis status identification performs
-better than the individual models for both tasks. Our simple system, designed
-using minimal features, achieved micro-average F1 scores of 57.72, 44.27 and
-42.47 for the event span detection, type identification and realis status
-classification tasks respectively. Also, our system achieved a CoNLL F1 score
-of 27.20 in the event coreference resolution task.
-"
-6098,1711.02173,"Yonatan Belinkov, Yonatan Bisk",Synthetic and Natural Noise Both Break Neural Machine Translation,cs.CL cs.LG," Character-based neural machine translation (NMT) models alleviate
-out-of-vocabulary issues, learn morphology, and move us closer to completely
-end-to-end translation systems. Unfortunately, they are also very brittle and
-easily falter when presented with noisy data. In this paper, we confront NMT
-models with synthetic and natural sources of noise. We find that
-state-of-the-art models fail to translate even moderately noisy texts that
-humans have no trouble comprehending. We explore two approaches to increase
-model robustness: structure-invariant word representations and robust training
-on noisy texts. We find that a model based on a character convolutional neural
-network is able to simultaneously learn representations robust to multiple
-kinds of noise.
-"
-6099,1711.02207,"Suyoun Kim, Michael L. Seltzer",Towards Language-Universal End-to-End Speech Recognition,cs.CL," Building speech recognizers in multiple languages typically involves
-replicating a monolingual training recipe for each language, or utilizing a
-multi-task learning approach where models for different languages have separate
-output labels but share some internal parameters. In this work, we exploit
-recent progress in end-to-end speech recognition to create a single
-multilingual speech recognition system capable of recognizing any of the
-languages seen in training. To do so, we propose the use of a universal
-character set that is shared among all languages. We also create a
-language-specific gating mechanism within the network that can modulate the
-network's internal representations in a language-specific way. We evaluate our
-proposed approach on the Microsoft Cortana task across three languages and show
-that our system outperforms both the individual monolingual systems and systems
-built with a multi-task learning approach. We also show that this model can be
-used to initialize a monolingual speech recognizer, and can be used to create a
-bilingual model for use in code-switching scenarios.
-"
-6100,1711.02212,"Suyoun Kim, Michael L. 
Seltzer, Jinyu Li, Rui Zhao",Improved training for online end-to-end speech recognition systems,cs.CL," Achieving high accuracy with end-to-end speech recognizers requires careful -parameter initialization prior to training. Otherwise, the networks may fail to -find a good local optimum. This is particularly true for online networks, such -as unidirectional LSTMs. Currently, the best strategy to train such systems is -to bootstrap the training from a tied-triphone system. However, this is time -consuming, and more importantly, is impossible for languages without a -high-quality pronunciation lexicon. In this work, we propose an initialization -strategy that uses teacher-student learning to transfer knowledge from a large, -well-trained, offline end-to-end speech recognition model to an online -end-to-end model, eliminating the need for a lexicon or any other linguistic -resources. We also explore curriculum learning and label smoothing and show how -they can be combined with the proposed teacher-student learning for further -improvements. We evaluate our methods on a Microsoft Cortana personal assistant -task and show that the proposed method results in a 19 % relative improvement -in word error rate compared to a randomly-initialized baseline system. -" -6101,1711.02281,"Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li and Richard - Socher",Non-Autoregressive Neural Machine Translation,cs.CL cs.LG," Existing approaches to neural machine translation condition each output word -on previously generated outputs. We introduce a model that avoids this -autoregressive property and produces its outputs in parallel, allowing an order -of magnitude lower latency during inference. Through knowledge distillation, -the use of input token fertilities as a latent variable, and policy gradient -fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative -to the autoregressive Transformer network used as a teacher. We demonstrate -substantial cumulative improvements associated with each of the three aspects -of our training strategy, and validate our approach on IWSLT 2016 -English-German and two WMT language pairs. By sampling fertilities in parallel -at inference time, our non-autoregressive model achieves near-state-of-the-art -performance of 29.8 BLEU on WMT 2016 English-Romanian. -" -6102,1711.02295,"Ricardo Baeza-Yates, Zeinab Liaghat",Quality-Efficiency Trade-offs in Machine Learning for Text Processing,cs.IR cs.CL cs.LG," Data mining, machine learning, and natural language processing are powerful -techniques that can be used together to extract information from large texts. -Depending on the task or problem at hand, there are many different approaches -that can be used. The methods available are continuously being optimized, but -not all these methods have been tested and compared in a set of problems that -can be solved using supervised machine learning algorithms. The question is -what happens to the quality of the methods if we increase the training data -size from, say, 100 MB to over 1 GB? Moreover, are quality gains worth it when -the rate of data processing diminishes? Can we trade quality for time -efficiency and recover the quality loss by just being able to process more -data? We attempt to answer these questions in a general way for text processing -tasks, considering the trade-offs involving training data size, learning time, -and quality obtained. 
We propose a performance trade-off framework and apply it
-to three important text processing problems: Named Entity Recognition,
-Sentiment Analysis and Document Classification. These problems were also chosen
-because they have different levels of object granularity: words, paragraphs,
-and documents. For each problem, we selected several supervised machine
-learning algorithms and evaluated their trade-offs on large publicly
-available data sets (news, reviews, patents). To explore these trade-offs, we
-use different data subsets of increasing size ranging from 50 MB to several GB.
-We also consider the impact of the data set and the evaluation technique. We
-find that the results do not change significantly and that most of the time the
-best algorithm is the fastest. However, we also show that the results for
-small data (say, less than 100 MB) are different from the results for big data,
-and in those cases the best algorithm is much harder to determine.
-"
-6103,1711.02509,Ji Wen,"Structure Regularized Bidirectional Recurrent Convolutional Neural
- Network for Relation Classification",cs.CL," Relation classification is an important semantic processing task in the field
-of natural language processing (NLP). In this paper, we present a novel model,
-the Structure Regularized Bidirectional Recurrent Convolutional Neural
-Network (SR-BRCNN), to classify the relation of two entities in a sentence, and
-a new dataset of Chinese Sanwen for named entity recognition and relation
-classification. Some state-of-the-art systems concentrate on modeling the
-shortest dependency path (SDP) between two entities leveraging convolutional or
-recurrent neural networks. We further explore how to make full use of the
-dependency relation information in the SDP and how to improve the model by the
-method of structure regularization. We propose a structure regularized model to
-learn relation representations along the SDP extracted from the forest formed
-by the structure regularized dependency tree, which helps reduce the
-complexity of the whole model and improves the $F_{1}$ score by 10.3.
-Experimental results show that our method outperforms the state-of-the-art
-approaches on the Chinese Sanwen task and performs as well on the SemEval-2010
-Task 8 dataset\footnote{The Chinese Sanwen corpus developed and used in this
-paper will be released in the future.}.
-"
-6104,1711.02604,"Edouard Grave, Moustapha Cisse, Armand Joulin",Unbounded cache model for online language modeling with open vocabulary,cs.LG cs.CL," Recently, continuous cache models were proposed as extensions to recurrent
-neural network language models, to adapt their predictions to local changes in
-the data distribution. These models only capture the local context, of up to a
-few thousand tokens. In this paper, we propose an extension of continuous
-cache models, which can scale to larger contexts. In particular, we use a
-large-scale non-parametric memory component that stores all the hidden
-activations seen in the past. We leverage recent advances in approximate
-nearest neighbor search and quantization algorithms to store millions of
-representations while searching them efficiently. We conduct extensive
-experiments showing that our approach significantly improves the perplexity of
-pre-trained language models on new distributions, and can scale efficiently to
-much larger contexts than previously proposed local cache models.
-"
-6105,1711.02608,Jorge V. Tohalino and Diego R. 
Amancio,Extractive Multi-document Summarization Using Multilayer Networks,cs.CL," Huge volumes of textual information are produced every single day. In
-order to organize and understand such large datasets, in recent years,
-summarization techniques have become popular. These techniques aim at finding
-relevant, concise and non-redundant content in such big data. While network
-methods have been adopted to model texts in some scenarios, a systematic
-evaluation of multilayer network models in the multi-document summarization
-task has been limited to a few studies. Here, we evaluate the performance of a
-multilayer-based method to select the most relevant sentences in the context of
-an extractive multi-document summarization (MDS) task. In the adopted model,
-nodes represent sentences and edges are created based on the number of shared
-words between sentences. Differently from previous studies in multi-document
-summarization, we make a distinction between edges linking sentences from
-different documents (inter-layer) and those connecting sentences from the same
-document (intra-layer). As a proof of principle, our results reveal that such a
-discrimination between intra- and inter-layer edges in a multilayered
-representation is able to improve the quality of the generated summaries. This
-piece of information could be used to improve current statistical methods and
-related textual models.
-"
-6106,1711.02781,"Huiting Liu, Tao Lin, Hanfei Sun, Weijian Lin, Chih-Wei Chang, Teng
- Zhong, Alexander Rudnicky",RubyStar: A Non-Task-Oriented Mixture Model Dialog System,cs.CL," RubyStar is a dialog system designed to create ""human-like"" conversation by
-combining different response generation strategies. RubyStar conducts a
-non-task-oriented conversation on general topics by using an ensemble of
-rule-based, retrieval-based and generative methods. Topic detection, engagement
-monitoring, and context tracking are used for managing interaction. Predictable
-elements of conversation, such as the bot's backstory and simple question
-answering, are handled by separate modules. We describe a rating scheme we
-developed for evaluating response generation. We find that a character-level
-RNN is an effective generation model for general responses, with proper
-parameter settings; however, other kinds of conversation topics might benefit
-from using other models.
-"
-6107,1711.02799,"Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard
- Sch\""olkopf",Fidelity-Weighted Learning,cs.LG cs.CL cs.NE," Training deep neural networks requires many training samples, but in practice
-training labels are expensive to obtain and may be of varying quality, as some
-may be from trusted expert labelers while others might be from heuristics or
-other sources of weak supervision such as crowd-sourcing. This creates a
-fundamental quality-versus-quantity trade-off in the learning process. Do we
-learn from the small amount of high-quality data or the potentially large
-amount of weakly-labeled data? We argue that if the learner could somehow know
-and take the label-quality into account when learning the data representation,
-we could get the best of both worlds. To this end, we propose
-""fidelity-weighted learning"" (FWL), a semi-supervised student-teacher approach
-for training deep neural networks using weakly-labeled data. 
FWL modulates the
-parameter updates to a student network (trained on the task we care about) on a
-per-sample basis according to the posterior confidence of its label-quality
-estimated by a teacher (who has access to the high-quality labels). Both
-student and teacher are learned from the data. We evaluate FWL on two tasks in
-information retrieval and natural language processing, where we outperform
-state-of-the-art alternative semi-supervised methods, indicating that our
-approach makes better use of strong and weak labels, and leads to better
-task-dependent data representations.
-"
-6108,1711.02918,"Alexander Panchenko, Dmitry Ustalov, Stefano Faralli, Simone P.
- Ponzetto, Chris Biemann",Improving Hypernymy Extraction with Distributional Semantic Classes,cs.CL," In this paper, we show how distributionally-induced semantic classes can be
-helpful for extracting hypernyms. We present methods for inducing sense-aware
-semantic classes using distributional semantics and using these induced
-semantic classes for filtering noisy hypernymy relations. Denoising of
-hypernyms is performed by labeling each semantic class with its hypernyms. On
-the one hand, this allows us to filter out wrong extractions using the global
-structure of distributionally similar senses. On the other hand, we infer
-missing hypernyms via label propagation to cluster terms. We conduct a
-large-scale crowdsourcing study showing that processing of automatically
-extracted hypernyms using our approach improves the quality of the hypernymy
-extraction in terms of both precision and recall. Furthermore, we show the
-utility of our method in the domain taxonomy induction task, achieving
-state-of-the-art results on a SemEval'16 task on taxonomy induction.
-"
-6109,1711.03147,"Clemente Rubio-Manzano, Martin Pereira-Fari\~na","On the incorporation of interval-valued fuzzy sets into the Bousi-Prolog
- system: declarative semantics, implementation and applications",cs.AI cs.CL cs.PL," In this paper we analyse the benefits of incorporating interval-valued fuzzy
-sets into the Bousi-Prolog system. A syntax, declarative semantics and
-implementation for this extension are presented and formalised. We show, by
-using potential applications, that fuzzy logic programming frameworks enhanced
-with them can correctly work together with lexical resources and ontologies in
-order to improve their capabilities for knowledge representation and reasoning.
-"
-6110,1711.03225,"Qizhe Xie, Guokun Lai, Zihang Dai, Eduard Hovy",Large-scale Cloze Test Dataset Created by Teachers,cs.CL cs.AI," Cloze tests are widely adopted in language exams to evaluate students'
-language proficiency. In this paper, we propose the first large-scale
-human-created cloze test dataset, CLOTH, containing questions used in
-middle-school and high-school language exams. With missing blanks carefully
-created by teachers and candidate choices purposely designed to be nuanced,
-CLOTH requires a deeper language understanding and a wider attention span than
-previously automatically-generated cloze datasets. We test the performance of
-dedicatedly designed baseline models, including a language model trained on the
-One Billion Word Corpus, and show that humans outperform them by a significant
-margin. We investigate the source of the performance gap, trace model
-deficiencies to some distinct properties of CLOTH, and identify the limited
-ability to comprehend long-term context as the key bottleneck. 
-" -6111,1711.03226,"Meng Qu, Xiang Ren, Yu Zhang, Jiawei Han","Weakly-supervised Relation Extraction by Pattern-enhanced Embedding - Learning",cs.CL," Extracting relations from text corpora is an important task in text mining. -It becomes particularly challenging when focusing on weakly-supervised relation -extraction, that is, utilizing a few relation instances (i.e., a pair of -entities and their relation) as seeds to extract more instances from corpora. -Existing distributional approaches leverage the corpus-level co-occurrence -statistics of entities to predict their relations, and require large number of -labeled instances to learn effective relation classifiers. Alternatively, -pattern-based approaches perform bootstrapping or apply neural networks to -model the local contexts, but still rely on large number of labeled instances -to build reliable models. In this paper, we study integrating the -distributional and pattern-based methods in a weakly-supervised setting, such -that the two types of methods can provide complementary supervision for each -other to build an effective, unified model. We propose a novel co-training -framework with a distributional module and a pattern module. During training, -the distributional module helps the pattern module discriminate between the -informative patterns and other patterns, and the pattern module generates some -highly-confident instances to improve the distributional module. The whole -framework can be effectively optimized by iterating between improving the -pattern module and updating the distributional module. We conduct experiments -on two tasks: knowledge base completion with text corpora and corpus-level -relation extraction. Experimental results prove the effectiveness of our -framework in the weakly-supervised setting. -" -6112,1711.03230,Yelong Shen and Xiaodong Liu and Kevin Duh and Jianfeng Gao,"An Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading - Comprehension Tasks",cs.CL," Reading comprehension (RC) is a challenging task that requires synthesis of -information across sentences and multiple turns of reasoning. Using a -state-of-the-art RC model, we empirically investigate the performance of -single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets. The -RC model is an end-to-end neural network with iterative attention, and uses -reinforcement learning to dynamically control the number of turns. We find that -multiple-turn reasoning outperforms single-turn reasoning for all question and -answer types; further, we observe that enabling a flexible number of turns -generally improves upon a fixed multiple-turn strategy. %across all question -types, and is particularly beneficial to questions with lengthy, descriptive -answers. We achieve results competitive to the state-of-the-art on these two -datasets. -" -6113,1711.03373,"Ziqi Zhang, Jie Gao, Fabio Ciravegna","SemRe-Rank: Improving Automatic Term Extraction By Incorporating - Semantic Relatedness With Personalised PageRank",cs.IR cs.CL," Automatic Term Extraction deals with the extraction of terminology from a -domain specific corpus, and has long been an established research area in data -and knowledge acquisition. ATE remains a challenging task as it is known that -there is no existing ATE methods that can consistently outperform others in any -domain. 
This work adopts a refreshed perspective on this problem: instead of
-searching for such a 'one-size-fits-all' solution that may never exist, we
-propose to develop generic methods to 'enhance' existing ATE methods. We
-introduce SemRe-Rank, the first method based on this principle, to incorporate
-semantic relatedness - an often overlooked avenue - into an existing ATE method
-to further improve its performance. SemRe-Rank incorporates word embeddings
-into a personalised PageRank process to compute 'semantic importance' scores
-for candidate terms from a graph of semantically related words (nodes), which
-are then used to revise the scores of candidate terms computed by a base ATE
-algorithm. Extensively evaluated with 13 state-of-the-art base ATE methods on
-four datasets of diverse nature, it is shown to have achieved widespread
-improvement over all base methods and across all datasets, with up to 15
-percentage points when measured by the Precision in the top-ranked K candidate
-terms (the average for a set of K's), or up to 28 percentage points in F1
-measured at a K that equals the expected number of real terms in the candidates
-(F1 in short). Compared to an alternative approach built on the well-known
-TextRank algorithm, SemRe-Rank can potentially outperform it by up to 8 points
-in Precision at top K, or up to 17 points in F1.
-"
-6114,1711.03381,"Yinpei Dai, Zhijian Ou, Dawei Ren, Pengfei Yu","Tracking of enriched dialog states for flexible conversational
- information access",cs.CL," Dialog state tracking (DST) is a crucial component in a task-oriented dialog
-system for conversational information access. A common practice in current
-dialog systems is to define the dialog state by a set of slot-value pairs. Such
-a representation of dialog states and the slot-filling based DST have been
-widely employed, but suffer from three drawbacks. (1) The dialog state can
-contain only a single value for a slot, and (2) can contain only users'
-affirmative preference over the values for a slot. (3) Current task-based
-dialog systems mainly focus on the searching task, while the enquiring task is
-also very common in practice. The above observations motivate us to enrich the
-current representation of dialog states and collect a brand new dialog dataset
-about movies, based upon which we build a new DST, called enriched DST (EDST),
-for flexibly accessing movie information. The EDST supports the searching task,
-the enquiring task and their mixed task. We show that the new EDST method not
-only achieves good results on the Iqiyi dataset, but also outperforms other
-state-of-the-art DST methods on the traditional dialog datasets, WOZ2.0 and
-DSTC2.
-"
-6115,1711.03438,"Baoxu Shi, Tim Weninger",Open-World Knowledge Graph Completion,cs.AI cs.CL," Knowledge Graphs (KGs) have been applied to many tasks including Web search,
-link prediction, recommendation, natural language processing, and entity
-linking. However, most KGs are far from complete and are growing at a rapid
-pace. To address these problems, Knowledge Graph Completion (KGC) has been
-proposed to improve KGs by filling in their missing connections. Unlike
-existing methods which hold a closed-world assumption, i.e., where KGs are
-fixed and new entities cannot be easily added, in the present work we relax
-this assumption and propose a new open-world KGC task. As a first attempt to
-solve this task we introduce an open-world KGC model called ConMask. 
This model learns embeddings
-of the entity's name and parts of its text description to connect unseen
-entities to the KG. To mitigate the presence of noisy text descriptions,
-ConMask uses relationship-dependent content masking to extract relevant
-snippets and then trains a fully convolutional neural network to fuse the
-extracted snippets with entities in the KG. Experiments on large data sets,
-both old and new, show that ConMask performs well in the open-world KGC task
-and even outperforms existing KGC models on the standard closed-world KGC task.
-"
-6116,1711.03483,"\'Eloi Zablocki, Benjamin Piwowarski, Laure Soulier, Patrick Gallinari",Learning Multi-Modal Word Representation Grounded in Visual Context,cs.CL cs.AI cs.CV," Representing the semantics of words is a long-standing problem for the
-natural language processing community. Most methods compute word semantics
-given their textual context in large corpora. More recently, researchers have
-attempted to integrate perceptual and visual features. Most of these works
-consider the visual appearance of objects to enhance word representations but
-they ignore the visual environment and context in which objects appear. We
-propose to unify text-based techniques with vision-based techniques by
-simultaneously leveraging textual and visual context to learn multimodal word
-embeddings. We explore various choices for what can serve as a visual context
-and present an end-to-end method to integrate visual context elements in a
-multimodal skip-gram model. We provide experiments and extensive analysis of
-the obtained results.
-"
-6117,1711.03541,Ganji Sreeram and Rohit Sinha,Language Modeling for Code-Switched Data: Challenges and Approaches,cs.CL," Lately, the problem of code-switching has gained a lot of attention and has
-emerged as an active area of research. In bilingual communities, speakers
-commonly embed the words and phrases of a non-native language into the syntax
-of a native language in their day-to-day communications. Code-switching is
-a global phenomenon among multilingual communities, yet very limited acoustic
-and linguistic resources are available for it. For developing effective
-speech-based applications, the ability of the existing language technologies to
-deal with code-switched data cannot be overemphasized. Code-switching is
-broadly classified into two modes: inter-sentential and intra-sentential
-code-switching. In this work, we have studied the intra-sentential problem in
-the context of the code-switching language modeling task. The salient
-contributions of this paper include: (i) the creation of a Hindi-English
-code-switching text corpus by crawling a few blogging sites educating about the
-usage of the Internet, (ii) the exploration of parts-of-speech features towards
-more effective modeling of Hindi-English code-switched data by a monolingual
-language model (LM) trained on native (Hindi) language data, and (iii) the
-proposal of a novel textual factor referred to as the code-switch factor
-(CS-factor), which allows the LM to predict code-switching instances. In
-the context of recognition of the code-switching data, a substantial
-reduction in the PPL is achieved with the use of POS factors, and the
-proposed CS-factor also provides an independent as well as additive gain in the
-PPL.
-"
-6118,1711.03602,"WooJin Chung, Sheng-Fu Wang, and Samuel R. 
Bowman",The Lifted Matrix-Space Model for Semantic Composition,cs.CL," Tree-structured neural network architectures for sentence encoding draw -inspiration from the approach to semantic composition generally seen in formal -linguistics, and have shown empirical improvements over comparable sequence -models by doing so. Moreover, adding multiplicative interaction terms to the -composition functions in these models can yield significant further -improvements. However, existing compositional approaches that adopt such a -powerful composition function scale poorly, with parameter counts exploding as -model dimension or vocabulary size grows. We introduce the Lifted Matrix-Space -model, which uses a global transformation to map vector word embeddings to -matrices, which can then be composed via an operation based on matrix-matrix -multiplication. Its composition function effectively transmits a larger number -of activations across layers with relatively few model parameters. We evaluate -our model on the Stanford NLI corpus, the Multi-Genre NLI corpus, and the -Stanford Sentiment Treebank and find that it consistently outperforms TreeLSTM -(Tai et al., 2015), the previous best known composition function for -tree-structured models. -" -6119,1711.03688,"Sameen Maruf, Gholamreza Haffari",Document Context Neural Machine Translation with Memory Networks,cs.CL," We present a document-level neural machine translation model which takes both -source and target document context into account using memory networks. We model -the problem as a structured prediction problem with interdependencies among the -observed and hidden variables, i.e., the source sentences and their unobserved -target translations in the document. The resulting structured prediction -problem is tackled with a neural translation model equipped with two memory -components, one each for the source and target side, to capture the documental -interdependencies. We train the model end-to-end, and propose an iterative -decoding algorithm based on block coordinate descent. Experimental results of -English translations from French, German, and Estonian documents show that our -model is effective in exploiting both source and target document context, and -statistically significantly outperforms the previous work in terms of BLEU and -METEOR. -" -6120,1711.03689,"Taku Kato, Takahiro Shinozaki","Reinforcement Learning of Speech Recognition System Based on Policy - Gradient and Hypothesis Selection",cs.CL cs.LG stat.ML," Speech recognition systems have achieved high recognition performance for -several tasks. However, the performance of such systems is dependent on the -tremendously costly development work of preparing vast amounts of task-matched -transcribed speech data for supervised training. The key problem here is the -cost of transcribing speech data. The cost is repeatedly required to support -new languages and new tasks. Assuming broad network services for transcribing -speech data for many users, a system would become more self-sufficient and more -useful if it possessed the ability to learn from very light feedback from the -users without annoying them. In this paper, we propose a general reinforcement -learning framework for speech recognition systems based on the policy gradient -method. As a particular instance of the framework, we also propose a hypothesis -selection-based reinforcement learning method. The proposed framework provides -a new view for several existing training and adaptation methods. 
The
-experimental results show that the proposed method improves the recognition
-performance compared to unsupervised adaptation.
-"
-6121,1711.03697,"Weiyan Wang, Yuxiang WU, Yu Zhang, Zhongqi Lu, Kaixiang Mo, Qiang Yang",Integrating User and Agent Models: A Deep Task-Oriented Dialogue System,cs.CL," Task-oriented dialogue systems can efficiently serve a large number of
-customers and relieve people from tedious work. However, existing
-task-oriented dialogue systems depend on handcrafted actions and states or
-extra semantic labels, which sometimes degrades user experience despite the
-intensive human intervention. Moreover, current user simulators have limited
-expressive ability, so deep reinforcement Seq2Seq models have to rely on
-self-play and only work in some special cases. To address those problems, we
-propose a uSer and Agent Model IntegrAtion (SAMIA) framework inspired by an
-observation that the roles of the user and agent models are asymmetric.
-Firstly, the SAMIA framework models the user as a Seq2Seq learning problem
-instead of ranking or designing rules. The built user model is then used as
-leverage to train the agent model by deep reinforcement learning. In the test
-phase, the output of the agent model is filtered by the user model to enhance
-the stability and robustness. Experiments on a real-world coffee ordering
-dataset verify the effectiveness of the proposed SAMIA framework.
-"
-6122,1711.03736,Masoud Fatemi and Mehran Safayani,"Joint Sentiment/Topic Modeling on Text Data Using Boosted Restricted
- Boltzmann Machine",cs.CL cs.IR cs.LG," Recently, with the development of the Internet and the Web, different types
-of social media such as web blogs have become an immense source of text data.
-Through the processing of these data, it is possible to discover practical
-information about different topics, individuals' opinions and a thorough
-understanding of society. Therefore, applying models which can automatically
-extract the subjective information from documents would be efficient and
-helpful. Topic modeling and sentiment analysis are among the most active
-research topics in the natural language processing and text mining fields. In
-this paper, a new structure for joint sentiment-topic modeling based on the
-Restricted Boltzmann Machine (RBM), a type of neural network, is proposed. By
-modifying the structure of the RBM and appending a layer analogous to the
-sentiment of the text data, we obtain a generative structure for joint
-sentiment-topic modeling based on neural networks. The proposed method is
-supervised and trained by the Contrastive Divergence algorithm. The newly
-attached layer in the proposed model follows a multinomial probability
-distribution and can be used in text data sentiment classification or any
-other supervised application. The proposed model is compared with existing
-models in experiments covering generative modeling, sentiment classification
-and information retrieval, and the corresponding results demonstrate the
-efficiency of the method.
-"
-6123,1711.03754,"Todor Mihaylov, Zornitsa Kozareva, Anette Frank","Neural Skill Transfer from Supervised Language Tasks to Reading
- Comprehension",cs.CL," Reading comprehension is a challenging task in natural language processing
-and requires a set of skills to be solved. While current approaches focus on
-solving the task as a whole, in this paper, we propose to use a neural network
-`skill' transfer approach. 
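A minimal sketch of this kind of skill transfer, detailed in the lines that follow (plain NumPy dicts standing in for real models; layer names and shapes are hypothetical): lower-level layers pretrained on a supervised task are copied into the reading-comprehension model, while the task head is trained from scratch.

    import numpy as np

    rng = np.random.default_rng(0)
    source = {"embed": rng.normal(size=(1000, 64)),   # trained on e.g. NER
              "encoder": rng.normal(size=(64, 64)),
              "head": rng.normal(size=(64, 9))}
    target = {"embed": rng.normal(size=(1000, 64)),   # reading comprehension
              "encoder": rng.normal(size=(64, 64)),
              "head": rng.normal(size=(64, 2))}
    for name in ("embed", "encoder"):   # transfer the lower-level "skills"
        target[name] = source[name].copy()
    # target["head"] stays randomly initialized and is fine-tuned on RC data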
We transfer knowledge from several lower-level
-language tasks (skills) including textual entailment, named entity
-recognition, paraphrase detection and question type classification into the
-reading comprehension model.
- We conduct an empirical evaluation and show that transferring language skill
-knowledge leads to significant improvements for the task with far fewer steps
-compared to the baseline model. We also show that the skill transfer approach
-is effective even with small amounts of training data. Another finding of this
-work is that using token-wise deep label supervision for text classification
-improves the performance of transfer learning.
-"
-6124,1711.03759,"Jie Yang, Yue Zhang, Linwei Li, Xingxuan Li",YEDDA: A Lightweight Collaborative Text Span Annotation Tool,cs.CL," In this paper, we introduce \textsc{Yedda}, a lightweight but efficient and
-comprehensive open-source tool for text span annotation. \textsc{Yedda}
-provides a systematic solution for text span annotation, ranging from
-collaborative user annotation to administrator evaluation and analysis. It
-overcomes the low efficiency of traditional text annotation tools by
-annotating entities through both command line and shortcut keys, which are
-configurable with custom labels. \textsc{Yedda} also gives intelligent
-recommendations by learning from the up-to-date annotated text. An
-administrator client is developed to evaluate the annotation quality of
-multiple annotators and generate a detailed comparison report for each
-annotator pair. Experiments show that the proposed system can reduce the
-annotation time by half compared with existing annotation tools, and the
-annotation time can be further reduced by 16.47\% through intelligent
-recommendation.
-"
-6125,1711.03800,"Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool",Object Referring in Visual Scene with Spoken Language,cs.CV cs.CL cs.HC," Object referring has important applications, especially for human-machine
-interaction. While having received great attention, the task is mainly
-attacked with written language (text) as input rather than spoken language
-(speech), which is more natural. This paper investigates Object Referring with
-Spoken Language (ORSpoken) by presenting two datasets and one novel approach.
-Objects are annotated with their locations in images, text descriptions and
-speech descriptions. This makes the datasets ideal for multi-modality
-learning. The approach is developed by carefully breaking down the ORSpoken
-problem into three sub-problems and introducing task-specific vision-language
-interactions at the corresponding levels. Experiments show that our method
-outperforms competing methods consistently and significantly. The approach is
-also evaluated in the presence of audio noise, showing the efficacy of the
-proposed vision-language interaction methods in counteracting background
-noise.
-"
-6126,1711.03859,Diego Molla,"Towards the Use of Deep Reinforcement Learning with Global Policy For
- Query-based Extractive Summarisation",cs.CL," Supervised approaches for text summarisation suffer from the problem of
-mismatch between the target labels/scores of individual sentences and the
-evaluation score of the final summary. Reinforcement learning can solve this
-problem by providing a learning mechanism that uses the score of the final
-summary as a guide to determine the decisions made at the time of selection
-of each sentence. 
In this paper we present a proof-of-concept approach that -applies a policy-gradient algorithm to learn a stochastic policy using an -undiscounted reward. The method has been applied to a policy consisting of a -simple neural network and simple features. The resulting deep reinforcement -learning system is able to learn a global policy and obtain encouraging -results. -" -6127,1711.03946,"Geng Ji, Robert Bamler, Erik B. Sudderth, Stephan Mandt",Bayesian Paragraph Vectors,cs.CL cs.LG stat.ML," Word2vec (Mikolov et al., 2013) has proven to be successful in natural -language processing by capturing the semantic relationships between different -words. Built on top of single-word embeddings, paragraph vectors (Le and -Mikolov, 2014) find fixed-length representations for pieces of text with -arbitrary lengths, such as documents, paragraphs, and sentences. In this work, -we propose a novel interpretation for neural-network-based paragraph vectors by -developing an unsupervised generative model whose maximum likelihood solution -corresponds to traditional paragraph vectors. This probabilistic formulation -allows us to go beyond point estimates of parameters and to perform Bayesian -posterior inference. We find that the entropy of paragraph vectors decreases -with the length of documents, and that information about posterior uncertainty -improves performance in supervised learning tasks such as sentiment analysis -and paraphrase detection. -" -6128,1711.03953,"Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen",Breaking the Softmax Bottleneck: A High-Rank RNN Language Model,cs.CL cs.LG," We formulate language modeling as a matrix factorization problem, and show -that the expressiveness of Softmax-based models (including the majority of -neural language models) is limited by a Softmax bottleneck. Given that natural -language is highly context-dependent, this further implies that in practice -Softmax with distributed word embeddings does not have enough capacity to model -natural language. We propose a simple and effective method to address this -issue, and improve the state-of-the-art perplexities on Penn Treebank and -WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on -the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points -in perplexity. -" -6129,1711.04044,"Sahil Garg and Aram Galstyan and Greg Ver Steeg and Irina Rish and - Guillermo Cecchi and Shuyang Gao",Kernelized Hashcode Representations for Relation Extraction,cs.CL cs.IR cs.LG," Kernel methods have produced state-of-the-art results for a number of NLP -tasks such as relation extraction, but suffer from poor scalability due to the -high cost of computing kernel similarities between natural language structures. -A recently proposed technique, kernelized locality-sensitive hashing (KLSH), -can significantly reduce the computational cost, but is only applicable to -classifiers operating on kNN graphs. Here we propose to use random subspaces of -KLSH codes for efficiently constructing an explicit representation of NLP -structures suitable for general classification methods. Further, we propose an -approach for optimizing the KLSH model for classification problems by -maximizing an approximation of mutual information between the KLSH codes -(feature vectors) and the class labels. We evaluate the proposed approach on -biomedical relation extraction datasets, and observe significant and robust -improvements in accuracy w.r.t. 
state-of-the-art classifiers, along with -drastic (orders-of-magnitude) speedup compared to conventional kernel methods. -" -6130,1711.04071,"Liwei Cai, William Yang Wang",KBGAN: Adversarial Learning for Knowledge Graph Embeddings,cs.CL cs.AI," We introduce KBGAN, an adversarial learning framework to improve the -performances of a wide range of existing knowledge graph embedding models. -Because knowledge graphs typically only contain positive facts, sampling useful -negative training examples is a non-trivial task. Replacing the head or tail -entity of a fact with a uniformly randomly selected entity is a conventional -method for generating negative facts, but the majority of the generated -negative facts can be easily discriminated from positive facts, and will -contribute little towards the training. Inspired by generative adversarial -networks (GANs), we use one knowledge graph embedding model as a negative -sample generator to assist the training of our desired model, which acts as the -discriminator in GANs. This framework is independent of the concrete form of -generator and discriminator, and therefore can utilize a wide variety of -knowledge graph embedding models as its building blocks. In experiments, we -adversarially train two translation-based models, TransE and TransD, each with -assistance from one of the two probability-based models, DistMult and ComplEx. -We evaluate the performances of KBGAN on the link prediction task, using three -knowledge base completion datasets: FB15k-237, WN18 and WN18RR. Experimental -results show that adversarial training substantially improves the performances -of target embedding models under various settings. -" -6131,1711.04075,"Haoran Shi, Pengtao Xie, Zhiting Hu, Ming Zhang, and Eric P. Xing",Towards Automated ICD Coding Using Deep Learning,cs.CL," International Classification of Diseases(ICD) is an authoritative health care -classification system of different diseases and conditions for clinical and -management purposes. Considering the complicated and dedicated process to -assign correct codes to each patient admission based on overall diagnosis, we -propose a hierarchical deep learning model with attention mechanism which can -automatically assign ICD diagnostic codes given written diagnosis. We utilize -character-aware neural language models to generate hidden representations of -written diagnosis descriptions and ICD codes, and design an attention mechanism -to address the mismatch between the numbers of descriptions and corresponding -codes. Our experimental results show the strong potential of automated ICD -coding from diagnosis descriptions. Our best model achieves 0.53 and 0.90 of F1 -score and area under curve of receiver operating characteristic respectively. -The result outperforms those achieved using character-unaware encoding method -or without attention mechanism. It indicates that our proposed deep learning -model can code automatically in a reasonable way and provide a framework for -computer-auxiliary ICD coding. -" -6132,1711.04079,"Kaixiang Mo, Yu Zhang, Qiang Yang, Pascale Fung","Fine Grained Knowledge Transfer for Personalized Task-oriented Dialogue - Systems",cs.CL cs.AI," Training a personalized dialogue system requires a lot of data, and the data -collected for a single user is usually insufficient. One common practice for -this problem is to share training dialogues between different users and train -multiple sequence-to-sequence dialogue models together with transfer learning. 
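Before the abstract continues with the personal control gate, a minimal sketch of one plausible gate parameterization (my assumption, not the paper's code): a sigmoid gate mixes a shared and a user-specific output projection at each decoding step.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gated_decode_step(h, W_shared, W_personal, w_gate):
        """Blend shared and per-user output projections with a scalar gate."""
        g = sigmoid(w_gate @ h)          # gate in (0, 1), learned per user
        return g * (W_personal @ h) + (1.0 - g) * (W_shared @ h)

    h = np.random.randn(32)              # decoder hidden state
    logits = gated_decode_step(h,
                               np.random.randn(500, 32),  # shared projection
                               np.random.randn(500, 32),  # per-user projection
                               np.random.randn(32))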
-However, current sequence-to-sequence transfer learning models operate on the
-entire sentence, which might cause negative transfer if different personal
-information from different users is mixed up. We propose a personalized
-decoder model to transfer finer-granularity phrase-level knowledge between
-different users while keeping the personal preferences of each user intact. A
-novel personal control gate is introduced, enabling the personalized decoder
-to switch between generating personalized phrases and shared phrases. The
-proposed personalized decoder model can be easily combined with various deep
-models and can be trained with reinforcement learning. Real-world experimental
-results demonstrate that the phrase-level personalized decoder improves the
-BLEU over multiple sentence-level transfer baseline models by as much as 7.5%.
-"
-6133,1711.04090,"Xianda Zhou, William Yang Wang",MojiTalk: Generating Emotional Responses at Scale,cs.CL cs.AI," Generating emotional language is a key step towards building empathetic
-natural language processing agents. However, a major challenge for this line
-of research is the lack of large-scale labeled training data, and previous
-studies are limited to only small sets of human annotated sentiment labels.
-Additionally, explicitly controlling the emotion and sentiment of generated
-text is also difficult. In this paper, we take a more radical approach: we
-exploit the idea of leveraging Twitter data that are naturally labeled with
-emojis. More specifically, we collect a large corpus of Twitter conversations
-that include emojis in the response, and assume the emojis convey the
-underlying emotions of the sentence. We then introduce a reinforced
-conditional variational encoder approach to train a deep generative model on
-these conversations, which allows us to use emojis to control the emotion of
-the generated text. Experimentally, we show in our quantitative and
-qualitative analyses that the proposed models can successfully generate
-high-quality abstractive conversation responses in accordance with designated
-emotions.
-"
-6134,1711.04115,"Mitodru Niyogi (1), Asim K. Pal (2) ((1) Govt. College of Engineering
- & Ceramic Technology, Kolkata, India, (2) Management Information Systems, IIM
- Calcutta, Kolkata, India)","Discovering conversational topics and emotions associated with
- Demonetization tweets in India",cs.CL," Social media platforms contain a great wealth of information which provides
-us opportunities to explore hidden patterns or unknown correlations, and to
-understand people's satisfaction with what they are discussing. As one
-showcase, in this paper, we summarize the data set of Twitter messages related
-to the recent demonetization of all Rs. 500 and Rs. 1000 notes in India and
-explore insights from Twitter's data. Our proposed system automatically
-extracts the popular latent topics in conversations regarding demonetization
-discussed on Twitter via the Latent Dirichlet Allocation (LDA) based topic
-model and also identifies the correlated topics across different categories.
-Additionally, it discovers people's opinions expressed through their tweets
-related to the event under consideration via the emotion analyzer. The system
-also employs an intuitive and informative visualization to show the uncovered
-insights. Furthermore, we use an evaluation measure, Normalized Mutual
-Information (NMI), to select the best LDA models. 
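A small illustration of NMI-based model selection, assuming scikit-learn (the topic labelings below are toy data, not results from the paper):

    from sklearn.metrics import normalized_mutual_info_score

    labels_a = [0, 0, 1, 1, 2, 2, 2]  # dominant topic per tweet, LDA model A
    labels_b = [1, 1, 0, 0, 2, 2, 0]  # same tweets under candidate model B
    # NMI is 1.0 for identical partitions (up to relabeling), 0.0 for
    # independent ones
    print(normalized_mutual_info_score(labels_a, labels_b))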
The obtained LDA results show that the tool can
-be effectively used to extract discussion topics and summarize them for
-further manual analysis.
-"
-6135,1711.04154,"Anna Potapenko, Artem Popov, Konstantin Vorontsov","Interpretable probabilistic embeddings: bridging the gap between topic
- models and neural networks",cs.CL," We consider probabilistic topic models and more recent word embedding
-techniques from a perspective of learning hidden semantic representations.
-Inspired by a striking similarity of the two approaches, we merge them and
-learn probabilistic embeddings with an online EM-algorithm on word
-co-occurrence data. The resulting embeddings perform on par with Skip-Gram
-Negative Sampling (SGNS) on word similarity tasks and benefit from the
-interpretability of their components. Next, we learn probabilistic document
-embeddings that outperform paragraph2vec on a document similarity task and
-require less memory and time for training. Finally, we employ multimodal
-Additive Regularization of Topic Models (ARTM) to obtain a high sparsity and
-learn embeddings for other modalities, such as timestamps and categories. We
-observe further improvement of word similarity performance and meaningful
-inter-modality similarities.
-"
-6136,1711.04168,"Chundi Liu, Shunan Zhao, Maksims Volkovs",Unsupervised Document Embedding With CNNs,cs.CL cs.LG stat.ML," We propose a new model for unsupervised document embedding. Leading existing
-approaches either require complex inference or use recurrent neural networks
-(RNN) that are difficult to parallelize. We take a different route and develop
-a convolutional neural network (CNN) embedding model. Our CNN architecture is
-fully parallelizable, resulting in over 10x speedup in inference time over RNN
-models. The parallelizable architecture enables training deeper models, where
-each successive layer has an increasingly larger receptive field and models
-longer-range semantic structure within the document. We additionally propose
-a fully unsupervised learning algorithm to train this model based on
-stochastic forward prediction. Empirical results on two public benchmarks
-show that our approach produces accuracy comparable to the state of the art
-at a fraction of the computational cost.
-"
-6137,1711.04204,"Frank F. Xu, Bill Yuchen Lin, Kenny Q. Zhu",Automatic Extraction of Commonsense LocatedNear Knowledge,cs.CL cs.AI," LocatedNear relation is a kind of commonsense knowledge describing two
-physical objects that are typically found near each other in real life. In
-this paper, we study how to automatically extract such relationships through
-a sentence-level relation classifier and by aggregating the scores of entity
-pairs from a large corpus. Also, we release two benchmark datasets for
-evaluation and future research.
-"
-6138,1711.04231,"Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, and Tiejun Zhao",Syntax-Directed Attention for Neural Machine Translation,cs.CL," The attention mechanism, including global attention and local attention,
-plays a key role in neural machine translation (NMT). Global attention attends
-to all source words for word prediction. In comparison, local attention
-selectively looks at fixed-window source words. However, alignment weights for
-the current target word often decrease to the left and right by linear
-distance centering on the aligned source position and neglect syntax-directed
-distance constraints. 
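To make the point above concrete, a NumPy sketch of standard local attention, where a Gaussian penalty centered on the aligned source position makes weights fall off with linear distance (the paper's syntax-directed variant would replace this linear distance with a syntax-based one; the sketch is my illustration, not the authors' code):

    import numpy as np

    def local_attention(scores, center, sigma=2.0):
        """Softmax over scores penalized by squared distance from `center`."""
        j = np.arange(len(scores))
        penalized = scores - (j - center) ** 2 / (2.0 * sigma ** 2)
        e = np.exp(penalized - penalized.max())
        return e / e.sum()

    weights = local_attention(np.random.randn(10), center=4)
    print(weights.round(3))   # mass concentrates near source position 4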
In this paper, we extend local attention
-with a syntax-distance constraint, to focus on source words syntactically
-related to the predicted target word, thus learning a more effective context
-vector for word prediction. Moreover, we further propose a double context NMT
-architecture, which consists of a global context vector and a syntax-directed
-context vector over the global attention, to provide better translation
-performance for NMT from the source representation. The experiments on the
-large-scale Chinese-to-English and English-to-German translation tasks show
-that the proposed approach achieves a substantial and significant improvement
-over the baseline system.
-"
-6139,1711.04289,"Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei","Neural Natural Language Inference Models Enhanced with External
- Knowledge",cs.CL," Modeling natural language inference is a very challenging task. With the
-availability of large annotated data, it has recently become feasible to train
-complex models such as neural-network-based inference models, which have been
-shown to achieve state-of-the-art performance. Although there exist relatively
-large annotated data, can machines learn all the knowledge needed to perform
-natural language inference (NLI) from these data? If not, how can
-neural-network-based NLI models benefit from external knowledge, and how
-should NLI models be built to leverage it? In this paper, we enrich the
-state-of-the-art neural natural language inference models with external
-knowledge. We demonstrate that the proposed models improve neural NLI models
-to achieve state-of-the-art performance on the SNLI and MultiNLI datasets.
-"
-6140,1711.04352,"Felix Wu, Ni Lao, John Blitzer, Guandao Yang, Kilian Weinberger",Fast Reading Comprehension with ConvNets,cs.CL," State-of-the-art deep reading comprehension models are dominated by
-recurrent neural nets. Their sequential nature is a natural fit for language,
-but it also precludes parallelization within an instance and often becomes the
-bottleneck for deploying such models to latency-critical scenarios. This is
-particularly problematic for longer texts. Here we present a convolutional
-architecture as an alternative to these recurrent architectures. Using simple
-dilated convolutional units in place of recurrent ones, we achieve results
-comparable to the state of the art on two question answering tasks, while at
-the same time achieving up to two orders of magnitude speedups for question
-answering.
-"
-6141,1711.04411,"Chunqi Wang, Bo Xu","Convolutional Neural Network with Word Embeddings for Chinese Word
- Segmentation",cs.CL," The character-based sequence labeling framework is flexible and efficient
-for Chinese word segmentation (CWS). Recently, many character-based neural
-models have been applied to CWS. While they obtain good performance, they have
-two obvious weaknesses. The first is that they heavily rely on manually
-designed bigram features, i.e., they are not good at capturing n-gram features
-automatically. The second is that they make no use of full word information.
-For the first weakness, we propose a convolutional neural model, which is able
-to capture rich n-gram features without any feature engineering. For the
-second one, we propose an effective approach to integrate the proposed model
-with word embeddings. We evaluate the model on two benchmark datasets: PKU and
-MSR. Without any feature engineering, the model obtains competitive
-performance -- 95.7% on PKU and 97.3% on MSR. 
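A minimal sketch of how a convolution over character embeddings yields n-gram-like features for CWS without hand-designed bigram features (illustrative only; the filter width and dimensions are hypothetical):

    import numpy as np

    def char_conv(char_embs, filt):
        """Slide one width-k filter over the character sequence; each output
        is a learned k-gram feature feeding per-character segmentation tags."""
        n = char_embs.shape[0]
        k = filt.shape[0]
        return np.array([(char_embs[i:i + k] * filt).sum()
                         for i in range(n - k + 1)])

    sent = np.random.randn(10, 16)                   # 10 chars, 16-dim embeddings
    feats = char_conv(sent, np.random.randn(3, 16))  # trigram-like features
    print(feats.shape)                               # (8,)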
Armed with word embeddings, the model achieves
-state-of-the-art performance on both datasets -- 96.5% on PKU and 98.0% on
-MSR, without using any external labeled resource.
-"
-6142,1711.04434,"Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li",Faithful to the Original: Fact Aware Neural Abstractive Summarization,cs.IR cs.CL," Unlike extractive summarization, abstractive summarization has to fuse
-different parts of the source text, which tends to create fake facts. Our
-preliminary study reveals that nearly 30% of the outputs from a
-state-of-the-art neural summarization system suffer from this problem. While
-previous abstractive summarization approaches usually focus on the improvement
-of informativeness, we argue that faithfulness is also a vital prerequisite
-for a practical abstractive summarization system. To avoid generating fake
-facts in a summary, we leverage open information extraction and dependency
-parsing technologies to extract actual fact descriptions from the source
-text. A dual-attention sequence-to-sequence framework is then proposed to
-force the generation to be conditioned on both the source text and the
-extracted fact descriptions. Experiments on the Gigaword benchmark dataset
-demonstrate that our model can greatly reduce fake summaries by 80%. Notably,
-the fact descriptions also bring significant improvement in informativeness,
-since they often condense the meaning of the source text.
-"
-6143,1711.04436,"Xiaojun Xu, Chang Liu, Dawn Song","SQLNet: Generating Structured Queries From Natural Language Without
- Reinforcement Learning",cs.CL cs.AI cs.DB," Synthesizing SQL queries from natural language is a long-standing open
-problem and has been attracting considerable interest recently. Toward solving
-the problem, the de facto approach is to employ a sequence-to-sequence-style
-model. Such an approach will necessarily require the SQL queries to be
-serialized. Since the same SQL query may have multiple equivalent
-serializations, training a sequence-to-sequence-style model is sensitive to
-the choice from one of them. This phenomenon is documented as the
-""order-matters"" problem. Existing state-of-the-art approaches rely on
-reinforcement learning to reward the decoder when it generates any of the
-equivalent serializations. However, we observe that the improvement from
-reinforcement learning is limited.
- In this paper, we propose a novel approach, i.e., SQLNet, to fundamentally
-solve this problem by avoiding the sequence-to-sequence structure when the
-order does not matter. In particular, we employ a sketch-based approach where
-the sketch contains a dependency graph so that one prediction can be done by
-taking into consideration only the previous predictions that it depends on. In
-addition, we propose a sequence-to-set model as well as the column attention
-mechanism to synthesize the query based on the sketch. By combining all these
-novel techniques, we show that SQLNet can outperform the prior art by 9% to
-13% on the WikiSQL task.
-"
-6144,1711.04452,"Jennifer Edmond, Georgina Nugent Folan","Digitising Cultural Complexity: Representing Rich Cultural Data in a Big
- Data environment",cs.CL cs.CY cs.DB cs.DL," One of the major terminological forces driving ICT integration in research
-today is that of ""big data."" While the phrase sounds inclusive and
-integrative, ""big data"" approaches are highly selective, excluding input that
-cannot be effectively structured, represented, or digitised. 
Data of this complex sort is
-precisely the kind that human activity produces, but the technological
-imperative to enhance signal through the reduction of noise does not
-accommodate this richness. Data and the computational approaches that
-facilitate ""big data"" have acquired a perceived objectivity that belies their
-curated, malleable, reactive, and performative nature. In an input environment
-where anything can ""be data"" once it is entered into the system as ""data,""
-data cleaning and processing, together with the metadata and information
-architectures that structure and facilitate our cultural archives, acquire a
-capacity to delimit what data are. This engenders a process of simplification
-that has major implications for the potential for future innovation within
-research environments that depend on rich material yet are increasingly
-mediated by digital technologies. This paper presents the preliminary findings
-of the European-funded KPLEX (Knowledge Complexity) project, which
-investigates the delimiting effect digital mediation and datafication have on
-rich, complex cultural data. The paper presents a systematic review of
-existing implicit definitions of data, elaborating on the implications of
-these definitions and highlighting the ways in which metadata and
-computational technologies can restrict the interpretative potential of data.
-It sheds light on the gap between analogue or augmented digital practices and
-fully computational ones, and the strategies researchers have developed to
-deal with this gap. The paper proposes a reconceptualisation of data as it is
-functionally employed within digitally-mediated research so as to incorporate
-and acknowledge the richness and complexity of our source materials.
-"
-6145,1711.04457,"Yining Wang, Long Zhou, Jiajun Zhang, Chengqing Zong","Word, Subword or Character? An Empirical Study of Granularity in
- Chinese-English NMT",cs.CL," Neural machine translation (NMT), a new approach to machine translation, has
-been shown to outperform conventional statistical machine translation (SMT)
-across a variety of language pairs. Translation is an open-vocabulary problem,
-but most existing NMT systems operate with a fixed vocabulary, which makes
-them incapable of translating rare words. This problem can be alleviated by
-using different translation granularities, such as character, subword and
-hybrid word-character. Translation involving Chinese is one of the most
-difficult tasks in machine translation; however, to the best of our knowledge,
-no prior work has explored which translation granularity is most suitable for
-Chinese in NMT. In this paper, we conduct an extensive comparison using
-Chinese-English NMT as a case study, and we discuss the advantages and
-disadvantages of various translation granularities in detail. Our experiments
-show that the subword model performs best for Chinese-to-English translation
-with a relatively small vocabulary, while the hybrid word-character model is
-most suitable for English-to-Chinese translation. Moreover, experiments
-across granularities show that the Hybrid_BPE method achieves the best result
-on the Chinese-to-English translation task.
-"
-6146,1711.04498,"Yong Zhang, Hongming Zhou, Nganmeng Tan, Saeed Bagheri, Meng Joo Er",Targeted Advertising Based on Browsing History,cs.IR cs.AI cs.CL," Audience interest, demography, purchase behavior and other possible
-classifications are extremely important factors to be carefully studied in a
-targeting campaign. 
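Ahead of the pipeline this record goes on to describe, a minimal sketch of a TF-IDF-weighted average of word vectors as a user representation (my illustration of the general recipe; the toy vocabulary, counts and IDF values are invented):

    import numpy as np

    def user_vector(term_counts, embeddings, idf):
        """TF-IDF weighted average of word vectors from visited pages."""
        vecs, weights = [], []
        for term, tf in term_counts.items():
            if term in embeddings:
                vecs.append(embeddings[term])
                weights.append(tf * idf.get(term, 1.0))
        w = np.array(weights, dtype=float)
        return (np.array(vecs) * w[:, None]).sum(axis=0) / w.sum()

    emb = {"football": np.array([0.9, 0.1]), "recipe": np.array([0.1, 0.8])}
    idf = {"football": 2.0, "recipe": 1.5}
    print(user_vector({"football": 3, "recipe": 1}, emb, idf))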
This information can help advertisers and publishers
-deliver advertisements to the right audience group. However, it is not easy
-to collect such information, especially for the online audience with whom we
-have limited interaction and minimal deterministic knowledge. In this paper,
-we propose a predictive framework that can estimate online audience
-demographic attributes based on their browsing histories. Under the proposed
-framework, first, we retrieve the content of the websites visited by the
-audience, and represent the content as website feature vectors; second, we
-aggregate the vectors of the websites that the audience have visited and
-arrive at feature vectors representing the users; finally, the support vector
-machine is exploited to predict the audience demographic attributes. The key
-to achieving good prediction performance is preparing representative features
-of the audience. Word embedding, a widely used technique in natural language
-processing tasks, together with the term frequency-inverse document frequency
-weighting scheme, is used in the proposed method. This new representation
-approach is unsupervised and very easy to implement. The experimental results
-demonstrate that the new audience feature representation method is more
-powerful than existing baseline methods, leading to a great improvement in
-prediction accuracy.
-"
-6147,1711.04564,"Markus M\""uller, Sebastian St\""uker, Alex Waibel",Phonemic and Graphemic Multilingual CTC Based Speech Recognition,eess.AS cs.AI cs.CL," Training automatic speech recognition (ASR) systems requires large amounts
-of data in the target language in order to achieve good performance. Whereas
-large training corpora are readily available for languages like English, there
-exists a long tail of languages which suffer from a lack of resources. One
-method to handle data sparsity is to use data from additional source languages
-and build a multilingual system. Recently, ASR systems based on recurrent
-neural networks (RNNs) trained with connectionist temporal classification
-(CTC) have gained substantial research interest. In this work, we extended our
-previous approach towards training CTC-based systems multilingually. Our
-systems feature a global phone set, based on the joint phone sets of each
-source language. We evaluated the use of different language combinations as
-well as the addition of Language Feature Vectors (LFVs). As a contrastive
-experiment, we built systems based on graphemes as well. Systems having a
-multilingual phone set are known to suffer in performance compared to their
-monolingual counterparts. With our proposed approach, we could reduce the gap
-between these mono- and multilingual setups, using either graphemes or
-phonemes.
-"
-6148,1711.04569,"Markus M\""uller, Sebastian St\""uker, Alex Waibel",Multilingual Adaptation of RNN Based ASR Systems,eess.AS cs.AI cs.CL," In this work, we focus on multilingual systems based on recurrent neural
-networks (RNNs), trained using the Connectionist Temporal Classification (CTC)
-loss function. Using a multilingual set of acoustic units poses difficulties.
-To address this issue, we proposed Language Feature Vectors (LFVs) to train
-language adaptive multilingual systems. Language adaptation, in contrast to
-speaker adaptation, needs to be applied not only on the feature level, but
-also to deeper layers of the network. In this work, we therefore extended our
-previous approach by introducing a novel technique which we call ""modulation"". 
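One plausible reading of this modulation, as a minimal NumPy sketch (the exact parameterization below is my assumption, not taken from the paper): hidden activations are rescaled element-wise by factors computed from the language feature vector.

    import numpy as np

    def modulate(hidden, lfv, W_mod):
        """Scale each hidden unit by a language-dependent factor."""
        scale = 1.0 + np.tanh(W_mod @ lfv)   # per-unit modulation factors
        return hidden * scale

    h = np.random.randn(8)                   # activations of one RNN layer
    lfv = np.array([1.0, 0.0, 0.0])          # toy language feature vector
    print(modulate(h, lfv, np.random.randn(8, 3)))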
-Based on this method, we modulated the hidden layers of RNNs using LFVs. We -evaluated this approach in both full and low resource conditions, as well as -for grapheme and phone based systems. Lower error rates throughout the -different conditions could be achieved by the use of the modulation. -" -6149,1711.04731,"Keith Carlson, Allen Riddell, Daniel Rockmore",Evaluating prose style transfer with the Bible,cs.CL," In the prose style transfer task a system, provided with text input and a -target prose style, produces output which preserves the meaning of the input -text but alters the style. These systems require parallel data for evaluation -of results and usually make use of parallel data for training. Currently, there -are few publicly available corpora for this task. In this work, we identify a -high-quality source of aligned, stylistically distinct text in different -versions of the Bible. We provide a standardized split, into training, -development and testing data, of the public domain versions in our corpus. This -corpus is highly parallel since many Bible versions are included. Sentences are -aligned due to the presence of chapter and verse numbers within all versions of -the text. In addition to the corpus, we present the results, as measured by the -BLEU and PINC metrics, of several models trained on our data which can serve as -baselines for future research. While we present these data as a style transfer -corpus, we believe that it is of unmatched quality and may be useful for other -natural language tasks as well. -" -6150,1711.04805,"David Grangier, Michael Auli",QuickEdit: Editing Text & Translations by Crossing Words Out,cs.CL," We propose a framework for computer-assisted text editing. It applies to -translation post-editing and to paraphrasing. Our proposal relies on very -simple interactions: a human editor modifies a sentence by marking tokens they -would like the system to change. Our model then generates a new sentence which -reformulates the initial sentence by avoiding marked words. The approach builds -upon neural sequence-to-sequence modeling and introduces a neural network which -takes as input a sentence along with change markers. Our model is trained on -translation bitext by simulating post-edits. We demonstrate the advantage of -our approach for translation post-editing through simulated post-edits. We also -evaluate our model for paraphrasing through a user study. -" -6151,1711.04903,"Michihiro Yasunaga, Jungo Kasai, Dragomir Radev",Robust Multilingual Part-of-Speech Tagging via Adversarial Training,cs.CL cs.LG," Adversarial training (AT) is a powerful regularization method for neural -networks, aiming to achieve robustness to input perturbations. Yet, the -specific effects of the robustness obtained from AT are still unclear in the -context of natural language processing. In this paper, we propose and analyze a -neural POS tagging model that exploits AT. In our experiments on the Penn -Treebank WSJ corpus and the Universal Dependencies (UD) dataset (27 languages), -we find that AT not only improves the overall tagging accuracy, but also 1) -prevents over-fitting well in low resource languages and 2) boosts tagging -accuracy for rare / unseen words. We also demonstrate that 3) the improved -tagging performance by AT contributes to the downstream task of dependency -parsing, and that 4) AT helps the model to learn cleaner word representations. -5) The proposed AT model is generally effective in different sequence labeling -tasks. 
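As an aside on the adversarial training used here, a standard sketch in the spirit of gradient-based AT for text (not the paper's code): word embeddings are perturbed along the loss gradient, scaled to a small norm, and the model is trained on the perturbed input as well.

    import numpy as np

    def adversarial_perturb(emb, grad, epsilon=0.05):
        """Worst-case (FGSM-style) perturbation of word embeddings."""
        return emb + epsilon * grad / (np.linalg.norm(grad) + 1e-12)

    emb = np.random.randn(6, 50)    # one sentence: 6 word vectors
    grad = np.random.randn(6, 50)   # d(loss)/d(embeddings) from backprop
    emb_adv = adversarial_perturb(emb, grad)  # its loss joins the clean loss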
These positive results motivate further use of AT for natural language -tasks. -" -6152,1711.04951,"Dat Quoc Nguyen, Thanh Vu, Dai Quoc Nguyen, Mark Dras, Mark Johnson",From Word Segmentation to POS Tagging for Vietnamese,cs.CL," This paper presents an empirical comparison of two strategies for Vietnamese -Part-of-Speech (POS) tagging from unsegmented text: (i) a pipeline strategy -where we consider the output of a word segmenter as the input of a POS tagger, -and (ii) a joint strategy where we predict a combined segmentation and POS tag -for each syllable. We also make a comparison between state-of-the-art (SOTA) -feature-based and neural network-based models. On the benchmark Vietnamese -treebank (Nguyen et al., 2009), experimental results show that the pipeline -strategy produces better scores of POS tagging from unsegmented text than the -joint strategy, and the highest accuracy is obtained by using a feature-based -model. -" -6153,1711.04956,"Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio - Ranzato",Classical Structured Prediction Losses for Sequence to Sequence Learning,cs.CL," There has been much recent work on training neural attention models at the -sequence-level using either reinforcement learning-style methods or by -optimizing the beam. In this paper, we survey a range of classical objective -functions that have been widely used to train linear models for structured -prediction and apply them to neural sequence to sequence models. Our -experiments show that these losses can perform surprisingly well by slightly -outperforming beam search optimization in a like for like setup. We also report -new state of the art results on both IWSLT'14 German-English translation as -well as Gigaword abstractive summarization. On the larger WMT'14 English-French -translation task, sequence-level training achieves 41.5 BLEU which is on par -with the state of the art. -" -6154,1711.04964,"Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen and Xiaodong Liu",Dynamic Fusion Networks for Machine Reading Comprehension,cs.CL," This paper presents a novel neural model - Dynamic Fusion Network (DFN), for -machine reading comprehension (MRC). DFNs differ from most state-of-the-art -models in their use of a dynamic multi-strategy attention process, in which -passages, questions and answer candidates are jointly fused into attention -vectors, along with a dynamic multi-step reasoning module for generating -answers. With the use of reinforcement learning, for each input sample that -consists of a question, a passage and a list of candidate answers, an instance -of DFN with a sample-specific network architecture can be dynamically -constructed by determining what attention strategy to apply and how many -reasoning steps to take. Experiments show that DFNs achieve the best result -reported on RACE, a challenging MRC dataset that contains real human reading -questions in a wide variety of types. A detailed empirical analysis also -demonstrates that DFNs can produce attention vectors that summarize information -from questions, passages and answer candidates more effectively than other -popular MRC models. -" -6155,1711.04981,"Yi Tay, Minh C. Phan, Luu Anh Tuan, Siu Cheung Hui","SkipFlow: Incorporating Neural Coherence Features for End-to-End - Automatic Text Scoring",cs.AI cs.CL," Deep learning has demonstrated tremendous potential for Automatic Text -Scoring (ATS) tasks. 
In this paper, we describe a new neural architecture that
-enhances vanilla neural network models with auxiliary neural coherence
-features. Our method proposes a new \textsc{SkipFlow} mechanism that models
-relationships between snapshots of the hidden representations of a long
-short-term memory (LSTM) network as it reads. Subsequently, the semantic
-relationships between multiple snapshots are used as auxiliary features for
-prediction. This has two main benefits. Firstly, essays are typically long
-sequences and therefore the memorization capability of the LSTM network may be
-insufficient. Implicit access to multiple snapshots can alleviate this problem
-by acting as a protection against vanishing gradients. The parameters of the
-\textsc{SkipFlow} mechanism also act as an auxiliary memory. Secondly,
-modeling relationships between multiple positions allows our model to learn
-features that represent and approximate textual coherence. In our model, we
-call these \textit{neural coherence} features. Overall, we present a unified
-deep learning architecture that generates neural coherence features as it
-reads in an end-to-end fashion. Our approach demonstrates state-of-the-art
-performance on the benchmark ASAP dataset, outperforming not only feature
-engineering baselines but also other deep learning models.
-"
-6156,1711.04987,"Daniel Fried, Jacob Andreas, Dan Klein",Unified Pragmatic Models for Generating and Following Instructions,cs.CL," We show that explicit pragmatic inference aids in correctly generating and
-following natural language instructions for complex, sequential tasks. Our
-pragmatics-enabled models reason about why speakers produce certain
-instructions, and about how listeners will react upon hearing them. Like
-previous pragmatic models, we use learned base listener and speaker models to
-build a pragmatic speaker that uses the base listener to simulate the
-interpretation of candidate descriptions, and a pragmatic listener that
-reasons counterfactually about alternative descriptions. We extend these
-models to tasks with sequential structure. Evaluation of language generation
-and interpretation shows that pragmatic inference improves state-of-the-art
-listener models (at correctly interpreting human instructions) and speaker
-models (at producing instructions correctly interpreted by humans) in diverse
-settings.
-"
-6157,1711.05066,Jianpeng Cheng and Siva Reddy and Vijay Saraswat and Mirella Lapata,Learning an Executable Neural Semantic Parser,cs.CL," This paper describes a neural semantic parser that maps natural language
-utterances onto logical forms which can be executed against a task-specific
-environment, such as a knowledge base or a database, to produce a response.
-The parser generates tree-structured logical forms with a transition-based
-approach which combines a generic tree-generation algorithm with
-domain-general operations defined by the logical language. The generation
-process is modeled by structured recurrent neural networks, which provide a
-rich encoding of the sentential context and generation history for making
-predictions. To tackle mismatches between natural language and logical form
-tokens, various attention mechanisms are explored. 
Finally, we consider different training settings for
-the neural semantic parser, including a fully supervised training where
-annotated logical forms are given, weakly-supervised training where
-denotations are provided, and distant supervision where only unlabeled
-sentences and a knowledge base are available. Experiments across a wide range
-of datasets demonstrate the effectiveness of our parser.
-"
-6158,1711.05073,"Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan
- Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, Haifeng Wang","DuReader: a Chinese Machine Reading Comprehension Dataset from
- Real-world Applications",cs.CL," This paper introduces DuReader, a new large-scale, open-domain Chinese
-machine reading comprehension (MRC) dataset, designed to address real-world
-MRC. DuReader has three advantages over previous MRC datasets: (1) data
-sources: questions and documents are based on Baidu Search and Baidu Zhidao;
-answers are manually generated. (2) question types: it provides rich
-annotations for more question types, especially yes-no and opinion questions,
-leaving more opportunity for the research community. (3) scale: it contains
-200K questions, 420K answers and 1M documents; it is the largest Chinese MRC
-dataset so far. Experiments show that human performance is well above current
-state-of-the-art baseline systems, leaving plenty of room for the community
-to make improvements. To help the community make these improvements, both
-DuReader and baseline systems have been posted online. We also organize a
-shared competition to encourage the exploration of more models. Since the
-release of the task, there have been significant improvements over the
-baselines.
-"
-6159,1711.05116,"Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu
- Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell","Evidence Aggregation for Answer Re-Ranking in Open-Domain Question
- Answering",cs.CL cs.AI," A popular recent approach to answering open-domain questions is to first
-search for question-related passages and then apply reading comprehension
-models to extract answers. Existing methods usually extract answers from
-single passages independently. But some questions require a combination of
-evidence from across different sources to answer correctly. In this paper, we
-propose two models which make use of multiple passages to generate their
-answers. Both use an answer-reranking approach which reorders the answer
-candidates generated by an existing state-of-the-art QA model. We propose two
-methods, namely, strength-based re-ranking and coverage-based re-ranking, to
-make use of the aggregated evidence from different passages to better
-determine the answer. Our models have achieved state-of-the-art results on
-three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain
-version of TriviaQA, with about 8 percentage points of improvement over the
-former two datasets.
-"
-6160,1711.05170,"Hamideh Hajiabadi, Diego Molla-Aliod, Reza Monsefi",On Extending Neural Networks with Loss Ensembles for Text Classification,cs.CL cs.LG stat.ML," Ensemble techniques are powerful approaches that combine several weak
-learners to build a stronger one. As a meta-learning framework, ensemble
-techniques can easily be applied to many machine learning techniques. In this
-paper we propose a neural network extended with an ensemble loss function for
-text classification. 
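A minimal sketch of such a loss ensemble (my illustration; softmax-normalized weights are one reasonable parameterization of the tuning the record describes next): the total loss is a trainable convex combination of several weak losses, with the weights updated by backpropagation like any other parameter.

    import numpy as np

    def ensemble_loss(losses, alpha):
        """Combine weak losses with trainable weights; the softmax keeps
        them positive and summing to one."""
        a = np.exp(alpha - alpha.max())
        w = a / a.sum()
        return float(w @ losses), w

    losses = np.array([0.7, 1.2, 0.4])  # e.g., hinge, log, smooth-hinge values
    total, w = ensemble_loss(losses, np.zeros(3))
    print(total, w.round(3))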
The weight of each weak loss function is tuned within the -training phase through the gradient propagation optimization method of the -neural network. The approach is evaluated on several text classification -datasets. We also evaluate its performance in various environments with several -degrees of label noise. Experimental results indicate an improvement of the -results and strong resilience against label noise in comparison with other -methods. -" -6161,1711.05186,"Anca Dumitrache, Lora Aroyo, Chris Welty",False Positive and Cross-relation Signals in Distant Supervision Data,cs.CL," Distant supervision (DS) is a well-established method for relation extraction -from text, based on the assumption that when a knowledge-base contains a -relation between a term pair, then sentences that contain that pair are likely -to express the relation. In this paper, we use the results of a crowdsourcing -relation extraction task to identify two problems with DS data quality: the -widely varying degree of false positives across different relations, and the -observed causal connection between relations that are not considered by the DS -method. The crowdsourcing data aggregation is performed using ambiguity-aware -CrowdTruth metrics, that are used to capture and interpret inter-annotator -disagreement. We also present preliminary results of using the crowd to enhance -DS training data for a relation classification model, without requiring the -crowd to annotate the entire set. -" -6162,1711.05198,"Madhumita Sushil, Simon \v{S}uster, Kim Luyckx, Walter Daelemans","Unsupervised patient representations from clinical notes with - interpretable classification decisions",cs.CL," We have two main contributions in this work: 1. We explore the usage of a -stacked denoising autoencoder, and a paragraph vector model to learn -task-independent dense patient representations directly from clinical notes. We -evaluate these representations by using them as features in multiple supervised -setups, and compare their performance with those of sparse representations. 2. -To understand and interpret the representations, we explore the best encoded -features within the patient representations obtained from the autoencoder -model. Further, we calculate the significance of the input features of the -trained classifiers when we use these pretrained representations as input. -" -6163,1711.05217,"Angela Fan, David Grangier, Michael Auli",Controllable Abstractive Summarization,cs.CL," Current models for document summarization disregard user preferences such as -the desired length, style, the entities that the user might be interested in, -or how much of the document the user has already read. We present a neural -summarization model with a simple but effective mechanism to enable users to -specify these high level attributes in order to control the shape of the final -summaries to better suit their needs. With user input, our system can produce -high quality summaries that follow user preferences. Without user input, we set -the control variables automatically. On the full text CNN-Dailymail dataset, we -outperform state of the art abstractive systems (both in terms of F1-ROUGE1 -40.38 vs. 39.53 and human evaluation). 
-" -6164,1711.05240,"Omer Goldman and Veronica Latcinnik and Udi Naveh and Amir Globerson - and Jonathan Berant",Weakly-supervised Semantic Parsing with Abstract Examples,cs.CL cs.AI cs.LG," Training semantic parsers from weak supervision (denotations) rather than -strong supervision (programs) complicates training in two ways. First, a large -search space of potential programs needs to be explored at training time to -find a correct program. Second, spurious programs that accidentally lead to a -correct denotation add noise to training. In this work we propose that in -closed worlds with clear semantic types, one can substantially alleviate these -problems by utilizing an abstract representation, where tokens in both the -language utterance and program are lifted to an abstract form. We show that -these abstractions can be defined with a handful of lexical rules and that they -result in sharing between different examples that alleviates the difficulties -in training. To test our approach, we develop the first semantic parser for -CNLVR, a challenging visual reasoning dataset, where the search space is large -and overcoming spuriousness is critical, because denotations are either TRUE or -FALSE, and thus random programs are likely to lead to a correct denotation. Our -method substantially improves performance, and reaches 82.5% accuracy, a 14.7% -absolute accuracy improvement compared to the best reported accuracy so far. -" -6165,1711.05294,"Shoaib Jameel, Zied Bouraoui, Steven Schockaert",Modeling Semantic Relatedness using Global Relation Vectors,cs.CL," Word embedding models such as GloVe rely on co-occurrence statistics from a -large corpus to learn vector representations of word meaning. These vectors -have proven to capture surprisingly fine-grained semantic and syntactic -information. While we may similarly expect that co-occurrence statistics can be -used to capture rich information about the relationships between different -words, existing approaches for modeling such relationships have mostly relied -on manipulating pre-trained word vectors. In this paper, we introduce a novel -method which directly learns relation vectors from co-occurrence statistics. To -this end, we first introduce a variant of GloVe, in which there is an explicit -connection between word vectors and PMI weighted co-occurrence vectors. We then -show how relation vectors can be naturally embedded into the resulting vector -space. -" -6166,1711.05313,"Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, - Yejin Choi",Simulating Action Dynamics with Neural Process Networks,cs.CL," Understanding procedural language requires anticipating the causal effects of -actions, even when they are not explicitly stated. In this work, we introduce -Neural Process Networks to understand procedural text through (neural) -simulation of action dynamics. Our model complements existing memory -architectures with dynamic entity tracking by explicitly modeling actions as -state transformers. The model updates the states of the entities by executing -learned action operators. Empirical results demonstrate that our proposed model -can reason about the unstated causal effects of actions, allowing it to provide -more accurate contextual information for understanding and generating -procedural text, all while offering more interpretable internal representations -than existing alternatives. 
-" -6167,1711.05345,Yu-An Chung and Hung-Yi Lee and James Glass,Supervised and Unsupervised Transfer Learning for Question Answering,cs.CL," Although transfer learning has been shown to be successful for tasks like -object and speech recognition, its applicability to question answering (QA) has -yet to be well-studied. In this paper, we conduct extensive experiments to -investigate the transferability of knowledge learned from a source QA dataset -to a target dataset using two QA models. The performance of both models on a -TOEFL listening comprehension test (Tseng et al., 2016) and MCTest (Richardson -et al., 2013) is significantly improved via a simple transfer learning -technique from MovieQA (Tapaswi et al., 2016). In particular, one of the models -achieves the state-of-the-art on all target datasets; for the TOEFL listening -comprehension test, it outperforms the previous best model by 7%. Finally, we -show that transfer learning is helpful even in unsupervised scenarios when -correct answers for target QA dataset examples are not available. -" -6168,1711.05350,"Chen Zheng, Shuangfei Zhai, Zhongfei Zhang","A Deep Learning Approach for Expert Identification in Question Answering - Communities",cs.CL," In this paper, we describe an effective convolutional neural network -framework for identifying the expert in question answering community. This -approach uses the convolutional neural network and combines user feature -representations with question feature representations to compute scores that -the user who gets the highest score is the expert on this question. Unlike -prior work, this method does not measure expert based on measure answer content -quality to identify the expert but only require question sentence and user -embedding feature to identify the expert. Remarkably, Our model can be applied -to different languages and different domains. The proposed framework is trained -on two datasets, The first dataset is Stack Overflow and the second one is -Zhihu. The Top-1 accuracy results of our experiments show that our framework -outperforms the best baseline framework for expert identification. -" -6169,1711.05380,"Shaohui Kuang, Junhui Li, Ant\'onio Branco, Weihua Luo, Deyi Xiong","Attention Focusing for Neural Machine Translation by Bridging Source and - Target Embeddings",cs.CL," In neural machine translation, a source sequence of words is encoded into a -vector from which a target sequence is generated in the decoding phase. -Differently from statistical machine translation, the associations between -source words and their possible target counterparts are not explicitly stored. -Source and target words are at the two ends of a long information processing -procedure, mediated by hidden states at both the source encoding and the target -decoding phases. This makes it possible that a source word is incorrectly -translated into a target word that is not any of its admissible equivalent -counterparts in the target language. - In this paper, we seek to somewhat shorten the distance between source and -target words in that procedure, and thus strengthen their association, by means -of a method we term bridging source and target word embeddings. 
We experiment -with three strategies: (1) a source-side bridging model, where source word -embeddings are moved one step closer to the output target sequence; (2) a -target-side bridging model, which explores the more relevant source word -embeddings for the prediction of the target sequence; and (3) a direct bridging -model, which directly connects source and target word embeddings seeking to -minimize errors in the translation of ones by the others. - Experiments and analysis presented in this paper demonstrate that the -proposed bridging models are able to significantly improve quality of both -sentence translation, in general, and alignment and translation of individual -source words with target words, in particular. -" -6170,1711.05408,"Yining Chen, Sorcha Gilroy, Andreas Maletti, Jonathan May, Kevin - Knight",Recurrent Neural Networks as Weighted Language Recognizers,cs.FL cs.CC cs.CL," We investigate the computational complexity of various problems for simple -recurrent neural networks (RNNs) as formal models for recognizing weighted -languages. We focus on the single-layer, ReLU-activation, rational-weight RNNs -with softmax, which are commonly used in natural language processing -applications. We show that most problems for such RNNs are undecidable, -including consistency, equivalence, minimization, and the determination of the -highest-weighted string. However, for consistent RNNs the last problem becomes -decidable, although the solution length can surpass all computable bounds. If -additionally the string is limited to polynomial length, the problem becomes -NP-complete and APX-hard. In summary, this shows that approximations and -heuristic algorithms are necessary in practical applications of those RNNs. -" -6171,1711.05433,"Yu-Ping Ruan, Qian Chen, and Zhen-Hua Ling","A Sequential Neural Encoder with Latent Structured Description for - Modeling Sentences",cs.CL," In this paper, we propose a sequential neural encoder with latent structured -description (SNELSD) for modeling sentences. This model introduces latent -chunk-level representations into conventional sequential neural encoders, i.e., -recurrent neural networks (RNNs) with long short-term memory (LSTM) units, to -consider the compositionality of languages in semantic modeling. An SNELSD -model has a hierarchical structure that includes a detection layer and a -description layer. The detection layer predicts the boundaries of latent word -chunks in an input sentence and derives a chunk-level vector for each word. The -description layer utilizes modified LSTM units to process these chunk-level -vectors in a recurrent manner and produces sequential encoding outputs. These -output vectors are further concatenated with word vectors or the outputs of a -chain LSTM encoder to obtain the final sentence representation. All the model -parameters are learned in an end-to-end manner without a dependency on -additional text chunking or syntax parsing. A natural language inference (NLI) -task and a sentiment analysis (SA) task are adopted to evaluate the performance -of our proposed model. The experimental results demonstrate the effectiveness -of the proposed SNELSD model on exploring task-dependent chunking patterns -during the semantic modeling of sentences. Furthermore, the proposed method -achieves better performance than conventional chain LSTMs and tree-structured -LSTMs on both tasks. 
-" -6172,1711.05443,"Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, - Haisheng Dai, Dong Wang",Human and Machine Speaker Recognition Based on Short Trivial Events,cs.SD cs.CL cs.NE eess.AS," Trivial events are ubiquitous in human to human conversations, e.g., cough, -laugh and sniff. Compared to regular speech, these trivial events are usually -short and unclear, thus generally regarded as not speaker discriminative and so -are largely ignored by present speaker recognition research. However, these -trivial events are highly valuable in some particular circumstances such as -forensic examination, as they are less subjected to intentional change, so can -be used to discover the genuine speaker from disguised speech. In this paper, -we collect a trivial event speech database that involves 75 speakers and 6 -types of events, and report preliminary speaker recognition results on this -database, by both human listeners and machines. Particularly, the deep feature -learning technique recently proposed by our group is utilized to analyze and -recognize the trivial events, which leads to acceptable equal error rates -(EERs) despite the extremely short durations (0.2-0.5 seconds) of these events. -Comparing different types of events, 'hmm' seems more speaker discriminative. -" -6173,1711.05447,"Younggun Lee, Azam Rabiee, Soo-Young Lee",Emotional End-to-End Neural Speech Synthesizer,cs.SD cs.CL eess.AS," In this paper, we introduce an emotional speech synthesizer based on the -recent end-to-end neural model, named Tacotron. Despite its benefits, we found -that the original Tacotron suffers from the exposure bias problem and -irregularity of the attention alignment. Later, we address the problem by -utilization of context vector and residual connection at recurrent neural -networks (RNNs). Our experiments showed that the model could successfully train -and generate speech for given emotion labels. -" -6174,1711.05448,"Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, - Ananda Theertha Suresh, Felix Yu","Lattice Rescoring Strategies for Long Short Term Memory Language Models - in Speech Recognition",stat.ML cs.CL cs.LG," Recurrent neural network (RNN) language models (LMs) and Long Short Term -Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform -traditional N-gram LMs on speech recognition tasks. However, these models are -computationally more expensive than N-gram LMs for decoding, and thus, -challenging to integrate into speech recognizers. Recent research has proposed -the use of lattice-rescoring algorithms using RNNLMs and LSTMLMs as an -efficient strategy to integrate these models into a speech recognition system. -In this paper, we evaluate existing lattice rescoring algorithms along with new -variants on a YouTube speech recognition task. Lattice rescoring using LSTMLMs -reduces the word error rate (WER) for this task by 8\% relative to the WER -obtained using an N-gram LM. -" -6175,1711.05467,Du Steven and Xi Zhang,Aicyber's System for NLPCC 2017 Shared Task 2: Voting of Baselines,cs.CL," This paper presents Aicyber's system for NLPCC 2017 shared task 2. It is -formed by a voting of three deep learning based system trained on -character-enhanced word vectors and a well known bag-of-word model. 
-" -6176,1711.05468,Johannes Bjerva and Isabelle Augenstein,"Tracking Typological Traits of Uralic Languages in Distributed Language - Representations",cs.CL," Although linguistic typology has a long history, computational approaches -have only recently gained popularity. The use of distributed representations in -computational linguistics has also become increasingly popular. A recent -development is to learn distributed representations of language, such that -typologically similar languages are spatially close to one another. Although -empirical successes have been shown for such language representations, they -have not been subjected to much typological probing. In this paper, we first -look at whether this type of language representations are empirically useful -for model transfer between Uralic languages in deep neural networks. We then -investigate which typological features are encoded in these representations by -attempting to predict features in the World Atlas of Language Structures, at -various stages of fine-tuning of the representations. We focus on Uralic -languages, and find that some typological traits can be automatically inferred -with accuracies well above a strong baseline. -" -6177,1711.05472,"Elmar Juergens and Florian Deissenboeck and Martin Feilkas and - Benjamin Hummel and Bernhard Schaetz and Stefan Wagner and Christoph Domann - and Jonathan Streit","Can clone detection support quality assessments of requirements - specifications?",cs.SE cs.CL," Due to their pivotal role in software engineering, considerable effort is -spent on the quality assurance of software requirements specifications. As they -are mainly described in natural language, relatively few means of automated -quality assessment exist. However, we found that clone detection, a technique -widely applied to source code, is promising to assess one important quality -aspect in an automated way, namely redundancy that stems from copy&paste -operations. This paper describes a large-scale case study that applied clone -detection to 28 requirements specifications with a total of 8,667 pages. We -report on the amount of redundancy found in real-world specifications, discuss -its nature as well as its consequences and evaluate in how far existing code -clone detection approaches can be applied to assess the quality of requirements -specifications in practice. -" -6178,1711.05516,"Shaonan Wang, Jiajun Zhang, Nan Lin, Chengqing Zong","Investigating Inner Properties of Multimodal Representation and Semantic - Compositionality with Brain-based Componential Semantics",cs.CL," Multimodal models have been proven to outperform text-based approaches on -learning semantic representations. However, it still remains unclear what -properties are encoded in multimodal representations, in what aspects do they -outperform the single-modality representations, and what happened in the -process of semantic compositionality in different input modalities. Considering -that multimodal models are originally motivated by human concept -representations, we assume that correlating multimodal representations with -brain-based semantics would interpret their inner properties to answer the -above questions. To that end, we propose simple interpretation methods based on -brain-based componential semantics. First we investigate the inner properties -of multimodal representations by correlating them with corresponding -brain-based property vectors. 
Then we map the distributed vector space to the
-interpretable brain-based componential space to explore the inner properties of
-semantic compositionality. Ultimately, the present paper sheds light on the
-fundamental questions of natural language understanding, such as how to
-represent the meaning of words and how to combine word meanings into larger
-units.
-"
-6179,1711.05538,"Christian Kahmann, Andreas Niekler, Gerhard Heyer","Detecting and assessing contextual change in diachronic text documents
- using context volatility",cs.CL," Terms in diachronic text corpora may exhibit a high degree of semantic
-dynamics that is only partially captured by the common notion of semantic
-change. The new measure of context volatility that we propose models the degree
-by which terms change context in a text collection over time. The computation
-of context volatility for a word relies on the significance-values of its
-co-occurrent terms and the corresponding co-occurrence ranks in sequential time
-spans. We define a baseline and present an efficient computational approach in
-order to overcome problems related to computational issues in the data
-structure. Results are evaluated both on synthetic documents that are used to
-simulate contextual changes and on a real example based on British newspaper
-texts.
-"
-6180,1711.05557,"Ying Hua Tan, Chee Seng Chan",Phrase-based Image Captioning with Hierarchical LSTM Model,cs.CV cs.AI cs.CL," Automatic generation of captions to describe the content of an image has
-been gaining a lot of research interest recently, where most of the existing
-works treat the image caption as pure sequential data. Natural language,
-however, possesses a temporal hierarchical structure, with complex
-dependencies between each subsequence. In this paper, we propose a
-phrase-based hierarchical Long Short-Term Memory (phi-LSTM) model to generate
-image descriptions. In contrast to the conventional solutions that generate
-captions in a pure sequential manner, our proposed model decodes image
-captions from phrase to sentence. It consists of a phrase decoder at the
-bottom hierarchy to decode noun phrases of variable length, and an abbreviated
-sentence decoder at the upper hierarchy to decode an abbreviated form of the
-image description. A complete image caption is formed by combining the
-generated phrases with the sentence during the inference stage. Empirically,
-our proposed model shows a better or competitive result on the Flickr8k,
-Flickr30k and MS-COCO datasets in comparison to the state-of-the-art models.
-We also show that our proposed model is able to generate more novel captions
-(not seen in the training data) which are richer in word content in all these
-three datasets.
-"
-6181,1711.05568,"Zheqian Chen, Rongqin Yang, Zhou Zhao, Deng Cai, Xiaofei He",Dialogue Act Recognition via CRF-Attentive Structured Network,cs.CL," Dialogue Act Recognition (DAR) is a challenging problem in dialogue
-interpretation, which aims to attach semantic labels to utterances and
-characterize the speaker's intention. Currently, many existing approaches
-formulate the DAR problem ranging from multi-classification to structured
-prediction, which suffer from handcrafted feature extensions and attentive
-contextual structural dependencies. In this paper, we consider the problem of
-DAR from the viewpoint of extending richer Conditional Random Field (CRF)
-structural dependencies without abandoning end-to-end training. 
We incorporate
-hierarchical semantic inference with a memory mechanism in the utterance
-modeling. We then extend the structured attention network to a linear-chain
-conditional random field layer which takes into account both contextual
-utterances and corresponding dialogue acts. Extensive experiments on two major
-benchmark datasets, Switchboard Dialogue Act (SWDA) and Meeting Recorder
-Dialogue Act (MRDA), show that our method achieves better performance than
-other state-of-the-art solutions to the problem. Remarkably, our method comes
-within a 2% gap of the human annotator's performance on SWDA.
-"
-6182,1711.05603,"Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut,
- Maarten Marx, Jaap Kamps","Words are Malleable: Computing Semantic Shifts in Political and Media
- Discourse",cs.CL," Recently, researchers started to pay attention to the detection of temporal
-shifts in the meaning of words. However, most (if not all) of these approaches
-restricted their efforts to uncovering change over time, thus neglecting other
-valuable dimensions such as social or political variability. We propose an
-approach for detecting semantic shifts between different viewpoints--broadly
-defined as a set of texts that share a specific metadata feature, which can be
-a time-period, but also a social entity such as a political party. For each
-viewpoint, we learn a semantic space in which each word is represented as a low
-dimensional neural embedded vector. The challenge is to compare the meaning of
-a word in one space to its meaning in another space and measure the size of the
-semantic shifts. We compare the effectiveness of a measure based on optimal
-transformations between the two spaces with a measure based on the similarity
-of the neighbors of the word in the respective spaces. Our experiments
-demonstrate that the combination of these two performs best. We show that the
-semantic shifts not only occur over time, but also along different viewpoints
-in a short period of time. For evaluation, we demonstrate how this approach
-captures meaningful semantic shifts and can help improve other tasks such as
-the contrastive viewpoint summarization and ideology detection (measured as
-classification accuracy) in political texts. We also show that the two laws of
-semantic change which were empirically shown to hold for temporal shifts also
-hold for shifts across viewpoints. These laws state that frequent words are
-less likely to shift meaning while words with many senses are more likely to do
-so.
-"
-6183,1711.05626,"Pankaj Gupta, Subburam Rajaram, Hinrich Sch\""utze, Bernt Andrassy",Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time,cs.CL cs.AI cs.IR cs.LG," Dynamic topic modeling facilitates the identification of topical trends over
-time in temporal collections of unstructured documents. We introduce a novel
-unsupervised neural dynamic topic model named the Recurrent Neural
-Network-Replicated Softmax Model (RNN-RSM), where the discovered topics at each
-time influence the topic discovery in the subsequent time steps. We account for
-the temporal ordering of documents by explicitly modeling a joint distribution
-of latent topical dependencies over time, using distributional estimators with
-temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP
-research, we demonstrate that compared to state-of-the-art topic models,
-RNN-RSM shows better generalization, topic interpretation, evolution and
-trends. We also introduce a metric (named SPAN) to quantify the capability of
-a dynamic topic model to capture word evolution in topics over time.
-"
-6184,1711.05678,"Syed Sarfaraz Akhtar, Arihant Gupta, Avijit Vajpayee, Arjit Srivastava
- and Manish Shrivastava","Unsupervised Morphological Expansion of Small Datasets for Improving
- Word Embeddings",cs.CL," We present a language independent, unsupervised method for building word
-embeddings using morphological expansion of text. Our model handles the problem
-of data sparsity and yields improved word embeddings by relying on training
-word embeddings on artificially generated sentences. We evaluate our method
-using small sized training sets on eleven test sets for the word similarity
-task across seven languages. Further, for English, we evaluated the impacts of
-our approach using a large training set on three standard test sets. Our method
-improved results across all languages.
-"
-6185,1711.05680,"Syed Sarfaraz Akhtar, Arihant Gupta, Avijit Vajpayee, Arjit
- Srivastava, Madan Gopal Jhawar and Manish Shrivastava",An Unsupervised Approach for Mapping between Vector Spaces,cs.CL," We present a language independent, unsupervised approach for transforming
-word embeddings from source language to target language using a transformation
-matrix. Our model handles the problem of data scarcity which is faced by many
-languages in the world and yields improved word embeddings for words in the
-target language by relying on transformed embeddings of words of the source
-language. We initially evaluate our approach via word similarity tasks on a
-similar language pair - Hindi as source and Urdu as the target language, while
-we also evaluate our method on French and German as target languages and
-English as source language. Our approach improves the current state-of-the-art
-results by 13% for French and 19% for German. For Urdu, we saw an increment
-of 16% over our initial baseline score. We further explore the prospects of our
-approach by applying it on multiple models of the same language and
-transferring words between the two models, thus solving the problem of missing
-words in a model. We evaluate this on word similarity and word analogy tasks.
-"
-6186,1711.05715,"Zachary Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li
- Deng","BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for
- Task-Oriented Dialogue Systems",cs.AI cs.CL cs.LG," We present a new algorithm that significantly improves the efficiency of
-exploration for deep Q-learning agents in dialogue systems. Our agents explore
-via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop
-neural network. Our algorithm learns much faster than common exploration
-strategies such as \epsilon-greedy, Boltzmann, bootstrapping, and
-intrinsic-reward-based ones. Additionally, we show that spiking the replay
-buffer with experiences from just a few successful episodes can make Q-learning
-feasible when it might otherwise fail.
-"
-6187,1711.05732,"John Wieting, Kevin Gimpel","ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with
- Millions of Machine Translations",cs.CL," We describe PARANMT-50M, a dataset of more than 50 million English-English
-sentential paraphrase pairs. We generated the pairs automatically by using
-neural machine translation to translate the non-English side of a large
-parallel corpus, following Wieting et al. (2017). 
Our hope is that ParaNMT-50M
-can be a valuable resource for paraphrase generation and can provide a rich
-source of semantic knowledge to improve downstream natural language
-understanding tasks. To show its utility, we use ParaNMT-50M to train
-paraphrastic sentence embeddings that outperform all supervised systems on
-every SemEval semantic textual similarity competition, in addition to showing
-how it can be used for paraphrase generation.
-"
-6188,1711.05780,"Tommy Sandbank, Michal Shmueli-Scheuer, Jonathan Herzig, David
- Konopnicki, John Richards, David Piorkowski",Detecting Egregious Conversations between Customers and Virtual Agents,cs.CL," Virtual agents are becoming a prominent channel of interaction in customer
-service. Not all customer interactions are smooth, however, and some can become
-almost comically bad. In such instances, a human agent might need to step in
-and salvage the conversation. Detecting bad conversations is important since
-disappointing customer service may threaten customer loyalty and impact
-revenue. In this paper, we outline an approach to detecting such egregious
-conversations, using behavioral cues from the user, patterns in agent
-responses, and user-agent interaction. Using logs of two commercial systems, we
-show that using these features improves the detection F1-score by around 20%
-over using textual features alone. In addition, we show that those features are
-common across two quite different domains and, arguably, universal.
-"
-6189,1711.05789,"Yuan Yang, Jingcheng Yu, Ye Hu, Xiaoyao Xu and Eric Nyberg","CMU LiveMedQA at TREC 2017 LiveQA: A Consumer Health Question Answering
- System",cs.CL cs.IR," In this paper, we present LiveMedQA, a question answering system that is
-optimized for consumer health questions. On top of the general QA system
-pipeline, we introduce several new features that aim to exploit domain-specific
-knowledge and entity structures for better performance. These include a
-question type/focus analyzer based on a deep text classification model, a
-tree-based knowledge graph for answer generation and a complementary
-structure-aware searcher for answer retrieval. The LiveMedQA system is
-evaluated in the TREC 2017 LiveQA medical subtask, where it received an average
-score of 0.356 on a 3 point scale. Evaluation results revealed 3 substantial
-drawbacks in the current LiveMedQA system, based on which we provide a detailed
-discussion and propose a few solutions that constitute the main focus of our
-subsequent work.
-"
-6190,1711.05795,"Shikhar Murty, Patrick Verga, Luke Vilnis, Andrew McCallum",Finer Grained Entity Typing with TypeNet,cs.CL cs.NE," We consider the challenging problem of entity typing over an extremely fine
-grained set of types, wherein a single mention or entity can have many
-simultaneous and often hierarchically-structured types. Despite the importance
-of the problem, there is a relative lack of resources in the form of
-fine-grained, deep type hierarchies aligned to existing knowledge bases. In
-response, we introduce TypeNet, a dataset of entity types consisting of over
-1941 types organized in a hierarchy, obtained by manually annotating a mapping
-from 1081 Freebase types to WordNet. We also experiment with several models
-comparable to state-of-the-art systems and explore techniques to incorporate a
-structure loss on the hierarchy with the standard mention typing loss, as a
-first step towards future research on this dataset. 
-" -6191,1711.05851,"Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan - Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum","Go for a Walk and Arrive at the Answer: Reasoning Over Paths in - Knowledge Bases using Reinforcement Learning",cs.CL cs.AI," Knowledge bases (KB), both automatically and manually constructed, are often -incomplete --- many valid facts can be inferred from the KB by synthesizing -existing information. A popular approach to KB completion is to infer new -relations by combinatory reasoning over the information found along other paths -connecting a pair of entities. Given the enormous size of KBs and the -exponential number of paths, previous path-based models have considered only -the problem of predicting a missing relation given two entities or evaluating -the truth of a proposed triple. Additionally, these methods have traditionally -used random paths between fixed entity pairs or more recently learned to pick -paths between them. We propose a new algorithm MINERVA, which addresses the -much more difficult and practical task of answering questions where the -relation is known, but only one entity. Since random walks are impractical in a -setting with combinatorially many destinations from a start node, we present a -neural reinforcement learning approach which learns how to navigate the graph -conditioned on the input query to find predictive paths. Empirically, this -approach obtains state-of-the-art results on several datasets, significantly -outperforming prior methods. -" -6192,1711.05885,"Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, Luke - Zettlemoyer",Crowdsourcing Question-Answer Meaning Representations,cs.CL," We introduce Question-Answer Meaning Representations (QAMRs), which represent -the predicate-argument structure of a sentence as a set of question-answer -pairs. We also develop a crowdsourcing scheme to show that QAMRs can be labeled -with very little training, and gather a dataset with over 5,000 sentences and -100,000 questions. A detailed qualitative analysis demonstrates that the -crowd-generated question-answer pairs cover the vast majority of -predicate-argument relationships in existing datasets (including PropBank, -NomBank, QA-SRL, and AMR) along with many previously under-resourced ones, -including implicit arguments and relations. The QAMR data and annotation code -is made publicly available to enable future work on how best to model these -complex phenomena. -" -6193,1711.06004,Christophe Van Gysel,Remedies against the Vocabulary Gap in Information Retrieval,cs.IR cs.AI cs.CL," Search engines rely heavily on term-based approaches that represent queries -and documents as bags of words. Text---a document or a query---is represented -by a bag of its words that ignores grammar and word order, but retains word -frequency counts. When presented with a search query, the engine then ranks -documents according to their relevance scores by computing, among other things, -the matching degrees between query and document terms. While term-based -approaches are intuitive and effective in practice, they are based on the -hypothesis that documents that exactly contain the query terms are highly -relevant regardless of query semantics. Inversely, term-based approaches assume -documents that do not contain query terms as irrelevant. 
However, it is known
-that a high matching degree at the term level does not necessarily mean high
-relevance and, vice versa, documents that match none of the query terms may
-still be relevant. Consequently, there exists a vocabulary gap between queries
-and documents that occurs when both use different words to describe the same
-concepts. It is the alleviation of the effect brought forward by this
-vocabulary gap that is the topic of this dissertation. More specifically, we
-propose (1) methods to formulate an effective query from complex textual
-structures and (2) latent vector space models that circumvent the vocabulary
-gap in information retrieval.
-"
-6194,1711.06061,"Ruichu Cai, Boyan Xu, Xiaoyan Yang, Zhenjie Zhang, Zijian Li, Zhihao
- Liang","An Encoder-Decoder Framework Translating Natural Language to Database
- Queries",cs.CL," Machine translation is going through a radical revolution, driven by the
-explosive development of deep learning techniques using Convolutional Neural
-Network (CNN) and Recurrent Neural Network (RNN). In this paper, we consider a
-special case in machine translation problems, targeting to convert natural
-language into Structured Query Language (SQL) for data retrieval over
-relational databases. Although generic CNN and RNN learn the grammar structure
-of SQL when trained with sufficient samples, the accuracy and training
-efficiency of the model could be dramatically improved when the translation
-model is deeply integrated with the grammar rules of SQL. We present a new
-encoder-decoder framework, with a suite of new approaches, including new
-semantic features fed into the encoder, grammar-aware states injected into the
-memory of the decoder, as well as recursive state management for sub-queries.
-These techniques help the neural network better focus on understanding the
-semantics of operations in natural language and save effort on SQL grammar
-learning. The empirical evaluation on real-world databases and queries shows
-that our approach outperforms the state-of-the-art solution by a significant
-margin.
-"
-6195,1711.06095,"Evgeny Stepanov, Stephane Lathuiliere, Shammur Absar Chowdhury,
- Arindam Ghosh, Radu-Laurentiu Vieriu, Nicu Sebe, Giuseppe Riccardi",Depression Severity Estimation from Multiple Modalities,cs.CV cs.CL," Depression is a major debilitating disorder which can affect people of all
-ages. With a continuous increase in the number of annual cases of depression,
-there is a need to develop automatic techniques for the detection of the
-presence and extent of depression. In this AVEC challenge we explore different
-modalities (speech, language and visual features extracted from face) to design
-and develop automatic methods for the detection of depression. In psychology
-literature, the PHQ-8 questionnaire is well established as a tool for measuring
-the severity of depression. In this paper we aim to automatically predict the
-PHQ-8 scores from features extracted from the different modalities. We show
-that visual features extracted from facial landmarks obtain the best
-performance in terms of estimating the PHQ-8 results with a mean absolute error
-(MAE) of 4.66 on the development set. Behavioral characteristics from speech
-provide an MAE of 4.73. Language features yield a slightly higher MAE of 5.17. 
-When switching to the test set, our Turn Features derived from audio
-transcriptions achieve the best performance, scoring an MAE of 4.11
-(corresponding to an RMSE of 4.94), which makes our system the winner of the
-AVEC 2017 depression sub-challenge.
-"
-6196,1711.06141,"Lai Dac Viet, Vu Trong Sinh, Nguyen Le Minh, Ken Satoh",ConvAMR: Abstract meaning representation parsing for legal document,cs.CL," Convolutional neural networks (CNN) have recently achieved remarkable
-performance in a wide range of applications. In this research, we equip a
-convolutional sequence-to-sequence (seq2seq) model with an efficient graph
-linearization technique for abstract meaning representation parsing. Our
-linearization method is better than the prior method at signaling the turns of
-graph traversal. Additionally, the convolutional seq2seq model is more
-appropriate and considerably faster than the recurrent neural network models
-in this task. Our method outperforms previous methods by a large margin on the
-standard dataset LDC2014T12. Our results indicate that future work still has
-room to improve the parsing model using the graph linearization approach.
-"
-6197,1711.06196,"Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Andres Duque","Addressing Cross-Lingual Word Sense Disambiguation on Low-Density
- Languages: Application to Persian",cs.CL cs.IR," We explore the use of unsupervised methods in Cross-Lingual Word Sense
-Disambiguation (CL-WSD) with the application of English to Persian. Our
-proposed approach targets the languages with scarce resources (low-density) by
-exploiting word embedding and semantic similarity of the words in context. We
-evaluate the approach on a recent evaluation benchmark and compare it with the
-state-of-the-art unsupervised system (CO-Graph). The results show that our
-approach outperforms both the standard baseline and the CO-Graph system in both
-of the task evaluation metrics (Out-Of-Five and Best result).
-"
-6198,1711.06232,"Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, Bernard Ghanem",A Novel Framework for Robustness Analysis of Visual QA Models,cs.CV cs.CL," Deep neural networks have been playing an essential role in many computer
-vision tasks including Visual Question Answering (VQA). Until recently, the
-study of their accuracy was the main focus of research but now there is a trend
-toward assessing the robustness of these models against adversarial attacks by
-evaluating their tolerance to varying noise levels. In VQA, adversarial attacks
-can target the image and/or the proposed main question and yet there is a lack
-of proper analysis of the latter. In this work, we propose a flexible framework
-that focuses on the language part of VQA that uses semantically relevant
-questions, dubbed basic questions, acting as controllable noise to evaluate the
-robustness of VQA models. We hypothesize that the level of noise is positively
-correlated to the similarity of a basic question to the main question. Hence,
-to apply noise on any given main question, we rank a pool of basic questions
-based on their similarity by casting this ranking task as a LASSO optimization
-problem. Then, we propose a novel robustness measure, R_score, and two
-large-scale basic question datasets (BQDs) in order to standardize robustness
-analysis for VQA models.
-"
-6199,1711.06238,Rajarshee Mitra,A Generative Approach to Question Answering,cs.CL," Question Answering has come a long way from answer sentence selection,
-relational QA to reading and comprehension. We shift our attention to
-generative question answering (gQA), by which we facilitate a machine to read
-passages and answer questions by learning to generate the answers. We frame
-the problem as a generative task where the encoder is a network that models
-the relationship between the question and the passage and encodes them into a
-vector, thus facilitating the decoder to directly form an abstraction of the
-answer. Not being able to retain facts and making repetitions are common
-mistakes that affect the overall legibility of answers. To counter these
-issues, we employ a copying mechanism and maintain a coverage vector in our
-model, respectively. Our results on MS-MARCO demonstrate its superiority over
-baselines, and we also show qualitative examples where we improve in terms of
-correctness and readability.
-"
-6200,1711.06288,"Jianbo Chen, Yelong Shen, Jianfeng Gao, Jingjing Liu, Xiaodong Liu",Language-Based Image Editing with Recurrent Attentive Models,cs.CV cs.CL cs.LG," We investigate the problem of Language-Based Image Editing (LBIE). Given a
-source image and a natural language description, we want to generate a target
-image by editing the source image based on the description. We propose a
-generic modeling framework for two sub-tasks of LBIE: language-based image
-segmentation and image colorization. The framework uses recurrent attentive
-models to fuse image and language features. Instead of using a fixed step size,
-we introduce for each region of the image a termination gate to dynamically
-determine after each inference step whether to continue extrapolating
-additional information from the textual description. The effectiveness of the
-framework is validated on three datasets. First, we introduce a synthetic
-dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE
-system. Second, we show that the framework leads to state-of-the-art
-performance on image segmentation on the ReferIt dataset. Third, we present the
-first language-based colorization result on the Oxford-102 Flowers dataset.
-"
-6201,1711.06351,"Anselm Rothe, Brenden M. Lake, Todd M. Gureckis",Question Asking as Program Generation,cs.CL cs.AI cs.LG," A hallmark of human intelligence is the ability to ask rich, creative, and
-revealing questions. Here we introduce a cognitive model capable of
-constructing human-like questions. Our approach treats questions as formal
-programs that, when executed on the state of the world, output an answer. The
-model specifies a probability distribution over a complex, compositional space
-of programs, favoring concise programs that help the agent learn in the current
-context. We evaluate our approach by modeling the types of open-ended questions
-generated by humans who were attempting to learn about an ambiguous situation
-in a game. We find that our model predicts what questions people will ask, and
-can creatively produce novel questions that were not present in the training
-set. In addition, we compare a number of model variants, finding that both
-question informativeness and complexity are important for producing human-like
-questions.
-"
-6202,1711.06729,"Laura Gwilliams, David Poeppel, Alec Marantz and Tal Linzen",Phonological (un)certainty weights lexical activation,cs.CL," Spoken word recognition involves at least two basic computations. First is
-matching acoustic input to phonological categories (e.g. /b/, /p/, /d/). Second
-is activating words consistent with those phonological categories. 
Here we test
-the hypothesis that the listener's probability distribution over lexical items
-is weighted by the outcome of both computations: uncertainty about phonological
-discretisation and the frequency of the selected word(s). To test this, we
-record neural responses in auditory cortex using magnetoencephalography, and
-model this activity as a function of the size and relative activation of
-lexical candidates. Our findings indicate that towards the beginning of a word,
-the processing system indeed weights lexical candidates by both phonological
-certainty and lexical frequency; however, later into the word, activation is
-weighted by frequency alone.
-"
-6203,1711.06744,"Fan Yang, Jiazhong Nie, William W. Cohen, Ni Lao",Learning to Organize Knowledge and Answer Questions with N-Gram Machines,cs.CL cs.AI," Though deep neural networks have achieved great success in natural language
-processing, they are limited in more knowledge-intensive AI tasks, such as
-open-domain Question Answering (QA). Existing end-to-end deep QA models need to
-process the entire text after observing the question, and therefore their
-complexity in responding to a question is linear in the text size. This is
-prohibitive for practical tasks such as QA from Wikipedia, a novel, or the Web.
-We propose to solve this scalability issue by using symbolic meaning
-representations, which can be indexed and retrieved efficiently with complexity
-that is independent of the text size. We apply our approach, called the N-Gram
-Machine (NGM), to three representative tasks. First, as a proof of concept, we
-demonstrate that NGM successfully solves the bAbI tasks of synthetic text.
-Second, we show that NGM scales to large corpora by experimenting on ""life-long
-bAbI"", a special version of bAbI that contains millions of sentences. Lastly,
-on the WikiMovies dataset, we use NGM to induce latent structure (i.e. schema)
-and answer questions from natural language Wikipedia text, with only QA pairs
-as weak supervision.
-"
-6204,1711.06794,"Pan Lu, Hongsheng Li, Wei Zhang, Jianyong Wang, Xiaogang Wang","Co-attending Free-form Regions and Detections with Multi-modal
- Multiplicative Feature Embedding for Visual Question Answering",cs.CV cs.AI cs.CL," Recently, the Visual Question Answering (VQA) task has gained increasing
-attention in artificial intelligence. Existing VQA methods mainly adopt the
-visual attention mechanism to associate the input question with corresponding
-image regions for effective question answering. The free-form region based and
-the detection-based visual attention mechanisms are mostly investigated, with
-the former ones attending free-form image regions and the latter ones attending
-pre-specified detection-box regions. We argue that the two attention mechanisms
-are able to provide complementary information and should be effectively
-integrated to better solve the VQA problem. In this paper, we propose a novel
-deep neural network for VQA that integrates both attention mechanisms. Our
-proposed framework effectively fuses features from free-form image regions,
-detection boxes, and question representations via a multi-modal multiplicative
-feature embedding scheme to jointly attend question-related free-form image
-regions and detection boxes for more accurate question answering. The proposed
-method is extensively evaluated on two publicly available datasets, COCO-QA and
-VQA, and outperforms state-of-the-art approaches. Source code is available at
-https://github.com/lupantech/dual-mfa-vqa. 
-" -6205,1711.06821,"Guillem Collell, Luc Van Gool, Marie-Francine Moens","Acquiring Common Sense Spatial Knowledge through Implicit Spatial - Templates",cs.AI cs.CL cs.CV stat.ML," Spatial understanding is a fundamental problem with wide-reaching real-world -applications. The representation of spatial knowledge is often modeled with -spatial templates, i.e., regions of acceptability of two objects under an -explicit spatial relationship (e.g., ""on"", ""below"", etc.). In contrast with -prior work that restricts spatial templates to explicit spatial prepositions -(e.g., ""glass on table""), here we extend this concept to implicit spatial -language, i.e., those relationships (generally actions) for which the spatial -arrangement of the objects is only implicitly implied (e.g., ""man riding -horse""). In contrast with explicit relationships, predicting spatial -arrangements from implicit spatial language requires significant common sense -spatial understanding. Here, we introduce the task of predicting spatial -templates for two objects under a relationship, which can be seen as a spatial -question-answering task with a (2D) continuous output (""where is the man w.r.t. -a horse when the man is walking the horse?""). We present two simple -neural-based models that leverage annotated images and structured text to learn -this task. The good performance of these models reveals that spatial locations -are to a large extent predictable from implicit spatial language. Crucially, -the models attain similar performance in a challenging generalized setting, -where the object-relation-object combinations (e.g.,""man walking dog"") have -never been seen before. Next, we go one step further by presenting the models -with unseen objects (e.g., ""dog""). In this scenario, we show that leveraging -word embeddings enables the models to output accurate spatial predictions, -proving that the models acquire solid common sense spatial knowledge allowing -for such generalization. -" -6206,1711.06826,"Moontae Lee, David Mimno","Low-dimensional Embeddings for Interpretable Anchor-based Topic - Inference",cs.CL," The anchor words algorithm performs provably efficient topic model inference -by finding an approximate convex hull in a high-dimensional word co-occurrence -space. However, the existing greedy algorithm often selects poor anchor words, -reducing topic quality and interpretability. Rather than finding an approximate -convex hull in a high-dimensional space, we propose to find an exact convex -hull in a visualizable 2- or 3-dimensional space. Such low-dimensional -embeddings both improve topics and clearly show users why the algorithm selects -certain words. -" -6207,1711.06861,"Zhenxin Fu, Xiaoye Tan, Nanyun Peng, Dongyan Zhao and Rui Yan",Style Transfer in Text: Exploration and Evaluation,cs.CL," Style transfer is an important problem in natural language processing (NLP). -However, the progress in language style transfer is lagged behind other -domains, such as computer vision, mainly because of the lack of parallel data -and principle evaluation metrics. In this paper, we propose to learn style -transfer with non-parallel data. We explore two models to achieve this goal, -and the key idea behind the proposed models is to learn separate content -representations and style representations using adversarial networks. We also -propose novel evaluation metrics which measure two aspects of style transfer: -transfer strength and content preservation. 
We access our models and the -evaluation metrics on two tasks: paper-news title transfer, and -positive-negative review transfer. Results show that the proposed content -preservation metric is highly correlate to human judgments, and the proposed -models are able to generate sentences with higher style transfer strength and -similar content preservation score comparing to auto-encoder. -" -6208,1711.06872,"Sheshera Mysore, Edward Kim, Emma Strubell, Ao Liu, Haw-Shiuan Chang, - Srikrishna Kompella, Kevin Huang, Andrew McCallum, Elsa Olivetti","Automatically Extracting Action Graphs from Materials Science Synthesis - Procedures",cs.CL," Computational synthesis planning approaches have achieved recent success in -organic chemistry, where tabulated synthesis procedures are readily available -for supervised learning. The syntheses of inorganic materials, however, exist -primarily as natural language narratives contained within scientific journal -articles. This synthesis information must first be extracted from the text in -order to enable analogous synthesis planning methods for inorganic materials. -In this work, we present a system for automatically extracting structured -representations of synthesis procedures from the texts of materials science -journal articles that describe explicit, experimental syntheses of inorganic -compounds. We define the structured representation as a set of linked events -made up of extracted scientific entities and evaluate two unsupervised -approaches for extracting these structures on expert-annotated articles: a -strong heuristic baseline and a generative model of procedural text. We also -evaluate a variety of supervised models for extracting scientific entities. Our -results provide insight into the nature of the data and directions for further -work in this exciting new area of research. -" -6209,1711.06895,Hai Hu,"Is China Entering WTO or shijie maoyi zuzhi--a Corpus Study of English - Acronyms in Chinese Newspapers",cs.CL," This is one of the first studies that quantitatively examine the usage of -English acronyms (e.g. WTO) in Chinese texts. Using newspaper corpora, I try to -answer 1) for all instances of a concept that has an English acronym (e.g. -World Trade Organization), what percentage is expressed in the English acronym -(WTO), and what percentage in its Chinese translation (shijie maoyi zuzhi), and -2) what factors are at play in language users' choice between the English and -Chinese forms? Results show that different concepts have different percentage -for English acronyms (PercentOfEn), ranging from 2% to 98%. Linear models show -that PercentOfEn for individual concepts can be predicted by language economy -(how long the Chinese translation is), concept frequency, and whether the first -appearance of the concept in Chinese newspapers is the English acronym or its -Chinese translation (all p < .05). -" -6210,1711.06899,"Daniel N. Rockmore, Chen Fang, Nicholas J. Foti, Tom Ginsburg, David - C. Krakauer",The Cultural Evolution of National Constitutions,cs.SI cs.CL physics.soc-ph," We explore how ideas from infectious disease and genetics can be used to -uncover patterns of cultural inheritance and innovation in a corpus of 591 -national constitutions spanning 1789 - 2008. Legal ""Ideas"" are encoded as -""topics"" - words statistically linked in documents - derived from topic -modeling the corpus of constitutions. 
Using these topics we derive a diffusion
-network for borrowing from ancestral constitutions back to the US Constitution
-of 1789 and reveal that constitutions are complex cultural recombinants. We
-find systematic variation in patterns of borrowing from ancestral texts and
-""biological""-like behavior in patterns of inheritance with the distribution of
-""offspring"" arising through a bounded preferential-attachment process. This
-process leads to a small number of highly innovative (influential)
-constitutions, some of which have yet to be identified as such in the current
-literature. Our findings thus shed new light on the critical nodes of the
-constitution-making network. The constitutional network structure reflects
-periods of intense constitution creation, and systematic patterns of variation
-in constitutional life-span and temporal influence.
-"
-6211,1711.06968,"Imon Banerjee, Sriraman Madhavan, Roger Eric Goldman, Daniel L. Rubin",Intelligent Word Embeddings of Free-Text Radiology Reports,cs.IR cs.CL," Radiology reports are a rich resource for advancing deep learning
-applications in medicine by leveraging the large volume of data continuously
-being updated, integrated, and shared. However, there are significant
-challenges as well, largely due to the ambiguity and subtlety of natural
-language. We propose a hybrid strategy that combines semantic-dictionary
-mapping and word2vec modeling for creating dense vector embeddings of free-text
-radiology reports. Our method leverages the benefits of both
-semantic-dictionary mapping as well as unsupervised learning. Using the vector
-representation, we automatically classify the radiology reports into three
-classes denoting confidence in the diagnosis of intracranial hemorrhage by the
-interpreting radiologist. We performed experiments with varying hyperparameter
-settings of the word embeddings and a range of different classifiers. Best
-performance achieved was a weighted precision of 88% and weighted recall of
-90%. Our work offers the potential to leverage unstructured electronic health
-record data by allowing direct analysis of narrative clinical notes.
-"
-6212,1711.07010,"Jingjing Xu, Ji Wen, Xu Sun, Qi Su","A Discourse-Level Named Entity Recognition and Relation Extraction
- Dataset for Chinese Literature Text",cs.CL," Named Entity Recognition and Relation Extraction for Chinese literature text
-is regarded as a highly difficult problem, partially because of the lack of
-tagging sets. In this paper, we build a discourse-level dataset from hundreds
-of Chinese literature articles for improving this task. To build a high quality
-dataset, we propose two tagging methods to solve the problem of data
-inconsistency, including a heuristic tagging method and a machine auxiliary
-tagging method. Based on this corpus, we also introduce several widely used
-models to conduct experiments. Experimental results not only show the
-usefulness of the proposed dataset, but also provide baselines for further
-research. The dataset is available at
-https://github.com/lancopku/Chinese-Literature-NER-RE-Dataset
-"
-6213,1711.07019,"Poorya Zaremoodi, Gholamreza Haffari","Incorporating Syntactic Uncertainty in Neural Machine Translation with
- Forest-to-Sequence Model",cs.CL," Incorporating syntactic information in Neural Machine Translation models is a
-method to compensate for their requirement for a large amount of parallel
-training text, especially for low-resource language pairs. Previous work on
-using syntactic information provided by (inevitably error-prone) parsers has
-been promising. In this paper, we propose a forest-to-sequence Attentional
-Neural Machine Translation model to make use of exponentially many parse trees
-of the source sentence to compensate for the parser errors. Our method
-represents the collection of parse trees as a packed forest, and learns a
-neural attentional transduction model from the forest to the target sentence.
-Experiments on English to German, Chinese and Persian translation show the
-superiority of our method over the tree-to-sequence and vanilla
-sequence-to-sequence neural translation models.
-"
-6214,1711.07065,"Moontae Lee, David Bindel, David Mimno","Prior-aware Dual Decomposition: Document-specific Topic Inference for
- Spectral Topic Models",cs.CL cs.IR cs.LG," Spectral topic modeling algorithms operate on matrices/tensors of word
-co-occurrence statistics to learn topic-specific word distributions. This
-approach removes the dependence on the original documents and produces
-substantial gains in efficiency and provable topic inference, but at a cost:
-the model can no longer provide information about the topic composition of
-individual documents. Recently, Thresholded Linear Inverse (TLI) was proposed
-to map the observed words of each document back to its topic composition.
-However, its linear characteristics limit the inference quality without
-considering the important prior information over topics. In this paper, we
-evaluate the Simple Probabilistic Inverse (SPI) method and a novel Prior-aware
-Dual Decomposition (PADD) that is capable of learning document-specific topic
-compositions in parallel. Experiments show that PADD successfully leverages
-topic correlations as a prior, notably outperforming TLI and learning quality
-topic compositions comparable to Gibbs sampling on various data.
-"
-6215,1711.07128,"Yundong Zhang, Naveen Suda, Liangzhen Lai and Vikas Chandra",Hello Edge: Keyword Spotting on Microcontrollers,cs.SD cs.CL cs.LG cs.NE eess.AS," Keyword spotting (KWS) is a critical component for enabling speech based user
-interactions on smart devices. It requires real-time response and high accuracy
-for good user experience. Recently, neural networks have become an attractive
-choice for KWS architecture because of their superior accuracy compared to
-traditional speech processing algorithms. Due to its always-on nature, KWS
-application has highly constrained power budget and typically runs on tiny
-microcontrollers with limited memory and compute capability. The design of
-neural network architecture for KWS must consider these constraints. In this
-work, we perform neural network architecture evaluation and exploration for
-running KWS on resource-constrained microcontrollers. We train various neural
-network architectures for keyword spotting published in literature to compare
-their accuracy and memory/compute requirements. We show that it is possible to
-optimize these neural network architectures to fit within the memory and
-compute constraints of microcontrollers without sacrificing accuracy. We
-further explore the depthwise separable convolutional neural network (DS-CNN)
-and compare it against other neural network architectures. DS-CNN achieves an
-accuracy of 95.4%, which is ~10% higher than the DNN model with similar number
-of parameters. 
-" -6216,1711.07265,"Hao Wang, Yves Lepage",Fast BTG-Forest-Based Hierarchical Sub-sentential Alignment,cs.CL," In this paper, we propose a novel BTG-forest-based alignment method. Based on -a fast unsupervised initialization of parameters using variational IBM models, -we synchronously parse parallel sentences top-down and align hierarchically -under the constraint of BTG. Our two-step method can achieve the same run-time -and comparable translation performance as fast_align while it yields smaller -phrase tables. Final SMT results show that our method even outperforms in the -experiment of distantly related languages, e.g., English-Japanese. -" -6217,1711.07274,"Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep - Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth - Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu, Xuedong Zhang",Speech recognition for medical conversations,cs.CL cs.SD eess.AS stat.ML," In this work we explored building automatic speech recognition models for -transcribing doctor patient conversation. We collected a large scale dataset of -clinical conversations ($14,000$ hr), designed the task to represent the real -word scenario, and explored several alignment approaches to iteratively improve -data quality. We explored both CTC and LAS systems for building speech -recognition models. The LAS was more resilient to noisy data and CTC required -more data clean up. A detailed analysis is provided for understanding the -performance for clinical tasks. Our analysis showed the speech recognition -models performed well on important medical utterances, while errors occurred in -causal conversations. Overall we believe the resulting models can provide -reasonable quality in practice. -" -6218,1711.07280,"Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko - S\""underhauf, Ian Reid, Stephen Gould, Anton van den Hengel","Vision-and-Language Navigation: Interpreting visually-grounded - navigation instructions in real environments",cs.CV cs.AI cs.CL cs.RO," A robot that can carry out a natural-language instruction has been a dream -since before the Jetsons cartoon series imagined a life of leisure mediated by -a fleet of attentive robot helpers. It is a dream that remains stubbornly -distant. However, recent advances in vision and language methods have made -incredible progress in closely related areas. This is significant because a -robot interpreting a natural-language navigation instruction on the basis of -what it sees is carrying out a vision and language process that is similar to -Visual Question Answering. Both tasks can be interpreted as visually grounded -sequence-to-sequence translation problems, and many of the same methods are -applicable. To enable and encourage the application of vision and language -methods to the problem of interpreting visually-grounded navigation -instructions, we present the Matterport3D Simulator -- a large-scale -reinforcement learning environment based on real imagery. Using this simulator, -which can in future support a range of embodied vision and language tasks, we -provide the first benchmark dataset for visually-grounded natural language -navigation in real buildings -- the Room-to-Room (R2R) dataset. 
-" -6219,1711.07341,"Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, Weizhu Chen","FusionNet: Fusing via Fully-Aware Attention with Application to Machine - Comprehension",cs.CL cs.AI," This paper introduces a new neural structure called FusionNet, which extends -existing attention approaches from three perspectives. First, it puts forward a -novel concept of ""history of word"" to characterize attention information from -the lowest word-level embedding up to the highest semantic-level -representation. Second, it introduces an improved attention scoring function -that better utilizes the ""history of word"" concept. Third, it proposes a -fully-aware multi-level attention mechanism to capture the complete information -in one text (such as a question) and exploit it in its counterpart (such as -context or passage) layer by layer. We apply FusionNet to the Stanford Question -Answering Dataset (SQuAD) and it achieves the first position for both single -and ensemble model on the official SQuAD leaderboard at the time of writing -(Oct. 4th, 2017). Meanwhile, we verify the generalization of FusionNet with two -adversarial SQuAD datasets and it sets up the new state-of-the-art on both -datasets: on AddSent, FusionNet increases the best F1 metric from 46.6% to -51.4%; on AddOneSent, FusionNet boosts the best F1 metric from 56.0% to 60.7%. -" -6220,1711.07404,N. Dianna Radpour and Vinay Ashokkumar,Non-Contextual Modeling of Sarcasm using a Neural Network Benchmark,cs.CL," One of the most crucial components of natural human-robot interaction is -artificial intuition and its influence on dialog systems. The intuitive -capability that humans have is undeniably extraordinary, and so remains one of -the greatest challenges for natural communicative dialogue between humans and -robots. In this paper, we introduce a novel probabilistic modeling framework of -identifying, classifying and learning features of sarcastic text via training a -neural network with human-informed sarcastic benchmarks. This is necessary for -establishing a comprehensive sentiment analysis schema that is sensitive to the -nuances of sarcasm-ridden text by being trained on linguistic cues. We show -that our model provides a good fit for this type of real-world informed data, -with potential to achieve as accurate, if not more, than alternatives. Though -the implementation and benchmarking is an extensive task, it can be extended -via the same method that we present to capture different forms of nuances in -communication and making for much more natural and engaging dialogue systems. -" -6221,1711.07611,"Noah Weber, Niranjan Balasubramanian, Nathanael Chambers",Event Representations with Tensor-based Compositions,cs.CL," Robust and flexible event representations are important to many core areas in -language understanding. Scripts were proposed early on as a way of representing -sequences of events for such understanding, and has recently attracted renewed -attention. However, obtaining effective representations for modeling -script-like event sequences is challenging. It requires representations that -can capture event-level and scenario-level semantics. We propose a new -tensor-based composition method for creating event representations. The method -captures more subtle semantic interactions between an event and its entities -and yields representations that are effective at multiple event-related tasks. 
-With the continuous representations, we also devise a simple schema generation -method which produces better schemas compared to a prior discrete -representation-based method. Our analysis shows that the tensors capture -distinct usages of a predicate even when there are only subtle differences in -their surface realizations. -" -6222,1711.07613,"Qi Wu, Peng Wang, Chunhua Shen, Ian Reid, and Anton van den Hengel","Are You Talking to Me? Reasoned Visual Dialog Generation through - Adversarial Learning",cs.CV cs.AI cs.CL," The Visual Dialogue task requires an agent to engage in a conversation about -an image with a human. It represents an extension of the Visual Question -Answering task in that the agent needs to answer a question about an image, but -it needs to do so in light of the previous dialogue that has taken place. The -key challenge in Visual Dialogue is thus maintaining a consistent and natural -dialogue while continuing to answer questions correctly. We present a novel -approach that combines Reinforcement Learning and Generative Adversarial -Networks (GANs) to generate more human-like responses to questions. The GAN -helps overcome the relative paucity of training data, and the tendency of the -typical MLE-based approach to generate overly terse answers. Critically, the -GAN is tightly integrated into the attention mechanism that generates -human-interpretable reasons for each answer. This means that the discriminative -model of the GAN has the task of assessing whether a candidate answer is -generated by a human or not, given the provided reason. This is significant -because it drives the generative model to produce high-quality answers that are -well supported by the associated reasoning. The method also achieves -state-of-the-art results on the primary benchmark. -" -6223,1711.07614,"Junjie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, and Anton - van den Hengel","Asking the Difficult Questions: Goal-Oriented Visual Question Generation - via Intermediate Rewards",cs.CV cs.AI cs.CL," Despite significant progress in a variety of vision-and-language problems, -developing a method capable of asking intelligent, goal-oriented questions -about images has proven to be an inscrutable challenge. Towards this end, we -propose a Deep Reinforcement Learning framework based on three new intermediate -rewards, namely goal-achieved, progressive and informativeness, that encourage -the generation of succinct questions, which in turn uncover valuable -information towards the overall goal. By directly optimizing for questions that -work quickly towards fulfilling the overall goal, we avoid the tendency of -existing methods to generate long series of inane queries that add little -value. We evaluate our model on the GuessWhat?! dataset and show that the -resulting questions can help a standard Guesser identify a specific object in -an image at a much higher success rate. -" -6224,1711.07632,"Xiaopeng Yang, Xiaowen Lin, Shunda Suo, and Ming Li","Generating Thematic Chinese Poetry using Conditional Variational - Autoencoders with Hybrid Decoders",cs.CL cs.AI cs.LG," Computer poetry generation is our first step towards computer writing. -Writing must have a theme. Current approaches using sequence-to-sequence -models with attention often produce non-thematic poems.
We present a novel -conditional variational autoencoder with a hybrid decoder that adds -deconvolutional neural networks to the general recurrent neural networks to -fully learn topic information via latent variables. This approach significantly -improves the relevance of the generated poems by representing each line of the -poem not only in a context-sensitive manner but also in a holistic way that is -highly related to the given keyword and the learned topic. A proposed augmented -word2vec model further improves the rhythm and symmetry. Tests show that the -poems generated by our approach mostly satisfy the regulated rules and have -consistent themes, and 73.42% of them receive an Overall score of no less than 3 -(the highest score is 5). -" -6225,1711.07646,"Yutong Shao, Rico Sennrich, Bonnie Webber, Federico Fancellu","Evaluating Machine Translation Performance on Chinese Idioms with a - Blacklist Method",cs.CL," Idiom translation is a challenging problem in machine translation because the -meaning of idioms is non-compositional, and a literal (word-by-word) -translation is likely to be wrong. In this paper, we focus on evaluating the -quality of idiom translation of MT systems. We introduce a new evaluation -method based on an idiom-specific blacklist of literal translations, built on -the insight that the occurrence of any blacklisted words in the translation -output indicates a likely translation error. We introduce a dataset, CIBB -(Chinese Idioms Blacklists Bank), and perform an evaluation of a -state-of-the-art Chinese-English neural MT system. Our evaluation confirms that -a sizable number of idioms in our test set are mistranslated (46.1%), that -literal translation error is a common error type, and that our blacklist method -is effective at identifying literal translation errors. -" -6226,1711.07656,"Yi Tay, Luu Anh Tuan, Siu Cheung Hui",Cross Temporal Recurrent Networks for Ranking Question Answer Pairs,cs.CL cs.AI cs.IR," Temporal gates play a significant role in modern recurrent-based neural -encoders, enabling fine-grained control over recursive compositional operations -over time. In recurrent models such as the long short-term memory (LSTM), -temporal gates control the amount of information retained or discarded over -time, not only playing an important role in influencing the learned -representations but also serving as a protection against vanishing gradients. -This paper explores the idea of learning temporal gates for sequence pairs -(question and answer), jointly influencing the learned representations in a -pairwise manner. In our approach, temporal gates are learned via 1D -convolutional layers and subsequently cross-applied across question and -answer for joint learning. Empirically, we show that this conceptually simple -sharing of temporal gates can lead to competitive performance across multiple -benchmarks. Intuitively, what our network achieves can be interpreted as -learning representations of question and answer pairs that are aware of what -each other is remembering or forgetting, i.e., pairwise temporal gating. Via -extensive experiments, we show that our proposed model achieves -state-of-the-art performance on two community-based QA datasets and competitive -performance on one factoid-based QA dataset.
-" -6227,1711.07798,"Xingyue Chen, Yunhong Wang, Qingjie Liu","Visual and Textual Sentiment Analysis Using Deep Fusion Convolutional - Neural Networks",cs.CL cs.CV cs.IR," Sentiment analysis is attracting more and more attentions and has become a -very hot research topic due to its potential applications in personalized -recommendation, opinion mining, etc. Most of the existing methods are based on -either textual or visual data and can not achieve satisfactory results, as it -is very hard to extract sufficient information from only one single modality -data. Inspired by the observation that there exists strong semantic correlation -between visual and textual data in social medias, we propose an end-to-end deep -fusion convolutional neural network to jointly learn textual and visual -sentiment representations from training examples. The two modality information -are fused together in a pooling layer and fed into fully-connected layers to -predict the sentiment polarity. We evaluate the proposed approach on two widely -used data sets. Results show that our method achieves promising result compared -with the state-of-the-art methods which clearly demonstrate its competency. -" -6228,1711.07893,Thanh-Le Ha and Jan Niehues and Alexander Waibel,Effective Strategies in Zero-Shot Neural Machine Translation,cs.CL," In this paper, we proposed two strategies which can be applied to a -multilingual neural machine translation system in order to better tackle -zero-shot scenarios despite not having any parallel corpus. The experiments -show that they are effective in terms of both performance and computing -resources, especially in multilingual translation of unbalanced data in real -zero-resourced condition when they alleviate the language bias problem. -" -6229,1711.07908,"Devendra Singh Sachan, Pengtao Xie, Mrinmaya Sachan, and Eric P Xing","Effective Use of Bidirectional Language Modeling for Transfer Learning - in Biomedical Named Entity Recognition",cs.CL," Biomedical named entity recognition (NER) is a fundamental task in text -mining of medical documents and has many applications. Deep learning based -approaches to this task have been gaining increasing attention in recent years -as their parameters can be learned end-to-end without the need for -hand-engineered features. However, these approaches rely on high-quality -labeled data, which is expensive to obtain. To address this issue, we -investigate how to use unlabeled text data to improve the performance of NER -models. Specifically, we train a bidirectional language model (BiLM) on -unlabeled data and transfer its weights to ""pretrain"" an NER model with the -same architecture as the BiLM, which results in a better parameter -initialization of the NER model. We evaluate our approach on four benchmark -datasets for biomedical NER and show that it leads to a substantial improvement -in the F1 scores compared with the state-of-the-art approaches. We also show -that BiLM weight transfer leads to a faster model training and the pretrained -model requires fewer training examples to achieve a particular F1 score. -" -6230,1711.07915,"Philipe F. Melo, Daniel H. Dalip, Manoel M. Junior, Marcos A. - Gon\c{c}alves, Fabr\'icio Benevenuto","10Sent: A Stable Sentiment Analysis Method Based on the Combination of - Off-The-Shelf Approaches",cs.CL," Sentiment analysis has become a very important tool for analysis of social -media data. 
Several methods have been developed for this research field, many -of them working very differently from each other and covering distinct aspects of -the problem with disparate strategies. Despite the large number of existing -techniques, there is no single one which fits well in all cases or for all data -sources. Supervised approaches may be able to adapt to specific situations but -they require manually labeled training data, which is very cumbersome and expensive -to acquire, especially for a new application. In this context, we propose -to combine several very popular and effective state-of-the-practice sentiment -analysis methods, by means of an unsupervised bootstrapped strategy for -polarity classification. One of our main goals is to reduce the large -variability (lack of stability) of the unsupervised methods across different -domains (datasets). Our solution was thoroughly tested considering thirteen -different datasets in several domains such as opinions, comments, and social -media. The experimental results demonstrate that our combined method (aka -10SENT) improves the effectiveness of the classification task, but more -importantly, it solves a key problem in the field. It is consistently among the -best methods on many data types, meaning that it can produce the best (or close -to best) results in almost all considered contexts, without any additional -costs (e.g., manual labeling). Our self-learning approach is also very -independent of the base methods, which means that it is highly extensible to -incorporate any new additional method that can be envisioned in the future. -Finally, we also investigate a transfer learning approach for sentiment -analysis as a means to gather additional (unsupervised) information for the -proposed approach and we show the potential of this technique to improve our -results. -" -6231,1711.07950,"Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. - Miller, Arthur Szlam, Douwe Kiela, Jason Weston","Mastering the Dungeon: Grounded Language Learning by Mechanical Turker - Descent",cs.CL," Contrary to most natural language processing research, which makes use of -static datasets, humans learn language interactively, grounded in an -environment. In this work we propose an interactive learning procedure called -Mechanical Turker Descent (MTD) and use it to train agents to execute natural -language commands grounded in a fantasy text adventure game. In MTD, Turkers -compete to train better agents in the short term, and collaborate by sharing -their agents' skills in the long term. This results in a gamified, engaging -experience for the Turkers and a better-quality teaching signal for the agents -compared to static datasets, as the Turkers naturally adapt the training data -to the agent's abilities. -" -6232,1711.08010,"Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong","Unsupervised Adaptation with Domain Separation Networks for Robust - Speech Recognition",cs.CL cs.AI cs.SD eess.AS," Unsupervised domain adaptation of speech signals aims at adapting a -well-trained source-domain acoustic model to unlabeled data from a target -domain. This can be achieved by adversarial training of deep neural network -(DNN) acoustic models to learn an intermediate deep representation that is both -senone-discriminative and domain-invariant. Specifically, the DNN is trained to -jointly optimize the primary task of senone classification and the secondary -task of domain classification with adversarial objective functions.
In this -work, instead of only focusing on learning a domain-invariant feature (i.e. the -shared component between domains), we also characterize the difference between -the source and target domain distributions by explicitly modeling the private -component of each domain through a private component extractor DNN. The private -component is trained to be orthogonal to the shared component and thus -implicitly increases the degree of domain-invariance of the shared component. A -reconstructor DNN is used to reconstruct the original speech feature from the -private and shared components as a regularization. This domain separation -framework is applied to the unsupervised environment adaptation task and -achieves an 11.08% relative WER reduction over gradient reversal layer -training, a representative adversarial training method, for automatic speech -recognition on the CHiME-3 dataset. -" -6233,1711.08016,"Zhong Meng, Shinji Watanabe, John R. Hershey, Hakan Erdogan","Deep Long Short-Term Memory Adaptive Beamforming Networks For - Multichannel Robust Speech Recognition",eess.AS cs.CL cs.SD," Far-field speech recognition in noisy and reverberant conditions remains a -challenging problem despite recent deep learning breakthroughs. This problem is -commonly addressed by acquiring a speech signal from multiple microphones and -performing beamforming over them. In this paper, we propose to use a recurrent -neural network with long short-term memory (LSTM) architecture to adaptively -estimate real-time beamforming filter coefficients to cope with non-stationary -environmental noise and the dynamic nature of source and microphone positions, -which results in a set of time-varying room impulse responses. The LSTM adaptive -beamformer is jointly trained with a deep LSTM acoustic model to predict senone -labels. Further, we use hidden units in the deep LSTM acoustic model to assist -in predicting the beamforming filter coefficients. The proposed system achieves a -7.97% absolute gain over baseline systems with no beamforming on the CHiME-3 real -evaluation set. -" -6234,1711.08058,"Ahmad AbdulKader, Kareem Nassar, Mohamed Mahmoud, Daniel Galvez, - Chetan Patil","Multiple-Instance, Cascaded Classification for Keyword Spotting in - Narrow-Band Audio",cs.LG cs.CL cs.SD eess.AS," We propose using cascaded classifiers for a keyword spotting (KWS) task on -narrow-band (NB), 8kHz audio acquired in non-IID environments --- a more -challenging task than most state-of-the-art KWS systems face. We present a -model that incorporates Deep Neural Networks (DNNs), cascading, -multiple-feature representations, and multiple-instance learning. The cascaded -classifiers handle the task's class imbalance and reduce power consumption on -computationally-constrained devices via early termination. The KWS system -achieves a false negative rate of 6% at an hourly false positive rate of 0.75 -" -6235,1711.08083,Radoslaw Kowalski and Marc Esteve and Slava J. Mikhaylov,"Application of Natural Language Processing to Determine User - Satisfaction in Public Services",cs.CL," Research on customer satisfaction has increased substantially in recent -years. However, the relative importance of and relationships between different -determinants of satisfaction remain uncertain. Moreover, quantitative studies -to date tend to test for the significance of pre-determined factors thought to have -an influence, with no scalable means of identifying other causes of user -satisfaction.
These gaps in knowledge make it difficult to use available -knowledge on user preferences for public service improvement. Meanwhile, digital -technology development has enabled new methods to collect user feedback, for -example through online forums where users can comment freely on their -experience. New tools are needed to analyze large volumes of such feedback. The use -of topic models is proposed as a feasible, easily deployable solution for aggregating open-ended user -opinions in the public sector. Generated insights -can contribute to a more inclusive decision-making process in public service -provision. This novel methodological approach is applied to a case of service -reviews of publicly-funded primary care practices in England. Findings from the -analysis of 145,000 reviews covering almost 7,700 primary care centers indicate -that the quality of interactions with staff and bureaucratic exigencies are the -key issues driving user satisfaction across England. -" -6236,1711.08195,"Baoyu Jing, Pengtao Xie, Eric Xing",On the Automatic Generation of Medical Imaging Reports,cs.CL cs.CV," Medical imaging is widely used in clinical practice for diagnosis and -treatment. Report-writing can be error-prone for inexperienced physicians, and -time-consuming and tedious for experienced physicians. To address these -issues, we study the automatic generation of medical imaging reports. This task -presents several challenges. First, a complete report contains multiple -heterogeneous forms of information, including findings and tags. Second, -abnormal regions in medical images are difficult to identify. Third, the -reports are typically long, containing multiple sentences. To cope with these -challenges, we (1) build a multi-task learning framework which jointly performs -the prediction of tags and the generation of paragraphs, (2) propose a -co-attention mechanism to localize regions containing abnormalities and -generate narrations for them, (3) develop a hierarchical LSTM model to generate -long paragraphs. We demonstrate the effectiveness of the proposed methods on -two publicly available datasets. -" -6237,1711.08231,"Yi Zhang, Xu Sun, Shuming Ma, Yang Yang, Xuancheng Ren","Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling - Sequence Data?",cs.CL," Existing neural models usually predict the tag of the current token -independent of the neighboring tags. The popular LSTM-CRF model considers the -tag dependencies between every two consecutive tags. However, it is hard for -existing neural models to take longer-distance dependencies of tags into -consideration. The scalability is mainly limited by the complex model -structures and the cost of dynamic programming during training. In our work, we -first design a new model called ""high order LSTM"" to predict multiple tags for -the current token, which contain not only the current tag but also the previous -several tags. We call the number of tags in one prediction the ""order"". Then we -propose a new method called Multi-Order BiLSTM (MO-BiLSTM) which combines low -order and high order LSTMs together. MO-BiLSTM remains scalable to high -order models thanks to a pruning technique. We evaluate MO-BiLSTM on all-phrase -chunking and NER datasets. Experimental results show that MO-BiLSTM achieves the -state-of-the-art result in chunking and highly competitive results on two NER -datasets.
-" -6238,1711.08412,"Nikhil Garg, Londa Schiebinger, Dan Jurafsky, James Zou",Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes,cs.CL cs.CY," Word embeddings use vectors to represent words such that the geometry between -vectors captures semantic relationship between the words. In this paper, we -develop a framework to demonstrate how the temporal dynamics of the embedding -can be leveraged to quantify changes in stereotypes and attitudes toward women -and ethnic minorities in the 20th and 21st centuries in the United States. We -integrate word embeddings trained on 100 years of text data with the U.S. -Census to show that changes in the embedding track closely with demographic and -occupation shifts over time. The embedding captures global social shifts -- -e.g., the women's movement in the 1960s and Asian immigration into the U.S -- -and also illuminates how specific adjectives and occupations became more -closely associated with certain populations over time. Our framework for -temporal analysis of word embedding opens up a powerful new intersection -between machine learning and quantitative social science. -" -6239,1711.08493,"Bing Liu, Tong Yu, Ian Lane, Ole J. Mengshoel","Customized Nonlinear Bandits for Online Response Selection in Neural - Conversation Models",cs.CL," Dialog response selection is an important step towards natural response -generation in conversational agents. Existing work on neural conversational -models mainly focuses on offline supervised learning using a large set of -context-response pairs. In this paper, we focus on online learning of response -selection in retrieval-based dialog systems. We propose a contextual -multi-armed bandit model with a nonlinear reward function that uses distributed -representation of text for online response selection. A bidirectional LSTM is -used to produce the distributed representations of dialog context and -responses, which serve as the input to a contextual bandit. In learning the -bandit, we propose a customized Thompson sampling method that is applied to a -polynomial feature space in approximating the reward. Experimental results on -the Ubuntu Dialogue Corpus demonstrate significant performance gains of the -proposed method over conventional linear contextual bandits. Moreover, we -report encouraging response selection performance of the proposed neural bandit -model using the Recall@k metric for a small set of online training samples. -" -6240,1711.08521,"Wadi' Hijawi, Hossam Faris, Ja'far Alqatawna, Ibrahim Aljarah, Ala' M. - Al-Zoubi, and Maria Habib",EMFET: E-mail Features Extraction Tool,cs.IR cs.CL," EMFET is an open source and flexible tool that can be used to extract a large -number of features from any email corpus with emails saved in EML format. The -extracted features can be categorized into three main groups: header features, -payload (body) features, and attachment features. The purpose of the tool is to -help practitioners and researchers to build datasets that can be used for -training machine learning models for spam detection. So far, 140 features can -be extracted using EMFET. EMFET is extensible and easy to use. 
The source code -of EMFET is publicly available on GitHub -(https://github.com/WadeaHijjawi/EmailFeaturesExtraction) -" -6241,1711.08609,"Seyed Mahdi Rezaeinia, Ali Ghodsi, Rouhollah Rahmani","Improving the Accuracy of Pre-trained Word Embeddings for Sentiment - Analysis",cs.CL cs.IR," Sentiment analysis is one of the well-known tasks and fast-growing research -areas in natural language processing (NLP) and text classification. This -technique has become an essential part of a wide range of applications -including politics, business, advertising and marketing. There are various -techniques for sentiment analysis, but recently word embedding methods have -been widely used in sentiment classification tasks. Word2Vec and GloVe are -currently among the most accurate and usable word embedding methods which can -convert words into meaningful vectors. However, these methods ignore sentiment -information of texts and need a huge corpus of texts for training and -generating exact vectors which are used as inputs of deep learning models. As a -result, because of the small size of some corpora, researchers often have to -use pre-trained word embeddings which were trained on another large text corpus -such as Google News with about 100 billion words. The increasing accuracy of -pre-trained word embeddings has a great impact on sentiment analysis research. -In this paper we propose a novel method, Improved Word Vectors (IWV), which -increases the accuracy of pre-trained word embeddings in sentiment analysis. -Our method is based on Part-of-Speech (POS) tagging techniques, lexicon-based -approaches and Word2Vec/GloVe methods. We tested the accuracy of our method via -different deep learning models and sentiment datasets. Our experimental results -show that Improved Word Vectors (IWV) are very effective for sentiment -analysis. -" -6242,1711.08621,"Carolin Lawrence, Pratik Gajane, Stefan Riezler","Counterfactual Learning for Machine Translation: Degeneracies and - Solutions",stat.ML cs.CL cs.LG," Counterfactual learning is a natural scenario to improve web-based machine -translation services by offline learning from feedback logged during user -interactions. In order to avoid the risk of showing inferior translations to -users, in such scenarios mostly exploration-free deterministic logging policies -are in place. We analyze possible degeneracies of inverse and reweighted -propensity scoring estimators, in stochastic and deterministic settings, and -relate them to recently proposed techniques for counterfactual learning under -deterministic logging. -" -6243,1711.08726,"Jianfei Yu and Minghui Qiu and Jing Jiang and Jun Huang and Shuangyong - Song and Wei Chu and Haiqing Chen","Modelling Domain Relationships for Transfer Learning on Retrieval-based - Question Answering Systems in E-commerce",cs.CL," In this paper, we study transfer learning for the paraphrase identification (PI) -and natural language inference (NLI) problems, aiming -to propose a general framework, which can effectively and efficiently adapt the -shared knowledge learned from a resource-rich source domain to a resource-poor -target domain. Specifically, since most existing transfer learning methods only -focus on learning a shared feature space across domains while ignoring the -relationship between the source and target domains, we propose to -simultaneously learn shared representations and domain relationships in a -unified framework.
Furthermore, we propose an efficient and effective hybrid -model by combining a sentence encoding-based method and a sentence -interaction-based method as our base model. Extensive experiments on both -paraphrase identification and natural language inference demonstrate that our -base model is efficient and has promising performance compared to the competing -models, and our transfer learning method can help to significantly boost the -performance. Further analysis shows that the inter-domain and intra-domain -relationships captured by our model are insightful. Last but not least, we -deploy our transfer learning model for PI into our online chatbot system, which -brings significant improvements over our existing system. Finally, we -launch our new system on the chatbot platform Eva in our E-commerce site -AliExpress. -" -6244,1711.08792,"Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor - Berg-Kirkpatrick, Eduard Hovy",SPINE: SParse Interpretable Neural Embeddings,cs.CL," Prediction without justification has limited utility. Much of the success of -neural models can be attributed to their ability to learn rich, dense and -expressive representations. While these representations capture the underlying -complexity and latent trends in the data, they are far from being -interpretable. We propose a novel variant of denoising k-sparse autoencoders -that generates highly efficient and interpretable distributed word -representations (word embeddings), beginning with existing word representations -from state-of-the-art methods like GloVe and word2vec. Through large-scale -human evaluation, we report that our resulting word embeddings are much more -interpretable than the original GloVe and word2vec embeddings. Moreover, our -embeddings outperform existing popular word embeddings on a diverse suite of -benchmark downstream tasks. -" -6245,1711.08870,"Namkyu Jung, Hyeong In Choi",Continuous Semantic Topic Embedding Model Using Variational Autoencoder,stat.ML cs.CL cs.LG," This paper proposes the continuous semantic topic embedding model (CSTEM), -which finds latent topic variables in documents using a continuous semantic -distance function between the topics and the words by means of a variational -autoencoder (VAE). The semantic distance could be represented by any symmetric -bell-shaped geometric distance function on the Euclidean space, for which the -Mahalanobis distance is used in this paper. In order for the semantic distance -to perform properly, we introduce an additional model parameter for -each word that takes out of this distance a global factor indicating how -likely the word occurs regardless of its topic. This mitigates the problem -that the Gaussian distribution used in previous topic models with -continuous word embeddings could not explain semantic relations correctly, and -helps to obtain higher topic coherence. Through experiments on the -20 Newsgroups dataset, NIPS papers and the CNN/Dailymail corpus, our model matches the performance -of recent state-of-the-art models while also -generating topic embedding vectors, which makes it possible to observe where the -topic vectors are embedded among the word vectors in real Euclidean space -and how the topics are semantically related to each other.
-" -6246,1711.08992,"Kalin Stefanov, Jonas Beskow and Giampiero Salvi","Self-Supervised Vision-Based Detection of the Active Speaker as Support - for Socially-Aware Language Acquisition",cs.CV cs.CL cs.HC cs.LG stat.ML," This paper presents a self-supervised method for visual detection of the -active speaker in a multi-person spoken interaction scenario. Active speaker -detection is a fundamental prerequisite for any artificial cognitive system -attempting to acquire language in social settings. The proposed method is -intended to complement the acoustic detection of the active speaker, thus -improving the system robustness in noisy conditions. The method can detect an -arbitrary number of possibly overlapping active speakers based exclusively on -visual information about their face. Furthermore, the method does not rely on -external annotations, thus complying with cognitive development. Instead, the -method uses information from the auditory modality to support learning in the -visual domain. This paper reports an extensive evaluation of the proposed -method using a large multi-person face-to-face interaction dataset. The results -show good performance in a speaker dependent setting. However, in a speaker -independent setting the proposed method yields a significantly lower -performance. We believe that the proposed method represents an essential -component of any artificial cognitive system or robotic platform engaging in -social interactions. -" -6247,1711.09050,"Peter Henderson, Koustuv Sinha, Nicolas Angelard-Gontier, Nan Rosemary - Ke, Genevieve Fried, Ryan Lowe, Joelle Pineau",Ethical Challenges in Data-Driven Dialogue Systems,cs.CL," The use of dialogue systems as a medium for human-machine interaction is an -increasingly prevalent paradigm. A growing number of dialogue systems use -conversation strategies that are learned from large datasets. There are well -documented instances where interactions with these system have resulted in -biased or even offensive conversations due to the data-driven training process. -Here, we highlight potential ethical issues that arise in dialogue systems -research, including: implicit biases in data-driven systems, the rise of -adversarial examples, potential sources of privacy violations, safety concerns, -special considerations for reinforcement learning systems, and reproducibility -concerns. We also suggest areas stemming from these issues that deserve further -investigation. Through this initial survey, we hope to spur research leading to -robust, safe, and ethically sound dialogue systems. -" -6248,1711.09055,"Giovanni Saponaro, Lorenzo Jamone, Alexandre Bernardino and Giampiero - Salvi","Interactive Robot Learning of Gestures, Language and Affordances",cs.RO cs.AI cs.CL cs.CV cs.LG," A growing field in robotics and Artificial Intelligence (AI) research is -human-robot collaboration, whose target is to enable effective teamwork between -humans and robots. However, in many situations human teams are still superior -to human-robot teams, primarily because human teams can easily agree on a -common goal with language, and the individual members observe each other -effectively, leveraging their shared motor repertoire and sensorimotor -resources. This paper shows that for cognitive robots it is possible, and -indeed fruitful, to combine knowledge acquired from interacting with elements -of the environment (affordance exploration) with the probabilistic observation -of another agent's actions. 
- We propose a model that unites (i) learning robot affordances and word -descriptions with (ii) statistical recognition of human gestures with vision -sensors. We discuss theoretical motivations and possible implementations, and we -show initial results which highlight that, after having acquired knowledge of -its surrounding environment, a humanoid robot can generalize this knowledge to -the case in which it observes another agent (a human partner) performing the same -motor actions previously executed during training. -" -6249,1711.09160,Tom Kocmi and Ond\v{r}ej Bojar,An Exploration of Word Embedding Initialization in Deep-Learning Tasks,cs.CL," Word embeddings are the interface between the world of discrete units of text -processing and the continuous, differentiable world of neural networks. In this -work, we examine various random and pretrained initialization methods for -embeddings used in deep networks and their effect on the performance on four -NLP tasks with both recurrent and convolutional architectures. We confirm that -pretrained embeddings are a little better than random initialization, -especially considering the speed of learning. On the other hand, we do not see -any significant difference between various methods of random initialization, as -long as the variance is kept reasonably low. High-variance initialization -prevents the network from using the space of embeddings and forces it to use other -free parameters to accomplish the task. We support this hypothesis by observing -the performance in learning lexical relations and by the fact that the network -can learn to perform reasonably on its task even with fixed random embeddings. -" -6250,1711.09181,"Siyuan Zhao, Zhiwei Xu, Limin Liu, Mengjie Guo","Towards Accurate Deceptive Opinion Spam Detection based on Word - Order-preserving CNN",cs.CL," Nowadays, deep learning is widely used. In natural language learning, -its high degree of flexibility has enabled the analysis of complex semantics. -Deceptive opinion detection is an important application -area for deep learning models, and related mechanisms have received increasing attention -and research. Online opinions are quite short and varied in type and content. In -order to effectively identify deceptive opinions, we need to comprehensively -study the characteristics of deceptive opinions, and explore novel -characteristics besides the textual semantics and emotional polarity that have -been widely used in text analysis. A detection mechanism based on deep -learning has better self-adaptability and can effectively identify all kinds of -deceptive opinions. In this paper, we optimize the convolutional neural network -model by embedding word order characteristics in its convolution layer and -pooling layer, which makes the convolutional neural network more suitable for various -text classification tasks and deceptive opinion detection. The TensorFlow-based -experiments demonstrate that the detection mechanism proposed in this paper -achieves more accurate deceptive opinion detection results. -" -6251,1711.09271,"Aditya Thakker, Suhail Barot, Sudhir Bagul",Acronym Disambiguation: A Domain Independent Approach,cs.CL," Acronyms are omnipresent. They usually express information that is repetitive -and well known. But acronyms can also be ambiguous because there can be -multiple expansions for the same acronym. In this paper, we propose a general -system for acronym disambiguation that can work on any acronym given some -context information.
We present methods for retrieving all the possible -expansions of an acronym from Wikipedia and AcronymsFinder.com. We propose to -use these expansions to collect all possible contexts in which these acronyms -are used and then score them using a paragraph embedding technique called -Doc2Vec. Collectively, this method achieves an accuracy of 90.9% in -selecting the correct expansion for a given acronym, on a dataset we scraped from -Wikipedia with 707 distinct acronyms and 14,876 disambiguations. -" -6252,1711.09285,Samira Abnar and Rasyan Ahmed and Max Mijnheer and Willem Zuidema,"Experiential, Distributional and Dependency-based Word Embeddings have - Complementary Roles in Decoding Brain Activity",cs.CL," We evaluate 8 different word embedding models on their usefulness for -predicting the neural activation patterns associated with concrete nouns. The -models we consider include an experiential model, based on crowd-sourced -association data, several popular neural and distributional models, and a model -that reflects the syntactic context of words (based on dependency parses). Our -goal is to assess the cognitive plausibility of these various embedding models, -and understand how we can further improve our methods for interpreting brain -imaging data. - We show that neural word embedding models exhibit superior performance on the -tasks we consider, beating the experiential word representation model. The -syntactically informed model gives the overall best performance when predicting -brain activation patterns from word embeddings; whereas the GloVe -distributional method gives the overall best performance when predicting in the -reverse direction (word vectors from brain images). Interestingly, however, -the error patterns of these different models are markedly different. This may -support the idea that the brain uses different systems for processing different -kinds of words. Moreover, we suggest that taking the relative strengths of -different embedding models into account will lead to better models of the brain -activity associated with words. -" -6253,1711.09357,"Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li",Generative Adversarial Network for Abstractive Text Summarization,cs.CL cs.AI," In this paper, we propose an adversarial process for abstractive text -summarization, in which we simultaneously train a generative model G and a -discriminative model D. In particular, we build the generator G as an agent of -reinforcement learning, which takes the raw text as input and predicts the -abstractive summarization. We also build a discriminator which attempts to -distinguish the generated summary from the ground truth summary. Extensive -experiments demonstrate that our model achieves competitive ROUGE scores with -the state-of-the-art methods on the CNN/Daily Mail dataset. Qualitatively, we show -that our model is able to generate more abstractive, readable and diverse -summaries. -" -6254,1711.09367,"Zhaopeng Tu, Yang Liu, Shuming Shi, Tong Zhang",Learning to Remember Translation History with a Continuous Cache,cs.CL," Existing neural machine translation (NMT) models generally translate -sentences in isolation, missing the opportunity to take advantage of -document-level information. In this work, we propose to augment NMT models with -a very light-weight cache-like memory network, which stores recent hidden -representations as translation history.
The probability distribution over -generated words is updated online depending on the translation history -retrieved from the memory, endowing NMT models with the capability to -dynamically adapt over time. Experiments on multiple domains with different -topics and styles show the effectiveness of the proposed approach with -negligible impact on the computational cost. -" -6255,1711.09395,"Igor Melnyk, Cicero Nogueira dos Santos, Kahini Wadhawan, Inkit Padhi, - Abhishek Kumar",Improved Neural Text Attribute Transfer with Non-parallel Data,cs.CL cs.AI cs.LG," Text attribute transfer using non-parallel data requires methods that can -perform disentanglement of content and linguistic attributes. In this work, we -propose multiple improvements over the existing approaches that enable the -encoder-decoder framework to cope with the text attribute transfer from -non-parallel data. We perform experiments on the sentiment transfer task using -two datasets. For both datasets, our proposed method outperforms a strong -baseline in two of the three employed evaluation metrics. -" -6256,1711.09476,Diego Moussallem and Matthias Wauer and Axel-Cyrille Ngonga Ngomo,Machine Translation using Semantic Web Technologies: A Survey,cs.CL," A large number of machine translation approaches have recently been developed -to facilitate the fluid migration of content across languages. However, the -literature suggests that many obstacles must still be dealt with to achieve -better automatic translations. One of these obstacles is lexical and syntactic -ambiguity. A promising way of overcoming this problem is using Semantic Web -technologies. This article presents the results of a systematic review of -machine translation approaches that rely on Semantic Web technologies for -translating texts. Overall, our survey suggests that while Semantic Web -technologies can enhance the quality of machine translation outputs for various -problems, the combination of both is still in its infancy. -" -6257,1711.09502,"Zaixiang Zheng, Hao Zhou, Shujian Huang, Lili Mou, Xinyu Dai, Jiajun - Chen and Zhaopeng Tu",Modeling Past and Future for Neural Machine Translation,cs.CL," Existing neural machine translation systems do not explicitly model what has -been translated and what has not during the decoding phase. To address this -problem, we propose a novel mechanism that separates the source information -into two parts: translated Past contents and untranslated Future contents, -which are modeled by two additional recurrent layers. The Past and Future -contents are fed to both the attention model and the decoder states, which -offers NMT systems the knowledge of translated and untranslated contents. -Experimental results show that the proposed approach significantly improves -translation performance in Chinese-English, German-English and English-German -translation tasks. Specifically, the proposed model outperforms the -conventional coverage model in both translation quality and -alignment error rate. -" -6258,1711.09534,Ziang Xie,Neural Text Generation: A Practical Guide,cs.CL cs.LG stat.ML," Deep learning methods have recently achieved great empirical success on -machine translation, dialogue response generation, summarization, and other -text generation tasks. At a high level, the technique has been to train -end-to-end neural network models consisting of an encoder model to produce a -hidden representation of the source text, followed by a decoder model to -generate the target.
While such models have far fewer pieces than -earlier systems, significant tuning is still required to achieve good -performance. For text generation models in particular, the decoder can behave -in undesired ways, such as by generating truncated or repetitive outputs, -outputting bland and generic responses, or in some cases producing -ungrammatical gibberish. This paper is intended as a practical guide for -resolving such undesired behavior in text generation models, with the aim of -helping enable real-world applications. -" -6259,1711.09573,"Jian Li, Yue Wang, Michael R. Lyu, Irwin King",Code Completion with Neural Attention and Pointer Networks,cs.CL cs.SE," Intelligent code completion has become an essential research task to -accelerate modern software development. To facilitate effective code completion -for dynamically-typed programming languages, we apply neural language models by -learning from large codebases, and develop a tailored attention mechanism for -code completion. However, standard neural language models, even with an attention -mechanism, cannot correctly predict out-of-vocabulary (OoV) words, which -restricts code completion performance. In this paper, inspired by the -prevalence of locally repeated terms in program source code, and the recently -proposed pointer copy mechanism, we propose a pointer mixture network for -better predicting OoV words in code completion. Based on the context, the -pointer mixture network learns to either generate a within-vocabulary word -through an RNN component, or regenerate an OoV word from local context through -a pointer component. Experiments on two benchmarked datasets demonstrate the -effectiveness of our attention mechanism and pointer mixture network on the -code completion task. -" -6260,1711.09645,"Stefanos Angelidis, Mirella Lapata",Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis,cs.CL cs.IR cs.LG," We consider the task of fine-grained sentiment analysis from the perspective -of multiple instance learning (MIL). Our neural model is trained on document -sentiment labels, and learns to predict the sentiment of text segments, i.e. -sentences or elementary discourse units (EDUs), without segment-level -supervision. We introduce an attention-based polarity scoring method for -identifying positive and negative text snippets and a new dataset which we call -SPOT (as shorthand for Segment-level POlariTy annotations) for evaluating -MIL-style sentiment models like ours. Experimental results demonstrate superior -performance against multiple baselines, whereas a judgement elicitation study -shows that EDU-level opinion extraction produces more informative summaries -than sentence-based alternatives. -" -6261,1711.09684,"Aniruddha Tammewar, Monik Pamecha, Chirag Jain, Apurva Nagvenkar, - Krupal Modi",Production Ready Chatbots: Generate if not Retrieve,cs.CL cs.AI," In this paper, we present a hybrid model that combines a neural -conversational model and a rule-based graph dialogue system that assists users -in scheduling reminders through a chat conversation. The graph-based system has -high precision and provides grammatically accurate responses but has low -recall. The neural conversation model can cater to a variety of requests, as it -generates the responses word by word as opposed to using canned responses. The -hybrid system shows significant improvements over the existing rule-based baseline -system and caters to complex queries with a domain-restricted -neural model.
Restricting the conversation topic and combining the graph-based -retrieval system with a neural generative model make the final system robust -enough for a real-world application. -" -6262,1711.09714,"Giampiero Salvi, Luis Montesano, Alexandre Bernardino, Jos\'e - Santos-Victor","Language Bootstrapping: Learning Word Meanings From Perception-Action - Association",cs.RO cs.CL cs.HC stat.ML," We address the problem of bootstrapping language acquisition for an -artificial system similarly to what is observed in experiments with human -infants. Our method works by associating meanings with words in manipulation -tasks, as a robot interacts with objects and listens to verbal descriptions of -the interactions. The model is based on an affordance network, i.e., a mapping -between robot actions, robot perceptions, and the perceived effects of these -actions upon objects. We extend the affordance model to incorporate spoken -words, which allows us to ground the verbal symbols to the execution of actions -and the perception of the environment. The model takes verbal descriptions of a -task as the input and uses temporal co-occurrence to create links between -speech utterances and the involved objects, actions, and effects. We show that -the robot is able to form useful word-to-meaning associations, even without -considering grammatical structure in the learning process and in the presence -of recognition errors. These word-to-meaning associations are embedded in the -robot's own understanding of its actions. Thus, they can be directly used to -instruct the robot to perform tasks and also allow context to be incorporated into -the speech recognition task. We believe that the encouraging results with our -approach may endow robots with a capacity to acquire language descriptors in -their operating environment, as well as shed some light on how this -challenging process develops in human infants. -" -6263,1711.09724,"Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang and Zhifang Sui",Table-to-text Generation by Structure-aware Seq2seq Learning,cs.CL cs.AI," Table-to-text generation aims to generate a description for a factual table -which can be viewed as a set of field-value records. To encode both the content -and the structure of a table, we propose a novel structure-aware seq2seq -architecture which consists of a field-gating encoder and a description generator -with dual attention. In the encoding phase, we update the cell memory of the -LSTM unit by a field gate and its corresponding field value in order to -incorporate field information into the table representation. In the decoding phase, a -dual attention mechanism which contains word-level attention and field-level -attention is proposed to model the semantic relevance between the generated -description and the table. We conduct experiments on the \texttt{WIKIBIO} -dataset which contains over 700k biographies and corresponding infoboxes from -Wikipedia. The attention visualizations and case studies show that our model is -capable of generating coherent and informative descriptions based on a -comprehensive understanding of both the content and the structure of a table. -Automatic evaluations also show our model outperforms the baselines by a great -margin. Code for this work is available on -https://github.com/tyliupku/wiki2bio.
-" -6264,1711.09824,"Xuan-Son Vu, Lucie Flekova, Lili Jiang, Iryna Gurevych","Lexical-semantic resources: yet powerful resources for automatic - personality classification",cs.CL," In this paper, we aim to reveal the impact of lexical-semantic resources, -used in particular for word sense disambiguation and sense-level semantic -categorization, on automatic personality classification task. While stylistic -features (e.g., part-of-speech counts) have been shown their power in this -task, the impact of semantics beyond targeted word lists is relatively -unexplored. We propose and extract three types of lexical-semantic features, -which capture high-level concepts and emotions, overcoming the lexical gap of -word n-grams. Our experimental results are comparable to state-of-the-art -methods, while no personality-specific resources are required. -" -6265,1711.09873,"Zhongliang Li, Raymond Kulhanek, Shaojun Wang, Yunxin Zhao, Shuang Wu",Slim Embedding Layers for Recurrent Neural Language Models,cs.CL," Recurrent neural language models are the state-of-the-art models for language -modeling. When the vocabulary size is large, the space taken to store the model -parameters becomes the bottleneck for the use of recurrent neural language -models. In this paper, we introduce a simple space compression method that -randomly shares the structured parameters at both the input and output -embedding layers of the recurrent neural language models to significantly -reduce the size of model parameters, but still compactly represent the original -input and output embedding layers. The method is easy to implement and tune. -Experiments on several data sets show that the new method can get similar -perplexity and BLEU score results while only using a very tiny fraction of -parameters. -" -6266,1711.10093,"Jherez Taylor, Melvyn Peignon, Yi-Shin Chen",Surfacing contextual hate speech words within social media,cs.CL," Social media platforms have recently seen an increase in the occurrence of -hate speech discourse which has led to calls for improved detection methods. -Most of these rely on annotated data, keywords, and a classification technique. -While this approach provides good coverage, it can fall short when dealing with -new terms produced by online extremist communities which act as original -sources of words which have alternate hate speech meanings. These code words -(which can be both created and adopted words) are designed to evade automatic -detection and often have benign meanings in regular discourse. As an example, -""skypes"", ""googles"", and ""yahoos"" are all instances of words which have an -alternate meaning that can be used for hate speech. This overlap introduces -additional challenges when relying on keywords for both the collection of data -that is specific to hate speech, and downstream classification. In this work, -we develop a community detection approach for finding extremist hate speech -communities and collecting data from their members. We also develop a word -embedding model that learns the alternate hate speech meaning of words and -demonstrate the candidacy of our code words with several annotation -experiments, designed to determine if it is possible to recognize a word as -being used for hate speech without knowing its alternate meaning. 
We report an
-inter-annotator agreement rate of K=0.871, and K=0.676 for data drawn from our
-extremist community and the keyword approach respectively, supporting our claim
-that hate speech detection is a contextual task and does not depend on a fixed
-list of keywords. Our goal is to advance the domain by providing a high-quality
-hate speech dataset in addition to learned code words that can be fed into
-existing classification approaches, thus improving the accuracy of automated
-detection.
-"
-6267,1711.10122,Oswaldo Ludwig,End-to-end Adversarial Learning for Generative Conversational Agents,cs.CL," This paper presents a new adversarial learning method for generative
-conversational agents (GCA), as well as a new model of GCA. Similar to previous
-works on adversarial learning for dialogue generation, our method assumes the
-GCA as a generator that aims at fooling a discriminator that labels dialogues
-as human-generated or machine-generated; however, in our approach, the
-discriminator performs token-level classification, i.e. it indicates whether
-the current token was generated by humans or machines. To do so, the
-discriminator also receives the context utterances (the dialogue history) and
-the incomplete answer up to the current token as input. This new approach makes
-end-to-end training by backpropagation possible. A self-conversation process
-enables the model to produce a more diverse set of generated data for the
-adversarial training. This approach improves the performance on questions not
-related to the training data. Experimental results with human and adversarial
-evaluations show that the adversarial method yields significant performance
-gains over the usual teacher forcing training.
-"
-6268,1711.10124,"Phuong Le-Hong, Thai Hoang Pham, Xuan Khoai Pham, Thi Minh Huyen
-  Nguyen, Thi Luong Nguyen, Minh Hiep Nguyen",Vietnamese Semantic Role Labelling,cs.CL," In this paper, we study semantic role labelling (SRL), a subtask of semantic
-parsing of natural language sentences, and its application to the Vietnamese
-language. We present our effort in building Vietnamese PropBank, the first
-Vietnamese SRL corpus, and a software system for labelling semantic roles of
-Vietnamese texts. In particular, we present a novel constituent extraction
-algorithm in the argument candidate identification step which is more suitable
-and more accurate than the common node-mapping method. In the machine learning
-part, our system integrates distributed word features produced by two recent
-unsupervised learning models in two learned statistical classifiers and makes
-use of an integer linear programming inference procedure to improve the
-accuracy. The system is evaluated in a series of experiments and achieves a
-good result, an $F_1$ score of 74.77%. Our system, including corpus and
-software, is available as an open-source project for free research, and we
-believe that it is a good baseline for the development of future Vietnamese SRL
-systems.
-"
-6269,1711.10133,Cheng-Tao Chung and Lin-Shan Lee,"Unsupervised Discovery of Structured Acoustic Tokens with Applications
-  to Spoken Term Detection",cs.CL," In this paper, we compare two paradigms for unsupervised discovery of
-structured acoustic tokens directly from speech corpora without any human
-annotation. The Multigranular Paradigm seeks to capture all available
-information in the corpora with multiple sets of tokens for different model
-granularities.
The Hierarchical Paradigm attempts to jointly learn several
-levels of signal representations in a hierarchical structure. The two paradigms
-are unified within a theoretical framework in this paper. Query-by-Example
-Spoken Term Detection (QbE-STD) experiments on the QUESST dataset of MediaEval
-2015 verify the competitiveness of the acoustic tokens. The Enhanced
-Relevance Score (ERS) proposed in this work improves both paradigms for the
-task of QbE-STD. We also list results on the ABX evaluation task of the Zero
-Resource Challenge 2015 for a comparison of the two paradigms.
-"
-6270,1711.10136,"Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo and Yifan Gong",Acoustic-To-Word Model Without OOV,cs.CL," Recently, the acoustic-to-word model based on the Connectionist Temporal
-Classification (CTC) criterion was shown as a natural end-to-end model directly
-targeting words as output units. However, this type of word-based CTC model
-suffers from the out-of-vocabulary (OOV) issue as it can only model a limited
-number of words in the output layer and maps all the remaining words into an
-OOV output node. Therefore, such a word-based CTC model can only recognize the
-frequent words modeled by the network output nodes. It also cannot easily
-handle hot-words which emerge after the model is trained. In this study, we
-improve the acoustic-to-word model with a hybrid CTC model which can predict
-both words and characters at the same time. With a shared-hidden-layer
-structure and modular design, the alignments of words generated from the
-word-based CTC and the character-based CTC are synchronized. Whenever the
-acoustic-to-word model emits an OOV token, we back off that OOV segment to the
-word output generated from the character-based CTC, hence solving the OOV or
-hot-words issue. Evaluated on a Microsoft Cortana voice assistant task, the
-proposed model can reduce the errors introduced by the OOV output token in the
-acoustic-to-word model by 30%.
-"
-6271,1711.10163,"Xuancheng Ren, Xu Sun","Hybrid Oracle: Making Use of Ambiguity in Transition-based Chinese
-  Dependency Parsing",cs.CL," In the training of transition-based dependency parsers, an oracle is used to
-predict a transition sequence for a sentence and its gold tree. However, the
-transition system may exhibit ambiguity, that is, there can be multiple correct
-transition sequences that form the gold tree. We propose to make use of this
-property in the training of neural dependency parsers, and present the Hybrid
-Oracle. The new oracle gives all the correct transitions for a parsing state,
-which are used in the cross-entropy loss function to provide a better
-supervisory signal. It is also used to generate different transition sequences
-for a sentence to better explore the training data and improve the
-generalization ability of the parser. Evaluations show that the parsers trained
-using the hybrid oracle outperform the parsers using the traditional oracle in
-Chinese dependency parsing. We provide an analysis from a linguistic point of
-view. The code is available at https://github.com/lancopku/nndep .
-"
-6272,1711.10203,"Dieuwke Hupkes, Sara Veldhoen, Willem Zuidema","Visualisation and 'diagnostic classifiers' reveal how recurrent and
-  recursive neural networks process hierarchical structure",cs.CL," We investigate how neural networks can learn and process languages with
-hierarchical, compositional semantics.
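The backoff step in the acoustic-to-word abstract above (1711.10136) reduces to a simple merge once the word-level and character-level outputs are segment-aligned. The sketch below assumes that alignment is already available as parallel lists, which is a simplification of the paper's shared-hidden-layer synchronization.

```python
# Hedged sketch: replace OOV emissions with the char-CTC hypothesis.
def backoff_oov(word_hyps, char_hyps, oov="<OOV>"):
    """word_hyps/char_hyps: per-segment hypotheses, assumed aligned."""
    return [c if w == oov else w for w, c in zip(word_hyps, char_hyps)]

print(backoff_oov(["play", "<OOV>", "music"],
                  ["play", "despacito", "music"]))
# ['play', 'despacito', 'music']
```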
To this end, we define the artificial
-task of processing nested arithmetic expressions, and study whether different
-types of neural networks can learn to compute their meaning. We find that
-recursive neural networks can find a generalising solution to this problem, and
-we visualise this solution by breaking it up into three steps: project, sum and
-squash. As a next step, we investigate recurrent neural networks, and show that
-a gated recurrent unit, which processes its input incrementally, also performs
-very well on this task. Visualisation techniques alone do not suffice to
-develop an understanding of what the recurrent network encodes. Therefore, we
-develop an approach where we formulate and test multiple hypotheses on the
-information encoded and processed by the network. For each hypothesis, we
-derive predictions about features of the hidden state representations at each
-time step, and train 'diagnostic classifiers' to test those predictions. Our
-results indicate that the networks follow a strategy similar to our
-hypothesised 'cumulative strategy', which explains the high accuracy of the
-network on novel expressions, the generalisation to longer expressions than
-seen in training, and the mild deterioration with increasing length. This in
-turn shows that diagnostic classifiers can be a useful technique for opening up
-the black box of neural networks. We argue that diagnostic classification,
-unlike most visualisation techniques, does scale up from small networks in a
-toy domain to larger and deeper recurrent networks dealing with real-life
-data, and may therefore contribute to a better understanding of the internal
-dynamics of current state-of-the-art models in natural language processing.
-"
-6273,1711.10307,Jean-Fran\c{c}ois Delpech,"Semantic Technology-Assisted Review (STAR) Document analysis and
-  monitoring using random vectors",cs.IR cs.CL," The review and analysis of large collections of documents and the periodic
-monitoring of new additions thereto have greatly benefited from new
-developments in computer software. This paper demonstrates how using random
-vectors to construct a low-dimensional Euclidean space embedding words and
-documents enables fast and accurate computation of semantic similarities
-between them. With this technique of Semantic Technology-Assisted Review
-(STAR), documents can be selected, compared, classified, summarized and
-evaluated very quickly with minimal expert involvement and high-quality
-results.
-"
-6274,1711.10327,"Danijar Hafner, Alexander Immer, Willi Raschkowski, Fabian Windheuser",Generative Interest Estimation for Document Recommendations,cs.IR cs.CL cs.LG stat.ML," Learning distributed representations of documents has pushed the
-state-of-the-art in several natural language processing tasks and was
-successfully applied to the field of recommender systems recently. In this
-paper, we propose a novel content-based recommender system based on learned
-representations and a generative model of user interest. Our method works as
-follows: First, we learn representations on a corpus of text documents. Then,
-we capture a user's interest as a generative model in the space of the document
-representations. In particular, we model the distribution of interest for each
-user as a Gaussian mixture model (GMM). Recommendations can be obtained
-directly by sampling from a user's generative model.
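A 'diagnostic classifier' in the sense of the visualisation paper above (1711.10203) is just a simple probe fit on a network's hidden states to test one hypothesis about what they encode. The sketch below uses random vectors as stand-in hidden states and a synthetic target; in practice H would be collected from the trained RNN and y would be the hypothesised quantity (e.g., a cumulative result) at each time step.

```python
# Minimal diagnostic-classifier probe with scikit-learn (stand-in data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
H = rng.normal(size=(500, 32))               # hidden states per time step
y = (H[:, :5].sum(axis=1) > 0).astype(int)   # hypothesised encoded feature

probe = LogisticRegression(max_iter=1000).fit(H[:400], y[:400])
print("probe accuracy:", probe.score(H[400:], y[400:]))
# Accuracy far above chance supports the hypothesis that the feature is
# linearly decodable from the hidden states; chance level rejects it.
```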
Using latent semantic
-analysis (LSA) as a comparison, we compute and explore document representations
-on the Delicious bookmarks dataset, a standard benchmark for recommender
-systems. We then perform density estimation in both spaces and show that
-learned representations outperform LSA in terms of predictive performance.
-"
-6275,1711.10331,"Xu Sun, Weiwei Sun, Shuming Ma, Xuancheng Ren, Yi Zhang, Wenjie Li,
-  Houfeng Wang","Complex Structure Leads to Overfitting: A Structure Regularization
-  Decoding Method for Natural Language Processing",cs.LG cs.AI cs.CL," Recent systems for structured prediction focus on increasing the level of
-structural dependencies within the model. However, our study suggests that
-complex structures entail high overfitting risks. To control the
-structure-based overfitting, we propose to conduct structure regularization
-decoding (SR decoding). The decoding of the complex structure model is
-regularized by the additionally trained simple structure model. We
-theoretically analyze the quantitative relations between the structural
-complexity and the overfitting risk. The analysis shows that complex structure
-models are prone to structure-based overfitting. Empirical evaluations show
-that the proposed method improves the performance of the complex structure
-models by reducing the structure-based overfitting. On the sequence labeling
-tasks, the proposed method substantially improves the performance of the
-complex neural network models. The maximum F1 error rate reduction is 36.4% for
-the third-order model. The proposed method also works for the parsing task. The
-maximum UAS improvement is 5.5% for the tri-sibling model. The results are
-competitive with or better than the state-of-the-art results.
-"
-6276,1711.10377,"Hamid Bagheri, Md Johirul Islam",Sentiment analysis of twitter data,cs.IR cs.CL," Social networks are the main resources for gathering information about
-people's opinions and sentiments towards different topics, as people spend
-hours daily on social media and share their opinions. In this technical paper,
-we show the application of sentiment analysis and how to connect to Twitter and
-run sentiment analysis queries. We run experiments on different queries, from
-politics to humanity, and show interesting results. We observe that the
-proportion of neutral sentiment in tweets is significantly high, which clearly
-shows the limitations of current works.
-"
-6277,1711.10705,"Young-Bum Kim, Sungjin Lee, Ruhi Sarikaya",Speaker-Sensitive Dual Memory Networks for Multi-Turn Slot Tagging,cs.CL," In multi-turn dialogs, natural language understanding models can introduce
-obvious errors by being blind to contextual information. To incorporate dialog
-history, we present a neural architecture with Speaker-Sensitive Dual Memory
-Networks which encode utterances differently depending on the speaker. This
-addresses the different extents of information available to the system - the
-system knows only the surface form of user utterances while it has the exact
-semantics of system output. We performed experiments on real user data from
-Microsoft Cortana, a commercial personal assistant. The results showed a
-significant performance improvement over the state-of-the-art slot tagging
-models using contextual information.
-"
-6278,1711.10712,"Bing Liu, Gokhan Tur, Dilek Hakkani-Tur, Pararth Shah, Larry Heck","End-to-End Optimization of Task-Oriented Dialogue Model with Deep
-  Reinforcement Learning",cs.CL," In this paper, we present a neural network based task-oriented dialogue
-system that can be optimized end-to-end with deep reinforcement learning (RL).
-The system is able to track dialogue state, interface with knowledge bases, and
-incorporate query results into the agent's responses to successfully complete
-task-oriented dialogues. Dialogue policy learning is conducted with a hybrid of
-supervised and deep RL methods. We first train the dialogue agent in a
-supervised manner by learning directly from task-oriented dialogue corpora, and
-further optimize it with deep RL during its interaction with users. In the
-experiments on two different dialogue task domains, our model demonstrates
-robust performance in tracking dialogue state and producing reasonable system
-responses. We show that deep RL based optimization leads to a significant
-improvement in task success rate and a reduction in dialogue length compared to
-the supervised training model. We further show the benefits of training the
-task-oriented dialogue model end-to-end compared to component-wise
-optimization, with experimental results on dialogue simulations and human
-evaluations.
-"
-6279,1711.10837,"Ahmed H. Zaidi, Russell Moore, Ted Briscoe",Curriculum Q-Learning for Visual Vocabulary Acquisition,cs.CL," The structure of curriculum plays a vital role in our learning process, both
-as children and adults. Presenting material in ascending order of difficulty
-that also exploits prior knowledge can have a significant impact on the rate of
-learning. However, the notion of difficulty and prior knowledge differs from
-person to person. Motivated by the need for a personalised curriculum, we
-present a novel method of curriculum learning for vocabulary words in the form
-of visual prompts. We employ a reinforcement learning model grounded in
-pedagogical theories that emulates the actions of a tutor. We simulate three
-students with different levels of vocabulary knowledge in order to evaluate
-how well our model adapts to the environment. The results of the simulation
-reveal that through interaction, the model is able to identify the areas of
-weakness, as well as push students to the edge of their zone of proximal
-development (ZPD). We hypothesise that these methods can also be effective in
-training agents to learn language representations in a simulated environment
-where it has previously been shown that the order of words and prior knowledge
-play an important role in the efficacy of language learning.
-"
-6280,1711.10960,"Moumita Bhattacharya, Claudine Jurkovitz, Hagit Shatkay","Identifying Patterns of Associated-Conditions through Topic Models of
-  Electronic Medical Records",cs.CL," Multiple adverse health conditions co-occurring in a patient are typically
-associated with poor prognosis and increased office or hospital visits.
-Developing methods to identify patterns of co-occurring conditions can assist
-in diagnosis. Thus identifying patterns of associations among co-occurring
-conditions is of growing interest. In this paper, we report preliminary results
-from a data-driven study, in which we apply a machine learning method, namely,
-topic modeling, to electronic medical records, aiming to identify patterns of
-associated conditions.
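The tutor in the curriculum Q-learning abstract above (1711.10837) can be caricatured with a tabular agent: states are coarse student knowledge levels, actions are difficulty buckets for the next word, and the reward peaks when an item sits just beyond the current level. The student simulator and reward shape below are invented for illustration and are not the paper's pedagogical model.

```python
# Toy epsilon-greedy Q-learning tutor (invented simulator, illustration only).
import numpy as np

rng = np.random.default_rng(0)
n_levels = n_actions = 5
Q = np.zeros((n_levels, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2

def simulate_student(state, action):
    # Highest gain when the item is one step above the current level.
    target = min(state + 1, n_levels - 1)
    reward = 1.0 if action == target else -0.1
    return reward, (target if reward > 0 else state)

state = 0
for _ in range(2000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    r, s2 = simulate_student(state, a)
    Q[state, a] += alpha * (r + gamma * Q[s2].max() - Q[state, a])
    state = s2

print(Q.argmax(axis=1))  # policy: present items one step above each level
```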
Specifically, we use the well-established latent
-Dirichlet allocation (LDA), a method based on the idea that documents can be
-modeled as a mixture of latent topics, where each topic is a distribution over
-words. In our study, we adapt the LDA model to identify latent topics in
-patients' EMRs. We evaluate the performance of our method qualitatively, and
-show that the obtained topics indeed align well with distinct medical phenomena
-characterized by co-occurring conditions.
-"
-6281,1711.11017,"Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca
-  Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville",HoME: a Household Multimodal Environment,cs.AI cs.CL cs.CV cs.RO cs.SD eess.AS," We introduce HoME: a Household Multimodal Environment for artificial agents
-to learn from vision, audio, semantics, physics, and interaction with objects
-and other agents, all within a realistic context. HoME integrates over 45,000
-diverse 3D house layouts based on the SUNCG dataset, a scale which may
-facilitate learning, generalization, and transfer. HoME is an open-source,
-OpenAI Gym-compatible platform extensible to tasks in reinforcement learning,
-language grounding, sound-based navigation, robotics, multi-agent learning, and
-more. We hope HoME better enables artificial agents to learn as humans do: in
-an interactive, multimodal, and richly contextualized setting.
-"
-6282,1711.11023,"I\~nigo Casanueva, Pawe{\l} Budzianowski, Pei-Hao Su, Nikola
-  Mrk\v{s}i\'c, Tsung-Hsien Wen, Stefan Ultes, Lina Rojas-Barahona, Steve
-  Young, Milica Ga\v{s}i\'c","A Benchmarking Environment for Reinforcement Learning Based Task
-  Oriented Dialogue Management",stat.ML cs.CL cs.NE," Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid
-the significant effort needed to hand-craft the required dialogue flow, the
-Dialogue Management (DM) module can be cast as a continuous Markov Decision
-Process (MDP) and trained through Reinforcement Learning (RL). Several RL
-models have been investigated over recent years. However, the lack of a common
-benchmarking framework makes it difficult to perform a fair comparison between
-different models and their capability to generalise to different environments.
-Therefore, this paper proposes a set of challenging simulated environments for
-dialogue model development and evaluation. To provide some baselines, we
-investigate a number of representative parametric algorithms, namely the deep
-reinforcement learning algorithms DQN, A2C and Natural Actor-Critic, and
-compare them to a non-parametric model, GP-SARSA. Both the environments and
-policy models are implemented using the publicly available PyDial toolkit and
-released online, in order to establish a testbed framework for further
-experiments and to facilitate experimental reproducibility.
-"
-6283,1711.11027,"Arthur Bra\v{z}inskas, Serhii Havrylov, Ivan Titov",Embedding Words as Distributions with a Bayesian Skip-gram Model,cs.CL cs.AI cs.LG," We introduce a method for embedding words as probability densities in a
-low-dimensional space. Rather than assuming that a word embedding is fixed
-across the entire text collection, as in standard word embedding methods, in
-our Bayesian model we generate it from a word-specific prior density for each
-occurrence of a given word. Intuitively, for each word, the prior density
-encodes the distribution of its potential 'meanings'. These prior densities are
-conceptually similar to Gaussian embeddings.
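The topic-modeling setup in the EMR abstract above (1711.10960) is straightforward to reproduce in miniature: treat each patient record as a bag of condition terms and fit LDA, so each topic becomes a cluster of conditions that tend to co-occur. The toy records below are invented.

```python
# Miniature LDA over bag-of-conditions "records" (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

records = [
    "diabetes hypertension obesity",
    "diabetes neuropathy hypertension",
    "asthma allergy sinusitis",
    "asthma allergy eczema",
]
vec = CountVectorizer()
counts = vec.fit_transform(records)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    print(f"topic {k}:", [terms[i] for i in topic.argsort()[::-1][:3]])
```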
Interestingly, unlike the Gaussian -embeddings, we can also obtain context-specific densities: they encode -uncertainty about the sense of a word given its context and correspond to -posterior distributions within our model. The context-dependent densities have -many potential applications: for example, we show that they can be directly -used in the lexical substitution task. We describe an effective estimation -method based on the variational autoencoding framework. We also demonstrate -that our embeddings achieve competitive results on standard benchmarks. -" -6284,1711.11081,Angela Lin,"Improved Twitter Sentiment Analysis Using Naive Bayes and Custom - Language Model",cs.CL," In the last couple decades, social network services like Twitter have -generated large volumes of data about users and their interests, providing -meaningful business intelligence so organizations can better understand and -engage their customers. All businesses want to know who is promoting their -products, who is complaining about them, and how are these opinions bringing or -diminishing value to a company. Companies want to be able to identify their -high-value customers and quantify the value each user brings. Many businesses -use social media metrics to calculate the user contribution score, which -enables them to quantify the value that influential users bring on social -media, so the businesses can offer them more differentiated services. However, -the score calculation can be refined to provide a better illustration of a -user's contribution. Using Microsoft Azure as a case study, we conducted -Twitter sentiment analysis to develop a machine learning classification model -that identifies tweet contents and sentiments most illustrative of -positive-value user contribution. Using data mining and AI-powered cognitive -tools, we analyzed factors of social influence and specifically, promotional -language in the developer community. Our predictive model was a combination of -a traditional supervised machine learning algorithm and a custom-developed -natural language model for identifying promotional tweets, that identifies a -product-specific promotion on Twitter with a 90% accuracy rate. -" -6285,1711.11118,"Robert L. Logan IV, Samuel Humeau, Sameer Singh",Multimodal Attribute Extraction,cs.CL," The broad goal of information extraction is to derive structured information -from unstructured data. However, most existing methods focus solely on text, -ignoring other types of unstructured data such as images, video and audio which -comprise an increasing portion of the information on the web. To address this -shortcoming, we propose the task of multimodal attribute extraction. Given a -collection of unstructured and semi-structured contextual information about an -entity (such as a textual description, or visual depictions) the task is to -extract the entity's underlying attributes. In this paper, we provide a dataset -containing mixed-media data for over 2 million product items along with 7 -million attribute-value pairs describing the items which can be used to train -attribute extractors in a weakly supervised manner. We provide a variety of -baselines which demonstrate the relative effectiveness of the individual modes -of information towards solving the task, as well as study human performance. 
-" -6286,1711.11125,"Filip Miscevic, Aida Nematzadeh, Suzanne Stevenson",Predicting and Explaining Human Semantic Search in a Cognitive Model,cs.CL," Recent work has attempted to characterize the structure of semantic memory -and the search algorithms which, together, best approximate human patterns of -search revealed in a semantic fluency task. There are a number of models that -seek to capture semantic search processes over networks, but they vary in the -cognitive plausibility of their implementation. Existing work has also -neglected to consider the constraints that the incremental process of language -acquisition must place on the structure of semantic memory. Here we present a -model that incrementally updates a semantic network, with limited computational -steps, and replicates many patterns found in human semantic fluency using a -simple random walk. We also perform thorough analyses showing that a -combination of both structural and semantic features are correlated with human -performance patterns. -" -6287,1711.11135,"Xin Wang, Wenhu Chen, Jiawei Wu, Yuan-Fang Wang, William Yang Wang",Video Captioning via Hierarchical Reinforcement Learning,cs.CV cs.AI cs.CL," Video captioning is the task of automatically generating a textual -description of the actions in a video. Although previous work (e.g. -sequence-to-sequence model) has shown promising results in abstracting a coarse -description of a short video, it is still very challenging to caption a video -containing multiple fine-grained actions with a detailed description. This -paper aims to address the challenge by proposing a novel hierarchical -reinforcement learning framework for video captioning, where a high-level -Manager module learns to design sub-goals and a low-level Worker module -recognizes the primitive actions to fulfill the sub-goal. With this -compositional framework to reinforce video captioning at different levels, our -approach significantly outperforms all the baseline methods on a newly -introduced large-scale dataset for fine-grained video captioning. Furthermore, -our non-ensemble model has already achieved the state-of-the-art results on the -widely-used MSR-VTT dataset. -" -6288,1711.11191,"Yu Wu, Wei Wu, Dejian Yang, Can Xu, Zhoujun Li, Ming Zhou",Neural Response Generation with Dynamic Vocabularies,cs.CL," We study response generation for open domain conversation in chatbots. -Existing methods assume that words in responses are generated from an identical -vocabulary regardless of their inputs, which not only makes them vulnerable to -generic patterns and irrelevant noise, but also causes a high cost in decoding. -We propose a dynamic vocabulary sequence-to-sequence (DVS2S) model which allows -each input to possess their own vocabulary in decoding. In training, vocabulary -construction and response generation are jointly learned by maximizing a lower -bound of the true objective with a Monte Carlo sampling method. In inference, -the model dynamically allocates a small vocabulary for an input with the word -prediction model, and conducts decoding only with the small vocabulary. Because -of the dynamic vocabulary mechanism, DVS2S eludes many generic patterns and -irrelevant words in generation, and enjoys efficient decoding at the same time. -Experimental results on both automatic metrics and human annotations show that -DVS2S can significantly outperform state-of-the-art methods in terms of -response quality, but only requires 60% decoding time compared to the most -efficient baseline. 
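The decoding trick in the dynamic-vocabulary abstract above (1711.11191) amounts to computing the softmax only over a small per-input candidate set. A numpy sketch of that restricted softmax, with stand-in logits and an arbitrary candidate list, follows.

```python
# Softmax restricted to a per-input dynamic vocabulary (illustrative).
import numpy as np

def dynamic_softmax(logits, allowed_ids):
    probs = np.zeros_like(logits)
    sub = np.exp(logits[allowed_ids] - logits[allowed_ids].max())
    probs[allowed_ids] = sub / sub.sum()
    return probs

logits = np.random.randn(50000)              # one decoder step, full vocab
allowed = np.array([3, 17, 204, 999, 42313]) # predicted small vocabulary
p = dynamic_softmax(logits, allowed)
print(p[allowed].sum())  # 1.0: all probability mass stays in the subset
```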
-"
-6289,1711.11221,"Shaohui Kuang, Deyi Xiong, Weihua Luo, Guodong Zhou","Modeling Coherence for Neural Machine Translation with Dynamic and Topic
-  Caches",cs.CL," Sentences in a well-formed text are connected to each other via various links
-to form the cohesive structure of the text. Current neural machine translation
-(NMT) systems translate a text in a conventional sentence-by-sentence fashion,
-ignoring such cross-sentence links and dependencies. This may lead to the
-generation of an incoherent target text for a coherent source text. In order to
-handle this issue, we propose a cache-based approach to modeling coherence for
-neural machine translation by capturing contextual information either from
-recently translated sentences or the entire document. Particularly, we explore
-two types of caches: a dynamic cache, which stores words from the best
-translation hypotheses of preceding sentences, and a topic cache, which
-maintains a set of target-side topical words that are semantically related to
-the document to be translated. On this basis, we build a new layer to score
-target words in these two caches with a cache-based neural model. Here the
-estimated probabilities from the cache-based neural model are combined with NMT
-probabilities into the final word prediction probabilities via a gating
-mechanism. Finally, the proposed cache-based neural model is trained jointly
-with the NMT system in an end-to-end manner. Experiments and analysis presented
-in this paper demonstrate that the proposed cache-based model achieves
-substantial improvements over several state-of-the-art SMT and NMT baselines.
-"
-6290,1711.11310,"Bing Liu, Ian Lane","Multi-Domain Adversarial Learning for Slot Filling in Spoken Language
-  Understanding",cs.CL," The goal of this paper is to learn cross-domain representations for the slot
-filling task in spoken language understanding (SLU). Most of the recently
-published SLU models are domain-specific ones that work on individual task
-domains. Annotating data for each individual task domain is both financially
-costly and non-scalable. In this work, we propose an adversarial training
-method for learning common features and representations that can be shared
-across multiple domains. A model that produces such shared representations can
-be combined with models trained on individual domain SLU data to reduce the
-number of training samples required for developing a new domain. In our
-experiments using data sets from multiple domains, we show that adversarial
-training helps in learning better domain-general SLU models, leading to
-improved slot filling F1 scores. We further show that applying adversarial
-learning on the domain-general model also helps in achieving higher slot
-filling performance when the model is jointly optimized with domain-specific
-models.
-"
-6291,1711.11383,"Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps",Learning to Learn from Weak Supervision by Full Supervision,stat.ML cs.AI cs.CL cs.LG," In this paper, we propose a method for training neural networks when we have
-a large set of data with weak labels and a small amount of data with true
-labels. In our proposed model, we train two neural networks: a target network
-(the learner) and a confidence network (the meta-learner). The target network
-is optimized to perform a given task and is trained using a large set of
-unlabeled data that are weakly annotated.
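The gating mechanism in the cache-based NMT abstract above (1711.11221) is a convex mixture of two distributions. In the paper the gate is produced by a trained network; the sketch below fixes it to a constant just to show the combination.

```python
# Gated mixture of NMT and cache probabilities (gate fixed for illustration).
import numpy as np

def combine(p_nmt, p_cache, g):
    return g * p_nmt + (1.0 - g) * p_cache

p_nmt = np.array([0.7, 0.2, 0.1])
p_cache = np.array([0.1, 0.8, 0.1])    # boosts a cached topical word
print(combine(p_nmt, p_cache, g=0.6))  # [0.46 0.44 0.10]
```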
We propose to control the magnitude of the
-gradient updates to the target network using the scores provided by the second
-confidence network, which is trained on a small amount of supervised data. Thus
-we prevent the weight updates computed from noisy labels from harming the
-quality of the target network model.
-"
-6292,1711.11486,"Christopher Tegho, Pawe{\l} Budzianowski, Milica Ga\v{s}i\'c","Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy
-  Optimisation",stat.ML cs.CL cs.LG cs.NE," In statistical dialogue management, the dialogue manager learns a policy that
-maps a belief state to an action for the system to perform. Efficient
-exploration is key to successful policy optimisation. Current deep
-reinforcement learning methods are very promising but rely on epsilon-greedy
-exploration, thus subjecting the user to a random choice of action during
-learning. Alternative approaches such as Gaussian Process SARSA (GPSARSA)
-estimate uncertainties and are sample efficient, leading to a better user
-experience, but at the expense of greater computational complexity. This
-paper examines approaches to extract uncertainty estimates from deep Q-networks
-(DQN) in the context of dialogue management. We perform an extensive benchmark
-of deep Bayesian methods to extract uncertainty estimates, namely
-Bayes-By-Backprop, dropout, its concrete variation, bootstrapped ensemble and
-alpha-divergences, combining them with the DQN algorithm.
-"
-6293,1711.11508,"Ming Liu, Bo Lang, and Zepeng Gu","Calculating Semantic Similarity between Academic Articles using Topic
-  Event and Ontology",cs.CL cs.AI cs.IR," Determining semantic similarity between academic documents is crucial to many
-tasks such as plagiarism detection, automatic technical survey and semantic
-search. Current studies mostly focus on semantic similarity between concepts,
-sentences and short text fragments. However, document-level semantic matching
-is still based on statistical information at the surface level, neglecting
-article structures and global semantic meanings, which may cause deviations in
-document understanding. In this paper, we focus on the document-level semantic
-similarity issue for academic literature with a novel method. We represent
-academic articles with topic events that utilize multiple information profiles,
-such as research purposes, methodologies and domains, to integrally describe
-the research work, and calculate the similarity between topic events based on
-the domain ontology to acquire the semantic similarity between articles.
-Experiments show that our approach achieves significant performance gains
-compared to state-of-the-art methods.
-"
-6294,1711.11513,Michael Moortgat and Gijs Wijnholds,"Lexical and Derivational Meaning in Vector-Based Models of
-  Relativisation",cs.CL," Sadrzadeh et al. (2013) present a compositional distributional analysis of
-relative clauses in English in terms of the Frobenius algebraic structure of
-finite dimensional vector spaces. The analysis relies on distinct type
-assignments and lexical recipes for subject vs object relativisation. The
-situation for Dutch is different: because of the verb-final nature of Dutch,
-relative clauses are ambiguous between a subject vs object relativisation
-reading.
Using an extended version of Lambek calculus, we present a -compositional distributional framework that accounts for this derivational -ambiguity, and that allows us to give a single meaning recipe for the relative -pronoun reconciling the Frobenius semantics with the demands of Dutch -derivational syntax. -" -6295,1711.11543,"Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, - Dhruv Batra",Embodied Question Answering,cs.CV cs.AI cs.CL cs.LG," We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where -an agent is spawned at a random location in a 3D environment and asked a -question (""What color is the car?""). In order to answer, the agent must first -intelligently navigate to explore the environment, gather information through -first-person (egocentric) vision, and then answer the question (""orange""). - This challenging task requires a range of AI skills -- active perception, -language understanding, goal-driven navigation, commonsense reasoning, and -grounding of language into actions. In this work, we develop the environments, -end-to-end-trained reinforcement learning agents, and evaluation protocols for -EmbodiedQA. -" -6296,1712.00044,"Hussam Hamdan, Jean-Gabriel Ganascia",Graph Centrality Measures for Boosting Popularity-Based Entity Linking,cs.CL cs.IR cs.SI," Many Entity Linking systems use collective graph-based methods to -disambiguate the entity mentions within a document. Most of them have focused -on graph construction and initial weighting of the candidate entities, less -attention has been devoted to compare the graph ranking algorithms. In this -work, we focus on the graph-based ranking algorithms, therefore we propose to -apply five centrality measures: Degree, HITS, PageRank, Betweenness and -Closeness. A disambiguation graph of candidate entities is constructed for each -document using the popularity method, then centrality measures are applied to -choose the most relevant candidate to boost the results of entity popularity -method. We investigate the effectiveness of each centrality measure on the -performance across different domains and datasets. Our experiments show that a -simple and fast centrality measure such as Degree centrality can outperform -other more time-consuming measures. -" -6297,1712.00069,"Zeinab Noorian, Chlo\'e Pou-Prom, Frank Rudzicz",On the importance of normative data in speech-based assessment,cs.CL," Data sets for identifying Alzheimer's disease (AD) are often relatively -sparse, which limits their ability to train generalizable models. Here, we -augment such a data set, DementiaBank, with each of two normative data sets, -the Wisconsin Longitudinal Study and Talk2Me, each of which employs a -speech-based picture-description assessment. Through minority class -oversampling with ADASYN, we outperform state-of-the-art results in binary -classification of people with and without AD in DementiaBank. This work -highlights the effectiveness of combining sparse and difficult-to-acquire -patient data with relatively large and easily accessible normative datasets. -" -6298,1712.00170,"Heng Wang, Zengchang Qin, Tao Wan","Text Generation Based on Generative Adversarial Nets with Latent - Variable",cs.CL," In this paper, we propose a model using generative adversarial net (GAN) to -generate realistic text. Instead of using standard GAN, we combine variational -autoencoder (VAE) with generative adversarial net. 
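The five centrality measures listed in the entity-linking abstract above (1712.00044) are all available in networkx, so the ranking step can be shown directly on a toy disambiguation graph; the nodes and edges below are invented stand-ins for candidate entities and relatedness links.

```python
# Ranking candidate entities by graph centrality (toy graph, networkx).
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Paris_France", "Eiffel_Tower"),
                  ("Paris_France", "Seine"),
                  ("Eiffel_Tower", "Seine"),
                  ("Paris_Texas", "Texas")])

measures = {
    "degree": nx.degree_centrality(G),
    "pagerank": nx.pagerank(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "hits(hubs)": nx.hits(G)[0],
}
for name, scores in measures.items():
    print(name, "->", max(scores, key=scores.get))
```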
The use of high-level latent
-random variables is helpful for learning the data distribution and solving the
-problem that generative adversarial nets often emit very similar data. We
-propose the VGAN model, where the generative model is composed of a recurrent
-neural network and a VAE. The discriminative model is a convolutional neural
-network. We train the model via policy gradient. We apply the proposed model to
-the task of text generation and compare it to other recent neural network based
-models, such as the recurrent neural network language model and SeqGAN. We
-evaluate the performance of the model by calculating negative log-likelihood
-and the BLEU score. We conduct experiments on three benchmark datasets, and the
-results show that our model outperforms previous models.
-"
-6299,1712.00334,Fabio Paolizzo,Enabling Embodied Analogies in Intelligent Music Systems,cs.HC cs.CL cs.IR cs.LG cs.MM," The present methodology is aimed at cross-modal machine learning and uses
-multidisciplinary tools and methods drawn from a broad range of areas and
-disciplines, including music, systematic musicology, dance, motion capture,
-human-computer interaction, computational linguistics and audio signal
-processing. The main tasks include: (1) adapting wisdom-of-the-crowd approaches
-to embodiment in music and dance performance to create a dataset of music and
-music lyrics that covers a variety of emotions, (2) applying
-audio/language-informed machine learning techniques to that dataset to
-automatically identify the emotional content of the music and the lyrics, and
-(3) integrating motion capture data from a Vicon system and dancers performing
-on that music.
-"
-6300,1712.00377,"Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi","Don't Just Assume; Look and Answer: Overcoming Priors for Visual
-  Question Answering",cs.CV cs.AI cs.CL cs.LG," A number of studies have found that today's Visual Question Answering (VQA)
-models are heavily driven by superficial correlations in the training data and
-lack sufficient image grounding. To encourage development of models geared
-towards the latter, we propose a new setting for VQA where for every question
-type, train and test sets have different prior distributions of answers.
-Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we
-call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2
-respectively). First, we evaluate several existing VQA models under this new
-setting and show that their performance degrades significantly compared to the
-original VQA setting. Second, we propose a novel Grounded Visual Question
-Answering model (GVQA) that contains inductive biases and restrictions in the
-architecture specifically designed to prevent the model from 'cheating' by
-primarily relying on priors in the training data. Specifically, GVQA explicitly
-disentangles the recognition of visual concepts present in the image from the
-identification of the plausible answer space for a given question, enabling the
-model to more robustly generalize across different distributions of answers.
-GVQA is built off an existing VQA model -- Stacked Attention Networks (SAN).
-Our experiments demonstrate that GVQA significantly outperforms SAN on both
-VQA-CP v1 and VQA-CP v2 datasets. Interestingly, it also outperforms more
-powerful VQA models such as Multimodal Compact Bilinear Pooling (MCB) in
-several cases. GVQA offers strengths complementary to SAN when trained and
-evaluated on the original VQA v1 and VQA v2 datasets.
Finally, GVQA is more -transparent and interpretable than existing VQA models. -" -6301,1712.00489,"Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze",Visual Features for Context-Aware Speech Recognition,cs.CL cs.AI cs.CV cs.LG eess.AS," Automatic transcriptions of consumer-generated multi-media content such as -""Youtube"" videos still exhibit high word error rates. Such data typically -occupies a very broad domain, has been recorded in challenging conditions, with -cheap hardware and a focus on the visual modality, and may have been -post-processed or edited. In this paper, we extend our earlier work on adapting -the acoustic model of a DNN-based speech recognition system to an RNN language -model and show how both can be adapted to the objects and scenes that can be -automatically detected in the video. We are working on a corpus of ""how-to"" -videos from the web, and the idea is that an object that can be seen (""car""), -or a scene that is being detected (""kitchen"") can be used to condition both -models on the ""context"" of the recording, thereby reducing perplexity and -improving transcription. We achieve good improvements in both cases and compare -and analyze the respective reductions in word error rate. We expect that our -results can be used for any type of speech processing in which ""context"" -information is available, for example in robotics, man-machine interaction, or -when indexing large audio-visual archives, and should ultimately help to bring -together the ""video-to-text"" and ""speech-to-text"" communities. -" -6302,1712.00609,"Kang Min Yoo, Youhyun Shin, Sang-goo Lee",Improving Visually Grounded Sentence Representations with Self-Attention,cs.CL," Sentence representation models trained only on language could potentially -suffer from the grounding problem. Recent work has shown promising results in -improving the qualities of sentence representations by jointly training them -with associated image features. However, the grounding capability is limited -due to distant connection between input sentences and image features by the -design of the architecture. In order to further close the gap, we propose -applying self-attention mechanism to the sentence encoder to deepen the -grounding effect. Our results on transfer tasks show that self-attentive -encoders are better for visual grounding, as they exploit specific words with -strong visual associations. -" -6303,1712.00725,"Laura Graesser, Abhinav Gupta, Lakshay Sharma, Evelina Bakhturina",Sentiment Classification using Images and Label Embeddings,cs.CL cs.AI cs.CV cs.LG stat.ML," In this project we analysed how much semantic information images carry, and -how much value image data can add to sentiment analysis of the text associated -with the images. To better understand the contribution from images, we compared -models which only made use of image data, models which only made use of text -data, and models which combined both data types. We also analysed if this -approach could help sentiment classifiers generalize to unknown sentiments. -" -6304,1712.00733,"Guohao Li, Hang Su, Wenwu Zhu","Incorporating External Knowledge to Answer Open-Domain Visual Questions - with Dynamic Memory Networks",cs.CV cs.CL," Visual Question Answering (VQA) has attracted much attention since it offers -insight into the relationships between the multi-modal analysis of images and -natural language. 
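The self-attention referenced in the grounded-sentence abstract above (1712.00609) reduces, in its simplest single-head form without learned projections, to a row-wise softmax over pairwise dot products. A numpy sketch:

```python
# Single-head scaled dot-product self-attention (no learned projections).
import numpy as np

def self_attention(X):
    """X: (seq_len, d) word vectors; returns attended representations."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)   # each row attends over the sentence
    return w @ X

X = np.random.randn(6, 16)              # a 6-word sentence, 16-d embeddings
print(self_attention(X).shape)          # (6, 16)
```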
Most of the current algorithms are incapable of answering -open-domain questions that require to perform reasoning beyond the image -contents. To address this issue, we propose a novel framework which endows the -model capabilities in answering more complex questions by leveraging massive -external knowledge with dynamic memory networks. Specifically, the questions -along with the corresponding images trigger a process to retrieve the relevant -information in external knowledge bases, which are embedded into a continuous -vector space by preserving the entity-relation structures. Afterwards, we -employ dynamic memory networks to attend to the large body of facts in the -knowledge graph and images, and then perform reasoning over these facts to -generate corresponding answers. Extensive experiments demonstrate that our -model not only achieves the state-of-the-art performance in the visual question -answering task, but can also answer open-domain questions effectively by -leveraging the external knowledge. -" -6305,1712.00991,"Girish Keshav Palshikar, Sachin Pawar, Saheb Chourasia, Nitin - Ramrakhiyani",Mining Supervisor Evaluation and Peer Feedback in Performance Appraisals,cs.CL cs.AI," Performance appraisal (PA) is an important HR process to periodically measure -and evaluate every employee's performance vis-a-vis the goals established by -the organization. A PA process involves purposeful multi-step multi-modal -communication between employees, their supervisors and their peers, such as -self-appraisal, supervisor assessment and peer feedback. Analysis of the -structured data and text produced in PA is crucial for measuring the quality of -appraisals and tracking actual improvements. In this paper, we apply text -mining techniques to produce insights from PA text. First, we perform sentence -classification to identify strengths, weaknesses and suggestions of -improvements found in the supervisor assessments and then use clustering to -discover broad categories among them. Next we use multi-class multi-label -classification techniques to match supervisor assessments to predefined broad -perspectives on performance. Finally, we propose a short-text summarization -technique to produce a summary of peer feedback comments for a given employee -and compare it with manual summaries. All techniques are illustrated using a -real-life dataset of supervisor assessment and peer feedback text produced -during the PA of 4528 employees in a large multi-national IT company. -" -6306,1712.01097,"Thomas Kollar, Stefanie Tellex, Matthew Walter, Albert Huang, Abraham - Bachrach, Sachi Hemachandra, Emma Brunskill, Ashis Banerjee, Deb Roy, Seth - Teller and Nicholas Roy","Generalized Grounding Graphs: A Probabilistic Framework for - Understanding Grounded Commands",cs.CL cs.RO," Many task domains require robots to interpret and act upon natural language -commands which are given by people and which refer to the robot's physical -surroundings. Such interpretation is known variously as the symbol grounding -problem, grounded semantics and grounded language acquisition. This problem is -challenging because people employ diverse vocabulary and grammar, and because -robots have substantial uncertainty about the nature and contents of their -surroundings, making it difficult to associate the constitutive language -elements (principally noun phrases and spatial relations) of the command text -to elements of those surroundings. 
Symbolic models capture linguistic structure
-but have not scaled successfully to handle the diverse language produced by
-untrained users. Existing statistical approaches can better handle diversity,
-but have not to date modeled complex linguistic structure, limiting achievable
-accuracy. Recent hybrid approaches have addressed limitations in scaling and
-complexity, but have not effectively associated linguistic and perceptual
-features. Our framework, called Generalized Grounding Graphs (G^3), addresses
-these issues by defining a probabilistic graphical model dynamically according
-to the linguistic parse structure of a natural language command. This approach
-scales effectively, handles linguistic diversity, and enables the system to
-associate parts of a command with the specific objects, places, and events in
-the external world to which they refer. We show that robots can learn word
-meanings and use those learned meanings to robustly follow natural language
-commands produced by untrained users. We demonstrate our approach for both
-mobility commands and mobile manipulation commands involving a variety of
-semi-autonomous robotic platforms, including a wheelchair, a micro-air vehicle,
-a forklift, and the Willow Garage PR2.
-"
-6307,1712.01213,"Elena Tutubalina, Zulfat Miftahutdinov",An Encoder-Decoder Model for ICD-10 Coding of Death Certificates,cs.CL cs.CY," Information extraction from textual documents such as hospital records and
-health-related user discussions has become a topic of intense interest. The
-task of medical concept coding is to map a variable length text to medical
-concepts and corresponding classification codes in some external system or
-ontology. In this work, we utilize recurrent neural networks to automatically
-assign ICD-10 codes to fragments of death certificates written in English. We
-develop end-to-end neural architectures directly tailored to the task,
-including a basic encoder-decoder architecture for statistical translation. In
-order to incorporate prior knowledge, we concatenate a vector of cosine
-similarities between the text and dictionary entries to the encoded state.
-Applied to a standard benchmark from the CLEF eHealth 2017 challenge, our model
-achieved an F-measure of 85.01% on the full test set, a significant improvement
-over the average score of 62.2% across all official participants' approaches.
-"
-6308,1712.01238,"Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta,
-  Laurens van der Maaten",Learning by Asking Questions,cs.CV cs.CL cs.LG," We introduce an interactive learning framework for the development and
-testing of intelligent visual systems, called learning-by-asking (LBA). We
-explore LBA in the context of the Visual Question Answering (VQA) task. LBA
-differs from standard VQA training in that most questions are not observed
-during training time, and the learner must ask questions it wants answers to.
-Thus, LBA more closely mimics natural learning and has the potential to be more
-data-efficient than the traditional VQA setting. We present a model that
-performs LBA on the CLEVR dataset, and show that it automatically discovers an
-easy-to-hard curriculum when learning interactively from an oracle. Our LBA
-generated data consistently matches or outperforms the CLEVR train data and is
-more sample efficient. We also show that our model asks questions that
-generalize to state-of-the-art VQA models and to novel test time distributions.
-" -6309,1712.01329,"Mircea Mironenco, Dana Kianfar, Ke Tran, Evangelos Kanoulas, - Efstratios Gavves",Examining Cooperation in Visual Dialog Models,cs.CV cs.AI cs.CL cs.IR cs.LG," In this work we propose a blackbox intervention method for visual dialog -models, with the aim of assessing the contribution of individual linguistic or -visual components. Concretely, we conduct structured or randomized -interventions that aim to impair an individual component of the model, and -observe changes in task performance. We reproduce a state-of-the-art visual -dialog model and demonstrate that our methodology yields surprising insights, -namely that both dialog and image information have minimal contributions to -task performance. The intervention method presented here can be applied as a -sanity check for the strength and robustness of each component in visual dialog -systems. -" -6310,1712.01411,"Ian Stewart and Stevie Chancellor and Munmun De Choudhury and Jacob - Eisenstein","#anorexia, #anarexia, #anarexyia: Characterizing Online Community - Practices with Orthographic Variation",cs.CL cs.SI," Distinctive linguistic practices help communities build solidarity and -differentiate themselves from outsiders. In an online community, one such -practice is variation in orthography, which includes spelling, punctuation, and -capitalization. Using a dataset of over two million Instagram posts, we -investigate orthographic variation in a community that shares pro-eating -disorder (pro-ED) content. We find that not only does orthographic variation -grow more frequent over time, it also becomes more profound or deep, with -variants becoming increasingly distant from the original: as, for example, -#anarexyia is more distant than #anarexia from the original spelling #anorexia. -These changes are driven by newcomers, who adopt the most extreme linguistic -practices as they enter the community. Moreover, this behavior correlates with -engagement: the newcomers who adopt deeper orthographic variants tend to remain -active for longer in the community, and the posts that contain deeper variation -receive more positive feedback in the form of ""likes."" Previous work has linked -community membership change with language change, and our work casts this -connection in a new light, with newcomers driving an evolving practice, rather -than adapting to it. We also demonstrate the utility of orthographic variation -as a new lens to study sociolinguistic change in online communities, -particularly when the change results from an exogenous force such as a content -ban. -" -6311,1712.01455,"Zhiqian Chen, Xuchao Zhang, Arnold P. Boedihardjo, Jing Dai and - Chang-Tien Lu",Multimodal Storytelling via Generative Adversarial Imitation Learning,cs.AI cs.CL cs.CV," Deriving event storylines is an effective summarization method to succinctly -organize extensive information, which can significantly alleviate the pain of -information overload. The critical challenge is the lack of widely recognized -definition of storyline metric. Prior studies have developed various approaches -based on different assumptions about users' interests. These works can extract -interesting patterns, but their assumptions do not guarantee that the derived -patterns will match users' preference. On the other hand, their exclusiveness -of single modality source misses cross-modality information. 
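The notion of 'depth' of an orthographic variant in the pro-ED study above (1712.01411) can be operationalized as edit distance from the original spelling; the Levenshtein sketch below reproduces the abstract's example that #anarexyia is more distant than #anarexia from #anorexia. Treating depth as plain edit distance is this editor's simplification, not necessarily the paper's exact measure.

```python
# Depth of an orthographic variant as Levenshtein distance (simplification).
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

for tag in ["anorexia", "anarexia", "anarexyia"]:
    print(tag, levenshtein("anorexia", tag))
# anorexia 0, anarexia 1, anarexyia 2
```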
This paper
-proposes a method, multimodal imitation learning via generative adversarial
-networks (MIL-GAN), to directly model users' interests as reflected by various
-data. In particular, the proposed model addresses the critical challenge by
-imitating users' demonstrated storylines. Our proposed model is designed to
-learn the reward patterns given user-provided storylines and then applies the
-learned policy to unseen data. The proposed approach is demonstrated to be
-capable of acquiring the user's implicit intent and outperforming competing
-methods by a substantial margin in a user study.
-"
-6312,1712.01460,Willie Boag and Hassan Kan\'e,AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus,cs.CL," In recent years, word embeddings have been surprisingly effective at
-capturing intuitive characteristics of the words they represent. These vectors
-achieve the best results when training corpora are extremely large, sometimes
-billions of words. Clinical natural language processing datasets, however, tend
-to be much smaller. Even the largest publicly-available dataset of medical
-notes is three orders of magnitude smaller than the dataset of the oft-used
-""Google News"" word vectors. In order to make up for limited training data
-sizes, we encode expert domain knowledge into our embeddings. Building on a
-previous extension of word2vec, we show that generalizing the notion of a
-word's ""context"" to include arbitrary features creates an avenue for encoding
-domain knowledge into word embeddings. We show that the word vectors produced
-by this method outperform their text-only counterparts across the board in
-correlation with clinical experts.
-"
-6313,1712.01476,"J\'ulio Hoffimann, Youli Mao, Avinash Wesley and Aimee Taylor","Sequence Mining and Pattern Analysis in Drilling Reports with Deep
-  Natural Language Processing",cs.CL," Drilling activities in the oil and gas industry have been reported over
-decades for thousands of wells on a daily basis, yet the analysis of this text
-at large scale for information retrieval, sequence mining, and pattern analysis
-is very challenging. Drilling reports contain interpretations written by
-drillers from noting measurements in downhole sensors and surface equipment,
-and can be used for operation optimization and accident mitigation. In this
-initial work, a methodology is proposed for automatic classification of
-sentences written in drilling reports into three relevant labels (EVENT,
-SYMPTOM and ACTION) for hundreds of wells in an actual field. Some of the main
-challenges in the text corpus were overcome, which include the high frequency
-of technical symbols, mistyping/abbreviation of technical terms, and the
-presence of incomplete sentences in the drilling reports. We obtain
-state-of-the-art classification accuracy within this technical language and
-illustrate advanced queries enabled by the tool.
-"
-6314,1712.01562,"Kuntal Dey, Ritvik Shrivastava, Saroj Kaushik and L. Venkata
-  Subramaniam","EmTaggeR: A Word Embedding Based Novel Method for Hashtag Recommendation
-  on Twitter",cs.CL cs.IR cs.SI," The hashtag recommendation problem addresses recommending (suggesting) one or
-more hashtags to explicitly tag a post made on a given social network platform,
-based upon the content and context of the post. In this work, we propose a
-novel methodology for hashtag recommendation for microblog posts, specifically
-Twitter.
The methodology, EmTaggeR, is built upon a training-testing framework
-that builds on top of the concept of word embedding. The training phase
-consists of learning word vectors associated with each hashtag, and deriving a
-word embedding for each hashtag. We provide two training procedures, one in
-which each hashtag is trained with a separate word embedding model applicable
-in the context of that hashtag, and another in which each hashtag obtains its
-embedding from a global context. The testing phase consists of computing the
-average word embedding of the test post, and finding the similarity of this
-embedding with the known embeddings of the hashtags. The tweets that contain
-the most-similar hashtag are extracted, and all the hashtags that appear in
-these tweets are ranked in terms of embedding similarity scores. The top-K
-hashtags that appear in this ranked list are recommended for the given test
-post. Our system produces an F1 score of 50.83%, improving over the LDA
-baseline by around 6.53 times, outperforming the best-performing system known
-in the literature that provides a lift of 6.42 times. EmTaggeR is a fast,
-scalable and lightweight system, which makes it practical to deploy in
-real-life applications.
-"
-6315,1712.01586,"Zhixing Tan, Mingxuan Wang, Jun Xie, Yidong Chen, Xiaodong Shi",Deep Semantic Role Labeling with Self-Attention,cs.CL," Semantic Role Labeling (SRL) is believed to be a crucial step towards natural
-language understanding and has been widely studied. In recent years,
-end-to-end SRL with recurrent neural networks (RNN) has gained increasing
-attention. However, it remains a major challenge for RNNs to handle structural
-information and long-range dependencies. In this paper, we present a simple
-and effective architecture for SRL which aims to address these problems. Our
-model is based on self-attention which can directly capture the relationships
-between two tokens regardless of their distance. Our single model achieves
-F$_1=83.4$ on the CoNLL-2005 shared task dataset and F$_1=82.7$ on the
-CoNLL-2012 shared task dataset, which outperforms the previous
-state-of-the-art results by $1.8$ and $1.0$ F$_1$ score, respectively.
-Besides, our model is computationally efficient, and the parsing speed is 50K
-tokens per second on a single Titan X GPU.
-"
-6316,1712.01719,"Kevin Shu, Andrew Ortegaray, Robert Berwick, Matilde Marcolli","Phylogenetics of Indo-European Language families via an
-  Algebro-Geometric Analysis of their Syntactic Structures",cs.CL," Using Phylogenetic Algebraic Geometry, we analyze computationally the
-phylogenetic tree of subfamilies of the Indo-European language family, using
-data of syntactic structures. The two main sources of syntactic data are the
-SSWL database and Longobardi's recent data of syntactic parameters. We compute
-phylogenetic invariants and likelihood functions for two sets of Germanic
-languages, a set of Romance languages, a set of Slavic languages and a set of
-early Indo-European languages, and we compare the results with what is known
-through historical linguistics.
-"
-6317,1712.01741,Svetlana Kiritchenko and Saif M. Mohammad,"Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing
-  and Best-Worst Scaling",cs.CL," Access to word-sentiment associations is useful for many applications,
-including sentiment analysis, stance detection, and linguistic analysis.
-However, manually assigning fine-grained sentiment association scores to words
-poses many challenges with respect to keeping annotations consistent. We apply
-the annotation technique of Best-Worst Scaling to obtain real-valued sentiment
-association scores for words and phrases in three different domains: general
-English, English Twitter, and Arabic Twitter. We show that on all three
-domains the ranking of words by sentiment remains remarkably consistent even
-when the annotation process is repeated with a different set of annotators. We
-also, for the first time, determine the minimum difference in sentiment
-association that is perceptible to native speakers of a language.
-"
-6318,1712.01765,Svetlana Kiritchenko and Saif M. Mohammad,"Best-Worst Scaling More Reliable than Rating Scales: A Case Study on
-  Sentiment Intensity Annotation",cs.CL," Rating scales are a widely used method for data annotation; however, they
-present several challenges, such as difficulty in maintaining inter- and
-intra-annotator consistency. Best-worst scaling (BWS) is an alternative method
-of annotation that is claimed to produce high-quality annotations while
-keeping the required number of annotations similar to that of rating scales.
-However, the veracity of this claim has never been systematically established.
-Here for the first time, we set up an experiment that directly compares the
-rating scale method with BWS. We show that with the same total number of
-annotations, BWS produces significantly more reliable results than the rating
-scale.
-"
-6319,1712.01769,"Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar,
-  Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao,
-  Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani",State-of-the-art Speech Recognition With Sequence-to-Sequence Models,cs.CL cs.SD eess.AS stat.ML," Attention-based encoder-decoder architectures such as Listen, Attend, and
-Spell (LAS), subsume the acoustic, pronunciation and language model components
-of a traditional automatic speech recognition (ASR) system into a single
-neural network. In previous work, we have shown that such architectures are
-comparable to state-of-the-art ASR systems on dictation tasks, but it was not
-clear if such architectures would be practical for more challenging tasks such
-as voice search. In this work, we explore a variety of structural and
-optimization improvements to our LAS model which significantly improve
-performance. On the structural side, we show that word piece models can be
-used instead of graphemes. We also introduce a multi-head attention
-architecture, which offers improvements over the commonly-used single-head
-attention. On the optimization side, we explore synchronous training,
-scheduled sampling, label smoothing, and minimum word error rate optimization,
-which are all shown to improve accuracy. We present results with a
-unidirectional LSTM encoder for streaming recognition. On a 12,500 hour voice
-search task, we find that the proposed changes improve the WER from 9.2% to
-5.6%, while the best conventional system achieves 6.7%; on a dictation task
-our model achieves a WER of 4.1% compared to 5% for the conventional system.
-"
-6320,1712.01794,Svetlana Kiritchenko and Saif M. Mohammad,"The Effect of Negators, Modals, and Degree Adverbs on Sentiment
-  Composition",cs.CL," Negators, modals, and degree adverbs can significantly affect the sentiment
-of the words they modify.
Often, their impact is modeled with simple heuristics, although recent work
-has shown that such heuristics do not capture the true sentiment of multi-word
-phrases. We created a dataset of phrases that include various negators,
-modals, and degree adverbs, as well as their combinations. Both the phrases
-and their constituent content words were annotated with real-valued scores of
-sentiment association. Using phrasal terms in the created dataset, we analyze
-the impact of individual modifiers and the average effect of the groups of
-modifiers on overall sentiment. We find that the effect of modifiers varies
-substantially among the members of the same group. Furthermore, each
-individual modifier can affect sentiment words in different ways. Therefore,
-solutions based on statistical learning seem more promising than fixed
-hand-crafted rules on the task of automatic sentiment prediction.
-"
-6321,1712.01797,Avirup Sil and Radu Florian,One for All: Towards Language Independent Named Entity Linking,cs.CL," Entity linking (EL) is the task of disambiguating mentions in text by
-associating them with entries in a predefined database of mentions (persons,
-organizations, etc.). Most previous EL research has focused mainly on one
-language, English, with less attention being paid to other languages, such as
-Spanish or Chinese. In this paper, we introduce LIEL, a Language Independent
-Entity Linking system, which provides an EL framework that, once trained on
-one language, works remarkably well on a number of different languages without
-change. LIEL makes a joint global prediction over the entire document,
-employing a discriminative reranking framework with many domain and
-language-independent feature functions. Experiments on numerous benchmark
-datasets show that the proposed system, once trained on one language, English,
-outperforms several state-of-the-art systems in English (by 4 points) and the
-trained model also works very well on Spanish (14 points better than a
-competitor system), demonstrating the viability of the approach.
-"
-6322,1712.01807,"Tara N. Sainath, Chung-Cheng Chiu, Rohit Prabhavalkar, Anjuli Kannan,
-  Yonghui Wu, Patrick Nguyen, Zhifeng Chen",Improving the Performance of Online Neural Transducer Models,cs.CL eess.AS stat.ML," Having a sequence-to-sequence model which can operate in an online fashion
-is important for streaming applications such as Voice Search. The neural
-transducer (NT) is a streaming sequence-to-sequence model, but has shown a
-significant degradation in performance compared to non-streaming models such
-as Listen, Attend and Spell (LAS). In this paper, we present various
-improvements to NT. Specifically, we look at increasing the window over which
-NT computes attention, mainly by looking backwards in time so the model still
-remains online. In addition, we explore initializing an NT model from a
-LAS-trained model so that it is guided with a better alignment. Finally, we
-explore including stronger language models, such as using wordpiece models,
-and applying an external LM during the beam search. On a Voice Search task, we
-find that with these improvements we can get NT to match the performance of
-LAS.
-"
-6323,1712.01813,Avirup Sil and Gourab Kundu and Radu Florian and Wael Hamza,Neural Cross-Lingual Entity Linking,cs.CL," A major challenge in Entity Linking (EL) is making effective use of
-contextual information to disambiguate mentions to Wikipedia that might refer
-to different entities in different contexts.
The problem is exacerbated in cross-lingual EL, which involves linking
-mentions written in non-English documents to entries in the English Wikipedia:
-to compare textual clues across languages we need to compute similarity
-between textual fragments across languages. In this paper, we propose a neural
-EL model that trains fine-grained similarities and dissimilarities between the
-query and candidate document from multiple perspectives, combined with
-convolution and tensor networks. Further, we show that this English-trained
-system can be applied, in zero-shot learning, to other languages by making
-surprisingly effective use of multi-lingual embeddings. The proposed system
-shows strong empirical evidence, yielding state-of-the-art results in English
-as well as cross-lingually on the Spanish and Chinese TAC 2015 datasets.
-"
-6324,1712.01818,"Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen,
-  Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan","Minimum Word Error Rate Training for Attention-based
-  Sequence-to-Sequence Models",cs.CL eess.AS stat.ML," Sequence-to-sequence models, such as attention-based models in automatic
-speech recognition (ASR), are typically trained to optimize the cross-entropy
-criterion, which corresponds to improving the log-likelihood of the data.
-However, system performance is usually measured in terms of word error rate
-(WER), not log-likelihood. Traditional ASR systems benefit from discriminative
-sequence training which optimizes criteria such as the state-level minimum
-Bayes risk (sMBR) which are more closely related to WER. In the present work,
-we explore techniques to train attention-based models to directly minimize
-expected word error rate. We consider two loss functions which approximate the
-expected number of word errors: either by sampling from the model, or by using
-N-best lists of decoded hypotheses, which we find to be more effective than
-the sampling-based method. In experimental evaluations, we find that the
-proposed training procedure improves performance by up to 8.2% relative to the
-baseline system. This allows us to train grapheme-based, uni-directional
-attention-based models which match the performance of a traditional,
-state-of-the-art, discriminative sequence-trained system on a mobile
-voice-search task.
-"
-6325,1712.01821,"Mercedes Garc\'ia-Mart\'inez and Lo\""ic Barrault and Fethi Bougares",Neural Machine Translation by Generating Multiple Linguistic Factors,cs.CL," Factored neural machine translation (FNMT) is founded on the idea of using
-the morphological and grammatical decomposition of the words (factors) at the
-output side of the neural network. This architecture addresses two well-known
-problems occurring in MT, namely the size of the target language vocabulary
-and the number of unknown tokens produced in the translation. The FNMT system
-is designed to manage a larger vocabulary and reduce the training time (for
-systems with equivalent target language vocabulary size). Moreover, we can
-produce grammatically correct words that are not part of the vocabulary. The
-FNMT model is evaluated on the IWSLT'15 English-to-French task and compared to
-the baseline word-based and BPE-based NMT systems. Promising qualitative and
-quantitative results (in terms of BLEU and METEOR) are reported.
-"
-6326,1712.01864,"Tara N. Sainath, Rohit Prabhavalkar, Shankar Kumar, Seungji Lee,
-  Anjuli Kannan, David Rybach, Vlad Schogol, Patrick Nguyen, Bo Li, Yonghui Wu,
-  Zhifeng Chen, Chung-Cheng Chiu","No Need for a Lexicon?
Evaluating the Value of the Pronunciation Lexica
-  in End-to-End Models",cs.CL cs.SD eess.AS stat.ML," For decades, context-dependent phonemes have been the dominant sub-word unit
-for conventional acoustic modeling systems. This status quo has begun to be
-challenged recently by end-to-end models which seek to combine acoustic,
-pronunciation, and language model components into a single neural network.
-Such systems, which typically predict graphemes or words, simplify the
-recognition process since they remove the need for a separate expert-curated
-pronunciation lexicon to map from phoneme-based units to words. However, there
-has been little previous work comparing phoneme-based versus grapheme-based
-sub-word units in the end-to-end modeling framework, to determine whether the
-gains from such approaches are primarily due to the new probabilistic model,
-or from the joint learning of the various components with grapheme-based
-units.
- In this work, we conduct detailed experiments which are aimed at quantifying
-the value of phoneme-based pronunciation lexica in the context of end-to-end
-models. We examine phoneme-based end-to-end models, which are contrasted
-against grapheme-based ones on a large-vocabulary English Voice-search task,
-where we find that graphemes do indeed outperform phonemes. We also compare
-grapheme- and phoneme-based approaches on a multi-dialect English task, which
-once again confirms the superiority of graphemes, greatly simplifying the
-system for recognizing multiple dialects.
-"
-6327,1712.01969,"Salman Mohammed, Peng Shi, Jimmy Lin","Strong Baselines for Simple Question Answering over Knowledge Graphs
-  with and without Neural Networks",cs.CL," We examine the problem of question answering over knowledge graphs, focusing
-on simple questions that can be answered by the lookup of a single fact.
-Adopting a straightforward decomposition of the problem into entity detection,
-entity linking, relation prediction, and evidence combination, we explore
-simple yet strong baselines. On the popular SimpleQuestions dataset, we find
-that basic LSTMs and GRUs plus a few heuristics yield accuracies that approach
-the state of the art, and techniques that do not use neural networks also
-perform reasonably well. These results show that gains from sophisticated deep
-learning techniques proposed in the literature are quite modest and that some
-previous models exhibit unnecessary complexity.
-"
-6328,1712.01996,"Anjuli Kannan, Yonghui Wu, Patrick Nguyen, Tara N. Sainath, Zhifeng
-  Chen, Rohit Prabhavalkar","An analysis of incorporating an external language model into a
-  sequence-to-sequence model",eess.AS cs.AI cs.CL cs.SD," Attention-based sequence-to-sequence models for automatic speech recognition
-jointly train an acoustic model, language model, and alignment mechanism.
-Thus, the language model component is only trained on transcribed audio-text
-pairs. This leads to the use of shallow fusion with an external language model
-at inference time. Shallow fusion refers to log-linear interpolation with a
-separately trained language model at each step of the beam search. In this
-work, we investigate the behavior of shallow fusion across a range of
-conditions: different types of language models, different decoding units, and
-different tasks.
On Google Voice Search, we demonstrate that the use of shallow fusion with a
-neural LM with wordpieces yields a 9.1% relative word error rate reduction
-(WERR) over our competitive attention-based sequence-to-sequence model,
-obviating the need for second-pass rescoring.
-"
-6329,1712.02016,Hu Xu and Sihong Xie and Lei Shu and Philip S. Yu,"Dual Attention Network for Product Compatibility and Function
-  Satisfiability Analysis",cs.CL," Product compatibility and functionality are of utmost importance to
-customers when they purchase products, and to sellers and manufacturers when
-they sell products. Due to the huge number of products available online, it is
-infeasible to enumerate and test the compatibility and functionality of every
-product. In this paper, we address two closely related problems: product
-compatibility analysis and function satisfiability analysis, where the second
-problem is a generalization of the first problem (e.g., whether a product
-works with another product can be considered as a special function). We first
-identify a novel question answering corpus that is up-to-date regarding
-product compatibility and functionality information. To allow automatic
-discovery of product compatibility and functionality, we then propose a deep
-learning model called Dual Attention Network (DAN). Given a QA pair for a
-to-be-purchased product, DAN learns to 1) discover complementary products (or
-functions), and 2) accurately predict the actual compatibility (or
-satisfiability) of the discovered products (or functions). The challenges
-addressed by the model include the briefness of QAs, linguistic patterns
-indicating compatibility, and the appropriate fusion of questions and answers.
-We conduct experiments to quantitatively and qualitatively show that the
-identified products and functions have both high coverage and accuracy,
-compared with a wide spectrum of baselines.
-"
-6330,1712.02034,"Garrett B. Goh, Nathan O. Hodas, Charles Siegel, Abhinav Vishnu","SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for
-  Predicting Chemical Properties",stat.ML cs.AI cs.CL cs.LG," Chemical databases store information in text representations, and the SMILES
-format is a universal standard used in many cheminformatics software. Encoded
-in each SMILES string is structural information that can be used to predict
-complex chemical properties. In this work, we develop SMILES2vec, a deep RNN
-that automatically learns features from SMILES to predict chemical properties,
-without the need for additional explicit feature engineering. Using Bayesian
-optimization methods to tune the network architecture, we show that an
-optimized SMILES2vec model can serve as a general-purpose neural network for
-predicting distinct chemical properties including toxicity, activity,
-solubility and solvation energy, while also outperforming contemporary MLP
-neural networks that use engineered features. Furthermore, we demonstrate
-proof-of-concept of interpretability by developing an explanation mask that
-localizes on the most important characters used in making a prediction. When
-tested on the solubility dataset, it identified specific parts of a chemical
-that are consistent with established first-principles knowledge with an
-accuracy of 88%. Our work demonstrates that neural networks can learn
-technically accurate chemical concepts and provide state-of-the-art accuracy,
-making interpretable deep neural networks a useful tool of relevance to the
-chemical industry.
-" -6331,1712.02047,Jinbae Im and Sungzoon Cho,Distance-based Self-Attention Network for Natural Language Inference,cs.CL cs.AI," Attention mechanism has been used as an ancillary means to help RNN or CNN. -However, the Transformer (Vaswani et al., 2017) recently recorded the -state-of-the-art performance in machine translation with a dramatic reduction -in training time by solely using attention. Motivated by the Transformer, -Directional Self Attention Network (Shen et al., 2017), a fully attention-based -sentence encoder, was proposed. It showed good performance with various data by -using forward and backward directional information in a sentence. But in their -study, not considered at all was the distance between words, an important -feature when learning the local dependency to help understand the context of -input text. We propose Distance-based Self-Attention Network, which considers -the word distance by using a simple distance mask in order to model the local -dependency without losing the ability of modeling global dependency which -attention has inherent. Our model shows good performance with NLI data, and it -records the new state-of-the-art result with SNLI data. Additionally, we show -that our model has a strength in long sentences or documents. -" -6332,1712.02109,"Hao Xiong, Zhongjun He, Xiaoguang Hu and Hua Wu",Multi-channel Encoder for Neural Machine Translation,cs.CL," Attention-based Encoder-Decoder has the effective architecture for neural -machine translation (NMT), which typically relies on recurrent neural networks -(RNN) to build the blocks that will be lately called by attentive reader during -the decoding process. This design of encoder yields relatively uniform -composition on source sentence, despite the gating mechanism employed in -encoding RNN. On the other hand, we often hope the decoder to take pieces of -source sentence at varying levels suiting its own linguistic structure: for -example, we may want to take the entity name in its raw form while taking an -idiom as a perfectly composed unit. Motivated by this demand, we propose -Multi-channel Encoder (MCE), which enhances encoding components with different -levels of composition. More specifically, in addition to the hidden state of -encoding RNN, MCE takes 1) the original word embedding for raw encoding with no -composition, and 2) a particular design of external memory in Neural Turing -Machine (NTM) for more complex composition, while all three encoding strategies -are properly blended during decoding. Empirical study on Chinese-English -translation shows that our model can improve by 6.52 BLEU points upon a strong -open source NMT system: DL4MT1. On the WMT14 English- French task, our single -shallow system achieves BLEU=38.8, comparable with the state-of-the-art deep -models. -" -6333,1712.02121,"Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, Dinh Phung","A Novel Embedding Model for Knowledge Base Completion Based on - Convolutional Neural Network",cs.CL," In this paper, we propose a novel embedding model, named ConvKB, for -knowledge base completion. Our model ConvKB advances state-of-the-art models by -employing a convolutional neural network, so that it can capture global -relationships and transitional characteristics between entities and relations -in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) -is represented as a 3-column matrix where each column vector represents a -triple element. 
This 3-column matrix is then fed to a convolution layer where multiple
-filters operate on the matrix to generate different feature maps. These
-feature maps are then concatenated into a single feature vector representing
-the input triple. The feature vector is multiplied with a weight vector via a
-dot product to return a score. This score is then used to predict whether the
-triple is valid or not. Experiments show that ConvKB achieves better link
-prediction performance than previous state-of-the-art embedding models on the
-two benchmark datasets WN18RR and FB15k-237.
-"
-6334,1712.02186,"Hu Xu, Sihong Xie, Lei Shu, Philip S. Yu",Product Function Need Recognition via Semi-supervised Attention Network,cs.CL," Functionality is of utmost importance to customers when they purchase
-products. However, it is unclear to customers whether a product can really
-satisfy their needs on functions. Further, missing functions may be
-intentionally hidden by the manufacturers or the sellers. As a result, a
-customer needs to spend a fair amount of time before purchasing or just
-purchase the product at his/her own risk. In this paper, we first identify a
-novel QA corpus that is dense in product functionality information
-\footnote{The annotated corpus can be found at
-\url{https://www.cs.uic.edu/~hxu/}.}. We then design a neural network called
-Semi-supervised Attention Network (SAN) to discover product functions from
-questions. This model leverages unlabeled data as contextual information to
-perform semi-supervised sequence labeling. We conduct experiments to show that
-the extracted functions have both high coverage and accuracy, compared with a
-wide spectrum of baselines.
-"
-6335,1712.02223,"Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal
-  Lukasik, Kalina Bontcheva, Trevor Cohn, Isabelle Augenstein","Discourse-Aware Rumour Stance Classification in Social Media Using
-  Sequential Classifiers",cs.CL cs.SI," Rumour stance classification, defined as classifying the stance of specific
-social media posts into one of supporting, denying, querying or commenting on
-an earlier post, is becoming of increasing interest to researchers. While most
-previous work has focused on using individual tweets as classifier inputs,
-here we report on the performance of sequential classifiers that exploit the
-discourse features inherent in social media interactions or 'conversational
-threads'. Testing the effectiveness of four sequential classifiers -- Hawkes
-Processes, Linear-Chain Conditional Random Fields (Linear CRF), Tree-Structured
-Conditional Random Fields (Tree CRF) and Long Short Term Memory networks
-(LSTM) -- on eight datasets associated with breaking news stories, and looking
-at different types of local and contextual features, our work sheds new light
-on the development of accurate stance classifiers. We show that sequential
-classifiers that exploit the use of discourse properties in social media
-conversations while using only local features, outperform non-sequential
-classifiers. Furthermore, we show that an LSTM using a reduced set of features
-can outperform the other sequential classifiers; this performance is
-consistent across datasets and across types of stances. To conclude, our work
-also analyses the different features under study, identifying those that best
-help characterise and distinguish between stances, such as supporting tweets
-being more likely to be accompanied by evidence than denying tweets.
We also set forth a number of directions for future research.
-"
-6336,1712.02250,"Bolin Wei, Shuai Lu, Lili Mou, Hao Zhou, Pascal Poupart, Ge Li, Zhi
-  Jin","Why Do Neural Dialog Systems Generate Short and Meaningless Replies? A
-  Comparison between Dialog and Translation",cs.CL cs.LG," This paper addresses the question: Why do neural dialog systems generate
-short and meaningless replies? We conjecture that, in a dialog system, an
-utterance may have multiple equally plausible replies, causing the deficiency
-of neural networks in the dialog application. We propose a systematic way to
-mimic the dialog scenario in a machine translation system, and manage to
-reproduce the phenomenon of generating short and less meaningful sentences in
-the translation setting, showing evidence for our conjecture.
-"
-6337,1712.02259,"Nicolas Thiebaut, Antoine Simoulin, Karl Neuberger, Issam Ibnouhsein,
-  Nicolas Bousquet, Nathalie Reix, S\'ebastien Moli\`ere, Carole Mathelin",An innovative solution for breast cancer textual big data analysis,stat.ML cs.CL," The digitalization of stored information in hospitals now allows for the
-exploitation of medical data in text format, as electronic health records
-(EHRs), initially gathered for purposes other than epidemiology. Manual search
-and analysis operations on such data become tedious. In recent years, the use
-of natural language processing (NLP) tools was highlighted to automatize the
-extraction of information contained in EHRs, structure it and perform
-statistical analysis on this structured information. The main difficulty with
-the existing approaches is the requirement of synonym or ontology
-dictionaries, which are mostly available in English only and do not include
-local or custom notations. In this work, a team composed of oncologists as
-domain experts and data scientists develops a custom NLP-based system to
-process and structure textual clinical reports of patients suffering from
-breast cancer. The tool relies on the combination of standard text mining
-techniques and an advanced synonym detection method. It allows for a global
-analysis by retrieval of indicators such as medical history, tumor
-characteristics, therapeutic responses, recurrences and prognosis. The
-versatility of the method allows new indicators to be obtained easily, thus
-opening up the way for retrospective studies with a substantial reduction of
-the amount of manual work. With no need for biomedical annotators or
-pre-defined ontologies, this language-agnostic method reached a good
-extraction accuracy for several concepts of interest, according to a
-comparison with a manually structured file, without requiring any existing
-corpus with local or new notations.
-"
-6338,1712.02480,Paul Reisert and Naoya Inoue and Naoaki Okazaki and Kentaro Inui,"A Corpus of Deep Argumentative Structures as an Explanation to
-  Argumentative Relations",cs.CL," In this paper, we compose a new task for deep argumentative structure
-analysis that goes beyond shallow discourse structure analysis. The idea is
-that argumentative relations can reasonably be represented with a small set of
-predefined patterns. For example, using value judgment and bipolar causality,
-we can explain a support relation between two argumentative segments as
-follows: Segment 1 states that something is good, and Segment 2 states that it
-is good because it promotes something good when it happens.
We are motivated by the following questions: (i) how do we formulate the
-task, (ii) can a reasonable pattern set be created, and (iii) do the patterns
-work? To examine the task feasibility, we conduct a three-stage, detailed
-annotation study using 357 argumentative relations from the argumentative
-microtext corpus, a small, but highly reliable corpus. We report the coverage
-of explanations captured by our patterns on a test set composed of 270
-relations. Our coverage result of 74.6% indicates that argumentative relations
-can reasonably be explained by our small pattern set. Our agreement result of
-85.9% shows that a reasonable inter-annotator agreement can be achieved. To
-assist with future work in computational argumentation, the annotated corpus
-is made publicly available.
-"
-6339,1712.02555,"Han Xiao, Yidong Chen, Xiaodong Shi",Hungarian Layer: Logics Empowered Neural Architecture,cs.CL," Neural architecture is a purely numeric framework, which fits the data as a
-continuous function. However, lacking logic flow (e.g. \textit{if, for,
-while}), traditional algorithms (e.g. \textit{Hungarian algorithm, A$^*$
-search, decision tree algorithm}) could not be embedded into this paradigm,
-which limits the theories and applications. In this paper, we reformulate the
-calculus graph as a dynamic process, which is guided by logic flow. Within our
-novel methodology, traditional algorithms could empower numerical neural
-networks. Specifically, regarding the subject of sentence matching, we
-reformulate this issue as a task-assignment problem, which is solved by the
-Hungarian algorithm. First, our model applies a BiLSTM to parse the sentences.
-Then the Hungarian layer aligns the matching positions. Last, we transform the
-matching results for soft-max regression by another BiLSTM. Extensive
-experiments show that our model outperforms other state-of-the-art baselines
-substantially.
-"
-6340,1712.02767,"Sachin Pawar, Nitin Ramrakhiyani, Swapnil Hingmire and Girish K.
-  Palshikar","Topics and Label Propagation: Best of Both Worlds for Weakly Supervised
-  Text Classification",cs.CL cs.LG," We propose a Label Propagation based algorithm for weakly supervised text
-classification. We construct a graph where each document is represented by a
-node and edge weights represent similarities among the documents.
-Additionally, we discover underlying topics using Latent Dirichlet Allocation
-(LDA) and enrich the document graph by including the topics in the form of
-additional nodes. The edge weights between a topic and a text document
-represent the level of ""affinity"" between them. Our approach does not require
-document-level labelling; instead, it expects manual labels only for topic
-nodes. This significantly minimizes the level of supervision needed as only a
-few topics are observed to be enough for achieving sufficiently high accuracy.
-The Label Propagation Algorithm is employed on this enriched graph to
-propagate labels among the nodes. Our approach combines the advantages of
-Label Propagation (through document-document similarities) and Topic Modelling
-(for minimal but smart supervision). We demonstrate the effectiveness of our
-approach on various datasets and compare with state-of-the-art weakly
-supervised text classification approaches.
-" -6341,1712.02768,"Christy Li, Dimitris Konomis, Graham Neubig, Pengtao Xie, Carol Cheng, - Eric Xing",Convolutional Neural Networks for Medical Diagnosis from Admission Notes,cs.CL," $\textbf{Objective}$ Develop an automatic diagnostic system which only uses -textual admission information from Electronic Health Records (EHRs) and assist -clinicians with a timely and statistically proved decision tool. The hope is -that the tool can be used to reduce mis-diagnosis. - $\textbf{Materials and Methods}$ We use the real-world clinical notes from -MIMIC-III, a freely available dataset consisting of clinical data of more than -forty thousand patients who stayed in intensive care units of the Beth Israel -Deaconess Medical Center between 2001 and 2012. We proposed a Convolutional -Neural Network model to learn semantic features from unstructured textual input -and automatically predict primary discharge diagnosis. - $\textbf{Results}$ The proposed model achieved an overall 96.11% accuracy and -80.48% weighted F1 score values on 10 most frequent disease classes, -significantly outperforming four strong baseline models by at least 12.7% in -weighted F1 score. - $\textbf{Discussion}$ Experimental results imply that the CNN model is -suitable for supporting diagnosis decision making in the presence of complex, -noisy and unstructured clinical data while at the same time using fewer layers -and parameters that other traditional Deep Network models. - $\textbf{Conclusion}$ Our model demonstrated capability of representing -complex medical meaningful features from unstructured clinical notes and -prediction power for commonly misdiagnosed frequent diseases. It can use easily -adopted in clinical setting to provide timely and statistically proved decision -support. - $\textbf{Keywords}$ Convolutional neural network, text classification, -discharge diagnosis prediction, admission information from EHRs. -" -6342,1712.02820,"Basant Agarwal, Heri Ramampiaro, Helge Langseth, Massimiliano Ruocco",A Deep Network Model for Paraphrase Detection in Short Text Messages,cs.IR cs.AI cs.CL," This paper is concerned with paraphrase detection. The ability to detect -similar sentences written in natural language is crucial for several -applications, such as text mining, text summarization, plagiarism detection, -authorship authentication and question answering. Given two sentences, the -objective is to detect whether they are semantically identical. An important -insight from this work is that existing paraphrase systems perform well when -applied on clean texts, but they do not necessarily deliver good performance -against noisy texts. Challenges with paraphrase detection on user generated -short texts, such as Twitter, include language irregularity and noise. To cope -with these challenges, we propose a novel deep neural network-based approach -that relies on coarse-grained sentence modeling using a convolutional neural -network and a long short-term memory model, combined with a specific -fine-grained word-level similarity matching model. Our experimental results -show that the proposed approach outperforms existing state-of-the-art -approaches on user-generated noisy social media data, such as Twitter texts, -and achieves highly competitive performance on a cleaner corpus. 
-" -6343,1712.02838,"Li Zhou, Kevin Small, Oleg Rokhlenko, Charles Elkan","End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy - Gradient",cs.AI cs.CL cs.LG," Learning a goal-oriented dialog policy is generally performed offline with -supervised learning algorithms or online with reinforcement learning (RL). -Additionally, as companies accumulate massive quantities of dialog transcripts -between customers and trained human agents, encoder-decoder methods have gained -popularity as agent utterances can be directly treated as supervision without -the need for utterance-level annotations. However, one potential drawback of -such approaches is that they myopically generate the next agent utterance -without regard for dialog-level considerations. To resolve this concern, this -paper describes an offline RL method for learning from unannotated corpora that -can optimize a goal-oriented policy at both the utterance and dialog level. We -introduce a novel reward function and use both on-policy and off-policy policy -gradient to learn a policy offline without requiring online user interaction or -an explicit state space definition. -" -6344,1712.02856,"Han He, Lei Wu, Hua Yan, Zhimin Gao, Yi Feng, George Townsend",Effective Neural Solution for Multi-Criteria Word Segmentation,cs.CL," We present a simple yet elegant solution to train a single joint model on -multi-criteria corpora for Chinese Word Segmentation (CWS). Our novel design -requires no private layers in model architecture, instead, introduces two -artificial tokens at the beginning and ending of input sentence to specify the -required target criteria. The rest of the model including Long Short-Term -Memory (LSTM) layer and Conditional Random Fields (CRFs) layer remains -unchanged and is shared across all datasets, keeping the size of parameter -collection minimal and constant. On Bakeoff 2005 and Bakeoff 2008 datasets, our -innovative design has surpassed both single-criterion and multi-criteria -state-of-the-art learning results. To the best knowledge, our design is the -first one that has achieved the latest high performance on such large scale -datasets. Source codes and corpora of this paper are available on GitHub. -" -6345,1712.02896,Eric Chu and Deb Roy,Audio-Visual Sentiment Analysis for Learning Emotional Arcs in Movies,cs.CV cs.CL," Stories can have tremendous power -- not only useful for entertainment, they -can activate our interests and mobilize our actions. The degree to which a -story resonates with its audience may be in part reflected in the emotional -journey it takes the audience upon. In this paper, we use machine learning -methods to construct emotional arcs in movies, calculate families of arcs, and -demonstrate the ability for certain arcs to predict audience engagement. The -system is applied to Hollywood films and high quality shorts found on the web. -We begin by using deep convolutional neural networks for audio and visual -sentiment analysis. These models are trained on both new and existing -large-scale datasets, after which they can be used to compute separate audio -and visual emotional arcs. We then crowdsource annotations for 30-second video -clips extracted from highs and lows in the arcs in order to assess the -micro-level precision of the system, with precision measured in terms of -agreement in polarity between the system's predictions and annotators' ratings. -These annotations are also used to combine the audio and visual predictions. 
-Next, we look at macro-level characterizations of movies by investigating
-whether there exist `universal shapes' of emotional arcs. In particular, we
-develop a clustering approach to discover distinct classes of emotional arcs.
-Finally, we show on a sample corpus of short web videos that certain emotional
-arcs are statistically significant predictors of the number of comments a
-video receives. These results suggest that the emotional arcs learned by our
-approach successfully represent macroscopic aspects of a video story that
-drive audience engagement. Such machine understanding could be used to predict
-audience reactions to video stories, ultimately improving our ability as
-storytellers to communicate with each other.
-"
-6346,1712.02959,"Mehreen Alam, Sibt ul Hussain",Sequence to Sequence Networks for Roman-Urdu to Urdu Transliteration,cs.CL," Neural Machine Translation models have replaced conventional phrase-based
-statistical translation methods, since the former take a generic, scalable,
-data-driven approach rather than relying on manual, hand-crafted features. The
-neural machine translation system is based on one neural network that is
-composed of two parts, one responsible for the input language sentence and
-another that handles the desired output language sentence. This model, based
-on the encoder-decoder architecture, also takes as input the distributed
-representations of the source language, which enriches the learnt dependencies
-and gives a warm start to the network. In this work, we transform Roman-Urdu
-to Urdu transliteration into a sequence-to-sequence learning problem. To this
-end, we make the following contributions. We create the first ever parallel
-corpus of Roman-Urdu to Urdu, create the first ever distributed representation
-of Roman-Urdu and present the first neural machine translation model that
-transliterates text from Roman-Urdu to the Urdu language. Our model has
-achieved state-of-the-art results using BLEU as the evaluation metric.
-Precisely, our model is able to correctly predict sentences up to length 10
-while achieving a BLEU score of 48.6 on the test set. We are hopeful that our
-model and our results will serve as the baseline for further work in the
-domain of neural machine translation for Roman-Urdu to Urdu using distributed
-representation.
-"
-6347,1712.03086,"Mayank Kejriwal, Jiayuan Ding, Runqi Shao, Anoop Kumar, Pedro Szekely","FlagIt: A System for Minimally Supervised Human Trafficking Indicator
-  Mining",cs.CY cs.AI cs.CL," In this paper, we describe and study the indicator mining problem in the
-online sex advertising domain. We present an in-development system, FlagIt
-(Flexible and adaptive generation of Indicators from text), which combines the
-benefits of both a lightweight expert system and classical semi-supervision
-(heuristic re-labeling) with recently released state-of-the-art unsupervised
-text embeddings to tag millions of sentences with indicators that are highly
-correlated with human trafficking. The FlagIt technology stack is open source.
-On preliminary evaluations involving five indicators, FlagIt illustrates
-promising performance compared to several alternatives. The system is being
-actively developed, refined and integrated into a domain-specific search
-system used by over 200 law enforcement agencies to combat human trafficking,
-and is being aggressively extended to mine at least six more indicators with
-minimal programming effort.
FlagIt is a good example of a system that operates in limited label
-settings, and that requires creative combinations of established machine
-learning techniques to produce outputs that could be used by real-world
-non-technical analysts.
-"
-6348,1712.03133,"Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon,
-  Michael Picheny","Building competitive direct acoustics-to-word models for English
-  conversational speech recognition",cs.CL cs.AI cs.NE stat.ML," Direct acoustics-to-word (A2W) models in the end-to-end paradigm have
-received increasing attention compared to conventional sub-word based
-automatic speech recognition models using phones, characters, or
-context-dependent hidden Markov model states. This is because A2W models
-recognize words from speech without any decoder, pronunciation lexicon, or
-externally-trained language model, making training and decoding with such
-models simple. Prior work has shown that A2W models require orders of
-magnitude more training data in order to perform comparably to conventional
-models. Our work also showed this accuracy gap when using the English
-Switchboard-Fisher data set. This paper describes a recipe to train an A2W
-model that closes this gap and is on par with state-of-the-art sub-word based
-models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000
-Switchboard/CallHome test sets without any decoder or language model. We find
-that model initialization, training data order, and regularization have the
-most impact on the A2W model performance. Next, we present a joint
-word-character A2W model that learns to first spell the word and then
-recognize it. This model provides a rich output to the user instead of simple
-word hypotheses, making it especially useful in the case of words unseen or
-rarely-seen during training.
-"
-6349,1712.03199,"Victor Akinwande, Sekou L. Remy","Characterizing the hyper-parameter space of LSTM language models for
-  mixed context applications",cs.CL," Applying state-of-the-art deep learning models to novel real-world datasets
-gives a practical evaluation of the generalizability of these models. Of
-importance in this process is how sensitive the hyper-parameters of such
-models are to novel datasets, as this would affect the reproducibility of a
-model. We present work to characterize the hyper-parameter space of an LSTM
-for language modeling on a code-mixed corpus. We observe that the evaluated
-model shows minimal sensitivity to our novel dataset bar a few
-hyper-parameters.
-"
-6350,1712.03249,"Florian Krebs, Bruno Lubascher, Tobias Moers, Pieter Schaap, Gerasimos
-  Spanakis",Social Emotion Mining Techniques for Facebook Posts Reaction Prediction,cs.AI cs.CL cs.IR," As of February 2016, Facebook allows users to express their experienced
-emotions about a post by using five so-called `reactions'. This research paper
-proposes and evaluates alternative methods for predicting these reactions to
-user posts on public pages of firms/companies (like supermarket chains). For
-this purpose, we collected posts (and their reactions) from Facebook pages of
-large supermarket chains and constructed a dataset which is available for
-other researchers. In order to predict the distribution of reactions of a new
-post, neural network architectures (convolutional and recurrent neural
-networks) were tested using pretrained word embeddings. Results of the neural
-networks were improved by introducing a bootstrapping approach for sentiment
-and emotion mining on the comments for each post.
The final model (a combination of a neural network and a baseline emotion
-miner) is able to predict the reaction distribution on Facebook posts with a
-mean squared error (or misclassification rate) of 0.135.
-"
-6351,1712.03376,"Minh Le, Marten Postma, Jacopo Urbani","Word Sense Disambiguation with LSTM: Do We Really Need 100 Billion
-  Words?",cs.CL," Recently, Yuan et al. (2016) have shown the effectiveness of using Long
-Short-Term Memory (LSTM) for performing Word Sense Disambiguation (WSD). Their
-proposed technique outperformed the previous state-of-the-art on several
-benchmarks, but neither the training data nor the source code was released.
-This paper presents the results of a reproduction study of this technique
-using only openly available datasets (GigaWord, SemCore, OMSTI) and software
-(TensorFlow). From them, it emerged that state-of-the-art results can be
-obtained with much less data than hinted at by Yuan et al. All code and
-trained models are made freely available.
-"
-6352,1712.03430,Sharmistha Dey,"Aspect Extraction and Sentiment Classification of Mobile Apps using
-  App-Store Reviews",cs.CL," Understanding customer sentiment can be useful for product development. On
-top of that, if the priorities for the development order are known, the
-development procedure becomes simpler. This work has tried to address this
-issue in the mobile app domain. Along with aspect and opinion extraction, this
-work has also categorized the extracted aspects according to their importance.
-This can help developers to focus their time and energy in the right place.
-"
-6353,1712.03449,Jean-Benoit Delbrouck and St\'ephane Dupont,"Modulating and attending the source image during encoding improves
-  Multimodal Translation",cs.CL," We propose a new and fully end-to-end approach for multimodal translation
-where the source text encoder modulates the entire visual input processing
-using conditional batch normalization, in order to compute the most
-informative image features for our task. Additionally, we propose a new
-attention mechanism derived from this original idea, where the attention model
-for the visual input is conditioned on the source text encoder
-representations. In the paper, we detail our models as well as the image
-analysis pipeline. Finally, we report experimental results. They are, as far
-as we know, the new state of the art on three different test sets.
-"
-6354,1712.03463,"Yonatan Bisk, Kevin J. Shih, Yejin Choi, Daniel Marcu",Learning Interpretable Spatial Operations in a Rich 3D Blocks World,cs.CL," In this paper, we study the problem of mapping natural language instructions
-to complex spatial actions in a 3D blocks world. We first introduce a new
-dataset that pairs complex 3D spatial operations to rich natural language
-descriptions that require complex spatial and pragmatic interpretations such
-as ""mirroring"", ""twisting"", and ""balancing"". This dataset, built on the
-simulation environment of Bisk, Yuret, and Marcu (2016), contains language
-that is significantly richer and more complex, while also doubling the size of
-the original dataset in the 2D environment with 100 new world configurations
-and 250,000 tokens. In addition, we propose a new neural architecture that
-achieves competitive results while automatically discovering an inventory of
-interpretable spatial operations (Figure 5).
-"
-6355,1712.03512,Inna A. Belashova and Vladimir V.
Bochkarev,"Comparative analysis of criteria for filtering time series of word usage
-  frequencies",stat.ME cs.CL stat.AP," This paper describes a method of nonlinear wavelet thresholding of time
-series. The Ramachandran-Ranganathan runs test is used to assess the quality
-of approximation. To minimize the objective function, it is proposed to use
-genetic algorithms, one of the stochastic optimization methods. The suggested
-method is tested both on the model series and on the word frequency series
-using the Google Books Ngram data. It is shown that the filtering method which
-uses the runs criterion gives significantly better results compared with
-standard wavelet thresholding. The method can be used when the quality of
-filtering is of primary importance rather than the speed of calculation.
-"
-6356,1712.03538,"Adrian Benton, Margaret Mitchell, Dirk Hovy",Multi-Task Learning for Mental Health using Social Media Text,cs.CL," We introduce initial groundwork for estimating suicide risk and mental
-health in a deep learning framework. By modeling multiple conditions, the
-system learns to make predictions about suicide risk and mental health at a
-low false positive rate. Conditions are modeled as tasks in a multi-task
-learning (MTL) framework, with gender prediction as an additional auxiliary
-task. We demonstrate the effectiveness of multi-task learning by comparison to
-a well-tuned single-task baseline with the same number of parameters. Our best
-MTL model predicts potential suicide attempt, as well as the presence of
-atypical mental health, with AUC > 0.8. We also find additional large
-improvements using multi-task learning on mental health tasks with limited
-training data.
-"
-6357,1712.03547,"Chandrahas and Tathagata Sengupta and Cibi Pragadeesh and Partha
-  Pratim Talukdar",Inducing Interpretability in Knowledge Graph Embeddings,cs.CL," We study the problem of inducing interpretability in KG embeddings.
-Specifically, we explore the Universal Schema (Riedel et al., 2013) and
-propose a method to induce interpretability. There have been many vector space
-models proposed for the problem; however, most of these methods do not address
-the interpretability (semantics) of individual dimensions. In this work, we
-study this problem and propose a method for inducing interpretability in KG
-embeddings using entity co-occurrence statistics. The proposed method
-significantly improves the interpretability, while maintaining comparable
-performance in other KG tasks.
-"
-6358,1712.03556,"Xiaodong Liu, Yelong Shen, Kevin Duh and Jianfeng Gao",Stochastic Answer Networks for Machine Reading Comprehension,cs.CL," We propose a simple yet robust stochastic answer network (SAN) that
-simulates multi-step reasoning in machine reading comprehension. Compared to
-previous work such as ReasoNet, which used reinforcement learning to determine
-the number of steps, the unique feature is the use of a kind of stochastic
-prediction dropout on the answer module (final layer) of the neural network
-during training. We show that this simple trick improves robustness and
-achieves results competitive to the state-of-the-art on the Stanford Question
-Answering Dataset (SQuAD), the Adversarial SQuAD, and the Microsoft MAchine
-Reading COmprehension Dataset (MS MARCO).
-" -6359,1712.03609,"Shimi Salant, Jonathan Berant",Contextualized Word Representations for Reading Comprehension,cs.CL," Reading a document and extracting an answer to a question about its content -has attracted substantial attention recently. While most work has focused on -the interaction between the question and the document, in this work we evaluate -the importance of context when the question and document are processed -independently. We take a standard neural architecture for this task, and show -that by providing rich contextualized word representations from a large -pre-trained language model as well as allowing the model to choose between -context-dependent and context-independent word representations, we can obtain -dramatic improvements and reach performance comparable to state-of-the-art on -the competitive SQuAD dataset. -" -6360,1712.03645,Kumiko Tanaka-Ishii,"Long-Range Correlation Underlying Childhood Language and Generative - Models",cs.CL physics.soc-ph," Long-range correlation, a property of time series exhibiting long-term -memory, is mainly studied in the statistical physics domain and has been -reported to exist in natural language. Using a state-of-the-art method for such -analysis, long-range correlation is first shown to occur in long CHILDES data -sets. To understand why, Bayesian generative models of language, originally -proposed in the cognitive scientific domain, are investigated. Among -representative models, the Simon model was found to exhibit surprisingly good -long-range correlation, but not the Pitman-Yor model. Since the Simon model is -known not to correctly reflect the vocabulary growth of natural language, a -simple new model is devised as a conjunct of the Simon and Pitman-Yor models, -such that long-range correlation holds with a correct vocabulary growth rate. -The investigation overall suggests that uniform sampling is one cause of -long-range correlation and could thus have a relation with actual linguistic -processes. -" -6361,1712.03665,"Ying Zeng, Yansong Feng, Rong Ma, Zheng Wang, Rui Yan, Chongde Shi, - Dongyan Zhao","Scale Up Event Extraction Learning via Automatic Training Data - Generation",cs.CL," The task of event extraction has long been investigated in a supervised -learning paradigm, which is bound by the number and the quality of the training -instances. Existing training data must be manually generated through a -combination of expert domain knowledge and extensive human involvement. -However, due to drastic efforts required in annotating text, the resultant -datasets are usually small, which severally affects the quality of the learned -model, making it hard to generalize. Our work develops an automatic approach -for generating training data for event extraction. Our approach allows us to -scale up event extraction training instances from thousands to hundreds of -thousands, and it does this at a much lower cost than a manual approach. We -achieve this by employing distant supervision to automatically create event -annotations from unlabelled text using existing structured knowledge bases or -tables.We then develop a neural network model with post inference to transfer -the knowledge extracted from structured knowledge bases to automatically -annotate typed events with corresponding arguments in text.We evaluate our -approach by using the knowledge extracted from Freebase to label texts from -Wikipedia articles. Experimental results show that our approach can generate a -large number of high quality training instances. 
We show that this large volume -of training data not only leads to a better event extractor, but also allows us -to detect multiple typed events. -" -6362,1712.03897,"Kenneth Leidal, David Harwath, and James Glass",Learning Modality-Invariant Representations for Speech and Images,cs.LG cs.CL cs.CV," In this paper, we explore the unsupervised learning of a semantic embedding -space for co-occurring sensory inputs. Specifically, we focus on the task of -learning a semantic vector space for both spoken and handwritten digits using -the TIDIGITs and MNIST datasets. Current techniques encode image and -audio/textual inputs directly to semantic embeddings. In contrast, our -technique maps an input to the mean and log variance vectors of a diagonal -Gaussian from which sample semantic embeddings are drawn. In addition to -encouraging semantic similarity between co-occurring inputs, our loss function -includes a regularization term borrowed from variational autoencoders (VAEs) -which drives the posterior distributions over embeddings to be unit Gaussian. -We can use this regularization term to filter out modality information while -preserving semantic information. We speculate this technique may be more -broadly applicable to other areas of cross-modality/domain information -retrieval and transfer learning. -" -6363,1712.03903,"Dan Liu, Ching Yee Suen, Olga Ormandjieva",A Novel Way of Identifying Cyber Predators,cs.CL cs.CY," Recurrent Neural Networks with Long Short-Term Memory cells (LSTM-RNN) have -an impressive ability in sequence data processing, particularly for language model -building and text classification. This research proposes the combination of -sentiment analysis, a new approach to sentence vectors, and LSTM-RNNs as a novel -way for Sexual Predator Identification (SPI). An LSTM-RNN language model is -applied to generate sentence vectors, which are the last hidden states in the -language model. Sentence vectors are fed into another LSTM-RNN classifier, so -as to capture suspicious conversations. The hidden state makes it possible to generate -vectors for sentences never seen before. Fasttext is used to filter the -contents of conversations and generate a sentiment score so as to identify -potential predators. The experiment achieves a record-breaking accuracy and -precision of 100% with recall of 81.10%, exceeding the top-ranked result in the -SPI competition. -" -6364,1712.03935,"Gaurav Bhatt, Aman Sharma, Shivam Sharma, Ankush Nagpal, - Balasubramanian Raman, and Ankush Mittal","On the Benefit of Combining Neural, Statistical and External Features - for Fake News Identification",cs.CL," Identifying the veracity of a news article is an interesting problem, while -automating this process can be a challenging task. Detection of a news article -as fake is still an open question, as it is contingent on many factors which the -current state-of-the-art models fail to incorporate. In this paper, we explore -a subtask of fake news identification, namely stance detection. Given a -news article, the task is to determine the relevance of the body and its claim. -We present a novel idea that combines the neural, statistical and external -features to provide an efficient solution to this problem. We compute the -neural embedding from the deep recurrent model, statistical features from the -weighted n-gram bag-of-words model and handcrafted external features with the -help of feature engineering heuristics. 
Finally, using a deep neural layer, all -the features are combined, thereby classifying the headline-body news pair as -agree, disagree, discuss, or unrelated. We compare our proposed technique with -the current state-of-the-art models on the fake news challenge dataset. Through -extensive experiments, we find that the proposed model outperforms all the -state-of-the-art techniques, including the submissions to the fake news -challenge. -" -6365,1712.04034,"Maryam Fazel-Zarandi, Shang-Wen Li, Jin Cao, Jared Casale, Peter - Henderson, David Whitney, Alborz Geramifard",Learning Robust Dialog Policies in Noisy Environments,cs.CL cs.AI," Modern virtual personal assistants provide a convenient interface for -completing daily tasks via voice commands. An important consideration for these -assistants is the ability to recover from automatic speech recognition (ASR) -and natural language understanding (NLU) errors. In this paper, we focus on -learning robust dialog policies to recover from these errors. To this end, we -develop a user simulator which interacts with the assistant through voice -commands in realistic scenarios with noisy audio, and use it to learn dialog -policies through deep reinforcement learning. We show that dialogs generated by -our simulator are indistinguishable from human-generated dialogs, as determined -by human evaluators. Furthermore, preliminary experimental results show that -the learned policies in noisy environments achieve the same execution success -rate with fewer dialog turns compared to fixed rule-based policies. -" -6366,1712.04046,Jason Poulos and Rafael Valle,Character-Based Handwritten Text Transcription with Attention Networks,cs.CV cs.CL stat.ML," The paper approaches the task of handwritten text recognition (HTR) with -attentional encoder-decoder networks trained on sequences of characters, rather -than words. We experiment on lines of text from popular handwriting datasets -and compare different activation functions for the attention mechanism used for -aligning image pixels and target characters. We find that softmax attention -focuses heavily on individual characters, while sigmoid attention focuses on -multiple characters at each step of the decoding. When the sequence alignment -is one-to-one, softmax attention is able to learn a more precise alignment at -each step of the decoding, whereas the alignment generated by sigmoid attention -is much less precise. When a linear function is used to obtain attention -weights, the model predicts a character by looking at the entire sequence of -characters and performs poorly because it lacks a precise alignment between the -source and target. Future research may explore HTR in natural scene images, -since the model is capable of transcribing handwritten text without the need -for producing segmentations or bounding boxes of text in images. -" -6367,1712.04048,"Hao Zhang, Shizhen Xu, Graham Neubig, Wei Dai, Qirong Ho, Guangwen - Yang, Eric P. Xing",Cavs: A Vertex-centric Programming Interface for Dynamic Neural Networks,cs.LG cs.CL cs.DC," Recent deep learning (DL) models have moved beyond static network -architectures to dynamic ones, handling data where the network structure -changes with every example, such as sequences of variable lengths, trees, and -graphs. 
Existing dataflow-based programming models for DL---both static and -dynamic declaration---either cannot readily express these dynamic models, or -are inefficient due to repeated dataflow graph construction and processing, and -difficulties in batched execution. We present Cavs, a vertex-centric -programming interface and optimized system implementation for dynamic DL -models. Cavs represents dynamic network structure as a static vertex function -$\mathcal{F}$ and a dynamic instance-specific graph $\mathcal{G}$, and performs -backpropagation by scheduling the execution of $\mathcal{F}$ following the -dependencies in $\mathcal{G}$. Cavs bypasses expensive graph construction and -preprocessing overhead, allows for the use of static graph optimization -techniques on pre-defined operations in $\mathcal{F}$, and naturally exposes -batched execution opportunities over different graphs. Experiments comparing -Cavs to two state-of-the-art frameworks for dynamic NNs (TensorFlow Fold and -DyNet) demonstrate the efficacy of this approach: Cavs achieves a near one -order of magnitude speedup on training of various dynamic NN architectures, and -ablations demonstrate the contribution of our proposed batching and memory -management strategies. -" -6368,1712.04116,"Peixian Chen, Zhourong Chen and Nevin L. Zhang","A Novel Document Generation Process for Topic Detection based on - Hierarchical Latent Tree Models",cs.CL cs.IR cs.LG," We propose a novel document generation process based on hierarchical latent -tree models (HLTMs) learned from data. An HLTM has a layer of observed word -variables at the bottom and multiple layers of latent variables on top. For -each document, we first sample values for the latent variables layer by layer -via logic sampling, then draw relative frequencies for the words conditioned on -the values of the latent variables, and finally generate words for the document -using the relative word frequencies. The motivation for the work is to take -word counts into consideration with HLTMs. In comparison with LDA-based -hierarchical document generation processes, the new process achieves -drastically better model fit with far fewer parameters. It also yields more -meaningful topics and topic hierarchies. It is the new state-of-the-art for -hierarchical topic detection. -" -6369,1712.04158,"Xihu Zhang, Chu Wei and Hai Zhao",Tracing a Loose Wordhood for Chinese Input Method Engine,cs.CL," Chinese input methods are used to convert pinyin sequences or other Latin -encoding systems into Chinese character sentences. For more effective -pinyin-to-character conversion, typical Input Method Engines (IMEs) rely on a -predefined vocabulary that demands manual maintenance on schedule. For the -purpose of removing the inconvenient vocabulary setting, this work focuses on -automatic wordhood acquisition by fully considering that Chinese inputting is a -free human-computer interaction procedure. Instead of strictly defining words, -a loose word likelihood is introduced for measuring how likely a character -sequence can be a user-recognized word with respect to using an IME. Then an -online algorithm is proposed to adjust the word likelihood or generate new -words by comparing the user's true choice for inputting with the algorithm's -prediction. The experimental results show that the proposed solution can agilely adapt to -diverse typing styles and demonstrates performance approaching that of a highly-optimized IME -with a fixed vocabulary. 
-" -6370,1712.04313,"Ewan Dunbar, Xuan Nga Cao, Juan Benjumea, Julien Karadayi, Mathieu - Bernard, Laurent Besacier, Xavier Anguera, Emmanuel Dupoux",The Zero Resource Speech Challenge 2017,cs.CL," We describe a new challenge aimed at discovering subword and word units from -raw speech. This challenge is the followup to the Zero Resource Speech -Challenge 2015. It aims at constructing systems that generalize across -languages and adapt to new speakers. The design features and evaluation metrics -of the challenge are presented and the results of seventeen models are -discussed. -" -6371,1712.04314,"Serhii Hamotskyi, Sergii Stirenko, Yuri Gordienko, Anis Rojbi","Generating and Estimating Nonverbal Alphabets for Situated and - Multimodal Communications",cs.HC cs.CL cs.CY cs.LG," In this paper, we discuss the formalized approach for generating and -estimating symbols (and alphabets), which can be communicated by the wide range -of non-verbal means based on specific user requirements (medium, priorities, -type of information that needs to be conveyed). The short characterization of -basic terms and parameters of such symbols (and alphabets) with approaches to -generate them are given. Then the framework, experimental setup, and some -machine learning methods to estimate usefulness and effectiveness of the -nonverbal alphabets and systems are presented. The previous results demonstrate -that usage of multimodal data sources (like wearable accelerometer, heart -monitor, muscle movements sensors, braincomputer interface) along with machine -learning approaches can provide the deeper understanding of the usefulness and -effectiveness of such alphabets and systems for nonverbal and situated -communication. The symbols (and alphabets) generated and estimated by such -methods may be useful in various applications: from synthetic languages and -constructed scripts to multimodal nonverbal and situated interaction between -people and artificial intelligence systems through Human-Computer Interfaces, -such as mouse gestures, touchpads, body gestures, eyetracking cameras, -wearables, and brain-computing interfaces, especially in applications for -elderly care and people with disabilities. -" -6372,1712.04708,Vlad Zhukov and Eugene Golikov and Maksim Kretov,Differentiable lower bound for expected BLEU score,cs.CL cs.LG," In natural language processing tasks performance of the models is often -measured with some non-differentiable metric, such as BLEU score. To use -efficient gradient-based methods for optimization, it is a common workaround to -optimize some surrogate loss function. This approach is effective if -optimization of such loss also results in improving target metric. The -corresponding problem is referred to as loss-evaluation mismatch. In the -present work we propose a method for calculation of differentiable lower bound -of expected BLEU score that does not involve computationally expensive sampling -procedure such as the one required when using REINFORCE rule from reinforcement -learning (RL) framework. -" -6373,1712.04753,"Karttikeya Mangalam, Tanaya Guha",Learning Spontaneity to Improve Emotion Recognition In Speech,eess.AS cs.CL cs.HC cs.SD," We investigate the effect and usefulness of spontaneity (i.e. whether a given -speech is spontaneous or not) in speech in the context of emotion recognition. -We hypothesize that emotional content in speech is interrelated with its -spontaneity, and use spontaneity classification as an auxiliary task to the -problem of emotion recognition. 
We propose two supervised learning settings -that utilize spontaneity to improve speech emotion recognition: a hierarchical -model that performs spontaneity detection before performing emotion -recognition, and a multitask learning model that jointly learns to recognize -both spontaneity and emotion. Through various experiments on the well known -IEMOCAP database, we show that by using spontaneity detection as an additional -task, significant improvement can be achieved over emotion recognition systems -that are unaware of spontaneity. We achieve state-of-the-art emotion -recognition accuracy (4-class, 69.1%) on the IEMOCAP database outperforming -several relevant and competitive baselines. -" -6374,1712.04762,"Himank Yadav, Juliang Li",Social Media Writing Style Fingerprint,cs.CL," We present our approach for computer-aided social media text authorship -attribution based on recent advances in short text authorship verification. We -use various natural language techniques to create word-level and -character-level models that act as hidden layers to simulate a simple neural -network. The choice of word-level and character-level models in each layer was -informed through validation performance. The output layer of our system uses an -unweighted majority vote vector to arrive at a conclusion. We also considered -writing bias in social media posts while collecting our training dataset to -increase system robustness. Our system achieved a precision, recall, and -F-measure of 0.82, 0.926 and 0.869 respectively. -" -6375,1712.04787,"Ingmar Steiner, S\'ebastien Le Maguer","Creating New Language and Voice Components for the Updated MaryTTS - Text-to-Speech Synthesis Platform",cs.CL cs.HC," We present a new workflow to create components for the MaryTTS text-to-speech -synthesis platform, which is popular with researchers and developers, extending -it to support new languages and custom synthetic voices. This workflow replaces -the previous toolkit with an efficient, flexible process that leverages modern -build automation and cloud-hosted infrastructure. Moreover, it is compatible -with the updated MaryTTS architecture, enabling new features and -state-of-the-art paradigms such as synthesis based on deep neural networks -(DNNs). Like MaryTTS itself, the new tools are free, open source software -(FOSS), and promote the use of open data. -" -6376,1712.04798,"Arif Khan, Ingmar Steiner, Yusuke Sugano, Andreas Bulling, Ross - Macdonald","A Multimodal Corpus of Expert Gaze and Behavior during Phonetic - Segmentation Tasks",cs.HC cs.CL," Phonetic segmentation is the process of splitting speech into distinct -phonetic units. Human experts routinely perform this task manually by analyzing -auditory and visual cues using analysis software, which is an extremely -time-consuming process. Methods exist for automatic segmentation, but these are -not always accurate enough. In order to improve automatic segmentation, we need -to model it as close to the manual segmentation as possible. This corpus is an -effort to capture the human segmentation behavior by recording experts -performing a segmentation task. We believe that this data will enable us to -highlight the important aspects of manual segmentation, which can be used in -automatic segmentation to improve its accuracy. 
-" -6377,1712.04853,"Sariya Karimova, Patrick Simianer and Stefan Riezler (Heidelberg - University)","A User-Study on Online Adaptation of Neural Machine Translation to Human - Post-Edits",cs.CL," The advantages of neural machine translation (NMT) have been extensively -validated for offline translation of several language pairs for different -domains of spoken and written language. However, research on interactive -learning of NMT by adaptation to human post-edits has so far been confined to -simulation experiments. We present the first user study on online adaptation of -NMT to user post-edits in the domain of patent translation. Our study involves -29 human subjects (translation students) whose post-editing effort and -translation quality were measured on about 4,500 interactions of a human -post-editor and a machine translation system integrating an online adaptive -learning algorithm. Our experimental results show a significant reduction of -human post-editing effort due to online adaptation in NMT according to several -evaluation metrics, including hTER, hBLEU, and KSMR. Furthermore, we found -significant improvements in BLEU/TER between NMT outputs and professional -translations in granted patents, providing further evidence for the advantages -of online adaptive NMT in an interactive setup. -" -6378,1712.05128,"Pedro Delfino and Bruno Cuconato and Edward Hermann Haeusler and - Alexandre Rademaker",Passing the Brazilian OAB Exam: data preparation and some experiments,cs.CL," In Brazil, all legal professionals must demonstrate their knowledge of the -law and its application by passing the OAB exams, the national bar exams. The -OAB exams therefore provide an excellent benchmark for the performance of legal -information systems since passing the exam would arguably signal that the -system has acquired capacity of legal reasoning comparable to that of a human -lawyer. This article describes the construction of a new data set and some -preliminary experiments on it, treating the problem of finding the -justification for the answers to questions. The results provide a baseline -performance measure against which to evaluate future improvements. We discuss -the reasons to the poor performance and propose next steps. -" -6379,1712.05181,"Tom Bocklisch, Joey Faulkner, Nick Pawlowski, Alan Nichol",Rasa: Open Source Language Understanding and Dialogue Management,cs.CL cs.AI cs.LG," We introduce a pair of tools, Rasa NLU and Rasa Core, which are open source -python libraries for building conversational software. Their purpose is to make -machine-learning based dialogue management and language understanding -accessible to non-specialist software developers. In terms of design -philosophy, we aim for ease of use, and bootstrapping from minimal (or no) -initial training data. Both packages are extensively documented and ship with a -comprehensive suite of tests. The code is available at -https://github.com/RasaHQ/ -" -6380,1712.05191,"Sachin Pawar, Girish K. Palshikar, Pushpak Bhattacharyya",Relation Extraction : A Survey,cs.CL cs.AI cs.IR," With the advent of the Internet, large amount of digital text is generated -everyday in the form of news articles, research publications, blogs, question -answering forums and social media. It is important to develop techniques for -extracting information automatically from these documents, as lot of important -information is hidden within them. This extracted information can be used to -improve access and management of knowledge hidden in large text corpora. 
-Several applications such as Question Answering and Information Retrieval would -benefit from this information. Entities like persons and organizations form -the most basic units of the information. Occurrences of entities in a sentence -are often linked through well-defined relations; e.g., occurrences of person -and organization in a sentence may be linked through relations such as employed -at. The task of Relation Extraction (RE) is to identify such relations -automatically. In this paper, we survey several important supervised, -semi-supervised and unsupervised RE techniques. We also cover the paradigms of -Open Information Extraction (OIE) and Distant Supervision. Finally, we describe -some of the recent trends in RE techniques and possible future research -directions. This survey would be useful for three kinds of readers - i) -Newcomers in the field who want to quickly learn about RE; ii) Researchers who -want to know how the various RE techniques evolved over time and what the -possible future research directions are; and iii) Practitioners who just need to -know which RE technique works best in various settings. -" -6381,1712.05382,Chung-Cheng Chiu and Colin Raffel,Monotonic Chunkwise Attention,cs.CL stat.ML," Sequence-to-sequence models with soft attention have been successfully -applied to a wide variety of problems, but their decoding process incurs a -quadratic time and space cost and is inapplicable to real-time sequence -transduction. To address these issues, we propose Monotonic Chunkwise Attention -(MoChA), which adaptively splits the input sequence into small chunks over -which soft attention is computed. We show that models utilizing MoChA can be -trained efficiently with standard backpropagation while allowing online and -linear-time decoding at test time. When applied to online speech recognition, -we obtain state-of-the-art results and match the performance of a model using -an offline soft attention mechanism. In document summarization experiments -where we do not expect monotonic alignments, we show significantly improved -performance compared to a baseline monotonic attention-based model. -" -6382,1712.05403,"Yi Tay, Anh Tuan Luu, Siu Cheung Hui","Learning to Attend via Word-Aspect Associative Fusion for Aspect-based - Sentiment Analysis",cs.CL cs.AI cs.IR," Aspect-based sentiment analysis (ABSA) tries to predict the polarity of a -given document with respect to a given aspect entity. While neural network -architectures have been successful in predicting the overall polarity of -sentences, aspect-specific sentiment analysis still remains an open problem. -In this paper, we propose a novel method for integrating aspect information -into the neural model. More specifically, we incorporate aspect information -into the neural model by modeling word-aspect relationships. Our novel model, -\textit{Aspect Fusion LSTM} (AF-LSTM), learns to attend based on associative -relationships between sentence words and the aspect, which allows our model to -adaptively focus on the correct words given an aspect term. This ameliorates -the flaws of other state-of-the-art models that utilize naive concatenations to -model word-aspect similarity. Instead, our model adopts circular convolution -and circular correlation to model the similarity between the aspect and words, and -elegantly incorporates this within a differentiable neural attention framework. -Finally, our model is end-to-end differentiable and highly related to -convolution-correlation (holographic like) memories. 
Our proposed neural model -achieves state-of-the-art performance on benchmark datasets, outperforming -ATAE-LSTM by $4\%-5\%$ on average across multiple datasets. -" -6383,1712.05483,Alexander Rosenberg Johansen and Richard Socher,Learning when to skim and when to read,cs.CL," Many recent advances in deep learning for natural language processing have -come at increasing computational cost, but the power of these state-of-the-art -models is not needed for every example in a dataset. We demonstrate two -approaches to reducing unnecessary computation in cases where a fast but weak -baseline classifier and a stronger, slower model are both available. Applying an -AUC-based metric to the task of sentiment classification, we find significant -efficiency gains with both a probability-threshold method for reducing -computational cost and one that uses a secondary decision network. -" -6384,1712.05558,"Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak - Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh","CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven - Communication",cs.CV cs.AI cs.CL cs.LG," In this work, we propose a goal-driven collaborative task that combines -language, perception, and action. Specifically, we develop a Collaborative -image-Drawing game between two agents, called CoDraw. Our game is grounded in a -virtual world that contains movable clip art objects. The game involves two -players: a Teller and a Drawer. The Teller sees an abstract scene containing -multiple clip art pieces in a semantically meaningful configuration, while the -Drawer tries to reconstruct the scene on an empty canvas using available clip -art pieces. The two players communicate with each other using natural language. -We collect the CoDraw dataset of ~10K dialogs consisting of ~138K messages -exchanged between human players. We define protocols and metrics to evaluate -learned agents in this testbed, highlighting the need for a novel ""crosstalk"" -evaluation condition which pairs agents trained independently on disjoint -subsets of the training data. We present models for our task and benchmark them -using both fully automated evaluation and by having them play the game live -with humans. -" -6385,1712.05608,"Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu",A Novel Approach for Effective Learning in Low Resourced Scenarios,cs.CL cs.SD eess.AS," Deep learning based discriminative methods, being the state-of-the-art -machine learning techniques, are ill-suited for learning from low amounts of -data. In this paper, we propose a novel framework, called simultaneous two -sample learning (s2sL), to effectively learn the class discriminative -characteristics, even from a very low amount of data. In s2sL, more than one -sample (here, two samples) are simultaneously considered to both train and -test the classifier. We demonstrate our approach for speech/music -discrimination and emotion classification through experiments. Further, we also -show the effectiveness of the s2sL approach for classification in low-resource -scenarios, and for imbalanced data. -" -6386,1712.05626,"Denis Fedorenko, Nikita Smetanin, Artem Rodichev",Avoiding Echo-Responses in a Retrieval-Based Conversation System,cs.CL," Retrieval-based conversation systems generally tend to highly rank responses -that are semantically similar or even identical to the given conversation -context. 
While the system's goal is to find the most appropriate response, -rather than the most semantically similar one, this tendency results in -low-quality responses. We refer to this challenge as the echoing problem. To -mitigate this problem, we utilize a hard negative mining approach at the -training stage. The evaluation shows that the resulting model reduces echoing -and achieves better results in terms of Average Precision and Recall@N metrics, -compared to the models trained without the proposed approach. -" -6387,1712.05690,"Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem - Sokolov, Ann Clifton, Matt Post",Sockeye: A Toolkit for Neural Machine Translation,cs.CL cs.LG stat.ML," We describe Sockeye (version 1.12), an open-source sequence-to-sequence -toolkit for Neural Machine Translation (NMT). Sockeye is a production-ready -framework for training and applying models as well as an experimental platform -for researchers. Written in Python and built on MXNet, the toolkit offers -scalable training and inference for the three most prominent encoder-decoder -architectures: attentional recurrent neural networks, self-attentional -transformers, and fully convolutional networks. Sockeye also supports a wide -range of optimizers, normalization and regularization techniques, and inference -improvements from current NMT literature. Users can easily run standard -training recipes, explore different model settings, and incorporate new ideas. -In this paper, we highlight Sockeye's features and benchmark it against other -NMT toolkits on two language arcs from the 2017 Conference on Machine -Translation (WMT): English-German and Latvian-English. We report competitive -BLEU scores across all three architectures, including an overall best score for -Sockeye's transformer implementation. To facilitate further comparison, we -release all system outputs and training scripts used in our experiments. The -Sockeye toolkit is free software released under the Apache 2.0 license. -" -6388,1712.05785,"Jordan Prosky, Xingyou Song, Andrew Tan, Michael Zhao",Sentiment Predictability for Stocks,cs.CL cs.AI cs.LG cs.MM," In this work, we present our findings and experiments for stock-market -prediction using various textual sentiment analysis tools, such as mood -analysis and event extraction, as well as prediction models, such as LSTMs and -specific convolutional architectures. -" -6389,1712.05846,"Denis Yarats, Mike Lewis",Hierarchical Text Generation and Planning for Strategic Dialogue,cs.CL," End-to-end models for goal-orientated dialogue are challenging to train, -because linguistic and strategic aspects are entangled in latent state vectors. -We introduce an approach to learning representations of messages in dialogues -by maximizing the likelihood of subsequent sentences and actions, which -decouples the semantics of the dialogue utterance from its linguistic -realization. We then use these latent sentence representations for hierarchical -language generation, planning and reinforcement learning. Experiments show that -our approach increases the end-task reward achieved by the model, improves the -effectiveness of long-term planning using rollouts, and allows self-play -reinforcement learning to improve decision making without diverging from human -language. Our hierarchical latent-variable model outperforms previous work both -linguistically and strategically. -" -6390,1712.05884,"Jonathan Shen, Ruoming Pang, Ron J. 
Weiss, Mike Schuster, Navdeep - Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, - Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu","Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram - Predictions",cs.CL," This paper describes Tacotron 2, a neural network architecture for speech -synthesis directly from text. The system is composed of a recurrent -sequence-to-sequence feature prediction network that maps character embeddings -to mel-scale spectrograms, followed by a modified WaveNet model acting as a -vocoder to synthesize time-domain waveforms from those spectrograms. Our model -achieves a mean opinion score (MOS) of $4.53$, comparable to a MOS of $4.58$ for -professionally recorded speech. To validate our design choices, we present -ablation studies of key components of our system and evaluate the impact of -using mel spectrograms as the input to WaveNet instead of linguistic, duration, -and $F_0$ features. We further demonstrate that using a compact acoustic -intermediate representation enables significant simplification of the WaveNet -architecture. -" -6391,1712.05898,"Yifan Peng and Xiaosong Wang and Le Lu and Mohammadhadi Bagheri and - Ronald Summers and Zhiyong Lu","NegBio: a high-performance tool for negation and uncertainty detection - in radiology reports",cs.CL," Negative and uncertain medical findings are frequent in radiology reports, -but discriminating them from positive findings remains challenging for -information extraction. Here, we propose a new algorithm, NegBio, to detect -negative and uncertain findings in radiology reports. Unlike previous -rule-based methods, NegBio utilizes patterns on universal dependencies to -identify the scope of triggers that are indicative of negation or uncertainty. -We evaluated NegBio on four datasets, including two public benchmarking corpora -of radiology reports, a new radiology corpus that we annotated for this work, -and a public corpus of general clinical texts. Evaluation on these datasets -demonstrates that NegBio is highly accurate for detecting negative and -uncertain findings and compares favorably to a widely-used state-of-the-art -system NegEx (an average of 9.5% improvement in precision and 5.1% in -F1-score). -" -6392,1712.05972,"Pushpankar Kumar Pushp, Muktabh Mayank Srivastava","Train Once, Test Anywhere: Zero-Shot Learning for Text Classification",cs.CL," Zero-shot Learners are models capable of predicting unseen classes. In this -work, we propose a Zero-shot Learning approach for text categorization. Our -method involves training a model on a large corpus of sentences to learn the -relationship between a sentence and the embedding of the sentence's tags. Learning such -a relationship makes the model generalize to unseen sentences, tags, and even new -datasets provided they can be put into the same embedding space. The model learns -to predict whether a given sentence is related to a tag or not; unlike other -classifiers that learn to classify the sentence as one of the possible classes. -We propose three different neural networks for the task and report their -accuracy on the test set of the dataset used for training them as well as two -other standard datasets for which no retraining was done. We show that our -models generalize well across new unseen classes in both cases. Although the -models do not achieve the accuracy level of state-of-the-art supervised -models, they evidently represent a step forward towards general intelligence in -natural language processing. 
-" -6393,1712.05997,Amir Karami,Taming Wild High Dimensional Text Data with a Fuzzy Lash,stat.ML cs.CL cs.IR cs.LG stat.AP," The bag of words (BOW) represents a corpus in a matrix whose elements are the -frequency of words. However, each row in the matrix is a very high-dimensional -sparse vector. Dimension reduction (DR) is a popular method to address sparsity -and high-dimensionality issues. Among different strategies to develop DR -method, Unsupervised Feature Transformation (UFT) is a popular strategy to map -all words on a new basis to represent BOW. The recent increase of text data and -its challenges imply that DR area still needs new perspectives. Although a wide -range of methods based on the UFT strategy has been developed, the fuzzy -approach has not been considered for DR based on this strategy. This research -investigates the application of fuzzy clustering as a DR method based on the -UFT strategy to collapse BOW matrix to provide a lower-dimensional -representation of documents instead of the words in a corpus. The quantitative -evaluation shows that fuzzy clustering produces superior performance and -features to Principal Components Analysis (PCA) and Singular Value -Decomposition (SVD), two popular DR methods based on the UFT strategy. -" -6394,1712.05999,Julio Amador and Axel Oehmichen and Miguel Molina-Solana,Characterizing Political Fake News in Twitter by its Meta-Data,cs.CL cs.SI stat.ML," This article presents a preliminary approach towards characterizing political -fake news on Twitter through the analysis of their meta-data. In particular, we -focus on more than 1.5M tweets collected on the day of the election of Donald -Trump as 45th president of the United States of America. We use the meta-data -embedded within those tweets in order to look for differences between tweets -containing fake news and tweets not containing them. Specifically, we perform -our analysis only on tweets that went viral, by studying proxies for users' -exposure to the tweets, by characterizing accounts spreading fake news, and by -looking at their polarization. We found significant differences on the -distribution of followers, the number of URLs on tweets, and the verification -of the users. -" -6395,1712.06074,"Xiaoyong Yan, Seong-Gyu Yang, Beom Jun Kim and Petter Minnhagen",Benford's Law and First Letter of Word,cs.CL physics.soc-ph," A universal First-Letter Law (FLL) is derived and described. It predicts the -percentages of first letters for words in novels. The FLL is akin to Benford's -law (BL) of first digits, which predicts the percentages of first digits in a -data collection of numbers. Both are universal in the sense that FLL only -depends on the numbers of letters in the alphabet, whereas BL only depends on -the number of digits in the base of the number system. The existence of these -types of universal laws appears counter-intuitive. Nonetheless both describe -data very well. Relations to some earlier works are given. FLL predicts that an -English author on the average starts about 16 out of 100 words with the English -letter `t'. This is corroborated by data, yet an author can freely write -anything. Fuller implications and the applicability of FLL remain for the -future. -" -6396,1712.06086,Mirco Ravanelli,Deep Learning for Distant Speech Recognition,cs.CL cs.SD eess.AS," Deep learning is an emerging technology that is considered one of the most -promising directions for reaching higher levels of artificial intelligence. 
-Among other achievements, building computers that understand speech -represents a crucial leap towards intelligent machines. Despite the great -efforts of the past decades, however, a natural and robust human-machine speech -interaction still appears to be out of reach, especially when users interact -with a distant microphone in noisy and reverberant environments. The latter -disturbances severely hamper the intelligibility of a speech signal, making -Distant Speech Recognition (DSR) one of the major open challenges in the field. - This thesis addresses the latter scenario and proposes some novel techniques, -architectures, and algorithms to improve the robustness of distant-talking -acoustic models. We first elaborate on methodologies for realistic data -contamination, with a particular emphasis on DNN training with simulated data. -We then investigate approaches for better exploiting speech contexts, -proposing some original methodologies for both feed-forward and recurrent -neural networks. Lastly, inspired by the idea that cooperation across different -DNNs could be the key for counteracting the harmful effects of noise and -reverberation, we propose a novel deep learning paradigm called network of deep -neural networks. The analysis of the original concepts was based on extensive -experimental validations conducted on both real and simulated data, considering -different corpora, microphone configurations, environments, noisy conditions, -and ASR tasks. -" -6397,1712.06100,"Johan Hasselqvist, Niklas Helmertz, Mikael K{\aa}geb\""ack",Query-Based Abstractive Summarization Using Neural Networks,cs.CL," In this paper, we present a model for generating summaries of text documents -with respect to a query. This is known as query-based summarization. We adapt -an existing dataset of news article summaries for the task and train a -pointer-generator model using this dataset. The generated summaries are -evaluated by measuring similarity to reference summaries. Our results show that -a neural network summarization model, similar to existing neural network models -for abstractive summarization, can be constructed to make use of queries to -produce targeted summaries. -" -6398,1712.06163,Andrew J. Reagan,"Towards a science of human stories: using sentiment analysis and - emotional arcs to understand the building blocks of complex social systems",cs.CL," Given the growing assortment of sentiment measuring instruments, it is -imperative to understand which aspects of sentiment dictionaries contribute to -both their classification accuracy and their ability to provide richer -understanding of texts. Here, we perform detailed, quantitative tests and -qualitative assessments of 6 dictionary-based methods applied, and briefly -examine a further 20 methods. We show that while inappropriate for sentences, -dictionary-based methods are generally robust in their classification accuracy -for longer texts. - Stories often follow distinct emotional trajectories, forming patterns -that are meaningful to us. By classifying the emotional arcs for a filtered -subset of 4,803 stories from Project Gutenberg's fiction collection, we find a -set of six core trajectories which form the building blocks of complex -narratives. Of profound scientific interest will be the degree to which we can -eventually understand the full landscape of human stories, and data-driven -approaches will play a crucial role. 
- Finally, we utilize web-scale data from Twitter to study the limits of what -social data can tell us about public health, mental illness, discourse around -the protest movement of #BlackLivesMatter, discourse around climate change, and -hidden networks. We conclude with a review of published works in complex -systems that separately analyze charitable donations, the happiness of words in -10 languages, 100 years of daily temperature data across the United States, and -Australian Rules Football games. -" -6399,1712.06204,"Yuting Chen, Joseph Wang, Yannan Bai, Gregory Casta\~n\'on, and - Venkatesh Saligrama","Probabilistic Semantic Retrieval for Surveillance Videos with Activity - Graphs",cs.MM cs.CL," We present a novel framework for finding complex activities matching -user-described queries in cluttered surveillance videos. The wide diversity of -queries coupled with unavailability of annotated activity data limits our -ability to train activity models. To bridge the semantic gap we propose to let -users describe an activity as a semantic graph with object attributes and -inter-object relationships associated with nodes and edges, respectively. We -learn node/edge-level visual predictors during training and, at test-time, -propose to retrieve activity by identifying likely locations that match the -semantic graph. We formulate a novel CRF based probabilistic activity -localization objective that accounts for mis-detections, mis-classifications -and track-losses, and outputs a likelihood score for a candidate grounded -location of the query in the video. We seek groundings that maximize overall -precision and recall. To handle the combinatorial search over all -high-probability groundings, we propose a highest precision subgraph matching -algorithm. Our method outperforms existing retrieval methods on benchmarked -datasets. -" -6400,1712.06273,"Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor","Low Resourced Machine Translation via Morpho-syntactic Modeling: The - Case of Dialectal Arabic",cs.CL," We present the second ever evaluated Arabic dialect-to-dialect machine -translation effort, and the first to leverage external resources beyond a small -parallel corpus. The subject has not previously received serious attention due -to lack of naturally occurring parallel data; yet its importance is evidenced -by dialectal Arabic's wide usage and breadth of inter-dialect variation, -comparable to that of Romance languages. Our results suggest that modeling -morphology and syntax significantly improves dialect-to-dialect translation, -though optimizing such data-sparse models requires consideration of the -linguistic differences between dialects and the nature of available data and -resources. On a single-reference blind test set where untranslated input scores -6.5 BLEU and a model trained only on parallel data reaches 14.6, pivot -techniques and morphosyntactic modeling significantly improve performance to -17.5. -" -6401,1712.06289,"Yi Zhang, Xu Sun","A Chinese Dataset with Negative Full Forms for General Abbreviation - Prediction",cs.CL," Abbreviation is a common phenomenon across languages, especially in Chinese. -In most cases, if an expression can be abbreviated, its abbreviation is used -more often than its fully expanded forms, since people tend to convey -information in a most concise way. 
For various language processing tasks, -abbreviation is an obstacle to improving the performance, as the textual form -of an abbreviation does not express useful information, unless it's expanded to -the full form. Abbreviation prediction means associating the fully expanded -forms with their abbreviations. However, due to the deficiency in the -abbreviation corpora, such a task is limited in current studies, especially -considering general abbreviation prediction should also include those full form -expressions that do not have valid abbreviations, namely the negative full -forms (NFFs). Corpora incorporating negative full forms for general -abbreviation prediction are few in number. In order to promote the research in -this area, we build a dataset for general Chinese abbreviation prediction, -which needs a few preprocessing steps, and evaluate several different models on -the built dataset. The dataset is available at -https://github.com/lancopku/Chinese-abbreviation-dataset -" -6402,1712.06414,"Feng Shi, Misha Teplitskiy, Eamon Duede, James Evans",The Wisdom of Polarized Crowds,cs.SI cs.CL cs.CY cs.DL stat.AP," As political polarization in the United States continues to rise, the -question of whether polarized individuals can fruitfully cooperate becomes -pressing. Although diversity of individual perspectives typically leads to -superior team performance on complex tasks, strong political perspectives have -been associated with conflict, misinformation and a reluctance to engage with -people and perspectives beyond one's echo chamber. It is unclear whether -self-selected teams of politically diverse individuals will create higher or -lower quality outcomes. In this paper, we explore the effect of team political -composition on performance through analysis of millions of edits to Wikipedia's -Political, Social Issues, and Science articles. We measure editors' political -alignments by their contributions to conservative versus liberal articles. A -survey of editors validates that those who primarily edit liberal articles -identify more strongly with the Democratic party and those who edit -conservative ones with the Republican party. Our analysis then reveals that -polarized teams---those consisting of a balanced set of politically diverse -editors---create articles of higher quality than politically homogeneous teams. -The effect appears most strongly in Wikipedia's Political articles, but is also -observed in Social Issues and even Science articles. Analysis of article ""talk -pages"" reveals that politically polarized teams engage in longer, more -constructive, competitive, and substantively focused but linguistically diverse -debates than political moderates. More intense use of Wikipedia policies by -politically diverse teams suggests institutional design principles to help -unleash the power of politically polarized teams. -" -6403,1712.06427,"Shervin Malmasi, Marcos Zampieri",Detecting Hate Speech in Social Media,cs.CL," In this paper we examine methods to detect hate speech in social media, while -distinguishing this from general profanity. We aim to establish lexical -baselines for this task by applying supervised classification methods using a -recently released dataset annotated for this purpose. As features, our system -uses character n-grams, word n-grams and word skip-grams. We obtain results of -78% accuracy in identifying posts across three classes. Results demonstrate -that the main challenge lies in discriminating profanity and hate speech from -each other. 
A number of directions for future work are discussed. -" -6404,1712.06674,Siamak Sarmady and Erfan Rahmani,word representation or word embedding in Persian text,cs.CL," Text processing is one of the sub-branches of natural language processing. -Recently, the use of machine learning and neural network methods has been -given greater consideration. For this reason, the representation of words has -become very important. This article is about word representation or converting -words into vectors in Persian text. In this research, GloVe, CBOW and skip-gram -methods are updated to produce embedding vectors for Persian words. In order to -train the neural networks, the Bijankhan corpus, Hamshahri corpus and UPEC corpus -have been combined and used. Finally, we obtain vectors in all three models -for 342,362 words. These vectors have many uses in -Persian natural language processing. -" -6405,1712.06682,"Jason Xie, Tingwen Bao",Synthesizing Novel Pairs of Image and Text,cs.CV cs.CL cs.LG," Generating novel pairs of image and text is a problem that combines computer -vision and natural language processing. In this paper, we present strategies -for generating novel image and caption pairs based on existing captioning -datasets. The model takes advantage of recent advances in generative -adversarial networks and sequence-to-sequence modeling. We make generalizations -to generate paired samples from multiple domains. Furthermore, we study cycles --- generating from image to text then back to image and vice versa, as well as -their connection with autoencoders. -" -6406,1712.06704,"Kriste Krstovski, Michael J. Kurtz, David A. Smith and Alberto - Accomazzi",Multilingual Topic Models,stat.ML cs.CL cs.IR," Scientific publications have evolved several features for mitigating -vocabulary mismatch when indexing, retrieving, and computing similarity between -articles. These mitigation strategies range from simply focusing on high-value -article sections, such as titles and abstracts, to assigning keywords, often -from controlled vocabularies, either manually or through automatic annotation. -Various document representation schemes possess different cost-benefit -tradeoffs. In this paper, we propose to model different representations of the -same article as translations of each other, all generated from a common latent -representation in a multilingual topic model. We start with a methodological -overview on latent variable models for parallel document representations that -could be used across many information science tasks. We then show how solving -the inference problem of mapping diverse representations into a shared topic -space allows us to evaluate representations based on how topically similar they -are to the original article. In addition, our proposed approach provides means -to discover where different concept vocabularies require improvement. -" -6407,1712.06751,"Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou",HotFlip: White-Box Adversarial Examples for Text Classification,cs.CL cs.LG," We propose an efficient method to generate white-box adversarial examples to -trick a character-level neural classifier. We find that only a few -manipulations are needed to greatly decrease the accuracy. Our method relies on -an atomic flip operation, which swaps one token for another, based on the -gradients of the one-hot input vectors. Due to the efficiency of our method, we can -perform adversarial training which makes the model more robust to attacks at -test time. 
With the use of a few semantics-preserving constraints, we -demonstrate that HotFlip can be adapted to attack a word-level classifier as -well. -" -6408,1712.06855,"Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel",Subword and Crossword Units for CTC Acoustic Models,cs.CL," This paper proposes a novel approach to create a unit set for CTC-based -speech recognition systems. By using Byte Pair Encoding we learn a unit set of -arbitrary size on a given training text. In contrast to using characters or -words as units, this allows us to find a good trade-off between the size of our -unit set and the available training data. We evaluate both Crossword units, -which may span multiple words, and Subword units. By combining this approach with -decoding methods using a separate language model, we are able to achieve -state-of-the-art results for grapheme-based CTC systems. -" -6409,1712.06880,"Karni Gilon, Felicia Y Ng, Joel Chan, Hila Lifshitz Assaf, Aniket - Kittur, Dafna Shahaf",Analogy Mining for Specific Design Needs,cs.CL," Finding analogical inspirations in distant domains is a powerful way of -solving problems. However, as the number of inspirations that could be matched -and the dimensions on which that matching could occur grow, it becomes -challenging for designers to find inspirations relevant to their needs. -Furthermore, designers are often interested in exploring specific aspects of a -product-- for example, one designer might be interested in improving the -brewing capability of an outdoor coffee maker, while another might wish to -optimize for portability. In this paper we introduce a novel system for -targeting analogical search for specific needs. Specifically, we contribute a -novel analogical search engine for expressing and abstracting specific design -needs that returns more distant yet relevant inspirations than alternate -approaches. -" -6410,1712.06961,"Hanan Aldarmaki, Mahesh Mohan and Mona Diab","Unsupervised Word Mapping Using Structural Similarities in Monolingual - Embeddings",cs.CL," Most existing methods for automatic bilingual dictionary induction rely on -prior alignments between the source and target languages, such as parallel -corpora or seed dictionaries. For many language pairs, such supervised -alignments are not readily available. We propose an unsupervised approach for -learning a bilingual dictionary for a pair of languages given their -independently-learned monolingual word embeddings. The proposed method exploits -local and global structures in monolingual vector spaces to align them such -that similar words are mapped to each other. We show empirically that the -performance of bilingual correspondents learned using our proposed unsupervised -method is comparable to that of using supervised bilingual correspondents from -a seed dictionary. -" -6411,1712.06994,"Maryam Zare, Shaurya Rohatgi",DeepNorm-A Deep Learning Approach to Text Normalization,cs.CL," This paper presents a simple yet sophisticated approach to the challenge by -Sproat and Jaitly (2016): given a large corpus of written text aligned to its -normalized spoken form, train an RNN to learn the correct normalization -function. Text normalization for a token seems very straightforward without -its context. But given the context in which the token is used, normalizing -becomes tricky for some classes. We present a novel approach in which the -prediction of our classification algorithm is used by our sequence-to-sequence -model to predict the normalized text of the input token. 
Our approach takes -much less time to learn and performs well, unlike what has been reported by -Google (5 days on their GPU cluster). We have achieved an accuracy of 97.62, -which is impressive given the resources we use. Our approach uses the best -of both worlds: gradient boosting, the state of the art in most classification -tasks, and sequence-to-sequence learning, the state of the art in machine -translation. We present our experiments and report results with various -parameter settings. -" -6412,1712.07004,"Rasoul Kaljahi, Jennifer Foster","Any-gram Kernels for Sentence Classification: A Sentiment Analysis Case - Study",cs.CL cs.AI stat.ML," Any-gram kernels are a flexible and efficient way to employ bag-of-n-gram -features when learning from textual data. They are also compatible with the use -of word embeddings so that word similarities can be accounted for. While the -original any-gram kernels are implemented on top of tree kernels, we propose a -new approach which is independent of tree kernels and is more efficient. We -also propose a more effective way to make use of word embeddings than the -original any-gram formulation. When applied to the task of sentiment -classification, our new formulation achieves significantly better performance. -" -6413,1712.07040,"Tom\'a\v{s} Ko\v{c}isk\'y, Jonathan Schwarz, Phil Blunsom, Chris Dyer, - Karl Moritz Hermann, G\'abor Melis, Edward Grefenstette",The NarrativeQA Reading Comprehension Challenge,cs.CL cs.AI cs.NE," Reading comprehension (RC)---in contrast to information retrieval---requires -integrating information and reasoning about events, entities, and their -relations across a full document. Question answering is conventionally used to -assess RC ability, in both artificial agents and children learning to read. -However, existing RC datasets and tasks are dominated by questions that can be -solved by selecting answers using superficial information (e.g., local context -similarity or global term frequency); they thus fail to test for the essential -integrative aspect of RC. To encourage progress on deeper comprehension of -language, we present a new dataset and set of tasks in which the reader must -answer questions about stories by reading entire books or movie scripts. These -tasks are designed so that successfully answering their questions requires -understanding the underlying narrative rather than relying on shallow pattern -matching or salience. We show that although humans solve the tasks easily, -standard RC models struggle on the tasks presented here. We provide an analysis -of the dataset and the challenges it presents. -" -6414,1712.07101,"Yingbo Zhou, Caiming Xiong, Richard Socher",Improving End-to-End Speech Recognition with Policy Learning,cs.CL cs.SD eess.AS stat.ML," Connectionist temporal classification (CTC) is widely used for maximum -likelihood learning in end-to-end speech recognition models. However, there is -usually a disparity between the negative maximum likelihood and the performance -metric used in speech recognition, e.g., word error rate (WER). This results in -a mismatch between the objective function and metric during training. We show -that the above problem can be mitigated by jointly training with maximum -likelihood and policy gradient. In particular, with policy learning we are able -to directly optimize on the (otherwise non-differentiable) performance metric. 
-We show that joint training improves relative performance by 4% to 13% for our
-end-to-end model as compared to the same model learned through maximum
-likelihood. The model achieves 5.53% WER on the Wall Street Journal dataset,
-and 5.42% and 14.70% on the LibriSpeech test-clean and test-other sets,
-respectively.
-"
-6415,1712.07108,"Yingbo Zhou, Caiming Xiong, Richard Socher",Improved Regularization Techniques for End-to-End Speech Recognition,cs.CL cs.SD eess.AS stat.ML," Regularization is important for end-to-end speech models, since the models
-are highly flexible and easy to overfit. Data augmentation and dropout have
-been important for improving end-to-end models in other domains. However, they
-are relatively underexplored for end-to-end speech models. Therefore, we
-investigate the effectiveness of both methods for end-to-end trainable, deep
-speech recognition models. We augment audio data through random perturbations
-of tempo, pitch, volume and temporal alignment, and by adding random noise. We
-further investigate the effect of dropout when applied to the inputs of all
-layers of the network. We show that the combination of data augmentation and
-dropout gives a relative performance improvement of over 20% on both the Wall
-Street Journal (WSJ) and LibriSpeech datasets. Our model performance is also
-competitive with other end-to-end speech models on both datasets.
-"
-6416,1712.07199,Rajesh Bordawekar and Bortik Bandyopadhyay and Oded Shmueli,"Cognitive Database: A Step towards Endowing Relational Databases with
- Artificial Intelligence Capabilities",cs.DB cs.AI cs.CL cs.NE," We propose Cognitive Databases, an approach for transparently enabling
-Artificial Intelligence (AI) capabilities in relational databases. A novel
-aspect of our design is to first view the structured data source as meaningful
-unstructured text, and then use the text to build an unsupervised neural
-network model using a Natural Language Processing (NLP) technique called word
-embedding. This model captures the hidden inter-/intra-column relationships
-between database tokens of different types. For each database token, the model
-includes a vector that encodes contextual semantic relationships. We seamlessly
-integrate the word embedding model into existing SQL query infrastructure and
-use it to enable a new class of SQL-based analytics queries called cognitive
-intelligence (CI) queries. CI queries use the model vectors to enable complex
-queries such as semantic matching, inductive reasoning queries such as
-analogies, predictive queries using entities not present in a database, and,
-more generally, using knowledge from external sources. We demonstrate unique
-capabilities of Cognitive Databases using an Apache Spark based prototype to
-execute inductive reasoning CI queries over a multi-modal database containing
-text and images. We believe our first-of-a-kind system exemplifies using AI
-functionality to endow relational databases with capabilities that were
-previously very hard to realize in practice.
-"
-6417,1712.07229,"Tom Kenter, Maarten de Rijke","Attentive Memory Networks: Efficient Machine Reading for Conversational
- Search",cs.CL," Recent advances in conversational systems have changed the search paradigm.
-Traditionally, a user poses a query to a search engine that returns an answer
-based on its index, possibly leveraging external knowledge bases and
-conditioning the response on earlier interactions in the search session.
In a
-natural conversation, there is an additional source of information to take into
-account: utterances produced earlier in a conversation can also be referred to,
-and a conversational IR system has to keep track of information conveyed by the
-user during the conversation, even if it is implicit.
- We argue that the process of building a representation of the conversation
-can be framed as a machine reading task, where an automated system is presented
-with a number of statements about which it should answer questions. The
-questions should be answered solely by referring to the statements provided,
-without consulting external knowledge. The time is right for the information
-retrieval community to embrace this task, both as a stand-alone task and
-integrated into a broader conversational search setting.
- In this paper, we focus on machine reading as a stand-alone task and present
-the Attentive Memory Network (AMN), an end-to-end trainable machine reading
-algorithm. Its key contribution is in efficiency, achieved by having a
-hierarchical input encoder that iterates over the input only once. Speed is an
-important requirement in the setting of conversational search, as gaps between
-conversational turns have a detrimental effect on naturalness. On 20 datasets
-commonly used for evaluating machine reading algorithms, we show that the AMN
-achieves performance comparable to the state-of-the-art models, while using
-considerably fewer computations.
-"
-6418,1712.07316,"Martin Schrimpf, Stephen Merity, James Bradbury, Richard Socher",A Flexible Approach to Automated RNN Architecture Generation,cs.CL cs.LG stat.ML," The process of designing neural architectures requires expert knowledge and
-extensive trial and error. While automated architecture search may simplify
-these requirements, the recurrent neural network (RNN) architectures generated
-by existing methods are limited in both flexibility and components. We propose
-a domain-specific language (DSL) for use in automated architecture search which
-can produce novel RNNs of arbitrary depth and width. The DSL is flexible enough
-to define standard architectures such as the Gated Recurrent Unit and Long
-Short Term Memory and allows the introduction of non-standard RNN components
-such as trigonometric curves and layer normalization. Using two different
-candidate generation techniques, random search with a ranking function and
-reinforcement learning, we explore the novel architectures produced by the RNN
-DSL for language modeling and machine translation domains. The resulting
-architectures do not follow human intuition yet perform well on their targeted
-tasks, suggesting the space of usable RNN architectures is far larger than
-previously assumed.
-"
-6419,1712.07473,"Vadim Popov, Mikhail Kudinov, Irina Piontkovskaya, Petr Vytovtov and
- Alex Nevidomsky",Differentially Private Distributed Learning for Language Modeling Tasks,cs.CL cs.CR cs.LG," One of the big challenges in machine learning applications is that training
-data can be different from the real-world data faced by the algorithm. In
-language modeling, users' language (e.g. in private messaging) could change in
-a year and be completely different from what we observe in publicly available
-data. At the same time, public data can be used for obtaining general knowledge
-(i.e. general model of English).
We study approaches to distributed fine-tuning
-of a general model on user private data with the additional requirements of
-maintaining quality on the general data and minimizing communication costs. We
-propose a novel technique that significantly improves prediction quality on
-users' language compared to a general model and outperforms gradient
-compression methods in terms of communication efficiency. The proposed
-procedure is fast and leads to an almost 70% perplexity reduction and an 8.7
-percentage point improvement in keystroke saving rate on informal English
-texts. We also show that the range of tasks our approach is applicable to is
-not limited to language modeling. Finally, we propose an experimental
-framework for evaluating differential privacy of distributed training of
-language models and show that our approach has good privacy guarantees.
-"
-6420,1712.07512,Anil Kumar Singh and Akhilesh Sudhakar,Ethical Questions in NLP Research: The (Mis)-Use of Forensic Linguistics,cs.CL cs.CY," Ideas from forensic linguistics are now being used frequently in Natural
-Language Processing (NLP), using machine learning techniques. While the role of
-forensic linguistics was more benign earlier, it is now being used for purposes
-which are questionable. Certain methods from forensic linguistics are employed
-without considering their scientific limitations and ethical concerns. While we
-take the specific case of forensic linguistics as an example of such trends in
-NLP and machine learning, the issue is a larger one, present in many other
-scientific and data-driven domains. We suggest that such trends indicate that
-some of the applied sciences are exceeding their legal and scientific briefs.
-We highlight how carelessly implemented practices are serving to short-circuit
-the due processes of law as well as to breach ethical codes.
-"
-6421,1712.07558,"Ioannis Papaioannou, Amanda Cercas Curry, Jose L. Part, Igor
- Shalyminov, Xinnuo Xu, Yanchao Yu, Ond\v{r}ej Du\v{s}ek, Verena Rieser and
- Oliver Lemon",An Ensemble Model with Ranking for Social Dialogue,cs.CL," Open-domain social dialogue is one of the long-standing goals of Artificial
-Intelligence. This year, the Amazon Alexa Prize challenge was announced for the
-first time, where real customers get to rate systems developed by leading
-universities worldwide. The aim of the challenge is to converse ""coherently and
-engagingly with humans on popular topics for 20 minutes"". We describe our Alexa
-Prize system (called 'Alana') consisting of an ensemble of bots, combining
-rule-based and machine learning systems, and using a contextual ranking
-mechanism to choose a system response. The ranker was trained on real user
-feedback received during the competition, where we address the problem of how
-to train on the noisy and sparse feedback obtained during the competition.
-"
-6422,1712.07745,"Sahisnu Mazumder, Bing Liu",Context-aware Path Ranking for Knowledge Base Completion,cs.CL cs.AI," Knowledge base (KB) completion aims to infer missing facts from existing ones
-in a KB. Among various approaches, path ranking (PR) algorithms have received
-increasing attention in recent years. PR algorithms enumerate paths between
-entity pairs in a KB and use those paths as features to train a model for
-missing fact prediction. Due to their good performance and high model
-interpretability, several such methods have been proposed.
However, most existing
-methods suffer from scalability (high RAM consumption) and feature explosion
-(training on an exponentially large number of features) problems. This paper
-proposes a Context-aware Path Ranking (C-PR) algorithm to solve these problems
-by introducing a selective path exploration strategy. C-PR learns global
-semantics of entities in the KB using word embeddings and leverages the
-knowledge of entity semantics to enumerate contextually relevant paths using a
-bidirectional random walk. Experimental results on three large KBs show that
-the path features (fewer in number) discovered by C-PR not only improve
-predictive performance but also are more interpretable than existing baselines.
-"
-6423,1712.07794,Roger T. Dean and Hazel Smith,"The Character Thinks Ahead: creative writing with deep learning nets and
- its stylistic assessment",cs.CL," We discuss how to control outputs from deep learning models of text corpora
-so as to create contemporary poetic works. We assess whether these controls are
-successful in the immediate sense of creating stylometric distinctiveness.
-The specific context is our piece The Character Thinks Ahead (2016/17); the
-potential applications are broad.
-"
-6424,1712.08207,"Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, Pascal Poupart",Variational Attention for Sequence-to-Sequence Models,cs.CL," The variational encoder-decoder (VED) encodes source information as a set of
-random variables using a neural network, which in turn is decoded into target
-data using another neural network. In natural language processing,
-sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder
-networks. When combined with a traditional (deterministic) attention mechanism,
-the variational latent space may be bypassed by the attention model, and thus
-becomes ineffective. In this paper, we propose a variational attention
-mechanism for VED, where the attention vector is also modeled as Gaussian
-distributed random variables. Results on two experiments show that, without
-loss of quality, our proposed method alleviates the bypassing phenomenon as it
-increases the diversity of generated sentences.
-"
-6425,1712.08291,Vivek Kulkarni and William Yang Wang,"TFW, DamnGina, Juvie, and Hotsie-Totsie: On the Linguistic and Social
- Aspects of Internet Slang",cs.CL," Slang is ubiquitous on the Internet. The emergence of new social contexts
-like micro-blogs, question-answering forums, and social networks has enabled
-slang and non-standard expressions to abound on the web. Despite this, slang
-has been traditionally viewed as a form of non-standard language -- a form of
-language that is not the focus of linguistic analysis and has largely been
-neglected. In this work, we use UrbanDictionary to conduct the first
-large-scale linguistic analysis of slang and its social aspects on the Internet
-to yield insights into this variety of language that is increasingly used all
-over the world online.
- We begin by computationally analyzing the phonological, morphological and
-syntactic properties of slang. We then study linguistic patterns in four
-specific categories of slang namely alphabetisms, blends, clippings, and
-reduplicatives. Our analysis reveals that slang demonstrates extra-grammatical
-rules of phonological and morphological formation that markedly distinguish it
-from the standard form, shedding insight into its generative patterns. Next, we
-analyze the social aspects of slang by studying subject restriction and
-stereotyping in slang usage.
Analyzing tens of thousands of such slang words
-reveals that the majority of slang on the Internet belongs to two major
-categories: sex and drugs. We also note that not only is slang usage not
-immune to prevalent social biases and prejudices, but it also reflects such
-biases and stereotypes more intensely than the standard variety.
-"
-6426,1712.08302,"Shun Kiyono, Sho Takase, Jun Suzuki, Naoaki Okazaki, Kentaro Inui,
- Masaaki Nagata",Source-side Prediction for Neural Headline Generation,cs.CL," The encoder-decoder model is widely used in natural language generation
-tasks. However, the model sometimes suffers from repeated redundant generation,
-misses important phrases, and includes irrelevant entities. Toward solving
-these problems, we propose a novel source-side token prediction module. Our
-method jointly estimates the probability distributions over source and target
-vocabularies to capture a correspondence between source and target tokens. The
-experiments show that the proposed model outperforms the current
-state-of-the-art method in the headline generation task. Additionally, we show
-that our method has the ability to learn a reasonable token-wise correspondence
-without knowing any true alignments.
-"
-6427,1712.08349,Leon Derczynski and Matthew Rowe,Tracking the Diffusion of Named Entities,cs.CL cs.SI," Existing studies of how information diffuses across social networks have thus
-far concentrated on analysing and recovering the spread of deterministic
-innovations such as URLs, hashtags, and group membership. However, how mentions
-of real-world entities appear and spread has yet to be explored, largely due to
-the computationally intractable nature of performing large-scale entity
-extraction. In this paper we present, to the best of our knowledge, one of the
-first pieces of work to closely examine the diffusion of named entities on
-social media, using Reddit as our case study platform. We first investigate
-how named entities can be accurately recognised and extracted from discussion
-posts. We then use these extracted entities to study the patterns of entity
-cascades and how the probability of a user adopting an entity (i.e. mentioning
-it) is associated with exposures to the entity. We put these pieces together by
-presenting a parallelised diffusion model that can forecast the probability of
-entity adoption, finding that the influence of adoption between users can be
-characterised by their prior interactions -- as opposed to whether the users
-propagated entity-adoptions beforehand. Our findings have important
-implications for researchers studying influence and language, and for community
-analysts who wish to understand entity-level influence dynamics.
-"
-6428,1712.08439,Jakub Dutkiewicz and Czes{\l}aw J\k{e}drzejek,Novel Ranking-Based Lexical Similarity Measure for Word Embedding,cs.CL," Distributional semantics models derive word space from linguistic items in
-context. Meaning is obtained by defining a distance measure between vectors
-corresponding to lexical entities. Such vectors present several problems. In
-this paper we provide a guideline for post-process improvements to the baseline
-vectors. We focus on refining the similarity aspect, addressing imperfections
-of the model by applying the hubness reduction method, implementing relational
-knowledge into the model, and providing a new ranking similarity definition
-that gives maximum weight to the top-1 component value.
This feature ranking is
-similar to the one used in information retrieval. All these enrichments
-outperform current literature results for the joint ESL and TOEFL set
-comparison. Since single-word embeddings are a basic element of any semantic
-task, one can expect a significant improvement of results for these tasks.
-Moreover, our improved method of text processing can be translated to
-continuous distributed representations of biological sequences for deep
-proteomics and genomics.
-"
-6429,1712.08636,"Yunhao Jiao, Cheng Li, Fei Wu, Qiaozhu Mei",Find the Conversation Killers: a Predictive Study of Thread-ending Posts,cs.CL cs.SI," How to improve the quality of conversations in online communities has
-attracted considerable attention recently. Having engaged, urbane, and reactive
-online conversations has a critical effect on the social life of Internet
-users. In this study, we are particularly interested in identifying a post in a
-multi-party conversation that is unlikely to be further replied to, which
-therefore kills that thread of the conversation. For this purpose, we propose a
-deep learning model called the ConverNet. ConverNet is attractive due to its
-capability of modeling the internal structure of a long conversation and its
-appropriate encoding of the contextual information of the conversation, through
-effective integration of attention mechanisms. Empirical experiments on
-real-world datasets demonstrate the effectiveness of the proposed model. For
-this topic of wide concern, our analysis also offers implications for improving
-the quality and user experience of online conversations.
-"
-6430,1712.08647,Dong Nguyen and Barbara McGillivray and Taha Yasseri,"Emo, Love, and God: Making Sense of Urban Dictionary, a Crowd-Sourced
- Online Dictionary",cs.CL cs.CY cs.SI," The Internet facilitates large-scale collaborative projects, and the
-emergence of Web 2.0 platforms, where producers and consumers of content unify,
-has drastically changed the information market. On the one hand, the promise of
-the ""wisdom of the crowd"" has inspired successful projects such as Wikipedia,
-which has become the primary source of crowd-based information in many
-languages. On the other hand, the decentralized and often un-monitored
-environment of such projects may make them susceptible to low quality content.
-In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary.
-We combine computational methods with qualitative annotation and shed light on
-the overall features of Urban Dictionary in terms of growth, coverage and types
-of content. We measure a high presence of opinion-focused entries, as opposed
-to the meaning-focused entries that we expect from traditional dictionaries.
-Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as
-proper nouns. Urban Dictionary also contains offensive content, but highly
-offensive content tends to receive lower scores through the dictionary's voting
-system. The low threshold to include new material in Urban Dictionary enables
-quick recording of new words and new meanings, but the resulting heterogeneous
-content can pose challenges in using Urban Dictionary as a source to study
-language innovation.
-"
-6431,1712.08697,"Alexander Trott, Caiming Xiong, Richard Socher",Interpretable Counting for Visual Question Answering,cs.AI cs.CL cs.CV," Questions that require counting a variety of objects in images remain a major
-challenge in visual question answering (VQA).
The most common approaches to VQA -involve either classifying answers based on fixed length representations of -both the image and question or summing fractional counts estimated from each -section of the image. In contrast, we treat counting as a sequential decision -process and force our model to make discrete choices of what to count. -Specifically, the model sequentially selects from detected objects and learns -interactions between objects that influence subsequent selections. A -distinction of our approach is its intuitive and interpretable output, as -discrete counts are automatically grounded in the image. Furthermore, our -method outperforms the state of the art architecture for VQA on multiple -metrics that evaluate counting. -" -6432,1712.08793,"Adriana Guevara-Rukoz, Alejandrina Cristia, Bogdan Ludusan, Roland - Thiolli\`ere, Andrew Martin, Reiko Mazuka, Emmanuel Dupoux","Are words easier to learn from infant- than adult-directed speech? A - quantitative corpus-based investigation",cs.CL," We investigate whether infant-directed speech (IDS) could facilitate word -form learning when compared to adult-directed speech (ADS). To study this, we -examine the distribution of word forms at two levels, acoustic and -phonological, using a large database of spontaneous speech in Japanese. At the -acoustic level we show that, as has been documented before for phonemes, the -realizations of words are more variable and less discriminable in IDS than in -ADS. At the phonological level, we find an effect in the opposite direction: -the IDS lexicon contains more distinctive words (such as onomatopoeias) than -the ADS counterpart. Combining the acoustic and phonological metrics together -in a global discriminability score reveals that the bigger separation of -lexical categories in the phonological space does not compensate for the -opposite effect observed at the acoustic level. As a result, IDS word forms are -still globally less discriminable than ADS word forms, even though the effect -is numerically small. We discuss the implication of these findings for the view -that the functional role of IDS is to improve language learnability. -" -6433,1712.08819,"Chris Biemann, Stefano Faralli, Alexander Panchenko, Simone Paolo - Ponzetto","A Framework for Enriching Lexical Semantic Resources with Distributional - Semantics",cs.CL," We present an approach to combining distributional semantic representations -induced from text corpora with manually constructed lexical-semantic networks. -While both kinds of semantic resources are available with high lexical -coverage, our aligned resource combines the domain specificity and availability -of contextual information from distributional models with the conciseness and -high quality of manually crafted lexical networks. We start with a -distributional representation of induced senses of vocabulary terms, which are -accompanied with rich context information given by related lexical items. We -then automatically disambiguate such representations to obtain a full-fledged -proto-conceptualization, i.e. a typed graph of induced word senses. In a final -step, this proto-conceptualization is aligned to a lexical ontology, resulting -in a hybrid aligned resource. Moreover, unmapped induced senses are associated -with a semantic type in order to connect them to the core resource. 
Manual
-evaluations against ground-truth judgments for different stages of our method
-as well as an extrinsic evaluation on a knowledge-based Word Sense
-Disambiguation benchmark all indicate the high quality of the new hybrid
-resource. Additionally, we show the benefits of enriching top-down lexical
-knowledge resources with bottom-up distributional information from text for
-addressing high-end knowledge acquisition tasks such as cleaning hypernym
-graphs and learning taxonomies from scratch.
-"
-6434,1712.08841,"Han He, Lei Wu, Xiaokun Yang, Hua Yan, Zhimin Gao, Yi Feng, George
- Townsend","Dual Long Short-Term Memory Networks for Sub-Character Representation
- Learning",cs.CL," Characters have commonly been regarded as the minimal processing unit in
-Natural Language Processing (NLP). But many non-Latin languages have
-hieroglyphic writing systems, involving a big alphabet with thousands or
-millions of characters. Each character is composed of even smaller parts, which
-are often ignored by previous work. In this paper, we propose a novel
-architecture employing two stacked Long Short-Term Memory Networks (LSTMs) to
-learn sub-character-level representations and capture deeper levels of semantic
-meaning. To build a concrete study and substantiate the efficiency of our
-neural architecture, we take Chinese Word Segmentation as a research case
-example. Among those languages, Chinese is a typical case, for which every
-character contains several components called radicals. Our networks employ a
-shared radical-level embedding to solve both Simplified and Traditional Chinese
-Word Segmentation, without extra Traditional-to-Simplified Chinese conversion,
-in such a highly end-to-end way that word segmentation is significantly
-simplified compared to previous work. Radical-level embeddings can also
-capture deeper semantic meaning below the character level and improve system
-performance. By tying radical and character embeddings together, the parameter
-count is reduced while semantic knowledge is shared and transferred between the
-two levels, substantially boosting performance. On 3 out of 4 Bakeoff 2005
-datasets, our method surpassed state-of-the-art results by up to 0.4%. Our
-results are reproducible; source code and corpora are available on GitHub.
-"
-6435,1712.08862,"Feng Jin, Shiliang Sun",Neural Network Multitask Learning for Traffic Flow Forecasting,cs.LG cs.CL," Traditional neural network approaches for traffic flow forecasting are
-usually single task learning (STL) models, which do not take advantage of the
-information provided by related tasks. In contrast to STL, multitask learning
-(MTL) has the potential to improve generalization by transferring information
-in training signals of extra tasks. In this paper, MTL-based neural networks
-are used for traffic flow forecasting. For neural network MTL, a
-backpropagation (BP) network is constructed by incorporating traffic flows at
-several contiguous time instants into an output layer. Nodes in the output
-layer can be seen as outputs of different but closely related STL tasks.
-Comprehensive experiments on urban vehicular traffic flow data and comparisons
-with STL show that MTL in BP neural networks is a promising and effective
-approach for traffic flow forecasting.
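As an illustration of the multitask BP network described in 1712.08862 above, here is a minimal sketch: one shared hidden layer feeds an output layer whose nodes are traffic flows at several contiguous time instants, so closely related STL tasks are trained jointly by backpropagation. The layer sizes, forecast horizon, and random stand-in data are hypothetical, not the paper's configuration.

import torch
import torch.nn as nn

N_LAGS, HIDDEN, N_TASKS = 12, 32, 3  # hypothetical sizes

model = nn.Sequential(
    nn.Linear(N_LAGS, HIDDEN),   # shared representation across tasks
    nn.Sigmoid(),                # classic BP-style squashing unit
    nn.Linear(HIDDEN, N_TASKS),  # flows at t+1, t+2, t+3 as joint outputs
)

loss_fn = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(64, N_LAGS)    # past flow readings (stand-in data)
y = torch.randn(64, N_TASKS)   # future flows at contiguous instants

for _ in range(100):           # standard backpropagation loop
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

The single loss over all output nodes is what transfers information between the related forecasting tasks; an STL baseline would instead train one such network per horizon.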
-
-"
-6436,1712.08917,"Henrico Bertini Brum, Maria das Gra\c{c}as Volpe Nunes",Building a Sentiment Corpus of Tweets in Brazilian Portuguese,cs.CL," The large amount of data available in social media, forums and websites
-motivates research in several areas of Natural Language Processing, such as
-sentiment analysis. The popularity of the area due to its subjective and
-semantic characteristics motivates research on novel methods and approaches for
-classification. Hence, there is a high demand for datasets on different domains
-and different languages. This paper introduces TweetSentBR, a sentiment corpus
-for Brazilian Portuguese, manually annotated with 15,000 sentences in the TV
-show domain. The sentences were labeled in three classes (positive, neutral and
-negative) by seven annotators, following literature guidelines for ensuring
-annotation reliability. We also ran baseline experiments on polarity
-classification using three machine learning methods, reaching 80.99% F-measure
-and 82.06% accuracy in binary classification, and 59.85% F-measure and 64.62%
-accuracy in three-point classification.
-"
-6437,1712.08933,Danillo da Silva Rocha and Alex Gwo Jen Lan and Ivandre Paraboni,Semi-automatic definite description annotation: a first report,cs.CL," Studies in Referring Expression Generation (REG) often make use of corpora of
-definite descriptions produced by human subjects in controlled experiments.
-Experiments of this kind, which are essential for the study of reference
-phenomena and many others, may however include a considerable amount of noise.
-Human subjects may easily lack attention, or may simply misunderstand the task
-at hand and, as a result, the elicited data may include large proportions of
-ambiguous or ill-formed descriptions. In addition to that, REG corpora are
-usually collected for the study of semantics-related phenomena, and it is often
-the case that the elicited descriptions (and their input contexts) need to be
-annotated with their corresponding semantic properties. This, as in many other
-fields, may require considerable time and skilled annotators. As a means to
-tackle both kinds of difficulties -- poor data quality and high annotation
-costs -- this work discusses a semi-automatic method for the annotation of
-definite descriptions produced by human subjects in REG data collection
-experiments. The method makes use of simple rules to establish associations
-between words and meanings, and is intended to facilitate the design of
-experiments that produce REG corpora.
-"
-6438,1712.08941,Kasturi Dewi Varathan and Anastasia Giachanou and Fabio Crestani,Comparative Opinion Mining: A Review,cs.IR cs.CL," Opinion mining refers to the use of natural language processing, text
-analysis and computational linguistics to identify and extract subjective
-information in textual material. Opinion mining, also known as sentiment
-analysis, has received a lot of attention in recent times, as it provides a
-number of tools to analyse the public opinion on a number of different topics.
-Comparative opinion mining is a subfield of opinion mining that deals with
-identifying and extracting information that is expressed in a comparative form
-(e.g., ""paper X is better than Y""). Comparative opinion mining plays a very
-important role when one tries to evaluate something, as it provides a
-reference point for the comparison. This paper provides a review of the area of
-comparative opinion mining.
It is the first review that specifically covers this
-topic, as all previous reviews dealt mostly with general opinion mining. This
-survey covers comparative opinion mining from two different angles: one from
-the perspective of techniques and the other from the perspective of comparative
-opinion elements. It also covers the preprocessing tools and datasets used by
-past researchers, which can be useful to future researchers in the field of
-comparative opinion mining.
-"
-6439,1712.08992,"Aditya Siddhant, Preethi Jyothi, Sriram Ganapathy","Leveraging Native Language Speech for Accent Identification using Deep
- Siamese Networks",cs.CL cs.LG cs.SD eess.AS," The problem of automatic accent identification is important for several
-applications like speaker profiling and recognition as well as for improving
-speech recognition systems. The accented nature of speech can be primarily
-attributed to the influence of the speaker's native language on the given
-speech recording. In this paper, we propose a novel accent identification
-system whose training exploits speech in native languages along with the
-accented speech. Specifically, we develop a deep Siamese network-based model
-which learns the association between accented speech recordings and the native
-language speech recordings. The Siamese networks are trained with i-vector
-features extracted from the speech recordings using either an unsupervised
-Gaussian mixture model (GMM) or a supervised deep neural network (DNN) model.
-We perform several accent identification experiments using the CSLU Foreign
-Accented English (FAE) corpus. In these experiments, our proposed approach
-using deep Siamese networks yields a significant relative performance
-improvement of 15.4 percent on a 10-class accent identification task, over a
-baseline DNN-based classification system that uses GMM i-vectors. Furthermore,
-we present a detailed error analysis of the proposed accent identification
-system.
-"
-6440,1712.09127,"Baiyang Wang, Diego Klabjan",Generative Adversarial Nets for Multiple Text Corpora,cs.CL," Generative adversarial nets (GANs) have been successfully applied to the
-artificial generation of image data. In terms of text data, much has been done
-on the artificial generation of natural language from a single corpus. We
-consider multiple text corpora as the input data, for which there can be two
-applications of GANs: (1) the creation of consistent cross-corpus word
-embeddings given different word embeddings per corpus; (2) the generation of
-robust bag-of-words document embeddings for each corpus. We demonstrate our
-GAN models on real-world text data sets from different corpora, and show that
-embeddings from both models lead to improvements in supervised learning
-problems.
-"
-6441,1712.09185,"Chu-Cheng Lin, Dongyeop Kang, Michael Gamon, Madian Khabsa, Ahmed
- Hassan Awadallah, Patrick Pantel",Actionable Email Intent Modeling with Reparametrized RNNs,cs.CL," Emails in the workplace are often intentional calls to action for their
-recipients. We propose to annotate these emails for the action their recipients
-will take. We argue that our approach of action-based annotation is more
-scalable and theory-agnostic than traditional speech-act-based email intent
-annotation, while still carrying important semantic and pragmatic information.
-We show that our action-based annotation scheme achieves good inter-annotator
-agreement.
We also show that we can leverage threaded messages from other
-domains, which exhibit comparable intents in their conversation, with domain
-adaptive RAINBOW (Recurrently AttentIve Neural Bag-Of-Words). On a collection
-of datasets consisting of IRC, Reddit, and email, our reparametrized RNNs
-outperform common multitask/multidomain approaches on several speech-act
-related tasks. We also experiment with a minimally supervised scenario of email
-recipient action classification, and find the reparametrized RNNs learn a
-useful representation.
-"
-6442,1712.09359,Renato Fabbri,"Basic concepts and tools for the Toki Pona minimal and constructed
- language: description of the language and main issues; analysis of the
- vocabulary; text synthesis and syntax highlighting; Wordnet synsets",cs.CY cs.CL," A minimal constructed language (conlang) is useful for experiments and
-comfortable for making tools. The Toki Pona (TP) conlang is minimal both in the
-vocabulary (with only 14 letters and 124 lemmas) and in the (about) 10 syntax
-rules. The language is useful because it is a minimal conlang in actual use and
-somewhat established, with at least hundreds of fluent speakers. This article
-exposes current concepts and resources for TP, and makes available Python (and
-Vim) scripted routines for the analysis of the language, synthesis of texts,
-syntax highlighting schemes, and the achievement of a preliminary TP Wordnet.
-The focus is on the analysis of the basic vocabulary, as corpus analyses were
-found. The synthesis is based on sentence templates, relates to context by
-keeping track of used words, and renders larger texts by using a fixed number
-of phonemes (e.g. for poems) and number of sentences, words and letters (e.g.
-for paragraphs). Syntax highlighting reflects morphosyntactic classes given in
-the official dictionary, and different solutions are described and implemented
-in the well-established Vim text editor. The tentative TP Wordnet is made
-available in three patterns of relations between synsets and word lemmas. In
-summary, this text holds potentially novel conceptualizations about, and tools
-and results in analyzing, synthesizing and syntax highlighting the TP language.
-"
-6443,1712.09391,Subhro Roy and Dan Roth,Mapping to Declarative Knowledge for Word Problem Solving,cs.CL," Math word problems form a natural abstraction to a range of quantitative
-reasoning problems, such as understanding financial news, sports results, and
-casualties of war. Solving such problems requires the understanding of several
-mathematical concepts such as dimensional analysis, subset relationships, etc.
-In this paper, we develop declarative rules which govern the translation of
-natural language description of these concepts to math expressions. We then
-present a framework for incorporating such declarative knowledge into word
-problem solving. Our method learns to map arithmetic word problem text to math
-expressions, by learning to select the relevant declarative knowledge for each
-operation of the solution expression. This provides a way to handle multiple
-concepts in the same problem while, at the same time, support interpretability
-of the answer expression. Our method models the mapping to declarative
-knowledge as a latent variable, thus removing the need for expensive
-annotations.
Experimental evaluation suggests that our domain-knowledge-based
-solver outperforms all other systems, and that it generalizes better in the
-realistic case where the training data it is exposed to is biased in a
-different way than the test data.
-"
-6444,1712.09405,"Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch,
- Armand Joulin",Advances in Pre-Training Distributed Word Representations,cs.CL," Many Natural Language Processing applications nowadays rely on pre-trained
-word representations estimated from large text corpora such as news
-collections, Wikipedia and Web Crawl. In this paper, we show how to train
-high-quality word vector representations by using a combination of known tricks
-that are however rarely used together. The main result of our work is the new
-set of publicly available pre-trained models that outperform the current state
-of the art by a large margin on a number of tasks.
-"
-6445,1712.09444,"Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert",Letter-Based Speech Recognition with Gated ConvNets,cs.CL cs.AI," In the recent literature, ""end-to-end"" speech systems often refer to
-letter-based acoustic models trained in a sequence-to-sequence manner, either
-via a recurrent model or via a structured output learning approach (such as
-CTC). In contrast to traditional phone (or senone)-based approaches, these
-""end-to-end"" approaches alleviate the need for word pronunciation modeling, and
-do not require a ""forced alignment"" step at training time. Phone-based
-approaches, however, remain the state of the art on classical benchmarks. In
-this paper, we propose a letter-based speech recognition system, leveraging a
-ConvNet acoustic model. Key ingredients of the ConvNet are Gated Linear Units
-and high dropout. The ConvNet is trained to map audio sequences to their
-corresponding letter transcriptions, either via a classical CTC approach, or
-via a recent variant called ASG. Coupled with a simple decoder at inference
-time, our system matches the best existing letter-based systems on WSJ (in word
-error rate), and shows near state of the art performance on LibriSpeech.
-"
-6446,1712.09509,"Zhiqing Sun, Gehui Shen, Zhihong Deng","A Gap-Based Framework for Chinese Word Segmentation via Very Deep
- Convolutional Networks",cs.CL," Most previous approaches to Chinese word segmentation can be roughly
-classified into character-based and word-based methods. The former regards this
-task as a sequence-labeling problem, while the latter directly segments
-character sequence into words. However, if we consider segmenting a given
-sentence, the most intuitive idea is to predict whether to segment for each gap
-between two consecutive characters, which in comparison makes previous
-approaches seem too complex. Therefore, in this paper, we propose a gap-based
-framework to implement this intuitive idea. Moreover, very deep convolutional
-neural networks, namely, ResNets and DenseNets, are exploited in our
-experiments. Results show that our approach outperforms the best
-character-based and word-based methods on 5 benchmarks, without any further
-post-processing module (e.g. Conditional Random Fields) nor beam search.
-"
-6447,1712.09518,"Salman Ahmad Ansari, Usman Zafar and Asim Karim",Improving Text Normalization by Optimizing Nearest Neighbor Matching,cs.CL," Text normalization is an essential task in the processing and analysis of
-social media that is dominated by informal writing. It aims to map informal
-words to their intended standard forms.
Previously proposed text normalization
-approaches typically require manual selection of parameters for improved
-performance. In this paper, we present an automatic optimization-based nearest
-neighbor matching approach for text normalization. This approach is motivated
-by the observation that text normalization is essentially a matching problem,
-and that nearest neighbor matching with an adaptive similarity function is the
-most direct procedure for it. Our similarity function incorporates weighted
-contributions of contextual, string, and phonetic similarity, and the nearest
-neighbor matching involves a minimum similarity threshold. These four
-parameters are tuned efficiently using grid search. We evaluate the performance
-of our approach on two benchmark datasets. The results demonstrate that
-parameter tuning on small labeled datasets produces state-of-the-art text
-normalization performance. Thus, this approach allows practically easy
-construction of evolving domain-specific normalization lexicons.
-"
-6448,1712.09662,"Qiming Chen, Ren Wu",CNN Is All You Need,cs.CL cs.LG cs.NE," The Convolutional Neural Network (CNN) has demonstrated unique advantages in
-audio, image and text learning; recently it has also challenged Recurrent
-Neural Networks (RNNs) with long short-term memory cells (LSTM) in
-sequence-to-sequence learning, since the computations involved in CNN are
-easily parallelizable whereas those involved in RNN are mostly sequential,
-leading to a performance bottleneck. However, unlike RNN, the native CNN lacks
-the history sensitivity required for sequence transformation; therefore
-enhancing the sequential order awareness, or position-sensitivity, becomes the
-key to making CNN a general deep learning model. In this work we introduce an
-extended CNN model with strengthened position-sensitivity, called PoseNet. A
-notable feature of PoseNet is the asymmetric treatment of position information
-in the encoder and the decoder. Experiments show that PoseNet allows us to
-improve the accuracy of CNN-based sequence-to-sequence learning significantly,
-achieving around 33-36 BLEU scores on the WMT 2014 English-to-German
-translation task, and around 44-46 BLEU scores on the English-to-French
-translation task.
-"
-6449,1712.09687,"Tim Rockt\""aschel",Combining Representation Learning with Logic for Language Processing,cs.NE cs.CL cs.LG cs.LO," The current state-of-the-art in many natural language processing and
-automated knowledge base completion tasks is held by representation learning
-methods which learn distributed vector representations of symbols via
-gradient-based optimization. They require little or no hand-crafted features,
-thus avoiding the need for most preprocessing steps and task-specific
-assumptions. However, in many cases representation learning requires a large
-amount of annotated training data to generalize well to unseen data. Such
-labeled training data is provided by human annotators who often use formal
-logic as the language for specifying annotations. This thesis investigates
-different combinations of representation learning methods with logic for
-reducing the need for annotated training data, and for improving
-generalization.
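As an illustration of the nearest-neighbor normalization described in 1712.09518 above, here is a minimal sketch: candidates are scored by a weighted sum of contextual, string, and phonetic similarity, and the best match is returned only if it clears a minimum similarity threshold. The weights, the threshold, and the crude phonetic proxy are hypothetical stand-ins; in the paper all four parameters are tuned by grid search and the similarity components are more refined.

from difflib import SequenceMatcher

W_CONTEXT, W_STRING, W_PHONETIC = 0.4, 0.4, 0.2  # hypothetical weights
MIN_SIM = 0.6                                    # hypothetical threshold

def string_sim(a: str, b: str) -> float:
    # Character-level similarity in [0, 1].
    return SequenceMatcher(None, a, b).ratio()

def phonetic_sim(a: str, b: str) -> float:
    # Crude phonetic proxy: compare consonant skeletons.
    skel = lambda w: w[:1] + "".join(c for c in w[1:] if c not in "aeiou")
    return SequenceMatcher(None, skel(a), skel(b)).ratio()

def cosine(u, v):
    # Contextual similarity between embedding vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(y * y for y in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def normalize(word, lexicon, vectors):
    """Map an informal word to its nearest standard form, or keep it."""
    best, best_score = word, 0.0
    for cand in lexicon:
        score = (W_CONTEXT * cosine(vectors.get(word, []), vectors.get(cand, []))
                 + W_STRING * string_sim(word, cand)
                 + W_PHONETIC * phonetic_sim(word, cand))
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= MIN_SIM else word

The threshold is what lets already-standard or out-of-lexicon words pass through unchanged instead of being forced onto their nearest neighbor.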
-" -6450,1712.09783,"Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, - Sanjeev Satheesh, Lawrence Carin",Topic Compositional Neural Language Model,cs.LG cs.CL," We propose a Topic Compositional Neural Language Model (TCNLM), a novel -method designed to simultaneously capture both the global semantic meaning and -the local word ordering structure in a document. The TCNLM learns the global -semantic coherence of a document via a neural topic model, and the probability -of each learned latent topic is further used to build a Mixture-of-Experts -(MoE) language model, where each expert (corresponding to one topic) is a -recurrent neural network (RNN) that accounts for learning the local structure -of a word sequence. In order to train the MoE model efficiently, a matrix -factorization method is applied, by extending each weight matrix of the RNN to -be an ensemble of topic-dependent weight matrices. The degree to which each -member of the ensemble is used is tied to the document-dependent probability of -the corresponding topics. Experimental results on several corpora show that the -proposed approach outperforms both a pure RNN-based model and other -topic-guided language models. Further, our model yields sensible topics, and -also has the capacity to generate meaningful sentences conditioned on given -topics. -" -6451,1712.09827,Guy Danon and Mark Last,A Syntactic Approach to Domain-Specific Automatic Question Generation,cs.CL," Factoid questions are questions that require short fact-based answers. -Automatic generation (AQG) of factoid questions from a given text can -contribute to educational activities, interactive question answering systems, -search engines, and other applications. The goal of our research is to generate -factoid source-question-answer triplets based on a specific domain. We propose -a four-component pipeline, which obtains as input a training corpus of -domain-specific documents, along with a set of declarative sentences from the -same domain, and generates as output a set of factoid questions that refer to -the source sentences but are slightly different from them, so that a -question-answering system or a person can be asked a question that requires a -deeper understanding and knowledge than a simple word-matching. Contrary to -existing domain-specific AQG systems that utilize the template-based approach -to question generation, we propose to transform each source sentence into a set -of questions by applying a series of domain-independent rules (a -syntactic-based approach). Our pipeline was evaluated in the domain of cyber -security using a series of experiments on each component of the pipeline -separately and on the end-to-end system. The proposed approach generated a -higher percentage of acceptable questions than a prior state-of-the-art AQG -system. -" -6452,1712.09929,"Karan Grewal, Khai N. Truong",On the Challenges of Detecting Rude Conversational Behaviour,cs.HC cs.CL," In this study, we aim to identify moments of rudeness between two -individuals. In particular, we segment all occurrences of rudeness in -conversations into three broad, distinct categories and try to identify each. -We show how machine learning algorithms can be used to identify rudeness based -on acoustic and semantic signals extracted from conversations. Furthermore, we -make note of our shortcomings in this task and highlight what makes this -problem inherently difficult. 
Finally, we provide next steps which are needed
-to ensure further success in identifying rudeness in conversations.
-"
-6453,1712.09943,Sungjin Lee,Toward Continual Learning for Conversational Agents,cs.CL cs.AI cs.HC," While end-to-end neural conversation models have led to promising advances in
-reducing hand-crafted features and errors induced by the traditional complex
-system architecture, they typically require an enormous amount of data due to
-the lack of modularity. Previous studies adopted a hybrid approach with
-knowledge-based components either to abstract out domain-specific information
-or to augment data to cover more diverse patterns. On the contrary, we propose
-to directly address the problem using recent developments in the space of
-continual learning for neural models. Specifically, we adopt a
-domain-independent neural conversational model and introduce a novel neural
-continual learning algorithm that allows a conversational agent to accumulate
-skills across different tasks in a data-efficient way. To the best of our
-knowledge, this is the first work that applies continual learning to
-conversation systems. We verified the efficacy of our method through a
-conversational skill transfer from either synthetic dialogs or human-human
-dialogs to human-computer conversations in a customer support domain.
-"
-6454,1712.10054,"Edgar Altszyler, Mariano Sigman and Diego Fernandez Slezak","Corpus specificity in LSA and Word2vec: the role of out-of-domain
- documents",cs.CL cs.AI," Latent Semantic Analysis (LSA) and Word2vec are some of the most widely used
-word embeddings. Despite the popularity of these techniques, the precise
-mechanisms by which they acquire new semantic relations between words remain
-unclear. In the present article we investigate whether LSA's and Word2vec's
-capacity to identify relevant semantic dimensions increases with corpus size.
-One intuitive hypothesis is that the capacity to identify relevant dimensions
-should increase as the amount of data increases. However, if the corpus grows
-in topics which are not specific to the domain of interest, the signal-to-noise
-ratio may weaken. Here we set out to examine and distinguish between these
-alternative hypotheses. To investigate the effect of corpus specificity and
-size in word embeddings, we study two ways of progressively eliminating
-documents: the elimination of random documents vs. the elimination of documents
-unrelated to a specific task. We show that Word2vec can take advantage of all
-the documents, obtaining its best performance when it is trained with the whole
-corpus. On the contrary, the specialization (removal of out-of-domain
-documents) of the training corpus, accompanied by a decrease of dimensionality,
-can increase LSA word-representation quality while speeding up the processing
-time. Furthermore, we show that the specialization without the decrease in LSA
-dimensionality can produce a strong performance reduction in specific tasks.
-From a cognitive-modeling point of view, we point out that LSA's word-knowledge
-acquisition may not efficiently exploit higher-order co-occurrences and global
-relations, whereas Word2vec does.
-"
-6455,1712.10066,"Maria Larsson, Amanda Nilsson, Mikael K{\aa}geb\""ack",Disentangled Representations for Manipulation of Sentiment in Text,cs.CL," The ability to change arbitrary aspects of a text while leaving the core
-message intact could have a strong impact in fields like marketing and politics
-by enabling e.g.
automatic optimization of message impact and personalized
-language adapted to the receiver's profile. In this paper we take a first step
-towards such a system by presenting an algorithm that can manipulate the
-sentiment of a text while preserving its semantics using disentangled
-representations. Validation is performed by examining trajectories in embedding
-space and analyzing transformed sentences for semantic preservation and
-expression of the desired sentiment shift.
-"
-6456,1712.10190,Victor Thompson,Detecting Cross-Lingual Plagiarism Using Simulated Word Embeddings,cs.CL cs.IR," Cross-lingual plagiarism (CLP) occurs when texts written in one language are
-translated into a different language and used without acknowledging the
-original sources. One of the most common methods for detecting CLP requires
-online machine translators (such as Google or Microsoft translate) which are
-not always available, and given that plagiarism detection typically involves
-large document comparison, the amount of translations required would overwhelm
-an online machine translator, especially when detecting plagiarism over the
-web. In addition, when translated texts are replaced with their synonyms, using
-online machine translators to detect CLP would result in poor performance. This
-paper addresses the problem of cross-lingual plagiarism detection (CLPD) by
-proposing a model that uses simulated word embeddings to reproduce the
-predictions of an online machine translator (Google translate) when detecting
-CLP. The simulated embeddings comprise translated words in different languages
-mapped into a common space, and replicated to increase the prediction
-probability of retrieving the translations of a word (and their synonyms) from
-the model. Unlike most existing models, the proposed model does not require
-parallel corpora, and accommodates multiple languages (multi-lingual). We
-demonstrated the effectiveness of the proposed model in detecting CLP in
-standard datasets that contain CLP cases, and evaluated its performance against
-a state-of-the-art baseline that relies on an online machine translator (the
-T+MA model). Evaluation results revealed that the proposed model is not only
-effective in detecting CLP but also outperforms the baseline. The results
-indicate that CLP could be detected with state-of-the-art performance by
-leveraging the prediction accuracy of an internet translator with word
-embeddings, without relying on internet translators.
-"
-6457,1712.10224,"Abhinav Rastogi, Dilek Hakkani-Tur, Larry Heck",Scalable Multi-Domain Dialogue State Tracking,cs.CL," Dialogue state tracking (DST) is a key component of task-oriented dialogue
-systems. DST estimates the user's goal at each user turn given the interaction
-until then. State of the art approaches for state tracking rely on deep
-learning methods, and represent dialogue state as a distribution over all
-possible slot values for each slot present in the ontology. Such a
-representation is not scalable when the set of possible values is unbounded
-(e.g., date, time or location) or dynamic (e.g., movies or usernames).
-Furthermore, training of such models requires labeled data, where each user
-turn is annotated with the dialogue state, which makes building models for new
-domains challenging. In this paper, we present a scalable multi-domain
-deep-learning-based approach for DST.
We introduce a novel framework for state
-tracking which is independent of the slot value set, and represent the dialogue
-state as a distribution over a set of values of interest (candidate set)
-derived from the dialogue history or knowledge. Restricting these candidate
-sets to be bounded in size addresses the problem of slot-scalability.
-Furthermore, by leveraging the slot-independent architecture and transfer
-learning, we show that our proposed approach facilitates quick adaptation to
-new domains.
-"
-6458,1712.10309,Victor Thompson,Methods for Detecting Paraphrase Plagiarism,cs.IR cs.CL," Paraphrase plagiarism is one of the difficult challenges facing plagiarism
-detection systems. Paraphrasing occurs when texts are lexically or
-syntactically altered to look different, but retain their original meaning.
-Most plagiarism detection systems (many of which are commercially based) are
-designed to detect word co-occurrences and light modifications, but are unable
-to detect severe semantic and structural alterations such as what is seen in
-many academic documents. Hence many paraphrase plagiarism cases go undetected.
-In this paper, we approached the problem of paraphrase plagiarism by proposing
-methods for detecting the most common techniques (phenomena) used in
-paraphrasing texts (namely, lexical substitution, insertion/deletion, and word
-and phrase reordering), and combined the methods into a paraphrase detection
-model. We evaluated our proposed methods and model on collections containing
-paraphrase texts. Experimental results show significant improvement in
-performance when the methods were combined (the proposed model) as opposed to
-running them individually. The results also show that the proposed paraphrase
-detection model outperformed a standard baseline (based on greedy string
-tiling), and previous studies.
-"
-6459,1801.00049,Ama\c{c} Herda\u{g}delen,Personal Names in Modern Turkey,cs.CL," We analyzed the most common 5000 male and 5000 female Turkish names based on
-their etymological, morphological, and semantic attributes. The name statistics
-are based on all Turkish citizens who were alive in 2014 and they cover 90% of
-the total population. To the best of our knowledge, this study is the most
-comprehensive data-driven analysis of Turkish personal names. Female names have
-a greater diversity than male names (e.g., top 15 male names cover 25% of the
-male population, whereas top 28 female names cover 25% of the female
-population). Despite their diversity, female names exhibit predictable
-patterns. For example, certain roots such as g\""ul and nar (rose and
-pomegranate/red, respectively) are used to generate hundreds of unique female
-names. Turkish personal names have their origins mainly in Arabic, followed by
-Turkish and Persian. We computed overall frequencies of names according to
-broad semantic themes that were identified in previous studies. We found that
-foreign-origin names such as olga and khaled, pastoral names such as ya\u{g}mur
-and deniz (rain and sea, respectively), and names based on fruits and plants
-such as filiz and menek\c{s}e (sprout and violet, respectively) are more
-frequently observed among females. Among males, names based on animals such as
-arslan and yunus (lion and dolphin, respectively) and names based on famous
-and/or historical figures such as mustafa kemal and o\u{g}uz ka\u{g}an (founder
-of the Turkish Republic and the founder of the Turks in Turkish mythology,
-respectively) are observed more frequently.
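As an illustration of the candidate-set idea from the scalable DST approach in 1712.10224 above, here is a minimal sketch: instead of a distribution over every possible slot value, the tracker scores only a bounded set of candidates harvested from the dialogue history. The recency-based scorer, the size cap, and the toy data are hypothetical stand-ins, not the paper's model.

import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def track_slot(history_tokens, known_values, scorer, max_candidates=5):
    """Return a distribution over a bounded candidate set for one slot."""
    # Candidates = slot values actually mentioned in the dialogue so far.
    candidates = [v for v in known_values if v in history_tokens]
    candidates = candidates[:max_candidates]  # keep the set bounded
    if not candidates:
        return {}
    probs = softmax([scorer(v, history_tokens) for v in candidates])
    return dict(zip(candidates, probs))

# Toy usage with a hypothetical recency-based scorer.
history = "book a table at luigis no wait at marios".split()
scorer = lambda v, h: h.index(v)  # later mention -> higher score
print(track_slot(history, {"luigis", "marios", "panda"}, scorer))

Because the distribution is defined over mentioned values rather than the full ontology, the same tracker works for unbounded or dynamic slots and transfers to new domains.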
-"
-6460,1801.00059,"Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane",The CAPIO 2017 Conversational Speech Recognition System,cs.CL," In this paper we show how we have achieved the state-of-the-art performance
-on the industry-standard NIST 2000 Hub5 English evaluation set. We explore
-densely connected LSTMs, inspired by the densely connected convolutional
-networks recently introduced for image classification tasks. We also propose an
-acoustic model adaptation scheme that simply averages the parameters of a seed
-neural network acoustic model and its adapted version. This method was applied
-with the CallHome training corpus and improved individual system performances
-by an average of 6.1% (relative) against the CallHome portion of the evaluation
-set with no performance loss on the Switchboard portion. With RNN-LM rescoring
-and lattice combination on the 5 systems trained across three different phone
-sets, our 2017 speech recognition system has obtained 5.0% and 9.1% on
-Switchboard and CallHome, respectively, both of which are the best word error
-rates reported thus far. According to IBM in their latest work to compare human
-and machine transcriptions, our reported Switchboard word error rate can be
-considered to surpass the human parity (5.1%) of transcribing conversational
-telephone speech.
-"
-6461,1801.00076,"Tong Guo, Huilin Gao",Bidirectional Attention for SQL Generation,cs.CL," Generating structured query language (SQL) queries from natural language is a
-long-standing open problem. Answering a natural language question about a
-database table requires modeling complex interactions between the columns of
-the table and the question. In this paper, we apply the synthesizing approach
-to solve this problem. Based on the structure of SQL queries, we break down the
-model into three sub-modules and design specific deep neural networks for each of
-them. Taking inspiration from the similar machine reading task, we employ the
-bidirectional attention mechanisms and character-level embedding with
-convolutional neural networks (CNNs) to improve the result. Experimental
-evaluations show that our model achieves state-of-the-art results on the
-WikiSQL dataset.
-"
-6462,1801.00102,"Yi Tay, Luu Anh Tuan, Siu Cheung Hui","Compare, Compress and Propagate: Enhancing Neural Architectures with
- Alignment Factorization for Natural Language Inference",cs.CL cs.AI," This paper presents a new deep learning architecture for Natural Language
-Inference (NLI). Firstly, we introduce a new architecture where alignment pairs
-are compared, compressed and then propagated to upper layers for enhanced
-representation learning. Secondly, we adopt factorization layers for efficient
-and expressive compression of alignment vectors into scalar features, which are
-then used to augment the base word representations. The design of our approach
-is aimed to be conceptually simple, compact and yet powerful. We conduct
-experiments on three popular benchmarks, SNLI, MultiNLI and SciTail, achieving
-competitive performance on all. A lightweight parameterization of our model
-also enjoys a $\approx 3$ times reduction in parameter size compared to the
-existing state-of-the-art models, e.g., ESIM and DIIN, while maintaining
-competitive performance. Additionally, visual analysis shows that our
-propagated features are highly interpretable.
-"
-6463,1801.00168,Ramon Ferrer-i-Cancho and Michael S. 
Vitevitch,The origins of Zipf's meaning-frequency law,cs.CL cs.IT math.IT physics.soc-ph," In his pioneering research, G. K. Zipf observed that more frequent words tend
-to have more meanings, and showed that the number of meanings of a word grows
-as the square root of its frequency. He derived this relationship from two
-assumptions: that words follow Zipf's law for word frequencies (a power law
-dependency between frequency and rank) and Zipf's law of meaning distribution
-(a power law dependency between number of meanings and rank). Here we show that
-a single assumption on the joint probability of a word and a meaning suffices
-to infer Zipf's meaning-frequency law or relaxed versions. Interestingly, this
-assumption can be justified as the outcome of a biased random walk in the
-process of mental exploration.
-"
-6464,1801.00215,"Simon Stiebellehner, Jun Wang, Shuai Yuan","Learning Continuous User Representations through Hybrid Filtering with
- doc2vec",cs.IR cs.AI cs.CL," Players in the online ad ecosystem are struggling to acquire the user data
-required for precise targeting. Audience look-alike modeling has the potential
-to alleviate this issue, but models' performance strongly depends on the
-quantity and quality of available data. In order to maximize the predictive
-performance of our look-alike modeling algorithms, we propose two novel hybrid
-filtering techniques that utilize the recent neural probabilistic language
-model algorithm doc2vec. We apply these methods to data from a large mobile ad
-exchange and additional app metadata acquired from the Apple App store and
-Google Play store. First, we model mobile app users through their app usage
-histories and app descriptions (user2vec). Second, we introduce context
-awareness to that model by incorporating additional user and app-related
-metadata in model training (context2vec). Our findings are threefold: (1) the
-quality of recommendations provided by user2vec is notably higher than that of
-current state-of-the-art techniques. (2) User representations generated through
-hybrid filtering using doc2vec prove to be highly valuable features in supervised
-machine learning models for look-alike modeling. This represents the first
-application of hybrid filtering user models using neural probabilistic language
-models, specifically doc2vec, in look-alike modeling. (3) Incorporating context
-metadata in the doc2vec model training process to introduce context awareness
-has positive effects on performance and is superior to directly including the
-data as features in the downstream supervised models.
-"
-6465,1801.00254,"Youngsam Kim, Hyopil Shin","A New Approach for Measuring Sentiment Orientation based on
- Multi-Dimensional Vector Space",cs.CL," This study implements a vector space model approach to measure the sentiment
-orientations of words. Two representative vectors for positive/negative
-polarity are constructed using high-dimensional vector space in both an
-unsupervised and a semi-supervised manner. A sentiment orientation value per
-word is determined by taking the difference between the cosine distances
-against the two reference vectors. These two conditions (unsupervised and
-semi-supervised) are compared against an existing unsupervised method (Turney,
-2002). As a result of our experiment, we demonstrate that this novel approach
-significantly outperforms the previous unsupervised approach and is more
-practical and data efficient as well. 
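A minimal sketch of the scoring rule described in the sentiment-orientation abstract above (1801.00254): a word's orientation is the difference of its cosine similarities to a positive and a negative reference vector. The embeddings and seed words below are toy assumptions; the study builds its reference vectors from a high-dimensional space.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = {
    "excellent": np.array([0.9, 0.1]), "terrible": np.array([0.1, 0.9]),
    "good": np.array([0.8, 0.2]),      "awful": np.array([0.2, 0.8]),
    "decent": np.array([0.6, 0.4]),
}
# Reference vectors as averages of polarity seed words.
pos_ref = np.mean([embeddings["excellent"], embeddings["good"]], axis=0)
neg_ref = np.mean([embeddings["terrible"], embeddings["awful"]], axis=0)

def orientation(word):
    v = embeddings[word]
    return cosine(v, pos_ref) - cosine(v, neg_ref)

print(orientation("decent"))  # > 0, i.e. mildly positive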
-"
-6466,1801.00388,"Walid Shalaby, Wlodek Zadrozny, and Hongxia Jin","Beyond Word Embeddings: Learning Entity and Concept Representations from
- Large Scale Knowledge Bases",cs.CL cs.AI cs.IR cs.SI," Text representations using neural word embeddings have proven effective in
-many NLP applications. Recent research adapts the traditional word embedding
-models to learn vectors of multiword expressions (concepts/entities). However,
-these methods are limited to textual knowledge bases (e.g., Wikipedia). In this
-paper, we propose a novel and simple technique for integrating the knowledge
-about concepts from two large scale knowledge bases of different structure
-(Wikipedia and Probase) in order to learn concept representations. We adapt the
-efficient skip-gram model to seamlessly learn from the knowledge in Wikipedia
-text and the Probase concept graph. We evaluate our concept embedding models on
-two tasks: (1) analogical reasoning, where we achieve a state-of-the-art
-performance of 91% on semantic analogies, (2) concept categorization, where we
-achieve a state-of-the-art performance on two benchmark datasets, achieving
-categorization accuracy of 100% on one and 98% on the other. Additionally, we
-present a case study to evaluate our model on unsupervised argument type
-identification for neural semantic parsing. We demonstrate the competitive
-accuracy of our unsupervised method and its ability to better generalize to out
-of vocabulary entity mentions compared to the tedious and error-prone methods
-which depend on gazetteers and regular expressions.
-"
-6467,1801.00409,"Haris Bin Zia, Agha Ali Raza, Awais Athar",PronouncUR: An Urdu Pronunciation Lexicon Generator,cs.CL," State-of-the-art speech recognition systems rely heavily on three basic
-components: an acoustic model, a pronunciation lexicon and a language model. To
-build these components, a researcher needs linguistic as well as technical
-expertise, which is a barrier in low-resource domains. Techniques to construct
-these three components without having expert domain knowledge are in great
-demand. Urdu, despite having millions of speakers all over the world, is a
-low-resource language in terms of standard publicly available linguistic
-resources. In this paper, we present a grapheme-to-phoneme conversion tool for
-Urdu that generates a pronunciation lexicon in a form suitable for use with
-speech recognition systems from a list of Urdu words. The tool predicts the
-pronunciation of words using an LSTM-based model trained on a handcrafted expert
-lexicon of around 39,000 words and shows an accuracy of 64% upon internal
-evaluation. For external evaluation on a speech recognition task, we obtain a
-word error rate comparable to one achieved using a fully handcrafted expert
-lexicon.
-"
-6468,1801.00428,"Rahul Aralikatte, Neelamadhav Gantayat, Naveen Panwar, Anush Sankaran,
- Senthil Mani",Sanskrit Sandhi Splitting using seq2(seq)^2,cs.CL," In Sanskrit, small words (morphemes) are combined to form compound words
-through a process known as Sandhi. Sandhi splitting is the process of splitting
-a given compound word into its constituent morphemes. Although rules governing
-word splitting exist in the language, it is highly challenging to identify the
-location of the splits in a compound word. Though existing Sandhi splitting
-systems incorporate these pre-defined splitting rules, they have low accuracy
-as the same compound word might be broken down in multiple ways to provide
-syntactically correct splits. 
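The speech recognition abstracts above (1801.00059, 1801.00409) report results as word error rates. For reference, WER is the length-normalized Levenshtein distance between the reference and hypothesis word sequences; a standard textbook implementation (not taken from either paper) is:

def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference and j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words = 0.33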
- In this research, we propose a novel deep learning architecture called Double
-Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95%
-accuracy, and (ii) predicts the constituent words (learning the Sandhi
-splitting rules) with 79.5% accuracy, outperforming the state-of-the-art by 20%.
-Additionally, we demonstrate the generalization capability of our deep learning
-model by showing competitive results in the problem of Chinese word
-segmentation, as well.
-"
-6469,1801.00453,"Akzharkyn Izbassarova, Aidana Irmanova, A. P. James","Automated rating of recorded classroom presentations using speech
- analysis in kazakh",cs.CL cs.AI," Effective presentation skills can help to succeed in business, career and
-academia. This paper presents the design of speech assessment during the oral
-presentation and the algorithm for speech evaluation based on criteria of
-optimal intonation. As the pace of the speech and its optimal intonation vary
-from language to language, developing an automatic identification of language
-during the presentation is required. The proposed algorithm was tested with
-presentations delivered in the Kazakh language. For testing purposes, the
-features of Kazakh phonemes were extracted using MFCC and PLP methods and a
-Hidden Markov Model (HMM) [5] of Kazakh phonemes was created. Kazakh vowel
-formants were defined and the correlation between the deviation rate in
-fundamental frequency and the liveliness of the speech was analyzed to evaluate
-the intonation of the presentation. It was established that the threshold value
-between monotone and dynamic speech is 0.16 and the error for intonation
-evaluation is 19%.
-"
-6470,1801.00532,"Shaonan Wang, Jiajun Zhang, Chengqing Zong",Learning Multimodal Word Representation via Dynamic Fusion Methods,cs.CL," Multimodal models have been proven to outperform text-based models on
-learning semantic word representations. Almost all previous multimodal models
-typically treat the representations from different modalities equally. However,
-it is obvious that information from different modalities contributes
-differently to the meaning of words. This motivates us to build a multimodal
-model that can dynamically fuse the semantic representations from different
-modalities according to different types of words. To that end, we propose three
-novel dynamic fusion methods to assign importance weights to each modality, in
-which weights are learned under the weak supervision of word association pairs.
-The extensive experiments have demonstrated that the proposed methods
-outperform strong unimodal baselines and state-of-the-art multimodal models.
-"
-6471,1801.00554,"Moustafa Alzantot, Bharathan Balaji, Mani Srivastava","Did you hear that? Adversarial Examples Against Automatic Speech
- Recognition",cs.CL cs.CR," Speech is a common and effective way of communication between humans, and
-modern consumer devices such as smartphones and home hubs are equipped with
-deep learning based accurate automatic speech recognition to enable natural
-interaction between humans and machines. Recently, researchers have
-demonstrated powerful attacks against machine learning models that can fool
-them to produce incorrect results. However, nearly all previous research in
-adversarial attacks has focused on image recognition and object detection
-models. In this short paper, we present a first-of-its-kind demonstration of
-adversarial attacks against a speech classification model. 
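A minimal sketch of the dynamic-fusion idea in the multimodal abstract above (1801.00532): per-modality word vectors are combined with word-specific importance weights. The weights here are supplied directly as toy values; in the paper they are learned under the weak supervision of word-association pairs.

import numpy as np

def fuse(modality_vectors, weights):
    w = np.exp(weights) / np.exp(weights).sum()   # normalize to a distribution
    return sum(wi * v for wi, v in zip(w, modality_vectors))

text_vec  = np.array([0.2, 0.7])
image_vec = np.array([0.9, 0.1])
# A concrete word like "red" might lean more on the visual modality:
print(fuse([text_vec, image_vec], weights=np.array([0.3, 1.2])))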
Our algorithm performs
-targeted attacks with 87% success by adding small background noise without
-having to know the underlying model parameters and architecture. Our attack
-only changes the least significant bits of a subset of audio clip samples, and,
-as evaluated in our human study, the noise does not change the human listener's
-perception of the audio clip in 89% of cases.
-"
-6472,1801.00625,Suriyadeepan Ramamoorthy and Selvakumar Murugan,"An Attentive Sequence Model for Adverse Drug Event Extraction from
- Biomedical Text",cs.CL," Adverse reactions caused by drugs are a potentially dangerous problem which may
-lead to mortality and morbidity in patients. Adverse Drug Event (ADE)
-extraction is a significant problem in biomedical research. We model ADE
-extraction as a Question-Answering problem and take inspiration from Machine
-Reading Comprehension (MRC) literature to design our model. Our objective in
-designing such a model is to exploit the local linguistic context in clinical
-text and enable intra-sequence interaction, in order to jointly learn to
-classify drug and disease entities, and to extract adverse reactions caused by
-a given drug. Our model makes use of a self-attention mechanism to facilitate
-intra-sequence interaction in a text sequence. This enables us to visualize and
-understand how the network makes use of the local and wider context for
-classification.
-"
-6473,1801.00632,"Cedric De Boom, Thomas Demeester, Bart Dhoedt","Character-level Recurrent Neural Networks in Practice: Comparing
- Training and Sampling Schemes",cs.LG cs.CL stat.ML," Recurrent neural networks are nowadays successfully used in an abundance of
-applications, going from text, speech and image processing to recommender
-systems. Backpropagation through time is the algorithm that is commonly used to
-train these networks on specific tasks. Many deep learning frameworks have
-their own implementation of training and sampling procedures for recurrent
-neural networks, while there are in fact multiple other possibilities to choose
-from and other parameters to tune. In existing literature this is very often
-overlooked or ignored. In this paper we therefore give an overview of possible
-training and sampling schemes for character-level recurrent neural networks to
-solve the task of predicting the next token in a given sequence. We test these
-different schemes on a variety of datasets, neural network architectures and
-parameter settings, and formulate a number of take-home recommendations. The
-choice of training and sampling scheme turns out to be subject to a number of
-trade-offs, such as training stability, sampling time, model performance and
-implementation effort, but is largely independent of the data. Perhaps the most
-surprising result is that transferring hidden states for correctly initializing
-the model on subsequences often leads to unstable training behavior depending
-on the dataset.
-"
-6474,1801.00644,"Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, L. Jason
- Anastasopoulos","Matching with Text Data: An Experimental Evaluation of Methods for
- Matching Documents and of Measuring Match Quality",stat.ME cs.CL," Matching for causal inference is a well-studied problem, but standard methods
-fail when the units to match are text documents: the high-dimensional and rich
-nature of the data renders exact matching infeasible, causes propensity scores
-to produce incomparable matches, and makes assessing match quality difficult. 
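A sketch of the perturbation mechanics described in the adversarial-audio abstract above (1801.00554): flip the least significant bits of a subset of 16-bit audio samples. This only illustrates the imperceptible-noise mechanism; the paper additionally searches for perturbations that actually change the classifier's output.

import numpy as np

def perturb_lsb(samples, fraction=0.1, bits=1, seed=0):
    rng = np.random.default_rng(seed)
    out = samples.copy()
    idx = rng.choice(len(out), size=int(fraction * len(out)), replace=False)
    mask = (1 << bits) - 1                      # e.g. 0b1 for the lowest bit
    out[idx] ^= rng.integers(0, mask + 1, size=len(idx), dtype=out.dtype)
    return out

audio = (np.sin(np.linspace(0, 440, 16000)) * 32767).astype(np.int16)
print(np.abs(perturb_lsb(audio) - audio).max())  # tiny amplitude change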
-In this paper, we characterize a framework for matching text documents that -decomposes existing methods into: (1) the choice of text representation, and -(2) the choice of distance metric. We investigate how different choices within -this framework affect both the quantity and quality of matches identified -through a systematic multifactor evaluation experiment using human subjects. -Altogether we evaluate over 100 unique text matching methods along with 5 -comparison methods taken from the literature. Our experimental results identify -methods that generate matches with higher subjective match quality than current -state-of-the-art techniques. We enhance the precision of these results by -developing a predictive model to estimate the match quality of pairs of text -documents as a function of our various distance scores. This model, which we -find successfully mimics human judgment, also allows for approximate and -unsupervised evaluation of new procedures. We then employ the identified best -method to illustrate the utility of text matching in two applications. First, -we engage with a substantive debate in the study of media bias by using text -matching to control for topic selection when comparing news articles from -thirteen news sources. We then show how conditioning on text data leads to more -precise causal inferences in an observational study examining the effects of a -medical intervention. -" -6475,1801.00801,Nicolai Pogrebnyakov and Edgar Maldonado,"Identifying emergency stages in Facebook posts of police departments - with convolutional and recurrent neural networks and support vector machines",cs.CL," Classification of social media posts in emergency response is an important -practical problem: accurate classification can help automate processing of such -messages and help other responders and the public react to emergencies in a -timely fashion. This research focused on classifying Facebook messages of US -police departments. Randomly selected 5,000 messages were used to train -classifiers that distinguished between four categories of messages: emergency -preparedness, response and recovery, as well as general engagement messages. -Features were represented with bag-of-words and word2vec, and models were -constructed using support vector machines (SVMs) and convolutional (CNNs) and -recurrent neural networks (RNNs). The best performing classifier was an RNN -with a custom-trained word2vec model to represent features, which achieved the -F1 measure of 0.839. -" -6476,1801.00841,"Kanishka Rao, Ha\c{s}im Sak, Rohit Prabhavalkar","Exploring Architectures, Data and Units For Streaming End-to-End Speech - Recognition with RNN-Transducer",cs.CL cs.SD eess.AS," We investigate training end-to-end speech recognition models with the -recurrent neural network transducer (RNN-T): a streaming, all-neural, -sequence-to-sequence architecture which jointly learns acoustic and language -model components from transcribed acoustic data. We explore various model -architectures and demonstrate how the model can be improved further if -additional text or pronunciation data are available. The model consists of an -`encoder', which is initialized from a connectionist temporal -classification-based (CTC) acoustic model, and a `decoder' which is partially -initialized from a recurrent neural network language model trained on text data -alone. 
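A minimal instance of the two design choices framed in the text-matching abstract above (1801.00644): (1) a text representation, here TF-IDF, and (2) a distance metric, here cosine. This is one cell of the paper's evaluation grid, not its full method; requires scikit-learn, and the documents are toy examples.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the senate passed the budget bill",
        "lawmakers approved the spending bill",
        "the team won the championship game"]
X = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(X)
# Candidate match for doc 0 = the most similar other document:
best = sims[0][1:].argmax() + 1
print(best, sims[0][best])  # doc 1 is the closer match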
The entire neural network is trained with the RNN-T loss and directly
-outputs the recognized transcript as a sequence of graphemes, thus performing
-end-to-end speech recognition. We find that performance can be improved further
-through the use of sub-word units (`wordpieces') which capture longer context
-and significantly reduce substitution errors. The best RNN-T system, a
-twelve-layer LSTM encoder with a two-layer LSTM decoder trained with 30,000
-wordpieces as output targets, achieves a word error rate of 8.5\% on
-voice-search and 5.2\% on voice-dictation tasks and is comparable to a
-state-of-the-art baseline at 8.3\% on voice-search and 5.4\% on voice-dictation.
-"
-6477,1801.00984,"Abdelkrime Aries, Djamel Eddine Zegour, Walid Khaled Hidouci","Sentence Object Notation: Multilingual sentence notation based on
- Wordnet",cs.CL cs.AI," The representation of sentences is a very important task. It can be used as a
-way to exchange data between applications. One main characteristic that a
-notation must have is a minimal size and a representative form. This can
-reduce the transfer time, and hopefully the processing time as well.
- Usually, sentence representation is tied to the language being processed. The
-grammar of this language affects how we represent the sentence. To avoid
-language-dependent notations, we have to come up with a new representation
-which doesn't use words, but their meanings. This can be done using a lexicon
-like WordNet: instead of words we use their synsets. As for syntactic
-relations, they have to be as universal as possible.
- Our new notation is called STON ""SenTences Object Notation"", which somehow
-has similarities to JSON. It is meant to be a minimal, representative and
-language-independent syntactic representation. Also, we want it to be readable
-and easy to create. This simplifies developing simple automatic generators
-and creating test banks manually. Its benefit is to be used as a medium between
-different parts of applications like text summarization, language translation,
-etc. The notation is based on 4 languages: Arabic, English, French and
-Japanese; and there are some cases where these languages don't agree on one
-representation. Also, given the diversity of grammatical structure of different
-world languages, this annotation may fail for some languages, which leaves room
-for future improvements.
-"
-6478,1801.01102,Barathi Ganesh HB,Social Media Analysis based on Semanticity of Streaming and Batch Data,cs.CL cs.AI cs.SI," Languages shared by people differ in different regions based on their
-accents, pronunciation and word usages. In this era, sharing of language takes
-place mainly through social media and blogs. Every second, a stream of such
-micro posts appears, which induces the need to process those micro posts in
-order to extract knowledge out of them. Knowledge extraction differs with
-respect to the application, and research in cognitive science has informed its
-requirements. This work moves such research forward by extracting semantic
-information from streaming and batch data in applications like Named Entity
-Recognition and Author Profiling. In the case of Named Entity Recognition, the
-context of a single micro post is utilized, while the context that lies in the
-pool of micro posts is utilized to identify the sociolect aspects
-of the author of those micro posts. 
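A hypothetical illustration of a language-independent sentence notation in the spirit of STON as described above (1801.00984): words are replaced by WordNet synset identifiers and syntactic roles are labeled explicitly. The exact STON syntax is not given in the abstract, so the structure and field names below are assumptions for illustration, not the paper's format.

import json

sentence = {                       # "The cat drinks milk."
    "act": {"synset": "drink.v.01", "tense": "present"},
    "agent": {"synset": "cat.n.01", "definite": True},
    "theme": {"synset": "milk.n.01"},
}
print(json.dumps(sentence))        # compact, word-free, language-independent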
In this work, a Conditional Random Field has
-been utilized for entity recognition, and a novel approach has been
-proposed to find the sociolect aspects of the author (gender, age group).
-"
-6479,1801.01331,"Thanh Vu, Dat Quoc Nguyen, Dai Quoc Nguyen, Mark Dras, Mark Johnson",VnCoreNLP: A Vietnamese Natural Language Processing Toolkit,cs.CL," We present an easy-to-use and fast toolkit, namely VnCoreNLP---a Java NLP
-annotation pipeline for Vietnamese. Our VnCoreNLP supports key natural language
-processing (NLP) tasks including word segmentation, part-of-speech (POS)
-tagging, named entity recognition (NER) and dependency parsing, and obtains
-state-of-the-art (SOTA) results for these tasks. We release VnCoreNLP to
-provide rich linguistic annotations to facilitate research work on Vietnamese
-NLP. Our VnCoreNLP is open-source and available at:
-https://github.com/vncorenlp/VnCoreNLP
-"
-6480,1801.01531,"Kevin K. Bowden, Jiaqi Wu, Shereen Oraby, Amita Misra and Marilyn
- Walker","Slugbot: An Application of a Novel and Scalable Open Domain Socialbot
- Framework",cs.CL cs.HC," In this paper we introduce a novel, open domain socialbot for the Amazon
-Alexa Prize competition, aimed at carrying on friendly conversations with users
-on a variety of topics. We present our modular system, highlighting our
-different data sources and how we use the human mind as a model for data
-management. Additionally we build and employ natural language understanding and
-information retrieval tools and APIs to expand our knowledge bases. We describe
-our semi-structured, scalable framework for crafting topic-specific dialogue
-flows, and give details on our dialogue management schemes and scoring
-mechanisms. Finally we briefly evaluate the performance of our system and
-observe the challenges that an open domain socialbot faces.
-"
-6481,1801.01641,Liu Yang and Qingyao Ai and Jiafeng Guo and W. Bruce Croft,"aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching
- Model",cs.IR cs.CL," As an alternative to question answering methods based on feature engineering,
-deep learning approaches such as convolutional neural networks (CNNs) and Long
-Short-Term Memory Models (LSTMs) have recently been proposed for semantic
-matching of questions and answers. To achieve good results, however, these
-models have been combined with additional features such as word overlap or BM25
-scores. Without this combination, these models perform significantly worse than
-methods based on linguistic feature engineering. In this paper, we propose an
-attention based neural matching model for ranking short answer text. We adopt a
-value-shared weighting scheme instead of a position-shared weighting scheme for
-combining different matching signals, and incorporate question term importance
-learning using a question attention network. Using the popular benchmark TREC QA
-data, we show that the relatively simple aNMM model can significantly
-outperform other neural network models that have been used for the question
-answering task, and is competitive with models that are combined with
-additional features. When aNMM is combined with additional features, it
-outperforms all baselines.
-"
-6482,1801.01725,"Jingang Wang, Junfeng Tian, Long Qiu, Sheng Li, Jun Lang, Luo Si and
- Man Lan","A Multi-task Learning Approach for Improving Product Title Compression
- with User Search Log Data",cs.CL," It is a challenging and practical research problem to obtain effective
-compression of lengthy product titles for E-commerce. 
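A toy sketch of value-shared weighting as described in the aNMM abstract above (1801.01641): match signals are bucketed by their value, and each bucket shares one learned weight, instead of tying weights to positions. The bin edges and weights below are illustrative assumptions, not learned parameters.

import numpy as np

def value_shared_score(match_signals, bin_edges, bin_weights):
    bins = np.digitize(match_signals, bin_edges)   # map each signal to a bin
    return float(sum(bin_weights[b] for b in bins))

signals = np.array([0.95, 0.2, 0.6])   # e.g. word-word cosine similarities
edges = np.array([0.3, 0.7])           # 3 value ranges: low / mid / high
weights = {0: 0.0, 1: 0.5, 2: 2.0}     # high-similarity matches count most
print(value_shared_score(signals, edges, weights))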
This is particularly
-important as more and more users browse mobile E-commerce apps and more
-merchants make the original product titles redundant and lengthy for Search
-Engine Optimization. Traditional text summarization approaches often incur
-large preprocessing costs and do not capture the important issue of
-conversion rate in E-commerce. This paper proposes a novel multi-task learning
-approach for improving product title compression with user search log data. In
-particular, a pointer network-based sequence-to-sequence approach is utilized
-for title compression with an attentive mechanism as an extractive method and
-an attentive encoder-decoder approach is utilized for generating user search
-queries. The encoding parameters (i.e., semantic embedding of original titles)
-are shared among the two tasks and the attention distributions are jointly
-optimized. An extensive set of experiments with both human-annotated data and
-online deployment demonstrate the advantage of the proposed research for both
-compression quality and online business value.
-"
-6483,1801.01768,Corina Florescu and Wei Jin,Learning Feature Representations for Keyphrase Extraction,cs.CL cs.AI," In supervised approaches for keyphrase extraction, a candidate phrase is
-encoded with a set of hand-crafted features and machine learning algorithms are
-trained to discriminate keyphrases from non-keyphrases. Although the
-manually-designed features have been shown to work well in practice, feature
-engineering is a difficult process that requires expert knowledge and normally
-does not generalize well. In this paper, we present SurfKE, a feature learning
-framework that exploits the text itself to automatically discover patterns that
-keyphrases exhibit. Our model represents the document as a graph and
-automatically learns feature representations of phrases. The proposed model
-obtains remarkable improvements in performance over strong baselines.
-"
-6484,1801.01825,"Danish Contractor, Barun Patra, Mausam Singla, Parag Singla","Towards Understanding and Answering Multi-Sentence Recommendation
- Questions on Tourism",cs.CL cs.AI," We introduce the first system towards the novel task of answering complex
-multi-sentence recommendation questions in the tourism domain. Our solution uses
-a pipeline of two modules: question understanding and answering. For question
-understanding, we define an SQL-like query language that captures the semantic
-intent of a question; it supports operators like subset, negation, preference
-and similarity, which are often found in recommendation questions. We train and
-compare traditional CRFs as well as bidirectional LSTM-based models for
-converting a question to its semantic representation. We extend these models to
-a semi-supervised setting with partially labeled sequences gathered through
-crowdsourcing. We find that our best model performs semi-supervised training of
-BiDiLSTM+CRF with hand-designed features and CCM (Chang et al., 2007)
-constraints. Finally, in an end-to-end QA system, our answering component
-converts our question representation into queries fired on underlying knowledge
-sources. Our experiments on two different answer corpora demonstrate that our
-system can significantly outperform baselines with up to 20 pt higher accuracy
-and 17 pt higher recall. 
-"
-6485,1801.01828,Nestor Rodriguez and Sergio Rojas-Galeano,Shielding Google's language toxicity model against adversarial attacks,cs.CL cs.SI," Lack of moderation in online communities enables participants to engage in
-personal aggression, harassment or cyberbullying, issues that have been
-accentuated by extremist radicalisation in the contemporary post-truth politics
-scenario. This kind of hostility is usually expressed by means of toxic
-language, profanity or abusive statements. Recently Google has developed a
-machine-learning-based toxicity model in an attempt to assess the hostility of
-a comment; unfortunately, it has been suggested that said model can be deceived
-by adversarial attacks that manipulate the text sequence of the comment. In
-this paper we firstly characterise such adversarial attacks as using
-obfuscation and polarity transformations. The former deceives by corrupting
-toxic trigger content with typographic edits, whereas the latter deceives by
-grammatical negation of the toxic content. Then, we propose a two-stage
-approach to counter-attack these anomalies, building upon a recently proposed
-text deobfuscation method and the toxicity scoring model. Lastly, we conducted
-an experiment with approximately 24000 distorted comments, showing how in this
-way it is feasible to restore toxicity of the adversarial variants, while
-incurring roughly a twofold increase in processing time. Even though novel
-adversary challenges would keep coming up derived from the versatile nature of
-written language, we anticipate that techniques combining machine learning and
-text pattern recognition methods, each one targeting different layers of
-linguistic features, would be needed to achieve robust detection of toxic
-language, thus fostering aggression-free digital interaction.
-"
-6486,1801.01884,"Neil R. Smalheiser, Gary Bonifield","Unsupervised Low-Dimensional Vector Representations for Words, Phrases
- and Text that are Transparent, Scalable, and produce Similarity Metrics that
- are Complementary to Neural Embeddings",cs.CL cs.IR," Neural embeddings are a popular set of methods for representing words,
-phrases or text as a low dimensional vector (typically 50-500 dimensions).
-However, it is difficult to interpret these dimensions in a meaningful manner,
-and creating neural embeddings requires extensive training and tuning of
-multiple parameters and hyperparameters. We present here a simple unsupervised
-method for representing words, phrases or text as a low dimensional vector, in
-which the meaning and relative importance of dimensions is transparent to
-inspection. We have created a near-comprehensive vector representation of
-words, and selected bigrams, trigrams and abbreviations, using the set of
-titles and abstracts in PubMed as a corpus. This vector is used to create
-several novel implicit word-word and text-text similarity metrics. The implicit
-word-word similarity metrics correlate well with human judgement of word pair
-similarity and relatedness, and outperform or equal all other reported methods
-on a variety of biomedical benchmarks, including several implementations of
-neural embeddings trained on PubMed corpora. Our implicit word-word metrics
-capture different aspects of word-word relatedness than word2vec-based metrics
-and are only partially correlated (rho = ~0.5-0.8 depending on task and
-corpus). 
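A toy sketch of the two-stage counter-attack described in the toxicity abstract above (1801.01828): first undo simple typographic obfuscation, then score the restored text. The keyword scorer and leetspeak table below are placeholder assumptions; they stand in for the paper's deobfuscation method and Google's toxicity model, which are not reproduced here.

import re

LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})

def deobfuscate(text):
    text = text.translate(LEET)
    return re.sub(r"(\w)[\.\*\-_]+(?=\w)", r"\1", text)  # drop in-word junk

def toxicity(text, lexicon=("idiot", "stupid")):
    words = text.lower().split()
    return sum(w in lexicon for w in words) / max(len(words), 1)

comment = "you are such an 1d10t"
print(toxicity(comment), toxicity(deobfuscate(comment)))  # 0.0 vs > 0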
The vector representations of words, bigrams, trigrams, abbreviations,
-and PubMed title+abstracts are all publicly available from
-http://arrowsmith.psych.uic.edu for release under CC-BY-NC license. Several
-public web query interfaces are also available at the same site, including one
-which allows the user to specify a given word and view its most closely related
-terms according to direct co-occurrence as well as different implicit
-similarity metrics.
-"
-6487,1801.01900,"Devendra Singh Chaplot, Ruslan Salakhutdinov",Knowledge-based Word Sense Disambiguation using Topic Models,cs.CL cs.LG," Word Sense Disambiguation is an open problem in Natural Language Processing
-which is particularly challenging and useful in the unsupervised setting where
-all the words in any given text need to be disambiguated without using any
-labeled data. Typically WSD systems use the sentence or a small window of words
-around the target word as the context for disambiguation because their
-computational complexity scales exponentially with the size of the context. In
-this paper, we leverage the formalism of topic models to design a WSD system
-that scales linearly with the number of words in the context. As a result, our
-system is able to utilize the whole document as the context for a word to be
-disambiguated. The proposed method is a variant of Latent Dirichlet Allocation
-in which the topic proportions for a document are replaced by synset
-proportions. We further utilize the information in WordNet by assigning a
-non-uniform prior on the synset distribution over words and a logistic-normal
-prior on the document distribution over synsets. We evaluate the proposed
-method on Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015
-English All-Word WSD datasets and show that it outperforms the state-of-the-art
-unsupervised knowledge-based WSD system by a significant margin.
-"
-6488,1801.01967,"Amir Mazaheri, Mubarak Shah",Visual Text Correction,cs.CV cs.CL," Videos, images, and sentences are media that can express the same
-semantics. One can imagine a picture by reading a sentence or can describe a
-scene with some words. However, even small changes in a sentence can cause a
-significant semantic inconsistency with the corresponding video/image. For
-example, by changing the verb of a sentence, the meaning may drastically
-change. There have been many efforts to encode a video/sentence and decode it
-as a sentence/video. In this research, we study a new scenario in which both
-the sentence and the video are given, but the sentence is inaccurate. A
-semantic inconsistency between the sentence and the video or between the words
-of a sentence can result in an inaccurate description. This paper introduces a
-new problem, called Visual Text Correction (VTC), i.e., finding and replacing
-an inaccurate word in the textual description of a video. We propose a deep
-network that can simultaneously detect an inaccuracy in a sentence, and fix it
-by replacing the inaccurate word(s). Our method leverages the semantic
-interdependence of videos and words, as well as the short-term and long-term
-relations of the words in a sentence. In our formulation, part of a visual
-feature vector for every single word is dynamically selected through a gating
-process. Furthermore, to train and evaluate our model, we propose an approach
-to automatically construct a large dataset for the VTC problem. 
Our experiments and
-performance analysis demonstrate that the proposed method provides very good
-results and also highlight the general challenges in solving the VTC problem.
-To the best of our knowledge, this work is the first of its kind for the Visual
-Text Correction task.
-"
-6489,1801.01999,Mikul\'a\v{s} Zelinka,Using reinforcement learning to learn how to play text-based games,cs.CL," The ability to learn optimal control policies in systems where the action
-space is defined by sentences in natural language would allow many interesting
-real-world applications such as automatic optimisation of dialogue systems.
-Text-based games with multiple endings and rewards are a promising platform for
-this task, since their feedback allows us to employ reinforcement learning
-techniques to jointly learn text representations and control policies. We
-present a general text game playing agent, testing its generalisation and
-transfer learning performance and showing its ability to play multiple games at
-once. We also present pyfiction, an open-source library for universal access to
-different text games that could, together with our agent that implements its
-interface, serve as a baseline for future research.
-"
-6490,1801.02054,Arthur M. Jacobs,"Explorations in an English Poetry Corpus: A Neurocognitive Poetics
- Perspective",cs.CL," This paper describes a corpus of about 3000 English literary texts with about
-250 million words extracted from the Gutenberg project that span a range of
-genres from both fiction and non-fiction written by more than 130 authors
-(e.g., Darwin, Dickens, Shakespeare). Quantitative Narrative Analysis (QNA) is
-used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC)
-which comprises over 100 poetic texts with around 2 million words from about 50
-authors (e.g., Keats, Joyce, Wordsworth). Some exemplary QNA studies show
-author similarities based on latent semantic analysis, significant topics for
-each author or various text-analytic metrics for George Eliot's poem 'How Lisa
-Loved the King' and James Joyce's 'Chamber Music', concerning e.g. lexical
-diversity or sentiment analysis. The GEPC is particularly suited for research
-in Digital Humanities, Natural Language Processing or Neurocognitive Poetics,
-e.g. as training and test corpus, or for stimulus development and control.
-"
-6491,1801.02073,"Tomasz Jurczyk, Amit Deshmane, Jinho D. Choi",Analysis of Wikipedia-based Corpora for Question Answering,cs.CL," This paper gives comprehensive analyses of corpora based on Wikipedia for
-several tasks in question answering. Four recent corpora are collected, WikiQA,
-SelQA, SQuAD, and InfoQA, and first analyzed intrinsically by contextual
-similarities, question types, and answer categories. These corpora are then
-analyzed extrinsically by three question answering tasks, answer retrieval,
-selection, and triggering. An indexing-based method for the creation of a
-silver-standard dataset for answer retrieval using the entire Wikipedia is also
-presented. Our analysis shows the uniqueness of these corpora and suggests a
-better use of them for statistical question answering learning.
-"
-6492,1801.02107,Omid Kashefi,MIZAN: A Large Persian-English Parallel Corpus,cs.CL," One of the most essential tasks in natural language processing is machine
-translation, which is now highly dependent upon multilingual parallel
-corpora. 
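A tabular Q-learning sketch in the spirit of the text-game abstract above (1801.01999): states and actions are sentences used directly as table keys. This illustrates the reinforcement-learning loop only; the paper's agent learns text representations jointly with the policy instead of using a lookup table.

import random
from collections import defaultdict

Q = defaultdict(float)
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def choose(state, actions):
    if random.random() < epsilon:               # epsilon-greedy exploration
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

s, actions = "You are in a dark room.", ["open door", "wait"]
a = choose(s, actions)
update(s, a, reward=1.0 if a == "open door" else 0.0,
       next_state="A corridor stretches ahead.", next_actions=actions)
print(dict(Q))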
In this paper, we introduce the largest Persian-English parallel
-corpus, with more than one million sentence pairs collected from masterpieces of
-literature. We also present the acquisition process and statistics of the corpus,
-and experiment with a baseline statistical machine translation system using the
-corpus.
-"
-6493,1801.02243,"Catherine Xiao, Wanfeng Chen",Trading the Twitter Sentiment with Reinforcement Learning,cs.AI cs.CL cs.SI," This paper explores the possibility of using alternative data and
-artificial intelligence techniques to trade stocks. The efficacy of the daily
-Twitter sentiment in predicting stock returns is examined using machine
-learning methods. Reinforcement learning (Q-learning) is applied to generate the
-optimal trading policy based on the sentiment signal. The predictive power of
-the sentiment signal is more significant if the stock price is driven by the
-expectation of the company growth and when the company has a major event that
-draws the public attention. The optimal trading strategy based on reinforcement
-learning outperforms the trading strategy based on the machine learning
-prediction.
-"
-6494,1801.02581,"Soumil Mandal, Dipankar Das","Analyzing Roles of Classifiers and Code-Mixed factors for Sentiment
- Identification",cs.CL," Multilingual speakers often switch between languages to express themselves on
-social communication platforms. Sometimes, the original script of the language
-is preserved, while using a common script for all the languages is quite
-popular as well due to convenience. On such occasions, multiple languages are
-being mixed with different rules of grammar, using the same script, which makes
-accurate sentiment identification a challenging task for natural language
-processing. In this paper, we report results of various experiments carried out
-on a movie reviews dataset having this code-mixing property of two languages,
-English and Bengali, both typed in Roman script. We have tested various machine
-learning algorithms trained only on English features on our code-mixed data and
-have achieved the maximum accuracy of 59.00% using a Naive Bayes (NB) model. We
-have also tested various models trained on code-mixed data, as well as English
-features, and the highest accuracy of 72.50% was obtained by a Support Vector
-Machine (SVM) model. Finally, we have analyzed the misclassified snippets and
-have discussed the challenges that need to be resolved for better accuracy.
-"
-6495,1801.02668,Ting-Hao 'Kenneth' Huang and Joseph Chee Chang and Jeffrey P. Bigham,"Evorus: A Crowd-powered Conversational Assistant Built to Automate
- Itself Over Time",cs.HC cs.AI cs.CL," Crowd-powered conversational assistants have been shown to be more robust
-than automated systems, but do so at the cost of higher response latency and
-monetary costs. A promising direction is to combine the two approaches for high
-quality, low latency, and low cost solutions. In this paper, we introduce
-Evorus, a crowd-powered conversational assistant built to automate itself over
-time by (i) allowing new chatbots to be easily integrated to automate more
-scenarios, (ii) reusing prior crowd answers, and (iii) learning to
-automatically approve response candidates. Our 5-month-long deployment with 80
-participants and 281 conversations shows that Evorus can automate itself
-without compromising conversation quality. 
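A baseline sketch matching the setup in the code-mixed sentiment abstract above (1801.02581): a linear SVM over simple n-gram counts of Roman-script English-Bengali text. The reviews and labels below are toy stand-ins for the paper's movie-review data; requires scikit-learn.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["movie ta darun, loved it", "khub baje film, waste of time",
         "darun acting, brilliant", "boring ar baje"]
labels = [1, 0, 1, 0]                    # 1 = positive, 0 = negative
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["darun movie, loved it"]))  # expected: [1]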
Crowd-AI architectures have long
-been proposed as a way to reduce cost and latency for crowd-powered systems;
-Evorus demonstrates how automation can be introduced successfully in a deployed
-system. Its architecture allows future researchers to further innovate on
-the underlying automated components in the context of a deployed open domain
-dialog system.
-"
-6496,1801.02808,"Zhiyuan Chen, Nianzu Ma, Bing Liu",Lifelong Learning for Sentiment Classification,cs.CL cs.IR cs.LG," This paper proposes a novel lifelong learning (LL) approach to sentiment
-classification. LL mimics the human continuous learning process, i.e.,
-retaining the knowledge learned from past tasks and using it to help future
-learning. In this paper, we first discuss LL in general and then LL for
-sentiment classification in particular. The proposed LL approach adopts a
-Bayesian optimization framework based on stochastic gradient descent. Our
-experimental results show that the proposed method outperforms baseline methods
-significantly, which demonstrates that lifelong learning is a promising
-research direction.
-"
-6497,1801.02832,Ferenc Galk\'o and Carsten Eickhoff,"Biomedical Question Answering via Weighted Neural Network Passage
- Retrieval",cs.IR cs.AI cs.CL," The amount of publicly available biomedical literature has been growing
-rapidly in recent years, yet question answering systems still struggle to
-exploit the full potential of this source of data. In a preliminary processing
-step, many question answering systems rely on retrieval models for identifying
-relevant documents and passages. This paper proposes a weighted cosine distance
-retrieval scheme based on neural network word embeddings. Our experiments are
-based on publicly available data and tasks from the BioASQ biomedical question
-answering challenge and demonstrate significant performance gains over a wide
-range of state-of-the-art models.
-"
-6498,1801.02916,"Miroslav Vodol\'an, Filip Jur\v{c}\'i\v{c}ek",Denotation Extraction for Interactive Learning in Dialogue Systems,cs.CL," This paper presents a novel task using real user data obtained in
-human-machine conversation. The task concerns denotation extraction from
-answer hints collected interactively in a dialogue. The task is motivated by
-the need for large amounts of training data for question answering dialogue
-system development, where the data is often expensive and hard to collect.
-Being able to collect denotations interactively and directly from users, one
-could improve, for example, natural language understanding components online
-and ease the collection of the training data. This paper also presents
-introductory results of evaluation of several denotation extraction models
-including attention-based neural network approaches.
-"
-6499,1801.03032,"Kuntal Dey, Ritvik Shrivastava, Saroj Kaushik","Topical Stance Detection for Twitter: A Two-Phase LSTM Model Using
- Attention",cs.CL cs.IR cs.SI," The topical stance detection problem addresses detecting the stance of the
-text content with respect to a given topic: whether the sentiment of the given
-text content is in FAVOR of (positive), is AGAINST (negative), or is NONE
-(neutral) towards the given topic. Using the concept of attention, we develop a
-two-phase solution. In the first phase, we classify subjectivity - whether a
-given tweet is neutral or subjective with respect to the given topic. 
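A toy sketch of weighted-embedding retrieval in the spirit of the biomedical QA abstract above (1801.02832): passages are ranked by the cosine similarity of IDF-weighted average word vectors. The embeddings, IDF values, and vocabulary are illustrative assumptions, not the paper's trained components.

import numpy as np

emb = {"fever": np.array([1.0, 0.0]), "aspirin": np.array([0.0, 1.0]),
       "treats": np.array([0.5, 0.5]), "the": np.array([0.1, 0.1])}
idf = {"fever": 2.0, "aspirin": 2.5, "treats": 1.5, "the": 0.1}

def encode(text):
    # IDF-weighted average of word vectors, length-normalized.
    v = np.sum([idf[w] * emb[w] for w in text.split() if w in emb], axis=0)
    return v / np.linalg.norm(v)

def rank(question, passages):
    q = encode(question)
    return sorted(passages, key=lambda p: -float(np.dot(q, encode(p))))

print(rank("aspirin treats fever", ["the fever", "aspirin the"]))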
In the
-second phase, we classify the sentiment of the subjective tweets (ignoring the
-neutral tweets) - whether a given subjective tweet has a FAVOR or AGAINST
-stance towards the topic. We propose a Long Short-Term Memory (LSTM) based deep
-neural network for each phase, and embed attention at each of the phases. On
-the SemEval 2016 stance detection Twitter task dataset, we obtain a best-case
-macro F-score of 68.84% and a best-case accuracy of 60.2%, outperforming the
-existing deep learning based solutions. Our framework, T-PAN, is the first in
-the topical stance detection literature to use deep learning within a
-two-phase architecture.
-"
-6500,1801.03257,"Longyue Wang, Zhaopeng Tu, Shuming Shi, Tong Zhang, Yvette Graham, Qun
- Liu",Translating Pro-Drop Languages with Reconstruction Models,cs.CL," Pronouns are frequently omitted in pro-drop languages, such as Chinese,
-generally leading to significant challenges with respect to the production of
-complete translations. To date, very little attention has been paid to the
-dropped pronoun (DP) problem within neural machine translation (NMT). In this
-work, we propose a novel reconstruction-based approach to alleviating DP
-translation problems for NMT models. Firstly, DPs within all source sentences
-are automatically annotated with parallel information extracted from the
-bilingual training corpus. Next, the annotated source sentence is reconstructed
-from hidden representations in the NMT model. With auxiliary training
-objectives, in terms of reconstruction scores, the parameters associated with
-the NMT model are guided to produce enhanced hidden representations that are
-encouraged as much as possible to embed annotated DP information. Experimental
-results on both Chinese-English and Japanese-English dialogue translation tasks
-show that the proposed approach significantly and consistently improves
-translation performance over a strong NMT baseline, which is directly built on
-the training data annotated with DPs.
-"
-6501,1801.03339,"Felix Kreuk, Yossi Adi, Moustapha Cisse, Joseph Keshet",Fooling End-to-end Speaker Verification by Adversarial Examples,cs.LG cs.CL," Automatic speaker verification systems are increasingly used as the primary
-means to authenticate customers. Recently, it has been proposed to train
-speaker verification systems using end-to-end deep neural models. In this
-paper, we show that such systems are vulnerable to adversarial example attacks.
-Adversarial examples are generated by adding a peculiar noise to original
-speaker examples, in such a way that they are almost indistinguishable from the
-original examples by a human listener. Yet, the generated waveforms, which
-sound like speaker A, can be used to fool such a system into accepting them as
-if they were uttered by speaker B. We present white-box attacks on an
-end-to-end deep network that was either trained on YOHO or NTIMIT. We also
-present two black-box attacks: one where the adversarial examples were
-generated with a system that was trained on YOHO, but the attack is on a system
-that was trained on NTIMIT; and one where the adversarial examples were
-generated with a system that was trained on a Mel-spectrum feature set, but the
-attack is on a system that was trained on MFCC. Results suggest that the
-accuracy of the attacked system was decreased and the false-positive rate was
-dramatically increased. 
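A minimal cascade sketch of the two-phase scheme in the stance-detection abstract above (1801.03032): phase 1 filters out neutral tweets, phase 2 labels the subjective ones FAVOR or AGAINST. The keyword rules below are toy stand-ins for the paper's attention-based LSTM classifiers.

def phase1_subjective(tweet):
    return any(w in tweet.lower() for w in ("love", "hate", "must", "never"))

def phase2_stance(tweet):
    favor = any(w in tweet.lower() for w in ("love", "must"))
    return "FAVOR" if favor else "AGAINST"

def detect_stance(tweet):
    if not phase1_subjective(tweet):   # phase 1: neutral vs subjective
        return "NONE"
    return phase2_stance(tweet)        # phase 2: polarity of subjective tweets

for t in ["I love this policy", "The vote is on Tuesday", "Never support this"]:
    print(t, "->", detect_stance(t))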
-"
-6502,1801.03460,"Marcelo Criscuolo, Erick Rocha Fonseca, Sandra Maria Alu\'isio, Ana
- Carolina Speran\c{c}a-Criscuolo",MilkQA: a Dataset of Consumer Questions for the Task of Answer Selection,cs.CL," We introduce MilkQA, a question answering dataset from the dairy domain
-dedicated to the study of consumer questions. The dataset contains 2,657 pairs
-of questions and answers, written in the Portuguese language and originally
-collected by the Brazilian Agricultural Research Corporation (Embrapa). All
-questions were motivated by real situations and written by thousands of authors
-with very different backgrounds and levels of literacy, while answers were
-elaborated by specialists from Embrapa's customer service. Our dataset was
-filtered and anonymized by three human annotators. Consumer questions are a
-challenging kind of question that is usually employed as a form of seeking
-information. Although several question answering datasets are available, most
-of such resources are not suitable for research on answer selection models for
-consumer questions. We aim to fill this gap by making MilkQA publicly
-available. We study the behavior of four answer selection models on MilkQA: two
-baseline models and two convolutional neural network architectures. Our results
-show that MilkQA poses real challenges to computational models, particularly
-due to linguistic characteristics of its questions and to their unusually
-long lengths. Only one of the evaluated models gives reasonable results,
-at the cost of high computational requirements.
-"
-6503,1801.03562,"Paul Tupper, Paul Smolensky, Pyeong Whan Cho","Discrete symbolic optimization and Boltzmann sampling by continuous
- neural dynamics: Gradient Symbolic Computation",cs.CL," Gradient Symbolic Computation is proposed as a means of solving discrete
-global optimization problems using a neurally plausible continuous stochastic
-dynamical system. Gradient symbolic dynamics involves two free parameters that
-must be adjusted as a function of time to obtain the global maximizer at the
-end of the computation. We provide a summary of what is known about the GSC
-dynamics for special cases of settings of the parameters, and also establish
-that there is a schedule for the two parameters for which convergence to the
-correct answer occurs with high probability. These results put the empirical
-results already obtained for GSC on a sound theoretical footing.
-"
-6504,1801.03563,"Nia Dowell, Tristian Nixon, and Arthur Graesser","Group Communication Analysis: A Computational Linguistics Approach for
- Detecting Sociocognitive Roles in Multi-Party Interactions",cs.CL," Roles are one of the most important concepts in understanding human
-sociocognitive behavior. During group interactions, members take on different
-roles within the discussion. Roles have distinct patterns of behavioral
-engagement (i.e., active or passive, leading or following), contribution
-characteristics (i.e., providing new information or echoing given material),
-and social orientation (i.e., individual or group). Different combinations of
-these roles can produce characteristically different group outcomes, being
-either less or more productive towards collective goals. In online
-collaborative learning environments, this can lead to better or worse learning
-outcomes for the individual participants. In this study, we propose and
-validate a novel approach for detecting emergent roles from the participants'
-contributions and patterns of interaction. 
Specifically, we developed a group
-communication analysis (GCA) by combining automated computational linguistic
-techniques with analyses of the sequential interactions of online group
-communication. The GCA was applied to three large collaborative interaction
-datasets (participant N = 2,429; group N = 3,598). Cluster analyses and linear
-mixed-effects modeling were used to assess the validity of the GCA approach and
-the influence of learner roles on student and group performance. The results
-indicate that participants' patterns in linguistic coordination and cohesion
-are representative of the roles that individuals play in collaborative
-discussions. More broadly, GCA provides a framework for researchers to explore
-the micro intra- and interpersonal patterns associated with the participants'
-roles and the sociocognitive processes related to successful collaboration.
-"
-6505,1801.03564,Omid Kashefi,Unsupervised Part-of-Speech Induction,cs.CL," Part-of-Speech (POS) tagging is an old and fundamental task in natural
-language processing. While supervised POS taggers have shown promising
-accuracy, it is not always feasible to use supervised methods due to lack of
-labeled data. In this project, we attempt to induce POS tags in an unsupervised
-manner by iteratively looking for recurring patterns of words through a
-hierarchical agglomerative clustering process. Our approach shows promising
-results when compared to the tagging results of the state-of-the-art
-unsupervised POS taggers.
-"
-6506,1801.03603,"Zhengqiu He and Wenliang Chen and Zhenghua Li and Meishan Zhang and
- Wei Zhang and Min Zhang",SEE: Syntax-aware Entity Embedding for Neural Relation Extraction,cs.CL," Distantly supervised relation extraction is an efficient approach to scale
-relation extraction to very large corpora, and has been widely used to find
-novel relational facts from plain text. Recent studies on neural relation
-extraction have shown great progress on this task via modeling the sentences in
-low-dimensional spaces, but have seldom considered syntax information to model
-the entities. In this paper, we propose to learn syntax-aware entity embedding
-for neural relation extraction. First, we encode the context of entities on a
-dependency tree as sentence-level entity embedding based on tree-GRU. Then, we
-utilize both intra-sentence and inter-sentence attentions to obtain sentence
-set-level entity embedding over all sentences containing the focus entity pair.
-Finally, we combine both sentence embedding and entity embedding for relation
-classification. We conduct experiments on a widely used real-world dataset and
-the experimental results show that our model can make full use of all
-informative instances and achieve state-of-the-art performance on relation
-extraction.
-"
-6507,1801.03604,"Ashwin Ram, Rohit Prasad, Chandra Khatri, Anu Venkatesh, Raefer
- Gabriel, Qing Liu, Jeff Nunn, Behnam Hedayatnia, Ming Cheng, Ashish Nagar,
- Eric King, Kate Bland, Amanda Wartick, Yi Pan, Han Song, Sk Jayadevan, Gene
- Hwang, Art Pettigrue",Conversational AI: The Science Behind the Alexa Prize,cs.AI cs.CL cs.CY cs.HC cs.MA," Conversational agents are exploding in popularity. However, much work remains
-in the area of social conversation as well as free-form conversation over a
-broad range of domains and topics. 
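A toy sketch of the clustering idea in the POS-induction abstract above (1801.03564): words are represented by counts of their neighbouring words and grouped by hierarchical agglomerative clustering, so that distributionally similar words (a proxy for POS) land in one cluster. The corpus, context window, and cluster count are illustrative assumptions; requires NumPy and SciPy.

from collections import Counter
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
ctx = {w: Counter() for w in vocab}
for i, w in enumerate(corpus):                 # collect +/-1 context counts
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            ctx[w][corpus[j]] += 1

X = np.array([[ctx[w][c] for c in vocab] for w in vocab], dtype=float)
labels = fcluster(linkage(X, method="average"), t=3, criterion="maxclust")
for w, l in zip(vocab, labels):                # e.g. cat/dog and mat/rug group
    print(w, l)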
-To advance the state of the art in conversational AI, Amazon launched the Alexa
-Prize, a 2.5-million-dollar university competition where sixteen selected
-university teams were challenged to build conversational agents, known as
-socialbots, to converse coherently and engagingly with humans on popular topics
-such as Sports, Politics, Entertainment, Fashion and Technology for 20 minutes.
-The Alexa Prize offers the academic community a unique opportunity to perform
-research with a live system used by millions of users. The competition provided
-university teams with real user conversational data at scale, along with the
-user-provided ratings and feedback augmented with annotations by the Alexa
-team. This enabled teams to effectively iterate and make improvements
-throughout the competition while being evaluated in real-time through live user
-interactions. To build their socialbots, university teams combined
-state-of-the-art techniques with novel strategies in the areas of Natural
-Language Understanding, Context Modeling, Dialog Management, Response
-Generation, and Knowledge Acquisition. To support the efforts of participating
-teams, the Alexa Prize team made significant scientific and engineering
-investments to build and improve Conversational Speech Recognition, Topic
-Tracking, Dialog Evaluation, Voice User Experience, and tools for traffic
-management and scalability. This paper outlines the advances created by the
-university teams as well as the Alexa Prize team to achieve the common goal of
-solving the problem of Conversational AI.
-"
-6508,1801.03615,"Kai Song, Yue Zhang, Min Zhang, Weihua Luo",Improved English to Russian Translation by Neural Suffix Prediction,cs.CL," Neural machine translation (NMT) suffers from a performance deficiency when a
-limited vocabulary fails to cover the source or target side adequately, which
-happens frequently when dealing with morphologically rich languages. To address
-this problem, previous work focused on adjusting translation granularity or
-expanding the vocabulary size. However, morphological information is relatively
-under-considered in NMT architectures, which may further improve translation
-quality. We propose a novel method, which can not only reduce data sparsity but
-also model morphology through a simple but effective mechanism. By predicting
-the stem and suffix separately during decoding, our system achieves an
-improvement of up to 1.98 BLEU compared with previous work on English to
-Russian translation. Our method is orthogonal to different NMT architectures
-and stably gains improvements on various domains.
-"
-6509,1801.03622,"Fenfei Guo, Angeliki Metallinou, Chandra Khatri, Anirudh Raju, Anu
- Venkatesh, Ashwin Ram",Topic-based Evaluation for Conversational Bots,cs.CL cs.AI cs.CY cs.HC cs.MA," Dialog evaluation is a challenging problem, especially for non-task-oriented
-dialogs where conversational success is not well-defined. We propose to
-evaluate dialog quality using topic-based metrics that describe the ability of
-a conversational bot to sustain coherent and engaging conversations on a topic,
-and the diversity of topics that a bot can handle. To detect conversation
-topics per utterance, we adopt Deep Average Networks (DAN) and train a topic
-classifier on a variety of question and query data categorized into multiple
-topics. We propose a novel extension to DAN by adding a topic-word attention
-table that allows the system to jointly capture topic keywords in an utterance
-and perform topic classification.
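As a concrete illustration of such a topic classifier, here is a minimal numpy sketch of a Deep Average Network with a topic-word attention table; the dimensions, the max over the table, and the attention weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Sketch of a DAN classifier extended with a topic-word attention table
# (all sizes and the weighting scheme are assumptions for illustration).
rng = np.random.default_rng(0)
VOCAB, DIM, N_TOPICS = 1000, 50, 8
E = rng.normal(size=(VOCAB, DIM))        # word embeddings
A = rng.normal(size=(N_TOPICS, VOCAB))   # topic-word attention table
W = rng.normal(size=(N_TOPICS, DIM))     # topic classifier weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(token_ids):
    # Each word's weight: how strongly it signals any topic in the table.
    alpha = softmax(A[:, token_ids].max(axis=0))
    # Attention-weighted average replaces DAN's uniform average.
    sentence = (alpha[:, None] * E[token_ids]).sum(axis=0)
    return softmax(W @ sentence)  # distribution over topics

print(classify(np.array([3, 17, 256])).argmax())
```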
-We compare our proposed topic-based metrics with the ratings provided by users
-and show that our metrics both correlate with and complement human judgment.
-Our analysis is performed on tens of thousands of real human-bot dialogs from
-the Alexa Prize competition and highlights user expectations for conversational
-bots.
-"
-6510,1801.03625,"Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel,
- Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki
- Metallinou, Rahul Goel, Shaohua Yang, Anirudh Raju",On Evaluating and Comparing Open Domain Dialog Systems,cs.CL cs.AI cs.CY cs.HC cs.MA," Conversational agents are exploding in popularity. However, much work remains
-in the area of non-goal-oriented conversations, despite significant growth in
-research interest over recent years. To advance the state of the art in
-conversational AI, Amazon launched the Alexa Prize, a 2.5-million-dollar
-university competition where sixteen selected university teams built
-conversational agents to deliver the best social conversational experience.
-Alexa Prize provided the academic community with the unique opportunity to
-perform research with a live system used by millions of users. The subjectivity
-associated with evaluating conversations is a key element underlying the
-challenge of building non-goal-oriented dialogue systems. In this paper, we
-propose a comprehensive evaluation strategy with multiple metrics designed to
-reduce subjectivity by selecting metrics which correlate well with human
-judgment. The proposed metrics provide granular analysis of the conversational
-agents, which is not captured in human ratings. We show that these metrics can
-be used as a reasonable proxy for human judgment. We provide a mechanism to
-unify the metrics for selecting the top performing agents, which has also been
-applied throughout the Alexa Prize competition. To our knowledge, this is to
-date the largest setting for evaluating agents, with millions of conversations
-and hundreds of thousands of ratings from users. We believe that this work is a
-step towards an automatic evaluation process for conversational AIs.
-"
-6511,1801.03825,"Mohnish Dubey, Debayan Banerjee, Debanjan Chaudhuri, Jens Lehmann","EARL: Joint Entity and Relation Linking for Question Answering over
- Knowledge Graphs",cs.AI cs.CL," Many question answering systems over knowledge graphs rely on entity and
-relation linking components in order to connect the natural language input to
-the underlying knowledge graph. Traditionally, entity linking and relation
-linking have been performed either as dependent sequential tasks or as
-independent parallel tasks. In this paper, we propose a framework called EARL,
-which performs entity linking and relation linking as a joint task. EARL
-implements two different solution strategies for which we provide a comparative
-analysis in this paper. The first strategy is a formalisation of the joint
-entity and relation linking tasks as an instance of the Generalised Travelling
-Salesman Problem (GTSP). In order to be computationally feasible, we employ
-approximate GTSP solvers. The second strategy uses machine learning in order to
-exploit the connection density between nodes in the knowledge graph. It relies
-on three base features and re-ranking steps in order to predict entities and
-relations. We compare the strategies and evaluate them on a dataset with 5000
-questions.
-Both strategies significantly outperform the current state-of-the-art
-approaches for entity and relation linking.
-"
-6512,1801.03911,Sahil Garg and Greg Ver Steeg and Aram Galstyan,"Stochastic Learning of Nonstationary Kernels for Natural Language
- Modeling",cs.CL cs.IR cs.LG stat.ML," Natural language processing often involves computations with semantic or
-syntactic graphs to facilitate sophisticated reasoning based on structural
-relationships. While convolution kernels provide a powerful tool for comparing
-graph structure based on node (word) level relationships, they are difficult to
-customize and can be computationally expensive. We propose a generalization of
-convolution kernels, with a nonstationary model, for better expressibility of
-natural languages in supervised settings. For a scalable learning of the
-parameters introduced with our model, we propose a novel algorithm that
-leverages stochastic sampling on k-nearest neighbor graphs, along with
-approximations based on locality-sensitive hashing. We demonstrate the
-advantages of our approach on a challenging real-world (structured inference)
-problem of automatically extracting biological models from the text of
-scientific papers.
-"
-6513,1801.04017,"David Kernot, Terry Bossomaier, Roger Bradbury",Did William Shakespeare and Thomas Kyd Write Edward III?,cs.CL," William Shakespeare is believed to be a significant author in the anonymous
-play, The Reign of King Edward III, published in 1596. However, Thomas Kyd has
-recently been suggested as the primary author. Using a neurolinguistics
-approach to authorship identification, we use a four-feature technique, RPAS,
-to convert the 19 scenes in Edward III into a multi-dimensional vector. Three
-complementary analytical techniques are applied to cluster the data and reduce
-single-technique bias before an alternate method, seriation, is used to measure
-the distances between clusters and test the strength of the connections. We
-find the multivariate techniques robust and are able to allocate up to 14
-scenes to Thomas Kyd, and we further question whether scenes long believed to
-be Shakespeare's are in fact his.
-"
-6514,1801.04354,"Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi","Black-box Generation of Adversarial Text Sequences to Evade Deep
- Learning Classifiers",cs.CL cs.CR cs.IR cs.LG," Although various techniques have been proposed to generate adversarial
-samples for white-box attacks on text, little attention has been paid to
-black-box attacks, which are more realistic scenarios. In this paper, we
-present a novel algorithm, DeepWordBug, to effectively generate small text
-perturbations in a black-box setting that forces a deep-learning classifier to
-misclassify a text input. We employ novel scoring strategies to identify the
-critical tokens that, if modified, cause the classifier to make an incorrect
-prediction. Simple character-level transformations are applied to the
-highest-ranked tokens in order to minimize the edit distance of the
-perturbation, yet change the original classification. We evaluated DeepWordBug
-on eight real-world text datasets, including text classification, sentiment
-analysis, and spam detection. We compare the result of DeepWordBug with two
-baselines: Random (Black-box) and Gradient (White-box). Our experimental
-results indicate that DeepWordBug reduces the prediction accuracy of current
-state-of-the-art deep-learning models, including a decrease of 68\% on average
-for a Word-LSTM model and 48\% on average for a Char-CNN model.
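A minimal sketch of this kind of black-box loop follows; the token-scoring rule (probability drop when a token is removed) and the character transform (a single swap) are illustrative choices, since the paper proposes several scoring strategies and transformations.

```python
# Sketch of a scoring-then-perturbing black-box text attack.
def attack(tokens, predict_proba, target_class, budget=2):
    def score(i):
        # Importance of token i: drop in the target-class probability
        # when the token is removed (queries the model as a black box).
        reduced = tokens[:i] + tokens[i + 1:]
        return (predict_proba(tokens)[target_class]
                - predict_proba(reduced)[target_class])

    ranked = sorted(range(len(tokens)), key=score, reverse=True)
    adv = list(tokens)
    for i in ranked[:budget]:
        w = adv[i]
        if len(w) > 3:  # swap two inner characters: a small edit distance
            adv[i] = w[0] + w[2] + w[1] + w[3:]
    return adv

# Toy classifier: class-1 probability grows with occurrences of "great".
toy = lambda toks: [1 - toks.count("great") / max(len(toks), 1),
                    toks.count("great") / max(len(toks), 1)]
print(attack(["this", "movie", "is", "great", "fun"], toy, target_class=1))
```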
-"
-6515,1801.04433,Georgios K. Pitsilis and Heri Ramampiaro and Helge Langseth,Detecting Offensive Language in Tweets Using Deep Learning,cs.CL cs.CY cs.SI," This paper addresses the important problem of discerning hateful content in
-social media. We propose a detection scheme that is an ensemble of Recurrent
-Neural Network (RNN) classifiers, and it incorporates various features
-associated with user-related information, such as the users' tendency towards
-racism or sexism. These data are fed as input to the above classifiers along
-with the word frequency vectors derived from the textual content. Our approach
-has been evaluated on a publicly available corpus of 16k tweets, and the
-results demonstrate its effectiveness in comparison to existing
-state-of-the-art solutions. More specifically, our scheme can successfully
-distinguish racism and sexism messages from normal text, and achieve higher
-classification quality than current state-of-the-art algorithms.
-"
-6516,1801.04470,"Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael
- Baeriswyl, Martin Jaggi",Simple Unsupervised Keyphrase Extraction using Sentence Embeddings,cs.CL," Keyphrase extraction is the task of automatically selecting a small set of
-phrases that best describe a given free text document. Supervised keyphrase
-extraction requires large amounts of labeled training data and generalizes very
-poorly outside the domain of the training data. At the same time, unsupervised
-systems have poor accuracy, and often do not generalize well, as they require
-the input document to belong to a larger corpus also given as input. Addressing
-these drawbacks, in this paper, we tackle keyphrase extraction from single
-documents with EmbedRank: a novel unsupervised method that leverages sentence
-embeddings. EmbedRank achieves higher F-scores than graph-based
-state-of-the-art systems on standard datasets and is suitable for real-time
-processing of large amounts of Web data. With EmbedRank, we also explicitly
-increase coverage and diversity among the selected keyphrases by introducing an
-embedding-based maximal marginal relevance (MMR) for new phrases. A user study
-including over 200 votes showed that, although reducing the phrases' semantic
-overlap leads to no gains in F-score, our high diversity selection is preferred
-by humans.
-"
-6517,1801.04554,"Charles Henrique Porto Ferreira, Debora Maria Rossi de Medeiros,
- Fabricio Olivetti de Fran\c{c}a","DCDistance: A Supervised Text Document Feature extraction based on class
- labels",cs.IR cs.CL cs.LG," Text Mining is a field that aims at extracting information from textual data.
-One of the challenges of this field of study comes from the pre-processing
-stage in which a vector (and structured) representation should be extracted
-from unstructured data. The common extraction creates large and sparse vectors
-representing the importance of each term to a document. As such, this usually
-leads to the curse-of-dimensionality that plagues most machine learning
-algorithms. To cope with this issue, in this paper we propose a new supervised
-feature extraction and reduction algorithm, named DCDistance, that creates
-features based on the distance from a document to a representative of each
-class label. As such, the proposed technique can reduce the feature set by more
-than 99% relative to the original set.
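A small sketch of the idea as described, assuming the class representative is a centroid of term-frequency vectors and the distance is Euclidean (both assumptions for illustration):

```python
import numpy as np

# DCDistance-style feature reduction: represent each document by its
# distance to one representative per class label, shrinking a
# |vocabulary|-sized vector to one feature per class.
def dcdistance_features(X, y):
    classes = np.unique(y)
    reps = np.stack([X[y == c].mean(axis=0) for c in classes])  # centroids
    # Feature j of document i = distance to class j's representative.
    return np.linalg.norm(X[:, None, :] - reps[None, :, :], axis=2)

X = np.array([[3., 0., 1.], [2., 1., 0.], [0., 4., 2.], [1., 3., 3.]])
y = np.array([0, 0, 1, 1])
F = dcdistance_features(X, y)
print(F.shape)  # (4 documents, 2 classes) instead of (4, |vocab|)
```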
-Additionally, this algorithm was also capable of improving the classification
-accuracy over a set of benchmark datasets when compared to traditional and
-state-of-the-art feature selection algorithms.
-"
-6518,1801.04726,"Mantong Zhou, Minlie Huang, Xiaoyan Zhu",An Interpretable Reasoning Network for Multi-Relation Question Answering,cs.CL," Multi-relation Question Answering is a challenging task, due to the
-requirement of elaborate analysis on questions and reasoning over multiple fact
-triples in the knowledge base. In this paper, we present a novel model called
-Interpretable Reasoning Network that employs an interpretable, hop-by-hop
-reasoning process for question answering. The model dynamically decides which
-part of an input question should be analyzed at each hop; predicts a relation
-that corresponds to the current parsed results; utilizes the predicted relation
-to update the question representation and the state of the reasoning process;
-and then drives the next-hop reasoning. Experiments show that our model yields
-state-of-the-art results on two datasets. More interestingly, the model can
-offer traceable and observable intermediate predictions for reasoning analysis
-and failure diagnosis, thereby allowing manual manipulation in predicting the
-final answer.
-"
-6519,1801.04813,Quan Hoang,Predicting Movie Genres Based on Plot Summaries,cs.CL cs.LG stat.ML," This project explores several Machine Learning methods to predict movie
-genres based on plot summaries. Naive Bayes, Word2Vec+XGBoost and Recurrent
-Neural Networks are used for text classification, while K-binary
-transformation, rank method and probabilistic classification with learned
-probability threshold are employed for the multi-label problem involved in the
-genre tagging task. Experiments with more than 250,000 movies show that
-employing the Gated Recurrent Units (GRU) neural networks for the probabilistic
-classification with learned probability threshold approach achieves the best
-result on the test set. The model attains a Jaccard Index of 50.0%, an F-score
-of 0.56, and a hit rate of 80.5%.
-"
-6520,1801.04871,"Pararth Shah, Dilek Hakkani-T\""ur, Gokhan T\""ur, Abhinav Rastogi,
- Ankur Bapna, Neha Nayak, Larry Heck",Building a Conversational Agent Overnight with Dialogue Self-Play,cs.AI cs.CL," We propose Machines Talking To Machines (M2M), a framework combining
-automation and crowdsourcing to rapidly bootstrap end-to-end dialogue agents
-for goal-oriented dialogues in arbitrary domains. M2M scales to new tasks with
-just a task schema and an API client from the dialogue system developer, but it
-is also customizable to cater to task-specific interactions. Compared to the
-Wizard-of-Oz approach for data collection, M2M achieves greater diversity and
-coverage of salient dialogue flows while maintaining the naturalness of
-individual utterances. In the first phase, a simulated user bot and a
-domain-agnostic system bot converse to exhaustively generate dialogue
-""outlines"", i.e. sequences of template utterances and their semantic parses. In
-the second phase, crowd workers provide contextual rewrites of the dialogues to
-make the utterances more natural while preserving their meaning. The entire
-process can finish within a few hours. We propose a new corpus of 3,000
-dialogues spanning 2 domains collected with M2M, and present comparisons with
-popular dialogue datasets on the quality and diversity of the surface forms and
-dialogue flows.
-"
-6521,1801.04958,Robert Giaquinto and Arindam Banerjee,Topic Modeling on Health Journals with Regularized Variational Inference,cs.CL cs.LG stat.ML," Topic modeling enables exploration and compact representation of a corpus.
-The CaringBridge (CB) dataset is a massive collection of journals written by
-patients and caregivers during a health crisis. Topic modeling on the CB
-dataset, however, is challenging due to the asynchronous nature of multiple
-authors writing about their health journeys. To overcome this challenge we
-introduce the Dynamic Author-Persona topic model (DAP), a probabilistic
-graphical model designed for temporal corpora with multiple authors. The
-novelty of the DAP model lies in its representation of authors by a persona ---
-where personas capture the propensity to write about certain topics over time.
-Further, we present a regularized variational inference algorithm, which we use
-to encourage the DAP model's personas to be distinct. Our results show
-significant improvements over competing topic models --- particularly after
-regularization, and highlight the DAP model's unique ability to capture common
-journeys shared by different authors.
-"
-6522,1801.04962,Antonio Toral and Andy Way,"What Level of Quality can Neural Machine Translation Attain on Literary
- Text?",cs.CL," Given the rise of a new approach to MT, Neural MT (NMT), and its promising
-performance on different text types, we assess the translation quality it can
-attain on what is perceived to be the greatest challenge for MT: literary text.
-Specifically, we target novels, arguably the most popular type of literary
-text. We build a literary-adapted NMT system for the English-to-Catalan
-translation direction and evaluate it against a system pertaining to the
-previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this
-end, for the first time we train MT systems, both NMT and PBSMT, on large
-amounts of literary text (over 100 million words) and evaluate them on a set of
-twelve widely known novels spanning from the 1920s to the present day.
-According to the BLEU automatic evaluation metric, NMT is significantly better
-than PBSMT (p < 0.01) on all the novels considered. Overall, NMT results in an
-11% relative improvement (3 points absolute) over PBSMT. A complementary human
-evaluation on three of the books shows that between 17% and 34% of the
-translations, depending on the book, produced by NMT (versus 8% and 20% with
-PBSMT) are perceived by native speakers of the target language to be of
-equivalent quality to translations produced by a professional human translator.
-"
-6523,1801.05032,"Feng-Lin Li, Minghui Qiu, Haiqing Chen, Xiongwei Wang, Xing Gao, Jun
- Huang, Juwei Ren, Zhongzhou Zhao, Weipeng Zhao, Lei Wang, Guwei Jin, Wei Chu","AliMe Assist: An Intelligent Assistant for Creating an Innovative
- E-commerce Experience",cs.CL cs.AI," We present AliMe Assist, an intelligent assistant designed for creating an
-innovative online shopping experience in E-commerce. Based on question
-answering (QA), AliMe Assist offers assistance service, customer service, and
-chatting service. It is able to take voice and text input, incorporate context
-to QA, and support multi-round interaction. Currently, it serves millions of
-customer questions per day and is able to address 85% of them. In this paper,
-we demonstrate the system, present the underlying techniques, and share our
-experience in dealing with real-world QA in the E-commerce field.
-"
-6524,1801.05088,Chandra Khatri,Real-time Road Traffic Information Detection Through Social Media,cs.CL cs.AI cs.CY cs.IR cs.SI," In the current study, a mechanism to extract traffic-related information such
-as congestion and incidents from textual data from the internet is proposed.
-The current source of data is Twitter. As the data being considered is
-extremely large in size, automated models are developed to stream, download,
-and mine the data in real-time. Furthermore, if any tweet has traffic-related
-information then the models should be able to infer and extract this data.
- Currently, the data is collected only for the United States and a total of
-120,000 geo-tagged traffic-related tweets are extracted, while six million
-geo-tagged non-traffic-related tweets are retrieved and classification models
-are trained. Furthermore, this data is used for various kinds of spatial and
-temporal analysis. A mechanism to calculate the level of traffic congestion,
-safety, and traffic perception for cities in the U.S. is proposed. Traffic
-congestion and safety rankings for the various urban areas are obtained and
-then they are statistically validated with existing widely adopted rankings.
-Traffic perception depicts the attitude and perception of people towards the
-traffic.
- It is also seen that traffic-related data, when visualized spatially and
-temporally, provides the same pattern as the actual traffic flows for various
-urban areas. When visualized at the city level, it is clearly visible that the
-flow of tweets is similar to the flow of vehicles and that the traffic-related
-tweets are representative of traffic within the cities. With all the findings
-in the current study, it is shown that a significant amount of traffic-related
-information can be extracted from Twitter and other sources on the internet.
-Furthermore, Twitter and these data sources are freely available and are not
-bound by spatial and temporal limitations. That is, wherever there is a user
-there is a potential for data.
-"
-6525,1801.05096,"Komei Sugiura, Hisashi Kawai","Grounded Language Understanding for Manipulation Instructions Using
- GAN-Based Classification",cs.RO cs.CL cs.HC," The target task of this study is grounded language understanding for domestic
-service robots (DSRs). In particular, we focus on instruction understanding for
-short sentences where verbs are missing. This task is of critical importance to
-build communicative DSRs because manipulation is essential for DSRs. Existing
-instruction understanding methods usually estimate missing information only
-from non-grounded knowledge; therefore, whether the predicted action was
-physically executable remained unclear.
- In this paper, we present a grounded instruction understanding method to
-estimate appropriate objects given an instruction and situation. We extend the
-Generative Adversarial Nets (GAN) and build a GAN-based classifier using latent
-representations. To quantitatively evaluate the proposed method, we have
-developed a data set based on the standard data set used for Visual QA.
-Experimental results have shown that the proposed method gives better results
-than baseline methods.
-"
-6526,1801.05119,"Jinsong Su, Shan Wu, Deyi Xiong, Yaojie Lu, Xianpei Han, Biao Zhang",Variational Recurrent Neural Machine Translation,cs.CL," Partially inspired by successful applications of variational recurrent neural
-networks, we propose a novel variational recurrent neural machine translation
-(VRNMT) model in this paper.
-Different from the variational NMT, VRNMT introduces a series of latent random
-variables to model the translation procedure of a sentence in a generative way,
-instead of a single latent variable. Specifically, the latent random variables
-are included into the hidden states of the NMT decoder with elements from the
-variational autoencoder. In this way, these variables are recurrently
-generated, which enables them to further capture strong and complex
-dependencies among the output translations at different timesteps. In order to
-deal with the challenges in performing efficient posterior inference and
-large-scale training during the incorporation of latent variables, we build a
-neural posterior approximator, and equip it with a reparameterization technique
-to estimate the variational lower bound. Experiments on Chinese-English and
-English-German translation tasks demonstrate that the proposed model achieves
-significant improvements over both the conventional and variational NMT models.
-"
-6527,1801.05122,"Xiangwen Zhang, Jinsong Su, Yue Qin, Yang Liu, Rongrong Ji, Hongji
- Wang",Asynchronous Bidirectional Decoding for Neural Machine Translation,cs.CL," The dominant neural machine translation (NMT) models apply unified
-attentional encoder-decoder neural networks for translation. Traditionally, the
-NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a
-left-to-right manner, leaving the target-side contexts generated from right to
-left unexploited during translation. In this paper, we equip the conventional
-attentional encoder-decoder NMT framework with a backward decoder, in order to
-explore bidirectional decoding for NMT. Attending to the hidden state sequence
-produced by the encoder, our backward decoder first learns to generate the
-target-side hidden state sequence from right to left. Then, the forward decoder
-performs translation in the forward direction, while in each translation
-prediction timestep, it simultaneously applies two attention models to consider
-the source-side and reverse target-side hidden states, respectively. With this
-new architecture, our model is able to fully exploit source- and target-side
-contexts to improve translation quality altogether. Experimental results on
-NIST Chinese-English and WMT English-German translation tasks demonstrate that
-our model achieves substantial improvements over the conventional NMT by 3.14
-and 1.38 BLEU points, respectively. The source code of this work can be
-obtained from https://github.com/DeepLearnXMU/ABDNMT.
-"
-6528,1801.05147,"YaoSheng Yang and Meishan Zhang and Wenliang Chen and Wei Zhang and
- Haofen Wang and Min Zhang",Adversarial Learning for Chinese NER from Crowd Annotations,cs.CL," To quickly obtain new labeled data, we can choose crowdsourcing as an
-alternative way at lower cost in a short time. But in exchange, crowd
-annotations from non-experts may be of lower quality than those from experts.
-In this paper, we propose an approach to performing crowd annotation learning
-for Chinese Named Entity Recognition (NER) to make full use of the noisy
-sequence labels from multiple annotators. Inspired by adversarial learning, our
-approach uses a common Bi-LSTM and a private Bi-LSTM for representing
-annotator-generic and -specific information. The annotator-generic information
-is the common knowledge for entities easily mastered by the crowd. Finally, we
-build our Chinese NE tagger based on the LSTM-CRF model.
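A structural PyTorch sketch of the common/private encoder idea follows; sizes are arbitrary, a linear tag projection stands in for the LSTM-CRF decoder, and the gradient-reversal step is only described in a comment, so this is not the authors' code.

```python
import torch
import torch.nn as nn

# Sketch: a shared ("common") Bi-LSTM whose features are pushed, via an
# adversarial annotator discriminator, to be annotator-generic, plus a
# "private" Bi-LSTM carrying annotator-specific signal.
class CrowdNER(nn.Module):
    def __init__(self, vocab, emb=50, hid=64, n_annotators=3, n_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.common = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.private = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.tagger = nn.Linear(4 * hid, n_tags)          # stand-in for CRF
        self.discriminator = nn.Linear(2 * hid, n_annotators)

    def forward(self, tokens):
        e = self.embed(tokens)
        c, _ = self.common(e)
        p, _ = self.private(e)
        tag_scores = self.tagger(torch.cat([c, p], dim=-1))
        # The discriminator sees only common features; in training its
        # gradient would be reversed so they hide annotator identity.
        who = self.discriminator(c.mean(dim=1))
        return tag_scores, who

model = CrowdNER(vocab=5000)
scores, who = model(torch.randint(0, 5000, (2, 12)))
print(scores.shape, who.shape)  # (2, 12, 9) (2, 3)
```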
-In our experiments, we create two data sets for Chinese NER tasks from two
-domains. The experimental results show that our system achieves better scores
-than strong baseline systems.
-"
-6529,1801.05149,"Young-Bum Kim, Sungjin Lee, Karl Stratos","OneNet: Joint Domain, Intent, Slot Prediction for Spoken Language
- Understanding",cs.CL," In practice, most spoken language understanding systems process user input in
-a pipelined manner; first the domain is predicted, then intent and semantic
-slots are inferred according to the semantic frames of the predicted domain.
-The pipeline approach, however, has some disadvantages: error propagation and
-lack of information sharing. To address these issues, we present a unified
-neural network that jointly performs domain, intent, and slot predictions. Our
-approach adopts a principled architecture for multitask learning to fold in the
-state-of-the-art models for each task. With a few more ingredients, e.g.
-orthography-sensitive input encoding and curriculum training, our model
-delivered significant improvements in all three tasks across all domains over
-strong baselines, including one using oracle prediction for domain detection,
-on real user data of a commercial personal assistant.
-"
-6530,1801.05164,"Jean N\'eraud (LITIS, UNIROUEN), Carla Selmi (LITIS, UNIROUEN)",Embedding a $\theta$-invariant code into a complete one,cs.DM cs.CL math.CO," Let A be a finite or countable alphabet and let $\theta$ be a literal
-(anti-)automorphism onto A * (by definition, such a correspondence is
-determined by a permutation of the alphabet). This paper deals with sets which
-are invariant under $\theta$ ($\theta$-invariant for short), that is, languages
-L such that $\theta$(L) is a subset of L. We establish an extension of the
-famous defect theorem. With regard to the so-called notion of completeness, we
-provide a series of examples of finite complete $\theta$-invariant codes.
-Moreover, we establish a formula that allows us to embed any non-complete
-$\theta$-invariant code into a complete one. As a consequence, in the family of
-the so-called thin $\theta$-invariant codes, maximality and completeness are
-two equivalent notions.
-"
-6531,1801.05420,"Qinglong Wang, Kaixuan Zhang, Alexander G. Ororbia II, Xinyu Xing, Xue
- Liu, C. Lee Giles",A Comparative Study of Rule Extraction for Recurrent Neural Networks,cs.LG cs.CL," Understanding recurrent networks through rule extraction has a long history.
-This has taken on new interest due to the need for interpreting or verifying
-neural networks. One basic form for representing stateful rules is
-deterministic finite automata (DFA). Previous research shows that extracting
-DFAs from trained second-order recurrent networks is not only possible but also
-relatively stable. Recently, several new types of recurrent networks with more
-complicated architectures have been introduced. These handle challenging
-learning tasks usually involving sequential data. However, it remains an open
-problem whether DFAs can be adequately extracted from these models.
-Specifically, it is not clear how DFA extraction will be affected when applied
-to different recurrent networks trained on data sets with different levels of
-complexity. Here, we investigate DFA extraction on several widely adopted
-recurrent networks that are trained to learn a set of seven regular Tomita
-grammars. We first formally analyze the complexity of Tomita grammars and
-categorize these grammars according to that complexity.
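For reference, the Tomita grammars are simple regular languages over {0,1}; a sketch of membership tests for three of them (standard definitions) and of generating the labeled binary strings used to train such networks:

```python
# Membership tests for three of the seven Tomita grammars (standard
# definitions); the other four are analogous regular languages.
tomita = {
    1: lambda s: set(s) <= {"1"},            # 1*
    2: lambda s: s == "10" * (len(s) // 2),  # (10)*
    4: lambda s: "000" not in s,             # no substring 000
}

def labeled_dataset(grammar, max_len=4):
    # Enumerate all binary strings up to max_len and label each one.
    strings = [""]
    for _ in range(max_len):
        strings += [s + b for s in strings for b in "01" if len(s) < max_len]
    return {s: tomita[grammar](s) for s in set(strings)}

data = labeled_dataset(2)
print([s for s, ok in sorted(data.items()) if ok])  # '', '10', '1010', ...
```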
-Then we empirically evaluate different recurrent networks for their performance
-of DFA extraction on all Tomita grammars. Our experiments show that for most
-recurrent networks, their extraction performance decreases as the complexity of
-the underlying grammar increases. On grammars of lower complexity, most
-recurrent networks obtain desirable extraction performance. As for grammars
-with the highest level of complexity, most models fail, with only certain
-recurrent networks attaining satisfactory extraction performance.
-"
-6532,1801.05453,"W. James Murdoch, Peter J. Liu, Bin Yu","Beyond Word Importance: Contextual Decomposition to Extract Interactions
- from LSTMs",cs.CL cs.LG stat.ML," The driving force behind the recent success of LSTMs has been their ability
-to learn complex and non-linear relationships. Consequently, our inability to
-describe these relationships has led to LSTMs being characterized as black
-boxes. To this end, we introduce contextual decomposition (CD), an
-interpretation algorithm for analysing individual predictions made by standard
-LSTMs, without any changes to the underlying model. By decomposing the output
-of an LSTM, CD captures the contributions of combinations of words or variables
-to the final prediction of an LSTM. On the task of sentiment analysis with the
-Yelp and SST data sets, we show that CD is able to reliably identify words and
-phrases of contrasting sentiment, and how they are combined to yield the LSTM's
-final prediction. Using the phrase-level labels in SST, we also demonstrate
-that CD is able to successfully extract positive and negative negations from an
-LSTM, something which has not previously been done.
-"
-6533,1801.05613,"Shrainik Jain, Bill Howe, Jiaqi Yan, Thierry Cruanes","Query2Vec: An Evaluation of NLP Techniques for Generalized Workload
- Analytics",cs.DB cs.CL," We consider methods for learning vector representations of SQL queries to
-support generalized workload analytics tasks, including workload summarization
-for index selection and predicting queries that will trigger memory errors. We
-consider vector representations of both raw SQL text and optimized query plans,
-and evaluate these methods on synthetic and real SQL workloads. We find that
-general algorithms based on vector representations can outperform existing
-approaches that rely on specialized features. For index recommendation, we
-cluster the vector representations to compress large workloads with no loss in
-performance from the recommended index. For error prediction, we train a
-classifier over learned vectors that can automatically relate subtle syntactic
-patterns with specific errors raised during query execution. Surprisingly, we
-also find that these methods enable transfer learning, where a model trained on
-one SQL corpus can be applied to an unrelated corpus and still enable good
-performance. We find that these general approaches, when trained on a large
-corpus of SQL queries, provide a robust foundation for a variety of workload
-analysis tasks and database features, without requiring application-specific
-feature engineering.
-"
-6534,1801.05617,"Cynthia Van Hee, Gilles Jacobs, Chris Emmery, Bart Desmet, Els
- Lefever, Ben Verhoeven, Guy De Pauw, Walter Daelemans and V\'eronique Hoste",Automatic Detection of Cyberbullying in Social Media Text,cs.CL cs.CY cs.SI," While social media offer great communication opportunities, they also
-increase the vulnerability of young people to threatening situations online.
-Recent studies report that cyberbullying constitutes a growing problem among
-youngsters. Successful prevention depends on the adequate detection of
-potentially harmful messages, and the information overload on the Web requires
-intelligent systems to identify potential risks automatically. The focus of
-this paper is on automatic cyberbullying detection in social media text by
-modelling posts written by bullies, victims, and bystanders of online bullying.
-We describe the collection and fine-grained annotation of a training corpus for
-English and Dutch and perform a series of binary classification experiments to
-determine the feasibility of automatic cyberbullying detection. We make use of
-linear support vector machines exploiting a rich feature set and investigate
-which information sources contribute the most for this particular task.
-Experiments on a holdout test set reveal promising results for the detection of
-cyberbullying-related posts. After optimisation of the hyperparameters, the
-classifier yields an F1-score of 64% and 61% for English and Dutch
-respectively, and considerably outperforms baseline systems based on keywords
-and word unigrams.
-"
-6535,1801.06024,"Gino Brunner, Yuyi Wang, Roger Wattenhofer, Michael Weigelt","Natural Language Multitasking: Analyzing and Improving Syntactic
- Saliency of Hidden Representations",cs.CL cs.AI cs.LG stat.ML," We train multi-task autoencoders on linguistic tasks and analyze the learned
-hidden sentence representations. The representations change significantly when
-translation and part-of-speech decoders are added. The more decoders a model
-employs, the better it clusters sentences according to their syntactic
-similarity, as the representation space becomes less entangled. We explore the
-structure of the representation space by interpolating between sentences, which
-yields interesting pseudo-English sentences, many of which have recognizable
-syntactic structure. Lastly, we point out an interesting property of our
-models: the difference-vector between two sentences can be added to change a
-third sentence with similar features in a meaningful way.
-"
-6536,1801.06126,Yedid Hoshen and Lior Wolf,Non-Adversarial Unsupervised Word Translation,cs.LG cs.CL," Unsupervised word translation from non-parallel inter-lingual corpora has
-attracted much research interest. Very recently, neural network methods trained
-with adversarial loss functions achieved high accuracy on this task. Despite
-the impressive success of the recent techniques, they suffer from the typical
-drawbacks of generative adversarial models: sensitivity to hyper-parameters,
-long training time and lack of interpretability. In this paper, we make the
-observation that two sufficiently similar distributions can be aligned
-correctly with iterative matching methods. We present a novel method that first
-aligns the second moment of the word distributions of the two languages and
-then iteratively refines the alignment. Extensive experiments on word
-translation of European and non-European languages show that our method
-achieves better performance than recent state-of-the-art deep adversarial
-approaches and is competitive with the supervised baseline. It is also
-efficient, easy to parallelize on CPU and interpretable.
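A minimal scikit-learn sketch of this kind of pipeline, reduced to word unigram/bigram features (the paper's feature set is much richer, and the toy texts and labels below are stand-ins):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Linear SVM over n-gram features, the backbone of the approach described.
texts = ["you are awesome", "nobody likes you loser",
         "great game today", "go away you idiot"]
labels = [0, 1, 0, 1]  # 1 = bullying-related (toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["you loser", "nice game"]))
```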
-"
-6537,1801.06146,"Jeremy Howard, Sebastian Ruder",Universal Language Model Fine-tuning for Text Classification,cs.CL cs.LG stat.ML," Inductive transfer learning has greatly impacted computer vision, but
-existing approaches in NLP still require task-specific modifications and
-training from scratch. We propose Universal Language Model Fine-tuning
-(ULMFiT), an effective transfer learning method that can be applied to any task
-in NLP, and introduce techniques that are key for fine-tuning a language model.
-Our method significantly outperforms the state-of-the-art on six text
-classification tasks, reducing the error by 18-24% on the majority of datasets.
-Furthermore, with only 100 labeled examples, it matches the performance of
-training from scratch on 100x more data. We open-source our pretrained models
-and code.
-"
-6538,1801.06172,"Shuai Wang, Mianwei Zhou, Geli Fei, Yi Chang, Bing Liu","Contextual and Position-Aware Factorization Machines for Sentiment
- Classification",cs.CL," While existing machine learning models have achieved great success for
-sentiment classification, they typically do not explicitly capture
-sentiment-oriented word interaction, which can lead to poor results for
-fine-grained analysis at the snippet level (a phrase or sentence).
-Factorization Machines provide a possible approach to learning element-wise
-interaction for recommender systems, but they are not directly applicable to
-our task due to the inability to model contexts and word sequences. In this
-work, we develop two Position-aware Factorization Machines which consider word
-interaction, context and position information. Such information is jointly
-encoded in a set of sentiment-oriented word interaction (SWI) vectors. Compared
-to traditional word embeddings, SWI vectors explicitly capture
-sentiment-oriented word interaction and simplify the parameter learning.
-Experimental results show that while they have comparable performance with
-state-of-the-art methods for document-level classification, they benefit the
-snippet/sentence-level sentiment analysis.
-"
-6539,1801.06176,"Baolin Peng and Xiujun Li and Jianfeng Gao and Jingjing Liu and
- Kam-Fai Wong and Shang-Yu Su","Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy
- Learning",cs.CL cs.AI cs.LG cs.NE," Training a task-completion dialogue agent via reinforcement learning (RL) is
-costly because it requires many interactions with real users. One common
-alternative is to use a user simulator. However, a user simulator usually lacks
-the language complexity of human interlocutors and the biases in its design may
-tend to degrade the agent. To address these issues, we present Deep Dyna-Q,
-which to our knowledge is the first deep RL framework that integrates planning
-for task-completion dialogue policy learning. We incorporate into the dialogue
-agent a model of the environment, referred to as the world model, to mimic real
-user response and generate simulated experience. During dialogue policy
-learning, the world model is constantly updated with real user experience to
-approach real user behavior, and in turn, the dialogue agent is optimized using
-both real experience and simulated experience. The effectiveness of our
-approach is demonstrated on a movie-ticket booking task in both simulated and
-human-in-the-loop settings.
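The planning idea borrowed here is classical Dyna-Q; a tabular sketch on a toy chain MDP shows how real experience trains both the Q-function and a world model, and how the world model then generates simulated experience for extra updates (the paper's agent and world model are neural networks over dialogue states, so this is only an illustration of the loop):

```python
import random

# Tabular Dyna-Q on a toy 5-state chain: action 1 moves right, reward 1
# at the rightmost state. The chain MDP is an assumption for the sketch.
N_STATES, ACTIONS, GAMMA, ALPHA, K = 5, [0, 1], 0.9, 0.5, 10
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}  # learned world model: (s, a) -> (reward, next state)

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == N_STATES - 1 else 0.0), s2

s = 0
for t in range(200):
    a = random.choice(ACTIONS) if random.random() < 0.1 else \
        max(ACTIONS, key=lambda b: Q[(s, b)])
    r, s2 = step(s, a)                            # real experience
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
    model[(s, a)] = (r, s2)                       # update world model
    for _ in range(K):                            # planning: simulated experience
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[(ps, pa)] += ALPHA * (pr + GAMMA * max(Q[(ps2, b)] for b in ACTIONS)
                                - Q[(ps, pa)])
    s = 0 if s2 == N_STATES - 1 else s2

print(max(ACTIONS, key=lambda b: Q[(0, b)]))  # learned action at the start
```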
-"
-6540,1801.06261,Devendra Singh Sachan and Manzil Zaheer and Ruslan Salakhutdinov,Investigating the Working of Text Classifiers,cs.CL," Text classification is one of the most widely studied tasks in natural
-language processing. Motivated by the principle of compositionality, large
-multilayer neural network models have been employed for this task in an attempt
-to effectively utilize the constituent expressions. Almost all of the reported
-work trains large networks using discriminative approaches, which come with a
-caveat of no proper capacity control, as they tend to latch on to any signal
-that may not generalize. Using various recent state-of-the-art approaches for
-text classification, we explore whether these models actually learn to compose
-the meaning of the sentences or still just focus on some keywords or lexicons
-for classifying the document. To test our hypothesis, we carefully construct
-datasets where the training and test splits have no direct overlap of such
-lexicons, but overall language structure would be similar. We study various
-text classifiers and observe that there is a big performance drop on these
-datasets. Finally, we show that even simple models with our proposed
-regularization techniques, which disincentivize focusing on key lexicons, can
-substantially improve classification accuracy.
-"
-6541,1801.06287,"Linyuan Gong, Ruyi Ji",What Does a TextCNN Learn?,stat.ML cs.CL cs.LG," TextCNN, the convolutional neural network for text, is a useful deep learning
-algorithm for sentence classification tasks such as sentiment analysis and
-question classification. However, neural networks have long been known as black
-boxes because interpreting them is a challenging task. Researchers have
-developed several tools to understand a CNN for image classification by deep
-visualization, but research about deep TextCNNs is still insufficient. In this
-paper, we are trying to understand what a TextCNN learns on two classical NLP
-datasets. Our work focuses on functions of different convolutional kernels and
-correlations between convolutional kernels.
-"
-6542,1801.06294,"Shaika Chowdhury, Chenwei Zhang and Philip S. Yu",Multi-Task Pharmacovigilance Mining from Social Media Posts,cs.LG cs.AI cs.CL," Social media has grown to be a crucial information source for
-pharmacovigilance studies where an increasing number of people post adverse
-reactions to medical drugs that are previously unreported. Aiming to
-effectively monitor various aspects of Adverse Drug Reactions (ADRs) from
-diversely expressed social medical posts, we propose a multi-task neural
-network framework that learns several tasks associated with ADR monitoring with
-different levels of supervision collectively. Besides being able to correctly
-classify ADR posts and accurately extract ADR mentions from online posts, the
-proposed framework is also able to further understand reasons for which the
-drug is being taken, known as 'indication', from the given social media post. A
-coverage-based attention mechanism is adopted in our framework to help the
-model properly identify 'phrasal' ADRs and Indications that are attentive to
-multiple words in a post. Our framework is applicable in situations where
-limited parallel data for different pharmacovigilance tasks are available. We
-evaluate the proposed framework on real-world Twitter datasets, where the
-proposed model outperforms the state-of-the-art alternatives of each individual
-task consistently.
-"
-6543,1801.06353,"Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir, and Julien
- Epps",Transfer Learning for Improving Speech Emotion Classification Accuracy,cs.CV cs.CL," The majority of existing speech emotion recognition research focuses on
-automatic emotion detection using training and testing data from the same
-corpus collected under the same conditions. The performance of such systems has
-been shown to drop significantly in cross-corpus and cross-language scenarios.
-To address the problem, this paper exploits a transfer learning technique,
-novel in cross-language and cross-corpus scenarios, to improve the performance
-of speech emotion recognition systems. Evaluations on five different corpora in
-three different languages show that Deep Belief Networks (DBNs) offer better
-accuracy than previous approaches on cross-corpus emotion recognition, relative
-to a Sparse Autoencoder and SVM baseline system. Results also suggest that
-training on a large number of languages and on a small fraction of the target
-data can significantly boost accuracy over the baseline, even for corpora with
-limited training examples.
-"
-6544,1801.06407,"Andrey Kutuzov, Maria Kunilovskaya","Size vs. Structure in Training Corpora for Word Embedding Models:
- Araneum Russicum Maximum and Russian National Corpus",cs.CL," In this paper, we present a distributional word embedding model trained on
-one of the largest available Russian corpora: Araneum Russicum Maximum (over 10
-billion words crawled from the web). We compare this model to the model trained
-on the Russian National Corpus (RNC). The two corpora are much different in
-their size and compilation procedures. We test these differences by evaluating
-the trained models against the Russian part of the Multilingual SimLex999
-semantic similarity dataset. We detect and describe numerous issues in this
-dataset and publish a new corrected version. Aside from the already known fact
-that the RNC is generally a better training corpus than web corpora, we
-enumerate and explain fine differences in how the models process the semantic
-similarity task, what parts of the evaluation set are difficult for particular
-models and why. Additionally, the learning curves for both models are
-described, showing that the RNC is generally more robust as training material
-for this task.
-"
-6545,1801.06422,"Nina Poerner, Benjamin Roth and Hinrich Sch\""utze","Evaluating neural network explanation methods using hybrid documents and
- morphological agreement",cs.CL," The behavior of deep neural networks (DNNs) is hard to understand. This makes
-it necessary to explore post hoc explanation methods. We conduct the first
-comprehensive evaluation of explanation methods for NLP. To this end, we design
-two novel evaluation paradigms that cover two important classes of NLP
-problems: small context and large context problems. Both paradigms require no
-manual annotation and are therefore broadly applicable. We also introduce
-LIMSSE, an explanation method inspired by LIME that is designed for NLP. We
-show empirically that LIMSSE, LRP and DeepLIFT are the most effective
-explanation methods and recommend them for explaining DNNs in NLP.
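A rough sketch of a LIME-style text explanation of the sort LIMSSE builds on: mask words at random, query the black-box model, and fit a local linear surrogate whose weights act as word-importance scores (the masking scheme and the unweighted least-squares fit here are simplified assumptions):

```python
import numpy as np

# LIME-style local surrogate for a black-box text classifier.
def explain(tokens, predict, n_samples=500, seed=0):
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(n_samples, len(tokens)))
    ys = np.array([predict([t for t, m in zip(tokens, row) if m])
                   for row in masks])
    # Least-squares fit: each weight approximates one word's effect.
    w, *_ = np.linalg.lstsq(masks.astype(float), ys, rcond=None)
    return dict(zip(tokens, w))

# Toy black box: score rises with "excellent", falls with "boring".
predict = lambda toks: float("excellent" in toks) - float("boring" in toks)
print(explain(["an", "excellent", "but", "boring", "film"], predict))
```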
-"
-6546,1801.06436,"Goran Glava\v{s}, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo
- Rosso",A Resource-Light Method for Cross-Lingual Semantic Textual Similarity,cs.CL," Recognizing semantically similar sentences or paragraphs across languages is
-beneficial for many tasks, ranging from cross-lingual information retrieval and
-plagiarism detection to machine translation. Recently proposed methods for
-predicting cross-lingual semantic similarity of short texts, however, make use
-of tools and resources (e.g., machine translation systems, syntactic parsers or
-named entity recognition) that for many languages (or language pairs) do not
-exist. In contrast, we propose an unsupervised and a very resource-light
-approach for measuring semantic similarity between texts in different
-languages. To operate in the bilingual (or multilingual) space, we project
-continuous word vectors (i.e., word embeddings) from one language to the vector
-space of the other language via the linear translation model. We then align
-words according to the similarity of their vectors in the bilingual embedding
-space and investigate different unsupervised measures of semantic similarity
-exploiting bilingual embeddings and word alignments. Requiring only a
-limited-size set of word translation pairs between the languages, the proposed
-approach is applicable to virtually any pair of languages for which there
-exists a sufficiently large corpus, required to learn monolingual word
-embeddings. Experimental results on three different datasets for measuring
-semantic textual similarity show that our simple resource-light approach
-reaches performance close to that of supervised and resource-intensive methods,
-displaying stability across different language pairs. Furthermore, we evaluate
-the proposed method on two extrinsic tasks, namely extraction of parallel
-sentences from comparable corpora and cross-lingual plagiarism detection, and
-show that it yields performance comparable to that of complex
-resource-intensive state-of-the-art models for the respective tasks.
-"
-6547,1801.06480,"Tushar Semwal, Gaurav Mathur, Promod Yenigalla and Shivashankar B.
- Nair","A Practitioners' Guide to Transfer Learning for Text Classification
- using Convolutional Neural Networks",cs.CL," Transfer Learning (TL) plays a crucial role when a given dataset has
-insufficient labeled examples to train an accurate model. In such scenarios,
-the knowledge accumulated within a model pre-trained on a source dataset can be
-transferred to a target dataset, resulting in the improvement of the target
-model. Though TL is found to be successful in the realm of image-based
-applications, its impact and practical use in Natural Language Processing (NLP)
-applications is still a subject of research. Due to their hierarchical
-architecture, Deep Neural Networks (DNN) provide flexibility and customization
-in adjusting their parameters and depth of layers, thereby forming an apt area
-for exploiting TL. In this paper, we report the results and conclusions
-obtained from extensive empirical experiments using a Convolutional Neural
-Network (CNN) and try to uncover rules of thumb for ensuring a successful
-positive transfer. In addition, we also highlight the flawed practices that
-could lead to a negative transfer. We explore the transferability of various
-layers and describe the effect of varying hyper-parameters on the transfer
-performance. Also, we present a comparison of accuracy and model size against
-state-of-the-art methods.
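An illustrative PyTorch setup of the kind of layer-wise transfer such experiments probe: copy a text-CNN trained on a source dataset, freeze its lower layers, and fine-tune only the classifier head on the target data (the architecture and sizes are assumptions for the sketch):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab=5000, emb=50, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, 100, kernel_size=3, padding=1)
        self.fc = nn.Linear(100, n_classes)

    def forward(self, x):
        h = torch.relu(self.conv(self.embed(x).transpose(1, 2)))
        return self.fc(h.max(dim=2).values)  # max-over-time pooling

source = TextCNN()                     # assume: trained on the source task
target = TextCNN()
target.load_state_dict(source.state_dict())          # transfer all weights
for p in list(target.embed.parameters()) + list(target.conv.parameters()):
    p.requires_grad = False                           # freeze lower layers
optimizer = torch.optim.Adam(p for p in target.parameters() if p.requires_grad)
print(target(torch.randint(0, 5000, (4, 20))).shape)  # torch.Size([4, 2])
```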
-Finally, we derive inferences from the empirical results and provide best
-practices to achieve a successful positive transfer.
-"
-6548,1801.06482,"Sweta Agrawal, Amit Awekar","Deep Learning for Detecting Cyberbullying Across Multiple Social Media
- Platforms",cs.IR cs.CL cs.SI," Harassment by cyberbullies is a significant phenomenon on social media.
-Existing works for cyberbullying detection have at least one of the following
-three bottlenecks. First, they target only one particular social media platform
-(SMP). Second, they address just one topic of cyberbullying. Third, they rely
-on carefully handcrafted features of the data. We show that deep learning based
-models can overcome all three bottlenecks. Knowledge learned by these models on
-one dataset can be transferred to other datasets. We performed extensive
-experiments using three real-world datasets: Formspring (12k posts), Twitter
-(16k posts), and Wikipedia (100k posts). Our experiments provide several useful
-insights about cyberbullying detection. To the best of our knowledge, this is
-the first work that systematically analyzes cyberbullying detection on various
-topics across multiple SMPs using deep learning based models and transfer
-learning.
-"
-6549,1801.06607,"Yuanhang Su, Yuzhong Huang and C.-C. Jay Kuo","Efficient Text Classification Using Tree-structured Multi-linear
- Principal Component Analysis",cs.CL," A novel text data dimension reduction technique, called the tree-structured
-multi-linear principal component analysis (TMPCA), is proposed in this work.
-Being different from traditional text dimension reduction methods that deal
-with the word-level representation, the TMPCA technique reduces the dimension
-of input sequences and sentences to simplify the following text classification
-tasks. It is shown mathematically and experimentally that the TMPCA tool
-demands much lower complexity (and, hence, less computing power) than the
-ordinary principal component analysis (PCA). Furthermore, it is demonstrated
-by experimental results that the support vector machine (SVM) method applied to
-the TMPCA-processed data achieves comparable or better performance than the
-state-of-the-art recurrent neural network (RNN) approach.
-"
-6550,1801.06613,"Xuancheng Ren, Xu Sun, Ji Wen, Bingzhen Wei, Weidong Zhan, Zhiyuan
- Zhang",Building an Ellipsis-aware Chinese Dependency Treebank for Web Text,cs.CL," Web 2.0 has brought with it numerous user-produced data revealing one's
-thoughts, experiences, and knowledge, which are a great source for many tasks,
-such as information extraction and knowledge base construction. However, the
-colloquial nature of the texts poses new challenges for current natural
-language processing techniques, which are better adapted to the formal form of
-the language. Ellipsis is a common linguistic phenomenon in which some words
-are left out as they are understood from the context, especially in oral
-utterance, hindering the improvement of dependency parsing, which is of great
-importance for tasks that rely on the meaning of the sentence. In order to
-promote research in this area, we are releasing a Chinese dependency treebank
-of 319 weibos, containing 572 sentences with omissions restored and contexts
-preserved.
-"
-6551,1801.06700,"Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng
- Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath
- Chandar, Nan Rosemary Ke, Sai Rajeswar, Alexandre de Brebisson, Jose M. R. 
- Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau,
- Yoshua Bengio",A Deep Reinforcement Learning Chatbot (Short Version),cs.CL cs.AI cs.LG cs.NE stat.ML," We present MILABOT: a deep reinforcement learning chatbot developed by the
-Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize
-competition. MILABOT is capable of conversing with humans on popular small talk
-topics through both speech and text. The system consists of an ensemble of
-natural language generation and retrieval models, including neural network and
-template-based models. By applying reinforcement learning to crowdsourced data
-and real-world user interactions, the system has been trained to select an
-appropriate response from the models in its ensemble. The system has been
-evaluated through A/B testing with real-world users, where it performed
-significantly better than other systems. The results highlight the potential of
-coupling ensemble systems with deep reinforcement learning as a fruitful path
-for developing real-world, open-domain conversational agents.
-"
-6552,1801.06792,"Gaurav Bhatt, Shivam Sharma and Balasubramanian Raman",Attentive Recurrent Tensor Model for Community Question Answering,cs.CL," A major challenge to the problem of community question answering is the
-lexical and semantic gap between the sentence representations. Some solutions
-to minimize this gap include introducing extra parameters into deep models or
-augmenting them with external handcrafted features. In this paper, we propose a
-novel attentive recurrent tensor network for solving the lexical and semantic
-gap in community question answering. We introduce token-level and phrase-level
-attention strategies that map input sequences to the output using trainable
-parameters. Further, we use the tensor parameters to introduce a 3-way
-interaction between question, answer and external features in vector space. We
-introduce simplified tensor matrices with L2 regularization that result in
-smooth optimization during training. The proposed model achieves
-state-of-the-art performance on the task of answer sentence selection (TrecQA
-and WikiQA datasets) while outperforming the current state-of-the-art on the
-tasks of best answer selection (Yahoo! L4) and answer triggering task (WikiQA).
-"
-6553,1801.06807,"Philipp Dufter, Mengjie Zhao, Martin Schmitt, Alexander Fraser,
- Hinrich Sch\""utze",Embedding Learning Through Multilingual Concept Induction,cs.CL," We present a new method for estimating vector space representations of words:
-embedding learning by concept induction. We test this method on a highly
-parallel corpus and learn semantic representations of words in 1259 different
-languages in a single common space. An extensive experimental evaluation on
-crosslingual word similarity and sentiment analysis indicates that
-concept-based multilingual embedding learning performs better than previous
-approaches.
-"
-6554,1801.06830,Ronan Cummins and Marek Rei,Neural Multi-task Learning in Automated Assessment,cs.CL," Grammatical error detection and automated essay scoring are two tasks in the
-area of automated assessment. Traditionally, these tasks have been treated
-independently with different machine learning models and features used for each
-task. In this paper, we develop a multi-task neural network model that jointly
-optimises for both tasks, and in particular we show that neural automated essay
-scoring can be significantly improved.
We show that while the essay score
-provides little evidence to inform grammatical error detection, the essay score
-is highly influenced by error detection.
-"
-6555,1801.07073,"Antske Fokkens, Serge ter Braake, Niels Ockeloen, Piek Vossen, Susan
- Leg\^ene, Guus Schreiber and Victor de Boer",BiographyNet: Extracting Relations Between People and Events,cs.CL," This paper describes BiographyNet, a digital humanities project (2012-2016)
-that brings together researchers from history, computational linguistics and
-computer science. The project uses data from the Biography Portal of the
-Netherlands (BPN), which contains approximately 125,000 biographies from a
-variety of Dutch biographical dictionaries from the eighteenth century until
-now, describing around 76,000 individuals. BiographyNet's aim is to strengthen
-the value of the portal and comparable biographical datasets for historical
-research, by improving the search options and the presentation of its outcome,
-with a historically justified NLP pipeline that works through a user-evaluated
-demonstrator. The project's main target group is professional historians. The
-project therefore worked with two key concepts: ""provenance"" -understood as a
-term allowing for both historical source criticism and for references to
-data-management and programming interventions in digitized sources; and
-""perspective"" -interpreted as inherent uncertainty concerning the
-interpretation of historical results.
-"
-6556,1801.07174,"Hady Elsahar, Elena Demidova, Simon Gottschalk, Christophe Gravier,
- Frederique Laforest",Unsupervised Open Relation Extraction,cs.CL," We explore methods to extract relations between named entities from free text
-in an unsupervised setting. In addition to standard feature extraction, we
-develop a novel method to re-weight word embeddings. We alleviate the problem
-of feature sparsity using individual feature reduction. Our approach yields a
-significant improvement of 5.8% over state-of-the-art relation clustering,
-achieving an F1-score of 0.416 on the NYT-FB dataset.
-"
-6557,1801.07175,Zhitao Gong and Wenlu Wang and Bo Li and Dawn Song and Wei-Shinn Ku,Adversarial Texts with Gradient Methods,cs.CL cs.CR cs.LG," Adversarial samples for images have been extensively studied in the
-literature. Among the many attacking methods, gradient-based methods are both
-effective and easy to compute. In this work, we propose a framework to adapt
-the gradient attacking methods on images to the text domain. The main
-difficulties in generating adversarial texts with gradient methods are that i)
-the input space is discrete, which makes it difficult to accumulate small noise
-directly in the inputs, and that ii) measuring the quality of the adversarial
-texts is difficult. We tackle the first problem by searching for adversarial
-examples in the embedding space and then reconstructing the adversarial texts
-via nearest neighbor search. For the latter problem, we employ the Word Mover's
-Distance (WMD) to quantify the quality of adversarial texts. Through extensive
-experiments on three datasets, IMDB movie reviews, Reuters-2 and Reuters-5
-newswires, we show that our framework can leverage gradient attacking methods
-to generate very high-quality adversarial texts that are only a few words
-different from the original texts. There are many cases where we can change one
-word to alter the label of the whole piece of text. We successfully incorporate
-FGM and DeepFool into our framework.
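-As a minimal sketch of this pipeline (the toy vocabulary, toy linear scorer
-and step size below are assumptions for illustration, not our actual models),
-an FGM-style step is taken in embedding space and the result is projected back
-to the nearest vocabulary items:
-
-import numpy as np
-
-rng = np.random.default_rng(0)
-vocab = ['good', 'bad', 'great', 'awful', 'movie', 'plot']
-E = rng.normal(size=(len(vocab), 8))  # toy embedding matrix
-w = rng.normal(size=8)                # toy linear sentiment scorer
-
-def fgm_in_embedding_space(token_ids, eps=0.8):
-    x = E[token_ids]                        # sentence as embedding rows
-    grad = np.tile(w, (len(token_ids), 1))  # d(score)/dx for a linear scorer
-    x_adv = x - eps * np.sign(grad)         # FGM step that lowers the score
-    # reconstruct discrete text: nearest neighbor in embedding space
-    dists = np.linalg.norm(x_adv[:, None, :] - E[None, :, :], axis=-1)
-    return dists.argmin(axis=1)
-
-ids = np.array([vocab.index(t) for t in ['good', 'movie']])
-print([vocab[i] for i in fgm_in_embedding_space(ids)])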
In addition, we empirically show that WMD
-is closely related to the quality of adversarial texts.
-"
-6558,1801.07243,"Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela,
- Jason Weston","Personalizing Dialogue Agents: I have a dog, do you have pets too?",cs.AI cs.CL," Chit-chat models are known to have several problems: they lack specificity,
-do not display a consistent personality and are often not very captivating. In
-this work we present the task of making chit-chat more engaging by conditioning
-on profile information. We collect data and train models to condition on (i)
-their given profile information and (ii) information about the person they are
-talking to, resulting in improved dialogues, as measured by next utterance
-prediction. Since (ii) is initially unknown, our model is trained to engage its
-partner with personal topics, and we show the resulting dialogue can be used to
-predict profile information about the interlocutors.
-"
-6559,1801.07288,"Ameya Godbole, Aman Dalmia and Sunil Kumar Sahu","Siamese Neural Networks with Random Forest for detecting duplicate
- question pairs",cs.CL," Determining whether two given questions are semantically similar is a fairly
-challenging task given the different structures and forms that the questions
-can take. In this paper, we use Gated Recurrent Units (GRU) in combination with
-other widely used machine learning algorithms such as Random Forest, AdaBoost
-and SVM for the similarity prediction task on a dataset released by Quora,
-consisting of about 400k labeled question pairs. We obtained the best result by
-using the Siamese adaptation of a Bidirectional GRU with a Random Forest
-classifier, which landed us among the top 24% in the competition Quora Question
-Pairs hosted on Kaggle.
-"
-6560,1801.07311,"Arkaitz Zubiaga, Aiqi Jiang",Early Detection of Social Media Hoaxes at Scale,cs.CL cs.SI," The unmoderated nature of social media enables the diffusion of hoaxes, which
-in turn jeopardises the credibility of information gathered from social media
-platforms. Existing research on automated detection of hoaxes has the
-limitation of using relatively small datasets, owing to the difficulty of
-getting labelled data. This in turn has limited research exploring early
-detection of hoaxes as well as other factors such as the effect of the size of
-the training data or the use of sliding windows. To mitigate this problem, we
-introduce a semi-automated method that leverages the Wikidata knowledge base to
-build large-scale datasets for veracity classification, focusing on celebrity
-death reports. This enables us to create a dataset with 4,007 reports including
-over 13 million tweets, 15% of which are fake. Experiments using class-specific
-representations of word embeddings show that we can achieve F1 scores nearing
-72% within 10 minutes of the first tweet being posted when we expand the size
-of the training data by our semi-automated means. Our dataset represents a
-realistic scenario with a real distribution of true, commemorative and false
-stories, which we release for further use as a benchmark in future research.
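-One possible reading of the class-specific representation, as a minimal sketch
-(the reports, labels and random stand-in vectors below are invented for
-illustration, not our data or model), summarises each class by the centroid of
-its word embeddings and represents a new report by its similarity to each
-centroid:
-
-import numpy as np
-
-rng = np.random.default_rng(1)
-emb = {w: rng.normal(size=16)
-       for w in 'dies dead hoax alive rip tribute confirmed'.split()}
-
-def report_vec(text):
-    return np.mean([emb[w] for w in text.split() if w in emb], axis=0)
-
-train = [('rip dies dead', 'true'), ('hoax alive', 'fake')]
-centroids = {lab: np.mean([report_vec(t) for t, y in train if y == lab],
-                          axis=0)
-             for lab in {y for _, y in train}}
-
-def features(text):
-    v = report_vec(text)
-    return {lab: float(v @ c / (np.linalg.norm(v) * np.linalg.norm(c)))
-            for lab, c in centroids.items()}
-
-print(features('dead confirmed'))  # similarity to each class centroid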
-"
-6561,1801.07414,"Zhao Yan, Duyu Tang, Nan Duan, Shujie Liu, Wendi Wang, Daxin Jiang,
- Ming Zhou, Zhoujun Li",Assertion-based QA with Question-Aware Open Information Extraction,cs.CL," We present assertion-based question answering (ABQA), an open domain question
-answering task that takes a question and a passage as inputs, and outputs a
-semi-structured assertion consisting of a subject, a predicate and a list of
-arguments. An assertion conveys more evidence than a short answer span in
-reading comprehension, and it is more concise than a tedious passage in
-passage-based QA. These advantages make ABQA more suitable for human-computer
-interaction scenarios such as voice-controlled speakers. Further progress
-towards improving ABQA requires richer supervised datasets and powerful models
-of text understanding. To remedy this, we introduce a new dataset called
-WebAssertions, which includes hand-annotated QA labels for 358,427 assertions
-in 55,960 web passages. To address ABQA, we develop both generative and
-extractive approaches. The backbone of our generative approach is sequence to
-sequence learning. In order to capture the structure of the output assertion,
-we introduce a hierarchical decoder that first generates the structure of the
-assertion and then generates the words of each field. The extractive approach
-is based on learning to rank. Features at different levels of granularity are
-designed to measure the semantic relevance between a question and an assertion.
-Experimental results show that our approaches have the ability to infer
-question-aware assertions from a passage. We further evaluate our approaches by
-incorporating the ABQA results as additional features in passage-based QA.
-Results on two datasets show that ABQA features significantly improve the
-accuracy on passage-based QA.
-"
-6562,1801.07495,"Wafa Alorainy, Pete Burnap, Han Liu, Matthew Williams","The Enemy Among Us: Detecting Hate Speech with Threats Based 'Othering'
- Language Embeddings",cs.CL cs.SI," Offensive or antagonistic language targeted at individuals and social groups
-based on their personal characteristics (also known as cyber hate speech or
-cyberhate) has been frequently posted and widely circulated via the World Wide
-Web. This can be considered as a key risk factor for individual and societal
-tension linked to regional instability. Automated Web-based cyberhate detection
-is important for observing and understanding community and regional societal
-tension - especially in online social networks where posts can be rapidly and
-widely viewed and disseminated. While previous work has involved using
-lexicons, bags-of-words or probabilistic language parsing approaches, they
-often suffer from a similar issue, namely that cyberhate can be subtle and
-indirect - thus depending on the occurrence of individual words or phrases can
-lead to a significant number of false negatives, providing an inaccurate
-representation of the trends in cyberhate. This problem motivated us to
-challenge thinking around the representation of subtle language use, such as
-references to perceived threats from ""the other"" including immigration or job
-prosperity in a hateful context. We propose a novel framework that utilises
-language use around the concept of ""othering"" and intergroup threat theory to
-identify these subtleties, and we implement a novel classification method using
-embedding learning to compute semantic distances between parts of speech
-considered to be part of an ""othering"" narrative.
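-A minimal sketch of the distance computation (the term pairs and the random
-stand-in vectors below are assumptions for illustration; a real system would
-load pretrained embeddings) is:
-
-import numpy as np
-
-rng = np.random.default_rng(2)
-# stand-in embeddings for illustrative pronoun and threat terms
-emb = {w: rng.normal(size=32)
-       for w in ['we', 'they', 'them', 'jobs', 'country']}
-
-def cosine(a, b):
-    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
-
-# semantic distance between pronouns and perceived-threat terms
-for a, b in [('we', 'jobs'), ('they', 'jobs'), ('them', 'country')]:
-    print(a, b, round(1.0 - cosine(emb[a], emb[b]), 3))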
To validate our approach we conduct several experiments on
-different types of cyberhate, namely religion, disability, race and sexual
-orientation, with F-measure scores for classifying hateful instances of 0.93,
-0.86, 0.97 and 0.98 respectively, providing a significant improvement in
-classifier accuracy over the state-of-the-art.
-"
-6563,1801.07507,"Yosi Mass, Lili Kotlerman, Shachar Mirkin, Elad Venezian, Gera
- Witzling, Noam Slonim","What did you Mention? A Large Scale Mention Detection Benchmark for
- Spoken and Written Text",cs.CL," We describe a large, high-quality benchmark for the evaluation of Mention
-Detection tools. The benchmark contains annotations of both named entities and
-other types of entities, annotated on different types of text, ranging from
-clean text taken from Wikipedia to noisy spoken data. The benchmark was built
-through a highly controlled crowdsourcing process to ensure its quality. We
-describe the benchmark, the process and the guidelines that were used to build
-it. We then demonstrate the results of a state-of-the-art system running on
-that benchmark.
-"
-6564,1801.07537,"Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech
- Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang",Analyzing Language Learned by an Active Question Answering Agent,cs.CL cs.AI," We analyze the language learned by an agent trained with reinforcement
-learning as a component of the ActiveQA system [Buck et al., 2017]. In
-ActiveQA, question answering is framed as a reinforcement learning task in
-which an agent sits between the user and a black-box question-answering system.
-The agent learns to reformulate the user's questions to elicit the optimal
-answers. It probes the system with many versions of a question that are
-generated via a sequence-to-sequence question reformulation model, then
-aggregates the returned evidence to find the best answer. This process is an
-instance of machine-machine communication. The question reformulation model
-must adapt its language to increase the quality of the answers returned,
-matching the language of the question answering system. We find that the agent
-does not learn transformations that align with semantic intuitions, but instead
-discovers, through learning, classical information retrieval techniques such as
-tf-idf re-weighting and stemming.
-"
-6565,1801.07704,"Tal Baumel, Matan Eyal, Michael Elhadad","Query Focused Abstractive Summarization: Incorporating Query Relevance,
- Multi-Document Coverage, and Summary Length Constraints into seq2seq Models",cs.CL," Query Focused Summarization (QFS) has been addressed mostly using extractive
-methods. Such methods, however, produce text which suffers from low coherence.
-We investigate how abstractive methods can be applied to QFS, to overcome such
-limitations. Recent developments in neural-attention based sequence-to-sequence
-models have led to state-of-the-art results on the task of abstractive generic
-single-document summarization. Such models are trained in an end-to-end fashion
-on large amounts of training data.
We address three aspects to make abstractive
-summarization applicable to QFS: (a) since there is no training data, we
-incorporate query relevance into a pre-trained abstractive model; (b) since
-existing abstractive models are trained in a single-document setting, we design
-an iterated method to embed abstractive models within the multi-document
-requirement of QFS; (c) the abstractive models we adapt are trained to generate
-text of a specific length (about 100 words), while we aim at generating output
-of a different size (about 250 words); we design a way to adapt the target size
-of the generated summaries to a given size ratio. We compare our method
-(Relevance Sensitive Attention for QFS) to extractive baselines and to various
-ways of combining abstractive models on the DUC QFS datasets, and demonstrate
-solid improvements in ROUGE performance.
-"
-6566,1801.07737,"Pedram Hosseini, Ali Ahmadian Ramaki, Hassan Maleki, Mansoureh Anvari,
- Seyed Abolghasem Mirroshandel",SentiPers: A Sentiment Analysis Corpus for Persian,cs.CL," Sentiment Analysis (SA) is a major field of study in natural language
-processing, computational linguistics and information retrieval. Interest in SA
-has been constantly growing in both academia and industry over recent years.
-Moreover, there is an increasing need for generating appropriate resources and
-datasets, in particular for low-resource languages such as Persian. These
-datasets play an important role in designing and developing appropriate opinion
-mining platforms using supervised, semi-supervised or unsupervised methods. In
-this paper, we outline the entire process of developing a manually annotated
-sentiment corpus, SentiPers, which covers formal and informal written
-contemporary Persian. To the best of our knowledge, SentiPers is a unique
-sentiment corpus with such rich annotation at three different levels, including
-document level, sentence level, and entity/aspect level, for Persian. The
-corpus contains more than 26,000 sentences of users' opinions from the digital
-product domain and benefits from special characteristics such as quantifying
-the positivity or negativity of an opinion by assigning a number within a
-specific range to any given sentence. Furthermore, we present statistics on
-various components of our corpus as well as a study of the inter-annotator
-agreement. Finally, we discuss some of the challenges we faced during the
-annotation process.
-"
-6567,1801.07746,"Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li,
- Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan
- Xu","HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments",cs.CL," The science of happiness is an area of positive psychology concerned with
-understanding what behaviors make people happy in a sustainable fashion.
-Recently, there has been interest in developing technologies that help
-incorporate the findings of the science of happiness into users' daily lives by
-steering them towards behaviors that increase happiness. With the goal of
-building technology that can understand how people express their happy moments
-in text, we crowd-sourced HappyDB, a corpus of 100,000 happy moments that we
-make publicly available. This paper describes HappyDB and its properties, and
-outlines several important NLP problems that can be studied with the help of
-the corpus. We also apply several state-of-the-art analysis techniques to
-analyze HappyDB.
Our results demonstrate the need for deeper NLP techniques to
-be developed, which makes HappyDB an exciting resource for follow-on research.
-"
-6568,1801.07772,"Yonatan Belinkov, Llu\'is M\`arquez, Hassan Sajjad, Nadir Durrani,
- Fahim Dalvi, James Glass","Evaluating Layers of Representation in Neural Machine Translation on
- Part-of-Speech and Semantic Tagging Tasks",cs.CL," While neural machine translation (NMT) models provide improved translation
-quality in an elegant, end-to-end framework, it is less clear what they learn
-about language. Recent work has started evaluating the quality of vector
-representations learned by NMT models on morphological and syntactic tasks. In
-this paper, we investigate the representations learned at different layers of
-NMT encoders. We train NMT systems on parallel data and use the trained models
-to extract features for training a classifier on two tasks: part-of-speech and
-semantic tagging. We then measure the performance of the classifier as a proxy
-for the quality of the original NMT model for the given task. Our quantitative
-analysis yields interesting insights regarding representation learning in NMT
-models. For instance, we find that higher layers are better at learning
-semantics while lower layers tend to be better for part-of-speech tagging. We
-also observe little effect of the target language on source-side
-representations, especially with higher quality NMT models.
-"
-6569,1801.07779,Martin Thoma,The WiLI benchmark dataset for written language identification,cs.CV cs.CL," This paper describes the WiLI-2018 benchmark dataset for monolingual written
-natural language identification. WiLI-2018 is a publicly available, free of
-charge dataset of short text extracts from Wikipedia. It contains 1000
-paragraphs of 235 languages, totaling 23,500 paragraphs. WiLI is a
-classification dataset: given an unknown paragraph written in one dominant
-language, it has to be decided which language it is.
-"
-6570,1801.07804,"Diem Truong, Duc-Thuan Vo, and U.T Nguyen",Vietnamese Open Information Extraction,cs.CL," Open information extraction (OIE) is the process of extracting relations and
-their arguments automatically from textual documents without the need to
-restrict the search to predefined relations. In recent years, several OIE
-systems for the English language have been created, but no such system exists
-for the Vietnamese language. In this paper, we propose a method of OIE for
-Vietnamese using a clause-based approach. Accordingly, we exploit Vietnamese
-dependency parsing using grammar clauses, striving to consider all possible
-relations in a sentence. The corresponding clause types are identified by their
-propositions as extractable relations based on the grammatical functions of
-their constituents. As a result, our system, named vnOIE, is the first OIE
-system for the Vietnamese language; it can generate open relations and their
-arguments from Vietnamese text with highly scalable extraction while remaining
-domain independent. Experimental results show that our OIE system achieves
-promising results with a precision of 83.71%.
-"
-6571,1801.07861,"Zhen Wu, Xin-Yu Dai, Cunyan Yin, Shujian Huang, Jiajun Chen","Improving Review Representations with User Attention and Product
- Attention for Sentiment Classification",cs.CL cs.IR cs.LG," Neural network methods have achieved great success in review sentiment
-classification.
Recently, some works achieved improvements by incorporating user
-and product information to generate a review representation. However, in
-reviews, we observe that some words or sentences show a strong user preference,
-while others tend to indicate a product's characteristics. The two kinds of
-information play different roles in determining the sentiment label of a
-review. Therefore, it is not reasonable to encode user and product information
-together into one representation. In this paper, we propose a novel framework
-to encode user and product information. Firstly, we apply two individual
-hierarchical neural networks to generate two representations, one with user
-attention and one with product attention. Then, we design a combination
-strategy to make full use of the two representations for training and final
-prediction. The experimental results show that our model clearly outperforms
-other state-of-the-art methods on the IMDB and Yelp datasets. Through the
-visualization of attention over words related to the user or the product, we
-validate the observation mentioned above.
-"
-6572,1801.07875,Michael Bloodgood,"Support Vector Machine Active Learning Algorithms with
- Query-by-Committee versus Closest-to-Hyperplane Selection",cs.LG cs.CL cs.IR stat.ML," This paper investigates and evaluates support vector machine active learning
-algorithms for use with imbalanced datasets, which commonly arise in many
-applications such as information extraction. Algorithms based on
-closest-to-hyperplane selection and query-by-committee selection are combined
-with methods for addressing imbalance such as positive amplification based on
-prevalence statistics from initial random samples. Three algorithms (ClosestPA,
-QBagPA, and QBoostPA) are presented and carefully evaluated on datasets for
-text classification and relation extraction. The ClosestPA algorithm is shown
-to consistently outperform the other two in a variety of ways, and insights are
-provided as to why this is the case.
-"
-6573,1801.07883,"Lei Zhang, Shuai Wang, Bing Liu",Deep Learning for Sentiment Analysis: A Survey,cs.CL cs.IR cs.LG stat.ML," Deep learning has emerged as a powerful machine learning technique that
-learns multiple layers of representations or features of the data and produces
-state-of-the-art prediction results. Along with the success of deep learning in
-many other application domains, deep learning has also been popularly used in
-sentiment analysis in recent years. This paper first gives an overview of deep
-learning and then provides a comprehensive survey of its current applications
-in sentiment analysis.
-"
-6574,1801.07887,"Garrett Beatty, Ethan Kochis and Michael Bloodgood",Impact of Batch Size on Stopping Active Learning for Text Classification,cs.LG cs.CL cs.IR stat.ML," When using active learning, smaller batch sizes are typically more efficient
-from a learning efficiency perspective. However, in practice, due to speed and
-human annotator considerations, the use of larger batch sizes is necessary.
-While past work has shown that larger batch sizes decrease learning efficiency
-from a learning curve perspective, it remains an open question how batch size
-impacts methods for stopping active learning. We find that large batch sizes
-degrade the performance of a leading stopping method over and above the
-degradation that results from reduced learning efficiency.
We analyze this
-degradation and find that it can be mitigated by changing the window-size
-parameter, i.e., how many past iterations of learning are taken into account
-when making the stopping decision. We find that when using larger batch sizes,
-stopping methods are more effective when smaller window sizes are used.
-"
-6575,1801.07948,Hayafumi Watanabe,"Empirical observations of ultraslow diffusion driven by the fractional
- dynamics in languages: Dynamical statistical properties of word counts of
- already popular words",physics.soc-ph cs.CL cs.CY," Ultraslow diffusion (i.e. logarithmic diffusion) has been extensively studied
-theoretically, but has hardly been observed empirically. In this paper,
-firstly, we find ultraslow-like diffusion in the time series of word counts of
-already popular words by analysing three different nationwide language
-databases: (i) newspaper articles (Japanese), (ii) blog articles (Japanese),
-and (iii) page views of Wikipedia (English, French, Chinese, and Japanese).
-Secondly, we use theoretical analysis to show that this diffusion is basically
-explained by a random walk model with power-law forgetting with the exponent
-$\beta \approx 0.5$, which is related to the fractional Langevin equation. The
-exponent $\beta$ characterises the speed of forgetting, and $\beta \approx 0.5$
-corresponds to (i) the border (or threshold) between the stationary and the
-nonstationary and (ii) dynamics right in the middle between the IID noise for
-$\beta=1$ and the normal random walk for $\beta=0$. Thirdly, the generative
-model of the time series of word counts of already popular words, which is a
-kind of Poisson process with the Poisson parameter sampled by the
-above-mentioned random walk model, can almost reproduce not only the empirical
-mean-squared displacement but also the power spectral density and the
-probability density function.
-"
-6576,1801.08186,"Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal,
- Tamara L. Berg","MAttNet: Modular Attention Network for Referring Expression
- Comprehension",cs.CV cs.AI cs.CL," In this paper, we address referring expression comprehension: localizing an
-image region described by a natural language expression. While most recent work
-treats expressions as a single unit, we propose to decompose them into three
-modular components related to subject appearance, location, and relationship to
-other objects. This allows us to flexibly adapt to expressions containing
-different types of information in an end-to-end framework. In our model, which
-we call the Modular Attention Network (MAttNet), two types of attention are
-utilized: language-based attention that learns the module weights as well as
-the word/phrase attention that each module should focus on; and visual
-attention that allows the subject and relationship modules to focus on relevant
-image components. Module weights combine scores from all three modules
-dynamically to output an overall score. Experiments show that MAttNet
-outperforms previous state-of-the-art methods by a large margin on both
-bounding-box-level and pixel-level comprehension tasks. Demo and code are
-provided.
-"
-6577,1801.08290,Souvik Kundu and Hwee Tou Ng,A Question-Focused Multi-Factor Attention Network for Question Answering,cs.CL," Neural network models recently proposed for question answering (QA) primarily
-focus on capturing the passage-question relation.
However, they have minimal
-capability to link relevant facts distributed across multiple sentences, which
-is crucial for achieving deeper understanding, such as performing
-multi-sentence reasoning and co-reference resolution. They also do not
-explicitly focus on the question and answer type, which often plays a critical
-role in QA. In this paper, we propose a novel end-to-end question-focused
-multi-factor attention network for answer extraction. Multi-factor attentive
-encoding using tensor-based transformation aggregates meaningful facts even
-when they are located in multiple sentences. To implicitly infer the answer
-type, we also propose a max-attentional question aggregation mechanism to
-encode a question vector based on the important words in a question. During
-prediction, we incorporate sequence-level encoding of the first wh-word and its
-immediately following word as an additional source of question type
-information. Our proposed model achieves significant improvements over the best
-prior state-of-the-art results on three large-scale challenging QA datasets,
-namely NewsQA, TriviaQA, and SearchQA.
-"
-6578,1801.08337,Nadir Durrani and Fahim Dalvi,Continuous Space Reordering Models for Phrase-based MT,cs.CL," Bilingual sequence models improve phrase-based translation and reordering by
-overcoming the phrasal independence assumption and handling long-range
-reordering. However, due to data sparsity, these models often fall back to very
-small context sizes. This problem has been previously addressed by learning
-sequences over generalized representations such as POS tags or word clusters.
-In this paper, we explore an alternative based on neural network models. More
-concretely, we train neuralized versions of the lexicalized reordering and
-operation sequence models using feed-forward neural networks. Our results show
-improvements of up to 0.6 and 0.5 BLEU points on top of the baseline
-German->English and English->German systems. We also observed improvements
-compared to the systems that used POS tags and word clusters to train these
-models. Because we modify the bilingual corpus to integrate reordering
-operations, this allows us to also train a sequence-to-sequence neural MT model
-with explicit reordering triggers. Our motivation was to directly enable
-reordering information in the encoder-decoder framework, which otherwise relies
-solely on the attention model to handle long-range reordering. We tried both
-coarse- and fine-grained reordering operations. However, these experiments did
-not yield any improvements over the baseline neural MT systems.
-"
-6579,1801.08660,"Angli Liu, Katrin Kirchhoff",Context Models for OOV Word Translation in Low-Resource Languages,cs.CL stat.ML," Out-of-vocabulary word translation is a major problem for the translation of
-low-resource languages that suffer from a lack of parallel training data. This
-paper evaluates the contributions of target-language context models towards the
-translation of OOV words, specifically in those cases where OOV translations
-are derived from external knowledge sources, such as dictionaries. We develop
-both neural and non-neural context models and evaluate them within both
-phrase-based and self-attention based neural machine translation systems. Our
-results show that neural language models that integrate additional context
-beyond the current sentence are the most effective in disambiguating possible
-OOV word translations.
We present an efficient second-pass lattice-rescoring
-method for wide-context neural language models and demonstrate performance
-improvements over state-of-the-art self-attention based neural MT systems in
-five out of six low-resource language pairs.
-"
-6580,1801.08831,Shamil Chollampatt and Hwee Tou Ng,"A Multilayer Convolutional Encoder-Decoder Neural Network for
- Grammatical Error Correction",cs.CL," We improve automatic correction of grammatical, orthographic, and collocation
-errors in text using a multilayer convolutional encoder-decoder neural network.
-The network is initialized with embeddings that make use of character N-gram
-information to better suit this task. When evaluated on common benchmark test
-data sets (CoNLL-2014 and JFLEG), our model substantially outperforms all prior
-neural approaches on this task as well as strong statistical machine
-translation-based systems with neural and task-specific features trained on the
-same data. Our analysis shows the superiority of convolutional neural networks
-over recurrent neural networks such as long short-term memory (LSTM) networks
-in capturing the local context via attention, and thereby improving the
-coverage in correcting grammatical errors. By ensembling multiple models, and
-incorporating an N-gram language model and edit features via rescoring, our
-novel method becomes the first neural approach to outperform the current
-state-of-the-art statistical machine translation-based approach, both in terms
-of grammaticality and fluency.
-"
-6581,1801.08991,Maxime Peyrard,A Simple Theoretical Model of Importance for Summarization,cs.CL," Research on summarization has mainly been driven by empirical approaches,
-crafting systems to perform well on standard datasets with the notion of
-information Importance remaining latent. We argue that establishing theoretical
-models of Importance will advance our understanding of the task and help to
-further improve summarization systems. To this end, we propose simple but
-rigorous definitions of several concepts that were previously used only
-intuitively in summarization: Redundancy, Relevance, and Informativeness.
-Importance arises as a single quantity naturally unifying these concepts.
-Additionally, we provide intuitions to interpret the proposed quantities and
-experiments to demonstrate the potential of the framework to inform and guide
-subsequent works.
-"
-6582,1801.09030,"Wei Li, Zheng Yang, Xu Sun","Exploration on Generating Traditional Chinese Medicine Prescription from
- Symptoms with an End-to-End method",cs.CL," Traditional Chinese Medicine (TCM) is an influential form of medical
-treatment in China and surrounding areas. In this paper, we propose a TCM
-prescription generation task that aims to automatically generate a herbal
-medicine prescription based on textual symptom descriptions.
-Sequence-to-sequence (seq2seq) models have been successful in dealing with
-sequence generation tasks. We explore a potential end-to-end solution to the
-TCM prescription generation task using seq2seq models. However, experiments
-show that directly applying a seq2seq model leads to unfruitful results due to
-the repetition problem. To solve this problem, we propose a novel decoder with
-a coverage mechanism and a novel soft loss function. The experimental results
-demonstrate the effectiveness of the proposed approach. Judged by professors
-who excel in TCM, the generated prescriptions are rated 7.3 out of 10.
It shows
-that the model can indeed help with the prescribing procedure in real life.
-"
-6583,1801.09031,"Wei Li, Yunfang Wu, Xueqiang Lv",Improving Word Vector with Prior Knowledge in Semantic Dictionary,cs.CL," Using a low-dimensional vector space to represent words has been very
-effective in many NLP tasks. However, it does not work well when faced with the
-problem of rare and unseen words. In this paper, we propose to leverage the
-knowledge in a semantic dictionary in combination with some morphological
-information to build an enhanced vector space. We get an improvement of 2.3%
-over the state-of-the-art HeidelTime system in temporal expression recognition,
-and obtain large gains in other named entity recognition (NER) tasks. The
-semantic dictionary HowNet alone also shows promising results in computing
-lexical similarity.
-"
-6584,1801.09036,Wlodek Zadrozny and Luciana Garbayo,"A Sheaf Model of Contradictions and Disagreements. Preliminary Report
- and Discussion",cs.CL," We introduce a new formal model -- based on the mathematical construct of
-sheaves -- for representing contradictory information in textual sources. This
-model has the advantage of letting us (a) identify the causes of the
-inconsistency; (b) measure how strong it is; and (c) do something about it,
-e.g. suggest ways to reconcile inconsistent advice. This model naturally
-represents the distinction between contradictions and disagreements. It is
-based on the idea of representing natural language sentences as formulas with
-parameters sitting on lattices, creating partial orders based on predicates
-shared by theories, and building sheaves on these partial orders with products
-of lattices as stalks. Degrees of disagreement are measured by the existence of
-global and local sections.
- Limitations of the sheaf approach and connections to recent work in natural
-language processing, as well as the topics of contextuality in physics, data
-fusion, topological data analysis and epistemology, are also discussed.
-"
-6585,1801.09053,"Vinh D. Van, Thien Thai, Minh-Quoc Nghiem","Combining Convolution and Recursive Neural Networks for Sentiment
- Analysis",cs.CL," This paper addresses the problem of sentence-level sentiment analysis. In
-recent years, Convolution and Recursive Neural Networks have been proven to be
-effective network architectures for sentence-level sentiment analysis.
-Nevertheless, each of them has its own potential drawbacks. To alleviate their
-weaknesses, we combined Convolution and Recursive Neural Networks into a new
-network architecture. In addition, we employed transfer learning from a large
-document-level labeled sentiment dataset to improve the word embeddings in our
-models. The resulting models outperform all recent Convolution and Recursive
-Neural Networks. Beyond that, our models achieve comparable performance with
-state-of-the-art systems on the Stanford Sentiment Treebank.
-"
-6586,1801.09079,A. B. Veretennikov,"Using Additional Indexes for Fast Full-Text Search of Phrases That
- Contain Frequently Used Words",cs.IR cs.CL," Searches for phrases and word sets in large text arrays by means of
-additional indexes are considered. Their use may reduce the query-processing
-time by an order of magnitude in comparison with standard inverted files.
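-A minimal sketch of one such additional index (a toy two-document corpus; this
-illustrates the general idea rather than the exact structures studied here)
-keeps postings for adjacent word pairs alongside the ordinary inverted file,
-so a phrase query touches far shorter posting lists:
-
-from collections import defaultdict
-
-docs = {1: 'to be or not to be', 2: 'the question to ask'}
-inverted = defaultdict(set)    # word -> doc ids
-pair_index = defaultdict(set)  # (w1, w2) -> doc ids with the adjacent pair
-
-for doc_id, text in docs.items():
-    words = text.split()
-    for w in words:
-        inverted[w].add(doc_id)
-    for w1, w2 in zip(words, words[1:]):
-        pair_index[(w1, w2)].add(doc_id)
-
-def phrase_candidates(phrase):
-    words = phrase.split()
-    result = None
-    for pair in zip(words, words[1:]):
-        postings = pair_index.get(pair, set())
-        result = postings if result is None else result & postings
-    return result or set()
-
-print(phrase_candidates('to be'))  # {1}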
-"
-6587,1801.09251,"Yi Tay, Luu Anh Tuan, Siu Cheung Hui",Multi-Pointer Co-Attention Networks for Recommendation,cs.CL cs.AI cs.IR," Many recent state-of-the-art recommender systems such as D-ATT, TransNet and
-DeepCoNN exploit reviews for representation learning. This paper proposes a new
-neural architecture for recommendation with reviews. Our model operates on a
-multi-hierarchical paradigm and is based on the intuition that not all reviews
-are created equal, i.e., only a select few are important. The importance,
-however, should be dynamically inferred depending on the current target. To
-this end, we propose a review-by-review pointer-based learning scheme that
-extracts important reviews, subsequently matching them in a word-by-word
-fashion. This enables not only the most informative reviews to be utilized for
-prediction but also a deeper word-level interaction. Our pointer-based method
-operates with a novel Gumbel-Softmax based pointer mechanism that enables the
-incorporation of discrete vectors within differentiable neural architectures.
-Our pointer mechanism is co-attentive in nature, learning pointers which are
-co-dependent on user-item relationships. Finally, we propose a multi-pointer
-learning scheme that learns to combine multiple views of interactions between
-user and item. Overall, we demonstrate the effectiveness of our proposed model
-via extensive experiments on 24 benchmark datasets from Amazon and Yelp.
-Empirical results show that our approach significantly outperforms the existing
-state-of-the-art, with up to 19% and 71% relative improvement when compared to
-TransNet and DeepCoNN respectively. We study the behavior of our multi-pointer
-learning mechanism, shedding light on evidence aggregation patterns in
-review-based recommender systems.
-"
-6588,1801.09536,Amir Bakarov,A Survey of Word Embeddings Evaluation Methods,cs.CL," Word embeddings are real-valued word representations able to capture lexical
-semantics and trained on natural language corpora. Models proposing these
-representations have gained popularity in recent years, but the issue of the
-most adequate evaluation method still remains open. This paper presents an
-extensive overview of the field of word embeddings evaluation, highlighting the
-main problems and proposing a typology of approaches to evaluation, summarizing
-16 intrinsic methods and 12 extrinsic methods. I describe both widely used and
-experimental methods, systematize information about evaluation datasets and
-discuss some key challenges.
-"
-6589,1801.09633,"Leon Derczynski, Kenny Meesters, Kalina Bontcheva, Diana Maynard","Helping Crisis Responders Find the Informative Needle in the Tweet
- Haystack",cs.CL," Crisis responders are increasingly using social media, data and other digital
-sources of information to build a situational understanding of a crisis in
-order to design an effective response. However, with the increased availability
-of such data, the challenge of identifying relevant information from it also
-increases. This paper presents a successful automatic approach to handling this
-problem. Messages are filtered for informativeness based on a definition of the
-concept drawn from prior research and crisis response experts. Informative
-messages are tagged for actionable data -- for example, people in need, threats
-to rescue efforts, changes in environment, and so on. In all, eight categories
-of actionability are identified.
The two components --
-informativeness and actionability classification -- are packaged together as an
-openly available tool called Emina (Emergent Informativeness and
-Actionability).
-"
-6590,1801.09637,"Henri Kauhanen, Deepthi Gopal, Tobias Galla, Ricardo Berm\'udez-Otero","Geospatial distributions reflect rates of evolution of features of
- language",physics.soc-ph cond-mat.stat-mech cs.CL nlin.AO," Different structural features of human language change at different rates and
-thus exhibit different temporal stabilities. Existing methods of linguistic
-stability estimation depend upon the prior genealogical classification of the
-world's languages into language families; these methods result in unreliable
-stability estimates for features which are sensitive to horizontal transfer
-between families and whenever data are aggregated from families of divergent
-time depths. To overcome these problems, we describe a method of stability
-estimation without family classifications, based on mathematical modelling and
-the analysis of contemporary geospatial distributions of linguistic features.
-Regressing the estimates produced by our model against those of a genealogical
-method, we report broad agreement but also important differences. In
-particular, we show that our approach is not liable to some of the false
-positives and false negatives incurred by the genealogical method. Our results
-suggest that the historical evolution of a linguistic feature leaves a
-footprint in its global geospatial distribution, and that rates of evolution
-can be recovered from these distributions by treating language dynamics as a
-spatially extended stochastic process.
-"
-6591,1801.09746,Sushant Kafle and Matt Huenerfauth,A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts,cs.CL," Motivated by a project to create a system for people who are deaf or
-hard-of-hearing that would use automatic speech recognition (ASR) to produce
-real-time text captions of spoken English during in-person meetings with
-hearing individuals, we have augmented a transcript of the Switchboard
-conversational dialogue corpus with an overlay of word-importance annotations,
-with a numeric score for each word, to indicate its importance to the meaning
-of each dialogue turn. Further, we demonstrate the utility of this corpus by
-training an automatic word importance labeling model; our best performing model
-has an F-score of 0.60 in an ordinal 6-class word-importance classification
-task with an agreement (concordance correlation coefficient) of 0.839 with the
-human annotators (the agreement score between annotators is 0.89). Finally, we
-discuss our intended future applications of this resource, particularly for the
-task of evaluating ASR performance, i.e. creating metrics that predict
-ASR-output caption text usability for DHH users better than Word Error Rate
-(WER).
-"
-6592,1801.09788,"Natalia Ruemmele, Yuriy Tyshetskiy, Alex Collins",Evaluating approaches for supervised semantic labeling,cs.LG cs.AI cs.CL," Relational data sources are still one of the most popular ways to store
-enterprise or Web data; however, the issue with relational schemas is the lack
-of a well-defined semantic description. A common ontology provides a way to
-represent the meaning of a relational schema and can facilitate the integration
-of heterogeneous data sources within a domain. Semantic labeling is achieved by
-mapping attributes from the data sources to the classes and properties in the
-ontology.
We formulate this problem as a multi-class classification problem
-where previously labeled data sources are used to learn rules for labeling new
-data sources. The majority of existing approaches for semantic labeling have
-focused on data integration challenges such as naming conflicts and semantic
-heterogeneity. In addition, machine learning approaches typically have issues
-around class imbalance, lack of labeled instances and the relative importance
-of attributes. To address these issues, we develop a new machine learning model
-with engineered features as well as two deep learning models which do not
-require extensive feature engineering. We evaluate our new approaches against
-the state-of-the-art.
-"
-6593,1801.09851,"Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo
- Shang, Curtis Langlotz and Jiawei Han","Cross-type Biomedical Named Entity Recognition with Deep Multi-Task
- Learning",cs.IR cs.CL stat.ML," Motivation: State-of-the-art biomedical named entity recognition (BioNER)
-systems often require handcrafted features specific to each entity type, such
-as genes, chemicals and diseases. Although recent studies explored using neural
-network models for BioNER to free experts from manual feature engineering, the
-performance remains limited by the available training data for each entity
-type. Results: We propose a multi-task learning framework for BioNER to
-collectively use the training data of different types of entities and improve
-the performance on each of them. In experiments on 15 benchmark BioNER
-datasets, our multi-task model achieves substantially better performance
-compared with state-of-the-art BioNER systems and baseline neural sequence
-labeling models. Further analysis shows that the large performance gains come
-from sharing character- and word-level information among relevant biomedical
-entities across differently labeled corpora.
-"
-6594,1801.09866,"Kyungmin Lee, Chiyoun Park, Namhoon Kim, and Jaewon Lee","Accelerating recurrent neural network language model based online speech
- recognition system",cs.CL cs.LG," This paper presents methods to accelerate recurrent neural network based
-language models (RNNLMs) for online speech recognition systems. Firstly, a
-lossy compression of the past hidden layer outputs (history vector) with
-caching is introduced in order to reduce the number of LM queries. Next, RNNLM
-computations are deployed in a CPU-GPU hybrid manner, which computes each layer
-of the model on a more advantageous platform. The overhead added by data
-exchanges between the CPU and GPU is compensated for through a frame-wise
-batching strategy. The performance of the proposed methods evaluated on
-LibriSpeech test sets indicates that the reduction in history vector precision
-improves the average recognition speed by 1.23 times with minimal degradation
-in accuracy. On the other hand, the CPU-GPU hybrid parallelization enables
-RNNLM-based real-time recognition with a fourfold improvement in speed.
-"
-6595,1801.09872,Xuri Tang,A State-of-the-Art of Semantic Change Computation,cs.CL," This paper reviews the state-of-the-art of semantic change computation, an
-emerging research field in computational linguistics, proposing a framework
-that summarizes the literature by identifying and expounding five essential
-components in the field: diachronic corpus, diachronic word sense
-characterization, change modelling, evaluation data and data visualization.
-Despite the potential of the field, the review shows that current studies are
-mainly focused on testing hypotheses proposed in theoretical linguistics and
-that several core issues remain to be solved: the need for diachronic corpora
-of languages other than English, the need for comprehensive evaluation data,
-the comparison and construction of approaches to diachronic word sense
-characterization and change modelling, and further exploration of data
-visualization techniques for hypothesis justification.
-"
-6596,1801.09893,"Hongzhi Zhang, Guandong Xu, Xiao Liang, Tinglei Huang and Kun fu","An Attention-Based Word-Level Interaction Model: Relation Detection for
- Knowledge Base Question Answering",cs.CL," Relation detection plays a crucial role in Knowledge Base Question Answering
-(KBQA) because of the high variance of relation expression in the question.
-Traditional deep learning methods follow an encoding-comparing paradigm, where
-the question and the candidate relation are represented as vectors to compare
-their semantic similarity. Max- or average-pooling operations, which compress
-the sequence of words into fixed-dimensional vectors, become the bottleneck of
-information. In this paper, we propose to learn attention-based word-level
-interactions between questions and relations to alleviate the bottleneck issue.
-Similar to the traditional models, the question and relation are firstly
-represented as sequences of vectors. Then, instead of merging the sequence into
-a single vector with a pooling operation, soft alignments between words from
-the question and the relation are learned. The aligned words are subsequently
-compared with the convolutional neural network (CNN) and the comparison results
-are merged finally. By performing the comparison on low-level representations,
-the attention-based word-level interaction model (ABWIM) relieves the
-information loss issue caused by merging the sequence into a fixed-dimensional
-vector before the comparison. The experimental results of relation detection on
-both the SimpleQuestions and WebQuestions datasets show that ABWIM achieves
-state-of-the-art accuracy, demonstrating its effectiveness.
-"
-6597,1801.09896,"Barbara McGillivray, Federico Sangati","Pilot study for the COST Action ""Reassembling the Republic of Letters"":
- language-driven network analysis of letters from the Hartlib's Papers",cs.CL," The present report summarizes an exploratory study which we carried out in
-the context of the COST Action IS1310 ""Reassembling the Republic of Letters,
-1500-1800"", and which is relevant to the activities of Working Group 3 ""Texts
-and Topics"" and Working Group 2 ""People and Networks"". In this study we
-investigated the use of Natural Language Processing (NLP) and Network Text
-Analysis on a small sample of seventeenth-century letters selected from the
-Hartlib Papers, whose records are in one of the catalogues of Early Modern
-Letters Online (EMLO) and whose online edition is available on the website of
-the Humanities Research Institute at the University of Sheffield
-(http://www.hrionline.ac.uk/hartlib/). We outline the NLP pipeline used to
-automatically process the texts into a network representation, in order to
-identify the texts' ""narrative centrality"", i.e. the most central entities in
-the texts, and the relations between them.
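-A minimal sketch of this network construction (the entity lists below are
-invented stand-ins for the NER output of the pipeline, not data from the
-Hartlib Papers) builds a co-occurrence graph and reads narrative centrality
-off a standard centrality score:
-
-import itertools
-import networkx as nx
-
-letters = [['Hartlib', 'Comenius', 'London'],
-           ['Hartlib', 'Dury', 'London'],
-           ['Dury', 'Comenius']]
-
-G = nx.Graph()
-for entities in letters:
-    for a, b in itertools.combinations(entities, 2):
-        G.add_edge(a, b)
-
-# most central entities across the sample
-for node, score in sorted(nx.degree_centrality(G).items(),
-                          key=lambda kv: -kv[1]):
-    print(node, round(score, 2))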
-"
-6598,1801.09936,"Mahsa Sadat Shahshahani, Mahdi Mohseni, Azadeh Shakery, Heshaam Faili",PEYMA: A Tagged Corpus for Persian Named Entities,cs.CL," The goal in the NER task is to classify proper nouns of a text into classes
-such as person, location, and organization. This is an important preprocessing
-step in many NLP tasks such as question answering and summarization. Although
-many research studies have been conducted in this area in English, and
-state-of-the-art NER systems have reached performance higher than 90 percent in
-terms of F1 measure, there are very few studies of this task for Persian. One
-of the main causes of this may be the lack of a standard Persian NER dataset to
-train and test NER systems. In this research we create a standard, sufficiently
-large tagged Persian NER dataset, which will be distributed freely for research
-purposes. In order to construct such a standard dataset, we studied standard
-NER datasets constructed for English and found that almost all of them are
-built from news texts. So we collected documents from ten news websites. Later,
-in order to provide annotators with some guidelines to tag these documents,
-after studying the guidelines used for constructing the CoNLL and MUC standard
-English datasets, we set our own guidelines, taking the linguistic rules of
-Persian into account.
-"
-6599,1801.09975,"Semiha Makinist, Ibrahim Riza Hallac, Betul Ay Karakus and Galip Aydin","Preparation of Improved Turkish DataSet for Sentiment Analysis in Social
- Media",cs.CL cs.IR cs.SI," A public dataset, with a variety of properties suitable for sentiment
-analysis [1], event prediction, trend detection and other text mining
-applications, is needed in order to be able to successfully perform analysis
-studies. The vast majority of data on social media is text-based, and it is not
-possible to apply machine learning directly to this raw data, since several
-different preprocessing steps are required before the algorithms can be
-applied. For example, different misspellings of the same word enlarge the word
-vector space unnecessarily, thereby reducing the success of the algorithm and
-increasing the computational power required. This paper presents an improved
-Turkish dataset with an effective spelling correction algorithm based on Hadoop
-[2]. The collected data is recorded on the Hadoop Distributed File System and
-the text-based data is processed with the MapReduce programming model. This
-method is suitable for the storage and processing of large-sized, text-based
-social media data. In this study, movie reviews have been automatically
-recorded with Apache ManifoldCF (MCF) [3] and data clusters have been created.
-Various methods, such as Levenshtein distance and fuzzy string matching, have
-been compared in order to create a public dataset from the collected data.
-Experimental results show that the proposed algorithm, whose output can be used
-as an open-source dataset in sentiment analysis studies, successfully detects
-and corrects spelling errors.
-"
-6600,1801.10063,Stefan Gerdjikov,"Characterisation of (Sub)sequential Rational Functions over a General
- Class of Monoids",cs.FL cs.CL cs.LO," In this technical report we describe a general class of monoids for which
-(sub)sequential rational functions can be characterised in terms of a
-congruence relation in the flavour of the Myhill-Nerode relation.
The class of monoids that we consider
-can be described in terms of natural algebraic axioms, contains the free
-monoids, groups, the tropical monoid, and is closed under Cartesian products.
-"
-6601,1801.10080,"Aayushee Gupta, Haimonti Dutta, Srikanta Bedathur, Lipika Dey",A Machine Learning Approach to Quantitative Prosopography,cs.DL cs.CL," Prosopography is an investigation of the common characteristics of a group of
-people in history, by a collective study of their lives. It involves a study of
-biographies to solve historical problems. If such biographies are unavailable,
-surviving documents and secondary biographical data are used. Quantitative
-prosopography involves analysis of information from a wide variety of sources
-about ""ordinary people"". In this paper, we present a machine learning framework
-for automatically designing a people gazetteer, which forms the basis of
-quantitative prosopographical research. The gazetteer is learnt from the noisy
-text of newspapers using a Named Entity Recognizer (NER). It is capable of
-identifying influential people in it by making use of a custom-designed
-Influential Person Index (IPI). Our corpus comprises 14,020 articles from a
-local newspaper, ""The Sun"", published from New York in 1896. Some influential
-people identified by our algorithm include Captain Donald Hankey (an English
-soldier), Dame Nellie Melba (an Australian operatic soprano), Hugh Allan (a
-Canadian shipping magnate) and Sir Hugh John McDonald (the first Prime Minister
-of Canada).
-"
-6602,1801.10095,"Alberto Garcia-Duran, Roberto Gonzalez, Daniel Onoro-Rubio, Mathias
- Niepert, Hui Li",TransRev: Modeling Reviews as Translations from Users to Items,cs.IR cs.CL," The text of a review expresses the sentiment a customer has towards a
-particular product. This is exploited in sentiment analysis where machine
-learning models are used to predict the review score from the text of the
-review. Furthermore, the products customers have purchased in the past are
-indicative of the products they will purchase in the future. This is what
-recommender systems exploit by learning models from purchase information to
-predict the items a customer might be interested in. We propose TransRev, an
-approach to the product recommendation problem that integrates ideas from
-recommender systems, sentiment analysis, and multi-relational learning into a
-joint learning objective. TransRev learns vector representations for users,
-items, and reviews. The embedding of a review is learned such that (a) it
-performs well as an input feature of a regression model for sentiment
-prediction; and (b) it always translates the reviewer embedding to the
-embedding of the reviewed item. This allows TransRev to approximate a review
-embedding at test time as the difference of the embedding of each item and the
-user embedding. The approximated review embedding is then used with the
-regression model to predict the review score for each item. TransRev
-outperforms state-of-the-art recommender systems on a large number of benchmark
-data sets. Moreover, it is able to retrieve, for each user and item, the review
-text from the training set whose embedding is most similar to the approximated
-review embedding.
-"
-6603,1801.10198,"Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi,
- Lukasz Kaiser, Noam Shazeer",Generating Wikipedia by Summarizing Long Sequences,cs.CL," We show that generating English Wikipedia articles can be approached as a
-multi-document summarization of source documents.
We use extractive
-summarization to coarsely identify salient information and a neural abstractive
-model to generate the article. For the abstractive model, we introduce a
-decoder-only architecture that can scalably attend to very long sequences, much
-longer than the typical encoder-decoder architectures used in sequence
-transduction. We show that this model can generate fluent, coherent
-multi-sentence paragraphs and even whole Wikipedia articles. When given
-reference documents, we show it can extract relevant factual information as
-reflected in perplexity, ROUGE scores and human evaluations.
-"
-6604,1801.10253,"Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas
- Mensink, Cees G. M. Snoek","The New Modality: Emoji Challenges in Prediction, Anticipation, and
- Retrieval",cs.CL cs.IR cs.MM," Over the past decade, emoji have emerged as a new and widespread form of
-digital communication, spanning diverse social networks and spoken languages.
-We propose to treat these ideograms as a new modality in their own right,
-distinct in their semantic structure from both the text in which they are often
-embedded as well as the images which they resemble. As a new modality, emoji
-present rich novel possibilities for representation and interaction. In this
-paper, we explore the challenges that arise naturally from considering the
-emoji modality through the lens of multimedia research, specifically the ways
-in which emoji can be related to other common modalities such as text and
-images. To do so, we first present a large-scale dataset of real-world emoji
-usage collected from Twitter. This dataset contains examples of both text-emoji
-and image-emoji relationships. We present baseline results on the challenge of
-predicting emoji from both text and images, using state-of-the-art neural
-networks. Further, we offer a first consideration of the problem of how to
-account for new, unseen emoji - a relevant issue as the emoji vocabulary
-continues to expand on a yearly basis. Finally, we present results for
-multimedia retrieval using emoji as queries.
-"
-6605,1801.10293,"Avneesh Saluja, Chris Dyer, Jean-David Ruvini",Paraphrase-Supervised Models of Compositionality,cs.CL," Compositional vector space models of meaning promise new solutions to
-stubborn language understanding problems. This paper makes two contributions
-toward this end: (i) it uses automatically-extracted paraphrase examples as a
-source of supervision for training compositional models, replacing previous
-work which relied on manual annotations used for the same purpose, and (ii) it
-develops a context-aware model for scoring phrasal compositionality.
-Experimental results indicate that these multiple sources of information can be
-used to learn partial semantic supervision that matches previous techniques in
-intrinsic evaluation tasks. Our approaches are also evaluated for their impact
-on a machine translation system, where we show improvements in translation
-quality, demonstrating that compositionality in interpretation correlates with
-compositionality in translation.
-"
-6606,1801.10296,"Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Sen Wang, Chengqi
- Zhang","Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention
- for Sequence Modeling",cs.CL," Many natural language processing tasks solely rely on sparse dependencies
-between a few tokens in a sentence.
Soft attention mechanisms show promising
-performance in modeling local/global dependencies by soft probabilities between
-every two tokens, but they are neither effective nor efficient when applied to
-long sentences. By contrast, hard attention mechanisms directly select a subset
-of tokens but are difficult and inefficient to train due to their combinatorial
-nature. In this paper, we integrate both soft and hard attention into one
-context fusion model, ""reinforced self-attention (ReSA)"", for their mutual
-benefit. In ReSA, a hard attention trims a sequence for a soft
-self-attention to process, while the soft attention feeds reward signals back
-to facilitate the training of the hard one. For this purpose, we develop a
-novel hard attention called ""reinforced sequence sampling (RSS)"", which selects
-tokens in parallel and is trained via policy gradient. Using two RSS modules,
-ReSA efficiently extracts the sparse dependencies between each pair of selected
-tokens. We finally propose an RNN/CNN-free sentence-encoding model, ""reinforced
-self-attention network (ReSAN)"", solely based on ReSA. It achieves
-state-of-the-art performance on both the Stanford Natural Language Inference
-(SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.
-"
-6607,1801.10308,Joel Ruben Antony Moniz and David Krueger,Nested LSTMs,cs.CL cs.LG," We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple
-levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to
-stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell,
-which has its own inner memory cell. Specifically, instead of computing the
-value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t
-\odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t
-\odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set
-$c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and
-single-layer LSTMs with similar numbers of parameters in our experiments on
-various character-level language modeling tasks, and the inner memories of an
-LSTM learn longer-term dependencies compared with the higher-level units of a
-stacked LSTM.
-"
-6608,1801.10314,"Amrita Saha, Vardaan Pahuja, Mitesh M. Khapra, Karthik
- Sankaranarayanan and Sarath Chandar","Complex Sequential Question Answering: Towards Learning to Converse Over
- Linked Question Answer Pairs with a Knowledge Graph",cs.CL," While conversing with chatbots, humans typically ask many questions, a
-significant portion of which can be answered by referring to large-scale
-knowledge graphs (KG). While Question Answering (QA) and dialog systems have
-been studied independently, there is a need to study them closely to evaluate
-such real-world scenarios faced by bots involving both these tasks. Towards
-this end, we introduce the task of Complex Sequential QA, which combines the
-two tasks of (i) answering factual questions through complex inferencing over a
-realistic-sized KG of millions of entities, and (ii) learning to converse
-through a series of coherently linked QA pairs. Through a labor-intensive
-semi-automatic process, involving in-house and crowdsourced workers, we created
-a dataset containing around 200K dialogs with a total of 1.6M turns. Further,
-unlike existing large-scale QA datasets which contain simple questions that can
-be answered from a single tuple, the questions in our dialogs require a larger
-subgraph of the KG.
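Editor's note: a minimal sketch of the Nested LSTM cell update from record 6607 above. The outer gates are computed as in a standard LSTM, but the pair (f*c, i*g) is fed to an inner LSTM whose hidden state becomes the outer cell. Shapes and initialization are illustrative assumptions.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_cell(x, h, c, W):
        # Standard LSTM cell; W maps [x; h] to the four gate pre-activations.
        i, f, o, g = np.split(W @ np.concatenate([x, h]), 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        return sigmoid(o) * np.tanh(c_new), c_new

    def nlstm_cell(x, h, state, W_outer, W_inner):
        # Nested LSTM: instead of c_t = f*c + i*g, feed (f*c, i*g) to an inner
        # LSTM and set the outer cell to the inner hidden state (c_out = h_in).
        h_in, c_in, c_out = state
        i, f, o, g = np.split(W_outer @ np.concatenate([x, h]), 4)
        inner_x = np.concatenate([sigmoid(f) * c_out, sigmoid(i) * np.tanh(g)])
        h_in, c_in = lstm_cell(inner_x, h_in, c_in, W_inner)
        c_out = h_in
        return sigmoid(o) * np.tanh(c_out), (h_in, c_in, c_out)

    d, n = 4, 3
    rng = np.random.default_rng(0)
    W_outer = rng.normal(size=(4 * d, n + d))  # outer gates read [x; h]
    W_inner = rng.normal(size=(4 * d, 3 * d))  # inner LSTM reads [2d input; h_in]
    h, state = np.zeros(d), (np.zeros(d), np.zeros(d), np.zeros(d))
    h, state = nlstm_cell(rng.normal(size=n), h, state, W_outer, W_inner)
    print(h.shape)  # (4,)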
Specifically, our dataset has questions which require
-logical, quantitative, and comparative reasoning as well as their combinations.
-This calls for models which can: (i) parse complex natural language questions,
-(ii) use conversation context to resolve coreferences and ellipsis in
-utterances, (iii) ask for clarifications for ambiguous queries, and finally
-(iv) retrieve relevant subgraphs of the KG to answer such questions. However,
-our experiments with a combination of state-of-the-art dialog and QA models
-show that they clearly do not achieve the above objectives and are inadequate
-for dealing with such complex real-world settings. We believe that this new
-dataset, coupled with the limitations of existing models as reported in this
-paper, should encourage further research in Complex Sequential QA.
-"
-6609,1802.00033,"Peter Sch\""uller","Technical Report: Adjudication of Coreference Annotations via Answer Set
- Optimization",cs.CL cs.AI," We describe the first automatic approach for merging coreference annotations
-obtained from multiple annotators into a single gold standard. This merging is
-subject to certain linguistic hard constraints and optimization criteria that
-prefer solutions with minimal divergence from annotators. The representation
-involves an equivalence relation over a large number of elements. We use Answer
-Set Programming to describe two representations of the problem and four
-objective functions suitable for different datasets. We provide two
-structurally different real-world benchmark datasets based on the METU-Sabanci
-Turkish Treebank and we report our experiences in using the Gringo, Clasp, and
-Wasp tools for computing optimal adjudication results on these datasets.
-"
-6610,1802.00209,Ahmed Osman and Wojciech Samek,Dual Recurrent Attention Units for Visual Question Answering,cs.AI cs.CL cs.CV cs.NE stat.ML," Visual Question Answering (VQA) requires AI models to comprehend data in two
-domains, vision and text. Current state-of-the-art models use learned attention
-mechanisms to extract relevant information from the input domains to answer a
-certain question. Thus, robust attention mechanisms are essential for powerful
-VQA models. In this paper, we propose a recurrent attention mechanism and show
-its benefits compared to the traditional convolutional approach. We perform two
-ablation studies to evaluate recurrent attention. First, we introduce a
-baseline VQA model with visual attention and test the performance difference
-between convolutional and recurrent attention on the VQA 2.0 dataset. Second,
-we design an architecture for VQA which utilizes dual (textual and visual)
-Recurrent Attention Units (RAUs). Using this model, we show the effect of all
-possible combinations of recurrent and convolutional dual attention. Our single
-model outperforms the first-place winner of the VQA 2016 challenge and, to the
-best of our knowledge, is the second-best-performing single model on the VQA
-1.0 dataset. Furthermore, our model noticeably improves upon the winner of the
-VQA 2017 challenge. Moreover, we experiment with replacing attention mechanisms
-in state-of-the-art models with our RAUs and show increased performance.
-"
-6611,1802.00231,"Binny Mathew, Suman Kalyan Maity, Pratip Sarkar, Animesh Mukherjee and
- Pawan Goyal","Adapting predominant and novel sense discovery algorithms for
- identifying corpus-specific sense differences",cs.CL," Word senses are not static and may have temporal, spatial or corpus-specific
-scopes.
Identifying such scopes could substantially benefit existing WSD systems.
-In this paper, while studying corpus-specific word senses, we adapt three
-existing predominant and novel-sense discovery algorithms to identify these
-corpus-specific senses. We make use of text data available in the form of
-millions of digitized books and newspaper archives as two different sources of
-corpora and propose automated methods to identify corpus-specific word senses
-at various time points. We conduct an extensive and thorough human judgment
-experiment to rigorously evaluate and compare the performance of these
-approaches. Post adaptation, the outputs of the three algorithms are in the
-same format and the accuracy results are also comparable, with roughly 45-60%
-of the reported corpus-specific senses being judged as genuine.
-"
-6612,1802.00254,"Yu Wang, Xie Chen, Mark Gales, Anton Ragni and Jeremy Wong",Phonetic and Graphemic Systems for Multi-Genre Broadcast Transcription,cs.SD cs.CL eess.AS," State-of-the-art English automatic speech recognition systems typically use
-phonetic rather than graphemic lexicons. Graphemic systems are known to perform
-less well for English, as the mapping from the written form to the spoken form
-is complicated. However, in recent years the representational power of
-deep-learning based acoustic models has improved, raising interest in graphemic
-acoustic models for English due to the simplicity of generating the lexicon.
-In this paper, phonetic and graphemic models are compared for an English
-Multi-Genre Broadcast transcription task. A range of acoustic models based on
-lattice-free MMI training are constructed using phonetic and graphemic
-lexicons. For this task, it is found that having a long-span temporal history
-reduces the difference in performance between the two forms of models. In
-addition, system combination is examined, using parameter smoothing and
-hypothesis combination. As the combination approaches become more complicated,
-the difference between the phonetic and graphemic systems further decreases.
-Finally, for all configurations examined, the combination of phonetic and
-graphemic systems yields consistent gains.
-"
-6613,1802.00273,"J\""org Tiedemann",Emerging Language Spaces Learned From Massively Multilingual Corpora,cs.CL," Translations capture important information about languages that can be used
-as implicit supervision in learning linguistic properties and semantic
-representations. In an information-centric view, translated texts may be
-considered semantic mirrors of the original text, and the significant
-variations that we can observe across various languages can be used to
-disambiguate a given expression using the linguistic signal that is grounded in
-translation. Parallel corpora consisting of massive amounts of human
-translations with a large linguistic variation can be applied to increase
-abstractions, and we propose the use of highly multilingual machine translation
-models to find language-independent meaning representations. Our initial
-experiments show that neural machine translation models can indeed learn in
-such a setup, and we can show that the learning algorithm picks up information
-about the relation between languages in order to optimize transfer learning
-with shared parameters. The model creates a continuous language space that
-represents relationships in terms of geometric distances, which we can
-visualize to illustrate how languages cluster according to language families
-and groups.
Does this open the door to new ideas for data-driven language
-typology, with promising models and techniques in empirical cross-linguistic
-research?
-"
-6614,1802.00382,Amitabha Karmakar,"Classifying medical notes into standard disease codes using Machine
- Learning",cs.LG cs.CL stat.AP stat.ML," We investigate the automatic classification of patient discharge notes into
-standard disease labels. We find that Convolutional Neural Networks with
-Attention outperform previous algorithms used in this task, and suggest further
-areas for improvement.
-"
-6615,1802.00385,"Antigoni-Maria Founta, Despoina Chatzakou, Nicolas Kourtellis, Jeremy
- Blackburn, Athena Vakali, Ilias Leontiadis",A Unified Deep Learning Architecture for Abuse Detection,cs.CL cs.SI," Hate speech, offensive language, sexism, racism and other types of abusive
-behavior have become a common phenomenon in many online social media platforms.
-In recent years, such diverse abusive behaviors have been manifesting with
-increased frequency and levels of intensity. This is due to the openness and
-willingness of popular media platforms, such as Twitter and Facebook, to host
-content on sensitive or controversial topics. However, these platforms have not
-adequately addressed the problem of online abusive behavior, and their
-responsiveness to the effective detection and blocking of such inappropriate
-behavior remains limited.
- In the present paper, we study this complex problem by following a more
-holistic approach, which considers the various aspects of abusive behavior. To
-make the approach tangible, we focus on Twitter data and analyze user and
-textual properties from different angles of abusive posting behavior. We
-propose a deep learning architecture which utilizes a wide variety of
-available metadata and combines it with automatically-extracted hidden
-patterns within the text of the tweets to detect multiple abusive behavioral
-norms which are highly inter-related. We apply this unified architecture in a
-seamless, transparent fashion to detect different types of abusive behavior
-(hate speech, sexism vs. racism, bullying, sarcasm, etc.) without the need for
-any tuning of the model architecture for each task. We test the proposed
-approach with multiple datasets addressing different and multiple abusive
-behaviors on Twitter. Our results demonstrate that it largely outperforms the
-state-of-the-art methods (between 21 and 45% improvement in AUC, depending on
-the dataset).
-"
-6616,1802.00396,Caleb Pomeroy and Niheer Dasandi and Slava J. Mikhaylov,"Disunited Nations? A Multiplex Network Approach to Detecting Preference
- Affinity Blocs using Texts and Votes",cs.CL cs.CY cs.SI physics.soc-ph," This paper contributes to an emerging literature that models votes and text
-in tandem to better understand polarization of expressed preferences. It
-introduces a new approach to estimate preference polarization in
-multidimensional settings, such as international relations, based on
-developments in the natural language processing and network science
-literatures, namely word embeddings, which retain valuable syntactical
-qualities of human language, and community detection in multilayer networks,
-which locates densely connected actors across multiple, complex networks. We
-find that the employment of these tools in tandem helps to better estimate
-states' foreign policy preferences expressed in UN votes and speeches beyond
-that permitted by votes alone.
The utility of these located affinity blocs is demonstrated through an
-application to conflict onset in International Relations, though these tools
-will be of interest to all scholars faced with the measurement of preferences
-and polarization in multidimensional settings.
-"
-6617,1802.00500,"Vladimir Ilievski, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl","Goal-Oriented Chatbot Dialog Management Bootstrapping with Transfer
- Learning",cs.CL," Goal-Oriented (GO) Dialogue Systems, colloquially known as goal-oriented
-chatbots, help users achieve a predefined goal (e.g. book a movie ticket)
-within a closed domain. A first step is to understand the user's goal by using
-natural language understanding techniques. Once the goal is known, the bot must
-manage a dialogue to achieve that goal, which is conducted with respect to a
-learnt policy. The success of the dialogue system depends on the quality of the
-policy, which is in turn reliant on the availability of high-quality training
-data for the policy learning method, for instance Deep Reinforcement Learning.
- Due to the domain specificity, the amount of available data is typically too
-low to allow the training of good dialogue policies. In this paper we introduce
-a transfer learning method to mitigate the effects of the low in-domain data
-availability. Our transfer-learning-based approach improves the bot's success
-rate by 20% in relative terms for distant domains, and we more than double it
-for close domains, compared to the model without transfer learning. Moreover,
-the transfer learning chatbots learn the policy up to 5 to 10 times faster.
-Finally, as the transfer learning approach is complementary to additional
-processing such as warm-starting, we show that their joint application gives
-the best outcomes.
-"
-6618,1802.00510,"Daniel Li, Asim Kadav",Adaptive Memory Networks,cs.AI cs.CL," We present Adaptive Memory Networks (AMN), which process input-question pairs
-to dynamically construct a network architecture optimized for lower inference
-times on Question Answering (QA) tasks. AMN processes the input story to
-extract entities and stores them in memory banks. Starting from a single bank,
-as the number of input entities increases, AMN learns to create new banks as
-the entropy in a single bank becomes too high. Hence, after processing an
-input-question(s) pair, the resulting network represents a hierarchical
-structure where entities are stored in different banks, distanced by question
-relevance. At inference, one or a few banks are used, creating a tradeoff
-between accuracy and performance. AMN is enabled by dynamic networks that allow
-input-dependent network creation and efficiency in dynamic mini-batching, as
-well as our novel bank controller that allows learning discrete decision making
-with high accuracy. In our results, we demonstrate that AMN learns to create
-variable-depth networks depending on task complexity and reduces inference
-times for QA tasks.
-"
-6619,1802.00757,"Mladen Dimovski, Claudiu Musat, Vladimir Ilievski, Andreea Hossmann,
- Michael Baeriswyl","Submodularity-Inspired Data Selection for Goal-Oriented Chatbot Training
- Based on Sentence Embeddings",cs.CL," Spoken language understanding (SLU) systems, such as goal-oriented chatbots
-or personal assistants, rely on an initial natural language understanding (NLU)
-module to determine the intent and to extract the relevant information from the
-user queries they take as input.
SLU systems usually help users to solve
-problems in relatively narrow domains and require a large amount of in-domain
-training data. This leads to significant data availability issues that inhibit
-the development of successful systems. To alleviate this problem, we propose a
-technique for data selection in the low-data regime that enables us to train
-with fewer labeled sentences, and thus at lower labelling cost.
- We propose a submodularity-inspired data ranking function, the ratio-penalty
-marginal gain, for selecting data points to label based only on the information
-extracted from the textual embedding space. We show that the distances in the
-embedding space are a viable source of information that can be used for data
-selection. Our method outperforms two known active learning techniques and
-enables cost-efficient training of the NLU unit. Moreover, our proposed
-selection technique does not need the model to be retrained in between the
-selection steps, making it time-efficient as well.
-"
-6620,1802.00768,"Philip A Huebner, Jon A Willits","Order matters: Distributional properties of speech to young children
- bootstraps learning of semantic representations",cs.CL," Some researchers claim that language acquisition is critically dependent on
-experiencing linguistic input in order of increasing complexity. We set out to
-test this hypothesis using a simple recurrent neural network (SRN) trained to
-predict word sequences in CHILDES, a 5-million-word corpus of speech directed
-to children. First, we demonstrated that age-ordered CHILDES exhibits a gradual
-increase in linguistic complexity. Next, we compared the performance of two
-groups of SRNs trained on CHILDES which had either been age-ordered or not.
-Specifically, we assessed learning of grammatical and semantic structure and
-showed that training on age-ordered input facilitates learning of semantic, but
-not of sequential structure. We found that this advantage is eliminated when
-the models were trained on input with utterance boundary information removed.
-"
-6621,1802.00840,"Andrei Amatuni, Estelle He, Elika Bergelson",Preserved Structure Across Vector Space Representations,q-bio.NC cs.CL," Certain concepts, words, and images are intuitively more similar than others
-(dog vs. cat, dog vs. spoon), though quantifying such similarity is notoriously
-difficult. Indeed, this kind of computation is likely a critical part of
-learning the category boundaries for words within a given language. Here, we
-use a set of 27 items (e.g. 'dog') that are highly common in infants' input,
-and use both image- and word-based algorithms to independently compute
-similarity among them. We find three key results. First, the pairwise item
-similarities derived within image-space and word-space are correlated,
-suggesting preserved structure among these extremely different representational
-formats. Second, the closest 'neighbors' for each item, within each space,
-showed significant overlap (e.g. both found 'egg' as a neighbor of 'apple').
-Third, items with the most overlapping neighbors are learned later by infants
-and toddlers. We conclude that this approach, which does not rely on human
-ratings of similarity, may nevertheless reflect stable within-class structure
-across these two spaces. We speculate that such invariance might aid lexical
-acquisition, by serving as an informative marker of category boundaries.
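Editor's note: a toy version of the neighbor-overlap comparison in record 6621 above: nearest neighbors are computed independently in an image-derived and a word-derived space, then intersected per item. The vectors here are random placeholders, so the overlaps are meaningless; the point is only the mechanics.

    import numpy as np

    rng = np.random.default_rng(1)
    items = ["dog", "cat", "apple", "egg", "spoon"]
    img_space = {w: rng.normal(size=16) for w in items}   # stand-in image vectors
    word_space = {w: rng.normal(size=16) for w in items}  # stand-in word vectors

    def neighbors(space, item, k=2):
        # k nearest neighbors of `item` by cosine similarity within one space.
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        others = [w for w in space if w != item]
        return set(sorted(others, key=lambda w: -cos(space[item], space[w]))[:k])

    for it in items:
        print(it, neighbors(img_space, it) & neighbors(word_space, it))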
-"
-6622,1802.00889,"Zixiang Ding, Rui Xia, Jianfei Yu, Xiang Li, Jian Yang","Densely Connected Bidirectional LSTM with Applications to Sentence
- Classification",cs.CL," Deep neural networks have recently been shown to achieve highly competitive
-performance in many computer vision tasks due to their ability to explore a
-much larger hypothesis space. However, since most deep architectures like
-stacked RNNs tend to suffer from the vanishing-gradient and overfitting
-problems, their effects are still understudied in many NLP tasks. Inspired by
-this, we propose a novel multi-layer RNN model called densely connected
-bidirectional long short-term memory (DC-Bi-LSTM) in this paper, which
-essentially represents each layer by the concatenation of its hidden state and
-all preceding layers' hidden states, followed by recursively passing each
-layer's representation to all subsequent layers. We evaluate our proposed model
-on five benchmark datasets of sentence classification. DC-Bi-LSTM with depth up
-to 20 can be successfully trained and obtains significant improvements over the
-traditional Bi-LSTM with the same or even fewer parameters. Moreover, our model
-has promising performance compared with the state-of-the-art approaches.
-"
-6623,1802.00892,"Shiliang Zheng, Rui Xia","Left-Center-Right Separated Neural Network for Aspect-based Sentiment
- Analysis with Rotatory Attention",cs.CL," Deep learning techniques have achieved success in aspect-based sentiment
-analysis in recent years. However, two important issues remain to be studied
-further, i.e., 1) how to efficiently represent the target, especially when the
-target contains multiple words; 2) how to utilize the interaction between
-target and left/right contexts to capture the most important words in them. In
-this paper, we propose an approach, called left-center-right separated neural
-network with rotatory attention (LCR-Rot), to better address the two problems.
-Our approach has two characteristics: 1) it has three separated LSTMs, i.e.,
-left, center and right LSTMs, corresponding to three parts of a review (left
-context, target phrase and right context); 2) it has a rotatory attention
-mechanism which models the relation between the target and the left/right
-contexts. The target2context attention is used to capture the most indicative
-sentiment words in the left/right contexts. Subsequently, the context2target
-attention is used to capture the most important word in the target. This leads
-to a two-side representation of the target: left-aware target and right-aware
-target. We compare our approach on three benchmark datasets with ten related
-methods proposed recently. The results show that our approach significantly
-outperforms the state-of-the-art techniques.
-"
-6624,1802.00923,"Amir Zadeh, Paul Pu Liang, Soujanya Poria, Prateek Vij, Erik Cambria,
- Louis-Philippe Morency",Multi-attention Recurrent Network for Human Communication Comprehension,cs.AI cs.CL cs.LG," Human face-to-face communication is a complex multimodal signal. We use words
-(language modality), gestures (vision modality) and changes in tone (acoustic
-modality) to convey our intentions. Humans easily process and understand
-face-to-face communication; however, comprehending this form of communication
-remains a significant challenge for Artificial Intelligence (AI). AI must
-understand each modality and the interactions between them that shape human
-communication.
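Editor's note: the dense wiring described for DC-Bi-LSTM in record 6622 above reduces to a few lines: every layer reads the concatenation of the input embeddings and all preceding layers' outputs. The per-layer transform below is a random linear stand-in for a real Bi-LSTM, used only to show the connectivity.

    import numpy as np

    def layer(seq, out_dim, rng):
        # Stand-in for a Bi-LSTM layer: a random per-step linear map, enough
        # to show the dense wiring without a full recurrent implementation.
        W = rng.normal(size=(seq.shape[-1], out_dim)) / np.sqrt(seq.shape[-1])
        return np.tanh(seq @ W)

    def dc_bi_lstm(seq, n_layers=4, hidden=8, seed=0):
        # Each layer's input concatenates the word embeddings and every
        # preceding layer's hidden sequence (the "dense" connections).
        rng = np.random.default_rng(seed)
        states = [seq]
        for _ in range(n_layers):
            states.append(layer(np.concatenate(states, axis=-1), hidden, rng))
        return states[-1]

    x = np.random.default_rng(2).normal(size=(5, 16))  # 5 tokens, dim 16
    print(dc_bi_lstm(x).shape)  # (5, 8)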
In this paper, we present a novel neural architecture for
-understanding human communication called the Multi-attention Recurrent Network
-(MARN). The main strength of our model comes from discovering interactions
-between modalities through time using a neural component called the
-Multi-attention Block (MAB) and storing them in the hybrid memory of a
-recurrent component called the Long-short Term Hybrid Memory (LSTHM). We
-perform extensive comparisons on six publicly available datasets for multimodal
-sentiment analysis, speaker trait recognition and emotion recognition. MARN
-shows state-of-the-art performance on all the datasets.
-"
-6625,1802.00924,"Minghai Chen, Sen Wang, Paul Pu Liang, Tadas Baltru\v{s}aitis, Amir
- Zadeh, Louis-Philippe Morency","Multimodal Sentiment Analysis with Word-Level Fusion and Reinforcement
- Learning",cs.LG cs.AI cs.CL stat.ML," With the increasing popularity of video sharing websites such as YouTube and
-Facebook, multimodal sentiment analysis has received increasing attention from
-the scientific community. Contrary to previous works in multimodal sentiment
-analysis which focus on holistic information in speech segments, such as
-bag-of-words representations and average facial expression intensity, we
-develop a novel deep architecture for multimodal sentiment analysis that
-performs modality fusion at the word level. In this paper, we propose the Gated
-Multimodal Embedding LSTM with Temporal Attention (GME-LSTM(A)) model that is
-composed of two modules. The Gated Multimodal Embedding alleviates the
-difficulties of fusion when there are noisy modalities. The LSTM with Temporal
-Attention performs word-level fusion at a finer fusion resolution between input
-modalities and attends to the most important time steps. As a result, the
-GME-LSTM(A) is able to better model the multimodal structure of speech through
-time and perform better sentiment comprehension. We demonstrate the
-effectiveness of this approach on the publicly available Multimodal Corpus of
-Sentiment Intensity and Subjectivity Analysis (CMU-MOSI) dataset by achieving
-state-of-the-art sentiment classification and regression results. Qualitative
-analysis on our model emphasizes the importance of the Temporal Attention Layer
-in sentiment prediction because the additional acoustic and visual modalities
-are noisy. We also demonstrate the effectiveness of the Gated Multimodal
-Embedding in selectively filtering these noisy modalities out. Our results and
-analysis open new areas in the study of sentiment analysis in human
-communication and provide new models for multimodal fusion.
-"
-6626,1802.00946,Parth Mehta and Prasenjit Majumder,Content based Weighted Consensus Summarization,cs.IR cs.CL," Multi-document summarization has received a great deal of attention in the
-past couple of decades. Several approaches have been proposed, many of which
-perform equally well, and it is becoming increasingly difficult to choose one
-particular system over another. An ensemble of such systems that is able to
-leverage the strengths of each individual system can build a better and more
-robust summary. Despite this, few attempts have been made in this direction. In
-this paper, we describe a category of ensemble systems which use consensus
-between the candidate systems to build a better meta-summary.
We highlight two
-major shortcomings of such systems: the inability to take into account the
-relative performance of individual systems, and the tendency to overlook the
-content of candidate summaries in favour of sentence rankings. We propose an
-alternate method, content-based weighted consensus summarization, which
-addresses these concerns. We use pseudo-relevant summaries to estimate the
-performance of individual candidate systems, and then use this information to
-generate a better aggregate ranking. Experiments on DUC 2003 and DUC 2004
-datasets show that the proposed system outperforms existing consensus-based
-techniques by a large margin.
-"
-6627,1802.01021,Jonathan Raiman and Olivier Raiman,DeepType: Multilingual Entity Linking by Neural Type System Evolution,cs.CL," The wealth of structured (e.g. Wikidata) and unstructured data about the
-world available today presents an incredible opportunity for tomorrow's
-Artificial Intelligence. So far, integration of these two different modalities
-is a difficult process, involving many decisions concerning how best to
-represent the information so that it will be captured or useful, and
-hand-labeling large amounts of data. DeepType overcomes this challenge by
-explicitly integrating symbolic information into the reasoning process of a
-neural network with a type system. First, we construct a type system, and
-second, we use it to constrain the outputs of a neural network to respect the
-symbolic structure. We achieve this by reformulating the design problem into a
-mixed integer problem: create a type system and subsequently train a neural
-network with it. In this reformulation, discrete variables select which
-parent-child relations from an ontology are types within the type system, while
-continuous variables control a classifier fit to the type system. The original
-problem cannot be solved exactly, so we propose a 2-step algorithm: 1)
-heuristic search or stochastic optimization over the discrete variables that
-define a type system, informed by an Oracle and a Learnability heuristic, 2)
-gradient descent to fit the classifier parameters. We apply DeepType to the
-problem of Entity Linking on three standard datasets (i.e. WikiDisamb30, CoNLL
-(YAGO), TAC KBP 2010) and find that it outperforms all existing solutions by a
-wide margin, including approaches that rely on a human-designed type system or
-recent deep learning-based entity embeddings, while explicitly using symbolic
-information lets it integrate new entities without retraining.
-"
-6628,1802.01074,Minh C. Phan and Aixin Sun and Yi Tay and Jialong Han and Chenliang Li,"Pair-Linking for Collective Entity Disambiguation: Two Could Be Better
- Than All",cs.IR cs.CL," Collective entity disambiguation aims to jointly resolve multiple mentions by
-linking them to their associated entities in a knowledge base. Previous works
-are primarily based on the underlying assumption that entities within the same
-document are highly related. However, the extent to which these mentioned
-entities are actually connected in reality is rarely studied and therefore
-raises interesting research questions. For the first time, we show that the
-semantic relationships between the mentioned entities are in fact less dense
-than expected. This could be attributed to several reasons such as noise, data
-sparsity and knowledge base incompleteness. As a remedy, we introduce MINTREE,
-a new tree-based objective for the entity disambiguation problem.
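Editor's note: one plausible toy instantiation of the weighted-consensus aggregation sketched in record 6626 above: Borda-style credit from each candidate system's sentence ranking, scaled by a per-system weight estimated, e.g., from pseudo-relevant summaries. The rankings and weights below are invented, and the paper's actual aggregation may differ.

    # Each candidate system ranks sentences; systems carry estimated weights.
    rankings = {
        "sysA": ["s1", "s3", "s2", "s4"],
        "sysB": ["s3", "s1", "s4", "s2"],
        "sysC": ["s2", "s1", "s3", "s4"],
    }
    weights = {"sysA": 0.5, "sysB": 0.3, "sysC": 0.2}  # hypothetical estimates

    def consensus(rankings, weights):
        scores = {}
        for sys, order in rankings.items():
            for pos, sent in enumerate(order):
                # Borda-style credit, scaled by the system's estimated quality.
                scores[sent] = scores.get(sent, 0.0) + weights[sys] * (len(order) - pos)
        return sorted(scores, key=scores.get, reverse=True)

    print(consensus(rankings, weights))  # aggregate sentence ranking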
The key
-intuition behind MINTREE is the concept of coherence relaxation, which utilizes
-the weight of a minimum spanning tree to measure the coherence between
-entities. Based on this new objective, we design a novel entity disambiguation
-algorithm which we call Pair-Linking. Instead of considering all the given
-mentions, Pair-Linking iteratively selects the pair with the highest confidence
-at each step for decision making. Via extensive experiments, we show that our
-approach is not only more accurate but also surprisingly faster than many
-state-of-the-art collective linking algorithms.
-"
-6629,1802.01191,"Matti Wiegmann and Michael V\""olske and Benno Stein and Matthias Hagen
- and Martin Potthast",Heuristic Feature Selection for Clickbait Detection,cs.CL," We study feature selection as a means to optimize the baseline clickbait
-detector employed at the Clickbait Challenge 2017. The challenge's task is to
-score the ""clickbaitiness"" of a given Twitter tweet on a scale from 0 (no
-clickbait) to 1 (strong clickbait). Unlike most other approaches submitted to
-the challenge, the baseline approach is based on manual feature engineering and
-does not compete out of the box with many of the deep learning-based
-approaches. We show that scaling up feature selection efforts to heuristically
-identify better-performing feature subsets catapults the performance of the
-baseline classifier to second rank overall, beating 12 other competing
-approaches and improving over the baseline performance by 20%. This
-demonstrates that traditional classification approaches can still keep up with
-deep learning on this task.
-"
-6630,1802.01241,"Gabriel Grand, Idan Asher Blank, Francisco Pereira and Evelina
- Fedorenko","Semantic projection: recovering human knowledge of multiple, distinct
- object features from word embeddings",cs.CL," The words of a language reflect the structure of the human mind, allowing us
-to transmit thoughts between individuals. However, language can represent only
-a subset of our rich and detailed cognitive architecture. Here, we ask what
-kinds of common knowledge (semantic memory) are captured by word meanings
-(lexical semantics). We examine a prominent computational model that represents
-words as vectors in a multidimensional space, such that proximity between
-word-vectors approximates semantic relatedness. Because related words appear in
-similar contexts, such spaces - called ""word embeddings"" - can be learned from
-patterns of lexical co-occurrences in natural language. Despite their
-popularity, a fundamental concern about word embeddings is that they appear to
-be semantically ""rigid"": inter-word proximity captures only overall similarity,
-yet human judgments about object similarities are highly context-dependent and
-involve multiple, distinct semantic features. For example, dolphins and
-alligators appear similar in size, but differ in intelligence and
-aggressiveness. Could such context-dependent relationships be recovered from
-word embeddings? To address this issue, we introduce a powerful, domain-general
-solution: ""semantic projection"" of word-vectors onto lines that represent
-various object features, like size (the line extending from the word ""small"" to
-""big""), intelligence (from ""dumb"" to ""smart""), or danger (from ""safe"" to
-""dangerous""). This method, which is intuitively analogous to placing objects
-""on a mental scale"" between two extremes, recovers human judgments across a
-range of object categories and properties.
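Editor's note: the projection step described in record 6630 above is simple enough to state directly: project a word vector onto the line between two pole words and read off a scalar position. The embeddings below are random placeholders; with real word embeddings the scores become interpretable.

    import numpy as np

    rng = np.random.default_rng(3)
    emb = {w: rng.normal(size=50) for w in ["small", "big", "dolphin", "alligator"]}

    def semantic_projection(word, pole_a, pole_b, emb):
        # Position of emb[word] along the line from pole_a to pole_b:
        # 0 is at pole_a, 1 is at pole_b.
        a, b = emb[pole_a], emb[pole_b]
        d = b - a
        return float((emb[word] - a) @ d / (d @ d))

    for w in ["dolphin", "alligator"]:
        print(w, round(semantic_projection(w, "small", "big", emb), 3))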
We thus show that word embeddings
-inherit a wealth of common knowledge from word co-occurrence statistics and can
-be flexibly manipulated to express context-dependent meanings.
-"
-6631,1802.01255,"Yifan Peng, Anthony Rios, Ramakanth Kavuluru, Zhiyong Lu","Chemical-protein relation extraction with ensembles of SVM, CNN, and RNN
- models",cs.CL," Text mining the relations between chemicals and proteins is an increasingly
-important task. The CHEMPROT track at BioCreative VI aims to promote the
-development and evaluation of systems that can automatically detect
-chemical-protein relations in running text (PubMed abstracts). This manuscript
-describes our submission, which is an ensemble of three systems: a Support
-Vector Machine, a Convolutional Neural Network, and a Recurrent Neural Network.
-Their output is combined using a decision based on majority voting or stacking.
-Our CHEMPROT system obtained 0.7266 in precision and 0.5735 in recall, for an
-F-score of 0.6410, demonstrating the effectiveness of machine learning-based
-approaches for automatic relation extraction from biomedical literature. Our
-submission achieved the highest performance in the task during the 2017
-challenge.
-"
-6632,1802.01345,"Jingjing Xu, Xuancheng Ren, Junyang Lin, Xu Sun","DP-GAN: Diversity-Promoting Generative Adversarial Network for
- Generating Informative and Diversified Text",cs.CL," Existing text generation methods tend to produce repeated and ""boring""
-expressions. To tackle this problem, we propose a new text generation model,
-called Diversity-Promoting Generative Adversarial Network (DP-GAN). The
-proposed model assigns low reward to repeatedly generated text and high reward
-to ""novel"" and fluent text, encouraging the generator to produce diverse and
-informative text. Moreover, we propose a novel language-model based
-discriminator, which can better distinguish novel text from repeated text
-without the saturation problem of existing classifier-based discriminators.
-The experimental results on review generation and dialogue generation tasks
-demonstrate that our model can generate substantially more diverse and
-informative text than existing baselines. The code is available at
-https://github.com/lancopku/DPGAN
-"
-6633,1802.01405,"Guozhen An, Rivka Levitan","Comparing approaches for mitigating intergroup variability in
- personality recognition",cs.SD cs.CL eess.AS," Personality has been found to predict many life outcomes, and there is great
-interest in automatic personality recognition from a speaker's utterances.
-Previously, we achieved accuracies between 37% and 44% for three-way
-classification of high, medium or low for each of the Big Five personality
-traits (Openness to Experience, Conscientiousness, Extraversion, Agreeableness,
-Neuroticism). We show here that we can improve performance on this task by
-accounting for the heterogeneity of gender and L1 in our data, which contains
-English speech from female and male native speakers of Chinese and Standard
-American English (SAE). We experiment with personalizing models by L1 and
-gender and normalizing features by speaker, L1 group, and/or gender.
-"
-6634,1802.01429,Jean-Baptiste Camps (CJM (EA 3624)),"Manuscripts in Time and Space: Experiments in Scriptometrics on an Old
- French Corpus",cs.CL stat.AP," Witnesses of medieval literary texts, preserved in manuscript, are layered
-objects, being almost exclusively copies of copies.
This results in multiple
-and hard-to-distinguish linguistic strata -- the author's scripta interacting
-with the scriptae of the various scribes -- in a context where literary written
-language is already a dialectal hybrid. Moreover, no single linguistic
-phenomenon allows one to distinguish between different scriptae, and only the
-combination of multiple characteristics is likely to be significant [9] -- but
-which ones? The most common approach is to search for these features in a set
-of previously selected texts that are supposed to be representative of a given
-scripta. This can induce a circularity, in which texts are used to select
-features that in turn characterise them as belonging to a linguistic area. To
-counter this issue, this paper offers an unsupervised and corpus-based
-approach, in which clustering methods are applied to an Old French corpus to
-identify its main divisions and groups. Ultimately, scriptometric profiles are
-built for each of them.
-"
-6635,1802.01433,"Haonan Yu, Haichao Zhang, Wei Xu","Interactive Grounded Language Acquisition and Generalization in a 2D
- World",cs.CL cs.AI cs.LG," We build a virtual agent for learning language in a 2D maze-like world. The
-agent sees images of the surrounding environment, listens to a virtual teacher,
-and takes actions to receive rewards. It interactively learns the teacher's
-language from scratch based on two language use cases: sentence-directed
-navigation and question answering. It simultaneously learns the visual
-representations of the world, the language, and the action control. By
-disentangling language grounding from other computational routines and sharing
-a concept detection function between language grounding and prediction, the
-agent reliably interpolates and extrapolates to interpret sentences that
-contain new word combinations or new words missing from training sentences. The
-new words are transferred from the answers of language prediction. Such a
-language ability is trained and evaluated on a population of over 1.6 million
-distinct sentences consisting of 119 object words, 8 color words, 9
-spatial-relation words, and 50 grammatical words. The proposed model
-significantly outperforms five comparison methods for interpreting zero-shot
-sentences. In addition, we demonstrate human-interpretable intermediate outputs
-of the model in the appendix.
-"
-6636,1802.01451,"Filip Klubi\v{c}ka, Antonio Toral, V\'ictor M. S\'anchez-Cartagena","Quantitative Fine-Grained Human Evaluation of Machine Translation
- Systems: a Case Study on English to Croatian",cs.CL cs.AI," This paper presents a quantitative fine-grained manual evaluation approach to
-comparing the performance of different machine translation (MT) systems. We
-build upon the well-established Multidimensional Quality Metrics (MQM) error
-taxonomy and implement a novel method that assesses whether the differences in
-performance for MQM error types between different MT systems are statistically
-significant. We conduct a case study for English-to-Croatian, a language
-direction that involves translating into a morphologically rich language, for
-which we compare three MT systems belonging to different paradigms: pure
-phrase-based, factored phrase-based and neural. First, we design an
-MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of
-Slavic languages, which made the annotation process feasible and accurate.
-Errors in MT outputs were then annotated by two annotators following this
-taxonomy.
Subsequently, we carried out a statistical analysis which showed that
-the best-performing system (neural) reduces the errors produced by the worst
-system (pure phrase-based) by more than half (54%). Moreover, we conducted an
-additional analysis of agreement errors in which we distinguished between
-short-distance (phrase-level) and long-distance (sentence-level) errors. We
-discovered that phrase-based MT approaches are of limited use for long-distance
-agreement phenomena, for which neural MT was found to be especially effective.
-"
-6637,1802.01457,"Andr\'e Cibils, Claudiu Musat, Andreea Hossman, Michael Baeriswyl",Diverse Beam Search for Increased Novelty in Abstractive Summarization,cs.CL," Text summarization condenses a text to a shorter version while retaining the
-important information. Abstractive summarization is a recent development that
-generates new phrases, rather than simply copying or rephrasing sentences
-within the original text. Recently, neural sequence-to-sequence models have
-achieved good results in the field of abstractive summarization, which opens
-new possibilities and applications for industrial purposes. However, most
-practitioners observe that these models still use large parts of the original
-text in the output summaries, making them often similar to extractive
-frameworks. To address this drawback, we first introduce a new metric to
-measure how much of a summary is extracted from the input text. Secondly, we
-present a novel method that relies on a diversity factor in computing the
-neural network loss to improve the diversity of the summaries generated by any
-neural abstractive model implementing beam search. Finally, we show that this
-method not only makes the system less extractive, but also improves the overall
-ROUGE score of state-of-the-art methods by at least 2 points.
-"
-6638,1802.01766,"Girish Kumar, Matthew Henderson, Shannon Chan, Hoang Nguyen, Lucas
- Ngoo",Question-Answer Selection in User to User Marketplace Conversations,cs.CL," Sellers in user-to-user marketplaces can be inundated with questions from
-potential buyers. Answers are often already available in the product
-description. We collected a dataset of around 590K such questions and answers
-from conversations in an online marketplace. We propose a question answering
-system that selects a sentence from the product description using a
-neural-network ranking model. We explore multiple encoding strategies, with
-recurrent neural networks and feed-forward attention layers yielding good
-results. This paper presents a demo to interactively pose buyer questions and
-visualize the ranking scores of product description sentences from live online
-listings.
-"
-6639,1802.01786,"Amir Karami, London S. Bennett, Xiaoyun He","Mining Public Opinion about Economic Issues: Twitter and the U.S.
- Presidential Election",cs.SI cs.CL cs.IR stat.AP stat.ML," Opinion polls have been the bridge between public opinion and politicians in
-elections. However, developing surveys to gather people's feedback on economic
-issues is limited, expensive, and time-consuming. In recent years, social media
-such as Twitter has enabled people to share their opinions regarding elections.
-Social media has provided a platform for collecting a large amount of social
-media data. This paper proposes a computational public opinion mining approach
-to explore the discussion of economic issues in social media during an
-election.
Current related studies use text mining methods independently for election
-analysis and election prediction; this research combines two text mining
-methods: sentiment analysis and topic modeling. The proposed approach has been
-effectively deployed on millions of tweets to analyze the economic concerns of
-people during the 2012 US presidential election.
-"
-6640,1802.01812,"Junyang Lin, Shuming Ma, Qi Su, Xu Sun","Decoding-History-Based Adaptive Control of Attention for Neural Machine
- Translation",cs.CL cs.AI cs.LG," The attention-based sequence-to-sequence model has proved successful in
-Neural Machine Translation (NMT). However, attention computed without
-consideration of the decoding history, which includes the past information in
-the decoder and the attention mechanism, often causes much repetition. To
-address this problem, we propose decoding-history-based Adaptive Control of
-Attention (ACA) for the NMT model. ACA learns to control the attention by
-keeping track of the decoding history and the current information with a memory
-vector, so that the model can take the translated contents and the current
-information into consideration. Experiments on Chinese-English and
-English-Vietnamese translation have demonstrated that our model significantly
-outperforms the strong baselines. The analysis shows that our model is capable
-of generating translations with less repetition and higher accuracy. The code
-will be available at https://github.com/lancopku
-"
-6641,1802.01817,"Xiang Zhang, Yann LeCun",Byte-Level Recursive Convolutional Auto-Encoder for Text,cs.CL," This article proposes to auto-encode text at byte-level using convolutional
-networks with a recursive architecture. The motivation is to explore whether it
-is possible to have scalable and homogeneous text generation at byte-level in a
-non-sequential fashion through the simple task of auto-encoding. We show that
-non-sequential text generation from a fixed-length representation is not only
-possible, but also achieves much better auto-encoding results than recurrent
-networks. The proposed model is a multi-stage deep convolutional
-encoder-decoder framework using residual connections, containing up to 160
-parameterized layers. Each encoder or decoder contains a shared group of
-modules that consists of either pooling or upsampling layers, making the
-network recursive in terms of abstraction levels in representation. Results are
-reported for 6 large-scale paragraph datasets in 3 languages: Arabic, Chinese
-and English. Analyses are conducted to study several properties of the
-proposed model.
-"
-6642,1802.01830,Akira Utsumi,A Neurobiologically Motivated Analysis of Distributional Semantic Models,cs.CL q-bio.NC," The pervasive use of distributional semantic models or word embeddings in a
-variety of research fields is due to their remarkable ability to represent the
-meanings of words for both practical application and cognitive modeling.
-However, little is known about what kind of information is encoded in
-text-based word vectors. This lack of understanding is particularly problematic
-when word vectors are regarded as a model of semantic representation for
-abstract concepts. This paper attempts to reveal the internal information of
-distributional word vectors through an analysis using Binder et al.'s (2016)
-brain-based vectors, explicitly structured conceptual representations based on
-neurobiologically motivated attributes.
In the analysis, the mapping from
-text-based vectors to brain-based vectors is trained, and prediction
-performance is evaluated by comparing the estimated and original brain-based
-vectors. The analysis demonstrates that social and cognitive information is
-better encoded in text-based word vectors, but emotional information is not.
-This result is discussed in terms of embodied theories of abstract concepts.
-"
-6643,1802.01886,"Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang,
- Yong Yu",Texygen: A Benchmarking Platform for Text Generation Models,cs.CL cs.IR cs.LG," We introduce Texygen, a benchmarking platform to support research on
-open-domain text generation models. Texygen not only implements a majority of
-text generation models, but also covers a set of metrics that evaluate the
-diversity, the quality and the consistency of the generated texts. The Texygen
-platform could help standardize research on text generation and facilitate the
-sharing of fine-tuned open-source implementations among researchers for their
-work. As a consequence, this would help improve the reproducibility and
-reliability of future research work in text generation.
-"
-6644,1802.02032,"Xiaoyu Shen, Hui Su, Shuzi Niu and Vera Demberg",Improving Variational Encoder-Decoders in Dialogue Generation,cs.CL cs.AI cs.LG," Variational encoder-decoders (VEDs) have shown promising results in dialogue
-generation. However, the latent variable distributions are usually approximated
-by a much simpler model than the powerful RNN structure used for encoding and
-decoding, yielding the KL-vanishing problem and an inconsistent training
-objective. In this paper, we separate the training step into two phases: the
-first phase learns to autoencode discrete texts into continuous embeddings,
-from which the second phase learns to generalize latent representations by
-reconstructing the encoded embedding. In this case, latent variables are
-sampled by transforming Gaussian noise through multi-layer perceptrons and are
-trained with a separate VED model, which has the potential of realizing a much
-more flexible distribution. We compare our model with current popular models,
-and the experiments demonstrate substantial improvement in both metric-based
-and human evaluations.
-"
-6645,1802.02053,"Marwa Hadj Salah, Didier Schwab, Herv\'e Blanchon and Mounir Zrigui",Syst\`eme de traduction automatique statistique Anglais-Arabe,cs.CL cs.LG," Machine translation (MT) is the process of translating text written in a
-source language into text in a target language. In this article, we present our
-English-Arabic statistical machine translation system. First, we present the
-general process for setting up a statistical machine translation system; then
-we describe the tools as well as the different corpora we used to build our MT
-system. Our system was evaluated in terms of the BLEU score (24.51%).
-"
-6646,1802.02114,"Peng Xu, and Denilson Barbosa","Investigations on Knowledge Base Embedding for Relation Prediction and
- Extraction",cs.CL," We report an evaluation of the effectiveness of the existing knowledge base
-embedding models for relation prediction and for relation extraction on a wide
-range of benchmarks. We also describe a new benchmark, which is much larger and
-more complex than previous ones, and which we introduce to help validate the
-effectiveness of models on both tasks.
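Editor's note: record 6646 above evaluates knowledge base embedding models for relation prediction. As a concrete reference point, here is how a classic translational scorer in the TransE family (not necessarily one of the paper's models) would rank candidate relations for an entity pair; entities, relations, and vectors are invented.

    import numpy as np

    rng = np.random.default_rng(4)
    dim = 20
    ent = {e: rng.normal(size=dim) for e in ["paris", "france"]}
    rel = {r: rng.normal(size=dim) for r in ["capital_of", "born_in", "part_of"]}

    def rank_relations(h, t):
        # TransE-style scoring: relation r fits (h, t) when h + r is close to t.
        scores = {r: -np.linalg.norm(ent[h] + v - ent[t]) for r, v in rel.items()}
        return sorted(scores, key=scores.get, reverse=True)

    print(rank_relations("paris", "france"))  # relations ranked by plausibility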
The results demonstrate that knowledge base
-embedding models are generally effective for relation prediction but, with the
-existing strategies, unable to yield improvements for the state-of-the-art
-neural relation extraction model, pointing to limitations of existing methods.
-"
-6647,1802.02116,Matteo Grella and Simone Cangialosi,Non-Projective Dependency Parsing via Latent Heads Representation (LHR),cs.CL," In this paper, we introduce a novel approach based on a bidirectional
-recurrent autoencoder to perform globally optimized non-projective dependency
-parsing via semi-supervised learning. The syntactic analysis is completed at
-the end of the neural process that generates a Latent Heads Representation
-(LHR), without any algorithmic constraint and with linear complexity. The
-resulting ""latent syntactic structure"" can be used directly in other semantic
-tasks. The LHR is transformed into the usual dependency tree by computing a
-simple vector similarity. We believe that our model has the potential to
-compete with much more complex state-of-the-art parsing architectures.
-"
-6648,1802.02163,"Naoki Egami, Christian J. Fong, Justin Grimmer, Margaret E. Roberts,
- Brandon M. Stewart",How to Make Causal Inferences Using Texts,stat.ML cs.CL stat.ME," New text-as-data techniques offer great promise: the ability to inductively
-discover measures that are useful for testing social science theories of
-interest from large collections of text. We introduce a conceptual framework
-for making causal inferences with discovered measures as a treatment or
-outcome. Our framework enables researchers to discover high-dimensional textual
-interventions and estimate the ways that observed treatments affect text-based
-outcomes. We argue that nearly all text-based causal inferences depend upon a
-latent representation of the text, and we provide a framework to learn the
-latent representation. But estimating this latent representation, we show,
-creates new risks: we may introduce an identification problem or overfit. To
-address these risks we describe a split-sample framework and apply it to
-estimate causal effects from an experiment on immigration attitudes and a study
-on bureaucratic response. Our work provides a rigorous foundation for
-text-based causal inferences.
-"
-6649,1802.02311,"Jinmiao Huang, Cesar Osorio, Luke Wicent Sy","An Empirical Evaluation of Deep Learning for ICD-9 Code Assignment using
- MIMIC-III Clinical Notes",cs.CL," Background and Objective: Code assignment is of paramount importance at many
-levels in modern hospitals, from ensuring an accurate billing process to
-creating a valid record of patient care history. However, the coding process is
-tedious and subjective, and it requires medical coders with extensive training.
-This study aims to evaluate the performance of deep-learning-based systems that
-automatically map clinical notes to ICD-9 medical codes. Methods: The
-evaluations in this research are focused on end-to-end learning methods without
-manually defined rules. Traditional machine learning algorithms, as well as
-state-of-the-art deep learning methods such as Recurrent Neural Networks and
-Convolutional Neural Networks, were applied to the Medical Information Mart for
-Intensive Care (MIMIC-III) dataset. A large number of experiments were run with
-different settings of the tested algorithms. Results: Findings showed that the
-deep learning-based methods outperformed other conventional machine learning
-methods.
From our assessment, the best models could predict the top 10
-ICD-9 codes with 0.6957 F1 and 0.8967 accuracy and could estimate the top 10
-ICD-9 categories with 0.7233 F1 and 0.8588 accuracy. Our implementation also
-outperformed existing work under certain evaluation metrics. Conclusion: A set
-of standard metrics was utilized in assessing the performance of ICD-9 code
-assignment on the MIMIC-III dataset. All the developed evaluation tools and
-resources are available online and can be used as a baseline for further
-research.
-"
-6650,1802.02550,"Yoon Kim, Sam Wiseman, Andrew C. Miller, David Sontag, Alexander M.
- Rush",Semi-Amortized Variational Autoencoders,stat.ML cs.CL cs.LG," Amortized variational inference (AVI) replaces instance-specific local
-inference with a global inference network. While AVI has enabled efficient
-training of deep generative models such as variational autoencoders (VAE),
-recent empirical work suggests that inference networks can produce suboptimal
-variational parameters. We propose a hybrid approach: use AVI to initialize the
-variational parameters, then run stochastic variational inference (SVI) to
-refine them. Crucially, the local SVI procedure is itself differentiable, so
-the inference network and generative model can be trained end-to-end with
-gradient-based optimization. This semi-amortized approach enables the use of
-rich generative models without experiencing the posterior-collapse phenomenon
-common in training VAEs for problems like text generation. Experiments show
-this approach outperforms strong autoregressive and variational baselines on
-standard text and image datasets.
-"
-6651,1802.02561,"Hamza Harkous, Kassem Fawaz, R\'emi Lebret, Florian Schaub, Kang G.
- Shin, Karl Aberer","Polisis: Automated Analysis and Presentation of Privacy Policies Using
- Deep Learning",cs.CL cs.CR cs.HC," Privacy policies are the primary channel through which companies inform users
-about their data collection and sharing practices. These policies are often
-long and difficult to comprehend. Short notices based on information extracted
-from privacy policies have been shown to be useful but face a significant
-scalability hurdle, given the number of policies and their evolution over time.
-Companies, users, researchers, and regulators still lack usable and scalable
-tools to cope with the breadth and depth of privacy policies. To address these
-hurdles, we propose an automated framework for privacy policy analysis
-(Polisis). It enables scalable, dynamic, and multi-dimensional queries on
-natural language privacy policies. At the core of Polisis is a privacy-centric
-language model, built with 130K privacy policies, and a novel hierarchy of
-neural-network classifiers that accounts for both high-level aspects and
-fine-grained details of privacy practices. We demonstrate Polisis' modularity
-and utility with two applications supporting structured and free-form querying.
-The structured querying application is the automated assignment of privacy
-icons from privacy policies. With Polisis, we can achieve an accuracy of 88.4%
-on this task. The second application, PriBot, is the first free-form
-question-answering system for privacy policies. We show that PriBot can produce
-a correct answer among its top-3 results for 82% of the test questions. Using
-an MTurk user study with 700 participants, we show that at least one of
-PriBot's top-3 answers is relevant to users for 89% of the test questions. 
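
A minimal illustration of the hierarchy-of-classifiers idea in the Polisis
abstract above: a coarse category model routes each policy segment to a
fine-grained model for that category. This is a sketch under invented
assumptions (toy segments and labels, TF-IDF with logistic regression standing
in for the paper's neural classifiers), not the authors' implementation.

# Two-level (hierarchical) policy-segment classification sketch.
# All data, labels, and model choices below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

segments = ["we collect your email address and location",
            "we share aggregated data with advertisers",
            "you may delete your account at any time",
            "third parties receive your browsing history"]
high_level = ["collection", "sharing", "control", "sharing"]

# Top-level classifier assigns a coarse privacy category to a segment.
top = make_pipeline(TfidfVectorizer(), LogisticRegression())
top.fit(segments, high_level)

# A fine-grained classifier is trained only on one category's segments.
share_segs = [s for s, y in zip(segments, high_level) if y == "sharing"]
sub = make_pipeline(TfidfVectorizer(), LogisticRegression())
sub.fit(share_segs, ["advertising", "analytics"])  # toy fine-grained labels

query = "your data is sold to ad networks"
category = top.predict([query])[0]
if category == "sharing":
    print(category, "->", sub.predict([query])[0])
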
-
"
-6652,1802.02605,Jean-Fran\c{c}ois Delpech,Unsupervised word sense disambiguation in dynamic semantic spaces,cs.CL," In this paper, we are mainly concerned with the ability to quickly and
-automatically distinguish word senses in dynamic semantic spaces in which new
-terms and new senses appear frequently. Such spaces are built ""on the fly""
-from constantly evolving data sets such as Wikipedia, repositories of patent
-grants and applications, or large sets of legal documents for Technology
-Assisted Review and e-discovery. This immediacy rules out supervision as well
-as the use of a priori training sets. We show that the various senses of a term
-can be automatically made apparent with a simple clustering algorithm, each
-sense being a vector in the semantic space. While we only consider here
-semantic spaces built by using random vectors, this algorithm should work with
-any kind of embedding, provided meaningful similarities between terms can be
-computed and fulfill at least the two basic conditions that terms with close
-meanings have high similarities and terms with unrelated meanings have
-near-zero similarities.
-"
-6653,1802.02607,"Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis
- Georgiou","Learning from Past Mistakes: Improving Automatic Speech Recognition
- Output via Noisy-Clean Phrase Context Modeling",cs.CL cs.SD eess.AS," Automatic speech recognition (ASR) systems often make unrecoverable errors
-due to subsystem pruning (acoustic, language and pronunciation models); for
-example pruning words due to acoustics using short-term context, prior to
-rescoring with long-term context based on linguistics. In this work we model
-ASR as a phrase-based noisy transformation channel and propose an error
-correction system that can learn from the aggregate errors of all the
-independent modules constituting the ASR and attempt to invert those. The
-proposed system can exploit long-term context using a neural network language
-model and can better choose between existing ASR output possibilities as well
-as re-introduce previously pruned or unseen (out-of-vocabulary) phrases. It
-provides corrections under poorly performing ASR conditions without degrading
-any accurate transcriptions; such corrections are even greater on
-out-of-domain and mismatched-data ASR. Our system consistently provides
-improvements over the baseline ASR, even when the baseline is further optimized
-through recurrent neural network language model rescoring. This demonstrates
-that any ASR improvements can be exploited independently and that our proposed
-system can potentially still provide benefits on highly optimized ASR. Finally,
-we present an extensive analysis of the type of errors corrected by our system.
-"
-6654,1802.02614,"Jianxiong Dong, Jim Huang","Enhance word representation for out-of-vocabulary on Ubuntu dialogue
- corpus",cs.CL," The Ubuntu dialogue corpus is the largest publicly available dialogue corpus,
-making it feasible to build end-to-end deep neural network models directly from
-the conversation data. One challenge of the Ubuntu dialogue corpus is the large
-number of out-of-vocabulary words. In this paper we propose a method which
-combines the general pre-trained word embedding vectors with those generated on
-the task-specific training set to address this issue. We integrated character
-embedding into Chen et al.'s Enhanced LSTM method (ESIM) and used it to
-evaluate the effectiveness of our proposed method. 
For the task of next utterance
-selection, the proposed method has demonstrated a significant performance
-improvement over the original ESIM, and the new model has achieved
-state-of-the-art results on both the Ubuntu dialogue corpus and the Douban
-conversation corpus. In addition, we investigated the performance impact of
-end-of-utterance and end-of-turn token tags.
-"
-6655,1802.02656,"Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas,
- Bhuvana Ramabhadran, Mark Hasegawa-Johnson","Joint Modeling of Accents and Acoustics for Multi-Accent Speech
- Recognition",cs.CL cs.SD eess.AS," The performance of automatic speech recognition systems degrades with
-increasing mismatch between the training and testing scenarios. Differences in
-speaker accents are a significant source of such mismatch. The traditional
-approach to deal with multiple accents involves pooling data from several
-accents during training and building a single model in multi-task fashion,
-where tasks correspond to individual accents. In this paper, we explore an
-alternate model where we jointly learn an accent classifier and a multi-task
-acoustic model. Experiments on the American English Wall Street Journal and
-British English Cambridge corpora demonstrate that our joint model outperforms
-the strong multi-task acoustic model baseline. We obtain a 5.94% relative
-improvement in word error rate on British English, and 9.47% relative
-improvement on American English. This illustrates that jointly modeling with
-accent information improves acoustic model performance.
-"
-6656,1802.02745,"Reuben Feinman, Brenden M. Lake",Learning Inductive Biases with Simple Neural Networks,cs.CL cs.CV cs.LG," People use rich prior knowledge about the world in order to efficiently learn
-new concepts. These priors - also known as ""inductive biases"" - pertain to the
-space of internal models considered by a learner, and they help the learner
-make inferences that go beyond the observed data. A recent study found that
-deep neural networks optimized for object recognition develop the shape bias
-(Ritter et al., 2017), an inductive bias possessed by children that plays an
-important role in early word learning. However, these networks use
-unrealistically large quantities of training data, and the conditions required
-for these biases to develop are not well understood. Moreover, it is unclear
-how the learning dynamics of these networks relate to developmental processes
-in childhood. We investigate the development and influence of the shape bias in
-neural networks using controlled datasets of abstract patterns and synthetic
-images, allowing us to systematically vary the quantity and form of the
-experience provided to the learning algorithms. We find that simple neural
-networks develop a shape bias after seeing as few as 3 examples of 4 object
-categories. The development of these biases predicts the onset of vocabulary
-acceleration in our networks, consistent with the developmental process in
-children.
-"
-6657,1802.02870,"Naiara Perez, Montse Cuadros, German Rigau",Biomedical term normalization of EHRs with UMLS,cs.CL," This paper presents a novel prototype for biomedical term normalization of
-electronic health record excerpts with the Unified Medical Language System
-(UMLS) Metathesaurus. Although the tool is multilingual and cross-lingual by
-design, we first focus on processing clinical text in Spanish because there is
-no existing tool for this language and for this specific purpose. 
The tool is -based on Apache Lucene to index the Metathesaurus and generate mapping -candidates from input text. It uses the IXA pipeline for basic language -processing and resolves ambiguities with the UKB toolkit. It has been evaluated -by measuring its agreement with MetaMap in two English-Spanish parallel -corpora. In addition, we present a web-based interface for the tool. -" -6658,1802.02892,"D. Kiela, E. Grave, A. Joulin, T. Mikolov",Efficient Large-Scale Multi-Modal Classification,cs.CL cs.AI cs.CV," While the incipient internet was largely text-based, the modern digital world -is becoming increasingly multi-modal. Here, we examine multi-modal -classification where one modality is discrete, e.g. text, and the other is -continuous, e.g. visual representations transferred from a convolutional neural -network. In particular, we focus on scenarios where we have to be able to -classify large quantities of data quickly. We investigate various methods for -performing multi-modal fusion and analyze their trade-offs in terms of -classification accuracy and computational efficiency. Our findings indicate -that the inclusion of continuous information improves performance over -text-only on a range of multi-modal classification tasks, even with simple -fusion methods. In addition, we experiment with discretizing the continuous -features in order to speed up and simplify the fusion process even further. Our -results show that fusion with discretized features outperforms text-only -classification, at a fraction of the computational cost of full multi-modal -fusion, with the additional benefit of improved interpretability. -" -6659,1802.02914,George Christodoulides (ILC),Praaline: Integrating Tools for Speech Corpus Research,cs.CL cs.DB," This paper presents Praaline, an open-source software system for managing, -annotating, analysing and visualising speech corpora. Researchers working with -speech corpora are often faced with multiple tools and formats, and they need -to work with ever-increasing amounts of data in a collaborative way. Praaline -integrates and extends existing time-proven tools for spoken corpora analysis -(Praat, Sonic Visualiser and a bridge to the R statistical package) in a -modular system, facilitating automation and reuse. Users are exposed to an -integrated, user-friendly interface from which to access multiple tools. Corpus -metadata and annotations may be stored in a database, locally or remotely, and -users can define the metadata and annotation structure. Users may run a -customisable cascade of analysis steps, based on plug-ins and scripts, and -update the database with the results. The corpus database may be queried, to -produce aggregated data-sets. Praaline is extensible using Python or C++ -plug-ins, while Praat and R scripts may be executed against the corpus data. A -series of visualisations, editors and plug-ins are provided. Praaline is free -software, released under the GPL license. -" -6660,1802.02926,"George Christodoulides (ILC), Mathieu Avanzi, Jean-Philippe Goldman - (UNIGE)","DisMo: A Morphosyntactic, Disfluency and Multi-Word Unit Annotator. An - Evaluation on a Corpus of French Spontaneous and Read Speech",cs.CL," We present DisMo, a multi-level annotator for spoken language corpora that -integrates part-of-speech tagging with basic disfluency detection and -annotation, and multi-word unit recognition. DisMo is a hybrid system that uses -a combination of lexical resources, rules, and statistical models based on -Conditional Random Fields (CRF). 
In this paper, we present the first public
-version of DisMo for French. The system is trained and its performance
-evaluated on a 57k-token corpus, including different varieties of French spoken
-in three countries (Belgium, France and Switzerland). DisMo supports a
-multi-level annotation scheme, in which the tokenisation to minimal word units
-is complemented with multi-word unit groupings (each having associated POS
-tags), as well as separate levels for annotating disfluencies and discourse
-phenomena. We present the system's architecture, linguistic resources and its
-hierarchical tag-set. Results show that DisMo achieves a precision of 95%
-(finest tag-set) to 96.8% (coarse tag-set) in POS-tagging non-punctuated,
-sound-aligned transcriptions of spoken French, while also offering substantial
-possibilities for automated multi-level annotation.
-"
-6661,1802.03052,"Peter A. Jansen, Elizabeth Wainwright, Steven Marmorstein, Clayton T.
- Morrison","WorldTree: A Corpus of Explanation Graphs for Elementary Science
- Questions supporting Multi-Hop Inference",cs.CL cs.AI cs.IR," Developing methods of automated inference that are able to provide users with
-compelling human-readable justifications for why the answer to a question is
-correct is critical for domains such as science and medicine, where user trust
-and detecting costly errors are limiting factors to adoption. One of the
-central barriers to training question answering models on explainable inference
-tasks is the lack of gold explanations to serve as training data. In this paper
-we present a corpus of explanations for standardized science exams, a recent
-challenge task for question answering. We manually construct a corpus of
-detailed explanations for nearly all publicly available standardized elementary
-science questions (approximately 1,680 3rd through 5th grade questions) and
-represent these as ""explanation graphs"" -- sets of lexically overlapping
-sentences that describe how to arrive at the correct answer to a question
-through a combination of domain and world knowledge. We also provide an
-explanation-centered tablestore, a collection of semi-structured tables that
-contain the knowledge to construct these elementary science explanations.
-Together, these two knowledge resources map out a substantial portion of the
-knowledge required for answering and explaining elementary science exams, and
-provide both structured and free-text training data for the explainable
-inference task.
-"
-6662,1802.03116,"Yun Chen, Yang Liu, Victor O.K. Li","Zero-Resource Neural Machine Translation with Multi-Agent Communication
- Game",cs.CL," While end-to-end neural machine translation (NMT) has achieved notable
-success in the past years in translating a handful of resource-rich language
-pairs, it still suffers from the data scarcity problem for low-resource
-language pairs and domains. To tackle this problem, we propose an interactive
-multimodal framework for zero-resource neural machine translation. Instead of
-being passively exposed to large amounts of parallel corpora, our learners
-(implemented as an encoder-decoder architecture) engage in cooperative image
-description games, and thus develop their own image captioning or neural
-machine translation model from the need to communicate in order to succeed at
-the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that
-the proposed learning mechanism significantly improves over the
-state-of-the-art methods. 
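
The ""explanation graph"" notion in the WorldTree abstract above lends itself to
a small sketch: explanation sentences become nodes, and an edge connects any
two sentences that lexically overlap. The facts and the stop-word list below
are invented for illustration, not WorldTree data.

# Minimal sketch: link explanation sentences that share a content word.
from itertools import combinations

facts = ["metal is a thermal conductor",
         "a thermal conductor transfers heat",
         "an iron nail is made of metal",
         "plants need sunlight to grow"]

STOP = {"a", "an", "is", "of", "to", "the"}

def content_words(sentence):
    return {w for w in sentence.split() if w not in STOP}

# Two facts are connected if their content words intersect.
edges = [(i, j) for i, j in combinations(range(len(facts)), 2)
         if content_words(facts[i]) & content_words(facts[j])]
print(edges)  # (0, 1) via "thermal"/"conductor", (0, 2) via "metal"
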
-
"
-6663,1802.03142,"Ali Can Kocabiyikoglu, Laurent Besacier, Olivier Kraif","Augmenting Librispeech with French Translations: A Multimodal Corpus for
- Direct Speech Translation Evaluation",cs.CL," Recent works in spoken language translation (SLT) have attempted to build
-end-to-end speech-to-text translation without using source language
-transcription during learning or decoding. However, while large quantities of
-parallel texts (such as Europarl, OpenSubtitles) are available for training
-machine translation systems, there are no large (100h) and open source parallel
-corpora that include speech in a source language aligned to text in a target
-language. This paper tries to fill this gap by augmenting an existing
-(monolingual) corpus: LibriSpeech. This corpus, used for automatic speech
-recognition, is derived from read audiobooks from the LibriVox project, and has
-been carefully segmented and aligned. After gathering French e-books
-corresponding to the English audio-books from LibriSpeech, we align speech
-segments at the sentence level with their respective translations and obtain
-236h of usable parallel data. This paper presents the details of the processing
-as well as a manual evaluation conducted on a small subset of the corpus. This
-evaluation shows that the automatic alignment scores are reasonably correlated
-with the human judgments of the bilingual alignment quality. We believe that
-this corpus (which is made available online) is useful for replicable
-experiments in direct speech translation or more general spoken language
-translation experiments.
-"
-6664,1802.03198,"Martin Mirakyan, Karen Hambardzumyan, Hrant Khachatrian","Natural Language Inference over Interaction Space: ICLR 2018
- Reproducibility Report",cs.CL," We have tried to reproduce the results of the paper ""Natural Language
-Inference over Interaction Space"" submitted to the ICLR 2018 conference as part
-of the ICLR 2018 Reproducibility Challenge. Initially, we were not aware that
-the code was available, so we started to implement the network from scratch. We
-have evaluated our version of the model on the Stanford NLI dataset and reached
-86.38% accuracy on the test set, while the paper claims 88.0% accuracy. The
-main difference, as we understand it, comes from the optimizers and the way
-model selection is performed.
-"
-6665,1802.03238,"Myeongjun Jang, Seungwan Seo, Pilsung Kang","Recurrent Neural Network-Based Semantic Variational Autoencoder for
- Sequence-to-Sequence Learning",cs.CL," Sequence-to-sequence (Seq2seq) models have played an important role in the
-recent success of various natural language processing methods, such as machine
-translation, text summarization, and speech recognition. However, current
-Seq2seq models have trouble preserving global latent information from a long
-sequence of words. Variational autoencoder (VAE) alleviates this problem by
-learning a continuous semantic space of the input sentence. However, it does
-not solve the problem completely. In this paper, we propose a new recurrent
-neural network (RNN)-based Seq2seq model, RNN semantic variational autoencoder
-(RNN--SVAE), to better capture the global latent information of a sequence of
-words. To reflect the meaning of words in a sentence properly, without regard
-to its position within the sentence, we construct a document information vector
-using the attention information between the final state of the encoder and
-every prior hidden state. 
Then, the mean and standard deviation of the -continuous semantic space are learned by using this vector to take advantage of -the variational method. By using the document information vector to find the -semantic space of the sentence, it becomes possible to better capture the -global latent feature of the sentence. Experimental results of three natural -language tasks (i.e., language modeling, missing word imputation, paraphrase -identification) confirm that the proposed RNN--SVAE yields higher performance -than two benchmark models. -" -6666,1802.03268,"Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean",Efficient Neural Architecture Search via Parameter Sharing,cs.LG cs.CL cs.CV cs.NE stat.ML," We propose Efficient Neural Architecture Search (ENAS), a fast and -inexpensive approach for automatic model design. In ENAS, a controller learns -to discover neural network architectures by searching for an optimal subgraph -within a large computational graph. The controller is trained with policy -gradient to select a subgraph that maximizes the expected reward on the -validation set. Meanwhile the model corresponding to the selected subgraph is -trained to minimize a canonical cross entropy loss. Thanks to parameter sharing -between child models, ENAS is fast: it delivers strong empirical performances -using much fewer GPU-hours than all existing automatic model design approaches, -and notably, 1000x less expensive than standard Neural Architecture Search. On -the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a -test perplexity of 55.8, establishing a new state-of-the-art among all methods -without post-training processing. On the CIFAR-10 dataset, ENAS designs novel -architectures that achieve a test error of 2.89%, which is on par with NASNet -(Zoph et al., 2018), whose test error is 2.65%. -" -6667,1802.03594,\'Alvaro Peris and Francisco Casacuberta,"Online Learning for Effort Reduction in Interactive Neural Machine - Translation",cs.CL," Neural machine translation systems require large amounts of training data and -resources. Even with this, the quality of the translations may be insufficient -for some users or domains. In such cases, the output of the system must be -revised by a human agent. This can be done in a post-editing stage or following -an interactive machine translation protocol. - We explore the incremental update of neural machine translation systems -during the post-editing or interactive translation processes. Such -modifications aim to incorporate the new knowledge, from the edited sentences, -into the translation system. Updates to the model are performed on-the-fly, as -sentences are corrected, via online learning techniques. In addition, we -implement a novel interactive, adaptive system, able to react to -single-character interactions. This system greatly reduces the human effort -required for obtaining high-quality translations. - In order to stress our proposals, we conduct exhaustive experiments varying -the amount and type of data available for training. Results show that online -learning effectively achieves the objective of reducing the human effort -required during the post-editing or the interactive machine translation stages. -Moreover, these adaptive systems also perform well in scenarios with scarce -resources. We show that a neural machine translation system can be rapidly -adapted to a specific domain, exclusively by means of online learning -techniques. 
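
The on-the-fly adaptation loop described in the interactive-translation
abstract above can be pictured with a stand-in model: each post-edited
sentence immediately triggers one update. This is a sketch assuming a toy
linear scorer and a perceptron-style update in place of the paper's neural
model and gradient step; all sentences are invented.

# Online adaptation sketch: update after every post-edited sentence.
weights = {}  # (source word, target word) -> weight; stand-in parameters

def score(src, hyp):
    return sum(weights.get((s, h), 0.0)
               for s in src.split() for h in hyp.split())

def online_update(src, hyp, post_edit, lr=0.1):
    # Reward word pairs seen in the human correction, penalize pairs
    # that only appear in the system hypothesis.
    for s in src.split():
        for h in post_edit.split():
            weights[(s, h)] = weights.get((s, h), 0.0) + lr
        for h in set(hyp.split()) - set(post_edit.split()):
            weights[(s, h)] = weights.get((s, h), 0.0) - lr

stream = [("la casa verde", "the home green", "the green house")]
for src, hyp, post_edit in stream:   # sentences arrive one at a time
    online_update(src, hyp, post_edit)
print(score("la casa verde", "the green house") >
      score("la casa verde", "the home green"))  # True after adaptation
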
-
"
-6668,1802.03606,"Galip Aydin, Ibrahim Riza Hallac",Distributed NLP,cs.DC cs.CL," In this paper we present the performance of parallel text processing with
-MapReduce on a cloud platform. Scientific papers in the Turkish language are
-processed using the Zemberek NLP library. Experiments were run on a Hadoop
-cluster and compared with a single machine's performance.
-"
-6669,1802.03656,"Benyou Wang, Li Wang, Qikang Wei, Lichun Liu","TextZoo, a New Benchmark for Reconsidering Text Classification",cs.CL," Text representation is a fundamental concern in Natural Language Processing,
-especially in text classification. Recently, many neural network approaches
-with delicate representation models (e.g. FASTTEXT, CNN, RNN and many hybrid
-models with attention mechanisms) have claimed state-of-the-art results on
-specific text classification datasets. However, the field lacks a unified
-benchmark to compare these models and reveal the advantages of each
-sub-component in various settings. We re-implement more than 20 popular text
-representation models for classification on more than 10 datasets. In this
-paper, we reconsider the text classification task from the perspective of
-neural networks and draw several conclusions from an analysis of the above
-results.
-"
-6670,1802.03701,"Sourish Dasgupta, Ankur Padia, Gaurav Maheshwari, Priyansh Trivedi,
- Jens Lehmann",Formal Ontology Learning from English IS-A Sentences,cs.AI cs.CL," Ontology learning (OL) is the process of automatically generating an
-ontological knowledge base from a plain text document. In this paper, we
-propose a new ontology learning approach and tool, called DLOL, which generates
-a knowledge base in the description logic (DL) SHOQ(D) from a collection of
-factual non-negative IS-A sentences in English. We provide extensive
-experimental results on the accuracy of DLOL, giving experimental comparisons
-to three state-of-the-art existing OL tools, namely Text2Onto, FRED, and LExO.
-Here, we use the standard OL accuracy measure, called lexical accuracy, and a
-novel OL accuracy measure, called instance-based inference model. In our
-experimental results, DLOL turns out to be about 21% and 46%, respectively,
-better than the best of the other three approaches.
-"
-6671,1802.03712,Rodolfo Delmonte,"Syntax and Semantics of Italian Poetry in the First Half of the 20th
- Century",cs.CL," In this paper we study, analyse and comment on rhetorical figures present in
-some of the most interesting poetry of the first half of the twentieth century.
-These figures are at first traced back to some famous poets of the past and
-then compared to classical Latin prose. Linguistic theory is then called in to
-show how they can be represented in syntactic structures and classified as
-noncanonical structures, by positioning discontinuous or displaced linguistic
-elements in Spec XP projections at various levels of constituency. Then we
-introduce LFG (Lexical Functional Grammar) as the theory that allows us to
-connect syntactic noncanonical structures with informational structure and
-psycholinguistic theories for complexity evaluation. We end up with two
-computational linguistics experiments and then evaluate the results. The first
-one uses the best online parsers of Italian to parse poetic structures; the
-second one uses Getarun, the system created at the Ca' Foscari Computational
-Linguistics Laboratory. As will be shown, the first approach is unable to cope
-with these structures due to the use of only statistical probabilistic
-information. 
By contrast, the second one, being a symbolic rule-based system, is by far
-superior and also allows completing both semantic and pragmatic analysis.
-"
-6672,1802.03753,"Gell\'ert Weisz, Pawe{\l} Budzianowski, Pei-Hao Su, Milica Ga\v{s}i\'c","Sample Efficient Deep Reinforcement Learning for Dialogue Systems with
- Large Action Spaces",cs.CL cs.AI cs.LG stat.ML," In spoken dialogue systems, we aim to deploy artificial intelligence to build
-automated dialogue agents that can converse with humans. A part of this effort
-is the policy optimisation task, which attempts to find a policy describing how
-to respond to humans, in the form of a function taking the current state of the
-dialogue and returning the response of the system. In this paper, we
-investigate deep reinforcement learning approaches to solve this problem.
-Particular attention is given to actor-critic methods, off-policy reinforcement
-learning with experience replay, and various methods aimed at reducing the bias
-and variance of estimators. When combined, these methods result in the
-previously proposed ACER algorithm that gave competitive results in gaming
-environments. These environments, however, are fully observable and have a
-relatively small action set, so in this paper we examine the application of
-ACER to dialogue policy optimisation. We show that this method beats the
-current state-of-the-art in deep learning approaches for spoken dialogue
-systems. This not only leads to a more sample efficient algorithm that can
-train faster, but also allows us to apply the algorithm in more difficult
-environments than before. We thus experiment with learning in a very large
-action space, which has two orders of magnitude more actions than previously
-considered. We find that ACER trains significantly faster than the current
-state-of-the-art.
-"
-6673,1802.03793,"Justin Sybrandt, Michael Shtutman and Ilya Safro","Large-Scale Validation of Hypothesis Generation Systems via Candidate
- Ranking",cs.IR cs.CL," The first step of many research projects is to define and rank a short list
-of candidates for study. In the modern rapidity of scientific progress, some
-turn to automated hypothesis generation (HG) systems to aid this process. These
-systems can identify implicit or overlooked connections within a large
-scientific corpus, and while their importance grows alongside the pace of
-science, they lack thorough validation. Without any standard numerical
-evaluation method, many validate general-purpose HG systems by rediscovering a
-handful of historical findings, and some wishing to be more thorough may run
-laboratory experiments based on automatic suggestions. These methods are
-expensive, time consuming, and cannot scale. Thus, we present a numerical
-evaluation framework for the purpose of validating HG systems that leverages
-thousands of validation hypotheses. This method evaluates an HG system by its
-ability to rank hypotheses by plausibility, a process reminiscent of human
-candidate selection. Because HG systems do not produce a ranking criterion,
-specifically those that produce topic models, we additionally present novel
-metrics to quantify the plausibility of hypotheses given topic model system
-output. Finally, we demonstrate that our proposed validation method aligns with
-real-world research goals by deploying our method within Moliere, our recent
-topic-driven HG system, in order to automatically generate a set of candidate
-genes related to HIV-associated neurodegenerative disease (HAND). 
By performing
-laboratory experiments based on this candidate set, we discover a new
-connection between HAND and Dead Box RNA Helicase 3 (DDX3). Reproducibility:
-code, validation data, and results can be found at
-sybrandt.com/2018/validation.
-"
-6674,1802.03816,"Skanda Koppula, Khe Chai Sim, and Kean Chin",Understanding Recurrent Neural State Using Memory Signatures,cs.CL," We demonstrate a network visualization technique to analyze the recurrent
-state inside the LSTMs/GRUs used commonly in language and acoustic models.
-Interpreting intermediate state and network activations inside end-to-end
-models remains an open challenge. Our method allows users to understand exactly
-how much and what history is encoded inside recurrent state in grapheme
-sequence models. Our procedure trains multiple decoders that predict prior
-input history. Compiling results from these decoders, a user can obtain a
-signature of the recurrent kernel that characterizes its memory behavior. We
-demonstrate this method's usefulness in revealing information divergence in the
-bases of recurrent factorized kernels, visualizing the character-level
-differences between the memory of n-gram and recurrent language models, and
-extracting knowledge of history encoded in the layers of grapheme-based
-end-to-end ASR networks.
-"
-6675,1802.03821,"Betul Karakus, Ibrahim Riza Hallac, Galip Aydin",Distributed Readability Analysis Of Turkish Elementary School Textbooks,cs.DC cs.CL," Readability assessment deals with estimating the level of difficulty in
-reading texts. Many readability tests, which do not indicate execution
-efficiency, have been applied to specific texts to measure the reading grade
-level in science textbooks. In this paper, we analyze the content covered in
-elementary school Turkish textbooks by employing a distributed parallel
-processing framework based on the popular MapReduce paradigm. We outline the
-architecture of a distributed Big Data processing system which uses Hadoop for
-full-text readability analysis. The readability scores of the textbooks and
-system performance measurements are also given in the paper.
-"
-6676,1802.03881,"Sang-Woo Lee, Yu-Jung Heo, Byoung-Tak Zhang","Answerer in Questioner's Mind: Information Theoretic Approach to
- Goal-Oriented Visual Dialog",cs.CV cs.AI cs.CL cs.LG," Goal-oriented dialog has received attention due to its numerous applications
-in artificial intelligence. Goal-oriented dialogue tasks occur when a
-questioner asks an action-oriented question and an answerer responds with the
-intent of letting the questioner know a correct action to take. To ask adequate
-questions, deep learning and reinforcement learning have recently been applied.
-However, these approaches struggle to find a competent recurrent neural
-questioner, owing to the complexity of learning a series of sentences.
-Motivated by theory of mind, we propose ""Answerer in Questioner's
-Mind"" (AQM), a novel information theoretic algorithm for goal-oriented dialog.
-With AQM, a questioner asks and infers based on an approximated probabilistic
-model of the answerer. The questioner figures out the answerer's intention via
-selecting a plausible question by explicitly calculating the information gain
-of the candidate intentions and possible answers to each question. We test our
-framework on two goal-oriented visual dialog tasks: ""MNIST Counting Dialog"" and
-""GuessWhat?!"". In our experiments, AQM outperforms comparative algorithms by a
-large margin. 
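
The information-gain question selection described in the AQM abstract above
reduces to a small expected-entropy computation. Below is a sketch with a
hand-made prior over the answerer's possible intentions and a toy yes/no
answer model; all numbers and questions are invented for illustration.

# Pick the question whose answer is expected to shrink uncertainty most.
from math import log2

def entropy(p):
    return -sum(pi * log2(pi) for pi in p.values() if pi > 0)

prior = {"cat": 0.5, "dog": 0.3, "bird": 0.2}      # P(intention)
answer_model = {                                    # P(answer=yes | q, intention)
    "does it fly?": {"cat": 0.0, "dog": 0.0, "bird": 1.0},
    "is it furry?": {"cat": 1.0, "dog": 1.0, "bird": 0.0},
}

def info_gain(question):
    gain = entropy(prior)
    for ans in ("yes", "no"):
        # P(answer) and the posterior P(intention | answer) via Bayes' rule
        p_ans_given_i = {i: answer_model[question][i] if ans == "yes"
                         else 1 - answer_model[question][i] for i in prior}
        p_ans = sum(prior[i] * p_ans_given_i[i] for i in prior)
        if p_ans == 0:
            continue
        post = {i: prior[i] * p_ans_given_i[i] / p_ans for i in prior}
        gain -= p_ans * entropy(post)
    return gain

best = max(answer_model, key=info_gain)
print(best, info_gain(best))
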
-" -6677,1802.04028,Sarai Duek and Shaul Markovitch,"Automatic Generation of Language-Independent Features for Cross-Lingual - Classification",cs.CL," Many applications require categorization of text documents using predefined -categories. The main approach to performing text categorization is learning -from labeled examples. For many tasks, it may be difficult to find examples in -one language but easy in others. The problem of learning from examples in one -or more languages and classifying (categorizing) in another is called -cross-lingual learning. In this work, we present a novel approach that solves -the general cross-lingual text categorization problem. Our method generates, -for each training document, a set of language-independent features. Using these -features for training yields a language-independent classifier. At the -classification stage, we generate language-independent features for the -unlabeled document, and apply the classifier on the new representation. - To build the feature generator, we utilize a hierarchical -language-independent ontology, where each concept has a set of support -documents for each language involved. In the preprocessing stage, we use the -support documents to build a set of language-independent feature generators, -one for each language. The collection of these generators is used to map any -document into the language-independent feature space. - Our methodology works on the most general cross-lingual text categorization -problems, being able to learn from any mix of languages and classify documents -in any other language. We also present a method for exploiting the hierarchical -structure of the ontology to create virtual supporting documents for languages -that do not have them. We tested our method, using Wikipedia as our ontology, -on the most commonly used test collections in cross-lingual text -categorization, and found that it outperforms existing methods. -" -6678,1802.04140,Ian Stewart and Jacob Eisenstein,"Making ""fetch"" happen: The influence of social and linguistic context on - nonstandard word growth and decline",cs.CL," In an online community, new words come and go: today's ""haha"" may be replaced -by tomorrow's ""lol."" Changes in online writing are usually studied as a social -process, with innovations diffusing through a network of individuals in a -speech community. But unlike other types of innovation, language change is -shaped and constrained by the system in which it takes part. To investigate the -links between social and structural factors in language change, we undertake a -large-scale analysis of nonstandard word growth in the online community Reddit. -We find that dissemination across many linguistic contexts is a sign of growth: -words that appear in more linguistic contexts grow faster and survive longer. -We also find that social dissemination likely plays a less important role in -explaining word growth and decline than previously hypothesized. -" -6679,1802.04200,"Alexandre B\'erard, Laurent Besacier, Ali Can Kocabiyikoglu, Olivier - Pietquin",End-to-End Automatic Speech Translation of Audiobooks,cs.CL," We investigate end-to-end speech-to-text translation on a corpus of -audiobooks specifically augmented for this task. Previous works investigated -the extreme case where source language transcription is not available during -learning nor decoding, but we also study a midway case where source language -transcription is available at training time only. 
In this case, a single model
-is trained to decode source speech into target text in a single pass.
-Experimental results show that it is possible to train compact and efficient
-end-to-end speech translation models in this setup. We also distribute the
-corpus and hope that our speech translation baseline on this corpus will be
-challenged in the future.
-"
-6680,1802.04223,"Vlad Niculae, Andr\'e F. T. Martins, Mathieu Blondel, Claire Cardie",SparseMAP: Differentiable Sparse Structured Inference,stat.ML cs.CL cs.LG," Structured prediction requires searching over a combinatorial number of
-structures. To tackle it, we introduce SparseMAP: a new method for sparse
-structured inference, and its natural loss function. SparseMAP automatically
-selects only a few global structures: it is situated between MAP inference,
-which picks a single structure, and marginal inference, which assigns
-probability mass to all structures, including implausible ones. Importantly,
-SparseMAP can be computed using only calls to a MAP oracle, making it
-applicable to problems with intractable marginal inference, e.g., linear
-assignment. Sparsity makes gradient backpropagation efficient regardless of the
-structure, enabling us to augment deep neural networks with generic and sparse
-structured hidden layers. Experiments in dependency parsing and natural
-language inference reveal competitive accuracy, improved interpretability, and
-the ability to capture natural language ambiguities, which is attractive for
-pipeline systems.
-"
-6681,1802.04302,"Ishita Dasgupta, Demi Guo, Andreas Stuhlm\""uller, Samuel J. Gershman
- and Noah D. Goodman",Evaluating Compositionality in Sentence Embeddings,cs.CL stat.ML," An important challenge for human-like AI is compositional semantics. Recent
-research has attempted to address this by using deep neural networks to learn
-vector space embeddings of sentences, which then serve as input to other tasks.
-We present a new dataset for one such task, `natural language inference' (NLI),
-that cannot be solved using only word-level knowledge and requires some
-compositionality. We find that the performance of state-of-the-art sentence
-embeddings (InferSent; Conneau et al., 2017) on our new dataset is poor. We
-analyze the decision rules learned by InferSent and find that they are
-consistent with simple heuristics that are ecologically valid in its training
-dataset. Further, we find that augmenting training with our dataset improves
-test performance on our dataset without loss of performance on the original
-training dataset. This highlights the importance of structured datasets in
-better understanding and improving AI systems.
-"
-6682,1802.04335,"Illia Polosukhin, Alexander Skidanov","Neural Program Search: Solving Programming Tasks from Description and
- Examples",cs.AI cs.CL cs.PL," We present Neural Program Search, an algorithm to generate programs from a
-natural language description and a small number of input/output examples. The
-algorithm combines methods from the Deep Learning and Program Synthesis fields
-by designing a rich domain-specific language (DSL) and defining an efficient
-search algorithm guided by a Seq2Tree model over it. To evaluate the quality of
-the approach we also present a semi-synthetic dataset of descriptions with test
-examples and corresponding programs. We show that our algorithm significantly
-outperforms a sequence-to-sequence model with attention baseline.
-"
-6683,1802.04358,"Song Feng, R. Chulaka Gunasekara, Sunil Shashidhara, Kshitij P. 
Fadnis
- and Lazaros C. Polymenakos",A Unified Implicit Dialog Framework for Conversational Search,cs.CL," We propose a unified Implicit Dialog framework for goal-oriented,
-information-seeking tasks of Conversational Search applications. It aims to
-enable dialog interactions with domain data without relying on explicitly
-encoded rules, but utilizing the underlying data representation to build the
-components required for dialog interaction, which we refer to as Implicit
-Dialog in this work. The proposed framework consists of a pipeline of
-End-to-End trainable modules. A centralized knowledge representation is used to
-semantically ground multiple dialog modules. An associated set of tools is
-integrated with the framework to gather end users' input for continuous
-improvement of the system. The goal is to facilitate development of
-conversational systems by identifying the components and the data that can be
-adapted and reused across many end-user applications. We demonstrate our
-approach by creating conversational agents for several independent domains.
-"
-6684,1802.04394,"Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, Jianfeng Gao",M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search,cs.AI cs.CL cs.LG," Learning to walk over a graph towards a target node for a given query and a
-source node is an important problem in applications such as knowledge base
-completion (KBC). It can be formulated as a reinforcement learning (RL) problem
-with a known state transition model. To overcome the challenge of sparse
-rewards, we develop a graph-walking agent called M-Walk, which consists of a
-deep recurrent neural network (RNN) and Monte Carlo Tree Search (MCTS). The RNN
-encodes the state (i.e., history of the walked path) and maps it separately to
-a policy and Q-values. In order to effectively train the agent from sparse
-rewards, we combine MCTS with the neural policy to generate trajectories
-yielding more positive rewards. From these trajectories, the network is
-improved in an off-policy manner using Q-learning, which modifies the RNN
-policy via parameter sharing. Our proposed RL algorithm repeatedly applies this
-policy-improvement step to learn the model. At test time, MCTS is combined with
-the neural policy to predict the target node. Experimental results on several
-graph-walking benchmarks show that M-Walk is able to learn better policies than
-other RL-based methods, which are mainly based on policy gradients. M-Walk also
-outperforms traditional KBC baselines.
-"
-6685,1802.04425,Hannah Morrison and Chris Martens,"""How Was Your Weekend?"" A Generative Model of Phatic Conversation",cs.CL cs.AI," Unspoken social rules, such as those that govern choosing a proper discussion
-topic and when to change discussion topics, guide conversational behaviors. We
-propose a computational model of conversation that can follow or break such
-rules, with participant agents that respond accordingly. Additionally, we
-demonstrate an application of the model: the Experimental Social Tutor (EST), a
-first step toward a social skills training tool that generates human-readable
-conversation and a conversational guideline at each point in the dialogue.
-Finally, we discuss the design and results of a pilot study evaluating the EST.
-Results show that our model is capable of producing conversations that follow
-social norms. 
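
One way to picture the rule-following versus rule-breaking behavior in the
phatic-conversation abstract above is a toy topic-transition sketch. The topic
graph and the ""adjacent topics only"" norm below are invented placeholders,
not the authors' model.

# Minimal sketch: follow the topic-coherence norm, or deliberately break it.
import random

related = {                      # hypothetical topic adjacency
    "weekend": ["family", "travel"],
    "travel":  ["weekend", "food"],
    "family":  ["weekend"],
    "food":    ["travel"],
}

def next_topic(current, follow_norm=True):
    if follow_norm:              # socially expected: stay on related ground
        return random.choice(related[current])
    off_topic = [t for t in related
                 if t != current and t not in related[current]]
    return random.choice(off_topic)   # norm-breaking topic jump

random.seed(0)
topic = "weekend"
for turn in range(4):
    topic = next_topic(topic, follow_norm=(turn != 2))  # break the rule once
    print(turn, topic)
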
-
"
-6686,1802.04559,Carlos-Emiliano Gonz\'alez-Gallardo and Juan-Manuel Torres-Moreno,"Sentence Boundary Detection for French with Subword-Level Information
- Vectors and Convolutional Neural Networks",cs.CL," In this work we tackle the problem of sentence boundary detection applied to
-French as a binary classification task (""sentence boundary"" or ""not sentence
-boundary""). We combine convolutional neural networks with subword-level
-information vectors, which are word embedding representations learned from
-Wikipedia that take advantage of word morphology: each word is represented as a
-bag of its character n-grams.
- We use a large written dataset (French Gigaword) instead of standard-size
-transcriptions to train and evaluate the proposed architectures, with the
-intention of later applying the trained models to real-life ASR transcriptions.
- Three different architectures are tested, showing similar results; general
-accuracy for all models surpasses 0.96. All three models reach good F1 scores,
-with values over 0.97 for the ""not sentence boundary"" class. However, the
-""sentence boundary"" class shows lower scores, with the F1 metric dropping to
-0.778 for one of the models.
- Using subword-level information vectors seems to be very effective, leading
-us to conclude that the morphology of words encoded in the embedding
-representations behaves like pixels in an image, making the use of
-convolutional neural network architectures feasible.
-"
-6687,1802.04609,Abhik Jana and Pawan Goyal,Network Features Based Co-hyponymy Detection,cs.CL," Distinguishing lexical relations has been a long term pursuit in natural
-language processing (NLP) domain. Recently, in order to detect lexical
-relations like hypernymy, meronymy, co-hyponymy etc., distributional semantic
-models are being used extensively in some form or another. Even though a lot
-of efforts have been made for detecting the hypernymy relation, the problem of
-co-hyponymy detection has been rarely investigated. In this paper, we propose
-a novel supervised model in which various network measures are utilized to
-identify the co-hyponymy relation with high accuracy, performing better than
-or at par with the state-of-the-art models.
-"
-6688,1802.04675,"Parth Mehta, Gaurav Arora, Prasenjit Majumder","Attention based Sentence Extraction from Scientific Articles using
- Pseudo-Labeled data",cs.IR cs.AI cs.CL," In this work, we present a weakly supervised sentence extraction technique
-for identifying important sentences in scientific papers that are worthy of
-inclusion in the abstract. We propose a new attention based deep learning
-architecture that jointly learns to identify important content, as well as the
-cue phrases that are indicative of summary worthy sentences. We propose a new
-context embedding technique for determining the focus of a given paper using
-topic models and use it jointly with an LSTM based sequence encoder to learn
-attention weights across the sentence words. We use a collection of articles
-publicly available through ACL anthology for our experiments. Our system
-achieves a performance that is better, in terms of several ROUGE metrics, than
-several state-of-the-art extractive techniques. It also generates more
-coherent summaries and preserves the overall structure of the document. 
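
The attention-based sentence scoring idea in the extraction abstract above
can be sketched with random stand-in embeddings: words are attention-weighted
against a document ""focus"" vector and sentences ranked by the pooled score.
The embeddings, their dimensionality, and the focus vector are all invented;
the paper learns these with topic models and an LSTM encoder.

# Score sentences by attention-weighted relevance to a focus vector.
import numpy as np

rng = np.random.default_rng(0)
sentences = ["we propose a new model", "experiments show gains",
             "we thank the reviewers"]
word_vecs = {w: rng.normal(size=4) for s in sentences for w in s.split()}
focus = rng.normal(size=4)               # stand-in context embedding

def sentence_score(sentence):
    vecs = np.stack([word_vecs[w] for w in sentence.split()])
    att = np.exp(vecs @ focus)           # softmax attention weights
    att /= att.sum()
    pooled = att @ vecs                  # attention-weighted sentence vector
    return float(pooled @ focus)         # relevance to the document focus

ranked = sorted(sentences, key=sentence_score, reverse=True)
print(ranked[0])                         # most summary-worthy sentence
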
-
"
-6689,1802.04681,"Marzieh Fadaee, Arianna Bisazza, Christof Monz",Examining the Tip of the Iceberg: A Data Set for Idiom Translation,cs.CL," Neural Machine Translation (NMT) has been widely used in recent years with
-significant improvements for many language pairs. Although state-of-the-art NMT
-systems are generating progressively better translations, idiom translation
-remains one of the open challenges in this field. Idioms, a category of
-multiword expressions, are an interesting language phenomenon where the overall
-meaning of the expression cannot be composed from the meanings of its parts. A
-first important challenge is the lack of dedicated data sets for learning and
-evaluating idiom translation. In this paper we address this problem by creating
-the first large-scale data set for idiom translation. Our data set is
-automatically extracted from a widely used German-English translation corpus
-and includes, for each language direction, a targeted evaluation set where all
-sentences contain idioms and a regular training corpus where sentences
-including idioms are marked. We release this data set and use it to perform
-preliminary NMT experiments as the first step towards better idiom translation.
-"
-6690,1802.04744,Tommaso Pasini and Jose Camacho-Collados,A Short Survey on Sense-Annotated Corpora,cs.CL," Large sense-annotated datasets are increasingly necessary for training deep
-supervised systems in Word Sense Disambiguation. However, gathering
-high-quality sense-annotated data for as many instances as possible is a
-laborious and expensive task. This has led to the proliferation of automatic
-and semi-automatic methods for overcoming the so-called knowledge-acquisition
-bottleneck. In this short survey we present an overview of sense-annotated
-corpora, annotated either manually- or (semi)automatically, that are currently
-available for different languages and featuring distinct lexical resources as
-inventory of senses, i.e. WordNet, Wikipedia, BabelNet. Furthermore, we provide
-the reader with general statistics of each dataset and an analysis of their
-specific features.
-"
-6691,1802.05014,Amaru Cuba Gyllensten and Magnus Sahlgren,Distributional Term Set Expansion,cs.CL," This paper is a short empirical study of the performance of centrality and
-classification based iterative term set expansion methods for distributional
-semantic models. Iterative term set expansion is an interactive process using
-distributional semantics models where a user labels terms as belonging to some
-sought after term set, and a system uses this labeling to supply the user with
-new, candidate, terms to label, trying to maximize the number of positive
-examples found. While centrality based methods have a long history in term set
-expansion, we compare them to classification methods based on the Simple
-Margin method, an Active Learning approach to classification using Support
-Vector Machines. Examining the performance of various centrality and
-classification based methods for a variety of distributional models over five
-different term sets, we can show that active learning based methods
-consistently outperform centrality based methods. 
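
A centrality-based expansion step, as described in the abstract above,
reduces to ranking candidate terms by similarity to the centroid of the seed
set. The 3-dimensional toy vectors below stand in for a real distributional
semantic model.

# Rank candidates by cosine similarity to the seed-set centroid.
import numpy as np

emb = {"cat": np.array([1.0, 0.1, 0.0]), "dog": np.array([0.9, 0.2, 0.1]),
       "wolf": np.array([0.8, 0.3, 0.0]), "car": np.array([0.0, 0.1, 1.0])}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

seeds = ["cat", "dog"]
centroid = np.mean([emb[t] for t in seeds], axis=0)
candidates = sorted((t for t in emb if t not in seeds),
                    key=lambda t: cosine(emb[t], centroid), reverse=True)
print(candidates)  # ['wolf', 'car'] -- "wolf" is proposed for labeling first
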
-
"
-6692,1802.05092,"Odette Scharenborg, Laurent Besacier, Alan Black, Mark
- Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre
- Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur,
- Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad,
- Liming Wang, Emmanuel Dupoux","Linguistic unit discovery from multi-modal inputs in unwritten
- languages: Summary of the ""Speaking Rosetta"" JSALT 2017 Workshop",cs.CL," We summarize the accomplishments of a multi-disciplinary workshop exploring
-the computational and scientific issues surrounding the discovery of linguistic
-units (subwords and words) in a language without orthography. We study the
-replacement of orthographic transcriptions by images and/or translated text in
-a well-resourced language to help unsupervised discovery from raw speech.
-"
-6693,1802.05121,"Shashank Gupta, Manish Gupta, Vasudeva Varma, Sachin Pawar, Nitin
- Ramrakhiyani, Girish K. Palshikar",Co-training for Extraction of Adverse Drug Reaction Mentions from Tweets,cs.IR cs.CL," Adverse drug reactions (ADRs) are one of the leading causes of mortality in
-health care. Current ADR surveillance systems are often associated with a
-substantial time lag before such events are officially published. On the other
-hand, online social media such as Twitter contain information about ADR events
-in real-time, much before any official reporting. Current state-of-the-art
-methods in ADR mention extraction use Recurrent Neural Networks (RNN), which
-typically need large labeled corpora. Towards this end, we propose a
-semi-supervised method based on co-training which can exploit a large pool of
-unlabeled tweets to augment the limited supervised training data, and as a
-result enhance the performance. Experiments with 0.1M tweets show that the
-proposed approach outperforms the state-of-the-art methods for the ADR mention
-extraction task by 5% in terms of F1 score.
-"
-6694,1802.05130,"Shashank Gupta, Manish Gupta, Vasudeva Varma, Sachin Pawar, Nitin
- Ramrakhiyani and Girish K. Palshikar","Multi-Task Learning for Extraction of Adverse Drug Reaction Mentions
- from Tweets",cs.IR cs.CL," Adverse drug reactions (ADRs) are one of the leading causes of mortality in
-health care. Current ADR surveillance systems are often associated with a
-substantial time lag before such events are officially published. On the other
-hand, online social media such as Twitter contain information about ADR events
-in real-time, much before any official reporting. Current state-of-the-art in
-ADR mention extraction uses Recurrent Neural Networks (RNN), which typically
-need large labeled corpora. Towards this end, we propose a multi-task learning
-based method which can utilize a similar auxiliary task (adverse drug event
-detection) to enhance the performance of the main task, i.e., ADR extraction.
-Furthermore, in the absence of an auxiliary task dataset, we propose a novel
-joint multi-task learning method to automatically generate a weak supervision
-dataset for the auxiliary task when a large pool of unlabeled tweets is
-available. Experiments with 0.48M tweets show that the proposed approach
-outperforms the state-of-the-art methods for the ADR mention extraction task
-by 7.2% in terms of F1 score. 
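
The co-training scheme in the ADR extraction abstracts above can be sketched
at the sentence level with two feature views that take turns pseudo-labeling
the most confident unlabeled example. This is a simplified sketch: the tiny
corpus is invented, both views share one growing training set for brevity,
and the real systems use RNN taggers rather than logistic regression.

# Co-training sketch with two "views" (word vs. character TF-IDF).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled = [("this drug gave me a headache", 1),
           ("took the pill with breakfast", 0),
           ("severe nausea after the dose", 1),
           ("refilled my prescription today", 0)]
pool = ["awful dizziness since starting it",
        "picked it up at the pharmacy"]

views = [TfidfVectorizer(analyzer="word"),
         TfidfVectorizer(analyzer="char", ngram_range=(2, 4))]

texts = [t for t, y in labeled]
labels = [y for t, y in labeled]
for _ in range(2):                      # two co-training rounds
    for view in views:
        if not pool:
            break
        clf = LogisticRegression().fit(view.fit_transform(texts), labels)
        probs = clf.predict_proba(view.transform(pool))
        best = int(probs.max(axis=1).argmax())   # most confident example
        texts.append(pool.pop(best))             # promote its pseudo-label
        labels.append(int(probs[best].argmax()))
print(list(zip(texts[4:], labels[4:])))          # pseudo-labeled additions
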
-
"
-6695,1802.05300,"Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel","Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe
- Noise",cs.LG cs.CL cs.CV cs.NE," The growing importance of massive datasets used for deep learning makes
-robustness to label noise a critical property for classifiers to have. Sources
-of label noise include automatic labeling, non-expert labeling, and label
-corruption by data poisoning adversaries. Numerous previous works assume that
-no source of labels can be trusted. We relax this assumption and assume that a
-small subset of the training data is trusted. This enables substantial label
-corruption robustness performance gains. In addition, particularly severe label
-noise can be combated by using a set of trusted data with clean labels. We
-utilize trusted data by proposing a loss correction technique that utilizes
-trusted examples in a data-efficient manner to mitigate the effects of label
-noise on deep neural network classifiers. Across vision and natural language
-processing tasks, we experiment with various label noises at several strengths,
-and show that our method significantly outperforms existing methods.
-"
-6696,1802.05322,Adam Nyberg,Classifying movie genres by analyzing text reviews,cs.CL," This paper proposes a method for classifying movie genres by only looking at
-text reviews. The data used are from the Large Movie Review Dataset v1.0 and
-IMDb. This paper compares a K-nearest neighbors (KNN) model and a multilayer
-perceptron (MLP) that uses tf-idf as input features. The paper also discusses
-different evaluation metrics used when doing multi-label classification. For
-the data used in this research, the KNN model performed the best with an
-accuracy of 55.4\% and a Hamming loss of 0.047.
-"
-6697,1802.05365,"Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner,
- Christopher Clark, Kenton Lee, Luke Zettlemoyer",Deep contextualized word representations,cs.CL," We introduce a new type of deep contextualized word representation that
-models both (1) complex characteristics of word use (e.g., syntax and
-semantics), and (2) how these uses vary across linguistic contexts (i.e., to
-model polysemy). Our word vectors are learned functions of the internal states
-of a deep bidirectional language model (biLM), which is pre-trained on a large
-text corpus. We show that these representations can be easily added to existing
-models and significantly improve the state of the art across six challenging
-NLP problems, including question answering, textual entailment and sentiment
-analysis. We also present an analysis showing that exposing the deep internals
-of the pre-trained network is crucial, allowing downstream models to mix
-different types of semi-supervision signals.
-"
-6698,1802.05368,"Jiatao Gu, Hany Hassan, Jacob Devlin, Victor O.K. Li","Universal Neural Machine Translation for Extremely Low Resource
- Languages",cs.CL," In this paper, we propose a new universal machine translation approach
-focusing on languages with a limited amount of parallel data. Our proposed
-approach utilizes transfer learning to share lexical and sentence-level
-representations across multiple source languages into one target language. The
-lexical part is shared through a Universal Lexical Representation to support
-multilingual word-level sharing. The sentence-level sharing is represented by a
-model of experts from all source languages that share the
-source encoders with all other languages. 
This enables the low-resource
-language to utilize the lexical and sentence representations of the higher
-resource languages. Our approach is able to achieve 23 BLEU on Romanian-English
-WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU
-of a strong baseline system which uses multilingual training and
-back-translation. Furthermore, we show that the proposed approach can achieve
-almost 20 BLEU on the same dataset through fine-tuning a pre-trained
-multi-lingual system in a zero-shot setting.
-"
-6699,1802.05373,"Guozhen An, Mehrnoosh Shafiee, Davood Shamsi","Improving Retrieval Modeling Using Cross Convolution Networks And Multi
-  Frequency Word Embedding",cs.CL," To build a satisfying chatbot that has the ability to manage a
-goal-oriented multi-turn dialogue, accurate modeling of human conversation is
-crucial. In this paper we concentrate on the task of response selection for
-multi-turn human-computer conversation with a given context. Previous
-approaches show weakness in capturing information of rare keywords that appear
-in either or both context and correct response, and struggle with long input
-sequences. We propose Cross Convolution Network (CCN) and Multi Frequency word
-embedding to address both problems. We train several models using the Ubuntu
-Dialogue dataset which is the largest freely available multi-turn based
-dialogue corpus. We further build an ensemble model by averaging predictions of
-multiple models. We achieve a new state-of-the-art on this dataset with
-considerable improvements compared to previous best results.
-"
-6700,1802.05383,"Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio,
-  Mark Hasegawa-Johnson",Deep Learning Based Speech Beamforming,cs.CL cs.AI cs.SD eess.AS eess.SP," Multi-channel speech enhancement with ad-hoc sensors has been a challenging
-task. Speech model guided beamforming algorithms are able to recover natural
-sounding speech, but the speech models tend to be oversimplified or the
-inference would otherwise be too complicated. On the other hand, deep learning
-based enhancement approaches are able to learn complicated speech distributions
-and perform efficient inference, but they are unable to deal with a variable
-number of input channels. Also, deep learning approaches introduce a lot of
-errors, particularly in the presence of unseen noise types and settings. We
-have therefore proposed an enhancement framework called DEEPBEAM, which
-combines the two complementary classes of algorithms. DEEPBEAM introduces a
-beamforming filter to produce natural sounding speech, but the filter
-coefficients are determined with the help of a monaural speech enhancement
-neural network. Experiments on synthetic and real-world data show that DEEPBEAM
-is able to produce clean, dry and natural sounding speech, and is robust
-against unseen noise.
-"
-6701,1802.05412,Chan Woo Kim,"NtMalDetect: A Machine Learning Approach to Malware Detection Using
-  Native API System Calls",cs.CR cs.CL," As computing systems become increasingly advanced and as users increasingly
-engage themselves in technology, security has never been a greater concern. In
-malware detection, static analysis, the method of analyzing potentially
-malicious files, has been the prominent approach. This approach, however,
-quickly falls short as malicious programs become more advanced and adopt the
-capabilities of obfuscating their binaries to execute the same malicious
-functions, making static analysis extremely difficult for newer variants.
The
-approach assessed in this paper is a novel dynamic malware analysis method,
-which may generalize better than static analysis to newer variants. Inspired by
-recent successes in Natural Language Processing (NLP), widely used document
-classification techniques were assessed in detecting malware by doing such
-analysis on system calls, which contain useful information about the operation
-of a program as requests that the program makes of the kernel. Features
-considered are extracted from system call traces of benign and malicious
-programs, and the task of classifying these traces is treated as a binary document
-classification task. The system call traces were
-processed to remove the parameters to only leave the system call function
-names. The features were grouped into various n-grams and weighted with Term
-Frequency-Inverse Document Frequency. This paper shows that Linear Support
-Vector Machines (SVM) optimized by Stochastic Gradient Descent and the
-traditional Coordinate Descent on the Wolfe Dual form of the SVM are effective
-in this approach, achieving up to 96% accuracy with a 95% recall score.
-Additional contributions include the identification of significant system call
-sequences that could be avenues for further research.
-"
-6702,1802.05415,Sumeet S. Singh,"Teaching Machines to Code: Neural Markup Generation with Visual
-  Attention",cs.LG cs.CL cs.CV cs.NE," We present a neural transducer model with visual attention that learns to
-generate LaTeX markup of a real-world math formula given its image. Applying
-sequence modeling and transduction techniques that have been very successful
-across modalities such as natural language, image, handwriting, speech and
-audio, we construct an image-to-markup model that learns to produce
-syntactically and semantically correct LaTeX markup code over 150 words long
-and achieves a BLEU score of 89%, improving upon the previous state of the art for
-the Im2Latex problem. We also demonstrate with heat-map visualization how
-attention helps in interpreting the model and can pinpoint (detect and
-localize) symbols on the image accurately despite having been trained without
-any bounding box data.
-"
-6703,1802.05574,Paul Groth and Michael Lauruhn and Antony Scerri and Ron Daniel Jr,Open Information Extraction on Scientific Text: An Evaluation,cs.CL," Open Information Extraction (OIE) is the task of the unsupervised creation of
-structured information from text. OIE is often used as a starting point for a
-number of downstream tasks including knowledge base construction, relation
-extraction, and question answering. While OIE methods are targeted at being
-domain independent, they have been evaluated primarily on newspaper,
-encyclopedic or general web text. In this article, we evaluate the performance
-of OIE on scientific texts originating from 10 different disciplines. To do so,
-we use two state-of-the-art OIE systems applying a crowd-sourcing approach. We
-find that OIE systems perform significantly worse on scientific text than
-encyclopedic text. We also provide an error analysis and suggest areas of work
-to reduce errors. Our corpus of sentences and judgments is made available.
-"
-6704,1802.05577,"Reza Ghaeini, Sadid A. Hasan, Vivek Datla, Joey Liu, Kathy Lee,
-  Ashequl Qadir, Yuan Ling, Aaditya Prakash, Xiaoli Z.
Fern and Oladimeji Farri","DR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language
-  Inference",cs.CL," We present a novel deep learning architecture to address the natural language
-inference (NLI) task. Existing approaches mostly rely on simple reading
-mechanisms for independent encoding of the premise and hypothesis. Instead, we
-propose a novel dependent reading bidirectional LSTM network (DR-BiLSTM) to
-efficiently model the relationship between a premise and a hypothesis during
-encoding and inference. We also introduce a sophisticated ensemble strategy to
-combine our proposed models, which noticeably improves final predictions.
-Finally, we demonstrate how the results can be improved further with an
-additional preprocessing step. Our evaluation shows that DR-BiLSTM obtains the
-best single model and ensemble model results achieving the new state-of-the-art
-scores on the Stanford NLI dataset.
-"
-6705,1802.05583,"Tiberiu Boros, Stefan Daniel Dumitrescu and Vasile Pais","Tools and resources for Romanian text-to-speech and speech-to-text
-  applications",cs.CL," In this paper we introduce a set of resources and tools aimed at providing
-support for natural language processing, text-to-speech synthesis and speech
-recognition for Romanian. While the tools are general purpose and can be used
-for any language (we successfully trained our system for more than 50 languages
-and participated in the Universal Dependencies Shared Task), the resources are
-only relevant for Romanian language processing.
-"
-6706,1802.05630,"Caroline Etienne, Guillaume Fidanza, Andrei Petrovskii, Laurence
-  Devillers and Benoit Schmauch","CNN+LSTM Architecture for Speech Emotion Recognition with Data
-  Augmentation",cs.SD cs.CL cs.LG eess.AS," In this work we design a neural network for recognizing emotions in speech,
-using the IEMOCAP dataset. Following the latest advances in audio analysis, we
-use an architecture involving both convolutional layers, for extracting
-high-level features from raw spectrograms, and recurrent ones for aggregating
-long-term dependencies. We examine the techniques of data augmentation with
-vocal tract length perturbation, layer-wise optimizer adjustment, batch
-normalization of recurrent layers and obtain highly competitive results of
-64.5% for weighted accuracy and 61.7% for unweighted accuracy on four emotions.
-"
-6707,1802.05667,Atish Pawar and Vijay Mago,"Calculating the similarity between words and sentences using a lexical
-  database and corpus statistics",cs.CL," Calculating the semantic similarity between sentences is a long-standing problem
-in the area of natural language processing. The semantic analysis field has a
-crucial role to play in research related to text analytics. The
-semantic similarity differs as the domain of operation differs. In this paper,
-we present a methodology which deals with this issue by incorporating semantic
-similarity and corpus statistics. To calculate the semantic similarity between
-words and sentences, the proposed method follows an edge-based approach using a
-lexical database. The methodology can be applied in a variety of domains. The
-methodology has been tested on both benchmark standards and a mean human
-similarity dataset. When tested on these two datasets, it gives the highest
-correlation value for both word and sentence similarity, outperforming other
-similar models. For word similarity, we obtained a Pearson correlation
-coefficient of 0.8753 and for sentence similarity, the correlation obtained is
-0.8794.
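
[Editor's note: the edge-based, lexical-database component of the similarity method above can be illustrated with WordNet path similarity. This is a hedged sketch of the general idea, not the authors' exact formula, which additionally folds in corpus statistics:

    import nltk
    from nltk.corpus import wordnet as wn

    nltk.download("wordnet", quiet=True)

    def word_similarity(w1, w2):
        # Best path similarity over all sense pairs: 1.0 for a shared
        # synset, decreasing with the number of is-a edges in between.
        scores = [s1.path_similarity(s2)
                  for s1 in wn.synsets(w1)
                  for s2 in wn.synsets(w2)]
        return max((s for s in scores if s is not None), default=0.0)

    print(word_similarity("car", "automobile"))  # high: shared synset
    print(word_similarity("car", "banana"))      # low: distant in the hierarchy
]
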
-" -6708,1802.05672,"Reza Ghaeini, Xiaoli Z. Fern, Liang Huang, Prasad Tadepalli",Event Nugget Detection with Forward-Backward Recurrent Neural Networks,cs.CL," Traditional event detection methods heavily rely on manually engineered rich -features. Recent deep learning approaches alleviate this problem by automatic -feature engineering. But such efforts, like tradition methods, have so far only -focused on single-token event mentions, whereas in practice events can also be -a phrase. We instead use forward-backward recurrent neural networks (FBRNNs) to -detect events that can be either words or phrases. To the best our knowledge, -this is one of the first efforts to handle multi-word events and also the first -attempt to use RNNs for event detection. Experimental results demonstrate that -FBRNN is competitive with the state-of-the-art methods on the ACE 2005 and the -Rich ERE 2015 event detection tasks. -" -6709,1802.05694,"Xilun Chen, Claire Cardie",Multinomial Adversarial Networks for Multi-Domain Text Classification,cs.CL cs.LG stat.ML," Many text classification tasks are known to be highly domain-dependent. -Unfortunately, the availability of training data can vary drastically across -domains. Worse still, for some domains there may not be any annotated data at -all. In this work, we propose a multinomial adversarial network (MAN) to tackle -the text classification problem in this real-world multidomain setting (MDTC). -We provide theoretical justifications for the MAN framework, proving that -different instances of MANs are essentially minimizers of various f-divergence -metrics (Ali and Silvey, 1966) among multiple probability distributions. MANs -are thus a theoretically sound generalization of traditional adversarial -networks that discriminate over two distributions. More specifically, for the -MDTC task, MAN learns features that are invariant across multiple domains by -resorting to its ability to reduce the divergence among the feature -distributions of each domain. We present experimental results showing that MANs -significantly outperform the prior art on the MDTC task. We also show that MANs -achieve state-of-the-art performance for domains with no labeled data. -" -6710,1802.05695,"James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, Jacob - Eisenstein",Explainable Prediction of Medical Codes from Clinical Text,cs.CL cs.LG stat.ML," Clinical notes are text documents that are created by clinicians for each -patient encounter. They are typically accompanied by medical codes, which -describe the diagnosis and treatment. Annotating these codes is labor intensive -and error prone; furthermore, the connection between the codes and the text is -not annotated, obscuring the reasons and details behind specific diagnoses and -treatments. We present an attentional convolutional network that predicts -medical codes from clinical text. Our method aggregates information across the -document using a convolutional neural network, and uses an attention mechanism -to select the most relevant segments for each of the thousands of possible -codes. The method is accurate, achieving precision@8 of 0.71 and a Micro-F1 of -0.54, which are both better than the prior state of the art. 
Furthermore,
-through an interpretability evaluation by a physician, we show that the
-attention mechanism identifies meaningful explanations for each code assignment.
-"
-6711,1802.05737,Kamal Sarkar,"JU_KS@SAIL_CodeMixed-2017: Sentiment Analysis for Indian Code Mixed
-  Social Media Texts",cs.CL," This paper reports on our work in the NLP Tool Contest @ICON-2017, shared
-task on Sentiment Analysis for Indian Languages (SAIL) (code mixed). To
-implement our system, we have used a machine learning algorithm called
-Multinomial Na\""ive Bayes trained using n-gram and SentiWordnet features. We
-have also used a small SentiWordnet for English and a small SentiWordnet for
-Bengali. But we have not used any SentiWordnet for the Hindi language. We have
-tested our system on Hindi-English and Bengali-English code mixed social media
-data sets released for the contest. The performance of our system is very close
-to the best system that participated in the contest. For both Bengali-English and
-Hindi-English runs, our system was ranked at the 3rd position out of all
-submitted runs and awarded the 3rd prize in the contest.
-"
-6712,1802.05758,Christian Stab and Tristan Miller and Iryna Gurevych,"Cross-topic Argument Mining from Heterogeneous Sources Using
-  Attention-based Neural Networks",cs.CL," Argument mining is a core technology for automating argument search in large
-document collections. Despite its usefulness for this task, most current
-approaches to argument mining are designed for use only with specific text
-types and fall short when applied to heterogeneous texts. In this paper, we
-propose a new sentential annotation scheme that is reliably applicable by crowd
-workers to arbitrary Web texts. We source annotations for over 25,000 instances
-covering eight controversial topics. The results of cross-topic experiments
-show that our attention-based neural network generalizes best to unseen topics
-and outperforms vanilla BiLSTM models by 6% in accuracy and 11% in F-score.
-"
-6713,1802.05766,"Yan Zhang, Jonathon Hare, Adam Pr\""ugel-Bennett","Learning to Count Objects in Natural Images for Visual Question
-  Answering",cs.CV cs.CL," Visual Question Answering (VQA) models have struggled with counting objects
-in natural images so far. We identify a fundamental problem due to soft
-attention in these models as a cause. To circumvent this problem, we propose a
-neural network component that allows robust counting from object proposals.
-Experiments on a toy task show the effectiveness of this component and we
-obtain state-of-the-art accuracy on the number category of the VQA v2 dataset
-without negatively affecting other categories, even outperforming ensemble
-models with our single model. On a difficult balanced pair metric, the
-component gives a substantial improvement in counting over a strong baseline by
-6.6%.
-"
-6714,1802.05818,"Shuai Wang, Mianwei Zhou, Sahisnu Mazumder, Bing Liu, Yi Chang","Disentangling Aspect and Opinion Words in Target-based Sentiment
-  Analysis using Lifelong Learning",cs.CL cs.AI," Given a target name, which can be a product aspect or entity, identifying its
-aspect words and opinion words in a given corpus is a fine-grained task in
-target-based sentiment analysis (TSA). This task is challenging, especially
-when we have no labeled data and we want to perform it for any given domain. To
-address it, we propose a general two-stage approach. Stage one extracts/groups
-the target-related words (called t-words) for a given target.
This is relatively
-easy as we can apply an existing semantics-based learning technique. Stage two
-separates the aspect and opinion words from the grouped t-words, which is
-challenging because we often do not have enough word-level aspect and opinion
-labels. In this work, we formulate this problem in a PU learning setting and
-incorporate the idea of lifelong learning to solve it. Experimental results
-show the effectiveness of our approach.
-"
-6715,1802.05853,"Vikramjit Mitra, Wen Wang, Chris Bartels, Horacio Franco and Dimitra
-  Vergyri","Articulatory information and Multiview Features for Large Vocabulary
-  Continuous Speech Recognition",cs.CL cs.SD eess.AS," This paper explores the use of multi-view features and their discriminative
-transforms in a convolutional deep neural network (CNN) architecture for a
-continuous large vocabulary speech recognition task. Mel-filterbank energies
-and perceptually motivated forced damped oscillator coefficient (DOC) features
-are used after feature-space maximum-likelihood linear regression (fMLLR)
-transforms, which are combined and fed as a multi-view feature to a single CNN
-acoustic model. Use of the multi-view feature representation demonstrated a
-significant reduction in word error rates (WERs) compared to the use of
-individual features by themselves. In addition, when articulatory information
-was used as an additional input to a fused deep neural network (DNN) and CNN
-acoustic model, it was found to demonstrate a further reduction in WER for the
-Switchboard subset and the CallHome subset (containing partly non-native
-accented speech) of the NIST 2000 conversational telephone speech test set,
-reducing the error rate by 12% relative to the baseline in both cases. This
-work shows that multi-view features in association with articulatory
-information can improve speech recognition robustness to spontaneous and
-non-native speech.
-"
-6716,1802.05883,Miguel Rios and Wilker Aziz and Khalil Sima'an,Deep Generative Model for Joint Alignment and Word Representation,cs.CL cs.AI," This work exploits translation data as a source of semantically relevant
-learning signal for models of word representation. In particular, we exploit
-equivalence through translation as a form of distributed context and jointly
-learn how to embed and align with a deep generative model. Our EmbedAlign model
-embeds words in their complete observed context and learns by marginalisation
-of latent lexical alignments. Moreover, it embeds words as posterior probability
-densities, rather than point estimates, which allows us to compare words in
-context using a measure of overlap between distributions (e.g. KL divergence).
-We investigate our model's performance on a range of lexical semantics tasks,
-achieving competitive results on several standard benchmarks including natural
-language inference, paraphrasing, and text similarity.
-"
-6717,1802.05930,K M Annervaz and Somnath Basu Roy Chowdhury and Ambedkar Dukkipati,"Learning beyond datasets: Knowledge Graph Augmented Neural Networks for
-  Natural language Processing",cs.CL," Machine Learning has been the quintessential solution for many AI problems,
-but learning is still heavily dependent on the specific training data. Some
-learning models can incorporate prior knowledge in a Bayesian setup, but these
-learning models do not have the ability to access any organised
-world knowledge on demand.
In this work, we propose to enhance learning models
-with world knowledge in the form of Knowledge Graph (KG) fact triples for
-Natural Language Processing (NLP) tasks. Our aim is to develop a deep learning
-model that can extract relevant prior support facts from knowledge graphs
-depending on the task using an attention mechanism. We introduce a
-convolution-based model for learning representations of knowledge graph entity
-and relation clusters in order to reduce the attention space. We show that the
-proposed method is highly scalable to the amount of prior information that has
-to be processed and can be applied to any generic NLP task. Using this method
-we show significant improvement in performance for text classification with
-the News20 and DBPedia datasets and natural language inference with the Stanford Natural
-Language Inference (SNLI) dataset. We also demonstrate that a deep learning
-model can be trained well with a substantially smaller amount of labeled training
-data, when it has access to organised world knowledge in the form of a knowledge
-graph.
-"
-6718,1802.05934,Somnath Basu Roy Chowdhury and K M Annervaz and Ambedkar Dukkipati,"Instance-based Inductive Deep Transfer Learning by Cross-Dataset
-  Querying with Locality Sensitive Hashing",cs.CL," Supervised learning models are typically trained on a single dataset and the
-performance of these models relies heavily on the size of the dataset, i.e.,
-the amount of data available with the ground truth. Learning algorithms try to
-generalize solely based on the data they are presented with during training.
-In this work, we propose an inductive transfer learning method that can augment
-learning models by infusing similar instances from different learning tasks in
-the Natural Language Processing (NLP) domain. We propose to use instance
-representations from a source dataset, \textit{without inheriting anything}
-from the source learning model. Representations of the instances of
-\textit{source} \& \textit{target} datasets are learned, retrieval of relevant
-source instances is performed using a soft-attention mechanism and
-\textit{locality sensitive hashing}, and then augmented into the model during
-training on the target dataset. Our approach simultaneously exploits the local
-\textit{instance level information} as well as the macro statistical viewpoint
-of the dataset. Using this approach we have shown significant improvements for
-three major news classification datasets over the baseline. Experimental
-evaluations also show that the proposed approach reduces dependency on labeled
-data by a significant margin for comparable performance. With our proposed
-cross dataset learning procedure we show that one can achieve
-competitive/better performance than learning from a single dataset.
-"
-6719,1802.06003,"Takatomo Kano, Sakriani Sakti, Satoshi Nakamura","Structured-based Curriculum Learning for End-to-end English-Japanese
-  Speech Translation",cs.CL cs.SD eess.AS," Sequence-to-sequence attention-based neural network architectures have been
-shown to provide a powerful model for machine translation and speech
-recognition. Recently, several works have attempted to extend the models for
-the end-to-end speech translation task. However, the usefulness of these models
-was only investigated on language pairs with similar syntax and word order
-(e.g., English-French or English-Spanish).
In this work, we focus on end-to-end
-speech translation tasks on syntactically distant language pairs (e.g.,
-English-Japanese) that require distant word reordering.
- To guide the encoder-decoder attentional model to learn this difficult
-problem, we propose a structured-based curriculum learning strategy.
- Unlike conventional curriculum learning that gradually emphasizes difficult
-data examples, we formalize learning strategies from easier network structures
-to more difficult network structures. Here, we start the training with an
-end-to-end encoder-decoder for the speech recognition or text-based machine
-translation task, then gradually move to the end-to-end speech translation task. The
-experimental results show that the proposed approach provides significant
-improvements in comparison with the one without curriculum learning.
-"
-6720,1802.06006,"Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou",Neural Voice Cloning with a Few Samples,cs.CL cs.LG cs.SD eess.AS," Voice cloning is a highly desired feature for personalized speech interfaces.
-Neural network based speech synthesis has been shown to generate high quality
-speech for a large number of speakers. In this paper, we introduce a neural
-voice cloning system that takes a few audio samples as input. We study two
-approaches: speaker adaptation and speaker encoding. Speaker adaptation is
-based on fine-tuning a multi-speaker generative model with a few cloning
-samples. Speaker encoding is based on training a separate model to directly
-infer a new speaker embedding from cloning audios and to be used with a
-multi-speaker generative model. In terms of naturalness of the speech and its
-similarity to the original speaker, both approaches can achieve good performance,
-even with very few cloning audios. While speaker adaptation can achieve better
-naturalness and similarity, the cloning time or required memory for the speaker
-encoding approach is significantly less, making it favorable for low-resource
-deployment.
-"
-6721,1802.06007,"Daniel Lichtblau, Catalin Stoean",Authorship Attribution Using the Chaos Game Representation,cs.CL cs.DL cs.IR," The Chaos Game Representation, a method for creating images from nucleotide
-sequences, is modified to make images from chunks of text documents. Machine
-learning methods are then applied to train classifiers based on authorship.
-Experiments are conducted on several benchmark data sets in English, including
-the widely used Federalist Papers, and one in Portuguese. Validation results
-for the trained classifiers are competitive with the best methods in prior
-literature. The methodology is also successfully applied for text
-categorization with encouraging results. One classifier method is moreover seen
-to hold promise for the task of digital fingerprinting.
-"
-6722,1802.06024,"Sahisnu Mazumder, Nianzu Ma and Bing Liu",Towards a Continuous Knowledge Learning Engine for Chatbots,cs.CL cs.AI cs.HC," Although chatbots have been very popular in recent years, they still have
-some serious weaknesses which limit the scope of their applications. One major
-weakness is that they cannot learn new knowledge during the conversation
-process, i.e., their knowledge is fixed beforehand and cannot be expanded or
-updated during conversation. In this paper, we propose to build a general
-knowledge learning engine for chatbots to enable them to continuously and
-interactively learn new knowledge during conversations.
As time goes by, they
-become more and more knowledgeable and better and better at learning and
-conversation. We model the task as an open-world knowledge base completion
-problem and propose a novel technique called lifelong interactive learning and
-inference (LiLi) to solve it. LiLi works by imitating how humans acquire
-knowledge and perform inference during an interactive conversation. Our
-experimental results show LiLi is highly promising.
-"
-6723,1802.06041,Marianna J. Martindale and Marine Carpuat,"Fluency Over Adequacy: A Pilot Study in Measuring User Trust in
-  Imperfect MT",cs.CL," Although measuring intrinsic quality has been a key factor in the advancement
-of Machine Translation (MT), successfully deploying MT requires considering not
-just intrinsic quality but also the user experience, including aspects such as
-trust. This work introduces a method of studying how users modulate their trust
-in an MT system after seeing errorful (disfluent or inadequate) output amidst
-good (fluent and adequate) output. We conduct a survey to determine how users
-respond to good translations compared to translations that are either adequate
-but not fluent, or fluent but not adequate. In this pilot study, users
-responded strongly to disfluent translations, but were, surprisingly, much less
-concerned with adequacy.
-"
-6724,1802.06053,"Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark
-  Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget,
-  Fran\c{c}ois Yvon, Sanjeev Khudanpur",Bayesian Models for Unit Discovery on a Very Low Resource Language,cs.CL," Developing speech technologies for low-resource languages has become a very
-active research field over the last decade. Among others, Bayesian models have
-shown some promising results on artificial examples but still lack in situ
-experiments. Our work applies state-of-the-art Bayesian models to unsupervised
-Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also
-show that Bayesian models can naturally integrate information from other
-resourceful languages by means of informative prior leading to more consistent
-discovered units. Finally, discovered acoustic units are used, either as the
-1-best sequence or as a lattice, to perform word segmentation. Word
-segmentation results show that this Bayesian approach clearly outperforms a
-Segmental-DTW baseline on the same corpus.
-"
-6725,1802.06079,"Gerhard J\""ager",Global-scale phylogenetic linguistic inference from lexical resources,cs.CL q-bio.QM," Automatic phylogenetic inference plays an increasingly important role in
-computational historical linguistics. Most pertinent work is currently based on
-expert cognate judgments. This limits the scope of this approach to a small
-number of well-studied language families. We used machine learning techniques
-to compile data suitable for phylogenetic inference from the ASJP database, a
-collection of almost 7,000 phonetically transcribed word lists over 40
-concepts, covering two thirds of the extant world-wide linguistic diversity.
-First, we estimated Pointwise Mutual Information scores between sound classes
-using weighted sequence alignment and general-purpose optimization. From this
-we computed a dissimilarity matrix over all ASJP word lists. This matrix is
-suitable for distance-based phylogenetic inference. Second, we applied cognate
-clustering to the ASJP data, using supervised training of an SVM classifier on
-expert cognacy judgments.
Third, we defined two types of binary characters,
-based on automatically inferred cognate classes and on sound-class occurrences.
-Several tests are reported demonstrating the suitability of these characters
-for character-based phylogenetic inference.
-"
-6726,1802.06185,"Vikas Reddy, Amrith Krishna, Vishnu Dutt Sharma, Prateek Gupta,
-  Vineeth M R, Pawan Goyal",Building a Word Segmenter for Sanskrit Overnight,cs.CL cs.IR," There is an abundance of digitised texts available in Sanskrit. However, the
-word segmentation task in such texts is challenging due to the issue of
-'Sandhi'. In Sandhi, words in a sentence often fuse together to form a single
-chunk of text, where the word delimiter vanishes and sounds at the word
-boundaries undergo transformations, which is also reflected in the written
-text. Here, we propose an approach that uses a deep sequence to sequence
-(seq2seq) model that takes only the sandhied string as the input and predicts
-the unsandhied string. The state-of-the-art models are linguistically involved
-and have external dependencies for the lexical and morphological analysis of
-the input. Our model can be trained ""overnight"" and be used for production. In
-spite of the knowledge-lean approach, our system performs better than the
-current state of the art, with a relative improvement of 16.79%.
-"
-6727,1802.06196,"Abhik Jana, Pawan Goyal","Can Network Embedding of Distributional Thesaurus be Combined with Word
-  Vectors for Better Representation?",cs.CL," Distributed representations of words learned from text have proved to be
-successful in various natural language processing tasks in recent times. While
-some methods represent words as vectors computed from text using a predictive
-model (Word2vec) or a dense count based model (GloVe), others attempt to
-represent these in a distributional thesaurus network structure where the
-neighborhood of a word is a set of words having adequate context overlap.
-Motivated by the recent surge of research in network embedding techniques
-(DeepWalk, LINE, node2vec, etc.), we turn a distributional thesaurus network
-into dense word vectors and investigate the usefulness of distributional
-thesaurus embedding in improving overall word representation. This is the first
-attempt where we show that combining the proposed word representation obtained
-by distributional thesaurus embedding with the state-of-the-art word
-representations helps in improving the performance by a significant margin when
-evaluated against NLP tasks like word similarity and relatedness, synonym
-detection, analogy detection. Additionally, we show that even without using any
-handcrafted lexical resources we can come up with representations having
-comparable performance in the word similarity and relatedness tasks compared to
-the representations where a lexical resource has been used.
-"
-6728,1802.06209,"Maghilnan S, Rajesh Kumar M",Sentiment Analysis on Speaker Specific Speech Data,cs.CL cs.SD eess.AS," Sentiment analysis has evolved over the past few decades; most of the work in it
-has revolved around textual sentiment analysis with text mining techniques. But
-audio sentiment analysis is still in a nascent stage in the research community.
-In this proposed research, we perform sentiment analysis on speaker
-discriminated speech transcripts to detect the emotions of the individual
-speakers involved in the conversation.
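
[Editor's note: a hedged sketch of the sentiment half of such a pipeline on a speaker-discriminated transcript. The paper compares several techniques; the dictionary-based VADER analyzer below is only an illustrative stand-in, not the authors' chosen method:

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    transcript = [  # (speaker, utterance) pairs are assumed inputs
        ("A", "I really loved the movie we watched yesterday."),
        ("B", "Honestly, I found it boring and far too long."),
    ]
    for speaker, utterance in transcript:
        c = sia.polarity_scores(utterance)["compound"]  # compound score in [-1, 1]
        label = "positive" if c >= 0.05 else "negative" if c <= -0.05 else "neutral"
        print(speaker, label, c)
]
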
We analyzed different techniques to
-perform speaker discrimination and sentiment analysis to find efficient
-algorithms to perform this task.
-"
-6729,1802.06412,"Florian Kreyssig, Chao Zhang, Philip Woodland",Improved TDNNs using Deep Kernels and Frequency Dependent Grid-RNNs,cs.CL cs.AI cs.SD eess.AS stat.ML," Time delay neural networks (TDNNs) are an effective acoustic model for large
-vocabulary speech recognition. The strength of the model can be attributed to
-its ability to effectively model long temporal contexts. However, current TDNN
-models are relatively shallow, which limits the modelling capability. This
-paper proposes a method of increasing the network depth by deepening the kernel
-used in the TDNN temporal convolutions. The best performing kernel consists of
-three fully connected layers with a residual (ResNet) connection from the
-output of the first to the output of the third. The addition of
-spectro-temporal processing as the input to the TDNN in the form of a
-convolutional neural network (CNN) and a newly designed Grid-RNN was
-investigated. The Grid-RNN strongly outperforms a CNN if different sets of
-parameters for different frequency bands are used and can be further enhanced
-by using a bi-directional Grid-RNN. Experiments using the multi-genre broadcast
-(MGB3) English data (275h) show that deep kernel TDNNs reduce the word error
-rate (WER) by 6% relative and, when combined with the frequency dependent
-Grid-RNN, give a relative WER reduction of 9%.
-"
-6730,1802.06428,"Fengyi Tang, Kaixiang Lin, Ikechukwu Uchendu, Hiroko H. Dodge, Jiayu
-  Zhou","Improving Mild Cognitive Impairment Prediction via Reinforcement
-  Learning and Dialogue Simulation",cs.LG cs.CL stat.ML," Mild cognitive impairment (MCI) is a prodromal phase in the progression from
-normal aging to dementia, especially Alzheimer's disease. Even though there is
-mild cognitive decline in MCI patients, they have normal overall cognition and
-are thus challenging to distinguish from normal aging. Using transcribed data
-obtained from recorded conversational interactions between participants and
-trained interviewers, and applying supervised learning models to these data, a
-recent clinical trial has shown a promising result in differentiating MCI from
-normal aging. However, the substantial amount of interactions with medical
-staff can still incur significant medical care expenses in practice. In this
-paper, we propose a novel reinforcement learning (RL) framework to train an
-efficient dialogue agent on existing transcripts from clinical trials.
-Specifically, the agent is trained to sketch a disease-specific lexical
-probability distribution, and thus to converse in a way that maximizes the
-diagnosis accuracy and minimizes the number of conversation turns. We evaluate
-the performance of the proposed reinforcement learning framework on the MCI
-diagnosis from a real clinical trial. The results show that while using only a
-few turns of conversation, our framework can significantly outperform
-state-of-the-art supervised learning approaches.
-"
-6731,1802.06613,Ivan Habernal and Henning Wachsmuth and Iryna Gurevych and Benno Stein,"Before Name-calling: Dynamics and Triggers of Ad Hominem Fallacies in
-  Web Argumentation",cs.CL," Arguing without committing a fallacy is one of the main requirements of an
-ideal debate. But even when debating rules are strictly enforced and fallacious
-arguments punished, arguers often lapse into attacking the opponent by an ad
-hominem argument.
As existing research lacks solid empirical investigation of
-the typology of ad hominem arguments as well as their potential causes, this
-paper fills this gap by (1) performing several large-scale annotation studies,
-(2) experimenting with various neural architectures and validating our working
-hypotheses, such as controversy or reasonableness, and (3) providing linguistic
-insights into triggers of ad hominem using explainable neural network
-architectures.
-"
-6732,1802.06655,Antonios Anastasopoulos and David Chiang,Tied Multitask Learning for Neural Speech Translation,cs.CL," We explore multitask models for neural translation of speech, augmenting them
-in order to reflect two intuitive notions. First, we introduce a model where
-the second task decoder receives information from the decoder of the first
-task, since higher-level intermediate representations should provide useful
-information. Second, we apply regularization that encourages transitivity and
-invertibility. We show that the application of these notions on jointly trained
-models improves performance on the tasks of low-resource speech transcription
-and translation. It also leads to better performance when using attention
-information for word discovery over unsegmented input.
-"
-6733,1802.06757,"Guillem Cucurull, Pau Rodr\'iguez, V. Oguz Yazici, Josep M. Gonfaus,
-  F. Xavier Roca, Jordi Gonz\`alez","Deep Inference of Personality Traits by Integrating Image and Word Use
-  in Social Networks",cs.CY cs.CL cs.CV," Social media, as a major platform for communication and information exchange,
-is a rich repository of the opinions and sentiments of 2.3 billion users about
-a vast spectrum of topics. To sense the whys of certain social user's demands
-and cultural-driven interests, however, the knowledge embedded in the 1.8
-billion pictures which are uploaded daily in public profiles has just started
-to be exploited, since this process has typically been text-based.
-Following this trend on visual-based social analysis, we present a novel
-methodology based on Deep Learning to build a combined image-and-text based
-personality trait model, trained with images posted together with words found
-highly correlated to specific personality traits. The key contribution here
-is to explore whether OCEAN personality trait modeling can be addressed based
-on images, here called \emph{Mind{P}ics}, appearing with certain tags with
-psychological insights. We found that there is a correlation between those
-posted images and their accompanying texts, which can be successfully modeled
-using deep neural networks for personality estimation. The experimental results
-are consistent with previous cyber-psychology results based on texts or images.
-In addition, classification results on some traits show that some patterns
-emerge in the set of images corresponding to a specific text, in essence to
-those representing an abstract concept. These results open new avenues of
-research for further refining the proposed personality model under the
-supervision of psychology experts.
-"
-6734,1802.06764,Michele Pasquini and Maurizio Serva,"Stability of meanings versus rate of replacement of words: an
-  experimental test",cs.CL physics.soc-ph," The words of a language are randomly replaced in time by new ones, but it has
-long been known that words corresponding to some items (meanings) are less
-frequently replaced than others.
Usually, the rate of replacement for a given
-item is not directly observable, but it is inferred by the estimated stability
-which, on the contrary, is observable. This idea goes back a long way in the
-lexicostatistical literature; nevertheless, nothing ensures that it gives the
-correct answer. The family of Romance languages allows for a direct test of the
-estimated stabilities against the replacement rates since the proto-language
-(Latin) is known and the replacement rates can be explicitly computed. The
-output of the test is threefold: first, we prove that the standard approach
-which tries to infer the replacement rates through the estimated stabilities is
-sound; second, we are able to rewrite the fundamental formula of
-Glottochronology for a non-universal replacement rate (a rate which depends on
-the item); third, we give indisputable evidence that the stability ranking is
-far from being the same for different families of languages. This last result
-is also supported by comparison with the Malagasy family of dialects. As a side
-result we also provide some evidence that Vulgar Latin and not Late Classical
-Latin is at the root of modern Romance languages.
-"
-6735,1802.06842,"Hady Elsahar, Christophe Gravier, Frederique Laforest","Zero-Shot Question Generation from Knowledge Graphs for Unseen
-  Predicates and Entity Types",cs.CL," We present a neural model for question generation from knowledge base triples
-in a ""Zero-Shot"" setup, that is, generating questions for triples containing
-predicates, subject types or object types that were not seen at training time.
-Our model leverages triple occurrences in the natural language corpus in an
-encoder-decoder architecture, paired with an original part-of-speech copy
-action mechanism to generate questions. Benchmark and human evaluation show
-that our model sets a new state-of-the-art for zero-shot QG.
-"
-6736,1802.06861,Vikramjit Mitra and Horacio Franco,"Interpreting DNN output layer activations: A strategy to cope with
-  unseen data in speech recognition",cs.CL cs.SD eess.AS," Unseen data can degrade the performance of deep neural net acoustic models. To
-cope with unseen data, adaptation techniques are deployed. For unlabeled unseen
-data, one must generate some hypothesis given an existing model, which is used
-as the label for model adaptation. However, assessing the goodness of the
-hypothesis can be difficult, and an erroneous hypothesis can lead to poorly
-trained models. In such cases, a strategy to select data having reliable
-hypothesis can ensure better model adaptation. This work proposes a
-data-selection strategy for DNN model adaptation, where DNN output layer
-activations are used to ascertain the goodness of a generated hypothesis. In a
-DNN acoustic model, the output layer activations are used to generate target
-class probabilities. Under unseen data conditions, the difference between the
-most probable target and the next most probable target is decreased compared to
-the same for seen data, indicating that the model may be uncertain while
-generating its hypothesis. This work proposes a strategy to assess a model's
-performance by analyzing the output layer activations using a distance
-measure between the most likely target and the next most likely target, which
-is used for data selection for performing unsupervised adaptation.
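
[Editor's note: the selection criterion sketched below is one plausible reading of this strategy; the exact distance measure and threshold are not specified in the abstract, so both are assumptions. The idea: average the per-frame gap between the top two output-layer posteriors and adapt only on data whose gap is large:

    import numpy as np

    def hypothesis_margin(posteriors):
        # posteriors: (frames, targets) from the DNN output layer.
        top2 = np.sort(posteriors, axis=1)[:, -2:]
        return float(np.mean(top2[:, 1] - top2[:, 0]))  # small gap => uncertain model

    frames = np.random.dirichlet(np.ones(500), size=300)   # stand-in activations
    select_for_adaptation = hypothesis_margin(frames) > 0.2  # assumed threshold
]
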
-" -6737,1802.06893,"Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas - Mikolov",Learning Word Vectors for 157 Languages,cs.CL cs.LG," Distributed word representations, or word vectors, have recently been applied -to many tasks in natural language processing, leading to state-of-the-art -performance. A key ingredient to the successful application of these -representations is to train them on very large corpora, and use these -pre-trained models in downstream tasks. In this paper, we describe how we -trained such high quality word representations for 157 languages. We used two -sources of data to train these models: the free online encyclopedia Wikipedia -and data from the common crawl project. We also introduce three new word -analogy datasets to evaluate these word vectors, for French, Hindi and Polish. -Finally, we evaluate our pre-trained word vectors on 10 languages for which -evaluation datasets exists, showing very strong performance compared to -previous models. -" -6738,1802.06894,"Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos","Learning Hidden Markov Models from Pairwise Co-occurrences with - Application to Topic Modeling",cs.CL cs.LG eess.SP stat.ML," We present a new algorithm for identifying the transition and emission -probabilities of a hidden Markov model (HMM) from the emitted data. -Expectation-maximization becomes computationally prohibitive for long -observation records, which are often required for identification. The new -algorithm is particularly suitable for cases where the available sample size is -large enough to accurately estimate second-order output probabilities, but not -higher-order ones. We show that if one is only able to obtain a reliable -estimate of the pairwise co-occurrence probabilities of the emissions, it is -still possible to uniquely identify the HMM if the emission probability is -\emph{sufficiently scattered}. We apply our method to hidden topic Markov -modeling, and demonstrate that we can learn topics with higher quality if -documents are modeled as observations of HMMs sharing the same emission (topic) -probability, compared to the simple but widely used bag-of-words model. -" -6739,1802.06901,"Jason Lee, Elman Mansimov, Kyunghyun Cho","Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative - Refinement",cs.LG cs.CL stat.ML," We propose a conditional non-autoregressive neural sequence model based on -iterative refinement. The proposed model is designed based on the principles of -latent variable models and denoising autoencoders, and is generally applicable -to any sequence generation task. We extensively evaluate the proposed model on -machine translation (En-De and En-Ro) and image caption generation, and observe -that it significantly speeds up decoding while maintaining the generation -quality comparable to the autoregressive counterpart. -" -6740,1802.06941,"Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Bin Liu","Distilling Knowledge Using Parallel Data for Far-field Speech - Recognition",cs.CL cs.SD eess.AS," In order to improve the performance for far-field speech recognition, this -paper proposes to distill knowledge from the close-talking model to the -far-field model using parallel data. The close-talking model is called the -teacher model. The far-field model is called the student model. The student -model is trained to imitate the output distributions of the teacher model. 
This
-constraint can be realized by minimizing the Kullback-Leibler (KL) divergence
-between the output distribution of the student model and the teacher model.
-Experimental results on the AMI corpus show that the best student model achieves up
-to 4.7% absolute word error rate (WER) reduction when compared with the
-conventionally-trained baseline models.
-"
-6741,1802.06950,"Tirthankar Ghosal, Amitra Salam, Swati Tiwari, Asif Ekbal, Pushpak
-  Bhattacharyya",TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection,cs.CL," Detecting the novelty of an entire document is an Artificial Intelligence (AI)
-frontier problem that has widespread NLP applications, such as extractive
-document summarization, tracking development of news events, predicting impact
-of scholarly articles, etc. Important though the problem is, we are unaware of
-any benchmark document level data that correctly addresses the evaluation of
-automatic novelty detection techniques in a classification framework. To bridge
-this gap, we present here a resource for benchmarking the techniques for
-document level novelty detection. We create the resource via event-specific
-crawling of news documents across several domains in a periodic manner. We
-release the annotated corpus with necessary statistics and show its use with a
-developed system for the problem of concern.
-"
-6742,1802.07089,"Qiuyuan Huang, Li Deng, Dapeng Wu, Chang Liu, Xiaodong He",Attentive Tensor Product Learning,cs.CL cs.AI cs.LG cs.NE," This paper proposes a new architecture - Attentive Tensor Product Learning
-(ATPL) - to represent grammatical structures in deep learning models. ATPL
-bridges the gap between deep learning and explicit language structures and
-rules by exploiting Tensor Product Representations (TPR), a structured
-neural-symbolic model developed in cognitive science. The key ideas of ATPL
-are: 1) unsupervised learning of role-unbinding vectors of words via a
-TPR-based deep neural network; 2) employing attention modules to compute TPR;
-and 3) integration of TPR with typical deep learning architectures including
-Long Short-Term Memory (LSTM) and Feedforward Neural Network (FFNN). The
-novelty of our approach lies in its ability to extract the grammatical
-structure of a sentence by using role-unbinding vectors, which are obtained in
-an unsupervised manner. This ATPL approach is applied to 1) image captioning,
-2) part of speech (POS) tagging, and 3) constituency parsing of a sentence.
-Experimental results demonstrate the effectiveness of the proposed approach.
-"
-6743,1802.07117,"Ana Paula Appel, Paulo Rodrigo Cavalin, Marisa Affonso Vasconcelos,
-  Claudio Santos Pinhanez",Combining Textual Content and Structure to Improve Dialog Similarity,cs.CL," Chatbots, taking advantage of the success of the messaging apps and recent
-advances in Artificial Intelligence, have become very popular, from helping
-businesses improve customer services to chatting to users for the sake of
-conversation and engagement (celebrity or personal bots). However, developing
-and improving a chatbot requires understanding the data generated by its
-users. Dialog data has a different nature from a simple question-and-answer
-interaction, in which context and temporal properties (turn order) create a
-different understanding of such data. In this paper, we propose a novel
-metric to compute dialog similarity based not only on the text content but
-also on the information related to the dialog structure.
Our experimental -results performed over the Switchboard dataset show that using evidence from -both textual content and the dialog structure leads to more accurate results -than using each measure in isolation. -" -6744,1802.07170,"Xiaolin Wang, Masao Utiyama, Eiichiro Sumita","CytonMT: an Efficient Neural Machine Translation Open-source Toolkit - Implemented in C++",cs.CL," This paper presents an open-source neural machine translation toolkit named -CytonMT (https://github.com/arthurxlw/cytonMt). The toolkit is built from -scratch only using C++ and NVIDIA's GPU-accelerated libraries. The toolkit -features training efficiency, code simplicity and translation quality. -Benchmarks show that CytonMT accelerates the training speed by 64.5% to 110.8% -on neural networks of various sizes, and achieves competitive translation -quality. -" -6745,1802.07226,"Pengxiang Cheng, Katrin Erk",Implicit Argument Prediction with Event Knowledge,cs.CL," Implicit arguments are not syntactically connected to their predicates, and -are therefore hard to extract. Previous work has used models with large numbers -of features, evaluated on very small datasets. We propose to train models for -implicit argument prediction on a simple cloze task, for which data can be -generated automatically at scale. This allows us to use a neural model, which -draws on narrative coherence and entity salience for predictions. We show that -our model has superior performance on both synthetic and natural data. -" -6746,1802.07370,Siddhartha Brahma,SufiSent - Universal Sentence Representations Using Suffix Encodings,cs.CL cs.AI," Computing universal distributed representations of sentences is a fundamental -task in natural language processing. We propose a method to learn such -representations by encoding the suffixes of word sequences in a sentence and -training on the Stanford Natural Language Inference (SNLI) dataset. We -demonstrate the effectiveness of our approach by evaluating it on the SentEval -benchmark, improving on existing approaches on several transfer tasks. -" -6747,1802.07374,Siddhartha Brahma,On the scaling of polynomial features for representation matching,cs.CL cs.AI," In many neural models, new features as polynomial functions of existing ones -are used to augment representations. Using the natural language inference task -as an example, we investigate the use of scaled polynomials of degree 2 and -above as matching features. We find that scaling degree 2 features has the -highest impact on performance, reducing classification error by 5% in the best -models. -" -6748,1802.07420,"Siddharth Dalmia, Ramon Sanabria, Florian Metze and Alan W. Black",Sequence-based Multi-lingual Low Resource Speech Recognition,cs.CL cs.SD eess.AS," Techniques for multi-lingual and cross-lingual speech recognition can help in -low resource scenarios, to bootstrap systems and enable analysis of new -languages and domains. End-to-end approaches, in particular sequence-based -techniques, are attractive because of their simplicity and elegance. While it -is possible to integrate traditional multi-lingual bottleneck feature -extractors as front-ends, we show that end-to-end multi-lingual training of -sequence models is effective on context independent models trained using -Connectionist Temporal Classification (CTC) loss. We show that our model -improves performance on Babel languages by over 6% absolute in terms of -word/phoneme error rate when compared to mono-lingual systems built in the same -setting for these languages. 
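
[Editor's note: a hedged sketch of the CTC objective behind such sequence-based multilingual models, over a shared phoneme inventory. All sizes below are illustrative assumptions, not values from the paper:

    import torch
    import torch.nn as nn

    T, B, C = 200, 4, 120  # frames, batch, shared phoneme labels (blank = 0)
    logits = torch.randn(T, B, C, requires_grad=True)  # stand-in encoder output
    targets = torch.randint(1, C, (B, 30), dtype=torch.long)
    input_lengths = torch.full((B,), T, dtype=torch.long)
    target_lengths = torch.full((B,), 30, dtype=torch.long)

    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    loss = ctc(logits.log_softmax(dim=-1), targets, input_lengths, target_lengths)
    loss.backward()  # gradients flow back into the shared encoder
]
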
We also show that the trained model can be adapted
-cross-lingually to an unseen language using just 25% of the target data. We
-show that training on multiple languages is important for very low resource
-cross-lingual target scenarios, but not for multi-lingual testing scenarios.
-Here, it appears beneficial to include large, well-prepared datasets.
-"
-6749,1802.07459,"Bang Liu, Di Niu, Haojie Wei, Jinghong Lin, Yancheng He, Kunfeng Lai,
-  Yu Xu",Matching Article Pairs with Graphical Decomposition and Convolutions,cs.CL cs.IR," Identifying the relationship between two articles, e.g., whether two articles
-published from different sources describe the same breaking news, is critical
-to many document understanding tasks. Existing approaches for modeling and
-matching sentence pairs do not perform well in matching longer documents, which
-embody more complex interactions between the enclosed entities than a sentence
-does. To model article pairs, we propose the Concept Interaction Graph to
-represent an article as a graph of concepts. We then match a pair of articles
-by comparing the sentences that enclose the same concept vertex through a
-series of encoding techniques, and aggregate the matching signals through a
-graph convolutional network. To facilitate the evaluation of long article
-matching, we have created two datasets, each consisting of about 30K pairs of
-breaking news articles covering diverse topics in the open domain. Extensive
-evaluations of the proposed methods on the two datasets demonstrate significant
-improvements over a wide range of state-of-the-art methods for natural language
-matching.
-"
-6750,1802.07839,"Kevin Tian, Teng Zhang, James Zou","CoVeR: Learning Covariate-Specific Vector Representations with Tensor
-  Decompositions",cs.CL," Word embedding is a useful approach to capture co-occurrence structures in
-large text corpora. However, in addition to the text data itself, we often have
-additional covariates associated with individual corpus documents---e.g. the
-demographic of the author, time and venue of publication---and we would like
-the embedding to naturally capture this information. We propose CoVeR, a new
-tensor decomposition model for vector embeddings with covariates. CoVeR jointly
-learns a \emph{base} embedding for all the words as well as a weighted diagonal
-matrix to model how each covariate affects the base embedding. To obtain author
-or venue-specific embedding, for example, we can then simply multiply the base
-embedding by the associated transformation matrix. The main advantages of our
-approach are data efficiency and interpretability of the covariate
-transformation. Our experiments demonstrate that our joint model learns
-substantially better covariate-specific embeddings compared to the standard
-approach of learning a separate embedding for each covariate using only the
-relevant subset of data, as well as other related methods. Furthermore, CoVeR
-encourages the embeddings to be ""topic-aligned"" in that the dimensions have
-specific independent meanings. This allows our covariate-specific embeddings to
-be compared by topic, enabling downstream differential analysis. We empirically
-evaluate the benefits of our algorithm on datasets, and demonstrate how it can
-be used to address many natural questions about covariate effects.
- Accompanying code to this paper can be found at
-http://github.com/kjtian/CoVeR.
-"
-6751,1802.07858,"Sudipta Kar and Suraj Maharjan and A.
Pastor L\'opez-Monroy and Thamar - Solorio",MPST: A Corpus of Movie Plot Synopses with Tags,cs.CL," Social tagging of movies reveals a wide range of heterogeneous information -about movies, like the genre, plot structure, soundtracks, metadata, visual and -emotional experiences. Such information can be valuable in building automatic -systems to create tags for movies. Automatic tagging systems can help -recommendation engines to improve the retrieval of similar movies as well as -help viewers to know what to expect from a movie in advance. In this paper, we -set out to the task of collecting a corpus of movie plot synopses and tags. We -describe a methodology that enabled us to build a fine-grained set of around 70 -tags exposing heterogeneous characteristics of movie plots and the multi-label -associations of these tags with some 14K movie plot synopses. We investigate -how these tags correlate with movies and the flow of emotions throughout -different types of movies. Finally, we use this corpus to explore the -feasibility of inferring tags from plot synopses. We expect the corpus will be -useful in other tasks where analysis of narratives is relevant. -" -6752,1802.07859,"Zubair Shah, Paige Martin, Enrico Coiera, Kenneth D. Mandl, Adam G. - Dunn","Modeling Spatiotemporal Factors Associated With Sentiment on Twitter: - Synthesis and Suggestions for Improving the Identification of Localized - Deviations",cs.SI cs.CL," Background: Studies examining how sentiment on social media varies depending -on timing and location appear to produce inconsistent results, making it hard -to design systems that use sentiment to detect localized events for public -health applications. - Objective: The aim of this study was to measure how common timing and -location confounders explain variation in sentiment on Twitter. - Methods: Using a dataset of 16.54 million English-language tweets from 100 -cities posted between July 13 and November 30, 2017, we estimated the positive -and negative sentiment for each of the cities using a dictionary-based -sentiment analysis and constructed models to explain the differences in -sentiment using time of day, day of week, weather, city, and interaction type -(conversations or broadcasting) as factors and found that all factors were -independently associated with sentiment. - Results: In the full multivariable model of positive (Pearson r in test data -0.236; 95\% CI 0.231-0.241) and negative (Pearson r in test data 0.306; 95\% CI -0.301-0.310) sentiment, the city and time of day explained more of the variance -than weather and day of week. Models that account for these confounders produce -a different distribution and ranking of important events compared with models -that do not account for these confounders. - Conclusions: In public health applications that aim to detect localized -events by aggregating sentiment across populations of Twitter users, it is -worthwhile accounting for baseline differences before looking for unexpected -changes. -" -6753,1802.07860,Arindam Jati and Panayiotis Georgiou,"Neural Predictive Coding using Convolutional Neural Networks towards - Unsupervised Learning of Speaker Characteristics",cs.SD cs.CL eess.AS," Learning speaker-specific features is vital in many applications like speaker -recognition, diarization and speech recognition. 
This paper presents a novel
-approach, which we term Neural Predictive Coding (NPC), to learn
-speaker-specific characteristics in a completely unsupervised manner from large
-amounts of unlabeled training data that even contain many non-speech events and
-multi-speaker audio streams. The NPC framework exploits the proposed short-term
-active-speaker stationarity hypothesis, which assumes that two temporally-close
-short speech segments belong to the same speaker, and thus a common
-representation that can encode the commonalities of both segments should
-capture the vocal characteristics of that speaker. We train a convolutional
-deep siamese network to produce ""speaker embeddings"" by learning to separate
-`same' vs `different' speaker pairs which are generated from unlabeled audio
-streams. Two sets of experiments are done in different scenarios to evaluate
-the strength of NPC embeddings and to compare with state-of-the-art in-domain
-supervised methods. First, two speaker identification experiments with
-different context lengths are performed in a scenario with comparatively
-limited within-speaker channel variability. NPC embeddings are found to perform
-best in the short-duration experiment, and they provide complementary
-information to i-vectors for full-utterance experiments. Second, a large-scale
-speaker verification task having a wide range of within-speaker channel
-variability is adopted as an upper-bound experiment where comparisons are drawn
-with in-domain supervised methods.
-"
-6754,1802.07862,"Seungwhan Moon, Leonardo Neves, Vitor Carvalho",Multimodal Named Entity Recognition for Short Social Media Posts,cs.CL," We introduce a new task called Multimodal Named Entity Recognition (MNER) for
-noisy user-generated data such as tweets or Snapchat captions, which comprise
-short text with accompanying images. These social media posts often come in
-inconsistent or incomplete syntax and lexical notations with very limited
-surrounding textual contexts, bringing significant challenges for NER. To this
-end, we create a new dataset for MNER called SnapCaptions (Snapchat
-image-caption pairs submitted to public and crowd-sourced stories with fully
-annotated named entities). We then build upon the state-of-the-art Bi-LSTM
-word/character based NER models with 1) a deep image network which incorporates
-relevant visual context to augment textual information, and 2) a generic
-modality-attention module which learns to attenuate irrelevant modalities while
-amplifying the most informative ones to extract contexts from, adaptive to each
-sample and token. The proposed MNER model with modality attention significantly
-outperforms the state-of-the-art text-only NER models by successfully
-leveraging provided visual contexts, opening up potential applications of MNER
-on myriads of social media platforms.
-"
-6755,1802.07997,"Heng Ding, Shuo Zhang, Dar\'io Garigliotti, and Krisztian Balog","Generating High-Quality Query Suggestion Candidates for Task-Based
- Search",cs.IR cs.AI cs.CL," We address the task of generating query suggestions for task-based search.
-The current state of the art relies heavily on suggestions provided by a major
-search engine. In this paper, we solve the task without reliance on search
-engines. Specifically, we focus on the first step of a two-stage pipeline
-approach, which is dedicated to the generation of query suggestion candidates.
-We present three methods for generating candidate suggestions and apply them on
-multiple information sources.
Using a purpose-built test collection, we find -that these methods are able to generate high-quality suggestion candidates. -" -6756,1802.08010,Dar\'io Garigliotti and Krisztian Balog,Towards an Understanding of Entity-Oriented Search Intents,cs.IR cs.AI cs.CL," Entity-oriented search deals with a wide variety of information needs, from -displaying direct answers to interacting with services. In this work, we aim to -understand what are prominent entity-oriented search intents and how they can -be fulfilled. We develop a scheme of entity intent categories, and use them to -annotate a sample of queries. Specifically, we annotate unique query refiners -on the level of entity types. We observe that, on average, over half of those -refiners seek to interact with a service, while over a quarter of the refiners -search for information that may be looked up in a knowledge base. -" -6757,1802.08129,"Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt - Schiele, Trevor Darrell, Marcus Rohrbach","Multimodal Explanations: Justifying Decisions and Pointing to the - Evidence",cs.AI cs.CL cs.CV," Deep models that are both effective and explainable are desirable in many -settings; prior explainable models have been unimodal, offering either -image-based visualization of attention weights or text-based generation of -post-hoc justifications. We propose a multimodal approach to explanation, and -argue that the two modalities provide complementary explanatory strengths. We -collect two new datasets to define and evaluate this task, and propose a novel -model which can provide joint textual rationale generation and attention -visualization. Our datasets define visual and textual justifications of a -classification decision for activity recognition tasks (ACT-X) and for visual -question answering tasks (VQA-X). We quantitatively show that training with the -textual explanations not only yields better textual justification models, but -also better localizes the evidence that supports the decision. We also -qualitatively show cases where visual explanation is more insightful than -textual explanation, and vice versa, supporting our thesis that multimodal -explanation models offer significant benefits over unimodal approaches. -" -6758,1802.08148,"Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri - and Axel-Cyrille Ngonga Ngomo",LIDIOMS: A Multilingual Linked Idioms Data Set,cs.CL," In this paper, we describe the LIDIOMS data set, a multilingual RDF -representation of idioms currently containing five languages: English, German, -Italian, Portuguese, and Russian. The data set is intended to support natural -language processing applications by providing links between idioms across -languages. The underlying data was crawled and integrated from various sources. -To ensure the quality of the crawled data, all idioms were evaluated by at -least two native speakers. Herein, we present the model devised for structuring -the data. We also provide the details of linking LIDIOMS to well-known -multilingual data sets such as BabelNet. The resulting data set complies with -best practices according to Linguistic Linked Open Data Community. 
-" -6759,1802.08150,"Diego Moussallem, Thiago Castro Ferreira, Marcos Zampieri, Maria - Claudia Cavalcanti, Geraldo Xex\'eo, Mariana Neves, Axel-Cyrille Ngonga Ngomo",RDF2PT: Generating Brazilian Portuguese Texts from RDF Data,cs.CL," The generation of natural language from Resource Description Framework (RDF) -data has recently gained significant attention due to the continuous growth of -Linked Data. A number of these approaches generate natural language in -languages other than English, however, no work has been proposed to generate -Brazilian Portuguese texts out of RDF. We address this research gap by -presenting RDF2PT, an approach that verbalizes RDF data to Brazilian Portuguese -language. We evaluated RDF2PT in an open questionnaire with 44 native speakers -divided into experts and non-experts. Our results suggest that RDF2PT is able -to generate text which is similar to that generated by humans and can hence be -easily understood. -" -6760,1802.08218,"Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen - Grauman, Jiebo Luo, and Jeffrey P. Bigham",VizWiz Grand Challenge: Answering Visual Questions from Blind People,cs.CV cs.CL cs.HC," The study of algorithms to automatically answer visual questions currently is -motivated by visual question answering (VQA) datasets constructed in artificial -VQA settings. We propose VizWiz, the first goal-oriented VQA dataset arising -from a natural VQA setting. VizWiz consists of over 31,000 visual questions -originating from blind people who each took a picture using a mobile phone and -recorded a spoken question about it, together with 10 crowdsourced answers per -visual question. VizWiz differs from the many existing VQA datasets because (1) -images are captured by blind photographers and so are often poor quality, (2) -questions are spoken and so are more conversational, and (3) often visual -questions cannot be answered. Evaluation of modern algorithms for answering -visual questions and deciding if a visual question is answerable reveals that -VizWiz is a challenging dataset. We introduce this dataset to encourage a -larger community to develop more generalized algorithms that can assist blind -people. -" -6761,1802.08301,"Chandra Bhagavatula and Sergey Feldman and Russell Power and Waleed - Ammar",Content-Based Citation Recommendation,cs.CL cs.DL cs.IR," We present a content-based method for recommending citations in an academic -paper draft. We embed a given query document into a vector space, then use its -nearest neighbors as candidates, and rerank the candidates using a -discriminative model trained to distinguish between observed and unobserved -citations. Unlike previous work, our method does not require metadata such as -author names which can be missing, e.g., during the peer review process. -Without using metadata, our method outperforms the best reported results on -PubMed and DBLP datasets with relative improvements of over 18% in F1@20 and -over 22% in MRR. We show empirically that, although adding metadata improves -the performance on standard metrics, it favors self-citations which are less -useful in a citation recommendation setup. We release an online portal -(http://labs.semanticscholar.org/citeomatic/) for citation recommendation based -on our method, and a new dataset OpenCorpus of 7 million research articles to -facilitate future research on this task. 
-" -6762,1802.08314,"Chao Zhang, Philip Woodland",High Order Recurrent Neural Networks for Acoustic Modelling,cs.CL cs.AI eess.AS stat.ML," Vanishing long-term gradients are a major issue in training standard -recurrent neural networks (RNNs), which can be alleviated by long short-term -memory (LSTM) models with memory cells. However, the extra parameters -associated with the memory cells mean an LSTM layer has four times as many -parameters as an RNN with the same hidden vector size. This paper addresses the -vanishing gradient problem using a high order RNN (HORNN) which has additional -connections from multiple previous time steps. Speech recognition experiments -using British English multi-genre broadcast (MGB3) data showed that the -proposed HORNN architectures for rectified linear unit and sigmoid activation -functions reduced word error rates (WER) by 4.2% and 6.3% over the -corresponding RNNs, and gave similar WERs to a (projected) LSTM while using -only 20%--50% of the recurrent layer parameters and computation. -" -6763,1802.08332,"Yue Gu, Shuhong Chen, Ivan Marsic",Deep Multimodal Learning for Emotion Recognition in Spoken Language,cs.CL," In this paper, we present a novel deep multimodal framework to predict human -emotions based on sentence-level spoken language. Our architecture has two -distinctive characteristics. First, it extracts the high-level features from -both text and audio via a hybrid deep multimodal structure, which considers the -spatial information from text, temporal information from audio, and high-level -associations from low-level handcrafted features. Second, we fuse all features -by using a three-layer deep neural network to learn the correlations across -modalities and train the feature extraction and fusion modules together, -allowing optimal global fine-tuning of the entire structure. We evaluated the -proposed framework on the IEMOCAP dataset. Our result shows promising -performance, achieving 60.4% in weighted accuracy for five emotion categories. -" -6764,1802.08375,Zhenisbek Assylbekov and Rustem Takhanov,Reusing Weights in Subword-aware Neural Language Models,cs.CL cs.NE stat.ML," We propose several ways of reusing subword embeddings and other weights in -subword-aware neural language models. The proposed techniques do not benefit a -competitive character-aware model, but some of them improve the performance of -syllable- and morpheme-aware models while showing significant reductions in -model sizes. We discover a simple hands-on principle: in a multi-layer input -embedding model, layers should be tied consecutively bottom-up if reused at -output. Our best morpheme-aware model with properly reused weights beats the -competitive word-level model by a large margin across multiple languages and -has 20%-87% fewer parameters. -" -6765,1802.08379,"Sheng-Yeh Chen and Chao-Chun Hsu, Chuan-Chun Kuo, Ting-Hao (Kenneth) - Huang, Lun-Wei Ku",EmotionLines: An Emotion Corpus of Multi-Party Conversations,cs.CL," Feeling emotion is a critical characteristic to distinguish people from -machines. Among all the multi-modal resources for emotion detection, textual -datasets are those containing the least additional information in addition to -semantics, and hence are adopted widely for testing the developed systems. -However, most of the textual emotional datasets consist of emotion labels of -only individual words, sentences or documents, which makes it challenging to -discuss the contextual flow of emotions. 
In this paper, we introduce
-EmotionLines, the first dataset with emotion labels on all utterances in each
-dialogue, based solely on their textual content. Dialogues in EmotionLines are
-collected from Friends TV scripts and private Facebook messenger dialogues.
-Then one of seven emotions, six Ekman's basic emotions plus the neutral
-emotion, is labeled on each utterance by 5 Amazon MTurkers. A total of 29,245
-utterances from 2,000 dialogues are labeled in EmotionLines. We also provide
-several strong baselines for emotion detection models on EmotionLines in this
-paper.
-"
-6766,1802.08395,"Dmitriy Serdyuk and Yongqiang Wang and Christian Fuegen and Anuj Kumar
- and Baiyang Liu and Yoshua Bengio",Towards end-to-end spoken language understanding,cs.CL," Spoken language understanding systems are traditionally designed as a
-pipeline of components. First, the audio signal is processed by an automatic
-speech recognizer for transcription or n-best hypotheses. With the recognition
-results, a natural language understanding system classifies the text into
-structured data such as domain, intent and slots for downstream consumers, such
-as dialog systems and hands-free applications. These components are usually
-developed and optimized independently. In this paper, we present our study on
-an end-to-end learning system for spoken language understanding. With this
-unified approach, we can infer the semantic meaning directly from audio
-features without the intermediate text representation. This study shows that
-the trained model can achieve reasonably good results and demonstrates that the
-model can capture the semantic attention directly from the audio features.
-"
-6767,1802.08504,"Hai Ye, Xin Jiang, Zhunchen Luo, Wenhan Chao","Interpretable Charge Predictions for Criminal Cases: Learning to
- Generate Court Views from Fact Descriptions",cs.CL," In this paper, we propose to study the problem of COURT VIEW GENeration from
-the fact description in a criminal case. The task aims to improve the
-interpretability of charge prediction systems and help automatic legal document
-generation. We formulate this task as a text-to-text natural language
-generation (NLG) problem. Sequence-to-sequence models have achieved
-cutting-edge performance in many NLG tasks. However, due to the
-non-distinctions of fact descriptions, it is hard for a Seq2Seq model to
-generate charge-discriminative court views. In this work, we explore charge
-labels to tackle this issue. We propose a label-conditioned Seq2Seq model with
-attention for this problem, to decode court views conditioned on encoded charge
-labels. Experimental results show the effectiveness of our method.
-"
-6768,1802.08545,"Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane
- Schwartz",Unsupervised Grammar Induction with Depth-bounded PCFG,cs.CL cs.AI," There has been recent interest in applying cognitively or empirically
-motivated bounds on recursion depth to limit the search space of grammar
-induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al.,
-2016). This work extends this depth-bounding approach to probabilistic
-context-free grammar induction (DB-PCFG), which has a smaller parameter space
-than hierarchical sequence models, and therefore more fully exploits the space
-reductions of depth-bounding. Results for this model on grammar acquisition
-from transcribed child-directed speech and newswire text exceed or are
-competitive with those of other models when evaluated on parse accuracy.
-
Moreover, grammars acquired from this model demonstrate a consistent use of
-category labels, something which has not been demonstrated by other acquisition
-models.
-"
-6769,1802.08599,"Rik van Noord, Lasha Abzianidze, Hessel Haagsma, Johan Bos",Evaluating Scoped Meaning Representations,cs.CL," Semantic parsing offers many opportunities to improve natural language
-understanding. We present a semantically annotated parallel corpus for English,
-German, Italian, and Dutch where sentences are aligned with scoped meaning
-representations in order to capture the semantics of negation, modals,
-quantification, and presupposition triggers. The semantic formalism is based on
-Discourse Representation Theory, but concepts are represented by WordNet
-synsets and thematic roles by VerbNet relations. Translating scoped meaning
-representations to sets of clauses enables us to compare them for the purpose
-of semantic parser evaluation and checking translations. This is done by
-computing precision and recall on matching clauses, in a similar way as is done
-for Abstract Meaning Representations. We show that our matching tool for
-evaluating scoped meaning representations is both accurate and efficient.
-Applying this matching tool to three baseline semantic parsers yields F-scores
-between 43% and 54%. A pilot study is performed to automatically find changes
-in meaning by comparing meaning representations of translations. This
-comparison turns out to be an additional way of (i) finding annotation mistakes
-and (ii) finding instances where our semantic analysis needs to be improved.
-"
-6770,1802.08614,Baoxu Shi and Tim Weninger,Visualizing the Flow of Discourse with a Concept Ontology,cs.CL cs.AI cs.IR," Understanding and visualizing human discourse has long been a challenging
-task. Although recent work on argument mining has shown success in classifying
-the role of various sentences, the task of recognizing concepts and
-understanding the ways in which they are discussed remains challenging. Given
-an email thread or a transcript of a group discussion, our task is to extract
-the relevant concepts and understand how they are referenced and re-referenced
-throughout the discussion. In the present work, we present a preliminary
-approach for extracting and visualizing group discourse by adapting Wikipedia's
-category hierarchy to be an external concept ontology. From a user study, we
-found that our method achieved better results than 4 strong alternative
-approaches, and we illustrate our visualization method based on the extracted
-discourse flows.
-"
-6771,1802.08636,"Shashi Narayan, Shay B. Cohen, Mirella Lapata","Ranking Sentences for Extractive Summarization with Reinforcement
- Learning",cs.CL," Single document summarization is the task of producing a shorter version of a
-document while preserving its principal information content. In this paper we
-conceptualize extractive summarization as a sentence ranking task and propose a
-novel training algorithm which globally optimizes the ROUGE evaluation metric
-through a reinforcement learning objective. We use our algorithm to train a
-neural summarization model on the CNN and DailyMail datasets and demonstrate
-experimentally that it outperforms state-of-the-art extractive and abstractive
-systems when evaluated automatically and by humans.
-"
-6772,1802.08690,Chenhao Tan and Hao Peng and Noah A.
Smith,"""You are no Jack Kennedy"": On Media Selection of Highlights from
- Presidential Debates",cs.SI cs.CL physics.soc-ph," Political speeches and debates play an important role in shaping the images
-of politicians, and the public often relies on media outlets to select bits of
-political communication from a large pool of utterances. It is an important
-research question to understand what factors impact this selection process.
- To quantitatively explore the selection process, we build a three-decade
-dataset of presidential debate transcripts and post-debate coverage. We first
-examine the effect of wording and propose a binary classification framework
-that controls for both the speaker and the debate situation. We find that
-crowdworkers can only achieve an accuracy of 60% in this task, indicating that
-media choices are not entirely obvious. Our classifiers outperform crowdworkers
-on average, mainly in primary debates. We also compare important factors from
-crowdworkers' free-form explanations with those from data-driven methods and
-find interesting differences. Few crowdworkers mentioned that ""context
-matters"", whereas our data show that well-quoted sentences are more distinct
-from the previous utterance by the same speaker than less-quoted sentences.
-Finally, we examine the aggregate effect of media preferences towards different
-wordings to understand the extent of fragmentation among media outlets. By
-analyzing a bipartite graph built from quoting behavior in our data, we observe
-a decreasing trend in bipartisan coverage.
-"
-6773,1802.08731,"Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar,
- Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur","Automatic Speech Recognition and Topic Identification for
- Almost-Zero-Resource Languages",cs.CL," Automatic speech recognition (ASR) systems often need to be developed for
-extremely low-resource languages to serve end-uses such as audio content
-categorization and search. While universal phone recognition is natural to
-consider when no transcribed speech is available to train an ASR system in a
-language, adapting universal phone models using very small amounts (minutes
-rather than hours) of transcribed speech also needs to be studied, particularly
-with state-of-the-art DNN-based acoustic models. The DARPA LORELEI program
-provides a framework for such very-low-resource ASR studies, and provides an
-extrinsic metric for evaluating ASR performance in a humanitarian assistance,
-disaster relief setting. This paper presents our Kaldi-based systems for the
-program, which employ a universal phone modeling approach to ASR, and describes
-recipes for very rapid adaptation of this universal ASR system. The results we
-obtain significantly outperform results obtained by many competing approaches
-on the NIST LoReHLT 2017 Evaluation datasets.
-"
-6774,1802.08786,"Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, Le Song",Syntax-Directed Variational Autoencoder for Structured Data,cs.LG cs.CL," Deep generative models have been enjoying success in modeling continuous
-data. However, it remains challenging to capture the representations for
-discrete structures with formal grammars and semantics, e.g., computer programs
-and molecular structures. How to generate both syntactically and semantically
-correct data still remains largely an open problem.
Inspired by compiler
-theory, where the syntax and semantics check is done via syntax-directed
-translation (SDT), we propose a novel syntax-directed variational autoencoder
-(SD-VAE) by introducing stochastic lazy attributes. This approach converts the
-offline SDT check into on-the-fly generated guidance for constraining the
-decoder. Compared to state-of-the-art methods, our approach enforces
-constraints on the output space so that the output will be not only
-syntactically valid, but also semantically reasonable. We evaluate the proposed
-model with applications in programming languages and molecules, including
-reconstruction and program/molecule optimization. The results demonstrate the
-effectiveness in incorporating syntactic and semantic constraints in discrete
-generative models, which is significantly better than current state-of-the-art
-approaches.
-"
-6775,1802.08949,Dushyanta Dhyani,"OhioState at SemEval-2018 Task 7: Exploiting Data Augmentation for
- Relation Classification in Scientific Papers using Piecewise Convolutional
- Neural Networks",cs.CL," We describe our system for the SemEval-2018 Shared Task on Semantic Relation
-Extraction and Classification in Scientific Papers, where we focus on the
-Classification task. Our simple piecewise convolutional neural encoder performs
-decently in an end-to-end manner. A simple inter-task data augmentation
-significantly boosts the performance of the model. Our best-performing systems
-stood 8th out of 20 teams on the classification task on noisy data and 12th out
-of 28 teams on the classification task on clean data.
-"
-6776,1802.08969,"Junkun Chen, Xipeng Qiu, Pengfei Liu, Xuanjing Huang",Meta Multi-Task Learning for Sequence Modeling,cs.AI cs.CL," Semantic composition functions have been playing a pivotal role in neural
-representation learning of text sequences. In spite of their success, most
-existing models suffer from the underfitting problem: they use the same shared
-compositional function on all the positions in the sequence, thereby lacking
-expressive power due to incapacity to capture the richness of compositionality.
-Besides, the composition functions of different tasks are independent and
-learned from scratch. In this paper, we propose a new sharing scheme of
-composition function across multiple tasks. Specifically, we use a shared
-meta-network to capture the meta-knowledge of semantic composition and generate
-the parameters of the task-specific semantic composition models. We conduct
-extensive experiments on two types of tasks, text classification and sequence
-tagging, which demonstrate the benefits of our approach. Besides, we show that
-the shared meta-knowledge learned by our proposed model can be regarded as
-off-the-shelf knowledge and easily transferred to new tasks.
-"
-6777,1802.08970,"Jinyue Su, Jiacheng Xu, Xipeng Qiu, Xuanjing Huang","Incorporating Discriminator in Sentence Generation: a Gibbs Sampling
- Method",cs.CL," Generating plausible and fluent sentences with desired properties has long
-been a challenge. Most of the recent works use recurrent neural networks (RNNs)
-and their variants to predict the following words given the previous sequence
-and a target label. In this paper, we propose a novel framework to generate
-constrained sentences via Gibbs Sampling. The candidate sentences are revised
-and updated iteratively, with sampled new words replacing old ones. Our
-experiments show the effectiveness of the proposed method to generate plausible
-and diverse sentences.
-
"
-6778,1802.08979,"Xi Victoria Lin and Chenglong Wang and Luke Zettlemoyer and Michael D.
- Ernst","NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to
- the Linux Operating System",cs.CL cs.SE," We present new data and semantic parsing methods for the problem of mapping
-English sentences to Bash commands (NL2Bash). Our long-term goal is to enable
-any user to perform operations such as file manipulation, search, and
-application-specific scripting by simply stating their goals in English. We
-take a first step in this domain, by providing a new dataset of challenging but
-commonly used Bash commands and expert-written English descriptions, along with
-baseline methods to establish performance levels on this task.
-"
-6779,1802.09059,"Ahmad Pesaranghader, Ali Pesaranghader, Stan Matwin, Marina Sokolova","One Single Deep Bidirectional LSTM Network for Word Sense Disambiguation
- of Text Data",cs.LG cs.CL cs.IR stat.ML," Due to recent technical and scientific advances, we have a wealth of
-information hidden in unstructured text data such as offline/online narratives,
-research articles, and clinical reports. To mine these data properly, given
-their innate ambiguity, a Word Sense Disambiguation (WSD) algorithm can help
-avoid a number of difficulties in the Natural Language Processing (NLP)
-pipeline. However, considering a large number of ambiguous words in one
-language or technical domain, we may encounter limiting constraints for proper
-deployment of existing WSD models. This paper attempts to address the problem
-of one-classifier-per-one-word WSD algorithms by proposing a single
-Bidirectional Long Short-Term Memory (BLSTM) network which, by considering
-senses and context sequences, works on all ambiguous words collectively.
-Evaluated on the SensEval-3 benchmark, we show that the result of our model is
-comparable with top-performing WSD algorithms. We also discuss how applying
-additional modifications alleviates the model's faults and the need for more
-training data.
-"
-6780,1802.09091,R. Thomas McCoy and Robert Frank and Tal Linzen,"Revisiting the poverty of the stimulus: hierarchical generalization
- without a hierarchical bias in recurrent neural networks",cs.CL," Syntactic rules in natural language typically need to make reference to
-hierarchical sentence structure. However, the simple examples that language
-learners receive are often equally compatible with linear rules. Children
-consistently ignore these linear explanations and settle instead on the correct
-hierarchical one. This fact has motivated the proposal that the learner's
-hypothesis space is constrained to include only hierarchical rules. We examine
-this proposal using recurrent neural networks (RNNs), which are not constrained
-in such a way. We simulate the acquisition of question formation, a
-hierarchical transformation, in a fragment of English. We find that some RNN
-architectures tend to learn the hierarchical rule, suggesting that hierarchical
-cues within the language, combined with the implicit architectural biases
-inherent in certain RNNs, may be sufficient to induce hierarchical
-generalizations. The likelihood of acquiring the hierarchical generalization
-increased when the language included an additional cue to hierarchy in the form
-of subject-verb agreement, underscoring the role of cues to hierarchy in the
-learner's input.
-"
-6781,1802.09130,Payam Karisani and Eugene Agichtein,"Did You Really Just Have a Heart Attack?
Towards Robust Detection of
- Personal Health Mentions in Social Media",cs.CL," Millions of users share their experiences on social media sites, such as
-Twitter, which in turn generate valuable data for public health monitoring,
-digital epidemiology, and other analyses of population health at global scale.
-The first, critical, task for these applications is classifying whether a
-personal health event was mentioned, which we call the PHM problem. This task
-is challenging for many reasons, including the typically short length of social
-media posts, inventive spelling and lexicons, and figurative language,
-including hyperbole using diseases like ""heart attack"" or ""cancer"" for
-emphasis, and not as a health self-report. This problem is even more
-challenging for rarely reported, or frequent but ambiguously expressed
-conditions, such as ""stroke"". To address this problem, we propose a general,
-robust method for detecting PHMs in social media, which we call WESPAD, that
-combines lexical, syntactic, word embedding-based, and context-based features.
-WESPAD is able to generalize from few examples by automatically distorting the
-word embedding space to most effectively detect the true health mentions.
-Unlike previously proposed state-of-the-art supervised and deep-learning
-techniques, WESPAD requires relatively little training data, which makes it
-possible to adapt, with minimal effort, to each new disease and condition. We
-evaluate WESPAD on both an established publicly available Flu detection
-benchmark, and on a new dataset that we have constructed with mentions of
-multiple health conditions. Our experiments show that WESPAD outperforms the
-baselines and state-of-the-art methods, especially in cases when the number and
-proportion of true health mentions in the training data is small.
-"
-6782,1802.09189,"XingYu Fu, ZiYi Yang, XiuWen Duan","Language Distribution Prediction based on Batch Markov Monte Carlo
- Simulation with Migration",cs.CL," Language spreading is a complex mechanism that involves issues like culture,
-economics, migration, population, etc. In this paper, we propose a set of
-methods to model the dynamics of the spreading system. To model the randomness
-of language spreading, we propose the Batch Markov Monte Carlo Simulation with
-Migration (BMMCSM) algorithm, in which each agent is treated as a language
-stack. The agent learns languages and migrates based on the proposed Batch
-Markov Property according to the transition matrix T and migration matrix M.
-Since population plays a crucial role in language spreading, we also introduce
-the Mortality and Fertility Mechanism, which controls the birth and death of
-the simulated agents, into the BMMCSM algorithm. The simulation results of
-BMMCSM show that the numerical and geographic distribution of languages varies
-over time. The change of distribution fits world cultural and economic
-development trends. Next, when we construct the matrix T, some entries of T can
-be directly calculated from historical statistics, while other entries of T are
-unknown. Thus, the key to the success of the BMMCSM lies in the accurate
-estimation of the transition matrix T by estimating the unknown entries of T
-under the supervision of the known entries. To achieve this, we first construct
-a 20 by 20 by 5 factor tensor X to characterize each entry of T. Then we train
-a Random Forest Regressor on the known entries of T and use the trained
-regressor to predict the unknown entries.
We choose Random Forest (RF)
-because, compared to a single decision tree, it overcomes the problem of
-overfitting, and the Shapiro test also suggests that the residuals of RF follow
-the Normal distribution.
-"
-6783,1802.09194,"Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan",Deep Feed-forward Sequential Memory Networks for Speech Synthesis,cs.CL," The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the
-best parametric Text-to-Speech (TTS) systems in terms of the naturalness of
-generated speech, especially the naturalness in prosody. However, the model
-complexity and inference cost of BLSTM prevent its usage in many runtime
-applications. Meanwhile, Deep Feed-forward Sequential Memory Networks (DFSMN)
-have shown consistent out-performance over BLSTM in both word error rate (WER)
-and runtime computation cost in speech recognition tasks. Since speech
-synthesis also requires modeling long-term dependencies compared to speech
-recognition, in this paper we investigate the Deep-FSMN (DFSMN) in speech
-synthesis. Both objective and subjective experiments show that, compared with
-the BLSTM TTS method, the DFSMN system can generate synthesized speech with
-comparable speech quality while drastically reducing model complexity and
-speech generation time.
-"
-6784,1802.09233,Mohammed Jabreel and Antonio Moreno,"EiTAKA at SemEval-2018 Task 1: An Ensemble of N-Channels ConvNet and
- XGboost Regressors for Emotion Analysis of Tweets",cs.CL," This paper describes our system used in Task 1: Affect in Tweets. We combine
-two different approaches. The first one, called N-Stream ConvNets, is a deep
-learning approach; the second one is an XGBoost regressor based on a set of
-embedding- and lexicon-based features. Our system was evaluated on the testing
-sets of the tasks, outperforming all other approaches for the Arabic version of
-the valence intensity regression task and the valence ordinal classification
-task.
-"
-6785,1802.09287,"Mostafa Elaraby, Ahmed Y. Tawfik, Mahmoud Khaled, Hany Hassan, Aly
- Osama",Gender Aware Spoken Language Translation Applied to English-Arabic,cs.CL," Spoken Language Translation (SLT) is becoming more widely used and becoming a
-communication tool that helps in crossing language barriers. One of the
-challenges of SLT is the translation from a language without gender agreement
-to a language with gender agreement, such as English to Arabic. In this paper,
-we introduce an approach to tackle this limitation by enabling a Neural Machine
-Translation system to produce gender-aware translations. We show that an NMT
-system can model the speaker/listener gender information to produce
-gender-aware translations. We propose a method to generate the data used to
-adapt an NMT system to produce gender-aware translations. The proposed approach
-can achieve a significant improvement of the translation quality by 2 BLEU
-points.
-"
-6786,1802.09296,"Sherzod Hakimov, Soufian Jebbara, Philipp Cimiano","AMUSE: Multilingual Semantic Parsing for Question Answering over Linked
- Data",cs.AI cs.CL," The task of answering natural language questions over RDF data has received
-wide interest in recent years, in particular in the context of the series of
-QALD benchmarks. The task consists of mapping a natural language question to an
-executable form, e.g. SPARQL, so that answers from a given KB can be extracted.
-
So far, most systems proposed are i) monolingual and ii) rely on a set of
-hard-coded rules to interpret questions and map them into a SPARQL query. We
-present the first multilingual QALD pipeline that induces a model from training
-data for mapping a natural language question into logical form as probabilistic
-inference. In particular, our approach learns to map universal syntactic
-dependency representations to a language-independent logical form based on
-DUDES (Dependency-based Underspecified Discourse Representation Structures)
-that are then mapped to a SPARQL query as a deterministic second step. Our
-model builds on factor graphs that rely on features extracted from the
-dependency graph and corresponding semantic representations. We rely on
-approximate inference techniques, Markov Chain Monte Carlo methods in
-particular, as well as Sample Rank to update parameters using a ranking
-objective. Our focus lies on developing methods that overcome the lexical gap
-and present a novel combination of machine translation and word embedding
-approaches for this purpose. As a proof of concept, we evaluate our approach on
-the QALD-6 datasets for English, German & Spanish.
-"
-6787,1802.09375,Johannes Bjerva and Isabelle Augenstein,"From Phonology to Syntax: Unsupervised Linguistic Typology at Different
- Levels with Language Embeddings",cs.CL," A core part of linguistic typology is the classification of languages
-according to linguistic properties, such as those detailed in the World Atlas
-of Language Structure (WALS). Doing this manually is prohibitively
-time-consuming, which is in part evidenced by the fact that only 100 out of
-over 7,000 languages spoken in the world are fully covered in WALS.
- We learn distributed language representations, which can be used to predict
-typological properties on a massively multilingual scale. Additionally,
-quantitative and qualitative analyses of these language embeddings can tell us
-how language similarities are encoded in NLP models for tasks at different
-typological levels. The representations are learned in an unsupervised manner
-alongside tasks at three typological levels: phonology (grapheme-to-phoneme
-prediction, and phoneme reconstruction), morphology (morphological inflection),
-and syntax (part-of-speech tagging).
- We consider more than 800 languages and find significant differences in the
-language representations encoded, depending on the target task. For instance,
-although Norwegian Bokm{\aa}l and Danish are typologically close to one
-another, they are phonologically distant, which is reflected in their language
-embeddings growing relatively distant in a phonological task. We are also able
-to predict typological features in WALS with high accuracies, even for unseen
-language families.
-"
-6788,1802.09416,"Mohammadreza Rezvan, Saeedeh Shekarpour, Lakshika Balasuriya,
- Krishnaprasad Thirunarayan, Valerie Shalin, Amit Sheth","A Quality Type-aware Annotated Corpus and Lexicon for Harassment
- Research",cs.CL," Having a quality annotated corpus is essential, especially for applied
-research. Despite the Web science community's recent focus on research about
-cyberbullying, the community still does not have standard benchmarks.
In
-this paper, we publish, first, a quality annotated corpus and, second, an
-offensive-words lexicon capturing different types of harassment: (i) sexual
-harassment, (ii) racial harassment, (iii) appearance-related harassment, (iv)
-intellectual harassment, and (v) political harassment. We crawled data from
-Twitter using our offensive lexicon. We then relied on human judges to annotate
-the collected tweets w.r.t. the contextual types, because using offensive words
-is not sufficient to reliably detect harassment. Our corpus consists of 25,000
-annotated tweets in five contextual types. We are pleased to share this novel
-annotated corpus and the lexicon with the research community. The instructions
-to acquire the corpus have been published on the Git repository.
-"
-6789,1802.09426,"Mayank Chaudhari, Aakash Nelson Mattukoyya",Tone Biased MMR Text Summarization,cs.IR cs.CL," Text summarization is an interesting area for researchers to develop new
-techniques to provide human-like summaries for vast amounts of information.
-Summarization techniques tend to focus on providing an accurate representation
-of the content, and often the tone of the content is ignored. The tone of the
-content sets a baseline for how a reader perceives the content. As such, being
-able to generate a summary with a tone that is appropriate for the reader is
-important. In our work we implement Maximal Marginal Relevance [MMR] based
-multi-document text summarization and propose a naive model to change the tone
-of the summary by setting a bias towards a specific set of words and
-restricting other words in the summarization output. This bias towards a
-specified set of words produces a summary whose tone is the same as the tone of
-the specified words.
-"
-6790,1802.09777,Niko Brummer and Anna Silnova and Lukas Burget and Themos Stafylakis,"Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA
- model",stat.ML cs.CL cs.LG," Embeddings in machine learning are low-dimensional representations of complex
-input patterns, with the property that simple geometric operations like
-Euclidean distances and dot products can be used for classification and
-comparison tasks. The proposed meta-embeddings are special embeddings that live
-in more general inner product spaces. They are designed to propagate
-uncertainty to the final output in speaker recognition and similar
-applications. The familiar Gaussian PLDA model (GPLDA) can be re-formulated as
-an extractor for Gaussian meta-embeddings (GMEs), such that likelihood ratio
-scores are given by Hilbert space inner products between Gaussian likelihood
-functions. GMEs extracted by the GPLDA model have fixed precisions and do not
-propagate uncertainty. We show that a generalization to heavy-tailed PLDA gives
-GMEs with variable precisions, which do propagate uncertainty. Experiments on
-NIST SRE 2010 and 2016 show that the proposed method applied to i-vectors
-without length normalization is up to 20% more accurate than GPLDA applied to
-length-normalized i-vectors.
-"
-6791,1802.09884,"Avinesh P.V.S., Maxime Peyrard, Christian M. Meyer",Live Blog Corpus for Summarization,cs.CL," Live blogs are an increasingly popular news format to cover breaking news and
-live events in online journalism. Online news websites around the world are
-using this medium to give their readers a minute-by-minute update on an event.
-Good summaries enhance the value of the live blogs for a reader but are often
-not available.
In this paper, we study a way of collecting corpora for
-automatic live blog summarization. In an empirical evaluation using well-known
-state-of-the-art summarization systems, we show that the live blog corpus poses
-new challenges in the field of summarization. We make our tools for
-reconstructing the corpus publicly available to encourage the research
-community to replicate our results.
-"
-6792,1802.09913,"Isabelle Augenstein, Sebastian Ruder, Anders S{\o}gaard","Multi-task Learning of Pairwise Sequence Classification Tasks Over
- Disparate Label Spaces",cs.CL cs.NE stat.ML," We combine multi-task learning and semi-supervised learning by inducing a
-joint embedding space between disparate label spaces and learning transfer
-functions between label embeddings, enabling us to jointly leverage unlabelled
-data and auxiliary, annotated datasets. We evaluate our approach on a variety
-of sequence classification tasks with disparate label spaces. We outperform
-strong single and multi-task baselines and achieve a new state-of-the-art for
-topic-based sentiment analysis.
-"
-6793,1802.09914,M. Andrecut,High-Dimensional Vector Semantics,cs.CL cs.AI cs.LG stat.ML," In this paper we explore the ""vector semantics"" problem from the perspective
-of the ""almost orthogonal"" property of high-dimensional random vectors. We show
-that this intriguing property can be used to ""memorize"" random vectors by
-simply adding them, and we provide an efficient probabilistic solution to the
-set membership problem. Also, we discuss several applications to word context
-vector embeddings, document sentences similarity, and spam filtering.
-"
-6794,1802.09944,Jaimie Murdock and Colin Allen and Simon DeDeo,The Development of Darwin's Origin of Species,cs.CL cs.DL," From 1837, when he returned to England aboard the $\textit{HMS Beagle}$, to
-1860, just after publication of $\textit{The Origin of Species}$, Charles
-Darwin kept detailed notes of each book he read or wanted to read. His notes
-and manuscripts provide information about decades of individual scientific
-practice. Previously, we trained topic models on the full texts of each
-reading, and applied information-theoretic measures to detect that changes in
-his reading patterns coincided with the boundaries of his three major
-intellectual projects in the period 1837-1860. In this new work we apply the
-reading model to five additional documents, four of them by Darwin: the first
-edition of $\textit{The Origin of Species}$, two private essays stating
-intermediate forms of his theory in 1842 and 1844, a third essay of disputed
-dating, and Alfred Russel Wallace's essay, which Darwin received in 1858. We
-address three historical inquiries, previously treated qualitatively: 1) the
-mythology of ""Darwin's Delay,"" that despite completing an extensive draft in
-1844, Darwin waited until 1859 to publish $\textit{The Origin of Species}$ due
-to external pressures; 2) the relationship between Darwin and Wallace's
-contemporaneous theories, especially in light of their joint presentation; and
-3) dating of the ""Outline and Draft"" which was rediscovered in 1975 and
-postulated first as an 1839 draft preceding the Sketch of 1842, then as an
-interstitial draft between the 1842 and 1844 essays.
-"
-6795,1802.09957,"Spiros V. Georgakopoulos, Sotiris K. Tasoulis, Aristidis G. Vrahatis
- and Vassilis P.
Plagianakos",Convolutional Neural Networks for Toxic Comment Classification,cs.CL cs.LG," A flood of information is produced on a daily basis through global Internet
-usage, arising from the on-line interactive communications among users. While
-this situation contributes significantly to the quality of human life,
-unfortunately it involves enormous dangers, since on-line texts with high
-toxicity can cause personal attacks, on-line harassment and bullying behaviors.
-This has engaged both industry and the research community in the last few
-years, with several attempts to identify an efficient model for on-line toxic
-comment prediction. However, these steps are still in their infancy and new
-approaches and frameworks are required. In parallel, the constant data
-explosion makes the construction of new machine learning computational tools
-for managing this information an imperative need. Thankfully, advances in
-hardware, cloud computing and big data management allow the development of Deep
-Learning approaches, which have shown very promising performance so far. For
-text classification in particular, the use of Convolutional Neural Networks
-(CNN) has recently been proposed, approaching text analytics in a modern manner
-that emphasizes the structure of words in a document. In this work, we employ
-this approach to discover toxic comments in a large pool of documents provided
-by a current Kaggle competition regarding Wikipedia's talk page edits. To
-justify this decision we choose to compare CNNs against the traditional
-bag-of-words approach for text analysis combined with a selection of algorithms
-proven to be very effective in text classification. The reported results
-provide enough evidence that CNNs enhance toxic comment classification,
-reinforcing research interest towards this direction.
-"
-6796,1802.09961,"Jing Peng, Anna Feldman, Ekaterina Vylomova","Classifying Idiomatic and Literal Expressions Using Topic Models and
- Intensity of Emotions",cs.CL," We describe an algorithm for automatic classification of idiomatic and
-literal expressions. Our starting point is that words in a given text segment,
-such as a paragraph, that are high-ranking representatives of a common topic of
-discussion are less likely to be a part of an idiomatic expression. Our
-additional hypothesis is that contexts in which idioms occur, typically, are
-more affective and therefore, we incorporate a simple analysis of the intensity
-of the emotions expressed by the contexts. We investigate the bag of words
-topic representation of one to three paragraphs containing an expression that
-should be classified as idiomatic or literal (a target phrase). We extract
-topics from paragraphs containing idioms and from paragraphs containing
-literals using an unsupervised clustering method, Latent Dirichlet Allocation
-(LDA) (Blei et al., 2003). Since idiomatic expressions exhibit the property of
-non-compositionality, we assume that they usually present different semantics
-than the words used in the local topic. We treat idioms as semantic outliers,
-and the identification of a semantic shift as outlier detection. Thus, this
-topic representation allows us to differentiate idioms from literals using
-local semantic contexts. Our results are encouraging.
-" -6797,1802.09968,"Chieh-Teng Chang, Chi-Chia Huang, Chih-Yuan Yang and Jane Yung-Jen Hsu",A Hybrid Word-Character Approach to Abstractive Summarization,cs.CL," Automatic abstractive text summarization is an important and challenging -research topic of natural language processing. Among many widely used -languages, the Chinese language has a special property that a Chinese character -contains rich information comparable to a word. Existing Chinese text -summarization methods, either adopt totally character-based or word-based -representations, fail to fully exploit the information carried by both -representations. To accurately capture the essence of articles, we propose a -hybrid word-character approach (HWC) which preserves the advantages of both -word-based and character-based representations. We evaluate the advantage of -the proposed HWC approach by applying it to two existing methods, and discover -that it generates state-of-the-art performance with a margin of 24 ROUGE points -on a widely used dataset LCSTS. In addition, we find an issue contained in the -LCSTS dataset and offer a script to remove overlapping pairs (a summary and a -short text) to create a clean dataset for the community. The proposed HWC -approach also generates the best performance on the new, clean LCSTS dataset. -" -6798,1802.10078,"Sunil Mohan, Nicolas Fiorini, Sun Kim, Zhiyong Lu","A Fast Deep Learning Model for Textual Relevance in Biomedical - Information Retrieval",cs.IR cs.CL," Publications in the life sciences are characterized by a large technical -vocabulary, with many lexical and semantic variations for expressing the same -concept. Towards addressing the problem of relevance in biomedical literature -search, we introduce a deep learning model for the relevance of a document's -text to a keyword style query. Limited by a relatively small amount of training -data, the model uses pre-trained word embeddings. With these, the model first -computes a variable-length Delta matrix between the query and document, -representing a difference between the two texts, which is then passed through a -deep convolution stage followed by a deep feed-forward network to compute a -relevance score. This results in a fast model suitable for use in an online -search engine. The model is robust and outperforms comparable state-of-the-art -deep learning approaches. -" -6799,1802.10137,"Aakash Sinha, Abhishek Yadav, Akshay Gahlot",Extractive Text Summarization using Neural Networks,cs.CL," Text Summarization has been an extensively studied problem. Traditional -approaches to text summarization rely heavily on feature engineering. In -contrast to this, we propose a fully data-driven approach using feedforward -neural networks for single document summarization. We train and evaluate the -model on standard DUC 2002 dataset which shows results comparable to the state -of the art models. The proposed model is scalable and is able to produce the -summary of arbitrarily sized documents by breaking the original document into -fixed sized parts and then feeding it recursively to the network. -" -6800,1802.10229,"Yi Yang, Ozan Irsoy, Kazi Shefaet Rahman",Collective Entity Disambiguation with Structured Gradient Tree Boosting,cs.CL," We present a gradient-tree-boosting-based structured learning model for -jointly disambiguating named entities in a document. Gradient tree boosting is -a widely used machine learning algorithm that underlies many top-performing -natural language processing systems. 
-6800,1802.10229,"Yi Yang, Ozan Irsoy, Kazi Shefaet Rahman",Collective Entity Disambiguation with Structured Gradient Tree Boosting,cs.CL," We present a gradient-tree-boosting-based structured learning model for
-jointly disambiguating named entities in a document. Gradient tree boosting
-is a widely used machine learning algorithm that underlies many
-top-performing natural language processing systems. Surprisingly, most work
-limits gradient tree boosting to regular classification or regression
-problems, despite the structured nature of language. To the best of our
-knowledge, our work is the first to employ the structured gradient tree
-boosting (SGTB) algorithm for collective entity disambiguation. By defining
-global features over previous disambiguation decisions and jointly modeling
-them with local features, our system is able to produce globally optimized
-entity assignments for mentions in a document. Exact inference is
-prohibitively expensive for our globally normalized model. To solve this
-problem, we propose Bidirectional Beam Search with Gold path (BiBSG), an
-approximate inference algorithm that is a variant of the standard beam search
-algorithm. BiBSG makes use of global information from both past and future to
-perform better local search. Experiments on standard benchmark datasets show
-that SGTB significantly improves upon published results. Specifically, SGTB
-outperforms the previous state-of-the-art neural system by nearly 1% absolute
-accuracy on the popular AIDA-CoNLL dataset.
-"
-6801,1802.10279,"Xiao Zhang, Ji Wu, Zhiyang He, Xien Liu, Ying Su",Medical Exam Question Answering with Large-scale Reading Comprehension,cs.CL," Reading and understanding text is an important component of computer-aided
-diagnosis in clinical medicine, as well as a major research problem in the
-field of NLP. In this work, we introduce a question-answering task called
-MedQA to study answering questions in clinical medicine using knowledge in a
-large-scale document collection. The aim of MedQA is to answer real-world
-questions with large-scale reading comprehension. We propose our solution
-SeaReader -- a modular end-to-end reading comprehension model based on LSTM
-networks and a dual-path attention architecture. The novel dual-path
-attention models information flow from two perspectives and has the ability
-to simultaneously read individual documents and integrate information across
-multiple documents. In experiments, our SeaReader achieved a large increase
-in accuracy on MedQA over competing models. Additionally, we develop a series
-of novel techniques to demonstrate the interpretation of the question
-answering process in SeaReader.
-"
-6802,1802.10411,Massimo Stella and Manlio De Domenico,"Distance entropy cartography characterises centrality in complex
- networks",physics.soc-ph cond-mat.stat-mech cs.CL cs.SI physics.data-an," We introduce distance entropy as a measure of homogeneity in the
-distribution of path lengths between a given node and its neighbours in a
-complex network. Distance entropy defines a new centrality measure whose
-properties are investigated for a variety of synthetic network models. By
-coupling distance entropy information with closeness centrality, we introduce
-a network cartography which allows one to reduce the degeneracy of rankings
-based on closeness alone. We apply this methodology to the empirical
-multiplex lexical network encoding the linguistic relationships known to
-English-speaking toddlers. We show that the distance entropy cartography
-better predicts how children learn words compared to closeness centrality.
-Our results highlight the importance of distance entropy for gaining insights
-from distance patterns in complex networks.
-"
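For intuition, a distance-entropy-style score as described above can be approximated in a few lines. A hedged sketch follows, assuming the networkx package; the exact normalisation and neighbourhood definition in the paper may differ.

```python
# Shannon entropy of a node's shortest-path-length distribution (sketch).
import math
from collections import Counter
import networkx as nx

def distance_entropy(G, node):
    lengths = nx.shortest_path_length(G, source=node)   # {target: hops}
    counts = Counter(d for target, d in lengths.items() if target != node)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

G = nx.karate_club_graph()                              # small connected graph
scores = {n: distance_entropy(G, n) for n in G}
print(sorted(scores, key=scores.get)[:3])               # most homogeneous nodes
```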
-" -6803,1802.10569,Patrick Verga and Emma Strubell and Andrew McCallum,"Simultaneously Self-Attending to All Mentions for Full-Abstract - Biological Relation Extraction",cs.CL," Most work in relation extraction forms a prediction by looking at a short -span of text within a single sentence containing a single entity pair mention. -This approach often does not consider interactions across mentions, requires -redundant computation for each mention pair, and ignores relationships -expressed across sentence boundaries. These problems are exacerbated by the -document- (rather than sentence-) level annotation common in biological text. -In response, we propose a model which simultaneously predicts relationships -between all mention pairs in a document. We form pairwise predictions over -entire paper abstracts using an efficient self-attention encoder. All-pairs -mention scores allow us to perform multi-instance learning by aggregating over -mentions to form entity pair representations. We further adapt to settings -without mention-level annotation by jointly training to predict named entities -and adding a corpus of weakly labeled data. In experiments on two Biocreative -benchmark datasets, we achieve state of the art performance on the Biocreative -V Chemical Disease Relation dataset for models without external KB resources. -We also introduce a new dataset an order of magnitude larger than existing -human-annotated biological information extraction datasets and more accurate -than distantly supervised alternatives. -" -6804,1803.00047,Myle Ott and Michael Auli and David Grangier and Marc'Aurelio Ranzato,Analyzing Uncertainty in Neural Machine Translation,cs.CL," Machine translation is a popular test bed for research in neural -sequence-to-sequence models but despite much recent research, there is still a -lack of understanding of these models. Practitioners report performance -degradation with large beams, the under-estimation of rare words and a lack of -diversity in the final translations. Our study relates some of these issues to -the inherent uncertainty of the task, due to the existence of multiple valid -translations for a single source sentence, and to the extrinsic uncertainty -caused by noisy training data. We propose tools and metrics to assess how -uncertainty in the data is captured by the model distribution and how it -affects search strategies that generate translations. Our results show that -search works remarkably well but that models tend to spread too much -probability mass over the hypothesis space. Next, we propose tools to assess -model calibration and show how to easily fix some shortcomings of current -models. As part of this study, we release multiple human reference translations -for two popular benchmarks. -" -6805,1803.00057,"Pelin Dogan, Boyang Li, Leonid Sigal, Markus Gross",A Neural Multi-sequence Alignment TeCHnique (NeuMATCH),cs.CV cs.CL cs.LG," The alignment of heterogeneous sequential data (video to text) is an -important and challenging problem. Standard techniques for this task, including -Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from -inherent drawbacks. Mainly, the Markov assumption implies that, given the -immediate past, future alignment decisions are independent of further history. -The separation between similarity computation and alignment decision also -prevents end-to-end training. 
-In this paper, we propose an end-to-end neural architecture where alignment
-actions are implemented as moving data between stacks of Long Short-Term
-Memory (LSTM) blocks. This flexible architecture supports a large variety of
-alignment tasks, including one-to-one, one-to-many, skipping unmatched
-elements, and (with extensions) non-monotonic alignment. Extensive
-experiments on semi-synthetic and real datasets show that our algorithm
-outperforms state-of-the-art baselines.
-"
-6806,1803.00124,"Abdulaziz M. Alayba, Vasile Palade, Matthew England and Rahat Iqbal",Improving Sentiment Analysis in Arabic Using Word Representation,cs.CL," The complexities of the Arabic language in morphology, orthography and
-dialects make sentiment analysis for Arabic more challenging. Also, text
-feature extraction from short messages like tweets, in order to gauge the
-sentiment, makes this task even more difficult. In recent years, deep neural
-networks have often been employed and have shown very good results in
-sentiment classification and natural language processing applications. Word
-embeddings, or distributed word representations, are a current and powerful
-tool for capturing the closest words to a given word from its textual
-context. In this paper, we describe how we construct Word2Vec models from a
-large Arabic corpus obtained from ten newspapers in different Arab countries.
-By applying different machine learning algorithms and convolutional neural
-networks with different text feature selections, we report improved accuracy
-of sentiment classification (91%-95%) on our publicly available Arabic
-language health sentiment dataset [1].
-"
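The Word2Vec step described in this abstract is straightforward to reproduce at toy scale. The sketch below uses the gensim 4.x API; the two inline sentences are illustrative stand-ins for the large newspaper corpus, and the hyperparameter values are assumptions rather than the authors' exact settings.

```python
# Minimal Word2Vec training sketch with gensim 4.x; toy Arabic sentences.
from gensim.models import Word2Vec

sentences = [["الصحة", "مهمة", "للجميع"],
             ["الرياضة", "تحسن", "الصحة"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.most_similar("الصحة", topn=2))   # nearest neighbours of "health"
```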
-6807,1803.00179,"Bang Liu, Ting Zhang, Fred X. Han, Di Niu, Kunfeng Lai, Yu Xu","Matching Natural Language Sentences with Hierarchical Sentence
- Factorization",cs.CL," Semantic matching of natural language sentences, or identifying the
-relationship between two sentences, is a core research problem underlying
-many natural language tasks. Depending on whether training data is available,
-prior research has proposed both unsupervised distance-based schemes and
-supervised deep learning schemes for sentence matching. However, previous
-approaches either omit or fail to fully utilize the ordered, hierarchical,
-and flexible structures of language objects, as well as the interactions
-between them. In this paper, we propose Hierarchical Sentence
-Factorization -- a technique to factorize a sentence into a hierarchical
-representation, with the components at each different scale reordered into a
-""predicate-argument"" form. The proposed sentence factorization technique
-leads to the invention of: 1) a new unsupervised distance metric which
-calculates the semantic distance between a pair of text snippets by solving a
-penalized optimal transport problem while preserving the logical relationship
-of words in the reordered sentences, and 2) new multi-scale deep learning
-models for supervised semantic training, based on factorized sentence
-hierarchies. We apply our techniques to text-pair similarity estimation and
-text-pair relationship classification tasks, based on multiple datasets such
-as STSbenchmark, the Microsoft Research paraphrase identification (MSRP)
-dataset, the SICK dataset, etc. Extensive experiments show that the proposed
-hierarchical sentence factorization can be used to significantly improve the
-performance of existing unsupervised distance-based metrics as well as
-multiple supervised deep learning models based on the convolutional neural
-network (CNN) and long short-term memory (LSTM).
-"
-6808,1803.00188,"Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin
- Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur,
- Pierre Godard, John Hewitt, Rachid Riad, Liming Wang",XNMT: The eXtensible Neural Machine Translation Toolkit,cs.CL," This paper describes XNMT, the eXtensible Neural Machine Translation
-toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its
-focus on modular code design, with the purpose of enabling fast iteration in
-research and replicable, reliable results. In this paper we describe the
-design of XNMT and its experiment configuration system, and demonstrate its
-utility on the tasks of machine translation, speech recognition, and
-multi-tasked machine translation/parsing. XNMT is available open-source at
-https://github.com/neulab/xnmt
-"
-6809,1803.00189,"Bang Liu, Di Niu, Kunfeng Lai, Linglong Kong, Yu Xu",Growing Story Forest Online from Massive Breaking News,cs.IR cs.CL," We describe our experience of implementing a news content organization
-system at Tencent that discovers events from vast streams of breaking news
-and evolves news story structures in an online fashion. Our real-world system
-has distinct requirements in contrast to previous studies on topic detection
-and tracking (TDT) and event timeline or graph generation, in that we 1) need
-to accurately and quickly extract distinguishable events from massive streams
-of long text documents that cover diverse topics and contain highly redundant
-information, and 2) must develop the structures of event stories in an online
-manner, without repeatedly restructuring previously formed stories, in order
-to guarantee a consistent user viewing experience. In solving these
-challenges, we propose Story Forest, a set of online schemes that
-automatically clusters streaming documents into events, while connecting
-related events in growing trees to tell evolving stories. We conducted
-extensive evaluation, including detailed pilot user experience studies, based
-on 60 GB of real-world Chinese news data, although our ideas are not
-language-dependent and can easily be extended to other languages. The results
-demonstrate the superior capability of Story Forest to accurately identify
-events and organize news text into a logical structure that is appealing to
-human readers, compared to multiple existing algorithmic frameworks.
-"
-6810,1803.00191,"Liang Wang, Meng Sun, Wei Zhao, Kewei Shen, Jingming Liu","Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational
- Knowledge for Commonsense Machine Comprehension",cs.CL," This paper describes our system for SemEval-2018 Task 11: Machine
-Comprehension using Commonsense Knowledge. We use Three-way Attentive
-Networks (TriAN) to model interactions between the passage, question and
-answers. To incorporate commonsense knowledge, we augment the input with
-relation embeddings from the graph of general knowledge ConceptNet (Speer et
-al., 2017). As a result, our system achieves state-of-the-art performance
-with 83.95% accuracy on the official test data.
-Code is publicly available at https://github.com/intfloat/commonsense-rc
-"
-6811,1803.00202,"Miguel Campo, JJ Espinoza, Julie Rieger, Abhinav Taliyan","Collaborative Metric Learning Recommendation System: Application to
- Theatrical Movie Releases",cs.IR cs.CL," Product recommendation systems are important for major movie studios during
-the movie greenlight process and as part of machine learning personalization
-pipelines. Collaborative Filtering (CF) models have proved to be effective at
-powering recommender systems for online streaming services with explicit
-customer feedback data. CF models do not perform well in scenarios in which
-feedback data is not available, in cold-start situations like new product
-launches, and in situations with markedly different customer tiers (e.g.,
-high-frequency customers vs. casual customers). Generative natural language
-models that create useful theme-based representations of an underlying corpus
-of documents can be used to represent new product descriptions, like new
-movie plots. When combined with CF, they have been shown to increase
-performance in cold-start situations. Outside of those cases in which
-explicit customer feedback is available, however, recommender engines must
-rely on binary purchase data, which materially degrades performance.
-Fortunately, purchase data can be combined with product descriptions to
-generate meaningful representations of products and customer trajectories in
-a convenient product space in which proximity represents similarity. Learning
-to measure the distance between points in this space can be accomplished with
-a deep neural network that trains on customer histories and on dense
-vectorizations of product descriptions. We developed a system based on
-Collaborative (Deep) Metric Learning (CML) to predict the purchase
-probabilities of new theatrical releases. We trained and evaluated the model
-using a large dataset of customer histories, and tested the model on a set of
-movies that were released outside of the training window. Initial experiments
-show gains relative to models that do not train on collaborative preferences.
-"
-6812,1803.00344,"Gangeshwar Krishnamurthy, Navonil Majumder, Soujanya Poria, Erik
- Cambria",A Deep Learning Approach for Multimodal Deception Detection,cs.CL cs.AI cs.CV," Automatic deception detection is an important task that has gained momentum
-in computational linguistics due to its potential applications. In this
-paper, we propose a simple yet tough-to-beat multi-modal neural model for
-deception detection. By combining features from different modalities such as
-video, audio, and text along with Micro-Expression features, we show that
-detecting deception in real-life videos can be more accurate. Experimental
-results on a dataset of real-life deception videos show that our model
-outperforms existing techniques for deception detection with an accuracy of
-96.14% and an ROC-AUC of 0.9799.
-"
-6813,1803.00353,"Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen","Joint Training for Neural Machine Translation Models with Monolingual
- Data",cs.CL," Monolingual data have been demonstrated to be helpful in improving the
-translation quality of both statistical machine translation (SMT) systems and
-neural machine translation (NMT) systems, especially in resource-poor or
-domain adaptation tasks where parallel data are not rich enough.
-In this paper, we propose a novel approach to better leveraging monolingual
-data for neural machine translation by jointly learning source-to-target and
-target-to-source NMT models for a language pair with a joint EM optimization
-method. The training process starts with two initial NMT models pre-trained
-on parallel data for each direction, and these two models are iteratively
-updated by incrementally decreasing translation losses on the training data.
-In each iteration step, both NMT models are first used to translate
-monolingual data from one language to the other, forming pseudo-training data
-for the other NMT model. Then two new NMT models are learnt from parallel
-data together with the pseudo-training data. Both NMT models are expected to
-be improved, and better pseudo-training data can be generated, in the next
-step. Experimental results on Chinese-English and English-German translation
-tasks show that our approach can simultaneously improve the translation
-quality of source-to-target and target-to-source models, significantly
-outperforming strong baseline systems which are enhanced with monolingual
-data for model training, including back-translation.
-"
-6814,1803.00357,"Michael Neumann, Ngoc Thang Vu","Cross-lingual and Multilingual Speech Emotion Recognition on English and
- French",cs.CL," Research on multilingual speech emotion recognition faces the problem that
-most available speech corpora differ from each other in important ways, such
-as annotation methods or interaction scenarios. These inconsistencies
-complicate building a multilingual system. We present results for
-cross-lingual and multilingual emotion recognition on English and French
-speech data with similar characteristics in terms of interaction (human-human
-conversations). Further, we explore the possibility of fine-tuning a
-pre-trained cross-lingual model with only a small number of samples from the
-target language, which is of great interest for low-resource languages. To
-gain more insight into what is learned by the deployed convolutional neural
-network, we perform an analysis of the attention mechanism inside the
-network.
-"
-6815,1803.00712,"Phuong Le-Hong, Duc-Thien Bui",A Factoid Question Answering System for Vietnamese,cs.CL," In this paper, we describe the development of an end-to-end factoid
-question answering system for the Vietnamese language. This system combines
-both statistical models and ontology-based methods in a chain of processing
-modules to provide high-quality mappings from natural language text to
-entities. We present the challenges in the development of such an intelligent
-user interface for an isolating language like Vietnamese and show that
-techniques developed for inflectional languages cannot be applied ""as is"".
-Our question answering system can answer a wide range of general knowledge
-questions with promising accuracy on a test set.
-"
-6816,1803.00721,Denys Katerenchuk,Age Group Classification with Speech and Metadata Multimodality Fusion,cs.CL cs.SD eess.AS," Children comprise a significant proportion of TV viewers, and it is
-worthwhile to customize the experience for them. However, identifying who in
-the audience is a child can be a challenging task. Identifying gender and age
-from audio commands is a well-studied problem, but it is still very
-challenging to achieve good accuracy when the utterances are typically only a
-couple of seconds long. We present initial studies of a novel method which
-combines utterances with user metadata.
-In particular, we develop an ensemble of different machine learning
-techniques on different subsets of data to improve child detection. Our
-initial results show a 9.2% absolute improvement over the baseline, leading
-to state-of-the-art performance.
-"
-6817,1803.00729,"Yu Gong, Kaiqi Zhao, Kenny Q. Zhu",Representing Verbs as Argument Concepts,cs.CL cs.AI," Verbs play an important role in the understanding of natural language text.
-This paper studies the problem of abstracting the subject and object
-arguments of a verb into a set of noun concepts, known as the ""argument
-concepts"". This set of concepts, whose size is parameterized, represents the
-fine-grained semantics of a verb. For example, the object of ""enjoy"" can be
-abstracted into time, hobby and event, etc. We present a novel framework to
-automatically infer human-readable and machine-computable action concepts
-with high accuracy.
-"
-6818,1803.00831,Daniel Ortega and Ngoc Thang Vu,Lexico-acoustic Neural-based Models for Dialog Act Classification,cs.CL," Recent works have proposed neural models for dialog act classification in
-spoken dialogs. However, they have not explored the role and the usefulness
-of acoustic information. We propose a neural model that processes both
-lexical and acoustic features for classification. Our results on two
-benchmark datasets reveal that acoustic features are helpful in improving the
-overall accuracy. Finally, a deeper analysis shows that acoustic features are
-valuable in three cases: when a dialog act has sufficient data, when lexical
-information is limited, and when strong lexical cues are not present.
-"
-6819,1803.00832,Dennis Diefenbach and Andreas Both and Kamal Singh and Pierre Maret,Towards a Question Answering System over the Semantic Web,cs.AI cs.CL," Thanks to the development of the Semantic Web, a lot of new structured data
-has become available on the Web in the form of knowledge bases (KBs). Making
-this valuable data accessible and usable for end-users is one of the main
-goals of Question Answering (QA) over KBs. Most current QA systems query one
-KB, in one language (namely English). The existing approaches are not
-designed to be easily adaptable to new KBs and languages. We first introduce
-a new approach for translating natural language questions to SPARQL queries.
-It is able to query several KBs simultaneously, in different languages, and
-can easily be ported to other KBs and languages. In our evaluation, the
-impact of our approach is demonstrated using 5 different well-known and large
-KBs: Wikidata, DBpedia, MusicBrainz, DBLP and Freebase, as well as 5
-different languages, namely English, German, French, Italian and Spanish.
-Second, we show how we integrated our approach to make it easily accessible
-to the research community and to end-users. To summarize, we provide a
-conceptual solution for multilingual, KB-agnostic Question Answering over the
-Semantic Web. The provided first approximation validates this concept.
-"
-" -6820,1803.00860,"Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi - Yamagishi, Tomi Kinnunen","Can we steal your vocal identity from the Internet?: Initial - investigation of cloning Obama's voice using GAN, WaveNet and low-quality - found data",eess.AS cs.CL cs.SD stat.ML," Thanks to the growing availability of spoofing databases and rapid advances -in using them, systems for detecting voice spoofing attacks are becoming more -and more capable, and error rates close to zero are being reached for the -ASVspoof2015 database. However, speech synthesis and voice conversion paradigms -that are not considered in the ASVspoof2015 database are appearing. Such -examples include direct waveform modelling and generative adversarial networks. -We also need to investigate the feasibility of training spoofing systems using -only low-quality found data. For that purpose, we developed a generative -adversarial network-based speech enhancement system that improves the quality -of speech data found in publicly available sources. Using the enhanced data, we -trained state-of-the-art text-to-speech and voice conversion models and -evaluated them in terms of perceptual speech quality and speaker similarity. -The results show that the enhancement models significantly improved the SNR of -low-quality degraded data found in publicly available sources and that they -significantly improved the perceptual cleanliness of the source speech without -significantly degrading the naturalness of the voice. However, the results also -show limitations when generating speech with the low-quality found data. -" -6821,1803.00886,"Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang and Thomas - Fang Zheng",Deep factorization for speech signal,eess.AS cs.CL cs.LG cs.SD," Various informative factors mixed in speech signals, leading to great -difficulty when decoding any of the factors. An intuitive idea is to factorize -each speech frame into individual informative factors, though it turns out to -be highly difficult. Recently, we found that speaker traits, which were assumed -to be long-term distributional properties, are actually short-time patterns, -and can be learned by a carefully designed deep neural network (DNN). This -discovery motivated a cascade deep factorization (CDF) framework that will be -presented in this paper. The proposed framework infers speech factors in a -sequential way, where factors previously inferred are used as conditional -variables when inferring other factors. We will show that this approach can -effectively factorize speech signals, and using these factors, the original -speech spectrum can be recovered with a high accuracy. This factorization and -reconstruction approach provides potential values for many speech processing -tasks, e.g., speaker recognition and emotion recognition, as will be -demonstrated in the paper. -" -6822,1803.00902,Duygu Altinok,"DEMorphy, German Language Morphological Analyzer",cs.CL," DEMorphy is a morphological analyzer for German. It is built onto large, -compactified lexicons from German Morphological Dictionary. A guesser based on -German declension suffixed is also provided. For German, we provided a -state-of-art morphological analyzer. DEMorphy is implemented in Python with -ease of usability and accompanying documentation. The package is suitable for -both academic and commercial purposes wit a permissive licence. -" -6823,1803.00985,"Henrique X. Goulart, Mauro D. L. Tosi, Daniel Soares Gon\c{c}alves, - Rodrigo F. Maia, Guilherme A. 
Wachs-Lopes","Hybrid Model For Word Prediction Using Naive Bayes and Latent - Information",cs.CL," Historically, the Natural Language Processing area has been given too much -attention by many researchers. One of the main motivation beyond this interest -is related to the word prediction problem, which states that given a set words -in a sentence, one can recommend the next word. In literature, this problem is -solved by methods based on syntactic or semantic analysis. Solely, each of -these analysis cannot achieve practical results for end-user applications. For -instance, the Latent Semantic Analysis can handle semantic features of text, -but cannot suggest words considering syntactical rules. On the other hand, -there are models that treat both methods together and achieve state-of-the-art -results, e.g. Deep Learning. These models can demand high computational effort, -which can make the model infeasible for certain types of applications. With the -advance of the technology and mathematical models, it is possible to develop -faster systems with more accuracy. This work proposes a hybrid word suggestion -model, based on Naive Bayes and Latent Semantic Analysis, considering -neighbouring words around unfilled gaps. Results show that this model could -achieve 44.2% of accuracy in the MSR Sentence Completion Challenge. -" -6824,1803.01090,"Zhehuai Chen, Qi Liu, Hao Li, Kai Yu",On Modular Training of Neural Acoustics-to-Word Model for LVCSR,cs.CL," End-to-end (E2E) automatic speech recognition (ASR) systems directly map -acoustics to words using a unified model. Previous works mostly focus on E2E -training a single model which integrates acoustic and language model into a -whole. Although E2E training benefits from sequence modeling and simplified -decoding pipelines, large amount of transcribed acoustic data is usually -required, and traditional acoustic and language modelling techniques cannot be -utilized. In this paper, a novel modular training framework of E2E ASR is -proposed to separately train neural acoustic and language models during -training stage, while still performing end-to-end inference in decoding stage. -Here, an acoustics-to-phoneme model (A2P) and a phoneme-to-word model (P2W) are -trained using acoustic data and text data respectively. A phone synchronous -decoding (PSD) module is inserted between A2P and P2W to reduce sequence -lengths without precision loss. Finally, modules are integrated into an -acousticsto-word model (A2W) and jointly optimized using acoustic data to -retain the advantage of sequence modeling. Experiments on a 300- hour -Switchboard task show significant improvement over the direct A2W model. The -efficiency in both training and decoding also benefits from the proposed -method. -" -6825,1803.01165,"Yizhong Wang, Sujian Li, Jingfeng Yang, Xu Sun, Houfeng Wang","Tag-Enhanced Tree-Structured Neural Networks for Implicit Discourse - Relation Classification",cs.CL," Identifying implicit discourse relations between text spans is a challenging -task because it requires understanding the meaning of the text. To tackle this -task, recent studies have tried several deep learning methods but few of them -exploited the syntactic information. In this work, we explore the idea of -incorporating syntactic parse tree into neural networks. Specifically, we -employ the Tree-LSTM model and Tree-GRU model, which are based on the tree -structure, to encode the arguments in a relation. 
-Moreover, we further leverage the constituent tags to control the semantic
-composition process in these tree-structured neural networks. Experimental
-results show that our method achieves state-of-the-art performance on the
-PDTB corpus.
-"
-6826,1803.01255,"Haoyue Shi, Yuqi Sun, Junfeng Hu","Understanding and Improving Multi-Sense Word Embeddings via Extended
- Robust Principal Component Analysis",cs.CL," Unsupervised learned representations of polysemous words generate a large
-number of pseudo multi-senses, since unsupervised methods are overly
-sensitive to contextual variations. In this paper, we address pseudo
-multi-sense detection for word embeddings by dimensionality reduction of
-sense pairs. We propose a novel principal component analysis method, termed
-Ex-RPCA, designed to detect both pseudo multi-senses and real multi-senses.
-With Ex-RPCA, we empirically show that pseudo multi-senses are generated
-systematically by unsupervised methods. Moreover, multi-sense word embeddings
-can be improved by a simple linear transformation based on Ex-RPCA. Our
-improved word embeddings outperform the original ones by 5.6 points on the
-Stanford contextual word similarity (SCWS) dataset. We hope our simple yet
-effective approach will help the linguistic analysis of multi-sense word
-embeddings in the future.
-"
-6827,1803.01271,"Shaojie Bai, J. Zico Kolter, Vladlen Koltun","An Empirical Evaluation of Generic Convolutional and Recurrent Networks
- for Sequence Modeling",cs.LG cs.AI cs.CL," For most deep learning practitioners, sequence modeling is synonymous with
-recurrent networks. Yet recent results indicate that convolutional
-architectures can outperform recurrent networks on tasks such as audio
-synthesis and machine translation. Given a new sequence modeling task or
-dataset, which architecture should one use? We conduct a systematic
-evaluation of generic convolutional and recurrent architectures for sequence
-modeling. The models are evaluated across a broad range of standard tasks
-that are commonly used to benchmark recurrent networks. Our results indicate
-that a simple convolutional architecture outperforms canonical recurrent
-networks such as LSTMs across a diverse range of tasks and datasets, while
-demonstrating longer effective memory. We conclude that the common
-association between sequence modeling and recurrent networks should be
-reconsidered, and convolutional networks should be regarded as a natural
-starting point for sequence modeling tasks. To assist related work, we have
-made code available at http://github.com/locuslab/TCN .
-"
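The core building block behind such convolutional sequence models is a causal, dilated 1-D convolution. The sketch below (PyTorch) shows one hedged way to realise it; the channel counts and dilations are illustrative, and the repository linked above contains the full residual architecture.

```python
# Causal dilated convolution block in the spirit of a TCN (sketch).
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that never looks at future time steps."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, time)
        out = self.conv(x)
        return out[:, :, :-self.pad]           # trim the look-ahead padding

block = nn.Sequential(CausalConv1d(16, dilation=1), nn.ReLU(),
                      CausalConv1d(16, dilation=2), nn.ReLU())
print(block(torch.rand(4, 16, 50)).shape)      # -> torch.Size([4, 16, 50])
```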
-" -6829,1803.01400,"Andreas R\""uckl\'e, Steffen Eger, Maxime Peyrard, Iryna Gurevych","Concatenated Power Mean Word Embeddings as Universal Cross-Lingual - Sentence Representations",cs.CL," Average word embeddings are a common baseline for more sophisticated sentence -embedding techniques. However, they typically fall short of the performances of -more complex models such as InferSent. Here, we generalize the concept of -average word embeddings to power mean word embeddings. We show that the -concatenation of different types of power mean word embeddings considerably -closes the gap to state-of-the-art methods monolingually and substantially -outperforms these more complex techniques cross-lingually. In addition, our -proposed method outperforms different recently proposed baselines such as SIF -and Sent2Vec by a solid margin, thus constituting a much harder-to-beat -monolingual baseline. Our data and code are publicly available. -" -6830,1803.01465,"Shuming Ma, Xu Sun, Wei Li, Sujian Li, Wenjie Li, Xuancheng Ren","Query and Output: Generating Words by Querying Distributed Word - Representations for Paraphrase Generation",cs.CL cs.LG," Most recent approaches use the sequence-to-sequence model for paraphrase -generation. The existing sequence-to-sequence model tends to memorize the words -and the patterns in the training dataset instead of learning the meaning of the -words. Therefore, the generated sentences are often grammatically correct but -semantically improper. In this work, we introduce a novel model based on the -encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our -proposed model generates the words by querying distributed word representations -(i.e. neural word embeddings), hoping to capturing the meaning of the according -words. Following previous work, we evaluate our model on two -paraphrase-oriented tasks, namely text simplification and short text -abstractive summarization. Experimental results show that our model outperforms -the sequence-to-sequence baseline by the BLEU score of 6.3 and 5.5 on two -English text simplification datasets, and the ROUGE-2 F1 score of 5.7 on a -Chinese summarization dataset. Moreover, our model achieves state-of-the-art -performances on these three benchmark datasets. -" -6831,1803.01557,"Zhiyuan Zhang, Wei Li, Qi Su","Automatic Translating between Ancient Chinese and Contemporary Chinese - with Limited Aligned Corpora",cs.CL," The Chinese language has evolved a lot during the long-term development. -Therefore, native speakers now have trouble in reading sentences written in -ancient Chinese. In this paper, we propose to build an end-to-end neural model -to automatically translate between ancient and contemporary Chinese. However, -the existing ancient-contemporary Chinese parallel corpora are not aligned at -the sentence level and sentence-aligned corpora are limited, which makes it -difficult to train the model. To build the sentence level parallel training -data for the model, we propose an unsupervised algorithm that constructs -sentence-aligned ancient-contemporary pairs by using the fact that the aligned -sentence pair shares many of the tokens. Based on the aligned corpus, we -propose an end-to-end neural model with copying mechanism and local attention -to translate between ancient and contemporary Chinese. 
-6830,1803.01465,"Shuming Ma, Xu Sun, Wei Li, Sujian Li, Wenjie Li, Xuancheng Ren","Query and Output: Generating Words by Querying Distributed Word
- Representations for Paraphrase Generation",cs.CL cs.LG," Most recent approaches use the sequence-to-sequence model for paraphrase
-generation. The existing sequence-to-sequence model tends to memorize the
-words and the patterns in the training dataset instead of learning the
-meaning of the words. Therefore, the generated sentences are often
-grammatically correct but semantically improper. In this work, we introduce a
-novel model based on the encoder-decoder framework, called Word Embedding
-Attention Network (WEAN). Our proposed model generates words by querying
-distributed word representations (i.e. neural word embeddings), hoping to
-capture the meaning of the corresponding words. Following previous work, we
-evaluate our model on two paraphrase-oriented tasks, namely text
-simplification and short text abstractive summarization. Experimental results
-show that our model outperforms the sequence-to-sequence baseline by BLEU
-scores of 6.3 and 5.5 on two English text simplification datasets, and by a
-ROUGE-2 F1 score of 5.7 on a Chinese summarization dataset. Moreover, our
-model achieves state-of-the-art performance on these three benchmark
-datasets.
-"
-6831,1803.01557,"Zhiyuan Zhang, Wei Li, Qi Su","Automatic Translating between Ancient Chinese and Contemporary Chinese
- with Limited Aligned Corpora",cs.CL," The Chinese language has evolved substantially over its long history. As a
-result, native speakers now have trouble reading sentences written in ancient
-Chinese. In this paper, we propose to build an end-to-end neural model to
-automatically translate between ancient and contemporary Chinese. However,
-the existing ancient-contemporary Chinese parallel corpora are not aligned at
-the sentence level, and sentence-aligned corpora are limited, which makes it
-difficult to train the model. To build sentence-level parallel training data
-for the model, we propose an unsupervised algorithm that constructs
-sentence-aligned ancient-contemporary pairs by using the fact that an aligned
-sentence pair shares many of its tokens. Based on the aligned corpus, we
-propose an end-to-end neural model with a copying mechanism and local
-attention to translate between ancient and contemporary Chinese. Experiments
-show that the proposed unsupervised algorithm achieves a 99.4% F1 score for
-sentence alignment, and the translation model achieves 26.95 BLEU from
-ancient to contemporary, and 36.34 BLEU from contemporary to ancient.
-"
-6832,1803.01580,"Andrew Krizhanovsky, Alexander Kirillov",Calculated attributes of synonym sets,cs.CL cs.IR," The goal of the formalization proposed in this paper is to bring together,
-as closely as possible, the theoretical linguistic problem of synonym
-conception and computational linguistic methods that are generally based on
-empirical, intuitive, unjustified factors. Using word vector representations,
-we propose a geometric approach to the mathematical modeling of a synonym set
-(synset). The word embeddings are based on neural networks (Skip-gram, CBOW),
-developed and realized in the word2vec program by T. Mikolov. The standard
-cosine similarity is used as the distance between word vectors. Several
-geometric characteristics of synset words are introduced: the interior of a
-synset, and synset word rank and centrality. These notions are intended to
-select the most significant synset words, i.e. the words whose senses are
-nearest to the sense of the synset. Some experiments with the proposed
-notions, based on RusVectores resources, are presented. A brief description
-of this work can be viewed in slides https://goo.gl/K82Fei
-"
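The centrality notion above can be illustrated with plain cosine similarity: rank each synset member by its average similarity to the other members. The sketch below uses random vectors as stand-ins for word2vec embeddings and is not the authors' exact definition.

```python
# Toy synset-centrality ranking via mean pairwise cosine similarity.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def centrality(vectors):
    n = len(vectors)
    return [np.mean([cosine(vectors[i], vectors[j])
                     for j in range(n) if j != i]) for i in range(n)]

synset = {w: np.random.rand(100) for w in ["big", "large", "huge", "vast"]}
ranked = sorted(zip(centrality(list(synset.values())), synset), reverse=True)
print(ranked[0][1])   # the most "central" member under this toy measure
```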
-6833,1803.01686,"Yuanhang Su, C.-C. Jay Kuo","On Extended Long Short-term Memory and Dependent Bidirectional Recurrent
- Neural Network",cs.LG cs.CL cs.NE stat.ML," In this work, we first analyze the memory behavior of three recurrent
-neural network (RNN) cells, namely the simple RNN (SRN), the long short-term
-memory (LSTM) and the gated recurrent unit (GRU), where the memory is defined
-as a function that maps previous elements in a sequence to the current
-output. Our study shows that all three of them suffer from rapid memory
-decay. Then, to alleviate this effect, we introduce trainable scaling factors
-that act like an attention mechanism to adjust memory decay adaptively. The
-new design is called the extended LSTM (ELSTM). Finally, to design a system
-that is robust to previous erroneous predictions, we propose a dependent
-bidirectional recurrent neural network (DBRNN). Extensive experiments are
-conducted on different language tasks to demonstrate the superiority of the
-proposed ELSTM and DBRNN solutions. The ELSTM has achieved up to a 30%
-increase in the labeled attachment score (LAS) as compared to LSTM and GRU in
-the dependency parsing (DP) task. Our models also outperform other
-state-of-the-art models such as bi-attention and convolutional sequence to
-sequence (convseq2seq) by close to 10% in the LAS. The code is released as
-open source (https://github.com/yuanhangsu/ELSTM-DBRNN)
-"
-6834,1803.01707,"Benjamin Roth, Costanza Conforti, Nina Poerner, Sanjeev Karn and
- Hinrich Sch\""utze",Neural Architectures for Open-Type Relation Argument Extraction,cs.CL," In this work, we introduce the task of Open-Type Relation Argument
-Extraction (ORAE): given a corpus, a query entity Q and a knowledge base
-relation (e.g., ""Q authored notable work with title X""), the model has to
-extract an argument of non-standard entity type (entities that cannot be
-extracted by a standard named entity tagger, e.g. X: the title of a book or a
-work of art) from the corpus. A distantly supervised dataset based on
-WikiData relations is obtained and released to address the task.
- We develop and compare a wide range of neural models for this task,
-yielding large improvements over a strong baseline obtained with a neural
-question answering system. The impact of different sentence encoding
-architectures and answer extraction methods is systematically compared. An
-encoder based on gated recurrent units combined with a conditional random
-field tagger gives the best results.
-"
-6835,1803.01934,Lu\'is F Seoane and Ricard Sol\'e,The morphospace of language networks,physics.soc-ph cs.CL," Language can be described as a network of interacting objects with
-different qualitative properties and complexity. These networks include
-semantic, syntactic, or phonological levels and have been found to provide a
-new picture of language complexity and its evolution. A general approach
-considers language from an information theory perspective that incorporates a
-speaker, a hearer, and a noisy channel. The latter is often encoded in a
-matrix connecting the signals used for communication with the meanings to be
-found in the real world. Most studies of language evolution deal in one way
-or another with such a theoretical contraption and explore the outcome of
-diverse forms of selection on a communication matrix that somewhat optimizes
-communication. This framework naturally introduces networks mediating the
-communicating agents, but no systematic analysis of the underlying landscape
-of possible language graphs has been developed. Here we present a detailed
-analysis of network properties on a generic model of a communication code,
-which reveals a rather complex and heterogeneous morphospace of language
-networks. Additionally, we use curated data of English words to locate and
-evaluate real languages within this language morphospace. Our findings
-indicate a surprisingly simple structure in human language, unless particles
-with the ability to name any other concept are introduced into the
-vocabulary. These results refine, and for the first time complement with
-empirical data, a lasting theoretical tradition around the framework of
-\emph{least effort language}.
-"
-6836,1803.01937,Kavita Ganesan,"ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization
- Tasks",cs.IR cs.AI cs.CL," Evaluation of summarization tasks is extremely crucial for determining the
-quality of machine-generated summaries. Over the last decade, ROUGE has
-become the standard automatic evaluation measure for summarization tasks.
-While ROUGE has been shown to be effective in capturing n-gram overlap
-between system and human-composed summaries, the existing ROUGE measures have
-several limitations in capturing synonymous concepts and coverage of topics.
-Thus, ROUGE scores often do not reflect the true quality of summaries and
-prevent multi-faceted evaluation of summaries (i.e., by topic, by overall
-content coverage, etc.). In this paper, we introduce ROUGE 2.0, which has
-several updated measures of ROUGE: ROUGE-N+Synonyms, ROUGE-Topic,
-ROUGE-Topic+Synonyms, ROUGE-TopicUniq and ROUGE-TopicUniq+Synonyms, all of
-which are improvements over the core ROUGE measures.
-"
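A toy version of the ROUGE-N+Synonyms idea conveys the mechanics: normalise tokens through a synonym table before computing n-gram recall. The two-entry synonym map below is purely illustrative, and real ROUGE counts n-grams with multiplicity rather than as sets.

```python
# Simplified ROUGE-N recall with synonym normalisation (sketch).
SYNONYMS = {"car": "automobile", "quick": "fast"}

def normalise(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def rouge_n(system, reference, n=1):
    sys_n = ngrams(normalise(system.lower().split()), n)
    ref_n = ngrams(normalise(reference.lower().split()), n)
    return len(sys_n & ref_n) / len(ref_n)     # recall over reference n-grams

print(rouge_n("the automobile was fast", "the car was quick"))  # -> 1.0
```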
-6837,1803.02088,"Francisco J. Chiyah Garcia, David A. Robb, Xingkun Liu, Atanas Laskov,
- Pedro Patron and Helen Hastie","Explain Yourself: A Natural Language Interface for Scrutable Autonomous
- Robots",cs.CL cs.AI cs.HC," Autonomous systems in remote locations have a high degree of autonomy, and
-there is a need to explain what they are doing and why, in order to increase
-transparency and maintain trust. Here, we describe a natural language chat
-interface that enables vehicle behaviour to be queried by the user. We obtain
-an interpretable model of autonomy through having an expert 'speak out loud'
-and provide explanations during a mission. This approach is agnostic to the
-type of autonomy model, and as the expert and the operator are from the same
-user group, we predict that these explanations will align well with the
-operator's mental model, increase transparency and assist with operator
-training.
-"
-6838,1803.02155,"Peter Shaw, Jakob Uszkoreit, Ashish Vaswani",Self-Attention with Relative Position Representations,cs.CL," Relying entirely on an attention mechanism, the Transformer introduced by
-Vaswani et al. (2017) achieves state-of-the-art results for machine
-translation. In contrast to recurrent and convolutional neural networks, it
-does not explicitly model relative or absolute position information in its
-structure. Instead, it requires adding representations of absolute positions
-to its inputs. In this work we present an alternative approach, extending the
-self-attention mechanism to efficiently consider representations of the
-relative positions, or distances between sequence elements. On the WMT 2014
-English-to-German and English-to-French translation tasks, this approach
-yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position
-representations, respectively. Notably, we observe that combining relative
-and absolute position representations yields no further improvement in
-translation quality. We describe an efficient implementation of our method
-and cast it as an instance of relation-aware self-attention mechanisms that
-can generalize to arbitrary graph-labeled inputs.
-"
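The mechanism above modifies the attention logits with learned relative-position embeddings, e_ij = q_i (k_j + a_{j-i})^T / sqrt(d). A compact PyTorch sketch follows; the dimensions are illustrative, the relative embeddings on the value side are omitted, and the published implementation shares and clips embeddings more carefully.

```python
# Single-head self-attention with relative key embeddings (sketch).
import torch
import torch.nn.functional as F

seq_len, d = 5, 8
x = torch.rand(seq_len, d)                     # token representations
Wq, Wk, Wv = (torch.rand(d, d) for _ in range(3))
max_dist = 3                                   # clip relative distances
a_k = torch.rand(2 * max_dist + 1, d)          # relative key embeddings

q, k, v = x @ Wq, x @ Wk, x @ Wv
idx = torch.arange(seq_len)
rel = (idx[None, :] - idx[:, None]).clamp(-max_dist, max_dist) + max_dist
# e_ij = q_i . (k_j + a_{j-i}) / sqrt(d)
scores = (q @ k.T + torch.einsum("id,ijd->ij", q, a_k[rel])) / d ** 0.5
out = F.softmax(scores, dim=-1) @ v
print(out.shape)                               # torch.Size([5, 8])
```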
-6839,1803.02205,Vasiliki Efstathiou and Diomidis Spinellis,Code Review Comments: Language Matters,cs.SE cs.CL," Recent research provides evidence that effective communication in
-collaborative software development has a significant impact on the software
-development lifecycle. Although related qualitative and quantitative studies
-point out textual characteristics of well-formed messages, the underlying
-semantics of the intertwined linguistic structures still remain largely
-misinterpreted or ignored. In particular, regarding the quality of code
-reviews, the importance of thorough feedback and explicit rationale is often
-mentioned but rarely linked with related linguistic features. As a first step
-towards addressing this shortcoming, we propose grounding these studies on
-theories of linguistics. We particularly focus on linguistic structures of
-coherent speech and explain how they can be exploited in practice. We reflect
-on related approaches and examine, through a preliminary study on four open
-source projects, possible links between existing findings and the directions
-we suggest for detecting textual features of useful code reviews.
-"
-6840,1803.02238,"Ivan Gavran, Brendon Boldt, Eva Darulova, Rupak Majumdar",Precise but Natural Specification for Robot Tasks,cs.RO cs.CL cs.SY," We present Flipper, a natural language interface for describing high-level
-task specifications for robots that are compiled into robot actions. Flipper
-starts with a formal core language for task planning that allows expressing
-rich temporal specifications and uses a semantic parser to provide a natural
-language interface. Flipper provides immediate visual feedback by executing
-an automatically constructed plan of the task in a graphical user interface.
-This allows the user to resolve potentially ambiguous interpretations.
-Flipper extends itself via naturalization: its users can add definitions for
-utterances, from which Flipper induces new rules and adds them to the core
-language, gradually growing a more and more natural task specification
-language. Flipper improves the naturalization by generalizing the definitions
-provided by users. Unlike other task-specification systems, Flipper enables
-natural language interactions while maintaining the expressive power and
-formal precision of a programming language. We show through an initial user
-study that natural language interactions and generalization can considerably
-ease the description of tasks. Moreover, over time, users employ more and
-more concepts outside of the initial core language. Such extensions are
-available to the Flipper community, and users can use concepts that others
-have defined.
-"
-6841,1803.02245,"Willie Boag, Elena Sergeeva, Saurabh Kulshreshtha, Peter Szolovits,
- Anna Rumshisky, Tristan Naumann",CliNER 2.0: Accessible and Accurate Clinical Concept Extraction,cs.CL," Clinical notes often describe important aspects of a patient's stay and are
-therefore critical to medical research. Clinical concept extraction (CCE) of
-named entities - such as problems, tests, and treatments - aids in forming an
-understanding of notes and provides a foundation for many downstream clinical
-decision-making tasks. Historically, this task has been posed as a standard
-named entity recognition (NER) sequence tagging problem, and solved with
-feature-based methods using hand-engineered domain knowledge. Recent
-advances, however, have demonstrated the efficacy of LSTM-based models for
-NER tasks, including CCE. This work presents CliNER 2.0, a simple-to-install,
-open-source tool for extracting concepts from clinical text. CliNER 2.0 uses
-a word- and character-level LSTM model, and achieves state-of-the-art
-performance. For ease of use, the tool also includes pre-trained models
-available for public use.
-"
-6842,1803.02279,"Stefan Constantin, Jan Niehues, and Alex Waibel","An End-to-End Goal-Oriented Dialog System with a Generative Natural
- Language Response Generation",cs.CL," Recent advancements in deep learning have allowed the development of
-end-to-end trained goal-oriented dialog systems. Although these systems
-already achieve good performance, some simplifications limit their usage in
-real-life scenarios.
- In this work, we address two of these limitations: ignoring positional
-information and a fixed number of possible response candidates. We propose to
-use positional encodings in the input to model the word order of the user
-utterances. Furthermore, by using a feedforward neural network, we are able
-to generate the output word by word and are no longer restricted to a fixed
-number of possible response candidates. Using the positional encodings, we
-were able to achieve better accuracies in the Dialog bAbI Tasks, and using
-the feedforward neural network for generating the response, we were able to
-save computation time and space consumption.
-"
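One common way to realise the positional encodings mentioned above is the sinusoidal scheme of Vaswani et al. (2017); the paper may use a different variant, so treat this as a generic illustration.

```python
# Sinusoidal positional encodings added to toy token embeddings (sketch).
import numpy as np

def positional_encoding(length, d_model):
    pos = np.arange(length)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

utterance = np.random.rand(7, 16)               # 7 tokens, 16-dim embeddings
encoded = utterance + positional_encoding(7, 16)
print(encoded.shape)                            # (7, 16)
```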
Smith",Annotation Artifacts in Natural Language Inference Data,cs.CL cs.AI," Large-scale datasets for natural language inference are created by presenting -crowd workers with a sentence (premise), and asking them to generate three new -sentences (hypotheses) that it entails, contradicts, or is logically neutral -with respect to. We show that, in a significant portion of such data, this -protocol leaves clues that make it possible to identify the label by looking -only at the hypothesis, without observing the premise. Specifically, we show -that a simple text categorization model can correctly classify the hypothesis -alone in about 67% of SNLI (Bowman et. al, 2015) and 53% of MultiNLI (Williams -et. al, 2017). Our analysis reveals that specific linguistic phenomena such as -negation and vagueness are highly correlated with certain inference classes. -Our findings suggest that the success of natural language inference models to -date has been overestimated, and that the task remains a hard open problem. -" -6844,1803.02392,"Francesco Barbieri, Miguel Ballesteros, Francesco Ronzano, Horacio - Saggion",Multimodal Emoji Prediction,cs.CL," Emojis are small images that are commonly included in social media text -messages. The combination of visual and textual content in the same message -builds up a modern way of communication, that automatic systems are not used to -deal with. In this paper we extend recent advances in emoji prediction by -putting forward a multimodal approach that is able to predict emojis in -Instagram posts. Instagram posts are composed of pictures together with texts -which sometimes include emojis. We show that these emojis can be predicted by -using the text, but also using the picture. Our main finding is that -incorporating the two synergistic modalities, in a combined model, improves -accuracy in an emoji prediction task. This result demonstrates that these two -modalities (text and images) encode different information on the use of emojis -and therefore can complement each other. -" -6845,1803.02400,"Po-Sen Huang, Chenglong Wang, Rishabh Singh, Wen-tau Yih, Xiaodong He",Natural Language to Structured Query Generation via Meta-Learning,cs.CL cs.LG," In conventional supervised training, a model is trained to fit all the -training examples. However, having a monolithic model may not always be the -best strategy, as examples could vary widely. In this work, we explore a -different learning protocol that treats each example as a unique pseudo-task, -by reducing the original learning problem to a few-shot meta-learning scenario -with the help of a domain-dependent relevance function. When evaluated on the -WikiSQL dataset, our approach leads to faster convergence and achieves -1.1%-5.4% absolute accuracy gains over the non-meta-learning counterparts. -" -6846,1803.02551,Wei-Ning Hsu and James Glass,"Extracting Domain Invariant Features by Unsupervised Learning for Robust - Automatic Speech Recognition",cs.CL cs.LG cs.SD eess.AS stat.ML," The performance of automatic speech recognition (ASR) systems can be -significantly compromised by previously unseen conditions, which is typically -due to a mismatch between training and testing distributions. In this paper, we -address robustness by studying domain invariant features, such that domain -information becomes transparent to ASR systems, resolving the mismatch problem. -Specifically, we investigate a recent model, called the Factorized Hierarchical -Variational Autoencoder (FHVAE). 
-FHVAEs learn to factorize sequence-level and segment-level attributes into
-different latent variables without supervision. We argue that the set of
-latent variables that contain segment-level information is our desired
-domain-invariant feature for ASR. Experiments are conducted on Aurora-4 and
-CHiME-4, which demonstrate 41% and 27% absolute word error rate reductions
-respectively on mismatched domains.
-"
-6847,1803.02632,"Wenfeng Feng, Hankz Hankui Zhuo, Subbarao Kambhampati","Extracting Action Sequences from Texts Based on Deep Reinforcement
- Learning",cs.AI cs.CL," Extracting action sequences from natural language texts is challenging, as
-it requires commonsense inferences based on world knowledge. Although there
-has been work on extracting action scripts, instructions, navigation actions,
-etc., such approaches require that either the set of candidate actions be
-provided in advance, or that action descriptions be restricted to a specific
-form, e.g., description templates. In this paper, we aim to extract action
-sequences from texts in free natural language, i.e., without any restricted
-templates, when the candidate set of actions is unknown. We propose to
-extract action sequences from texts based on the deep reinforcement learning
-framework. Specifically, we view ""selecting"" or ""eliminating"" words from
-texts as ""actions"", and the texts associated with actions as ""states"". We
-then build Q-networks to learn the policy of extracting actions and extract
-plans from the labeled texts. We demonstrate the effectiveness of our
-approach on several datasets with comparison to state-of-the-art approaches,
-including online experiments interacting with humans.
-"
-6848,1803.02710,"Yikang Shen, Shawn Tan, Chin-Wei Huang, Aaron Courville","Generating Contradictory, Neutral, and Entailing Sentences",cs.CL cs.AI," Learning distributed sentence representations remains an interesting
-problem in the field of Natural Language Processing (NLP). We want to learn a
-model that approximates the conditional latent space over the representations
-of a logical antecedent of the given statement. In our paper, we propose an
-approach to generating sentences, conditioned on an input sentence and a
-logical inference label. We do this by modeling the different possibilities
-for the output sentence as a distribution over the latent representation,
-which we train using an adversarial objective. We evaluate the model using
-two state-of-the-art models for the Recognizing Textual Entailment (RTE)
-task, and measure the BLEU scores against the actual sentences as a probe for
-the diversity of sentences produced by our model. The experimental results
-show that, given our framework, we have clear ways to improve the quality and
-diversity of generated sentences.
-"
-6849,1803.02728,"Willie Boag, Tristan Naumann, Peter Szolovits","Towards the Creation of a Large Corpus of Synthetically-Identified
- Clinical Notes",cs.CL cs.CY," Clinical notes often describe the most important aspects of a patient's
-physiology and are therefore critical to medical research. However, these
-notes are typically inaccessible to researchers without prior removal of
-sensitive protected health information (PHI), a natural language processing
-(NLP) task referred to as deidentification. Tools to automatically
-de-identify clinical notes are needed, but are difficult to create without
-access to those very same notes containing PHI.
This work presents a first step toward creating a large
-synthetically-identified corpus of clinical notes and corresponding PHI
-annotations in order to facilitate the development of de-identification tools.
-Further, one such tool is evaluated against this corpus in order to understand
-the advantages and shortcomings of this approach.
-"
-6850,1803.02839,Sean A. Cantrell,The emergent algebraic structure of RNNs and embeddings in NLP,cs.CL cs.AI stat.ML," We examine the algebraic and geometric properties of a uni-directional GRU
-and word embeddings trained end-to-end on a text classification task. A
-hyperparameter search over word embedding dimension, GRU hidden dimension, and
-a linear combination of the GRU outputs is performed. We conclude that words
-naturally embed themselves in a Lie group and that RNNs form a nonlinear
-representation of the group. Appealing to these results, we propose a novel
-class of recurrent-like neural networks and a word embedding scheme.
-"
-6851,1803.02893,"Lajanugen Logeswaran, Honglak Lee",An efficient framework for learning sentence representations,cs.CL cs.AI cs.LG," In this work we propose a simple and efficient framework for learning
-sentence representations from unlabelled data. Drawing inspiration from the
-distributional hypothesis and recent work on learning sentence representations,
-we reformulate the problem of predicting the context in which a sentence
-appears as a classification problem. Given a sentence and its context, a
-classifier distinguishes context sentences from other contrastive sentences
-based on their vector representations. This allows us to efficiently learn
-different types of encoding functions, and we show that the model learns
-high-quality sentence representations. We demonstrate that our sentence
-representations outperform state-of-the-art unsupervised and supervised
-representation learning methods on several downstream NLP tasks that involve
-understanding sentence semantics while achieving an order of magnitude speedup
-in training time.
-"
-6852,1803.02914,Mihael Arcan,Translating Questions into Answers using DBPedia n-triples,cs.CL," In this paper we present a question answering system using a neural network
-to interpret questions learned from the DBpedia repository. We train a
-sequence-to-sequence neural network model with n-triples extracted from the
-DBpedia Infobox Properties. Since these properties do not represent natural
-language, we further used question-answer dialogues from movie subtitles.
-Although the automatic evaluation shows a low overlap of the generated answers
-compared to the gold standard set, a manual inspection of the results showed
-promising outcomes from the experiment that motivate further work.
-"
-6853,1803.02994,"Linli Xu, Liang Jiang, Chuan Qin, Zhe Wang, Dongfang Du","How Images Inspire Poems: Generating Classical Chinese Poetry from
- Images with Memory Networks",cs.CL," With recent advances in neural models and natural language processing,
-automatic generation of classical Chinese poetry has drawn significant
-attention due to its artistic and cultural value. Previous works mainly focus
-on generating poetry given keywords or other text information, while visual
-inspirations for poetry have been rarely explored. Generating poetry from
-images is much more challenging than generating poetry from text, since images
-contain very rich visual information which cannot be described completely using
-several keywords, and a good poem should convey the image accurately.
In this
-paper, we propose a memory-based neural model that exploits images to generate
-poems. Specifically, an Encoder-Decoder model with a topic memory network is
-proposed to generate classical Chinese poetry from images. To the best of our
-knowledge, this is the first work attempting to generate classical Chinese
-poetry from images with neural networks. A comprehensive experimental
-investigation with both human evaluation and quantitative analysis demonstrates
-that the proposed model can generate poems which convey images accurately.
-"
-6854,1803.03018,"Heishiro Kanagawa, Hayato Kobayashi, Nobuyuki Shimizu, Yukihiro Tagami
- and Taiji Suzuki",Cross-domain Recommendation via Deep Domain Adaptation,cs.LG cs.CL cs.IR," The behavior of users in certain services could be a clue that can be used to
-infer their preferences and may be used to make recommendations for other
-services they have never used. However, the cross-domain relationships between
-items and user consumption patterns are not simple, especially when there are
-few or no common users and items across domains. To address this problem, we
-propose a content-based cross-domain recommendation method for cold-start users
-that does not require user- and item-overlap. We formulate recommendation as
-extreme multi-class classification where labels (items) corresponding to the
-users are predicted. With this formulation, the problem is reduced to a domain
-adaptation setting, in which a classifier trained in the source domain is
-adapted to the target domain. For this, we construct a neural network that
-combines an architecture for domain adaptation, Domain Separation Network, with
-a denoising autoencoder for item representation. We assess the performance of
-our approach in experiments on a pair of data sets collected from movie and
-news services of Yahoo! JAPAN and show that our approach outperforms several
-baseline methods, including a cross-domain collaborative filtering method.
-"
-6855,1803.03178,"Tsvetomila Mihaylova, Preslav Nakov, Lluis Marquez, Alberto
- Barron-Cedeno, Mitra Mohtarami, Georgi Karadzhov, James Glass",Fact Checking in Community Forums,cs.CL," Community Question Answering (cQA) forums are very popular nowadays, as they
-represent effective means for communities around particular topics to share
-information. Unfortunately, this information is not always factual. Thus, here
-we explore a new dimension in the context of cQA, which has been ignored so
-far: checking the veracity of answers to particular questions in cQA forums. As
-this is a new problem, we create a specialized dataset for it. We further
-propose a novel multi-faceted model, which captures information from the answer
-content (what is said and how), from the author profile (who says it), from the
-rest of the community forum (where it is said), and from external authoritative
-sources of information (external support). Evaluation results show a MAP value
-of 86.54, which is 21 points absolute above the baseline.
-"
-6856,1803.03232,"I\~nigo Casanueva, Pawe{\l} Budzianowski, Pei-Hao Su, Stefan Ultes,
- Lina Rojas-Barahona, Bo-Hsiang Tseng and Milica Ga\v{s}i\'c",Feudal Reinforcement Learning for Dialogue Management in Large Domains,cs.CL cs.AI cs.NE," Reinforcement learning (RL) is a promising approach to solving dialogue
-policy optimisation. Traditional RL algorithms, however, fail to scale to large
-domains due to the curse of dimensionality.
We propose a novel Dialogue
-Management architecture, based on Feudal RL, which decomposes the decision into
-two steps: a first step where a master policy selects a subset of primitive
-actions, and a second step where a primitive action is chosen from the selected
-subset. The structural information included in the domain ontology is used to
-abstract the dialogue state space, taking the decisions at each step using
-different parts of the abstracted state. This, combined with an information
-sharing mechanism between slots, increases the scalability to large domains. We
-show that an implementation of this approach, based on Deep-Q Networks,
-significantly outperforms previous state of the art in several dialogue domains
-and environments, without the need for any additional reward signal.
-"
-6857,1803.03370,"Huan Gui, Qi Zhu, Liyuan Liu, Aston Zhang, Jiawei Han","Expert Finding in Heterogeneous Bibliographic Networks with
- Locally-trained Embeddings",cs.IR cs.AI cs.CL cs.SI," Expert finding is an important task in both industry and academia. It is
-challenging to rank candidates with appropriate expertise for various queries.
-In addition, different types of objects interact with one another, which
-naturally forms heterogeneous information networks. We study the task of expert
-finding in heterogeneous bibliographical networks based on two aspects: textual
-content analysis and authority ranking. Regarding the textual content analysis,
-we propose a new method for query expansion via locally-trained embedding
-learning with concept hierarchy as guidance, which is particularly tailored for
-specific queries with narrow semantic meanings. Compared with global embedding
-learning, locally-trained embedding learning projects the terms into a latent
-semantic space constrained on relevant topics; therefore, it preserves more
-precise and subtle information for specific queries. Considering the candidate
-ranking, the heterogeneous information network structure, while being largely
-ignored in the previous studies of expert finding, provides additional
-information. Specifically, different types of interactions among objects play
-different roles. We propose a ranking algorithm to estimate the authority of
-objects in the network, treating each strongly-typed edge type individually. To
-demonstrate the effectiveness of the proposed framework, we apply the proposed
-method to a large-scale bibliographical dataset with over two million entries
-and one million researcher candidates. The experimental results show that the
-proposed framework outperforms existing methods for both general and specific
-queries.
-"
-6858,1803.03376,"Lifu Tu, Kevin Gimpel",Learning Approximate Inference Networks for Structured Prediction,cs.CL cs.LG stat.ML," Structured prediction energy networks (SPENs; Belanger & McCallum, 2016) use
-neural network architectures to define energy functions that can capture
-arbitrary dependencies among parts of structured outputs. Prior work used
-gradient descent for inference, relaxing the structured output to a set of
-continuous variables and then optimizing the energy with respect to them. We
-replace this use of gradient descent with a neural network trained to
-approximate structured argmax inference. This ""inference network"" outputs
-continuous values that we treat as the output structure. We develop
-large-margin training criteria for joint training of the structured energy
-function and inference network.
On multi-label classification we report
-speed-ups of 10-60x compared to Belanger et al. (2017) while also improving
-accuracy. For sequence labeling with simple structured energies, our approach
-performs comparably to exact inference while being much faster at test time. We
-then demonstrate improved accuracy by augmenting the energy with a ""label
-language model"" that scores entire output label sequences, showing it can
-improve handling of long-distance dependencies in part-of-speech tagging.
-Finally, we show how inference networks can replace dynamic programming for
-test-time inference in conditional random fields, suggestive of their general
-use for fast inference in structured settings.
-"
-6859,1803.03378,Peng Xu and Denilson Barbosa,Neural Fine-Grained Entity Type Classification with Hierarchy-Aware Loss,cs.CL," The task of Fine-grained Entity Type Classification (FETC) consists of
-assigning types from a hierarchy to entity mentions in text. Existing methods
-rely on distant supervision and are thus susceptible to noisy labels that can
-be out-of-context or overly-specific for the training sentence. Previous
-methods that attempt to address these issues do so with heuristics or with the
-help of hand-crafted features. Instead, we propose an end-to-end solution with
-a neural network model that uses a variant of the cross-entropy loss function
-to handle out-of-context labels, and hierarchical loss normalization to cope
-with overly-specific ones. Also, previous work solves FETC as a multi-label
-classification task followed by ad-hoc post-processing. In contrast, our
-solution is more elegant: we use public word embeddings to train a
-single-label model that jointly learns representations for entity mentions and
-their context. We show experimentally that our approach is robust against
-noise and consistently outperforms the state-of-the-art on established
-benchmarks for the task.
-"
-6860,1803.03476,"Minghua Zhang, Yunfang Wu",An Unsupervised Model with Attention Autoencoders for Question Retrieval,cs.CL," Question retrieval is a crucial subtask for community question answering.
-Previous research focuses on supervised models, which depend heavily on
-training data and manual feature engineering. In this paper, we propose a novel
-unsupervised framework, namely reduced attentive matching network (RAMN), to
-compute semantic matching between two questions. Our RAMN integrates together
-the deep semantic representations, the shallow lexical mismatching information
-and the initial rank produced by an external search engine. For the first time,
-we propose attention autoencoders to generate semantic representations of
-questions. In addition, we employ lexical mismatching to capture surface
-matching between two questions, which is derived from the importance of each
-word in a question. We conduct experiments on the open CQA datasets of
-SemEval-2016 and SemEval-2017. The experimental results show that our
-unsupervised model obtains comparable performance with the state-of-the-art
-supervised methods in SemEval-2016 Task 3, and outperforms the best system in
-SemEval-2017 Task 3 by a wide margin.
-"
-6861,1803.03481,"Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, and Tetsunari
- Inamura","Improved and Scalable Online Learning of Spatial Concepts and Language
- Models with Mapping",cs.RO cs.AI cs.CL cs.LG," We propose a novel online learning algorithm, called SpCoSLAM 2.0, for
-spatial concepts and lexical acquisition with high accuracy and scalability.
-Previously, we proposed SpCoSLAM as an online learning algorithm based on an
-unsupervised Bayesian probabilistic model that integrates multimodal place
-categorization, lexical acquisition, and SLAM. However, our original algorithm
-had limited estimation accuracy owing to the influence of the early stages of
-learning, and increased computational complexity with added training data.
-Therefore, we introduce techniques such as fixed-lag rejuvenation to reduce the
-calculation time while maintaining an accuracy higher than that of the original
-algorithm. The results show that, in terms of estimation accuracy, the proposed
-algorithm exceeds the original algorithm and is comparable to batch learning.
-In addition, the calculation time of the proposed algorithm does not depend on
-the amount of training data and becomes constant for each step of the scalable
-algorithm. Our approach will contribute to the realization of long-term spatial
-language interactions between humans and robots.
-"
-6862,1803.03585,Ke Tran and Arianna Bisazza and Christof Monz,The Importance of Being Recurrent for Modeling Hierarchical Structure,cs.CL," Recent work has shown that recurrent neural networks (RNNs) can implicitly
-capture and exploit hierarchical information when trained to solve common
-natural language processing tasks such as language modeling (Linzen et al.,
-2016) and neural machine translation (Shi et al., 2016). In contrast, the
-ability to model structured data with non-recurrent neural networks has
-received little attention despite their success in many NLP tasks (Gehring et
-al., 2017; Vaswani et al., 2017). In this work, we compare the two
-architectures---recurrent versus non-recurrent---with respect to their ability
-to model hierarchical structure and find that recurrency is indeed important
-for this purpose.
-"
-6863,1803.03662,Ziqi Zhang and Lei Luo,"Hate Speech Detection: A Solved Problem? The Challenging Case of Long
- Tail on Twitter",cs.CL," In recent years, the increasing propagation of hate speech on social media
-and the urgent need for effective counter-measures have drawn significant
-investment from governments, companies, and researchers. A large number of
-methods have been developed for automated hate speech detection online. This
-aims to classify textual content into non-hate or hate speech, in which case
-the method may also identify the targeting characteristics (i.e., types of
-hate, such as race and religion) in the hate speech. However, we notice a
-significant difference between the performance of the two (i.e., non-hate vs.
-hate). In this work, we argue for a focus on the latter problem for practical
-reasons. We show that it is a much more challenging task, as our analysis of
-the language in the typical datasets shows that hate speech lacks unique,
-discriminative features and therefore is found in the 'long tail' of a
-dataset, making it difficult to discover. We then propose Deep Neural Network
-structures serving as feature extractors that are particularly effective for
-capturing the semantics of hate speech. Our methods are evaluated on the
-largest collection of hate speech datasets based on Twitter, and are shown to
-be able to outperform the best performing method by up to 5 percentage points
-in macro-average F1, or 8 percentage points in the more challenging case of
-identifying hateful content.
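-(An illustration, ours rather than the paper's code, of the macro-averaged F1
-comparison used above: macro-F1 weights the rare hate class equally with the
-majority class. scikit-learn is assumed, and the labels are toy placeholders.)
-from sklearn.metrics import f1_score
-
-y_true = ['hate', 'non-hate', 'hate', 'non-hate', 'non-hate', 'hate']
-y_pred = ['hate', 'non-hate', 'non-hate', 'non-hate', 'non-hate', 'hate']
-# macro-F1 averages per-class F1 scores, so misses on the rare 'hate'
-# class hurt as much as misses on the frequent 'non-hate' class.
-print(f1_score(y_true, y_pred, average='macro'))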
-" -6864,1803.03664,"Vishwajeet Kumar, Kireeti Boorla, Yogesh Meena, Ganesh Ramakrishnan - and Yuan-Fang Li",Automating Reading Comprehension by Generating Question and Answer Pairs,cs.CL cs.AI," Neural network-based methods represent the state-of-the-art in question -generation from text. Existing work focuses on generating only questions from -text without concerning itself with answer generation. Moreover, our analysis -shows that handling rare words and generating the most appropriate question -given a candidate answer are still challenges facing existing approaches. We -present a novel two-stage process to generate question-answer pairs from the -text. For the first stage, we present alternatives for encoding the span of the -pivotal answer in the sentence using Pointer Networks. In our second stage, we -employ sequence to sequence models for question generation, enhanced with rich -linguistic features. Finally, global attention and answer encoding are used for -generating the question most relevant to the answer. We motivate and -linguistically analyze the role of each component in our framework and consider -compositions of these. This analysis is supported by extensive experimental -evaluations. Using standard evaluation metrics as well as human evaluations, -our experimental results validate the significant improvement in the quality of -questions generated by our framework over the state-of-the-art. The technique -presented here represents another step towards more automated reading -comprehension assessment. We also present a live system \footnote{Demo of the -system is available at -\url{https://www.cse.iitb.ac.in/~vishwajeet/autoqg.html}.} to demonstrate the -effectiveness of our approach. -" -6865,1803.03665,Duncan Blythe and Alan Akbik and Roland Vollgraf,Syntax-Aware Language Modeling with Recurrent Neural Networks,cs.CL cs.LG," Neural language models (LMs) are typically trained using only lexical -features, such as surface forms of words. In this paper, we argue this deprives -the LM of crucial syntactic signals that can be detected at high confidence -using existing parsers. We present a simple but highly effective approach for -training neural LMs using both lexical and syntactic information, and a novel -approach for applying such LMs to unparsed text using sequential Monte Carlo -sampling. In experiments on a range of corpora and corpus sizes, we show our -approach consistently outperforms standard lexical LMs in character-level -language modeling; on the other hand, for word-level models the models are on a -par with standard language models. These results indicate potential for -expanding LMs beyond lexical surface features to higher-level NLP features for -character-level models. -" -6866,1803.03667,"Evgeny Shulzinger, Irina Legchenkova and Edward Bormashenko","Co-occurrence of the Benford-like and Zipf Laws Arising from the Texts - Representing Human and Artificial Languages",cs.CL physics.soc-ph stat.OT," We demonstrate that large texts, representing human (English, Russian, -Ukrainian) and artificial (C++, Java) languages, display quantitative patterns -characterized by the Benford-like and Zipf laws. The frequency of a word -following the Zipf law is inversely proportional to its rank, whereas the total -numbers of a certain word appearing in the text generate the uneven -Benford-like distribution of leading numbers. 
Excluding the most popular words
-essentially improves the correlation of actual textual data with the Zipfian
-distribution, whereas the Benford distribution of leading numbers (arising from
-the overall amount of a certain word) is insensitive to the same elimination
-procedure. The calculated values of the moduli of slopes of double
-logarithmic plots for artificial languages (C++, Java) are markedly larger
-than those for human ones.
-"
-6867,1803.03670,"Shuqing Bian, Zhenpeng Deng, Fei Li, Will Monroe, Peng Shi, Zijun Sun,
- Wei Wu, Sikuang Wang, William Yang Wang, Arianna Yuan, Tianwei Zhang and
- Jiwei Li",IcoRating: A Deep-Learning System for Scam ICO Identification,cs.CL," Cryptocurrencies (or digital tokens, digital currencies, e.g., BTC, ETH, XRP,
-NEO) have been rapidly gaining ground in use, value, and understanding among
-the public, bringing astonishing profits to investors. Unlike other money and
-banking systems, most digital tokens do not require central authorities. Being
-decentralized poses significant challenges for credit rating. Most ICOs are
-currently not subject to government regulations, which makes a reliable credit
-rating system for ICO projects necessary and urgent.
- In this paper, we introduce IcoRating, the first learning-based
-cryptocurrency rating system. We exploit natural-language processing techniques
-to analyze various aspects of 2,251 digital currencies to date, such as white
-paper content, founding teams, Github repositories, websites, etc. Supervised
-learning models are used to correlate the life span and the price change of
-cryptocurrencies with these features. For the best setting, the proposed system
-is able to identify scam ICO projects with 0.83 precision.
- We hope this work will help investors identify scam ICOs and attract more
-efforts in automatically evaluating and analyzing ICO projects.
-"
-6868,1803.03697,"Srijan Kumar, William L. Hamilton, Jure Leskovec, Dan Jurafsky",Community Interaction and Conflict on the Web,cs.SI cs.CL cs.HC," Users organize themselves into communities on web platforms. These
-communities can interact with one another, often leading to conflicts and toxic
-interactions. However, little is known about the mechanisms of interactions
-between communities and how they impact users.
- Here we study intercommunity interactions across 36,000 communities on
-Reddit, examining cases where users of one community are mobilized by negative
-sentiment to comment in another community. We show that such conflicts tend to
-be initiated by a handful of communities---less than 1% of communities start
-74% of conflicts. While conflicts tend to be initiated by highly active
-community members, they are carried out by significantly less active members.
-We find that conflicts are marked by formation of echo chambers, where users
-primarily talk to other users from their own community. In the long-term,
-conflicts have adverse effects and reduce the overall activity of users in the
-targeted communities.
- Our analysis of user interactions also suggests strategies for mitigating the
-negative impact of conflicts---such as increasing direct engagement between
-attackers and defenders. Further, we accurately predict whether a conflict will
-occur by creating a novel LSTM model that combines graph embeddings, user,
-community, and text features. This model can be used to create early-warning
-systems for community moderators to prevent conflicts.
Altogether, this work
-presents a data-driven view of community interactions and conflict, and paves
-the way towards healthier online communities.
-"
-6869,1803.03786,"Georgi Karadzhov, Pepa Gencheva, Preslav Nakov, Ivan Koychev","We Built a Fake News & Click-bait Filter: What Happened Next Will Blow
- Your Mind!",cs.CL," It is completely amazing! Fake news and click-baits have totally invaded the
-cyber space. Let us face it: everybody hates them for three simple reasons.
-Reason #2 will absolutely amaze you. What these can achieve at the time of
-election will completely blow your mind! Now, we all agree, this cannot go on,
-you know, somebody has to stop it. So, we did this research on fake
-news/click-bait detection and trust us, it is totally great research, it really
-is! Make no mistake. This is the best research ever! Seriously, come have a
-look, we have it all: neural networks, attention mechanism, sentiment lexicons,
-author profiling, you name it. Lexical features, semantic features, we
-absolutely have it all. And we have totally tested it, trust us! We have
-results, and numbers, really big numbers. The best numbers ever! Oh, and
-analysis, absolutely top notch analysis. Interested? Come read the shocking
-truth about fake news and click-bait in the Bulgarian cyber space. You won't
-believe what we have found!
-"
-6870,1803.03827,"Albert Gatt, Marc Tanti, Adrian Muscat, Patrizia Paggio, Reuben A.
- Farrugia, Claudia Borg, Kenneth P. Camilleri, Mike Rosner, Lonneke van der
- Plas","Face2Text: Collecting an Annotated Image Description Corpus for the
- Generation of Rich Face Descriptions",cs.CL cs.AI cs.CV," The past few years have witnessed renewed interest in NLP tasks at the
-interface between vision and language. One intensively-studied problem is that
-of automatically generating text from images. In this paper, we extend this
-problem to the more specific domain of face description. Unlike scene
-descriptions, face descriptions are more fine-grained and rely on attributes
-extracted from the image, rather than objects and relations. Given that no data
-exists for this task, we present an ongoing crowdsourcing study to collect a
-corpus of descriptions of face images taken `in the wild'. To gain a better
-understanding of the variation we find in face description and the possible
-issues that this may raise, we also conducted an annotation study on a subset
-of the corpus. Primarily, we found descriptions to refer to a mixture of
-attributes, not only physical, but also emotional and inferential, which is
-bound to create further challenges for current image-to-text methods.
-"
-6871,1803.03859,"Soumil Mandal, Sourya Dipta Das, Dipankar Das","Language Identification of Bengali-English Code-Mixed data using
- Character & Phonetic based LSTM Models",cs.CL," Language identification of social media text still remains a challenging task
-due to properties like code-mixing and inconsistent phonetic transliterations.
-In this paper, we present a supervised learning approach for language
-identification at the word level of low resource Bengali-English code-mixed
-data taken from social media. We employ two methods of word encoding, namely
-character based and root phone based, to train our deep LSTM models. Utilizing
-these two models, we created two ensemble models using stacking and a
-threshold technique, which gave 91.78% and 92.35% accuracy respectively on our
-testing data.
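-(A minimal character-level sketch, assuming TensorFlow/Keras is available; the
-paper's phonetic encoding and the stacking/threshold ensembles are not
-reproduced here, and the six words with their 0/1 language labels are invented
-toy data.)
-import numpy as np
-from tensorflow import keras
-
-words = ['ami', 'tumi', 'bhalo', 'hello', 'good', 'night']
-labels = np.array([0, 0, 0, 1, 1, 1])  # 0 = romanized Bengali, 1 = English
-chars = sorted({c for w in words for c in w})
-idx = {c: i + 1 for i, c in enumerate(chars)}  # index 0 is reserved for padding
-maxlen = max(len(w) for w in words)
-X = np.array([[idx[c] for c in w] + [0] * (maxlen - len(w)) for w in words])
-
-model = keras.Sequential([
-    keras.layers.Embedding(len(chars) + 1, 16, mask_zero=True),
-    keras.layers.LSTM(32),
-    keras.layers.Dense(1, activation='sigmoid'),
-])
-model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
-model.fit(X, labels, epochs=20, verbose=0)  # word-level language identification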
-" -6872,1803.03887,"Hai Hu, Yiwen Zhang",Path of Vowel Raising in Chengdu Dialect of Mandarin,cs.CL," He and Rao (2013) reported a raising phenomenon of /a/ in /Xan/ (X being a -consonant or a vowel) in Chengdu dialect of Mandarin, i.e. /a/ is realized as -[epsilon] for young speakers but [ae] for older speakers, but they offered no -acoustic analysis. We designed an acoustic study that examined the realization -of /Xan/ in speakers of different age (old vs. young) and gender (male vs. -female) groups, where X represents three conditions: 1) unaspirated consonants: -C ([p], [t], [k]), 2) aspirated consonants: Ch ([ph], [th], [kh]), and 3) high -vowels: V ([i], [y], [u]). 17 native speakers were asked to read /Xan/ -characters and the F1 values are extracted for comparison. Our results -confirmed the raising effect in He and Rao (2013), i.e., young speakers realize -/a/ as [epsilon] in /an/, whereas older speakers in the most part realize it as -[ae]. Also, female speakers raise more than male speakers within the same age -group. Interestingly, within the /Van/ condition, older speakers do raise /a/ -in /ian/ and /yan/. We interpret this as /a/ first assimilates to its preceding -front high vowels /i/ and /y/ for older speakers, which then becomes -phonologized in younger speakers in all conditions, including /Chan/ and /Can/. -This shows a possible trajectory of the ongoing sound change in the Chengdu -dialect. -" -6873,1803.03917,"Will Monroe, Jennifer Hu, Andrew Jong, Christopher Potts",Generating Bilingual Pragmatic Color References,cs.CL," Contextual influences on language often exhibit substantial cross-lingual -regularities; for example, we are more verbose in situations that require finer -distinctions. However, these regularities are sometimes obscured by semantic -and syntactic differences. Using a newly-collected dataset of color reference -games in Mandarin Chinese (which we release to the public), we confirm that a -variety of constructions display the same sensitivity to contextual difficulty -in Chinese and English. We then show that a neural speaker agent trained on -bilingual data with a simple multitask learning approach displays more -human-like patterns of context dependence and is more pragmatically informative -than its monolingual Chinese counterpart. Moreover, this is not at the expense -of language-specific semantic understanding: the resulting speaker model learns -the different basic color term systems of English and Chinese (with noteworthy -cross-lingual influences), and it can identify synonyms between the two -languages using vector analogy operations on its output layer, despite having -no exposure to parallel data. -" -6874,1803.04000,"Soumil Mandal, Sainik Kumar Mahata, Dipankar Das","Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of - Indian Languages",cs.CL," Analysis of informative contents and sentiments of social users has been -attempted quite intensively in the recent past. Most of the systems are usable -only for monolingual data and fails or gives poor results when used on data -with code-mixing property. To gather attention and encourage researchers to -work on this crisis, we prepared gold standard Bengali-English code-mixed data -with language and polarity tag for sentiment analysis purposes. In this paper, -we discuss the systems we prepared to collect and filter raw Twitter data. 
In
-order to reduce manual work during annotation, hybrid systems combining
-rule-based and supervised models were developed for both language and
-sentiment tagging. The final corpus was annotated by a group of annotators
-following a few guidelines. The gold standard corpus thus obtained has
-impressive inter-annotator agreement in terms of Kappa values. Various metrics
-like the Code-Mixed Index (CMI) and the Code-Mixed Factor (CF), computed along
-various aspects (language and emotion), also qualitatively profile the
-code-mixed and sentiment properties of the corpus.
-"
-6875,1803.04291,"Mohammad Sadegh Rasooli, Sarangarajan Parthasarathy",Entity-Aware Language Model as an Unsupervised Reranker,cs.CL," In language modeling, it is difficult to incorporate entity relationships
-from a knowledge-base. One solution is to use a reranker trained with global
-features, in which global features are derived from n-best lists. However,
-training such a reranker requires manually annotated n-best lists, which is
-expensive to obtain. We propose a method based on the contrastive estimation
-method that alleviates the need for such data. Experiments in the music domain
-demonstrate that global features, as well as features extracted from an
-external knowledge-base, can be incorporated into our reranker. Our final
-model, a simple ensemble of a language model and reranker, achieves a 0.44\%
-absolute word error rate improvement over an LSTM language model on the blind
-test data.
-"
-6876,1803.04329,Fabiano Ferreira Luz and Marcelo Finger,"Semantic Parsing Natural Language into SPARQL: Improving Target Language
- Representation with Neural Attention",cs.CL," Semantic parsing is the process of mapping a natural language sentence into a
-formal representation of its meaning. In this work we use the neural network
-approach to transform a natural language sentence into a query to an ontology
-database in the SPARQL language. This method does not rely on handcrafted
-rules, high-quality lexicons, manually-built templates or other handmade
-complex structures. Our approach is based on the vector space model and neural
-networks. The proposed model is based on two learning steps. The first step
-generates a vector representation for the natural language sentence and the
-SPARQL query. The second step uses this vector representation as input to a
-neural network (LSTM with attention mechanism) to generate a model able to
-encode natural language and decode SPARQL.
-"
-6877,1803.04349,Finn {\AA}rup Nielsen,Linking ImageNet WordNet Synsets with Wikidata,cs.DL cs.CL," The linkage of ImageNet WordNet synsets to Wikidata items will provide deep
-learning algorithms with access to a rich multilingual knowledge graph. Here I
-will describe our on-going efforts in linking the two resources and issues
-faced in matching the Wikidata and WordNet knowledge graphs. I show an example
-of how the linkage can be used in a deep learning setting with real-time image
-classification and labeling in a non-English language and discuss what
-opportunities lie ahead.
-"
-6878,1803.04375,Pham Quang Nhat Minh,A Feature-Rich Vietnamese Named-Entity Recognition Model,cs.CL," In this paper, we present a feature-based named-entity recognition (NER)
-model that achieves state-of-the-art accuracy for the Vietnamese language. We
-combine word, word-shape features, PoS, chunk, Brown-cluster-based features,
-and word-embedding-based features in the Conditional Random Fields (CRF) model.
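-(A hedged sketch of the feature-template idea just described, assuming the
-third-party sklearn-crfsuite package; the toy sentence, tags and feature names
-are invented and do not reproduce the paper's exact feature set.)
-import sklearn_crfsuite
-
-def shape(w):
-    # crude word-shape feature, e.g. 'Minh' -> 'Xxxx'
-    return ''.join('X' if c.isupper() else 'x' if c.islower()
-                   else 'd' if c.isdigit() else c for c in w)
-
-def feats(sent, i):
-    w, pos = sent[i]
-    return {'word': w.lower(), 'shape': shape(w), 'pos': pos,
-            'prev': sent[i - 1][0].lower() if i > 0 else '<s>'}
-
-sent = [('Anh', 'N'), ('Minh', 'Np'), ('song', 'V'), ('tai', 'E'), ('Hanoi', 'Np')]
-X = [[feats(sent, i) for i in range(len(sent))]]
-y = [['O', 'B-PER', 'O', 'O', 'B-LOC']]
-crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50)
-crf.fit(X, y)
-print(crf.predict(X))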
-We also explore the effects of word segmentation, PoS tagging, and chunking
-results of many popular Vietnamese NLP toolkits on the accuracy of the proposed
-feature-based NER model. To date, ours is the first work that
-systematically performs an extrinsic evaluation of basic Vietnamese NLP
-toolkits on the downstream NER task. Experimental results show that while
-automatically-generated word segmentation is useful, PoS and chunking
-information generated by Vietnamese NLP tools does not show clear benefits for
-the proposed feature-based NER model.
-"
-6879,1803.04488,"Faisal Alshargi, Saeedeh Shekarpour, Tommaso Soru, Amit Sheth","Concept2vec: Metrics for Evaluating Quality of Embeddings for
- Ontological Concepts",cs.CL cs.AI," Although there is an emerging trend towards generating embeddings for
-primarily unstructured data and, recently, for structured data, no systematic
-suite for measuring the quality of embeddings has been proposed yet. This
-deficiency is even more apparent for embeddings generated for structured data
-because there are no concrete evaluation metrics measuring the quality of the
-encoded structure as well as semantic patterns in the embedding space. In this
-paper, we introduce a framework containing three distinct tasks concerned with
-the individual aspects of ontological concepts: (i) the categorization aspect,
-(ii) the hierarchical aspect, and (iii) the relational aspect. Then, in the
-scope of each task, a number of intrinsic metrics are proposed for evaluating
-the quality of the embeddings. Furthermore, w.r.t. this framework, multiple
-experimental studies were run to compare the quality of the available
-embedding models. Employing this framework in future research can reduce
-misjudgment and provide greater insight into quality comparisons of embeddings
-for ontological concepts. We provide our sampled data and code at
-https://github.com/alshargi/Concept2vec under GNU General Public License v3.0.
-"
-6880,1803.04579,"Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar
- Dhamdhere",It was the training data pruning too!,cs.LG cs.CL," We study the current best model (KDG) for question answering on tabular data
-evaluated over the WikiTableQuestions dataset. Previous ablation studies
-performed against this model attributed the model's performance to certain
-aspects of its architecture. In this paper, we find that the model's
-performance also crucially depends on a certain pruning of the data used to
-train the model. Disabling the pruning step drops the accuracy of the model
-from 43.3% to 36.3%. The large impact on the performance of the KDG model
-suggests that the pruning may be a useful pre-processing step in training other
-semantic parsers as well.
-"
-6881,1803.04596,"Tom De Smedt, Guy De Pauw, Pieter Van Ostaeyen",Automatic Detection of Online Jihadist Hate Speech,cs.CL cs.AI cs.CR," We have developed a system that automatically detects online jihadist hate
-speech with over 80% accuracy, by using techniques from Natural Language
-Processing and Machine Learning. The system is trained on a corpus of 45,000
-subversive Twitter messages collected from October 2014 to December 2016. We
-present a qualitative and quantitative analysis of the jihadist rhetoric in the
-corpus, examine the network of Twitter users, outline the technical procedure
-used to train the system, and discuss examples of use.
-"
-6882,1803.04715,"Nghi D. Q.
Bui, Lingxiao Jiang","Hierarchical Learning of Cross-Language Mappings through Distributed
- Vector Representations for Code",cs.LG cs.CL cs.SE," Translating a program written in one programming language to another can be
-useful for software development tasks that need functionality implementations
-in different languages. Although past studies have considered this problem,
-they may be either specific to the language grammars, or specific to certain
-kinds of code elements (e.g., tokens, phrases, API uses). This paper proposes a
-new approach to automatically learn cross-language representations for various
-kinds of structural code elements that may be used for program translation. Our
-key idea is two-fold: first, we normalize and enrich code token streams with
-additional structural and semantic information, and train cross-language vector
-representations for the tokens (a.k.a. shared embeddings) based on word2vec, a
-neural-network-based technique for producing word embeddings; second,
-hierarchically from the bottom up, we construct shared embeddings for code
-elements of higher levels of granularity (e.g., expressions, statements,
-methods) from the embeddings for their constituents, and then build mappings
-among code elements across languages based on similarities among embeddings.
- Our preliminary evaluations on about 40,000 Java and C# source files from 9
-software projects show that our approach can automatically learn shared
-embeddings for various code elements in different languages and identify their
-cross-language mappings with reasonable Mean Average Precision scores. When
-compared with an existing tool for mapping library API methods, our approach
-identifies many more mappings accurately. The mapping results and code can be
-accessed at
-https://github.com/bdqnghi/hierarchical-programming-language-mapping. We
-believe that our idea for learning cross-language vector representations with
-code structural information can be a useful step towards automated program
-translation.
-"
-6883,1803.04757,"Tim Isbister, Magnus Sahlgren, Lisa Kaati, Milan Obaidi, Nazar Akrami",Monitoring Targeted Hate in Online Environments,cs.CL," Hateful comments, swearwords and sometimes even death threats are becoming a
-reality for many people today in online environments. This is especially true
-for journalists, politicians, artists, and other public figures. This paper
-describes how hate directed towards individuals can be measured in online
-environments using a simple dictionary-based approach. We present a case study
-on Swedish politicians, and use examples from this study to discuss
-shortcomings of the proposed dictionary-based approach. We also outline
-possibilities for potential refinements of the proposed approach.
-"
-6884,1803.04790,Yufang Hou,Enhanced Word Representations for Bridging Anaphora Resolution,cs.CL," Most current models of word representations (e.g., GloVe) have successfully
-captured fine-grained semantics. However, semantic similarity exhibited in
-these word embeddings is not suitable for resolving bridging anaphora, which
-requires the knowledge of associative similarity (i.e., relatedness) instead of
-semantic similarity information between synonyms or hypernyms. We create word
-embeddings (embeddings_PP) to capture such relatedness by exploring the
-syntactic structure of noun phrases. We demonstrate that using embeddings_PP
-alone achieves around 30% accuracy for bridging anaphora resolution on the
-ISNotes corpus.
Furthermore, we achieve a substantial gain over the
-state-of-the-art system (Hou et al., 2013) for bridging antecedent selection.
-"
-6885,1803.04884,"Torsten Kilias, Alexander L\""oser, Felix A. Gers, Richard
- Koopmanschap, Ying Zhang, Martin Kersten",IDEL: In-Database Entity Linking with Neural Embeddings,cs.DB cs.CL cs.NE," We present a novel architecture, In-Database Entity Linking (IDEL), in which
-we integrate the analytics-optimized RDBMS MonetDB with neural text mining
-abilities. Our system design abstracts core tasks of most neural entity linking
-systems for MonetDB. To the best of our knowledge, this is the first actually
-implemented system integrating entity linking in a database. We leverage the
-ability of MonetDB to support in-database analytics with user-defined functions
-(UDFs) implemented in Python. These functions call machine learning libraries
-for neural text mining, such as TensorFlow. The system achieves zero cost for
-data shipping and transformation by utilizing MonetDB's ability to embed Python
-processes in the database kernel and exchange data in NumPy arrays. IDEL
-represents text and relational data in a joint vector space with neural
-embeddings and can compensate for errors with ambiguous entity representations.
-For detecting matching entities, we propose a novel similarity function based
-on joint neural embeddings, which are learned by minimizing a pairwise
-contrastive ranking loss. This function utilizes high-dimensional index
-structures for fast retrieval of matching entities. Our first implementation
-and experiments using the WebNLG corpus show the effectiveness and the
-potential of IDEL.
-"
-6886,1803.05030,"Shiliang Zhang, Ming Lei, Zhijie Yan, Lirong Dai",Deep-FSMN for Large Vocabulary Continuous Speech Recognition,cs.NE cs.CL," In this paper, we present an improved feedforward sequential memory network
-(FSMN) architecture, namely Deep-FSMN (DFSMN), by introducing skip connections
-between memory blocks in adjacent layers. These skip connections enable
-information flow across different layers and thus alleviate the gradient
-vanishing problem when building very deep structures. As a result, DFSMN
-significantly benefits from these skip connections and the deep structure. We
-have compared the performance of DFSMN to BLSTM both with and without lower
-frame rate (LFR) on several large speech recognition tasks, including English
-and Mandarin. Experimental results show that DFSMN can consistently outperform
-BLSTM with dramatic gains, especially when trained with LFR using CD-Phone as
-modeling units. In the 2000-hour Fisher (FSH) task, the proposed DFSMN can
-achieve a word error rate of 9.4% by purely using the cross-entropy criterion
-and decoding with a 3-gram language model, which is a 1.5% absolute
-improvement over the BLSTM. In a 20000-hour Mandarin recognition task,
-the LFR-trained DFSMN can achieve more than 20% relative improvement compared
-to the LFR-trained BLSTM. Moreover, we can easily design the lookahead filter
-order of the memory blocks in DFSMN to control the latency for real-time
-applications.
-"
-6887,1803.05058,Odette Scharenborg and Martha Larson,Investigating the Effect of Music and Lyrics on Spoken-Word Recognition,cs.SD cs.CL eess.AS," Background music in social interaction settings can hinder conversation. Yet,
-little is known of how specific properties of music impact speech processing.
-This paper addresses this knowledge gap by investigating 1) whether the masking
-effect of background music with lyrics is larger than that of music without
-lyrics, and 2) whether the masking effect is larger for more complex music. To
-answer these questions, a word identification experiment was run in which Dutch
-participants listened to Dutch CVC words embedded in stretches of background
-music in two conditions, with and without lyrics, and at three SNRs. Three
-songs of different genres and complexities were used. Music stretches with and
-without lyrics were sampled from the same song in order to control for factors
-beyond the presence of lyrics. The results showed a clear negative impact of
-the presence of lyrics in background music on spoken-word recognition. This
-impact is independent of complexity. The results suggest that social spaces
-(e.g., restaurants, caf\'es and bars) should make careful choices of music to
-promote conversation, and open a path for future work.
-"
-6888,1803.05071,"Jacob Buckman, Graham Neubig",Neural Lattice Language Models,cs.CL," In this work, we propose a new language modeling paradigm that has the
-ability to perform both prediction and moderation of information flow at
-multiple granularities: neural lattice language models. These models construct
-a lattice of possible paths through a sentence and marginalize across this
-lattice to calculate sequence probabilities or optimize parameters. This
-approach allows us to seamlessly incorporate linguistic intuitions - including
-polysemy and the existence of multi-word lexical items - into our language
-model. Experiments on multiple language modeling tasks show that English
-neural lattice language models that utilize polysemous embeddings are able to
-improve perplexity by 9.95% relative to a word-level baseline, and that a
-Chinese model that handles multi-character tokens is able to improve
-perplexity by 20.94% relative to a character-level baseline.
-"
-6889,1803.05160,"Igor Mozeti\v{c}, Luis Torgo, Vitor Cerqueira, Jasmina Smailovi\'c",How to evaluate sentiment classifiers for Twitter time-ordered data?,cs.CL cs.IR cs.SI," Social media are becoming an increasingly important source of information
-about the public mood regarding issues such as elections, Brexit, stock market,
-etc. In this paper we focus on sentiment classification of Twitter data.
-Construction of sentiment classifiers is a standard text mining task, but here
-we address the question of how to properly evaluate them as there is no settled
-way to do so. Sentiment classes are ordered and unbalanced, and Twitter
-produces a stream of time-ordered data. The problem we address concerns the
-procedures used to obtain reliable estimates of performance measures, and
-whether the temporal ordering of the training and test data matters. We
-collected a large set of 1.5 million tweets in 13 European languages. We
-created 138 sentiment models and out-of-sample datasets, which are used as a
-gold standard for evaluations. The corresponding 138 in-sample datasets are
-used to empirically compare six different estimation procedures: three variants
-of cross-validation, and three variants of sequential validation (where the
-test set always follows the training set). We find no significant difference
-between the best cross-validation and sequential validation. However, we
-observe that all cross-validation variants tend to overestimate the
-performance, while the sequential methods tend to underestimate it.
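-(An illustration, ours: with randomly shuffled K-fold, future tweets leak into
-training, while a sequential split keeps every test example after the training
-examples; scikit-learn's TimeSeriesSplit stands in here for one variant of
-sequential validation.)
-import numpy as np
-from sklearn.model_selection import KFold, TimeSeriesSplit
-
-X = np.arange(12).reshape(-1, 1)  # indices stand for time-ordered tweets
-for train, test in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
-    print('random    ', train, test)  # training and test indices interleave in time
-for train, test in TimeSeriesSplit(n_splits=4).split(X):
-    print('sequential', train, test)  # the test block always follows training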
Standard cross-validation with
-random selection of examples is significantly worse than the blocked
-cross-validation, and should not be used to evaluate classifiers in
-time-ordered data scenarios.
-"
-6890,1803.05223,"Simon Ostermann, Ashutosh Modi, Michael Roth, Stefan Thater, Manfred
- Pinkal","MCScript: A Novel Dataset for Assessing Machine Comprehension Using
- Script Knowledge",cs.CL," We introduce a large dataset of narrative texts and questions about these
-texts, intended to be used in a machine comprehension task that requires
-reasoning using commonsense knowledge. Our dataset complements similar datasets
-in that we focus on stories about everyday activities, such as going to the
-movies or working in the garden, and that the questions require commonsense
-knowledge, or more specifically, script knowledge, to be answered. We show that
-our mode of data collection via crowdsourcing results in a substantial amount
-of such inference questions. The dataset forms the basis of a shared task on
-commonsense and script knowledge organized at SemEval 2018 and provides
-challenging test cases for the broader natural language understanding
-community.
-"
-6891,1803.05307,"Sergey Novoselov, Oleg Kudashev, Vadim Schemelinin, Ivan Kremnev and
- Galina Lavrentyeva",Deep CNN based feature extractor for text-prompted speaker recognition,eess.AS cs.CL cs.LG cs.SD stat.ML," Deep learning is still not a very common tool in the speaker verification
-field. We study deep convolutional neural network performance in the
-text-prompted speaker verification task. The prompted passphrase is segmented
-into word states, i.e. digits, to test each digit utterance separately. We
-train a single high-level feature extractor for all states and use a cosine
-similarity metric for scoring. The key feature of our network is the
-Max-Feature-Map activation function, which acts as an embedded feature
-selector. By using a multitask learning scheme to train the high-level feature
-extractor, we were able to surpass the classic baseline systems in terms of
-quality and achieved impressive results for such a new approach, getting 2.85%
-EER on the RSR2015 evaluation set. Fusion of the proposed and the baseline
-systems improves this result.
-"
-6892,1803.05355,"James Thorne, Andreas Vlachos, Christos Christodoulopoulos and Arpit
- Mittal",FEVER: a large-scale dataset for Fact Extraction and VERification,cs.CL," In this paper we introduce a new publicly available dataset for verification
-against textual sources, FEVER: Fact Extraction and VERification. It consists
-of 185,445 claims generated by altering sentences extracted from Wikipedia and
-subsequently verified without knowledge of the sentence they were derived from.
-The claims are classified as Supported, Refuted or NotEnoughInfo by annotators
-achieving 0.6841 in Fleiss $\kappa$. For the first two classes, the annotators
-also recorded the sentence(s) forming the necessary evidence for their
-judgment. To characterize the challenge of the dataset presented, we develop a
-pipeline approach and compare it to suitably designed oracles. The best
-accuracy we achieve on labeling a claim accompanied by the correct evidence is
-31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that
-FEVER is a challenging testbed that will help stimulate progress on claim
-verification against textual sources.
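-(A worked sketch, ours, of the Fleiss kappa agreement score reported above,
-assuming statsmodels is available; each row is one claim, and the columns
-count how many of five hypothetical annotators chose Supported, Refuted or
-NotEnoughInfo.)
-import numpy as np
-from statsmodels.stats.inter_rater import fleiss_kappa
-
-ratings = np.array([
-    [5, 0, 0],   # unanimous: all five annotators said Supported
-    [0, 4, 1],   # strong agreement on Refuted
-    [2, 2, 1],   # a low-agreement claim
-])
-print(fleiss_kappa(ratings))  # 1.0 would mean perfect agreement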
-" -6893,1803.05449,Alexis Conneau and Douwe Kiela,SentEval: An Evaluation Toolkit for Universal Sentence Representations,cs.CL," We introduce SentEval, a toolkit for evaluating the quality of universal -sentence representations. SentEval encompasses a variety of tasks, including -binary and multi-class classification, natural language inference and sentence -similarity. The set of tasks was selected based on what appears to be the -community consensus regarding the appropriate evaluations for universal -sentence representations. The toolkit comes with scripts to download and -preprocess datasets, and an easy interface to evaluate sentence encoders. The -aim is to provide a fairer, less cumbersome and more centralized way for -evaluating sentence representations. -" -6894,1803.05457,"Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish - Sabharwal, Carissa Schoenick, Oyvind Tafjord","Think you have Solved Question Answering? Try ARC, the AI2 Reasoning - Challenge",cs.AI cs.CL cs.IR," We present a new question set, text corpus, and baselines assembled to -encourage AI research in advanced question answering. Together, these -constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful -knowledge and reasoning than previous challenges such as SQuAD or SNLI. The ARC -question set is partitioned into a Challenge Set and an Easy Set, where the -Challenge Set contains only questions answered incorrectly by both a -retrieval-based algorithm and a word co-occurence algorithm. The dataset -contains only natural, grade-school science questions (authored for human -tests), and is the largest public-domain set of this kind (7,787 questions). We -test several baselines on the Challenge Set, including leading neural models -from the SQuAD and SNLI tasks, and find that none are able to significantly -outperform a random baseline, reflecting the difficult nature of this task. We -are also releasing the ARC Corpus, a corpus of 14M science sentences relevant -to the task, and implementations of the three neural baseline models tested. -Can your model perform better? We pose ARC as a challenge to the community. -" -6895,1803.05495,"Shervin Malmasi, Marcos Zampieri",Challenges in Discriminating Profanity from Hate Speech,cs.CL," In this study we approach the problem of distinguishing general profanity -from hate speech in social media, something which has not been widely -considered. Using a new dataset annotated specifically for this task, we employ -supervised classification along with a set of features that includes n-grams, -skip-grams and clustering-based word representations. We apply approaches based -on single classifiers as well as more advanced ensemble classifiers and stacked -generalization, achieving the best result of 80% accuracy for this 3-class -classification task. Analysis of the results reveals that discriminating hate -speech and profanity is not a simple task, which may require features that -capture a deeper understanding of the text not always possible with surface -n-grams. The variability of gold labels in the annotated data, due to -differences in the subjective adjudications of the annotators, is also an -issue. Other directions for future work are discussed. 
-" -6896,1803.05547,"Siddarth Srinivasan, Richa Arora, Mark Riedl",A Simple and Effective Approach to the Story Cloze Test,cs.CL," In the Story Cloze Test, a system is presented with a 4-sentence prompt to a -story, and must determine which one of two potential endings is the 'right' -ending to the story. Previous work has shown that ignoring the training set and -training a model on the validation set can achieve high accuracy on this task -due to stylistic differences between the story endings in the training set and -validation and test sets. Following this approach, we present a simpler -fully-neural approach to the Story Cloze Test using skip-thought embeddings of -the stories in a feed-forward network that achieves close to state-of-the-art -performance on this task without any feature engineering. We also find that -considering just the last sentence of the prompt instead of the whole prompt -yields higher accuracy with our approach. -" -6897,1803.05563,"Amit Das, Jinyu Li, Rui Zhao, Yifan Gong",Advancing Connectionist Temporal Classification With Attention Modeling,cs.CL," In this study, we propose advancing all-neural speech recognition by directly -incorporating attention modeling within the Connectionist Temporal -Classification (CTC) framework. In particular, we derive new context vectors -using time convolution features to model attention as part of the CTC network. -To further improve attention modeling, we utilize content information extracted -from a network representing an implicit language model. Finally, we introduce -vector based attention weights that are applied on context vectors across both -time and their individual components. We evaluate our system on a 3400 hours -Microsoft Cortana voice assistant task and demonstrate that our proposed model -consistently outperforms the baseline model achieving about 20% relative -reduction in word error rates. -" -6898,1803.05566,"Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, Yifan Gong",Advancing Acoustic-to-Word CTC Model,cs.CL," The acoustic-to-word model based on the connectionist temporal classification -(CTC) criterion was shown as a natural end-to-end (E2E) model directly -targeting words as output units. However, the word-based CTC model suffers from -the out-of-vocabulary (OOV) issue as it can only model limited number of words -in the output layer and maps all the remaining words into an OOV output node. -Hence, such a word-based CTC model can only recognize the frequent words -modeled by the network output nodes. Our first attempt to improve the -acoustic-to-word model is a hybrid CTC model which consults a letter-based CTC -when the word-based CTC model emits OOV tokens during testing time. Then, we -propose a much better solution by training a mixed-unit CTC model which -decomposes all the OOV words into sequences of frequent words and multi-letter -units. Evaluated on a 3400 hours Microsoft Cortana voice assistant task, the -final acoustic-to-word solution improves the baseline word-based CTC by -relative 12.09% word error rate (WER) reduction when combined with our proposed -attention CTC. Such an E2E model without using any language model (LM) or -complex decoder outperforms the traditional context-dependent phoneme CTC which -has strong LM and decoder by relative 6.79%. 
-" -6899,1803.05567,"Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan - Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William - Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, - Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong - Zhang, Zhirui Zhang, Ming Zhou",Achieving Human Parity on Automatic Chinese to English News Translation,cs.CL," Machine translation has made rapid advances in recent years. Millions of -people are using it today in online translation systems and mobile applications -in order to communicate across language barriers. The question naturally arises -whether such systems can approach or achieve parity with human translations. In -this paper, we first address the problem of how to define and accurately -measure human parity in translation. We then describe Microsoft's machine -translation system and measure the quality of its translations on the widely -used WMT 2017 news translation task from Chinese to English. We find that our -latest neural machine translation system has reached a new state-of-the-art, -and that the translation quality is at human parity when compared to -professional human translations. We also find that it significantly exceeds the -quality of crowd-sourced non-professional translations. -" -6900,1803.05651,Maximilian Lam,Word2Bits - Quantized Word Vectors,cs.CL," Word vectors require significant amounts of memory and storage, posing issues -to resource limited devices like mobile phones and GPUs. We show that high -quality quantized word vectors using 1-2 bits per parameter can be learned by -introducing a quantization function into Word2Vec. We furthermore show that -training with the quantization function acts as a regularizer. We train word -vectors on English Wikipedia (2017) and evaluate them on standard word -similarity and analogy tasks and on question answering (SQuAD). Our quantized -word vectors not only take 8-16x less space than full precision (32 bit) word -vectors but also outperform them on word similarity tasks and question -answering. -" -6901,1803.05655,"Zhipeng Chen, Yiming Cui, Wentao Ma, Shijin Wang, Ting Liu and Guoping - Hu","HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for - Commonsense Reading Comprehension",cs.CL," This paper describes the system which got the state-of-the-art results at -SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge. In -this paper, we present a neural network called Hybrid Multi-Aspects (HMA) -model, which mimic the human's intuitions on dealing with the multiple-choice -reading comprehension. In this model, we aim to produce the predictions in -multiple aspects by calculating attention among the text, question and choices, -and combine these results for final predictions. Experimental results show that -our HMA model could give substantial improvements over the baseline system and -got the first place on the final test set leaderboard with the accuracy of -84.13%. -" -6902,1803.05662,"Ji Wen, Xu Sun, Xuancheng Ren, Qi Su","Structure Regularized Neural Network for Entity Relation Classification - for Chinese Literature Text",cs.CL," Relation classification is an important semantic processing task in the field -of natural language processing. In this paper, we propose the task of relation -classification for Chinese literature text. A new dataset of Chinese literature -text is constructed to facilitate the study in this task. 
-We present a novel
-model, named Structure Regularized Bidirectional Recurrent Convolutional Neural
-Network (SR-BRCNN), to identify the relation between entities. The proposed
-model learns relation representations along the shortest dependency path (SDP)
-extracted from the structure regularized dependency tree, which has the
-benefit of reducing the complexity of the whole model. Experimental results
-show that the proposed method significantly improves the F1 score by 10.3, and
-outperforms the state-of-the-art approaches on Chinese literature text.
-"
-6903,1803.05667,"Parisa Naderi Golshan, HosseinAli Rahmani Dashti, Shahrzad Azizi and
- Leila Safari",A Study of Recent Contributions on Information Extraction,cs.IR cs.CL," This paper reports on modern approaches in Information Extraction (IE) and
-its two main sub-tasks of Named Entity Recognition (NER) and Relation
-Extraction (RE). Basic concepts and the most recent approaches in this area are
-reviewed, which mainly include Machine Learning (ML) based approaches and the
-more recent trend to Deep Learning (DL) based methods.
-"
-6904,1803.05795,"Alexander Panchenko, Anastasiya Lopukhina, Dmitry Ustalov, Konstantin
- Lopukhin, Nikolay Arefyev, Alexey Leontyev, Natalia Loukachevitch","RUSSE'2018: A Shared Task on Word Sense Induction for the Russian
- Language",cs.CL," The paper describes the results of the first shared task on word sense
-induction (WSI) for the Russian language. While similar shared tasks were
-conducted in the past for some Romance and Germanic languages, we explore the
-performance of sense induction and disambiguation methods for a Slavic language
-that shares many features with other Slavic languages, such as rich morphology
-and virtually free word order. The participants were asked to group contexts of
-a given word in accordance with its senses that were not provided beforehand.
-For instance, given a word ""bank"" and a set of contexts for this word, e.g.
-""bank is a financial institution that accepts deposits"" and ""river bank is a
-slope beside a body of water"", a participant was asked to cluster such contexts
-into a number of clusters, not known in advance, corresponding to, in this case,
-the ""company"" and the ""area"" senses of the word ""bank"". For the purpose of this
-evaluation campaign, we developed three new evaluation datasets based on sense
-inventories that have different sense granularity. The contexts in these
-datasets were sampled from texts of Wikipedia, the academic corpus of Russian,
-and an explanatory dictionary of Russian. Overall, 18 teams participated in the
-competition submitting 383 models. Multiple teams managed to substantially
-outperform competitive state-of-the-art baselines from the previous years based
-on sense embeddings.
-"
-6905,1803.05820,"Alexander Panchenko, Natalia Loukachevitch, Dmitry Ustalov, Denis
- Paperno, Christian Meyer, Natalia Konstantinova",RUSSE: The First Workshop on Russian Semantic Similarity,cs.CL," The paper gives an overview of the Russian Semantic Similarity Evaluation
-(RUSSE) shared task held in conjunction with the Dialogue 2015 conference.
-There exist a lot of comparative studies on semantic similarity, yet no
-analysis of such measures was ever performed for the Russian language.
-Exploring this problem for the Russian language is even more interesting,
-because this language has features, such as rich morphology and free word
-order, which make it significantly different from English, German, and other
-well-studied languages.
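Entry 6904 above asks systems to cluster contexts of an ambiguous word into senses. A minimal sketch of one baseline strategy, clustering TF-IDF context vectors with scikit-learn (the two-cluster setting is an assumption for this toy example; in the shared task the number of senses is not known in advance):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

contexts = [
    "bank is a financial institution that accepts deposits",
    "the bank approved my loan application",
    "river bank is a slope beside a body of water",
    "we had a picnic on the grassy bank of the river",
]
# Vectorize each context, then group contexts into induced senses of "bank"
X = TfidfVectorizer().fit_transform(contexts).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: "company" sense vs. "area" sense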
-We attempt to bridge this gap by proposing a shared
-task on the semantic similarity of Russian nouns. Our key contribution is an
-evaluation methodology based on four novel benchmark datasets for the Russian
-language. Our analysis of the 105 submissions from 19 teams reveals that
-successful approaches for English, such as distributional and skip-gram models,
-are directly applicable to Russian as well. On the one hand, the best results
-in the contest were obtained by sophisticated supervised models that combine
-evidence from different sources. On the other hand, completely unsupervised
-approaches, such as a skip-gram model estimated on a large-scale corpus, were
-able to score among the top 5 systems.
-"
-6906,1803.05829,"Stefano Faralli, Alexander Panchenko, Chris Biemann, Simone Paolo
- Ponzetto",Enriching Frame Representations with Distributionally Induced Senses,cs.CL," We introduce a new lexical resource that enriches the Framester knowledge
-graph, which links FrameNet, WordNet, VerbNet and other resources, with semantic
-features from text corpora. These features are extracted from distributionally
-induced sense inventories and subsequently linked to the manually-constructed
-frame representations to boost the performance of frame disambiguation in
-context. Since Framester is a frame-based knowledge graph, which enables
-full-fledged OWL querying and reasoning, our resource paves the way for the
-development of novel, deeper semantic-aware applications that could benefit
-from the combination of knowledge from text and complex symbolic
-representations of events and participants. Together with the resource we also
-provide the software we developed for the evaluation in the task of Word Frame
-Disambiguation (WFD).
-"
-6907,1803.05928,"Jekaterina Novikova, Ond\v{r}ej Du\v{s}ek and Verena Rieser",RankME: Reliable Human Ratings for Natural Language Generation,cs.CL," Human evaluation for natural language generation (NLG) often suffers from
-inconsistent user ratings. While previous research tends to attribute this
-problem to individual user preferences, we show that the quality of human
-judgements can also be improved by experimental design. We present a novel
-rank-based magnitude estimation method (RankME), which combines the use of
-continuous scales and relative assessments. We show that RankME significantly
-improves the reliability and consistency of human ratings compared to
-traditional evaluation methods. In addition, we show that it is possible to
-evaluate NLG systems according to multiple, distinct criteria, which is
-important for error analysis. Finally, we demonstrate that RankME, in
-combination with Bayesian estimation of system quality, is a cost-effective
-alternative for ranking multiple NLG systems.
-"
-6908,1803.06064,"Chao-Chun Liang, Yu-Shiang Wong, Yi-Chung Lin and Keh-Yih Su",A Meaning-based Statistical English Math Word Problem Solver,cs.AI cs.CL," In this paper, we introduce MeSys, a meaning-based approach for solving
-English math word problems (MWPs) via understanding and reasoning. It first
-analyzes the text, transforms both body and question parts into their
-corresponding logic forms, and then performs inference on them. The associated
-context of each quantity is represented with proposed role-tags (e.g., nsubj,
-verb, etc.), which provides the flexibility for annotating an extracted math
-quantity with its associated context information (i.e., the physical meaning of
-this quantity).
-Statistical models are proposed to select the operator and
-operands. A noisy dataset is designed to assess if a solver solves MWPs mainly
-via understanding or mechanical pattern matching. Experimental results show
-that our approach outperforms existing systems on both benchmark datasets and
-the noisy dataset, which demonstrates that the proposed approach better
-understands the meaning of each quantity in the text.
-"
-6909,1803.06252,"Manuel Carbonell, Mauricio Villegas, Alicia Forn\'es, Josep Llad\'os","Joint Recognition of Handwritten Text and Named Entities with a Neural
- End-to-end Model",cs.CV cs.CL," When extracting information from handwritten documents, text transcription
-and named entity recognition are usually handled as separate subsequent tasks.
-This has the disadvantage that errors in the first module heavily affect the
-performance of the second module. In this work we propose to do both tasks
-jointly, using a single neural network with a common architecture used for
-plain text recognition. Experimentally, the work has been tested on a
-collection of historical marriage records. Experimental results are presented
-to show the effect of different configurations on performance: different
-ways of encoding the information, with and without transfer learning, and
-processing at the text-line or multi-line region level. The results are
-comparable to the state of the art reported in the ICDAR 2017 Information
-Extraction competition, even though the proposed technique does not use any
-dictionaries, language modeling or post processing.
-"
-6910,1803.06390,"Marina Sokolova, Victoria Bobicev",Corpus Statistics in Text Classification of Online Data,cs.CL cs.IR cs.LG," Transformation of Machine Learning (ML) from a boutique science to a
-generally accepted technology has increased the importance of reproducibility
-and transportability of ML studies. In the current work, we investigate how corpus
-characteristics of textual data sets correspond to text classification results.
-We work with two data sets gathered from sub-forums of an online health-related
-forum. Our empirical results are obtained for a multi-class sentiment analysis
-application.
-"
-6911,1803.06397,"Bernhard Kratzwald, Suzana Ilic, Mathias Kraus, Stefan Feuerriegel,
- Helmut Prendinger","Deep learning for affective computing: text-based emotion recognition in
- decision support",cs.CL," Emotions widely affect human decision-making. This fact is taken into account
-by affective computing with the goal of tailoring decision support to the
-emotional states of individuals. However, the accurate recognition of emotions
-within narrative documents presents a challenging undertaking due to the
-complexity and ambiguity of language. Performance improvements can be achieved
-through deep learning; yet, as demonstrated in this paper, the specific nature
-of this task requires the customization of recurrent neural networks with
-regard to bidirectional processing, dropout layers as a means of
-regularization, and weighted loss functions. In addition, we propose
-sent2affect, a tailored form of transfer learning for affective computing: here
-the network is pre-trained for a different task (i.e. sentiment analysis),
-while the output layer is subsequently tuned to the task of emotion
-recognition. The resulting performance is evaluated in a holistic setting
-across 6 benchmark datasets, where we find that both recurrent neural networks
-and transfer learning consistently outperform traditional machine learning.
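Entry 6911 above describes sent2affect: pre-train a recurrent classifier on sentiment, then replace the output layer and fine-tune it for emotion recognition. A hedged PyTorch sketch of that head-swap (vocabulary size, hidden size, and class counts are illustrative assumptions):

import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab=10000, emb=100, hid=128, n_out=2, dropout=0.5):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(2 * hid, n_out)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(self.drop(h.mean(dim=1)))

model = BiLSTMClassifier(n_out=2)   # pre-train on sentiment (2 classes)
# ... pre-training loop on a sentiment corpus would go here ...
model.out = nn.Linear(2 * 128, 6)   # swap the head: fine-tune on 6 emotions
# (re-create the optimizer after swapping so the new head's weights are included)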
-Altogether, the findings have considerable implications for the use of
-affective computing.
-"
-6912,1803.06456,Marjan Hosseinia and Arjun Mukherjee,"Experiments with Neural Networks for Small and Large Scale Authorship
- Verification",cs.CL," We propose two models for a special case of the authorship verification problem.
-The task is to investigate whether the two documents of a given pair are
-written by the same author. We consider the authorship verification problem for
-both small and large scale datasets. The underlying small-scale problem has two
-main challenges: First, the authors of the documents are unknown to us because
-no previous writing samples are available. Second, the two documents are short
-(a few hundred to a few thousand words) and may differ considerably in the
-genre and/or topic. To solve it, we propose a transformation encoder that
-transforms one document of the pair into the other. This document transformation
-generates a loss which is used as a recognizable feature to verify if the authors
-of the pair are identical. For the large-scale problem, where many authors are
-involved and more and longer examples are available, a parallel
-recurrent neural network is proposed. It compares the language models of the
-two documents. We evaluate our methods on various types of datasets including
-Authorship Identification datasets of PAN competition, Amazon reviews, and
-machine learning articles. Experiments show that both methods achieve stable
-and competitive performance compared to the baselines.
-"
-6913,1803.06500,"Joseph Corneli, Ursula Martin, Dave Murray-Rust, Gabriela Rino Nesin,
- and Alison Pease",Argumentation theory for mathematical argument,cs.CL cs.AI," To adequately model mathematical arguments the analyst must be able to
-represent the mathematical objects under discussion and the relationships
-between them, as well as inferences drawn about these objects and relationships
-as the discourse unfolds. We introduce a framework with these properties, which
-has been used to analyse mathematical dialogues and expository texts. The
-framework can recover salient elements of discourse at, and within, the
-sentence level, as well as the way mathematical content connects to form larger
-argumentative structures. We show how the framework might be used to support
-computational reasoning, and argue that it provides a more natural way to
-examine the process of proving theorems than do Lamport's structured proofs.
-"
-6914,1803.06535,Sudha Rao and Joel Tetreault,"Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus, Benchmarks
- and Metrics for Formality Style Transfer",cs.CL," Style transfer is the task of automatically transforming a piece of text in
-one particular style into another. A major barrier to progress in this field
-has been a lack of training and evaluation datasets, as well as benchmarks and
-automatic metrics. In this work, we create the largest corpus for a particular
-stylistic transfer (formality) and show that techniques from the machine
-translation community can serve as strong baselines for future work. We also
-discuss challenges of using automatic metrics.
-"
-6915,1803.06581,"Wenhu Chen, Wenhan Xiong, Xifeng Yan, William Wang",Variational Knowledge Graph Reasoning,cs.AI cs.CL," Inferring missing links in knowledge graphs (KG) has attracted a lot of
-attention from the research community. In this paper, we tackle a practical
-query answering task involving predicting the relation of a given entity pair.
-We frame this prediction problem as an inference problem in a probabilistic
-graphical model and aim at resolving it from a variational inference
-perspective. In order to model the relation between the query entity pair, we
-assume that there exists an underlying latent variable (paths connecting two
-nodes) in the KG, which carries the equivalent semantics of their relations.
-However, due to the intractability of connections in large KGs, we propose to
-use variational inference to maximize the evidence lower bound. More
-specifically, our framework (\textsc{Diva}) is composed of three modules, i.e.
-a posterior approximator, a prior (path finder), and a likelihood (path
-reasoner). By using variational inference, we are able to incorporate them
-closely into a unified architecture and jointly optimize them to perform KG
-reasoning. With active interactions among these sub-modules, \textsc{Diva} is
-better at handling noise and coping with more complex reasoning scenarios. In
-order to evaluate our method, we conduct experiments on the link prediction
-task on multiple datasets and achieve state-of-the-art performance on all of
-them.
-"
-6916,1803.06643,"Alon Talmor, Jonathan Berant",The Web as a Knowledge-base for Answering Complex Questions,cs.CL cs.AI cs.LG," Answering complex questions is a time-consuming activity for humans that
-requires reasoning and integration of information. Recent work on reading
-comprehension made headway in answering simple questions, but tackling complex
-questions is still an ongoing research challenge. Conversely, semantic parsers
-have been successful at handling compositionality, but only when the
-information resides in a target knowledge-base. In this paper, we present a
-novel framework for answering broad and complex questions, assuming answering
-simple questions is possible using a search engine and a reading comprehension
-model. We propose to decompose complex questions into a sequence of simple
-questions, and compute the final answer from the sequence of answers. To
-illustrate the viability of our approach, we create a new dataset of complex
-questions, ComplexWebQuestions, and present a model that decomposes questions
-and interacts with the web to compute an answer. We empirically demonstrate
-that question decomposition improves performance from 20.8 precision@1 to 27.5
-precision@1 on this new dataset.
-"
-6917,1803.06745,"Braja Gopal Patra, Dipankar Das, and Amitava Das","Sentiment Analysis of Code-Mixed Indian Languages: An Overview of
- SAIL_Code-Mixed Shared Task @ICON-2017",cs.CL," Sentiment analysis is essential in many real-world applications such as
-stance detection, review analysis, recommendation systems, and so on. Sentiment
-analysis becomes more difficult when the data is noisy and collected from
-social media. India is a multilingual country; people use more than one
-language to communicate among themselves. Switching between languages is
-called code-switching or code-mixing, depending upon the type of
-mixing. This paper presents an overview of the shared task on sentiment analysis
-of code-mixed data pairs of Hindi-English and Bengali-English collected from
-different social media platforms. The paper describes the task, dataset,
-evaluation, baseline and participants' systems.
-" -6918,1803.06805,"Qingming Tang, Weiran Wang and Karen Livescu",Acoustic feature learning using cross-domain articulatory measurements,cs.CL," Previous work has shown that it is possible to improve speech recognition by -learning acoustic features from paired acoustic-articulatory data, for example -by using canonical correlation analysis (CCA) or its deep extensions. One -limitation of this prior work is that the learned feature models are difficult -to port to new datasets or domains, and articulatory data is not available for -most speech corpora. In this work we study the problem of acoustic feature -learning in the setting where we have access to an external, domain-mismatched -dataset of paired speech and articulatory measurements, either with or without -labels. We develop methods for acoustic feature learning in these settings, -based on deep variational CCA and extensions that use both source and target -domain data and labels. Using this approach, we improve phonetic recognition -accuracies on both TIMIT and Wall Street Journal and analyze a number of design -choices. -" -6919,1803.06966,Kyle Richardson and Jonathan Berant and Jonas Kuhn,Polyglot Semantic Parsing in APIs,cs.CL," Traditional approaches to semantic parsing (SP) work by training individual -models for each available parallel dataset of text-meaning pairs. In this -paper, we explore the idea of polyglot semantic translation, or learning -semantic parsing models that are trained on multiple datasets and natural -languages. In particular, we focus on translating text to code signature -representations using the software component datasets of Richardson and Kuhn -(2017a,b). The advantage of such models is that they can be used for parsing a -wide variety of input natural languages and output programming languages, or -mixed input languages, using a single unified model. To facilitate modeling of -this type, we develop a novel graph-based decoding framework that achieves -state-of-the-art performance on the above datasets, and apply this method to -two other benchmark SP tasks. -" -6920,1803.07038,"Noah Weber, Leena Shekhar, Niranjan Balasubramanian, Kyunghyun Cho","Controlling Decoding for More Abstractive Summaries with Copy-Based - Networks",cs.CL," Attention-based neural abstractive summarization systems equipped with copy -mechanisms have shown promising results. Despite this success, it has been -noticed that such a system generates a summary by mostly, if not entirely, -copying over phrases, sentences, and sometimes multiple consecutive sentences -from an input paragraph, effectively performing extractive summarization. In -this paper, we verify this behavior using the latest neural abstractive -summarization system - a pointer-generator network. We propose a simple -baseline method that allows us to control the amount of copying without -retraining. Experiments indicate that the method provides a strong baseline for -abstractive systems looking to obtain high ROUGE scores while minimizing -overlap with the source article, substantially reducing the n-gram overlap with -the original article while keeping within 2 points of the original model's -ROUGE score. -" -6921,1803.07116,"Lucie-Aim\'ee Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe - Gravier, Fr\'ed\'erique Laforest, Jonathon Hare, Elena Simperl","Learning to Generate Wikipedia Summaries for Underserved Languages from - Wikidata",cs.CL," While Wikipedia exists in 287 languages, its content is unevenly distributed -among them. 
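Entry 6918 above learns acoustic features from paired acoustic-articulatory data with CCA and deep extensions. As a simplified stand-in for the deep variational variant, a linear CCA sketch with scikit-learn on random placeholder matrices (the dimensions are illustrative assumptions, not the paper's setup):

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(5000, 40))      # e.g. 40-dim filterbank frames
articulatory = rng.normal(size=(5000, 16))  # e.g. 16 articulatory channels

# Fit canonical directions on the paired (possibly domain-mismatched) data
cca = CCA(n_components=10).fit(acoustic, articulatory)

# Project new acoustic frames into the shared space and use the projections
# as learned features for a downstream recognizer.
features = cca.transform(rng.normal(size=(100, 40)))
print(features.shape)  # (100, 10)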
-In this work, we investigate the generation of open domain
-Wikipedia summaries in underserved languages using structured data from
-Wikidata. To this end, we propose a neural network architecture equipped with
-copy actions that learns to generate single-sentence and comprehensible textual
-summaries from Wikidata triples. We demonstrate the effectiveness of the
-proposed approach by evaluating it against a set of baselines on two languages
-of different natures: Arabic, a morphologically rich language with a larger
-vocabulary than English, and Esperanto, a constructed language known for its
-easy acquisition.
-"
-6922,1803.07133,"Sidi Lu, Yaoming Zhu, Weinan Zhang, Jun Wang, Yong Yu","Neural Text Generation: Past, Present and Beyond",cs.CL cs.AI cs.LG," This paper presents a systematic survey on recent development of neural text
-generation models. Specifically, we start from recurrent neural network
-language models with the traditional maximum likelihood estimation training
-scheme and point out its shortcoming for text generation. We thus introduce the
-recently proposed methods for text generation based on reinforcement learning,
-re-parametrization tricks and generative adversarial nets (GAN) techniques. We
-compare different properties of these models and the corresponding techniques
-to handle their common problems such as gradient vanishing and generation
-diversity. Finally, we conduct a benchmarking experiment with different types
-of neural text generation models on two well-known datasets and discuss the
-empirical results along with the aforementioned model properties.
-"
-6923,1803.07136,"Rick Dale, Nicholas D. Duran, and Moreno Coco","Dynamic Natural Language Processing with Recurrence Quantification
- Analysis",cs.CL," Writing and reading are dynamic processes. As an author composes a text, a
-sequence of words is produced. This sequence is one that, the author hopes,
-causes a revisitation of certain thoughts and ideas in others. These processes
-of composition and revisitation by readers are ordered in time. This means that
-text itself can be investigated under the lens of dynamical systems. A common
-technique for analyzing the behavior of dynamical systems, known as recurrence
-quantification analysis (RQA), can be used as a method for analyzing the
-sequential structure of text. RQA treats text as a sequential measurement, much
-like a time series, and can thus be seen as a kind of dynamic natural language
-processing (NLP). The extension has several benefits. Because it is part of a
-suite of time series analysis tools, many measures can be extracted in one
-common framework. Secondly, the measures have a close relationship with some
-commonly used measures from natural language processing. Finally, using
-recurrence analysis offers an opportunity to expand the analysis of text by
-developing theoretical descriptions derived from complex dynamic systems. We
-showcase an example analysis on 8,000 texts from the Gutenberg Project, compare
-it to well-known NLP approaches, and describe an R package (crqanlp) that can be
-used in conjunction with R library crqa.
-"
-6924,1803.07139,"Marta R. Costa-juss\`a, Noe Casas, Maite Melero","English-Catalan Neural Machine Translation in the Biomedical Domain
- through the cascade approach",cs.CL cs.AI," This paper describes the methodology followed to build a neural machine
-translation system in the biomedical domain for the English-Catalan language
-pair.
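Entry 6923 above treats text as a sequential measurement and applies recurrence quantification. A minimal Python computation of the recurrence rate over a token sequence, one of the simplest RQA measures (the crqanlp package the entry names is in R; this sketch only illustrates the underlying idea):

import numpy as np

def recurrence_rate(tokens):
    # Binary recurrence matrix over the sequence: R[i, j] = 1 when the same
    # word recurs at positions i and j (the main diagonal is excluded).
    toks = np.array(tokens)
    R = (toks[:, None] == toks[None, :]).astype(int)
    np.fill_diagonal(R, 0)
    n = len(toks)
    return R.sum() / (n * (n - 1))

text = "the cat sat on the mat and the dog sat on the rug".split()
print(recurrence_rate(text))  # fraction of repeated-word coordinates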
-This task can be considered a low-resource task from the point of view
-of the domain and the language pair. To address this task, this paper reports
-experiments on a cascade pivot strategy through Spanish for neural machine
-translation using the English-Spanish SCIELO and Spanish-Catalan El Peri\'odico
-databases. To test the final performance of the system, we have created a new
-test data set for English-Catalan in the biomedical domain which is freely
-available on request.
-"
-6925,1803.07204,"Felix Stahlberg, Danielle Saunders, Gonzalo Iglesias, Bill Byrne","Why not be Versatile? Applications of the SGNMT Decoder for Machine
- Translation",cs.CL," SGNMT is a decoding platform for machine translation which allows pairing
-various modern neural models of translation with different kinds of constraints
-and symbolic models. In this paper, we describe three use cases in which SGNMT
-is currently playing an active role: (1) teaching, as SGNMT is being used for
-course work and student theses in the MPhil in Machine Learning, Speech and
-Language Technology at the University of Cambridge, (2) research, as most of the
-research work of the Cambridge MT group is based on SGNMT, and (3) technology
-transfer, as we show how SGNMT is helping to transfer research findings from the
-laboratory to the industry, e.g. into a product of SDL plc.
-"
-6926,1803.07274,"Matteo Negri, Marco Turchi, Rajen Chatterjee, Nicola Bertoldi",eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing,cs.CL," Training models for the automatic correction of machine-translated text
-usually relies on data consisting of (source, MT, human post-edit) triplets
-providing, for each source sentence, examples of translation errors with the
-corresponding corrections made by a human post-editor. Ideally, a large amount
-of data of this kind should allow the model to learn reliable correction
-patterns and effectively apply them at test stage on unseen (source, MT) pairs.
-In practice, however, their limited availability calls for solutions that also
-integrate in the training process other sources of knowledge. Along this
-direction, state-of-the-art results have been recently achieved by systems
-that, in addition to a limited amount of available training data, exploit
-artificial corpora that approximate elements of the ""gold"" training instances
-with automatic translations. Following this idea, we present eSCAPE, the
-largest freely-available Synthetic Corpus for Automatic Post-Editing released
-so far. eSCAPE consists of millions of entries in which the MT element of the
-training triplets has been obtained by translating the source side of
-publicly-available parallel corpora, and using the target side as an artificial
-human post-edit. Translations are obtained both with phrase-based and neural
-models. For each MT paradigm, eSCAPE contains 7.2 million triplets for
-English-German and 3.3 millions for English-Italian, resulting in a total of
-14.4 and 6.6 million instances, respectively. The usefulness of eSCAPE is
-demonstrated through experiments in a general-domain scenario, the most
-challenging one for automatic post-editing. For both language directions, the
-models trained on our artificial data always improve MT quality with
-statistically significant gains. The current version of eSCAPE can be freely
-downloaded from: http://hltshare.fbk.eu/QT21/eSCAPE.html.
-"
-6927,1803.07295,Rodolfo Delmonte,Expressivity in TTS from Semantics and Pragmatics,cs.CL," In this paper we present ongoing work to produce an expressive TTS reader
-that can be used both in text and dialogue applications. The system called
-SPARSAR has been used to read (English) poetry so far but it can now be applied
-to any text. The text is fully analyzed both at the phonetic and phonological
-levels, and at the syntactic and semantic levels. In addition, the system has
-access to a restricted list of typical pragmatically marked phrases and
-expressions that are used to convey specific discourse functions and speech
-acts and need specialized intonational contours. The text is transformed into
-poem-like structures, where each line corresponds to a Breath Group that is
-semantically and syntactically consistent. Stanzas correspond to paragraph
-boundaries. Analogical parameters are related to ToBI theoretical indices but
-their number is doubled. In this paper, we concentrate on short stories and
-fables.
-"
-6928,1803.07416,"Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N.
- Gomez, Stephan Gouws, Llion Jones, {\L}ukasz Kaiser, Nal Kalchbrenner, Niki
- Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit",Tensor2Tensor for Neural Machine Translation,cs.LG cs.CL stat.ML," Tensor2Tensor is a library for deep learning models that is well-suited for
-neural machine translation and includes the reference implementation of the
-state-of-the-art Transformer model.
-"
-6929,1803.07427,"Soujanya Poria, Navonil Majumder, Devamanyu Hazarika, Erik Cambria,
- Alexander Gelbukh, Amir Hussain","Multimodal Sentiment Analysis: Addressing Key Issues and Setting up the
- Baselines",cs.CL cs.CV cs.IR," We compile baselines, along with dataset splits, for multimodal sentiment
-analysis. In this paper, we explore three different deep-learning based
-architectures for multimodal sentiment classification, each improving upon the
-previous. Further, we evaluate these architectures with multiple datasets with
-fixed train/test partition. We also discuss some major issues, frequently
-ignored in multimodal sentiment analysis research, e.g., role of
-speaker-exclusive models, importance of different modalities, and
-generalizability. This framework illustrates the different facets of analysis
-to be considered while performing multimodal sentiment analysis and, hence,
-serves as a new benchmark for future research in this emerging field.
-"
-6930,1803.07602,Andrei M. Butnaru and Radu Tudor Ionescu,"UnibucKernel: A kernel-based learning method for complex word
- identification",cs.CL," In this paper, we present a kernel-based learning approach for the 2018
-Complex Word Identification (CWI) Shared Task. Our approach is based on
-combining multiple low-level features, such as character n-grams, with
-high-level semantic features that are either automatically learned using word
-embeddings or extracted from a lexical knowledge base, namely WordNet. After
-feature extraction, we employ a kernel method for the learning phase. The
-feature matrix is first transformed into a normalized kernel matrix. For the
-binary classification task (simple versus complex), we employ Support Vector
-Machines. For the regression task, in which we have to predict the complexity
-level of a word (a word is more complex if it is labeled as complex by more
-annotators), we employ v-Support Vector Regression.
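Entry 6930 above transforms a feature matrix into a normalized kernel matrix before SVM learning. A short sketch of that normalization step with a precomputed-kernel SVM in scikit-learn (the linear kernel and random data are illustrative assumptions; a test-time kernel would need the corresponding train/test cross-terms normalized the same way):

import numpy as np
from sklearn.svm import SVC

def normalize_kernel(K):
    # K_norm[i, j] = K[i, j] / sqrt(K[i, i] * K[j, j])
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))     # stand-in feature rows (e.g. n-gram counts)
y = rng.integers(0, 2, size=200)   # simple vs. complex labels
K = normalize_kernel(X @ X.T)      # linear kernel, then normalization
clf = SVC(kernel='precomputed').fit(K, y)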
We applied our approach -only on the three English data sets containing documents from Wikipedia, -WikiNews and News domains. Our best result during the competition was the third -place on the English Wikipedia data set. However, in this paper, we also report -better post-competition results. -" -6931,1803.07640,"Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, - Nelson Liu, Matthew Peters, Michael Schmitz, Luke Zettlemoyer",AllenNLP: A Deep Semantic Natural Language Processing Platform,cs.CL," This paper describes AllenNLP, a platform for research on deep learning -methods in natural language understanding. AllenNLP is designed to support -researchers who want to build novel language understanding models quickly and -easily. It is built on top of PyTorch, allowing for dynamic computation graphs, -and provides (1) a flexible data API that handles intelligent batching and -padding, (2) high-level abstractions for common operations in working with -text, and (3) a modular and extensible experiment framework that makes doing -good science easy. It also includes reference implementations of high quality -approaches for both core semantic problems (e.g. semantic role labeling (Palmer -et al., 2005)) and language understanding applications (e.g. machine -comprehension (Rajpurkar et al., 2016)). AllenNLP is an ongoing open-source -effort maintained by engineers and researchers at the Allen Institute for -Artificial Intelligence. -" -6932,1803.07679,"\^Angelo Cardoso, Fabio Daolio and Sa\'ul Vargas","Product Characterisation towards Personalisation: Learning Attributes - from Unstructured Data to Recommend Fashion Products",stat.ML cs.CL cs.CV cs.IR cs.LG," In this paper, we describe a solution to tackle a common set of challenges in -e-commerce, which arise from the fact that new products are continually being -added to the catalogue. The challenges involve properly personalising the -customer experience, forecasting demand and planning the product range. We -argue that the foundational piece to solve all of these problems is having -consistent and detailed information about each product, information that is -rarely available or consistent given the multitude of suppliers and types of -products. We describe in detail the architecture and methodology implemented at -ASOS, one of the world's largest fashion e-commerce retailers, to tackle this -problem. We then show how this quantitative understanding of the products can -be leveraged to improve recommendations in a hybrid recommender system -approach. -" -6933,1803.07718,"Jasper Friedrichs, Debanjan Mahata, Shubham Gupta","InfyNLP at SMM4H Task 2: Stacked Ensemble of Shallow Convolutional - Neural Networks for Identifying Personal Medication Intake from Twitter",cs.CL," This paper describes Infosys's participation in the ""2nd Social Media Mining -for Health Applications Shared Task at AMIA, 2017, Task 2"". Mining social media -messages for health and drug related information has received significant -interest in pharmacovigilance research. This task targets at developing -automated classification models for identifying tweets containing descriptions -of personal intake of medicines. Towards this objective we train a stacked -ensemble of shallow convolutional neural network (CNN) models on an annotated -dataset provided by the organizers. We use random search for tuning the -hyper-parameters of the CNN and submit an ensemble of best models for the -prediction task. 
-Our system secured first place among 9 teams, with a
-micro-averaged F-score of 0.693.
-"
-6934,1803.07724,"Jasdeep Singh, Vincent Ying, Alex Nutkiewicz","Attention on Attention: Architectures for Visual Question Answering
- (VQA)",cs.CL cs.AI cs.CV," Visual Question Answering (VQA) is an increasingly popular topic in deep
-learning research, requiring coordination of natural language processing and
-computer vision modules into a single architecture. We build upon the model
-which placed first in the VQA Challenge by developing thirteen new attention
-mechanisms and introducing a simplified classifier. We performed 300 GPU hours
-of extensive hyperparameter and architecture searches and were able to achieve
-an evaluation score of 64.78%, outperforming the existing state-of-the-art
-single model's validation score of 63.15%.
-"
-6935,1803.07729,"Xin Wang, Wenhan Xiong, Hongmin Wang, William Yang Wang","Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement
- Learning for Planned-Ahead Vision-and-Language Navigation",cs.CV cs.AI cs.CL cs.RO," Existing research studies on vision and language grounding for robot
-navigation focus on improving model-free deep reinforcement learning (DRL)
-models in synthetic environments. However, model-free DRL models do not
-consider the dynamics of real-world environments, and they often fail to
-generalize to new scenes. In this paper, we take a radical approach to bridge
-the gap between synthetic studies and real-world practices: we propose a
-novel, planned-ahead hybrid reinforcement learning model that combines
-model-free and model-based reinforcement learning to solve a real-world
-vision-language navigation task. Our look-ahead module tightly integrates a
-look-ahead policy model with an environment model that predicts the next state
-and the reward. Experimental results suggest that our proposed method
-significantly outperforms the baselines and achieves the best on the real-world
-Room-to-Room dataset. Moreover, our scalable method is more generalizable when
-transferring to unseen environments.
-"
-6936,1803.07738,"Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu",Speech Emotion Recognition Considering Local Dynamic Features,cs.HC cs.AI cs.CL," Recently, increasing attention has been directed to the study of speech
-emotion recognition, in which global acoustic features of an utterance are
-mostly used to eliminate the content differences. However, the expression of
-speech emotion is a dynamic process, which is reflected through dynamic
-durations, energies, and some other prosodic information when one speaks. In
-this paper, a novel local dynamic pitch probability distribution feature, which
-is obtained by computing a histogram, is proposed to improve the accuracy of
-speech emotion recognition. Compared with most of the previous works using
-global features, the proposed method takes advantage of the local dynamic
-information conveyed by the emotional speech. Several experiments on the Berlin
-Database of Emotional Speech are conducted to verify the effectiveness of the
-proposed method. The experimental results demonstrate that the local dynamic
-information obtained with the proposed method is more effective for speech
-emotion recognition than the traditional global features.
-"
-6937,1803.07771,"Ou Wu, Tao Yang, Mengyang Li, Ming Li",$\rho$-hot Lexicon Embedding-based Two-level LSTM for Sentiment Analysis,cs.CL," Sentiment analysis is a key component in various text mining applications.
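Entry 6936 above derives a local dynamic pitch feature from a histogram of pitch values. A small sketch of such a feature: a normalized histogram over voiced F0 frames (the bin count and frequency range here are assumptions for illustration):

import numpy as np

def pitch_histogram(f0, bins=20, lo=50.0, hi=400.0):
    # Histogram of voiced frame-level F0 values (unvoiced frames are 0),
    # normalized to a probability distribution, as a local pitch feature.
    voiced = f0[f0 > 0]
    hist, _ = np.histogram(voiced, bins=bins, range=(lo, hi))
    return hist / max(hist.sum(), 1)

f0_track = np.array([0, 0, 180, 185, 190, 0, 210, 220, 0, 205], dtype=float)
print(pitch_histogram(f0_track))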
-Numerous sentiment classification techniques, including conventional and deep
-learning-based methods, have been proposed in the literature. In most existing
-methods, a high-quality training set is assumed to be given. Nevertheless,
-constructing a high-quality training set that consists of highly accurate
-labels is challenging in real applications. This difficulty stems from the fact
-that text samples usually contain complex sentiment representations, and their
-annotation is subjective. We address this challenge in this study by leveraging
-a new labeling strategy and utilizing a two-level long short-term memory
-network to construct a sentiment classifier. Lexical cues are useful for
-sentiment analysis, and they have been utilized in conventional studies. For
-example, polar and privative words play important roles in sentiment analysis.
-A new encoding strategy, that is, $\rho$-hot encoding, is proposed to alleviate
-the drawbacks of one-hot encoding and thus effectively incorporate useful
-lexical cues. We compile three Chinese data sets on the basis of our labeling
-strategy and proposed methodology. Experiments on the three data sets
-demonstrate that the proposed method outperforms state-of-the-art algorithms.
-"
-6938,1803.07828,"Tommaso Soru, Stefano Ruberto, Diego Moussallem, Andr\'e Valdestilhas,
- Alexander Bigerl, Edgard Marx, Diego Esteves",Expeditious Generation of Knowledge Graph Embeddings,cs.CL cs.AI," Knowledge Graph Embedding methods aim at representing entities and relations
-in a knowledge base as points or vectors in a continuous vector space. Several
-approaches using embeddings have shown promising results on tasks such as link
-prediction, entity recommendation, question answering, and triplet
-classification. However, only a few methods can compute low-dimensional
-embeddings of very large knowledge bases without needing state-of-the-art
-computational resources. In this paper, we propose KG2Vec, a simple and fast
-approach to Knowledge Graph Embedding based on the skip-gram model. Instead of
-using a predefined scoring function, we learn it relying on Long Short-Term
-Memories. We show that our embeddings achieve results comparable with the most
-scalable approaches on knowledge graph completion as well as on a new metric.
-Yet, KG2Vec can embed large graphs in less time by processing more than 250
-million triples in less than 7 hours on common hardware.
-"
-6939,1803.08035,"Xiaolong Wang, Yufei Ye, Abhinav Gupta",Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs,cs.CV cs.CL," We consider the problem of zero-shot recognition: learning a visual
-classifier for a category with zero training examples, just using the word
-embedding of the category and its relationship to other categories for which
-visual data are provided. The key to dealing with the unfamiliar or novel
-category is to transfer knowledge obtained from familiar classes to describe
-the unfamiliar class. In this paper, we build upon the recently introduced
-Graph Convolutional Network (GCN) and propose an approach that uses both
-semantic embeddings and the categorical relationships to predict the
-classifiers. Given a learned knowledge graph (KG), our approach takes as input
-semantic embeddings for each node (representing a visual category). After a
-series of graph convolutions, we predict the visual classifier for each
-category. During training, the visual classifiers for a few categories are
-given to learn the GCN parameters.
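Entry 6938 above builds knowledge-graph embeddings on the skip-gram model. A hedged sketch of the basic idea, treating each triple as a three-token sentence for gensim's Word2Vec (gensim 4.x API; the toy triples are placeholders, and the paper's learned LSTM scoring function is not reproduced here):

from gensim.models import Word2Vec

# Each triple becomes a three-token "sentence"; skip-gram then learns
# embeddings for entities and relations jointly.
triples = [
    ["Berlin", "capitalOf", "Germany"],
    ["Paris", "capitalOf", "France"],
    ["Germany", "locatedIn", "Europe"],
    ["France", "locatedIn", "Europe"],
]
model = Word2Vec(sentences=triples, vector_size=50, window=2,
                 min_count=1, sg=1, epochs=100)
print(model.wv.most_similar("Germany"))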
-At test time, these filters are used to
-predict the visual classifiers of unseen categories. We show that our approach
-is robust to noise in the KG. More importantly, our approach provides
-significant improvement in performance compared to the current state-of-the-art
-results (from 2-3% on some metrics to a whopping 20% on a few).
-"
-6940,1803.08073,Vered Shwartz and Chris Waterson,"Olive Oil is Made of Olives, Baby Oil is Made for Babies: Interpreting
- Noun Compounds using Paraphrases in a Neural Model",cs.CL," Automatic interpretation of the relation between the constituents of a noun
-compound, e.g. olive oil (source) and baby oil (purpose) is an important task
-for many NLP applications. Recent approaches are typically based on either
-noun-compound representations or paraphrases. While the former has initially
-shown promising results, recent work suggests that the success stems from
-memorizing single prototypical words for each relation. We explore a neural
-paraphrasing approach that demonstrates superior performance when such
-memorization is not possible.
-"
-6941,1803.08240,"Stephen Merity, Nitish Shirish Keskar, Richard Socher",An Analysis of Neural Language Modeling at Multiple Scales,cs.CL cs.AI cs.NE," Many of the leading approaches in language modeling introduce novel, complex
-and specialized architectures. We take existing state-of-the-art word level
-language models based on LSTMs and QRNNs and extend them to both larger
-vocabularies as well as character-level granularity. When properly tuned, LSTMs
-and QRNNs achieve state-of-the-art results on character-level (Penn Treebank,
-enwik8) and word-level (WikiText-103) datasets, respectively. Results are
-obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single
-modern GPU.
-"
-6942,1803.08312,Aurelia Bustos and Antonio Pertusa,"Learning Eligibility in Cancer Clinical Trials using Deep Neural
- Networks",cs.CL cs.LG stat.ML," Interventional cancer clinical trials are generally too restrictive, and some
-patients are often excluded on the basis of comorbidity, past or concomitant
-treatments, or the fact that they are over a certain age. The efficacy and
-safety of new treatments for patients with these characteristics are,
-therefore, not defined. In this work, we built a model to automatically predict
-whether short clinical statements were considered inclusion or exclusion
-criteria. We used protocols from cancer clinical trials that were available in
-public registries from the last 18 years to train word-embeddings, and we
-constructed a dataset of 6M short free-texts labeled as eligible or not
-eligible. A text classifier was trained using deep neural networks, with
-pre-trained word-embeddings as inputs, to predict whether or not short
-free-text statements describing clinical information were considered eligible.
-We additionally analyzed the semantic reasoning of the word-embedding
-representations obtained and were able to identify equivalent treatments for a
-type of tumor analogous with the drugs used to treat other tumors. We show that
-representation learning using deep neural networks can be successfully
-leveraged to extract medical knowledge from clinical trial protocols for
-potentially assisting practitioners when prescribing treatments.
-"
-6943,1803.08409,Andy Way,Quality expectations of machine translation,cs.CL," Machine Translation (MT) is being deployed for a range of use-cases by
-millions of people on a daily basis.
-There should, therefore, be no doubt as to
-the utility of MT. However, not everyone is convinced that MT can be useful,
-especially as a productivity enhancer for human translators. In this chapter, I
-address this issue, describing how MT is currently deployed, how its output is
-evaluated and how this could be enhanced, especially as MT quality itself
-improves. Central to these issues is the acceptance that there is no longer a
-single 'gold standard' measure of quality, such that the situation in which MT
-is deployed needs to be borne in mind, especially with respect to the expected
-'shelf-life' of the translation itself.
-"
-6944,1803.08419,Vinayak Mathur and Arpit Singh,The Rapidly Changing Landscape of Conversational Agents,cs.AI cs.CL," Conversational agents have become ubiquitous, ranging from goal-oriented
-systems for helping with reservations to chit-chat models found in modern
-virtual assistants. In this survey paper, we explore this fascinating field. We
-look at some of the pioneering work that defined the field and gradually move
-to the current state-of-the-art models. We look at statistical, neural,
-generative adversarial network based and reinforcement learning based
-approaches and how they evolved. Along the way we discuss various challenges
-that the field faces: lack of context in utterances, the absence of a good
-quantitative metric for comparing models, and lack of trust in agents because
-they do not have a consistent persona, among others. We structure this paper in
-a way that answers these pertinent questions and discusses competing approaches
-to solve them.
-"
-6945,1803.08463,Pham Quang Nhat Minh,"A Feature-Based Model for Nested Named-Entity Recognition at VLSP-2018
- NER Evaluation Campaign",cs.CL," In this report, we describe our participating named-entity recognition system
-at the VLSP 2018 evaluation campaign. We formalized the task as a sequence
-labeling problem using the BIO encoding scheme. We applied a feature-based model
-which combines word features, word-shape features, Brown-cluster-based features,
-and word-embedding-based features. We compare several methods to deal with nested
-entities in the dataset. We showed that combining tags of entities at all
-levels for training a sequence labeling model (joint-tag model) improved the
-accuracy of nested named-entity recognition.
-"
-6946,1803.08471,"Aaron Schein, Zhiwei Steven Wu, Alexandra Schofield, Mingyuan Zhou,
- Hanna Wallach",Locally Private Bayesian Inference for Count Models,stat.ML cs.CL cs.CR cs.LG cs.SI," We present a general method for privacy-preserving Bayesian inference in
-Poisson factorization, a broad class of models that includes some of the most
-widely used models in the social sciences. Our method satisfies limited
-precision local privacy, a generalization of local differential privacy, which
-we introduce to formulate privacy guarantees appropriate for sparse count data.
-We develop an MCMC algorithm that approximates the locally private posterior
-over model parameters given data that has been locally privatized by the
-geometric mechanism (Ghosh et al., 2012). Our solution is based on two
-insights: 1) a novel reinterpretation of the geometric mechanism in terms of
-the Skellam distribution (Skellam, 1946) and 2) a general theorem that relates
-the Skellam to the Bessel distribution (Yuan & Kalbfleisch, 2000).
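Entry 6945 above combines word, word-shape, and related features in a feature-based sequence labeler over BIO tags. A minimal sketch of such a feature template with sklearn-crfsuite (using a CRF is an assumption here; the report's exact learner and feature set may differ, and the example sentence is a placeholder):

import sklearn_crfsuite

def word_shape(w):
    # Collapse characters into a coarse shape, e.g. "Hanoi" -> "Xxxxx"
    return ''.join('X' if c.isupper() else 'x' if c.islower()
                   else 'd' if c.isdigit() else c for c in w)

def token_features(sent, i):
    w = sent[i]
    return {'word': w.lower(), 'shape': word_shape(w),
            'prev': sent[i - 1].lower() if i > 0 else '<s>',
            'next': sent[i + 1].lower() if i < len(sent) - 1 else '</s>'}

sents = [["Ông", "Nguyễn", "Văn", "A", "sống", "ở", "Hà", "Nội"]]
tags = [["O", "B-PER", "I-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]]
X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50).fit(X, tags)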
-We demonstrate our method in two case studies on real-world email data in which
-we show that our method consistently outperforms the commonly-used naive
-approach, obtaining higher quality topics in text and more accurate link
-prediction in networks. On some tasks, our privacy-preserving method even
-outperforms non-private inference which conditions on the true data.
-"
-6947,1803.08476,Edilson A. Corr\^ea Jr. and Diego R. Amancio,"Word sense induction using word embeddings and community detection in
- complex networks",cs.CL cs.SI," Word Sense Induction (WSI) is the ability to automatically induce word senses
-from corpora. The WSI task was first proposed to overcome the limitations of
-manually annotated corpora that are required in word sense disambiguation
-systems. Even though several works have been proposed to induce word senses,
-existing systems are still very limited in the sense that they make use of
-structured, domain-specific knowledge sources. In this paper, we devise a
-method that leverages recent findings in word embeddings research to generate
-context embeddings, which are embeddings containing information about the
-semantic context of a word. In order to induce senses, we modeled the set of
-ambiguous words as a complex network. In the generated network, two instances
-(nodes) are connected if the respective context embeddings are similar. Upon
-using well-established community detection methods to cluster the obtained
-context embeddings, we found that the proposed method yields excellent
-performance for the WSI task. Our method outperformed competing algorithms and
-baselines, in a completely unsupervised manner and without the need of any
-additional structured knowledge source.
-"
-6948,1803.08493,"Eric Zelikman, Richard Socher",Contextual Salience for Fast and Accurate Sentence Vectors,cs.CL," Unsupervised vector representations of sentences or documents are a major
-building block for many language tasks such as sentiment classification.
-However, current methods are uninterpretable and slow or require large training
-datasets. Recent word vector-based proposals implicitly assume that distances
-in a word embedding space are equally important, regardless of context. We
-introduce contextual salience (CoSal), a measure of word importance that uses
-the distribution of context vectors to normalize distances and weights. CoSal
-relies on the insight that unusual word vectors disproportionately affect
-phrase vectors. A bag-of-words model with CoSal-based weights produces accurate
-unsupervised sentence or document representations for classification, requiring
-little computation to evaluate and only a single covariance calculation to
-""train."" CoSal supports small contexts, out-of-context words and outperforms
-SkipThought on most benchmarks, beats tf-idf on all benchmarks, and is
-competitive with the unsupervised state-of-the-art.
-"
-6949,1803.08614,"Jeremy Barnes, Patrik Lambert, Toni Badia","MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for
- Aspect-level Sentiment Classification",cs.CL," While sentiment analysis has become an established field in the NLP
-community, research into languages other than English has been hindered by the
-lack of resources. Although much research in multi-lingual and cross-lingual
-sentiment analysis has focused on unsupervised or semi-supervised approaches,
-these still require a large number of resources and do not reach the
-performance of supervised approaches.
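Entry 6948 above weights words by how unusual their vectors are under the context distribution. A numpy sketch of that idea using a Mahalanobis-style distance from the context mean (this is an interpretation for illustration; the paper's exact weighting may differ, and the random vectors are placeholders):

import numpy as np

def cosal_sentence_vector(word_vecs, context_vecs):
    # Weight each word by the covariance-normalized distance of its vector
    # from the context mean, then take the weighted average.
    mu = context_vecs.mean(axis=0)
    cov = np.cov(context_vecs, rowvar=False)
    prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    diffs = word_vecs - mu
    w = np.sqrt(np.einsum('ij,jk,ik->i', diffs, prec, diffs))
    w = w / w.sum()
    return (w[:, None] * word_vecs).sum(axis=0)

rng = np.random.default_rng(0)
ctx = rng.normal(size=(1000, 50))   # vectors of words in the context/corpus
sent = rng.normal(size=(7, 50))     # vectors of the sentence's words
print(cosal_sentence_vector(sent, ctx).shape)  # (50,)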
-With this in mind, we introduce two
-datasets for supervised aspect-level sentiment analysis in Basque and Catalan,
-both of which are under-resourced languages. We provide high-quality
-annotations and benchmarks with the hope that they will be useful to the
-growing community of researchers working on these languages.
-"
-6950,1803.08652,"Ikuya Yamada, Ryuji Tamaki, Hiroyuki Shindo, Yoshiyasu Takefuji",Studio Ousia's Quiz Bowl Question Answering System,cs.CL," In this chapter, we describe our question answering system, which was the
-winning system at the Human-Computer Question Answering (HCQA) Competition at
-the Thirty-first Annual Conference on Neural Information Processing Systems
-(NIPS). The competition requires participants to address a factoid question
-answering task referred to as quiz bowl. To address this task, we use two novel
-neural network models and combine these models with conventional information
-retrieval models using a supervised machine learning model. Our system achieved
-the best performance among the systems submitted in the competition and won a
-match against six top human quiz experts by a wide margin.
-"
-6951,1803.08721,Florian Boudin,Unsupervised Keyphrase Extraction with Multipartite Graphs,cs.IR cs.CL," We propose an unsupervised keyphrase extraction model that encodes topical
-information within a multipartite graph structure. Our model represents
-keyphrase candidates and topics in a single graph and exploits their mutually
-reinforcing relationship to improve candidate ranking. We further introduce a
-novel mechanism to incorporate keyphrase selection preferences into the model.
-Experiments conducted on three widely used datasets show significant
-improvements over state-of-the-art graph-based models.
-"
-6952,1803.08790,"Hemayet Ahmed Chowdhury, Tanvir Alam Nibir and Md. Saiful Islam","Sentiment Analysis of Comments on Rohingya Movement with Support Vector
- Machine",cs.IR cs.CL cs.LG," The Rohingya Movement and Crisis caused a huge uproar in the political and
-economic state of Bangladesh. Refugee movement is a recurring event and a large
-amount of data in the form of opinions remains on social media such as
-Facebook, with very little analysis done on them. To analyse the comments on
-all Rohingya-related posts, we created and modified a classifier based
-on the Support Vector Machine algorithm. The code is implemented in Python and
-uses the scikit-learn library. A dataset on Rohingya analysis is not currently
-available, so we used our own dataset of 2500 positive and 2500 negative
-comments. We specifically used a support vector machine with a linear kernel. A
-previous experiment was performed by us on the same dataset using the naive
-Bayes algorithm, but that did not yield impressive results.
-"
-6953,1803.08793,"Jack Lanchantin, Ji Gao",Exploring the Naturalness of Buggy Code with Recurrent Neural Networks,cs.SE cs.CL cs.LG," Statistical language models are powerful tools which have been used for many
-tasks within natural language processing. Recently, they have been used for
-other sequential data such as source code. Ray et al. (2015) showed that it is
-possible to train an n-gram source code language model and use it to predict
-buggy lines in code by determining ""unnatural"" lines via entropy with respect
-to the language model. In this work, we propose using a more advanced language
-modeling technique, Long Short-term Memory recurrent neural networks, to model
-source code and classify buggy lines based on entropy.
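Entry 6953 above ranks code lines by entropy under a language model. A tiny add-one-smoothed bigram version of the underlying idea (the paper's models are n-gram and LSTM language models; this toy corpus and tokenization are placeholders):

import math
from collections import Counter

def train_bigram(lines):
    uni, bi = Counter(), Counter()
    for line in lines:
        toks = ['<s>'] + line.split()
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def line_entropy(line, uni, bi, V):
    # Average negative log-probability per token under an add-one-smoothed
    # bigram model; high values flag "unnatural" (possibly buggy) lines.
    toks = ['<s>'] + line.split()
    lp = sum(-math.log2((bi[(a, b)] + 1) / (uni[a] + V))
             for a, b in zip(toks, toks[1:]))
    return lp / max(len(toks) - 1, 1)

corpus = ["int i = 0 ;", "i = i + 1 ;", "return i ;"]
uni, bi = train_bigram(corpus)
for line in ["i = i + 1 ;", "i = i ++ + ;"]:
    print(line, line_entropy(line, uni, bi, len(uni)))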
We show that our method
-slightly outperforms an n-gram model in the buggy line classification task
-using AUC.
-"
-6954,1803.08863,"Enno Hermann, Sharon Goldwater","Multilingual bottleneck features for subword modeling in zero-resource
- languages",cs.CL eess.AS," How can we effectively develop speech technology for languages where no
-transcribed data is available? Many existing approaches use no annotated
-resources at all, yet it makes sense to leverage information from large
-annotated corpora in other languages, for example in the form of multilingual
-bottleneck features (BNFs) obtained from a supervised speech recognition
-system. In this work, we evaluate the benefits of BNFs for subword modeling
-(feature extraction) in six unseen languages on a word discrimination task.
-First we establish a strong unsupervised baseline by combining two existing
-methods: vocal tract length normalisation (VTLN) and the correspondence
-autoencoder (cAE). We then show that BNFs trained on a single language already
-beat this baseline; including up to 10 languages results in additional
-improvements which cannot be matched by just adding more data from a single
-language. Finally, we show that the cAE can improve further on the BNFs if
-high-quality same-word pairs are available.
-"
-6955,1803.08869,"Grzegorz Chrupa{\l}a, Lieke Gelderloos, \'Akos K\'ad\'ar, Afra
- Alishahi",On the difficulty of a distributional semantics of spoken language,cs.CL cs.LG cs.SD eess.AS," In the domain of unsupervised learning most work on speech has focused on
-discovering low-level constructs such as phoneme inventories or word-like
-units. In contrast, for written language there is a large body of work on
-unsupervised induction of semantic representations of words, whole sentences
-and longer texts. In this study we examine the challenges of adapting these
-approaches from written to spoken language. We conjecture that unsupervised
-learning of the semantics of spoken language becomes feasible if we abstract
-from the surface variability. We simulate this setting with a dataset of
-utterances spoken by a realistic but uniform synthetic voice. We evaluate two
-simple unsupervised models which, to varying degrees of success, learn semantic
-representations of speech fragments. Finally we present inconclusive results on
-human speech, and discuss the challenges inherent in learning distributional
-semantic representations on unrestricted natural spoken language.
-"
-6956,1803.08896,"Somak Aditya, Yezhou Yang, Chitta Baral","Explicit Reasoning over End-to-End Neural Architectures for Visual
- Question Answering",cs.CV cs.CL," Many vision and language tasks require commonsense reasoning beyond
-data-driven image and natural language processing. Here we adopt Visual
-Question Answering (VQA) as an example task, where a system is expected to
-answer a question in natural language about an image. Current state-of-the-art
-systems have attempted to solve the task using deep neural architectures and
-achieved promising performance. However, the resulting systems are generally
-opaque and they struggle in understanding questions for which extra knowledge
-is required. In this paper, we present an explicit reasoning layer on top of a
-set of penultimate neural network based systems. The reasoning layer enables
-reasoning and answering questions where additional knowledge is required, and
-at the same time provides an interpretable interface to the end users.
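The buggy-line detector above (1803.08793) ranks code lines by their entropy under a source-code language model; the n-gram baseline it is compared against can be sketched in a few lines. The add-one-smoothed bigram model below is a deliberately tiny stand-in (the paper's own model is an LSTM), kept only to make the per-line scoring explicit.

```python
# Score lines by average cross-entropy under a token bigram model; lines
# with unusually high entropy are flagged as "unnatural" bug candidates.
import math
from collections import Counter

def tokenize(line):
    return ["<s>"] + line.split() + ["</s>"]

def train_bigram(corpus_lines):
    unigrams, bigrams = Counter(), Counter()
    for line in corpus_lines:
        toks = tokenize(line)
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def line_entropy(line, unigrams, bigrams):
    toks = tokenize(line)
    V = len(unigrams)
    logps = [math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))  # add-one
             for a, b in zip(toks, toks[1:])]
    return -sum(logps) / len(logps)   # average cross-entropy per token

train = ["for i in range ( n ) :", "if x is None :", "return x + 1"]
uni, bi = train_bigram(train)
for line in ["return x + 1", "x + return 1 )"]:
    print(f"{line_entropy(line, uni, bi):.2f}  {line}")
```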
-Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL) based
-engine to reason over a basket of inputs: visual relations, the semantic parse
-of the question, and background ontological knowledge from word2vec and
-ConceptNet. Experimental analysis of the answers and the key evidential
-predicates generated on the VQA dataset validate our approach.
-"
-6957,1803.08910,"Dilek K\""u\c{c}\""uk and Fazli Can",Stance Detection on Tweets: An SVM-based Approach,cs.CL," Stance detection is a subproblem of sentiment analysis where the stance of
-the author of a piece of natural language text for a particular target (either
-explicitly stated in the text or not) is explored. The stance output is usually
-given as Favor, Against, or Neither. In this paper, we target stance detection
-on sports-related tweets and present the performance results of our
-SVM-based stance classifiers on such tweets. First, we describe three versions
-of our proprietary tweet data set annotated with stance information, all of
-which are made publicly available for research purposes. Next, we evaluate SVM
-classifiers using different feature sets for stance detection on this data set.
-The employed features are based on unigrams, bigrams, hashtags, external links,
-emoticons, and lastly, named entities. The results indicate that joint use of
-the features based on unigrams, hashtags, and named entities by SVM classifiers
-is a plausible approach for the stance detection problem on sports-related
-tweets.
-"
-6958,1803.08966,"Lu Feng, Mahsa Ghasemi, Kai-Wei Chang, Ufuk Topcu",Counterexamples for Robotic Planning Explained in Structured Language,cs.RO cs.CL cs.FL," Automated techniques such as model checking have been used to verify models
-of robotic mission plans based on Markov decision processes (MDPs) and generate
-counterexamples that may help diagnose requirement violations. However, such
-artifacts may be too complex for humans to understand, because existing
-representations of counterexamples typically include a large number of paths or
-a complex automaton. To help improve the interpretability of counterexamples,
-we define a notion of explainable counterexample, which includes a set of
-structured natural language sentences to describe the robotic behavior that
-leads to a requirement violation in an MDP model of a robotic mission plan. We
-propose an approach based on mixed-integer linear programming for generating
-explainable counterexamples that are minimal, sound and complete. We
-demonstrate the usefulness of the proposed approach via a case study of
-warehouse robot planning.
-"
-6959,1803.08976,"Yu-An Chung, James Glass","Speech2Vec: A Sequence-to-Sequence Framework for Learning Word
- Embeddings from Speech",cs.CL," In this paper, we propose a novel deep neural network architecture,
-Speech2Vec, for learning fixed-length vector representations of audio segments
-excised from a speech corpus, where the vectors contain semantic information
-pertaining to the underlying spoken words, and are close to other vectors in
-the embedding space if their corresponding underlying spoken words are
-semantically similar. The proposed model can be viewed as a speech version of
-Word2Vec. Its design is based on an RNN Encoder-Decoder framework, and borrows
-the methodology of skipgrams or continuous bag-of-words for training. Learning
-word embeddings directly from speech enables Speech2Vec to make use of the
-semantic information carried by speech that does not exist in plain text.
The
-learned word embeddings are evaluated and analyzed on 13 widely used word
-similarity benchmarks, and outperform word embeddings learned by Word2Vec from
-the transcriptions.
-"
-6960,1803.08983,Patrick Huber and Jan Niehues and Alex Waibel,Automated Evaluation of Out-of-Context Errors,cs.CL cs.AI," We present a new approach to evaluate computational models for the task of
-text understanding by the means of out-of-context error detection. Through the
-novel design of our automated modification process, existing large-scale data
-sources can be adopted for a vast number of text understanding tasks. The data
-is thereby altered on a semantic level, allowing models to be tested against a
-challenging set of modified text passages that require comprehension of a
-broader narrative discourse. Our newly introduced task targets actual
-real-world problems of transcription and translation systems by inserting
-authentic out-of-context errors. The automated modification process is applied
-to the 2016 TEDTalk corpus. Entirely automating the process allows the adoption
-of complete datasets at low cost, facilitating supervised learning procedures
-and deeper networks to be trained and tested. To evaluate the quality of the
-modification algorithm, a language model and a supervised binary classification
-model are trained and tested on the altered dataset. A human baseline
-evaluation is examined to compare the results with human performance. The
-outcome of the evaluation task indicates the difficulty of detecting semantic
-errors for machine-learning algorithms and humans, showing that the errors
-cannot be identified when limited to a single sentence.
-"
-6961,1803.08991,Antonis Anastasopoulos and David Chiang,"Leveraging translations for speech transcription in low-resource
- settings",cs.CL," Recently proposed data collection frameworks for endangered language
-documentation aim not only to collect speech in the language of interest, but
-also to collect translations into a high-resource language that will render the
-collected resource interpretable. We focus on this scenario and explore whether
-we can improve transcription quality under these extremely low-resource
-settings with the assistance of text translations. We present a neural
-multi-source model and evaluate several variations of it on three low-resource
-datasets. We find that our multi-source model with shared attention outperforms
-the baselines, reducing transcription character error rate by up to 12.3%.
-"
-6962,1803.09000,"Yang Yu, Vincent Ng",WikiRank: Improving Keyphrase Extraction Based on Background Knowledge,cs.CL cs.IR," Keyphrases are an efficient representation of the main ideas of documents.
-While background knowledge can provide valuable information about documents, it
-is rarely incorporated in keyphrase extraction methods. In this paper, we
-propose WikiRank, an unsupervised method for keyphrase extraction based on
-background knowledge from Wikipedia. Firstly, we construct a semantic graph for
-the document. Then we transform the keyphrase extraction problem into an
-optimization problem on the graph. Finally, we output the optimal keyphrase
-set. Our method obtains improvements over other state-of-the-art models by more
-than 2% in F1-score.
-"
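One way to read the WikiRank optimization step above (1803.09000) is as weighted concept coverage: pick the phrase set that covers the most important Wikipedia concepts in the document's semantic graph. A greedy sketch of that selection follows; the candidate-to-concept links and weights are toy assumptions, and the paper's exact objective may differ in detail.

```python
# Greedy weighted set cover over a toy phrase/concept graph.
def greedy_keyphrases(candidates, concept_weight, k=2):
    """candidates: {phrase: set(concepts)}; pick k phrases maximizing the
    total weight of distinct concepts covered."""
    chosen, covered = [], set()
    for _ in range(k):
        def gain(p):
            return sum(concept_weight[c] for c in candidates[p] - covered)
        best = max((p for p in candidates if p not in chosen), key=gain)
        if gain(best) == 0:
            break                      # nothing new left to cover
        chosen.append(best)
        covered |= candidates[best]
    return chosen

concept_weight = {"graph": 3, "optimization": 2, "wikipedia": 2, "kittens": 1}
candidates = {
    "semantic graph": {"graph", "wikipedia"},
    "graph model": {"graph"},
    "optimal keyphrase set": {"optimization"},
}
print(greedy_keyphrases(candidates, concept_weight))
```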
-6963,1803.09017,"Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg,
- Joel Shor, Ying Xiao, Fei Ren, Ye Jia, Rif A. Saurous","Style Tokens: Unsupervised Style Modeling, Control and Transfer in
- End-to-End Speech Synthesis",cs.CL cs.LG cs.SD eess.AS," In this work, we propose ""global style tokens"" (GSTs), a bank of embeddings
-that are jointly trained within Tacotron, a state-of-the-art end-to-end speech
-synthesis system. The embeddings are trained with no explicit labels, yet learn
-to model a large range of acoustic expressiveness. GSTs lead to a rich set of
-significant results. The soft interpretable ""labels"" they generate can be used
-to control synthesis in novel ways, such as varying speed and speaking style -
-independently of the text content. They can also be used for style transfer,
-replicating the speaking style of a single audio clip across an entire
-long-form text corpus. When trained on noisy, unlabeled found data, GSTs learn
-to factorize noise and speaker identity, providing a path towards highly
-scalable but robust speech synthesis.
-"
-6964,1803.09047,"RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy
- Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous","Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with
- Tacotron",cs.CL cs.LG cs.SD eess.AS," We present an extension to the Tacotron speech synthesis architecture that
-learns a latent embedding space of prosody, derived from a reference acoustic
-representation containing the desired prosody. We show that conditioning
-Tacotron on this learned embedding space results in synthesized audio that
-matches the prosody of the reference signal with fine time detail even when the
-reference and synthesis speakers are different. Additionally, we show that a
-reference prosody embedding can be used to synthesize text that is different
-from that of the reference utterance. We define several quantitative and
-subjective metrics for evaluating prosody transfer, and report results with
-accompanying audio samples from single-speaker and 44-speaker Tacotron models
-on a prosody transfer task.
-"
-6965,1803.09065,"Julien Tissier, Christophe Gravier, Amaury Habrard",Near-lossless Binarization of Word Embeddings,cs.CL," Word embeddings are commonly used as a starting point in many NLP models to
-achieve state-of-the-art performances. However, with a large vocabulary and
-many dimensions, these floating-point representations are expensive both in
-terms of memory and calculations, which makes them unsuitable for use on
-low-resource devices. The method proposed in this paper transforms real-valued
-embeddings into binary embeddings while preserving semantic information,
-requiring only 128 or 256 bits for each vector. This leads to a small memory
-footprint and fast vector operations. The model is based on an autoencoder
-architecture, which also allows the original vectors to be reconstructed from
-the binary ones. Experimental results on semantic similarity, text
-classification and sentiment analysis tasks show that the binarization of word
-embeddings only leads to a loss of ~2% in accuracy while vector size is reduced
-by 97%. Furthermore, a top-k benchmark demonstrates that using these binary
-vectors is 30 times faster than using real-valued vectors.
-"
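The binarized embeddings above (1803.09065) make storage tiny and nearest-neighbor search a matter of Hamming distance. The sketch below substitutes a simple random-hyperplane sign projection for the paper's autoencoder (an assumption, kept only to show the bit-packing and the fast Hamming search that such binary vectors enable).

```python
# Pack 256-bit codes and retrieve neighbors via XOR + popcount.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car", "truck"]
real_vecs = rng.normal(size=(len(vocab), 300))   # stand-in embeddings

proj = rng.normal(size=(300, 256))               # 300-dim -> 256 bits
bits = (real_vecs @ proj > 0)                    # sign binarization
packed = np.packbits(bits, axis=1)               # 32 bytes per word

def hamming_topk(query_row, k=2):
    # XOR the packed bytes, then count differing bits per word.
    diff = np.bitwise_xor(packed, packed[query_row])
    dists = np.unpackbits(diff, axis=1).sum(axis=1)
    order = np.argsort(dists)
    return [(vocab[i], int(dists[i]))
            for i in order if i != query_row][:k]

print(hamming_topk(vocab.index("cat")))
```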
-6966,1803.09074,"Yi Tay, Luu Anh Tuan, Siu Cheung Hui",Multi-range Reasoning for Machine Comprehension,cs.CL cs.AI cs.NE," We propose MRU (Multi-Range Reasoning Units), a new fast compositional
-encoder for machine comprehension (MC). Our proposed MRU encoders are
-characterized by multi-ranged gating, executing a series of parameterized
-contract-and-expand layers for learning gating vectors that benefit from long
-and short-term dependencies. The aims of our approach are as follows: (1)
-learning representations that are concurrently aware of long and short-term
-context, (2) modeling relationships between intra-document blocks and (3) fast
-and efficient sequence encoding. We show that our proposed encoder demonstrates
-promising results both as a standalone encoder and as a complementary
-building block. We conduct extensive experiments on three challenging MC
-datasets, namely RACE, SearchQA and NarrativeQA, achieving highly competitive
-performance on all. On the RACE benchmark, our model outperforms DFN (Dynamic
-Fusion Networks) by 1.5%-6% without using any recurrent or convolution layers.
-Similarly, we achieve competitive performance relative to AMANDA on the
-SearchQA benchmark and BiDAF on the NarrativeQA benchmark without using any
-LSTM/GRU layers. Finally, incorporating MRU encoders with standard BiLSTM
-architectures further improves performance, achieving state-of-the-art results.
-"
-6967,1803.09091,"Christos Christodoulopoulos, Arpit Mittal",Simple Large-scale Relation Extraction from Unstructured Text,cs.CL," Knowledge-based question answering relies on the availability of facts, the
-majority of which cannot be found in structured sources (e.g. Wikipedia
-info-boxes, Wikidata). One of the major components of extracting facts from
-unstructured text is Relation Extraction (RE). In this paper we propose a novel
-method for creating distant (weak) supervision labels for training a
-large-scale RE system. We also provide new evidence about the effectiveness of
-neural network approaches by decoupling the model architecture from the feature
-design of a state-of-the-art neural network system. Surprisingly, a much
-simpler classifier trained on similar features performs on par with the highly
-complex neural network system (with a 75x reduction in training time),
-suggesting that the features are a bigger contributor to the final performance.
-"
-6968,1803.09103,Sowmya Vajjala,Machine Learning and Applied Linguistics,cs.CL," This entry introduces the topic of machine learning and provides an overview
-of its relevance for applied linguistics and language learning. The discussion
-will focus on giving an introduction to the methods and applications of machine
-learning in applied linguistics, and will provide references for further study.
-"
-6969,1803.09123,Kriste Krstovski and David M. Blei,Equation Embeddings,stat.ML cs.CL cs.LG," We present an unsupervised approach for discovering semantic representations
-of mathematical equations. Equations are challenging to analyze because each is
-unique, or nearly unique. Our method, which we call equation embeddings, finds
-good representations of equations by using the representations of their
-surrounding words. We used equation embeddings to analyze four collections of
-scientific articles from the arXiv, covering four computer science domains
-(NLP, IR, AI, and ML) and $\sim$98.5k equations. Quantitatively, we found that
-equation embeddings provide better models when compared to existing word
-embedding approaches. Qualitatively, we found that equation embeddings provide
-coherent semantic representations of equations and can capture semantic
-similarity to other equations and to words.
-"
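The equation-embedding abstract above (1803.09123) learns a representation for each equation from the words around it. A minimal sketch of that idea: treat each equation as a single token and run a skip-gram model over the mixed text (gensim >= 4.0 API assumed; the "papers" below are toy stand-ins, and EQ_1/EQ_2 are placeholder equation tokens).

```python
# Each equation token is embedded jointly with ordinary words.
from gensim.models import Word2Vec

sentences = [
    "we minimize the loss EQ_1 over the training set".split(),
    "the objective EQ_1 is optimized with gradient descent".split(),
    "attention weights EQ_2 normalize the scores".split(),
    "the softmax EQ_2 produces a distribution".split(),
]

model = Word2Vec(sentences, vector_size=32, window=5, min_count=1,
                 sg=1, epochs=200, seed=0)
# Equation tokens end up near the words that describe them.
print(model.wv.most_similar("EQ_1", topn=3))
```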
-6970,1803.09164,"Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon
- Goldwater",Low-Resource Speech-to-Text Translation,cs.CL," Speech-to-text translation has many potential applications for low-resource
-languages, but the typical approach of cascading speech recognition with
-machine translation is often impossible, since the transcripts needed to train
-a speech recognizer are usually not available for low-resource languages.
-Recent work has found that neural encoder-decoder models can learn to directly
-translate foreign speech in high-resource scenarios, without the need for
-intermediate transcription. We investigate whether this approach also works in
-settings where both data and computation are limited. To make the approach
-efficient, we make several architectural changes, including a change from
-character-level to word-level decoding. We find that this choice yields crucial
-speed improvements that allow us to train with fewer computational resources,
-yet still performs well on frequent words. We explore models trained on between
-20 and 160 hours of data, and find that although models trained on less data
-have considerably lower BLEU scores, they can still predict words with
-relatively high precision and recall---around 50% for a model trained on 50
-hours of data, versus around 60% for the full 160 hour model. Thus, they may
-still be useful for some low-resource scenarios.
-"
-6971,1803.09189,"Yu-Siang Wang, Chenxi Liu, Xiaohui Zeng, Alan Yuille",Scene Graph Parsing as Dependency Parsing,cs.CL cs.CV," In this paper, we study the problem of parsing structured knowledge graphs
-from textual descriptions. In particular, we consider the scene graph
-representation that considers objects together with their attributes and
-relations: this representation has been proved useful across a variety of
-vision and language applications. We begin by introducing an alternative but
-equivalent edge-centric view of scene graphs that connect to dependency parses.
-Together with a careful redesign of label and action space, we combine the
-two-stage pipeline used in prior work (generic dependency parsing followed by
-simple post-processing) into one, enabling end-to-end training. The scene
-graphs generated by our learned neural dependency parser achieve an F-score
-similarity of 49.67% to ground truth graphs on our evaluation set, surpassing
-the best previous approaches by 5%. We further demonstrate the effectiveness of
-our learned parser on image retrieval applications.
-"
-6972,1803.09230,"Zia Hasan, Sebastian Fischer",Pay More Attention - Neural Architectures for Question-Answering,cs.CL," Machine comprehension is a representative task of natural language
-understanding. Typically, we are given a context paragraph and the objective is
-to answer a question that depends on the context. Such a problem requires
-modeling the complex interactions between the context paragraph and the
-question. Lately, attention mechanisms have been found to be quite successful
-at these tasks and in particular, attention mechanisms with attention flow from
-both context-to-question and question-to-context have been proven to be quite
-useful. In this paper, we study two state-of-the-art attention mechanisms
-called Bi-Directional Attention Flow (BiDAF) and Dynamic Co-Attention Network
-(DCN) and propose a hybrid scheme combining these two architectures that gives
-better overall performance.
Moreover, we suggest a new, simpler attention
-mechanism that we call Double Cross Attention (DCA), which provides better
-results than both the BiDAF and Co-Attention mechanisms while providing
-performance similar to the hybrid scheme. The objective of our paper is to
-focus particularly on the attention layer and to suggest improvements to it.
-Our experimental evaluations show that both our proposed models achieve
-superior results on the Stanford Question Answering Dataset (SQuAD) compared to
-BiDAF and DCN attention mechanisms.
-"
-6973,1803.09288,"Austin C. Kozlowski, Matt Taddy, James A. Evans",The Geometry of Culture: Analyzing Meaning through Word Embeddings,cs.CL," We demonstrate the utility of a new methodological tool, neural-network word
-embedding models, for large-scale text analysis, revealing how these models
-produce richer insights into cultural associations and categories than possible
-with prior methods. Word embeddings represent semantic relations between words
-as geometric relationships between vectors in a high-dimensional space,
-operationalizing a relational model of meaning consistent with contemporary
-theories of identity and culture. We show that dimensions induced by word
-differences (e.g. man - woman, rich - poor, black - white, liberal -
-conservative) in these vector spaces closely correspond to dimensions of
-cultural meaning, and the projection of words onto these dimensions reflects
-widely shared cultural connotations when compared to surveyed responses and
-labeled historical data. We pilot a method for testing the stability of these
-associations, then demonstrate applications of word embeddings for
-macro-cultural investigation with a longitudinal analysis of the coevolution of
-gender and class associations in the United States over the 20th century and a
-comparative analysis of historic distinctions between markers of gender and
-class in the U.S. and Britain. We argue that the success of these
-high-dimensional models motivates a move towards ""high-dimensional theorizing""
-of meanings, identities and cultural processes.
-"
-6974,1803.09337,"Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, Jonathan Berant",Text Segmentation as a Supervised Learning Task,cs.CL," Text segmentation, the task of dividing a document into contiguous segments
-based on its semantic structure, is a longstanding challenge in language
-understanding. Previous work on text segmentation focused on unsupervised
-methods such as clustering or graph search, due to the paucity of labeled data.
-In this work, we formulate text segmentation as a supervised learning problem,
-and present a large new dataset for text segmentation that is automatically
-extracted and labeled from Wikipedia. Moreover, we develop a segmentation model
-based on this dataset and show that it generalizes well to unseen natural text.
-"
-6975,1803.09371,"Ziyu Yao, Daniel S. Weld, Wei-Peng Chen, Huan Sun",StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow,cs.CL," Stack Overflow (SO) has been a great source of natural language questions and
-their code solutions (i.e., question-code pairs), which are critical for many
-tasks including code retrieval and annotation. In most existing research,
-question-code pairs were collected heuristically and tend to have low quality.
-In this paper, we investigate a new problem of systematically mining
-question-code pairs from Stack Overflow (in contrast to heuristically
-collecting them).
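The geometry-of-culture abstract above (1803.09288) builds cultural dimensions as differences between word vectors (e.g. man - woman) and projects other words onto them. A minimal numpy sketch of that projection follows; the 4-dimensional toy vectors are assumptions standing in for real pretrained embeddings.

```python
# Project words onto a dimension induced by a word-vector difference.
import numpy as np

vecs = {                       # toy embeddings, illustrative only
    "man":     np.array([ 1.0, 0.2, 0.0, 0.1]),
    "woman":   np.array([-1.0, 0.2, 0.0, 0.1]),
    "nurse":   np.array([-0.6, 0.1, 0.3, 0.0]),
    "soldier": np.array([ 0.7, 0.0, 0.2, 0.1]),
}

def cultural_dimension(pos, neg):
    d = vecs[pos] - vecs[neg]            # e.g. a "gender" axis
    return d / np.linalg.norm(d)

def project(word, axis):
    v = vecs[word]
    return float(v @ axis / np.linalg.norm(v))   # cosine with the axis

gender = cultural_dimension("man", "woman")
for w in ("nurse", "soldier"):
    print(w, round(project(w, gender), 3))
```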
It is formulated as predicting whether or not a code snippet
-is a standalone solution to a question. We propose a novel Bi-View Hierarchical
-Neural Network which can capture both the programming content and the textual
-context of a code snippet (i.e., two views) to make a prediction. On two
-manually annotated datasets in the Python and SQL domains, our framework
-substantially outperforms heuristic methods with at least 15% higher F1 and
-accuracy. Furthermore, we present StaQC (Stack Overflow Question-Code pairs),
-the largest dataset to date of ~148K Python and ~120K SQL question-code pairs,
-automatically mined from SO using our framework. Under various case studies, we
-demonstrate that StaQC can greatly help develop data-hungry models for
-associating natural language with programming language.
-"
-6976,1803.09401,"Anik Islam, Arifa Akter and Bayzid Ashik Hossain","HomeGuard: A Smart System to Deal with the Emergency Response of
- Domestic Violence Victims",cs.IR cs.CL," Domestic violence is a silent crisis in developing and underdeveloped
-countries, though developed countries also remain affected by it. In developed
-countries, victims can easily report abuse and ask for help. In developing and
-underdeveloped countries, by contrast, victims rarely report these crimes, and
-by the time the authorities notice, it is often too late to save or support the
-victim. If such problems can be identified at the very beginning of an incident
-and proper action can be taken, it will not only help the victim but also
-reduce domestic violence crimes. This paper proposes a smart system which can
-extract a victim's situation and provide help accordingly. Among developing and
-underdeveloped countries, Bangladesh has been chosen: although the rate of
-reporting of domestic violence is low, the number of extreme cases recorded by
-the authorities is very high. Case studies relating to domestic violence
-collected by different NGOs have been studied and used to extract possible
-conditions of victims.
-"
-6977,1803.09402,"Ritesh Kumar, Aishwarya N. Reganti, Akshit Bhatia, Tushar Maheshwari",Aggression-annotated Corpus of Hindi-English Code-mixed Data,cs.CL," As interaction over the web has increased, incidents of aggression and
-related events like trolling, cyberbullying, flaming, and hate speech have
-increased manifold across the globe. While most of these behaviours, like
-bullying or hate speech, predate the Internet, the reach and extent of the
-Internet have given them unprecedented power and influence over the lives of
-billions of people. It is therefore of utmost importance that preventive
-measures be taken to safeguard people using the web, so that the web remains a
-viable medium of communication and connection. In this paper, we discuss the
-development of an aggression tagset and an annotated corpus of Hindi-English
-code-mixed data from two of the most popular social networking and social media
-platforms in India, Twitter and Facebook. The corpus is annotated using a
-hierarchical tagset of 3 top-level tags and 10 level-2 tags. The final dataset
-contains approximately 18k tweets and 21k Facebook comments and is being
-released for further research in the field.
-"
-6978,1803.09405,"Ritesh Kumar, Bornini Lahiri, Deepak Alok, Atul Kr.
Ojha, Mayank Jain,
- Abdul Basit, Yogesh Dawer","Automatic Identification of Closely-related Indian Languages: Resources
- and Experiments",cs.CL," In this paper, we discuss an attempt to develop an automatic language
-identification system for 5 closely-related Indo-Aryan languages of India,
-Awadhi, Bhojpuri, Braj, Hindi and Magahi. We have compiled comparable corpora
-of varying lengths for these languages from various resources. We discuss the
-method of creation of these corpora in detail. Using these corpora, a language
-identification system was developed, which currently gives a state-of-the-art
-accuracy of 96.48\%. We also used these corpora to study the similarity between
-the 5 languages at the lexical level, which is the first data-based study of
-the closeness of these languages.
-"
-6979,1803.09519,"Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian St\""uker, Alex
- Waibel",Self-Attentional Acoustic Models,cs.CL," Self-attention is a method of encoding sequences of vectors by relating these
-vectors to each other based on pairwise similarities. These models have
-recently shown promising results for modeling discrete sequences, but they are
-non-trivial to apply to acoustic modeling due to computational and modeling
-issues. In this paper, we apply self-attention to acoustic modeling, proposing
-several improvements to mitigate these issues: First, self-attention memory
-grows quadratically in the sequence length, which we address through a
-downsampling technique. Second, we find that previous approaches to
-incorporating position information into the model are unsuitable and explore
-other representations and hybrid models to this end. Third, to stress the
-importance of local context in the acoustic signal, we propose a Gaussian
-biasing approach that allows explicit control over the context range.
-Experiments find that our model approaches a strong baseline based on LSTMs
-with network-in-network connections while being much faster to compute.
-Besides speed, we find that interpretability is a strength of self-attentional
-acoustic models, and demonstrate that self-attention heads learn a
-linguistically plausible division of labor.
-"
-6980,1803.09551,"Guang-Neng Hu, Xin-Yu Dai, Feng-Yu Qiu, Rui Xia, Tao Li, Shu-Jian
- Huang, Jia-Jun Chen","Collaborative Filtering with Topic and Social Latent Factors
- Incorporating Implicit Feedback",cs.IR cs.AI cs.CL cs.LG," Recommender systems (RSs) provide an effective way of alleviating the
-information overload problem by selecting personalized items for different
-users. Latent factors based collaborative filtering (CF) has become a popular
-approach for RSs due to its accuracy and scalability. Recently, online social
-networks and user-generated content provide diverse sources for recommendation
-beyond ratings. Although {\em social matrix factorization} (Social MF) and {\em
-topic matrix factorization} (Topic MF) successfully exploit social relations
-and item reviews, respectively, both of them ignore some useful information. In
-this paper, we investigate the effective data fusion by combining the
-aforementioned approaches. First, we propose a novel model {\em \mbox{MR3}} to
-jointly model three sources of information (i.e., ratings, item reviews, and
-social relations) effectively for rating prediction by aligning the latent
-factors and hidden topics. Second, we incorporate the implicit feedback from
-ratings into the proposed model to enhance its capability and to demonstrate
-its flexibility.
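The Gaussian biasing described in the self-attention abstract above (1803.09519) keeps attention local by penalizing distant frame pairs. A numpy sketch of one such attention head follows; sigma controls the effective context window, and the random "acoustic" features are placeholders.

```python
# Self-attention with a Gaussian distance penalty on the scores.
import numpy as np

def gaussian_self_attention(X, sigma=2.0):
    """X: (T, d) sequence. Returns (T, d) attended outputs."""
    T, d = X.shape
    scores = X @ X.T / np.sqrt(d)               # pairwise similarity
    idx = np.arange(T)
    dist2 = (idx[:, None] - idx[None, :]) ** 2
    scores = scores - dist2 / (2.0 * sigma**2)  # penalize distant frames
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)           # softmax over key positions
    return w @ X

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 8))
print(gaussian_self_attention(frames, sigma=1.5).shape)  # (10, 8)
```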
We achieve more accurate rating prediction on real-life
-datasets compared with various state-of-the-art methods. Furthermore, we
-measure the contribution from each of the three data sources and the impact of
-implicit feedback from ratings, followed by a sensitivity analysis of
-hyperparameters. Empirical studies demonstrate the effectiveness and efficacy
-of our proposed model and its extension.
-"
-6981,1803.09578,"Nils Reimers, Iryna Gurevych","Why Comparing Single Performance Scores Does Not Allow to Draw
- Conclusions About Machine Learning Approaches",cs.LG cs.AI cs.CL stat.ML," Developing state-of-the-art approaches for specific tasks is a major driving
-force in our research community. Depending on the prestige of the task,
-publishing it can come along with a lot of visibility. This raises the question
-of how reliable our evaluation methodologies are for comparing approaches.
- One common methodology to identify the state-of-the-art is to partition data
-into a train, a development and a test set. Researchers can train and tune
-their approach on some part of the dataset and then select the model that
-worked best on the development set for a final evaluation on unseen test data.
-Test scores from different approaches are compared, and performance differences
-are tested for statistical significance.
- In this publication, we show that there is a high risk that statistical
-significance in this type of evaluation is not due to a superior learning
-approach; instead, the difference may well be due to chance. For example, for
-the CoNLL 2003 NER dataset we observed type I errors (false positives) in up to
-26% of the cases with a threshold of p < 0.05, i.e., falsely concluding a
-statistically significant difference between two identical approaches.
- We prove that this evaluation setup is unsuitable for comparing learning
-approaches. We formalize alternative evaluation setups based on score
-distributions.
-"
-6982,1803.09641,Deepak P,Unsupervised Separation of Transliterable and Native Words for Malayalam,cs.CL," Differentiating intrinsic language words from transliterable words is a key
-step aiding text processing tasks involving different natural languages. We
-consider the problem of unsupervised separation of transliterable words from
-native words for text in Malayalam language. Outlining a key observation on the
-diversity of characters beyond the word stem, we develop an optimization method
-to score words based on their nativeness. Our method relies on the usage of
-probability distributions over character n-grams that are refined in step with
-the nativeness scorings in an iterative optimization formulation. Using an
-empirical evaluation, we illustrate that our method, DTIM, provides significant
-improvements in nativeness scoring for Malayalam, establishing DTIM as the
-preferred method for the task.
-"
-6983,1803.09720,Simon \v{S}uster and Walter Daelemans,"CliCR: A Dataset of Clinical Case Reports for Machine Reading
- Comprehension",cs.CL," We present a new dataset for machine comprehension in the medical domain. Our
-dataset uses clinical case reports with around 100,000 gap-filling queries
-about these cases. We apply several baselines and state-of-the-art neural
-readers to the dataset, and observe a considerable gap in performance (20% F1)
-between the best human and machine readers. We analyze the skills required for
-successful answering and show how reader performance varies depending on the
-applicable skills.
We find that inferences using domain knowledge and object -tracking are the most frequently required skills, and that recognizing omitted -information and spatio-temporal reasoning are the most difficult for the -machines. -" -6984,1803.09745,"Tyler J. Gray, Andrew J. Reagan, Peter Sheridan Dodds, and Christopher - M. Danforth",English verb regularization in books and tweets,cs.CL physics.soc-ph," The English language has evolved dramatically throughout its lifespan, to the -extent that a modern speaker of Old English would be incomprehensible without -translation. One concrete indicator of this process is the movement from -irregular to regular (-ed) forms for the past tense of verbs. In this study we -quantify the extent of verb regularization using two vastly disparate datasets: -(1) Six years of published books scanned by Google (2003--2008), and (2) A -decade of social media messages posted to Twitter (2008--2017). We find that -the extent of verb regularization is greater on Twitter, taken as a whole, than -in English Fiction books. Regularization is also greater for tweets geotagged -in the United States relative to American English books, but the opposite is -true for tweets geotagged in the United Kingdom relative to British English -books. We also find interesting regional variations in regularization across -counties in the United States. However, once differences in population are -accounted for, we do not identify strong correlations with socio-demographic -variables such as education or income. -" -6985,1803.09816,"Deblin Bagchi, Peter Plantinga, Adam Stiff and Eric Fosler-Lussier",Spectral feature mapping with mimic loss for robust speech recognition,cs.SD cs.CL eess.AS," For the task of speech enhancement, local learning objectives are agnostic to -phonetic structures helpful for speech recognition. We propose to add a global -criterion to ensure de-noised speech is useful for downstream tasks like ASR. -We first train a spectral classifier on clean speech to predict senone labels. -Then, the spectral classifier is joined with our speech enhancer as a noisy -speech recognizer. This model is taught to imitate the output of the spectral -classifier alone on clean speech. This \textit{mimic loss} is combined with the -traditional local criterion to train the speech enhancer to produce de-noised -speech. Feeding the de-noised speech to an off-the-shelf Kaldi training recipe -for the CHiME-2 corpus shows significant improvements in WER. -" -6986,1803.09832,"Andrew Ortegaray, Robert C. Berwick, Matilde Marcolli",Heat Kernel analysis of Syntactic Structures,cs.CL," We consider two different data sets of syntactic parameters and we discuss -how to detect relations between parameters through a heat kernel method -developed by Belkin-Niyogi, which produces low dimensional representations of -the data, based on Laplace eigenfunctions, that preserve neighborhood -information. We analyze the different connectivity and clustering structures -that arise in the two datasets, and the regions of maximal variance in the -two-parameter space of the Belkin-Niyogi construction, which identify -preferable choices of independent variables. We compute clustering coefficients -and their variance. -" -6987,1803.09840,"Luigi Asprino, Valerio Basile, Paolo Ciancarini, Valentina Presutti",Empirical Analysis of Foundational Distinctions in Linked Open Data,cs.AI cs.CL," The Web and its Semantic extension (i.e. 
Linked Open Data) contain open -global-scale knowledge and make it available to potentially intelligent -machines that want to benefit from it. Nevertheless, most of Linked Open Data -lack ontological distinctions and have sparse axiomatisation. For example, -distinctions such as whether an entity is inherently a class or an individual, -or whether it is a physical object or not, are hardly expressed in the data, -although they have been largely studied and formalised by foundational -ontologies (e.g. DOLCE, SUMO). These distinctions belong to common sense too, -which is relevant for many artificial intelligence tasks such as natural -language understanding, scene recognition, and the like. There is a gap between -foundational ontologies, that often formalise or are inspired by pre-existing -philosophical theories and are developed with a top-down approach, and Linked -Open Data that mostly derive from existing databases or crowd-based effort -(e.g. DBpedia, Wikidata). We investigate whether machines can learn -foundational distinctions over Linked Open Data entities, and if they match -common sense. We want to answer questions such as ""does the DBpedia entity for -dog refer to a class or to an instance?"". We report on a set of experiments -based on machine learning and crowdsourcing that show promising results. -" -6988,1803.09845,"Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh",Neural Baby Talk,cs.CV cs.CL," We introduce a novel framework for image captioning that can produce natural -language explicitly grounded in entities that object detectors find in the -image. Our approach reconciles classical slot filling approaches (that are -generally better grounded in images) with modern neural captioning approaches -(that are generally more natural sounding and accurate). Our approach first -generates a sentence `template' with slot locations explicitly tied to specific -image regions. These slots are then filled in by visual concepts identified in -the regions by object detectors. The entire architecture (sentence template -generation and slot filling with object detectors) is end-to-end -differentiable. We verify the effectiveness of our proposed model on different -image captioning tasks. On standard image captioning and novel object -captioning, our model reaches state-of-the-art on both COCO and Flickr30k -datasets. We also demonstrate that our model has unique advantages when the -train and test distributions of scene compositions -- and hence language priors -of associated captions -- are different. Code has been made available at: -https://github.com/jiasenlu/NeuralBabyTalk -" -6989,1803.09901,Nicholas Dingwall and Christopher Potts,"Mittens: An Extension of GloVe for Learning Domain-Specialized - Representations",cs.CL," We present a simple extension of the GloVe representation learning model that -begins with general-purpose representations and updates them based on data from -a specialized domain. We show that the resulting representations can lead to -faster learning and better results on a variety of tasks. -" -6990,1803.10132,"Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie","Investigating Generative Adversarial Networks based Speech - Dereverberation for Robust Speech Recognition",cs.SD cs.CL eess.AS," We investigate the use of generative adversarial networks (GANs) in speech -dereverberation for robust speech recognition. 
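The Mittens abstract above (1803.09901) extends GloVe by starting from general-purpose vectors and updating them on domain data. One common formalization of that idea adds a penalty on drift from the pretrained vectors to the GloVe objective; the numpy sketch below implements that combined loss with gradient descent. Bias terms are omitted for brevity (an assumption; the full GloVe model includes them), and the weighting function and mu are illustrative defaults.

```python
# GloVe-style fit on a domain co-occurrence matrix + retrofitting penalty.
import numpy as np

rng = np.random.default_rng(0)
V, d, mu, lr = 5, 16, 0.1, 0.05
X = rng.poisson(3.0, size=(V, V)) + 1             # toy co-occurrence counts
pretrained = rng.normal(scale=0.1, size=(V, d))   # general-purpose vectors

W = pretrained.copy()                   # word vectors, start at pretrained
C = rng.normal(scale=0.1, size=(V, d))  # context vectors

f = np.minimum((X / 100.0) ** 0.75, 1.0)          # GloVe weighting function
logX = np.log(X)

for step in range(200):
    err = W @ C.T - logX                           # (V, V) residuals
    gW = 2 * (f * err) @ C + 2 * mu * (W - pretrained)  # GloVe + drift term
    gC = 2 * (f * err).T @ W
    W -= lr * gW
    C -= lr * gC

print("fit:", float((f * (W @ C.T - logX) ** 2).sum()),
      "drift:", float(((W - pretrained) ** 2).sum()))
```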
GANs have recently been studied
-for speech enhancement to remove additive noise, but no work has yet examined
-their ability in speech dereverberation, and the advantages of using GANs have
-not been fully established. In this paper, we provide a deep investigation into
-the use of a GAN-based dereverberation front-end in ASR. First, we study the
-effectiveness of different dereverberation networks (the generator in the GAN)
-and find that an LSTM leads to a significant improvement compared with a
-feed-forward DNN and a CNN on our dataset. Second, further adding residual
-connections in the deep LSTMs can boost the performance as well. Finally, we
-find that, for the success of the GAN, it is important to update the generator
-and the discriminator using the same mini-batch data during training. Moreover,
-using the reverberant spectrogram as a condition for the discriminator, as
-suggested in previous studies, may degrade the performance. In summary, our
-GAN-based dereverberation front-end achieves a 14%-19% relative CER reduction
-as compared to the baseline DNN dereverberation network when tested on a strong
-multi-condition training acoustic model.
-"
-6991,1803.10146,"Ke Wang, Junbo Zhang, Yujun Wang, Lei Xie",Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model,cs.SD cs.CL eess.AS," Speaker adaptation aims to estimate a speaker-specific acoustic model from a
-speaker-independent one to minimize the mismatch between the training and
-testing conditions arising from speaker variability. A variety of neural
-network adaptation methods have been proposed since deep learning models became
-the mainstream. However, an experimental comparison between different methods
-is still lacking, especially now that DNN-based acoustic models have advanced
-greatly. In this paper, we aim to close this gap by providing an empirical
-evaluation of three typical speaker adaptation methods: LIN, LHUC and KLD.
-Adaptation experiments, with different sizes of adaptation data, are conducted
-on a strong TDNN-LSTM acoustic model. More challengingly, the source and target
-we are concerned with here are a standard Mandarin speaker model and an
-accented Mandarin speaker model. We compare the performance of different
-methods and their combinations. Speaker adaptation performance is also examined
-by the speaker's degree of accent.
-"
-6992,1803.10299,"Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner and Shinji
- Watanabe",Multi-Modal Data Augmentation for End-to-End ASR,cs.CL cs.SD eess.AS," We present a new end-to-end architecture for automatic speech recognition
-(ASR) that can be trained using \emph{symbolic} input in addition to the
-traditional acoustic input. This architecture utilizes two separate encoders:
-one for acoustic input and another for symbolic input, both sharing the
-attention and decoder parameters. We call this architecture a multi-modal data
-augmentation network (MMDA), as it can support multi-modal (acoustic and
-symbolic) input and enables seamless mixing of large text datasets with
-significantly smaller transcribed speech corpora during training. We study
-different ways of transforming large text corpora into a symbolic form suitable
-for training our MMDA network. Our best MMDA setup obtains small improvements
-on character error rate (CER), and as much as 7-10\% relative word error rate
-(WER) improvement over a baseline both with and without an external language
-model.
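One of the three adaptation methods compared above (LHUC, in 1803.10146) is simple enough to sketch: each speaker gets a small vector that rescales a layer's hidden units by 2*sigmoid(a), and only that vector is trained on the adaptation data. The sketch below uses illustrative layer sizes and a toy senone inventory, not the paper's TDNN-LSTM setup.

```python
# LHUC: per-speaker rescaling of hidden units in a frozen SI model.
import torch
import torch.nn as nn

class LHUC(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        # One scaling parameter per hidden unit; zero init -> scale 1.0.
        self.a = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, h):
        return h * 2.0 * torch.sigmoid(self.a)

hidden = nn.Sequential(nn.Linear(40, 256), nn.ReLU())  # speaker-independent
lhuc = LHUC(256)                                       # speaker-dependent
output = nn.Linear(256, 30)                            # toy senone scores

for p in list(hidden.parameters()) + list(output.parameters()):
    p.requires_grad_(False)            # freeze SI model; adapt LHUC only

opt = torch.optim.SGD(lhuc.parameters(), lr=0.1)
feats, targets = torch.randn(8, 40), torch.randint(0, 30, (8,))
loss = nn.functional.cross_entropy(output(lhuc(hidden(feats))), targets)
loss.backward()
opt.step()
print(float(loss))
```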
-" -6993,1803.10357,"Asli Celikyilmaz, Antoine Bosselut, Xiaodong He, Yejin Choi",Deep Communicating Agents for Abstractive Summarization,cs.CL," We present deep communicating agents in an encoder-decoder architecture to -address the challenges of representing a long document for abstractive -summarization. With deep communicating agents, the task of encoding a long text -is divided across multiple collaborating agents, each in charge of a subsection -of the input text. These encoders are connected to a single decoder, trained -end-to-end using reinforcement learning to generate a focused and coherent -summary. Empirical results demonstrate that multiple communicating encoders -lead to a higher quality summary compared to several strong baselines, -including those based on a single encoder or multiple non-communicating -encoders. -" -6994,1803.10384,Yuan Gong and Christian Poellabauer,Topic Modeling Based Multi-modal Depression Detection,cs.CL cs.IR cs.LG cs.SD eess.AS," Major depressive disorder is a common mental disorder that affects almost 7% -of the adult U.S. population. The 2017 Audio/Visual Emotion Challenge (AVEC) -asks participants to build a model to predict depression levels based on the -audio, video, and text of an interview ranging between 7-33 minutes. Since -averaging features over the entire interview will lose most temporal -information, how to discover, capture, and preserve useful temporal details for -such a long interview are significant challenges. Therefore, we propose a novel -topic modeling based approach to perform context-aware analysis of the -recording. Our experiments show that the proposed approach outperforms -context-unaware methods and the challenge baselines for all metrics. -" -6995,1803.10421,Daniyar Itegulov and Ekaterina Lebedeva,Handling Verb Phrase Anaphora with Dependent Types and Events,cs.CL," This paper studies how dependent typed events can be used to treat verb -phrase anaphora. We introduce a framework that extends Dependent Type Semantics -(DTS) with a new atomic type for neo-Davidsonian events and an extended -@-operator that can return new events that share properties of events -referenced by verb phrase anaphora. The proposed framework, along with -illustrative examples of its use, are presented after a brief overview of the -necessary background and of the major challenges posed by verb phrase anaphora. -" -6996,1803.10525,"Andros Tjandra, Sakriani Sakti, Satoshi Nakamura",Machine Speech Chain with One-shot Speaker Adaptation,cs.CL cs.LG cs.SD eess.AS," In previous work, we developed a closed-loop speech chain model based on deep -learning, in which the architecture enabled the automatic speech recognition -(ASR) and text-to-speech synthesis (TTS) components to mutually improve their -performance. This was accomplished by the two parts teaching each other using -both labeled and unlabeled data. This approach could significantly improve -model performance within a single-speaker speech dataset, but only a slight -increase could be gained in multi-speaker tasks. Furthermore, the model is -still unable to handle unseen speakers. In this paper, we present a new speech -chain mechanism by integrating a speaker recognition model inside the loop. We -also propose extending the capability of TTS to handle unseen speakers by -implementing one-shot speaker adaptation. This enables TTS to mimic voice -characteristics from one speaker to another with only a one-shot speaker -sample, even from a text without any speaker information. 
In the speech chain
-loop mechanism, ASR also benefits from the ability to further learn an
-arbitrary speaker's characteristics from the generated speech waveform,
-resulting in a significant improvement in the recognition rate.
-"
-6997,1803.10547,"Nurendra Choudhary, Rajat Singh, Ishita Bindlish and Manish
- Shrivastava",Neural Network Architecture for Credibility Assessment of Textual Claims,cs.CL," Text articles with false claims, especially news, have recently become a
-serious problem for Internet users. These articles are in wide circulation and
-readers face difficulty discerning fact from fiction. Previous work on
-credibility assessment has focused on factual analysis and linguistic features.
-The task's main challenge is the distinction between the features of true and
-false articles. In this paper, we propose a novel approach called Credibility
-Outcome (CREDO) which aims at scoring the credibility of an article in an open
-domain setting.
- CREDO consists of different modules for capturing various features
-responsible for the credibility of an article. These features include the
-credibility of the article's source and author, the semantic similarity between
-the article and related credible articles retrieved from a knowledge base, and
-the sentiments conveyed by the article. A neural network architecture learns
-the contribution of each of these modules to the overall credibility of an
-article. Experiments on the Snopes dataset reveal that CREDO outperforms the
-state-of-the-art approaches based on linguistic features.
-"
-6998,1803.10631,"Thomas Wolf, Julien Chaumond, Clement Delangue",Meta-Learning a Dynamical Language Model,cs.CL," We consider the task of word-level language modeling and study the
-possibility of combining hidden-states-based short-term representations with
-medium-term representations encoded in dynamical weights of a language model.
-Our work extends recent experiments on language models with dynamically
-evolving weights by casting the language modeling problem into an online
-learning-to-learn framework in which a meta-learner is trained by
-gradient-descent to continuously update a language model's weights.
-"
-6999,1803.10916,"Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie",Attention-based End-to-End Models for Small-Footprint Keyword Spotting,cs.SD cs.CL eess.AS," In this paper, we propose an attention-based end-to-end neural approach for
-small-footprint keyword spotting (KWS), which aims to simplify the pipelines of
-building a production-quality KWS system. Our model consists of an encoder and
-an attention mechanism. The encoder transforms the input signal into a high
-level representation using RNNs. Then the attention mechanism weights the
-encoder features and generates a fixed-length vector. Finally, by linear
-transformation and softmax function, the vector becomes a score used for
-keyword detection. We also evaluate the performance of different encoder
-architectures, including LSTM, GRU and CRNN. Experiments on real-world wake-up
-data show that our approach outperforms the recent Deep KWS approach by a large
-margin and the best performance is achieved by CRNN. To be more specific, with
-~84K parameters, our attention-based model achieves a 1.02% false rejection
-rate (FRR) at 1.0 false alarm (FA) per hour.
-"
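The keyword-spotting abstract above (1803.10916) spells out its architecture: an RNN encoder, an attention mechanism that collapses the frames into one fixed-length vector, and a linear layer plus softmax for the keyword score. A minimal PyTorch sketch of that pipeline follows; all dimensions are illustrative assumptions.

```python
# Encoder -> attention pooling -> linear + softmax keyword score.
import torch
import torch.nn as nn

class AttentionKWS(nn.Module):
    def __init__(self, n_mels=40, hidden=64, n_classes=2):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.att = nn.Linear(hidden, 1)           # one weight per frame
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):                         # x: (batch, frames, n_mels)
        h, _ = self.encoder(x)                    # (batch, frames, hidden)
        alpha = torch.softmax(self.att(h), dim=1) # attention over frames
        summary = (alpha * h).sum(dim=1)          # fixed-length vector
        return torch.log_softmax(self.out(summary), dim=-1)

model = AttentionKWS()
log_probs = model(torch.randn(4, 100, 40))       # 4 clips of 100 frames
print(log_probs.shape)                            # (4, 2)
```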
-7000,1803.10952,"Yi-Chen Chen, Chia-Hao Shen, Sung-Feng Huang, Hung-yi Lee","Towards Unsupervised Automatic Speech Recognition Trained by Unaligned
- Speech and Text only",cs.CL," Automatic speech recognition (ASR) has been widely researched with supervised
-approaches, while many low-resourced languages lack audio-text aligned data,
-and supervised methods cannot be applied to them.
- In this work, we propose a framework to achieve unsupervised ASR on a read
-English speech dataset, where audio and text are unaligned. In the first stage,
-each word-level audio segment in the utterances is represented by a vector
-representation extracted by a sequence-to-sequence autoencoder, in which
-phonetic information and speaker information are disentangled.
- Secondly, semantic embeddings of audio segments are trained from the vector
-representations using a skip-gram model. Last but not least, an unsupervised
-method is utilized to transform semantic embeddings of audio segments to the
-text embedding space, and finally the transformed embeddings are mapped to
-words.
- With the above framework, we move towards unsupervised ASR trained on
-unaligned text and speech only.
-"
-7001,1803.11045,Ryan Wesslen,"Computer-Assisted Text Analysis for Social Science: Topic Models and
- Beyond",cs.CL," Topic models are a family of statistics-based algorithms to summarize,
-explore and index large collections of text documents. After a decade of
-research led by computer scientists, topic models have spread to social science
-as a new generation of data-driven social scientists have searched for tools to
-explore large collections of unstructured text. Recently, social scientists
-have contributed to topic model literature with developments in causal
-inference and tools for handling the problem of multi-modality. In this paper,
-I provide a literature review on the evolution of topic modeling including
-extensions for document covariates, methods for evaluation and interpretation,
-and advances in interactive visualizations along with each aspect's relevance
-and application for social science research.
-"
-7002,1803.11070,"Piji Li, Lidong Bing, Wai Lam",Actor-Critic based Training Framework for Abstractive Summarization,cs.CL cs.AI," We present a training framework for neural abstractive summarization based on
-actor-critic approaches from reinforcement learning. In the traditional neural
-network based methods, the objective is only to maximize the likelihood of the
-predicted summaries; no other assessment constraints are considered, which may
-generate low-quality summaries or even incorrect sentences. To alleviate this
-problem, we employ an actor-critic framework to enhance the training procedure.
-For the actor, we employ the typical attention based sequence-to-sequence
-(seq2seq) framework as the policy network for summary generation. For the
-critic, we combine the maximum likelihood estimator with a well designed global
-summary quality estimator which is a neural network based binary classifier
-aiming to make the generated summaries indistinguishable from the human-written
-ones. The policy gradient method is used for parameter learning. An
-alternating training strategy is proposed to conduct the joint training of the
-actor and critic models. Extensive experiments on some benchmark datasets in
-different languages show that our framework achieves improvements over the
-state-of-the-art methods.
-"
-7003,1803.11112,"Yogarshi Vyas, Xing Niu, Marine Carpuat",Identifying Semantic Divergences in Parallel Text without Annotations,cs.CL," Recognizing that even correct translations are not always semantically
-equivalent, we automatically detect meaning divergences in parallel sentence
-pairs with a deep neural model of bilingual semantic similarity which can be
-trained for any parallel corpus without any manual annotation. We show that our
-semantic model detects divergences more accurately than models based on surface
-features derived from word alignments, and that these divergences matter for
-neural machine translation.
-"
-7004,1803.11138,"Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, Marco
- Baroni",Colorless green recurrent networks dream hierarchically,cs.CL," Recurrent neural networks (RNNs) have achieved impressive results in a
-variety of linguistic processing tasks, suggesting that they can induce
-non-trivial properties of language. We investigate here to what extent RNNs
-learn to track abstract hierarchical syntactic structure. We test whether RNNs
-trained with a generic language modeling objective in four languages (Italian,
-English, Hebrew, Russian) can predict long-distance number agreement in various
-constructions. We include in our evaluation nonsensical sentences where RNNs
-cannot rely on semantic or lexical cues (""The colorless green ideas I ate with
-the chair sleep furiously""), and, for Italian, we compare model performance to
-human intuitions. Our language-model-trained RNNs make reliable predictions
-about long-distance agreement, and do not lag much behind human performance. We
-thus bring support to the hypothesis that RNNs are not just shallow-pattern
-extractors, but they also acquire deeper grammatical competence.
-"
-7005,1803.11175,"Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco,
- Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris
- Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil",Universal Sentence Encoder,cs.CL," We present models for encoding sentences into embedding vectors that
-specifically target transfer learning to other NLP tasks. The models are
-efficient and result in accurate performance on diverse transfer tasks. Two
-variants of the encoding models allow for trade-offs between accuracy and
-compute resources. For both variants, we investigate and report the
-relationship between model complexity, resource consumption, the availability
-of transfer task training data, and task performance. Comparisons are made with
-baselines that use word level transfer learning via pretrained word embeddings
-as well as baselines that do not use any transfer learning. We find that
-transfer learning using sentence embeddings tends to outperform word level
-transfer. With transfer learning via sentence embeddings, we observe
-surprisingly good performance with minimal amounts of supervised training data
-for a transfer task. We obtain encouraging results on Word Embedding
-Association Tests (WEAT) targeted at detecting model bias. Our pre-trained
-sentence encoding models are made freely available for download and on TF Hub.
-"
-7006,1803.11186,"Unnat Jain, Svetlana Lazebnik, Alexander Schwing","Two can play this Game: Visual Dialog with Discriminative Question
- Generation and Answering",cs.CV cs.CL," Human conversation is a complex mechanism with subtle nuances. It is hence an
-ambitious goal to develop artificial intelligence agents that can participate
-fluently in a conversation.
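Since the Universal Sentence Encoder models above (1803.11175) are published on TF Hub, transfer to a downstream task starts with a few lines like these. The URL points at the public v4 module; loading it downloads several hundred megabytes of weights on first use.

```python
# Load the published USE module and compare sentence vectors.
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
sentences = ["How old are you?", "What is your age?", "The sky is blue."]
vecs = np.asarray(embed(sentences))   # (3, 512) sentence embeddings

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Paraphrases score much higher than unrelated sentences.
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```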
While we are still far from achieving this goal,
-recent progress in visual question answering, image captioning, and visual
-question generation shows that dialog systems may be realizable in the not too
-distant future. To this end, a novel dataset was introduced recently and
-encouraging results were demonstrated, particularly for question answering. In
-this paper, we demonstrate a simple symmetric discriminative baseline that can
-be applied both to predicting an answer and to predicting a question. We show
-that this method performs on par with the state of the art, even with memory
-net based methods. In addition, for the first time on the visual dialog
-dataset, we assess the performance of a system asking questions, and
-demonstrate how visual dialog can be generated from discriminative question
-generation and question answering.
-"
-7007,1803.11284,"Bodhisattwa Prasad Majumder, Aditya Subramanian, Abhinandan Krishnan,
- Shreyansh Gandhi, Ajinkya More","Deep Recurrent Neural Networks for Product Attribute Extraction in
- eCommerce",cs.CL cs.AI," Extracting accurate attribute qualities from product titles is a vital
-component in providing eCommerce customers with a rewarding online shopping
-experience via an enriched faceted search. We demonstrate the potential of Deep
-Recurrent Networks in this domain, primarily models such as Bidirectional LSTMs
-and Bidirectional LSTM-CRF, with or without an attention mechanism. These have
-improved overall F1 scores compared to the previous benchmarks (More et al.) by
-at least 0.0391, showcasing an overall precision of 97.94%, a recall of 94.12%
-and an F1 score of 0.9599. This has allowed us to achieve significant coverage
-of important facets or attributes of products, which not only shows the
-efficacy of deep recurrent models over previous machine learning benchmarks but
-also greatly enhances the overall customer experience while shopping online.
-"
-7008,1803.11291,"Shyam Upadhyay, Yogarshi Vyas, Marine Carpuat, Dan Roth",Robust Cross-lingual Hypernymy Detection using Dependency Context,cs.CL," Cross-lingual Hypernymy Detection involves determining if a word in one
-language (""fruit"") is a hypernym of a word in another language (""pomme"" i.e.
-apple in French). The ability to detect hypernymy cross-lingually can aid in
-solving cross-lingual versions of tasks such as textual entailment and event
-coreference. We propose BISPARSE-DEP, a family of unsupervised approaches for
-cross-lingual hypernymy detection, which learns sparse, bilingual word
-embeddings based on dependency contexts. We show that BISPARSE-DEP can
-significantly improve performance on this task, compared to approaches based
-only on lexical context. Our approach is also robust, showing promise for
-low-resource settings: our dependency-based embeddings can be learned using a
-parser trained on related languages, with negligible loss in performance. We
-also crowd-source a challenging dataset for this task in four languages --
-Russian, French, Arabic, and Chinese. Our embeddings and datasets are publicly
-available.
-"
-7009,1803.11326,"Yu Gong, Xusheng Luo, Yu Zhu, Wenwu Ou, Zhao Li, Muhua Zhu, Kenny Q.
- Zhu, Lu Duan, Xi Chen","Deep Cascade Multi-task Learning for Slot Filling in Online Shopping
- Assistant",cs.CL," Slot filling is a critical task in natural language understanding (NLU) for
-dialog systems. State-of-the-art approaches treat it as a sequence labeling
-problem and adopt such models as BiLSTM-CRF. 
While these models work relatively
-well on standard benchmark datasets, they face challenges in the context of
-E-commerce, where the slot labels are more informative and carry richer
-expressions. In this work, inspired by the unique structure of the E-commerce
-knowledge base, we propose a novel multi-task model with cascade and residual
-connections, which jointly learns segment tagging, named entity tagging and
-slot filling. Experiments show the effectiveness of the proposed cascade and
-residual structures. Our model has a 14.6% advantage in F1 score over the
-strong baseline methods on a new Chinese E-commerce shopping assistant dataset,
-while achieving competitive accuracies on a standard dataset. Furthermore, an
-online test deployed on this dominant E-commerce platform shows a 130%
-improvement in the accuracy of understanding user utterances. Our model has
-already gone into production on the E-commerce platform.
-"
-7010,1803.11359,"Yu Gong, Xusheng Luo, Kenny Q. Zhu, Wenwu Ou, Zhao Li, Lu Duan",Automatic Generation of Chinese Short Product Titles for Mobile Display,cs.CL," This paper studies the problem of automatically extracting a short title from
-a manually written longer description of E-commerce products for display on
-mobile devices. It is a new extractive summarization problem on short text
-inputs, for which we propose a feature-enriched network model, combining three
-different categories of features in parallel. Experimental results show that
-our framework significantly outperforms several baselines by a substantial gain
-of 4.5%. Moreover, we produce an extractive summarization dataset for
-E-commerce short texts and will release it to the research community.
-"
-7011,1803.11407,"Heeyoul Choi, Kyunghyun Cho and Yoshua Bengio",Fine-Grained Attention Mechanism for Neural Machine Translation,cs.CL," Neural machine translation (NMT) has become a new paradigm in machine
-translation, and the attention mechanism has become the dominant approach, with
-state-of-the-art records in many language pairs. While there are variants of
-the attention mechanism, all of them use only temporal attention, where one
-scalar value is assigned to one context vector corresponding to a source word.
-In this paper, we propose a fine-grained (or 2D) attention mechanism in which
-each dimension of a context vector receives a separate attention score. In
-experiments on En-De and En-Fi translation, the fine-grained attention method
-improves the translation quality in terms of BLEU score. In addition, our
-alignment analysis reveals how the fine-grained attention mechanism exploits
-the internal structure of context vectors.
-"
-7012,1803.11506,"Egor Lakomkin, Cornelius Weber, Stefan Wermter","Automatically augmenting an emotion dataset improves classification
- using audio",cs.CL," In this work, we tackle the problem of speech emotion classification. One of
-the issues in the area of affective computing is that the amount of annotated
-data is very limited. On the other hand, the number of ways that the same
-emotion can be expressed verbally is enormous due to variability between
-speakers. This is one of the factors that limits performance and
-generalization. We propose a simple method that extracts audio samples from
-movies using textual sentiment analysis. As a result, it is possible to
-automatically construct a larger dataset of audio samples with positive,
-negative, and neutral emotional speech. 
We show that pretraining a recurrent
-neural network on such a dataset yields better results on the challenging
-EmotiW corpus. This experiment shows a potential benefit of combining textual
-sentiment analysis with vocal information.
-"
-7013,1803.11508,"Egor Lakomkin, Cornelius Weber, Sven Magg, Stefan Wermter",Reusing Neural Speech Representations for Auditory Emotion Recognition,cs.CL," Acoustic emotion recognition aims to categorize the affective state of the
-speaker and is still a difficult task for machine learning models. The
-difficulties come from the scarcity of training data, general subjectivity in
-emotion perception resulting in low annotator agreement, and the uncertainty
-about which features are the most relevant and robust ones for classification.
-In this paper, we tackle the latter problem. Inspired by the recent success of
-transfer learning methods, we propose a set of architectures which utilize
-neural representations inferred by training on large speech databases for the
-acoustic emotion recognition task. Our experiments on the IEMOCAP dataset show
-~10% relative improvements in accuracy and F1-score over the baseline recurrent
-neural network, which is trained end-to-end for emotion recognition.
-"
-7014,1803.11509,"Egor Lakomkin, Chandrakant Bothe, Stefan Wermter","GradAscent at EmoInt-2017: Character- and Word-Level Recurrent Neural
- Network Models for Tweet Emotion Intensity Detection",cs.CL," The WASSA 2017 EmoInt shared task has the goal of predicting emotion
-intensity values of tweet messages. Given the text of a tweet and its emotion
-category (anger, joy, fear, and sadness), the participants were asked to build
-a system that assigns emotion intensity values. Emotion intensity estimation is
-a challenging problem given the short length of the tweets, the noisy structure
-of the text and the lack of annotated data. To solve this problem, we developed
-an ensemble of two neural models, processing input on the character and word
-level, combined with a lexicon-driven system. The correlation scores across all
-four emotions are averaged to determine the bottom-line competition metric, and
-our system ranks fourth in the full intensity range and third in the 0.5-1
-intensity range among 23 systems at the time of writing (June 2017).
-"
-7015,1804.00015,"Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro
- Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew
- Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai",ESPnet: End-to-End Speech Processing Toolkit,cs.CL," This paper introduces a new open source platform for end-to-end speech
-processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech
-recognition (ASR), and adopts widely-used dynamic neural network toolkits,
-Chainer and PyTorch, as its main deep learning engines. ESPnet also follows the
-Kaldi ASR toolkit style for data processing, feature extraction/format, and
-recipes to provide a complete setup for speech recognition and other speech
-processing experiments. This paper explains the major architecture of this
-software platform, several important functionalities that differentiate ESPnet
-from other open source ASR toolkits, and experimental results on major ASR
-benchmarks.
-"
-7016,1804.00047,"Albert Haque, Michelle Guo, Prateek Verma",Conditional End-to-End Audio Transforms,cs.SD cs.CL cs.LG eess.AS," We present an end-to-end method for transforming audio from one style to
-another. 
For the case of speech, by conditioning on speaker identities, we can -train a single model to transform words spoken by multiple people into multiple -target voices. For the case of music, we can specify musical instruments and -achieve the same result. Architecturally, our method is a fully-differentiable -sequence-to-sequence model based on convolutional and hierarchical recurrent -neural networks. It is designed to capture long-term acoustic dependencies, -requires minimal post-processing, and produces realistic audio transforms. -Ablation studies confirm that our model can separate speaker and instrument -properties from acoustic content at different receptive fields. Empirically, -our method achieves competitive performance on community-standard datasets. -" -7017,1804.00065,"Yohan Jo, Shivani Poddar, Byungsoo Jeon, Qinlan Shen, Carolyn P. Rose, - Graham Neubig",Attentive Interaction Model: Modeling Changes in View in Argumentation,cs.CL cs.CY," We present a neural architecture for modeling argumentative dialogue that -explicitly models the interplay between an Opinion Holder's (OH's) reasoning -and a challenger's argument, with the goal of predicting if the argument -successfully changes the OH's view. The model has two components: (1) -vulnerable region detection, an attention model that identifies parts of the -OH's reasoning that are amenable to change, and (2) interaction encoding, which -identifies the relationship between the content of the OH's reasoning and that -of the challenger's argument. Based on evaluation on discussions from the -Change My View forum on Reddit, the two components work together to predict an -OH's change in view, outperforming several baselines. A posthoc analysis -suggests that sentences picked out by the attention model are addressed more -frequently by successful arguments than by unsuccessful ones. -" -7018,1804.00079,"Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J Pal","Learning General Purpose Distributed Sentence Representations via Large - Scale Multi-task Learning",cs.CL," A lot of the recent success in natural language processing (NLP) has been -driven by distributed vector representations of words trained on large amounts -of text in an unsupervised manner. These representations are typically used as -general purpose features for words across a range of NLP problems. However, -extending this success to learning representations of sequences of words, such -as sentences, remains an open problem. Recent work has explored unsupervised as -well as supervised learning techniques with different training objectives to -learn general purpose fixed-length sentence representations. In this work, we -present a simple, effective multi-task learning framework for sentence -representations that combines the inductive biases of diverse training -objectives in a single model. We train this model on several data sources with -multiple training objectives on over 100 million sentences. Extensive -experiments demonstrate that sharing a single recurrent sentence encoder across -weakly related tasks leads to consistent improvements over previous methods. We -present substantial improvements in the context of transfer learning and -low-resource settings using our learned general-purpose representations. 
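[Editor's note: the multi-task abstract just above rests on one mechanism worth making concrete: a single shared encoder whose gradients accumulate from several task heads as training alternates between tasks. Below is a minimal sketch under invented assumptions (toy linear encoder, binary heads, random data); it is not the paper's model, only the shared-encoder training pattern.]

```python
# One shared encoder, several task-specific heads; each training step samples
# a task and backpropagates its loss into the shared parameters.
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_REP = 32, 16
W_shared = rng.normal(size=(D_REP, D_IN)) * 0.1      # shared encoder weights
heads = {t: rng.normal(size=(2, D_REP)) * 0.1        # a binary head per task
         for t in ("nli", "parse", "translate")}     # hypothetical task names

def encode(x):                  # stand-in for the shared sentence encoder
    return np.tanh(W_shared @ x)

def train_step(task, x, y, lr=0.1):
    global W_shared
    h = encode(x)
    logits = heads[task] @ h
    p = np.exp(logits - logits.max()); p /= p.sum()
    d_logits = p.copy(); d_logits[y] -= 1.0          # softmax cross-entropy grad
    d_h = heads[task].T @ d_logits
    heads[task] -= lr * np.outer(d_logits, h)        # update task head
    W_shared -= lr * np.outer(d_h * (1 - h**2), x)   # update *shared* encoder

for step in range(1000):
    task = rng.choice(list(heads))                   # alternate between tasks
    x, y = rng.normal(size=D_IN), int(rng.integers(2))
    train_step(task, x, y)
```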
-" -7019,1804.00084,Johnnatan Messias,Characterizing Interconnections and Linguistic Patterns in Twitter,cs.SI cs.CL," Social media is considered a democratic space in which people connect and -interact with each other regardless of their gender, race, or any other -demographic aspect. Despite numerous efforts that explore demographic aspects -in social media, it is still unclear whether social media perpetuates old -inequalities from the offline world. In this dissertation, we attempt to -identify gender and race of Twitter users located in the United States using -advanced image processing algorithms from Face++. We investigate how different -demographic groups connect with each other and differentiate them regarding -linguistic styles and also their interests. We quantify to what extent one -group follows and interacts with each other and the extent to which these -connections and interactions reflect in inequalities in Twitter. We also -extract linguistic features from six categories (affective attributes, -cognitive attributes, lexical density and awareness, temporal references, -social and personal concerns, and interpersonal focus) in order to identify the -similarities and the differences in the messages they share in Twitter. -Furthermore, we extract the absolute ranking difference of top phrases between -demographic groups. As a dimension of diversity, we use the topics of interest -that we retrieve from each user. Our analysis shows that users identified as -white and male tend to attain higher positions, in terms of the number of -followers and number of times in another user's lists, in Twitter. There are -clear differences in the way of writing across different demographic groups in -both gender and race domains as well as in the topic of interest. We hope our -effort can stimulate the development of new theories of demographic information -in the online space. Finally, we developed a Web-based system that leverages -the demographic aspects of users to provide transparency to the Twitter -trending topics system. -" -7020,1804.00146,Simon Keizer and Verena Rieser,"Towards Learning Transferable Conversational Skills using - Multi-dimensional Dialogue Modelling",cs.CL," Recent statistical approaches have improved the robustness and scalability of -spoken dialogue systems. However, despite recent progress in domain adaptation, -their reliance on in-domain data still limits their cross-domain scalability. -In this paper, we argue that this problem can be addressed by extending current -models to reflect and exploit the multi-dimensional nature of human dialogue. -We present our multi-dimensional, statistical dialogue management framework, in -which transferable conversational skills can be learnt by separating out -domain-independent dimensions of communication and using multi-agent -reinforcement learning. Our initial experiments with a simulated user show that -we can speed up the learning process by transferring learnt policies. -" -7021,1804.00247,"Martin Popel, Ond\v{r}ej Bojar",Training Tips for the Transformer Model,cs.CL," This article describes our experiments in neural machine translation using -the recent Tensor2Tensor framework and the Transformer sequence-to-sequence -model (Vaswani et al., 2017). We examine some of the critical parameters that -affect the final translation quality, memory usage, training stability and -training time, concluding each experiment with a set of recommendations for -fellow researchers. 
In addition to confirming the general mantra ""more data and
-larger models"", we address scaling to multiple GPUs and provide practical tips
-for improved training regarding batch size, learning rate, warmup steps,
-maximum sentence length and checkpoint averaging. We hope that our observations
-will allow others to get better results given their particular hardware and
-data constraints.
-"
-7022,1804.00306,"Cun Mu, Guang Yang and Zheng Yan",Revisiting Skip-Gram Negative Sampling Model with Rectification,cs.CL cs.LG stat.ML," We revisit skip-gram negative sampling (SGNS), one of the most popular
-neural-network based approaches to learning distributed word representations.
-We first point out the ambiguity issue undermining the SGNS model, in the sense
-that the word vectors can be entirely distorted without changing the objective
-value. To resolve the issue, we investigate the intrinsic structures in the
-solution that a good word embedding model should deliver. Motivated by this, we
-rectify the SGNS model with quadratic regularization, and show that this simple
-modification suffices to structure the solution in the desired manner. A
-theoretical justification is presented, which provides novel insights into
-quadratic regularization. Preliminary experiments are also conducted on
-Google's analogical reasoning task to support the modified SGNS model.
-"
-7023,1804.00316,"Da-Rong Liu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee","Completely Unsupervised Phoneme Recognition by Adversarially Learning
- Mapping Relationships from Audio Embeddings",cs.CL," Unsupervised discovery of acoustic tokens from audio corpora without
-annotation and learning vector representations for these tokens have been
-widely studied. Although these techniques have been shown to be successful in
-some applications such as query-by-example Spoken Term Detection (STD), the
-lack of mapping relationships between these discovered tokens and real phonemes
-has limited the down-stream applications. This paper represents probably the
-first attempt towards the goal of completely unsupervised phoneme recognition,
-or mapping audio signals to phoneme sequences without phoneme-labeled audio
-data. The basic idea is to cluster the embedded acoustic tokens and learn the
-mapping between the cluster sequences and the unknown phoneme sequences with a
-Generative Adversarial Network (GAN). An unsupervised phoneme recognition
-accuracy of 36% was achieved in the preliminary experiments.
-"
-7024,1804.00318,"Pei-Hung Chung, Kuan Tung, Ching-Lun Tai, Hung-Yi Lee","Joint Learning of Interactive Spoken Content Retrieval and Trainable
- User Simulator",cs.CL," User-machine interaction is crucial for information retrieval, especially for
-spoken content retrieval, because spoken content is difficult to browse, and
-speech recognition has a high degree of uncertainty. In interactive retrieval,
-the machine takes different actions to interact with the user to obtain better
-retrieval results; here it is critical to select the most efficient action. In
-previous work, deep Q-learning techniques were proposed to train an interactive
-retrieval system, but they rely on a hand-crafted user simulator; building a
-reliable user simulator is difficult. In this paper, we further improve the
-interactive spoken content retrieval framework by proposing a learnable user
-simulator which is jointly trained with the interactive retrieval system,
-making the hand-crafted user simulator unnecessary. 
The experimental results show that the
-learned simulated users not only achieve larger rewards than the hand-crafted
-ones but also act more like real users.
-"
-7025,1804.00320,"Chia-Hsuan Li, Szu-Lin Wu, Chi-Liang Liu, Hung-yi Lee","Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition
- Errors on Listening Comprehension",cs.CL," Reading comprehension has been widely studied. One of the most representative
-reading comprehension tasks is the Stanford Question Answering Dataset (SQuAD),
-on which machines are already comparable with humans. On the other hand,
-accessing large collections of multimedia or spoken content is much more
-difficult and time-consuming than plain text content for humans. It is
-therefore highly attractive to develop machines which can automatically
-understand spoken content. In this paper, we propose a new listening
-comprehension task - Spoken SQuAD. On the new task, we found that speech
-recognition errors have a catastrophic impact on machine comprehension, and
-several approaches are proposed to mitigate the impact.
-"
-7026,1804.00344,"Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang,
- Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri
- Aji, Nikolay Bogoychev, Andr\'e F. T. Martins, Alexandra Birch",Marian: Fast Neural Machine Translation in C++,cs.CL," We present Marian, an efficient and self-contained Neural Machine Translation
-framework with an integrated automatic differentiation engine based on dynamic
-computation graphs. Marian is written entirely in C++. We describe the design
-of the encoder-decoder framework and demonstrate that a research-friendly
-toolkit can achieve high training and translation speed.
-"
-7027,1804.00401,"Prasetya Utama, Nathaniel Weir, Fuat Basik, Carsten Binnig, Ugur
- Cetintemel, Benjamin H\""attasch, Amir Ilkhechi, Shekar Ramaswamy, Arif Usta",An End-to-end Neural Natural Language Interface for Databases,cs.DB cs.CL cs.HC," The ability to extract insights from new data sets is critical for decision
-making. Visual interactive tools play an important role in data exploration
-since they provide non-technical users with an effective way to visually
-compose queries and comprehend the results. Natural language has recently
-gained traction as an alternative query interface to databases with the
-potential to enable non-expert users to formulate complex questions and
-information needs efficiently and effectively. However, understanding natural
-language questions and translating them accurately to SQL is a challenging
-task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet
-made their way into practical tools and commercial products.
- In this paper, we present DBPal, a novel data exploration tool with a natural
-language interface. DBPal leverages recent advances in deep models to make
-query understanding more robust in the following ways: First, DBPal uses a deep
-model to translate natural language statements to SQL, making the translation
-process more robust to paraphrasing and other linguistic variations. Second, to
-support the users in phrasing questions without knowing the database schema and
-the query features, DBPal provides a learned auto-completion model that
-suggests partial query extensions to users during query formulation and thus
-helps them write complex queries. 
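[Editor's note: DBPal's auto-completion component, as described above, suggests likely query continuations during formulation. The toy sketch below is a deliberately simple stand-in for that interface, not DBPal's learned model: it ranks next tokens by bigram counts over a hypothetical log of past SQL queries.]

```python
# Frequency-based next-token suggestion over a (hypothetical) SQL query log,
# illustrating the auto-completion interface rather than a learned model.
from collections import Counter, defaultdict

query_log = [
    "SELECT name FROM patients WHERE age > 40",
    "SELECT name FROM patients WHERE diagnosis = 'flu'",
    "SELECT COUNT ( * ) FROM patients WHERE age > 65",
]

bigrams = defaultdict(Counter)
for q in query_log:
    toks = q.split()
    for a, b in zip(toks, toks[1:]):
        bigrams[a][b] += 1          # count token-to-token transitions

def suggest(partial_query, k=3):
    """Return up to k likely continuations of the last token typed."""
    last = partial_query.split()[-1]
    return [tok for tok, _ in bigrams[last].most_common(k)]

print(suggest("SELECT name FROM patients WHERE"))  # ['age', 'diagnosis']
```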
-" -7028,1804.00425,"Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba","High-quality nonparallel voice conversion based on cycle-consistent - adversarial network",eess.AS cs.CL cs.SD stat.ML," Although voice conversion (VC) algorithms have achieved remarkable success -along with the development of machine learning, superior performance is still -difficult to achieve when using nonparallel data. In this paper, we propose -using a cycle-consistent adversarial network (CycleGAN) for nonparallel -data-based VC training. A CycleGAN is a generative adversarial network (GAN) -originally developed for unpaired image-to-image translation. A subjective -evaluation of inter-gender conversion demonstrated that the proposed method -significantly outperformed a method based on the Merlin open source neural -network speech synthesis system (a parallel VC system adapted for our setup) -and a GAN-based parallel VC system. This is the first research to show that the -performance of a nonparallel VC method can exceed that of state-of-the-art -parallel VC methods. -" -7029,1804.00482,"Sotiris K. Tasoulis, Aristidis G. Vrahatis, Spiros V. Georgakopoulos, - Vassilis P. Plagianakos",Real Time Sentiment Change Detection of Twitter Data Streams,cs.CL," In the past few years, there has been a huge growth in Twitter sentiment -analysis having already provided a fair amount of research on sentiment -detection of public opinion among Twitter users. Given the fact that Twitter -messages are generated constantly with dizzying rates, a huge volume of -streaming data is created, thus there is an imperative need for accurate -methods for knowledge discovery and mining of this information. Although there -exists a plethora of twitter sentiment analysis methods in the recent -literature, the researchers have shifted to real-time sentiment identification -on twitter streaming data, as expected. A major challenge is to deal with the -Big Data challenges arising in Twitter streaming applications concerning both -Volume and Velocity. Under this perspective, in this paper, a methodological -approach based on open source tools is provided for real-time detection of -changes in sentiment that is ultra efficient with respect to both memory -consumption and computational cost. This is achieved by iteratively collecting -tweets in real time and discarding them immediately after their process. For -this purpose, we employ the Lexicon approach for sentiment characterizations, -while change detection is achieved through appropriate control charts that do -not require historical information. We believe that the proposed methodology -provides the trigger for a potential large-scale monitoring of threads in an -attempt to discover fake news spread or propaganda efforts in their early -stages. Our experimental real-time analysis based on a recent hashtag provides -evidence that the proposed approach can detect meaningful sentiment changes -across a hashtags lifetime. -" -7030,1804.00508,"Rivas P. Pedro E., Velarde-Anaya Omar, Gonzalez-Lopez Samuel, Rivas P. - Pablo, Alvarez-Torres Norma Angelica","Entrenamiento de una red neuronal para el reconocimiento de imagenes de - lengua de senas capturadas con sensores de profundidad",cs.CV cs.CL," Due to the growth of the population with hearing problems, devices have been -developed that facilitate the inclusion of deaf people in society, using -technology as a communication tool, such as vision systems. 
Here, a solution to
-this problem is presented using neural networks and autoencoders for the
-classification of American Sign Language images. As a result, 99.5% accuracy
-and an error of 0.01684 were obtained for image classification.
-"
-7031,1804.00520,"Thanh Vu, Dat Quoc Nguyen, Xuan-Son Vu, Dai Quoc Nguyen, Michael Catt
- and Michael Trenell","NIHRIO at SemEval-2018 Task 3: A Simple and Accurate Neural Network
- Model for Irony Detection in Twitter",cs.CL," This paper describes our NIHRIO system for SemEval-2018 Task 3 ""Irony
-detection in English tweets"". We propose to use a simple neural network
-architecture of Multilayer Perceptron with various types of input features
-including lexical, syntactic, semantic and polarity features. Our system
-achieves very high performance in both subtasks of binary and multi-class irony
-detection in tweets. In particular, we rank third using the accuracy metric and
-fifth using the F1 metric. Our code is available at
-https://github.com/NIHRIO/IronyDetectionInTwitter
-"
-7032,1804.00522,"Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher","A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech
- Domain Adaptation",cs.CL cs.LG," Domain adaptation plays an important role for speech recognition models, in
-particular for domains that have low resources. We propose a novel generative
-model based on a cycle-consistent generative adversarial network (CycleGAN) for
-unsupervised non-parallel speech domain adaptation. The proposed model employs
-multiple independent discriminators on the power spectrogram, each in charge of
-different frequency bands. As a result we have 1) better discriminators that
-focus on fine-grained details of the frequency features, and 2) a generator
-that is capable of generating more realistic domain-adapted spectrograms. We
-demonstrate the effectiveness of our method on speech recognition with gender
-adaptation, where the model only has access to supervised data from one gender
-during training, but is evaluated on the other at test time. Our model achieves
-average relative performance improvements of $7.41\%$ in phoneme error rate and
-$11.10\%$ in word error rate over the baseline, on the TIMIT and WSJ datasets,
-respectively. Qualitatively, our model also generates more natural-sounding
-speech when conditioned on data from the other domain.
-"
-7033,1804.00538,"Wei Zhao, Jianbo Ye, Min Yang, Zeyang Lei, Suofei Zhang, Zhou Zhao","Investigating Capsule Networks with Dynamic Routing for Text
- Classification",cs.CL cs.AI," In this study, we explore capsule networks with dynamic routing for text
-classification. We propose three strategies to stabilize the dynamic routing
-process to alleviate the disturbance of noise capsules, which may contain
-""background"" information or may not have been successfully trained. A series of
-experiments are conducted with capsule networks on six text classification
-benchmarks. Capsule networks achieve state-of-the-art results on 4 out of 6
-datasets, which shows the effectiveness of capsule networks for text
-classification. We additionally show that capsule networks exhibit significant
-improvement over strong baseline methods when transferring from single-label to
-multi-label text classification. To the best of our knowledge, this is the
-first work in which capsule networks have been empirically investigated for
-text modeling. 
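[Editor's note: the capsule-network abstract above hinges on dynamic routing by agreement, which is compact enough to show directly. The sketch below follows the standard routing algorithm (softmax couplings, squash nonlinearity, agreement update); shapes, iteration count, and inputs are illustrative, and the paper's three stabilizing strategies are omitted.]

```python
# Dynamic routing by agreement between a layer of lower capsules and a layer
# of output capsules.
import numpy as np

def squash(v, axis=-1, eps=1e-9):
    n2 = (v ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, n_iter=3):
    """u_hat: predictions from lower capsules, shape (n_in, n_out, d_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                   # routing logits
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # couplings
        s = (c[..., None] * u_hat).sum(axis=0)    # weighted sum per output capsule
        v = squash(s)                             # output capsules (n_out, d_out)
        b += (u_hat * v[None]).sum(axis=-1)       # raise logits where prediction
    return v                                      # agrees with the output

rng = np.random.default_rng(0)
v = dynamic_routing(rng.normal(size=(10, 4, 8)))  # 10 lower -> 4 output capsules
print(v.shape)  # (4, 8)
```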
-" -7034,1804.00540,Madhvi Soni and Jitendra Singh Thakur,A Systematic Review of Automated Grammar Checking in English Language,cs.CL," Grammar checking is the task of detection and correction of grammatical -errors in the text. English is the dominating language in the field of science -and technology. Therefore, the non-native English speakers must be able to use -correct English grammar while reading, writing or speaking. This generates the -need of automatic grammar checking tools. So far many approaches have been -proposed and implemented. But less efforts have been made in surveying the -literature in the past decade. The objective of this systematic review is to -examine the existing literature, highlighting the current issues and suggesting -the potential directions of future research. This systematic review is a result -of analysis of 12 primary studies obtained after designing a search strategy -for selecting papers found on the web. We also present a possible scheme for -the classification of grammar errors. Among the main observations, we found -that there is a lack of efficient and robust grammar checking tools for real -time applications. We present several useful illustrations- most prominent are -the schematic diagrams that we provide for each approach and a table that -summarizes these approaches along different dimensions such as target error -types, linguistic dataset used, strengths and limitations of the approach. This -facilitates better understandability, comparison and evaluation of previous -research. -" -7035,1804.00551,"A.Artemov, A. Sergeev, A. Khasenevich, A. Yuzhakov, M. Chugunov","The Training of Neuromodels for Machine Comprehension of Text. - Brain2Text Algorithm",cs.CL cs.LG," Nowadays, the Internet represents a vast informational space, growing -exponentially and the problem of search for relevant data becomes essential as -never before. The algorithm proposed in the article allows to perform natural -language queries on content of the document and get comprehensive meaningful -answers. The problem is partially solved for English as SQuAD contains enough -data to learn on, but there is no such dataset in Russian, so the methods used -by scientists now are not applicable to Russian. Brain2 framework allows to -cope with the problem - it stands out for its ability to be applied on small -datasets and does not require impressive computing power. The algorithm is -illustrated on Sberbank of Russia Strategy's text and assumes the use of a -neuromodel consisting of 65 mln synapses. The trained model is able to -construct word-by-word answers to questions based on a given text. The existing -limitations are its current inability to identify synonyms, pronoun relations -and allegories. Nevertheless, the results of conducted experiments showed high -capacity and generalisation ability of the suggested approach. -" -7036,1804.00619,"Su Wang, Greg Durrett, Katrin Erk",Modeling Semantic Plausibility by Injecting World Knowledge,cs.CL," Distributional data tells us that a man can swallow candy, but not that a man -can swallow a paintball, since this is never attested. However both are -physically plausible events. This paper introduces the task of semantic -plausibility: recognizing plausible but possibly novel events. We present a new -crowdsourced dataset of semantic plausibility judgments of single events such -as ""man swallow paintball"". 
Simple models based on distributional
-representations perform poorly on this task, despite doing well on selectional
-preference, but injecting manually elicited knowledge about entity properties
-provides a substantial performance boost. Our error analysis shows that our new
-dataset is a great testbed for semantic plausibility models: more sophisticated
-knowledge representation and propagation could address many of the remaining
-errors.
-"
-7037,1804.00644,"Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang",Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation,eess.AS cs.CL cs.LG cs.SD," Teacher-student (T/S) learning has been shown to be effective in unsupervised
-domain adaptation [1]. It is a form of transfer learning, not in terms of the
-transfer of recognition decisions, but of the knowledge of posterior
-probabilities in the source domain as evaluated by the teacher model. It learns
-to handle the speaker and environment variability inherent in and restricted to
-the speech signal in the target domain without proactively addressing the
-robustness to other likely conditions. Performance degradation may thus ensue.
-In this work, we advance T/S learning by proposing adversarial T/S learning to
-explicitly achieve condition-robust unsupervised domain adaptation. In this
-method, a student acoustic model and a condition classifier are jointly
-optimized to minimize the Kullback-Leibler divergence between the output
-distributions of the teacher and student models, and simultaneously, to
-min-maximize the condition classification loss. A condition-invariant deep
-feature is learned in the adapted student model through this procedure. We
-further propose multi-factorial adversarial T/S learning which suppresses
-condition variabilities caused by multiple factors simultaneously. Evaluated
-with the noisy CHiME-3 test set, the proposed methods achieve relative word
-error rate improvements of 44.60% and 5.38%, respectively, over a clean source
-model and a strong T/S learning baseline model.
-"
-7038,1804.00720,"Bhuwan Dhingra, Danish Pruthi, Dheeraj Rajagopal",Simple and Effective Semi-Supervised Question Answering,cs.CL cs.LG," The recent success of deep learning models for extractive Question Answering
-(QA) hinges on the availability of large annotated corpora. However, large
-domain-specific annotated corpora are limited and expensive to construct. In
-this work, we envision a system where the end user specifies a set of base
-documents and only a few labelled examples. Our system exploits the document
-structure to create cloze-style questions from these base documents;
-pre-trains a powerful neural network on the cloze-style questions; and further
-fine-tunes the model on the labeled examples. We evaluate our proposed system
-across three diverse datasets from different domains, and find it to be highly
-effective with very little labeled data. We attain more than 50% F1 score on
-SQuAD and TriviaQA with less than a thousand labelled examples. We are also
-releasing a set of 3.2M cloze-style questions for practitioners to use while
-building QA systems. 
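[Editor's note: the core trick in the semi-supervised QA abstract above is generating cloze-style questions from unlabeled documents by blanking out a candidate answer span. The minimal sketch below illustrates that idea only; the "blank the first capitalized token" heuristic is a deliberately crude stand-in for the paper's use of document structure.]

```python
# Turn a raw sentence into a (cloze question, answer) pair with no labels.
def make_cloze(sentence):
    tokens = sentence.split()
    for i, tok in enumerate(tokens[1:], start=1):   # skip sentence-initial word
        if tok[0].isupper():                        # crude answer-span heuristic
            answer = tok
            question = " ".join(tokens[:i] + ["@placeholder"] + tokens[i + 1:])
            return question, answer
    return None

q, a = make_cloze("The theory of relativity was developed by Einstein in 1905.")
print(q)  # The theory of relativity was developed by @placeholder in 1905.
print(a)  # Einstein
```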
-" -7039,1804.00732,"Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, - Biing-Hwang (Fred) Juang",Speaker-Invariant Training via Adversarial Learning,eess.AS cs.AI cs.CL cs.SD," We propose a novel adversarial multi-task learning scheme, aiming at actively -curtailing the inter-talker feature variability while maximizing its senone -discriminability so as to enhance the performance of a deep neural network -(DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In -SIT, a DNN acoustic model and a speaker classifier network are jointly -optimized to minimize the senone (tied triphone state) classification loss, and -simultaneously mini-maximize the speaker classification loss. A -speaker-invariant and senone-discriminative deep feature is learned through -this adversarial multi-task learning. With SIT, a canonical DNN acoustic model -with significantly reduced variance in its output probabilities is learned with -no explicit speaker-independent (SI) transformations or speaker-specific -representations used in training or testing. Evaluated on the CHiME-3 dataset, -the SIT achieves 4.99% relative word error rate (WER) improvement over the -conventional SI acoustic model. With additional unsupervised speaker -adaptation, the speaker-adapted (SA) SIT model achieves 4.86% relative WER gain -over the SA SI acoustic model. -" -7040,1804.00804,"Rajat Singh, Nurendra Choudhary and Manish Shrivastava","Automatic Normalization of Word Variations in Code-Mixed Social Media - Text",cs.CL," Social media platforms such as Twitter and Facebook are becoming popular in -multilingual societies. This trend induces portmanteau of South Asian languages -with English. The blend of multiple languages as code-mixed data has recently -become popular in research communities for various NLP tasks. Code-mixed data -consist of anomalies such as grammatical errors and spelling variations. In -this paper, we leverage the contextual property of words where the different -spelling variation of words share similar context in a large noisy social media -text. We capture different variations of words belonging to same context in an -unsupervised manner using distributed representations of words. Our experiments -reveal that preprocessing of the code-mixed dataset based on our approach -improves the performance in state-of-the-art part-of-speech tagging -(POS-tagging) and sentiment analysis tasks. -" -7041,1804.00805,"Nurendra Choudhary, Rajat Singh, Ishita Bindlish and Manish - Shrivastava","Emotions are Universal: Learning Sentiment Based Representations of - Resource-Poor Languages using Siamese Networks",cs.CL," Machine learning approaches in sentiment analysis principally rely on the -abundance of resources. To limit this dependence, we propose a novel method -called Siamese Network Architecture for Sentiment Analysis (SNASA) to learn -representations of resource-poor languages by jointly training them with -resource-rich languages using a siamese network. - SNASA model consists of twin Bi-directional Long Short-Term Memory Recurrent -Neural Networks (Bi-LSTM RNN) with shared parameters joined by a contrastive -loss function, based on a similarity metric. The model learns the sentence -representations of resource-poor and resource-rich language in a common -sentiment space by using a similarity metric based on their individual -sentiments. 
The model, hence, projects sentences with similar sentiment closer -to each other and the sentences with different sentiment farther from each -other. Experiments on large-scale datasets of resource-rich languages - English -and Spanish and resource-poor languages - Hindi and Telugu reveal that SNASA -outperforms the state-of-the-art sentiment analysis approaches based on -distributional semantics, semantic rules, lexicon lists and deep neural network -representations without sh -" -7042,1804.00806,"Nurendra Choudhary, Rajat Singh, Ishita Bindlish and Manish - Shrivastava","Sentiment Analysis of Code-Mixed Languages leveraging Resource Rich - Languages",cs.CL," Code-mixed data is an important challenge of natural language processing -because its characteristics completely vary from the traditional structures of -standard languages. - In this paper, we propose a novel approach called Sentiment Analysis of -Code-Mixed Text (SACMT) to classify sentences into their corresponding -sentiment - positive, negative or neutral, using contrastive learning. We -utilize the shared parameters of siamese networks to map the sentences of -code-mixed and standard languages to a common sentiment space. Also, we -introduce a basic clustering based preprocessing method to capture variations -of code-mixed transliterated words. Our experiments reveal that SACMT -outperforms the state-of-the-art approaches in sentiment analysis for -code-mixed text by 7.6% in accuracy and 10.1% in F-score. -" -7043,1804.00823,"Kun Xu, Lingfei Wu, Zhiguo Wang, Yansong Feng, Michael Witbrock, and - Vadim Sheinin","Graph2Seq: Graph to Sequence Learning with Attention-based Neural - Networks",cs.AI cs.CL cs.LG stat.ML," The celebrated Sequence to Sequence learning (Seq2Seq) technique and its -numerous variants achieve excellent performance on many tasks. However, many -machine learning tasks have inputs naturally represented as graphs; existing -Seq2Seq models face a significant challenge in achieving accurate conversion -from graph form to the appropriate sequence. To address this challenge, we -introduce a novel general end-to-end graph-to-sequence neural encoder-decoder -model that maps an input graph to a sequence of vectors and uses an -attention-based LSTM method to decode the target sequence from these vectors. -Our method first generates the node and graph embeddings using an improved -graph-based neural network with a novel aggregation strategy to incorporate -edge direction information in the node embeddings. We further introduce an -attention mechanism that aligns node embeddings and the decoding sequence to -better cope with large graphs. Experimental results on bAbI, Shortest Path, and -Natural Language Generation tasks demonstrate that our model achieves -state-of-the-art performance and significantly outperforms existing graph -neural networks, Seq2Seq, and Tree2Seq models; using the proposed -bi-directional node embedding aggregation strategy, the model can converge -rapidly to the optimal performance. -" -7044,1804.00828,"Kang-Min Kim, Aliyeva Dinara, Byung-Ju Choi, SangKeun Lee","Incorporating Word Embeddings into Open Directory Project based - Large-scale Classification",cs.CL," Recently, implicit representation models, such as embedding or deep learning, -have been successfully adopted to text classification task due to their -outstanding performance. However, these approaches are limited to small- or -moderate-scale text classification. 
Explicit representation models are often
-used in large-scale text classification, such as Open Directory Project
-(ODP)-based text classification. However, the performance of these models is
-limited by the associated knowledge bases. In this paper, we incorporate word
-embeddings into ODP-based large-scale classification. To this end, we first
-generate category vectors, which represent the semantics of ODP categories, by
-jointly modeling word embeddings and the ODP-based text classification. We then
-propose a novel semantic similarity measure, which utilizes the category and
-word vectors obtained from the joint model and word embeddings, respectively.
-The evaluation results clearly show the efficacy of our methodology in
-large-scale text classification. The proposed scheme exhibits significant
-improvements of 10% and 28% in terms of macro-averaged F1-score and precision
-at k, respectively, over state-of-the-art techniques.
-"
-7045,1804.00831,Yanghoon Kim and Hwanhee Lee and Kyomin Jung,"AttnConvnet at SemEval-2018 Task 1: Attention-based Convolutional Neural
- Networks for Multi-label Emotion Classification",cs.CL cs.LG cs.NE," In this paper, we propose an attention-based classifier that predicts
-multiple emotions of a given sentence. Our model imitates the human two-step
-procedure of sentence understanding, and it can effectively represent and
-classify sentences. With emoji-to-meaning preprocessing and extra lexicon
-utilization, we further improve the model performance. We train and evaluate
-our model with data provided by SemEval-2018 task 1-5, each sentence of which
-has several labels among 11 given sentiments. Our model achieves 5th/1st place
-in English/Spanish, respectively.
-"
-7046,1804.00832,Iroro Orife,"Attentive Sequence-to-Sequence Learning for Diacritic Restoration of
- Yor\`ub\'a Language Text",cs.CL," Yor\`ub\'a is a widely spoken West African language with a writing system
-rich in tonal and orthographic diacritics. With very few exceptions, diacritics
-are omitted from electronic texts, due to limited device and application
-support. Diacritics provide morphological information, are crucial for lexical
-disambiguation, pronunciation and are vital for any Yor\`ub\'a text-to-speech
-(TTS), automatic speech recognition (ASR) and natural language processing (NLP)
-tasks. Reframing Automatic Diacritic Restoration (ADR) as a machine translation
-task, we experiment with two different attentive Sequence-to-Sequence neural
-models to process undiacritized text. On our evaluation dataset, this approach
-produces diacritization error rates of less than 5%. We have released
-pre-trained models, datasets and source-code as an open-source project to
-advance efforts on Yor\`ub\'a language technology.
-"
-7047,1804.00857,"Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang","Bi-Directional Block Self-Attention for Fast and Memory-Efficient
- Sequence Modeling",cs.CL cs.AI," Recurrent neural networks (RNN), convolutional neural networks (CNN) and
-self-attention networks (SAN) are commonly used to produce context-aware
-representations. RNN can capture long-range dependency but is hard to
-parallelize and not time-efficient. CNN focuses on local dependency but does
-not perform well on some tasks. SAN can model both such dependencies via highly
-parallelizable computation, but its memory requirement grows rapidly with
-sequence length. 
In this paper, we propose a model, called ""bi-directional
-block self-attention network (Bi-BloSAN)"", for RNN/CNN-free sequence encoding.
-It requires as little memory as an RNN but has all the merits of SAN. Bi-BloSAN
-splits the entire sequence into blocks, and applies an intra-block SAN to each
-block to model local context, then applies an inter-block SAN to the outputs of
-all blocks to capture long-range dependency. Thus, each SAN only needs to
-process a short sequence, and only a small amount of memory is required.
-Additionally, we use feature-level attention to handle the variation of
-contexts around the same word, and use forward/backward masks to encode
-temporal order information. On nine benchmark datasets for different NLP tasks,
-Bi-BloSAN achieves or improves upon state-of-the-art accuracy, and shows a
-better efficiency-memory trade-off than existing RNN/CNN/SAN models.
-"
-7048,1804.00920,"Lauri Juvela and Bajibabu Bollepalli and Xin Wang and Hirokazu Kameoka
- and Manu Airaksinen and Junichi Yamagishi and Paavo Alku","Speech waveform synthesis from MFCC sequences with generative
- adversarial networks",eess.AS cs.CL cs.SD stat.ML," This paper proposes a method for generating speech from filterbank mel
-frequency cepstral coefficients (MFCC), which are widely used in speech
-applications, such as ASR, but are generally considered unusable for speech
-synthesis. First, we predict fundamental frequency and voicing information from
-MFCCs with an autoregressive recurrent neural net. Second, the spectral
-envelope information contained in MFCCs is converted to all-pole filters, and a
-pitch-synchronous excitation model matched to these filters is trained.
-Finally, we introduce a generative adversarial network-based noise model to
-add a realistic high-frequency stochastic component to the modeled excitation
-signal. The results show that high quality speech reconstruction can be
-obtained, given only MFCC information at test time.
-"
-7049,1804.00968,Prudhvi Raj Dachapally and Srikanth Ramanam,In-depth Question classification using Convolutional Neural Networks,cs.CL," Convolutional neural networks for computer vision are fairly intuitive. In a
-typical CNN used in image classification, the first layers learn edges, and the
-following layers learn some filters that can identify an object. But CNNs for
-Natural Language Processing are not used often and are not completely
-intuitive. We have a good idea about what the convolution filters learn for the
-task of text classification, and to that end, we propose a neural network
-structure that will be able to give good results in less time. We will be using
-convolutional neural networks to predict the primary or broader topic of a
-question, and then use separate networks for each of these predicted topics to
-accurately classify their sub-topics.
-"
-7050,1804.00982,"Sebastian Ruder, John Glover, Afshin Mehrabani, Parsa Ghaffari",360{\deg} Stance Detection,cs.CL cs.IR cs.SI stat.ML," The proliferation of fake news and filter bubbles makes it increasingly
-difficult to form an unbiased, balanced opinion towards a topic. To ameliorate
-this, we propose 360{\deg} Stance Detection, a tool that aggregates news with
-multiple perspectives on a topic. It presents them on a spectrum ranging from
-support to opposition, enabling the user to base their opinion on multiple
-pieces of diverse evidence. 
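[Editor's note: the support-to-opposition spectrum described in the stance-detection abstract above can be illustrated with a toy scorer. The sketch below is not the 360{\deg} Stance Detection system: a tiny hand-made stance lexicon and invented headlines stand in for its trained classifier, purely to show the sort-on-a-spectrum presentation.]

```python
# Score each headline on a [-1, +1] opposition-to-support axis and sort,
# so a reader sees both sides of a topic.
STANCE_LEXICON = {"support": 1, "endorse": 1, "praise": 1,
                  "oppose": -1, "reject": -1, "criticize": -1}

def stance_score(text):
    words = text.lower().split()
    hits = [STANCE_LEXICON[w] for w in words if w in STANCE_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0   # 0.0 = neutral / unknown

headlines = [
    "Lawmakers endorse the new climate bill",
    "Industry groups oppose and criticize the proposal",
    "Committee hears testimony on the bill",
]
for h in sorted(headlines, key=stance_score):       # opposition first
    print(f"{stance_score(h):+.2f}  {h}")
```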
-" -7051,1804.00987,Kyle Richardson,A Language for Function Signature Representations,cs.CL cs.AI cs.PL," Recent work by (Richardson and Kuhn, 2017a,b; Richardson et al., 2018) looks -at semantic parser induction and question answering in the domain of source -code libraries and APIs. In this brief note, we formalize the representations -being learned in these studies and introduce a simple domain specific language -and a systematic translation from this language to first-order logic. By -recasting the target representations in terms of classical logic, we aim to -broaden the applicability of existing code datasets for investigating more -complex natural language understanding and reasoning problems in the software -domain. -" -7052,1804.01000,Karamjit Singh and Vishal Sunder,"CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge An Ensemble - of Deep and Shallow Learning to predict the Quality of Product Titles",cs.CL cs.AI," We present an approach where two different models (Deep and Shallow) are -trained separately on the data and a weighted average of the outputs is taken -as the final result. For the Deep approach, we use different combinations of -models like Convolution Neural Network, pretrained word2vec embeddings and -LSTMs to get representations which are then used to train a Deep Neural -Network. For Clarity prediction, we also use an Attentive Pooling approach for -the pooling operation so as to be aware of the Title-Category pair. For the -shallow approach, we use boosting technique LightGBM on features generated -using title and categories. We find that an ensemble of these approaches does a -better job than using them alone suggesting that the results of the deep and -shallow approach are highly complementary -" -7053,1804.01041,"Prashant Mathur, Nicola Ueffing and Gregor Leusch",Multi-lingual neural title generation for e-Commerce browse pages,cs.CL," To provide better access of the inventory to buyers and better search engine -optimization, e-Commerce websites are automatically generating millions of -easily searchable browse pages. A browse page consists of a set of slot -name/value pairs within a given category, grouping multiple items which share -some characteristics. These browse pages require a title describing the content -of the page. Since the number of browse pages are huge, manual creation of -these titles is infeasible. Previous statistical and neural approaches depend -heavily on the availability of large amounts of data in a language. In this -research, we apply sequence-to-sequence models to generate titles for high- & -low-resourced languages by leveraging transfer learning. We train these models -on multi-lingual data, thereby creating one joint model which can generate -titles in various different languages. Performance of the title generation -system is evaluated on three different languages; English, German, and French, -with a particular focus on low-resourced French language. -" -7054,1804.01155,"Jacob Levy Abitbol, M\'arton Karsai, Jean-Philippe Magu\'e, - Jean-Pierre Chevrot and Eric Fleury","Socioeconomic Dependencies of Linguistic Patterns in Twitter: A - Multivariate Analysis",cs.CL cs.CY cs.SI physics.soc-ph stat.ML," Our usage of language is not solely reliant on cognition but is arguably -determined by myriad external factors leading to a global variability of -linguistic patterns. 
This issue, which lies at the core of sociolinguistics and -is backed by many small-scale studies on face-to-face communication, is -addressed here by constructing a dataset combining the largest French Twitter -corpus to date with detailed socioeconomic maps obtained from national census -in France. We show how key linguistic variables measured in individual Twitter -streams depend on factors like socioeconomic status, location, time, and the -social network of individuals. We found that (i) people of higher socioeconomic -status, active to a greater degree during the daytime, use a more standard -language; (ii) the southern part of the country is more prone to use more -standard language than the northern one, while locally the used variety or -dialect is determined by the spatial distribution of socioeconomic status; and -(iii) individuals connected in the social network are closer linguistically -than disconnected ones, even after the effects of status homophily have been -removed. Our results inform sociolinguistic theory and may inspire novel -learning methods for the inference of socioeconomic status of people from the -way they tweet. -" -7055,1804.01189,Aaron Jaech and Baosen Zhang and Mari Ostendorf and Daniel S. Kirschen,Real-Time Prediction of the Duration of Distribution System Outages,cs.SY cs.CL math.OC stat.ML," This paper addresses the problem of predicting duration of unplanned power -outages, using historical outage records to train a series of neural network -predictors. The initial duration prediction is made based on environmental -factors, and it is updated based on incoming field reports using natural -language processing to automatically analyze the text. Experiments using 15 -years of outage records show good initial results and improved performance -leveraging text. Case studies show that the language processing identifies -phrases that point to outage causes and repair steps. -" -7056,1804.01452,"David Harwath, Adri\`a Recasens, D\'idac Sur\'is, Galen Chuang, - Antonio Torralba, and James Glass","Jointly Discovering Visual Objects and Spoken Words from Raw Sensory - Input",cs.CV cs.CL cs.SD," In this paper, we explore neural network models that learn to associate -segments of spoken audio captions with the semantically relevant portions of -natural images that they refer to. We demonstrate that these audio-visual -associative localizations emerge from network-internal representations learned -as a by-product of training to perform an image-audio retrieval task. Our -models operate directly on the image pixels and speech waveform, and do not -rely on any conventional supervision in the form of labels, segmentations, or -alignments between the modalities during training. We perform analysis using -the Places 205 and ADE20k datasets demonstrating that our models implicitly -learn semantically-coupled object and word detectors. -" -7057,1804.01486,"Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin - Weber, Nathan P. Palmer, Xu Shi, Tianxi Cai, Isaac S. Kohane","Clinical Concept Embeddings Learned from Massive Sources of Multimodal - Medical Data",cs.CL cs.AI stat.ML," Word embeddings are a popular approach to unsupervised learning of word -relationships that are widely used in natural language processing. In this -article, we present a new set of embeddings for medical concepts learned using -an extremely large collection of multimodal medical data. 
Leaning on recent
-theoretical insights, we demonstrate how an insurance claims database of 60
-million members, a collection of 20 million clinical notes, and 1.7 million
-full-text biomedical journal articles can be combined to embed concepts into a
-common space, resulting in the largest ever set of embeddings for 108,477
-medical concepts. To evaluate our approach, we present a new benchmark
-methodology based on statistical power specifically designed to test embeddings
-of medical concepts. Our approach, called cui2vec, attains state-of-the-art
-performance relative to previous methods in most instances. Finally, we provide
-a downloadable set of pre-trained embeddings for other researchers to use, as
-well as an online tool for interactive exploration of the cui2vec embeddings.
-"
-7058,1804.01503,"Paul Azunre, Craig Corcoran, David Sullivan, Garrett Honke, Rebecca
- Ruppel, Sandeep Verma, Jonathon Morgan","Abstractive Tabular Dataset Summarization via Knowledge Base Semantic
- Embeddings",cs.AI cs.CL," This paper describes an abstractive summarization method for tabular data
-which employs a knowledge base semantic embedding to generate the summary.
-Assuming the dataset contains descriptive text in headers, columns and/or some
-augmenting metadata, the system employs the embedding to recommend a
-subject/type for each text segment. Recommendations are aggregated into a small
-collection of super types considered to be descriptive of the dataset by
-exploiting the hierarchy of types in a pre-specified ontology. Using February
-2015 Wikipedia as the knowledge base, and a corresponding DBpedia ontology as
-types, we present experimental results on open data taken from several
-sources--OpenML, CKAN and data.world--to illustrate the effectiveness of the
-approach.
-"
-7059,1804.01720,"Martin Engilberge, Louis Chevallier, Patrick P\'erez, Matthieu Cord","Finding beans in burgers: Deep semantic-visual embedding with
- localization",cs.CV cs.CL cs.LG," Several works have proposed to learn a two-path neural network that maps
-images and texts, respectively, to the same shared Euclidean space where
-geometry captures useful semantic relationships. Such a multi-modal embedding
-can be trained and used for various tasks, notably image captioning. In the
-present work, we introduce a new architecture of this type, with a visual path
-that leverages recent space-aware pooling mechanisms. Combined with a textual
-path which is jointly trained from scratch, our semantic-visual embedding
-offers a versatile model. Once trained under the supervision of captioned
-images, it yields new state-of-the-art performance on cross-modal retrieval.
-It also allows the localization of new concepts from the embedding space into
-any input image, delivering state-of-the-art results on the visual grounding
-of phrases.
-"
-7060,1804.01760,Longyue Wang,Domain Adaptation for Statistical Machine Translation,cs.CL," Statistical machine translation (SMT) systems perform poorly when applied to
-new target domains. Our goal is to explore domain adaptation approaches and
-techniques for improving the translation quality of domain-specific SMT
-systems. However, translating texts from a specific domain (e.g., medicine) is
-full of challenges. The first challenge is ambiguity. Words or phrases can
-have different meanings in different contexts. The second one is language
-style, due to the fact that texts from different genres are presented with
-different syntax, length and structural organization. 
The third
-one is the out-of-vocabulary words (OOVs) problem. In-domain training data are
-often scarce with low terminology coverage. In this thesis, we explore the
-state-of-the-art domain adaptation approaches and propose effective solutions
-to address those problems.
-"
-7061,1804.01768,"Siyou Liu, Longyue Wang, Chao-Hong Liu","Chinese-Portuguese Machine Translation: A Study on Building Parallel
- Corpora from Comparable Texts",cs.CL," Although there are increasing and significant ties between China and
-Portuguese-speaking countries, there are few parallel corpora for the
-Chinese-Portuguese language pair. Both languages are very populous, with 1.2
-billion native Chinese speakers and 279 million native Portuguese speakers;
-the language pair, however, could be considered low-resource in terms of
-available parallel corpora. In this paper, we describe our methods to curate
-Chinese-Portuguese parallel corpora and evaluate their quality. We extracted
-bilingual data from Macao government websites and proposed a hierarchical
-strategy to build a large parallel corpus. Experiments are conducted on
-existing and our corpora using both Phrase-Based Machine Translation (PBMT)
-and the state-of-the-art Neural Machine Translation (NMT) models. The results
-of this work can be used as a benchmark for future Chinese-Portuguese MT
-systems. The approach we used in this paper also provides a good example of
-how to boost the performance of MT systems for low-resource language pairs.
-"
-7062,1804.01772,Andres Garcia and Jose Manuel Gomez-Perez,"Not just about size - A Study on the Role of Distributed Word
- Representations in the Analysis of Scientific Publications",cs.CL," The emergence of knowledge graphs in the scholarly communication domain and
-recent advances in artificial intelligence and natural language processing
-bring us closer to a scenario where intelligent systems can assist scientists
-over a range of knowledge-intensive tasks. In this paper we present
-experimental results about the generation of word embeddings from scholarly
-publications for the intelligent processing of scientific texts extracted from
-SciGraph. We compare the performance of domain-specific embeddings with
-existing pre-trained vectors generated from very large and general-purpose
-corpora. Our results suggest that there is a trade-off between corpus
-specificity and volume. Embeddings from domain-specific scientific corpora
-effectively capture the semantics of the domain. On the other hand, comparable
-results can also be achieved with general corpora, but only in the presence of
-very large corpora of well-formed text. Furthermore, we also show that the
-degree of overlap between knowledge areas is directly related to the
-performance of embeddings in domain evaluation tasks.
-"
-7063,1804.01778,Yuanhao Liu and Sheng Yu,Word Segmentation as Graph Partition,cs.CL," We propose a new approach to the Chinese word segmentation problem that
-considers the sentence as an undirected graph, whose nodes are the characters.
-One can use various techniques to compute the edge weights that measure the
-connection strength between characters. Spectral graph partition algorithms are
-used to group the characters and achieve word segmentation. 
We follow the graph
-partition approach and design several unsupervised algorithms, and we show
-their promising segmentation results on two corpora: (1) electronic health
-records in Chinese, and (2) benchmark data from the Second International
-Chinese Word Segmentation Bakeoff.
-"
-7064,1804.01855,"Nurendra Choudhary, Rajat Singh, Ishita Bindlish and Manish
- Shrivastava","Contrastive Learning of Emoji-based Representations for Resource-Poor
- Languages",cs.CL," The introduction of emojis (or emoticons) in social media platforms has given
-users increased potential for expression. We propose a novel method called
-Classification of Emojis using Siamese Network Architecture (CESNA) to
-learn emoji-based representations of resource-poor languages by jointly
-training them with resource-rich languages using a siamese network.
- The CESNA model consists of twin Bi-directional Long Short-Term Memory
-Recurrent Neural Networks (Bi-LSTM RNN) with shared parameters joined by a
-contrastive loss function based on a similarity metric. The model learns the
-representations of resource-poor and resource-rich languages in a common emoji
-space by using a similarity metric based on the emojis present in sentences
-from both languages. The model, hence, projects sentences with similar emojis
-closer to each other and sentences with different emojis farther from one
-another. Experiments on large-scale Twitter datasets of resource-rich
-languages (English and Spanish) and resource-poor languages (Hindi and Telugu)
-reveal that CESNA outperforms the state-of-the-art emoji prediction approaches
-based on distributional semantics, semantic rules, lexicon lists and deep
-neural network representations without shared parameters.
-"
-7065,1804.01963,"Emmanuel Dufourq, Bruce A. Bassett",Automated Classification of Text Sentiment,cs.CL cs.IR cs.NE stat.ML," The ability to identify sentiment in text, referred to as sentiment analysis,
-is one which is natural to adult humans. This task is, however, not one which a
-computer can perform by default. Identifying sentiments in an automated,
-algorithmic manner will be a useful capability for business and research in
-their search to understand what consumers think about their products or
-services and to understand human sociology. Here we propose two new Genetic
-Algorithms (GAs) for the task of automated text sentiment analysis. The GAs
-learn whether words occurring in a text corpus are sentiment or amplifier
-words, and their corresponding magnitudes. Sentiment words, such as
-'horrible', add linearly to the final sentiment. Amplifier words, in contrast,
-which are typically adjectives/adverbs like 'very', multiply the sentiment of
-the following word. This increases, decreases or negates the sentiment of the
-following word. The sentiment of the full text is then the sum of these terms.
-This approach grows both a sentiment and an amplifier dictionary, which can be
-reused for other purposes and fed into other machine learning algorithms. We
-report the results of multiple experiments conducted on large Amazon data sets.
-The results reveal that our proposed approach was able to outperform several
-public and/or commercial sentiment analysis algorithms.
-" -7066,1804.02042,"Jonathan Rotsztejn, Nora Hollenstein, Ce Zhang","ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and - Convolutional Neural Networks for Relation Classification and Extraction",cs.CL," Reliably detecting relevant relations between entities in unstructured text -is a valuable resource for knowledge extraction, which is why it has awaken -significant interest in the field of Natural Language Processing. In this -paper, we present a system for relation classification and extraction based on -an ensemble of convolutional and recurrent neural networks that ranked first in -3 out of the 4 subtasks at SemEval 2018 Task 7. We provide detailed -explanations and grounds for the design choices behind the most relevant -features and analyze their importance. -" -7067,1804.02063,Katherine Bailey and Sunny Chopra,"Few-Shot Text Classification with Pre-Trained Word Embeddings and a - Human in the Loop",cs.CL," Most of the literature around text classification treats it as a supervised -learning problem: given a corpus of labeled documents, train a classifier such -that it can accurately predict the classes of unseen documents. In industry, -however, it is not uncommon for a business to have entire corpora of documents -where few or none have been classified, or where existing classifications have -become meaningless. With web content, for example, poor taxonomy management can -result in labels being applied indiscriminately, making filtering by these -labels unhelpful. Our work aims to make it possible to classify an entire -corpus of unlabeled documents using a human-in-the-loop approach, where the -content owner manually classifies just one or two documents per category and -the rest can be automatically classified. This ""few-shot"" learning approach -requires rich representations of the documents such that those that have been -manually labeled can be treated as prototypes, and automatic classification of -the rest is a simple case of measuring the distance to prototypes. This -approach uses pre-trained word embeddings, where documents are represented -using a simple weighted average of constituent word embeddings. We have tested -the accuracy of the approach on existing labeled datasets and provide the -results here. We have also made code available for reproducing the results we -got on the 20 Newsgroups dataset. -" -7068,1804.02135,"Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo","Expressive Speech Synthesis via Modeling Expressions with Variational - Autoencoder",cs.CL cs.SD eess.AS," Recent advances in neural autoregressive models have improve the performance -of speech synthesis (SS). However, as they lack the ability to model global -characteristics of speech (such as speaker individualities or speaking styles), -particularly when these characteristics have not been labeled, making neural -autoregressive SS systems more expressive is still an open issue. In this -paper, we propose to combine VoiceLoop, an autoregressive SS model, with -Variational Autoencoder (VAE). This approach, unlike traditional autoregressive -SS systems, uses VAE to model the global characteristics explicitly, enabling -the expressiveness of the synthesized speech to be controlled in an -unsupervised manner. Experiments using the VCTK and Blizzard2012 datasets show -the VAE helps VoiceLoop to generate higher quality speech and to control the -expressions in its synthesized speech by incorporating global characteristics -into the speech generating process. 
-" -7069,1804.02173,"Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg and - Stefan Wermter","On the Robustness of Speech Emotion Recognition for Human-Robot - Interaction with Deep Neural Networks",cs.RO cs.CL cs.HC cs.SD eess.AS," Speech emotion recognition (SER) is an important aspect of effective -human-robot collaboration and received a lot of attention from the research -community. For example, many neural network-based architectures were proposed -recently and pushed the performance to a new level. However, the applicability -of such neural SER models trained only on in-domain data to noisy conditions is -currently under-researched. In this work, we evaluate the robustness of -state-of-the-art neural acoustic emotion recognition models in human-robot -interaction scenarios. We hypothesize that a robot's ego noise, room -conditions, and various acoustic events that can occur in a home environment -can significantly affect the performance of a model. We conduct several -experiments on the iCub robot platform and propose several novel ways to reduce -the gap between the model's performance during training and testing in -real-world conditions. Furthermore, we observe large improvements in the model -performance on the robot and demonstrate the necessity of introducing several -data augmentation techniques like overlaying background noise and loudness -variations to improve the robustness of the neural approaches. -" -7070,1804.02186,Sreekavitha Parupalli and Navjyoti Singh,Enrichment of OntoSenseNet: Adding a Sense-annotated Telugu lexicon,cs.CL," The paper describes the enrichment of OntoSenseNet - a verb-centric lexical -resource for Indian Languages. This resource contains a newly developed -Telugu-Telugu dictionary. It is important because native speakers can better -annotate the senses when both the word and its meaning are in Telugu. Hence -efforts are made to develop a soft copy of Telugu dictionary. Our resource also -has manually annotated gold standard corpus consisting 8483 verbs, 253 adverbs -and 1673 adjectives. Annotations are done by native speakers according to -defined annotation guidelines. In this paper, we provide an overview of the -annotation procedure and present the validation of our resource through -inter-annotator agreement. Concepts of sense-class and sense-type are -discussed. Additionally, we discuss the potential of lexical sense-annotated -corpora in improving word sense disambiguation (WSD) tasks. Telugu WordNet is -crowd-sourced for annotation of individual words in synsets and is compared -with the developed sense-annotated lexicon (OntoSenseNet) to examine the -improvement. Also, we present a special categorization (spatio-temporal -classification) of adjectives. -" -7071,1804.02204,Adnan Haider and Philip C. Woodland,Sequence Training of DNN Acoustic Models With Natural Gradient,cs.CL cs.LG stat.ML," Deep Neural Network (DNN) acoustic models often use discriminative sequence -training that optimises an objective function that better approximates the word -error rate (WER) than frame-based training. Sequence training is normally -implemented using Stochastic Gradient Descent (SGD) or Hessian Free (HF) -training. This paper proposes an alternative batch style optimisation framework -that employs a Natural Gradient (NG) approach to traverse through the parameter -space. By correcting the gradient according to the local curvature of the -KL-divergence, the NG optimisation process converges more quickly than HF. 
-Furthermore, the proposed NG approach can be applied to any sequence
-discriminative training criterion. The efficacy of the NG method is shown
-using experiments on a Multi-Genre Broadcast (MGB) transcription task that
-demonstrate both the computational efficiency and the accuracy of the
-resulting DNN models.
-"
-7072,1804.02233,"Igor Mozeti\v{c}, Peter Gabrov\v{s}ek, Petra Kralj Novak","Forex trading and Twitter: Spam, bots, and reputation manipulation",cs.SI cs.CL cs.CY econ.TH," Currency trading (Forex) is the largest world market in terms of volume. We
-analyze trading and tweeting about the EUR-USD currency pair over a period of
-three years. First, a large number of tweets were manually labeled, and a
-Twitter stance classification model was constructed. The model then classifies
-all the tweets by the trading stance signal: buy, hold, or sell (EUR vs. USD).
-The Twitter stance is compared to the actual currency rates by applying the
-event study methodology, well-known in financial economics. It turns out that
-there are large differences in Twitter stance distribution and potential
-trading returns between the four groups of Twitter users: trading robots,
-spammers, trading companies, and individual traders. Additionally, we observe
-attempts at reputation manipulation by post festum removal of tweets with poor
-predictions, and deleting/reposting of identical tweets to increase
-visibility without tainting one's Twitter timeline.
-"
-7073,1804.02286,"Richard Moot (CNRS, LIRMM/INFO, UM)",Chart Parsing Multimodal Grammars,cs.CL," This short note describes the chart parser for multimodal type-logical
-grammars which has been developed in conjunction with the type-logical treebank
-for French. The chart parser presents an incomplete but fast implementation of
-proof search for multimodal type-logical grammars using the ""deductive parsing""
-framework. Proofs found can be transformed to natural deduction proofs.
-"
-7074,1804.02341,"Edward Choi, Angeliki Lazaridou, Nando de Freitas",Compositional Obverter Communication Learning From Raw Visual Input,cs.AI cs.CL cs.LG cs.NE," One of the distinguishing aspects of human language is its compositionality,
-which allows us to describe complex environments with limited vocabulary.
-Previously, it has been shown that neural network agents can learn to
-communicate in a highly structured, possibly compositional language based on
-disentangled input (e.g. hand-engineered features). Humans, however, do not
-learn to communicate based on well-summarized features. In this work, we train
-neural agents to simultaneously develop visual perception from raw image
-pixels, and learn to communicate with a sequence of discrete symbols. The
-agents play an image description game where the image contains factors such as
-colors and shapes. We train the agents using the obverter technique where an
-agent introspects to generate messages that maximize its own understanding.
-Through qualitative analysis, visualization and a zero-shot test, we show that
-the agents can develop, out of raw image pixels, a language with compositional
-properties, given a proper pressure from the environment.
-"
-7075,1804.02472,"Rachel Rudinger, Aaron Steven White, Benjamin Van Durme",Neural models of factuality,cs.CL," We present two neural models for event factuality prediction, which yield
-significant performance gains over previous models on three event factuality
-datasets: FactBank, UW, and MEANTIME. 
We also present a substantial expansion
-of the It Happened portion of the Universal Decompositional Semantics dataset,
-yielding the largest event factuality dataset to date. We report model results
-on this extended factuality dataset as well.
-"
-7076,1804.02504,"Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen, Hung-Yi
- Lee, Lin-shan Lee","Scalable Sentiment for Sequence-to-sequence Chatbot Response with
- Performance Analysis",cs.CL," Conventional seq2seq chatbot models only try to find the sentences with the
-highest probabilities conditioned on the input sequences, without considering
-the sentiment of the output sentences. Some research works that try to modify
-the sentiment of the output sequences have been reported. In this paper, we
-propose five models to scale or adjust the sentiment of the chatbot response:
-persona-based model, reinforcement learning, plug and play model, sentiment
-transformation network and cycleGAN, all based on the conventional seq2seq
-model. We also develop two evaluation metrics to estimate if the responses are
-reasonable given the input. These metrics, together with two other popularly
-used metrics, were used to analyze the performance of the five proposed models
-on different aspects, and reinforcement learning and cycleGAN were shown to be
-very attractive. The evaluation metrics were also found to be well correlated
-with human evaluation.
-"
-7077,1804.02525,"Dario Pavllo, Tiziano Piccardi, Robert West","Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs
- from Large News Corpora via Bootstrapping",cs.SI cs.CL cs.IR," We propose Quootstrap, a method for extracting quotations, as well as the
-names of the speakers who uttered them, from large news corpora. Whereas prior
-work has addressed this problem primarily with supervised machine learning, our
-approach follows a fully unsupervised bootstrapping paradigm. It leverages the
-redundancy present in large news corpora, more precisely, the fact that the
-same quotation often appears across multiple news articles in slightly
-different contexts. Starting from a few seed patterns, such as [""Q"", said S.],
-our method extracts a set of quotation-speaker pairs (Q, S), which are in turn
-used for discovering new patterns expressing the same quotations; the process
-is then repeated with the larger pattern set. Our algorithm is highly scalable,
-which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus.
-Validating our results against a crowdsourced ground truth, we obtain 90%
-precision at 40% recall using a single seed pattern, with significantly higher
-recall values for more frequently reported (and thus likely more interesting)
-quotations. Finally, we showcase the usefulness of our algorithm's output for
-computational social science by analyzing the sentiment expressed in our
-extracted quotations.
-"
-7078,1804.02545,"Alexander Robertson, Sharon Goldwater","Evaluating historical text normalization systems: How well do they
- generalize?",cs.CL," We highlight several issues in the evaluation of historical text
-normalization systems that make it hard to tell how well these systems would
-actually work in practice---i.e., for new datasets or languages; in comparison
-to more na\""ive systems; or as a preprocessing step for downstream NLP tools.
-We illustrate these issues and exemplify our proposed evaluation practices by
-comparing two neural models against a na\""ive baseline system. 
We show that the
-neural models generalize well to unseen words in tests on five languages;
-nevertheless, they provide no clear benefit over the na\""ive baseline for
-downstream POS tagging of an English historical collection. We conclude that
-future work should include more rigorous evaluation, including both intrinsic
-and extrinsic measures where possible.
-"
-7079,1804.02549,"Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi
- Yamagishi","A comparison of recent waveform generation and acoustic modeling methods
- for neural-network-based speech synthesis",eess.AS cs.CL cs.SD stat.ML," Recent advances in speech synthesis suggest that limitations such as the
-lossy nature of the amplitude spectrum with minimum phase approximation and the
-over-smoothing effect in acoustic modeling can be overcome by using advanced
-machine learning approaches. In this paper, we build a framework in which we
-can fairly compare new vocoding and acoustic modeling techniques with
-conventional approaches by means of a large scale crowdsourced evaluation.
-Results on acoustic models showed that generative adversarial networks and an
-autoregressive (AR) model performed better than a normal recurrent network,
-and the AR model performed best. Evaluation on vocoders by using the same AR
-acoustic model demonstrated that a Wavenet vocoder outperformed classical
-source-filter-based vocoders. Particularly, generated speech waveforms from the
-combination of AR acoustic model and Wavenet vocoder achieved a similar score
-of speech quality to vocoded speech.
-"
-7080,1804.02559,"Jingyi Zhang, Masao Utiyama, Eiichro Sumita, Graham Neubig, Satoshi
- Nakamura",Guiding Neural Machine Translation with Retrieved Translation Pieces,cs.CL," One of the difficulties of neural machine translation (NMT) is the recall and
-appropriate translation of low-frequency words or phrases. In this paper, we
-propose a simple, fast, and effective method for recalling previously seen
-translation examples and incorporating them into the NMT decoding process.
-Specifically, for an input sentence, we use a search engine to retrieve
-sentence pairs whose source sides are similar to the input sentence, and then
-collect $n$-grams that are both in the retrieved target sentences and aligned
-with words that match in the source sentences, which we call ""translation
-pieces"". We compute pseudo-probabilities for each retrieved sentence based on
-similarities between the input sentence and the retrieved source sentences, and
-use these to weight the retrieved translation pieces. Finally, an existing NMT
-model is used to translate the input sentence, with an additional bonus given
-to outputs that contain the collected translation pieces. We show our method
-improves NMT translation results up to 6 BLEU points on three narrow domain
-translation tasks where repetitiveness of the target sentences is particularly
-salient. It also causes little increase in the translation time, and compares
-favorably to another alternative retrieval-based method with respect to
-accuracy, speed, and simplicity of implementation.
-"
-7081,1804.02596,Vivek Kulkarni and William Yang Wang,Simple Models for Word Formation in English Slang,cs.CL cs.AI," We propose generative models for three types of extra-grammatical word
-formation phenomena abounding in English slang: Blends, Clippings, and
-Reduplicatives. 
Adopting a data-driven approach coupled with linguistic
-knowledge, we propose simple models with state-of-the-art performance on
-human-annotated gold-standard datasets. Overall, our models reveal insights
-into the generative processes of word formation in slang -- insights which are
-increasingly relevant in the context of the rising prevalence of slang and
-non-standard varieties on the Internet.
-"
-7082,1804.02617,Mehrad Moradshahi and Utkarsh Contractor,Language Modeling with Generative Adversarial Networks,cs.LG cs.CL stat.ML," Generative Adversarial Networks (GANs) have been promising in the field of
-image generation; however, they have been hard to train for language
-generation. GANs were originally designed to output differentiable values, so
-discrete language generation is challenging for them, which causes high levels
-of instability when training GANs. Consequently, past work has resorted to
-pre-training with maximum likelihood, or to training GANs without pre-training
-using a WGAN objective with a gradient penalty. In this study, we present a
-comparison of those approaches. Furthermore, we present the results of some
-experiments that indicate better training and convergence of Wasserstein GANs
-(WGANs) when a weaker regularization term enforces the Lipschitz constraint.
-"
-7083,1804.02657,"Takumi Ichimura, Issei Tachibana","Emotion Orientated Recommendation System for Hiroshima Tourist by Fuzzy
- Petri Net",cs.HC cs.AI cs.CL," We developed an Android smartphone application for a tourist information
-system. In particular, the agent system recommends sightseeing spots and local
-hospitality corresponding to the user's current feelings. The concierge-like
-system can estimate the user's emotion and mood using Emotion Generating
-Calculations and a Mental State Transition Network. In this paper, the system
-decides the next candidate spots and foods by fuzzy Petri net reasoning in
-order to make communication between human and smartphone smoother. The system
-was developed for Hiroshima Tourist Information, and we describe some
-hospitality aspects of the concierge system.
-"
-7084,1804.02812,"Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-shan Lee","Multi-target Voice Conversion without Parallel Data by Adversarially
- Learning Disentangled Audio Representations",eess.AS cs.CL cs.SD," Recently, the cycle-consistent adversarial network (Cycle-GAN) has been
-successfully applied to voice conversion to a different speaker without
-parallel data, although in those approaches an individual model is needed for
-each target speaker. In this paper, we propose an adversarial learning
-framework for voice conversion, with which a single model can be trained to
-convert the voice to many different speakers, all without parallel data, by
-separating the speaker characteristics from the linguistic content in speech
-signals. An autoencoder is first trained to extract speaker-independent latent
-representations and speaker embedding separately, using another auxiliary
-speaker classifier to regularize the latent representation. The decoder then
-takes the speaker-independent latent representation and the target speaker
-embedding as the input to generate the voice of the target speaker with the
-linguistic content of the source utterance. The quality of decoder output is
-further improved by patching with the residual signal produced by another pair
-of generator and discriminator. 
A target speaker set size of 20 was tested in -the preliminary experiments, and very good voice quality was obtained. -Conventional voice conversion metrics are reported. We also show that the -speaker information has been properly reduced from the latent representations. -" -7085,1804.03052,"David Harwath, Galen Chuang, and James Glass","Vision as an Interlingua: Learning Multilingual Semantic Embeddings of - Untranscribed Speech",cs.CL cs.SD eess.AS," In this paper, we explore the learning of neural network embeddings for -natural images and speech waveforms describing the content of those images. -These embeddings are learned directly from the waveforms without the use of -linguistic transcriptions or conventional speech recognition technology. While -prior work has investigated this setting in the monolingual case using English -speech data, this work represents the first effort to apply these techniques to -languages beyond English. Using spoken captions collected in English and Hindi, -we show that the same model architecture can be successfully applied to both -languages. Further, we demonstrate that training a multilingual model -simultaneously on both languages offers improved performance over the -monolingual models. Finally, we show that these models are capable of -performing semantic cross-lingual speech-to-speech retrieval. -" -7086,1804.03124,"Jing Qian, Mai ElSherief, Elizabeth M. Belding, William Yang Wang","Leveraging Intra-User and Inter-User Representation Learning for - Automated Hate Speech Detection",cs.CL cs.AI," Hate speech detection is a critical, yet challenging problem in Natural -Language Processing (NLP). Despite the existence of numerous studies dedicated -to the development of NLP hate speech detection approaches, the accuracy is -still poor. The central problem is that social media posts are short and noisy, -and most existing hate speech detection solutions take each post as an isolated -input instance, which is likely to yield high false positive and negative -rates. In this paper, we radically improve automated hate speech detection by -presenting a novel model that leverages intra-user and inter-user -representation learning for robust hate speech detection on Twitter. In -addition to the target Tweet, we collect and analyze the user's historical -posts to model intra-user Tweet representations. To suppress the noise in a -single Tweet, we also model the similar Tweets posted by all other users with -reinforced inter-user representation learning techniques. Experimentally, we -show that leveraging these two representations can significantly improve the -f-score of a strong bidirectional LSTM baseline model by 10.1%. -" -7087,1804.03201,Wei-Ning Hsu and James Glass,Scalable Factorized Hierarchical Variational Autoencoder Training,stat.ML cs.CL cs.LG cs.SD eess.AS," Deep generative models have achieved great success in unsupervised learning -with the ability to capture complex nonlinear relationships between latent -generating factors and observations. Among them, a factorized hierarchical -variational autoencoder (FHVAE) is a variational inference-based model that -formulates a hierarchical generative process for sequential data. Specifically, -an FHVAE model can learn disentangled and interpretable representations, which -have been proven useful for numerous speech applications, such as speaker -verification, robust speech recognition, and voice conversion. 
However, as we
-will elaborate in this paper, the training algorithm proposed in the original
-paper is not scalable to datasets of thousands of hours, which makes this model
-less applicable on a larger scale. After identifying limitations in terms of
-runtime, memory, and hyperparameter optimization, we propose a hierarchical
-sampling training algorithm to address all three issues. Our proposed method is
-evaluated comprehensively on a wide variety of datasets, ranging from 3 to
-1,000 hours and involving different types of generating factors, such as
-recording conditions and noise types. In addition, we also present a new
-visualization method for qualitatively evaluating the performance with respect
-to interpretability and disentanglement. Models trained with our proposed
-algorithm demonstrate the desired characteristics on all the datasets.
-"
-7088,1804.03209,Pete Warden,Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition,cs.CL cs.HC," Describes an audio dataset of spoken words designed to help train and
-evaluate keyword spotting systems. Discusses why this task is an interesting
-challenge, and why it requires a specialized dataset that is different from
-conventional datasets used for automatic speech recognition of full sentences.
-Suggests a methodology for reproducible and comparable accuracy metrics for
-this task. Describes how the data was collected and verified, what it contains,
-previous versions and properties. Concludes by reporting baseline results of
-models trained on this dataset.
-"
-7089,1804.03240,"Djordje Gligorijevic, Jelena Stojanovic, Wayne Satz, Ivan Stojkovic,
- Kathrin Schreyer, Daniel Del Portal, Zoran Obradovic",Deep Attention Model for Triage of Emergency Department Patients,cs.CY cs.CL cs.LG," Optimization of patient throughput and wait time in emergency departments
-(ED) is an important task for hospital systems. For that reason, the Emergency
-Severity Index (ESI) system for patient triage was introduced to help guide
-manual estimation of acuity levels, which is used by nurses to rank the
-patients and organize hospital resources. However, despite the improvements it
-brought to managing medical resources, such a triage system greatly depends on
-nurses' subjective judgment and is thus prone to human error. Here, we propose
-a novel deep model based on the word attention mechanism designed for
-predicting the number of resources an ED patient would need. Our approach
-incorporates routinely available continuous and nominal (structured) data with
-medical text (unstructured) data, including the patient's chief complaint,
-past medical history, medication list, and nurse assessment collected for
-338,500 ED visits over three years in a large urban hospital. Using both
-structured and unstructured data, the proposed approach achieves an AUC of
-$\sim 88\%$ for the task of identifying resource-intensive patients (binary
-classification), and an accuracy of $\sim 44\%$ for predicting the exact
-category of the number of resources (multi-class classification task), giving
-an estimated lift over nurses' performance by 16\% in accuracy. Furthermore,
-the attention mechanism of the proposed model provides interpretability by
-assigning attention scores for nurses' notes, which is crucial for decision
-making and for implementation of such approaches in real systems working on
-human health. 
-" -7090,1804.03243,"Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, - Sanjeev Khudanpur",A GPU-based WFST Decoder with Exact Lattice Generation,cs.CL," We describe initial work on an extension of the Kaldi toolkit that supports -weighted finite-state transducer (WFST) decoding on Graphics Processing Units -(GPUs). We implement token recombination as an atomic GPU operation in order to -fully parallelize the Viterbi beam search, and propose a dynamic load balancing -strategy for more efficient token passing scheduling among GPU threads. We also -redesign the exact lattice generation and lattice pruning algorithms for better -utilization of the GPUs. Experiments on the Switchboard corpus show that the -proposed method achieves identical 1-best results and lattice quality in -recognition and confidence measure tasks, while running 3 to 15 times faster -than the single process Kaldi decoder. The above results are reported on -different GPU architectures. Additionally we obtain a 46-fold speedup with -sequence parallelism and multi-process service (MPS) in GPU. -" -7091,1804.03257,"Haw-Shiuan Chang, Amol Agrawal, Ananya Ganesh, Anirudha Desai, Vinayak - Mathur, Alfred Hough, Andrew McCallum","Efficient Graph-based Word Sense Induction by Distributional Inclusion - Vector Embeddings",cs.CL," Word sense induction (WSI), which addresses polysemy by unsupervised -discovery of multiple word senses, resolves ambiguities for downstream NLP -tasks and also makes word representations more interpretable. This paper -proposes an accurate and efficient graph-based method for WSI that builds a -global non-negative vector embedding basis (which are interpretable like -topics) and clusters the basis indexes in the ego network of each polysemous -word. By adopting distributional inclusion vector embeddings as our basis -formation model, we avoid the expensive step of nearest neighbor search that -plagues other graph-based methods without sacrificing the quality of sense -clusters. Experiments on three datasets show that our proposed method produces -similar or better sense clusters and embeddings compared with previous -state-of-the-art methods while being significantly more efficient. -" -7092,1804.03317,"Yingqi Qu, Jie Liu, Liangyi Kang, Qinfeng Shi, Dan Ye","Question Answering over Freebase via Attentive RNN with Similarity - Matrix based CNN",cs.CL cs.AI cs.LG," With the rapid growth of knowledge bases (KBs), question answering over -knowledge base, a.k.a. KBQA has drawn huge attention in recent years. Most of -the existing KBQA methods follow so called encoder-compare framework. They map -the question and the KB facts to a common embedding space, in which the -similarity between the question vector and the fact vectors can be conveniently -computed. This, however, inevitably loses original words interaction -information. To preserve more original information, we propose an attentive -recurrent neural network with similarity matrix based convolutional neural -network (AR-SMCNN) model, which is able to capture comprehensive hierarchical -information utilizing the advantages of both RNN and CNN. We use RNN to capture -semantic-level correlation by its sequential modeling nature, and use an -attention mechanism to keep track of the entities and relations simultaneously. -Meanwhile, we use a similarity matrix based CNN with two-directions pooling to -extract literal-level words interaction matching utilizing CNNs strength of -modeling spatial correlation among data. 
Moreover, we have developed a new
-heuristic extension method for entity detection, which significantly decreases
-the effect of noise. Our method has outperformed the state of the art on the
-SimpleQuestions benchmark in both accuracy and efficiency.
-"
-7093,1804.03396,"Lin Qiu, Hao Zhou, Yanru Qu, Weinan Zhang, Suoheng Li, Shu Rong,
- Dongyu Ru, Lihua Qian, Kewei Tu and Yong Yu",QA4IE: A Question Answering based Framework for Information Extraction,cs.IR cs.AI cs.CL," Information Extraction (IE) refers to automatically extracting structured
-relation tuples from unstructured texts. Common IE solutions, including
-Relation Extraction (RE) and open IE systems, can hardly handle cross-sentence
-tuples, and are severely restricted by limited relation types as well as
-informal relation specifications (e.g., free-text based relation tuples). In
-order to overcome these weaknesses, we propose a novel IE framework named
-QA4IE, which leverages the flexible question answering (QA) approaches to
-produce high-quality relation triples across sentences. Based on the
-framework, we develop a large IE benchmark with high-quality human evaluation.
-This benchmark contains 293K documents, 2M golden relation triples, and 636
-relation types. We compare our system with some IE baselines on our benchmark
-and the results show that our system achieves great improvements.
-"
-7094,1804.03424,"Yookoon Park, Jaemin Cho and Gunhee Kim",A Hierarchical Latent Structure for Variational Conversation Modeling,cs.CL cs.AI cs.LG," Variational autoencoders (VAE) combined with hierarchical RNNs have emerged
-as a powerful framework for conversation modeling. However, they suffer from
-the notorious degeneration problem, where the decoders learn to ignore latent
-variables and reduce to vanilla RNNs. We empirically show that this degeneracy
-occurs mostly due to two reasons. First, the expressive power of hierarchical
-RNN decoders is often high enough to model the data using only its decoding
-distributions without relying on the latent variables. Second, the conditional
-VAE structure, whose generation process is conditioned on a context, makes the
-range of training targets very sparse; that is, the RNN decoders can easily
-overfit to the training data ignoring the latent variables. To solve the
-degeneration problem, we propose a novel model named Variational Hierarchical
-Conversation RNNs (VHCR), involving two key ideas of (1) using a hierarchical
-structure of latent variables, and (2) exploiting an utterance drop
-regularization. With evaluations on two datasets of Cornell Movie Dialog and
-Ubuntu Dialog Corpus, we show that our VHCR successfully utilizes latent
-variables and outperforms state-of-the-art models for conversation generation.
-Moreover, it can perform several new utterance control tasks, thanks to its
-hierarchical latent structure.
-"
-7095,1804.03433,"Fabio Del Vigna, Marinella Petrocchi, Alessandro Tommasi, Cesare
- Zavattari, Maurizio Tesconi","Who framed Roger Reindeer? De-censorship of Facebook posts by snippet
- classification",cs.CL," This paper considers online news censorship and concentrates on censorship
-of identities. Obfuscating identities may occur for disparate reasons, from
-military to judicial ones. In the majority of cases, this happens to protect
-individuals from being identified and persecuted by hostile people. 
However,
-since the collaborative web is characterised by a redundancy of information,
-it is not unusual for the same fact to be reported by multiple sources, which
-may not apply the same restriction policies in terms of censorship. Also, the
-proven aptitude of social network users to disclose personal information leads
-to the phenomenon that comments on news can reveal the data withheld in the
-news itself. This gives us a means to figure out who the subject of the
-censored news is. We propose an adaptation of a text analysis approach to
-unveil censored identities. The approach is tested on a synthesised scenario,
-which however resembles a real use case. Leveraging a text analysis based on a
-context classifier trained over snippets from posts and comments of Facebook
-pages, we achieve promising results. Despite the quite constrained settings in
-which we operate -- such as considering only snippets of very short length --
-our system successfully detects the censored name, choosing among 10 different
-candidate names, in more than 50\% of the investigated cases. This outperforms
-the results of two reference baselines. The findings reported in this paper,
-besides being supported by a thorough experimental methodology and being
-interesting in their own right, also pave the way for further investigation
-into the insidious issue of censorship on the web.
-"
-7096,1804.03540,Arkaitz Zubiaga,Mining Social Media for Newsgathering: A Review,cs.CL cs.IR cs.SI," Social media is becoming an increasingly important data source for learning
-about breaking news and for following the latest developments of ongoing news.
-This is in part possible thanks to the existence of mobile devices, which
-allow anyone with access to the Internet to post updates from anywhere,
-leading in turn to a growing presence of citizen journalism. Consequently,
-social media has become a go-to resource for journalists during the process of
-newsgathering. Use of social media for newsgathering is however challenging,
-and suitable tools are needed in order to facilitate access to useful
-information for reporting. In this paper, we provide an overview of research in
-data mining and natural language processing for mining social media for
-newsgathering. We discuss five different areas that researchers have worked on
-to mitigate the challenges inherent to social media newsgathering: news
-discovery, curation of news, validation and verification of content,
-newsgathering dashboards, and other tasks. We outline the progress made so far
-in the field, summarise the current challenges, and discuss future directions
-in the use of computational journalism to assist with social media
-newsgathering. This review is relevant to computer scientists researching news
-in social media as well as to interdisciplinary researchers interested in the
-intersection of computer science and journalism.
-"
-7097,1804.03608,"Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, and Aniruddha
- Kembhavi",Imagine This! Scripts to Compositions to Videos,cs.CV cs.CL cs.IR cs.LG," Imagining a scene described in natural language with realistic layout and
-appearance of entities is the ultimate test of spatial, visual, and semantic
-world knowledge. Towards this goal, we present the Composition, Retrieval, and
-Fusion Network (CRAFT), a model capable of learning this knowledge from
-video-caption data and applying it while generating videos from novel captions.
-CRAFT explicitly predicts a temporal layout of mentioned entities (characters
-and objects), retrieves spatio-temporal entity segments from a video database
-and fuses them to generate scene videos. Our contributions include sequential
-training of components of CRAFT while jointly modeling layout and appearances,
-and losses that encourage learning compositional representations for retrieval.
-We evaluate CRAFT on semantic fidelity to caption, composition consistency, and
-visual quality. CRAFT outperforms direct pixel generation approaches and
-generalizes well to unseen captions and to unseen video databases with no text
-annotations. We demonstrate CRAFT on FLINTSTONES, a new richly annotated
-video-caption dataset with over 25000 videos. For a glimpse of videos generated
-by CRAFT, see https://youtu.be/688Vv86n0z8.
-"
-7098,1804.03673,"Reshma U, Barathi Ganesh H B, Mandar Kale, Prachi Mankame and Gouri
- Kulkarni",Deep Learning for Digital Text Analytics: Sentiment Analysis,cs.CL," In today's scenario, imagining a world without negativity is very
-unrealistic, as bad news spreads more virally than good news. Though it seems
-impractical in real life, this could be implemented by building a system using
-Machine Learning and Natural Language Processing techniques to identify news
-items with a negative shade and filter them out, passing only the news with a
-positive shade (good news) to the end user. In this work, around two lakh
-(200,000) news items have been used for training and testing with a
-combination of rule-based and data-driven approaches. VADER along with a
-filtration method has been used as an annotating tool, followed by a
-statistical Machine Learning approach that used a Document Term Matrix
-(representation) and a Support Vector Machine (classification). Deep Learning
-algorithms then came into the picture to make the system more reliable
-(Doc2Vec), culminating in a Convolutional Neural Network (CNN) that yielded
-better results than the other modules we experimented with. It showed a
-training accuracy of 96%, while a test accuracy (on internal and external news
-data) above 85% was obtained.
-"
-7099,1804.03782,"Sidi Lu and Lantao Yu and Siyuan Feng and Yaoming Zhu and Weinan Zhang
- and Yong Yu",CoT: Cooperative Training for Generative Modeling of Discrete Data,cs.LG cs.AI cs.CL stat.ML," In this paper, we study the generative models of sequential discrete data. To
-tackle the exposure bias problem inherent in maximum likelihood estimation
-(MLE), generative adversarial networks (GANs) are introduced to penalize the
-unrealistic generated samples. To exploit the supervision signal from the
-discriminator, most previous models leverage REINFORCE to address the
-non-differentiable problem of sequential discrete data. However, because of the
-unstable property of the training signal during the dynamic process of
-adversarial training, the effectiveness of REINFORCE, in this case, is hardly
-guaranteed. To deal with such a problem, we propose a novel approach called
-Cooperative Training (CoT) to improve the training of sequence generative
-models. CoT transforms the min-max game of GANs into a joint maximization
-framework and manages to explicitly estimate and optimize Jensen-Shannon
-divergence. Moreover, CoT works without the necessity of pre-training via MLE,
-which is crucial to the success of previous methods. 
In the experiments,
-compared to existing state-of-the-art methods, CoT shows superior or at least
-competitive performance on sample quality and diversity, as well as training
-stability.
-"
-7100,1804.03799,"Rashmi Gangadharaiah, Balakrishnan Narayanaswamy, Charles Elkan",Achieving Fluency and Coherency in Task-oriented Dialog,cs.CL cs.AI," We consider real-world task-oriented dialog settings, where agents need to
-generate both fluent natural language responses and correct external actions
-like database queries and updates. We demonstrate that, when applied to
-customer support chat transcripts, Sequence to Sequence (Seq2Seq) models often
-generate short, incoherent and ungrammatical natural language responses that
-are dominated by words that occur with high frequency in the training data.
-These phenomena do not arise in synthetic datasets such as bAbI, where we show
-Seq2Seq models are nearly perfect. We develop techniques to learn embeddings
-that succinctly capture relevant information from the dialog history, and
-demonstrate that nearest neighbor based approaches in this learned neural
-embedding space generate more fluent responses. However, we see that these
-methods are not able to accurately predict when to execute an external action.
-We show how to combine nearest neighbor and Seq2Seq methods in a hybrid model,
-where nearest neighbor is used to generate fluent responses and Seq2Seq-type
-models ensure dialog coherency and generate accurate external actions. We show
-that this approach is well suited for customer support scenarios, where agents'
-responses are typically script-driven, and correct external actions are
-critically important. The hybrid model on the customer support data achieves a
-78% relative improvement in fluency scores, and a 130% improvement in accuracy
-of external calls.
-"
-7101,1804.03824,Leshem Choshen and Omri Abend,Reference-less Measure of Faithfulness for Grammatical Error Correction,cs.CL cs.AI," We propose USim, a semantic measure for Grammatical Error Correction (GEC)
-that measures the semantic faithfulness of the output to the source, thereby
-complementing existing reference-less measures (RLMs) for measuring the
-output's grammaticality. USim operates by comparing the semantic symbolic
-structure of the source and the correction, without relying on manually-curated
-references. Our experiments establish the validity of USim, by showing that (1)
-semantic annotation can be consistently applied to ungrammatical text; (2)
-valid corrections obtain a high USim similarity score to the source; and (3)
-invalid corrections obtain a lower score.
-"
-7102,1804.03839,"Nishtha Madaan, Gautam Singh, Sameep Mehta, Aditya Chetan, Brihi Joshi",Generating Clues for Gender based Occupation De-biasing in Text,cs.CL cs.CY," Vast availability of text data has enabled widespread training and use of AI
-systems that not only learn and predict attributes from the text but also
-generate text automatically. However, these AI models also learn gender, racial
-and ethnic biases present in the training data. In this paper, we present the
-first system that discovers the possibility that a given text portrays a gender
-stereotype associated with an occupation. If the possibility exists, the system
-offers counter-evidence of the opposite gender also being associated with the
-same occupation in the context of user-provided geography and timespan. The
-system thus enables text de-biasing by assisting a human-in-the-loop. 
The system can
-not only act as a text pre-processor before training any AI model but also help
-human story writers write stories free of occupation-level gender bias in the
-geographical and temporal context of their choice.
-"
-7103,1804.03923,Farshad Jafari,Generating Multilingual Parallel Corpus Using Subtitles,cs.CL," Neural Machine Translation, despite its significant results, still has a
-great problem: the lack or absence of parallel corpora for many languages.
-This article suggests a method for generating a considerable amount of
-parallel corpus for any language pair, extracted from open-source materials
-existing on the Internet. The parallel corpus contents are derived from video
-subtitles. The method needs a set of video titles, with attributes like
-release date, rating, duration, etc. The process of finding and downloading
-subtitle pairs for the desired language pairs is automated using a crawler.
-Finally, sentence pairs are extracted from synchronous dialogues in the
-subtitles. The main problem of this method is unsynchronized subtitle pairs;
-therefore, subtitles are verified before downloading. If two subtitles are not
-synchronized, another subtitle of that video is processed until a matching
-subtitle is found. This approach also gives the ability to build context-based
-parallel corpora by filtering videos by genre. A context-based corpus can be
-used in complex translators which decode sentences with different networks
-after determining the subject of the content. Languages have many differences
-between their formal and informal styles, including words and syntax. Another
-advantage of this method is the ability to build corpora of the informal style
-of languages, because most movie dialogues are parts of a conversation and
-thus have an informal style. This feature of the generated corpus can be used
-in real-time translators to produce more accurate conversation translations.
-"
-7104,1804.03980,"Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls,
- Stephen Clark",Emergent Communication through Negotiation,cs.AI cs.CL cs.LG cs.MA," Multi-agent reinforcement learning offers a way to study how communication
-could emerge in communities of agents needing to solve specific problems. In
-this paper, we study the emergence of communication in the negotiation
-environment, a semi-cooperative model of agent interaction. We introduce two
-communication protocols -- one grounded in the semantics of the game, and one
-which is \textit{a priori} ungrounded and is a form of cheap talk. We show that
-self-interested agents can use the pre-grounded communication channel to
-negotiate fairly, but are unable to effectively use the ungrounded channel.
-However, prosocial agents do learn to use cheap talk to find an optimal
-negotiating strategy, suggesting that cooperation is necessary for language to
-emerge. We also study communication behaviour in a setting where one agent
-interacts with agents in a community with different levels of prosociality and
-show how agent identifiability can aid negotiation.
-"
-7105,1804.03984,"Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark","Emergence of Linguistic Communication from Referential Games with
- Symbolic and Pixel Input",cs.AI cs.CL cs.LG cs.MA," The ability of algorithms to evolve or learn (compositional) communication
-protocols has traditionally been studied in the language evolution literature
-through the use of emergent communication tasks. 
Here we scale up this research
-by using contemporary deep learning methods and by training
-reinforcement-learning neural network agents on referential communication
-games. We extend previous work, in which agents were trained in symbolic
-environments, by developing agents which are able to learn from raw pixel data,
-a more challenging and realistic input representation. We find that the degree
-of structure found in the input data affects the nature of the emerged
-protocols, and thereby corroborate the hypothesis that structured compositional
-language is most likely to emerge when agents perceive the world as being
-structured.
-"
-7106,1804.04003,"Ayush Singh, Ritu Palod",Sentiment Transfer using Seq2Seq Adversarial Autoencoders,cs.CL," Expression in language is subjective. Everyone has a different style of
-reading and writing; it all boils down to the way their mind understands things
-(in a specific format). Language style transfer is a way to preserve the
-meaning of a text while changing the way it is expressed. Progress in language
-style transfer has lagged behind that in other domains, such as computer
-vision, mainly because of the lack of parallel data, use cases, and reliable
-evaluation metrics. In response to the challenge of lacking parallel data, we
-explore learning style transfer from non-parallel data. We propose a model
-combining seq2seq, autoencoders, and adversarial loss to achieve this goal. The
-key idea behind the proposed models is to learn separate content
-representations and style representations using adversarial networks.
-To address the problem of evaluating style transfer tasks, we frame the
-problem as sentiment transfer and evaluate using a sentiment classifier to
-measure how many sentiments the model was able to transfer. We report our
-results on several kinds of models.
-"
-7107,1804.04053,"Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan
 - Wermter","EmoRL: Continuous Acoustic Emotion Classification using Deep
 - Reinforcement Learning",cs.RO cs.CL cs.HC cs.LG," Acoustically expressed emotions can make communication with a robot more
-efficient. Detecting emotions like anger could provide a clue for the robot
-indicating unsafe/undesired situations. Recently, several deep neural
-network-based models have been proposed which establish new state-of-the-art
-results in affective state evaluation. These models typically start processing
-at the end of each utterance, which not only requires a mechanism to detect the
-end of an utterance but also makes it difficult to use them in a real-time
-communication scenario, e.g. human-robot interaction. We propose the EmoRL
-model that triggers an emotion classification as soon as it gains enough
-confidence while listening to a person speaking. As a result, we minimize the
-need for segmenting the audio signal for classification and achieve lower
-latency as the audio signal is processed incrementally. The method is
-competitive with the accuracy of a strong baseline model, while allowing much
-earlier prediction.
-"
-7108,1804.04058,"Rizwan Sadiq, Mohsin Khan",Analyzing Self-Driving Cars on Twitter,cs.LG cs.CL cs.SI stat.ML," This paper studies users' perception regarding a controversial product,
-namely self-driving (autonomous) cars. To find people's opinion regarding this
-new technology, we used an annotated Twitter dataset, and extracted the topics
-in positive and negative tweets using an unsupervised, probabilistic model
-known as topic modeling. 
We later used the topics, as well as linguistic and
-Twitter-specific features, to classify the sentiment of the tweets. Regarding
-the opinions, the result of our analysis shows that people are optimistic and
-excited about the future technology, but at the same time they find it
-dangerous and not reliable. For the classification task, we found
-Twitter-specific features, such as hashtags, as well as linguistic features,
-such as emphatic words, among the top attributes in classifying the sentiment
-of the tweets.
-"
-7109,1804.04059,"A. Ceron, L. Curini, S.M. Iacus","ISIS at its apogee: the Arabic discourse on Twitter and what we can
 - learn from that about ISIS support and Foreign Fighters",cs.CL cs.SI," We analyze 26.2 million comments published in the Arabic language on Twitter,
-from July 2014 to January 2015, when ISIS' strength reached its peak and the
-group was prominently expanding the territorial area under its control. By
-doing that, we are able to measure the share of support and aversion toward the
-Islamic State within the online Arab communities. We then investigate two
-specific topics. First, by exploiting the time-granularity of the tweets, we
-link the opinions with daily events to understand the main determinants of the
-changing trend in support toward ISIS. Second, by taking advantage of the
-geographical locations of tweets, we explore the relationship between online
-opinions across countries and the number of foreign fighters joining ISIS.
-"
-7110,1804.04083,"Claudia Schulz, Steffen Eger, Johannes Daxenberger, Tobias Kahse,
 - Iryna Gurevych",Multi-Task Learning for Argumentation Mining in Low-Resource Settings,cs.CL," We investigate whether and where multi-task learning (MTL) can improve
-performance on NLP problems related to argumentation mining (AM), in particular
-argument component identification. Our results show that MTL performs
-particularly well (and better than single-task learning) when little training
-data is available for the main task, a common scenario in AM. Our findings
-challenge previous assumptions that conceptualizations across AM datasets are
-divergent and that MTL is difficult for semantic or higher-level tasks.
-"
-7111,1804.04087,"Marco Lippi, Marcelo A Montemurro, Mirko Degli Esposti, Giampaolo
 - Cristadoro",Natural Language Statistical Features of LSTM-generated Texts,cs.CL cs.LG," Long Short-Term Memory (LSTM) networks have recently shown remarkable
-performance in several tasks dealing with natural language generation, such as
-image captioning or poetry composition. Yet, only a few works have analyzed
-text generated by LSTMs in order to quantitatively evaluate to what extent
-such artificial texts resemble those generated by humans. We compared the
-statistical structure of LSTM-generated language to that of written natural
-language, and to those produced by Markov models of various orders. In
-particular, we characterized the statistical structure of language by assessing
-word-frequency statistics, long-range correlations, and entropy measures. Our
-main finding is that while both LSTM and Markov-generated texts can exhibit
-features similar to real ones in their word-frequency statistics and entropy
-measures, LSTM-texts are shown to reproduce long-range correlations at scales
-comparable to those found in natural language. 
Moreover, for LSTM networks a
-temperature-like parameter controlling the generation process shows an optimal
-value---for which the produced texts are closest to real language---consistent
-across all the different statistical features investigated.
-"
-7112,1804.04093,"Ye Zhang, Nan Ding, Radu Soricut",SHAPED: Shared-Private Encoder-Decoder for Text Style Adaptation,cs.CL," Supervised training of abstractive language generation models results in
-learning conditional probabilities over language sequences based on the
-supervised training signal. When the training signal contains a variety of
-writing styles, such models may end up learning an 'average' style that is
-directly influenced by the training data make-up and cannot be controlled by
-the needs of an application. We describe a family of model architectures
-capable of capturing both generic language characteristics via shared model
-parameters, as well as particular style characteristics via private model
-parameters. Such models are able to generate language according to a specific
-learned style, while still taking advantage of their power to model generic
-language phenomena. Furthermore, we describe an extension that uses a mixture
-of output distributions from all learned styles to perform on-the-fly style
-adaptation based on the textual input alone. Experimentally, we find that the
-proposed models consistently outperform models that encapsulate single-style or
-average-style language generation capabilities.
-"
-7113,1804.04095,Nikolaos Aletras and Benjamin Paul Chamberlain,"Predicting Twitter User Socioeconomic Attributes with Network and
 - Language Information",cs.CL cs.AI cs.SI," Inferring socioeconomic attributes of social media users such as occupation
-and income is an important problem in computational social science. Automated
-inference of such characteristics has applications in personalised recommender
-systems, targeted computational advertising and online political campaigning.
-While previous work has shown that language features can reliably predict
-socioeconomic attributes on Twitter, employing information coming from users'
-social networks has not yet been explored for such complex user
-characteristics. In this paper, we describe a method for predicting the
-occupational class and the income of Twitter users given information extracted
-from their extended networks by learning a low-dimensional vector
-representation of users, i.e. graph embeddings. We use this representation to
-train predictive models for occupational class and income. Results on two
-publicly available datasets show that our method consistently outperforms the
-state-of-the-art methods in both tasks. We also obtain further significant
-improvements when we combine graph embeddings with textual features,
-demonstrating that social network and language information are complementary.
-"
-7114,1804.04164,"Hannah Kim, Denys Katerenchuk, Daniel Billet, Jun Huan, Haesun Park,
 - Boyang Li",Understanding Actors and Evaluating Personae with Gaussian Embeddings,cs.CY cs.CL cs.MM cs.SI," Understanding narrative content has become an increasingly popular topic.
-Nonetheless, research on identifying common types of narrative characters, or
-personae, is impeded by the lack of automatic and broad-coverage evaluation
-methods. We argue that computationally modeling actors provides benefits,
-including novel evaluation mechanisms for personae. 
Specifically, we propose
-two actor-modeling tasks, cast prediction and versatility ranking, which can
-capture complementary aspects of the relation between actors and the characters
-they portray. For an actor model, we present a technique for embedding actors,
-movies, character roles, genres, and descriptive keywords as Gaussian
-distributions and translation vectors, where the Gaussian variance corresponds
-to actors' versatility. Empirical results indicate that (1) the technique
-considerably outperforms TransE (Bordes et al. 2013) and ablation baselines and
-(2) automatically identified persona topics (Bamman, O'Connor, and Smith 2013)
-yield statistically significant improvements in both tasks, whereas simplistic
-persona descriptors including age and gender perform inconsistently, validating
-prior research.
-"
-7115,1804.04191,"Shimei Pan, Tao Ding",Automatically Infer Human Traits and Behavior from Social Media Data,cs.SI cs.CL cs.IR," Given the complexity of human minds and their behavioral flexibility,
-sophisticated data analysis is required to sift through a large amount of
-human behavioral evidence in order to model human minds and predict human
-behavior. People currently spend a significant amount of time on social media
-such as Twitter and Facebook. Thus many aspects of their lives and behaviors
-have been digitally captured and continuously archived on these platforms.
-This makes social media a great source of large, rich and diverse human
-behavioral evidence. In this paper, we survey the recent work on applying
-machine learning to infer human traits and behavior from social media data. We
-will also point out several future research directions.
-"
-7116,1804.04205,"Ziyi Zhao, Krittaphat Pugdeethosapol, Sheng Lin, Zhe Li, Caiwen Ding,
 - Yanzhi Wang, Qinru Qiu",Learning Topics using Semantic Locality,cs.LG cs.CL cs.IR stat.ML," Topic modeling discovers the latent topic probabilities of given text
-documents. To generate more meaningful topics that better represent a given
-document, we propose a new feature extraction technique that can be used in
-the data preprocessing stage. The method consists of three steps. First, it
-generates words/word-pairs from every single document. Second, it applies a
-two-way TF-IDF algorithm to the words/word-pairs for semantic filtering.
-Third, it uses the K-means algorithm to merge word pairs that have similar
-semantic meanings.
- Experiments are carried out on the Open Movie Database (OMDb), Reuters
-Dataset and 20NewsGroup Dataset. The mean Average Precision score is used as
-the evaluation metric. We compare our results with those of other
-state-of-the-art topic models, such as Latent Dirichlet Allocation and
-traditional Restricted Boltzmann Machines. Our proposed data preprocessing can
-improve the generated topic accuracy by up to 12.99\%.
-"
-7117,1804.04211,"Maryam Fanaeepour, Adam Makarucha, Jey Han Lau","Evaluating Word Embedding Hyper-Parameters for Similarity and Analogy
 - Tasks",cs.CL," The versatility of word embeddings for various applications is attracting
-researchers from various fields. However, the impact of hyper-parameters when
-training an embedding model is often poorly understood. How much do
-hyper-parameters such as vector dimensions and corpus size affect the quality
-of embeddings, and how do these results translate to downstream applications?
-Using standard embedding evaluation metrics and datasets, we conduct a study to
-empirically measure the impact of these hyper-parameters. 
-"
-7118,1804.04212,"Hugo Caselles-Dupr\'e, Florian Lesaint, Jimena Royo-Letelier",Word2Vec applied to Recommendation: Hyperparameters Matter,cs.IR cs.CL cs.LG stat.ML," Skip-gram with negative sampling, a popular variant of Word2vec originally
-designed and tuned to create word embeddings for Natural Language Processing,
-has been used to create item embeddings with successful applications in
-recommendation. While these fields neither share the same type of data nor
-evaluate on the same tasks, recommendation applications tend to use the same
-already-tuned hyperparameter values, even though optimal hyperparameter values
-are often known to be data- and task-dependent. We thus investigate the
-marginal importance of each hyperparameter in a recommendation setting through
-large hyperparameter grid searches on various datasets. Results reveal that
-optimizing neglected hyperparameters, namely negative sampling distribution,
-number of epochs, subsampling parameter and window-size, significantly improves
-performance on a recommendation task, and can increase it by an order of
-magnitude. Importantly, we find that optimal hyperparameter configurations for
-Natural Language Processing tasks and Recommendation tasks are noticeably
-different.
-"
-7119,1804.04225,"Yue Liu, Tao Ge, Kusum S. Mathews, Heng Ji, Deborah L. McGuinness","Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical
 - Abbreviation Expansion",cs.CL cs.AI," In the medical domain, identifying and expanding abbreviations in clinical
-texts is a vital task for both better human and machine understanding. It is a
-challenging task because many abbreviations are ambiguous especially for
-intensive care medicine texts, in which phrase abbreviations are frequently
-used. Besides the fact that there is no universal dictionary of clinical
-abbreviations and no universal rules for abbreviation writing, such texts are
-difficult to acquire, expensive to annotate and sometimes even confusing to
-domain experts. This paper proposes a novel and effective approach - exploiting
-task-oriented resources to learn word embeddings for expanding abbreviations in
-clinical notes. We achieved 82.27% accuracy, close to expert human performance.
-"
-7120,1804.04242,"Han Wang, Ye Wang, Xinxiang Zhang, Mi Lu, Yoonsuck Choe, Jingjing Cao",English Out-of-Vocabulary Lexical Evaluation Task,cs.CL," Unlike previous unknown-noun tagging tasks, this is the first attempt to
-focus on out-of-vocabulary (OOV) lexical evaluation tasks that do not require
-any prior knowledge. The OOV words are words that only appear in test samples.
-The goal of the tasks is to provide solutions for OOV lexical classification
-and prediction. The tasks require annotators to infer the attributes of the
-OOV words based on their related contexts. Then, we utilize unsupervised word
-embedding methods such as Word2Vec and Word2GM to perform the baseline
-experiments on the categorical classification task and the OOV word attribute
-prediction tasks.
-"
-7121,1804.04257,"Mai ElSherief, Vivek Kulkarni, Dana Nguyen, William Yang Wang,
 - Elizabeth Belding","Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social
 - Media",cs.CL cs.SI," While social media empowers freedom of expression and individual voices, it
-also enables anti-social behavior, online harassment, cyberbullying, and hate
-speech. 
In this paper, we deepen our understanding of online hate speech by
-focusing on a largely neglected but crucial aspect of hate speech -- its
-target: either ""directed"" towards a specific person or entity, or ""generalized""
-towards a group of people sharing a common protected characteristic. We perform
-the first linguistic and psycholinguistic analysis of these two forms of hate
-speech and reveal the presence of interesting markers that distinguish these
-types of hate speech. Our analysis reveals that Directed hate speech, in
-addition to being more personal and directed, is more informal, angrier, and
-often explicitly attacks the target (via name calling) with fewer analytic
-words and more words suggesting authority and influence. Generalized hate
-speech, on the other hand, is dominated by religious hate, is characterized by
-the use of lethal words such as murder, exterminate, and kill; and quantity
-words such as million and many. Altogether, our work provides a data-driven
-analysis of the nuances of online hate speech that enables not only a deepened
-understanding of hate speech and its social implications but also its
-detection.
-"
-7122,1804.04262,"Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito,
 - Fernando Villavicencio, Tomi Kinnunen, Zhenhua Ling","The Voice Conversion Challenge 2018: Promoting Development of Parallel
 - and Nonparallel Methods",eess.AS cs.CL cs.SD stat.ML," We present the Voice Conversion Challenge 2018, designed as a follow-up to
-the 2016 edition with the aim of providing a common framework for evaluating
-and comparing different state-of-the-art voice conversion (VC) systems. The
-objective of the challenge was to perform speaker conversion (i.e. transform
-the vocal identity) of a source speaker to a target speaker while maintaining
-linguistic information. As an update to the previous challenge, we considered
-both parallel and non-parallel data to form the Hub and Spoke tasks,
-respectively. A total of 23 teams from around the world submitted their
-systems; 11 of them additionally participated in the optional Spoke task. A
-large-scale crowdsourced perceptual evaluation was then carried out to rate the
-submitted converted speech in terms of naturalness and similarity to the target
-speaker identity. In this paper, we present a brief summary of the
-state-of-the-art techniques for VC, followed by a detailed explanation of the
-challenge tasks and the results that were obtained.
-"
-7123,1804.04264,"Phu Mon Htut, Samuel R. Bowman, Kyunghyun Cho",Training a Ranking Function for Open-Domain Question Answering,cs.CL cs.IR," In recent years, there have been amazing advances in deep learning methods
-for machine reading. In machine reading, the machine reader has to extract the
-answer from the given ground truth paragraph. Recently, the state-of-the-art
-machine reading models achieve human-level performance on SQuAD, which is a
-reading comprehension-style question answering (QA) task. The success of
-machine reading has inspired researchers to combine information retrieval with
-machine reading to tackle open-domain QA. However, these systems perform poorly
-compared to reading comprehension-style QA because it is difficult to retrieve
-the pieces of paragraphs that contain the answer to the question. In this
-study, we propose two neural network rankers that assign scores to different
-passages based on their likelihood of containing the answer to a given
-question. 
Additionally, we analyze the relative importance of semantic
-similarity and word-level relevance matching in open-domain QA.
-"
-7124,1804.04266,"Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen and Dinh Phung",A Capsule Network-based Embedding Model for Search Personalization,cs.CL cs.IR," Search personalization aims to tailor search results to each specific user
-based on the user's personal interests and preferences (i.e., the user
-profile). Recent research approaches search personalization by modelling the
-potential 3-way relationship between the submitted query, the user and the
-search results (i.e., documents). That relationship is then used to personalize
-the search results to that user. In this paper, we introduce a novel embedding
-model based on capsule networks, a recent breakthrough in deep learning, to
-model the 3-way relationships for search personalization. In the model, each
-user (submitted query or returned document) is embedded by a vector in the same
-vector space. The 3-way relationship is described as a triple of (query, user,
-document) which is then modeled as a 3-column matrix containing the three
-embedding vectors. After that, the 3-column matrix is fed into a deep learning
-architecture to re-rank the search results returned by a basis ranker.
-Experimental results on query logs from a commercial web search engine show
-that our model achieves better performance than the basis ranker as well as
-strong search personalization baselines.
-"
-7125,1804.04380,"Alon Rozental, Daniel Fleischer","Amobee at SemEval-2018 Task 1: GRU Neural Network with a CNN Attention
 - Mechanism for Sentiment Classification",cs.CL stat.ML," This paper describes the participation of Amobee in the shared sentiment
-analysis task at SemEval 2018. We participated in all the English sub-tasks and
-the Spanish valence tasks. Our system consists of three parts: training
-task-specific word embeddings, training a model consisting of
-gated-recurrent-units (GRU) with a convolution neural network (CNN) attention
-mechanism and training stacking-based ensembles for each of the sub-tasks. Our
-algorithm reached 3rd and 1st places in the valence ordinal classification
-sub-tasks in English and Spanish, respectively.
-"
-7126,1804.04475,"Mitodru Niyogi, Kripabandhu Ghosh, Arnab Bhattacharya","Learning Multilingual Embeddings for Cross-Lingual Information Retrieval
 - in the Presence of Topically Aligned Corpora",cs.IR cs.CL," Cross-lingual information retrieval is a challenging task in the absence of
-aligned parallel corpora. In this paper, we address this problem by considering
-topically aligned corpora designed for evaluating an IR setup. To emphasize, we
-use neither sentence-aligned nor document-aligned corpora, nor do we use any
-language-specific resources such as a dictionary, thesaurus, or grammar rules.
-Instead, we use an embedding into a common space and learn word correspondences
-directly from there. We test our proposed approach for bilingual IR on standard
-FIRE datasets for Bangla, Hindi and English. The proposed method is superior
-to the state-of-the-art method not only for IR evaluation measures but also in
-terms of time requirements. We extend our
-method successfully to the trilingual setting. 
-"
-7127,1804.04526,"Simon Gottschalk, Elena Demidova",EventKG: A Multilingual Event-Centric Temporal Knowledge Graph,cs.CL cs.DB," One of the key requirements to facilitate semantic analytics of information
-regarding contemporary and historical events on the Web, in the news and in
-social media is the availability of reference knowledge repositories containing
-comprehensive representations of events and temporal relations. Existing
-knowledge graphs, with popular examples including DBpedia, YAGO and Wikidata,
-focus mostly on entity-centric information and are insufficient in terms of
-their coverage and completeness with respect to events and temporal relations.
-EventKG, presented in this paper, is a multilingual event-centric temporal
-knowledge graph that addresses this gap. EventKG incorporates over 690 thousand
-contemporary and historical events and over 2.3 million temporal relations
-extracted from several large-scale knowledge graphs and semi-structured sources
-and makes them available through a canonical representation.
-"
-7128,1804.04589,Yue Dong,A Survey on Neural Network-Based Summarization Methods,cs.CL," Automatic text summarization, the automated process of shortening a text
-while preserving the main ideas of the document(s), is a critical research area
-in natural language processing. The aim of this literature review is to survey
-the recent work on neural-based models in automatic text summarization. We
-examine in detail ten state-of-the-art neural-based summarizers: five
-abstractive models and five extractive models. In addition, we discuss the
-related techniques that can be applied to the summarization tasks and present
-promising paths for future research in neural-based summarization.
-"
-7129,1804.04749,Christoph Treude and Markus Wagner,"Predicting Good Configurations for GitHub and Stack Overflow Topic
 - Models",cs.CL cs.NE," Software repositories contain large amounts of textual data, ranging from
-source code comments and issue descriptions to questions, answers, and comments
-on Stack Overflow. To make sense of this textual data, topic modelling is
-frequently used as a text-mining tool for the discovery of hidden semantic
-structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used
-topic model that aims to explain the structure of a corpus by grouping texts.
-LDA requires multiple parameters to work well, and there are only rough and
-sometimes conflicting guidelines available on how these parameters should be
-set. In this paper, we contribute (i) a broad study of parameters to arrive at
-good local optima for GitHub and Stack Overflow text corpora, (ii) an
-a-posteriori characterisation of text corpora related to eight programming
-languages, and (iii) an analysis of corpus feature importance via per-corpus
-LDA configuration. We find that (1) popular rules of thumb for topic modelling
-parameter configuration are not applicable to the corpora used in our
-experiments, (2) corpora sampled from GitHub and Stack Overflow have different
-characteristics and require different configurations to achieve good model fit,
-and (3) we can predict good configurations for unseen corpora reliably. These
-findings support researchers and practitioners in efficiently determining
-suitable configurations for topic modelling when analysing textual data
-contained in software repositories. 
-"
-7130,1804.04838,Duygu Altinok,"An Ontology-Based Dialogue Management System for Banking and Finance
 - Dialogue Systems",cs.CL," Keeping the dialogue state in dialogue systems is a notoriously difficult
-task. We introduce an ontology-based dialogue manager (OntoDM), a dialogue
-manager that keeps the state of the conversation, provides a basis for anaphora
-resolution and drives the conversation via domain ontologies. The banking and
-finance area promises great potential for disambiguating the context via a rich
-set of products and specificity of proper nouns, named entities and verbs. We
-used ontologies both as a knowledge base and a basis for the dialogue manager;
-the knowledge base component and dialogue manager components coalesce in a
-sense. Domain knowledge is used to track Entities of Interest, i.e. nodes
-(classes) of the ontology which happen to be products and services. In this way
-we also introduced conversation memory and attention in a sense. We finely
-blended linguistic methods, domain-driven keyword ranking and domain ontologies
-to enable domain-driven conversation. The proposed framework is used in our
-in-house German-language banking and finance chatbots. General challenges of
-German-language processing, as well as language models and lexicons for finance
-and banking domain chatbots, are also introduced. This work is still in
-progress, hence no success metrics have been introduced yet.
-"
-7131,1804.05017,"Qi Wang, Yuhang Xia, Yangming Zhou, Tong Ruan, Daqi Gao, Ping He","Incorporating Dictionaries into Deep Neural Networks for the Chinese
 - Clinical Named Entity Recognition",cs.CL," Clinical Named Entity Recognition (CNER) aims to identify and classify
-clinical terms such as diseases, symptoms, treatments, exams, and body parts in
-electronic health records, which is a fundamental and crucial task for clinical
-and translational research. In recent years, deep neural networks have achieved
-significant success in named entity recognition and many other Natural Language
-Processing (NLP) tasks. Most of these algorithms are trained end to end, and
-can automatically learn features from large scale labeled datasets. However,
-these data-driven methods typically lack the capability of processing rare or
-unseen entities. Previous statistical methods and feature engineering practice
-have demonstrated that human knowledge can provide valuable information for
-handling rare and unseen cases. In this paper, we address the problem by
-incorporating dictionaries into deep neural networks for the Chinese CNER task.
-Two different architectures that extend the Bi-directional Long Short-Term
-Memory (Bi-LSTM) neural network and five different feature representation
-schemes are proposed to handle the task. Computational results on the CCKS-2017
-Task 2 benchmark dataset show that the proposed method achieves highly
-competitive performance compared with the state-of-the-art deep learning
-methods.
-"
-7132,1804.05038,Jerry Quinn and Miguel Ballesteros,Pieces of Eight: 8-bit Neural Machine Translation,cs.CL," Neural machine translation has achieved levels of fluency and adequacy that
-would have been surprising a short time ago. Output quality is extremely
-relevant for industry purposes; however, it is equally important to produce
-results in the shortest time possible, mainly for latency-sensitive
-applications and to control cloud hosting costs. 
In this paper we show the
-effectiveness of translating with 8-bit quantization for models that have been
-trained using 32-bit floating point values. Results show that 8-bit translation
-makes a non-negligible impact in terms of speed with no degradation in accuracy
-and adequacy.
-"
-7133,1804.05088,"Ian Stewart, Yuval Pinter, Jacob Eisenstein","S\'i o no, qu\`e penses? Catalonian Independence and Linguistic Identity
 - on Social Media",cs.CL cs.SI," Political identity is often manifested in language variation, but the
-relationship between the two is still relatively unexplored from a quantitative
-perspective. This study examines the use of Catalan, a language local to the
-semi-autonomous region of Catalonia in Spain, on Twitter in discourse related
-to the 2017 independence referendum. We corroborate prior findings that
-pro-independence tweets are more likely to include the local language than
-anti-independence tweets. We also find that Catalan is used more often in
-referendum-related discourse than in other contexts, contrary to prior findings
-on language variation. This suggests a strong role for the Catalan language in
-the expression of Catalonian political identity.
-"
-7134,1804.05095,"Priya Rani, Atul Kr. Ojha, Girish Nath Jha",Automatic Language Identification System for Hindi and Magahi,cs.CL," Language identification has become a prerequisite for all kinds of automated
-text processing systems. In this paper, we present a rule-based language
-identifier tool for two closely related Indo-Aryan languages: Hindi and Magahi.
-This system has currently achieved an accuracy of approximately 86.34%. We
-hope to improve this in the future. Automatic language identification will be
-significant for the accuracy of the output of Web crawlers.
-"
-7135,1804.05166,"Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye,
 - and Yifan Gong",Developing Far-Field Speaker System Via Teacher-Student Learning,cs.CL," In this study, we develop the keyword spotting (KWS) and acoustic model (AM)
-components in a far-field speaker system. Specifically, we use teacher-student
-(T/S) learning to adapt a close-talk well-trained production AM to far-field by
-using parallel close-talk and simulated far-field data. We also use T/S
-learning to compress a large-size KWS model into a small-size one to fit the
-device computational cost. Without the need of transcription, T/S learning well
-utilizes untranscribed data to boost the model performance in both the AM
-adaptation and KWS model compression. We further optimize the models with
-sequence discriminative training and live data to reach the best performance of
-systems. The adapted AM improved from the baseline by 72.60% and 57.16%
-relative word error rate reduction on play-back and live test data,
-respectively. The final KWS model size was reduced by 27 times from a
-large-size KWS model without losing accuracy.
-"
-7136,1804.05253,Debanjan Ghosh and Smaranda Muresan,"""With 1 follower I must be AWESOME :P"". Exploring the role of irony
 - markers in irony recognition",cs.CL," Conversations in social media often contain the use of irony or sarcasm, when
-the users say the opposite of what they really mean. Irony markers are the
-meta-communicative clues that inform the reader that an utterance is ironic. We
-propose a thorough analysis of theoretically grounded irony markers in two
-social media platforms: $Twitter$ and $Reddit$. 
Classification and frequency
-analysis show that for $Twitter$, typographic markers such as emoticons and
-emojis are the most discriminative markers to recognize ironic utterances,
-while for $Reddit$ the morphological markers (e.g., interjections, tag
-questions) are the most discriminative.
-"
-7137,1804.05260,"Danushka Bollegala, Vincent Atanasov, Takanori Maehara, Ken-ichi
 - Kawarabayashi",ClassiNet -- Predicting Missing Features for Short-Text Classification,cs.CL cs.AI cs.CV cs.LG," The fundamental problem in short-text classification is \emph{feature
-sparseness} -- the lack of feature overlap between a trained model and a test
-instance to be classified. We propose \emph{ClassiNet} -- a network of
-classifiers trained for predicting missing features in a given instance, to
-overcome the feature sparseness problem. Using a set of unlabeled training
-instances, we first learn binary classifiers as feature predictors for
-predicting whether a particular feature occurs in a given instance. Next, each
-feature predictor is represented as a vertex $v_i$ in the ClassiNet where a
-one-to-one correspondence exists between feature predictors and vertices. The
-weight of the directed edge $e_{ij}$ connecting a vertex $v_i$ to a vertex
-$v_j$ represents the conditional probability that given $v_i$ exists in an
-instance, $v_j$ also exists in the same instance. We show that ClassiNets
-generalize word co-occurrence graphs by considering implicit co-occurrences
-between features. We extract numerous features from the trained ClassiNet to
-overcome feature sparseness. In particular, for a given instance $\vec{x}$, we
-find similar features from ClassiNet that did not appear in $\vec{x}$, and
-append those features to the representation of $\vec{x}$. Moreover, we propose
-a method based on graph propagation to find features that are indirectly
-related to a given short-text. We evaluate ClassiNets on several benchmark
-datasets for short-text classification. Our experimental results show that by
-using ClassiNet, we can statistically significantly improve the accuracy in
-short-text classification tasks, without having to use any external resources
-such as thesauri for finding related features.
-"
-7138,1804.05262,"Joshua Coates, Danushka Bollegala","Frustratingly Easy Meta-Embedding -- Computing Meta-Embeddings by
 - Averaging Source Word Embeddings",cs.CL," Creating accurate meta-embeddings from pre-trained source embeddings has
-received attention lately. Methods based on global and locally linear
-transformation and concatenation have been shown to produce accurate
-meta-embeddings. In this paper, we show that the arithmetic mean of two
-distinct word embedding sets yields a performant meta-embedding that is
-comparable to, or better than, those obtained by more complex meta-embedding
-learning methods. The result seems counter-intuitive given that vector spaces
-in different source embeddings are not comparable and cannot be simply
-averaged. We give insight into why averaging can still produce accurate
-meta-embedding despite the incomparability of the source vector spaces.
-"
-7139,1804.05276,"Ashok Deb, Kristina Lerman, Emilio Ferrara",Predicting Cyber Events by Leveraging Hacker Sentiment,cs.CL," Recent high-profile cyber attacks exemplify why organizations need better
-cyber defenses. Cyber threats are hard to accurately predict because attackers
-usually try to mask their traces. However, they often discuss exploits and
-techniques on hacking forums. 
The community behavior of the hackers may provide
-insights into groups' collective malicious activity. We propose a novel
-approach to predict cyber events using sentiment analysis. We test our approach
-using cyber attack data from 2 major business organizations. We consider 3
-types of events: malicious software installation, malicious destination visits,
-and malicious emails that surpassed the target organizations' defenses. We
-construct predictive signals by applying sentiment analysis on hacker forum
-posts to better understand hacker behavior. We analyze over 400K posts
-generated between January 2016 and January 2018 on over 100 hacking forums, on
-both the surface and Dark Web. We find that some forums have significantly more
-predictive power than others. Sentiment-based models that leverage specific
-forums can outperform state-of-the-art deep learning and time-series models in
-forecasting cyber attacks weeks ahead of the events.
-"
-7140,1804.05294,"P. Le\'on-Ara\'uz, A. San Mart\'in","The EcoLexicon Semantic Sketch Grammar: from Knowledge Patterns to Word
 - Sketches",cs.CL," Many projects have applied knowledge patterns (KPs) to the retrieval of
-specialized information. Yet terminologists still rely on manual analysis of
-concordance lines to extract semantic information, since there are no
-user-friendly publicly available applications enabling them to find
-knowledge-rich contexts (KRCs). To fill this void, we have created the KP-based
-EcoLexicon Semantic Sketch Grammar (ESSG) in the well-known corpus query system
-Sketch Engine. For the first time, the ESSG is now publicly available in Sketch
-Engine to query the EcoLexicon English Corpus. Additionally, reusing the ESSG
-in any English corpus uploaded by the user enables Sketch Engine to extract
-KRCs codifying generic-specific, part-whole, location, cause and function
-relations, because most of the KPs are domain-independent. The information is
-displayed in the form of summary lists (word sketches) containing the pairs of
-terms linked by a given semantic relation. This paper describes the process of
-building a KP-based sketch grammar with special focus on the last stage,
-namely, the evaluation for refinement purposes. We conducted an initial
-shallow precision and recall evaluation of the 64 English sketch grammar rules
-created so far for hyponymy, meronymy and causality. Precision was measured
-based on a random sample of concordances extracted from each word sketch type.
-Recall was assessed based on a random sample of concordances where known term
-pairs are found. The results are necessary for the improvement and refinement
-of the ESSG. The noise of false positives helped to further specify the rules,
-whereas the silence of false negatives allows us to find useful new patterns.
-"
-7141,1804.05306,"Che-Ping Tsai, Yi-Lin Tuan and Lin-shan Lee","Transcribing Lyrics From Commercial Song Audio: The First Step Towards
 - Singing Content Processing",cs.SD cs.CL eess.AS," Spoken content processing (such as retrieval and browsing) is maturing, but
-singing content is still almost completely left out. Songs are human voice
-carrying plenty of semantic information, just as speech, and may be considered
-a special type of speech with highly flexible prosody. The various problems in
-song audio, for example the significantly changing phone duration over highly
-flexible pitch contours, make the recognition of lyrics from song audio much
-more difficult. This paper reports an initial attempt towards this goal. 
-We collected music-removed versions of English songs directly from commercial
-singing content. The best results were obtained by a TDNN-LSTM with data
-augmentation using 3-fold speed perturbation plus some special approaches. The
-WER achieved (73.90%) was significantly lower than the baseline (96.21%), but
-still relatively high.
-"
-7142,1804.05374,"Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio",Twin Regularization for online speech recognition,eess.AS cs.AI cs.CL cs.LG cs.NE," Online speech recognition is crucial for developing natural human-machine
-interfaces. This modality, however, is significantly more challenging than
-off-line ASR, since real-time/low-latency constraints inevitably hinder the use
-of future information, which is known to be very helpful to perform robust
-predictions. A popular solution to mitigate this issue consists of feeding
-neural acoustic models with context windows that gather some future frames.
-This introduces a latency which depends on the number of employed look-ahead
-features. This paper explores a different approach, based on estimating the
-future rather than waiting for it. Our technique encourages the hidden
-representations of a unidirectional recurrent network to embed some useful
-information about the future. Inspired by a recently proposed technique called
-Twin Networks, we add a regularization term that forces forward hidden states
-to be as close as possible to cotemporal backward ones, computed by a ""twin""
-neural network running backwards in time. The experiments, conducted on a
-number of datasets, recurrent architectures, input features, and acoustic
-conditions, have shown the effectiveness of this approach. One important
-advantage is that our method does not introduce any additional computation at
-test time if compared to standard unidirectional recurrent networks.
-"
-7143,1804.05388,"Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu","Introducing two Vietnamese Datasets for Evaluating Semantic Models of
 - (Dis-)Similarity and Relatedness",cs.CL," We present two novel datasets for the low-resource language Vietnamese to
-assess models of semantic similarity: ViCon comprises pairs of synonyms and
-antonyms across word classes, thus offering data to distinguish between
-similarity and dissimilarity. ViSim-400 provides degrees of similarity across
-five semantic relations, as rated by human judges. The two datasets are
-verified through standard co-occurrence and neural network models, showing
-results comparable to the respective English datasets.
-"
-7144,1804.05392,"Kenton Lee, Luheng He, Luke Zettlemoyer",Higher-order Coreference Resolution with Coarse-to-fine Inference,cs.CL," We introduce a fully differentiable approximation to higher-order inference
-for coreference resolution. Our approach uses the antecedent distribution from
-a span-ranking architecture as an attention mechanism to iteratively refine
-span representations. This enables the model to softly consider multiple hops
-in the predicted clusters. To alleviate the computational cost of this
-iterative process, we introduce a coarse-to-fine approach that incorporates a
-less accurate but more efficient bilinear factor, enabling more aggressive
-pruning without hurting accuracy. Compared to the existing state-of-the-art
-span-ranking approach, our model significantly improves accuracy on the English
-OntoNotes benchmark, while being far more computationally efficient. 
-"
-7145,1804.05398,Radhika Mamidi,Context and Humor: Understanding Amul advertisements of India,cs.CL," Contextual knowledge is the most important element in understanding language.
-By contextual knowledge we mean both general knowledge and discourse knowledge,
-i.e. knowledge of the situational context, background knowledge and the
-co-textual context [10]. In this paper, we will discuss the importance of
-contextual knowledge in understanding the humor present in the cartoon-based
-Amul advertisements in India. In the process, we will analyze these
-advertisements and also see if humor is an effective tool for advertising and
-thereby, for marketing. These bilingual advertisements also expect the audience
-to have the appropriate linguistic knowledge which includes knowledge of
-English and Hindi vocabulary, morphology and syntax. Different techniques like
-punning, portmanteaus and parodies of popular proverbs, expressions, acronyms,
-famous dialogues, songs, etc. are employed to convey the message in a humorous
-way. The present study will concentrate on these linguistic cues and the
-required context for understanding wit and humor.
-"
-7146,1804.05408,"Sean MacAvaney, Luca Soldaini, Arman Cohan, Nazli Goharian","GU IRLAB at SemEval-2018 Task 7: Tree-LSTMs for Scientific Relation
 - Classification",cs.CL," SemEval 2018 Task 7 focuses on relation extraction and classification in
-scientific literature. In this work, we present our tree-based LSTM network for
-this shared task. Our approach placed 9th (of 28) for subtask 1.1 (relation
-classification), and 5th (of 20) for subtask 1.2 (relation classification with
-noisy entities). We also provide an ablation study of features included as
-input to the network.
-"
-7147,1804.05416,"Taraka Rama, Johann-Mattis List, Johannes Wahle, Gerhard J\""ager","Are Automatic Methods for Cognate Detection Good Enough for Phylogenetic
 - Reconstruction in Historical Linguistics?",cs.CL," We evaluate the performance of state-of-the-art algorithms for automatic
-cognate detection by comparing how useful automatically inferred cognates are
-for the task of phylogenetic inference compared to classical manually annotated
-cognate sets. Our findings suggest that phylogenies inferred from automated
-cognate sets come close to phylogenies inferred from expert-annotated ones,
-although on average, the latter are still superior. We conclude that future
-work on phylogenetic reconstruction can profit much from automatic cognate
-detection. Especially where scholars are merely interested in exploring the
-bigger picture of a language family's phylogeny, algorithms for automatic
-cognate detection are a useful complement for current research on language
-phylogenies.
-"
-7148,1804.05417,"Reuben Cohn-Gordon, Noah Goodman, Christopher Potts","Pragmatically Informative Image Captioning with Character-Level
 - Inference",cs.CL," We combine a neural image captioner with a Rational Speech Acts (RSA) model
-to make a system that is pragmatically informative: its objective is to produce
-captions that are not merely true but also distinguish their inputs from
-similar images. Previous attempts to combine RSA with neural image captioning
-require an inference which normalizes over the entire set of possible
-utterances. This poses a serious problem of efficiency, previously solved by
-sampling a small subset of possible utterances. We instead solve this problem
-by implementing a version of RSA which operates at the level of characters
-(""a"",""b"",""c""...) 
during the unrolling of the caption. We find that the
-utterance-level effect of referential captions can be obtained with only
-character-level decisions. Finally, we introduce an automatic method for
-testing the performance of pragmatic speaker models, and show that our model
-outperforms a non-pragmatic baseline as well as a word-level RSA captioner.
-"
-7149,1804.05435,"Peter Clark, Bhavana Dalvi, Niket Tandon","What Happened? Leveraging VerbNet to Predict the Effects of Actions in
 - Procedural Text",cs.CL cs.AI," Our goal is to answer questions about paragraphs describing processes (e.g.,
-photosynthesis). Texts of this genre are challenging because the effects of
-actions are often implicit (unstated), requiring background knowledge and
-inference to reason about the changing world states. To supply this knowledge,
-we leverage VerbNet to build a rulebase (called the Semantic Lexicon) of the
-preconditions and effects of actions, and use it along with commonsense
-knowledge of persistence to answer questions about change. Our evaluation shows
-that our system, ProComp, significantly outperforms two strong reading
-comprehension (RC) baselines. Our contributions are two-fold: the Semantic
-Lexicon rulebase itself, and a demonstration of how a simulation-based approach
-to machine reading can outperform RC methods that rely on surface cues alone.
- Since this work was performed, we have developed neural systems that
-outperform ProComp, described elsewhere (Dalvi et al., NAACL'18). However, the
-Semantic Lexicon remains a novel and potentially useful resource, and its
-integration with neural systems remains a currently unexplored opportunity for
-further improvements in machine reading about processes.
-"
-7150,1804.05448,"Xin Wang, Yuan-Fang Wang, William Yang Wang","Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal
 - Attentions for Video Captioning",cs.CL cs.AI cs.CV," A major challenge for video captioning is to combine audio and visual cues.
-Existing multi-modal fusion methods have shown encouraging results in video
-understanding. However, the temporal structures of multiple modalities at
-different granularities are rarely explored, and how to selectively fuse the
-multi-modal representations at different levels of detail remains uncharted.
-In this paper, we propose a novel hierarchically aligned cross-modal attention
-(HACA) framework to learn and selectively fuse both global and local temporal
-dynamics of different modalities. Furthermore, for the first time, we validate
-the superior performance of the deep audio features on the video captioning
-task. Finally, our HACA model significantly outperforms the previous best
-systems and achieves new state-of-the-art results on the widely used MSR-VTT
-dataset.
-"
-7151,1804.05499,Aaron Jaech and Shobhit Hathi and Mari Ostendorf,Community Member Retrieval on Social Media using Textual Information,cs.CL," This paper addresses the problem of community membership detection using only
-text features in a scenario where a small number of positive labeled examples
-defines the community. The solution introduces an unsupervised proxy task for
-learning user embeddings: user re-identification. Experiments with 16 different
-communities show that the resulting embeddings are more effective for community
-membership identification than common unsupervised representations. 
-"
-7152,1804.05630,Ismail El Bazi and Nabil Laachfoubi,Arabic Named Entity Recognition using Word Representations,cs.CL," Recent work has shown the effectiveness of word representation features
-in significantly improving supervised NER for the English language. In this
-study we investigate whether word representations can also boost supervised NER
-in Arabic. We use word representations as additional features in a Conditional
-Random Field (CRF) model and we systematically compare three popular neural
-word embedding algorithms (SKIP-gram, CBOW and GloVe) and six different
-approaches for integrating word representations into the NER system.
-Experimental results show that Brown Clustering achieves the best performance
-among the six approaches. Concerning the word embedding features, the
-clustering embedding features outperform other embedding features and the
-distributional prototypes produce the second best result. Moreover, the
-combination of Brown clusters and word embedding features provides an
-additional improvement of nearly 10% in F1-score over the baseline.
-"
-7153,1804.05685,"Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan
 - Kim, Walter Chang and Nazli Goharian","A Discourse-Aware Attention Model for Abstractive Summarization of Long
 - Documents",cs.CL," Neural abstractive summarization models have led to promising results in
-summarizing relatively short documents. We propose the first model for
-abstractive summarization of single, longer-form documents (e.g., research
-papers). Our approach consists of a new hierarchical encoder that models the
-discourse structure of a document, and an attentive discourse-aware decoder to
-generate the summary. Empirical results on two large-scale datasets of
-scientific papers show that our model significantly outperforms
-state-of-the-art models.
-"
-7154,1804.05686,Sabine Ploux and Viviane D\'eprez,"Organization and Independence or Interdependence? Study of the
 - Neurophysiological Dynamics of Syntactic and Semantic Processing",cs.CL," In this article we present a multivariate model for determining the different
-syntactic, semantic, and form (surface-structure) processes underlying the
-comprehension of simple phrases. This model is applied to EEG signals recorded
-during a reading task. The results show a hierarchical precedence of the
-neurolinguistic processes: form, then syntactic and lastly semantic processes.
-We also found (a) that verbs are at the heart of phrase syntax processing, (b)
-an interaction between syntactic movement within the phrase, and semantic
-processes derived from a person-centered reference frame. Eigenvectors of the
-multivariate model provide electrode-time profiles that separate the
-distinctive linguistic processes and/or highlight their interaction. The
-accordance of these findings with different linguistic theories is discussed.
-"
-7155,1804.05689,Sowmya Vajjala and Ziwei Zhou,"The Relevance of Text and Speech Features in Automatic Non-native
 - English Accent Identification",cs.CL," This paper describes our experiments with automatically identifying native
-accents from speech samples of non-native English speakers using low-level
-audio features, and n-gram features from manual transcriptions. Using a
-publicly available non-native speech corpus and simple audio feature
-representations that do not perform word/phoneme recognition, we show that it
-is possible to achieve close to 90% classification accuracy for this task. 
-While character n-grams perform similarly to speech features, we show that
-speech features are not affected by prompt variation, whereas n-grams are.
-Since the approach followed can be easily adapted to any language provided we
-have enough training data, we believe these results will provide useful
-insights for the development of accent recognition systems and for the study
-of accents in the context of language learning.
-"
-7156,1804.05734,"Chenhua Chen, Yue Zhang","Learning How to Self-Learn: Enhancing Self-Training Using Neural
 - Reinforcement Learning",cs.CL," Self-training is a useful strategy for semi-supervised learning, leveraging
-raw texts for enhancing model performances. Traditional self-training methods
-depend on heuristics such as model confidence for instance selection, the
-manual adjustment of which can be expensive. To address these challenges, we
-propose a deep reinforcement learning method to learn the self-training
-strategy automatically. Based on neural network representation of sentences,
-our model automatically learns an optimal policy for instance selection.
-Experimental results show that our approach outperforms the baseline solutions
-in terms of better tagging performances and stability.
-"
-7157,1804.05825,"Lena Hettinger, Alexander Dallmann, Albin Zehe, Thomas Niebler,
 - Andreas Hotho",ClaiRE at SemEval-2018 Task 7 - Extended Version,cs.CL," In this paper we describe our post-evaluation results for SemEval-2018 Task 7
-on classification of semantic relations in scientific literature for clean
-(subtask 1.1) and noisy data (subtask 1.2). This is an extended version of
-our workshop paper (Hettinger et al., 2018) including further technical details
-(Sections 3.2 and 4.3) and changes made to the preprocessing step in the
-post-evaluation phase (Section 2.1). Due to these changes, Classification of
-Relations using Embeddings (ClaiRE) achieved an improved F1 score of 75.11% for
-the first subtask and 81.44% for the second.
-"
-7158,1804.05831,"Nikita Muravyev, Alexander Panchenko, Sergei Obiedkov",Neologisms on Facebook,cs.CL," In this paper, we present a study of neologisms and loan words frequently
-occurring in Facebook user posts. We have analyzed a dataset of several million
-publicly available posts written during 2006-2013 by Russian-speaking
-Facebook users. From these, we have built a vocabulary of the most frequent
-lemmatized words missing from the OpenCorpora dictionary, the assumption being
-that many such words have entered common use only recently. This assumption is
-certainly not true for all the words extracted in this way; for that reason, we
-manually filtered the automatically obtained list in order to exclude
-non-Russian or incorrectly lemmatized words, as well as words recorded by other
-dictionaries or those occurring in texts from the Russian National Corpus. The
-result is a list of 168 words that can potentially be considered neologisms. We
-present an attempt at an etymological classification of these neologisms
-(unsurprisingly, most of them have recently been borrowed from English, but
-there are also quite a few new words composed of previously borrowed stems) and
-identify various derivational patterns. We also classify words into several
-large thematic areas, ""internet"", ""marketing"", and ""multimedia"" being among
-those with the largest number of words. 
We believe that, together with the word
-base collected in the process, they can serve as a starting point in further
-studies of neologisms and lexical processes that lead to their acceptance into
-the mainstream language.
-"
-7159,1804.05868,"Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava and Dipti
- Misra Sharma",Universal Dependency Parsing for Hindi-English Code-switching,cs.CL," Code-switching is a phenomenon of mixing grammatical structures of two or
-more languages under varied social constraints. The code-switching data differ
-so radically from the benchmark corpora used in the NLP community that the
-application of standard technologies to these data degrades their performance
-sharply. Unlike standard corpora, these data often need to go through
-additional processes such as language identification, normalization and/or
-back-transliteration for their efficient processing. In this paper, we
-investigate these indispensable processes and other problems associated with
-syntactic parsing of code-switching data and propose methods to mitigate their
-effects. In particular, we study dependency parsing of code-switching data of
-Hindi and English multilingual speakers from Twitter. We present a treebank of
-Hindi-English code-switching tweets under the Universal Dependencies scheme and
-propose a neural stacking model for parsing that efficiently leverages
-part-of-speech tag and syntactic tree annotations in the code-switching
-treebank and the preexisting Hindi and English treebanks. We also present
-normalization and back-transliteration models with a decoding process tailored
-for code-switching data. Results show that our neural stacking parser is 1.5%
-LAS points better than the augmented parsing model and our decoding process
-improves results by 3.8% LAS points over the first-best normalization and/or
-back-transliteration.
-"
-7160,1804.05918,"Zeyu Dai, Ruihong Huang","Improving Implicit Discourse Relation Classification by Modeling
- Inter-dependencies of Discourse Units in a Paragraph",cs.CL," We argue that semantic meanings of a sentence or clause cannot be
-interpreted independently from the rest of a paragraph, or independently from
-all discourse relations and the overall paragraph-level discourse structure.
-With the goal of improving implicit discourse relation classification, we
-introduce a paragraph-level neural network that models inter-dependencies
-between discourse units as well as discourse relation continuity and patterns,
-and predicts a sequence of discourse relations in a paragraph. Experimental
-results show that our model outperforms the previous state-of-the-art systems
-on the benchmark corpus of PDTB.
-"
-7161,1804.05922,"Bhuwan Dhingra, Qiao Jin, Zhilin Yang, William W. Cohen, Ruslan
- Salakhutdinov",Neural Models for Reasoning over Multiple Mentions using Coreference,cs.CL cs.LG," Many problems in NLP require aggregating information from multiple mentions
-of the same entity which may be far apart in the text. Existing Recurrent
-Neural Network (RNN) layers are biased towards short-term dependencies and
-hence not suited to such tasks. We present a recurrent layer which is instead
-biased towards coreferent dependencies. The layer uses coreference annotations
-extracted from an external system to connect entity mentions belonging to the
-same cluster.
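A schematic sketch of such a coreference-biased recurrence (an illustration of the idea, not the paper's exact layer): at each token, the recurrent input interpolates between the previous hidden state and the state at the most recent coreferent mention. The gating scheme and the antecedent-index format are assumptions:

```python
import torch
import torch.nn as nn

class CorefBiasedRNN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)
        self.gate = nn.Linear(2 * dim, 1)  # mixes sequential vs. coreferent state

    def forward(self, x, antecedents):
        # x: (seq_len, dim); antecedents[t] is the position of the most
        # recent coreferent mention for token t, or -1 if there is none.
        states = [torch.zeros(x.size(1))]
        for t in range(x.size(0)):
            prev = states[-1]
            ante = states[antecedents[t] + 1] if antecedents[t] >= 0 else prev
            a = torch.sigmoid(self.gate(torch.cat([prev, ante])))
            states.append(self.cell(x[t], a * ante + (1 - a) * prev))
        return torch.stack(states[1:])
```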
Incorporating this layer into a state-of-the-art reading
-comprehension model improves performance on three datasets -- Wikihop, LAMBADA
-and the bAbi AI tasks -- with large gains when training data is scarce.
-"
-7162,1804.05940,"Marcin Junczys-Dowmunt, Roman Grundkiewicz, Shubha Guha, Kenneth
- Heafield","Approaching Neural Grammatical Error Correction as a Low-Resource
- Machine Translation Task",cs.CL," Previously, neural methods in grammatical error correction (GEC) did not
-reach state-of-the-art results compared to phrase-based statistical machine
-translation (SMT) baselines. We demonstrate parallels between neural GEC and
-low-resource neural MT and successfully adapt several methods from low-resource
-MT to neural GEC. We further establish guidelines for trustable results in
-neural GEC and propose a set of model-independent methods for neural GEC that
-can be easily applied in most GEC settings. Proposed methods include adding
-source-side noise, domain-adaptation techniques, a GEC-specific
-training-objective, transfer learning with monolingual data, and ensembling of
-independently trained GEC models and language models. The combined effects of
-these methods result in better than state-of-the-art neural GEC models that
-outperform the previously best neural GEC systems by more than 10% M$^2$ on the
-CoNLL-2014 benchmark and 5.9% on the JFLEG test set. Non-neural
-state-of-the-art systems are outperformed by more than 2% on the CoNLL-2014
-benchmark and by 4% on JFLEG.
-"
-7163,1804.05945,"Roman Grundkiewicz, Marcin Junczys-Dowmunt","Near Human-Level Performance in Grammatical Error Correction with Hybrid
- Machine Translation",cs.CL," We combine two of the most popular approaches to automated Grammatical Error
-Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC
-based on Neural Machine Translation (NMT). The hybrid system achieves new
-state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks. This GEC
-system preserves the accuracy of SMT output and, at the same time, generates
-more fluent sentences, as is typical for NMT. Our analysis shows that the
-created systems are closer to reaching human-level performance than any other
-GEC system reported so far.
-"
-7164,1804.05958,"Julia Kreutzer, Shahram Khadivi, Evgeny Matusov, Stefan Riezler",Can Neural Machine Translation be Improved with User Feedback?,cs.CL stat.ML," We present the first real-world application of methods for improving neural
-machine translation (NMT) with human reinforcement, based on explicit and
-implicit user feedback collected on the eBay e-commerce platform. Previous work
-has been confined to simulation experiments, whereas in this paper we work with
-real logged feedback for offline bandit learning of NMT parameters. We conduct
-a thorough analysis of the available explicit user judgments---five-star
-ratings of translation quality---and show that they are not reliable enough to
-yield significant improvements in bandit learning. In contrast, we successfully
-utilize implicit task-based feedback collected in a cross-lingual search task
-to improve task-specific and machine translation quality metrics.
-"
-7165,1804.05972,"Sean MacAvaney, Amir Zeldes",A Deeper Look into Dependency-Based Word Embeddings,cs.CL," We investigate the effect of various dependency-based word embeddings on
-distinguishing between functional and domain similarity, word similarity
-rankings, and two downstream tasks in English.
Variations include word
-embeddings trained using context windows from Stanford and Universal
-dependencies at several levels of enhancement (ranging from unlabeled to
-Enhanced++ dependencies). Results are compared to basic linear contexts and
-evaluated on several datasets. We found that embeddings trained with Universal
-and Stanford dependency contexts excel at different tasks, and that enhanced
-dependencies often improve performance.
-"
-7166,1804.05990,"Hao Peng, Sam Thomson, Swabha Swayamdipta, and Noah A. Smith",Learning Joint Semantic Parsers from Disjoint Data,cs.CL," We present a new approach to learning semantic parsers from multiple
-datasets, even when the target semantic formalisms are drastically different,
-and the underlying corpora do not overlap. We handle such ""disjoint"" data by
-treating annotations for unobserved formalisms as latent structured variables.
-Building on state-of-the-art baselines, we show improvements both in
-frame-semantic parsing and semantic dependency parsing by modeling them
-jointly.
-"
-7167,1804.06004,"Katherine A. Keith, Su Lin Blodgett, and Brendan O'Connor",Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses,cs.CL," Dependency parsing research, which has made significant gains in recent
-years, typically focuses on improving the accuracy of single-tree predictions.
-However, ambiguity is inherent to natural language syntax, and communicating
-such ambiguity is important for error analysis and better-informed downstream
-applications. In this work, we propose a transition sampling algorithm to
-sample from the full joint distribution of parse trees defined by a
-transition-based parsing model, and demonstrate the use of the samples in
-probabilistic dependency analysis. First, we define the new task of dependency
-path prediction, inferring syntactic substructures over part of a sentence, and
-provide the first analysis of performance on this task. Second, we demonstrate
-the usefulness of our Monte Carlo syntax marginal method for parser error
-analysis and calibration. Finally, we use this method to propagate parse
-uncertainty to two downstream information extraction applications: identifying
-persons killed by police and semantic role assignment.
-"
-7168,1804.06024,"Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Sch\""utze","Fortification of Neural Morphological Segmentation Models for
- Polysynthetic Minimal-Resource Languages",cs.CL," Morphological segmentation for polysynthetic languages is challenging,
-because a word may consist of many individual morphemes and training data can
-be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define
-the state of the art for morphological segmentation in high-resource settings
-and for (mostly) European languages, we first show that they also obtain
-competitive performance for Mexican polysynthetic languages in minimal-resource
-settings. We then propose two novel multi-task training approaches (one with,
-one without need for external unlabeled resources), and two corresponding data
-augmentation methods, improving over the neural baseline for all languages.
-Finally, we explore cross-lingual transfer as a third way to fortify our neural
-model and show that we can train one single multi-lingual model for related
-languages while maintaining comparable or even improved performance, thus
-reducing the number of parameters by close to 75%.
We provide our morphological -segmentation datasets for Mexicanero, Nahuatl, Wixarika and Yorem Nokki for -future research. -" -7169,1804.06026,"Varun Manjunatha and Mohit Iyyer and Jordan Boyd-Graber and Larry - Davis",Learning to Color from Language,cs.CV cs.CL," Automatic colorization is the process of adding color to greyscale images. We -condition this process on language, allowing end users to manipulate a -colorized image by feeding in different captions. We present two different -architectures for language-conditioned colorization, both of which produce more -accurate and plausible colorizations than a language-agnostic version. Through -this language-based framework, we can dramatically alter colorizations by -manipulating descriptive color words in captions. -" -7170,1804.06028,Nikita Nangia and Samuel R. Bowman,ListOps: A Diagnostic Dataset for Latent Tree Learning,cs.CL," Latent tree learning models learn to parse a sentence without syntactic -supervision, and use that parse to build the sentence representation. Existing -work on such models has shown that, while they perform well on tasks like -sentence classification, they do not learn grammars that conform to any -plausible semantic or syntactic formalism (Williams et al., 2018a). Studying -the parsing ability of such models in natural language can be challenging due -to the inherent complexities of natural language, like having several valid -parses for a single sentence. In this paper we introduce ListOps, a toy dataset -created to study the parsing ability of latent tree models. ListOps sequences -are in the style of prefix arithmetic. The dataset is designed to have a single -correct parsing strategy that a system needs to learn to succeed at the task. -We show that the current leading latent tree models are unable to learn to -parse and succeed at ListOps. These models achieve accuracies worse than purely -sequential RNNs. -" -7171,1804.06035,"Jiawei Wu, Lei Li, William Yang Wang",Reinforced Co-Training,cs.CL," Co-training is a popular semi-supervised learning framework to utilize a -large amount of unlabeled data in addition to a small labeled set. Co-training -methods exploit predicted labels on the unlabeled data and select samples based -on prediction confidence to augment the training. However, the selection of -samples in existing co-training methods is based on a predetermined policy, -which ignores the sampling bias between the unlabeled and the labeled subsets, -and fails to explore the data space. In this paper, we propose a novel method, -Reinforced Co-Training, to select high-quality unlabeled samples to better -co-train on. More specifically, our approach uses Q-learning to learn a data -selection policy with a small labeled dataset, and then exploits this policy to -train the co-training classifiers automatically. Experimental results on -clickbait detection and generic text classification tasks demonstrate that our -proposed method can obtain more accurate text classification results. -" -7172,1804.06059,"Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer","Adversarial Example Generation with Syntactically Controlled Paraphrase - Networks",cs.CL," We propose syntactically controlled paraphrase networks (SCPNs) and use them -to generate adversarial examples. Given a sentence and a target syntactic form -(e.g., a constituency parse), SCPNs are trained to produce a paraphrase of the -sentence with the desired syntax. 
We show it is possible to create training
-data for this task by first doing backtranslation at a very large scale, and
-then using a parser to label the syntactic transformations that naturally occur
-during this process. Such data allows us to train a neural encoder-decoder
-model with extra inputs to specify the target syntax. A combination of
-automated and human evaluations shows that SCPNs generate paraphrases that
-follow their target specifications without decreasing paraphrase quality when
-compared to baseline (uncontrolled) paraphrase systems. Furthermore, they are
-more capable of generating syntactically adversarial examples that both (1)
-""fool"" pretrained models and (2) improve the robustness of these models to
-syntactic variation when used to augment their training data.
-"
-7173,1804.06137,"Venkatesh Duppada, Royal Jain, Sushant Hiray",SeerNet at SemEval-2018 Task 1: Domain Adaptation for Affect in Tweets,cs.CL," The paper describes the best performing system for the SemEval-2018 Affect in
-Tweets (English) sub-tasks. The system focuses on the ordinal classification
-and regression sub-tasks for valence and emotion. For ordinal classification,
-valence is classified into 7 different classes ranging from -3 to 3 whereas
-emotion is classified into 4 different classes (0 to 3) separately for each
-emotion, namely anger, fear, joy and sadness. The regression sub-tasks estimate
-the intensity of valence and each emotion. The system performs domain
-adaptation of 4 different models and creates an ensemble to give the final
-prediction. The proposed system achieved 1st position out of 75 teams that
-participated in the aforementioned sub-tasks. We outperform the baseline model
-by margins ranging from 49.2% to 76.4%, thus pushing the state-of-the-art
-significantly.
-"
-7174,1804.06189,"Alberto Poncelas, Dimitar Shterionov, Andy Way, Gideon Maillette de
- Buy Wenniger and Peyman Passban",Investigating Backtranslation in Neural Machine Translation,cs.CL," A prerequisite for training corpus-based machine translation (MT) systems --
-either Statistical MT (SMT) or Neural MT (NMT) -- is the availability of
-high-quality parallel data. This is arguably more important today than ever
-before, as NMT has been shown in many studies to outperform SMT, but mostly
-when large parallel corpora are available; in cases where data is limited, SMT
-can still outperform NMT.
- Recently researchers have shown that back-translating monolingual data can be
-used to create synthetic parallel corpora, which in turn can be used in
-combination with authentic parallel data to train a high-quality NMT system.
-Given that large collections of new parallel text become available only quite
-rarely, backtranslation has become the norm when building state-of-the-art NMT
-systems, especially in resource-poor scenarios.
- However, we assert that there are many unknown factors regarding the actual
-effects of back-translated data on the translation capabilities of an NMT
-model. Accordingly, in this work we investigate how using back-translated data
-as a training corpus -- both as a separate standalone dataset as well as
-combined with human-generated parallel data -- affects the performance of an
-NMT model. We use incrementally larger amounts of back-translated data to train
-a range of NMT systems for German-to-English, and analyse the resulting
-translation performance.
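A minimal sketch of the back-translation setup investigated above: target-side monolingual text is translated back into the source language and paired with the original sentences as synthetic training data. The translate function is a hypothetical stand-in for any trained target-to-source model:

```python
def build_synthetic_corpus(mono_target, translate):
    # Back-translate each authentic target sentence into the source
    # language and pair them as a synthetic (source, target) example.
    synthetic = []
    for tgt in mono_target:
        src = translate(tgt)          # e.g., an English-to-German model
        synthetic.append((src, tgt))  # synthetic source, authentic target
    return synthetic

# The final NMT system is then trained on authentic plus synthetic pairs:
# train_data = authentic_pairs + build_synthetic_corpus(mono, translate)
```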
-"
-7175,1804.06201,"Guangneng Hu, Yu Zhang, Qiang Yang","LCMR: Local and Centralized Memories for Collaborative Filtering with
- Unstructured Text",cs.IR cs.AI cs.CL," Collaborative filtering (CF) is the key technique for recommender systems.
-Pure CF approaches exploit the user-item interaction data (e.g., clicks, likes,
-and views) only and suffer from the sparsity issue. Items are usually
-associated with content information such as unstructured text (e.g., abstracts
-of articles and reviews of products). CF can be extended to leverage text. In
-this paper, we develop a unified neural framework to exploit interaction data
-and content information seamlessly. The proposed framework, called LCMR, is
-based on memory networks and consists of local and centralized memories for
-exploiting content information and interaction data, respectively. By modeling
-content information as local memories, LCMR attentively learns what to exploit
-with the guidance of user-item interaction. On real-world datasets, LCMR shows
-better performance than various baselines in terms of the hit
-ratio and NDCG metrics. We further conduct analyses to understand how local and
-centralized memories work for the proposed framework.
-"
-7176,1804.06323,"Ye Qi, Devendra Singh Sachan, Matthieu Felix, Sarguna Janani
- Padmanabhan, Graham Neubig","When and Why are Pre-trained Word Embeddings Useful for Neural Machine
- Translation?",cs.CL," The performance of Neural Machine Translation (NMT) systems often suffers in
-low-resource scenarios where sufficiently large-scale parallel corpora cannot
-be obtained. Pre-trained word embeddings have proven to be invaluable for
-improving performance in natural language analysis tasks, which often suffer
-from a paucity of data. However, their utility for NMT has not been extensively
-explored. In this work, we perform five sets of experiments that analyze when
-we can expect pre-trained word embeddings to help in NMT tasks. We show that
-such embeddings can be surprisingly effective in some cases -- providing gains
-of up to 20 BLEU points in the most favorable setting.
-"
-7177,1804.06333,"Atish Pawar, Vijay Mago","Similarity between Learning Outcomes from Course Objectives using
- Semantic Analysis, Blooms taxonomy and Corpus statistics",cs.CL," The course description provided by instructors is an essential piece of
-information as it defines what is expected from the instructor and what he/she
-is going to deliver during a particular course. One of the key components of a
-course description is the Learning Objectives section. The contents of this
-section are used by program managers who are tasked to compare and match two
-different courses during the development of Transfer Agreements between various
-institutions. This research introduces the development of semantic similarity
-algorithms to calculate the similarity between two learning objectives of the
-same domain. We present a novel methodology that deals with semantic
-similarity by using a previously established algorithm and integrating it with
-the domain corpus utilizing domain statistics. The disambiguated domain serves
-as supervised learning data for the algorithm. We also introduce the Bloom
-Index to calculate the similarity between action verbs in the Learning
-Objectives with reference to Bloom's taxonomy.
-" -7178,1804.06385,Laura Perez-Beltrachini and Mirella Lapata,Bootstrapping Generators from Noisy Data,cs.CL," A core step in statistical data-to-text generation concerns learning -correspondences between structured data representations (e.g., facts in a -database) and associated texts. In this paper we aim to bootstrap generators -from large scale datasets where the data (e.g., DBPedia facts) and related -texts (e.g., Wikipedia abstracts) are loosely aligned. We tackle this -challenging task by introducing a special-purpose content selection mechanism. -We use multi-instance learning to automatically discover correspondences -between data and text pairs and show how these can be used to enhance the -content signal while training an encoder-decoder architecture. Experimental -results demonstrate that models trained with content-specific objectives -improve upon a vanilla encoder-decoder which solely relies on soft attention. -" -7179,1804.06437,Juncen Li and Robin Jia and He He and Percy Liang,"Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style - Transfer",cs.CL," We consider the task of text attribute transfer: transforming a sentence to -alter a specific attribute (e.g., sentiment) while preserving its -attribute-independent content (e.g., changing ""screen is just the right size"" -to ""screen is too small""). Our training data includes only sentences labeled -with their attribute (e.g., positive or negative), but not pairs of sentences -that differ only in their attributes, so we must learn to disentangle -attributes from attribute-independent content in an unsupervised way. Previous -work using adversarial methods has struggled to produce high-quality outputs. -In this paper, we propose simpler methods motivated by the observation that -text attributes are often marked by distinctive phrases (e.g., ""too small""). -Our strongest method extracts content words by deleting phrases associated with -the sentence's original attribute value, retrieves new phrases associated with -the target attribute, and uses a neural model to fluently combine these into a -final output. On human evaluation, our best method generates grammatical and -appropriate responses on 22% more inputs than the best previous system, -averaged over three attribute transfer datasets: altering sentiment of reviews -on Yelp, altering sentiment of reviews on Amazon, and altering image captions -to be more romantic or humorous. -" -7180,1804.06439,"Nicolas Fiorini, Zhiyong Lu",Personalized neural language models for real-world query auto completion,cs.CL cs.AI," Query auto completion (QAC) systems are a standard part of search engines in -industry, helping users formulate their query. Such systems update their -suggestions after the user types each character, predicting the user's intent -using various signals - one of the most common being popularity. Recently, deep -learning approaches have been proposed for the QAC task, to specifically -address the main limitation of previous popularity-based methods: the inability -to predict unseen queries. In this work we improve previous methods based on -neural language modeling, with the goal of building an end-to-end system. We -particularly focus on using real-world data by integrating user information for -personalized suggestions when possible. We also make use of time information -and study how to increase diversity in the suggestions while studying the -impact on scalability. 
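A toy sketch of the completion setting described above: rank previously seen completions by popularity and fall back to a language model for unseen prefixes, which is the paper's motivation for neural QAC. The lm_score function, the query log, and the candidate suffixes are hypothetical stand-ins:

```python
from collections import Counter

query_log = Counter(["weather today", "weather tomorrow", "web mail"])

def complete(prefix, lm_score, k=3):
    seen = [q for q in query_log if q.startswith(prefix)]
    if seen:
        # Popularity-based ranking works only for previously seen prefixes.
        return sorted(seen, key=query_log.__getitem__, reverse=True)[:k]
    # A neural language model can score completions for unseen prefixes.
    candidates = [prefix + s for s in (" today", " tomorrow", " near me")]
    return sorted(candidates, key=lm_score, reverse=True)[:k]
```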
Our empirical results demonstrate a marked improvement -on two separate datasets over previous best methods in both accuracy and -scalability, making a step towards neural query auto-completion in production -search engines. -" -7181,1804.06440,"Sweta Karlekar, Tong Niu, Mohit Bansal","Detecting Linguistic Characteristics of Alzheimer's Dementia by - Interpreting Neural Models",cs.CL," Alzheimer's disease (AD) is an irreversible and progressive brain disease -that can be stopped or slowed down with medical treatment. Language changes -serve as a sign that a patient's cognitive functions have been impacted, -potentially leading to early diagnosis. In this work, we use NLP techniques to -classify and analyze the linguistic characteristics of AD patients using the -DementiaBank dataset. We apply three neural models based on CNNs, LSTM-RNNs, -and their combination, to distinguish between language samples from AD and -control patients. We achieve a new independent benchmark accuracy for the AD -classification task. More importantly, we next interpret what these neural -models have learned about the linguistic characteristics of AD patients, via -analysis based on activation clustering and first-derivative saliency -techniques. We then perform novel automatic pattern discovery inside activation -clusters, and consolidate AD patients' distinctive grammar patterns. -Additionally, we show that first derivative saliency can not only rediscover -previous language patterns of AD patients, but also shed light on the -limitations of neural models. Lastly, we also include analysis of -gender-separated AD data. -" -7182,1804.06451,"Ramakanth Pasunuru, Mohit Bansal",Multi-Reward Reinforced Summarization with Saliency and Entailment,cs.CL cs.AI cs.LG," Abstractive text summarization is the task of compressing and rewriting a -long document into a short summary while maintaining saliency, directed logical -entailment, and non-redundancy. In this work, we address these three important -aspects of a good summary via a reinforcement learning approach with two novel -reward functions: ROUGESal and Entail, on top of a coverage-based baseline. The -ROUGESal reward modifies the ROUGE metric by up-weighting the salient -phrases/words detected via a keyphrase classifier. The Entail reward gives high -(length-normalized) scores to logically-entailed summaries using an entailment -classifier. Further, we show superior performance improvement when these -rewards are combined with traditional metric (ROUGE) based rewards, via our -novel and effective multi-reward approach of optimizing multiple rewards -simultaneously in alternate mini-batches. Our method achieves the new -state-of-the-art results (including human evaluation) on the CNN/Daily Mail -dataset as well as strong improvements in a test-only transfer setup on -DUC-2002. -" -7183,1804.06473,"Yicheng Wang, Mohit Bansal",Robust Machine Comprehension Models via Adversarial Training,cs.CL," It is shown that many published models for the Stanford Question Answering -Dataset (Rajpurkar et al., 2016) lack robustness, suffering an over 50% -decrease in F1 score during adversarial evaluation based on the AddSent (Jia -and Liang, 2017) algorithm. It has also been shown that retraining models on -data generated by AddSent has limited effect on their robustness. 
We propose a
-novel alternative adversary-generation algorithm, AddSentDiverse, that
-significantly increases the variance within the adversarial training data by
-providing effective examples that punish the model for making certain
-superficial assumptions. Further, in order to improve robustness to AddSent's
-semantic perturbations (e.g., antonyms), we jointly improve the model's
-semantic-relationship learning capabilities in addition to our
-AddSentDiverse-based adversarial training data augmentation. With these
-additions, we show that we can make a state-of-the-art model significantly more
-robust, achieving a 36.5% increase in F1 score under many different types of
-adversarial evaluation while maintaining performance on the regular SQuAD task.
-"
-7184,1804.06506,"Peyman Passban, Qun Liu, Andy Way","Improving Character-based Decoding Using Target-Side Morphological
- Information for Neural Machine Translation",cs.CL," Recently, neural machine translation (NMT) has emerged as a powerful
-alternative to conventional statistical approaches. However, its performance
-drops considerably in the presence of morphologically rich languages (MRLs).
-Neural engines usually fail to tackle the large vocabulary and high
-out-of-vocabulary (OOV) word rate of MRLs. Therefore, existing word-based
-models are not suitable for translating this set of languages. In this
-paper, we propose an extension to the state-of-the-art model of Chung et al.
-(2016), which works at the character level and boosts the decoder with
-target-side morphological information. In our architecture, an additional
-morphology table is plugged into the model. Each time the decoder samples from
-a target vocabulary, the table sends auxiliary signals from the most relevant
-affixes in order to enrich the decoder's current state and constrain it to
-provide better predictions. We evaluated our model by translating English into
-German, Russian, and Turkish as three MRLs and observed significant
-improvements.
-"
-7185,1804.06512,"Bing Liu, Gokhan Tur, Dilek Hakkani-Tur, Pararth Shah, Larry Heck","Dialogue Learning with Human Teaching and Feedback in End-to-End
- Trainable Task-Oriented Dialogue Systems",cs.CL," In this work, we present a hybrid learning method for training task-oriented
-dialogue systems through online user interactions. Popular methods for learning
-task-oriented dialogues include applying reinforcement learning with user
-feedback on supervised pre-training models. The efficiency of such a learning
-method may suffer from the mismatch of dialogue state distribution between
-offline training and online interactive learning stages. To address this
-challenge, we propose a hybrid imitation and reinforcement learning method,
-with which a dialogue agent can effectively learn from its interaction with
-users by learning from human teaching and feedback. We design a neural network
-based task-oriented dialogue agent that can be optimized end-to-end with the
-proposed learning method. Experimental results show that our end-to-end
-dialogue agent can learn effectively from the mistakes it makes via imitation
-learning from user teaching. Applying reinforcement learning with user feedback
-after the imitation learning stage further improves the agent's capability in
-successfully completing a task.
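A high-level sketch of the hybrid scheme described above: supervised imitation learning on human-teacher actions, followed by REINFORCE-style updates driven by user feedback. The agent and env objects and their methods are hypothetical stand-ins for the paper's dialogue agent and user, not its API:

```python
def train_hybrid(agent, teacher_dialogues, env, rl_episodes=1000):
    # Phase 1: imitation learning from human teaching.
    for state, teacher_action in teacher_dialogues:
        agent.supervised_update(state, teacher_action)

    # Phase 2: reinforcement learning from user feedback.
    for _ in range(rl_episodes):
        state, log_probs, total_reward = env.reset(), [], 0.0
        while not env.done():
            action, log_p = agent.sample_action(state)
            state, reward = env.step(action)  # user feedback as reward
            log_probs.append(log_p)
            total_reward += reward
        agent.policy_gradient_update(log_probs, total_reward)
```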
-"
-7186,1804.06517,"Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann","Diachronic Usage Relatedness (DURel): A Framework for the Annotation of
- Lexical Semantic Change",cs.CL," We propose a framework that extends synchronic polysemy annotation to
-diachronic changes in lexical meaning, to counteract the lack of resources for
-evaluating computational models of lexical semantic change. Our framework
-exploits an intuitive notion of semantic relatedness, and distinguishes between
-innovative and reductive meaning changes with high inter-annotator agreement.
-The resulting test set for German comprises ratings from five annotators for
-the relatedness of 1,320 use pairs across 22 target words.
-"
-7187,1804.06536,"Binxuan Huang, Yanglan Ou, Kathleen M. Carley","Aspect Level Sentiment Classification with Attention-over-Attention
- Neural Networks",cs.CL," Aspect-level sentiment classification aims to identify the sentiment
-expressed towards some aspects given context sentences. In this paper, we
-introduce an attention-over-attention (AOA) neural network for aspect level
-sentiment classification. Our approach models aspects and sentences in a joint
-way and explicitly captures the interaction between aspects and context
-sentences. With the AOA module, our model jointly learns the representations
-for aspects and sentences, and automatically focuses on the important parts in
-sentences. Our experiments on laptop and restaurant datasets demonstrate that
-our approach outperforms previous LSTM-based architectures.
-"
-7188,1804.06609,Matt Post and David Vilar,"Fast Lexically Constrained Decoding with Dynamic Beam Allocation for
- Neural Machine Translation",cs.CL," The end-to-end nature of neural machine translation (NMT) removes many ways
-of manually guiding the translation process that were available in older
-paradigms. Recent work, however, has introduced a new capability: lexically
-constrained or guided decoding, a modification to beam search that forces the
-inclusion of pre-specified words and phrases in the output. However, while
-theoretically sound, existing approaches have computational complexities that
-are either linear (Hokamp and Liu, 2017) or exponential (Anderson et al., 2017)
-in the number of constraints. We present an algorithm for lexically constrained
-decoding with a complexity of O(1) in the number of constraints. We demonstrate
-the algorithm's remarkable ability to properly place these constraints, and use
-it to explore the shaky relationship between model and BLEU scores. Our
-implementation is available as part of Sockeye.
-"
-7189,1804.06610,"Jungo Kasai and Robert Frank and Pauli Xu and William Merrill and Owen
- Rambow",End-to-end Graph-based TAG Parsing with Neural Networks,cs.CL," We present a graph-based Tree Adjoining Grammar (TAG) parser that uses
-BiLSTMs, highway connections, and character-level CNNs. Our best end-to-end
-parser, which jointly performs supertagging, POS tagging, and parsing,
-outperforms the previously reported best results by more than 2.2 LAS and UAS
-points. The graph-based parsing architecture allows for global inference and
-rich feature representations for TAG parsing, alleviating the fundamental
-trade-off between transition-based and graph-based parsing systems. We also
-demonstrate that the proposed parser achieves state-of-the-art performance in
-the downstream tasks of Parsing Evaluation using Textual Entailments (PETE) and
-Unbounded Dependency Recovery.
This provides further support for the claim that
-TAG is a viable formalism for problems that require rich structural analysis of
-sentences.
-"
-7190,1804.06636,Sowmya Vajjala and Taraka Rama,Experiments with Universal CEFR Classification,cs.CL," The Common European Framework of Reference (CEFR) guidelines describe
-language proficiency of learners on a scale of 6 levels. While the description
-of CEFR guidelines is generic across languages, the development of automated
-proficiency classification systems for different languages follows different
-approaches. In this paper, we explore universal CEFR classification using
-domain-specific and domain-agnostic, theory-guided as well as data-driven
-features. We report the results of our preliminary experiments in monolingual,
-cross-lingual, and multilingual classification with three languages: German,
-Czech, and Italian. Our results show that both monolingual and multilingual
-models achieve similar performance, and cross-lingual classification yields
-lower, but comparable results to monolingual classification.
-"
-7191,1804.06657,"Christos Baziotis, Nikos Athanasiou, Georgios Paraskevopoulos,
- Nikolaos Ellinas, Athanasia Kolovou, Alexandros Potamianos","NTUA-SLP at SemEval-2018 Task 2: Predicting Emojis using RNNs with
- Context-aware Attention",cs.CL," In this paper we present a deep-learning model that competed at SemEval-2018
-Task 2 ""Multilingual Emoji Prediction"". We participated in subtask A, in which
-we are called to predict the most likely associated emoji in English tweets.
-The proposed architecture relies on a Long Short-Term Memory network, augmented
-with an attention mechanism that conditions the weight of each word on a
-""context vector"", which is taken as the aggregation of a tweet's meaning.
-Moreover, we initialize the embedding layer of our model with word2vec word
-embeddings pretrained on a dataset of 550 million English tweets. Finally, our
-model does not rely on hand-crafted features or lexicons and is trained
-end-to-end with back-propagation. We ranked 2nd out of 48 teams.
-"
-7192,1804.06658,"Christos Baziotis, Nikos Athanasiou, Alexandra Chronopoulou, Athanasia
- Kolovou, Georgios Paraskevopoulos, Nikolaos Ellinas, Shrikanth Narayanan,
- Alexandros Potamianos","NTUA-SLP at SemEval-2018 Task 1: Predicting Affective Content in Tweets
- with Deep Attentive RNNs and Transfer Learning",cs.CL," In this paper we present deep-learning models that we submitted to the
-SemEval-2018 Task~1 competition: ""Affect in Tweets"". We participated in all
-subtasks for English tweets. We propose a Bi-LSTM architecture equipped with a
-multi-layer self attention mechanism. The attention mechanism improves the
-model performance and allows us to identify salient words in tweets, as well as
-gain insight into the models, making them more interpretable. Our model
-utilizes a set of word2vec word embeddings trained on a large collection of 550
-million Twitter messages, augmented by a set of word affective features. Due to
-the limited amount of task-specific training data, we opted for a transfer
-learning approach by pretraining the Bi-LSTMs on the dataset of Semeval 2017,
-Task 4A. The proposed approach ranked 1st in Subtask E ""Multi-Label Emotion
-Classification"", 2nd in Subtask A ""Emotion Intensity Regression"" and achieved
-competitive results in other subtasks.
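A compact sketch of a Bi-LSTM encoder with a self-attention layer of the kind described above; the dimensions, the single-layer attention, and the pooling are assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class SelfAttentiveBiLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=150, n_out=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)  # word2vec-initialized in the paper
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.att = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_out)

    def forward(self, tokens):
        h, _ = self.lstm(self.emb(tokens))              # (batch, seq, 2*hidden)
        a = torch.softmax(self.att(h).squeeze(-1), -1)  # attention over tokens
        rep = (a.unsqueeze(-1) * h).sum(dim=1)          # weighted sum of states
        return self.out(rep), a  # logits plus weights for inspecting salient words
```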
-"
-7193,1804.06659,"Christos Baziotis, Nikos Athanasiou, Pinelopi Papalampidi, Athanasia
- Kolovou, Georgios Paraskevopoulos, Nikolaos Ellinas, Alexandros Potamianos","NTUA-SLP at SemEval-2018 Task 3: Tracking Ironic Tweets using Ensembles
- of Word and Character Level Attentive RNNs",cs.CL," In this paper we present two deep-learning systems that competed at
-SemEval-2018 Task 3 ""Irony detection in English tweets"". We design and ensemble
-two independent models, based on recurrent neural networks (Bi-LSTM), which
-operate at the word and character level, in order to capture both the semantic
-and syntactic information in tweets. Our models are augmented with a
-self-attention mechanism, in order to identify the most informative words. The
-embedding layer of our word-level model is initialized with word2vec word
-embeddings, pretrained on a collection of 550 million English tweets. We did
-not utilize any handcrafted features, lexicons or external datasets as prior
-information and our models are trained end-to-end using backpropagation on
-constrained data. Furthermore, we provide visualizations of tweets with
-annotations for the salient tokens of the attention layer that can help to
-interpret the inner workings of the proposed models. We ranked 2nd out of 42
-teams in Subtask A and 2nd out of 31 teams in Subtask B. However,
-post-task-completion enhancements of our models achieve state-of-the-art
-results, ranking 1st for both subtasks.
-"
-7194,1804.06705,"Jan Pichl, Petr Marek, Jakub Konr\'ad, Martin Matul\'ik, Hoang Long
- Nguyen and Jan \v{S}ediv\'y",Alquist: The Alexa Prize Socialbot,cs.CL," This paper describes a new open-domain dialogue system, Alquist, developed as
-part of the Alexa Prize competition for the Amazon Echo line of products. The
-Alquist dialogue system is designed to conduct a coherent and engaging
-conversation on popular topics. We present a hybrid system combining
-several machine learning and rule-based approaches. We discuss and describe the
-Alquist pipeline, data acquisition and processing, dialogue manager, NLG,
-knowledge aggregation and hierarchy of sub-dialogs. We present some of the
-experimental results.
-"
-7195,1804.06716,"Rajneesh Pandey, Atul Kr. Ojha, Girish Nath Jha",Demo of Sanskrit-Hindi SMT System,cs.CL," The demo proposal presents a Phrase-based Sanskrit-Hindi (SaHiT) Statistical
-Machine Translation system. The system has been developed on Moses. 43k
-sentences of a Sanskrit-Hindi parallel corpus and 56k sentences of a monolingual
-corpus in the target language (Hindi) have been used. This system achieves a
-BLEU score of 57.
-"
-7196,1804.06719,"Dominik Schlechtweg, Sabine Schulte im Walde","Distribution-based Prediction of the Degree of Grammaticalization for
- German Prepositions",cs.CL," We test the hypothesis that the degree of grammaticalization of German
-prepositions correlates with their corpus-based contextual dispersion measured
-by word entropy. We find that there is indeed a moderate correlation for
-entropy, but a stronger correlation for frequency and the number of context
-types.
-"
-7197,1804.06759,"Ping Liu, Joshua Guberman, Libby Hemphill, Aron Culotta","Forecasting the presence and intensity of hostility on Instagram using
- linguistic and social features",cs.CL cs.SI," Online antisocial behavior, such as cyberbullying, harassment, and trolling,
-is a widespread problem that threatens free discussion and has negative
-physical and mental health consequences for victims and communities.
While -prior work has proposed automated methods to identify hostile comments in -online discussions, these methods work retrospectively on comments that have -already been posted, making it difficult to intervene before an interaction -escalates. In this paper we instead consider the problem of forecasting future -hostilities in online discussions, which we decompose into two tasks: (1) given -an initial sequence of non-hostile comments in a discussion, predict whether -some future comment will contain hostility; and (2) given the first hostile -comment in a discussion, predict whether this will lead to an escalation of -hostility in subsequent comments. Thus, we aim to forecast both the presence -and intensity of hostile comments based on linguistic and social features from -earlier comments. To evaluate our approach, we introduce a corpus of over 30K -annotated Instagram comments from over 1,100 posts. Our approach is able to -predict the appearance of a hostile comment on an Instagram post ten or more -hours in the future with an AUC of .82 (task 1), and can furthermore -distinguish between high and low levels of future hostility with an AUC of .91 -(task 2). -" -7198,1804.06775,"Benjamin Milde, Chris Biemann",Unspeech: Unsupervised Speech Context Embeddings,cs.SD cs.CL eess.AS," We introduce ""Unspeech"" embeddings, which are based on unsupervised learning -of context feature representations for spoken language. The embeddings were -trained on up to 9500 hours of crawled English speech data without -transcriptions or speaker information, by using a straightforward learning -objective based on context and non-context discrimination with negative -sampling. We use a Siamese convolutional neural network architecture to train -Unspeech embeddings and evaluate them on speaker comparison, utterance -clustering and as a context feature in TDNN-HMM acoustic models trained on -TED-LIUM, comparing it to i-vector baselines. Particularly decoding -out-of-domain speech data from the recently released Common Voice corpus shows -consistent WER reductions. We release our source code and pre-trained Unspeech -models under a permissive open source license. -" -7199,1804.06786,"Jack Hessel, David Mimno, Lillian Lee","Quantifying the visual concreteness of words and topics in multimodal - datasets",cs.CL cs.CV cs.IR," Multimodal machine learning algorithms aim to learn visual-textual -correspondences. Previous work suggests that concepts with concrete visual -manifestations may be easier to learn than concepts with abstract ones. We give -an algorithm for automatically computing the visual concreteness of words and -topics within multimodal datasets. We apply the approach in four settings, -ranging from image captions to images/text scraped from historical books. In -addition to enabling explorations of concepts in multimodal datasets, our -concreteness scores predict the capacity of machine learning algorithms to -learn textual/visual relationships. We find that 1) concrete concepts are -indeed easier to learn; 2) the large number of algorithms we consider have -similar failure cases; 3) the precise positive relationship between -concreteness and performance varies between datasets. We conclude with -recommendations for using concreteness scores to facilitate future multimodal -research. 
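A minimal sketch of task 1 as framed above: predict from the initial non-hostile comments whether hostility will appear later, and report AUC. The features here are toy stand-ins for the paper's richer linguistic and social feature set:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def featurize(comments, n_followers):
    # Toy linguistic + social features (illustrative, not the paper's set).
    text = " ".join(comments)
    return [len(comments), len(text.split()), text.count("!"), n_followers]

def evaluate(train_threads, test_threads):
    # Each thread: (initial_comments, n_followers, becomes_hostile).
    X_tr = [featurize(c, f) for c, f, _ in train_threads]
    y_tr = [y for _, _, y in train_threads]
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    X_te = [featurize(c, f) for c, f, _ in test_threads]
    y_te = [y for _, _, y in test_threads]
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```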
-" -7200,1804.06868,"Alane Suhr, Srinivasan Iyer, Yoav Artzi",Learning to Map Context-Dependent Sentences to Executable Formal Queries,cs.CL," We propose a context-dependent model to map utterances within an interaction -to executable formal queries. To incorporate interaction history, the model -maintains an interaction-level encoder that updates after each turn, and can -copy sub-sequences of previously predicted queries during generation. Our -approach combines implicit and explicit modeling of references between -utterances. We evaluate our model on the ATIS flight planning interactions, and -demonstrate the benefits of modeling context and explicit references. -" -7201,1804.06870,"Hao Tan, Mohit Bansal",Object Ordering with Bidirectional Matchings for Visual Reasoning,cs.CL cs.AI cs.CV," Visual reasoning with compositional natural language instructions, e.g., -based on the newly-released Cornell Natural Language Visual Reasoning (NLVR) -dataset, is a challenging task, where the model needs to have the ability to -create an accurate mapping between the diverse phrases and the several objects -placed in complex arrangements in the image. Further, this mapping needs to be -processed to answer the question in the statement given the ordering and -relationship of the objects across three similar images. In this paper, we -propose a novel end-to-end neural model for the NLVR task, where we first use -joint bidirectional attention to build a two-way conditioning between the -visual information and the language phrases. Next, we use an RL-based pointer -network to sort and process the varying number of unordered objects (so as to -match the order of the statement phrases) in each of the three images and then -pool over the three decisions. Our model achieves strong improvements (of 4-6% -absolute) over the state-of-the-art on both the structured representation and -raw image versions of the dataset. -" -7202,1804.06876,"Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang",Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods,cs.CL cs.AI," We introduce a new benchmark, WinoBias, for coreference resolution focused on -gender bias. Our corpus contains Winograd-schema style sentences with entities -corresponding to people referred by their occupation (e.g. the nurse, the -doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a -neural coreference system all link gendered pronouns to pro-stereotypical -entities with higher accuracy than anti-stereotypical entities, by an average -difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation -approach that, in combination with existing word-embedding debiasing -techniques, removes the bias demonstrated by these systems in WinoBias without -significantly affecting their performance on existing coreference benchmark -datasets. Our dataset and code are available at http://winobias.org. -" -7203,1804.06898,"Youmna Farag, Helen Yannakoudakis, Ted Briscoe","Neural Automated Essay Scoring and Coherence Modeling for Adversarially - Crafted Input",cs.CL cs.AI," We demonstrate that current state-of-the-art approaches to Automated Essay -Scoring (AES) are not well-suited to capturing adversarially crafted input of -grammatical but incoherent sequences of sentences. 
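A minimal sketch of the adversarial probe described above: a grammatical but incoherent essay produced by shuffling sentences from well-formed text (the period-based sentence splitter is deliberately naive):

```python
import random

def shuffle_sentences(essay, seed=0):
    # Split on periods (naive), shuffle, and rejoin: each sentence stays
    # grammatical, but cross-sentence coherence is destroyed.
    sentences = [s.strip() for s in essay.split(".") if s.strip()]
    random.Random(seed).shuffle(sentences)
    return ". ".join(sentences) + "."
```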
We develop a neural model of
-local coherence that can effectively learn connectedness features between
-sentences, and propose a framework for integrating and jointly training the
-local coherence model with a state-of-the-art AES model. We evaluate our
-approach against a number of baselines and experimentally demonstrate its
-effectiveness on both the AES task and the task of flagging adversarial input,
-further contributing to the development of an approach that strengthens the
-validity of neural essay scoring models.
-"
-7204,1804.06922,"Sebastian Schuster, Joakim Nivre, Christopher D. Manning",Sentences with Gapping: Parsing and Reconstructing Elided Predicates,cs.CL," Sentences with gapping, such as Paul likes coffee and Mary tea, lack an overt
-predicate to indicate the relation between two or more arguments. Surface
-syntax representations of such sentences are often produced poorly by parsers,
-and even if correct, not well suited to downstream natural language
-understanding tasks such as relation extraction that are typically designed to
-extract information from sentences with canonical clause structure. In this
-paper, we present two methods for parsing to a Universal Dependencies graph
-representation that explicitly encodes the elided material with additional
-nodes and edges. We find that both methods can reconstruct elided material from
-dependency trees with high accuracy when the parser correctly predicts the
-existence of a gap. We further demonstrate that one of our methods can be
-applied to other languages based on a case study on Swedish.
-"
-7205,1804.06987,"Sharmistha Jat, Siddhesh Khandelwal, Partha Talukdar","Improving Distantly Supervised Relation Extraction using Word and Entity
- Based Attention",cs.CL," Relation extraction is the problem of classifying the relationship between
-two entities in a given sentence. Distant Supervision (DS) is a popular
-technique for developing relation extractors starting with limited supervision.
-We note that most of the sentences in the distant supervision relation
-extraction setting are very long and may benefit from word attention for better
-sentence representation. Our contributions in this paper are threefold.
-Firstly, we propose two novel word attention models for distantly supervised
-relation extraction, (1) a Bi-directional Gated Recurrent Unit (Bi-GRU) based
-word attention model (BGWA) and (2) an entity-centric attention model (EA), as
-well as (3) a combination model that combines multiple complementary models
-using a weighted voting method for improved relation extraction. Secondly, we
-introduce GDS, a new distant supervision dataset for relation extraction. GDS
-removes test data noise present in all previous distant-supervision benchmark
-datasets, making credible automatic evaluation possible. Thirdly, through
-extensive experiments on multiple real-world datasets, we demonstrate the
-effectiveness of the proposed methods.
-"
-7206,1804.07000,"Marcel Trotzek, Sven Koitka, Christoph M. Friedrich","Utilizing Neural Networks and Linguistic Metadata for Early Detection of
- Depression Indications in Text Sequences",cs.CL cs.IR," Depression is ranked as the largest contributor to global disability and is
-also a major reason for suicide. Still, many individuals suffering from forms
-of depression are not treated for various reasons.
Previous studies have shown
-that depression also has an effect on language usage and that many depressed
-individuals use social media platforms or the internet in general to get
-information or discuss their problems. This paper addresses the early detection
-of depression using machine learning models based on messages on a social
-platform. In particular, a convolutional neural network based on different word
-embeddings is evaluated and compared to a classification based on user-level
-linguistic metadata. An ensemble of both approaches is shown to achieve
-state-of-the-art results in a current early detection task. Furthermore, the
-currently popular ERDE score as a metric for early detection systems is
-examined in detail and its drawbacks in the context of shared tasks are
-illustrated. A slightly modified metric is proposed and compared to the
-original score. Finally, a new word embedding was trained on a large corpus of
-the same domain as the described task and is evaluated as well.
-"
-7207,1804.07007,"Yi Liao, Lidong Bing, Piji Li, Shuming Shi, Wai Lam, Tong Zhang",QuaSE: Accurate Text Style Transfer under Quantifiable Guidance,cs.CL," We propose the task of Quantifiable Sequence Editing (QuaSE): editing an
-input sequence to generate an output sequence that satisfies a given numerical
-outcome value measuring a certain property of the sequence, with the
-requirement of keeping the main content of the input sequence. For example, an
-input sequence could be a word sequence, such as a review sentence or
-advertisement text. For a review sentence, the outcome could be the review
-rating; for an advertisement, the outcome could be the click-through rate. The
-major challenge in performing QuaSE is how to perceive the outcome-related
-wordings, and only edit them to change the outcome. In this paper, the proposed
-framework contains two latent factors, namely, outcome factor and content
-factor, disentangled from the input sentence to allow convenient editing to
-change the outcome and keep the content. Our framework explores the
-pseudo-parallel sentences by modeling their content similarity and outcome
-differences to enable a better disentanglement of the latent factors, which
-allows generating an output to better satisfy the desired outcome and keep the
-content. The dual reconstruction structure further enhances the capability of
-generating expected output by exploiting the couplings of latent factors of
-pseudo-parallel sentences. For evaluation, we prepared a dataset of Yelp review
-sentences with the ratings as outcome. Extensive experimental results are
-reported and discussed to elaborate on the peculiarities of our framework.
-"
-7208,1804.07036,"Yuxiang Wu, Baotian Hu",Learning to Extract Coherent Summary via Deep Reinforcement Learning,cs.CL," Coherence plays a critical role in producing a high-quality summary from a
-document. In recent years, neural extractive summarization is becoming
-increasingly attractive. However, most existing models ignore the coherence of
-summaries when extracting sentences. As an effort towards extracting coherent
-summaries, we propose a neural coherence model to capture the cross-sentence
-semantic and syntactic coherence patterns. The proposed neural coherence model
-obviates the need for feature engineering and can be trained in an end-to-end
-fashion using unlabeled data. Empirical results show that the proposed neural
-coherence model can efficiently capture the cross-sentence coherence patterns.
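The sentences that follow describe combining this coherence score with ROUGE as a reinforcement-learning reward; a minimal sketch of such a mixed reward, where the weighting and the coherence_model.score interface are assumptions and rouge_score is one of several packages that could supply the ROUGE term:

```python
from rouge_score import rouge_scorer

_scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def mixed_reward(summary, reference, coherence_model, lam=0.5):
    # Weighted mix of ROUGE and the neural coherence score.
    rouge = _scorer.score(reference, summary)["rougeL"].fmeasure
    coherence = coherence_model.score(summary)
    return lam * rouge + (1.0 - lam) * coherence
```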
-Using the combined output of the neural coherence model and the ROUGE package
-as the reward, we design a reinforcement learning method to train the proposed
-neural extractive summarizer, named the Reinforced Neural Extractive
-Summarization (RNES) model. The RNES model learns to optimize coherence and
-informative importance of the summary simultaneously. Experimental results show
-that the proposed RNES outperforms existing baselines and achieves
-state-of-the-art performance in terms of ROUGE on the CNN/Daily Mail dataset.
-The qualitative evaluation indicates that summaries produced by RNES are more
-coherent and readable.
-"
-7209,1804.07068,"Masashi Yoshikawa, Koji Mineshima, Hiroshi Noji and Daisuke Bekki","Consistent CCG Parsing over Multiple Sentences for Improved Logical
- Reasoning",cs.CL," In formal logic-based approaches to Recognizing Textual Entailment (RTE), a
-Combinatory Categorial Grammar (CCG) parser is used to parse input premises and
-hypotheses to obtain their logical formulas. Here, it is important that the
-parser processes the sentences consistently; failing to recognize a similar
-syntactic structure results in inconsistent predicate argument structures among
-them, in which case the succeeding theorem proving is doomed to failure. In
-this work, we present a simple method to extend an existing CCG parser to parse
-a set of sentences consistently, which is achieved with inter-sentence
-modeling using Markov Random Fields (MRF). When combined with existing
-logic-based systems, our method always shows improvement in the RTE experiments
-on English and Japanese.
-"
-7210,1804.07097,Bernhard Kratzwald and Stefan Feuerriegel,"Putting Question-Answering Systems into Practice: Transfer Learning for
- Efficient Domain Customization",cs.CL," Traditional information retrieval (such as that offered by web search
-engines) impedes users with information overload from extensive result pages
-and the need to manually locate the desired information therein. Conversely,
-question-answering systems change how humans interact with information systems:
-users can now ask specific questions and obtain a tailored answer - both
-conveniently in natural language. Despite obvious benefits, their use is often
-limited to an academic context, largely because of expensive domain
-customizations, which means that the performance in domain-specific
-applications often fails to meet expectations. This paper proposes
-cost-efficient remedies: (i) we leverage metadata through a filtering
-mechanism, which increases the precision of document retrieval, and (ii) we
-develop a novel fuse-and-oversample approach for transfer learning in order to
-improve the performance of answer extraction. Here knowledge is inductively
-transferred from a related, yet different, task to the domain-specific
-application, while accounting for potential differences in the sample sizes
-across both tasks. The resulting performance is demonstrated with actual use
-cases from a finance company and the film industry, where fewer than 400
-question-answer pairs had to be annotated in order to yield significant
-performance gains. As a direct implication for management, this presents a
-promising path to better leveraging of knowledge stored in information systems.
-"
-7211,1804.07212,"Sarthak Jain, Edward Banner, Jan-Willem van de Meent, Iain J.
- Marshall, Byron C.
Wallace","Learning Disentangled Representations of Texts with Application to - Biomedical Abstracts",cs.CL," We propose a method for learning disentangled representations of texts that -code for distinct and complementary aspects, with the aim of affording -efficient model transfer and interpretability. To induce disentangled -embeddings, we propose an adversarial objective based on the (dis)similarity -between triplets of documents with respect to specific aspects. Our motivating -application is embedding biomedical abstracts describing clinical trials in a -manner that disentangles the populations, interventions, and outcomes in a -given trial. We show that our method learns representations that encode these -clinically salient aspects, and that these can be effectively used to perform -aspect-specific retrieval. We demonstrate that the approach generalizes beyond -our motivating application in experiments on two multi-aspect review corpora. -" -7212,1804.07247,"Dominic Seyler, Lunan Li, ChengXiang Zhai","Semantic Text Analysis for Detection of Compromised Accounts on Social - Networks",cs.SI cs.CL cs.CR," Compromised accounts on social networks are regular user accounts that have -been taken over by an entity with malicious intent. Since the adversary -exploits the already established trust of a compromised account, it is crucial -to detect these accounts to limit the damage they can cause. We propose a novel -general framework for semantic analysis of text messages coming out from an -account to detect compromised accounts. Our framework is built on the -observation that normal users will use language that is measurably different -from the language that an adversary would use when the account is compromised. -We propose to use the difference of language models of users and adversaries to -define novel interpretable semantic features for measuring semantic incoherence -in a message stream. We study the effectiveness of the proposed semantic -features using a Twitter data set. Evaluation results show that the proposed -framework is effective for discovering compromised accounts on social networks -and a KL-divergence-based language model feature works best. -" -7213,1804.07253,"Luca Soldaini, Timothy Walsh, Arman Cohan, Julien Han, Nazli Goharian","Helping or Hurting? Predicting Changes in Users' Risk of Self-Harm - Through Online Community Interactions",cs.CL," In recent years, online communities have formed around suicide and self-harm -prevention. While these communities offer support in moment of crisis, they can -also normalize harmful behavior, discourage professional treatment, and -instigate suicidal ideation. In this work, we focus on how interaction with -others in such a community affects the mental state of users who are seeking -support. We first build a dataset of conversation threads between users in a -distressed state and community members offering support. We then show how to -construct a classifier to predict whether distressed users are helped or harmed -by the interactions in the thread, and we achieve a macro-F1 score of up to -0.69. -" -7214,1804.07329,"Yevgeni Berzak, Boris Katz, Roger Levy",Assessing Language Proficiency from Eye Movements in Reading,cs.CL," We present a novel approach for determining learners' second language -proficiency which utilizes behavioral traces of eye movements during reading. 
-Our approach provides stand-alone eyetracking-based English proficiency scores -which reflect the extent to which the learner's gaze patterns in reading are -similar to those of native English speakers. We show that our scores correlate -strongly with standardized English proficiency tests. We also demonstrate that -gaze information can be used to accurately predict the outcomes of such tests. -Our approach yields the strongest performance when the test taker is presented -with a suite of sentences for which we have eyetracking data from other -readers. However, it remains effective even using eyetracking with sentences -for which eye movement data have not been previously collected. By deriving -proficiency as an automatic byproduct of eye movements during ordinary reading, -our approach offers a potentially valuable new tool for second language -proficiency assessment. More broadly, our results open the door to future -methods for inferring reader characteristics from the behavioral traces of -reading. -" -7215,1804.07331,Murali Raghu Babu Balusu and Taha Merghani and Jacob Eisenstein,Stylistic Variation in Social Media Part-of-Speech Tagging,cs.CL cs.AI," Social media features substantial stylistic variation, raising new challenges -for syntactic analysis of online writing. However, this variation is often -aligned with author attributes such as age, gender, and geography, as well as -more readily available social network metadata. In this paper, we report new -evidence on the link between language and social networks in the task of -part-of-speech tagging. We find that tagger error rates are correlated with -network structure, with high accuracy in some parts of the network, and lower -accuracy elsewhere. As a result, tagger accuracy depends on training from a -balanced sample of the network, rather than training on texts from a narrow -subcommunity. We also describe our attempts to add robustness to stylistic -variation, by building a mixture-of-experts model in which each expert is -associated with a region of the social network. While prior work found that -similar approaches yield performance improvements in sentiment analysis and -entity linking, we were unable to obtain performance improvements in -part-of-speech tagging, despite strong evidence for the link between -part-of-speech error rates and social network structure. -" -7216,1804.07375,Amir Zeldes,A Predictive Model for Notional Anaphora in English,cs.CL," Notional anaphors are pronouns which disagree with their antecedents' -grammatical categories for notional reasons, such as plural to singular -agreement in: 'the government ... they'. Since such cases are rare and conflict -with evidence from strictly agreeing cases ('the government ... it'), they -present a substantial challenge to both coreference resolution and referring -expression generation. Using the OntoNotes corpus, this paper takes an ensemble -approach to predicting English notional anaphora in context on the basis of the -largest empirical data to date. In addition to state-of-the-art prediction -accuracy, the results suggest that theoretical approaches positing a plural -construal at the antecedent's utterance are insufficient, and that -circumstances at the anaphor's utterance location, as well as global factors -such as genre, have a strong effect on the choice of referring expression.
-" -7217,1804.07399,"Akash Ganesan, Divyansh Pal, Karthik Muthuraman, Shubham Dash",Video based Contextual Question Answering,cs.CL cs.CV," The primary aim of this project is to build a contextual Question-Answering -model for videos. The current methodologies provide a robust model for image -based Question-Answering, but we are aim to generalize this approach to be -videos. We propose a graphical representation of video which is able to handle -several types of queries across the whole video. For example, if a frame has an -image of a man and a cat sitting, it should be able to handle queries like, -where is the cat sitting with respect to the man? or ,what is the man holding -in his hand?. It should be able to answer queries relating to temporal -relationships also. -" -7218,1804.07445,"Tu Vu, Baotian Hu, Tsendsuren Munkhdalai and Hong Yu",Sentence Simplification with Memory-Augmented Neural Networks,cs.CL," Sentence simplification aims to simplify the content and structure of complex -sentences, and thus make them easier to interpret for human readers, and easier -to process for downstream NLP applications. Recent advances in neural machine -translation have paved the way for novel approaches to the task. In this paper, -we adapt an architecture with augmented memory capacities called Neural -Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our -experiments demonstrate the effectiveness of our approach on different -simplification datasets, both in terms of automatic evaluation measures and -human judgments. -" -7219,1804.07461,"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, - Samuel R. Bowman","GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language - Understanding",cs.CL," For natural language understanding (NLU) technology to be maximally useful, -both practically and as a scientific object of study, it must be general: it -must be able to process language in a way that is not exclusively tailored to -any one specific task or dataset. In pursuit of this objective, we introduce -the General Language Understanding Evaluation benchmark (GLUE), a tool for -evaluating and analyzing the performance of models across a diverse range of -existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing -knowledge across tasks because certain tasks have very limited training data. -We further provide a hand-crafted diagnostic test suite that enables detailed -linguistic analysis of NLU models. We evaluate baselines based on current -methods for multi-task and transfer learning and find that they do not -immediately give substantial improvements over the aggregate performance of -training a separate model per task, indicating room for improvement in -developing general and robust NLU systems. -" -7220,1804.07581,"Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluis Marquez, - Alessandro Moschitti",Automatic Stance Detection Using End-to-End Memory Networks,cs.CL," We present a novel end-to-end memory network for stance detection, which -jointly (i) predicts whether a document agrees, disagrees, discusses or is -unrelated with respect to a given target claim, and also (ii) extracts snippets -of evidence for that prediction. The network operates at the paragraph level -and integrates convolutional and recurrent neural networks, as well as a -similarity matrix as part of the overall architecture. The experimental -evaluation on the Fake News Challenge dataset shows state-of-the-art -performance. 
-" -7221,1804.07583,Besnik Fetahu,Approaches for Enriching and Improving Textual Knowledge Bases,cs.CL cs.IR," Verifiability is one of the core editing principles in Wikipedia, where -editors are encouraged to provide citations for the added statements. -Statements can be any arbitrary piece of text, ranging from a sentence up to a -paragraph. However, in many cases, citations are either outdated, missing, or -link to non-existing references (e.g. dead URL, moved content etc.). In total, -20\% of the cases such citations refer to news articles and represent the -second most cited source. Even in cases where citations are provided, there are -no explicit indicators for the span of a citation for a given piece of text. In -addition to issues related with the verifiability principle, many Wikipedia -entity pages are incomplete, with relevant information that is already -available in online news sources missing. Even for the already existing -citations, there is often a delay between the news publication time and the -reference time. - In this thesis, we address the aforementioned issues and propose automated -approaches that enforce the verifiability principle in Wikipedia, and suggest -relevant and missing news references for further enriching Wikipedia entity -pages. -" -7222,1804.07587,"Israa Jaradat, Pepa Gencheva, Alberto Barron-Cedeno, Lluis Marquez, - Preslav Nakov",ClaimRank: Detecting Check-Worthy Claims in Arabic and English,cs.CL," We present ClaimRank, an online system for detecting check-worthy claims. -While originally trained on political debates, the system can work for any kind -of text, e.g., interviews or regular news articles. Its aim is to facilitate -manual fact-checking efforts by prioritizing the claims that fact-checkers -should consider first. ClaimRank supports both Arabic and English, it is -trained on actual annotations from nine reputable fact-checking organizations -(PolitiFact, FactCheck, ABC, CNN, NPR, NYT, Chicago Tribune, The Guardian, and -Washington Post), and thus it can mimic the claim selection strategies for each -and any of them, as well as for the union of them all. -" -7223,1804.07656,"Hitomi Yanaka, Koji Mineshima, Pascual Martinez-Gomez and Daisuke - Bekki",Acquisition of Phrase Correspondences using Natural Deduction Proofs,cs.CL," How to identify, extract, and use phrasal knowledge is a crucial problem for -the task of Recognizing Textual Entailment (RTE). To solve this problem, we -propose a method for detecting paraphrases via natural deduction proofs of -semantic relations between sentence pairs. Our solution relies on a graph -reformulation of partial variable unifications and an algorithm that induces -subgraph alignments between meaning representations. Experiments show that our -method can automatically detect various paraphrases that are absent from -existing paraphrase databases. In addition, the detection of paraphrases using -proof information improves the accuracy of RTE tasks. -" -7224,1804.07691,"Kaixiang Mo, Yu Zhang, Qiang Yang, Pascale Fung","Cross-domain Dialogue Policy Transfer via Simultaneous Speech-act and - Slot Alignment",cs.CL cs.AI," Dialogue policy transfer enables us to build dialogue policies in a target -domain with little data by leveraging knowledge from a source domain with -plenty of data. Dialogue sentences are usually represented by speech-acts and -domain slots, and the dialogue policy transfer is usually achieved by assigning -a slot mapping matrix based on human heuristics. 
However, existing dialogue -policy transfer methods cannot transfer across dialogue domains with different -speech-acts, for example, between systems built by different companies. Also, -they depend on either common slots or slot entropy, which are not available -when the source and target slots are totally disjoint and no database is -available to calculate the slot entropy. To solve this problem, we propose a -Policy tRansfer across dOMaIns and SpEech-acts (PROMISE) model, which is able -to transfer dialogue policies across domains with different speech-acts and -disjoint slots. The PROMISE model can learn to align different speech-acts and -slots simultaneously, and it does not require common slots or the calculation -of the slot entropy. Experiments on both real-world dialogue data and -simulations demonstrate that the PROMISE model can effectively transfer dialogue -policies across domains with different speech-acts and disjoint slots. -" -7225,1804.07705,"Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave",Lightweight Adaptive Mixture of Neural and N-gram Language Models,cs.CL," It is often the case that the best-performing language model is an ensemble -of a neural language model with n-grams. In this work, we propose a method to -improve how these two models are combined. By using a small network which -predicts the mixture weight between the two models, we adapt their relative -importance at each time step. Because the gating network is small, it trains -quickly on small amounts of held-out data, and does not add overhead at scoring -time. Our experiments carried out on the One Billion Word benchmark show a -significant improvement over the state-of-the-art ensemble without retraining -of the basic modules. -" -7226,1804.07707,Kris Cao and Stephen Clark,Factorising AMR generation through syntax,cs.CL," Generating from Abstract Meaning Representation (AMR) is an underspecified -problem, as many syntactic decisions are not constrained by the semantic graph. -To explicitly account for this underspecification, we break down generating -from AMR into two steps: first generate a syntactic structure, and then -generate the surface form. We show that decomposing the generation process this -way leads to state-of-the-art single model performance generating from AMR -without additional unlabelled data. We also demonstrate that we can generate -meaning-preserving syntactic paraphrases of the same AMR graph, as judged by -humans. -" -7227,1804.07726,"Minjoon Seo, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh - Hajishirzi","Phrase-Indexed Question Answering: A New Challenge for Scalable Document - Comprehension",cs.CL," We formalize a new modular variant of current question answering tasks by -enforcing complete independence of the document encoder from the question -encoder. This formulation addresses a key challenge in machine comprehension by -requiring a standalone representation of the document discourse. It -additionally leads to a significant scalability advantage since the encoding of -the answer candidate phrases in the document can be pre-computed and indexed -offline for efficient retrieval. We experiment with baseline models for the new -task, which achieve a reasonable accuracy but significantly underperform -unconstrained QA models. We invite the QA research community to engage in -Phrase-Indexed Question Answering (PIQA, pika) for closing the gap.
The -leaderboard is at: nlp.cs.washington.edu/piqa -" -7228,1804.07745,"Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard - Grave","Loss in Translation: Learning Bilingual Word Mapping with a Retrieval - Criterion",cs.CL cs.LG," Continuous word representations learned separately on distinct languages can -be aligned so that their words become comparable in a common space. Existing -works typically solve a least-square regression problem to learn a rotation -aligning a small bilingual lexicon, and use a retrieval criterion for -inference. In this paper, we propose a unified formulation that directly -optimizes a retrieval criterion in an end-to-end fashion. Our experiments on -standard benchmarks show that our approach outperforms the state of the art on -word translation, with the biggest improvements observed for distant language -pairs such as English-Chinese. -" -7229,1804.07754,"Yinfei Yang, Steve Yuan, Daniel Cer, Sheng-yi Kong, Noah Constant, - Petr Pilar, Heming Ge, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil",Learning Semantic Textual Similarity from Conversations,cs.CL," We present a novel approach to learn representations for sentence-level -semantic similarity using conversational data. Our method trains an -unsupervised model to predict conversational input-response pairs. The -resulting sentence embeddings perform well on the semantic textual similarity -(STS) benchmark and SemEval 2017's Community Question Answering (CQA) question -similarity subtask. Performance is further improved by introducing multitask -training combining the conversational input-response prediction task and a -natural language inference task. Extensive experiments show the proposed model -achieves the best performance among all neural models on the STS benchmark and -is competitive with the state-of-the-art feature engineered and mixed systems -in both tasks. -" -7230,1804.07755,"Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, - Marc'Aurelio Ranzato",Phrase-Based & Neural Unsupervised Machine Translation,cs.CL," Machine translation systems achieve near human-level performance on some -languages, yet their effectiveness strongly relies on the availability of large -amounts of parallel sentences, which hinders their applicability to the -majority of language pairs. This work investigates how to learn to translate -when having access to only large monolingual corpora in each language. We -propose two model variants, a neural and a phrase-based model. Both versions -leverage a careful initialization of the parameters, the denoising effect of -language models and automatic generation of parallel data by iterative -back-translation. These models are significantly better than methods from the -literature, while being simpler and having fewer hyper-parameters. On the -widely used WMT'14 English-French and WMT'16 German-English benchmarks, our -models respectively obtain 28.1 and 25.2 BLEU points without using a single -parallel sentence, outperforming the state of the art by more than 11 BLEU -points. On low-resource languages like English-Urdu and English-Romanian, our -methods achieve even better results than semi-supervised and supervised -approaches leveraging the paucity of available bitexts. Our code for NMT and -PBSMT is publicly available.
-" -7231,1804.07781,"Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro - Rodriguez, Jordan Boyd-Graber",Pathologies of Neural Models Make Interpretations Difficult,cs.CL," One way to interpret neural model predictions is to highlight the most -important input features---for example, a heatmap visualization over the words -in an input sentence. In existing interpretation methods for NLP, a word's -importance is determined by either input perturbation---measuring the decrease -in model confidence when that word is removed---or by the gradient with respect -to that word. To understand the limitations of these methods, we use input -reduction, which iteratively removes the least important word from the input. -This exposes pathological behaviors of neural models: the remaining words -appear nonsensical to humans and are not the ones determined as important by -interpretation methods. As we confirm with human experiments, the reduced -examples lack information to support the prediction of any label, but models -still make the same predictions with high confidence. To explain these -counterintuitive results, we draw connections to adversarial examples and -confidence calibration: pathological behaviors reveal difficulties in -interpreting neural models trained with maximum likelihood. To mitigate their -deficiencies, we fine-tune the models by encouraging high entropy outputs on -reduced examples. Fine-tuned models become more interpretable under input -reduction without accuracy loss on regular examples. -" -7232,1804.07789,"Preksha Nema, Shreyas Shetty, Parag Jain, Anirban Laha, Karthik - Sankaranarayanan, Mitesh M. Khapra","Generating Descriptions from Structured Data Using a Bifocal Attention - Mechanism and Gated Orthogonalization",cs.CL cs.AI cs.LG," In this work, we focus on the task of generating natural language -descriptions from a structured table of facts containing fields (such as -nationality, occupation, etc) and values (such as Indian, actor, director, -etc). One simple choice is to treat the table as a sequence of fields and -values and then use a standard seq2seq model for this task. However, such a -model is too generic and does not exploit task-specific characteristics. For -example, while generating descriptions from a table, a human would attend to -information at two levels: (i) the fields (macro level) and (ii) the values -within the field (micro level). Further, a human would continue attending to a -field for a few timesteps till all the information from that field has been -rendered and then never return back to this field (because there is nothing -left to say about it). To capture this behavior we use (i) a fused bifocal -attention mechanism which exploits and combines this micro and macro level -information and (ii) a gated orthogonalization mechanism which tries to ensure -that a field is remembered for a few time steps and then forgotten. We -experiment with a recently released dataset which contains fact tables about -people and their corresponding one line biographical descriptions in English. -In addition, we also introduce two similar datasets for French and German. Our -experiments show that the proposed model gives 21% relative improvement over a -recently proposed state of the art method and 10% relative improvement over -basic seq2seq models. The code and the datasets developed as a part of this -work are publicly available. -" -7233,1804.07790,"Parag Jain, Anirban Laha, Karthik Sankaranarayanan, Preksha Nema, - Mitesh M. 
Khapra, Shreyas Shetty","A Mixed Hierarchical Attention based Encoder-Decoder Approach for - Standard Table Summarization",cs.CL cs.AI," Structured data summarization involves generation of natural language -summaries from structured input data. In this work, we consider summarizing -structured data occurring in the form of tables as they are prevalent across a -wide variety of domains. We formulate the standard table summarization problem, -which deals with tables conforming to a single predefined schema. To this end, -we propose a mixed hierarchical attention based encoder-decoder model which is -able to leverage the structure in addition to the content of the tables. Our -experiments on the publicly available WEATHERGOV dataset show around 18 BLEU (~ -30%) improvement over the current state-of-the-art. -" -7234,1804.07827,"Liyuan Liu, Xiang Ren, Jingbo Shang, Jian Peng and Jiawei Han","Efficient Contextualized Representation: Language Model Pruning for - Sequence Labeling",cs.CL," Many efforts have been made to facilitate natural language processing tasks -with pre-trained language models (LMs), and have brought significant improvements to -various applications. To fully leverage the nearly unlimited corpora and -capture linguistic information of multifarious levels, large-size LMs are -required; but for a specific task, only part of this information is useful. -Such large-sized LMs, even in the inference stage, may cause heavy computation -workloads, making them too time-consuming for large-scale applications. Here we -propose to compress bulky LMs while preserving useful information with regard -to a specific task. As different layers of the model keep different -information, we develop a layer selection method for model pruning using -sparsity-inducing regularization. By introducing the dense connectivity, we can -detach any layer without affecting others, and stretch shallow and wide LMs to -be deep and narrow. In model training, LMs are learned with layer-wise dropouts -for better robustness. Experiments on two benchmark datasets demonstrate the -effectiveness of our method. -" -7235,1804.07828,Qiang Ning and Hao Wu and Dan Roth,A Multi-Axis Annotation Scheme for Event Temporal Relations,cs.CL," Existing temporal relation (TempRel) annotation schemes often have low -inter-annotator agreements (IAA) even between experts, suggesting that the -current annotation task needs a better definition. This paper proposes new -multi-axis modeling to better capture the temporal structure of events. In -addition, we identify that event end-points are a major source of confusion in -annotation, so we also propose to annotate TempRels based on start-points only. -A pilot expert annotation using the proposed scheme shows significant -improvement in IAA from the conventional 60's to 80's (Cohen's Kappa). This -better-defined annotation scheme further enables the use of crowdsourcing to -alleviate the labor intensity for each annotator. We hope that this work can -foster more interesting studies towards event understanding. -" -7236,1804.07835,"Li Zhang, Steven R. Wilson, Rada Mihalcea","Direct Network Transfer: Transfer Learning of Sentence Embeddings for - Semantic Similarity",cs.CL," Sentence encoders, which produce sentence embeddings using neural networks, -are typically evaluated by how well they transfer to downstream tasks. This -includes semantic similarity, an important task in natural language -understanding.
Although there has been much work dedicated to building sentence -encoders, the accompanying transfer learning techniques have received -relatively little attention. In this paper, we propose a transfer learning -setting specialized for semantic similarity, which we refer to as direct -network transfer. Through experiments on several standard text similarity -datasets, we show that applying direct network transfer to existing encoders -can lead to state-of-the-art performance. Additionally, we compare several -approaches to transfer sentence encoders to semantic similarity tasks, showing -that the choice of transfer learning setting greatly affects the performance in -many cases, and differs by encoder and dataset. -" -7237,1804.07847,"Giannis Bekoulis, Johannes Deleu, Thomas Demeester, Chris Develder","Joint entity recognition and relation extraction as a multi-head - selection problem",cs.CL," State-of-the-art models for joint entity recognition and relation extraction -strongly rely on external natural language processing (NLP) tools such as POS -(part-of-speech) taggers and dependency parsers. Thus, the performance of such -joint models depends on the quality of the features obtained from these NLP -tools. However, these features are not always accurate for various languages -and contexts. In this paper, we propose a joint neural model which performs -entity recognition and relation extraction simultaneously, without the need of -any manually extracted features or the use of any external tool. Specifically, -we model the entity recognition task using a CRF (Conditional Random Fields) -layer and the relation extraction task as a multi-head selection problem (i.e., -potentially identify multiple relations for each entity). We present an -extensive experimental setup, to demonstrate the effectiveness of our method -using datasets from various contexts (i.e., news, biomedical, real estate) and -languages (i.e., English, Dutch). Our model outperforms the previous neural -models that use automatically extracted features, while it performs within a -reasonable margin of feature-based neural models, or even beats them. -" -7238,1804.07849,Karl Stratos,"Mutual Information Maximization for Simple and Accurate Part-Of-Speech - Induction",cs.CL," We address part-of-speech (POS) induction by maximizing the mutual -information between the induced label and its context. We focus on two training -objectives that are amenable to stochastic gradient descent (SGD): a novel -generalization of the classical Brown clustering objective and a recently -proposed variational lower bound. While both objectives are subject to noise in -gradient updates, we show through analysis and experiments that the variational -lower bound is robust whereas the generalized Brown objective is vulnerable. We -obtain competitive performance on a multitude of datasets and languages with a -simple architecture that encodes morphology and context. -" -7239,1804.07853,"David Gaddy, Mitchell Stern, and Dan Klein",What's Going On in Neural Constituency Parsers? An Analysis,cs.CL," A number of differences have emerged between modern and classic approaches to -constituency parsing in recent years, with structural components like grammars -and feature-rich lexicons becoming less central while recurrent neural network -representations rise in popularity. The goal of this work is to analyze the -extent to which information provided directly by the model structure in -classical systems is still being captured by neural methods. 
To this end, we -propose a high-performance neural model (92.08 F1 on PTB) that is -representative of recent work and perform a series of investigative -experiments. We find that our model implicitly learns to encode much of the -same information that was explicitly provided by grammars and lexicons in the -past, indicating that this scaffolding can largely be subsumed by powerful -general-purpose neural machinery. -" -7240,1804.07855,"Da Tang and Xiujun Li and Jianfeng Gao and Chong Wang and Lihong Li - and Tony Jebara",Subgoal Discovery for Hierarchical Dialogue Policy Learning,cs.CL cs.AI cs.LG," Developing agents to engage in complex goal-oriented dialogues is challenging -partly because the main learning signals are very sparse in long conversations. -In this paper, we propose a divide-and-conquer approach that discovers and -exploits the hidden structure of the task to enable efficient policy learning. -First, given successful example dialogues, we propose the Subgoal Discovery -Network (SDN) to divide a complex goal-oriented task into a set of simpler -subgoals in an unsupervised fashion. We then use these subgoals to learn a -multi-level policy by hierarchical reinforcement learning. We demonstrate our -method by building a dialogue agent for the composite task of travel planning. -Experiments with simulated and real users show that our approach performs -competitively against a state-of-the-art method that requires human-defined -subgoals. Moreover, we show that the learned subgoals are often human -comprehensible. -" -7241,1804.07875,"Lifu Huang, Kyunghyun Cho, Boliang Zhang, Heng Ji, Kevin Knight","Multi-lingual Common Semantic Space Construction via Cluster-consistent - Word Embedding",cs.CL cs.AI," We construct a multilingual common semantic space based on distributional -semantics, where words from multiple languages are projected into a shared -space to enable knowledge and resource transfer across languages. Beyond word -alignment, we introduce multiple cluster-level alignments and enforce the word -clusters to be consistently distributed across multiple languages. We exploit -three signals for clustering: (1) neighbor words in the monolingual word -embedding space; (2) character-level information; and (3) linguistic properties -(e.g., apposition, locative suffix) derived from linguistic structure knowledge -bases available for thousands of languages. We introduce a new -cluster-consistent correlational neural network to construct the common -semantic space by aligning words as well as clusters. Intrinsic evaluation on -monolingual and multilingual QVEC tasks shows our approach achieves -significantly higher correlation with linguistic features than state-of-the-art -multi-lingual embedding learning methods do. Using low-resource language name -tagging as a case study for extrinsic evaluation, our approach achieves up to -24.5\% absolute F-score gain over the state of the art. -" -7242,1804.07878,"Zhong Zhou, Matthias Sperber, Alex Waibel","Massively Parallel Cross-Lingual Learning in Low-Resource Target - Language Translation",cs.CL," We work on translation from rich-resource languages to low-resource -languages. The main challenges we identify are the lack of low-resource -language data, effective methods for cross-lingual transfer, and the -variable-binding problem that is common in neural systems. We build a -translation system that addresses these challenges using eight European -language families as our test ground. 
Firstly, we add the source and the target -family labels and study intra-family and inter-family influences for effective -cross-lingual transfer. We achieve an improvement of +9.9 in BLEU score for -English-Swedish translation using eight families compared to the single-family -multi-source multi-target baseline. Moreover, we find that training on two -neighboring families closest to the low-resource language is often enough. -Secondly, we conduct an ablation study and find that reasonably good results -can be achieved even with considerably less target data. Thirdly, we address -the variable-binding problem by building an order-preserving named entity -translation model. We obtain 60.6% accuracy in a preliminary qualitative -evaluation, where our translations are akin to human translations. -" -7243,1804.07881,Tongtao Zhang and Heng Ji,Event Extraction with Generative Adversarial Imitation Learning,cs.CL," We propose a new method for the event extraction (EE) task based on an imitation -learning framework, specifically, inverse reinforcement learning (IRL) via -generative adversarial network (GAN). The GAN estimates proper rewards -according to the difference between the actions committed by the expert (or -ground truth) and the agent among complicated states in the environment. The EE -task benefits from these dynamic rewards because instances and labels vary in -difficulty and the gains are expected to be diverse -- e.g., -an ambiguous but correctly detected trigger or argument should receive high -gains -- while the traditional RL models usually neglect such differences and -pay equal attention to all instances. Moreover, our experiments also -demonstrate that the proposed framework outperforms state-of-the-art methods, -without explicit feature engineering. -" -7244,1804.07888,"Xiaodong Liu, Kevin Duh and Jianfeng Gao",Stochastic Answer Networks for Natural Language Inference,cs.CL," We propose a stochastic answer network (SAN) to explore multi-step inference -strategies in Natural Language Inference. Rather than directly predicting the -results given the inputs, the model maintains a state and iteratively refines -its predictions. Our experiments show that SAN achieves the state-of-the-art -results on three benchmarks: Stanford Natural Language Inference (SNLI) -dataset, Multi-Genre Natural Language Inference (MultiNLI) dataset and Quora -Question Pairs dataset. -" -7245,1804.07889,"Di Lu, Spencer Whitehead, Lifu Huang, Heng Ji, Shih-Fu Chang",Entity-aware Image Caption Generation,cs.CL," Current image captioning approaches generate descriptions which lack specific -information, such as named entities that are involved in the images. In this -paper, we propose a new task which aims to generate informative image captions, -given images and hashtags as input. We propose a simple but effective approach -to tackle this problem. We first train a convolutional neural network - long -short-term memory network (CNN-LSTM) model to generate a template caption -based on the input image. Then we use a knowledge graph based collective -inference algorithm to fill in the template with specific named entities -retrieved via the hashtags. Experiments on a new benchmark dataset collected -from Flickr show that our model generates news-style image descriptions with -much richer information. Our model outperforms unimodal baselines significantly -on various evaluation metrics.
-" -7246,1804.07893,"Tatsuru Kobayashi, Kumiko Tanaka-Ishii",Taylor's law for Human Linguistic Sequences,cs.CL," Taylor's law describes the fluctuation characteristics underlying a system in -which the variance of an event within a time span grows by a power law with -respect to the mean. Although Taylor's law has been applied in many natural and -social systems, its application for language has been scarce. This article -describes a new quantification of Taylor's law in natural language and reports -an analysis of over 1100 texts across 14 languages. The Taylor exponents of -written natural language texts were found to exhibit almost the same value. The -exponent was also compared for other language-related data, such as the -child-directed speech, music, and programming language code. The results show -how the Taylor exponent serves to quantify the fundamental structural -complexity underlying linguistic time series. The article also shows the -applicability of these findings in evaluating language models. -" -7247,1804.07899,"Markus Freitag, Scott Roy",Unsupervised Natural Language Generation with Denoising Autoencoders,cs.CL," Generating text from structured data is important for various tasks such as -question answering and dialog systems. We show that in at least one domain, -without any supervision and only based on unlabeled text, we are able to build -a Natural Language Generation (NLG) system with higher performance than -supervised approaches. In our approach, we interpret the structured data as a -corrupt representation of the desired output and use a denoising auto-encoder -to reconstruct the sentence. We show how to introduce noise into training -examples that do not contain structured data, and that the resulting denoising -auto-encoder generalizes to generate correct sentences when given structured -data. -" -7248,1804.07911,"Wasi Uddin Ahmad, Xueying Bai, Zhechao Huang, Chao Jiang, Nanyun Peng, - Kai-Wei Chang","Multi-task Learning for Universal Sentence Embeddings: A Thorough - Evaluation using Transfer and Auxiliary Tasks",cs.CL," Learning distributed sentence representations is one of the key challenges in -natural language processing. Previous work demonstrated that a recurrent neural -network (RNNs) based sentence encoder trained on a large collection of -annotated natural language inference data, is efficient in the transfer -learning to facilitate other related tasks. In this paper, we show that joint -learning of multiple tasks results in better generalizable sentence -representations by conducting extensive experiments and analysis comparing the -multi-task and single-task learned sentence encoders. The quantitative analysis -using auxiliary tasks show that multi-task learning helps to embed better -semantic information in the sentence representations compared to single-task -learning. In addition, we compare multi-task sentence encoders with -contextualized word representations and show that combining both of them can -further boost the performance of transfer learning. -" -7249,1804.07915,"Yun Chen, Victor O.K. Li, Kyunghyun Cho, Samuel R. Bowman",A Stable and Effective Learning Strategy for Trainable Greedy Decoding,cs.CL," Beam search is a widely used approximate search strategy for neural network -decoders, and it generally outperforms simple greedy decoding on tasks like -machine translation. However, this improvement comes at substantial -computational cost. 
In this paper, we propose a flexible new method that allows -us to reap nearly the full benefits of beam search with nearly no additional -computational cost. The method revolves around a small neural network actor -that is trained to observe and manipulate the hidden state of a -previously-trained decoder. To train this actor network, we introduce the use -of a pseudo-parallel corpus built using the output of beam search on a base -model, ranked by a target quality metric like BLEU. Our method is inspired by -earlier work on this problem, but requires no reinforcement learning, and can -be trained reliably on a range of models. Experiments on three parallel corpora -and three architectures show that the method yields substantial improvements in -translation quality and speed over each base system. -" -7250,1804.07918,Jonathan Herzig and Jonathan Berant,Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing,cs.CL cs.AI," Building a semantic parser quickly in a new domain is a fundamental challenge -for conversational interfaces, as current semantic parsers require expensive -supervision and lack the ability to generalize to new domains. In this paper, -we introduce a zero-shot approach to semantic parsing that can parse utterances -in unseen domains while only being trained on examples in other source domains. -First, we map an utterance to an abstract, domain-independent, logical form -that represents the structure of the logical form, but contains slots instead -of KB constants. Then, we replace slots with KB constants via lexical alignment -scores and global inference. Our model reaches an average accuracy of 53.4% on -7 domains in the Overnight dataset, substantially better than other zero-shot -baselines, and performs as well as a parser trained on over 30% of the target -domain examples. -" -7251,1804.07927,"Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, Karthik - Sankaranarayanan","DuoRC: Towards Complex Language Understanding with Paraphrased Reading - Comprehension",cs.CL," We propose DuoRC, a novel dataset for Reading Comprehension (RC) that -motivates several new challenges for neural approaches in language -understanding beyond those offered by existing RC datasets. DuoRC contains -186,089 unique question-answer pairs created from a collection of 7680 pairs of -movie plots where each pair in the collection reflects two versions of the same -movie - one from Wikipedia and the other from IMDb - written by two different -authors. We asked crowdsourced workers to create questions from one version of -the plot and a different set of workers to extract or synthesize answers from -the other version. This unique characteristic of DuoRC, where questions and -answers are created from different versions of a document narrating the same -underlying story, ensures by design that there is very little lexical overlap -between the questions created from one version and the segments containing the -answer in the other version. Further, since the two versions have different -levels of plot detail, narration style, vocabulary, etc., answering questions -from the second version requires deeper language understanding and -incorporating external background knowledge. Additionally, the narrative style -of passages arising from movie plots (as opposed to typical descriptive -passages in existing datasets) exhibits the need to perform complex reasoning -over events across multiple sentences.
Indeed, we observe that state-of-the-art -neural RC models which have achieved near-human performance on the SQuAD -dataset, even when coupled with traditional NLP techniques to address the -challenges presented in DuoRC, exhibit very poor performance (F1 score of 37.42% -on DuoRC vs. 86% on the SQuAD dataset). This opens up several interesting research -avenues wherein DuoRC could complement other RC datasets to explore novel -neural approaches for studying language understanding. -" -7252,1804.07942,"Zhaopeng Tu and Yong Jiang and Xiaojiang Liu and Lei Shu and Shuming - Shi",Generative Stock Question Answering,cs.CL," We study the problem of stock-related question answering (StockQA): -automatically generating answers to stock-related questions, just like -professional stock analysts providing action recommendations on stocks upon -users' requests. StockQA is quite different from previous QA tasks since (1) -the answers in StockQA are natural language sentences (rather than entities or -values) and due to the dynamic nature of StockQA, it is scarcely possible to -get reasonable answers in an extractive way from the training data; and (2) -StockQA requires properly analyzing the relationship between keywords in a QA -pair and the numerical features of a stock. We propose to address the problem -with a memory-augmented encoder-decoder architecture, and integrate different -mechanisms of number understanding and generation, which is a critical -component of StockQA. - We build a large-scale dataset containing over 180K StockQA instances, based -on which various technique combinations are extensively studied and compared. -Experimental results show that a hybrid word-character model with separate -character components for number processing achieves the best performance. By -analyzing the results, we found that 44.8% of answers generated by our best -model still suffer from the generic answer problem, which can be alleviated by -a straightforward hybrid retrieval-generation model. -" -7253,1804.07944,"Akash Srivastava, Charles Sutton",Variational Inference In Pachinko Allocation Machines,cs.CL cs.LG stat.ML," The Pachinko Allocation Machine (PAM) is a deep topic model that allows -representing rich correlation structures among topics by a directed acyclic -graph over topics. Because of the flexibility of the model, however, -approximate inference is very difficult. Perhaps for this reason, only a small -number of potential PAM architectures have been explored in the literature. In -this paper, we present an efficient and flexible amortized variational inference -method for PAM, using a deep inference network to parameterize the approximate -posterior distribution in a manner similar to the variational autoencoder. Our -inference method produces more coherent topics than state-of-the-art inference -methods for PAM while being an order of magnitude faster, which allows -exploration of a wider range of PAM architectures than have previously been -studied. -" -7254,1804.07946,Hwiyeol Jo and Stanley Jungkyu Choi,"Extrofitting: Enriching Word Representation and its Vector Space with - Semantic Lexicons",cs.CL cs.AI," We propose a post-processing method for enriching not only word representations -but also their vector space using semantic lexicons, which we call extrofitting. -The method consists of three steps: (i) expanding one or more dimension(s) -on all the word vectors, filling them with their representative value.
(ii) -Transferring semantic knowledge by averaging the representative values of -synonyms and filling them in the expanded dimension(s). These two steps make -representations of the synonyms close together. (iii) Projecting the vector -space using Linear Discriminant Analysis, which eliminates the expanded -dimension(s) with semantic knowledge. When experimenting with GloVe, we find -that our method outperforms Faruqui's retrofitting on some word similarity -tasks. We also report further analysis on our method with respect to word vector -dimensions, vocabulary size, as well as other well-known pretrained word vectors -(e.g., Word2Vec, Fasttext). -" -7255,1804.07954,M\u{a}d\u{a}lina Cozma and Andrei M. Butnaru and Radu Tudor Ionescu,Automated essay scoring with string kernels and word embeddings,cs.CL," In this work, we present an approach based on combining string kernels and -word embeddings for automatic essay scoring. String kernels capture the -similarity among strings based on counting common character n-grams, which are -a low-level yet powerful type of feature, demonstrating state-of-the-art -results in various text classification tasks such as Arabic dialect -identification or native language identification. To the best of our knowledge, we are -the first to apply string kernels to automatically score essays. We are also -the first to combine them with a high-level semantic feature representation, -namely the bag-of-super-word-embeddings. We report the best performance on the -Automated Student Assessment Prize data set, in both in-domain and cross-domain -settings, surpassing recent state-of-the-art deep learning approaches. -" -7256,1804.07958,"Sha Yuan, Yu Zhang, Jie Tang, Juan Bautista Cabot\`a",Expert Finding in Community Question Answering: A Review,cs.IR cs.CL," The recent rapid development of Community Question Answering (CQA) -satisfies users' quest for professional and personal knowledge about anything. -In CQA, one central issue is to find users with expertise and willingness to -answer the given questions. Expert finding in CQA often exhibits very different -challenges compared to traditional methods. Sparse data and new features -violate fundamental assumptions of traditional recommendation systems. This -paper focuses on reviewing and categorizing the current progress on expert -finding in CQA. We classify all the existing solutions into four different -categories: matrix factorization based models (MF-based models), gradient -boosting tree based models (GBT-based models), deep learning based models -(DL-based models) and ranking based models (R-based models). We find that -MF-based models outperform other categories of models in the field of expert -finding in CQA. Moreover, we use innovative diagrams to clarify several -important concepts of ensemble learning, and find that ensemble models built from -several specific single models can further boost the performance. Further, -we compare the performance of different models on different types of matching -tasks, including text vs. text, graph vs. text, audio vs. text and video vs. -text. The results can help with model selection for expert finding in practice. -Finally, we explore some potential future issues in expert finding research in -CQA.
-" -7257,1804.07961,Daniel Fern\'andez-Gonz\'alez and Carlos G\'omez-Rodr\'iguez,"Faster Shift-Reduce Constituent Parsing with a Non-Binary, Bottom-Up - Strategy",cs.CL," An increasingly wide range of artificial intelligence applications rely on -syntactic information to process and extract meaning from natural language text -or speech, with constituent trees being one of the most widely used syntactic -formalisms. To produce these phrase-structure representations from sentences in -natural language, shift-reduce constituent parsers have become one of the most -efficient approaches. Increasing their accuracy and speed is still one of the -main objectives pursued by the research community so that artificial -intelligence applications that make use of parsing outputs, such as machine -translation or voice assistant services, can improve their performance. With -this goal in mind, we propose in this article a novel non-binary shift-reduce -algorithm for constituent parsing. Our parser follows a classical bottom-up -strategy but, unlike others, it straightforwardly creates non-binary branchings -with just one Reduce transition, instead of requiring prior binarization or a -sequence of binary transitions, allowing its direct application to any language -without the need of further resources such as percolation tables. As a result, -it uses fewer transitions per sentence than existing transition-based -constituent parsers, becoming the fastest such system and, as a consequence, -speeding up downstream applications. Using static oracle training and greedy -search, the accuracy of this novel approach is on par with state-of-the-art -transition-based constituent parsers and outperforms all top-down and bottom-up -greedy shift-reduce systems on the Wall Street Journal section from the English -Penn Treebank and the Penn Chinese Treebank. Additionally, we develop a dynamic -oracle for training the proposed transition-based algorithm, achieving further -improvements in both benchmarks and obtaining the best accuracy to date on the -Penn Chinese Treebank among greedy shift-reduce parsers. -" -7258,1804.07972,"Ond\v{r}ej C\'ifka, Aliaksei Severyn, Enrique Alfonseca, Katja - Filippova","Eval all, trust a few, do wrong to none: Comparing sentence generation - models",cs.CL," In this paper, we study recent neural generative models for text generation -related to variational autoencoders. Previous works have employed various -techniques to control the prior distribution of the latent codes in these -models, which is important for sampling performance, but little attention has -been paid to reconstruction error. In our study, we follow a rigorous -evaluation protocol using a large set of previously used and novel automatic -and human evaluation metrics, applied to both generated samples and -reconstructions. We hope that it will become the new evaluation standard when -comparing neural generative models for text. -" -7259,1804.07976,"Rachel Rudinger, Adam Teichert, Ryan Culkin, Sheng Zhang, Benjamin Van - Durme",Neural-Davidsonian Semantic Proto-role Labeling,cs.CL," We present a model for semantic proto-role labeling (SPRL) using an adapted -bidirectional LSTM encoding strategy that we call ""Neural-Davidsonian"": -predicate-argument structure is represented as pairs of hidden states -corresponding to predicate and argument head tokens of the input sequence. 
We -demonstrate: (1) state-of-the-art results in SPRL, and (2) that our network -naturally shares parameters between attributes, allowing for learning new -attribute types with limited added supervision. -" -7260,1804.07983,"Douwe Kiela, Changhan Wang, Kyunghyun Cho",Dynamic Meta-Embeddings for Improved Sentence Representations,cs.CL," While one of the first steps in many NLP systems is selecting what -pre-trained word embeddings to use, we argue that such a step is better left -for neural networks to figure out by themselves. To that end, we introduce -dynamic meta-embeddings, a simple yet effective method for the supervised -learning of embedding ensembles, which leads to state-of-the-art performance -within the same model class on a variety of tasks. We subsequently show how the -technique can be used to shed new light on the usage of word embeddings in NLP -systems. -" -7261,1804.07998,"Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani - Srivastava, Kai-Wei Chang",Generating Natural Language Adversarial Examples,cs.CL," Deep neural networks (DNNs) are vulnerable to adversarial examples, -perturbations to correctly classified examples which can cause the model to -misclassify. In the image domain, these perturbations are often virtually -indistinguishable to human perception, causing humans and state-of-the-art -models to disagree. However, in the natural language domain, small -perturbations are clearly perceptible, and the replacement of a single word can -drastically alter the semantics of the document. Given these challenges, we use -a black-box population-based optimization algorithm to generate semantically -and syntactically similar adversarial examples that fool well-trained sentiment -analysis and textual entailment models with success rates of 97% and 70%, -respectively. We additionally demonstrate that 92.3% of the successful -sentiment analysis adversarial examples are classified to their original label -by 20 human annotators, and that the examples are perceptibly quite similar. -Finally, we discuss an attempt to use adversarial training as a defense, but -fail to yield improvement, demonstrating the strength and diversity of our -adversarial examples. We hope our findings encourage researchers to pursue -improving the robustness of DNNs in the natural language domain. -" -7262,1804.08000,"Sheng Zhang, Kevin Duh and Benjamin Van Durme","Fine-grained Entity Typing through Increased Discourse Context and - Adaptive Classification Thresholds",cs.CL," Fine-grained entity typing is the task of assigning fine-grained semantic -types to entity mentions. We propose a neural architecture which learns a -distributional semantic representation that leverages a greater amount of -semantic context -- both document and sentence level information -- than prior -work. We find that additional context improves performance, with further -improvements gained by utilizing adaptive classification thresholds. -Experiments show that our approach without reliance on hand-crafted features -achieves the state-of-the-art results on three benchmark datasets. 
-" -7263,1804.08012,"Ramy Baly, Mitra Mohtarami, James Glass, Lluis Marquez, Alessandro - Moschitti, Preslav Nakov",Integrating Stance Detection and Fact Checking in a Unified Corpus,cs.CL," A reasonable approach for fact checking a claim involves retrieving -potentially relevant documents from different sources (e.g., news websites, -social media, etc.), determining the stance of each document with respect to -the claim, and finally making a prediction about the claim's factuality by -aggregating the strength of the stances, while taking the reliability of the -source into account. Moreover, a fact checking system should be able to explain -its decision by providing relevant extracts (rationales) from the documents. -Yet, this setup is not directly supported by existing datasets, which treat -fact checking, document retrieval, source credibility, stance detection and -rationale extraction as independent tasks. In this paper, we support the -interdependencies between these tasks as annotations in the same corpus. We -implement this setup on an Arabic fact checking corpus, the first of its kind. -" -7264,1804.08037,"Sheng Zhang, Kevin Duh and Benjamin Van Durme",Cross-lingual Semantic Parsing,cs.CL," We introduce the task of cross-lingual semantic parsing: mapping content -provided in a source language into a meaning representation based on a target -language. We present: (1) a meaning representation designed to allow systems to -target varying levels of structural complexity (shallow to deep analysis), (2) -an evaluation metric to measure the similarity between system output and -reference meaning representations, (3) an end-to-end model with a novel copy -mechanism that supports intrasentential coreference, and (4) an evaluation -dataset where experiments show our model outperforms strong baselines by at -least 1.18 F1 score. -" -7265,1804.08049,"Afshin Rahimi, Trevor Cohn, Timothy Baldwin",Semi-supervised User Geolocation via Graph Convolutional Networks,cs.CL," Social media user geolocation is vital to many applications such as event -detection. In this paper, we propose GCN, a multiview geolocation model based -on Graph Convolutional Networks, that uses both text and network context. We -compare GCN to the state-of-the-art, and to two baselines we propose, and show -that our model achieves or is competitive with the state- of-the-art over three -benchmark geolocation datasets when sufficient supervision is available. We -also evaluate GCN under a minimal supervision scenario, and show it outperforms -baselines. We find that highway network gates are essential for controlling the -amount of useful neighbourhood expansion in GCN. -" -7266,1804.08050,"Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda",Multi-Head Decoder for End-to-End Speech Recognition,cs.CL," This paper presents a new network architecture called multi-head decoder for -end-to-end speech recognition as an extension of a multi-head attention model. -In the multi-head attention model, multiple attentions are calculated, and -then, they are integrated into a single attention. On the other hand, instead -of the integration in the attention level, our proposed method uses multiple -decoders for each attention and integrates their outputs to generate a final -output. Furthermore, in order to make each head to capture the different -modalities, different attention functions are used for each head, leading to -the improvement of the recognition performance with an ensemble effect. 
To
-evaluate the effectiveness of our proposed method, we conduct an experimental
-evaluation using the Corpus of Spontaneous Japanese. Experimental results
-demonstrate that our proposed method outperforms the conventional methods such
-as location-based and multi-head attention models, and that it can capture
-different speech/linguistic contexts within the attention-based encoder-decoder
-framework.
-"
-7267,1804.08053,"Tanner Bohn, Yining Hu, Jinhang Zhang, Charles X. Ling",Learning Sentence Embeddings for Coherence Modelling and Beyond,cs.CL," We present a novel and effective technique for performing text coherence
-tasks while facilitating deeper insights into the data. Despite obtaining
-ever-increasing task performance, modern deep-learning approaches to NLP tasks
-often only provide users with the final network decision and no additional
-understanding of the data. In this work, we show that a new type of sentence
-embedding learned through self-supervision can be applied effectively to text
-coherence tasks while serving as a window through which deeper understanding of
-the data can be obtained. To produce these sentence embeddings, we train a
-recurrent neural network to take individual sentences and predict their
-location in a document in the form of a distribution over locations. We
-demonstrate that these embeddings, combined with simple visual heuristics, can
-be used to achieve performance competitive with state-of-the-art on multiple
-text coherence tasks, outperforming more complex and specialized approaches.
-Additionally, we demonstrate that these embeddings can provide insights useful
-to writers for improving writing quality and informing document structuring,
-and assisting readers in summarizing and locating information.
-"
-7268,1804.08057,"Md Faisal Mahbub Chowdhury and Vijil Chenthamarakshan and Rishav
- Chakravarti and Alfio M. Gliozzo","A Study on Passage Re-ranking in Embedding based Unsupervised Semantic
- Search",cs.CL cs.IR," State of the art approaches for (embedding based) unsupervised semantic
-search exploit either compositional similarity (of a query and a passage) or
-pair-wise word (or term) similarity (from the query and the passage). By
-design, word based approaches do not incorporate similarity in the larger
-context (query/passage), while compositional similarity based approaches are
-usually unable to take advantage of the most important cues in the context. In
-this paper we propose a new compositional similarity based approach, called
-variable centroid vector (VCVB), that tries to address both of these
-limitations. We also present results using a different type of compositional
-similarity based approach by exploiting universal sentence embedding. We
-provide empirical evaluation on two different benchmarks.
-"
-7269,1804.08058,"Xiao Yang, Madian Khabsa, Miaosen Wang, Wei Wang, Ahmed
- Awadallah, Daniel Kifer, C. Lee Giles","Adversarial Training for Community Question Answer Selection Based on
- Multi-scale Matching",cs.CL," Community-based question answering (CQA) websites represent an important
-source of information. As a result, the problem of matching the most valuable
-answers to their corresponding questions has become an increasingly popular
-research topic. We frame this task as a binary (relevant/irrelevant)
-classification problem, and present an adversarial training framework to
-alleviate the label imbalance issue.
We employ a generative model to iteratively
-sample a subset of challenging negative samples to fool our classification
-model. Both models are alternatively optimized using the REINFORCE algorithm.
-The proposed method is completely different from previous ones, where negative
-samples in the training set are directly used or uniformly down-sampled.
-Further, we propose using Multi-scale Matching which explicitly inspects the
-correlation between words and n-grams of different levels of granularity. We
-evaluate the proposed method on SemEval 2016 and SemEval 2017 datasets and
-achieve state-of-the-art or similar performance.
-"
-7270,1804.08064,"Young-Bum Kim, Dongchan Kim, Joo-Kyung Kim, Ruhi Sarikaya","A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain
- Classification in Natural Language Understanding",cs.CL," Intelligent personal digital assistants (IPDAs), a popular real-life
-application with spoken language understanding capabilities, can cover
-potentially thousands of overlapping domains for natural language
-understanding, and the task of finding the best domain to handle an utterance
-becomes a challenging problem on a large scale. In this paper, we propose a set
-of efficient and scalable neural shortlisting-reranking models for large-scale
-domain classification in IPDAs. The shortlisting stage focuses on efficiently
-trimming all domains down to a list of k-best candidate domains, and the
-reranking stage performs a list-wise reranking of the initial k-best domains
-with additional contextual information. We show the effectiveness of our
-approach with extensive experiments on 1,500 IPDA domains.
-"
-7271,1804.08065,"Young-Bum Kim, Dongchan Kim, Anjishnu Kumar, Ruhi Sarikaya",Efficient Large-Scale Domain Classification with Personalized Attention,cs.CL," In this paper, we explore the task of mapping spoken language utterances to
-one of thousands of natural language understanding domains in intelligent
-personal digital assistants (IPDAs). This scenario is observed for many
-mainstream IPDAs in industry that allow third parties to develop thousands of
-new domains to augment built-in ones to rapidly increase domain coverage and
-overall IPDA capabilities. We propose a scalable neural model architecture with
-a shared encoder, a novel attention mechanism that incorporates personalization
-information and domain-specific classifiers that solves the problem
-efficiently. Our architecture is designed to efficiently accommodate new
-domains that appear in-between full model retraining cycles with a rapid
-bootstrapping mechanism two orders of magnitude faster than retraining. We
-account for practical constraints in real-time production systems, and design
-to minimize memory footprint and runtime latency. We demonstrate that
-incorporating personalization results in significantly more accurate domain
-classification in the setting with thousands of overlapping domains.
-"
-7272,1804.08069,"Tiancheng Zhao, Kyusong Lee and Maxine Eskenazi","Unsupervised Discrete Sentence Representation Learning for Interpretable
- Neural Dialog Generation",cs.CL cs.AI," The encoder-decoder dialog model is one of the most prominent methods used to
-build dialog systems in complex domains. Yet it is limited because it cannot
-output interpretable actions as in traditional systems, which hinders humans
-from understanding its generation process.
We present an unsupervised discrete
-sentence representation learning method that can integrate with any existing
-encoder-decoder dialog models for interpretable response generation. Building
-upon variational autoencoders (VAEs), we present two novel models, DI-VAE and
-DI-VST that improve VAEs and can discover interpretable semantics via either
-auto encoding or context predicting. Our methods have been validated on
-real-world dialog datasets to discover semantic representations and enhance
-encoder-decoder models with interpretable generation.
-"
-7273,1804.08077,"Fenfei Guo, Mohit Iyyer, Jordan Boyd-Graber",Inducing and Embedding Senses with Scaled Gumbel Softmax,cs.CL," Methods for learning word sense embeddings represent a single word with
-multiple sense-specific vectors. These methods should not only produce
-interpretable sense embeddings, but should also learn how to select which sense
-to use in a given context. We propose an unsupervised model that learns sense
-embeddings using a modified Gumbel softmax function, which allows for
-differentiable discrete sense selection. Our model produces sense embeddings
-that are competitive (and sometimes state of the art) on multiple similarity
-based downstream evaluations. However, performance on these downstream
-evaluation tasks does not correlate with interpretability of sense embeddings,
-as we discover through an interpretability comparison with competing
-multi-sense embeddings. While many previous approaches perform well on
-downstream evaluations, they do not produce interpretable embeddings and learn
-duplicated sense groups; our method achieves the best of both worlds.
-"
-7274,1804.08094,"Edison Marrese-Taylor, Suzana Ilic, Jorge A. Balazs, Yutaka Matsuo,
- Helmut Prendinger",IIIDYT at SemEval-2018 Task 3: Irony detection in English tweets,cs.CL," In this paper we introduce our system for the task of Irony detection in
-English tweets, a part of SemEval 2018. We propose a representation learning
-approach that relies on a multi-layered bidirectional LSTM, without using
-external features that provide additional semantic information. Although our
-model is able to outperform the baseline in the validation set, our results
-show limited generalization power over the test set. Given the limited size of
-the dataset, we think the usage of more pre-training schemes would greatly
-improve the obtained results.
-"
-7275,1804.08117,Masatoshi Tsuchiya,"Performance Impact Caused by Hidden Bias of Training Data for
- Recognizing Textual Entailment",cs.CL cs.AI," The quality of training data is one of the crucial problems when a
-learning-centered approach is employed. This paper proposes a new method to
-investigate the quality of a large corpus designed for the recognizing textual
-entailment (RTE) task. The proposed method, which is inspired by a statistical
-hypothesis test, consists of two phases: the first phase is to introduce the
-predictability of textual entailment labels as a null hypothesis which should
-be extremely unlikely to hold if a target corpus has no hidden bias, and the
-second phase is to test the null hypothesis using a Naive Bayes model. The
-experimental result on the Stanford Natural Language Inference (SNLI) corpus
-does not reject the null hypothesis. Therefore, it indicates that the SNLI
-corpus has a hidden bias which allows prediction of textual entailment labels
-from hypothesis sentences even if no context information is given by a premise
-sentence.
This paper also presents the performance impact of NN models for RTE
-caused by this hidden bias.
-"
-7276,1804.08125,"Jeff Mitchell, Sebastian Riedel","Reduce, Reuse, Recycle: New uses for old QA resources",cs.CL," We investigate applying repurposed generic QA data and models to a recently
-proposed relation extraction task. We find that training on SQuAD produces
-better zero-shot performance and more robust generalisation compared to the
-task specific training set. We also show that standard QA architectures (e.g.
-FastQA or BiDAF) can be applied to the slot filling queries without the need
-for model modification.
-"
-7277,1804.08139,"Renjie Zheng, Junkun Chen, Xipeng Qiu","Same Representation, Different Attentions: Shareable Sentence
- Representation Learning from Multiple Tasks",cs.CL cs.AI," Distributed representation plays an important role in deep learning based
-natural language processing. However, the representation of a sentence often
-varies in different tasks, which is usually learned from scratch and suffers
-from the limited amounts of training data. In this paper, we claim that a good
-sentence representation should be invariant and can benefit the various
-subsequent tasks. To achieve this purpose, we propose a new scheme of
-information sharing for multi-task learning. More specifically, all tasks share
-the same sentence representation and each task can select the task-specific
-information from the shared sentence representation with an attention mechanism.
-The query vector of each task's attention could be either static parameters or
-generated dynamically. We conduct extensive experiments on 16 different text
-classification tasks, which demonstrate the benefits of our architecture.
-"
-7278,1804.08166,Dongxu Zhang and Zhichao Yang,Word Embedding Perturbation for Sentence Classification,cs.CL," In this technical report, we aim to mitigate the overfitting problem in
-natural language processing by applying data augmentation methods.
-Specifically, we apply several types of noise to perturb the input word
-embedding, such as Gaussian noise, Bernoulli noise, and adversarial noise. We
-also apply several constraints on different types of noise. By implementing
-these proposed data augmentation methods, the baseline models can gain
-improvements on several sentence classification tasks.
-"
-7279,1804.08186,"Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister
- Lind\'en",Automatic Language Identification in Texts: A Survey,cs.CL," Language identification (LI) is the problem of determining the natural
-language that a document or part thereof is written in. Automatic LI has been
-extensively researched for over fifty years. Today, LI is a key part of many
-text processing pipelines, as text processing techniques generally assume that
-the language of the input text is known. Research in this area has recently
-been especially active. This article provides a brief history of LI research,
-and an extensive survey of the features and methods used so far in the LI
-literature. For describing the features and methods we introduce a unified
-notation. We discuss evaluation methods, applications of LI, as well as
-off-the-shelf LI systems that do not require training by the end user. Finally,
-we identify open issues, survey the work to date on each issue, and propose
-future directions for research in LI.
-" -7280,1804.08198,"Yichao Lu, Phillip Keung, Faisal Ladhak, Vikas Bhardwaj, Shaonan - Zhang, Jason Sun",A neural interlingua for multilingual machine translation,cs.CL," We incorporate an explicit neural interlingua into a multilingual -encoder-decoder neural machine translation (NMT) architecture. We demonstrate -that our model learns a language-independent representation by performing -direct zero-shot translation (without using pivot translation), and by using -the source sentence embeddings to create an English Yelp review classifier -that, through the mediation of the neural interlingua, can also classify French -and German reviews. Furthermore, we show that, despite using a smaller number -of parameters than a pairwise collection of bilingual NMT models, our approach -produces comparable BLEU scores for each language pair in WMT15. -" -7281,1804.08199,"Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew - McCallum",Linguistically-Informed Self-Attention for Semantic Role Labeling,cs.CL," Current state-of-the-art semantic role labeling (SRL) uses a deep neural -network with no explicit linguistic features. However, prior work has shown -that gold syntax trees can dramatically improve SRL decoding, suggesting the -possibility of increased accuracy from explicit modeling of syntax. In this -work, we present linguistically-informed self-attention (LISA): a neural -network model that combines multi-head self-attention with multi-task learning -across dependency parsing, part-of-speech tagging, predicate detection and SRL. -Unlike previous models which require significant pre-processing to prepare -linguistic features, LISA can incorporate syntax using merely raw tokens as -input, encoding the sequence only once to simultaneously perform parsing, -predicate detection and role labeling for all predicates. Syntax is -incorporated by training one attention head to attend to syntactic parents for -each token. Moreover, if a high-quality syntactic parse is already available, -it can be beneficially injected at test time without re-training our SRL model. -In experiments on CoNLL-2005 SRL, LISA achieves new state-of-the-art -performance for a model using predicted predicates and standard word -embeddings, attaining 2.5 F1 absolute higher than the previous state-of-the-art -on newswire and more than 3.5 F1 on out-of-domain data, nearly 10% reduction in -error. On ConLL-2012 English SRL we also show an improvement of more than 2.5 -F1. LISA also out-performs the state-of-the-art with contextually-encoded -(ELMo) word representations, by nearly 1.0 F1 on news and more than 2.0 F1 on -out-of-domain text. -" -7282,1804.08204,"Jatin Ganhotra, Lazaros Polymenakos",Knowledge-based end-to-end memory networks,cs.CL cs.AI," End-to-end dialog systems have become very popular because they hold the -promise of learning directly from human to human dialog interaction. Retrieval -and Generative methods have been explored in this area with mixed results. A -key element that is missing so far, is the incorporation of a-priori knowledge -about the task at hand. This knowledge may exist in the form of structured or -unstructured information. As a first step towards this direction, we present a -novel approach, Knowledge based end-to-end memory networks (KB-memN2N), which -allows special handling of named entities for goal-oriented dialog tasks. We -present results on two datasets, DSTC6 challenge dataset and dialog bAbI tasks. -" -7283,1804.08205,Sabrina J. 
Mielke and Jason Eisner,"Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model",cs.CL," We show how the spellings of known words can help us deal with unknown words
-in open-vocabulary NLP tasks. The method we propose can be used to extend any
-closed-vocabulary generative model, but in this paper we specifically consider
-the case of neural language modeling. Our Bayesian generative story combines a
-standard RNN language model (generating the word tokens in each sentence) with
-an RNN-based spelling model (generating the letters in each word type). These
-two RNNs respectively capture sentence structure and word structure, and are
-kept separate as in linguistics. By invoking the second RNN to generate
-spellings for novel words in context, we obtain an open-vocabulary language
-model. For known words, embeddings are naturally inferred by combining evidence
-from type spelling and token context. Compared to baselines (including a novel
-strong baseline), we beat previous work and establish state-of-the-art results
-on multiple datasets.
-"
-7284,1804.08207,"Adam Poliak, Aparajita Haldar, Rachel Rudinger, J. Edward Hu, Ellie
- Pavlick, Aaron Steven White, Benjamin Van Durme","Collecting Diverse Natural Language Inference Problems for Sentence
- Representation Evaluation",cs.CL," We present a large-scale collection of diverse natural language inference
-(NLI) datasets that help provide insight into how well a sentence
-representation captures distinct types of reasoning. The collection results
-from recasting 13 existing datasets from 7 semantic phenomena into a common NLI
-structure, resulting in over half a million labeled context-hypothesis pairs in
-total. We refer to our collection as the DNC: Diverse Natural Language
-Inference Collection. The DNC is available online at https://www.decomp.net,
-and will grow over time as additional resources are recast and added from novel
-sources.
-"
-7285,1804.08217,Andrea Madotto and Chien-Sheng Wu and Pascale Fung,"Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End
- Task-Oriented Dialog Systems",cs.CL," End-to-end task-oriented dialog systems usually suffer from the challenge of
-incorporating knowledge bases. In this paper, we propose a novel yet simple
-end-to-end differentiable model called memory-to-sequence (Mem2Seq) to address
-this issue. Mem2Seq is the first neural generative model that combines the
-multi-hop attention over memories with the idea of pointer networks. We
-empirically show how Mem2Seq controls each generation step, and how its
-multi-hop attention mechanism helps in learning correlations between memories.
-In addition, our model is quite general without complicated task-specific
-designs. As a result, we show that Mem2Seq can be trained faster and attain
-state-of-the-art performance on three different task-oriented dialog datasets.
-"
-7286,1804.08228,"Yijia Liu, Yi Zhu, Wanxiang Che, Bing Qin, Nathan Schneider, Noah A.
- Smith",Parsing Tweets into Universal Dependencies,cs.CL," We study the problem of analyzing tweets with Universal Dependencies. We
-extend the UD guidelines to cover special constructions in tweets that affect
-tokenization, part-of-speech tagging, and labeled dependencies. Using the
-extended guidelines, we create a new tweet treebank for English (Tweebank v2)
-that is four times larger than the (unlabeled) Tweebank v1 introduced by Kong
-et al. (2014).
We characterize the disagreements between our annotators and
-show that it is challenging to deliver consistent annotation due to ambiguity
-in understanding and explaining tweets. Nonetheless, using the new treebank, we
-build a pipeline system to parse raw tweets into UD. To overcome annotation
-noise without sacrificing computational efficiency, we propose a new method to
-distill an ensemble of 20 transition-based parsers into a single one. Our
-parser achieves an improvement of 2.2 in LAS over the un-ensembled baseline and
-outperforms parsers that are state-of-the-art on other treebanks in both
-accuracy and speed.
-"
-7287,1804.08234,"Muhmmad Al-Khiza'ay, Noora Alallaq, Qusay Alanoz, Adil Al-Azzawi,
- N.Maheswari","PeRView: A Framework for Personalized Review Selection Using
- Micro-Reviews",cs.IR cs.CL cs.SI," In the contemporary era, social media influences people's decision making.
-The proliferation of online reviews with diversified and verbose content often
-causes problems in accurate decision making. Since online reviews have an
-impact on people from all walks of life when making decisions, choosing
-appropriate reviews based on personalized consistency is very important, since
-it relies on the consistency of such micro-reviews to evaluate the review set
-selection. Micro-reviews are very concise and talk directly about a product or
-service instead of containing unnecessary verbose content. Thus, micro-reviews
-can help in choosing reviews based on their personalized consistency, which is
-related directly or indirectly to the main profile of the reviews. Personalized
-review selection that is highly relevant, with high personalized coverage in
-terms of matching with micro-reviews, is the main problem considered in this
-paper. Furthermore, personalization with user preferences during review
-selection is also considered, based on personalized user profiles. Towards this
-end, we propose a framework known as PeRView for personalized review selection
-using micro-reviews, based on a proposed evaluation metric approach considering
-two main factors (personalized matching score and subset size). A Personalized
-Review Selection Algorithm (PRSA) is proposed which merges multiple similarity
-measures into a highly efficient personalized review matching function for
-selection. Experimental results, based on a review dataset collected from
-Yelp.com and a micro-review dataset obtained from Foursquare.com, show that
-personalized review selection is a very practical problem to study.
-"
-7288,1804.08261,"Zhongliang Yang, Yongfeng Huang, Yiran Jiang, Yuxi Sun, Yu-Jin Zhan,
- Pengcheng Luo","Clinical Assistant Diagnosis for Electronic Medical Record Based on
- Convolutional Neural Network",cs.CL cs.CY cs.LG," Automatically extracting useful information from electronic medical records
-along with conducting disease diagnoses is a promising task for both clinical
-decision support (CDS) and natural language processing (NLP). Most of the
-existing systems are based on artificially constructed knowledge bases, and
-then auxiliary diagnosis is done by rule matching. In this study, we present a
-clinical intelligent decision approach based on Convolutional Neural
-Networks (CNN), which can automatically extract high-level semantic information
-from electronic medical records and then perform automatic diagnosis without
-artificial construction of rules or knowledge bases.
We use a collection of
-18,590 real-world clinical electronic medical records to train and test the
-proposed model. Experimental results show that the proposed model can
-achieve 98.67\% accuracy and 96.02\% recall, which strongly supports that using
-a convolutional neural network to automatically learn high-level semantic
-features of electronic medical records and then conduct assisted diagnosis is
-feasible and effective.
-"
-7289,1804.08262,"Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner",On the Diachronic Stability of Irregularity in Inflectional Morphology,cs.CL," Many languages' inflectional morphological systems are replete with
-irregulars, i.e., words that do not seem to follow standard inflectional rules.
-In this work, we quantitatively investigate the conditions under which
-irregulars can survive in a language over the course of time. Using recurrent
-neural networks to simulate language learners, we test the diachronic relation
-between frequency of words and their irregularity.
-"
-7290,1804.08266,"Tim Niven, Hung-Yu Kao","NLITrans at SemEval-2018 Task 12: Transfer of Semantic Knowledge for
- Argument Comprehension",cs.CL," The Argument Reasoning Comprehension Task requires significant language
-understanding and complex reasoning over world knowledge. We focus on transfer
-of a sentence encoder to bootstrap more complicated models given the small size
-of the dataset. Our best model uses a pre-trained BiLSTM to encode input
-sentences, learns task-specific features for the argument and warrants, then
-performs independent argument-warrant matching. This model achieves mean test
-set accuracy of 64.43%. Encoder transfer yields a significant gain to our best
-model over random initialization. Independent warrant matching effectively
-doubles the size of the dataset and provides additional regularization. We
-demonstrate that regularization comes from ignoring statistical correlations
-between warrant features and position. We also report an experiment with our
-best model that only matches warrants to reasons, ignoring claims. Relatively
-low performance degradation suggests that our model is not necessarily learning
-the intended task.
-"
-7291,1804.08280,"Ji Ho Park, Peng Xu, Pascale Fung","PlusEmo2Vec at SemEval-2018 Task 1: Exploiting emotion knowledge from
- emoji and #hashtags",cs.CL," This paper describes our system submitted to SemEval-2018 Task
-1: Affect in Tweets (AIT) to solve five subtasks. We focus on modeling both
-sentence and word level representations of emotion inside texts through large
-distantly labeled corpora with emojis and hashtags. We transfer the emotional
-knowledge by exploiting neural network models as feature extractors and use
-these representations for traditional machine learning models such as support
-vector regression (SVR) and logistic regression to solve the competition tasks.
-Our system placed among the top 3 for all subtasks in which we participated.
-"
-7292,1804.08313,"Diego Marcheggiani, Jasmijn Bastings, Ivan Titov","Exploiting Semantics in Neural Machine Translation with Graph
- Convolutional Networks",cs.CL," Semantic representations have long been argued as potentially useful for
-enforcing meaning preservation and improving generalization performance of
-machine translation methods. In this work, we are the first to incorporate
-information about predicate-argument structure of source sentences (namely,
-semantic-role representations) into neural machine translation.
We use Graph
-Convolutional Networks (GCNs) to inject a semantic bias into sentence encoders
-and achieve improvements in BLEU scores over the linguistic-agnostic and
-syntax-aware versions on the English--German language pair.
-"
-7293,1804.08316,"J.Goikoetxea, A.Soroa, E.Agirre",Bilingual Embeddings with Random Walks over Multilingual Wordnets,cs.CL cs.AI," Bilingual word embeddings represent words of two languages in the same space,
-and allow knowledge to be transferred from one language to the other without
-machine translation. The main approach is to train monolingual embeddings first
-and then map them using bilingual dictionaries. In this work, we present a
-novel method to learn bilingual embeddings based on multilingual knowledge
-bases (KB) such as WordNet. Our method extracts bilingual information from
-multilingual wordnets via random walks and learns a joint embedding space in
-one go. We further reinforce cross-lingual equivalence adding bilingual
-constraints in the loss function of the popular skipgram model. Our experiments
-involve twelve cross-lingual word similarity and relatedness datasets in six
-language pairs covering four languages, and show that: 1) random walks over
-multilingual wordnets improve results over just using dictionaries; 2)
-multilingual wordnets on their own improve over text-based systems in
-similarity datasets; 3) the good results are consistent for large wordnets
-(e.g. English, Spanish), smaller wordnets (e.g. Basque) or loosely aligned
-wordnets (e.g. Italian); 4) the combination of wordnets and text yields the
-best results, above mapping-based approaches. Our method can be applied to
-richer KBs like DBpedia or BabelNet, and can be easily extended to multilingual
-embeddings. All software and resources are open source.
-"
-7294,1804.08338,"Yibo Sun, Duyu Tang, Nan Duan, Jianshu Ji, Guihong Cao, Xiaocheng
- Feng, Bing Qin, Ting Liu, Ming Zhou",Semantic Parsing with Syntax- and Table-Aware SQL Generation,cs.CL," We present a generative model to map natural language questions into SQL
-queries. Existing neural network based approaches typically generate a SQL
-query word-by-word, however, a large portion of the generated results are
-incorrect or not executable due to the mismatch between question words and
-table contents. Our approach addresses this problem by considering the
-structure of the table and the syntax of the SQL language. The quality of the
-generated SQL query is significantly improved through (1) learning to replicate
-content from column names, cells or SQL keywords; and (2) improving the
-generation of the WHERE clause by leveraging the column-cell relation.
-Experiments are conducted on WikiSQL, a recently released dataset with the
-largest number of question-SQL pairs. Our approach significantly improves the
-state-of-the-art execution accuracy from 69.0% to 74.4%.
-"
-7295,1804.08420,"Qiang Ning, Zhongzhi Yu, Chuchu Fan, Dan Roth",Exploiting Partially Annotated Data for Temporal Relation Extraction,cs.CL cs.LG stat.ML," Annotating temporal relations (TempRel) between events described in natural
-language is known to be labor intensive, partly because the total number of
-TempRels is quadratic in the number of events. As a result, only a small number
-of documents are typically annotated, limiting the coverage of various
-lexical/semantic phenomena. In order to improve existing approaches, one
-possibility is to make use of the readily available, partially annotated data
-(P as in partial) that cover more documents.
However, missing annotations in P
-are known to hurt, rather than help, existing systems. This work is a case
-study in exploring various usages of P for TempRel extraction. Results show
-that despite missing annotations, P is still a useful supervision signal for
-this task within a constrained bootstrapping learning framework. The system
-described in this paper is publicly available.
-"
-7296,1804.08426,"Tyler Renslow and G\""unter Neumann","LightRel SemEval-2018 Task 7: Lightweight and Fast Relation
- Classification",cs.CL cs.AI," We present LightRel, a lightweight and fast relation classifier. Our goal is
-to develop a high baseline for different relation extraction tasks. By defining
-only very few data-internal, word-level features and external knowledge sources
-in the form of word clusters and word embeddings, we train a fast and simple
-linear classifier.
-"
-7297,1804.08438,"Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda,
- Daisuke Saito, Fernando Villavicencio, Zhenhua Ling","A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging
- from Spoofing Countermeasures for Speech Artifact Assessment",eess.AS cs.CL cs.SD stat.ML," Voice conversion (VC) aims at conversion of speaker characteristics without
-altering content. Due to training data limitations and modeling imperfections,
-it is difficult to achieve believable speaker mimicry without introducing
-processing artifacts; performance assessment of VC, therefore, usually involves
-both speaker similarity and quality evaluation by a human panel. As a
-time-consuming, expensive, and non-reproducible process, it hinders rapid
-prototyping of new VC technology. We address artifact assessment using an
-alternative, objective approach leveraging from prior work on spoofing
-countermeasures (CMs) for automatic speaker verification. Therein, CMs are used
-for rejecting `fake' inputs such as replayed, synthetic or converted speech but
-their potential for automatic speech artifact assessment remains unknown. This
-study serves to fill that gap. As a supplement to subjective results for the
-2018 Voice Conversion Challenge (VCC'18) data, we configure a standard
-constant-Q cepstral coefficient CM to quantify the extent of processing
-artifacts. The equal error rate (EER) of the CM, a confusability index of VC
-samples with real human speech, serves as our artifact measure. Two clusters of
-VCC'18 entries are identified: low-quality ones with detectable artifacts (low
-EERs), and higher quality ones with fewer artifacts. None of the VCC'18 systems,
-however, is perfect: all EERs are < 30 % (the `ideal' value would be 50 %). Our
-preliminary findings suggest potential of CMs outside of their original
-application, as a supplemental optimization and benchmarking tool to enhance VC
-technology.
-"
-7298,1804.08454,"Akilesh B, Abhishek Sinha, Mausoom Sarkar, Balaji Krishnamurthy","Attention Based Natural Language Grounding by Navigating Virtual
- Environment",cs.CL cs.AI cs.CV cs.LG," In this work, we focus on the problem of grounding language by training an
-agent to follow a set of natural language instructions and navigate to a target
-object in an environment. The agent receives visual information through raw
-pixels and a natural language instruction telling what task needs to be
-achieved and is trained in an end-to-end way. We develop an attention mechanism
-for multi-modal fusion of visual and textual modalities that allows the agent
-to learn to complete the task and achieve language grounding.
Our experimental
-results show that our attention mechanism outperforms the existing multi-modal
-fusion mechanisms proposed for both 2D and 3D environments in order to solve
-the above-mentioned task in terms of both speed and success rate. We show that
-the learnt textual representations are semantically meaningful as they follow
-vector arithmetic in the embedding space. The effectiveness of our attention
-approach over the contemporary fusion mechanisms is also highlighted from the
-textual embeddings learnt by the different approaches. We also show that our
-model generalizes effectively to unseen scenarios and exhibits zero-shot
-generalization capabilities both in 2D and 3D environments. The code for our 2D
-environment as well as the models that we developed for both 2D and 3D are
-available at https://github.com/rl-lang-grounding/rl-lang-ground.
-"
-7299,1804.08460,"Daniil Sorokin, Iryna Gurevych","Mixing Context Granularities for Improved Entity Linking on Question
- Answering Data across Entity Categories",cs.CL," The first stage of every knowledge base question answering approach is to
-link entities in the input question. We investigate entity linking in the
-context of a question answering task and present a jointly optimized neural
-architecture for entity mention detection and entity disambiguation that models
-the surrounding context on different levels of granularity. We use the Wikidata
-knowledge base and available question answering datasets to create benchmarks
-for entity linking on question answering data. Our approach outperforms the
-previous state-of-the-art system on this data, resulting in an average 8%
-improvement of the final score. We further demonstrate that our model delivers
-a strong performance across different entity categories.
-"
-7300,1804.08477,"Zied Elloumi and Laurent Besacier and Olivier Galibert and Juliette
- Kahn and Benjamin Lecouteux","ASR Performance Prediction on Unseen Broadcast Programs using
- Convolutional Neural Networks",cs.CL," In this paper, we address a relatively new task: prediction of ASR
-performance on unseen broadcast programs. We first propose a heterogeneous
-French corpus dedicated to this task. Two prediction approaches are compared: a
-state-of-the-art performance prediction based on regression (engineered
-features) and a new strategy based on convolutional neural networks (learnt
-features). We particularly focus on the combination of both textual (ASR
-transcription) and signal inputs. While the joint use of textual and signal
-features did not work for the regression baseline, the combination of inputs
-for CNNs leads to the best WER prediction performance. We also show that our
-CNN prediction remarkably predicts the WER distribution on a collection of
-speech recordings.
-"
-7301,1804.08559,"Srijan Kumar, Neil Shah",False Information on Web and Social Media: A Survey,cs.SI cs.CL cs.CY cs.DL," False information can be created and spread easily through the web and social
-media platforms, resulting in widespread real-world impact. Characterizing how
-false information proliferates on social platforms and why it succeeds in
-deceiving readers are critical to developing efficient detection algorithms and
-tools for early detection. A recent surge of research in this area has aimed to
-address the key issues using methods based on feature engineering, graph
-mining, and information modeling.
The majority of the research has primarily
-focused on two broad categories of false information: opinion-based (e.g., fake
-reviews), and fact-based (e.g., false news and hoaxes). Therefore, in this
-work, we present a comprehensive survey spanning diverse aspects of false
-information, namely (i) the actors involved in spreading false information,
-(ii) rationale behind successfully deceiving readers, (iii) quantifying the
-impact of false information, (iv) measuring its characteristics across
-different dimensions, and finally, (v) algorithms developed to detect false
-information. In doing so, we create a unified framework to describe these
-recent methods and highlight a number of important directions for future
-research.
-"
-7302,1804.08666,"Christopher Mitcheltree, Skyler Wharton, and Avneesh Saluja","Using Aspect Extraction Approaches to Generate Review Summaries and User
- Profiles",cs.CL," Reviews of products or services on Internet marketplace websites contain a
-rich amount of information. Users often wish to survey reviews or review
-snippets from the perspective of a certain aspect, which has resulted in a
-large body of work on aspect identification and extraction from such corpora.
-In this work, we evaluate a newly-proposed neural model for aspect extraction
-on two practical tasks. The first is to extract canonical sentences of various
-aspects from reviews, and is judged by human evaluators against alternatives. A
-$k$-means baseline does remarkably well in this setting. The second experiment
-focuses on the suitability of the recovered aspect distributions to represent
-users by the reviews they have written. Through a set of review reranking
-experiments, we find that aspect-based profiles can largely capture notions of
-user preferences, by showing that divergent users generate markedly different
-review rankings.
-"
-7303,1804.08675,"Aniket Jain, Bhavya Sharma, Paridhi Choudhary, Rohan Sangave, William
- Yang",Data-Driven Investigative Journalism For Connectas Dataset,cs.CL," The following paper explores the possibility of using Machine Learning
-algorithms to detect cases of corruption and malpractice by governments. The
-dataset used by the authors contains information about several government
-contracts in Colombia from 2007 to 2012. The authors begin by exploring
-and cleaning the data, after which they perform feature engineering
-before finally implementing Machine Learning models to detect anomalies in the
-given dataset.
-"
-7304,1804.08749,Amir Bakarov,"Can Eye Movement Data Be Used As Ground Truth For Word Embeddings
- Evaluation?",cs.CL," In recent years, a certain success in the task of modeling lexical semantics
-has been obtained with distributional semantic models. Nevertheless, the
-scientific community is still unsure which evaluation method is the most
-reliable for these models. Some researchers argue that the only possible gold
-standard could be obtained from neuro-cognitive resources that store
-information about human cognition. One such resource is eye movement data on
-silent reading. The goal of this work is to test whether such data can be used
-to evaluate distributional semantic models on different languages.
We propose
-experiments with English and Russian eye movement datasets (Provo Corpus, GECO
-and Russian Sentence Corpus), word vectors (Skip-Gram models trained on
-national corpora and Web corpora) and word similarity datasets of Russian and
-English assessed by humans in order to determine whether a correlation exists
-between embeddings and eye movement data and to test the hypothesis that this
-correlation is language independent. As a result, we found that the validity of
-the hypothesis being tested could be questioned.
-"
-7305,1804.08756,"Hai Hu, Wen Li, Sandra K\""ubler",Detecting Syntactic Features of Translated Chinese,cs.CL," We present a machine learning approach to distinguish texts translated to
-Chinese (by humans) from texts originally written in Chinese, with a focus on a
-wide range of syntactic features. Using Support Vector Machines (SVMs) as a
-classifier on a genre-balanced corpus in translation studies of Chinese, we
-find that constituent parse trees and dependency triples as features without
-lexical information perform very well on the task, with an F-measure above 90%,
-close to the results of lexical n-gram features, without the risk of learning
-topic information rather than translation features. Thus, we claim syntactic
-features alone can accurately distinguish translated from original Chinese.
-Translated Chinese exhibits an increased use of determiners, subject position
-pronouns, NP + 'de' as NP modifiers, multiple NPs or VPs conjoined by
-Chinese-specific punctuation, among other structures. We also interpret the
-syntactic features with reference to previous translation studies in Chinese,
-particularly the usage of pronouns.
-"
-7306,1804.08759,"Chen Qu, Liu Yang, W. Bruce Croft, Johanne R. Trippas, Yongfeng Zhang
- and Minghui Qiu","Analyzing and Characterizing User Intent in Information-seeking
- Conversations",cs.IR cs.CL," Understanding and characterizing how people interact in information-seeking
-conversations is crucial in developing conversational search systems. In this
-paper, we introduce a new dataset designed for this purpose and use it to
-analyze information-seeking conversations by user intent distribution,
-co-occurrence, and flow patterns. The MSDialog dataset is a labeled dialog
-dataset of question answering (QA) interactions between information seekers and
-providers from an online forum on Microsoft products. The dataset contains more
-than 2,000 multi-turn QA dialogs with 10,000 utterances that are annotated with
-user intent on the utterance level. Annotations were done using crowdsourcing.
-With MSDialog, we find some highly recurring patterns in user intent during an
-information-seeking process. They could be useful for designing conversational
-search systems. We will make our dataset freely available to encourage
-exploration of information-seeking conversation models.
-"
-7307,1804.08771,Matt Post,A Call for Clarity in Reporting BLEU Scores,cs.CL," The field of machine translation faces an under-recognized problem because of
-inconsistency in the reporting of scores from its dominant metric. Although
-people refer to ""the"" BLEU score, BLEU is in fact a parameterized metric whose
-values can vary wildly with changes to these parameters. These parameters are
-often not reported or are hard to find, and consequently, BLEU scores between
-papers cannot be directly compared. I quantify this variation, finding
-differences as high as 1.8 between commonly used configurations.
The main
-culprit is different tokenization and normalization schemes applied to the
-reference. Pointing to the success of the parsing community, I suggest machine
-translation researchers settle upon the BLEU scheme used by the annual
-Conference on Machine Translation (WMT), which does not allow for user-supplied
-reference processing, and provide a new tool, SacreBLEU, to facilitate this.
-"
-7308,1804.08782,"Md Nasir, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou","Towards an Unsupervised Entrainment Distance in Conversational Speech
- using Deep Neural Networks",eess.AS cs.CL cs.SD," Entrainment is a known adaptation mechanism that causes interaction
-participants to adapt or synchronize their acoustic characteristics.
-Understanding how interlocutors tend to adapt to each other's speaking style
-through entrainment involves measuring a range of acoustic features and
-comparing those via multiple signal comparison methods. In this work, we
-present a turn-level distance measure obtained in an unsupervised manner using
-a Deep Neural Network (DNN) model, which we call Neural Entrainment Distance
-(NED). This metric establishes a framework that learns an embedding from the
-population-wide entrainment in an unlabeled training corpus. We use the
-framework for a set of acoustic features and validate the measure
-experimentally by showing its efficacy in distinguishing real conversations
-from fake ones created by randomly shuffling speaker turns. Moreover, we show
-real-world evidence of the validity of the proposed measure. We find that a
-high value of NED is associated with high ratings of emotional bond in suicide
-assessment interviews, which is consistent with prior studies.
-"
-7309,1804.08798,Michael Petrochuk and Luke Zettlemoyer,SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach,cs.CL cs.AI," The SimpleQuestions dataset is one of the most commonly used benchmarks for
-studying single-relation factoid questions. In this paper, we present new
-evidence that this benchmark can be nearly solved by standard methods. First we
-show that ambiguity in the data bounds performance on this benchmark at 83.4%;
-there are often multiple answers that cannot be disambiguated from the
-linguistic signal alone. Second we introduce a baseline that sets a new
-state-of-the-art performance level at 78.1% accuracy, despite using standard
-methods. Finally, we report an empirical analysis showing that the upperbound
-is loose; roughly a third of the remaining errors are also not resolvable from
-the linguistic signal. Together, these results suggest that the SimpleQuestions
-dataset is nearly solved.
-"
-7310,1804.08813,"Wenpeng Yin, Hinrich Sch\""utze and Dan Roth","End-Task Oriented Textual Entailment via Deep Explorations of
- Inter-Sentence Interactions",cs.CL," This work deals with SciTail, a natural entailment challenge derived from a
-multi-choice question answering problem. The premises and hypotheses in SciTail
-were generated with no awareness of each other, and did not specifically aim at
-the entailment task. This makes it more challenging than other entailment data
-sets and more directly useful to the end-task -- question answering. We propose
-DEISTE (deep explorations of inter-sentence interactions for textual
-entailment) for this entailment task.
Given word-to-word interactions between -the premise-hypothesis pair ($P$, $H$), DEISTE consists of: (i) a -parameter-dynamic convolution to make important words in $P$ and $H$ play a -dominant role in learnt representations; and (ii) a position-aware attentive -convolution to encode the representation and position information of the -aligned word pairs. Experiments show that DEISTE gets $\approx$5\% improvement -over prior state of the art and that the pretrained DEISTE on SciTail -generalizes well on RTE-5. -" -7311,1804.08845,"Tu Vu, Vered Shwartz","Integrating Multiplicative Features into Supervised Distributional - Methods for Lexical Entailment",cs.CL," Supervised distributional methods are applied successfully in lexical -entailment, but recent work questioned whether these methods actually learn a -relation between two words. Specifically, Levy et al. (2015) claimed that -linear classifiers learn only separate properties of each word. We suggest a -cheap and easy way to boost the performance of these methods by integrating -multiplicative features into commonly used representations. We provide an -extensive evaluation with different classifiers and evaluation setups, and -suggest a suitable evaluation setup for the task, eliminating biases existing -in previous ones. -" -7312,1804.08847,"Elvis Saravia, Hsien-Chi Toby Liu, Yi-Shin Chen",DeepEmo: Learning and Enriching Pattern-Based Emotion Representations,cs.CL cs.IR," We propose a graph-based mechanism to extract rich-emotion bearing patterns, -which fosters a deeper analysis of online emotional expressions, from a corpus. -The patterns are then enriched with word embeddings and evaluated through -several emotion recognition tasks. Moreover, we conduct analysis on the -emotion-oriented patterns to demonstrate its applicability and to explore its -properties. Our experimental results demonstrate that the proposed techniques -outperform most state-of-the-art emotion recognition techniques. -" -7313,1804.08875,"Nikola I. Nikolov, Michael Pfeiffer, Richard H.R. Hahnloser",Data-driven Summarization of Scientific Articles,cs.CL," Data-driven approaches to sequence-to-sequence modelling have been -successfully applied to short text summarization of news articles. Such models -are typically trained on input-summary pairs consisting of only a single or a -few sentences, partially due to limited availability of multi-sentence training -data. Here, we propose to use scientific articles as a new milestone for text -summarization: large-scale training data come almost for free with two types of -high-quality summaries at different levels - the title and the abstract. We -generate two novel multi-sentence summarization datasets from scientific -articles and test the suitability of a wide range of existing extractive and -abstractive neural network-based summarization approaches. Our analysis -demonstrates that scientific papers are suitable for data-driven text -summarization. Our results could serve as valuable benchmarks for scaling -sequence-to-sequence models to very long sequences. -" -7314,1804.08881,Shuntaro Takahashi and Kumiko Tanaka-Ishii,Assessing Language Models with Scaling Properties,cs.CL," Language models have primarily been evaluated with perplexity. While -perplexity quantifies the most comprehensible prediction performance, it does -not provide qualitative information on the success or failure of models. -Another approach for evaluating language models is thus proposed, using the -scaling properties of natural language. 
Five such tests are considered, with -the first two accounting for the vocabulary population and the other three for -the long memory of natural language. The following models were evaluated with -these tests: n-grams, probabilistic context-free grammar (PCFG), Simon and -Pitman-Yor (PY) processes, hierarchical PY, and neural language models. Only -the neural language models exhibit the long memory properties of natural -language, but to a limited degree. The effectiveness of every test of these -models is also discussed. -" -7315,1804.08887,"Farhad Nooralahzadeh, Lilja {\O}vrelid, Jan Tore L{\o}nning","SIRIUS-LTG-UiO at SemEval-2018 Task 7: Convolutional Neural Networks - with Shortest Dependency Paths for Semantic Relation Extraction and - Classification in Scientific Papers",cs.CL," This article presents the SIRIUS-LTG-UiO system for the SemEval 2018 Task 7 -on Semantic Relation Extraction and Classification in Scientific Papers. First -we extract the shortest dependency path (sdp) between two entities, then we -introduce a convolutional neural network (CNN) which takes the shortest -dependency path embeddings as input and performs relation classification with -differing objectives for each subtask of the shared task. This approach -achieved overall F1 scores of 76.7 and 83.2 for relation classification on -clean and noisy data, respectively. Furthermore, for combined relation -extraction and classification on clean data, it obtained F1 scores of 37.4 and -33.6 for each phase. Our system ranks 3rd in all three sub-tasks of the shared -task. -" -7316,1804.08915,Eliyahu Kiperwasser and Miguel Ballesteros,Scheduled Multi-Task Learning: From Syntax to Translation,cs.CL," Neural encoder-decoder models of machine translation have achieved impressive -results, while learning linguistic knowledge of both the source and target -languages in an implicit end-to-end manner. We propose a framework in which our -model begins learning syntax and translation interleaved, gradually putting -more focus on translation. Using this approach, we achieve considerable -improvements in terms of BLEU score on relatively large parallel corpus (WMT14 -English to German) and a low-resource (WIT German to English) setup. -" -7317,1804.09000,"Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W Black",Style Transfer Through Back-Translation,cs.CL," Style transfer is the task of rephrasing the text to contain specific -stylistic properties without changing the intent or affect within the context. -This paper introduces a new method for automatic style transfer. We first learn -a latent representation of the input sentence which is grounded in a language -translation model in order to better preserve the meaning of the sentence while -reducing stylistic properties. Then adversarial generation techniques are used -to make the output match the desired style. We evaluate this technique on three -different style transformations: sentiment, gender and political slant. -Compared to two state-of-the-art style transfer modeling techniques we show -improvements both in automatic evaluation of style transfer and in manual -evaluation of meaning preservation and fluency. -" -7318,1804.09010,"Jianmin Zhang, Jiwei Tan, Xiaojun Wan","Towards a Neural Network Approach to Abstractive Multi-Document - Summarization",cs.CL," Till now, neural abstractive summarization methods have achieved great -success for single document summarization (SDS). 
However, due to the lack of -large scale multi-document summaries, such methods can be hardly applied to -multi-document summarization (MDS). In this paper, we investigate neural -abstractive methods for MDS by adapting a state-of-the-art neural abstractive -summarization model for SDS. We propose an approach to extend the neural -abstractive model trained on large scale SDS data to the MDS task. Our approach -only makes use of a small number of multi-document summaries for fine tuning. -Experimental results on two benchmark DUC datasets demonstrate that our -approach can outperform a variety of baseline neural models. -" -7319,1804.09021,"Zhenghui Wang, Yanru Qu, Liheng Chen, Jian Shen, Weinan Zhang, - Shaodian Zhang, Yimei Gao, Gen Gu, Ken Chen, Yong Yu","Label-aware Double Transfer Learning for Cross-Specialty Medical Named - Entity Recognition",cs.CL cs.AI," We study the problem of named entity recognition (NER) from electronic -medical records, which is one of the most fundamental and critical problems for -medical text mining. Medical records which are written by clinicians from -different specialties usually contain quite different terminologies and writing -styles. The difference of specialties and the cost of human annotation makes it -particularly difficult to train a universal medical NER system. In this paper, -we propose a label-aware double transfer learning framework (La-DTL) for -cross-specialty NER, so that a medical NER system designed for one specialty -could be conveniently applied to another one with minimal annotation efforts. -The transferability is guaranteed by two components: (i) we propose label-aware -MMD for feature representation transfer, and (ii) we perform parameter transfer -with a theoretical upper bound which is also label aware. We conduct extensive -experiments on 12 cross-specialty NER tasks. The experimental results -demonstrate that La-DTL provides consistent accuracy improvement over strong -baselines. Besides, the promising experimental results on non-medical NER -scenarios indicate that La-DTL is potential to be seamlessly adapted to a wide -range of NER tasks. -" -7320,1804.09028,"Guy Hadash, Einat Kermany, Boaz Carmeli, Ofer Lavi, George Kour and - Alon Jacovi","Estimate and Replace: A Novel Approach to Integrating Deep Neural - Networks with Existing Applications",cs.LG cs.CL stat.ML," Existing applications include a huge amount of knowledge that is out of reach -for deep neural networks. This paper presents a novel approach for integrating -calls to existing applications into deep learning architectures. Using this -approach, we estimate each application's functionality with an estimator, which -is implemented as a deep neural network (DNN). The estimator is then embedded -into a base network that we direct into complying with the application's -interface during an end-to-end optimization process. At inference time, we -replace each estimator with its existing application counterpart and let the -base network solve the task by interacting with the existing application. Using -this 'Estimate and Replace' method, we were able to train a DNN end-to-end with -less data and outperformed a matching DNN that did not interact with the -external application. -" -7321,1804.09057,"Zhen Yang, Wei Chen, Feng Wang, Bo Xu",Unsupervised Neural Machine Translation with Weight Sharing,cs.CL," Unsupervised neural machine translation (NMT) is a recently proposed approach -for machine translation which aims to train the model without using any labeled -data. 
The models proposed for unsupervised NMT often use only one shared -encoder to map the pairs of sentences from different languages to a -shared-latent space, which is weak in keeping the unique and internal -characteristics of each language, such as the style, terminology, and sentence -structure. To address this issue, we introduce an extension by utilizing two -independent encoders but sharing some partial weights which are responsible for -extracting high-level representations of the input sentences. Besides, two -different generative adversarial networks (GANs), namely the local GAN and -global GAN, are proposed to enhance the cross-language translation. With this -new approach, we achieve significant improvements on English-German, -English-French and Chinese-to-English translation tasks. -" -7322,1804.09132,"Seid Muhie Yimam and Chris Biemann and Shervin Malmasi and Gustavo H. - Paetzold and Lucia Specia and Sanja \v{S}tajner and Ana\""is Tack and Marcos - Zampieri",A Report on the Complex Word Identification Shared Task 2018,cs.CL," We report the findings of the second Complex Word Identification (CWI) shared -task organized as part of the BEA workshop co-located with NAACL-HLT'2018. The -second CWI shared task featured multilingual and multi-genre datasets divided -into four tracks: English monolingual, German monolingual, Spanish monolingual, -and a multilingual track with a French test set, and two tasks: binary -classification and probabilistic classification. A total of 12 teams submitted -their results in different task/track combinations and 11 of them wrote system -description papers that are referred to in this report and appear in the BEA -workshop proceedings. -" -7323,1804.09148,Diego Saldana Miranda,"Automated Detection of Adverse Drug Reactions in the Biomedical - Literature Using Convolutional Neural Networks and Biomedical Word Embeddings",cs.CL cs.LG stat.ML," Monitoring the biomedical literature for cases of Adverse Drug Reactions -(ADRs) is a critically important and time consuming task in pharmacovigilance. -The development of computer assisted approaches to aid this process in -different forms has been the subject of many recent works. One particular area -that has shown promise is the use of Deep Neural Networks, in particular, -Convolutional Neural Networks (CNNs), for the detection of ADR relevant -sentences. Using token-level convolutions and general purpose word embeddings, -this architecture has shown good performance relative to more traditional -models as well as Long Short Term Memory (LSTM) models. In this work, we -evaluate and compare two different CNN architectures using the ADE corpus. In -addition, we show that by de-duplicating the ADR relevant sentences, we can -greatly reduce overoptimism in the classification results. Finally, we evaluate -the use of word embeddings specifically developed for biomedical text and show -that they lead to a better performance in this task. -" -7324,1804.09160,"Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang","No Metrics Are Perfect: Adversarial Reward Learning for Visual - Storytelling",cs.CL cs.AI cs.CV cs.LG," Though impressive results have been achieved in visual captioning, the task -of generating abstract stories from photo streams is still a little-tapped -problem. Different from captions, stories have more expressive language styles -and contain many imaginary concepts that do not appear in the images. Thus it -poses challenges to behavioral cloning algorithms. 
Furthermore, due to the
-limitations of automatic metrics on evaluating story quality, reinforcement
-learning methods with hand-crafted rewards also face difficulties in gaining an
-overall performance boost. Therefore, we propose an Adversarial REward Learning
-(AREL) framework to learn an implicit reward function from human
-demonstrations, and then optimize policy search with the learned reward
-function. Though automatic evaluation indicates a slight performance boost over
-state-of-the-art (SOTA) methods in cloning expert behaviors, human evaluation
-shows that our approach achieves significant improvement in generating more
-human-like stories than SOTA systems.
-"
-7325,1804.09259,"Stanis{\l}aw Jastrz\k{e}bski, Dzmitry Bahdanau, Seyedarian Hosseini,
- Michael Noukhovitch, Yoshua Bengio, Jackie Chi Kit Cheung","Commonsense mining as knowledge base completion? A study on the impact
- of novelty",cs.CL," Commonsense knowledge bases such as ConceptNet represent knowledge in the
-form of relational triples. Inspired by the recent work by Li et al., we
-analyse if knowledge base completion models can be used to mine commonsense
-knowledge from raw text. We propose novelty of predicted triples with respect
-to the training set as an important factor in interpreting results. We
-critically analyse the difficulty of mining novel commonsense knowledge, and
-show that a simple baseline method outperforms the previous state of the art on
-predicting more novel triples.
-"
-7326,1804.09298,Dong Yu and Jinyu Li,Recent Progresses in Deep Learning based Acoustic Models (Updated),eess.AS cs.CL cs.SD," In this paper, we summarize recent progresses made in deep learning based
-acoustic models and the motivation and insights behind the surveyed techniques.
-We first discuss acoustic models that can effectively exploit variable-length
-contextual information, such as recurrent neural networks (RNNs), convolutional
-neural networks (CNNs), and their various combination with other models. We
-then describe acoustic models that are optimized end-to-end with emphasis on
-feature representations learned jointly with rest of the system, the
-connectionist temporal classification (CTC) criterion, and the attention-based
-sequence-to-sequence model. We further illustrate robustness issues in speech
-recognition systems, and discuss acoustic model adaptation, speech enhancement
-and separation, and robust training strategies. We also cover modeling
-techniques that lead to more efficient decoding and discuss possible future
-directions in acoustic model research.
-"
-7327,1804.09299,"Hendrik Strobelt, Sebastian Gehrmann, Michael Behrisch, Adam Perer,
- Hanspeter Pfister, Alexander M. Rush",Seq2Seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models,cs.CL cs.AI cs.NE," Neural Sequence-to-Sequence models have proven to be accurate and robust for
-many sequence prediction tasks, and have become the standard approach for
-automatic translation of text. The models work in a five stage blackbox process
-that involves encoding a source sequence to a vector space and then decoding
-out to a new target sequence. This process is now standard, but like many deep
-learning methods remains quite difficult to understand or debug. In this work,
-we present a visual analysis tool that allows interaction with a trained
-sequence-to-sequence model through each stage of the translation process. The
-aim is to identify which patterns have been learned and to detect model errors.
-We demonstrate the utility of our tool through several real-world large-scale -sequence-to-sequence use cases. -" -7328,1804.09301,"Rachel Rudinger, Jason Naradowsky, Brian Leonard, Benjamin Van Durme",Gender Bias in Coreference Resolution,cs.CL," We present an empirical study of gender bias in coreference resolution -systems. We first introduce a novel, Winograd schema-style set of minimal pair -sentences that differ only by pronoun gender. With these ""Winogender schemas,"" -we evaluate and confirm systematic gender bias in three publicly-available -coreference resolution systems, and correlate this bias with real-world and -textual gender statistics. -" -7329,1804.09321,Xi Rao and Zhenxing Ke,Hierarchical RNN for Information Extraction from Lawsuit Documents,cs.CL," Every lawsuit document contains the information about the party's claim, -court's analysis, decision and others, and all of this information are helpful -to understand the case better and predict the judge's decision on similar case -in the future. However, the extraction of these information from the document -is difficult because the language is too complicated and sentences varied at -length. We treat this problem as a task of sequence labeling, and this paper -presents the first research to extract relevant information from the civil -lawsuit document in China with the hierarchical RNN framework. -" -7330,1804.09530,"Sebastian Ruder, Barbara Plank",Strong Baselines for Neural Semi-supervised Learning under Domain Shift,cs.CL cs.LG stat.ML," Novel neural models have been proposed in recent years for learning under -domain shift. Most models, however, only evaluate on a single task, on -proprietary datasets, or compare to weak baselines, which makes comparison of -models difficult. In this paper, we re-evaluate classic general-purpose -bootstrapping approaches in the context of neural networks under domain shifts -vs. recent neural approaches and propose a novel multi-task tri-training method -that reduces the time and space complexity of classic tri-training. Extensive -experiments on two benchmarks are negative: while our novel method establishes -a new state-of-the-art for sentiment analysis, it does not fare consistently -the best. More importantly, we arrive at the somewhat surprising conclusion -that classic tri-training, with some additions, outperforms the state of the -art. We conclude that classic approaches constitute an important and strong -baseline. -" -7331,1804.09540,"Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder - Singh, Lazaros Polymenakos",NE-Table: A Neural key-value table for Named Entities,cs.CL cs.AI," Many Natural Language Processing (NLP) tasks depend on using Named Entities -(NEs) that are contained in texts and in external knowledge sources. While this -is easy for humans, the present neural methods that rely on learned word -embeddings may not perform well for these NLP tasks, especially in the presence -of Out-Of-Vocabulary (OOV) or rare NEs. In this paper, we propose a solution -for this problem, and present empirical evaluations on: a) a structured -Question-Answering task, b) three related Goal-Oriented dialog tasks, and c) a -Reading-Comprehension task, which show that the proposed method can be -effective in dealing with both in-vocabulary and OOV NEs. We create extended -versions of dialog bAbI tasks 1,2 and 4 and OOV versions of the CBT test set -available at - https://github.com/IBM/ne-table-datasets. 
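The NE-Table abstract above proposes a neural key-value table so that out-of-vocabulary named entities keep persistent representations instead of collapsing to a shared unknown-word vector. Below is a minimal Python sketch of that key-value idea; it is an invented illustration (random vectors stand in for learned ones), not the authors' implementation.

import numpy as np

class NETable:
    """Toy key-value store: each entity string gets a persistent vector slot."""
    def __init__(self, dim=16, seed=0):
        self.dim = dim
        self.keys = {}                       # entity string -> row index
        self.values = np.zeros((0, dim))     # one vector per stored entity
        self.rng = np.random.default_rng(seed)

    def lookup(self, entity):
        # Unseen (OOV) entities get a fresh slot instead of mapping to <unk>.
        if entity not in self.keys:
            self.keys[entity] = len(self.keys)
            self.values = np.vstack([self.values, self.rng.normal(size=(1, self.dim))])
        return self.values[self.keys[entity]]

table = NETable()
v1 = table.lookup("Oakville")   # hypothetical OOV entity gets its own slot
v2 = table.lookup("Oakville")   # a second lookup returns the same vector
assert np.allclose(v1, v2)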
-" -7332,1804.09541,"Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, - Mohammad Norouzi, Quoc V. Le","QANet: Combining Local Convolution with Global Self-Attention for - Reading Comprehension",cs.CL cs.AI cs.LG," Current end-to-end machine reading and question answering (Q\&A) models are -primarily based on recurrent neural networks (RNNs) with attention. Despite -their success, these models are often slow for both training and inference due -to the sequential nature of RNNs. We propose a new Q\&A architecture called -QANet, which does not require recurrent networks: Its encoder consists -exclusively of convolution and self-attention, where convolution models local -interactions and self-attention models global interactions. On the SQuAD -dataset, our model is 3x to 13x faster in training and 4x to 9x faster in -inference, while achieving equivalent accuracy to recurrent models. The -speed-up gain allows us to train the model with much more data. We hence -combine our model with data generated by backtranslation from a neural machine -translation model. On the SQuAD dataset, our single model, trained with -augmented data, achieves 84.6 F1 score on the test set, which is significantly -better than the best published F1 score of 81.8. -" -7333,1804.09543,Dafydd Gibbon,The Future of Prosody: It's about Time,cs.CL cs.SD," Prosody is usually defined in terms of the three distinct but interacting -domains of pitch, intensity and duration patterning, or, more generally, as -phonological and phonetic properties of 'suprasegmentals', speech segments -which are larger than consonants and vowels. Rather than taking this approach, -the concept of multiple time domains for prosody processing is taken up, and -methods of time domain analysis are discussed: annotation mining with timing -dispersion measures, time tree induction, oscillator models in phonology and -phonetics, and finally the use of the Amplitude Envelope Modulation Spectrum -(AEMS). While frequency demodulation (in the form of pitch tracking) is a -central issue in prosodic analysis, in the present context it is amplitude -envelope demodulation and frequency zones in the long time-domain spectra of -the demodulated envelope which are focused. A generalised view is taken of -oscillation as iteration in abstract prosodic models and as modulation and -demodulation of a variety of rhythms in the speech signal. -" -7334,1804.09552,"Kyongsik Yun, Joseph Osborne, Madison Lee, Thomas Lu, Edward Chow","Automatic speech recognition for launch control center communication - using recurrent neural networks with data augmentation and custom language - model",cs.CL cs.HC," Transcribing voice communications in NASA's launch control center is -important for information utilization. However, automatic speech recognition in -this environment is particularly challenging due to the lack of training data, -unfamiliar words in acronyms, multiple different speakers and accents, and -conversational characteristics of speaking. We used bidirectional deep -recurrent neural networks to train and test speech recognition performance. We -showed that data augmentation and custom language models can improve speech -recognition accuracy. Transcribing communications from the launch control -center will help the machine analyze information and accelerate knowledge -generation. 
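The launch-control ASR abstract above credits data augmentation for part of its accuracy gain. As a hedged illustration of one common augmentation, the sketch below mixes white noise into a waveform at a chosen signal-to-noise ratio; the paper's exact augmentation recipe may well differ.

import numpy as np

def add_noise(signal, snr_db):
    """Mix white noise into a waveform at a target signal-to-noise ratio."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

clean = np.sin(2 * np.pi * 440.0 * np.linspace(0.0, 1.0, 16000))  # stand-in audio
noisy = add_noise(clean, snr_db=10.0)   # one augmented training example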
-" -7335,1804.09558,"Raquel P\'erez-Arnal, Armand Vilalta, Dario Garcia-Gasulla, Ulises - Cort\'es, Eduard Ayguad\'e, Jesus Labarta",A Visual Distance for WordNet,cs.CL cs.AI cs.LG cs.NE stat.ML," Measuring the distance between concepts is an important field of study of -Natural Language Processing, as it can be used to improve tasks related to the -interpretation of those same concepts. WordNet, which includes a wide variety -of concepts associated with words (i.e., synsets), is often used as a source -for computing those distances. In this paper, we explore a distance for WordNet -synsets based on visual features, instead of lexical ones. For this purpose, we -extract the graphic features generated within a deep convolutional neural -networks trained with ImageNet and use those features to generate a -representative of each synset. Based on those representatives, we define a -distance measure of synsets, which complements the traditional lexical -distances. Finally, we propose some experiments to evaluate its performance and -compare it with the current state-of-the-art. -" -7336,1804.09635,"Dongyeop Kang and Waleed Ammar and Bhavana Dalvi and Madeleine van - Zuylen and Sebastian Kohlmeier and Eduard Hovy and Roy Schwartz","A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP - Applications",cs.CL cs.AI," Peer reviewing is a central component in the scientific publishing process. -We present the first public dataset of scientific peer reviews available for -research purposes (PeerRead v1) providing an opportunity to study this -important artifact. The dataset consists of 14.7K paper drafts and the -corresponding accept/reject decisions in top-tier venues including ACL, NIPS -and ICLR. The dataset also includes 10.7K textual peer reviews written by -experts for a subset of the papers. We describe the data collection process and -report interesting observed phenomena in the peer reviews. We also propose two -novel NLP tasks based on this dataset and provide simple baseline models. In -the first task, we show that simple models can predict whether a paper is -accepted with up to 21% error reduction compared to the majority baseline. In -the second task, we predict the numerical scores of review aspects and show -that simple models can outperform the mean baseline for aspects with high -variance such as 'originality' and 'impact'. -" -7337,1804.09661,Aaron Jaech and Mari Ostendorf,Personalized Language Model for Query Auto-Completion,cs.CL cs.IR," Query auto-completion is a search engine feature whereby the system suggests -completed queries as the user types. Recently, the use of a recurrent neural -network language model was suggested as a method of generating query -completions. We show how an adaptable language model can be used to generate -personalized completions and how the model can use online updating to make -predictions for users not seen during training. The personalized predictions -are significantly better than a baseline that uses no user information. -" -7338,1804.09692,"Laura Wendlandt, Jonathan K. Kummerfeld, Rada Mihalcea",Factors Influencing the Surprising Instability of Word Embeddings,cs.CL," Despite the recent popularity of word embedding methods, there is only a -small body of work exploring the limitations of these representations. In this -paper, we consider one aspect of embedding spaces, namely their stability. We -show that even relatively high frequency words (100-200 occurrences) are often -unstable. 
We provide empirical evidence for how various factors contribute to -the stability of word embeddings, and we analyze the effects of stability on -downstream tasks. -" -7339,1804.09713,"Shruti Palaskar, Ramon Sanabria and Florian Metze",End-to-End Multimodal Speech Recognition,eess.AS cs.CL cs.LG," Transcription or sub-titling of open-domain videos is still a challenging -domain for Automatic Speech Recognition (ASR) due to the data's challenging -acoustics, variable signal processing and the essentially unrestricted domain -of the data. In previous work, we have shown that the visual channel -- -specifically object and scene features -- can help to adapt the acoustic model -(AM) and language model (LM) of a recognizer, and we are now expanding this -work to end-to-end approaches. In the case of a Connectionist Temporal -Classification (CTC)-based approach, we retain the separation of AM and LM, -while for a sequence-to-sequence (S2S) approach, both information sources are -adapted together, in a single model. This paper also analyzes the behavior of -CTC and S2S models on noisy video data (How-To corpus), and compares it to -results on the clean Wall Street Journal (WSJ) corpus, providing insight into -the robustness of both approaches. -" -7340,1804.09746,Olivier Bournez and Sabrina Ouazzani,Cheap Non-standard Analysis and Computability,cs.LO cs.CL," Non standard analysis is an area of Mathematics dealing with notions of -infinitesimal and infinitely large numbers, in which many statements from -classical analysis can be expressed very naturally. Cheap non-standard analysis -introduced by Terence Tao in 2012 is based on the idea that considering that a -property holds eventually is sufficient to give the essence of many of its -statements. This provides constructivity but at some (acceptable) price. We -consider computability in cheap non-standard analysis. We prove that many -concepts from computable analysis as well as several concepts from -computability can be very elegantly and alternatively presented in this -framework. It provides a dual view and dual proofs to several statements -already known in these fields. -" -7341,1804.09769,"Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, Dragomir Radev",TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation,cs.CL," Interacting with relational databases through natural language helps users of -any background easily query and analyze a vast amount of data. This requires a -system that understands users' questions and converts them to SQL queries -automatically. In this paper we present a novel approach, TypeSQL, which views -this problem as a slot filling task. Additionally, TypeSQL utilizes type -information to better understand rare entities and numbers in natural language -questions. We test this idea on the WikiSQL dataset and outperform the prior -state-of-the-art by 5.5% in much less time. We also show that accessing the -content of databases can significantly improve the performance when users' -queries are not well-formed. TypeSQL gets 82.6% accuracy, a 17.5% absolute -improvement compared to the previous content-sensitive model. -" -7342,1804.09779,"Adam Poliak, Yonatan Belinkov, James Glass, Benjamin Van Durme","On the Evaluation of Semantic Phenomena in Neural Machine Translation - Using Natural Language Inference",cs.CL," We propose a process for investigating the extent to which sentence -representations arising from neural machine translation (NMT) systems encode -distinct semantic phenomena. 
We use these representations as features to train -a natural language inference (NLI) classifier based on datasets recast from -existing semantic annotations. In applying this process to a representative NMT -system, we find its encoder appears most suited to supporting inferences at the -syntax-semantics interface, as compared to anaphora resolution requiring -world-knowledge. We conclude with a discussion on the merits and potential -deficiencies of the existing process, and how it may be improved and extended -as a broader framework for evaluating semantic coverage. -" -7343,1804.09843,Ben Athiwaratkun and Andrew Gordon Wilson,Hierarchical Density Order Embeddings,cs.CL cs.AI cs.LG stat.ML," By representing words with probability densities rather than point vectors, -probabilistic word embeddings can capture rich and interpretable semantic -information and uncertainty. The uncertainty information can be particularly -meaningful in capturing entailment relationships -- whereby general words such -as ""entity"" correspond to broad distributions that encompass more specific -words such as ""animal"" or ""instrument"". We introduce density order embeddings, -which learn hierarchical representations through encapsulation of probability -densities. In particular, we propose simple yet effective loss functions and -distance metrics, as well as graph-based schemes to select negative samples to -better learn hierarchical density representations. Our approach provides -state-of-the-art performance on the WordNet hypernym relationship prediction -task and the challenging HyperLex lexical entailment dataset -- while retaining -a rich and interpretable density representation. -" -7344,1804.09849,"Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang - Macherey, George Foster, Llion Jones, Niki Parmar, Mike Schuster, Zhifeng - Chen, Yonghui Wu, Macduff Hughes","The Best of Both Worlds: Combining Recent Advances in Neural Machine - Translation",cs.CL cs.AI," The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) -modeling for Machine Translation (MT). The classic RNN-based approaches to MT -were first out-performed by the convolutional seq2seq model, which was then -out-performed by the more recent Transformer model. Each of these new -approaches consists of a fundamental architecture accompanied by a set of -modeling and training techniques that are in principle applicable to other -seq2seq architectures. In this paper, we tease apart the new architectures and -their accompanying techniques in two ways. First, we identify several key -modeling and training techniques, and apply them to the RNN architecture, -yielding a new RNMT+ model that outperforms all of the three fundamental -architectures on the benchmark WMT'14 English to French and English to German -tasks. Second, we analyze the properties of each fundamental seq2seq -architecture and devise new hybrid architectures intended to combine their -strengths. Our hybrid models obtain further improvements, outperforming the -RNMT+ model on both benchmark datasets. -" -7345,1804.09931,"Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han","Integrating Local Context and Global Cohesiveness for Open Information - Extraction",cs.CL," Extracting entities and their relations from text is an important task for -understanding massive text corpora. Open information extraction (IE) systems -mine relation tuples (i.e., entity arguments and a predicate string to describe -their relation) from sentences. 
These relation tuples are not confined to a -predefined schema for the relations of interests. However, current Open IE -systems focus on modeling local context information in a sentence to extract -relation tuples, while ignoring the fact that global statistics in a large -corpus can be collectively leveraged to identify high-quality sentence-level -extractions. In this paper, we propose a novel Open IE system, called ReMine, -which integrates local context signals and global structural signals in a -unified, distant-supervision framework. Leveraging facts from external -knowledge bases as supervision, the new system can be applied to many different -domains to facilitate sentence-level tuple extractions using corpus-level -statistics. Our system operates by solving a joint optimization problem to -unify (1) segmenting entity/relation phrases in individual sentences based on -local context; and (2) measuring the quality of tuples extracted from -individual sentences with a translating-based objective. Learning the two -subtasks jointly helps correct errors produced in each subtask so that they can -mutually enhance each other. Experiments on two real-world corpora from -different domains demonstrate the effectiveness, generality, and robustness of -ReMine when compared to state-of-the-art open IE systems. -" -7346,1804.09949,"Adam Bielski, Tomasz Trzcinski","Pay Attention to Virality: understanding popularity of social media - videos with the attention mechanism",cs.CV cs.CL," Predicting popularity of social media videos before they are published is a -challenging task, mainly due to the complexity of content distribution network -as well as the number of factors that play part in this process. As solving -this task provides tremendous help for media content creators, many successful -methods were proposed to solve this problem with machine learning. In this -work, we change the viewpoint and postulate that it is not only the predicted -popularity that matters, but also, maybe even more importantly, understanding -of how individual parts influence the final popularity score. To that end, we -propose to combine the Grad-CAM visualization method with a soft attention -mechanism. Our preliminary results show that this approach allows for more -intuitive interpretation of the content impact on video popularity, while -achieving competitive results in terms of prediction accuracy. -" -7347,1804.10080,"Sergey Novoselov, Andrey Shulipa, Ivan Kremnev, Alexandr Kozlov, Vadim - Shchemelinin",On deep speaker embeddings for text-independent speaker recognition,cs.SD cs.CL eess.AS stat.ML," We investigate deep neural network performance in the textindependent speaker -recognition task. We demonstrate that using angular softmax activation at the -last classification layer of a classification neural network instead of a -simple softmax activation allows to train a more generalized discriminative -speaker embedding extractor. Cosine similarity is an effective metric for -speaker verification in this embedding space. We also address the problem of -choosing an architecture for the extractor. We found that deep networks with -residual frame level connections outperform wide but relatively shallow -architectures. This paper also proposes several improvements for previous -DNN-based extractor systems to increase the speaker recognition accuracy. We -show that the discriminatively trained similarity metric learning approach -outperforms the standard LDA-PLDA method as an embedding backend. 
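The speaker-embedding abstract above states that cosine similarity is an effective verification metric in the learned embedding space. A self-contained toy of cosine scoring follows; the random vectors are stand-ins for DNN speaker embeddings, not outputs of the described system.

import numpy as np

def cosine_score(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
enroll = rng.normal(size=256)                 # enrollment utterance embedding
same = enroll + 0.1 * rng.normal(size=256)    # same speaker, slightly perturbed
different = rng.normal(size=256)              # a different speaker

print(cosine_score(enroll, same) > cosine_score(enroll, different))  # True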
The results -obtained on Speakers in the Wild and NIST SRE 2016 evaluation sets demonstrate -robustness of the proposed systems when dealing with close to real-life -conditions. -" -7348,1804.10184,"Shudong Hao, Jordan Boyd-Graber, Michael J. Paul","Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic - Model Evaluation",cs.CL," Multilingual topic models enable document analysis across languages through -coherent multilingual summaries of the data. However, there is no standard and -effective metric to evaluate the quality of multilingual topics. We introduce a -new intrinsic evaluation of multilingual topic models that correlates well with -human judgments of multilingual topic coherence as well as performance in -downstream applications. Importantly, we also study evaluation for low-resource -languages. Because standard metrics fail to accurately measure topic quality -when robust external resources are unavailable, we propose an adaptation model -that improves the accuracy and reliability of these metrics in low-resource -settings. -" -7349,1804.10188,"Sahil Garg, Irina Rish, Guillermo Cecchi, Palash Goyal, Sarik - Ghazarian, Shuyang Gao, Greg Ver Steeg, Aram Galstyan","Modeling Psychotherapy Dialogues with Kernelized Hashcode - Representations: A Nonparametric Information-Theoretic Approach",cs.LG cs.AI cs.CL cs.IT math.IT stat.ML," We propose a novel dialogue modeling framework, the first-ever nonparametric -kernel functions based approach for dialogue modeling, which learns kernelized -hashcodes as compressed text representations; unlike traditional deep learning -models, it handles well relatively small datasets, while also scaling to large -ones. We also derive a novel lower bound on mutual information, used as a -model-selection criterion favoring representations with better alignment -between the utterances of participants in a collaborative dialogue setting, as -well as higher predictability of the generated responses. As demonstrated on -three real-life datasets, including prominently psychotherapy sessions, the -proposed approach significantly outperforms several state-of-art neural network -based dialogue systems, both in terms of computational efficiency, reducing -training time from days or weeks to hours, and the response quality, achieving -an order of magnitude improvement over competitors in frequency of being chosen -as the best model by human evaluators. -" -7350,1804.10202,"Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin - Choi, Noah A. Smith, Mari Ostendorf",Sounding Board: A User-Centric and Content-Driven Social Chatbot,cs.HC cs.AI cs.CL," We present Sounding Board, a social chatbot that won the 2017 Amazon Alexa -Prize. The system architecture consists of several components including spoken -language processing, dialogue management, language generation, and content -management, with emphasis on user-centric and content-driven design. We also -share insights gained from large-scale online logs based on 160,000 -conversations with real-world users. -" -7351,1804.10204,"Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John R. Hershey","End-to-End Speech Separation with Unfolded Iterative Phase - Reconstruction",cs.SD cs.CL cs.LG eess.AS stat.ML," This paper proposes an end-to-end approach for single-channel -speaker-independent multi-speaker speech separation, where time-frequency (T-F) -masking, the short-time Fourier transform (STFT), and its inverse are -represented as layers within a deep network. 
Previous approaches, rather than -computing a loss on the reconstructed signal, used a surrogate loss based on -the target STFT magnitudes. This ignores reconstruction error introduced by -phase inconsistency. In our approach, the loss function is directly defined on -the reconstructed signals, which are optimized for best separation. In -addition, we train through unfolded iterations of a phase reconstruction -algorithm, represented as a series of STFT and inverse STFT layers. While mask -values are typically limited to lie between zero and one for approaches using -the mixture phase for reconstruction, this limitation is less relevant if the -estimated magnitudes are to be used together with phase reconstruction. We thus -propose several novel activation functions for the output layer of the T-F -masking, to allow mask values beyond one. On the publicly-available wsj0-2mix -dataset, our approach achieves state-of-the-art 12.6 dB scale-invariant -signal-to-distortion ratio (SI-SDR) and 13.1 dB SDR, revealing new -possibilities for deep learning based phase reconstruction and representing a -fundamental progress towards solving the notoriously-hard cocktail party -problem. -" -7352,1804.10413,"Jakub K\'udela, Irena Holubov\'a, Ond\v{r}ej Bojar",Extracting Parallel Paragraphs from Common Crawl,cs.CL," Most of the current methods for mining parallel texts from the web assume -that web pages of web sites share same structure across languages. We believe -that there still exists a non-negligible amount of parallel data spread across -sources not satisfying this assumption. We propose an approach based on a -combination of bivec (a bilingual extension of word2vec) and locality-sensitive -hashing which allows us to efficiently identify pairs of parallel segments -located anywhere on pages of a given web domain, regardless their structure. We -validate our method on realigning segments from a large parallel corpus. -Another experiment with real-world data provided by Common Crawl Foundation -confirms that our solution scales to hundreds of terabytes large set of -web-crawled data. -" -7353,1804.10490,"Martin Raison, Pierre-Emmanuel Mazar\'e, Rajarshi Das, Antoine Bordes",Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading,cs.CL," This paper aims at improving how machines can answer questions directly from -text, with the focus of having models that can answer correctly multiple types -of questions and from various types of texts, documents or even from large -collections of them. To that end, we introduce the Weaver model that uses a new -way to relate a question to a textual context by weaving layers of recurrent -networks, with the goal of making as few assumptions as possible as to how the -information from both question and context should be combined to form the -answer. We show empirically on six datasets that Weaver performs well in -multiple conditions. For instance, it produces solid results on the very -popular SQuAD dataset (Rajpurkar et al., 2016), solves almost all bAbI tasks -(Weston et al., 2015) and greatly outperforms state-of-the-art methods for open -domain question answering from text (Chen et al., 2017). 
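The Weaver abstract above describes relating a question to a textual context by interleaving recurrent layers. The PyTorch sketch below is not the Weaver architecture; it only illustrates the general idea of conditioning a context encoder on a question summary at every step, with all sizes and names invented.

import torch
import torch.nn as nn

class CoEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.q_enc = nn.LSTM(dim, dim, batch_first=True)
        # The context encoder sees [token embedding ; question summary] per step.
        self.c_enc = nn.LSTM(2 * dim, dim, batch_first=True)
        self.span = nn.Linear(dim, 2)   # start/end logits per context token

    def forward(self, question, context):
        _, (q_h, _) = self.q_enc(self.emb(question))
        q_summary = q_h[-1].unsqueeze(1)                       # (B, 1, dim)
        c = self.emb(context)                                  # (B, Tc, dim)
        c = torch.cat([c, q_summary.expand(-1, c.size(1), -1)], dim=-1)
        h, _ = self.c_enc(c)
        return self.span(h)                                    # (B, Tc, 2)

model = CoEncoder()
logits = model(torch.randint(0, 1000, (2, 8)), torch.randint(0, 1000, (2, 40)))
print(logits.shape)   # torch.Size([2, 40, 2])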
-" -7354,1804.10615,"Tianze Shi, Carlos G\'omez-Rodr\'iguez, Lillian Lee","Improving Coverage and Runtime Complexity for Exact Inference in - Non-Projective Transition-Based Dependency Parsers",cs.CL," We generalize Cohen, G\'omez-Rodr\'iguez, and Satta's (2011) parser to a -family of non-projective transition-based dependency parsers allowing -polynomial-time exact inference. This includes novel parsers with better -coverage than Cohen et al. (2011), and even a variant that reduces time -complexity to $O(n^6)$, improving over the known bounds in exact inference for -non-projective transition-based parsing. We hope that this piece of theoretical -work inspires design of novel transition systems with better coverage and -better run-time guarantees. - Code available at https://github.com/tzshi/nonproj-dp-variants-naacl2018 -" -7355,1804.10637,Phong Le and Ivan Titov,Improving Entity Linking by Modeling Latent Relations between Mentions,cs.CL," Entity linking involves aligning textual mentions of named entities to their -corresponding entries in a knowledge base. Entity linking systems often exploit -relations between textual mentions in a document (e.g., coreference) to decide -if the linking decisions are compatible. Unlike previous approaches, which -relied on supervised systems or heuristics to predict these relations, we treat -relations as latent variables in our neural entity-linking model. We induce the -relations without any supervision while optimizing the entity-linking system in -an end-to-end fashion. Our multi-relational model achieves the best reported -scores on the standard benchmark (AIDA-CoNLL) and substantially outperforms its -relation-agnostic version. Its training also converges much faster, suggesting -that the injected structural bias helps to explain regularities in the training -data. -" -7356,1804.10686,"Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail - Chernoskutov, Chris Biemann, Simone Paolo Ponzetto","An Unsupervised Word Sense Disambiguation System for Under-Resourced - Languages",cs.CL," In this paper, we present Watasense, an unsupervised system for word sense -disambiguation. Given a sentence, the system chooses the most relevant sense of -each input word with respect to the semantic similarity between the given -sentence and the synset constituting the sense of the target word. Watasense -has two modes of operation. The sparse mode uses the traditional vector space -model to estimate the most similar word sense corresponding to its context. The -dense mode, instead, uses synset embeddings to cope with the sparsity problem. -We describe the architecture of the present system and also conduct its -evaluation on three different lexical semantic resources for Russian. We found -that the dense mode substantially outperforms the sparse one on all datasets -according to the adjusted Rand index. -" -7357,1804.10718,"Benjamin Robaidek, Rik Koncel-Kedziorski, Hannaneh Hajishirzi",Data-Driven Methods for Solving Algebra Word Problems,cs.AI cs.CL," We explore contemporary, data-driven techniques for solving math word -problems over recent large-scale datasets. We show that well-tuned neural -equation classifiers can outperform more sophisticated models such as sequence -to sequence and self-attention across these datasets. Our error analysis -indicates that, while fully data driven models show some promise, semantic and -world knowledge is necessary for further advances. 
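The word-problem abstract above reports that well-tuned equation classifiers can beat sequence-to-sequence models. A toy sketch of that framing follows: solving a problem is cast as picking an equation template from a fixed inventory. The problems and templates here are invented for illustration and are not from the datasets the paper uses.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

problems = [
    "John has 3 apples and buys 4 more. How many apples does he have?",
    "Sara had 10 pens and gave away 6. How many pens are left?",
    "A box holds 5 rows of 7 cans. How many cans are in the box?",
    "Tom splits 12 cookies among 4 friends. How many does each get?",
]
templates = ["a+b", "a-b", "a*b", "a/b"]   # class labels = equation templates

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(problems, templates)
# Likely "a+b", given the word overlap with the first training example.
print(clf.predict(["Amy has 2 dogs and adopts 3 more. How many dogs now?"]))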
-" -7358,1804.10731,"Weiyan Shi, Zhou Yu",Sentiment Adaptive End-to-End Dialog Systems,cs.CL," End-to-end learning framework is useful for building dialog systems for its -simplicity in training and efficiency in model updating. However, current -end-to-end approaches only consider user semantic inputs in learning and -under-utilize other user information. Therefore, we propose to include user -sentiment obtained through multimodal information (acoustic, dialogic and -textual), in the end-to-end learning framework to make systems more -user-adaptive and effective. We incorporated user sentiment information in both -supervised and reinforcement learning settings. In both settings, adding -sentiment information reduced the dialog length and improved the task success -rate on a bus information search task. This work is the first attempt to -incorporate multimodal user information in the adaptive end-to-end dialog -system training framework and attained state-of-the-art performance. -" -7359,1804.10747,Chu-Cheng Lin and Jason Eisner,Neural Particle Smoothing for Sampling from Conditional Sequence Models,cs.CL," We introduce neural particle smoothing, a sequential Monte Carlo method for -sampling annotations of an input string from a given probability model. In -contrast to conventional particle filtering algorithms, we train a proposal -distribution that looks ahead to the end of the input string by means of a -right-to-left LSTM. We demonstrate that this innovation can improve the quality -of the sample. To motivate our formal choices, we explain how our neural model -and neural sampler can be viewed as low-dimensional but nonlinear -approximations to working with HMMs over very large state spaces. -" -7360,1804.10752,"Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu","Syllable-Based Sequence-to-Sequence Speech Recognition with the - Transformer in Mandarin Chinese",eess.AS cs.CL cs.SD," Sequence-to-sequence attention-based models have recently shown very -promising results on automatic speech recognition (ASR) tasks, which integrate -an acoustic, pronunciation and language model into a single neural network. In -these models, the Transformer, a new sequence-to-sequence attention-based model -relying entirely on self-attention without using RNNs or convolutions, achieves -a new single-model state-of-the-art BLEU on neural machine translation (NMT) -tasks. Since the outstanding performance of the Transformer, we extend it to -speech and concentrate on it as the basic architecture of sequence-to-sequence -attention-based model on Mandarin Chinese ASR tasks. Furthermore, we -investigate a comparison between syllable based model and context-independent -phoneme (CI-phoneme) based model with the Transformer in Mandarin Chinese. -Additionally, a greedy cascading decoder with the Transformer is proposed for -mapping CI-phoneme sequences and syllable sequences into word sequences. -Experiments on HKUST datasets demonstrate that syllable based model with the -Transformer performs better than CI-phoneme based counterpart, and achieves a -character error rate (CER) of \emph{$28.77\%$}, which is competitive to the -state-of-the-art CER of $28.0\%$ by the joint CTC-attention based -encoder-decoder network. -" -7361,1804.10911,"Yadi Lao, Jun Xu, Yanyan Lan, Jiafeng Guo, Sheng Gao, Xueqi Cheng",A Tree Search Algorithm for Sequence Labeling,cs.CL cs.IR," In this paper we propose a novel reinforcement learning based model for -sequence tagging, referred to as MM-Tag. 
Inspired by the success and
-methodology of the AlphaGo Zero, MM-Tag formalizes the problem of sequence
-tagging with a Monte Carlo tree search (MCTS) enhanced Markov decision process
-(MDP) model, in which the time steps correspond to the positions of words in a
-sentence from left to right, and each action corresponds to assigning a tag to
-a word. Two long short-term memory networks (LSTM) are used to summarize the
-past tag assignments and words in the sentence. Based on the outputs of LSTMs,
-the policy for guiding the tag assignment and the value for predicting the
-tagging accuracy of the whole sentence are produced. The policy and value are
-then strengthened with MCTS, which takes the produced raw policy and value as
-inputs, simulates and evaluates the possible tag assignments at the subsequent
-positions, and outputs a better search policy for assigning tags. A
-reinforcement learning algorithm is proposed to train the model parameters. Our
-work is the first to apply the MCTS enhanced MDP model to the sequence tagging
-task. We show that MM-Tag can accurately predict the tags thanks to the
-exploratory decision making mechanism introduced by MCTS. Experimental results
-based on a chunking benchmark show that MM-Tag outperforms the
-state-of-the-art sequence tagging baselines including CRF and CRF with LSTM.
-"
-7362,1804.10922,"Fatima Zohra Smaili, Xin Gao and Robert Hoehndorf","OPA2Vec: combining formal and informal content of biomedical ontologies
- to improve similarity-based prediction",cs.CL cs.AI cs.CE," Motivation: Ontologies are widely used in biology for data annotation,
-integration, and analysis. In addition to formally structured axioms,
-ontologies contain meta-data in the form of annotation axioms which provide
-valuable pieces of information that characterize ontology classes. Annotations
-commonly used in ontologies include class labels, descriptions, or synonyms.
-Despite being a rich source of semantic information, the ontology meta-data are
-generally unexploited by ontology-based analysis methods such as semantic
-similarity measures. Results: We propose a novel method, OPA2Vec, to generate
-vector representations of biological entities in ontologies by combining formal
-ontology axioms and annotation axioms from the ontology meta-data. We apply a
-Word2Vec model that has been pre-trained on PubMed abstracts to produce feature
-vectors from our collected data. We validate our method in two different ways:
-first, we use the obtained vector representations of proteins as a similarity
-measure to predict protein-protein interaction (PPI) on two different datasets.
-Second, we evaluate our method on predicting gene-disease associations based on
-phenotype similarity by generating vector representations of genes and diseases
-using a phenotype ontology, and applying the obtained vectors to predict
-gene-disease associations. These two experiments are just an illustration of
-the possible applications of our method. OPA2Vec can be used to produce vector
-representations of any biomedical entity given any type of biomedical ontology.
-Availability: https://github.com/bio-ontology-research-group/opa2vec Contact:
-robert.hoehndorf@kaust.edu.sa and xin.gao@kaust.edu.sa.
-"
-7363,1804.10959,Taku Kudo,"Subword Regularization: Improving Neural Network Translation Models with
- Multiple Subword Candidates",cs.CL," Subword units are an effective way to alleviate the open vocabulary problems
-in neural machine translation (NMT).
While sentences are usually converted into -unique subword sequences, subword segmentation is potentially ambiguous and -multiple segmentations are possible even with the same vocabulary. The question -addressed in this paper is whether it is possible to harness the segmentation -ambiguity as a noise to improve the robustness of NMT. We present a simple -regularization method, subword regularization, which trains the model with -multiple subword segmentations probabilistically sampled during training. In -addition, for better subword sampling, we propose a new subword segmentation -algorithm based on a unigram language model. We experiment with multiple -corpora and report consistent improvements especially on low resource and -out-of-domain settings. -" -7364,1804.10974,"Zihang Dai, Qizhe Xie, Eduard Hovy","From Credit Assignment to Entropy Regularization: Two New Algorithms for - Neural Sequence Prediction",cs.CL cs.LG stat.ML," In this work, we study the credit assignment problem in reward augmented -maximum likelihood (RAML) learning, and establish a theoretical equivalence -between the token-level counterpart of RAML and the entropy regularized -reinforcement learning. Inspired by the connection, we propose two sequence -prediction algorithms, one extending RAML with fine-grained credit assignment -and the other improving Actor-Critic with a systematic entropy regularization. -On two benchmark datasets, we show the proposed algorithms outperform RAML and -Actor-Critic respectively, providing new alternatives to sequence prediction. -" -7365,1804.11019,Fei Liu and Trevor Cohn and Timothy Baldwin,"Recurrent Entity Networks with Delayed Memory Update for Targeted - Aspect-based Sentiment Analysis",cs.CL," While neural networks have been shown to achieve impressive results for -sentence-level sentiment analysis, targeted aspect-based sentiment analysis -(TABSA) --- extraction of fine-grained opinion polarity w.r.t. a pre-defined -set of aspects --- remains a difficult task. Motivated by recent advances in -memory-augmented models for machine reading, we propose a novel architecture, -utilising external ""memory chains"" with a delayed memory update mechanism to -track entities. On a TABSA task, the proposed model demonstrates substantial -improvements over state-of-the-art approaches, including those using external -knowledge bases. -" -7366,1804.11046,"Albert Haque, Corinna Fukushima",Automatic Documentation of ICD Codes with Far-Field Speech Recognition,cs.SD cs.CL eess.AS," Documentation errors increase healthcare costs and cause unnecessary patient -deaths. As the standard language for diagnoses and billing, ICD codes serve as -the foundation for medical documentation worldwide. Despite the prevalence of -electronic medical records, hospitals still witness high levels of ICD -miscoding. In this paper, we propose to automatically document ICD codes with -far-field speech recognition. Far-field speech occurs when the microphone is -located several meters from the source, as is common with smart homes and -security systems. Our method combines acoustic signal processing with recurrent -neural networks to recognize and document ICD codes in real time. To evaluate -our model, we collected a far-field speech dataset of ICD-10 codes and found -our model to achieve 87% accuracy with a BLEU score of 85%. By sampling from an -unsupervised medical language model, our method is able to outperform existing -methods. 
Overall, this work shows the potential of automatic speech recognition -to provide efficient, accurate, and cost-effective healthcare documentation. -" -7367,1804.11067,"Trung Ngo Trong and Ville Hautam\""aki and Kristiina Jokinen","Staircase Network: structural language identification via hierarchical - attentive units",cs.AI cs.CL cs.LG stat.ML," Language recognition system is typically trained directly to optimize -classification error on the target language labels, without using the external, -or meta-information in the estimation of the model parameters. However labels -are not independent of each other, there is a dependency enforced by, for -example, the language family, which affects negatively on classification. The -other external information sources (e.g. audio encoding, telephony or video -speech) can also decrease classification accuracy. In this paper, we attempt to -solve these issues by constructing a deep hierarchical neural network, where -different levels of meta-information are encapsulated by attentive prediction -units and also embedded into the training progress. The proposed method learns -auxiliary tasks to obtain robust internal representation and to construct a -variant of attentive units within the hierarchical model. The final result is -the structural prediction of the target language and a closely related language -family. The algorithm reflects a ""staircase"" way of learning in both its -architecture and training, advancing from the fundamental audio encoding to the -language family level and finally to the target language level. This process -not only improves generalization but also tackles the issues of imbalanced -class priors and channel variability in the deep neural network model. Our -experimental findings show that the proposed architecture outperforms the -state-of-the-art i-vector approaches on both small and big language corpora by -a significant margin. -" -7368,1804.11105,"Asan Agibetov, Matthias Samwald","Fast and scalable learning of neuro-symbolic representations of - biomedical knowledge",cs.AI cs.CL," In this work we address the problem of fast and scalable learning of -neuro-symbolic representations for general biological knowledge. Based on a -recently published comprehensive biological knowledge graph (Alshahrani, 2017) -that was used for demonstrating neuro-symbolic representation learning, we show -how to train fast (under 1 minute) log-linear neural embeddings of the -entities. We utilize these representations as inputs for machine learning -classifiers to enable important tasks such as biological link prediction. -Classifiers are trained by concatenating learned entity embeddings to represent -entity relations, and training classifiers on the concatenated embeddings to -discern true relations from automatically generated negative examples. Our -simple embedding methodology greatly improves on classification error compared -to previously published state-of-the-art results, yielding a maximum increase -of $+0.28$ F-measure and $+0.22$ ROC AUC scores for the most difficult -biological link prediction problem. Finally, our embedding approach is orders -of magnitude faster to train ($\leq$ 1 minute vs. hours), much more economical -in terms of embedding dimensions ($d=50$ vs. $d=512$), and naturally encodes -the directionality of the asymmetric biological relations, that can be -controlled by the order with which we concatenate the embeddings. 
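The neuro-symbolic abstract above trains classifiers on concatenated entity embeddings, with concatenation order encoding the direction of the relation. A minimal sketch of that link-prediction recipe follows; the embeddings and edges are synthetic stand-ins, not the biological knowledge graph from the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_entities, dim = 100, 50
emb = rng.normal(size=(n_entities, dim))      # stand-in entity embeddings

true_edges = [(i, (i + 1) % n_entities) for i in range(n_entities)]
neg_edges = [(i, int(rng.integers(n_entities))) for i in range(n_entities)]

# Concatenation order encodes the direction of the (asymmetric) relation.
X = np.array([np.concatenate([emb[h], emb[t]]) for h, t in true_edges + neg_edges])
y = np.array([1] * len(true_edges) + [0] * len(neg_edges))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", clf.score(X, y))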
-" -7369,1804.11109,Andrew Hopkinson and Amit Gurdasani and Dave Palfrey and Arpit Mittal,Demand-Weighted Completeness Prediction for a Knowledge Base,cs.AI cs.CL," In this paper we introduce the notion of Demand-Weighted Completeness, -allowing estimation of the completeness of a knowledge base with respect to how -it is used. Defining an entity by its classes, we employ usage data to predict -the distribution over relations for that entity. For example, instances of -person in a knowledge base may require a birth date, name and nationality to be -considered complete. These predicted relation distributions enable detection of -important gaps in the knowledge base, and define the required facts for unseen -entities. Such characterisation of the knowledge base can also quantify how -usage and completeness change over time. We demonstrate a method to measure -Demand-Weighted Completeness, and show that a simple neural network model -performs well at this prediction task. -" -7370,1804.11146,"Micael Carvalho, R\'emi Cad\`ene, David Picard, Laure Soulier, Nicolas - Thome, Matthieu Cord","Cross-Modal Retrieval in the Cooking Context: Learning Semantic - Text-Image Embeddings",cs.CL cs.CV cs.IR," Designing powerful tools that support cooking activities has rapidly gained -popularity due to the massive amounts of available data, as well as recent -advances in machine learning that are capable of analyzing them. In this paper, -we propose a cross-modal retrieval model aligning visual and textual data (like -pictures of dishes and their recipes) in a shared representation space. We -describe an effective learning scheme, capable of tackling large-scale -problems, and validate it on the Recipe1M dataset containing nearly 1 million -picture-recipe pairs. We show the effectiveness of our approach regarding -previous state-of-the-art models and present qualitative results over -computational cooking use cases. -" -7371,1804.11149,"Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi - Kasivajjala",Q-Map: Clinical Concept Mining from Clinical Documents,cs.IR cs.CL," Over the past decade, there has been a steep rise in the data-driven analysis -in major areas of medicine, such as clinical decision support system, survival -analysis, patient similarity analysis, image analytics etc. Most of the data in -the field are well-structured and available in numerical or categorical formats -which can be used for experiments directly. But on the opposite end of the -spectrum, there exists a wide expanse of data that is intractable for direct -analysis owing to its unstructured nature which can be found in the form of -discharge summaries, clinical notes, procedural notes which are in human -written narrative format and neither have any relational model nor any standard -grammatical structure. An important step in the utilization of these texts for -such studies is to transform and process the data to retrieve structured -information from the haystack of irrelevant data using information retrieval -and data mining techniques. To address this problem, the authors present Q-Map -in this paper, which is a simple yet robust system that can sift through -massive datasets with unregulated formats to retrieve structured information -aggressively and efficiently. It is backed by an effective mining technique -which is based on a string matching algorithm that is indexed on curated -knowledge sources, that is both fast and configurable. 
-examine its comparative performance with MetaMap, one of the most reputed tools
-for medical concept retrieval, and present the advantages the former displays
-over the latter.
-"
-7372,1804.11225,Leshem Choshen and Omri Abend,Automatic Metric Validation for Grammatical Error Correction,cs.CL," Metric validation in Grammatical Error Correction (GEC) is currently done by
-observing the correlation between human and metric-induced rankings. However,
-such correlation studies are costly, methodologically troublesome, and suffer
-from low inter-rater agreement. We propose MAEGE, an automatic methodology for
-GEC metric validation that overcomes many of the difficulties of existing
-practices. Experiments with MAEGE shed new light on metric quality, showing,
-for example, that the standard $M^2$ metric fares poorly on corpus-level
-ranking. Moreover, we use MAEGE to perform a detailed analysis of metric
-behavior, showing that correcting some types of errors is consistently
-penalized by existing metrics.
-"
-7373,1804.11243,Shih-Feng Yang and Julia Taylor Rayz,An Event Detection Approach Based On Twitter Hashtags,cs.SI cs.CL cs.IR," Twitter is one of the most popular microblogging services in the world. The
-great amount of information within Twitter makes it an important information
-channel for people to learn and share news. The Twitter hashtag is a popular
-feature that can be viewed as human-labeled information which people use to
-identify the topic of a tweet. Many researchers have proposed event-detection
-approaches that can monitor Twitter data and determine whether special events,
-such as accidents, extreme weather, earthquakes, or crimes, take place.
-Although many approaches use hashtags as one of their features, few of them
-explicitly focus on the effectiveness of using hashtags for event detection.
-In this study, we propose an event detection approach that utilizes hashtags
-in tweets. We adopted the feature extraction used in STREAMCUBE and applied a
-K-means clustering approach to it. The experiments demonstrated that the
-K-means approach performed better than STREAMCUBE in the clustering results. A
-discussion on optimal K values for the K-means approach is also provided.
-"
-7374,1804.11251,"Enrico Santus, Chris Biemann, Emmanuele Chersoni","BomJi at SemEval-2018 Task 10: Combining Vector-, Pattern- and
- Graph-based Information to Identify Discriminative Attributes",cs.CL," This paper describes BomJi, a supervised system for capturing discriminative
-attributes in word pairs (e.g. yellow as discriminative for banana over
-watermelon). The system relies on an XGB classifier trained on carefully
-engineered graph-, pattern- and word-embedding based features. It participated
-in the SemEval-2018 Task 10 on Capturing Discriminative Attributes, achieving
-an F1 score of 0.73 and ranking 2nd out of 26 participant systems.
-"
-7375,1804.11254,Leshem Choshen and Omri Abend,"Inherent Biases in Reference based Evaluation for Grammatical Error
- Correction and Text Simplification",cs.CL," The prevalent use of too few references for evaluating text-to-text
-generation is known to bias estimates of their quality ({\it low coverage bias}
-or LCB). This paper shows that overcoming LCB in Grammatical Error Correction
-(GEC) evaluation cannot be attained by re-scaling or by increasing the number
-of references in any feasible range, contrary to previous suggestions. This is
-due to the long-tailed distribution of valid corrections for a sentence.
-Concretely, we show that LCB incentivizes GEC systems to avoid correcting even
-when they can generate a valid correction. Consequently, existing systems
-obtain comparable or superior performance compared to humans, by making few but
-targeted changes to the input. Similar effects on Text Simplification further
-support our claims.
-"
-7376,1804.11258,"Zhan Shi, Xinchi Chen, Xipeng Qiu, Xuanjing Huang",Toward Diverse Text Generation with Inverse Reinforcement Learning,cs.CL cs.LG stat.ML," Text generation is a crucial task in NLP. Recently, several adversarial
-generative models have been proposed to alleviate the exposure bias problem in
-text generation. Though these models have achieved great success, they still
-suffer from the problems of reward sparsity and mode collapse. In order to
-address these two problems, in this paper, we employ inverse reinforcement
-learning (IRL) for text generation. Specifically, the IRL framework learns a
-reward function on training data, and then an optimal policy to maximize the
-expected total reward. Similar to the adversarial models, the reward and
-policy functions in IRL are optimized alternately. Our method has two
-advantages: (1) the reward function can produce more dense reward signals;
-(2) the generation policy, trained by an ""entropy regularized"" policy
-gradient, encourages the generation of more diversified texts. Experimental
-results demonstrate that our proposed method can generate higher quality texts
-than the previous methods.
-"
-7377,1804.11283,"Max Grusky, Mor Naaman, Yoav Artzi","Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive
- Strategies",cs.CL," We present NEWSROOM, a summarization dataset of 1.3 million articles and
-summaries written by authors and editors in the newsrooms of 38 major news
-publications. Extracted from search and social media metadata between 1998 and
-2017, these high-quality summaries demonstrate high diversity of summarization
-styles. In particular, the summaries combine abstractive and extractive
-strategies, borrowing words and phrases from articles at varying rates. We
-analyze the extraction strategies used in NEWSROOM summaries against other
-datasets to quantify the diversity and difficulty of our new data, and train
-existing methods on the data to evaluate its utility and challenges.
-"
-7378,1804.11297,"Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour,
- Thomas Schatz, Emmanuel Dupoux","Sampling strategies in Siamese Networks for unsupervised speech
- representation learning",cs.CL cs.LG," Recent studies have investigated siamese network architectures for learning
-invariant speech representations using same-different side information at the
-word level. Here we systematically investigate an often ignored component of
-siamese networks: the sampling procedure (how pairs of same vs. different
-tokens are selected). We show that sampling strategies taking into account
-Zipf's Law, the distribution of speakers and the proportions of same and
-different pairs of words significantly impact the performance of the network.
-In particular, we show that word frequency compression improves learning
-across a large range of variations in the number of training pairs. This
-effect does not apply to the same extent to the fully unsupervised setting,
-where the pairs of same-different words are obtained by spoken term discovery. We apply these
-results to pairs of words discovered using an unsupervised algorithm and show
-an improvement over the state of the art in unsupervised representation
-learning using siamese networks.
-"
-7379,1804.11324,"Gonzalo Iglesias, William Tambellini, Adri\`a De Gispert, Eva Hasler
- and Bill Byrne","Accelerating NMT Batched Beam Decoding with LMBR Posteriors for
- Deployment",cs.CL," We describe a batched beam decoding algorithm for NMT with LMBR n-gram
-posteriors, showing that LMBR techniques still yield gains on top of the best
-recently reported results with Transformers. We also discuss acceleration
-strategies for deployment, and the effect of the beam size and batching on
-memory and speed.
-"
-7380,1804.11346,"Iria del R\'io, Marcos Zampieri, Shervin Malmasi",A Portuguese Native Language Identification Dataset,cs.CL," In this paper we present NLI-PT, the first Portuguese dataset compiled for
-Native Language Identification (NLI), the task of identifying an author's first
-language based on their second language writing. The dataset includes 1,868
-student essays written by learners of European Portuguese, native speakers of
-the following L1s: Chinese, English, Spanish, German, Russian, French,
-Japanese, Italian, Dutch, Tetum, Arabic, Polish, Korean, Romanian, and Swedish.
-NLI-PT includes the original student text and four different types of
-annotation: POS, fine-grained POS, constituency parses, and dependency parses.
-NLI-PT can be used not only in NLI but also in research on several topics in
-the field of Second Language Acquisition and educational NLP. We discuss
-possible applications of this dataset and present the results obtained for the
-first lexical baseline system for Portuguese NLI.
-"
-7381,1805.00063,"Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, and Tom
- Sercu (IBM Research, USA)",Adversarial Semantic Alignment for Improved Image Captions,cs.LG cs.CL cs.CV stat.ML," In this paper we study image captioning as a conditional GAN training,
-proposing both a context-aware LSTM captioner and a co-attentive discriminator,
-which enforces semantic alignment between images and captions. We empirically
-focus on the viability of two training methods: Self-critical Sequence Training
-(SCST) and Gumbel Straight-Through (ST), and demonstrate that SCST shows more
-stable gradient behavior and improved results over Gumbel ST, even without
-accessing discriminator gradients directly. We also address the problem of
-automatic evaluation for captioning models, introduce a new semantic score,
-and show its correlation with human judgement. As an evaluation paradigm, we
-argue that an important criterion for a captioner is the ability to generalize
-to compositions of objects that do not usually co-occur together. To this end,
-we introduce a small captioned Out of Context (OOC) test set. The OOC set,
-combined with our semantic score, are proposed as new diagnostic tools for the
-captioning community. When evaluated on the OOC and MS-COCO benchmarks, we show
-that SCST-based training has strong performance in both semantic score and
-human evaluation, promising to be a valuable new approach for efficient
-discrete GAN training.
-"
-7382,1805.00097,"Roma Patel, Yinfei Yang, Iain Marshall, Ani Nenkova and Byron Wallace",Syntactic Patterns Improve Information Extraction for Medical Search,cs.CL," Medical professionals search the published literature by specifying the type
-of patients, the medical intervention(s) and the outcome measure(s) of
-interest. In this paper we demonstrate how features encoding syntactic patterns
-improve the performance of state-of-the-art sequence tagging models (both
-linear and neural) for information extraction of these medically relevant
-categories. We present an analysis of the type of patterns exploited, and of
-the semantic space induced for these, i.e., the distributed representations
-learned for identified multi-token patterns. We show that these learned
-representations differ substantially from those of the constituent unigrams,
-suggesting that the patterns capture contextual information that is otherwise
-lost.
-"
-7383,1805.00150,"Zheng Zhang, Minlie Huang, Zhongzhou Zhao, Feng Ji, Haiqing Chen,
- Xiaoyan Zhu",Memory-augmented Dialogue Management for Task-oriented Dialogue Systems,cs.CL cs.AI cs.IR," Dialogue management (DM) decides the next action of a dialogue system
-according to the current dialogue state, and thus plays a central role in
-task-oriented dialogue systems. Since dialogue management requires access not
-only to local utterances, but also to the global semantics of the entire
-dialogue session, modeling long-range history information is a critical issue.
-To this end, we propose a novel Memory-Augmented Dialogue management model
-(MAD) which employs a memory controller and two additional memory structures,
-i.e., a slot-value memory and an external memory. The slot-value memory tracks
-the dialogue state by memorizing and updating the values of semantic slots
-(for instance, cuisine, price, and location), and the external memory augments
-the representation of hidden states of traditional recurrent neural networks
-by storing more context information. To update the dialogue state efficiently,
-we also propose slot-level attention on user utterances to extract specific
-semantic information for each slot. Experiments show that our model obtains
-state-of-the-art performance and outperforms existing baselines.
-"
-7384,1805.00178,"Rui Wang, Masao Utiyama, and Eiichiro Sumita","Dynamic Sentence Sampling for Efficient Training of Neural Machine
- Translation",cs.CL," Traditional neural machine translation (NMT) involves a fixed training
-procedure where each sentence is sampled once during each epoch. In reality,
-some sentences are well learned during the initial few epochs; however, using
-this approach, the well-learned sentences would continue to be trained along
-with those sentences that were not well learned for 10-30 epochs, which
-results in wasted time. Here, we propose an efficient method to dynamically
-sample the sentences in order to accelerate NMT training. In this approach, a
-weight is assigned to each sentence based on the measured difference between
-the training costs of two iterations. Further, in each epoch, a certain
-percentage of sentences are dynamically sampled according to their weights.
-Empirical results on the NIST Chinese-to-English and the WMT English-to-German
-tasks show that the proposed method can significantly accelerate NMT training
-and improve NMT performance.
-"
-7385,1805.00188,"Liu Yang, Minghui Qiu, Chen Qu, Jiafeng Guo, Yongfeng Zhang, W. Bruce
- Croft, Jun Huang, Haiqing Chen","Response Ranking with Deep Matching Networks and External Knowledge in
- Information-seeking Conversation Systems",cs.IR cs.CL," Intelligent personal assistant systems with either text-based or voice-based
-conversational interfaces are becoming increasingly popular around the world.
-Retrieval-based conversation models have the advantage of returning fluent and
-informative responses. Most existing studies in this area are on open-domain
-""chit-chat"" conversations or task/transaction-oriented conversations. More
-research is needed for information-seeking conversations. There is also a lack
-of modeling of external knowledge beyond the dialog utterances in current
-conversational models. In this paper, we propose a learning framework on top
-of deep neural matching networks that leverages external knowledge for
-response ranking in information-seeking conversation systems. We incorporate
-external knowledge into deep neural models with pseudo-relevance feedback and
-QA correspondence knowledge distillation. Extensive experiments with three
-information-seeking conversation data sets, including both open benchmarks and
-commercial data, show that our methods outperform various baseline methods,
-including several deep text matching models and the state-of-the-art method on
-response selection in multi-turn conversations. We also perform analysis over
-different response types, model variations and ranking examples. Our models
-and research findings provide new insights on how to utilize external
-knowledge with deep neural models for response selection, and have
-implications for the design of the next generation of information-seeking
-conversation systems.
-"
-7386,1805.00195,"Chaitanya Kulkarni, Wei Xu, Alan Ritter, Raghu Machiraju","An Annotated Corpus for Machine Reading of Instructions in Wet Lab
- Protocols",cs.CL cs.AI," We describe an effort to annotate a corpus of natural language instructions
-consisting of 622 wet lab protocols to facilitate automatic or semi-automatic
-conversion of protocols into a machine-readable format and benefit biological
-research. Experimental results demonstrate the utility of our corpus for
-developing machine learning approaches to shallow semantic parsing of
-instructional texts. We make our annotated Wet Lab Protocol Corpus available to
-the research community.
-"
-7387,1805.00249,"Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun",Nugget Proposal Networks for Chinese Event Detection,cs.CL," Neural network based models commonly regard event detection as a word-wise
-classification task, which suffers from the mismatch problem between words and
-event triggers, especially in languages without natural word delimiters such as
-Chinese. In this paper, we propose Nugget Proposal Networks (NPNs), which can
-solve the word-trigger mismatch problem by directly proposing entire trigger
-nuggets centered at each character, regardless of word boundaries. Specifically,
-NPNs perform event detection in a character-wise paradigm, where a hybrid
-representation for each character is first learned to capture both structural
-and semantic information from both characters and words. Then, based on learned
-representations, trigger nuggets are proposed and categorized by exploiting
-character compositional structures of Chinese event triggers. Experiments on
-both the ACE2005 and TAC KBP 2017 datasets show that NPNs significantly
-outperform the state-of-the-art methods.
-"
-7388,1805.00250,"Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun",Adaptive Scaling for Sparse Detection in Information Extraction,cs.CL cs.LG," This paper focuses on detection tasks in information extraction, where
-positive instances are sparsely distributed and models are usually evaluated
-using F-measure on positive classes. These characteristics often result in
-deficient performance of neural network based detection models. In this paper,
-we propose adaptive scaling, an algorithm which can handle the positive
-sparsity problem and directly optimize over F-measure via dynamic
-cost-sensitive learning. To this end, we borrow the idea of marginal utility
-from economics and propose a theoretical framework for measuring instance
-importance without introducing any additional hyper-parameters. Experiments
-show that our algorithm leads to more effective and stable training of neural
-network based detection models.
-"
-7389,1805.00254,"Pankaj Gupta and Benjamin Roth and Hinrich Sch\""utze",Joint Bootstrapping Machines for High Confidence Relation Extraction,cs.CL cs.AI cs.IR cs.LG cs.NE," Semi-supervised bootstrapping techniques for relationship extraction from
-text iteratively expand a set of initial seed instances. Due to the lack of
-labeled data, a key challenge in bootstrapping is semantic drift: if a false
-positive instance is added during an iteration, then all following iterations
-are contaminated. We introduce BREX, a new bootstrapping method that protects
-against such contamination by highly effective confidence assessment. This is
-achieved by using entity and template seeds jointly (as opposed to just one as
-in previous work), by expanding entities and templates in parallel and in a
-mutually constraining fashion in each iteration, and by introducing
-higher-quality similarity measures for templates. Experimental results show
-that BREX achieves an F1 that is 0.13 (0.87 vs. 0.74) better than the state of
-the art for four relationships.
-"
-7390,1805.00270,"Anca Dumitrache, Lora Aroyo, Chris Welty",Capturing Ambiguity in Crowdsourcing Frame Disambiguation,cs.CL," FrameNet is a computational linguistics resource composed of semantic frames,
-high-level concepts that represent the meanings of words. In this paper, we
-present an approach to gathering frame disambiguation annotations in sentences
-using a crowdsourcing approach with multiple workers per sentence to capture
-inter-annotator disagreement. We perform an experiment over a set of 433
-sentences annotated with frames from the FrameNet corpus, and show that the
-aggregated crowd annotations achieve an F1 score greater than 0.67 as compared
-to expert linguists. We highlight cases where the crowd annotation was correct
-even though the expert is in disagreement, arguing for the need to have
-multiple annotators per sentence. Most importantly, we examine cases in which
-crowd workers could not agree, and demonstrate that these cases exhibit
-ambiguity, either in the sentence, frame, or the task itself, and argue that
-collapsing such cases to a single, discrete truth value (i.e. correct or
-incorrect) is inappropriate, creating arbitrary targets for machine learning.
-"
-7391,1805.00287,"Daniel Hershcovich, Omri Abend and Ari Rappoport",Multitask Parsing Across Semantic Representations,cs.CL," The ability to consolidate information of different types is at the core of
-intelligence, and has tremendous practical value in allowing learning for one
-task to benefit from generalizations learned for others. In this paper we
-tackle the challenging task of improving semantic parsing performance, taking
-UCCA parsing as a test case, and AMR, SDP and Universal Dependencies (UD)
-parsing as auxiliary tasks. We experiment on three languages, using a uniform
-transition-based system and learning architecture for all parsing tasks.
-Despite notable conceptual, formal and domain differences, we show that
-multitask learning significantly improves UCCA parsing in both in-domain and
-out-of-domain settings.
-"
-7392,1805.00314,"Josiah Wang, Pranava Madhyastha, Lucia Specia",Object Counts! Bringing Explicit Detections Back into Image Captioning,cs.CV cs.AI cs.CL," The use of explicit object detectors as an intermediate step to image
-captioning - which used to constitute an essential stage in early work - is
-often bypassed in the currently dominant end-to-end approaches, where the
-language model is conditioned directly on a mid-level image embedding. We argue
-that explicit detections provide rich semantic information, and can thus be
-used as an interpretable representation to better understand why end-to-end
-image captioning systems work well. We provide an in-depth analysis of
-end-to-end image captioning by exploring a variety of cues that can be derived
-from such object detections. Our study reveals that end-to-end image captioning
-systems rely on matching image representations to generate captions, and that
-encodings of the frequency, size and position of objects are complementary and
-all play a role in forming a good image representation. It also reveals that
-different object categories contribute in different ways towards image
-captioning.
-"
-7393,1805.00352,"Qufei Chen, Marina Sokolova","Word2Vec and Doc2Vec in Unsupervised Sentiment Analysis of Clinical
- Discharge Summaries",cs.CL cs.LG," In this study, we explored the application of Word2Vec and Doc2Vec for
-sentiment analysis of clinical discharge summaries. We applied unsupervised
-learning since the data sets did not have sentiment annotations. Note that
-unsupervised learning is a more realistic scenario than supervised learning,
-which requires access to a training set of sentiment-annotated data. We aim to
-detect whether there exists any underlying bias towards or against a certain
-disease. We used SentiWordNet to establish a gold sentiment standard for the
-data sets and evaluate the performance of the Word2Vec and Doc2Vec methods. We
-have shown that the Word2Vec and Doc2Vec methods complement each other's
-results in sentiment analysis of the data sets.
-"
-7394,1805.00444,Steven Coats,Skin Tone Emoji and Sentiment on Twitter,cs.CY cs.CL," In 2015, the Unicode Consortium introduced five skin tone emoji that can be
-used in combination with emoji representing human figures and body parts. In
-this study, use of the skin tone emoji is analyzed geographically in a large
-sample of data from Twitter. It can be shown that values for the skin tone
-emoji by country correspond approximately to the skin tone of the resident
-populations, and that a negative correlation exists between tweet sentiment and
-darker skin tone at the global level. In an era of large-scale migrations and
-continued sensitivity to questions of skin color and race, understanding how
-new language elements such as skin tone emoji are used can help frame our
-understanding of how people represent themselves and others in terms of a
-salient personal appearance attribute.
-"
-7395,1805.00456,"Danielle Saunders, Felix Stahlberg, Adria de Gispert and Bill Byrne","Multi-representation Ensembles and Delayed SGD Updates Improve
- Syntax-based NMT",cs.CL," We explore strategies for incorporating target syntax into Neural Machine
-Translation. We specifically focus on syntax in ensembles containing multiple
-sentence representations. We formulate beam search over such ensembles using
-WFSTs, and describe a delayed SGD update training procedure that is especially
-effective for long representations like linearized syntax. Our approach gives
-state-of-the-art performance on a difficult Japanese-English task.
-"
-7396,1805.00460,"Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada","Customized Image Narrative Generation via Interactive Visual Question
- Generation and Answering",cs.CL cs.AI cs.CV cs.HC," The image description task has invariably been examined in a static manner,
-with qualitative presumptions held to be universally applicable, regardless of
-the scope or target of the description. In practice, however, different viewers
-may pay attention to different aspects of the image, and yield different
-descriptions or interpretations under various contexts. Such diversity in
-perspectives is difficult to derive with conventional image description
-techniques. In this paper, we propose a customized image narrative generation
-task, in which the users are interactively engaged in the generation process by
-providing answers to the questions. We further attempt to learn the user's
-interest via repeating such interactive stages, and to automatically reflect
-the interest in descriptions for new images. Experimental results demonstrate
-that our model can generate a variety of descriptions from a single image that
-cover a wider range of topics than conventional models, while being
-customizable to the target user of interaction.
-"
-7397,1805.00462,"Haichao Zhang, Haonan Yu and Wei Xu","Interactive Language Acquisition with One-shot Visual Concept Learning
- through a Conversational Game",cs.CL cs.AI," Building intelligent agents that can communicate with and learn from humans
-in natural language is of great value. Supervised language learning is limited
-in that it mainly captures the statistics of training data, and is hardly
-adaptive to new scenarios or flexible enough to acquire new knowledge without
-inefficient retraining or catastrophic forgetting. We highlight the
-perspective that conversational interaction serves as a natural interface both
-for language learning and for novel knowledge acquisition, and propose a joint
-imitation and reinforcement approach for grounded language learning through an
-interactive conversational game. The agent trained with this approach is able
-to actively acquire information by asking questions about novel objects and use
-the just-learned knowledge in subsequent conversations in a one-shot fashion.
-Results compared with other methods verified the effectiveness of the proposed
-approach.
-"
-7398,1805.00471,Soumya Kambhampati,"""I ain't tellin' white folks nuthin"": A quantitative exploration of the
- race-related problem of candour in the WPA slave narratives",cs.CL," From 1936-38, the Works Progress Administration interviewed thousands of
-former slaves about their life experiences. While these interviews are crucial
-to understanding the ""peculiar institution"" from the standpoint of the slave
-himself, issues relating to bias cloud analyses of these interviews. The
-problem I investigate is the problem of candour in the WPA slave narratives: it
-is widely held in the historical community that the strict racial caste system
-of the Deep South compelled black ex-slaves to tell white interviewers what
-they thought they wanted to hear, suggesting that there was a significant
-difference in candour depending on whether the interviewer was white or black.
-In this work, I attempt to quantitatively characterise this race-related
-problem of candour. Prior work has either been of an impressionistic,
-qualitative nature, or utilised exceedingly simple quantitative methodology. In
-contrast, I use more sophisticated statistical methods: in particular, word
-frequency and sentiment analysis and comparative topic modelling with LDA, to
-try and identify differences in the content and sentiment expressed by
-ex-slaves in front of white interviewers versus black interviewers. While my
-sentiment analysis methodology was ultimately unsuccessful due to the
-complexity of the task, my word frequency analysis and comparative topic
-modelling methods both showed strong evidence that the content expressed in
-front of white interviewers was different from that of black interviewers. In
-particular, I found that the ex-slaves spoke much more about unfavourable
-aspects of slavery like whipping and slave patrollers in front of interviewers
-of their own race. I hope that my more sophisticated statistical methodology
-helps improve the robustness of the argument for the existence of this problem
-of candour in the slave narratives, which some would seek to deny for
-revisionist purposes.
-"
-7399,1805.00551,"Marilyn A. Walker, Albry Smither, Shereen Oraby, Vrindavan Harrison,
- Hadar Shemtov","Exploring Conversational Language Generation for Rich Content about
- Hotels",cs.CL," Dialogue systems for hotel and tourist information have typically simplified
-the richness of the domain, focusing system utterances on only a few selected
-attributes such as price, location and type of rooms. However, much more
-content is typically available for hotels, often as many as 50 distinct
-instantiated attributes for an individual entity. New methods are needed to use
-this content to generate natural dialogues for hotel information, and in
-general for any domain with such rich complex content. We describe three
-experiments aimed at collecting data that can inform an NLG system for hotel
-dialogues, and show, not surprisingly, that the sentences in the original
-written hotel descriptions provided on webpages for each hotel are
-stylistically not a very good match for conversational interaction. We quantify
-the stylistic features that characterize the differences between the original
-textual data and the collected dialogic data. We plan to use these in stylistic
-models for generation, and for scoring retrieved utterances for use in hotel
-dialogues.
-"
-7400,1805.00579,"Han Zhao, Shuayb Zarar, Ivan Tashev, Chin-Hui Lee",Convolutional-Recurrent Neural Networks for Speech Enhancement,cs.SD cs.CL cs.LG eess.AS," We propose an end-to-end model based on convolutional and recurrent neural
-networks for speech enhancement. Our model is purely data-driven and does not
-make any assumptions about the type or the stationarity of the noise. In
-contrast to existing methods that use multilayer perceptrons (MLPs), we employ
-both convolutional and recurrent neural network architectures. Thus, our
-approach allows us to exploit local structures in both the frequency and
-temporal domains. By incorporating prior knowledge of speech signals into the
-design of model structures, we build a model that is more data-efficient and
-achieves better generalization on both seen and unseen noise. Based on
-experiments with synthetic data, we demonstrate that our model outperforms
-existing methods, improving PESQ by up to 0.6 on seen noise and 0.64 on unseen
-noise.
-" -7401,1805.00604,"Aryan Mobiny, Mohammad Najarian","Text-Independent Speaker Verification Using Long Short-Term Memory - Networks",eess.AS cs.CL cs.SD," In this paper, an architecture based on Long Short-Term Memory Networks has -been proposed for the text-independent scenario which is aimed to capture the -temporal speaker-related information by operating over traditional speech -features. For speaker verification, at first, a background model must be -created for speaker representation. Then, in enrollment stage, the speaker -models will be created based on the enrollment utterances. For this work, the -model will be trained in an end-to-end fashion to combine the first two stages. -The main goal of end-to-end training is the model being optimized to be -consistent with the speaker verification protocol. The end- to-end training -jointly learns the background and speaker models by creating the representation -space. The LSTM architecture is trained to create a discrimination space for -validating the match and non-match pairs for speaker verification. The proposed -architecture demonstrate its superiority in the text-independent compared to -other traditional methods. -" -7402,1805.00625,"Didan Deng, Yuqian Zhou, Jimin Pi, Bertram E.Shi","Multimodal Utterance-level Affect Analysis using Visual, Audio and Text - Features",eess.IV cs.CL cs.CV," The integration of information across multiple modalities and across time is -a promising way to enhance the emotion recognition performance of affective -systems. Much previous work has focused on instantaneous emotion recognition. -The 2018 One-Minute Gradual-Emotion Recognition (OMG-Emotion) challenge, which -was held in conjunction with the IEEE World Congress on Computational -Intelligence, encouraged participants to address long-term emotion recognition -by integrating cues from multiple modalities, including facial expression, -audio and language. Intuitively, a multi-modal inference network should be able -to leverage information from each modality and their correlations to improve -recognition over that achievable by a single modality network. We describe here -a multi-modal neural architecture that integrates visual information over time -using an LSTM, and combines it with utterance level audio and text cues to -recognize human sentiment from multimodal clips. Our model outperforms the -unimodal baseline, achieving the concordance correlation coefficients (CCC) of -0.400 on the arousal task, and 0.353 on the valence task. -" -7403,1805.00631,"Biao Zhang, Deyi Xiong, Jinsong Su",Accelerating Neural Transformer via an Average Attention Network,cs.CL," With parallelizable attention networks, the neural Transformer is very fast -to train. However, due to the auto-regressive architecture and self-attention -in the decoder, the decoding procedure becomes slow. To alleviate this issue, -we propose an average attention network as an alternative to the self-attention -network in the decoder of the neural Transformer. The average attention network -consists of two layers, with an average layer that models dependencies on -previous positions and a gating layer that is stacked over the average layer to -enhance the expressiveness of the proposed attention network. We apply this -network on the decoder part of the neural Transformer to replace the original -target-side self-attention model. 
-our model enables the neural Transformer to decode sentences over four times
-faster than its original version with almost no loss in training time and
-translation performance. We conduct a series of experiments on WMT17
-translation tasks, where on 6 different language pairs, we obtain robust and
-consistent speed-ups in decoding.
-"
-7404,1805.00676,Cristian Bodnar,Text to Image Synthesis Using Generative Adversarial Networks,cs.CV cs.CL," Generating images from natural language is one of the primary applications of
-recent conditional generative models. Besides testing our ability to model
-conditional, high-dimensional distributions, text to image synthesis has many
-exciting and practical applications such as photo editing or computer-aided
-content creation. Recent progress has been made using Generative Adversarial
-Networks (GANs). This material starts with a gentle introduction to these
-topics and discusses the existing state-of-the-art models. Moreover, I propose
-Wasserstein GAN-CLS, a new model for conditional image generation based on the
-Wasserstein distance which offers guarantees of stability. Then, I show how the
-novel loss function of Wasserstein GAN-CLS can be used in a Conditional
-Progressive Growing GAN. In combination with the proposed loss, the model
-boosts by 7.07% the best Inception Score (on the Caltech birds dataset) of the
-models which use only the sentence-level visual semantics. The only model which
-performs better than the Conditional Wasserstein Progressive Growing GAN is the
-recently proposed AttnGAN, which uses word-level visual semantics as well.
-"
-7405,1805.00731,"Francesco Barbieri, Luis Marujo, Pradeep Karuturi, William Brendel,
- Horacio Saggion",Exploring Emoji Usage and Prediction Through a Temporal Variation Lens,cs.CL," The frequent use of Emojis on social media platforms has created a new form
-of multimodal social interaction. Developing methods for the study and
-representation of emoji semantics helps to improve future multimodal
-communication systems. In this paper, we explore the usage and semantics of
-emojis over time. We compare emoji embeddings trained on corpora from different
-seasons and show that some emojis are used differently depending on the time of
-the year. Moreover, we propose a method to take into account the time
-information for emoji prediction systems, outperforming state-of-the-art
-systems. We show that, using the time information, the accuracy of some emojis
-can be significantly improved.
-"
-7406,1805.00741,"Hengyi Cai, Xingguang Ji, Yonghao Song, Yan Jin, Yang Zhang, Mairgup
- Mansur, Xiaofang Zhao","KNPTC: Knowledge and Neural Machine Translation Powered Chinese Pinyin
- Typo Correction",cs.CL cs.AI," Chinese pinyin input methods are very important for Chinese language
-processing. In practice, users inevitably make typos when they input pinyin.
-Moreover, pinyin typo correction has become an increasingly important task with
-the popularity of smartphones and the mobile Internet. How to exploit knowledge
-of users' typing behaviors and support typo correction for acronym pinyin
-remains a challenging problem. To tackle these challenges, we propose
-KNPTC, a novel approach based on neural machine translation (NMT). In contrast
-to previous work, KNPTC is able to integrate explicit knowledge into NMT for
-pinyin typo correction, and is able to learn to correct a variety of typos
-without the guidance of manually selected constraints or language-specific
-features. In this approach, we first obtain the transition probabilities
-between adjacent letters based on large-scale real-life datasets. Then, we
-construct the ""ground-truth"" alignments of training sentence pairs by
-utilizing these probabilities. Furthermore, these alignments are integrated
-into NMT to capture sensible pinyin typo correction patterns. KNPTC is applied
-to correct typos in real-life datasets, achieving an average 32.77% increase
-in typo correction accuracy over the state-of-the-art system.
-"
-7407,1805.00760,"Xin Li, Lidong Bing, Piji Li, Wai Lam, Zhimou Yang","Aspect Term Extraction with History Attention and Selective
- Transformation",cs.CL," Aspect Term Extraction (ATE), a key sub-task in Aspect-Based Sentiment
-Analysis, aims to extract explicit aspect expressions from online user reviews.
-We present a new framework for tackling ATE. It can exploit two useful clues,
-namely opinion summary and aspect detection history. The opinion summary is
-distilled from the whole input sentence, conditioned on each current token for
-aspect prediction, and thus the tailor-made summary can help aspect prediction
-on this token. Another clue is the information of aspect detection history,
-which is distilled from the previous aspect predictions so as to leverage the
-coordinate structure and tagging schema constraints to upgrade the aspect
-prediction. Experimental results over four benchmark datasets clearly
-demonstrate that our framework can outperform all state-of-the-art methods.
-"
-7408,1805.00879,"Robert Litschko, Goran Glava\v{s}, Simone Paolo Ponzetto, Ivan Vuli\'c","Unsupervised Cross-Lingual Information Retrieval using Monolingual Data
- Only",cs.CL," We propose a fully unsupervised framework for ad-hoc cross-lingual
-information retrieval (CLIR) which requires no bilingual data at all. The
-framework leverages shared cross-lingual word embedding spaces in which terms,
-queries, and documents can be represented, irrespective of their actual
-language. The shared embedding spaces are induced solely on the basis of
-monolingual corpora in two languages through an iterative process based on
-adversarial neural networks. Our experiments on the standard CLEF CLIR
-collections for three language pairs of varying degrees of language similarity
-(English-Dutch/Italian/Finnish) demonstrate the usefulness of the proposed
-fully unsupervised approach. Our CLIR models with unsupervised cross-lingual
-embeddings outperform baselines that utilize cross-lingual embeddings induced
-relying on word-level and document-level alignments. We then demonstrate that
-further improvements can be achieved by unsupervised ensemble CLIR models. We
-believe that the proposed framework is the first step towards the development
-of effective CLIR models for language pairs and domains where parallel data
-are scarce or non-existent.
-"
-7409,1805.00900,"Micael Carvalho, R\'emi Cad\`ene, David Picard, Laure Soulier,
- Matthieu Cord",Images & Recipes: Retrieval in the cooking context,cs.AI cs.CL cs.CV cs.IR," Recent advances in the machine learning community have allowed different use
-cases to emerge, such as its association with domains like cooking, which has
-created computational cuisine. In this paper, we tackle the picture-recipe alignment
-problem, having as target application the large-scale retrieval task (finding a
-recipe given a picture, and vice versa). Our approach is validated on the
-Recipe1M dataset, composed of one million image-recipe pairs and additional
-class information, for which we achieve state-of-the-art results.
-"
-7410,1805.00912,"Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang","Tensorized Self-Attention: Efficiently Modeling Pairwise and Global
- Dependencies Together",cs.CL cs.AI," Neural networks equipped with self-attention have parallelizable computation,
-light-weight structure, and the ability to capture both long-range and local
-dependencies. Further, their expressive power and performance can be boosted by
-using a vector to measure pairwise dependency, but this requires expanding the
-alignment matrix to a tensor, which results in memory and computation
-bottlenecks. In this paper, we propose a novel attention mechanism called
-""Multi-mask Tensorized Self-Attention"" (MTSA), which is as fast and as
-memory-efficient as a CNN, but significantly outperforms previous
-CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token)
-and global (source2token) dependencies by a novel compatibility function
-composed of dot-product and additive attentions, 2) uses a tensor to represent
-the feature-wise alignment scores for better expressive power but only requires
-parallelizable matrix multiplications, and 3) combines multi-head with
-multi-dimensional attentions, and applies a distinct positional mask to each
-head (subspace), so the memory and computation can be distributed to multiple
-heads, each with sequential information encoded independently. The experiments
-show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or
-competitive performance on nine NLP benchmarks with compelling memory- and
-time-efficiency.
-"
-7411,1805.01035,Roee Aharoni and Yoav Goldberg,Split and Rephrase: Better Evaluation and a Stronger Baseline,cs.CL," Splitting and rephrasing a complex sentence into several shorter sentences
-that convey the same meaning is a challenging problem in NLP. We show that
-while vanilla seq2seq models can reach high scores on the proposed benchmark
-(Narayan et al., 2017), they suffer from memorization of the training set,
-which contains more than 89% of the unique simple sentences from the validation
-and test sets. To address this, we present a new train-development-test data
-split and neural models augmented with a copy-mechanism, outperforming the best
-reported baseline by 8.68 BLEU and fostering further progress on the task.
-"
-7412,1805.01042,"Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger,
- Benjamin Van Durme",Hypothesis Only Baselines in Natural Language Inference,cs.CL," We propose a hypothesis only baseline for diagnosing Natural Language
-Inference (NLI). Especially when an NLI dataset assumes inference is occurring
-based purely on the relationship between a context and a hypothesis, it follows
-that assessing entailment relations while ignoring the provided context is a
-degenerate solution. Yet, through experiments on ten distinct NLI datasets, we
-find that this approach, which we refer to as a hypothesis-only model, is able
-to significantly outperform a majority-class baseline across a number of NLI
-datasets. Our analysis suggests that statistical irregularities may allow a
-model to perform NLI in some datasets beyond what should be achievable without
-access to the context.
-"
-7413,1805.01052,Nikita Kitaev and Dan Klein,Constituency Parsing with a Self-Attentive Encoder,cs.CL," We demonstrate that replacing an LSTM encoder with a self-attentive
-architecture can lead to improvements to a state-of-the-art discriminative
-constituency parser. The use of attention makes explicit the manner in which
-information is propagated between different locations in the sentence, which we
-use to both analyze our model and propose potential improvements. For example,
-we find that separating positional and content information in the encoder can
-lead to improved parsing accuracy. Additionally, we evaluate different
-approaches for lexical representation. Our parser achieves new state-of-the-art
-results for single models trained on the Penn Treebank: 93.55 F1 without the
-use of any external data, and 95.13 F1 when using pre-trained word
-representations. Our parser also outperforms the previous best-published
-accuracy figures on 8 of the 9 languages in the SPMRL dataset.
-"
-7414,1805.01054,Scott Werwath,"Automatic Coding for Neonatal Jaundice From Free Text Data Using
- Ensemble Methods",cs.CL," This study explores the creation of a machine learning model to automatically
-identify whether a Neonatal Intensive Care Unit (NICU) patient was diagnosed
-with neonatal jaundice during a particular hospitalization based on their
-associated clinical notes. We develop a number of techniques for text
-preprocessing and feature selection and compare the effectiveness of different
-classification models. We show that using ensemble decision tree
-classification, both with AdaBoost and with bagging, outperforms support vector
-machines (SVM), the current state-of-the-art technique for neonatal jaundice
-coding.
-"
-7415,1805.01060,"Ziqi Zheng, Chenjie Cao, Xingwei Chen, Guoqiang Xu",Multimodal Emotion Recognition for One-Minute-Gradual Emotion Challenge,cs.AI cs.CL cs.CV," Continuous dimensional emotion, modelled by arousal and valence, can depict
-complex changes in emotion. In this paper, we present our work on arousal and
-valence prediction for the One-Minute-Gradual (OMG) Emotion Challenge.
-Multimodal representations are first extracted from videos using a variety of
-acoustic, video and textual models, and a support vector machine (SVM) is then
-used for fusion of multimodal signals to make final predictions. Our solution
-achieves Concordance Correlation Coefficient (CCC) scores of 0.397 and 0.520 on
-arousal and valence respectively on the validation dataset, which outperforms
-the baseline systems, with their best CCC scores of 0.15 and 0.23 on arousal
-and valence, by a large margin.
-"
-7416,1805.01070,"Alexis Conneau, German Kruszewski, Guillaume Lample, Lo\""ic Barrault,
- Marco Baroni","What you can cram into a single vector: Probing sentence embeddings for
- linguistic properties",cs.CL," Although much effort has recently been devoted to training high-quality
-sentence embeddings, we still have a poor understanding of what they are
-capturing. ""Downstream"" tasks, often based on sentence classification, are
-commonly used to evaluate the quality of sentence representations. The
-complexity of the tasks, however, makes it difficult to infer what kind of
-information is present in the representations. We introduce here 10 probing
-tasks designed to capture simple linguistic features of sentences, and we use
-them to study embeddings generated by three different encoders trained in eight
-distinct ways, uncovering intriguing properties of both encoders and training
-methods.
-"
-7417,1805.01083,"Xiaolan Wang, Aaron Feng, Behzad Golshan, Alon Halevy, George Mihaila,
- Hidekazu Oiwa, Wang-Chiew Tan",Scalable Semantic Querying of Text,cs.DB cs.CL," We present the KOKO system that takes declarative information extraction to a
-new level by incorporating advances in natural language processing techniques
-in its extraction language. KOKO is novel in that its extraction language
-simultaneously supports conditions on the surface of the text and on the
-structure of the dependency parse tree of sentences, thereby allowing for more
-refined extractions. KOKO also supports conditions that are forgiving to
-linguistic variation in expressing concepts, and allows aggregating evidence
-from the entire document in order to filter extractions.
- To scale up, KOKO exploits a multi-indexing scheme and heuristics for
-efficient extractions. We extensively evaluate KOKO over publicly available
-text corpora. We show that KOKO indices take up the smallest amount of space,
-and are notably faster and more effective than a number of prior indexing
-schemes. Finally, we demonstrate KOKO's scale-up on a corpus of 5 million
-Wikipedia articles.
-"
-7418,1805.01086,"Xin Li, Lidong Bing, Wai Lam, Bei Shi",Transformation Networks for Target-Oriented Sentiment Classification,cs.CL," Target-oriented sentiment classification aims at classifying sentiment
-polarities over individual opinion targets in a sentence. An RNN with attention
-seems a good fit for the characteristics of this task, and indeed it achieves
-the state-of-the-art performance. After re-examining the drawbacks of the
-attention mechanism and the obstacles that block CNNs from performing well in
-this classification task, we propose a new model to overcome these issues.
-Instead of attention, our model employs a CNN layer to extract salient features
-from the transformed word representations originating from a bi-directional RNN
-layer. Between the two layers, we propose a component to generate
-target-specific representations of words in the sentence, and meanwhile
-incorporate a mechanism for preserving the original contextual information from
-the RNN layer. Experiments show that our model achieves a new state-of-the-art
-performance on a few benchmarks.
-"
-7419,1805.01087,"Xuezhe Ma, Zecong Hu, Jingzhou Liu, Nanyun Peng, Graham Neubig, Eduard
- Hovy",Stack-Pointer Networks for Dependency Parsing,cs.CL cs.LG," We introduce a novel architecture for dependency parsing: \emph{stack-pointer
-networks} (\textbf{\textsc{StackPtr}}). Combining pointer
-networks~\citep{vinyals2015pointer} with an internal stack, the proposed model
-first reads and encodes the whole sentence, then builds the dependency tree
-top-down (from root to leaf) in a depth-first fashion. The stack tracks the
-status of the depth-first search and the pointer networks select one child for
-the word at the top of the stack at each step. The \textsc{StackPtr} parser
-benefits from the information of the whole sentence and all previously derived
-subtree structures, and removes the left-to-right restriction in classical
-transition-based parsers. Yet, the number of steps for building any (including
-non-projective) parse tree is linear in the length of the sentence, just as in
-other transition-based parsers, yielding an efficient decoding algorithm with
-$O(n^2)$ time complexity. We evaluate our model on 29 treebanks spanning 20
-languages and different dependency annotation schemas, and achieve
-state-of-the-art performance on 21 of them.
-"
-7420,1805.01089,"Shuming Ma, Xu Sun, Junyang Lin, Xuancheng Ren","A Hierarchical End-to-End Model for Jointly Improving Text Summarization
- and Sentiment Classification",cs.CL cs.LG," Text summarization and sentiment classification both aim to capture the main
-ideas of the text but at different levels. Text summarization describes the
-text within a few sentences, while sentiment classification can be regarded as
-a special type of summarization which ""summarizes"" the text in an even more
-abstract fashion, i.e., into a sentiment class. Based on this idea, we propose
-a hierarchical end-to-end model for joint learning of text summarization and
-sentiment classification, where the sentiment classification label is treated
-as a further ""summarization"" of the text summarization output. Hence, the
-sentiment classification layer is put upon the text summarization layer, and a
-hierarchical structure is derived. Experimental results on Amazon online
-review datasets show that our model achieves better performance than the
-strong baseline systems on both abstractive summarization and sentiment
-classification.
-"
-7421,1805.01112,"Nishant Nikhil, Muktabh Mayank Srivastava","Binarizer at SemEval-2018 Task 3: Parsing dependency and deep learning
- for irony detection",cs.CL," In this paper, we describe the system submitted for SemEval 2018 Task 3
-(Irony detection in English tweets), Subtask A, by the team Binarizer. Irony
-detection is a key task for many natural language processing applications. Our
-method treats ironic tweets as consisting of smaller parts containing different
-emotions. We break down tweets into separate phrases using a dependency parser.
-We then embed those phrases using an LSTM-based neural network model which is
-pre-trained to predict emoticons for tweets. Finally, we train a
-fully-connected network to achieve classification.
-"
-7422,1805.01156,Ville Vestman and Tomi Kinnunen,"Supervector Compression Strategies to Speed up I-Vector System
- Development",eess.AS cs.CL cs.LG cs.SD stat.ML," The front-end factor analysis (FEFA), an extension of probabilistic principal
-component analysis (PPCA) tailored to be used with Gaussian mixture models
-(GMMs), is currently the prevalent approach to extracting compact
-utterance-level features (i-vectors) for automatic speaker verification (ASV)
-systems. Little research has been conducted comparing FEFA to the conventional
-PPCA applied to maximum a posteriori (MAP) adapted GMM supervectors. We study
-several alternative methods, including PPCA, factor analysis (FA), and two
-supervised approaches, supervised PPCA (SPPCA) and the recently proposed
-probabilistic partial least squares (PPLS), to compress MAP-adapted GMM
-supervectors. The resulting i-vectors are used in ASV tasks with a
-probabilistic linear discriminant analysis (PLDA) back-end. We experiment on
-two different datasets, on the telephone condition of NIST SRE 2010 and on the
-recent VoxCeleb corpus collected from YouTube videos containing celebrity
-interviews recorded in
-various acoustical and technical conditions. The results suggest that, in terms
-of ASV accuracy, the supervector compression approaches are on a par with FEFA.
-The supervised approaches did not result in improved performance. In comparison
-to FEFA, we obtained more than hundred-fold (100x) speedups in the total
-variability model (TVM) training using the PPCA and FA supervector compression
-approaches.
-"
-7423,1805.01216,"Dinesh Raghu, Nikhil Gupta and Mausam",Disentangling Language and Knowledge in Task-Oriented Dialogs,cs.LG cs.CL stat.ML," The Knowledge Base (KB) used for real-world applications, such as booking a
-movie or restaurant reservation, keeps changing over time. End-to-end neural
-networks trained for these task-oriented dialogs are expected to be immune to
-any changes in the KB. However, existing approaches break down when asked to
-handle such changes. We propose an encoder-decoder architecture (BoSsNet) with
-a novel Bag-of-Sequences (BoSs) memory, which facilitates the disentangled
-learning of the response's language model and its knowledge incorporation.
-Consequently, the KB can be modified with new knowledge without a drop in
-interpretability. We find that BoSsNet outperforms state-of-the-art models,
-with considerable improvements (> 10\%) on bAbI OOV test sets and other
-human-human datasets. We also systematically modify existing datasets to
-measure disentanglement and show BoSsNet to be robust to KB modifications.
-"
-7424,1805.01252,Carolin Lawrence and Stefan Riezler,"Improving a Neural Semantic Parser by Counterfactual Learning from Human
- Bandit Feedback",cs.CL cs.LG stat.ML," Counterfactual learning from human bandit feedback describes a scenario where
-user feedback on the quality of outputs of a historic system is logged and used
-to improve a target system. We show how to apply this learning framework to
-neural semantic parsing. From a machine learning perspective, the key challenge
-lies in a proper reweighting of the estimator so as to avoid known degeneracies
-in counterfactual learning, while still being applicable to stochastic gradient
-optimization. To conduct experiments with human users, we devise an easy-to-use
-interface to collect human feedback on semantic parses. Our work is the first
-to show that semantic parsers can be improved significantly by counterfactual
-learning from logged human feedback data.
-"
-7425,1805.01334,Chenyan Xiong and Zhengzhong Liu and Jamie Callan and Tie-Yan Liu,"Towards Better Text Understanding and Retrieval through Kernel Entity
- Salience Modeling",cs.IR cs.CL," This paper presents a Kernel Entity Salience Model (KESM) that improves text
-understanding and retrieval by better estimating entity salience (importance)
-in documents. KESM represents entities by knowledge-enriched distributed
-representations, models the interactions between entities and words by kernels,
-and combines the kernel scores to estimate entity salience. The whole model is
-learned end-to-end using entity salience labels. The salience model also
-improves ad hoc search accuracy, providing effective ranking features by
-modeling the salience of query entities in candidate documents. Our experiments
-on two entity salience corpora and two TREC ad hoc search datasets demonstrate
-the effectiveness of KESM over frequency-based and feature-based methods. We
-also provide examples showing how KESM conveys its text understanding ability
-learned from entity salience to search.
-" -7426,1805.01369,"Grigoriy Sterling, Andrey Belyaev, Maxim Ryabov",Framewise approach in multimodal emotion recognition in OMG challenge,cs.AI cs.CL cs.CV," In this report we described our approach achieves $53\%$ of unweighted -accuracy over $7$ emotions and $0.05$ and $0.09$ mean squared errors for -arousal and valence in OMG emotion recognition challenge. Our results were -obtained with ensemble of single modality models trained on voice and face data -from video separately. We consider each stream as a sequence of frames. Next we -estimated features from frames and handle it with recurrent neural network. As -audio frame we mean short $0.4$ second spectrogram interval. For features -estimation for face pictures we used own ResNet neural network pretrained on -AffectNet database. Each short spectrogram was considered as a picture and -processed by convolutional network too. As a base audio model we used ResNet -pretrained in speaker recognition task. Predictions from both modalities were -fused on decision level and improve single-channel approaches by a few percent -" -7427,1805.01416,"Pedro M. Ferreira, Diogo Pernes, Kelwin Fernandes, Ana Rebelo and - Jaime S. Cardoso",Dimensional emotion recognition using visual and textual cues,cs.AI cs.CL cs.CV," This paper addresses the problem of automatic emotion recognition in the -scope of the One-Minute Gradual-Emotional Behavior challenge (OMG-Emotion -challenge). The underlying objective of the challenge is the automatic -estimation of emotion expressions in the two-dimensional emotion representation -space (i.e., arousal and valence). The adopted methodology is a weighted -ensemble of several models from both video and text modalities. For video-based -recognition, two different types of visual cues (i.e., face and facial -landmarks) were considered to feed a multi-input deep neural network. Regarding -the text modality, a sequential model based on a simple recurrent architecture -was implemented. In addition, we also introduce a model based on high-level -features in order to embed domain knowledge in the learning process. -Experimental results on the OMG-Emotion validation set demonstrate the -effectiveness of the implemented ensemble model as it clearly outperforms the -current baseline methods. -" -7428,1805.01445,"Noah Weber, Leena Shekhar, Niranjan Balasubramanian","The Fine Line between Linguistic Generalization and Failure in - Seq2Seq-Attention Models",cs.CL," Seq2Seq based neural architectures have become the go-to architecture to -apply to sequence to sequence language tasks. Despite their excellent -performance on these tasks, recent work has noted that these models usually do -not fully capture the linguistic structure required to generalize beyond the -dense sections of the data distribution \cite{ettinger2017towards}, and as -such, are likely to fail on samples from the tail end of the distribution (such -as inputs that are noisy \citep{belkinovnmtbreak} or of different lengths -\citep{bentivoglinmtlength}). In this paper, we look at a model's ability to -generalize on a simple symbol rewriting task with a clearly defined structure. -We find that the model's ability to generalize this structure beyond the -training distribution depends greatly on the chosen random seed, even when -performance on the standard test set remains the same. This suggests that a -model's ability to capture generalizable structure is highly sensitive. -Moreover, this sensitivity may not be apparent when evaluating it on standard -test sets. 
-" -7429,1805.01460,"Denner S. Vieira, Sergio Picoli, and Renio S. Mendes",Robustness of sentence length measures in written texts,cs.CL physics.soc-ph," Hidden structural patterns in written texts have been subject of considerable -research in the last decades. In particular, mapping a text into a time series -of sentence lengths is a natural way to investigate text structure. Typically, -sentence length has been quantified by using measures based on the number of -words and the number of characters, but other variations are possible. To -quantify the robustness of different sentence length measures, we analyzed a -database containing about five hundred books in English. For each book, we -extracted six distinct measures of sentence length, including number of words -and number of characters (taking into account lemmatization and stop words -removal). We compared these six measures for each book by using i) Pearson's -coefficient to investigate linear correlations; ii) Kolmogorov--Smirnov test to -compare distributions; and iii) detrended fluctuation analysis (DFA) to -quantify auto-correlations. We have found that all six measures exhibit very -similar behavior, suggesting that sentence length is a robust measure related -to text structure. -" -7430,1805.01542,"Anuj Goyal, Angeliki Metallinou, Spyros Matsoukas","Fast and Scalable Expansion of Natural Language Understanding - Functionality for Intelligent Agents",cs.CL," Fast expansion of natural language functionality of intelligent virtual -agents is critical for achieving engaging and informative interactions. -However, developing accurate models for new natural language domains is a time -and data intensive process. We propose efficient deep neural network -architectures that maximally re-use available resources through transfer -learning. Our methods are applied for expanding the understanding capabilities -of a popular commercial agent and are evaluated on hundreds of new domains, -designed by internal or external developers. We demonstrate that our proposed -methods significantly increase accuracy in low resource settings and enable -rapid development of accurate models with less data. -" -7431,1805.01553,"Tsz Kin Lam, Julia Kreutzer, Stefan Riezler","A Reinforcement Learning Approach to Interactive-Predictive Neural - Machine Translation",cs.CL stat.ML," We present an approach to interactive-predictive neural machine translation -that attempts to reduce human effort from three directions: Firstly, instead of -requiring humans to select, correct, or delete segments, we employ the idea of -learning from human reinforcements in form of judgments on the quality of -partial translations. Secondly, human effort is further reduced by using the -entropy of word predictions as uncertainty criterion to trigger feedback -requests. Lastly, online updates of the model parameters after every -interaction allow the model to adapt quickly. We show in simulation experiments -that reward signals on partial translations significantly improve character -F-score and BLEU compared to feedback on full translations only, while human -effort can be reduced to an average number of $5$ feedback requests for every -input. -" -7432,1805.01555,Puyang Xu and Qi Hu,"An End-to-end Approach for Handling Unknown Slot Values in Dialogue - State Tracking",cs.CL," We highlight a practical yet rarely discussed problem in dialogue state -tracking (DST), namely handling unknown slot values. 
Previous approaches
-generally assume predefined candidate lists and thus are not designed to output
-unknown values, especially when the spoken language understanding (SLU) module
-is absent as in many end-to-end (E2E) systems. We describe in this paper an E2E
-architecture based on the pointer network (PtrNet) that can effectively extract
-unknown slot values while still obtaining state-of-the-art accuracy on the
-standard DSTC2 benchmark. We also provide extensive empirical evidence to show
-that tracking unknown values can be challenging and our approach can bring
-significant improvement with the help of an effective feature dropout
-technique.
-"
-7433,1805.01565,Lifeng Han and Shaohui Kuang,"Incorporating Chinese Radicals Into Neural Machine Translation: Deeper
- Than Character Level",cs.CL," In neural machine translation (NMT), researchers face the challenge of
-translating unseen (or out-of-vocabulary, OOV) words. To solve this, some
-researchers propose splitting words in Western languages such as English and
-German into sub-words or compounds. In this paper, we try to address this OOV
-issue and improve NMT adequacy for a harder language, Chinese, whose characters
-are even more sophisticated in composition. We integrate Chinese radicals into
-the NMT model with different settings to address the unseen-word challenge in
-Chinese-to-English translation. On the other hand, this can also be considered
-the semantic part of the MT system, since Chinese radicals usually carry the
-essential meaning of the words they form. Meaningful radicals and new
-characters can be integrated into the NMT systems with our models. We use an
-attention-based NMT system as a strong baseline system. The experiments on the
-standard Chinese-to-English NIST translation shared task data from 2006 and
-2008 show that our designed models outperform the baseline model on a wide
-range of state-of-the-art evaluation metrics, including LEPOR, BEER, and
-CharacTER, in addition to BLEU and NIST scores, especially on adequacy-level
-translation. We also draw some interesting findings from the results of our
-various experimental settings about the performance of words and characters in
-Chinese NMT, which differs from other languages. For instance, fully
-character-level NMT may perform well, or even at the state of the art, in some
-other languages, as researchers have demonstrated recently; in Chinese NMT,
-however, word boundary knowledge is important for model learning.
-"
-7434,1805.01589,"Lucas S. Oliveira, Pedro O. S. Vaz de Melo, Marcelo S. Amaral, Jos\'e
- Ant\^onio. G. Pinho","When Politicians Talk About Politics: Identifying Political Tweets of
- Brazilian Congressmen",cs.SI cs.CL," Since June 2013, when Brazil faced the largest and most significant mass
-protests in a generation, a political crisis has been under way. In the midst
-of this crisis, Brazilian politicians use social media to communicate with the
-electorate in order to retain or grow their political capital. The problem is
-that many controversial topics are in play, and deputies may prefer to avoid
-such themes in their messages. To characterize this behavior, we propose a
-method to accurately identify political and non-political tweets, independently
-of the deputy who posted them and of the time they were posted. Moreover, we
-collected tweets of all congressmen who were active on Twitter and worked in
-the Brazilian parliament from October 2013 to October 2017.
To
-evaluate our method, we used word clouds and a topic model to identify the main
-political and non-political latent topics in parliamentarian tweets. Both
-results indicate that our proposal is able to accurately distinguish political
-from non-political tweets. Moreover, our analyses revealed a striking fact:
-more than half of the messages posted by Brazilian deputies are non-political.
-"
-7435,1805.01646,"Roland Roller, Madeleine Kittner, Dirk Weissenborn, Ulf Leser",Cross-lingual Candidate Search for Biomedical Concept Normalization,cs.CL," Biomedical concept normalization links concept mentions in texts to a
-semantically equivalent concept in a biomedical knowledge base. This task is
-challenging as concepts can have different expressions in natural languages,
-e.g. paraphrases, which are not necessarily all present in the knowledge base.
-Concept normalization of non-English biomedical text is even more challenging
-as non-English resources tend to be much smaller and contain fewer synonyms. To
-overcome the limitations of non-English terminologies we propose a
-cross-lingual candidate search for concept normalization using a
-character-based neural translation model trained on a multilingual biomedical
-terminology. Our model is trained with Spanish, French, Dutch and German
-versions of UMLS. The evaluation of our model is carried out on the French
-Quaero corpus, showing that it outperforms most teams of CLEF eHealth 2015 and
-2016. Additionally, we compare performance to commercial translators on
-Spanish, French, Dutch and German versions of Mantra. Our model performs
-similarly well, but is free of charge and can be run locally. This is
-particularly important for clinical NLP applications as medical documents are
-subject to strict privacy restrictions.
-"
-7436,1805.01676,"Christian Hadiwinoto, Hwee Tou Ng","Upping the Ante: Towards a Better Benchmark for Chinese-to-English
- Machine Translation",cs.CL," There are many machine translation (MT) papers that propose novel approaches
-and show improvements over their self-defined baselines. The experimental
-setting in each paper often differs from one another. As such, it is hard to
-determine if a proposed approach is really useful and advances the state of the
-art. Chinese-to-English translation is a common translation direction in MT
-papers, although there is no single widely accepted experimental setting for
-Chinese-to-English MT. Our goal in this paper is to propose a benchmark
-evaluation setup for Chinese-to-English machine translation, such that the
-effectiveness of a new proposed MT approach can be directly compared to
-previous approaches. Towards this end, we also built a highly competitive
-state-of-the-art MT system trained on a large-scale training set. Our system
-outperforms reported results on NIST OpenMT test sets in almost all papers
-published in major conferences and journals in computational linguistics and
-artificial intelligence in the past 11 years. We argue that a standardized
-benchmark on data and performance is important for meaningful comparison.
-"
-7437,1805.01817,Paul Michel and Graham Neubig,Extreme Adaptation for Personalized Neural Machine Translation,cs.CL," Every person speaks or writes their own flavor of their native language,
-influenced by a number of factors: the content they tend to talk about, their
-gender, their social status, or their geographical origin.
- When attempting to perform Machine Translation (MT), these variations have a
-significant effect on how the system should perform translation, but this is
-not captured well by standard one-size-fits-all models.
- In this paper, we propose a simple and parameter-efficient adaptation
-technique that only requires adapting the bias of the output softmax to each
-particular user of the MT system, either directly or through a factored
-approximation.
- Experiments on TED talks in three languages demonstrate improvements in
-translation accuracy, and better reflection of speaker traits in the target
-text.
-"
-7438,1805.01923,"Enrico Santus, Hongmin Wang, Emmanuele Chersoni and Yue Zhang",A Rank-Based Similarity Metric for Word Embeddings,cs.CL," Word embeddings have recently established themselves as a standard for
-representing word meaning in NLP. Semantic similarity between word pairs has
-become the most common evaluation benchmark for these representations, with
-vector cosine being typically used as the only similarity metric. In this
-paper, we report experiments with a rank-based metric for word embeddings,
-which performs comparably to vector cosine in similarity estimation and
-outperforms it in the recently-introduced and challenging task of outlier
-detection, thus suggesting that rank-based measures can improve clustering
-quality.
-"
-7439,1805.01984,"Amlaan Bhoi, Sandeep Joshi",Various Approaches to Aspect-based Sentiment Analysis,cs.CL," The problem of aspect-based sentiment analysis deals with classifying
-sentiments (negative, neutral, positive) for a given aspect in a sentence. A
-traditional sentiment classification task involves treating the entire sentence
-as a text document and classifying sentiments based on all the words. Suppose
-we have a sentence such as ""the acceleration of this car is fast, but
-the reliability is horrible"". This can be a difficult sentence because it has
-two aspects with conflicting sentiments about the same entity. Considering
-machine learning techniques (or deep learning), how do we encode the
-information that we are interested in one aspect and its sentiment but not the
-other? Let us explore various pre-processing steps, features, and methods used
-to facilitate solving this task.
-"
-7440,1805.02023,Yue Zhang and Jie Yang,Chinese NER Using Lattice LSTM,cs.CL," We investigate a lattice-structured LSTM model for Chinese NER, which encodes
-a sequence of input characters as well as all potential words that match a
-lexicon. Compared with character-based methods, our model explicitly leverages
-word and word sequence information. Compared with word-based methods, lattice
-LSTM does not suffer from segmentation errors. Gated recurrent cells allow our
-model to choose the most relevant characters and words from a sentence for
-better NER results. Experiments on various datasets show that lattice LSTM
-outperforms both word-based and character-based LSTM baselines, achieving the
-best results.
-"
-7441,1805.02036,Duygu Ataman and Marcello Federico,"Compositional Representation of Morphologically-Rich Input for Neural
- Machine Translation",cs.CL," Neural machine translation (NMT) models are typically trained with fixed-size
-input and output vocabularies, which creates an important bottleneck on their
-accuracy and generalization capability. As a solution, various studies proposed
-segmenting words into sub-word units and performing translation at the
-sub-lexical level.
However, statistical word segmentation methods have recently
-been shown to be prone to morphological errors, which can lead to inaccurate
-translations. In this paper, we propose to overcome this problem by replacing
-the source-language embedding layer of NMT with a bi-directional recurrent
-neural network that generates compositional representations of the input at any
-desired level of granularity. We test our approach in a low-resource setting
-with five languages from different morphological typologies, and under
-different composition assumptions. By training NMT to compose word
-representations from character n-grams, our approach consistently outperforms
-(by 1.71 to 2.48 BLEU points) NMT learning embeddings of statistically
-generated sub-word units.
-"
-7442,1805.02094,"Robert Lim, Kenneth Heafield, Hieu Hoang, Mark Briers and Allen Malony","Exploring Hyper-Parameter Optimization for Neural Machine Translation on
- GPU Architectures",cs.CL," Neural machine translation (NMT) has been accelerated by deep learning neural
-networks over statistics-based approaches, due to the plethora and
-programmability of commodity heterogeneous computing architectures such as
-FPGAs and GPUs and the massive amount of training corpora generated from news
-outlets, government agencies and social media. Training a learning classifier
-for neural networks entails tuning hyper-parameters that would yield the best
-performance. Unfortunately, the number of parameters for machine translation
-includes discrete categories as well as continuous options, which makes for a
-combinatorially explosive problem. This research explores optimizing
-hyper-parameters when training deep learning neural networks for machine
-translation. Specifically, our work investigates training a language model with
-Marian NMT. Results compare NMT under various hyper-parameter settings across a
-variety of modern GPU architecture generations in single-node and multi-node
-settings, revealing insights on which hyper-parameters matter most in terms of
-performance, such as words processed per second, convergence rates, and
-translation accuracy, and provide insights on how best to achieve
-high-performing NMT systems.
-"
-7443,1805.02096,Dmitriy Dligach and Timothy Miller,Learning Patient Representations from Text,cs.CL," Mining electronic health records for patients who satisfy a set of predefined
-criteria is known in medical informatics as phenotyping. Phenotyping has
-numerous applications such as outcome prediction, clinical trial recruitment,
-and retrospective studies. Supervised machine learning for phenotyping
-typically relies on sparse patient representations such as bag-of-words. We
-consider an alternative that involves learning patient representations. We
-develop a neural network model for learning patient representations and show
-that the learned representations are general enough to obtain state-of-the-art
-performance on a standard comorbidity detection task.
-"
-7444,1805.02203,"Rem Hida, Naoya Takeishi, Takehisa Yairi, Koichi Hori","Dynamic and Static Topic Model for Analyzing Time-Series Document
- Collections",cs.CL," To extract meaningful topics from texts, their structure should be properly
-taken into account. In this paper, we aim to analyze structured time-series
-documents such as a collection of news articles and a series of scientific
-papers, wherein topics evolve along time depending on multiple topics in the
-past and are also related to each other at each time.
To this end, we propose a
-dynamic and static topic model, which simultaneously considers the dynamic
-structures of the temporal topic evolution and the static structures of the
-topic hierarchy at each time. We show the results of experiments on collections
-of scientific papers, in which the proposed method outperformed conventional
-models. Moreover, we show an example of extracted topic structures, which we
-found helpful for analyzing research activities.
-"
-7445,1805.02214,"Marek Rei, Anders S{\o}gaard","Zero-shot Sequence Labeling: Transferring Knowledge from Sentences to
- Tokens",cs.CL cs.LG cs.NE," Can attention- or gradient-based visualization techniques be used to infer
-token-level labels for binary sequence tagging problems, using networks trained
-only on sentence-level labels? We construct a neural network architecture based
-on soft attention, train it as a binary sentence classifier and evaluate
-against token-level annotation on four different datasets. Inferring token
-labels from a network provides a method for quantitatively evaluating what the
-model is learning, along with generating useful feedback in assistance systems.
-Our results indicate that attention-based methods are able to predict
-token-level labels more accurately, compared to gradient-based methods,
-sometimes even rivaling the supervised oracle network.
-"
-7446,1805.02220,"Yizhong Wang, Kai Liu, Jing Liu, Wei He, Yajuan Lyu, Hua Wu, Sujian Li
- and Haifeng Wang","Multi-Passage Machine Reading Comprehension with Cross-Passage Answer
- Verification",cs.CL," Machine reading comprehension (MRC) on real web data usually requires the
-machine to answer a question by analyzing multiple passages retrieved by a
-search engine. Compared with MRC on a single passage, multi-passage MRC is more
-challenging, since we are likely to get multiple confusing answer candidates
-from different passages. To address this problem, we propose an end-to-end
-neural model that enables those answer candidates from different passages to
-verify each other based on their content representations. Specifically, we
-jointly train three modules that can predict the final answer based on three
-factors: the answer boundary, the answer content and the cross-passage answer
-verification. The experimental results show that our method outperforms the
-baseline by a large margin and achieves the state-of-the-art performance on the
-English MS-MARCO dataset and the Chinese DuReader dataset, both of which are
-designed for MRC in real-world settings.
-"
-7447,1805.02258,Andrey Kutuzov,Russian word sense induction by clustering averaged word embeddings,cs.CL," The paper reports our participation in the shared task on word sense
-induction and disambiguation for the Russian language (RUSSE-2018). Our team
-was ranked 2nd for the wiki-wiki dataset (containing mostly homonyms) and 5th
-for the bts-rnc and active-dict datasets (containing mostly polysemous words)
-among all 19 participants.
- The method we employed was extremely naive. It involved representing the
-contexts of ambiguous words as averaged word embedding vectors, using
-off-the-shelf pre-trained distributional models. Then, these vector
-representations were clustered with mainstream clustering techniques, thus
-producing groups corresponding to the senses of the ambiguous words.
As a side result, we show that word
-embedding models trained on small but balanced corpora can be superior to those
-trained on large but noisy data - not only in intrinsic evaluation, but also in
-downstream tasks like word sense induction.
-"
-7448,1805.02262,"Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles
- Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu
- Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi,
- Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm,
- Zheng Yuan, Madeleine van Zuylen, Oren Etzioni",Construction of the Literature Graph in Semantic Scholar,cs.CL," We describe a deployed scalable system for organizing published scientific
-literature into a heterogeneous graph to facilitate algorithmic manipulation
-and discovery. The resulting literature graph consists of more than 280M nodes,
-representing papers, authors, entities and various interactions between them
-(e.g., authorships, citations, entity mentions). We reduce literature graph
-construction into familiar NLP tasks (e.g., entity extraction and linking),
-point out research challenges due to differences from standard formulations of
-these tasks, and report empirical results for each task. The methods described
-in this paper are used to enable semantic features in www.semanticscholar.org
-"
-7449,1805.02266,"Max Glockner, Vered Shwartz, and Yoav Goldberg","Breaking NLI Systems with Sentences that Require Simple Lexical
- Inferences",cs.CL," We create a new NLI test set that shows the deficiency of state-of-the-art
-models in inferences that require lexical and world knowledge. The new examples
-are simpler than the SNLI test set, containing sentences that differ by at most
-one word from sentences in the training set. Yet, the performance on the new
-test set is substantially worse across systems trained on SNLI, demonstrating
-that these systems are limited in their generalization ability, failing to
-capture many simple inferences.
-"
-7450,1805.02275,"Tasnim Mohiuddin, Shafiq Joty and Dat Tien Nguyen","Coherence Modeling of Asynchronous Conversations: A Neural Entity Grid
- Approach",cs.CL," We propose a novel coherence model for written asynchronous conversations
-(e.g., forums, emails), and show its applications in coherence assessment and
-thread reconstruction tasks. We conduct our research in two steps. First, we
-propose improvements to the recently proposed neural entity grid model by
-lexicalizing its entity transitions. Then, we extend the model to asynchronous
-conversations by incorporating the underlying conversational structure in the
-entity grid representation and feature computation. Our model achieves
-state-of-the-art results on standard coherence assessment tasks in monologues
-and conversations, outperforming existing models. We also demonstrate its
-effectiveness in reconstructing thread structures.
-"
-7451,1805.02282,Sander Tars and Mark Fishel,Multi-Domain Neural Machine Translation,cs.CL," We present an approach to neural machine translation (NMT) that supports
-multiple domains in a single model and allows switching between the domains
-when translating. The core idea is to treat text domains as distinct languages
-and use multilingual NMT methods to create multi-domain translation systems; we
-show that this approach results in significant translation quality gains over
-fine-tuning.
We also explore whether knowledge of pre-specified text
-domains is necessary; it turns out that it is, but also that quite high
-translation quality can be reached even when the domain is not known.
-"
-7452,1805.02333,"Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou","Learning Matching Models with Weak Supervision for Response Selection in
- Retrieval-based Chatbots",cs.CL," We propose a method that can leverage unlabeled data to learn a matching
-model for response selection in retrieval-based chatbots. The method employs a
-sequence-to-sequence (Seq2Seq) model as a weak annotator to judge the
-matching degree of unlabeled pairs, and then performs learning with both
-the weak signals and the unlabeled data. Experimental results on two public
-data sets indicate that matching models get significant improvements when they
-are learned with the proposed method.
-"
-7453,1805.02356,"Xin Qian, Ziyi Zhong, Jieli Zhou",Multimodal Machine Translation with Reinforcement Learning,cs.CL cs.AI cs.IR cs.MA cs.MM," Multimodal machine translation is one of the applications that integrates
-computer vision and language processing. It is a unique task given that in the
-field of machine translation, many state-of-the-art algorithms still only
-employ textual information. In this work, we explore the effectiveness of
-reinforcement learning in multimodal machine translation. We present a novel
-algorithm based on the Advantage Actor-Critic (A2C) algorithm that specifically
-caters to the multimodal machine translation task of the EMNLP 2018 Third
-Conference on Machine Translation (WMT18). We evaluate our proposed algorithm
-on the Multi30K multilingual English-German image description dataset and the
-Flickr30K image entity dataset. Our model takes two channels of inputs, image
-and text, uses translation evaluation metrics as training rewards, and achieves
-better results than supervised learning MLE baseline models. Furthermore, we
-discuss the prospects and limitations of using reinforcement learning for
-machine translation. Our experimental results suggest a promising reinforcement
-learning solution to the general task of multimodal sequence to sequence
-learning.
-"
-7454,1805.02400,"Mika Juuti, Bo Sun, Tatsuya Mori, and N. Asokan",Stay On-Topic: Generating Context-specific Fake Restaurant Reviews,cs.CR cs.CL," Automatically generated fake restaurant reviews are a threat to online review
-systems. Recent research has shown that users have difficulties in detecting
-machine-generated fake reviews hiding among real restaurant reviews. The method
-used in this work (char-LSTM) has one drawback: it has difficulties staying in
-context, i.e., when it generates a review for a specific target entity, the
-resulting review may contain phrases that are unrelated to the target, thus
-increasing its detectability. In this work, we present and evaluate a more
-sophisticated technique based on neural machine translation (NMT) with which we
-can generate reviews that stay on-topic. We test multiple variants of our
-technique using native English speakers on Amazon Mechanical Turk. We
-demonstrate that reviews generated by the best variant have almost optimal
-undetectability (class-averaged F-score 47%). We conduct a user study with
-skeptical users and show that our method evades detection more frequently
-compared to the state-of-the-art (average evasion 3.2/4 vs 1.5/4) with
-statistical significance, at level {\alpha} = 1% (Section 4.3).
We develop very
-effective detection tools and reach an average F-score of 97% in classifying
-these. Although fake reviews are very effective in fooling people, effective
-automatic detection is still feasible.
-"
-7455,1805.02408,"Boyang Ding, Quan Wang, Bin Wang, Li Guo",Improving Knowledge Graph Embedding Using Simple Constraints,cs.AI cs.CL," Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of
-current research. Early works performed this task via simple models developed
-over KG triples. Recent attempts focused on either designing more complicated
-triple scoring models, or incorporating extra information beyond triples. This
-paper, by contrast, investigates the potential of using very simple constraints
-to improve KG embedding. We examine non-negativity constraints on entity
-representations and approximate entailment constraints on relation
-representations. The former help to learn compact and interpretable
-representations for entities. The latter further encode regularities of logical
-entailment between relations into their distributed representations. These
-constraints impose prior beliefs upon the structure of the embedding space,
-without negative impacts on efficiency or scalability. Evaluation on WordNet,
-Freebase, and DBpedia shows that our approach is simple yet surprisingly
-effective, significantly and consistently outperforming competitive baselines.
-The constraints imposed indeed improve model interpretability, leading to a
-substantially increased structuring of the embedding space. Code and data are
-available at https://github.com/iieir-km/ComplEx-NNE_AER.
-"
-7456,1805.02442,Vered Shwartz and Ido Dagan,Paraphrase to Explicate: Revealing Implicit Noun-Compound Relations,cs.CL," Revealing the implicit semantic relation between the constituents of a
-noun-compound is important for many NLP applications. It has been addressed in
-the literature either as a classification task to a set of pre-defined
-relations or by producing free text paraphrases explicating the relations. Most
-existing paraphrasing methods lack the ability to generalize, and have a hard
-time interpreting infrequent or new noun-compounds. We propose a neural model
-that generalizes better by representing paraphrases in a continuous space,
-generalizing for both unseen noun-compounds and rare paraphrases. Our model
-helps improve performance on both the noun-compound paraphrasing and
-classification tasks.
-"
-7457,1805.02473,"Linfeng Song, Yue Zhang, Zhiguo Wang and Daniel Gildea",A Graph-to-Sequence Model for AMR-to-Text Generation,cs.CL," The problem of AMR-to-text generation is to recover a text representing the
-same meaning as an input AMR graph. The current state-of-the-art method uses a
-sequence-to-sequence model, leveraging LSTM for encoding a linearized AMR
-structure. Although able to model non-local semantic information, a
-sequence LSTM can lose information from the AMR graph structure, and thus faces
-challenges with large graphs, which result in long sequences. We introduce a
-neural graph-to-sequence model, using a novel LSTM structure for directly
-encoding graph-level semantics. On a standard benchmark, our model shows
-superior results to existing methods in the literature.
-"
-7458,1805.02474,"Yue Zhang, Qi Liu and Linfeng Song",Sentence-State LSTM for Text Representation,cs.CL cs.LG stat.ML," Bi-directional LSTMs are a powerful tool for text representation.
On the
-other hand, they have been shown to suffer from various limitations due to
-their sequential nature. We investigate an alternative LSTM structure for
-encoding text, which consists of a parallel state for each word. Recurrent
-steps are used to perform local and global information exchange between words
-simultaneously, rather than incremental reading of a sequence of words. Results
-on various classification and sequence labelling benchmarks show that the
-proposed model has strong representation power, giving highly competitive
-performances compared to stacked BiLSTM models with similar numbers of
-parameters.
-"
-7459,1805.02823,"Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin",Hierarchical Structured Model for Fine-to-coarse Manifesto Text Analysis,cs.CL," Election manifestos document the intentions, motives, and views of political
-parties. They are often used for analysing a party's fine-grained position on a
-particular issue, as well as for coarse-grained positioning of a party on the
-left--right spectrum. In this paper we propose a two-stage model for
-automatically performing both levels of analysis over manifestos. In the first
-step we employ a hierarchical multi-task structured deep model to predict fine-
-and coarse-grained positions, and in the second step we perform post-hoc
-calibration of coarse-grained positions using probabilistic soft logic. We
-empirically show that the proposed model outperforms state-of-the-art
-approaches at both granularities using manifestos from twelve countries,
-written in ten different languages.
-"
-7460,1805.02856,"Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su",Reasoning with Sarcasm by Reading In-between,cs.CL cs.AI cs.IR," Sarcasm is a sophisticated speech act which commonly manifests on social
-communities such as Twitter and Reddit. The prevalence of sarcasm on the social
-web is highly disruptive to opinion mining systems due not only to its tendency
-to flip polarity but also to its use of figurative language. Sarcasm commonly
-manifests with a contrastive theme either between positive-negative sentiments
-or between literal-figurative scenarios. In this paper, we revisit the notion
-of modeling contrast in order to reason with sarcasm. More specifically, we
-propose an attention-based neural model that looks in-between instead of
-across, enabling it to explicitly model contrast and incongruity. We conduct
-extensive experiments on six benchmark datasets from Twitter, Reddit and the
-Internet Argument Corpus. Our proposed model not only achieves state-of-the-art
-performance on all datasets but also enjoys improved interpretability.
-"
-7461,1805.02867,"Maxim Milakov (NVIDIA), Natalia Gimelshein (NVIDIA)",Online normalizer calculation for softmax,cs.PF cs.AI cs.CL," The Softmax function is ubiquitous in machine learning, and multiple previous
-works have suggested faster alternatives for it. In this paper we propose a way
-to compute classical Softmax with fewer memory accesses and hypothesize that
-this reduction in memory accesses should improve Softmax performance on actual
-hardware. The benchmarks confirm this hypothesis: Softmax accelerates by up to
-1.3x and Softmax+TopK combined and fused by up to 5x.
-"
-7462,1805.02914,"Xiaowei Tong, Zhenxin Fu, Mingyue Shang, Dongyan Zhao, Rui Yan","One ""Ruler"" for All Languages: Multi-Lingual Dialogue Evaluation with
- Adversarial Multi-Task Learning",cs.CL," Automatically evaluating the performance of open-domain dialogue systems is a
-challenging problem.
Recent work on neural network-based metrics has shown
-promising opportunities for automatic dialogue evaluation. However, existing
-methods mainly focus on monolingual evaluation, in which the trained metric is
-not flexible enough to transfer across different languages. To address this
-issue, we propose an adversarial multi-task neural metric (ADVMT) for
-multi-lingual dialogue evaluation, with shared feature extraction across
-languages. We evaluate the proposed model in two different languages.
-Experiments show that the adversarial multi-task neural metric achieves a high
-correlation with human annotation and yields better performance than
-monolingual metrics and various existing metrics.
-"
-7463,1805.02917,"Motoki Sato, Jun Suzuki, Hiroyuki Shindo, Yuji Matsumoto",Interpretable Adversarial Perturbation in Input Embedding Space for Text,cs.LG cs.CL stat.ML," Following great success in the image processing field, the idea of
-adversarial training has been applied to tasks in the natural language
-processing (NLP) field. One promising approach directly applies adversarial
-training developed in the image processing field to the input word embedding
-space instead of the discrete input space of texts. However, this approach
-abandons the interpretability of generating adversarial texts in exchange for
-significantly improving the performance of NLP tasks. This paper restores
-interpretability to such methods by restricting the directions of perturbations
-toward the existing words in the input embedding space. As a result, we can
-straightforwardly reconstruct each input with perturbations as an actual text
-by treating the perturbations as replacements of words in the sentence, while
-maintaining or even improving the task performance.
-"
-7464,1805.02924,"Kwanchiva Thangthai, Helen L Bear and Richard Harvey",Comparing phonemes and visemes with DNN-based lipreading,cs.CV cs.CL cs.SD eess.AS eess.IV," There is debate over whether phoneme or viseme units are more effective for a
-lipreading system. Some studies use phoneme units even though phonemes describe
-unique short sounds; other studies have tried to improve lipreading accuracy by
-focusing on visemes, with varying results. We compare the performance of a
-lipreading system by modeling visual speech using either 13 viseme or 38
-phoneme units. We report the accuracy of our system at both word and unit
-levels. The evaluation task is large vocabulary continuous speech using the
-TCD-TIMIT corpus. We complete our visual speech modeling via hybrid DNN-HMMs
-and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We
-use DCT and Eigenlips as representations of the mouth ROI image. The
-phoneme-based lipreading system outperforms the viseme-based system in word
-accuracy. However, the phoneme system achieved lower accuracy at the unit
-level, which shows the importance of the dictionary for decoding classification
-outputs into words.
-"
-7465,1805.02937,Jinyi Zhang and Tadahiro Matsumoto,"Improving Character-level Japanese-Chinese Neural Machine Translation
- with Radicals as an Additional Input Feature",cs.CL," In recent years, Neural Machine Translation (NMT) has been proven to get
-impressive results. While some additional linguistic features of input words
-improve word-level NMT, no additional character features have so far been used
-to improve character-level NMT.
In this paper, we show that the radicals of
-Chinese characters (or kanji), as character feature information, can easily
-provide further improvements in character-level NMT. In experiments on the
-WAT2016 Japanese-Chinese scientific paper excerpt corpus (ASPEC-JP), we find
-that the proposed method improves translation quality in two respects:
-perplexity and BLEU. The character-level NMT model with the radical input
-feature achieved a state-of-the-art result of 40.61 BLEU points on the test
-set, which is an improvement of about 8.6 BLEU points over the best system on
-the WAT2016 Japanese-to-Chinese translation subtask with ASPEC-JP. The
-improvements over the character-level NMT with no additional input feature are
-up to about 1.5 and 1.4 BLEU points on the development-test set and the test
-set of the corpus, respectively.
-"
-7466,1805.02958,Akihiro Kato and Tomi Kinnunen,"A Regression Model of Recurrent Deep Neural Networks for Noise Robust
- Estimation of the Fundamental Frequency Contour of Speech",eess.AS cs.CL cs.SD stat.ML," The fundamental frequency (F0) contour of speech is a key aspect to represent
-speech prosody that finds use in speech and spoken language analysis such as
-voice conversion and speech synthesis as well as speaker and language
-identification. This work proposes new methods to estimate the F0 contour of
-speech using deep neural networks (DNNs) and recurrent neural networks (RNNs).
-They are trained using supervised learning with ground-truth F0 contours. The
-latest prior research addresses this problem first as a frame-by-frame
-classification problem, followed by sequence tracking using a deep neural
-network hidden Markov model (DNN-HMM) hybrid architecture. This study,
-however, tackles the problem as a regression problem instead, in order to
-obtain F0 contours with higher frequency resolution from clean and noisy
-speech. Experiments using the PTDB-TUG corpus contaminated with additive noise
-(NOISEX-92) show the proposed method improves gross pitch error (GPE) by more
-than 25% at signal-to-noise ratios (SNRs) between -10 dB and +10 dB as
-compared with one of the most noise-robust F0 trackers, PEFAC. Furthermore, the
-performance on fine pitch error (FPE) is improved by approximately 20% against
-a state-of-the-art DNN-HMM-based approach.
-"
-7467,1805.03122,"Rob van der Goot, Nikola Ljube\v{s}i\'c, Ian Matroos, Malvina Nissim,
- Barbara Plank",Bleaching Text: Abstract Features for Cross-lingual Gender Prediction,cs.CL," Gender prediction has typically focused on lexical and social network
-features, yielding good performance, but making systems highly language-,
-topic-, and platform-dependent. Cross-lingual embeddings circumvent some of
-these limitations, but capture gender-specific style less well. We propose an
-alternative: bleaching text, i.e., transforming lexical strings into more
-abstract features. This study provides evidence that such features allow for
-better transfer across languages. Moreover, we present a first study on the
-ability of humans to perform cross-lingual gender prediction. We find that
-human predictive power proves similar to that of our bleached models, and both
-perform better than lexical models.
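- As a rough illustration (not the authors' exact feature inventory), a token
-can be ""bleached"" into abstract, largely language-independent features along
-the following lines; the specific shape alphabet and the length cap here are
-hypothetical choices:
-
-import re
-
-def bleach(token):
-    # Collapse the surface string into abstract features.
-    shape = re.sub(r"[A-Z]", "U", token)   # uppercase letters -> U
-    shape = re.sub(r"[a-z]", "l", shape)   # lowercase letters -> l
-    shape = re.sub(r"[0-9]", "D", shape)   # digits            -> D
-    shape = re.sub(r"[^UlD]", "x", shape)  # everything else   -> x
-    vowels = "".join("V" if c.lower() in "aeiou" else "C"
-                     for c in token if c.isalpha())
-    return {"shape": shape,                 # "Hello!" -> "Ullllx"
-            "length": min(len(token), 10),  # capped token length
-            "vowel_pattern": vowels,        # "Hello"  -> "CVCCV"
-            "is_alpha": token.isalpha()}
-
-# The same featurizer applies unchanged to any language with Latin script.
-features = [bleach(t) for t in "Ich liebe NLP 2018 !".split()]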
-" -7468,1805.03162,"Tong Niu, Mohit Bansal",Polite Dialogue Generation Without Parallel Data,cs.CL cs.AI cs.LG," Stylistic dialogue response generation, with valuable applications in -personality-based conversational agents, is a challenging task because the -response needs to be fluent, contextually-relevant, as well as -paralinguistically accurate. Moreover, parallel datasets for -regular-to-stylistic pairs are usually unavailable. We present three -weakly-supervised models that can generate diverse polite (or rude) dialogue -responses without parallel data. Our late fusion model (Fusion) merges the -decoder of an encoder-attention-decoder dialogue model with a language model -trained on stand-alone polite utterances. Our label-fine-tuning (LFT) model -prepends to each source sequence a politeness-score scaled label (predicted by -our state-of-the-art politeness classifier) during training, and at test time -is able to generate polite, neutral, and rude responses by simply scaling the -label embedding by the corresponding score. Our reinforcement learning model -(Polite-RL) encourages politeness generation by assigning rewards proportional -to the politeness classifier score of the sampled response. We also present two -retrieval-based polite dialogue model baselines. Human evaluation validates -that while the Fusion and the retrieval-based models achieve politeness with -poorer context-relevance, the LFT and Polite-RL models can produce -significantly more polite responses without sacrificing dialogue quality. -" -7469,1805.03228,"Ivan Vuli\'c, Goran Glava\v{s}, Nikola Mrk\v{s}i\'c, Anna Korhonen","Post-Specialisation: Retrofitting Vectors of Words Unseen in Lexical - Resources",cs.CL," Word vector specialisation (also known as retrofitting) is a portable, -light-weight approach to fine-tuning arbitrary distributional word vector -spaces by injecting external knowledge from rich lexical resources such as -WordNet. By design, these post-processing methods only update the vectors of -words occurring in external lexicons, leaving the representations of all unseen -words intact. In this paper, we show that constraint-driven vector space -specialisation can be extended to unseen words. We propose a novel -post-specialisation method that: a) preserves the useful linguistic knowledge -for seen words; while b) propagating this external signal to unseen words in -order to improve their vector representations as well. Our post-specialisation -approach explicits a non-linear specialisation function in the form of a deep -neural network by learning to predict specialised vectors from their original -distributional counterparts. The learned function is then used to specialise -vectors of unseen words. This approach, applicable to any post-processing -model, yields considerable gains over the initial specialisation models both in -intrinsic word similarity tasks, and in two downstream tasks: dialogue state -tracking and lexical text simplification. The positive effects persist across -three languages, demonstrating the importance of specialising the full -vocabulary of distributional word vector spaces. -" -7470,1805.03257,"Jiaping Zhang, Tiancheng Zhao and Zhou Yu","Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented - Visual Dialog",cs.CL," Creating an intelligent conversational system that understands vision and -language is one of the ultimate goals in Artificial Intelligence -(AI)~\cite{winograd1972understanding}. 
Extensive research has focused on
-vision-to-language generation; however, limited research has touched on
-combining these two modalities in a goal-driven dialog context. We propose a
-multimodal hierarchical reinforcement learning framework that dynamically
-integrates vision and language for task-oriented visual dialog. The framework
-jointly learns the multimodal dialog state representation and the hierarchical
-dialog policy to improve both dialog task success and efficiency. We also
-propose a new technique, state adaptation, to integrate context awareness in
-the dialog state representation. We evaluate the proposed framework and the
-state adaptation technique in an image guessing game and achieve promising
-results.
-"
-7471,1805.03294,"Albert Zeyer, and Kazuki Irie, and Ralf Schl\""uter, and Hermann Ney",Improved training of end-to-end attention models for speech recognition,cs.CL cs.LG stat.ML," Sequence-to-sequence attention-based models on subword units allow simple
-open-vocabulary end-to-end speech recognition. In this work, we show that such
-models can achieve competitive results on the Switchboard 300h and LibriSpeech
-1000h tasks. In particular, we report the state-of-the-art word error rates
-(WER) of 3.54% on the dev-clean and 3.82% on the test-clean evaluation subsets
-of LibriSpeech. We introduce a new pretraining scheme by starting with a high
-time reduction factor and lowering it during training, which is crucial both
-for convergence and final performance. In some experiments, we also use an
-auxiliary CTC loss function to help the convergence. In addition, we train long
-short-term memory (LSTM) language models on subword units. By shallow fusion,
-we report up to 27% relative improvements in WER over the attention baseline
-without a language model.
-"
-7472,1805.03308,"Stefan Feuerriegel, Nicolas Pr\""ollochs","Investor Reaction to Financial Disclosures Across Topics: An Application
- of Latent Dirichlet Allocation",cs.CL q-fin.GN," This paper provides a holistic study of how stock prices vary in their
-response to financial disclosures across different topics. Thereby, we
-specifically shed light on the extensive number of filings for which no a
-priori categorization of their content exists. For this purpose, we utilize an
-approach from data mining - namely, latent Dirichlet allocation - as a means of
-topic modeling. This technique facilitates our task of automatically
-categorizing, ex ante, the content of more than 70,000 regulatory 8-K filings
-from U.S. companies. We then evaluate the subsequent stock market reaction. Our
-empirical evidence suggests a considerable discrepancy among various types of
-news stories in terms of their relevance and impact on financial markets. For
-instance, we find a statistically significant abnormal return in response to
-earnings results and credit ratings, but also for disclosures regarding
-business strategy, the health sector, as well as mergers and acquisitions. Our
-results yield findings that benefit managers, investors and policy-makers by
-indicating how regulatory filings should be structured and the topics most
-likely to precede changes in stock valuations.
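- As a minimal sketch of such a pipeline (using scikit-learn rather than the
-authors' implementation; the toy filing texts and the hyperparameters are
-purely illustrative):
-
-from sklearn.feature_extraction.text import CountVectorizer
-from sklearn.decomposition import LatentDirichletAllocation
-
-filings = [
-    "quarterly earnings results exceeded prior guidance",
-    "company completed the acquisition of a smaller rival",
-    "credit rating agency revised the outlook to negative",
-]
-
-vectorizer = CountVectorizer(stop_words="english")
-X = vectorizer.fit_transform(filings)
-
-lda = LatentDirichletAllocation(n_components=3, random_state=0)
-doc_topics = lda.fit_transform(X)  # ex-ante topic mixture per filing
-
-# Top words per topic, used to label categories such as earnings or M&A;
-# abnormal returns around filing dates would then be estimated per topic.
-terms = vectorizer.get_feature_names_out()
-for k, weights in enumerate(lda.components_):
-    top = [terms[i] for i in weights.argsort()[-4:][::-1]]
-    print(f"topic {k}:", top)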
-" -7473,1805.03322,Prashanth Gurunath Shivakumar and Panayiotis Georgiou,"Transfer Learning from Adult to Children for Speech Recognition: - Evaluation, Analysis and Recommendations",eess.AS cs.CL cs.SD," Children speech recognition is challenging mainly due to the inherent high -variability in children's physical and articulatory characteristics and -expressions. This variability manifests in both acoustic constructs and -linguistic usage due to the rapidly changing developmental stage in children's -life. Part of the challenge is due to the lack of large amounts of available -children speech data for efficient modeling. This work attempts to address the -key challenges using transfer learning from adult's models to children's models -in a Deep Neural Network (DNN) framework for children's Automatic Speech -Recognition (ASR) task evaluating on multiple children's speech corpora with a -large vocabulary. The paper presents a systematic and an extensive analysis of -the proposed transfer learning technique considering the key factors affecting -children's speech recognition from prior literature. Evaluations are presented -on (i) comparisons of earlier GMM-HMM and the newer DNN Models, (ii) -effectiveness of standard adaptation techniques versus transfer learning, (iii) -various adaptation configurations in tackling the variabilities present in -children speech, in terms of (a) acoustic spectral variability, and (b) -pronunciation variability and linguistic constraints. Our Analysis spans over -(i) number of DNN model parameters (for adaptation), (ii) amount of adaptation -data, (iii) ages of children, (iv) age dependent-independent adaptation. -Finally, we provide Recommendations on (i) the favorable strategies over -various aforementioned - analyzed parameters, and (ii) potential future -research directions and relevant challenges/problems persisting in DNN based -ASR for children's speech. -" -7474,1805.03330,"Nikola I. Nikolov, Yuhuang Hu, Mi Xue Tan, Richard H.R. Hahnloser",Character-level Chinese-English Translation through ASCII Encoding,cs.CL," Character-level Neural Machine Translation (NMT) models have recently -achieved impressive results on many language pairs. They mainly do well for -Indo-European language pairs, where the languages share the same writing -system. However, for translating between Chinese and English, the gap between -the two different writing systems poses a major challenge because of a lack of -systematic correspondence between the individual linguistic units. In this -paper, we enable character-level NMT for Chinese, by breaking down Chinese -characters into linguistic units similar to that of Indo-European languages. We -use the Wubi encoding scheme, which preserves the original shape and semantic -information of the characters, while also being reversible. We show promising -results from training Wubi-based models on the character- and subword-level -with recurrent as well as convolutional models. -" -7475,1805.03366,"Chao Jiang, Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang",LearningWord Embeddings for Low-resource Languages by PU Learning,cs.CL," Word embedding is a key component in many downstream applications in -processing natural languages. Existing approaches often assume the existence of -a large collection of text for learning effective word embedding. However, such -a corpus may not be available for some low-resource languages. In this paper, -we study how to effectively learn a word embedding model on a corpus with only -a few million tokens. 
In such a situation, the co-occurrence matrix is sparse
-as the co-occurrences of many word pairs are unobserved. In contrast to
-existing approaches, which often sample only a few unobserved word pairs as
-negative examples, we argue that the zero entries in the co-occurrence matrix
-also provide valuable information. We then design a Positive-Unlabeled Learning
-(PU-Learning) approach to factorize the co-occurrence matrix and validate the
-proposed approach in four different languages.
-"
-7476,1805.03379,"Manqing Dong, Lina Yao, Xianzhi Wang, Boualem Benatallah, Chaoran
- Huang, Xiaodong Ning",Opinion Fraud Detection via Neural Autoencoder Decision Forest,cs.CL cs.AI cs.LG," Online reviews play an important role in influencing buyers' daily purchase
-decisions. However, fake and meaningless reviews, which cannot reflect users'
-genuine purchase experience and opinions, widely exist on the Web and pose
-great challenges for users to make right choices. Therefore, it is desirable to
-build a fair model that evaluates the quality of products by filtering out spam
-reviews. We present an end-to-end trainable unified model to leverage the
-appealing properties of autoencoders and random forests. A stochastic
-decision tree model is implemented to guide the global parameter learning
-process. Extensive experiments were conducted on a large Amazon review dataset.
-The proposed model consistently outperforms a series of compared methods.
-"
-7477,1805.03435,"Vitalii Zhelezniak, Dan Busbridge, April Shen, Samuel L. Smith and
- Nils Y. Hammerla","Decoding Decoders: Finding Optimal Representation Spaces for
- Unsupervised Similarity Tasks",cs.AI cs.CL cs.LG," Experimental evidence indicates that simple models outperform complex deep
-networks on many unsupervised similarity tasks. We provide a simple yet
-rigorous explanation for this behaviour by introducing the concept of an
-optimal representation space, in which semantically close symbols are mapped to
-representations that are close under a similarity measure induced by the
-model's objective function. In addition, we present a straightforward procedure
-that, without any retraining or architectural modifications, allows deep
-recurrent models to perform equally well (and sometimes better) when compared
-to shallow models. To validate our analysis, we conduct a set of consistent
-empirical evaluations and introduce several new sentence embedding models in
-the process. Even though this work is presented within the context of natural
-language processing, the insights are readily applicable to other domains that
-rely on distributed representations for transfer tasks.
-"
-7478,1805.03616,"Li Wang, Junlin Yao, Yunzhe Tao, Li Zhong, Wei Liu, Qiang Du","A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for
- Abstractive Text Summarization",cs.CL cs.LG stat.ML," In this paper, we propose a deep learning approach to tackle the automatic
-summarization tasks by incorporating topic information into the convolutional
-sequence-to-sequence (ConvS2S) model and using self-critical sequence training
-(SCST) for optimization. Through jointly attending to topics and word-level
-alignment, our approach can improve coherence, diversity, and informativeness
-of generated summaries via a biased probability generation mechanism. On the
-other hand, reinforcement training, like SCST, directly optimizes the proposed
-model with respect to the non-differentiable metric ROUGE, which also avoids
-the exposure bias during inference.
We carry out the experimental evaluation
-with state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets.
-The empirical results demonstrate the superiority of our proposed method in
-abstractive summarization.
-"
-7479,1805.03620,"Anders S{\o}gaard, Sebastian Ruder, Ivan Vuli\'c",On the Limitations of Unsupervised Bilingual Dictionary Induction,cs.CL cs.LG stat.ML," Unsupervised machine translation---i.e., not assuming any cross-lingual
-supervision signal, whether a dictionary, translations, or comparable
-corpora---seems impossible, but nevertheless, Lample et al. (2018) recently
-proposed a fully unsupervised machine translation (MT) model. The model relies
-heavily on an adversarial, unsupervised alignment of word embedding spaces for
-bilingual dictionary induction (Conneau et al., 2018), which we examine here.
-Our results identify the limitations of current unsupervised MT: unsupervised
-bilingual dictionary induction performs much worse on morphologically rich
-languages that are not dependent marking, when monolingual corpora from
-different domains or different embedding algorithms are used. We show that a
-simple trick, exploiting a weak supervision signal from identical words,
-enables more robust induction, and establish a near-perfect correlation between
-unsupervised bilingual dictionary induction performance and a previously
-unexplored graph similarity metric.
-"
-7480,1805.03642,"Avishek Joey Bose, Huan Ling, Yanshuai Cao",Adversarial Contrastive Estimation,cs.CL cs.AI cs.LG," Learning by contrasting positive and negative samples is a general strategy
-adopted by many methods. Noise contrastive estimation (NCE) for word embeddings
-and translating embeddings for knowledge graphs are examples in NLP employing
-this approach. In this work, we view contrastive learning as an abstraction of
-all such methods and augment the negative sampler into a mixture distribution
-containing an adversarially learned sampler. The resulting adaptive sampler
-finds harder negative examples, which forces the main model to learn a better
-representation of the data. We evaluate our proposal on learning word
-embeddings, order embeddings and knowledge graph embeddings and observe both
-faster convergence and improved results on multiple metrics.
-"
-7481,1805.03645,Taraka Rama,"Three tree priors and five datasets: A study of the effect of tree
- priors in Indo-European phylogenetics",cs.CL," The age of the root of the Indo-European language family has received much
-attention since the application of Bayesian phylogenetic methods by Gray and
-Atkinson (2003). The root age of the Indo-European family has tended to decrease
-from an age that supported the Anatolian origin hypothesis to an age that
-supports the Steppe origin hypothesis with the application of new models (Chang
-et al., 2015). However, none of the published work in Indo-European
-phylogenetics has studied the effect of tree priors on phylogenetic analyses of
-the Indo-European family. In this paper, I intend to fill this gap by exploring
-the effect of tree priors on different aspects of the Indo-European family's
-phylogenetic inference. I apply three tree priors---Uniform, Fossilized
-Birth-Death (FBD), and Coalescent---to five publicly available datasets of the
-Indo-European language family. I evaluate the posterior distribution of the
-trees from the Bayesian analysis using Bayes Factor, and find that there is
-support for the Steppe origin hypothesis in the case of two tree priors. 
I
-report the median and 95% highest posterior density (HPD) interval of the root
-ages for all three tree priors. A model comparison suggested that either the
-Uniform prior or the FBD prior is more suitable than the Coalescent prior for
-the datasets belonging to the Indo-European language family.
-"
-7482,1805.03668,"Lianhui Qin, Lemao Liu, Victoria Bi, Yan Wang, Xiaojiang Liu, Zhiting
- Hu, Hai Zhao, Shuming Shi",Automatic Article Commenting: the Task and Dataset,cs.CL," Comments on online articles provide extended views and improve user
-engagement. Automatically making comments thus becomes a valuable functionality
-for online forums, intelligent chatbots, etc. This paper proposes the new task
-of automatic article commenting, and introduces a large-scale Chinese dataset
-with millions of real comments and a human-annotated subset characterizing the
-comments' varying quality. Incorporating the human bias of comment quality, we
-further develop automatic metrics that generalize a broad set of popular
-reference-based metrics and exhibit greatly improved correlations with human
-evaluations.
-"
-7483,1805.03687,Abien Fred Agarap,"Statistical Analysis on E-Commerce Reviews, with Sentiment
- Classification using Bidirectional Recurrent Neural Network (RNN)",cs.CL cs.LG stat.ML," Understanding customer sentiments is of paramount importance in marketing
-strategies today. Not only will it give companies an insight as to how
-customers perceive their products and/or services, but it will also give them
-an idea on how to improve their offers. This paper attempts to understand the
-correlation of different variables in customer reviews on a women's clothing
-e-commerce platform, and to classify whether each review recommends the
-reviewed product or not and whether it consists of positive, negative, or
-neutral sentiment. To achieve these goals, we employed univariate and
-multivariate analyses on dataset features except for review titles and review
-texts, and we implemented a bidirectional recurrent neural network (RNN) with
-long short-term memory (LSTM) units for recommendation and sentiment
-classification. Results have shown that a recommendation is a strong indicator
-of a positive sentiment score, and vice-versa. On the other hand, ratings in
-product reviews are fuzzy indicators of sentiment scores. We also found that
-the bidirectional LSTM was able to reach an F1-score of 0.88 for recommendation
-classification, and 0.93 for sentiment classification.
-"
-7484,1805.03710,Alexandre Salle and Aline Villavicencio,"Incorporating Subword Information into Matrix Factorization Word
- Embeddings",cs.CL," The positive effect of adding subword information to word embeddings has been
-demonstrated for predictive models. In this paper we investigate whether
-similar benefits can also be derived from incorporating subwords into counting
-models. We evaluate the impact of different types of subwords (n-grams and
-unsupervised morphemes), with results confirming the importance of subword
-information in learning representations of rare and out-of-vocabulary words.
-"
-7485,1805.03716,"Omer Levy, Kenton Lee, Nicholas FitzGerald, Luke Zettlemoyer","Long Short-Term Memory as a Dynamically Computed Element-wise Weighted
- Sum",cs.CL cs.AI cs.LG stat.ML," LSTMs were introduced to combat vanishing gradients in simple RNNs by
-augmenting them with gated additive recurrent connections. 
We present an -alternative view to explain the success of LSTMs: the gates themselves are -versatile recurrent models that provide more representational power than -previously appreciated. We do this by decoupling the LSTM's gates from the -embedded simple RNN, producing a new class of RNNs where the recurrence -computes an element-wise weighted sum of context-independent functions of the -input. Ablations on a range of problems demonstrate that the gating mechanism -alone performs as well as an LSTM in most settings, strongly suggesting that -the gates are doing much more in practice than just alleviating vanishing -gradients. -" -7486,1805.03750,"Eva Hasler, Adri\`a De Gispert, Gonzalo Iglesias, Bill Byrne",Neural Machine Translation Decoding with Terminology Constraints,cs.CL," Despite the impressive quality improvements yielded by neural machine -translation (NMT) systems, controlling their translation output to adhere to -user-provided terminology constraints remains an open problem. We describe our -approach to constrained neural decoding based on finite-state machines and -multi-stack decoding which supports target-side constraints as well as -constraints with corresponding aligned input text spans. We demonstrate the -performance of our framework on multiple translation tasks and motivate the -need for constrained decoding with attentions as a means of reducing -misplacement and duplication when translating user constraints. -" -7487,1805.03766,"Antoine Bosselut, Asli Celikyilmaz, Xiaodong He, Jianfeng Gao, Po-Sen - Huang, Yejin Choi",Discourse-Aware Neural Rewards for Coherent Text Generation,cs.CL," In this paper, we investigate the use of discourse-aware rewards with -reinforcement learning to guide a model to generate long, coherent text. In -particular, we propose to learn neural rewards to model cross-sentence ordering -as a means to approximate desired discourse structure. Empirical results -demonstrate that a generator trained with the learned reward produces more -coherent and less repetitive text than models trained with cross-entropy or -with reinforcement learning with commonly used scores as rewards. -" -7488,1805.03774,Fan Bu,"The Evolution of Popularity and Images of Characters in Marvel Cinematic - Universe Fanfictions",cs.CL," This analysis proposes a new topic model to study the yearly trends in Marvel -Cinematic Universe fanfictions on three levels: character popularity, character -images/topics, and vocabulary pattern of topics. It is found that character -appearances in fanfictions have become more diverse over the years thanks to -constant introduction of new characters in feature films, and in the case of -Captain America, multi-dimensional character development is well-received by -the fanfiction world. -" -7489,1805.03784,"Kevin K. Bowden, Jiaqi Wu, Shereen Oraby, Amita Misra, and Marilyn - Walker","SlugNERDS: A Named Entity Recognition Tool for Open Domain Dialogue - Systems",cs.CL," In dialogue systems, the tasks of named entity recognition (NER) and named -entity linking (NEL) are vital preprocessing steps for understanding user -intent, especially in open domain interaction where we cannot rely on -domain-specific inference. UCSC's effort as one of the funded teams in the 2017 -Amazon Alexa Prize Contest has yielded Slugbot, an open domain social bot, -aimed at casual conversation. 
We discovered several challenges specifically
-associated with both NER and NEL when building Slugbot, such as NE labels that
-are too coarse-grained and entity types that are not linked to a useful
-ontology. Moreover, we have discovered that traditional approaches do not
-perform well in our context: even systems designed to operate on tweets or
-other social media data do not work well in dialogue systems. In this paper, we
-introduce Slugbot's Named Entity Recognition for dialogue Systems (SlugNERDS),
-a NER and NEL tool which is optimized to address these issues. We describe two
-new resources that we are building as part of this work: SlugEntityDB and
-SchemaActuator. We believe these resources will be useful for the research
-community.
-"
-7490,1805.03793,"Jialong Han, Yan Song, Wayne Xin Zhao, Shuming Shi, Haisong Zhang",hyperdoc2vec: Distributed Representations of Hypertext Documents,cs.CL cs.SI," Hypertext documents, such as web pages and academic papers, are of great
-importance in delivering information in our daily life. Although effective on
-plain documents, conventional text embedding methods suffer from
-information loss if directly adapted to hyper-documents. In this paper, we
-propose a general embedding approach for hyper-documents, namely, hyperdoc2vec,
-along with four criteria characterizing necessary information that
-hyper-document embedding models should preserve. Systematic comparisons are
-conducted between hyperdoc2vec and several competitors on two tasks, i.e.,
-paper classification and citation recommendation, in the academic paper domain.
-Analyses and experiments both validate the superiority of hyperdoc2vec to other
-models w.r.t. the four criteria.
-"
-7491,1805.03801,"Bei Shi, Zihao Fu, Lidong Bing and Wai Lam",Learning Domain-Sensitive and Sentiment-Aware Word Embeddings,cs.CL cs.AI," Word embeddings have been widely used in sentiment classification because of
-their efficacy for semantic representations of words. Given reviews from
-different domains, some existing methods for word embeddings exploit sentiment
-information, but they cannot produce domain-sensitive embeddings. On the other
-hand, some other existing methods can generate domain-sensitive word
-embeddings, but they cannot distinguish words with similar contexts but
-opposite sentiment polarity. We propose a new method for learning
-domain-sensitive and sentiment-aware embeddings that simultaneously capture the
-information of sentiment semantics and domain sensitivity of individual words.
-Our method can automatically determine and produce domain-common embeddings and
-domain-specific embeddings. The differentiation of domain-common and
-domain-specific words enables both the augmentation of common semantics from
-multiple domains and the capture of the varied semantics of specific words from
-different domains at the same time. Experimental results show that
-our model provides an effective way to learn domain-sensitive and
-sentiment-aware word embeddings which benefit sentiment classification at both
-the sentence level and the lexicon term level.
-"
-7492,1805.03818,"Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy
- Liang and Christopher R\'e",Training Classifiers with Natural Language Explanations,cs.CL," Training accurate classifiers requires many labels, but each label provides
-only limited information (one bit for binary classification). 
In this work, we
-propose BabbleLabble, a framework for training classifiers in which an
-annotator provides a natural language explanation for each labeling decision. A
-semantic parser converts these explanations into programmatic labeling
-functions that generate noisy labels for an arbitrary amount of unlabeled data,
-which is used to train a classifier. On three relation extraction tasks, we
-find that users are able to train classifiers with comparable F1 scores
-5-100$\times$ faster by providing explanations instead of just labels.
-Furthermore, given the inherent imperfection of labeling functions, we find
-that a simple rule-based semantic parser suffices.
-"
-7493,1805.03830,Soumya Wadhwa and Varsha Embar and Matthias Grabmair and Eric Nyberg,Towards Inference-Oriented Reading Comprehension: ParallelQA,cs.CL cs.AI," In this paper, we investigate the tendency of end-to-end neural Machine
-Reading Comprehension (MRC) models to match shallow patterns rather than
-perform inference-oriented reasoning on RC benchmarks. We aim to test the
-ability of these systems to answer questions which focus on referential
-inference. We propose ParallelQA, a strategy to formulate such questions using
-parallel passages. We also demonstrate that existing neural models fail to
-generalize well to this setting.
-"
-7494,1805.03832,"Wei Zou, Dongwei Jiang, Shuaijiang Zhao, Xiangang Li","A comparable study of modeling units for end-to-end Mandarin speech
- recognition",cs.CL eess.AS," End-to-end speech recognition has become increasingly popular in Mandarin
-speech recognition and has achieved impressive performance.
- Mandarin is a tonal language that differs from English and requires
-special treatment for the acoustic modeling units. There have been several
-different kinds of modeling units for Mandarin, such as the phoneme, the
-syllable and the Chinese character.
- In this work, we explore two major end-to-end models: the connectionist
-temporal classification (CTC) model and the attention-based encoder-decoder
-model for Mandarin speech recognition. We compare the performance of three
-differently scaled modeling units: the context-dependent phoneme (CDP), the
-syllable with tone and the Chinese character.
- We find that all types of modeling units can achieve similar character
-error rates (CER) in the CTC model and that the Chinese-character attention
-model performs better than the syllable attention model. Furthermore, we find
-that the Chinese character is a reasonable unit for Mandarin speech recognition.
-On the DidiCallcenter task, the Chinese-character attention model achieves a
-CER of 5.68% and the CTC model a CER of 7.29%; on the DidiReading task, the
-CERs are 4.89% and 5.79%, respectively. Moreover, the attention model achieves
-better performance than the CTC model on both datasets.
-"
-7495,1805.03838,Zhi-Xiu Ye and Zhen-Hua Ling,Hybrid semi-Markov CRF for Neural Sequence Labeling,cs.CL," This paper proposes hybrid semi-Markov conditional random fields (SCRFs) for
-neural sequence labeling in natural language processing. Based on conventional
-conditional random fields (CRFs), SCRFs have been designed for the tasks of
-assigning labels to segments by extracting features from and describing
-transitions between segments instead of words. In this paper, we improve the
-existing SCRF methods by employing word-level and segment-level information
-simultaneously. First, word-level labels are utilized to derive the segment
-scores in SCRFs. 
Second, a CRF output layer and an SCRF output layer are
-integrated into a unified neural network and trained jointly. Experimental
-results on the CoNLL 2003 named entity recognition (NER) shared task show that
-our model achieves state-of-the-art performance when no external knowledge is
-used.
-"
-7496,1805.03871,Ilias Chalkidis and Ion Androutsopoulos and Achilleas Michos,Obligation and Prohibition Extraction Using Hierarchical RNNs,cs.CL," We consider the task of detecting contractual obligations and prohibitions.
-We show that a self-attention mechanism improves the performance of a BILSTM
-classifier, the previous state of the art for this task, by allowing it to
-focus on indicative tokens. We also introduce a hierarchical BILSTM, which
-converts each sentence to an embedding, and processes the sentence embeddings
-to classify each sentence. Apart from being faster to train, the hierarchical
-BILSTM outperforms the flat one, even when the latter considers surrounding
-sentences, because the hierarchical model has a broader discourse view.
-"
-7497,1805.03900,Furu Wei,Improv Chat: Second Response Generation for Chatbot,cs.CL," Existing research on response generation for chatbots focuses on \textbf{First
-Response Generation}, which aims to teach the chatbot to say the first response
-(e.g. a sentence) appropriate to the conversation context (e.g. the user's
-query). In this paper, we introduce a new task, \textbf{Second Response
-Generation}, termed Improv chat, which aims to teach the chatbot to say the
-second response after saying the first response with respect to the conversation
-context, so as to lighten the burden on the user to keep the conversation
-going. Specifically, we propose a general learning-based framework and develop
-a retrieval-based system which can generate the second responses with the
-users' query and the chatbot's first response as input. We present the approach
-to building the conversation corpus for Improv chat from public forums and
-social networks, as well as the neural-network-based models for response
-matching and ranking. We include the preliminary experiments and results in
-this paper. This work could be further advanced with better deep matching
-models for retrieval-based systems or generative models for generation-based
-systems as well as extensive evaluations in real-life applications.
-"
-7498,1805.03977,"Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma","Automatic Academic Paper Rating Based on Modularized Hierarchical
- Convolutional Neural Network",cs.CL," As more and more academic papers are being submitted to conferences and
-journals, evaluating all these papers by professionals is time-consuming and
-can cause inequality due to the personal factors of the reviewers. In this
-paper, in order to assist professionals in evaluating academic papers, we
-propose a novel task: automatic academic paper rating (AAPR), which
-automatically determines whether to accept academic papers. We build a new
-dataset for this task and propose a novel modularized hierarchical
-convolutional neural network to achieve automatic academic paper rating.
-Evaluation results show that the proposed model outperforms the baselines by a
-large margin. 
The dataset and code are available at -\url{https://github.com/lancopku/AAPR} -" -7499,1805.03989,"Junyang Lin, Xu Sun, Shuming Ma and Qi Su",Global Encoding for Abstractive Summarization,cs.CL cs.AI cs.LG," In neural abstractive summarization, the conventional sequence-to-sequence -(seq2seq) model often suffers from repetition and semantic irrelevance. To -tackle the problem, we propose a global encoding framework, which controls the -information flow from the encoder to the decoder based on the global -information of the source context. It consists of a convolutional gated unit to -perform global encoding to improve the representations of the source-side -information. Evaluations on the LCSTS and the English Gigaword both demonstrate -that our model outperforms the baseline models, and the analysis shows that our -model is capable of reducing repetition. -" -7500,1805.04016,"Craig Stewart, Nikolai Vogler, Junjie Hu, Jordan Boyd-Graber, Graham - Neubig",Automatic Estimation of Simultaneous Interpreter Performance,cs.CL," Simultaneous interpretation, translation of the spoken word in real-time, is -both highly challenging and physically demanding. Methods to predict -interpreter confidence and the adequacy of the interpreted message have a -number of potential applications, such as in computer-assisted interpretation -interfaces or pedagogical tools. We propose the task of predicting simultaneous -interpreter performance by building on existing methodology for quality -estimation (QE) of machine translation output. In experiments over five -settings in three language pairs, we extend a QE pipeline to estimate -interpreter performance (as approximated by the METEOR evaluation metric) and -propose novel features reflecting interpretation strategy and evaluation -measures that further improve prediction accuracy. -" -7501,1805.04032,Jose Camacho-Collados and Mohammad Taher Pilehvar,"From Word to Sense Embeddings: A Survey on Vector Representations of - Meaning",cs.CL cs.AI," Over the past years, distributed semantic representations have proved to be -effective and flexible keepers of prior knowledge to be integrated into -downstream applications. This survey focuses on the representation of meaning. -We start from the theoretical background behind word vector space models and -highlight one of their major limitations: the meaning conflation deficiency, -which arises from representing a word with all its possible meanings as a -single vector. Then, we explain how this deficiency can be addressed through a -transition from the word level to the more fine-grained level of word senses -(in its broader acceptation) as a method for modelling unambiguous lexical -meaning. We present a comprehensive overview of the wide range of techniques in -the two main branches of sense representation, i.e., unsupervised and -knowledge-based. Finally, this survey covers the main evaluation procedures and -applications for this type of representation, and provides an analysis of four -of its important aspects: interpretability, sense granularity, adaptability to -different domains and compositionality. -" -7502,1805.04033,"Bingzhen Wei, Xuancheng Ren, Xu Sun, Yi Zhang, Xiaoyan Cai, Qi Su","Regularizing Output Distribution of Abstractive Chinese Social Media - Text Summarization for Improved Semantic Consistency",cs.CL," Abstractive text summarization is a highly difficult problem, and the -sequence-to-sequence model has shown success in improving the performance on -the task. 
However, the generated summaries are often inconsistent with the
-source content in semantics. In such cases, when generating summaries, the
-model selects semantically unrelated words with respect to the source content
-as the most probable output. The problem can be attributed to heuristically
-constructed training data, where summaries can be unrelated to the source
-content, thus containing semantically unrelated words and spurious word
-correspondence. In this paper, we propose a regularization approach for the
-sequence-to-sequence model and make use of what the model has learned to
-regularize the learning objective to alleviate the effect of the problem. In
-addition, we propose a practical human evaluation method to address the problem
-that the existing automatic evaluation method does not evaluate the semantic
-consistency with the source content properly. Experimental results demonstrate
-the effectiveness of the proposed approach, which outperforms almost all the
-existing models. In particular, the proposed approach improves the semantic
-consistency by 4\% in terms of human evaluation.
-"
-7503,1805.04044,"Yuning Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu, Jiawei Han",End-to-End Reinforcement Learning for Automatic Taxonomy Induction,cs.CL," We present a novel end-to-end reinforcement learning approach to automatic
-taxonomy induction from a set of terms. While prior methods treat the problem
-as a two-phase task (i.e., detecting hypernymy pairs followed by organizing
-these pairs into a tree-structured hierarchy), we argue that such two-phase
-methods may suffer from error propagation, and cannot effectively optimize
-metrics that capture the holistic structure of a taxonomy. In our approach, the
-representations of term pairs are learned using multiple sources of information
-and used to determine \textit{which} term to select and \textit{where} to place
-it on the taxonomy via a policy network. All components are trained in an
-end-to-end manner with cumulative rewards, measured by a holistic tree metric
-over the training taxonomies. Experiments on two public datasets of different
-domains show that our approach outperforms prior state-of-the-art taxonomy
-induction methods by up to 19.6\% on ancestor F1.
-"
-7504,1805.04174,"Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen,
- Xinyuan Zhang, Ricardo Henao, Lawrence Carin",Joint Embedding of Words and Labels for Text Classification,cs.CL cs.LG," Word embeddings are effective intermediate representations for capturing
-semantic regularities between words, when learning the representations of text
-sequences. We propose to view text classification as a label-word joint
-embedding problem: each label is embedded in the same space with the word
-vectors. We introduce an attention framework that measures the compatibility of
-embeddings between text sequences and labels. The attention is learned on a
-training set of labeled samples to ensure that, given a text sequence, the
-relevant words are weighted higher than the irrelevant ones. Our method
-maintains the interpretability of word embeddings, and enjoys a built-in
-ability to leverage alternative sources of information, in addition to input
-text sequences. Extensive results on several large text datasets show that
-the proposed framework outperforms the state-of-the-art methods by a large
-margin, in terms of both accuracy and speed. 
-" -7505,1805.04185,Mattia Antonino Di Gangi and Marcello Federico,Deep Neural Machine Translation with Weakly-Recurrent Units,cs.CL," Recurrent neural networks (RNNs) have represented for years the state of the -art in neural machine translation. Recently, new architectures have been -proposed, which can leverage parallel computation on GPUs better than classical -RNNs. Faster training and inference combined with different -sequence-to-sequence modeling also lead to performance improvements. While the -new models completely depart from the original recurrent architecture, we -decided to investigate how to make RNNs more efficient. In this work, we -propose a new recurrent NMT architecture, called Simple Recurrent NMT, built on -a class of fast and weakly-recurrent units that use layer normalization and -multiple attentions. Our experiments on the WMT14 English-to-German and WMT16 -English-Romanian benchmarks show that our model represents a valid alternative -to LSTMs, as it can achieve better results at a significantly lower -computational cost. -" -7506,1805.04212,"Vicente Ivan Sanchez Carmona, Jeff Mitchell, Sebastian Riedel","Behavior Analysis of NLI Models: Uncovering the Influence of Three - Factors on Robustness",cs.CL cs.AI," Natural Language Inference is a challenging task that has received -substantial attention, and state-of-the-art models now achieve impressive test -set performance in the form of accuracy scores. Here, we go beyond this single -evaluation metric to examine robustness to semantically-valid alterations to -the input data. We identify three factors - insensitivity, polarity and unseen -pairs - and compare their impact on three SNLI models under a variety of -conditions. Our results demonstrate a number of strengths and weaknesses in the -models' ability to generalise to new in-domain instances. In particular, while -strong performance is possible on unseen hypernyms, unseen antonyms are more -challenging for all the models. More generally, the models suffer from an -insensitivity to certain small but semantically significant alterations, and -are also often influenced by simple statistical correlations between words and -training labels. Overall, we show that evaluations of NLI models can benefit -from studying the influence of factors intrinsic to the models or found in the -dataset used. -" -7507,1805.04218,"Terra Blevins, Omer Levy, Luke Zettlemoyer",Deep RNNs Encode Soft Hierarchical Syntax,cs.CL," We present a set of experiments to demonstrate that deep recurrent neural -networks (RNNs) learn internal representations that capture soft hierarchical -notions of syntax from highly varied supervision. We consider four syntax tasks -at different depths of the parse tree; for each word, we predict its part of -speech as well as the first (parent), second (grandparent) and third level -(great-grandparent) constituent labels that appear above it. These predictions -are made from representations produced at different depths in networks that are -pretrained with one of four objectives: dependency parsing, semantic role -labeling, machine translation, or language modeling. In every case, we find a -correspondence between network depth and syntactic depth, suggesting that a -soft syntactic hierarchy emerges. This effect is robust across all conditions, -indicating that the models encode significant amounts of syntax even in the -absence of an explicit syntactic training supervision. 
-" -7508,1805.04237,Poorya Zaremoodi and Gholamreza Haffari,"Neural Machine Translation for Bilingually Scarce Scenarios: A Deep - Multi-task Learning Approach",cs.CL," Neural machine translation requires large amounts of parallel training text -to learn a reasonable-quality translation model. This is particularly -inconvenient for language pairs for which enough parallel text is not -available. In this paper, we use monolingual linguistic resources in the source -side to address this challenging problem based on a multi-task learning -approach. More specifically, we scaffold the machine translation task on -auxiliary tasks including semantic parsing, syntactic parsing, and named-entity -recognition. This effectively injects semantic and/or syntactic knowledge into -the translation model, which would otherwise require a large amount of training -bitext. We empirically evaluate and show the effectiveness of our multi-task -learning approach on three translation tasks: English-to-French, -English-to-Farsi, and English-to-Vietnamese. -" -7509,1805.04247,"Moshiur R Farazi, Salman H Khan",Reciprocal Attention Fusion for Visual Question Answering,cs.CV cs.AI cs.CL," Existing attention mechanisms either attend to local image grid or object -level features for Visual Question Answering (VQA). Motivated by the -observation that questions can relate to both object instances and their parts, -we propose a novel attention mechanism that jointly considers reciprocal -relationships between the two levels of visual details. The bottom-up attention -thus generated is further coalesced with the top-down information to only focus -on the scene elements that are most relevant to a given question. Our design -hierarchically fuses multi-modal information i.e., language, object- and -gird-level features, through an efficient tensor decomposition scheme. The -proposed model improves the state-of-the-art single model performances from -67.9% to 68.2% on VQAv1 and from 65.7% to 67.4% on VQAv2, demonstrating a -significant boost. -" -7510,1805.04264,"Lyan Verwimp, Hugo Van hamme, Vincent Renkens, Patrick Wambacq",State Gradients for RNN Memory Analysis,cs.CL cs.NE," We present a framework for analyzing what the state in RNNs remembers from -its input embeddings. Our approach is inspired by backpropagation, in the sense -that we compute the gradients of the states with respect to the input -embeddings. The gradient matrix is decomposed with Singular Value Decomposition -to analyze which directions in the embedding space are best transferred to the -hidden state space, characterized by the largest singular values. We apply our -approach to LSTM language models and investigate to what extent and for how -long certain classes of words are remembered on average for a certain corpus. -Additionally, the extent to which a specific property or relationship is -remembered by the RNN can be tracked by comparing a vector characterizing that -property with the direction(s) in embedding space that are best preserved in -hidden state space. -" -7511,1805.04270,"Lei Cui, Furu Wei and Ming Zhou",Neural Open Information Extraction,cs.CL," Conventional Open Information Extraction (Open IE) systems are usually built -on hand-crafted patterns from other NLP tools such as syntactic parsing, yet -they face problems of error propagation. In this paper, we propose a neural -Open IE approach with an encoder-decoder framework. 
Distinct from existing -methods, the neural Open IE approach learns highly confident arguments and -relation tuples bootstrapped from a state-of-the-art Open IE system. An -empirical study on a large benchmark dataset shows that the neural Open IE -system significantly outperforms several baselines, while maintaining -comparable computational efficiency. -" -7512,1805.04277,"Eneko Agirre, Oier L\'opez de Lacalle, Aitor Soroa","The risk of sub-optimal use of Open Source NLP Software: UKB is - inadvertently state-of-the-art in knowledge-based WSD",cs.CL," UKB is an open source collection of programs for performing, among other -tasks, knowledge-based Word Sense Disambiguation (WSD). Since it was released -in 2009 it has been often used out-of-the-box in sub-optimal settings. We show -that nine years later it is the state-of-the-art on knowledge-based WSD. This -case shows the pitfalls of releasing open source NLP software without optimal -default settings and precise instructions for reproducibility. -" -7513,1805.04402,Makoto Kanazawa and Tobias Kapp\'e,Decision problems for Clark-congruential languages,cs.CL cs.FL," A common question when studying a class of context-free grammars is whether -equivalence is decidable within this class. We answer this question positively -for the class of Clark-congruential grammars, which are of interest to -grammatical inference. We also consider the problem of checking whether a given -CFG is Clark-congruential, and show that it is decidable given that the CFG is -a DCFG. -" -7514,1805.04437,"Georgios Balikas, Charlotte Laclau, Ievgen Redko, Massih-Reza Amini",Cross-lingual Document Retrieval using Regularized Wasserstein Distance,cs.CL stat.ML," Many information retrieval algorithms rely on the notion of a good distance -that allows to efficiently compare objects of different nature. Recently, a new -promising metric called Word Mover's Distance was proposed to measure the -divergence between text passages. In this paper, we demonstrate that this -metric can be extended to incorporate term-weighting schemes and provide more -accurate and computationally efficient matching between documents using -entropic regularization. We evaluate the benefits of both extensions in the -task of cross-lingual document retrieval (CLDR). Our experimental results on -eight CLDR problems suggest that the proposed methods achieve remarkable -improvements in terms of Mean Reciprocal Rank compared to several baselines. -" -7515,1805.04453,"Nicholas Ruiz, Srinivas Bangalore, John Chen","Bootstrapping Multilingual Intent Models via Machine Translation for - Dialog Automation",cs.CL," With the resurgence of chat-based dialog systems in consumer and enterprise -applications, there has been much success in developing data-driven and -rule-based natural language models to understand human intent. Since these -models require large amounts of data and in-domain knowledge, expanding an -equivalent service into new markets is disrupted by language barriers that -inhibit dialog automation. - This paper presents a user study to evaluate the utility of out-of-the-box -machine translation technology to (1) rapidly bootstrap multilingual spoken -dialog systems and (2) enable existing human analysts to understand foreign -language utterances. We additionally evaluate the utility of machine -translation in human assisted environments, where a portion of the traffic is -processed by analysts. 
In English->Spanish experiments, we observe a high -potential for dialog automation, as well as the potential for human analysts to -process foreign language utterances with high accuracy. -" -7516,1805.04508,Svetlana Kiritchenko and Saif M. Mohammad,Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems,cs.CL," Automatic machine learning systems can inadvertently accentuate and -perpetuate inappropriate human biases. Past work on examining inappropriate -biases has largely focused on just individual systems. Further, there is no -benchmark dataset for examining inappropriate biases in systems. Here for the -first time, we present the Equity Evaluation Corpus (EEC), which consists of -8,640 English sentences carefully chosen to tease out biases towards certain -races and genders. We use the dataset to examine 219 automatic sentiment -analysis systems that took part in a recent shared task, SemEval-2018 Task 1 -'Affect in Tweets'. We find that several of the systems show statistically -significant bias; that is, they consistently provide slightly higher sentiment -intensity predictions for one race or one gender. We make the EEC freely -available. -" -7517,1805.04542,Svetlana Kiritchenko and Saif M. Mohammad,Sentiment Composition of Words with Opposing Polarities,cs.CL," In this paper, we explore sentiment composition in phrases that have at least -one positive and at least one negative word---phrases like 'happy accident' and -'best winter break'. We compiled a dataset of such opposing polarity phrases -and manually annotated them with real-valued scores of sentiment association. -Using this dataset, we analyze the linguistic patterns present in opposing -polarity phrases. Finally, we apply several unsupervised and supervised -techniques of sentiment composition to determine their efficacy on this -dataset. Our best system, which incorporates information from the phrase's -constituents, their parts of speech, their sentiment association scores, and -their embedding vectors, obtains an accuracy of over 80% on the opposing -polarity phrases. -" -7518,1805.04558,"Svetlana Kiritchenko, Saif M. Mohammad, Jason Morin, and Berry de - Bruijn","NRC-Canada at SMM4H Shared Task: Classifying Tweets Mentioning Adverse - Drug Reactions and Medication Intake",cs.CL," Our team, NRC-Canada, participated in two shared tasks at the AMIA-2017 -Workshop on Social Media Mining for Health Applications (SMM4H): Task 1 - -classification of tweets mentioning adverse drug reactions, and Task 2 - -classification of tweets describing personal medication intake. For both tasks, -we trained Support Vector Machine classifiers using a variety of surface-form, -sentiment, and domain-specific features. With nine teams participating in each -task, our submissions ranked first on Task 1 and third on Task 2. Handling -considerable class imbalance proved crucial for Task 1. We applied an -under-sampling technique to reduce class imbalance (from about 1:10 to 1:2). -Standard n-gram features, n-grams generalized over domain terms, as well as -general-domain and domain-specific word embeddings had a substantial impact on -the overall performance in both tasks. On the other hand, including sentiment -lexicon features did not result in any improvement. -" -7519,1805.04570,"Chaitanya Malaviya, Matthew R. Gormley, Graham Neubig",Neural Factor Graph Models for Cross-lingual Morphological Tagging,cs.CL," Morphological analysis involves predicting the syntactic traits of a word -(e.g. 
{POS: Noun, Case: Acc, Gender: Fem}). Previous work in morphological
-tagging improves performance for low-resource languages (LRLs) through
-cross-lingual training with a high-resource language (HRL) from the same
-family, but is limited by the strict, often false, assumption that tag sets
-exactly overlap between the HRL and LRL. In this paper we propose a method for
-cross-lingual morphological tagging that aims to improve information sharing
-between languages by relaxing this assumption. The proposed model uses
-factorial conditional random fields with neural network potentials, making it
-possible to (1) utilize the expressive power of neural network representations
-to smooth over superficial differences in the surface forms, (2) model pairwise
-and transitive relationships between tags, and (3) accurately generate tag sets
-that are unseen or rare in the training data. Experiments on four languages
-from the Universal Dependencies Treebank demonstrate superior tagging
-accuracies over existing cross-lingual approaches.
-"
-7520,1805.04576,"Prathusha K Sarma, Yingyu Liang, William A Sethares",Domain Adapted Word Embeddings for Improved Sentiment Classification,cs.CL," Generic word embeddings are trained on large-scale generic corpora; Domain
-Specific (DS) word embeddings are trained only on data from a domain of
-interest. This paper proposes a method to combine the breadth of generic
-embeddings with the specificity of domain-specific embeddings. The resulting
-embeddings, called Domain Adapted (DA) word embeddings, are formed by aligning
-corresponding word vectors using Canonical Correlation Analysis (CCA) or the
-related nonlinear Kernel CCA. Evaluation results on sentiment classification
-tasks show that the DA embeddings substantially outperform both generic and DS
-embeddings when used as input features to standard or state-of-the-art sentence
-encoding algorithms for classification.
-"
-7521,1805.04579,"Divyanshu Daiya, Anukarsh Singh, Mukesh Jadon",Using Statistical and Semantic Models for Multi-Document Summarization,cs.CL," We report a series of experiments with different semantic models on top of
-various statistical models for extractive text summarization. Though
-statistical models may better capture word co-occurrences and distribution
-around the text, they fail to detect the context and the sense of
-sentences/words as a whole. Semantic models help us gain better insight into
-the context of sentences. We show how tuning weights between different models
-can help us achieve significant results on various benchmarks. Further training
-the pre-trained vectors used in the semantic models on the given corpus can
-give an additional boost in performance. Using weighting techniques between the
-different statistical models further refines our results. For statistical
-models, we have used TF/IDF, TextRank, and Jaccard/cosine similarities. For
-semantic models, we have used a WordNet-based model and proposed two models
-based on GloVe vectors and Facebook's InferSent. We tested our approach on the
-DUC 2004 dataset, generating 100-word summaries. We have discussed the system,
-algorithms, analysis and also proposed and tested possible improvements. ROUGE
-scores were used to compare with other summarizers.
-"
-7522,1805.04601,"Hu Xu, Bing Liu, Lei Shu, Philip S. Yu",Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction,cs.CL," One key task of fine-grained sentiment analysis of product reviews is to
-extract product aspects or features that users have expressed opinions on. 
This
-paper focuses on supervised aspect extraction using deep learning. Unlike other
-highly sophisticated supervised deep learning models, this paper proposes a
-novel and yet simple CNN model employing two types of pre-trained embeddings
-for aspect extraction: general-purpose embeddings and domain-specific
-embeddings. Without using any additional supervision, this model achieves
-surprisingly good results, outperforming state-of-the-art sophisticated
-existing methods. To our knowledge, this paper is the first to report such a
-double-embeddings-based CNN model for aspect extraction and to achieve very
-good results.
-"
-7523,1805.04604,"Li Dong, Chris Quirk, Mirella Lapata",Confidence Modeling for Neural Semantic Parsing,cs.CL," In this work we focus on confidence modeling for neural semantic parsers
-which are built upon sequence-to-sequence models. We outline three major causes
-of uncertainty, and design various metrics to quantify these factors. These
-metrics are then used to estimate confidence scores that indicate whether model
-predictions are likely to be correct. Beyond confidence estimation, we identify
-which parts of the input contribute to uncertain predictions, allowing users to
-interpret their model, and verify or refine its input. Experimental results
-show that our confidence model significantly outperforms a widely used method
-that relies on posterior probability, and improves the quality of
-interpretation compared to simply relying on attention scores.
-"
-7524,1805.04609,"Jonathan Zarecki, Shaul Markovitch",Textual Membership Queries,cs.LG cs.CL stat.ML," Human labeling of data can be very time-consuming and expensive, yet, in many
-cases it is critical for the success of the learning process. In order to
-minimize human labeling efforts, we propose a novel active learning solution
-that does not rely on existing sources of unlabeled data. It uses a small
-amount of labeled data as the core set for the synthesis of useful membership
-queries (MQs) - unlabeled instances generated by an algorithm for human
-labeling. Our solution uses modification operators, functions that modify
-instances to some extent. We apply the operators on a small set of instances
-(core set), creating a set of new membership queries. Using this framework, we
-look at the instance space as a search space and apply search algorithms in
-order to generate new examples highly relevant to the learner. We implement
-this framework in the textual domain and test it on several text classification
-tasks and show improved classifier performance as more MQs are labeled and
-incorporated into the training set. To the best of our knowledge, this is the
-first work on membership queries in the textual domain.
-"
-7525,1805.04617,"Alexander R. Fabbri, Irene Li, Prawat Trairatvorakul, Yijiao He, Wei
- Tai Ting, Robert Tung, Caitlin Westerfield, Dragomir R. Radev","TutorialBank: A Manually-Collected Corpus for Prerequisite Chains,
- Survey Extraction and Resource Recommendation",cs.CL," The field of Natural Language Processing (NLP) is growing rapidly, with new
-research published daily along with an abundance of tutorials, codebases and
-other online resources. In order to learn this dynamic field or stay up-to-date
-on the latest research, students as well as educators and researchers must
-constantly sift through multiple sources to find valuable, relevant
-information. 
To address this situation, we introduce TutorialBank, a new, -publicly available dataset which aims to facilitate NLP education and research. -We have manually collected and categorized over 6,300 resources on NLP as well -as the related fields of Artificial Intelligence (AI), Machine Learning (ML) -and Information Retrieval (IR). Our dataset is notably the largest -manually-picked corpus of resources intended for NLP education which does not -include only academic papers. Additionally, we have created both a search -engine and a command-line tool for the resources and have annotated the corpus -to include lists of research topics, relevant resources for each topic, -prerequisite relations among topics, relevant sub-parts of individual -resources, among other annotations. We are releasing the dataset and present -several avenues for further research. -" -7526,1805.04623,"Urvashi Khandelwal, He He, Peng Qi, Dan Jurafsky","Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context",cs.CL," We know very little about how neural language models (LM) use prior -linguistic context. In this paper, we investigate the role of context in an -LSTM LM, through ablation studies. Specifically, we analyze the increase in -perplexity when prior context words are shuffled, replaced, or dropped. On two -standard datasets, Penn Treebank and WikiText-2, we find that the model is -capable of using about 200 tokens of context on average, but sharply -distinguishes nearby context (recent 50 tokens) from the distant history. The -model is highly sensitive to the order of words within the most recent -sentence, but ignores word order in the long-range context (beyond 50 tokens), -suggesting the distant past is modeled only as a rough semantic field or topic. -We further find that the neural caching model (Grave et al., 2017b) especially -helps the LSTM to copy words from within this distant context. Overall, our -analysis not only provides a better understanding of how neural LMs use their -context, but also sheds light on recent success from cache-based models. -" -7527,1805.04655,Sudha Rao and Hal Daum\'e III,"Learning to Ask Good Questions: Ranking Clarification Questions using - Neural Expected Value of Perfect Information",cs.CL," Inquiry is fundamental to communication, and machines cannot effectively -collaborate with humans unless they can ask questions. In this work, we build a -neural network model for the task of ranking clarification questions. Our model -is inspired by the idea of expected value of perfect information: a good -question is one whose expected answer will be useful. We study this problem -using data from StackExchange, a plentiful online resource in which people -routinely ask clarifying questions to posts so that they can better offer -assistance to the original poster. We create a dataset of clarification -questions consisting of ~77K posts paired with a clarification question (and -answer) from three domains of StackExchange: askubuntu, unix and superuser. We -evaluate our model on 500 samples of this dataset against expert human -judgments and demonstrate significant improvements over controlled baselines. -" -7528,1805.04658,"Hao Peng, Sam Thomson, and Noah A. Smith",Backpropagating through Structured Argmax using a SPIGOT,cs.CL," We introduce the structured projection of intermediate gradients optimization -technique (SPIGOT), a new method for backpropagating through neural networks -that include hard-decision structured predictions (e.g., parsing) in -intermediate layers. 
SPIGOT requires no marginal inference, unlike structured -attention networks (Kim et al., 2017) and some reinforcement learning-inspired -solutions (Yogatama et al., 2017). Like so-called straight-through estimators -(Hinton, 2012), SPIGOT defines gradient-like quantities associated with -intermediate nondifferentiable operations, allowing backpropagation before and -after them; SPIGOT's proxy aims to ensure that, after a parameter update, the -intermediate structure will remain well-formed. - We experiment on two structured NLP pipelines: syntactic-then-semantic -dependency parsing, and semantic parsing followed by sentiment classification. -We show that training with SPIGOT leads to a larger improvement on the -downstream task than a modularly-trained pipeline, the straight-through -estimator, and structured attention, reaching a new state of the art on -semantic dependency parsing. -" -7529,1805.04661,Filip Klubi\v{c}ka and Raquel Fern\'andez,"Examining a hate speech corpus for hate speech detection and popularity - prediction",cs.CL cs.AI cs.CY," As research on hate speech becomes more and more relevant every day, most of -it is still focused on hate speech detection. By attempting to replicate a hate -speech detection experiment performed on an existing Twitter corpus annotated -for hate speech, we highlight some issues that arise from doing research in the -field of hate speech, which is essentially still in its infancy. We take a -critical look at the training corpus in order to understand its biases, while -also using it to venture beyond hate speech detection and investigate whether -it can be used to shed light on other facets of research, such as popularity of -hate tweets. -" -7530,1805.04680,Dongyeop Kang and Tushar Khot and Ashish Sabharwal and Eduard Hovy,"AdvEntuRe: Adversarial Training for Textual Entailment with - Knowledge-Guided Examples",cs.CL cs.AI cs.LG," We consider the problem of learning textual entailment models with limited -supervision (5K-10K training examples), and present two complementary -approaches for it. First, we propose knowledge-guided adversarial example -generators for incorporating large lexical resources in entailment models via -only a handful of rule templates. Second, to make the entailment model - a -discriminator - more robust, we propose the first GAN-style approach for -training it using a natural language example generator that iteratively adjusts -based on the discriminator's performance. We demonstrate effectiveness using -two entailment datasets, where the proposed methods increase accuracy by 4.7% -on SciTail and by 2.8% on a 1% training sub-sample of SNLI. Notably, even a -single hand-written rule, negate, improves the accuracy on the negation -examples in SNLI by 6.1%. -" -7531,1805.04685,"Tommaso Pasini, Francesco Maria Elia, Roberto Navigli","Huge Automatically Extracted Training Sets for Multilingual Word Sense - Disambiguation",cs.CL," We release to the community six large-scale sense-annotated datasets in -multiple language to pave the way for supervised multilingual Word Sense -Disambiguation. Our datasets cover all the nouns in the English WordNet and -their translations in other languages for a total of millions of sense-tagged -sentences. Experiments prove that these corpora can be effectively used as -training sets for supervised WSD systems, surpassing the state of the art for -low-resourced languages and providing competitive results for English, where -manually annotated training sets are accessible. 
The data is available at
-trainomatic.org.
-"
-7532,1805.04688,"Yanpeng Zhao, Liwen Zhang, Kewei Tu",Gaussian Mixture Latent Vector Grammars,cs.CL cs.LG," We introduce Latent Vector Grammars (LVeGs), a new framework that extends
-latent variable grammars such that each nonterminal symbol is associated with a
-continuous vector space representing the set of (infinitely many) subtypes of
-the nonterminal. We show that previous models such as latent variable grammars
-and compositional vector grammars can be interpreted as special cases of LVeGs.
-We then present Gaussian Mixture LVeGs (GM-LVeGs), a new special case of LVeGs
-that uses Gaussian mixtures to formulate the weights of production rules over
-subtypes of nonterminals. A major advantage of using Gaussian mixtures is that
-the partition function and the expectations of subtype rules can be computed
-using an extension of the inside-outside algorithm, which enables efficient
-inference and learning. We apply GM-LVeGs to part-of-speech tagging and
-constituency parsing and show that GM-LVeGs can achieve competitive accuracies.
-Our code is available at https://github.com/zhaoyanpeng/lveg.
-"
-7533,1805.04699,"Fran\c{c}ois Hernandez and Vincent Nguyen and Sahar Ghannay and
- Natalia Tomashenko and Yannick Est\`eve","TED-LIUM 3: twice as much data and corpus repartition for experiments on
- speaker adaptation",cs.CL," In this paper, we present the TED-LIUM release 3 corpus dedicated to speech
-recognition in English, which more than doubles the data available to train
-acoustic models in comparison with TED-LIUM 2. We present recent developments
-on Automatic Speech Recognition (ASR) systems in comparison with the two
-previous releases of the TED-LIUM Corpus from 2012 and 2014. We demonstrate
-that passing from 207 to 452 hours of transcribed speech training data is much
-more useful for end-to-end ASR systems than for HMM-based state-of-the-art
-ones, even if the HMM-based ASR system still outperforms the end-to-end ASR
-system when the size of the audio training data is 452 hours, with Word Error
-Rates (WER) of 6.6% and 13.7%, respectively. Lastly, we propose two
-repartitions of the TED-LIUM release 3 corpus: the legacy one that is the same
-as the one existing in release 2, and a new one, calibrated and designed to
-make experiments on speaker adaptation. Like the first two releases, the
-TED-LIUM 3 corpus will be freely available for the research community.
-"
-7534,1805.04715,"Dmitry Ustalov and Alexander Panchenko and Andrei Kutuzov and Chris
- Biemann and Simone Paolo Ponzetto",Unsupervised Semantic Frame Induction using Triclustering,cs.CL," We use dependency triples automatically extracted from a Web-scale corpus to
-perform unsupervised semantic frame induction. We cast the frame induction
-problem as a triclustering problem that is a generalization of clustering for
-triadic data. Our replicable benchmarks demonstrate that the proposed
-graph-based approach, Triframes, shows state-of-the-art results on this task on
-a FrameNet-derived dataset and performs on par with competitive methods on a
-verb class clustering task.
-"
-7535,1805.04787,"Luheng He, Kenton Lee, Omer Levy, Luke Zettlemoyer","Jointly Predicting Predicates and Arguments in Neural Semantic Role
- Labeling",cs.CL," Recent BIO-tagging-based neural semantic role labeling models are very high
-performing, but assume gold predicates as part of the input and cannot
-incorporate span-level features. 
We propose an end-to-end approach for jointly -predicting all predicates, argument spans, and the relations between them. The -model makes independent decisions about what relationship, if any, holds -between every possible word-span pair, and learns contextualized span -representations that provide rich, shared input features for each decision. -Experiments demonstrate that this approach sets a new state of the art on -PropBank SRL without gold predicates. -" -7536,1805.04793,"Li Dong, Mirella Lapata",Coarse-to-Fine Decoding for Neural Semantic Parsing,cs.CL," Semantic parsing aims at mapping natural language utterances into structured -meaning representations. In this work, we propose a structure-aware neural -architecture which decomposes the semantic parsing process into two stages. -Given an input utterance, we first generate a rough sketch of its meaning, -where low-level information (such as variable names and arguments) is glossed -over. Then, we fill in missing details by taking into account the natural -language input and the sketch itself. Experimental results on four datasets -characteristic of different domains and meaning representations show that our -approach consistently improves performance, achieving competitive results -despite the use of relatively simple decoders. -" -7537,1805.04803,Tiancheng Zhao and Maxine Eskenazi,Zero-Shot Dialog Generation with Cross-Domain Latent Actions,cs.CL cs.AI," This paper introduces zero-shot dialog generation (ZSDG), as a step towards -neural dialog systems that can instantly generalize to new situations with -minimal data. ZSDG enables an end-to-end generative dialog system to generalize -to a new domain for which only a domain description is provided and no training -dialogs are available. Then a novel learning framework, Action Matching, is -proposed. This algorithm can learn a cross-domain embedding space that models -the semantics of dialog responses which, in turn, lets a neural dialog -generation model generalize to new domains. We evaluate our methods on a new -synthetic dialog dataset, and an existing human-human dialog dataset. Results -show that our method has superior performance in learning dialog models that -rapidly adapt their behavior to new domains, and suggest promising directions -for future research. -" -7538,1805.04813,"Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, Shuai Ma",Triangular Architecture for Rare Language Translation,cs.CL cs.AI," Neural Machine Translation (NMT) performs poorly on the low-resource language -pair $(X,Z)$, especially when $Z$ is a rare language. By introducing another -rich language $Y$, we propose a novel triangular training architecture (TA-NMT) -to leverage bilingual data $(Y,Z)$ (which may be small) and $(X,Y)$ (which can -be rich) to improve the translation performance of low-resource pairs. In this -triangular architecture, $Z$ is taken as the intermediate latent variable, and -translation models of $Z$ are jointly optimized with a unified bidirectional EM -algorithm under the goal of maximizing the translation likelihood of $(X,Y)$. -Empirical results demonstrate that our method significantly improves the -translation quality of rare languages on the MultiUN and IWSLT2012 datasets, -and achieves even better performance when combined with back-translation -methods. -" -7539,1805.04827,"Qi Wang, Chenming Xu, Yangming Zhou, Tong Ruan, Daqi Gao, Ping He","An attention-based Bi-GRU-CapsNet model for hypernymy detection between - compound entities",cs.CL," Named entities are usually composable and extensible. 
Typical examples are -names of symptoms and diseases in medical areas. To distinguish these entities -from general entities, we name them \textit{compound entities}. In this paper, -we present an attention-based Bi-GRU-CapsNet model to detect the hypernymy -relationship between compound entities. Our model consists of several important -components. To avoid the out-of-vocabulary problem, English words or Chinese -characters in compound entities are fed into the bidirectional gated recurrent -units. An attention mechanism is designed to focus on the differences between -the two compound entities. Since there are some different cases in the -hypernymy relationship between compound entities, a capsule network is finally -employed to decide whether the hypernymy relationship exists or not. -Experimental results demonstrate -" -7540,1805.04833,"Angela Fan, Mike Lewis, Yann Dauphin",Hierarchical Neural Story Generation,cs.CL," We explore story generation: creative systems that can build coherent and -fluent passages of text about a topic. We collect a large dataset of 300K -human-written stories paired with writing prompts from an online forum. Our -dataset enables hierarchical story generation, where the model first generates -a premise, and then transforms it into a passage of text. We gain further -improvements with a novel form of model fusion that improves the relevance of -the story to the prompt, and adding a new gated multi-scale self-attention -mechanism to model long-range context. Experiments show large improvements over -strong baselines on both automated and human evaluations. Human judges prefer -stories generated by our approach to those from a strong non-hierarchical model -by a factor of two to one. -" -7541,1805.04836,"Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang",Building Language Models for Text with Named Entities,cs.CL," Text in many domains involves a significant amount of named entities. -Predicting the entity names is often challenging for a language model as they -appear less frequently in the training corpus. In this paper, we propose a -novel and effective approach to building a discriminative language model which -can learn the entity names by leveraging their entity type information. We also -introduce two benchmark datasets based on recipes and Java programming codes, -on which we evaluate the proposed model. Experimental results show that our -model achieves 52.2% better perplexity in recipe generation and 22.06% on code -generation than the state-of-the-art language models. -" -7542,1805.04843,"Yansen Wang, Chenyi Liu, Minlie Huang, Liqiang Nie","Learning to Ask Questions in Open-domain Conversational Systems with - Typed Decoders",cs.CL," Asking good questions in large-scale, open-domain conversational systems is -quite significant yet largely untouched. This task, substantially different -from traditional question generation, requires questioning not only with -various patterns but also on diverse and relevant topics. We observe that a -good question is a natural composition of {\it interrogatives}, {\it topic -words}, and {\it ordinary words}. Interrogatives lexicalize the pattern of -questioning, topic words address the key information for topic transition in -dialogue, and ordinary words play syntactical and grammatical roles in making a -natural sentence. 
We devise two typed decoders (\textit{soft typed decoder} and -\textit{hard typed decoder}) in which a type distribution over the three types -is estimated and used to modulate the final generation distribution. Extensive -experiments show that the typed decoders outperform state-of-the-art baselines -and can generate more meaningful questions. -" -7543,1805.04869,"Shuming Ma, Xu Sun, Junyang Lin, Houfeng Wang","Autoencoder as Assistant Supervisor: Improving Text Representation for - Chinese Social Media Text Summarization",cs.CL," Most of the current abstractive text summarization models are based on the -sequence-to-sequence model (Seq2Seq). The source content of social media is -long and noisy, so it is difficult for Seq2Seq to learn an accurate semantic -representation. Compared with the source content, the annotated summary is -short and well written. Moreover, it shares the same meaning as the source -content. In this work, we supervise the learning of the representation of the -source content with that of the summary. In implementation, we regard a summary -autoencoder as an assistant supervisor of Seq2Seq. Following previous work, we -evaluate our model on a popular Chinese social media dataset. Experimental -results show that our model achieves state-of-the-art performance on the -benchmark dataset. -" -7544,1805.04871,"Shuming Ma, Xu Sun, Yizhong Wang, Junyang Lin",Bag-of-Words as Target for Neural Machine Translation,cs.CL," A sentence can be translated into more than one correct sentence. However, -most of the existing neural machine translation models only use one of the -correct translations as the target, while the other correct sentences are -penalized as incorrect in the training stage. Since most of the correct -translations for one sentence share a similar bag-of-words, it is possible to -distinguish the correct translations from the incorrect ones by the -bag-of-words. In this paper, we propose an approach that uses both the -sentences and the bag-of-words as targets in the training stage, in order to -encourage the model to generate the potentially correct sentences that do not -appear in the training set. We evaluate our model on a Chinese-English -translation dataset, and experiments show our model outperforms the strong -baselines by 4.55 BLEU points. -" -7545,1805.04876,Andrei M. Butnaru and Radu Tudor Ionescu,"UnibucKernel Reloaded: First Place in Arabic Dialect Identification for - the Second Year in a Row",cs.CL," We present a machine learning approach that ranked first in the -Arabic Dialect Identification (ADI) Closed Shared Tasks of the 2018 VarDial -Evaluation Campaign. The proposed approach combines several kernels using -multiple kernel learning. While most of our kernels are based on character -p-grams (also known as n-grams) extracted from speech or phonetic transcripts, -we also use a kernel based on dialectal embeddings generated from audio -recordings by the organizers. In the learning stage, we independently employ -Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). -Preliminary experiments indicate that KRR provides better classification -results. Our approach is shallow and simple, but the empirical results obtained -in the 2018 ADI Closed Shared Task prove that it achieves the best performance. 
-Furthermore, our top macro-F1 score (58.92%) is significantly better than the -second best score (57.59%) in the 2018 ADI Shared Task, according to the -statistical significance test performed by the organizers. Nevertheless, we -obtain even better post-competition results (a macro-F1 score of 62.28%) using -the audio embeddings released by the organizers after the competition. With a -very similar approach (that did not include phonetic features), we also ranked -first in the ADI Closed Shared Tasks of the 2017 VarDial Evaluation Campaign, -surpassing the second best method by 4.62%. We therefore conclude that our -multiple kernel learning method is the best approach to date for Arabic dialect -identification. -" -7546,1805.04893,"Rui Zhang, Cicero Nogueira dos Santos, Michihiro Yasunaga, Bing Xiang, - Dragomir Radev","Neural Coreference Resolution with Deep Biaffine Attention by Joint - Mention Detection and Mention Clustering",cs.CL," Coreference resolution aims to identify in a text all mentions that refer to -the same real-world entity. The state-of-the-art end-to-end neural coreference -model considers all text spans in a document as potential mentions and learns -to link an antecedent for each possible mention. In this paper, we propose to -improve the end-to-end coreference resolution system by (1) using a biaffine -attention model to get antecedent scores for each possible mention, and (2) -jointly optimizing the mention detection accuracy and the mention clustering -log-likelihood given the mention cluster labels. Our model achieves the -state-of-the-art performance on the CoNLL-2012 Shared Task English test set. -" -7547,1805.04905,"Nathan Schneider, Jena D. Hwang, Vivek Srikumar, Jakob Prange, Austin - Blodgett, Sarah R. Moeller, Aviram Stern, Adi Bitan, Omri Abend","Comprehensive Supersense Disambiguation of English Prepositions and - Possessives",cs.CL," Semantic relations are often signaled with prepositional or possessive -marking--but extreme polysemy bedevils their analysis and automatic -interpretation. We introduce a new annotation scheme, corpus, and task for the -disambiguation of prepositions and possessives in English. Unlike previous -approaches, our annotations are comprehensive with respect to types and tokens -of these markers; use broadly applicable supersense classes rather than -fine-grained dictionary definitions; unite prepositions and possessives under -the same class inventory; and distinguish between a marker's lexical -contribution and the role it marks in the context of a predicate or scene. -Strong interannotator agreement rates, as well as encouraging disambiguation -results with established supervised methods, speak to the viability of the -scheme and task. -" -7548,1805.04908,"Gail Weiss, Yoav Goldberg, Eran Yahav","On the Practical Computational Power of Finite Precision RNNs for - Language Recognition",cs.LG cs.CL stat.ML," While Recurrent Neural Networks (RNNs) are famously known to be Turing -complete, this relies on infinite precision in the states and unbounded -computation time. We consider the case of RNNs with finite precision whose -computation time is linear in the input length. Under these limitations, we -show that different RNN variants have different computational power. In -particular, we show that the LSTM and the Elman-RNN with ReLU activation are -strictly stronger than the RNN with a squashing activation and the GRU. This is -achieved because LSTMs and ReLU-RNNs can easily implement counting behavior. 
We -show empirically that the LSTM does indeed learn to effectively use the -counting mechanism. -" -7549,1805.04988,"Jon Gauthier, Roger Levy, Joshua B. Tenenbaum",Word learning and the acquisition of syntactic--semantic overhypotheses,cs.CL," Children learning their first language face multiple problems of induction: -how to learn the meanings of words, and how to build meaningful phrases from -those words according to syntactic rules. We consider how children might solve -these problems efficiently by solving them jointly, via a computational model -that learns the syntax and semantics of multi-word utterances in a grounded -reference game. We select a well-studied empirical case in which children are -aware of patterns linking the syntactic and semantic properties of words --- -that the properties picked out by base nouns tend to be related to shape, while -prenominal adjectives tend to refer to other properties such as color. We show -that children applying such inductive biases are accurately reflecting the -statistics of child-directed speech, and that inducing similar biases in our -computational model captures children's behavior in a classic adjective -learning experiment. Our model incorporating such biases also demonstrates a -clear data efficiency in learning, relative to a baseline model that learns -without forming syntax-sensitive overhypotheses of word meaning. Thus solving a -more complex joint inference problem may make the full problem of language -acquisition easier, not harder. -" -7550,1805.04993,"Alice Lai, Joel Tetreault","Discourse Coherence in the Wild: A Dataset, Evaluation and Methods",cs.CL," To date there has been very little work on assessing discourse coherence -methods on real-world data. To address this, we present a new corpus of -real-world texts (GCDC) as well as the first large-scale evaluation of leading -discourse coherence algorithms. We show that neural models, including two that -we introduce here (SentAvg and ParSeq), tend to perform best. We analyze these -performance differences and discuss patterns we observed in low coherence texts -in four domains. -" -7551,1805.05062,Maha Elbayad and Laurent Besacier and Jakob Verbeek,Token-level and sequence-level loss smoothing for RNN language models,cs.CL cs.CV," Despite the effectiveness of recurrent neural network language models, their -maximum likelihood estimation suffers from two limitations. First, it treats -all sentences that do not match the ground truth as equally poor, ignoring the -structure of the output space. Second, it suffers from ""exposure bias"": during -training, tokens are predicted given ground-truth sequences, while at test time -prediction is conditioned on generated output sequences. To overcome these -limitations we build upon the recent reward augmented maximum likelihood -approach, i.e., sequence-level smoothing, which encourages the model to predict -sentences close to the ground truth according to a given performance metric. We -extend this approach to token-level loss smoothing, and propose improvements to -the sequence-level smoothing approach. Our experiments on two different tasks, -image captioning and machine translation, show that token-level and -sequence-level loss smoothing are complementary, and significantly improve -results. 
-" -7552,1805.05081,"Zhongyang Li, Xiao Ding, Ting Liu","Constructing Narrative Event Evolutionary Graph for Script Event - Prediction",cs.AI cs.CL," Script event prediction requires a model to predict the subsequent event -given an existing event context. Previous models based on event pairs or event -chains cannot make full use of dense event connections, which may limit their -capability of event prediction. To remedy this, we propose constructing an -event graph to better utilize the event network information for script event -prediction. In particular, we first extract narrative event chains from large -quantities of news corpus, and then construct a narrative event evolutionary -graph (NEEG) based on the extracted chains. NEEG can be seen as a knowledge -base that describes event evolutionary principles and patterns. To solve the -inference problem on NEEG, we present a scaled graph neural network (SGNN) to -model event interactions and learn better event representations. Instead of -computing the representations on the whole graph, SGNN processes only the -concerned nodes each time, which makes our model feasible to large-scale -graphs. By comparing the similarity between input context event representations -and candidate event representations, we can choose the most reasonable -subsequent event. Experimental results on widely used New York Times corpus -demonstrate that our model significantly outperforms state-of-the-art baseline -methods, by using standard multiple choice narrative cloze evaluation. -" -7553,1805.05089,"Sara Stymne, Miryam de Lhoneux, Aaron Smith, and Joakim Nivre",Parser Training with Heterogeneous Treebanks,cs.CL," How to make the most of multiple heterogeneous treebanks when training a -monolingual dependency parser is an open question. We start by investigating -previously suggested, but little evaluated, strategies for exploiting multiple -treebanks based on concatenating training sets, with or without fine-tuning. We -go on to propose a new method based on treebank embeddings. We perform -experiments for several languages and show that in many cases fine-tuning and -treebank embeddings lead to substantial improvements over single treebanks or -concatenation, with average gains of 2.0--3.5 LAS points. We argue that -treebank embeddings should be preferred due to their conceptual simplicity, -flexibility and extensibility. -" -7554,1805.05091,"Jos\'e Lopes, Nils Hemmingsson, Oliver \r{A}strand","The Spot the Difference corpus: a multi-modal corpus of spontaneous task - oriented spoken interactions",cs.CL," This paper describes the Spot the Difference Corpus which contains 54 -interactions between pairs of subjects interacting to find differences in two -very similar scenes. The setup used, the participants' metadata and details -about collection are described. We are releasing this corpus of task-oriented -spontaneous dialogues. This release includes rich transcriptions, annotations, -audio and video. We believe that this dataset constitutes a valuable resource -to study several dimensions of human communication that go from turn-taking to -the study of referring expressions. In our preliminary analyses we have looked -at task success (how many differences were found out of the total number of -differences) and how it evolves over time. In addition we have looked at scene -complexity provided by the RGB components' entropy and how it could relate to -speech overlaps, interruptions and the expression of uncertainty. 
We found -a tendency for more complex scenes to give rise to more competitive -interruptions. -" -7555,1805.05095,Duygu Ataman,"Bianet: A Parallel News Corpus in Turkish, Kurdish and English",cs.CL," We present a new open-source parallel corpus consisting of news articles -collected from the Bianet magazine, an online newspaper that publishes Turkish -news, often along with their translations in English and Kurdish. In this -paper, we describe the collection process of the corpus and its statistical -properties. We validate the benefit of using the Bianet corpus by evaluating -bilingual and multilingual neural machine translation models in English-Turkish -and English-Kurdish directions. -" -7556,1805.05181,"Jingjing Xu, Xu Sun, Qi Zeng, Xuancheng Ren, Xiaodong Zhang, Houfeng - Wang, Wenjie Li","Unpaired Sentiment-to-Sentiment Translation: A Cycled Reinforcement - Learning Approach",cs.CL," The goal of sentiment-to-sentiment ""translation"" is to change the underlying -sentiment of a sentence while keeping its content. The main challenge is the -lack of parallel data. To solve this problem, we propose a cycled reinforcement -learning method that enables training on unpaired data by collaboration between -a neutralization module and an emotionalization module. We evaluate our -approach on two review datasets, Yelp and Amazon. Experimental results show -that our approach significantly outperforms the state-of-the-art systems. -Especially, the proposed method substantially improves the content preservation -performance. The BLEU score is improved from 1.64 to 22.46 and from 0.56 to -14.06 on the two datasets, respectively. -" -7557,1805.05202,Daniel Fern\'andez-Gonz\'alez and Carlos G\'omez-Rodr\'iguez,A Dynamic Oracle for Linear-Time 2-Planar Dependency Parsing,cs.CL," We propose an efficient dynamic oracle for training the 2-Planar -transition-based parser, a linear-time parser with over 99% coverage on -non-projective syntactic corpora. This novel approach outperforms the static -training strategy in the vast majority of languages tested and scored better on -most datasets than the arc-hybrid parser enhanced with the SWAP transition, -which can handle unrestricted non-projectivity. -" -7558,1805.05225,"Albert Zeyer, Tamer Alkhouli, and Hermann Ney","RETURNN as a Generic Flexible Neural Toolkit with Application to - Translation and Speech Recognition",cs.NE cs.AI cs.CL," We demonstrate the fast training and decoding speed of RETURNN for attention -models for translation, due to fast CUDA LSTM kernels and a fast pure -TensorFlow beam search decoder. We show that a layer-wise pretraining scheme -for recurrent attention models gives over 1% absolute BLEU improvement and -allows training deeper recurrent encoder networks. Promising preliminary -results on max. expected BLEU training are presented. We are able to train -state-of-the-art models for translation and end-to-end models for speech -recognition and show results on WMT 2017 and Switchboard. The flexibility of -RETURNN allows a fast research feedback loop to experiment with alternative -architectures, and its generality allows it to be used on a wide range of -applications. -" -7559,1805.05237,"Sabrina Stehwien, Ngoc Thang Vu, Antje Schweitzer","Effects of Word Embeddings on Neural Network-based Pitch Accent - Detection",cs.CL," Pitch accent detection often makes use of both acoustic and lexical features -based on the fact that pitch accents tend to correlate with certain words. 
In -this paper, we extend a pitch accent detector that involves a convolutional -neural network to include word embeddings, which are state-of-the-art vector -representations of words. We examine the effect these features have on -within-corpus and cross-corpus experiments on three English datasets. The -results show that while word embeddings can improve the performance in -corpus-dependent experiments, they also have the potential to make -generalization to unseen data more challenging. -" -7560,1805.05271,"Guokan Shang (1 and 2), Wensi Ding (1), Zekun Zhang (1), Antoine - Jean-Pierre Tixier (1), Polykarpos Meladianos (1 and 3), Michalis - Vazirgiannis (1 and 3), Jean-Pierre Lorr\'e (2) ((1) \'Ecole Polytechnique, - (2) Linagora, (3) AUEB)","Unsupervised Abstractive Meeting Summarization with Multi-Sentence - Compression and Budgeted Submodular Maximization",cs.CL," We introduce a novel graph-based framework for abstractive meeting speech -summarization that is fully unsupervised and does not rely on any annotations. -Our work combines the strengths of multiple recent approaches while addressing -their weaknesses. Moreover, we leverage recent advances in word embeddings and -graph degeneracy applied to NLP to take exterior semantic knowledge into -account, and to design custom diversity and informativeness measures. -Experiments on the AMI and ICSI corpus show that our system improves on the -state-of-the-art. Code and data are publicly available, and our system can be -interactively tested. -" -7561,1805.05286,Chunchuan Lyu and Ivan Titov,AMR Parsing as Graph Prediction with Latent Alignment,cs.CL," Abstract meaning representations (AMRs) are broad-coverage sentence-level -semantic representations. AMRs represent sentences as rooted labeled directed -acyclic graphs. AMR parsing is challenging partly due to the lack of annotated -alignments between nodes in the graphs and words in the corresponding -sentences. We introduce a neural parser which treats alignments as latent -variables within a joint probabilistic model of concepts, relations and -alignments. As exact inference requires marginalizing over alignments and is -infeasible, we use the variational auto-encoding framework and a continuous -relaxation of the discrete alignments. We show that joint modeling is -preferable to using a pipeline of align and parse. The parser achieves the best -reported results on the standard benchmark (74.4% on LDC2016E25). -" -7562,1805.05345,"Justine Zhang, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil, - Lucas Dixon, Yiqing Hua, Nithum Thain, Dario Taraborelli",Conversations Gone Awry: Detecting Early Signs of Conversational Failure,cs.CL cs.AI cs.CY cs.HC physics.soc-ph," One of the main challenges online social systems face is the prevalence of -antisocial behavior, such as harassment and personal attacks. In this work, we -introduce the task of predicting from the very start of a conversation whether -it will get out of hand. As opposed to detecting undesirable behavior after the -fact, this task aims to enable early, actionable prediction at a time when the -conversation might still be salvaged. - To this end, we develop a framework for capturing pragmatic devices---such as -politeness strategies and rhetorical prompts---used to start a conversation, -and analyze their relation to its future trajectory. Applying this framework in -a controlled setting, we demonstrate the feasibility of detecting early warning -signs of antisocial behavior in online discussions. 
-" -7563,1805.05361,"Dinghan Shen, Qinliang Su, Paidamoyo Chapfuwa, Wenlin Wang, Guoyin - Wang, Lawrence Carin, Ricardo Henao","NASH: Toward End-to-End Neural Architecture for Generative Semantic - Hashing",cs.CL cs.IR cs.LG," Semantic hashing has become a powerful paradigm for fast similarity search in -many information retrieval systems. While fairly successful, previous -techniques generally require two-stage training, and the binary constraints are -handled ad-hoc. In this paper, we present an end-to-end Neural Architecture for -Semantic Hashing (NASH), where the binary hashing codes are treated as -Bernoulli latent variables. A neural variational inference framework is -proposed for training, where gradients are directly back-propagated through the -discrete latent variable to optimize the hash function. We also draw -connections between proposed method and rate-distortion theory, which provides -a theoretical foundation for the effectiveness of the proposed framework. -Experimental results on three public datasets demonstrate that our method -significantly outperforms several state-of-the-art models on both unsupervised -and supervised scenarios. -" -7564,1805.05370,"Laura Aina, Carina Silberer, Ionut-Teodor Sorodoc, Matthijs Westera, - Gemma Boleda",AMORE-UPF at SemEval-2018 Task 4: BiLSTM with Entity Library,cs.CL," This paper describes our winning contribution to SemEval 2018 Task 4: -Character Identification on Multiparty Dialogues. It is a simple, standard -model with one key innovation, an entity library. Our results show that this -innovation greatly facilitates the identification of infrequent characters. -Because of the generic nature of our model, this finding is potentially -relevant to any task that requires effective learning from sparse or unbalanced -data. -" -7565,1805.05377,"Nicholas FitzGerald, Julian Michael, Luheng He, Luke Zettlemoyer",Large-Scale QA-SRL Parsing,cs.CL cs.AI," We present a new large-scale corpus of Question-Answer driven Semantic Role -Labeling (QA-SRL) annotations, and the first high-quality QA-SRL parser. Our -corpus, QA-SRL Bank 2.0, consists of over 250,000 question-answer pairs for -over 64,000 sentences across 3 domains and was gathered with a new -crowd-sourcing scheme that we show has high precision and good recall at modest -cost. We also present neural models for two QA-SRL subtasks: detecting argument -spans for a predicate and generating questions to label the semantic -relationship. The best models achieve question accuracy of 82.6% and span-level -accuracy of 77.6% (under human evaluation) on the full pipelined QA-SRL -prediction task. They can also, as we show, be used to gather additional -annotations at low cost. -" -7566,1805.05388,"Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon - Stewart, Sanjeev Arora","A La Carte Embedding: Cheap but Effective Induction of Semantic Feature - Vectors",cs.CL cs.AI," Motivations like domain adaptation, transfer learning, and feature learning -have fueled interest in inducing embeddings for rare or unseen words, n-grams, -synsets, and other textual features. This paper introduces a la carte -embedding, a simple and general alternative to the usual word2vec-based -approaches for building such representations that is based upon recent -theoretical results for GloVe-like embeddings. Our method relies mainly on a -linear transformation that is efficiently learnable using pretrained word -vectors and linear regression. 
This transform is applicable on the fly in the -future when a new text feature or rare word is encountered, even if only a -single usage example is available. We introduce a new dataset showing how the a -la carte method requires fewer examples of words in context to learn -high-quality embeddings and we obtain state-of-the-art results on a nonce task -and some unsupervised document classification tasks. -" -7567,1805.05470,"Davide Frazzetto, Bijay Neupane, Torben Bach Pedersen, Thomas Dyhre - Nielsen","Adaptive User-Oriented Direct Load-Control of Residential Flexible - Devices",cs.CY cs.CL," Demand Response (DR) schemes are effective tools to maintain a dynamic -balance in energy markets with higher integration of fluctuating renewable -energy sources. DR schemes can be used to harness residential devices' -flexibility and to utilize it to achieve social and financial objectives. -However, existing DR schemes suffer from low user participation as they fail at -taking into account the users' requirements. First, DR schemes are highly -demanding for the users, as users need to provide direct information, e.g. via -surveys, on their energy consumption preferences. Second, the user utility -models based on these surveys are hard-coded and do not adapt over time. Third, -the existing scheduling techniques require the users to input their energy -requirements on a daily basis. As an alternative, this paper proposes a DR -scheme for user-oriented direct load-control of residential appliances -operations. Instead of relying on user surveys to evaluate the user utility, we -propose an online data-driven approach for estimating user utility functions, -purely based on available load consumption data, that adaptively models the -users' preference over time. Our scheme is based on a day-ahead scheduling -technique that transparently prescribes the users with optimal device operation -schedules that take into account both financial benefits and user-perceived -quality of service. To model day-ahead user energy demand and flexibility, we -propose a probabilistic approach for generating flexibility models under -uncertainty. Results on both real-world and simulated datasets show that our DR -scheme can provide significant financial benefits while preserving the -user-perceived quality of service. -" -7568,1805.05491,"Martin Mueller, Marcel Salath\'e","Crowdbreaks: Tracking Health Trends using Public Social Media Data and - Crowdsourcing",cs.CY cs.CL cs.SI stat.ML," In the past decade, tracking health trends using social media data has shown -great promise, due to a powerful combination of massive adoption of social -media around the world, and increasingly potent hardware and software that -enables us to work with these new big data streams. At the same time, many -challenging problems have been identified. First, there is often a mismatch -between how rapidly online data can change, and how rapidly algorithms are -updated, which means that there is limited reusability for algorithms trained -on past data as their performance decreases over time. Second, much of the work -is focusing on specific issues during a specific past period in time, even -though public health institutions would need flexible tools to assess multiple -evolving situations in real time. Third, most tools providing such capabilities -are proprietary systems with little algorithmic or data transparency, and thus -little buy-in from the global public health and research community. 
Here, we -introduce Crowdbreaks, an open platform which allows tracking of health trends -by making use of continuous crowdsourced labelling of public social media -content. The system is built in a way which automates the typical workflow -from data collection, filtering, labelling and training of machine learning -classifiers and therefore can greatly accelerate the research process in the -public health domain. This work introduces the technical aspects of the -platform and explores its future use cases. -" -7569,1805.05492,"Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, Kedar - Dhamdhere",Did the Model Understand the Question?,cs.CL cs.AI," We analyze state-of-the-art deep learning models for three tasks: question -answering on (1) images, (2) tables, and (3) passages of text. Using the notion -of \emph{attribution} (word importance), we find that these deep networks often -ignore important question terms. Leveraging such behavior, we perturb questions -to craft a variety of adversarial examples. Our strongest attacks drop the -accuracy of a visual question answering model from $61.1\%$ to $19\%$, and that -of a tabular question answering model from $33.5\%$ to $3.3\%$. Additionally, -we show how attributions can strengthen attacks proposed by Jia and Liang -(2017) on paragraph comprehension models. Our results demonstrate that -attributions can augment standard measures of accuracy and empower -investigation of model performance. When a model is accurate but for the wrong -reasons, attributions can surface erroneous logic in the model that indicates -inadequacies in the test data. -" -7570,1805.05542,"Jing Li, Yan Song, Haisong Zhang, Shuming Shi","A Manually Annotated Chinese Corpus for Non-task-oriented Dialogue - Systems",cs.CL," This paper presents a large-scale corpus for non-task-oriented dialogue -response selection, which contains over 27K distinct prompts and more than 82K -responses collected from social media. To annotate this corpus, we define a -5-grade rating scheme: bad, mediocre, acceptable, good, and excellent, -according to the relevance, coherence, informativeness, interestingness, and -the potential to move a conversation forward. To test the validity and -usefulness of the produced corpus, we compare various unsupervised and -supervised models for response selection. Experimental results confirm that the -proposed corpus is helpful in training response selection models. -" -7571,1805.05557,"Alexander Mathews, Lexing Xie, Xuming He",Simplifying Sentences with Sequence to Sequence Models,cs.CL," We simplify sentences with an attentive neural network sequence to sequence -model, dubbed S4. The model includes a novel word-copy mechanism and loss -function to exploit linguistic similarities between the original and simplified -sentences. It also jointly uses pre-trained and fine-tuned word embeddings to -capture the semantics of complex sentences and to mitigate the effects of -limited data. When trained and evaluated on pairs of sentences from thousands -of news articles, we observe an 8.8 point improvement in BLEU score over a -sequence to sequence baseline; however, learning word substitutions remains -difficult. Such sequence to sequence models are promising for other text -generation tasks such as style transfer. 
-" -7572,1805.05574,"Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen","Improved ASR for Under-Resourced Languages Through Multi-Task Learning - with Acoustic Landmarks",cs.CL cs.SD eess.AS," Furui first demonstrated that the identity of both consonant and vowel can be -perceived from the C-V transition; later, Stevens proposed that acoustic -landmarks are the primary cues for speech perception, and that steady-state -regions are secondary or supplemental. Acoustic landmarks are perceptually -salient, even in a language one doesn't speak, and it has been demonstrated -that non-speakers of the language can identify features such as the primary -articulator of the landmark. These factors suggest a strategy for developing -language-independent automatic speech recognition: landmarks can potentially be -learned once from a suitably labeled corpus and rapidly applied to many other -languages. This paper proposes enhancing the cross-lingual portability of a -neural network by using landmarks as the secondary task in multi-task learning -(MTL). The network is trained in a well-resourced source language with both -phone and landmark labels (English), then adapted to an under-resourced target -language with only word labels (Iban). Landmark-tasked MTL reduces -source-language phone error rate by 2.9% relative, and reduces target-language -word error rate by 1.9%-5.9% depending on the amount of target-language -training data. These results suggest that landmark-tasked MTL causes the DNN to -learn hidden-node features that are useful for cross-lingual adaptation. -" -7573,1805.05581,"Reina Akama, Kento Watanabe, Sho Yokoi, Sosuke Kobayashi, Kentaro Inui",Unsupervised Learning of Style-sensitive Word Vectors,cs.CL," This paper presents the first study aimed at capturing stylistic similarity -between words in an unsupervised manner. We propose extending the continuous -bag of words (CBOW) model (Mikolov et al., 2013) to learn style-sensitive word -vectors using a wider context window under the assumption that the style of all -the words in an utterance is consistent. In addition, we introduce a novel task -to predict lexical stylistic similarity and to create a benchmark dataset for -this task. Our experiment with this dataset supports our assumption and -demonstrates that the proposed extensions contribute to the acquisition of -style-sensitive word embeddings. -" -7574,1805.05588,"Bingfeng Luo, Yansong Feng, Zheng Wang, Songfang Huang, Rui Yan and - Dongyan Zhao","Marrying up Regular Expressions with Neural Networks: A Case Study for - Spoken Language Understanding",cs.CL," The success of many natural language processing (NLP) tasks is bound by the -number and quality of annotated data, but there is often a shortage of such -training data. In this paper, we ask the question: ""Can we combine a neural -network (NN) with regular expressions (RE) to improve supervised learning for -NLP?"". In answer, we develop novel methods to exploit the rich expressiveness -of REs at different levels within a NN, showing that the combination -significantly enhances the learning effectiveness when a small number of -training examples are available. We evaluate our approach by applying it to -spoken language understanding for intent detection and slot filling. -Experimental results show that our approach is highly effective in exploiting -the available training data, giving a clear boost to the RE-unaware NN. 
-" -7575,1805.05593,"Masaki Asada, Makoto Miwa and Yutaka Sasaki","Enhancing Drug-Drug Interaction Extraction from Texts by Molecular - Structure Information",cs.CL," We propose a novel neural method to extract drug-drug interactions (DDIs) -from texts using external drug molecular structure information. We encode -textual drug pairs with convolutional neural networks and their molecular pairs -with graph convolutional networks (GCNs), and then we concatenate the outputs -of these two networks. In the experiments, we show that GCNs can predict DDIs -from the molecular structures of drugs in high accuracy and the molecular -information can enhance text-based DDI extraction by 2.39 percent points in the -F-score on the DDIExtraction 2013 shared task data set. -" -7576,1805.05622,"Marko Smilevski, Ilija Lalkovski, Gjorgji Madjarov",Stories for Images-in-Sequence by using Visual and Narrative Components,cs.AI cs.CL cs.CV," Recent research in AI is focusing towards generating narrative stories about -visual scenes. It has the potential to achieve more human-like understanding -than just basic description generation of images- in-sequence. In this work, we -propose a solution for generating stories for images-in-sequence that is based -on the Sequence to Sequence model. As a novelty, our encoder model is composed -of two separate encoders, one that models the behaviour of the image sequence -and other that models the sentence-story generated for the previous image in -the sequence of images. By using the image sequence encoder we capture the -temporal dependencies between the image sequence and the sentence-story and by -using the previous sentence-story encoder we achieve a better story flow. Our -solution generates long human-like stories that not only describe the visual -context of the image sequence but also contains narrative and evaluative -language. The obtained results were confirmed by manual human evaluation. -" -7577,1805.05631,"William Schueller, Vittorio Loreto and Pierre-Yves Oudeyer",Complexity Reduction in the Negotiation of New Lexical Conventions,cs.MA cs.CL cs.SI," In the process of collectively inventing new words for new concepts in a -population, conflicts can quickly become numerous, in the form of synonymy and -homonymy. Remembering all of them could cost too much memory, and remembering -too few may slow down the overall process. Is there an efficient behavior that -could help balance the two? The Naming Game is a multi-agent computational -model for the emergence of language, focusing on the negotiation of new lexical -conventions, where a common lexicon self-organizes but going through a phase of -high complexity. Previous work has been done on the control of complexity -growth in this particular model, by allowing agents to actively choose what -they talk about. However, those strategies were relying on ad hoc heuristics -highly dependent on fine-tuning of parameters. We define here a new principled -measure and a new strategy, based on the beliefs of each agent on the global -state of the population. The measure does not rely on heavy computation, and is -cognitively plausible. The new strategy yields an efficient control of -complexity growth, along with a faster agreement process. Also, we show that -short-term memory is enough to build relevant beliefs about the global lexicon. 
-" -7578,1805.05691,Graham Spinks and Marie-Francine Moens,Generating Continuous Representations of Medical Texts,cs.CL," We present an architecture that generates medical texts while learning an -informative, continuous representation with discriminative features. During -training the input to the system is a dataset of captions for medical X-Rays. -The acquired continuous representations are of particular interest for use in -many machine learning techniques where the discrete and high-dimensional nature -of textual input is an obstacle. We use an Adversarially Regularized -Autoencoder to create realistic text in both an unconditional and conditional -setting. We show that this technique is applicable to medical texts which often -contain syntactic and domain-specific shorthands. A quantitative evaluation -shows that we achieve a lower model perplexity than a traditional LSTM -generator. -" -7579,1805.05758,"Thomas Wolf, Julien Chaumond, Clement Delangue",Continuous Learning in a Hierarchical Multiscale Neural Network,cs.CL," We reformulate the problem of encoding a multi-scale representation of a -sequence in a language model by casting it in a continuous learning framework. -We propose a hierarchical multi-scale language model in which short time-scale -dependencies are encoded in the hidden state of a lower-level recurrent neural -network while longer time-scale dependencies are encoded in the dynamic of the -lower-level network by having a meta-learner update the weights of the -lower-level neural network in an online meta-learning fashion. We use elastic -weights consolidation as a higher-level to prevent catastrophic forgetting in -our continuous learning framework. -" -7580,1805.05826,"Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. - Hershey",A Purely End-to-end System for Multi-speaker Speech Recognition,cs.SD cs.CL eess.AS stat.ML," Recently, there has been growing interest in multi-speaker speech -recognition, where the utterances of multiple speakers are recognized from -their mixture. Promising techniques have been proposed for this task, but -earlier works have required additional training data such as isolated source -signals or senone alignments for effective learning. In this paper, we propose -a new sequence-to-sequence framework to directly decode multiple label -sequences from a single speech sequence by unifying source separation and -speech recognition functions in an end-to-end manner. We further propose a new -objective function to improve the contrast between the hidden vectors to avoid -generating similar hypotheses. Experimental results show that the model is -directly able to learn a mapping from a speech mixture to multiple label -sequences, achieving 83.1 % relative improvement compared to a model trained -without the proposed objective. Interestingly, the results are comparable to -those produced by previous end-to-end works featuring explicit separation and -recognition modules. -" -7581,1805.05927,"M A H Zahid, Ankush Mittal, R.C. Joshi, and G. Atluri",CLINIQA: A Machine Intelligence Based Clinical Question Answering System,cs.CL," The recent developments in the field of biomedicine have made large volumes -of biomedical literature available to the medical practitioners. Due to the -large size and lack of efficient searching strategies, medical practitioners -struggle to obtain necessary information available in the biomedical -literature. 
Moreover, the most sophisticated search engines of the age are not -intelligent enough to interpret clinicians' questions. These facts reflect -the urgent need for an information retrieval system that accepts queries -from medical practitioners in natural language and returns the answers quickly -and efficiently. In this paper, we present an implementation of a machine -intelligence based CLINIcal Question Answering system (CLINIQA) to answer -medical practitioners' questions. The system was rigorously evaluated on -different text mining algorithms and the best components for the system were -selected. The system makes use of the Unified Medical Language System for -semantic analysis of both questions and medical documents. In addition, the -system employs supervised machine learning algorithms for classification of the -documents, identifying the focus of the question and answer selection. -Effective domain-specific heuristics are designed for answer ranking. The -performance evaluation on a hundred clinical questions shows the effectiveness -of our approach. -" -7582,1805.05942,Xinya Du and Claire Cardie,Harvesting Paragraph-Level Question-Answer Pairs from Wikipedia,cs.CL," We study the task of generating from Wikipedia articles question-answer pairs -that cover content beyond a single sentence. We propose a neural network -approach that incorporates coreference knowledge via a novel gating mechanism. -Compared to models that only take into account sentence-level information -(Heilman and Smith, 2010; Du et al., 2017; Zhou et al., 2017), we find that the -linguistic knowledge introduced by the coreference representation aids question -generation significantly, producing models that outperform the current -state-of-the-art. We apply our system (composed of an answer span extraction -system and the passage-level QG system) to the 10,000 top-ranking Wikipedia -articles and create a corpus of over one million question-answer pairs. We also -provide a qualitative analysis for this large-scale generated corpus from -Wikipedia. -" -7583,1805.06016,"Vinodkumar Prabhakaran, Premkumar Ganeshkumar, Owen Rambow","Author Commitment and Social Power: Automatic Belief Tagging to Infer - the Social Context of Interactions",cs.CL," Understanding how social power structures affect the way we interact with one -another is of great interest to social scientists who want to answer -fundamental questions about human behavior, as well as to computer scientists -who want to build automatic methods to infer the social contexts of -interactions. In this paper, we employ advancements in extra-propositional -semantics extraction within NLP to study how author commitment reflects the -social context of an interaction. Specifically, we investigate whether the -level of commitment expressed by individuals in an organizational interaction -reflects the hierarchical power structures they are part of. We find that -subordinates use significantly more instances of non-commitment than superiors. -More importantly, we also find that subordinates attribute propositions to -other agents more often than superiors do --- an aspect that has not been -studied before. Finally, we show that enriching lexical features with -commitment labels captures important distinctions in social meanings. -" -7584,1805.06061,"Roy Schwartz, Sam Thomson, and Noah A. 
Smith","SoPa: Bridging CNNs, RNNs, and Weighted Finite-State Machines",cs.CL cs.AI cs.LG," Recurrent and convolutional neural networks comprise two distinct families of -models that have proven to be useful for encoding natural language utterances. -In this paper we present SoPa, a new model that aims to bridge these two -approaches. SoPa combines neural representation learning with weighted -finite-state automata (WFSAs) to learn a soft version of traditional surface -patterns. We show that SoPa is an extension of a one-layer CNN, and that such -CNNs are equivalent to a restricted version of SoPa, and accordingly, to a -restricted form of WFSA. Empirically, on three text classification tasks, SoPa -is comparable or better than both a BiLSTM (RNN) baseline and a CNN baseline, -and is particularly useful in small data settings. -" -7585,1805.06064,"Qingyun Wang, Zhihao Zhou, Lifu Huang, Spencer Whitehead, Boliang - Zhang, Heng Ji, Kevin Knight",Paper Abstract Writing through Editing Mechanism,cs.CL cs.AI," We present a paper abstract writing system based on an attentive neural -sequence-to-sequence model that can take a title as input and automatically -generate an abstract. We design a novel Writing-editing Network that can attend -to both the title and the previously generated abstract drafts and then -iteratively revise and polish the abstract. With two series of Turing tests, -where the human judges are asked to distinguish the system-generated abstracts -from human-written ones, our system passes Turing tests by junior domain -experts at a rate up to 30% and by non-expert at a rate up to 80%. -" -7586,1805.06087,"Ari Holtzman, Jan Buys, Maxwell Forbes, Antoine Bosselut, David Golub, - and Yejin Choi",Learning to Write with Cooperative Discriminators,cs.CL," Recurrent Neural Networks (RNNs) are powerful autoregressive sequence models, -but when used to generate natural language their output tends to be overly -generic, repetitive, and self-contradictory. We postulate that the objective -function optimized by RNN language models, which amounts to the overall -perplexity of a text, is not expressive enough to capture the notion of -communicative goals described by linguistic principles such as Grice's Maxims. -We propose learning a mixture of multiple discriminative models that can be -used to complement the RNN generator and guide the decoding process. Human -evaluation demonstrates that text generated by our system is preferred over -that of baselines by a large margin and significantly enhances the overall -coherence, style, and information content of the generated text. -" -7587,1805.06088,Yitong Li and Timothy Baldwin and Trevor Cohn,"What's in a Domain? Learning Domain-Robust Text Representations using - Adversarial Training",cs.CL," Most real world language problems require learning from heterogenous corpora, -raising the problem of learning robust models which generalise well to both -similar (in domain) and dissimilar (out of domain) instances to those seen in -training. This requires learning an underlying task, while not learning -irrelevant signals and biases specific to individual domains. We propose a -novel method to optimise both in- and out-of-domain accuracy based on joint -learning of a structured neural model with domain-specific and domain-general -components, coupled with adversarial training for domain. 
Evaluating on -multi-domain language identification and multi-domain sentiment analysis, we -show substantial improvements over standard domain adaptation techniques, and -domain-adversarial training. -" -7588,1805.06093,"Yitong Li, Timothy Baldwin, and Trevor Cohn",Towards Robust and Privacy-preserving Text Representations,cs.CL," Written text often provides sufficient clues to identify the author, their -gender, age, and other important attributes. Consequently, the authorship of -training and evaluation corpora can have unforeseen impacts, including -differing model performance for different user groups, as well as privacy -implications. In this paper, we propose an approach to explicitly obscure -important author characteristics at training time, such that representations -learned are invariant to these attributes. Evaluating on two tasks, we show -that this leads to increased privacy in the learned representations, as well as -more robust models to varying evaluation conditions, including out-of-domain -corpora. -" -7589,1805.06122,"Fei Liu, Trevor Cohn, Timothy Baldwin",Narrative Modeling with Memory Chains and Semantic Supervision,cs.CL," Story comprehension requires a deep semantic understanding of the narrative, -making it a challenging task. Inspired by previous studies on ROC Story Cloze -Test, we propose a novel method, tracking various semantic aspects with -external neural memory chains while encouraging each to focus on a particular -semantic aspect. Evaluated on the task of story ending prediction, our model -demonstrates superior performance to a collection of competitive baselines, -setting a new state of the art. -" -7590,1805.06130,"Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, Yang Liu",Towards Robust Neural Machine Translation,cs.CL," Small perturbations in the input can severely distort intermediate -representations and thus impact translation quality of neural machine -translation (NMT) models. In this paper, we propose to improve the robustness -of NMT models with adversarial stability training. The basic idea is to make -both the encoder and decoder in NMT models robust against input perturbations -by enabling them to behave similarly for the original input and its perturbed -counterpart. Experimental results on Chinese-English, English-German and -English-French translation tasks show that our approaches can not only achieve -significant improvements over strong NMT systems but also improve the -robustness of NMT models. -" -7591,1805.06145,"Zhen Wang, Jiachen Liu, Xinyan Xiao, Yajuan Lyu, Tian Wu","Joint Training of Candidate Extraction and Answer Selection for Reading - Comprehension",cs.CL," While sophisticated neural-based techniques have been developed in reading -comprehension, most approaches model the answer in an independent manner, -ignoring its relations with other answer candidates. This problem can be even -worse in open-domain scenarios, where candidates from multiple passages should -be combined to answer a single question. In this paper, we formulate reading -comprehension as an extract-then-select two-stage procedure. We first extract -answer candidates from passages, then select the final answer by combining -information from all the candidates. Furthermore, we regard candidate -extraction as a latent variable and train the two-stage process jointly with -reinforcement learning. As a result, our approach has improved the -state-of-the-art performance significantly on two challenging open-domain -reading comprehension datasets. 
Further analysis demonstrates the effectiveness
-of our model components, especially the information fusion of all the
-candidates and the joint training of the extract-then-select procedure.
-"
-7592,1805.06150,"Pararth Shah, Marek Fiser, Aleksandra Faust, J. Chase Kew, and Dilek
- Hakkani-Tur","FollowNet: Robot Navigation by Following Natural Language Directions
- with Deep Reinforcement Learning",cs.RO cs.AI cs.CL cs.LG," Understanding and following directions provided by humans can enable robots
-to navigate effectively in unknown situations. We present FollowNet, an
-end-to-end differentiable neural architecture for learning multi-modal
-navigation policies. FollowNet maps natural language instructions as well as
-visual and depth inputs to locomotion primitives. FollowNet processes
-instructions using an attention mechanism conditioned on its visual and depth
-input to focus on the relevant parts of the command while performing the
-navigation task. Deep reinforcement learning (RL) with a sparse reward
-simultaneously learns the state representation, the attention function, and the
-control policies. We evaluate our agent on a dataset of complex natural
-language directions that guide the agent through a rich and realistic set of
-simulated homes. We show that the FollowNet agent learns to execute previously
-unseen instructions described with a similar vocabulary, and successfully
-navigates along paths not encountered during training. The agent shows a 30%
-improvement over a baseline model without the attention mechanism, with a 52%
-success rate on novel instructions.
-"
-7593,1805.06165,"Noga Zaslavsky, Charles Kemp, Naftali Tishby, Terry Regier",Color naming reflects both perceptual structure and communicative need,cs.CL," Gibson et al. (2017) argued that color naming is shaped by patterns of
-communicative need. In support of this claim, they showed that color naming
-systems across languages support more precise communication about warm colors
-than cool colors, and that the objects we talk about tend to be warm-colored
-rather than cool-colored. Here, we present new analyses that alter this
-picture. We show that greater communicative precision for warm than for cool
-colors, and greater communicative need, may both be explained by perceptual
-structure. However, using an information-theoretic analysis, we also show that
-color naming across languages bears signs of communicative need beyond what
-would be predicted by perceptual structure alone. We conclude that color naming
-is shaped both by perceptual structure, as has traditionally been argued, and
-by patterns of communicative need, as argued by Gibson et al. - although for
-reasons other than those they advanced.
-"
-7594,1805.06201,Sosuke Kobayashi,"Contextual Augmentation: Data Augmentation by Words with Paradigmatic
- Relations",cs.CL cs.LG," We propose a novel data augmentation for labeled sentences called contextual
-augmentation. We assume an invariance that sentences are natural even if the
-words in the sentences are replaced with other words with paradigmatic
-relations. We stochastically replace words with other words that are predicted
-by a bi-directional language model at the word positions. Words predicted
-according to a context are numerous but appropriate for the augmentation of the
-original words. Furthermore, we retrofit a language model with a
-label-conditional architecture, which allows the model to augment sentences
-without breaking the label-compatibility.
Through experiments on six
-different text classification tasks, we demonstrate that the proposed method
-improves classifiers based on convolutional or recurrent neural networks.
-"
-7595,1805.06239,"Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu","A Comparison of Modeling Units in Sequence-to-Sequence Speech
- Recognition with the Transformer on Mandarin Chinese",eess.AS cs.CL cs.SD," The choice of modeling units is critical to automatic speech recognition
-(ASR) tasks. Conventional ASR systems typically choose context-dependent states
-(CD-states) or context-dependent phonemes (CD-phonemes) as their modeling
-units. However, this choice has been challenged by sequence-to-sequence
-attention-based models, which integrate an acoustic, pronunciation and language
-model into a single neural network. On English ASR tasks, previous attempts
-have already shown that the modeling unit of graphemes can outperform that of
-phonemes with sequence-to-sequence attention-based models.
- In this paper, we are concerned with modeling units on Mandarin Chinese ASR
-tasks using sequence-to-sequence attention-based models with the Transformer.
-Five modeling units are explored including context-independent phonemes
-(CI-phonemes), syllables, words, sub-words and characters. Experiments on HKUST
-datasets demonstrate that lexicon-free modeling units can outperform
-lexicon-related modeling units in terms of character error rate (CER). Among
-the five modeling units, the character-based model performs best and
-establishes a new state-of-the-art CER of $26.64\%$ on HKUST datasets without a
-hand-designed lexicon and an extra language model integration, which
-corresponds to a $4.8\%$ relative improvement over the existing best CER of
-$28.0\%$ by the joint CTC-attention based encoder-decoder network.
-"
-7596,1805.06242,"Chandrakant Bothe, Sven Magg, Cornelius Weber and Stefan Wermter","Conversational Analysis using Utterance-level Attention-based
- Bidirectional Recurrent Neural Networks",cs.CL cs.AI cs.HC cs.NE," Recent approaches for dialogue act recognition have shown that context from
-preceding utterances is important to classify the subsequent one. It was shown
-that the performance improves rapidly when the context is taken into account.
-We propose an utterance-level attention-based bidirectional recurrent neural
-network (Utt-Att-BiRNN) model to analyze the importance of preceding utterances
-to classify the current one. In our setup, the BiRNN is given the input set of
-current and preceding utterances. Our model outperforms previous models that
-use only preceding utterances as context on the used corpus. Another
-contribution of the article is to quantify the amount of information in each
-utterance for classifying the subsequent one and to show that context-based
-learning not only improves the performance but also achieves higher confidence
-in the classification. We use character- and word-level features to represent
-the utterances. The results are presented for character and word feature
-representations and as an ensemble model of both representations. We found that
-when classifying short utterances, the closest preceding utterances contribute
-to a higher degree.
-"
-7597,1805.06266,"Wan-Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, Min
- Sun","A Unified Model for Extractive and Abstractive Summarization using
- Inconsistency Loss",cs.CL," We propose a unified model combining the strengths of extractive and
-abstractive summarization.
On the one hand, a simple extractive model can
-obtain sentence-level attention with high ROUGE scores, but its output is less
-readable. On the other hand, a more complicated abstractive model can obtain
-word-level dynamic attention to generate a more readable paragraph. In our
-model, sentence-level attention is used to modulate the word-level attention
-such that words in less attended sentences are less likely to be generated.
-Moreover, a novel inconsistency loss function is introduced to penalize the
-inconsistency between the two levels of attention. By end-to-end training our
-model with the inconsistency loss and the original losses of the extractive and
-abstractive models, we achieve state-of-the-art ROUGE scores while producing
-the most informative and readable summaries on the CNN/Daily Mail dataset in a
-solid human evaluation.
-"
-7598,1805.06280,"Chandrakant Bothe, Cornelius Weber, Sven Magg, and Stefan Wermter","A Context-based Approach for Dialogue Act Recognition using Simple
- Recurrent Neural Networks",cs.CL cs.AI cs.HC cs.NE," Dialogue act recognition is an important part of natural language
-understanding. We investigate the way dialogue act corpora are annotated and
-the learning approaches used so far. We find that the dialogue act is
-context-sensitive within the conversation for most of the classes.
-Nevertheless, previous models of dialogue act classification work on the
-utterance-level and only very few consider context. We propose a novel
-context-based learning method to classify dialogue acts using a character-level
-language model utterance representation, and we notice significant improvement.
-We evaluate this method on the Switchboard Dialogue Act corpus, and our results
-show that the consideration of the preceding utterances as a context of the
-current utterance improves dialogue act detection.
-"
-7599,1805.06289,"Firoj Alam, Shafiq Joty, Muhammad Imran","Graph Based Semi-supervised Learning with Convolution Neural Networks to
- Classify Crisis Related Tweets",cs.CY cs.CL cs.IR cs.SI," During time-critical situations such as natural disasters, rapid
-classification of data posted on social networks by affected people is useful
-for humanitarian organizations to gain situational awareness and to plan
-response efforts. However, the scarcity of labeled data in the early hours of a
-crisis hinders machine learning tasks and thus delays crisis response. In this
-work, we propose to use an inductive semi-supervised technique to utilize
-unlabeled data, which is often abundant at the onset of a crisis event, along
-with fewer labeled data. Specifically, we adopt a graph-based deep learning
-framework to learn an inductive semi-supervised model. We use two real-world
-crisis datasets from Twitter to evaluate the proposed approach. Our results
-show significant improvements using unlabeled data as compared to only using
-labeled data.
-"
-7600,1805.06297,"Mikel Artetxe, Gorka Labaka, Eneko Agirre","A robust self-learning method for fully unsupervised cross-lingual
- mappings of word embeddings",cs.CL cs.AI cs.LG," Recent work has managed to learn cross-lingual word embeddings without
-parallel data by mapping monolingual embeddings to a shared space through
-adversarial training. However, their evaluation has focused on favorable
-conditions, using comparable corpora or closely-related languages, and we show
-that they often fail in more realistic scenarios.
This work proposes an
-alternative approach based on a fully unsupervised initialization that
-explicitly exploits the structural similarity of the embeddings, and a robust
-self-learning algorithm that iteratively improves this solution. Our method
-succeeds in all tested scenarios and obtains the best published results in
-standard datasets, even surpassing previous supervised systems. Our
-implementation is released as an open source project at
-https://github.com/artetxem/vecmap
-"
-7601,1805.06344,"Rita Hijazi, Amani Sabra, Moustafa Al-Hajj",Automatic Annotation of Locative and Directional Expressions in Arabic,cs.CL," In this paper, we introduce a rule-based approach to annotate Locative and
-Directional Expressions in Arabic natural language text. The annotation is
-based on a constructed semantic map of the spatiality domain. Challenges are
-twofold: first, we need to study how locative and directional expressions are
-expressed linguistically in these texts; and second, we need to automatically
-annotate the relevant textual segments accordingly. The research method we use
-in this article is analytic-descriptive. We validate this approach on a
-specific novel rich in these expressions and show that it yields very promising
-results. We use NOOJ as a software tool to implement finite-state transducers
-to annotate linguistic elements according to Locative and Directional
-Expressions. In conclusion, NOOJ allowed us to write linguistic rules for the
-automatic annotation of Locative and Directional Expressions in Arabic text.
-"
-7602,1805.06375,"Debanjan Mahata, Jasper Friedrichs, Hitkul, Rajiv Ratn Shah","#phramacovigilance - Exploring Deep Learning Techniques for Identifying
- Mentions of Medication Intake from Twitter",cs.CL," Mining social media messages for health and drug related information has
-received significant interest in pharmacovigilance research. Social media sites
-(e.g., Twitter) have been used for monitoring drug abuse, adverse reactions of
-drug usage and analyzing expression of sentiments related to drugs. Most of
-these studies are based on aggregated results from a large population rather
-than specific sets of individuals. In order to conduct studies at an individual
-level or on specific cohorts, identifying posts mentioning intake of medicine
-by the user is necessary. Towards this objective, we train different deep
-neural network classification models on a publicly available annotated dataset
-and study their performances on identifying mentions of personal intake of
-medicine in tweets. We also design and train a new architecture of a stacked
-ensemble of shallow convolutional neural network (CNN) ensembles. We use random
-search for tuning the hyperparameters of the models and share the details of
-the values taken by the hyperparameters for the best learnt model in different
-deep neural network architectures. Our system produces state-of-the-art
-results, with a micro-averaged F-score of 0.693.
-"
-7603,1805.06383,"Arturo Argueta, David Chiang",Composing Finite State Transducers on GPUs,cs.CL cs.DC," Weighted finite-state transducers (FSTs) are frequently used in language
-processing to handle tasks such as part-of-speech tagging and speech
-recognition. There has been previous work using multiple CPU cores to
-accelerate finite state algorithms, but limited attention has been given to
-parallel graphics processing unit (GPU) implementations.
In this paper, we
-introduce the first (to our knowledge) GPU implementation of the FST
-composition operation, and we also discuss the optimizations used to achieve
-the best performance on this architecture. We show that our approach obtains
-speedups of up to 6x over our serial implementation and 4.5x over OpenFST.
-"
-7604,1805.06413,"Devamanyu Hazarika, Soujanya Poria, Sruthi Gorantla, Erik Cambria,
- Roger Zimmermann, Rada Mihalcea",CASCADE: Contextual Sarcasm Detection in Online Discussion Forums,cs.CL," The literature in automated sarcasm detection has mainly focused on lexical,
-syntactic and semantic-level analysis of text. However, a sarcastic sentence
-can be expressed with contextual presumptions, background and commonsense
-knowledge. In this paper, we propose CASCADE (a ContextuAl SarCasm DEtector)
-that adopts a hybrid approach of both content and context-driven modeling for
-sarcasm detection in online social media discussions. For the latter, CASCADE
-aims at extracting contextual information from the discourse of a discussion
-thread. Also, since the sarcastic nature and form of expression can vary from
-person to person, CASCADE utilizes user embeddings that encode stylometric and
-personality features of the users. When used along with content-based feature
-extractors such as Convolutional Neural Networks (CNNs), we see a significant
-boost in the classification performance on a large Reddit corpus.
-"
-7605,1805.06502,"Qingxiang Wang, Cezary Kaliszyk, Josef Urban","First Experiments with Neural Translation of Informal to Formal
- Mathematics",cs.CL cs.AI cs.LG cs.LO," We report on our experiments to train deep neural networks that automatically
-translate informalized LaTeX-written Mizar texts into the formal Mizar
-language. To the best of our knowledge, this is the first time that neural
-networks have been adopted in the formalization of mathematics. Using Luong et
-al.'s neural machine translation model (NMT), we tested our aligned
-informal-formal corpora against various hyperparameters and evaluated their
-results. Our experiments show that our best performing model configurations are
-able to generate correct Mizar statements on 65.73\% of the inference data,
-with the union of all models covering 79.17\%. These results indicate that
-formalization through artificial neural networks is a promising approach for
-automated formalization of mathematics. We present several case studies to
-illustrate our results.
-"
-7606,1805.06503,Ameet Deshpande and Vedant Somani,Weight Initialization in Neural Language Models,cs.CL cs.AI," Semantic Similarity is an important task which finds its use in many
-downstream NLP applications. Though the task is mathematically defined,
-semantic similarity's essence is to capture the notions of similarity
-ingrained in humans. Machines use some heuristics to calculate the similarity
-between words, but these are typically corpus dependent or are useful for
-specific domains. The difference between Semantic Similarity and Semantic
-Relatedness motivates the development of new algorithms. For a human, the words
-car and road are probably as related as car and bus. But this may not be the
-case for computational methods. Ontological methods are good at encoding
-Semantic Similarity and Vector Space models are better at encoding Semantic
-Relatedness. There is a dearth of methods which leverage ontologies to create
-better vector representations.
The aim of this proposal is to explore a hybrid
-method which combines statistical/vector space methods like Word2Vec and
-Ontological methods like WordNet to leverage the advantages provided by both.
-"
-7607,1805.06504,"Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du",Analogical Reasoning on Chinese Morphological and Semantic Relations,cs.CL cs.AI," Analogical reasoning is effective in capturing linguistic regularities. This
-paper proposes an analogical reasoning task on Chinese. After delving into
-Chinese lexical knowledge, we sketch 68 implicit morphological relations and 28
-explicit semantic relations. A large and balanced dataset, CA8, is then built
-for this task, comprising 17,813 questions. Furthermore, we systematically
-explore the influences of vector representations, context features, and corpora
-on analogical reasoning. The experiments show that CA8 is a reliable benchmark
-for evaluating Chinese word embeddings.
-"
-7608,1805.06510,"Po Chen Kuo, Fernando H. Calderon Alvarado, Yi-Shin Chen",Facebook Reaction-Based Emotion Classifier as Cue for Sarcasm Detection,cs.CL cs.CY," Online social media users react to content on these platforms based on
-context. Emotions or mood play a significant part in these reactions, which has
-filled these platforms with opinionated content. Different approaches and
-applications to make better use of this data are continuously being developed.
-However, due to the nature of the data, the variety of platforms, and dynamic
-online user behavior, there are still many issues to be dealt with. It remains
-a challenge to properly obtain a reliable emotional status from a user prior to
-posting a comment. This work introduces a methodology that explores
-semi-supervised multilingual emotion detection based on the overlap of Facebook
-reactions and textual data. With the resulting emotion detection system we
-evaluate the possibility of using emotions and user behavior features for the
-task of sarcasm detection. More than 1 million English and Chinese comments
-from over 62,000 public Facebook page posts have been collected and processed;
-the conducted experiments show acceptable performance metrics.
-"
-7609,1805.06511,"Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost","Improving End-of-turn Detection in Spoken Dialogues by Detecting Speaker
- Intentions as a Secondary Task",cs.CL cs.AI," This work focuses on the use of acoustic cues for modeling turn-taking in
-dyadic spoken dialogues. Previous work has shown that speaker intentions (e.g.,
-asking a question, uttering a backchannel, etc.) can influence turn-taking
-behavior and are good predictors of turn-transitions in spoken dialogues.
-However, speaker intentions are not readily available for use by automated
-systems at run-time, making it difficult to use this information to anticipate
-a turn-transition. To this end, we propose a multi-task neural approach for
-predicting turn-transitions and speaker intentions simultaneously. Our results
-show that adding the auxiliary task of speaker intention prediction improves
-the performance of turn-transition prediction in spoken dialogues, without
-relying on additional input features during run-time.
-" -7610,1805.06521,"Siamak Barzegar, Andre Freitas, Siegfried Handschuh and Brian Davis",Composite Semantic Relation Classification,cs.CL," Different semantic interpretation tasks such as text entailment and question -answering require the classification of semantic relations between terms or -entities within text. However, in most cases it is not possible to assign a -direct semantic relation between entities/terms. This paper proposes an -approach for composite semantic relation classification, extending the -traditional semantic relation classification task. Different from existing -approaches, which use machine learning models built over lexical and -distributional word vector features, the proposed model uses the combination of -a large commonsense knowledge base of binary relations, a distributional -navigational algorithm and sequence classification to provide a solution for -the composite semantic relation classification problem. -" -7611,1805.06522,"Andre Freitas, Siamak Barzegar, Juliano Efson Sales, Siegfried - Handschuh and Brian Davis","Semantic Relatedness for All (Languages): A Comparative Analysis of - Multilingual Semantic Relatedness Using Machine Translation",cs.CL," This paper provides a comparative analysis of the performance of four -state-of-the-art distributional semantic models (DSMs) over 11 languages, -contrasting the native language-specific models with the use of machine -translation over English-based DSMs. The experimental results show that there -is a significant improvement (average of 16.7% for the Spearman correlation) by -using state-of-the-art machine translation approaches. The results also show -that the benefit of using the most informative corpus outweighs the possible -errors introduced by the machine translation. For all languages, the -combination of machine translation over the Word2Vec English distributional -model provided the best results consistently (average Spearman correlation of -0.68). -" -7612,1805.06524,"Ming Li, Peilun Xiao, and Ju Zhang",Hybrid Adaptive Fuzzy Extreme Learning Machine for text classification,cs.IR cs.AI cs.CL cs.LG," In traditional ELM and its improved versions suffer from the problems of -outliers or noises due to overfitting and imbalance due to distribution. We -propose a novel hybrid adaptive fuzzy ELM(HA-FELM), which introduces a fuzzy -membership function to the traditional ELM method to deal with the above -problems. We define the fuzzy membership function not only basing on the -distance between each sample and the center of the class but also the density -among samples which based on the quantum harmonic oscillator model. The -proposed fuzzy membership function overcomes the shortcoming of the traditional -fuzzy membership function and could make itself adjusted according to the -specific distribution of different samples adaptively. Experiments show the -proposed HA-FELM can produce better performance than SVM, ELM, and RELM in text -classification. -" -7613,1805.06525,"Ming Li, Peilun Xiao, and Ju Zhang",Text classification based on ensemble extreme learning machine,cs.IR cs.AI cs.CL cs.LG," In this paper, we propose a novel approach based on cost-sensitive ensemble -weighted extreme learning machine; we call this approach AE1-WELM. We apply -this approach to text classification. AE1-WELM is an algorithm including -balanced and imbalanced multiclassification for text classification. 
Weighted
-ELM, which assigns different weights to different samples, improves the
-classification accuracy to a certain extent, but it considers only the
-differences between samples in different categories and ignores the
-differences between samples within the same category. We measure the
-importance of the documents by sample information entropy, generate a
-cost-sensitive matrix and factor based on the document importance, and then
-embed the cost-sensitive weighted ELM seamlessly into the AdaBoost.M1
-framework. Vector space model (VSM) text representation produces
-high-dimensional, sparse features which increase the burden on ELM. To
-overcome this problem, we develop a text classification framework combining
-the word vector and AE1-WELM. The experimental results show that our method
-provides an accurate, reliable and effective solution for text classification.
-"
-7614,1805.06533,"Hannah Rashkin, Antoine Bosselut, Maarten Sap, Kevin Knight and Yejin
- Choi",Modeling Naive Psychology of Characters in Simple Commonsense Stories,cs.CL," Understanding a narrative requires reading between the lines and reasoning
-about the unspoken but obvious implications about events and people's mental
-states - a capability that is trivial for humans but remarkably hard for
-machines. To facilitate research addressing this challenge, we introduce a new
-annotation framework to explain naive psychology of story characters as
-fully-specified chains of mental states with respect to motivations and
-emotional reactions. Our work presents a new large-scale dataset with rich
-low-level annotations and establishes baseline performance on several new
-tasks, suggesting avenues for future research.
-"
-7615,1805.06536,"Ond\v{r}ej C\'ifka, Ond\v{r}ej Bojar",Are BLEU and Meaning Representation in Opposition?,cs.CL," One of the possible ways of obtaining continuous-space sentence
-representations is by training neural machine translation (NMT) systems. The
-recent attention mechanism however removes the single point in the neural
-network from which the source sentence representation can be extracted. We
-propose several variations of the attentive NMT architecture bringing this
-meeting point back. Empirical evaluation suggests that the better the
-translation quality, the worse the learned sentence representations serve in a
-wide range of classification and similarity tasks.
-"
-7616,1805.06549,"Pranava Madhyastha, Josiah Wang, Lucia Specia",Defoiling Foiled Image Captions,cs.CV cs.AI cs.CL," We address the task of detecting foiled image captions, i.e. identifying
-whether a caption contains a word that has been deliberately replaced by a
-semantically similar word, thus rendering it inaccurate with respect to the
-image being described. Solving this problem should in principle require a
-fine-grained understanding of images to detect linguistically valid
-perturbations in captions. In such contexts, encoding sufficiently descriptive
-image information becomes a key challenge. In this paper, we demonstrate that
-it is possible to solve this task using simple, interpretable yet powerful
-representations based on explicit object information. Our models achieve
-state-of-the-art performance on a standard dataset, with scores exceeding those
-achieved by humans on the task. We also measure the upper-bound performance of
-our models using gold standard annotations.
Our analysis reveals that the
-simpler model performs well even without image information, suggesting that the
-dataset contains strong linguistic bias.
-"
-7617,1805.06553,"Juraj Juraska, Panagiotis Karagiannis, Kevin K. Bowden, Marilyn A.
- Walker","A Deep Ensemble Model with Slot Alignment for Sequence-to-Sequence
- Natural Language Generation",cs.CL," Natural language generation lies at the core of generative dialogue systems
-and conversational agents. We describe an ensemble neural language generator,
-and present several novel methods for data representation and augmentation that
-yield improved results in our model. We test the model on three datasets in the
-restaurant, TV and laptop domains, and report both objective and subjective
-evaluations of our best model. Using a range of automatic metrics, as well as
-human evaluators, we show that our approach achieves better results than
-state-of-the-art models on the same datasets.
-"
-7618,1805.06556,"Vidur Joshi, Matthew Peters, Mark Hopkins","Extending a Parser to Distant Domains Using a Few Dozen Partially
- Annotated Examples",cs.CL," We revisit domain adaptation for parsers in the neural era. First we show
-that recent advances in word representations greatly diminish the need for
-domain adaptation when the target domain is syntactically similar to the source
-domain. As evidence, we train a parser on the Wall Street Journal alone that
-achieves over 90% F1 on the Brown corpus. For more syntactically distant
-domains, we provide a simple way to adapt a parser using only dozens of partial
-annotations. For instance, we increase the percentage of error-free
-geometry-domain parses in a held-out set from 45% to 73% using approximately
-five dozen training examples. In the process, we demonstrate a new
-state-of-the-art single model result on the Wall Street Journal test set of
-94.3%. This is an absolute increase of 1.7% over the previous state-of-the-art
-of 92.6%.
-"
-7619,1805.06566,"Shivashankar Subramanian, Timothy Baldwin, Trevor Cohn","Content-based Popularity Prediction of Online Petitions Using a Deep
- Regression Model",cs.CL," Online petitions are a cost-effective way for citizens to collectively engage
-with policy-makers in a democracy. Predicting the popularity of a petition ---
-commonly measured by its signature count --- based on its textual content has
-utility for policy-makers as well as those posting the petition. In this work,
-we model this task using CNN regression with an auxiliary ordinal regression
-objective. We demonstrate the effectiveness of our proposed approach using UK
-and US government petition datasets.
-"
-7620,1805.06593,"Chang Xu, Cecile Paris, Surya Nepal, Ross Sparks",Cross-Target Stance Classification with Self-Attention Networks,cs.CL cs.AI," In stance classification, the target on which the stance is made defines the
-boundary of the task, and a classifier is usually trained for prediction on the
-same target. In this work, we explore the potential for generalizing
-classifiers between different targets, and propose a neural model that can
-apply what has been learned from a source target to a destination target. We
-show that our model can find useful information shared between relevant targets
-which improves generalization in certain scenarios.
-" -7621,1805.06606,"Chan Woo Lee, Kyu Ye Song, Jihoon Jeong, Woo Yong Choi","Convolutional Attention Networks for Multimodal Emotion Recognition from - Speech and Text Data",cs.CL cs.AI cs.HC," Emotion recognition has become a popular topic of interest, especially in the -field of human computer interaction. Previous works involve unimodal analysis -of emotion, while recent efforts focus on multi-modal emotion recognition from -vision and speech. In this paper, we propose a new method of learning about the -hidden representations between just speech and text data using convolutional -attention networks. Compared to the shallow model which employs simple -concatenation of feature vectors, the proposed attention model performs much -better in classifying emotion from speech and text data contained in the -CMU-MOSEI dataset. -" -7622,1805.06648,"Jeff Mitchell, Pasquale Minervini, Pontus Stenetorp and Sebastian - Riedel",Extrapolation in NLP,cs.CL," We argue that extrapolation to examples outside the training space will often -be easier for models that capture global structures, rather than just maximise -their local fit to the training data. We show that this is true for two popular -models: the Decomposable Attention Model and word2vec. -" -7623,1805.06665,"Bin He, Yi Guan, Rui Dai","Classifying medical relations in clinical text via convolutional neural - networks",cs.CL," Deep learning research on relation classification has achieved solid -performance in the general domain. This study proposes a convolutional neural -network (CNN) architecture with a multi-pooling operation for medical relation -classification on clinical records and explores a loss function with a -category-level constraint matrix. Experiments using the 2010 i2b2/VA relation -corpus demonstrate these models, which do not depend on any external features, -outperform previous single-model methods and our best model is competitive with -the existing ensemble-based method. -" -7624,1805.06816,"Preethi Raghavan, Siddharth Patwardhan, Jennifer J. Liang, Murthy V. - Devarakonda",Annotating Electronic Medical Records for Question Answering,cs.CL cs.CY," Our research is in the relatively unexplored area of question answering -technologies for patient-specific questions over their electronic health -records. A large dataset of human expert curated question and answer pairs is -an important pre-requisite for developing, training and evaluating any question -answering system that is powered by machine learning. In this paper, we -describe a process for creating such a dataset of questions and answers. Our -methodology is replicable, can be conducted by medical students as annotators, -and results in high inter-annotator agreement (0.71 Cohen's kappa). Over the -course of 11 months, 11 medical students followed our annotation methodology, -resulting in a question answering dataset of 5696 questions over 71 patient -records, of which 1747 questions have corresponding answers generated by the -medical students. -" -7625,1805.06879,James P. Bagrow and Daniel Berenberg and Joshua Bongard,Neural language representations predict outcomes of scientific research,cs.CL cs.AI cs.CY cs.LG stat.ML," Many research fields codify their findings in standard formats, often by -reporting correlations between quantities of interest. But the space of all -testable correlates is far larger than scientific resources can currently -address, so the ability to accurately predict correlations would be useful to -plan research and allocate resources. 
Using a dataset of approximately 170,000
-correlational findings extracted from leading social science journals, we show
-that a trained neural network can accurately predict the reported correlations
-using only the text descriptions of the correlates. Accurate predictive models
-such as these can guide scientists towards promising untested correlates,
-better quantify the information gained from new findings, and have implications
-for moving artificial intelligence systems from predicting structures to
-predicting relationships in the real world.
-"
-7626,1805.06939,"Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith and Yejin
- Choi","Event2Mind: Commonsense Inference on Events, Intents, and Reactions",cs.CL," We investigate a new commonsense inference task: given an event described in
-a short free-form text (""X drinks coffee in the morning""), a system reasons
-about the likely intents (""X wants to stay awake"") and reactions (""X feels
-alert"") of the event's participants. To support this study, we construct a new
-crowdsourced corpus of 25,000 event phrases covering a diverse range of
-everyday events and situations. We report baseline performance on this task,
-demonstrating that neural encoder-decoder models can successfully compose
-embedding representations of previously unseen events and reason about the
-likely intents and reactions of the event participants. In addition, we
-demonstrate how commonsense inference on people's intents and reactions can
-help unveil the implicit gender inequality prevalent in modern movie scripts.
-"
-7627,1805.06960,"Ravi Shekhar, Tim Baumgartner, Aashish Venkatesh, Elia Bruni,
- Raffaella Bernardi, Raquel Fernandez",Ask No More: Deciding when to guess in referential visual dialogue,cs.CL cs.CV cs.MM," Our goal is to explore how the abilities brought in by a dialogue manager can
-be included in end-to-end visually grounded conversational agents. We make
-initial steps towards this general goal by augmenting a task-oriented visual
-dialogue model with a decision-making component that decides whether to ask a
-follow-up question to identify a target referent in an image, or to stop the
-conversation to make a guess. Our analyses show that adding a decision making
-component produces dialogues that are less repetitive and that include fewer
-unnecessary questions, thus potentially leading to more efficient and less
-unnatural interactions.
-"
-7628,1805.06966,"Florian Kreyssig, Inigo Casanueva, Pawel Budzianowski, Milica Gasic","Neural User Simulation for Corpus-based Policy Optimisation for Spoken
- Dialogue Systems",cs.CL cs.AI stat.ML," User Simulators are one of the major tools that enable offline training of
-task-oriented dialogue systems. For this task the Agenda-Based User Simulator
-(ABUS) is often used. The ABUS is based on hand-crafted rules and its output is
-in semantic form. Issues arise from both properties such as limited diversity
-and the inability to interface a text-level belief tracker. This paper
-introduces the Neural User Simulator (NUS) whose behaviour is learned from a
-corpus and which generates natural language, hence needing a less labelled
-dataset than simulators generating a semantic output. In comparison to much of
-the past work on this topic, which evaluates user simulators on corpus-based
-metrics, we use the NUS to train the policy of a reinforcement learning based
-Spoken Dialogue System. The NUS is compared to the ABUS by evaluating the
-policies that were trained using the simulators.
Cross-model evaluation is
-performed, i.e., training on one simulator and testing on the other.
-Furthermore, the trained policies are tested on real users. In both evaluation
-tasks the NUS outperformed the ABUS.
-"
-7629,1805.06975,"Bhavana Dalvi Mishra, Lifu Huang, Niket Tandon, Wen-tau Yih, Peter
- Clark","Tracking State Changes in Procedural Text: A Challenge Dataset and
- Models for Process Paragraph Comprehension",cs.CL," We present a new dataset and models for comprehending paragraphs about
-processes (e.g., photosynthesis), an important genre of text describing a
-dynamic world. The new dataset, ProPara, is the first to contain natural
-(rather than machine-generated) text about a changing world along with a full
-annotation of entity states (location and existence) during those changes (81k
-datapoints). The end-task, tracking the location and existence of entities
-through the text, is challenging because the causal effects of actions are
-often implicit and need to be inferred. We find that previous models that have
-worked well on synthetic data achieve only mediocre performance on ProPara, and
-introduce two new neural models that exploit alternative mechanisms for state
-prediction, in particular using LSTM input encoding and span prediction. The
-new models improve accuracy by up to 19%. The dataset and models are available
-to the community at http://data.allenai.org/propara.
-"
-7630,1805.06995,Juneki Hong and Liang Huang,Linear-Time Constituency Parsing with RNNs and Dynamic Programming,cs.CL," Recently, span-based constituency parsing has achieved competitive accuracies
-with extremely simple models by using bidirectional RNNs to model ""spans"".
-However, the minimal span parser of Stern et al. (2017a), which holds the
-current state-of-the-art accuracy, is a chart parser running in cubic time,
-$O(n^3)$, which is too slow for longer sentences and for applications beyond
-sentence boundaries such as end-to-end discourse parsing and joint sentence
-boundary detection and parsing. We propose a linear-time constituency parser
-with RNNs and dynamic programming using a graph-structured stack and beam
-search, which runs in time $O(n b^2)$ where $b$ is the beam size. We further
-speed this up to $O(n b\log b)$ by integrating cube pruning. Compared with
-chart parsing baselines, this linear-time parser is substantially faster for
-long sentences on the Penn Treebank and orders of magnitude faster for
-discourse parsing, and achieves the highest F1 accuracy on the Penn Treebank
-among single model end-to-end systems.
-"
-7631,1805.07024,"Jie Li, Xiaorui Wang, Yuanyuan Zhao, Yan Li",Gated Recurrent Unit Based Acoustic Modeling with Future Context,cs.CL cs.SD eess.AS," The use of future contextual information is typically shown to be helpful for
-acoustic modeling. However, for recurrent neural networks (RNNs), it is not
-easy to model the future temporal context effectively while keeping the model
-latency low. In this paper, we attempt to design an RNN acoustic model that is
-capable of utilizing the future context effectively and directly, with the
-model latency and computation cost as low as possible. The proposed model
-is based on the minimal gated recurrent unit (mGRU) with an input projection
-layer inserted in it. Two context modules, temporal encoding and temporal
-convolution, are specifically designed for this architecture to model the
-future context.
Experimental results on the Switchboard task and an internal
-Mandarin ASR task show that the proposed model performs much better than long
-short-term memory (LSTM) and mGRU models, while enabling online decoding with
-a maximum latency of 170 ms. This model even outperforms a very strong
-baseline, TDNN-LSTM, with smaller model latency and almost half as many
-parameters.
-"
-7632,1805.07043,Wei Xue and Tao Li,Aspect Based Sentiment Analysis with Gated Convolutional Networks,cs.CL," Aspect based sentiment analysis (ABSA) can provide more detailed information
-than general sentiment analysis, because it aims to predict the sentiment
-polarities of the given aspects or entities in text. We summarize previous
-approaches into two subtasks: aspect-category sentiment analysis (ACSA) and
-aspect-term sentiment analysis (ATSA). Most previous approaches employ long
-short-term memory and attention mechanisms to predict the sentiment polarity of
-the concerned targets, which are often complicated and need more training time.
-We propose a model based on convolutional neural networks and gating
-mechanisms, which is more accurate and efficient. First, the novel Gated
-Tanh-ReLU Units can selectively output the sentiment features according to the
-given aspect or entity. The architecture is much simpler than the attention
-layers used in existing models. Second, the computations of our model could be
-easily parallelized during training, because convolutional layers do not have
-time dependency as in LSTM layers, and gating units also work independently.
-The experiments on SemEval datasets demonstrate the efficiency and
-effectiveness of our models.
-"
-7633,1805.07049,"Taeuk Kim, Jihun Choi, Sang-goo Lee","SNU_IDS at SemEval-2018 Task 12: Sentence Encoder with Contextualized
- Vectors for Argument Reasoning Comprehension",cs.CL," We present a novel neural architecture for the Argument Reasoning
-Comprehension task of SemEval 2018. It is a simple neural network consisting of
-three parts, collectively judging whether the logic built on a set of given
-sentences (a claim, reason, and warrant) is plausible or not. The model
-utilizes contextualized word vectors pre-trained on large machine translation
-(MT) datasets as a form of transfer learning, which can help to mitigate the
-lack of training data. Quantitative analysis shows that simply leveraging LSTMs
-trained on MT datasets outperforms several baselines and non-transferred
-models, achieving accuracies of about 70% on the development set and about 60%
-on the test set.
-"
-7634,1805.07133,"Thi-Vinh Ngo, Thanh-Le Ha, Phuong-Thai Nguyen, Le-Minh Nguyen","Combining Advanced Methods in Japanese-Vietnamese Neural Machine
- Translation",cs.CL," Neural machine translation (NMT) systems have recently achieved
-state-of-the-art results for many popular language pairs because of the
-availability of data. For low-resourced language pairs, there is little
-research in this field due to the lack of bilingual data. In this paper, we
-attempt to build the first NMT systems for a low-resourced language pair:
-Japanese-Vietnamese. We have also shown significant improvements when
-combining advanced methods to reduce the adverse impacts of data sparsity and
-improve the quality of NMT systems. In addition, we propose a variant of the
-Byte-Pair Encoding algorithm to perform effective word segmentation for
-Vietnamese texts and alleviate the rare-word problem that persists in NMT
-systems.
-" -7635,1805.07143,"Chris Emmery, Enrique Manjavacas, Grzegorz Chrupa{\l}a",Style Obfuscation by Invariance,cs.CL," The task of obfuscating writing style using sequence models has previously -been investigated under the framework of obfuscation-by-transfer, where the -input text is explicitly rewritten in another style. These approaches also -often lead to major alterations to the semantic content of the input. In this -work, we propose obfuscation-by-invariance, and investigate to what extent -models trained to be explicitly style-invariant preserve semantics. We evaluate -our architectures on parallel and non-parallel corpora, and compare automatic -and human evaluations on the obfuscated sentences. Our experiments show that -style classifier performance can be reduced to chance level, whilst the -automatic evaluation of the output is seemingly equal to models applying -style-transfer. However, based on human evaluation we demonstrate a trade-off -between the level of obfuscation and the observed quality of the output in -terms of meaning preservation and grammaticality. -" -7636,1805.07231,"Eug\'enio Ribeiro, Ricardo Ribeiro, and David Martins de Matos",A Study on Dialog Act Recognition using Character-Level Tokenization,cs.CL," Dialog act recognition is an important step for dialog systems since it -reveals the intention behind the uttered words. Most approaches on the task use -word-level tokenization. In contrast, this paper explores the use of -character-level tokenization. This is relevant since there is information at -the sub-word level that is related to the function of the words and, thus, -their intention. We also explore the use of different context windows around -each token, which are able to capture important elements, such as affixes. -Furthermore, we assess the importance of punctuation and capitalization. We -performed experiments on both the Switchboard Dialog Act Corpus and the DIHANA -Corpus. In both cases, the experiments not only show that character-level -tokenization leads to better performance than the typical word-level -approaches, but also that both approaches are able to capture complementary -information. Thus, the best results are achieved by combining tokenization at -both levels. -" -7637,1805.07274,"Ghulam Ahmed Ansari, Sagar J P, Sarath Chandar, Balaraman Ravindran",Language Expansion In Text-Based Games,cs.CL cs.AI," Text-based games are suitable test-beds for designing agents that can learn -by interaction with the environment in the form of natural language text. Very -recently, deep reinforcement learning based agents have been successfully -applied for playing text-based games. In this paper, we explore the possibility -of designing a single agent to play several text-based games and of expanding -the agent's vocabulary using the vocabulary of agents trained for multiple -games. To this extent, we explore the application of recently proposed policy -distillation method for video games to the text-based game setting. We also use -text-based games as a test-bed to analyze and hence understand policy -distillation approach in detail. -" -7638,1805.07340,Siddhartha Brahma,Improved Sentence Modeling using Suffix Bidirectional LSTM,cs.LG cs.AI cs.CL stat.ML," Recurrent neural networks have become ubiquitous in computing representations -of sequential data, especially textual data in natural language processing. 
In
-particular, Bidirectional LSTMs are at the heart of several neural models
-achieving state-of-the-art performance in a wide variety of tasks in NLP.
-However, BiLSTMs are known to suffer from sequential bias - the contextual
-representation of a token is heavily influenced by tokens close to it in a
-sentence. We propose a general and effective improvement to the BiLSTM model
-which encodes each suffix and prefix of a sequence of tokens in both forward
-and reverse directions. We call our model Suffix Bidirectional LSTM or
-SuBiLSTM. This introduces an alternate bias that favors long range
-dependencies. We apply SuBiLSTMs to several tasks that require sentence
-modeling. We demonstrate that using SuBiLSTM instead of a BiLSTM in existing
-models leads to improvements in performance in learning general sentence
-representations, text classification, textual entailment and paraphrase
-detection. Using SuBiLSTM we achieve new state-of-the-art results for
-fine-grained sentiment classification and question classification.
-"
-7639,1805.07398,"Abhijit Mahabal, Dan Roth, Sid Mittal",Robust Handling of Polysemy via Sparse Representations,cs.CL cs.AI," Words are polysemous and multi-faceted, with many shades of meanings. We
-suggest that sparse distributed representations are more suitable than other,
-commonly used, (dense) representations to express these multiple facets, and
-present Category Builder, a working system that, as we show, makes use of
-sparse representations to support multi-faceted lexical representations. We
-argue that the set expansion task is well suited to study these meaning
-distinctions since a word may belong to multiple sets with a different reason
-for membership in each. We therefore exhibit the performance of Category
-Builder on this task, while showing that our representation captures at the
-same time analogy problems such as ""the Ganga of Egypt"" or ""the Voldemort of
-Tolkien"". Category Builder is shown to be a more expressive lexical
-representation and to outperform dense representations such as Word2Vec in some
-analogy classes despite being shown only two of the three input terms.
-"
-7640,1805.07443,"Shuai Tang, Virginia R. de Sa",Multi-view Sentence Representation Learning,cs.CL cs.LG cs.NE stat.ML," Multi-view learning can provide self-supervision when different views are
-available of the same data. The distributional hypothesis provides another form
-of useful self-supervision from adjacent sentences which are plentiful in large
-unlabelled corpora. Motivated by the asymmetry in the two hemispheres of the
-human brain as well as the observation that different learning architectures
-tend to emphasise different aspects of sentence meaning, we create a unified
-multi-view sentence representation learning framework, in which one view
-encodes the input sentence with a Recurrent Neural Network (RNN), the other
-view encodes it with a simple linear model, and the training objective is to
-maximise the agreement specified by the adjacent context information between
-the two views. We show that, after training, the vectors produced from our
-multi-view training provide improved representations over the single-view
-training, and the combination of different views gives further representational
-improvement and demonstrates solid transferability on standard downstream
-tasks.
-" -7641,1805.07467,Yu-An Chung and Wei-Hung Weng and Schrasing Tong and James Glass,Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces,cs.CL cs.SD eess.AS," Recent research has shown that word embedding spaces learned from text -corpora of different languages can be aligned without any parallel data -supervision. Inspired by the success in unsupervised cross-lingual word -embeddings, in this paper we target learning a cross-modal alignment between -the embedding spaces of speech and text learned from corpora of their -respective modalities in an unsupervised fashion. The proposed framework learns -the individual speech and text embedding spaces, and attempts to align the two -spaces via adversarial training, followed by a refinement procedure. We show -how our framework could be used to perform spoken word classification and -translation, and the results on these two tasks demonstrate that the -performance of our unsupervised alignment approach is comparable to its -supervised counterpart. Our framework is especially useful for developing -automatic speech recognition (ASR) and speech-to-text translation systems for -low- or zero-resource languages, which have little parallel audio-text data for -training modern supervised ASR and speech-to-text translation models, but -account for the majority of the languages spoken across the world. -" -7642,1805.07469,"Hiroki Shimanaka, Tomoyuki Kajiwara, Mamoru Komachi","Metric for Automatic Machine Translation Evaluation based on Universal - Sentence Representations",cs.CL," Sentence representations can capture a wide range of information that cannot -be captured by local features based on character or word N-grams. This paper -examines the usefulness of universal sentence representations for evaluating -the quality of machine translation. Although it is difficult to train sentence -representations using small-scale translation datasets with manual evaluation, -sentence representations trained from large-scale data in other tasks can -improve the automatic evaluation of machine translation. Experimental results -of the WMT-2016 dataset show that the proposed method achieves state-of-the-art -performance with sentence representation features only. -" -7643,1805.07475,"Jacob Harer, Onur Ozdemir, Tomo Lazovich, Christopher P. Reale, - Rebecca L. Russell, Louis Y. Kim, Peter Chin","Learning to Repair Software Vulnerabilities with Generative Adversarial - Networks",cs.CL cs.LG cs.NE stat.ML," Motivated by the problem of automated repair of software vulnerabilities, we -propose an adversarial learning approach that maps from one discrete source -domain to another target domain without requiring paired labeled examples or -source and target domains to be bijections. We demonstrate that the proposed -adversarial learning approach is an effective technique for repairing software -vulnerabilities, performing close to seq2seq approaches that require labeled -pairs. The proposed Generative Adversarial Network approach is -application-agnostic in that it can be applied to other problems similar to -code repair, such as grammar correction or sentiment translation. -" -7644,1805.07513,"Mo Yu, Xiaoxiao Guo, Jinfeng Yi, Shiyu Chang, Saloni Potdar, Yu Cheng, - Gerald Tesauro, Haoyu Wang, Bowen Zhou",Diverse Few-Shot Text Classification with Multiple Metrics,cs.CL cs.LG," We study few-shot learning in natural language domains. 
Compared to many
-existing works that apply either metric-based or optimization-based
-meta-learning to the image domain with low inter-task variance, we consider a
-more realistic setting, where tasks are diverse. However, this setting poses
-tremendous difficulties for existing state-of-the-art metric-based algorithms
-since a single metric is insufficient to capture complex task variations in the
-natural language domain. To alleviate the problem, we propose an adaptive
-metric learning approach that automatically determines the best weighted
-combination from a set of metrics obtained from meta-training tasks for a newly
-seen few-shot task. Extensive quantitative evaluations on real-world sentiment
-analysis and dialog intent classification datasets demonstrate that the
-proposed method performs favorably against state-of-the-art few-shot learning
-algorithms in terms of predictive accuracy. We make our code and data available
-for further study.
-"
-7645,1805.07616,Guillem Collell and Marie-Francine Moens,Do Neural Network Cross-Modal Mappings Really Bridge Modalities?,stat.ML cs.CL cs.CV cs.LG," Feed-forward networks are widely used in cross-modal applications to bridge
-modalities by mapping distributed vectors of one modality to the other, or to a
-shared space. The predicted vectors are then used to perform e.g., retrieval or
-labeling. Thus, the success of the whole system relies on the ability of the
-mapping to make the neighborhood structure (i.e., the pairwise similarities) of
-the predicted vectors akin to that of the target vectors. However, whether this
-is achieved has not been investigated yet. Here, we propose a new similarity
-measure and two ad hoc experiments to shed light on this issue. In three
-cross-modal benchmarks we learn a large number of language-to-vision and
-vision-to-language neural network mappings (up to five layers) using a rich
-diversity of image and text features and loss functions. Our results reveal
-that, surprisingly, the neighborhood structure of the predicted vectors
-consistently resembles more that of the input vectors than that of the target
-vectors. In a second experiment, we further show that untrained nets do not
-significantly disrupt the neighborhood (i.e., semantic) structure of the input
-vectors.
-"
-7646,1805.07685,"Cicero Nogueira dos Santos, Igor Melnyk, Inkit Padhi","Fighting Offensive Language on Social Media with Unsupervised Text Style
- Transfer",cs.CL cs.LG," We introduce a new approach to tackle the problem of offensive language in
-online social media. Our approach uses unsupervised text style transfer to
-translate offensive sentences into non-offensive ones. We propose a new method
-for training encoder-decoders using non-parallel data that combines a
-collaborative classifier, attention and the cycle consistency loss.
-Experimental results on data from Twitter and Reddit show that our method
-outperforms a state-of-the-art text style transfer system in two out of three
-quantitative metrics and produces reliable non-offensive transferred sentences.
-"
-7647,1805.07697,"Elad Tolochinsky, Ohad Mosafi, Ella Rabinovich, Shuly Wintner",The UN Parallel Corpus Annotated for Translation Direction,cs.CL," This work distinguishes between translated and original text in the UN
-protocol corpus. By modeling the problem as a classification problem, we can
-achieve up to 95% classification accuracy.
We begin by deriving a parallel
-corpus for different language-pairs annotated for translation direction, and
-then classify the data by using various feature extraction methods. We compare
-the different methods as well as the ability to distinguish between translated
-and original texts in the different languages. The annotated corpus is publicly
-available.
-"
-7648,1805.07719,"Rosario Scalise, Yonatan Bisk, Maxwell Forbes, Daqing Yi, Yejin Choi,
- Siddhartha Srinivasa",Balancing Shared Autonomy with Human-Robot Communication,cs.RO cs.CL," Robotic agents that share autonomy with a human should leverage human domain
-knowledge and account for their preferences when completing a task. This extra
-knowledge can dramatically improve plan efficiency and user satisfaction, but
-these gains are lost if communicating with a robot is taxing and unnatural. In
-this paper, we show how viewing human-robot language through the lens of shared
-autonomy explains the efficiency versus cognitive load trade-offs humans make
-when deciding how cooperative and explicit to make their instructions.
-"
-7649,1805.07731,"Henry Elder, Chris Hokamp","Generating High-Quality Surface Realizations Using Data Augmentation and
- Factored Sequence Models",cs.CL," This work presents a new state of the art in reconstruction of surface
-realizations from obfuscated text. We identify the lack of sufficient training
-data as the major obstacle to training high-performing models, and solve this
-issue by generating large amounts of synthetic training data. We also propose
-preprocessing techniques which make the structure contained in the input
-features more accessible to sequence models. Our models were ranked first on
-all evaluation metrics in the English portion of the 2018 Surface Realization
-shared task.
-"
-7650,1805.07745,Taehoon Kim and Jihoon Yang,"Abstractive Text Classification Using Sequence-to-convolution Neural
- Networks",cs.CL," We propose a new deep neural network model and its training scheme for text
-classification. Our model, Sequence-to-convolution Neural Networks (Seq2CNN),
-consists of two blocks: a Sequential Block that summarizes input texts and a
-Convolution Block that receives a summary of the input and classifies it into a
-label. Seq2CNN is trained end-to-end to classify variable-length texts without
-preprocessing inputs into fixed length. We also present the Gradual Weight
-Shift (GWS) method, which stabilizes training. GWS is applied to our model's
-loss function. We compared our model with word-based TextCNN trained with
-different data preprocessing methods. We obtained significant improvement in
-classification accuracy over word-based TextCNN without any ensemble or data
-augmentation.
-"
-7651,1805.07799,"Kamal Al-Sabahi, Zhang Zuping, and Mohammed Nadher","A Hierarchical Structured Self-Attentive Model for Extractive Document
- Summarization (HSSAS)",cs.CL," Recent advances in neural network architectures and training algorithms
-have shown the effectiveness of representation learning. Neural
-network-based models generate better representations than traditional ones.
-They have the ability to automatically learn distributed representations for
-sentences and documents. To this end, we propose a novel model that addresses
-several issues that are not adequately modeled by the previously proposed
-models, such as the memory problem and incorporating the knowledge of document
-structure.
Our model uses a hierarchical structured self-attention mechanism to
-create the sentence and document embeddings. This architecture mirrors the
-hierarchical structure of the document and in turn enables us to obtain better
-feature representations. The attention mechanism provides an extra source of
-information to guide the summary extraction. The new model treats the
-summarization task as a classification problem in which the model computes the
-respective probabilities of sentence-summary membership. The model predictions
-are broken up by several features such as information content, salience,
-novelty and positional representation. The proposed model was evaluated on two
-well-known datasets, the CNN / Daily Mail, and DUC 2002. The experimental
-results show that our model outperforms the current extractive state-of-the-art
-by a considerable margin.
-"
-7652,1805.07819,"Abhishek Kumar, Daisuke Kawahara, Sadao Kurohashi",Knowledge-enriched Two-layered Attention Network for Sentiment Analysis,cs.CL," We propose a novel two-layered attention network based on Bidirectional Long
-Short-Term Memory for sentiment analysis. The novel two-layered attention
-network takes advantage of external knowledge bases to improve the
-sentiment prediction. It uses the Knowledge Graph Embedding generated using
-WordNet. We build our model by combining the two-layered attention network with
-the supervised model based on Support Vector Regression using a Multilayer
-Perceptron network for sentiment analysis. We evaluate our model on the
-benchmark dataset of SemEval 2017 Task 5. Experimental results show that the
-proposed model surpasses the top system of SemEval 2017 Task 5. The model
-performs significantly better, improving on the state-of-the-art system at
-SemEval 2017 Task 5 by 1.7 and 3.7 points for sub-tracks 1 and 2, respectively.
-"
-7653,1805.07824,Javier \'Alvez and Itziar Gonzalez-Dios and German Rigau,Validating WordNet Meronymy Relations using Adimen-SUMO,cs.CL," In this paper, we report on the practical application of a novel approach for
-validating the knowledge of WordNet using Adimen-SUMO. In particular, this
-paper focuses on cross-checking the WordNet meronymy relations against the
-knowledge encoded in Adimen-SUMO. Our validation approach tests a large set of
-competency questions (CQs), which are derived (semi)-automatically from the
-knowledge encoded in WordNet, SUMO and their mapping, by applying efficient
-first-order logic automated theorem provers. Unfortunately, despite being
-created manually, these knowledge resources are not free of errors and
-discrepancies. As a consequence, some of the resulting CQs are not plausible
-according to the knowledge included in Adimen-SUMO. Thus, first we focus on
-(semi)-automatically improving the alignment between these knowledge resources,
-and second, we perform a minimal set of corrections in the ontology. Our aim is
-to minimize the manual effort required for an extensive validation process. We
-report on the strategies followed, the changes made, the effort needed and its
-impact when validating the WordNet meronymy relations using improved versions
-of the mapping and the ontology. Based on the new results, we discuss the
-implications of the appropriate corrections and the need for future
-enhancements.
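A minimal sketch of the competency-question idea in the abstract above, assuming NLTK's WordNet interface; the helper meronymy_triples is an illustrative invention, and this does not reproduce the authors' Adimen-SUMO theorem-proving pipeline:

# Hedged sketch: enumerate WordNet meronymy links as candidate competency
# questions (CQs). Assumes NLTK with the WordNet corpus fetched via
# nltk.download('wordnet'); no first-order logic validation is performed.
from itertools import islice
from nltk.corpus import wordnet as wn

def meronymy_triples():
    """Yield (whole, part, relation) triples from WordNet meronymy links."""
    relations = ('part_meronyms', 'member_meronyms', 'substance_meronyms')
    for synset in wn.all_synsets('n'):
        for rel in relations:
            for part in getattr(synset, rel)():
                yield synset.name(), part.name(), rel

# Each triple is a candidate CQ of the form "is <part> a part of <whole>?"
for whole, part, rel in islice(meronymy_triples(), 5):
    print(f"{rel}: {part} -> {whole}")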
-" -7654,1805.07851,"Harish Gandhi Ramachandran, Dan DeRose Jr",A Text Analysis of Federal Reserve meeting minutes,cs.IR cs.CL," Recent developments in monetary policy by the Federal Reserve has created a -need for an objective method of communication analysis.Using methods developed -for text analysis, we present a novel technique of analysis which creates a -semantic space defined by various policymakers public comments and places the -committee consensus in the appropriate location. Its then possible to determine -which member of the committee is most closely aligned with the committee -consensus over time and create a foundation for further actionable research. -" -7655,1805.07858,Todor Mihaylov and Anette Frank,"Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with - External Commonsense Knowledge",cs.CL," We introduce a neural reading comprehension model that integrates external -commonsense knowledge, encoded as a key-value memory, in a cloze-style setting. -Instead of relying only on document-to-question interaction or discrete -features as in prior work, our model attends to relevant external knowledge and -combines this knowledge with the context representation before inferring the -answer. This allows the model to attract and imply knowledge from an external -knowledge source that is not explicitly stated in the text, but that is -relevant for inferring the answer. Our model improves results over a very -strong baseline on a hard Common Nouns dataset, making it a strong competitor -of much more complex models. By including knowledge explicitly, our model can -also provide evidence about the background knowledge used in the RC process. -" -7656,1805.07882,"Huy Nguyen Tien, Minh Nguyen Le, Yamasaki Tomohiro, Izuha Tatsuya","Sentence Modeling via Multiple Word Embeddings and Multi-level - Comparison for Semantic Textual Similarity",cs.CL," Different word embedding models capture different aspects of linguistic -properties. This inspired us to propose a model (M-MaxLSTM-CNN) for employing -multiple sets of word embeddings for evaluating sentence similarity/relation. -Representing each word by multiple word embeddings, the MaxLSTM-CNN encoder -generates a novel sentence embedding. We then learn the similarity/relation -between our sentence embeddings via Multi-level comparison. Our method -M-MaxLSTM-CNN consistently shows strong performances in several tasks (i.e., -measure textual similarity, identify paraphrase, recognize textual entailment). -According to the experimental results on STS Benchmark dataset and SICK dataset -from SemEval, M-MaxLSTM-CNN outperforms the state-of-the-art methods for -textual similarity tasks. Our model does not use hand-crafted features (e.g., -alignment features, Ngram overlaps, dependency features) as well as does not -require pre-trained word embeddings to have the same dimension. -" -7657,1805.07889,"Huaishao Luo, Tianrui Li, Bing Liu, Bin Wang, and Herwig Unger","Improving Aspect Term Extraction with Bidirectional Dependency Tree - Representation",cs.CL," Aspect term extraction is one of the important subtasks in aspect-based -sentiment analysis. Previous studies have shown that using dependency tree -structure representation is promising for this task. However, most dependency -tree structures involve only one directional propagation on the dependency -tree. In this paper, we first propose a novel bidirectional dependency tree -network to extract dependency structure features from the given sentences. 
The
-key idea is to explicitly incorporate both representations gained separately
-from the bottom-up and top-down propagation on the given dependency syntactic
-tree. An end-to-end framework is then developed to integrate the embedded
-representations and BiLSTM plus CRF to learn both tree-structured and
-sequential features to solve the aspect term extraction problem. Experimental
-results demonstrate that the proposed model outperforms state-of-the-art
-baseline models on four benchmark SemEval datasets.
-"
-7658,1805.07932,"Jin-Hwa Kim, Jaehyun Jun, Byoung-Tak Zhang",Bilinear Attention Networks,cs.CV cs.AI cs.CL cs.LG," Attention networks in multimodal learning provide an efficient way to utilize
-given visual information selectively. However, the computational cost to learn
-attention distributions for every pair of multimodal input channels is
-prohibitively expensive. To solve this problem, co-attention builds two
-separate attention distributions for each modality, neglecting the interaction
-between multimodal inputs. In this paper, we propose bilinear attention
-networks (BAN) that find bilinear attention distributions to utilize given
-vision-language information seamlessly. BAN considers bilinear interactions
-among two groups of input channels, while low-rank bilinear pooling extracts
-the joint representations for each pair of channels. Furthermore, we propose a
-variant of multimodal residual networks to exploit eight attention maps of the
-BAN efficiently. We quantitatively and qualitatively evaluate our model on
-visual question answering (VQA 2.0) and Flickr30k Entities datasets, showing
-that BAN significantly outperforms previous methods and achieves new
-state-of-the-art results on both datasets.
-"
-7659,1805.07946,"Ekin Aky\""urek, Erenay Dayan{\i}k, Deniz Yuret",Morphological analysis using a sequence decoder,cs.CL," We introduce Morse, a recurrent encoder-decoder model that produces
-morphological analyses of each word in a sentence. The encoder turns the
-relevant information about the word and its context into a fixed-size vector
-representation and the decoder generates the sequence of characters for the
-lemma followed by a sequence of individual morphological features. We show that
-generating morphological features individually rather than as a combined tag
-allows the model to handle rare or unseen tags and outperform whole-tag models.
-In addition, generating morphological features as a sequence rather than e.g.\
-an unordered set allows our model to produce an arbitrary number of features
-that represent multiple inflectional groups in morphologically complex
-languages. We obtain state-of-the-art results in nine languages of different
-morphological complexity under low-resource, high-resource and transfer
-learning settings. We also introduce TrMor2018, a new high accuracy Turkish
-morphology dataset. Our Morse implementation and the TrMor2018 dataset are
-available online to support future research\footnote{See
-\url{https://github.com/ai-ku/Morse.jl} for a Morse implementation in
-Julia/Knet \cite{knet2016mlsys} and \url{https://github.com/ai-ku/TrMor2018}
-for the new Turkish dataset.}.
-"
-7660,1805.07952,"Ozan Arkan Can, Deniz Yuret","A new dataset and model for learning to understand navigational
- instructions",cs.CL," In this paper, we present a state-of-the-art model and introduce a new
-dataset for grounded language learning. Our goal is to develop a model that can
-learn to follow new instructions given prior instruction-perception-action
-examples.
We base our work on the SAIL dataset, which consists of navigational
-instructions and actions in a maze-like environment. The new model we propose
-achieves the best results to date on the SAIL dataset by using an improved
-perceptual component that can represent relative positions of objects. We also
-analyze the problems with the SAIL dataset regarding its size and balance. We
-argue that performance on a small, fixed-size dataset is no longer a good
-measure to differentiate state-of-the-art models. We introduce SAILx, a
-synthetic dataset generator, and perform experiments where the size and balance
-of the dataset are controlled.
-"
-7661,1805.07966,"Sopan Khosla, Niyati Chhaya, Kushal Chawla",Aff2Vec: Affect-Enriched Distributional Word Representations,cs.CL cs.AI," Human communication includes information, opinions, and reactions. Reactions
-are often captured by the affective messages in written as well as verbal
-communications. While there has been work in affect modeling and to some extent
-affective content generation, the area of affective word distributions is not
-well studied. Synsets and lexica capture semantic relationships across words.
-These models, however, fall short in encoding affective or emotional word
-interpretations. Our proposed model, Aff2Vec, provides a method for enriched
-word embeddings that are representative of affective interpretations of words.
-Aff2Vec outperforms the state-of-the-art in intrinsic word-similarity tasks.
-Further, the use of Aff2Vec representations outperforms baseline embeddings in
-downstream natural language understanding tasks including sentiment analysis,
-personality detection, and frustration prediction.
-"
-7662,1805.08028,"Fuli Luo, Tianyu Liu, Qiaolin Xia, Baobao Chang and Zhifang Sui",Incorporating Glosses into Neural Word Sense Disambiguation,cs.CL," Word Sense Disambiguation (WSD) aims to identify the correct meaning of
-polysemous words in a particular context. Lexical resources like WordNet
-have proved to be of great help for WSD in knowledge-based methods.
-However, previous neural networks for WSD always rely on massive labeled data
-(context), ignoring lexical resources like glosses (sense definitions). In this
-paper, we integrate the context and glosses of the target word into a unified
-framework in order to make full use of both labeled data and lexical knowledge.
-Therefore, we propose GAS: a gloss-augmented WSD neural network which jointly
-encodes the context and glosses of the target word. GAS models the semantic
-relationship between the context and the gloss in an improved memory network
-framework, which breaks the barriers of the previous supervised methods and
-knowledge-based methods. We further extend the original gloss of word sense via
-its semantic relations in WordNet to enrich the gloss information. The
-experimental results show that our model outperforms the state-of-the-art
-systems on several English all-words WSD datasets.
-"
-7663,1805.08032,"Jan Chorowski, Adrian {\L}a\'ncucki, Szymon Malik, Maciej Pawlikowski,
- Pawe{\l} Rychlikowski, Pawe{\l} Zykowski","A Talker Ensemble: the University of Wroc{\l}aw's Entry to the NIPS 2017
- Conversational Intelligence Challenge",cs.CL," We present Poetwannabe, a chatbot submitted by the University of Wroc{\l}aw
-to the NIPS 2017 Conversational Intelligence Challenge, in which it ranked
-first ex-aequo. It is able to conduct a conversation with a user in natural
-language.
The primary functionality of our dialogue system is context-aware
-question answering (QA), while its secondary function is maintaining user
-engagement. The chatbot is composed of a number of sub-modules, which
-independently prepare replies to the user's prompts and assess their own
-confidence. To answer questions, our dialogue system relies heavily on factual
-data, sourced mostly from Wikipedia and DBpedia, data of real user interactions
-in public forums, as well as data concerning general literature. Where
-applicable, modules are trained on large datasets using GPUs. However, to
-comply with the competition's requirements, the final system is compact and
-runs on commodity hardware.
-"
-7664,1805.08092,"Sewon Min, Victor Zhong, Richard Socher, Caiming Xiong","Efficient and Robust Question Answering from Minimal Context over
- Documents",cs.CL," Neural models for question answering (QA) over documents have achieved
-significant performance improvements. Although effective, these models do not
-scale to large corpora due to their complex modeling of interactions between
-the document and the question. Moreover, recent work has shown that such models
-are sensitive to adversarial inputs. In this paper, we study the minimal
-context required to answer the question, and find that most questions in
-existing datasets can be answered with a small set of sentences. Inspired by
-this observation, we propose a simple sentence selector to select the minimal
-set of sentences to feed into the QA model. Our overall system achieves
-significant reductions in training (up to 15 times) and inference times (up to
-13 times), with accuracy comparable to or better than the state-of-the-art on
-SQuAD, NewsQA, TriviaQA and SQuAD-Open. Furthermore, our experimental results
-and analyses show that our approach is more robust to adversarial inputs.
-"
-7665,1805.08093,"Thiago Castro Ferreira, Diego Moussallem, \'Akos K\'ad\'ar, Sander
- Wubben and Emiel Krahmer",NeuralREG: An end-to-end approach to referring expression generation,cs.CL," Traditionally, Referring Expression Generation (REG) models first decide on
-the form and then on the content of references to discourse entities in text,
-typically relying on features such as salience and grammatical function. In
-this paper, we present a new approach (NeuralREG), relying on deep neural
-networks, which makes decisions about form and content in one go without
-explicit feature extraction. Using a delexicalized version of the WebNLG
-corpus, we show that the neural model substantially improves over two strong
-baselines. Data and models are publicly available.
-"
-7666,1805.08099,"Gerhard J\""ager",Computational Historical Linguistics,cs.CL," Computational approaches to historical linguistics have been proposed for
-half a century. Within the last decade, this line of research has received a
-major boost, owing both to the transfer of ideas and software from
-computational biology and to the release of several large electronic data
-resources suitable for systematic comparative work.
- In this article, some of the central research topics of this new wave of
-computational historical linguistics are introduced and discussed. These are
-automatic assessment of genetic relatedness, automatic cognate detection,
-phylogenetic inference and ancestral state reconstruction. They will be
-demonstrated by means of a case study of automatically reconstructing a
-Proto-Romance word list from lexical data of 50 modern Romance languages and
-dialects.
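One of the tasks named in the abstract above, automatic cognate detection, can be illustrated with a toy sketch based on normalized edit distance; the word forms and the 0.5 threshold are illustrative assumptions, not data or settings from the article:

# Hedged sketch: flag candidate cognates via length-normalized Levenshtein
# distance, a common baseline building block in computational historical
# linguistics (not the article's full pipeline).
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def are_cognate_candidates(w1: str, w2: str, threshold: float = 0.5) -> bool:
    """Accept pairs whose normalized distance falls under the threshold."""
    return edit_distance(w1, w2) / max(len(w1), len(w2)) <= threshold

# Romance reflexes of Latin NOCTEM ("night") versus an unrelated form:
print(are_cognate_candidates("noche", "notte"))  # True
print(are_cognate_candidates("noche", "jour"))   # False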
-" -7667,1805.08154,Georgios P. Spithourakis and Sebastian Riedel,"Numeracy for Language Models: Evaluating and Improving their Ability to - Predict Numbers",cs.CL cs.NE stat.ML," Numeracy is the ability to understand and work with numbers. It is a -necessary skill for composing and understanding documents in clinical, -scientific, and other technical domains. In this paper, we explore different -strategies for modelling numerals with language models, such as memorisation -and digit-by-digit composition, and propose a novel neural architecture that -uses a continuous probability density function to model numerals from an open -vocabulary. Our evaluation on clinical and scientific datasets shows that using -hierarchical models to distinguish numerals from words improves a perplexity -metric on the subset of numerals by 2 and 4 orders of magnitude, respectively, -over non-hierarchical models. A combination of strategies can further improve -perplexity. Our continuous probability density function model reduces mean -absolute percentage errors by 18% and 54% in comparison to the second best -strategy for each dataset, respectively. -" -7668,1805.08159,"Jinfeng Rao, Wei Yang, Yuhao Zhang, Ferhan Ture, Jimmy Lin","Multi-Perspective Relevance Matching with Hierarchical ConvNets for - Social Media Search",cs.IR cs.CL," Despite substantial interest in applications of neural networks to -information retrieval, neural ranking models have only been applied to standard -ad hoc retrieval tasks over web pages and newswire documents. This paper -proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network) -a novel neural ranking model specifically designed for ranking short social -media posts. We identify document length, informal language, and heterogeneous -relevance signals as features that distinguish documents in our domain, and -present a model specifically designed with these characteristics in mind. Our -model uses hierarchical convolutional layers to learn latent semantic -soft-match relevance signals at the character, word, and phrase levels. A -pooling-based similarity measurement layer integrates evidence from multiple -types of matches between the query, the social media post, as well as URLs -contained in the post. Extensive experiments using Twitter data from the TREC -Microblog Tracks 2011--2014 show that our model significantly outperforms prior -feature-based as well and existing neural ranking models. To our best -knowledge, this paper presents the first substantial work tackling search over -social media posts using neural ranking models. -" -7669,1805.08174,"Shagun Sodhani, Vardaan Pahuja","Reproducibility Report for ""Learning To Count Objects In Natural Images - For Visual Question Answering""",cs.CV cs.CL," This is the reproducibility report for the paper ""Learning To Count Objects -In Natural Images For Visual QuestionAnswering"" -" -7670,1805.08182,"Anastassia Kornilova, Daniel Argyle and Vlad Eidelman","Party Matters: Enhancing Legislative Embeddings with Author Attributes - for Vote Prediction",cs.CL cs.LG," Predicting how Congressional legislators will vote is important for -understanding their past and future behavior. However, previous work on -roll-call prediction has been limited to single session settings, thus did not -consider generalization across sessions. In this paper, we show that metadata -is crucial for modeling voting outcomes in new contexts, as changes between -sessions lead to changes in the underlying data generation process. 
We show how
-augmenting bill text with the sponsors' ideologies in a neural network model
-can achieve an average of a 4% boost in accuracy over the previous
-state-of-the-art.
-"
-7671,1805.08237,"Bernd Bohnet, Ryan McDonald, Goncalo Simoes, Daniel Andor, Emily
- Pitler, Joshua Maynez","Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive
- Token Encodings",cs.CL," The rise of neural networks, and particularly recurrent neural networks, has
-produced significant advances in part-of-speech tagging accuracy. One
-characteristic common among these models is the presence of rich initial word
-encodings. These encodings typically are composed of a recurrent
-character-based representation with learned and pre-trained word embeddings.
-However, these encodings do not consider a context wider than a single word and
-it is only through subsequent recurrent layers that word or sub-word
-information interacts. In this paper, we investigate models that use recurrent
-neural networks with sentence-level context for initial character and
-word-based representations. In particular, we show that optimal results are
-obtained by integrating these context-sensitive representations through
-synchronized training with a meta-model that learns to combine their states. We
-present results on part-of-speech and morphological tagging with
-state-of-the-art performance on a number of languages.
-"
-7672,1805.08241,"Chaitanya Malaviya, Pedro Ferreira, Andr\'e F. T. Martins",Sparse and Constrained Attention for Neural Machine Translation,cs.CL," In NMT, words are sometimes dropped from the source or generated repeatedly
-in the translation. We explore novel strategies to address the coverage problem
-that change only the attention transformation. Our approach allocates
-fertilities to source words, used to bound the attention each word can receive.
-We experiment with various sparse and constrained attention transformations and
-propose a new one, constrained sparsemax, shown to be differentiable and
-sparse. Empirical evaluation is provided in three language pairs.
-"
-7673,1805.08271,"Hongyuan Mei, Sheng Zhang, Kevin Duh, Benjamin Van Durme","Halo: Learning Semantics-Aware Representations for Cross-Lingual
- Information Extraction",cs.CL," Cross-lingual information extraction (CLIE) is an important and challenging
-task, especially in low resource scenarios. To tackle this challenge, we
-propose a training method, called Halo, which enforces the local region of each
-hidden state of a neural model to only generate target tokens with the same
-semantic structure tag. This simple but powerful technique enables a neural
-model to learn semantics-aware representations that are robust to noise,
-without introducing any extra parameter, thus yielding better generalization in
-both high and low resource settings.
-"
-7674,1805.08297,Wuwei Lan and Wei Xu,Character-based Neural Networks for Sentence Pair Modeling,cs.CL," Sentence pair modeling is critical for many NLP tasks, such as paraphrase
-identification, semantic textual similarity, and natural language inference.
-Most state-of-the-art neural models for these tasks rely on pretrained word
-embeddings and compose sentence-level semantics in varied ways; however, few
-works have attempted to verify whether we really need pretrained embeddings in
-these tasks. In this paper, we study how effective subword-level (character and
-character n-gram) representations are in sentence pair modeling.
Though it is
-well-known that subword models are effective in tasks with single-sentence
-input, including language modeling and machine translation, they have not been
-systematically studied in sentence pair modeling tasks where the semantic and
-string similarities between texts matter. Our experiments show that subword
-models without any pretrained word embedding can achieve new state-of-the-art
-results on two social media datasets and competitive results on news data for
-paraphrase identification.
-"
-7675,1805.08329,"Haonan Yu, Xiaochen Lian, Haichao Zhang, Wei Xu","Guided Feature Transformation (GFT): A Neural Language Grounding Module
- for Embodied Agents",cs.AI cs.CL cs.LG cs.RO," Recently there has been a rising interest in training agents, embodied in
-virtual environments, to perform language-directed tasks by deep reinforcement
-learning. In this paper, we propose a simple but effective neural language
-grounding module for embodied agents that can be trained end to end from
-scratch taking raw pixels, unstructured linguistic commands, and sparse rewards
-as the inputs. We model the language grounding process as a language-guided
-transformation of visual features, where latent sentence embeddings are used as
-the transformation matrices. In several language-directed navigation tasks that
-feature challenging partial observability and require simple reasoning, our
-module significantly outperforms the state of the art. We also release
-XWorld3D, an easy-to-customize 3D environment that can potentially be modified
-to evaluate a variety of embodied agents.
-"
-7676,1805.08352,"Shereen Oraby, Lena Reed, Shubhangi Tandon, T. S. Sharath, Stephanie
- Lukin, Marilyn Walker","Controlling Personality-Based Stylistic Variation with Neural Natural
- Language Generators",cs.CL," Natural language generators for task-oriented dialogue must effectively
-realize system dialogue actions and their associated semantics. In many
-applications, it is also desirable for generators to control the style of an
-utterance. To date, work on task-oriented neural generation has primarily
-focused on semantic fidelity rather than achieving stylistic goals, while work
-on style has been done in contexts where it is difficult to measure content
-preservation. Here we present three different sequence-to-sequence models and
-carefully test how well they disentangle content and style. We use a
-statistical generator, Personage, to synthesize a new corpus of over 88,000
-restaurant domain utterances whose style varies according to models of
-personality, giving us total control over both the semantic content and the
-stylistic variation in the training data. We then vary the amount of explicit
-stylistic supervision given to the three models. We show that our most explicit
-model can simultaneously achieve high fidelity to both semantic and stylistic
-goals: this model adds a context vector of 36 stylistic parameters as input to
-the hidden state of the encoder at each time step, showing the benefits of
-explicit stylistic supervision, even when the amount of training data is large.
-"
-7677,1805.08353,Anson Bastos,Learning sentence embeddings using Recursive Networks,cs.CL," Learning sentence vectors that generalise well is a challenging task. In this
-paper we compare three methods of learning phrase embeddings: 1) using LSTMs,
-2) using recursive nets, and 3) a variant of method 2 that uses the POS
-information of the phrase.
We train our models on dictionary definitions of words to obtain
-a reverse dictionary application similar to Felix et al. [1]. To see if our
-embeddings can be transferred to a new task we also train and test on the
-rotten tomatoes dataset [2]. We train keeping the sentence embeddings fixed as
-well as with fine tuning.
-"
-7678,1805.08389,"Jialin Wu, Zeyuan Hu, Raymond J. Mooney",Joint Image Captioning and Question Answering,cs.CL cs.CV," Answering visual questions requires acquiring everyday common knowledge and
-modeling the semantic connections among different parts of images, which is too
-difficult for VQA systems to learn from images with only the supervision from
-answers. Meanwhile, image captioning systems with a beam search strategy tend
-to generate similar captions and fail to describe images diversely. To address
-the aforementioned issues, we present a system that lets these two tasks
-complement each other, and that is capable of jointly producing image captions
-and answering visual questions. In particular, we utilize question and image
-features to generate question-related captions and use the generated captions
-as additional features to provide new knowledge to the VQA system. For image
-captioning, our system attains more informative results in terms of the
-relative improvements on VQA tasks as well as competitive results using
-automated metrics. Applying our system to the VQA tasks, our results on the VQA
-v2 dataset achieve 65.8% using generated captions and 69.1% using annotated
-captions in the validation set and 68.4% in the test-standard set. Further, an
-ensemble of 10 models results in 69.7% in the test-standard split.
-"
-7679,1805.08415,"Mohammadamir Kavousi, Sepehr Saadatmand",Estimating the Rating of Reviewers Based on the Text,cs.CL," User-generated texts such as reviews and social media are valuable sources of
-information. Online reviews are important assets for users to buy a product,
-see a movie, or make a decision. Therefore, the rating of a review is one of
-the reliable factors for all users to read and trust the reviews. This paper
-analyzes the texts of the reviews to evaluate and predict the ratings.
-Moreover, we study the effect of lexical features generated from text as well
-as sentimental words on the accuracy of rating prediction. Our analysis shows
-that words with high information gain scores are more effective than words with
-high TF-IDF values. In addition, we explore the best number of features for
-predicting the ratings of the reviews.
-"
-7680,1805.08438,Cem Bozsahin and Arzu Burcu Guven,"Paracompositionality, MWEs and Argument Substitution",cs.CL," Multi-word expressions, verb-particle constructions, idiomatically combining
-phrases, and phrasal idioms have something in common: not all of their elements
-contribute to the argument structure of the predicate implicated by the
-expression.
- Radically lexicalized theories of grammar that avoid string-, term-, logical
-form-, and tree-writing, and categorial grammars that avoid the wrap operation,
-make predictions about the categories involved in verb-particles and phrasal
-idioms. They may require singleton types, which can only substitute for one
-value, not just for one kind of value. These types are asymmetric: they can be
-arguments only. They also narrowly constrain the kind of semantic value that
-can correspond to such syntactic categories.
Idiomatically combining phrases do
-not subcategorize for singleton types, and they exploit another locally
-computable and compositional property of a correspondence, that every syntactic
-expression can project its head word. Such MWEs can be seen as empirically
-realized categorial possibilities, rather than lacunae in a theory of
-lexicalizable syntactic categories.
-"
-7681,1805.08455,"Silje Christensen, Simen Johnsrud, Massimiliano Ruocco, Heri
- Ramampiaro",Context-Aware Sequence-to-Sequence Models for Conversational Systems,cs.CL cs.AI," This work proposes a novel approach based on sequence-to-sequence (seq2seq)
-models for context-aware conversational systems. Existing seq2seq models have
-been shown to be good for generating natural responses in a data-driven
-conversational system. However, they still lack mechanisms to incorporate
-previous conversation turns. We investigate RNN-based methods that efficiently
-integrate previous turns as a context for generating responses. Overall, our
-experimental results based on human judgment demonstrate the feasibility and
-effectiveness of the proposed approach.
-"
-7682,1805.08533,"Nora Al-Twairesh, Hend Al-Khalifa, AbdulMalik Alsalman, Yousef
- Al-Ohali","Sentiment Analysis of Arabic Tweets: Feature Engineering and A Hybrid
- Approach",cs.CL," Sentiment Analysis in Arabic is a challenging task due to the rich morphology
-of the language. Moreover, the task is further complicated when applied to
-Twitter data that is known to be highly informal and noisy. In this paper, we
-develop a hybrid method for sentiment analysis of Arabic tweets for a specific
-Arabic dialect, the Saudi dialect. Several features were engineered and
-evaluated using a feature backward selection method. Then a hybrid method that
-combines corpus-based and lexicon-based methods was developed for several
-classification models (two-way, three-way, four-way). The best F1-score for
-each of these models was (69.9, 61.63, 55.07), respectively.
-"
-7683,1805.08660,"Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li and Ivan
- Marsic","Multimodal Affective Analysis Using Hierarchical Attention Strategy with
- Word-Level Alignment",cs.CL," Multimodal affective computing, learning to recognize and interpret human
-affects and subjective information from multiple data sources, is still
-challenging because: (i) it is hard to extract informative features to
-represent human affects from heterogeneous inputs; (ii) current fusion
-strategies only fuse different modalities at an abstract level, ignoring
-time-dependent interactions between modalities. Addressing such issues, we
-introduce a hierarchical multimodal architecture with attention and word-level
-fusion to classify utterance-level sentiment and emotion from text and audio
-data. Our introduced model outperforms the state-of-the-art approaches on
-published datasets and we demonstrate that our model is able to visualize and
-interpret the synchronized attention over modalities.
-"
-7684,1805.08661,"Xirong Li and Chaoxi Xu and Xiaoxu Wang and Weiyu Lan and Zhengxiong
- Jia and Gang Yang and Jieping Xu","COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval",cs.CL cs.CV," This paper contributes to cross-lingual image annotation and retrieval in
-terms of data and baseline methods. We propose COCO-CN, a novel dataset
-enriching MS-COCO with manually written Chinese sentences and tags.
For more
-effective annotation acquisition, we develop a recommendation-assisted
-collective annotation system, automatically providing an annotator with several
-tags and sentences deemed to be relevant with respect to the pictorial content.
-Having 20,342 images annotated with 27,218 Chinese sentences and 70,993 tags,
-COCO-CN is currently the largest Chinese-English dataset that provides a
-unified and challenging platform for cross-lingual image tagging, captioning
-and retrieval. We develop conceptually simple yet effective methods per task
-for learning from cross-lingual resources. Extensive experiments on the three
-tasks justify the viability of the proposed dataset and methods. Data and code
-are publicly available at https://github.com/li-xirong/coco-cn
-"
-7685,1805.08701,"Soumil Mandal, Karthick Nanmaran","Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq
- Model & Levenshtein Distance",cs.CL," Building tools for code-mixed data is rapidly gaining popularity in the NLP
-research community as such data is rising exponentially on social media.
-Working with code-mixed data poses several challenges, especially due to
-grammatical inconsistencies and spelling variations in addition to all the
-previously known challenges for social media scenarios. In this article, we
-present a novel architecture focusing on normalizing phonetic typing
-variations, which are commonly seen in code-mixed data. One of the main features
-of our architecture is that in addition to normalizing, it can also be utilized
-for back-transliteration and word identification in some cases. Our model
-achieved an accuracy of 90.27% on the test data.
-"
-7686,1805.08707,"Pasquale Iero, Allan Third, Paul Piwek",A syllogistic system for propositions with intermediate quantifiers,cs.LO cs.CL," This paper describes a formalism that subsumes Peterson's intermediate
-quantifier syllogistic system, and extends the ideas by van Eijck on
-Aristotle's logic. Syllogisms are expressed in a concise form making use of and
-extending the Monotonicity Calculus. Contradictory and contrary relationships
-are added so that deduction can derive propositions expressing a form of
-negation.
-"
-7687,1805.08914,"Ruixi Lin, Charles Costello, Charles Jankowski","Enhancing Chinese Intent Classification by Dynamically Integrating
- Character Features into Word Embeddings with Ensemble Techniques",cs.CL," Intent classification has been widely researched on English data with deep
-learning approaches that are based on neural networks and word embeddings. The
-challenge for Chinese intent classification stems from the fact that, unlike
-English where most words are made up of 26 phonologic alphabet letters, Chinese
-is logographic, where a Chinese character is a more basic semantic unit that
-can be informative and its meaning does not vary too much in contexts. Chinese
-word embeddings alone can be inadequate for representing words, and pre-trained
-embeddings can suffer from not aligning well with the task at hand. To account
-for the inadequacy and leverage Chinese character information, we propose a
-low-effort and generic way to dynamically integrate character embedding based
-feature maps with word embedding based inputs, whose resulting word-character
-embeddings are stacked with a contextual information extraction module to
-further incorporate context information for predictions. On top of the proposed
-model, we employ an ensemble method to combine single models and obtain the
-final result.
The approach is data-independent, relying on no external
-sources such as pre-trained word embeddings. The proposed model outperforms
-baseline models and existing methods.
-"
-7688,1805.08949,"Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, Graham Neubig","Learning to Mine Aligned Code and Natural Language Pairs from Stack
- Overflow",cs.CL cs.SE," For tasks like code synthesis from natural language, code retrieval, and code
-summarization, data-driven models have shown great promise. However, creating
-these models requires parallel data between natural language (NL) and code with
-fine-grained alignments. Stack Overflow (SO) is a promising source to create
-such a data set: the questions are diverse and most of them have corresponding
-answers with high-quality code snippets. However, existing heuristic methods
-(e.g., pairing the title of a post with the code in the accepted answer) are
-limited both in their coverage and the correctness of the NL-code pairs
-obtained. In this paper, we propose a novel method to mine high-quality aligned
-data from SO using two sets of features: hand-crafted features considering the
-structure of the extracted snippets, and correspondence features obtained by
-training a probabilistic model to capture the correlation between NL and code
-using neural networks. These features are fed into a classifier that determines
-the quality of mined NL-code pairs. Experiments using Python and Java as test
-beds show that the proposed method greatly expands coverage and accuracy over
-existing mining methods, even when using only a small number of labeled
-examples. Further, we find that reasonable results are achieved even when
-training the classifier on one language and testing on another, showing promise
-for scaling NL-code mining to a wide variety of programming languages beyond
-those for which we are able to annotate data.
-"
-7689,1805.08983,"Jonggu Kim, Doyeon Kong, Jong-Hyeok Lee","Self-Attention-Based Message-Relevant Response Generation for Neural
- Conversation Model",cs.CL," Using a sequence-to-sequence framework, many neural conversation models for
-chit-chat succeed in generating natural responses. Nevertheless, the neural
-conversation models tend to give generic responses which are not specific to
-given messages, and this still remains a challenge. To alleviate the tendency,
-we propose a method to promote message-relevant and diverse responses for
-neural conversation models by using self-attention, which is time-efficient as
-well as effective. Furthermore, we present an investigation of why and how
-self-attention is effective, in close comparison with standard dialogue
-generation. The experimental results show that the proposed method improves the
-standard dialogue generation in various evaluation metrics.
-"
-7690,1805.09007,David Vilares and Carlos G\'omez-Rodr\'iguez,A Transition-based Algorithm for Unrestricted AMR Parsing,cs.CL," Non-projective parsing can be useful to handle cycles and reentrancy in AMR
-graphs. We explore this idea and introduce a greedy left-to-right
-non-projective transition-based parser. At each parsing configuration, an
-oracle decides whether to create a concept or whether to connect a pair of
-existing concepts. The algorithm handles reentrancy and arbitrary cycles
-natively, i.e. within the transition system itself. The model is evaluated on
-the LDC2015E86 corpus, obtaining results close to the state of the art,
-including a Smatch of 64%, and showing good behavior on reentrant edges.
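A schematic sketch of the create-or-connect transition loop the abstract above describes; the action names, oracle interface, and demo policy are illustrative inventions, not the paper's exact transition system:

# Hedged sketch: a greedy left-to-right loop in which an oracle inspects the
# configuration (buffer, concepts, edges) and picks the next action.
def parse(tokens, oracle):
    buffer, concepts, edges = list(tokens), [], []
    while buffer:
        action, arg = oracle(buffer, concepts, edges)
        if action == "CREATE":
            # Map the next token to a new concept node.
            concepts.append(arg)
            buffer.pop(0)
        elif action == "CONNECT":
            # arg = (head, dependent): since a node may receive any number
            # of incoming edges, reentrancy and cycles are representable.
            edges.append(arg)
        else:  # skip tokens that evoke no concept
            buffer.pop(0)
    return concepts, edges

def demo_oracle(buffer, concepts, edges):
    # Toy policy: every token evokes a concept; no edges are created.
    return ("CREATE", buffer[0].lower())

print(parse(["The", "boy", "runs"], demo_oracle))  # (['the', 'boy', 'runs'], [])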
-" -7691,1805.09016,Jeremy Barnes and Roman Klinger and Sabine Schulte im Walde,"Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across - Languages",cs.CL," Sentiment analysis in low-resource languages suffers from a lack of annotated -corpora to estimate high-performing models. Machine translation and bilingual -word embeddings provide some relief through cross-lingual sentiment approaches. -However, they either require large amounts of parallel data or do not -sufficiently capture sentiment information. We introduce Bilingual Sentiment -Embeddings (BLSE), which jointly represent sentiment information in a source -and target language. This model only requires a small bilingual lexicon, a -source-language corpus annotated for sentiment, and monolingual word embeddings -for each language. We perform experiments on three language combinations -(Spanish, Catalan, Basque) for sentence-level cross-lingual sentiment -classification and find that our model significantly outperforms -state-of-the-art methods on four out of six experimental setups, as well as -capturing complementary information to machine translation. Our analysis of the -resulting embedding space provides evidence that it represents sentiment -information in the resource-poor target language without any annotated data in -that language. -" -7692,1805.09055,David Vilares and Carlos G\'omez-Rodr\'iguez,Grounding the Semantics of Part-of-Day Nouns Worldwide using Twitter,cs.CL," The usage of part-of-day nouns, such as 'night', and their time-specific -greetings ('good night'), varies across languages and cultures. We show the -possibilities that Twitter offers for studying the semantics of these terms and -its variability between countries. We mine a worldwide sample of multilingual -tweets with temporal greetings, and study how their frequencies vary in -relation with local time. The results provide insights into the semantics of -these temporal expressions and the cultural and sociological factors -influencing their usage. -" -7693,1805.09119,Judith Gaspers and Penny Karanasou and Rajen Chatterjee,"Selecting Machine-Translated Data for Quick Bootstrapping of a Natural - Language Understanding System",cs.CL," This paper investigates the use of Machine Translation (MT) to bootstrap a -Natural Language Understanding (NLU) system for a new language for the use case -of a large-scale voice-controlled device. The goal is to decrease the cost and -time needed to get an annotated corpus for the new language, while still having -a large enough coverage of user requests. Different methods of filtering MT -data in order to keep utterances that improve NLU performance and -language-specific post-processing methods are investigated. These methods are -tested in a large-scale NLU task with translating around 10 millions training -utterances from English to German. The results show a large improvement for -using MT data over a grammar-based and over an in-house data collection -baseline, while reducing the manual effort greatly. Both filtering and -post-processing approaches improve results further. -" -7694,1805.09145,"Matthias Jurisch, Bodo Igler",RDF2Vec-based Classification of Ontology Alignment Changes,cs.CL cs.AI," When ontologies cover overlapping topics, the overlap can be represented -using ontology alignments. These alignments need to be continuously adapted to -changing ontologies. Especially for large ontologies this is a costly task -often consisting of manual work. 
Finding changes that do not lead to an
-adaptation of the alignment can potentially make this process significantly
-easier. This work presents an approach to finding these changes based on RDF
-embeddings and common classification techniques. To examine the feasibility of
-this approach, an evaluation on a real-world dataset is presented. In this
-evaluation, the best classifiers reached a precision of 0.8.
-"
-7695,1805.09197,"No\'e Tits, Kevin El Haddad, Thierry Dutoit",ASR-based Features for Emotion Recognition: A Transfer Learning Approach,eess.AS cs.AI cs.CL cs.SD," During the last decade, the applications of signal processing have
-drastically improved with deep learning. However, areas of affective computing
-such as emotional speech synthesis or emotion recognition from spoken language
-remain challenging. In this paper, we investigate the use of a neural
-Automatic Speech Recognition (ASR) system as a feature extractor for emotion
-recognition. We show that these features outperform the eGeMAPS feature set to
-predict the valence and arousal emotional dimensions, which means that the
-audio-to-text mapping learned by the ASR system contains information related to
-the emotional dimensions in spontaneous speech. We also examine the
-relationship between the first layers (closer to speech) and the last layers
-(closer to text) of the ASR system and valence/arousal.
-"
-7696,1805.09208,"G\'abor Melis, Charles Blundell, Tom\'a\v{s} Ko\v{c}isk\'y, Karl
- Moritz Hermann, Chris Dyer, Phil Blunsom",Pushing the bounds of dropout,stat.ML cs.CL cs.LG," We show that dropout training is best understood as performing MAP estimation
-concurrently for a family of conditional models whose objectives are themselves
-lower bounded by the original dropout objective. This discovery allows us to
-pick any model from this family after training, which leads to a substantial
-improvement on regularisation-heavy language modelling. The family includes
-models that compute a power mean over the sampled dropout masks, and their less
-stochastic subvariants with tighter and higher lower bounds than the fully
-stochastic dropout objective. We argue that since the deterministic
-subvariant's bound is equal to its objective, and the highest amongst these
-models, the predominant view of it as a good approximation to MC averaging is
-misleading. Rather, deterministic dropout is the best available approximation
-to the true objective.
-"
-7697,1805.09209,"Nikolay Arefyev, Pavel Ermolaev, Alexander Panchenko","How much does a word weigh? Weighting word embeddings for word sense
- induction",cs.CL," The paper describes our participation in the first shared task on word sense
-induction and disambiguation for the Russian language RUSSE'2018 (Panchenko et
-al., 2018). For each of several dozen ambiguous words, the participants
-were asked to group text fragments containing it according to the senses of
-this word, which were not provided beforehand; hence the ""induction"" part
-of the task. For instance, a word ""bank"" and a set of text fragments (also
-known as ""contexts"") in which this word occurs, e.g. ""bank is a financial
-institution that accepts deposits"" and ""river bank is a slope beside a body of
-water"" were given. A participant was asked to cluster such contexts into a
-number of clusters, unknown in advance, corresponding to, in this case, the
-""company"" and the ""area"" senses of the word ""bank"".
The organizers proposed
-three evaluation datasets of varying complexity and text genres based
-respectively on texts of Wikipedia, Web pages, and a dictionary of the Russian
-language. We present two experiments: a positive and a negative one, based
-respectively on clustering of contexts represented as a weighted average of
-word embeddings and on machine translation using two state-of-the-art
-production neural machine translation systems. Our team showed the second best
-result on two datasets and the third best result on the remaining dataset
-among 18 participating teams. We managed to substantially outperform
-competitive state-of-the-art baselines from the previous years based on sense
-embeddings.
-"
-7698,1805.09354,"Juan Pavez, H\'ector Allende and H\'ector Allende-Cid","Working Memory Networks: Augmenting Memory Networks with a Relational
- Reasoning Module",cs.CL," In recent years, there has been a lot of interest in achieving some
-kind of complex reasoning using deep neural networks. To do that, models like
-Memory Networks (MemNNs) have combined external memory storages and attention
-mechanisms. These architectures, however, lack more complex reasoning
-mechanisms that could allow, for instance, relational reasoning. Relation
-Networks (RNs), on the other hand, have shown outstanding results in relational
-reasoning tasks. Unfortunately, their computational cost grows quadratically
-with the number of memories, something prohibitive for larger problems. To
-solve these issues, we introduce the Working Memory Network, a MemNN
-architecture with a novel working memory storage and reasoning module. Our
-model retains the relational reasoning abilities of the RN while reducing its
-computational complexity from quadratic to linear. We tested our model on the
-text QA dataset bAbI and the visual QA dataset NLVR. In the jointly trained
-bAbI-10k, we set a new state-of-the-art, achieving a mean error of less than
-0.5%. Moreover, a simple ensemble of two of our models solves all 20 tasks in
-the joint version of the benchmark.
-"
-7699,1805.09355,"Marek Rei, Daniela Gerz, Ivan Vuli\'c","Scoring Lexical Entailment with a Supervised Directional Similarity
- Network",cs.CL cs.LG cs.NE," We present the Supervised Directional Similarity Network (SDSN), a novel
-neural architecture for learning task-specific transformation functions on top
-of general-purpose word embeddings. Relying on only a limited amount of
-supervision from task-specific scores on a subset of the vocabulary, our
-architecture is able to generalise and transform a general-purpose
-distributional vector space to model the relation of lexical entailment.
-Experiments show excellent performance on scoring graded lexical entailment,
-raising the state-of-the-art on the HyperLex dataset by approximately 25%.
-"
-7700,1805.09389,"Hongyu Gong, Suma Bhat, Pramod Viswanath",Embedding Syntax and Semantics of Prepositions via Tensor Decomposition,cs.CL," Prepositions are among the most frequent words in English and play complex
-roles in the syntax and semantics of sentences. Not surprisingly, they pose
-well-known difficulties in automatic processing of sentences (prepositional
-attachment ambiguities and idiosyncratic uses in phrases). Existing methods on
-preposition representation treat prepositions no differently from content words
-(e.g., word2vec and GloVe).
In addition, recent studies aiming at solving
-prepositional attachment and preposition selection problems depend heavily on
-external linguistic resources and use dataset-specific word representations. In
-this paper we use word-triple counts (one of the triples being a preposition)
-to capture a preposition's interaction with its attachment and complement. We
-then derive preposition embeddings via tensor decomposition on a large
-unlabeled corpus. We reveal a new geometry involving Hadamard products and
-empirically demonstrate its utility in paraphrasing phrasal verbs. Furthermore,
-our preposition embeddings are used as simple features in two challenging
-downstream tasks: preposition selection and prepositional attachment
-disambiguation. We achieve results comparable to or better than the
-state-of-the-art on multiple standardized datasets.
-"
-7701,1805.09436,"Sandeep Nallan Chakravarthula, Brian Baucom, Panayiotis Georgiou","Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy
- Dyadic Interactions",cs.CL," Dyadic interactions among humans are marked by speakers continuously
-influencing and reacting to each other in terms of responses and behaviors,
-among others. Understanding how interpersonal dynamics affect behavior is
-important for successful treatment in psychotherapy domains. Traditional
-schemes that automatically identify behavior for this purpose have often looked
-at only the target speaker. In this work, we propose a Markov model of how a
-target speaker's behavior is influenced by their own past behavior as well as
-their perception of their partner's behavior, based on lexical features. Apart
-from incorporating additional potentially useful information, our model can
-also control the degree to which the partner affects the target speaker. We
-evaluate our proposed model on the task of classifying Negative behavior in
-Couples Therapy and show that it is more accurate than the single-speaker
-model. Furthermore, we investigate the degree to which the optimal influence
-relates to how well a couple does in the long term, by relating it to
-relationship outcomes.
-"
-7702,1805.09559,Alexander Kirillov and Natalia Krizhanovsky and Andrew Krizhanovsky,"WSD algorithm based on a new method of vector-word contexts proximity
- calculation via epsilon-filtration",cs.IR cs.CL," The problem of word sense disambiguation (WSD) is considered in the article.
-Given a set of synonyms (synsets) and sentences containing these synonyms, it
-is necessary to select the meaning of the word in the sentence automatically.
-1285 sentences were tagged by experts, namely, one of the dictionary meanings
-was selected by experts for target words. To solve the WSD problem, an
-algorithm based on a new method of vector-word contexts proximity calculation
-is proposed. In order to achieve higher accuracy, a preliminary
-epsilon-filtering of words is performed, both in the sentence and in the set of
-synonyms. An extensive program of experiments was carried out. Four algorithms
-are implemented, including a new algorithm. Experiments have shown that in a
-number of cases the new algorithm shows better results. The developed software
-and the tagged corpus have an open license and are available online. Wiktionary
-and Wikisource are used. A brief description of this work can be viewed in
-slides (https://goo.gl/9ak6Gt). Video lecture in Russian on this research is
-available online (https://youtu.be/-DLmRkepf58).
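The vector-context proximity idea with epsilon-filtration sketched in the abstract above can be illustrated as follows; the toy embeddings, the sense inventory, and eps=0.3 are illustrative assumptions, not the algorithm's actual data or parameters:

# Hedged sketch: average word vectors, drop words whose similarity to the
# context centroid falls below epsilon, and pick the closest sense.
import numpy as np

EMB = {  # toy embeddings: finance-like vs. river-like directions
    "deposit": np.array([1.0, 0.1]), "money": np.array([0.9, 0.0]),
    "river":   np.array([0.0, 1.0]), "water": np.array([0.1, 0.9]),
}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def context_vector(words, eps=0.3):
    vecs = [EMB[w] for w in words if w in EMB]
    centroid = np.mean(vecs, axis=0)
    kept = [v for v in vecs if cos(v, centroid) >= eps]  # epsilon-filtration
    return np.mean(kept, axis=0)

senses = {"bank/finance": ["deposit", "money"], "bank/river": ["river", "water"]}
ctx = context_vector(["money", "deposit"])
best = max(senses, key=lambda s: cos(ctx, context_vector(senses[s])))
print(best)  # bank/finance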
-"
-7703,1805.09590,"Ella Rabinovich, Yulia Tsvetkov, Shuly Wintner",Native Language Cognate Effects on Second Language Lexical Choice,cs.CL," We present a computational analysis of cognate effects on the spontaneous
-linguistic productions of advanced non-native speakers. Introducing a large
-corpus of highly competent non-native English speakers, and using a set of
-carefully selected lexical items, we show that the lexical choices of
-non-natives are affected by cognates in their native language. This effect is
-so powerful that we are able to reconstruct the phylogenetic language tree of
-the Indo-European language family solely from the frequencies of specific
-lexical items in the English of authors with various native languages. We
-quantitatively analyze non-native lexical choice, highlighting cognate
-facilitation as one of the important phenomena shaping the language of
-non-native speakers.
-"
-7704,1805.09644,"Siamak Barzegar, Juliano Efson Sales, Andre Freitas, Siegfried
- Handschuh and Brian Davis",DINFRA: A One Stop Shop for Computing Multilingual Semantic Relatedness,cs.IR cs.CL," This demonstration presents an infrastructure for computing multilingual
-semantic relatedness and correlation for twelve natural languages by using
-three distributional semantic models (DSMs). Our demonstrator, DInfra
-(Distributional Infrastructure), provides researchers and developers with a
-highly useful platform for processing large-scale corpora and conducting
-experiments with distributional semantics. We integrate several multilingual
-DSMs in our webservice so the end user can obtain a result without worrying
-about the complexities involved in building DSMs. Our webservice allows users
-easy access to a wide range of comparisons of DSMs with different parameters.
-In addition, users can configure and access DSM parameters using an
-easy-to-use API.
-"
-7705,1805.09648,"Iurii Chernushenko, Felix A. Gers, Alexander L\""oser, Alessandro
- Checco",Crowd-Labeling Fashion Reviews with Quality Control,cs.CL," We present a new methodology for high-quality labeling in the fashion domain
-with crowd workers instead of experts. We focus on the Aspect-Based Sentiment
-Analysis task. Our methods filter out inaccurate input from crowd workers while
-preserving differing worker labels to capture the inherently high variability
-of the opinions. We demonstrate the quality of the labeled data using
-Facebook's FastText framework as a baseline.
-"
-7706,1805.09655,"Victor Zhong, Caiming Xiong, Richard Socher",Global-Locally Self-Attentive Dialogue State Tracker,cs.CL cs.AI," Dialogue state tracking, which estimates user goals and requests given the
-dialogue context, is an essential part of task-oriented dialogue systems. In
-this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker
-(GLAD), which learns representations of the user utterance and previous system
-actions with global-local modules. Our model uses global modules to share
-parameters between estimators for different types (called slots) of dialogue
-states, and uses local modules to learn slot-specific features. We show that
-this significantly improves tracking of rare states and achieves
-state-of-the-art performance on the WoZ and DSTC2 state tracking tasks. GLAD
-obtains 88.1% joint goal accuracy and 97.1% request accuracy on WoZ,
-outperforming prior work by 3.7% and 5.5%. On DSTC2, our model obtains 74.5%
-joint goal accuracy and 97.5% request accuracy, outperforming prior work by
-1.1% and 1.0%.
-"
-7707,1805.09657,"Dieuwke Hupkes, Anand Singh, Kris Korrel, German Kruszewski, Elia
- Bruni",Learning compositionally through attentive guidance,cs.CL cs.AI cs.LG," While neural network models have been successfully applied to domains that
-require substantial generalisation skills, recent studies have implied that
-they struggle when the task they are trained on requires inferring its
-underlying compositional structure. In this paper, we introduce Attentive
-Guidance, a mechanism to direct a sequence-to-sequence model equipped with
-attention to find more compositional solutions. We test it on two tasks,
-devised precisely to assess the compositional capabilities of neural models,
-and we show that vanilla sequence-to-sequence models with attention overfit the
-training distribution, while the guided versions come up with compositional
-solutions that fit the training and testing distributions almost equally well.
-Moreover, the learned solutions generalise even in cases where the training and
-testing distributions strongly diverge. In this way, we demonstrate that
-sequence-to-sequence models are capable of finding compositional solutions
-without requiring extra components. These results help to disentangle the
-causes for the lack of systematic compositionality in neural networks, which
-can in turn fuel future work.
-"
-7708,1805.09687,"Peter W J Staar, Michele Dolfi, Christoph Auer, Costas Bekas","Corpus Conversion Service: A machine learning platform to ingest
- documents at scale [Poster abstract]",cs.DL cs.CL cs.CV cs.DC cs.IR," Over the past few decades, the volume of scientific articles and technical
-literature has increased exponentially. Consequently, there is a great
-need for systems that can ingest these documents at scale and make their
-content discoverable. Unfortunately, both the format of these documents (e.g.
-the PDF format or bitmap images) and the presentation of the data (e.g.
-complex tables) make the extraction of qualitative and quantitative data
-extremely challenging. We present a platform to ingest documents at scale which
-is powered by Machine Learning techniques and allows the user to train custom
-models on document collections. We show precision/recall results greater than
-97% with regard to conversion to structured formats, as well as scaling
-evidence for each of the microservices constituting the platform.
-"
-7709,1805.09701,"Pan Lu, Lei Ji, Wei Zhang, Nan Duan, Ming Zhou, Jianyong Wang","R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual
- Question Answering",cs.CV cs.AI cs.CL cs.LG cs.MM," Recently, Visual Question Answering (VQA) has emerged as one of the most
-significant tasks in multimodal learning as it requires understanding both
-visual and textual modalities. Existing methods mainly rely on extracting image
-and question features to learn their joint feature embedding via multimodal
-fusion or attention mechanisms. Some recent studies utilize external
-VQA-independent models to detect candidate entities or attributes in images,
-which serve as semantic knowledge complementary to the VQA task. However, these
-candidate entities or attributes might be unrelated to the VQA task and have
-limited semantic capacities. To better utilize semantic knowledge in images, we
-propose a novel framework to learn visual relation facts for VQA.
Specifically,
-we build up a Relation-VQA (R-VQA) dataset based on the Visual Genome dataset
-via a semantic similarity module, in which each instance consists of an image,
-a corresponding question, a correct answer and a supporting relation fact. A
-well-defined relation detector is then adopted to predict visual
-question-related relation facts. We further propose a multi-step attention
-model composed of visual attention and semantic attention sequentially to
-extract related visual knowledge and semantic knowledge. We conduct
-comprehensive experiments on two benchmark datasets, demonstrating that our
-model achieves state-of-the-art performance and verifying the benefit of
-considering visual relation facts.
-"
-7710,1805.09746,"Suraj Maharjan, Sudipta Kar, Manuel Montes-y-Gomez, Fabio A. Gonzalez,
- Thamar Solorio","Letting Emotions Flow: Success Prediction by Modeling the Flow of
- Emotions in Books",cs.CL," Books have the power to make us feel happiness, sadness, pain, surprise, or
-sorrow. An author's dexterity in the use of these emotions captivates readers
-and makes it difficult for them to put the book down. In this paper, we model
-the flow of emotions over a book using recurrent neural networks and quantify
-its usefulness in predicting success in books. We obtained the best weighted
-F1-score of 69% for predicting books' success in a multitask setting
-(simultaneously predicting success and genre of books).
-"
-7711,1805.09772,"Graham Bleaney, Matthew Kuzyk, Julian Man, Hossein Mayanloo,
- H.R.Tizhoosh",Auto-Detection of Safety Issues in Baby Products,cs.LG cs.CL cs.IR stat.ML," Every year, thousands of people receive consumer product-related injuries.
-Research indicates that online customer reviews can be processed to
-autonomously identify product safety issues. Early identification of safety
-issues can lead to earlier recalls, and thus fewer injuries and deaths. A
-dataset of product reviews from Amazon.com was compiled, along with
-\emph{SaferProducts.gov} complaints and recall descriptions from the Consumer
-Product Safety Commission (CPSC) and European Commission Rapid Alert system. A
-system was built to clean the collected text and to extract relevant features.
-Dimensionality reduction was performed by computing feature relevance through a
-Random Forest and discarding features with low information gain. Various
-classifiers were analyzed, including Logistic Regression, SVMs,
-Na{\""i}ve-Bayes, Random Forests, and an Ensemble classifier. Experimentation
-with various features and classifier combinations resulted in a logistic
-regression model with 66\% precision in the top 50 reviews surfaced. This
-classifier outperforms all benchmarks set by related literature and consumer
-product safety professionals.
-"
-7712,1805.09780,"Abhirut Gupta, Abhay Khosla, Gautam Singh, Gargi Dasgupta",Mining Procedures from Technical Support Documents,cs.AI cs.CL cs.IR," Guided troubleshooting is an inherent task in the domain of technical support
-services. When a customer experiences an issue with the functioning of a
-technical service or a product, an expert user helps guide the customer through
-a set of steps comprising a troubleshooting procedure. The objective is to
-identify the source of the problem through a set of diagnostic steps and
-observations, and arrive at a resolution. Procedures containing these sets of
-diagnostic steps and observations in response to different problems are common
-artifacts in the body of technical support documentation.
The ability to use
-machine learning and linguistics to understand and leverage these procedures
-for applications like intelligent chatbots or robotic process automation is
-crucial. Existing research on question answering or intelligent chatbots does
-not look within procedures or understand them deeply. In this paper, we outline
-a system for mining procedures from technical support documents. We create
-models for solving important subproblems like extracting procedures,
-identifying decision points within procedures, identifying blocks of
-instructions corresponding to these decision points and mapping instructions
-within a decision block. We also release a dataset containing our manual
-annotations on publicly available support documents, to promote further
-research on the problem.
-"
-7713,1805.09821,Holger Schwenk and Xian Li,A Corpus for Multilingual Document Classification in Eight Languages,cs.CL," Cross-lingual document classification aims at training a document classifier
-on resources in one language and transferring it to a different language
-without any additional resources. Several approaches have been proposed in the
-literature and the current best practice is to evaluate them on a subset of the
-Reuters Corpus Volume 2. However, this subset covers only a few languages
-(English, German, French and Spanish) and almost all published works focus on
-the transfer between English and German. In addition, we have observed that
-the class prior distributions differ significantly between the languages. We
-argue that this complicates the evaluation of multilinguality. In this
-paper, we propose a new subset of the Reuters corpus with balanced class priors
-for eight languages. By adding Italian, Russian, Japanese and Chinese, we cover
-languages which are very different with respect to syntax, morphology, etc. We
-provide strong baselines for all language transfer directions using
-multilingual word and sentence embeddings respectively. Our goal is to offer a
-freely available framework to evaluate cross-lingual document classification,
-and we hope by these means to foster research in this important area.
-"
-7714,1805.09822,Holger Schwenk,Filtering and Mining Parallel Data in a Joint Multilingual Space,cs.CL cs.AI," We learn a joint multilingual sentence embedding and use the distance between
-sentences in different languages to filter noisy parallel data and to mine for
-parallel data in large news collections. We are able to improve a competitive
-baseline on the WMT'14 English to German task by 0.3 BLEU by filtering out 25%
-of the training data. The same approach is used to mine additional bitexts for
-the WMT'14 system and to obtain competitive results on the BUCC shared task to
-identify parallel sentences in comparable corpora. The approach is generic, it
-can be applied to many language pairs and it is independent of the architecture
-of the machine translation system.
-"
-7715,1805.09843,"Dinghan Shen, Guoyin Wang, Wenlin Wang, Martin Renqiang Min, Qinliang
- Su, Yizhe Zhang, Chunyuan Li, Ricardo Henao, Lawrence Carin","Baseline Needs More Love: On Simple Word-Embedding-Based Models and
- Associated Pooling Mechanisms",cs.CL cs.AI cs.LG," Many deep learning architectures have been proposed to model the
-compositionality in text sequences, requiring a substantial number of
-parameters and expensive computations. However, there has not been a rigorous
-evaluation regarding the added value of sophisticated compositional functions.
-In this paper, we conduct a point-by-point comparative study between Simple
-Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling
-operations, and word-embedding-based RNN/CNN models. Surprisingly,
-SWEMs exhibit comparable or even superior performance in the majority of cases
-considered. Based upon this understanding, we propose two additional pooling
-strategies over learned word embeddings: (i) a max-pooling operation for
-improved interpretability; and (ii) a hierarchical pooling operation, which
-preserves spatial (n-gram) information within text sequences. We present
-experiments on 17 datasets encompassing three tasks: (i) (long) document
-classification; (ii) text sequence matching; and (iii) short text tasks,
-including classification and tagging. The source code and datasets can be
-obtained from https://github.com/dinghanshen/SWEM.
-"
-7716,1805.09863,"Hieu Hoang, Tomasz Dwojak, Rihards Krislauks, Daniel Torregrosa,
- Kenneth Heafield",Fast Neural Machine Translation Implementation,cs.CL," This paper describes the submissions to the efficiency track for GPUs at the
-Workshop for Neural Machine Translation and Generation by members of the
-University of Edinburgh, Adam Mickiewicz University, Tilde and University of
-Alicante. We focus on efficient implementation of the recurrent deep-learning
-model as implemented in Amun, the fast inference engine for neural machine
-translation. We improve the performance with an efficient mini-batching
-algorithm, and by fusing the softmax operation with the k-best extraction
-algorithm. Submissions using Amun were first, second and third fastest in the
-GPU efficiency track.
-"
-7717,1805.09906,"Xinyuan Zhang, Yitong Li, Dinghan Shen, Lawrence Carin",Diffusion Maps for Textual Network Embedding,cs.CL cs.SI stat.ML," Textual network embedding leverages rich text information associated with the
-network to learn low-dimensional vectorial representations of vertices. Rather
-than using typical natural language processing (NLP) approaches, recent
-research exploits the relationship of texts on the same edge to graphically
-embed text. However, these models neglect to measure the complete level of
-connectivity between any two texts in the graph. We present diffusion maps for
-textual network embedding (DMTE), integrating global structural information of
-the graph to capture the semantic relatedness between texts, with a
-diffusion-convolution operation applied on the text inputs. In addition, a new
-objective function is designed to efficiently preserve the high-order proximity
-using the graph diffusion. Experimental results show that the proposed approach
-outperforms state-of-the-art methods on the vertex-classification and
-link-prediction tasks.
-"
-7718,1805.09927,"Pengda Qin, Weiran Xu, William Yang Wang","Robust Distant Supervision Relation Extraction via Deep Reinforcement
- Learning",cs.CL," Distant supervision has become the standard method for relation extraction.
-However, even though it is an efficient method, it does not come without
-cost: the resulting distantly supervised training samples are often very noisy.
-To combat the noise, most of the recent state-of-the-art approaches focus on
-selecting the one-best sentence or calculating soft attention weights over the
-set of the sentences of one specific entity pair. However, these methods are
-suboptimal, and the false positive problem is still a key bottleneck
-for performance.
We argue that those incorrectly-labeled candidate
-sentences must be treated with a hard decision, rather than being handled with
-soft attention weights. To do this, our paper describes a radical solution: we
-explore a deep reinforcement learning strategy to generate the false-positive
-indicator, where we automatically recognize false positives for each relation
-type without any supervised information. Unlike the removal operation in the
-previous studies, we redistribute them into the negative examples. The
-experimental results show that the proposed strategy significantly improves the
-performance of distant supervision compared to state-of-the-art systems.
-"
-7719,1805.09929,"Pengda Qin, Weiran Xu, William Yang Wang","DSGAN: Generative Adversarial Training for Distant Supervision Relation
- Extraction",cs.CL," Distant supervision can effectively label data for relation extraction, but
-suffers from the noisy labeling problem. Recent works mainly perform soft
-bag-level noise reduction strategies to find the relatively better samples in a
-sentence bag, which is suboptimal compared with making hard decisions about
-false positive samples at the sentence level. In this paper, we introduce an
-adversarial learning framework, which we named DSGAN, to learn a sentence-level
-true-positive generator. Inspired by Generative Adversarial Networks, we regard
-the positive samples generated by the generator as the negative samples to
-train the discriminator. The optimal generator is obtained when the
-discrimination ability of the discriminator shows the greatest decline. We
-adopt the generator to filter the distant supervision training dataset and
-redistribute the false positive instances into the negative set, thereby
-providing a cleaned dataset for relation classification. The experimental
-results show that the proposed strategy significantly improves the performance
-of distant supervision relation extraction compared to state-of-the-art
-systems.
-"
-7720,1805.09959,"Eric M. Clark, Ted James, Chris A. Jones, Amulya Alapati, Promise
- Ukandu, Christopher M. Danforth, Peter Sheridan Dodds","A Sentiment Analysis of Breast Cancer Treatment Experiences and
- Healthcare Perceptions Across Twitter",cs.CL cs.SI," Background: Social media has the capacity to provide the healthcare industry
-with valuable feedback from patients who reveal and express their medical
-decision-making process, as well as self-reported quality of life indicators
-both during and post treatment. In prior work, [Crannell et al.], we have
-studied an active cancer patient population on Twitter and compiled a set of
-tweets describing their experience with this disease. We refer to these online
-public testimonies as ""Invisible Patient Reported Outcomes"" (iPROs), because
-they carry relevant indicators, yet are difficult to capture by conventional
-means of self-report. Methods: Our present study aims to identify tweets
-related to the patient experience as an additional informative tool for
-monitoring public health. Using Twitter's public streaming API, we compiled
-over 5.3 million ""breast cancer"" related tweets spanning September 2016 until
-mid-December 2017. We combined supervised machine learning methods with natural
-language processing to sift tweets relevant to breast cancer patient
-experiences. We analyzed a sample of 845 breast cancer patient and survivor
-accounts, responsible for over 48,000 posts.
We investigated tweet content with
-a hedonometric sentiment analysis to quantitatively extract emotionally charged
-topics. Results: We found that positive experiences were shared regarding
-patient treatment, raising support, and spreading awareness. Further
-discussions related to healthcare were prevalent and largely negative, focusing
-on fear of political legislation that could result in loss of coverage.
-Conclusions: Social media can provide a positive outlet for patients to discuss
-their needs and concerns regarding their healthcare coverage and treatment.
-Capturing iPROs from online communication can help inform healthcare
-professionals and lead to more connected and personalized treatment regimens.
-"
-7721,1805.09960,"Yang Zhao, Yining Wang, Jiajun Zhang and Chengqing Zong",Phrase Table as Recommendation Memory for Neural Machine Translation,cs.CL," Neural Machine Translation (NMT) has recently drawn much attention due to its
-promising translation performance. However, several studies indicate
-that NMT often generates fluent but unfaithful translations. In this paper, we
-propose a method to alleviate this problem by using a phrase table as
-recommendation memory. The main idea is to add a bonus to words worthy of
-recommendation, so that NMT can make correct predictions. Specifically, we
-first derive a prefix tree to accommodate all the candidate target phrases by
-searching the phrase translation table according to the source sentence. Then,
-we construct a recommendation word set by matching between candidate target
-phrases and previously translated target words by NMT. After that, we determine
-the specific bonus value for each recommendable word by using the attention
-vector and phrase translation probability. Finally, we integrate this bonus
-value into NMT to improve the translation results. Extensive experiments
-demonstrate that the proposed method obtains remarkable improvements over
-strong attention-based NMT.
-"
-7722,1805.09991,"Hu Xu, Bing Liu, Lei Shu, Philip S. Yu",Lifelong Domain Word Embedding via Meta-Learning,cs.CL," Learning high-quality domain word embeddings is important for achieving good
-performance in many NLP tasks. General-purpose embeddings trained on
-large-scale corpora are often sub-optimal for domain-specific applications.
-However, domain-specific tasks often do not have large in-domain corpora for
-training high-quality domain embeddings. In this paper, we propose a novel
-lifelong learning setting for domain embedding. That is, when performing the
-new domain embedding, the system has seen many past domains, and it tries to
-expand the new in-domain corpus by exploiting the corpora from the past domains
-via meta-learning. The proposed meta-learner characterizes the similarities of
-the contexts of the same word in many domain corpora, which helps retrieve
-relevant data from the past domains to expand the new domain corpus.
-Experimental results show that domain embeddings produced from such a process
-improve the performance of the downstream tasks.
-"
-7723,1805.10047,"Michiki Kurosawa, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi",Japanese Predicate Conjugation for Neural Machine Translation,cs.CL," Neural machine translation (NMT) has a drawback in that it can generate only
-high-frequency words owing to the computational costs of the softmax function
-in the output layer.
- In Japanese-English NMT, Japanese predicate conjugation causes an increase in
-vocabulary size.
For example, one verb can have as many as 19 surface
-forms. In this research, we focus on predicate conjugation for compressing the
-vocabulary size in Japanese. The vocabulary list is filled with the various
-conjugated forms of verbs. We propose methods using predicate conjugation
-information without discarding linguistic information. The proposed methods can
-generate low-frequency words and deal with unknown words. Two methods were
-considered to introduce conjugation information: the first treats it as a token
-(conjugation token) and the second treats it as an embedded vector (conjugation
-feature).
- The results using these methods demonstrate that the vocabulary size can be
-compressed by approximately 86.1% (Tanaka corpus) and that the NMT models can
-output words not in the training data set. Furthermore, BLEU scores improved by
-0.91 points in Japanese-to-English translation, and 0.32 points in
-English-to-Japanese translation with ASPEC.
-"
-7724,1805.10163,"Elena Voita, Pavel Serdyukov, Rico Sennrich, Ivan Titov",Context-Aware Neural Machine Translation Learns Anaphora Resolution,cs.CL," Standard machine translation systems process sentences in isolation and hence
-ignore extra-sentential information, even though extended context can both
-prevent mistakes in ambiguous cases and improve translation coherence. We
-introduce a context-aware neural machine translation model designed in such a
-way that the flow of information from the extended context to the translation
-model can be controlled and analyzed. We experiment with an English-Russian
-subtitles dataset, and observe that much of what is captured by our model deals
-with improving pronoun translation. We measure correspondences between induced
-attention distributions and coreference relations and observe that the model
-implicitly captures anaphora. This is consistent with gains for sentences where
-pronouns need to be gendered in translation. Besides improvements in anaphoric
-cases, the model also improves in overall BLEU, both over its context-agnostic
-version (+0.7) and over simple concatenation of the context and source
-sentences (+0.6).
-"
-7725,1805.10187,"Yuki Kawara, Chenhui Chu, Yuki Arase","Recursive Neural Network Based Preordering for English-to-Japanese
- Machine Translation",cs.CL," The word order between source and target languages significantly influences
-the translation quality in machine translation. Preordering can effectively
-address this problem. Previous preordering methods require manual feature
-design, making language-dependent design costly. In this paper, we propose a
-preordering method with a recursive neural network that learns features from
-raw inputs. Experiments show that the proposed method achieves gains in
-translation quality comparable to the state-of-the-art method, but without
-manual feature design.
-"
-7726,1805.10190,"Alice Coucke, Alaa Saade, Adrien Ball, Th\'eodore Bluche, Alexandre
- Caulier, David Leroy, Cl\'ement Doumouro, Thibault Gisselbrecht, Francesco
- Caltagirone, Thibaut Lavril, Ma\""el Primet, Joseph Dureau","Snips Voice Platform: an embedded Spoken Language Understanding system
- for private-by-design voice interfaces",cs.CL cs.NE," This paper presents the machine learning architecture of the Snips Voice
-Platform, a software solution to perform Spoken Language Understanding on
-microprocessors typical of IoT devices. The embedded inference is fast and
-accurate while enforcing privacy by design, as no personal user data is ever
-collected.
Focusing on Automatic Speech Recognition and Natural Language
-Understanding, we detail our approach to training high-performance Machine
-Learning models that are small enough to run in real time on small devices.
-Additionally, we describe a data generation procedure that provides sufficient,
-high-quality training data without compromising user privacy.
-"
-7727,1805.10209,Alane Suhr and Yoav Artzi,"Situated Mapping of Sequential Instructions to Actions with Single-step
- Reward Observation",cs.CL," We propose a learning approach for mapping context-dependent sequential
-instructions to actions. We address the problem of discourse and state
-dependencies with an attention-based model that considers both the history of
-the interaction and the state of the world. To train from start and goal states
-without access to demonstrations, we propose SESTRA, a learning algorithm that
-takes advantage of single-step reward observations and immediate expected
-reward maximization. We evaluate on the SCONE domains, and show absolute
-accuracy improvements of 9.8%-25.3% across the domains over approaches that use
-high-level logical representations.
-"
-7728,1805.10254,Xinyu Hua and Lu Wang,Neural Argument Generation Augmented with Externally Retrieved Evidence,cs.CL," High quality arguments are essential elements for human reasoning and
-decision-making processes. However, effective argument construction is a
-challenging task for both humans and machines. In this work, we study a novel
-task of automatically generating arguments of a different stance for a given
-statement. We propose an encoder-decoder style neural network-based argument
-generation model enriched with externally retrieved evidence from Wikipedia.
-Our model first generates a set of talking point phrases as an intermediate
-representation, followed by a separate decoder producing the final argument
-based on both the input and the keyphrases. Experiments on a large-scale
-dataset collected from Reddit show that our model constructs arguments with
-more topic-relevant content than a popular sequence-to-sequence generation
-model according to both automatic evaluation and human assessments.
-"
-7729,1805.10267,Shuning Jin and Ted Pedersen,"Duluth UROP at SemEval-2018 Task 2: Multilingual Emoji Prediction with
- Ensemble Learning and Oversampling",cs.CL," This paper describes the Duluth UROP systems that participated in
-SemEval-2018 Task 2, Multilingual Emoji Prediction. We relied on a variety of
-ensembles made up of classifiers using Naive Bayes, Logistic Regression, and
-Random Forests. We used unigram and bigram features and tried to offset the
-skewness of the data through the use of oversampling. Our task evaluation
-results place us 19th of 48 systems in the English evaluation, and 5th of 21 in
-the Spanish. After the evaluation we realized that some simple changes to
-preprocessing could significantly improve our results. After making these
-changes we attained results that would have placed us sixth in the English
-evaluation, and second in the Spanish.
-"
-7730,1805.10271,Arshia Z. Hassan and Manikya S. Vallabhajosyula and Ted Pedersen,"UMDuluth-CS8761 at SemEval-2018 Task 9: Hypernym Discovery using Hearst
- Patterns, Co-occurrence frequencies and Word Embeddings",cs.CL," Hypernym Discovery is the task of identifying potential hypernyms for a given
-term. A hypernym is a more generalized word that is super-ordinate to more
-specific words.
This paper explores several approaches that rely on
-co-occurrence frequencies of word pairs, Hearst Patterns based on regular
-expressions, and word embeddings created from the UMBC corpus. Our system
-Babbage participated in Subtask 1A for English and placed 6th of 19 systems
-when identifying concept hypernyms, and 12th of 18 systems for entity
-hypernyms.
-"
-7731,1805.10274,Zhenduo Wang and Ted Pedersen,"UMDSub at SemEval-2018 Task 2: Multilingual Emoji Prediction
- Multi-channel Convolutional Neural Network on Subword Embedding",cs.CL," This paper describes the UMDSub system that participated in Task 2 of
-SemEval-2018. We developed a system that predicts an emoji given the raw text
-in a English tweet. The system is a Multi-channel Convolutional Neural Network
-based on subword embeddings for the representation of tweets. This model
-improves on character- or word-based methods by about 2\%. Our system placed
-21st of 48 participating systems in the official evaluation.
-"
-7732,1805.10338,"Lierni Sestorain and Massimiliano Ciaramita and Christian Buck and
- Thomas Hofmann",Zero-Shot Dual Machine Translation,cs.CL cs.NE," Neural Machine Translation (NMT) systems rely on large amounts of parallel
-data. This is a major challenge for low-resource languages. Building on recent
-work on unsupervised and semi-supervised methods, we present an approach that
-combines zero-shot and dual learning. The latter relies on reinforcement
-learning, to exploit the duality of the machine translation task, and requires
-only monolingual data for the target language pair. Experiments show that a
-zero-shot dual system, trained on English-French and English-Spanish,
-outperforms by large margins a standard NMT system in zero-shot translation
-performance on Spanish-French (both directions). The zero-shot dual method
-approaches the performance, within 2.2 BLEU points, of a comparable supervised
-setting. Our method can obtain improvements also in the setting where a small
-amount of parallel data for the zero-shot language pair is available. When we
-add Russian, extending our experiments to jointly model 6 zero-shot translation
-directions, all directions improve by between 4 and 15 BLEU points, again
-reaching performance near that of the supervised setting.
-"
-7733,1805.10364,"Hojjat Aghakhani, Aravind Machiry, Shirin Nilizadeh, Christopher
- Kruegel, and Giovanni Vigna",Detecting Deceptive Reviews using Generative Adversarial Networks,cs.CR cs.AI cs.CL cs.IR cs.LG," In the past few years, consumer review sites have become the main target of
-deceptive opinion spam, where fictitious opinions or reviews are deliberately
-written to sound authentic. Most of the existing work on detecting deceptive
-reviews focuses on building supervised classifiers based on syntactic and
-lexical patterns of an opinion. With the successful use of Neural Networks in
-various classification applications, in this paper we propose FakeGAN, a system
-that for the first time augments and adopts Generative Adversarial Networks
-(GANs) for a text classification task, in particular, detecting deceptive
-reviews. Unlike standard GAN models which have a single Generator and
-Discriminator model, FakeGAN uses two discriminator models and one generative
-model. The generator is modeled as a stochastic policy agent in reinforcement
-learning (RL), and the discriminators use a Monte Carlo search algorithm to
-estimate and pass the intermediate action-value as the RL reward to the
-generator.
Providing
-the generator model with two discriminator models avoids the mode collapse
-issue by learning from both distributions of truthful and deceptive reviews.
-Indeed, our experiments show that using two discriminators provides FakeGAN
-with high stability, whereas instability is a known issue for GAN
-architectures. While FakeGAN is built upon a semi-supervised classifier, known
-for lower accuracy, our evaluation results on a dataset of TripAdvisor hotel
-reviews show the same performance in terms of accuracy as state-of-the-art
-approaches that apply supervised machine learning. These results indicate that
-GANs can be effective for text classification tasks. Specifically, FakeGAN is
-effective at detecting deceptive reviews.
-"
-7734,1805.10387,"Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason
- Li, Huyen Nguyen, Carl Case, Paulius Micikevicius",Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq,cs.CL," We present OpenSeq2Seq - a TensorFlow-based toolkit for training
-sequence-to-sequence models that features distributed and mixed-precision
-training. Benchmarks on machine translation and speech recognition tasks show
-that models built using OpenSeq2Seq give state-of-the-art performance at 1.5-3x
-less training time. OpenSeq2Seq currently provides building blocks for models
-that solve a wide range of tasks including neural machine translation,
-automatic speech recognition, and speech synthesis.
-"
-7735,1805.10389,"Kristjan Arumae, Guo-Jun Qi, Fei Liu","A Study of Question Effectiveness Using Reddit ""Ask Me Anything"" Threads",cs.CL," Asking effective questions is a powerful social skill. In this paper we seek
-to build computational models that learn to discriminate effective questions
-from ineffective ones. Armed with such a capability, future advanced systems
-can evaluate the quality of questions and provide suggestions for effective
-question wording. We create a large-scale, real-world dataset that contains
-over 400,000 questions collected from Reddit ""Ask Me Anything"" threads. Each
-thread resembles an online press conference where questions compete with each
-other for attention from the host. This dataset enables the development of a
-class of computational models for predicting whether a question will be
-answered. We develop a new convolutional neural network architecture with
-variable-length context and demonstrate the efficacy of the model by comparing
-it with state-of-the-art baselines and human judges.
-"
-7736,1805.10390,"Sansiri Tarnpradab, Fei Liu, Kien A. Hua","Toward Extractive Summarization of Online Forum Discussions via
- Hierarchical Attention Networks",cs.CL," Forum threads are lengthy and rich in content. Concise thread summaries will
-benefit both newcomers seeking information and those who participate in the
-discussion. Few studies, however, have examined the task of forum thread
-summarization. In this work we make the first attempt to adapt the hierarchical
-attention networks for thread summarization. The model draws on the recent
-development of neural attention mechanisms to build sentence and thread
-representations and use them for summarization. Our results indicate that the
-proposed approach can outperform a range of competitive baselines. Further, a
-redundancy removal step is crucial for achieving outstanding results.
-"
-7737,1805.10392,"Kristjan Arumae, Fei Liu",Reinforced Extractive Summarization with Question-Focused Rewards,cs.CL," We investigate a new training paradigm for extractive summarization.
-Traditionally, human abstracts are used to derive gold-standard labels for
-extraction units. However, the labels are often inaccurate, because human
-abstracts and source documents cannot be easily aligned at the word level. In
-this paper we convert human abstracts to a set of Cloze-style comprehension
-questions. System summaries are encouraged to preserve salient source content
-useful for answering questions and share common words with the abstracts. We
-use reinforcement learning to explore the space of possible extractive
-summaries and introduce a question-focused reward function to promote concise,
-fluent, and informative summaries. Our experiments show that the proposed
-method is effective. It surpasses state-of-the-art systems on the standard
-summarization dataset.
-"
-7738,1805.10393,"Fei Liu, Nicole Lee Fella, Kexin Liao","Modeling Language Vagueness in Privacy Policies using Deep Neural
- Networks",cs.CL," Website privacy policies are too long to read and difficult to understand.
-The over-sophisticated language makes privacy notices less effective than
-they should be. People become even less willing to share their personal
-information when they perceive the privacy policy as vague. This paper focuses
-on decoding vagueness from a natural language processing perspective. While
-thoroughly identifying the vague terms and their linguistic scope remains an
-elusive challenge, in this work we seek to learn vector representations of
-words in privacy policies using deep neural networks. The vector
-representations are fed to an interactive visualization tool (LSTMVis) to test
-their ability to discover syntactically and semantically related vague terms.
-The approach holds promise for modeling and understanding language vagueness.
-"
-7739,1805.10395,"Wencan Luo, Fei Liu, Zitao Liu, Diane Litman",Automatic Summarization of Student Course Feedback,cs.CL," Student course feedback is generated daily in both classrooms and online
-course discussion forums. Traditionally, instructors manually analyze these
-responses in a costly manner. In this work, we propose a new approach to
-summarizing student course feedback based on the integer linear programming
-(ILP) framework. Our approach allows different student responses to share
-co-occurrence statistics and alleviates sparsity issues. Experimental results
-on a student feedback corpus show that our approach outperforms a range of
-baselines in terms of both ROUGE scores and human evaluation.
-"
-7740,1805.10396,"Wencan Luo, Fei Liu, Diane Litman","An Improved Phrase-based Approach to Annotating and Summarizing Student
- Course Responses",cs.CL," Teaching large classes remains a great challenge, primarily because it is
-difficult to attend to all the student needs in a timely manner. Automatic text
-summarization systems can be leveraged to summarize the student feedback,
-submitted immediately after each lecture, but what makes a good summary of
-student responses remains to be discovered. In this work we explore a new
-methodology that effectively extracts summary phrases from the student
-responses. Each phrase is tagged with the number of students who raise the
-issue.
The phrases are evaluated along two dimensions: with respect to text
-content, they should be informative and well-formed, measured by the ROUGE
-metric; additionally, they should attend to the most pressing student needs,
-measured by a newly proposed metric. This work is enabled by a phrase-based
-annotation and highlighting scheme, which is new to the summarization task. The
-phrase-based framework allows us to summarize the student responses into a set
-of bullet points and present them to the instructor promptly.
-"
-7741,1805.10399,"Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, Noah A. Smith",Toward Abstractive Summarization Using Semantic Representations,cs.CL," We present a novel abstractive summarization framework that draws on the
-recent development of a treebank for the Abstract Meaning Representation (AMR).
-In this framework, the source text is parsed to a set of AMR graphs, the graphs
-are transformed into a summary graph, and then text is generated from the
-summary graph. We focus on the graph-to-graph transformation that reduces the
-source semantic graph into a summary graph, making use of an existing AMR
-parser and assuming the eventual availability of an AMR-to-text generator. The
-framework is data-driven, trainable, and not specifically designed for a
-particular domain. Experiments on gold-standard AMR annotations and system
-parses show promising results. Code is available at:
-https://github.com/summarization
-"
-7742,1805.10414,Wangjin Lee and Jinwook Choi,"Connecting Distant Entities with Induction through Conditional Random
- Fields for Named Entity Recognition: Precursor-Induced CRF",cs.CL," This paper presents a method of designing a specific high-order dependency
-factor on the linear-chain conditional random fields (CRFs) for named entity
-recognition (NER). Named entities tend to be separated from each other by
-multiple outside tokens in a text, and thus the first-order CRF, as well as the
-second-order CRF, may innately lose transition information between distant
-named entities. The proposed design uses the outside label in NER as a
-transmission medium for preceding entity information on the CRF. Empirical
-results demonstrate that it is possible to exploit long-distance label
-dependencies in the original first-order linear-chain CRF structure for NER,
-at a lower computational cost than the second-order CRF.
-"
-7743,1805.10465,"Zhuosheng Zhang, Jiangtong Li, Hai Zhao, Bingjie Tang","SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term
- Embeddings",cs.CL," This paper describes a hypernym discovery system for our participation in
-SemEval-2018 Task 9, which aims to discover the best (set of) candidate
-hypernyms for input concepts or entities, given the search space of a
-pre-defined vocabulary. We introduce a neural network architecture for the
-concerned task and empirically study various neural network models to build the
-representations in latent space for words and phrases. The evaluated models
-include convolutional neural network, long-short term memory network, gated
-recurrent unit and recurrent convolutional neural network. We also explore
-different embedding methods, including word embedding and sense embedding, for
-better performance.
-"
-7744,1805.10528,"Reza Ghaeini, Xiaoli Z. Fern, Hamed Shahbazi, Prasad Tadepalli",Dependent Gated Reading for Cloze-Style Question Answering,cs.CL cs.AI," We present a novel deep learning architecture to address the cloze-style
-question answering task.
Existing approaches employ reading mechanisms that do
-not fully exploit the interdependency between the document and the query. In
-this paper, we propose a novel \emph{dependent gated reading} bidirectional GRU
-network (DGR) to efficiently model the relationship between the document and
-the query during encoding and decision making. Our evaluation shows that DGR
-obtains highly competitive performance on well-known machine comprehension
-benchmarks such as the Children's Book Test (CBT-NE and CBT-CN) and Who Did
-What (WDW, Strict and Relaxed). Finally, we extensively analyze and validate
-our model through ablation and attention studies.
-"
-7745,1805.10547,"Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency",Using Syntax to Ground Referring Expressions in Natural Images,cs.CV cs.CL cs.NE," We introduce GroundNet, a neural network for referring expression recognition
--- the task of localizing (or grounding) in an image the object referred to by
-a natural language expression. Our approach to this task is the first to rely
-on a syntactic analysis of the input referring expression in order to inform
-the structure of the computation graph. Given a parse tree for an input
-expression, we explicitly map the syntactic constituents and relationships
-present in the tree to a composed graph of neural modules that defines our
-architecture for performing localization. This syntax-based approach aids
-localization of \textit{both} the target object and auxiliary supporting
-objects mentioned in the expression. As a result, GroundNet is more
-interpretable than previous methods: we can (1) determine which phrase of the
-referring expression points to which object in the image and (2) track how the
-localization of the target object is determined by the network. We study this
-property empirically by introducing a new set of annotations on the GoogleRef
-dataset to evaluate localization of supporting objects. Our experiments show
-that GroundNet achieves state-of-the-art accuracy in identifying supporting
-objects, while maintaining comparable performance in the localization of target
-objects.
-"
-7746,1805.10564,Rajarshi Bhowmik and Gerard de Melo,Generating Fine-Grained Open Vocabulary Entity Type Descriptions,cs.CL," While large-scale knowledge graphs provide vast amounts of structured facts
-about entities, a short textual description can often be useful to succinctly
-characterize an entity and its type. Unfortunately, many knowledge graph
-entities lack such textual descriptions. In this paper, we introduce a dynamic
-memory-based network that generates a short open vocabulary description of an
-entity by jointly leveraging induced fact embeddings as well as the dynamic
-context of the generated sequence of words. We demonstrate the ability of our
-architecture to discern relevant information for more accurate generation of
-type descriptions by pitting the system against several strong baselines.
-"
-7747,1805.10586,Dat Quoc Nguyen and Karin Verspoor,"Convolutional neural networks for chemical-disease relation extraction
- are improved with character-based word embeddings",cs.CL," We investigate the incorporation of character-based word representations into
-a standard CNN-based relation extraction model. We experiment with two common
-neural architectures, CNN and LSTM, to learn word vector representations from
-character embeddings.
Through a task on the BioCreative-V CDR corpus,
-extracting relationships between chemicals and diseases, we show that models
-exploiting the character-based word representations improve on models that do
-not use this information, obtaining state-of-the-art results relative to
-previous neural approaches.
-"
-7748,1805.10627,"Julia Kreutzer, Joshua Uyheng, Stefan Riezler","Reliability and Learnability of Human Bandit Feedback for
- Sequence-to-Sequence Reinforcement Learning",cs.CL stat.ML," We present a study on reinforcement learning (RL) from human bandit feedback
-for sequence-to-sequence learning, exemplified by the task of bandit neural
-machine translation (NMT). We investigate the reliability of human bandit
-feedback, and analyze the influence of reliability on the learnability of a
-reward estimator, and the effect of the quality of reward estimates on the
-overall RL task. Our analysis of cardinal (5-point ratings) and ordinal
-(pairwise preferences) feedback shows that their intra- and inter-annotator
-$\alpha$-agreement is comparable. Best reliability is obtained for standardized
-cardinal feedback, and cardinal feedback is also easiest to learn and
-generalize from. Finally, improvements of over 1 BLEU can be obtained by
-integrating a regression-based reward estimator trained on cardinal feedback
-for 800 translations into RL for NMT. This shows that RL is possible even from
-small amounts of fairly reliable human feedback, pointing to a great potential
-for applications at larger scale.
-"
-7749,1805.10685,"Keet Sugathadasa, Buddhi Ayesha, Nisansa de Silva, Amal Shehan Perera,
- Vindula Jayawardana, Dimuthu Lakmal, Madhavi Perera","Legal Document Retrieval using Document Vector Embeddings and Deep
- Learning",cs.IR cs.CL," Domain-specific information retrieval has been a prominent and ongoing
-research topic in the field of natural language processing. Many researchers
-have incorporated different techniques to overcome technical and domain
-specificity and to provide mature models for various domains of interest. The
-main bottleneck in these studies is the heavy involvement of domain experts,
-which makes the entire process time-consuming and cumbersome. In this study, we
-have developed three novel models which are compared against a gold standard
-generated via the online repositories provided, specifically for the legal
-domain. The three different models incorporated vector space representations of
-the legal domain, where document vector generation was done via two different
-mechanisms and as an ensemble of the two. This study covers the research
-carried out in representing legal case documents in different vector spaces,
-while incorporating semantic word measures and natural language processing
-techniques. The ensemble model built in this study shows a significantly higher
-accuracy level, which demonstrates the need to incorporate domain-specific
-semantic similarity measures into the information retrieval process. This study
-also shows the impact of varying the distribution of word similarity measures
-against varying document vector dimensions, which can lead to improvements in
-legal information retrieval.
-"
-7750,1805.10796,"Krzysztof Wr\'obel, Marcin Pietro\'n, Maciej Wielgosz, Micha{\l}
- Karwatowski and Kazimierz Wiatr",Convolutional neural network compression for natural language processing,cs.CL cs.LG cs.NE," Convolutional neural networks are modern models that are very efficient in
-many classification tasks. Originally created for image processing, they have
-since been applied to other domains, such as natural language processing.
-Artificial intelligence systems (such as humanoid robots) are often based on
-embedded systems with constraints on memory, power consumption, etc. Therefore,
-a convolutional neural network should be compressed to fit the memory of the
-given hardware. In this paper, we present results of compressing efficient
-convolutional neural networks for sentiment analysis. The main steps are
-quantization and pruning processes. The method responsible for mapping the
-compressed network to an FPGA and the results of this implementation are
-presented. The described simulations showed that a 5-bit width is enough to
-avoid any drop in accuracy relative to the floating-point version of the
-network. Additionally, a significant memory footprint reduction was achieved
-(from 85% up to 93%).
-"
-7751,1805.10799,"Hyemin Ahn, Sungjoon Choi, Nuri Kim, Geonho Cha, Songhwai Oh","Interactive Text2Pickup Network for Natural Language based Human-Robot
- Collaboration",cs.RO cs.CL cs.HC," In this paper, we propose the Interactive Text2Pickup (IT2P) network for
-human-robot collaboration, which enables effective interaction with a human
-user despite ambiguity in the user's commands. We focus on the task where a
-robot is expected to pick up an object indicated by a human, and to interact
-with the human when the given instruction is vague. The proposed network
-understands the command from the human user and first estimates the position of
-the desired object. To handle the inherent ambiguity in human language
-commands, a suitable question which can resolve the ambiguity is generated. The
-user's answer to the question is combined with the initial command and given
-back to the network, resulting in more accurate estimation. The experimental
-results show that, given unambiguous commands, the proposed method can estimate
-the position of the requested object with an accuracy of 98.49% based on our
-test dataset. Given ambiguous language commands, we show that the accuracy of
-the pick-up task increases by 1.94 times after incorporating the information
-obtained from the interaction.
-"
-7752,1805.10824,"Marloes Kuijper, Mike van Lenthe, Rik van Noord","UG18 at SemEval-2018 Task 1: Generating Additional Training Data for
- Predicting Emotion Intensity in Spanish",cs.CL," The present study describes our submission to SemEval 2018 Task 1: Affect in
-Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to
-automatically generate additional training data by (i) translating training
-data from other languages and (ii) applying a semi-supervised learning method.
-We find strong support for both approaches, with those models outperforming our
-regular models in all subtasks. However, creating a stepwise ensemble of
-different models as opposed to simply averaging did not result in an increase
-in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and
-fifth (V-Oc) in the four Spanish subtasks we participated in.
-"
-7753,1805.10844,"Philip Schulz, Wilker Aziz, Trevor Cohn",A Stochastic Decoder for Neural Machine Translation,stat.ML cs.CL cs.LG," The process of translation is ambiguous, in that there are typically many
-valid translations for a given sentence. This gives rise to significant
-variation in parallel corpora; however, most current models of machine
-translation do not account for this variation, instead treating the problem
-as a deterministic process. To this end, we present a deep generative model of
-machine translation which incorporates a chain of latent variables, in order to
-account for local lexical and syntactic variation in parallel corpora. We
-provide an in-depth analysis of the pitfalls encountered in variational
-inference for training deep generative models. Experiments on several
-different language pairs demonstrate that the model consistently improves over
-strong baselines.
-"
-7754,1805.10850,Ke Tran and Yonatan Bisk,Inducing Grammars with and for Neural Machine Translation,cs.CL," Machine translation systems require semantic knowledge and grammatical
-understanding. Neural machine translation (NMT) systems often assume this
-information is captured by an attention mechanism and a decoder that ensures
-fluency. Recent work has shown that incorporating explicit syntax alleviates
-the burden of modeling both types of knowledge. However, requiring parses is
-expensive and does not explore the question of what syntax a model needs during
-translation. To address both of these issues we introduce a model that
-simultaneously translates while inducing dependency trees. In this way, we
-leverage the benefits of structure while investigating what syntax NMT must
-induce to maximize performance. We show that our dependency trees are (1)
-language-pair dependent and (2) improve translation quality.
-"
-7755,1805.10856,"Yang Yang, Haoyan Liu, Xia Hu, Jiawei Zhang, Xiaoming Zhang, Zhoujun
- Li, Philip S. Yu",r-Instance Learning for Missing People Tweets Identification,cs.SI cs.CL," The number of missing people (i.e., people who get lost) has greatly
-increased in recent years. It is a serious worldwide problem, and finding the
-missing people consumes a large amount of social resources. In tracking and
-finding these missing people, timely data gathering and analysis play an
-important role. With the development of social media, information about missing
-people can get propagated through the web very quickly, which provides a
-promising way to solve the problem. The information in online social media is
-usually of heterogeneous categories, involving both complex social interactions
-and textual data of diverse structures. Effective fusion of these different
-types of information for addressing the missing people identification problem
-can be a great challenge. Motivated by the multi-instance learning problem and
-existing social science theory of ""homophily"", in this paper, we propose a
-novel r-instance (RI) learning model.
-"
-7756,1805.10956,Wenlin Yao and Ruihong Huang,Temporal Event Knowledge Acquisition via Identifying Narratives,cs.CL cs.AI," Inspired by the double temporality characteristic of narrative texts, we
-propose a novel approach for acquiring rich temporal ""before/after"" event
-knowledge across sentences in narrative stories. The double temporality states
-that a narrative story often describes a sequence of events in chronological
-order and that, therefore, the temporal order of events matches their textual
-order.
We explored narratology principles and built a weakly
-supervised approach that identifies 287k narrative paragraphs from three large
-text corpora. We then extracted rich temporal event knowledge from these
-narrative paragraphs. Such event knowledge is shown to be useful for improving
-temporal relation classification, and it outperforms several recent neural
-network models on the narrative cloze task.
-"
-7757,1805.10959,"Xu Han, Zhiyuan Liu, Maosong Sun","Denoising Distant Supervision for Relation Extraction via Instance-Level
- Adversarial Training",cs.CL," Existing neural relation extraction (NRE) models rely on distant supervision
-and suffer from wrong labeling problems. In this paper, we propose a novel
-adversarial training mechanism over instances for relation extraction to
-alleviate the noise issue. As compared with previous denoising methods, our
-proposed method can better discriminate informative instances from noisy
-ones. Our method is also efficient and flexible enough to be applied to various
-NRE architectures. As shown in experiments on a large-scale benchmark dataset
-for relation extraction, our denoising method can effectively filter out noisy
-instances and achieve significant improvements as compared with the
-state-of-the-art models.
-"
-7758,1805.10973,"Taehyeong Kim, Min-Oh Heo, Seonil Son, Kyoung-Wha Park, Byoung-Tak
- Zhang","GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story
- Generation",cs.CL cs.CV," The task of multi-image cued story generation, such as the visual
-storytelling dataset (VIST) challenge, is to compose multiple coherent
-sentences from a given sequence of images. The main difficulty is how to
-generate image-specific sentences within the context of the overall image
-sequence. Here we propose a deep learning network model, GLAC Net, that
-generates visual stories by combining global-local (glocal) attention and
-context cascading mechanisms. The model incorporates two levels of attention,
-i.e., the overall encoding level and the image feature level, to construct
-image-dependent sentences. While a standard attention configuration needs a
-large number of parameters, the GLAC Net implements them in a very simple way
-via hard connections from the outputs of encoders or image features onto the
-sentence generators. The coherency of the generated story is further improved
-by conveying (cascading) the information of the previous sentence to the next
-sentence serially. We evaluate the performance of the GLAC Net on the visual
-storytelling dataset (VIST) and achieve very competitive results compared to
-the state-of-the-art techniques. Our code and pre-trained models are available
-here.
-"
-7759,1805.10985,"Kian Kenyon-Dean, Jackie Chi Kit Cheung, Doina Precup","Resolving Event Coreference with Supervised Representation Learning and
- Clustering-Oriented Regularization",cs.CL," We present an approach to event coreference resolution by developing a
-general framework for clustering that uses supervised representation learning.
-We propose a neural network architecture with novel Clustering-Oriented
-Regularization (CORE) terms in the objective function. These terms encourage
-the model to create embeddings of event mentions that are amenable to
-clustering. We then use agglomerative clustering on these embeddings to build
-event coreference chains. For both within- and cross-document coreference on
-the ECB+ corpus, our model obtains better results than models that require
-significantly more pre-annotated information. 
This work provides insight and -motivating results for a new general approach to solving coreference and -clustering problems with representation learning. -" -7760,1805.11004,"Han Guo, Ramakanth Pasunuru, Mohit Bansal","Soft Layer-Specific Multi-Task Summarization with Entailment and - Question Generation",cs.CL cs.AI cs.LG," An accurate abstractive summary of a document should contain all its salient -information and should be logically entailed by the input document. We improve -these important aspects of abstractive summarization via multi-task learning -with the auxiliary tasks of question generation and entailment generation, -where the former teaches the summarization model how to look for salient -questioning-worthy details, and the latter teaches the model how to rewrite a -summary which is a directed-logical subset of the input document. We also -propose novel multi-task architectures with high-level (semantic) -layer-specific sharing across multiple encoder and decoder layers of the three -tasks, as well as soft-sharing mechanisms (and show performance ablations and -analysis examples of each contribution). Overall, we achieve statistically -significant improvements over the state-of-the-art on both the CNN/DailyMail -and Gigaword datasets, as well as on the DUC-2002 transfer setup. We also -present several quantitative and qualitative analysis studies of our model's -learned saliency and entailment skills. -" -7761,1805.11025,"Ankit Goyal, Jian Wang and Jia Deng",Think Visually: Question Answering through Virtual Imagery,cs.CL cs.AI," In this paper, we study the problem of geometric reasoning in the context of -question-answering. We introduce Dynamic Spatial Memory Network (DSMN), a new -deep network architecture designed for answering questions that admit latent -visual representations. DSMN learns to generate and reason over such -representations. Further, we propose two synthetic benchmarks, FloorPlanQA and -ShapeIntersection, to evaluate the geometric reasoning capability of QA -systems. Experimental results validate the effectiveness of our proposed DSMN -for visual thinking tasks. -" -7762,1805.11080,"Yen-Chun Chen, Mohit Bansal","Fast Abstractive Summarization with Reinforce-Selected Sentence - Rewriting",cs.CL cs.AI cs.LG," Inspired by how humans summarize long documents, we propose an accurate and -fast summarization model that first selects salient sentences and then rewrites -them abstractively (i.e., compresses and paraphrases) to generate a concise -overall summary. We use a novel sentence-level policy gradient method to bridge -the non-differentiable computation between these two neural networks in a -hierarchical way, while maintaining language fluency. Empirically, we achieve -the new state-of-the-art on all metrics (including human evaluation) on the -CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores. -Moreover, by first operating at the sentence-level and then the word-level, we -enable parallel decoding of our neural generative model that results in -substantially faster (10-20x) inference speed as well as 4x faster training -convergence than previous long-paragraph encoder-decoder models. We also -demonstrate the generalization of our model on the test-only DUC-2002 dataset, -where we achieve higher scores than a state-of-the-art model. 
-" -7763,1805.11140,"Fionn Murtagh, Giuseppe Iurato",Core Conflictual Relationship: Text Mining to Discover What and When,cs.CL," Following detailed presentation of the Core Conflictual Relationship Theme -(CCRT), there is the objective of relevant methods for what has been described -as verbalization and visualization of data. Such is also termed data mining and -text mining, and knowledge discovery in data. The Correspondence Analysis -methodology, also termed Geometric Data Analysis, is shown in a case study to -be comprehensive and revealing. Computational efficiency depends on how the -analysis process is structured. For both illustrative and revealing aspects of -the case study here, relatively extensive dream reports are used. This -Geometric Data Analysis confirms the validity of CCRT method. -" -7764,1805.11154,"Wen Zhang, Jiawei Hu, Yang Feng and Qun Liu","Refining Source Representations with Relation Networks for Neural - Machine Translation",cs.CL cs.AI stat.ML," Although neural machine translation with the encoder-decoder framework has -achieved great success recently, it still suffers drawbacks of forgetting -distant information, which is an inherent disadvantage of recurrent neural -network structure, and disregarding relationship between source words during -encoding step. Whereas in practice, the former information and relationship are -often useful in current step. We target on solving these problems and thus -introduce relation networks to learn better representations of the source. The -relation networks are able to facilitate memorization capability of recurrent -neural network via associating source words with each other, this would also -help retain their relationships. Then the source representations and all the -relations are fed into the attention component together while decoding, with -the main encoder-decoder framework unchanged. Experiments on several datasets -show that our method can improve the translation performance significantly over -the conventional encoder-decoder model and even outperform the approach -involving supervised syntactic knowledge. -" -7765,1805.11166,"Miguel A. Alvarez-Carmona, Luis Pellegrin, Manuel Montes-y-G\'omez, - Fernando S\'anchez-Vega, Hugo Jair Escalante, A. Pastor L\'opez-Monroy, Luis - Villase\~nor-Pineda, Esa\'u Villatoro-Tello",A visual approach for age and gender identification on Twitter,cs.CL cs.AI," The goal of Author Profiling (AP) is to identify demographic aspects (e.g., -age, gender) from a given set of authors by analyzing their written texts. -Recently, the AP task has gained interest in many problems related to computer -forensics, psychology, marketing, but specially in those related with social -media exploitation. As known, social media data is shared through a wide range -of modalities (e.g., text, images and audio), representing valuable information -to be exploited for extracting valuable insights from users. Nevertheless, most -of the current work in AP using social media data has been devoted to analyze -textual information only, and there are very few works that have started -exploring the gender identification using visual information. Contrastingly, -this paper focuses in exploiting the visual modality to perform both age and -gender identification in social media, specifically in Twitter. Our goal is to -evaluate the pertinence of using visual information in solving the AP task. 
-Accordingly, we have extended the Twitter corpus from PAN 2014, incorporating
-posted images from all the users, making a distinction between tweeted and
-retweeted images. The experiments performed provide interesting evidence of the
-usefulness of visual information in comparison with traditional textual
-representations for the AP task.
-"
-7766,1805.11189,"Satoru Katsumata, Yukio Matsumura, Hayahide Yamagishi, Mamoru Komachi","Graph-based Filtering of Out-of-Vocabulary Words for Encoder-Decoder
- Models",cs.CL," Encoder-decoder models typically only employ words that are frequently used
-in the training corpus to reduce the computational costs and exclude noise.
-However, this vocabulary set may still include words that interfere with
-learning in encoder-decoder models. This paper proposes a method for selecting
-more suitable words for learning encoders by utilizing not only frequency, but
-also co-occurrence information, which we capture using the HITS algorithm. We
-apply our proposed method to two tasks: machine translation and grammatical
-error correction. For Japanese-to-English translation, this method achieves a
-BLEU score that is 0.56 points higher than that of the baseline. It also
-outperforms the baseline method for English grammatical error correction, with
-an F0.5-measure that is 1.48 points higher.
-"
-7767,1805.11213,"Xing Niu, Michael Denkowski, Marine Carpuat",Bi-Directional Neural Machine Translation with Synthetic Parallel Data,cs.CL," Despite impressive progress in high-resource settings, Neural Machine
-Translation (NMT) still struggles in low-resource and out-of-domain scenarios,
-often failing to match the quality of phrase-based translation. We propose a
-novel technique that combines back-translation and multilingual NMT to improve
-performance in these difficult cases. Our technique trains a single model for
-both directions of a language pair, allowing us to back-translate source or
-target monolingual data without requiring an auxiliary model. We then continue
-training on the augmented parallel data, enabling a cycle of improvement for a
-single model that can incorporate any source, target, or parallel data to
-improve both translation directions. As a byproduct, these models can reduce
-training and deployment costs significantly compared to uni-directional models.
-Extensive experiments show that our technique outperforms standard
-back-translation in low-resource scenarios, improves quality on cross-domain
-tasks, and effectively reduces costs across the board.
-"
-7768,1805.11222,"Edouard Grave, Armand Joulin, Quentin Berthet",Unsupervised Alignment of Embeddings with Wasserstein Procrustes,cs.LG cs.CL stat.ML," We consider the task of aligning two sets of points in high dimension, which
-has many applications in natural language processing and computer vision. As an
-example, it was recently shown that it is possible to infer a bilingual
-lexicon, without supervised data, by aligning word embeddings trained on
-monolingual data. These recent advances are based on adversarial training to
-learn the mapping between the two embeddings. In this paper, we propose to use
-an alternative formulation, based on the joint estimation of an orthogonal
-matrix and a permutation matrix. While this problem is not convex, we propose
-to initialize our optimization algorithm by using a convex relaxation,
-traditionally considered for the graph isomorphism problem. We propose a
-stochastic algorithm to minimize our cost function on large-scale problems. 
-Finally, we evaluate our method on the problem of unsupervised word
-translation, by aligning word embeddings trained on monolingual data. On this
-task, our method obtains state-of-the-art results, while requiring fewer
-computational resources than competing approaches.
-"
-7769,1805.11224,"Yijia Liu, Wanxiang Che, Huaipeng Zhao, Bing Qin, Ting Liu",Distilling Knowledge for Search-based Structured Prediction,cs.CL," Many natural language processing tasks can be modeled as structured
-prediction and solved as a search problem. In this paper, we distill an
-ensemble of multiple models trained with different initializations into a
-single model. In addition to learning to match the ensemble's probability
-output on the reference states, we also use the ensemble to explore the search
-space and learn from the states encountered in the exploration. Experimental
-results on two typical search-based structured prediction tasks --
-transition-based dependency parsing and neural machine translation -- show
-that distillation can effectively improve the single model's performance: the
-final model achieves improvements of 1.32 LAS and 2.65 BLEU over strong
-baselines on these two tasks, respectively, and outperforms the greedy
-structured prediction models in the previous literature.
-"
-7770,1805.11234,"Junwei Bao, Duyu Tang, Nan Duan, Zhao Yan, Yuanhua Lv, Ming Zhou,
- Tiejun Zhao",Table-to-Text: Describing Table Region with Natural Language,cs.CL cs.AI," In this paper, we present a generative model to generate a natural language
-sentence describing a table region, e.g., a row. The model maps a row from a
-table to a continuous vector and then generates a natural language sentence by
-leveraging the semantics of a table. To deal with rare words appearing in a
-table, we develop a flexible copying mechanism that selectively replicates
-contents from the table in the output sequence. Extensive experiments
-demonstrate the accuracy of the model and the power of the copying mechanism.
-On two synthetic datasets, WIKIBIO and SIMPLEQUESTIONS, our model improves the
-current state-of-the-art BLEU-4 score from 34.70 to 40.26 and from 33.32 to
-39.12, respectively. Furthermore, we introduce an open-domain dataset
-WIKITABLETEXT including 13,318 explanatory sentences for 4,962 tables. Our
-model achieves a BLEU-4 score of 38.23, which outperforms template based and
-language model based approaches.
-"
-7771,1805.11264,"Wei-Ning Hsu, James Glass","Disentangling by Partitioning: A Representation Learning Framework for
- Multimodal Sensory Data",stat.ML cs.CL cs.LG cs.SD eess.AS," Multimodal sensory data resembles the form of information perceived by humans
-for learning, and is easy to obtain in large quantities. Compared to unimodal
-data, synchronization of concepts between modalities in such data provides
-supervision for disentangling the underlying explanatory factors of each
-modality. Previous work leveraging multimodal data has mainly focused on
-retaining only the modality-invariant factors while discarding the rest. In
-this paper, we present a partitioned variational autoencoder (PVAE) and several
-training objectives to learn disentangled representations, which encode not
-only the shared factors, but also modality-dependent ones, into separate latent
-variables. Specifically, PVAE integrates a variational inference framework and
-a multimodal generative model that partitions the explanatory factors and
-conditions only on the relevant subset of them for generation. 
We evaluate our -model on two parallel speech/image datasets, and demonstrate its ability to -learn disentangled representations by qualitatively exploring within-modality -and cross-modality conditional generation with semantics and styles specified -by examples. For quantitative analysis, we evaluate the classification accuracy -of automatically discovered semantic units. Our PVAE can achieve over 99% -accuracy on both modalities. -" -7772,1805.11267,Peter Jansen,"Multi-hop Inference for Sentence-level TextGraphs: How Challenging is - Meaningfully Combining Information for Science Question Answering?",cs.CL," Question Answering for complex questions is often modeled as a graph -construction or traversal task, where a solver must build or traverse a graph -of facts that answer and explain a given question. This ""multi-hop"" inference -has been shown to be extremely challenging, with few models able to aggregate -more than two facts before being overwhelmed by ""semantic drift"", or the -tendency for long chains of facts to quickly drift off topic. This is a major -barrier to current inference models, as even elementary science questions -require an average of 4 to 6 facts to answer and explain. In this work we -empirically characterize the difficulty of building or traversing a graph of -sentences connected by lexical overlap, by evaluating chance sentence -aggregation quality through 9,784 manually-annotated judgments across knowledge -graphs built from three free-text corpora (including study guides and Simple -Wikipedia). We demonstrate semantic drift tends to be high and aggregation -quality low, at between 0.04% and 3%, and highlight scenarios that maximize the -likelihood of meaningfully combining information. -" -7773,1805.11295,Jean-Fran\c{c}ois Delpech,Unsupervised detection of diachronic word sense evolution,cs.CL," Most words have several senses and connotations which evolve in time due to -semantic shift, so that closely related words may gain different or even -opposite meanings over the years. This evolution is very relevant to the study -of language and of cultural changes, but the tools currently available for -diachronic semantic analysis have significant, inherent limitations and are not -suitable for real-time analysis. In this article, we demonstrate how the -linearity of random vectors techniques enables building time series of -congruent word embeddings (or semantic spaces) which can then be compared and -combined linearly without loss of precision over any time period to detect -diachronic semantic shifts. We show how this approach yields time trajectories -of polysemous words such as amazon or apple, enables following semantic drifts -and gender bias across time, reveals the shifting instantiations of stable -concepts such as hurricane or president. This very fast, linear approach can -easily be distributed over many processors to follow in real time streams of -social media such as Twitter or Facebook; the resulting, time-dependent -semantic spaces can then be combined at will by simple additions or -subtractions. -" -7774,1805.11350,Nikola Mrk\v{s}i\'c and Ivan Vuli\'c,Fully Statistical Neural Belief Tracking,cs.CL cs.AI cs.LG," This paper proposes an improvement to the existing data-driven Neural Belief -Tracking (NBT) framework for Dialogue State Tracking (DST). The existing NBT -model uses a hand-crafted belief state update mechanism which involves an -expensive manual retuning step whenever the model is deployed to a new dialogue -domain. 
We show that this update mechanism can be learned jointly with the
-semantic decoding and context modelling parts of the NBT model, eliminating the
-last rule-based module from this DST framework. We propose two different
-statistical update mechanisms and show that dialogue dynamics can be modelled
-with a very small number of additional model parameters. In our DST evaluation
-over three languages, we show that this model achieves competitive performance
-and provides a robust framework for building resource-light DST models.
-"
-7775,1805.11351,"Qiuchi Li, Sagar Uprety, Benyou Wang, Dawei Song",Quantum-inspired Complex Word Embedding,cs.CL," A challenging task for word embeddings is to capture the emergent meaning or
-polarity of a combination of individual words. For example, existing word
-embedding approaches will assign high probabilities to the words ""Penguin"" and
-""Fly"" if they frequently co-occur, but they fail to capture the fact that the
-two occur together in an opposite sense - penguins do not fly. We hypothesize
-that humans do not associate a single polarity or sentiment with each word. A
-word contributes to the overall polarity of a combination of words depending
-upon which other words it is combined with. This is analogous to the behavior
-of microscopic particles, which exist in all possible states at the same time
-and interfere with each other to give rise to new states depending upon their
-relative phases. We make use of the Hilbert Space representation of such
-particles in Quantum Mechanics, ascribing to each word a relative phase, which
-is a complex number, and investigate two such quantum-inspired models to derive
-the meaning of a combination of words. The proposed models achieve better
-performance than state-of-the-art non-quantum models on the binary sentence
-classification task.
-"
-7776,1805.11360,"Seonhoon Kim, Inho Kang, Nojun Kwak","Semantic Sentence Matching with Densely-connected Recurrent and
- Co-attentive Information",cs.CL," Sentence matching is widely used in various natural language tasks such as
-natural language inference, paraphrase identification, and question answering.
-For these tasks, understanding the logical and semantic relationship between
-two sentences is required, yet it remains challenging. Although the attention
-mechanism is useful for capturing the semantic relationship and for properly
-aligning the elements of two sentences, previous attention methods simply use
-a summation operation, which does not retain enough of the original features.
-Inspired by DenseNet, a densely connected convolutional network, we propose a
-densely-connected co-attentive recurrent neural network, each layer of which
-uses concatenated information of attentive features as well as hidden features
-of all the preceding recurrent layers. It enables preserving the original and
-the co-attentive feature information from the bottommost word embedding layer
-to the uppermost recurrent layer. To alleviate the problem of an
-ever-increasing size of feature vectors due to dense concatenation operations,
-we also propose to use an autoencoder after dense concatenation. We evaluate
-our proposed architecture on highly competitive benchmark datasets related to
-sentence matching. Experimental results show that our architecture, which
-retains recurrent and attentive features, achieves state-of-the-art
-performance on most of the tasks. 
-" -7777,1805.11404,"Andreas Niekler, Arnim Bleier, Christian Kahmann, Lisa Posch, Gregor - Wiedemann, Kenan Erdogan, Gerhard Heyer, Markus Strohmaier","iLCM - A Virtual Research Infrastructure for Large-Scale Qualitative - Data",cs.IR cs.CL," The iLCM project pursues the development of an integrated research -environment for the analysis of structured and unstructured data in a ""Software -as a Service"" architecture (SaaS). The research environment addresses -requirements for the quantitative evaluation of large amounts of qualitative -data with text mining methods as well as requirements for the reproducibility -of data-driven research designs in the social sciences. For this, the iLCM -research environment comprises two central components. First, the Leipzig -Corpus Miner (LCM), a decentralized SaaS application for the analysis of large -amounts of news texts developed in a previous Digital Humanities project. -Second, the text mining tools implemented in the LCM are extended by an ""Open -Research Computing"" (ORC) environment for executable script documents, -so-called ""notebooks"". This novel integration allows to combine generic, -high-performance methods to process large amounts of unstructured text data and -with individual program scripts to address specific research requirements in -computational social science and digital humanities. -" -7778,1805.11461,"Farhad Nooralahzadeh, Lilja {\O}vrelid",Syntactic Dependency Representations in Neural Relation Classification,cs.CL," We investigate the use of different syntactic dependency representations in a -neural relation classification task and compare the CoNLL, Stanford Basic and -Universal Dependencies schemes. We further compare with a syntax-agnostic -approach and perform an error analysis in order to gain a better understanding -of the results. -" -7779,1805.11462,"Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean - Senellart, Alexander M. Rush",OpenNMT: Neural Machine Translation Toolkit,cs.CL," OpenNMT is an open-source toolkit for neural machine translation (NMT). The -system prioritizes efficiency, modularity, and extensibility with the goal of -supporting NMT research into model architectures, feature representations, and -source modalities, while maintaining competitive performance and reasonable -training requirements. The toolkit consists of modeling and translation -support, as well as detailed pedagogical documentation about the underlying -techniques. OpenNMT has been used in several production MT systems, modified -for numerous research papers, and is implemented across several deep learning -frameworks. -" -7780,1805.11465,"Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, - Alexander Koller",AMR Dependency Parsing with a Typed Semantic Algebra,cs.CL," We present a semantic parser for Abstract Meaning Representations which -learns to parse strings into tree representations of the compositional -structure of an AMR graph. This allows us to use standard neural techniques for -supertagging and dependency tree parsing, constrained by a linguistically -principled type system. We present two approximative decoding algorithms, which -achieve state-of-the-art accuracy and outperform strong baselines. -" -7781,1805.11467,"Diego Moussallem, Ricardo Usbeck, Michael R\""oder, Axel-Cyrille Ngonga - Ngomo",Entity Linking in 40 Languages using MAG,cs.CL," A plethora of Entity Linking (EL) approaches has recently been developed. 
-While many claim to be multilingual, the MAG (Multilingual AGDISTIS) approach
-has recently been shown to outperform the state of the art in multilingual EL
-on 7 languages. With this demo, we extend MAG to support EL in 40 different
-languages, including especially low-resource languages such as Ukrainian,
-Greek, Hungarian, Croatian, Portuguese, Japanese and Korean. Our demo relies on
-online web services which allow for easy access to our entity linking
-approaches and can disambiguate against DBpedia and Wikidata. During the demo,
-we will show how to use MAG by means of POST requests as well as using its
-user-friendly web interface. All data used in the demo is available at
-https://hobbitdata.informatik.uni-leipzig.de/agdistis/
-"
-7782,1805.11474,Anastasia Shimorina,Human vs Automatic Metrics: on the Importance of Correlation Design,cs.CL," This paper discusses two existing approaches to the correlation analysis
-between automatic evaluation metrics and human scores in the area of natural
-language generation. Our experiments show that depending on whether a system-
-or sentence-level correlation analysis is used, correlation results between
-automatic scores and human judgments are inconsistent.
-"
-7783,1805.11535,"Yi Tay, Anh Tuan Luu, Siu Cheung Hui","CoupleNet: Paying Attention to Couples with Coupled Attention for
- Relationship Recommendation",cs.CL cs.AI cs.IR cs.NE," Dating and romantic relationships not only play a huge role in our personal
-lives but also collectively influence and shape society. Today, many romantic
-partnerships originate from the Internet, signifying the importance of
-technology and the web in modern dating. In this paper, we present a text-based
-computational approach for estimating the relationship compatibility of two
-users on social media. Unlike many previous works that propose reciprocal
-recommender systems for online dating websites, we devise a distant supervision
-heuristic to obtain real-world couples from social platforms such as Twitter.
-Our approach, CoupleNet, is an end-to-end deep-learning-based estimator that
-analyzes the social profiles of two users and subsequently performs a
-similarity match between the users. Intuitively, our approach performs both
-user profiling and match-making within a unified end-to-end framework.
-CoupleNet utilizes hierarchical recurrent neural models for learning
-representations of user profiles and subsequently coupled attention mechanisms
-to fuse information aggregated from two users. To the best of our knowledge,
-our approach is the first data-driven deep learning approach for our novel
-relationship recommendation problem. We benchmark our CoupleNet against several
-machine learning and deep learning baselines. Experimental results show that
-our approach significantly outperforms all baselines in terms of precision.
-Qualitative analysis shows that our model is also capable of producing
-explainable results for users.
-"
-7784,1805.11545,Marco A. Valenzuela-Esc\'arcega and Ajay Nagesh and Mihai Surdeanu,Lightly-supervised Representation Learning with Global Interpretability,cs.CL," We propose a lightly-supervised approach for information extraction, in
-particular named entity classification, which combines the benefits of
-traditional bootstrapping, i.e., use of limited annotations and
-interpretability of extraction patterns, with the robust learning approaches
-proposed in representation learning. 
Our algorithm iteratively learns custom
-embeddings for both the multi-word entities to be extracted and the patterns
-that match them from a few example entities per category. We demonstrate that
-this representation-based approach outperforms three other state-of-the-art
-bootstrapping approaches on two datasets: CoNLL-2003 and OntoNotes.
-Additionally, using these embeddings, our approach outputs a
-globally-interpretable model consisting of a decision list, by ranking patterns
-based on their proximity to the average entity embedding in a given class. We
-show that this interpretable model performs close to our complete bootstrapping
-model, proving that representation learning can be used to produce
-interpretable models with small loss in performance.
-"
-7785,1805.11546,"Alexander G. Ororbia, Ankur Mali, Matthew A. Kelly, and David Reitter",Like a Baby: Visually Situated Neural Language Acquisition,cs.CL cs.AI," We examine the benefits of visual context in training neural language models
-to perform next-word prediction. A multi-modal neural architecture is
-introduced that outperforms its equivalent trained on language alone with a 2%
-decrease in perplexity, even when no visual context is available at test time.
-Fine-tuning the embeddings of a pre-trained state-of-the-art bidirectional
-language model (BERT) in the language modeling framework yields a 3.5%
-improvement. The advantage for training with visual context when testing
-without is robust across different languages (English, German and Spanish) and
-different models (GRU, LSTM, $\Delta$-RNN, as well as those that use BERT
-embeddings). Thus, language models perform better when they learn like a baby,
-i.e., in a multi-modal environment. This finding is compatible with the theory
-of situated cognition: language is inseparable from its physical context.
-"
-7786,1805.11564,"Uwe D. Reichel, \v{S}tefan Be\v{n}u\v{s}, Katalin M\'ady","Entrainment profiles: Comparison by gender, role, and feature set",cs.CL," We examine prosodic entrainment in cooperative game dialogs for new feature
-sets describing register, pitch accent shape, and rhythmic aspects of
-utterances. For these as well as for established features we present
-entrainment profiles to detect within- and across-dialog entrainment by the
-speakers' gender and role in the game. It turned out that feature sets undergo
-entrainment in different quantitative and qualitative ways, which can partly be
-attributed to their different functions. Furthermore, interactions between
-speaker gender and role (describer vs. follower) suggest gender-dependent
-strategies in cooperative solution-oriented interactions: female describers
-entrain most, male describers least. Our data suggests a slight advantage of
-the latter strategy for task success.
-"
-7787,1805.11598,"Phoebe Mulcaire, Swabha Swayamdipta, Noah Smith",Polyglot Semantic Role Labeling,cs.CL," Previous approaches to multilingual semantic dependency parsing treat
-languages independently, without exploiting the similarities between semantic
-structures across languages. We experiment with a new approach where we combine
-resources from a pair of languages in the CoNLL 2009 shared task to build a
-polyglot semantic role labeler. Notwithstanding the absence of parallel data,
-and the dissimilarity in annotations between languages, our approach results in
-an improvement in SRL performance on multiple languages over a monolingual
-baseline. Analysis of the polyglot model shows it to be advantageous in
-lower-resource settings. 
-" -7788,1805.11603,"Moustafa Al-Hajj, Amani Sabra","Automatic Identification of Arabic expressions related to future events - in Lebanon's economy",cs.CL," In this paper, we propose a method to automatically identify future events in -Lebanon's economy from Arabic texts. Challenges are threefold: first, we need -to build a corpus of Arabic texts that covers Lebanon's economy; second, we -need to study how future events are expressed linguistically in these texts; -and third, we need to automatically identify the relevant textual segments -accordingly. We will validate this method on a constructed corpus form the web -and show that it has very promising results. To do so, we will be using SLCSAS, -a system for semantic analysis, based on the Contextual Explorer method, and -""AlKhalil Morpho Sys"" system for morpho-syntactic analysis. -" -7789,1805.11611,"Miguel A. \'Alvarez-Carmona, Marc Franco-Salvador, Esa\'u - Villatoro-Tello, Manuel Montes-y-G\'omez, Paolo Rosso and Luis - Villase\~nor-Pineda","Semantically-informed distance and similarity measures for paraphrase - plagiarism identification",cs.CL," Paraphrase plagiarism identification represents a very complex task given -that plagiarized texts are intentionally modified through several rewording -techniques. Accordingly, this paper introduces two new measures for evaluating -the relatedness of two given texts: a semantically-informed similarity measure -and a semantically-informed edit distance. Both measures are able to extract -semantic information from either an external resource or a distributed -representation of words, resulting in informative features for training a -supervised classifier for detecting paraphrase plagiarism. Obtained results -indicate that the proposed metrics are consistently good in detecting different -types of paraphrase plagiarism. In addition, results are very competitive -against state-of-the art methods having the advantage of representing a much -more simple but equally effective solution. -" -7790,1805.11651,"Vadim Markovtsev, Waren Long, Egor Bulychev, Romain Keramitas, - Konstantin Slavnov, Gabor Markowski","Splitting source code identifiers using Bidirectional LSTM Recurrent - Neural Network",cs.CL cs.PL," Programmers make rich use of natural language in the source code they write -through identifiers and comments. Source code identifiers are selected from a -pool of tokens which are strongly related to the meaning, naming conventions, -and context. These tokens are often combined to produce more precise and -obvious designations. Such multi-part identifiers count for 97% of all naming -tokens in the Public Git Archive - the largest dataset of Git repositories to -date. We introduce a bidirectional LSTM recurrent neural network to detect -subtokens in source code identifiers. We trained that network on 41.7 million -distinct splittable identifiers collected from 182,014 open source projects in -Public Git Archive, and show that it outperforms several other machine learning -models. The proposed network can be used to improve the upstream models which -are based on source code identifiers, as well as improving developer experience -allowing writing code without switching the keyboard case. -" -7791,1805.11653,"Nelson F. Liu, Omer Levy, Roy Schwartz, Chenhao Tan, and Noah A. Smith",LSTMs Exploit Linguistic Attributes of Data,cs.CL," While recurrent neural networks have found success in a variety of natural -language processing applications, they are general models of sequential data. 
-We investigate how the properties of natural language data affect an LSTM's
-ability to learn a nonlinguistic task: recalling elements from its input. We
-find that models trained on natural language data are able to recall tokens
-from much longer sequences than models trained on non-language sequential data.
-Furthermore, we show that the LSTM learns to solve the memorization task by
-explicitly using a subset of its neurons to count timesteps in the input. We
-hypothesize that the patterns and structure in natural language data enable
-LSTMs to learn by providing approximate ways of reducing loss, but
-understanding the effect of different training data on the learnability of
-LSTMs remains an open question.
-"
-7792,1805.11749,"Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, Taylor
- Berg-Kirkpatrick",Unsupervised Text Style Transfer using Language Models as Discriminators,cs.CL," Binary classifiers are often employed as discriminators in GAN-based
-unsupervised style transfer systems to ensure that transferred sentences are
-similar to sentences in the target domain. One difficulty with this approach is
-that the error signal provided by the discriminator can be unstable and is
-sometimes insufficient to train the generator to produce fluent language. In
-this paper, we propose a new technique that uses a target domain language model
-as the discriminator, providing richer and more stable token-level feedback
-during the learning process. We train the generator to minimize the negative
-log likelihood (NLL) of generated sentences, evaluated by the language model.
-By using a continuous approximation of discrete sampling under the generator,
-our model can be trained using back-propagation in an end-to-end fashion.
-Moreover, our empirical results show that when using a language model as a
-structured discriminator, it is possible to forgo adversarial steps during
-training, making the process more stable. We compare our model with previous
-work using convolutional neural networks (CNNs) as discriminators and show that
-our approach leads to improved performance on three tasks: word substitution
-decipherment, sentiment modification, and related language translation.
-"
-7793,1805.11752,"Oluwatobi Olabiyi, Alan Salimov, Anish Khazane, Erik T. Mueller","Multi-turn Dialogue Response Generation in an Adversarial Learning
- Framework",cs.CL cs.AI cs.LG cs.NE stat.ML," We propose an adversarial learning approach for generating multi-turn
-dialogue responses. Our proposed framework, hredGAN, is based on conditional
-generative adversarial networks (GANs). The GAN's generator is a modified
-hierarchical recurrent encoder-decoder network (HRED) and the discriminator is
-a word-level bidirectional RNN that shares context and word embeddings with the
-generator. During inference, noise samples conditioned on the dialogue history
-are used to perturb the generator's latent space to generate several possible
-responses. The final response is the one ranked best by the discriminator. The
-hredGAN shows improved performance over existing methods: (1) it generalizes
-better than networks trained using only the log-likelihood criterion, and (2)
-it generates longer, more informative and more diverse responses with high
-utterance and topic relevance even with limited training data. This improvement
-is demonstrated on the Movie triples and Ubuntu dialogue datasets using both
-automatic and human evaluations. 
-" -7794,1805.11762,"Bing Liu, Ian Lane",Adversarial Learning of Task-Oriented Neural Dialog Models,cs.CL," In this work, we propose an adversarial learning method for reward estimation -in reinforcement learning (RL) based task-oriented dialog models. Most of the -current RL based task-oriented dialog systems require the access to a reward -signal from either user feedback or user ratings. Such user ratings, however, -may not always be consistent or available in practice. Furthermore, online -dialog policy learning with RL typically requires a large number of queries to -users, suffering from sample efficiency problem. To address these challenges, -we propose an adversarial learning method to learn dialog rewards directly from -dialog samples. Such rewards are further used to optimize the dialog policy -with policy gradient based RL. In the evaluation in a restaurant search domain, -we show that the proposed adversarial dialog learning method achieves advanced -dialog success rate comparing to strong baseline methods. We further discuss -the covariate shift problem in online adversarial dialog learning and show how -we can address that with partial access to user feedback. -" -7795,1805.11774,"Fereshte Khani, Noah D. Goodman, Percy Liang","Planning, Inference and Pragmatics in Sequential Language Games",cs.CL," We study sequential language games in which two players, each with private -information, communicate to achieve a common goal. In such games, a successful -player must (i) infer the partner's private information from the partner's -messages, (ii) generate messages that are most likely to help with the goal, -and (iii) reason pragmatically about the partner's strategy. We propose a model -that captures all three characteristics and demonstrate their importance in -capturing human behavior on a new goal-oriented dataset we collected using -crowdsourcing. -" -7796,1805.11818,"Volkan Cirik, Louis-Philippe Morency, Taylor Berg-Kirkpatrick",Visual Referring Expression Recognition: What Do Systems Actually Learn?,cs.CL cs.AI cs.CV cs.NE," We present an empirical analysis of the state-of-the-art systems for -referring expression recognition -- the task of identifying the object in an -image referred to by a natural language expression -- with the goal of gaining -insight into how these systems reason about language and vision. Surprisingly, -we find strong evidence that even sophisticated and linguistically-motivated -models for this task may ignore the linguistic structure, instead relying on -shallow correlations introduced by unintended biases in the data selection and -annotation process. For example, we show that a system trained and tested on -the input image $\textit{without the input referring expression}$ can achieve a -precision of 71.2% in top-2 predictions. Furthermore, a system that predicts -only the object category given the input can achieve a precision of 84.2% in -top-2 predictions. These surprisingly positive results for what should be -deficient prediction scenarios suggest that careful analysis of what our models -are learning -- and further, how our data is constructed -- is critical as we -seek to make substantive progress on grounded language tasks. -" -7797,1805.11824,"Rhea Sukthanker, Soujanya Poria, Erik Cambria, Ramkumar - Thirunavukarasu",Anaphora and Coreference Resolution: A Review,cs.CL," Entity resolution aims at resolving repeated references to an entity in a -document and forms a core component of natural language processing (NLP) -research. 
This field possesses immense potential to improve the performance of
-other NLP fields like machine translation, sentiment analysis, paraphrase
-detection, summarization, etc. The area of entity resolution in NLP has seen a
-proliferation of research in two separate sub-areas, namely anaphora resolution
-and coreference resolution. Through this review article, we aim at clarifying
-the scope of these two tasks in entity resolution. We also carry out a detailed
-analysis of the datasets, evaluation metrics and research methods that have
-been adopted to tackle this NLP problem. This survey is motivated by the aim of
-providing the reader with a clear understanding of what constitutes this NLP
-problem and the issues that require attention.
-"
-7798,1805.11850,"Kota Yoshida, Munetaka Minoguchi, Kenichiro Wani, Akio Nakamura and
- Hirokatsu Kataoka",Neural Joking Machine : Humorous image captioning,cs.CV cs.CL," What is an effective expression that draws laughter from human beings? In the
-present paper, in order to consider this question from an academic standpoint,
-we generate an image caption that draws a ""laugh"" by a computer. A system that
-outputs funny captions based on the image caption proposed in the computer
-vision field is constructed. Moreover, we also propose the Funny Score, which
-flexibly gives weights according to an evaluation database. The Funny Score
-more effectively brings out ""laughter"" to optimize a model. In addition, we
-build a self-collected BoketeDB, which contains a theme (image) and funny
-caption (text) posted on ""Bokete"", which is an image Ogiri website. In an
-experiment, we use BoketeDB to verify the effectiveness of the proposed method
-by comparing the results obtained using the proposed method with those obtained
-using MS COCO Pre-trained CNN+LSTM, which is the baseline, and with funny
-captions created by humans. We refer to the proposed method, which uses the
-BoketeDB pre-trained model, as the Neural Joking Machine (NJM).
-"
-7799,1805.11867,"Chao-Chun Hsu, Szu-Min Chen, Ming-Hsun Hsieh, Lun-Wei Ku","Using Inter-Sentence Diverse Beam Search to Reduce Redundancy in Visual
- Storytelling",cs.CL cs.AI," Visual storytelling includes two important parts: coherence between the story
-and images as well as the story structure. For image-to-text neural network
-models, similar images in the sequence provide similar information to the
-story generator, which then produces nearly identical sentences. However,
-repeatedly narrating the same objects or events undermines a good story
-structure. In this paper, we propose an inter-sentence diverse beam search to
-generate a more expressive story. Compared with some recent visual
-storytelling models, which generate stories without considering the sentence
-generated for the previous picture, our proposed method avoids generating
-identical sentences even when given a sequence of similar pictures.
-"
-7800,1805.11868,"Sahil Swami, Ankush Khandelwal, Vinay Singh, Syed Sarfaraz Akhtar,
- Manish Shrivastava","An English-Hindi Code-Mixed Corpus: Stance Annotation and Baseline
- System",cs.CL," Social media has become one of the main channels for people to communicate
-and share their views with society. We can often detect from these views
-whether the person is in favor of, against, or neutral towards a given topic.
-These opinions from social media are very useful for various companies. 
We
-present a new dataset that consists of 3545 English-Hindi code-mixed tweets
-with opinion towards Demonetisation, which was implemented in India in 2016
-and was followed by a large countrywide debate. We present a baseline
-supervised classification system for stance detection, developed using the same
-dataset, that uses various machine learning techniques to achieve an accuracy
-of 58.7% on 10-fold cross-validation.
-"
-7801,1805.11869,"Sahil Swami, Ankush Khandelwal, Vinay Singh, Syed Sarfaraz Akhtar,
- Manish Shrivastava",A Corpus of English-Hindi Code-Mixed Tweets for Sarcasm Detection,cs.CL," Social media platforms like Twitter and Facebook have become two of the
-largest media used by people to express their views towards different topics.
-Generation of such large user data has made NLP tasks like sentiment analysis
-and opinion mining much more important. Using sarcasm in texts on social media
-has become a popular trend lately. Using sarcasm reverses the meaning and
-polarity of what is implied by the text, which poses a challenge for many NLP
-tasks. The task of sarcasm detection in text is gaining more and more
-importance for both commercial and security services. We present the first
-English-Hindi code-mixed dataset of tweets marked for the presence of sarcasm
-and irony, where each token is also annotated with a language tag. We present
-a baseline supervised classification system developed using the same dataset,
-which achieves an average F-score of 78.4 using a random forest classifier and
-10-fold cross-validation.
-"
-7802,1805.11937,"G\""ozde G\""ul \c{S}ahin and Mark Steedman",Character-Level Models versus Morphology in Semantic Role Labeling,cs.CL," Character-level models have become a popular approach especially for their
-accessibility and ability to handle unseen data. However, little is known on
-their ability to reveal the underlying morphological structure of a word, which
-is a crucial skill for high-level semantic analysis tasks, such as semantic
-role labeling (SRL). In this work, we train various types of SRL models that
-use word, character and morphology level information and analyze how the
-performance of character-level models compares to that of word- and
-morphology-level models for several languages. We conduct an in-depth error
-analysis for each morphological typology and analyze the strengths and
-limitations of character-level models that relate to out-of-domain data,
-training data size, long range dependencies and model complexity. Our
-exhaustive analyses shed light on important characteristics of character-level
-models and their semantic capability.
-"
-7803,1805.12032,"Maria Glenski, Tim Weninger, and Svitlana Volkova","Identifying and Understanding User Reactions to Deceptive and Trusted
- Social News Sources",cs.CL," In the age of social news, it is important to understand the types of
-reactions that are evoked from news sources with various levels of credibility.
-In the present work we seek to better understand how users react to trusted and
-deceptive news sources across two popular, and very different, social media
-platforms. To that end, (1) we develop a model to classify user reactions into
-one of nine types, such as answer, elaboration, and question, and (2) we
-measure the speed and the type of reaction for trusted and deceptive news
-sources for 10.8M Twitter posts and 6.2M Reddit comments. 
We show that there
-are significant differences in the speed and the type of reactions between
-trusted and deceptive news sources on Twitter, but far smaller differences on
-Reddit.
-"
-7804,1805.12045,"Sahar Ghannay and Antoine Caubri\`ere and Yannick Est\`eve and Antoine
- Laurent and Emmanuel Morin",End-to-end named entity extraction from speech,cs.CL," Named entity recognition (NER) is among the SLU tasks that usually extract
-semantic information from textual documents. Until now, NER from speech has
-been performed through a pipeline process that consists in first applying
-automatic speech recognition (ASR) to the audio and then applying NER to the
-ASR outputs. Such an approach has some disadvantages (error propagation, a
-metric for tuning ASR systems that is sub-optimal in regards to the final
-task, a reduced search space at the ASR output level...), and it is known that
-more integrated approaches outperform sequential ones when they can be
-applied. In this paper, we present a first study of an end-to-end approach
-that directly extracts named entities from speech through a single neural
-architecture. In this way, ASR and NER can be jointly optimized. Experiments
-are carried out on easily accessible French data distributed through several
-evaluation campaigns. Experimental results show that this end-to-end approach
-provides better results (F-measure=0.69 on test data) than a classical
-pipeline approach to detect named entity categories (F-measure=0.65).
-"
-7805,1805.12061,"Genta Indra Winata, Chien-Sheng Wu, Andrea Madotto, Pascale Fung","Bilingual Character Representation for Efficiently Addressing
- Out-of-Vocabulary Words in Code-Switching Named Entity Recognition",cs.CL," We propose an LSTM-based model with hierarchical architecture on named entity
-recognition from code-switching Twitter data. Our model uses bilingual
-character representation and transfer learning to address out-of-vocabulary
-words. In order to mitigate data noise, we propose to use token replacement and
-normalization. In the 3rd Workshop on Computational Approaches to Linguistic
-Code-Switching Shared Task, we achieved second place with a 62.76% harmonic
-mean F1-score for the English-Spanish language pair without using any gazetteer
-or knowledge-based information.
-"
-7806,1805.12070,"Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung",Code-Switching Language Modeling using Syntax-Aware Multi-Task Learning,cs.CL," Lack of text data has been the major issue in code-switching language
-modeling. In this paper, we introduce a multi-task learning based language
-model which shares the syntactic representation of languages to leverage
-linguistic information and tackle the low-resource data issue. Our model
-jointly learns both language modeling and Part-of-Speech tagging on
-code-switched utterances. In this way, the model is able to identify the
-location of code-switching points and improves the prediction of the next
-word. Our approach outperforms a standard LSTM based language model, with
-improvements of 9.7% and 7.4% in perplexity on the SEAME Phase I and Phase II
-datasets, respectively.
-"
-7807,1805.12096,"Marcin Junczys-Dowmunt, Kenneth Heafield, Hieu Hoang, Roman
- Grundkiewicz, Anthony Aue",Marian: Cost-effective High-Quality Neural Machine Translation in C++,cs.CL," This paper describes the submissions of the ""Marian"" team to the WNMT 2018
-shared task. 
We investigate combinations of teacher-student training, -low-precision matrix products, auto-tuning and other methods to optimize the -Transformer model on GPU and CPU. By further integrating these methods with the -new averaging attention networks, a recently introduced faster Transformer -variant, we create a number of high-quality, high-performance models on the GPU -and CPU, dominating the Pareto frontier for this shared task. -" -7808,1805.12115,"Aldo Gangemi, Mehwish Alam, Valentina Presutti",Amnestic Forgery: an Ontology of Conceptual Metaphors,cs.CL cs.AI," This paper presents Amnestic Forgery, an ontology for metaphor semantics, -based on MetaNet, which is inspired by the theory of Conceptual Metaphor. -Amnestic Forgery reuses and extends the Framester schema, as an ideal ontology -design framework to deal with both semiotic and referential aspects of frames, -roles, mappings, and eventually blending. The description of the resource is -supplied by a discussion of its applications, with examples taken from metaphor -generation, and the referential problems of metaphoric mappings. Both schema -and data are available from the Framester SPARQL endpoint. -" -7809,1805.12164,"Carl Allen, Ivana Bala\v{z}evi\'c, Timothy Hospedales",What the Vec? Towards Probabilistically Grounded Embeddings,cs.CL cs.LG stat.ML," Word2Vec (W2V) and GloVe are popular, fast and efficient word embedding -algorithms. Their embeddings are widely used and perform well on a variety of -natural language processing tasks. Moreover, W2V has recently been adopted in -the field of graph embedding, where it underpins several leading algorithms. -However, despite their ubiquity and relatively simple model architecture, a -theoretical understanding of what the embedding parameters of W2V and GloVe -learn and why that is useful in downstream tasks has been lacking. We show that -different interactions between PMI vectors reflect semantic word relationships, -such as similarity and paraphrasing, that are encoded in low dimensional word -embeddings under a suitable projection, theoretically explaining why embeddings -of W2V and GloVe work. As a consequence, we also reveal an interesting -mathematical interconnection between the considered semantic relationships -themselves. -" -7810,1805.12216,"Zhihong Shen, Hao Ma, Kuansan Wang",A Web-scale system for scientific knowledge exploration,cs.CL cs.DL," To enable efficient exploration of Web-scale scientific knowledge, it is -necessary to organize scientific publications into a hierarchical concept -structure. In this work, we present a large-scale system to (1) identify -hundreds of thousands of scientific concepts, (2) tag these identified concepts -to hundreds of millions of scientific publications by leveraging both text and -graph structure, and (3) build a six-level concept hierarchy with a -subsumption-based model. The system builds the most comprehensive cross-domain -scientific concept ontology published to date, with more than 200 thousand -concepts and over one million relationships. -" -7811,1805.12282,"Huda Khayrallah, Philipp Koehn",On the Impact of Various Types of Noise on Neural Machine Translation,cs.CL," We examine how various types of noise in the parallel training data impact -the quality of neural machine translation systems. We create five types of -artificial noise and analyze how they degrade performance in neural and -statistical machine translation. We find that neural models are generally more -harmed by noise than statistical models. 
-For one especially egregious type of noise, they learn to just copy the input
-sentence.
-"
-7812,1805.12291,Kemal Kurniawan and Samuel Louvan,"Empirical Evaluation of Character-Based Model on Neural Named-Entity
- Recognition in Indonesian Conversational Texts",cs.CL," Despite the long history of the named-entity recognition (NER) task in the
-natural language processing community, previous work has rarely studied the
-task on conversational texts. Such texts are challenging because they contain
-many word variations, which increase the number of out-of-vocabulary (OOV)
-words. The high number of OOV words poses a difficulty for word-based neural
-models. Meanwhile, there is plenty of evidence for the effectiveness of
-character-based neural models in mitigating this OOV problem. We report an
-empirical evaluation of neural sequence labeling models with character
-embedding to tackle the NER task in Indonesian conversational texts. Our
-experiments show that (1) character models outperform word embedding-only
-models by up to 4 $F_1$ points, (2) character models perform better in OOV
-cases with an improvement of as high as 15 $F_1$ points, and (3) character
-models are robust against a very high OOV rate.
-"
-7813,1805.12307,"Genta Indra Winata, Onno Pepijn Kampman, Pascale Fung","Attention-Based LSTM for Psychological Stress Detection from Spoken
- Language Using Distant Supervision",cs.CL," We propose a Long Short-Term Memory (LSTM) with attention mechanism to
-classify psychological stress from self-conducted interview transcriptions. We
-apply distant supervision by automatically labeling tweets based on their
-hashtag content, which complements and expands the size of our corpus. This
-additional data is used to initialize the model parameters, after which the
-model is fine-tuned using the interview data. This improves the model's
-robustness, especially by expanding the vocabulary size. The bidirectional
-LSTM model with attention is found to be the best model in terms of accuracy
-(74.1%) and f-score (74.3%). Furthermore, we show that distant supervision
-fine-tuning enhances the model's performance by 1.6% accuracy and 2.1%
-f-score. The attention mechanism helps the model to select informative words.
-"
-7814,1805.12316,"Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I.
- Jordan","Greedy Attack and Gumbel Attack: Generating Adversarial Examples for
- Discrete Data",cs.LG cs.AI cs.CL cs.CR stat.ML," We present a probabilistic framework for studying adversarial attacks on
-discrete data. Based on this framework, we derive a perturbation-based method,
-Greedy Attack, and a scalable learning-based method, Gumbel Attack, that
-illustrate various tradeoffs in the design of attacks. We demonstrate the
-effectiveness of these methods using both quantitative metrics and human
-evaluation on various state-of-the-art models for text classification,
-including a word-based CNN, a character-based CNN and an LSTM. As an example
-of our results, we show that the accuracy of character-based convolutional
-networks drops to the level of random selection by modifying only five
-characters through Greedy Attack.
-"
-7815,1805.12352,"Xiaodong Gu, Kyunghyun Cho, Jung-Woo Ha, Sunghun Kim","DialogWAE: Multimodal Response Generation with Conditional Wasserstein
- Auto-Encoder",cs.CL cs.AI cs.LG cs.NE," Variational autoencoders~(VAEs) have shown promise in data-driven
-conversation modeling.
-However, most VAE conversation models match the approximate posterior
-distribution over the latent variables to a simple prior such as the standard
-normal distribution, thereby restricting the generated responses to a
-relatively simple (e.g., unimodal) scope. In this paper, we propose DialogWAE,
-a conditional Wasserstein autoencoder~(WAE) specially designed for dialogue
-modeling. Unlike VAEs that impose a simple distribution over the latent
-variables, DialogWAE models the distribution of data by training a GAN within
-the latent variable space. Specifically, our model samples from the prior and
-posterior distributions over the latent variables by transforming
-context-dependent random noise using neural networks and minimizes the
-Wasserstein distance between the two distributions. We further develop a
-Gaussian mixture prior network to enrich the latent space. Experiments on two
-popular datasets show that DialogWAE outperforms the state-of-the-art
-approaches in generating more coherent, informative and diverse responses.
-"
-7816,1805.12386,"Daniel Hershcovich, Leshem Choshen, Elior Sulem, Zohar Aizenbud, Ari
- Rappoport and Omri Abend","SemEval 2019 Shared Task: Cross-lingual Semantic Parsing with UCCA -
- Call for Participation",cs.CL," We announce a shared task on UCCA parsing in English, German and French, and
-call for participants to submit their systems. UCCA is a cross-linguistically
-applicable framework for semantic representation, which builds on extensive
-typological work and supports rapid annotation. UCCA poses a challenge for
-existing parsing techniques, as it exhibits reentrancy (resulting in DAG
-structures), discontinuous structures and non-terminal nodes corresponding to
-complex semantic units. Given the success of recent semantic parsing shared
-tasks (on SDP and AMR), we expect the task to have a significant contribution
-to the advancement of UCCA parsing in particular, and semantic parsing in
-general. Furthermore, existing applications for semantic evaluation that are
-based on UCCA will greatly benefit from better automatic methods for UCCA
-parsing. The competition website is
-https://competitions.codalab.org/competitions/19160
-"
-7817,1805.12393,"Yuyu Zhang, Hanjun Dai, Kamil Toraman, Le Song","KG^2: Learning to Reason Science Exam Questions with Contextual
- Knowledge Graph Embeddings",cs.LG cs.AI cs.CL stat.ML," The AI2 Reasoning Challenge (ARC), a new benchmark dataset for question
-answering (QA), has been recently released. ARC only contains natural science
-questions authored for human exams, which are hard to answer and require
-advanced logic reasoning. On the ARC Challenge Set, existing state-of-the-art
-QA systems fail to significantly outperform a random baseline, reflecting the
-difficult nature of this task. In this paper, we propose a novel framework for
-answering science exam questions, which mimics the human solving process in an
-open-book exam. To address the reasoning challenge, we construct contextual
-knowledge graphs respectively for the question itself and supporting
-sentences. Our model learns to reason with neural embeddings of both knowledge
-graphs. Experiments on the ARC Challenge Set show that our model outperforms
-the previous state-of-the-art QA systems.
-"
-7818,1805.12471,"Alex Warstadt, Amanpreet Singh, Samuel R.
- Bowman",Neural Network Acceptability Judgments,cs.CL," This paper investigates the ability of artificial neural networks to judge
-the grammatical acceptability of a sentence, with the goal of testing their
-linguistic competence. We introduce the Corpus of Linguistic Acceptability
-(CoLA), a set of 10,657 English sentences labeled as grammatical or
-ungrammatical from published linguistics literature. As baselines, we train
-several recurrent neural network models on acceptability classification, and
-find that our models outperform unsupervised models by Lau et al. (2016) on
-CoLA. Error-analysis on specific grammatical phenomena reveals that both Lau
-et al.'s models and ours learn systematic generalizations like
-subject-verb-object order. However, all models we test perform far below human
-level on a wide range of grammatical constructions.
-"
-7819,1805.12501,"Li Zhang, Steven R. Wilson, Rada Mihalcea",Multi-Label Transfer Learning for Multi-Relational Semantic Similarity,cs.CL," Multi-relational semantic similarity datasets define the semantic relations
-between two short texts in multiple ways, e.g., similarity, relatedness, and
-so on. Yet, all the systems to date designed to capture such relations target
-one relation at a time. We propose a multi-label transfer learning approach
-based on LSTM to make predictions for several relations simultaneously and
-aggregate the losses to update the parameters. This multi-label regression
-approach jointly learns the information provided by the multiple relations,
-rather than treating them as separate tasks. Not only does this approach
-outperform the single-task approach and the traditional multi-task learning
-approach, but it also achieves state-of-the-art performance on all but one
-relation of the Human Activity Phrase dataset.
-"
-7820,1805.12518,"Arne K\""ohn","Incremental Natural Language Processing: Challenges, Strategies, and
- Evaluation",cs.CL," Incrementality is ubiquitous in human-human interaction and beneficial for
-human-computer interaction. It has been a topic of research in different parts
-of the NLP community, mostly with a focus on the specific topic at hand, even
-though incremental systems have to deal with similar challenges regardless of
-domain. In this survey, I consolidate and categorize the approaches,
-identifying similarities and differences in the computation and data, and show
-trade-offs that have to be considered. A focus lies on evaluating incremental
-systems, because the standard metrics often fail to capture the incremental
-properties of a system and coming up with a suitable evaluation scheme is
-non-trivial.
-"
-7821,1806.00044,"Subhojeet Pramanik, Aman Hussain",Text normalization using memory augmented neural networks,cs.CL," We perform text normalization, i.e. the transformation of words from the
-written to the spoken form, using a memory augmented neural network. With the
-addition of a dynamic memory access and storage mechanism, we present a neural
-architecture that will serve as a language-agnostic text normalization system
-while avoiding the kind of unacceptable errors made by LSTM-based recurrent
-neural networks. By successfully reducing the frequency of such mistakes, we
-show that this novel architecture is indeed a better alternative. Our proposed
-system requires significantly smaller amounts of data, training time and
-compute resources.
-Additionally, we perform data up-sampling, circumventing the data sparsity
-problem in some semiotic classes, to show that sufficient examples in any
-particular class can improve the performance of our text normalization system.
-Although a few occurrences of these errors still remain in certain semiotic
-classes, we demonstrate that memory augmented networks with meta-learning
-capabilities can open many doors to a superior text normalization system.
-"
-7822,1806.00047,"Valts Blukis and Nataly Brukhim and Andrew Bennett and Ross A. Knepper
- and Yoav Artzi","Following High-level Navigation Instructions on a Simulated Quadcopter
- with Imitation Learning",cs.AI cs.CL cs.CV cs.LG cs.RO," We introduce a method for following high-level navigation instructions by
-mapping directly from images, instructions and pose estimates to continuous
-low-level velocity commands for real-time control. The Grounded Semantic
-Mapping Network (GSMN) is a fully-differentiable neural network architecture
-that builds an explicit semantic map in the world reference frame by
-incorporating a pinhole camera projection model within the network. The
-information stored in the map is learned from experience, while the
-local-to-world transformation is computed explicitly. We train the model using
-DAggerFM, a modified variant of DAgger that trades tabular convergence
-guarantees for improved training speed and memory use. We test GSMN in virtual
-environments on a realistic quadcopter simulator and show that incorporating
-explicit mapping and grounding modules allows GSMN to outperform strong neural
-baselines and almost reach the performance of an expert policy. Finally, we
-analyze the learned map representations and show that using an explicit map
-leads to an interpretable instruction-following model.
-"
-7823,1806.00187,Myle Ott and Sergey Edunov and David Grangier and Michael Auli,Scaling Neural Machine Translation,cs.CL," Sequence to sequence learning models still require several days to reach
-state of the art performance on large benchmark datasets using a single
-machine. This paper shows that reduced precision and large batch training can
-speed up training by nearly 5x on a single 8-GPU machine with careful tuning
-and implementation. On WMT'14 English-German translation, we match the
-accuracy of Vaswani et al. (2017) in under 5 hours when training on 8 GPUs and
-we obtain a new state of the art of 29.3 BLEU after training for 85 minutes on
-128 GPUs. We further improve these results to 29.8 BLEU by training on the
-much larger Paracrawl dataset. On the WMT'14 English-French task, we obtain a
-state-of-the-art BLEU of 43.2 in 8.5 hours on 128 GPUs.
-"
-7824,1806.00258,Chenhui Chu and Rui Wang,A Survey of Domain Adaptation for Neural Machine Translation,cs.CL cs.AI cs.LG," Neural machine translation (NMT) is a deep learning based approach for
-machine translation, which yields the state-of-the-art translation performance
-in scenarios where large-scale parallel corpora are available. Although
-high-quality and domain-specific translation is crucial in the real world,
-domain-specific corpora are usually scarce or nonexistent, and thus vanilla
-NMT performs poorly in such scenarios. Domain adaptation, which leverages both
-out-of-domain parallel corpora as well as monolingual corpora for in-domain
-translation, is very important for domain-specific translation. In this paper,
-we give a comprehensive survey of the state-of-the-art domain adaptation
-techniques for NMT.
-"
-7825,1806.00354,"Sandro Pezzelle, Shane Steinert-Threlkeld, Raffaela Bernardi, Jakub
- Szymanik","Some of Them Can be Guessed! Exploring the Effect of Linguistic Context
- in Predicting Quantifiers",cs.CL cs.AI," We study the role of linguistic context in predicting quantifiers (`few',
-`all'). We collect crowdsourced data from human participants and test various
-models in a local (single-sentence) and a global context (multi-sentence)
-condition. Models significantly outperform humans in the former setting and
-are only slightly better in the latter. While human performance improves with
-more linguistic context (especially on proportional quantifiers), model
-performance suffers. Models are very effective in exploiting lexical and
-morpho-syntactic patterns; humans are better at genuinely understanding the
-meaning of the (global) context.
-"
-7826,1806.00358,"Michael Boratko, Harshit Padigela, Divyendra Mikkilineni, Pritish
- Yuvraj, Rajarshi Das, Andrew McCallum, Maria Chang, Achille Fokoue-Nkoutche,
- Pavan Kapanipathi, Nicholas Mattei, Ryan Musa, Kartik Talamadupula, Michael
- Witbrock","A Systematic Classification of Knowledge, Reasoning, and Context within
- the ARC Dataset",cs.AI cs.CL cs.IR," The recent work of Clark et al. introduces the AI2 Reasoning Challenge (ARC)
-and the associated ARC dataset that partitions open domain, complex science
-questions into an Easy Set and a Challenge Set. That paper includes an
-analysis of 100 questions with respect to the types of knowledge and reasoning
-required to answer them; however, it does not include clear definitions of
-these types, nor does it offer information about the quality of the labels. We
-propose a comprehensive set of definitions of knowledge and reasoning types
-necessary for answering the questions in the ARC dataset. Using ten annotators
-and a sophisticated annotation interface, we analyze the distribution of
-labels across the Challenge Set and statistics related to them. Additionally,
-we demonstrate that although naive information retrieval methods return
-sentences that are irrelevant to answering the query, sufficient supporting
-text is often present in the (ARC) corpus. Evaluating with human-selected
-relevant sentences improves the performance of a neural machine comprehension
-model by 42 points.
-"
-7827,1806.00512,"Maohua Zhu, Jason Clemons, Jeff Pool, Minsoo Rhu, Stephen W. Keckler,
- Yuan Xie","Structurally Sparsified Backward Propagation for Faster Long Short-Term
- Memory Training",cs.LG cs.CL stat.ML," Exploiting sparsity enables hardware systems to run neural networks faster
-and more energy-efficiently. However, most prior sparsity-centric optimization
-techniques only accelerate the forward pass of neural networks and usually
-require an even longer training process with iterative pruning and retraining.
-We observe that artificially inducing sparsity in the gradients of the gates
-in an LSTM cell has little impact on the training quality. Further, we can
-enforce structured sparsity in the gate gradients to make the LSTM backward
-pass up to 45% faster than the state-of-the-art dense approach and 168% faster
-than the state-of-the-art sparsifying method on modern GPUs. Though the
-structured sparsifying method can impact the accuracy of a model, this
-performance gap can be eliminated by mixing our sparse training method and the
-standard dense training method.
-Experimental results show that the mixed method can achieve comparable results
-in a shorter time span than using purely dense training.
-"
-7828,1806.00522,"AbdelRahim Elmadany, Sherif Abdou, Mervat Gheith","Improving Dialogue Act Classification for Spontaneous Arabic Speech and
- Instant Messages at Utterance Level",cs.CL," The ability to model and automatically detect dialogue acts is an important
-step toward understanding spontaneous speech and instant messages. However, it
-has been difficult to infer a dialogue act from a surface utterance because it
-highly depends on the context of the utterance and the speaker's linguistic
-knowledge, especially in Arabic dialects. This paper proposes a statistical
-dialogue analysis model to recognize an utterance's dialogue acts using a
-multi-class hierarchical structure. The model can automatically acquire
-probabilistic discourse knowledge from a dialogue corpus that was collected
-and annotated manually from multi-genre Egyptian call-centers. Extensive
-experiments were conducted using a Support Vector Machines classifier to
-evaluate the system performance. The results, attained in terms of an average
-F-measure score of 0.912, show that the proposed approach improves the
-F-measure by approximately 20%.
-"
-7829,1806.00525,"Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das,
- Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K. Marks,
- Chiori Hori",Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7,cs.CL cs.CV," Scene-aware dialog systems will be able to have conversations with users
-about the objects and events around them. Progress on such systems can be made
-by integrating state-of-the-art technologies from multiple research areas,
-including end-to-end dialog systems, visual dialog, and video description. We
-introduce the Audio Visual Scene Aware Dialog (AVSD) challenge and dataset. In
-this challenge, which is one track of the 7th Dialog System Technology
-Challenges (DSTC7) workshop, the task is to build a system that generates
-responses in a dialog about an input video.
-"
-7830,1806.00588,"Xing Shi, Shizhen Xu, Kevin Knight",Fast Locality Sensitive Hashing for Beam Search on GPU,cs.CL cs.AI cs.DC cs.DS," We present a GPU-based Locality Sensitive Hashing (LSH) algorithm to speed
-up beam search for sequence models. We utilize the winner-take-all (WTA) hash,
-which is based on the relative ranking order of hidden dimensions and thus
-resilient to perturbations in numerical values. Our algorithm is designed by
-fully considering the underlying architecture of CUDA-enabled GPUs
-(Algorithm/Architecture Co-design): 1) A parallel Cuckoo hash table is applied
-for LSH code lookup (guaranteed O(1) lookup time); 2) Candidate lists are
-shared across beams to maximize the parallelism; 3) Top frequent words are
-merged into candidate lists to improve performance. Experiments on 4
-large-scale neural machine translation models demonstrate that our algorithm
-can achieve up to 4x speedup on the softmax module, and 2x overall speedup
-without hurting BLEU on GPU.
-"
-7831,1806.00591,"Jon Gauthier, Anna Ivanova","Does the brain represent words? An evaluation of brain decoding studies
- of language understanding",cs.CL," Language decoding studies have identified word representations which can be
-used to predict brain activity in response to novel words and sentences
-(Anderson et al., 2016; Pereira et al., 2018).
-The unspoken assumption of these studies is that, during processing,
-linguistic information is transformed into some shared semantic space, and
-those semantic representations are then used for a variety of linguistic and
-non-linguistic tasks. We claim that current studies vastly underdetermine the
-content of these representations, the algorithms which the brain deploys to
-produce and consume them, and the computational tasks which they are designed
-to solve. We illustrate this indeterminacy with an extension of the
-sentence-decoding experiment of Pereira et al. (2018), showing how standard
-evaluations fail to distinguish between language processing models which
-deploy different mechanisms and which are optimized to solve very different
-tasks. We conclude by suggesting changes to the brain decoding paradigm which
-can support stronger claims of neural representation.
-"
-7832,1806.00615,Caleb Pomeroy and Niheer Dasandi and Slava Jankin Mikhaylov,Multiplex Communities and the Emergence of International Conflict,cs.CL cs.CY cs.SI," Advances in community detection reveal new insights into multiplex and
-multilayer networks. Less work, however, investigates the relationship between
-these communities and outcomes in social systems. We leverage these advances
-to shed light on the relationship between the cooperative mesostructure of the
-international system and the onset of interstate conflict. We detect
-communities based upon weaker signals of affinity expressed in United Nations
-votes and speeches, as well as stronger signals observed across multiple
-layers of bilateral cooperation. Communities of diplomatic affinity display an
-expected negative relationship with conflict onset. Ties in communities based
-upon observed cooperation, however, display no effect under a standard model
-specification and a positive relationship with conflict under an alternative
-specification. These results align with some extant hypotheses but also point
-to a paucity in our understanding of the relationship between community
-structure and behavioral outcomes in networks.
-"
-7833,1806.00616,"Zhiyuan Tang, Dong Wang and Qing Chen",AP18-OLR Challenge: Three Tasks and Their Baselines,cs.CL," The third oriental language recognition (OLR) challenge AP18-OLR is
-introduced in this paper, including the data profile, the tasks and the
-evaluation principles. Following the events in the last two years, namely
-AP16-OLR and AP17-OLR, the challenge this year focuses on more challenging
-tasks, including (1) short-duration utterances, (2) confusing languages, and
-(3) open-set recognition. As in the previous events, the data of AP18-OLR is
-also provided by SpeechOcean and the NSFC M2ASR project. Baselines based on
-both the i-vector model and neural networks are constructed for the
-participants' reference. We report the baseline results on the three tasks and
-demonstrate that the three tasks are truly challenging. All the data is free
-for participants, and the Kaldi recipes for the baselines have been published
-online.
-"
-7834,1806.00628,"Xi Chen, Zhihong Deng, Gehui Shen, Ting Huang","A Novel Framework for Recurrent Neural Networks with Enhancing
- Information Processing and Transmission between Units",cs.NE cs.CL cs.LG," This paper proposes a novel framework for recurrent neural networks (RNNs),
-inspired by human memory models in the field of cognitive neuroscience, to
-enhance information processing and transmission between adjacent RNN units.
-The proposed framework for RNNs consists of three stages: working memory,
-forgetting, and long-term store. The first stage takes input data into sensory
-memory and transfers it to working memory for preliminary treatment. The
-second stage mainly focuses on proactively forgetting the secondary rather
-than the primary information in the working memory. Finally, the long-term
-store is obtained, normally using some kind of RNN unit. Our framework, which
-is generalized and simple, is evaluated on 6 datasets which fall into 3
-different tasks, corresponding to text classification, image classification
-and language modelling. Experiments reveal that our framework can clearly
-improve the performance of traditional recurrent neural networks. An
-exploratory task shows the ability of our framework to correctly forget
-secondary information.
-"
-7835,1806.00674,"Armin Seyeditabari, Narges Tabari, Wlodek Zadrozny",Emotion Detection in Text: a Review,cs.CL," In recent years, emotion detection in text has become more popular due to
-its vast potential applications in marketing, political science, psychology,
-human-computer interaction, artificial intelligence, etc. Access to a huge
-amount of textual data, especially opinionated and self-expressive text, has
-also played a special role in bringing attention to this field. In this paper,
-we review the work that has been done in identifying emotion expressions in
-text and argue that, although many techniques, methodologies, and models have
-been created to detect emotion in text, there are various reasons that make
-these methods insufficient. Although there is an essential need to improve the
-design and architecture of current systems, factors such as the complexity of
-human emotions, and the use of implicit and metaphorical language in
-expressing it, lead us to think that just re-purposing standard methodologies
-will not be enough to capture these complexities, and it is important to pay
-attention to the linguistic intricacies of emotion expression.
-"
-7836,1806.00692,"Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose,
- Graham Neubig",Stress Test Evaluation for Natural Language Inference,cs.CL," Natural language inference (NLI) is the task of determining if a natural
-language hypothesis can be inferred from a given premise in a justifiable
-manner. NLI was proposed as a benchmark task for natural language
-understanding. Existing models perform well on standard datasets for NLI,
-achieving impressive results across different genres of text. However, the
-extent to which these models understand the semantic content of sentences is
-unclear. In this work, we propose an evaluation methodology consisting of
-automatically constructed ""stress tests"" that allow us to examine whether
-systems have the ability to make real inferential decisions. Our evaluation of
-six sentence-encoder models on these stress tests reveals strengths and
-weaknesses of these models with respect to challenging linguistic phenomena,
-and suggests important directions for future work in this area.
-"
-7837,1806.00696,"Vahid Garousi, Sara Bauer, Michael Felderer",NLP-assisted software testing: A systematic mapping of the literature,cs.SE cs.CL," Context: To reduce the manual effort of extracting test cases from
-natural-language requirements, many approaches based on Natural Language
-Processing (NLP) have been proposed in the literature.
-Given the large number of approaches in this area, and since many
-practitioners are eager to utilize such techniques, it is important to
-synthesize and provide an overview of the state-of-the-art in this area.
-Objective: Our objective is to summarize the state-of-the-art in NLP-assisted
-software testing, which could benefit practitioners who want to utilize those
-NLP-based techniques. Moreover, this can benefit researchers by providing an
-overview of the research landscape. Method: To address the above need, we
-conducted a survey in the form of a systematic literature mapping
-(classification). After compiling an initial pool of 95 papers, we conducted a
-systematic voting, and our final pool included 67 technical papers. Results:
-This review paper provides an overview of the contribution types presented in
-the papers, the types of NLP approaches used to assist software testing, the
-types of required input requirements, and a review of tool support in this
-area. Some key results we detected are: (1) only four of the 38 tools (11%)
-presented in the papers are available for download; (2) a large proportion of
-the papers (30 of 67) provided only a shallow exposure to the NLP aspects
-(almost no details). Conclusion: This paper would benefit both practitioners
-and researchers by serving as an ""index"" to the body of knowledge in this
-area. The results could help practitioners utilize the existing NLP-based
-techniques; this in turn reduces the cost of test-case design and decreases
-the amount of human resources spent on test activities. After sharing this
-review with some of our industrial collaborators, initial insights show that
-this review can indeed be useful and beneficial to practitioners.
-"
-7838,1806.00699,"Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith",Quantifying the dynamics of topical fluctuations in language,cs.CL," The availability of large diachronic corpora has provided the impetus for a
-growing body of quantitative research on language evolution and meaning
-change. The central quantities in this research are token frequencies of
-linguistic elements in texts, with changes in frequency taken to reflect the
-popularity or selective fitness of an element. However, corpus frequencies may
-change for a wide variety of reasons, including purely random sampling
-effects, or because corpora are composed of contemporary media and fiction
-texts within which the underlying topics ebb and flow with cultural and
-socio-political trends. In this work, we introduce a simple model for
-controlling for topical fluctuations in corpora - the topical-cultural
-advection model - and demonstrate how it provides a robust baseline of
-variability in word frequency changes over time. We validate the model on a
-diachronic corpus spanning two centuries, and a carefully-controlled
-artificial language change scenario, and then use it to correct for topical
-fluctuations in historical time series. Finally, we use the model to show that
-the emergence of new words typically corresponds with the rise of a trending
-topic. This suggests that some lexical innovations occur due to growing
-communicative need in a subspace of the lexicon, and that the topical-cultural
-advection model can be used to quantify this.
-"
-7839,1806.00722,"Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu",Dense Information Flow for Neural Machine Translation,cs.CL," Recently, neural machine translation has achieved remarkable progress by
-introducing well-designed deep neural networks into its encoder-decoder
-framework. From the optimization perspective, residual connections are adopted
-to improve learning performance for both encoder and decoder in most of these
-deep architectures, and advanced attention connections are applied as well.
-Inspired by the success of the DenseNet model in computer vision problems, in
-this paper, we propose a densely connected NMT architecture (DenseNMT) that is
-able to train more efficiently for NMT. The proposed DenseNMT not only allows
-dense connection in creating new features for both encoder and decoder, but
-also uses the dense attention structure to improve attention quality. Our
-experiments on multiple datasets show that the DenseNMT structure is more
-competitive and efficient.
-"
-7840,1806.00738,"Diana Gonzalez-Rico, Gibran Fuentes-Pineda","Contextualize, Show and Tell: A Neural Visual Storyteller",cs.CL cs.AI cs.CV cs.LG," We present a neural model for generating short stories from image sequences,
-which extends the image description model by Vinyals et al. (Vinyals et al.,
-2015). This extension relies on an encoder LSTM to compute a context vector of
-each story from the image sequence. This context vector is used as the first
-state of multiple independent decoder LSTMs, each of which generates the
-portion of the story corresponding to each image in the sequence by taking the
-image embedding as the first input. Our model showed competitive results with
-the METEOR metric and human ratings in the internal track of the Visual
-Storytelling Challenge 2018.
-"
-7841,1806.00749,"Yang Yang, Lei Zheng, Jiawei Zhang, Qingcai Cui, Zhoujun Li, Philip S.
- Yu",TI-CNN: Convolutional Neural Networks for Fake News Detection,cs.CL cs.SI," With the development of social networks, fake news for various commercial
-and political purposes has been appearing in large numbers and spreading
-widely in the online world. With deceptive words, people can get infected by
-fake news very easily and will share it without any fact-checking. For
-instance, during the 2016 US presidential election, various kinds of fake news
-about the candidates spread widely through both official news media and online
-social networks. Such fake news is usually released either to smear the
-opponents or to support the candidate on their side. The erroneous information
-in fake news is usually written to motivate voters' irrational emotion and
-enthusiasm. Such kinds of fake news can sometimes bring about devastating
-effects, and an important goal in improving the credibility of online social
-networks is to identify fake news in a timely manner. In this paper, we
-propose to study the fake news detection problem. Automatic fake news
-identification is extremely hard, since pure model-based fact-checking for
-news is still an open problem, and few existing models can be applied to solve
-the problem. With a thorough investigation of fake news data, many useful
-explicit features are identified from both the text words and images used in
-fake news. Besides the explicit features, there also exist some hidden
-patterns in the words and images used in fake news, which can be captured with
-a set of latent features extracted via the multiple convolutional layers in
-our model.
-A model named TI-CNN (Text and Image information based Convolutional Neural
-Network) is proposed in this paper. By projecting the explicit and latent
-features into a unified feature space, TI-CNN is trained with both the text
-and image information simultaneously. Extensive experiments carried out on
-real-world fake news datasets demonstrate the effectiveness of TI-CNN.
-"
-7842,1806.00754,Hwiyeol Jo and Jeong Ryu,Psychological State in Text: A Limitation of Sentiment Analysis,cs.CL cs.AI," Starting with the idea that sentiment analysis models should be able to
-predict not only positive or negative sentiment but also other psychological
-states of a person, we implement a sentiment analysis model to investigate the
-relationship between the model and emotional state. We first examine
-psychological measurements of 64 participants and ask them to write a book
-report about a story. After that, we train our sentiment analysis model using
-crawled movie review data. We finally evaluate participants' writings, using
-the pretrained model as a concept of transfer learning. The result shows that
-the sentiment analysis model performs well at predicting a score, but the
-score does not have any correlation with humans' self-checked sentiment.
-"
-7843,1806.00778,"Yi Tay, Luu Anh Tuan, Siu Cheung Hui","Multi-Cast Attention Networks for Retrieval-based Question Answering and
- Response Prediction",cs.CL cs.AI cs.IR," Attention is typically used to select informative sub-phrases that are used
-for prediction. This paper investigates the novel use of attention as a form
-of feature augmentation, i.e., casted attention. We propose Multi-Cast
-Attention Networks (MCAN), a new attention mechanism and general model
-architecture for a potpourri of ranking tasks in the conversational modeling
-and question answering domains. Our approach performs a series of soft
-attention operations, each time casting a scalar feature upon the inner word
-embeddings. The key idea is to provide a real-valued hint (feature) to a
-subsequent encoder layer and is targeted at improving the representation
-learning process. There are several advantages to this design, e.g., it allows
-an arbitrary number of attention mechanisms to be casted, allowing for
-multiple attention types (e.g., co-attention, intra-attention) and attention
-variants (e.g., alignment-pooling, max-pooling, mean-pooling) to be executed
-simultaneously. This not only eliminates the costly need to tune the nature of
-the co-attention layer, but also provides greater extents of explainability to
-practitioners. Via extensive experiments on four well-known benchmark
-datasets, we show that MCAN achieves state-of-the-art performance. On the
-Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by
-$9\%$. MCAN also achieves the best performing score to date on the
-well-studied TrecQA dataset.
-"
-7844,1806.00780,Vladimir Ilievski,Building Advanced Dialogue Managers for Goal-Oriented Dialogue Systems,cs.CL," Goal-Oriented (GO) Dialogue Systems, colloquially known as goal-oriented
-chatbots, help users achieve a predefined goal (e.g. book a movie ticket)
-within a closed domain. A first step is to understand the user's goal by using
-natural language understanding techniques. Once the goal is known, the bot
-must manage a dialogue to achieve that goal, which is conducted with respect
-to a learnt policy.
- The success of the dialogue system depends on the quality of the policy,
-which is in turn reliant on the availability of high-quality training data for
-the policy learning method, for instance Deep Reinforcement Learning. Due to
-the domain specificity, the amount of available data is typically too low to
-allow the training of good dialogue policies. In this master thesis we
-introduce a transfer learning method to mitigate the effects of the low
-in-domain data availability. Our transfer learning based approach improves the
-bot's success rate by $20\%$ in relative terms for distant domains, and we
-more than double it for close domains, compared to the model without transfer
-learning. Moreover, the transfer learning chatbots learn the policy up to 5 to
-10 times faster. Finally, as the transfer learning approach is complementary
-to additional processing such as warm-starting, we show that their joint
-application gives the best outcomes.
-"
-7845,1806.00793,Alexander Herzog and Peter John and Slava Jankin Mikhaylov,"Transfer Topic Labeling with Domain-Specific Knowledge Base: An Analysis
- of UK House of Commons Speeches 1935-2014",cs.CL cs.CY," Topic models are widely used in natural language processing, allowing
-researchers to estimate the underlying themes in a collection of documents.
-Most topic models use unsupervised methods and hence require the additional
-step of attaching meaningful labels to estimated topics. This process of
-manual labeling is not scalable and suffers from human bias. We present a
-semi-automatic transfer topic labeling method that seeks to remedy these
-problems. Domain-specific codebooks form the knowledge-base for automated
-topic labeling. We demonstrate our approach with a dynamic topic model
-analysis of the complete corpus of UK House of Commons speeches 1935-2014,
-using the coding instructions of the Comparative Agendas Project to label
-topics. We show that our method works well for a majority of the topics we
-estimate; but we also find that institution-specific topics, in particular on
-subnational governance, require manual input. We validate our results using
-human expert coding.
-"
-7846,1806.00807,"Badri N. Patro, Vinod K. Kurmi, Sandeep Kumar, Vinay P. Namboodiri","Learning Semantic Sentence Embeddings using Sequential Pair-wise
- Discriminator",cs.CL cs.AI," In this paper, we propose a method for obtaining sentence-level embeddings.
-While the problem of obtaining word-level embeddings is very well studied, the
-sentence-level case is less explored. Our embeddings are obtained by a simple
-method in the context of solving the paraphrase generation task. If we use a
-sequential encoder-decoder model for generating paraphrases, we would like the
-generated paraphrase to be semantically close to the original sentence. One
-way to ensure this is by adding constraints for true paraphrase embeddings to
-be close and unrelated paraphrase candidate sentence embeddings to be far.
-This is ensured by using a sequential pair-wise discriminator that shares
-weights with the encoder and that is trained with a suitable loss function.
-Our loss function penalizes paraphrase sentence embedding distances from being
-too large. This loss is used in combination with a sequential encoder-decoder
-network. We also validated our method by evaluating the obtained embeddings
-for a sentiment analysis task.
-The proposed method results in semantic embeddings and outperforms the
-state-of-the-art on the paraphrase generation and sentiment analysis tasks on
-standard datasets. These results are also shown to be statistically
-significant.
-"
-7847,1806.00840,Jean Maillard and Stephen Clark,"Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing
- and Chart Parsing",cs.CL," Latent tree learning models represent sentences by composing their words
-according to an induced parse tree, all based on a downstream task. These
-models often outperform baselines which use (externally provided) syntax trees
-to drive the composition order. This work contributes (a) a new latent tree
-learning model based on shift-reduce parsing, with competitive downstream
-performance and non-trivial induced trees, and (b) an analysis of the trees
-learned by our shift-reduce model and by a chart-based model.
-"
-7848,1806.00910,Abeed Sarker and Graciela Gonzalez-Hernandez,"An unsupervised and customizable misspelling generator for mining noisy
- health-related text sources",cs.CL," In this paper, we present a customizable datacentric system that
-automatically generates common misspellings for complex health-related terms.
-The spelling variant generator relies on a dense vector model learned from
-large unlabeled text, which is used to find semantically close terms to the
-original/seed keyword, followed by the filtering of terms that are lexically
-dissimilar beyond a given threshold. The process is executed recursively,
-converging when no new terms similar (lexically and semantically) to the seed
-keyword are found. Weighting of intra-word character sequence similarities
-allows further problem-specific customization of the system. On a dataset
-prepared for this study, our system outperforms the current state-of-the-art
-for medication name variant generation with a best F1-score of 0.69 and
-F1/4-score of 0.78. Extrinsic evaluation of the system on a set of
-cancer-related terms showed an increase of over 67% in retrieval rate from
-Twitter posts when the generated variants are included. Our proposed spelling
-variant generator has several advantages over the current state-of-the-art
-and other types of variant generators: (i) it is capable of filtering out
-lexically similar but semantically dissimilar terms, (ii) the number of
-variants generated is low, as many low-frequency and ambiguous misspellings
-are filtered out, and (iii) the system is fully automatic, customizable and
-easily executable. While the base system is fully unsupervised, we show how
-supervision may be employed to adjust weights for task-specific customization.
-The performance and significant relative simplicity of our proposed approach
-make it a much needed misspelling generation resource for health-related text
-mining from noisy sources. The source code for the system has been made
-publicly available for research purposes.
-"
-7849,1806.00913,Jacob Goldberger and Oren Melamud,Self-Normalization Properties of Language Modeling,cs.CL," Self-normalizing discriminative models approximate the normalized
-probability of a class without having to compute the partition function. In
-the context of language modeling, this property is particularly appealing as
-it may significantly reduce run-times due to large word vocabularies. In this
-study, we provide a comprehensive investigation of language modeling
-self-normalization.
-First, we theoretically analyze the inherent self-normalization properties of
-Noise Contrastive Estimation (NCE) language models. Then, we compare them
-empirically to softmax-based approaches, which are self-normalized using
-explicit regularization, and suggest a hybrid model with compelling
-properties. Finally, we uncover a surprising negative correlation between
-self-normalization and perplexity across the board, as well as some regularity
-in the observed errors, which may potentially be used for improving
-self-normalization algorithms in the future.
-"
-7850,1806.00920,"Chih Chieh Shao, Trois Liu, Yuting Lai, Yiying Tseng and Sam Tsai",DRCD: a Chinese Machine Reading Comprehension Dataset,cs.CL," In this paper, we introduce DRCD (Delta Reading Comprehension Dataset), an
-open domain traditional Chinese machine reading comprehension (MRC) dataset.
-This dataset is intended to be a standard Chinese machine reading
-comprehension dataset, which can serve as a source dataset in transfer
-learning. The dataset contains 10,014 paragraphs from 2,108 Wikipedia articles
-and 30,000+ questions generated by annotators. We build a baseline model that
-achieves an F1 score of 89.59%. The F1 score of human performance is 93.30%.
-"
-7851,1806.00971,"Shuhei Kurita, Daisuke Kawahara and Sadao Kurohashi","Neural Adversarial Training for Semi-supervised Japanese
- Predicate-argument Structure Analysis",cs.CL," Japanese predicate-argument structure (PAS) analysis involves zero anaphora
-resolution, which is notoriously difficult. To improve the performance of
-Japanese PAS analysis, it is straightforward to increase the size of corpora
-annotated with PAS. However, since this is prohibitively expensive, it is
-promising to take advantage of large amounts of raw corpora. In this paper, we
-propose a novel Japanese PAS analysis model based on semi-supervised
-adversarial training with a raw corpus. In our experiments, our model
-outperforms existing state-of-the-art models for Japanese PAS analysis.
-"
-7852,1806.01045,Tobias Hecking and Loet Leydesdorff,"Topic Modelling of Empirical Text Corpora: Validity, Reliability, and
- Reproducibility in Comparison to Semantic Maps",cs.CL," Using the 6,638 case descriptions of societal impact submitted for
-evaluation in the Research Excellence Framework (REF 2014), we replicate the
-topic model (Latent Dirichlet Allocation or LDA) made in this context and
-compare the results with factor-analytic results using a traditional
-word-document matrix (Principal Component Analysis or PCA). Removing a small
-fraction of documents from the sample, for example, has on average a much
-larger impact on LDA than on PCA-based models, to the extent that the largest
-distortion in the case of PCA has less effect than the smallest distortion of
-LDA-based models. In terms of semantic coherence, however, LDA models
-outperform PCA-based models. The topic models inform us about the statistical
-properties of the document sets under study, but the results are statistical
-and should not be used for a semantic interpretation - for example, in grant
-selections and micro-decision making, or scholarly work - without follow-up
-using domain-specific semantic maps.
-"
-7853,1806.01170,"Keisuke Sakaguchi, Benjamin Van Durme",Efficient Online Scalar Annotation with Bounded Support,cs.CL," We describe a novel method for efficiently eliciting scalar annotations for
-dataset construction and system quality estimation by human judgments.
-We contrast direct assessment (annotators assign scores to items directly),
-online pairwise ranking aggregation (scores derive from annotator comparison
-of items), and a hybrid approach (EASL: Efficient Annotation of Scalar Labels)
-proposed here. Our proposal leads to increased correlation with ground truth,
-at far greater annotator efficiency, suggesting this strategy as an improved
-mechanism for dataset creation and manual system evaluation.
-"
-7854,1806.01185,Thomas Lansdall-Welfare and Nello Cristianini,"History Playground: A Tool for Discovering Temporal Trends in Massive
- Textual Corpora",cs.CL," Recent studies have shown that macroscopic patterns of continuity and change
-over the course of centuries can be detected through the analysis of time
-series extracted from massive textual corpora. Similar data-driven approaches
-have already revolutionised the natural sciences, and are widely believed to
-hold similar potential for the humanities and social sciences, driven by the
-mass-digitisation projects that are currently under way, and coupled with the
-ever-increasing number of documents which are ""born digital"". As such, new
-interactive tools are required to discover and extract macroscopic patterns
-from these vast quantities of textual data. Here we present History
-Playground, an interactive web-based tool for discovering trends in massive
-textual corpora. The tool makes use of scalable algorithms to first extract
-trends from textual corpora, before making them available for real-time search
-and discovery, presenting users with an interface to explore the data.
-Included in the tool are algorithms for standardization, regression,
-change-point detection in the relative frequencies of ngrams, multi-term
-indices and comparison of trends across different corpora.
-"
-7855,1806.01264,"Guineng Zheng, Subhabrata Mukherjee, Xin Luna Dong, Feifei Li","OpenTag: Open Attribute Value Extraction from Product Profiles [Deep
- Learning, Active Learning, Named Entity Recognition]",cs.CL cs.AI cs.IR stat.ML," Extraction of missing attribute values is the task of finding values
-describing an attribute of interest in a free-text input. Most past related
-work on extraction of missing attribute values works with a closed-world
-assumption, with the possible set of values known beforehand, or uses
-dictionaries of values and hand-crafted features. How can we discover new
-attribute values that we have never seen before? Can we do this with limited
-human annotation or supervision? We study this problem in the context of
-product catalogs, which often have missing values for many attributes of
-interest.
- In this work, we leverage product profile information such as titles and
-descriptions to discover missing values of product attributes. We develop a
-novel deep tagging model OpenTag for this extraction problem with the
-following contributions: (1) we formalize the problem as a sequence tagging
-task, and propose a joint model exploiting recurrent neural networks
-(specifically, bidirectional LSTM) to capture context and semantics, and
-Conditional Random Fields (CRF) to enforce tagging consistency, (2) we develop
-a novel attention mechanism to provide interpretable explanation for our
-model's decisions, (3) we propose a novel sampling strategy exploring active
-learning to reduce the burden of human annotation. OpenTag does not use any
-dictionary or hand-crafted features as in prior works.
-Extensive experiments on real-life datasets in different domains show that
-OpenTag with our active learning strategy discovers new attribute values from
-as few as 150 annotated samples (a 3.3x reduction in annotation effort) with a
-high F-score of 83%, outperforming state-of-the-art models.
-"
-7856,1806.01330,"Sunipa Dev, Safia Hassan, Jeff M. Phillips",Closed Form Word Embedding Alignment,cs.CL stat.ML," We develop a family of techniques to align word embeddings which are derived
-from different source datasets or created using different mechanisms (e.g.,
-GloVe or word2vec). Our methods are simple and have a closed form to optimally
-rotate, translate, and scale to minimize root mean squared errors or maximize
-the average cosine similarity between two embeddings of the same vocabulary
-into the same dimensional space. Our methods extend approaches known as
-Absolute Orientation, which are popular for aligning objects in three
-dimensions, and generalize an approach by Smith et al. (ICLR 2017). We prove
-new results for optimal scaling and for maximizing cosine similarity. Then we
-demonstrate how to evaluate the similarity of embeddings from different
-sources or mechanisms, and that certain properties like synonyms and analogies
-are preserved across the embeddings and can be enhanced by simply aligning and
-averaging ensembles of embeddings.
-"
-7857,1806.01351,"Khoi-Nguyen Tran and Jey Han Lau and Danish Contractor and Utkarsh
- Gupta and Bikram Sengupta and Christopher J. Butler and Mukesh Mohania","Document Chunking and Learning Objective Generation for Instruction
- Design",cs.CL cs.CY cs.IR," Instructional Systems Design is the practice of creating instructional
-experiences that make the acquisition of knowledge and skill more efficient,
-effective, and appealing. Specifically in designing courses, an hour of
-training material can require between 30 to 500 hours of effort in sourcing
-and organizing reference data for use in just the preparation of course
-material. In this paper, we present the first system of its kind that helps
-reduce the effort associated with sourcing reference material and course
-creation. We present algorithms for document chunking and automatic generation
-of learning objectives from content, creating descriptive content metadata to
-improve content-discoverability. Unlike existing methods, the learning
-objectives generated by our system incorporate pedagogically motivated Bloom's
-verbs. We demonstrate the usefulness of our methods using real world data from
-the banking industry and through a live deployment at a large pharmaceutical
-company.
-"
-7858,1806.01353,Scott Lee,Natural Language Generation for Electronic Health Records,cs.CL cs.LG stat.ML," A variety of methods exist for generating synthetic electronic health
-records (EHRs), but they are not capable of generating unstructured text, like
-emergency department (ED) chief complaints, history of present illness, or
-progress notes. Here, we use the encoder-decoder model, a deep learning
-algorithm that features in many contemporary machine translation systems, to
-generate synthetic chief complaints from discrete variables in EHRs, like age
-group, gender, and discharge diagnosis. After being trained end-to-end on
-authentic records, the model can generate realistic chief complaint text that
-preserves much of the epidemiological information in the original data.
-As a side effect of the model's optimization goal, these synthetic chief
-complaints are also free of relatively uncommon abbreviations and
-misspellings, and they include none of the personally-identifiable information
-(PII) that was in the training data, suggesting the approach may be used to
-support the de-identification of text in EHRs. When combined with algorithms
-like generative adversarial networks (GANs), our model could be used to
-generate fully-synthetic EHRs, facilitating data sharing between healthcare
-providers and researchers and improving our ability to develop machine
-learning methods tailored to the information in healthcare data.
-"
-7859,1806.01483,"Hongru Liang, Haozheng Wang, Jun Wang, Shaodi You, Zhe Sun, Jin-Mao
- Wei, Zhenglu Yang","JTAV: Jointly Learning Social Media Content Representation by Fusing
- Textual, Acoustic, and Visual Features",cs.CL," Learning social media content is the basis of many real-world applications,
-including information retrieval and recommendation systems, among others. In
-contrast with previous works that focus mainly on single modal or bi-modal
-learning, we propose to learn social media content by jointly fusing textual,
-acoustic, and visual information (JTAV). Effective strategies are proposed to
-extract fine-grained features of each modality, that is, attBiGRU and DCRNN.
-We also introduce cross-modal fusion and attentive pooling techniques to
-integrate multi-modal information comprehensively. Extensive experimental
-evaluation conducted on real-world datasets demonstrates that our proposed
-model outperforms the state-of-the-art approaches by a large margin.
-"
-7860,1806.01501,"Jingjing Gong, Xipeng Qiu, Shaojing Wang and Xuanjing Huang",Information Aggregation via Dynamic Routing for Sequence Encoding,cs.CL," While much progress has been made in how to encode a text sequence into a
-sequence of vectors, less attention has been paid to how to aggregate these
-preceding vectors (outputs of RNN/CNN) into a fixed-size encoding vector.
-Usually, a simple max or average pooling is used, which is a bottom-up and
-passive way of aggregation that lacks guidance from task information. In this
-paper, we propose an aggregation mechanism to obtain a fixed-size encoding
-with a dynamic routing policy. The dynamic routing policy dynamically decides
-what and how much information needs to be transferred from each word to the
-final encoding of the text sequence. Following the work of Capsule Network, we
-design two dynamic routing policies to aggregate the outputs of the RNN/CNN
-encoding layer into a final encoding vector. Compared to other aggregation
-methods, dynamic routing can refine the messages according to the state of the
-final encoding vector. Experimental results on five text classification tasks
-show that our method outperforms other aggregating models by a significant
-margin. Related source code is released on our github page.
-"
-7861,1806.01515,Shuoyang Ding and Kevin Duh,"How Do Source-side Monolingual Word Embeddings Impact Neural Machine
- Translation?",cs.CL," Using pre-trained word embeddings as the input layer is a common practice in
-many natural language processing (NLP) tasks, but it is largely neglected for
-neural machine translation (NMT). In this paper, we conducted a systematic
-analysis of the effect of using pre-trained source-side monolingual word
-embeddings in NMT.
-We compared several strategies, such as fixing or updating the embeddings
-during NMT training on varying amounts of data, and we also proposed a novel
-strategy called dual-embedding that blends the fixing and updating strategies.
-Our results suggest that pre-trained embeddings can be helpful if properly
-incorporated into NMT, especially when parallel data is limited or additional
-in-domain monolingual data is readily available.
-"
-7862,1806.01523,"Fariz Ikhwantri, Samuel Louvan, Kemal Kurniawan, Bagas Abisena, Valdi
- Rachman, Alfan Farizki Wicaksono, Rahmad Mahendra","Multi-Task Active Learning for Neural Semantic Role Labeling on Low
- Resource Conversational Corpus",cs.CL," Most Semantic Role Labeling (SRL) approaches are supervised methods which
-require a significant amount of annotated data, and the annotation requires
-linguistic expertise. In this paper, we propose a Multi-Task Active Learning
-framework for Semantic Role Labeling with Entity Recognition (ER) as the
-auxiliary task to alleviate the need for extensive data and use additional
-information from ER to help SRL. We evaluate our approach on an Indonesian
-conversational dataset. Our experiments show that multi-task active learning
-can outperform a single-task active learning method and standard multi-task
-learning. According to our results, active learning is more efficient, using
-12% less training data than passive learning in both the single-task and
-multi-task settings. We also introduce a new dataset for SRL in the Indonesian
-conversational domain to encourage further research in this area.
-"
-7863,1806.01526,"Piek Vossen, Selene Baez, Lenka Baj\v{c}eti\'c, and Bram Kraaijeveld","Leolani: a reference machine with a theory of mind for social
- communication",cs.AI cs.CL cs.HC," Our state of mind is based on experiences and what other people tell us. This
-may result in conflicting information, uncertainty, and alternative facts. We
-present a robot that models relativity of knowledge and perception within
-social interaction following principles of the theory of mind. We utilized
-vision and speech capabilities on a Pepper robot to build an interaction model
-that stores the interpretations of perceptions and conversations in combination
-with provenance on its sources. The robot learns directly from what people tell
-it, possibly in relation to its perception. We demonstrate how the robot's
-communication is driven by hunger to acquire more knowledge from and on people
-and objects, to resolve uncertainties and conflicts, and to share awareness of
-the perceived environment. Likewise, the robot can make reference to the
-world and its knowledge about the world and the encounters with people that
-yielded this knowledge.
-"
-7864,1806.01620,"Erik Holmer, Andreas Marfurt",Explaining Away Syntactic Structure in Semantic Document Representations,cs.CL," Most generative document models act on bag-of-words input in an attempt to
-focus on the semantic content and thereby partially forego syntactic
-information. We argue that it is preferable to keep the original word order
-intact and explicitly account for the syntactic structure instead. We propose
-an extension to the Neural Variational Document Model (Miao et al., 2016) that
-does exactly that to separate local (syntactic) context from the global
-(semantic) representation of the document. Our model builds on the variational
-autoencoder framework to define a generative document model based on next-word
-prediction. 
We name our approach Sequence-Aware Variational Autoencoder since,
-in contrast to its predecessor, it operates on the true input sequence. In a
-series of experiments we observe stronger topicality of the learned
-representations as well as increased robustness to syntactic noise in our
-training data.
-"
-7865,1806.01694,"Chao-Hong Liu, Declan Groves, Akira Hayakawa, Alberto Poncelas and Qun
- Liu",Understanding Meanings in Multilingual Customer Feedback,cs.CL," Understanding and being able to react to customer feedback is the most
-fundamental task in providing good customer service. However, there are two
-major obstacles for international companies to automatically detect the meaning
-of customer feedback in a global multilingual environment. Firstly, there is no
-widely acknowledged categorisation (classes) of meaning for customer feedback.
-Secondly, the applicability of one meaning categorisation, if it exists, to
-customer feedback in multiple languages is questionable. In this paper, we
-extracted representative real-world samples of customer feedback from Microsoft
-Office customers in multiple languages (English, Spanish and Japanese), and
-concluded a five-class categorisation (comment, request, bug, complaint and
-meaningless) for meaning classification that could be used across languages in
-the realm of customer feedback analysis.
-"
-7866,1806.01733,Robyn Speer and Joanna Lowry-Duda,"Luminoso at SemEval-2018 Task 10: Distinguishing Attributes Using Text
- Corpora and Relational Knowledge",cs.CL," Luminoso participated in the SemEval 2018 task on ""Capturing Discriminative
-Attributes"" with a system based on ConceptNet, an open knowledge graph focused
-on general knowledge. In this paper, we describe how we trained a linear
-classifier on a small number of semantically-informed features to achieve an
-$F_1$ score of 0.7368 on the task, close to the task's high score of 0.75.
-"
-7867,1806.01742,"Alexander LeClair, Zachary Eberhart, Collin McMillan",Adapting Neural Text Classification for Improved Software Categorization,cs.SE cs.CL," Software Categorization is the task of organizing software into groups that
-broadly describe the behavior of the software, such as ""editors"" or ""science.""
-Categorization plays an important role in several maintenance tasks, such as
-repository navigation and feature elicitation. Current approaches attempt to
-cast the problem as text classification, to make use of the rich body of
-literature from the NLP domain. However, as we will show in this paper, text
-classification algorithms are generally not applicable off-the-shelf to source
-code; we found that they work well when high-level project descriptions are
-available, but suffer very large performance penalties when classifying source
-code and comments only. We propose a set of adaptations to a state-of-the-art
-neural classification algorithm and perform two evaluations: one with reference
-data from Debian end-user programs, and one with a set of C/C++ libraries that
-we hired professional programmers to annotate. We show that our proposed
-approach achieves performance exceeding that of previous software
-classification techniques as well as a state-of-the-art neural text
-classification technique. 
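To make the kind of adaptation discussed in the preceding abstract concrete, here is a minimal sketch, assuming scikit-learn: identifiers in source code are split into subword tokens (camelCase and snake_case) before a standard bag-of-words classifier is trained. The tokenizer, toy samples, and labels below are illustrative assumptions, not the authors' actual pipeline.

    import re
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def code_tokens(src):
        # Pull out identifiers, then split them on underscores and
        # camelCase boundaries so "openEditor" yields "open", "editor".
        tokens = []
        for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", src):
            for part in ident.split("_"):
                tokens += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part)
        return [t.lower() for t in tokens if t]

    # Hypothetical training snippets and category labels.
    samples = ["void openEditor(Buffer buf) { ... }",
               "double dot_product(double *a, double *b) { ... }"]
    labels = ["editors", "science"]

    clf = make_pipeline(TfidfVectorizer(analyzer=code_tokens),
                        LogisticRegression())
    clf.fit(samples, labels)
    print(clf.predict(["void renderEditorWindow() { ... }"]))

With only whole-identifier tokens, new code would share almost no vocabulary with the training snippets; subword splitting is one simple way to close that gap.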
-
-"
-7868,1806.01773,"Chetan Naik, Arpit Gupta, Hancheng Ge, Lambert Mathias, Ruhi Sarikaya",Contextual Slot Carryover for Disparate Schemas,cs.CL," In the slot-filling paradigm, where a user can refer back to slots in the
-context during a conversation, the goal of the contextual understanding system
-is to resolve the referring expressions to the appropriate slots in the
-context. In large-scale multi-domain systems, this presents two challenges -
-scaling to a very large and potentially unbounded set of slot values, and
-dealing with diverse schemas. We present a neural network architecture that
-addresses the slot value scalability challenge by reformulating the contextual
-interpretation as a decision to carry over a slot from a set of possible
-candidates. To deal with heterogeneous schemas, we introduce a simple
-data-driven method for transforming the candidate slots. Our experiments show
-that our approach can scale to multiple domains and provides competitive
-results over a strong baseline.
-"
-7869,1806.02179,"Sapna Negi, Maarten de Rijke, Paul Buitelaar",Open Domain Suggestion Mining: Problem Definition and Datasets,cs.CL," We propose a formal definition for the task of suggestion mining in the
-context of a wide range of open domain applications. Human perception of the
-term \emph{suggestion} is subjective and this affects the preparation of
-hand-labeled datasets for the task of suggestion mining. Existing work either lacks
-a formal problem definition and annotation procedure, or provides domain and
-application specific definitions. Moreover, many previously used manually
-labeled datasets remain proprietary. We first present an annotation study, and
-based on our observations propose a formal task definition and annotation
-procedure for creating benchmark datasets for suggestion mining. With this
-study, we also provide publicly available labeled datasets for suggestion
-mining in multiple domains.
-"
-7870,1806.02253,"Amir Bakarov, Roman Suvorov, Ilya Sochenkov",The Limitations of Cross-language Word Embeddings Evaluation,cs.CL," The aim of this work is to explore the possible limitations of existing
-methods of cross-language word embeddings evaluation, addressing the lack of
-correlation between intrinsic and extrinsic cross-language evaluation methods.
-To test this hypothesis, we construct English-Russian datasets for extrinsic
-and intrinsic evaluation tasks and compare the performance of 5 different
-cross-language models on them. The results show that the scores even on
-different intrinsic benchmarks do not correlate with each other. We conclude
-that using human references as ground truth for cross-language word
-embeddings is problematic without a better understanding of how native speakers
-process semantics in their cognition.
-"
-7871,1806.02418,Edwin Simpson and Iryna Gurevych,Finding Convincing Arguments Using Scalable Bayesian Preference Learning,cs.CL," We introduce a scalable Bayesian preference learning method for identifying
-convincing arguments in the absence of gold-standard ratings or rankings. In
-contrast to previous work, we avoid the need for separate methods to perform
-quality control on training data, predict rankings and perform pairwise
-classification. Bayesian approaches are an effective solution when faced with
-sparse or noisy training data, but have not previously been used to identify
-convincing arguments. 
One issue is scalability, which we address by developing
-a stochastic variational inference method for Gaussian process (GP) preference
-learning. We show how our method can be applied to predict argument
-convincingness from crowdsourced data, outperforming the previous
-state-of-the-art, particularly when trained with small amounts of unreliable
-data. We demonstrate how the Bayesian approach enables more effective active
-learning, thereby reducing the amount of data required to identify convincing
-arguments for new users and domains. While word embeddings are principally used
-with neural networks, our results show that word embeddings in combination with
-linguistic features also benefit GPs when predicting argument convincingness.
-"
-7872,1806.02437,"Casey Casalnuovo, Kenji Sagae, Prem Devanbu",Studying the Difference Between Natural and Programming Language Corpora,cs.CL," Code corpora, as observed in large software systems, are now known to be far
-more repetitive and predictable than natural language corpora. But why? Does
-the difference simply arise from the syntactic limitations of programming
-languages? Or does it arise from the differences in authoring decisions made by
-the writers of these natural and programming language texts? We conjecture that
-the differences are not entirely due to syntax, but also arise from the fact that
-reading and writing code is un-natural for humans, and requires substantial
-mental effort; so, people prefer to write code in ways that are familiar to
-both reader and writer. To support this argument, we present results from two
-sets of studies: 1) a first set aimed at attenuating the effects of syntax, and
-2) a second, aimed at measuring repetitiveness of text written in other
-settings (e.g. second language, technical/specialized jargon), which are also
-effortful to write. We find that this repetition in source code is not
-entirely the result of grammar constraints, and thus some repetition must
-result from human choice. While the evidence we find of similar repetitive
-behavior in technical and learner corpora does not conclusively show that such
-language is used by humans to mitigate difficulty, it is consistent with that
-theory.
-"
-7873,1806.02525,"Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura",Multi-Source Neural Machine Translation with Missing Data,cs.CL," Multi-source translation is an approach to exploit multiple inputs (e.g. in
-two different languages) to increase translation accuracy. In this paper, we
-examine approaches for multi-source neural machine translation (NMT) using an
-incomplete multilingual corpus in which some translations are missing. In
-practice, many multilingual corpora are not complete due to the difficulty of
-providing translations in all of the relevant languages (for example, in TED
-talks, most English talks only have subtitles for a small portion of the
-languages that TED supports). Existing studies on multi-source translation did
-not explicitly handle such situations. This study focuses on the use of
-incomplete multilingual corpora in multi-encoder NMT and mixture of NMT experts
-and examines a very simple implementation where missing source translations are
-replaced by a special placeholder symbol. These methods allow us to use incomplete
-corpora both at training time and test time. 
In experiments with real
-incomplete multilingual corpora of TED Talks, the multi-source NMT with these
-placeholder tokens achieved higher translation accuracy, as measured by BLEU,
-than any one-to-one NMT system.
-"
-7874,1806.02724,"Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas,
- Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein,
- Trevor Darrell",Speaker-Follower Models for Vision-and-Language Navigation,cs.CV cs.CL," Navigation guided by natural language instructions presents a challenging
-reasoning problem for instruction followers. Natural language instructions
-typically identify only a few high-level decisions and landmarks rather than
-complete low-level motor behaviors; much of the missing information must be
-inferred based on perceptual context. In machine learning settings, this is
-doubly challenging: it is difficult to collect enough annotated data to enable
-learning of this reasoning process from scratch, and also difficult to
-implement the reasoning process using generic sequence models. Here we describe
-an approach to vision-and-language navigation that addresses both these issues
-with an embedded speaker model. We use this speaker model to (1) synthesize new
-instructions for data augmentation and to (2) implement pragmatic reasoning,
-which evaluates how well candidate action sequences explain an instruction.
-Both steps are supported by a panoramic action space that reflects the
-granularity of human-generated instructions. Experiments show that all three
-components of this approach---speaker-driven data augmentation, pragmatic
-reasoning and panoramic action space---dramatically improve the performance of
-a baseline instruction follower, more than doubling the success rate over the
-best existing approach on a standard benchmark.
-"
-7875,1806.02725,Pierre Isabelle and Roland Kuhn,A Challenge Set for French --> English Machine Translation,cs.CL," We present a challenge set for French --> English machine translation based
-on the approach introduced in Isabelle, Cherry and Foster (EMNLP 2017). Such
-challenge sets are made up of sentences that are expected to be relatively
-difficult for machines to translate correctly because their most
-straightforward translations tend to be linguistically divergent. We present
-here a set of 506 manually constructed French sentences, 307 of which are
-targeted to the same kinds of structural divergences as in the paper mentioned
-above. The remaining 199 sentences are designed to test the ability of the
-systems to correctly translate difficult grammatical words such as
-prepositions. We report on the results of using this challenge set for testing
-two different systems, namely Google Translate and DEEPL, each on two different
-dates (October 2017 and January 2018). All the resulting data are made publicly
-available.
-"
-7876,1806.02782,"Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie","Training Augmentation with Adversarial Examples for Robust Speech
- Recognition",cs.CL cs.LG eess.AS stat.ML," This paper explores the use of adversarial examples in training speech
-recognition systems to increase the robustness of deep neural network acoustic
-models. During training, the fast gradient sign method is used to generate
-adversarial examples augmenting the original training data. Different from
-conventional data augmentation based on data transformations, the examples are
-dynamically generated based on current acoustic model parameters. 
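The fast gradient sign method named above is compact enough to sketch. A minimal PyTorch version follows; the linear model, batch shapes, and epsilon are illustrative stand-ins for the paper's acoustic model and settings.

    import torch

    def fgsm(model, loss_fn, x, y, eps=0.01):
        # Perturb x in the direction that increases the current loss.
        x_adv = x.clone().detach().requires_grad_(True)
        loss_fn(model(x_adv), y).backward()
        return (x_adv + eps * x_adv.grad.sign()).detach()

    model = torch.nn.Linear(40, 10)      # stand-in for an acoustic model
    loss_fn = torch.nn.CrossEntropyLoss()
    x = torch.randn(8, 40)               # a batch of acoustic feature vectors
    y = torch.randint(0, 10, (8,))       # class targets
    x_aug = torch.cat([x, fgsm(model, loss_fn, x, y)])  # augmented inputs
    y_aug = torch.cat([y, y])            # adversarial copies keep their labels

Because the perturbation is recomputed from the current parameters, the augmented data changes as training progresses, unlike static transformations.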
We assess the
-impact of adversarial data augmentation in experiments on the Aurora-4 and
-CHiME-4 single-channel tasks, showing improved robustness against noise and
-channel variation. Further improvement is obtained when combining adversarial
-examples with teacher/student training, leading to a 23% relative word error
-rate reduction on Aurora-4.
-"
-7877,1806.02786,"Sining Sun, Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie",Domain Adversarial Training for Accented Speech Recognition,cs.CL," In this paper, we propose a domain adversarial training (DAT) algorithm to
-alleviate the accented speech recognition problem. In order to reduce the
-mismatch between labeled source domain data (""standard"" accent) and unlabeled
-target domain data (with heavy accents), we augment the learning objective for
-a Kaldi TDNN network with a domain adversarial training (DAT) objective to
-encourage the model to learn accent-invariant features. In experiments with
-three Mandarin accents, we show that DAT yields up to 7.45% relative character
-error rate reduction when we do not have transcriptions of the accented speech,
-compared with the baseline trained on standard accent data only. We also find a
-benefit from DAT when used in combination with training from automatic
-transcriptions on the accented data. Furthermore, we find that DAT is superior
-to multi-task learning for accented speech recognition.
-"
-7878,1806.02814,Denis Newman-Griffis and Ayah Zirikly,"Embedding Transfer for Low-Resource Medical Named Entity Recognition: A
- Case Study on Patient Mobility",cs.CL cs.AI," Functioning is gaining recognition as an important indicator of global
-health, but remains under-studied in medical natural language processing
-research. We present the first analysis of automatically extracting
-descriptions of patient mobility, using a recently-developed dataset of free
-text electronic health records. We frame the task as a named entity recognition
-(NER) problem, and investigate the applicability of NER techniques to mobility
-extraction. As text corpora focused on patient functioning are scarce, we
-explore domain adaptation of word embeddings for use in a recurrent neural
-network NER system. We find that embeddings trained on a small in-domain corpus
-perform nearly as well as those learned from large out-of-domain corpora, and
-that domain adaptation techniques yield additional improvements in both
-precision and recall. Our analysis identifies several significant challenges in
-extracting descriptions of patient mobility, including the length and
-complexity of annotated entities and high linguistic variability in mobility
-descriptions.
-"
-7879,1806.02847,Trieu H. Trinh and Quoc V. Le,A Simple Method for Commonsense Reasoning,cs.AI cs.CL cs.LG," Commonsense reasoning is a long-standing challenge for deep learning. For
-example, it is difficult to use neural networks to tackle the Winograd Schema
-dataset (Levesque et al., 2011). In this paper, we present a simple method for
-commonsense reasoning with neural networks, using unsupervised learning. Key to
-our method is the use of language models, trained on a massive amount of
-unlabeled data, to score multiple choice questions posed by commonsense
-reasoning tests. On both Pronoun Disambiguation and Winograd Schema challenges,
-our models outperform previous state-of-the-art methods by a large margin,
-without using expensive annotated knowledge bases or hand-engineered features. 
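The scoring step just described can be sketched in a few lines. The version below assumes the Hugging Face transformers package and uses GPT-2 as a stand-in for the paper's custom-trained RNN language models; the Winograd-style sentence is illustrative.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def total_logprob(text):
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = lm(ids, labels=ids)   # out.loss is the mean token NLL
        return -out.loss.item() * (ids.size(1) - 1)

    # Substitute each candidate for the pronoun; keep the likelier sentence.
    context = "The trophy doesn't fit in the suitcase because {} is too big."
    candidates = ["the trophy", "the suitcase"]
    print(max(candidates, key=lambda c: total_logprob(context.format(c))))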
-
-We train an array of large RNN language models that operate at word or
-character level on LM-1-Billion, CommonCrawl, SQuAD, Gutenberg Books, and a
-customized corpus for this task and show that diversity of training data plays
-an important role in test performance. Further analysis also shows that our
-system successfully discovers important features of the context that decide the
-correct answer, indicating a good grasp of commonsense knowledge.
-"
-7880,1806.02863,"Rahul Gupta, Saurabh Sahu, Carol Espy-Wilson, Shrikanth Narayanan","Semi-supervised and Transfer learning approaches for low resource
- sentiment classification",cs.IR cs.CL cs.LG stat.ML," Sentiment classification involves quantifying the affective reaction of a
-human to a document, media item or an event. Although researchers have
-investigated several methods to reliably infer sentiment from lexical, speech
-and body language cues, training a model with a small set of labeled datasets
-is still a challenge. For instance, in expanding sentiment analysis to new
-languages and cultures, it may not always be possible to obtain comprehensive
-labeled datasets. In this paper, we investigate the application of
-semi-supervised and transfer learning methods to improve performance on
-low-resource sentiment classification tasks. We experiment with extracting dense
-feature representations, pre-training and manifold regularization in enhancing
-the performance of sentiment classification systems. Our goal is a coherent
-implementation of these methods and we evaluate the gains achieved by these
-methods in a matched setting involving training and testing on a single corpus,
-as well as in two cross-corpora settings. In both cases, our
-experiments demonstrate that the proposed methods can significantly enhance the
-model performance against a purely supervised approach, particularly in cases
-involving only a handful of training examples.
-"
-7881,1806.02873,"Xiangrui Cai, Jinyang Gao, Kee Yuan Ngiam, Beng Chin Ooi, Ying Zhang,
- Xiaojie Yuan",Medical Concept Embedding with Time-Aware Attention,cs.CL cs.AI," Embeddings of medical concepts such as medication, procedure and diagnosis
-codes in Electronic Medical Records (EMRs) are central to healthcare analytics.
-Previous work on medical concept embedding takes medical concepts and EMRs as
-words and documents respectively. Nevertheless, such models overlook the
-temporal nature of EMR data. On the one hand, the fact that two medical concepts
-are consecutive does not indicate that they are temporally close, but the
-correlations between them can be revealed by the time gap. On the other hand, the
-temporal scopes of medical concepts often vary greatly (e.g., \textit{common cold} and
-\textit{diabetes}). In this paper, we propose to incorporate temporal
-information when embedding medical codes. Based on the Continuous Bag-of-Words model,
-we employ the attention mechanism to learn a ""soft"" time-aware context window
-for each medical concept. Experiments on public and proprietary datasets
-through clustering and nearest neighbour search tasks demonstrate the
-effectiveness of our model, showing that it outperforms five state-of-the-art
-baselines.
-"
-7882,1806.02875,"Mauricio Gruppi, Benjamin D. Horne, Sibel Adali",An Exploration of Unreliable News Classification in Brazil and The U.S,cs.CL," The propagation of unreliable information is on the rise in many places
-around the world. This expansion is facilitated by the rapid spread of
-information and anonymity granted by the Internet. 
The spread of unreliable
-information is a well-studied issue and it is associated with negative social
-impacts. In previous work, we identified significant differences in the
-structure of news articles from reliable and unreliable sources in the US
-media. Our goal in this work was to explore such differences in the Brazilian
-media. We found significant features in two data sets: one with Brazilian news
-in Portuguese and another one with US news in English. Our results show that
-features related to the writing style were prominent in both data sets and,
-despite the language difference, some features have a universal behavior, being
-significant to both US and Brazilian news articles. Finally, we combined both
-data sets and used the universal features to build a machine learning
-classifier to predict the source type of a news article as reliable or
-unreliable.
-"
-7883,1806.02901,"Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar",Probabilistic FastText for Multi-Sense Word Embeddings,cs.CL cs.AI cs.LG stat.ML," We introduce Probabilistic FastText, a new model for word embeddings that can
-capture multiple word senses, sub-word structure, and uncertainty information.
-In particular, we represent each word with a Gaussian mixture density, where
-the mean of a mixture component is given by the sum of n-grams. This
-representation allows the model to share statistical strength across sub-word
-structures (e.g. Latin roots), producing accurate representations of rare,
-misspelt, or even unseen words. Moreover, each component of the mixture can
-capture a different word sense. Probabilistic FastText outperforms both
-FastText, which has no probabilistic model, and dictionary-level probabilistic
-embeddings, which do not incorporate subword structures, on several
-word-similarity benchmarks, including English RareWord and foreign language
-datasets. We also achieve state-of-the-art performance on benchmarks that measure
-the ability to discern different meanings. Thus, the proposed model is the first to
-achieve multi-sense representations while having enriched semantics on rare
-words.
-"
-7884,1806.02908,Fahim Mohammad,"Is preprocessing of text really worth your time for online comment
- classification?",cs.CL cs.AI," A large proportion of online comments present on public domains are
-constructive; however, a significant proportion are toxic in nature. The
-comments contain a lot of typos, which increases the number of features manifold,
-making the ML model difficult to train. Considering that data
-scientists spend approximately 80% of their time collecting, cleaning and
-organizing their data [1], we explored how much effort we should invest in the
-preprocessing (transformation) of raw comments before feeding them to
-state-of-the-art classification models. With the help of four models on Jigsaw
-toxic comment classification data, we demonstrated that training a model
-without any transformation produces a relatively decent model. In some cases,
-even basic transformations lead to worse performance and should be applied
-with caution.
-"
-7885,1806.02923,"Saurav Sahay, Shachi H Kumar, Rui Xia, Jonathan Huang, Lama Nachman","Multimodal Relational Tensor Network for Sentiment and Emotion
- Classification",cs.CL," Understanding Affect from video segments has brought researchers from the
-language, audio and video domains together. 
Most of the current multimodal
-research in this area deals with various techniques to fuse the modalities, and
-mostly treats the segments of a video independently. Motivated by the work of
-(Zadeh et al., 2017) and (Poria et al., 2017), we present our architecture,
-Relational Tensor Network, where we use the inter-modal interactions within a
-segment (intra-segment) and also consider the sequence of segments in a video
-to model the inter-segment inter-modal interactions. We also generate rich
-representations of text and audio modalities by leveraging richer audio and
-linguistic context along with fusing fine-grained knowledge-based polarity
-scores from text. We present the results of our model on the CMU-MOSEI dataset and
-show that our model outperforms many baselines and state-of-the-art methods for
-sentiment classification and emotion recognition.
-"
-7886,1806.02934,"Ashwin Kalyan, Stefan Lee, Anitha Kannan, Dhruv Batra","Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse
- Annotations",stat.ML cs.CL cs.CV cs.LG," Many structured prediction problems (particularly in vision and language
-domains) are ambiguous, with multiple outputs being correct for an input - e.g.
-there are many ways of describing an image, multiple ways of translating a
-sentence; however, exhaustively annotating the applicability of all possible
-outputs is intractable due to exponentially large output spaces (e.g. all
-English sentences). In practice, these problems are cast as multi-class
-prediction, with the likelihood of only a sparse set of annotations being
-maximized - unfortunately penalizing for placing beliefs on plausible but
-unannotated outputs. We make and test the following hypothesis - for a given
-input, the annotations of its neighbors may serve as an additional supervisory
-signal. Specifically, we propose an objective that transfers supervision from
-neighboring examples. We first study the properties of our developed method in
-a controlled toy setup before reporting results on multi-label classification
-and two image-grounded sequence modeling tasks - captioning and question
-generation. We evaluate using standard task-specific metrics and measures of
-output diversity, finding consistent improvements over standard maximum
-likelihood training and other baselines.
-"
-7887,1806.02940,"Alexandra Birch, Andrew Finch, Minh-Thang Luong, Graham Neubig, Yusuke
- Oda","Findings of the Second Workshop on Neural Machine Translation and
- Generation",cs.CL," This document describes the findings of the Second Workshop on Neural Machine
-Translation and Generation, held in concert with the annual conference of the
-Association for Computational Linguistics (ACL 2018). First, we summarize the
-research trends of papers presented in the proceedings, and note that there is
-particular interest in linguistic structure, domain adaptation, data
-augmentation, handling inadequate resources, and analysis of models. Second, we
-describe the results of the workshop's shared task on efficient neural machine
-translation, where participants were tasked with creating MT systems that are
-both accurate and efficient.
-"
-7888,1806.02960,"Ikuya Yamada, Hiroyuki Shindo, Yoshiyasu Takefuji","Representation Learning of Entities and Documents from Knowledge Base
- Descriptions",cs.CL cs.NE," In this paper, we describe TextEnt, a neural network model that learns
-distributed representations of entities and documents directly from a knowledge
-base (KB). 
Given a document in a KB consisting of words and entity annotations,
-we train our model to predict the entity that the document describes and map
-the document and its target entity close to each other in a continuous vector
-space. Our model is trained using a large number of documents extracted from
-Wikipedia. The performance of the proposed model is evaluated using two tasks,
-namely fine-grained entity typing and multiclass text classification. The
-results demonstrate that our model achieves state-of-the-art performance on
-both tasks. The code and the trained representations are made available online
-for further academic research.
-"
-7889,1806.02988,"Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Liwei Wang, Tie-Yan
- Liu",Towards Binary-Valued Gates for Robust LSTM Training,cs.LG cs.CL stat.ML," Long Short-Term Memory (LSTM) is one of the most widely used recurrent
-structures in sequence modeling. It aims to use gates to control information
-flow (e.g., whether to skip some information or not) in the recurrent
-computations, although its practical implementation based on soft gates only
-partially achieves this goal. In this paper, we propose a new way for LSTM
-training, which pushes the output values of the gates towards 0 or 1. By doing
-so, we can better control the information flow: the gates are mostly open or
-closed, instead of in a middle state, which makes the results more
-interpretable. Empirical studies show that (1) although it seems that we
-restrict the model capacity, there is no performance drop: we achieve better or
-comparable performances due to its better generalization ability; (2) the
-outputs of gates are not sensitive to their inputs: we can easily compress the
-LSTM unit in multiple ways, e.g., low-rank approximation and low-precision
-approximation. The compressed models are even better than the baseline models
-without compression.
-"
-7890,1806.03125,"Erica K. Shimomoto, Lincon S. Souza, Bernardo B. Gatto, Kazuhiro Fukui",Text Classification based on Word Subspace with Term-Frequency,stat.ML cs.CL cs.LG," Text classification has become indispensable due to the rapid increase of
-text in digital form. Over the past three decades, efforts have been made to
-approach this task using various learning algorithms and statistical models
-based on bag-of-words (BOW) features. Despite their simple implementation, BOW
-features lack a representation of semantic meaning. To solve this problem, neural
-networks started to be employed to learn word vectors, such as word2vec.
-Word2vec embeds word semantic structure into vectors, where the angle between
-vectors indicates the meaningful similarity between words. To measure the
-similarity between texts, we propose the novel concept of word subspace, which
-can represent the intrinsic variability of features in a set of word vectors.
-Through this concept, it is possible to model text from word vectors while
-retaining semantic information. To incorporate the word frequency directly in the
-subspace model, we further extend the word subspace to the term-frequency (TF)
-weighted word subspace. Based on these new concepts, text classification can be
-performed under the mutual subspace method (MSM) framework. The validity of our
-modeling is shown through experiments on the Reuters text database, comparing
-the results to various state-of-the-art algorithms. 
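The word-subspace idea above lends itself to a compact sketch. Assuming NumPy, with random matrices standing in for real word2vec vectors, a text is represented by an orthonormal basis of the span of its word vectors, and two texts are compared through the canonical angles between their subspaces, as in the mutual subspace method.

    import numpy as np

    def word_subspace(word_vectors, dim=3):
        # Columns of u: orthonormal basis spanning the word vectors.
        u, _, _ = np.linalg.svd(np.asarray(word_vectors).T,
                                full_matrices=False)
        return u[:, :dim]

    def subspace_similarity(u1, u2):
        # Singular values of u1^T u2 are cosines of the canonical angles.
        s = np.linalg.svd(u1.T @ u2, compute_uv=False)
        return float(np.mean(s ** 2))

    rng = np.random.default_rng(0)
    text_a = rng.normal(size=(20, 50))   # 20 "word vectors", dimension 50
    text_b = rng.normal(size=(15, 50))
    print(subspace_similarity(word_subspace(text_a), word_subspace(text_b)))

The subspace dimension (here 3) and the unweighted SVD are illustrative choices; the TF-weighted variant in the abstract would presumably weight the word vectors by term frequency before the decomposition.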
-" -7891,1806.03191,"Stephen Roller, Douwe Kiela, and Maximilian Nickel","Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text - Corpora",cs.CL," Methods for unsupervised hypernym detection may broadly be categorized -according to two paradigms: pattern-based and distributional methods. In this -paper, we study the performance of both approaches on several hypernymy tasks -and find that simple pattern-based methods consistently outperform -distributional methods on common benchmark datasets. Our results show that -pattern-based models provide important contextual constraints which are not yet -captured in distributional methods. -" -7892,1806.03223,"Elena Musi, Debanjan Ghosh, Smaranda Muresan",ChangeMyView Through Concessions: Do Concessions Increase Persuasion?,cs.CL," In discourse studies concessions are considered among those argumentative -strategies that increase persuasion. We aim to empirically test this hypothesis -by calculating the distribution of argumentative concessions in persuasive vs. -non-persuasive comments from the ChangeMyView subreddit. This constitutes a -challenging task since concessions are not always part of an argument. Drawing -from a theoretically-informed typology of concessions, we conduct an annotation -task to label a set of polysemous lexical markers as introducing an -argumentative concession or not and we observe their distribution in threads -that achieved and did not achieve persuasion. For the annotation, we used both -expert and novice annotators. With the ultimate goal of conducting the study on -large datasets, we present a self-training method to automatically identify -argumentative concessions using linguistically motivated features. We achieve a -moderate F1 of 57.4% on the development set and 46.0% on the test set via the -self-training method. These results are comparable to state of the art results -on similar tasks of identifying explicit discourse connective types from the -Penn Discourse Treebank. Our findings from the manual labeling and the -classification experiments indicate that the type of argumentative concessions -we investigated is almost equally likely to be used in winning and losing -arguments from the ChangeMyView dataset. While this result seems to contradict -theoretical assumptions, we provide some reasons for this discrepancy related -to the ChangeMyView subreddit. -" -7893,1806.03280,Graeme Blackwood and Miguel Ballesteros and Todd Ward,Multilingual Neural Machine Translation with Task-Specific Attention,cs.CL," Multilingual machine translation addresses the task of translating between -multiple source and target languages. We propose task-specific attention -models, a simple but effective technique for improving the quality of -sequence-to-sequence neural multilingual translation. Our approach seeks to -retain as much of the parameter sharing generalization of NMT models as -possible, while still allowing for language-specific specialization of the -attention model to a particular language-pair or task. Our experiments on four -languages of the Europarl corpus show that using a target-specific model of -attention provides consistent gains in translation quality for all possible -translation directions, compared to a model in which all parameters are shared. -We observe improved translation quality even in the (extreme) low-resource -zero-shot translation directions for which the model never saw explicitly -paired parallel data. 
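The parameter layout described in the preceding abstract is easy to sketch. A minimal PyTorch version follows, with one attention projection per target language and everything else (the encoder and decoder, not shown) shared; the language codes, sizes, and bilinear form are illustrative, not the paper's exact parameterization.

    import torch
    import torch.nn as nn

    class TaskSpecificAttention(nn.Module):
        def __init__(self, hidden, languages):
            super().__init__()
            # One attention projection per target language.
            self.proj = nn.ModuleDict(
                {l: nn.Linear(hidden, hidden, bias=False) for l in languages})

        def forward(self, dec_state, enc_states, lang):
            # dec_state: (batch, hidden); enc_states: (batch, src_len, hidden)
            query = self.proj[lang](dec_state).unsqueeze(2)   # (b, h, 1)
            scores = torch.bmm(enc_states, query).squeeze(2)  # (b, src_len)
            weights = torch.softmax(scores, dim=1)
            return torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)

    attn = TaskSpecificAttention(hidden=8, languages=["de", "fr"])
    context = attn(torch.randn(2, 8), torch.randn(2, 5, 8), lang="de")
    print(context.shape)  # torch.Size([2, 8])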
-" -7894,1806.03290,Daniel Fried and Dan Klein,Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing,cs.CL," Dynamic oracles provide strong supervision for training constituency parsers -with exploration, but must be custom defined for a given parser's transition -system. We explore using a policy gradient method as a parser-agnostic -alternative. In addition to directly optimizing for a tree-level metric such as -F1, policy gradient has the potential to reduce exposure bias by allowing -exploration during training; moreover, it does not require a dynamic oracle for -supervision. On four constituency parsers in three languages, the method -substantially outperforms static oracle likelihood training in almost all -settings. For parsers where a dynamic oracle is available (including a novel -oracle which we define for the transition system of Dyer et al. 2016), policy -gradient typically recaptures a substantial fraction of the performance gain -afforded by the dynamic oracle. -" -7895,1806.03357,"Victor Ardulov, Manoj Kumar, Shanna Williams, Thomas Lyon, Shrikanth - Narayanan",Measuring Conversational Productivity in Child Forensic Interviews,cs.CL cs.CY," Child Forensic Interviewing (FI) presents a challenge for effective -information retrieval and decision making. The high stakes associated with the -process demand that expert legal interviewers are able to effectively establish -a channel of communication and elicit substantive knowledge from the -child-client while minimizing potential for experiencing trauma. As a first -step toward computationally modeling and producing quality spoken interviewing -strategies and a generalized understanding of interview dynamics, we propose a -novel methodology to computationally model effectiveness criteria, by applying -summarization and topic modeling techniques to objectively measure and rank the -responsiveness and conversational productivity of a child during FI. We score -information retrieval by constructing an agenda to represent general topics of -interest and measuring alignment with a given response and leveraging lexical -entrainment for responsiveness. For comparison, we present our methods along -with traditional metrics of evaluation and discuss the use of prior information -for generating situational awareness. -" -7896,1806.03369,Natalie Parde and Rodney D. Nielsen,"#SarcasmDetection is soooo general! Towards a Domain-Independent - Approach for Detecting Sarcasm",cs.CL," Automatic sarcasm detection methods have traditionally been designed for -maximum performance on a specific domain. This poses challenges for those -wishing to transfer those approaches to other existing or novel domains, which -may be typified by very different language characteristics. We develop a -general set of features and evaluate it under different training scenarios -utilizing in-domain and/or out-of-domain training data. The best-performing -scenario, training on both while employing a domain adaptation step, achieves -an F1 of 0.780, which is well above baseline F1-measures of 0.515 and 0.345. We -also show that the approach outperforms the best results from prior work on the -same target domain. -" -7897,1806.03431,Kumiko Tanaka-Ishii and Hiroshi Terada,Word Familiarity and Frequency,cs.CL," Word frequency is assumed to correlate with word familiarity, but the -strength of this correlation has not been thoroughly investigated. 
In this -paper, we report on our analysis of the correlation between a word familiarity -rating list obtained through a psycholinguistic experiment and the -log-frequency obtained from various corpora of different kinds and sizes (up to -the terabyte scale) for English and Japanese. Major findings are threefold: -First, for a given corpus, familiarity is necessary for a word to achieve high -frequency, but familiar words are not necessarily frequent. Second, correlation -increases with the corpus data size. Third, a corpus of spoken language -correlates better than one of written language. These findings suggest that -cognitive familiarity ratings are correlated to frequency, but more highly to -that of spoken rather than written language. -" -7898,1806.03489,Abbas Ghaddar and Philippe Langlais,"Robust Lexical Features for Improved Neural Network Named-Entity - Recognition",cs.CL," Neural network approaches to Named-Entity Recognition reduce the need for -carefully hand-crafted features. While some features do remain in -state-of-the-art systems, lexical features have been mostly discarded, with the -exception of gazetteers. In this work, we show that this is unfair: lexical -features are actually quite useful. We propose to embed words and entity types -into a low-dimensional vector space we train from annotated data produced by -distant supervision thanks to Wikipedia. From this, we compute - offline - a -feature vector representing each word. When used with a vanilla recurrent -neural network model, this representation yields substantial improvements. We -establish a new state-of-the-art F1 score of 87.95 on ONTONOTES 5.0, while -matching state-of-the-art performance with a F1 score of 91.73 on the -over-studied CONLL-2003 dataset. -" -7899,1806.03497,"Siyuan Qi, Baoxiong Jia, Song-Chun Zhu","Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data - for Future Prediction",stat.ML cs.AI cs.CL cs.CV cs.LG," Future predictions on sequence data (e.g., videos or audios) require the -algorithms to capture non-Markovian and compositional properties of high-level -semantics. Context-free grammars are natural choices to capture such -properties, but traditional grammar parsers (e.g., Earley parser) only take -symbolic sentences as inputs. In this paper, we generalize the Earley parser to -parse sequence data which is neither segmented nor labeled. This generalized -Earley parser integrates a grammar parser with a classifier to find the optimal -segmentation and labels, and makes top-down future predictions. Experiments -show that our method significantly outperforms other approaches for future -human activity prediction. -" -7900,1806.03529,Mor Geva and Jonathan Berant,Learning to Search in Long Documents Using Document Structure,cs.CL cs.IR," Reading comprehension models are based on recurrent neural networks that -sequentially process the document tokens. As interest turns to answering more -complex questions over longer documents, sequential reading of large portions -of text becomes a substantial bottleneck. Inspired by how humans use document -structure, we propose a novel framework for reading comprehension. We represent -documents as trees, and model an agent that learns to interleave quick -navigation through the document tree with more expensive answer extraction. To -encourage exploration of the document tree, we propose a new algorithm, based -on Deep Q-Network (DQN), which strategically samples tree nodes at training -time. 
Empirically, we find that our algorithm improves question answering performance
-compared to DQN and a strong information-retrieval (IR) baseline, and that
-ensembling our model with the IR baseline results in further gains in
-performance.
-"
-7901,1806.03537,"Andrey Kutuzov, Lilja {\O}vrelid, Terrence Szymanski, Erik Velldal",Diachronic word embeddings and semantic shifts: a survey,cs.CL," Recent years have witnessed a surge of publications aimed at tracing temporal
-changes in lexical semantics using distributional methods, particularly
-prediction-based word embedding models. However, this vein of research lacks
-the cohesion, common terminology and shared practices of more established areas
-of natural language processing. In this paper, we survey the current state of
-academic research related to diachronic word embeddings and semantic shifts
-detection. We start by discussing the notion of semantic shifts, and then
-continue with an overview of the existing methods for tracing such time-related
-shifts with word embedding models. We propose several axes along which these
-methods can be compared, and outline the main challenges facing this emerging
-subfield of NLP, as well as prospects and possible applications.
-"
-7902,1806.03561,Peter Clark,What Knowledge is Needed to Solve the RTE5 Textual Entailment Challenge?,cs.CL cs.AI," This document gives a knowledge-oriented analysis of about 20 interesting
-Recognizing Textual Entailment (RTE) examples, drawn from the 2005 RTE5
-competition test set. The analysis ignores shallow statistical matching
-techniques between T and H, and rather asks: What would it take to reasonably
-infer that T implies H? What world knowledge would be needed for this task?
-Although such knowledge-intensive techniques have not had much success in RTE
-evaluations, ultimately an intelligent system should be expected to know and
-deploy this kind of world knowledge required to perform this kind of reasoning.
- The selected examples are typically ones which our RTE system (called BLUE)
-got wrong and ones which require world knowledge to answer. In particular, the
-analysis covers cases where there was near-perfect lexical overlap between T
-and H, yet the entailment was NO, i.e., examples that most likely all current
-RTE systems will have got wrong. A nice example is #341 (page 26), which
-requires inferring from ""a river floods"" that ""a river overflows its banks"".
-Seems it should be easy, right? Enjoy!
-"
-7903,1806.03578,"An Yang, Kai Liu, Jing Liu, Yajuan Lyu, Sujian Li","Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading
- Comprehension Task",cs.CL," Current evaluation metrics for question answering-based machine reading
-comprehension (MRC) systems generally focus on the lexical overlap between the
-candidate and reference answers, such as ROUGE and BLEU. However, bias may
-appear when these metrics are used for specific question types, especially
-questions inquiring about yes-no opinions and entity lists. In this paper, we make
-adaptations to the metrics to better correlate n-gram overlap with the human
-judgment for answers to these two question types. Statistical analysis proves
-the effectiveness of our approach. Our adaptations may provide positive
-guidance for the development of real-scene MRC systems. 
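Much of the work surveyed in the diachronic-embeddings entry above rests on one building block worth spelling out: aligning the embedding spaces of two time periods with an orthogonal (Procrustes) rotation, after which a word's cosine distance to itself across periods serves as a shift score. A minimal NumPy sketch, with random matrices standing in for embeddings trained on two periods over a shared vocabulary:

    import numpy as np

    def procrustes_align(x, y):
        # Orthogonal w minimizing ||x @ w - y||_F; rows are the
        # embeddings of the shared vocabulary in each period.
        u, _, vt = np.linalg.svd(x.T @ y)
        return u @ vt

    rng = np.random.default_rng(0)
    later = rng.normal(size=(1000, 100))          # "later" embedding space
    rot, _ = np.linalg.qr(rng.normal(size=(100, 100)))
    earlier = later @ rot.T                       # "earlier" space, rotated
    w = procrustes_align(earlier, later)
    print(np.allclose(earlier @ w, later))        # True: rotation recovered

Restricting the map to a rotation matters: an unconstrained linear map could hide genuine semantic shifts by stretching the space to force agreement.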
-" -7904,1806.03590,"Nurendra Choudhary, Rajat Singh, Manish Shrivastava","Cross-Lingual Task-Specific Representation Learning for Text - Classification in Resource Poor Languages",cs.CL," Neural network models have shown promising results for text classification. -However, these solutions are limited by their dependence on the availability of -annotated data. - The prospect of leveraging resource-rich languages to enhance the text -classification of resource-poor languages is fascinating. The performance on -resource-poor languages can significantly improve if the resource availability -constraints can be offset. To this end, we present a twin Bidirectional Long -Short Term Memory (Bi-LSTM) network with shared parameters consolidated by a -contrastive loss function (based on a similarity metric). The model learns the -representation of resource-poor and resource-rich sentences in a common space -by using the similarity between their assigned annotation tags. Hence, the -model projects sentences with similar tags closer and those with different tags -farther from each other. We evaluated our model on the classification tasks of -sentiment analysis and emoji prediction for resource-poor languages - Hindi and -Telugu and resource-rich languages - English and Spanish. Our model -significantly outperforms the state-of-the-art approaches in both the tasks -across all metrics. -" -7905,1806.03621,"Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou - Li","Learning Acoustic Word Embeddings with Temporal Context for - Query-by-Example Speech Search",cs.CL," We propose to learn acoustic word embeddings with temporal context for -query-by-example (QbE) speech search. The temporal context includes the leading -and trailing word sequences of a word. We assume that there exist spoken word -pairs in the training database. We pad the word pairs with their original -temporal context to form fixed-length speech segment pairs. We obtain the -acoustic word embeddings through a deep convolutional neural network (CNN) -which is trained on the speech segment pairs with a triplet loss. Shifting a -fixed-length analysis window through the search content, we obtain a running -sequence of embeddings. In this way, searching for the spoken query is -equivalent to the matching of acoustic word embeddings. The experiments show -that our proposed acoustic word embeddings learned with temporal context are -effective in QbE speech search. They outperform the state-of-the-art -frame-level feature representations and reduce run-time computation since no -dynamic time warping is required in QbE speech search. We also find that it is -important to have sufficient speech segment pairs to train the deep CNN for -effective acoustic word embeddings. -" -7906,1806.03648,Ken Yano,"Neural Disease Named Entity Extraction with Character-based BiLSTM+CRF - in Japanese Medical Text",cs.CL cs.IR," We propose an 'end-to-end' character-based recurrent neural network that -extracts disease named entities from a Japanese medical text and simultaneously -judges its modality as either positive or negative; i.e., the mentioned disease -or symptom is affirmed or negated. The motivation to adopt neural networks is -to learn effective lexical and structural representation features for Entity -Recognition and also for Positive/Negative classification from an annotated -corpora without explicitly providing any rule-based or manual feature sets. 
The
-results confirm the superiority of our method over previous character-based CRF
-and SVM methods.
-"
-7907,1806.03653,An Yang and Sujian Li,SciDTB: Discourse Dependency TreeBank for Scientific Abstracts,cs.CL," Annotated corpora for discourse relations benefit NLP tasks such as machine
-translation and question answering. In this paper, we present SciDTB, a
-domain-specific discourse treebank annotated on scientific articles. Different
-from widely-used RST-DT and PDTB, SciDTB uses dependency trees to represent
-discourse structure, which is flexible and somewhat simplified but does not
-sacrifice structural integrity. We discuss the labeling framework, annotation
-workflow and some statistics about SciDTB. Furthermore, our treebank serves as
-a benchmark for evaluating discourse dependency parsers, for which we provide
-several baselines as foundational work.
-"
-7908,1806.03661,Fahim Dalvi and Nadir Durrani and Hassan Sajjad and Stephan Vogel,"Incremental Decoding and Training Methods for Simultaneous Translation
- in Neural Machine Translation",cs.CL," We address the problem of simultaneous translation by modifying the Neural MT
-decoder to operate with a dynamically built encoder and attention. We propose a
-tunable agent which decides the best segmentation strategy for a user-defined
-BLEU loss and Average Proportion (AP) constraint. Our agent outperforms
-previously proposed Wait-if-diff and Wait-if-worse agents (Cho and Esipova,
-2016) on BLEU with a lower latency. Secondly, we propose data-driven changes to
-Neural MT training to better match the incremental decoding framework.
-"
-7909,1806.03688,Michael J Bommarito II and Daniel Martin Katz and Eric M Detterman,"LexNLP: Natural language processing and information extraction for legal
- and regulatory texts",cs.CL cs.IR stat.ML," LexNLP is an open source Python package focused on natural language
-processing and machine learning for legal and regulatory text. The package
-includes functionality to (i) segment documents, (ii) identify key text such as
-titles and section headings, (iii) extract over eighteen types of structured
-information like distances and dates, (iv) extract named entities such as
-companies and geopolitical entities, (v) transform text into features for model
-training, and (vi) build unsupervised and supervised models such as word
-embedding or tagging models. LexNLP includes pre-trained models based on
-thousands of unit tests drawn from real documents available from the SEC EDGAR
-database as well as various judicial and regulatory proceedings. LexNLP is
-designed for use in both academic research and industrial applications, and is
-distributed at https://github.com/LexPredict/lexpredict-lexnlp.
-"
-7910,1806.03692,"Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su, Qi Su",Deconvolution-Based Global Decoding for Neural Machine Translation,cs.CL cs.AI cs.LG," A great proportion of sequence-to-sequence (Seq2Seq) models for Neural
-Machine Translation (NMT) adopt Recurrent Neural Networks (RNNs) to generate
-translations word by word in a sequential order. As linguistic studies have
-shown that language is not a linear word sequence but a sequence with complex
-structure, translation at each step should be conditioned on the
-whole target-side context. To tackle the problem, we propose a new NMT model
-that decodes the sequence with the guidance of its structural prediction of the
-context of the target sequence. 
Our model generates translations based on the
-structural prediction of the target-side context so that the translation can be
-freed from the constraint of sequential order. Experimental results demonstrate that
-our model is competitive with state-of-the-art methods, and
-the analysis shows that our model is robust when translating sentences of
-different lengths and that it reduces repetition thanks to the guidance of the
-target-side context during decoding.
-"
-7911,1806.03711,"Qingyu Yin, Yu Zhang, Weinan Zhang, Ting Liu, William Yang Wang",Deep Reinforcement Learning for Chinese Zero pronoun Resolution,cs.CL," Deep neural network models for Chinese zero pronoun resolution learn semantic
-information for zero pronouns and candidate antecedents, but tend to be
-short-sighted---they often make local decisions. They typically predict
-coreference chains between the zero pronoun and one single candidate antecedent
-one link at a time, while overlooking their long-term influence on future
-decisions. Modeling useful information about preceding potential
-antecedents is critical when later predicting zero pronoun-candidate antecedent
-pairs. In this study, we show how to integrate local and global decision-making
-by exploiting deep reinforcement learning models. With the help of the
-reinforcement learning agent, our model learns the policy of selecting
-antecedents in a sequential manner, where useful information provided by
-earlier predicted antecedents can be utilized for making later coreference
-decisions. Experimental results on the OntoNotes 5.0 dataset show that our
-technique surpasses the state-of-the-art models.
-"
-7912,1806.03713,"Elena Kochkina, Maria Liakata, Arkaitz Zubiaga",All-in-one: Multi-task Learning for Rumour Verification,cs.CL," Automatic resolution of rumours is a challenging task that can be broken down
-into smaller components that make up a pipeline, including rumour detection,
-rumour tracking and stance classification, leading to the final outcome of
-determining the veracity of a rumour. In previous work, these steps in the
-process of rumour verification have been developed as separate components where
-the output of one feeds into the next. We propose a multi-task learning
-approach that allows joint training of the main and auxiliary tasks, improving
-the performance of rumour verification. We examine the connection between the
-dataset properties and the outcomes of the multi-task learning models used.
-"
-7913,1806.03740,"Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner",Unsupervised Disambiguation of Syncretism in Inflected Lexicons,cs.CL," Lexical ambiguity makes it difficult to compute various useful statistics of
-a corpus. A given word form might represent any of several morphological
-feature bundles. One can, however, use unsupervised learning (as in EM) to fit
-a model that probabilistically disambiguates word forms. We present such an
-approach, which employs a neural network to smoothly model a prior distribution
-over feature bundles (even rare ones). Although this basic model does not
-consider a token's context, that very property allows it to operate on a simple
-list of unigram type counts, partitioning each count among different analyses
-of that unigram. We discuss evaluation metrics for this novel task and report
-results on 5 languages.
-"
-7914,1806.03743,"Ryan Cotterell, Sabrina J. 
Mielke, Jason Eisner, Brian Roark",Are All Languages Equally Hard to Language-Model?,cs.CL," For general modeling methods applied to diverse languages, a natural question
-is: how well should we expect our models to work on languages with differing
-typological profiles? In this work, we develop an evaluation framework for fair
-cross-linguistic comparison of language models, using translated text so that
-all models are asked to predict approximately the same information. We then
-conduct a study on 21 languages, demonstrating that in some languages, the
-textual expression of the information is harder to predict with both $n$-gram
-and LSTM language models. We show complex inflectional morphology to be a cause
-of performance differences among languages.
-"
-7915,1806.03746,"Lawrence Wolf-Sonkin, Jason Naradowsky, Sabrina J. Mielke, Ryan
- Cotterell","A Structured Variational Autoencoder for Contextual Morphological
- Inflection",cs.CL," Statistical morphological inflectors are typically trained on fully
-supervised, type-level data. One remaining open research question is the
-following: How can we effectively exploit raw, token-level data to improve
-their performance? To this end, we introduce a novel generative latent-variable
-model for the semi-supervised learning of inflection generation. To enable
-posterior inference over the latent variables, we derive an efficient
-variational inference procedure based on the wake-sleep algorithm. We
-experiment on 23 languages, using the Universal Dependencies corpora in a
-simulated low-resource setting, and find improvements of over 10% absolute
-accuracy in some cases.
-"
-7916,1806.03757,"Antonis Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti,
- Justin DeBenedetto, and David Chiang","Part-of-Speech Tagging on an Endangered Language: a Parallel
- Griko-Italian Resource",cs.CL," Most work on part-of-speech (POS) tagging is focused on high resource
-languages, or examines low-resource and active learning settings through
-simulated studies. We evaluate POS tagging techniques on an actual endangered
-language, Griko. We present a resource that contains 114 narratives in Griko,
-along with sentence-level translations in Italian, and provides gold
-annotations for the test set. Based on a previously collected small corpus, we
-investigate several traditional methods, as well as methods that take advantage
-of monolingual data or project cross-lingual POS tags. We show that the
-combination of a semi-supervised method with cross-lingual transfer is more
-appropriate for this extremely challenging setting, with the best tagger
-achieving an accuracy of 72.9%. With an applied active learning scheme, which
-we use to collect sentence-level annotations over the test set, we achieve
-improvements of more than 21 percentage points.
-"
-7917,1806.03821,"Gangula Rama Rohit Reddy, Radhika Mamidi","Addition of Code Mixed Features to Enhance the Sentiment Prediction of
- Song Lyrics",cs.CL," Sentiment analysis, also called opinion mining, is the field of study that
-analyzes people's opinions, sentiments, attitudes and emotions. Songs are
-important to sentiment analysis since songs and mood are mutually dependent on
-each other. Based on the selected song, it becomes easy to infer the mood of
-the listener, which in future can be used for recommendation. Song lyrics are a
-rich source of data containing words that are helpful in the analysis and
-classification of the sentiments they convey.
Nowadays we observe a lot of
-inter-sentential and intra-sentential code-mixing in songs, which has a varying
-impact on the audience. To study this impact, we created a Telugu songs dataset
-which contains both Telugu-English code-mixed and pure Telugu songs. In this
-paper, we classify the songs based on their arousal as exciting or non-exciting.
-We develop a language identification tool and introduce code-mixing features
-obtained from it as additional features. Our system with these additional
-features attains 4-5% higher accuracy than traditional approaches on our
-dataset.
-"
-7918,1806.03822,"Pranav Rajpurkar, Robin Jia, and Percy Liang",Know What You Don't Know: Unanswerable Questions for SQuAD,cs.CL," Extractive reading comprehension systems can often locate the correct answer
-to a question in a context document, but they also tend to make unreliable
-guesses on questions for which the correct answer is not stated in the context.
-Existing datasets either focus exclusively on answerable questions, or use
-automatically generated unanswerable questions that are easy to identify. To
-address these weaknesses, we present SQuAD 2.0, the latest version of the
-Stanford Question Answering Dataset (SQuAD). SQuAD 2.0 combines existing SQuAD
-data with over 50,000 unanswerable questions written adversarially by
-crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0,
-systems must not only answer questions when possible, but also determine when
-no answer is supported by the paragraph and abstain from answering. SQuAD 2.0
-is a challenging natural language understanding task for existing models: a
-strong neural system that gets 86% F1 on SQuAD 1.1 achieves only 66% F1 on
-SQuAD 2.0.
-"
-7919,1806.03831,"Mohit Shridhar, David Hsu","Interactive Visual Grounding of Referring Expressions for Human-Robot
- Interaction",cs.RO cs.CL cs.CV," This paper presents INGRESS, a robot system that follows human natural
-language instructions to pick and place everyday objects. The core issue here
-is the grounding of referring expressions: inferring objects and their
-relationships from input images and language expressions. INGRESS allows for
-unconstrained object categories and unconstrained language expressions.
-Further, it asks questions to disambiguate referring expressions interactively.
-To achieve these, we take the approach of grounding by generation and propose a
-two-stage neural network model for grounding. The first stage uses a neural
-network to generate visual descriptions of objects, compares them with the
-input language expression, and identifies a set of candidate objects. The
-second stage uses another neural network to examine all pairwise relations
-between the candidates and infers the most likely referred object. The same
-neural networks are used for both grounding and question generation for
-disambiguation. Experiments show that INGRESS outperformed a state-of-the-art
-method on the RefCOCO dataset and in robot experiments with humans.
-"
-7920,1806.03847,"Aly Magassouba, Komei Sugiura and Hisashi Kawai","A Multimodal Classifier Generative Adversarial Network for Carry and
- Place Tasks from Ambiguous Language Instructions",cs.RO cs.CL," This paper focuses on a multimodal language understanding method for
-carry-and-place tasks with domestic service robots. We address the case of
-ambiguous instructions, that is, when the target area is not specified.
For
-instance, ""put away the milk and cereal"" is a natural instruction in which the
-target area is ambiguous, considering typical environments in daily life.
-Conventionally, such an instruction can be disambiguated through a dialogue
-system, but at the cost of time and cumbersome interaction. Instead, we propose
-a multimodal approach, in which the instructions are disambiguated using the
-robot's state and environment context. We develop the Multi-Modal Classifier
-Generative Adversarial Network (MMC-GAN) to predict the likelihood of different
-target areas considering the robot's physical limitations and the target
-clutter. Our approach, MMC-GAN, significantly improves accuracy compared with
-baseline methods that use instructions only or simple deep neural networks.
-"
-7921,1806.03869,Yuichiroh Matsubayashi and Kentaro Inui,"Distance-Free Modeling of Multi-Predicate Interactions in End-to-End
- Japanese Predicate-Argument Structure Analysis",cs.CL," Capturing interactions among multiple predicate-argument structures (PASs) is
-a crucial issue in the task of analyzing PAS in Japanese. In this paper, we
-propose new Japanese PAS analysis models that integrate the label prediction
-information of arguments in multiple PASs by extending the input and last
-layers of a standard deep bidirectional recurrent neural network (bi-RNN)
-model. In these models, using the mechanisms of pooling and attention, we aim
-to directly capture the potential interactions among multiple PASs, without
-being disturbed by the word order and distance. Our experiments show that the
-proposed models improve the prediction accuracy specifically for cases where
-the predicate and argument are in an indirect dependency relation and achieve a
-new state of the art in the overall $F_1$ on a standard benchmark corpus.
-"
-7922,1806.03957,"Aleksandr Chuklin, Aliaksei Severyn, Johanne Trippas, Enrique
- Alfonseca, Hanna Silen and Damiano Spina",Prosody Modifications for Question-Answering in Voice-Only Settings,cs.CL cs.HC," Many popular form factors of digital assistants---such as Amazon Echo, Apple
-Homepod, or Google Home---enable the user to hold a conversation with these
-systems based only on the speech modality. The lack of a screen presents unique
-challenges. To satisfy the information need of a user, the presentation of the
-answer needs to be optimized for such voice-only interactions. In this paper,
-we propose a task of evaluating the usefulness of audio transformations (i.e.,
-prosodic modifications) for voice-only question answering. We introduce a
-crowdsourcing setup where we evaluate the quality of our proposed modifications
-along multiple dimensions corresponding to the informativeness, naturalness,
-and ability of the user to identify key parts of the answer. We offer a set of
-prosodic modifications that highlight potentially important parts of the answer
-using various acoustic cues. Our experiments show that some of these prosodic
-modifications lead to better comprehension at the expense of only slightly
-degraded naturalness of the audio.
-"
-7923,1806.04068,"Shuohang Wang, Mo Yu, Shiyu Chang, Jing Jiang",A Co-Matching Model for Multi-choice Reading Comprehension,cs.CL," Multi-choice reading comprehension is a challenging task, which involves the
-matching between a passage and a question-answer pair. This paper proposes a
-new co-matching approach to this problem, which jointly models whether a
-passage can match both a question and a candidate answer.
Experimental results
-on the RACE dataset demonstrate that our approach achieves state-of-the-art
-performance.
-"
-7924,1806.04092,"Abhik Jana, Pranjal Kanojiya, Pawan Goyal and Animesh Mukherjee","WikiRef: Wikilinks as a route to recommending appropriate references for
- scientific Wikipedia pages",cs.CL," The exponential increase in the usage of Wikipedia as a key source of
-scientific knowledge among researchers is making it absolutely necessary to
-metamorphose this knowledge repository into an integral and self-contained
-source of information for direct utilization. Unfortunately, the references
-which support the content of each Wikipedia entity page are far from complete.
-Why is the reference section ill-formed for most Wikipedia pages? Is this
-section edited as frequently as the other sections of a page? Can there be
-appropriate surrogates that can automatically enhance the reference section? In
-this paper, we propose a novel two-step approach -- WikiRef -- that (i)
-leverages the wikilinks present in a scientific Wikipedia target page and,
-thereby, (ii) recommends highly relevant references to be included in that
-target page appropriately and automatically, borrowed from the reference
-section of the wikilinks. In the first step, we build a classifier to ascertain
-whether a wikilink is a potential source of reference or not. In the following
-step, we recommend references to the target page from the reference section of
-the wikilinks that are classified as potential sources of references in the
-first step. We perform an extensive evaluation of our approach on datasets from
-two different domains -- Computer Science and Physics. For Computer Science we
-achieve notably good performance with a precision@1 of 0.44 for reference
-recommendation, as opposed to 0.38 obtained from the most competitive baseline.
-For the Physics dataset, we obtain a similar performance boost of 10% with
-respect to the most competitive baseline.
-"
-7925,1806.04127,"John Hale, Chris Dyer, Adhiguna Kuncoro, Jonathan R. Brennan",Finding Syntax in Human Encephalography with Beam Search,cs.CL," Recurrent neural network grammars (RNNGs) are generative models of
-(tree,string) pairs that rely on neural networks to evaluate derivational
-choices. Parsing with them using beam search yields a variety of incremental
-complexity metrics such as word surprisal and parser action count. When used as
-regressors against human electrophysiological responses to naturalistic text,
-they derive two amplitude effects: an early peak and a P600-like later peak. By
-contrast, a non-syntactic neural language model yields no reliable effects.
-Model comparisons attribute the early peak to syntactic composition within the
-RNNG. This pattern of results recommends the RNNG+beam search combination as a
-mechanistic model of the syntactic processing that occurs during normal human
-language comprehension.
-"
-7926,1806.04168,"Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron
- Courville, Yoshua Bengio","Straight to the Tree: Constituency Parsing with Neural Syntactic
- Distance",cs.CL cs.AI cs.LG," In this work, we propose a novel constituency parsing scheme. The model
-predicts a vector of real-valued scalars, named syntactic distances, for each
-split position in the input sentence. The syntactic distances specify the order
-in which the split points will be selected, recursively partitioning the input,
-in a top-down fashion.
Compared to traditional shift-reduce parsing schemes,
-our approach is free from the potential problem of compounding errors, while
-being faster and easier to parallelize. Our model achieves competitive
-performance among single-model discriminative parsers on the PTB dataset and
-outperforms previous models on the CTB dataset.
-"
-7927,1806.04185,"Benjamin Nye, Junyi Jessy Li, Roma Patel, Yinfei Yang, Iain J.
- Marshall, Ani Nenkova, Byron C. Wallace","A Corpus with Multi-Level Annotations of Patients, Interventions and
- Outcomes to Support Language Processing for Medical Literature",cs.CL," We present a corpus of 5,000 richly annotated abstracts of medical articles
-describing clinical randomized controlled trials. Annotations include
-demarcations of text spans that describe the Patient population enrolled, the
-Interventions studied and to what they were Compared, and the Outcomes measured
-(the `PICO' elements). These spans are further annotated at a more granular
-level, e.g., individual interventions within them are marked and mapped onto a
-structured medical vocabulary. We acquired annotations from a diverse set of
-workers with varying levels of expertise and cost. We describe our data
-collection process and the corpus itself in detail. We then outline a set of
-challenging NLP tasks that would aid searching of the medical literature and
-the practice of evidence-based medicine.
-"
-7928,1806.04189,"Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He","Navigating with Graph Representations for Fast and Scalable Decoding of
- Neural Language Models",cs.CL cs.AI," Neural language models (NLMs) have recently gained renewed interest by
-achieving state-of-the-art performance across many natural language processing
-(NLP) tasks. However, NLMs are very computationally demanding, largely due to
-the computational cost of the softmax layer over a large vocabulary. We observe
-that, in decoding for many NLP tasks, only the probabilities of the top-K
-hypotheses need to be calculated precisely, and K is often much smaller than
-the vocabulary size. This paper proposes a novel softmax layer approximation
-algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a
-given context, a set of K words that are most likely to occur according to an
-NLM. We demonstrate that FGD reduces the decoding time by an order of magnitude
-while attaining close to the full softmax baseline accuracy on neural machine
-translation and language modeling tasks. We also prove the theoretical
-guarantee on the softmax approximation quality.
-"
-7929,1806.04197,"Sanjana Sharma, Saksham Agrawal, Manish Shrivastava",Degree based Classification of Harmful Speech using Twitter Data,cs.CL," Harmful speech takes various forms and has been plaguing social media in
-different ways. If we are to crack down on different degrees of hate speech and
-abusive behavior within it, the classification needs to be based on complex
-ramifications which need to be defined and held accountable for, beyond labels
-such as racist, sexist or directed against some particular group and community.
-This paper primarily describes how we created an ontological classification of
-harmful speech based on the degree of hateful intent, and used it to annotate
-Twitter data accordingly. The key contribution of this paper is the new dataset
-of tweets we created based on ontological classes and degrees of harmful speech
-found in the text.
We also propose a supervised classification system for recognizing these
-harmful speech classes in text.
-"
-7930,1806.04262,"Andre Cianflone, Yulan Feng, Jad Kabbara, Jackie Chi Kit Cheung","Let's do it ""again"": A First Computational Approach to Detecting
- Adverbial Presupposition Triggers",cs.CL," We introduce the task of predicting adverbial presupposition triggers such as
-also and again. Solving such a task requires detecting recurring or similar
-events in the discourse context, and has applications in natural language
-generation tasks such as summarization and dialogue systems. We create two new
-datasets for the task, derived from the Penn Treebank and the Annotated English
-Gigaword corpora, as well as a novel attention mechanism tailored to this task.
-Our attention mechanism augments a baseline recurrent neural network without
-the need for additional trainable parameters, minimizing the added
-computational cost of our mechanism. We demonstrate that our model
-statistically outperforms a number of baselines, including an LSTM-based
-language model.
-"
-7931,1806.04270,"Shudong Hao, Michael J. Paul",Learning Multilingual Topics from Incomparable Corpus,cs.CL," Multilingual topic models enable crosslingual tasks by extracting consistent
-topics from multilingual corpora. Most models require parallel or comparable
-training corpora, which limits their ability to generalize. In this paper, we
-first demystify the knowledge transfer mechanism behind multilingual topic
-models by defining an alternative but equivalent formulation. Based on this
-analysis, we then relax the assumption of training data required by most
-existing models, creating a model that only requires a dictionary for training.
-Experiments show that our new method effectively learns coherent multilingual
-topics from partially and fully incomparable corpora with limited amounts of
-dictionary resources.
-"
-7932,1806.04284,"Chenhui Chu, Mayu Otani and Yuta Nakashima",iParaphrasing: Extracting Visually Grounded Paraphrases via an Image,cs.CL cs.AI cs.CV cs.LG cs.MM," A paraphrase is a restatement of the meaning of a text in other words.
-Paraphrases have been studied to enhance the performance of many natural
-language processing tasks. In this paper, we propose a novel task iParaphrasing
-to extract visually grounded paraphrases (VGPs), which are different phrasal
-expressions describing the same visual concept in an image. These extracted
-VGPs have the potential to improve language and image multimodal tasks such as
-visual question answering and image captioning. How to model the similarity
-between VGPs is the key to iParaphrasing. We apply various existing methods as
-well as propose a novel neural network-based method with image attention, and
-report the results of the first attempt toward iParaphrasing.
-"
-7933,1806.04291,"Manuel Mager, Ximena Gutierrez-Vasques, Gerardo Sierra, Ivan Meza","Challenges of language technologies for the indigenous languages of the
- Americas",cs.CL," Indigenous languages of the American continent are highly diverse. However,
-they have received little attention from the technological perspective. In this
-paper, we review the research, the digital resources and the available NLP
-systems that focus on these languages. We present the main challenges and
-research questions that arise when distant languages and low-resource scenarios
-are faced. We would like to encourage NLP research in linguistically rich and
-diverse areas like the Americas.
-"
-7934,1806.04313,"Bhuwan Dhingra, Christopher J. Shallue, Mohammad Norouzi, Andrew M.
- Dai, George E. Dahl",Embedding Text in Hyperbolic Spaces,cs.CL cs.LG," Natural language text exhibits hierarchical structure in a variety of
-respects. Ideally, we could incorporate our prior knowledge of this
-hierarchical structure into unsupervised learning algorithms that work on text
-data. Recent work by Nickel & Kiela (2017) proposed using hyperbolic instead of
-Euclidean embedding spaces to represent hierarchical data and demonstrated
-encouraging results when embedding graphs. In this work, we extend their method
-with a re-parameterization technique that allows us to learn hyperbolic
-embeddings of arbitrarily parameterized objects. We apply this framework to
-learn word and sentence embeddings in hyperbolic space in an unsupervised
-manner from text corpora. The resulting embeddings seem to encode certain
-intuitive notions of hierarchy, such as word-context frequency and phrase
-constituency. However, the implicit continuous hierarchy in the learned
-hyperbolic space makes interrogating the model's learned hierarchies more
-difficult than for models that learn explicit edges between items. The learned
-hyperbolic embeddings show improvements over Euclidean embeddings in some --
-but not all -- downstream tasks, suggesting that hierarchical organization is
-more useful for some tasks than others.
-"
-7935,1806.04327,"Stefano Mezza, Alessandra Cervone, Giuliano Tortoreto, Evgeny A.
- Stepanov, Giuseppe Riccardi","ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational
- Agents",cs.CL," Dialogue Act (DA) tagging is crucial for spoken language understanding
-systems, as it provides a general representation of speakers' intents, not
-bound to a particular dialogue system. Unfortunately, publicly available data
-sets with DA annotation are all based on different annotation schemes and thus
-incompatible with each other. Moreover, their schemes often do not cover all
-aspects necessary for open-domain human-machine interaction. In this paper, we
-propose a methodology to map several publicly available corpora to a subset of
-the ISO standard, in order to create a large task-independent training corpus
-for DA classification. We show the feasibility of using this corpus to train a
-domain-independent DA tagger, testing it on out-of-domain conversational data,
-and argue for the importance of training on multiple corpora to achieve
-robustness across different DA categories.
-"
-7936,1806.04330,Wuwei Lan and Wei Xu,"Neural Network Models for Paraphrase Identification, Semantic Textual
- Similarity, Natural Language Inference, and Question Answering",cs.CL," In this paper, we analyze several neural network designs (and their
-variations) for sentence pair modeling and compare their performance
-extensively across eight datasets, including paraphrase identification,
-semantic textual similarity, natural language inference, and question answering
-tasks. Although most of these models have claimed state-of-the-art performance,
-the original papers often reported on only one or two selected datasets.
We
-provide a systematic study and show that (i) encoding contextual information by
-LSTM and inter-sentence interactions are critical, (ii) Tree-LSTM does not help
-as much as previously claimed but surprisingly improves performance on Twitter
-datasets, (iii) the Enhanced Sequential Inference Model is the best so far for
-larger datasets, while the Pairwise Word Interaction Model achieves the best
-performance when less data is available. We release our implementations as an
-open-source toolkit.
-"
-7937,1806.04346,Ruidan He and Wee Sun Lee and Hwee Tou Ng and Daniel Dahlmeier,Exploiting Document Knowledge for Aspect-level Sentiment Classification,cs.CL," Attention-based long short-term memory (LSTM) networks have proven to be
-useful in aspect-level sentiment classification. However, due to the
-difficulties in annotating aspect-level data, existing public datasets for this
-task are all relatively small, which largely limits the effectiveness of those
-neural models. In this paper, we explore two approaches that transfer knowledge
-from document-level data, which is much less expensive to obtain, to improve
-the performance of aspect-level sentiment classification. We demonstrate the
-effectiveness of our approaches on 4 public datasets from SemEval 2014, 2015,
-and 2016, and we show that attention-based LSTM benefits from document-level
-knowledge in multiple ways.
-"
-7938,1806.04357,"Xing Niu, Sudha Rao, Marine Carpuat","Multi-Task Neural Models for Translating Between Styles Within and
- Across Languages",cs.CL," Generating natural language requires conveying content in an appropriate
-style. We explore two related tasks on generating text of varying formality:
-monolingual formality transfer and formality-sensitive machine translation. We
-propose to solve these tasks jointly using multi-task learning, and show that
-our models achieve state-of-the-art performance for formality transfer and are
-able to perform formality-sensitive translation without being explicitly
-trained on style-annotated translation examples.
-"
-7939,1806.04381,"Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde","Projecting Embeddings for Domain Adaptation: Joint Modeling of Sentiment
- Analysis in Diverse Domains",cs.CL," Domain adaptation for sentiment analysis is challenging due to the fact that
-supervised classifiers are very sensitive to changes in domain. The two most
-prominent approaches to this problem are structural correspondence learning and
-autoencoders. However, they either require long training times or suffer
-greatly on highly divergent domains. Inspired by recent advances in
-cross-lingual sentiment analysis, we provide a novel perspective and cast the
-domain adaptation problem as an embedding projection task. Our model takes as
-input two mono-domain embedding spaces and learns to project them to a
-bi-domain space, which is jointly optimized to (1) project across domains and
-to (2) predict sentiment. We perform domain adaptation experiments on 20
-source-target domain pairs for sentiment classification and report novel
-state-of-the-art results on 11 domain pairs, including the Amazon domain
-adaptation datasets and SemEval 2013 and 2016 datasets. Our analysis shows that
-our model performs comparably to state-of-the-art approaches on domains that
-are similar, while performing significantly better on highly divergent domains.
-Our code is available at https://github.com/jbarnesspain/domain_blse
-"
-7940,1806.04387,Bhargav Chippada and Shubajit Saha,Knowledge Amalgam: Generating Jokes and Quotes Together,cs.CL cs.AI," Generating humor and quotes are two very challenging problems in the field of
-computational linguistics, and they are often tackled separately. In this
-paper, we present a controlled Long Short-Term Memory (LSTM) architecture which
-is trained with categorical data like jokes and quotes together by passing the
-category as an input along with the sequence of words. The idea is that a
-single neural net will learn the structure of both jokes and quotes to generate
-them on demand according to the input category. Importantly, we believe the
-neural net acquires more knowledge because it is trained on different datasets,
-enabling it to generate more creative jokes or quotes from the mixture of
-information. May the network generate a funny inspirational joke!
-"
-7941,1806.04402,"Ryan Cotterell, Julia Kreutzer",Explaining and Generalizing Back-Translation through Wake-Sleep,cs.CL," Back-translation has become a commonly employed heuristic for semi-supervised
-neural machine translation. The technique is both straightforward to apply and
-has led to state-of-the-art results. In this work, we offer a principled
-interpretation of back-translation as approximate inference in a generative
-model of bitext and show how the standard implementation of back-translation
-corresponds to a single iteration of the wake-sleep algorithm in our proposed
-model. Moreover, this interpretation suggests a natural iterative
-generalization, which we demonstrate leads to further improvement of up to 1.6
-BLEU.
-"
-7942,1806.04441,"Haoyang Wen, Yijia Liu, Wanxiang Che, Libo Qin, Ting Liu","Sequence-to-Sequence Learning for Task-oriented Dialogue with Dialogue
- State Representation",cs.CL," Classic pipeline models for task-oriented dialogue systems require explicitly
-modeling the dialogue states and hand-crafted action spaces to query a
-domain-specific knowledge base. Conversely, sequence-to-sequence models learn
-to map dialogue history to the response in the current turn without explicit
-knowledge base querying. In this work, we propose a novel framework that
-leverages the advantages of classic pipeline and sequence-to-sequence models.
-Our framework models a dialogue state as a fixed-size distributed
-representation and uses this representation to query a knowledge base via an
-attention mechanism. Experiments on the Stanford Multi-turn Multi-domain
-Task-oriented Dialogue Dataset show that our framework significantly
-outperforms other sequence-to-sequence based baseline models on both automatic
-and human evaluation.
-"
-7943,1806.04450,"Madan Gopal Jhanwar, Arpita Das","An Ensemble Model for Sentiment Analysis of Hindi-English Code-Mixed
- Data",cs.CL," In multilingual societies like India, code-mixed social media texts comprise
-the majority of the Internet. Detecting the sentiment of code-mixed user
-opinions plays a crucial role in understanding social, economic and political
-trends. In this paper, we propose an ensemble of a character-trigram based LSTM
-model and a word-ngram based Multinomial Naive Bayes (MNB) model to identify
-the sentiments of Hindi-English (Hi-En) code-mixed data. The ensemble model
-combines the strengths of rich sequential patterns from the LSTM model and the
-polarity of keywords from the probabilistic ngram model to identify sentiments
-in sparse and inconsistent code-mixed data.
Experiments on real-life user
-code-mixed data reveal that our approach yields state-of-the-art results
-compared to several baselines and other proposed deep learning based methods.
-"
-7944,1806.04456,"Rajeev Gupta, Ranganath Kondapally, Chakrapani Ravi Kiran",Impersonation: Modeling Persona in Smart Responses to Email,cs.CL," In this paper, we present the design, implementation, and effectiveness of
-generating personalized suggestions for email replies. To personalize email
-responses based on the user's style and personality, we model the user's
-persona based on her past responses to emails. This model is added to the
-language-based model created across users using past responses from all user
-emails.
- A user's model captures the typical responses of the user given a particular
-context. The context includes the email received, the recipient of the email,
-and other external signals such as calendar activities, preferences, etc. The
-context, along with the user's personality (e.g., extrovert, formal, reserved,
-etc.), is used to suggest responses. These responses can be a mixture of
-multiple modes: email replies (textual), audio clips, etc. This helps in making
-responses mimic the user as much as possible and helps the user to be more
-productive while retaining her mark in the responses.
-"
-7945,1806.04458,"Artem Sokolov, Julian Hitschler, Mayumi Ohta, Stefan Riezler","Sparse Stochastic Zeroth-Order Optimization with an Application to
- Bandit Structured Prediction",stat.ML cs.CL cs.LG," Stochastic zeroth-order (SZO), or gradient-free, optimization allows to
-optimize arbitrary functions by relying only on function evaluations under
-parameter perturbations; however, the iteration complexity of SZO methods
-suffers a factor proportional to the dimensionality of the perturbed function.
-We show that in scenarios with natural sparsity patterns as in structured
-prediction applications, this factor can be reduced to the expected number of
-active features over input-output pairs. We give a general proof that applies
-sparse SZO optimization to Lipschitz-continuous, nonconvex, stochastic
-objectives, and present an experimental evaluation on linear bandit structured
-prediction tasks with sparse word-based feature representations that confirm
-our theoretical results.
-"
-7946,1806.04466,"Shaohui Kuang, Deyi Xiong","Fusing Recency into Neural Machine Translation with an Inter-Sentence
- Gate Model",cs.CL," Neural machine translation (NMT) systems are usually trained on a large
-number of bilingual sentence pairs and translate one sentence at a time,
-ignoring inter-sentence information. This may make the translation of a
-sentence ambiguous or even inconsistent with the translations of neighboring
-sentences. In order to handle this issue, we propose an inter-sentence gate
-model that uses the same encoder to encode two adjacent sentences and controls
-the amount of information flowing from the preceding sentence to the
-translation of the current sentence with an inter-sentence gate. In this way,
-our proposed model can capture the connection between sentences and fuse
-recency from neighboring sentences into neural machine translation. On several
-NIST Chinese-English translation tasks, our experiments demonstrate that the
-proposed inter-sentence gate model achieves substantial improvements over the
-baseline.
-"
-7947,1806.04470,"Jie Yang, Shuailong Liang, Yue Zhang",Design Challenges and Misconceptions in Neural Sequence Labeling,cs.CL," We investigate the design challenges of constructing effective and efficient
-neural sequence labeling systems by reproducing twelve neural sequence
-labeling models, which include most of the state-of-the-art structures, and
-conducting a systematic model comparison on three benchmarks (i.e. NER,
-Chunking, and POS tagging). Misconceptions and inconsistent conclusions in
-existing literature are examined and clarified through statistical experiments.
-In the comparison and analysis process, we reach several practical conclusions
-which can be useful to practitioners.
-"
-7948,1806.04508,Ndapa Nakashole and Raphael Flauger,Characterizing Departures from Linearity in Word Translation,cs.CL," We investigate the behavior of maps learned by machine translation methods.
-The maps translate words by projecting between word embedding spaces of
-different languages. We locally approximate these maps using linear maps, and
-find that they vary across the word embedding space. This demonstrates that the
-underlying maps are non-linear. Importantly, we show that the locally linear
-maps vary by an amount that is tightly correlated with the distance between the
-neighborhoods on which they are trained. Our results can be used to test
-non-linear methods, and to drive the design of more accurate maps for word
-translation.
-"
-7949,1806.04510,Abel L Peirson V and E Meltem Tolunay,Dank Learning: Generating Memes Using Deep Neural Networks,cs.CL cs.LG," We introduce a novel meme generation system which, given any image, can
-produce a humorous and relevant caption. Furthermore, the system can be
-conditioned on not only an image but also a user-defined label relating to the
-meme template, giving the user a handle on meme content. The system uses a
-pretrained Inception-v3 network to return an image embedding which is passed to
-an attention-based deep-layer LSTM model producing the caption - inspired by
-the widely recognised Show and Tell Model. We implement a modified beam search
-to encourage diversity in the captions. We evaluate the quality of our model
-using perplexity and human assessment on both the quality of memes generated
-and whether they can be differentiated from real ones. Our model produces
-original memes that cannot on the whole be differentiated from real ones.
-"
-7950,1806.04511,"Ethem F. Can, Aysu Ezen-Can, Fazli Can",Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data,cs.CL cs.IR," Sentiment analysis is a widely studied NLP task where the goal is to
-determine opinions, emotions, and evaluations of users towards a product, an
-entity or a service that they are reviewing. One of the biggest challenges for
-sentiment analysis is that it is highly language dependent. Word embeddings,
-sentiment lexicons, and even annotated data are language specific. Further,
-optimizing models for each language is very time consuming and labor intensive,
-especially for recurrent neural network models. From a resource perspective, it
-is very challenging to collect data for different languages.
- In this paper, we look for an answer to the following research question: can
-a sentiment analysis model trained on one language be reused for sentiment
-analysis in other languages (Russian, Spanish, Turkish, and Dutch) where the
-data is more limited?
Our goal is to build a single model in the language with
-the largest dataset available for the task, and reuse it for languages that
-have limited resources. For this purpose, we train a sentiment analysis model
-using recurrent neural networks with reviews in English. We then translate
-reviews in other languages and reuse this model to evaluate the sentiments.
-Experimental results show that our robust approach of a single model trained on
-English reviews statistically significantly outperforms the baselines in
-several different languages.
-"
-7951,1806.04523,"Wenpeng Yin, Yadollah Yaghoobzadeh, Hinrich Sch\""utze",Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs,cs.CL," Large scale knowledge graphs (KGs) such as Freebase are generally incomplete.
-Reasoning over multi-hop (mh) KG paths is thus an important capability that is
-needed for question answering or other NLP tasks that require knowledge about
-the world. mh-KG reasoning includes diverse scenarios, e.g., given a head
-entity and a relation path, predict the tail entity; or given two entities
-connected by some relation paths, predict the unknown relation between them. We
-present ROPs, recurrent one-hop predictors, that predict entities at each step
-of mh-KG paths by using recurrent neural networks and vector representations of
-entities and relations, with two benefits: (i) modeling mh-paths of arbitrary
-lengths while updating the entity and relation representations by the training
-signal at each step; (ii) handling different types of mh-KG reasoning in a
-unified framework. Our models show state-of-the-art performance on two
-important multi-hop KG reasoning tasks: Knowledge Base Completion and Path
-Query Answering.
-"
-7952,1806.04524,"Edison Marrese-Taylor, Ai Nakajima, Yutaka Matsuo, Ono Yuichi",Learning to Automatically Generate Fill-In-The-Blank Quizzes,cs.CL," In this paper we formalize the problem of automatic fill-in-the-blank
-question generation using two standard NLP machine learning schemes, proposing
-concrete deep learning models for each. We present an empirical study based on
-data obtained from a language learning platform showing that both of our
-proposed settings offer promising results.
-"
-7953,1806.04525,"Anton Osika, Susanna Nilsson, Andrii Sydorchuk, Faruk Sahin, Anders
- Huss",Second Language Acquisition Modeling: An Ensemble Approach,cs.CL," Accurate prediction of students' knowledge is a fundamental building block of
-personalized learning systems. Here, we propose a novel ensemble model to
-predict student knowledge gaps. Applying our approach to student trace data
-from the online educational platform Duolingo, we achieved the highest score on
-both evaluation metrics for all three datasets in the 2018 Shared Task on
-Second Language Acquisition Modeling. We describe our model and discuss the
-relevance of the task compared to how it would be set up in a production
-environment for personalized education.
-"
-7954,1806.04532,"Wenpeng Yin, Dan Roth",Term Definitions Help Hypernymy Detection,cs.CL," Existing methods of hypernymy detection mainly rely on statistics over a big
-corpus, either mining some co-occurring patterns like ""animals such as cats"" or
-embedding words of interest into context-aware vectors. These approaches are
-therefore limited by the availability of a large enough corpus that can cover
-all terms of interest and provide sufficient contextual information to
-represent their meaning.
In this work, we propose a new paradigm, HyperDef, for
-hypernymy detection -- expressing word meaning by encoding word definitions,
-along with context-driven representation. This has two main benefits: (i)
-Definitional sentences express (sense-specific) corpus-independent meanings of
-words, hence definition-driven approaches enable strong generalization -- once
-trained, the model is expected to work well in open-domain testbeds; (ii)
-Global context from a large corpus and definitions provide complementary
-information for words. Consequently, our model, HyperDef, once trained on
-task-agnostic data, gets state-of-the-art results in multiple benchmarks.
-"
-7955,1806.04535,"Srishti Aggarwal, Kritik Mathur, Radhika Mamidi",Automatic Target Recovery for Hindi-English Code Mixed Puns,cs.CL cs.AI," In order for our computer systems to be more human-like, with a higher
-emotional quotient, they need to be able to process and understand intrinsic
-human language phenomena like humour. In this paper, we consider a subtype of
-humour - puns, which are a common type of wordplay-based jokes. In particular,
-we consider code-mixed puns, which have become increasingly mainstream on
-social media, in informal conversations and advertisements, and aim to build a
-system which can automatically identify the pun location and recover the target
-of such puns. We first study and classify code-mixed puns into two categories,
-namely intra-sentential and intra-word, and then propose a four-step algorithm
-to recover the pun targets for puns belonging to the intra-sentential category.
-Our algorithm uses language models and phonetic similarity-based features to
-get the desired results. We test our approach on a small set of code-mixed
-punning advertisements, and observe that our system is successfully able to
-recover the targets for 67% of the puns.
-"
-7956,1806.04550,Florian Schmidt and Thomas Hofmann,Deep State Space Models for Unconditional Word Generation,cs.LG cs.CL stat.ML," Autoregressive feedback is considered a necessity for successful
-unconditional text generation using stochastic sequence models. However, such
-feedback is known to introduce systematic biases into the training process and
-it obscures a principle of generation: committing to global information and
-forgetting local nuances. We show that a non-autoregressive deep state space
-model with a clear separation of global and local uncertainty can be built from
-only two ingredients: An independent noise source and a deterministic
-transition function. Recent advances on flow-based variational inference can be
-used to train an evidence lower-bound without resorting to annealing, auxiliary
-losses or similar measures. The result is a highly interpretable generative
-model on par with comparable auto-regressive models on the task of word
-generation.
-"
-7957,1806.04558,"Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren,
- Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu","Transfer Learning from Speaker Verification to Multispeaker
- Text-To-Speech Synthesis",cs.CL cs.LG cs.SD eess.AS," We describe a neural network-based system for text-to-speech (TTS) synthesis
-that is able to generate speech audio in the voice of many different speakers,
-including those unseen during training.
Our system consists of three
-independently trained components: (1) a speaker encoder network, trained on a
-speaker verification task using an independent dataset of noisy speech from
-thousands of speakers without transcripts, to generate a fixed-dimensional
-embedding vector from seconds of reference speech from a target speaker; (2) a
-sequence-to-sequence synthesis network based on Tacotron 2, which generates a
-mel spectrogram from text, conditioned on the speaker embedding; (3) an
-auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a
-sequence of time domain waveform samples. We demonstrate that the proposed
-model is able to transfer the knowledge of speaker variability learned by the
-discriminatively-trained speaker encoder to the new task, and is able to
-synthesize natural speech from speakers that were not seen during training. We
-quantify the importance of training the speaker encoder on a large and diverse
-speaker set in order to obtain the best generalization performance. Finally, we
-show that randomly sampled speaker embeddings can be used to synthesize speech
-in the voice of novel speakers dissimilar from those used in training,
-indicating that the model has learned a high quality speaker representation.
-"
-7958,1806.04616,"Annie Louis, Santanu Kumar Dash, Earl T. Barr and Charles Sutton",Deep Learning to Detect Redundant Method Comments,cs.SE cs.CL," Comments in software are critical for maintenance and reuse. But apart from
-prescriptive advice, there is little practical support or quantitative
-understanding of what makes a comment useful. In this paper, we introduce the
-task of identifying comments which are uninformative about the code they are
-meant to document. To address this problem, we introduce the notion of comment
-entailment from code, high entailment indicating that a comment's natural
-language semantics can be inferred directly from the code. Although not all
-entailed comments are low quality, comments that are too easily inferred, for
-example, comments that restate the code, are widely discouraged by authorities
-on software style. Based on this, we develop a tool called CRAIC which scores
-method-level comments for redundancy. Highly redundant comments can then be
-expanded or alternatively removed by the developer. CRAIC uses deep language
-models to exploit large software corpora without requiring expensive manual
-annotations of entailment. We show that CRAIC can perform the comment
-entailment task with good agreement with human judgements. Our findings also
-have implications for documentation tools. For example, we find that common
-tags in Javadoc are at least two times more predictable from code than
-non-Javadoc sentences, suggesting that Javadoc tags are less informative than
-more free-form comments.
-"
-7959,1806.04713,Hanan Aldarmaki and Mona Diab,Evaluation of Unsupervised Compositional Representations,cs.CL," We evaluated various compositional models, from bag-of-words representations
-to compositional RNN-based models, on several extrinsic supervised and
-unsupervised evaluation benchmarks. Our results confirm that weighted vector
-averaging can outperform context-sensitive models in most benchmarks, but
-structural features encoded in RNN models can also be useful in certain
-classification tasks. We analyzed some of the evaluation datasets to identify
-the aspects of meaning they measure and the characteristics of the various
-models that explain their performance variance.
-"
-7960,1806.04818,"Zexian Zeng, Ankita Roy, Xiaoyu Li, Sasa Espino, Susan Clare, Seema
- Khan, Yuan Luo","Using Clinical Narratives and Structured Data to Identify Distant
- Recurrences in Breast Cancer",cs.CL," Accurately identifying distant recurrences in breast cancer from the
-Electronic Health Records (EHR) is important for both clinical care and
-secondary analysis. Although multiple applications have been developed for
-computational phenotyping in breast cancer, distant recurrence identification
-still relies heavily on manual chart review. In this study, we aim to develop a
-model that identifies distant recurrences in breast cancer using clinical
-narratives and structured data from EHR. We apply MetaMap to extract features
-from clinical narratives and also retrieve structured clinical data from EHR.
-Using these features, we train a support vector machine model to identify
-distant recurrences in breast cancer patients. We train the model using 1,396
-double-annotated subjects and validate the model using 599 double-annotated
-subjects. In addition, we validate the model on a set of 4,904 single-annotated
-subjects as a generalization test. We obtained a high area under the curve
-(AUC) score of 0.92 (SD=0.01) in the cross-validation using the training
-dataset, then obtained AUC scores of 0.95 and 0.93 in the held-out test and
-generalization test using 599 and 4,904 samples respectively. Our model can
-accurately and efficiently identify distant recurrences in breast cancer by
-combining features extracted from unstructured clinical narratives and
-structured clinical data.
-"
-7961,1806.04820,"Zexian Zeng, Yu Deng, Xiaoyu Li, Tristan Naumann, Yuan Luo",Natural Language Processing for EHR-Based Computational Phenotyping,cs.CL," This article reviews recent advances in applying natural language processing
-(NLP) to Electronic Health Records (EHRs) for computational phenotyping.
-NLP-based computational phenotyping has numerous applications including
-diagnosis categorization, novel phenotype discovery, clinical trial screening,
-pharmacogenomics, drug-drug interaction (DDI) and adverse drug event (ADE)
-detection, as well as genome-wide and phenome-wide association studies.
-Significant progress has been made in algorithm development and resource
-construction for computational phenotyping. Among the surveyed methods,
-well-designed keyword search and rule-based systems often achieve good
-performance. However, the construction of keyword and rule lists requires
-significant manual effort, which is difficult to scale. Supervised machine
-learning models have been favored because they are capable of acquiring both
-classification patterns and structures from data. Recently, deep learning and
-unsupervised learning have received growing attention, with the former favored
-for its performance and the latter for its ability to find novel phenotypes.
-Integrating heterogeneous data sources has become increasingly important and
-has shown promise in improving model performance. Often better performance is
-achieved by combining multiple modalities of information.
Despite these many
-advances, challenges and opportunities remain for NLP-based computational
-phenotyping, including better model interpretability and generalizability, and
-proper characterization of feature relations in clinical narratives.
-"
-7962,1806.04822,"Pengcheng Yang and Xu Sun and Wei Li and Shuming Ma and Wei Wu and
- Houfeng Wang",SGM: Sequence Generation Model for Multi-label Classification,cs.CL," Multi-label classification is an important yet challenging task in natural
-language processing. It is more complex than single-label classification in
-that the labels tend to be correlated. Existing methods tend to ignore the
-correlations between labels. Besides, different parts of the text can
-contribute differently to predicting different labels, which is not considered
-by existing models. In this paper, we propose to view the multi-label
-classification task as a sequence generation problem, and apply a sequence
-generation model with a novel decoder structure to solve it. Extensive
-experimental results show that our proposed methods outperform previous work by
-a substantial margin. Further analysis of experimental results demonstrates
-that the proposed methods not only capture the correlations between labels, but
-also select the most informative words automatically when predicting different
-labels.
-"
-7963,1806.04841,Hao Tang and Wei-Ning Hsu and Francois Grondin and James Glass,"A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain
- Adaptation in Distant Speech Recognition",cs.CL cs.LG cs.SD eess.AS," Speech recognizers trained on close-talking speech do not generalize to
-distant speech and the word error rate degradation can be as large as 40%
-absolute. Most studies focus on tackling distant speech recognition as a
-separate problem, leaving little effort to adapting close-talking speech
-recognizers to distant speech. In this work, we review several approaches from
-a domain adaptation perspective. These approaches, including speech
-enhancement, multi-condition training, data augmentation, and autoencoders, all
-involve a transformation of the data between domains. We conduct experiments on
-the AMI data set, where these approaches can be realized under the same
-controlled setting. These approaches lead to different amounts of improvement
-under their respective assumptions. The purpose of this paper is to quantify
-and characterize the performance gap between the two domains, setting up the
-basis for studying adaptation of speech recognizers from close-talking speech
-to distant speech. Our results also have implications for improving distant
-speech recognition.
-"
-7964,1806.04856,"Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin and Tie-Yan Liu",Double Path Networks for Sequence to Sequence Learning,cs.CL," Encoder-decoder based Sequence to Sequence learning (S2S) has made remarkable
-progress in recent years. Different network architectures have been used in the
-encoder/decoder. Among them, Convolutional Neural Networks (CNN) and Self
-Attention Networks (SAN) are the prominent ones. The two architectures achieve
-similar performances but use very different ways to encode and decode context:
-CNN uses convolutional layers to focus on the local connectivity of the
-sequence, while SAN uses self-attention layers to focus on global semantics. In
-this work we propose Double Path Networks for Sequence to Sequence learning
-(DPN-S2S), which leverage the advantages of both models by using double path
-information fusion.
During the encoding step, we develop a double path
-architecture to maintain the information coming from different paths with
-convolutional layers and self-attention layers separately. To effectively use
-the encoded context, we develop a cross attention module with gating and use it
-to automatically pick up the information needed during the decoding step. By
-deeply integrating the two paths with cross attention, both types of
-information are combined and well exploited. Experiments show that our proposed
-method can significantly improve the performance of sequence to sequence
-learning over state-of-the-art systems.
-"
-7965,1806.04860,"Zhou Su, Chen Zhu, Yinpeng Dong, Dongqi Cai, Yurong Chen, Jianguo Li",Learning Visual Knowledge Memory Networks for Visual Question Answering,cs.CV cs.CL," Visual question answering (VQA) requires joint comprehension of images and
-natural language questions, where many questions can't be directly or clearly
-answered from visual content but require reasoning from structured human
-knowledge with confirmation from visual content. This paper proposes a visual
-knowledge memory network (VKMN) to address this issue, which seamlessly
-incorporates structured human knowledge and deep visual features into memory
-networks in an end-to-end learning framework. Compared to existing methods for
-leveraging external knowledge to support VQA, this paper stresses two missing
-mechanisms. First is the mechanism for integrating visual contents with
-knowledge facts. VKMN handles this issue by embedding knowledge triples
-(subject, relation, target) and deep visual features jointly into the visual
-knowledge features. Second is the mechanism for handling multiple knowledge
-facts expanding from question and answer pairs. VKMN stores joint embedding
-using a key-value pair structure in the memory networks so that it is easy to
-handle multiple facts. Experiments show that the proposed method achieves
-promising results on both the VQA v1.0 and v2.0 benchmarks, while outperforming
-state-of-the-art methods on the knowledge-reasoning related questions.
-"
-7966,1806.04872,"Wei-Ning Hsu, Hao Tang, James Glass","Unsupervised Adaptation with Interpretable Disentangled Representations
- for Distant Conversational Speech Recognition",cs.CL cs.LG cs.NE cs.SD eess.AS," The current trend in automatic speech recognition is to leverage large
-amounts of labeled data to train supervised neural network models.
-Unfortunately, obtaining data for a wide range of domains to train robust
-models can be costly. However, it is relatively inexpensive to collect large
-amounts of unlabeled data from domains that we want the models to generalize
-to. In this paper, we propose a novel unsupervised adaptation method that
-learns to synthesize labeled data for the target domain from unlabeled
-in-domain data and labeled out-of-domain data. We first learn without
-supervision an interpretable latent representation of speech that encodes
-linguistic and nuisance factors (e.g., speaker and channel) using different
-latent variables. To transform a labeled out-of-domain utterance without
-altering its transcript, we transform the latent nuisance variables while
-maintaining the linguistic variables. To demonstrate our approach, we focus on
-a channel mismatch setting, where the domain of interest is distant
-conversational speech, and labels are only available for close-talking speech.
-Our proposed method is evaluated on the AMI dataset, outperforming all -baselines and bridging the gap between unadapted and in-domain models by over -77% without using any parallel data. -" -7967,1806.04936,"Stanislau Semeniuta, Aliaksei Severyn, Sylvain Gelly",On Accurate Evaluation of GANs for Language Generation,cs.CL," Generative Adversarial Networks (GANs) are a promising approach to language -generation. The latest works introducing novel GAN models for language -generation use n-gram based metrics for evaluation and only report single -scores of the best run. In this paper, we argue that this often misrepresents -the true picture and does not tell the full story, as GAN models can be -extremely sensitive to the random initialization and small deviations from the -best hyperparameter choice. In particular, we demonstrate that the previously -used BLEU score is not sensitive to semantic deterioration of generated texts -and propose alternative metrics that better capture the quality and diversity -of the generated samples. We also conduct a set of experiments comparing a -number of GAN models for text with a conventional Language Model (LM) and find -that neither of the considered models performs convincingly better than the LM. -" -7968,1806.04973,Michael J Bommarito II and Daniel Martin Katz and Eric M Detterman,OpenEDGAR: Open Source Software for SEC EDGAR Analysis,cs.CL cs.DB," OpenEDGAR is an open source Python framework designed to rapidly construct -research databases based on the Electronic Data Gathering, Analysis, and -Retrieval (EDGAR) system operated by the US Securities and Exchange Commission -(SEC). OpenEDGAR is built on the Django application framework, supports -distributed compute across one or more servers, and includes functionality to -(i) retrieve and parse index and filing data from EDGAR, (ii) build tables for -key metadata like form type and filer, (iii) retrieve, parse, and update CIK to -ticker and industry mappings, (iv) extract content and metadata from filing -documents, and (v) search filing document contents. OpenEDGAR is designed for -use in both academic research and industrial applications, and is distributed -under MIT License at https://github.com/LexPredict/openedgar. -" -7969,1806.05030,Herman Kamper and Michael Roth,Visually grounded cross-lingual keyword spotting in speech,cs.CL cs.CV," Recent work considered how images paired with speech can be used as -supervision for building speech systems when transcriptions are not available. -We ask whether visual grounding can be used for cross-lingual keyword spotting: -given a text keyword in one language, the task is to retrieve spoken utterances -containing that keyword in another language. This could enable searching -through speech in a low-resource language using text queries in a high-resource -language. As a proof-of-concept, we use English speech with German queries: we -use a German visual tagger to add keyword labels to each training image, and -then train a neural network to map English speech to German keywords. Without -seeing parallel speech-transcriptions or translations, the model achieves a -precision at ten of 58%. We show that most erroneous retrievals contain -equivalent or semantically relevant keywords; excluding these would improve -P@10 to 91%. 
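For reference, the precision-at-ten (P@10) figures in the entry above (1806.05030) follow the standard definition: the fraction of relevant items among the ten highest-ranked retrievals. A minimal sketch; names and example values are illustrative:

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k

# A P@10 of 58% means that, on average, about 5.8 of the top ten
# retrieved utterances contain the queried keyword.
print(precision_at_k(["u3", "u7", "u1", "u9"], {"u3", "u9"}, k=10))  # 0.2
```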
-"
-7970,1806.05059,"Shiyu Zhou, Shuang Xu, Bo Xu","Multilingual End-to-End Speech Recognition with A Single Transformer on
- Low-Resource Languages",eess.AS cs.CL cs.SD," Sequence-to-sequence attention-based models integrate an acoustic,
-pronunciation and language model into a single neural network, which makes them
-very suitable for multilingual automatic speech recognition (ASR). In this
-paper, we are concerned with multilingual speech recognition on low-resource
-languages by a single Transformer, one of sequence-to-sequence attention-based
-models. Sub-words are employed as the multilingual modeling unit without using
-any pronunciation lexicon. First, we show that a single multilingual ASR
-Transformer performs well on low-resource languages despite some language
-confusion. We then look at incorporating language information into the model by
-inserting the language symbol at the beginning or at the end of the original
-sub-words sequence under the condition that language information is known
-during training. Experiments on CALLHOME datasets demonstrate that the
-multilingual ASR Transformer with the language symbol at the end performs
-better and can obtain a relative 10.5\% average word error rate (WER) reduction
-compared to SHL-MLSTM with residual learning. We go on to show that, assuming
-the language information is known during training and testing, a relative
-12.4\% average WER reduction can be observed compared to SHL-MLSTM with
-residual learning by giving the language symbol as the sentence start
-token.
-"
-7971,1806.05099,"Zhengzhong Liu, Teruko Mitamura, Eduard Hovy",Graph-Based Decoding for Event Sequencing and Coreference Resolution,cs.CL," Events in text documents are interrelated in complex ways. In this paper, we
-study two types of relation: Event Coreference and Event Sequencing. We show
-that the popular tree-like decoding structure for automated Event Coreference
-is not suitable for Event Sequencing. To this end, we propose a graph-based
-decoding algorithm that is applicable to both tasks. The new decoding algorithm
-supports flexible feature sets for both tasks. Empirically, our event
-coreference system has achieved state-of-the-art performance on the TAC-KBP
-2015 event coreference task and our event sequencing system beats a strong
-temporal-based, oracle-informed baseline. We discuss the challenges of studying
-these event relations.
-"
-7972,1806.05130,"Andrew Wood, Paige Rodeghero, Ameer Armaly, Collin McMillan","Detecting Speech Act Types in Developer Question/Answer Conversations
- During Bug Repair",cs.SE cs.CL," This paper targets the problem of speech act detection in conversations about
-bug repair. We conduct a ""Wizard of Oz"" experiment with 30 professional
-programmers, in which the programmers fix bugs for two hours, and use a
-simulated virtual assistant for help. Then, we use an open coding manual
-annotation procedure to identify the speech act types in the conversations.
-Finally, we train and evaluate a supervised learning algorithm to automatically
-detect the speech act types in the conversations. In 30 two-hour conversations,
-we made 2459 annotations and uncovered 26 speech act types. Our automated
-detection achieved 69% precision and 50% recall. The key application of this
-work is to advance the state of the art for virtual assistants in software
-engineering.
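The relative WER reductions quoted in entry 1806.05059 above (10.5% and 12.4%) use the usual relative-change definition; a small helper, with illustrative numbers:

```python
def relative_wer_reduction(wer_baseline, wer_system):
    """Relative reduction in word error rate, as a percentage."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline

# Illustrative values only: a drop from 40.0% to 35.8% absolute WER
# is a 10.5% relative reduction.
print(round(relative_wer_reduction(40.0, 35.8), 1))  # 10.5
```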
Virtual assistant technology is growing rapidly, though
-applications in software engineering are behind those in other areas, largely
-due to a lack of relevant data and experiments. This paper targets this problem
-in the area of developer Q/A conversations about bug repair.
-"
-7973,1806.05138,"Harshil Shah, David Barber",Generative Neural Machine Translation,cs.CL cs.LG stat.ML," We introduce Generative Neural Machine Translation (GNMT), a latent variable
-architecture which is designed to model the semantics of the source and target
-sentences. We modify an encoder-decoder translation model by adding a latent
-variable as a language agnostic representation which is encouraged to learn the
-meaning of the sentence. GNMT achieves competitive BLEU scores on pure
-translation tasks, and is superior when there are missing words in the source
-sentence. We augment the model to facilitate multilingual translation and
-semi-supervised learning without adding parameters. This framework
-significantly reduces overfitting when there is limited paired data available,
-and is effective for translating between pairs of languages not seen during
-training.
-"
-7974,1806.05177,"Subba Reddy Oota, Naresh Manwani, and Bapi Raju S","fMRI Semantic Category Decoding using Linguistic Encoding of Word
- Embeddings",q-bio.NC cs.CL cs.CV," How the human brain represents conceptual knowledge has been disputed in
-many scientific fields. Brain imaging studies have shown that the spatial
-patterns of neural activation in the brain are correlated with thinking about
-different semantic categories of words (for example, tools, animals, and
-buildings) or with viewing the related pictures. In this paper, we present a
-computational model that learns to predict the neural activation captured in
-functional magnetic resonance imaging (fMRI) data of test words. Unlike the
-models with hand-crafted features that have been used in the literature, in
-this paper we propose a novel approach wherein decoding models are built with
-features extracted from popular linguistic encodings of Word2Vec, GloVe,
-Meta-Embeddings in conjunction with the empirical fMRI data associated with
-viewing several dozen concrete nouns. We compared these models with several
-other models that use word features extracted from FastText, randomly-generated
-features, and Mitchell's 25 features [1]. The experimental results show that the
-predicted fMRI images using Meta-Embeddings match state-of-the-art
-performance. Although models with features from GloVe and Word2Vec predict fMRI
-images similar to the state-of-the-art model, the model with features from
-Meta-Embeddings predicts significantly better. The proposed scheme that uses
-popular linguistic encoding offers a simple and easy approach for semantic
-decoding from fMRI experiments.
-"
-7975,1806.05178,"Harshil Shah, Bowen Zheng, David Barber",Generating Sentences Using a Dynamic Canvas,cs.CL cs.LG stat.ML," We introduce the Attentive Unsupervised Text (W)riter (AUTR), which is a word
-level generative model for natural language. It uses a recurrent neural network
-with a dynamic attention and canvas memory mechanism to iteratively construct
-sentences. By viewing the state of the memory at intermediate stages and where
-the model is placing its attention, we gain insight into how it constructs
-sentences.
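The decoding setup in entry 1806.05177 above amounts to regressing voxel activations on word-embedding features; a minimal sketch with stand-in data (the use of ridge regression and all dimensions here are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 300))  # 60 concrete nouns x 300-dim embeddings (stand-in)
Y = rng.normal(size=(60, 500))  # 60 nouns x 500 voxel activations (stand-in)

model = Ridge(alpha=1.0).fit(X[:50], Y[:50])  # fit on 50 training words
predicted = model.predict(X[50:])             # predict images for held-out words
print(predicted.shape)                        # (10, 500)
```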
We demonstrate that AUTR learns a meaningful latent representation -for each sentence, and achieves competitive log-likelihood lower bounds whilst -being computationally efficient. It is effective at generating and -reconstructing sentences, as well as imputing missing words. -" -7976,1806.05180,"Andreas Hanselowski, Avinesh PVS, Benjamin Schiller, Felix Caspelherr, - Debanjan Chaudhuri, Christian M. Meyer and Iryna Gurevych","A Retrospective Analysis of the Fake News Challenge Stance Detection - Task",cs.IR cs.AI cs.CL cs.SI," The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed a stance -classification task as a crucial first step towards detecting fake news. To -date, there is no in-depth analysis paper to critically discuss FNC-1's -experimental setup, reproduce the results, and draw conclusions for -next-generation stance classification methods. In this paper, we provide such -an in-depth analysis for the three top-performing systems. We first find that -FNC-1's proposed evaluation metric favors the majority class, which can be -easily classified, and thus overestimates the true discriminative power of the -methods. Therefore, we propose a new F1-based metric yielding a changed system -ranking. Next, we compare the features and architectures used, which leads to a -novel feature-rich stacked LSTM model that performs on par with the best -systems, but is superior in predicting minority classes. To understand the -methods' ability to generalize, we derive a new dataset and perform both -in-domain and cross-domain experiments. Our qualitative and quantitative study -helps interpreting the original FNC-1 scores and understand which features help -improving performance and why. Our new dataset and all source code used during -the reproduction study are publicly available for future research. -" -7977,1806.05210,Gongbo Tang and Fabienne Cap and Eva Pettersson and Joakim Nivre,"An Evaluation of Neural Machine Translation Models on Historical - Spelling Normalization",cs.CL," In this paper, we apply different NMT models to the problem of historical -spelling normalization for five languages: English, German, Hungarian, -Icelandic, and Swedish. The NMT models are at different levels, have different -attention mechanisms, and different neural network architectures. Our results -show that NMT models are much better than SMT models in terms of character -error rate. The vanilla RNNs are competitive to GRUs/LSTMs in historical -spelling normalization. Transformer models perform better only when provided -with more training data. We also find that subword-level models with a small -subword vocabulary are better than character-level models for low-resource -languages. In addition, we propose a hybrid method which further improves the -performance of historical spelling normalization. -" -7978,1806.05219,Andrew Moore and Paul Rayson,"Bringing replication and reproduction together with generalisability in - NLP: Three reproduction studies for Target Dependent Sentiment Analysis",cs.CL," Lack of repeatability and generalisability are two significant threats to -continuing scientific development in Natural Language Processing. Language -models and learning methods are so complex that scientific conference papers no -longer contain enough space for the technical depth required for replication or -reproduction. 
Taking Target Dependent Sentiment Analysis as a case study, we -show how recent work in the field has not consistently released code, or -described settings for learning methods in enough detail, and lacks -comparability and generalisability in train, test or validation data. To -investigate generalisability and to enable state of the art comparative -evaluations, we carry out the first reproduction studies of three groups of -complementary methods and perform the first large-scale mass evaluation on six -different English datasets. Reflecting on our experiences, we recommend that -future replication or reproduction experiments should always consider a variety -of datasets alongside documenting and releasing their methods and published -code in order to minimise the barriers to both repeatability and -generalisability. We have released our code with a model zoo on GitHub with -Jupyter Notebooks to aid understanding and full documentation, and we recommend -that others do the same with their papers at submission time through an -anonymised GitHub account. -" -7979,1806.05231,D.B. Skillicorn and N. Alsadhan,Beyond Bags of Words: Inferring Systemic Nets,cs.CL," Textual analytics based on representations of documents as bags of words have -been reasonably successful. However, analysis that requires deeper insight into -language, into author properties, or into the contexts in which documents were -created requires a richer representation. Systemic nets are one such -representation. They have not been extensively used because they required human -effort to construct. We show that systemic nets can be algorithmically inferred -from corpora, that the resulting nets are plausible, and that they can provide -practical benefits for knowledge discovery problems. This opens up a new class -of practical analysis techniques for textual analytics. -" -7980,1806.05258,"Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney - and Nazli Goharian","SMHD: A Large-Scale Resource for Exploring Online Language Usage for - Multiple Mental Health Conditions",cs.CL," Mental health is a significant and growing public health concern. As language -usage can be leveraged to obtain crucial insights into mental health -conditions, there is a need for large-scale, labeled, mental health-related -datasets of users who have been diagnosed with one or more of such conditions. -In this paper, we investigate the creation of high-precision patterns to -identify self-reported diagnoses of nine different mental health conditions, -and obtain high-quality labeled data without the need for manual labelling. We -introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it -available. SMHD is a novel large dataset of social media posts from users with -one or multiple mental health conditions along with matched control users. We -examine distinctions in users' language, as measured by linguistic and -psychological variables. We further explore text classification methods to -identify individuals with mental conditions through their language. -" -7981,1806.05284,"Vlad Eidelman, Anastassia Kornilova, Daniel Argyle","How Predictable is Your State? Leveraging Lexical and Contextual - Information for Predicting Legislative Floor Action at the State Level",cs.CL," Modeling U.S. Congressional legislation and roll-call votes has received -significant attention in previous literature. However, while legislators across -50 state governments and D.C. 
propose over 100,000 bills each year, and on
-average enact over 30% of them, state-level analysis has received relatively
-less attention due in part to the difficulty in obtaining the necessary data.
-Because each state legislature is guided by its own procedures, politics and
-issues, it is difficult to qualitatively assess the factors that affect
-the likelihood of a legislative initiative succeeding. Herein, we present
-several methods for modeling the likelihood of a bill receiving floor action
-across all 50 states and D.C. We utilize the lexical content of over 1 million
-bills, along with contextual legislature and legislator derived features to
-build our predictive models, allowing a comparison of the factors that are
-important to the lawmaking process. Furthermore, we show that these signals
-hold complementary predictive power, together achieving an average improvement
-in accuracy of 18% over state-specific baselines.
-"
-7982,1806.05337,"Chandan Singh, W. James Murdoch, Bin Yu",Hierarchical interpretations for neural network predictions,cs.LG cs.AI cs.CL cs.CV stat.ML," Deep neural networks (DNNs) have achieved impressive predictive performance
-due to their ability to learn complex, non-linear relationships between
-variables. However, the inability to effectively visualize these relationships
-has led to DNNs being characterized as black boxes and consequently limited
-their applications. To ameliorate this problem, we introduce the use of
-hierarchical interpretations to explain DNN predictions through our proposed
-method, agglomerative contextual decomposition (ACD). Given a prediction from a
-trained DNN, ACD produces a hierarchical clustering of the input features,
-along with the contribution of each cluster to the final prediction. This
-hierarchy is optimized to identify clusters of features that the DNN learned
-are predictive. Using examples from Stanford Sentiment Treebank and ImageNet,
-we show that ACD is effective at diagnosing incorrect predictions and
-identifying dataset bias. Through human experiments, we demonstrate that ACD
-enables users both to identify the more accurate of two DNNs and to better
-trust a DNN's outputs. We also find that ACD's hierarchy is largely robust to
-adversarial perturbations, implying that it captures fundamental aspects of the
-input and ignores spurious noise.
-"
-7983,1806.05432,"Haris Bin Zia, Agha Ali Raza, Awais Athar",Urdu Word Segmentation using Conditional Random Fields (CRFs),cs.CL," State-of-the-art Natural Language Processing algorithms rely heavily on
-efficient word segmentation. Urdu is amongst the languages for which word
-segmentation is a complex task as it exhibits space omission as well as space
-insertion issues. This is partly due to the Arabic script which, although
-cursive in nature, consists of characters that have inherent joining and
-non-joining attributes regardless of word boundary. This paper presents a word
-segmentation system for Urdu which uses a Conditional Random Field sequence
-modeler with orthographic, linguistic and morphological features. Our proposed
-model automatically learns to predict white space as word boundary as well as
-Zero Width Non-Joiner (ZWNJ) as sub-word boundary. Using a manually annotated
-corpus, our model achieves an F1 score of 0.97 for word boundary identification
-and 0.85 for sub-word boundary identification tasks. We have made our code and
-corpus publicly available to make our results reproducible.
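As a rough illustration of the CRF formulation in entry 1806.05432 above, character-level features can be fed to a linear-chain CRF such as sklearn-crfsuite; the feature set and B/I tag scheme below are simplified stand-ins for the paper's orthographic, linguistic and morphological features:

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def char_features(text, i):
    return {
        "char": text[i],
        "prev": text[i - 1] if i > 0 else "<s>",
        "next": text[i + 1] if i < len(text) - 1 else "</s>",
    }

# One training sequence: per-character features plus boundary tags
# (B = a boundary follows this character, I = inside a word).
X = [[char_features("aword", i) for i in range(5)]]
y = [["I", "I", "I", "I", "B"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))  # [['I', 'I', 'I', 'I', 'B']]
```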
-"
-7984,1806.05434,"Minghui Qiu, Liu Yang, Feng Ji, Weipeng Zhao, Wei Zhou, Jun Huang,
- Haiqing Chen, W. Bruce Croft, Wei Lin","Transfer Learning for Context-Aware Question Matching in
- Information-seeking Conversations in E-commerce",cs.CL," Building multi-turn information-seeking conversation systems is an important
-and challenging research topic. Although several advanced neural text matching
-models have been proposed for this task, they are generally not efficient for
-industrial applications. Furthermore, they rely on a large amount of labeled
-data, which may not be available in real-world applications. To alleviate these
-problems, we study transfer learning for multi-turn information seeking
-conversations in this paper. We first propose an efficient and effective
-multi-turn conversation model based on convolutional neural networks. After
-that, we extend our model to adapt the knowledge learned from a resource-rich
-domain to enhance the performance. Finally, we deployed our model in an
-industrial chatbot called AliMe Assist
-(https://consumerservice.taobao.com/online-help) and observed a significant
-improvement over the existing online model.
-"
-7985,1806.05461,Yanyan Zou and Wei Lu,"Learning Cross-lingual Distributed Logical Representations for Semantic
- Parsing",cs.CL," With the development of several multilingual datasets used for semantic
-parsing, recent research efforts have looked into the problem of learning
-semantic parsers in a multilingual setup. However, how to improve the
-performance of a monolingual semantic parser for a specific language by
-leveraging data annotated in different languages remains a research question
-that is under-explored. In this work, we present a study to show how learning
-distributed representations of the logical forms from data annotated in
-different languages can be used for improving the performance of a monolingual
-semantic parser. We extend two existing monolingual semantic parsers to
-incorporate such cross-lingual distributed logical representations as features.
-Experiments show that our proposed approach is able to yield improved semantic
-parsing results on the standard multilingual GeoQuery dataset.
-"
-7986,1806.05480,"Ciprian-Octavian Truic\u{a} and Julien Velcin and Alexandru Boicea","Automatic Language Identification for Romance Languages using Stop Words
- and Diacritics",cs.CL cs.IR," Automatic language identification is a natural language processing problem
-that tries to determine the natural language of a given text. In this paper
-we present a statistical method for automatic language identification of
-written text using dictionaries containing stop words and diacritics. We
-propose different approaches that combine the two dictionaries to accurately
-determine the language of textual corpora. This method was chosen because stop
-words and diacritics are very specific to a language; although some languages
-share similar words and special characters, these are not all common. The
-languages taken into account were Romance languages because they are very
-similar and it is usually hard to distinguish between them from a computational
-point of view. We have tested our method using a Twitter corpus and a news
-article corpus. Both corpora consist of UTF-8 encoded text, so the diacritics
-could be taken into account; in the case that the text has no diacritics, only
-the stop words are used to determine the language of the text.
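A minimal sketch of the dictionary-based scheme described above (the stop-word lists here are tiny illustrative samples; the real dictionaries are far larger and are complemented by diacritics):

```python
STOP_WORDS = {  # abbreviated samples, not the full dictionaries
    "french":  {"le", "la", "les", "et", "dans"},
    "spanish": {"el", "la", "los", "y", "en"},
    "italian": {"il", "la", "gli", "e", "nel"},
}

def identify_language(text):
    """Score each language by stop-word overlap and return the best match."""
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & words) for lang, words in STOP_WORDS.items()}
    return max(scores, key=scores.get)

print(identify_language("el gato duerme en la casa"))  # spanish
```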
The experimental
-results show that the proposed method has an accuracy of over 90% for small
-texts and over 99.8% for large texts.
-"
-7987,1806.05482,"Dominik Mach\'a\v{c}ek, Jon\'a\v{s} Vidra, Ond\v{r}ej Bojar",Morphological and Language-Agnostic Word Segmentation for NMT,cs.CL," The state of the art of handling rich morphology in neural machine
-translation (NMT) is to break word forms into subword units, so that the
-overall vocabulary size of these units fits the practical limits given by the
-NMT model and GPU memory capacity. In this paper, we compare two common but
-linguistically uninformed methods of subword construction (BPE and STE, the
-method implemented in the Tensor2Tensor toolkit) and two linguistically-motivated
-methods: Morfessor and one novel method, based on a derivational dictionary.
-Our experiments with German-to-Czech translation, both languages morphologically
-rich, document that so far, the non-motivated methods perform better. Furthermore,
-we identify a critical difference between BPE and STE and show a simple
-pre-processing step for BPE that considerably increases translation quality as
-evaluated by automatic measures.
-"
-7988,1806.05484,"Lina M.Rojas-Barahona, Stefan Ultes, Pawel Budzianowski, I\~nigo
- Casanueva, Milica Gasic, Bo-Hsiang Tseng and Steve Young","Nearly Zero-Shot Learning for Semantic Decoding in Spoken Dialogue
- Systems",cs.CL cs.AI," This paper presents two ways of dealing with scarce data in semantic decoding
-using N-Best speech recognition hypotheses. First, we learn features by using a
-deep learning architecture in which the weights for the unknown and known
-categories are jointly optimised. Second, an unsupervised method is used for
-further tuning the weights. Sharing weights injects prior knowledge into unknown
-categories. The unsupervised tuning (i.e. the risk minimisation) improves the
-F-Measure when recognising nearly zero-shot data on the DSTC3 corpus. This
-unsupervised method can be applied subject to two assumptions: the rank of the
-class marginal is assumed to be known and the class-conditional scores of the
-classifier are assumed to follow a Gaussian distribution.
-"
-7989,1806.05499,Reinald Kim Amplayo and Seung-won Hwang,Aspect Sentiment Model for Micro Reviews,cs.CL," This paper aims at an aspect sentiment model for aspect-based sentiment
-analysis (ABSA) focused on micro reviews. This task is important in order to
-understand the short reviews that the majority of users write, while existing topic
-models are targeted at expert-level long reviews with sufficient co-occurrence
-patterns to observe. Current methods for aggregating micro reviews using
-metadata information may not be effective either, due to metadata absence,
-topical heterogeneity, and cold start problems. To this end, we propose a model
-called Micro Aspect Sentiment Model (MicroASM). MicroASM is based on the
-observation that short reviews 1) are viewed with sentiment-aspect word pairs
-as building blocks of information, and 2) can be clustered into larger reviews.
-When compared to the current state-of-the-art aspect sentiment models,
-experiments show that our model provides better performance on aspect-level
-tasks such as aspect term extraction and document-level tasks such as sentiment
-classification.
-"
-7990,1806.05504,Reinald Kim Amplayo and Seonjae Lim and Seung-won Hwang,Entity Commonsense Representation for Neural Abstractive Summarization,cs.CL," A major proportion of a text summary includes important entities found in the
-original text.
These entities build up the topic of the summary. Moreover, they
-hold commonsense information once they are linked to a knowledge base. Based on
-these observations, this paper investigates the usage of linked entities to
-guide the decoder of a neural text summarizer to generate concise and better
-summaries. To this end, we leverage an off-the-shelf entity linking system
-(ELS) to extract linked entities and propose Entity2Topic (E2T), a module
-easily attachable to a sequence-to-sequence model that transforms a list of
-entities into a vector representation of the topic of the summary. Currently
-available ELSs are still not sufficiently effective, possibly introducing
-unresolved ambiguities and irrelevant entities. We resolve the imperfections of
-the ELS by (a) encoding entities with selective disambiguation, and (b) pooling
-entity vectors using firm attention. By applying E2T to a simple
-sequence-to-sequence model with attention mechanism as base model, we see
-significant improvements of the performance in the Gigaword (sentence to title)
-and CNN (long document to multi-sentence highlights) summarization datasets by
-at least 2 ROUGE points.
-"
-7991,1806.05507,Reinald Kim Amplayo and Jihyeok Kim and Sua Sung and Seung-won Hwang,Cold-Start Aware User and Product Attention for Sentiment Classification,cs.CL," The use of user/product information in sentiment analysis is important,
-especially for cold-start users/products, whose number of reviews is very
-limited. However, current models do not deal with the cold-start problem which
-is typical in review websites. In this paper, we present Hybrid Contextualized
-Sentiment Classifier (HCSC), which contains two modules: (1) a fast word
-encoder that returns word vectors embedded with short and long range dependency
-features; and (2) Cold-Start Aware Attention (CSAA), an attention mechanism
-that considers the existence of the cold-start problem when attentively pooling the
-encoded word vectors. HCSC introduces shared vectors that are constructed from
-similar users/products, and are used when the original distinct vectors do not
-have sufficient information (i.e. cold-start). This is decided by a
-frequency-guided selective gate vector. Our experiments show that in terms of
-RMSE, HCSC performs significantly better than previous models on famous datasets,
-despite having less complexity, and thus can be trained much faster. More
-importantly, our model performs significantly better than previous models when
-the training data is sparse and has cold-start problems.
-"
-7992,1806.05513,"Ankush Khandelwal, Sahil Swami, Syed S. Akhtar and Manish Shrivastava","Humor Detection in English-Hindi Code-Mixed Social Media Content :
- Corpus and Baseline System",cs.CL," The tremendous amount of user generated data through social networking sites
-has led to the growing popularity of automatic text classification in the field of
-computational linguistics over the past decade. Within this domain, one problem
-that has drawn the attention of many researchers is automatic humor detection
-in texts. In-depth semantic understanding of the text is required to detect
-humor, which makes the problem difficult to automate. With the increase in the
-number of social media users, many multilingual speakers often interchange
-between languages while posting on social media, which is called code-mixing.
It
-introduces some challenges in the field of linguistic analysis of social media
-content (Barman et al., 2014), like spelling variations and non-grammatical
-structures in a sentence. Past research includes detecting puns in texts (Kao
-et al., 2016) and humor in one-liners (Mihalcea et al., 2010) in a single
-language, but with the tremendous amount of code-mixed data available online,
-there is a need to develop techniques which detect humor in code-mixed tweets.
-In this paper, we analyze the task of humor detection in texts and describe a
-freely available corpus containing English-Hindi code-mixed tweets annotated
-with humorous (H) or non-humorous (N) tags. We also tagged the words in the
-tweets with language tags (English/Hindi/Others). Moreover, we describe the
-experiments carried out on the corpus and provide a baseline classification
-system which distinguishes between humorous and non-humorous texts.
-"
-7993,1806.05516,"Reinald Kim Amplayo and Kyungjae Lee and Jinyeong Yeo and Seung-won
- Hwang",Translations as Additional Contexts for Sentence Classification,cs.CL," In sentence classification tasks, additional contexts, such as the
-neighboring sentences, may improve the accuracy of the classifier. However,
-such contexts are domain-dependent and thus cannot be used for another
-classification task with an inappropriate domain. In contrast, we propose the
-use of translated sentences as context that is always available regardless of
-the domain. We find that naive feature expansion of translations gains only
-marginal improvements and may decrease the performance of the classifier, due
-to possibly inaccurate translations producing noisy sentence vectors. To
-this end, we present multiple context fixing attachment (MCFA), a series of
-modules attached to multiple sentence vectors to fix the noise in the vectors
-using the other sentence vectors as context. We show that our method performs
-competitively compared to previous models, achieving the best classification
-performance on multiple data sets. We are the first to use translations as
-domain-free contexts for sentence classification.
-"
-7994,1806.05521,"Jisun An, Haewoon Kwak, Yong-Yeol Ahn","SemAxis: A Lightweight Framework to Characterize Domain-Specific Word
- Semantics Beyond Sentiment",cs.CL cs.CY," Because word semantics can substantially change across communities and
-contexts, capturing domain-specific word semantics is an important challenge.
-Here, we propose SEMAXIS, a simple yet powerful framework to characterize word
-semantics using many semantic axes in word-vector spaces beyond sentiment. We
-demonstrate that SEMAXIS can capture nuanced semantic representations in
-multiple online communities. We also show that, when the sentiment axis is
-examined, SEMAXIS outperforms the state-of-the-art approaches in building
-domain-specific sentiment lexicons.
-"
-7995,1806.05559,"Francis Gr\'egoire, Philippe Langlais","Extracting Parallel Sentences with Bidirectional Recurrent Neural
- Networks to Improve Machine Translation",cs.CL cs.LG stat.ML," Parallel sentence extraction is a task addressing the data sparsity problem
-found in multilingual natural language processing applications. We propose a
-bidirectional recurrent neural network based approach to extract parallel
-sentences from collections of multilingual texts.
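The core of the SemAxis framework in entry 1806.05521 above reduces to projecting word vectors onto an axis spanned by two pole-word sets; a sketch with stand-in vectors:

```python
import numpy as np

def semaxis_score(word_vec, pos_pole_vecs, neg_pole_vecs):
    """Cosine similarity of a word vector with the pole-to-pole axis."""
    axis = np.mean(pos_pole_vecs, axis=0) - np.mean(neg_pole_vecs, axis=0)
    return float(np.dot(word_vec, axis) /
                 (np.linalg.norm(word_vec) * np.linalg.norm(axis)))

rng = np.random.default_rng(1)
pos = rng.normal(size=(3, 50))  # vectors of pole words, e.g. the "good" side (stand-ins)
neg = rng.normal(size=(3, 50))  # vectors of the opposite pole
print(semaxis_score(rng.normal(size=50), pos, neg))  # score in [-1, 1]
```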
Our experiments with noisy
-parallel corpora show that we can achieve promising results against a
-competitive baseline by removing the need for specific feature engineering or
-additional external resources. To justify the utility of our approach, we
-extract sentence pairs from Wikipedia articles to train machine translation
-systems and show significant improvements in translation performance.
-"
-7996,1806.05599,"Christina Niklaus, Matthias Cetto, Andr\'e Freitas and Siegfried
- Handschuh",A Survey on Open Information Extraction,cs.CL," We provide a detailed overview of the various approaches that were proposed
-to date to solve the task of Open Information Extraction. We present the major
-challenges that such systems face, show the evolution of the suggested
-approaches over time and depict the specific issues they address. In addition,
-we provide a critique of the commonly applied evaluation procedures for
-assessing the performance of Open IE systems and highlight some directions for
-future work.
-"
-7997,1806.05600,"Ankush Khandelwal, Sahil Swami, Syed Sarfaraz Akhtar and Manish
- Shrivastava","Gender Prediction in English-Hindi Code-Mixed Social Media Content :
- Corpus and Baseline System",cs.CL," The rapid expansion in the usage of social media networking sites leads to a
-huge amount of unprocessed user generated data which can be used for text
-mining. Author profiling, the problem of automatically determining aspects
-like the author's gender and age group from a text, is gaining much
-popularity in computational linguistics. Most of the past research in author
-profiling is concentrated on English texts \cite{1,2}. However, many users often
-change language while posting on social media, which is called code-mixing,
-and this introduces challenges for text classification and author profiling,
-like variations in spelling, non-grammatical structure and
-transliteration \cite{3}. There are very few English-Hindi code-mixed annotated
-datasets of social media content present online \cite{4}. In this paper, we
-analyze the task of author's gender prediction in code-mixed content and
-present a corpus of English-Hindi texts collected from Twitter which is
-annotated with the author's gender. We also explore language identification of
-every word in this corpus. We present a supervised classification baseline
-system which uses various machine learning algorithms to identify the gender of
-an author from a text, based on character- and word-level features.
-"
-7998,1806.05626,"Jie Yang, Yue Zhang",NCRF++: An Open-source Neural Sequence Labeling Toolkit,cs.CL," This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++
-is designed for quick implementation of different neural sequence labeling
-models with a CRF inference layer. It provides users with an interface for
-building custom model structures through a configuration file with flexible
-neural feature design and utilization. Built on PyTorch, the core operations
-are calculated in batch, making the toolkit efficient with the acceleration of
-GPU. It also includes the implementations of most state-of-the-art neural
-sequence labeling models such as LSTM-CRF, facilitating reproducing and
-refinement on those methods.
-" -7999,1806.05645,"Hoa Trong Vu, Claudio Greco, Aliia Erofeeva, Somayeh Jafaritazehjan, - Guido Linders, Marc Tanti, Alberto Testoni, Raffaella Bernardi, Albert Gatt",Grounded Textual Entailment,cs.CL cs.CV," Capturing semantic relations between sentences, such as entailment, is a -long-standing challenge for computational semantics. Logic-based models analyse -entailment in terms of possible worlds (interpretations, or situations) where a -premise P entails a hypothesis H iff in all worlds where P is true, H is also -true. Statistical models view this relationship probabilistically, addressing -it in terms of whether a human would likely infer H from P. In this paper, we -wish to bridge these two perspectives, by arguing for a visually-grounded -version of the Textual Entailment task. Specifically, we ask whether models can -perform better if, in addition to P and H, there is also an image -(corresponding to the relevant ""world"" or ""situation""). We use a multimodal -version of the SNLI dataset (Bowman et al., 2015) and we compare ""blind"" and -visually-augmented models of textual entailment. We show that visual -information is beneficial, but we also conduct an in-depth error analysis that -reveals that current multimodal models are not performing ""grounding"" in an -optimal fashion. -" -8000,1806.05655,"Kexin Liao, Logan Lebanoff, Fei Liu",Abstract Meaning Representation for Multi-Document Summarization,cs.CL," Generating an abstract from a collection of documents is a desirable -capability for many real-world applications. However, abstractive approaches to -multi-document summarization have not been thoroughly investigated. This paper -studies the feasibility of using Abstract Meaning Representation (AMR), a -semantic representation of natural language grounded in linguistic theory, as a -form of content representation. Our approach condenses source documents to a -set of summary graphs following the AMR formalism. The summary graphs are then -transformed to a set of summary sentences in a surface realization step. The -framework is fully data-driven and flexible. Each component can be optimized -independently using small-scale, in-domain training data. We perform -experiments on benchmark summarization datasets and report promising results. -We also describe opportunities and challenges for advancing this line of -research. -" -8001,1806.05658,"Kaiqiang Song, Lin Zhao, Fei Liu",Structure-Infused Copy Mechanisms for Abstractive Summarization,cs.CL," Seq2seq learning has produced promising results on summarization. However, in -many cases, system summaries still struggle to keep the meaning of the original -intact. They may miss out important words or relations that play critical roles -in the syntactic structure of source sentences. In this paper, we present -structure-infused copy mechanisms to facilitate copying important words and -relations from the source sentence to summary sentence. The approach naturally -combines source dependency structure with the copy mechanism of an abstractive -sentence summarizer. Experimental results demonstrate the effectiveness of -incorporating source-side syntactic information in the system, and our proposed -approach compares favorably to state-of-the-art methods. -" -8002,1806.05662,"Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. 
Cohen, - Ruslan Salakhutdinov, Yann LeCun","GLoMo: Unsupervisedly Learned Relational Graphs as Transferable - Representations",cs.LG cs.CL cs.CV stat.ML," Modern deep transfer learning approaches have mainly focused on learning -generic feature vectors from one task that are transferable to other tasks, -such as word embeddings in language and pretrained convolutional features in -vision. However, these approaches usually transfer unary features and largely -ignore more structured graphical representations. This work explores the -possibility of learning generic latent relational graphs that capture -dependencies between pairs of data units (e.g., words or pixels) from -large-scale unlabeled data and transferring the graphs to downstream tasks. Our -proposed transfer learning framework improves performance on various tasks -including question answering, natural language inference, sentiment analysis, -and image classification. We also show that the learned graphs are generic -enough to be transferred to different embeddings on which the graphs have not -been trained (including GloVe embeddings, ELMo embeddings, and task-specific -RNN hidden unit), or embedding-free units such as image pixels. -" -8003,1806.05740,"Rediet Abebe, Shawndra Hill, Jennifer Wortman Vaughan, Peter M. Small, - H. Andrew Schwartz",Using Search Queries to Understand Health Information Needs in Africa,cs.CY cs.AI cs.CL," The lack of comprehensive, high-quality health data in developing nations -creates a roadblock for combating the impacts of disease. One key challenge is -understanding the health information needs of people in these nations. Without -understanding people's everyday needs, concerns, and misconceptions, health -organizations and policymakers lack the ability to effectively target education -and programming efforts. In this paper, we propose a bottom-up approach that -uses search data from individuals to uncover and gain insight into health -information needs in Africa. We analyze Bing searches related to HIV/AIDS, -malaria, and tuberculosis from all 54 African nations. For each disease, we -automatically derive a set of common search themes or topics, revealing a -wide-spread interest in various types of information, including disease -symptoms, drugs, concerns about breastfeeding, as well as stigma, beliefs in -natural cures, and other topics that may be hard to uncover through traditional -surveys. We expose the different patterns that emerge in health information -needs by demographic groups (age and sex) and country. We also uncover -discrepancies in the quality of content returned by search engines to users by -topic. Combined, our results suggest that search data can help illuminate -health information needs in Africa and inform discussions on health policy and -targeted education efforts both on- and offline. -" -8004,1806.05838,"Marco Del Tredici, Raquel Fern\'andez","The Road to Success: Assessing the Fate of Linguistic Innovations in - Online Communities",cs.CL cs.SI," We investigate the birth and diffusion of lexical innovations in a large -dataset of online social communities. We build on sociolinguistic theories and -focus on the relation between the spread of a novel term and the social role of -the individuals who use it, uncovering characteristics of innovators and -adopters. Finally, we perform a prediction task that allows us to anticipate -whether an innovation will successfully spread within a community. 
-" -8005,1806.05847,Marco Del Tredici and Raquel Fern\'andez,Semantic Variation in Online Communities of Practice,cs.CL," We introduce a framework for quantifying semantic variation of common words -in Communities of Practice and in sets of topic-related communities. We show -that while some meaning shifts are shared across related communities, others -are community-specific, and therefore independent from the discussed topic. We -propose such findings as evidence in favour of sociolinguistic theories of -socially-driven semantic variation. Results are evaluated using an independent -language modelling task. Furthermore, we investigate extralinguistic features -and show that factors such as prominence and dissemination of words are related -to semantic variation. -" -8006,1806.05900,"Arne K\""ohn, Timo Baumann, Oskar D\""orfler",An Empirical Analysis of the Correlation of Syntax and Prosody,cs.CL," The relation of syntax and prosody (the syntax--prosody interface) has been -an active area of research, mostly in linguistics and typically studied under -controlled conditions. More recently, prosody has also been successfully used -in the data-based training of syntax parsers. However, there is a gap between -the controlled and detailed study of the individual effects between syntax and -prosody and the large-scale application of prosody in syntactic parsing with -only a shallow analysis of the respective influences. In this paper, we close -the gap by investigating the significance of correlations of prosodic -realization with specific syntactic functions using linear mixed effects models -in a very large corpus of read-out German encyclopedic texts. Using this -corpus, we are able to analyze prosodic structuring performed by a diverse set -of speakers while they try to optimize factual content delivery. After -normalization by speaker, we obtain significant effects, e.g. confirming that -the subject function, as compared to the object function, has a positive effect -on pitch and duration of a word, but a negative effect on loudness. -" -8007,1806.05947,Nikos Engonopoulos and Christoph Teichmann and Alexander Koller,Discovering User Groups for Natural Language Generation,cs.CL," We present a model which predicts how individual users of a dialog system -understand and produce utterances based on user groups. In contrast to previous -work, these user groups are not specified beforehand, but learned in training. -We evaluate on two referring expression (RE) generation tasks; our experiments -show that our model can identify user groups and learn how to most effectively -talk to them, and can dynamically assign unseen users to the correct groups as -they interact with the system. -" -8008,1806.05997,"Suman Banerjee, Nikita Moghe, Siddhartha Arora and Mitesh M. Khapra",A Dataset for Building Code-Mixed Goal Oriented Conversation Systems,cs.CL," There is an increasing demand for goal-oriented conversation systems which -can assist users in various day-to-day activities such as booking tickets, -restaurant reservations, shopping, etc. Most of the existing datasets for -building such conversation systems focus on monolingual conversations and there -is hardly any work on multilingual and/or code-mixed conversations. Such -datasets and systems thus do not cater to the multilingual regions of the -world, such as India, where it is very common for people to speak more than one -language and seamlessly switch between them resulting in code-mixed -conversations. 
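The speaker normalization in entry 1806.05900 above is the textbook use case for random intercepts; a hypothetical sketch with statsmodels (all column names and values are invented for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical rows: one word token per row, with its syntactic function,
# its measured pitch, and the speaker who read it.
df = pd.DataFrame({
    "pitch":    [190.0, 172.5, 201.3, 168.9, 185.2, 175.0, 198.7, 171.1],
    "function": ["subject", "object"] * 4,
    "speaker":  ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"],
})

# Linear mixed model: fixed effect of syntactic function on pitch,
# random intercept per speaker to normalize out speaker differences.
result = smf.mixedlm("pitch ~ function", df, groups=df["speaker"]).fit()
print(result.params)
```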
For example, a Hindi speaking user looking to book a restaurant
-would typically ask, ""Kya tum is restaurant mein ek table book karne mein meri
-help karoge?"" (""Can you help me in booking a table at this restaurant?""). To
-facilitate the development of such code-mixed conversation models, we build a
-goal-oriented dialog dataset containing code-mixed conversations. Specifically,
-we take the text from the DSTC2 restaurant reservation dataset and create
-code-mixed versions of it in Hindi-English, Bengali-English, Gujarati-English
-and Tamil-English. We also establish initial baselines on this dataset using
-existing state-of-the-art models. This dataset along with our baseline
-implementations is made publicly available for research purposes.
-"
-8009,1806.06176,"Yao-Hung Hubert Tsai and Paul Pu Liang and Amir Zadeh and
- Louis-Philippe Morency and Ruslan Salakhutdinov",Learning Factorized Multimodal Representations,cs.LG cs.CL cs.CV stat.ML," Learning multimodal representations is a fundamentally complex research
-problem due to the presence of multiple heterogeneous sources of information.
-Although the presence of multiple modalities provides additional valuable
-information, there are two key challenges to address when learning from
-multimodal data: 1) models must learn the complex intra-modal and cross-modal
-interactions for prediction and 2) models must be robust to unexpected missing
-or noisy modalities during testing. In this paper, we propose to optimize for a
-joint generative-discriminative objective across multimodal data and labels. We
-introduce a model that factorizes representations into two sets of independent
-factors: multimodal discriminative and modality-specific generative factors.
-Multimodal discriminative factors are shared across all modalities and contain
-joint multimodal features required for discriminative tasks such as sentiment
-prediction. Modality-specific generative factors are unique for each modality
-and contain the information required for generating data. Experimental results
-show that our model is able to learn meaningful multimodal representations that
-achieve state-of-the-art or competitive performance on six multimodal datasets.
-Our model demonstrates flexible generative capabilities by conditioning on
-independent factors and can reconstruct missing modalities without
-significantly impacting performance. Lastly, we interpret our factorized
-representations to understand the interactions that influence multimodal
-learning.
-"
-8010,1806.06187,"Wenhan Xiong, Xiaoxiao Guo, Mo Yu, Shiyu Chang, Bowen Zhou, William
- Yang Wang","Scheduled Policy Optimization for Natural Language Communication with
- Intelligent Agents",cs.CL," We investigate the task of learning to follow natural language instructions
-by jointly reasoning with visual observations and language inputs. In contrast
-to existing methods which start with learning from demonstrations (LfD) and
-then use reinforcement learning (RL) to fine-tune the model parameters, we
-propose a novel policy optimization algorithm which dynamically schedules
-demonstration learning and RL. The proposed training paradigm provides
-efficient exploration and better generalization beyond existing methods.
-Compared to existing ensemble models, the best single model based on our
-proposed method decreases the execution error by over 50% on a
-block-world environment.
To further illustrate the exploration strategy of our
-RL algorithm, we also include systematic studies on the evolution of policy
-entropy during training.
-"
-8011,1806.06200,"Pengcheng Guo, Haihua Xu, Lei Xie, Eng Siong Chng","Study of Semi-supervised Approaches to Improving English-Mandarin
- Code-Switching Speech Recognition",cs.CL," In this paper, we present our overall efforts to improve the performance of a
-code-switching speech recognition system using semi-supervised training methods
-from lexicon learning to acoustic modeling, on the South East Asian
-Mandarin-English (SEAME) data. We first investigate a semi-supervised lexicon
-learning approach to adapt the canonical lexicon, which is meant to alleviate
-the heavily accented pronunciation issue within the code-switching conversation
-of the local area. As a result, the learned lexicon yields improved
-performance. Furthermore, we attempt to use semi-supervised training to deal
-with those transcriptions that are highly mismatched between human transcribers
-and the ASR system. Specifically, we conduct semi-supervised training treating
-those poorly transcribed data as unsupervised data. We found that the
-semi-supervised acoustic modeling can lead to improved results. Finally, to
-make up for the limitation of the conventional n-gram language models due to
-the data sparsity issue, we perform lattice rescoring using neural network
-language models, and significant WER reduction is obtained.
-"
-8012,1806.06208,"Sauradip Nag, Pallab Kumar Ganguly, Sumit Roy, Sourab Jha, Krishna
- Bose, Abhishek Jha, Kousik Dasgupta","Offline Extraction of Indic Regional Language from Natural Scene Image
- using Text Segmentation and Deep Convolutional Sequence",cs.CV cs.AI cs.CL cs.IR," Regional language extraction from a natural scene image is always a
-challenging proposition due to its dependence on the text information extracted
-from the image. Text extraction, on the other hand, varies with lighting
-conditions, arbitrary orientation, inadequate text information, heavy background
-influence over the text and changes in text appearance. This paper presents a
-novel unified method for tackling the above challenges. The proposed work uses
-an image correction and segmentation technique on the existing text detection
-pipeline, the Efficient and Accurate Scene Text Detector (EAST). EAST uses the
-standard PVAnet architecture to select features and non-maximal suppression to
-detect text in an image. Text recognition is done using a combined architecture
-of a MaxOut convolutional neural network (CNN) and a bidirectional long
-short-term memory (LSTM) network. After recognizing text using the deep
-learning based approach, the native languages are translated to English and
-tokenized using standard text tokenizers. The tokens that very likely represent
-a location are used to find the Global Positioning System (GPS) coordinates of
-the location, and subsequently the regional languages spoken in that location
-are extracted. The proposed method is tested on a self-generated dataset
-collected from a Government of India dataset and experimented on a standard
-dataset to evaluate the performance of the proposed technique. A comparative
-study with a few state-of-the-art methods on text detection, recognition and
-extraction of regional languages from images shows that the proposed method
-outperforms the existing methods.
-" -8013,1806.06219,Nikolaos Pappas and James Henderson,GILE: A Generalized Input-Label Embedding for Text Classification,cs.CL," Neural text classification models typically treat output labels as -categorical variables which lack description and semantics. This forces their -parametrization to be dependent on the label set size, and, hence, they are -unable to scale to large label sets and generalize to unseen ones. Existing -joint input-label text models overcome these issues by exploiting label -descriptions, but they are unable to capture complex label relationships, have -rigid parametrization, and their gains on unseen labels happen often at the -expense of weak performance on the labels seen during training. In this paper, -we propose a new input-label model which generalizes over previous such models, -addresses their limitations and does not compromise performance on seen labels. -The model consists of a joint non-linear input-label embedding with -controllable capacity and a joint-space-dependent classification unit which is -trained with cross-entropy loss to optimize classification performance. We -evaluate models on full-resource and low- or zero-resource text classification -of multilingual news and biomedical text with a large label set. Our model -outperforms monolingual and multilingual models which do not leverage label -semantics and previous joint input-label space models in both scenarios. -" -8014,1806.06228,"N. Majumder, D. Hazarika, A. Gelbukh, E. Cambria, S. Poria","Multimodal Sentiment Analysis using Hierarchical Fusion with Context - Modeling",cs.CL cs.CV," Multimodal sentiment analysis is a very actively growing field of research. A -promising area of opportunity in this field is to improve the multimodal fusion -mechanism. We present a novel feature fusion strategy that proceeds in a -hierarchical fashion, first fusing the modalities two in two and only then -fusing all three modalities. On multimodal sentiment analysis of individual -utterances, our strategy outperforms conventional concatenation of features by -1%, which amounts to 5% reduction in error rate. On utterance-level multimodal -sentiment analysis of multi-utterance video clips, for which current -state-of-the-art techniques incorporate contextual information from other -utterances of the same clip, our hierarchical fusion gives up to 2.4% (almost -10% error rate reduction) over currently used concatenation. The implementation -of our method is publicly available in the form of open-source code. -" -8015,1806.06259,"Christian S. Perone, Roberto Silveira, Thomas S. Paula","Evaluation of sentence embeddings in downstream and linguistic probing - tasks",cs.CL," Despite the fast developmental pace of new sentence embedding methods, it is -still challenging to find comprehensive evaluations of these different -techniques. In the past years, we saw significant improvements in the field of -sentence embeddings and especially towards the development of universal -sentence encoders that could provide inductive transfer to a wide variety of -downstream tasks. In this work, we perform a comprehensive evaluation of recent -methods using a wide variety of downstream and linguistic feature probing -tasks. We show that a simple approach using bag-of-words with a recently -introduced language model for deep context-dependent word embeddings proved to -yield better results in many tasks when compared to sentence encoders trained -on entailment datasets. 
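The bag-of-words encoder referred to in entry 1806.06259 above amounts to averaging per-token vectors from a (possibly context-dependent) embedding model; a stand-in sketch:

```python
import numpy as np

def sentence_embedding(token_vectors):
    """Bag-of-words sentence representation: the mean of its token vectors."""
    return np.mean(token_vectors, axis=0)

# Stand-in for per-token vectors from any embedding model
# (e.g. 1024-dim contextual states); 7 tokens in this sentence.
tokens = np.random.default_rng(2).normal(size=(7, 1024))
print(sentence_embedding(tokens).shape)  # (1024,)
```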
We also show, however, that we are still far away from -a universal encoder that can perform consistently across several downstream -tasks. -" -8016,1806.06301,"Adam Sutton, Thomas Lansdall-Welfare, Nello Cristianini","Biased Embeddings from Wild Data: Measuring, Understanding and Removing",cs.CL cs.AI stat.ML," Many modern Artificial Intelligence (AI) systems make use of data embeddings, -particularly in the domain of Natural Language Processing (NLP). These -embeddings are learnt from data that has been gathered ""from the wild"" and have -been found to contain unwanted biases. In this paper we make three -contributions towards measuring, understanding and removing this problem. We -present a rigorous way to measure some of these biases, based on the use of -word lists created for social psychology applications; we observe how gender -bias in occupations reflects actual gender bias in the same occupations in the -real world; and finally we demonstrate how a simple projection can -significantly reduce the effects of embedding bias. All this is part of an -ongoing effort to understand how trust can be built into AI systems. -" -8017,1806.06342,"Linhao Dong, Shiyu Zhou, Wei Chen, Bo Xu","Extending Recurrent Neural Aligner for Streaming End-to-End Speech - Recognition in Mandarin",cs.SD cs.CL eess.AS," End-to-end models have been showing superiority in Automatic Speech -Recognition (ASR). At the same time, the capacity of streaming recognition has -become a growing requirement for end-to-end models. Following these trends, an -encoder-decoder recurrent neural network called Recurrent Neural Aligner (RNA) -has been freshly proposed and shown its competitiveness on two English ASR -tasks. However, it is not clear if RNA can be further improved and applied to -other spoken language. In this work, we explore the applicability of RNA in -Mandarin Chinese and present four effective extensions: In the encoder, we -redesign the temporal down-sampling and introduce a powerful convolutional -structure. In the decoder, we utilize a regularizer to smooth the output -distribution and conduct joint training with a language model. On two Mandarin -Chinese conversational telephone speech recognition (MTS) datasets, our -Extended-RNA obtains promising performance. Particularly, it achieves 27.7% -character error rate (CER), which is superior to current state-of-the-art -result on the popular HKUST task. -" -8018,1806.06349,"Huiming Jin, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, - Leyu Lin",Incorporating Chinese Characters of Words for Lexical Sememe Prediction,cs.CL cs.AI cs.LG," Sememes are minimum semantic units of concepts in human languages, such that -each word sense is composed of one or multiple sememes. Words are usually -manually annotated with their sememes by linguists, and form linguistic -common-sense knowledge bases widely used in various NLP tasks. Recently, the -lexical sememe prediction task has been introduced. It consists of -automatically recommending sememes for words, which is expected to improve -annotation efficiency and consistency. However, existing methods of lexical -sememe prediction typically rely on the external context of words to represent -the meaning, which usually fails to deal with low-frequency and -out-of-vocabulary words. To address this issue for Chinese, we propose a novel -framework to take advantage of both internal character information and external -context information of words. 
We experiment on HowNet, a Chinese sememe
-knowledge base, and demonstrate that our framework outperforms state-of-the-art
-baselines by a large margin, and maintains robust performance even for
-low-frequency words.
-"
-8019,1806.06371,"Lisa Beinborn, Teresa Botschen and Iryna Gurevych",Multimodal Grounding for Language Processing,cs.CL cs.AI," This survey discusses how recent developments in multimodal processing
-facilitate conceptual grounding of language. We categorize the information flow
-in multimodal processing with respect to cognitive models of human information
-processing and analyze different methods for combining multimodal
-representations. Based on this methodological inventory, we discuss the benefit
-of multimodal grounding for a variety of language processing tasks and the
-challenges that arise. We particularly focus on multimodal grounding of verbs,
-which play a crucial role for the compositional power of language.
-"
-8020,1806.06407,"Bijoyan Das, Sarit Chakraborty","An Improved Text Sentiment Classification Model Using TF-IDF and Next
- Word Negation",cs.CL cs.IR," With the rapid growth of text sentiment analysis, the demand for automatic
-classification of electronic documents has increased by leaps and bounds. The
-paradigm of text classification or text mining has been the subject of many
-research works in recent times. In this paper we propose a technique for text
-sentiment classification using term frequency-inverse document frequency
-(TF-IDF) along with Next Word Negation (NWN). We have also compared the
-performances of the binary bag-of-words model, the TF-IDF model and the TF-IDF with next
-word negation (TF-IDF-NWN) model for text classification. Our proposed model is
-then applied to three different text mining algorithms, and we found the Linear
-Support Vector Machine (LSVM) to be the most appropriate to work with our proposed
-model. The achieved results show a significant increase in accuracy compared to
-earlier methods.
-"
-8021,1806.06411,"Svitlana Vakulenko, Maarten de Rijke, Michael Cochez, Vadim Savenkov,
- Axel Polleres",Measuring Semantic Coherence of a Conversation,cs.CL cs.AI," Conversational systems have become increasingly popular as a way for humans
-to interact with computers. To be able to provide intelligent responses,
-conversational systems must correctly model the structure and semantics of a
-conversation. We introduce the task of measuring semantic (in)coherence in a
-conversation with respect to background knowledge, which relies on the
-identification of semantic relations between concepts introduced during a
-conversation. We propose and evaluate graph-based and machine learning-based
-approaches for measuring semantic coherence using knowledge graphs, their
-vector space embeddings and word embedding models, as sources of background
-knowledge. We demonstrate how these approaches are able to uncover different
-coherence patterns in conversations on the Ubuntu Dialogue Corpus.
-"
-8022,1806.06478,"Muhao Chen, Yingtao Tian, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo","Co-training Embeddings of Knowledge Graphs and Entity Descriptions for
- Cross-lingual Entity Alignment",cs.AI cs.CL," Multilingual knowledge graph (KG) embeddings provide latent semantic
-representations of entities and structured knowledge with cross-lingual
-inferences, which benefit various knowledge-driven cross-lingual NLP tasks.
-However, precisely learning such cross-lingual inferences is usually hindered
-by the low coverage of entity alignment in many KGs.
Since many multilingual -KGs also provide literal descriptions of entities, in this paper, we introduce -an embedding-based approach which leverages a weakly aligned multilingual KG -for semi-supervised cross-lingual learning using entity descriptions. Our -approach performs co-training of two embedding models, i.e. a multilingual KG -embedding model and a multilingual literal description embedding model. The -models are trained on a large Wikipedia-based trilingual dataset where most -entity alignment is unknown to training. Experimental results show that the -performance of the proposed approach on the entity alignment task improves at -each iteration of co-training, and eventually reaches a stage at which it -significantly surpasses previous approaches. We also show that our approach has -promising abilities for zero-shot entity alignment, and cross-lingual KG -completion. -" -8023,1806.06513,"Chao Zhang, Philip Woodland",Semi-tied Units for Efficient Gating in LSTM and Highway Networks,cs.CL cs.LG eess.AS stat.ML," Gating is a key technique used for integrating information from multiple -sources by long short-term memory (LSTM) models and has recently also been -applied to other models such as the highway network. Although gating is -powerful, it is rather expensive in terms of both computation and storage as -each gating unit uses a separate full weight matrix. This issue can be severe -since several gates can be used together in e.g. an LSTM cell. This paper -proposes a semi-tied unit (STU) approach to solve this efficiency issue, which -uses one shared weight matrix to replace those in all the units in the same -layer. The approach is termed ""semi-tied"" since extra parameters are used to -separately scale each of the shared output values. These extra scaling factors -are associated with the network activation functions and result in the use of -parameterised sigmoid, hyperbolic tangent, and rectified linear unit functions. -Speech recognition experiments using British English multi-genre broadcast data -showed that using STUs can reduce the calculation and storage cost by a factor -of three for highway networks and four for LSTMs, while giving similar word -error rates to the original models. -" -8024,1806.06571,Tom Kocmi and Ond\v{r}ej Bojar,SubGram: Extending Skip-gram Word Representation with Substrings,cs.CL," Skip-gram (word2vec) is a recent method for creating vector representations -of words (""distributed word representations"") using a neural network. The -representation gained popularity in various areas of natural language -processing, because it seems to capture syntactic and semantic information -about words without any explicit supervision in this respect. We propose -SubGram, a refinement of the Skip-gram model to consider also the word -structure during the training process, achieving large gains on the Skip-gram -original test set. -" -8025,1806.06583,"Xuefei Ning, Yin Zheng, Zhuxi Jiang, Yu Wang, Huazhong Yang, Junzhou - Huang",Nonparametric Topic Modeling with Neural Inference,cs.CL cs.IR cs.LG," This work focuses on combining nonparametric topic models with Auto-Encoding -Variational Bayes (AEVB). Specifically, we first propose iTM-VAE, where the -topics are treated as trainable parameters and the document-specific topic -proportions are obtained by a stick-breaking construction. The inference of -iTM-VAE is modeled by neural networks such that it can be computed in a simple -feed-forward manner. 
We also describe how to introduce a hyper-prior into -iTM-VAE so as to model the uncertainty of the prior parameter. Actually, the -hyper-prior technique is quite general and we show that it can be applied to -other AEVB based models to alleviate the {\it collapse-to-prior} problem -elegantly. Moreover, we also propose HiTM-VAE, where the document-specific -topic distributions are generated in a hierarchical manner. HiTM-VAE is even -more flexible and can generate topic distributions with better variability. -Experimental results on 20News and Reuters RCV1-V2 datasets show that the -proposed models outperform the state-of-the-art baselines significantly. The -advantages of the hyper-prior technique and the hierarchical model construction -are also confirmed by experiments. -" -8026,1806.06626,"Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson","On Enhancing Speech Emotion Recognition using Generative Adversarial - Networks",cs.CL," Generative Adversarial Networks (GANs) have gained a lot of attention from -machine learning community due to their ability to learn and mimic an input -data distribution. GANs consist of a discriminator and a generator working in -tandem playing a min-max game to learn a target underlying data distribution; -when fed with data-points sampled from a simpler distribution (like uniform or -Gaussian distribution). Once trained, they allow synthetic generation of -examples sampled from the target distribution. We investigate the application -of GANs to generate synthetic feature vectors used for speech emotion -recognition. Specifically, we investigate two set ups: (i) a vanilla GAN that -learns the distribution of a lower dimensional representation of the actual -higher dimensional feature vector and, (ii) a conditional GAN that learns the -distribution of the higher dimensional feature vectors conditioned on the -labels or the emotional class to which it belongs. As a potential practical -application of these synthetically generated samples, we measure any -improvement in a classifier's performance when the synthetic data is used along -with real data for training. We perform cross-validation analyses followed by a -cross-corpus study. -" -8027,1806.06734,"Pierre Godard, Marcely Zanon-Boito, Lucas Ondel, Alexandre Berard, - Fran\c{c}ois Yvon, Aline Villavicencio, and Laurent Besacier",Unsupervised Word Segmentation from Speech with Attention,cs.CL cs.AI," We present a first attempt to perform attentional word segmentation directly -from the speech signal, with the final goal to automatically identify lexical -units in a low-resource, unwritten language (UL). Our methodology assumes a -pairing between recordings in the UL with translations in a well-resourced -language. It uses Acoustic Unit Discovery (AUD) to convert speech into a -sequence of pseudo-phones that is segmented using neural soft-alignments -produced by a neural machine translation model. Evaluation uses an actual Bantu -UL, Mboshi; comparisons to monolingual and bilingual baselines illustrate the -potential of attentional word segmentation for language documentation. -" -8028,1806.06874,Ruixi Lin,"Combining Word Feature Vector Method with the Convolutional Neural - Network for Slot Filling in Spoken Language Understanding",cs.CL," Slot filling is an important problem in Spoken Language Understanding (SLU) -and Natural Language Processing (NLP), which involves identifying a user's -intent and assigning a semantic concept to each word in a sentence. 
This paper
-presents a word feature vector method and combines it with a convolutional
-neural network (CNN). We consider 18 word features, and each word feature is
-constructed by merging similar word labels. By introducing the concept of an
-external library, we propose a feature set approach that is beneficial for
-building the relationship between a word from the training dataset and the
-feature. Computational results are reported using the ATIS dataset, and
-comparisons with a traditional CNN as well as a bi-directional sequential CNN are
-also presented.
-"
-8029,1806.06950,"Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-jui Hsieh","GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model
- Shrinking",cs.CL cs.AI cs.LG stat.ML," Model compression is essential for serving large deep neural nets on devices
-with limited resources or applications that require real-time responses. As a
-case study, a state-of-the-art neural language model usually consists of one or
-more recurrent layers sandwiched between an embedding layer used for
-representing input tokens and a softmax layer for generating output tokens. For
-problems with a very large vocabulary size, the embedding and the softmax
-matrices can account for more than half of the model size. For instance, the
-bigLSTM model achieves state-of-the-art performance on the One-Billion-Word
-(OBW) dataset with a vocabulary of around 800k words, and its word embedding and softmax
-matrices use more than 6 GB of space and are responsible for over 90% of the
-model parameters. In this paper, we propose GroupReduce, a novel compression
-method for neural language models, based on vocabulary-partition (block) based
-low-rank matrix approximation and the inherent frequency distribution of tokens
-(the power-law distribution of words). The experimental results show our method
-can significantly outperform traditional compression methods such as low-rank
-approximation and pruning. On the OBW dataset, our method achieved a 6.6-times
-compression rate for the embedding and softmax matrices, and when combined with
-quantization, our method can achieve a 26-times compression rate, which
-translates to a factor of 12.8 times compression for the entire model with very
-little degradation in perplexity.
-"
-8030,1806.06957,"Surafel M. Lakew, Mauro Cettolo, Marcello Federico","A Comparison of Transformer and Recurrent Neural Networks on
- Multilingual Neural Machine Translation",cs.CL," Recently, neural machine translation (NMT) has been extended to
-multilinguality, that is, to handle more than one translation direction with a
-single system. Multilingual NMT showed competitive performance against pure
-bilingual systems. Notably, in low-resource settings, it proved to work
-effectively and efficiently, thanks to a shared representation space that is
-forced across languages and induces a sort of transfer learning. Furthermore,
-multilingual NMT enables so-called zero-shot inference across language pairs
-never seen at training time. Despite the increasing interest in this framework,
-an in-depth analysis of what a multilingual NMT model is capable of and what it
-is not is still missing.
Motivated by this, our work (i) provides a -quantitative and comparative analysis of the translations produced by -bilingual, multilingual and zero-shot systems; (ii) investigates the -translation quality of two of the currently dominant neural architectures in -MT, which are the Recurrent and the Transformer ones; and (iii) quantitatively -explores how the closeness between languages influences the zero-shot -translation. Our analysis leverages multiple professional post-edits of -automatic translations by several different systems and focuses both on -automatic standard metrics (BLEU and TER) and on widely used error categories, -which are lexical, morphology, and word order errors. -" -8031,1806.06972,Soumya Wadhwa and Khyathi Raghavi Chandu and Eric Nyberg,Comparative Analysis of Neural QA models on SQuAD,cs.CL cs.AI," The task of Question Answering has gained prominence in the past few decades -for testing the ability of machines to understand natural language. Large -datasets for Machine Reading have led to the development of neural models that -cater to deeper language understanding compared to information retrieval tasks. -Different components in these neural architectures are intended to tackle -different challenges. As a first step towards achieving generalization across -multiple domains, we attempt to understand and compare the peculiarities of -existing end-to-end neural models on the Stanford Question Answering Dataset -(SQuAD) by performing quantitative as well as qualitative analysis of the -results attained by each of them. We observed that prediction errors reflect -certain model-specific biases, which we further discuss in this paper. -" -8032,1806.06998,Leif W. Hanlen and Richard Nock and Hanna Suominen and Neil Bacon,Private Text Classification,cs.CL cs.CR cs.IR," Confidential text corpora exist in many forms, but do not allow arbitrary -sharing. We explore how to use such private corpora using privacy preserving -text analytics. We construct typical text processing applications using -appropriate privacy preservation techniques (including homomorphic encryption, -Rademacher operators and secure computation). We set out the preliminary -materials from Rademacher operators for binary classifiers, and then construct -basic text processing approaches to match those binary classifiers. -" -8033,1806.07000,"Jingyuan Li, Xiao Sun","A Syntactically Constrained Bidirectional-Asynchronous Approach for - Emotional Conversation Generation",cs.CL cs.LG," Traditional neural language models tend to generate generic replies with poor -logic and no emotion. In this paper, a syntactically constrained -bidirectional-asynchronous approach for emotional conversation generation -(E-SCBA) is proposed to address this issue. In our model, pre-generated emotion -keywords and topic keywords are asynchronously introduced into the process of -decoding. It is much different from most existing methods which generate -replies from the first word to the last. Through experiments, the results -indicate that our approach not only improves the diversity of replies, but -gains a boost on both logic and emotion compared with baselines. -" -8034,1806.07039,"Linkai Luo, Haiqing Yang and Francis Y. L. Chin","EmotionX-DLC: Self-Attentive BiLSTM for Detecting Sequential Emotions in - Dialogue",cs.CL," In this paper, we propose a self-attentive bidirectional long short-term -memory (SA-BiLSTM) network to predict multiple emotions for the EmotionX -challenge. 
The BiLSTM exhibits the power of modeling the word dependencies, and -extracting the most relevant features for emotion classification. Building on -top of BiLSTM, the self-attentive network can model the contextual dependencies -between utterances which are helpful for classifying the ambiguous emotions. We -achieve 59.6 and 55.0 unweighted accuracy scores in the \textit{Friends} and -the \textit{EmotionPush} test sets, respectively. -" -8035,1806.07042,"Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, Ming Zhou",Response Generation by Context-aware Prototype Editing,cs.CL," Open domain response generation has achieved remarkable progress in recent -years, but sometimes yields short and uninformative responses. We propose a new -paradigm for response generation, that is response generation by editing, which -significantly increases the diversity and informativeness of the generation -results. Our assumption is that a plausible response can be generated by -slightly revising an existing response prototype. The prototype is retrieved -from a pre-defined index and provides a good start-point for generation because -it is grammatical and informative. We design a response editing model, where an -edit vector is formed by considering differences between a prototype context -and a current context, and then the edit vector is fed to a decoder to revise -the prototype response for the current context. Experiment results on a large -scale dataset demonstrate that the response editing model outperforms -generative and retrieval-based models on various aspects. -" -8036,1806.07072,"Sauradip Nag, Palaiahnakote Shivakumara, Wu Yirui, Umapada Pal, and - Tong Lu","A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality - Identification",cs.CV cs.AI cs.CG cs.CL," Identifying crime for forensic investigating teams when crimes involve people -of different nationals is challenging. This paper proposes a new method for -ethnicity (nationality) identification based on Cloud of Line Distribution -(COLD) features of handwriting components. The proposed method, at first, -explores tangent angle for the contour pixels in each row and the mean of -intensity values of each row in an image for segmenting text lines. For -segmented text lines, we use tangent angle and direction of base lines to -remove rule lines in the image. We use polygonal approximation for finding -dominant points for contours of edge components. Then the proposed method -connects the nearest dominant points of every dominant point, which results in -line segments of dominant point pairs. For each line segment, the proposed -method estimates angle and length, which gives a point in polar domain. For all -the line segments, the proposed method generates dense points in polar domain, -which results in COLD distribution. As character component shapes change, -according to nationals, the shape of the distribution changes. This observation -is extracted based on distance from pixels of distribution to Principal Axis of -the distribution. Then the features are subjected to an SVM classifier for -identifying nationals. 
Experiments conducted on a complex dataset
-show that the proposed method is effective and outperforms the existing method.
-"
-8037,1806.07098,"Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert,
- Emmanuel Dupoux",End-to-End Speech Recognition From the Raw Waveform,cs.CL cs.SD eess.AS," State-of-the-art speech recognition systems rely on fixed, hand-crafted
-features such as mel-filterbanks to preprocess the waveform before the training
-pipeline. In this paper, we study end-to-end systems trained directly from the
-raw waveform, building on two alternatives for trainable replacements of
-mel-filterbanks that use a convolutional architecture. The first one is
-inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015),
-and the second one by the scattering transform (Zeghidour et al., 2017). We
-propose two modifications to these architectures and systematically compare
-them to mel-filterbanks, on the Wall Street Journal dataset. The first
-modification is the addition of an instance normalization layer, which greatly
-improves on the gammatone-based trainable filterbanks and speeds up the
-training of the scattering-based filterbanks. The second one relates to the
-low-pass filter used in these approaches. These modifications consistently
-improve performance for both approaches, and remove the need for careful
-initialization in scattering-based trainable filterbanks. In particular, we
-show a consistent improvement in word error rate of the trainable filterbanks
-relative to comparable mel-filterbanks. It is the first time that end-to-end
-models trained from the raw signal significantly outperform mel-filterbanks on
-a large vocabulary task under clean recording conditions.
-"
-8038,1806.07139,"Henry B. Moss, David S. Leslie and Paul Rayson","Using J-K fold Cross Validation to Reduce Variance When Tuning NLP
- Models",cs.CL stat.ML," K-fold cross validation (CV) is a popular method for estimating the true
-performance of machine learning models, allowing model selection and parameter
-tuning. However, the very process of CV requires random partitioning of the
-data, and so our performance estimates are in fact stochastic, with variability
-that can be substantial for natural language processing tasks. We demonstrate
-that these unstable estimates cannot be relied upon for effective parameter
-tuning. The resulting tuned parameters are highly sensitive to how our data is
-partitioned, meaning that we often select sub-optimal parameter choices and
-have serious reproducibility issues.
- Instead, we propose to use the less variable J-K-fold CV, in which J
-independent K-fold cross validations are used to assess performance. Our main
-contributions are extending J-K-fold CV from performance estimation to
-parameter tuning and investigating how to choose J and K. We argue that
-variability is more important than bias for effective tuning, and so advocate
-lower choices of K than are typically seen in the NLP literature, instead using
-the saved computation to increase J. To demonstrate the generality of our
-recommendations, we investigate a wide range of case studies: sentiment
-classification (both general and target-specific), part-of-speech tagging and
-document classification.
-" -8039,1806.07169,Pavel Petrushkov and Shahram Khadivi and Evgeny Matusov,Learning from Chunk-based Feedback in Neural Machine Translation,cs.CL," We empirically investigate learning from partial feedback in neural machine -translation (NMT), when partial feedback is collected by asking users to -highlight a correct chunk of a translation. We propose a simple and effective -way of utilizing such feedback in NMT training. We demonstrate how the common -machine translation problem of domain mismatch between training and deployment -can be reduced solely based on chunk-level user feedback. We conduct a series -of simulation experiments to test the effectiveness of the proposed method. Our -results show that chunk-level feedback outperforms sentence based feedback by -up to 2.61% BLEU absolute. -" -8040,1806.07186,"Jan Vanek, Josef Michalek, Josef Psutka",Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task,cs.CL," In this paper, we have investigated recurrent deep neural networks (DNNs) in -combination with regularization techniques as dropout, zoneout, and -regularization post-layer. As a benchmark, we chose the TIMIT phone recognition -task due to its popularity and broad availability in the community. It also -simulates a low-resource scenario that is helpful in minor languages. Also, we -prefer the phone recognition task because it is much more sensitive to an -acoustic model quality than a large vocabulary continuous speech recognition -task. In recent years, recurrent DNNs pushed the error rates in automatic -speech recognition down. But, there was no clear winner in proposed -architectures. The dropout was used as the regularization technique in most -cases, but combination with other regularization techniques together with model -ensembles was omitted. However, just an ensemble of recurrent DNNs performed -best and achieved an average phone error rate from 10 experiments 14.84 % -(minimum 14.69 %) on core test set that is slightly lower then the -best-published PER to date, according to our knowledge. Finally, in contrast of -the most papers, we published the open-source scripts to easily replicate the -results and to help continue the development. -" -8041,1806.07221,"Xuan-Son Vu, Lili Jiang",Self-adaptive Privacy Concern Detection for User-generated Content,cs.CR cs.CL," To protect user privacy in data analysis, a state-of-the-art strategy is -differential privacy in which scientific noise is injected into the real -analysis output. The noise masks individual's sensitive information contained -in the dataset. However, determining the amount of noise is a key challenge, -since too much noise will destroy data utility while too little noise will -increase privacy risk. Though previous research works have designed some -mechanisms to protect data privacy in different scenarios, most of the existing -studies assume uniform privacy concerns for all individuals. Consequently, -putting an equal amount of noise to all individuals leads to insufficient -privacy protection for some users, while over-protecting others. To address -this issue, we propose a self-adaptive approach for privacy concern detection -based on user personality. Our experimental studies demonstrate the -effectiveness to address a suitable personalized privacy protection for -cold-start users (i.e., without their privacy-concern information in training -data). 
-" -8042,1806.07304,"Han Guo, Ramakanth Pasunuru, Mohit Bansal",Dynamic Multi-Level Multi-Task Learning for Sentence Simplification,cs.CL cs.AI cs.LG," Sentence simplification aims to improve readability and understandability, -based on several operations such as splitting, deletion, and paraphrasing. -However, a valid simplified sentence should also be logically entailed by its -input sentence. In this work, we first present a strong pointer-copy mechanism -based sequence-to-sequence sentence simplification model, and then improve its -entailment and paraphrasing capabilities via multi-task learning with related -auxiliary tasks of entailment and paraphrase generation. Moreover, we propose a -novel 'multi-level' layered soft sharing approach where each auxiliary task -shares different (higher versus lower) level layers of the sentence -simplification model, depending on the task's semantic versus lexico-syntactic -nature. We also introduce a novel multi-armed bandit based training approach -that dynamically learns how to effectively switch across tasks during -multi-task learning. Experiments on multiple popular datasets demonstrate that -our model outperforms competitive simplification systems in SARI and FKGL -automatic metrics, and human evaluation. Further, we present several ablation -analyses on alternative layer sharing methods, soft versus hard sharing, -dynamic multi-armed bandit sampling approaches, and our model's learned -entailment and paraphrasing skills. -" -8043,1806.07346,"Imon Banerjee, Hailey H. Choi, Terry Desser, Daniel L. Rubin","A Scalable Machine Learning Approach for Inferring Probabilistic - US-LI-RADS Categorization",cs.CL cs.AI," We propose a scalable computerized approach for large-scale inference of -Liver Imaging Reporting and Data System (LI-RADS) final assessment categories -in narrative ultrasound (US) reports. Although our model was trained on reports -created using a LI-RADS template, it was also able to infer LI-RADS scoring for -unstructured reports that were created before the LI-RADS guidelines were -established. No human-labelled data was required in any step of this study; for -training, LI-RADS scores were automatically extracted from those reports that -contained structured LI-RADS scores, and it translated the derived knowledge to -reasoning on unstructured radiology reports. By providing automated LI-RADS -categorization, our approach may enable standardizing screening recommendations -and treatment planning of patients at risk for hepatocellular carcinoma, and it -may facilitate AI-based healthcare research with US images by offering large -scale text mining and data gathering opportunities from standard hospital -clinical data repositories. -" -8044,1806.07407,"Tobias Menne, Ralf Schl\""uter, Hermann Ney","Speaker Adapted Beamforming for Multi-Channel Automatic Speech - Recognition",cs.CL cs.SD eess.AS," This paper presents, in the context of multi-channel ASR, a method to adapt a -mask based, statistically optimal beamforming approach to a speaker of -interest. The beamforming vector of the statistically optimal beamformer is -computed by utilizing speech and noise masks, which are estimated by a neural -network. The proposed adaptation approach is based on the integration of the -beamformer, which includes the mask estimation network, and the acoustic model -of the ASR system. 
This allows for the propagation of the training error, from -the acoustic modeling cost function, all the way through the beamforming -operation and through the mask estimation network. By using the results of a -first pass recognition and by keeping all other parameters fixed, the mask -estimation network can therefore be fine tuned by retraining. Utterances of a -speaker of interest can thus be used in a two pass approach, to optimize the -beamforming for the speech characteristics of that specific speaker. It is -shown that this approach improves the ASR performance of a state-of-the-art -multi-channel ASR system on the CHiME-4 data. Furthermore the effect of the -adaptation on the estimated speech masks is discussed. -" -8045,1806.07495,"Hamed Shahbazi, Xiaoli Z. Fern, Reza Ghaeini, Chao Ma, Rasha Obeidat, - Prasad Tadepalli",Joint Neural Entity Disambiguation with Output Space Search,cs.CL cs.AI," In this paper, we present a novel model for entity disambiguation that -combines both local contextual information and global evidences through Limited -Discrepancy Search (LDS). Given an input document, we start from a complete -solution constructed by a local model and conduct a search in the space of -possible corrections to improve the local solution from a global view point. -Our search utilizes a heuristic function to focus more on the least confident -local decisions and a pruning function to score the global solutions based on -their local fitness and the global coherences among the predicted entities. -Experimental results on CoNLL 2003 and TAC 2010 benchmarks verify the -effectiveness of our model. -" -8046,1806.07573,{\O}ystein Repp and Heri Ramampiaro,Extracting News Events from Microblogs,cs.CL cs.IR cs.SI," Twitter stream has become a large source of information for many people, but -the magnitude of tweets and the noisy nature of its content have made -harvesting the knowledge from Twitter a challenging task for researchers for a -long time. Aiming at overcoming some of the main challenges of extracting the -hidden information from tweet streams, this work proposes a new approach for -real-time detection of news events from the Twitter stream. We divide our -approach into three steps. The first step is to use a neural network or deep -learning to detect news-relevant tweets from the stream. The second step is to -apply a novel streaming data clustering algorithm to the detected news tweets -to form news events. The third and final step is to rank the detected events -based on the size of the event clusters and growth speed of the tweet -frequencies. We evaluate the proposed system on a large, publicly available -corpus of annotated news events from Twitter. As part of the evaluation, we -compare our approach with a related state-of-the-art solution. Overall, our -experiments and user-based evaluation show that our approach on detecting -current (real) news events delivers a state-of-the-art performance. -" -8047,1806.07687,"James Thorne, Andreas Vlachos","Automated Fact Checking: Task formulations, methods and future - directions",cs.CL," The recently increased focus on misinformation has stimulated research in -fact checking, the task of assessing the truthfulness of a claim. Research in -automating this task has been conducted in a variety of disciplines including -natural language processing, machine learning, knowledge representation, -databases, and journalism. 
While there has been substantial progress, relevant -papers and articles have been published in research communities that are often -unaware of each other and use inconsistent terminology, thus impeding -understanding and further progress. In this paper we survey automated fact -checking research stemming from natural language processing and related -disciplines, unifying the task formulations and methodologies across papers and -authors. Furthermore, we highlight the use of evidence as an important -distinguishing factor among them cutting across task formulations and methods. -We conclude with proposing avenues for future NLP research on automated fact -checking. -" -8048,1806.07699,"Vivian S. Silva, Andr\'e Freitas, Siegfried Handschuh","Word Tagging with Foundational Ontology Classes: Extending the - WordNet-DOLCE Mapping to Verbs",cs.CL," Semantic annotation is fundamental to deal with large-scale lexical -information, mapping the information to an enumerable set of categories over -which rules and algorithms can be applied, and foundational ontology classes -can be used as a formal set of categories for such tasks. A previous alignment -between WordNet noun synsets and DOLCE provided a starting point for -ontology-based annotation, but in NLP tasks verbs are also of substantial -importance. This work presents an extension to the WordNet-DOLCE noun mapping, -aligning verbs according to their links to nouns denoting perdurants, -transferring to the verb the DOLCE class assigned to the noun that best -represents that verb's occurrence. To evaluate the usefulness of this resource, -we implemented a foundational ontology-based semantic annotation framework, -that assigns a high-level foundational category to each word or phrase in a -text, and compared it to a similar annotation tool, obtaining an increase of -9.05% in accuracy. -" -8049,1806.07711,"Vivian S. Silva, Siegfried Handschuh, Andr\'e Freitas",Categorization of Semantic Roles for Dictionary Definitions,cs.CL," Understanding the semantic relationships between terms is a fundamental task -in natural language processing applications. While structured resources that -can express those relationships in a formal way, such as ontologies, are still -scarce, a large number of linguistic resources gathering dictionary definitions -is becoming available, but understanding the semantic structure of natural -language definitions is fundamental to make them useful in semantic -interpretation tasks. Based on an analysis of a subset of WordNet's glosses, we -propose a set of semantic roles that compose the semantic structure of a -dictionary definition, and show how they are related to the definition's -syntactic configuration, identifying patterns that can be used in the -development of information extraction frameworks and semantic models. -" -8050,1806.07713,"Amin Omidvar, Hui Jiang, Aijun An",Using Neural Network for Identifying Clickbaits in Online News Media,cs.CL cs.CY cs.IR," Online news media sometimes use misleading headlines to lure users to open -the news article. These catchy headlines that attract users but disappointed -them at the end, are called Clickbaits. Because of the importance of automatic -clickbait detection in online medias, lots of machine learning methods were -proposed and employed to find the clickbait headlines. In this research, a -model using deep learning methods is proposed to find the clickbaits in -Clickbait Challenge 2017's dataset. 
The proposed model gained the first rank in -the Clickbait Challenge 2017 in terms of Mean Squared Error. Also, data -analytics and visualization techniques are employed to explore and discover the -provided dataset to get more insight from the data. -" -8051,1806.07721,"Vivian S. Silva, Manuela H\""urliman, Brian Davis, Siegfried Handschuh, - Andr\'e Freitas",Semantic Relation Classification: Task Formalisation and Refinement,cs.CL," The identification of semantic relations between terms within texts is a -fundamental task in Natural Language Processing which can support applications -requiring a lightweight semantic interpretation model. Currently, semantic -relation classification concentrates on relations which are evaluated over -open-domain data. This work provides a critique on the set of abstract -relations used for semantic relation classification with regard to their -ability to express relationships between terms which are found in a -domain-specific corpora. Based on this analysis, this work proposes an -alternative semantic relation model based on reusing and extending the set of -abstract relations present in the DOLCE ontology. The resulting set of -relations is well grounded, allows to capture a wide range of relations and -could thus be used as a foundation for automatic classification of semantic -relations. -" -8052,1806.07722,Paul Kinsler,"Stylized innovation: generating timelines by interrogating incrementally - available randomised dictionaries",cs.AI cs.CL physics.soc-ph," A key challenge when trying to understand innovation is that it is a dynamic, -ongoing process, which can be highly contingent on ephemeral factors such as -culture, economics, or luck. This means that any analysis of the real-world -process must necessarily be historical - and thus probably too late to be most -useful - but also cannot be sure what the properties of the web of connections -between innovations is or was. Here I try to address this by designing and -generating a set of synthetic innovation web ""dictionaries"" that can be used to -host sampled innovation timelines, probe the overall statistics and behaviours -of these processes, and determine the degree of their reliance on the structure -or generating algorithm. Thus, inspired by the work of Fink, Reeves, Palma and -Farr (2017) on innovation in language, gastronomy, and technology, I study how -new symbol discovery manifests itself in terms of additional ""word"" vocabulary -being available from dictionaries generated from a finite number of symbols. -Several distinct dictionary generation models are investigated using numerical -simulation, with emphasis on the scaling of knowledge as dictionary generators -and parameters are varied, and the role of which order the symbols are -discovered in. -" -8053,1806.07731,"Vivian S. Silva, Andr\'e Freitas, Siegfried Handschuh","Building a Knowledge Graph from Natural Language Definitions for - Interpretable Text Entailment Recognition",cs.CL," Natural language definitions of terms can serve as a rich source of -knowledge, but structuring them into a comprehensible semantic model is -essential to enable them to be used in semantic interpretation tasks. We -propose a method and provide a set of tools for automatically building a graph -world knowledge base from natural language definitions. 
Adopting a conceptual -model composed of a set of semantic roles for dictionary definitions, we -trained a classifier for automatically labeling definitions, preparing the data -to be later converted to a graph representation. WordNetGraph, a knowledge -graph built out of noun and verb WordNet definitions according to this -methodology, was successfully used in an interpretable text entailment -recognition approach which uses paths in this graph to provide clear -justifications for entailment decisions. -" -8054,1806.07787,Valentin Barriere and Chlo\'e Clavel and Slim Essid,"Opinion Dynamics Modeling for Movie Review Transcripts Classification - with Hidden Conditional Random Fields",cs.CL," In this paper, the main goal is to detect a movie reviewer's opinion using -hidden conditional random fields. This model allows us to capture the dynamics -of the reviewer's opinion in the transcripts of long unsegmented audio reviews -that are analyzed by our system. High level linguistic features are computed at -the level of inter-pausal segments. The features include syntactic features, a -statistical word embedding model and subjectivity lexicons. The proposed system -is evaluated on the ICT-MMMO corpus. We obtain a F1-score of 82\%, which is -better than logistic regression and recurrent neural network approaches. We -also offer a discussion that sheds some light on the capacity of our system to -adapt the word embedding model learned from general written texts data to -spoken movie reviews and thus model the dynamics of the opinion. -" -8055,1806.07832,"Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig","StructVAE: Tree-structured Latent Variable Models for Semi-supervised - Semantic Parsing",cs.CL cs.LG," Semantic parsing is the task of transducing natural language (NL) utterances -into formal meaning representations (MRs), commonly represented as tree -structures. Annotating NL utterances with their corresponding MRs is expensive -and time-consuming, and thus the limited availability of labeled data often -becomes the bottleneck of data-driven, supervised models. We introduce -StructVAE, a variational auto-encoding model for semisupervised semantic -parsing, which learns both from limited amounts of parallel data, and -readily-available unlabeled NL utterances. StructVAE models latent MRs not -observed in the unlabeled data as tree-structured latent variables. Experiments -on semantic parsing on the ATIS domain and Python code generation show that -with extra unlabeled data, StructVAE outperforms strong supervised models. -" -8056,1806.07914,"Charles Costello, Ruixi Lin, Vishwas Mruthyunjaya, Bettina Bolla, - Charles Jankowski",Multi-Layer Ensembling Techniques for Multilingual Intent Classification,cs.CL," In this paper we determine how multi-layer ensembling improves performance on -multilingual intent classification. We develop a novel multi-layer ensembling -approach that ensembles both different model initializations and different -model architectures. We also introduce a new banking domain dataset and compare -results against the standard ATIS dataset and the Chinese SMP2017 dataset to -determine ensembling performance in multilingual and multi-domain contexts. We -run ensemble experiments across all three datasets, and conclude that -ensembling provides significant performance increases, and that multi-layer -ensembling is a no-risk way to improve performance on intent classification. 
We
-also find that a diverse ensemble of simple models can reach performance comparable
-to much more sophisticated state-of-the-art models. Our best F1 scores on
-ATIS, Banking, and SMP are 97.54%, 91.79%, and 93.55% respectively, which
-compare well with the state-of-the-art on ATIS and the best submission to the
-SMP2017 competition. The total ensembling performance increases we achieve are
-0.23%, 1.96%, and 4.04% F1, respectively.
-"
-8057,1806.07916,"Sean MacAvaney, Bart Desmet, Arman Cohan, Luca Soldaini, Andrew Yates,
- Ayah Zirikly, Nazli Goharian",RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses,cs.CL," Self-reported diagnosis statements have been widely employed in studying
-language related to mental health in social media. However, existing research
-has largely ignored the temporality of mental health diagnoses. In this work,
-we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported
-depression diagnosis posts from Reddit that include temporal information about
-the diagnosis. Annotations include whether a mental health condition is present
-and how recently the diagnosis happened. Furthermore, we include exact temporal
-spans that relate to the date of diagnosis. This information is valuable for
-various computational methods to examine mental health through social media
-because one's mental health state is not static. We also test several baseline
-classification and extraction approaches, which suggest that extracting
-temporal information from self-reported diagnosis statements is challenging.
-"
-8058,1806.07974,"Josef Michalek, Jan Vanek",A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task,cs.CL cs.HC," In this survey paper, we have evaluated several recent deep neural network
-(DNN) architectures on a TIMIT phone recognition task. We chose the TIMIT
-corpus due to its popularity and broad availability in the community. It also
-simulates a low-resource scenario that is helpful in minor languages. Also, we
-prefer the phone recognition task because it is much more sensitive to
-acoustic model quality than a large vocabulary continuous speech recognition
-(LVCSR) task. In recent years, many published DNN papers have reported results on
-TIMIT. However, the reported phone error rates (PERs) were often much higher
-than the PER of a simple feed-forward (FF) DNN. That was the main motivation for
-this paper: to provide baseline DNNs, with open-source scripts, that allow future
-papers to easily replicate the baseline results with the lowest possible PERs.
-To our knowledge, the best PER achieved in this survey is better than
-the best-published PER to date.
-"
-8059,1806.07976,"Lucy Lu Wang, Chandra Bhagavatula, Mark Neumann, Kyle Lo, Chris
- Wilhelm, Waleed Ammar","Ontology Alignment in the Biomedical Domain Using Entity Definitions and
- Context",cs.CL," Ontology alignment is the task of identifying semantically equivalent
-entities from two given ontologies. Different ontologies have different
-representations of the same entity, resulting in a need to de-duplicate
-entities when merging ontologies. We propose a method for enriching entities in
-an ontology with external definition and context information, and use this
-additional information for ontology alignment.
We develop a neural architecture
-capable of encoding the additional information when available, and show that
-the addition of external data results in an F1-score of 0.69 on the Ontology
-Alignment Evaluation Initiative (OAEI) largebio SNOMED-NCI subtask, comparable
-with the entity-level matchers in a SOTA system.
-"
-8060,1806.07977,"Gabriela Ram\'irez-de-la-Rosa, Esa\'u Villatoro-Tello, H\'ector
- Jim\'enez-Salazar",TxPI-u: A Resource for Personality Identification of Undergraduates,cs.CL," Resources such as labeled corpora are necessary to train automatic models
-within the natural language processing (NLP) field. Historically, a large
-number of resources regarding a broad number of problems are available mostly
-in English. One such problem is known as Personality Identification, where,
-based on a psychological model (e.g., the Big Five Model), the goal is to find
-the traits of a subject's personality given, for instance, a text written by
-the same subject. In this paper we introduce a new corpus in Spanish called
-Texts for Personality Identification (TxPI). This corpus will help to develop
-models to automatically assign a personality trait to an author of a text
-document. Our corpus, TxPI-u, contains information on 416 Mexican undergraduate
-students, with demographic information such as age, gender, and the
-academic program they are enrolled in. Finally, as an additional contribution, we
-present a set of baselines to provide a comparison scheme for further research.
-"
-8061,1806.07978,Tobias Eichinger,The Corpus Replication Task,cs.LG cs.CL stat.ML," In the field of Natural Language Processing (NLP), we revisit the well-known
-word embedding algorithm word2vec. Word embeddings identify words by vectors
-such that the words' distributional similarity is captured. Unexpectedly,
-besides semantic similarity even relational similarity has been shown to be
-captured in word embeddings generated by word2vec, whence two questions arise.
-Firstly, which kinds of relations are representable in continuous space, and
-secondly, how are relations built. In order to tackle these questions, we
-propose a bottom-up point of view. We call generating input text for which
-word2vec outputs target relations solving the Corpus Replication Task. Deeming
-generalizations of this approach to any set of relations possible, we expect
-solving of the Corpus Replication Task to provide partial answers to the
-questions.
-"
-8062,1806.07999,"Paul Landes, Barbara Di Eugenio",A Supervised Approach To The Interpretation Of Imperative To-Do Lists,cs.CL," To-do lists are a popular medium for personal information management. As
-to-do tasks are increasingly tracked in electronic form with mobile and desktop
-organizers, so grows the potential for software support for the corresponding
-tasks by means of intelligent agents. While there has been work in the area of
-personal assistants for to-do tasks, no work has focused on classifying user
-intention and information extraction as we do. We show that our methods perform
-well across two corpora that span sub-domains, one of which we released.
-"
-8063,1806.08009,"Antonio Uva, Daniele Bonadiman, Alessandro Moschitti","Injecting Relational Structural Representation in Neural Networks for
- Question Similarity",cs.CL," Effectively using full syntactic parsing information in Neural Networks (NNs)
-to solve relational tasks, e.g., question similarity, is still an open problem.
-In this paper, we propose to inject structural representations in NNs by (i) -learning an SVM model using Tree Kernels (TKs) on relatively few pairs of -questions (few thousands) as gold standard (GS) training data is typically -scarce, (ii) predicting labels on a very large corpus of question pairs, and -(iii) pre-training NNs on such large corpus. The results on Quora and SemEval -question similarity datasets show that NNs trained with our approach can learn -more accurate models, especially after fine tuning on GS. -" -8064,1806.08040,Yingjie Hu and Krzysztof Janowicz,"An empirical study on the names of points of interest and their changes - with geographic distance",cs.CL," While Points Of Interest (POIs), such as restaurants, hotels, and barber -shops, are part of urban areas irrespective of their specific locations, the -names of these POIs often reveal valuable information related to local culture, -landmarks, influential families, figures, events, and so on. Place names have -long been studied by geographers, e.g., to understand their origins and -relations to family names. However, there is a lack of large-scale empirical -studies that examine the localness of place names and their changes with -geographic distance. In addition to enhancing our understanding of the -coherence of geographic regions, such empirical studies are also significant -for geographic information retrieval where they can inform computational models -and improve the accuracy of place name disambiguation. In this work, we conduct -an empirical study based on 112,071 POIs in seven US metropolitan areas -extracted from an open Yelp dataset. We propose to adopt term frequency and -inverse document frequency in geographic contexts to identify local terms used -in POI names and to analyze their usages across different POI types. Our -results show an uneven usage of local terms across POI types, which is highly -consistent among different geographic regions. We also examine the decaying -effect of POI name similarity with the increase of distance among POIs. While -our analysis focuses on urban POI names, the presented methods can be -generalized to other place types as well, such as mountain peaks and streets. -" -8065,1806.08044,Alessandra Cervone and Evgeny Stepanov and Giuseppe Riccardi,Coherence Models for Dialogue,cs.CL," Coherence across multiple turns is a major challenge for state-of-the-art -dialogue models. Arguably the most successful approach to automatically -learning text coherence is the entity grid, which relies on modelling patterns -of distribution of entities across multiple sentences of a text. Originally -applied to the evaluation of automatic summaries and the news genre, among its -many extensions, this model has also been successfully used to assess dialogue -coherence. Nevertheless, both the original grid and its extensions do not model -intents, a crucial aspect that has been studied widely in the literature in -connection to dialogue structure. We propose to augment the original grid -document representation for dialogue with the intentional structure of the -conversation. Our models outperform the original grid representation on both -text discrimination and insertion, the two main standard tasks for coherence -assessment across three different dialogue datasets, confirming that intents -play a key role in modelling dialogue coherence. 
-" -8066,1806.08077,"Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou",Dictionary-Guided Editing Networks for Paraphrase Generation,cs.CL," An intuitive way for a human to write paraphrase sentences is to replace -words or phrases in the original sentence with their corresponding synonyms and -make necessary changes to ensure the new sentences are fluent and grammatically -correct. We propose a novel approach to modeling the process with -dictionary-guided editing networks which effectively conduct rewriting on the -source sentence to generate paraphrase sentences. It jointly learns the -selection of the appropriate word level and phrase level paraphrase pairs in -the context of the original sentence from an off-the-shelf dictionary as well -as the generation of fluent natural language sentences. Specifically, the -system retrieves a set of word level and phrase level araphrased pairs derived -from the Paraphrase Database (PPDB) for the original sentence, which is used to -guide the decision of which the words might be deleted or inserted with the -soft attention mechanism under the sequence-to-sequence framework. We conduct -experiments on two benchmark datasets for paraphrase generation, namely the -MSCOCO and Quora dataset. The evaluation results demonstrate that our -dictionary-guided editing networks outperforms the baseline methods. -" -8067,1806.08097,"Dayiheng Liu, Jie Fu, Qian Qu, Jiancheng Lv","BFGAN: Backward and Forward Generative Adversarial Networks for - Lexically Constrained Sentence Generation",cs.CL," Incorporating prior knowledge like lexical constraints into the model's -output to generate meaningful and coherent sentences has many applications in -dialogue system, machine translation, image captioning, etc. However, existing -RNN-based models incrementally generate sentences from left to right via beam -search, which makes it difficult to directly introduce lexical constraints into -the generated sentences. In this paper, we propose a new algorithmic framework, -dubbed BFGAN, to address this challenge. Specifically, we employ a backward -generator and a forward generator to generate lexically constrained sentences -together, and use a discriminator to guide the joint training of two generators -by assigning them reward signals. Due to the difficulty of BFGAN training, we -propose several training techniques to make the training process more stable -and efficient. Our extensive experiments on two large-scale datasets with human -evaluation demonstrate that BFGAN has significant improvements over previous -methods. -" -8068,1806.08115,"Johannes Hellrich, Sven Buechel, Udo Hahn","Modeling Word Emotion in Historical Language: Quantity Beats Supposed - Stability in Seed Word Selection",cs.CL," To understand historical texts, we must be aware that language -- including -the emotional connotation attached to words -- changes over time. In this -paper, we aim at estimating the emotion which is associated with a given word -in former language stages of English and German. Emotion is represented -following the popular Valence-Arousal-Dominance (VAD) annotation scheme. While -being more expressive than polarity alone, existing word emotion induction -methods are typically not suited for addressing it. To overcome this -limitation, we present adaptations of two popular algorithms to VAD. 
To measure
-their effectiveness in diachronic settings, we present the first gold standard
-for historical word emotions, which was created by scholars with proficiency in
-the respective language stages and covers both English and German. In contrast
-to claims in previous work, our findings indicate that hand-selecting small
-sets of seed words with supposedly stable emotional meaning is actually harmful
-rather than helpful.
-"
-8069,1806.08202,"Hussein T. Al-Natsheh, Lucie Martinet, Fabrice Muhlenbach, Fabien
- Rico, Djamel A. Zighed","Metadata Enrichment of Multi-Disciplinary Digital Library: A
- Semantic-based Approach",cs.DL cs.AI cs.CL cs.IR," In scientific digital libraries, some papers from different research
-communities can be described by community-dependent keywords even if they share
-a semantically similar topic. Articles that are not tagged with enough keyword
-variations are poorly indexed in any information retrieval system, which limits
-potentially fruitful exchanges between scientific disciplines. In this paper,
-we introduce a novel experimentally designed pipeline for multi-label
-semantic-based tagging developed for open-access metadata digital libraries.
-The approach starts by learning from a standard scientific categorization and a
-sample of topic-tagged articles to find semantically relevant articles and
-enrich their metadata accordingly. Our proposed pipeline aims to enable
-researchers to reach articles from various disciplines that tend to use
-different terminologies. It allows retrieving semantically relevant articles
-given a limited known variation of search terms. In addition to achieving an
-accuracy that is higher than an expanded query based method using a topic
-synonym set extracted from a semantic network, our experiments also show a
-higher computational scalability versus other comparable techniques. We created
-a new benchmark extracted from the open-access metadata of a scientific digital
-library and published it along with the experiment code to allow further
-research on the topic.
-"
-8070,1806.08309,Seid Muhie Yimam and Chris Biemann,Par4Sim -- Adaptive Paraphrasing for Text Simplification,cs.CL," Learning from a real-world data stream and continuously updating the model
-without explicit supervision is a new challenge for NLP applications with
-machine learning components. In this work, we have developed an adaptive
-learning system for text simplification, which improves the underlying
-learning-to-rank model from usage data, i.e. how users have employed the system
-for the task of simplification. Our experimental results show that, over a
-period of time, the performance of the embedded paraphrase ranking model
-increases steadily, improving from a score of 62.88% up to 75.70% based on the
-NDCG@10 evaluation metric. To our knowledge, this is the first study where an
-NLP component is adaptively improved through usage.
-"
-8071,1806.08409,"Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori,
- Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes,
- Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh","End-to-End Audio Visual Scene-Aware Dialog using Multimodal
- Attention-Based Video Features",cs.CL cs.CV cs.SD eess.AS," Dialog systems need to understand dynamic visual scenes in order to have
-conversations with users about the objects and events around them.
Scene-aware
-dialog systems for real-world applications could be developed by integrating
-state-of-the-art technologies from multiple research areas, including:
-end-to-end dialog technologies, which generate system responses using models
-trained from dialog data; visual question answering (VQA) technologies, which
-answer questions about images using learned image features; and video
-description technologies, in which descriptions/captions are generated from
-videos using multimodal information. We introduce a new dataset of dialogs
-about videos of human behaviors. Each dialog is a typed conversation that
-consists of a sequence of 10 question-and-answer (QA) pairs between two Amazon
-Mechanical Turk (AMT) workers. In total, we collected dialogs on roughly 9,000
-videos. Using this new dataset for Audio Visual Scene-aware dialog (AVSD), we
-trained an end-to-end conversation model that generates responses in a dialog
-about a video. Our experiments demonstrate that using multimodal features that
-were developed for multimodal attention-based video description enhances the
-quality of generated dialog about dynamic scenes (videos). Our dataset, model
-code and pretrained models will be publicly available for a new Video
-Scene-Aware Dialog challenge.
-"
-8072,1806.08462,"Hareesh Bahuleyan, Lili Mou, Hao Zhou, Olga Vechtomova",Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation,cs.CL cs.LG stat.ML," The variational autoencoder (VAE) imposes a probabilistic distribution
-(typically Gaussian) on the latent space and penalizes the Kullback--Leibler
-(KL) divergence between the posterior and prior. In NLP, VAEs are extremely
-difficult to train due to the problem of KL collapsing to zero. One has to
-implement various heuristics such as KL weight annealing and word dropout in a
-carefully engineered manner to successfully train a VAE for text. In this
-paper, we propose to use the Wasserstein autoencoder (WAE) for probabilistic
-sentence generation, where the encoder could be either stochastic or
-deterministic. We show theoretically and empirically that, in the original WAE,
-the stochastically encoded Gaussian distribution tends to become a Dirac-delta
-function, and we propose a variant of WAE that encourages the stochasticity of
-the encoder. Experimental results show that the latent space learned by WAE
-exhibits properties of continuity and smoothness as in VAEs, while
-simultaneously achieving much higher BLEU scores for sentence reconstruction.
-"
-8073,1806.08467,"Henrique F. de Arruda, Vanessa Q. Marinho, Luciano da F. Costa, Diego
- R. Amancio","Paragraph-based complex networks: application to document classification
- and authenticity verification",cs.CL physics.soc-ph," With the increasing number of texts made available on the Internet, many
-applications have relied on text mining tools to tackle a diversity of
-problems. A relevant model to represent texts is the so-called word adjacency
-(co-occurrence) representation, which is known to capture mainly syntactical
-features of texts. In this study, we introduce a novel network representation
-that considers the semantic similarity between paragraphs. Two main properties
-of paragraph networks are considered: (i) their ability to incorporate
-characteristics that can discriminate real from artificial, shuffled
-manuscripts and (ii) their ability to capture syntactical and semantic textual
-features.
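Record 8072 above mentions KL weight annealing as the standard heuristic
against KL collapse when training text VAEs. A minimal sketch of that
heuristic, with toy posterior parameters and a placeholder reconstruction
loss (the schedule shape and warmup length are assumptions, not the paper's
settings):

import numpy as np

def kl_weight(step, warmup=10000):
    """Linear KL annealing: the weight grows from 0 to 1 over `warmup`
    steps, so early training is dominated by reconstruction."""
    return min(1.0, step / warmup)

def gaussian_kl(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Toy posterior for one sentence; real values come from the encoder.
mu, logvar = np.array([0.3, -0.1]), np.array([-0.5, 0.2])
reconstruction_loss = 42.0  # placeholder decoder cross-entropy
for step in (0, 5000, 10000):
    loss = reconstruction_loss + kl_weight(step) * gaussian_kl(mu, logvar)
    print(step, round(float(loss), 3))

The WAE of record 8072 replaces this KL penalty with a distribution-matching
penalty on the aggregate posterior, which is why it avoids the schedule
altogether.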
Our results revealed that real texts are organized into communities,
-which turned out to be an important feature for discriminating them from
-artificial texts. Interestingly, we have also found that, differently from
-traditional co-occurrence networks, the adopted representation is able to
-capture semantic features. Additionally, the proposed framework was employed to
-analyze the Voynich manuscript, which was found to be compatible with texts
-written in natural languages. Taken together, our findings suggest that the
-proposed methodology can be combined with traditional network models to improve
-text classification tasks.
-"
-8074,1806.08621,"Martin Karu, Tanel Alum\""ae",Weakly Supervised Training of Speaker Identification Models,cs.SD cs.CL cs.HC eess.AS," We propose an approach for training speaker identification models in a weakly
-supervised manner. We concentrate on the setting where the training data
-consists of a set of audio recordings and the speaker annotation is provided
-only at the recording level. The method uses speaker diarization to find unique
-speakers in each recording, and i-vectors to project the speech of each speaker
-to a fixed-dimensional vector. A neural network is then trained to map
-i-vectors to speakers, using a special objective function that allows the model
-to be optimized using recording-level speaker labels. We report experiments
-on two different real-world datasets. On the VoxCeleb dataset, the method
-provides 94.6% accuracy on a closed set speaker identification task, surpassing
-the baseline performance by a large margin. On an Estonian broadcast news
-dataset, the method provides 66% time-weighted speaker identification recall at
-93% precision.
-"
-8075,1806.08727,"Dirk Weissenborn, Pasquale Minervini, Tim Dettmers, Isabelle
- Augenstein, Johannes Welbl, Tim Rockt\""aschel, Matko Bo\v{s}njak, Jeff
- Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel",Jack the Reader - A Machine Reading Framework,cs.CL cs.LG stat.ML," Many Machine Reading and Natural Language Understanding tasks require reading
-supporting text in order to answer questions. For example, in Question
-Answering, the supporting text can be newswire or Wikipedia articles; in
-Natural Language Inference, premises can be seen as the supporting text and
-hypotheses as questions. Providing a set of useful primitives operating in a
-single framework of related tasks would allow for expressive modelling, and
-easier model comparison and replication. To that end, we present Jack the
-Reader (Jack), a framework for Machine Reading that allows for quick model
-prototyping by component reuse, evaluation of new models on existing datasets
-as well as integrating new datasets and applying them on a growing set of
-implemented baseline models. Jack currently supports (but is not limited to)
-three tasks: Question Answering, Natural Language Inference, and Link
-Prediction. It is developed with the aim of increasing research efficiency and
-code reuse.
-"
-8076,1806.08730,"Bryan McCann and Nitish Shirish Keskar and Caiming Xiong and Richard
- Socher",The Natural Language Decathlon: Multitask Learning as Question Answering,cs.CL cs.AI cs.LG stat.ML," Deep learning has improved performance on many natural language processing
-(NLP) tasks individually. However, general NLP models cannot emerge within a
-paradigm that focuses on the particularities of a single metric, dataset, and
-task.
We introduce the Natural Language Decathlon (decaNLP), a challenge that
-spans ten tasks: question answering, machine translation, summarization,
-natural language inference, sentiment analysis, semantic role labeling,
-zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and
-commonsense pronoun resolution. We cast all tasks as question answering over a
-context. Furthermore, we present a new Multitask Question Answering Network
-(MQAN) that jointly learns all tasks in decaNLP without any task-specific
-modules or parameters in the multitask setting. MQAN shows improvements in
-transfer learning for machine translation and named entity recognition, domain
-adaptation for sentiment analysis and natural language inference, and zero-shot
-capabilities for text classification. We demonstrate that the MQAN's
-multi-pointer-generator decoder is key to this success and performance further
-improves with an anti-curriculum training strategy. Though designed for
-decaNLP, MQAN also achieves state of the art results on the WikiSQL semantic
-parsing task in the single-task setting. We also release code for procuring and
-processing data, training and evaluating models, and reproducing all
-experiments for decaNLP.
-"
-8077,1806.08748,Heeyoul Choi,"Persistent Hidden States and Nonlinear Transformation for Long
- Short-Term Memory",cs.CL cs.LG stat.ML," Recurrent neural networks (RNNs) have been drawing much attention with great
-success in many applications like speech recognition and neural machine
-translation. Long short-term memory (LSTM) is one of the most popular RNN units
-in deep learning applications. LSTM transforms the input and the previous
-hidden states to the next states with the affine transformation, multiplication
-operations and a nonlinear activation function, which makes a good data
-representation for a given task. The affine transformation includes rotation
-and reflection, which change the semantic or syntactic information of
-dimensions in the hidden states. However, considering that a model interprets
-the output sequence of LSTM over the whole input sequence, the dimensions of
-the states need to keep the same type of semantic or syntactic information
-regardless of the location in the sequence. In this paper, we propose a simple
-variant of the LSTM unit, the persistent recurrent unit (PRU), where each
-dimension of hidden states keeps persistent information across time, so that
-the space keeps the same meaning over the whole sequence. In addition, to
-improve the nonlinear transformation power, we add a feedforward layer in the
-PRU structure. In our experiments, we evaluate the proposed methods on three
-different tasks, and the results confirm that our methods have better
-performance than the conventional LSTM.
-"
-8078,1806.08760,"Khuong Vo, Dang Pham, Mao Nguyen, Trung Mai and Tho Quan",Combination of Domain Knowledge and Deep Learning for Sentiment Analysis,cs.CL cs.LG cs.NE," The emerging technique of deep learning has been widely applied in many
-different areas. However, when adopted in a certain specific domain, this
-technique should be combined with domain knowledge to improve efficiency and
-accuracy.
In particular, when analyzing the applications of deep learning in
-sentiment analysis, we found that current approaches suffer from the
-following drawbacks: (i) the existing works have not paid much attention to the
-importance of different types of sentiment terms, which is an important concept
-in this area; and (ii) the loss function currently employed does not well
-reflect the degree of error of sentiment misclassification. To overcome such
-problems, we propose to combine domain knowledge with deep learning. Our
-proposal includes using sentiment scores, learnt by quadratic programming, to
-augment training data; and introducing a penalty matrix to enhance the
-cross-entropy loss function. In our experiments, we achieved a significant
-improvement in classification results.
-"
-8079,1806.08890,Sven Buechel and Udo Hahn,"Emotion Representation Mapping for Automatic Lexicon Construction
- (Mostly) Performs on Human Level",cs.CL," Emotion Representation Mapping (ERM) has the goal of converting existing
-emotion ratings from one representation format into another one, e.g., mapping
-Valence-Arousal-Dominance annotations for words or sentences into Ekman's Basic
-Emotions and vice versa. ERM can thus not only be considered as an alternative
-to Word Emotion Induction (WEI) techniques for automatic emotion lexicon
-construction but may also help mitigate problems that come from the
-proliferation of emotion representation formats in recent years. We propose a
-new neural network approach to ERM that outperforms the previous state of the
-art. Equally important, we present a refined evaluation
-methodology and gather strong evidence that our model yields results which are
-(almost) as reliable as human annotations, even in cross-lingual settings.
-Based on these results we generate new emotion ratings for 13 typologically
-diverse languages and claim that they have near-gold quality, at least.
-"
-8080,1806.09010,Gabrielle K. Liu,"Evaluating Gammatone Frequency Cepstral Coefficients with Neural
- Networks for Emotion Recognition from Speech",cs.SD cs.CL eess.AS," Current approaches to speech emotion recognition focus on speech features
-that can capture the emotional content of a speech signal. Mel Frequency
-Cepstral Coefficients (MFCCs) are one of the most commonly used representations
-for audio speech recognition and classification. This paper proposes Gammatone
-Frequency Cepstral Coefficients (GFCCs) as a potentially better representation
-of speech signals for emotion recognition. The effectiveness of MFCC and GFCC
-representations are compared and evaluated over emotion and intensity
-classification tasks with fully connected and recurrent neural network
-architectures. The results provide evidence that GFCCs outperform MFCCs in
-speech emotion recognition.
-"
-8081,1806.09029,"Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik
- Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev",Improving Text-to-SQL Evaluation Methodology,cs.CL cs.AI cs.DB," To be informative, an evaluation must measure how well systems generalize to
-realistic unseen data. We identify limitations of and propose improvements to
-current evaluations of text-to-SQL systems. First, we compare human-generated
-and automatically generated questions, characterizing properties of queries
-necessary for real-world applications.
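Record 8078 above describes enhancing the cross-entropy loss with a penalty
matrix so that the loss reflects degrees of sentiment misclassification. One
simple way to realize that idea (the weights and the additive formulation are
illustrative assumptions, not necessarily the paper's exact loss):

import numpy as np

CLASSES = ["negative", "neutral", "positive"]
# Confusing the two polar classes costs twice as much as confusing
# either with "neutral"; the diagonal carries no extra penalty.
PENALTY = np.array([[0.0, 1.0, 2.0],
                    [1.0, 0.0, 1.0],
                    [2.0, 1.0, 0.0]])

def penalized_loss(probs, true_idx, lam=1.0):
    """Cross-entropy plus the expected misclassification penalty,
    so graver confusions are punished harder."""
    ce = -np.log(probs[true_idx] + 1e-12)
    return ce + lam * PENALTY[true_idx] @ probs

probs = np.array([0.6, 0.3, 0.1])          # model output for one review
print(penalized_loss(probs, true_idx=2))   # true class positive: large loss
print(penalized_loss(probs, true_idx=0))   # true class negative: small loss

With lam=0 this reduces to ordinary cross-entropy, which makes the penalty
term easy to ablate.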
To facilitate evaluation on multiple
-datasets, we release standardized and improved versions of seven existing
-datasets and one new text-to-SQL dataset. Second, we show that the current
-division of data into training and test sets measures robustness to variations
-in the way questions are asked, but only partially tests how well systems
-generalize to new queries; therefore, we propose a complementary dataset split
-for evaluation of future work. Finally, we demonstrate how the common practice
-of anonymizing variables during evaluation removes an important challenge of
-the task. Our observations highlight key difficulties, and our methodology
-enables effective measurement of future development.
-"
-8082,1806.09030,"Javid Ebrahimi, Daniel Lowd, Dejing Dou",On Adversarial Examples for Character-Level Neural Machine Translation,cs.CL cs.AI," Evaluating on adversarial examples has become a standard procedure to measure
-robustness of deep learning models. Due to the difficulty of creating white-box
-adversarial examples for discrete text input, most analyses of the robustness
-of NLP models have been done through black-box adversarial examples. We
-investigate adversarial examples for character-level neural machine translation
-(NMT), and contrast black-box adversaries with a novel white-box adversary,
-which employs differentiable string-edit operations to rank adversarial
-changes. We propose two novel types of attacks which aim to remove or change a
-word in a translation, rather than simply break the NMT. We demonstrate that
-white-box adversarial examples are significantly stronger than their black-box
-counterparts in different attack scenarios, revealing more serious
-vulnerabilities than previously known. In addition, after performing
-adversarial training, which takes only 3 times longer than regular training, we
-can improve the model's robustness significantly.
-"
-8083,1806.09055,"Hanxiao Liu, Karen Simonyan, Yiming Yang",DARTS: Differentiable Architecture Search,cs.LG cs.CL cs.CV stat.ML," This paper addresses the scalability challenge of architecture search by
-formulating the task in a differentiable manner. Unlike conventional approaches
-of applying evolution or reinforcement learning over a discrete and
-non-differentiable search space, our method is based on the continuous
-relaxation of the architecture representation, allowing efficient search of the
-architecture using gradient descent. Extensive experiments on CIFAR-10,
-ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in
-discovering high-performance convolutional architectures for image
-classification and recurrent architectures for language modeling, while being
-orders of magnitude faster than state-of-the-art non-differentiable techniques.
-Our implementation has been made publicly available to facilitate further
-research on efficient architecture search algorithms.
-"
-8084,1806.09089,"Chanhee Lee, Young-Bum Kim, Dongyub Lee, HeuiSeok Lim",Character-Level Feature Extraction with Densely Connected Networks,cs.CL," Generating character-level features is an important step for achieving good
-results in various natural language processing tasks. To alleviate the need for
-human labor in generating hand-crafted features, methods that utilize neural
-architectures such as Convolutional Neural Network (CNN) or Recurrent Neural
-Network (RNN) to automatically extract such features have been proposed and
-have shown great results.
However, CNN generates position-independent features,
-and RNN is slow since it needs to process the characters sequentially. In this
-paper, we propose a novel method of using a densely connected network to
-automatically extract character-level features. The proposed method does not
-require any language- or task-specific assumptions, and shows robustness and
-effectiveness while being faster than CNN- or RNN-based methods. Evaluating
-this method on three sequence labeling tasks - slot tagging, Part-of-Speech
-(POS) tagging, and Named-Entity Recognition (NER) - we obtain state-of-the-art
-performance with a 96.62 F1-score and 97.73% accuracy on slot tagging and POS
-tagging, respectively, and comparable performance to the state-of-the-art 91.13
-F1-score on NER.
-"
-8085,1806.09102,"Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu, Hai Zhao and Gongshen Liu",Modeling Multi-turn Conversation with Deep Utterance Aggregation,cs.CL," Multi-turn conversation understanding is a major challenge for building
-intelligent dialogue systems. This work focuses on retrieval-based response
-matching for multi-turn conversation, for which previous work simply
-concatenates the conversation utterances, ignoring the interactions among
-previous utterances for context modeling. In this paper, we formulate previous
-utterances into context using a proposed deep utterance aggregation model to
-form a fine-grained context representation. In detail, a self-matching
-attention is first introduced to route the vital information in each utterance.
-Then the model matches a response with each refined utterance and the final
-matching score is obtained after attentive turns aggregation. Experimental
-results show our model outperforms the state-of-the-art methods on three
-multi-turn conversation benchmarks, including a newly introduced e-commerce
-dialogue corpus.
-"
-8086,1806.09103,"Zhuosheng Zhang, Yafang Huang and Hai Zhao",Subword-augmented Embedding for Cloze Reading Comprehension,cs.CL," Representation learning is the foundation of machine reading comprehension.
-In state-of-the-art models, deep learning methods broadly use word and
-character level representations. However, character is not naturally the
-minimal linguistic unit. In addition, with a simple concatenation of character
-and word embedding, previous models actually give a suboptimal solution. In
-this paper, we propose to use subword rather than character for word embedding
-enhancement. We also empirically explore different augmentation strategies on
-subword-augmented embedding to enhance the cloze-style reading comprehension
-model reader. In detail, we present a reader that uses subword-level
-representation to augment word embedding with a short list to handle rare words
-effectively. A thorough examination is conducted to evaluate the comprehensive
-performance and generalization ability of the proposed reader. Experimental
-results show that the proposed approach helps the reader significantly
-outperform the state-of-the-art baselines on various public datasets.
-"
-8087,1806.09105,Zhuosheng Zhang and Hai Zhao,One-shot Learning for Question-Answering in Gaokao History Challenge,cs.CL," Answering questions from university admission exams (Gaokao in Chinese) is a
-challenging AI task since it requires effective representation to capture
-complicated semantic relations between questions and answers. In this work, we
-propose a hybrid neural model for the deep question-answering task from history
-examinations.
Our model employs a cooperative gated neural network to retrieve
-answers with the assistance of extra labels given by a neural Turing machine
-labeler. An empirical study shows that the labeler works well with only a small
-training dataset and the gated mechanism is good at fetching the semantic
-representation of lengthy answers. Experiments on question answering
-demonstrate that the proposed model obtains substantial performance gains over
-various neural model baselines in terms of multiple evaluation metrics.
-"
-8088,1806.09202,"Sayash Kapoor, Vijay Keswani, Nisheeth K. Vishnoi, L. Elisa Celis",Balanced News Using Constrained Bandit-based Personalization,cs.CY cs.CL cs.SI," We present a prototype for a news search engine that presents balanced
-viewpoints across liberal and conservative articles with the goal of
-de-polarizing content and allowing users to escape their filter bubble. The
-balancing is done according to flexible user-defined constraints, and leverages
-recent advances in constrained bandit optimization. We showcase our balanced
-news feed by displaying it side-by-side with the news feed produced by a
-traditional (polarized) feed.
-"
-8089,1806.09279,Amritpal Kaur and Harkiran Kaur,"Framework for Opinion Mining Approach to Augment Education System
- Performance",cs.IR cs.CL," The extensive growth of social networking sites allows people to share their
-views and experiences freely with their peers on the internet. Due to this, a
-huge amount of data is generated on an everyday basis, which can be used for
-opinion mining to extract the views of people in a particular field.
-Opinion mining finds its applications in many areas such as tourism, politics,
-education and entertainment, etc. It has not been extensively implemented in
-the area of education. This paper discusses the malpractices in the present
-examination system. In the present scenario, opinion mining is vastly used for
-decision making. The authors of this paper have designed a framework by
-applying the Na\""ive Bayes approach to the education dataset. The various
-phases of the Na\""ive Bayes approach include three steps: conversion of data
-into a frequency table, making classes of the dataset, and applying the
-Na\""ive Bayes algorithm equation to calculate the probabilities of classes.
-Finally, the highest-probability class is the outcome of this prediction. These
-predictions are used to make improvements in the education system and help to
-provide better education.
-"
-8090,1806.09325,"Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu","Single-channel Speech Dereverberation via Generative Adversarial
- Training",cs.SD cs.CL eess.AS," In this paper, we propose a single-channel speech dereverberation system
-(DeReGAT) based on convolutional, bidirectional long short-term memory and deep
-feed-forward neural network (CBLDNN) with generative adversarial training
-(GAT). In order to obtain better speech quality instead of only minimizing a
-mean square error (MSE), GAT is employed to make the dereverberated speech
-indistinguishable from the clean samples. Besides, our system can deal with a
-wide range of reverberation and adapts well to varied environments. The
-experimental results show that the proposed model outperforms weighted
-prediction error (WPE) and deep neural network-based systems. In addition,
-DeReGAT is extended to an online speech dereverberation scenario, which reports
-comparable performance with the offline case.
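A minimal sketch of the Naive Bayes opinion-mining pipeline outlined in
record 8089 above, using scikit-learn. The tiny labeled sample is invented
for illustration; a real study would train on a large annotated corpus:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "the examination system is fair and well organised",
    "great teachers and a transparent grading process",
    "rampant cheating makes the exam results meaningless",
    "question papers leaked again, totally unfair system",
]
labels = ["positive", "positive", "negative", "negative"]

# CountVectorizer builds the word-frequency table; MultinomialNB then
# estimates per-class word probabilities and applies Bayes' rule, so the
# highest-probability class becomes the prediction.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["the grading process is unfair"]))
print(model.predict_proba(["well organised and fair exams"]))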
-" -8091,1806.09374,"Raghav Menon, Herman Kamper, John Quinn, Thomas Niesler","Fast ASR-free and almost zero-resource keyword spotting using DTW and - CNNs for humanitarian monitoring",cs.CL," We use dynamic time warping (DTW) as supervision for training a convolutional -neural network (CNN) based keyword spotting system using a small set of spoken -isolated keywords. The aim is to allow rapid deployment of a keyword spotting -system in a new language to support urgent United Nations (UN) relief -programmes in parts of Africa where languages are extremely under-resourced and -the development of annotated speech resources is infeasible. First, we use 1920 -recorded keywords (40 keyword types, 34 minutes of speech) as exemplars in a -DTW-based template matching system and apply it to untranscribed broadcast -speech. Then, we use the resulting DTW scores as targets to train a CNN on the -same unlabelled speech. In this way we use just 34 minutes of labelled speech, -but leverage a large amount of unlabelled data for training. While the -resulting CNN keyword spotter cannot match the performance of the DTW-based -system, it substantially outperforms a CNN classifier trained only on the -keywords, improving the area under the ROC curve from 0.54 to 0.64. Because our -CNN system is several orders of magnitude faster at runtime than the DTW -system, it represents the most viable keyword spotter on this extremely limited -dataset. -" -8092,1806.09439,"Lucas Sterckx, Johannes Deleu, Chris Develder, Thomas Demeester",Prior Attention for Style-aware Sequence-to-Sequence Models,cs.CL," We extend sequence-to-sequence models with the possibility to control the -characteristics or style of the generated output, via attention that is -generated a priori (before decoding) from a latent code vector. After training -an initial attention-based sequence-to-sequence model, we use a variational -auto-encoder conditioned on representations of input sequences and a latent -code vector space to generate attention matrices. By sampling the code vector -from specific regions of this latent space during decoding and imposing prior -attention generated from it in the seq2seq model, output can be steered towards -having certain attributes. This is demonstrated for the task of sentence -simplification, where the latent code vector allows control over output length -and lexical simplification, and enables fine-tuning to optimize for different -evaluation metrics. -" -8093,1806.09511,"Jos\'e Marcelino, Jo\~ao Faria, Lu\'is Ba\'ia, Ricardo Gamelas Sousa",A Hierarchical Deep Learning Natural Language Parser for Fashion,cs.IR cs.AI cs.CL," This work presents a hierarchical deep learning natural language parser for -fashion. Our proposal intends not only to recognize fashion-domain entities but -also to expose syntactic and morphologic insights. We leverage the usage of an -architecture of specialist models, each one for a different task (from parsing -to entity recognition). Such architecture renders a hierarchical model able to -capture the nuances of the fashion language. The natural language parser is -able to deal with textual ambiguities which are left unresolved by our -currently existing solution. Our empirical results establish a robust baseline, -which justifies the use of hierarchical architectures of deep learning models -while opening new research avenues to explore. 
-" -8094,1806.09514,"Adaeze Adigwe, No\'e Tits, Kevin El Haddad, Sarah Ostadabbas and - Thierry Dutoit","The Emotional Voices Database: Towards Controlling the Emotion Dimension - in Voice Generation Systems",cs.CL cs.AI eess.AS," In this paper, we present a database of emotional speech intended to be -open-sourced and used for synthesis and generation purpose. It contains data -for male and female actors in English and a male actor in French. The database -covers 5 emotion classes so it could be suitable to build synthesis and voice -transformation systems with the potential to control the emotional dimension in -a continuous way. We show the data's efficiency by building a simple MLP system -converting neutral to angry speech style and evaluate it via a CMOS perception -test. Even though the system is a very simple one, the test show the efficiency -of the data which is promising for future work. -" -8095,1806.09533,Marc Velay and Fabrice Daniel,Using NLP on news headlines to predict index trends,cs.CL cs.LG stat.ML," This paper attempts to provide a state of the art in trend prediction using -news headlines. We present the research done on predicting DJIA trends using -Natural Language Processing. We will explain the different algorithms we have -used as well as the various embedding techniques attempted. We rely on -statistical and deep learning models in order to extract information from the -corpuses. -" -8096,1806.09542,Wei-Hung Weng and Peter Szolovits,"Mapping Unparalleled Clinical Professional and Consumer Languages with - Embedding Alignment",cs.LG cs.CL stat.ML," Mapping and translating professional but arcane clinical jargons to consumer -language is essential to improve the patient-clinician communication. -Researchers have used the existing biomedical ontologies and consumer health -vocabulary dictionary to translate between the languages. However, such -approaches are limited by expert efforts to manually build the dictionary, -which is hard to be generalized and scalable. In this work, we utilized the -embeddings alignment method for the word mapping between unparalleled clinical -professional and consumer language embeddings. To map semantically similar -words in two different word embeddings, we first independently trained word -embeddings on both the corpus with abundant clinical professional terms and the -other with mainly healthcare consumer terms. Then, we aligned the embeddings by -the Procrustes algorithm. We also investigated the approach with the -adversarial training with refinement. We evaluated the quality of the alignment -through the similar words retrieval both by computing the model precision and -as well as judging qualitatively by human. We show that the Procrustes -algorithm can be performant for the professional consumer language embeddings -alignment, whereas adversarial training with refinement may find some relations -between two languages. -" -8097,1806.09652,Sree Harsha Ramesh and Krishna Prasad Sankaranarayanan,"Neural Machine Translation for Low Resource Languages using Bilingual - Lexicon Induced from Comparable Corpora",cs.CL," Resources for the non-English languages are scarce and this paper addresses -this problem in the context of machine translation, by automatically extracting -parallel sentence pairs from the multilingual articles available on the -Internet. In this paper, we have used an end-to-end Siamese bidirectional -recurrent neural network to generate parallel sentences from comparable -multilingual articles in Wikipedia. 
Subsequently, we have shown that using the
-harvested dataset improved BLEU scores on both NMT and phrase-based SMT systems
-for the low-resource language pairs: English--Hindi and English--Tamil, when
-compared to training exclusively on the limited bilingual corpora collected for
-these language pairs.
-"
-8098,1806.09736,Amir Karami and Noelle M. Pendergraft,Computational Analysis of Insurance Complaints: GEICO Case Study,stat.AP cs.CL cs.IR stat.ML," The online environment has provided a great opportunity for insurance
-policyholders to share their complaints with respect to different services.
-These complaints can reveal valuable information for insurance companies that
-seek to improve their services; however, analyzing a huge number of online
-complaints is a complicated task for humans and must involve computational
-methods to create an efficient process. This research proposes a computational
-approach to characterize the major topics of a large number of online
-complaints. Our approach is based on using the topic modeling approach to
-disclose the latent semantics of complaints. The proposed approach was deployed
-on thousands of negative GEICO reviews. Analyzing 1,371 GEICO complaints
-indicates that there are 30 major complaints in four categories: (1) customer
-service, (2) insurance coverage, paperwork, policy, and reports, (3) legal
-issues, and (4) costs, estimates, and payments. This research approach can be
-used in other applications to explore a large number of reviews.
-"
-8099,1806.09751,"Hussein S. Al-Olimat and Steven Gustafson and Jason Mackay and
- Krishnaprasad Thirunarayan and Amit Sheth",A Practical Incremental Learning Framework For Sparse Entity Extraction,cs.CL," This work addresses challenges arising from extracting entities from textual
-data, including the high cost of data annotation, model accuracy, selecting
-appropriate evaluation criteria, and the overall quality of annotation. We
-present a framework that integrates Entity Set Expansion (ESE) and Active
-Learning (AL) to reduce the annotation cost of sparse data and provide an
-online evaluation method as feedback. This incremental and interactive learning
-framework allows for rapid annotation and subsequent extraction of sparse data
-while maintaining high accuracy. We evaluate our framework on three publicly
-available datasets and show that it drastically reduces the cost of sparse
-entity annotation by an average of 85% and 45% to reach 0.9 and 1.0 F-Scores,
-respectively. Moreover, the method exhibited robust performance across all
-datasets.
-"
-8100,1806.09764,"Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui
- Qin, Haoye Dong, Eric Xing",Deep Generative Models with Learnable Knowledge Constraints,cs.LG cs.CL cs.CV stat.ML," The broad set of deep generative models (DGMs) has achieved remarkable
-advances. However, it is often difficult to incorporate rich structured domain
-knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a
-principled framework to impose structured constraints on probabilistic models,
-but has limited applicability to the diverse DGMs that can lack a Bayesian
-formulation or even explicit density evaluation. PR also requires constraints
-to be fully specified a priori, which is impractical or suboptimal for complex
-knowledge with learnable uncertain parts.
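A minimal sketch of the orthogonal Procrustes alignment used in record 8096
above. The synthetic "consumer" space below is a random rotation of the
"professional" space plus noise; real inputs are two independently trained
embedding matrices restricted to a seed dictionary of shared words:

import numpy as np

def procrustes_align(X, Y):
    """Orthogonal matrix W minimizing ||XW - Y||_F, via SVD of X^T Y.
    Rows of X and Y are embeddings of the same seed words in two spaces."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))                   # professional embeddings
R, _ = np.linalg.qr(rng.normal(size=(50, 50)))   # hidden rotation
Y = X @ R + 0.01 * rng.normal(size=X.shape)      # noisy consumer embeddings

W = procrustes_align(X, Y)
print(np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))  # small residual

Because W is constrained to be orthogonal, the mapping preserves distances
within the source space, which is why nearest-neighbour (similar-word)
retrieval remains meaningful after alignment.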
In this paper, we establish
-a mathematical correspondence between PR and reinforcement learning (RL), and,
-based on the connection, expand PR to learn constraints as the extrinsic reward
-in RL. The resulting algorithm is model-agnostic, applying to any DGMs, and is
-flexible enough to adapt arbitrary constraints jointly with the model.
-Experiments on human image generation and templated sentence generation show
-that models with knowledge constraints learned by our algorithm greatly improve
-over base generative models.
-"
-8101,1806.09792,"Dayiheng Liu, Quan Guo, Wubo Li, Jiancheng Lv",A Multi-Modal Chinese Poetry Generation Model,cs.CL," Recent studies in sequence-to-sequence learning demonstrate that the RNN
-encoder-decoder structure can successfully generate Chinese poetry. However,
-existing methods can only generate poetry with a given first line or a user's
-intended theme. In this paper, we propose a three-stage multi-modal Chinese
-poetry generation approach. Given a picture, the first line, the title and the
-other lines of the poem are successively generated in three stages. According
-to the characteristics of Chinese poems, we propose a hierarchy-attention
-seq2seq model which can effectively capture character, phrase, and sentence
-information between contexts and improve the symmetry delivered in poems. In
-addition, the Latent Dirichlet allocation (LDA) model is utilized for title
-generation and improves the relevance between the whole poem and the title.
-Compared with a strong baseline, the experimental results demonstrate the
-effectiveness of our approach, using machine evaluations as well as human
-judgments.
-"
-8102,1806.09827,Sim\'on Roca-Sotelo and Jer\'onimo Arenas-Garc\'ia,"Unveiling the semantic structure of text documents using paragraph-aware
- Topic Models",cs.CL cs.IR cs.LG stat.ML," Classic Topic Models are built under the Bag Of Words assumption, in which
-word position is ignored for simplicity. Besides, symmetric priors are
-typically used in most applications. In order to easily learn topics with
-different properties within the same corpus, we propose a new line of work in
-which the paragraph structure is exploited. Our proposal is based on the
-following assumption: in many text document corpora there are formal
-constraints shared across the whole collection, e.g. sections. When this
-assumption is satisfied, some paragraphs may be related to general concepts
-shared by all documents in the corpus, while others would contain the genuine
-description of documents. Assuming each paragraph can be semantically more
-general, specific, or hybrid, we look for ways to measure this, transferring
-this distinction to topics and being able to learn what we call specific and
-general topics. Experiments show that this is a proper methodology to highlight
-certain paragraphs in structured documents while at the same time learning
-interesting and more diverse topics.
-"
-8103,1806.09828,"Qian Chen, Zhen-Hua Ling, Xiaodan Zhu",Enhancing Sentence Embedding with Generalized Pooling,cs.CL cs.AI cs.LG," Pooling is an essential component of a wide variety of sentence
-representation and embedding models. This paper explores generalized pooling
-methods to enhance sentence embedding. We propose vector-based multi-head
-attention that includes the widely used max pooling, mean pooling, and scalar
-self-attention as special cases. The model benefits from properly designed
-penalization terms to reduce redundancy in multi-head attention.
We evaluate
-the proposed model on three different tasks: natural language inference (NLI),
-author profiling, and sentiment classification. The experiments show that the
-proposed model achieves significant improvements over strong
-sentence-encoding-based methods, resulting in state-of-the-art performances on
-four datasets. The proposed approach can be easily implemented for more
-problems than we discuss in this paper.
-"
-8104,1806.09835,"Daniel Beck, Gholamreza Haffari, Trevor Cohn",Graph-to-Sequence Learning using Gated Graph Neural Networks,cs.CL cs.LG," Many NLP applications can be framed as a graph-to-sequence learning problem.
-Previous work proposing neural architectures in this setting obtained promising
-results compared to grammar-based approaches but still relies on linearisation
-heuristics and/or standard recurrent networks to achieve the best performance.
-In this work, we propose a new model that encodes the full structural
-information contained in the graph. Our architecture couples the recently
-proposed Gated Graph Neural Networks with an input transformation that allows
-nodes and edges to have their own hidden representations, while tackling the
-parameter explosion problem present in previous work. Experimental results show
-that our model outperforms strong baselines in generation from AMR graphs and
-syntax-based neural machine translation.
-"
-8105,1806.09932,"Mohamed Adel, Mohamed Afify and Akram Gaballah","Text-Independent Speaker Verification Based on Deep Neural Networks and
- Segmental Dynamic Time Warping",cs.SD cs.CL eess.AS," In this paper we present a new method for text-independent speaker
-verification that combines segmental dynamic time warping (SDTW) and the
-d-vector approach. The d-vectors, generated from a feed-forward deep neural
-network trained to distinguish between speakers, are used as features to
-perform alignment and hence calculate the overall distance between the
-enrolment and test utterances. We present results on the NIST 2008 data set for
-speaker verification where the proposed method outperforms the conventional
-i-vector baseline with PLDA scores and outperforms the d-vector approach with
-local distances based on cosine and PLDA scores. Also, score combination with
-the i-vector/PLDA baseline leads to significant gains over both methods.
-"
-8106,1806.10090,"Artyom Gadetsky, Ilya Yakubovskiy, Dmitry Vetrov",Conditional Generators of Words Definitions,cs.CL," We explore the recently introduced definition modeling technique, which
-provides a tool for evaluating different distributed vector representations of
-words through modeling dictionary definitions of words. In this work, we study
-the problem of word ambiguities in definition modeling and propose a possible
-solution by employing latent variable modeling and soft attention mechanisms.
-Our quantitative and qualitative evaluation and analysis of the model shows
-that taking into account word ambiguity and polysemy leads to performance
-improvement.
-"
-8107,1806.10201,Gourab Kundu and Avirup Sil and Radu Florian and Wael Hamza,"Neural Cross-Lingual Coreference Resolution and its Application to
- Entity Linking",cs.CL," We propose an entity-centric neural cross-lingual coreference model that
-builds on multi-lingual embeddings and language-independent features. We
-perform both intrinsic and extrinsic evaluations of our model.
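A minimal numpy sketch of multi-head attention pooling in the spirit of
record 8103 above. This is the scalar-attention special case the abstract
mentions (one weight per token per head), not the paper's full vector-based
variant; all shapes and initializations are toy assumptions:

import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention_pool(H, W1, W2):
    """Pool a sentence matrix H (T x d) into one vector per head:
    a (T x heads) score matrix is softmax-normalized over time and
    used to take `heads` weighted averages of the token states."""
    scores = np.tanh(H @ W1) @ W2          # (T, heads)
    A = softmax(scores, axis=0)            # per-head attention over time
    return (A.T @ H).reshape(-1)           # concatenated head outputs

T, d, hidden, heads = 7, 16, 32, 4
rng = np.random.default_rng(2)
H  = rng.normal(size=(T, d))               # token states from an encoder
W1 = rng.normal(size=(d, hidden)) * 0.1
W2 = rng.normal(size=(hidden, heads)) * 0.1
print(multihead_attention_pool(H, W1, W2).shape)   # (64,) = heads * d

Uniform attention recovers mean pooling, and an arbitrarily sharp head
approaches max pooling, which is the sense in which this family generalizes
the standard pooling operators.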
In the intrinsic
-evaluation, we show that our model, when trained on English and tested on
-Chinese and Spanish, achieves results competitive with the models trained
-directly on Chinese and Spanish, respectively. In the extrinsic evaluation, we
-show that our English model helps achieve better entity linking accuracy on
-Chinese and Spanish test sets than the top 2015 TAC system without using any
-annotated data from Chinese or Spanish.
-"
-8108,1806.10215,"Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra
- Khatri, Angeliki Metallinou, Anu Venkatesh, Ariya Rastrow",Contextual Language Model Adaptation for Conversational Agents,cs.CL," Statistical language models (LM) play a key role in Automatic Speech
-Recognition (ASR) systems used by conversational agents. These ASR systems
-should provide a high accuracy under a variety of speaking styles, domains,
-vocabularies and argots. In this paper, we present a DNN-based method to adapt
-the LM to each user-agent interaction based on generalized contextual
-information, by predicting an optimal, context-dependent set of LM
-interpolation weights. We show that this framework for contextual adaptation
-provides accuracy improvements under different possible mixture LM partitions
-that are relevant both for (1) goal-oriented conversational agents, where it is
-natural to partition the data by the requested application, and for (2)
-non-goal-oriented conversational agents, where the data can be partitioned
-using topic labels that come from predictions of a topic classifier. We obtain
-a relative WER improvement of 3% with a 1-pass decoding strategy and 6% in a
-2-pass decoding framework, over an unadapted model. We also show up to a 15%
-relative improvement in recognizing named entities, which is of significant
-value for conversational ASR systems.
-"
-8109,1806.10306,Yerbolat Khassanov and Eng Siong Chng,"Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural
- Network Language Models in ASR",cs.CL eess.AS," In automatic speech recognition (ASR) systems, recurrent neural network
-language models (RNNLM) are used to rescore a word lattice or N-best hypotheses
-list. Due to the expensive training, the RNNLM's vocabulary set accommodates
-only a small shortlist of the most frequent words. This leads to suboptimal
-performance if the input speech contains many out-of-shortlist (OOS) words. An
-effective solution is to increase the shortlist size and retrain the entire
-network, which is highly inefficient. Therefore, we propose an efficient method
-to expand the shortlist set of a pretrained RNNLM without incurring expensive
-retraining and using additional training data. Our method exploits the
-structure of RNNLM which can be decoupled into three parts: input projection
-layer, middle layers, and output projection layer. Specifically, our method
-expands the word embedding matrices in projection layers and keeps the middle
-layers unchanged. In this approach, the functionality of the pretrained RNNLM
-will be correctly maintained as long as OOS words are properly modeled in the
-two embedding spaces. We propose to model the OOS words by borrowing linguistic
-knowledge from appropriate in-shortlist words. Additionally, we propose to
-generate the list of OOS words to expand the vocabulary in an unsupervised
-manner by automatically extracting them from ASR output.
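A minimal sketch of the embedding-borrowing idea in record 8109 above: each
out-of-shortlist word is initialized from the embeddings of in-shortlist
words judged similar, leaving the middle layers untouched. The toy vectors
and the word-to-neighbour mapping are invented for illustration; the paper's
own borrowing rule may differ:

import numpy as np

# In-shortlist embeddings from a pretrained RNNLM's projection layer (toy).
shortlist = {
    "car":   np.array([0.9, 0.1, 0.0, 0.3]),
    "auto":  np.array([0.8, 0.2, 0.1, 0.2]),
    "house": np.array([0.0, 0.9, 0.4, 0.1]),
}
# Hypothetical OOS words mapped to similar in-shortlist words, e.g. via a
# lexical resource or external word vectors.
similar_in_shortlist = {"sedan": ["car", "auto"], "cottage": ["house"]}

def init_oos_embeddings(shortlist, mapping):
    """Each OOS word gets the mean embedding of its in-shortlist
    neighbours, so the pretrained middle layers still see familiar inputs."""
    return {w: np.mean([shortlist[s] for s in sims], axis=0)
            for w, sims in mapping.items()}

expanded = dict(shortlist,
                **init_oos_embeddings(shortlist, similar_in_shortlist))
print(sorted(expanded))   # vocabulary now covers the OOS words too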
-" -8110,1806.10348,"Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun","Learning Visually-Grounded Semantics from Contrastive Adversarial - Samples",cs.CL cs.CV," We study the problem of grounding distributional representations of texts on -the visual domain, namely visual-semantic embeddings (VSE for short). Begin -with an insightful adversarial attack on VSE embeddings, we show the limitation -of current frameworks and image-text datasets (e.g., MS-COCO) both -quantitatively and qualitatively. The large gap between the number of possible -constitutions of real-world semantics and the size of parallel data, to a large -extent, restricts the model to establish the link between textual semantics and -visual concepts. We alleviate this problem by augmenting the MS-COCO image -captioning datasets with textual contrastive adversarial samples. These samples -are synthesized using linguistic rules and the WordNet knowledge base. The -construction procedure is both syntax- and semantics-aware. The samples enforce -the model to ground learned embeddings to concrete concepts within the image. -This simple but powerful technique brings a noticeable improvement over the -baselines on a diverse set of downstream tasks, in addition to defending -known-type adversarial attacks. We release the codes at -https://github.com/ExplorerFreda/VSE-C. -" -8111,1806.10478,"Tommaso Soru, Edgard Marx, Andr\'e Valdestilhas, Diego Esteves, Diego - Moussallem, Gustavo Publio",Neural Machine Translation for Query Construction and Composition,cs.CL cs.AI cs.DB," Research on question answering with knowledge base has recently seen an -increasing use of deep architectures. In this extended abstract, we study the -application of the neural machine translation paradigm for question parsing. We -employ a sequence-to-sequence model to learn graph patterns in the SPARQL graph -query language and their compositions. Instead of inducing the programs through -question-answer pairs, we expect a semi-supervised approach, where alignments -between questions and queries are built through templates. We argue that the -coverage of language utterances can be expanded using late notable works in -natural language generation. -" -8112,1806.10654,"Stefan Gr\""unewald and Sophie Henning and Alexander Koller",Generalized chart constraints for efficient PCFG and TAG parsing,cs.CL," Chart constraints, which specify at which string positions a constituent may -begin or end, have been shown to speed up chart parsers for PCFGs. We -generalize chart constraints to more expressive grammar formalisms and describe -a neural tagger which predicts chart constraints at very high precision. Our -constraints accelerate both PCFG and TAG parsing, and combine effectively with -other pruning techniques (coarse-to-fine and supertagging) for an overall -speedup of two orders of magnitude, while improving accuracy. -" -8113,1806.10722,"Allen Nie, Ashley Zehnder, Rodney L. Page, Arturo L. Pineda, Manuel A. - Rivas, Carlos D. Bustamante, James Zou","DeepTag: inferring all-cause diagnoses from clinical notes in - under-resourced medical domain",cs.CL," Large scale veterinary clinical records can become a powerful resource for -patient care and research. However, clinicians lack the time and resource to -annotate patient records with standard medical diagnostic codes and most -veterinary visits are captured in free text notes. The lack of standard coding -makes it challenging to use the clinical data to improve patient care. 
It is
-also a major impediment to cross-species translational research, which relies
-on the ability to accurately identify patient cohorts with specific diagnostic
-criteria in humans and animals. In order to reduce the coding burden for
-veterinary clinical practice and aid translational research, we have developed
-a deep learning algorithm, DeepTag, which automatically infers diagnostic codes
-from veterinary free text notes. DeepTag is trained on a newly curated dataset
-of 112,558 veterinary notes manually annotated by experts. DeepTag extends
-multi-task LSTM with an improved hierarchical objective that captures the
-semantic structures between diseases. To foster human-machine collaboration,
-DeepTag also learns to abstain on examples where it is uncertain, deferring
-them to human experts, resulting in improved performance. DeepTag accurately
-infers disease codes from free text even in challenging cross-hospital settings
-where the text comes from different clinical settings than the ones used for
-training. It enables automated disease annotation across a broad range of
-clinical diagnoses with minimal pre-processing. The technical framework in this
-work can be applied in other medical domains that currently lack medical coding
-resources.
-"
-8114,1806.10771,"Andrew Matteson, Chanhee Lee, Young-Bum Kim, Heuiseok Lim","Rich Character-Level Information for Korean Morphological Analysis and
- Part-of-Speech Tagging",cs.CL," Due to the fact that Korean is a highly agglutinative, character-rich
-language, previous work on Korean morphological analysis typically employs
-sub-character features known as graphemes or otherwise utilizes
-comprehensive prior linguistic knowledge (i.e., a dictionary of known
-morphological transformation forms, or actions). These models have been created
-with the assumption that character-level, dictionary-less morphological
-analysis was intractable due to the number of actions required. We present, in
-this study, a multi-stage action-based model that can perform morphological
-transformation and part-of-speech tagging using arbitrary units of input and
-apply it to the case of character-level Korean morphological analysis. Among
-models that do not employ prior linguistic knowledge, we achieve
-state-of-the-art word and sentence-level tagging accuracy with the Sejong
-Korean corpus using our proposed data-driven Bi-LSTM model.
-"
-8115,1806.11099,"Taylor Arnold, Nicolas Ballier, Thomas Gaillat, Paula Liss\`on","Predicting CEFRL levels in learner English on the basis of metrics and
- full texts",cs.CL," This paper analyses the contribution of language metrics and, potentially, of
-linguistic structures, to classifying French learners of English according to
-levels of the Common European Framework of Reference for Languages (CEFRL). The
-purpose is to build a model for the prediction of learner levels as a function
-of language complexity features. We used the EFCAMDAT corpus, a database of one
-million written assignments by learners. After applying language complexity
-metrics to the texts, we built a representation matching the language metrics
-of the texts to their assigned CEFRL levels. Lexical and syntactic metrics were
-computed with LCA, LSA, and koRpus. Several supervised learning models were
-built by using Gradient Boosted Trees and Keras Neural Network methods and by
-contrasting pairs of CEFRL levels.
Results show that it is possible to
-implement pairwise distinctions, especially for levels ranging from A1 to B1
-(A1=>A2: 0.916 AUC and A2=>B1: 0.904 AUC). Model explanation reveals the
-linguistic features that contribute most to predictiveness in the corpus. Word
-tokens and word types appear to play a significant role in determining levels.
-This shows that levels are highly dependent on specific semantic profiles.
-"
-8116,1806.11183,Taylor Arnold and Lauren Tilton,"Cross-Discourse and Multilingual Exploration of Textual Corpora with the
-  DualNeighbors Algorithm",cs.CL," Word choice is dependent on the cultural context of writers and their
-subjects. Different words are used to describe similar actions, objects, and
-features based on factors such as class, race, gender, geography and political
-affinity. Exploratory techniques based on locating and counting words may,
-therefore, lead to conclusions that reinforce culturally inflected boundaries.
-We offer a new method, the DualNeighbors algorithm, for linking thematically
-similar documents both within and across discursive and linguistic barriers to
-reveal cross-cultural connections. Qualitative and quantitative evaluations of
-this technique are shown as applied to two cultural datasets of interest to
-researchers across the humanities and social sciences. An open-source
-implementation of the DualNeighbors algorithm is provided to assist in its
-application.
-"
-8117,1806.11249,"Fandong Meng, Zhaopeng Tu, Yong Cheng, Haiyang Wu, Junjie Zhai, Yuekui
-  Yang, Di Wang",Neural Machine Translation with Key-Value Memory-Augmented Attention,cs.CL," Although attention-based Neural Machine Translation (NMT) has achieved
-remarkable progress in recent years, it still suffers from issues of repeating
-and dropping translations. To alleviate these issues, we propose a novel
-key-value memory-augmented attention model for NMT, called KVMEMATT.
-Specifically, we maintain a timely updated key-memory to keep track of
-attention history and a fixed value-memory to store the representation of the
-source sentence throughout the whole translation process. Via nontrivial
-transformations and iterative interactions between the two memories, the
-decoder focuses on more appropriate source word(s) for predicting the next
-target word at each decoding step, and can therefore improve the adequacy of
-translations. Experimental results on Chinese=>English and WMT17
-German<=>English translation tasks demonstrate the superiority of the proposed
-model.
-"
-8118,1806.11316,"Oluwaseun Ajao, Deepayan Bhowmik and Shahrzad Zargari",Fake News Identification on Twitter with Hybrid CNN and RNN Models,cs.SI cs.CL," The problem associated with the propagation of fake news continues to grow
-at an alarming scale. This trend has generated much interest from politics to
-academia and industry alike. We propose a framework that detects and
-classifies fake news messages from Twitter posts using a hybrid of
-convolutional neural networks and long short-term memory recurrent neural
-network models. The proposed work using this deep learning approach achieves
-82% accuracy. Our approach intuitively identifies relevant features associated
-with fake news stories without previous knowledge of the domain.
-"
-8119,1806.11322,Nicholas Asher and Soumya Paul,Bias in Semantic and Discourse Interpretation,cs.CL," In this paper, we show how game-theoretic work on conversation combined with
-a theory of discourse structure provides a framework for studying interpretive
-bias.
Interpretive bias is an essential feature of learning and understanding
-but also something that can be used to pervert or subvert the truth. The
-framework we develop here provides tools for understanding and analyzing the
-range of interpretive biases and the factors that contribute to them.
-"
-8120,1806.11420,"Chandrakant Bothe, Sven Magg, Cornelius Weber, Stefan Wermter","Discourse-Wizard: Discovering Deep Discourse Structure in your
-  Conversation with RNNs",cs.CL cs.HC cs.LG cs.NE," Spoken language understanding is one of the key factors in a dialogue
-system, and the context of a conversation plays an important role in
-understanding the current utterance. In this work, we demonstrate the
-importance of context within the dialogue for neural network models through an
-online web interface live demo. We developed two different neural models: a
-model that does not use context and a context-based model. The no-context model
-classifies dialogue acts at the utterance level, whereas the context-based
-model takes some preceding utterances into account. We make these trained
-neural models available as a live demo called Discourse-Wizard using a modular
-server architecture. The live demo provides an easy-to-use interface for
-conversational analysis and for discovering deep discourse structures in a
-conversation.
-"
-8121,1806.11432,"Richard Diehl Martinez, John Kaleialoha Kamalu",Using General Adversarial Networks for Marketing: A Case Study of Airbnb,cs.CL," In this paper, we examine the use case of generative adversarial networks
-(GANs) in the field of marketing. In particular, we analyze how GAN models can
-replicate text patterns from successful product listings on Airbnb, a
-peer-to-peer online market for short-term apartment rentals. To do so, we
-define the Diehl-Martinez-Kamalu (DMK) loss function as a new class of
-functions that forces the model's generated output to include a set of
-user-defined keywords. This allows the generative adversarial network to
-recommend a way of rewording the phrasing of a listing description to increase
-the likelihood that it is booked. Although we tailor our analysis to Airbnb
-data, we believe this framework establishes a more general model for how
-generative algorithms can be used to produce text samples for the purposes of
-marketing.
-"
-8122,1806.11461,"Matthew Roddy, Gabriel Skantze, Naomi Harte","Investigating Speech Features for Continuous Turn-Taking Prediction
-  Using LSTMs",cs.CL," For spoken dialog systems to conduct fluid conversational interactions with
-users, the systems must be sensitive to turn-taking cues produced by a user.
-Models should be designed so that effective decisions can be made as to when it
-is appropriate, or not, for the system to speak. Traditional end-of-turn
-models, where decisions are made at utterance end-points, are limited in their
-ability to model fast turn-switches and overlap. A more flexible approach is to
-model turn-taking in a continuous manner using RNNs, where the system predicts
-speech probability scores for discrete frames within a future window. The
-continuous predictions represent generalized turn-taking behaviors observed in
-the training data and can be applied to make decisions that are not just
-limited to end-of-turn detection. In this paper, we investigate optimal
-speech-related feature sets for making predictions at pauses and overlaps in
-conversation. We find that while traditional acoustic features perform well,
-part-of-speech features generally perform worse than word features.
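A minimal sketch of the continuous turn-taking setup just described, assuming PyTorch; the class name and all feature dimensions are illustrative, not the authors' implementation. An LSTM reads per-frame features and predicts a speech probability for each frame in a future window:

```python
import torch
import torch.nn as nn

class TurnTakingLSTM(nn.Module):
    def __init__(self, n_feats=130, hidden=64, future_frames=20):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.out = nn.Linear(hidden, future_frames)  # one score per future frame

    def forward(self, x):                  # x: (batch, time, n_feats)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h))  # (batch, time, future_frames)

model = TurnTakingLSTM()
frames = torch.randn(2, 100, 130)          # toy acoustic + linguistic features
probs = model(frames)                      # speech probability per future frame
print(probs.shape)                         # torch.Size([2, 100, 20])
```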
We show
-that our current models outperform previously reported baselines.
-"
-8123,1806.11525,"Xingdi Yuan, Marc-Alexandre C\^ot\'e, Alessandro Sordoni, Romain
-  Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler",Counting to Explore and Generalize in Text-based Games,cs.CL cs.LG," We propose a recurrent RL agent with an episodic exploration mechanism that
-helps discover good policies in text-based game environments. We show
-promising results on a set of generated text-based games of varying difficulty
-where the goal is to collect a coin located at the end of a chain of rooms. In
-contrast to previous text-based RL approaches, we observe that our agent learns
-policies that generalize to unseen games of greater difficulty.
-"
-8124,1806.11532,"Marc-Alexandre C\^ot\'e, \'Akos K\'ad\'ar, Xingdi Yuan, Ben Kybartas,
-  Tavian Barnes, Emery Fine, James Moore, Ruo Yu Tao, Matthew Hausknecht, Layla
-  El Asri, Mahmoud Adada, Wendy Tay, Adam Trischler",TextWorld: A Learning Environment for Text-based Games,cs.LG cs.CL stat.ML," We introduce TextWorld, a sandbox learning environment for the training and
-evaluation of RL agents on text-based games. TextWorld is a Python library that
-handles interactive play-through of text games, as well as backend functions
-like state tracking and reward assignment. It comes with a curated list of
-games whose features and challenges we have analyzed. More significantly, it
-enables users to handcraft or automatically generate new games. Its generative
-mechanisms give precise control over the difficulty, scope, and language of
-constructed games, and can be used to relax challenges inherent to commercial
-text games like partial observability and sparse rewards. By generating sets of
-varied but similar games, TextWorld can also be used to study generalization
-and transfer learning. We cast text-based games in the Reinforcement Learning
-formalism, use our framework to develop a set of benchmark games, and evaluate
-several baseline agents on this set and the curated list.
-"
-8125,1807.00072,Joo-Kyung Kim and Young-Bum Kim,"Joint Learning of Domain Classification and Out-of-Domain Detection with
-  Dynamic Class Weighting for Satisficing False Acceptance Rates",cs.CL," In domain classification for spoken dialog systems, correct detection of
-out-of-domain (OOD) utterances is crucial because it reduces confusion and
-unnecessary interaction costs between users and the systems. Previous work
-usually utilizes OOD detectors that are trained separately from in-domain (IND)
-classifiers, and confidence thresholding for OOD detection given target
-evaluation scores. In this paper, we introduce a neural joint learning model
-for domain classification and OOD detection, where dynamic class weighting is
-used during the model training to satisfice a given OOD false acceptance rate
-(FAR) while maximizing the domain classification accuracy. Evaluating on two
-domain classification tasks for the utterances from a large spoken dialogue
-system, we show that our approach significantly improves the domain
-classification performance while satisficing given target FARs.
-"
-8126,1807.00099,"Braden Hancock, Hongrae Lee, Cong Yu",Generating Titles for Web Tables,cs.CL cs.LG stat.ML," Descriptive titles provide crucial context for interpreting tables that are
-extracted from web pages and are a key component of table-based web
-applications. Prior approaches have attempted to produce titles by selecting
-existing text snippets associated with the table.
These approaches, however,
-are limited by their dependence on suitable titles existing a priori. In our
-user study, we observe that the relevant information for the title tends to be
-scattered across the page, and often--more than 80% of the time--does not
-appear verbatim anywhere in the page. We propose instead the application of a
-sequence-to-sequence neural network model as a more generalizable means of
-generating high-quality titles. This is accomplished by extracting many text
-snippets that have potentially relevant information to the table, encoding them
-into an input sequence, and using both copy and generation mechanisms in the
-decoder to balance relevance and readability of the generated title. We
-validate this approach with human evaluation on sample web tables and report
-that while sequence models with only a copy mechanism or only a generation
-mechanism are easily outperformed by simple selection-based baselines, the
-model with both capabilities outperforms them all, approaching the quality of
-crowdsourced titles while training on fewer than ten thousand examples. To the
-best of our knowledge, the proposed technique is the first to consider text
-generation methods for table titles and establishes a new state of the art.
-"
-8127,1807.00122,"Sanaz Bahargam, Evangelos E. Papalexakis","A Constrained Coupled Matrix-Tensor Factorization for Learning
-  Time-evolving and Emerging Topics",cs.IR cs.CL cs.LG stat.ML," Topic discovery has witnessed significant growth as a field of data mining
-at large. In particular, time-evolving topic discovery, where the evolution of
-a topic is taken into account, has been instrumental in understanding the
-historical context of an emerging topic in a dynamic corpus. Traditionally,
-time-evolving topic discovery has focused on this notion of time. However,
-especially in settings where content is contributed by a community or a crowd,
-an orthogonal notion of time is the one that pertains to the level of expertise
-of the content creator: the more experienced the creator, the more advanced the
-topic. In this paper, we propose a novel time-evolving topic discovery method
-which, in addition to the extracted topics, is able to identify the evolution
-of that topic over time, as well as the level of difficulty of that topic, as
-inferred from the level of expertise of its main contributors. Our method
-is based on a novel formulation of Constrained Coupled Matrix-Tensor
-Factorization, which adopts constraints that are well motivated and, as we
-demonstrate, essential for high-quality topic discovery. We qualitatively
-evaluate our approach using real data from the Physics and Programming
-Stack Exchange forums, and we were able to identify topics of varying levels of
-difficulty which can be linked to external events, such as the announcement of
-gravitational waves by the LIGO lab in the Physics forum. We provide a
-quantitative evaluation of our method by conducting a user study where experts
-were asked to judge the coherence and quality of the extracted topics. Finally,
-our proposed method has implications for automatic curriculum design using the
-extracted topics, where the notion of the level of difficulty is necessary for
-the proper modeling of prerequisites and advanced concepts.
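To make the coupled factorization idea concrete, here is a minimal NumPy sketch under stated assumptions: it is the *unconstrained* coupled objective (the paper's contribution is the constrained variant), with gradient descent instead of the authors' solver, and all sizes and data are toy values. A tensor X and a matrix Y coupled on the first mode share the factor A:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, M, R = 30, 20, 5, 8, 4        # toy sizes; R = number of topics
X = rng.random((I, J, K))              # e.g., word x doc x expertise-level tensor
Y = rng.random((I, M))                 # side matrix coupled on the word mode

A, B, C, D = (rng.normal(scale=0.1, size=s)
              for s in [(I, R), (J, R), (K, R), (M, R)])
lr = 0.01
for _ in range(500):
    Xhat = np.einsum("ir,jr,kr->ijk", A, B, C)   # CP reconstruction
    Ex, Ey = Xhat - X, A @ D.T - Y               # residuals of both fits
    gA = np.einsum("ijk,jr,kr->ir", Ex, B, C) + Ey @ D
    gB = np.einsum("ijk,ir,kr->jr", Ex, A, C)
    gC = np.einsum("ijk,ir,jr->kr", Ex, A, B)
    gD = Ey.T @ A
    A, B, C, D = A - lr * gA, B - lr * gB, C - lr * gC, D - lr * gD

print("tensor fit:", np.linalg.norm(np.einsum("ir,jr,kr->ijk", A, B, C) - X))
```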
-" -8128,1807.00181,Ted Underwood,The Historical Significance of Textual Distances,cs.CL cs.CY cs.DL," Measuring similarity is a basic task in information retrieval, and now often -a building-block for more complex arguments about cultural change. But do -measures of textual similarity and distance really correspond to evidence about -cultural proximity and differentiation? To explore that question empirically, -this paper compares textual and social measures of the similarities between -genres of English-language fiction. Existing measures of textual similarity -(cosine similarity on tf-idf vectors or topic vectors) are also compared to new -strategies that use supervised learning to anchor textual measurement in a -social context. -" -8129,1807.00248,"Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi","A Shared Attention Mechanism for Interpretation of Neural Automatic - Post-Editing Systems",cs.CL," Automatic post-editing (APE) systems aim to correct the systematic errors -made by machine translators. In this paper, we propose a neural APE system that -encodes the source (src) and machine translated (mt) sentences with two -separate encoders, but leverages a shared attention mechanism to better -understand how the two inputs contribute to the generation of the post-edited -(pe) sentences. Our empirical observations have showed that when the mt is -incorrect, the attention shifts weight toward tokens in the src sentence to -properly edit the incorrect translation. The model has been trained and -evaluated on the official data from the WMT16 and WMT17 APE IT domain -English-German shared tasks. Additionally, we have used the extra 500K -artificial data provided by the shared task. Our system has been able to -reproduce the accuracies of systems trained with the same data, while at the -same time providing better interpretability. -" -8130,1807.00267,"Raghav Gupta, Abhinav Rastogi and Dilek Hakkani-Tur","An Efficient Approach to Encoding Context for Spoken Language - Understanding",cs.CL," In task-oriented dialogue systems, spoken language understanding, or SLU, -refers to the task of parsing natural language user utterances into semantic -frames. Making use of context from prior dialogue history holds the key to more -effective SLU. State of the art approaches to SLU use memory networks to encode -context by processing multiple utterances from the dialogue at each turn, -resulting in significant trade-offs between accuracy and computational -efficiency. On the other hand, downstream components like the dialogue state -tracker (DST) already keep track of the dialogue state, which can serve as a -summary of the dialogue history. In this work, we propose an efficient approach -to encoding context from prior utterances for SLU. More specifically, our -architecture includes a separate recurrent neural network (RNN) based encoding -module that accumulates dialogue context to guide the frame parsing sub-tasks -and can be shared between SLU and DST. In our experiments, we demonstrate the -effectiveness of our approach on dialogues from two domains. -" -8131,1807.00286,"Manuel Mager and Elisabeth Mager and Alfonso Medina-Urrea and Ivan - Meza and Katharina Kann","Lost in Translation: Analysis of Information Loss During Machine - Translation Between Polysynthetic and Fusional Languages",cs.CL," Machine translation from polysynthetic to fusional languages is a challenging -task, which gets further complicated by the limited amount of parallel text -available. 
Thus, translation performance is far from the state of the art for
-high-resource and more intensively studied language pairs. To shed light on the
-phenomena which hamper automatic translation to and from polysynthetic
-languages, we study translations from three low-resource, polysynthetic
-languages (Nahuatl, Wixarika and Yorem Nokki) into Spanish and vice versa. In
-doing so, we find that in a morpheme-to-morpheme alignment an important amount
-of information contained in polysynthetic morphemes has no Spanish counterpart,
-and its translation is often omitted. We further conduct a qualitative analysis
-and, thus, identify morpheme types that are commonly hard to align or ignored
-in the translation process.
-"
-8132,1807.00303,"Vinicius Woloszyn, Guilherme Medeiros Machado, Leandro Krug Wives,
-  Jos\'e Palazzo Moreira de Oliveira","Modeling, comprehending and summarizing textual content by graphs",cs.CL cs.IR," Automatic Text Summarization strategies have been successfully employed to
-digest text collections and extract its essential content. Usually, summaries
-are generated using textual corpora that belong to the same domain area where
-the summary will be used. Nonetheless, there are special cases where enough
-textual sources cannot be found, and one possible alternative is to generate a
-summary from a different domain. One way to summarize texts is to use a
-graph model. This model makes it possible to give more importance to words
-corresponding to the main concepts from the target domain found in the
-summarized text. This gives the reader an overview of the main text concepts as
-well as their relationships. However, this kind of summarization presents a
-significant number of repeated terms when compared to human-generated
-summaries. In this paper, we present an approach to produce graph-model
-extractive summaries of texts, meeting the target domain's exigencies and
-addressing the term-repetition problem. To evaluate the proposition, we
-performed a series of experiments showing that the proposed approach
-statistically improves the performance of a model based on Graph Centrality,
-achieving better coverage, accuracy, and recall.
-"
-8133,1807.00488,"Zhu Kaili, Chuan Wang, Ruobing Li, Yang Liu, Tianlei Hu and Hui Lin","A Simple but Effective Classification Model for Grammatical Error
-  Correction",cs.CL," We treat grammatical error correction (GEC) as a classification problem in
-this study, where for different types of errors, a target word is identified,
-and the classifier predicts the correct word form from a set of possible
-choices. We propose a novel neural network based feature representation and
-classification model, trained using large text corpora without human
-annotations. Specifically we use RNNs with attention to represent both the left
-and right context of a target word. All feature embeddings are learned jointly
-in an end-to-end fashion. Experimental results show that our novel approach
-outperforms other classifier methods on the CoNLL-2014 test set (F0.5 45.05%).
-Our model is simple but effective, and is suitable for industrial production.
-"
-8134,1807.00543,"Piotr \.Zelasko, Piotr Szyma\'nski, Jan Mizgajski, Adrian Szymczak,
-  Yishay Carmiel, Najim Dehak",Punctuation Prediction Model for Conversational Speech,cs.CL," An ASR system usually does not predict any punctuation or capitalization.
-Lack of punctuation causes problems in result presentation and confuses both
-the human reader and off-the-shelf natural language processing algorithms.
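As a concrete illustration of the classification view of GEC from the abstract above (1807.00488), here is a minimal PyTorch sketch, under stated assumptions: it encodes left and right context with two LSTMs and predicts from a small confusion set, omitting the paper's attention component; the class name and all dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class GECClassifier(nn.Module):
    # Encode the left and right context of a target word and predict the
    # correct form from a fixed set of choices (e.g., a / an / the).
    def __init__(self, vocab=10000, emb=64, hidden=64, n_choices=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.left = nn.LSTM(emb, hidden, batch_first=True)
        self.right = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_choices)

    def forward(self, left_ctx, right_ctx):   # token-id tensors (batch, len)
        _, (hl, _) = self.left(self.embed(left_ctx))
        _, (hr, _) = self.right(self.embed(right_ctx))  # right_ctx pre-reversed
        return self.out(torch.cat([hl[-1], hr[-1]], dim=-1))  # choice logits

model = GECClassifier()
logits = model(torch.randint(0, 10000, (4, 12)), torch.randint(0, 10000, (4, 12)))
print(logits.shape)  # torch.Size([4, 3])
```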
To
-overcome these limitations, we train two variants of Deep Neural Network (DNN)
-sequence labelling models - a Bidirectional Long Short-Term Memory (BLSTM) and
-a Convolutional Neural Network (CNN), to predict the punctuation. The models
-are trained on the Fisher corpus which includes punctuation annotation. In our
-experiments, we combine time-aligned and punctuated Fisher corpus transcripts
-using a sequence alignment algorithm. The neural networks are trained on Common
-Web Crawl GloVe embeddings of the words in Fisher transcripts aligned with
-conversation side indicators and word time information. The CNNs yield better
-precision and the BLSTMs tend to have better recall. While BLSTMs make fewer
-mistakes overall, the punctuation predicted by the CNN is more accurate -
-especially in the case of question marks. Our results constitute significant
-evidence that the distribution of words in time, as well as pre-trained
-embeddings, can be useful in the punctuation prediction task.
-"
-8135,1807.00560,"Sihao Xue, Zhenyi Ying, Fan Mo, Min Wang, Jue Sun",Weight-importance sparse training in keyword spotting,cs.LG cs.CL stat.ML," Large models are implemented in recent ASR systems to deal with complex
-speech recognition problems. The number of parameters in these models makes
-them hard to deploy, especially on resource-constrained devices such as car
-tablets. Moreover, ASR systems are most often used for real-time problems such
-as keyword spotting (KWS), which conflicts with the long computation time that
-large models require. To deal with this problem, we apply sparse algorithms to
-reduce the number of parameters in a widely used model, Deep Neural Network
-(DNN) KWS, which requires very short computation time. We can prune more than
-90%, even 95%, of the parameters in the model with only a tiny decline in
-performance, and the sparse model performs better than baseline models with
-the same order of number of parameters. In addition, sparse algorithms can
-find a rational model size automatically for a given problem, without the
-need to choose an original model size beforehand.
-"
-8136,1807.00571,"Jose Camacho-Collados and Luis Espinosa-Anke and Mohammad Taher
-  Pilehvar",The Interplay between Lexical Resources and Natural Language Processing,cs.CL," Incorporating linguistic, world and common sense knowledge into AI/NLP
-systems is currently an important research area, with several open problems and
-challenges. At the same time, processing and storing this knowledge in lexical
-resources is not a straightforward task. This tutorial proposes to address
-these complementary goals from two methodological perspectives: the use of NLP
-methods to help the process of constructing and enriching lexical resources and
-the use of lexical resources for improving NLP applications. Two main types of
-audience can benefit from this tutorial: those working on language resources
-who are interested in becoming acquainted with automatic NLP techniques, with
-the end goal of speeding up and/or easing the process of resource curation;
-and on the other hand, researchers in NLP who would like to benefit from the
-knowledge of lexical resources to improve their systems and models. The slides
-of the tutorial are available at https://bitbucket.org/luisespinosa/lr-nlp/
-"
-8137,1807.00651,"Marta R.
Costa-juss\`a, Marcos Zampieri, Santanu Pal",A Neural Approach to Language Variety Translation,cs.CL," In this paper we present the first neural-based machine translation system
-trained to translate between standard national varieties of the same language.
-We take the pair Brazilian - European Portuguese as an example and compare the
-performance of this method to a phrase-based statistical machine translation
-system. We report a performance improvement of 0.9 BLEU points in translating
-from European to Brazilian Portuguese and 0.2 BLEU points when translating in
-the opposite direction. We also carried out a human evaluation experiment with
-native speakers of Brazilian Portuguese which indicates that humans prefer the
-output produced by the neural-based system in comparison to the statistical
-system.
-"
-8138,1807.00717,"Mark-Christoph M\""uller and Michael Strube","Transparent, Efficient, and Robust Word Embedding Access with WOMBAT",cs.CL," We present WOMBAT, a Python tool which supports NLP practitioners in
-accessing word embeddings from code. WOMBAT addresses common research problems,
-including unified access, scaling, and robust and reproducible preprocessing.
-Code that uses WOMBAT for accessing word embeddings is not only cleaner, more
-readable, and easier to reuse, but also much more efficient than code using
-standard in-memory methods: a Python script using WOMBAT for evaluating seven
-large word embedding collections (8.7M embedding vectors in total) on a simple
-SemEval sentence similarity task involving 250 raw sentence pairs completes in
-under ten seconds end-to-end on a standard notebook computer.
-"
-8139,1807.00735,"Yang-Hui He, Vishnu Jejjala, Brent D. Nelson",hep-th,cs.CL hep-th," We apply techniques in natural language processing, computational
-linguistics, and machine-learning to investigate papers in hep-th and four
-related sections of the arXiv: hep-ph, hep-lat, gr-qc, and math-ph. All of the
-titles of papers in each of these sections, from the inception of the arXiv
-until the end of 2017, are extracted and treated as a corpus which we use to
-train the neural network Word2Vec. A comparative study of common n-grams,
-linear syntactical identities, word clouds and word similarities is carried
-out. We find notable scientific and sociological differences between the
-fields. In conjunction with support vector machines, we also show that the
-syntactic structure of the titles in different sub-fields of high energy and
-mathematical physics are sufficiently different that a neural network can
-perform a binary classification of formal versus phenomenological sections
-with 87.1% accuracy, and can perform a finer five-fold classification across
-all sections with 65.1% accuracy.
-"
-8140,1807.00745,Michael A. Hedderich and Dietrich Klakow,"Training a Neural Network in a Low-Resource Setting on Automatically
-  Annotated Noisy Data",cs.LG cs.CL stat.ML," Manually labeled corpora are expensive to create and often not available for
-low-resource languages or domains. Automatic labeling approaches are an
-alternative way to obtain labeled data in a quicker and cheaper way. However,
-these labels often contain more errors, which can deteriorate a classifier's
-performance when trained on this data. We propose a noise layer that is added
-to a neural network architecture. This allows modeling the noise and training
-on a combination of clean and noisy data.
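The noise layer just described admits a compact implementation. A minimal sketch, assuming PyTorch; the class name, base model, and dimensions are illustrative, not the authors' code. The clean-label distribution of a base classifier is multiplied by a learnable label-transition matrix when training on automatically annotated batches:

```python
import torch
import torch.nn as nn

class NoisyChannel(nn.Module):
    # Wraps a base classifier with a learnable transition matrix whose rows
    # model P(noisy label | clean label); initialized to the identity.
    def __init__(self, base, n_classes):
        super().__init__()
        self.base = base
        self.noise = nn.Parameter(torch.eye(n_classes))

    def forward(self, x, noisy=False):
        p_clean = torch.softmax(self.base(x), dim=-1)
        if not noisy:                          # clean (human-labeled) batch
            return p_clean
        T = torch.softmax(self.noise, dim=1)   # row-stochastic noise matrix
        return p_clean @ T                     # noisy-label distribution

model = NoisyChannel(nn.Linear(20, 5), n_classes=5)  # toy base classifier
x, y_noisy = torch.randn(8, 20), torch.randint(0, 5, (8,))
loss = nn.NLLLoss()(torch.log(model(x, noisy=True) + 1e-9), y_noisy)
loss.backward()
```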
We show that in a low-resource NER task we
-can improve performance by up to 35% by using additional, noisy data and
-handling the noise.
-"
-8141,1807.00752,Akihiro Kato and Tomi Kinnunen,"Waveform to Single Sinusoid Regression to Estimate the F0 Contour from
-  Noisy Speech Using Recurrent Deep Neural Networks",eess.AS cs.CL cs.SD stat.ML," The fundamental frequency (F0) represents pitch in speech that determines
-prosodic characteristics of speech and is needed in various tasks for speech
-analysis and synthesis. Despite decades of research on this topic, F0
-estimation at low signal-to-noise ratios (SNRs) in unexpected noise conditions
-remains difficult. This work proposes a new approach to noise robust F0
-estimation using a recurrent neural network (RNN) trained in a supervised
-manner. Recent studies employ deep neural networks (DNNs) for F0 tracking as a
-frame-by-frame classification task into quantised frequency states but we
-propose waveform-to-sinusoid regression instead to achieve both noise
-robustness and accurate estimation with increased frequency resolution.
- Experimental results with the PTDB-TUG corpus contaminated by additive noise
-(NOISEX-92) demonstrate that the proposed method improves gross pitch error
-(GPE) rate and fine pitch error (FPE) by more than 35 % at SNRs between -10 dB
-and +10 dB compared with the well-known noise-robust F0 tracker PEFAC.
-Furthermore, the proposed method also outperforms state-of-the-art DNN-based
-approaches by more than 15 % in terms of both FPE and GPE rate over the
-preceding SNR range.
-"
-8142,1807.00775,Sven Buechel and Udo Hahn,"Representation Mapping: A Novel Approach to Generate High-Quality
-  Multi-Lingual Emotion Lexicons",cs.CL," In recent years, sentiment analysis has increasingly shifted attention to
-representational frameworks more expressive than semantic polarity (being
-positive, negative or neutral). However, these richer formats (like Basic
-Emotions or Valence-Arousal-Dominance, and variants therefrom), rooted in
-psychological research, tend to proliferate the number of representation
-schemes for emotion encoding. Thus, a large number of representationally
-incompatible emotion lexicons have been developed by various research groups
-adopting one or the other emotion representation format. As a consequence, the
-reusability of these resources decreases as does the comparability of systems
-using them. In this paper, we propose to solve this dilemma by methods and
-tools which map different representation formats onto each other for the sake
-of mutual compatibility and interoperability of language resources. We present
-the first large-scale investigation of such representation mappings for four
-typologically diverse languages and find evidence that our approach produces
-(near-)gold quality emotion lexicons, even in cross-lingual settings. Finally,
-we use our models to create new lexicons for eight typologically diverse
-languages.
-"
-8143,1807.00791,"Aliaksei Vertsel, Mikhail Rumiantsau","Pragmatic approach to structured data querying via natural language
-  interface",cs.CL," As the use of technology increases and data analysis becomes integral to
-many businesses, the ability to quickly access and interpret data has become
-more important than ever. Information retrieval technologies are being
-utilized by organizations and companies to manage their information systems
-and processes.
-Although information retrieval over large amounts of data organized in
-relational databases is efficient, a user still needs to master the DB
-language/schema to formulate the queries completely. This puts a burden on
-organizations and companies to hire employees that are proficient in DB
-languages/schemas to formulate queries. To reduce some of the burden on already
-overstretched data teams, many organizations are looking for tools that allow
-non-developers to query their databases. Unfortunately, writing a valid SQL
-query that answers the question a user is trying to ask isn't always easy. Even
-seemingly simple questions, like ""Which start-up companies received more than
-$200M in funding?"" can actually be very hard to answer, let alone convert into
-a SQL query. How do you define start-up companies? By size, location, duration
-of time they have been incorporated? This may be fine if a user is working with
-a database they're already familiar with, but what if users are not familiar
-with the database? What is needed is a centralized system that can effectively
-translate natural language queries into specific database queries for different
-customer database types. There are a number of factors that can dramatically
-affect the system architecture and the set of algorithms used to translate NL
-queries into a structured query representation.
-"
-8144,1807.00818,"Daniil Anastasyev, Ilya Gusev, Eugene Indenbom","Improving part-of-speech tagging via multi-task learning and
-  character-level word representations",cs.CL cs.AI cs.LG stat.ML," In this paper, we explore the ways to improve POS-tagging using various types
-of auxiliary losses and different word representations. As a baseline, we
-utilized a BiLSTM tagger, which is able to achieve state-of-the-art results on
-the sequence labelling tasks. We developed a new method for character-level
-word representation using a feedforward neural network. This representation
-gave us better results in terms of speed and performance of the model. We also
-applied a novel technique of pretraining such word representations with
-existing word vectors. Finally, we designed a new variant of auxiliary loss for
-sequence labelling tasks: an additional prediction of the neighbour labels.
-This loss forces a model to learn the dependencies inside a sequence of labels
-and accelerates the process of training. We test these methods on the English
-and Russian languages.
-"
-8145,1807.00868,"Vladimir Bataev, Maxim Korenevsky, Ivan Medennikov, Alexander
-  Zatvornitskiy",Exploring End-to-End Techniques for Low-Resource Speech Recognition,cs.SD cs.CL eess.AS," In this work we present a simple grapheme-based system for low-resource
-speech recognition using Babel data for Turkish spontaneous speech (80 hours).
-We investigated the performance of different neural network architectures,
-including fully-convolutional, recurrent and ResNet with GRU. Different
-features and normalization techniques are compared as well. We also propose a
-CTC-loss modification that uses segmentation during training, which leads to
-improvements when decoding with a small beam size. Our best model achieved a
-word error rate of 45.8%, which, to our knowledge, is the best reported result
-for end-to-end systems using in-domain data on this task.
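For reference, the CTC training objective used by such end-to-end systems is available off the shelf. A minimal sketch with PyTorch's `nn.CTCLoss` (all sizes here are toy values, and this is not the authors' modified loss, just the standard CTC it builds on):

```python
import torch
import torch.nn as nn

# Toy setup: log-probs over T=50 frames, batch of 2, 30 graphemes + blank(0).
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
log_probs = torch.randn(50, 2, 31, requires_grad=True).log_softmax(-1)
targets = torch.randint(1, 31, (2, 12))       # grapheme ids, no blanks
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 50),
           target_lengths=torch.full((2,), 12))
loss.backward()                               # gradients flow to the network
print(float(loss))
```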
-" -8146,1807.00914,"Edoardo Maria Ponti, Helen O'Horan, Yevgeni Berzak, Ivan Vuli\'c, Roi - Reichart, Thierry Poibeau, Ekaterina Shutova, Anna Korhonen","Modeling Language Variation and Universals: A Survey on Typological - Linguistics for Natural Language Processing",cs.CL," Linguistic typology aims to capture structural and semantic variation across -the world's languages. A large-scale typology could provide excellent guidance -for multilingual Natural Language Processing (NLP), particularly for languages -that suffer from the lack of human labeled resources. We present an extensive -literature survey on the use of typological information in the development of -NLP techniques. Our survey demonstrates that to date, the use of information in -existing typological databases has resulted in consistent but modest -improvements in system performance. We show that this is due to both intrinsic -limitations of databases (in terms of coverage and feature granularity) and -under-employment of the typological features included in them. We advocate for -a new approach that adapts the broad and discrete nature of typological -categories to the contextual and continuous nature of machine learning -algorithms used in contemporary NLP. In particular, we suggest that such -approach could be facilitated by recent developments in data-driven induction -of typological knowledge. -" -8147,1807.00930,"Davide Nunes, Luis Antunes",Neural Random Projections for Language Modelling,cs.CL cs.NE," Neural network-based language models deal with data sparsity problems by -mapping the large discrete space of words into a smaller continuous space of -real-valued vectors. By learning distributed vector representations for words, -each training sample informs the neural network model about a combinatorial -number of other patterns. In this paper, we exploit the sparsity in natural -language even further by encoding each unique input word using a fixed sparse -random representation. These sparse codes are then projected onto a smaller -embedding space which allows for the encoding of word occurrences from a -possibly unknown vocabulary, along with the creation of more compact language -models using a reduced number of parameters. We investigate the properties of -our encoding mechanism empirically, by evaluating its performance on the widely -used Penn Treebank corpus. We show that guaranteeing approximately equidistant -(nearly orthogonal) vector representations for unique discrete inputs is enough -to provide the neural network model with enough information to learn --and make -use-- of distributed representations for these inputs. -" -8148,1807.00938,Gibran Fuentes-Pineda and Ivan Vladimir Meza-Ruiz,Topic Discovery in Massive Text Corpora Based on Min-Hashing,cs.CL," The task of discovering topics in text corpora has been dominated by Latent -Dirichlet Allocation and other Topic Models for over a decade. In order to -apply these approaches to massive text corpora, the vocabulary needs to be -reduced considerably and large computer clusters and/or GPUs are typically -required. Moreover, the number of topics must be provided beforehand but this -depends on the corpus characteristics and it is often difficult to estimate, -especially for massive text corpora. Unfortunately, both topic quality and time -complexity are sensitive to this choice. 
This paper describes an alternative
-approach to discover topics based on Min-Hashing, which can handle massive text
-corpora and large vocabularies using modest computer hardware and does not
-require fixing the number of topics in advance. The basic idea is to generate
-multiple random partitions of the corpus vocabulary to find sets of highly
-co-occurring words, which are then clustered to produce the final topics. In
-contrast to probabilistic topic models where topics are distributions over the
-complete vocabulary, the topics discovered by the proposed approach are sets of
-highly co-occurring words. Interestingly, these topics underlie various themes
-at different levels of granularity. An extensive qualitative and
-quantitative evaluation using the 20 Newsgroups (18K), Reuters (800K), Spanish
-Wikipedia (1M), and English Wikipedia (5M) corpora shows that the proposed
-approach is able to consistently discover meaningful and coherent topics.
-Remarkably, the time complexity of the proposed approach is linear with respect
-to corpus and vocabulary size; a non-parallel implementation was able to
-discover topics from the entire English edition of Wikipedia with over 5
-million documents and 1 million words in less than 7 hours.
-"
-8149,1807.00993,"Bin Wang, Zhijian Ou","Improved training of neural trans-dimensional random field language
-  models with dynamic noise-contrastive estimation",cs.CL stat.ML," A new whole-sentence language model - neural trans-dimensional random field
-language model (neural TRF LM), where sentences are modeled as a collection of
-random fields, and the potential function is defined by a neural network, has
-been introduced and successfully trained by noise-contrastive estimation (NCE).
-In this paper, we extend NCE and propose dynamic noise-contrastive estimation
-(DNCE) to solve the two problems observed in NCE training. First, a dynamic
-noise distribution is introduced and trained simultaneously to converge to the
-data distribution. This helps to significantly cut down the noise sample number
-used in NCE and reduce the training cost. Second, DNCE discriminates between
-sentences generated from the noise distribution and sentences generated from
-the interpolation of the data distribution and the noise distribution. This
-alleviates the overfitting problem caused by the sparseness of the training
-set. With DNCE, we can successfully and efficiently train neural TRF LMs on
-large corpus (about 0.8 billion words) with large vocabulary (about 568 K
-words). Neural TRF LMs perform as well as LSTM LMs with fewer parameters while
-being 5x~114x faster in rescoring sentences. Interpolating neural TRF LMs with
-LSTM LMs and n-gram LMs can further reduce the error rates.
-"
-8150,1807.01122,"Nathaniel Blanchard, Daniel Moreira, Aparna Bharati, Walter J.
-  Scheirer","Getting the subtext without the text: Scalable multimodal sentiment
-  classification from visual and acoustic modalities",cs.CV cs.CL," In the last decade, video blogs (vlogs) have become an extremely popular
-method through which people express sentiment. The ubiquitousness of these
-videos has increased the importance of multimodal fusion models, which
-incorporate video and audio features with traditional text features for
-automatic sentiment detection. Multimodal fusion offers a unique opportunity to
-build models that learn from the full depth of expression available to human
-viewers.
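A minimal sketch of the Min-Hashing idea from the topic-discovery abstract above (1807.00938), under stated assumptions: a toy inverted index, MD5-based hash functions, and single-hash buckets instead of the paper's full pipeline. Words whose document sets share a min-hash value get grouped, approximating Jaccard-similar occurrence sets:

```python
import hashlib
from collections import defaultdict

docs_of = {                      # toy inverted index: word -> set of doc ids
    "neural": {1, 2, 3}, "network": {1, 2, 3}, "parsing": {4, 5},
    "treebank": {4, 5}, "speech": {6, 7}, "acoustic": {6, 7},
}

def minhash(items, seed):
    # Smallest hash value over the item set, for one seeded hash function.
    h = lambda x: int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16)
    return min(h(i) for i in items)

tables = []
for seed in range(3):            # 3 independent random "partitions"
    buckets = defaultdict(list)
    for word, docs in docs_of.items():
        buckets[minhash(docs, seed)].append(word)
    tables.append([ws for ws in buckets.values() if len(ws) > 1])

print(tables)  # co-occurring word sets, later clustered into topics
```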
In the detection of sentiment in these videos, acoustic and video
-features provide clarity to otherwise ambiguous transcripts. In this paper, we
-present a multimodal fusion model that exclusively uses high-level video and
-audio features to analyze spoken sentences for sentiment. We discard
-traditional transcription features in order to minimize human intervention and
-to maximize the deployability of our model on at-scale real-world data. We
-select high-level features for our model that have been successful in nonaffect
-domains in order to test their generalizability in the sentiment detection
-domain. We train and test our model on the newly released CMU Multimodal
-Opinion Sentiment and Emotion Intensity (CMUMOSEI) dataset, obtaining an F1
-score of 0.8049 on the validation set and an F1 score of 0.6325 on the held-out
-challenge test set.
-"
-8151,1807.01270,"Tao Ge, Furu Wei, Ming Zhou","Reaching Human-level Performance in Automatic Grammatical Error
-  Correction: An Empirical Study",cs.CL cs.AI," Neural sequence-to-sequence (seq2seq) approaches have proven to be successful
-in grammatical error correction (GEC). Based on the seq2seq framework, we
-propose a novel fluency boost learning and inference mechanism. Fluency
-boosting learning generates diverse error-corrected sentence pairs during
-training, enabling the error correction model to learn how to improve a
-sentence's fluency from more instances, while fluency boosting inference allows
-the model to correct a sentence incrementally with multiple inference steps.
-Combining fluency boost learning and inference with convolutional seq2seq
-models, our approach achieves the state-of-the-art performance: 75.72 (F_{0.5})
-on the CoNLL-2014 10 annotation dataset and 62.42 (GLEU) on the JFLEG test set
-respectively, becoming the first GEC system that reaches human-level
-performance (72.58 for CoNLL and 62.37 for JFLEG) on both of the benchmarks.
-"
-8152,1807.01292,Umutcan \c{S}im\c{s}ek and Dieter Fensel,"Intent Generation for Goal-Oriented Dialogue Systems based on Schema.org
-  Annotations",cs.CL," Goal-oriented dialogue systems typically communicate with a backend (e.g.
-database, Web API) to complete certain tasks to reach a goal. The intents that
-a dialogue system can recognize are mostly added to the system statically by
-the developer. For an open dialogue system that can work on more than a small
-set of well curated data and APIs, this manual intent creation will not scale.
-In this paper, we introduce a straightforward methodology for intent creation
-based on semantic annotation of data and services on the web. With this method,
-the Natural Language Understanding (NLU) module of a goal-oriented dialogue
-system can adapt to newly introduced APIs without requiring heavy developer
-involvement. We were able to extract intents and necessary slots to be filled
-from schema.org annotations. We were also able to create a set of initial
-training sentences for classifying user utterances into the generated intents.
-We demonstrate our approach on the NLU module of a state-of-the-art dialogue
-system development framework.
-"
-8153,1807.01337,"Piero Molino, Huaixiu Zheng, Yi-Chia Wang","COTA: Improving the Speed and Accuracy of Customer Support through
-  Ranking and Deep Networks",cs.LG cs.CL stat.ML," For a company looking to provide delightful user experiences, it is of
-paramount importance to take care of any customer issues.
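An illustrative sketch of deriving an intent from a schema.org-style annotation, as in the abstract above (1807.01292). All field names and the derivation rule here are simplified assumptions, not the authors' pipeline: nested objects contribute nested slots, and unfilled properties become slots to elicit from the user:

```python
import json

# A toy schema.org Action annotation; null marks properties to be filled.
annotation = json.loads("""
{"@type": "BuyAction",
 "object": {"@type": "Product", "name": null},
 "priceSpecification": {"@type": "PriceSpecification", "price": null}}
""")

def to_intent(node, prefix=""):
    intent = {"name": prefix + node["@type"], "slots": []}
    for key, value in node.items():
        if isinstance(value, dict):        # nested annotation -> nested slots
            intent["slots"] += to_intent(value, key + ".")["slots"]
        elif value is None:                # unfilled property -> a slot
            intent["slots"].append(prefix + key)
    return intent

print(to_intent(annotation))
# {'name': 'BuyAction', 'slots': ['object.name', 'priceSpecification.price']}
```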
This paper proposes
-COTA, a system to improve speed and reliability of customer support for end
-users through automated ticket classification and answer selection for support
-representatives. Two machine learning and natural language processing
-techniques are demonstrated: one relying on feature engineering (COTA v1) and
-the other exploiting raw signals through deep learning architectures (COTA v2).
-COTA v1 employs a new approach that converts the multi-class classification
-task into a ranking problem, demonstrating significantly better performance in
-the case of thousands of classes. For COTA v2, we propose an
-Encoder-Combiner-Decoder, a novel deep learning architecture that allows for
-heterogeneous input and output feature types and injection of prior knowledge
-through network architecture choices. This paper compares these models and
-their variants on the task of ticket classification and answer selection,
-showing that COTA v2 outperforms COTA v1, and analyzes their inner workings
-and shortcomings. Finally, an A/B test is conducted in a production setting
-validating the real-world impact of COTA in reducing issue resolution time by
-10 percent without reducing customer satisfaction.
-"
-8154,1807.01395,"Madhumita Sushil and Simon \v{S}uster and Kim Luyckx and Walter
-  Daelemans","Patient representation learning and interpretable evaluation using
-  clinical notes",cs.CL cs.LG," We have three contributions in this work: 1. We explore the utility of a
-stacked denoising autoencoder and a paragraph vector model to learn
-task-independent dense patient representations directly from clinical notes. To
-analyze whether these representations are transferable across tasks, we
-evaluate them in multiple supervised setups to predict patient mortality,
-primary diagnostic and procedural category, and gender. We compare their
-performance with sparse representations obtained from a bag-of-words model. We
-observe that the learned generalized representations significantly outperform
-the sparse representations when we have few positive instances to learn from,
-and there is an absence of strong lexical features. 2. We compare the model
-performance of the feature set constructed from a bag of words to that obtained
-from medical concepts. In the latter case, concepts represent problems,
-treatments, and tests. We find that concept identification does not improve the
-classification performance. 3. We propose novel techniques to facilitate model
-interpretability. To understand and interpret the representations, we explore
-the best encoded features within the patient representations obtained from the
-autoencoder model. Further, we calculate feature sensitivity across two
-networks to identify the most significant input features for different
-classification tasks when we use these pretrained representations as the
-supervised input. We successfully extract the most influential features for the
-pipeline using this technique.
-"
-8155,1807.01396,Timothy Dozat and Christopher D. Manning,Simpler but More Accurate Semantic Dependency Parsing,cs.CL," While syntactic dependency annotations concentrate on the surface or
-functional structure of a sentence, semantic dependency annotations aim to
-capture between-word relationships that are more closely related to the meaning
-of a sentence, using graph-structured representations. We extend the LSTM-based
-syntactic parser of Dozat and Manning (2017) to train on and generate these
-graph structures.
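The Dozat and Manning parser scores word pairs with a biaffine transform; a minimal sketch of biaffine edge scoring adapted to graphs (a sigmoid per pair, so a word may take several heads). This omits the separate head/dependent MLPs and labeled scoring of the full model, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class BiaffineEdgeScorer(nn.Module):
    def __init__(self, hidden=100):
        super().__init__()
        self.U = nn.Parameter(torch.zeros(hidden + 1, hidden + 1))
        nn.init.xavier_uniform_(self.U)

    def forward(self, h):                 # h: (batch, n_words, hidden)
        ones = torch.ones(*h.shape[:2], 1)
        h1 = torch.cat([h, ones], dim=-1)            # append bias terms
        scores = h1 @ self.U @ h1.transpose(1, 2)    # (batch, n, n) pair scores
        return torch.sigmoid(scores)                 # P(edge head -> dep)

scorer = BiaffineEdgeScorer()
probs = scorer(torch.randn(2, 7, 100))    # e.g., BiLSTM word states
print(probs.shape)                        # torch.Size([2, 7, 7])
```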
The resulting system on its own achieves state-of-the-art
-performance, beating the previous, substantially more complex state-of-the-art
-system by 0.6% labeled F1. Adding linguistically richer input representations
-pushes the margin even higher, allowing us to beat it by 1.9% labeled F1.
-"
-8156,1807.01466,"Leimin Tian, Catherine Lai, Johanna D. Moore",Polarity and Intensity: the Two Aspects of Sentiment Analysis,cs.CL," Current multimodal sentiment analysis frames sentiment score prediction as a
-general Machine Learning task. However, what the sentiment score actually
-represents has often been overlooked. As a measurement of opinions and
-affective states, a sentiment score generally consists of two aspects: polarity
-and intensity. We decompose sentiment scores into these two aspects and study
-how they are conveyed through individual modalities and combined multimodal
-models in a naturalistic monologue setting. In particular, we build unimodal
-and multimodal multi-task learning models with sentiment score prediction as
-the main task and polarity and/or intensity classification as the auxiliary
-tasks. Our experiments show that sentiment analysis benefits from multi-task
-learning, and individual modalities differ when conveying the polarity and
-intensity aspects of sentiment.
-"
-8157,1807.01554,"Yutai Hou, Yijia Liu, Wanxiang Che, Ting Liu","Sequence-to-Sequence Data Augmentation for Dialogue Language
-  Understanding",cs.CL cs.AI," In this paper, we study the problem of data augmentation for language
-understanding in task-oriented dialogue system. In contrast to previous work
-which augments an utterance without considering its relation with other
-utterances, we propose a sequence-to-sequence generation based data
-augmentation framework that leverages one utterance's same semantic
-alternatives in the training data. A novel diversity rank is incorporated into
-the utterance representation to make the model produce diverse utterances and
-these diversely augmented utterances help to improve the language understanding
-module. Experimental results on the Airline Travel Information System dataset
-and a newly created semantic frame annotation on the Stanford Multi-turn,
-Multidomain Dialogue Dataset show that our framework achieves significant
-improvements of 6.38 and 10.04 F-scores respectively when only a training set
-of hundreds of utterances is available. Case studies also confirm that our
-method generates diverse utterances.
-"
-8158,1807.01670,"Tiago Ramalho, Tom\'a\v{s} Ko\v{c}isk\'y, Frederic Besse, S. M. Ali
-  Eslami, G\'abor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann",Encoding Spatial Relations from Natural Language,cs.CL cs.AI cs.CV cs.LG," Natural language processing has made significant inroads into learning the
-semantics of words through distributional approaches; however, representations
-learnt via these methods fail to capture certain kinds of information implicit
-in the real world. In particular, spatial relations are encoded in a way that
-is inconsistent with human spatial reasoning and lacking invariance to
-viewpoint changes. We present a system capable of capturing the semantics of
-spatial relations such as behind, left of, etc. from natural language. Our key
-contributions are a novel multi-modal objective based on generating images of
-scenes from their textual descriptions, and a new dataset on which to train it.
-We demonstrate that internal representations are robust to meaning-preserving
-transformations of descriptions (paraphrase invariance), while viewpoint
-invariance is an emergent property of the system.
-"
-8159,1807.01677,"Sreekavitha Parupalli, Vijjini Anvesh Rao and Radhika Mamidi","Towards Automation of Sense-type Identification of Verbs in
-  OntoSenseNet(Telugu)",cs.CL," In this paper, we discuss the enrichment of a manually developed resource of
-Telugu lexicon, OntoSenseNet. OntoSenseNet is an ontological sense-annotated
-lexicon that marks each verb of Telugu with a primary and a secondary sense.
-The area of research is relatively recent but has large scope for development.
-We provide an introductory work to enrich the OntoSenseNet to promote further
-research in Telugu. Classifiers are adopted to learn the sense-relevant
-features of the words in the resource and also to automate the tagging of
-sense-types for verbs. We perform a comparative analysis of different
-classifiers applied on OntoSenseNet. The results of the experiment prove that
-automated enrichment of the resource is effective using SVM classifiers and an
-AdaBoost ensemble.
-"
-8160,1807.01679,"Sreekavitha Parupalli, Vijjini Anvesh Rao and Radhika Mamidi","BCSAT : A Benchmark Corpus for Sentiment Analysis in Telugu Using
-  Word-level Annotations",cs.CL," The presented work aims at generating a systematically annotated corpus that
-can support the enhancement of sentiment analysis tasks in Telugu using
-word-level sentiment annotations. From OntoSenseNet, we extracted 11,000
-adjectives, 253 adverbs and 8483 verbs, and sentiment annotation was done by
-language experts. We discuss the methodology followed for the polarity
-annotations and validate the developed resource. This work aims at developing a
-benchmark corpus, as an extension to SentiWordNet, and baseline accuracy for a
-model where lexeme annotations are applied for sentiment predictions. The
-fundamental aim of this paper is to validate and study the possibility of
-utilizing machine learning algorithms and word-level sentiment annotations in
-the task of automated sentiment identification. Furthermore, accuracy is
-improved by annotating the bi-grams extracted from the target corpus.
-"
-8161,1807.01682,"Weidong Yuan, Alan W Black","Generating Mandarin and Cantonese F0 Contours with Decision Trees and
-  BLSTMs",cs.CL," This paper models the fundamental frequency contours of both Mandarin and
-Cantonese speech with decision trees and DNNs (deep neural networks). Different
-kinds of f0 representations and model architectures are tested for decision
-trees and DNNs. A new model called Additive-BLSTM (additive bidirectional long
-short term memory) that predicts a base f0 contour and a residual f0 contour
-with two BLSTMs is proposed. With respect to objective measures of RMSE and
-correlation, applying tone-dependent trees together with sample normalization
-and delta feature regularization within the decision tree framework performs
-best, while the new Additive-BLSTM model with delta feature regularization
-performs even better. Subjective listening tests on both Mandarin and Cantonese
-comparing the Random Forest model (multiple decision trees) and the
-Additive-BLSTM model were also held and confirmed the advantage of the new
-model according to the listeners' preference.
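A minimal sketch of the additive two-BLSTM idea just described, assuming PyTorch; the class name, feature dimensions, and the way the residual branch is conditioned are simplifications of the paper's model:

```python
import torch
import torch.nn as nn

class AdditiveBLSTM(nn.Module):
    # One BLSTM predicts a base F0 contour, a second predicts a residual
    # contour; their sum is the final per-frame F0 prediction.
    def __init__(self, n_feats=40, hidden=32):
        super().__init__()
        self.base = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        self.res = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        self.base_out = nn.Linear(2 * hidden, 1)
        self.res_out = nn.Linear(2 * hidden, 1)

    def forward(self, x):                    # x: (batch, time, linguistic feats)
        f0_base = self.base_out(self.base(x)[0])
        f0_res = self.res_out(self.res(x)[0])
        return (f0_base + f0_res).squeeze(-1)  # predicted F0 per frame

model = AdditiveBLSTM()
print(model(torch.randn(2, 50, 40)).shape)   # torch.Size([2, 50])
```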
-" -8162,1807.01704,Yongping Xing and Chuangbai Xiao and Yifei Wu and Ziming Ding,A Convolutional Neural Network for Aspect Sentiment Classification,cs.CL," With the development of the Internet, natural language processing (NLP), in -which sentiment analysis is an important task, became vital in information -processing.Sentiment analysis includes aspect sentiment classification. Aspect -sentiment can provide complete and in-depth results with increased attention on -aspect-level. Different context words in a sentence influence the sentiment -polarity of a sentence variably, and polarity varies based on the different -aspects in a sentence. Take the sentence, 'I bought a new camera. The picture -quality is amazing but the battery life is too short.'as an example. If the -aspect is picture quality, then the expected sentiment polarity is 'positive', -if the battery life aspect is considered, then the sentiment polarity should be -'negative'; therefore, aspect is important to consider when we explore aspect -sentiment in the sentence. Recurrent neural network (RNN) is regarded as a good -model to deal with natural language processing, and RNNs has get good -performance on aspect sentiment classification including Target-Dependent LSTM -(TD-LSTM) ,Target-Connection LSTM (TC-LSTM) (Tang, 2015a, b), AE-LSTM, AT-LSTM, -AEAT-LSTM (Wang et al., 2016).There are also extensive literatures on sentiment -classification utilizing convolutional neural network, but there is little -literature on aspect sentiment classification using convolutional neural -network. In our paper, we develop attention-based input layers in which aspect -information is considered by input layer. We then incorporate attention-based -input layers into convolutional neural network (CNN) to introduce context words -information. In our experiment, incorporating aspect information into CNN -improves the latter's aspect sentiment classification performance without using -syntactic parser or external sentiment lexicons in a benchmark dataset from -Twitter but get better performance compared with other models. -" -8163,1807.01745,Carlos G\'omez-Rodr\'iguez and Tianze Shi and Lillian Lee,Global Transition-based Non-projective Dependency Parsing,cs.CL," Shi, Huang, and Lee (2017) obtained state-of-the-art results for English and -Chinese dependency parsing by combining dynamic-programming implementations of -transition-based dependency parsers with a minimal set of bidirectional LSTM -features. However, their results were limited to projective parsing. In this -paper, we extend their approach to support non-projectivity by providing the -first practical implementation of the MH_4 algorithm, an $O(n^4)$ mildly -nonprojective dynamic-programming parser with very high coverage on -non-projective treebanks. To make MH_4 compatible with minimal transition-based -feature sets, we introduce a transition-based interpretation of it in which -parser items are mapped to sequences of transitions. We thus obtain the first -implementation of global decoding for non-projective transition-based parsing, -and demonstrate empirically that it is more effective than its projective -counterpart in parsing a number of highly non-projective languages -" -8164,1807.01763,"Yue Liu, Tongtao Zhang, Zhicheng Liang, Heng Ji, Deborah L. 
McGuinness","Seq2RDF: An end-to-end application for deriving Triples from Natural - Language Text",cs.CL cs.AI," We present an end-to-end approach that takes unstructured textual input and -generates structured output compliant with a given vocabulary. Inspired by -recent successes in neural machine translation, we treat the triples within a -given knowledge graph as an independent graph language and propose an -encoder-decoder framework with an attention mechanism that leverages knowledge -graph embeddings. Our model learns the mapping from natural language text to -triple representation in the form of subject-predicate-object using the -selected knowledge graph vocabulary. Experiments on three different data sets -show that we achieve competitive F1-Measures over the baselines using our -simple yet effective approach. A demo video is included. -" -8165,1807.01836,Vikas Yadav and Rebecca Sharp and Mihai Surdeanu,"Sanity Check: A Strong Alignment and Information Retrieval Baseline for - Question Answering",cs.IR cs.CL," While increasingly complex approaches to question answering (QA) have been -proposed, the true gain of these systems, particularly with respect to their -expensive training requirements, can be inflated when they are not compared to -adequate baselines. Here we propose an unsupervised, simple, and fast alignment -and information retrieval baseline that incorporates two novel contributions: a -\textit{one-to-many alignment} between query and document terms and -\textit{negative alignment} as a proxy for discriminative information. Our -approach not only outperforms all conventional baselines as well as many -supervised recurrent neural networks, but also approaches the state of the art -for supervised systems on three QA datasets. With only three hyperparameters, -we achieve 47\% P@1 on an 8th grade Science QA dataset, 32.9\% P@1 on a Yahoo! -answers QA dataset and 64\% MAP on WikiQA. We also achieve 26.56\% and 58.36\% -on ARC challenge and easy dataset respectively. In addition to including the -additional ARC results in this version of the paper, for the ARC easy set only -we also experimented with one additional parameter -- number of justifications -retrieved. -" -8166,1807.01855,"Shuiyuan Yu, Chunshan Xu, Haitao Liu","Zipf's law in 50 languages: its structural pattern, linguistic - interpretation, and cognitive motivation",cs.CL," Zipf's law has been found in many human-related fields, including language, -where the frequency of a word is persistently found as a power law function of -its frequency rank, known as Zipf's law. However, there is much dispute whether -it is a universal law or a statistical artifact, and little is known about what -mechanisms may have shaped it. To answer these questions, this study conducted -a large scale cross language investigation into Zipf's law. The statistical -results show that Zipf's laws in 50 languages all share a 3-segment structural -pattern, with each segment demonstrating distinctive linguistic properties and -the lower segment invariably bending downwards to deviate from theoretical -expectation. This finding indicates that this deviation is a fundamental and -universal feature of word frequency distributions in natural languages, not the -statistical error of low frequency words. A computer simulation based on the -dual-process theory yields Zipf's law with the same structural pattern, -suggesting that Zipf's law of natural languages are motivated by common -cognitive mechanisms. 
-8167,1807.01882,"Zhenyu Jiao, Shuqi Sun, Ke Sun",Chinese Lexical Analysis with Deep Bi-GRU-CRF Network,cs.CL," Lexical analysis is believed to be a crucial step towards natural language
-understanding and has been widely studied. In recent years, end-to-end
-lexical analysis models with recurrent neural networks have gained increasing
-attention. In this report, we introduce a deep Bi-GRU-CRF network that
-jointly models the word segmentation, part-of-speech tagging and named entity
-recognition tasks. We trained the model using several massive corpora
-pre-tagged by our best Chinese lexical analysis tool, together with a small
-yet high-quality human-annotated corpus. We conducted balanced sampling
-between the different corpora to guarantee the influence of the human
-annotations, and fine-tuned the CRF decoding layer regularly during the
-training process. As evaluated by linguistic experts, the model achieved a
-95.5% accuracy on the test set, roughly a 13% relative error reduction over
-our (previously) best Chinese lexical analysis tool. The model is
-computationally efficient, achieving a speed of 2.3K characters per second
-with one thread.
-"
-8168,1807.01956,"Markus M\""uller, Sebastian St\""uker, and Alex Waibel",Neural Language Codes for Multilingual Acoustic Models,cs.CL cs.LG cs.SD eess.AS," Multilingual Speech Recognition is one of the most costly AI problems,
-because each language (7,000+) and even different accents require their own
-acoustic models to obtain the best recognition performance. Even though they
-all use the same phoneme symbols, each language and accent imposes its own
-coloring or ""twang"". Many adaptive approaches have been proposed, but they
-require further training and additional data, and are generally inferior to
-monolingually trained models. In this paper, we propose a different approach
-that uses a large multilingual model that is \emph{modulated} by the codes
-generated by an ancillary network that learns to code useful differences
-between the ""twangs"" of human languages.
- We use Meta-Pi networks to have one network (the language code net) gate the
-activity of neurons in another (the acoustic model nets). Our results show
-that during recognition multilingual Meta-Pi networks quickly adapt to the
-proper language coloring without retraining or new data, and perform better
-than monolingually trained networks. The model was evaluated by jointly
-training the acoustic modeling nets and the modulating language code nets and
-optimizing them for the best recognition performance.
-"
-8169,1807.01996,Sreekavitha Parupalli and Navjyoti Singh,A Formal Ontology-Based Classification of Lexemes and its Applications,cs.CL," The paper describes the enrichment of OntoSenseNet - a verb-centric lexical
-resource for Indian Languages. A major contribution of this work is the
-preservation of an authentic Telugu dictionary by developing a computational
-version of it. This is important because native speakers can better annotate
-the sense-types when both the word and its meaning are in Telugu. Hence,
-efforts were made to develop the aforementioned Telugu dictionary, and the
-annotations were done manually. The manually annotated gold-standard corpus
-consists of 8,483 verbs, 253 adverbs and 1,673 adjectives. Annotations were
-done by native speakers according to defined annotation guidelines.
-In this paper, we provide an overview of the annotation procedure and
-present the validation of the developed resource through inter-annotator
-agreement. Additional words from Telugu WordNet are added to our resource and
-are crowd-sourced for annotation. The statistics are compared with those of
-the sense-annotated lexicon, our resource, for further insights.
-"
-8170,1807.02162,"Shweta Yadav, Ankit Kumar, Asif Ekbal, Sriparna Saha and Pushpak
-  Bhattacharyya","Feature Assisted bi-directional LSTM Model for Protein-Protein
-  Interaction Identification from Biomedical Texts",cs.IR cs.CL," Knowledge about protein-protein interactions is essential for understanding
-biological processes such as metabolic pathways, DNA replication, and
-transcription. However, a majority of the existing Protein-Protein
-Interaction (PPI) systems are dependent primarily on the scientific
-literature, which is not yet accessible as a structured database. Thus,
-efficient information extraction systems are required for identifying PPI
-information from the large collection of biomedical texts. Most of the
-existing systems model the PPI extraction task as a classification problem
-and are tailored to handcrafted feature sets, including domain-dependent
-features. In this paper, we present a novel method based on the deep
-bidirectional long short-term memory (B-LSTM) technique that exploits word
-sequences and dependency-path-related information to identify PPI
-information from text. This model leverages the joint modeling of proteins
-and relations in a single unified framework, which we name the Shortest
-Dependency Path B-LSTM (sdpLSTM) model. We perform experiments on two popular
-benchmark PPI datasets, namely AiMed & BioInfer. The evaluation shows
-F1-score values of 86.45% and 77.35% on AiMed and BioInfer, respectively.
-Comparisons with the existing systems show that our proposed approach attains
-state-of-the-art performance.
-"
-8171,1807.02200,"Sergio Oramas, Luis Espinosa-Anke, Francisco G\'omez, Xavier Serra",Natural Language Processing for Music Knowledge Discovery,cs.CL," Today, a massive amount of musical knowledge is stored in written form,
-with testimonies dated as far back as several centuries ago. In this work, we
-present different Natural Language Processing (NLP) approaches to harness the
-potential of these text collections for automatic music knowledge discovery,
-covering different phases in a prototypical NLP pipeline, namely corpus
-compilation, text-mining, information extraction, knowledge graph generation
-and sentiment analysis. Each of these approaches is presented alongside
-different use cases (i.e., flamenco, Renaissance and popular music) where
-large collections of documents are processed, and conclusions stemming from
-data-driven analyses are presented and discussed.
-"
-8172,1807.02202,"Arun Tejasvi Chaganty, Stephen Mussman, Percy Liang",The price of debiasing automatic metrics in natural language evaluation,cs.CL," For evaluating generation systems, automatic metrics such as BLEU cost
-nothing to run but have been shown to correlate poorly with human judgment,
-leading to systematic bias against certain model improvements. On the other
-hand, averaging human judgments, the unbiased gold standard, is often too
-expensive. In this paper, we use control variates to combine automatic
-metrics with human evaluation to obtain an unbiased estimator with lower cost
-than human evaluation alone. In practice, however, we obtain only a 7-13%
-cost reduction on evaluating summarization and open-response question
-answering systems. We then prove that our estimator is optimal: there is no
-unbiased estimator with lower cost. Our theory further highlights the two
-fundamental bottlenecks---the automatic metric and the prompt shown to human
-evaluators---both of which need to be improved to obtain greater cost
-savings.
-"
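-In standard control-variates notation (ours, not necessarily the paper's),
-the combined estimator and the optimal coefficient are
-
-    \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\bigl(h_i - c\,(a_i - \mathbb{E}[a])\bigr),
-    \qquad
-    c^{*} = \frac{\operatorname{Cov}(h,a)}{\operatorname{Var}(a)},
-
-where $h_i$ are human judgments, $a_i$ the automatic metric scores on the
-same outputs, and $\mathbb{E}[a]$ the metric's mean over the full unjudged
-pool; with $c^{*}$ the variance shrinks by the factor $1-\rho^{2}$, $\rho$
-being the correlation between $h$ and $a$.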
-8173,1807.02221,"Marco Del Vecchio, Alexander Kharlamov, Glenn Parry, Ganna Pogrebna","The Data Science of Hollywood: Using Emotional Arcs of Movies to Drive
-  Business Model Innovation in Entertainment Industries",cs.CL cs.CY," Much of the business literature addresses the issues of consumer-centric
-design: how can businesses design customized services and products which
-accurately reflect consumer preferences? This paper uses data-science natural
-language processing methodology to explore whether and to what extent
-emotions shape consumer preferences for media and entertainment content.
-Using a unique filtered dataset of 6,174 movie scripts, we generate a mapping
-of screen content to capture the emotional trajectory of each motion picture.
-We then combine the obtained mappings into clusters which represent groupings
-of consumer emotional journeys. These clusters are used to predict the
-overall success parameters of the movies, including box office revenues,
-viewer satisfaction levels (captured by IMDb ratings), awards, as well as the
-number of viewers' and critics' reviews. We find that, like books, all movie
-stories are dominated by 6 basic shapes. The highest box office revenues are
-associated with the Man in a Hole shape, which is characterized by an
-emotional fall followed by an emotional rise. This shape results in
-financially successful movies irrespective of genre and production budget.
-Yet, Man in a Hole succeeds not because it produces the most ""liked"" movies
-but because it generates the most ""talked about"" movies. Interestingly, a
-carefully chosen combination of production budget and genre may produce a
-financially successful movie with any emotional shape. Implications of this
-analysis for generating on-demand content and for driving business model
-innovation in entertainment industries are discussed.
-"
-8174,1807.02226,Patrick Connor,"A Concept Specification and Abstraction-based Semantic Representation:
-  Addressing the Barriers to Rule-based Machine Translation",cs.CL," Rule-based machine translation is more data-efficient than big-data-based
-machine translation approaches, making it appropriate for languages with low
-bilingual corpus resources -- i.e., minority languages. However, the
-rule-based approach has declined in popularity relative to its big-data
-cousins, primarily because of the extensive training and labour required to
-define the language rules. To address this, we present a semantic
-representation that 1) treats all bits of meaning as individual concepts that
-2) modify or further specify one another to build a network that relates
-entities in space and time. Also, the representation can 3) encapsulate
-propositions and thereby define concepts in terms of other concepts,
-supporting the abstraction of underlying linguistic and ontological details.
-These features afford an exact, yet intuitive semantic representation aimed
-at handling the great variety in language and reducing labour and training
-time.
-The proposed natural language generation, parsing, and translation
-strategies are also amenable to probabilistic modeling and thus to learning
-the necessary rules from example data.
-"
-8175,1807.02291,Zeping Yu and Gongshen Liu,Sliced Recurrent Neural Networks,cs.CL," Recurrent neural networks have achieved great success in many NLP tasks.
-However, they have difficulty in parallelization because of their recurrent
-structure, so it takes a long time to train RNNs. In this paper, we introduce
-sliced recurrent neural networks (SRNNs), which can be parallelized by
-slicing the sequences into many subsequences. SRNNs have the ability to
-obtain high-level information through multiple layers with few extra
-parameters. We prove that the standard RNN is a special case of the SRNN when
-we use linear activation functions. Without changing the recurrent units,
-SRNNs are 136 times as fast as standard RNNs and could be even faster when we
-train longer sequences. Experiments on six large-scale sentiment analysis
-datasets show that SRNNs achieve better performance than standard RNNs.
-"
-8176,1807.02301,"Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou",Sequential Copying Networks,cs.CL," The copying mechanism has shown its effectiveness in sequence-to-sequence
-neural network models for text generation tasks, such as abstractive sentence
-summarization and question generation. However, existing work on modeling the
-copying or pointing mechanism only considers single-word copying from the
-source sentences. In this paper, we propose a novel copying framework, named
-Sequential Copying Networks (SeqCopyNet), which not only learns to copy
-single words, but also copies sequences from the input sentence. It leverages
-pointer networks to explicitly select a sub-span from the source side to copy
-to the target side, and integrates this sequential copying mechanism into the
-generation process in the encoder-decoder paradigm. Experiments on
-abstractive sentence summarization and question generation tasks show that
-the proposed SeqCopyNet can copy meaningful spans and outperforms the
-baseline models.
-"
-8177,1807.02305,"Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao","Neural Document Summarization by Jointly Learning to Score and Select
-  Sentences",cs.CL," Sentence scoring and sentence selection are the two main steps in
-extractive document summarization systems. However, previous works treat them
-as two separate subtasks. In this paper, we present a novel end-to-end neural
-network framework for extractive document summarization by jointly learning
-to score and select sentences. It first reads the document sentences with a
-hierarchical encoder to obtain the representation of the sentences. Then it
-builds the output summary by extracting sentences one by one. Different from
-previous methods, our approach integrates the selection strategy into the
-scoring model, which directly predicts the relative importance given
-previously selected sentences. Experiments on the CNN/Daily Mail dataset show
-that the proposed framework significantly outperforms the state-of-the-art
-extractive summarization models.
-"
-8178,1807.02314,"Xianggen Liu, Lili Mou, Haotian Cui, Zhengdong Lu, Sen Song",JUMPER: Learning When to Make Classification Decisions in Reading,cs.IR cs.AI cs.CL cs.LG," In earlier years, text classification was typically accomplished by
-feature-based machine learning models; recently, deep neural networks, as
-powerful learning machines, have made it possible to work with the raw input
-text as it stands.
-However, existing end-to-end neural networks lack explicit interpretation of
-the prediction. In this paper, we propose a novel framework, JUMPER, inspired
-by the cognitive process of text reading, that models text classification as
-a sequential decision process. Basically, JUMPER is a neural system that
-scans a piece of text sequentially and makes classification decisions at the
-time it wishes. Both the classification result and when to make the
-classification are part of the decision process, which is controlled by a
-policy network and trained with reinforcement learning. Experimental results
-show that a properly trained JUMPER has the following properties: (1) It can
-make decisions whenever the evidence is sufficient, thereby reducing total
-text reading by 30-40% and often finding the key rationale for the
-prediction. (2) It achieves classification accuracy better than or comparable
-to state-of-the-art models on several benchmark and industrial datasets.
-"
-8179,1807.02322,"Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc Le, Ni Lao","Memory Augmented Policy Optimization for Program Synthesis and Semantic
-  Parsing",cs.LG cs.AI cs.CL stat.ML," We present Memory Augmented Policy Optimization (MAPO), a simple and novel
-way to leverage a memory buffer of promising trajectories to reduce the
-variance of policy gradient estimates. MAPO is applicable to deterministic
-environments with discrete actions, such as structured prediction and
-combinatorial optimization tasks. We express the expected return objective as
-a weighted sum of two terms: an expectation over the high-reward trajectories
-inside the memory buffer, and a separate expectation over trajectories
-outside the buffer. To make MAPO an efficient algorithm, we propose: (1)
-memory weight clipping to accelerate and stabilize training; (2) systematic
-exploration to discover high-reward trajectories; (3) distributed sampling
-from inside and outside of the memory buffer to scale up training. MAPO
-improves the sample efficiency and robustness of policy gradient, especially
-on tasks with sparse rewards. We evaluate MAPO on weakly supervised program
-synthesis from natural language (semantic parsing). On the WikiTableQuestions
-benchmark, we improve the state-of-the-art by 2.6%, achieving an accuracy of
-46.3%. On the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only
-weak supervision, outperforming several strong baselines with full
-supervision. Our source code is available at
-https://github.com/crazydonkey200/neural-symbolic-machines
-"
-8180,1807.02340,"Wujie Zheng, Wenyu Wang, Dian Liu, Changrong Zhang, Qinsong Zeng,
-  Yuetang Deng, Wei Yang, Pinjia He, Tao Xie",Testing Untestable Neural Machine Translation: An Industrial Case,cs.CL cs.AI cs.SE," Neural Machine Translation (NMT) has been widely adopted recently due to
-its advantages over the traditional Statistical Machine Translation (SMT).
-However, an NMT system still often produces translation failures due to the
-complexity of natural language and the sophistication of designing neural
-networks. While in-house black-box system testing based on reference
-translations (i.e., examples of valid translations) has been a common
-practice for NMT quality assurance, an increasingly critical industrial
-practice, named in-vivo testing, exposes unseen types or instances of
-translation failures when real users are using a deployed industrial NMT
-system.
-To address the lack of a test oracle for in-vivo testing of an NMT system,
-in this paper we propose a new approach for automatically identifying
-translation failures, without requiring reference translations for a
-translation task; our approach can directly serve as a test oracle for
-in-vivo testing. Our approach focuses on properties of natural language
-translation that can be checked systematically, and uses information from
-both the test inputs (i.e., the texts to be translated) and the test outputs
-(i.e., the translations under inspection) of the NMT system. Our evaluation
-conducted on real-world datasets shows that our approach can effectively
-detect targeted property violations as translation failures. Our experiences
-of deploying our approach in both the production and development environments
-of WeChat (a messenger app with over one billion monthly active users)
-demonstrate the high effectiveness of our approach along with its high
-industry impact.
-"
-8181,1807.02383,Sonit Singh,Natural Language Processing for Information Extraction,cs.CL cs.AI," With the rise of the digital age, there is an explosion of information in
-the form of news, articles, social media, and so on. Much of this data lies
-in unstructured form, and manually managing and effectively making use of it
-is tedious, boring and labor-intensive. This explosion of information and the
-need for more sophisticated and efficient information handling tools give
-rise to Information Extraction (IE) and Information Retrieval (IR)
-technology. Information Extraction systems take natural language text as
-input and produce structured information, specified by certain criteria, that
-is relevant to a particular application. Various sub-tasks of IE, such as
-Named Entity Recognition, Coreference Resolution, Named Entity Linking,
-Relation Extraction and Knowledge Base reasoning, form the building blocks of
-various high-end Natural Language Processing (NLP) tasks, such as Machine
-Translation, Question-Answering Systems, Natural Language Understanding, Text
-Summarization and digital assistants like Siri, Cortana and Google Now. This
-paper introduces Information Extraction technology and its various sub-tasks,
-highlights state-of-the-art research in various IE subtasks, and discusses
-current challenges and future research directions.
-"
-8182,1807.02391,"Sudha Subramani, Manjula O'Connor","Extracting Actionable Knowledge from Domestic Violence Discourses on
-  Social Media",cs.IR cs.CL cs.LG stat.ML," Domestic Violence (DV) is considered a major social issue, and there exists
-a strong relationship between DV and the health of the public. Existing
-research studies have focused on social media to track and analyse real-world
-events like emerging trends, natural disasters, user sentiment, political
-opinions, and health care. However, less attention has been given to social
-welfare issues like DV and their impact on public health. Recently, victims
-of DV have turned to social media platforms to express their feelings in the
-form of posts and to seek social and emotional support, sympathetic
-encouragement, compassion and empathy from the public. However, it is
-difficult to mine actionable knowledge from such large conversational
-datasets from social media due to their characteristics of high
-dimensionality, short and noisy texts, huge volume, high velocity, and so on.
-Hence, this paper proposes a novel framework to model and discover the
-various themes related to DV from the public domain.
-The proposed framework could provide unprecedentedly valuable information to
-public health researchers, national family health organizations, government
-and the public, with data enrichment and consolidation to improve the social
-welfare of the community. It thus provides actionable knowledge by monitoring
-and analysing continuous and rich user-generated content.
-"
-8183,1807.02471,Debadri Dutta,"A Review of Different Word Embeddings for Sentiment Classification using
-  Deep Learning",cs.IR cs.CL cs.LG stat.ML," The web is loaded with textual content, and Natural Language Processing is
-one of the most important fields in Machine Learning. But when the data is
-huge, simple Machine Learning algorithms are not able to handle it, and that
-is when Deep Learning, which is based on Neural Networks, comes into play.
-However, since neural networks cannot process raw text, we have to convert it
-through various word embedding strategies. This paper demonstrates these
-different word embedding strategies on an Amazon Review Dataset, which has
-two sentiments to be classified, Happy and Unhappy, based on numerous
-customer reviews. Moreover, we compare the resulting accuracies and discuss
-which word embedding to apply when.
-"
-8184,1807.02478,"Chaojun Xiao and Haoxi Zhong and Zhipeng Guo and Cunchao Tu and
-  Zhiyuan Liu and Maosong Sun and Yansong Feng and Xianpei Han and Zhen Hu and
-  Heng Wang and Jianfeng Xu",CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction,cs.CL," In this paper, we introduce the \textbf{C}hinese \textbf{AI} and
-\textbf{L}aw challenge dataset (CAIL2018), the first large-scale Chinese
-legal dataset for judgment prediction. CAIL2018 contains more than $2.6$
-million criminal cases published by the Supreme People's Court of China,
-which is several times larger than other datasets in existing works on
-judgment prediction. Moreover, the annotations of the judgment results are
-more detailed and rich. They consist of applicable law articles, charges, and
-prison terms, which are expected to be inferred according to the fact
-descriptions of the cases. For comparison, we implement several conventional
-text classification baselines for judgment prediction, and the experimental
-results show that it is still a challenge for current models to predict the
-judgment results of legal cases, especially on prison terms. To help
-researchers make improvements on legal judgment prediction, both CAIL2018 and
-the baselines will be released after the CAIL
-competition\footnote{http://cail.cipsc.org.cn/}.
-"
-8185,1807.02599,"M. Tarik Altuncu, Erik Mayer, Sophia N. Yaliraki, Mauricio Barahona","From Text to Topics in Healthcare Records: An Unsupervised Graph
-  Partitioning Methodology",cs.CL cs.IR cs.LG cs.SI math.SP," Electronic Healthcare Records contain large volumes of unstructured data,
-including extensive free text. Yet this source of detailed information often
-remains under-used because of a lack of methodologies to extract
-interpretable content in a timely manner. Here we apply network-theoretical
-tools to analyse free text in Hospital Patient Incident reports from the
-National Health Service, to find clusters of documents with similar content
-in an unsupervised manner at different levels of resolution.
-We combine deep neural network paragraph-vector text embedding with
-multiscale Markov Stability community detection, applied to a sparsified
-similarity graph of document vectors, and showcase the approach on incident
-reports from Imperial College Healthcare NHS Trust, London. The multiscale
-community structure reveals different levels of meaning in the topics of the
-dataset, as shown by descriptive terms extracted from the clusters of
-records. We also compare a posteriori against hand-coded categories assigned
-by healthcare personnel, and show that our approach outperforms LDA-based
-models. Our content clusters exhibit good correspondence with two levels of
-hand-coded categories, yet they also provide further medical detail in
-certain areas and reveal complementary descriptors of incidents beyond the
-external classification taxonomy.
-"
-8186,1807.02658,"J\""org Franke, Jan Niehues, Alex Waibel","Robust and Scalable Differentiable Neural Computer for Question
-  Answering",cs.CL cs.LG," Deep learning models are often not easily adaptable to new tasks and
-require task-specific adjustments. The differentiable neural computer (DNC),
-a memory-augmented neural network, is designed as a general problem solver
-which can be used in a wide range of tasks. But in reality, it is hard to
-apply this model to new tasks. We analyze the DNC and identify possible
-improvements within the application of question answering. This motivates a
-more robust and scalable DNC (rsDNC). The objective precondition is to keep
-the general character of this model intact while making its application more
-reliable and speeding up its required training time. The rsDNC is
-distinguished by a more robust training, a slim memory unit and a
-bidirectional architecture. We not only achieve new state-of-the-art
-performance on the bAbI task, but also minimize the performance variance
-between different initializations. Furthermore, we demonstrate the simplified
-applicability of the rsDNC to new tasks with passable results on the CNN RC
-task without adaptations.
-"
-8187,1807.02745,Ryan Cotterell and Jason Eisner,A Deep Generative Model of Vowel Formant Typology,cs.CL," What makes some types of languages more probable than others? For instance,
-we know that almost all spoken languages contain the vowel phoneme /i/; why
-should that be? The field of linguistic typology seeks to answer these
-questions and, thereby, divine the mechanisms that underlie human language.
-In our work, we tackle the problem of vowel system typology, i.e., we propose
-a generative probability model of which vowels a language contains. In
-contrast to previous work, we work directly with the acoustic information --
-the first two formant values -- rather than modeling discrete sets of
-phonemic symbols (IPA). We develop a novel generative probability model and
-report results based on a corpus of 233 languages.
-"
-8188,1807.02747,Ryan Cotterell and Christo Kirov and Mans Hulden and Jason Eisner,On the Complexity and Typology of Inflectional Morphological Systems,cs.CL," We quantify the linguistic complexity of different languages' morphological
-systems. We verify that there is an empirical trade-off between paradigm size
-and irregularity: a language's inflectional paradigms may be either large in
-size or highly irregular, but never both. Our methodology measures paradigm
-irregularity as the entropy of the surface realization of a paradigm -- how
-hard it is to jointly predict all the surface forms of a paradigm. We
-estimate this by a variational approximation. Our measurements are taken on
-large morphological paradigms from 31 typologically diverse languages.
-"
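-In one standard notation (ours; the paper's exact formulation may differ),
-the measured quantity is the joint surface-form entropy
-
-    H = -\,\mathbb{E}_{\ell}\bigl[\log p(f_1, \dots, f_n \mid \ell)\bigr],
-
-where $\ell$ ranges over lemmata and $f_1, \dots, f_n$ are the surface forms
-of its paradigm; the harder this joint prediction, the more irregular the
-inflectional system.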
-8189,1807.02748,"Kamal Al-Sabahi, Zhang Zuping, Yang Kang","Latent Semantic Analysis Approach for Document Summarization Based on
-  Word Embeddings",cs.CL," Since the amount of information on the internet is growing rapidly, it is
-not easy for a user to find relevant information for his/her query. To tackle
-this issue, much attention has been paid to Automatic Document Summarization.
-The key point in any successful document summarizer is a good document
-representation. The traditional approaches based on word overlap mostly fail
-to produce that kind of representation. Word embeddings, distributed
-representations of words, have shown excellent performance, allowing words to
-be matched on the semantic level. Naively concatenating word embeddings makes
-common words dominant, which in turn diminishes the representation quality.
-In this paper, we employ word embeddings to improve the weighting schemes for
-calculating the input matrix of the Latent Semantic Analysis method. Two
-embedding-based weighting schemes are proposed and then combined to calculate
-the values of this matrix. The new weighting schemes are modified versions of
-the augment weight and the entropy frequency, and they combine the strengths
-of the traditional weighting schemes with those of word embeddings. The
-proposed approach is experimentally evaluated on three well-known English
-datasets: DUC 2002, DUC 2004 and Multilingual 2015 Single-document
-Summarization for English. The proposed model performs consistently better
-than the state-of-the-art methods, by at least 1 ROUGE point, leading to the
-conclusion that it provides a better document representation and, as a
-result, a better document summary.
-"
-8190,1807.02854,"Pankaj Gupta and Bernt Andrassy and Hinrich Sch\""utze","Replicated Siamese LSTM in Ticketing System for Similarity Learning and
-  Retrieval in Asymmetric Texts",cs.IR cs.CL cs.LG," The goal of our industrial ticketing system is to retrieve a relevant
-solution for an input query by matching it with the historical tickets stored
-in a knowledge base. A query comprises a subject and a description, while a
-historical ticket consists of a subject, a description and a solution. To
-retrieve a relevant solution, we use a textual similarity paradigm to learn
-the similarity between the query and the historical tickets. The task is
-challenging due to significant term mismatch in query and ticket pairs of
-asymmetric lengths, where the subject is a short text but the description and
-solution are multi-sentence texts. We present a novel Replicated Siamese LSTM
-model to learn similarity in asymmetric text pairs, which gives 22% and 7%
-gains (Accuracy@10) on the retrieval task over unsupervised and supervised
-baselines, respectively. We also show that topic and distributed semantic
-features for short and long texts improve both similarity learning and
-retrieval.
-"
-8191,1807.02903,"Nikola Ljube\v{s}i\'c, Darja Fi\v{s}er, Anita Peti-Stanti\'c","Predicting Concreteness and Imageability of Words Within and Across
-  Languages via Word Embeddings",cs.CL," The notions of concreteness and imageability, traditionally important in
-psycholinguistics, are gaining significance in semantic-oriented natural
-language processing tasks. In this paper we investigate the predictability of
-these two concepts via supervised learning, using word embeddings as
-explanatory variables. We perform predictions both within and across
-languages by exploiting collections of cross-lingual embeddings aligned to a
-single vector space. We show that the notions of concreteness and
-imageability are highly predictable both within and across languages, with a
-moderate loss of up to 20% in correlation when predicting across languages.
-We further show that cross-lingual transfer via word embeddings is more
-efficient than simple transfer via bilingual dictionaries.
-"
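-A minimal sketch of the kind of supervised set-up this describes, with
-stand-in random data in place of real embeddings and ratings (the exact
-learner used by the authors is not specified here):
-
-    import numpy as np
-    from sklearn.linear_model import Ridge
-    from sklearn.model_selection import cross_val_score
-
-    rng = np.random.default_rng(0)
-    X = rng.normal(size=(1000, 300))                # stand-in word embeddings
-    y = X[:, 0] + rng.normal(scale=0.1, size=1000)  # stand-in ratings
-
-    # How predictable are the ratings from the embeddings?
-    scores = cross_val_score(Ridge(alpha=1.0), X, y, scoring="r2", cv=10)
-    print(scores.mean())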
-8192,1807.02911,"Abdulaziz M. Alayba, Vasile Palade, Matthew England, and Rahat Iqbal",A Combined CNN and LSTM Model for Arabic Sentiment Analysis,cs.CL," Deep neural networks have shown good data modelling capabilities when
-dealing with challenging and large datasets from a wide range of application
-areas. Convolutional Neural Networks (CNNs) offer advantages in selecting
-good features, and Long Short-Term Memory (LSTM) networks have proven good
-abilities for learning sequential data. Both approaches have been reported to
-provide improved results in areas such as image processing, voice
-recognition, language translation and other Natural Language Processing (NLP)
-tasks. Sentiment classification for short text messages from Twitter is a
-challenging task, and the complexity increases for Arabic sentiment
-classification tasks because Arabic is a morphologically rich language. In
-addition, the lack of accurate pre-processing tools for Arabic is another
-current limitation, along with the limited research available in this area.
-In this paper, we investigate the benefits of integrating CNNs and LSTMs and
-report the improved accuracy obtained for Arabic sentiment analysis on
-different datasets. Additionally, we seek to consider the morphological
-diversity of particular Arabic words by using different sentiment
-classification levels.
-"
-8193,1807.02974,"Yan Shao, Christian Hardmeier, Joakim Nivre",Universal Word Segmentation: Implementation and Interpretation,cs.CL," Word segmentation is a low-level NLP task that is non-trivial for a
-considerable number of languages. In this paper, we present a sequence
-tagging framework and apply it to word segmentation for a wide range of
-languages with different writing systems and typological characteristics.
-Additionally, we investigate the correlations between various typological
-factors and word segmentation accuracy. The experimental results indicate
-that segmentation accuracy is positively related to word boundary markers and
-negatively to the number of unique non-segmental terms. Based on the
-analysis, we design a small set of language-specific settings and extensively
-evaluate the segmentation system on the Universal Dependencies datasets. Our
-model obtains state-of-the-art accuracies on all the UD languages. It
-performs substantially better on languages that are non-trivial to segment,
-such as Chinese, Japanese, Arabic and Hebrew, when compared to previous work.
-"
-8194,1807.03004,"Sreekavitha Parupalli, Vijjini Anvesh Rao and Radhika Mamidi","Towards Enhancing Lexical Resource and Using Sense-annotations of
-  OntoSenseNet for Sentiment Analysis",cs.CL," This paper illustrates the interface of the tool we developed for
-crowdsourcing, and we explain the annotation procedure in detail. Our tool is
-named 'Parupalli Padajaalam', which means web of words by Parupalli. The aim
-of this tool is to populate the OntoSenseNet, a sentiment-polarity-annotated
-Telugu resource.
-Recent works have shown the importance of word-level annotations in
-sentiment analysis. With this as a basis, we aim to analyze the importance of
-the sense-annotations obtained from OntoSenseNet in performing the task of
-sentiment analysis. We explain the features extracted from OntoSenseNet
-(Telugu). Furthermore, we compute and explain the adverbial class
-distribution of verbs in OntoSenseNet. This is known to aid in disambiguating
-word senses, which helps enhance the performance of word-sense disambiguation
-(WSD) tasks.
-"
-8195,1807.03006,Angel Daza and Anette Frank,A Sequence-to-Sequence Model for Semantic Role Labeling,cs.CL," We explore a novel approach for Semantic Role Labeling (SRL) by casting it
-as a sequence-to-sequence process. We employ an attention-based model
-enriched with a copying mechanism to ensure faithful regeneration of the
-input sequence, while enabling interleaved generation of argument role
-labels. Here, we apply this model in a monolingual setting, performing
-PropBank SRL on English language data. The constrained sequence generation
-set-up enforced with the copying mechanism allows us to analyze the
-performance and special properties of the model on manually labeled data,
-and to benchmark against state-of-the-art sequence labeling models. We show
-that our model is able to solve the SRL argument labeling task on English
-data, yet further structural decoding constraints will need to be added to
-make the model truly competitive. Our work represents a first step towards
-more advanced, generative SRL labeling setups.
-"
-8196,1807.03012,"Miguel Feria, Juan Paolo Balbin, Francis Michael Bautista","Constructing a Word Similarity Graph from Vector based Word
-  Representation for Named Entity Recognition",cs.CL cs.IR," In this paper, we discuss a method for identifying a seed word that would
-best represent a class of named entities in a graphical representation of
-words and their similarities. Word networks, or word graphs, are
-representations of vectorized text where the nodes are the words encountered
-in a corpus, and the weighted edges incident on the nodes represent how
-similar the words are to each other. We intend to build a bilingual word
-graph and identify seed words through community analysis that would best
-segment the graph according to its named entities, therefore providing an
-unsupervised way of tagging named entities for a bilingual language base.
-"
-8197,1807.03052,Ivan Bilan and Benjamin Roth,"Position-aware Self-attention with Relative Positional Encodings for
-  Slot Filling",cs.CL cs.AI cs.LG," This paper describes how to apply self-attention with relative positional
-encodings to the task of relation extraction. We propose to use the
-self-attention encoder layer together with an additional position-aware
-attention layer that takes into account the positions of the query and the
-object in the sentence. The self-attention encoder also uses a custom
-implementation of relative positional encodings which allow each word in the
-sentence to take into account its left and right context. The evaluation of
-the model is done on the TACRED dataset. The proposed model relies only on
-attention (no recurrent or convolutional layers are used), while improving
-performance w.r.t. the previous state of the art.
-"
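-A minimal sketch of one common way to realise relative positional encodings
-as an additive bias on the attention logits (our illustration; the paper's
-custom implementation differs in its details):
-
-    import torch
-
-    def relative_attention(q, k, v, rel_bias, max_dist=8):
-        # Scaled dot-product attention plus a learned bias indexed by
-        # the (clipped) offset between query and key positions.
-        n, d = q.shape[-2], q.shape[-1]
-        scores = q @ k.transpose(-2, -1) / d ** 0.5          # (..., n, n)
-        idx = torch.arange(n)
-        rel = (idx[None, :] - idx[:, None]).clamp(-max_dist, max_dist)
-        scores = scores + rel_bias[rel + max_dist]           # table lookup
-        return torch.softmax(scores, dim=-1) @ v
-
-    # rel_bias is a learnable tensor of shape (2 * max_dist + 1,)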
-" -8198,1807.03053,Pedro Henrique Martins and Lu\'is Cust\'odio and Rodrigo Ventura,"A deep learning approach for understanding natural language commands for - mobile service robots",cs.CL cs.RO," Using natural language to give instructions to robots is challenging, since -natural language understanding is still largely an open problem. In this paper -we address this problem by restricting our attention to commands modeled as one -action, plus arguments (also known as slots). For action detection (also called -intent detection) and slot filling various architectures of Recurrent Neural -Networks and Long Short Term Memory (LSTM) networks were evaluated, having -LSTMs achieved a superior accuracy. As the action requested may not fall within -the robots capabilities, a Support Vector Machine(SVM) is used to determine -whether it is or not. For the input of the neural networks, several word -embedding algorithms were compared. Finally, to implement the system in a -robot, a ROS package is created using a SMACH state machine. The proposed -system is then evaluated both using well-known datasets and benchmarks in the -context of domestic service robots. -" -8199,1807.03096,\'Alvaro Peris and Francisco Casacuberta,"NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and - Online Learning",cs.CL," We present NMT-Keras, a flexible toolkit for training deep learning models, -which puts a particular emphasis on the development of advanced applications of -neural machine translation systems, such as interactive-predictive translation -protocols and long-term adaptation of the translation system via continuous -learning. NMT-Keras is based on an extended version of the popular Keras -library, and it runs on Theano and Tensorflow. State-of-the-art neural machine -translation models are deployed and used following the high-level framework -provided by Keras. Given its high modularity and flexibility, it also has been -extended to tackle different problems, such as image and video captioning, -sentence classification and visual question answering. -" -8200,1807.03100,"Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi - Mao, Oleksandr Polozov, Rishabh Singh",Robust Text-to-SQL Generation with Execution-Guided Decoding,cs.CL cs.AI cs.DB cs.LG cs.PL," We consider the problem of neural semantic parsing, which translates natural -language questions into executable SQL queries. We introduce a new mechanism, -execution guidance, to leverage the semantics of SQL. It detects and excludes -faulty programs during the decoding procedure by conditioning on the execution -of partially generated program. The mechanism can be used with any -autoregressive generative model, which we demonstrate on four state-of-the-art -recurrent or template-based semantic parsing models. We demonstrate that -execution guidance universally improves model performance on various -text-to-SQL datasets with different scales and query complexity: WikiSQL, ATIS, -and GeoQuery. As a result, we achieve new state-of-the-art execution accuracy -of 83.8% on WikiSQL. -" -8201,1807.03108,"Alina Maria Ciobanu, Marcos Zampieri, Shervin Malmasi, Santanu Pal, - Liviu P. Dinu",Discriminating between Indo-Aryan Languages Using SVM Ensembles,cs.CL," In this paper we present a system based on SVM ensembles trained on -characters and words to discriminate between five similar languages of the -Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi. 
-We investigate the performance of individual features and combine the output
-of single classifiers to maximize performance. The system competed in the
-Indo-Aryan Language Identification (ILI) shared task organized within the
-VarDial Evaluation Campaign 2018. Our best entry in the competition, named
-ILIdentification, scored an F1 score of 88.95% and was ranked 3rd out of 8
-teams.
-"
-8202,1807.03121,"Wanxiang Che, Yijia Liu, Yuxuan Wang, Bo Zheng, Ting Liu","Towards Better UD Parsing: Deep Contextualized Word Embeddings,
-  Ensemble, and Treebank Concatenation",cs.CL," This paper describes our system (HIT-SCIR) submitted to the CoNLL 2018
-shared task on Multilingual Parsing from Raw Text to Universal Dependencies.
-We base our submission on Stanford's winning system for the CoNLL 2017 shared
-task and make two effective extensions: 1) incorporating deep contextualized
-word embeddings into both the part-of-speech tagger and the parser; 2)
-ensembling parsers trained with different initializations. We also explore
-different ways of concatenating treebanks for further improvements.
-Experimental results on the development data show the effectiveness of our
-methods. In the final evaluation, our system was ranked first according to
-LAS (75.84%) and outperformed the other systems by a large margin.
-"
-8203,1807.03367,"Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston,
-  Douwe Kiela",Talk the Walk: Navigating New York City through Grounded Dialogue,cs.AI cs.CL cs.CV cs.LG," We introduce ""Talk The Walk"", the first large-scale dialogue dataset
-grounded in action and perception. The task involves two agents (a ""guide""
-and a ""tourist"") that communicate via natural language in order to achieve a
-common goal: having the tourist navigate to a given target location. The task
-and dataset, which are described in detail, are challenging and their full
-solution is an open problem that we pose to the community. We (i) focus on
-the task of tourist localization and develop the novel Masked Attention for
-Spatial Convolutions (MASC) mechanism that allows for grounding tourist
-utterances into the guide's map, (ii) show it yields significant improvements
-for both emergent and natural language communication, and (iii) using this
-method, we establish non-trivial baselines on the full task.
-"
-8204,1807.03396,Hao Tang and James Glass,"On Training Recurrent Networks with Truncated Backpropagation Through
-  Time in Speech Recognition",cs.CL cs.LG cs.SD eess.AS," Recurrent neural networks have been the dominant models for many speech and
-language processing tasks. However, we understand little about the behavior
-and the class of functions recurrent networks can realize. Moreover, the
-heuristics used during training complicate the analyses. In this paper, we
-study recurrent networks' ability to learn long-term dependency in the
-context of speech recognition. We consider two decoding approaches, online
-and batch decoding, and show the classes of functions to which the decoding
-approaches correspond. We then draw a connection between batch decoding and a
-popular training approach for recurrent networks, truncated backpropagation
-through time. Changing the decoding approach restricts the amount of past
-history recurrent networks can use for prediction, allowing us to analyze
-their ability to remember. Empirically, we utilize long-term dependency in
-subphonetic states, phonemes, and words, and show how design decisions, such
-as the decoding approach, lookahead, context frames, and consecutive
-prediction, characterize the behavior of recurrent networks. Finally, we draw
-a connection between Markov processes and vanishing gradients. These results
-have implications for studying long-term dependency in speech data and how
-these properties are learned by recurrent networks.
-"
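-A minimal sketch of truncated backpropagation through time as it is usually
-implemented (our illustration, assuming a PyTorch-style RNN whose forward
-pass returns an output and a state tuple):
-
-    def tbptt(model, loss_fn, optimizer, chunks, state):
-        # Gradients flow only within each chunk: the carried-over state
-        # is detached from the graph before every new chunk.
-        for x, y in chunks:                              # k frames per chunk
-            state = tuple(s.detach() for s in state)     # cut the graph here
-            out, state = model(x, state)
-            loss = loss_fn(out, y)
-            optimizer.zero_grad()
-            loss.backward()
-            optimizer.step()
-        return state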
-8205,1807.03397,"Ashwath Kumar Salimath, Robin K Thomas, Sethuram Ramalinga Reddy,
-  Yuhao Qiao",Detecting Levels of Depression in Text Based on Metrics,cs.CL," Depression is one of the most common mental health problems and a major
-concern for society. Proper monitoring using devices that can aid in its
-detection could be helpful in preventing it altogether. The Distress Analysis
-Interview Corpus (DAIC) is used to build a metric-based depression detector.
-We have designed a metric that describes the level of depression using
-negative sentences and classifies the participant accordingly. The score
-generated by the algorithm is then levelled to denote the intensity of
-depression. The results show that measuring depression from text alone is
-very complex, as other factors are not taken into consideration. Further, the
-paper describes the limitations of measuring depression using text and makes
-suggestions for future work.
-"
-8206,1807.03399,"Denis Newman-Griffis, Albert M. Lai, Eric Fosler-Lussier",Jointly Embedding Entities and Text with Distant Supervision,cs.CL cs.AI," Learning representations for knowledge base entities and concepts is
-becoming increasingly important for NLP applications. However, recent entity
-embedding methods have relied on structured resources that are expensive to
-create for new domains and corpora. We present a distantly-supervised method
-for jointly learning embeddings of entities and text from an unannotated
-corpus, using only a list of mappings between entities and surface forms. We
-learn embeddings from open-domain and biomedical corpora, and compare against
-prior methods that rely on human-annotated text or large knowledge graph
-structure. Our embeddings capture entity similarity and relatedness better
-than prior work, both in existing biomedical datasets and a new
-Wikipedia-based dataset that we release to the community. Results on analogy
-completion and entity sense disambiguation indicate that entities and words
-capture complementary information that can be effectively combined for
-downstream use.
-"
-8207,1807.03409,Minh Nguyen and Thien Huu Nguyen,"Who is Killed by Police: Introducing Supervised Attention for
-  Hierarchical LSTMs",cs.CL," Finding the names of people killed by police has become increasingly
-important as police shootings get more and more public attention (police
-killing detection). Unfortunately, there has not been much work in the
-literature addressing this problem. The early work in this field
-\cite{keith2017identifying} proposed a distant supervision framework based on
-Expectation Maximization (EM) to deal with the multiple appearances of the
-names in documents. However, such an EM-based framework cannot take full
-advantage of deep learning models, necessitating the use of hand-designed
-features to improve the detection performance. In this work, we present a
-novel deep learning method to solve the problem of police killing
-recognition.
-The proposed method relies on hierarchical LSTMs to model the multiple
-sentences that contain the person names of interest, and introduces
-supervised attention mechanisms based on semantic word lists and dependency
-trees to upweight the important contextual words. Our experiments demonstrate
-the benefits of the proposed model and yield state-of-the-art performance for
-police killing detection.
-"
-8208,1807.03491,"Jey Han Lau and Trevor Cohn and Timothy Baldwin and Julian Brooke and
-  Adam Hammond","Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme",cs.CL," In this paper, we propose a joint architecture that captures language,
-rhyme and meter for sonnet modelling. We assess the quality of generated
-poems using crowd and expert judgements. The stress and rhyme models perform
-very well, as generated poems are largely indistinguishable from
-human-written poems. Expert evaluation, however, reveals that a vanilla
-language model captures meter implicitly, and that machine-generated poems
-still underperform in terms of readability and emotion. Our research shows
-the importance of expert evaluation for poetry generation, and that future
-research should look beyond rhyme/meter and focus on poetic language.
-"
-8209,1807.03583,Andr\'as Dob\'o,"Multi-D Kneser-Ney Smoothing Preserving the Original Marginal
-  Distributions",cs.CL," Smoothing is an essential tool in many NLP tasks; therefore, numerous
-techniques have been developed for this purpose in the past. Among the most
-widely used smoothing methods are Kneser-Ney smoothing (KNS) and its
-variants, including the Modified Kneser-Ney smoothing (MKNS), which are
-widely considered to be among the best smoothing methods available. Although
-when creating the original KNS the intention of the authors was to develop a
-smoothing method that preserves the marginal distributions of the original
-model, this property was not maintained when developing the MKNS.
- In this article, I would like to overcome this and propose a refined
-version of the MKNS that preserves these marginal distributions while keeping
-the advantages of both previous versions. Besides its advantageous
-properties, this novel smoothing method is shown to achieve about the same
-results as the MKNS in a standard language modelling task.
-"
-8210,1807.03586,"Yifan Gao, Lidong Bing, Wang Chen, Michael R. Lyu, Irwin King",Difficulty Controllable Generation of Reading Comprehension Questions,cs.CL cs.AI," We investigate the difficulty levels of questions in reading comprehension
-datasets such as SQuAD, and propose a new question generation setting, named
-Difficulty-controllable Question Generation (DQG). Taking as input a sentence
-in the reading comprehension paragraph and some of its text fragments (i.e.,
-answers) that we want to ask questions about, a DQG method needs to generate
-questions each of which has a given text fragment as its answer, and
-meanwhile the generation is under the control of specified difficulty
-labels---the output questions should satisfy the specified difficulty as much
-as possible. To solve this task, we propose an end-to-end framework to
-generate questions of designated difficulty levels by exploring a few
-important intuitions. For evaluation, we prepared the first dataset of
-reading comprehension questions with difficulty labels.
-The results show that the questions generated by our framework not only have
-better quality under metrics like BLEU, but also comply with the specified
-difficulty labels.
-"
-8211,1807.03591,"Christoph Dalitz, Jens Wilberg, Katrin E. Bednarek",Paired Comparison Sentiment Scores,cs.CL," The method of paired comparisons is an established method in psychology. In
-this article, it is applied to obtain continuous sentiment scores for words
-from comparisons made by test persons. We created an initial lexicon with
-$n=199$ German words from a two-fold all-pair comparison experiment with ten
-different test persons. Of the probabilistic models considered, the logistic
-model showed the best agreement with the results of the comparison
-experiment. The initial lexicon can then be used in different ways. One is to
-create special-purpose sentiment lexica through the addition of arbitrary
-words that are compared with some of the initial words by test persons. A
-cross-validation experiment suggests that only about 18 two-fold comparisons
-are necessary to estimate the score of a new, yet unknown word, provided
-these words are selected by a modification of a method by Silverstein &
-Farrell. Another application of the initial lexicon is the evaluation of
-automatically created corpus-based lexica. By such an evaluation, we compared
-the corpus-based lexica SentiWS, SenticNet, and SentiWordNet, of which
-SenticNet 4 performed best. This technical report is a corrected and extended
-version of a presentation made at the ICDM Sentire workshop in 2016.
-"
-8212,1807.03595,"\'Akos K\'ad\'ar, Marc-Alexandre C\^ot\'e, Grzegorz Chrupa{\l}a, Afra
-  Alishahi",Revisiting the Hierarchical Multiscale LSTM,cs.CL," The Hierarchical Multiscale LSTM (Chung et al., 2016a) is a
-state-of-the-art language model that learns interpretable structure from
-character-level input. Such models can provide fertile ground for (cognitive)
-computational linguistics studies. However, the high complexity of the
-architecture, training procedure and implementations might hinder its
-applicability. We provide a detailed reproduction and ablation study of the
-architecture, shedding light on some of the potential caveats of re-purposing
-complex deep-learning architectures. We further show that simplifying certain
-aspects of the architecture can in fact improve its performance. We also
-investigate the linguistic units (segments) learned by various levels of the
-model, and argue that their quality does not correlate with the overall
-performance of the model on language modeling.
-"
-8213,1807.03654,"Kei Yin Ng, Anna Feldman, Jing Peng, Chris Leberknight",Linguistic Characteristics of Censorable Language on SinaWeibo,cs.CL," This paper investigates censorship from a linguistic perspective. We
-collect a corpus of censored and uncensored posts on a number of topics, and
-build a classifier that predicts censorship decisions independently of the
-discussion topics. Our investigation reveals that the strongest linguistic
-indicator of censored content in our corpus is its readability.
-"
-8214,1807.03656,"Paramita Mirza and Simon Razniewski and Fariz Darari and Gerhard
-  Weikum",Enriching Knowledge Bases with Counting Quantifiers,cs.CL," Information extraction traditionally focuses on extracting relations
-between identifiable entities, such as .
Yet, texts
-often also contain counting information, stating that a subject is in a
-specific relation with a number of objects, without mentioning the objects
-themselves, for example, ""California is divided into 58 counties"". Such
-counting quantifiers can help in a variety of tasks such as query answering or
-knowledge base curation, but are neglected by prior work. This paper develops
-the first full-fledged system for extracting counting information from text,
-called CINEX. We employ distant supervision using fact counts from a knowledge
-base as training seeds, and develop novel techniques for dealing with several
-challenges: (i) non-maximal training seeds due to the incompleteness of
-knowledge bases, (ii) sparse and skewed observations in text sources, and (iii)
-high diversity of linguistic patterns. Experiments with five human-evaluated
-relations show that CINEX can achieve 60% average precision for extracting
-counting information. In a large-scale experiment, we demonstrate the potential
-for knowledge base enrichment by applying CINEX to 2,474 frequent relations in
-Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct
-relations, which is 28% more than the existing Wikidata facts for these
-relations.
-"
-8215,1807.03658,"Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty","Video Captioning with Boundary-aware Hierarchical Language Decoding and
- Joint Video Prediction",cs.CV cs.CL," The explosion of video data on the internet requires effective and efficient
-technology to generate captions automatically for people who are not able to
-watch the videos. Despite the great progress of video captioning research,
-particularly on video feature encoding, the language decoder is still largely
-based on the prevailing RNN decoder such as LSTM, which tends to prefer
-frequent words that align with the video. In this paper, we propose a
-boundary-aware hierarchical language decoder for video captioning, which
-consists of a high-level GRU based language decoder, working as a global
-(caption-level) language model, and a low-level GRU based language decoder,
-working as a local (phrase-level) language model. Most importantly, we
-introduce a binary gate into the low-level GRU language decoder to detect the
-language boundaries. Together with other advanced components including joint
-video prediction, shared soft attention, and boundary-aware video encoding, our
-integrated video captioning framework can discover hierarchical language
-information and distinguish the subject and the object in a sentence, which are
-usually confused during language generation. Extensive experiments on two
-widely-used video captioning datasets, MSR-Video-to-Text (MSR-VTT)
-\cite{xu2016msr} and YouTube-to-Text (MSVD) \cite{chen2011collecting} show that
-our method is highly competitive, compared with the state-of-the-art methods.
-"
-8216,1807.03674,"S\'ebastien Cossin, Vianney Jouhet, Fleur Mougin, Gayo Diallo, Frantz
- Thiessard","IAM at CLEF eHealth 2018: Concept Annotation and Coding in French Death
- Certificates",cs.CL," In this paper, we describe the approach and results for our participation in
-task 1 (multilingual information extraction) of the CLEF eHealth 2018
-challenge. We addressed the task of automatically assigning ICD-10 codes to
-French death certificates. We used a dictionary-based approach using materials
-provided by the task organizers. The terms of the ICD-10 terminology were
-normalized, tokenized and stored in a tree data structure.
The Levenshtein
-distance was used to detect typos. Frequent abbreviations were detected by
-manually creating a small set of them. Our system achieved an F-score of 0.786
-(precision: 0.794, recall: 0.779). These scores were substantially higher than
-the average score of the systems that participated in the challenge.
-"
-8217,1807.03756,"Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, Alexander M. Rush",Latent Alignment and Variational Attention,stat.ML cs.CL cs.LG," Neural attention has become central to many state-of-the-art models in
-natural language processing and related domains. Attention networks are an
-easy-to-train and effective method for softly simulating alignment; however,
-the approach does not marginalize over latent alignments in a probabilistic
-sense. This property makes it difficult to compare attention to other alignment
-approaches, to compose it with probabilistic models, and to perform posterior
-inference conditioned on observed data. A related latent approach, hard
-attention, fixes these issues, but is generally harder to train and less
-accurate. This work considers variational attention networks, alternatives to
-soft and hard attention for learning latent variable alignment models, with
-tighter approximation bounds based on amortized variational inference. We
-further propose methods for reducing the variance of gradients to make these
-approaches computationally feasible. Experiments show that for machine
-translation and visual question answering, inefficient exact latent variable
-models outperform standard neural attention, but these gains go away when using
-hard attention based training. On the other hand, variational attention retains
-most of the performance gain but with training speed comparable to neural
-attention.
-"
-8218,1807.03819,"Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit,
- {\L}ukasz Kaiser",Universal Transformers,cs.CL cs.LG stat.ML," Recurrent neural networks (RNNs) sequentially process data by updating their
-state with each new data point, and have long been the de facto choice for
-sequence modeling tasks. However, their inherently sequential computation makes
-them slow to train. Feed-forward and convolutional architectures have recently
-been shown to achieve superior results on some sequence modeling tasks such as
-machine translation, with the added advantage that they concurrently process
-all inputs in the sequence, leading to easy parallelization and faster training
-times. Despite these successes, however, popular feed-forward sequence models
-like the Transformer fail to generalize in many simple tasks that recurrent
-models handle with ease, e.g. copying strings or even simple logical inference
-when the string or formula lengths exceed those observed at training time. We
-propose the Universal Transformer (UT), a parallel-in-time self-attentive
-recurrent sequence model which can be cast as a generalization of the
-Transformer model and which addresses these issues. UTs combine the
-parallelizability and global receptive field of feed-forward sequence models
-like the Transformer with the recurrent inductive bias of RNNs. We also add a
-dynamic per-position halting mechanism and find that it improves accuracy on
-several tasks. In contrast to the standard Transformer, under certain
-assumptions, UTs can be shown to be Turing-complete.
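The CLEF eHealth entry above (1807.03674) matches ICD-10 terms with Levenshtein
distance to absorb typos. A minimal dynamic-programming sketch of that distance
(illustrative only; not the authors' code, and the example words are mine):

    def levenshtein(a: str, b: str) -> int:
        """Classic edit distance: insertions, deletions, substitutions."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    # e.g. flag a dictionary term as a likely match within a small distance
    print(levenshtein("pneumonia", "pneumonai"))  # 2 (a transposition = 2 edits)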
Our experiments show that
-UTs outperform standard Transformers on a wide range of algorithmic and
-language understanding tasks, including the challenging LAMBADA language
-modeling task where UTs achieve a new state of the art, and machine translation
-where UTs achieve a 0.9 BLEU improvement over Transformers on the WMT14 En-De
-dataset.
-"
-8219,1807.03915,"Hai Pham, Thomas Manzini, Paul Pu Liang, Barnabas Poczos","Seq2Seq2Sentiment: Multimodal Sequence to Sequence Models for Sentiment
- Analysis",cs.CL cs.LG stat.ML," Multimodal machine learning is a core research area spanning the language,
-visual and acoustic modalities. The central challenge in multimodal learning
-involves learning representations that can process and relate information from
-multiple modalities. In this paper, we propose two methods for unsupervised
-learning of joint multimodal representations using sequence to sequence
-(Seq2Seq) methods: a \textit{Seq2Seq Modality Translation Model} and a
-\textit{Hierarchical Seq2Seq Modality Translation Model}. We also explore
-multiple different variations on the multimodal inputs and outputs of these
-seq2seq models. Our experiments on multimodal sentiment analysis using the
-CMU-MOSI dataset indicate that our methods learn informative multimodal
-representations that outperform the baselines and achieve improved performance
-on multimodal sentiment analysis, specifically in the bimodal case, where our
-model is able to improve F1 score by twelve points. We also discuss future
-directions for multimodal Seq2Seq methods.
-"
-8220,1807.03948,"Ramesh Manuvinakurike, Sumanth Bharadwaj, Kallirroi Georgila","A Dialogue Annotation Scheme for Weight Management Chat using the
- Trans-Theoretical Model of Health Behavior Change",cs.CL," In this study we collect and annotate human-human role-play dialogues in the
-domain of weight management. There are two roles in the conversation: the
-""seeker"" who is looking for ways to lose weight and the ""helper"" who provides
-suggestions to help the ""seeker"" in their weight loss journey. The chat
-dialogues collected are then annotated with a novel annotation scheme inspired
-by a popular health behavior change theory called ""trans-theoretical model of
-health behavior change"". We also build classifiers to automatically predict the
-annotation labels used in our corpus. We find that classification accuracy
-improves when oracle segmentations of the interlocutors' sentences are provided
-compared to directly classifying unsegmented sentences.
-"
-8221,1807.03950,"Deepthi Karkada, Ramesh Manuvinakurike, Kallirroi Georgila",Towards Understanding End-of-trip Instructions in a Taxi Ride Scenario,cs.CL," We introduce a dataset containing human-authored descriptions of target
-locations in an ""end-of-trip in a taxi ride"" scenario. We describe our data
-collection method and a novel annotation scheme that supports understanding of
-such descriptions of target locations. Our dataset contains target location
-descriptions for both synthetic and real-world images as well as visual
-annotations (ground truth labels, dimensions of vehicles and objects,
-coordinates of the target location, distance and direction of the target
-location from vehicles and objects) that can be used in various visual and
-language tasks. We also perform a pilot experiment on how the corpus could be
-applied to visual reference resolution in this domain.
-" -8222,1807.03955,Dat Quoc Nguyen and Karin Verspoor,"An improved neural network model for joint POS tagging and dependency - parsing",cs.CL," We propose a novel neural network model for joint part-of-speech (POS) -tagging and dependency parsing. Our model extends the well-known BIST -graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating -a BiLSTM-based tagging component to produce automatically predicted POS tags -for the parser. On the benchmark English Penn treebank, our model obtains -strong UAS and LAS scores at 94.51% and 92.87%, respectively, producing 1.5+% -absolute improvements to the BIST graph-based parser, and also obtaining a -state-of-the-art POS tagging accuracy at 97.97%. Furthermore, experimental -results on parsing 61 ""big"" Universal Dependencies treebanks from raw texts -show that our model outperforms the baseline UDPipe (Straka and Strakov\'a, -2017) with 0.8% higher average POS tagging score and 3.6% higher average LAS -score. In addition, with our model, we also obtain state-of-the-art downstream -task scores for biomedical event extraction and opinion analysis applications. -Our code is available together with all pre-trained models at: -https://github.com/datquocnguyen/jPTDP -" -8223,1807.04053,Daniel Varab and Natalie Schluter,UniParse: A universal graph-based parsing toolkit,cs.CL," This paper describes the design and use of the graph-based parsing framework -and toolkit UniParse, released as an open-source python software package. -UniParse as a framework novelly streamlines research prototyping, development -and evaluation of graph-based dependency parsing architectures. UniParse does -this by enabling highly efficient, sufficiently independent, easily readable, -and easily extensible implementations for all dependency parser components. We -distribute the toolkit with ready-made configurations as re-implementations of -all current state-of-the-art first-order graph-based parsers, including even -more efficient Cython implementations of both encoders and decoders, as well as -the required specialised loss functions. -" -8224,1807.04148,"Johannes Hellrich, Sven Buechel, and Udo Hahn","JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and - Emotion",cs.CL," We here introduce a substantially extended version of JeSemE, an interactive -website for visually exploring computationally derived time-variant information -on word meanings and lexical emotions assembled from five large diachronic text -corpora. JeSemE is designed for scholars in the (digital) humanities as an -alternative to consulting manually compiled, printed dictionaries for such -information (if available at all). This tool uniquely combines state-of-the-art -distributional semantics with a nuanced model of human emotions, two -information streams we deem beneficial for a data-driven interpretation of -texts in the humanities. -" -8225,1807.04172,Tom\'a\v{s} Brychc\'in,Linear Transformations for Cross-lingual Semantic Textual Similarity,cs.CL," Cross-lingual semantic textual similarity systems estimate the degree of the -meaning similarity between two sentences, each in a different language. -State-of-the-art algorithms usually employ machine translation and combine vast -amount of features, making the approach strongly supervised, resource rich, and -difficult to use for poorly-resourced languages. - In this paper, we study linear transformations, which project monolingual -semantic spaces into a shared space using bilingual dictionaries. 
We propose a
-novel transformation, which builds on the best ideas from prior works. We
-experiment with unsupervised techniques for sentence similarity based only on
-semantic spaces and we show they can be significantly improved by word
-weighting. Our transformation outperforms other methods and, together with word
-weighting, leads to very promising results on several datasets in different
-languages.
-"
-8226,1807.04175,"Tom\'a\v{s} Brychc\'in, Stephen Eugene Taylor, Luk\'a\v{s} Svoboda","Cross-lingual Word Analogies using Linear Transformations between
- Semantic Spaces",cs.CL," We generalize the word analogy task across languages, to provide a new
-intrinsic evaluation method for cross-lingual semantic spaces. We experiment
-with six languages within different language families, including English,
-German, Spanish, Italian, Czech, and Croatian. State-of-the-art monolingual
-semantic spaces are transformed into a shared space using dictionaries of word
-translations. We compare several linear transformations and rank them for
-experiments with monolingual (no transformation), bilingual (one semantic space
-is transformed to another), and multilingual (all semantic spaces are
-transformed onto the English space) versions of semantic spaces. We show that the
-tested linear transformations preserve relationships between words (word
-analogies) and lead to impressive results. We achieve average accuracy of
-51.1%, 43.1%, and 38.2% for monolingual, bilingual, and multilingual semantic
-spaces, respectively.
-"
-8227,1807.04441,"Roberto Camacho Barranco, Raimundo F. Dos Santos, M. Shahriar Hossain","Tracking the Evolution of Words with Time-reflective Text
- Representations",cs.CL cs.IR," More than 80% of today's data is unstructured in nature, and these
-unstructured datasets evolve over time. A large part of these datasets are text
-documents generated by media outlets, scholarly articles in digital libraries,
-findings from scientific and professional communities, and social media. Vector
-space models were developed to analyze text data using data mining and machine
-learning algorithms. While ample vector space models exist for text data, the
-evolutionary aspect of ever-changing text corpora is still missing in
-vector-based representations. The advent of word embeddings has enabled us to
-create a contextual vector space, but the embeddings fail to consider the
-temporal aspects of the feature space successfully. This paper presents an
-approach to include temporal aspects in feature spaces. The inclusion of the
-time aspect in the feature space provides vectors for every natural language
-element, such as words or entities, at every timestamp. Such temporal word
-vectors allow us to track how the meaning of a word changes over time, by
-studying the changes in its neighborhood. Moreover, a time-reflective text
-representation will pave the way to a new set of text analytic abilities
-involving time series for text collections. In this paper, we present a
-time-reflective vector space model for temporal text data that is able to
-capture short and long-term changes in the meaning of words. We compare our
-approach with the limited literature on dynamic embeddings. We present
-qualitative and quantitative evaluations using the tracking of semantic
-evolution as the target application.
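The word-analogy entry above (1807.04175) evaluates transformed spaces with
analogies. The standard 3CosAdd test it builds on can be sketched as follows
(assuming vocab_vectors maps words to numpy arrays; function and variable names
are mine):

    import numpy as np

    def solve_analogy(a, b, c, vocab_vectors):
        """3CosAdd: return the word whose vector is most cosine-similar
        to b - a + c (e.g. a=man, b=king, c=woman -> queen)."""
        query = vocab_vectors[b] - vocab_vectors[a] + vocab_vectors[c]
        query /= np.linalg.norm(query)
        best_word, best_sim = None, -np.inf
        for word, vec in vocab_vectors.items():
            if word in (a, b, c):        # the query words are excluded
                continue
            sim = vec @ query / np.linalg.norm(vec)
            if sim > best_sim:
                best_word, best_sim = word, sim
        return best_word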
-" -8228,1807.04687,"Linara Adilova, Sven Giesselbach, Stefan R\""uping",Making Efficient Use of a Domain Expert's Time in Relation Extraction,cs.LG cs.CL stat.ML," Scarcity of labeled data is one of the most frequent problems faced in -machine learning. This is particularly true in relation extraction in text -mining, where large corpora of texts exists in many application domains, while -labeling of text data requires an expert to invest much time to read the -documents. Overall, state-of-the art models, like the convolutional neural -network used in this paper, achieve great results when trained on large enough -amounts of labeled data. However, from a practical point of view the question -arises whether this is the most efficient approach when one takes the manual -effort of the expert into account. In this paper, we report on an alternative -approach where we first construct a relation extraction model using distant -supervision, and only later make use of a domain expert to refine the results. -Distant supervision provides a mean of labeling data given known relations in a -knowledge base, but it suffers from noisy labeling. We introduce an active -learning based extension, that allows our neural network to incorporate expert -feedback and report on first results on a complex data set. -" -8229,1807.04715,"Konstantinos Skianis, Nikolaos Tziortziotis, Michalis Vazirgiannis",Orthogonal Matching Pursuit for Text Classification,cs.LG cs.CL stat.ML," In text classification, the problem of overfitting arises due to the high -dimensionality, making regularization essential. Although classic regularizers -provide sparsity, they fail to return highly accurate models. On the contrary, -state-of-the-art group-lasso regularizers provide better results at the expense -of low sparsity. In this paper, we apply a greedy variable selection algorithm, -called Orthogonal Matching Pursuit, for the text classification task. We also -extend standard group OMP by introducing overlapping Group OMP to handle -overlapping groups of features. Empirical analysis verifies that both OMP and -overlapping GOMP constitute powerful regularizers, able to produce effective -and very sparse models. Code and data are available online: -https://github.com/y3nk0/OMP-for-Text-Classification . -" -8230,1807.04723,"Iulian Vlad Serban, Chinnadhurai Sankar, Michael Pieper, Joelle - Pineau, Yoshua Bengio","The Bottleneck Simulator: A Model-based Deep Reinforcement Learning - Approach",cs.LG cs.AI cs.CL cs.NE stat.ML," Deep reinforcement learning has recently shown many impressive successes. -However, one major obstacle towards applying such methods to real-world -problems is their lack of data-efficiency. To this end, we propose the -Bottleneck Simulator: a model-based reinforcement learning method which -combines a learned, factorized transition model of the environment with rollout -simulations to learn an effective policy from few examples. The learned -transition model employs an abstract, discrete (bottleneck) state, which -increases sample efficiency by reducing the number of model parameters and by -exploiting structural properties of the environment. 
We provide a mathematical
-analysis of the Bottleneck Simulator in terms of fixed points of the learned
-policy, which reveals how performance is affected by four distinct sources of
-error: an error related to the abstract space structure, an error related to
-the transition model estimation variance, an error related to the transition
-model estimation bias, and an error related to the transition model class bias.
-Finally, we evaluate the Bottleneck Simulator on two natural language
-processing tasks: a text adventure game and a real-world, complex dialogue
-response selection task. On both tasks, the Bottleneck Simulator yields
-excellent performance, beating competing approaches.
-"
-8231,1807.04783,Christo Kirov and Ryan Cotterell,"Recurrent Neural Networks in Linguistic Theory: Revisiting Pinker and
- Prince (1988) and the Past Tense Debate",cs.CL," Can advances in NLP help advance cognitive modeling? We examine the role of
-artificial neural networks, the current state of the art in many common NLP
-tasks, by returning to a classic case study. In 1986, Rumelhart and McClelland
-famously introduced a neural architecture that learned to transduce English
-verb stems to their past tense forms. Shortly thereafter, Pinker & Prince
-(1988) presented a comprehensive rebuttal of many of Rumelhart and McClelland's
-claims. Much of the force of their attack centered on the empirical inadequacy
-of the Rumelhart and McClelland (1986) model. Today, however, that model is
-severely outmoded. We show that the Encoder-Decoder network architectures used
-in modern NLP systems obviate most of Pinker and Prince's criticisms without
-requiring any simplification of the past tense mapping problem. We suggest that
-the empirical performance of modern networks warrants a re-examination of their
-utility in linguistic and cognitive modeling.
-"
-8232,1807.04863,"Adji B. Dieng, Yoon Kim, Alexander M. Rush, David M. Blei",Avoiding Latent Variable Collapse With Generative Skip Models,stat.ML cs.CL cs.LG," Variational autoencoders learn distributions of high-dimensional data. They
-model data with a deep latent-variable model and then fit the model by
-maximizing a lower bound of the log marginal likelihood. VAEs can capture
-complex distributions, but they can also suffer from an issue known as ""latent
-variable collapse,"" especially if the likelihood model is powerful.
-Specifically, the lower bound involves an approximate posterior of the latent
-variables; this posterior ""collapses"" when it is set equal to the prior, i.e.,
-when the approximate posterior is independent of the data. While VAEs learn
-good generative models, latent variable collapse prevents them from learning
-useful representations. In this paper, we propose a simple new way to avoid
-latent variable collapse by including skip connections in our generative model;
-these connections enforce strong links between the latent variables and the
-likelihood function. We study generative skip models both theoretically and
-empirically. Theoretically, we prove that skip models increase the mutual
-information between the observations and the inferred latent variables.
-Empirically, we study images (MNIST and Omniglot) and text (Yahoo). Compared to
-existing VAE architectures, we show that generative skip models maintain
-similar predictive performance but lead to less collapse and provide more
-meaningful representations of the data.
-" -8233,1807.04905,"Eunsol Choi, Omer Levy, Yejin Choi, Luke Zettlemoyer",Ultra-Fine Entity Typing,cs.CL cs.AI cs.LG," We introduce a new entity typing task: given a sentence with an entity -mention, the goal is to predict a set of free-form phrases (e.g. skyscraper, -songwriter, or criminal) that describe appropriate types for the target entity. -This formulation allows us to use a new type of distant supervision at large -scale: head words, which indicate the type of the noun phrases they appear in. -We show that these ultra-fine types can be crowd-sourced, and introduce new -evaluation sets that are much more diverse and fine-grained than existing -benchmarks. We present a model that can predict open types, and is trained -using a multitask objective that pools our new head-word supervision with prior -supervision from entity linking. Experimental results demonstrate that our -model is effective in predicting entity types at varying granularity; it -achieves state of the art performance on an existing fine-grained entity typing -benchmark, and sets baselines for our newly-introduced datasets. Our data and -model can be downloaded from: http://nlp.cs.washington.edu/entity_type -" -8234,1807.04978,"Zhangyu Xiao, Zhijian Ou, Wei Chu, Hui Lin","Hybrid CTC-Attention based End-to-End Speech Recognition using Subword - Units",eess.AS cs.CL cs.SD," In this paper, we present an end-to-end automatic speech recognition system, -which successfully employs subword units in a hybrid CTC-Attention based -system. The subword units are obtained by the byte-pair encoding (BPE) -compression algorithm. Compared to using words as modeling units, using -characters or subword units does not suffer from the out-of-vocabulary (OOV) -problem. Furthermore, using subword units further offers a capability in -modeling longer context than using characters. We evaluate different systems -over the LibriSpeech 1000h dataset. The subword-based hybrid CTC-Attention -system obtains 6.8% word error rate (WER) on the test_clean subset without any -dictionary or external language model. This represents a significant -improvement (a 12.8% WER relative reduction) over the character-based hybrid -CTC-Attention system. -" -8235,1807.04990,"Zeyang Lei, Yujiu Yang, Min Yang, Yi Liu","A Multi-sentiment-resource Enhanced Attention Network for Sentiment - Classification",cs.CL," Deep learning approaches for sentiment classification do not fully exploit -sentiment linguistic knowledge. In this paper, we propose a -Multi-sentiment-resource Enhanced Attention Network (MEAN) to alleviate the -problem by integrating three kinds of sentiment linguistic knowledge (e.g., -sentiment lexicon, negation words, intensity words) into the deep neural -network via attention mechanisms. By using various types of sentiment -resources, MEAN utilizes sentiment-relevant information from different -representation subspaces, which makes it more effective to capture the overall -semantics of the sentiment, negation and intensity words for sentiment -prediction. The experimental results demonstrate that MEAN has robust -superiority over strong competitors. -" -8236,1807.05013,"Christophe Cerisara (SYNALP), Somayeh Jafaritazehjani, Adedayo - Oluokun, Hoa Le (SYNALP)",Multi-task dialog act and sentiment recognition on Mastodon,cs.CL," Because of license restrictions, it often becomes impossible to strictly -reproduce most research results on Twitter data already a few months after the -creation of the corpus. 
This situation worsens gradually as time passes and
-tweets become inaccessible. This is a critical issue for reproducible and
-accountable research on social media. We partly solve this challenge by
-annotating a new Twitter-like corpus from an alternative large social medium
-with licenses that are compatible with reproducible experiments: Mastodon. We
-manually annotate both dialogues and sentiments on this corpus, and train a
-multi-task hierarchical recurrent network on joint sentiment and dialog act
-recognition. We experimentally demonstrate that transfer learning may be
-efficiently achieved between both tasks, and further analyze some specific
-correlations between sentiments and dialogues on social media. Both the
-annotated corpus and deep network are released with an open-source license.
-"
-8237,1807.05127,"Shikhar Murty*, Patrick Verga*, Luke Vilnis, Irena Radovanovic, Andrew
- McCallum","Hierarchical Losses and New Resources for Fine-grained Entity Typing and
- Linking",cs.CL," Extraction from raw text to a knowledge base of entities and fine-grained
-types is often cast as prediction into a flat set of entity and type labels,
-neglecting the rich hierarchies over types and entities contained in curated
-ontologies. Previous attempts to incorporate hierarchical structure have
-yielded little benefit and are restricted to shallow ontologies. This paper
-presents new methods using real and complex bilinear mappings for integrating
-hierarchical information, yielding substantial improvement over flat
-predictions in entity linking and fine-grained entity typing, and achieving new
-state-of-the-art results for end-to-end models on the benchmark FIGER dataset.
-We also present two new human-annotated datasets containing wide and deep
-hierarchies which we will release to the community to encourage further
-research in this direction: MedMentions, a collection of PubMed abstracts in
-which 246k mentions have been mapped to the massive UMLS ontology; and TypeNet,
-which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k
-entity types. In experiments on all three datasets we show substantial gains
-from hierarchy-aware training.
-"
-8238,1807.05151,Gregor Wiedemann and Seid Muhie Yimam and Chris Biemann,"New/s/leak 2.0 - Multilingual Information Extraction and Visualization
- for Investigative Journalism",cs.CL cs.IR," Investigative journalism in recent years is confronted with two major
-challenges: 1) vast amounts of unstructured data originating from large text
-collections such as leaks or answers to Freedom of Information requests, and 2)
-multi-lingual data due to intensified global cooperation and communication in
-politics, business and civil society. Faced with these challenges, journalists
-are increasingly cooperating in international networks. To support such
-collaborations, we present new/s/leak 2.0, the new version of our open-source
-software for content-based searching of leaks. It includes three novel main
-features: 1) automatic language detection and language-dependent information
-extraction for 40 languages, 2) entity and keyword visualization for efficient
-exploration, and 3) decentralized deployment for analysis of confidential data from
-various formats. We illustrate the new analysis capabilities with an exemplary
-case study.
-" -8239,1807.05154,"Hongxiao Bai, Hai Zhao",Deep Enhanced Representation for Implicit Discourse Relation Recognition,cs.CL cs.AI cs.LG," Implicit discourse relation recognition is a challenging task as the relation -prediction without explicit connectives in discourse parsing needs -understanding of text spans and cannot be easily derived from surface features -from the input sentence pairs. Thus, properly representing the text is very -crucial to this task. In this paper, we propose a model augmented with -different grained text representations, including character, subword, word, -sentence, and sentence pair levels. The proposed deeper model is evaluated on -the benchmark treebank and achieves state-of-the-art accuracy with greater than -48% in 11-way and $F_1$ score greater than 50% in 4-way classifications for the -first time according to our best knowledge. -" -8240,1807.05195,"Daniel Grie{\ss}haber, Ngoc Thang Vu, and Johannes Maucher",Low-Resource Text Classification using Domain-Adversarial Learning,cs.CL," Deep learning techniques have recently shown to be successful in many natural -language processing tasks forming state-of-the-art systems. They require, -however, a large amount of annotated data which is often missing. This paper -explores the use of domain-adversarial learning as a regularizer to avoid -overfitting when training domain invariant features for deep, complex neural -networks in low-resource and zero-resource settings in new target domains or -languages. In case of new languages, we show that monolingual word vectors can -be directly used for training without prealignment. Their projection into a -common space can be learnt ad-hoc at training time reaching the final -performance of pretrained multilingual word vectors. -" -8241,1807.05206,Abdulkareem Alsudais,"Image Classification for Arabic: Assessing the Accuracy of Direct - English to Arabic Translations",cs.CV cs.CL," Image classification is an ongoing research challenge. Most of the available -research focuses on image classification for the English language, however -there is very little research on image classification for the Arabic language. -Expanding image classification to Arabic has several applications. The present -study investigated a method for generating Arabic labels for images of objects. -The method used in this study involved a direct English to Arabic translation -of the labels that are currently available on ImageNet, a database commonly -used in image classification research. The purpose of this study was to test -the accuracy of this method. In this study, 2,887 labeled images were randomly -selected from ImageNet. All of the labels were translated from English to -Arabic using Google Translate. The accuracy of the translations was evaluated. -Results indicated that that 65.6% of the Arabic labels were accurate. This -study makes three important contributions to the image classification -literature: (1) it determined the baseline level of accuracy for algorithms -that provide Arabic labels for images, (2) it provided 1,895 images that are -tagged with accurate Arabic labels, and (3) provided the accuracy of -translations of image labels from English to Arabic. -" -8242,1807.05324,"Heng Ding, Krisztian Balog",Generating Synthetic Data for Neural Keyword-to-Question Models,cs.IR cs.CL," Search typically relies on keyword queries, but these are often semantically -ambiguous. 
We propose to overcome this by offering users natural language
-questions, based on their keyword queries, to disambiguate their intent. This
-keyword-to-question task may be addressed using neural machine translation
-techniques. Neural translation models, however, require massive amounts of
-training data (keyword-question pairs), which is unavailable for this task. The
-main idea of this paper is to generate large amounts of synthetic training data
-from a small seed set of hand-labeled keyword-question pairs. Since natural
-language questions are available in large quantities, we develop models to
-automatically generate the corresponding keyword queries. Further, we introduce
-various filtering mechanisms to ensure that synthetic training data is of high
-quality. We demonstrate the feasibility of our approach using both automatic
-and manual evaluation. This is an extended version of the article published
-with the same title in the Proceedings of ICTIR'18.
-"
-8243,1807.05353,"Raj Dabre, Atsushi Fujita","Recurrent Stacking of Layers for Compact Neural Machine Translation
- Models",cs.CL," In neural machine translation (NMT), the most common practice is to stack a
-number of recurrent or feed-forward layers in the encoder and the decoder. As a
-result, the addition of each new layer improves the translation quality
-significantly. However, this also leads to a significant increase in the number
-of parameters. In this paper, we propose to share parameters across all the
-layers, thereby leading to a recurrently stacked NMT model. We empirically show
-that the translation quality of a model that recurrently stacks a single layer
-6 times is comparable to the translation quality of a model that stacks 6
-separate layers. We also show that using pseudo-parallel corpora by
-back-translation leads to further significant improvements in translation
-quality.
-"
-8244,1807.05518,"Jacob Krantz, Maxwell Dulin, Paul De Palma, Mark VanDam",Syllabification by Phone Categorization,cs.CL," Syllables play an important role in speech synthesis, speech recognition, and
-spoken document retrieval. A novel, low-cost, and language-agnostic approach to
-dividing words into their corresponding syllables is presented. A hybrid
-genetic algorithm constructs a categorization of phones optimized for
-syllabification. This categorization is used on top of a hidden Markov model
-sequence classifier to find syllable boundaries. The technique shows promising
-preliminary results when trained and tested on English words.
-"
-8245,1807.05519,Yukun Ma and Erik Cambria,Concept-Based Embeddings for Natural Language Processing,cs.CL," In this work, we focus on effectively leveraging and integrating information
-from concept-level as well as word-level via projecting concepts and words into
-a lower dimensional space while retaining most critical semantics. In a broad
-context of opinion understanding system, we investigate the use of the fused
-embedding for several core NLP tasks: named entity detection and
-classification, automatic speech recognition reranking, and targeted sentiment
-analysis.
-"
-8246,1807.05574,"Vuong M. Ngo, Tru H. Cao and Tuan M. V. Le","WordNet-Based Information Retrieval Using Common Hypernyms and Combined
- Features",cs.CL cs.IR," Text search based on lexical matching of keywords is not satisfactory due to
-polysemous and synonymous words. Semantic search that exploits word meanings,
-in general, improves search performance.
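The recurrent-stacking entry above (1807.05353) shares one layer's parameters
across all stacking steps. A minimal PyTorch sketch of that idea (the
hyperparameters and class name are placeholders, not the authors' NMT setup):

    import torch
    import torch.nn as nn

    class RecurrentlyStackedEncoder(nn.Module):
        """A single encoder layer whose weights are reused at every step."""
        def __init__(self, d_model=512, nhead=8, num_steps=6):
            super().__init__()
            self.layer = nn.TransformerEncoderLayer(d_model, nhead,
                                                    batch_first=True)
            self.num_steps = num_steps

        def forward(self, x):
            for _ in range(self.num_steps):  # same parameters, applied 6 times
                x = self.layer(x)
            return x

    enc = RecurrentlyStackedEncoder()
    out = enc(torch.randn(2, 7, 512))  # (batch, sequence, d_model)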
In this paper, we survey WordNet-based
-information retrieval systems, which employ a word sense disambiguation method
-to process queries and documents. The problem is that in many cases a word has
-more than one possible direct sense, and picking only one of them may give a
-wrong sense for the word. Moreover, the previous systems use only word forms to
-represent word senses and their hypernyms. We propose a novel approach that
-uses the most specific common hypernym of the remaining undisambiguated
-multi-senses of a word, as well as combined WordNet features to represent word
-meanings. Experiments on a benchmark dataset show that, in terms of the MAP
-measure, our search engine is 17.7% better than the lexical search, and at
-least 9.4% better than all surveyed search systems using WordNet.
- Keywords: Ontology, word sense disambiguation, semantic annotation, semantic
-search.
-"
-8247,1807.05642,"Peter Ahrens, John Feser, Robin Hui",LATE Ain'T Earley: A Faster Parallel Earley Parser,cs.CL cs.DC," We present the LATE algorithm, an asynchronous variant of the Earley
-algorithm for parsing context-free grammars. The Earley algorithm is naturally
-task-based, but is difficult to parallelize because of dependencies between the
-tasks. The LATE algorithm uses additional data structures to
-maintain information about the state of the parse so that work items may be
-processed in any order. This property allows the LATE algorithm to be sped up
-using task parallelism. We show that the LATE algorithm can achieve a 120x
-speedup over the Earley algorithm on a natural language task.
-"
-8248,1807.05797,"Pilar Leon-Arauz, Antonio San Martin and Arianne Reimerink",The EcoLexicon English Corpus as an open corpus in Sketch Engine,cs.CL," The EcoLexicon English Corpus (EEC) is a 23.1-million-word corpus of
-contemporary environmental texts. It was compiled by the LexiCon research group
-for the development of EcoLexicon (Faber, Leon-Arauz & Reimerink 2016; San
-Martin et al. 2017), a terminological knowledge base on the environment. It is
-available as an open corpus in the well-known corpus query system Sketch Engine
-(Kilgarriff et al. 2014), which means that any user, even without a
-subscription, can freely access and query the corpus. In this paper, the EEC is
-introduced by describing how it was built and compiled and how it can be
-queried and exploited, based both on the functionalities provided by Sketch
-Engine and on the parameters by which the texts in the EEC are classified.
-"
-8249,1807.05849,"Junxin Liu, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, Xing Xie",Neural Chinese Word Segmentation with Dictionary Knowledge,cs.CL cs.LG stat.ML," Chinese word segmentation (CWS) is an important task for Chinese NLP.
-Recently, many neural network based methods have been proposed for CWS.
-However, these methods require a large number of labeled sentences for model
-training, and usually cannot utilize the useful information in Chinese
-dictionaries. In this paper, we propose two methods to exploit the dictionary
-information for CWS. The first one is based on pseudo labeled data generation,
-and the second one is based on multi-task learning. The experimental results on
-two benchmark datasets validate that our approach can effectively improve the
-performance of Chinese word segmentation, especially when training data is
-insufficient.
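The WordNet entry above (1807.05574) represents an undisambiguated word by the
most specific common hypernym of its remaining senses. A rough NLTK sketch of
that idea (my reading of the abstract, not the authors' system; it requires
nltk.download("wordnet")):

    from nltk.corpus import wordnet as wn

    def most_specific_common_hypernym(word1, word2):
        """Deepest shared ancestor over all sense pairs of the two words."""
        best, best_depth = None, -1
        for s1 in wn.synsets(word1):
            for s2 in wn.synsets(word2):
                for h in s1.lowest_common_hypernyms(s2):
                    if h.max_depth() > best_depth:
                        best, best_depth = h, h.max_depth()
        return best

    print(most_specific_common_hypernym("car", "bicycle"))
    # e.g. Synset('wheeled_vehicle.n.01')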
-" -8250,1807.05855,"Hosung Park, Donghyun Lee, Minkyu Lim, Yoseb Kang, Juneseok Oh and - Ji-Hwan Kim","A Fast-Converged Acoustic Modeling for Korean Speech Recognition: A - Preliminary Study on Time Delay Neural Network",cs.CL cs.SD eess.AS," In this paper, a time delay neural network (TDNN) based acoustic model is -proposed to implement a fast-converged acoustic modeling for Korean speech -recognition. The TDNN has an advantage in fast-convergence where the amount of -training data is limited, due to subsampling which excludes duplicated weights. -The TDNN showed an absolute improvement of 2.12% in terms of character error -rate compared to feed forward neural network (FFNN) based modelling for Korean -speech corpora. The proposed model converged 1.67 times faster than a -FFNN-based model did. -" -8251,1807.05962,"Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann, - John R. Talburt","Theme-weighted Ranking of Keywords from Text Documents using Phrase - Embeddings",cs.CL," Keyword extraction is a fundamental task in natural language processing that -facilitates mapping of documents to a concise set of representative single and -multi-word phrases. Keywords from text documents are primarily extracted using -supervised and unsupervised approaches. In this paper, we present an -unsupervised technique that uses a combination of theme-weighted personalized -PageRank algorithm and neural phrase embeddings for extracting and ranking -keywords. We also introduce an efficient way of processing text documents and -training phrase embeddings using existing techniques. We share an evaluation -dataset derived from an existing dataset that is used for choosing the -underlying embedding model. The evaluations for ranked keyword extraction are -performed on two benchmark datasets comprising of short abstracts (Inspec), and -long scientific papers (SemEval 2010), and is shown to produce results better -than the state-of-the-art systems. -" -8252,1807.06008,Kittipitch Kuptavanich,Using Textual Summaries to Describe a Set of Products,cs.CL," When customers are faced with the task of making a purchase in an unfamiliar -product domain, it might be useful to provide them with an overview of the -product set to help them understand what they can expect. In this paper we -present and evaluate a method to summarise sets of products in natural -language, focusing on the price range, common product features across the set, -and product features that impact on price. In our study, participants reported -that they found our summaries useful, but we found no evidence that the -summaries influenced the selections made by participants. -" -8253,1807.06107,"Mansurul Bhuiyan, Amita Misra, Saurabh Tripathy, Jalal Mahmud, Rama - Akkiraju","Don't get Lost in Negation: An Effective Negation Handled Dialogue Acts - Prediction Algorithm for Twitter Customer Service Conversations",cs.CL cs.AI," In the last several years, Twitter is being adopted by the companies as an -alternative platform to interact with the customers to address their concerns. -With the abundance of such unconventional conversation resources, push for -developing effective virtual agents is more than ever. To address this -challenge, a better understanding of such customer service conversations is -required. Lately, there have been several works proposing a novel taxonomy for -fine-grained dialogue acts as well as develop algorithms for automatic -detection of these acts. 
The outcomes of these works are providing stepping
-stones for the ultimate goal of building efficient and effective virtual
-agents. But none of these works considers incorporating negation handling into
-the proposed algorithms. In this work, we developed an SVM-based dialogue acts
-prediction algorithm for Twitter customer service conversations where negation
-handling is an integral part of the end-to-end solution. For negation handling,
-we propose several efficient heuristics as well as adopt recent state-of-the-art
-third-party machine learning based solutions. Empirically, we show the model's
-performance gain when handling negation compared to when we do not. Our
-experiments show that for informal text such as tweets, the heuristic-based
-approach is more effective.
-"
-8254,1807.06151,"Nishant Nikhil, Ramit Pahwa, Mehul Kumar Nirala and Rohan Khilnani",LSTMs with Attention for Aggression Detection,cs.CL," In this paper, we describe the system submitted for the shared task on
-Aggression Identification in Facebook posts and comments by the team Nishnik.
-Previous works demonstrate that LSTMs have achieved remarkable performance in
-natural language processing tasks. We deploy an LSTM model with an attention
-unit over it. Our system ranks 6th and 4th in the Hindi subtask for Facebook
-comments and the subtask for generalized social media data, respectively. It
-ranks 17th and 10th in the corresponding English subtasks.
-"
-8255,1807.06204,"Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal,
- Najim Dehak, Sanjeev Khudanpur",Low-Resource Contextual Topic Identification on Speech,cs.CL," In topic identification (topic ID) on real-world unstructured audio, an audio
-instance of variable topic shifts is first broken into sequential segments, and
-each segment is independently classified. We first present a general purpose
-method for topic ID on spoken segments in low-resource languages, using a
-cascade of universal acoustic modeling, translation lexicons to English, and
-English-language topic classification. Next, instead of classifying each
-segment independently, we demonstrate that exploring the contextual
-dependencies across sequential segments can provide large improvements. In
-particular, we propose an attention-based contextual model which is able to
-leverage the contexts in a selective manner. We test both our contextual and
-non-contextual models on four LORELEI languages, and on all but one, our
-attention-based contextual model significantly outperforms the
-context-independent models.
-"
-8256,1807.06234,"Kalpesh Krishna, Shubham Toshniwal, Karen Livescu",Hierarchical Multitask Learning for CTC-based Speech Recognition,cs.CL," Previous work has shown that neural encoder-decoder speech recognition can be
-improved with hierarchical multitask learning, where auxiliary tasks are added
-at intermediate layers of a deep encoder. We explore the effect of hierarchical
-multitask learning in the context of connectionist temporal classification
-(CTC)-based speech recognition, and investigate several aspects of this
-approach. Consistent with previous work, we observe performance improvements on
-telephone conversational speech recognition (specifically the Eval2000 test
-sets) when training a subword-level CTC model with an auxiliary phone loss at
-an intermediate layer.
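The hierarchical-multitask entry above (1807.06234) adds an auxiliary phone CTC
loss at an intermediate encoder layer. A PyTorch sketch of the combined loss
(the interpolation constant and all names are my assumptions, not the paper's
tuned values):

    import torch.nn as nn

    ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def hierarchical_ctc_loss(subword_log_probs, phone_log_probs,
                              subword_targets, phone_targets,
                              input_lengths, subword_lengths, phone_lengths,
                              aux_weight=0.3):
        """Subword CTC at the top of the encoder plus an auxiliary phone CTC
        taken from an intermediate layer. *_log_probs are (T, N, C)
        log-softmax outputs, as nn.CTCLoss expects."""
        main = ctc(subword_log_probs, subword_targets,
                   input_lengths, subword_lengths)
        aux = ctc(phone_log_probs, phone_targets,
                  input_lengths, phone_lengths)
        return main + aux_weight * aux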
We analyze the effects of a number of experimental
-variables (such as the interpolation constant and the position of the auxiliary loss
-function), performance in lower-resource settings, and the relationship between
-pretraining and multitask learning. We observe that the hierarchical multitask
-approach improves over standard multitask training in our higher-data
-experiments, while in the low-resource settings standard multitask training
-works well. The best results are obtained by combining hierarchical multitask
-learning and pretraining, which improves word error rates by 3.4% absolute on
-the Eval2000 test sets.
-"
-8257,1807.06270,"Animesh Prasad, Herv\'e D\'ejean, Jean-Luc Meunier, Max Weidemann,
- Johannes Michael, Gundram Leifert","Bench-Marking Information Extraction in Semi-Structured Historical
- Handwritten Records",cs.CV cs.CL," In this report, we present our findings from benchmarking experiments for
-information extraction on the historical handwritten marriage records Esposalles
-from the IEHHR - ICDAR 2017 robust reading competition. The information extraction
-is modeled as semantic labeling of the sequence across two sets of labels. This
-can be achieved by sequentially or jointly applying handwritten text
-recognition (HTR) and named entity recognition (NER). We deploy a pipeline
-approach where first we use state-of-the-art HTR and use its output as input
-for NER. We show that, given the low-resource setup and the simple structure of the
-records, high HTR performance ensures overall high performance. We explore
-the various configurations of conditional random fields and neural networks to
-benchmark NER given noisy input. The best model on 10-fold
-cross-validation as well as blind test data uses n-gram features with
-bidirectional long short-term memory.
-"
-8258,1807.06414,"Mehdi Ben Lazreg, Morten Goodwin","Combining a Context Aware Neural Network with a Denoising Autoencoder
- for Measuring String Similarities",cs.IR cs.AI cs.CL cs.LG," Measuring similarities between strings is central for many established and
-fast growing research areas including information retrieval, biology, and
-natural language processing. The traditional approach for string similarity
-measurements is to define a metric over a word space that quantifies and sums
-up the differences between characters in two strings. The state-of-the-art in
-the area has, surprisingly, not evolved much during the last few decades. The
-majority of the metrics are based on a simple comparison between character and
-character distributions without consideration for the context of the words.
-This paper proposes a string metric that encompasses similarities between
-strings based on (1) the character similarities between the words, including
-non-standard and standard spellings of the same words, and (2) the context of
-the words. Our proposal is a neural network composed of a denoising autoencoder
-and what we call a context encoder specifically designed to find similarities
-between the words based on their context. The experimental results show that
-the resulting metric succeeds in 85.4\% of the cases in finding the correct
-version of a non-standard spelling among the closest words, compared to 63.2\%
-with the established Normalised-Levenshtein distance. Besides, we show that,
-with our approach, words used in similar contexts are calculated to be more
-similar than words used in different contexts, which is a desirable property
-missing in established string metrics.
-" -8259,1807.06441,"Jan Vanek, Josef Michalek, Jan Zelinka, Josef Psutka","A Comparison of Adaptation Techniques and Recurrent Neural Network - Architectures",eess.AS cs.CL cs.SD," Recently, recurrent neural networks have become state-of-the-art in acoustic -modeling for automatic speech recognition. The long short-term memory (LSTM) -units are the most popular ones. However, alternative units like gated -recurrent unit (GRU) and its modifications outperformed LSTM in some -publications. In this paper, we compared five neural network (NN) architectures -with various adaptation and feature normalization techniques. We have evaluated -feature-space maximum likelihood linear regression, five variants of i-vector -adaptation and two variants of cepstral mean normalization. The most adaptation -and normalization techniques were developed for feed-forward NNs and, according -to results in this paper, not all of them worked also with RNNs. For -experiments, we have chosen a well known and available TIMIT phone recognition -task. The phone recognition is much more sensitive to the quality of AM than -large vocabulary task with a complex language model. Also, we published the -open-source scripts to easily replicate the results and to help continue the -development. -" -8260,1807.06500,"Jiyuan Zhang, Dong Wang",Chinese Poetry Generation with Flexible Styles,cs.CL," Research has shown that sequence-to-sequence neural models, particularly -those with the attention mechanism, can successfully generate classical Chinese -poems. However, neural models are not capable of generating poems that match -specific styles, such as the impulsive style of Li Bai, a famous poet in the -Tang Dynasty. This work proposes a memory-augmented neural model to enable the -generation of style-specific poetry. The key idea is a memory structure that -stores how poems with a desired style were generated by humans, and uses -similar fragments to adjust the generation. We demonstrate that the proposed -algorithm generates poems with flexible styles, including styles of a -particular era and an individual poet. -" -8261,1807.06517,"Osman Ramadan, Pawe{\l} Budzianowski, Milica Ga\v{s}i\'c",Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing,cs.CL," Robust dialogue belief tracking is a key component in maintaining good -quality dialogue systems. The tasks that dialogue systems are trying to solve -are becoming increasingly complex, requiring scalability to multi domain, -semantically rich dialogues. However, most current approaches have difficulty -scaling up with domains because of the dependency of the model parameters on -the dialogue ontology. In this paper, a novel approach is introduced that fully -utilizes semantic similarity between dialogue utterances and the ontology -terms, allowing the information to be shared across domains. The evaluation is -performed on a recently collected multi-domain dialogues dataset, one order of -magnitude larger than currently available corpora. Our model demonstrates great -capability in handling multi-domain dialogues, simultaneously outperforming -existing state-of-the-art models in single-domain dialogue tracking tasks. -" -8262,1807.06538,Tomohiko Konno and Michiaki Iwazume,"Cavity Filling: Pseudo-Feature Generation for Multi-Class Imbalanced - Data Problems in Deep Learning",cs.LG cs.AI cs.CL cs.CV stat.ML," Herein, we generate pseudo-features based on the multivariate probability -distributions obtained from the feature maps in layers of trained deep neural -networks. 
Further, we augment the minority-class data based on these generated
-pseudo-features to overcome the imbalanced data problems. The proposed method,
-i.e., cavity filling, improves deep learning capabilities in several
-problems, because real-world data are generally observed to be imbalanced.
-"
-8263,1807.06557,"Michelle Lam, Catherina Xu, Angela Kong, Vinodkumar Prabhakaran",Power Networks: A Novel Neural Architecture to Predict Power Relations,cs.CL," Can language analysis reveal the underlying social power relations that exist
-between participants of an interaction? Prior work within NLP has shown promise
-in this area, but the performance of automatically predicting power relations
-using NLP analysis of social interactions remains wanting. In this paper, we
-present a novel neural architecture that captures manifestations of power
-within individual emails which are then aggregated in an order-preserving way
-in order to infer the direction of power between pairs of participants in an
-email thread. We obtain an accuracy of 80.4%, a 10.1% improvement over
-state-of-the-art methods, in this task. We further apply our model to the task
-of predicting power relations between individuals based on the entire set of
-messages exchanged between them; here also, our model significantly outperforms
-the 70.0% accuracy achieved by prior state-of-the-art techniques, obtaining an
-accuracy of 83.0%.
-"
-8264,1807.06588,"Charlie Kingston and Jason R. C. Nurse and Ioannis Agrafiotis and
- Andrew Milich","Using semantic clustering to support situation awareness on Twitter: The
- case of World Views",cs.CL," In recent years, situation awareness has been recognised as a critical part
-of effective decision making, in particular for crisis management. One way to
-extract value and allow for better situation awareness is to develop a system
-capable of analysing a dataset of multiple posts, and clustering consistent
-posts into different views or stories (or, world views). However, this can be
-challenging as it requires an understanding of the data, including determining
-what is consistent data, and what data corroborates other data. Attempting to
-address these problems, this article proposes Subject-Verb-Object Semantic
-Suffix Tree Clustering (SVOSSTC) and a system to support it, with a special
-focus on Twitter content. The novelty and value of SVOSSTC is its emphasis on
-utilising the Subject-Verb-Object (SVO) typology in order to construct
-semantically consistent world views, in which individuals---particularly those
-involved in crisis response---might achieve an enhanced picture of a situation
-from social media data. To evaluate our system and its ability to provide
-enhanced situation awareness, we tested it against existing approaches,
-including human data analysis, using a variety of real-world scenarios. The
-results indicated a noteworthy degree of evidence (e.g., in cluster granularity
-and meaningfulness) to affirm the suitability and rigour of our approach.
-Moreover, these results highlight this article's proposals as innovative and
-practical system contributions to the research field.
-"
-8265,1807.06610,"Davis Liang, Zhiheng Huang, Zachary C. Lipton",Learning Noise-Invariant Representations for Robust Speech Recognition,eess.AS cs.CL cs.LG cs.SD," Despite rapid advances in speech recognition, current models remain brittle
-to superficial perturbations to their inputs. Small amounts of noise can
-destroy the performance of an otherwise state-of-the-art model.
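The cavity-filling entry above (1807.06538) samples pseudo-features from
multivariate distributions fitted to minority-class feature maps. Read as a
Gaussian fit, a minimal numpy sketch (my reading of the abstract, not the
authors' exact procedure):

    import numpy as np

    def generate_pseudo_features(minority_feats, n_new):
        """Fit a multivariate Gaussian to minority-class feature vectors
        (e.g. activations from some layer) and sample synthetic ones."""
        mean = minority_feats.mean(axis=0)
        cov = np.cov(minority_feats, rowvar=False)
        return np.random.multivariate_normal(mean, cov, size=n_new)

    feats = np.random.randn(40, 16)   # 40 minority examples, 16-dim features
    augmented = np.vstack([feats, generate_pseudo_features(feats, 200)])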
-models against background noise, practitioners often perform data augmentation,
-adding artificially-noised examples to the training set, carrying over the
-original label. In this paper, we hypothesize that a clean example and its
-superficially perturbed counterparts shouldn't merely map to the same class ---
-they should map to the same representation. We propose
-invariant-representation-learning (IRL): At each training iteration, for each
-training example, we sample a noisy counterpart. We then apply a penalty term
-to coerce matched representations at each layer (above some chosen layer). Our
-key results, demonstrated on the Librispeech dataset, are the following: (i)
-IRL significantly reduces character error rates (CER) on both 'clean' (3.3% vs
-6.5%) and 'other' (11.0% vs 18.1%) test sets; (ii) on several out-of-domain
-noise settings (different from those seen during training), IRL's benefits are
-even more pronounced. Careful ablations confirm that our results are not simply
-due to shrinking activations at the chosen layers.
-"
-8266,1807.06638,"Himanshu Sharma, Chengsheng Mao, Yizhen Zhang, Haleh Vatani, Liang
- Yao, Yizhen Zhong, Luke Rasmussen, Guoqian Jiang, Jyotishman Pathak and Yuan
- Luo","Developing a Portable Natural Language Processing Based Phenotyping
- System",cs.CL cs.IR," This paper presents a portable phenotyping system that is capable of
-integrating both rule-based and statistical machine learning based approaches.
-Our system utilizes UMLS to extract clinically relevant features from the
-unstructured text and then facilitates portability across different
-institutions and data systems by incorporating OHDSI's OMOP Common Data Model
-(CDM) to standardize necessary data elements. Our system can also store the key
-components of rule-based systems (e.g., regular expression matches) in the
-format of OMOP CDM, thus enabling the reuse, adaptation and extension of many
-existing rule-based clinical NLP systems. We experimented with our system on
-the corpus from i2b2's Obesity Challenge as a pilot study. Our system
-facilitates portable phenotyping of obesity and its 15 comorbidities based on
-the unstructured patient discharge summaries, while achieving a performance
-that often ranked among the top 10 of the challenge participants. This
-standardization enables a consistent application of numerous rule-based and
-machine learning based classification techniques downstream.
-"
-8267,1807.06683,"Onur G\""ung\""or, Suzan \""Usk\""udarl{\i}, Tunga G\""ung\""or","Improving Named Entity Recognition by Jointly Learning to Disambiguate
- Morphological Tags",cs.CL," Previous studies have shown that linguistic features of a word such as
-possession, genitive or other grammatical cases can be employed in word
-representations of a named entity recognition (NER) tagger to improve the
-performance for morphologically rich languages. However, these taggers require
-external morphological disambiguation (MD) tools to function, which are hard to
-obtain or non-existent for many languages. In this work, we propose a model
-which alleviates the need for such disambiguators by jointly learning NER and
-MD taggers in languages for which one can provide a list of candidate
-morphological analyses. We show that this can be done independent of the
-morphological annotation schemes, which differ among languages. Our experiments
-employing three different model architectures that join these two tasks show
-that joint learning improves NER performance. Furthermore, the morphological
-disambiguator's performance is shown to be competitive.
-"
-8268,1807.06718,"Qi Wang, Jiahui Qiu, Yangming Zhou, Tong Ruan, Daqi Gao and Ju Gao","Automatic Severity Classification of Coronary Artery Disease via
- Recurrent Capsule Network",cs.CL," Coronary artery disease (CAD) is one of the leading causes of cardiovascular
-disease deaths. The CAD condition progresses rapidly and, if not diagnosed and
-treated at an early stage, may eventually lead to irreversible death of the
-heart muscle. Invasive coronary arteriography is the gold standard technique
-for CAD diagnosis. Coronary arteriography texts describe in detail which parts
-have stenosis and how severe that stenosis is, so it is crucial to conduct
-severity classification of CAD from them. In this paper, we employ a recurrent
-capsule network (RCN) to extract semantic relations between clinical named
-entities in Chinese coronary arteriography texts, through which we can
-automatically find the maximal stenosis for each lumen and infer how severe the
-CAD is according to the improved Gensini method. Experimental results on the
-corpus collected from Shanghai Shuguang Hospital show that our proposed method
-achieves an accuracy of 97.0\% in the severity classification of CAD.
-"
-8269,1807.06736,"Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai","Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech
- Synthesis",cs.CL cs.LG cs.SD eess.AS," This paper proposes a forward attention method for the sequence-to-sequence
-acoustic modeling of speech synthesis. This method is motivated by the nature
-of the monotonic alignment from phone sequences to acoustic sequences. Only the
-alignment paths that satisfy the monotonic condition are taken into
-consideration at each decoder timestep. The modified attention probabilities at
-each timestep are computed recursively using a forward algorithm. A transition
-agent for forward attention is further proposed, which helps the attention
-mechanism decide whether to move forward or stay at each decoder timestep.
-Experimental results show that the proposed forward attention method
-achieves faster convergence speed and higher stability than the baseline
-attention method. In addition, forward attention with a transition agent can
-also help improve the naturalness of synthetic speech and effectively control
-its speed.
-"
-8270,1807.06792,Shao-Yen Tseng and Brian Baucom and Panayiotis Georgiou,Unsupervised Online Multitask Learning of Behavioral Sentence Embeddings,cs.CL," Unsupervised learning has been an attractive method for easily deriving
-meaningful data representations from vast amounts of unlabeled data. These
-representations, or embeddings, often yield superior results in many tasks,
-whether used directly or as features in subsequent training stages. However,
-the quality of the embeddings is highly dependent on the assumed knowledge in
-the unlabeled data and how the system extracts information without supervision.
-Domain portability is also very limited in unsupervised learning, often
-requiring re-training on other in-domain corpora to achieve robustness. In this
-work we present a multitask paradigm for unsupervised contextual learning of
-behavioral interactions which addresses unsupervised domain adaptation. We
-introduce an online multitask objective into unsupervised learning and show
-that sentence embeddings generated through this process increase performance
-on affective tasks.
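-A minimal NumPy sketch of the forward-attention recursion described in
-1807.06736 above; the variable names, shapes, and per-step renormalization
-here are illustrative assumptions, not the authors' code:
-
-import numpy as np
-
-def forward_attention(energies):
-    # energies: (T_dec, T_enc) raw attention scores (assumed given).
-    # Softmax over encoder positions at each decoder step.
-    y = np.exp(energies - energies.max(axis=1, keepdims=True))
-    y /= y.sum(axis=1, keepdims=True)
-    T_dec, T_enc = energies.shape
-    alpha = np.zeros((T_dec, T_enc))
-    alpha[0] = y[0]  # first step: ordinary attention
-    for t in range(1, T_dec):
-        prev = alpha[t - 1]
-        # A monotonic path may stay at position n or advance from n-1.
-        shifted = np.concatenate(([0.0], prev[:-1]))
-        a = (prev + shifted) * y[t]
-        alpha[t] = a / a.sum()  # renormalize, as in a forward algorithm
-    return alpha
-
-weights = forward_attention(np.random.randn(5, 8))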
-" -8271,1807.06882,Tal Linzen and Brian Leonard,"Distinct patterns of syntactic agreement errors in recurrent networks - and humans",cs.CL," Determining the correct form of a verb in context requires an understanding -of the syntactic structure of the sentence. Recurrent neural networks have been -shown to perform this task with an error rate comparable to humans, despite the -fact that they are not designed with explicit syntactic representations. To -examine the extent to which the syntactic representations of these networks are -similar to those used by humans when processing sentences, we compare the -detailed pattern of errors that RNNs and humans make on this task. Despite -significant similarities (attraction errors, asymmetry between singular and -plural subjects), the error patterns differed in important ways. In particular, -in complex sentences with relative clauses error rates increased in RNNs but -decreased in humans. Furthermore, RNNs showed a cumulative effect of attractors -but humans did not. We conclude that at least in some respects the syntactic -representations acquired by RNNs are fundamentally different from those used by -humans. -" -8272,1807.06926,"Evandro Cunha, Gabriel Magno, Josemar Caetano, Douglas Teixeira, - Virgilio Almeida","Fake news as we feel it: perception and conceptualization of the term - ""fake news"" in the media",cs.CL cs.SI," In this article, we quantitatively analyze how the term ""fake news"" is being -shaped in news media in recent years. We study the perception and the -conceptualization of this term in the traditional media using eight years of -data collected from news outlets based in 20 countries. Our results not only -corroborate previous indications of a high increase in the usage of the -expression ""fake news"", but also show contextual changes around this expression -after the United States presidential election of 2016. Among other results, we -found changes in the related vocabulary, in the mentioned entities, in the -surrounding topics and in the contextual polarity around the term ""fake news"", -suggesting that this expression underwent a change in perception and -conceptualization after 2016. These outcomes expand the understandings on the -usage of the term ""fake news"", helping to comprehend and more accurately -characterize this relevant social phenomenon linked to misinformation and -manipulation. -" -8273,1807.06978,Sixun Ouyang and Aonghus Lawlor and Felipe Costa and Peter Dolog,Improving Explainable Recommendations with Synthetic Reviews,cs.IR cs.AI cs.CL," An important task for a recommender system to provide interpretable -explanations for the user. This is important for the credibility of the system. -Current interpretable recommender systems tend to focus on certain features -known to be important to the user and offer their explanations in a structured -form. It is well known that user generated reviews and feedback from reviewers -have strong leverage over the users' decisions. On the other hand, recent text -generation works have been shown to generate text of similar quality to human -written text, and we aim to show that generated text can be successfully used -to explain recommendations. - In this paper, we propose a framework consisting of popular review-oriented -generation models aiming to create personalised explanations for -recommendations. The interpretations are generated at both character and word -levels. We build a dataset containing reviewers' feedback from the Amazon books -review dataset. 
-natural language processing to the recommender system domain. Besides language
-model evaluation methods, we employ DeepCoNN, a novel review-oriented
-recommender system using a deep neural network, to evaluate the recommendation
-performance of generated reviews by root mean square error (RMSE). We
-demonstrate that the synthetic personalised reviews have better recommendation
-performance than human written reviews. To our knowledge, this presents the
-first machine-generated natural language explanations for rating prediction.
-"
-8274,1807.06998,"Filip Klubi\v{c}ka, Giancarlo D. Salton, John D. Kelleher",Is it worth it? Budget-related evaluation metrics for model selection,cs.CL," Creating a linguistic resource is often done by using a machine learning
-model that filters the content that goes through to a human annotator, before
-going into the final resource. However, budgets are often limited, and the
-amount of available data exceeds the amount of affordable annotation. In order
-to optimize the benefit from the invested human work, we argue that deciding on
-which model one should employ depends not only on generalized evaluation
-metrics such as F-score, but also on the gain metric. Because the model with
-the highest F-score may not necessarily have the best sequencing of predicted
-classes, this may lead to wasting funds on annotating false positives, yielding
-zero improvement of the linguistic resource. We exemplify our point with a case
-study, using real data from a task of building a verb-noun idiom dictionary. We
-show that, given the choice of three systems with varying F-scores, the system
-with the highest F-score does not yield the highest profits. In other words, in
-our case the cost-benefit trade-off is more favorable for a system with a lower
-F-score.
-"
-8275,1807.07104,Ramon Sanabria and Florian Metze,Hierarchical Multi Task Learning With CTC,cs.CL," In Automatic Speech Recognition it is still challenging to learn useful
-intermediate representations when using high-level (or abstract) target units
-such as words. For that reason, character or phoneme based systems tend to
-outperform word-based systems when just a few hundred hours of training data
-are used. In this paper, we first show how hierarchical multi-task
-training can encourage the formation of useful intermediate representations. We
-achieve this by performing Connectionist Temporal Classification at different
-levels of the network with targets of different granularity. Our model thus
-performs predictions at multiple scales for the same input. On the standard
-300h Switchboard training setup, our hierarchical multi-task architecture
-exhibits improvements over single-task architectures with the same number of
-parameters. Our model obtains 14.0% Word Error Rate on the Eval2000 Switchboard
-subset without any decoder or language model, outperforming the current
-state-of-the-art on acoustic-to-word models.
-"
-8276,1807.07108,Fabiano Ferreira Luz and Marcelo Finger,"Semantic Parsing: Syntactic assurance to target sentence using LSTM
- Encoder CFG-Decoder",cs.CL," Semantic parsing can be defined as the process of mapping natural language
-sentences into a machine interpretable, formal representation of its meaning.
-Semantic parsing using LSTM encoder-decoder neural networks has become a
-promising approach.
-However, this approach does not provide grammaticality guarantees for the
-generated sentences. Such a guarantee is particularly important in practical
-cases, where a database query can cause critical errors if the sentence is
-ungrammatical. In this work, we propose a neural architecture called Encoder
-CFG-Decoder, whose output conforms to a given context-free grammar. Results
-show the correctness of such an architecture, with benchmark accuracy levels
-better than those reported in the literature.
-"
-8277,1807.07147,Alexey Tikhonov and Ivan P. Yamshchikov,"Guess who? Multilingual approach for the automated generation of
- author-stylized poetry",cs.CL cs.AI cs.LG," This paper addresses the problem of stylized text generation in a
-multilingual setup. A version of a language model based on a long short-term
-memory (LSTM) artificial neural network with extended phonetic and semantic
-embeddings is used for stylized poetry generation. The quality of the resulting
-poems generated by the network is estimated through bilingual evaluation
-understudy (BLEU), a survey, and a new cross-entropy based metric proposed for
-problems of this type. The experiments show that the proposed model
-consistently outperforms random sample and vanilla-LSTM baselines; humans also
-tend to associate the machine generated texts with the target author.
-"
-8278,1807.07149,"Albert Parra and Andrew W. Haddad and Mireille Boutin and Edward J.
- Delp","A Hand-Held Multimedia Translation and Interpretation System with
- Application to Diet Management",cs.CL cs.MM stat.ML," We propose a network independent, hand-held system to translate and
-disambiguate foreign restaurant menu items in real-time. The system is based on
-the use of a portable multimedia device, such as a smartphone or a PDA. An
-accurate and fast translation is obtained using a Machine Translation engine
-and context-specific corpora to which we apply two pre-processing steps,
-called translation standardization and $n$-gram consolidation. The phrase-table
-generated is orders of magnitude lighter than the ones commonly used in market
-applications, thus making translations computationally less expensive, and
-decreasing the battery usage. Translation ambiguities are mitigated using
-multimedia information including images of dishes and ingredients, along with
-ingredient lists. We implemented a prototype of our system on an iPod Touch
-Second Generation for English speakers traveling in Spain. Our tests indicate
-that our translation method yields higher accuracy than translation engines
-such as Google Translate, and does so almost instantaneously. The memory
-requirements of the application, including the database of images, are also
-well within the limits of the device. By combining it with a database of
-nutritional information, our proposed system can be used to help individuals
-who follow a medical diet maintain this diet while traveling.
-"
-8279,1807.07186,"Yadollah Yaghoobzadeh, Katharina Kann and Hinrich Sch\""utze","Evaluating Word Embeddings in Multi-label Classification Using
- Fine-grained Name Typing",cs.CL cs.AI," Embedding models typically associate each word with a single real-valued
-vector, representing its different properties. Evaluation methods, therefore,
-need to analyze the accuracy and completeness of these properties in
-embeddings. This requires fine-grained analysis of embedding subspaces.
-Multi-label classification is an appropriate way to do so. We propose a new
-evaluation method for word embeddings based on multi-label classification given
-a word embedding. The task we use is fine-grained name typing: given a large
-corpus, find all types that a name can refer to based on the name embedding.
-Given the scale of entities in knowledge bases, we can build datasets for this
-task that are complementary to the current embedding evaluation datasets in
-that they are very large, contain fine-grained classes, and allow the direct
-evaluation of embeddings without confounding factors like sentence context.
-"
-8280,1807.07187,"Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Li Zhang, Xinyang Yi,
- Lichan Hong, Ed Chi, John Anderson",Efficient Training on Very Large Corpora via Gramian Estimation,stat.ML cs.CL cs.LG," We study the problem of learning similarity functions over very large corpora
-using neural network embedding models. These models are typically trained using
-SGD with sampling of random observed and unobserved pairs, with a number of
-samples that grows quadratically with the corpus size, making it expensive to
-scale to very large corpora. We propose new efficient methods to train these
-models without having to sample unobserved pairs. Inspired by matrix
-factorization, our approach relies on adding a global quadratic penalty to all
-pairs of examples and expressing this term as the matrix-inner-product of two
-generalized Gramians. We show that the gradient of this term can be efficiently
-computed by maintaining estimates of the Gramians, and develop variance
-reduction schemes to improve the quality of the estimates. We conduct
-large-scale experiments that show a significant improvement in training time
-and generalization quality compared to traditional sampling methods.
-"
-8281,1807.07255,"Can Xu, Wei Wu, Yu Wu","Towards Explainable and Controllable Open Domain Dialogue Generation
- with Dialogue Acts",cs.CL cs.AI cs.HC," We study open domain dialogue generation with dialogue acts designed to
-explain how people engage in social chat. To imitate human behavior, we propose
-managing the flow of human-machine interactions with the dialogue acts as
-policies. The policies and response generation are jointly learned from
-human-human conversations, and the former is further optimized with a
-reinforcement learning approach. With the dialogue acts, we achieve significant
-improvement over state-of-the-art methods on response quality for given
-contexts and dialogue length in both machine-machine simulation and
-human-machine conversation.
-"
-8282,1807.07279,"Lutfi Kerem Senel, Ihsan Utlu, Furkan \c{S}ahinu\c{c}, Haldun M.
- Ozaktas, Aykut Ko\c{c}","Imparting Interpretability to Word Embeddings while Preserving Semantic
- Structure",cs.CL," As a ubiquitous method in natural language processing, word embeddings are
-extensively employed to map semantic properties of words into a dense vector
-representation. They capture semantic and syntactic relations among words, but
-the vectors corresponding to the words are only meaningful relative to each
-other. Neither the vector nor its dimensions have any absolute, interpretable
-meaning. We introduce an additive modification to the objective function of the
-embedding learning algorithm that encourages the embedding vectors of words
-that are semantically related to a predefined concept to take larger values
-along a specified dimension, while leaving the original semantic learning
-mechanism mostly unaffected.
In other words, we align words that are already -determined to be related, along predefined concepts. Therefore, we impart -interpretability to the word embedding by assigning meaning to its vector -dimensions. The predefined concepts are derived from an external lexical -resource, which in this paper is chosen as Roget's Thesaurus. We observe that -alignment along the chosen concepts is not limited to words in the Thesaurus -and extends to other related words as well. We quantify the extent of -interpretability and assignment of meaning from our experimental results. -Manual human evaluation results have also been presented to further verify that -the proposed method increases interpretability. We also demonstrate the -preservation of semantic coherence of the resulting vector space by using -word-analogy and word-similarity tests. These tests show that the -interpretability-imparted word embeddings that are obtained by the proposed -framework do not sacrifice performances in common benchmark tests. -" -8283,1807.07281,"Wei Ping, Kainan Peng, Jitong Chen",ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech,cs.CL cs.AI cs.LG cs.SD eess.AS," In this work, we propose a new solution for parallel wave generation by -WaveNet. In contrast to parallel WaveNet (van den Oord et al., 2018), we -distill a Gaussian inverse autoregressive flow from the autoregressive WaveNet -by minimizing a regularized KL divergence between their highly-peaked output -distributions. Our method computes the KL divergence in closed-form, which -simplifies the training algorithm and provides very efficient distillation. In -addition, we introduce the first text-to-wave neural architecture for speech -synthesis, which is fully convolutional and enables fast end-to-end training -from scratch. It significantly outperforms the previous pipeline that connects -a text-to-spectrogram model to a separately trained WaveNet (Ping et al., -2018). We also successfully distill a parallel waveform synthesizer conditioned -on the hidden representation in this end-to-end model. -" -8284,1807.07351,"Adam Tsakalidis, Maria Liakata, Theo Damoulas, Alexandra I. Cristea","Can We Assess Mental Health through Social Media and Smart Devices? - Addressing Bias in Methodology and Evaluation",cs.CY cs.CL," Predicting mental health from smartphone and social media data on a -longitudinal basis has recently attracted great interest, with very promising -results being reported across many studies. Such approaches have the potential -to revolutionise mental health assessment, if their development and evaluation -follows a real world deployment setting. In this work we take a closer look at -state-of-the-art approaches, using different mental health datasets and -indicators, different feature sources and multiple simulations, in order to -assess their ability to generalise. We demonstrate that under a pragmatic -evaluation framework, none of the approaches deliver or even approach the -reported performances. In fact, we show that current state-of-the-art -approaches can barely outperform the most na\""ive baselines in the real-world -setting, posing serious questions not only about their deployment ability, but -also about the contribution of the derived features for the mental health -assessment task and how to make better use of such data in the future. 
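-A worked example of the closed-form KL divergence between two univariate
-Gaussians, the quantity that the distillation in 1807.07281 (ClariNet) above
-minimizes in regularized form; the identity itself is standard, but the
-function below is a sketch, not the paper's code:
-
-import math
-
-def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
-    # KL(q || p) for q = N(mu_q, sigma_q^2), p = N(mu_p, sigma_p^2).
-    return (math.log(sigma_p / sigma_q)
-            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
-            - 0.5)
-
-# Student predicts N(0.1, 0.9^2); teacher outputs N(0.0, 1.0^2).
-print(gaussian_kl(0.1, 0.9, 0.0, 1.0))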
-" -8285,1807.07425,"Liang Yao, Chengsheng Mao, Yuan Luo","Clinical Text Classification with Rule-based Features and - Knowledge-guided Convolutional Neural Networks",cs.CL," Clinical text classification is an important problem in medical natural -language processing. Existing studies have conventionally focused on rules or -knowledge sources-based feature engineering, but only a few have exploited -effective feature learning capability of deep learning methods. In this study, -we propose a novel approach which combines rule-based features and -knowledge-guided deep learning techniques for effective disease classification. -Critical Steps of our method include identifying trigger phrases, predicting -classes with very few examples using trigger phrases and training a -convolutional neural network with word embeddings and Unified Medical Language -System (UMLS) entity embeddings. We evaluated our method on the 2008 -Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. -The results show that our method outperforms the state of the art methods. -" -8286,1807.07517,"Priyanka Ranade, Sudip Mittal, Anupam Joshi, Karuna Joshi","Using Deep Neural Networks to Translate Multi-lingual Threat - Intelligence",cs.CL cs.CR," The multilingual nature of the Internet increases complications in the -cybersecurity community's ongoing efforts to strategically mine threat -intelligence from OSINT data on the web. OSINT sources such as social media, -blogs, and dark web vulnerability markets exist in diverse languages and hinder -security analysts, who are unable to draw conclusions from intelligence in -languages they don't understand. Although third party translation engines are -growing stronger, they are unsuited for private security environments. First, -sensitive intelligence is not a permitted input to third party engines due to -privacy and confidentiality policies. In addition, third party engines produce -generalized translations that tend to lack exclusive cybersecurity terminology. -In this paper, we address these issues and describe our system that enables -threat intelligence understanding across unfamiliar languages. We create a -neural network based system that takes in cybersecurity data in a different -language and outputs the respective English translation. The English -translation can then be understood by an analyst, and can also serve as input -to an AI based cyber-defense system that can take mitigative action. As a proof -of concept, we have created a pipeline which takes Russian threats and -generates its corresponding English, RDF, and vectorized representations. Our -network optimizes translations on specifically, cybersecurity data. -" -8287,1807.07520,"Grant P. Strimel, Kanthashree Mysore Sathyendra, Stanislav Peshterliev","Statistical Model Compression for Small-Footprint Natural Language - Understanding",cs.CL," In this paper we investigate statistical model compression applied to natural -language understanding (NLU) models. Small-footprint NLU models are important -for enabling offline systems on hardware restricted devices, and for decreasing -on-demand model loading latency in cloud-based systems. To compress NLU models, -we present two main techniques, parameter quantization and perfect feature -hashing. These techniques are complementary to existing model pruning -strategies such as L1 regularization. We performed experiments on a large scale -NLU system. 
-memory usage compared to the original models with minimal predictive
-performance impact.
-"
-8288,1807.07545,"Jo\~ao Loula, Marco Baroni, Brenden M. Lake","Rearranging the Familiar: Testing Compositional Generalization in
- Recurrent Networks",cs.CL cs.AI cs.LG," Systematic compositionality is the ability to recombine meaningful units with
-regular and predictable outcomes, and it's seen as key to humans' capacity for
-generalization in language. Recent work has studied systematic compositionality
-in modern seq2seq models using generalization to novel navigation instructions
-in a grounded environment as a probing tool, requiring models to quickly
-bootstrap the meaning of new words. We extend this framework here to settings
-where the model needs only to recombine well-trained functional words (such as
-""around"" and ""right"") in novel contexts. Our findings confirm and strengthen
-the earlier ones: seq2seq models can be impressively good at generalizing to
-novel combinations of previously-seen input, but only when they receive
-extensive training on the specific pattern to be generalized (e.g.,
-generalizing from many examples of ""X around right"" to ""jump around right""),
-while failing when generalization requires novel application of compositional
-rules (e.g., inferring the meaning of ""around right"" from those of ""right"" and
-""around"").
-"
-8289,1807.07741,"Luiza Sayfullina, Eric Malmi and Juho Kannala",Learning Representations for Soft Skill Matching,cs.CL cs.LG stat.ML," Employers actively look for talents having not only specific hard skills but
-also various soft skills. To analyze the soft skill demands on the job market,
-it is important to be able to detect soft skill phrases from job advertisements
-automatically. However, a naive matching of soft skill phrases can lead to
-false positive matches when a soft skill phrase, such as friendly, is used to
-describe a company, a team, or another entity, rather than a desired candidate.
- In this paper, we propose a phrase-matching-based approach which
-differentiates between soft skill phrases referring to a candidate vs.
-something else. The disambiguation is formulated as a binary text
-classification problem where the prediction is made for the potential soft
-skill based on the context where it occurs. To inform the model about the soft
-skill for which the prediction is made, we develop several approaches,
-including soft skill masking and soft skill tagging.
- We compare several neural network based approaches, including CNN, LSTM and
-Hierarchical Attention Model. The proposed tagging-based input representation
-using LSTM achieved the highest recall of 83.92% on the job dataset when fixing
-precision at 95%.
-"
-8290,1807.07752,Shaunak Joshi and Deepali Deshpande,Twitter Sentiment Analysis System,cs.CL cs.LG stat.ML," Social media is increasingly used by humans to express their feelings and
-opinions in the form of short text messages. Detecting sentiments in the text
-has a wide range of applications including identifying anxiety or depression of
-individuals and measuring well-being or mood of a community. Sentiments can be
-expressed in many ways, such as facial expressions and gestures, speech, and
-written text. Sentiment Analysis in text documents is essentially
-a content-based classification problem involving concepts from the domains of
-Natural Language Processing as well as Machine Learning. In this paper,
-sentiment recognition based on textual data and the techniques used in
-sentiment analysis are discussed.
-"
-8291,1807.07779,Vuong M. Ngo and Tru H. Cao,"A Generalized Vector Space Model for Ontology-Based Information
- Retrieval",cs.IR cs.CL cs.DB," Named entities (NE) are objects that are referred to by names such as people,
-organizations and locations. Named entities and keywords are important to the
-meaning of a document. We propose a generalized vector space model that
-combines named entities and keywords. In the model, we take into account
-different ontological features of named entities, namely, aliases, classes and
-identifiers. Moreover, we use entity classes to represent the latent
-information of interrogative words in Wh-queries, which are ignored in
-traditional keyword-based searching. We have implemented and tested the
-proposed model on a TREC dataset, as presented and discussed in the paper.
-"
-8292,1807.07828,"Jules Hedges (University of Oxford), Martha Lewis (ILLC, University of
- Amsterdam)",Towards Functorial Language-Games,cs.LO cs.CL," In categorical compositional semantics of natural language one studies
-functors from a category of grammatical derivations (such as a Lambek pregroup)
-to a semantic category (such as real vector spaces). We compositionally build
-game-theoretic semantics of sentences by taking the semantic category to be the
-category whose morphisms are open games. This requires some modifications to
-the grammar category to compensate for the failure of open games to form a
-compact closed category. We illustrate the theory using simple examples of
-Wittgenstein's language-games.
-"
-8293,1807.07961,"Yuxiao Chen, Jianbo Yuan, Quanzeng You, Jiebo Luo","Twitter Sentiment Analysis via Bi-sense Emoji Embedding and
- Attention-based LSTM",cs.CL cs.MM," Sentiment analysis on large-scale social media data is important to bridge
-the gaps between social media content and real-world activities, including
-political election prediction, individual and public emotional status
-monitoring and analysis, and so on. Although textual sentiment analysis has
-been well studied based on platforms such as Twitter and Instagram, analysis of
-the role of extensive emoji use in sentiment analysis remains limited. In this
-paper, we propose a novel scheme for Twitter sentiment analysis with extra
-attention on emojis. We first learn bi-sense emoji embeddings under positive
-and negative sentimental tweets individually, and then train a sentiment
-classifier by attending on these bi-sense emoji embeddings with an
-attention-based long short-term memory network (LSTM). Our experiments show
-that the bi-sense embedding is effective for extracting sentiment-aware
-embeddings of emojis and outperforms the state-of-the-art models. We also
-visualize the attentions to show that the bi-sense emoji embedding provides
-better guidance on the attention mechanism to obtain a more robust
-understanding of the semantics and sentiments.
-"
-8294,1807.07964,"Minjeong Kim, David Keetae Park, Hyungjong Noh, Yeonsoo Lee and Jaegul
- Choo",Question-Aware Sentence Gating Networks for Question and Answering,cs.CL cs.AI," Machine comprehension question answering, which finds an answer to the
-question given a passage, involves high-level reasoning processes of
-understanding and tracking the relevant contents across various semantic units
-such as words, phrases, and sentences in a document. This paper proposes
-novel question-aware sentence gating networks that directly incorporate the
-sentence-level information into word-level encoding processes. To this end, our
-model first learns question-aware sentence representations and then dynamically
-combines them with word-level representations, resulting in semantically
-meaningful word representations for QA tasks. Experimental results demonstrate
-that our approach consistently improves the accuracy over existing baseline
-approaches on various QA datasets and bears wide applicability to other
-neural network-based QA models.
-"
-8295,1807.07965,Arindam Chowdhury and Lovekesh Vig,An Efficient End-to-End Neural Model for Handwritten Text Recognition,cs.CL cs.CV cs.LG," Offline handwritten text recognition from images is an important problem for
-enterprises attempting to digitize large volumes of hand-marked scanned
-documents/reports. Deep recurrent models such as Multi-dimensional LSTMs have
-been shown to yield superior performance over traditional Hidden Markov Model
-based approaches that suffer from the Markov assumption and therefore lack the
-representational power of RNNs. In this paper we introduce a novel approach
-that combines a deep convolutional network with a recurrent Encoder-Decoder
-network to map an image to a sequence of characters corresponding to the text
-present in the image. The entire model is trained end-to-end using Focal Loss,
-an improvement over the standard Cross-Entropy loss that addresses the class
-imbalance problem inherent to text recognition. To enhance the decoding
-capacity of the model, the Beam Search algorithm is employed, which searches
-for the best sequence out of a set of hypotheses based on a joint distribution
-of individual characters. Our model takes as input a downsampled version of the
-original image thereby making it both computationally and memory efficient. The
-experimental results were benchmarked against two publicly available datasets,
-IAM and RIMES. We surpass the state-of-the-art word-level accuracy on the
-evaluation set of both datasets by 3.5% & 1.1%, respectively.
-"
-8296,1807.07982,"Aaron J. Schwartz, Peter Sheridan Dodds, Jarlath P.M. O'Neil-Dunne,
- Christopher M. Danforth, Taylor H. Ricketts","Visitors to urban greenspace have higher sentiment and lower negativity
- on Twitter",cs.SI cs.CL cs.CY," With more people living in cities, we are witnessing a decline in exposure to
-nature. A growing body of research has demonstrated an association between
-nature contact and improved mood. Here, we used Twitter and the Hedonometer, a
-word analysis tool, to investigate how sentiment, or the estimated happiness
-of the words people write, varied before, during, and after visits to San
-Francisco's urban park system. We found that sentiment was substantially higher
-during park visits and remained elevated for several hours following the visit.
-Leveraging differences in vegetative cover across park types, we explored how
-different types of outdoor public spaces may contribute to subjective
-well-being. Tweets during visits to Regional Parks, which are greener and have
-greater vegetative cover, exhibited larger increases in sentiment than tweets
-during visits to Civic Plazas and Squares. Finally, we analyzed word
-frequencies to explore several mechanisms theorized to link nature exposure
-with mental and cognitive benefits. Negation words such as 'no', 'not', and
-'don't' decreased in frequency during visits to urban parks. These results can
-be used by urban planners and public health officials to better target nature
-contact recommendations for growing urban populations.
-"
-8297,1807.08000,"Chandra Khatri, Gyanit Singh, Nish Parikh","Abstractive and Extractive Text Summarization using Document Context
- Vector and Recurrent Neural Networks",cs.CL," Sequence to sequence (Seq2Seq) learning has recently been used for
-abstractive and extractive summarization. In the current study, Seq2Seq models
-have been used for eBay product description summarization. We propose novel
-Document-Context based Seq2Seq models using RNNs for abstractive and extractive
-summarization. Intuitively, this is similar to humans reading the title,
-abstract or any other contextual information before reading the document. This
-gives humans a high-level idea of what the document is about. We use this idea
-and propose that Seq2Seq models should be started with contextual information
-at the first time-step of the input to obtain better summaries. In this manner,
-the output summaries are more document-centric than generic, overcoming
-one of the major hurdles of using generative models. We generate
-document-context from user-behavior and seller-provided information. We train
-and evaluate our models on human-extracted golden summaries. The
-document-contextual Seq2Seq models outperform standard Seq2Seq models.
-Moreover, since generating human-extracted summaries is prohibitively expensive
-to scale, we propose a semi-supervised technique for extracting
-approximate summaries and using it for training Seq2Seq models at scale.
-Semi-supervised models are evaluated against human extracted summaries and are
-found to be of similar efficacy. We provide a side-by-side comparison for
-abstractive and extractive summarizers (contextual and non-contextual) on the
-same evaluation dataset. Overall, we provide methodologies to use and evaluate
-the proposed techniques for large document summarization. Furthermore, we found
-these techniques to be highly effective, which is not the case with existing
-techniques.
-"
-8298,1807.08074,"Stephanie M. Lukin, Felix Gervits, Cory J. Hayes, Anton Leuski, Pooja
- Moolchandani, John G. Rogers III, Carlos Sanchez Amaro, Matthew Marge, Clare
- R. Voss, David Traum",ScoutBot: A Dialogue System for Collaborative Navigation,cs.CL cs.HC," ScoutBot is a dialogue interface to physical and simulated robots that
-supports collaborative exploration of environments. The demonstration will
-allow users to issue unconstrained spoken language commands to ScoutBot.
-ScoutBot will prompt for clarification if the user's instruction needs
-additional input. It is trained on human-robot dialogue collected from
-Wizard-of-Oz experiments, where robot responses were initiated by a human
-wizard in previous interactions. The demonstration will show a simulated ground
-robot (Clearpath Jackal) in a simulated environment supported by ROS (Robot
-Operating System).
-"
-8299,1807.08076,"Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Matthew Marge,
- Cassidy Henry, Ron Arstein, David Traum, Clare R. Voss","Consequences and Factors of Stylistic Differences in Human-Robot
- Dialogue",cs.CL cs.HC cs.RO," This paper identifies stylistic differences in instruction-giving observed in
-a corpus of human-robot dialogue. Differences in verbosity and structure (i.e.,
-single-intent vs. multi-intent instructions) arose naturally without
-restrictions or prior guidance on how users should speak with the robot.
-Different styles were found to produce different rates of miscommunication,
-and correlations were found between style differences and individual user
-variation, trust, and interaction experience with the robot. Understanding
-potential consequences and factors that influence style can inform design of
-dialogue systems that are robust to natural variation from human users.
-"
-8300,1807.08077,"Stephanie M. Lukin, Reginald Hobbs, Clare R. Voss",A Pipeline for Creative Visual Storytelling,cs.CL," Computational visual storytelling produces a textual description of events
-and interpretations depicted in a sequence of images. These texts are made
-possible by advances and cross-disciplinary approaches in natural language
-processing, generation, and computer vision. We define computational creative
-visual storytelling as the ability to alter the telling of a story
-along three aspects: to speak about different environments, to produce
-variations based on narrative goals, and to adapt the narrative to the
-audience. These aspects of creative storytelling and their effect on the
-narrative have yet to be explored in visual storytelling. This paper presents a
-pipeline of task-modules, Object Identification, Single-Image Inferencing, and
-Multi-Image Narration, that serve as a preliminary design for building a
-creative visual storyteller. We have piloted this design for a sequence of
-images in an annotation task. We present and analyze the collected corpus and
-describe plans towards automation.
-"
-8301,1807.08089,"Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee, Lin-shan
- Lee","Phonetic-and-Semantic Embedding of Spoken Words with Applications in
- Spoken Content Retrieval",cs.CL cs.SD eess.AS," Word embedding or Word2Vec has been successful in offering semantics for text
-words learned from the context of words. Audio Word2Vec was shown to offer
-phonetic structures for spoken words (signal segments for words) learned from
-signals within spoken words. This paper proposes a two-stage framework to
-perform phonetic-and-semantic embedding on spoken words considering the context
-of the spoken words. Stage 1 performs phonetic embedding with speaker
-characteristics disentangled. Stage 2 then performs semantic embedding in
-addition. We further propose to evaluate the phonetic-and-semantic nature of
-the audio embeddings obtained in Stage 2 by parallelizing with text embeddings.
-In general, phonetic structure and semantics inevitably disturb each other. For
-example, the words ""brother"" and ""sister"" are close in semantics but very
-different in phonetic structure, while the words ""brother"" and ""bother"" are
-the other way around. But phonetic-and-semantic embedding is attractive, as
-shown in the initial experiments on spoken document retrieval. Not only can
-spoken documents including the spoken query be retrieved based on the phonetic
-structures, but spoken documents semantically related to the query, though not
-including the query, can also be retrieved based on the semantics.
-"
-8302,1807.08133,John D. Kelleher and Simon Dobnik,"What is not where: the challenge of integrating spatial representations
- into deep learning architectures",cs.LG cs.AI cs.CL cs.NE stat.ML," This paper examines to what degree current deep learning architectures for
-image caption generation capture spatial language.
On the basis of the -evaluation of examples of generated captions from the literature we argue that -systems capture what objects are in the image data but not where these objects -are located: the captions generated by these systems are the output of a -language model conditioned on the output of an object detector that cannot -capture fine-grained location information. Although language models provide -useful knowledge for image captions, we argue that deep learning image -captioning architectures should also model geometric relations between objects. -" -8303,1807.08204,"Pasquale Minervini, Matko Bosnjak, Tim Rockt\""aschel, Sebastian Riedel",Towards Neural Theorem Proving at Scale,cs.AI cs.CL," Neural models combining representation learning and reasoning in an -end-to-end trainable manner are receiving increasing interest. However, their -use is severely limited by their computational complexity, which renders them -unusable on real world datasets. We focus on the Neural Theorem Prover (NTP) -model proposed by Rockt{\""{a}}schel and Riedel (2017), a continuous relaxation -of the Prolog backward chaining algorithm where unification between terms is -replaced by the similarity between their embedding representations. For -answering a given query, this model needs to consider all possible proof paths, -and then aggregate results - this quickly becomes infeasible even for small -Knowledge Bases (KBs). We observe that we can accurately approximate the -inference process in this model by considering only proof paths associated with -the highest proof scores. This enables inference and learning on previously -impracticable KBs. -" -8304,1807.08228,"Yuanhang Su, Ruiyuan Lin, C.-C. Jay Kuo","Tree-structured multi-stage principal component analysis (TMPCA): theory - and applications",cs.CL," A PCA based sequence-to-vector (seq2vec) dimension reduction method for the -text classification problem, called the tree-structured multi-stage principal -component analysis (TMPCA) is presented in this paper. Theoretical analysis and -applicability of TMPCA are demonstrated as an extension to our previous work -(Su, Huang & Kuo). Unlike conventional word-to-vector embedding methods, the -TMPCA method conducts dimension reduction at the sequence level without labeled -training data. Furthermore, it can preserve the sequential structure of input -sequences. We show that TMPCA is computationally efficient and able to -facilitate sequence-based text classification tasks by preserving strong mutual -information between its input and output mathematically. It is also -demonstrated by experimental results that a dense (fully connected) network -trained on the TMPCA preprocessed data achieves better performance than -state-of-the-art fastText and other neural-network-based solutions. -" -8305,1807.08230,"Alina Maria Ciobanu, Shervin Malmasi, Liviu P. Dinu",German Dialect Identification Using Classifier Ensembles,cs.CL," In this paper we present the GDI_classification entry to the second German -Dialect Identification (GDI) shared task organized within the scope of the -VarDial Evaluation Campaign 2018. We present a system based on SVM classifier -ensembles trained on characters and words. The system was trained on a -collection of speech transcripts of five Swiss-German dialects provided by the -organizers. The transcripts included in the dataset contained speakers from -Basel, Bern, Lucerne, and Zurich. Our entry in the challenge reached 62.03% -F1-score and was ranked third out of eight teams. 
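-A rough sketch of the tree-structured multi-stage PCA (TMPCA) idea from
-1807.08228 above, assuming power-of-two sequence lengths and scikit-learn's
-PCA; the authors' exact formulation may differ:
-
-import numpy as np
-from sklearn.decomposition import PCA
-
-def tmpca_reduce(seqs):
-    # seqs: (num_seqs, seq_len, dim); seq_len a power of two (assumption).
-    n, length, dim = seqs.shape
-    while length > 1:
-        # Concatenate adjacent pairs, then map 2*dim back to dim with PCA,
-        # halving the sequence length at each stage of the tree.
-        pairs = seqs.reshape(n * length // 2, 2 * dim)
-        pca = PCA(n_components=dim).fit(pairs)
-        seqs = pca.transform(pairs).reshape(n, length // 2, dim)
-        length //= 2
-    return seqs[:, 0, :]  # one vector per input sequence
-
-vecs = tmpca_reduce(np.random.randn(100, 8, 16))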
-" -8306,1807.08280,"Andros Tjandra, Sakriani Sakti, Satoshi Nakamura","Multi-scale Alignment and Contextual History for Attention Mechanism in - Sequence-to-sequence Model",cs.CL cs.LG cs.SD eess.AS," A sequence-to-sequence model is a neural network module for mapping two -sequences of different lengths. The sequence-to-sequence model has three core -modules: encoder, decoder, and attention. Attention is the bridge that connects -the encoder and decoder modules and improves model performance in many tasks. -In this paper, we propose two ideas to improve sequence-to-sequence model -performance by enhancing the attention module. First, we maintain the history -of the location and the expected context from several previous time-steps. -Second, we apply multiscale convolution from several previous attention vectors -to the current decoder state. We utilized our proposed framework for -sequence-to-sequence speech recognition and text-to-speech systems. The results -reveal that our proposed extension could improve performance significantly -compared to a standard attention baseline. -" -8307,1807.08374,"Chao Lu, Yi Bu, Jie Wang, Ying Ding, Vetle Torvik, Matthew Schnaars, - Chengzhi Zhang","Examining Scientific Writing Styles from the Perspective of Linguistic - Complexity",cs.CL," Publishing articles in high-impact English journals is difficult for scholars -around the world, especially for non-native English-speaking scholars (NNESs), -most of whom struggle with proficiency in English. In order to uncover the -differences in English scientific writing between native English-speaking -scholars (NESs) and NNESs, we collected a large-scale data set containing more -than 150,000 full-text articles published in PLoS between 2006 and 2015. We -divided these articles into three groups according to the ethnic backgrounds of -the first and corresponding authors, obtained by Ethnea, and examined the -scientific writing styles in English from a two-fold perspective of linguistic -complexity: (1) syntactic complexity, including measurements of sentence length -and sentence complexity; and (2) lexical complexity, including measurements of -lexical diversity, lexical density, and lexical sophistication. The -observations suggest marginal differences between groups in syntactical and -lexical complexity. -" -8308,1807.08435,"Prakruthi Prabhakar, Nitish Kulkarni, Linghao Zhang",Question Relevance in Visual Question Answering,cs.CV cs.CL," Free-form and open-ended Visual Question Answering systems solve the problem -of providing an accurate natural language answer to a question pertaining to an -image. Current VQA systems do not evaluate if the posed question is relevant to -the input image and hence provide nonsensical answers when posed with -irrelevant questions to an image. In this paper, we solve the problem of -identifying the relevance of the posed question to an image. We address the -problem as two sub-problems. We first identify if the question is visual or -not. If the question is visual, we then determine if it's relevant to the image -or not. For the second problem, we generate a large dataset from existing -visual question answering datasets in order to enable the training of complex -architectures and model the relevance of a visual question to an image. We also -compare the results of our Long Short-Term Memory Recurrent Neural Network -based models to Logistic Regression, XGBoost and multi-layer perceptron based -approaches to the problem. 
-" -8309,1807.08447,"Rakshit Trivedi and Bunyamin Sisman and Jun Ma and Christos Faloutsos - and Hongyuan Zha and Xin Luna Dong",LinkNBed: Multi-Graph Representation Learning with Entity Linkage,cs.LG cs.AI cs.CL stat.ML," Knowledge graphs have emerged as an important model for studying complex -multi-relational data. This has given rise to the construction of numerous -large scale but incomplete knowledge graphs encoding information extracted from -various resources. An effective and scalable approach to jointly learn over -multiple graphs and eventually construct a unified graph is a crucial next step -for the success of knowledge-based inference for many downstream applications. -To this end, we propose LinkNBed, a deep relational learning framework that -learns entity and relationship representations across multiple graphs. We -identify entity linkage across graphs as a vital component to achieve our goal. -We design a novel objective that leverage entity linkage and build an efficient -multi-task training procedure. Experiments on link prediction and entity -linkage demonstrate substantial improvements over the state-of-the-art -relational learning approaches. -" -8310,1807.08465,"Philipp Blandfort, Desmond Patton, William R. Frey, Svebor Karaman, - Surabhi Bhargava, Fei-Tzin Lee, Siddharth Varia, Chris Kedzie, Michael B. - Gaskell, Rossano Schifanella, Kathleen McKeown, Shih-Fu Chang",Multimodal Social Media Analysis for Gang Violence Prevention,cs.LG cs.CL stat.ML," Gang violence is a severe issue in major cities across the U.S. and recent -studies [Patton et al. 2017] have found evidence of social media communications -that can be linked to such violence in communities with high rates of exposure -to gang activity. In this paper we partnered computer scientists with social -work researchers, who have domain expertise in gang violence, to analyze how -public tweets with images posted by youth who mention gang associations on -Twitter can be leveraged to automatically detect psychosocial factors and -conditions that could potentially assist social workers and violence outreach -workers in prevention and early intervention programs. To this end, we -developed a rigorous methodology for collecting and annotating tweets. We -gathered 1,851 tweets and accompanying annotations related to visual concepts -and the psychosocial codes: aggression, loss, and substance use. These codes -are relevant to social work interventions, as they represent possible pathways -to violence on social media. We compare various methods for classifying tweets -into these three classes, using only the text of the tweet, only the image of -the tweet, or both modalities as input to the classifier. In particular, we -analyze the usefulness of mid-level visual concepts and the role of different -modalities for this tweet classification task. Our experiments show that -individually, text information dominates classification performance of the loss -class, while image information dominates the aggression and substance use -classes. Our multimodal approach provides a very promising improvement (18% -relative in mean average precision) over the best single modality approach. -Finally, we also illustrate the complexity of understanding social media data -and elaborate on open challenges. 
-" -8311,1807.08484,"Ruijie Wang, Yuchen Yan, Jialu Wang, Yuting Jia, Ye Zhang, Weinan - Zhang, Xinbing Wang",AceKG: A Large-scale Knowledge Graph for Academic Data Mining,cs.IR cs.AI cs.CL," Most existing knowledge graphs (KGs) in academic domains suffer from problems -of insufficient multi-relational information, name ambiguity and improper data -format for large-scale machine processing. In this paper, we present AceKG, a -new large-scale KG in academic domain. AceKG not only provides clean academic -information, but also offers a large-scale benchmark dataset for researchers to -conduct challenging data mining projects including link prediction, community -detection and scholar classification. Specifically, AceKG describes 3.13 -billion triples of academic facts based on a consistent ontology, including -necessary properties of papers, authors, fields of study, venues and -institutes, as well as the relations among them. To enrich the proposed -knowledge graph, we also perform entity alignment with existing databases and -rule-based inference. Based on AceKG, we conduct experiments of three typical -academic data mining tasks and evaluate several state-of- the-art knowledge -embedding and network representation learning approaches on the benchmark -datasets built from AceKG. Finally, we discuss several promising research -directions that benefit from AceKG. -" -8312,1807.08587,"Eug\'enio Ribeiro, Ricardo Ribeiro, and David Martins de Matos","Deep Dialog Act Recognition using Multiple Token, Segment, and Context - Information Representations",cs.CL," Dialog act (DA) recognition is a task that has been widely explored over the -years. Recently, most approaches to the task explored different DNN -architectures to combine the representations of the words in a segment and -generate a segment representation that provides cues for intention. In this -study, we explore means to generate more informative segment representations, -not only by exploring different network architectures, but also by considering -different token representations, not only at the word level, but also at the -character and functional levels. At the word level, in addition to the commonly -used uncontextualized embeddings, we explore the use of contextualized -representations, which provide information concerning word sense and segment -structure. Character-level tokenization is important to capture -intention-related morphological aspects that cannot be captured at the word -level. Finally, the functional level provides an abstraction from words, which -shifts the focus to the structure of the segment. We also explore approaches to -enrich the segment representation with context information from the history of -the dialog, both in terms of the classifications of the surrounding segments -and the turn-taking history. This kind of information has already been proved -important for the disambiguation of DAs in previous studies. Nevertheless, we -are able to capture additional information by considering a summary of the -dialog history and a wider turn-taking context. By combining the best -approaches at each step, we achieve results that surpass the previous -state-of-the-art on generic DA recognition on both SwDA and MRDA, two of the -most widely explored corpora for the task. Furthermore, by considering both -past and future context, simulating annotation scenario, our approach achieves -a performance similar to that of a human annotator on SwDA and surpasses it on -MRDA. 
-" -8313,1807.08666,"Raghav Menon, Herman Kamper, Emre Yilmaz, John Quinn, Thomas Niesler","ASR-free CNN-DTW keyword spotting using multilingual bottleneck features - for almost zero-resource languages",cs.CL stat.ML," We consider multilingual bottleneck features (BNFs) for nearly zero-resource -keyword spotting. This forms part of a United Nations effort using keyword -spotting to support humanitarian relief programmes in parts of Africa where -languages are severely under-resourced. We use 1920 isolated keywords (40 -types, 34 minutes) as exemplars for dynamic time warping (DTW) template -matching, which is performed on a much larger body of untranscribed speech. -These DTW costs are used as targets for a convolutional neural network (CNN) -keyword spotter, giving a much faster system than direct DTW. Here we consider -how available data from well-resourced languages can improve this CNN-DTW -approach. We show that multilingual BNFs trained on ten languages improve the -area under the ROC curve of a CNN-DTW system by 10.9% absolute relative to the -MFCC baseline. By combining low-resource DTW-based supervision with information -from well-resourced languages, CNN-DTW is a competitive option for low-resource -keyword spotting. -" -8314,1807.08669,"Raghav Menon, Astik Biswas, Armin Saeb, John Quinn and Thomas Niesler",Automatic Speech Recognition for Humanitarian Applications in Somali,cs.CL stat.ML," We present our first efforts in building an automatic speech recognition -system for Somali, an under-resourced language, using 1.57 hrs of annotated -speech for acoustic model training. The system is part of an ongoing effort by -the United Nations (UN) to implement keyword spotting systems supporting -humanitarian relief programmes in parts of Africa where languages are severely -under-resourced. We evaluate several types of acoustic model, including recent -neural architectures. Language model data augmentation using a combination of -recurrent neural networks (RNN) and long short-term memory neural networks -(LSTMs) as well as the perturbation of acoustic data are also considered. We -find that both types of data augmentation are beneficial to performance, with -our best system using a combination of convolutional neural networks (CNNs), -time-delay neural networks (TDNNs) and bi-directional long short term memory -(BLSTMs) to achieve a word error rate of 53.75%. -" -8315,1807.08945,"Jing Yang, Biao Zhang, Yue Qin, Xiangwen Zhang, Qian Lin and Jinsong - Su",Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT,cs.CL," Although neural machine translation(NMT) yields promising translation -performance, it unfortunately suffers from over- and under-translation is- sues -[Tu et al., 2016], of which studies have become research hotspots in NMT. At -present, these studies mainly apply the dominant automatic evaluation metrics, -such as BLEU, to evaluate the overall translation quality with respect to both -adequacy and uency. However, they are unable to accurately measure the ability -of NMT systems in dealing with the above-mentioned issues. In this paper, we -propose two quantitative metrics, the Otem and Utem, to automatically evaluate -the system perfor- mance in terms of over- and under-translation respectively. -Both metrics are based on the proportion of mismatched n-grams between gold -ref- erence and system translation. 
We evaluate both metrics by comparing their
-scores with human evaluations, where the values of Pearson Correlation
-Coefficient reveal their strong correlation. Moreover, in-depth analyses on
-various translation systems indicate some inconsistency between BLEU and our
-proposed metrics, highlighting the necessity and significance of our metrics.
-"
-8316,1807.08998,"Steffen Eger and Johannes Daxenberger and Christian Stab and Iryna
- Gurevych","Cross-lingual Argumentation Mining: Machine Translation (and a bit of
- Projection) is All You Need!",cs.CL," Argumentation mining (AM) requires the identification of complex discourse
-structures and has lately been applied with success monolingually. In this
-work, we show that the existing resources are, however, not adequate for
-assessing cross-lingual AM, due to their heterogeneity or lack of complexity.
-We therefore create suitable parallel corpora by (human and machine)
-translating a popular AM dataset consisting of persuasive student essays into
-German, French, Spanish, and Chinese. We then compare (i) annotation projection
-and (ii) bilingual word embeddings based direct transfer strategies for
-cross-lingual AM, finding that the former performs considerably better and
-almost eliminates the loss from cross-lingual transfer. Moreover, we find that
-annotation projection works equally well when using either costly human or
-cheap machine translations. Our code and data are available at
-\url{http://github.com/UKPLab/coling2018-xling_argument_mining}.
-"
-8317,1807.09000,"Robert D. Hawkins, Hyowon Gweon, Noah D. Goodman","The division of labor in communication: Speakers help listeners account
- for asymmetries in visual perspective",cs.CL," Recent debates over adults' theory of mind use have been fueled by surprising
-failures of perspective-taking in communication, suggesting that
-perspective-taking can be relatively effortful. How, then, should speakers and
-listeners allocate their resources to achieve successful communication? We
-begin with the observation that this shared goal induces a natural division of
-labor: the resources one agent chooses to allocate toward perspective-taking
-should depend on their expectations about the other's allocation. We formalize
-this idea in a resource-rational model augmenting recent probabilistic
-weighting accounts with a mechanism for (costly) control over the degree of
-perspective-taking. In a series of simulations, we first derive an intermediate
-degree of perspective weighting as an optimal tradeoff between expected costs
-and benefits of perspective-taking. We then present two behavioral experiments
-testing novel predictions of our model. In Experiment 1, we manipulated the
-presence or absence of occlusions in a director-matcher task and found that
-speakers spontaneously produced more informative descriptions to account for
-""known unknowns"" in their partner's private view. In Experiment 2, we compared
-the scripted utterances used by confederates in prior work with those produced
-in interactions with unscripted directors. We found that confederates were
-systematically less informative than listeners would initially expect given the
-presence of occlusions, but listeners used violations to adaptively make fewer
-errors over time. Taken together, our work suggests that people are not simply
-""mindblind""; they use contextually appropriate expectations to navigate the
-division of labor with their partner.
We discuss how a resource-rational
-framework may provide a more deeply explanatory foundation for understanding
-flexible perspective-taking under processing constraints.
-"
-8318,1807.09433,"Kai Fan, Jiayi Wang, Bo Li, Fengming Zhou, Boxing Chen, Luo Si","""Bilingual Expert"" Can Find Translation Errors",cs.CL," Recent advances in statistical machine translation via the adoption of neural
-sequence-to-sequence models empower the end-to-end system to achieve
-state-of-the-art in many WMT benchmarks. The performance of such a machine
-translation (MT) system is usually evaluated by the automatic metric BLEU when the
-golden references are provided for validation. However, for model inference or
-production deployment, the golden references are rarely available or
-require expensive human annotation with bilingual expertise. In order to
-address the issue of quality evaluation (QE) without reference, we propose a
-general framework for automatic evaluation of translation output for most WMT
-quality evaluation tasks. We first build a conditional target language model
-with a novel bidirectional transformer, named neural bilingual expert model,
-which is pre-trained on large parallel corpora for feature extraction. For QE
-inference, the bilingual expert model can simultaneously produce the joint
-latent representation between the source and the translation, and real-valued
-measurements of possible erroneous tokens based on the prior knowledge learned
-from parallel data. Subsequently, the features will further be fed into a
-simple Bi-LSTM predictive model for quality evaluation. The experimental
-results show that our approach achieves the state-of-the-art performance in the
-quality estimation track of WMT 2017/2018.
-"
-8319,1807.09434,"Boeun Kim, Young Han Lee, Hyedong Jung and Choongsang Cho",Distinctive-attribute Extraction for Image Captioning,cs.CV cs.CL," Image captioning, an open research issue, has evolved with the progress
-of deep neural networks. Convolutional neural networks (CNNs) and recurrent
-neural networks (RNNs) are employed to compute image features and generate
-natural language descriptions in the research. In previous works, a caption
-involving semantic description can be generated by applying additional
-information into the RNNs. In this paper, we propose a distinctive-attribute
-extraction (DaE) which explicitly encourages significant meanings to generate
-an accurate caption describing the overall meaning of the image with their
-unique situation. Specifically, the captions of training images are analyzed by
-term frequency-inverse document frequency (TF-IDF), and the analyzed semantic
-information is trained to extract distinctive-attributes for inferring
-captions. The proposed scheme is evaluated on challenge data, and it improves
-objective performance while describing images in more detail.
-"
-8320,1807.09561,"Ahmad Hany Hossny, Terry Moschou, Grant Osborne, Lewis Mitchell, Nick
- Lothian","Enhancing keyword correlation for event detection in social networks
- using SVD and k-means: Twitter case study",cs.SI cs.CL cs.LG," Extracting textual features from tweets is a challenging process due to the
-noisy nature of the content and the weak signal of most of the words used. In
-this paper, we propose using singular value decomposition (SVD) with clustering
-to enhance the signals of the textual features in the tweets to improve the
-correlation with events.
The proposed technique applies SVD to the time series
-vector for each feature to factorize the matrix of feature/day counts, in order
-to ensure the independence of the feature vectors. Afterwards, k-means
-clustering is applied to build a look-up table that maps members of each
-cluster to the cluster-centroid. The look-up table is used to map each feature
-in the original data to the centroid of its cluster; then we add the term
-frequency vectors of all features in each cluster to the term-frequency vector
-of the cluster centroid. To test the technique we
-calculated the correlations of the cluster centroids with the golden standard
-record (GSR) vector before and after summing the vectors of the cluster members
-to the centroid-vector. The proposed method is applied to multiple correlation
-techniques including the Pearson, Spearman, distance correlation and Kendall
-Tau. The experiments have also considered the different word forms and lengths
-of the features including keywords, n-grams, skip-grams and bags-of-words. The
-correlation results are enhanced significantly as the highest correlation
-scores have increased from 0.3 to 0.6, and the average correlation scores have
-increased from 0.3 to 0.4.
-"
-8321,1807.09597,Shruti Palaskar and Florian Metze,Acoustic-to-Word Recognition with Sequence-to-Sequence Models,eess.AS cs.CL cs.LG cs.SD," Acoustic-to-Word recognition provides a straightforward solution to
-end-to-end speech recognition without needing external decoding, language model
-re-scoring or lexicon. While character-based models offer a natural solution to
-the out-of-vocabulary problem, word models can be simpler to decode and may
-also be able to directly recognize semantically meaningful units. We present
-effective methods to train Sequence-to-Sequence models for direct word-level
-recognition (and character-level recognition) and show an absolute improvement
-of 4.4-5.0\% in Word Error Rate on the Switchboard corpus compared to prior
-work. In addition to these promising results, word-based models are more
-interpretable than character models, which have to be composed into words using
-a separate decoding step. We analyze the encoder hidden states and the
-attention behavior, and show that location-aware attention naturally represents
-words as a single speech-word-vector, despite spanning multiple frames in the
-input. We finally show that the Acoustic-to-Word model also learns to segment
-speech into words with a mean standard deviation of 3 frames as compared with
-human annotated forced-alignments for the Switchboard corpus.
-"
-8322,1807.09602,"Seyed Mahdi Rezaeinia, Ali Ghodsi, Rouhollah Rahmani",Text Classification based on Multiple Block Convolutional Highways,cs.CL," In the Text Classification areas of Sentiment Analysis,
-Subjectivity/Objectivity Analysis, and Opinion Polarity, Convolutional Neural
-Networks have gained special attention because of their performance and
-accuracy. In this work, we applied recent advances in CNNs and propose a novel
-architecture, Multiple Block Convolutional Highways (MBCH), which achieves
-improved accuracy on multiple popular benchmark datasets, compared to previous
-architectures. The MBCH is based on new techniques and architectures including
-highway networks, DenseNet, batch normalization and bottleneck layers. In
-addition, to cope with the limitations of existing pre-trained word vectors
-which are used as inputs for the CNN, we propose a novel method, Improved Word
-Vectors (IWV).
The IWV improves the accuracy of CNNs which are used for text
-classification tasks.
-"
-8323,1807.09623,Alon Talmor and Jonathan Berant,Repartitioning of the ComplexWebQuestions Dataset,cs.CL cs.AI cs.LG," Recently, Talmor and Berant (2018) introduced ComplexWebQuestions - a dataset
-focused on answering complex questions by decomposing them into a sequence of
-simpler questions and extracting the answer from retrieved web snippets. In
-their work the authors used a pre-trained reading comprehension (RC) model
-(Salant and Berant, 2018) to extract the answer from the web snippets. In this
-short note we show that training an RC model directly on the training data of
-ComplexWebQuestions reveals a leakage from the training set to the test set
-that makes it possible to obtain unreasonably high performance. As a solution, we
-construct a new partitioning of ComplexWebQuestions that does not suffer from
-this leakage and publicly release it. We also perform an empirical evaluation
-on these two datasets and show that training an RC model on the training data
-substantially improves state-of-the-art performance.
-"
-8324,1807.09639,Yingting Wu and Hai Zhao,Finding Better Subword Segmentation for Neural Machine Translation,cs.CL cs.AI cs.LG," For different language pairs, word-level neural machine translation (NMT)
-models with a fixed-size vocabulary suffer from the same problem of
-representing out-of-vocabulary (OOV) words. The common practice usually
-replaces all these rare or unknown words with a token, which limits the
-translation performance to some extent. Most recent work has handled this
-problem by splitting words into characters or other specially extracted subword
-units to enable open-vocabulary translation. Byte pair encoding (BPE) is one of
-the successful attempts that has been shown extremely competitive by providing
-effective subword segmentation for NMT systems. In this paper, we extend the
-BPE-style segmentation to a general unsupervised framework with three
-statistical measures: frequency (FRQ), accessor variety (AV) and description
-length gain (DLG). We test our approach on two translation tasks: German to
-English and Chinese to English. The experimental results show that AV and DLG
-enhanced systems outperform the FRQ baseline in the frequency weighted schemes
-at different significance levels.
-"
-8325,1807.09671,"Wencan Luo, Fei Liu, Zitao Liu, and Diane Litman",A Novel ILP Framework for Summarizing Content with High Lexical Variety,cs.CL," Summarizing content contributed by individuals can be challenging, because
-people make different lexical choices even when describing the same events.
-However, there remains a significant need to summarize such content. Examples
-include the student responses to post-class reflective questions, product
-reviews, and news articles published by different news agencies related to the
-same events. High lexical diversity of these documents hinders the system's
-ability to effectively identify salient content and reduce summary redundancy.
-In this paper, we overcome this issue by introducing an integer linear
-programming-based summarization framework. It incorporates a low-rank
-approximation to the sentence-word co-occurrence matrix to intrinsically group
-semantically-similar lexical items. We conduct extensive experiments on
-datasets of student responses, product reviews, and news documents. Our
-approach compares favorably to a number of extractive baselines as well as a
-neural abstractive summarization system.
The paper finally sheds light on when
-and why the proposed framework is effective at summarizing content with high
-lexical variety.
-"
-8326,1807.09842,"Muhammad Mahbubur Rahman, Tim Finin","Understanding and representing the semantics of large structured
- documents",cs.CL cs.IR cs.LG stat.ML," Understanding large, structured documents like scholarly articles, requests
-for proposals or business reports is a complex and difficult task. It involves
-discovering a document's overall purpose and subject(s), understanding the
-function and meaning of its sections and subsections, and extracting low-level
-entities and facts about them. In this research, we present a deep learning
-based document ontology to capture the general purpose semantic structure and
-domain specific semantic concepts from a large number of academic articles and
-business documents. The ontology is able to describe different functional parts
-of a document, which can be used to enhance semantic indexing for a better
-understanding by human beings and machines. We evaluate our models through
-extensive experiments on datasets of scholarly articles from arXiv and Request
-for Proposal documents.
-"
-8327,1807.09844,Simon Dobnik and John D. Kelleher,"Modular Mechanistic Networks: On Bridging Mechanistic and
- Phenomenological Models with Deep Neural Networks in Natural Language
- Processing",cs.CL cs.AI cs.LG cs.NE stat.ML," Natural language processing (NLP) can be done using either top-down (theory
-driven) or bottom-up (data driven) approaches, which we call mechanistic and
-phenomenological respectively. The approaches are frequently considered to
-stand in opposition to each other. Examining some recent approaches in deep
-learning we argue that deep neural networks incorporate both perspectives and,
-furthermore, that leveraging this aspect of deep learning may help in solving
-complex problems within language technology, such as modelling language and
-perception in the domain of spatial cognition.
-"
-8328,1807.09875,"Caio Corro, Ivan Titov","Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a
- Structured Variational Autoencoder",cs.CL cs.LG," Human annotation for syntactic parsing is expensive, and large resources are
-available only for a fraction of languages. A question we ask is whether one
-can leverage abundant unlabeled texts to improve syntactic parsers, beyond just
-using the texts to obtain more generalisable lexical features (i.e. beyond word
-embeddings). To this end, we propose a novel latent-variable generative model
-for semi-supervised syntactic dependency parsing. As exact inference is
-intractable, we introduce a differentiable relaxation to obtain approximate
-samples and compute gradients with respect to the parser parameters. Our method
-(Differentiable Perturb-and-Parse) relies on differentiable dynamic programming
-over stochastically perturbed edge scores. We demonstrate the effectiveness of our
-approach with experiments on English, French and Swedish.
-"
-8329,1807.09950,"Hung Le, Truyen Tran, Thin Nguyen and Svetha Venkatesh",Variational Memory Encoder-Decoder,cs.CL," Introducing variability while maintaining coherence is a core task in
-learning to generate utterances in conversation. Standard neural
-encoder-decoder models and their extensions using conditional variational
-autoencoders often result in either trivial or digressive responses.
To overcome
-this, we explore a novel approach that injects variability into neural
-encoder-decoder via the use of external memory as a mixture model, namely
-Variational Memory Encoder-Decoder (VMED). By associating each memory read with
-a mode in the latent mixture distribution at each timestep, our model can
-capture the variability observed in sequential data such as natural
-conversations. We empirically compare the proposed model against other recent
-approaches on various conversational datasets. The results show that VMED
-consistently achieves significant improvement over others in both metric-based
-and qualitative evaluations.
-"
-8330,1807.10076,"Georgios Balikas, Ga\""el Dias, Rumen Moraliyski, Massih-Reza Amini",Concurrent Learning of Semantic Relations,cs.CL," Discovering whether words are semantically related and identifying the
-specific semantic relation that holds between them is of crucial importance for
-NLP as it is essential for tasks like query expansion in IR. Within this
-context, different methodologies have been proposed that either exclusively
-focus on a single lexical relation (e.g. hypernymy vs. random) or learn
-specific classifiers capable of identifying multiple semantic relations (e.g.
-hypernymy vs. synonymy vs. random). In this paper, we propose another way to
-look at the problem that relies on the multi-task learning paradigm. In
-particular, we want to study whether the learning process of a given semantic
-relation (e.g. hypernymy) can be improved by the concurrent learning of another
-semantic relation (e.g. co-hyponymy). Within this context, we particularly
-examine the benefits of semi-supervised learning where the training of a
-prediction function is performed over few labeled data jointly with many
-unlabeled ones. Preliminary results based on simple learning strategies and
-state-of-the-art distributional feature representations show that concurrent
-learning can lead to improvements in a vast majority of tested situations.
-"
-8331,1807.10104,"Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Ido Dagan, Yoav
- Goldberg, Alon Eirew, Yael Green, Shira Guskin, Peter Izsak, Daniel Korat","Term Set Expansion based on Multi-Context Term Embeddings: an End-to-end
- Workflow",cs.AI cs.CL," We present SetExpander, a corpus-based system for expanding a seed set of
-terms into a more complete set of terms that belong to the same semantic class.
-SetExpander implements an iterative end-to-end workflow for term set expansion.
-It enables users to easily select a seed set of terms, expand it, view the
-expanded set, validate it, re-expand the validated set and store it, thus
-simplifying the extraction of domain-specific fine-grained semantic classes.
-SetExpander has been used for solving real-life use cases including integration
-in an automated recruitment system and an issues and defects resolution system.
-A video demo of SetExpander is available at
-https://drive.google.com/open?id=1e545bB87Autsch36DjnJHmq3HWfSd1Rv (some images
-were blurred for privacy reasons).
-"
-8332,1807.10311,"Benjamin Milde and Arne K\""ohn",Open Source Automatic Speech Recognition for German,cs.CL," High-quality Automatic Speech Recognition (ASR) is a prerequisite for
-speech-based applications and research. While state-of-the-art ASR software is
-freely available, the language-dependent acoustic models are lacking for
-languages other than English, due to the limited amount of freely available
-training data.
We train acoustic models for German with Kaldi on two datasets,
-which are both distributed under a Creative Commons license. The resulting
-model is freely redistributable, lowering the cost of entry for German ASR. The
-models are trained on a total of 412 hours of German read speech data and we
-achieve a relative word error reduction of 26% by adding data from the Spoken
-Wikipedia Corpus to the previously best freely available German acoustic model
-recipe and dataset. Our best model achieves a word error rate of 14.38 on the
-Tuda-De test set. Due to the large number of speakers and the diversity of
-topics included in the training data, our model is robust against speaker
-variation and topic shift.
-"
-8333,1807.10543,"Neslihan Suzen, Alexander Gorban, Jeremy Levesley and Evgeny Mirkes",Automatic Short Answer Grading and Feedback Using Text Mining Methods,cs.CL," Automatic grading is not a new approach but the need to adapt the latest
-technology to automatic grading has become very important. As the technology
-has rapidly become more powerful at scoring exams and essays, especially from
-the 1990s onwards, partially or wholly automated grading systems using
-computational methods have evolved and have become a major area of research. In
-particular, the demand for scoring natural language responses has created a
-need for tools that can be applied to automatically grade these responses. In
-this paper, we focus on the concept of automatic grading of short answer
-questions such as are typical in the UK GCSE system, and on providing students
-with useful feedback on their answers. We present experimental results on a
-dataset provided by the introductory computer science class at the University
-of North Texas. We first apply standard data mining techniques to the corpus of
-student answers for the purpose of measuring similarity between the student
-answers and the model answer. This is based on the number of common words. We
-then evaluate the relation between these similarities and marks awarded by
-scorers. We then consider an approach that groups student answers into
-clusters. Each cluster would be awarded the same mark, and the same feedback
-given to each answer in a cluster. In this manner, we demonstrate that clusters
-indicate the groups of students who are awarded the same or similar scores.
-Words in each cluster are compared to show that clusters are constructed based
-on how many and which words of the model answer have been used. The main
-novelty in this paper is that we design a model to predict marks based on the
-similarities between the student answers and the model answer.
-"
-8334,1807.10564,Bryan Eikema and Wilker Aziz,Auto-Encoding Variational Neural Machine Translation,cs.CL," We present a deep generative model of bilingual sentence pairs for machine
-translation. The model generates source and target sentences jointly from a
-shared latent representation and is parameterised by neural networks. We
-perform efficient training using amortised variational inference and
-reparameterised gradients. Additionally, we discuss the statistical
-implications of joint modelling and propose an efficient approximation to
-maximum a posteriori decoding for fast test-time predictions. We demonstrate
-the effectiveness of our model in three machine translation scenarios:
-in-domain training, mixed-domain training, and learning from a mix of
-gold-standard and synthetic data.
Our experiments consistently show that our
-joint formulation outperforms conditional modelling (i.e. standard neural
-machine translation) in all such scenarios.
-"
-8335,1807.10615,"Nishtha Madaan, Sameep Mehta, Shravika Mittal, Ashima Suvarna","Judging a Book by its Description : Analyzing Gender Stereotypes in the
- Man Bookers Prize Winning Fiction",cs.CL cs.AI," The presence of gender stereotypes in many aspects of society is a well-known
-phenomenon. In this paper, we focus on studying and quantifying such
-stereotypes and bias in the Man Bookers Prize winning fiction. We consider 275
-books shortlisted for Man Bookers Prize between 1969 and 2017. The gender bias
-is analyzed by semantic modeling of book descriptions on Goodreads. This
-reveals the pervasiveness of gender bias and stereotype in the books on
-different features like occupation, introductions and actions associated with the
-characters in the book.
-"
-8336,1807.10661,Jacopo Gobbi and Evgeny Stepanov and Giuseppe Riccardi,"Concept Tagging for Natural Language Understanding: Two Decadelong
- Algorithm Development",cs.CL," Concept tagging is a type of structured learning needed for natural language
-understanding (NLU) systems. In this task, meaning labels from a domain
-ontology are assigned to word sequences. In this paper, we review the
-algorithms developed over the last twenty-five years. We perform a comparative
-evaluation of generative, discriminative and deep learning methods on two
-public datasets. We report on the statistical variability of the performance
-measurements. The third contribution is the release of a repository of the
-algorithms, datasets and recipes for NLU evaluation.
-"
-8337,1807.10675,"Sajawel Ahmed, Alexander Mehler","Resource-Size matters: Improving Neural Named Entity Recognition with
- Optimized Large Corpora",cs.CL cs.LG stat.ML," This study improves the performance of neural named entity recognition by a
-margin of up to 11% in F-score on the example of a low-resource language like
-German, thereby outperforming existing baselines and establishing a new
-state-of-the-art on each single open-source dataset. Rather than designing
-deeper and wider hybrid neural architectures, we gather all available resources
-and perform a detailed optimization and grammar-dependent morphological
-processing consisting of lemmatization and part-of-speech tagging prior to
-exposing the raw data to any training process. We test our approach in a
-threefold monolingual experimental setup of a) single, b) joint, and c)
-optimized training and shed light on the dependency of downstream tasks on the
-size of corpora used to compute word embeddings.
-"
-8338,1807.10740,"Marcely Zanon Boito, Antonios Anastasopoulos, Marika Lekakou, Aline
- Villavicencio, Laurent Besacier",A small Griko-Italian speech translation corpus,cs.CL," This paper presents an extension to a very low-resource parallel corpus
-collected in an endangered language, Griko, making it useful for computational
-research. The corpus consists of 330 utterances (about 20 minutes of speech)
-which have been transcribed and translated into Italian, with annotations for
-word-level speech-to-transcription and speech-to-translation alignments. The
-corpus also includes morphosyntactic tags and word-level glosses. Applying an
-automatic unit discovery method, pseudo-phones were also generated.
We detail
-how the corpus was collected, cleaned and processed, and we illustrate its use
-on zero-resource tasks by presenting some baseline results for the task of
-speech-to-translation alignment and unsupervised word discovery. The dataset is
-available online, aiming to encourage replicability and diversity in
-computational language documentation experiments.
-"
-8339,1807.10800,"Abdulkareem Alsudais, Hovig Tchalian","Clustering Prominent People and Organizations in Topic-Specific Text
- Corpora",cs.CL," Named entities in text documents are the names of people, organizations,
-locations or other types of objects in the documents that exist in the real
-world. A persisting research challenge is to use computational techniques to
-identify such entities in text documents. Once identified, several text mining
-tools and algorithms can be utilized to leverage these discovered named
-entities and improve NLP applications. In this paper, a method that clusters
-prominent names of people and organizations based on their semantic similarity
-in a text corpus is proposed. The method relies on common named entity
-recognition techniques and on recent word embedding models. The semantic
-similarity scores generated using the word embedding models for the named
-entities are used to cluster similar entities of the people and organizations
-types. Two human judges evaluated ten variations of the method after it was run
-on a corpus that consists of 4,821 articles on a specific topic. The
-performance of the method was measured using three quantitative measures. The
-results of these three metrics demonstrate that the method is effective in
-clustering semantically similar named entities.
-"
-8340,1807.10805,"Mahtab Ahmed, Muhammad Rifayat Samee, Robert E. Mercer","Improving Neural Sequence Labelling using Additional Linguistic
- Information",cs.CL cs.LG," Sequence labelling is the task of assigning categorical labels to a data
-sequence. In Natural Language Processing, sequence labelling can be applied to
-various fundamental problems, such as Part of Speech (POS) tagging, Named
-Entity Recognition (NER), and Chunking. In this study, we propose a method to
-add various linguistic features to the neural sequence framework to improve
-sequence labelling. Besides word-level knowledge, sense embeddings are added to
-provide semantic information. Additionally, selective readings of character
-embeddings are added to capture contextual as well as morphological features
-for each word in a sentence. Compared to previous methods, these added
-linguistic features allow us to design a more concise model and perform more
-efficient training. Our proposed architecture achieves state-of-the-art results
-on the benchmark datasets of POS, NER, and chunking. Moreover, the convergence
-rate of our model is significantly better than the previous state-of-the-art
-models.
-"
-8341,1807.10854,"Daniel W. Otter, Julian R. Medina, Jugal K. Kalita",A Survey of the Usages of Deep Learning in Natural Language Processing,cs.CL," Over the last several years, the field of natural language processing has
-been propelled forward by an explosion in the use of deep learning models. This
-survey provides a brief introduction to the field and a quick overview of deep
-learning architectures and methods. It then sifts through the plethora of
-recent studies and summarizes a large assortment of relevant contributions.
-Analyzed research areas include several core linguistic processing issues in
-addition to a number of applications of computational linguistics. A discussion
-of the current state of the art is then provided along with recommendations for
-future research in the field.
-"
-8342,1807.10857,"Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N
- Sainath, Karen Livescu","A Comparison of Techniques for Language Model Integration in
- Encoder-Decoder Speech Recognition",eess.AS cs.AI cs.CL cs.SD," Attention-based recurrent neural encoder-decoder models present an elegant
-solution to the automatic speech recognition problem. This approach folds the
-acoustic model, pronunciation model, and language model into a single network
-and requires only a parallel corpus of speech and text for training. However,
-unlike in conventional approaches that combine separate acoustic and language
-models, it is not clear how to use additional (unpaired) text. While there has
-been previous work on methods addressing this problem, a thorough comparison
-among methods is still lacking. In this paper, we compare a suite of past
-methods and some of our own proposed methods for using unpaired text data to
-improve encoder-decoder models. For evaluation, we use the medium-sized
-Switchboard data set and the large-scale Google voice search and dictation data
-sets. Our results confirm the benefits of using unpaired text across a range of
-methods and data sets. Surprisingly, for first-pass decoding, the rather simple
-approach of shallow fusion performs best across data sets. However, for Google
-data sets we find that cold fusion has a lower oracle error rate and
-outperforms other approaches after second-pass rescoring on the Google voice
-search data set.
-"
-8343,1807.10893,"Tomoki Hayashi, Shinji Watanabe, Yu Zhang, Tomoki Toda, Takaaki Hori,
- Ramon Astudillo, Kazuya Takeda",Back-Translation-Style Data Augmentation for End-to-End ASR,cs.CL," In this paper we propose a novel data augmentation method for attention-based
-end-to-end automatic speech recognition (E2E-ASR), utilizing a large amount of
-text which is not paired with speech signals. Inspired by the back-translation
-technique proposed in the field of machine translation, we build a neural
-text-to-encoder model which predicts a sequence of hidden states extracted by a
-pre-trained E2E-ASR encoder from a sequence of characters. By using hidden
-states as a target instead of acoustic features, it is possible to achieve
-faster attention learning and reduce computational cost, thanks to sub-sampling
-in the E2E-ASR encoder; moreover, using hidden states avoids modelling
-speaker dependencies, unlike acoustic features. After training, the
-text-to-encoder model generates the hidden states from a large amount of
-unpaired text, then the E2E-ASR decoder is retrained using the generated hidden
-states as additional training data. Experimental evaluation using the LibriSpeech
-dataset demonstrates that our proposed method improves ASR
-performance and reduces the number of unknown words without the need for paired
-data.
-"
-8344,1807.10945,"Emre Y{\i}lmaz, Henk van den Heuvel and David A. van Leeuwen","Acoustic and Textual Data Augmentation for Improved ASR of
- Code-Switching Speech",cs.CL," In this paper, we describe several techniques for improving the acoustic and
-language model of an automatic speech recognition (ASR) system operating on
-code-switching (CS) speech.
We focus on the recognition of Frisian-Dutch radio
-broadcasts where one of the mixed languages, namely Frisian, is an
-under-resourced language. In previous work, we have proposed several automatic
-transcription strategies for CS speech to increase the amount of available
-training speech data. In this work, we explore how the acoustic modeling (AM)
-can benefit from monolingual speech data belonging to the high-resourced mixed
-language. For this purpose, we train state-of-the-art AMs, which were
-ineffective due to lack of training data, on a significantly increased amount
-of CS speech and monolingual Dutch speech. Moreover, we improve the language
-model (LM) by creating code-switching text, which is in practice almost
-non-existent, by (1) generating text using recurrent LMs trained on the
-transcriptions of the training CS speech data, (2) adding the transcriptions of
-the automatically transcribed CS speech data and (3) translating Dutch text
-extracted from the transcriptions of a large Dutch speech corpus. We report
-significantly improved CS ASR performance due to the increase in the acoustic
-and textual training data.
-"
-8345,1807.10948,"Emre Y{\i}lmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco",Articulatory Features for ASR of Pathological Speech,cs.CL," In this work, we investigate the joint use of articulatory and acoustic
-features for automatic speech recognition (ASR) of pathological speech. Despite
-long-lasting efforts to build speaker- and text-independent ASR systems for
-people with dysarthria, the performance of state-of-the-art systems is still
-considerably lower on this type of speech than on normal speech. The most
-prominent reason for the inferior performance is the high variability in
-pathological speech that is characterized by the spectrotemporal deviations
-caused by articulatory impairments due to various etiologies. To cope with this
-high variation, we propose to use speech representations which utilize
-articulatory information together with the acoustic properties. A designated
-acoustic model, namely a fused-feature-map convolutional neural network (fCNN),
-which performs frequency convolution on acoustic features and time convolution
-on articulatory features, is trained and tested on a Dutch and a Flemish
-pathological speech corpus. The performance of the fCNN-based ASR system using
-joint features is compared to other neural network architectures such as
-conventional CNNs and time-frequency convolutional networks (TFCNNs) in several
-training scenarios.
-"
-8346,1807.10949,"Emre Y{\i}lmaz, Astik Biswas, Ewald van der Westhuizen, Febe de Wet
- and Thomas Niesler",Building a Unified Code-Switching ASR System for South African Languages,cs.CL," We present our first efforts towards building a single multilingual automatic
-speech recognition (ASR) system that can process code-switching (CS) speech in
-five languages spoken within the same population. This contrasts with related
-prior work which focuses on the recognition of CS speech in bilingual
-scenarios. Recently, we have compiled a small five-language corpus of South
-African soap opera speech which contains examples of CS between 5 languages
-occurring in various contexts such as using English as the matrix language and
-switching to other indigenous languages. The ASR system presented in this work
-is trained on 4 corpora containing English-isiZulu, English-isiXhosa,
-English-Setswana and English-Sesotho CS speech.
The interpolation of multiple
-language models trained on these language pairs enables the ASR system to
-hypothesize mixed word sequences from these 5 languages. We evaluate various
-state-of-the-art acoustic models trained on this 5-lingual training data and
-report ASR accuracy and language recognition performance on the development and
-test sets of the South African multilingual soap opera corpus.
-"
-8347,1807.10965,"Jennifer Sleeman, Tim Finin, Milton Halem",Ontology-Grounded Topic Modeling for Climate Science Research,cs.CL cs.AI," In scientific disciplines where research findings have a strong impact on
-society, reducing the amount of time it takes to understand, synthesize and
-exploit the research is invaluable. Topic modeling is an effective technique
-for summarizing a collection of documents to find the main themes among them
-and to classify other documents that have a similar mixture of co-occurring
-words. We show how grounding a topic model with an ontology, extracted from a
-glossary of important domain phrases, improves the topics generated and makes
-them easier to understand. We apply and evaluate this method in the climate
-science domain. The result improves the topics generated and supports faster
-research understanding, discovery of social networks among researchers, and
-automatic ontology generation.
-"
-8348,1807.10984,"Siddharth Dalmia, Xinjian Li, Florian Metze and Alan W. Black",Domain Robust Feature Extraction for Rapid Low Resource ASR Development,cs.CL cs.SD eess.AS," Developing a practical speech recognizer for a low-resource language is
-challenging, not only because of the (potentially unknown) properties of the
-language, but also because test data may not be from the same domain as the
-available training data. In this paper, we focus on the latter challenge, i.e.
-domain mismatch, for systems trained using a sequence-based criterion. We
-demonstrate the effectiveness of using a pre-trained English recognizer, which
-is robust to such mismatched conditions, as a domain normalizing feature
-extractor on a low-resource language. In our example, we use Turkish
-Conversational Speech and Broadcast News data. This enables rapid development
-of speech recognizers for new languages which can easily adapt to any domain.
-Testing in various cross-domain scenarios, we achieve relative improvements of
-around 25% in phoneme error rate, with improvements being around 50% for some
-domains.
-"
-8349,1807.11024,"L.H. Nguyen, N.T.H. Pham, V.M. Ngo","Opinion Spam Recognition Method for Online Reviews using Ontological
- Features",cs.IR cs.AI cs.CL," Nowadays, many people use social media opinions to make their
-decisions on buying products or services. Opinion spam detection is a hard
-problem because fake reviews can be made by organizations as well as
-individuals for different purposes. They write fake reviews to mislead readers
-or automated detection systems by promoting or demoting target products, either
-to promote them or to damage their reputations. In this paper, we propose a new
-approach using knowledge-based Ontology to detect opinion spam with high
-accuracy (higher than 75%). Keywords: Opinion spam, Fake review, E-commercial,
-Ontology.
-"
-8350,1807.11057,"Wei Li, Brian Mak",NMT-based Cross-lingual Document Embeddings,cs.CL," This paper investigates a cross-lingual document embedding method that
-improves the current Neural machine Translation framework based Document Vector
-(NTDV or simply NV).
NV is developed with a self-attention mechanism under the
-neural machine translation (NMT) framework. In NV, each pair of parallel
-documents in different languages is projected to the same shared layer in the
-model. However, the two NV embeddings are not guaranteed to be similar.
-This paper further adds a distance constraint to the training objective
-function of NV so that the two embeddings of a parallel document are required
-to be as close as possible. The new method will be called constrained NV (cNV).
-In a cross-lingual document classification task, the new cNV performs as well
-as NV and outperforms other published studies that require forward-pass
-decoding. Compared with the previous NV, cNV does not need a translator during
-testing, and so the method is lighter and more flexible.
-"
-8351,1807.11082,"Bin He, Yi Guan, Rui Dai",Convolutional Gated Recurrent Units for Medical Relation Classification,cs.CL," Convolutional neural network (CNN) and recurrent neural network (RNN) models
-have become the mainstream methods for relation classification. We propose a
-unified architecture, which exploits the advantages of CNN and RNN
-simultaneously, to identify medical relations in clinical records, with only
-word embedding features. Our model learns phrase-level features through a CNN
-layer, and these feature representations are directly fed into a bidirectional
-gated recurrent unit (GRU) layer to capture long-term feature dependencies. We
-evaluate our model on two clinical datasets, and experiments demonstrate that
-our model performs significantly better than previous single-model methods on
-both datasets.
-"
-8352,1807.11089,"Pramit Saha, Praneeth Srungarapu and Sidney Fels","Towards Automatic Speech Identification from Vocal Tract Shape Dynamics
- in Real-time MRI",cs.SD cs.CL cs.CV cs.LG eess.AS," Vocal tract configurations play a vital role in generating distinguishable
-speech sounds, by modulating the airflow and creating different resonant
-cavities in speech production. They contain abundant information that can be
-utilized to better understand the underlying speech production mechanism. As a
-step towards automatic mapping of vocal tract shape geometry to acoustics, this
-paper employs effective video action recognition techniques, like Long-term
-Recurrent Convolutional Networks (LRCN) models, to identify different
-vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract.
-Such a model typically combines a CNN-based deep hierarchical visual feature
-extractor with recurrent networks, which ideally makes the network
-spatio-temporally deep enough to learn the sequential dynamics of a short video
-clip for video classification tasks. We use a database consisting of 2D
-real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The
-comparative performances of this class of algorithms under various parameter
-settings and for various classification tasks are discussed. Interestingly, the
-results show a marked difference in the model performance in the context of
-speech classification with respect to generic sequence or video classification
-tasks.
-"
-8353,1807.11125,"Xiujun Li and Yu Wang and Siqi Sun and Sarah Panda and Jingjing Liu
- and Jianfeng Gao","Microsoft Dialogue Challenge: Building End-to-End Task-Completion
- Dialogue Systems",cs.CL cs.AI cs.LG," This proposal introduces a Dialogue Challenge for building end-to-end
-task-completion dialogue systems, with the goal of encouraging the dialogue
-research community to collaborate and benchmark on standard datasets and a
-unified experimental environment. In this special session, we will release
-human-annotated conversational data in three domains (movie-ticket booking,
-restaurant reservation, and taxi booking), as well as an experiment platform
-with built-in simulators in each domain, for training and evaluation purposes.
-The final submitted systems will be evaluated both in a simulated setting and by
-human judges.
-"
-8354,1807.11172,"Shweta Yadav, Joy Sain, Amit Sheth, Asif Ekbal, Sriparna Saha, Pushpak
- Bhattacharyya","Leveraging Medical Sentiment to Understand Patients Health on Social
- Media",cs.CL," The unprecedented growth of Internet users in recent years has resulted in an
-abundance of unstructured information in the form of social media text. A large
-percentage of this population is actively engaged in health social networks to
-share health-related information. In this paper, we address an important and
-timely topic by analyzing the users' sentiments and emotions w.r.t. their
-medical conditions. Towards this, we examine users on popular medical forums
-(Patient.info, dailystrength.org), where they post on important topics such as
-asthma, allergy, depression, and anxiety. First, we provide a benchmark setup
-for the task by crawling the data, and further define the sentiment-specific
-fine-grained medical conditions (Recovered, Exist, Deteriorate, and Other). We
-propose an effective architecture that uses a Convolutional Neural Network
-(CNN) as a data-driven feature extractor and a Support Vector Machine (SVM) as
-a classifier. We further develop a sentiment feature which is sensitive to the
-medical context. Here, we show that the use of the medical sentiment feature along
-with features extracted from the CNN improves the model performance. In addition to
-our dataset, we also evaluate our approach on the benchmark ""CLEF eHealth 2014""
-corpora and show that our model outperforms the state-of-the-art techniques.
-"
-8355,1807.11219,"Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura",Training Neural Machine Translation using Word Embedding-based Loss,cs.CL," In neural machine translation (NMT), the computational cost at the output
-layer increases with the size of the target-side vocabulary. Using a
-limited-size vocabulary instead may cause a significant decrease in translation
-quality. This trade-off is derived from a softmax-based loss function that
-handles in-dictionary words independently, in which word similarity is not
-considered. In this paper, we propose a novel NMT loss function that includes
-word similarity in the form of distances in a word embedding space. The proposed
-loss function encourages an NMT decoder to generate words close to their
-references in the embedding space; this helps the decoder to choose similar
-acceptable words when the actual best candidates are not included in the
-vocabulary due to its size limitation.
In experiments using ASPEC
-Japanese-to-English and IWSLT17 English-to-French data sets, the proposed
-method showed improvements over a standard NMT baseline in both datasets;
-especially with IWSLT17 En-Fr, it achieved up to +1.72 in BLEU and +1.99 in
-METEOR. When the target-side vocabulary was limited to only 1,000 words, the
-proposed method demonstrated a substantial gain, +1.72 in METEOR with ASPEC
-Ja-En.
-"
-8356,1807.11227,"Tao Li, Lei Lin, Minsoo Choi, Kaiming Fu, Siyuan Gong, Jian Wang",YouTube AV 50K: An Annotated Corpus for Comments in Autonomous Vehicles,cs.CL cs.AI," With one billion monthly viewers, and millions of users discussing and
-sharing opinions, comments below YouTube videos are rich sources of data for
-opinion mining and sentiment analysis. We introduce the YouTube AV 50K dataset,
-a freely available collection of more than 50,000 YouTube comments and
-metadata below autonomous vehicle (AV)-related videos. We describe its creation
-process, its content and data format, and discuss its possible usages.
-In particular, we present a case study of the first self-driving car fatality to
-evaluate the dataset, and show how we can use this dataset to better understand
-public attitudes toward self-driving cars and public reactions to the accident.
-Future developments of the dataset are also discussed.
-"
-8357,1807.11243,\'Alvaro Peris and Francisco Casacuberta,"Active Learning for Interactive Neural Machine Translation of Data
- Streams",cs.CL," We study the application of active learning techniques to the translation of
-unbounded data streams via interactive neural machine translation. The main
-idea is to select, from an unbounded stream of source sentences, those worth
-supervising by a human agent. The user will interactively translate those
-samples. Once validated, these data are useful for adapting the neural machine
-translation model.
- We propose two novel methods for selecting the samples to be validated. We
-exploit the information from the attention mechanism of a neural machine
-translation system. Our experiments show that the inclusion of active learning
-techniques into this pipeline reduces the effort required during the
-process, while increasing the quality of the translation system. It also makes
-it possible to balance the human effort required for achieving a certain
-translation quality. Moreover, our neural system outperforms classical
-approaches by a large margin.
-"
-8358,1807.11276,"Matthias Cetto, Christina Niklaus, Andr\'e Freitas and Siegfried
- Handschuh","Graphene: Semantically-Linked Propositions in Open Information
- Extraction",cs.CL," We present an Open Information Extraction (IE) approach that uses a
-two-layered transformation stage consisting of a clausal disembedding layer and
-a phrasal disembedding layer, together with rhetorical relation identification.
-In that way, we convert sentences that present a complex linguistic structure
-into simplified, syntactically sound sentences, from which we can extract
-propositions that are represented in a two-layered hierarchy in the form of
-core relational tuples and accompanying contextual information which are
-semantically linked via rhetorical relations. In a comparative evaluation, we
-demonstrate that our reference implementation Graphene outperforms
-state-of-the-art Open IE systems in the construction of correct n-ary
-predicate-argument structures. Moreover, we show that existing Open IE
-approaches can benefit from the transformation process of our framework.
-"
-8359,1807.11284,"Pavel Denisov, Ngoc Thang Vu, Marc Ferras Font","Unsupervised Domain Adaptation by Adversarial Learning for Robust Speech
- Recognition",eess.AS cs.AI cs.CL cs.SD," In this paper, we investigate the use of adversarial learning for
-unsupervised adaptation to unseen recording conditions, more specifically,
-single microphone far-field speech. We adapt neural-network-based acoustic
-models trained with close-talk clean speech to the new recording conditions
-using untranscribed adaptation data. Our experimental results on the Italian
-SPEECON data set show that our proposed method achieves 19.8% relative word
-error rate (WER) reduction compared to the unadapted models. Furthermore, this
-adaptation method is beneficial even when performed on data from another
-language (i.e. French), giving 12.6% relative WER reduction.
-"
-8360,1807.11535,"Sanjeev Kumar Karn, Mark Buckley, Ulli Waltinger and Hinrich Sch\""utze",News Article Teaser Tweets and How to Generate Them,cs.CL," In this work, we define the task of teaser generation and provide an
-evaluation benchmark and baseline systems for the process of generating
-teasers. A teaser is a short reading suggestion for an article that is
-illustrative and includes curiosity-arousing elements to entice potential
-readers to read particular news items. Teasers are one of the main vehicles for
-transmitting news to social media users. We compile a novel dataset of teasers
-by systematically accumulating tweets and selecting those that conform to the
-teaser definition. We have compared a number of neural abstractive
-architectures on the task of teaser generation and the overall best-performing
-system is See et al. (2017)'s seq2seq with pointer network.
-"
-8361,1807.11567,"Seonghan Ryu, Seokhwan Kim, Junhwi Choi, Hwanjo Yu, Gary Geunbae Lee","Neural Sentence Embedding using Only In-domain Sentences for
- Out-of-domain Sentence Detection in Dialog Systems",cs.CL cs.AI," To ensure a satisfactory user experience, dialog systems must be able to
-determine whether an input sentence is in-domain (ID) or out-of-domain (OOD).
-We assume that only ID sentences are available as training data because
-collecting enough OOD sentences in an unbiased way is a laborious and
-time-consuming job. This paper proposes a novel neural sentence embedding
-method that represents sentences in a low-dimensional continuous vector space
-that emphasizes aspects that distinguish ID cases from OOD cases. We first used
-a large set of unlabeled text to pre-train word representations that are used
-to initialize neural sentence embedding. Then we used domain-category analysis
-as an auxiliary task to train neural sentence embedding for OOD sentence
-detection. After the sentence representations were learned, we used them to
-train an autoencoder aimed at OOD sentence detection. We evaluated our method
-by experimentally comparing it to the state-of-the-art methods in an
-eight-domain dialog system; our proposed method achieved the highest accuracy
-in all tests.
-"
-8362,1807.11582,Patrick Huber and Jan Niehues and Alex Waibel,A Hierarchical Approach to Neural Context-Aware Modeling,cs.CL cs.LG stat.ML," We present a new recurrent neural network topology to enhance
-state-of-the-art machine learning systems by incorporating a broader context.
-Our approach overcomes recent limitations with extended narratives through a
-multi-layered computational approach to generate an abstract context
-representation.
-The developed system thus captures the narrative at the word, sentence, and
-context levels. Through the hierarchical set-up, our proposed model
-summarizes the most salient information on each level and creates an abstract
-representation of the extended context. We subsequently use this
-representation to enhance neural language processing systems on the task of
-semantic error detection. To show the potential of the newly introduced
-topology, we compare the approach against a context-agnostic set-up that
-includes a standard neural language model and a supervised binary
-classification network. The performance measures on the error detection task
-show the advantage of the hierarchical context-aware topologies, which
-improve the baseline by 12.75% relative for unsupervised models and 20.37%
-relative for supervised models.
-"
-8363,1807.11584,"Marc Franco-Salvador, Sudipta Kar, Thamar Solorio, and Paolo Rosso","UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering",cs.CL," In this work we describe the system built for the three English subtasks of
-SemEval 2016 Task 3 by the Department of Computer Science of the University
-of Houston (UH) and the Pattern Recognition and Human Language Technology
-(PRHLT) research center, Universitat Politècnica de València: UH-PRHLT. Our
-system represents instances using both lexical and semantic-based similarity
-measures between text pairs. Our semantic features include the use of
-distributed representations of words, knowledge graphs generated with the
-BabelNet multilingual semantic network, and the FrameNet lexical database.
-Experimental results outperform the random and Google search engine baselines
-in the three English subtasks. Our approach obtained the highest results on
-subtask B among all task participants.
-"
-8364,1807.11605,"Hasan Sait Arslan, Mark Fishel, Gholamreza Anbarjafari",Doubly Attentive Transformer Machine Translation,cs.CL," In this paper a doubly attentive transformer machine translation model
-(DATNMT) is presented, in which a doubly-attentive transformer decoder
-incorporates spatial visual features obtained via pretrained convolutional
-neural networks, bridging the gap between image captioning and translation.
-In this framework, the transformer decoder learns to attend to
-source-language words and image regions independently, by means of two
-separate attention components in an enhanced multi-head attention layer, as
-it generates words in the target language. We find that the proposed model
-can effectively exploit not just the scarce multimodal machine translation
-data, but also large general-domain text-only machine translation corpora and
-image-text captioning corpora. The experimental results show that the
-proposed doubly-attentive transformer decoder performs better than a
-single-decoder transformer model, and gives state-of-the-art results in the
-English-German multimodal machine translation task.
-"
-8365,1807.11618,"Kamal Al-Sabahi, Zuping Zhang, Jun Long, Khaled Alwesabi","An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization",cs.CL," The fast-growing amount of information on the Internet makes research in
-automatic document summarization increasingly urgent; it is an effective
-answer to information overload. Many approaches have been proposed based on
-different strategies, such as latent semantic analysis (LSA).
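A minimal sketch of the word/sentence/context hierarchy described in arXiv:1807.11582 above, assuming plain GRUs and using final hidden states as each level's summary; the paper's actual layers and pooling may differ.

```python
import torch
from torch import nn

class HierarchicalContext(nn.Module):
    # A word-level GRU summarizes each sentence; a sentence-level GRU then
    # summarizes the sequence of sentence vectors into one context vector.
    def __init__(self, vocab_size, emb_dim=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.GRU(emb_dim, hid, batch_first=True)
        self.sent_rnn = nn.GRU(hid, hid, batch_first=True)

    def forward(self, doc):
        # doc: (num_sentences, max_words) token ids for one narrative.
        _, h = self.word_rnn(self.embed(doc))   # h: (1, num_sentences, hid)
        sent_vecs = h[0].unsqueeze(0)           # (1, num_sentences, hid)
        _, ctx = self.sent_rnn(sent_vecs)       # (1, 1, hid)
        return ctx.squeeze()                    # the abstract context vector

doc = torch.randint(0, 1000, (12, 30))  # 12 sentences of 30 tokens each
print(HierarchicalContext(vocab_size=1000)(doc).shape)  # torch.Size([256])
```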
-However, when applied to document summarization, LSA has some limitations
-that diminish its performance. In this work, we try to overcome these
-limitations by applying statistical and linear-algebraic approaches combined
-with syntactic and semantic processing of text. First, a part-of-speech
-tagger is used to reduce the dimensionality of the LSA input. Then, the
-weight of each term across four adjacent sentences is added to the weighting
-scheme when calculating the input matrix, to take word order and syntactic
-relations into account. In addition, a new LSA-based sentence selection
-algorithm is proposed, in which the term description is combined with the
-sentence description for each topic, which in turn makes the generated
-summary more informative and diverse. To ensure the effectiveness of the
-proposed LSA-based sentence selection algorithm, extensive experiments on
-Arabic and English are conducted. Four datasets are used to evaluate the new
-model: the Linguistic Data Consortium (LDC) Arabic Newswire-a corpus, the
-Essex Arabic Summaries Corpus (EASC), DUC2002, and the Multilingual MSS 2015
-dataset. Experimental results on the four datasets show the effectiveness of
-the proposed model on Arabic and English data; it consistently outperforms
-state-of-the-art methods.
-"
-8366,1807.11679,"Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu","Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder",eess.AS cs.CL cs.SD stat.ML," Recent neural networks such as WaveNet and sampleRNN that learn directly
-from speech waveform samples have achieved very high-quality synthetic speech
-in terms of both naturalness and speaker similarity, even in multi-speaker
-text-to-speech synthesis systems. Such neural networks are being used as an
-alternative to vocoders and hence are often called neural vocoders. The
-neural vocoder uses acoustic features as local condition parameters, and
-these parameters need to be accurately predicted by another acoustic model.
-However, it is not yet clear how to train this acoustic model, which is
-problematic because the final quality of synthetic speech is significantly
-affected by its performance. Significant degradation occurs, especially when
-the predicted acoustic features have characteristics mismatched with those of
-natural ones. In order to reduce this mismatch between natural and generated
-acoustic features, we propose frameworks that incorporate either a
-conditional generative adversarial network (GAN) or its variant, Wasserstein
-GAN with gradient penalty (WGAN-GP), into multi-speaker speech synthesis
-using the WaveNet vocoder. We also extend the GAN frameworks and use the
-discretized mixture logistic loss of a well-trained WaveNet, in addition to
-the mean squared error and adversarial losses, as part of the objective
-function. Experimental results show that acoustic models trained in the
-WGAN-GP framework with the back-propagated discretized-mixture-of-logistics
-(DML) loss achieve the highest subjective evaluation scores in terms of both
-quality and speaker similarity.
-"
-8367,1807.11689,"Muhao Chen, Changping Meng, Gang Huang and Carlo Zaniolo",Neural Article Pair Modeling for Wikipedia Sub-article Matching,cs.IR cs.CL cs.HC," Nowadays, editors tend to separate different subtopics of a long Wikipedia
-article into multiple sub-articles.
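A sketch of the generic LSA summarization pipeline that arXiv:1807.11618 above builds on (classic topic-by-topic sentence selection); the paper's POS-based dimensionality reduction, adjacent-sentence weighting, and combined term/sentence descriptions are omitted here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_summary(sentences, n_topics=3, per_topic=1):
    # Build a sentence-term matrix and factor it; each SVD component is a
    # latent topic, and each sentence gets a loading on every topic.
    # n_topics must be smaller than both the sentence and vocabulary counts.
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(sentences)
    V = TruncatedSVD(n_components=n_topics).fit_transform(X)
    picked = []
    for t in range(n_topics):
        # Take the sentence(s) loading most strongly on topic t.
        for i in np.argsort(-np.abs(V[:, t]))[:per_topic]:
            if int(i) not in picked:
                picked.append(int(i))
    return [sentences[i] for i in sorted(picked)]
```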
-This separation seeks to improve human readability. However, it also has a
-deleterious effect on many Wikipedia-based tasks that rely on the
-article-as-concept assumption, which requires each entity (or concept) to be
-described solely by one article. This underlying assumption significantly
-simplifies knowledge representation and extraction, and it is vital to many
-existing technologies such as automated knowledge base construction,
-cross-lingual knowledge alignment, semantic search and data lineage of
-Wikipedia entities. In this paper we provide an approach to match the
-scattered sub-articles back to their corresponding main articles, with the
-intent of facilitating automated Wikipedia curation and processing. The
-proposed model adopts a hierarchical learning structure that combines
-multiple variants of neural document pair encoders with a comprehensive set
-of explicit features. A large crowdsourced dataset is created to support the
-evaluation and feature extraction for the task. On this dataset, the proposed
-model achieves promising cross-validation results and significantly
-outperforms previous approaches. Large-scale serving on the entire English
-Wikipedia also demonstrates the practicability and scalability of the
-proposed model by effectively extracting a vast collection of newly paired
-main and sub-articles.
-"
-8368,1807.11712,"Niloofar Safi Samghabadi and Deepthi Mave and Sudipta Kar and Thamar Solorio",RiTUAL-UH at TRAC 2018 Shared Task: Aggression Identification,cs.CL," This paper presents our system for the ""TRAC 2018 Shared Task on Aggression
-Identification"". Our best systems for the English dataset use a combination
-of lexical and semantic features; for the Hindi data, however, using only
-lexical features gave the best results. We obtained weighted F1-measures of
-0.5921 for the English Facebook task (ranked 12th), 0.5663 for the English
-Social Media task (ranked 6th), 0.6292 for the Hindi Facebook task (ranked
-1st), and 0.4853 for the Hindi Social Media task (ranked 2nd).
-"
-8369,1807.11714,"Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, Anupam Datta",Gender Bias in Neural Natural Language Processing,cs.CL," We examine whether neural natural language processing (NLP) systems reflect
-historical biases in training data. We define a general benchmark to quantify
-gender bias in a variety of neural NLP tasks. Our empirical evaluation with
-state-of-the-art neural coreference resolution and textbook RNN-based
-language models trained on benchmark datasets finds significant gender bias
-in how models view occupations. We then mitigate bias with CDA: a generic
-methodology for corpus augmentation via causal interventions that breaks
-associations between gendered and gender-neutral words. We empirically show
-that CDA effectively decreases gender bias while preserving accuracy. We also
-explore the space of mitigation strategies with CDA, a prior approach to word
-embedding debiasing (WED), and their compositions. We show that CDA
-outperforms WED, drastically so when word embeddings are trained. For
-pre-trained embeddings, the two methods can be effectively composed. We also
-find that, as training proceeds on the original data set with gradient
-descent, the gender bias grows as the loss decreases, indicating that the
-optimization encourages bias; CDA mitigates this behavior.
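A minimal sketch of the counterfactual data augmentation (CDA) idea in arXiv:1807.11714 above: augment the corpus with copies in which gendered words are swapped. The pair list here is a tiny illustrative sample, not the paper's lexicon, and a real system needs care with case, morphology, and ambiguous forms such as her/hers/his.

```python
PAIRS = [("he", "she"), ("him", "her"), ("man", "woman"),
         ("father", "mother"), ("brother", "sister"), ("actor", "actress")]
SWAP = {}
for a, b in PAIRS:
    SWAP[a], SWAP[b] = b, a

def counterfactual(sentence):
    # Naive token-level swap; enough to break gender/occupation
    # co-occurrence statistics in the augmented corpus.
    return " ".join(SWAP.get(tok.lower(), tok) for tok in sentence.split())

def augment(corpus):
    # Keep each original sentence and add its gender-swapped twin.
    return [s for sent in corpus for s in (sent, counterfactual(sent))]

print(counterfactual("the father said he would call him"))
# -> "the mother said she would call her"
```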
-" -8370,1807.11761,"Michael Cochez and Martina Garofalo and J\'er\^ome Len{\ss}en and - Maria Angela Pellegrino",A First Experiment on Including Text Literals in KGloVe,cs.AI cs.CL," Graph embedding models produce embedding vectors for entities and relations -in Knowledge Graphs, often without taking literal properties into account. We -show an initial idea based on the combination of global graph structure with -additional information provided by textual information in properties. Our -initial experiment shows that this approach might be useful, but does not -clearly outperform earlier approaches when evaluated on machine learning tasks. -" -8371,1807.11838,Jonathan Connell,Extensible Grounding of Speech for Robot Instruction,cs.RO cs.AI cs.CL," Spoken language is a convenient interface for commanding a mobile robot. Yet -for this to work a number of base terms must be grounded in perceptual and -motor skills. We detail the language processing used on our robot ELI and -explain how this grounding is performed, how it interacts with user gestures, -and how it handles phenomena such as anaphora. More importantly, however, there -are certain concepts which the robot cannot be preprogrammed with, such as the -names of various objects in a household or the nature of specific tasks it may -be requested to perform. In these cases it is vital that there exist a method -for extending the grounding, essentially ""learning by being told"". We describe -how this was successfully implemented for learning new nouns and verbs in a -tabletop setting. Creating this language learning kernel may be the last -explicit programming the robot ever needs - the core mechanism could eventually -be used for imparting a vast amount of knowledge, much as a child learns from -its parents and teachers. -" -8372,1807.11906,"Mandy Guo, Qinlan Shen, Yinfei Yang, Heming Ge, Daniel Cer, Gustavo - Hernandez Abrego, Keith Stevens, Noah Constant, Yun-Hsuan Sung, Brian Strope, - Ray Kurzweil",Effective Parallel Corpus Mining using Bilingual Sentence Embeddings,cs.CL," This paper presents an effective approach for parallel corpus mining using -bilingual sentence embeddings. Our embedding models are trained to produce -similar representations exclusively for bilingual sentence pairs that are -translations of each other. This is achieved using a novel training method that -introduces hard negatives consisting of sentences that are not translations but -that have some degree of semantic similarity. The quality of the resulting -embeddings are evaluated on parallel corpus reconstruction and by assessing -machine translation systems trained on gold vs. mined sentence pairs. We find -that the sentence embeddings can be used to reconstruct the United Nations -Parallel Corpus at the sentence level with a precision of 48.9% for en-fr and -54.9% for en-es. When adapted to document level matching, we achieve a parallel -document matching accuracy that is comparable to the significantly more -computationally intensive approach of [Jakob 2010]. Using reconstructed -parallel data, we are able to train NMT models that perform nearly as well as -models trained on the original data (within 1-2 BLEU). 
-" -8373,1808.00054,"Michael Hahn, Frank Keller","Modeling Task Effects in Human Reading with Neural Network-based - Attention",cs.CL," Research on human reading has long documented that reading behavior shows -task-specific effects, but it has been challenging to build general models -predicting what reading behavior humans will show in a given task. We introduce -NEAT, a computational model of the allocation of attention in human reading, -based on the hypothesis that human reading optimizes a tradeoff between economy -of attention and success at a task. Our model is implemented using contemporary -neural network modeling techniques, and makes explicit and testable predictions -about how the allocation of attention varies across different tasks. We test -this in an eyetracking study comparing two versions of a reading comprehension -task, finding that our model successfully accounts for reading behavior across -the tasks. Our work thus provides evidence that task effects can be modeled as -optimal adaptation to task demands. -" -8374,1808.00103,"Paul Sheridan, Mikael Onsj\""o, Claudia Becerra, Sergio Jimenez, and - George Due\~nas","An Ontology-Based Recommender System with an Application to the Star - Trek Television Franchise",cs.IR cs.CL," Collaborative filtering based recommender systems have proven to be extremely -successful in settings where user preference data on items is abundant. -However, collaborative filtering algorithms are hindered by their weakness -against the item cold-start problem and general lack of interpretability. -Ontology-based recommender systems exploit hierarchical organizations of users -and items to enhance browsing, recommendation, and profile construction. While -ontology-based approaches address the shortcomings of their collaborative -filtering counterparts, ontological organizations of items can be difficult to -obtain for items that mostly belong to the same category (e.g., television -series episodes). In this paper, we present an ontology-based recommender -system that integrates the knowledge represented in a large ontology of -literary themes to produce fiction content recommendations. The main novelty of -this work is an ontology-based method for computing similarities between items -and its integration with the classical Item-KNN (K-nearest neighbors) -algorithm. As a study case, we evaluated the proposed method against other -approaches by performing the classical rating prediction task on a collection -of Star Trek television series episodes in an item cold-start scenario. This -transverse evaluation provides insights into the utility of different -information resources and methods for the initial stages of recommender system -development. We found our proposed method to be a convenient alternative to -collaborative filtering approaches for collections of mostly similar items, -particularly when other content-based approaches are not applicable or -otherwise unavailable. Aside from the new methods, this paper contributes a -testbed for future research and an online framework to collaboratively extend -the ontology of literary themes to cover other narrative content. -" -8375,1808.00179,"Elizaveta Korotkova, Maksym Del, Mark Fishel",Monolingual and Cross-lingual Zero-shot Style Transfer,cs.CL," We introduce the task of zero-shot style transfer between different -languages. Our training data includes multilingual parallel corpora, but does -not contain any parallel sentences between styles, similarly to the recent -previous work. 
-We propose a unified multilingual multi-style machine translation system
-design that can perform zero-shot style conversion at inference time, both
-monolingually and cross-lingually. Our model can increase the presence of
-dissimilar styles in a corpus by up to 3 times, easily learns to operate with
-various contractions, and provides reasonable lexicon swaps, as manual
-evaluation shows.
-"
-8376,1808.00265,"Yundong Zhang, Juan Carlos Niebles, Alvaro Soto","Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining",cs.CV cs.AI cs.CL cs.LG," A key aspect of interpretable VQA models is their ability to ground their
-answers in relevant regions of the image. Current approaches with this
-capability rely on supervised learning and human-annotated groundings to
-train attention mechanisms inside the VQA architecture. Unfortunately,
-obtaining human annotations specific to visual grounding is difficult and
-expensive. In this work, we demonstrate that we can effectively train a VQA
-architecture with grounding supervision that can be automatically obtained
-from available region descriptions and object annotations. We also show that
-our model trained with this mined supervision generates visual groundings
-that correlate better with manually annotated groundings, while achieving
-state-of-the-art VQA accuracy.
-"
-8377,1808.00300,"Mateusz Malinowski and Carl Doersch and Adam Santoro and Peter Battaglia",Learning Visual Question Answering by Bootstrapping Hard Attention,cs.CV cs.AI cs.CL cs.LG cs.NE," Attention mechanisms in biological perception are thought to select subsets
-of perceptual information for more sophisticated processing which would be
-prohibitive to perform on all sensory inputs. In computer vision, however,
-there has been relatively little exploration of hard attention, where some
-information is selectively ignored, in spite of the success of soft
-attention, where information is re-weighted and aggregated, but never
-filtered out. Here, we introduce a new approach for hard attention and find
-it achieves very competitive performance on a recently released visual
-question answering dataset, equalling and in some cases surpassing similar
-soft attention architectures while entirely ignoring some features. Even
-though the hard attention mechanism is thought to be non-differentiable, we
-found that the feature magnitudes correlate with semantic relevance and
-provide a useful signal for our mechanism's attentional selection criterion.
-Because hard attention selects important features of the input information,
-it can also be more efficient than analogous soft attention mechanisms. This
-is especially important for recent approaches that use non-local pairwise
-operations, whereby computational and memory costs are quadratic in the size
-of the set of features.
-"
-8378,1808.00423,Marc Velay and Fabrice Daniel,"Seq2Seq and Multi-Task Learning for joint intent and content extraction for domain specific interpreters",cs.LG cs.CL stat.ML," This study evaluates the performance of an LSTM network for detecting and
-extracting the intent and content of commands for a financial chatbot. It
-presents two techniques, sequence-to-sequence learning and multi-task
-learning, which might improve on the previous task.
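A sketch of the magnitude-based hard attention signal reported in arXiv:1808.00300 above: keep only the top-k feature-map cells by L2 norm and discard the rest, rather than down-weighting them as soft attention does. The paper's full mechanism is more involved; this shows the selection criterion only.

```python
import torch

def hard_attention(features, k=16):
    # features: (batch, n_positions, dim) flattened conv-map cells.
    norms = features.norm(dim=-1)                 # (batch, n_positions)
    idx = norms.topk(k, dim=1).indices            # k largest-norm cells
    idx = idx.unsqueeze(-1).expand(-1, -1, features.size(-1))
    # Unlike soft attention, the remaining cells are dropped entirely,
    # so any later pairwise operation costs O(k^2) instead of O(n^2).
    return features.gather(1, idx)                # (batch, k, dim)

feats = torch.randn(2, 196, 512)   # e.g. a 14x14 CNN feature map, flattened
print(hard_attention(feats, k=16).shape)  # torch.Size([2, 16, 512])
```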
-" -8379,1808.00491,"Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex - Waibel",Low-Latency Neural Speech Translation,cs.CL," Through the development of neural machine translation, the quality of machine -translation systems has been improved significantly. By exploiting advancements -in deep learning, systems are now able to better approximate the complex -mapping from source sentences to target sentences. But with this ability, new -challenges also arise. An example is the translation of partial sentences in -low-latency speech translation. Since the model has only seen complete -sentences in training, it will always try to generate a complete sentence, -though the input may only be a partial sentence. We show that NMT systems can -be adapted to scenarios where no task-specific training data is available. -Furthermore, this is possible without losing performance on the original -training data. We achieve this by creating artificial data and by using -multi-task learning. After adaptation, we are able to reduce the number of -corrections displayed during incremental output construction by 45%, without a -decrease in translation quality. -" -8380,1808.00521,"Emre Y{\i}lmaz, Henk van den Heuvel and David A. van Leeuwen","Code-Switching Detection with Data-Augmented Acoustic and Language - Models",cs.CL," In this paper, we investigate the code-switching detection performance of a -code-switching (CS) automatic speech recognition (ASR) system with -data-augmented acoustic and language models. We focus on the recognition of -Frisian-Dutch radio broadcasts where one of the mixed languages, namely -Frisian, is under-resourced. Recently, we have explored how the acoustic -modeling (AM) can benefit from monolingual speech data belonging to the -high-resourced mixed language. For this purpose, we have trained -state-of-the-art AMs on a significantly increased amount of CS speech by -applying automatic transcription and monolingual Dutch speech. Moreover, we -have improved the language model (LM) by creating CS text in various ways -including text generation using recurrent LMs trained on existing CS text. -Motivated by the significantly improved CS ASR performance, we delve into the -CS detection performance of the same ASR system in this work by reporting CS -detection accuracies together with a detailed detection error analysis. -" -8381,1808.00525,Jinseok Kim and Jenna Kim,"The impact of imbalanced training data on machine learning for author - name disambiguation",cs.IR cs.CL cs.DL cs.LG stat.ML," In supervised machine learning for author name disambiguation, negative -training data are often dominantly larger than positive training data. This -paper examines how the ratios of negative to positive training data can affect -the performance of machine learning algorithms to disambiguate author names in -bibliographic records. On multiple labeled datasets, three classifiers - -Logistic Regression, Na\""ive Bayes, and Random Forest - are trained through -representative features such as coauthor names, and title words extracted from -the same training data but with various positive-negative training data ratios. -Results show that increasing negative training data can improve disambiguation -performance but with a few percent of performance gains and sometimes degrade -it. Logistic Regression and Na\""ive Bayes learn optimal disambiguation models -even with a base ratio (1:1) of positive and negative training data. 
-Also, the performance improvement from Random Forest tends to saturate
-quickly, roughly after 1:10 ~ 1:15. These findings imply that, contrary to
-the common practice of using all available training data, name
-disambiguation algorithms can be trained on only part of the negative
-training data without much loss in disambiguation performance, while gaining
-computational efficiency. This study calls for more attention from author
-name disambiguation scholars to methods for machine learning from imbalanced
-data.
-"
-8382,1808.00563,"Anirudh Raju, Sankaran Panchapagesan, Xing Liu, Arindam Mandal, Nikko Strom","Data Augmentation for Robust Keyword Spotting under Playback Interference",cs.CL cs.LG stat.ML," Accurate on-device keyword spotting (KWS) with low false-accept and
-false-reject rates is crucial to customer experience for far-field voice
-control of conversational agents. It is particularly challenging to maintain
-a low false-reject rate in real-world conditions where there is (a) ambient
-noise from external sources such as TV, household appliances, or other speech
-that is not directed at the device, and (b) imperfect cancellation of the
-audio playback from the device, resulting in residual echo after processing
-by the Acoustic Echo Cancellation (AEC) system. In this paper, we propose a
-data augmentation strategy to improve keyword spotting performance under
-these challenging conditions. The training set audio is artificially
-corrupted by mixing in music and TV/movie audio at different
-signal-to-interference ratios. Our results show around a 30-45% relative
-reduction in false-reject rates, at a range of false-alarm rates, under audio
-playback from such devices.
-"
-8383,1808.00639,Zhehuai Chen and Yanmin Qian and Kai Yu,"Sequence Discriminative Training for Deep Learning based Acoustic Keyword Spotting",cs.CL," Speech recognition is a sequence prediction problem. Besides employing
-various deep learning approaches for frame-level classification,
-sequence-level discriminative training has proved indispensable for achieving
-state-of-the-art performance in large vocabulary continuous speech
-recognition (LVCSR). However, keyword spotting (KWS), one of the most common
-speech recognition tasks, has benefited almost exclusively from frame-level
-deep learning, due to the difficulty of obtaining competing sequence
-hypotheses. The few studies on sequence discriminative training for KWS are
-limited to fixed-vocabulary or LVCSR-based methods and have not been compared
-to state-of-the-art deep learning based KWS approaches. In this paper, a
-sequence discriminative training framework is proposed for both
-fixed-vocabulary and unrestricted acoustic KWS. Sequence discriminative
-training for both sequence-level generative and discriminative models is
-systematically investigated. By introducing word-independent phone lattices
-or non-keyword blank symbols to construct competing hypotheses, feasible and
-efficient sequence discriminative training approaches are proposed for
-acoustic KWS. Experiments showed that the proposed approaches obtained
-consistent and significant improvements in both fixed-vocabulary and
-unrestricted KWS tasks, compared to previous frame-level deep learning based
-acoustic KWS methods.
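A minimal sketch of the corruption step in arXiv:1808.00563 above: scale an interference signal so the mixture hits a chosen signal-to-interference ratio (SIR), then add it to the training audio.

```python
import numpy as np

def mix_at_sir(speech, interference, sir_db):
    # Scale the interference so that 10*log10(P_speech / P_interference)
    # equals sir_db; both inputs are float arrays at the same sample rate.
    n = min(len(speech), len(interference))
    s, i = speech[:n], interference[:n]
    p_s = np.mean(s ** 2)
    p_i = np.mean(i ** 2) + 1e-12
    scale = np.sqrt(p_s / (p_i * 10 ** (sir_db / 10)))
    return s + scale * i

# e.g. corrupt one training utterance at several SIRs:
# augmented = [mix_at_sir(utterance, tv_audio, sir) for sir in (0, 5, 10)]
```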
-" -8384,1808.00665,"Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa","Investigating accuracy of pitch-accent annotations in neural - network-based speech synthesis and denoising effects",eess.AS cs.CL cs.SD stat.ML," We investigated the impact of noisy linguistic features on the performance of -a Japanese speech synthesis system based on neural network that uses WaveNet -vocoder. We compared an ideal system that uses manually corrected linguistic -features including phoneme and prosodic information in training and test sets -against a few other systems that use corrupted linguistic features. Both -subjective and objective results demonstrate that corrupted linguistic -features, especially those in the test set, affected the ideal system's -performance significantly in a statistical sense due to a mismatched condition -between the training and test sets. Interestingly, while an utterance-level -Turing test showed that listeners had a difficult time differentiating -synthetic speech from natural speech, it further indicated that adding noise to -the linguistic features in the training set can partially reduce the effect of -the mismatch, regularize the model, and help the system perform better when -linguistic features of the test set are noisy. -" -8385,1808.00687,Zhehuai Chen,Linguistic Search Optimization for Deep Learning Based LVCSR,cs.CL," Recent advances in deep learning based large vocabulary con- tinuous speech -recognition (LVCSR) invoke growing demands in large scale speech transcription. -The inference process of a speech recognizer is to find a sequence of labels -whose corresponding acoustic and language models best match the input feature -[1]. The main computation includes two stages: acoustic model (AM) inference -and linguistic search (weighted finite-state transducer, WFST). Large -computational overheads of both stages hamper the wide application of LVCSR. -Benefit from stronger classifiers, deep learning, and more powerful computing -devices, we propose general ideas and some initial trials to solve these -fundamental problems. -" -8386,1808.00694,"Jyoti Jha, Sreekavitha Parupalli, Navjyoti Singh",OntoSenseNet: A Verb-Centric Ontological Resource for Indian Languages,cs.CL," Following approaches for understanding lexical meaning developed by Yaska, -Patanjali and Bhartrihari from Indian linguistic traditions and extending -approaches developed by Leibniz and Brentano in the modern times, a framework -of formal ontology of language was developed. This framework proposes that -meaning of words are in-formed by intrinsic and extrinsic ontological -structures. The paper aims to capture such intrinsic and extrinsic meanings of -words for two major Indian languages, namely, Hindi and Telugu. Parts-of-speech -have been rendered into sense-types and sense-classes. Using them we have -developed a gold- standard annotated lexical resource to support semantic -understanding of a language. The resource has collection of Hindi and Telugu -lexicons, which has been manually annotated by native speakers of the languages -following our annotation guidelines. Further, the resource was utilised to -derive adverbial sense-class distribution of verbs and karaka-verb sense- type -distribution. Different corpora (news, novels) were compared using verb -sense-types distribution. Word Embedding was used as an aid for the enrichment -of the resource. This is a work in progress that aims at lexical coverage of -language extensively. 
-" -8387,1808.00926,"Micha{\l} Ptaszy\'nski and Gniewosz Leliwa and Mateusz Piech and - Aleksander Smywi\'nski-Pohl","Cyberbullying Detection -- Technical Report 2/2018, Department of - Computer Science AGH, University of Science and Technology",cs.CL," The research described in this paper concerns automatic cyberbullying -detection in social media. There are two goals to achieve: building a gold -standard cyberbullying detection dataset and measuring the performance of the -Samurai cyberbullying detection system. The Formspring dataset provided in a -Kaggle competition was re-annotated as a part of the research. The annotation -procedure is described in detail and, unlike many other recent data annotation -initiatives, does not use Mechanical Turk for finding people willing to perform -the annotation. The new annotation compared to the old one seems to be more -coherent since all tested cyberbullying detection system performed better on -the former. The performance of the Samurai system is compared with 5 commercial -systems and one well-known machine learning algorithm, used for classifying -textual content, namely Fasttext. It turns out that Samurai scores the best in -all measures (accuracy, precision and recall), while Fasttext is the -second-best performing algorithm. -" -8388,1808.00957,"Vaibhav Kumar, Mrinal Dhar, Dhruv Khattar, Yash Kumar Lal, Abhimanshu - Mishra, Manish Shrivastava, Vasudeva Varma","SWDE : A Sub-Word And Document Embedding Based Engine for Clickbait - Detection",cs.IR cs.CL," In order to expand their reach and increase website ad revenue, media outlets -have started using clickbait techniques to lure readers to click on articles on -their digital platform. Having successfully enticed the user to open the -article, the article fails to satiate his curiosity serving only to boost -click-through rates. Initial methods for this task were dependent on feature -engineering, which varies with each dataset. Industry systems have relied on an -exhaustive set of rules to get the job done. Neural networks have barely been -explored to perform this task. We propose a novel approach considering -different textual embeddings of a news headline and the related article. We -generate sub-word level embeddings of the title using Convolutional Neural -Networks and use them to train a bidirectional LSTM architecture. An attention -layer allows for calculation of significance of each term towards the nature of -the post. We also generate Doc2Vec embeddings of the title and article text and -model how they interact, following which it is concatenated with the output of -the previous component. Finally, this representation is passed through a neural -network to obtain a score for the headline. We test our model over 2538 posts -(having trained it on 17000 records) and achieve an accuracy of 83.49% -outscoring previous state-of-the-art approaches. -" -8389,1808.01160,"Szymon Malik, Adrian Lancucki, Jan Chorowski",Efficient Purely Convolutional Text Encoding,cs.CL," In this work, we focus on a lightweight convolutional architecture that -creates fixed-size vector embeddings of sentences. Such representations are -useful for building NLP systems, including conversational agents. Our work -derives from a recently proposed recursive convolutional architecture for -auto-encoding text paragraphs at byte level. We propose alternations that -significantly reduce training time, the number of parameters, and improve -auto-encoding accuracy. 
-Finally, we evaluate the representations created by our model on tasks from
-the SentEval benchmark suite, and show that it can serve as a better, yet
-fairly low-resource, alternative to popular bag-of-words embeddings.
-"
-8390,1808.01175,"M. Tarik Altuncu, Sophia N. Yaliraki, Mauricio Barahona","Content-driven, unsupervised clustering of news articles through multiscale graph partitioning",cs.CL cs.IR cs.LG math.SP," The explosion in the amount of news and journalistic content being
-generated across the globe, coupled with extended and instantaneous access to
-information through online media, makes it difficult and time-consuming to
-monitor news developments and opinion formation in real time. There is an
-increasing need for tools that can pre-process, analyse and classify raw text
-to extract interpretable content; specifically, identifying topics and
-content-driven groupings of articles. We present here such a methodology that
-brings together powerful vector embeddings from Natural Language Processing
-with tools from Graph Theory that exploit diffusive dynamics on graphs to
-reveal natural partitions across scales. Our framework uses a recent deep
-neural network text analysis methodology (Doc2vec) to represent text in
-vector form and then applies a multi-scale community detection method (Markov
-Stability) to partition a similarity graph of document vectors. The method
-allows us to obtain clusters of documents with similar content, at different
-levels of resolution, in an unsupervised manner. We showcase our approach
-with the analysis of a corpus of 9,000 news articles published by Vox Media
-over one year. Our results show consistent groupings of documents according
-to content without a priori assumptions about the number or type of clusters
-to be found. The multilevel clustering reveals a quasi-hierarchy of topics
-and subtopics with increased intelligibility and improved topic coherence as
-compared to external taxonomy services and standard topic detection methods.
-"
-8391,1808.01216,"Md Shad Akhtar, Deepanway Ghosal, Asif Ekbal, Pushpak Bhattacharyya, Sadao Kurohashi","A Multi-task Ensemble Framework for Emotion, Sentiment and Intensity Prediction",cs.CL," In this paper, through a multi-task ensemble framework, we address three
-problems of emotion and sentiment analysis: ""emotion classification &
-intensity"", ""valence, arousal & dominance for emotion"", and ""valence &
-arousal for sentiment"". The underlying problems cover two granularities
-(i.e. coarse-grained and fine-grained) and a diverse range of domains (i.e.
-tweets, Facebook posts, news headlines, blogs, letters etc.). The ensemble
-model aims to leverage the learned representations of three deep learning
-models (i.e. CNN, LSTM and GRU) and a hand-crafted feature representation
-for the predictions. Experimental results on the benchmark datasets show the
-efficacy of our proposed multi-task ensemble framework: we obtain a
-performance improvement of 2-3 points on average over single-task systems
-for most of the problems and domains.
-"
-8392,1808.01371,"Raul Puri, Robert Kirby, Nikolai Yakovenko and Bryan Catanzaro",Large Scale Language Modeling: Converging on 40GB of Text in Four Hours,cs.LG cs.CL stat.ML," Recent work has shown how to train Convolutional Neural Networks (CNNs)
-rapidly on large image datasets and then transfer the knowledge gained from
-these models to a variety of tasks.
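A sketch of the pipeline in arXiv:1808.01175 above: embed articles with Doc2vec, build a similarity graph, and partition it. Markov Stability is not available in standard libraries, so networkx's greedy modularity communities stand in for the paper's multiscale partitioning; the gensim 4.x API is assumed.

```python
import networkx as nx
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from networkx.algorithms.community import greedy_modularity_communities

def cluster_articles(texts, k_neighbors=5):
    # 1) Doc2vec vectors for each article.
    docs = [TaggedDocument(t.lower().split(), [i]) for i, t in enumerate(texts)]
    model = Doc2Vec(docs, vector_size=100, epochs=20, min_count=2)
    # 2) k-nearest-neighbour similarity graph over the document vectors.
    g = nx.Graph()
    g.add_nodes_from(range(len(texts)))
    for i in range(len(texts)):
        for j, sim in model.dv.most_similar(i, topn=k_neighbors):
            g.add_edge(i, int(j), weight=max(sim, 0.0))
    # 3) Unsupervised partition of the graph into content clusters.
    return [sorted(c) for c in greedy_modularity_communities(g, weight="weight")]
```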
-Following [Radford 2017], in this work we demonstrate similar scalability
-and transfer for Recurrent Neural Networks (RNNs) on natural language tasks.
-By utilizing mixed-precision arithmetic and a 32k batch size distributed
-across 128 NVIDIA Tesla V100 GPUs, we are able to train a character-level
-4096-dimension multiplicative LSTM (mLSTM) for unsupervised text
-reconstruction over 3 epochs of the 40 GB Amazon Reviews dataset in four
-hours. This runtime compares favorably with previous work that took one month
-to train the same size and configuration for one epoch over the same dataset.
-Converging large-batch RNN models can be challenging. Recent work has
-suggested scaling the learning rate as a function of batch size, but we find
-that doing so naively leads either to significantly worse convergence or to
-immediate divergence for this problem. We provide a learning rate schedule
-that allows our model to converge with a 32k batch size. Since our model
-converges over the Amazon Reviews dataset in hours, and our compute
-requirement of 128 Tesla V100 GPUs, while substantial, is commercially
-available, this work opens up large-scale unsupervised NLP training to most
-commercial applications and deep learning researchers: a model can be trained
-over most public or private text datasets overnight.
-"
-8393,1808.01410,"Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan","Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis",cs.CL cs.LG cs.SD eess.AS stat.ML," Global Style Tokens (GSTs) are a recently proposed method to learn latent
-disentangled representations of high-dimensional data. GSTs can be used
-within Tacotron, a state-of-the-art end-to-end text-to-speech synthesis
-system, to uncover expressive factors of variation in speaking style. In this
-work, we introduce the Text-Predicted Global Style Token (TP-GST)
-architecture, which treats GST combination weights or style embeddings as
-""virtual"" speaking style labels within Tacotron. TP-GST learns to predict
-stylistic renderings from text alone, requiring neither explicit labels
-during training nor auxiliary inputs for inference. We show that, when
-trained on a dataset of expressive speech, our system generates audio with
-more pitch and energy variation than two state-of-the-art baseline models. We
-further demonstrate that TP-GSTs can synthesize speech with background noise
-removed, and corroborate these analyses with positive results on human-rated
-listener preference audiobook tasks. Finally, we demonstrate that
-multi-speaker TP-GST models successfully factorize speaker identity and
-speaking style. We provide a website with audio samples for each of our
-findings.
-"
-8394,1808.01426,"Niantao Xie, Sujian Li, Huiling Ren, and Qibin Zhai",Abstractive Summarization Improved by WordNet-based Extractive Sentences,cs.CL," Recently, seq2seq abstractive summarization models have achieved good
-results on the CNN/Daily Mail dataset. Still, how to improve abstractive
-methods with extractive methods remains a promising research direction, since
-extractive methods have the potential to exploit various efficient features
-for extracting important sentences from a text. In this paper, in order to
-improve the semantic relevance of abstractive summaries, we adopt a
-WordNet-based sentence ranking algorithm to extract the sentences most
-semantically related to a text.
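The abstract above (arXiv:1808.01371) reports that naive learning-rate scaling fails at a 32k batch but does not give the schedule itself, so the following shows a common warmup-plus-cosine-decay recipe purely as an assumption about what such a schedule might look like.

```python
import math

def lr_schedule(step, total_steps, base_lr=5e-4, warmup=500):
    # Ramp the learning rate up linearly, then decay it with a half cosine;
    # the warmup phase is what typically rescues very large batch sizes
    # from the immediate divergence the abstract describes.
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

for s in (0, 250, 500, 5000, 10000):
    print(s, round(lr_schedule(s, total_steps=10000), 6))
```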
-We then design a dual-attentional seq2seq framework to generate summaries
-that take the extracted information into consideration. At the same time, we
-combine pointer-generator and coverage mechanisms to address the
-out-of-vocabulary (OOV) and duplicate-word problems that exist in abstractive
-models. Experiments on the CNN/Daily Mail dataset show that our models
-achieve performance competitive with state-of-the-art ROUGE scores. Human
-evaluations also show that the summaries generated by our models have high
-semantic relevance to the original text.
-"
-8395,1808.01535,"Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha, Andreas Spanias",Triplet Network with Attention for Speaker Diarization,eess.AS cs.CL cs.LG stat.ML," In automatic speech processing systems, speaker diarization is a crucial
-front-end component for separating segments from different speakers. Inspired
-by the recent success of deep neural networks (DNNs) in semantic inferencing,
-triplet loss-based architectures have been successfully used for this
-problem. However, existing work utilizes conventional i-vectors as the input
-representation and builds simple fully connected networks for metric
-learning, thus not fully leveraging the modeling power of DNN architectures.
-This paper investigates the importance of learning effective representations
-directly from the sequences in metric learning pipelines for speaker
-diarization. More specifically, we propose to employ attention models to
-learn embeddings and the metric jointly in an end-to-end fashion. Experiments
-are conducted on the CALLHOME conversational speech corpus. The diarization
-results demonstrate that, besides providing a unified model, the proposed
-approach achieves improved performance when compared against existing
-approaches.
-"
-8396,1808.01591,"Pankaj Gupta and Hinrich Sch\""utze","LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation",cs.CL cs.AI cs.IR cs.LG," Recurrent neural networks (RNNs) are temporal and cumulative in nature, and
-have shown promising results in various natural language processing tasks.
-Despite their success, it remains a challenge to understand their hidden
-behavior. In this work, we analyze and interpret the cumulative nature of
-RNNs via a proposed technique named Layer-wIse-Semantic-Accumulation (LISA)
-for explaining decisions and detecting the most likely (i.e., saliency)
-patterns that the network relies on while making decisions. We demonstrate
-(1) LISA: ""how an RNN accumulates or builds semantics during its sequential
-processing for a given text example and expected response"" and (2)
-Example2pattern: ""how the saliency patterns look for each category in the
-data according to the network in decision making"". We analyze the sensitivity
-of RNNs to different inputs to check the increase or decrease in prediction
-scores, and further extract the saliency patterns learned by the network. We
-employ two relation classification datasets, SemEval 10 Task 8 and TAC KBP
-Slot Filling, to explain RNN predictions via LISA and Example2pattern.
-"
-8397,1808.01662,Abhijeet Gupta and Gemma Boleda and Sebastian Pado,Instantiation,cs.CL," In computational linguistics, a large body of work exists on distributed
-modeling of lexical relations, focussing largely on lexical relations such as
-hypernymy (scientist -- person) that hold between two categories, as
-expressed by common nouns.
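A minimal sketch of the jointly learned embedding-plus-metric idea in arXiv:1808.01535 above: an attention-pooling encoder maps frame sequences to fixed-size speaker embeddings trained with a triplet margin loss. The encoder is far simpler than the paper's model and the dimensions are illustrative.

```python
import torch
from torch import nn

class AttentivePooling(nn.Module):
    # Attention over frame-level features yields a fixed-size speaker
    # embedding; training pulls same-speaker segments together and pushes
    # different-speaker segments apart.
    def __init__(self, feat_dim=40, emb_dim=128):
        super().__init__()
        self.proj = nn.Linear(feat_dim, emb_dim)
        self.attn = nn.Linear(emb_dim, 1)

    def forward(self, frames):                   # frames: (B, T, feat_dim)
        h = torch.tanh(self.proj(frames))        # (B, T, emb_dim)
        w = torch.softmax(self.attn(h).squeeze(-1), dim=1)
        return (w.unsqueeze(-1) * h).sum(dim=1)  # (B, emb_dim)

encoder = AttentivePooling()
loss_fn = nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = (torch.randn(8, 200, 40) for _ in range(3))
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
```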
-In contrast, computational linguistics has paid little attention to entities
-denoted by proper nouns (Marie Curie, Mumbai, ...). These have been
-investigated in detail by the Knowledge Representation and Semantic Web
-communities, but generally not with regard to their linguistic properties.
- Our paper closes this gap by investigating and modeling the lexical
-relation of instantiation, which holds between an entity-denoting and a
-category-denoting expression (Marie Curie -- scientist or Mumbai -- city). We
-present a new, principled dataset for the task of instantiation detection as
-well as experiments and analyses on this dataset. We obtain the following
-results: (a) entities belonging to one category form a region in
-distributional space, but the embedding for the category word is typically
-located outside this subspace; (b) it is easy to learn to distinguish
-entities from categories from distributional evidence, but due to (a),
-instantiation proper is much harder to learn when using common nouns as
-representations of categories; (c) this problem can be alleviated by using
-category representations based on entity rather than category word
-embeddings.
-"
-8398,1808.01741,Walid S. Saba,"Logical Semantics and Commonsense Knowledge: Where Did we Go Wrong, and How to Go Forward, Again",cs.AI cs.CL," We argue that logical semantics might have faltered due to its failure to
-distinguish between two fundamentally very different types of concepts:
-ontological concepts, which should be types in a strongly-typed ontology, and
-logical concepts, which are predicates corresponding to properties of, and
-relations between, objects of various ontological types. We then show that
-accounting for these differences amounts to integrating lexical and
-compositional semantics in one coherent framework, and to embedding in our
-logical semantics a strongly-typed ontology that reflects our commonsense
-view of the world and the way we talk about it in ordinary language. We show
-that in such a framework a number of challenges in natural language semantics
-can be adequately and systematically treated.
-"
-8399,1808.01742,Rezvaneh Rezapour,Using Linguistic Cues for Analyzing Social Movements,cs.CL cs.SI," With the growth of social media usage, social activists try to leverage
-these platforms to raise awareness of social issues and engage the public
-worldwide. The broad use of social media platforms in recent years has made
-it easier for people to stay up to date on news related to regional and
-worldwide events. While social media, namely Twitter, helps social movements
-connect with more people and mobilize the movement, traditional media such as
-news articles help spread the news about events to a broader audience. In
-this study, we analyze linguistic features and cues, such as individualism
-vs. pluralism, sentiment and emotion, to examine the relationship between the
-medium and the discourse over time. We conduct this work in a specific
-application context, the ""Black Lives Matter"" (BLM) movement, and compare
-discussions related to this event in social media vs. news articles.
-"
+version https://git-lfs.github.com/spec/v1
+oid sha256:6467849e8340bb5dc20386099502b4974e12d4c21d77203592811ea158634a3a
+size 17133959